muhan.ai

Neural Architecture Research

Model Architecture: Transformer v4
Parameters: 1.7T
Training Tokens: 14.2T
Benchmark Score: 94.7%
Latency: 12 ms
Active Nodes: 2,048

Research Areas

Neural Architecture

Diagram: Input → Hidden 1 → Hidden 2 → Output
layers: 96
attention_heads: 128
hidden_dim: 12,288
vocab_size: 128,000
context_length: 128K
activation: SwiGLU
Reasoning, Code, Math, Language, Vision, Safety
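
A minimal sketch of the architecture configuration above, captured as a plain Python config object. The field names follow the labels on this page; the dataclass itself is illustrative, not the team's actual configuration format.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ModelConfig:
        layers: int = 96
        attention_heads: int = 128
        hidden_dim: int = 12_288
        vocab_size: int = 128_000
        context_length: int = 128_000   # "128K" as listed; exact token count not specified
        activation: str = "SwiGLU"

    config = ModelConfig()
    print(config)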

Training Progress

Chart: accuracy and loss over 100 training epochs

final_loss: 0.0142
accuracy: 97.3%
epochs: 100
lr_schedule: cosine
batch_size: 4,096
optimizer: AdamW
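
A minimal sketch of the training setup listed above: AdamW with a cosine learning-rate schedule over 100 epochs. Only the optimizer, schedule, epoch count, and batch size come from this page; the model, learning rate, and loss below are illustrative placeholders.

    import torch

    model = torch.nn.Linear(512, 512)                         # stand-in model
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

    for epoch in range(100):                                  # epochs: 100
        x = torch.randn(4096, 512)                            # batch_size: 4,096
        loss = model(x).pow(2).mean()                         # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()                                      # cosine decay per epoch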

Model Stats

1.7T Parameters
14.2T Tokens
96 Layers
128K Context

Benchmarks

MMLU: 94.7
HumanEval: 91.2
GSM8K: 97.1
ARC-C: 96.3
HellaSwag: 95.8
TruthfulQA: 78.4

Attention Heatmap

Heatmap: attention weights for Head 0 / Layer 48 (query tokens × key tokens)
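
A minimal sketch of how a heatmap like this is produced: scaled dot-product attention weights for a single head, plotted as a query-by-key matrix. The tensors below are random placeholders, not activations from the model described here.

    import numpy as np
    import matplotlib.pyplot as plt

    seq_len, head_dim = 32, 64
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(seq_len, head_dim))                  # query projections
    K = rng.normal(size=(seq_len, head_dim))                  # key projections

    scores = Q @ K.T / np.sqrt(head_dim)                      # (query, key) logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax over keys

    plt.imshow(weights, cmap="viridis")
    plt.xlabel("Key tokens")
    plt.ylabel("Query tokens")
    plt.title("Attention weights (illustrative)")
    plt.colorbar()
    plt.show()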

Inference

12ms Latency
847 tok/s
99.97% Uptime
8x H100 GPUs

Data Pipeline

Ingest: 14.2T tokens
Clean: quality filter
Tokenize: 128K vocab
Train: 96 layers
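
A minimal sketch of the first three pipeline stages above, with training consuming the resulting token stream. The function names and the quality heuristic are illustrative placeholders, not the production pipeline.

    from typing import Iterable, Iterator

    def ingest(paths: Iterable[str]) -> Iterator[str]:
        """Yield raw documents from source files."""
        for path in paths:
            with open(path, encoding="utf-8") as f:
                yield f.read()

    def clean(docs: Iterator[str], min_chars: int = 200) -> Iterator[str]:
        """Drop documents that fail a simple length-based quality filter."""
        for doc in docs:
            if len(doc) >= min_chars:
                yield doc

    def tokenize(docs: Iterator[str]) -> Iterator[list[str]]:
        """Whitespace stand-in for a 128K-vocabulary subword tokenizer."""
        for doc in docs:
            yield doc.split()

    # Training would then consume the stream:
    # for tokens in tokenize(clean(ingest(shards))): ...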

Publications

Efficient Attention Mechanisms for Long-Context Models (NeurIPS 2025)
Scaling Laws for Neural Architecture Search (ICML 2025)
Robust Alignment via Constitutional Training (ICLR 2026)

Scaling Analysis

Chart: benchmark score vs. parameter count, from 10M to 1.7T parameters
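
A minimal sketch of the kind of fit behind a scaling plot like this one: benchmark score modelled as a log-linear function of parameter count. The data points below are illustrative placeholders; only the final point matches the headline score on this page.

    import numpy as np

    params = np.array([1e7, 1e8, 1e9, 1e10, 1e11, 1.7e12])    # parameter counts (placeholder)
    score  = np.array([31.0, 42.0, 55.0, 68.0, 81.0, 94.7])   # scores (placeholder)

    # Fit score ~ a * log10(params) + b, a simple log-linear trend.
    a, b = np.polyfit(np.log10(params), score, deg=1)
    print(f"score ~ {a:.2f} * log10(params) + {b:.2f}")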

Safety

99.2% Harmless
97.8% Truthful
A+ RLHF Grade
0.04% Refusal Rate