muhan.ai
Neural Architecture Research
Model Architecture: Transformer v4
Parameters: 1.7T
Training Tokens: 14.2T
Benchmark Score: 94.7%
Latency: 12ms
Active Nodes: 2,048
Research Areas
Neural Architecture
[Diagram: network topology, Input → Hidden 1 → Hidden 2 → Output]
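As a rough illustration of the diagrammed topology, here is a minimal NumPy forward pass through two hidden layers. The layer sizes and the ReLU activation are placeholders for the sketch, not the production model's dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights):
    """Run x through each (W, b) pair, with ReLU between layers."""
    *hidden, last = weights
    for W, b in hidden:
        x = relu(x @ W + b)
    W, b = last
    return x @ W + b  # linear output layer

sizes = [8, 16, 16, 4]  # input, hidden 1, hidden 2, output (demo sizes)
weights = [(rng.normal(size=(m, n)) * 0.1, np.zeros(n))
           for m, n in zip(sizes, sizes[1:])]
print(forward(rng.normal(size=(1, 8)), weights).shape)  # (1, 4)
```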
layers: 96
attention_heads: 128
hidden_dim: 12,288
vocab_size: 128,000
context_length: 128K
activation: SwiGLU
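The one nonstandard entry here is the activation. A minimal sketch of a SwiGLU feed-forward block, assuming the common gate/up/down layout: SiLU(x @ W_gate) * (x @ W_up), projected back down by W_down. The 4x FFN expansion and the tiny demo dimensions are assumptions, not values from this page.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))  # SiLU / Swish: x * sigmoid(x)

def swiglu_ffn(x, w_gate, w_up, w_down):
    # Gate path is SiLU-activated and multiplies the linear "up" path.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

d, d_ff = 64, 256  # demo sizes; the page's hidden_dim is 12,288
rng = np.random.default_rng(0)
w_gate = rng.normal(size=(d, d_ff))
w_up = rng.normal(size=(d, d_ff))
w_down = rng.normal(size=(d_ff, d))
x = rng.normal(size=(2, d))                       # (batch, hidden_dim)
print(swiglu_ffn(x, w_gate, w_up, w_down).shape)  # (2, 64)
```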
Focus areas: Reasoning · Code · Math · Language · Vision · Safety
Training Progress
[Chart: training accuracy and loss over epochs 0 to 100]
final_loss: 0.0142
accuracy: 97.3%
epochs: 100
lr_schedule: cosine
batch_size: 4,096
optimizer: AdamW
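A minimal PyTorch sketch of this setup: AdamW stepping a placeholder model under a cosine learning-rate schedule. The learning rate, weight decay, and model are assumptions; the page lists only the optimizer, schedule, epoch count, and batch size.

```python
import torch

model = torch.nn.Linear(128, 128)  # stand-in for the real network
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

for epoch in range(100):           # epochs: 100 (from the table)
    x = torch.randn(4096, 128)     # batch_size: 4,096 (from the table)
    loss = model(x).pow(2).mean()  # dummy loss for illustration
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()                   # decay lr along a cosine curve
```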
Benchmarks
MMLU: 94.7
HumanEval: 91.2
GSM8K: 97.1
ARC-C: 96.3
HellaSwag: 95.8
TruthfulQA: 78.4
Attention Heatmap
[Heatmap: attention weights for Head 0 / Layer 48; rows = query tokens, columns = key tokens]
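What such a heatmap plots, as a NumPy sketch: one head's softmax(QK^T / sqrt(d_head)) matrix over query and key tokens. The head dimension follows from the config above (12,288 / 128 = 96); the token count and random inputs are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(48)
n_tokens, d_head = 16, 96  # d_head = hidden_dim / attention_heads

Q = rng.normal(size=(n_tokens, d_head))
K = rng.normal(size=(n_tokens, d_head))
scores = Q @ K.T / np.sqrt(d_head)
# Row-wise softmax: each query's weights over all keys sum to 1.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(weights.shape)  # (16, 16): query tokens x key tokens
```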
Inference
Latency: 12ms
Throughput: 847 tok/s
Uptime: 99.97%
Hardware: 8x H100 GPUs
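Figures like these are typically measured with a simple timing harness: wall-clock time per request for latency, tokens over elapsed time for throughput. A hedged sketch, where generate() is a hypothetical stand-in for the real inference call:

```python
import time

def generate(prompt: str) -> str:
    time.sleep(0.012)  # pretend the model takes ~12 ms per request
    return prompt + " ..."

t0 = time.perf_counter()
n_requests, n_tokens = 100, 0
for _ in range(n_requests):
    n_tokens += len(generate("hello").split())
elapsed = time.perf_counter() - t0
print(f"latency: {elapsed / n_requests * 1000:.1f} ms/request")
print(f"throughput: {n_tokens / elapsed:.0f} tok/s")
```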
Data Pipeline
1. Ingest: 14.2T tokens
2. Clean: quality filter
3. Tokenize: 128K vocab
4. Train: 96 layers
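A minimal sketch of these four stages chained as Python generators. Every function body is a hypothetical placeholder; the length heuristic and toy vocabulary are assumptions, not details from this page.

```python
def ingest(paths):   # 1. Ingest: pull raw text
    for p in paths:
        yield f"raw text from {p}"

def clean(docs):     # 2. Clean: quality filter (assumed length heuristic)
    return (d for d in docs if len(d) > 10)

def tokenize(docs):  # 3. Tokenize: map words to ids
    vocab = {}       # toy vocab; the real one has 128K entries
    for d in docs:
        yield [vocab.setdefault(w, len(vocab)) for w in d.split()]

def train(batches):  # 4. Train: consume token batches
    return sum(len(b) for b in batches)

print(train(tokenize(clean(ingest(["a.txt", "b.txt"])))))
```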
Publications
Efficient Attention Mechanisms for Long-Context Models (NeurIPS 2025)
Scaling Laws for Neural Architecture Search (ICML 2025)
Robust Alignment via Constitutional Training (ICLR 2026)
Scaling Analysis
[Chart: benchmark score vs. parameter count, 10M to 1.7T]
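Curves like this are commonly read as a power law: the gap to a perfect score shrinks as roughly a * N^(-b), which a linear fit in log-log space recovers. The data points below are made-up placeholders for the sketch, NOT values from this page.

```python
import numpy as np

params = np.array([1e7, 1e8, 1e9, 1e10, 1e11])  # hypothetical sizes
gap = np.array([40.0, 25.0, 15.0, 9.0, 5.5])    # hypothetical (100 - score)

# Fit log(gap) = b * log(N) + log(a); the slope b is the scaling exponent.
b, log_a = np.polyfit(np.log(params), np.log(gap), 1)
print(f"fitted exponent: {b:.3f}")
print(f"predicted gap at 1.7T params: {np.exp(log_a) * (1.7e12) ** b:.1f}")
```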
Safety
Harmless: 99.2%
Truthful: 97.8%
RLHF Grade: A+
Refusal Rate: 0.04%