muhan.ai
Neural Architecture Research
Model Architecture: Transformer v4
Parameters: 1.7T
Training Tokens: 14.2T
Benchmark Score: 94.7%
Latency: 12ms
Active Nodes: 2,048
Research Areas
Neural Architecture
[Diagram: network topology, Input → Hidden 1 → Hidden 2 → Output]
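As a rough illustration of the diagrammed topology, here is a minimal NumPy forward pass through two hidden layers. The layer sizes and the ReLU activation are placeholders for the sketch, not the production model's dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights):
    """Run x through each (W, b) pair, with ReLU between layers."""
    *hidden, last = weights
    for W, b in hidden:
        x = relu(x @ W + b)
    W, b = last
    return x @ W + b  # linear output layer

sizes = [8, 16, 16, 4]  # input, hidden 1, hidden 2, output (demo sizes)
weights = [(rng.normal(size=(m, n)) * 0.1, np.zeros(n))
           for m, n in zip(sizes, sizes[1:])]
print(forward(rng.normal(size=(1, 8)), weights).shape)  # (1, 4)
```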
layers: 96
attention_heads: 128
hidden_dim: 12,288
vocab_size: 128,000
context_length: 128K
activation: SwiGLU
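The one nonstandard entry here is the activation. A minimal sketch of a SwiGLU feed-forward block, assuming the common gate/up/down layout: SiLU(x @ W_gate) * (x @ W_up), projected back down by W_down. The 4x FFN expansion and the tiny demo dimensions are assumptions, not values from this page.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))  # SiLU / Swish: x * sigmoid(x)

def swiglu_ffn(x, w_gate, w_up, w_down):
    # Gate path is SiLU-activated and multiplies the linear "up" path.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

d, d_ff = 64, 256  # demo sizes; the page's hidden_dim is 12,288
rng = np.random.default_rng(0)
w_gate = rng.normal(size=(d, d_ff))
w_up = rng.normal(size=(d, d_ff))
w_down = rng.normal(size=(d_ff, d))
x = rng.normal(size=(2, d))                       # (batch, hidden_dim)
print(swiglu_ffn(x, w_gate, w_up, w_down).shape)  # (2, 64)
```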
Focus areas: Reasoning · Code · Math · Language · Vision · Safety
Training Progress
[Chart: training accuracy and loss over epochs 0 to 100]
final_loss: 0.0142
accuracy: 97.3%
epochs: 100
lr_schedule: cosine
batch_size: 4,096
optimizer: AdamW
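A minimal PyTorch sketch of this setup: AdamW stepping a placeholder model under a cosine learning-rate schedule. The learning rate, weight decay, and model are assumptions; the page lists only the optimizer, schedule, epoch count, and batch size.

```python
import torch

model = torch.nn.Linear(128, 128)  # stand-in for the real network
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

for epoch in range(100):           # epochs: 100 (from the table)
    x = torch.randn(4096, 128)     # batch_size: 4,096 (from the table)
    loss = model(x).pow(2).mean()  # dummy loss for illustration
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()                   # decay lr along a cosine curve
```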
Benchmarks
MMLU: 94.7
HumanEval: 91.2
GSM8K: 97.1
ARC-C: 96.3
HellaSwag: 95.8
TruthfulQA: 78.4
Attention Heatmap
[Heatmap: attention weights for Head 0 / Layer 48; rows = query tokens, columns = key tokens]
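What such a heatmap plots, as a NumPy sketch: one head's softmax(QK^T / sqrt(d_head)) matrix over query and key tokens. The head dimension follows from the config above (12,288 / 128 = 96); the token count and random inputs are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(48)
n_tokens, d_head = 16, 96  # d_head = hidden_dim / attention_heads

Q = rng.normal(size=(n_tokens, d_head))
K = rng.normal(size=(n_tokens, d_head))
scores = Q @ K.T / np.sqrt(d_head)
# Row-wise softmax: each query's weights over all keys sum to 1.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(weights.shape)  # (16, 16): query tokens x key tokens
```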
Inference
Latency: 12ms
Throughput: 847 tok/s
Uptime: 99.97%
Hardware: 8x H100 GPUs
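Figures like these are typically measured with a simple timing harness: wall-clock time per request for latency, tokens over elapsed time for throughput. A hedged sketch, where generate() is a hypothetical stand-in for the real inference call:

```python
import time

def generate(prompt: str) -> str:
    time.sleep(0.012)  # pretend the model takes ~12 ms per request
    return prompt + " ..."

t0 = time.perf_counter()
n_requests, n_tokens = 100, 0
for _ in range(n_requests):
    n_tokens += len(generate("hello").split())
elapsed = time.perf_counter() - t0
print(f"latency: {elapsed / n_requests * 1000:.1f} ms/request")
print(f"throughput: {n_tokens / elapsed:.0f} tok/s")
```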
Data Pipeline
1. Ingest: 14.2T tokens
2. Clean: quality filter
3. Tokenize: 128K vocab
4. Train: 96 layers
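A minimal sketch of these four stages chained as Python generators. Every function body is a hypothetical placeholder; the length heuristic and toy vocabulary are assumptions, not details from this page.

```python
def ingest(paths):   # 1. Ingest: pull raw text
    for p in paths:
        yield f"raw text from {p}"

def clean(docs):     # 2. Clean: quality filter (assumed length heuristic)
    return (d for d in docs if len(d) > 10)

def tokenize(docs):  # 3. Tokenize: map words to ids
    vocab = {}       # toy vocab; the real one has 128K entries
    for d in docs:
        yield [vocab.setdefault(w, len(vocab)) for w in d.split()]

def train(batches):  # 4. Train: consume token batches
    return sum(len(b) for b in batches)

print(train(tokenize(clean(ingest(["a.txt", "b.txt"])))))
```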
Publications
Efficient Attention Mechanisms for Long-Context Models (NeurIPS 2025)
Scaling Laws for Neural Architecture Search (ICML 2025)
Robust Alignment via Constitutional Training (ICLR 2026)
Scaling Analysis
[Chart: benchmark score vs. parameter count, 10M to 1.7T]
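Curves like this are commonly read as a power law: the gap to a perfect score shrinks as roughly a * N^(-b), which a linear fit in log-log space recovers. The data points below are made-up placeholders for the sketch, NOT values from this page.

```python
import numpy as np

params = np.array([1e7, 1e8, 1e9, 1e10, 1e11])  # hypothetical sizes
gap = np.array([40.0, 25.0, 15.0, 9.0, 5.5])    # hypothetical (100 - score)

# Fit log(gap) = b * log(N) + log(a); the slope b is the scaling exponent.
b, log_a = np.polyfit(np.log(params), np.log(gap), 1)
print(f"fitted exponent: {b:.3f}")
print(f"predicted gap at 1.7T params: {np.exp(log_a) * (1.7e12) ** b:.1f}")
```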
Safety
Harmless: 99.2%
Truthful: 97.8%
RLHF Grade: A+
Refusal Rate: 0.04%