# Capacity Planning and Scaling Analysis

**Purpose**: Understand resource requirements and scaling options
**Audience**: System architects, capacity planners, infrastructure teams
**Status**: Comprehensive scaling reference

---

## Part 1: WDMaker Current Baseline

### System Configuration
- **Total sites**: 568
- **Main batch**: 517 sites (batch 001)
- **Final batch**: 1 site (batch 010)
- **Archive batches**: 48 sites (batches 002-009)

### Resource Usage

#### Memory per Agent
| Component | Usage |
|-----------|-------|
| Opus agent baseline | 50MB |
| Site processing overhead | 50-100MB |
| Total per agent | 100-150MB |

**Calculation for batch 001**:
- 25 agents/wave × 9 waves sequential
- Peak concurrent: 25 agents
- Peak memory: 25 × 125MB = 3.125GB
- Safe margin (2x): 6.25GB, so plan for 6-8GB
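The same arithmetic as a small Python helper, handy for re-running the estimate with other wave sizes. This is a sketch that assumes memory grows linearly with the number of agents running at once and uses decimal units (1GB = 1000MB) as above. `peak_memory_gb` is a hypothetical name, not part of WDMaker.

```python
def peak_memory_gb(concurrent_agents: int, mb_per_agent: float,
                   safety_factor: float = 2.0) -> float:
    """Memory to provision, in decimal GB, for agents running at the same time.

    Assumes memory grows linearly with concurrent agents (no shared memory).
    """
    peak_mb = concurrent_agents * mb_per_agent
    return peak_mb * safety_factor / 1000


# Batch 001: 25 concurrent agents at the low, mid and high per-agent estimates
for mb in (100, 125, 150):
    raw = peak_memory_gb(25, mb, safety_factor=1.0)
    safe = peak_memory_gb(25, mb)
    print(f"{mb}MB/agent: peak {raw:.3f}GB, with 2x margin {safe:.2f}GB")
```

At the 150MB upper bound the 2x margin reaches 7.5GB, which is why a 6-8GB range is a safer target than a flat 6GB.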

#### Disk Space per Site
| Component | Size |
|-----------|------|
| Design file (DESIGN.md) | 2-5KB |
| HTML file (index.html) | 5-15KB |
| CSS file (styles.css) | 3-10KB |
| JS file (script.js) | 2-8KB |
| **Total per site** | **12-38KB** |

**Calculation for batch 001**:
- 517 sites × 25KB average = 12.9MB generated
- Plus registry and metadata: ~1MB
- Total disk: ~14MB

**For 568 total sites**: ~15MB (568 × 25KB ≈ 14.2MB, plus ~1MB registry and metadata)
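A sketch of the disk estimate: it sums the per-file ranges from the table, then applies the 25KB average plus ~1MB of registry and metadata. Decimal units are assumed, and the names `SITE_FILES_KB` and `disk_estimate_mb` are illustrative.

```python
# (low, high) size in KB for each generated file, from the table above
SITE_FILES_KB = {
    "DESIGN.md": (2, 5),
    "index.html": (5, 15),
    "styles.css": (3, 10),
    "script.js": (2, 8),
}


def disk_estimate_mb(sites: int, avg_kb_per_site: float = 25.0,
                     metadata_mb: float = 1.0) -> float:
    """Generated output plus registry/metadata, in decimal MB (1MB = 1000KB)."""
    return sites * avg_kb_per_site / 1000 + metadata_mb


low = sum(lo for lo, _ in SITE_FILES_KB.values())
high = sum(hi for _, hi in SITE_FILES_KB.values())
print(f"Per site: {low}-{high}KB")
print(f"Batch 001: {disk_estimate_mb(517):.1f}MB")
print(f"All 568 sites: {disk_estimate_mb(568):.1f}MB")
```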

#### CPU Usage
- **Baseline**: 1-2 CPU cores per agent
- **Peak demand**: 25 agents × 1.5 cores = 37.5 cores
- **On 8-core system**: Oversubscribed 4-5x (expected for LLM work)
- **System handling**: Queuing and time-sharing
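The oversubscription ratio is requested cores divided by physical cores. Below is a sketch at the 1.5-cores-per-agent midpoint above. `oversubscription` is a hypothetical helper, and the 4, 8 and 16-core sizes mirror the minimum, recommended and optimal hardware tiers for WDMaker in Part 3.

```python
def oversubscription(concurrent_agents: int, cores_per_agent: float,
                     physical_cores: int) -> float:
    """Requested cores divided by physical cores; above 1.0 the OS must time-share."""
    return concurrent_agents * cores_per_agent / physical_cores


# 25 agents at 1.5 cores each, on the 4/8/16-core tiers from Part 3
for cores in (4, 8, 16):
    print(f"{cores} cores: {oversubscription(25, 1.5, cores):.1f}x oversubscribed")
```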

#### Network (If Remote)
- **Design file transfer**: 5KB × 568 = 2.84MB
- **Generation results**: ~15MB
- **Registry updates**: Continuous, lightweight
- **Total bandwidth**: < 30MB for entire project

---

## Part 2: Scaling Analysis

### Scenario 1: Scale to 5,000 Sites

**Approach**: Keep wave size, add more waves

**Configuration**:
- 25 agents per wave (same as now)
- 200 total waves needed (5000 / 25)
- Sequential waves: 200 waves × 30 min average = 100 hours

**Optimization 1**: Run waves in parallel
- 5 concurrent wave groups × 40 waves each
- Parallel execution: 40 × 30 min = 20 hours

**Optimization 2**: Increase wave size
- 50 agents per wave (2x)
- 100 total waves
- With 5 concurrent wave groups (20 waves each): 20 × 30 min = 10 hours (50 hours if all 100 waves run sequentially)

**Optimization 3**: Combine both
- 50 agents per wave
- 10 concurrent wave groups
- 10 waves sequential per group
- Time: 10 × 30 min = 5 hours
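The four configurations can be compared with one formula: wall-clock time is the number of waves each group must run, times the average wave duration. This sketch assumes, as the calculations above do, that each agent handles one site per wave, every wave takes about 30 minutes, and waves split evenly across groups. `wall_clock_hours` is a hypothetical helper.

```python
import math


def wall_clock_hours(sites: int, agents_per_wave: int, concurrent_groups: int = 1,
                     wave_minutes: float = 30.0) -> float:
    """Hours to finish when each agent handles one site per wave.

    Waves are divided evenly among concurrent groups, and each group runs its
    waves back to back.
    """
    total_waves = math.ceil(sites / agents_per_wave)
    waves_per_group = math.ceil(total_waves / concurrent_groups)
    return waves_per_group * wave_minutes / 60


configs = {
    "Sequential (25/wave, 1 group)": (25, 1),
    "Optimization 1 (25/wave, 5 groups)": (25, 5),
    "Optimization 2 (50/wave, 5 groups)": (50, 5),
    "Optimization 3 (50/wave, 10 groups)": (50, 10),
}
for name, (agents, groups) in configs.items():
    hours = wall_clock_hours(5_000, agents, groups)
    print(f"{name}: {hours:.0f} hours, {agents * groups} agents at once")
```

The second column matters for the resource table that follows: the faster configurations keep 125 to 500 agents resident at once, not 50.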

**Resource requirements for 5,000 sites** (one 50-agent wave running at a time):

| Resource | Calculation | Amount |
|----------|---|---|
| Memory | 50 agents × 125MB | 6.25GB |
| Disk | 5,000 × 25KB | 125MB |
| CPU | 50 agents × 1.5 cores | 75 cores (oversubscribed) |
| Network | 5,000 × 5KB designs + 125MB results | ~150MB (negligible) |

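**Bottleneck**: Memory. One 50-agent wave needs 6-8GB, and concurrent wave groups multiply this (Optimization 3's 10 groups × 50 agents × 125MB ≈ 62.5GB).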
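Peak memory depends on how many agents are resident at once (agents per wave × concurrent groups), not on wave size alone. A sketch at the 125MB per-agent midpoint; `concurrent_memory_gb` is a hypothetical name.

```python
MB_PER_AGENT = 125.0  # midpoint of the 100-150MB estimate in Part 1


def concurrent_memory_gb(agents_per_wave: int, concurrent_groups: int,
                         mb_per_agent: float = MB_PER_AGENT) -> float:
    """Peak memory in decimal GB when concurrent_groups waves run at the same time."""
    return agents_per_wave * concurrent_groups * mb_per_agent / 1000


for label, agents, groups in [
    ("One 50-agent wave", 50, 1),
    ("Optimization 1: 5 groups x 25 agents", 25, 5),
    ("Optimization 3: 10 groups x 50 agents", 50, 10),
]:
    print(f"{label}: {concurrent_memory_gb(agents, groups)}GB")
```

Only the single-wave case fits in 6-8GB. Optimization 1 (~15.6GB) lines up with the 16GB instance used for the 20-hour run in the Part 5 cost estimate.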

---

### Scenario 2: Scale to 10,000 Sites

**Time calculation**:
- Design phase (one site at a time): 10,000 × 10 min = 100,000 min ≈ 1,667 hours
  - Optimization: Deterministic design (10 sec per site): 10,000 × 10 sec ≈ 28 hours
- Implementation phase: 10,000 sites ÷ 50 agents ÷ 2 sites/hour per agent = 100 hours
- Total (with deterministic design): 28 + 100 = 128+ hours (~5 days)

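**Optimization to reach ~1 day**:
- Deterministic design: ~28 hours
- Parallel implementation: 10 concurrent groups × 50 agents = 500 agents, so 10,000 ÷ 500 ÷ 2 sites/hour = 10 hours
- Run back to back, that is ~38 hours; overlapping implementation with design brings it to roughly 28-32 hours
- Target: 24-32 hours (the 24-hour end also requires speeding up or parallelizing design)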
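A sketch of the Scenario 2 phase arithmetic. It assumes design runs one site at a time, each implementation agent finishes about 2 sites per hour (the 30-minute wave average), and that overlapping the phases can at best shrink the total to the longer of the two. `phase_hours` is a hypothetical helper.

```python
def phase_hours(sites: int, design_seconds_per_site: float, concurrent_agents: int,
                sites_per_agent_hour: float = 2.0) -> tuple[float, float]:
    """Return (design_hours, implementation_hours) for one project.

    Design is assumed to run one site at a time; implementation is spread evenly
    over concurrent_agents, each finishing sites_per_agent_hour sites per hour.
    """
    design = sites * design_seconds_per_site / 3600
    implementation = sites / concurrent_agents / sites_per_agent_hour
    return design, implementation


# LLM design (10 min/site) vs deterministic design (10 s/site), one 50-agent wave
for label, seconds in (("LLM design", 600), ("Deterministic design", 10)):
    d, i = phase_hours(10_000, seconds, concurrent_agents=50)
    print(f"{label}: design {d:.0f}h + implementation {i:.0f}h = {d + i:.0f}h")

# Deterministic design with 10 concurrent groups of 50 agents (500 at once)
d, i = phase_hours(10_000, 10, concurrent_agents=500)
print(f"Back to back: {d + i:.0f}h; fully overlapped: ~{max(d, i):.0f}h")
```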

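**Resource requirements** (sized for roughly 50-80 concurrent agents; 10 concurrent 50-agent groups would need ~62.5GB of memory at 125MB per agent):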

| Resource | Peak Usage |
|----------|-----------|
| Memory | 8-10GB |
| Disk | 250MB |
| CPU | 75-100 cores |
| Network | ~300MB (10,000 × 5KB designs + 250MB results) |

**Feasibility**: Tight but achievable on modern infrastructure

---

## Part 3: Hardware Recommendations

### For WDMaker (568 sites) - Current

**Minimum**:
- Memory: 4GB
- Disk: 50GB (includes margin)
- CPU: 4 cores
- Network: 10Mbps

**Recommended**:
- Memory: 8GB
- Disk: 100GB
- CPU: 8 cores
- Network: 100Mbps

**Optimal**:
- Memory: 16GB
- Disk: 200GB
- CPU: 16 cores
- Network: 1Gbps

### For 5,000 Sites

**Minimum**:
- Memory: 8GB
- Disk: 200GB
- CPU: 8 cores
- Network: 10Mbps

**Recommended**:
- Memory: 16GB
- Disk: 500GB
- CPU: 16 cores
- Network: 100Mbps

**Optimal**:
- Memory: 32GB
- Disk: 1TB
- CPU: 32 cores
- Network: 1Gbps

### For 10,000 Sites

**Minimum**:
- Memory: 16GB
- Disk: 500GB
- CPU: 16 cores
- Network: 100Mbps

**Recommended**:
- Memory: 32GB
- Disk: 1TB
- CPU: 32 cores
- Network: 1Gbps

**Optimal**:
- Memory: 64GB
- Disk: 2TB
- CPU: 64 cores
- Network: 10Gbps

---

## Part 4: Bottleneck Analysis

### Current System (568 sites)

**Likely bottleneck**: Agent execution time
- Design: Opus agent takes 5-10 min per site
- Implementation: Opus agent takes 5-10 min per site
- Running waves one after another caps concurrency at 25 agents, leaving available parallelism unused

**Solution**: Increase wave concurrency
- Run multiple waves in parallel (requires more memory)
- Speedup: 2-4x with 8-16GB memory
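Inverting the memory calculation shows how many agents a given machine can host, and therefore the ceiling on speedup over the current 25-agent wave. A sketch assuming 125MB per agent and a hypothetical 2GB reserved for the operating system and other processes; the reserve is an illustration, not a measured value.

```python
def max_concurrent_agents(memory_gb: float, mb_per_agent: float = 125.0,
                          reserve_gb: float = 0.0) -> int:
    """Agents that fit in memory_gb after reserving reserve_gb for other processes."""
    usable_mb = (memory_gb - reserve_gb) * 1000
    return int(usable_mb // mb_per_agent)


BASELINE_AGENTS = 25  # current wave size
for memory in (8, 16):
    agents = max_concurrent_agents(memory, reserve_gb=2.0)  # 2GB reserve is hypothetical
    print(f"{memory}GB: {agents} agents, at most {agents / BASELINE_AGENTS:.1f}x the current wave")
```

This is an upper bound (about 1.9x at 8GB and 4.5x at 16GB); CPU time-sharing, discussed in Part 1, pulls the realized gain down, consistent with the 2-4x estimate.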

### Scaled System (5,000+ sites)

**Likely bottleneck**: Memory or disk I/O
- 50+ agents in memory simultaneously
- Continuous disk writes (registry, designs, outputs)
- Network (if remote)

**Solution hierarchy**:
1. **Most impactful**: Deterministic design generation (10-20x speedup)
2. **Next**: Parallel waves (2-4x speedup)
3. **Then**: Faster hardware (1.5-2x speedup)
4. **Finally**: Code optimization (1.1-1.3x speedup)

**Combined effect**: 10-50x total speedup achievable
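Individual speedups do not simply multiply, because each one shrinks only its own phase. The sketch below applies Amdahl's law (a standard technique, not one this document prescribes) to the Scenario 2 estimates. The 60x design factor (10 min to 10 s per site) and 10x implementation factor (50 to 500 concurrent agents) are derived from Part 2; the helper name is hypothetical.

```python
import math


def combined_speedup(phases: dict[str, tuple[float, float]]) -> float:
    """Amdahl-style combination: phases maps name -> (hours_before, speedup_of_that_phase)."""
    before = sum(hours for hours, _ in phases.values())
    after = sum(hours / speedup for hours, speedup in phases.values())
    return before / after


# Scenario 2 estimates for 10,000 sites
phases = {
    "design": (10_000 * 10 / 60, 60.0),   # 10 min -> 10 s per site
    "implementation": (100.0, 10.0),      # 50 -> 500 concurrent agents
}
naive = math.prod(speedup for _, speedup in phases.values())
print(f"Naive product: {naive:.0f}x; combined: {combined_speedup(phases):.0f}x")
```

The combined figure, roughly 47x, sits at the top of the 10-50x range, while the naive product (600x) overstates the gain because each factor applies to only part of the runtime.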

---

## Part 5: Cost Analysis

### Current WDMaker (568 sites)

**Infrastructure cost** (cloud example, AWS):
- t3.large instance (8GB, 2 vCPU): $0.10/hour × 9 hours = $0.90
- Storage (100GB, 9 hours): $0.02
- **Total**: < $1

**Agent cost** (Anthropic API):
- ~500 Opus calls × $0.015 = $7.50
- ~5,000 Haiku orchestration calls × $0.0001 = $0.50
- **Total**: ~$8

**Total project cost**: ~$10 (extremely cost-effective)

### Scaled to 5,000 Sites

**Infrastructure cost** (with optimization, 20 hours):
- Memory: Need 16GB
- Instance with 16GB RAM and 8 vCPU: ~$0.30/hour × 20 hours = $6
- Storage: $0.10
- **Total**: ~$6

**Agent cost**:
- ~5,000 Opus calls × $0.015 = $75
- ~50,000 Haiku calls × $0.0001 = $5
- **Total**: ~$80

**Cost per site**: $86 / 5000 = $0.017 per site
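The cost roll-up above as a function. This sketch treats the $0.015 and $0.0001 per-call figures as flat averages, which is how this section uses them (actual API billing is per token). `project_cost` is a hypothetical name.

```python
def project_cost(instance_per_hour: float, hours: float, storage: float,
                 opus_calls: int, haiku_calls: int,
                 opus_per_call: float = 0.015, haiku_per_call: float = 0.0001) -> float:
    """Total dollars: instance time + storage + flat per-call agent charges."""
    infrastructure = instance_per_hour * hours + storage
    agents = opus_calls * opus_per_call + haiku_calls * haiku_per_call
    return infrastructure + agents


total = project_cost(0.30, 20, 0.10, opus_calls=5_000, haiku_calls=50_000)
print(f"5,000 sites: ${total:.2f} total, ${total / 5_000:.3f} per site")
```

Agent calls account for about 93% of the total here, so per-call cost, not instance size, is the lever worth tuning.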

---

## Part 6: Time vs. Cost Trade-offs

### WDMaker (568 sites)

| Approach | Time | Cost | Cost/Site |
|----------|------|------|-----------|
| Sequential (1 agent) | 72 hours | $8 | $0.014 |
| Current (25 agents, 9 waves) | 9 hours | $10 | $0.018 |
| Optimized (50 agents, concurrent) | 5 hours | $12 | $0.021 |

**Insight**: Small time savings not worth additional cost

### 5,000 Sites

| Approach | Time | Cost | Cost/Site |
|----------|------|------|-----------|
| Sequential (1 agent) | 600 hours | $120 | $0.024 |
| Parallel waves (25 agents) | 150 hours | $140 | $0.028 |
| Optimized + deterministic | 20 hours | $86 | $0.017 |

**Insight**: Deterministic design is critical for scaling

---

## Part 7: Optimization ROI

### Investment vs. Return

**For deterministic design generation**:
- Development cost: 20-40 hours
- Time saved per project: 50-100 hours
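- ROI breakeven: Within the first project (20-40 hours invested vs. 50-100 hours saved)
- **Recommendation**: Implement as soon as design time is the constraint; it pays back within a single project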

**For parallel wave orchestration**:
- Development cost: 10-20 hours
- Time saved per project: 3-6 hours
- ROI breakeven: After 2-7 projects (10-20 hours ÷ 3-6 hours saved per project)
- **Recommendation**: Implement if running 3+ similar projects

**For custom optimization**:
- Development cost: 50-100 hours
- Time saved per project: 20-50 hours
- ROI breakeven: After 1-5 projects (50-100 hours ÷ 20-50 hours saved per project)
- **Recommendation**: Implement if running 5+ similar projects
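Breakeven is development hours divided by hours saved per project. The sketch below evaluates each range at both extremes (cheapest build with largest savings, costliest build with smallest savings); `breakeven_projects` is a hypothetical helper.

```python
def breakeven_projects(dev_hours: tuple[float, float],
                       saved_per_project: tuple[float, float]) -> tuple[float, float]:
    """(best case, worst case) number of projects needed to recover development time."""
    best = dev_hours[0] / saved_per_project[1]   # cheapest build, largest savings
    worst = dev_hours[1] / saved_per_project[0]  # costliest build, smallest savings
    return best, worst


estimates = {
    "Deterministic design": ((20, 40), (50, 100)),
    "Parallel wave orchestration": ((10, 20), (3, 6)),
    "Custom optimization": ((50, 100), (20, 50)),
}
for name, (dev, saved) in estimates.items():
    best, worst = breakeven_projects(dev, saved)
    print(f"{name}: {best:.1f}-{worst:.1f} projects")
```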

---

## Part 8: Scaling Strategy Recommendations

### For Next Project (1-2 similar projects planned)

**Stick with current architecture**:
- Proven reliable
- Minimal overhead
- Good enough for 1,000 sites

**Invest in**:
- Documentation (already done!)
- Operational procedures
- Team training

### For Repeated Projects (3-5+ planned)

**Optimize architecture**:
- Implement deterministic design
- Add parallel wave orchestration
- Build automated deployment

**Invest in**:
- 50-80 hours optimization development
- Infrastructure planning
- Team skill building

### For Industrial Scale (10+ projects)

**Build dedicated platform**:
- Complete automation
- Distributed processing
- Advanced optimization
- Custom agents for each phase

**Invest in**:
- 200-400 hours platform development
- Dedicated operations team
- Continuous monitoring/optimization

---

## Capacity Planning Worksheet

**For your specific project** (fill in):

```
Project Size: _____ sites
Timeline Available: _____ hours
Team Size: _____ people
Hardware Available: _____ GB RAM, _____ GB disk, _____ CPU cores

Calculations:
Design time: _____ sites × 10 min ÷ 60 = _____ hours
Implementation time: _____ sites ÷ 25 agents ÷ 2 sites/hour per agent = _____ hours
Total time: _____ hours

Does available time > calculated time?
  YES: Can proceed with current approach
  NO: Need to optimize (parallel waves, faster design, more agents)

Memory needed: _____ agents × 125MB = _____ GB
Do you have this available?
  YES: Proceed
  NO: Reduce concurrent agents or add memory

Disk needed: _____ sites × 25KB = _____ MB
Do you have this available?
  YES: Proceed
  NO: Archive previous batches or add disk

Bottleneck Analysis:
Most constrained resource: [Memory / Disk / Time / CPU]
Recommendation: [Add resource / Optimize code / Reduce scope]
```
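The worksheet can also be run as code. This sketch encodes the same time, memory and disk checks, assuming 2 sites per agent-hour (the 30-minute wave average), 125MB per agent and 25KB per site. `Plan` and `check` are hypothetical names, and the 1,000-site inputs are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    sites: int
    hours_available: float
    ram_gb: float
    disk_gb: float
    agents: int = 25
    design_min_per_site: float = 10.0
    sites_per_agent_hour: float = 2.0
    mb_per_agent: float = 125.0
    kb_per_site: float = 25.0


def check(plan: Plan) -> dict[str, bool]:
    """Apply the worksheet checks; True means the resource is sufficient."""
    design_h = plan.sites * plan.design_min_per_site / 60
    impl_h = plan.sites / plan.agents / plan.sites_per_agent_hour
    memory_gb = plan.agents * plan.mb_per_agent / 1000
    disk_gb = plan.sites * plan.kb_per_site / 1_000_000
    return {
        "time": design_h + impl_h <= plan.hours_available,
        "memory": memory_gb <= plan.ram_gb,
        "disk": disk_gb <= plan.disk_gb,
    }


result = check(Plan(sites=1_000, hours_available=120, ram_gb=8, disk_gb=100))
print(result)
print("Constrained:", [name for name, ok in result.items() if not ok] or "none")
```

With LLM design at 10 minutes per site, time is the first constraint to fail, which matches the bottleneck analysis in Part 4.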

---

## Conclusion

**Key insights**:
1. **WDMaker is highly efficient** - Only ~$10 total cost for 568 sites
2. **Current architecture scales** - Can handle up to 5,000 sites with optimization
3. **Deterministic design is key** - Biggest speedup lever for scaling
4. **Hardware is rarely the bottleneck** - Modern systems have plenty of capacity
5. **Parallelism matters** - 4-10x speedup possible with optimization

---

*Capacity Planning and Scaling Analysis: 2026-03-24*
*Purpose: Understanding resource needs and scaling options*
*Status: Ready for capacity planning decisions*
