# WDMaker Performance Optimization Guide

**Purpose**: Strategies to achieve 3-10x speedup for batch 001 and future projects
**Scope**: Design phase, implementation phase, finalization, and system tuning
**Difficulty**: Advanced (requires architectural understanding)

---

## Executive Summary: Performance Levers

| Lever | Current | Optimized | Speedup | Effort | Impact |
|-------|---------|-----------|---------|--------|--------|
| Agents per wave | 25 | 50 | 1.5x | Low | Immediate |
| Wave parallelism | Sequential | 3 concurrent | 2x | Medium | High |
| Design phase | Opus agents | Deterministic | 3-5x | High | Design bottleneck |
| Implementation parallel | Wave-based | Batch-based | 2-3x | Medium | High |
| Verification timing | Post-generation | Concurrent | 1.5x | Medium | Medium |
| Resource pooling | Per-wave | Shared | 1.2x | Low | Low |
| **Combined Potential** | **1x baseline** | **Optimized** | **~10x** | - | **Massive** |

---

## Optimization 1: Increase Agents Per Wave

### Current Configuration
- 25 agents per wave
- Total waves: 9
- Total agents: 225
- Time per wave: ~8 minutes
- Bottleneck: Agent initialization and coordination

### Optimization: 50 Agents Per Wave

```bash
# Modify wave deployment
tools/implement/mimplement-bg.sh --batch 001 --max-agents 50

# Expected results
# - 50 agents per wave instead of 25 (2x concurrent resource use)
# - Waves needed: ~5 instead of 9 (225 total agents unchanged)
# - Time per wave: ~4 minutes (not linear - some coordination overhead remains)
# - Overall speedup: 1.5-2x
```

### Benefits
- ✅ 50% more parallelism
- ✅ Same infrastructure
- ✅ Immediate implementation
- ✅ Low risk

### Trade-offs
- ⚠️ Increased memory usage (~2x)
- ⚠️ Higher network I/O during registry updates
- ⚠️ May hit system resource limits
- ⚠️ Requires testing before large deployment

### Implementation Steps

1. **Verify System Capacity**
```bash
# Check current resource limits
free -h  # Need 2x RAM available
df -h .smbatcher/  # Need sustained write capacity
```

2. **Adjust Wave Configuration**
```bash
# Modify script or command
# Change MAX_AGENTS from 25 to 50
# Update deployment script
```

3. **Test with Small Batch**
```bash
# Deploy 50 agents on single wave first
# Monitor resource usage
# Verify all agents complete successfully
```

4. **Full Rollout**
```bash
# Deploy remaining waves with 50 agents each
# Monitor system health continuously
```
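Step 1's manual checks can also be scripted. A minimal pre-flight sketch (the ~100 MB-per-agent figure comes from the resource analysis in Optimization 2; the 1 GB disk floor and function names are assumptions):

```python
# Sketch: pre-flight capacity check before doubling agents per wave.
# Thresholds here are illustrative assumptions, not measured requirements.
import os
import shutil

AGENT_RAM_BYTES = 100 * 1024 * 1024  # ~100 MB per agent (assumed)


def physical_ram_bytes():
    """Total physical RAM via POSIX sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def can_run(agents, workdir="."):
    """Rough check: enough RAM for the agents plus some free disk headroom."""
    ram_needed = agents * AGENT_RAM_BYTES
    disk_free = shutil.disk_usage(workdir).free
    return physical_ram_bytes() >= ram_needed and disk_free > 1 * 1024**3


print(can_run(50))
```

This only gates on totals; sustained write bandwidth still needs to be observed during the small-batch test in Step 3.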

### Expected Performance
- **Speedup**: 1.5-2x overall
- **Resource**: 2x memory, similar disk/network
- **Risk**: Low (can revert to 25 if needed)

---

## Optimization 2: Run Waves in Parallel

### Current Configuration
- Wave 1 starts, waits for completion
- Wave 2 starts after Wave 1 completes
- Total execution time: 9 waves × 8 min = 72 minutes

### Optimization: 3 Concurrent Waves

```bash
# Deploy Wave 1, 2, 3 simultaneously
tools/implement/mimplement-bg.sh --batch 001 --wave 1 --max-agents 25 &
tools/implement/mimplement-bg.sh --batch 001 --wave 2 --max-agents 25 &
tools/implement/mimplement-bg.sh --batch 001 --wave 3 --max-agents 25 &

# Expected: All 75 agents running concurrently
# 3 independent waves = 3x parallelism
# Time: 72 min → 24 min for waves 1-9
```

### Benefits
- ✅ 3x speedup for execution
- ✅ Better resource utilization
- ✅ Fault isolation (one wave failure doesn't stop others)

### Trade-offs
- ⚠️ 3x resource usage (75 concurrent agents)
- ⚠️ Complex orchestration
- ⚠️ Registry contention (3 waves updating simultaneously)
- ⚠️ Debugging harder with concurrent waves

### Implementation Complexity

**Simple**: Launch all waves immediately
```bash
# Deploy each wave right away instead of waiting for the previous one
# The OS scheduler handles concurrency
# Risk: Lower - closest to the proven sequential approach
```

**Advanced**: Explicit orchestration
```bash
# Track wave completion via monitoring
# Start next wave only when ready
# Better resource control
```
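The advanced variant reduces to a small orchestrator: launch waves in groups, cap concurrency at three, and wait for each group before starting the next. A minimal sketch (the actual `mimplement-bg.sh` invocation shown in the comment is assumed from the commands above):

```python
# Sketch: run wave commands in groups of at most `max_concurrent`.
import subprocess


def run_waves(commands, max_concurrent=3):
    """Launch up to max_concurrent waves at once; return the failure count."""
    failures = 0
    for i in range(0, len(commands), max_concurrent):
        group = [subprocess.Popen(cmd) for cmd in commands[i:i + max_concurrent]]
        # Block until every wave in this group exits before starting the next
        failures += sum(p.wait() != 0 for p in group)
    return failures


# e.g. run_waves([["tools/implement/mimplement-bg.sh", "--batch", "001",
#                  "--wave", str(w), "--max-agents", "25"] for w in range(1, 10)])
```

Group-at-a-time waiting trades a little throughput for much simpler failure handling than a rolling slot pool.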

### Resource Analysis

**With 3 Concurrent Waves** (75 agents):
- RAM needed: ~7.5GB (100MB × 75)
- Disk bandwidth: High sustained writes
- Network: Moderate (registry reads/writes)
- CPU: Depends on implementation complexity

**Recommendation**: Measure real resource usage while Waves 1-3 run concurrently, then extrapolate before scaling further

### Expected Performance
- **Speedup**: 3x for execution
- **Resource**: 3x agents, sustained
- **Risk**: Medium (untested configuration)
- **Complexity**: Medium (requires orchestration)

---

## Optimization 3: Design Phase Acceleration

### Current Configuration
- Each domain: Opus agent generates unique DESIGN.md
- Per-site design time: 5-10 minutes
- Total design phase: ~3 hours for 568 sites
- Bottleneck: AI-driven design decision making

### Optimization: Deterministic Design Generation

**Problem Analysis**:
- Most sites need same color palette (corporate brand)
- Most sites need same typography (brand guidelines)
- Most sites need same layout framework
- Only varying: Domain-specific content/images

**Solution**: Generate design programmatically

```python
# Sketch: deterministic design generation
# (lookup tables and helper functions defined elsewhere)
def generate_design(domain_name, category):
    # 1. Color palette from brand guidelines (fall back to default)
    colors = BRAND_COLORS.get(category, BRAND_COLORS["default"])

    # 2. Typography from guidelines
    fonts = BRAND_FONTS.get(category, BRAND_FONTS["default"])

    # 3. Layout from templates
    layout = LAYOUT_TEMPLATES.get(category, LAYOUT_TEMPLATES["default"])

    # 4. Domain-specific customization
    if is_corporate(domain_name):
        colors, fonts, layout = apply_corporate_theme(colors, fonts, layout)
    elif is_creative(domain_name):
        colors, fonts, layout = apply_creative_theme(colors, fonts, layout)

    # 5. Generate DESIGN.md with all specs
    return format_design_md(colors, fonts, layout)
```
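A self-contained runnable version of the sketch, with placeholder brand tables (all values below are illustrative, not real WDMaker guidelines):

```python
# Placeholder brand tables - illustrative values only
BRAND_COLORS = {"corporate": "#003366", "creative": "#FF5A5F", "default": "#222222"}
BRAND_FONTS = {"corporate": "Helvetica, Arial", "creative": "Playfair Display, serif",
               "default": "Georgia, serif"}
LAYOUT_TEMPLATES = {"corporate": "formal-grid", "creative": "asymmetric",
                    "default": "single-column"}


def generate_design(domain_name, category):
    """Deterministically render a DESIGN.md body from the lookup tables."""
    colors = BRAND_COLORS.get(category, BRAND_COLORS["default"])
    fonts = BRAND_FONTS.get(category, BRAND_FONTS["default"])
    layout = LAYOUT_TEMPLATES.get(category, LAYOUT_TEMPLATES["default"])
    return "\n".join([
        f"# Design: {domain_name}",
        f"- Palette: {colors}",
        f"- Fonts: {fonts}",
        f"- Layout: {layout}",
    ])


print(generate_design("acme.example", "corporate"))
```

Because the output depends only on the inputs and the tables, the same domain always produces the same DESIGN.md, which is what makes the result reproducible.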

### Benefits
- ✅ 5-10x speedup (from 5-10 min to 30-60 seconds)
- ✅ Consistent design across catalog
- ✅ Eliminates AI latency
- ✅ Deterministic (reproducible)

### Trade-offs
- ⚠️ Less creative variation per site
- ⚠️ Requires upfront design guideline work
- ⚠️ Implementation effort (1-2 hours)
- ⚠️ Design less "personalized"

### Implementation Steps

1. **Define Design Rules**
```
IF domain_type == "corporate":
  - Color palette: Corporate Blue
  - Fonts: Helvetica/Arial
  - Layout: Formal grid
  - Typography: Business

IF domain_type == "creative":
  - Color palette: Vibrant Mix
  - Fonts: Display/Serif
  - Layout: Asymmetric
  - Typography: Expressive
```

2. **Implement Generation Function**
```python
# Create deterministic design generator
# Input: Domain name, category
# Output: Valid DESIGN.md file
```

3. **Test on Sample Domains**
```bash
# Generate DESIGN.md for 10 test domains
# Verify output format and quality
# Get stakeholder approval
```

4. **Integrate into Workflow**
```bash
# Replace Opus design agent with script
# Use for batch 010 onwards
# Rerun batch 001 design if desired
```

### Expected Performance
- **Design Phase**: 3 hours → 15 minutes
- **Total Speedup**: ~15-20% for full project
- **Implementation**: 2-4 hours
- **Risk**: Low (design stays valid)

---

## Optimization 4: Parallel Implementation Batches

### Current Configuration
- Batch 001 processed sequentially via waves
- Waves don't start next batch until previous finishes
- Sequential: Batch 001 → Finalize → Batch 010

### Optimization: Process Batches in Parallel

**Strategy**: When batch 001 at 90% complete, start batch 010 design

```bash
# Timeline with parallel processing
14:30: Wave 1 starts batch 001 implementation
14:35: Wave 2 starts batch 001 implementation
14:40: Wave 3 starts batch 001 implementation
14:45: Wave 4 starts batch 001 implementation
...
18:00: Batch 001 approaches 90% completion
18:05: START batch 010 design (doesn't wait for finalization)
18:15: Batch 001 at 100%, begin finalization
18:20: Finalization complete, batch 010 design likely done
18:25: START batch 010 implementation
18:35: All work overlapping
```
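The 90% trigger in the timeline reduces to a small threshold check that the orchestrator can poll. A sketch (function and action names are hypothetical):

```python
# Sketch: decide pipeline actions from the current batch's completion ratio.
def next_actions(completed, total, threshold=0.9):
    """Return the pipeline actions warranted at this completion level."""
    actions = []
    ratio = completed / total
    if ratio >= threshold:
        # Next batch's design phase can start without waiting for finalization
        actions.append("start_next_batch_design")
    if ratio >= 1.0:
        actions.append("finalize_current_batch")
    return actions
```

Keeping the trigger a pure function of registry counts makes the overlap easy to test and to disable if batch coordination becomes a problem.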

### Benefits
- ✅ Reduces overall project time (parallelism)
- ✅ Pipelines work across batches
- ✅ Better resource utilization

### Trade-offs
- ⚠️ Complex coordination
- ⚠️ Higher resource contention
- ⚠️ Debugging harder with overlapping batches
- ⚠️ Risk: Batch 001 finalization could affect batch 010

### Expected Performance
- **Speedup**: 20-30% (overlap design/implement)
- **Complexity**: High
- **Risk**: Medium
- **Recommended**: Only after mastering sequential approach

---

## Optimization 5: Concurrent Verification

### Current Configuration
- Generate files → Run ALL verifications → Mark complete
- Verification happens serially (all checks, then mark)
- Potential bottleneck: Compliance checking

### Optimization: Progressive Verification

**Strategy**: Start verification as soon as each file is generated

```
# Current: Sequential
Generate HTML
├─ Generate CSS
├─ Generate JS
└─ Run all verifications
    └─ Mark complete

# Optimized: Concurrent
Generate HTML → Start HTML verification
Generate CSS → Start CSS verification
Generate JS → Start JS verification
            └─ All complete → Mark complete
```
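The concurrent flow can be sketched with a thread pool: each artifact's verifier is submitted as soon as that artifact exists, and the site is marked complete only when every check passes. The verifier functions below are stand-ins, not the real compliance checks:

```python
# Sketch: overlap verification with generation using a thread pool.
from concurrent.futures import ThreadPoolExecutor


def verify_site(artifacts, verifiers):
    """Submit each verifier as its artifact arrives; join all at the end."""
    with ThreadPoolExecutor() as pool:
        futures = []
        for name, content in artifacts:  # artifacts appear in generation order
            futures.append(pool.submit(verifiers[name], content))
        # Mark complete only if every check passed
        return all(f.result() for f in futures)


# Stand-in checks - placeholders for the real compliance suite
checks = {
    "html": lambda c: c.startswith("<!DOCTYPE"),
    "css": lambda c: "{" in c,
    "js": lambda c: ";" in c or c.strip() == "",
}
```

A failed future surfaces when its result is collected, so a bad artifact fails the site without blocking the other checks from finishing.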

### Benefits
- ✅ 20-30% faster verification
- ✅ Overlaps I/O with computation
- ✅ Better resource utilization

### Trade-offs
- ⚠️ More complex implementation
- ⚠️ Harder to debug partial states
- ⚠️ Must handle failed verification mid-process

### Expected Performance
- **Verification Speedup**: 20-30%
- **Overall Speedup**: 5-10%
- **Complexity**: Medium
- **Risk**: Low (isolated to verification phase)

---

## Optimization 6: Registry Write Optimization

### Current Configuration
- Every status update: Read → Modify → Write full registry
- Full file write for each site completion
- Network I/O can bottleneck at scale

### Optimization: Batch Registry Updates

**Strategy**: Queue 5-10 status updates, write once

```bash
# Current: 517 registry writes (one per site)
# Optimized: ~60 registry writes (batch 5-10 sites)

# Benefits:
# - 8-10x fewer write operations
# - 8-10x fewer disk I/O
# - 20-30% faster overall
```

### Implementation
- Modify complete.sh to queue updates
- Write every 5 sites or every 30 seconds
- Guarantee atomicity even with batching
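A sketch of the queue-and-flush idea (the JSON registry format and class name are assumptions; the real complete.sh logic may differ). Atomicity is preserved by writing to a temp file and renaming it over the registry:

```python
# Sketch: queue per-site status updates, flush every N updates or T seconds.
import json
import os
import tempfile
import time


class BatchedRegistry:
    def __init__(self, path, batch_size=5, max_age=30.0):
        self.path, self.batch_size, self.max_age = path, batch_size, max_age
        self.pending, self.last_flush = {}, time.monotonic()

    def update(self, site, status):
        self.pending[site] = status
        if (len(self.pending) >= self.batch_size
                or time.monotonic() - self.last_flush >= self.max_age):
            self.flush()

    def flush(self):
        if not self.pending:
            return
        registry = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                registry = json.load(f)
        registry.update(self.pending)
        # Atomic replace: write to a temp file, then rename over the original
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(registry, f)
        os.replace(tmp, self.path)
        self.pending.clear()
        self.last_flush = time.monotonic()
```

The trade-off named above is visible here: up to `batch_size - 1` updates (or `max_age` seconds of them) can be lost on a crash, so the flush triggers must be tuned against recovery requirements.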

### Expected Performance
- **Registry Write Speedup**: 8-10x
- **Overall Speedup**: 10-15%
- **Complexity**: High
- **Risk**: Medium (atomicity guarantee harder)

---

## Optimization 7: Hardware Upgrades

### If Software Optimization Not Sufficient

| Component | Upgrade | Speedup | Cost | Complexity |
|-----------|---------|---------|------|-------------|
| CPU (8 → 16 cores) | More cores | 1.2x | High | Low |
| RAM (16GB → 64GB) | More memory | 1.5x | Medium | Low |
| Storage (HDD → SSD) | Faster disk | 2-3x | Medium | Low |
| Network | Gigabit connection | 2x | Low | Low |

### Most Impactful
1. **SSD upgrade**: 2-3x disk I/O improvement
2. **More RAM**: Eliminates swapping (10-50x if swapping)
3. **Better CPU**: Modest improvement (1.2-1.5x)
4. **Network**: If remote (significant)

---

## Combined Optimization Roadmap

### Phase 1: Low-Risk, High-Impact (Today)
1. **Increase agents/wave**: 25 → 50
   - Speedup: 1.5x
   - Risk: Low
   - Effort: 30 minutes

2. **Test wave parallelism**: Run 3 waves concurrently
   - Speedup: 2x
   - Risk: Medium
   - Effort: 1 hour

**Expected Phase 1 Speedup**: 3x overall

### Phase 2: Medium-Risk, High-Effort (Next Sprint)
1. **Deterministic design generation**
   - Speedup: 5x for design phase
   - Risk: Low
   - Effort: 4 hours

2. **Batch registry writes**
   - Speedup: 1.15x overall
   - Risk: Medium
   - Effort: 3 hours

**Expected Phase 2 Speedup**: 2x additional (6x total from baseline)

### Phase 3: Architectural (Long-term)
1. **Distributed processing** (multiple machines)
2. **GPU acceleration** for design/verification
3. **Machine learning** optimization

**Expected Phase 3 Speedup**: 2-5x additional

---

## Benchmarking Framework

### How to Measure Improvement

```bash
#!/bin/bash
# benchmark.sh - Measure actual speedup

TIMESTAMP=$(date +%s)
START_I=$(tools/shared/list-sites.sh --batch 001 --status "I" | wc -l)

echo "Start I-status: $START_I sites"
echo "Timestamp: $TIMESTAMP"
echo "Starting 30-min observation..."

# Wait 30 minutes
sleep 1800

END_I=$(tools/shared/list-sites.sh --batch 001 --status "I" | wc -l)
COMPLETED=$((END_I - START_I))
# awk for fractional throughput - shell $(( )) arithmetic is integer-only
SITES_PER_MIN=$(awk "BEGIN {printf \"%.1f\", $COMPLETED / 30}")

echo "End I-status: $END_I sites"
echo "Sites completed in 30 min: $COMPLETED"
echo "Throughput: $SITES_PER_MIN sites/minute"

# Compare to baseline
# Baseline: 2 sites/minute (50-60 per 30 min)
# Target after optimization: 4-6 sites/minute
```

### Baseline Metrics (From Current System)

**Design Phase**:
- Time per site: 5-10 minutes
- Total for 568 sites: ~3 hours
- Bottleneck: Opus agent decisions

**Implementation Phase**:
- Time per site: 5-10 minutes
- Throughput: 1-2 sites/minute during wave execution
- Bottleneck: File generation + verification
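These baseline numbers make the projections easy to sanity-check. For example, projecting the implementation phase's wall-clock time from measured throughput:

```python
# Worked example: implementation-phase duration from throughput.
def projected_minutes(sites, sites_per_minute):
    return sites / sites_per_minute


# 568 sites at the baseline ~2 sites/minute vs. the optimized 4-6 target
baseline = projected_minutes(568, 2)   # 284 minutes (~4.7 hours)
optimized = projected_minutes(568, 6)  # ~95 minutes
```

The same two-line calculation applies after each optimization: re-measure throughput with benchmark.sh, then recompute the projection rather than trusting the nominal speedup factors.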

**Finalization Phase**:
- Time: <5 minutes (atomic operation)
- Bottleneck: None (fast)

---

## Recommended Optimization Path for Next Project

1. **Start**: 50 agents per wave (1.5x speedup, low risk)
2. **Implement**: Deterministic design (5x design speedup)
3. **Monitor**: Wave parallelism (2x speedup)
4. **Evaluate**: Registry batching (1.15x speedup)
5. **Plan**: Distributed processing (10x speedup)

**Expected Result**: ~10x speedup from baseline after full single-machine optimization; 20-50x is plausible only once distributed processing is added

---

## Validation Checklist

For any optimization, verify:
- [ ] Throughput increased without quality loss
- [ ] Registry integrity maintained
- [ ] No race conditions introduced
- [ ] Error handling still robust
- [ ] Recovery procedures still work
- [ ] System stability maintained
- [ ] Resource usage acceptable

---

*Performance Optimization Guide: 2026-03-24*
*Purpose: Strategies for 3-10x speedup*
*Scope: Design, implementation, finalization phases*
*Risk Assessment: Low to medium (depending on optimization)*

