# WDMaker Execution Decision Trees

**Purpose**: Visual decision logic for all critical execution points
**Audience**: Operators, decision-makers, troubleshooters
**Status**: Ready for autonomous execution with human oversight

---

## Decision Tree 1: "Should I Execute Batch 001 Finalization Now?"

```
START: You want to finalize batch 001 now
│
├─ QUESTION: Have I confirmed that all 517 sites are at I status?
│  ├─ NO → WAIT
│  │       Action: Check again in 30 minutes
│  │       Command: tools/shared/list-sites.sh --batch 001 --status "I" | wc -l
│  │       Next: Come back to this tree
│  │
│  └─ YES → PROCEED
│           Continue to next question
│
├─ QUESTION: Do I understand finalization is idempotent?
│  ├─ NO → READ
│  │       Document: FINALIZATION_EXECUTION_GUIDE.md (section: Idempotency)
│  │       Time: 5 minutes
│  │       Then return to this tree
│  │
│  └─ YES → PROCEED
│           Continue to next question
│
├─ QUESTION: Have I reviewed the pre-finalization checks?
│  ├─ NO → READ
│  │       Document: FINALIZATION_EXECUTION_GUIDE.md (section: Pre-Finalization)
│  │       Time: 10 minutes
│  │       Then return to this tree
│  │
│  └─ YES → PROCEED
│           Continue to next question
│
├─ QUESTION: Is the system stable (no obvious problems)?
│  ├─ NO → DIAGNOSE
│  │       Use: COMPREHENSIVE_TROUBLESHOOTING_MATRIX.md
│  │       Or: EMERGENCY_RESPONSE_GUIDE.md
│  │       Time: 15-30 minutes
│  │       Then return to this tree
│  │
│  └─ YES → READY TO EXECUTE
│           Go to Section: Finalization Execution Steps
│           Expected Duration: 5 minutes
│           Expected Outcome: All 517 sites at Q status

RESULT: Execute finalization → Proceed to Batch 010 (Decision Tree 3)
```
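
Decision Tree 1 is a sequence of gates checked in a fixed order, stopping at the first one that fails. The Python sketch below mirrors that ordering. The `Readiness` record, the `finalization_gate` function, and the return labels are hypothetical names; the I-status count is assumed to come from the `tools/shared/list-sites.sh --batch 001 --status "I" | wc -l` check.

```python
from dataclasses import dataclass

EXPECTED_SITES = 517  # Batch 001 size, from Decision Tree 1


@dataclass
class Readiness:
    """Answers to the four questions in Decision Tree 1 (hypothetical structure)."""
    i_status_count: int
    understands_idempotency: bool
    reviewed_pre_checks: bool
    system_stable: bool


def finalization_gate(r: Readiness) -> str:
    """Return the first blocking action, or EXECUTE when every gate passes."""
    if r.i_status_count != EXPECTED_SITES:
        return "WAIT"  # recheck I-status in 30 minutes
    if not r.understands_idempotency:
        return "READ_IDEMPOTENCY"
    if not r.reviewed_pre_checks:
        return "READ_PRE_FINALIZATION"
    if not r.system_stable:
        return "DIAGNOSE"
    return "EXECUTE"


if __name__ == "__main__":
    print(finalization_gate(Readiness(510, True, True, True)))
    print(finalization_gate(Readiness(517, True, True, True)))
```

Checking the gates in the tree's order means an operator who is still waiting on I-status is never sent to read documentation first.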

---

## Decision Tree 2: "Batch 001 Finalization Failed - What Now?"

```
START: finish.sh returned an error
│
├─ QUESTION: What type of error occurred?
│  │
│  ├─ ERROR TYPE: "Registry not found" or "Permission denied"
│  │  Action:
│  │  1. Check: ls -l .smbatcher/REGISTRY.md
│  │  2. Check: ls -l .smbatcher/batches/Batch_001.md
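│  │  3. If missing: Restore the last committed copy from git
│  │     git checkout HEAD -- .smbatcher/REGISTRY.md
│  │     (substitute the Batch_001.md path if that file is the missing one)
│  │  4. If "Permission denied": Make both files writable by your user
│  │  5. Retry finalization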
│  │  Resolution: Low risk, safe to retry
│  │
│  ├─ ERROR TYPE: "Sites with status != I" or "Transition failed"
│  │  Meaning: Some sites not ready for finalization
│  │  Action:
│  │  1. Check: tools/shared/list-sites.sh --batch 001 | grep -v "| I |"
│  │  2. Identify status distribution
│  │  3. If only a few sites != I:
│  │     - Wait 5 more minutes for autonomous completion
│  │     - Recheck I-status
│  │     - Retry finalization
│  │  4. If many sites != I:
│  │     - Escalate to EMERGENCY_RESPONSE_GUIDE.md
│  │     - Check system resource status
│  │  Resolution: Medium risk, may need wave redeployment
│  │
│  ├─ ERROR TYPE: "Cannot write to registry" or "Atomic operation failed"
│  │  Meaning: Concurrent access issue or disk problem
│  │  Action:
│  │  1. Check: df -h .smbatcher/
│  │  2. Check: ls -la .smbatcher/ (for lock files)
│  │  3. If lock file exists:
│  │     - Wait 30 seconds
│  │     - Confirm no other finish.sh process is still running:
│  │       ps aux | grep "[f]inish.sh"
│  │     - Remove the stale lock: rm -f .smbatcher/.lock
│  │     - Retry finalization
│  │  4. If disk space low:
│  │     - Check: du -sh .smbatcher/tmp/
│  │     - Clear temporary files
│  │     - Then retry
│  │  Resolution: Medium risk, usually recoverable
│  │
│  └─ ERROR TYPE: Unknown or system error
│     Action:
│     1. Consult EMERGENCY_RESPONSE_GUIDE.md (Scenario 3)
│     2. Run: tools/check/status-report.sh
│     3. Document error message completely
│     4. Attempt retry (finalization is idempotent)
│     Resolution: Follow guide, escalate if needed
│
├─ RETRY DECISION: Should I retry finalization?
│  ├─ YES, error seems transient →
│  │  Action: tools/implement/finish.sh --batch 001 --root .
│  │  Expected: Success on retry
│  │  If fails again: Move to ESCALATION
│  │
│  └─ NO, error seems systemic →
│     ESCALATION REQUIRED
│     Action: Read EMERGENCY_RESPONSE_GUIDE.md completely
│     Then: Implement recommended recovery procedure

RESULT: Either finalization succeeds OR escalation path identified
```
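
Because finalization is idempotent, the retry branch can be automated safely: classify the error text, retry once, and escalate on a repeat failure. The sketch below assumes the error messages contain the substrings quoted in the tree; the category names, `classify_error`, `finalize_with_retry`, and the `finalize` callable are all hypothetical. It retries every category once, leaving the "transient vs. systemic" judgment to the operator as the tree does.

```python
from typing import Callable, Tuple

# Substrings quoted in Decision Tree 2, mapped to hypothetical category names.
ERROR_PATTERNS = {
    "registry not found": "REGISTRY",
    "permission denied": "REGISTRY",
    "sites with status != i": "SITES_NOT_READY",
    "transition failed": "SITES_NOT_READY",
    "cannot write to registry": "WRITE_FAILURE",
    "atomic operation failed": "WRITE_FAILURE",
}


def classify_error(message: str) -> str:
    """Map a finish.sh error message to one of the tree's error types."""
    lowered = message.lower()
    for pattern, category in ERROR_PATTERNS.items():
        if pattern in lowered:
            return category
    return "UNKNOWN"


def finalize_with_retry(finalize: Callable[[], Tuple[bool, str]],
                        max_attempts: int = 2) -> str:
    """Run an idempotent finalization step, retrying before escalating.

    `finalize` returns (succeeded, error_message). Repeating the call is
    only safe because finalization is idempotent.
    """
    last_error = ""
    for _ in range(max_attempts):
        ok, last_error = finalize()
        if ok:
            return "SUCCESS"
    return "ESCALATE:" + classify_error(last_error)
```

One way to wire this up, assuming finish.sh signals failure through its exit code and stderr, is a `finalize` that calls `subprocess.run([...], capture_output=True, text=True)` and returns `(result.returncode == 0, result.stderr)`.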

---

## Decision Tree 3: "Ready to Start Batch 010? Check These First"

```
START: Batch 001 successfully finalized (all 517 at Q status)
│
├─ CONFIRM: Batch 001 finalization is complete
│  Check: tools/shared/list-sites.sh --batch 001 --status "Q" | wc -l
│  Expected: 517
│  ├─ NOT 517 → ERROR
│  │           Something went wrong with finalization
│  │           Go back to Decision Tree 2 (Finalization Failed)
│  │
│  └─ IS 517 → PROCEED
│
├─ CONFIRM: The Batch 010 site is still unassigned
│  Check: tools/shared/list-sites.sh --status "-" | grep "20241204"
│  Expected: 1 site (20241204.com)
│  ├─ NOT FOUND → ERROR
│  │              Site may already be assigned
│  │              Check: tools/shared/list-sites.sh | grep "20241204"
│  │              If already processed: Skip batch 010, go to Decision Tree 5 (Project Completion)
│  │
│  └─ FOUND → PROCEED
│
├─ ESTIMATE: Time available for batch 010
│  Expected duration: 30-60 minutes
│  Question: Do you have time now?
│  ├─ YES → PROCEED to batch 010 execution
│  │        Follow: BATCH_010_DETAILED_WORKFLOW.md
│  │
│  └─ NO → SCHEDULE for later
│          Note which checks in this tree have already passed
│          Come back when ready
│
├─ CONFIRM: System is stable and ready
│  Check: tools/check/status-report.sh
│  Expected: No errors, disk space available (>1GB), memory available (>4GB)
│  ├─ PROBLEMS FOUND → DIAGNOSE
│  │                   Use: COMPREHENSIVE_TROUBLESHOOTING_MATRIX.md
│  │                   Fix issues first
│  │                   Then come back to this tree
│  │
│  └─ STABLE → READY TO PROCEED
│              Follow: BATCH_010_DETAILED_WORKFLOW.md
│              Expected: 20241204.com processed in 30-60 minutes
│              Expected outcome: Site at Q status

RESULT: Either Batch 010 execution ready OR system issues identified
```
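
Tree 3's checks are independent preconditions, so one way to run them is as a single pre-flight report that lists every failure at once. This sketch assumes the Q count, the unassigned-site list, and free memory are gathered by hand from `list-sites.sh` and `status-report.sh`; only the disk figure is read directly, via Python's standard `shutil.disk_usage`. Function names are hypothetical, the time-availability question is left to the operator, and GB is assumed to mean 1024**3 bytes.

```python
import shutil
from typing import List

GB = 1024 ** 3  # assumption: the tree's GB figures are binary gigabytes


def free_disk_bytes(path: str = ".") -> int:
    """Free space on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def batch_010_preflight(q_count_batch_001: int,
                        unassigned_sites: List[str],
                        disk_free: int,
                        mem_free: int,
                        target: str = "20241204.com") -> List[str]:
    """Return the failed checks; an empty list means READY TO PROCEED."""
    problems = []
    if q_count_batch_001 != 517:
        problems.append("batch 001 not fully at Q: see Decision Tree 2")
    if target not in unassigned_sites:
        problems.append(f"{target} not unassigned: check whether already processed")
    if disk_free <= 1 * GB:
        problems.append("disk space not above 1GB")
    if mem_free <= 4 * GB:
        problems.append("memory not above 4GB")
    return problems


if __name__ == "__main__":
    # Hypothetical inputs; replace with figures from list-sites.sh / status-report.sh.
    report = batch_010_preflight(517, ["20241204.com"], free_disk_bytes("."), 8 * GB)
    print(report or "READY TO PROCEED")
```

Collecting all failures, rather than stopping at the first, lets the operator fix every problem before returning to the tree.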

---

## Decision Tree 4: "How to Handle Stuck Sites During Batch 001"

```
START: Some sites appear stuck (status not changing for 45+ minutes)
│
├─ DIAGNOSIS: Identify which sites are stuck
│  Check: tools/shared/list-sites.sh --batch 001 --status "i"
│  Question: Are any sites at i (in-progress)?
│  ├─ NO (all at O or I) → Progress is normal
│  │                       Continue monitoring
│  │                       Check again in 30 minutes
│  │
│  └─ YES (sites at i) → Potential issue
│                       Continue to next step
│
├─ INVESTIGATION: Are agents still running?
│  Check: ps aux | grep -i "[o]pus"  (or check agent dashboard)
│  ├─ NO AGENTS RUNNING → PROBLEM
│  │                       Waves may have terminated prematurely
│  │                       Action: Consult EMERGENCY_RESPONSE_GUIDE.md (Scenario 1)
│  │                       Possible solution: Redeploy waves 4-9
│  │
│  └─ AGENTS RUNNING → Continue investigation
│
├─ CHECK: Are these specific sites or all sites?
│  ├─ SPECIFIC SITES (< 5 stuck)
│  │  Action:
│  │  1. Check site-specific logs (if available)
│  │  2. See if files were generated despite status not updating
│  │  3. If the files exist, update the status manually:
│  │     - Mark the site as I
│  │     - Record the registry update via complete.sh
│  │
│  └─ ALL/MANY SITES (> 10 stuck)
│     Action:
│     1. Consult COMPREHENSIVE_TROUBLESHOOTING_MATRIX.md
│     2. Check system resources (CPU, memory, disk)
│     3. Possible issue: Resource exhaustion
│     4. Solution: Wait for resource recovery OR restart agents
│
├─ DECISION: Wait or intervene?
│  ├─ I'LL WAIT
│  │  Reason: Agents still active, might complete soon
│  │  Action: Check again in 30 minutes
│  │  Condition: Only if < 5% of batch stuck (25 or fewer of 517 sites)
│  │
│  └─ I'LL INTERVENE
│     Reason: Too many sites stuck, system problem suspected
│     Action: Follow EMERGENCY_RESPONSE_GUIDE.md recovery procedure
│     Condition: If > 10% stuck (52 or more of 517 sites) OR > 45 minutes with no progress

RESULT: Either wait for autonomous completion OR invoke recovery
```
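
The wait-or-intervene rule reduces to a few comparisons. The sketch below encodes the tree's stated thresholds (5%, 10%, 45 minutes) under hypothetical names. Because the tree gives no rule for a stuck share between 5% and 10%, the sketch returns an explicit `OPERATOR_JUDGMENT` label there instead of guessing.

```python
def stuck_site_decision(stuck: int,
                        batch_size: int,
                        agents_running: bool,
                        minutes_without_progress: float) -> str:
    """Encode the diagnosis and wait-or-intervene branches of Decision Tree 4."""
    if stuck == 0:
        return "CONTINUE_MONITORING"   # no sites at i: progress is normal
    if not agents_running:
        return "EMERGENCY_SCENARIO_1"  # waves may have terminated prematurely
    fraction = stuck / batch_size
    if fraction > 0.10 or minutes_without_progress > 45:
        return "INTERVENE"             # follow EMERGENCY_RESPONSE_GUIDE.md
    if fraction < 0.05:
        return "WAIT"                  # check again in 30 minutes
    return "OPERATOR_JUDGMENT"         # 5-10% band: the tree sets no rule


if __name__ == "__main__":
    print(stuck_site_decision(stuck=3, batch_size=517,
                              agents_running=True, minutes_without_progress=30))
```

The "no agents running" check comes before the percentage test, matching the tree: if agents are gone, the share of stuck sites no longer matters.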

---

## Decision Tree 5: "Project Completion - 99.6% Verification"

```
START: Both Batch 001 and Batch 010 finalized
│
├─ COUNT: Verify total finalized sites
│  Check: tools/shared/list-sites.sh --status "Q" | wc -l
│  Expected range: 566-568
│  │
│  ├─ COUNT = 568 → PERFECT
│  │  All sites processed
│  │  Completion: 100% of catalog
│  │  Action: Document as EXCELLENT outcome
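│  │
│  ├─ COUNT = 567 → ABOVE TARGET
│  │  99.8% completion
│  │  Missing: 1 site
│  │  Action: Identify the missing site, document why
│  │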
│  ├─ COUNT = 566 → EXPECTED
│  │  99.6% completion (standard target)
│  │  Missing: 2 sites
│  │  Reason: Possible batch 001 failures in waves 1-3
│  │  Action: Identify the 2 missing sites and document why
│  │  Command: tools/shared/list-sites.sh | grep -v "^| .*| Q |"
│  │
│  └─ COUNT < 566 → UNEXPECTED
│     Possible issues with finalization
│     Action: Run complete status audit
│     Command: tools/shared/list-sites.sh
│     Then consult EMERGENCY_RESPONSE_GUIDE.md for recovery
│
├─ IDENTIFY: Missing sites (if count < 568)
│  Check: tools/shared/list-sites.sh | grep -v "^| .*| Q |"
│  Action: List domain names and their current status
│  Categorize: Are they stuck at O, i, or I?
│  │
│  ├─ STUCK AT O (Open)
│  │  Meaning: Never entered implementation
│  │  Reason: Likely didn't get wave assignment
│  │  Recovery: May need manual requeue OR accept as limitation
│  │
│  ├─ STUCK AT i (In-progress)
│  │  Meaning: Agent started but didn't complete
│  │  Reason: Agent timeout or error
│  │  Recovery: Could retry agents on these sites
│  │
│  └─ STUCK AT I (Implemented)
│     Meaning: Files generated but finalization didn't mark Q
│     Reason: Finalization issue on these specific sites
│     Recovery: Attempt manual finalization for these sites
│
├─ DECISION: Is 99.6% acceptable?
│  ├─ YES → ACCEPT AND COMPLETE
│  │         Action: Mark project as COMPLETE
│  │         Note: Document which 2 sites missed and why
│  │         Review: LESSONS_LEARNED_AND_RECOMMENDATIONS.md
│  │
│  └─ NO → ATTEMPT RECOVERY
│          Action: Follow recovery steps above
│          Risk: May not recover missing sites
│          Time: 30+ minutes

RESULT: Project completion documented, status finalized
```
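
The completion check is simple arithmetic over the 568-site catalog (566 / 568 ≈ 99.6%). This hypothetical sketch computes the percentage, labels a count against the tree's expected range of 566-568, and groups non-Q sites by status so the O / i / I recovery paths can be chosen. The `statuses` mapping is assumed to be built from `list-sites.sh` output; parsing is not shown because that output format is not documented here.

```python
from collections import defaultdict
from typing import Dict, List

CATALOG_SIZE = 568
TARGET_MIN = 566  # the 99.6% target from Decision Tree 5


def completion_percent(q_count: int, total: int = CATALOG_SIZE) -> float:
    """Percentage of the catalog at Q status, rounded to one decimal place."""
    return round(100 * q_count / total, 1)


def classify_completion(q_count: int) -> str:
    """Label a Q count against the tree's expected range."""
    if q_count == CATALOG_SIZE:
        return "PERFECT"
    if q_count >= TARGET_MIN:
        return "AT_OR_ABOVE_TARGET"
    return "UNEXPECTED"


# Recovery path per stuck status, paraphrasing the tree's IDENTIFY branches.
RECOVERY = {
    "O": "manual requeue or accept as limitation",
    "i": "retry agents on these sites",
    "I": "attempt manual finalization",
}


def group_missing(statuses: Dict[str, str]) -> Dict[str, List[str]]:
    """Group every site not at Q by its current status, domains sorted."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for domain, status in sorted(statuses.items()):
        if status != "Q":
            groups[status].append(domain)
    return dict(groups)


if __name__ == "__main__":
    sample = {"a.example": "Q", "b.example": "i", "c.example": "O"}  # hypothetical
    for status, domains in group_missing(sample).items():
        print(status, domains, "->", RECOVERY.get(status, "investigate"))
```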

---

## Decision Tree 6: "System Resource Emergency - What to Do?"

```
START: System running low on resources (disk, memory, or CPU)
│
├─ SEVERITY: How bad is it?
│  │
│  ├─ DISK: Free space < 500MB
│  │  ├─ SEVERE (< 200MB free)
│  │  │  Risk: System might crash, registry corruption possible
│  │  │  Action:
│  │  │  1. IMMEDIATELY stop new agents: Don't deploy more waves
│  │  │  2. Clear temporary files: rm -rf .smbatcher/tmp/*
│  │  │  3. Check: du -sh sites/*/
│  │  │  4. Consider: Archiving completed batches to save space
│  │  │  5. If critical: Pause operations until disk space recovered
│  │  │
│  │  └─ MODERATE (200-500MB free)
│  │     Risk: Some operations may fail
│  │     Action:
│  │     1. Monitor disk space closely
│  │     2. Don't archive or copy large files
│  │     3. Complete current work ASAP
│  │     4. Plan cleanup for later
│  │
│  ├─ MEMORY: Free memory < 1GB
│  │  ├─ SEVERE (< 500MB free)
│  │  │  Risk: Agents may timeout, system becomes slow
│  │  │  Action:
│  │  │  1. Reduce agents per wave: Don't deploy new waves yet
│  │  │  2. Wait for running agents to complete
│  │  │  3. Current work: Allow to complete, then pause
│  │  │  4. Restart system if needed (last resort)
│  │  │
│  │  └─ MODERATE (500MB-1GB free)
│  │     Risk: Slower than expected
│  │     Action:
│  │     1. Monitor memory usage
│  │     2. Reduce new agent deployments slightly
│  │     3. Complete current work normally
│  │
│  └─ CPU: Usage > 70% sustained
│     ├─ SEVERE (> 90% sustained, especially 98-100%)
│     │  Risk: System may become unresponsive
│     │  Action:
│     │  1. Stop non-critical processes (if safe)
│     │  2. Agents will still work but slower
│     │  3. Expected: Return to normal after current wave completes
│     │
│     └─ MODERATE (70-90%)
│        Risk: System overloaded but functional
│        Action:
│        1. This is normal during wave execution
│        2. No action needed, expected behavior
│
├─ DECISION: Can current work continue?
│  ├─ YES → Monitor but continue
│  │         Check status every 15-30 minutes
│  │         Watch for further resource degradation
│  │
│  └─ NO → Pause and Recover
│          Action:
│          1. Don't deploy new waves
│          2. Let current agents finish
│          3. Identify root cause (large files? leak?)
│          4. Fix before continuing
│
├─ ESCALATION: Contact system admin if:
│  ✓ Disk space < 200MB and can't be freed
│  ✓ Memory consistently < 200MB
│  ✓ System becoming unresponsive
│  Document: Current resource usage, recent changes

RESULT: Recovery actions taken, operations adjusted
```
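
The disk and memory bands in Tree 6 can be written as threshold functions so they are applied the same way throughout a long run. This is a sketch with hypothetical names: it reads free disk space with `shutil.disk_usage`, expects free memory to be supplied by the operator (for example from `status-report.sh`), treats 1GB as 1024MB, and leaves CPU out.

```python
import shutil

GB_IN_MB = 1024  # assumption: the tree's "1GB" means 1024MB


def disk_free_mb(path: str = ".") -> float:
    """Free space, in MB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def disk_severity(free_mb: float) -> str:
    """Tree 6 disk bands: < 200MB severe, 200-500MB moderate."""
    if free_mb < 200:
        return "SEVERE"
    if free_mb < 500:
        return "MODERATE"
    return "OK"


def memory_severity(free_mb: float) -> str:
    """Tree 6 memory bands: < 500MB severe, 500MB-1GB moderate."""
    if free_mb < 500:
        return "SEVERE"
    if free_mb < GB_IN_MB:
        return "MODERATE"
    return "OK"


def needs_admin(disk_free: float, disk_can_be_freed: bool,
                mem_free_sustained: float, unresponsive: bool) -> bool:
    """Escalation criteria from Tree 6; all figures in MB."""
    return ((disk_free < 200 and not disk_can_be_freed)
            or mem_free_sustained < 200
            or unresponsive)


if __name__ == "__main__":
    free = disk_free_mb(".")
    print(f"disk: {free:.0f}MB free -> {disk_severity(free)}")
```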

---

## Quick Reference: Decision Tree Selection Guide

**"What should I do right now?"** → Use this chart:

| Current Situation | Decision Tree | Time Needed |
|------------------|---------------|-------------|
| Want to finalize batch 001 | Tree 1 | 15-30 min |
| Finalization failed | Tree 2 | 30-60 min |
| Ready to start batch 010 | Tree 3 | 10 min |
| Sites appear stuck | Tree 4 | 30 min |
| Project completed | Tree 5 | 20 min |
| System running low on resources | Tree 6 | 20-30 min |

---

## Using These Decision Trees

1. **Find your situation** in the Quick Reference table
2. **Follow the tree** from START to RESULT
3. **Answer questions honestly** at each branch point
4. **Take recommended actions** immediately
5. **Record outcomes** for lessons learned

Each tree is designed to reach a conclusion within 10-60 minutes. If you need to escalate or get more information, the trees reference specific guides (EMERGENCY_RESPONSE_GUIDE.md, etc.).

---

*Decision Trees: 2026-03-24*
*Purpose: Real-time decision support for all critical execution points*
*Confidence: Based on proven system behavior across 568-site catalog*
