# New Scripts Reference - Atomic Claiming with fcntl Locking

## Overview

The scripts now support **atomic claiming** when auto-detecting the next batch or set of sites: finding a work item and marking it as taken happen under a single lock. This prevents race conditions when multiple subagents run the same scripts simultaneously.

## Problem Solved

Without locking, this race condition could occur:
```
Agent A: find-next → returns batch 003
Agent B: find-next → returns batch 003 (same!)
Agent A: starts working on batch 003
Agent B: starts working on batch 003 (collision!)
```

With atomic claiming:
```
Agent A: find-next --claim → locks, finds batch 003, claims it (O→i), unlocks
Agent B: find-next --claim → locks, finds batch 004 (003 already claimed), claims it, unlocks
```

## Core Helper: tools/shared/find-next.sh

Central utility that atomically finds AND claims the next work item.

### Usage

```bash
# Read-only find (no locking, for status display)
tools/shared/find-next.sh --registry .smbatcher/REGISTRY.md --mode batch --phase implement

# Atomic find + claim (uses fcntl locking)
tools/shared/find-next.sh --registry .smbatcher/REGISTRY.md --mode batch --phase implement --claim

# Output formats (append one of these flags to either command above)
--format json   # {"batch": "003", "sites": [...], "claimed": true}
--format simple # Just prints "003"
```

### Modes

| Mode | Description |
|------|-------------|
| `batch` | Find next batch ID |
| `sites` | Find sites needing work |
| `status` | Show status counts |
| `active` | Find currently active batch |
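
As a rough illustration of how the flags and modes shown above fit together, here is a minimal `argparse` sketch. It is not the actual parser in `find-next`; the `build_parser` name, the `required` settings, and the default `--format` value are assumptions, and only the phases that appear in this document are listed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in this document."""
    parser = argparse.ArgumentParser(prog="find-next")
    parser.add_argument("--registry", required=True, help="Path to REGISTRY.md")
    parser.add_argument(
        "--mode", required=True, choices=["batch", "sites", "status", "active"]
    )
    # Only "design" and "implement" appear in this document; other phases are unknown.
    parser.add_argument("--phase", choices=["design", "implement"])
    parser.add_argument("--claim", action="store_true", help="Find and claim under the lock")
    # The default format is an assumption; the document does not state one.
    parser.add_argument("--format", choices=["json", "simple"], default="json")
    parser.add_argument("--lock-file", default=None, help="Override the lock file path")
    return parser


args = build_parser().parse_args(
    ["--registry", ".smbatcher/REGISTRY.md", "--mode", "batch", "--phase", "implement", "--claim"]
)
print(args.mode, args.phase, args.claim, args.format)
```

Note that `argparse` reports usage errors on stderr with exit status 2, so a script that wants its errors on stdout would need to override `parser.error`.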

### Exit Codes

| Code | Meaning | Output (stdout) |
|------|---------|-----------------|
| 0 | Work found successfully | `{"batch": "003", "sites": [...], "exit_code": 0}` |
| 1 | No work available (expected) | `{"reason": "no_work_found", "exit_code": 1}` |
| 2 | Actual error (permission, OS) | `{"error": "permission_denied", "message": "...", "exit_code": 2}` |

**Important for Coding Tools:** All output goes to stdout (including errors) so the caller can parse and understand the result. The exit code + stdout JSON together provide complete information.
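
The exit code table above can be sketched as a small helper that pairs each outcome with its JSON payload. This is an illustration only: `build_result` and the constant names are hypothetical, and the payload keys are copied from the table rather than from the real script.

```python
import json

EXIT_WORK_FOUND = 0
EXIT_NO_WORK = 1
EXIT_ERROR = 2


def build_result(batch=None, sites=None, error=None, message=""):
    """Return (exit_code, json_text) for one find-next style outcome."""
    if error is not None:
        payload = {"error": error, "message": message, "exit_code": EXIT_ERROR}
    elif batch is None:
        payload = {"reason": "no_work_found", "exit_code": EXIT_NO_WORK}
    else:
        payload = {"batch": batch, "sites": sites or [], "exit_code": EXIT_WORK_FOUND}
    return payload["exit_code"], json.dumps(payload)


code, text = build_result(batch="003", sites=["example.com"])
print(code, text)
```

A script following this pattern would print `text` to stdout and then call `sys.exit(code)`, so the caller always receives both signals.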

### Claim Behavior

| Phase | From Status | To Status (claimed) | Action |
|-------|-------------|---------------------|--------|
| design | B | d | Start design |
| design | d | (no change) | Continue existing |
| implement | D | O | Transition to implement |
| implement | O | i | Start implement |
| implement | i | (no change) | Continue existing |
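
The table above can be read as a lookup keyed by phase and current status. The following is a minimal sketch of that lookup; `CLAIM_TRANSITIONS` and `claim_status` are illustrative names, not identifiers from the scripts.

```python
# (phase, current status) -> status after claiming, copied from the table above.
CLAIM_TRANSITIONS = {
    ("design", "B"): "d",
    ("design", "d"): "d",  # already in progress, no change
    ("implement", "D"): "O",
    ("implement", "O"): "i",
    ("implement", "i"): "i",  # already in progress, no change
}


def claim_status(phase, status):
    """Return the status after a claim, or None if the pair is not listed."""
    return CLAIM_TRANSITIONS.get((phase, status))


print(claim_status("implement", "O"))
print(claim_status("design", "D"))
```

Returning `None` for unlisted pairs (for example a design-phase claim on a site that is already `D`) is a design choice of this sketch; the real script may report such cases differently.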

## Updated Scripts

### tools/design/run.sh

```bash
# Without argument: atomically finds and claims sites for design
tools/design/run.sh

# With argument: manual override (no auto-detection)
tools/design/run.sh --sites "example.com:Example:desc:theme"
```

**Auto-detection priority:**
1. Sites with `d` status → continue in-progress design
2. Sites with `B` status → claim and start design
3. Sites with `-` status → need batching first

### tools/implement/run.sh

```bash
# Without argument: atomically finds and claims batch for implementation
tools/implement/run.sh

# With argument: manual override
tools/implement/run.sh --batch 003
```

**Auto-detection priority:**
1. Batch with `i` status → continue in-progress implement
2. Batch with `O` status → claim and start implement
3. Batch with `D` status → transition to O first
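
Both auto-detection lists follow the same idea: scan for the highest-priority status first and fall back to the next one. A minimal sketch of that selection, assuming work items are available as `(id, status)` pairs (the real registry format is not shown in this document, and `PRIORITY` and `pick_next` are hypothetical names):

```python
# Status priority per phase, in the order given by the lists above.
# "-" is left out for design because those sites need batching first.
PRIORITY = {
    "design": ["d", "B"],
    "implement": ["i", "O", "D"],
}


def pick_next(phase, items):
    """Return the first (item_id, status) matching the phase's priority order, else None."""
    for wanted in PRIORITY[phase]:
        for item_id, status in items:
            if status == wanted:
                return item_id, status
    return None


batches = [("003", "O"), ("002", "i"), ("004", "D")]
print(pick_next("implement", batches))
```

Here batch `002` wins even though `003` appears first, because continuing in-progress work ranks above starting new work.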

### tools/implement/lock.sh

```bash
# Without argument: atomically finds and claims next batch to lock
tools/implement/lock.sh

# With argument: lock specific batch
tools/implement/lock.sh 003
```

### tools/implement/status.sh

```bash
# Without argument: shows active/latest batch (read-only, no claiming)
tools/implement/status.sh

# With argument: show specific batch
tools/implement/status.sh 003
```

## Implementation Details

### fcntl Locking

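All atomic operations take an exclusive lock (`fcntl.flock` with `fcntl.LOCK_EX`) on `.smbatcher/REGISTRY.lock`. The lock is advisory: it only serializes callers that also take it, which is why read-only calls without `--claim` are never blocked.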

```python
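import fcntl
from pathlib import Path

lock_path = Path(".smbatcher/REGISTRY.lock")

with lock_path.open("w") as lf:
    fcntl.flock(lf, fcntl.LOCK_EX)  # Blocks until the exclusive lock is acquired
    try:
        # 1. Read registry
        # 2. Find next item
        # 3. Update status (claim)
        # 4. Write registry
        ...  # placeholder for the four steps above
    finally:
        fcntl.flock(lf, fcntl.LOCK_UN)  # Release lock (closing the file also releases it)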
```

### Lock File Location

Default: `.smbatcher/REGISTRY.lock` (same directory as REGISTRY.md)

Can be overridden with `--lock-file PATH`.
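
One way the default and the override could be resolved is sketched below. `resolve_lock_path` is a hypothetical helper; the only facts taken from this document are the default file name and the `--lock-file` flag.

```python
from pathlib import Path


def resolve_lock_path(registry, lock_file=None):
    """Use the explicit --lock-file value if given, else REGISTRY.lock next to the registry."""
    if lock_file:
        return Path(lock_file)
    return Path(registry).with_name("REGISTRY.lock")


print(resolve_lock_path(".smbatcher/REGISTRY.md"))
print(resolve_lock_path(".smbatcher/REGISTRY.md", "/tmp/custom.lock"))
```

All agents must resolve to the same lock file, otherwise their claims are not serialized against each other.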

## Status Flow Reference

```
- → B → d → D → O → i → I → Q
```

| Status | Meaning | Claimable |
|--------|---------|-----------|
| `-` | Registered, not batched | No (needs batching first) |
| `B` | In design batch | Yes → `d` |
| `d` | Design in progress | No (already claimed) |
| `D` | Design done | Yes → `O` |
| `O` | Ready for implement | Yes → `i` |
| `i` | Implement in progress | No (already claimed) |
| `I` | Implement complete | No |
| `Q` | Finished | No |
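
As a quick consistency check on the flow and the table above, the sketch below verifies that every claim moves a status exactly one step forward. `STATUS_FLOW`, `CLAIMABLE`, and `is_next_step` are names chosen for this example.

```python
# Order taken from the status flow diagram above.
STATUS_FLOW = ["-", "B", "d", "D", "O", "i", "I", "Q"]
# Claimable statuses and their targets, taken from the table above.
CLAIMABLE = {"B": "d", "D": "O", "O": "i"}


def is_next_step(current, new):
    """True if `new` immediately follows `current` in STATUS_FLOW."""
    position = STATUS_FLOW.index(current)
    return STATUS_FLOW[position + 1 : position + 2] == [new]


# Every claim should move exactly one step forward in the flow.
assert all(is_next_step(src, dst) for src, dst in CLAIMABLE.items())
print([s for s in STATUS_FLOW if s in CLAIMABLE])
```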

## Concurrency Safety

| Scenario | Behavior |
|----------|----------|
| 2 agents call `find-next --claim` simultaneously | First acquires lock, claims batch, releases. Second waits, then gets different batch. |
| Agent calls without `--claim` | Read-only, no locking. Safe for status display, but the item it reports may be claimed by another agent before it is used. |
| Agent uses explicit `--batch ID` | No auto-detection, works on specified batch. |

## Error Handling for Callers

### Shell Script Pattern
```bash
# Initialize first so a stale FIND_EXIT from the environment cannot leak in
FIND_EXIT=0
FIND_RESULT="$("$ROOT_DIR/tools/shared/find-next.sh" ... --format json)" || FIND_EXIT=$?

if [ "$FIND_EXIT" -eq 2 ]; then
    echo "Error during auto-detection:"
    echo "$FIND_RESULT"  # Show error details to stdout
    exit 2
fi
# Exit 0 or 1: parse JSON normally
```

### Python Script Pattern
```python
find_result = subprocess.run([...], capture_output=True, text=True, check=False)

if find_result.returncode == 2:
    print("Error during auto-detection:")
    print(find_result.stdout)  # Error details in stdout JSON
    return 2
elif find_result.returncode == 0:
    result_data = json.loads(find_result.stdout)
    # ... use result_data
else:
    # Exit 1 = no work available (not an error)
    result_data = json.loads(find_result.stdout)
    print(json.dumps(result_data, indent=2))  # Show reason
    return 1
```

## Exit Code Convention (All Scripts)

All scripts follow this exit code convention:

| Code | Meaning | stdout Format |
|------|---------|---------------|
| 0 | Success | `OK:<result_type>` or detailed output |
| 1 | Operation failed (validation, no work) | `FAIL:<reason>:<details>` |
| 2 | Input/setup error (missing args, files) | `ERROR:<type>:<details>` |

### Output Format Examples

```bash
# Success
OK:html_valid
OK:check_passed
OK:all_files_present

# Failure (operation ran but failed)
FAIL:html_invalid:3 issues
FAIL:check_failed:2 failures
FAIL:missing_files:index.html, styles.css

# Error (couldn't even run)
ERROR:directory_not_found:/path/to/site
ERROR:missing_argument:--sites is required
ERROR:batch_not_found:/path/to/batch.md
```
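
A caller can split these lines into their parts with very little code. The sketch below assumes the three prefixes shown above and treats everything after the second colon as free-form details; `parse_status_line` and `EXPECTED_EXIT` are hypothetical names, and scripts that print detailed output instead of a prefixed line are not covered.

```python
# Exit code each prefix is expected to accompany, per the convention table.
EXPECTED_EXIT = {"OK": 0, "FAIL": 1, "ERROR": 2}


def parse_status_line(line):
    """Split 'KIND:reason[:details]' into a dict; details may itself contain colons."""
    kind, _, rest = line.strip().partition(":")
    if kind not in EXPECTED_EXIT:
        raise ValueError(f"unrecognized status line: {line!r}")
    reason, _, details = rest.partition(":")
    return {"kind": kind, "reason": reason, "details": details}


print(parse_status_line("FAIL:missing_files:index.html, styles.css"))
print(parse_status_line("OK:html_valid"))
```

Using `partition` rather than `split(":")` keeps paths and messages that contain colons intact in `details`.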

### Key Principle

**All output goes to stdout** (not stderr) so Coding Tools can parse and understand:
- What happened
- Why it failed
- What to fix

## Migration Notes

### Exit Code + stdout Upgrade

Scripts updated to use structured stdout output:

**Core Scripts:**
- `tools/shared/find-next.py` - Exit 0/1/2 with JSON output
- `tools/shared/similarity.py` - `OK:similarity` / `ERROR:no_files`
- `tools/shared/metrics.sh` - `OK:metrics_passed` / `FAIL:metrics_failed`
- `tools/shared/dry-run.sh` - `OK:dry_run`
- `tools/shared/metrics-batch-counts.py` - `OK:counts` / `ERROR:registry_not_found`
- `tools/shared/metrics-completion-rate.py` - `OK:rate` / `ERROR:missing_args`
- `tools/shared/metrics-collision-rate.py` - `OK:rate` / `ERROR:missing_args`
- `tools/shared/ci-dry-run-batch-id.py` - `OK:batch_id` / `ERROR:missing_env`
- `tools/shared/ci-dry-run-limit-sites.py` - `OK:limited` / `OK:empty`

**Design Scripts:**
- `tools/design/run.py` - `OK:dry_run` / `FAIL:no_sites_ready`
- `tools/design/register.py` - `OK:registered` / `ERROR:no_sites`
- `tools/design/batch.py` - `OK:batch_created` / `ERROR:no_entries`
- `tools/design/complete.py` - `OK:marked_complete` / `FAIL:missing_headings`
- `tools/design/start.py` - `OK:marked_in_progress` / `ERROR:domain_not_found`
- `tools/design/check-design.py` - `OK:check_passed` / `FAIL:check_failed`
- `tools/design/check-dirs.py` - `OK:all_present` / `FAIL:incomplete`
- `tools/design/verify-post-design.py` - `OK:verified` / `FAIL:no_designs_passed`
- `tools/design/read-design.py` - Structured output / `FAIL:section_not_found`
- `tools/design/write-design.py` - `Stamped:` / `ERROR:design_not_found`
- `tools/design/analyze-seeds.py` - `OK:annotated` / `ERROR:no_batch_files`
- `tools/design/frequency.py` - `OK:analyzed` / `FAIL:no_designs_found`
- `tools/design/list-designed.py` - `OK:listed`
- `tools/design/register.sh` - `ERROR:invalid_argument` / `ERROR:missing_argument`
- `tools/design/batch.sh` - `ERROR:invalid_argument` / `ERROR:missing_argument`

**Implement Scripts:**
- `tools/implement/run.py` - `OK:batch_complete` / `FAIL:no_batch_ready`
- `tools/implement/lock.py` - `OK:locked` / `FAIL:no_domains` / `ERROR:registry_not_found`
- `tools/implement/generate.py` - `OK:prepared` / `ERROR:design_not_found`
- `tools/implement/complete.py` - `OK:marked_complete` / `ERROR:domain_not_found`
- `tools/implement/start.py` - `OK:marked_in_progress` / `ERROR:domain_not_found`
- `tools/implement/finish.py` - `OK:marked_done` / `FAIL:no_sites_to_mark`
- `tools/implement/check-outputs.py` - `OK:all_files_present` / `FAIL:missing_files`
- `tools/implement/status.sh` - `OK:no_active_batch` / `ERROR:auto_detect_failed`

**Prepare Scripts:**
- `tools/prepare/info.py` - `OK:info_complete` / `FAIL:missing_prerequisites`
- `tools/prepare/register.py` - `OK:registered` / `ERROR:no_sites`
- `tools/prepare/batch.py` - `OK:batch_created` / `FAIL:no_eligible_sites`

**Check Scripts:**
- `tools/check/html-check.sh` - `OK:html_valid` / `FAIL:html_invalid`
- `tools/check/js-syntax.sh` - `OK:js_valid` / `FAIL:js_syntax_errors`
- `tools/check/verify-site.sh` - `OK:verification_passed` / `FAIL:verification_failed`
- `tools/check/design-compliance.sh` - `OK:full_compliance` / `FAIL:compliance_issues`
- `tools/check/status-report.sh` - `OK:status_report` / `ERROR:registry_not_found`
- `tools/check/file-stats.sh` - Exit 2 on input error
- `tools/check/tree-view.sh` - Exit 2 on input error
- `tools/check/serve.sh` - Exit 2 on input error

### Atomic Claiming Upgrade

Scripts using `--claim` for atomic find + status update:
- `tools/shared/find-next.py` - Added `--claim` and `--lock-file` parameters
- `tools/design/run.py` - Uses `--claim` for auto-detection
- `tools/implement/run.py` - Uses `--claim` for auto-detection
- `tools/implement/lock.sh` - Uses `--claim` for auto-detection