# Site Indexer Design Spec

## Context

`/Volumes/Scratch/Sites` contains ~25 CMass directories with ~6,800 sites (each having an `index.html`). There is no central index page to browse them. This tool generates a browsable site index as a static HTML file, served via `python3 -m http.server` or similar.

## Requirements

1. Rust CLI project at `/Volumes/Scratch/Sites/indexer`, building a binary named `site-indexer`
2. Required CLI argument: root directory to scan
3. Uses `fd index.html <root> --type f` to discover sites
4. Generates `index.html` in the root directory
5. All links are relative paths from the root (no web server built in)
6. Client-side JavaScript sorting on all columns
7. Two title fields: directory name title and HTML `<title>` tag content

## Architecture

Single binary, no library crate. Minimal dependencies.

### Dependencies

- `clap` — CLI argument parsing
- `regex` — extract `<title>` from HTML files
- `chrono` — format modification timestamps

### Modules (all in `main.rs` unless it grows large)

- **CLI parsing**: clap-derived struct with one required positional arg (`root_dir: PathBuf`); exit 1 with a clear error if it does not exist or is not a directory
- **Discovery**: shell out to `fd`, parse stdout into `Vec<PathBuf>`
- **Metadata collection**: for each path, build `SiteEntry`
- **HTML generation**: render entries into a self-contained HTML string
- **File output**: write HTML to `<root_dir>/index.html`
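A minimal sketch of the root check behind verification item 7 ("non-existent path gives a clear error"). It uses only `std` so it compiles without clap; the clap struct would pass its `root_dir` in. `validate_root` is a hypothetical helper name. Canonicalizing here is an assumption of this sketch: it hands `fd` an absolute root, so every discovered path starts with the same prefix.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: confirm `root` exists and is a directory, then return
/// its canonical absolute form so later prefix-stripping is consistent.
fn validate_root(root: &Path) -> Result<PathBuf, String> {
    // `fs::metadata` follows symlinks and fails if the path does not exist.
    let meta = fs::metadata(root)
        .map_err(|e| format!("Error: cannot access {}: {}", root.display(), e))?;
    if !meta.is_dir() {
        return Err(format!("Error: {} is not a directory", root.display()));
    }
    // Resolves `.`, `..`, and symlinks into an absolute path.
    fs::canonicalize(root)
        .map_err(|e| format!("Error: cannot resolve {}: {}", root.display(), e))
}
```

In `main`, an `Err` would be printed with `eprintln!` and followed by `std::process::exit(1)`.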

### Data Model

```rust
#[derive(Debug, Clone, Default)]
struct SiteEntry {
    /// Relative path from root to the index.html (e.g., "CMassD2/sites/aiice.quest-v1/index.html")
    rel_path: String,
    /// Directory containing index.html, relative (e.g., "CMassD2/sites/aiice.quest-v1/")
    dir_path: String,
    /// Directory name title — last component of parent dir (e.g., "aiice.quest-v1")
    dir_title: String,
    /// HTML <title> tag content extracted from the file, or empty string
    html_title: String,
    /// Size of the index.html file in bytes (0 if metadata is unavailable)
    size: u64,
    /// Last modified time formatted as `YYYY-MM-DD HH:MM`, or empty string if unavailable
    modified: String,
}
```

### Discovery

```
fd index.html <root> --type f
```

- Run via `std::process::Command`
- If `fd` not found, print error: "Error: `fd` is required but not installed. Install it: https://github.com/sharkdp/fd" and exit 1
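- Parse stdout: one path per line, convert to `PathBuf`
- Skip `<root>/index.html` itself; otherwise every re-run would list the previously generated index as a site
- `fd` treats `index.html` as an unanchored regex, so it also matches names such as `index.html.bak` or `myindex.html`; passing `--glob` restricts matches to the exact file name
- `fd` skips hidden files and directories, and paths matched by ignore files such as `.gitignore`, by default; add `--hidden` and `--no-ignore` if such sites should be indexed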
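The discovery step could look roughly like this. It is a sketch under two assumptions not in the spec: it passes `--glob` so `fd` matches the file name exactly rather than as a regex, and it sorts the paths so rows start out in Path order. `discover` and `parse_fd_output` are hypothetical names; splitting out the parser keeps it testable without `fd` installed.

```rust
use std::io::ErrorKind;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Turn `fd` stdout (one path per line) into sorted paths, skipping blank
/// lines and the generated `<root>/index.html` itself.
fn parse_fd_output(stdout: &str, root: &Path) -> Vec<PathBuf> {
    let generated = root.join("index.html");
    let mut paths: Vec<PathBuf> = stdout
        .lines()
        .filter(|line| !line.is_empty())
        .map(PathBuf::from)
        .filter(|p| *p != generated)
        .collect();
    // Sort once here so rows start out in Path order.
    paths.sort();
    paths
}

/// Run `fd` under `root` (assumed already canonical). A missing `fd` binary is
/// reported with the message from the spec.
fn discover(root: &Path) -> Result<Vec<PathBuf>, String> {
    let output = Command::new("fd")
        .args(["--type", "f", "--glob", "index.html"])
        .arg(root)
        .output()
        .map_err(|e| match e.kind() {
            ErrorKind::NotFound => "Error: `fd` is required but not installed. \
                 Install it: https://github.com/sharkdp/fd"
                .to_string(),
            _ => format!("Error: failed to run fd: {}", e),
        })?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(format!("Error: fd failed: {}", stderr));
    }
    Ok(parse_fd_output(&String::from_utf8_lossy(&output.stdout), root))
}
```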

### Metadata Extraction

For each discovered `index.html`:

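1. **Relative path**: strip the root prefix from each discovered path with `Path::strip_prefix`; `fd` prints results prefixed with the root argument as given, so use the same root value for both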
2. **Directory path**: parent directory of the index.html, relative
3. **Directory title**: last component of the parent directory (e.g., "aiice.quest-v1")
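4. **HTML title**: read up to the first 4 KB of the file (decoded lossily as UTF-8, since the cut may split a multi-byte character), match `(?is)<title[^>]*>(.*?)</title>` (case-insensitive, dotall, tolerating attributes on the tag), then trim and collapse whitespace. Titles that start after the first 4 KB (e.g., behind large inline `<style>` or `<script>` blocks) are missed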
5. **Size**: `fs::metadata().len()`
6. **Modified**: `fs::metadata().modified()` → format as `YYYY-MM-DD HH:MM`
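Steps 1-3 and 6, sketched with only `std`. `relative_fields` and `format_modified` are hypothetical helpers. The timestamp function stands in for `chrono` and formats in UTC, which is an assumption (the spec does not name a time zone); it uses Howard Hinnant's `civil_from_days` conversion. With `chrono`, roughly the same result in local time comes from `DateTime::<Local>::from(t).format("%Y-%m-%d %H:%M").to_string()`.

```rust
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Steps 1-3: returns (rel_path, dir_path, dir_title) using `/` separators and
/// a trailing `/` on dir_path, or None if `index_path` is not a site under `root`.
fn relative_fields(root: &Path, index_path: &Path) -> Option<(String, String, String)> {
    let rel = index_path.strip_prefix(root).ok()?;
    let file_name = rel.file_name()?.to_string_lossy().into_owned();
    let parts: Vec<String> = rel
        .parent()?
        .components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect();
    // An empty parent means `index_path` is the root's own index.html.
    let dir_title = parts.last()?.clone();
    let dir_path = format!("{}/", parts.join("/"));
    let rel_path = format!("{}{}", dir_path, file_name);
    Some((rel_path, dir_path, dir_title))
}

/// Days since 1970-01-01 to (year, month, day), after Hinnant's algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, 0..=365
    let mp = (5 * doy + 2) / 153; // month index counted from March, 0..=11
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Step 6 without chrono: `YYYY-MM-DD HH:MM` in UTC; pre-1970 times yield "".
fn format_modified(t: SystemTime) -> String {
    let secs = match t.duration_since(UNIX_EPOCH) {
        Ok(d) => d.as_secs() as i64,
        Err(_) => return String::new(),
    };
    let (days, rem) = (secs / 86_400, secs % 86_400);
    let (y, m, d) = civil_from_days(days);
    format!("{:04}-{:02}-{:02} {:02}:{:02}", y, m, d, rem / 3600, (rem % 3600) / 60)
}
```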

Errors in reading or parsing any individual file are logged to stderr, and the entry falls back to defaults (empty title, 0 size, empty date). A file with no `<title>` is not an error; its HTML title is simply empty.
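Title extraction with the error fallback described above, as a `std`-only sketch. It swaps the spec's `regex` crate for a plain case-insensitive substring search so it runs without dependencies; `read_title` and `title_from_head` are hypothetical names. HTML entities such as `&amp;` are left undecoded.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Stand-in for `(?is)<title[^>]*>(.*?)</title>`: case-insensitive, spans
/// newlines, tolerates attributes, and collapses whitespace in the result.
fn title_from_head(head: &str) -> String {
    // ASCII lowercasing keeps byte offsets identical to `head`.
    let lower = head.to_ascii_lowercase();
    let start = match lower.find("<title") {
        Some(i) => i,
        None => return String::new(),
    };
    let content_start = match lower[start..].find('>') {
        Some(i) => start + i + 1,
        None => return String::new(),
    };
    let content_end = match lower[content_start..].find("</title") {
        Some(i) => content_start + i,
        None => return String::new(),
    };
    head[content_start..content_end]
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Read at most the first 4 KB; on an I/O error, log to stderr and return "".
fn read_title(path: &Path) -> String {
    let mut buf = Vec::with_capacity(4096);
    let result = File::open(path).and_then(|f| f.take(4096).read_to_end(&mut buf));
    match result {
        // Lossy decoding tolerates a multi-byte character cut at the 4 KB mark.
        Ok(_) => title_from_head(&String::from_utf8_lossy(&buf)),
        Err(e) => {
            eprintln!("warning: {}: {}", path.display(), e);
            String::new()
        }
    }
}
```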

### HTML Generation

Self-contained HTML with embedded CSS and JS. No external dependencies.

**Table columns:**
| Column | Data | Sort type |
|--------|------|-----------|
| # | Row number, re-indexed after each sort; sorts by original (Path-order) position | Numeric |
| Path | Relative directory path (clickable link) | String |
| Dir Title | Directory name (e.g., "aiice.quest-v1") | String |
| HTML Title | `<title>` tag content from index.html | String |
| Size | File size (human-readable display, raw bytes as data attribute) | Numeric (by raw bytes) |
| Modified | Last modified date (`YYYY-MM-DD HH:MM`) | String (this format sorts chronologically as text) |
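A sketch of rendering one table row, assuming binary (1024-based) size units and a trimmed copy of `SiteEntry`. Escaping matters here because titles come from arbitrary HTML and directory names may contain spaces, `#`, or `&`: visible text is HTML-escaped and the `href` is percent-encoded. `render_row`, `html_escape`, `encode_href`, and `human_size` are hypothetical helpers; the `data-sort` attributes carry the raw values a client-side sort would compare.

```rust
/// Trimmed copy of the spec's SiteEntry so this block compiles on its own.
struct SiteEntry {
    dir_path: String,
    dir_title: String,
    html_title: String,
    size: u64,
    modified: String,
}

/// Escape text for element content and double-quoted attributes.
fn html_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}

/// Percent-encode a relative path for `href`, keeping `/` separators.
fn encode_href(path: &str) -> String {
    let mut out = String::new();
    for b in path.bytes() {
        if b.is_ascii_alphanumeric() || b"-._~/".contains(&b) {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Human-readable size with 1024-based units and one decimal place.
fn human_size(bytes: u64) -> String {
    const UNITS: [&str; 4] = ["KB", "MB", "GB", "TB"];
    if bytes < 1024 {
        return format!("{} B", bytes);
    }
    let mut value = bytes as f64 / 1024.0;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    format!("{:.1} {}", value, UNITS[unit])
}

/// One <tr>; the # cell keeps its original position in `data-sort`.
fn render_row(n: usize, e: &SiteEntry) -> String {
    format!(
        "<tr><td data-sort=\"{n}\">{n}</td><td class=\"path\"><a href=\"{href}\">{path}</a></td>\
         <td>{dt}</td><td>{ht}</td><td data-sort=\"{size}\">{hs}</td><td>{m}</td></tr>",
        n = n,
        href = encode_href(&e.dir_path),
        path = html_escape(&e.dir_path),
        dt = html_escape(&e.dir_title),
        ht = html_escape(&e.html_title),
        size = e.size,
        hs = human_size(e.size),
        m = html_escape(&e.modified),
    )
}
```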

**Sorting behavior:**
- Click column header to sort ascending
- Click again to sort descending
- Active column shows ▲ (asc) or ▼ (desc) indicator
- Default: sorted by Path ascending
- Row numbers re-index after each sort
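One way the sorting rules could be wired up, shown as a Rust constant holding the JavaScript that would be inlined into a `<script>` tag, plus a matching header renderer. The markup conventions (`data-type`, `data-dir`, `data-sort`, and an `.ind` span per header) are assumptions of this sketch, not part of the spec.

```rust
/// Header cells: `data-type` picks numeric vs. text comparison, and each header
/// has an `.ind` span for the indicator. Path starts marked ascending (default).
fn render_thead() -> String {
    let cols = [("#", "num"), ("Path", "str"), ("Dir Title", "str"),
                ("HTML Title", "str"), ("Size", "num"), ("Modified", "str")];
    let mut html = String::from("<thead><tr>");
    for (name, ty) in cols {
        let (dir, ind) = if name == "Path" { (" data-dir=\"asc\"", " ▲") } else { ("", "") };
        html.push_str(&format!(
            "<th data-type=\"{ty}\"{dir}>{name}<span class=\"ind\">{ind}</span></th>"
        ));
    }
    html.push_str("</tr></thead>");
    html
}

/// Client-side sort: first click on a column sorts ascending, the next click
/// descending; rows are renumbered after every sort.
const SORT_JS: &str = r#"
document.querySelectorAll('th').forEach((th, col) => {
  th.addEventListener('click', () => {
    const tbody = document.querySelector('tbody');
    const asc = th.dataset.dir !== 'asc';
    document.querySelectorAll('th').forEach(h => {
      delete h.dataset.dir;
      h.querySelector('.ind').textContent = '';
    });
    th.dataset.dir = asc ? 'asc' : 'desc';
    th.querySelector('.ind').textContent = asc ? ' ▲' : ' ▼';
    const numeric = th.dataset.type === 'num';
    // Prefer data-sort (e.g. raw bytes for Size) over the displayed text.
    const key = td => td.dataset.sort ?? td.textContent;
    const rows = Array.from(tbody.rows);
    rows.sort((a, b) => {
      const x = key(a.cells[col]), y = key(b.cells[col]);
      const c = numeric ? Number(x) - Number(y) : x.localeCompare(y);
      return asc ? c : -c;
    });
    // Re-append in sorted order and renumber the visible # column.
    rows.forEach((r, i) => { r.cells[0].textContent = i + 1; tbody.appendChild(r); });
  });
});
"#;
```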

**Styling:**
- Clean, minimal table with alternating row colors
- Sticky header row
- Responsive width
- Monospace font for paths

### Output

Write generated HTML to `<root_dir>/index.html`.

If file already exists, overwrite it (this is a generated file).

Print to stdout: "Generated index.html with N sites at <root_dir>/index.html"
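A minimal sketch of the output step; `write_index` and `summary` are hypothetical names. `fs::write` creates the file or truncates an existing one, which matches the overwrite rule.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Write (or overwrite) `<root>/index.html` and return the path written.
fn write_index(root: &Path, html: &str) -> io::Result<PathBuf> {
    let out = root.join("index.html");
    fs::write(&out, html)?; // truncates any previous generated file
    Ok(out)
}

/// The summary line printed to stdout after a successful write.
fn summary(count: usize, out: &Path) -> String {
    format!("Generated index.html with {} sites at {}", count, out.display())
}
```

If the index might be served while it is regenerated, writing to a temporary file in the same directory and then calling `fs::rename` avoids serving a half-written page; the spec does not require this.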

## Usage

```bash
# Build
cd /Volumes/Scratch/Sites/indexer
cargo build --release

# Generate index
./target/release/site-indexer /Volumes/Scratch/Sites

# Serve
cd /Volumes/Scratch/Sites
python3 -m http.server 8000
# Open http://localhost:8000
```

## Verification

1. `cargo build` succeeds with no warnings
2. Running against `/Volumes/Scratch/Sites` produces `index.html`
3. Opening `index.html` in browser shows table with ~6,800 entries
4. Clicking each column header sorts correctly (ascending then descending)
5. Both Dir Title and HTML Title columns are populated and independently sortable
6. Links navigate to the correct site when served via python HTTP server
7. Running with a non-existent path gives a clear error
8. Running without `fd` installed gives a clear error message
