docs: comprehensive documentation expansion
Add 4 new doc files (configuration reference, development guide, API reference with full request/response schemas, database schema), expand the README with a feature list and how-it-works walkthrough, fix missing Firmware and Burn stages in architecture.md and test-suite.md, add threshold engine and host-mode agent sections, and add godoc comments to 11 packages and 6 model types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Built for solo-operator home labs: one Go binary, SQLite + flat files,
HTMX + SSE UI, bundled dnsmasq, optional ntfy / Discord / SMTP
notifications.

## Features

- **Automated PXE boot** — dnsmasq proxy-DHCP serves a disposable
  Debian live image to registered MACs. No VLAN, no dedicated bridge.
- **11-stage validation pipeline** — Inventory, Firmware, SpecValidate,
  SMART, CPUStress, Storage, Network, Burn, GPU, PSU, Reporting.
- **Three vetting profiles** — quick (~10 min), deep (~8-12 h),
  soak (~36-40 h). Same probes and gates; only durations scale.
- **Server-side threshold engine** — per-run rules evaluate every
  sensor batch in real time. Critical breaches (thermal runaway,
  EDAC UE, voltage sag) fail the run immediately.
- **FailedHolding with SSH** — when a stage fails, the pipeline parks
  the host and issues a one-time SSH key so you can triage in the
  live image.
- **Real-time dashboard** — HTMX + SSE push tile updates, stage
  progress, sub-step detail, and live log tailing to the browser.
- **Pluggable notifications** — ntfy, Discord webhooks, and SMTP with
  severity-routed delivery.
- **Non-destructive mode** — skip badblocks + wipe for hosts with
  data you want to keep.
- **Host-mode agent** — a persistent reporter that heartbeats from
  installed hosts and reboots into the live image on command.
- **Self-contained HTML reports** — offline-viewable summaries with
  inlined CSS; machine-readable JSON alongside.
- **Four-layer safety gates** — MAC allowlist, signed run token,
  wipe probe, and device allowlist protect against accidental disk wipes.
- **Janitor** — automatic retention-based cleanup of artifact and log
  files.
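
Here is a minimal sketch of what per-batch threshold evaluation could look like. It assumes hypothetical `Reading`, `Rule`, and `Evaluate` names, and the real rule schema (see docs/configuration.md) may differ. Each rule has one limit, a direction (`Below` covers sags such as voltage), and a `Critical` flag. Any critical breach in a batch means the run should fail immediately. The limits in `main` are illustrative, not the project's defaults.

```go
package main

import "fmt"

// Reading is one sensor sample from a batch (hypothetical shape).
type Reading struct {
	Sensor string
	Value  float64
}

// Rule is a hypothetical per-run threshold on one sensor.
type Rule struct {
	Sensor   string
	Limit    float64
	Below    bool // true: breach when Value < Limit (e.g. voltage sag); false: Value > Limit
	Critical bool // a critical breach should fail the run immediately
}

func (r Rule) breached(v float64) bool {
	if r.Below {
		return v < r.Limit
	}
	return v > r.Limit
}

// Breach pairs a violated rule with the reading that violated it.
type Breach struct {
	Rule    Rule
	Reading Reading
}

// Evaluate checks every reading in one batch against the rules for its sensor.
// It returns all breaches and whether any of them was critical.
func Evaluate(rules []Rule, batch []Reading) ([]Breach, bool) {
	bySensor := make(map[string][]Rule)
	for _, r := range rules {
		bySensor[r.Sensor] = append(bySensor[r.Sensor], r)
	}
	var breaches []Breach
	failRun := false
	for _, rd := range batch {
		for _, r := range bySensor[rd.Sensor] {
			if r.breached(rd.Value) {
				breaches = append(breaches, Breach{Rule: r, Reading: rd})
				failRun = failRun || r.Critical
			}
		}
	}
	return breaches, failRun
}

func main() {
	// Illustrative limits only, not the project's defaults.
	rules := []Rule{
		{Sensor: "cpu_temp_c", Limit: 85},
		{Sensor: "cpu_temp_c", Limit: 100, Critical: true},
		{Sensor: "psu_12v", Limit: 11.4, Below: true, Critical: true},
	}
	batch := []Reading{
		{Sensor: "cpu_temp_c", Value: 92},
		{Sensor: "psu_12v", Value: 11.2},
	}
	breaches, failRun := Evaluate(rules, batch)
	fmt.Println(len(breaches), failRun) // 2 true
}
```

The CPU reading trips only the non-critical 85 °C rule. The 12 V reading falls below its critical floor, so the batch as a whole signals a failed run.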
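
The four safety gates can be pictured as an ordered chain in which any single failure blocks a destructive action. This sketch uses hypothetical `WipeRequest`, `Policy`, and `CheckWipe` names. How the run token is verified and what the wipe probe inspects are not described here, so both results are passed in as booleans.

```go
package main

import (
	"errors"
	"fmt"
)

// WipeRequest describes one destructive action a stage wants to take.
// TokenValid and ProbeOK stand in for checks whose details are not shown.
type WipeRequest struct {
	MAC        string // MAC of the host asking to wipe
	TokenValid bool   // outcome of verifying the signed run token
	ProbeOK    bool   // outcome of the wipe probe
	Device     string // block device to wipe, e.g. "/dev/sdb"
}

// Policy holds the two operator-maintained allowlists (hypothetical layout).
type Policy struct {
	AllowedMACs    map[string]bool
	AllowedDevices map[string]map[string]bool // MAC -> device path -> allowed
}

var (
	ErrMAC    = errors.New("MAC not in allowlist")
	ErrToken  = errors.New("run token invalid")
	ErrProbe  = errors.New("wipe probe failed")
	ErrDevice = errors.New("device not in allowlist")
)

// CheckWipe applies the four gates in order and returns the first failure.
// A nil result means every gate passed.
func CheckWipe(p Policy, req WipeRequest) error {
	switch {
	case !p.AllowedMACs[req.MAC]:
		return ErrMAC
	case !req.TokenValid:
		return ErrToken
	case !req.ProbeOK:
		return ErrProbe
	case !p.AllowedDevices[req.MAC][req.Device]:
		return ErrDevice
	}
	return nil
}

func main() {
	p := Policy{
		AllowedMACs:    map[string]bool{"52:54:00:12:34:56": true},
		AllowedDevices: map[string]map[string]bool{"52:54:00:12:34:56": {"/dev/sdb": true}},
	}
	req := WipeRequest{MAC: "52:54:00:12:34:56", TokenValid: true, ProbeOK: true, Device: "/dev/sdb"}
	fmt.Println(CheckWipe(p, req)) // <nil>
	req.Device = "/dev/sda"
	fmt.Println(CheckWipe(p, req)) // device not in allowlist
}
```

Keying the device allowlist by MAC means a device approved on one host is not implicitly approved on another.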

## How it works

1. Install the host-mode agent on each node (one-liner from the
   dashboard's quick-register script).
2. Register the host in the web UI — name, MAC, expected hardware
   spec (YAML).
3. Click **Start Vetting** and choose a profile (quick / deep / soak).
4. The host-mode agent receives a `reboot_for_vetting` heartbeat
   command and reboots into PXE.
5. dnsmasq serves the iPXE script; the host boots a disposable Linux
   live image containing the vetting agent.
6. The agent claims the run (token auth), then walks through each
   stage — posting logs, sensor readings, and results back to the
   orchestrator.
7. Thresholds are evaluated server-side on every sensor batch.
8. **Pass** — auto-reboot to local disk, HTML report generated,
   notification fires.
9. **Fail** — pipeline parks in FailedHolding, SSH key issued,
   notification fires. Operator triages and retries or releases.
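
From the agent's side, step 4 amounts to decoding the orchestrator's heartbeat reply and acting on any command. This sketch makes several assumptions: the JSON shape (`{"command": "..."}`), the `HandleHeartbeat` name, and the choice to reject unknown commands. How the real agent points the next boot at PXE is not described here, so the reboot action is injected as a function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HeartbeatReply is a hypothetical shape for the orchestrator's answer to a heartbeat.
type HeartbeatReply struct {
	Command string `json:"command,omitempty"`
}

// HandleHeartbeat decodes one reply and acts on its command, if any.
// reboot is injected so the sketch never restarts the machine it runs on.
func HandleHeartbeat(body []byte, reboot func() error) (string, error) {
	var reply HeartbeatReply
	if err := json.Unmarshal(body, &reply); err != nil {
		return "", fmt.Errorf("decode heartbeat reply: %w", err)
	}
	switch reply.Command {
	case "":
		return "idle", nil
	case "reboot_for_vetting":
		if err := reboot(); err != nil {
			return "", fmt.Errorf("reboot: %w", err)
		}
		return "rebooting", nil
	default:
		return "", fmt.Errorf("unknown command %q", reply.Command)
	}
}

func main() {
	fakeReboot := func() error {
		fmt.Println("(would switch next boot to PXE and reboot)")
		return nil
	}
	state, err := HandleHeartbeat([]byte(`{"command":"reboot_for_vetting"}`), fakeReboot)
	fmt.Println(state, err) // rebooting <nil>
}
```

Rejecting unknown commands, rather than silently ignoring them, makes a version mismatch between agent and orchestrator visible. Either policy is defensible.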
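
The README says only that the agent claims the run with token auth and that the run token is signed. This sketch assumes an HMAC-SHA256 signature over the run ID with a shared secret. The real token format, any expiry, and single-use enforcement are not described here, and `Sign` and `Verify` are hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// Sign returns "<runID>.<base64url(HMAC-SHA256(secret, runID))>".
func Sign(secret []byte, runID string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(runID))
	return runID + "." + base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// Verify checks a token produced by Sign and returns the run ID it carries.
func Verify(secret []byte, token string) (string, error) {
	i := strings.LastIndexByte(token, '.')
	if i < 0 {
		return "", errors.New("malformed token")
	}
	runID, encoded := token[:i], token[i+1:]
	sig, err := base64.RawURLEncoding.DecodeString(encoded)
	if err != nil {
		return "", errors.New("malformed signature")
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(runID))
	// hmac.Equal compares in constant time.
	if !hmac.Equal(sig, mac.Sum(nil)) {
		return "", errors.New("bad signature")
	}
	return runID, nil
}

func main() {
	secret := []byte("demo-secret") // illustrative; a real secret would be random
	tok := Sign(secret, "run-42")
	id, err := Verify(secret, tok)
	fmt.Println(id, err) // run-42 <nil>
	_, err = Verify(secret, "run-43"+tok[len("run-42"):])
	fmt.Println(err) // bad signature
}
```

Splitting on the last `.` is safe because the base64url alphabet never contains a dot, so the signature segment is always the final one.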

## Documentation

- [docs/operations.md](docs/operations.md) — install, first run,
  troubleshooting
- [docs/architecture.md](docs/architecture.md) — packages, state
  machine, protocol, safety model
- [docs/test-suite.md](docs/test-suite.md) — what each stage measures
- [docs/configuration.md](docs/configuration.md) — every YAML config
  knob, profiles, thresholds
- [docs/api-reference.md](docs/api-reference.md) — HTTP API with
  request/response schemas, SSE events
- [docs/database.md](docs/database.md) — SQLite schema, tables,
  entity relationships
- [docs/development.md](docs/development.md) — dev setup, building,
  testing, adding stages

## Quick start (local, against QEMU)