docs: comprehensive documentation expansion

Add 4 new doc files (configuration reference, development guide, API
reference with full request/response schemas, database schema), expand
the README with a feature list and how-it-works walkthrough, fix
missing Firmware and Burn stages in architecture.md and test-suite.md,
add threshold engine and host-mode agent sections, and add godoc
comments to 11 packages and 6 model types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-23 18:37:26 -04:00
parent 17ec55cb85
commit 8367ec2a9f
18 changed files with 1548 additions and 10 deletions
@@ -0,0 +1,193 @@
# Development guide
How to build, test, and contribute to the vetting orchestrator and
agent.
## Prerequisites
| Tool | Version | Notes |
|------|---------|-------|
| Go | 1.22+ | Pure Go — no cgo required. |
| templ | latest | `go install github.com/a-h/templ/cmd/templ@latest` |
| make | any | GNU Make on Linux/macOS/WSL; `make` ships with Git for Windows. |
| mkosi | 25.3+ | Only needed for `make live-image`. Linux/WSL only. |
Windows hosts can build and test everything except `live-image` and
`e2e`. Those targets require a real Linux userspace — use WSL:
`wsl make live-image`.
## Repository structure
```
cmd/
vetting/ orchestrator binary — HTTP server, dispatcher, runner
vetting-agent/ agent binary — dual-mode (live-image + host-mode)
internal/
config/ YAML loader, ProfileRegistry (quick/deep/soak)
db/ SQLite open + embedded migrations (pure Go via modernc.org/sqlite)
model/ Plain structs: Host, Run, Stage, SubStep, Measurement, SpecDiff
store/ Repository layer — hand-written SQL, no ORM
orchestrator/ State machine, dispatcher, runner, WoL, HMAC tokens, iperf supervisor
api/ HTTP handlers — agent_handlers.go + ui_handlers.go
httpserver/ chi router assembly (exists to break api ↔ orchestrator import cycle)
web/ Embedded static assets + compiled Templ templates
pxe/ dnsmasq subprocess supervisor + per-MAC iPXE script generator
events/ In-process SSE hub (fan-out to browser clients)
logs/ Per-run flat-file writer + SSE fan-out
spec/ Expected-vs-actual hardware diff engine
notify/ Pluggable notifier registry (ntfy, Discord, SMTP)
report/ HTML + JSON report generation
hold/ Per-run SSH key issuance for FailedHolding
janitor/ Retention-based cleanup (artifact + log files)
agent/
runner.go In-image agent: claim loop, stage dispatch, heartbeat, log forwarder
client.go HTTP client for orchestrator API
sensor_mux.go Thermal + performance metric sidecar
bootstate/ Kernel cmdline parser (run_id, mac, orchestrator_url, token)
hostmode/ Persistent host-mode reporter (systemd service)
probes/ Hardware interrogation (lshw, dmidecode, smartctl, etc.)
tests/ Per-stage test implementations
live-image/ mkosi config + scripts for Debian live image
deploy/ systemd unit, install.sh, pxe-setup.sh, example config
docs/ You are here
test/e2e/ Build-tagged QEMU + PXE full-stack integration test
```
**Key architectural insight:** `internal/httpserver` exists solely to
break the `api ↔ orchestrator` import cycle. The `internal/` tree is
the orchestrator binary's code; the `agent/` tree is the agent
binary's code. They share only `internal/model` (plain structs) and
`internal/spec` (diff engine, used by the agent's inventory probe and
the orchestrator's SpecValidate resolver).
## Building
| Target | Command | Description |
|--------|---------|-------------|
| Everything | `make all` | Build orchestrator + agent for host OS. |
| Orchestrator | `make orchestrator` | Host OS binary (`bin/vetting`). |
| Orchestrator (Linux) | `make orchestrator-linux` | Cross-compile to `bin/vetting-linux-amd64`. |
| Agent | `make agent` | Host OS binary (dev/testing only). |
| Agent (Linux) | `make agent-linux` | Cross-compile to `bin/vetting-agent.linux-amd64`. |
| Templates | `make templ` | Regenerate `.go` files from `.templ` sources. Run before building if templates changed. |
| Live image | `make live-image` | Build Debian live image via mkosi (Linux/WSL only). |
| Release bundle | `make release` | Slim tarball: binaries + deploy scripts + VERSION pointer. |
| Tidy | `make tidy` | `go mod tidy`. |
| Format | `make fmt` | `go fmt ./...`. |
| Lint | `make vet` | `go vet ./...`. |
| Clean | `make clean` | Remove `bin/`, `build/`, `tmp/`, `out/`, `dist/`. |
Build flags: the git SHA is baked into the binary via
`-ldflags "-X vetting/internal/version.GitSHA=<sha>"`.
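
As a minimal sketch of how that flag works (not the repository's actual `version` package): the linker's `-X importpath.name=value` option overwrites a package-level string variable at link time. The example uses package `main` so it runs standalone; the `"dev"` default and the `versionString` helper are assumptions made for illustration.

```go
package main

import "fmt"

// GitSHA is a placeholder that the linker can overwrite at build time.
// Because this sketch lives in package main, the flag would be:
//
//	go build -ldflags "-X main.GitSHA=<sha>"
//
// In the repository the target is vetting/internal/version.GitSHA instead.
// The "dev" default is an assumption, not taken from the real code.
var GitSHA = "dev"

// versionString is a hypothetical helper that formats the SHA for a
// startup banner or similar.
func versionString() string {
	return "vetting " + GitSHA
}

func main() {
	fmt.Println(versionString())
}
```

Built without the flag, the program prints `vetting dev`. The `-X` option only works on string variables, so the target must not be a constant or a non-string type.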
## Running locally
```bash
make run
# → builds orchestrator, launches with deploy/vetting.example.yaml
# → http://localhost:8080
```
The example config binds to `127.0.0.1:8080`, disables PXE, and uses
`./var/` relative paths for the database, artifacts, and logs. Edit
`deploy/vetting.example.yaml` to tune for your dev environment.
For a QEMU walkthrough (register a host, PXE-boot a VM, watch the
pipeline), see [operations.md § First vetting run](operations.md#first-vetting-run).
## Testing
| Command | What it does |
|---------|--------------|
| `make test` | Unit + smoke tests across all packages. Cross-platform. |
| `make test-race` | Same tests with Go's race detector (`-race -count=1`). |
| `make vet` | `go vet ./...` — catches common mistakes. |
| `make e2e` | QEMU + PXE full-stack integration test. Requires Linux root, a built live image, and a running orchestrator with a registered host and queued run. |
**Test design:**
- Tests use real SQLite (in-memory or temp file) — no mocking the
database.
- The `agent/tests/fakes/` directory contains mock binaries
(`dmidecode`, `stress-ng`, etc.) used by agent probe tests.
- E2E tests are build-tagged with `-tags=e2e` and live in
`test/e2e/qemu_test.go`.
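
One common way to put fake binaries in front of real ones is to prepend their directory to `PATH` for the duration of a test. The sketch below shows that technique in isolation. It is an assumption about the approach, not a copy of how the agent probe tests wire `agent/tests/fakes/`; `withFakes`, `runDemo`, and the fake script contents are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// withFakes prepends dir to PATH, runs fn, then restores the previous PATH.
func withFakes(dir string, fn func() error) error {
	old := os.Getenv("PATH")
	if err := os.Setenv("PATH", dir+string(os.PathListSeparator)+old); err != nil {
		return err
	}
	defer os.Setenv("PATH", old)
	return fn()
}

// runDemo writes a throwaway fake "dmidecode" script into a temp dir and
// invokes it by bare name, the way a probe would. Unix-only: it relies on
// a shebang line and the executable bit.
func runDemo() (string, error) {
	dir, err := os.MkdirTemp("", "fakes")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	script := "#!/bin/sh\necho 'Vendor: FakeBIOS'\n"
	if err := os.WriteFile(filepath.Join(dir, "dmidecode"), []byte(script), 0o755); err != nil {
		return "", err
	}

	var out []byte
	err = withFakes(dir, func() error {
		var runErr error
		// exec.Command resolves the bare name against the current PATH,
		// so the fake shadows any real dmidecode on the system.
		out, runErr = exec.Command("dmidecode").Output()
		return runErr
	})
	return string(out), err
}

func main() {
	out, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Because the fake emits canned output, a probe test can assert on parsed fields without root privileges or real hardware.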
## Adding a new test stage
1. Add a `State<Name>` constant to `internal/model/model.go`.
2. Wire it into `internal/orchestrator/statemachine.go` — both the
forward transition table and the stage-for-state lookup.
3. Add the stage name to `DefaultStages()` in
`internal/config/profiles.go`.
4. Add a `case "<Name>":` to the `runStage` switch in
`agent/runner.go`.
5. Drop the implementation into `agent/tests/<name>.go`.
6. If the stage is **orchestrator-owned** (like SpecValidate or
Reporting), add a `resolve<Name>` helper to
`internal/api/agent_handlers.go` and call it from `resultAdvance`.
7. Add the stage to `vetting.stages` in
`deploy/vetting.example.yaml`.
See [test-suite.md](test-suite.md) for what each existing stage
measures and its pass/fail criteria.
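
The agent-side and state-machine steps above can be pictured with a compressed, single-file sketch. Everything here is a stand-in: the `State` type, the map-based tables, the `FanCheck` stage, and `runFanCheck` are hypothetical, and the real code in `internal/model`, `internal/orchestrator`, and `agent/` may be shaped differently.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the run-state type in internal/model.
type State string

const (
	StateBurn      State = "Burn"      // an existing stage (constant name assumed)
	StateFanCheck  State = "FanCheck"  // step 1: the new State<Name> constant
	StateReporting State = "Reporting" // orchestrator-owned stage
)

// forward stands in for the forward transition table (step 2).
var forward = map[State]State{
	StateBurn:     StateFanCheck, // splice the new state into the chain
	StateFanCheck: StateReporting,
}

// stageForState stands in for the stage-for-state lookup (step 2).
var stageForState = map[State]string{
	StateBurn:     "Burn",
	StateFanCheck: "FanCheck",
}

// runFanCheck is a placeholder for agent/tests/<name>.go (step 5).
func runFanCheck() error { return nil }

// runStage mirrors the shape of the agent-side switch (step 4).
func runStage(name string) error {
	switch name {
	case "Burn":
		return nil // existing implementation elided
	case "FanCheck":
		return runFanCheck()
	default:
		return fmt.Errorf("unknown stage %q", name)
	}
}

func main() {
	// Walk the chain from Burn until a state with no agent stage is reached.
	for s := StateBurn; ; s = forward[s] {
		stage, ok := stageForState[s]
		if !ok {
			fmt.Println("handed off at", s)
			return
		}
		fmt.Println(stage, "->", runStage(stage))
	}
}
```

Forgetting either table entry in step 2 is the most likely mistake: the constant compiles fine, but the run never reaches the new stage or the agent is never told to execute it.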
## Adding a new notifier
1. Implement the `notify.Notifier` interface (single `Send` method)
in a new file under `internal/notify/`.
2. Register the new type in the notifier builder (the switch in
`internal/notify/build.go` or equivalent factory).
3. Add the type-specific config fields to the `Notifier` struct in
`internal/config/config.go`.
4. Document the new notifier type in
[configuration.md § notifiers](configuration.md#notifiers).
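
A rough sketch of steps 1 to 3 follows. The guide only states that `notify.Notifier` has a single `Send` method, so the signature used here is a guess, and the `Config` struct, the `webhook` type, and `build` are hypothetical stand-ins for the real config struct and factory.

```go
package main

import (
	"errors"
	"fmt"
)

// Notifier approximates notify.Notifier. The parameter list is an
// assumption; only the single-method shape comes from the guide.
type Notifier interface {
	Send(subject, body string) error
}

// Config is a hypothetical stand-in for the Notifier struct in
// internal/config/config.go.
type Config struct {
	Type string
	URL  string // step 3: type-specific field for the example "webhook" type
}

// webhookNotifier is the new implementation (step 1).
type webhookNotifier struct {
	url  string
	sent []string // recorded instead of performing real network I/O
}

// Send records the message. A real implementation would POST to w.url.
func (w *webhookNotifier) Send(subject, body string) error {
	w.sent = append(w.sent, subject+": "+body)
	return nil
}

// build stands in for the notifier factory switch (step 2).
func build(cfg Config) (Notifier, error) {
	switch cfg.Type {
	case "webhook":
		if cfg.URL == "" {
			return nil, errors.New("webhook notifier: url is required")
		}
		return &webhookNotifier{url: cfg.URL}, nil
	default:
		return nil, fmt.Errorf("unknown notifier type %q", cfg.Type)
	}
}

func main() {
	n, err := build(Config{Type: "webhook", URL: "https://example.invalid/hook"})
	if err != nil {
		fmt.Println("build:", err)
		return
	}
	fmt.Println("send error:", n.Send("run 42", "passed"))
}
```

Validating required fields in the factory rather than in `Send` means a misconfigured notifier fails at startup instead of at the first notification.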
## Code conventions
- **No cgo** — the SQLite driver is `modernc.org/sqlite` (pure Go).
Builds cross-compile to Linux from Windows/macOS without a C
toolchain.
- **Hand-written SQL** — no ORM. Queries are explicit and testable.
Each store method is a single SQL statement or a short transaction.
- **Templ for UI** — `.templ` files compile to type-safe Go functions.
The report module uses `html/template` instead (self-contained HTML
with inlined CSS).
- **chi for routing** — `github.com/go-chi/chi/v5`. Standard
middleware stack: `RealIP`, `Recoverer`, `Logger`.
- **Error handling** — fail-soft in SSE/tile paths (log and skip),
fail-hard in store/migration paths (return error up).
- **Log convention** — `log.Printf` with a context prefix
(e.g. `"claim: seed stages run %d: %v"`).
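
The error-handling and logging conventions can be illustrated side by side. This is a sketch using only the standard library; `renderTiles`, `applyMigrations`, and `errBoom` are hypothetical names, not functions from the codebase.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errBoom is a sample failure used by the demo below.
var errBoom = errors.New("boom")

// renderTiles is fail-soft: a tile that fails to render is logged with a
// context prefix and skipped, so one bad host does not blank the whole page.
func renderTiles(ids []int, render func(int) (string, error)) []string {
	var out []string
	for _, id := range ids {
		tile, err := render(id)
		if err != nil {
			log.Printf("tiles: render host %d: %v", id, err)
			continue
		}
		out = append(out, tile)
	}
	return out
}

// applyMigrations is fail-hard: the first error stops the loop and is
// returned, wrapped with context, for the caller to handle.
func applyMigrations(steps []func() error) error {
	for i, step := range steps {
		if err := step(); err != nil {
			return fmt.Errorf("migrate: step %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	tiles := renderTiles([]int{1, 2, 3}, func(id int) (string, error) {
		if id == 2 {
			return "", errBoom
		}
		return fmt.Sprintf("tile-%d", id), nil
	})
	fmt.Println(tiles)

	err := applyMigrations([]func() error{
		func() error { return nil },
		func() error { return errBoom },
	})
	fmt.Println(err, errors.Is(err, errBoom))
}
```

Wrapping with `%w` keeps the original error reachable through `errors.Is`, which is useful when a caller needs to distinguish failure causes.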
## CI/CD
Three Gitea Actions workflows in `.gitea/workflows/`:
| Workflow | Trigger | What it does |
|----------|---------|--------------|
| `ci.yml` | Push to main + PRs | Templ generate, tidy check, vet, build (native + linux), test with race detector + coverage. |
| `release.yml` | Push to main (skips doc/test paths) | Detects `live-image/VERSION` changes → builds + publishes live image to registry. Always builds slim bundle → publishes to `vetting/latest/`. |
| `e2e.yml` | Manual dispatch | Builds live image + orchestrator, installs QEMU + deps, runs `make e2e`. |
**Release bundle structure:**
```
vetting-bundle/
bin/
vetting-linux-amd64
vetting-agent.linux-amd64
live-image/
VERSION # pointer — actual vmlinuz/initrd.img fetched on install
install.sh
pxe-setup.sh
vetting.service
vetting.production.yaml
ipxe-shas.txt
VERSION # git SHA
```
The ~30 MB bundle is published on every push to main. The ~300 MB live
image (`vmlinuz` + `initrd.img`) is published separately under
`live-image/<version>/` and only rebuilds when `live-image/VERSION`
changes.