Post-repair hardware validation pipeline for Proxmox cluster hosts. Go orchestrator + in-image agent + mkosi live image + bundled dnsmasq PXE + SQLite + HTMX/SSE UI + notify registry + janitor + full docs.
@@ -0,0 +1,85 @@
# Vetting

Post-repair hardware validation pipeline for Proxmox cluster hosts.
Register a host, click **Start Vetting**, and the orchestrator will
PXE-boot it into a custom Linux live image and run it through a
consistent battery of tests (CPU stress, RAM stress, SMART, disk I/O,
network throughput, GPU, PSU telemetry). Pass → auto-shutdown + HTML
report. Fail → pipeline halts, SSH drops in, notification fires.
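
The pass/fail behaviour described above can be sketched as a fail-fast loop. This is only an illustration of the control flow, not the orchestrator's actual code: the `stage_*` functions and `run_pipeline` are hypothetical names, and the real stage list is documented in docs/test-suite.md.

```bash
# Hypothetical stand-ins for real stages; each returns 0 on pass, non-zero on fail.
stage_cpu()   { return 0; }
stage_ram()   { return 0; }
stage_smart() { return 1; }   # simulate a failing SMART check
stage_disk()  { return 0; }

# Run the named stages in order and stop at the first failure.
run_pipeline() {
  for stage in "$@"; do
    if "$stage"; then
      echo "pass: $stage"
    else
      # Fail path: nothing after this stage runs.
      echo "fail: $stage (pipeline halts, SSH drops in, notification fires)"
      return 1
    fi
  done
  # Pass path: reached only when every stage returned 0.
  echo "all stages passed (auto-shutdown + HTML report)"
}

run_pipeline stage_cpu stage_ram stage_smart stage_disk || echo "pipeline halted"
```

Because the loop returns at the first failing stage, `stage_disk` never runs in this example, which mirrors the "pipeline halts" wording above.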

Built for solo-operator home labs: one Go binary, SQLite + flat files,
HTMX + SSE UI, bundled dnsmasq, optional ntfy / Discord / SMTP
notifications.

## Documentation

- [docs/operations.md](docs/operations.md): install + first run +
  troubleshooting
- [docs/architecture.md](docs/architecture.md): packages, state
  machine, protocol
- [docs/test-suite.md](docs/test-suite.md): what each stage measures

## Quick start (local, against QEMU)

```bash
# 1. Build
make all

# 2. Generate an admin password hash and a session secret, then paste
#    both into the config.
./bin/gen-admin-password 'your-password'
# Edit deploy/vetting.example.yaml:
#   auth.admin_password_bcrypt = <that hash>
#   auth.session_secret_hex    = <output of `openssl rand -hex 32`>

# 3. Run
./bin/vetting --config deploy/vetting.example.yaml
# → http://localhost:8080
```
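
Step 2 of the quick start takes its session secret from `openssl rand -hex 32`, which prints 64 hex characters. A minimal sketch for generating that value and sanity-checking it before pasting it into the config follows. The helper names `gen_session_secret` and `is_valid_secret` are hypothetical, the `/dev/urandom` fallback is an addition for machines without `openssl`, and whether the orchestrator itself enforces the 64-character length is an assumption.

```bash
# Generate 32 random bytes as hex. Prefer openssl, as in the quick start;
# fall back to /dev/urandom when openssl is not installed.
gen_session_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Succeed only for exactly 64 lowercase hex characters.
is_valid_secret() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{64}$'
}

secret="$(gen_session_secret)"
if is_valid_secret "$secret"; then
  echo "session secret looks well-formed; paste it into auth.session_secret_hex"
else
  echo "unexpected secret format" >&2
fi
```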

For a full end-to-end QEMU walk-through (bridge setup, host registration,
PXE boot), see [docs/operations.md § First vetting run](docs/operations.md#first-vetting-run).

## Production install (Proxmox LXC)

```bash
make orchestrator-linux
scp -r bin deploy lxc:/opt/vetting/
ssh lxc "cd /opt/vetting && sudo ./deploy/install.sh"
# Edit /etc/vetting/vetting.yaml, then:
ssh lxc "sudo systemctl enable --now vetting"
```

See [docs/operations.md § Install](docs/operations.md#install-proxmox-lxc)
for the full walkthrough.

## Repository layout

```
cmd/         orchestrator + agent entrypoints
internal/    core packages (see docs/architecture.md for the map)
agent/       in-image agent logic (claim loop, stage dispatch, probes)
live-image/  mkosi config for the PXE-bootable Debian live image
deploy/      systemd unit + install.sh + example config
docs/        operator + developer docs
test/e2e/    build-tag-gated QEMU + PXE full-stack test
tools/       small CLI helpers (e.g. gen-admin-password)
```

## Development

- `make test`: Go unit + smoke tests (cross-platform)
- `make vet`: `go vet` on the whole module
- `make live-image`: Linux-only; run under WSL from Windows
- `make e2e`: requires Linux root + live image + running orchestrator
- `make run`: build + launch the orchestrator with the example config

Windows hosts: everything except `live-image` and `e2e` works natively.
The live image build calls `mkosi`, which needs a real Linux userspace,
and `e2e` requires Linux root, so use WSL for those two targets.
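
As a rough guide to the platform split above, the sketch below prints which of the listed make targets are expected to work for a given `uname -s` value. The function name `available_targets` is hypothetical, and the check only covers the operating system; `e2e` still needs root, a built live image, and a running orchestrator.

```bash
# Print the make targets expected to work on the given OS name
# (as reported by `uname -s`). WSL reports "Linux", so it gets the full list.
available_targets() {
  case "$1" in
    Linux) echo "test vet run live-image e2e" ;;
    *)     echo "test vet run" ;;   # live-image and e2e are Linux-only
  esac
}

available_targets "$(uname -s)"
```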

## Status

All six phases in the original plan are implemented. The E2E QEMU
harness is wired in `test/e2e/qemu_test.go` but requires a running
orchestrator, a registered host, and a queued run as preconditions.
It is a developer-facing integration harness, not a unit test.