josh 3656af9823
CI / Lint + build + test (push) Successful in 1m47s
Release / release (push) Successful in 10m8s
feat(end-of-run): reboot to local disk instead of powering off
Completed runs now reboot the host and fall through iPXE to the next
boot device (local disk) instead of powering off. Three coordinated
changes:

- pxe/ipxe: NoActiveRunScript exits iPXE (drops to next boot entry)
  instead of `sleep 10; poweroff`. Without this, a Completed reboot
  just loops through PXE and gets told to poweroff.
- api/agent_handlers: heartbeat returns cmd=reboot (was cmd=shutdown)
  when the run reaches Completed.
- agent/runner: runs `systemctl reboot` (with `shutdown -r now`
  fallback) in response to cmd=reboot.

Operator cancel still powers off — powerOffAndReturn is unchanged
because a cancel means the operator wants the host idle so they can
walk up to it, not back in rotation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 22:45:11 -04:00

Vetting

Post-repair hardware validation pipeline for Proxmox cluster hosts. Register a host, click Start Vetting, and the orchestrator will PXE-boot it into a custom Linux live image and run it through a consistent battery of tests (CPU stress, RAM stress, SMART, disk I/O, network throughput, GPU, PSU telemetry). Pass → auto-shutdown + HTML report. Fail → pipeline halts, SSH drops in, notification fires.

Built for solo-operator home labs: one Go binary, SQLite + flat files, HTMX + SSE UI, bundled dnsmasq, optional ntfy / Discord / SMTP notifications.

Documentation

Quick start (local, against QEMU)

make all
./bin/vetting --config deploy/vetting.example.yaml
# → http://localhost:8080

The UI has no built-in auth — bind to loopback or LAN only, or front the service with a reverse proxy (Caddy/nginx basic-auth) if you want a password. The agent↔orchestrator channel keeps its own bearer-token auth and is unaffected.

For a full end-to-end QEMU walk-through (bridge setup, host registration, PXE boot), see docs/operations.md § First vetting run.

Production install (Proxmox LXC)

On a fresh Debian/Ubuntu LXC, as root:

curl -fsSL https://gitea.thewrightserver.net/josh/Vetting/raw/branch/main/deploy/proxmox-install.sh | bash

That installs Go (if missing), clones the repo to /opt/vetting-src, builds vetting-linux-amd64, and hands off to deploy/install.sh — which lays down the binary, systemd unit, example config, and vetting service user. Then:

# Edit /etc/vetting/vetting.yaml (server.bind + server.public_url)
sudo systemctl enable --now vetting
journalctl -fu vetting

Prefer to build yourself? The manual path:

make orchestrator-linux
scp -r bin deploy lxc:/opt/vetting/
ssh lxc "cd /opt/vetting && sudo ./deploy/install.sh"
ssh lxc "sudo systemctl enable --now vetting"

See docs/operations.md § Install for the full walkthrough.

Repository layout

cmd/                  orchestrator + agent entrypoints
internal/             core packages (see docs/architecture.md for the map)
agent/                in-image agent logic (claim loop, stage dispatch, probes)
live-image/           mkosi config for the PXE-bootable Debian live image
deploy/               systemd unit + install.sh + example config
docs/                 operator + developer docs
test/e2e/             build-tag-gated QEMU + PXE full-stack test
tools/                small CLI helpers

Development

  • make test — Go unit + smoke tests (cross-platform)
  • make vetgo vet on the whole module
  • make live-image — Linux-only; run under WSL from Windows
  • make e2e — requires Linux root + live image + running orchestrator
  • make run — build + launch the orchestrator with the example config

Windows hosts: everything except live-image and e2e works natively. The live image build calls mkosi which needs a real Linux userspace, so use WSL for those targets.

Status

All six phases in the original plan are implemented. The E2E QEMU harness is wired in test/e2e/qemu_test.go but requires a running orchestrator + registered host + queued run as preconditions — it's a developer-facing integration harness, not a unit test.

S
Description
Hardware validation pipeline
Readme 976 KiB
Languages
Go 81.1%
Shell 6.7%
templ 5.5%
CSS 3.8%
Go Template 1%
Other 1.9%