josh 506c856046
CI / Lint + build + test (push) Successful in 1m48s
Release / release (push) Successful in 2m22s
pxe: switch dnsmasq to proxy-DHCP mode on the LAN
Previously the orchestrator ran a full DHCP server on a dedicated
br-vetting bridge (10.77.0.0/24), which required a hypervisor-level
bridge + physical cabling onto that bridge for every repaired host.
Real-world bite: the LXC's br-vetting had no L2 path to the target
host's PXE NIC, so DHCPDISCOVERs never reached eth1 and PXE silently
timed out.

dnsmasq's proxy-DHCP mode is the idiomatic answer: it coexists with
the LAN's existing DHCP server (UniFi, etc.), never assigns an IP
itself, and only supplements the PXE options. No dedicated bridge,
no VLAN, no cabling changes \u2014 dnsmasq binds to the LAN interface
and layers option 66/67 + the PXE BINL on top of the real DHCP
exchange. The MAC allowlist still gates replies, so random LAN
clients booting from network get nothing.

Template switches dhcp-range=<start,end,lease> to
dhcp-range=<cidr>,proxy and replaces dhcp-boot= for first-boot ROM
clients with pxe-service= directives (the correct proxy-mode
chainload form). Validation drops the dhcp_range regex for a
net.ParseCIDR check on pxe.subnet. Config, production/example yaml,
and pxe-setup.sh swap --dhcp-range for --subnet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 12:02:49 -04:00

Vetting

Post-repair hardware validation pipeline for Proxmox cluster hosts. Register a host, click Start Vetting, and the orchestrator will PXE-boot it into a custom Linux live image and run it through a consistent battery of tests (CPU stress, RAM stress, SMART, disk I/O, network throughput, GPU, PSU telemetry). Pass → auto-shutdown + HTML report. Fail → pipeline halts, SSH drops in, notification fires.

Built for solo-operator home labs: one Go binary, SQLite + flat files, HTMX + SSE UI, bundled dnsmasq, optional ntfy / Discord / SMTP notifications.

Documentation

Quick start (local, against QEMU)

make all
./bin/vetting --config deploy/vetting.example.yaml
# → http://localhost:8080

The UI has no built-in auth — bind to loopback or LAN only, or front the service with a reverse proxy (Caddy/nginx basic-auth) if you want a password. The agent↔orchestrator channel keeps its own bearer-token auth and is unaffected.

For a full end-to-end QEMU walk-through (bridge setup, host registration, PXE boot), see docs/operations.md § First vetting run.

Production install (Proxmox LXC)

On a fresh Debian/Ubuntu LXC, as root:

curl -fsSL https://gitea.thewrightserver.net/josh/Vetting/raw/branch/main/deploy/proxmox-install.sh | bash

That installs Go (if missing), clones the repo to /opt/vetting-src, builds vetting-linux-amd64, and hands off to deploy/install.sh — which lays down the binary, systemd unit, example config, and vetting service user. Then:

# Edit /etc/vetting/vetting.yaml (server.bind + server.public_url)
sudo systemctl enable --now vetting
journalctl -fu vetting

Prefer to build yourself? The manual path:

make orchestrator-linux
scp -r bin deploy lxc:/opt/vetting/
ssh lxc "cd /opt/vetting && sudo ./deploy/install.sh"
ssh lxc "sudo systemctl enable --now vetting"

See docs/operations.md § Install for the full walkthrough.

Repository layout

cmd/                  orchestrator + agent entrypoints
internal/             core packages (see docs/architecture.md for the map)
agent/                in-image agent logic (claim loop, stage dispatch, probes)
live-image/           mkosi config for the PXE-bootable Debian live image
deploy/               systemd unit + install.sh + example config
docs/                 operator + developer docs
test/e2e/             build-tag-gated QEMU + PXE full-stack test
tools/                small CLI helpers

Development

  • make test — Go unit + smoke tests (cross-platform)
  • make vetgo vet on the whole module
  • make live-image — Linux-only; run under WSL from Windows
  • make e2e — requires Linux root + live image + running orchestrator
  • make run — build + launch the orchestrator with the example config

Windows hosts: everything except live-image and e2e works natively. The live image build calls mkosi which needs a real Linux userspace, so use WSL for those targets.

Status

All six phases in the original plan are implemented. The E2E QEMU harness is wired in test/e2e/qemu_test.go but requires a running orchestrator + registered host + queued run as preconditions — it's a developer-facing integration harness, not a unit test.

S
Description
Hardware validation pipeline
Readme 976 KiB
Languages
Go 81.1%
Shell 6.7%
templ 5.5%
CSS 3.8%
Go Template 1%
Other 1.9%