9 Commits

Author SHA1 Message Date
josh 8367ec2a9f docs: comprehensive documentation expansion
CI / Lint + build + test (push) Successful in 1m36s
Release / detect (push) Successful in 5s
Release / build-live-image (push) Has been skipped
Release / bundle (push) Successful in 49s
Add 4 new doc files (configuration reference, development guide, API
reference with full request/response schemas, database schema), expand
the README with a feature list and how-it-works walkthrough, fix
missing Firmware and Burn stages in architecture.md and test-suite.md,
add threshold engine and host-mode agent sections, and add godoc
comments to 11 packages and 6 model types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-23 18:37:26 -04:00
josh 211abdf08f feat(release): version live-image, skip rebuild+redownload when unchanged
CI / Lint + build + test (push) Successful in 1m41s
Release / detect (push) Successful in 7s
Release / build-live-image (push) Failing after 3m58s
Release / bundle (push) Has been skipped
Splits the release workflow into three jobs (detect, build-live-image,
bundle) so the ~9 min mkosi build only runs when live-image/VERSION
bumps. The slim bundle (~30 MB: orchestrator + agent + deploy scripts
+ a live-image/VERSION pointer) rebuilds every push; the ~300 MB
vmlinuz+initrd.img are published separately under the immutable
live-image/<version>/ path. install.sh compares the pointer to
/var/lib/vetting/live/VERSION and fetches the files only on mismatch,
cutting repeat-install wall-clock from ~30 s + 300 MB to ~10 s + 0 MB
on the common no-live-image-change release.
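
The pointer check described above can be sketched roughly as follows. This is an illustration, not the actual install.sh: the function name `needs_live_image_fetch` and its two arguments are hypothetical. Only the idea of comparing the bundle's live-image/VERSION pointer with /var/lib/vetting/live/VERSION comes from this commit.

```shell
# needs_live_image_fetch POINTER_FILE INSTALLED_FILE   (hypothetical helper)
# Succeeds (exit 0) when vmlinuz + initrd.img should be downloaded:
# either nothing is installed yet, or the installed VERSION differs
# from the pointer shipped in the slim bundle.
needs_live_image_fetch() {
  local pointer_file="$1" installed_file="$2" want have
  # No installed VERSION file: first install, always fetch.
  [ -r "$installed_file" ] || return 0
  # Strip whitespace so a trailing newline never causes a false mismatch.
  want=$(tr -d '[:space:]' < "$pointer_file")
  have=$(tr -d '[:space:]' < "$installed_file")
  [ "$want" != "$have" ]
}

# Example call site (paths other than the VERSION file are assumptions):
#   if needs_live_image_fetch live-image/VERSION /var/lib/vetting/live/VERSION; then
#     echo "live image changed, fetching"
#   fi
```

Keeping the decision in one small predicate makes the common case (no live-image change) a pair of file reads with no network traffic.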

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 21:04:14 -04:00
josh cf3a75591c install: stage pxe-setup.sh at /usr/local/sbin/vetting-pxe-setup
CI / Lint + build + test (push) Successful in 1m36s
Release / release (push) Successful in 2m29s
proxmox-install.sh tarball-extracts into a tempdir that gets wiped on
EXIT, so after the one-liner there's no pxe-setup.sh on disk for the
operator to run. Have install.sh drop the script + ipxe-shas.txt into
/usr/local/share/vetting/ and symlink it as
/usr/local/sbin/vetting-pxe-setup (in PATH).

pxe-setup.sh now readlink -f's BASH_SOURCE so the symlink resolves to
the share dir where ipxe-shas.txt lives, and gracefully handles the
case where install.sh already staged vmlinuz + initrd.img into
LIVE_DIR (no bundle live-image/ needed at that point).
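
A minimal sketch of the symlink resolution mentioned above, assuming GNU `readlink -f` is available. The helper name `resolve_script_dir` is hypothetical; the real pxe-setup.sh may inline this differently.

```shell
# resolve_script_dir PATH   (hypothetical helper)
# Prints the directory of the real file behind PATH, following symlinks,
# so files that live next to the real script can be located even when the
# script is invoked through a symlink elsewhere on PATH.
resolve_script_dir() {
  local real
  real=$(readlink -f "$1") || return 1
  dirname "$real"
}

# Inside a bash script the argument would typically be "${BASH_SOURCE[0]}":
#   share_dir=$(resolve_script_dir "${BASH_SOURCE[0]}")
#   shas_file="$share_dir/ipxe-shas.txt"
```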

Update the trailing hint in proxmox-install.sh and the operations
runbook to surface the new `sudo vetting-pxe-setup ...` command.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 12:10:23 -04:00
josh bcbbc35489 docs+e2e: document proxy-DHCP topology; default e2e bridge to LAN
CI / Lint + build + test (push) Successful in 1m37s
Release / release (push) Has been cancelled
Rewrites the PXE section of the ops runbook around the new proxy-DHCP
model (no dedicated bridge, coexists with UniFi/pfSense/etc.) and
swaps the e2e test's default bridge + orchestrator URL to match. The
e2e file now calls out the LAN-DHCP precondition in its header so
future-me (or CI) doesn't hang at PXE wondering why nothing answers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 12:07:05 -04:00
josh f927a4a66b install.sh: stage live image and auto-restart on upgrade
CI / Lint + build + test (push) Successful in 1m38s
Release / release (push) Successful in 1m45s
Single-command upgrades were leaving /var/lib/vetting/live/ stale on
PXE-enabled LXCs because install.sh explicitly punted live-image
staging to pxe-setup.sh. That was right when make-release ran on a
dev box, but the new registry-pull flow ships vmlinuz+initrd.img
inside the bundle — they should land in place during every install.

install.sh now:
  - auto-detects live-image/{vmlinuz,initrd.img} (release bundle
    layout) or ../live-image/build/ (repo dev checkout) and stages
    them into --live-dir (default /var/lib/vetting/live).
  - restarts vetting.service when already enabled, so the
    curl | sudo bash one-liner is the full upgrade loop. First-
    install path still leaves the service stopped for config edits.
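
The layout auto-detection in the first bullet could look something like the sketch below. The function name `find_live_image_dir` is hypothetical, and treating the argument as the directory containing install.sh is an assumption; only the two candidate layouts come from this commit.

```shell
# find_live_image_dir BASE_DIR   (hypothetical helper)
# Prints the first directory that contains both vmlinuz and initrd.img,
# checking the release-bundle layout before the repo dev-checkout layout.
# Returns 1 when neither layout is present.
find_live_image_dir() {
  local base="$1" dir
  for dir in "$base/live-image" "$base/../live-image/build"; do
    if [ -f "$dir/vmlinuz" ] && [ -f "$dir/initrd.img" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

A caller might then copy the two files into the live dir with `install -m 0644 "$src/vmlinuz" "$src/initrd.img" "$live_dir/"`, and gate the restart on `systemctl is-enabled --quiet vetting.service` so a first install leaves the service stopped.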

pxe-setup.sh's own live-image copy is now redundant on upgrade but
still runs for first-time PXE setup (it also writes the pxe: block
of vetting.yaml, which install.sh has no business touching).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 10:38:34 -04:00
josh f188c7add4 proxmox-install: fetch prebuilt bundle from Gitea package registry
CI / Lint + build + test (push) Has been cancelled
Release / release (push) Has been cancelled
Drops the per-install Go toolchain dance + source build. The installer
now just curls the bundle from
${REGISTRY_URL}/api/packages/${PACKAGE_OWNER}/generic/vetting/${VETTING_VERSION}/vetting-bundle.tar.gz,
extracts it, and hands off to the bundled install.sh with explicit
--binary / --agent-binary paths so the in-bundle layout is picked up.

Default version is `latest` (rolling alias, overwritten by release.yml
on each push to main). Pin via `VETTING_VERSION=sha-abc1234 curl ... |
bash` when rolling back or testing a specific commit.
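
For illustration, the bundle URL above can be assembled as in this sketch. The function name `bundle_url` and the example host are hypothetical; the path template and the `latest` default are taken from this commit message.

```shell
# bundle_url REGISTRY_URL PACKAGE_OWNER [VETTING_VERSION]   (hypothetical helper)
# Prints the download URL for vetting-bundle.tar.gz in the generic
# package registry. VETTING_VERSION defaults to the rolling "latest" alias.
bundle_url() {
  local registry_url="${1%/}" owner="$2" version="${3:-latest}"
  printf '%s/api/packages/%s/generic/vetting/%s/vetting-bundle.tar.gz\n' \
    "$registry_url" "$owner" "$version"
}

# Example (git.example.com and the owner are placeholders):
#   curl -fsSL "$(bundle_url https://git.example.com josh sha-abc1234)" -o vetting-bundle.tar.gz
```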

Removes the `apt install build-essential git` + Go toolchain download
+ templ install + `make orchestrator-linux agent-linux` path — the CI
workflow already produced all of that. Install time on a cold LXC
drops from minutes to under a minute, and live-image kernel/initrd
now arrive with every install instead of requiring a separate WSL
build.

Also rewrites docs/operations.md's install section around the
one-liner, keeps the `make release` + scp path as the offline
fallback, and swaps the upgrade section to just "rerun the one-liner."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 02:16:02 -04:00
josh a5055b3c7a Automate PXE setup: release bundle + pxe-setup.sh + startup validation
CI / Lint + build + test (push) Has been cancelled
Collapses the LXC side of PXE enablement from a six-step manual dance
(build, fetch iPXE, scp, bridge, hand-edit yaml) into:

  make release                   # dev box (Linux/WSL)
  scp bundle.tar.gz lxc:/tmp/
  sudo ./install.sh              # base install, unchanged
  sudo ./pxe-setup.sh --interface ... --dhcp-range ... --orchestrator-url ...

pxe-setup.sh fetches iPXE from boot.ipxe.org, verifies against pinned
SHA256s in deploy/ipxe-shas.txt (fail-closed), places vmlinuz/initrd.img
from the bundle, and rewrites only the pxe: block of vetting.yaml.
Idempotent; --force gates overwriting a hand-edited block.
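
The fail-closed check against pinned hashes might be sketched like this. It assumes deploy/ipxe-shas.txt uses the plain `sha256sum` output format (`<hash>  <filename>`), which this commit does not confirm; the function name `verify_pinned_sha` is hypothetical.

```shell
# verify_pinned_sha FILE SHAS_FILE   (hypothetical helper)
# Succeeds only when FILE's SHA256 matches the entry pinned for its
# basename in SHAS_FILE. A missing entry counts as a failure (fail-closed).
verify_pinned_sha() {
  local file="$1" shas="$2" name want have
  name=$(basename "$file")
  # Look up the pinned hash for this filename; empty if not listed.
  want=$(awk -v n="$name" '$2 == n { print $1; exit }' "$shas")
  [ -n "$want" ] || { echo "no pinned SHA256 for $name" >&2; return 1; }
  have=$(sha256sum "$file" | awk '{ print $1 }')
  [ "$have" = "$want" ] || { echo "SHA256 mismatch for $name" >&2; return 1; }
}
```

Treating "not listed" the same as "mismatch" means a newly fetched binary is never trusted by default.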

Adds Supervisor.Validate() — called before dnsmasq spawn — so typo'd
configs fail at orchestrator startup with clear errors naming the
missing file or yaml key, instead of silently serving broken TFTP
until a real host tries to PXE-boot. Nine tests cover missing files,
bogus interface, malformed dhcp_range, bad orchestrator_url, and
aggregate reporting.

Hypervisor bridge creation stays documented (LXC can't do it) but
everything downstream of the bridge is now scripted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 01:38:43 -04:00
josh 42da48864f Remove operator auth — trust the LAN
CI / Lint + build + test (push) Failing after 5m15s
Can't log in from a fresh LXC deploy, and the service is LAN-only by
design. Rip out the whole bcrypt-password / signed-cookie session
layer: internal/auth, login templates, gen-admin-password binary +
Makefile targets, auth config block, login/logout routes and the
RequireSession middleware wrap. Agent bearer-token auth on
/api/v1/runs/{id}/* is untouched.

Operators who want a password can front the service with a reverse
proxy — noted in README and docs/operations.md.
2026-04-17 22:31:49 -04:00
josh 9bb4b09a04 Initial commit: full Phases 1-6 implementation
CI / Lint + build + test (push) Has been cancelled
Post-repair hardware validation pipeline for Proxmox cluster hosts.
Go orchestrator + in-image agent + mkosi live image + bundled dnsmasq
PXE + SQLite + HTMX/SSE UI + notify registry + janitor + full docs.
2026-04-17 21:32:10 -04:00