Single-command upgrades were leaving /var/lib/vetting/live/ stale on
PXE-enabled LXCs because install.sh explicitly punted live-image
staging to pxe-setup.sh. That was right when make-release ran on a
dev box, but the new registry-pull flow ships vmlinuz+initrd.img
inside the bundle — they should land in place during every install.
install.sh now:
- auto-detects live-image/{vmlinuz,initrd.img} (release bundle
layout) or ../live-image/build/ (repo dev checkout) and stages
them into --live-dir (default /var/lib/vetting/live).
- restarts vetting.service when already enabled, so the
curl | sudo bash one-liner is the full upgrade loop. First-
install path still leaves the service stopped for config edits.
pxe-setup.sh's own live-image copy is now redundant on upgrade but
still runs for first-time PXE setup (it also writes the pxe: block
of vetting.yaml, which install.sh has no business touching).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops the per-install Go toolchain dance + source build. The installer
now just curls the bundle from
${REGISTRY_URL}/api/packages/${PACKAGE_OWNER}/generic/vetting/${VETTING_VERSION}/vetting-bundle.tar.gz,
extracts it, and hands off to the bundled install.sh with explicit
--binary / --agent-binary paths so the in-bundle layout is picked up.
Default version is `latest` (rolling alias, overwritten by release.yml
on each push to main). Pin via `VETTING_VERSION=sha-abc1234 curl ... |
bash` when rolling back or testing a specific commit.
Removes the `apt install build-essential git` + Go toolchain download
+ templ install + `make orchestrator-linux agent-linux` path — the CI
workflow already produced all of that. Install time on a cold LXC
drops from minutes to under a minute, and live-image kernel/initrd
now arrive with every install instead of requiring a separate WSL
build.
Also rewrites docs/operations.md's install section around the
one-liner, keeps the `make release` + scp path as the offline
fallback, and swaps the upgrade section to just "rerun the one-liner."
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The production yaml ships `interface: "" # e.g. "eth0"`.
The old extractor did `gsub(/^"|"$/, "")` which only strips outer quotes, so
with an inline comment containing quotes it produced garbage like
`" # e.g. "eth0`, tripping the idempotency check.
Replaces the two inline extractors with one `extract_yaml_value` helper
that first tries to match `"[^"]*"` (grabbing only the first quoted
value), falling back to strip-trailing-comment + trim for unquoted
values.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Git on Windows dropped the exec bit when the files were first committed,
so `sudo ./pxe-setup.sh` on the LXC errored with "command not found".
Fix via `git update-index --chmod=+x`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Collapses the LXC side of PXE enablement from a six-step manual dance
(build, fetch iPXE, scp, bridge, hand-edit yaml) into:
make release # dev box (Linux/WSL)
scp bundle.tar.gz lxc:/tmp/
sudo ./install.sh # base install, unchanged
sudo ./pxe-setup.sh --interface ... --dhcp-range ... --orchestrator-url ...
pxe-setup.sh fetches iPXE from boot.ipxe.org, verifies against pinned
SHA256s in deploy/ipxe-shas.txt (fail-closed), places vmlinuz/initrd.img
from the bundle, and rewrites only the pxe: block of vetting.yaml.
Idempotent; --force gates overwriting a hand-edited block.
Adds Supervisor.Validate() — called before dnsmasq spawn — so typo'd
configs fail at orchestrator startup with clear errors naming the
missing file or yaml key, instead of silently serving broken TFTP
until a real host tries to PXE-boot. Nine tests cover missing files,
bogus interface, malformed dhcp_range, bad orchestrator_url, and
aggregate reporting.
Hypervisor bridge creation stays documented (LXC can't do it) but
everything downstream of the bridge is now scripted.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The agent binary is never run on the LXC, but it has to be present
so /assets/vetting-agent-linux-amd64 can serve it to target hosts
via the quick-register one-liner. Install was failing because only
orchestrator-linux was being built.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Regenerated _templ.go files embed the templ source path at runtime,
which differs between the dev machine and /opt/vetting-src on the
target. That left tracked files modified after every build, and the
next upgrade-run hit "local changes would be overwritten by checkout"
and aborted. /opt/vetting-src is script-managed, so `git reset --hard
origin/<branch>` is the right semantics.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
vetting-agent gains a `host` subcommand that runs as a systemd service
installed by the quick-register one-liner, POSTing every 30s to
/api/v1/hosts/{mac}/heartbeat so the dashboard tile shows "online" or
"Nm ago" without waiting on WoL. Ships dormant client code for the
Phase 2 reboot_for_vetting command so the server can flip it on later
without a binary redeploy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Can't log in from a fresh LXC deploy, and the service is LAN-only by
design. Rip out the whole bcrypt-password / signed-cookie session
layer: internal/auth, login templates, gen-admin-password binary +
Makefile targets, auth config block, login/logout routes and the
RequireSession middleware wrap. Agent bearer-token auth on
/api/v1/runs/{id}/* is untouched.
Operators who want a password can front the service with a reverse
proxy — noted in README and docs/operations.md.
Service was crashing on every boot because vetting.example.yaml uses
./var/... relative paths that resolve to / under ProtectSystem=strict.
Ship a separate vetting.production.yaml with absolute /var/lib/vetting
+ /var/log/vetting paths that match the unit's ReadWritePaths, and
have install.sh copy that one. Also move StartLimit* keys into [Unit]
to silence the 'Unknown key' warning on modern systemd.
proxmox-install.sh + install.sh left operators with no way to
generate the bcrypt hash on the LXC — 'vetting gen-admin-password'
was suggested in the post-install message but the binary has no
subcommands. Cross-build gen-admin-password-linux-amd64 during the
one-liner flow and drop it into /usr/local/bin.
deploy/proxmox-install.sh bootstraps a fresh LXC end-to-end: apt
prereqs, Go toolchain (if missing), git clone, build, then hands off
to deploy/install.sh. README documents the curl|bash invocation.