Commit Graph

11 Commits

Author SHA1 Message Date
josh 6d50f3a804 feat(install): polish install UX with banner, spinner, progress bar, summary
CI / Lint + build + test (push) Successful in 1m38s
Release / detect (push) Successful in 7s
Release / build-live-image (push) Has been skipped
Release / bundle (push) Successful in 55s
Wrap the three install scripts in a shared inline style block (TTY/UTF-8/
NO_COLOR-aware) so the one-liner install looks and feels intentional:
banner on start, timed step lines, braille spinner over silent apt/
systemctl calls with failure log dumps, single-line curl progress bars
with size-prefixed headers, and a summary box at the end with live-image
version + service state + next steps. install.sh defers banner/summary
to proxmox-install.sh when VETTING_INSTALL_WRAPPED is set so the two
scripts compose without duplication.
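
A minimal sketch of the TTY/UTF-8/NO_COLOR detection described above. The function and variable names (`is_utf8_locale`, `style_detect`, `STYLE_COLOR`, `STYLE_UTF8`, `step`) are illustrative and not taken from the actual scripts.

```bash
# Hypothetical helper: true when a locale string advertises UTF-8.
is_utf8_locale() {
  case "$1" in
    *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sets STYLE_COLOR and STYLE_UTF8 to 1 or 0 (names are assumptions).
style_detect() {
  STYLE_COLOR=0
  STYLE_UTF8=0
  # Colour only when stdout is a terminal and NO_COLOR is unset or empty.
  if [ -t 1 ] && [ -z "${NO_COLOR:-}" ]; then
    STYLE_COLOR=1
  fi
  # Locale precedence: LC_ALL, then LC_CTYPE, then LANG.
  if is_utf8_locale "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"; then
    STYLE_UTF8=1
  fi
}

# Print a step line, using a check mark only when UTF-8 is available.
step() {
  if [ "${STYLE_UTF8:-0}" = 1 ]; then mark='✓'; else mark='ok'; fi
  printf '%s %s\n' "$mark" "$1"
}
```

Falling back to plain ASCII when the locale is not UTF-8 keeps piped or logged output readable.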

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 22:29:44 -04:00
josh 211abdf08f feat(release): version live-image, skip rebuild+redownload when unchanged
CI / Lint + build + test (push) Successful in 1m41s
Release / detect (push) Successful in 7s
Release / build-live-image (push) Failing after 3m58s
Release / bundle (push) Has been skipped
Splits the release workflow into three jobs (detect, build-live-image,
bundle) so the ~9 min mkosi build only runs when live-image/VERSION
bumps. The slim bundle (~30 MB: orchestrator + agent + deploy scripts
+ a live-image/VERSION pointer) rebuilds every push; the ~300 MB
vmlinuz+initrd.img are published separately under the immutable
live-image/<version>/ path. install.sh compares the pointer to
/var/lib/vetting/live/VERSION and fetches the files only on mismatch,
cutting repeat-install wall-clock from ~30 s + 300 MB to ~10 s + 0 MB
on the common no-live-image-change release.
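
The pointer comparison can be sketched roughly as follows. The function name `live_image_needs_fetch` is hypothetical; only the two VERSION files come from the message above.

```bash
# Return 0 (true) when the live image must be fetched: the installed VERSION
# file is missing or unreadable, or its contents differ from the bundle's pointer.
live_image_needs_fetch() {
  pointer_file=$1      # the bundle's live-image/VERSION pointer
  installed_file=$2    # e.g. /var/lib/vetting/live/VERSION
  [ -r "$installed_file" ] || return 0
  [ "$(cat "$pointer_file")" != "$(cat "$installed_file")" ]
}

# Possible use (paths assumed):
#   if live_image_needs_fetch live-image/VERSION /var/lib/vetting/live/VERSION; then
#     ...download vmlinuz + initrd.img for the new version...
#   fi
```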

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 21:04:14 -04:00
josh a01db63952 feat(install): auto-heal pxe.interface/pxe.subnet against the host
CI / Lint + build + test (push) Successful in 1m42s
Release / release (push) Successful in 19m30s
A stale /etc/vetting/vetting.yaml (e.g. pxe.interface=eth1 after an
LXC rebuild renamed the NIC to eth0) blocks vetting.service startup
with "pxe.interface 'eth1' not found on host", requiring the operator
to ssh in and hand-edit the yaml after every rebuild.

install.sh now validates the pxe block against the host's actual
network state on every install/upgrade run. If pxe.enabled is true and
pxe.interface doesn't exist (or pxe.subnet is missing/malformed), the
script auto-detects the primary NIC via the default route, reads its
subnet from the kernel-scope route, and patches both values in place.
Valid configs are left exactly as the operator had them; fresh
installs with pxe.enabled=false skip the check entirely.
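
A rough sketch of the detection step, assuming the script parses `ip` output. The exact commands install.sh runs are not shown in this message, and the helper names below are hypothetical.

```bash
# Extract the interface name from `ip -4 route show default` output, e.g.
#   default via 192.168.1.1 dev eth0 proto dhcp metric 100
default_route_iface() {
  awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Extract the CIDR from a kernel-installed link route line, e.g.
#   192.168.1.0/24 proto kernel scope link src 192.168.1.50
link_route_subnet() {
  awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+$/ { print $1; exit }'
}

# True when the named interface exists on this host (Linux sysfs).
iface_exists() {
  [ -e "/sys/class/net/$1" ]
}

# Possible wiring (assumed):
#   iface=$(ip -4 route show default | default_route_iface)
#   subnet=$(ip -4 route show dev "$iface" proto kernel scope link | link_route_subnet)
```

Keeping the parsers as stdin filters makes them easy to exercise with canned route output.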

The one-liner install/update is now self-healing for the most common
stale-config failure mode.
2026-04-20 19:56:39 -04:00
josh cf3a75591c install: stage pxe-setup.sh at /usr/local/sbin/vetting-pxe-setup
CI / Lint + build + test (push) Successful in 1m36s
Release / release (push) Successful in 2m29s
proxmox-install.sh tarball-extracts into a tempdir that gets wiped on
EXIT, so after the one-liner there's no pxe-setup.sh on disk for the
operator to run. Have install.sh drop the script + ipxe-shas.txt into
/usr/local/share/vetting/ and symlink it as
/usr/local/sbin/vetting-pxe-setup (in PATH).

pxe-setup.sh now readlink -f's BASH_SOURCE so the symlink resolves to
the share dir where ipxe-shas.txt lives, and gracefully handles the
case where install.sh already staged vmlinuz + initrd.img into
LIVE_DIR (no bundle live-image/ needed at that point).
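
The symlink resolution amounts to something like the following sketch. `script_home` is an illustrative name, not the function used in pxe-setup.sh.

```bash
# Print the directory that holds the real file behind a (possibly symlinked) path.
script_home() {
  dirname "$(readlink -f "$1")"
}

# In a bash script this would typically be called as:
#   share_dir=$(script_home "${BASH_SOURCE[0]}")
#   shas_file="$share_dir/ipxe-shas.txt"
```

With `/usr/local/sbin/vetting-pxe-setup` symlinked into the share dir, the resolved directory is the share dir rather than `/usr/local/sbin`.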

Update the trailing hint in proxmox-install.sh and the operations
runbook to surface the new `sudo vetting-pxe-setup ...` command.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 12:10:23 -04:00
josh f927a4a66b install.sh: stage live image and auto-restart on upgrade
CI / Lint + build + test (push) Successful in 1m38s
Release / release (push) Successful in 1m45s
Single-command upgrades were leaving /var/lib/vetting/live/ stale on
PXE-enabled LXCs because install.sh explicitly punted live-image
staging to pxe-setup.sh. That was right when make-release ran on a
dev box, but the new registry-pull flow ships vmlinuz+initrd.img
inside the bundle — they should land in place during every install.

install.sh now:
  - auto-detects live-image/{vmlinuz,initrd.img} (release bundle
    layout) or ../live-image/build/ (repo dev checkout) and stages
    them into --live-dir (default /var/lib/vetting/live).
  - restarts vetting.service when already enabled, so the
    curl | sudo bash one-liner is the full upgrade loop. First-
    install path still leaves the service stopped for config edits.
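
The two behaviours listed above might look roughly like this. It assumes the layouts are probed relative to the directory containing install.sh, and both function names are hypothetical.

```bash
# Print the directory holding vmlinuz + initrd.img, trying the release-bundle
# layout first and the repo dev checkout second. Returns 1 if neither exists.
find_live_image_dir() {
  base=$1   # directory containing install.sh (assumption)
  for candidate in "$base/live-image" "$base/../live-image/build"; do
    if [ -f "$candidate/vmlinuz" ] && [ -f "$candidate/initrd.img" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Restart only when the unit is already enabled; first installs stay stopped.
restart_if_enabled() {
  if systemctl is-enabled --quiet "$1" 2>/dev/null; then
    systemctl restart "$1"
  fi
}
```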

pxe-setup.sh's own live-image copy is now redundant on upgrade but
still runs for first-time PXE setup (it also writes the pxe: block
of vetting.yaml, which install.sh has no business touching).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 10:38:34 -04:00
josh 6ce95547f4 deploy: mark install.sh + pxe-setup.sh executable in git index
CI / Lint + build + test (push) Failing after 5m13s
Git on Windows dropped the exec bit when the files were first committed,
so `sudo ./pxe-setup.sh` on the LXC errored with "command not found".
Fix via `git update-index --chmod=+x`.
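
One way to spot other scripts with the same problem is to filter `git ls-files -s` for shell scripts recorded as mode 100644. This is a sketch with a hypothetical helper name, not something the commit itself added.

```bash
# Read `git ls-files -s` output on stdin and print tracked *.sh paths whose
# index mode is 100644 (not executable). Each input line looks like:
#   <mode> <object> <stage><TAB><path>
nonexec_scripts() {
  awk -F '\t' '{ split($1, meta, " "); if (meta[1] == "100644" && $2 ~ /\.sh$/) print $2 }'
}

# Typical use inside a repository:
#   git ls-files -s | nonexec_scripts
#   git update-index --chmod=+x path/to/script.sh   # path is a placeholder
```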

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 01:43:02 -04:00
josh a0c0fb114f Add host-mode heartbeat: vetting-agent host + last-seen badge
CI / Lint + build + test (push) Has been cancelled
vetting-agent gains a `host` subcommand that runs as a systemd service
installed by the quick-register one-liner, POSTing every 30s to
/api/v1/hosts/{mac}/heartbeat so the dashboard tile shows "online" or
"Nm ago" without waiting on WoL. Ships dormant client code for the
Phase 2 reboot_for_vetting command so the server can flip it on later
without a binary redeploy.
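
The real agent is a Go binary. The shell stand-in below only illustrates the heartbeat shape: the endpoint path and the 30 s interval come from the message above, while the base URL argument, the function names, and the 60 s "online" cut-off are assumptions.

```bash
# Build the heartbeat endpoint for a host; tolerates a trailing slash on the base URL.
heartbeat_url() {
  # $1 = orchestrator base URL (assumed), $2 = host MAC address
  printf '%s/api/v1/hosts/%s/heartbeat\n' "${1%/}" "$2"
}

# Render the badge text from the age of the last heartbeat in seconds.
# The 60 s "online" cut-off is an assumption, not taken from the source.
last_seen_label() {
  if [ "$1" -lt 60 ]; then
    echo online
  else
    echo "$(( $1 / 60 ))m ago"
  fi
}

# POST every 30 s; a failed request is ignored so the loop keeps running.
heartbeat_loop() {
  while :; do
    curl -fsS -X POST "$(heartbeat_url "$1" "$2")" >/dev/null || true
    sleep 30
  done
}
```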

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 23:34:15 -04:00
josh 42da48864f Remove operator auth — trust the LAN
CI / Lint + build + test (push) Failing after 5m15s
Operators can't log in on a fresh LXC deploy, and the service is LAN-only by
design. Rip out the whole bcrypt-password / signed-cookie session
layer: internal/auth, login templates, gen-admin-password binary +
Makefile targets, auth config block, login/logout routes and the
RequireSession middleware wrap. Agent bearer-token auth on
/api/v1/runs/{id}/* is untouched.

Operators who want a password can front the service with a reverse
proxy — noted in README and docs/operations.md.
2026-04-17 22:31:49 -04:00
josh 273e7593bc Fix LXC deploy: absolute paths + systemd section for StartLimit
CI / Lint + build + test (push) Failing after 5m17s
Service was crashing on every boot because vetting.example.yaml uses
./var/... relative paths, which resolve against / (the unit's working
directory) and are therefore read-only under ProtectSystem=strict.
Ship a separate vetting.production.yaml with absolute /var/lib/vetting
+ /var/log/vetting paths that match the unit's ReadWritePaths, and
have install.sh copy that one. Also move StartLimit* keys into [Unit]
to silence the 'Unknown key' warning on modern systemd.
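
A small check for the misplaced-key case can be sketched as follows. The helper name is hypothetical, and it only inspects key placement, not the values.

```bash
# Read a unit file on stdin and print any StartLimit* keys found outside [Unit].
startlimit_outside_unit() {
  awk '
    /^\[.*\]$/ { section = $0; next }
    /^StartLimit[A-Za-z]*[ \t]*=/ && section != "[Unit]" { print section ": " $0 }
  '
}

# Possible use (unit path assumed):
#   startlimit_outside_unit < /etc/systemd/system/vetting.service
```

Empty output means every StartLimit* key already sits in `[Unit]`.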
2026-04-17 22:02:03 -04:00
josh 47b4fa35a6 Install gen-admin-password alongside vetting
CI / Lint + build + test (push) Failing after 5m16s
proxmox-install.sh + install.sh left operators with no way to
generate the bcrypt hash on the LXC — 'vetting gen-admin-password'
was suggested in the post-install message but the binary has no
subcommands. Cross-build gen-admin-password-linux-amd64 during the
one-liner flow and drop it into /usr/local/bin.
2026-04-17 21:50:54 -04:00
josh 9bb4b09a04 Initial commit: full Phases 1-6 implementation
CI / Lint + build + test (push) Has been cancelled
Post-repair hardware validation pipeline for Proxmox cluster hosts.
Go orchestrator + in-image agent + mkosi live image + bundled dnsmasq
PXE + SQLite + HTMX/SSE UI + notify registry + janitor + full docs.
2026-04-17 21:32:10 -04:00