Commit Graph

14 Commits

Author SHA1 Message Date
josh b809bf5f3e proxmox-install: show download progress bar for the bundle fetch
CI / Lint + build + test (push) Successful in 1m37s
Release / release (push) Successful in 2m44s
-fsSL suppresses all output during the ~30 MB download, which
leaves the operator staring at 'fetching bundle...' for up to a
minute on a cold registry. Drop -s and add --progress-bar so there
is a live indicator; keep -fL so we still fail on HTTP errors and
follow redirects. Print the downloaded size alongside 'extracting'
for quick sanity-checking.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 11:43:51 -04:00
josh f927a4a66b install.sh: stage live image and auto-restart on upgrade
CI / Lint + build + test (push) Successful in 1m38s
Release / release (push) Successful in 1m45s
Single-command upgrades were leaving /var/lib/vetting/live/ stale on
PXE-enabled LXCs because install.sh explicitly punted live-image
staging to pxe-setup.sh. That was right when make-release ran on a
dev box, but the new registry-pull flow ships vmlinuz+initrd.img
inside the bundle — they should land in place during every install.

install.sh now:
  - auto-detects live-image/{vmlinuz,initrd.img} (release bundle
    layout) or ../live-image/build/ (repo dev checkout) and stages
    them into --live-dir (default /var/lib/vetting/live).
  - restarts vetting.service when already enabled, so the
    curl | sudo bash one-liner is the full upgrade loop. First-
    install path still leaves the service stopped for config edits.

pxe-setup.sh's own live-image copy is now redundant on upgrade but
still runs for first-time PXE setup (it also writes the pxe: block
of vetting.yaml, which install.sh has no business touching).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 10:38:34 -04:00
josh f188c7add4 proxmox-install: fetch prebuilt bundle from Gitea package registry
CI / Lint + build + test (push) Has been cancelled
Release / release (push) Has been cancelled
Drops the per-install Go toolchain dance + source build. The installer
now just curls the bundle from
${REGISTRY_URL}/api/packages/${PACKAGE_OWNER}/generic/vetting/${VETTING_VERSION}/vetting-bundle.tar.gz,
extracts it, and hands off to the bundled install.sh with explicit
--binary / --agent-binary paths so the in-bundle layout is picked up.

Default version is `latest` (rolling alias, overwritten by release.yml
on each push to main). Pin via `VETTING_VERSION=sha-abc1234 curl ... |
bash` when rolling back or testing a specific commit.

Removes the `apt install build-essential git` + Go toolchain download
+ templ install + `make orchestrator-linux agent-linux` path — the CI
workflow already produced all of that. Install time on a cold LXC
drops from minutes to under a minute, and live-image kernel/initrd
now arrive with every install instead of requiring a separate WSL
build.

Also rewrites docs/operations.md's install section around the
one-liner, keeps the `make release` + scp path as the offline
fallback, and swaps the upgrade section to just "rerun the one-liner."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 02:16:02 -04:00
josh 05bd88b016 pxe-setup: handle quoted defaults whose comments contain quotes
CI / Lint + build + test (push) Failing after 5m14s
The production yaml ships `interface: ""                          # e.g. "eth0"`.
The old extractor did `gsub(/^"|"$/, "")` which only strips outer quotes, so
with an inline comment containing quotes it produced garbage like
`"                          # e.g. "eth0`, tripping the idempotency check.

Replaces the two inline extractors with one `extract_yaml_value` helper
that first tries to match `"[^"]*"` (grabbing only the first quoted
value), falling back to strip-trailing-comment + trim for unquoted
values.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 01:52:31 -04:00
josh 6ce95547f4 deploy: mark install.sh + pxe-setup.sh executable in git index
CI / Lint + build + test (push) Failing after 5m13s
Git on Windows dropped the exec bit when the files were first committed,
so `sudo ./pxe-setup.sh` on the LXC errored with "command not found".
Fix via `git update-index --chmod=+x`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 01:43:02 -04:00
josh a5055b3c7a Automate PXE setup: release bundle + pxe-setup.sh + startup validation
CI / Lint + build + test (push) Has been cancelled
Collapses the LXC side of PXE enablement from a six-step manual dance
(build, fetch iPXE, scp, bridge, hand-edit yaml) into:

  make release                   # dev box (Linux/WSL)
  scp bundle.tar.gz lxc:/tmp/
  sudo ./install.sh              # base install, unchanged
  sudo ./pxe-setup.sh --interface ... --dhcp-range ... --orchestrator-url ...

pxe-setup.sh fetches iPXE from boot.ipxe.org, verifies against pinned
SHA256s in deploy/ipxe-shas.txt (fail-closed), places vmlinuz/initrd.img
from the bundle, and rewrites only the pxe: block of vetting.yaml.
Idempotent; --force gates overwriting a hand-edited block.

Adds Supervisor.Validate() — called before dnsmasq spawn — so typo'd
configs fail at orchestrator startup with clear errors naming the
missing file or yaml key, instead of silently serving broken TFTP
until a real host tries to PXE-boot. Nine tests cover missing files,
bogus interface, malformed dhcp_range, bad orchestrator_url, and
aggregate reporting.

Hypervisor bridge creation stays documented (LXC can't do it) but
everything downstream of the bridge is now scripted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 01:38:43 -04:00
josh a3d5e2d0a4 proxmox-install: build agent binary for serving
CI / Lint + build + test (push) Failing after 5m22s
The agent binary is never run on the LXC, but it has to be present
so /assets/vetting-agent-linux-amd64 can serve it to target hosts
via the quick-register one-liner. Install was failing because only
orchestrator-linux was being built.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 00:12:41 -04:00
josh af41644929 proxmox-install: reset hard instead of checkout
CI / Lint + build + test (push) Failing after 5m25s
Regenerated _templ.go files embed the templ source path at runtime,
which differs between the dev machine and /opt/vetting-src on the
target. That left tracked files modified after every build, and the
next upgrade-run hit "local changes would be overwritten by checkout"
and aborted. /opt/vetting-src is script-managed, so `git reset --hard
origin/<branch>` is the right semantics.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 00:02:01 -04:00
josh a0c0fb114f Add host-mode heartbeat: vetting-agent host + last-seen badge
CI / Lint + build + test (push) Has been cancelled
vetting-agent gains a `host` subcommand that runs as a systemd service
installed by the quick-register one-liner, POSTing every 30s to
/api/v1/hosts/{mac}/heartbeat so the dashboard tile shows "online" or
"Nm ago" without waiting on WoL. Ships dormant client code for the
Phase 2 reboot_for_vetting command so the server can flip it on later
without a binary redeploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 23:34:15 -04:00
josh 42da48864f Remove operator auth — trust the LAN
CI / Lint + build + test (push) Failing after 5m15s
Can't log in from a fresh LXC deploy, and the service is LAN-only by
design. Rip out the whole bcrypt-password / signed-cookie session
layer: internal/auth, login templates, gen-admin-password binary +
Makefile targets, auth config block, login/logout routes and the
RequireSession middleware wrap. Agent bearer-token auth on
/api/v1/runs/{id}/* is untouched.

Operators who want a password can front the service with a reverse
proxy — noted in README and docs/operations.md.
2026-04-17 22:31:49 -04:00
josh 273e7593bc Fix LXC deploy: absolute paths + systemd section for StartLimit
CI / Lint + build + test (push) Failing after 5m17s
Service was crashing on every boot because vetting.example.yaml uses
./var/... relative paths that resolve to / under ProtectSystem=strict.
Ship a separate vetting.production.yaml with absolute /var/lib/vetting
+ /var/log/vetting paths that match the unit's ReadWritePaths, and
have install.sh copy that one. Also move StartLimit* keys into [Unit]
to silence the 'Unknown key' warning on modern systemd.
2026-04-17 22:02:03 -04:00
josh 47b4fa35a6 Install gen-admin-password alongside vetting
CI / Lint + build + test (push) Failing after 5m16s
proxmox-install.sh + install.sh left operators with no way to
generate the bcrypt hash on the LXC — 'vetting gen-admin-password'
was suggested in the post-install message but the binary has no
subcommands. Cross-build gen-admin-password-linux-amd64 during the
one-liner flow and drop it into /usr/local/bin.
2026-04-17 21:50:54 -04:00
josh 64acb97073 Add one-liner Proxmox LXC installer
CI / Lint + build + test (push) Failing after 5m17s
deploy/proxmox-install.sh bootstraps a fresh LXC end-to-end: apt
prereqs, Go toolchain (if missing), git clone, build, then hands off
to deploy/install.sh. README documents the curl|bash invocation.
2026-04-17 21:39:47 -04:00
josh 9bb4b09a04 Initial commit: full Phases 1-6 implementation
CI / Lint + build + test (push) Has been cancelled
Post-repair hardware validation pipeline for Proxmox cluster hosts.
Go orchestrator + in-image agent + mkosi live image + bundled dnsmasq
PXE + SQLite + HTMX/SSE UI + notify registry + janitor + full docs.
2026-04-17 21:32:10 -04:00