Automate PXE setup: release bundle + pxe-setup.sh + startup validation
CI / Lint + build + test (push) Has been cancelled
Collapses the LXC side of PXE enablement from a six-step manual dance
(build, fetch iPXE, scp, bridge, hand-edit yaml) into:

    make release                  # dev box (Linux/WSL)
    scp bundle.tar.gz lxc:/tmp/
    sudo ./install.sh             # base install, unchanged
    sudo ./pxe-setup.sh --interface ... --dhcp-range ... --orchestrator-url ...

pxe-setup.sh fetches iPXE from boot.ipxe.org, verifies against pinned
SHA256s in deploy/ipxe-shas.txt (fail-closed), places vmlinuz/initrd.img
from the bundle, and rewrites only the pxe: block of vetting.yaml.
Idempotent; --force gates overwriting a hand-edited block.

Adds Supervisor.Validate(), called before dnsmasq spawn, so typo'd
configs fail at orchestrator startup with clear errors naming the missing
file or yaml key, instead of silently serving broken TFTP until a real
host tries to PXE-boot. Nine tests cover missing files, bogus interface,
malformed dhcp_range, bad orchestrator_url, and aggregate reporting.

Hypervisor bridge creation stays documented (LXC can't do it) but
everything downstream of the bridge is now scripted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
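For readers unfamiliar with the "pinned SHA256, fail-closed" pattern the commit message mentions, here is a minimal sketch of what such a check could look like. It is not the code in `pxe-setup.sh`: the function name `verify_pinned_sha256` is hypothetical, and the pins file is assumed to use the `<hex>  <name>` layout that `sha256sum` prints, which the commit does not confirm for `deploy/ipxe-shas.txt`.

```shell
#!/bin/sh
# Hypothetical helper (not taken from pxe-setup.sh): verify FILE against
# the digest pinned for its basename in PINS.
# Assumption: PINS uses sha256sum's "<hex>  <name>" layout.
verify_pinned_sha256() {
    pins=$1
    file=$2
    name=$(basename "$file")

    # Look up the pinned digest. A missing entry counts as a failure,
    # which is what makes the check fail-closed.
    want=$(awk -v n="$name" '$2 == n { print $1; exit }' "$pins")
    if [ -z "$want" ]; then
        echo "no pinned SHA256 for $name in $pins" >&2
        return 1
    fi

    # Hash the file and compare; an unreadable file yields an empty
    # digest, which also fails the comparison.
    got=$(sha256sum "$file" | awk '{ print $1 }')
    if [ "$got" != "$want" ]; then
        echo "SHA256 mismatch for $name: want $want, got $got" >&2
        return 1
    fi
}
```

A caller would typically stop on the first failure, for example `verify_pinned_sha256 deploy/ipxe-shas.txt ipxe.efi || exit 1`, so that a tampered or unexpected download is never placed where dnsmasq would serve it.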
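The startup validation described in the commit message reports problems in aggregate rather than stopping at the first one. The sketch below illustrates that idea as a shell preflight. It is not `Supervisor.Validate()` itself: the function name `validate_pxe`, its argument order, the `SYS_NET` override, and the directory holding the iPXE binaries are all assumptions made for illustration. The range check only looks at the overall shape and does not inspect the lease field or octet values.

```shell
#!/bin/sh
# Hypothetical preflight (an illustration, not the orchestrator's code):
# report every problem found, then fail once at the end.
# SYS_NET is an assumed override so the interface check can be tested.
validate_pxe() {
    iface=$1
    ipxe_dir=$2      # assumed location of ipxe.efi and undionly.kpxe
    dhcp_range=$3
    url=$4
    errs=0
    fail() { echo "pxe preflight: $*" >&2; errs=$((errs + 1)); }

    # Linux exposes one directory per network interface under sysfs.
    [ -e "${SYS_NET:-/sys/class/net}/$iface" ] \
        || fail "interface $iface does not exist"

    for f in ipxe.efi undionly.kpxe; do
        [ -f "$ipxe_dir/$f" ] || fail "missing file $ipxe_dir/$f"
    done

    # Shape check only: start,end with an optional third field.
    quad='[0-9]{1,3}(\.[0-9]{1,3}){3}'
    printf '%s\n' "$dhcp_range" | grep -Eq "^$quad,$quad(,[^,]+)?\$" \
        || fail "malformed dhcp_range: $dhcp_range"

    printf '%s\n' "$url" | grep -Eq '^https?://[^/[:space:]]+' \
        || fail "bad orchestrator_url: $url"

    # Succeed only if no check failed.
    [ "$errs" -eq 0 ]
}
```

Collecting every failure before returning means a config with several typos can be fixed in one pass instead of one restart per error.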
+83
-45
@@ -11,66 +11,104 @@ Target: a Debian/Ubuntu LXC on the Proxmox host that holds the cluster
 you're vetting for. The LXC must be on the same L2 segment as the
 repaired nodes so DHCP and WoL work.

-1. On your workstation, cross-build the binary:
+### One-shot release bundle (recommended)

-```
-make orchestrator-linux
-```
+On your dev workstation (Linux, or WSL on Windows):

-This produces `bin/vetting-linux-amd64`.
+```
+make release
+```

-2. Copy the repo tree (or just `bin/`, `deploy/`) into the LXC, then
-from inside the LXC:
+Produces `bin/vetting-bundle-<sha>.tar.gz` containing the orchestrator
+binary, agent binary, live image (`vmlinuz` + `initrd.img`), install
+scripts, `vetting.service`, the production yaml, and the pinned iPXE
+SHA256 file.

-```
-sudo ./deploy/install.sh
-```
+Ship it to the LXC:

-The installer:
-- `apt install`s `dnsmasq`, `iperf3`, `ca-certificates`
-- creates the `vetting` system user (home = `/var/lib/vetting`)
-- installs the binary into `/usr/local/bin/vetting`
-- drops `vetting.example.yaml` into `/etc/vetting/vetting.yaml`
-(only if there's no existing config — existing configs are
-preserved)
-- drops `/etc/systemd/system/vetting.service`
-- disables the distro-default dnsmasq (the orchestrator supervises
-its own)
+```
+scp bin/vetting-bundle-<sha>.tar.gz lxc:/tmp/
+ssh lxc 'cd /tmp && tar xzf vetting-bundle-*.tar.gz'
+ssh lxc 'cd /tmp/vetting-bundle-<sha> && sudo ./install.sh'
+```

-The installer does **not** enable the service. You'll want to edit
-the config first.
+`install.sh` does the base install (user, binaries, config, systemd
+unit). If you don't need PXE (e.g. host-mode reporter only, no
+automated live-boots), you can stop here — edit
+`/etc/vetting/vetting.yaml` to tune `server.bind` / `public_url`,
+then `sudo systemctl enable --now vetting`.

-3. Edit `/etc/vetting/vetting.yaml`:
+### PXE enablement

-- `server.bind` — defaults to `127.0.0.1:8080`. Switch to
-`0.0.0.0:8080` (or bind to a specific LAN IP) once you're ready
-to expose it. There is no built-in auth — see *Exposing outside
-the LAN* below.
-- `server.public_url` — the URL your browser hits the LXC on
-(e.g. `http://vetting.lan:8080`). Used as the click-through link
-in notifications.
+PXE is gated behind a second script so non-PXE installs stay simple.

-4. (Optional) Configure notifiers in the same file — see the
-commented-out example block for ntfy / Discord / SMTP.
+**Prerequisite: dedicated PXE bridge on the Proxmox hypervisor.** The
+LXC can't create bridges on its host, so do this once on the Proxmox
+node (not inside the LXC):

-5. Enable and start:
+```
+sudo ip link add br-vetting type bridge
+sudo ip addr add 10.77.0.1/24 dev br-vetting
+sudo ip link set br-vetting up
+```

-```
-sudo systemctl enable --now vetting
-sudo journalctl -fu vetting
-```
+Attach a veth from the LXC onto `br-vetting` (e.g. `eth1` inside the
+LXC at `10.77.0.2/24`). Repaired nodes PXE-boot from a NIC cabled or
+bridged onto `br-vetting` only — keep this network isolated from your
+household DHCP, or both DHCP servers will fight.
+
+On the LXC, inside the extracted bundle:
+
+```
+sudo ./pxe-setup.sh \
+--interface eth1 \
+--dhcp-range 10.77.0.100,10.77.0.200,12h \
+--orchestrator-url http://10.77.0.2:8080
+```
+
+The script:
+
+- Fetches `ipxe.efi` + `undionly.kpxe` from boot.ipxe.org and verifies
+SHA256 against `ipxe-shas.txt` (fail-closed on mismatch).
+- Places `vmlinuz` + `initrd.img` into `/var/lib/vetting/live/`.
+- Rewrites the `pxe:` block of `/etc/vetting/vetting.yaml` to enable
+PXE with the flags you passed.
+
+It does **not** restart the service — review the rendered config,
+then:
+
+```
+sudo systemctl restart vetting
+sudo journalctl -fu vetting
+```
+
+The orchestrator validates PXE preconditions at startup (interface
+exists, iPXE binaries are on disk, `dhcp_range` parses) and exits
+non-zero with a clear error if anything's wrong, instead of failing
+silently when a host first PXE-boots.
+
+`pxe-setup.sh` is idempotent — safe to re-run. Pass `--force` to
+overwrite a hand-edited `pxe:` block.
+
+### Manual install (no release tarball)
+
+For dev-loop iteration on the LXC itself:
+
+1. On your workstation: `make orchestrator-linux && make agent-linux`
+2. Copy the repo tree (or just `bin/` + `deploy/`) onto the LXC
+3. `sudo ./deploy/install.sh` → base install
+4. For PXE: `wsl make live-image` on your workstation,
+`scp live-image/build/vmlinuz lxc:/tmp/ && scp live-image/build/initrd.img lxc:/tmp/`,
+then run `pxe-setup.sh --bundle-dir /tmp` (or accept the default
+repo-tree detection when running from the repo root).

 ## First vetting run

 Against a QEMU VM first, before you point it at real hardware:

-1. On the Proxmox host (or wherever your LXC lives):
-
-```
-sudo ip link add br-vetting type bridge
-sudo ip addr add 10.77.0.1/24 dev br-vetting
-sudo ip link set br-vetting up
-```
+1. Make sure the `br-vetting` bridge exists on the hypervisor (see
+above). From inside the LXC, confirm it's reachable on your
+PXE-side interface.

 2. In the UI at `http://<lxc>:8080`, register a host:
 - Name: `qemu-test`
@@ -82,7 +120,7 @@ Against a QEMU VM first, before you point it at real hardware:
 cpu: { logical_cores: 4 }
 ```

-3. Click **Start Vetting**. The UI tile will sit at `Queued → WaitingWoL`.
+3. Click **Start Vetting**. The UI tile will sit at `Queued → WaitingReboot`.

 4. Launch the QEMU VM on the bridge so it PXE-boots from dnsmasq: