feat(end-of-run): reboot to local disk instead of powering off
CI / Lint + build + test (push) Successful in 1m47s
Release / release (push) Successful in 10m8s

Completed runs now reboot the host and fall through iPXE to the next
boot device (local disk) instead of powering off. Three coordinated
changes:

- pxe/ipxe: NoActiveRunScript exits iPXE (drops to next boot entry)
  instead of `sleep 10; poweroff`. Without this, a Completed reboot
  just loops through PXE and gets told to poweroff.
- api/agent_handlers: heartbeat returns cmd=reboot (was cmd=shutdown)
  when the run reaches Completed.
- agent/runner: runs `systemctl reboot` (with `shutdown -r now`
  fallback) in response to cmd=reboot.

Operator cancel still powers off — powerOffAndReturn is unchanged
because a cancel means the operator wants the host idle so they can
walk up to it, not back in rotation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-04-19 22:45:11 -04:00
parent 8acef92a60
commit 3656af9823
4 changed files with 23 additions and 17 deletions
+8 -7
View File
@@ -10,8 +10,9 @@
// Terminal states:
// - FailedHolding → request hold key, install authorized_keys, wait
// on heartbeats for a retry_stage directive.
// - Completed → heartbeat carries cmd=shutdown; agent runs
// `systemctl poweroff` and exits.
// - Completed → heartbeat carries cmd=reboot; agent runs
// `systemctl reboot` and exits. The host comes back through iPXE,
// finds no active run, and exits iPXE into the next boot device.
//
// Thermal sidecar runs from the moment the agent claims until ctx
// cancel; it posts a handful of /sys/class/hwmon samples every 5s.
@@ -604,13 +605,13 @@ func heartbeatLoop(ctx context.Context, c *Client, fwd *logForwarder, out chan<-
fwd.warn("orchestrator said abort; stopping loop")
return
}
if resp.Cmd == "shutdown" {
fwd.info("orchestrator said shutdown; powering off host")
if resp.Cmd == "reboot" {
fwd.info("orchestrator said reboot; rebooting host")
// Best effort: systemd then sysvinit fallback. Either way,
// return so the agent process stops issuing heartbeats.
if err := exec.Command("systemctl", "poweroff").Run(); err != nil {
fwd.warn("systemctl poweroff failed: " + err.Error())
_ = exec.Command("shutdown", "-h", "now").Run()
if err := exec.Command("systemctl", "reboot").Run(); err != nil {
fwd.warn("systemctl reboot failed: " + err.Error())
_ = exec.Command("shutdown", "-r", "now").Run()
}
return
}