feat(end-of-run): reboot to local disk instead of powering off

Completed runs now reboot the host and fall through iPXE to the next boot
device (local disk) instead of powering off.

Three coordinated changes:

- pxe/ipxe: NoActiveRunScript exits iPXE (drops to next boot entry)
  instead of `sleep 10; poweroff`. Without this, a Completed reboot just
  loops through PXE and gets told to poweroff.
- api/agent_handlers: heartbeat returns cmd=reboot (was cmd=shutdown)
  when the run reaches Completed.
- agent/runner: runs `systemctl reboot` (with `shutdown -r now` fallback)
  in response to cmd=reboot.

Operator cancel still powers off: powerOffAndReturn is unchanged because a
cancel means the operator wants the host idle so they can walk up to it,
not back in rotation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
+8
-7
@@ -10,8 +10,9 @@
 // Terminal states:
 // - FailedHolding → request hold key, install authorized_keys, wait
 //   on heartbeats for a retry_stage directive.
-// - Completed → heartbeat carries cmd=shutdown; agent runs
-//   `systemctl poweroff` and exits.
+// - Completed → heartbeat carries cmd=reboot; agent runs
+//   `systemctl reboot` and exits. The host comes back through iPXE,
+//   finds no active run, and exits iPXE into the next boot device.
 //
 // Thermal sidecar runs from the moment the agent claims until ctx
 // cancel; it posts a handful of /sys/class/hwmon samples every 5s.
||||
@@ -604,13 +605,13 @@ func heartbeatLoop(ctx context.Context, c *Client, fwd *logForwarder, out chan<-
 		fwd.warn("orchestrator said abort; stopping loop")
 		return
 	}
-	if resp.Cmd == "shutdown" {
-		fwd.info("orchestrator said shutdown; powering off host")
+	if resp.Cmd == "reboot" {
+		fwd.info("orchestrator said reboot; rebooting host")
 		// Best effort: systemd then sysvinit fallback. Either way,
 		// return so the agent process stops issuing heartbeats.
-		if err := exec.Command("systemctl", "poweroff").Run(); err != nil {
-			fwd.warn("systemctl poweroff failed: " + err.Error())
-			_ = exec.Command("shutdown", "-h", "now").Run()
+		if err := exec.Command("systemctl", "reboot").Run(); err != nil {
+			fwd.warn("systemctl reboot failed: " + err.Error())
+			_ = exec.Command("shutdown", "-r", "now").Run()
 		}
 		return
 	}