Heartbeat-first dispatch: retire WoL-as-default, add WaitingReboot
CI / Lint + build + test (push) Has been cancelled
Every supported host runs vetting-reporter in-OS and heartbeats every 30s. WoL was never the thing that started vetting; the heartbeat response's reboot_for_vetting command was. Firing WoL first only crowded the run log with misleading diagnostics when the real failure mode is "reporter isn't installed."

- StartRun returns 409 if the host hasn't heartbeated within 60s, pointing the operator at /register/quick.sh.
- Dispatcher re-checks LastSeenAt at dispatch time (a run may sit in Queued long enough for the host to go offline); stale hosts mark the run Failed with failed_stage=dispatch instead of looping.
- New StateWaitingReboot + TriggerRebootCommanded capture the actual semantics. StateWaitingWoL is kept as the hook point for a future manual-override button.
- Tile disables the Start button with a quick.sh tooltip when the host is offline, matching the server-side 409.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
```diff
@@ -76,6 +76,19 @@ func (r *Runs) MarkFailed(ctx context.Context, runID int64, failedStage, holdIP
 	return err
 }
 
+// MarkDispatchFailed records a terminal failure discovered before the run
+// ever reached a live image, e.g. the dispatcher refused to start because
+// the host isn't heartbeating. Goes to StateFailed (not FailedHolding)
+// because there's no live image to ssh into.
+func (r *Runs) MarkDispatchFailed(ctx context.Context, runID int64, failedStage, result string) error {
+	now := time.Now().UTC()
+	_, err := r.DB.ExecContext(ctx, `
+		UPDATE runs SET state = ?, result = ?, failed_stage = ?, completed_at = ?
+		WHERE id = ?
+	`, string(model.StateFailed), result, failedStage, now, runID)
+	return err
+}
+
 func (r *Runs) MarkCompleted(ctx context.Context, runID int64, reportPath string) error {
 	now := time.Now().UTC()
 	_, err := r.DB.ExecContext(ctx, `
```