runs: add non-destructive flag + operator Cancel button

Non-destructive pre-declares "don't touch the disks" on Start: the
Storage stage skips wipe-probe, badblocks -w, and write-mode fio,
and reports a read-only summary. Runs gain a new non_destructive
column, threaded through Claim → agent tests.Deps → Storage stage.
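
As a rough illustration of the skip logic described above, the Storage stage
could filter its steps on the flag. This is only a sketch: the `step` type,
`storagePlan`, and `planFor` are hypothetical names, and the step labels are
taken from this message rather than from the agent's real code.

```go
package main

// step is a hypothetical description of one Storage-stage action.
type step struct {
	name        string
	destructive bool // true if the step writes to the disks
}

// storagePlan lists the steps named in the commit message (assumed order).
var storagePlan = []step{
	{"wipe-probe", true},
	{"badblocks -w", true},
	{"fio write-mode", true},
	{"read-only summary", false},
}

// planFor returns the step names to run. With nonDestructive set,
// every step marked destructive is dropped.
func planFor(nonDestructive bool) []string {
	var names []string
	for _, s := range storagePlan {
		if nonDestructive && s.destructive {
			continue
		}
		names = append(names, s.name)
	}
	return names
}
```

Here `planFor(true)` yields only the read-only summary, while `planFor(false)`
keeps all four steps.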

Cancel halts an in-flight run. The orchestrator transitions to a
new StateCancelled via TriggerOperatorCancelled (valid from any
active state); the agent's next heartbeat returns cmd=cancel_stage,
which fires a stored CancelFunc on the per-stage context. Stage
subprocesses spawned with exec.CommandContext die with the context,
the agent posts a cancelled outcome, then powers the host off.

Cancelling a destructive stage mid-run may leave the host in an
intermediate state. The UI confirm dialog warns the operator;
recovery is manual for now.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 13:01:42 -04:00
parent 2c440fce8a
commit 4524ab8dc0
22 changed files with 434 additions and 230 deletions
+10 -5
@@ -207,11 +207,12 @@ func (a *Agent) Claim(w http.ResponseWriter, r *http.Request) {
 		iperfPort = 5201
 	}
 	writeJSON(w, http.StatusOK, map[string]any{
-		"ok":             true,
-		"run_id":         runID,
-		"stages":         store.DefaultStageOrder,
-		"expected_disks": expectedDisks,
-		"iperf_port":     iperfPort,
+		"ok":              true,
+		"run_id":          runID,
+		"stages":          store.DefaultStageOrder,
+		"expected_disks":  expectedDisks,
+		"iperf_port":      iperfPort,
+		"non_destructive": run.NonDestructive,
 	})
 }
@@ -236,6 +237,10 @@ func (a *Agent) Heartbeat(w http.ResponseWriter, r *http.Request) {
 	case run.State == model.StateCompleted:
 		// Pipeline succeeded — agent should power the host down.
 		cmd = "shutdown"
+	case run.State == model.StateCancelled:
+		// Operator clicked Cancel — agent cancels the active stage ctx,
+		// posts a cancelled outcome, and powers off.
+		cmd = "cancel_stage"
 	case run.State == model.StateFailedHolding || run.State == model.StateReleased:
 		cmd = "abort"
 	case run.FailedStage == "Storage" && overrideWipeSet(run.OverrideFlagsJSON):