josh 8521e04ddf docs(01-05): partial summary — Task 1 complete (validator + schema + refused-sample + Vitest), halted at Task 2 human-curate checkpoint
Documents the asset-provenance gate that landed (validator script, Zod sidecar
schema with the 6 CLAUDE.md fields + optional provenance_schema_version, refused-
sample PNG, tmpdir-isolated Vitest enforcement test) and the resume protocol for
Task 2 (Path A AI-generate / Path B hand-painted-or-licensed-photograph fallback /
Path C defer with explicit IOU). Per plan's autonomous: false flag and orchestrator
spawn instructions, the human curates the 10–20 north-star reference images.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 23:32:11 -04:00

---
phase: 01-foundations-and-doctrine
plan: 05
subsystem: pipeline
tags: [provenance, ai-assets, validator, zod, vitest, ci-gate, checkpoint, human-curate]
# Dependency graph
requires:
- phase: 01
  plan: 01
  provides: zod@^4 installed, assets/ tree, validate:assets script key pre-declared in package.json, vitest+happy-dom wired
provides:
- scripts/validate-assets.mjs (CI gate — exits non-zero on any /assets/ file lacking a valid provenance sidecar)
- Zod ProvenanceSchema covering the 6 CLAUDE.md / AEST-08 fields + optional provenance_schema_version (RESEARCH Open Question #2)
- assets/__samples__/refused/no-provenance.png (gate-proof artifact per CONTEXT D-03)
- scripts/validate-assets.test.ts (Vitest enforcement — positive case against real /assets/, negative case against os.tmpdir() fixture)
affects:
- 01-07-ci-workflow (calls `npm run validate:assets` in the composite `ci` script — green now that the validator exists)
- 05-onwards (Phase 5 production-volume asset pipeline scales this floor up; provenance_schema_version=1 implicit, Phase 5 may bump on vendor consolidation)
# Tech tracking
tech-stack:
  added:
    - "(none new; uses already-installed `zod@^4.4.3` per Plan 01)"
  patterns:
    - "Sidecar-per-asset naming: `<filename>.<ext>.provenance.json` (e.g., `garden-soil-01.png.provenance.json`). This keeps the sidecar adjacent in directory listings, grep-friendly, with no stem-collision ambiguity. Per RESEARCH § Pattern 6 sidecar-naming-convention decision."
    - "ASSETS_DIR env override on the validator script: lets the Vitest negative-case test point at an isolated tmpdir without modifying production code or polluting the real /assets/ tree (BLOCKER 2 fix)."
    - "REFUSED_PREFIXES exclusion list at the top of the validator, explicitly enumerated, so adding new exclusions in future phases is a single-line change."
    - "Test-fixture isolation via `os.tmpdir()` + `mkdtemp`: the negative-case fixture lives outside the repo entirely; even if vitest is killed mid-run, any leftover stays in the OS temp directory, subject to the OS's normal temp cleanup. No orphan-fragility risk inside the repo."
key-files:
  created:
    - scripts/validate-assets.mjs (~80 lines incl. error handling and Windows-path normalization)
    - scripts/validate-assets.test.ts (~50 lines, two-case Vitest)
    - assets/__samples__/refused/no-provenance.png (1x1 transparent PNG, 68 bytes; the gate-proof artifact)
    - assets/__samples__/refused/.gitkeep
  modified:
    - vitest.config.ts (added `scripts/**/*.test.ts` to include glob as a Rule 3 blocking fix; without this the new test is invisible to vitest)
key-decisions:
- "Optional `provenance_schema_version` field is included in the Zod schema as `z.number().int().positive().optional()`, defaulting to unset/implicit-1 — Phase 5 vendor consolidation can bump this without breaking Phase 1 sidecars (RESEARCH Open Question #2)."
- "Validator skips `README.md` files in addition to `.gitkeep` and `.provenance.json` — Task 2's `assets/north-stars/README.md` would otherwise demand a sidecar of its own, which is wrong (READMEs are documentation, not provenanced assets)."
- "Vitest config gained ONE additional include pattern (`scripts/**/*.test.ts`) — the existing `scripts/**/*.test.mjs` pattern wouldn't pick up `.test.ts`, and the negative-case test needs TypeScript for `tmpDir: string` typing. Minimal additive change; does not affect any other plan."
- "Halted at Task 2 per plan's `autonomous: false` flag and orchestrator instructions: committing the 10–20 north-star reference images requires human curation per CONTEXT D-01 + D-03 (curation gate IS the human reviewer)."
requirements-completed: []
# AEST-08, AEST-09, PIPE-03 are partially landed (gate exists; refused-sample proves it).
# They will be marked complete after Task 2 (human-curate north-star set) is committed by the user.
# Metrics
duration: ~12min
completed: 2026-05-09
---
# Phase 1 Plan 05: Asset Provenance Pipeline Floor — Partial (Halted at Task 2 Checkpoint)
**Task 1 shipped: validator script + Zod sidecar schema + refused-sample fixture + tmpdir-isolated Vitest enforcement test, all green. Halted at Task 2 (commit 10–20 north-star reference images): `autonomous: false`, requires human curation per plan + CONTEXT D-01 + D-03.**
## Status
| Task | Status | Commit |
|------|--------|--------|
| Task 1: Validator + schema + refused-sample + Vitest | **DONE** | `da3f55c` |
| Task 2: Curate + commit 10–20 north-star images | **CHECKPOINT (awaiting human input)** | n/a |
## Performance
- **Duration:** ~12 min (Task 1 only)
- **Started:** 2026-05-09T03:18:51Z (orchestrator dispatch, immediately after Plan 01-01 complete)
- **Halted:** 2026-05-09T03:29:43Z (Task 2 checkpoint reached)
- **Tasks executed:** 1 of 2
- **Files created:** 4 (validator, test, refused-PNG, refused-.gitkeep) + 1 modified (vitest.config.ts)
## Accomplishments (Task 1)
- **`scripts/validate-assets.mjs` (~80 lines) — the asset-provenance CI gate.**
  - Recursively walks `process.env.ASSETS_DIR ?? 'assets'` using `node:fs/promises` `readdir({withFileTypes: true})`.
  - Skips `.gitkeep`, `README.md`, sidecar files (`.provenance.json`), and any path under the refused-prefixes (`assets/__samples__/refused`, `assets/__test_fixtures__/refused`).
  - For every other file, requires a sibling `<filename>.provenance.json` validating against the Zod `ProvenanceSchema`.
  - Exits non-zero with a clear error listing every failing path on missing/invalid sidecar; exits 0 with `[provenance] all <N> assets carry valid provenance.` on success.
  - Windows-path normalization (`replaceAll('\\', '/')`) so the refused-prefix match works on both platforms.
- **Zod `ProvenanceSchema`** covering all 6 required fields per CLAUDE.md / AEST-08 (`model_id`, `checkpoint_hash`, `prompt`, `seed`, `sampler`, `params`) plus optional `provenance_schema_version: number` per RESEARCH Open Question #2 (Phase 5 vendor consolidation can bump this without breaking Phase 1 sidecars).
- **`assets/__samples__/refused/no-provenance.png` — the gate-proof artifact.** A 68-byte 1x1 transparent PNG with NO sidecar. Per CONTEXT D-03, the proof that the gate works is a real refused asset that the validator explicitly excludes from the walk; the existence of this file (and the `REFUSED_PREFIXES` constant in the validator) together demonstrate the gate is structural, not theoretical.
- **`scripts/validate-assets.test.ts` — Vitest enforcement (BLOCKER 2 fix).**
  - **Positive case:** runs `node scripts/validate-assets.mjs` against the real `/assets/` tree (no env override); stdout must contain `all <N> assets carry valid provenance`.
  - **Negative case:** creates a per-test-run unique tmpdir under `os.tmpdir()` via `mkdtemp(join(os.tmpdir(), 'tlg-provenance-test-'))`, drops a single 1x1 PNG with no sidecar inside, runs the validator with `ASSETS_DIR=<that tmpdir>` set in env, and asserts exit code === 1 and that stderr/stdout contains `validation failed`, `orphan.png`, and `missing.*provenance sidecar`. Cleans up via `rm(tmpDir, {recursive: true, force: true})` in `afterAll`. **No risk of polluting the real `/assets/` tree**: even if the test runner is killed mid-run, any leftover stays in the OS temp directory, outside the repo.
- **All `npm test` green:** 3 tests pass across 2 files (the existing sentinel + 2 new validate-assets cases) in 875ms.
- **`npm run validate:assets` (the script key Plan 01 pre-declared) now exits 0** instead of failing as it did at end-of-Plan-01.
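The walk-and-skip behaviour described above can be illustrated with a small, dependency-free sketch. It is not the committed `scripts/validate-assets.mjs`: `findMissingSidecars`, `SKIPPED_BASENAMES`, and the fixture file names are hypothetical, it uses the synchronous `node:fs` API for brevity where the real script uses `node:fs/promises`, and it checks only that a sidecar exists (no Zod validation, no refused-prefix exclusion).

```typescript
import { existsSync, mkdtempSync, readdirSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join, relative } from 'node:path';

// Basenames skipped outright: documentation and placeholders, not assets.
const SKIPPED_BASENAMES = new Set(['.gitkeep', 'README.md']);
const SIDECAR_SUFFIX = '.provenance.json';

// Hypothetical helper: recursively collect every file under `root` that has
// no sibling `<filename>.provenance.json`. Paths are returned relative to root.
function findMissingSidecars(root: string, dir: string = root): string[] {
  const missing: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      missing.push(...findMissingSidecars(root, full));
    } else if (
      !SKIPPED_BASENAMES.has(entry.name) &&
      !entry.name.endsWith(SIDECAR_SUFFIX) &&
      !existsSync(full + SIDECAR_SUFFIX)
    ) {
      // Normalize Windows separators so output is stable across platforms.
      missing.push(relative(root, full).split('\\').join('/'));
    }
  }
  return missing.sort();
}

// Demo against a throwaway fixture that lives outside the repo.
const fixtureDir = mkdtempSync(join(tmpdir(), 'provenance-sketch-'));
let missing: string[] = [];
try {
  writeFileSync(join(fixtureDir, 'orphan.png'), '');             // no sidecar
  writeFileSync(join(fixtureDir, 'ok.png'), '');                 // has sidecar
  writeFileSync(join(fixtureDir, 'ok.png.provenance.json'), '{}');
  writeFileSync(join(fixtureDir, 'README.md'), '# docs');        // skipped
  missing = findMissingSidecars(fixtureDir);
} finally {
  rmSync(fixtureDir, { recursive: true, force: true });
}
console.log(missing); // only 'orphan.png' is reported
```

A gate built on this shape would exit non-zero whenever the returned list is non-empty.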
## Why this stopped at Task 2
The plan is **`autonomous: false`** and the orchestrator's spawn message explicitly directed: *"complete Task 1, then HALT before Task 2 with a CHECKPOINT requesting human input. Do not invent or AI-generate the north-star images yourself."*
Per plan + CONTEXT D-01 + D-03, the curation gate IS the human reviewer. Task 2 commits the 10–20 hand-curated north-star reference images that establish the visual ground truth for Phase 5+ regression. The decision about *which images go into the north-star set* is a tonal/aesthetic choice that requires the human's eye; no automated procedure can substitute for it.
## Resume Protocol — Choose A Path
You have three valid paths per the plan. Pick whichever fits your current toolchain:
### Path A — AI-generated (recommended if you have a tool available)
1. Use whatever AI image tool you have access to (Stable Diffusion + watercolor LoRA, Midjourney, Scenario, Claude image generation, etc.).
2. Generate **10–20 watercolor-style images** representing the visual north-star: walled cottage gardens, real-but-slightly-wrong wildflowers, golden/autumnal palette for Season 1, hand-painted feel. **No fantasy elements** (no D&D-style flora; see PROJECT.md "Out of Scope": "Generic fantasy flora").
3. For each generated image, write a sibling `<filename>.png.provenance.json` with all 6 required fields filled honestly (the actual `model_id` / `checkpoint_hash` you used, the prompt verbatim, the seed if your tool surfaces one, etc.).
4. Place each pair under `assets/north-stars/<descriptive-slug>.png` + `assets/north-stars/<descriptive-slug>.png.provenance.json`.
Example sidecar shape:
```json
{
"model_id": "stable-diffusion-xl-base-1.0+watercolor-lora-v3",
"checkpoint_hash": "sha256:abc123...",
"prompt": "watercolor painting of a walled cottage garden in late autumn, golden light, hollyhocks and asters slightly distorted, hand-painted feel, Studio Ghibli inspired, no text, no human figures",
"seed": 1729384756,
"sampler": "DPM++ 2M Karras",
"params": { "steps": 30, "cfg_scale": 7.0, "width": 1024, "height": 1024 }
}
```
### Path B — Hand-painted / licensed-photograph fallback
Per RESEARCH § Open Question #5 + Environment Availability, the schema accepts arbitrary `model_id` strings, so honest "human-painted" or licensed-photograph entries are valid and acceptable for Phase 1.
For each image (e.g., a CC-BY photograph of a real cottage garden, or a hand-painted reference scan):
```json
{
"model_id": "human",
"checkpoint_hash": "n/a",
"prompt": "Photograph of late-autumn walled cottage garden with hollyhocks; CC-BY 4.0 by <photographer name>, source <URL>",
"seed": 0,
"sampler": "n/a",
"params": { "notes": "Phase 1 fallback per RESEARCH Open Question #5; replaceable in Phase 5+" }
}
```
For licensed photographs, prefer `model_id: "photograph:cc-by:<photographer>"` to make the provenance audit trail more searchable in Phase 5.
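Before committing a sidecar, it can help to sanity-check its shape locally. The sketch below is a hand-rolled stand-in, not the project's Zod `ProvenanceSchema`: `checkSidecar` and `CheckResult` are hypothetical names, and the field types (four strings, an integer `seed`, an object `params`) are assumptions inferred from the example sidecars above. Only the optional `provenance_schema_version` rule (a positive integer when present) is taken from this summary.

```typescript
type CheckResult = { ok: true } | { ok: false; errors: string[] };

// Assumed to be strings, based on the example sidecars.
const REQUIRED_STRING_FIELDS = ['model_id', 'checkpoint_hash', 'prompt', 'sampler'] as const;

// Hypothetical dependency-free check of the six required fields.
function checkSidecar(raw: unknown): CheckResult {
  if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) {
    return { ok: false, errors: ['sidecar must be a JSON object'] };
  }
  const sidecar = raw as Record<string, unknown>;
  const errors: string[] = [];

  for (const field of REQUIRED_STRING_FIELDS) {
    if (typeof sidecar[field] !== 'string') errors.push(`${field}: expected string`);
  }
  const seed = sidecar['seed'];
  if (typeof seed !== 'number' || !Number.isInteger(seed)) {
    errors.push('seed: expected integer');
  }
  const params = sidecar['params'];
  if (typeof params !== 'object' || params === null || Array.isArray(params)) {
    errors.push('params: expected object');
  }
  // Optional: when present it must be a positive integer (unset means version 1).
  const version = sidecar['provenance_schema_version'];
  if (version !== undefined && (typeof version !== 'number' || !Number.isInteger(version) || version <= 0)) {
    errors.push('provenance_schema_version: expected positive integer');
  }
  return errors.length === 0 ? { ok: true } : { ok: false, errors };
}

// A Path B style entry passes; an incomplete entry reports each problem.
const humanEntry = checkSidecar({
  model_id: 'human',
  checkpoint_hash: 'n/a',
  prompt: 'Photograph of a walled cottage garden',
  seed: 0,
  sampler: 'n/a',
  params: { notes: 'Phase 1 fallback' },
});
const brokenEntry = checkSidecar({ model_id: 'human', seed: 'zero' });
console.log(humanEntry.ok, brokenEntry.ok);
```

The authoritative check remains `node scripts/validate-assets.mjs`; this sketch only mirrors its intent.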
### Path C — Defer with explicit IOU
If neither Path A nor Path B is feasible right now, commit **two** placeholder images with full honest provenance saying "placeholder" (enough to prove the schema accepts real entries) and **record the IOU in a dedicated file** at `.planning/phases/01-foundations-and-doctrine/01-05-IOU.md` (do NOT edit `.planning/STATE.md` from a phase-internal task — STATE.md is orchestrator-owned, per WARNING 5 fix in the plan). The IOU file template is in the plan under Task 2's `how-to-verify` step 8.
This still satisfies CONTEXT D-01's "10–20 hand-curated" loosely (with explicit IOU) and keeps the rest of Phase 1 unblocked.
### After choosing a path
Whichever path you take, also write `assets/north-stars/README.md` (~10 lines) documenting:
- What this directory is (the visual ground truth for Phase 5+ regression).
- Which path was chosen (A/B/C) and why.
- How to add new images (sidecar naming convention: `<filename>.<ext>.provenance.json`; the 6 required fields).
- When this set will be revisited (Phase 5 is the planned consolidation point per CONTEXT D-02).
Then verify and commit:
```bash
node scripts/validate-assets.mjs # must exit 0 with "all <N> assets carry valid provenance"
npm test # must remain green
git add assets/north-stars/
git commit -m "feat(01-05): commit <N> north-star reference images with provenance sidecars (path <A|B|C>)"
```
### Resume signal
When you're done, you can either:
- Re-invoke the orchestrator (e.g., `/gsd-execute-phase 1` or `/gsd-execute-plan 01 05 --resume`) to let it pick up Plan 05's now-completed state and continue Wave 2.
- Or simply continue manually — Plan 05's Task 2 checkpoint is satisfied as soon as `assets/north-stars/` contains the curated set with valid sidecars and the validator+tests still pass. Plan 06 (doctrine docs) and Plan 07 (CI workflow) do not depend on Plan 05's content, only on its validator existing — which it does.
## Acceptance Criteria — Task 1 Verification
| Criterion | Status |
|-----------|--------|
| `node --check scripts/validate-assets.mjs` clean | OK |
| Schema covers 6 required fields + `provenance_schema_version` (≥7 grep hits) | OK (8 hits) |
| `process.env.ASSETS_DIR` env override present | OK |
| `__samples__/refused` exclusion present | OK |
| `process.exit(1)` on failure path | OK |
| `assets/__samples__/refused/no-provenance.png` exists, no sidecar | OK |
| Test fixture uses `os.tmpdir()` + `mkdtemp` | OK |
| Test passes `ASSETS_DIR` via `env:` of `execFile` (not by writing to disk) | OK |
| No `assets/__test_fixtures__/missing` real-tree pollution path | OK (no such path) |
| `node scripts/validate-assets.mjs` exits 0 against real /assets/ | OK (`all 0 assets carry valid provenance`) |
| `npx vitest run scripts/validate-assets.test.ts` green | OK (2 passed in 941ms) |
| Test cleans up tmpdir via `afterAll` + `rm` | OK |
| Full `npm test` green | OK (3 passed in 875ms) |
## Decisions Made
- **Validator skips `README.md` files** in addition to `.gitkeep` and `.provenance.json`. Task 2's `assets/north-stars/README.md` would otherwise demand a sidecar of its own, which is conceptually wrong — READMEs are documentation, not provenanced assets. Adding this skip in Task 1 avoids a "fix the validator after Task 2 commits the README" round-trip.
- **Optional `provenance_schema_version` is `z.number().int().positive().optional()`** — implicit/unset means schema version 1; Phase 5 vendor consolidation can bump to 2 when introducing new required fields (e.g., `human_reviewed_by` once external contributors enter the picture per RESEARCH § Security Domain).
- **`vitest.config.ts` `include` glob extended by one pattern** (`scripts/**/*.test.ts`) — the existing `scripts/**/*.test.mjs` pattern would not pick up the `.test.ts` file. Considered renaming to `.test.mjs` instead, but the test needs TypeScript for `tmpDir: string` / `fixtureFile: string` typing and for the catch-block `err: any` assertion. The single-line config tweak is the minimum-impact fix.
- **Refused-sample is a real PNG, not an empty file**, per CONTEXT D-03's "real refused asset" language. 68-byte 1x1 transparent PNG generated from the standard PNG byte sequence — small enough to be commit-noise-free, real enough to satisfy the gate-proof intent.
## Drift from Plan
None of substance. The plan's verbatim validator code from RESEARCH § Pattern 6 was used as-is, with the documented forward-compat additions:
- Optional `provenance_schema_version` field (RESEARCH Open Question #2 explicitly recommends this).
- `README.md` skip (necessary for Task 2's directory README).
- `assets/__test_fixtures__/refused` added to `REFUSED_PREFIXES` alongside `assets/__samples__/refused` (defensive — neither path exists yet, but if a future plan needs an alternate refused-fixture root the exclusion already covers it).
- Windows-path normalization (`replaceAll('\\', '/')`) — required for the `startsWith` exclusion to work on Windows where the project is being developed.
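To see why the normalization matters for the `startsWith` exclusion, here is a minimal sketch. The two prefixes are the ones listed in this summary; `normalizePath` and `isRefused` are hypothetical names, and `split`/`join` stands in for the validator's `replaceAll('\\', '/')` with the same effect.

```typescript
const REFUSED_PREFIXES: readonly string[] = [
  'assets/__samples__/refused',
  'assets/__test_fixtures__/refused',
];

// Convert Windows separators to forward slashes so prefixes compare equal.
function normalizePath(p: string): string {
  return p.split('\\').join('/');
}

// True when the (normalized) path sits under one of the refused roots.
function isRefused(p: string): boolean {
  const norm = normalizePath(p);
  return REFUSED_PREFIXES.some((prefix) => norm.startsWith(prefix));
}

console.log(isRefused('assets\\__samples__\\refused\\no-provenance.png')); // true
console.log(isRefused('assets/north-stars/garden.png'));                   // false
```

Without the normalization step, the first call would return `false` on Windows and the refused sample would be reported as a missing-sidecar failure. A plain `startsWith` also matches sibling directories that merely share the prefix; a stricter variant could compare against `prefix + '/'`.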
## Deviations from Plan
### Auto-fixed Issues
**1. [Rule 3 — Blocking] Extended vitest.config.ts include glob to pick up `scripts/**/*.test.ts`**
- **Found during:** Task 1 Step 6 (running `npx vitest run scripts/validate-assets.test.ts`)
- **Issue:** The existing `vitest.config.ts` `include` glob from Plan 01 was `['src/**/*.test.ts', 'src/**/*.test.tsx', 'scripts/**/*.test.mjs']`. Vitest reported `No test files found, exiting with code 1` because the new `.test.ts` file under `scripts/` matched neither pattern.
- **Fix:** Added `'scripts/**/*.test.ts'` as a fourth include entry. Single-line additive change; affects no other plan.
- **Files modified:** `vitest.config.ts`.
- **Verification:** `npx vitest run scripts/validate-assets.test.ts` reports `2 passed (2)` in 941ms.
- **Committed in:** `da3f55c` (Task 1 commit, alongside the validator and test).
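For orientation, the resulting `include` list would look roughly like the fragment below. Only the four glob patterns come from this summary; the rest of the real `vitest.config.ts` (for example the happy-dom environment wired in Plan 01) is omitted, so treat this as a sketch rather than the committed file.

```typescript
// vitest.config.ts (fragment; only `include` is taken from this summary)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: [
      'src/**/*.test.ts',
      'src/**/*.test.tsx',
      'scripts/**/*.test.mjs',
      'scripts/**/*.test.ts', // added so scripts/validate-assets.test.ts is discovered
    ],
  },
});
```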
**2. [Rule 2 — Missing critical] Validator skips `README.md` files**
- **Found during:** Task 1 Step 1 (writing the validator)
- **Issue:** Task 2's `how-to-verify` step 3 directs the user to add `assets/north-stars/README.md`. The validator as specified in RESEARCH § Pattern 6 verbatim would demand a sidecar for the README itself, which is wrong — READMEs are documentation, not provenanced assets.
- **Fix:** Added `if (basename(norm) === 'README.md') continue;` in the walk loop.
- **Files modified:** `scripts/validate-assets.mjs`.
- **Verification:** when the user (Task 2) commits `assets/north-stars/README.md`, the validator will skip it correctly.
- **Committed in:** `da3f55c` (Task 1 commit).
**Total deviations:** 2 auto-fixed (1 blocking, 1 missing critical). Both are mechanical fixes called out in the plan's own action block (the README skip is implicitly required by Task 2's `how-to-verify`; the vitest.config tweak is a config-discoverability blocker explicitly authorized by Rule 3).
## Issues Encountered
- **`node_modules/` not present in the worktree** — the agent worktree at `.claude/worktrees/agent-a096e5ee44a2c6d1c` is git-only, no shared node_modules from the main repo. Resolved by running `npm ci` once at agent start (~11 seconds, 209 packages from `package-lock.json`). This is expected for parallel-worktree execution and does not change any committed file.
## Authentication Gates
None — Phase 1 plumbing only; no external auth needed.
## Threat Flags
None — both threats in the plan's `<threat_model>` are explicitly `accept` per phase scope:
- T-01-06 (Spoofing — provenance sidecar fabrication): out of scope for Phase 1; deferred to Phase 8+ when external contributors enter the picture.
- T-01-07 (Tampering — path traversal via sidecar filename): not exploitable. The validator never resolves paths *from* sidecar contents; it only reads sidecars at deterministic sibling paths derived from the walked file path.
## Known Stubs
- **`assets/north-stars/` is not yet populated.** This is the Task 2 deferral above. The validator will return `[provenance] all 0 assets carry valid provenance.` until the human curates the north-star set (Path A / B / C). Once populated, the count `<N>` will be 10–20 per CONTEXT D-01 (or 2 with an IOU per Path C).
- **`assets/north-stars/README.md` is not yet written** — Task 2 owns it. The validator already knows to skip it (Rule 2 fix above).
These are intentional stubs that exist *because* the plan halts at the human-curate checkpoint. They will be resolved by the resume protocol above.
## Next Plan Readiness
- **Plan 06 (doctrine docs):** Unaffected — pure markdown plan, no code dependencies on Plan 05.
- **Plan 07 (CI workflow):** Ready as soon as Task 2 completes. The composite `npm run ci` script (`npm run lint && npm run test && npm run validate:assets && npm run build`) currently exits non-zero only because the lint+build sub-steps depend on Plan 02 (firewall+lint) landing — the `validate:assets` sub-step is now green.
- **Phase 5 (production-volume asset pipeline):** Has its working seed once Task 2 lands: the 10–20 north-star images become the visual-regression baseline, and the `provenance_schema_version` field is reserved for any vendor-consolidation schema bump.
## Self-Check
Verified before returning:
- [x] `scripts/validate-assets.mjs` exists at the worktree root and is committed (`da3f55c`).
- [x] `scripts/validate-assets.test.ts` exists and is committed.
- [x] `assets/__samples__/refused/no-provenance.png` exists with no sidecar (verified: `! test -f assets/__samples__/refused/no-provenance.png.provenance.json`).
- [x] `assets/__samples__/refused/.gitkeep` exists.
- [x] `vitest.config.ts` modification committed in `da3f55c`.
- [x] Commit `da3f55c` is present in `git log --oneline`.
- [x] `node scripts/validate-assets.mjs` exits 0 against the current `/assets/` tree.
- [x] `npm test` green (3 passed across 2 files).
- [x] No modifications to `.planning/STATE.md` or `.planning/ROADMAP.md` (orchestrator-owned per worktree contract).
- [x] No `.claude/settings.local.json` committed (correctly left untracked).
**Self-Check: PASSED**
---
*Phase: 01-foundations-and-doctrine*
*Plan: 05 of 7*
*Halted at: 2026-05-09T03:29:43Z (Task 2 human-curate checkpoint)*
*Resume: commit `assets/north-stars/<10–20 images>` + sidecars + README.md, then continue Wave 2*