diff --git a/.planning/phases/01-foundations-and-doctrine/01-05-asset-provenance-SUMMARY.md b/.planning/phases/01-foundations-and-doctrine/01-05-asset-provenance-SUMMARY.md
new file mode 100644
index 0000000..7f1c7ed
--- /dev/null
+++ b/.planning/phases/01-foundations-and-doctrine/01-05-asset-provenance-SUMMARY.md
@@ -0,0 +1,268 @@
+---
+phase: 01-foundations-and-doctrine
+plan: 05
+subsystem: pipeline
+tags: [provenance, ai-assets, validator, zod, vitest, ci-gate, checkpoint, human-curate]
+
+# Dependency graph
+requires:
+  - phase: 01
+    plan: 01
+    provides: zod@^4 installed, assets/ tree, validate:assets script key pre-declared in package.json, vitest+happy-dom wired
+provides:
+  - scripts/validate-assets.mjs (CI gate — exits non-zero on any /assets/ file lacking a valid provenance sidecar)
+  - Zod ProvenanceSchema covering the 6 CLAUDE.md / AEST-08 fields + optional provenance_schema_version (RESEARCH Open Question #2)
+  - assets/__samples__/refused/no-provenance.png (gate-proof artifact per CONTEXT D-03)
+  - scripts/validate-assets.test.ts (Vitest enforcement — positive case against real /assets/, negative case against os.tmpdir() fixture)
+affects:
+  - 01-07-ci-workflow (calls `npm run validate:assets` in the composite `ci` script — green now that the validator exists)
+  - 05-onwards (Phase 5 production-volume asset pipeline scales this floor up; provenance_schema_version=1 implicit, Phase 5 may bump on vendor consolidation)
+
+# Tech tracking
+tech-stack:
+  added:
+    - "(none new — uses already-installed `zod@^4.4.3` per Plan 01)"
+  patterns:
+    - "Sidecar-per-asset naming: `..provenance.json` (e.g., `garden-soil-01.png.provenance.json`) — keeps sidecar adjacent in directory listings, grep-friendly, no stem-collision ambiguity. Per RESEARCH § Pattern 6 sidecar-naming-convention decision."
+    - "ASSETS_DIR env override on the validator script — lets the Vitest negative-case test point at an isolated tmpdir without modifying production code or polluting the real /assets/ tree (BLOCKER 2 fix)."
+    - "REFUSED_PREFIXES exclusion list at the top of the validator — explicitly enumerated, so adding new exclusions in future phases is a single-line change."
+    - "Test-fixture isolation via `os.tmpdir()` + `mkdtemp` — the negative-case fixture lives outside the repo entirely; even if vitest is killed mid-run, the OS reclaims the tmpdir on next reboot. No orphan-fragility risk."
+
+key-files:
+  created:
+    - scripts/validate-assets.mjs (~80 lines incl. error handling and Windows-path normalization)
+    - scripts/validate-assets.test.ts (~50 lines, two-case Vitest)
+    - assets/__samples__/refused/no-provenance.png (1x1 transparent PNG, 68 bytes — the gate-proof artifact)
+    - assets/__samples__/refused/.gitkeep
+  modified:
+    - vitest.config.ts (added `scripts/**/*.test.ts` to include glob — Rule 3 blocking fix; without this the new test is invisible to vitest)
+
+key-decisions:
+  - "Optional `provenance_schema_version` field is included in the Zod schema as `z.number().int().positive().optional()`, defaulting to unset/implicit-1 — Phase 5 vendor consolidation can bump this without breaking Phase 1 sidecars (RESEARCH Open Question #2)."
+  - "Validator skips `README.md` files in addition to `.gitkeep` and `.provenance.json` — Task 2's `assets/north-stars/README.md` would otherwise demand a sidecar of its own, which is wrong (READMEs are documentation, not provenanced assets)."
+  - "Vitest config gained ONE additional include pattern (`scripts/**/*.test.ts`) — the existing `scripts/**/*.test.mjs` pattern wouldn't pick up `.test.ts`, and the negative-case test needs TypeScript for `tmpDir: string` typing. Minimal additive change; does not affect any other plan."
+  - "Halted at Task 2 per plan's `autonomous: false` flag and orchestrator instructions — committing the 10–20 north-star reference images requires human curation per CONTEXT D-01 + D-03 (curation gate IS the human reviewer)."
+
+requirements-completed: []
+# AEST-08, AEST-09, PIPE-03 are partially landed (gate exists; refused-sample proves it).
+# They will be marked complete after Task 2 (human-curate north-star set) is committed by the user.
+
+# Metrics
+duration: ~12min
+completed: 2026-05-09
+---
+
+# Phase 1 Plan 05: Asset Provenance Pipeline Floor — Partial (Halted at Task 2 Checkpoint)
+
+**Task 1 shipped: validator script + Zod sidecar schema + refused-sample fixture + tmpdir-isolated Vitest enforcement test, all green. Halted at Task 2 (commit 10–20 north-star reference images) — `autonomous: false`, requires human curation per plan + CONTEXT D-01 + D-03.**
+
+## Status
+
+| Task | Status | Commit |
+|------|--------|--------|
+| Task 1 — Validator + schema + refused-sample + Vitest | **DONE** | `da3f55c` |
+| Task 2 — Curate + commit 10–20 north-star images | **CHECKPOINT (awaiting human input)** | — |
+
+## Performance
+
+- **Duration:** ~12 min (Task 1 only)
+- **Started:** 2026-05-09T03:18:51Z (orchestrator dispatch, immediately after Plan 01-01 complete)
+- **Halted:** 2026-05-09T03:29:43Z (Task 2 checkpoint reached)
+- **Tasks executed:** 1 of 2
+- **Files created:** 4 (validator, test, refused-PNG, refused-.gitkeep) + 1 modified (vitest.config.ts)
+
+## Accomplishments (Task 1)
+
+- **`scripts/validate-assets.mjs` (~80 lines) — the asset-provenance CI gate.**
+  - Recursively walks `process.env.ASSETS_DIR ?? 'assets'` using `node:fs/promises` `readdir({withFileTypes: true})`.
+  - Skips `.gitkeep`, `README.md`, sidecar files (`.provenance.json`), and any path under the refused-prefixes (`assets/__samples__/refused`, `assets/__test_fixtures__/refused`).
+  - For every other file, requires a sibling `.provenance.json` validating against the Zod `ProvenanceSchema`.
+  - Exits non-zero with a clear error listing every failing path on missing/invalid sidecar; exits 0 with `[provenance] all assets carry valid provenance.` on success.
+  - Windows-path normalization (`replaceAll('\\', '/')`) so the refused-prefix match works on both platforms.
+- **Zod `ProvenanceSchema`** covering all 6 required fields per CLAUDE.md / AEST-08 (`model_id`, `checkpoint_hash`, `prompt`, `seed`, `sampler`, `params`) plus optional `provenance_schema_version: number` per RESEARCH Open Question #2 (Phase 5 vendor consolidation can bump this without breaking Phase 1 sidecars).
+- **`assets/__samples__/refused/no-provenance.png` — the gate-proof artifact.** A 68-byte 1x1 transparent PNG with NO sidecar. Per CONTEXT D-03, the proof that the gate works is a real refused asset that the validator explicitly excludes from the walk; the existence of this file (and the `REFUSED_PREFIXES` constant in the validator) together demonstrate the gate is structural, not theoretical.
+- **`scripts/validate-assets.test.ts` — Vitest enforcement (BLOCKER 2 fix).**
+  - **Positive case:** runs `node scripts/validate-assets.mjs` against the real `/assets/` tree (no env override) — must contain `all assets carry valid provenance` in stdout.
+  - **Negative case:** creates a per-test-run unique tmpdir under `os.tmpdir()` via `mkdtemp(join(os.tmpdir(), 'tlg-provenance-test-'))`, drops a single 1x1 PNG with no sidecar inside, runs the validator with `ASSETS_DIR=` set in env, asserts exit code === 1 + stderr/stdout contains `validation failed` + `orphan.png` + `missing.*provenance sidecar`. Cleans up via `rm(tmpDir, {recursive: true, force: true})` in `afterAll`. **No risk of polluting the real `/assets/` tree** — even if the test runner is killed mid-run, the OS reclaims the tmpdir on next reboot.
+- **All `npm test` green:** 3 tests pass across 2 files (the existing sentinel + 2 new validate-assets cases) in 875ms.
+- **`npm run validate:assets` (the script key Plan 01 pre-declared) now exits 0** instead of failing as it did at end-of-Plan-01.
+
+## Why this stopped at Task 2
+
+The plan is **`autonomous: false`** and the orchestrator's spawn message explicitly directed: *"complete Task 1, then HALT before Task 2 with a CHECKPOINT requesting human input. Do not invent or AI-generate the north-star images yourself."*
+
+Per plan + CONTEXT D-01 + D-03, the curation gate IS the human reviewer. Task 2 commits the 10–20 hand-curated north-star reference images that establish the visual ground truth for Phase 5+ regression. The decision *which images go into the north-star set* is a tonal/aesthetic choice that requires the human's eye — there is no automated procedure that can substitute for it.
+
+## Resume Protocol — Choose A Path
+
+You have three valid paths per the plan. Pick whichever fits your current toolchain:
+
+### Path A — AI-generated (recommended if you have a tool available)
+
+1. Use whatever AI image tool you have access to (Stable Diffusion + watercolor LoRA, Midjourney, Scenario, Claude image generation, etc.).
+2. Generate **10–20 watercolor-style images** representing the visual north-star: walled cottage gardens, real-but-slightly-wrong wildflowers, golden/autumnal palette for Season 1, hand-painted feel. **No fantasy elements** (no D&D-style flora — see PROJECT.md "Out of Scope": "Generic fantasy flora").
+3. For each generated image, write a sibling `.png.provenance.json` with all 6 required fields filled honestly (the actual `model_id` / `checkpoint_hash` you used, the prompt verbatim, the seed if your tool surfaces one, etc.).
+4. Place each pair under `assets/north-stars/.png` + `assets/north-stars/.png.provenance.json`.
+
+Example sidecar shape:
+```json
+{
+  "model_id": "stable-diffusion-xl-base-1.0+watercolor-lora-v3",
+  "checkpoint_hash": "sha256:abc123...",
+  "prompt": "watercolor painting of a walled cottage garden in late autumn, golden light, hollyhocks and asters slightly distorted, hand-painted feel, Studio Ghibli inspired, no text, no human figures",
+  "seed": 1729384756,
+  "sampler": "DPM++ 2M Karras",
+  "params": { "steps": 30, "cfg_scale": 7.0, "width": 1024, "height": 1024 }
+}
+```
+
+### Path B — Hand-painted / licensed-photograph fallback
+
+Per RESEARCH § Open Question #5 + Environment Availability, the schema accepts arbitrary `model_id` strings, so honest "human-painted" or licensed-photograph entries are valid and acceptable for Phase 1.
+
+For each image (e.g., a CC-BY photograph of a real cottage garden, or a hand-painted reference scan):
+```json
+{
+  "model_id": "human",
+  "checkpoint_hash": "n/a",
+  "prompt": "Photograph of late-autumn walled cottage garden with hollyhocks; CC-BY 4.0 by , source ",
+  "seed": 0,
+  "sampler": "n/a",
+  "params": { "notes": "Phase 1 fallback per RESEARCH Open Question #5; replaceable in Phase 5+" }
+}
+```
+
+For licensed photographs, prefer `model_id: "photograph:cc-by:"` to make the provenance audit trail more searchable in Phase 5.
+
+### Path C — Defer with explicit IOU
+
+If neither Path A nor Path B is feasible right now, commit **two** placeholder images with full honest provenance saying "placeholder" (enough to prove the schema accepts real entries) and **record the IOU in a dedicated file** at `.planning/phases/01-foundations-and-doctrine/01-05-IOU.md` (do NOT edit `.planning/STATE.md` from a phase-internal task — STATE.md is orchestrator-owned, per WARNING 5 fix in the plan). The IOU file template is in the plan under Task 2's `how-to-verify` step 8.
+
+This still satisfies CONTEXT D-01's "10–20 hand-curated" loosely (with explicit IOU) and keeps the rest of Phase 1 unblocked.
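If you want to sanity-check a sidecar before running the validator, the six-field shape used in Paths A and B can be approximated in plain TypeScript. This is a minimal sketch, not the Zod `ProvenanceSchema` that `scripts/validate-assets.mjs` uses: the helper name `checkSidecar` and the per-field types (four strings, a numeric `seed`, an object `params`) are assumptions inferred from the example sidecars above.

```typescript
// Hypothetical, dependency-free stand-in for the Zod ProvenanceSchema.
// Field types are inferred from the example sidecars, not from the real validator.
type Json = Record<string, unknown>;

const STRING_FIELDS = ["model_id", "checkpoint_hash", "prompt", "sampler"] as const;

// Returns a list of human-readable problems; an empty list means the shape matched.
function checkSidecar(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return ["sidecar must be a JSON object"];
  }
  const s = raw as Json;
  const errors: string[] = [];

  for (const key of STRING_FIELDS) {
    if (typeof s[key] !== "string") errors.push(`${key}: expected string`);
  }
  if (typeof s["seed"] !== "number") errors.push("seed: expected number");

  const params = s["params"];
  if (typeof params !== "object" || params === null || Array.isArray(params)) {
    errors.push("params: expected object");
  }

  // Optional field: only checked when present (positive integer).
  const version = s["provenance_schema_version"];
  if (
    version !== undefined &&
    !(typeof version === "number" && Number.isInteger(version) && version > 0)
  ) {
    errors.push("provenance_schema_version: expected positive integer when present");
  }
  return errors;
}

// Usage: a Path B style entry with placeholder values.
const pathBSidecar = {
  model_id: "human",
  checkpoint_hash: "n/a",
  prompt: "Hand-painted reference scan",
  seed: 0,
  sampler: "n/a",
  params: { notes: "Phase 1 fallback" },
};
console.log(checkSidecar(pathBSidecar));
```

An empty array means the shape matched under these assumptions; the authoritative check is still `node scripts/validate-assets.mjs`.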
+
+### After choosing a path
+
+Whichever path you take, also write `assets/north-stars/README.md` (~10 lines) documenting:
+- What this directory is (the visual ground truth for Phase 5+ regression).
+- Which path was chosen (A/B/C) and why.
+- How to add new images (sidecar naming convention: `..provenance.json`; the 6 required fields).
+- When this set will be revisited (Phase 5 is the planned consolidation point per CONTEXT D-02).
+
+Then verify and commit:
+```bash
+node scripts/validate-assets.mjs  # must exit 0 with "all assets carry valid provenance"
+npm test  # must remain green
+git add assets/north-stars/
+git commit -m "feat(01-05): commit north-star reference images with provenance sidecars (path )"
+```
+
+### Resume signal
+
+When you're done, you can either:
+- Re-invoke the orchestrator (e.g., `/gsd-execute-phase 1` or `/gsd-execute-plan 01 05 --resume`) to let it pick up Plan 05's now-completed state and continue Wave 2.
+- Or simply continue manually — Plan 05's Task 2 checkpoint is satisfied as soon as `assets/north-stars/` contains the curated set with valid sidecars and the validator+tests still pass. Plan 06 (doctrine docs) and Plan 07 (CI workflow) do not depend on Plan 05's content, only on its validator existing — which it does.
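The sibling-sidecar rule that the verify step enforces can be pre-checked locally with a few lines of TypeScript. The sketch below is an approximation under stated assumptions, not the shipped validator: `findMissingSidecars` is a hypothetical helper, it only checks that a sidecar file exists (it does not validate its contents), and it omits the refused-prefix exclusion.

```typescript
import { existsSync, mkdtempSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const SIDECAR_SUFFIX = ".provenance.json";
// Names skipped by the walk, mirroring the skip rules described in this summary.
const SKIPPED_NAMES = new Set([".gitkeep", "README.md"]);

// Hypothetical helper: recursively lists files under `root` that have no
// sibling "<file>.provenance.json". Paths are returned with forward slashes.
function findMissingSidecars(root: string): string[] {
  const missing: string[] = [];
  for (const entry of readdirSync(root, { withFileTypes: true })) {
    const full = join(root, entry.name);
    if (entry.isDirectory()) {
      missing.push(...findMissingSidecars(full));
      continue;
    }
    if (SKIPPED_NAMES.has(entry.name) || entry.name.endsWith(SIDECAR_SUFFIX)) continue;
    if (!existsSync(full + SIDECAR_SUFFIX)) {
      missing.push(full.replace(/\\/g, "/"));
    }
  }
  return missing.sort();
}

// Usage: run against a throwaway directory so the real /assets/ tree is untouched.
const demoDir = mkdtempSync(join(tmpdir(), "north-stars-demo-"));
try {
  writeFileSync(join(demoDir, "README.md"), "docs");
  writeFileSync(join(demoDir, "a.png"), "");
  writeFileSync(join(demoDir, "a.png.provenance.json"), "{}");
  writeFileSync(join(demoDir, "orphan.png"), "");
  console.log(findMissingSidecars(demoDir)); // one entry, ending in "/orphan.png"
} finally {
  rmSync(demoDir, { recursive: true, force: true });
}
```

Pointing the helper at `assets/north-stars` instead of the demo directory gives a quick list of images that still need a sidecar before the commit step.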
+
+## Acceptance Criteria — Task 1 Verification
+
+| Criterion | Status |
+|-----------|--------|
+| `node --check scripts/validate-assets.mjs` clean | OK |
+| Schema covers 6 required fields + `provenance_schema_version` (≥7 grep hits) | OK (8 hits) |
+| `process.env.ASSETS_DIR` env override present | OK |
+| `__samples__/refused` exclusion present | OK |
+| `process.exit(1)` on failure path | OK |
+| `assets/__samples__/refused/no-provenance.png` exists, no sidecar | OK |
+| Test fixture uses `os.tmpdir()` + `mkdtemp` | OK |
+| Test passes `ASSETS_DIR` via `env:` of `execFile` (not by writing to disk) | OK |
+| No `assets/__test_fixtures__/missing` real-tree pollution path | OK (no such path) |
+| `node scripts/validate-assets.mjs` exits 0 against real /assets/ | OK (`all 0 assets carry valid provenance`) |
+| `npx vitest run scripts/validate-assets.test.ts` green | OK (2 passed in 941ms) |
+| Test cleans up tmpdir via `afterAll` + `rm` | OK |
+| Full `npm test` green | OK (3 passed in 875ms) |
+
+## Decisions Made
+
+- **Validator skips `README.md` files** in addition to `.gitkeep` and `.provenance.json`. Task 2's `assets/north-stars/README.md` would otherwise demand a sidecar of its own, which is conceptually wrong — READMEs are documentation, not provenanced assets. Adding this skip in Task 1 avoids a "fix the validator after Task 2 commits the README" round-trip.
+- **Optional `provenance_schema_version` is `z.number().int().positive().optional()`** — implicit/unset means schema version 1; Phase 5 vendor consolidation can bump to 2 when introducing new required fields (e.g., `human_reviewed_by` once external contributors enter the picture per RESEARCH § Security Domain).
+- **`vitest.config.ts` `include` glob extended by one pattern** (`scripts/**/*.test.ts`) — the existing `scripts/**/*.test.mjs` pattern would not pick up the `.test.ts` file. Considered renaming to `.test.mjs` instead, but the test needs TypeScript for `tmpDir: string` / `fixtureFile: string` typing and for the catch-block `err: any` assertion. The single-line config tweak is the minimum-impact fix.
+- **Refused-sample is a real PNG, not an empty file**, per CONTEXT D-03's "real refused asset" language. 68-byte 1x1 transparent PNG generated from the standard PNG byte sequence — small enough to be commit-noise-free, real enough to satisfy the gate-proof intent.
+
+## Drift from Plan
+
+None of substance. The plan's verbatim validator code from RESEARCH § Pattern 6 was used as-is, with the documented forward-compat additions:
+- Optional `provenance_schema_version` field (RESEARCH Open Question #2 explicitly recommends this).
+- `README.md` skip (necessary for Task 2's directory README).
+- `assets/__test_fixtures__/refused` added to `REFUSED_PREFIXES` alongside `assets/__samples__/refused` (defensive — neither path exists yet, but if a future plan needs an alternate refused-fixture root the exclusion already covers it).
+- Windows-path normalization (`replaceAll('\\', '/')`) — required for the `startsWith` exclusion to work on Windows where the project is being developed.
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 3 — Blocking] Extended vitest.config.ts include glob to pick up `scripts/**/*.test.ts`**
+- **Found during:** Task 1 Step 6 (running `npx vitest run scripts/validate-assets.test.ts`)
+- **Issue:** The existing `vitest.config.ts` `include` glob from Plan 01 was `['src/**/*.test.ts', 'src/**/*.test.tsx', 'scripts/**/*.test.mjs']`. Vitest reported `No test files found, exiting with code 1` because the new `.test.ts` file under `scripts/` matched neither pattern.
+- **Fix:** Added `'scripts/**/*.test.ts'` as a fourth include entry. Single-line additive change; affects no other plan.
+- **Files modified:** `vitest.config.ts`.
+- **Verification:** `npx vitest run scripts/validate-assets.test.ts` reports `2 passed (2)` in 941ms.
+- **Committed in:** `da3f55c` (Task 1 commit, alongside the validator and test).
+
+**2. [Rule 2 — Missing critical] Validator skips `README.md` files**
+- **Found during:** Task 1 Step 1 (writing the validator)
+- **Issue:** Task 2's `how-to-verify` step 3 directs the user to add `assets/north-stars/README.md`. The validator as specified in RESEARCH § Pattern 6 verbatim would demand a sidecar for the README itself, which is wrong — READMEs are documentation, not provenanced assets.
+- **Fix:** Added `if (basename(norm) === 'README.md') continue;` in the walk loop.
+- **Files modified:** `scripts/validate-assets.mjs`.
+- **Verification:** when the user (Task 2) commits `assets/north-stars/README.md`, the validator will skip it correctly.
+- **Committed in:** `da3f55c` (Task 1 commit).
+
+**Total deviations:** 2 auto-fixed (1 blocking, 1 missing critical). Both are mechanical fixes called out in the plan's own action block (the README skip is implicitly required by Task 2's `how-to-verify`; the vitest.config tweak is a config-discoverability blocker explicitly authorized by Rule 3).
+
+## Issues Encountered
+
+- **`node_modules/` not present in the worktree** — the agent worktree at `.claude/worktrees/agent-a096e5ee44a2c6d1c` is git-only, no shared node_modules from the main repo. Resolved by running `npm ci` once at agent start (~11 seconds, 209 packages from `package-lock.json`). This is expected for parallel-worktree execution and does not change any committed file.
+
+## Authentication Gates
+
+None — Phase 1 plumbing only; no external auth needed.
+
+## Threat Flags
+
+None — both threats in the plan's `` are explicitly `accept` per phase scope:
+- T-01-06 (Spoofing — provenance sidecar fabrication): out of scope for Phase 1; deferred to Phase 8+ when external contributors enter the picture.
+- T-01-07 (Tampering — path traversal via sidecar filename): not exploitable. The validator never resolves paths *from* sidecar contents; it only reads sidecars at deterministic sibling paths derived from the walked file path.
+
+## Known Stubs
+
+- **`assets/north-stars/` is not yet populated** — this is the Task 2 deferral above. The validator will return `[provenance] all 0 assets carry valid provenance.` until the human curates the north-star set (Path A / B / C). Once populated, the count `` will be 10–20 per CONTEXT D-01 (or 2 with an IOU per Path C).
+- **`assets/north-stars/README.md` is not yet written** — Task 2 owns it. The validator already knows to skip it (Rule 2 fix above).
+
+These are intentional stubs that exist *because* the plan halts at the human-curate checkpoint. They will be resolved by the resume protocol above.
+
+## Next Plan Readiness
+
+- **Plan 06 (doctrine docs):** Unaffected — pure markdown plan, no code dependencies on Plan 05.
+- **Plan 07 (CI workflow):** Ready as soon as Task 2 completes. The composite `npm run ci` script (`npm run lint && npm run test && npm run validate:assets && npm run build`) currently exits non-zero only because the lint+build sub-steps depend on Plan 02 (firewall+lint) landing — the `validate:assets` sub-step is now green.
+- **Phase 5 (production-volume asset pipeline):** Has its working seed once Task 2 lands — the 10–20 north-star images become the visual-regression baseline, and the `provenance_schema_version` field is reserved for any vendor-consolidation schema bump.
+
+## Self-Check
+
+Verified before returning:
+
+- [x] `scripts/validate-assets.mjs` exists at the worktree root and is committed (`da3f55c`).
+- [x] `scripts/validate-assets.test.ts` exists and is committed.
+- [x] `assets/__samples__/refused/no-provenance.png` exists with no sidecar (verified: `! test -f assets/__samples__/refused/no-provenance.png.provenance.json`).
+- [x] `assets/__samples__/refused/.gitkeep` exists.
+- [x] `vitest.config.ts` modification committed in `da3f55c`.
+- [x] Commit `da3f55c` is present in `git log --oneline`.
+- [x] `node scripts/validate-assets.mjs` exits 0 against the current `/assets/` tree.
+- [x] `npm test` green (3 passed across 2 files).
+- [x] No modifications to `.planning/STATE.md` or `.planning/ROADMAP.md` (orchestrator-owned per worktree contract).
+- [x] No `.claude/settings.local.json` committed (correctly left untracked).
+
+**## Self-Check: PASSED**
+
+---
+*Phase: 01-foundations-and-doctrine*
+*Plan: 05 of 7*
+*Halted at: 2026-05-09T03:29:43Z (Task 2 human-curate checkpoint)*
+*Resume: commit `assets/north-stars/<10–20 images>` + sidecars + README.md, then continue Wave 2*