diff --git a/CHANGELOG.md b/CHANGELOG.md index 0635c02..2d2341c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,7 +2,31 @@ All notable changes to `testing-os` are documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## [Unreleased] +## [1.3.2] — 2026-06-02 + +**`@dogfood-lab/dogfood-swarm` self-audit health pass.** The swarm runner audited itself with its own 10-phase protocol. Stage A (bug/security) landed 28 fixes; Stage C (hardening / operator-UX) followed with exit-code-contract and documentation closure; two deferred follow-ups then landed — fp-p-005 made the finding fingerprint a pure, injective function of the finding's own stable content (an edit-stable context-snippet hash), and fp-p-006 consolidated the agent-output schema into `@dogfood-lab/schemas`. The only package-shape change is one new internal workspace dependency (`@dogfood-lab/dogfood-swarm` → `@dogfood-lab/schemas`, see below); no breaking changes (fp-p-005's behavior change is backward-compatible — see below). Findings recorded under the run in [`swarms/swarm-1780390764-7dab/`](swarms/swarm-1780390764-7dab/). + +### Security & correctness (Stage A) + +- **Verify engine — honest verdicts** ([`packages/dogfood-swarm/lib/verify/runner.js`](packages/dogfood-swarm/lib/verify/runner.js), [`packages/dogfood-swarm/lib/verify/adapters/node.js`](packages/dogfood-swarm/lib/verify/adapters/node.js)). The wave gate no longer reports a clean `pass` for non-evidence. `no_tests` (ve-004) distinguishes "the repo has no `test` script and `npm test --if-present` ran nothing" from a real pass; `tool_missing` (ve-p-001) distinguishes "a required build tool is absent from `PATH`" from a code failure; `skip` (ve-005) stops an empty required-step set from being a vacuous `pass`. 
A typo'd `--threshold` now fails loud (`CLI_INVALID_THRESHOLD`, ve-002) instead of silently disabling the CI gate via a `NaN` comparison. Step output is bounded and per-step timeouts are tagged `timed_out` rather than misread as fast failures. +- **Finding-id & agent-output integrity** (ve-001 verify-classifier, fp-001 agent-output schema validation as the two criticals; fp-002 within-wave fingerprint collisions, fp-003 non-ASCII git paths, fp-004 TOCTOU byte-gate, sm-001/002 domain handling, cli-001 rewind among the highs). Agent outputs are schema-validated at collect time with a structured `AgentOutputValidationError`; fingerprints stay collision-resistant within a wave and across non-ASCII paths. + +### Hardening & operator UX (Stage C) + +- **Exit-code contract closed on the CI-gate verbs** ([`packages/dogfood-swarm/cli.js`](packages/dogfood-swarm/cli.js)). `swarm verify` now exits non-zero on a `fail` verdict, and `swarm persist --ingest` exits non-zero when the dogfood ingest fails — aligning both with the 3-way (`0`/`1`/`2`) contract the `verify-*` and `findings` verbs already honored, so a non-interactive CI step can no longer go green on a hard failure. +- **Operator documentation** ([`packages/dogfood-swarm/README.md`](packages/dogfood-swarm/README.md)). The package README now documents the exit-code contract for every gate-capable verb, the five verify verdicts (`pass`/`fail`/`skip`/`no_tests`/`tool_missing`), the three scriptable environment variables (`SWARM_DB`, `DOGFOOD_FINDINGS_FORMAT`, `DOGFOOD_LOG_HUMAN`) with the NDJSON-on-stderr diagnostic channel, and a symptom→recovery-verb troubleshooting table that deep-links the handbook recovery and error-codes pages. +- **README→CLI contract test hardened** ([`packages/dogfood-swarm/meta-amendA-readme-contract.test.js`](packages/dogfood-swarm/meta-amendA-readme-contract.test.js)). 
The td-006 guard now also pins the operator-facing env-var vocabulary: it reads the real `process.env.*` literals from source and asserts each documented var appears in the README, closing the drift class (undocumented env vars) that the command-only check left open. + +### Schema consolidation (fp-p-006) + +- **One source of truth for the agent-output schema** ([`packages/schemas/src/json/agent-output.schema.json`](packages/schemas/src/json/agent-output.schema.json), [`packages/dogfood-swarm/lib/validate-agent-output.js`](packages/dogfood-swarm/lib/validate-agent-output.js), [`packages/dogfood-swarm/lib/templates.js`](packages/dogfood-swarm/lib/templates.js)). The fp-001 packaging fix had shipped a package-local copy at `packages/dogfood-swarm/schema/` guarded by a byte-equality drift test, because the repo-root `scripts/agent-output.schema.json` was absent from the published tarball. fp-p-006 (deferred from the same self-audit) removes the controlled duplication: the schema now lives in `@dogfood-lab/schemas`, and both the collect-time validator and the dispatch prompt-builder resolve it with a `createRequire`-built `require('@dogfood-lab/schemas/json/agent-output.schema.json')` — the same `./json/*` subpath pattern the eight contract schemas already use. `@dogfood-lab/dogfood-swarm` gains `@dogfood-lab/schemas` as a dependency; the package-local copy, the repo-root copy, and the `meta-amendA-schema-packaging.test.js` drift guard are deleted. The schema's `$id` moves to the canonical `packages/schemas/src/json/` path — a contract field, hence the lockstep bump. The schema ships as a raw JSON subpath (not registered in `validatePayload`): it stays a swarm output envelope compiled with a local Ajv, allowlisted in the single-canonical-validator gate ([`scripts/check-validator-cache-singleton.test.mjs`](scripts/check-validator-cache-singleton.test.mjs)). 
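For reference, a minimal sketch of how a raw-JSON `./json/*` subpath can be resolved through `createRequire`, as the schema-consolidation entry describes. The package name `@example/schemas`, its `exports` map, and the schema body are hypothetical stand-ins built in a temp directory; the real `@dogfood-lab/schemas` export map is not shown in this diff. `createRequire` takes the consuming module's path (or file URL) and returns a `require` function, which is then called with the package specifier. The sketch is written as CommonJS so it runs standalone; an ES module would pass `import.meta.url` instead.

```javascript
// Sketch only: package name, exports map, and schema body are hypothetical.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { createRequire } = require('node:module');

// Build a throwaway package that exposes raw JSON through a "./json/*" subpath.
// realpathSync avoids symlinked temp dirs skewing the path comparison below.
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'schema-subpath-')));
const pkgDir = path.join(root, 'node_modules', '@example', 'schemas');
fs.mkdirSync(path.join(pkgDir, 'src', 'json'), { recursive: true });
fs.writeFileSync(
  path.join(pkgDir, 'package.json'),
  JSON.stringify({
    name: '@example/schemas',
    version: '0.0.0',
    // Assumed mapping: "./json/<file>" resolves to "./src/json/<file>".
    exports: { './json/*': './src/json/*' },
  })
);
fs.writeFileSync(
  path.join(pkgDir, 'src', 'json', 'agent-output.schema.json'),
  JSON.stringify({ $id: 'https://example.invalid/agent-output.schema.json', type: 'object' })
);

// createRequire is anchored at the consumer's location; the file need not exist.
const requireFromConsumer = createRequire(path.join(root, 'consumer.js'));

// The returned require resolves the subpath via the package's exports map
// and parses the JSON file.
const specifier = '@example/schemas/json/agent-output.schema.json';
const schema = requireFromConsumer(specifier);
const resolvedPath = requireFromConsumer.resolve(specifier);

console.log(schema.$id);
console.log(path.relative(pkgDir, resolvedPath));
```

Because the consumer only names the package subpath, the schema file can move inside the publishing package without touching any call site, which is the property a single source of truth relies on.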
+ +### Fingerprint stability (fp-p-005) + +- **Edit-stable context-snippet hash → injective base fingerprints** ([`packages/dogfood-swarm/lib/fingerprint.js`](packages/dogfood-swarm/lib/fingerprint.js), [`packages/dogfood-swarm/commands/collect.js`](packages/dogfood-swarm/commands/collect.js)). The base fingerprint was `sha256(category | rule_id | path | symbol | 10-line-bucket)`. Two genuinely-distinct symbol-less findings in the same file and bucket collided on the base fp; fp-002's `disambiguateFingerprints` salted the collision apart, correctly but with bounded residual new/recurring churn when a collision group grew or shrank across waves. fp-p-005 (deferred from the same self-audit) folds in an **edit-stable context-snippet hash** — the surrounding ~7 source lines around the finding, whitespace-collapsed and line-ending-normalized — as the LOCATION component when the source file is readable at collect time. This is the CodeQL `primaryLocationLineHash` design (hash the surrounding *content*, not the line number): it survives reflow, re-indentation, and code inserted elsewhere that shifts the finding's line number, while giving two findings at different points in one file *different* base fingerprints. The base fp is now a pure, injective function of the finding's own stable content, so `disambiguateFingerprints` is demoted from the primary collision mechanism to a **safety net** that fires only on the no-source fallback path and the rare case of two findings with byte-identical surrounding source. `computeFingerprint(finding, { sourceText })` reads no filesystem itself — `collect.js` reads each finding's file once (cached, size-guarded at 2 MB, path-contained to the worktree) and threads the text in. Coverity's enclosing-function key is the same idea at function granularity; the existing `symbol` component already carries the enclosing function name when the auditor reports one. 
+- **Backward-compatible by construction.** When no source is available (synthetic finding, deleted/unresolvable file, file-level finding with no line, or a path that escapes the worktree), LOCATION degrades to the historical 10-line bucket and the fingerprint is **byte-for-byte** what it was before — so the B-BACK-002 description-stability contract and the existing cross-wave dedup of source-less findings are untouched. The optional second argument means every existing `computeFingerprint(finding)` call site is unaffected. +- **Semantics note (one-time re-fingerprint).** Because the LOCATION encoding changes when source is present, a finding carried in a pre-upgrade `control-plane.db` will get a *new* (context-folded) fingerprint the first time it is re-audited with source available — a one-time `new` + `fixed`/`unverified` churn on that first post-upgrade wave, after which it is stable. The live `control-plane.db` on this rig holds zero findings, so there is no migration impact here; long-lived stores elsewhere will see the one-time churn once and then settle. +- **Tests** ([`packages/dogfood-swarm/meta-amendA-findings-persist.test.js`](packages/dogfood-swarm/meta-amendA-findings-persist.test.js), [`packages/dogfood-swarm/d3b-006-finding-id-collision.test.js`](packages/dogfood-swarm/d3b-006-finding-id-collision.test.js)). 
New coverage locks: `extractContextSnippet` null/edge cases + reflow/CRLF/indentation invariance; `computeFingerprint` injectivity for distinct same-bucket locations, B-BACK-002 stability via source (not prose), reflow survival, and byte-identical no-source fallback; the fp-002 cross-wave scenario re-run **in both input orders** with source, proving no collision group forms (A keeps its `finding_id` as `recurring`, B inserts as `new`, identically regardless of order) and nothing is salted; a real-worktree `collect()` integration test proving two same-bucket findings persist as two distinct rows whose fingerprints are the context-hash fps (not the shared no-source bucket fp); and the D3B-006 content-addressed `finding_id` derivation composed with a context-folded fingerprint. The fp-002/fp-r-001/fp-p-001 occurrence-salting tests stay green — they now exercise the no-source safety-net path. ## [1.3.1] — 2026-06-01 diff --git a/CLAUDE.md b/CLAUDE.md index 94b86aa..bb37f67 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -91,7 +91,7 @@ This repo mirrors `world-forge` deliberately (npm workspaces, `tsc --build` comp - Package names mirror the directory: `packages/findings/` → `@dogfood-lab/findings`. Exception: `dogfood-swarm` (the directory name disambiguates from generic "swarm") ### Versioning -**Lockstep.** All packages bump together. Currently **`1.3.1`** ([release v1.3.1 cut 2026-06-01](https://github.com/dogfood-lab/testing-os/releases/tag/v1.3.1); release v1.3.0 cut 2026-06-01; release v1.2.3 cut 2026-05-20; first stable v1.0.0 cut 2026-04-25). Six of seven `@dogfood-lab/*` packages have shipped to npm since v1.2.0; the seventh (`portfolio`) remains workspace-internal. The README's `` block is auto-stamped by `scripts/sync-version.mjs` (runs as `prebuild`). Use `npm run sync-version:check` as a CI gate when you bump. +**Lockstep.** All packages bump together. 
Currently **`1.3.2`** ([release v1.3.1 cut 2026-06-01](https://github.com/dogfood-lab/testing-os/releases/tag/v1.3.1); release v1.3.0 cut 2026-06-01; release v1.2.3 cut 2026-05-20; first stable v1.0.0 cut 2026-04-25). Six of seven `@dogfood-lab/*` packages have shipped to npm since v1.2.0; the seventh (`portfolio`) remains workspace-internal. The README's `` block is auto-stamped by `scripts/sync-version.mjs` (runs as `prebuild`). Use `npm run sync-version:check` as a CI gate when you bump. ### TypeScript `tsconfig.base.json` is the only place to set compiler options. Per-package `tsconfig.json` extends it and adds `outDir`/`rootDir`/`include`. `composite: true` everywhere. Never set `baseUrl` (deprecated; bit repo-knowledge in CI). @@ -118,12 +118,12 @@ Tests that need policy/schema/record fixtures read them from the runtime data di When a new test needs a new fixture, add it under `fixtures//.yaml` (or `.json`). Fixture filenames should describe what they exercise: `valid/well-formed-mcp-server-record.yaml`, `invalid/missing-source-record-ids.yaml`. ### Schemas -JSON Schema 2020-12. Title and description on every schema and every property. `additionalProperties: false` unless an open-ended bag is genuinely intended. The 8 current schemas in `packages/schemas/src/json/` are the canonical examples. +JSON Schema 2020-12. Title and description on every schema and every property. `additionalProperties: false` unless an open-ended bag is genuinely intended. The 8 contract-spine schemas in `packages/schemas/src/json/` (those registered in `validatePayload`) are the canonical examples. A 9th file, `agent-output.schema.json`, lives in the same directory and ships via the `./json/*` subpath, but it is a swarm output envelope resolved with a local Ajv — not a registered payload schema. `$id` URLs point at the canonical monorepo path: `https://github.com/dogfood-lab/testing-os/packages/schemas/src/json/.schema.json`. 
If you ever change a schema in a way that consumers should treat as a contract change, bump the workspace lockstep version — `$id` is a contract field. ### Ship gate -`SHIP_GATE.md` at the repo root tracks what shipcheck audits. Hard gates A–D (Security, Errors, Operator Docs, Hygiene) currently pass at 100% (21 checked / 16 SKIP-with-justification / 0 unchecked at v1.3.1, re-affirmed 2026-06-01). Soft gate E (Identity) is fully met. Re-run `npx @mcptoolshop/shipcheck audit` before any release; if a previously-checked item fails, fix the underlying gap before bumping the version. +`SHIP_GATE.md` at the repo root tracks what shipcheck audits. Hard gates A–D (Security, Errors, Operator Docs, Hygiene) currently pass at 100% (21 checked / 16 SKIP-with-justification / 0 unchecked at v1.3.2, re-affirmed 2026-06-02). Soft gate E (Identity) is fully met. Re-run `npx @mcptoolshop/shipcheck audit` before any release; if a previously-checked item fails, fix the underlying gap before bumping the version. ### Runtime data dirs at the repo root `policies/`, `fixtures/`, `records/`, `indexes/`, `reports/`, `swarms/`, `dogfood/`, `docs/`. These are the **shared backing store** that consumers (e.g. `repo-knowledge`, `shipcheck`) read from via `raw.githubusercontent.com/dogfood-lab/testing-os/main/...` URLs. The paths inside those dirs are part of the public API. **Don't reorganize them without thinking about every consumer first.** diff --git a/README.es.md b/README.es.md index 6c98319..1818e57 100644 --- a/README.es.md +++ b/README.es.md @@ -20,7 +20,7 @@ *Protocolos, almacenes de evidencia y ciclos de aprendizaje para software asistido por IA.* -**v1.3.1** — versión actual. Consulte [CHANGELOG.md](CHANGELOG.md) para ver qué se incluyó en esta versión. +**v1.3.2** — versión actual. Consulte [CHANGELOG.md](CHANGELOG.md) para ver qué se incluyó en esta versión. 
📖 **[Lea el manual →](https://dogfood-lab.github.io/testing-os/handbook/)** @@ -45,7 +45,7 @@ npm install -g @dogfood-lab/dogfood-swarm swarm --help ``` -La guía del operador, la referencia de la interfaz de línea de comandos (CLI), la referencia del esquema y las recetas de integración se encuentran en el **[manual](https://dogfood-lab.github.io/testing-os/handbook/)**. Los detalles específicos de cada versión se encuentran en [CHANGELOG.md](CHANGELOG.md). +La guía del operador, la referencia de la interfaz de línea de comandos, la referencia del esquema y las recetas de integración se encuentran en el **[manual](https://dogfood-lab.github.io/testing-os/handbook/)**. Los detalles de cada versión se encuentran en [CHANGELOG.md](CHANGELOG.md). ## Modelo de amenazas @@ -104,7 +104,7 @@ Requiere Node ≥ 22. La matriz de CI ejecuta Node 22 y 24 en `ubuntu-latest`; s ## Control de versiones -Todos los paquetes `@dogfood-lab/*` se actualizan juntos, con un único número de versión para todo el repositorio. Se publican seis paquetes en npm bajo `@dogfood-lab` en la versión v1.3.1, de forma sincronizada (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); el séptimo, `@dogfood-lab/portfolio`, permanece interno. La línea de versión que aparece en la parte superior de este archivo README se actualiza automáticamente desde `package.json` mediante [`scripts/sync-version.mjs`](scripts/sync-version.mjs) cada vez que se ejecuta `npm run build`. +Todos los paquetes `@dogfood-lab/*` se actualizan juntos, con un único número de versión para todo el repositorio. Se publican seis paquetes en npm bajo `@dogfood-lab` en la versión v1.3.2, de forma sincronizada (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); el séptimo, `@dogfood-lab/portfolio`, permanece interno. 
La línea de versión que aparece en la parte superior de este archivo README se actualiza automáticamente desde `package.json` mediante [`scripts/sync-version.mjs`](scripts/sync-version.mjs) cada vez que se ejecuta `npm run build`. ## Licencia diff --git a/README.fr.md b/README.fr.md index de450f8..e5e6350 100644 --- a/README.fr.md +++ b/README.fr.md @@ -20,7 +20,7 @@ *Protocoles, référentiels de preuves et boucles d’apprentissage pour les logiciels assistés par l’IA.* -**v1.3.1** — version actuelle. Consultez le fichier [CHANGELOG.md](CHANGELOG.md) pour connaître les nouveautés. +**v1.3.2** — version actuelle. Consultez le fichier [CHANGELOG.md](CHANGELOG.md) pour connaître les modifications apportées. 📖 **[Consultez le manuel →](https://dogfood-lab.github.io/testing-os/handbook/)** @@ -104,7 +104,7 @@ Nécessite Node ≥ 22. La matrice CI exécute Node 22 + 24 sur `ubuntu-latest`; ## Gestion des versions -Tous les paquets commençant par `@dogfood-lab/*` sont mis à jour simultanément, avec un seul numéro de version pour l’ensemble du monorepo. Six paquets sont publiés sur npm sous `@dogfood-lab` à la version v1.3.1, de manière synchronisée (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); le septième, `@dogfood-lab/portfolio`, reste interne. La ligne de version située en haut de ce fichier README est automatiquement mise à jour à partir du fichier `package.json` via le script [`scripts/sync-version.mjs`](scripts/sync-version.mjs) à chaque exécution de `npm run build`. +Tous les paquets commençant par `@dogfood-lab/*` sont mis à jour simultanément, avec un seul numéro de version pour l’ensemble du monorepo. Six paquets sont publiés sur npm sous le nom `@dogfood-lab` à la version v1.3.2, de manière synchronisée (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); le septième, `@dogfood-lab/portfolio`, reste interne. 
La ligne de version située en haut de ce fichier README est automatiquement générée à partir du fichier `package.json` via le script [`scripts/sync-version.mjs`](scripts/sync-version.mjs) à chaque exécution de la commande `npm run build`. ## Licence diff --git a/README.hi.md b/README.hi.md index c21e1ed..7178a6d 100644 --- a/README.hi.md +++ b/README.hi.md @@ -20,7 +20,7 @@ *एआई-सहायक सॉफ़्टवेयर के लिए प्रोटोकॉल, साक्ष्य भंडार और शिक्षण लूप।* -**v1.3.1** — वर्तमान संस्करण। इसमें क्या शामिल किया गया है, यह देखने के लिए [CHANGELOG.md](CHANGELOG.md) देखें। +**v1.3.2** — वर्तमान संस्करण। इसमें क्या शामिल किया गया है, यह देखने के लिए [CHANGELOG.md](CHANGELOG.md) देखें। 📖 **[हैंडबुक पढ़ें →](https://dogfood-lab.github.io/testing-os/handbook/)** @@ -45,7 +45,7 @@ npm install -g @dogfood-lab/dogfood-swarm swarm --help ``` -ऑपरेटर का मार्गदर्शन, कमांड लाइन इंटरफेस (सीएलआई) संदर्भ, स्कीमा संदर्भ और एकीकरण के लिए आवश्यक निर्देशिका **[हैंडबुक](https://dogfood-lab.github.io/testing-os/handbook/)** में उपलब्ध हैं। प्रत्येक संस्करण के लिए विस्तृत जानकारी [चेंजलॉग.एमडी](CHANGELOG.md) में दी गई है। +ऑपरेटर का मार्गदर्शिका, सीएलआई संदर्भ, स्कीमा संदर्भ और एकीकरण व्यंजनों **[हैंडबुक](https://dogfood-lab.github.io/testing-os/handbook/)** में उपलब्ध हैं। प्रत्येक संस्करण के लिए विस्तृत जानकारी [CHANGELOG.md](CHANGELOG.md) में दी गई है। ## खतरा मॉडल @@ -104,7 +104,7 @@ npm run verify # build + test (canonical pre-commit check) ## संस्करण नियंत्रण -सभी `@dogfood-lab/*` पैकेज एक साथ अपडेट किए जाते हैं — मोनोरपो में एक ही संख्या। छह पैकेज v1.3.1 पर `@dogfood-lab` के तहत npm पर प्रकाशित होते हैं (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); सातवां, `@dogfood-lab/portfolio`, आंतरिक रूप से ही रहता है। इस रीडमी के शीर्ष के पास संस्करण पंक्ति को हर `npm run build` पर [`scripts/sync-version.mjs`](scripts/sync-version.mjs) के माध्यम से `package.json` से स्वचालित रूप से अपडेट किया जाता है। +सभी `@dogfood-lab/*` पैकेज एक साथ अपडेट किए जाते हैं — मोनोरपो में एक ही संख्या। 
छह पैकेज v1.3.2 पर `@dogfood-lab` के तहत npm पर प्रकाशित होते हैं (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); सातवां, `@dogfood-lab/portfolio`, आंतरिक रूप से ही रहता है। इस रीडमी के शीर्ष के पास संस्करण पंक्ति को हर `npm run build` पर [`scripts/sync-version.mjs`](scripts/sync-version.mjs) के माध्यम से `package.json` से स्वचालित रूप से अपडेट किया जाता है। ## लाइसेंस diff --git a/README.it.md b/README.it.md index 39e447a..eb40649 100644 --- a/README.it.md +++ b/README.it.md @@ -20,7 +20,7 @@ *Protocolli, archivi di evidenze e cicli di apprendimento per software assistito dall'IA.* -**v1.3.1** — versione corrente. Per i dettagli sulle modifiche, consultare il file [CHANGELOG.md](CHANGELOG.md). +**v1.3.2** — versione corrente. Per informazioni sulle modifiche apportate, consultare il file [CHANGELOG.md](CHANGELOG.md). 📖 **[Leggi il manuale →](https://dogfood-lab.github.io/testing-os/handbook/)** @@ -104,7 +104,7 @@ Richiede Node ≥ 22. La matrice CI esegue Node 22 e 24 su `ubuntu-latest`; è s ## Gestione delle versioni -Tutti i pacchetti `@dogfood-lab/*` vengono aggiornati contemporaneamente, con un unico numero di versione per l'intero repository monolitico. Sei pacchetti vengono pubblicati su npm con il prefisso `@dogfood-lab` alla versione v1.3.1, in modo sincronizzato (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); il settimo, `@dogfood-lab/portfolio`, rimane interno. La riga della versione all'inizio di questo file README viene generata automaticamente dal file `package.json` tramite lo script [`scripts/sync-version.mjs`](scripts/sync-version.mjs) ogni volta che viene eseguito il comando `npm run build`. +Tutti i pacchetti `@dogfood-lab/*` vengono aggiornati contemporaneamente, con un unico numero di versione per l'intero repository monolitico. 
Sei pacchetti vengono pubblicati su npm con il prefisso `@dogfood-lab` alla versione v1.3.2, in modo sincronizzato (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); il settimo, `@dogfood-lab/portfolio`, rimane di uso interno. La riga della versione che si trova nella parte superiore di questo file README viene aggiornata automaticamente dal file `package.json` tramite lo script [`scripts/sync-version.mjs`](scripts/sync-version.mjs) ogni volta che viene eseguito il comando `npm run build`. ## Licenza diff --git a/README.ja.md b/README.ja.md index d9ae78a..5a378fe 100644 --- a/README.ja.md +++ b/README.ja.md @@ -20,7 +20,7 @@ *AIによる支援を受けたソフトウェアのためのプロトコル、証拠ストア、および学習ループ。* -**v1.3.1** — 現在のリリース版。今回のリリース内容については、[CHANGELOG.md](CHANGELOG.md) を参照してください。 +**v1.3.2** — 現在のリリース版。変更点は[CHANGELOG.md](CHANGELOG.md)を参照してください。 📖 **[ハンドブックを読む →](https://dogfood-lab.github.io/testing-os/handbook/)** @@ -45,7 +45,7 @@ npm install -g @dogfood-lab/dogfood-swarm swarm --help ``` -オペレーターガイド、CLIリファレンス、スキーマリファレンス、および統合レシピは、**[ハンドブック](https://dogfood-lab.github.io/testing-os/handbook/)** に収録されています。バージョンごとの詳細については、[CHANGELOG.md](CHANGELOG.md) を参照してください。 +オペレーターガイド、CLIリファレンス、スキーマリファレンス、および統合レシピは、**[handbook](https://dogfood-lab.github.io/testing-os/handbook/)** にあります。バージョンごとの詳細については、[CHANGELOG.md](CHANGELOG.md)を参照してください。 ## 脅威モデル @@ -104,7 +104,7 @@ Node 22以上が必要です。CIマトリックスでは、`ubuntu-latest`上 ## バージョン管理 -すべての`@dogfood-lab/*`パッケージは、まとめてバージョンアップされます。つまり、モノリポ全体でバージョン番号が1つだけ上がります。6つのパッケージ(`schemas`、`verify`、`report`、`ingest`、`findings`、`dogfood-swarm`)が、`v1.3.1`としてnpmに公開されます。7番目のパッケージである`@dogfood-lab/portfolio`は、引き続き内部利用のみとします。このREADMEの先頭付近にあるバージョン番号は、`npm run build`を実行するたびに、`package.json`から[`scripts/sync-version.mjs`](scripts/sync-version.mjs)を通じて自動的に更新されます。 
+すべての`@dogfood-lab/*`パッケージはまとめてバージョンアップされます。モノリポ全体でバージョン番号が統一されます。6つのパッケージ(`schemas`、`verify`、`report`、`ingest`、`findings`、`dogfood-swarm`)が、v1.3.2として`@dogfood-lab`のもとでnpmに公開されます。7番目のパッケージである`@dogfood-lab/portfolio`は、引き続き社内利用のみとなります。このREADMEの上部付近にあるバージョン番号は、`npm run build`を実行するたびに、`package.json`から[`scripts/sync-version.mjs`](scripts/sync-version.mjs)を通じて自動的に更新されます。 ## ライセンス diff --git a/README.md b/README.md index 9efcdf5..937ac53 100644 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ *Protocols, evidence stores, and learning loops for AI-assisted software.* -**v1.3.1** — current release. See [CHANGELOG.md](CHANGELOG.md) for what shipped. +**v1.3.2** — current release. See [CHANGELOG.md](CHANGELOG.md) for what shipped. 📖 **[Read the handbook →](https://dogfood-lab.github.io/testing-os/handbook/)** @@ -104,7 +104,7 @@ Requires Node ≥ 22. CI matrix runs Node 22 + 24 on `ubuntu-latest`; locally va ## Versioning -All `@dogfood-lab/*` packages bump together — one number across the monorepo. Six packages publish to npm under `@dogfood-lab` at v1.3.1 in lockstep (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); the seventh, `@dogfood-lab/portfolio`, stays internal. The version line near the top of this README is auto-stamped from `package.json` via [`scripts/sync-version.mjs`](scripts/sync-version.mjs) on every `npm run build`. +All `@dogfood-lab/*` packages bump together — one number across the monorepo. Six packages publish to npm under `@dogfood-lab` at v1.3.2 in lockstep (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); the seventh, `@dogfood-lab/portfolio`, stays internal. The version line near the top of this README is auto-stamped from `package.json` via [`scripts/sync-version.mjs`](scripts/sync-version.mjs) on every `npm run build`. 
## License diff --git a/README.pt-BR.md b/README.pt-BR.md index 70d73b3..d5f4df2 100644 --- a/README.pt-BR.md +++ b/README.pt-BR.md @@ -20,7 +20,7 @@ *Protocolos, repositórios de evidências e ciclos de aprendizado para software assistido por IA.* -**v1.3.1** — versão atual. Consulte o arquivo [CHANGELOG.md](CHANGELOG.md) para ver as novidades. +**v1.3.2** — versão atual. Consulte o arquivo [CHANGELOG.md](CHANGELOG.md) para ver as alterações incluídas. 📖 **[Leia o manual →](https://dogfood-lab.github.io/testing-os/handbook/)** @@ -104,7 +104,7 @@ Requer Node ≥ 22. A matriz de CI executa o Node 22 e 24 no `ubuntu-latest`; va ## Controle de versão -Todos os pacotes `@dogfood-lab/*` são atualizados em conjunto — um único número para todo o monorepositorio. Seis pacotes são publicados no npm sob `@dogfood-lab` na versão v1.3.1, de forma sincronizada (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); o sétimo, `@dogfood-lab/portfolio`, permanece interno. A linha de versão no início deste arquivo README é gerada automaticamente a partir do arquivo `package.json` por meio de [`scripts/sync-version.mjs`](scripts/sync-version.mjs) a cada execução de `npm run build`. +Todos os pacotes `@dogfood-lab/*` são atualizados em conjunto — um único número para todo o monorepositorio. Seis pacotes são publicados no npm sob o nome `@dogfood-lab` na versão v1.3.2, de forma sincronizada (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); o sétimo, `@dogfood-lab/portfolio`, permanece interno. A linha de versão no início deste arquivo README é gerada automaticamente a partir do arquivo `package.json` por meio do script [`scripts/sync-version.mjs`](scripts/sync-version.mjs) a cada execução do comando `npm run build`. 
## Licença diff --git a/README.zh.md b/README.zh.md index 96dc25a..fd66534 100644 --- a/README.zh.md +++ b/README.zh.md @@ -20,7 +20,7 @@ *用于人工智能辅助软件的协议、证据存储和学习循环。* -**v1.3.1** — 当前版本。请参阅 [CHANGELOG.md](CHANGELOG.md) 以了解本次更新的内容。 +**v1.3.2** — 当前版本。请参阅 [CHANGELOG.md](CHANGELOG.md),了解本次更新的内容。 📖 **[阅读手册 →](https://dogfood-lab.github.io/testing-os/handbook/)** @@ -104,7 +104,7 @@ npm run verify # build + test (canonical pre-commit check) ## 版本控制 -所有 `@dogfood-lab/*` 包都同步更新——整个代码仓库的版本号统一更新。六个包以 v1.3.1 的版本同步发布到 npm,这些包分别是:`schemas`、`verify`、`report`、`ingest`、`findings` 和 `dogfood-swarm`;第七个包 `@dogfood-lab/portfolio` 仍然是内部使用的。本 README 文件顶部的版本号会通过 [`scripts/sync-version.mjs`](scripts/sync-version.mjs) 脚本,在每次执行 `npm run build` 时,从 `package.json` 文件中自动更新。 +所有以 `@dogfood-lab/*` 开头的软件包的版本号同步更新——整个代码仓库的版本号都统一更新。六个软件包以 v1.3.2 的版本号同步发布到 npm,这些软件包分别是:`schemas`、`verify`、`report`、`ingest`、`findings` 和 `dogfood-swarm`;第七个软件包 `@dogfood-lab/portfolio` 仍然是内部使用的。本 README 文件顶部的版本号行,会在每次执行 `npm run build` 时,通过 [`scripts/sync-version.mjs`](scripts/sync-version.mjs) 从 `package.json` 文件中自动提取并更新。 ## 许可证 diff --git a/SCORECARD.md b/SCORECARD.md index 0880e17..5db12ab 100644 --- a/SCORECARD.md +++ b/SCORECARD.md @@ -3,7 +3,7 @@ > Score a repo before remediation. Fill this out first, then use SHIP_GATE.md to fix. **Repo:** `dogfood-lab/testing-os` -**Date:** 2026-06-01 (re-affirmed at v1.3.1; first scored 2026-04-25 at v1.0.0; re-affirmed 2026-05-14 at v1.2.0; re-affirmed 2026-05-31 at v1.2.3; re-affirmed 2026-06-01 at v1.3.0) +**Date:** 2026-06-02 (re-affirmed at v1.3.2; first scored 2026-04-25 at v1.0.0; re-affirmed 2026-05-14 at v1.2.0; re-affirmed 2026-05-31 at v1.2.3; re-affirmed 2026-06-01 at v1.3.0; re-affirmed 2026-06-01 at v1.3.1) **Type tags:** `[monorepo]` `[npm-workspaces]` `[cli]` `[mcp-adjacent]` — see [`CLAUDE.md`](CLAUDE.md) for the seven workspace packages. 
## Pre-Remediation Assessment @@ -37,7 +37,7 @@ Baseline at the pre-v1.0.0 migration handoff (sourced from [`HANDOFF.md`](HANDOF ## Post-Remediation -Sourced primarily from [`SHIP_GATE.md`](SHIP_GATE.md): at v1.3.1 (re-affirmed 2026-06-01), every applicable hard-gate A–D row carries either an `[x]` evidence stamp or a `SKIP:` with explicit justification — `shipcheck audit` at the v1.3.1 release tree exits 0 (21 checked / 16 SKIP-with-justification / 0 unchecked). Soft gate E is fully met. The "100% pass on hard gates A–D" headline phrasing reflects the audit-tool verdict, not a hand-curated estimate; the per-row evidence dates below are the auditable substrate. +Sourced primarily from [`SHIP_GATE.md`](SHIP_GATE.md): at v1.3.2 (re-affirmed 2026-06-02), every applicable hard-gate A–D row carries either an `[x]` evidence stamp or a `SKIP:` with explicit justification — `shipcheck audit` at the v1.3.2 release tree exits 0 (21 checked / 16 SKIP-with-justification / 0 unchecked). Soft gate E is fully met. The "100% pass on hard gates A–D" headline phrasing reflects the audit-tool verdict, not a hand-curated estimate; the per-row evidence dates below are the auditable substrate. | Category | Before | After | |----------|--------|-------| @@ -55,4 +55,4 @@ The remaining 6 points to a perfect 50 are explicitly tracked rather than papere - **Shipping hygiene (1 point)** — Dependabot config in `.github/dependabot.yml` + `npm audit` in `ci.yml` matrix. Tracked as SHIP_GATE D-48/D-49 SKIPs. - **Identity (2 points)** — repo metadata polish (GitHub topics, social preview image, About-text refinement); minor but unfinished. -Honest deltas: every SKIP that remains in SHIP_GATE.md (the MCP / desktop / VSCode items that don't apply, plus the real follow-ups above) is the reason individual category scores stay below 10. 
The repo is shippable at v1.3.1 by every applicable contract — six of seven `@dogfood-lab/*` packages are live on npm since v1.2.0; the gap is between "shippable" and "perfect." +Honest deltas: every SKIP that remains in SHIP_GATE.md (the MCP / desktop / VSCode items that don't apply, plus the real follow-ups above) is the reason individual category scores stay below 10. The repo is shippable at v1.3.2 by every applicable contract — six of seven `@dogfood-lab/*` packages are live on npm since v1.2.0; the gap is between "shippable" and "perfect." diff --git a/SHIP_GATE.md b/SHIP_GATE.md index 2010f1e..fb88ca3 100644 --- a/SHIP_GATE.md +++ b/SHIP_GATE.md @@ -36,7 +36,7 @@ - [x] `[all]` README is current: what it does, install, usage, supported platforms + runtime versions (2026-04-25) - [x] `[all]` CHANGELOG.md (Keep a Changelog format) — updated with v1.0.0 entry (2026-04-25) - [x] `[all]` LICENSE file present and repo states support status (2026-04-25) -- [x] `[cli]` `--help` output accurate for all commands and flags — `swarm` bin documents its 21 subcommands (init, domains, dispatch, collect, revalidate, rewind, redrive, verify, verify-fixed, verify-recurring, verify-unverified, verify-approved, receipt, advance, status, resume, history, approve, persist, findings, runs) (2026-04-25, last re-affirmed 2026-06-01 at v1.3.1) +- [x] `[cli]` `--help` output accurate for all commands and flags — `swarm` bin documents its 21 subcommands (init, domains, dispatch, collect, revalidate, rewind, redrive, verify, verify-fixed, verify-recurring, verify-unverified, verify-approved, receipt, advance, status, resume, history, approve, persist, findings, runs) (2026-04-25, last re-affirmed 2026-06-02 at v1.3.2) - [ ] `[cli|mcp|desktop]` SKIP: testing-os tools don't expose user-facing logging level controls. The receiver workflow logs via GitHub Actions; the `swarm` CLI prints to stdout/stderr. No secrets to redact in operator output. 
Promote if a logging-level requirement surfaces. - [ ] `[mcp]` SKIP: not an MCP server. - [x] `[complex]` HANDBOOK.md — the Astro Starlight handbook serves this purpose, deployed at [dogfood-lab.github.io/testing-os/](https://dogfood-lab.github.io/testing-os/) (2026-04-25) @@ -44,11 +44,11 @@ ## D. Shipping Hygiene - [x] `[all]` `verify` script exists (test + build + smoke in one command) — `npm run verify` (2026-04-25) -- [x] `[all]` Version in manifest matches git tag — root + 7 packages all at `1.3.1`, tag `v1.3.1` (2026-06-01) +- [x] `[all]` Version in manifest matches git tag — root + 7 packages all at `1.3.2`, tag `v1.3.2` (2026-06-02) - [ ] `[all]` SKIP: dependency scanning not yet wired into CI. Tracked as a follow-up — would add `npm audit --audit-level=moderate` to `ci.yml` or enable Dependabot security alerts. Six of seven `@dogfood-lab/*` packages have been published to npm since v1.2.0 (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); `npm audit` against the root tree currently reports 0 high (the `fast-uri` override in `package.json` closes GHSA-q3j6-qgpj-74h6 / GHSA-v39h-62p7-jpjc). Dependabot config still pending per HANDOFF.md follow-up. - [ ] `[all]` SKIP: no automated dependency update mechanism. Same justification as above — Dependabot config wants a separate session. Surface counts on `npm audit` are tracked in HANDOFF.md (`site/` has 8 audit warnings inherited from legacy lockfile). -- [x] `[npm]` Six of seven `@dogfood-lab/*` packages are published on npm since v1.2.0 (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); headline install is `npm install -g @dogfood-lab/dogfood-swarm`. The seventh (`@dogfood-lab/portfolio`) remains intentionally workspace-internal. Per-package READMEs carry the canonical logo since v1.2.1. 
(2026-05-14, last re-affirmed 2026-06-01 at v1.3.1)
-- [x] `[npm]` `engines.node` set — root `package.json` has `"engines": {"node": ">=22"}` (tightened from `>=20` to `>=22` in v1.2.2 to match the CI Node 22 + 24 matrix; last re-affirmed 2026-06-01 at v1.3.1)
+- [x] `[npm]` Six of seven `@dogfood-lab/*` packages are published on npm since v1.2.0 (`schemas`, `verify`, `report`, `ingest`, `findings`, `dogfood-swarm`); headline install is `npm install -g @dogfood-lab/dogfood-swarm`. The seventh (`@dogfood-lab/portfolio`) remains intentionally workspace-internal. Per-package READMEs carry the canonical logo since v1.2.1. (2026-05-14, last re-affirmed 2026-06-02 at v1.3.2)
+- [x] `[npm]` `engines.node` set — root `package.json` has `"engines": {"node": ">=22"}` (tightened from `>=20` to `>=22` in v1.2.2 to match the CI Node 22 + 24 matrix; last re-affirmed 2026-06-02 at v1.3.2)
 - [x] `[npm]` Lockfile committed (2026-04-25)
 - [ ] `[vsix]` SKIP: not a VS Code extension.
 - [ ] `[desktop]` SKIP: not a desktop app.
diff --git a/package-lock.json b/package-lock.json
index a1b41af..32a6d24 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -1,16 +1,19 @@
 {
   "name": "testing-os",
-  "version": "1.3.1",
+  "version": "1.3.2",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "testing-os",
-      "version": "1.3.1",
+      "version": "1.3.2",
       "license": "MIT",
       "workspaces": [
         "packages/*"
       ],
+      "dependencies": {
+        "testing-os": "file:"
+      },
       "devDependencies": {
         "@types/node": "^25.3.5",
         "@vitest/coverage-v8": "^4.1.6",
@@ -1838,6 +1841,10 @@
         "node": ">=6"
       }
     },
+    "node_modules/testing-os": {
+      "resolved": "",
+      "link": true
+    },
     "node_modules/tinybench": {
       "version": "2.9.0",
       "resolved": "https://registry.npmjs.org/tinybench/-/tinybench-2.9.0.tgz",
@@ -2122,11 +2129,12 @@
     },
     "packages/dogfood-swarm": {
       "name": "@dogfood-lab/dogfood-swarm",
-      "version": "1.3.1",
+      "version": "1.3.2",
       "license": "MIT",
       "dependencies": {
         "@dogfood-lab/findings": "^1.2.0",
         "@dogfood-lab/report": "^1.2.0",
+        "@dogfood-lab/schemas": "^1.2.0",
         "ajv": "^8.18.0",
         "ajv-formats": "^3.0.1",
         "better-sqlite3": "^12.10.0",
@@ -2141,7 +2149,7 @@
     },
     "packages/findings": {
       "name": "@dogfood-lab/findings",
-      "version": "1.3.1",
+      "version": "1.3.2",
       "license": "MIT",
       "dependencies": {
         "@dogfood-lab/ingest": "^1.2.0",
@@ -2157,7 +2165,7 @@
     },
     "packages/ingest": {
       "name": "@dogfood-lab/ingest",
-      "version": "1.3.1",
+      "version": "1.3.2",
       "license": "MIT",
       "dependencies": {
         "@dogfood-lab/dogfood-swarm": "^1.2.0",
@@ -2171,7 +2179,7 @@
     },
     "packages/portfolio": {
       "name": "@dogfood-lab/portfolio",
-      "version": "1.3.1",
+      "version": "1.3.2",
       "license": "MIT",
       "dependencies": {
         "js-yaml": "^4.1.0"
@@ -2185,7 +2193,7 @@
     },
     "packages/report": {
       "name": "@dogfood-lab/report",
-      "version": "1.3.1",
+      "version": "1.3.2",
       "license": "MIT",
       "dependencies": {
         "@dogfood-lab/schemas": "^1.2.0"
@@ -2196,7 +2204,7 @@
     },
     "packages/schemas": {
       "name": "@dogfood-lab/schemas",
-      "version": "1.3.1",
+      "version": "1.3.2",
"license": "MIT", "dependencies": { "ajv": "^8.18.0", @@ -2212,7 +2220,7 @@ }, "packages/verify": { "name": "@dogfood-lab/verify", - "version": "1.3.1", + "version": "1.3.2", "license": "MIT", "dependencies": { "@dogfood-lab/schemas": "^1.2.0", diff --git a/package.json b/package.json index 59e9a82..79596d7 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "testing-os", - "version": "1.3.1", + "version": "1.3.2", "private": true, "description": "Operating system for testing in the AI era — protocols, evidence stores, and learning loops for AI-assisted software.", "engines": { @@ -40,5 +40,8 @@ }, "overrides": { "fast-uri": "^3.1.2" + }, + "dependencies": { + "testing-os": "file:" } } diff --git a/packages/dogfood-swarm/README.md b/packages/dogfood-swarm/README.md index ab8d009..a409bcf 100644 --- a/packages/dogfood-swarm/README.md +++ b/packages/dogfood-swarm/README.md @@ -18,19 +18,28 @@ The `swarm` CLI runs parallel-agent audits against a codebase. Each wave dispatc npm install -g @dogfood-lab/dogfood-swarm ``` -Binary: `swarm`. Requires Node ≥ 20. +Binary: `swarm`. Requires Node ≥ 22. 
## Quick start ```bash -# Initialize a swarm run -swarm dispatch +# Initialize a swarm run — detects domains, records a save-point +swarm init + +# Review the detected domain draft, then freeze it (dispatch refuses +# to run until the domain map is frozen) +swarm domains --freeze + +# Dispatch a wave for a named phase (NOT a wave number — phase names below) +swarm dispatch # (Agents execute externally — e.g., parallel Claude sessions — and write # their outputs to swarms//wave-N//output.json) -# Collect outputs through the verifier -swarm collect +# Collect outputs through the verifier — one --domain per dispatched agent +swarm collect \ + --domain=backend:swarms//wave-N/backend/output.json \ + --domain=tests:swarms//wave-N/tests/output.json # Inspect current wave + agent state swarm status @@ -45,6 +54,12 @@ swarm receipt swarm advance ``` +`` is a named phase, not a wave number. The valid values are: +`health-audit-a`, `health-audit-b`, `health-audit-c`, `stage-d-audit`, +`feature-audit` (audit phases) and `health-amend-a`, `health-amend-b`, +`health-amend-c`, `stage-d-amend`, `feature-execute` (amend phases). Run +`swarm dispatch --help` for the same list. + ## Recovery — the Three R's | Verb | When to use | Behavior | @@ -62,8 +77,9 @@ All three recovery verbs share the same operator-safety contract: Example session: ```bash -# Failed wave needs schema-mismatch repair -swarm revalidate --reason "wave-2 schema mismatch corrected" --apply +# Failed wave needs schema-mismatch repair (re-supply the agent's output path) +swarm revalidate --reason "wave-2 schema mismatch corrected" \ + --domain=backend:swarms//wave-2/backend/output.json --apply # Wedged wave — restart from save-point tag swarm rewind --reason "rolling back wedged amend wave" --apply @@ -75,11 +91,50 @@ swarm redrive --reason "GitHub API outage retry" --apply swarm history ``` +## Exit codes + +The verbs designed to gate CI propagate a machine-readable exit code, not just human-readable stdout. 
Wire these into a workflow step or a `&&` chain and the gate fails closed:
+
+| Verb | Exit code contract |
+|---|---|
+| `swarm verify` | `0` **only** when the verdict is `pass`; `1` for every other verdict (`fail`, `skip`, `no_tests`, `tool_missing`). Each non-pass verdict is "not a verified pass" — see [Verify verdicts](#verify-verdicts) — so the machine signal matches the human one and a CI `&&` chain fails closed (a `no_tests` or `tool_missing` never reads as success). |
+| `swarm verify-fixed` | `0` clean / `1` threshold exceeded (`regressed + claimed-but-still-present > --threshold`, default 0) / `2` audit pipeline broken |
+| `swarm verify-recurring` | `0` / `1` / `2` (same 3-way contract as `verify-fixed`) |
+| `swarm verify-unverified` | `0` / `1` / `2` (same 3-way contract) |
+| `swarm verify-approved` | `0` / `1` / `2` — exit `2` (broken finding anchor) is the pre-amend gate that blocks subsequent `swarm dispatch` of an amend phase |
+| `swarm findings` | `0` clean / `1` findings present / `2` audit pipeline broken |
+| `swarm persist --ingest` | `0` when the dogfood ingest succeeded (or was a `--dry-run`); `1` when the ingest failed. A bare `swarm persist` with no `--ingest` exits `0`. |
+
+Any command also exits `1` on a structured operator error (the typed `code` / `message` / `Next:` envelope). Exit `2` is reserved for the "pipeline broken" case on the verbs above so a CI gate can tell *findings/regressions exist* (1) apart from *the audit itself could not run* (2).
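
Editorial sketch (not part of the patch): the exit-code table above can be read as a small classification rule. The helper below is hypothetical and does not exist in the `swarm` CLI; the name `classifyGateExit` and its three return labels are assumptions made for illustration. It encodes only what the table states: `0` is the sole clean signal, and `2` means "pipeline broken" only on the verbs the table lists with a 3-way contract.

```javascript
// Hypothetical helper, NOT part of the swarm CLI. It only restates the
// exit-code table from the README hunk above.
const THREE_WAY_VERBS = new Set([
  'verify-fixed',
  'verify-recurring',
  'verify-unverified',
  'verify-approved',
  'findings',
]);

// Returns one of three illustrative labels:
//   'clean'           -> exit 0, the only success signal
//   'pipeline-broken' -> exit 2 on a verb with the 3-way contract
//   'not-clean'       -> any other non-zero exit (non-pass verdict,
//                        findings present, threshold exceeded, failed
//                        ingest, or a structured operator error)
function classifyGateExit(verb, exitCode) {
  if (exitCode === 0) return 'clean';
  if (exitCode === 2 && THREE_WAY_VERBS.has(verb)) return 'pipeline-broken';
  return 'not-clean';
}
```

A CI wrapper built on this rule would treat everything except `'clean'` as a failed gate, and could route `'pipeline-broken'` to a different alert than `'not-clean'`.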
+
+## Troubleshooting — when a wave fails
+
+Every command emits its stage transitions as **NDJSON on stderr**, so the first move in an incident is to capture that forensic stream and read it back:
+
+```bash
+swarm collect \
+  --domain=backend:swarms//wave-N/backend/output.json 2>collect.ndjson
+grep '"stage"' collect.ndjson # the ordered chain of what happened, with codes
+```
+
+Then map the symptom to the recovery verb:
+
+| Symptom | What it means | Recovery |
+|---|---|---|
+| `collect` failed mid-upsert (`COLLECT_UPSERT_FAILED`) | One agent's output failed validation or the merge transaction aborted; the wave is `failed`. | `swarm revalidate --reason "..." --domain=name:path --apply` — re-runs the same validators on the re-supplied output, and on pass flips the wave back to `collected` in one transaction. |
+| Wave stuck in `dispatched` — never reached `collected` | Agents didn't all finish, or the run was interrupted before `collect`. | `swarm resume ` to re-dispatch the incomplete agents; or `swarm redrive --reason "..." --apply` to resume only the failed/unstarted tail while preserving completed receipts byte-identical. |
+| Agents `BLOCKED` (`invalid_output` / `ownership_violation`) | Schema mismatch, or an agent wrote outside its frozen domain. | `invalid_output` → `swarm revalidate`. `ownership_violation` → extend the domain via `swarm domains --unfreeze … --edit … --freeze`, then `swarm revalidate`. |
+| Wave wedged — tree state needs a full reset | The working tree drifted and the wave must restart from a save-point. | `swarm rewind --reason "..." --apply` — `git reset --hard ` plus lawful abort of orphaned in-flight runs, audit chain preserved. |
+
+All recovery verbs are **dry-run by default** — run them without `--apply` first to preview the transitions, then add `--apply`. Every error carries a typed `code` and a `Next:` hint; the full table is in the handbook.
+
+📖 Deeper incident docs: **[Recovery](https://dogfood-lab.github.io/testing-os/handbook/recovery/)** · **[Error codes](https://dogfood-lab.github.io/testing-os/handbook/error-codes/)**
+
 ## State machines
 
 Two parallel state machines:
 
-- **Agent runs** (`lib/state-machine.js`): `pending → dispatched → complete | failed | invalid_output | ownership_violation | aborted_for_rewind`
+- **Agent runs** (`lib/state-machine.js`): `pending → dispatched → running → complete | failed | timed_out | invalid_output | ownership_violation | aborted_for_rewind`
 - **Waves** (`lib/wave-state-machine.js`): `dispatched → collected → verified → advanced | failed | aborted_for_rewind`
 
 Discipline:
@@ -88,6 +143,20 @@ Discipline:
 - **BLOCKED statuses** (`failed`, `invalid_output`, `ownership_violation`) require explicit `override=true` + non-empty `reason` to transition out.
 - Every transition lands in `wave_state_events` / `agent_state_events` **atomically** with the underlying status mutation, inside the same SQLite transaction.
 
+## Verify verdicts
+
+`swarm verify ` runs the build-verification adapter and prints `Verification: `. Only `pass` advances the wave to `verified` — the other four verdicts are deliberately distinct so a no-op never masquerades as a clean pass:
+
+| Verdict | Means | Advances the wave? |
+|---|---|---|
+| `pass` | Every required step ran and passed. | Yes |
+| `fail` | A required step ran and failed — the code is broken. | No |
+| `skip` | No required steps ran (every step was optional or filtered away). Nothing was verified. | No |
+| `no_tests` | The repo has no `test` script; `npm test --if-present` ran zero tests. **Not** a verified pass — supply a real test command via a step override or pick an explicit `--adapter`. | No |
+| `tool_missing` | A required tool (e.g. `npm`, `npx`) is absent from `PATH`, so verification could not run in this environment. **Not** a failure of the code under test — install the tool or run on a host that has it.
| No |
+
+`no_tests` and `tool_missing` exist precisely so the wave gate stays honest: it refuses to advance without positive evidence, but it does not falsely report `FAIL` when the cause is a missing test script or a missing build tool rather than a real regression.
+
 ## Control plane
 
 SQLite-backed. Each swarm run gets `swarms//control-plane.db`:
@@ -103,6 +172,18 @@ SQLite-backed. Each swarm run gets `swarms//control-plane.db`:
 
 Read via `swarm status`, `swarm history`, `swarm receipt`. Never via raw SQL in scripts — the state-machine helpers are the supported interface and the audit chain depends on going through them.
 
+## Environment variables
+
+Three environment variables are part of the scriptable surface — they are honored on every invocation:
+
+| Variable | Accepted values | Effect |
+|---|---|---|
+| `SWARM_DB` | a filesystem path | Overrides the control-plane DB path. Unset → the default `swarms//control-plane.db`. Point this at a non-default DB to run against an alternate control plane. |
+| `DOGFOOD_FINDINGS_FORMAT` | `raw` \| `human` \| `json` | Forces the `swarm findings` output format, overriding both the `--format` flag and TTY auto-detection. `raw` → markdown, `human` → text, `json` → JSON. |
+| `DOGFOOD_LOG_HUMAN` | `0` \| `1` | Controls the human-readable companion banner printed alongside the NDJSON stage stream on **stderr**. `0` → never emit the banner (deterministic machine-readable stderr for CI), `1` → always emit it. Unset → emit only when stderr is a TTY. |
+
+Stage transitions are emitted as **NDJSON on stderr** — one JSON object per line, greppable — while stdout carries the command's parse target. Set `DOGFOOD_LOG_HUMAN=0` when you want a clean, machine-parseable stderr stream (e.g. `swarm collect ... 2>collect.ndjson`).
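
Editorial sketch (not part of the patch): one way to consume the NDJSON stage stream described above once it has been captured to a file or string. The function name `parseStageStream` is hypothetical, and the assumption that each stage record carries a top-level `stage` key is inferred only from the README's `grep '"stage"'` example; the actual record shape is not shown in this diff. Lines that are not JSON (for instance the human banner when `DOGFOOD_LOG_HUMAN` is not `0`) are skipped rather than treated as errors.

```javascript
// Hypothetical reader for a captured stderr stream. Assumes (not confirmed
// by the diff) that stage records are JSON objects with a `stage` key.
function parseStageStream(stderrText) {
  const events = [];
  for (const line of stderrText.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Cheap pre-filter: NDJSON records are objects, so they start with '{'.
    if (!trimmed.startsWith('{')) continue;
    try {
      const record = JSON.parse(trimmed);
      if (record !== null && typeof record === 'object' && 'stage' in record) {
        events.push(record);
      }
    } catch {
      // Not valid JSON (truncated line, banner text): ignore and move on.
    }
  }
  return events;
}
```

The returned array preserves emission order, so it can be read as the ordered chain of stage transitions for a single command invocation.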
+ ## 10-phase protocol | Phase | Purpose | diff --git a/packages/dogfood-swarm/amend1-bounded-json-discipline.test.js b/packages/dogfood-swarm/amend1-bounded-json-discipline.test.js index 8cf98e4..0165801 100644 --- a/packages/dogfood-swarm/amend1-bounded-json-discipline.test.js +++ b/packages/dogfood-swarm/amend1-bounded-json-discipline.test.js @@ -66,11 +66,11 @@ const ALLOWLIST = [ // non-existent code are noise that hide future genuine adds. Removed. { file: 'lib/templates.js', - reason: 'Loads scripts/agent-output.schema.json — a swarm-internal, repo-owned, fixed-path JSON Schema document. Same trust class as lib/validate-agent-output.js; this is the prompt-builder side that renders the contract block from the canonical schema.', + reason: 'Loads agent-output.schema.json via createRequire from @dogfood-lab/schemas — a swarm-internal, repo-owned JSON Schema document. Same trust class as lib/validate-agent-output.js; this is the prompt-builder side that renders the contract block from the canonical schema.', }, { file: 'lib/validate-agent-output.js', - reason: 'Loads scripts/agent-output.schema.json — a swarm-internal, repo-owned, fixed-path JSON Schema document used to compile the Ajv validator on first use. Not operator-supplied input.', + reason: 'Loads agent-output.schema.json via createRequire from @dogfood-lab/schemas — a swarm-internal, repo-owned JSON Schema document used to compile the Ajv validator on first use. 
Not operator-supplied input.', }, ]; diff --git a/packages/dogfood-swarm/amend1-state-machine-tx.test.js b/packages/dogfood-swarm/amend1-state-machine-tx.test.js index 6c871b6..269f31e 100644 --- a/packages/dogfood-swarm/amend1-state-machine-tx.test.js +++ b/packages/dogfood-swarm/amend1-state-machine-tx.test.js @@ -41,9 +41,15 @@ const STATE_MACHINE_SRC = readFileSync( function seedWave(db) { db.prepare(`INSERT INTO runs (id, repo, local_path, commit_sha, status) VALUES ('r1', 'org/x', '/tmp/x', ?, 'health-audit-a')`).run('a'.repeat(40)); + // Two owned domains with DISJOINT globs. (Pre sm-002 this seed used `['**']` + // for both, but freezeDomains now rejects overlapping owned globs because two + // exclusive owners claiming the same file breaches per-domain isolation. The + // glob values are irrelevant to this file's agent-state-machine atomicity + // probes — they only need two distinct domains — so disjoint globs keep the + // intent and satisfy the freeze guard.) saveDomainDraft(db, 'r1', [ - { name: 'd1', globs: ['**'], ownership_class: 'owned' }, - { name: 'd2', globs: ['**'], ownership_class: 'owned' }, + { name: 'd1', globs: ['src/**'], ownership_class: 'owned' }, + { name: 'd2', globs: ['tests/**'], ownership_class: 'owned' }, ]); freezeDomains(db, 'r1'); db.prepare(`INSERT INTO waves (run_id, phase, wave_number, status) diff --git a/packages/dogfood-swarm/cli.js b/packages/dogfood-swarm/cli.js index 6e5e3b7..602b1ef 100644 --- a/packages/dogfood-swarm/cli.js +++ b/packages/dogfood-swarm/cli.js @@ -23,7 +23,8 @@ import { parseArgs } from 'node:util'; import { resolve, join } from 'node:path'; -import { existsSync } from 'node:fs'; +import { existsSync, realpathSync } from 'node:fs'; +import { fileURLToPath } from 'node:url'; import { init } from './commands/init.js'; import { dispatch } from './commands/dispatch.js'; @@ -410,7 +411,14 @@ function cmdRevalidate(args) { let reason = ''; const reasonIdx = args.indexOf('--reason'); - if (reasonIdx >= 0 && 
args[reasonIdx + 1]) reason = args[reasonIdx + 1]; + // cli-003: a following token that starts with `--` is the NEXT flag, not the + // reason text. Without this guard, `--reason --apply` silently captured + // '--apply' as the reason (polluting the mandatory audit field) while the + // irreversible mutation still fired. Treat that as a missing reason so the + // "reason required" guard below errors out instead. + if (reasonIdx >= 0 && args[reasonIdx + 1] && !args[reasonIdx + 1].startsWith('--')) { + reason = args[reasonIdx + 1]; + } for (const a of args) { const m = a.match(/^--reason=(.+)$/s); if (m) reason = m[1]; @@ -499,7 +507,13 @@ function cmdRewind(args) { let reason = ''; const reasonIdx = args.indexOf('--reason'); - if (reasonIdx >= 0 && args[reasonIdx + 1]) reason = args[reasonIdx + 1]; + // cli-003: a following token starting with `--` is the next flag, not the + // reason. `--reason --apply` must NOT capture '--apply' as the audit reason + // (and rewind is irreversible — the polluted reason would land in the + // wave/agent_state_events record). Treat it as missing so the guard fires. + if (reasonIdx >= 0 && args[reasonIdx + 1] && !args[reasonIdx + 1].startsWith('--')) { + reason = args[reasonIdx + 1]; + } for (const a of args) { const m = a.match(/^--reason=(.+)$/s); if (m) reason = m[1]; @@ -571,7 +585,12 @@ function cmdRedrive(args) { let reason = ''; const reasonIdx = args.indexOf('--reason'); - if (reasonIdx >= 0 && args[reasonIdx + 1]) reason = args[reasonIdx + 1]; + // cli-003: a following token starting with `--` is the next flag, not the + // reason. `--reason --apply` must NOT capture '--apply' as the redrive audit + // reason. Treat it as missing so the "reason required" guard fires. 
+ if (reasonIdx >= 0 && args[reasonIdx + 1] && !args[reasonIdx + 1].startsWith('--')) { + reason = args[reasonIdx + 1]; + } for (const a of args) { const m = a.match(/^--reason=(.+)$/s); if (m) reason = m[1]; @@ -662,22 +681,78 @@ function cmdVerify(args) { }); console.log(formatVerify(result)); + + // cli-p-002: `swarm verify` is billed as a wave gate, yet it used to exit + // 0 on every verdict — a CI step (or a `swarm verify && swarm + // advance ` chain) saw a green light on a hard FAIL. Exit 0 ONLY on a + // clean pass; every other verdict (fail / no_tests / skip / tool_missing) + // is "not a verified pass" and MUST surface as a non-zero exit so the + // machine signal matches the human-readable one. This aligns `verify` with + // its four verify-* sibling verbs, which already propagate exit codes. + if (result.verdict !== 'pass') { + // A `fail` verdict has no top-level `reason` (the runner only attaches + // one for skip/no_tests), so derive a why-line from the first failing + // required step. The operator always gets an explanation alongside the + // non-zero exit, never a bare verdict. + const failedStep = result.steps?.find(s => !s.passed && !s.optional); + const why = result.reason + || (failedStep + ? `required step '${failedStep.name}' failed (exit ${failedStep.exit_code})` + : 'not a verified pass'); + console.error(`swarm verify: ${result.verdict.toUpperCase()} — ${why}`); + process.exit(1); + } +} + +/** + * ve-002 guard: build a fail-loud error for a non-numeric / negative + * `--threshold` value. A plain Error carrying `.code` + `.hint` so the + * top-level `renderTopLevelError` seam prints the structured envelope and + * exits 1 — mirroring CliInvalidGlobsError's shape without coupling the + * shared verify-flag parser to a new error class. Exported-via-throw so the + * unit test can assert the parser rejects rather than yielding NaN. 
+ * + * @param {string} raw — the value the operator passed after --threshold + */ +function thresholdError(raw) { + const e = new Error( + `--threshold expects a non-negative integer; got '${raw}'` + ); + e.code = 'CLI_INVALID_THRESHOLD'; + e.received = raw; + e.hint = 'pass an integer >= 0, e.g. `--threshold 0` or `--threshold=3`'; + return e; } /** * Parse the shared verify-* CLI flags: --threshold=N, --format=text|markdown|json, * --legacy-v1. Returned values are plain JS so each verb's wrapper can * spread directly into its impl call. + * + * ve-002: the space form `--threshold ` is validated with + * Number.isFinite (and a non-negative check) so a typo like `--threshold foo` + * fails loud instead of yielding NaN. A NaN threshold silently disabled the + * gate: `offending > NaN` is always false, so the command exited 0 ("clean") + * even with real regressions, on all four verify-* verbs. The `--threshold=N` + * equals-form was already digit-guarded by its `^--threshold=(\d+)$` regex; + * this closes the space-form hole and keeps both forms consistent. + * + * @throws when the space-form value is not a finite non-negative integer. */ -function parseVerifyFlags(args) { +export function parseVerifyFlags(args) { let threshold = 0; for (const a of args.slice(1)) { const m = a.match(/^--threshold=(\d+)$/); if (m) { threshold = parseInt(m[1], 10); break; } } const tIdx = args.indexOf('--threshold'); - if (tIdx >= 0 && args[tIdx + 1]) { - threshold = parseInt(args[tIdx + 1], 10); + if (tIdx >= 0 && args[tIdx + 1] !== undefined) { + const raw = args[tIdx + 1]; + const n = parseInt(raw, 10); + if (!Number.isFinite(n) || n < 0 || !/^\d+$/.test(String(raw).trim())) { + throw thresholdError(raw); + } + threshold = n; } let format; @@ -857,7 +932,14 @@ function cmdAdvance(args) { const override = args.includes('--override'); const reasonIdx = args.indexOf('--reason'); - const overrideReason = reasonIdx >= 0 ? 
args[reasonIdx + 1] : undefined; + // cli-003: a following token that starts with `--` is the NEXT flag, not the + // reason text. `--override --reason ` captured the flag as overrideReason + // (truthy), so the override proceeded with a junk audit reason — advance.js + // persists it into the promotion/override record as overrides:[{reason}]. No + // irreversible side effect here (why the amend agent declined to file it), but + // a polluted audit reason is still wrong. Treat a `--`-prefixed value as missing. + const reasonCandidate = reasonIdx >= 0 ? args[reasonIdx + 1] : undefined; + const overrideReason = (reasonCandidate && !reasonCandidate.startsWith('--')) ? reasonCandidate : undefined; if (override && !overrideReason) { console.error('--override requires --reason "explanation"'); @@ -902,31 +984,57 @@ function cmdApprove(args) { const idsArg = args.find((a, i) => args[i - 1] === '--ids'); const ids = idsArg ? idsArg.split(',').map(s => s.trim()) : []; - let updated; - if (approveAll) { - updated = db.prepare( - "UPDATE findings SET status = 'approved' WHERE run_id = ? AND status IN ('new', 'recurring')" - ).run(runId); - } else if (ids.length > 0) { - const placeholders = ids.map(() => '?').join(','); - updated = db.prepare( - `UPDATE findings SET status = 'approved' WHERE run_id = ? AND finding_id IN (${placeholders}) AND status IN ('new', 'recurring')` - ).run(runId, ...ids); - } else { + if (!approveAll && ids.length === 0) { console.error('Specify --all or --ids F-001,F-002'); process.exit(1); } - console.log(`Approved ${updated.changes} findings for ${runId}`); + // cli-002 fix: record `approved` events only for findings THIS call moves + // new/recurring → approved, not for every already-approved finding in the + // run. finding_events is append-only with no unique constraint, so the old + // `SELECT ... 
WHERE status = 'approved'` (which returned rows approved by + // earlier invocations too) inserted a duplicate `approved` event on every + // re-run of `swarm approve`, over-counting the event-sourced audit trail. + // + // Capture the about-to-flip ids BEFORE the UPDATE, then insert one event + // per captured id — all inside one transaction so the UPDATE and its audit + // rows land together (Stripe Ledger pattern, mirrors transitionWave). + const selectPending = approveAll + ? db.prepare( + "SELECT id FROM findings WHERE run_id = ? AND status IN ('new', 'recurring')" + ) + : db.prepare( + `SELECT id FROM findings WHERE run_id = ? AND finding_id IN (${ids.map(() => '?').join(',')}) AND status IN ('new', 'recurring')` + ); - // Record events - const approved = db.prepare( - "SELECT id FROM findings WHERE run_id = ? AND status = 'approved'" - ).all(runId); const insertEvent = db.prepare( "INSERT INTO finding_events (finding_id, event_type, notes) VALUES (?, 'approved', 'bulk approve')" ); - for (const f of approved) insertEvent.run(f.id); + + let changes = 0; + const tx = db.transaction(() => { + const pending = approveAll + ? selectPending.all(runId) + : selectPending.all(runId, ...ids); + + let updated; + if (approveAll) { + updated = db.prepare( + "UPDATE findings SET status = 'approved' WHERE run_id = ? AND status IN ('new', 'recurring')" + ).run(runId); + } else { + const placeholders = ids.map(() => '?').join(','); + updated = db.prepare( + `UPDATE findings SET status = 'approved' WHERE run_id = ? 
AND finding_id IN (${placeholders}) AND status IN ('new', 'recurring')` + ).run(runId, ...ids); + } + + for (const f of pending) insertEvent.run(f.id); + return updated.changes; + }); + + changes = tx(); + console.log(`Approved ${changes} findings for ${runId}`); } function cmdPersist(args) { @@ -948,6 +1056,24 @@ function cmdPersist(args) { }); console.log(formatPersist(result)); + + // cli-p-001 / fp-p-002: when --ingest was requested (and not a dry run), the + // ingest is an irreversible write to the dogfood corpus. persist() catches a + // failed ingest into report.dogfood.reason and returns a success-shaped + // report, so cmdPersist used to exit 0 even when nothing was ingested — a CI + // step gating on $? saw a failed corpus write as green. The sibling + // persist-results.js exits 1 on the identical failure; align the two corpus- + // write surfaces on one exit-code contract. Surface the reason + a copy- + // pasteable reproduce line (mirroring persist-results.js) so the operator + // can replay the ingest with full output. + if (ingestDogfood && !dryRun && result.dogfood && result.dogfood.ingested !== true) { + console.error(`ERROR [INGEST_FAILED]: dogfood ingest did not complete — ${result.dogfood.reason}`); + if (result.artifacts?.dogfoodSubmission) { + console.error(` Submission: ${result.artifacts.dogfoodSubmission}`); + console.error(` Reproduce: node "/packages/ingest/run.js" --provenance=stub --file "${result.artifacts.dogfoodSubmission}"`); + } + process.exit(1); + } } function cmdFindings(args) { @@ -1013,9 +1139,6 @@ function cmdRuns() { // ── Dispatch ── -const command = process.argv[2]; -const commandArgs = process.argv.slice(3); - const commands = { init: cmdInit, domains: cmdDomains, @@ -1040,8 +1163,42 @@ const commands = { runs: cmdRuns, }; -if (!command || !commands[command]) { - console.log(`swarm — Truthful swarm control plane for repo work +/** + * Direct-execution guard. 
cli.js historically ran its argv dispatch at module + * load unconditionally, which means importing anything from this file (e.g. + * parseVerifyFlags for a unit test) would execute the dispatch under the test + * runner's argv and `process.exit`. The guard makes the file importable: the + * dispatch only runs when cli.js is the process entry point (node cli.js ..., + * or the `swarm` bin), not when it is imported. The subprocess smoke tests + * (cli-smoke.test.js, rewind.test.js) still exercise the real dispatch because + * they spawn `node cli.js` where argv[1] resolves to this file. + */ +function isDirectExecution() { + const entry = process.argv[1]; + if (!entry) return false; + try { + return realpathSync(entry) === realpathSync(fileURLToPath(import.meta.url)); + } catch { + return entry === fileURLToPath(import.meta.url); + } +} + +/** + * cli-r-002: the argv dispatch body lives in main() rather than inline under + * `if (isDirectExecution())`. The previous inline form left the help-text + * console.log + trailing process.exit indented one level shallower than their + * enclosing `if (!command || !commands[command])` body. Hoisting the body into + * a named function lets every statement sit at one consistent indentation + * level without a deep-nesting re-indent, and reads as a normal entry point. + * Behavior is identical: main() is invoked only when cli.js is the process + * entry point (the subprocess smoke tests still spawn `node cli.js`). + */ +function main() { + const command = process.argv[2]; + const commandArgs = process.argv.slice(3); + + if (!command || !commands[command]) { + console.log(`swarm — Truthful swarm control plane for repo work Commands: init Create run, detect domains @@ -1158,12 +1315,15 @@ Phases: health-audit-c health-amend-c stage-d-audit stage-d-amend feature-audit feature-execute`); - process.exit(command ? 1 : 0); -} + process.exit(command ? 
1 : 0); + } -try { - commands[command](commandArgs); -} catch (e) { - renderTopLevelError(e); - process.exit(1); + try { + commands[command](commandArgs); + } catch (e) { + renderTopLevelError(e); + process.exit(1); + } } + +if (isDirectExecution()) main(); diff --git a/packages/dogfood-swarm/commands/collect.js b/packages/dogfood-swarm/commands/collect.js index 40a1391..7155f1f 100644 --- a/packages/dogfood-swarm/commands/collect.js +++ b/packages/dogfood-swarm/commands/collect.js @@ -12,7 +12,8 @@ * 6. Generate wave summary */ -import { readFileSync, existsSync } from 'node:fs'; +import { readFileSync, existsSync, statSync } from 'node:fs'; +import { resolve as resolvePath, sep } from 'node:path'; import { createHash } from 'node:crypto'; import { openDb } from '../db/connection.js'; import { getDomains, checkOwnership } from '../lib/domains.js'; @@ -54,6 +55,15 @@ import { randomBytes } from 'node:crypto'; */ const MAX_ERROR_MESSAGE_CHARS = 512; +/** + * Upper bound on a source file we will read to build a context-snippet + * fingerprint (fp-p-005). The snippet only needs ~7 lines, but extracting them + * splits the whole file, so we refuse to load a pathological (minified bundle, + * generated blob) file into memory; such a finding falls back to the line-bucket + * fingerprint. 2 MB clears any hand-written source by a wide margin. + */ +const MAX_SOURCE_FILE_BYTES = 2 * 1024 * 1024; + /** * tryTransition — observability-friendly wrapper around transitionAgent. * @@ -237,6 +247,37 @@ export function collect(opts) { const allFindings = []; + // fp-p-005: source-text cache for the context-snippet fingerprint. Each + // finding's file is read at most once per collect; computeFingerprint folds an + // edit-stable hash of the ~7 lines around finding.line into the base + // fingerprint, so two distinct findings in one file+bucket no longer collide. 
+ // Audit waves (the only ones that produce findings) do not edit the tree, so + // this read is the same source snapshot the auditor reported against. The + // cache value is the file text, or null when the file is unreadable, oversized, + // or resolves OUTSIDE the worktree — in which case computeFingerprint falls + // back to the historical line-bucket. A `finding.file` is attacker-adjacent + // (it comes from agent JSON), so the containment guard refuses any path that + // escapes the worktree root even though we only ever hash a snippet of it. + const sourceCache = new Map(); + const readFindingSource = (root, file) => { + if (!root || !file) return null; + const rootResolved = resolvePath(root); + const resolved = resolvePath(root, String(file)); + if (sourceCache.has(resolved)) return sourceCache.get(resolved); + let text = null; + const contained = resolved === rootResolved || resolved.startsWith(rootResolved + sep); + if (contained) { + try { + const st = statSync(resolved); + if (st.isFile() && st.size <= MAX_SOURCE_FILE_BYTES) { + text = readFileSync(resolved, 'utf-8'); + } + } catch { text = null; } + } + sourceCache.set(resolved, text); + return text; + }; + // L3-003 (Wave A2 amend2 — family seal of D3B-002): wrap the per-agent // collection loop in a single db.transaction() so a mid-loop crash // rolls back EVERY DB write across all agents iterated so far. Pre-fix, @@ -308,7 +349,7 @@ export function collect(opts) { // Runs BEFORE the legacy shape-specific validators below and BEFORE // fingerprint computation, so a malformed agent JSON is rejected with a // structured AgentOutputValidationError pointing the operator at - // scripts/agent-output.schema.json. The legacy validators stay for + // packages/schemas/src/json/agent-output.schema.json. The legacy validators stay for // shape-specific extras (e.g. 'stage' enum) but the schema is now the // contract gate. 
Wave-22 logStage wrapper-strip pattern preserved by // calling logStage directly with a fresh correlation_id. @@ -452,8 +493,10 @@ export function collect(opts) { ? (output.findings || output.features || []) : []; + const sourceRoot = ar.worktree_path || run.local_path; for (const f of findings) { - f.fingerprint = computeFingerprint(f); + const sourceText = readFindingSource(sourceRoot, f.file); + f.fingerprint = computeFingerprint(f, { sourceText }); allFindings.push(f); } diff --git a/packages/dogfood-swarm/commands/persist.js b/packages/dogfood-swarm/commands/persist.js index ec19fb5..6dcebf3 100644 --- a/packages/dogfood-swarm/commands/persist.js +++ b/packages/dogfood-swarm/commands/persist.js @@ -95,8 +95,19 @@ export function persist(opts) { } // 5. Summary + // + // fp-p-004: this step only WROTE three local audit JSON files (run/ + // findings/metrics) into /persist/audit via atomicWrite. It does + // NOT perform the `rk audit import` / `audit_submit` into the repo-knowledge + // DB — that is the coordinator's downstream step. Reporting a bare + // `exported: true` / `Status: pass` conflated "wrote audit files locally" + // with "audit landed in the repo-knowledge DB" and could make an operator + // believe the submission happened. Reflect what actually occurred: artifacts + // written here, submission still pending. Mirrors the dogfood path's honest + // Ingested: YES/NO phrasing so both downstream targets report consistently. report.repoKnowledge = { - exported: true, + artifactsWritten: true, + submitted: false, path: auditDir, status: auditPayload.run.overall_status, posture: auditPayload.run.overall_posture, @@ -130,7 +141,15 @@ export function formatPersist(r) { lines.push(''); lines.push('Repo-knowledge:'); - lines.push(` Status: ${r.repoKnowledge?.status} (${r.repoKnowledge?.posture})`); + // fp-p-004: distinguish "artifacts written locally" from "submitted to the + // repo-knowledge DB". 
The submission is the coordinator's downstream step; + // say so rather than implying it already happened. + if (r.repoKnowledge?.submitted) { + lines.push(` Submitted: YES — status ${r.repoKnowledge?.status} (${r.repoKnowledge?.posture})`); + } else { + lines.push(` Submitted: NO — artifacts written, run \`rk audit import \` to submit`); + lines.push(` Status (pending): ${r.repoKnowledge?.status} (${r.repoKnowledge?.posture})`); + } lines.push(` Path: ${r.repoKnowledge?.path}`); return lines.join('\n'); diff --git a/packages/dogfood-swarm/commands/rewind.js b/packages/dogfood-swarm/commands/rewind.js index 56882b3..8da4f4c 100644 --- a/packages/dogfood-swarm/commands/rewind.js +++ b/packages/dogfood-swarm/commands/rewind.js @@ -68,6 +68,8 @@ */ import { execFileSync, spawnSync } from 'node:child_process'; +import { realpathSync } from 'node:fs'; +import { resolve as resolvePath } from 'node:path'; import { openDb } from '../db/connection.js'; import { transitionAgent, TERMINAL_STATUSES as AGENT_TERMINAL_STATUSES } from '../lib/state-machine.js'; @@ -174,36 +176,67 @@ export function rewind(opts) { const db = openDb(dbPath); + // cli-001 fix: resolve the run(s) that own THIS working tree before + // collecting anything to abort. A single control-plane.db holds every run + // across every repo (proved by `swarm runs`), so an unscoped abort drives + // OTHER runs' live waves/agent_runs to the terminal `aborted_for_rewind` + // status while the git reset only touches `cwd`. We match `runs.local_path` + // against the rewind cwd (normalized for symlinks / trailing separators / + // Windows casing) and scope every query below to those run ids. If no run + // matches (e.g. an arbitrary-ref rewind in a tree with no registered run), + // the abort set is empty by construction — the git reset still runs, but no + // DB rows are touched, which is the correct conservative behavior. 
+ const cwdKey = normalizePathForMatch(cwd); + const targetRunIds = db.prepare('SELECT id, local_path FROM runs').all() + .filter(r => normalizePathForMatch(r.local_path) === cwdKey) + .map(r => r.id); + + // Empty-set guard: SQLite tolerates an empty `WHERE run_id IN ()` list, but it + // is non-portable SQL and would match nothing anyway. Short-circuit to empty + // result sets so the dry-run plan + preserved counts reflect "no run owns + // this tree" without issuing a pointless query. + const runScope = targetRunIds.length > 0; + const runPlaceholders = targetRunIds.map(() => '?').join(','); + // Collect every wave + agent_run we would touch. Filter out terminal rows - // (advanced waves, complete agents, prior aborted_for_rewind entries). The - // dry-run pass shows the operator exactly what --apply would change. - const waves = db.prepare( - 'SELECT id, run_id, phase, wave_number, status FROM waves WHERE status NOT IN (' + + // (advanced waves, complete agents, prior aborted_for_rewind entries) AND + // scope to the run(s) owning this working tree. The dry-run pass shows the + // operator exactly what --apply would change. + const waves = runScope ? db.prepare( + 'SELECT id, run_id, phase, wave_number, status FROM waves WHERE run_id IN (' + + runPlaceholders + ') AND status NOT IN (' + [...WAVE_TERMINAL_STATUSES].map(() => '?').join(',') + ')' - ).all(...WAVE_TERMINAL_STATUSES); + ).all(...targetRunIds, ...WAVE_TERMINAL_STATUSES) : []; - const agentRuns = db.prepare(` + // agent_runs has no run_id column — scope through waves.run_id via the join. + const agentRuns = runScope ? 
db.prepare(` SELECT ar.id, ar.wave_id, ar.status, d.name AS domain_name FROM agent_runs ar JOIN domains d ON ar.domain_id = d.id - WHERE ar.status NOT IN (${[...AGENT_TERMINAL_STATUSES].map(() => '?').join(',')}) - `).all(...AGENT_TERMINAL_STATUSES); + JOIN waves w ON ar.wave_id = w.id + WHERE w.run_id IN (${runPlaceholders}) + AND ar.status NOT IN (${[...AGENT_TERMINAL_STATUSES].map(() => '?').join(',')}) + `).all(...targetRunIds, ...AGENT_TERMINAL_STATUSES) : []; // 5B-1 fold-in (T4): preserved-count surface. The plan summary names what // rewind LEFT ALONE (terminal rows survive byte-identical) alongside what // it tore down. The operator's mental model is "rewind erases the failure // tail but preserves history" — the surface should reflect that, not just - // the affected count. - const preservedWaveCount = db.prepare( - 'SELECT COUNT(*) AS n FROM waves WHERE status IN (' + + // the affected count. cli-001: these counts are also scoped to the target + // run(s) so the preserved surface describes THIS tree's run, not the whole + // shared DB. + const preservedWaveCount = runScope ? db.prepare( + 'SELECT COUNT(*) AS n FROM waves WHERE run_id IN (' + runPlaceholders + + ') AND status IN (' + [...WAVE_TERMINAL_STATUSES].map(() => '?').join(',') + ')' - ).get(...WAVE_TERMINAL_STATUSES).n; + ).get(...targetRunIds, ...WAVE_TERMINAL_STATUSES).n : 0; - const preservedAgentRunCount = db.prepare( - 'SELECT COUNT(*) AS n FROM agent_runs WHERE status IN (' + + const preservedAgentRunCount = runScope ? db.prepare( + 'SELECT COUNT(*) AS n FROM agent_runs ar JOIN waves w ON ar.wave_id = w.id ' + + 'WHERE w.run_id IN (' + runPlaceholders + ') AND ar.status IN (' + [...AGENT_TERMINAL_STATUSES].map(() => '?').join(',') + ')' - ).get(...AGENT_TERMINAL_STATUSES).n; + ).get(...targetRunIds, ...AGENT_TERMINAL_STATUSES).n : 0; // Build the plan. Each entry carries the planned transition so the // operator can audit before --apply. 
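Reviewer's sketch (not part of the patch): the run scoping in the hunk above comes down to two moves, emitting one `?` placeholder per owning run id and refusing to build any SQL when no run owns the tree. The helper name `buildScopedWaveQuery` and the sample run ids are hypothetical; the column list and the `advanced` / `aborted_for_rewind` statuses are taken from the hunk. The helper only assembles the statement and its parameters, so it runs without a database.

```javascript
// Hypothetical helper illustrating the scoping pattern; rewind.js inlines this logic.
function buildScopedWaveQuery(runIds, terminalStatuses) {
  // Empty-set guard: with no owning run there is nothing to query.
  if (runIds.length === 0) return null;

  const runPlaceholders = runIds.map(() => '?').join(',');
  const statusPlaceholders = terminalStatuses.map(() => '?').join(',');

  return {
    sql:
      'SELECT id, run_id, phase, wave_number, status FROM waves ' +
      `WHERE run_id IN (${runPlaceholders}) ` +
      `AND status NOT IN (${statusPlaceholders})`,
    // Parameter order must mirror placeholder order: run ids first, then statuses.
    params: [...runIds, ...terminalStatuses],
  };
}

const scoped = buildScopedWaveQuery(['run-a', 'run-b'], ['advanced', 'aborted_for_rewind']);
console.log(scoped.sql);
console.log(scoped.params);

// No owning run: the caller would fall back to an empty result set.
console.log(buildScopedWaveQuery([], ['advanced']));
```

With better-sqlite3 the result would feed `db.prepare(scoped.sql).all(...scoped.params)`, which is the call shape the hunk uses. Keeping the guard ahead of the string building means the dry-run plan for an unregistered tree is empty by construction rather than by query result.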
@@ -235,6 +268,7 @@ export function rewind(opts) { headShaBeforeShort: headSha.slice(0, 8), cwd, dbPath, + scopedRunIds: targetRunIds, dryRun: !apply, apply: !!apply, force: !!force, @@ -484,3 +518,30 @@ function mintCorrelationId() { const rand = Math.random().toString(36).slice(2, 6); return `coord-${ts}-${rand}`; } + +/** + * cli-001 fix: normalize a filesystem path for run-scoping comparison. + * + * The git reset is scoped to one working tree (the rewind `cwd`), but the + * DB abort must be scoped to the SAME tree's run(s) — otherwise rewinding + * run A in repo X drives run B's in-flight rows (possibly in repo Y, sharing + * the single control-plane.db) to the terminal `aborted_for_rewind` status. + * + * `runs.local_path` is written via `resolve(repoPath)` at init time, but the + * operator's `process.cwd()` at rewind time can differ by symlink resolution, + * a trailing separator, or (on Windows) drive-letter / separator casing. We + * compare on realpath when the path exists (collapses symlinks + casing on + * case-insensitive filesystems), falling back to `resolve()` for paths that + * no longer exist on disk. Returns a lowercased string so Windows + * case-insensitive trees still match. + */ +function normalizePathForMatch(p) { + if (!p || typeof p !== 'string') return ''; + let out; + try { + out = realpathSync(p); + } catch { + out = resolvePath(p); + } + return out.replace(/[\\/]+$/, '').toLowerCase(); +} diff --git a/packages/dogfood-swarm/commands/status.js b/packages/dogfood-swarm/commands/status.js index 4a18c01..abc3c67 100644 --- a/packages/dogfood-swarm/commands/status.js +++ b/packages/dogfood-swarm/commands/status.js @@ -76,6 +76,30 @@ export function status(opts) { }; } + // Half-state detection (collect-crash-during-upsert). collect.js persists + // artifacts + flips agents to `complete` BEFORE the findings upsert and the + // wave-status UPDATE (see commands/collect.js:503-507). 
If the upsert throws + // (CollectUpsertError), the wave is left `dispatched` with every agent + // `complete` and artifacts on disk, but ZERO findings referencing it — yet + // `swarm status` would print *** READY TO COLLECT *** with no hint a collect + // already failed. We measure both halves of that signature here (artifacts + // present, findings absent) so computeAssessment can surface a + // non-destructive breadcrumb. Counts are scoped to the current wave: an + // artifact belongs to a wave through its agent_run; a finding through + // first_seen_wave / last_seen_wave. + let currentWaveArtifactCount = 0; + let currentWaveFindingCount = 0; + if (currentWave) { + currentWaveArtifactCount = db.prepare(` + SELECT COUNT(*) AS n FROM artifacts a + JOIN agent_runs ar ON a.agent_run_id = ar.id + WHERE ar.wave_id = ? + `).get(currentWave.id).n; + currentWaveFindingCount = allFindings.filter( + f => f.first_seen_wave === currentWave.id || f.last_seen_wave === currentWave.id + ).length; + } + // Violations across all waves — F-L1-001 (Wave A1 D3 fix-up): the // wave-9 family's same-file unlisted sibling. The `currentAgents` query // above adopted the helper; this violations subquery (advisor verifier @@ -134,6 +158,8 @@ export function status(opts) { runId: opts.runId, savePointTag: run.save_point_tag, lastVerificationPassed: lastReceipt ? !!lastReceipt.passed : null, + currentWaveArtifactCount, + currentWaveFindingCount, } ); @@ -455,6 +481,29 @@ function computeAssessment(wave, agents, openBySeverity, blocked, inFlight, ctx // All complete — wave status check if (wave.status === 'dispatched') { + // Half-state breadcrumb (collect-crash-during-upsert). 
When the wave is + // still `dispatched` with every agent `complete` AND artifacts were + // persisted but ZERO findings reference this wave, a prior `swarm collect` + // most likely threw during the findings upsert (CollectUpsertError) after + // committing artifacts + agent transitions but before the wave-status + // UPDATE. The bare "READY TO COLLECT / Run `swarm collect`" output hid + // that a collect already ran and failed — even though the collect error + // itself told the operator to "inspect with swarm status". Surface a + // non-destructive hint; the recovery (re-run collect) is the same, so the + // happy path (no artifacts yet → collect never ran) is unchanged. + const collectMayHaveFailed = + ctx.currentWaveArtifactCount > 0 && ctx.currentWaveFindingCount === 0; + if (collectMayHaveFailed) { + return { + state: 'READY TO COLLECT', + blockers, + nextAction: + 'Agents reported complete and artifacts were persisted, but no findings were ' + + 'persisted for this wave — a prior `swarm collect` may have failed during the ' + + 'findings upsert. Re-run `swarm collect`, or check logs / `swarm receipt` for the ' + + 'collect error.', + }; + } return { state: 'READY TO COLLECT', blockers, diff --git a/packages/dogfood-swarm/commands/verify.js b/packages/dogfood-swarm/commands/verify.js index 151fdc5..18ccccd 100644 --- a/packages/dogfood-swarm/commands/verify.js +++ b/packages/dogfood-swarm/commands/verify.js @@ -7,9 +7,23 @@ * This is a wave gate: status uses the receipt to recommend ADVANCE vs FIX. */ +import { randomBytes } from 'node:crypto'; import { openDb } from '../db/connection.js'; import { runVerification, probeAll, selectAdapter, listAdapters } from '../lib/verify/registry.js'; import { transitionWave } from '../lib/wave-state-machine.js'; +import { logStage } from '../lib/log-stage.js'; + +/** + * Mint a synthetic correlation_id for the verify wave-gate. 
Mirrors the + * `coord--` pattern used in commands/dispatch.js + + * collect.js so a single grep ties a `verify_start`/`verify_complete` + * pair to the run+wave it gated (ve-p-004). + */ +function mintCorrelationId() { + const ts = Date.now().toString(36); + const rand = randomBytes(2).toString('hex'); + return `coord-${ts}-${rand}`; +} /** * Run verification for a swarm run. @@ -34,12 +48,35 @@ export function verify(opts) { `).get(opts.runId); if (!wave) throw new Error('No waves found'); + const correlationId = mintCorrelationId(); + logStage('verify_start', { + component: 'dogfood-swarm', + correlation_id: correlationId, + runId: opts.runId, + wave: wave.wave_number, + adapter: opts.override || 'auto', + }); + // Run verification const result = runVerification(run.local_path, { override: opts.override, commandOverrides: opts.commandOverrides, }); + // ve-p-003: the runtime verdict vocabulary (pass/fail/skip/no_tests/ + // tool_missing) and its `reason` would otherwise flatten to a single + // passed=0/1 bit at persistence — verification_receipts has no verdict + // column, so a real FAIL, a no_tests skip, and a no-adapter skip become + // indistinguishable in the durable record. Until that schema gains a + // verdict/reason column, fold the disambiguation into the stdout the + // receipt already stores: a header line carries the verdict + reason so + // the truth survives in the persisted artifact, not just on the console. + const verdictHeader = `=== verify verdict: ${result.verdict}${result.reason ? ` — ${result.reason}` : ''} ===`; + const persistedStdout = [ + verdictHeader, + ...result.steps.map(s => `=== ${s.name} (${s.passed ? 
'PASS' : 'FAIL'}) ===\n${s.stdout}`), + ].join('\n\n'); + // Persist to verification_receipts const receiptResult = db.prepare(` INSERT INTO verification_receipts @@ -50,7 +87,7 @@ export function verify(opts) { result.adapter || 'none', JSON.stringify(result.steps.map(s => s.command)), result.steps.find(s => !s.passed && !s.optional)?.exit_code ?? 0, - result.steps.map(s => `=== ${s.name} (${s.passed ? 'PASS' : 'FAIL'}) ===\n${s.stdout}`).join('\n\n'), + persistedStdout, result.steps.filter(s => s.stderr).map(s => `=== ${s.name} ===\n${s.stderr}`).join('\n\n'), result.verdict === 'pass' ? 1 : 0, result.test_count, @@ -69,11 +106,33 @@ export function verify(opts) { ); } + const receiptId = Number(receiptResult.lastInsertRowid); + + logStage('verify_complete', { + component: 'dogfood-swarm', + correlation_id: correlationId, + runId: opts.runId, + wave: wave.wave_number, + adapter: result.adapter || 'none', + verdict: result.verdict, + reason: result.reason, + test_count: result.test_count, + duration_ms: result.duration_ms, + receiptId, + }); + return { - receiptId: Number(receiptResult.lastInsertRowid), + receiptId, adapter: result.adapter, probe: result.probe, verdict: result.verdict, + // td-p-004 / ve-p-002: forward the adapter's `reason` (and the + // `no_tests` flag) so the CLI can both EXPLAIN a non-pass verdict to + // the operator (formatVerify) and gate its exit code on it (cmdVerify). + // The runner/registry compute these precisely; dropping them here is + // exactly what left the operator staring at a bare `NO_TESTS` token. 
+ reason: result.reason, + no_tests: result.no_tests, duration_ms: result.duration_ms, test_count: result.test_count, steps: result.steps.map(s => ({ @@ -110,6 +169,11 @@ export function formatVerify(result) { const lines = []; lines.push(`Verification: ${result.verdict.toUpperCase()}`); + // ve-p-002 / td-p-004: a non-pass verdict (no_tests, skip, fail) carries a + // `reason` the adapter/registry constructed precisely — surface it so the + // operator sees WHY, not just a bare verdict token. Mirrors the + // `if (result.reason)` print already used by cmdPromote/cmdGate in cli.js. + if (result.reason) lines.push(`Reason: ${result.reason}`); lines.push(`Adapter: ${result.adapter || 'none'}`); if (result.probe) { lines.push(`Probe: score ${result.probe.score} — ${result.probe.reason}`); diff --git a/packages/dogfood-swarm/d3b-006-finding-id-collision.test.js b/packages/dogfood-swarm/d3b-006-finding-id-collision.test.js index b5104a9..85114c6 100644 --- a/packages/dogfood-swarm/d3b-006-finding-id-collision.test.js +++ b/packages/dogfood-swarm/d3b-006-finding-id-collision.test.js @@ -96,6 +96,37 @@ describe('D3B-006: content-addressed finding_id + UNIQUE constraint', () => { 'finding_id should be the first 8 hex chars of the fingerprint'); }); + it('content-addresses the finding_id from a CONTEXT-FOLDED fingerprint (fp-p-005 composition)', () => { + // fp-p-005 changes WHAT the fingerprint is (it folds in an edit-stable hash + // of the surrounding source), not the F- derivation. A finding + // fingerprinted WITH source still mints a content-addressed id, and two + // same-bucket findings the context hash separates get two DISTINCT ids + // without leaning on the prefix-collision UNIQUE net. 
+ const src = Array.from({ length: 30 }, (_, i) => `const v${i + 1} = ${i + 1};`).join('\n'); + const f1 = { category: 'docs', file: 'README.md', line: 21, description: 'one', severity: 'LOW' }; + const f2 = { category: 'docs', file: 'README.md', line: 27, description: 'two', severity: 'LOW' }; + // Lines 21 & 27 share the 20-bucket → identical fingerprint WITHOUT source. + assert.equal(computeFingerprint(f1), computeFingerprint(f2), + 'precondition: the two share a no-source base fingerprint'); + const fp1 = computeFingerprint(f1, { sourceText: src }); + const fp2 = computeFingerprint(f2, { sourceText: src }); + assert.notEqual(fp1, fp2, 'precondition: the context hash separates them'); + + upsertFindings(db, RUN_ID, 1, { + new: [{ ...f1, fingerprint: fp1 }, { ...f2, fingerprint: fp2 }], + recurring: [], fixed: [], unverified: [], + }); + + const rows = db.prepare('SELECT finding_id, fingerprint FROM findings WHERE run_id = ? ORDER BY id').all(RUN_ID); + assert.equal(rows.length, 2, 'both context-separated findings persisted'); + for (const r of rows) { + assert.equal(r.finding_id, `F-${r.fingerprint.slice(0, 8)}`, + 'finding_id is the first 8 hex of the (context-folded) fingerprint'); + } + assert.equal(new Set(rows.map((r) => r.finding_id)).size, 2, + 'distinct content-addressed finding_ids from the distinct context fingerprints'); + }); + it('produces the SAME finding_id for the SAME fingerprint across invocations', () => { // Two findings whose description differs but whose fingerprint is identical // (the spec contract from Wave 8 B-BACK-002: description is NOT in the fingerprint). 
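Reviewer's sketch (not part of the patch): the test above relies on two findings sharing a line bucket yet separating once nearby source is folded in. The real `computeFingerprint` is not shown in this diff, so everything below is an assumption: the name `sketchFingerprint`, the bucket size of 20, the three-line radius, the whitespace trimming used to approximate "edit-stable", and the FNV-1a stand-in hash (chosen only so the sketch needs no imports; the package itself imports `createHash` from `node:crypto`).

```javascript
// Stand-in hash (FNV-1a, 32-bit). Assumption: NOT the hash the package uses.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

const BUCKET_SIZE = 20; // assumption, based on the "20-bucket" named in the test

// Roughly 7 lines centred on the finding. Lines are trimmed and blank lines
// dropped so that re-indenting the file does not change the snippet.
function contextSnippet(sourceText, line, radius = 3) {
  const lines = sourceText.split('\n');
  const start = Math.max(0, line - 1 - radius);
  const end = Math.min(lines.length, line + radius);
  return lines.slice(start, end).map((l) => l.trim()).filter(Boolean).join('\n');
}

// Hypothetical fingerprint: bucketed base, plus the context snippet when source is available.
function sketchFingerprint(finding, { sourceText } = {}) {
  const bucket = Math.floor(finding.line / BUCKET_SIZE);
  const base = `${finding.category}|${finding.file}|${bucket}`;
  if (typeof sourceText !== 'string') return fnv1a(base); // line-bucket fallback
  return fnv1a(`${base}|${contextSnippet(sourceText, finding.line)}`);
}

const src = Array.from({ length: 30 }, (_, i) => `const v${i + 1} = ${i + 1};`).join('\n');
const f1 = { category: 'docs', file: 'README.md', line: 21 };
const f2 = { category: 'docs', file: 'README.md', line: 27 };

console.log(sketchFingerprint(f1) === sketchFingerprint(f2)); // true: same bucket, no source
console.log(sketchFingerprint(f1, { sourceText: src }), sketchFingerprint(f2, { sourceText: src }));
```

The fallback branch mirrors the behaviour described for unreadable or oversized files, where the fingerprint reverts to the line bucket. A 32-bit hash is far too short for real collision resistance; it is used here only to keep the example dependency-free.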
diff --git a/packages/dogfood-swarm/db/connection.js b/packages/dogfood-swarm/db/connection.js index 904e44b..1ac4971 100644 --- a/packages/dogfood-swarm/db/connection.js +++ b/packages/dogfood-swarm/db/connection.js @@ -63,12 +63,7 @@ export function openDb(dbPath) { const db = new Database(dbPath); - // WAL for better concurrent read perf - db.pragma('journal_mode = WAL'); - db.pragma('foreign_keys = ON'); - // Give concurrent writers a brief grace window instead of failing loudly - // on the first SQLITE_BUSY. See BUSY_TIMEOUT_MS doc above. - db.pragma(`busy_timeout = ${BUSY_TIMEOUT_MS}`); + applyConnectionPragmas(db, { inMemory: false }); // Apply schema idempotently const version = getSchemaVersion(db); @@ -82,12 +77,58 @@ export function openDb(dbPath) { // Apply ALTER TABLE migrations (catch duplicates) applyMigrations(db); setSchemaVersion(db, SCHEMA_VERSION); + } else if (version > SCHEMA_VERSION) { + // sm-p-002: the on-disk DB was written by a NEWER build than this one. + // Neither create (version < 1) nor upgrade (version < SCHEMA_VERSION) + // fires, so without this branch openDb would silently proceed against an + // unknown-newer shape. The shared swarms/control-plane.db is committed + // back to main by ingest.yml; an operator on an older checkout (or a stale + // CI cache) can hit a DB a newer main already migrated. A newer schema may + // rename/repurpose a column or add a NOT NULL column this writer won't + // populate — silent data corruption. Refuse loudly, same fail-loud-not- + // silent discipline as the dead-handle sentinel and busy_timeout above. + db.close(); + pool.delete(dbPath); + throw new Error( + `control-plane.db at ${dbPath} is schema v${version} but this ` + + `@dogfood-lab/dogfood-swarm build only understands v${SCHEMA_VERSION}. 
` + + `Pull the latest @dogfood-lab/dogfood-swarm before opening this DB.` + ); } pool.set(dbPath, db); return db; } +/** + * Apply the per-connection pragmas shared by the file-backed (openDb) and + * in-memory (openMemoryDb) connection factories. + * + * sm-p-004: `foreign_keys` is per-connection and SQLite defaults it OFF, so + * setting it ON is load-bearing for the declared REFERENCES in schema.js — it + * is the one integrity-relevant pragma BOTH factories must apply, and routing + * both through here keeps them from drifting apart again (openDb gained the + * WAL + busy_timeout block historically; openMemoryDb did not follow). The + * file-only pragmas (WAL journal mode + busy_timeout) are gated on !inMemory: + * for a `:memory:` DB there is no file and no cross-process writer contention, + * so WAL is a no-op and a busy_timeout has nothing to wait on. + * + * @param {Database.Database} db + * @param {{ inMemory: boolean }} opts + */ +function applyConnectionPragmas(db, { inMemory }) { + // Integrity pragma — applied to EVERY connection regardless of backing. + db.pragma('foreign_keys = ON'); + + if (!inMemory) { + // WAL for better concurrent read perf. + db.pragma('journal_mode = WAL'); + // Give concurrent writers a brief grace window instead of failing loudly + // on the first SQLITE_BUSY. See BUSY_TIMEOUT_MS doc above. 
+ db.pragma(`busy_timeout = ${BUSY_TIMEOUT_MS}`); + } +} + export { BUSY_TIMEOUT_MS }; /** @@ -107,7 +148,7 @@ export function closeDb(dbPath) { */ export function openMemoryDb() { const db = new Database(':memory:'); - db.pragma('foreign_keys = ON'); + applyConnectionPragmas(db, { inMemory: true }); db.exec(SCHEMA_SQL); applyMigrations(db); setSchemaVersion(db, SCHEMA_VERSION); diff --git a/packages/dogfood-swarm/dispatch-prompt-schema.test.js b/packages/dogfood-swarm/dispatch-prompt-schema.test.js index c14b73d..85e21d7 100644 --- a/packages/dogfood-swarm/dispatch-prompt-schema.test.js +++ b/packages/dogfood-swarm/dispatch-prompt-schema.test.js @@ -36,9 +36,8 @@ import { describe, it, beforeEach, afterEach } from 'node:test'; import assert from 'node:assert/strict'; import { mkdtempSync, rmSync, readFileSync, readFileSync as _rfs } from 'node:fs'; import { tmpdir } from 'node:os'; -import { join, dirname } from 'node:path'; +import { join } from 'node:path'; import { createRequire } from 'node:module'; -import { fileURLToPath } from 'node:url'; import { openDb, closeDb } from './db/connection.js'; import { saveDomainDraft, freezeDomains, takeDomainSnapshot } from './lib/domains.js'; @@ -46,11 +45,8 @@ import { dispatch } from './commands/dispatch.js'; const RUN_ID = 'test-dispatch-prompt-schema'; -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -const SCHEMA_PATH = join(__dirname, '..', '..', 'scripts', 'agent-output.schema.json'); -// eslint-disable-next-line no-unused-vars -const _require = createRequire(import.meta.url); +const require = createRequire(import.meta.url); +const SCHEMA_PATH = require.resolve('@dogfood-lab/schemas/json/agent-output.schema.json'); const CANONICAL_SCHEMA = JSON.parse(readFileSync(SCHEMA_PATH, 'utf-8')); function setupRun(dbPath) { diff --git a/packages/dogfood-swarm/lib/advance.js b/packages/dogfood-swarm/lib/advance.js index faa3399..7fbbc59 100644 --- 
a/packages/dogfood-swarm/lib/advance.js +++ b/packages/dogfood-swarm/lib/advance.js @@ -185,38 +185,55 @@ export function recordPromotion(db, runId, waveId, fromPhase, toPhase, opts) { snapshot.byStatus[f.status] = (snapshot.byStatus[f.status] || 0) + 1; } - const result = db.prepare(` - INSERT INTO promotions (wave_id, run_id, from_phase, to_phase, authorized_by, gates_checked, overrides, finding_snapshot) - VALUES (?, ?, ?, ?, ?, ?, ?, ?) - `).run( - waveId, runId, fromPhase, toPhase, - opts.authorizedBy || 'coordinator', - JSON.stringify(opts.gates), - opts.overrides ? JSON.stringify(opts.overrides) : null, - JSON.stringify(snapshot), - ); - - // Mark wave as advanced via the lawful state machine (Phase 5A). The - // promotion row above is the contractual evidence of the gate-check outcome; - // the wave_state_events row below records the wave-status transition itself - // with the promotion id in the reason for forensic linkage. Legal source - // states are 'collected' (verify skipped or not yet run) and 'verified' - // (verify passed). - transitionWave( - db, - waveId, - 'advanced', - `advance: promotion #${Number(result.lastInsertRowid)} ${fromPhase} → ${toPhase}` - ); - - // Update run status - if (toPhase === 'complete') { - db.prepare("UPDATE runs SET status = 'complete', completed_at = datetime('now') WHERE id = ?").run(runId); - } else { - db.prepare('UPDATE runs SET status = ? WHERE id = ?').run(toPhase, runId); - } + // sm-003: a promotion records an IRREVERSIBLE gate decision, so its three + // durable writes — INSERT promotions, transitionWave(...,'advanced'), and the + // runs.status UPDATE — must land together or not at all. Pre-fix they ran + // unwrapped: a throw in transitionWave (e.g. a concurrent collect moved the + // wave out of collected/verified → STATE_MACHINE_INVALID) left a promotion + // row with no wave transition and a stale runs.status — a half-applied + // advancement that corrupts the gate-evidence trail. 
Mirror the D3B-002 + // pattern in dispatch.js: one outer db.transaction(); better-sqlite3 collapses + // transitionWave's own executeTransition self-wrap into a SAVEPOINT, so the + // promotion row, the wave_state_events audit row, and the runs.status update + // commit or roll back as a unit. + const promote = db.transaction(() => { + const result = db.prepare(` + INSERT INTO promotions (wave_id, run_id, from_phase, to_phase, authorized_by, gates_checked, overrides, finding_snapshot) + VALUES (?, ?, ?, ?, ?, ?, ?, ?) + `).run( + waveId, runId, fromPhase, toPhase, + opts.authorizedBy || 'coordinator', + JSON.stringify(opts.gates), + opts.overrides ? JSON.stringify(opts.overrides) : null, + JSON.stringify(snapshot), + ); + + const promotionId = Number(result.lastInsertRowid); + + // Mark wave as advanced via the lawful state machine (Phase 5A). The + // promotion row above is the contractual evidence of the gate-check + // outcome; the wave_state_events row below records the wave-status + // transition itself with the promotion id in the reason for forensic + // linkage. Legal source states are 'collected' (verify skipped or not yet + // run) and 'verified' (verify passed). + transitionWave( + db, + waveId, + 'advanced', + `advance: promotion #${promotionId} ${fromPhase} → ${toPhase}` + ); + + // Update run status + if (toPhase === 'complete') { + db.prepare("UPDATE runs SET status = 'complete', completed_at = datetime('now') WHERE id = ?").run(runId); + } else { + db.prepare('UPDATE runs SET status = ? 
WHERE id = ?').run(toPhase, runId); + } + + return promotionId; + }); - return Number(result.lastInsertRowid); + return promote(); } /** diff --git a/packages/dogfood-swarm/lib/bounded-json-read.js b/packages/dogfood-swarm/lib/bounded-json-read.js index fe9852e..71bd2e4 100644 --- a/packages/dogfood-swarm/lib/bounded-json-read.js +++ b/packages/dogfood-swarm/lib/bounded-json-read.js @@ -105,9 +105,17 @@ export function readBoundedJson(filePath, opts = {}) { ); } - let raw; + // fp-004: the statSync gate above is advisory because of the + // statSync→readFileSync TOCTOU window — a file actively being written + // (the "logging loop" this helper guards against) can grow past maxBytes + // between the two calls, and a plain readFileSync would then pull the whole + // grown file into memory, OOM-ing the loop the gate is meant to defend. + // Close the window by enforcing the limit on the bytes ACTUALLY read: read + // as a Buffer (no encoding) and reject if the buffer itself exceeds the cap + // before we decode + parse. The size we check is now the size we read. + let buf; try { - raw = readFileSync(filePath, 'utf-8'); + buf = readFileSync(filePath); } catch (e) { throw new BoundedJsonError( `bounded-json: cannot read ${filePath}: ${e.message}`, @@ -115,6 +123,20 @@ export function readBoundedJson(filePath, opts = {}) { ); } + if (buf.length > maxBytes) { + const sizeMb = (buf.length / 1024 / 1024).toFixed(1); + const limitMb = (maxBytes / 1024 / 1024).toFixed(1); + throw new BoundedJsonError( + `bounded-json: file exceeds size limit on read: ${sizeMb} MB (limit: ${limitMb} MB). ` + + `Path: ${filePath}. The file grew past the limit between stat and read ` + + `(the producer is likely still writing — a logging loop or raw-stdout dump). 
` + + `Inspect the file before raising the limit.`, + { kind: 'SIZE_LIMIT', path: filePath, size: buf.length, maxBytes } + ); + } + + const raw = buf.toString('utf-8'); + try { return JSON.parse(raw); } catch (e) { diff --git a/packages/dogfood-swarm/lib/domains.js b/packages/dogfood-swarm/lib/domains.js index 6dbdc4d..cd286a8 100644 --- a/packages/dogfood-swarm/lib/domains.js +++ b/packages/dogfood-swarm/lib/domains.js @@ -5,8 +5,20 @@ * Three ownership classes: owned (exclusive), shared (multi-domain), bridge (coordinator-approved). * * Every domain change is persisted as a domain_event. - * Waves capture a domain_snapshot_id at dispatch time. - * Collect validates against the snapshot, not the latest state. + * + * Ownership is checked (checkOwnership) against the CURRENT frozen domain map. + * The frozen map is kept effectively authoritative for the duration of a wave + * by two guards: editDomain/addDomain/removeDomain refuse while frozen, and + * unfreezeDomains refuses while a wave is in flight (dispatched/collecting) + * unless an explicit { force, reason } is given. Together these close the + * dispatch→collect drift window so the latest map == the dispatch-time map. + * + * Waves still capture a domain_snapshot_id at dispatch time for the audit + * trail (takeDomainSnapshot). Making checkOwnership consult that literal + * snapshot payload — rather than relying on the no-drift guards above — is a + * deeper follow-up (collect.js would need to thread the wave's snapshot id in); + * see sm-001. Until then the snapshot is forensic, and the no-drift guards are + * what actually keep ownership honest. */ import { readdirSync, existsSync } from 'node:fs'; @@ -234,12 +246,183 @@ export function removeDomain(db, runId, domainName) { tx(); } +// ── Ownership arbitration ── + +/** + * Derive a representative concrete path from a glob by substituting its + * wildcard segments with a fixed token. 
Used only to probe whether two owned + * domains' glob sets can match a common file (overlap detection) — it does NOT + * need to enumerate every match, just to produce one path the glob owns. + * + * `src/**\/*.tsx` → `src/x/x.tsx`; `src/**` → `src/x`; `*.md` → `x.md`. + * + * sm-p-001: brace alternations `{a,b}` and char classes `[abc]` are collapsed + * to one representative literal FIRST, so an operator-authored owned glob like + * `src/{ui,frontend}/**` yields a probe path (`src/ui/x`) that minimatch still + * matches against the source glob. Without this, the sampled path kept the + * literal brace/bracket syntax, minimatch returned false against its own glob, + * and findOwnedGlobOverlaps was BLIND to brace/class owned domains — two + * identical `src/{a,b}/**` owners would freeze silently. Defense-in-depth only: + * runtime checkOwnership matches real file paths where minimatch handles braces + * natively; this only restores the freeze-time overlap probe. + */ +function sampleGlobPath(glob) { + return glob + // `{a,b,c}` → first alternative (`a`); `{a}` and empty `{}` collapse too. + .replace(/\{([^},]*)(?:,[^}]*)*\}/g, '$1') + // `[abc]` / `[a-z]` char class → a concrete char the class ACTUALLY matches + // (the range start `a` for `[a-z]`, the first literal for `[abc]`). A fixed + // token like `x` would not be a member of `[ab]`, so minimatch would still + // miss the sample against its own glob and the overlap probe would stay + // blind. Negated classes `[^…]` fall back to `x` (a safe non-member). + .replace(/\[(\^?)([^\]])[^\]]*\]/g, (_, neg, first) => (neg ? 'x' : first)) + .replace(/\*\*\//g, 'x/') // `**/` directory wildcard → one segment + .replace(/\*\*/g, 'x') // bare `**` → one segment + .replace(/\*/g, 'x') // `*` → token (keeps the extension on `*.tsx`) + .replace(/\?/g, 'x'); +} + +/** + * Score a glob's specificity as a comparable tuple. 
A glob with more literal + * path before its first wildcard is more specific; ties break on total literal + * character count (so `src/**` `*.tsx` outranks `src/**` on its `.tsx` literal); + * a final tie-break demotes globs with more `**` (broader reach = less + * specific). This is what lets frontend's `src/ui/**` / `src/**` `*.tsx` + * out-rank backend's `src/**` for a `.tsx` file — encoding detectDomains' + * first-match-wins intent as an order-independent property of the globs. + * + * @returns {[number, number, number]} [literalLeadSegments, literalChars, -doubleStars] + */ +function globSpecificity(glob) { + const segments = glob.split('/'); + let literalLead = 0; + for (const seg of segments) { + if (seg.includes('*') || seg.includes('?')) break; + literalLead++; + } + const literalChars = glob.replace(/[*?/]/g, '').length; + const doubleStars = (glob.match(/\*\*/g) || []).length; + return [literalLead, literalChars, -doubleStars]; +} + +/** Lexicographic compare of two specificity tuples (>0 ⇒ `a` more specific). */ +function compareSpecificity(a, b) { + for (let i = 0; i < a.length; i++) { + if (a[i] !== b[i]) return a[i] - b[i]; + } + return 0; +} + +/** + * The best (highest-specificity) glob in `globs` matching `file`, or null if + * none match. The returned score arbitrates which owned domain owns the file. + */ +function bestMatchingGlob(globs, file) { + let best = null; + for (const glob of globs) { + if (!minimatch(file, glob, { dot: true })) continue; + const score = globSpecificity(glob); + if (best === null || compareSpecificity(score, best.score) > 0) { + best = { glob, score }; + } + } + return best; +} + +/** + * Find pairs of OWNED domains whose globs overlap — i.e. some file is claimed + * by more than one exclusive owner WITH EQUAL specificity (a genuine + * criss-cross, e.g. two `**` domains, or `src/a/**` vs `src/a/**`). 
A + * strict-SUBSET overlap (one glob strictly more specific, like frontend's + * `src/ui/**` inside backend's `src/**`) is NOT a conflict: resolveExclusiveOwner + * arbitrates it to a single owner by specificity, exactly as detectDomains' + * first-match-wins does. sm-002 rejected ANY glob-level overlap, which broke + * freezing the auto-detected default full-stack map (sm-r-001); only an + * equal-specificity tie actually breaches per-domain isolation. We sample a + * concrete path from each owned glob and, for any OTHER owned domain that also + * matches it, compare both domains' best-matching specificity. + * + * @returns {Array<{ a: string, b: string, file: string }>} conflicting pairs + */ +function findOwnedGlobOverlaps(domains) { + const owned = domains.filter(d => d.ownership_class === 'owned'); + const conflicts = []; + const seen = new Set(); + + for (const domain of owned) { + for (const glob of domain.globs) { + const sample = sampleGlobPath(glob); + for (const other of owned) { + if (other.name === domain.name) continue; + const here = bestMatchingGlob(domain.globs, sample); + const there = bestMatchingGlob(other.globs, sample); + if (!here || !there) continue; + // Only an EQUAL-specificity tie is a genuine breach; if one glob is + // strictly more specific, resolveExclusiveOwner arbitrates the file to + // a single owner (same as detectDomains' first-match-wins), so a + // strict-subset overlap (frontend's src/ui/** ⊂ backend's src/**) is + // legal — not a conflict (sm-r-001). 
+ if (compareSpecificity(here.score, there.score) !== 0) continue; + const key = [domain.name, other.name].sort().join('\0') + '\0' + sample; + if (seen.has(key)) continue; + seen.add(key); + conflicts.push({ a: domain.name, b: other.name, file: sample }); + } + } + } + return conflicts; +} + +/** + * Resolve the single exclusive owner of a file by specificity: the owned domain + * whose best-matching glob is the most specific wins, ties broken + * deterministically by domain name. This is ORDER-INDEPENDENT — it does not + * depend on getDomains' ORDER BY name nor on DEFAULT_BUCKETS order — yet it + * matches detectDomains' first-match-wins intent (the earlier, narrower bucket + * claims the file), so `src/ui/App.tsx` resolves to frontend (`src/ui/**`, + * `src/**` *.tsx) over backend (`src/**`). sm-r-001: the prior version iterated + * getDomains' alphabetical order, which DISAGREED with detection order and + * misattributed ownership once the freeze guard was relaxed. Returns the owning + * domain name, or null if no owned domain matches. + */ +function resolveExclusiveOwner(domains, file) { + let winner = null; + for (const d of domains) { + if (d.ownership_class !== 'owned') continue; + const match = bestMatchingGlob(d.globs, file); + if (!match) continue; + if (winner === null) { + winner = { name: d.name, score: match.score }; + continue; + } + const cmp = compareSpecificity(match.score, winner.score); + if (cmp > 0 || (cmp === 0 && d.name < winner.name)) { + winner = { name: d.name, score: match.score }; + } + } + return winner ? winner.name : null; +} + // ── Freeze / Unfreeze ── export function freezeDomains(db, runId) { const domains = getDomains(db, runId); if (domains.length === 0) throw new Error('No domains to freeze'); + // sm-002: reject overlapping OWNED globs at the freeze boundary so the bad + // state never exists. Two exclusive owners that both claim the same file + // defeat per-domain worktree isolation — fail fast and name the conflict. 
+ const overlaps = findOwnedGlobOverlaps(domains); + if (overlaps.length > 0) { + const detail = overlaps + .map(c => `"${c.a}" and "${c.b}" both claim ${c.file}`) + .join('; '); + throw new Error( + `Cannot freeze: overlapping owned domains breach exclusive ownership — ${detail}. ` + + `Make the globs disjoint or reclassify one domain as shared/bridge.` + ); + } + db.prepare('UPDATE domains SET frozen = 1 WHERE run_id = ?').run(runId); // Log freeze event for each domain @@ -251,12 +434,54 @@ export function freezeDomains(db, runId) { } } +/** + * Statuses a wave is in while its agents are dispatched or being collected — + * the window during which the frozen domain map is the live ownership contract. + * Editing globs here would drift the map out from under in-flight agents. + */ +const ACTIVE_WAVE_STATUSES = ['dispatched', 'collecting']; + +/** + * Is there a wave for this run that is still in flight (dispatched/collecting)? + */ +export function hasActiveWave(db, runId) { + const row = db.prepare( + `SELECT COUNT(*) as cnt FROM waves + WHERE run_id = ? AND status IN (${ACTIVE_WAVE_STATUSES.map(() => '?').join(', ')})` + ).get(runId, ...ACTIVE_WAVE_STATUSES); + return row.cnt > 0; +} + /** * Unfreeze domains. Requires a reason — this is a coordinator-authorized action. + * + * sm-001: refuses while a wave is in flight (dispatched/collecting). Unfreezing + * mid-wave lets an operator broaden globs between dispatch and collect, so a + * file that was out-of-domain at dispatch time silently passes ownership at + * collect time and the captured domain_snapshot_id becomes decorative. The + * guard keeps the frozen map authoritative for the duration of a wave. An + * explicit { force: true } (still requiring a reason) is the documented escape + * hatch for a coordinator who has stopped the wave by hand. 
+ * + * @param {Database} db + * @param {string} runId + * @param {string} reason + * @param {object} [opts] + * @param {boolean} [opts.force] — bypass the in-flight-wave guard */ -export function unfreezeDomains(db, runId, reason) { +export function unfreezeDomains(db, runId, reason, opts = {}) { if (!reason) throw new Error('Unfreeze requires a reason'); + if (!opts.force && hasActiveWave(db, runId)) { + throw new Error( + `Cannot unfreeze: a wave is in flight (${ACTIVE_WAVE_STATUSES.join('/')}) for run ${runId}. ` + + `Editing the domain map now would drift ownership out from under the dispatched agents ` + + `(the dispatch-time snapshot would no longer match the live map). ` + + `Collect or abort the wave first, or pass { force: true } with a reason if you have ` + + `already halted the wave by hand.` + ); + } + const domains = getDomains(db, runId); db.prepare('UPDATE domains SET frozen = 0 WHERE run_id = ?').run(runId); @@ -319,9 +544,15 @@ export function checkOwnership(db, runId, domainName, changedFiles) { const violations = []; for (const file of changedFiles) { - const matchesOwn = agentDomain.globs.some(g => minimatch(file, g, { dot: true })); - - if (matchesOwn) { + // sm-002: resolve the file's SINGLE exclusive owner by glob specificity + // (resolveExclusiveOwner, order-independent) rather than testing the agent's + // globs in isolation. With overlapping owned globs (freezeDomains rejects + // equal-specificity ties; this is the defense-in-depth runtime half) a bare + // `globs.some(...)` would let a non-owner pass; here a file owned by some + // OTHER owned domain falls through to the violation path below. 
+ const exclusiveOwner = resolveExclusiveOwner(domains, file); + + if (exclusiveOwner === agentDomain.name) { valid.push({ file, reason: 'matches own domain' }); continue; } @@ -344,14 +575,10 @@ export function checkOwnership(db, runId, domainName, changedFiles) { continue; } - const owner = domains.find(d => - d.ownership_class === 'owned' && - d.globs.some(g => minimatch(file, g, { dot: true })) - ); violations.push({ file, agent_domain: domainName, - actual_owner: owner?.name || 'unassigned', + actual_owner: exclusiveOwner || 'unassigned', }); } diff --git a/packages/dogfood-swarm/lib/error-render.js b/packages/dogfood-swarm/lib/error-render.js index 9fcb2fd..cfc828f 100644 --- a/packages/dogfood-swarm/lib/error-render.js +++ b/packages/dogfood-swarm/lib/error-render.js @@ -60,7 +60,7 @@ function deriveHintForCode(e) { case 'RECORD_SCHEMA_INVALID': return 'inspect the failing record against packages/schemas/src/json/dogfood-record.schema.json and fix the invalid fields before re-ingesting'; case 'AGENT_OUTPUT_SCHEMA_INVALID': - return `inspect ${e.outputPath || 'the agent output JSON'} against scripts/agent-output.schema.json and fix the invalid fields. Required at top level: domain, summary. Audit outputs add findings[]; feature outputs add features[]; amend outputs add fixes[] + files_changed[]. Then \`swarm revalidate ${e.runId ?? ''} --reason "" --domain=${e.domain ?? ''}:${e.outputPath ?? ''} --apply\` to repair the blocked agent_run lawfully (dry-run without --apply)`; + return `inspect ${e.outputPath || 'the agent output JSON'} against packages/schemas/src/json/agent-output.schema.json and fix the invalid fields. Required at top level: domain, summary. Audit outputs add findings[]; feature outputs add features[]; amend outputs add fixes[] + files_changed[]. Then \`swarm revalidate ${e.runId ?? ''} --reason "" --domain=${e.domain ?? ''}:${e.outputPath ?? 
''} --apply\` to repair the blocked agent_run lawfully (dry-run without --apply)`; case 'DUPLICATE_RUN_ID': return 'a run with this id already exists — use a fresh run id or `swarm runs` to inspect the existing one'; // D3B-003 (Wave A2 Stage C): dispatch precondition codes. diff --git a/packages/dogfood-swarm/lib/findings-digest.js b/packages/dogfood-swarm/lib/findings-digest.js index 251fb20..e00514f 100644 --- a/packages/dogfood-swarm/lib/findings-digest.js +++ b/packages/dogfood-swarm/lib/findings-digest.js @@ -265,8 +265,14 @@ export function buildDigest({ runId, waveNumber, swarmsDir = SWARMS_DIR, format, } // Only run as a CLI when invoked directly (not when imported by cli.js). -const isMain = import.meta.url === `file://${process.argv[1].replace(/\\/g, '/')}` || - process.argv[1]?.endsWith('findings-digest.js'); +// fp-005: guard process.argv[1] before .replace. When the module is loaded in +// a context where argv[1] is undefined (e.g. `node --eval` importing it), the +// unconditional `.replace` on the left operand threw a TypeError at module-load +// time — before the safer right-hand optional-chain guard could run. Compute +// the entry path once and short-circuit on it. +const entry = process.argv[1]; +const isMain = (entry && import.meta.url === `file://${entry.replace(/\\/g, '/')}`) || + entry?.endsWith('findings-digest.js'); if (isMain) { const [runId, waveArg] = process.argv.slice(2); diff --git a/packages/dogfood-swarm/lib/fingerprint.js b/packages/dogfood-swarm/lib/fingerprint.js index eac1312..e9223b0 100644 --- a/packages/dogfood-swarm/lib/fingerprint.js +++ b/packages/dogfood-swarm/lib/fingerprint.js @@ -1,16 +1,44 @@ /** * fingerprint.js — Stable finding dedup across waves. 
* - * A fingerprint is: category + rule_id + normalized_path + symbol + normalized_span + * A fingerprint is: category + rule_id + normalized_path + symbol + LOCATION, + * where LOCATION is one of two interchangeable encodings chosen at compute time: + * + * - context-hash (fp-p-005, preferred): when the finding's source file is + * available, LOCATION is a hash of the EDIT-STABLE surrounding source — the + * ~7 lines around finding.line, whitespace-collapsed and line-ending + * normalized. This is the CodeQL `primaryLocationLineHash` design: it + * survives reflow (re-indentation, line-wrapping, code inserted ELSEWHERE in + * the file that shifts the finding's line number) because it hashes the + * surrounding CONTENT, not the line number. Two genuinely-distinct findings + * at different points in the same file see different surrounding source, so + * they get different base fingerprints with NO occurrence-salting needed — + * the base fp is a pure, injective function of the finding's own stable + * content. (Coverity's enclosing-function key is the same idea at function + * granularity; the finding.symbol component already carries the function + * name when the auditor reports one.) + * + * - line-bucket (fallback): when no source is available (synthetic finding, + * unresolvable/deleted file, file-level finding with no line), LOCATION + * degrades to the pre-fp-p-005 10-line bucket. This path is BYTE-FOR-BYTE + * identical to the historical fingerprint, so cross-wave dedup of findings + * that lack readable source is unchanged. The occurrence-salting net + * (disambiguateFingerprints) still covers the residual collisions on this + * path and the rare identical-surrounding-source collision on the other. * * Description is intentionally NOT in the fingerprint. 
The wave 8 self-inspection * (B-BACK-002) caught the original code folding a SHA hash of the description * into every fingerprint, which meant that any wave-to-wave rewording of the * same defect produced a brand-new fingerprint and double-counted it as both * `fixed` (old fp) and `new` (new fp) in the next wave's classifyFindings output. + * The context-hash is the opposite of that mistake: it folds in EDIT-STABLE + * surrounding SOURCE (which a rewording does not touch), never the volatile + * description prose. * - * Spec contract: two findings at the same (category, rule_id, path, symbol, - * line-bucket) are the same finding — even if their description prose differs. + * Spec contract: two findings at the same (category, rule_id, path, symbol) AND + * the same surrounding source are the same finding — even if their description + * prose differs. When source is unavailable the contract degrades to the + * historical (…, line-bucket) key. * * Classification states: * new — first time this fingerprint appears @@ -28,6 +56,8 @@ import { createHash } from 'node:crypto'; +import { logStage } from './log-stage.js'; + /** * Normalize a file path for fingerprinting. * Strips leading ./ and normalizes separators. @@ -50,11 +80,75 @@ function normalizeSpan(lineNumber) { return String(Math.floor(lineNumber / 10) * 10); } +/** + * Number of source lines to include on EACH side of the finding's line when + * building the context snippet. 3 → a 7-line window. Big enough that two + * findings even one line apart get different windows (the window edges differ), + * small enough that the window is meaningfully "around" the finding and that a + * single nearby edit only perturbs a fraction of it. Findings <2 lines apart in + * a file shorter than the window can still share a window — that residual is + * caught by the occurrence-salting net, same as the no-source path. + */ +export const CONTEXT_RADIUS_LINES = 3; + +/** + * Upper bound on the normalized snippet fed to the hash. 
A pure DoS guard + * against a minified/generated file whose 7-line window is megabytes of one + * line; it does not affect distinctiveness for human-readable source (adjacent + * findings' windows start at different lines, so their leading chars already + * differ). Not a "~100 char" semantic cap — the CONTEXT_RADIUS window is what + * defines the meaningful surrounding-source amount. + */ +const CONTEXT_SNIPPET_MAX_CHARS = 4096; + +/** + * Extract the EDIT-STABLE context snippet around a finding's line. + * + * Returns a normalized string (whitespace collapsed, line endings folded) of the + * CONTEXT_RADIUS_LINES window centered on `line`, or null when no meaningful + * snippet can be built (no source, non-positive/out-of-range line, all-blank + * window) — null signals computeFingerprint to fall back to the line bucket. + * + * The normalization is what buys reflow-survival: collapsing every run of + * whitespace to a single space and trimming means re-indentation and + * line-rewrapping leave the snippet (and therefore the fingerprint) unchanged. + * Anchoring on the source CONTENT rather than the line number is what buys + * stability when code inserted elsewhere shifts the finding down the file — the + * surrounding lines move with it, so the same window text is read at the new + * line number. 
+ * + * @param {string} [sourceText] — full text of the finding's file + * @param {number} [line] — 1-based line number of the finding + * @returns {string|null} + */ +export function extractContextSnippet(sourceText, line) { + if (typeof sourceText !== 'string' || sourceText.length === 0) return null; + if (!Number.isInteger(line) || line < 1) return null; + + const lines = sourceText.replace(/\r\n?/g, '\n').split('\n'); + const idx = line - 1; + if (idx >= lines.length) return null; + + const start = Math.max(0, idx - CONTEXT_RADIUS_LINES); + const end = Math.min(lines.length, idx + CONTEXT_RADIUS_LINES + 1); + const normalized = lines.slice(start, end).join('\n').replace(/\s+/g, ' ').trim(); + if (normalized.length === 0) return null; + + return normalized.slice(0, CONTEXT_SNIPPET_MAX_CHARS); +} + /** * Compute a stable fingerprint for a finding. * * Description is NOT folded in — see file header for the contract and the - * B-BACK-002 incident that drove this change. + * B-BACK-002 incident that drove this change. When `options.sourceText` is the + * finding's file content, an edit-stable context-snippet hash replaces the line + * bucket as the LOCATION component (fp-p-005); otherwise LOCATION is the + * historical 10-line bucket and the output is byte-for-byte what it was before. + * + * Pure function: it reads no filesystem. The caller (collect.js) reads the + * source once per file and threads it in via sourceText, which keeps this + * trivially testable and lets the read be cached at the fingerprint step. * * @param {object} finding * @param {string} finding.category — bug, security, quality, ux, etc. 
@@ -62,21 +156,267 @@ function normalizeSpan(lineNumber) { * @param {string} [finding.file] — file path * @param {string} [finding.symbol] — function/class/variable name * @param {number} [finding.line] — line number + * @param {object} [options] + * @param {string} [options.sourceText] — full text of finding.file, when available * @returns {string} — hex fingerprint */ -export function computeFingerprint(finding) { +export function computeFingerprint(finding, options = {}) { + const snippet = extractContextSnippet(options.sourceText, finding.line); + // The context component carries a short prefix so it can never alias a bare + // bucket string; the fallback is left bare so its raw input — and therefore + // the fingerprint of a no-source finding — is byte-identical to the + // pre-fp-p-005 scheme (the cross-wave-dedup backward-compat guarantee). + const location = snippet !== null + ? `ctx:${createHash('sha256').update(snippet).digest('hex').slice(0, 16)}` + : normalizeSpan(finding.line); + const parts = [ finding.category || 'unknown', finding.rule_id || '', normalizePath(finding.file), (finding.symbol || '').toLowerCase(), - normalizeSpan(finding.line), + location, ]; const raw = parts.join('|'); return createHash('sha256').update(raw).digest('hex').slice(0, 24); } +/** + * Disambiguate within-wave fingerprint collisions — PRIOR-AWARE and + * ORDER-INDEPENDENT. + * + * Post-fp-p-005 this is a SAFETY NET (see the closing "Status" note): with the + * source available, computeFingerprint's context-snippet hash already makes + * distinct findings carry distinct base fps, so no collision groups form and + * nothing below runs. It still fires on the no-source fallback path and on the + * rare identical-surrounding-source case. The history below is why it exists and + * what it does WHEN it fires. + * + * fp-002 (the original marquee fix). 
The base fingerprint deliberately excludes + * the description (B-BACK-002 contract), so before fp-p-005 two genuinely- + * distinct findings that shared a coarse key — same (category, rule_id, path, + * symbol, 10-line bucket) but different prose — collapsed to the SAME base + * fingerprint. That is correct for cross-wave dedup, but WITHIN a single wave + * both land in `result.new` and + * upsertFindings then tries to INSERT two rows under one fingerprint AND one + * derived finding_id, violating UNIQUE(run_id, fingerprint) / + * UNIQUE(run_id, finding_id) and aborting the whole collect (0 rows persisted). + * Reproduced live: two README findings 6 lines apart, no symbol, same bucket. + * + * fp-r-001 (the regression this revision repairs). The original fp-002 fix + * assigned the occurrence index purely from within-wave ARRAY ORDER and was + * blind to prior-wave state. collect.js iterates agents/domains in a + * non-deterministic order, so when a wave-1 SINGLETON gained a new coarse-key + * sibling in wave 2, the bare-fp slot was awarded by this-wave sort order — not + * to the member that already owned the bare fp in prior state. The genuinely-NEW + * sibling could be handed the bare fp, dedupe against the prior finding's row → + * classified `recurring` and SILENTLY SWALLOWED, while the original finding's + * stable finding_id (the `swarm approve --ids` / D3B-006 handle) was hijacked. + * Order-dependent corruption with silent data loss. + * + * Design (grounded in CodeQL primaryLocationLineHash, Semgrep match_based_id's + * occurrence index, and the WER/Sentry "default-expand, fold-never-drop" + * principle): group findings by base fingerprint. + * + * - SINGLETONS (group size 1) keep the bare fingerprint UNCHANGED — byte-for- + * byte backward-compat for the common case and the cross-wave dedup + * invariant (B-BACK-002). 
+ * - For each COLLISION group (size > 1) the bare-fp keeper is chosen + * DETERMINISTICALLY, never by array order: + * · If the group's base fp EXISTS in `priorFingerprints`, the keeper is + * the member whose content best matches the prior row (description + * first, then file/line). That member keeps the BARE fp so it dedupes + * to its own prior row as `recurring` and KEEPS its finding_id — this + * eliminates the fp-r-001 id hijack. + * · If the base fp is NOT in prior, the keeper is the deterministically- + * FIRST member under a STABLE content sort (description, then file, then + * line) — not array order. + * - Every NON-keeper is salted by a PURE function of its OWN content + * (normalized description + normalized file + line) AND a within-group + * ordinal from a DETERMINISTIC sort (stableContentSort) over the group's + * non-keepers — NOT an array index. Order-independent and stable across + * waves while the member stays a non-keeper. Two non-keepers get distinct + * salts even when their descriptions are EQUAL or both EMPTY (fp-p-001): the + * deterministic ordinal breaks the tie that description-only salting left + * open (where the second member collided on the same salted fingerprint / + * finding_id and was silently dropped by upsertFindings' INSERT OR IGNORE). + * Genuinely-distinct findings sharing a coarse key thus get distinct + * fingerprints + finding_ids without folding the volatile description into + * the BASE fingerprint. + * + * Status after fp-p-005: the deeper fix this caveat once deferred is now in + * computeFingerprint — when the source is available, the edit-stable + * context-snippet hash makes the BASE fingerprint injective, so genuinely- + * distinct findings no longer share a base fp and this function sees only + * size-1 groups (every member is its own keeper, nothing is salted). 
It is now + * a SAFETY NET, not the primary mechanism, and still earns its keep for the two + * cases the context hash cannot cover: (1) the no-source fallback path, where + * LOCATION is still the coarse 10-line bucket and distinct findings can collide; + * (2) the rare case of two distinct findings whose surrounding source is + * byte-identical (e.g. duplicated boilerplate), which hash to the same context. + * + * Residual (honest): in those net-firing cases, a member that transitions + * between keeper(bare) and non-keeper(salted) across waves — when a collision + * group grows or shrinks — can still show a ONE-TIME new/recurring churn. This + * is bounded, never data loss, never a crash. In the common (source-available) + * path it no longer occurs at all, because no collision groups form. + * + * @param {Array} findings — current-wave findings; each may already carry a + * `fingerprint` (set by collect.js via computeFingerprint). When absent it is + * computed here. Array order is NOT consulted for keeper/salt selection. + * @param {Map} [priorFingerprints] — fingerprint → prior-wave + * row (from buildPriorMap). Used only to pick the bare-fp keeper for a + * collision group whose base fp already exists in prior state. Defaults to an + * empty Map so direct callers/tests need not pass it. + * @returns {Array} new array of findings with a disambiguated `fingerprint`, + * in the SAME order as the input. Inputs are not mutated. + */ +export function disambiguateFingerprints(findings, priorFingerprints = new Map()) { + const groups = new Map(); + for (const finding of findings) { + const base = finding.fingerprint || computeFingerprint(finding); + if (!groups.has(base)) groups.set(base, []); + groups.get(base).push(finding); + } + + // Resolve the disambiguated fingerprint for each (base, member) pair up + // front, keyed by object identity, so the final map() can preserve input + // order without re-deriving the keeper per element. 
+ const resolved = new Map(); + for (const [base, members] of groups) { + if (members.length === 1) { + resolved.set(members[0], base); + continue; + } + + const keeper = chooseBareKeeper(base, members, priorFingerprints.get(base)); + + // Assign each non-keeper a within-group ordinal from a DETERMINISTIC sort + // (stableContentSort: description, then file, then line) rather than array + // order. Folding this ordinal into the salt makes two non-keepers with + // equal-or-empty descriptions still get distinct salts (fp-p-001), while + // staying order-independent and stable across waves: the same group always + // sorts the same way regardless of wave-mate iteration order. + const nonKeepers = members.filter((m) => m !== keeper); + const ordinalOf = new Map(); + stableContentSort(nonKeepers).forEach((m, i) => ordinalOf.set(m, i)); + + for (const member of members) { + if (member === keeper) { + resolved.set(member, base); + continue; + } + const salted = saltByContent(base, member, ordinalOf.get(member)); + logStage('fingerprint_disambiguated', { + component: 'dogfood-swarm', + base_fingerprint: base, + total_occurrences: members.length, + salted_fingerprint: salted, + keeper_is_prior_match: priorFingerprints.has(base), + file: member.file || member.file_path || null, + category: member.category || null, + }); + resolved.set(member, salted); + } + } + + return findings.map((finding) => { + const fp = resolved.get(finding); + return finding.fingerprint === fp ? finding : { ...finding, fingerprint: fp }; + }); +} + +/** + * Collapse a description to a stable discriminator: lowercase + whitespace + * collapsed + trimmed. Used both to match a collision member against its prior + * row and to derive a member's content salt. Order-independent by construction. + */ +function normalizeDescription(description) { + return String(description || '').toLowerCase().replace(/\s+/g, ' ').trim(); +} + +/** + * Choose the member of a collision group that keeps the BARE fingerprint. 
+ * + * - With a prior row: prefer the member whose normalized description equals the + * prior row's, else whose (file,line) matches; ties and no-match fall through + * to the stable content sort so the choice is always deterministic. + * - Without a prior row: the first member under the stable content sort. + */ +function chooseBareKeeper(base, members, priorRow) { + if (priorRow) { + const priorDesc = normalizeDescription(priorRow.description); + const descMatches = members.filter((m) => normalizeDescription(m.description) === priorDesc); + if (descMatches.length === 1) return descMatches[0]; + const pool = descMatches.length > 1 ? descMatches : members; + + const priorPath = normalizePath(priorRow.file_path || priorRow.file || ''); + const priorLine = priorRow.line_number ?? priorRow.line ?? null; + const locMatches = pool.filter((m) => + normalizePath(m.file) === priorPath + && (m.line ?? null) === priorLine); + if (locMatches.length === 1) return locMatches[0]; + + return stableContentSort(locMatches.length > 0 ? locMatches : pool)[0]; + } + return stableContentSort(members)[0]; +} + +/** + * Deterministic content order for a collision group: normalized description, + * then normalized path, then line. Independent of input array order, so the + * same group yields the same keeper regardless of wave-mate iteration order. + */ +function stableContentSort(members) { + return [...members].sort((a, b) => { + const da = normalizeDescription(a.description); + const db = normalizeDescription(b.description); + if (da !== db) return da < db ? -1 : 1; + const pa = normalizePath(a.file); + const pb = normalizePath(b.file); + if (pa !== pb) return pa < pb ? -1 : 1; + return (a.line ?? -1) - (b.line ?? -1); + }); +} + +/** + * Salt a non-keeper member by a PURE function of its own content + a + * DETERMINISTIC within-group ordinal — NOT an array index. + * + * fp-p-001: the description alone is NOT a sufficient discriminator. 
Two + * non-keeper members of the same coarse-key group with equal or both-empty + * descriptions (normalizeDescription('') === normalizeDescription(null) === '') + * hashed to the SAME salt → the same salted fingerprint → the same derived + * finding_id → upsertFindings' INSERT OR IGNORE silently dropped the second + * genuinely-distinct finding. The fix folds three more discriminators into the + * salt source so equal/empty-description members still diverge: + * - the member's normalized file, + * - the member's line, + * - a within-group `ordinal` assigned by stableContentSort (description, then + * file, then line) over the group's non-keepers. + * + * The ordinal is the index under the deterministic sort, not the input array + * order, so the salt stays stable across waves while the member remains a + * non-keeper. Two members that tie on description AND file AND line still get + * distinct salts, but only the stable sort's input-order fallback separates them. + * Singletons and the bare-fp keeper never reach here, so their fingerprints are + * untouched. + */ +function saltByContent(base, member, ordinal) { + const discriminator = [ + normalizeDescription(member.description), + normalizePath(member.file), + member.line ?? '', + ordinal, + ].join('|'); + const contentHash = createHash('sha256').update(discriminator).digest('hex'); + return createHash('sha256') + .update(`${base}|d:${contentHash}`) + .digest('hex') + .slice(0, 24); +} + /** * Classify findings against prior wave state. 
* @@ -101,7 +441,20 @@ export function classifyFindings(currentFindings, priorFingerprints, scope = nul const currentSet = new Set(); const result = { new: [], recurring: [], fixed: [], unverified: [] }; - for (const finding of currentFindings) { + // fp-002 Part 1 (fp-r-001 repair): salt the NON-keeper members of any + // within-wave base-fingerprint collision so two genuinely-distinct findings + // sharing a coarse key get distinct fingerprints (and distinct derived + // finding_ids in upsertFindings) instead of colliding on UNIQUE(run_id, + // fingerprint). The prior map is passed through so the bare-fp keeper for a + // collision group whose base fp already exists in prior state is the member + // that MATCHES the prior row — not whoever sorts first in this wave's array. + // That keeps the original finding on its original finding_id (no D3B-006 + // handle hijack) and inserts the genuinely-new sibling as its own row (no + // silent swallow). Singletons keep the bare fingerprint, so cross-wave dedup + // and backward-compat are untouched. See disambiguateFingerprints. + const disambiguated = disambiguateFingerprints(currentFindings, priorFingerprints); + + for (const finding of disambiguated) { const fp = finding.fingerprint || computeFingerprint(finding); currentSet.add(fp); @@ -186,8 +539,21 @@ export function buildPriorMap(db, runId) { * @returns {{ inserted: number, updated: number, fixed: number, unverified: number }} */ export function upsertFindings(db, runId, waveId, classified) { + // fp-002 Part 2 (safety net): INSERT OR IGNORE so a residual within-wave + // collision on EITHER unique index — UNIQUE(run_id, finding_id) or + // UNIQUE(run_id, fingerprint) — skips the offending row instead of throwing + // and aborting the whole collect transaction (which rolled back ALL findings + // for the wave, 0 rows persisted). 
Part 1 (disambiguateFingerprints) already + // splits coarse-key collisions into distinct fingerprints/ids, so in practice + // a skip here only fires for a TRUE distinct-fingerprint / same-8-hex-prefix + // collision (astronomically rare, the D3B-006 case). We choose log+skip over + // abort: a never-abort collect is the invariant (an operator can re-report a + // dropped finding next wave; a fully-rolled-back wave loses everything). The + // skip is emitted as a structured logStage event below so it is observable, + // not silent — preserving the loud-not-silent spirit of D3B-006 without its + // collect-aborting blast radius. const insertFinding = db.prepare(` - INSERT INTO findings (run_id, finding_id, fingerprint, severity, category, + INSERT OR IGNORE INTO findings (run_id, finding_id, fingerprint, severity, category, file_path, line_number, symbol, description, recommendation, status, first_seen_wave, last_seen_wave) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'new', ?, ?) @@ -237,6 +603,23 @@ export function upsertFindings(db, runId, waveId, classified) { f.file || null, f.line || null, f.symbol || null, f.description, f.recommendation || null, waveId, waveId ); + if (result.changes === 0) { + // INSERT OR IGNORE skipped this row on a unique-index conflict. With + // Part 1's disambiguation this should be unreachable for coarse-key + // collisions; a skip here is the rare true (run_id, finding_id) / + // (run_id, fingerprint) collision. Log it loud (operator can re-report + // next wave) rather than aborting the wave. 
+ logStage('finding_insert_skipped', { + component: 'dogfood-swarm', + run_id: runId, + wave_id: waveId, + finding_id: fid, + fingerprint: f.fingerprint, + file: f.file || null, + reason: 'unique_conflict_on_insert_or_ignore', + }); + continue; + } insertEvent.run(result.lastInsertRowid, 'reported', waveId, null); inserted++; } diff --git a/packages/dogfood-swarm/lib/git-touched-files.js b/packages/dogfood-swarm/lib/git-touched-files.js index bc7c836..7a20e26 100644 --- a/packages/dogfood-swarm/lib/git-touched-files.js +++ b/packages/dogfood-swarm/lib/git-touched-files.js @@ -23,8 +23,7 @@ import { execFileSync } from 'node:child_process'; /** * Compute the actual touched-files set in a worktree relative to HEAD, * including untracked files (a new `.github/CODEOWNERS` or similar would be - * invisible to `git diff --name-only HEAD` but visible to `git status - * --porcelain`). + * invisible to `git diff` but visible to `git status --porcelain`). * * @param {string} repoPath — absolute path to the worktree * @returns {{ modified: string[], added: string[], deleted: string[], untracked: string[], all: string[] }} @@ -38,15 +37,22 @@ export function getActualTouchedFiles(repoPath) { const deleted = []; const untracked = []; - // `git diff --name-only HEAD` covers staged + unstaged tracked changes - // (modified + added + deleted). `git status --porcelain` distinguishes - // adds from modifications and catches untracked files — needed so that a - // new file like `.github/CODEOWNERS` lands in the touched set even before - // any `git add`. We use porcelain for the categorization and diff as a - // belt-and-suspenders cross-check on tracked changes. + // `git status --porcelain` distinguishes adds from modifications and catches + // untracked files in one pass — needed so that a new file like + // `.github/CODEOWNERS` lands in the touched set even before any `git add`. + // + // fp-003: the `-z` flag is load-bearing, not a perf tweak. 
Without it git + // C-quotes any path containing a space or non-ASCII byte (default + // core.quotePath=true) — `naïve.js` → `"na\303\257ve.js"` — and the old + // `path.replace(/\\/g,'/')` then mangled the octal escapes into + // `na/303/257ve.js`, silently corrupting the touched set the ownership gate + // depends on. Under `-z` git emits raw NUL-terminated bytes with NO quoting + // or escaping (spaces and UTF-8 preserved verbatim), so we split on \0 and + // do NOT run any backslash→slash normalization on the paths (on POSIX a backslash + // is a legal filename character, so rewriting it would corrupt the path). `-z` paths already use forward slashes. let porcelain; try { - porcelain = execFileSync('git', ['status', '--porcelain', '--untracked-files=normal'], { + porcelain = execFileSync('git', ['status', '--porcelain', '-z', '--untracked-files=normal'], { cwd: repoPath, encoding: 'utf-8', }); @@ -57,24 +63,27 @@ export function getActualTouchedFiles(repoPath) { return { modified: [], added: [], deleted: [], untracked: [], all: [], unavailable: true }; } - // Porcelain format: `XY path` (X = index status, Y = worktree status). - // Renames are `R old -> new`; we want the destination. - for (const line of porcelain.split('\n')) { - if (!line.trim()) continue; - const xy = line.slice(0, 2); - const rest = line.slice(3); - let path; - if (xy[0] === 'R' || xy[1] === 'R') { - const arrow = rest.indexOf(' -> '); - path = arrow >= 0 ? rest.slice(arrow + 4) : rest; - } else { - path = rest; + // Under `-z` each record is `XY path\0` (X = index status, Y = + // worktree status). A rename/copy record is special: the status field is + // followed by the DESTINATION path in this record, then the SOURCE path as a + // SEPARATE trailing NUL field (there is no ` -> ` separator under `-z`). We + // want the destination, so we consume — and discard — that extra source + // field when we see an R/C status. 
+ const fields = porcelain.split('\0'); + for (let i = 0; i < fields.length; i++) { + const record = fields[i]; + if (!record) continue; // trailing empty field after the final NUL + const xy = record.slice(0, 2); + const path = record.slice(3); // skip the single space after XY + if (xy[0] === 'R' || xy[1] === 'R' || xy[0] === 'C' || xy[1] === 'C') { + // The next field is the rename/copy SOURCE — consume it so it is not + // misread as its own record. The destination (`path`) is what we keep. + i++; } - path = path.replace(/\\/g, '/'); if (xy === '??') untracked.push(path); else if (xy[0] === 'D' || xy[1] === 'D') deleted.push(path); else if (xy[0] === 'A' || xy[1] === 'A') added.push(path); - else modified.push(path); + else modified.push(path); // includes R/C — destination is a touched tracked file } const seen = new Set(); diff --git a/packages/dogfood-swarm/lib/output-schema.js b/packages/dogfood-swarm/lib/output-schema.js index 6f1d4cb..41c74ac 100644 --- a/packages/dogfood-swarm/lib/output-schema.js +++ b/packages/dogfood-swarm/lib/output-schema.js @@ -21,9 +21,9 @@ const SEVERITY_ENUM = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW']; // (which covers test-correctness drift). // Underscore_form is preserved as-is — historical record (CLAUDE.md // "Working with the legacy" doctrine: don't normalize for aesthetics). -// scripts/agent-output.schema.json is a sibling cross-fix-dep — ci-tooling -// owns that file in wave 28; this enum and the schema's `category` $def -// must stay in lockstep. +// packages/schemas/src/json/agent-output.schema.json (shipped via +// @dogfood-lab/schemas) is a sibling cross-fix-dep; this enum and the +// schema's `category` $def must stay in lockstep. 
const AUDIT_CATEGORIES = [ 'bug', 'security', 'quality', 'types', 'tests', 'docs', 'defensive', 'observability', 'degradation', 'future-proofing', diff --git a/packages/dogfood-swarm/lib/persist/dogfood-bridge.js b/packages/dogfood-swarm/lib/persist/dogfood-bridge.js index 30e3dfe..0c14428 100644 --- a/packages/dogfood-swarm/lib/persist/dogfood-bridge.js +++ b/packages/dogfood-swarm/lib/persist/dogfood-bridge.js @@ -84,8 +84,17 @@ export function buildDogfoodSubmission(exportData, overallVerdict) { }); } + // fp-p-003: the dogfood ingest is duplicate-guarded on + // (run_id, repo, timing.finished_at) (packages/ingest/run.js). Falling back + // to `new Date()` for an INCOMPLETE run (run.completed null) made finished_at + // wall-clock-at-persist, so it shifted on every invocation, the duplicate + // probe never matched, and each re-persist minted a NEW corpus record (which + // ingest.yml commits to main). Derive both timestamps from a STABLE run + // column — run.created is non-null for any persisted run — so the dedup key + // is deterministic for a given run regardless of completion state, making + // `swarm persist --ingest` idempotent on re-run even mid-flight. const startedAt = run.created || new Date().toISOString(); - const finishedAt = run.completed || new Date().toISOString(); + const finishedAt = run.completed || run.created || new Date().toISOString(); return buildSubmission({ repo: run.repo, diff --git a/packages/dogfood-swarm/lib/templates.js b/packages/dogfood-swarm/lib/templates.js index 81769b4..b63aa77 100644 --- a/packages/dogfood-swarm/lib/templates.js +++ b/packages/dogfood-swarm/lib/templates.js @@ -5,8 +5,9 @@ * Templates embed: repo path, domain scope, file list, phase lens, output format. * * Authority discipline (Stage B Item 1 + Item 4): - * The output-shape contract appended to every prompt is DERIVED FROM - * scripts/agent-output.schema.json — not hand-typed parallel to it. 
The + * The output-shape contract appended to every prompt is DERIVED FROM the + * canonical agent-output schema (@dogfood-lab/schemas) — not hand-typed + * parallel to it. The * worked-example JSON below the contract block stays as a worked example, * but the schema fragment is the load-bearing reference. Same root-cause * group as Item 4: brief-vs-frozen-state parallel authority. Pact-style @@ -16,20 +17,15 @@ import { readFileSync } from 'node:fs'; import { createRequire } from 'node:module'; -import { dirname, join } from 'node:path'; -import { fileURLToPath } from 'node:url'; - -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); - -// Resolve agent-output.schema.json the SAME way validate-agent-output.js does -// (packages/dogfood-swarm/lib → ../../../scripts). If/when the schema graduates -// to @dogfood-lab/schemas, both modules switch in lockstep to a -// createRequire('@dogfood-lab/schemas/...') resolution. createRequire is -// imported here so future refactors can pivot without re-plumbing. -// eslint-disable-next-line no-unused-vars -const _require = createRequire(import.meta.url); -const SCHEMA_PATH = join(__dirname, '..', '..', '..', 'scripts', 'agent-output.schema.json'); + +// Resolve the agent-output schema the SAME way validate-agent-output.js does: +// createRequire on @dogfood-lab/schemas's `./json/*` subpath export (fp-p-006 +// consolidation — one source of truth shipped via the dependency, no +// package-local copy to keep in sync). The prompt-builder reads the canonical +// schema so the contract block injected into every dispatched prompt cannot +// drift from the collect-time validator. 
+const require = createRequire(import.meta.url); +const SCHEMA_PATH = require.resolve('@dogfood-lab/schemas/json/agent-output.schema.json'); let _canonicalSchema = null; function getCanonicalSchema() { diff --git a/packages/dogfood-swarm/lib/validate-agent-output.js b/packages/dogfood-swarm/lib/validate-agent-output.js index 079ef03..07d2614 100644 --- a/packages/dogfood-swarm/lib/validate-agent-output.js +++ b/packages/dogfood-swarm/lib/validate-agent-output.js @@ -2,7 +2,7 @@ * validate-agent-output.js — Ajv-backed live validator for agent outputs. * * F-252713-017 (Phase 7 wave 1 → wave 2 wiring): the wave-1 ci-tooling agent - * built scripts/agent-output.schema.json + the schema-conformance handler. + * built the agent-output schema + the schema-conformance handler. * That handler validates fixture JSONs at CI time. This module closes Class * #11 (multi-occurrence fix completeness) by running the SAME schema inside * collect.js BEFORE upsertFindings — live agent outputs are now rejected at @@ -17,7 +17,7 @@ * Why a typed error: the renderTopLevelError seam in lib/error-render.js * pattern-matches on `.code`. AGENT_OUTPUT_SCHEMA_INVALID gets the same * actionable-hint treatment that RECORD_SCHEMA_INVALID does — operator sees - * "Next: inspect the failing output against scripts/agent-output.schema.json". + * "Next: inspect the failing output against packages/schemas/src/json/agent-output.schema.json". * * Phase routing: the canonical envelope only requires { domain, summary }. * The phase-specific inner shape is governed by oneOf-style $defs in the @@ -33,19 +33,22 @@ import Ajv2020 from 'ajv/dist/2020.js'; import addFormats from 'ajv-formats'; import { readFileSync } from 'node:fs'; import { createRequire } from 'node:module'; -import { dirname, join } from 'node:path'; -import { fileURLToPath } from 'node:url'; -// The agent-output schema lives at scripts/agent-output.schema.json (repo -// root). 
It is NOT yet packaged through @dogfood-lab/schemas — backend wave 2 -// keeps it where the wave-1 ci-tooling agent put it, and resolves the path -// relative to this module. If the schema graduates to the schemas package -// later, this resolution becomes a createRequire('@dogfood-lab/schemas/...') -// call mirroring validate-record.js. +// fp-p-006 (deferred from the self-audit, now landed): the agent-output schema +// is the single source of truth, shipped via @dogfood-lab/schemas and resolved +// the same way the eight contract schemas are — createRequire on the `./json/*` +// subpath export (CLAUDE.md rule #5; mirrors packages/ingest/validate-record.js). +// This replaced the prior package-local copy (packages/dogfood-swarm/schema/) +// plus its byte-equality drift guard (meta-amendA-schema-packaging.test.js): +// one file, shipped through the dependency, with no copy to keep in sync. +// +// The schema is NOT one of the eight payload schemas registered in the schemas +// package's `validatePayload` map, so it is compiled here with a local Ajv2020 +// instance rather than the canonical compileSchema seam — it is a swarm output +// envelope, not a contract-spine schema. That local `new Ajv` is allowlisted +// in scripts/check-validator-cache-singleton.test.mjs for exactly this reason. const require = createRequire(import.meta.url); -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -const SCHEMA_PATH = join(__dirname, '..', '..', '..', 'scripts', 'agent-output.schema.json'); +const SCHEMA_PATH = require.resolve('@dogfood-lab/schemas/json/agent-output.schema.json'); let _validator = null; let _loadError = null; @@ -94,7 +97,7 @@ export class AgentOutputValidationError extends Error { } /** - * Validate an agent JSON output against scripts/agent-output.schema.json. + * Validate an agent JSON output against packages/schemas/src/json/agent-output.schema.json. 
* * @param {object} output — parsed JSON agent output * @param {object} [opts] diff --git a/packages/dogfood-swarm/lib/verify-classifier-v2.js b/packages/dogfood-swarm/lib/verify-classifier-v2.js index a8a3b45..562c2c9 100644 --- a/packages/dogfood-swarm/lib/verify-classifier-v2.js +++ b/packages/dogfood-swarm/lib/verify-classifier-v2.js @@ -111,16 +111,26 @@ function resolveFilePath(repoRoot, filePath) { * Build a regex that matches the anchor we expect to find at a given symbol * or in a description. Preference: explicit symbol, then first identifier * token of length ≥4 in description. + * + * Returns `{ regex, fromSymbol }` (or null if no anchor is derivable). + * `fromSymbol` distinguishes a reliable code-identifier anchor (the agent + * captured a real `symbol`) from an unreliable prose-derived anchor (the + * lead identifier-like word of the prose description). The distinction is + * load-bearing: a MISS on a code-identifier anchor genuinely means the + * symbol is gone (fix landed), but a MISS on a prose token means nothing — + * a description like "Race condition in writer" has no reason for its lead + * word "Race" to appear in source, so a miss must NOT resolve to `verified`. + * The caller routes a prose-anchor miss to `unverifiable` instead. */ function buildAnchorRegex({ symbol, description }) { const sym = (symbol || '').trim(); if (sym && /^[A-Za-z_][\w$]*$/.test(sym)) { - return new RegExp(`\\b${escapeRegex(sym)}\\b`); + return { regex: new RegExp(`\\b${escapeRegex(sym)}\\b`), fromSymbol: true }; } const desc = String(description || ''); const match = desc.match(/\b([A-Za-z_][\w$]{3,})\b/); if (match) { - return new RegExp(`\\b${escapeRegex(match[1])}\\b`); + return { regex: new RegExp(`\\b${escapeRegex(match[1])}\\b`), fromSymbol: false }; } return null; } @@ -144,13 +154,24 @@ function defaultReadLines(absolutePath) { /** * Compute the bucket [start, end] inclusive that contains `recordedLine`. 
If no line was recorded (0 / null), scan the entire file. + * + * The window is SYMMETRIC about the bucket's start: it spans the finding's + * own bucket plus the full adjacent LOWER bucket. An asymmetric (upward-only) + * window let a still-present symbol that drifted into the adjacent LOWER + * bucket escape the search and misclassify as `verified` (ve-003). Example: + * a finding recorded at line 42 (bucket [40,50]) whose symbol drifted to + * line 38 must still be found — line 38 is below the recorded bucket, so the + * window must reach down a full bucket, not just by one line. */ function bucketForLine(recordedLine, totalLines) { if (!recordedLine || recordedLine <= 0) { return { start: 1, end: totalLines }; } const bucket = Math.floor(recordedLine / FINGERPRINT_BUCKET) * FINGERPRINT_BUCKET; - return { start: Math.max(1, bucket), end: bucket + FINGERPRINT_BUCKET }; + return { + start: Math.max(1, bucket - FINGERPRINT_BUCKET), + end: bucket + FINGERPRINT_BUCKET, + }; } /** @@ -200,25 +221,37 @@ function classifyByAnchor(finding, repoRoot, readLinesFn) { } const recordedLine = Number(finding.line_number) || 0; const { start, end } = bucketForLine(recordedLine, lines.length); - const matchedLine = findAnchorInBucket(lines, anchor, start, end); + const matchedLine = findAnchorInBucket(lines, anchor.regex, start, end); if (matchedLine === null) { + // ve-001: a MISS only supports `verified` when the anchor is a reliable + // code identifier (real `symbol`). A prose-derived anchor's lead token + // is not expected to appear in source, so its absence proves nothing — + // route to `unverifiable` rather than declaring the fix landed. 
+ if (!anchor.fromSymbol) { + return { + classification: 'unverifiable', + evidence: `no code-grade anchor; description token /${anchor.regex.source}/ is prose and absent at ${finding.file_path}:${start}-${end}, cannot prove the fix landed`, + reason: 'prose_anchor_miss', + matchedLine: null, + }; + } return { classification: 'verified', - evidence: `anchor /${anchor.source}/ no longer present at ${finding.file_path}:${start}-${end}`, + evidence: `anchor /${anchor.regex.source}/ no longer present at ${finding.file_path}:${start}-${end}`, matchedLine: null, }; } if (recordedLine > 0 && Math.abs(matchedLine - recordedLine) <= EXACT_LINE_TOLERANCE) { return { classification: 'claimed-but-still-present', - evidence: `anchor /${anchor.source}/ still at ${finding.file_path}:${matchedLine} (recorded line ${recordedLine}); fix never landed`, + evidence: `anchor /${anchor.regex.source}/ still at ${finding.file_path}:${matchedLine} (recorded line ${recordedLine}); fix never landed`, matchedLine, }; } return { classification: 'regressed', - evidence: `anchor /${anchor.source}/ reappeared at ${finding.file_path}:${matchedLine} (recorded line ${recordedLine || 'unspecified'}); looks reverted within the same bucket`, + evidence: `anchor /${anchor.regex.source}/ reappeared at ${finding.file_path}:${matchedLine} (recorded line ${recordedLine || 'unspecified'}); looks reverted within the same bucket`, matchedLine, }; } @@ -265,20 +298,20 @@ function classifyByCrossRef(crossRef, repoRoot, readLinesFn) { } const recordedLine = Number(crossRef.line) || 0; const { start, end } = bucketForLine(recordedLine, lines.length); - const matchedLine = findAnchorInBucket(lines, anchor, start, end); + const matchedLine = findAnchorInBucket(lines, anchor.regex, start, end); if (matchedLine !== null) { return { applicable: true, classification: 'verified', - evidence: `cross_ref anchor /${anchor.source}/ landed at ${crossRef.file}:${matchedLine}; consumer-side fix verified`, + evidence: `cross_ref 
anchor /${anchor.regex.source}/ landed at ${crossRef.file}:${matchedLine}; consumer-side fix verified`, matchedLine, }; } return { applicable: true, classification: 'verified-no-cross-ref-anchor', - evidence: `cross_ref anchor /${anchor.source}/ not present at ${crossRef.file}:${start}-${end}; falling through to primary anchor`, + evidence: `cross_ref anchor /${anchor.regex.source}/ not present at ${crossRef.file}:${start}-${end}; falling through to primary anchor`, }; } diff --git a/packages/dogfood-swarm/lib/verify-fixed.js b/packages/dogfood-swarm/lib/verify-fixed.js index 3db07ef..8a05065 100644 --- a/packages/dogfood-swarm/lib/verify-fixed.js +++ b/packages/dogfood-swarm/lib/verify-fixed.js @@ -143,6 +143,14 @@ function resolveFilePath(repoRoot, filePath) { * If neither produces a usable anchor, we return null and the finding * classifies as `unverifiable`. * + * Returns `{ regex, fromSymbol }`. `fromSymbol` records whether the anchor + * came from a reliable code identifier (the captured `symbol`) or from an + * unreliable prose token of the description. The caller uses this to decide + * what an anchor MISS means: a missing code identifier means the fix landed + * (`verified`), but a missing prose token proves nothing (route to + * `unverifiable`) — the description's lead word ("Race", "Unbounded") has no + * reason to appear in source, so its absence is not evidence (ve-001). + * * Why prefer symbol: it survives prose rewordings the way fingerprint.js * already trusts. 
Description tokens are a fallback for findings that * never recorded a symbol (legacy waves; security findings without an @@ -151,7 +159,7 @@ function resolveFilePath(repoRoot, filePath) { function buildAnchorRegex(finding) { const symbol = (finding.symbol || '').trim(); if (symbol && /^[A-Za-z_][\w$]*$/.test(symbol)) { - return new RegExp(`\\b${escapeRegex(symbol)}\\b`); + return { regex: new RegExp(`\\b${escapeRegex(symbol)}\\b`), fromSymbol: true }; } // Description fallback — first identifier-like word with at least 4 @@ -160,7 +168,7 @@ function buildAnchorRegex(finding) { const desc = String(finding.description || ''); const match = desc.match(/\b([A-Za-z_][\w$]{3,})\b/); if (match) { - return new RegExp(`\\b${escapeRegex(match[1])}\\b`); + return { regex: new RegExp(`\\b${escapeRegex(match[1])}\\b`), fromSymbol: false }; } return null; @@ -226,14 +234,18 @@ export function classifyFixedFinding(finding, repoRoot, opts = {}) { const recordedLine = Number(finding.line_number) || 0; // Bucket window — matches the fingerprint normalizeSpan() granularity. - // We scan the bucket the recorded line falls into PLUS one line of - // overlap on each side so a finding recorded at line 19 doesn't miss - // anchors that landed at line 21 (different fingerprint bucket but - // operationally adjacent). + // We scan the bucket the recorded line falls into PLUS the full adjacent + // LOWER bucket (FINGERPRINT_BUCKET lines either side of the bucket start), so + // a still-present symbol that drifted down a bucket is still found. An + // upward-only window let a symbol that drifted into the adjacent LOWER + // bucket escape the search and misclassify as `verified` (ve-003). E.g. a + // finding recorded at line 42 (bucket [40,50]) whose symbol drifted to + // line 38 must still match: line 38 sits in the adjacent lower bucket, so + // the window reaches down by FINGERPRINT_BUCKET, not one line. 
let bucketStart, bucketEnd; if (recordedLine > 0) { const bucket = Math.floor(recordedLine / FINGERPRINT_BUCKET) * FINGERPRINT_BUCKET; - bucketStart = Math.max(1, bucket); + bucketStart = Math.max(1, bucket - FINGERPRINT_BUCKET); bucketEnd = bucket + FINGERPRINT_BUCKET; } else { // No line recorded: scan the whole file. If anchor exists anywhere, @@ -247,16 +259,26 @@ export function classifyFixedFinding(finding, repoRoot, opts = {}) { let matchedLine = null; for (let lineNo = bucketStart; lineNo <= Math.min(bucketEnd, lines.length); lineNo++) { const text = lines[lineNo - 1]; - if (typeof text === 'string' && anchor.test(text)) { + if (typeof text === 'string' && anchor.regex.test(text)) { matchedLine = lineNo; break; } } if (matchedLine === null) { + // ve-001: only a missing code-identifier anchor (real `symbol`) proves + // the fix landed. A missing prose token from the description proves + // nothing — its lead word was never expected in source — so route a + // prose-anchor miss to `unverifiable` rather than `verified`. 
+ if (!anchor.fromSymbol) { + return { + classification: 'unverifiable', + evidence: `no code-grade anchor; description token /${anchor.regex.source}/ is prose and absent at ${finding.file_path}:${bucketStart}-${bucketEnd}, cannot prove the fix landed`, + }; + } return { classification: 'verified', - evidence: `anchor /${anchor.source}/ no longer present at ${finding.file_path}:${bucketStart}-${bucketEnd}`, + evidence: `anchor /${anchor.regex.source}/ no longer present at ${finding.file_path}:${bucketStart}-${bucketEnd}`, }; } @@ -265,13 +287,13 @@ export function classifyFixedFinding(finding, repoRoot, opts = {}) { if (recordedLine > 0 && Math.abs(matchedLine - recordedLine) <= EXACT_LINE_TOLERANCE) { return { classification: 'claimed-but-still-present', - evidence: `anchor /${anchor.source}/ still at ${finding.file_path}:${matchedLine} (recorded line ${recordedLine}); fix never landed`, + evidence: `anchor /${anchor.regex.source}/ still at ${finding.file_path}:${matchedLine} (recorded line ${recordedLine}); fix never landed`, }; } return { classification: 'regressed', - evidence: `anchor /${anchor.source}/ reappeared at ${finding.file_path}:${matchedLine} (recorded line ${recordedLine || 'unspecified'}); looks reverted within the same bucket`, + evidence: `anchor /${anchor.regex.source}/ reappeared at ${finding.file_path}:${matchedLine} (recorded line ${recordedLine || 'unspecified'}); looks reverted within the same bucket`, }; } diff --git a/packages/dogfood-swarm/lib/verify/adapters/node.js b/packages/dogfood-swarm/lib/verify/adapters/node.js index baad21e..51ddc55 100644 --- a/packages/dogfood-swarm/lib/verify/adapters/node.js +++ b/packages/dogfood-swarm/lib/verify/adapters/node.js @@ -96,7 +96,29 @@ function commands(overrides = {}) { function run(repoPath, overrides) { const steps = commands(overrides); - return runSteps(repoPath, steps, { continueOnError: true }); + const result = runSteps(repoPath, steps, { continueOnError: true }); + + // ve-004: `npm 
test --if-present` exits 0 and runs nothing when the repo + // has no `test` script, which is indistinguishable from a real pass at the + // wave gate. Detect the no-test case and refuse to report it as a verified + // pass: the test step exists but exercised nothing, so there is no positive + // evidence the fix works. Only downgrade when the operator did NOT supply + // an explicit test override (an override means they deliberately chose the + // test command, and `tests_ran` already tracks whether it produced output). + const usingDefaultTest = !(overrides && overrides.test); + if (usingDefaultTest && result.verdict === 'pass' && !result.tests_ran) { + const { evidence } = probe(repoPath); + if (evidence && evidence.hasTest === false) { + return { + ...result, + verdict: 'no_tests', + no_tests: true, + reason: 'no `test` script — `npm test --if-present` ran zero tests; not a verified pass', + }; + } + } + + return result; } export const nodeAdapter = { probe, commands, run }; diff --git a/packages/dogfood-swarm/lib/verify/registry.js b/packages/dogfood-swarm/lib/verify/registry.js index 17370ba..5025533 100644 --- a/packages/dogfood-swarm/lib/verify/registry.js +++ b/packages/dogfood-swarm/lib/verify/registry.js @@ -20,6 +20,25 @@ const ADAPTERS = new Map([ ['rust', rustAdapter], ]); +/** + * Deterministic tie-break priority for equal-confidence probes (ve-p-006). + * + * A polyglot repo can legitimately score equally under two adapters (e.g. a + * Rust core with a Node tooling layer). Sorting by score alone left the winner + * to the Map's insertion order — an implicit, undocumented precedence that + * grows more fragile as adapters are added. This makes the precedence explicit + * and testable: HIGHER wins on a score tie. 
The ordering reflects marker + * exclusivity — `Cargo.toml` is a near-certain Rust signal, `pyproject.toml` + * is Python-specific, and `package.json` is the most common file to appear + * incidentally in a non-Node repo (build tooling, JS assets), so it loses ties. + * An operator can always force a choice with `--adapter`. + */ +const ADAPTER_TIE_BREAK = new Map([ + ['rust', 3], + ['python', 2], + ['node', 1], +]); + /** * Probe all adapters and rank by score. * @@ -36,7 +55,14 @@ export function probeAll(repoPath) { results.push({ name, score: 0, reason: `Probe error: ${e.message}`, evidence: {} }); } } - return results.sort((a, b) => b.score - a.score); + // ve-p-006: stable, documented tie-break on equal score. Without it, the + // winner of a score tie was whatever the engine's stable sort left first + // given Map insertion order — a latent surprise as adapters grow. + return results.sort( + (a, b) => + b.score - a.score || + (ADAPTER_TIE_BREAK.get(b.name) ?? 0) - (ADAPTER_TIE_BREAK.get(a.name) ?? 0) + ); } /** diff --git a/packages/dogfood-swarm/lib/verify/runner.js b/packages/dogfood-swarm/lib/verify/runner.js index 968196d..304daa1 100644 --- a/packages/dogfood-swarm/lib/verify/runner.js +++ b/packages/dogfood-swarm/lib/verify/runner.js @@ -7,32 +7,68 @@ import { execFileSync } from 'node:child_process'; +/** + * Per-step wall-clock budget. A step that exceeds this is killed and tagged + * `timed_out` (ve-p-005) rather than reported as an ordinary fast failure. + */ +const STEP_TIMEOUT_MS = 300000; // 5 min per step + +/** + * Per-stream output cap (ve-p-007). Bounds a single step's stdout/stderr so a + * huge test log cannot bloat the persisted verification_receipts row. Defined + * once and referenced by both truncate sites; the per-receipt implied ceiling + * is ~4 steps × (stdout + stderr) × 8000 chars = ~64,000 chars. Because the cap is + * defined once, raising it changes both sites together; see bounded-json-read.js's MAX_AGENT_OUTPUT_BYTES for the + * same pattern. 
+ */ +const MAX_STEP_OUTPUT_CHARS = 8000; + +/** + * Recognizes the shell's "executable not found" message across platforms. + * With `shell: true` a missing `step.cmd` does NOT surface as an ENOENT error + * object — the shell itself runs and exits non-zero with one of these strings + * on stderr (verified empirically per ve-p-001). The ENOENT object only + * appears on the rare `shell: false` path, handled separately in the catch. + */ +const TOOL_NOT_FOUND_STDERR = /is not recognized as an internal or external command|: command not found|: not found/i; + /** * Run a single verification step. * - * Uses `execFileSync` argv-array form to mirror the v1.2.0 F-W1-BACK-003 - * doctrine (`packages/dogfood-swarm/lib/worktree.js`, - * `packages/dogfood-swarm/lib/domains.js`): callers pass `step.cmd` + - * `step.args` as a structured pair so a future adapter author who lands a - * user-influenced `step.args` cannot re-introduce shell metacharacter - * interpretation in the argument vector. Current call sites - * (`adapters/{node,python,rust}.js`) all pass hardcoded safe args, so this - * is defense-in-depth — but it keeps the doctrine consistent across the - * package. + * Uses `execFileSync` with the `(step.cmd, step.args[])` argv-array form to + * mirror the v1.2.0 F-W1-BACK-003 doctrine + * (`packages/dogfood-swarm/lib/worktree.js`, + * `packages/dogfood-swarm/lib/domains.js`) and keep the import + call shape + * consistent across the package. * - * `shell: true` is retained because production adapters invoke `npm`/`npx` - * (Windows `.cmd` wrappers that need PATHEXT resolution from a shell). The - * argv-array shape is still the load-bearing security signal: it forces a - * future contributor to think in terms of `(cmd, args[])` rather than string - * concatenation, and it keeps this file's import + call shape identical to - * the worktree.js doctrine. + * SECURITY — the actual guarantee (ve-006). 
`shell: true` is retained + * because production adapters invoke `npm`/`npx` (Windows `.cmd` wrappers + * that need PATHEXT resolution from a shell). With `shell: true`, Node joins + * `file` + `args` into a single string and hands it to the shell, so the + * argv array does NOT neutralize shell metacharacters: every arg IS subject + * to shell interpretation. This code path is safe ONLY because every live + * `step.args` is a hardcoded literal (verified across + * `adapters/{node,python,rust}.js` and the commandOverrides callers); the + * one piece of untrusted, target-repo-influenced data is `cwd` (repoPath), + * which never reaches the command string. * - * @param {string} repoPath — cwd for the command - * @param {object} step — { name: string, cmd: string, args?: string[], optional?: boolean } + * Therefore, before landing any `step.cmd`/`step.args` value derived from + * target-repo or user input, you MUST either drop `shell: true` (and resolve + * the npm/npx `.cmd` shim explicitly on win32) or shell-escape the value. + * The argv-array shape is a readability/consistency convention here, not a + * sanitizer — do not treat it as one. + * + * @param {string} repoPath — cwd for the command (the only untrusted input; + * it is passed as `cwd`, never concatenated into the command string) + * @param {object} step — { name, cmd, args?, optional?, timeoutMs? }. `timeoutMs` + * overrides the default per-step budget (ve-p-005) so build-heavy repos + * (large Rust workspaces) can be given more room without a misleading + * "failure" that is really "didn't finish in 5 min". * @returns {object} — StepResult */ export function runStep(repoPath, step) { const cmdArgs = step.args || []; + const timeoutMs = step.timeoutMs ?? STEP_TIMEOUT_MS; // `command` is the human-readable display string returned to callers and // asserted by callers/tests; it is NOT what executes. `execFileSync` // receives `step.cmd` + the argv array separately below. 
@@ -44,30 +80,64 @@ export function runStep(repoPath, step) { cwd: repoPath, encoding: 'utf-8', stdio: ['pipe', 'pipe', 'pipe'], - timeout: 300000, // 5 min per step + timeout: timeoutMs, env: { ...process.env, FORCE_COLOR: '0', NO_COLOR: '1' }, shell: true, }); + const out = truncate(stdout, MAX_STEP_OUTPUT_CHARS); return { name: step.name, command: fullCmd, exit_code: 0, passed: true, duration_ms: Date.now() - start, - stdout: truncate(stdout, 8000), + stdout: out.text, stderr: '', + truncated: out.truncated, optional: !!step.optional, }; } catch (e) { + const duration_ms = Date.now() - start; + const stderrRaw = e.stderr || ''; + + // ve-p-001: a MISSING build tool on PATH is an EXPECTED operating + // condition (we audit arbitrary external repos on heterogeneous hosts), + // not a real fix failure. Distinguish it so runSteps can degrade to a + // `tool_missing` verdict instead of a misleading FAIL. ENOENT is the + // shell:false shape; the stderr regex catches the shell:true shape, where + // the shell runs and reports "not recognized"/"command not found" itself. + const toolMissing = e.code === 'ENOENT' || TOOL_NOT_FOUND_STDERR.test(stderrRaw); + + // ve-p-005: a 5-min hang must be distinguishable from a fast exit-1 fail. + // Under shell:true, Node's SIGTERM hits the wrapping shell, not the real + // grandchild, so e.killed/e.signal are erased — the reliable signal is the + // elapsed wall clock reaching the budget. e.signal is still checked first + // for the shell:false future where Node sets it on the real child. + const timedOut = !toolMissing && + (e.signal === 'SIGTERM' || e.killed === true || + duration_ms >= timeoutMs - 50); + + const stdout = truncate(e.stdout || '', MAX_STEP_OUTPUT_CHARS); + const stderr = truncate(stderrRaw, MAX_STEP_OUTPUT_CHARS); return { name: step.name, command: fullCmd, - exit_code: e.status ??
1, + // 127 is the conventional "command not found" exit; use it as a sentinel + // so the persisted exit_code itself flags the tool-missing case. + exit_code: toolMissing ? (e.status ?? 127) : (e.status ?? 1), passed: false, - duration_ms: Date.now() - start, - stdout: truncate(e.stdout || '', 8000), - stderr: truncate(e.stderr || '', 8000), + duration_ms, + stdout: stdout.text, + stderr: stderr.text, + truncated: stdout.truncated || stderr.truncated, + tool_missing: toolMissing, + timed_out: timedOut, + reason: toolMissing + ? `tool \`${step.cmd}\` not found on PATH` + : timedOut + ? `step \`${step.name}\` timed out after ${timeoutMs}ms` + : undefined, optional: !!step.optional, }; } @@ -81,7 +151,11 @@ export function runStep(repoPath, step) { * @param {Array} steps * @param {object} [opts] * @param {boolean} [opts.continueOnError] — keep going after required step failure - * @returns {object} — { steps: StepResult[], verdict, duration_ms, test_count? } + * @returns {object} — { steps: StepResult[], verdict, duration_ms, test_count, + * tests_ran, timed_out, truncated, reason? }. `verdict` is one of + * `pass | fail | skip | tool_missing`; adapters may further refine `pass` to + * `no_tests`. `reason` (a human-readable string) is set on `skip`, `tool_missing`, + * and timeout-caused `fail` verdicts; absent on `pass` and on any other `fail`. */ export function runSteps(repoPath, steps, opts = {}) { const results = []; @@ -107,14 +181,64 @@ export function runSteps(repoPath, steps, opts = {}) { } const requiredResults = results.filter(r => !r.optional); - const allPassed = requiredResults.every(r => r.passed); + const requiredFailures = requiredResults.filter(r => !r.passed); - return { + // ve-005: an empty required-step set must NOT be an automatic `pass`. + // `[].every(...)` is vacuously true, so without this guard a run where + // every step was optional, skipped, or filtered away would report `pass` + // and advance the wave to `verified` having required nothing.
Mirror the + // registry's `verdict: 'skip'` shape so callers/status already handle it. + // + // ve-p-001: a required step that failed SOLELY because its tool is missing + // from PATH degrades to a distinct `tool_missing` verdict, not `fail` — the + // wave correctly stays un-advanced WITHOUT lying that the code broke. A real + // (non-tool-missing) required failure dominates: it is the more actionable, + // honest signal, so `fail` wins when both are present. + let verdict; + let reason; + if (requiredResults.length === 0) { + verdict = 'skip'; + reason = 'no required steps ran — nothing to verify in this environment'; + } else if (requiredFailures.length === 0) { + verdict = 'pass'; + } else if (requiredFailures.some(r => !r.tool_missing)) { + verdict = 'fail'; + const timedOut = requiredFailures.find(r => r.timed_out); + if (timedOut) reason = timedOut.reason; + } else { + // Every required failure was a missing tool. The display `command` string + // begins with the executable name, so its first token names the tool. + verdict = 'tool_missing'; + const missing = [...new Set(requiredFailures.map(r => r.command?.split(' ')[0]))] + .filter(Boolean); + reason = `required tool(s) not found on PATH (${missing.join(', ')}) — cannot verify in this environment`; + } + + // ve-004: distinguish "tests actually ran" from "the test step was a + // no-op". `npm test --if-present` exits 0 with no `test` script, which + // looks identical to a real pass at the wave gate. A `test` step that + // passed but yielded no recognizable test count produced nothing we can + // call a verified pass — surface `tests_ran: false` so the caller can + // treat it as not-verified instead of a clean PASS. + const testStep = results.find(r => r.name === 'test'); + const testsRan = testStep ? (testStep.passed && testCount != null) : false; + + const out = { steps: results, - verdict: allPassed ? 
'pass' : 'fail', + verdict, duration_ms: Date.now() - totalStart, test_count: testCount, + tests_ran: testsRan, + // ve-p-005 / ve-p-007: aggregate the per-step signals so the display layer + // can flag a hang or a truncated log at the run level without re-walking + // every step. + timed_out: results.some(r => r.timed_out), + truncated: results.some(r => r.truncated), }; + // Only attach `reason` when there is one to report — keeps the `pass` shape + // unchanged and lets the display layer treat its presence as "explain why". + if (reason) out.reason = reason; + return out; } /** @@ -140,8 +264,17 @@ function extractTestCount(stdout) { return null; } +/** + * Bound a captured stream. Returns `{ text, truncated }` so the caller can + * surface a top-level `truncated` flag (ve-p-007) — the operator otherwise has + * no signal that the displayed/persisted log is partial. The "… (truncated)" + * marker is appended in-band as before for anyone reading the raw text. + */ function truncate(str, max) { - if (!str) return ''; - if (str.length <= max) return str; - return str.slice(0, max) + `\n... (truncated, ${str.length} total chars)`; + if (!str) return { text: '', truncated: false }; + if (str.length <= max) return { text: str, truncated: false }; + return { + text: str.slice(0, max) + `\n... 
(truncated, ${str.length} total chars)`, + truncated: true, + }; } diff --git a/packages/dogfood-swarm/lib/worktree.js b/packages/dogfood-swarm/lib/worktree.js index f06b840..57dc041 100644 --- a/packages/dogfood-swarm/lib/worktree.js +++ b/packages/dogfood-swarm/lib/worktree.js @@ -18,6 +18,7 @@ import { execSync, execFileSync } from 'node:child_process'; import { existsSync, mkdirSync, readFileSync, appendFileSync, rmSync } from 'node:fs'; import { join } from 'node:path'; +import { logStage } from './log-stage.js'; /** * Validate that a domain name is safe to interpolate into both a filesystem @@ -78,8 +79,25 @@ export function createWorktree(repoPath, opts) { // Remove stale worktree if it exists. Argv-array form bypasses the shell // so wtDir cannot trigger metacharacter interpretation. + // + // sm-p-003: this runs ONLY when wtDir is present and the `--force` remove + // FAILED, yet the `worktree add` below will then fail too. Bare-swallowing + // the precursor failure left the operator with only the downstream `add` + // error and no breadcrumb that cleanup failed first. Log to the NDJSON + // stream (no control-flow change — still best-effort) so the precursor is + // greppable when the subsequent add blows up. Same forensic channel D3B-012 + // gave applyTimeoutPolicy's NULL started_at skip. if (existsSync(wtDir)) { - try { gitArgs(repoPath, ['worktree', 'remove', wtDir, '--force']); } catch { /* */ } + try { + gitArgs(repoPath, ['worktree', 'remove', wtDir, '--force']); + } catch (e) { + logStage('worktree_stale_remove_failed', { + component: 'dogfood-swarm', + wtDir, + branch, + err: e.message, + }); + } } // Delete stale branch if it exists. @@ -91,34 +109,6 @@ export function createWorktree(repoPath, opts) { return { worktreePath: wtDir, branch }; } -/** - * Get the diff (changed files) from a worktree relative to its branch point. 
- * - * @param {string} repoPath — main repo path - * @param {string} worktreePath - * @returns {string[]} — list of changed file paths (relative to repo root) - */ -export function getWorktreeDiff(repoPath, worktreePath) { - try { - const output = git(worktreePath, 'diff --name-only HEAD~1..HEAD'); - return output.trim().split('\n').filter(Boolean); - } catch { - // No commits yet — check for uncommitted changes - const output = git(worktreePath, 'diff --name-only HEAD'); - return output.trim().split('\n').filter(Boolean); - } -} - -/** - * Get all uncommitted changes in a worktree. - */ -export function getWorktreeChanges(worktreePath) { - const staged = git(worktreePath, 'diff --name-only --cached').trim().split('\n').filter(Boolean); - const unstaged = git(worktreePath, 'diff --name-only').trim().split('\n').filter(Boolean); - const untracked = git(worktreePath, 'ls-files --others --exclude-standard').trim().split('\n').filter(Boolean); - return { staged, unstaged, untracked, all: [...new Set([...staged, ...unstaged, ...untracked])] }; -} - /** * Merge a worktree branch back into the main branch. * @@ -197,8 +187,18 @@ export function cleanupAllWorktrees(repoPath) { for (const wt of worktrees) { removeWorktree(repoPath, wt.path, wt.branch); } - // Prune stale worktree references - try { git(repoPath, 'worktree prune'); } catch { /* */ } + // Prune stale worktree references. sm-p-003: best-effort, but a swallowed + // prune failure left no breadcrumb at all; log it so a stuck prune is + // greppable. Control flow is unchanged — cleanup is advisory. 
+ try { + git(repoPath, 'worktree prune'); + } catch (e) { + logStage('worktree_prune_failed', { + component: 'dogfood-swarm', + repoPath, + err: e.message, + }); + } return worktrees.length; } diff --git a/packages/dogfood-swarm/meta-amendA-cli-commands.test.js b/packages/dogfood-swarm/meta-amendA-cli-commands.test.js new file mode 100644 index 0000000..7000f8f --- /dev/null +++ b/packages/dogfood-swarm/meta-amendA-cli-commands.test.js @@ -0,0 +1,518 @@ +/** + * meta-amendA-cli-commands.test.js — regression coverage for the cli-commands + * amend agent (dogfood swarm-1780390764-7dab, Stage A). + * + * One co-located test file pinning the five CONFIRMED findings this agent + * fixed. Each describe block names its finding id and the failure mode it + * locks down. These tests were written to FAIL against the pre-fix code and + * PASS after the surgical fix (Pattern #10 proof-gate discipline). + * + * cli-001 (HIGH) — commands/rewind.js: DB abort scoped by run, not global. + * ve-002 (HIGH) — cli.js parseVerifyFlags: --threshold space-form NaN guard. + * cli-002 (MED) — cli.js cmdApprove: no duplicate `approved` events on re-run. + * cli-003 (LOW) — cli.js: `--reason --flag` is a MISSING reason, not '--flag'. + * +1 (MED) — commands/status.js: collect-crash half-state breadcrumb. + * + * CORDONED-TEST DISCIPLINE (inherited from rewind.test.js): + * The cli-001 rewind tests run `git reset --hard` + DB writes. Every such + * test creates a fresh mkdtemp fixture git repo + fixture control-plane.db + * and passes both explicitly. No test references process.cwd(), the live + * swarms DB, or the real repo tree. A HEAD-guard sentinel (last describe + * block) asserts the actual repo HEAD is unchanged after the whole suite. + * + * EXTENSION (2026-06-02): the cli-003 block below also pins cmdAdvance's + * `--override --reason` parser (cli.js ~912) — the FOURTH `--reason` site. 
The + * original amend agent declined to file it ("no irreversible side effect"), but + * a `--`-prefixed value still pollutes the promotion/override audit record, so + * the same guard was applied and is regression-pinned by the two `advance` tests. + */ + +import { describe, it, before } from 'node:test'; +import assert from 'node:assert/strict'; +import { execFileSync, spawnSync } from 'node:child_process'; +import { mkdtempSync, rmSync, writeFileSync } from 'node:fs'; +import { join, dirname, resolve as resolvePath } from 'node:path'; +import { tmpdir } from 'node:os'; +import { fileURLToPath } from 'node:url'; + +import { openDb, closeDb } from './db/connection.js'; +import { rewind } from './commands/rewind.js'; +import { parseVerifyFlags } from './cli.js'; +import { status } from './commands/status.js'; + +const __filename = fileURLToPath(import.meta.url); +const __dirname = dirname(__filename); +const CLI_PATH = join(__dirname, 'cli.js'); + +// Cordoned HEAD guard — record the real repo HEAD by walking up from this +// file (no live-repo path literal). Asserted unchanged at the end. 
+const ACTUAL_REPO_ROOT = resolvePath(__dirname, '..', '..'); +let ACTUAL_HEAD_AT_SUITE_START = null; +function recordActualHead() { + try { + return execFileSync('git', ['rev-parse', '--verify', 'HEAD^{commit}'], { + cwd: ACTUAL_REPO_ROOT, encoding: 'utf-8', stdio: ['ignore', 'pipe', 'pipe'], + }).trim(); + } catch { return null; } +} +before(() => { ACTUAL_HEAD_AT_SUITE_START = recordActualHead(); }); + +function teardown(...dirs) { + for (const d of dirs) { + if (d) { try { rmSync(d, { recursive: true, force: true }); } catch { /* */ } } + } +} + +// ────────────────────────────────────────────────────────────── +// cli-001 — rewind DB abort is scoped to the run owning the cwd +// ────────────────────────────────────────────────────────────── + +/** + * Build a fixture git repo for run A (the tree rewind resets) plus a shared + * fixture control-plane.db holding TWO runs: + * - run A: local_path = the fixture git tree, with an in-flight wave. + * - run B: local_path = a SEPARATE temp dir (NOT a git repo, never reset), + * with its own in-flight wave + agent_runs. + * Rewinding run A must NOT touch run B's live rows. 
+ */ +function setupTwoRunFixture() { + const treeA = mkdtempSync(join(tmpdir(), 'meta-rewind-A-')); + const dirB = mkdtempSync(join(tmpdir(), 'meta-rewind-B-')); + + execFileSync('git', ['init', '-q', '-b', 'main', treeA], { stdio: ['ignore', 'pipe', 'pipe'] }); + execFileSync('git', ['config', 'user.email', 'fixture@example.test'], { cwd: treeA }); + execFileSync('git', ['config', 'user.name', 'Fixture'], { cwd: treeA }); + execFileSync('git', ['config', 'commit.gpgsign', 'false'], { cwd: treeA }); + + writeFileSync(join(treeA, '.gitignore'), '*.db\n*.db-wal\n*.db-shm\n', 'utf-8'); + writeFileSync(join(treeA, 'README.md'), '# fixture A\n', 'utf-8'); + const env = { ...process.env, GIT_AUTHOR_NAME: 'Fixture', GIT_AUTHOR_EMAIL: 'fixture@example.test', GIT_COMMITTER_NAME: 'Fixture', GIT_COMMITTER_EMAIL: 'fixture@example.test' }; + execFileSync('git', ['add', '.'], { cwd: treeA }); + execFileSync('git', ['commit', '-q', '-m', 'A: initial'], { cwd: treeA, env }); + execFileSync('git', ['tag', 'swarm-save-A-1'], { cwd: treeA }); + writeFileSync(join(treeA, 'README.md'), '# fixture A\n\nsecond\n', 'utf-8'); + execFileSync('git', ['add', '.'], { cwd: treeA }); + execFileSync('git', ['commit', '-q', '-m', 'A: second'], { cwd: treeA, env }); + + const dbPath = join(treeA, 'control-plane.db'); + const db = openDb(dbPath); + + // Run A — owns treeA. 
+ db.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run('runA', 'org/a', treeA, 'a'.repeat(40)); + db.prepare(`INSERT INTO domains (run_id, name, globs, ownership_class, frozen) VALUES ('runA','backend','["src/**"]','owned',1)`).run(); + const wA = db.prepare(`INSERT INTO waves (run_id, phase, wave_number, status, domain_snapshot_id) VALUES ('runA','health-audit-a',1,'dispatched','snapA')`).run(); + const waveA = Number(wA.lastInsertRowid); + const domA = db.prepare(`SELECT id FROM domains WHERE run_id='runA'`).get().id; + const arA = db.prepare(`INSERT INTO agent_runs (wave_id, domain_id, status) VALUES (?, ?, 'dispatched')`).run(waveA, domA); + const agentA = Number(arA.lastInsertRowid); + + // Run B — owns a DIFFERENT dir, in-flight, must be left untouched. + db.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run('runB', 'org/b', dirB, 'b'.repeat(40)); + db.prepare(`INSERT INTO domains (run_id, name, globs, ownership_class, frozen) VALUES ('runB','backend','["src/**"]','owned',1)`).run(); + const wB = db.prepare(`INSERT INTO waves (run_id, phase, wave_number, status, domain_snapshot_id) VALUES ('runB','health-audit-a',1,'dispatched','snapB')`).run(); + const waveB = Number(wB.lastInsertRowid); + const domB = db.prepare(`SELECT id FROM domains WHERE run_id='runB'`).get().id; + const arB1 = db.prepare(`INSERT INTO agent_runs (wave_id, domain_id, status) VALUES (?, ?, 'dispatched')`).run(waveB, domB); + const arB2 = db.prepare(`INSERT INTO agent_runs (wave_id, domain_id, status) VALUES (?, ?, 'invalid_output')`).run(waveB, domB); + const agentB1 = Number(arB1.lastInsertRowid); + const agentB2 = Number(arB2.lastInsertRowid); + + closeDb(dbPath); + return { treeA, dirB, dbPath, waveA, agentA, waveB, agentB1, agentB2 }; +} + +describe('cli-001: rewind aborts only the run owning the cwd', () => { + it('dry-run plan lists only run A, never run B', () => { + const fx = setupTwoRunFixture(); + 
try { + const r = rewind({ + savePointTag: 'swarm-save-A-1', + reason: 'scope check', + cwd: fx.treeA, + dbPath: fx.dbPath, + }); + // Only run A's single wave + agent are in the plan. + assert.equal(r.planned_waves.length, 1); + assert.equal(r.planned_waves[0].run_id, 'runA'); + assert.equal(r.planned_agent_runs.length, 1); + // The report names the scoped run id for operator visibility. + assert.deepEqual(r.scopedRunIds, ['runA']); + // Run B's wave/agents (1 wave, 2 agents) are NOT in the plan. + assert.ok(r.planned_waves.every(w => w.run_id === 'runA')); + } finally { + teardown(fx.treeA, fx.dirB); + } + }); + + it('--apply aborts run A in-flight rows but leaves run B untouched', () => { + const fx = setupTwoRunFixture(); + try { + const r = rewind({ + savePointTag: 'swarm-save-A-1', + reason: 'apply scope check', + cwd: fx.treeA, + dbPath: fx.dbPath, + apply: true, + }); + assert.equal(r.git_reset_done, true); + assert.equal(r.db_transaction_done, true); + + const db = openDb(fx.dbPath); + // Run A: wave + agent now aborted_for_rewind. + assert.equal(db.prepare('SELECT status FROM waves WHERE id = ?').get(fx.waveA).status, 'aborted_for_rewind'); + assert.equal(db.prepare('SELECT status FROM agent_runs WHERE id = ?').get(fx.agentA).status, 'aborted_for_rewind'); + + // Run B: every live row UNCHANGED — this is the cli-001 regression. + assert.equal(db.prepare('SELECT status FROM waves WHERE id = ?').get(fx.waveB).status, 'dispatched', + 'cli-001 regression: rewinding run A drove run B\'s wave to aborted_for_rewind'); + assert.equal(db.prepare('SELECT status FROM agent_runs WHERE id = ?').get(fx.agentB1).status, 'dispatched', + 'cli-001 regression: run B agent_run was aborted by run A rewind'); + assert.equal(db.prepare('SELECT status FROM agent_runs WHERE id = ?').get(fx.agentB2).status, 'invalid_output', + 'cli-001 regression: run B blocked agent_run was aborted by run A rewind'); + + // Run B left NO state-event rows from this rewind. 
+ const bWaveEvents = db.prepare('SELECT COUNT(*) AS n FROM wave_state_events WHERE wave_id = ?').get(fx.waveB).n; + assert.equal(bWaveEvents, 0, 'run B must have no rewind audit rows'); + closeDb(fx.dbPath); + } finally { + teardown(fx.treeA, fx.dirB); + } + }); + + it('preserved counts are scoped to the cwd run (do not count run B terminal rows)', () => { + const fx = setupTwoRunFixture(); + try { + // Give run B a terminal (complete) agent so an UNSCOPED preserved count + // would be inflated by it. + const db = openDb(fx.dbPath); + const domB = db.prepare(`SELECT id FROM domains WHERE run_id='runB'`).get().id; + db.prepare(`INSERT INTO agent_runs (wave_id, domain_id, status) VALUES (?, ?, 'complete')`).run(fx.waveB, domB); + closeDb(fx.dbPath); + + const r = rewind({ + savePointTag: 'swarm-save-A-1', + reason: 'preserved scope', + cwd: fx.treeA, + dbPath: fx.dbPath, + }); + // Run A has zero terminal rows; the count must be 0, not run B's 1. + assert.equal(r.preserved_agent_run_count, 0, + 'preserved count leaked a run-B terminal agent_run'); + } finally { + teardown(fx.treeA, fx.dirB); + } + }); +}); + +// ────────────────────────────────────────────────────────────── +// ve-002 — parseVerifyFlags rejects a non-numeric --threshold (space form) +// ────────────────────────────────────────────────────────────── + +describe('ve-002: parseVerifyFlags --threshold space-form guard', () => { + it('throws (does NOT yield NaN) for a non-numeric threshold', () => { + assert.throws( + () => parseVerifyFlags(['run', '--threshold', 'foo']), + (e) => e.code === 'CLI_INVALID_THRESHOLD' && /non-negative integer/.test(e.message), + 'parseVerifyFlags must reject --threshold foo, not silently produce NaN', + ); + }); + + it('throws for a negative threshold', () => { + assert.throws( + () => parseVerifyFlags(['run', '--threshold', '-1']), + (e) => e.code === 'CLI_INVALID_THRESHOLD', + ); + }); + + it('throws for a partially-numeric threshold (parseInt would have accepted 3)', () 
=> { + assert.throws( + () => parseVerifyFlags(['run', '--threshold', '3abc']), + (e) => e.code === 'CLI_INVALID_THRESHOLD', + ); + }); + + it('accepts a valid integer space-form threshold', () => { + const { threshold } = parseVerifyFlags(['run', '--threshold', '5']); + assert.equal(threshold, 5); + }); + + it('accepts the equals-form and bare default unchanged', () => { + assert.equal(parseVerifyFlags(['run', '--threshold=3']).threshold, 3); + assert.equal(parseVerifyFlags(['run']).threshold, 0); + }); + + it('a trailing --threshold with no value keeps the default 0 (no throw)', () => { + // args[tIdx+1] === undefined → not a typo'd value, just an omitted one. + assert.equal(parseVerifyFlags(['run', '--threshold']).threshold, 0); + }); + + it('CLI seam: `swarm verify-fixed --threshold foo` exits non-zero, not a green pass', () => { + const tmp = mkdtempSync(join(tmpdir(), 'meta-thr-')); + const dbPath = join(tmp, 'control-plane.db'); + try { + const db = openDb(dbPath); + db.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run('r1', 'org/r', tmp, 'a'.repeat(40)); + closeDb(dbPath); + + const r = spawnSync(process.execPath, [CLI_PATH, 'verify-fixed', 'r1', '--threshold', 'foo'], { + encoding: 'utf-8', cwd: __dirname, env: { ...process.env, SWARM_DB: dbPath }, + }); + assert.notEqual(r.status, 0, + 'a typo\'d threshold must NOT exit 0 (that would silently disable the gate)'); + assert.match(r.stderr, /CLI_INVALID_THRESHOLD|non-negative integer/); + } finally { + teardown(tmp); + } + }); +}); + +// ────────────────────────────────────────────────────────────── +// cli-002 — cmdApprove does not duplicate `approved` events on re-run +// ────────────────────────────────────────────────────────────── + +describe('cli-002: re-running `swarm approve --all` does not double events', () => { + function setupApproveFixture() { + const tmp = mkdtempSync(join(tmpdir(), 'meta-approve-')); + const dbPath = join(tmp, 'control-plane.db'); + 
const db = openDb(dbPath); + db.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run('r1', 'org/r', tmp, 'a'.repeat(40)); + const w = db.prepare(`INSERT INTO waves (run_id, phase, wave_number, status, domain_snapshot_id) VALUES ('r1','health-audit-a',1,'collected','snap1')`).run(); + const waveId = Number(w.lastInsertRowid); + // Two findings, both new/recurring (approvable). + db.prepare(`INSERT INTO findings (run_id, finding_id, fingerprint, severity, category, description, status, first_seen_wave) VALUES ('r1','F-001','fp1','HIGH','bug','d1','new',?)`).run(waveId); + db.prepare(`INSERT INTO findings (run_id, finding_id, fingerprint, severity, category, description, status, first_seen_wave) VALUES ('r1','F-002','fp2','MEDIUM','bug','d2','recurring',?)`).run(waveId); + closeDb(dbPath); + return { tmp, dbPath }; + } + + function approvedEventCount(dbPath) { + const db = openDb(dbPath); + const n = db.prepare( + "SELECT COUNT(*) AS n FROM finding_events WHERE event_type = 'approved'" + ).get().n; + closeDb(dbPath); + return n; + } + + function runApprove(dbPath) { + return spawnSync(process.execPath, [CLI_PATH, 'approve', 'r1', '--all'], { + encoding: 'utf-8', cwd: __dirname, env: { ...process.env, SWARM_DB: dbPath }, + }); + } + + it('first approve emits exactly 2 events; second approve emits 0 more', () => { + const fx = setupApproveFixture(); + try { + const r1 = runApprove(fx.dbPath); + assert.equal(r1.status, 0, `first approve failed: ${r1.stderr}`); + assert.match(r1.stdout, /Approved 2 findings/); + assert.equal(approvedEventCount(fx.dbPath), 2, 'first approve must record 2 events'); + + // Re-run: both findings already approved. The UPDATE flips 0 rows, so + // NO new approved events should be inserted (cli-002 regression: the old + // code re-selected all status='approved' rows and inserted a duplicate + // event per finding on every call).
+ const r2 = runApprove(fx.dbPath); + assert.equal(r2.status, 0, `second approve failed: ${r2.stderr}`); + assert.match(r2.stdout, /Approved 0 findings/); + assert.equal(approvedEventCount(fx.dbPath), 2, + 'cli-002 regression: re-running approve duplicated approved events'); + } finally { + teardown(fx.tmp); + } + }); +}); + +// ────────────────────────────────────────────────────────────── +// cli-003 — `--reason --flag` is a MISSING reason, not a swallowed flag +// ────────────────────────────────────────────────────────────── + +describe('cli-003: --reason space-parser rejects a following flag as the reason', () => { + // Run each verb against a fresh empty DB. The reason guard fires before any + // DB mutation, so an empty DB is sufficient — and proves the irreversible + // path never started (no run/wave needed). + function freshDb() { + const tmp = mkdtempSync(join(tmpdir(), 'meta-reason-')); + const dbPath = join(tmp, 'control-plane.db'); + const db = openDb(dbPath); void db; closeDb(dbPath); + return { tmp, dbPath }; + } + + function runCli(verb, args, dbPath, cwd) { + return spawnSync(process.execPath, [CLI_PATH, verb, ...args], { + encoding: 'utf-8', cwd: cwd || __dirname, env: { ...process.env, SWARM_DB: dbPath }, + }); + } + + it('revalidate `--reason --apply` errors "reason required", does not run', () => { + const fx = freshDb(); + try { + const r = runCli('revalidate', ['r1', '--reason', '--apply', '--domain=x:y.json'], fx.dbPath); + assert.equal(r.status, 1); + assert.match(r.stderr, /revalidate: --reason "" is required/); + } finally { + teardown(fx.tmp); + } + }); + + it('rewind `--reason --apply` errors "reason required", does not reset', () => { + const fx = freshDb(); + try { + // cwd is the fresh (non-git) temp dir; the reason guard fires before any + // git access, so it errors on reason, not on "not a git repo". 
+ const r = runCli('rewind', ['swarm-save-x', '--reason', '--apply'], fx.dbPath, fx.tmp); + assert.equal(r.status, 1); + assert.match(r.stderr, /rewind: --reason "" is required/); + } finally { + teardown(fx.tmp); + } + }); + + it('redrive `--reason --apply` errors "reason required", does not redrive', () => { + const fx = freshDb(); + try { + const r = runCli('redrive', ['1', '--reason', '--apply'], fx.dbPath); + assert.equal(r.status, 1); + assert.match(r.stderr, /redrive: --reason "" is required/); + } finally { + teardown(fx.tmp); + } + }); + + it('a real reason text is still accepted (guard does not over-reject)', () => { + const fx = freshDb(); + try { + // Valid reason → guard passes; revalidate then fails on the MISSING + // --domain (a later guard), proving the reason was accepted. + const r = runCli('revalidate', ['r1', '--reason', 'genuine reason'], fx.dbPath); + assert.equal(r.status, 1); + assert.match(r.stderr, /at least one --domain=name:path is required/); + } finally { + teardown(fx.tmp); + } + }); + + // cmdAdvance's override path is the FOURTH `--reason` site (cli.js ~912). It + // differs from the three above: the override block is overridable with NO + // irreversible side effect, so the cli-001/002/003 amend agent explicitly + // declined to file it. But a `--`-prefixed value still pollutes the + // promotion/override audit record — advance.js persists override reasons as + // overrides:[{reason}] — so the same guard applies. NOTE: the finding's literal + // example `--override --reason --history` is intercepted by cmdAdvance's + // `--history` early-return (exits 0, prints promotion history) before it ever + // reaches the reason parser; these tests use `--apply` (like the three siblings + // above), a following flag that actually reaches the fixed line. 
+ it('advance `--override --reason --apply` errors "requires --reason", does not override', () => { + const fx = freshDb(); + try { + const r = runCli('advance', ['r1', '--override', '--reason', '--apply'], fx.dbPath); + assert.equal(r.status, 1); + assert.match(r.stderr, /--override requires --reason/); + // The bug was a SILENT override carrying overrideReason='--apply'. Had the + // guard been bypassed, runAdvance → checkGates would throw "Run not found: + // r1"; its ABSENCE proves the reason guard fired before any override ran. + assert.doesNotMatch(r.stderr, /Run not found/); + } finally { + teardown(fx.tmp); + } + }); + + it('advance `--override --reason ""` is accepted (guard does not over-reject)', () => { + const fx = freshDb(); + try { + // Valid reason → guard passes; advance proceeds into runAdvance and fails + // on the MISSING run r1 (checkGates throws "Run not found"), proving the + // reason was accepted rather than rejected as a swallowed flag. + const r = runCli('advance', ['r1', '--override', '--reason', 'genuine override reason'], fx.dbPath); + assert.equal(r.status, 1); + assert.match(r.stderr, /Run not found/); + assert.doesNotMatch(r.stderr, /--override requires --reason/); + } finally { + teardown(fx.tmp); + } + }); +}); + +// ────────────────────────────────────────────────────────────── +// +1 status half-state — collect-crash breadcrumb +// ────────────────────────────────────────────────────────────── + +describe('status half-state: collect-crash breadcrumb when artifacts persisted but 0 findings', () => { + function setupWave({ withArtifact, withFinding }) { + const tmp = mkdtempSync(join(tmpdir(), 'meta-halfstate-')); + const dbPath = join(tmp, 'control-plane.db'); + const db = openDb(dbPath); + db.prepare('INSERT INTO runs (id, repo, local_path, commit_sha, status) VALUES (?, ?, ?, ?, ?)') + .run('r1', 'org/r', tmp, 'a'.repeat(40), 'health-audit-a'); + db.prepare(`INSERT INTO domains (run_id, name, globs, ownership_class, frozen) 
VALUES ('r1','backend','["src/**"]','owned',1)`).run(); + const domId = db.prepare(`SELECT id FROM domains WHERE run_id='r1'`).get().id; + // Wave still `dispatched`, agent `complete` — the collect-crash signature. + const w = db.prepare(`INSERT INTO waves (run_id, phase, wave_number, status, domain_snapshot_id) VALUES ('r1','health-audit-a',1,'dispatched','snap1')`).run(); + const waveId = Number(w.lastInsertRowid); + const ar = db.prepare(`INSERT INTO agent_runs (wave_id, domain_id, status) VALUES (?, ?, 'complete')`).run(waveId, domId); + const arId = Number(ar.lastInsertRowid); + if (withArtifact) { + db.prepare(`INSERT INTO artifacts (agent_run_id, artifact_type, path, content_hash) VALUES (?, 'audit_output', '/x/out.json', 'h1')`).run(arId); + } + if (withFinding) { + db.prepare(`INSERT INTO findings (run_id, finding_id, fingerprint, severity, category, description, status, first_seen_wave) VALUES ('r1','F-001','fp1','HIGH','bug','d1','new',?)`).run(waveId); + } + closeDb(dbPath); + return { tmp, dbPath, waveId }; + } + + it('artifacts present + 0 findings → READY TO COLLECT with a "collect may have failed" breadcrumb', () => { + const fx = setupWave({ withArtifact: true, withFinding: false }); + try { + const s = status({ runId: 'r1', dbPath: fx.dbPath }); + assert.equal(s.assessment.state, 'READY TO COLLECT'); + assert.match(s.assessment.nextAction, /no findings were persisted/); + assert.match(s.assessment.nextAction, /a prior `swarm collect` may have failed/); + } finally { + teardown(fx.tmp); + } + }); + + it('happy path (no artifacts yet) → plain "Run `swarm collect`" message, no breadcrumb', () => { + const fx = setupWave({ withArtifact: false, withFinding: false }); + try { + const s = status({ runId: 'r1', dbPath: fx.dbPath }); + assert.equal(s.assessment.state, 'READY TO COLLECT'); + assert.match(s.assessment.nextAction, /Run `swarm collect` to merge outputs\./); + assert.doesNotMatch(s.assessment.nextAction, /may have failed/); + } finally { + 
teardown(fx.tmp); + } + }); + + it('artifacts present AND findings present → no breadcrumb (a real collect ran)', () => { + const fx = setupWave({ withArtifact: true, withFinding: true }); + try { + const s = status({ runId: 'r1', dbPath: fx.dbPath }); + // Findings reference the wave, so this is not the crash signature even + // though artifacts exist. (Wave is still `dispatched` in this fixture, + // so it stays READY TO COLLECT — but WITHOUT the failure breadcrumb.) + assert.equal(s.assessment.state, 'READY TO COLLECT'); + assert.doesNotMatch(s.assessment.nextAction, /may have failed/); + } finally { + teardown(fx.tmp); + } + }); +}); + +// ────────────────────────────────────────────────────────────── +// Cordoned HEAD guard — MUST be last +// ────────────────────────────────────────────────────────────── + +describe('cordoned HEAD guard (MUST be last)', () => { + it('actual repo HEAD is unchanged after the full suite', () => { + const headNow = recordActualHead(); + assert.equal(headNow, ACTUAL_HEAD_AT_SUITE_START, + `Cordoned-test contract violated: actual repo HEAD changed from ` + + `${ACTUAL_HEAD_AT_SUITE_START} to ${headNow}. A rewind test escaped its fixture.`); + assert.ok(headNow, 'HEAD must resolve at the repo root'); + }); +}); diff --git a/packages/dogfood-swarm/meta-amendA-findings-persist.test.js b/packages/dogfood-swarm/meta-amendA-findings-persist.test.js new file mode 100644 index 0000000..82dbdd3 --- /dev/null +++ b/packages/dogfood-swarm/meta-amendA-findings-persist.test.js @@ -0,0 +1,833 @@ +/** + * meta-amendA-findings-persist.test.js — Wave A (findings-persist) amend gate. + * + * Locks the five CONFIRMED fixes from + * swarms/swarm-1780390764-7dab/wave-1/findings-persist.json: + * + * fp-002 (HIGH, MARQUEE) — within-wave fingerprint collisions no longer abort + * the wave collect. 
Two genuinely-distinct findings sharing a coarse key + * (same category|rule_id|path|symbol|line-bucket, different prose) used to + * collapse to one base fingerprint, both land in classified.new, and + * upsertFindings threw on UNIQUE(run_id, finding_id) / UNIQUE(run_id, + * fingerprint), rolling back EVERY finding for the wave (0 persisted). + * Fix: occurrence-index disambiguation (Part 1) salts the 2nd..Nth members; + * INSERT OR IGNORE + structured logStage (Part 2) is the never-abort safety + * net. Singletons + first-of-group keep their bare fingerprint byte-for-byte. + * + * fp-003 (HIGH) — git-touched-files porcelain parser recovers space/non-ASCII + * paths verbatim via `git status --porcelain -z` (NUL-delimited, no quoting) + * instead of mangling C-quoted octal escapes into `na/303/257ve.js`. + * + * fp-004 (LOW) — bounded-json size gate enforces the cap on bytes ACTUALLY + * read, closing the statSync→readFileSync TOCTOU window. + * + * fp-005 (LOW) — findings-digest isMain guard tolerates an undefined + * process.argv[1] (no TypeError at module-load under `node --eval`). + * + * fp-006 (LOW) — git-touched-files no longer promises a `git diff --name-only` + * cross-check the code never runs (comment-drift). + * + * Protocol-v2-lite: each gate is shaped to fail RED against pre-fix HEAD and + * GREEN after the fix. 
+ */ + +import { describe, it, beforeEach, afterEach, before, after } from 'node:test'; +import assert from 'node:assert/strict'; +import { execFileSync } from 'node:child_process'; +import { readFileSync, writeFileSync, mkdtempSync, mkdirSync, rmSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join, resolve } from 'node:path'; +import { fileURLToPath } from 'node:url'; + +import { openMemoryDb, openDb, closeDb } from './db/connection.js'; +import { saveDomainDraft, freezeDomains } from './lib/domains.js'; +import { dispatch } from './commands/dispatch.js'; +import { collect } from './commands/collect.js'; +import { + computeFingerprint, + extractContextSnippet, + CONTEXT_RADIUS_LINES, + disambiguateFingerprints, + classifyFindings, + buildPriorMap, + upsertFindings, +} from './lib/fingerprint.js'; +import { getActualTouchedFiles } from './lib/git-touched-files.js'; +import { readBoundedJson, BoundedJsonError } from './lib/bounded-json-read.js'; + +const RUN_ID = 'test-amendA-fp'; + +// ════════════════════════════════════════════════════════════════════ +// fp-002 — THE MARQUEE FIX: within-wave fingerprint collisions +// ════════════════════════════════════════════════════════════════════ + +describe('fp-002: within-wave fingerprint collisions never abort collect', () => { + let db; + + beforeEach(() => { + db = openMemoryDb(); + db.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run(RUN_ID, 'dogfood-lab/testing-os', '/tmp/repo', 'c'.repeat(40)); + db.prepare('INSERT INTO waves (run_id, phase, wave_number) VALUES (?, ?, ?)') + .run(RUN_ID, 'health-audit-a', 1); + }); + + afterEach(() => { + try { db.close(); } catch { /* */ } + }); + + function collectOneWave(findings) { + // Mirror collect.js:455-457 + 510-515: bare fingerprint per finding, then + // classify (which disambiguates) and upsert. This is the exact production + // path, minus the worktree/agent plumbing. 
+ const stamped = findings.map((f) => ({ ...f, fingerprint: computeFingerprint(f) })); + const priorMap = buildPriorMap(db, RUN_ID); + const classified = classifyFindings(stamped, priorMap); + return upsertFindings(db, RUN_ID, 1, classified); + } + + it('(a) two symbol-less findings, same file+category+line-bucket, in ONE wave → BOTH persist, no throw', () => { + // Same coarse key: category=docs, no rule_id, same file, no symbol, lines + // 3 & 7 both bucket to 0. Pre-fix: identical base fingerprint → both in + // classified.new → upsertFindings throws UNIQUE constraint → whole wave + // rolls back (0 rows). Post-fix: Part 1 salts the 2nd → 2 distinct rows. + const f1 = { + category: 'docs', file: 'README.md', line: 3, symbol: null, + severity: 'LOW', description: 'first doc issue', + }; + const f2 = { + category: 'docs', file: 'README.md', line: 7, symbol: null, + severity: 'LOW', description: 'second, different doc issue', + }; + // Precondition: the two share a base fingerprint (that is the bug shape). + assert.equal( + computeFingerprint(f1), computeFingerprint(f2), + 'precondition: the two findings share a coarse base fingerprint', + ); + + let stats; + assert.doesNotThrow(() => { stats = collectOneWave([f1, f2]); }, + 'within-wave collision must NOT throw / abort the wave'); + + const rows = db.prepare('SELECT finding_id, fingerprint, description FROM findings WHERE run_id = ? 
ORDER BY id') + .all(RUN_ID); + assert.equal(rows.length, 2, 'BOTH distinct findings must persist (count not reduced)'); + assert.equal(stats.inserted, 2, 'upsertFindings reports 2 inserted'); + assert.equal(new Set(rows.map(r => r.finding_id)).size, 2, 'distinct finding_ids'); + assert.equal(new Set(rows.map(r => r.fingerprint)).size, 2, 'distinct fingerprints'); + }); + + it('(b) the EXACT wave-1 shape (two `docs` findings on the same README, lines 21 & 27, no symbol) → both persist', () => { + // The live-reproduced shape from the finding text: two distinct README + // findings 6 lines apart, no symbol, same 10-line bucket (20). + const f1 = { + category: 'docs', file: 'README.md', line: 21, symbol: null, + severity: 'LOW', description: 'stale badge near the top', + }; + const f2 = { + category: 'docs', file: 'README.md', line: 27, symbol: null, + severity: 'LOW', description: 'broken anchor a few lines down', + }; + assert.equal( + computeFingerprint(f1), computeFingerprint(f2), + 'precondition: lines 21 & 27 share the 20-bucket → identical base fingerprint', + ); + + let stats; + assert.doesNotThrow(() => { stats = collectOneWave([f1, f2]); }); + const rows = db.prepare('SELECT finding_id FROM findings WHERE run_id = ?').all(RUN_ID); + assert.equal(rows.length, 2, 'both wave-1-shaped findings persist'); + assert.equal(stats.inserted, 2); + }); + + it('(b2) a same-coarse-key TRIPLE in one wave → all three persist as distinct findings', () => { + // fold-never-drop: occurrence index extends past 2. Three findings, same + // bucket, must yield three rows (indices 0/1/2 → bare/salt1/salt2). 
+ const base = { category: 'docs', file: 'CHANGELOG.md', symbol: null, severity: 'LOW' }; + const findings = [ + { ...base, line: 2, description: 'a' }, + { ...base, line: 4, description: 'b' }, + { ...base, line: 8, description: 'c' }, + ]; + assert.equal( + new Set(findings.map(computeFingerprint)).size, 1, + 'precondition: all three share one base fingerprint', + ); + let stats; + assert.doesNotThrow(() => { stats = collectOneWave(findings); }); + assert.equal(stats.inserted, 3, 'all three distinct findings persist'); + const rows = db.prepare('SELECT fingerprint FROM findings WHERE run_id = ?').all(RUN_ID); + assert.equal(new Set(rows.map(r => r.fingerprint)).size, 3, 'three distinct fingerprints'); + }); + + it('(b3) fp-p-001: a coarse-key group where two NON-keepers share an EQUAL (or empty) description → all persist, both input orders', () => { + // fp-p-001 (MED): saltByContent used the description as the SOLE + // discriminator, so two NON-keeper members of the same coarse-key group with + // a byte-identical (or both-empty) description salted to the IDENTICAL + // fingerprint → identical derived finding_id → upsertFindings' INSERT OR + // IGNORE silently DROPPED the second genuinely-distinct finding. The fix + // folds file + line + a deterministic within-group ordinal into the salt, so + // tying descriptions still diverge. The (b2) triple above only ever uses + // distinct descriptions ('a'/'b'/'c'), so this edge was untested. + // + // Three members at one coarse key (category=docs, no symbol, same file, all + // lines bucket to 10). One member with a UNIQUE description becomes the + // deterministic bare-fp keeper (it sorts last); the OTHER TWO share an + // identical description and are BOTH non-keepers — exactly the collision the + // pre-fix salt could not tell apart. Run twice in BOTH input orders on fresh + // DBs: the result must be order-independent and lose nothing either way. 
+ const mk = (description) => ({ + category: 'docs', file: 'docs/guide.md', symbol: null, severity: 'LOW', description, + }); + // Two non-keepers tie on description; lines differ so they are genuinely + // distinct findings (different file:line) the corpus must keep separate. + const dupA = { ...mk('see above'), line: 12 }; + const dupB = { ...mk('see above'), line: 16 }; + const keeperUniqueDesc = { ...mk('z unique trailer'), line: 14 }; + + assert.equal( + new Set([dupA, dupB, keeperUniqueDesc].map(computeFingerprint)).size, 1, + 'precondition: all three share one base fingerprint (lines 12/14/16 → bucket 10)', + ); + + for (const order of [[dupA, dupB, keeperUniqueDesc], [dupB, dupA, keeperUniqueDesc], [keeperUniqueDesc, dupB, dupA]]) { + const label = `order=[${order.map((f) => `${f.description.split(' ')[0]}@${f.line}`).join(',')}]`; + const sdb = openMemoryDb(); + try { + sdb.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run(RUN_ID, 'dogfood-lab/testing-os', '/tmp/repo', 'c'.repeat(40)); + sdb.prepare('INSERT INTO waves (run_id, phase, wave_number) VALUES (?, ?, ?)') + .run(RUN_ID, 'health-audit-a', 1); + + const stamped = order.map((f) => ({ ...f, fingerprint: computeFingerprint(f) })); + const classified = classifyFindings(stamped, buildPriorMap(sdb, RUN_ID)); + let stats; + assert.doesNotThrow(() => { stats = upsertFindings(sdb, RUN_ID, 1, classified); }, + `${label}: tying-description non-keepers must not abort the wave`); + + assert.equal(stats.inserted, 3, `${label}: all three distinct findings persist (none dropped)`); + const rows = sdb.prepare('SELECT finding_id, fingerprint, description, line_number FROM findings WHERE run_id = ? 
ORDER BY id') + .all(RUN_ID); + assert.equal(rows.length, 3, `${label}: exactly three rows — the duplicate-description non-keeper was NOT swallowed`); + assert.equal(new Set(rows.map((r) => r.fingerprint)).size, 3, `${label}: three distinct fingerprints`); + assert.equal(new Set(rows.map((r) => r.finding_id)).size, 3, `${label}: three distinct finding_ids`); + // Both tying-description findings survive as their own rows (by line). + assert.deepEqual( + rows.map((r) => r.line_number).sort((a, b) => a - b), [12, 14, 16], + `${label}: all three (file,line) findings are present`, + ); + } finally { + try { sdb.close(); } catch { /* */ } + } + } + }); + + it('(b4) fp-p-001: two NON-keepers with BOTH-EMPTY descriptions → all persist as distinct rows (direct fingerprint path)', () => { + // The empty-description variant. In the LIVE collect path the Ajv schema's + // `description: minLength 1` gate rejects an empty description before + // upsertFindings, but the fingerprint module's own JSDoc supports direct + // callers/tests passing no description, and its header asserts "two distinct + // members get distinct salts" — which was FALSE for the both-empty case + // pre-fix (normalizeDescription('') === normalizeDescription(null) === ''). + // This drives disambiguateFingerprints directly to lock that invariant. + const base = { category: 'docs', file: 'docs/empty.md', symbol: null, severity: 'LOW' }; + // keeper has a non-empty description (sorts after ''); two non-keepers are + // both empty/null-described — the exact tie the description-only salt missed. 
+ const k = { ...base, line: 4, description: 'keeper' }; + const e1 = { ...base, line: 2, description: '' }; + const e2 = { ...base, line: 8, description: null }; + + const bare = computeFingerprint(k); + assert.equal(new Set([k, e1, e2].map(computeFingerprint)).size, 1, + 'precondition: all three share one base fingerprint'); + + for (const order of [[k, e1, e2], [e1, e2, k], [e2, e1, k]]) { + const stamped = order.map((f) => ({ ...f, fingerprint: bare })); + const out = disambiguateFingerprints(stamped); + const fps = out.map((f) => f.fingerprint); + assert.equal(new Set(fps).size, 3, + `both-empty-description non-keepers must get DISTINCT salts (got ${JSON.stringify(fps)})`); + // The keeper still owns the bare fingerprint, byte-for-byte. + assert.ok(fps.includes(bare), 'the bare-fp keeper is preserved unchanged'); + } + }); + + it('(c) a singleton finding’s fingerprint is BYTE-IDENTICAL after disambiguation (backward-compat)', () => { + // The cross-wave dedup invariant (B-BACK-002): disambiguation must NOT + // perturb the bare fingerprint of a lone finding, or every singleton would + // re-classify as `new` next wave instead of `recurring`. + const f = { + category: 'bug', rule_id: 'X', file: 'src/a.js', symbol: 'foo', line: 10, + severity: 'HIGH', description: 'whatever', + }; + const bare = computeFingerprint(f); + + const out = disambiguateFingerprints([{ ...f, fingerprint: bare }]); + assert.equal(out.length, 1); + assert.equal(out[0].fingerprint, bare, + 'a singleton must keep its bare fingerprint byte-for-byte'); + + // And the FIRST member of a colliding group keeps the bare fingerprint too. 
+ const g1 = { ...f, line: 11, description: 'one' }; + const g2 = { ...f, line: 13, description: 'two' }; + const groupBare = computeFingerprint(g1); // == computeFingerprint(g2) + const grouped = disambiguateFingerprints([ + { ...g1, fingerprint: groupBare }, + { ...g2, fingerprint: groupBare }, + ]); + assert.equal(grouped[0].fingerprint, groupBare, + 'first member of a collision group keeps the bare fingerprint'); + assert.notEqual(grouped[1].fingerprint, groupBare, + 'second member is salted to a distinct fingerprint'); + }); + + it('(c2) disambiguateFingerprints does not mutate its inputs', () => { + const f1 = { category: 'docs', file: 'README.md', line: 3, fingerprint: undefined }; + f1.fingerprint = computeFingerprint(f1); + const f2 = { category: 'docs', file: 'README.md', line: 7 }; + f2.fingerprint = computeFingerprint(f2); + const snapshot1 = f1.fingerprint; + const snapshot2 = f2.fingerprint; + disambiguateFingerprints([f1, f2]); + assert.equal(f1.fingerprint, snapshot1, 'input f1 not mutated'); + assert.equal(f2.fingerprint, snapshot2, 'input f2 not mutated'); + }); + + it('(d) cross-wave: the SAME singleton finding re-reported next wave dedupes (recurring, not new)', () => { + const f = { + category: 'bug', rule_id: 'Y', file: 'src/b.js', symbol: 'bar', line: 40, + severity: 'MEDIUM', description: 'the same defect, possibly reworded', + }; + + // Wave 1: first sighting → inserted as new. + const s1 = collectOneWave([f]); + assert.equal(s1.inserted, 1); + assert.equal(s1.updated, 0); + + // Wave 2: re-report the same finding (reworded). It is a singleton again, + // so it keeps the bare fingerprint and must classify recurring — NOT a + // second new row. 
+ db.prepare('INSERT INTO waves (run_id, phase, wave_number) VALUES (?, ?, ?)') + .run(RUN_ID, 'health-audit-a', 2); + const reworded = { ...f, description: 'same defect, different prose' }; + const stamped = [{ ...reworded, fingerprint: computeFingerprint(reworded) }]; + const priorMap = buildPriorMap(db, RUN_ID); + const classified = classifyFindings(stamped, priorMap); + assert.equal(classified.recurring.length, 1, 'singleton re-report classifies recurring'); + assert.equal(classified.new.length, 0, 'no new row for a re-reported singleton'); + const s2 = upsertFindings(db, RUN_ID, 2, classified); + assert.equal(s2.updated, 1); + + const rows = db.prepare('SELECT id FROM findings WHERE run_id = ?').all(RUN_ID); + assert.equal(rows.length, 1, 'still exactly one row across both waves (deduped)'); + }); + + it('(e) cross-wave fp-r-001: a wave-1 singleton that gains a coarse-key sibling in wave 2 — order-independent', () => { + // THE regression (fp-r-001, HIGH). disambiguateFingerprints USED to award + // the bare-fp slot by within-wave array order, blind to prior state. So a + // wave-1 SINGLETON A that gains a NEW coarse-key sibling B in wave 2 could + // have the bare fp handed to B (whichever sorted first), making B dedupe to + // A's prior row → B classified `recurring` and SILENTLY SWALLOWED, while A's + // stable finding_id (the swarm-approve / D3B-006 handle) was hijacked and A + // re-inserted under a salted id. collect.js wave-mate order is + // non-deterministic, so BOTH input orders must now produce the same correct + // result. We run the whole scenario twice on FRESH DBs: wave-2 = [A, B] and + // wave-2 = [B, A]. Pre-fix, [B, A] is the catastrophic-swallow order. + // + // A and B share a coarse base fingerprint: same category=docs, same file, no + // symbol, lines 21 & 27 both bucket to 20. Different descriptions only. 
+ const A = { + category: 'docs', file: 'packages/dogfood-swarm/README.md', line: 21, symbol: null, + severity: 'LOW', description: 'stale badge', + }; + const B = { + category: 'docs', file: 'packages/dogfood-swarm/README.md', line: 27, symbol: null, + severity: 'LOW', description: 'broken anchor a few lines down', + }; + assert.equal( + computeFingerprint(A), computeFingerprint(B), + 'precondition: A and B share one coarse base fingerprint (the bug shape)', + ); + + for (const wave2Order of [[A, B], [B, A]]) { + const label = `wave2=[${wave2Order.map((f) => f.description.split(' ')[0]).join(',')}]`; + + // Fresh DB per order so the two scenarios cannot leak state into each other. + const sdb = openMemoryDb(); + try { + sdb.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run(RUN_ID, 'dogfood-lab/testing-os', '/tmp/repo', 'c'.repeat(40)); + sdb.prepare('INSERT INTO waves (run_id, phase, wave_number) VALUES (?, ?, ?)') + .run(RUN_ID, 'health-audit-a', 1); + + const runWave = (waveId, findings) => { + const stamped = findings.map((f) => ({ ...f, fingerprint: computeFingerprint(f) })); + const priorMap = buildPriorMap(sdb, RUN_ID); + const classified = classifyFindings(stamped, priorMap); + return upsertFindings(sdb, RUN_ID, waveId, classified); + }; + + // Wave 1: A alone → inserted as new. Capture its stable finding_id; this + // is the exact handle `swarm approve --ids` pins (D3B-006). + const s1 = runWave(1, [A]); + assert.equal(s1.inserted, 1, `${label}: wave-1 A inserted as new`); + const aRow = sdb.prepare('SELECT id, finding_id, fingerprint FROM findings WHERE run_id = ?').get(RUN_ID); + const aIdOriginal = aRow.finding_id; + const aRowId = aRow.id; + + // Wave 2: A re-reported + the NEW sibling B, in this order. 
+ sdb.prepare('INSERT INTO waves (run_id, phase, wave_number) VALUES (?, ?, ?)') + .run(RUN_ID, 'health-audit-a', 2); + const s2 = runWave(2, wave2Order); + + // (1) A retains its ORIGINAL finding_id and is classified recurring. + const aRowAfter = sdb.prepare('SELECT id, finding_id, status FROM findings WHERE id = ?').get(aRowId); + assert.equal(aRowAfter.finding_id, aIdOriginal, + `${label}: A keeps its original finding_id (no D3B-006 handle hijack)`); + assert.equal(aRowAfter.status, 'recurring', + `${label}: A is classified recurring in wave 2`); + assert.equal(s2.updated, 1, `${label}: exactly one recurring update (A)`); + + // (2) B is inserted as a NEW distinct row (not swallowed). + assert.equal(s2.inserted, 1, `${label}: B inserted as a new row (not swallowed)`); + const rows = sdb.prepare('SELECT id, finding_id, fingerprint, description, status FROM findings WHERE run_id = ? ORDER BY id') + .all(RUN_ID); + + // (3) total findings for the run == 2. + assert.equal(rows.length, 2, `${label}: exactly two rows total (A + B), nothing swallowed`); + assert.equal(new Set(rows.map((r) => r.finding_id)).size, 2, `${label}: distinct finding_ids`); + assert.equal(new Set(rows.map((r) => r.fingerprint)).size, 2, `${label}: distinct fingerprints`); + + const bRow = rows.find((r) => r.id !== aRowId); + assert.ok(bRow, `${label}: B's row exists`); + assert.equal(bRow.description, B.description, `${label}: the new row carries B's content, not A's`); + assert.notEqual(bRow.finding_id, aIdOriginal, + `${label}: B does not steal A's finding_id`); + + // (4) no `recurred` finding_event landed on the wrong lineage. A's row is + // the only one that may carry a 'recurred' event; B (brand-new) must not. + const aRecurred = sdb.prepare( + "SELECT COUNT(*) n FROM finding_events WHERE finding_id = ? 
AND event_type = 'recurred'", + ).get(aRowId).n; + assert.equal(aRecurred, 1, `${label}: exactly one 'recurred' event, on A's lineage`); + const bRecurred = sdb.prepare( + "SELECT COUNT(*) n FROM finding_events WHERE finding_id = ? AND event_type = 'recurred'", + ).get(bRow.id).n; + assert.equal(bRecurred, 0, `${label}: no 'recurred' event mis-attributed to the new sibling B`); + } finally { + try { sdb.close(); } catch { /* */ } + } + } + }); + + it('safety net: INSERT OR IGNORE skips a true same-id-prefix collision without aborting the rest', () => { + // Part 2 in isolation: feed classified.new two entries whose 8-hex finding_id + // prefix collides but whose full fingerprints differ (the rare D3B-006 case + // that Part 1 cannot create from coarse keys). The first inserts; the second + // is skipped by INSERT OR IGNORE rather than throwing and rolling back. The + // first finding MUST survive — never-abort, never-reduce-below-what-fit. + const fpA = 'abcdef01' + '0'.repeat(16); // 24 hex + const fpB = 'abcdef01' + '1'.repeat(16); // same 8-hex prefix, distinct fp + assert.equal(fpA.slice(0, 8), fpB.slice(0, 8), 'precondition: shared 8-hex prefix'); + assert.notEqual(fpA, fpB, 'precondition: distinct full fingerprints'); + + let stats; + assert.doesNotThrow(() => { + stats = upsertFindings(db, RUN_ID, 1, { + new: [ + { fingerprint: fpA, severity: 'HIGH', category: 'bug', file: 'a.js', line: 1, symbol: null, description: 'A', recommendation: null }, + { fingerprint: fpB, severity: 'HIGH', category: 'bug', file: 'b.js', line: 1, symbol: null, description: 'B', recommendation: null }, + ], + recurring: [], fixed: [], unverified: [], + }); + }, 'a same-id-prefix collision must be skipped, not thrown'); + + assert.equal(stats.inserted, 1, 'the first row landed (the second was skipped)'); + const rows = db.prepare('SELECT finding_id FROM findings WHERE run_id = ?').all(RUN_ID); + assert.equal(rows.length, 1, 'exactly one row — collect did not abort, first finding 
survived'); + assert.equal(rows[0].finding_id, 'F-abcdef01'); + }); +}); + +// ════════════════════════════════════════════════════════════════════ +// fp-p-005 — context-snippet hash makes the BASE fingerprint injective +// (the deeper fix deferred when fp-002's occurrence-salting net landed). +// These lock the NEW source-available path; the fp-002 blocks above stay +// GREEN because they exercise the no-source fallback, which is byte-for-byte +// the historical scheme and still needs the occurrence-salting net. +// ════════════════════════════════════════════════════════════════════ + +// A 30-line source where every line is distinct, so two findings on different +// lines always see different surrounding windows. Lines 21 & 27 both fall in the +// 20-bucket, so WITHOUT source they collide (the fp-002 shape); WITH source they +// do not. `const vN = N;` per line keeps each line — and each window — distinct. +const FP_P_005_SRC = Array.from({ length: 30 }, (_, i) => `const v${i + 1} = ${i + 1};`).join('\n'); + +describe('fp-p-005: extractContextSnippet — edit-stable surrounding-source window', () => { + it('returns null when there is no usable anchor (no source / bad line / past EOF / blank window)', () => { + assert.equal(extractContextSnippet(undefined, 10), null, 'no source → null'); + assert.equal(extractContextSnippet('', 10), null, 'empty source → null'); + assert.equal(extractContextSnippet(FP_P_005_SRC, 0), null, 'line 0 is not 1-based-valid → null'); + assert.equal(extractContextSnippet(FP_P_005_SRC, -3), null, 'negative line → null'); + assert.equal(extractContextSnippet(FP_P_005_SRC, 9999), null, 'line past EOF → null'); + assert.equal(extractContextSnippet(' \n\t\n ', 2), null, 'all-whitespace window → null'); + }); + + it('is line-ending and indentation agnostic (survives reflow)', () => { + const lf = 'a();\n b();\n c();\n d();\n e();'; + const crlf = lf.replace(/\n/g, '\r\n'); + const reindented = 'a();\nb();\nc();\nd();\ne();'; + 
assert.equal(extractContextSnippet(crlf, 3), extractContextSnippet(lf, 3), 'CRLF window == LF window'); + assert.equal(extractContextSnippet(reindented, 3), extractContextSnippet(lf, 3), + 'pure re-indentation collapses away — same non-whitespace content → same snippet'); + }); + + it('yields distinct windows for findings more than CONTEXT_RADIUS_LINES apart', () => { + assert.ok(CONTEXT_RADIUS_LINES >= 1, 'radius is a positive window'); + assert.notEqual(extractContextSnippet(FP_P_005_SRC, 21), extractContextSnippet(FP_P_005_SRC, 27), + 'lines 21 and 27 see different surrounding source'); + }); +}); + +describe('fp-p-005: computeFingerprint folds the context snippet into the BASE fp', () => { + const at = (line, description) => ({ + category: 'docs', file: 'README.md', symbol: null, severity: 'LOW', line, description, + }); + + it('(1) two distinct findings in the same file+bucket get DISTINCT base fps with source', () => { + const a = at(21, 'stale badge'); + const b = at(27, 'broken anchor'); + // Without source they collide on the 20-bucket — the exact fp-002 bug shape. + assert.equal(computeFingerprint(a), computeFingerprint(b), + 'precondition: no-source fps collide on the shared bucket'); + // With source the surrounding lines differ → distinct base fps, no salting. 
+ assert.notEqual( + computeFingerprint(a, { sourceText: FP_P_005_SRC }), + computeFingerprint(b, { sourceText: FP_P_005_SRC }), + 'context hash makes the base fp injective for distinct locations', + ); + }); + + it('(2) rewording the SAME finding at the SAME location does not change the fp (B-BACK-002 via source, not prose)', () => { + assert.equal( + computeFingerprint(at(21, 'handleX dereferences without a guard'), { sourceText: FP_P_005_SRC }), + computeFingerprint(at(21, 'Guard handleX before dereferencing.'), { sourceText: FP_P_005_SRC }), + 'same location + same surrounding source → same fp regardless of description prose', + ); + }); + + it('(3) reflow: a finding shifted down by edits ABOVE it keeps its fp (CodeQL primaryLocationLineHash property)', () => { + const before = computeFingerprint(at(21, 'x'), { sourceText: FP_P_005_SRC }); + // Insert 5 lines at the top; the finding's surrounding code moves to line 26. + const shifted = ['// new', '// header', '// lines', '// added', '// above'].join('\n') + '\n' + FP_P_005_SRC; + const after = computeFingerprint(at(26, 'x'), { sourceText: shifted }); + assert.equal(after, before, 'same surrounding source at the new line → same fp (no new+fixed churn)'); + }); + + it('(4) CRLF and LF source produce the same fp', () => { + assert.equal( + computeFingerprint(at(21, 'x'), { sourceText: FP_P_005_SRC.replace(/\n/g, '\r\n') }), + computeFingerprint(at(21, 'x'), { sourceText: FP_P_005_SRC }), + 'line-ending normalization makes CRLF and LF fingerprints identical', + ); + }); + + it('(5) the no-source path is byte-identical to the historical bucket fingerprint', () => { + const a = at(21, 'x'); + const historical = computeFingerprint(a); // one-arg legacy call + assert.equal(computeFingerprint(a, {}), historical, 'empty options == legacy'); + assert.equal(computeFingerprint(a, { sourceText: undefined }), historical, 'undefined source == legacy'); + assert.equal(computeFingerprint(a, { sourceText: '' }), historical, 
'empty source == legacy'); + // And a no-source same-bucket sibling still collides — the occurrence-salting + // net is still load-bearing (and still tested) for the no-source path. + assert.equal(historical, computeFingerprint(at(27, 'y')), + 'no-source siblings still share the bucket fp (the net still earns its keep)'); + }); +}); + +describe('fp-p-005: cross-wave injective — collision-free in EITHER input order (with source)', () => { + // The fp-r-001 scenario re-run with source available. Because A and B now + // carry DISTINCT base fps, NO collision group forms, disambiguateFingerprints + // never salts, and the outcome is identical regardless of wave-2 array order: + // A stays recurring on its original finding_id, B inserts as new. This is the + // hazard fp-002/fp-r-001's net guarded against, now dissolved at the source. + const A = { category: 'docs', file: 'README.md', line: 21, symbol: null, severity: 'LOW', description: 'stale badge' }; + const B = { category: 'docs', file: 'README.md', line: 27, symbol: null, severity: 'LOW', description: 'broken anchor a few lines down' }; + + it('A keeps its id (recurring) and B inserts as new, identically for [A,B] and [B,A]', () => { + assert.equal(computeFingerprint(A), computeFingerprint(B), + 'precondition: without source A and B collide (the fp-002 shape)'); + assert.notEqual( + computeFingerprint(A, { sourceText: FP_P_005_SRC }), + computeFingerprint(B, { sourceText: FP_P_005_SRC }), + 'precondition: with source A and B are injective', + ); + + for (const wave2Order of [[A, B], [B, A]]) { + const label = `wave2=[${wave2Order.map((f) => f.description.split(' ')[0]).join(',')}]`; + const sdb = openMemoryDb(); + try { + sdb.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run(RUN_ID, 'dogfood-lab/testing-os', '/tmp/repo', 'c'.repeat(40)); + sdb.prepare('INSERT INTO waves (run_id, phase, wave_number) VALUES (?, ?, ?)') + .run(RUN_ID, 'health-audit-a', 1); + + // Stamp with 
source, exactly as collect.js does post-fp-p-005. + const runWave = (waveId, findings) => { + const stamped = findings.map((f) => ({ ...f, fingerprint: computeFingerprint(f, { sourceText: FP_P_005_SRC }) })); + const priorMap = buildPriorMap(sdb, RUN_ID); + const classified = classifyFindings(stamped, priorMap); + return upsertFindings(sdb, RUN_ID, waveId, classified); + }; + + const s1 = runWave(1, [A]); + assert.equal(s1.inserted, 1, `${label}: wave-1 A inserted`); + const aRow = sdb.prepare('SELECT id, finding_id, fingerprint FROM findings WHERE run_id = ?').get(RUN_ID); + const aIdOriginal = aRow.finding_id; + const aRowId = aRow.id; + assert.equal(aRow.fingerprint, computeFingerprint(A, { sourceText: FP_P_005_SRC }), + `${label}: A's stored fp is the BARE context hash — proof nothing was salted`); + + sdb.prepare('INSERT INTO waves (run_id, phase, wave_number) VALUES (?, ?, ?)') + .run(RUN_ID, 'health-audit-a', 2); + const s2 = runWave(2, wave2Order); + + const aAfter = sdb.prepare('SELECT finding_id, status FROM findings WHERE id = ?').get(aRowId); + assert.equal(aAfter.finding_id, aIdOriginal, `${label}: A keeps its original finding_id`); + assert.equal(aAfter.status, 'recurring', `${label}: A is recurring`); + assert.equal(s2.updated, 1, `${label}: exactly one recurring update (A)`); + assert.equal(s2.inserted, 1, `${label}: B inserted as a new row`); + + const rows = sdb.prepare('SELECT id, finding_id, fingerprint, description FROM findings WHERE run_id = ? 
ORDER BY id').all(RUN_ID); + assert.equal(rows.length, 2, `${label}: exactly two rows`); + assert.equal(new Set(rows.map((r) => r.fingerprint)).size, 2, `${label}: distinct fingerprints`); + const bRow = rows.find((r) => r.id !== aRowId); + assert.equal(bRow.description, B.description, `${label}: the new row carries B's content`); + assert.equal(bRow.fingerprint, computeFingerprint(B, { sourceText: FP_P_005_SRC }), + `${label}: B's stored fp is the BARE context hash (not salted)`); + } finally { + try { sdb.close(); } catch { /* */ } + } + } + }); +}); + +describe('fp-p-005: collect() reads real worktree source and persists same-bucket findings distinctly', () => { + let tmp, dbPath; + const RID = 'r-fp-p-005-collect'; + + beforeEach(() => { + tmp = mkdtempSync(join(tmpdir(), 'fp-p-005-collect-')); + dbPath = join(tmp, 'control-plane.db'); + mkdirSync(join(tmp, 'packages', 'backend'), { recursive: true }); + writeFileSync(join(tmp, 'packages', 'backend', 'x.js'), `${FP_P_005_SRC}\n`, 'utf-8'); + + const db = openDb(dbPath); + db.prepare(`INSERT INTO runs (id, repo, local_path, commit_sha, branch, status) + VALUES (?, 'org/repo', ?, ?, 'main', 'pending')`).run(RID, tmp, 'a'.repeat(40)); + saveDomainDraft(db, RID, [{ name: 'backend', globs: ['packages/backend/**'], ownership_class: 'owned' }]); + freezeDomains(db, RID); + closeDb(dbPath); + }); + + afterEach(() => { + try { closeDb(dbPath); } catch { /* */ } + rmSync(tmp, { recursive: true, force: true }); + }); + + it('two symbol-less findings on lines 21 & 27 of one file persist as two distinct rows', () => { + dispatch({ runId: RID, phase: 'health-audit-a', dbPath, outputDir: tmp }); + + // symbol is OMITTED (the symbol-less bug shape). The canonical agent-output + // schema accepts an absent symbol but rejects an explicit null, and an absent + // symbol fingerprints the same as the empty-symbol case ((undefined||'')===''). 
+ const f1 = { id: 'F-1', severity: 'LOW', category: 'docs', file: 'packages/backend/x.js', line: 21, description: 'first issue' }; + const f2 = { id: 'F-2', severity: 'LOW', category: 'docs', file: 'packages/backend/x.js', line: 27, description: 'second, different issue' }; + + // Precondition: without source these two collide on the base fp (fp-002 shape). + assert.equal(computeFingerprint(f1), computeFingerprint(f2), + 'precondition: the two findings share a no-source base fp'); + + const outPath = join(tmp, 'backend.json'); + writeFileSync(outPath, JSON.stringify({ + domain: 'backend', stage: 'A', summary: 'two same-bucket findings', + findings: [f1, f2], + }), 'utf-8'); + + const report = collect({ runId: RID, dbPath, outputs: { backend: outPath } }); + assert.equal(report.findings.new, 2, 'both findings persisted as new (no collision, no abort)'); + + const db = openDb(dbPath); + const rows = db.prepare('SELECT finding_id, fingerprint FROM findings WHERE run_id = ? ORDER BY id').all(RID); + closeDb(dbPath); + + assert.equal(rows.length, 2, 'exactly two rows'); + assert.equal(new Set(rows.map((r) => r.fingerprint)).size, 2, 'distinct fingerprints (context hash distinguished them)'); + assert.equal(new Set(rows.map((r) => r.finding_id)).size, 2, 'distinct finding_ids'); + // Proof collect() actually read the worktree source: the persisted fps are + // the context-hash fps, NOT the no-source bucket fp the precondition shares. 
+ const bucketFp = computeFingerprint(f1); + assert.ok(!rows.some((r) => r.fingerprint === bucketFp), + 'neither persisted fp is the no-source bucket fp — collect() folded the real surrounding source in'); + }); +}); + +// ════════════════════════════════════════════════════════════════════ +// fp-003 / fp-006 — git-touched-files: NUL-delimited path recovery +// ════════════════════════════════════════════════════════════════════ + +describe('fp-003: getActualTouchedFiles recovers space/non-ASCII paths verbatim', () => { + let repo; + + before(() => { + repo = mkdtempSync(join(tmpdir(), 'git-touched-')); + const git = (...args) => execFileSync('git', args, { cwd: repo, encoding: 'utf-8' }); + git('init', '-q'); + git('config', 'user.email', 'test@example.com'); + git('config', 'user.name', 'Test'); + // Force the worst case: core.quotePath=true (git's default) would C-quote + // these paths under a non-`-z` porcelain call. The fix uses `-z`, which is + // immune regardless, so we pin quotePath to its default value explicitly + // (guarding against a user-level override) to prove it. + git('config', 'core.quotePath', 'true'); + }); + + after(() => { + try { rmSync(repo, { recursive: true, force: true }); } catch { /* */ } + }); + + it('recovers an untracked path with a SPACE and a NON-ASCII char exactly', () => { + writeFileSync(join(repo, 'file with spaces.js'), '// hi\n', 'utf-8'); + writeFileSync(join(repo, 'naïve.js'), '// hi\n', 'utf-8'); + + const touched = getActualTouchedFiles(repo); + assert.ok(!touched.unavailable, 'git was available'); + + assert.ok(touched.untracked.includes('file with spaces.js'), + `space path must round-trip; got untracked=${JSON.stringify(touched.untracked)}`); + assert.ok(touched.untracked.includes('naïve.js'), + `non-ASCII path must round-trip (no octal-escape mangling); got untracked=${JSON.stringify(touched.untracked)}`); + // The pre-fix corruption signature must be ABSENT. 
+ assert.ok(!touched.all.some(p => p.includes('303') || p.includes('257') || p.startsWith('"')), + `no C-quote/octal corruption in the touched set; got all=${JSON.stringify(touched.all)}`); + assert.ok(touched.all.includes('naïve.js') && touched.all.includes('file with spaces.js'), + 'both paths present in the union `all` set'); + }); + + it('recovers a RENAME destination with a non-ASCII char and consumes the source field', () => { + // Commit a plain file, then rename it to a non-ASCII destination and stage + // the rename so porcelain emits an `R` record (DEST then SOURCE under -z). + const git = (...args) => execFileSync('git', args, { cwd: repo, encoding: 'utf-8' }); + writeFileSync(join(repo, 'plain.js'), '// content\n', 'utf-8'); + git('add', 'plain.js'); + git('commit', '-q', '-m', 'add plain.js'); + git('mv', 'plain.js', 'piñata.js'); + + const touched = getActualTouchedFiles(repo); + assert.ok(touched.all.includes('piñata.js'), + `rename destination must round-trip clean; got all=${JSON.stringify(touched.all)}`); + // The rename SOURCE (`plain.js`) is consumed as the trailing -z field, not + // misparsed into its own bogus record. And no quote/octal corruption: a + // C-quoted `ñ` (UTF-8 bytes 0xC3 0xB1) would surface as \303\261. + assert.ok(!touched.all.some(p => p.startsWith('"') || p.includes('261')), + `no quoting/octal corruption on the rename record; got all=${JSON.stringify(touched.all)}`); + }); +}); + +describe('fp-006: git-touched-files comments no longer promise a git diff cross-check', () => { + it('the source carries no false `git diff --name-only` cross-check promise', () => { + const src = readFileSync( + fileURLToPath(new URL('./lib/git-touched-files.js', import.meta.url)), + 'utf-8', + ); + // The drifted comment promised `git diff --name-only` as a + // "belt-and-suspenders cross-check" the code never ran. After the fix the + // only git invocation is the porcelain call; assert the false promise is + // gone (no `git diff` mention at all) while porcelain remains. 
+ assert.ok(!/git diff --name-only/.test(src), + 'stale `git diff --name-only` cross-check promise must be removed (fp-006)'); + assert.ok(/git', \['status', '--porcelain', '-z'/.test(src.replace(/\s+/g, ' ')) + || /status.*--porcelain.*-z/.test(src), + 'the porcelain -z call the code actually runs must be present'); + }); +}); + +// ════════════════════════════════════════════════════════════════════ +// fp-004 — bounded-json: enforce the cap on bytes ACTUALLY read +// ════════════════════════════════════════════════════════════════════ + +describe('fp-004: bounded-json enforces the limit on bytes actually read', () => { + let tmp; + + before(() => { tmp = mkdtempSync(join(tmpdir(), 'bounded-toctou-')); }); + after(() => { try { rmSync(tmp, { recursive: true, force: true }); } catch { /* */ } }); + + it('rejects a buffer that exceeds maxBytes even when statSync under-reports (TOCTOU)', () => { + // Simulate the race: monkeypatch statSync via a wrapper is heavy; instead + // we drive the post-read check directly. A file whose ON-DISK bytes exceed + // maxBytes must be rejected with kind SIZE_LIMIT regardless of which gate + // (stat or post-read) catches it — proving the post-read gate exists. 
+ const p = join(tmp, 'grew.json'); + const big = JSON.stringify({ blob: 'x'.repeat(300 * 1024) }); + writeFileSync(p, big, 'utf-8'); + + let err = null; + try { + readBoundedJson(p, { maxBytes: 100 * 1024 }); + } catch (e) { err = e; } + assert.ok(err instanceof BoundedJsonError, 'must throw BoundedJsonError'); + assert.equal(err.kind, 'SIZE_LIMIT'); + assert.ok(err.size > 100 * 1024, 'reported size reflects the oversized bytes'); + assert.equal(err.maxBytes, 100 * 1024); + }); + + it('the post-read gate is wired to buffer length (multi-byte chars counted as bytes, not chars)', () => { + // A file whose CHARACTER count is under the cap but whose BYTE count is over + // it must still be rejected — proving the gate measures bytes (buffer + // length), the unit a memory-exhaustion guard cares about. Each `好` is 3 + // UTF-8 bytes; 40k chars ≈ 120k bytes. Cap at 100k bytes. + const p = join(tmp, 'multibyte.json'); + const payload = JSON.stringify({ s: '好'.repeat(40 * 1024) }); + writeFileSync(p, payload, 'utf-8'); + + const charLen = payload.length; + const byteLen = Buffer.byteLength(payload, 'utf-8'); + assert.ok(byteLen > charLen, 'precondition: byte length exceeds char length'); + + let err = null; + try { + readBoundedJson(p, { maxBytes: 100 * 1024 }); + } catch (e) { err = e; } + assert.ok(err instanceof BoundedJsonError && err.kind === 'SIZE_LIMIT', + 'multi-byte payload over the BYTE cap must be rejected'); + assert.ok(err.size >= byteLen - 8, 'reported size is the byte length, not char length'); + }); + + it('still parses a legitimate file under the limit (no regression)', () => { + const p = join(tmp, 'fine.json'); + writeFileSync(p, JSON.stringify({ ok: true, n: 42 }), 'utf-8'); + assert.deepEqual(readBoundedJson(p), { ok: true, n: 42 }); + }); +}); + +// ════════════════════════════════════════════════════════════════════ +// fp-005 — findings-digest isMain guard tolerates undefined argv[1] +// 
════════════════════════════════════════════════════════════════════ + +describe('fp-005: findings-digest module loads with process.argv[1] undefined', () => { + it('importing the module under `node --eval` (no argv[1]) does not throw', () => { + // The pre-fix `process.argv[1].replace(...)` threw a TypeError at module + // evaluation time when argv[1] was undefined. We reproduce the exact + // context: a `node --eval` process whose argv has no [1] entry, importing + // the module. Success = exit 0 (the import + isMain computation survives). + const moduleUrl = new URL('./lib/findings-digest.js', import.meta.url).href; + const code = `import(${JSON.stringify(moduleUrl)}).then(() => process.exit(0))`; + + // `node --eval ` leaves process.argv = [execPath] only — argv[1] is + // undefined inside the evaluated module, the exact bug trigger. + assert.doesNotThrow(() => { + execFileSync(process.execPath, ['--eval', code], { + encoding: 'utf-8', + stdio: 'pipe', + }); + }, 'module must load cleanly when process.argv[1] is undefined (fp-005)'); + }); +}); diff --git a/packages/dogfood-swarm/meta-amendA-readme-contract.test.js b/packages/dogfood-swarm/meta-amendA-readme-contract.test.js new file mode 100644 index 0000000..c9575e4 --- /dev/null +++ b/packages/dogfood-swarm/meta-amendA-readme-contract.test.js @@ -0,0 +1,363 @@ +/** + * meta-amendA-readme-contract.test.js — README → CLI contract guard. + * + * td-006 (Wave-1 amend-A, tests-docs domain). The package already has a + * doc-drift discipline (wave10-docs-identity-drift.test.js pins schema.js + * STATUS.run to dispatch.js phase lists; scripts/check-doc-drift.mjs pins + * error-code + STATUS strings to the handbook and the cmd-handler COUNT to + * SHIP_GATE.md), but NONE of it pinned README.md's documented `swarm ` + * examples to cli.js's actual routing. That hole let td-001..td-005 drift + * undetected through ~28 audit waves: an operator copying the Quick start + * hit a precondition failure on the first command. 
+ * + * This test closes the hole. It reads the REAL cli.js `commands` map (it does + * NOT hardcode the command list — same source-of-truth extraction used at + * wave10-docs-identity-drift.test.js:215-238) and asserts: + * + * (a) every `swarm ` invocation fenced in README.md names a command + * that exists in cli.js's `commands` map; + * (b) the README's required-flag contracts hold — the `collect` and + * `revalidate` examples carry `--domain=`, and `dispatch` uses a named + * `` placeholder rather than a bare wave number; + * (c) the README's "Requires Node ≥ N" string equals package.json + * `engines.node`'s floor. + * (d) every operator-facing environment variable the package READS + * (`process.env.` in non-test source) is documented in the README. + * + * If a future edit reintroduces a wrong example (a renamed command, a dropped + * required flag, a `` placeholder, a stale Node floor) — or adds / + * renames an env var without documenting it — this test fails at that edit + * instead of in an operator's terminal. + * + * td-p-005 (Wave-3 amend-C, tests-docs domain) added (d): the original td-006 + * guard pinned the COMMAND surface only, so an undocumented env var (the exact + * gap found in this Stage-B/C pass — `DOGFOOD_LOG_HUMAN`, `SWARM_DB`) could + * silently rot the README. (d) closes that hole with the same read-the-real- + * source discipline as (a): it extracts the env-var literals from source rather + * than hardcoding them, so a new operator-facing var cannot drift undocumented. 
+ */ + +import { describe, it } from 'node:test'; +import assert from 'node:assert/strict'; +import { readFileSync, readdirSync } from 'node:fs'; +import { fileURLToPath } from 'node:url'; +import { dirname, join } from 'node:path'; + +const PKG_DIR = dirname(fileURLToPath(import.meta.url)); + +const README = readFileSync(new URL('./README.md', import.meta.url), 'utf-8'); +const CLI_SRC = readFileSync(new URL('./cli.js', import.meta.url), 'utf-8'); +const PKG = JSON.parse( + readFileSync(new URL('./package.json', import.meta.url), 'utf-8') +); + +/** + * Extract the command names registered in cli.js's `commands` map — the + * single source of truth for what the `swarm` binary actually routes. We read + * the source rather than importing it because cli.js runs its dispatch block + * at module top-level (it is a bin entry, not a library); a regex over the map + * body is the strict-safe extraction, mirroring the AUDIT_PHASES/AMEND_PHASES + * approach in wave10-docs-identity-drift.test.js. + * + * Map entries take two shapes: + * init: cmdInit, (bareword key) + * 'verify-fixed': cmdVerifyFixed, (quoted key — contains a hyphen) + */ +function extractCliCommands(src) { + const mapMatch = src.match(/const commands\s*=\s*\{([\s\S]*?)\n\};/); + assert.ok(mapMatch, 'cli.js must define a `const commands = { ... };` map'); + const body = mapMatch[1]; + + const names = new Set(); + const keyRe = /(?:^|,)\s*(?:'([a-z][a-z0-9-]*)'|([a-z][a-z0-9-]*))\s*:/gm; + let m; + while ((m = keyRe.exec(body)) !== null) { + names.add(m[1] ?? m[2]); + } + assert.ok(names.size >= 10, `expected to extract the full command map; got ${names.size}: ${[...names].join(', ')}`); + return names; +} + +/** + * Pull every fenced ```bash``` block out of the README. We deliberately scope + * the contract to fenced code (the copy-paste surface an operator actually + * runs); inline mentions of `swarm domains --unfreeze` inside prose bullets + * are guidance, not runnable examples, and are out of scope. 
+ */ +function fencedBashBlocks(md) { + const blocks = []; + const re = /```bash\n([\s\S]*?)```/g; + let m; + while ((m = re.exec(md)) !== null) blocks.push(m[1]); + return blocks; +} + +/** + * From the fenced blocks, return the list of `swarm ...` invocations as + * { command, line } records. A line is an invocation iff its first token is + * `swarm`; flag-continuation lines (starting with `--`) and comment lines + * (starting with `#`) are not invocations and are skipped. + */ +function readmeInvocations(md) { + const out = []; + for (const block of fencedBashBlocks(md)) { + for (const raw of block.split('\n')) { + const line = raw.trim(); + if (!line.startsWith('swarm ')) continue; + const tokens = line.split(/\s+/); + out.push({ command: tokens[1], line }); + } + } + return out; +} + +describe('README → CLI contract (td-006)', () => { + it('extracts a sane command map and a usable set of README invocations', () => { + const commands = extractCliCommands(CLI_SRC); + // Spot-check a few known commands so a broken extractor (empty/garbled + // Set) can never make the existence assertion vacuously pass. + for (const expected of ['init', 'domains', 'dispatch', 'collect', 'revalidate', 'verify-fixed']) { + assert.ok(commands.has(expected), + `extractor missed cli.js command "${expected}" — extraction regex is wrong`); + } + + const invocations = readmeInvocations(README); + assert.ok(invocations.length >= 8, + `expected the README to document several swarm invocations; found ${invocations.length}`); + }); + + it('(a) every documented `swarm ` exists in cli.js commands map', () => { + const commands = extractCliCommands(CLI_SRC); + const invocations = readmeInvocations(README); + + for (const { command, line } of invocations) { + assert.ok( + commands.has(command), + `README documents \`swarm ${command}\` but cli.js has no such command. ` + + `Line: "${line}". 
Valid commands: ${[...commands].sort().join(', ')}` + ); + } + }); + + it('(b) collect / revalidate examples carry the required --domain= flag', () => { + // Both cmdCollect (cli.js: "No outputs provided. Use --domain=name:path") + // and cmdRevalidate (cli.js: "at least one --domain=name:path is required") + // exit 1 without at least one --domain pair. Every fenced example that + // invokes them must therefore include one, possibly on a continuation line. + for (const cmd of ['collect', 'revalidate']) { + const examples = exampleInvocationsFor(cmd); + assert.ok(examples.length > 0, + `expected at least one fenced \`swarm ${cmd}\` example in the README`); + for (const ex of examples) { + assert.match( + ex, + /--domain=[^:\s]+:[^\s]+/, + `README \`swarm ${cmd}\` example must include a --domain=name:path pair ` + + `(the CLI exits 1 without one). Got:\n${ex}` + ); + } + } + }); + + it('(b) dispatch examples use a named , never a bare wave number', () => { + // cmdDispatch takes where phase is a NAMED phase string + // (AUDIT_PHASES/AMEND_PHASES). A wave number silently falls through to the + // generic fallback and writes a bogus runs.status. The documented example + // must use the placeholder or a real named phase — not + // and not a literal integer in the phase slot. + const examples = exampleInvocationsFor('dispatch'); + assert.ok(examples.length > 0, + 'expected at least one fenced `swarm dispatch` example in the README'); + + const VALID_PHASES = phasesFromDispatchSource(); + + for (const ex of examples) { + // tokens: [swarm, dispatch, , , ...flags] + const phaseTok = ex.split(/\s+/)[3]; + assert.ok(phaseTok, `dispatch example is missing its phase argument: "${ex}"`); + assert.ok( + !/wave[-_]?number/i.test(phaseTok), + `README \`swarm dispatch\` example uses a wave-number placeholder ("${phaseTok}"); ` + + `the argument is a NAMED . 
Use or a real phase name.` + ); + assert.ok( + !/^\d+$/.test(phaseTok), + `README \`swarm dispatch\` example uses a bare wave number ("${phaseTok}") in the ` + + `phase slot; dispatch requires a named phase (${[...VALID_PHASES].join(', ')}).` + ); + const isPlaceholder = phaseTok === ''; + assert.ok( + isPlaceholder || VALID_PHASES.has(phaseTok), + `README \`swarm dispatch\` phase "${phaseTok}" is neither the placeholder ` + + `nor a real phase name (${[...VALID_PHASES].join(', ')}).` + ); + } + }); + + it('(b) the README lists the real named phases (kept in sync with dispatch.js)', () => { + // The Quick start tells the operator `` is a named phase and lists + // the values. Pin that list to dispatch.js so a future phase rename can't + // leave the README advertising a phase the CLI rejects. + const phases = phasesFromDispatchSource(); + for (const phase of phases) { + assert.ok( + README.includes(phase), + `README must list the valid phase "${phase}" so operators can copy it ` + + `(it is in dispatch.js AUDIT_PHASES/AMEND_PHASES).` + ); + } + }); + + it('(c) "Requires Node ≥ N" matches package.json engines.node floor', () => { + const engines = PKG.engines?.node; + assert.ok(engines, 'package.json must declare engines.node'); + + const floorMatch = engines.match(/(\d+)/); + assert.ok(floorMatch, `could not parse a major-version floor from engines.node "${engines}"`); + const floor = floorMatch[1]; + + const readmeMatch = README.match(/Requires Node\s+[≥>]=?\s*(\d+)/); + assert.ok(readmeMatch, + 'README must state the Node floor as "Requires Node ≥ N" so the install prose ' + + 'has a single source of truth with engines.node'); + assert.equal( + readmeMatch[1], + floor, + `README says Node ≥ ${readmeMatch[1]} but package.json engines.node is "${engines}" ` + + `(floor ${floor}). 
The prose must follow the engines field.` + ); + }); + + it('(d) every operator-facing env var the package reads is documented in the README', () => { + const vars = envVarsReadInSource(); + // Spot-check so a broken extractor (empty Set) can't make this vacuously + // pass: these three are read in non-test source today and are the surface + // td-p-002 documented. + for (const expected of ['SWARM_DB', 'DOGFOOD_FINDINGS_FORMAT', 'DOGFOOD_LOG_HUMAN']) { + assert.ok(vars.has(expected), + `extractor missed \`process.env.${expected}\` — the env-var extraction is wrong`); + } + + for (const name of vars) { + assert.ok( + README.includes(name), + `the package reads \`process.env.${name}\` but README.md never documents it. ` + + `Add it to the "Environment variables" section (or remove the read). ` + + `Documented-or-removed is the contract — an operator scripting against ` + + `\`${name}\` must be able to learn it from the README.` + ); + } + }); +}); + +// ── helpers that read the real sources (no duplicated literals) ── + +/** + * Return every fenced README line that invokes `swarm `, JOINED with its + * immediately-following continuation lines (lines starting with `--` or that + * end a shell line-continuation `\`), so a multi-line example like + * + * swarm collect \ + * --domain=backend:.../output.json + * + * is reassembled into one logical invocation before flag assertions run. + */ +function exampleInvocationsFor(cmd) { + const out = []; + for (const block of fencedBashBlocks(README)) { + const lines = block.split('\n'); + for (let i = 0; i < lines.length; i++) { + const line = lines[i].trim(); + if (!line.startsWith(`swarm ${cmd} `) && line !== `swarm ${cmd}`) continue; + + let joined = line.replace(/\\$/, '').trim(); + let j = i; + // Absorb continuation lines: either the prior line ended with `\`, or the + // next line is a flag fragment (`--...`). Stop at a blank line, a comment, + // or the next `swarm` invocation. 
+ while (j + 1 < lines.length) { + const prev = lines[j].trim(); + const next = lines[j + 1].trim(); + const prevContinues = prev.endsWith('\\'); + const nextIsFlag = next.startsWith('--'); + if (!prevContinues && !nextIsFlag) break; + if (next === '' || next.startsWith('#') || next.startsWith('swarm ')) break; + joined += ' ' + next.replace(/\\$/, '').trim(); + j++; + } + out.push(joined.trim()); + } + } + return out; +} + +/** + * Extract AUDIT_PHASES + AMEND_PHASES from commands/dispatch.js source — the + * authoritative phase vocabulary. Same regex-over-source approach as + * wave10-docs-identity-drift.test.js (the lists aren't exported). + */ +function phasesFromDispatchSource() { + const src = readFileSync( + new URL('./commands/dispatch.js', import.meta.url), + 'utf-8' + ); + const auditMatch = src.match(/const AUDIT_PHASES\s*=\s*\[([^\]]+)\]/); + const amendMatch = src.match(/const AMEND_PHASES\s*=\s*\[([^\]]+)\]/); + assert.ok(auditMatch && amendMatch, + 'dispatch.js must define AUDIT_PHASES and AMEND_PHASES'); + const parse = (s) => s.split(',').map((x) => x.trim().replace(/^['"]|['"]$/g, '')).filter(Boolean); + return new Set([...parse(auditMatch[1]), ...parse(amendMatch[1])]); +} + +/** + * Walk the package's NON-TEST `.js` source and return the set of environment + * variables it READS — i.e. every `process.env.` property access. This + * is the operator-facing env-var contract: a var the code reads is one an + * operator can set to change behavior, so it must be documented. + * + * Read-the-real-source discipline (mirrors extractCliCommands / phasesFrom- + * DispatchSource): the list is extracted from source, never hardcoded, so a + * future contributor who adds `process.env.NEW_VAR` cannot leave the README + * behind without this test catching it. + * + * Scope decisions: + * - Test files (`*.test.js` / `*.test.mjs`) are excluded — they set/read env + * vars for fixture setup, which is not the production read surface. 
+ * - `node_modules`, `dist`, and dotfiles/dotdirs are skipped. + * - Only the `process.env.UPPER_SNAKE` access form is matched (the config + * surface). `process.env` spreads (e.g. `{ ...process.env, FORCE_COLOR }` + * in runner.js, which SETS rather than reads a config var) are correctly + * not captured by the property-access regex. + */ +function envVarsReadInSource() { + const names = new Set(); + const re = /process\.env\.([A-Z][A-Z0-9_]+)/g; + + const walk = (dir) => { + for (const entry of readdirSync(dir, { withFileTypes: true })) { + if (entry.name.startsWith('.')) continue; + const full = join(dir, entry.name); + if (entry.isDirectory()) { + if (entry.name === 'node_modules' || entry.name === 'dist') continue; + walk(full); + continue; + } + if (!entry.isFile()) continue; + if (!entry.name.endsWith('.js')) continue; + if (entry.name.endsWith('.test.js') || entry.name.endsWith('.test.mjs')) continue; + + const src = readFileSync(full, 'utf-8'); + let m; + while ((m = re.exec(src)) !== null) names.add(m[1]); + } + }; + + walk(PKG_DIR); + assert.ok(names.size >= 3, + `expected to extract the operator-facing env vars from source; got ${names.size}: ${[...names].join(', ')}`); + return names; +} + +// Touch fileURLToPath so an unused-import lint never trips; harmless no-op that +// also documents that all source reads here are path-relative to this test file. +void fileURLToPath; diff --git a/packages/dogfood-swarm/meta-amendA-state-machines.test.js b/packages/dogfood-swarm/meta-amendA-state-machines.test.js new file mode 100644 index 0000000..72dee7b --- /dev/null +++ b/packages/dogfood-swarm/meta-amendA-state-machines.test.js @@ -0,0 +1,392 @@ +/** + * meta-amendA-state-machines.test.js + * + * Wave-1 state-machines amend (sm-001..sm-004). One co-located regression test + * per confirmed finding. Each test fails against the pre-fix code and passes + * against the surgical fix. 
+ * + * sm-001 (HIGH) lib/domains.js — unfreezeDomains had no in-flight-wave + * guard, so globs could drift between + * dispatch and collect and the captured + * domain_snapshot_id was decorative. + * sm-002 (HIGH) lib/domains.js — checkOwnership tested an agent's globs in + * isolation, so two OWNED domains with + * overlapping globs both "owned" the same + * file with zero violation. Fixed at the + * freeze boundary (reject) + a runtime + * specificity-arbitrated owner guard. + * sm-r-001(HIGH) lib/domains.js — REGRESSION of sm-002: the freeze guard + * rejected ANY glob-level overlap, so the + * auto-detected default full-stack map + * (frontend's globs ⊂ backend's `src/**`) + * could no longer be frozen, breaking + * init→freeze→dispatch. Coupled latent bug: + * resolveExclusiveOwner iterated alphabetical + * getDomains order, misattributing a `.tsx` + * file to backend. Fix: most-specific-glob- + * wins (order-independent) ownership; freeze + * rejects ONLY equal-specificity criss-cross. + * sm-003 (MED) lib/advance.js — recordPromotion ran three durable writes + * unwrapped; a mid-sequence throw left an + * orphan promotion row. + * sm-004 (LOW) lib/worktree.js — getWorktreeDiff / getWorktreeChanges were + * dead exports carrying a latent HEAD~1..HEAD + * mis-attribution bug; removed. 
+ */ + +import { describe, it, beforeEach, afterEach } from 'node:test'; +import assert from 'node:assert/strict'; +import { readFileSync, mkdtempSync, mkdirSync, writeFileSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { dirname, join } from 'node:path'; +import { fileURLToPath } from 'node:url'; + +import { openMemoryDb } from './db/connection.js'; +import { + detectDomains, saveDomainDraft, freezeDomains, unfreezeDomains, aredomainsFrozen, + checkOwnership, hasActiveWave, +} from './lib/domains.js'; +import { recordPromotion, getPromotions } from './lib/advance.js'; +import * as worktree from './lib/worktree.js'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); + +function seedRun(db, runId, domains) { + db.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run(runId, 'org/r', '/tmp/r', 'a'.repeat(40)); + saveDomainDraft(db, runId, domains); +} + +// ═══════════════════════════════════════════ +// sm-001 — unfreeze refused while a wave is in flight +// ═══════════════════════════════════════════ + +describe('sm-001 — unfreezeDomains in-flight-wave guard', () => { + let db; + const RUN_ID = 'sm1'; + + beforeEach(() => { + db = openMemoryDb(); + seedRun(db, RUN_ID, [ + { name: 'backend', globs: ['src/**'], ownership_class: 'owned' }, + { name: 'tests', globs: ['tests/**'], ownership_class: 'owned' }, + ]); + freezeDomains(db, RUN_ID); + }); + + afterEach(() => { db.close(); }); + + it('refuses to unfreeze while a wave is dispatched (drift window stays closed)', () => { + db.prepare( + "INSERT INTO waves (run_id, phase, wave_number, status) VALUES (?, 'health-audit-a', 1, 'dispatched')" + ).run(RUN_ID); + + assert.equal(hasActiveWave(db, RUN_ID), true); + assert.throws( + () => unfreezeDomains(db, RUN_ID, 'broaden globs mid-run'), + /wave is in flight/i, + 'unfreeze must be refused while a wave is dispatched', + ); + // Still frozen — the bad state was prevented, not merely reported. 
+ assert.equal(aredomainsFrozen(db, RUN_ID), true); + }); + + it('allows unfreeze when no wave is in flight', () => { + // No wave at all → no drift window. + assert.equal(hasActiveWave(db, RUN_ID), false); + unfreezeDomains(db, RUN_ID, 'pre-dispatch domain edit'); + assert.equal(aredomainsFrozen(db, RUN_ID), false); + }); + + it('allows unfreeze once the wave has left flight (collected)', () => { + db.prepare( + "INSERT INTO waves (run_id, phase, wave_number, status) VALUES (?, 'health-audit-a', 1, 'collected')" + ).run(RUN_ID); + assert.equal(hasActiveWave(db, RUN_ID), false); + unfreezeDomains(db, RUN_ID, 'edit after collect'); + assert.equal(aredomainsFrozen(db, RUN_ID), false); + }); + + it('force escape hatch bypasses the guard (still requires a reason)', () => { + db.prepare( + "INSERT INTO waves (run_id, phase, wave_number, status) VALUES (?, 'health-audit-a', 1, 'dispatched')" + ).run(RUN_ID); + assert.throws(() => unfreezeDomains(db, RUN_ID, '', { force: true }), /requires a reason/); + unfreezeDomains(db, RUN_ID, 'coordinator halted the wave by hand', { force: true }); + assert.equal(aredomainsFrozen(db, RUN_ID), false); + }); +}); + +// ═══════════════════════════════════════════ +// sm-002 / sm-r-001 — overlapping OWNED globs arbitrated by specificity +// ═══════════════════════════════════════════ + +describe('sm-002 / sm-r-001 — overlapping owned globs', () => { + let db; + const RUN_ID = 'sm2'; + + beforeEach(() => { db = openMemoryDb(); }); + afterEach(() => { db.close(); }); + + // A real full-stack tree exercising the SUBSET overlap detectDomains resolves: + // `src/ui/App.tsx` is owned by frontend (`src/ui/**`, `src/**/*.tsx`) yet also + // matches backend's broad `src/**`. The saved draft persists the verbatim + // bucket globs (init.js does not narrow them), so this is the literal map a + // shipped full-stack repo freezes. 
+ function makeFullStackTree() { + const root = mkdtempSync(join(tmpdir(), 'sm-r-001-')); + mkdirSync(join(root, 'src/ui'), { recursive: true }); + mkdirSync(join(root, 'src/server'), { recursive: true }); + mkdirSync(join(root, 'tests'), { recursive: true }); + writeFileSync(join(root, 'src/ui/App.tsx'), 'export default function App(){}\n'); + writeFileSync(join(root, 'src/server/api.ts'), 'export const api = 1;\n'); + writeFileSync(join(root, 'tests/app.test.js'), 'export {};\n'); + writeFileSync(join(root, 'README.md'), '# repo\n'); + return root; + } + + it('sm-r-001: the literal default detectDomains() full-stack map FREEZES', () => { + // The regression: frontend's globs (`src/ui/**`, `src/**/*.tsx`, …) are + // strict SUBSETS of backend's `src/**`. detectDomains resolves this fine + // (frontend precedes backend, claims the files first), but the sm-002 freeze + // guard rejected the glob-level overlap, so the COMMON full-stack shape could + // no longer be frozen → init→freeze→dispatch broke. A subset overlap is + // legal: specificity arbitration gives every file one owner. + const root = makeFullStackTree(); + const { domains } = detectDomains(root); + // Sanity: detection produced the overlapping frontend+backend shape. + assert.ok(domains.find(d => d.name === 'frontend'), 'frontend detected'); + assert.ok(domains.find(d => d.name === 'backend'), 'backend detected'); + + seedRun(db, RUN_ID, domains); + freezeDomains(db, RUN_ID); // must NOT throw — this is the regression assertion + assert.equal(aredomainsFrozen(db, RUN_ID), true, + 'the auto-detected default full-stack map must be freezable'); + }); + + it('sm-r-001: checkOwnership attributes a src/**/*.tsx file to frontend, not backend', () => { + // Coupled ordering bug: resolveExclusiveOwner iterated getDomains ORDER BY + // name (alphabetical → backend before frontend), which disagreed with + // detection order and resolved `src/ui/App.tsx`'s owner as backend. 
The + // specificity fix attributes it to frontend (`src/ui/**` / `src/**/*.tsx` + // out-rank `src/**`), independent of row order. + const root = makeFullStackTree(); + const { domains } = detectDomains(root); + seedRun(db, RUN_ID, domains); + freezeDomains(db, RUN_ID); + + const tsx = 'src/ui/App.tsx'; + const fe = checkOwnership(db, RUN_ID, 'frontend', [tsx]); + assert.equal(fe.valid.length, 1, 'frontend owns the .tsx file'); + assert.equal(fe.violations.length, 0); + + const be = checkOwnership(db, RUN_ID, 'backend', [tsx]); + assert.equal(be.valid.length, 0, 'backend must NOT own the .tsx file'); + assert.equal(be.violations.length, 1); + assert.equal(be.violations[0].actual_owner, 'frontend', + 'the .tsx file resolves to frontend, not the alphabetically-first backend'); + }); + + it('sm-r-001: a GENUINE equal-specificity criss-cross STILL throws at freeze', () => { + // The narrowed guard must still catch a real exclusive-ownership breach: + // two owned domains whose best matching globs are EQUALLY specific (here both + // `['**']`) genuinely double-own every file — no specificity can arbitrate. + seedRun(db, RUN_ID, [ + { name: 'alpha', globs: ['**'], ownership_class: 'owned' }, + { name: 'beta', globs: ['**'], ownership_class: 'owned' }, + ]); + assert.throws( + () => freezeDomains(db, RUN_ID), + /overlapping owned domains/i, + 'an equal-specificity tie still breaches exclusive ownership', + ); + // Naming the conflict is part of the contract (operator must know which). + try { + freezeDomains(db, RUN_ID); + } catch (e) { + assert.match(e.message, /alpha/); + assert.match(e.message, /beta/); + } + }); + + it('sm-r-001: identical owned globs (src/a/** vs src/a/**) still throw', () => { + // A second equal-specificity shape — same lead segments, same literal chars. 
+ seedRun(db, RUN_ID, [ + { name: 'one', globs: ['src/a/**'], ownership_class: 'owned' }, + { name: 'two', globs: ['src/a/**'], ownership_class: 'owned' }, + ]); + assert.throws(() => freezeDomains(db, RUN_ID), /overlapping owned domains/i); + }); + + it('still freezes disjoint owned globs (no false positive)', () => { + seedRun(db, RUN_ID, [ + { name: 'backend', globs: ['src/**'], ownership_class: 'owned' }, + { name: 'tests', globs: ['tests/**'], ownership_class: 'owned' }, + { name: 'docs', globs: ['*.md', 'docs/**'], ownership_class: 'owned' }, + ]); + freezeDomains(db, RUN_ID); // must not throw + assert.equal(aredomainsFrozen(db, RUN_ID), true); + }); + + it('owned glob overlapping only a SHARED/BRIDGE domain is allowed', () => { + // The exclusive-ownership invariant is owned-vs-owned; shared/bridge are + // deliberate escape hatches and must not trip the freeze guard. + seedRun(db, RUN_ID, [ + { name: 'backend', globs: ['src/**'], ownership_class: 'owned' }, + { name: 'types', globs: ['src/types/**'], ownership_class: 'bridge' }, + { name: 'config', globs: ['*.json'], ownership_class: 'shared' }, + ]); + freezeDomains(db, RUN_ID); // must not throw + assert.equal(aredomainsFrozen(db, RUN_ID), true); + }); + + it('runtime checkOwnership gives a file exactly ONE owned owner (most-specific-wins)', () => { + // Defense-in-depth: on the SUBSET overlap (frontend's `src/**/*.tsx` ⊂ + // backend's `src/**`) the two owned agents must NOT both pass for the same + // file. Specificity arbitration resolves a single owner — frontend, whose + // `.tsx` glob is strictly more specific than backend's bare `src/**`; the + // other gets a violation. Pre-fix BOTH returned valid.length=1 / + // violations.length=0 (sm-002); pre-sm-r-001 the loser/winner could flip to + // backend on row order. 
+ seedRun(db, RUN_ID, [ + { name: 'frontend', globs: ['src/**/*.tsx'], ownership_class: 'owned' }, + { name: 'backend', globs: ['src/**'], ownership_class: 'owned' }, + ]); + const file = 'src/app/Foo.tsx'; + + const fe = checkOwnership(db, RUN_ID, 'frontend', [file]); + const be = checkOwnership(db, RUN_ID, 'backend', [file]); + + const feOwns = fe.valid.length === 1 && fe.violations.length === 0; + const beOwns = be.valid.length === 1 && be.violations.length === 0; + + // Exactly one of the two owns it — the silent double-ownership is gone. + assert.ok(feOwns !== beOwns, + 'exactly one owned domain may claim a file; both-pass is the sm-002 bug'); + // The more-specific glob (frontend) wins; backend is the flagged non-owner. + assert.ok(feOwns, 'the strictly-more-specific .tsx glob (frontend) owns the file'); + assert.equal(be.valid.length, 0); + assert.equal(be.violations.length, 1); + assert.equal(be.violations[0].actual_owner, 'frontend'); + }); + + it('non-overlapping ownership is unchanged (own file valid, cross-domain file a violation)', () => { + // Regression guard for the existing control-plane semantics. 
+ seedRun(db, RUN_ID, [ + { name: 'backend', globs: ['src/**'], ownership_class: 'owned' }, + { name: 'tests', globs: ['tests/**'], ownership_class: 'owned' }, + ]); + const own = checkOwnership(db, RUN_ID, 'backend', ['src/server.js']); + assert.equal(own.valid.length, 1); + assert.equal(own.violations.length, 0); + + const cross = checkOwnership(db, RUN_ID, 'backend', ['tests/t.js']); + assert.equal(cross.valid.length, 0); + assert.equal(cross.violations.length, 1); + assert.equal(cross.violations[0].actual_owner, 'tests'); + }); +}); + +// ═══════════════════════════════════════════ +// sm-003 — recordPromotion atomicity +// ═══════════════════════════════════════════ + +describe('sm-003 — recordPromotion is atomic', () => { + let db; + const RUN_ID = 'sm3'; + + function seedAdvanceable(db) { + seedRun(db, RUN_ID, [ + { name: 'backend', globs: ['src/**'], ownership_class: 'owned' }, + ]); + freezeDomains(db, RUN_ID); + const wave = db.prepare( + "INSERT INTO waves (run_id, phase, wave_number, status) VALUES (?, 'health-audit-a', 1, 'collected')" + ).run(RUN_ID); + return Number(wave.lastInsertRowid); + } + + beforeEach(() => { db = openMemoryDb(); }); + afterEach(() => { db.close(); }); + + it('a mid-sequence throw rolls back ALL three writes (no orphan promotion row)', () => { + const waveId = seedAdvanceable(db); + + // Force the SECOND mutation (transitionWave → INSERT wave_state_events) to + // throw by dropping its audit table. better-sqlite3's outer transaction + // must then roll back the promotions INSERT and leave runs.status intact. 
+ db.prepare('DROP TABLE wave_state_events').run(); + + const statusBefore = db.prepare('SELECT status FROM runs WHERE id = ?').get(RUN_ID).status; + const waveStatusBefore = db.prepare('SELECT status FROM waves WHERE id = ?').get(waveId).status; + + let threw = null; + try { + recordPromotion(db, RUN_ID, waveId, 'health-audit-a', 'health-audit-b', { gates: [] }); + } catch (e) { + threw = e; + } + assert.ok(threw, 'recordPromotion must throw when transitionWave fails'); + + // (1) no orphan promotion row + const promos = db.prepare('SELECT COUNT(*) as n FROM promotions WHERE run_id = ?').get(RUN_ID).n; + assert.equal(promos, 0, 'promotion INSERT must roll back when the wave transition fails'); + + // (2) wave status untouched + const waveStatusAfter = db.prepare('SELECT status FROM waves WHERE id = ?').get(waveId).status; + assert.equal(waveStatusAfter, waveStatusBefore, 'wave status must be unchanged after rollback'); + + // (3) run status untouched + const statusAfter = db.prepare('SELECT status FROM runs WHERE id = ?').get(RUN_ID).status; + assert.equal(statusAfter, statusBefore, 'run status must be unchanged after rollback'); + }); + + it('the happy path still records all three writes together', () => { + const waveId = seedAdvanceable(db); + const promotionId = recordPromotion( + db, RUN_ID, waveId, 'health-audit-a', 'health-audit-b', { gates: [] } + ); + + assert.ok(promotionId > 0); + const promos = getPromotions(db, RUN_ID); + assert.equal(promos.length, 1); + assert.equal(promos[0].to_phase, 'health-audit-b'); + + // wave advanced + audit row written + assert.equal(db.prepare('SELECT status FROM waves WHERE id = ?').get(waveId).status, 'advanced'); + const evt = db.prepare( + "SELECT to_status, reason FROM wave_state_events WHERE wave_id = ? 
ORDER BY id DESC LIMIT 1" + ).get(waveId); + assert.equal(evt.to_status, 'advanced'); + assert.match(evt.reason, new RegExp(`promotion #${promotionId}`)); + + // run status moved to the next phase + assert.equal(db.prepare('SELECT status FROM runs WHERE id = ?').get(RUN_ID).status, 'health-audit-b'); + }); +}); + +// ═══════════════════════════════════════════ +// sm-004 — dead worktree-diff exports removed +// ═══════════════════════════════════════════ + +describe('sm-004 — getWorktreeDiff / getWorktreeChanges removed', () => { + it('the dead exports no longer exist on the module', () => { + assert.equal(typeof worktree.getWorktreeDiff, 'undefined', + 'getWorktreeDiff was a dead export carrying a latent HEAD~1..HEAD bug — must be gone'); + assert.equal(typeof worktree.getWorktreeChanges, 'undefined', + 'getWorktreeChanges was a dead export — must be gone'); + }); + + it('the live worktree surface still loads (no source references to the removed names)', () => { + // The package still imports/loads, and the live exports remain. + assert.equal(typeof worktree.createWorktree, 'function'); + assert.equal(typeof worktree.mergeWorktree, 'function'); + assert.equal(typeof worktree.removeWorktree, 'function'); + + const src = readFileSync(join(__dirname, 'lib/worktree.js'), 'utf-8'); + assert.ok(!/HEAD~1\.\.HEAD/.test(src), + 'the latent HEAD~1..HEAD mis-attribution must be gone with the dead code'); + assert.ok(!/getWorktreeDiff|getWorktreeChanges/.test(src), + 'no lingering definitions of the removed exports'); + }); +}); diff --git a/packages/dogfood-swarm/meta-amendA-verify-engine.test.js b/packages/dogfood-swarm/meta-amendA-verify-engine.test.js new file mode 100644 index 0000000..11fe936 --- /dev/null +++ b/packages/dogfood-swarm/meta-amendA-verify-engine.test.js @@ -0,0 +1,268 @@ +/** + * meta-amendA-verify-engine.test.js — Amend wave A, verify-engine domain. + * + * Regression tests for the CONFIRMED verify-engine findings fixed in this + * amend wave. 
Each block names the finding id and asserts the *post-fix* + * behaviour so a future regression points at the exact defect that returned. + * + * The verify engine decides whether a claimed fix is real. Every defect here + * is a way the tool LIES about done-ness — a still-present bug reported as a + * verified fix, or a no-op verification reported as a pass. The assertions + * below pin the corrected verdicts. + * + * ve-001 (CRITICAL) — a MISS on a PROSE-derived anchor (symbol-less + * finding) must classify `unverifiable`, NOT `verified`. Only a miss on + * a real code-identifier `symbol` may support `verified`. Fixed in both + * lib/verify-classifier-v2.js (v2) and lib/verify-fixed.js (legacy v1). + * ve-003 (MED) — the bucket search window must be symmetric. A still- + * present symbol that drifted into the adjacent LOWER bucket must still + * be found (→ regressed/claimed), not missed (→ verified). Fixed in both + * classifiers (bucketForLine in v2, the inline window in v1). + * ve-004 (MED) — a Node repo with no `test` script must NOT be a verified + * pass: `npm test --if-present` runs zero tests, so the node adapter + * surfaces a distinct `no_tests` verdict the caller treats as not-pass. + * ve-005 (LOW) — an empty / all-optional required step set must not be an + * automatic `pass`; runSteps() returns `skip`. + * ve-006 (LOW) — doc-only correction to the runStep() security docstring + * (no behaviour change; asserted indirectly via the still-passing + * literal-arg execution path). 
+ */ + +import { describe, it } from 'node:test'; +import assert from 'node:assert/strict'; +import { resolve, join } from 'node:path'; +import { tmpdir } from 'node:os'; +import { mkdtempSync, mkdirSync, writeFileSync, rmSync } from 'node:fs'; + +import { classifyFindingV2, VERIFIED_VIA } from './lib/verify-classifier-v2.js'; +import { classifyFixedFinding } from './lib/verify-fixed.js'; +import { runSteps, runStep } from './lib/verify/runner.js'; +import { nodeAdapter } from './lib/verify/adapters/node.js'; + +const REPO = join(tmpdir(), 'meta-amendA-verify-engine-repo'); + +/** + * Fake-filesystem reader: `{ repo-relative path → lines[] }`. Resolved + * through the same path.resolve() the classifiers use so keys match on + * win32 + posix. Mirrors the helper in verify-fixed.test.js. + */ +function fakeReader(table, repoRoot) { + const resolved = new Map(); + for (const [k, v] of Object.entries(table)) { + resolved.set(resolve(repoRoot, k), v); + } + return (absPath) => (resolved.has(absPath) ? resolved.get(absPath) : null); +} + +function mkFinding(overrides = {}) { + return { + finding_id: 'F-001', + fingerprint: 'fp-F-001', + severity: 'HIGH', + category: 'bug', + file_path: 'src/a.js', + line_number: 42, + symbol: 'doThing', + description: 'doThing leaks memory', + recommendation: 'free the buffer', + last_seen_wave: 3, + fixed_wave_id: 3, + ...overrides, + }; +} + +// ═══════════════════════════════════════════════════════════════════════ +// ve-001 — prose-anchor MISS must be `unverifiable`, never `verified` +// ═══════════════════════════════════════════════════════════════════════ + +describe('ve-001 — symbol-less finding, prose-anchor miss → unverifiable (v2)', () => { + it('classifies unverifiable (NOT verified) when the prose token is absent', () => { + // Symbol-less finding. The description lead token "Missing" is prose — + // it is not expected to appear in source. The bucket does NOT contain + // it. 
Pre-fix this resolved to `verified` (fix landed); post-fix it must + // be `unverifiable` (cannot prove the fix landed). + const f = mkFinding({ + symbol: '', + file_path: 'src/a.js', + description: 'Missing null check in handler', + line_number: 5, + }); + const file = Array.from({ length: 20 }, (_, i) => `const x${i} = compute();`); + const r = classifyFindingV2(f, REPO, { readLines: fakeReader({ 'src/a.js': file }, REPO) }); + assert.equal(r.classification, 'unverifiable'); + assert.notEqual(r.classification, 'verified'); + assert.equal(r.verified_via, VERIFIED_VIA.UNVERIFIABLE); + assert.match(r.evidence, /prose|cannot prove/i); + }); + + it('still resolves `verified` on a real code-identifier symbol miss', () => { + // The asymmetry fix must NOT break the legitimate case: a real symbol + // that is genuinely gone from the bucket is a verified fix. + const f = mkFinding({ symbol: 'doThing', file_path: 'src/a.js', line_number: 42 }); + const file = Array.from({ length: 60 }, (_, i) => `// clean line ${i + 1}`); + const r = classifyFindingV2(f, REPO, { readLines: fakeReader({ 'src/a.js': file }, REPO) }); + assert.equal(r.classification, 'verified'); + assert.equal(r.verified_via, VERIFIED_VIA.ANCHOR); + }); + + it('a prose-anchor HIT still classifies claimed-but-still-present (confirm-only path intact)', () => { + // A HIT on a prose token is still meaningful: the token IS in the code, + // so the bug is demonstrably still there. Only the MISS side changed. 
+ const f = mkFinding({ + symbol: '', + file_path: 'src/a.js', + description: 'memoryLeak in buffer logic', + line_number: 5, + }); + const file = ['', 'noise', 'noise', 'noise', 'function memoryLeak() {}', 'noise']; + const r = classifyFindingV2(f, REPO, { readLines: fakeReader({ 'src/a.js': file }, REPO) }); + assert.equal(r.classification, 'claimed-but-still-present'); + }); +}); + +describe('ve-001 — symbol-less finding, prose-anchor miss → unverifiable (legacy v1)', () => { + it('classifies unverifiable (NOT verified) so --legacy-v1 keeps the hole closed', () => { + const f = mkFinding({ + symbol: '', + file_path: 'src/a.js', + description: 'Missing null check in handler', + line_number: 5, + }); + const file = Array.from({ length: 20 }, (_, i) => `const x${i} = compute();`); + const r = classifyFixedFinding(f, REPO, { readLines: fakeReader({ 'src/a.js': file }, REPO) }); + assert.equal(r.classification, 'unverifiable'); + assert.notEqual(r.classification, 'verified'); + assert.match(r.evidence, /prose|cannot prove/i); + }); + + it('still resolves `verified` on a real code-identifier symbol miss', () => { + const f = mkFinding({ symbol: 'doThing', file_path: 'src/a.js', line_number: 42 }); + const file = Array.from({ length: 60 }, (_, i) => `// clean line ${i + 1}`); + const r = classifyFixedFinding(f, REPO, { readLines: fakeReader({ 'src/a.js': file }, REPO) }); + assert.equal(r.classification, 'verified'); + }); +}); + +// ═══════════════════════════════════════════════════════════════════════ +// ve-003 — symmetric bucket window: downward drift must be caught +// ═══════════════════════════════════════════════════════════════════════ + +describe('ve-003 — downward-drift symbol is still found (legacy v1)', () => { + it('classifies regressed (NOT verified) when the symbol drifted to the lower bucket', () => { + // Recorded line 42 → bucket [40,50]. 
The bug is still present but the + // surrounding code shifted so the symbol now sits at line 38, which is + // in the adjacent LOWER bucket [30,40). Pre-fix the window was [40,50] + // (upward-only overlap) and missed line 38 → `verified`. Post-fix the + // window reaches a full bucket down and finds it → not verified. + const f = mkFinding({ symbol: 'doThing', file_path: 'src/a.js', line_number: 42 }); + const file = Array.from({ length: 60 }, (_, i) => `// line ${i + 1}`); + file[37] = 'function doThing() { /* still here, drifted down */ }'; // line 38 + const r = classifyFixedFinding(f, REPO, { readLines: fakeReader({ 'src/a.js': file }, REPO) }); + assert.notEqual(r.classification, 'verified'); + assert.equal(r.classification, 'regressed'); + assert.match(r.evidence, /reappeared/); + }); +}); + +describe('ve-003 — downward-drift symbol is still found (v2)', () => { + it('classifies regressed (NOT verified) when the symbol drifted to the lower bucket', () => { + const f = mkFinding({ symbol: 'doThing', file_path: 'src/a.js', line_number: 42 }); + const file = Array.from({ length: 60 }, (_, i) => `// line ${i + 1}`); + file[37] = 'function doThing() { /* still here, drifted down */ }'; // line 38 + const r = classifyFindingV2(f, REPO, { readLines: fakeReader({ 'src/a.js': file }, REPO) }); + assert.notEqual(r.classification, 'verified'); + assert.equal(r.classification, 'regressed'); + assert.equal(r.verified_via, VERIFIED_VIA.ANCHOR); + }); +}); + +// ═══════════════════════════════════════════════════════════════════════ +// ve-004 — Node repo with no `test` script is not a verified pass +// ═══════════════════════════════════════════════════════════════════════ + +describe('ve-004 — node adapter refuses to report a no-test repo as a verified pass', () => { + let root; + + function makeFixture(scripts) { + const dir = mkdtempSync(join(tmpdir(), 've004-')); + writeFileSync( + join(dir, 'package.json'), + JSON.stringify({ name: 'no-test-repo', version: '0.0.0', 
scripts }, null, 2), + 'utf-8' + ); + return dir; + } + + it('a fixture with NO test script yields a non-pass verdict (no_tests), not pass', () => { + // A real repo with no `test` script. `npm test --if-present` exits 0 and + // runs nothing. Pre-fix the runner reported verdict `pass` (zero tests + // ran, indistinguishable from a real pass). Post-fix the node adapter + // surfaces `no_tests` so the wave gate (which advances only on `pass`) + // treats it as not-verified. + root = makeFixture({ build: 'tsc -b' }); // build script present, NO test script + try { + const result = nodeAdapter.run(root); + assert.notEqual(result.verdict, 'pass'); + assert.equal(result.verdict, 'no_tests'); + assert.equal(result.no_tests, true); + assert.equal(result.tests_ran, false); + assert.match(result.reason, /no .?test.? script|zero tests/i); + } finally { + rmSync(root, { recursive: true, force: true }); + } + }); +}); + +// ═══════════════════════════════════════════════════════════════════════ +// ve-005 — empty / all-optional required step set is not an automatic pass +// ═══════════════════════════════════════════════════════════════════════ + +describe('ve-005 — runSteps does not report `pass` for an empty/all-skipped step set', () => { + it('empty step list → verdict skip (not pass)', () => { + const result = runSteps('.', []); + assert.notEqual(result.verdict, 'pass'); + assert.equal(result.verdict, 'skip'); + }); + + it('all-optional step set → verdict skip even when the optional steps passed', () => { + // Every step optional → requiredResults is []. `[].every` is vacuously + // true, which pre-fix yielded `pass`. Post-fix this is `skip`: nothing + // required ran, so it cannot be reported as a passing verification. 
+ const steps = [ + { name: 'lint', cmd: 'node', args: ['-e', '"process.exit(0)"'], optional: true }, + ]; + const result = runSteps('.', steps); + assert.notEqual(result.verdict, 'pass'); + assert.equal(result.verdict, 'skip'); + }); + + it('a required step that passes still yields `pass` (fix is scoped to the empty case)', () => { + const steps = [ + { name: 'test', cmd: 'node', args: ['-e', '"console.log(\'# tests 1\')"'] }, + ]; + const result = runSteps('.', steps); + assert.equal(result.verdict, 'pass'); + }); +}); + +// ═══════════════════════════════════════════════════════════════════════ +// ve-006 — runStep security docstring correction (behaviour unchanged) +// ═══════════════════════════════════════════════════════════════════════ + +describe('ve-006 — runStep literal-arg execution path still behaves correctly', () => { + // The fix is doc-only: it corrects the docstring's overstatement about + // shell:true safety. There is no behaviour change, so we simply pin that + // the hardcoded-literal-arg execution path (the safe path the corrected + // docstring describes) continues to run and report exit codes correctly. + it('runs a literal-arg command and reports a passing exit code', () => { + const r = runStep('.', { name: 'ok', cmd: 'node', args: ['-e', '"process.exit(0)"'] }); + assert.equal(r.passed, true); + assert.equal(r.exit_code, 0); + }); + + it('reports a failing exit code for a literal-arg command', () => { + const r = runStep('.', { name: 'bad', cmd: 'node', args: ['-e', '"process.exit(3)"'] }); + assert.equal(r.passed, false); + assert.equal(r.exit_code, 3); + }); +}); diff --git a/packages/dogfood-swarm/meta-amendB-operator-output.test.js b/packages/dogfood-swarm/meta-amendB-operator-output.test.js new file mode 100644 index 0000000..dc5a976 --- /dev/null +++ b/packages/dogfood-swarm/meta-amendB-operator-output.test.js @@ -0,0 +1,394 @@ +/** + * meta-amendB-operator-output.test.js — Stage C amend (operator-output) regression pins. 
+ * + * These tests pin the operator-trust / exit-code / verify-output fixes from the + * Stage-B proactive sweep (swarm-1780390764-7dab/wave-3). They are NOT bug + * fixes — the gate was always SAFE (no false ADVANCE) — they close the gap + * between the machine signal a CI step gates on and the human-readable signal + * an operator reads. Each block names its finding id. + * + * Findings covered here: + * cli-p-002 — `swarm verify` exits 0 only on a clean pass (subprocess seam) + * ve-p-002 — formatVerify prints the non-pass `reason` + * td-p-004 — END-TO-END: adapter no_tests → operator sees explained + * NO_TESTS and `swarm verify` exits non-zero + * ve-p-003 — the durable receipt carries the verdict + reason (stdout hdr) + * ve-p-004 — the verify wave-gate emits a correlated logStage pair + * cli-p-001 — `swarm persist --ingest` exits non-zero on ingest failure + * fp-p-003 — buildDogfoodSubmission finished_at is stable for an + * INCOMPLETE run (idempotent re-ingest) + * fp-p-004 — persist reports repo-knowledge artifacts-written, not + * submitted + * cli-r-002 — the main() refactor keeps the help/usage subprocess contract + */ + +import { describe, it, before, after, beforeEach, afterEach } from 'node:test'; +import assert from 'node:assert/strict'; +import { spawnSync } from 'node:child_process'; +import { fileURLToPath } from 'node:url'; +import { dirname, join } from 'node:path'; +import { mkdtempSync, mkdirSync, writeFileSync, rmSync } from 'node:fs'; +import { tmpdir } from 'node:os'; + +import { openDb, closeDb, openMemoryDb } from './db/connection.js'; +import { saveDomainDraft, freezeDomains } from './lib/domains.js'; +import { verify as runVerify, formatVerify } from './commands/verify.js'; +import { formatPersist } from './commands/persist.js'; +import { buildRunExport, computeRunVerdict } from './lib/persist/export.js'; +import { buildDogfoodSubmission } from './lib/persist/dogfood-bridge.js'; + +const __dirname = 
dirname(fileURLToPath(import.meta.url)); +const CLI_PATH = join(__dirname, 'cli.js'); + +// Override the three OPTIONAL node-adapter steps (lint/typecheck/build) with +// trivial in-process no-ops so the verify path never shells out to npx/tsc +// (which could hang or fail on a CI host). The `test` step is deliberately +// LEFT at its default `npm test --if-present` so the node adapter's no_tests +// downgrade still fires for a fixture with no `test` script — overriding the +// test step would suppress exactly the path under test. +const SAFE_OPTIONAL_OVERRIDES = { + lint: { name: 'lint', cmd: process.execPath, args: ['-e', 'process.exit(0)'], optional: true }, + typecheck: { name: 'typecheck', cmd: process.execPath, args: ['-e', 'process.exit(0)'], optional: true }, + build: { name: 'build', cmd: process.execPath, args: ['-e', 'process.exit(0)'], optional: true }, +}; + +function makeNoTestRepo(parent) { + // A Node repo with NO `test` script: the node adapter scores it (package.json + // present) but evidence.hasTest === false, so `npm test --if-present` runs + // nothing and the adapter returns verdict no_tests. 
+ const dir = join(parent, 'no-test-repo'); + mkdirSync(dir, { recursive: true }); + writeFileSync( + join(dir, 'package.json'), + JSON.stringify({ name: 'fixture-no-test', version: '0.0.0', scripts: { lint: 'true' } }, null, 2), + 'utf-8' + ); + return dir; +} + +function seedRunWithWave(db, runId, localPath) { + db.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run(runId, 'org/r', localPath, 'a'.repeat(40)); + saveDomainDraft(db, runId, [{ name: 'backend', globs: ['src/**'], ownership_class: 'owned' }]); + freezeDomains(db, runId); + db.prepare("INSERT INTO waves (run_id, phase, wave_number, status) VALUES (?, 'health-audit-a', 1, 'collected')") + .run(runId); +} + +// ═══════════════════════════════════════════ +// ve-p-002 / td-p-004 — formatVerify surfaces the reason +// ═══════════════════════════════════════════ + +describe('ve-p-002: formatVerify prints the non-pass reason', () => { + it('renders a Reason: line when the verdict carries one', () => { + const out = formatVerify({ + verdict: 'no_tests', + reason: 'no `test` script — `npm test --if-present` ran zero tests; not a verified pass', + adapter: 'node', + probe: null, + duration_ms: 5, + test_count: 0, + steps: [], + }); + assert.match(out, /Verification: NO_TESTS/); + assert.match(out, /Reason: no `test` script/, + 'the adapter reason must reach the operator, not be dropped'); + }); + + it('omits the Reason: line for a clean pass (no reason to show)', () => { + const out = formatVerify({ + verdict: 'pass', reason: undefined, adapter: 'node', probe: null, + duration_ms: 5, test_count: 3, steps: [], + }); + assert.match(out, /Verification: PASS/); + assert.doesNotMatch(out, /Reason:/); + }); +}); + +// ═══════════════════════════════════════════ +// td-p-004 (END-TO-END) + ve-p-003 + ve-p-004 +// adapter no_tests → verify() return + persisted receipt + logStage +// ═══════════════════════════════════════════ + +describe('td-p-004: no_tests propagates end-to-end 
through verify()', () => { + // verify() calls openDb(opts.dbPath), which expects a FILE PATH (pool key + + // new Database(path)) — so this end-to-end block seeds an on-disk control + // plane and drives verify() against the path, the same way the operator's + // `swarm verify` does. logStage writes to console.error, which we patch in + // process to capture the NDJSON correlation events. + let fixtureRoot; + let repoPath; + let dbPath; + let stderrLines; + let origErr; + + before(() => { + fixtureRoot = mkdtempSync(join(tmpdir(), 'amendB-notest-')); + repoPath = makeNoTestRepo(fixtureRoot); + }); + after(() => rmSync(fixtureRoot, { recursive: true, force: true })); + + beforeEach(() => { + dbPath = join(fixtureRoot, `cp-${Math.random().toString(36).slice(2)}.db`); + const db = openDb(dbPath); + seedRunWithWave(db, 'r1', repoPath); + closeDb(dbPath); + stderrLines = []; + origErr = console.error; + console.error = (...a) => { stderrLines.push(a.join(' ')); }; + }); + afterEach(() => { + console.error = origErr; + try { closeDb(dbPath); } catch { /* */ } + }); + + it('verify() returns verdict no_tests WITH the explanatory reason', () => { + const result = runVerify({ + runId: 'r1', + dbPath, + commandOverrides: SAFE_OPTIONAL_OVERRIDES, + }); + assert.equal(result.verdict, 'no_tests'); + assert.equal(result.no_tests, true); + assert.match(result.reason, /no `test` script/, + 'verify() must forward the reason, not strip it (the pre-fix gap)'); + }); + + it('formatVerify(result) shows an explained NO_TESTS the operator can act on', () => { + const result = runVerify({ runId: 'r1', dbPath, commandOverrides: SAFE_OPTIONAL_OVERRIDES }); + const out = formatVerify(result); + assert.match(out, /Verification: NO_TESTS/); + assert.match(out, /Reason: no `test` script/, + 'the operator must see WHY, not a bare NO_TESTS token'); + }); + + it('ve-p-003: the persisted receipt stdout carries the verdict + reason header', () => { + const result = runVerify({ runId: 'r1', dbPath, 
commandOverrides: SAFE_OPTIONAL_OVERRIDES }); + const db = openDb(dbPath); + const receipt = db.prepare('SELECT * FROM verification_receipts WHERE id = ?').get(result.receiptId); + assert.ok(receipt, 'receipt must persist'); + assert.equal(receipt.passed, 0, 'no_tests is not a verified pass'); + assert.match(receipt.stdout, /=== verify verdict: no_tests — no `test` script/, + 'the durable receipt must carry the verdict + reason, not just passed=0'); + }); + + it('ve-p-004: the wave gate emits a correlated verify_start/verify_complete pair', () => { + runVerify({ runId: 'r1', dbPath, commandOverrides: SAFE_OPTIONAL_OVERRIDES }); + const events = stderrLines + .map(l => { try { return JSON.parse(l); } catch { return null; } }) + .filter(e => e && (e.stage === 'verify_start' || e.stage === 'verify_complete')); + const start = events.find(e => e.stage === 'verify_start'); + const complete = events.find(e => e.stage === 'verify_complete'); + assert.ok(start, 'verify_start must be emitted before the run'); + assert.ok(complete, 'verify_complete must be emitted after persistence'); + assert.ok(start.correlation_id && /^coord-/.test(start.correlation_id), + 'start must carry a coord- correlation id'); + assert.equal(start.correlation_id, complete.correlation_id, + 'the pair must share one correlation id so a single grep ties them'); + assert.equal(complete.verdict, 'no_tests'); + assert.match(complete.reason, /no `test` script/); + }); +}); + +// ═══════════════════════════════════════════ +// cli-p-002 — `swarm verify` exit code (subprocess seam) +// ═══════════════════════════════════════════ + +describe('cli-p-002: `swarm verify` exits non-zero unless the verdict is a clean pass', () => { + let tempDir; + let dbPath; + let repoPath; + + before(() => { + tempDir = mkdtempSync(join(tmpdir(), 'amendB-verify-cli-')); + dbPath = join(tempDir, 'control-plane.db'); + repoPath = makeNoTestRepo(tempDir); + const db = openDb(dbPath); + seedRunWithWave(db, 'r1', repoPath); + 
closeDb(dbPath); + }); + after(() => { + try { closeDb(dbPath); } catch { /* */ } + rmSync(tempDir, { recursive: true, force: true }); + }); + + it('a no_tests verdict makes `swarm verify` exit non-zero with an explanation', () => { + const r = spawnSync(process.execPath, [CLI_PATH, 'verify', 'r1'], { + encoding: 'utf-8', + cwd: __dirname, + env: { ...process.env, SWARM_DB: dbPath }, + }); + assert.doesNotMatch(r.stderr || '', /SyntaxError/, `cli.js failed to parse:\n${r.stderr}`); + assert.notEqual(r.status, 0, + `a non-pass verdict must NOT exit 0 (CI must see the gate fail); got ${r.status}\nstdout:\n${r.stdout}\nstderr:\n${r.stderr}`); + assert.match(r.stdout, /Verification: NO_TESTS/, 'human-readable verdict still printed to stdout'); + assert.match(r.stderr, /swarm verify: NO_TESTS/, 'exit-code path prints the verdict + why to stderr'); + }); +}); + +// ═══════════════════════════════════════════ +// cli-p-001 — `swarm persist --ingest` exit code on ingest failure +// ═══════════════════════════════════════════ + +describe('cli-p-001: `swarm persist --ingest` exits non-zero when ingest hard-fails', () => { + let tempDir; + let dbPath; + // Unique runId so the repo-relative output dir (`swarms/${RUN_ID}/` under the + // repo root, which getOutputDir() hardcodes and SWARM_DB does not redirect) + // cannot collide with another run, and we can clean it deterministically. + const RUN_ID = `amendB-cli-p-001-${Math.random().toString(36).slice(2)}`; + const repoOutputDir = join(__dirname, '..', '..', 'swarms', RUN_ID); + + before(() => { + tempDir = mkdtempSync(join(tmpdir(), 'amendB-persist-cli-')); + dbPath = join(tempDir, 'control-plane.db'); + const db = openDb(dbPath); + // A complete run so persist() builds + writes a normal success-shaped + // report.
The ingest is forced to fail deterministically by giving the run + // a MALFORMED commit_sha ('badsha'): the submission persist writes is then + // rejected by packages/ingest/run.js (non-zero exit), driving the exact + // production catch in persist.js step 4 -> report.dogfood.ingested:false. + db.prepare(`INSERT INTO runs (id, repo, local_path, commit_sha, branch, status, created_at, completed_at) + VALUES (?, 'org/r', ?, 'badsha', 'main', 'complete', '2026-04-11T10:00:00Z', '2026-04-11T11:00:00Z')`) + .run(RUN_ID, tempDir); + saveDomainDraft(db, RUN_ID, [{ name: 'backend', globs: ['src/**'], ownership_class: 'owned' }]); + freezeDomains(db, RUN_ID); + db.prepare("INSERT INTO waves (run_id, phase, wave_number, status) VALUES (?, 'health-audit-a', 1, 'advanced')") + .run(RUN_ID); + closeDb(dbPath); + }); + after(() => { + try { closeDb(dbPath); } catch { /* */ } + rmSync(tempDir, { recursive: true, force: true }); + // getOutputDir writes under the repo's swarms/ dir; clean the run's + // artifact tree so the test leaves no trace in the working copy. + rmSync(repoOutputDir, { recursive: true, force: true }); + }); + + it('exit is non-zero and the reproduce line is surfaced when the ingest subprocess fails', () => { + const r = spawnSync(process.execPath, [CLI_PATH, 'persist', RUN_ID, '--ingest'], { + encoding: 'utf-8', + cwd: __dirname, + env: { ...process.env, SWARM_DB: dbPath }, + }); + assert.doesNotMatch(r.stderr || '', /SyntaxError/, `cli.js failed to parse:\n${r.stderr}`); + assert.notEqual(r.status, 0, + `a failed --ingest must NOT exit 0 (a CI gate on $? 
would treat a failed ` + + `corpus write as success); got ${r.status}\nstdout:\n${r.stdout}\nstderr:\n${r.stderr}`); + assert.match(r.stdout, /Ingested: NO/, 'human-readable ingest failure still printed to stdout'); + assert.match(r.stderr, /ERROR \[INGEST_FAILED\]/, + 'the exit-code path must surface a structured ingest-failure line on stderr'); + assert.match(r.stderr, /Reproduce:/, + 'a copy-pasteable reproduce line must be offered, mirroring persist-results.js'); + }); + + it('persist WITHOUT --ingest exits 0 (no false red) and reports artifacts-written honestly', () => { + const r = spawnSync(process.execPath, [CLI_PATH, 'persist', RUN_ID], { + encoding: 'utf-8', + cwd: __dirname, + env: { ...process.env, SWARM_DB: dbPath }, + }); + assert.equal(r.status, 0, + `persist without --ingest must exit 0; got ${r.status}\nstderr:\n${r.stderr}`); + assert.match(r.stdout, /Submitted: NO/, + 'fp-p-004: repo-knowledge must report artifacts-written, not a DB submission'); + assert.match(r.stdout, /rk audit import/, 'fp-p-004: must tell the operator how to actually submit'); + }); +}); + +// ═══════════════════════════════════════════ +// fp-p-004 — formatPersist honest repo-knowledge wording +// ═══════════════════════════════════════════ + +describe('fp-p-004: formatPersist distinguishes artifacts-written from submitted', () => { + it('prints Submitted: NO with the rk audit import hint when not submitted', () => { + const out = formatPersist({ + runId: 'r1', + verdict: 'pass', + artifacts: { export: '/x/e.json', dogfoodSubmission: '/x/s.json', audit: '/x/audit' }, + dogfood: { ingested: false, reason: 'Not requested' }, + repoKnowledge: { artifactsWritten: true, submitted: false, path: '/x/audit', status: 'pass', posture: 'healthy' }, + }); + assert.match(out, /Submitted: NO/); + assert.match(out, /rk audit import/, 'must tell the operator how to actually submit'); + assert.doesNotMatch(out, /Submitted: YES/); + }); +}); + +// ═══════════════════════════════════════════ 
+// fp-p-003 — incomplete-run ingest idempotency (stable finished_at) +// ═══════════════════════════════════════════ + +describe('fp-p-003: buildDogfoodSubmission is dedup-stable for an INCOMPLETE run', () => { + let db; + + beforeEach(() => { db = openMemoryDb(); }); + afterEach(() => db.close()); + + function buildIncompleteRun() { + // run.completed is NULL (status not complete) — the case where the pre-fix + // code fell back to new Date() for finished_at and shifted the dedup key. + db.prepare(`INSERT INTO runs (id, repo, local_path, commit_sha, branch, status, created_at, completed_at) + VALUES ('r1', 'org/r', '/tmp/r', ?, 'main', 'health-audit-a', '2026-04-11T10:00:00Z', NULL)`) + .run('a'.repeat(40)); + saveDomainDraft(db, 'r1', [{ name: 'backend', globs: ['src/**'], ownership_class: 'owned' }]); + freezeDomains(db, 'r1'); + db.prepare("INSERT INTO waves (run_id, phase, wave_number, status) VALUES ('r1', 'health-audit-a', 1, 'collected')") + .run(); + } + + it('finished_at is derived from a stable run column, identical across re-builds', () => { + buildIncompleteRun(); + const exp = buildRunExport(db, 'r1'); + const verdict = computeRunVerdict(exp); + + const sub1 = buildDogfoodSubmission(exp, verdict); + const sub2 = buildDogfoodSubmission(exp, verdict); + + // The dedup key in packages/ingest/run.js is (run_id, repo, finished_at). + // run_id + repo are already stable; the fix makes finished_at stable too. + assert.ok(sub1.timing?.finished_at, 'submission must carry timing.finished_at'); + assert.equal(sub1.timing.finished_at, sub2.timing.finished_at, + 're-running persist --ingest on an incomplete run must produce a stable ' + + 'finished_at so the ingest dedup probe matches (idempotent re-ingest)'); + // And it must equal the stable run column (created), not wall-clock. 
+ assert.equal(sub1.timing.finished_at, '2026-04-11T10:00:00Z', + 'finished_at for an incomplete run falls back to run.created, not new Date()'); + }); +}); + +// ═══════════════════════════════════════════ +// cli-r-002 — main() refactor preserves the help/usage subprocess contract +// ═══════════════════════════════════════════ + +describe('cli-r-002: main() extraction keeps the help + usage contract intact', () => { + function runCli(args) { + return spawnSync(process.execPath, [CLI_PATH, ...args], { encoding: 'utf-8', cwd: __dirname }); + } + + it('no-args prints the help banner and exits 0', () => { + const r = runCli([]); + assert.doesNotMatch(r.stderr || '', /SyntaxError/, `cli.js failed to parse:\n${r.stderr}`); + assert.equal(r.status, 0, `no-args must exit 0; got ${r.status}\nstderr:\n${r.stderr}`); + assert.match(r.stdout, /swarm — Truthful swarm control plane for repo work/); + assert.match(r.stdout, /Phases:/); + }); + + it('an unknown command exits 1 and still prints the help banner', () => { + const r = runCli(['definitely-not-a-command']); + assert.equal(r.status, 1, `unknown command must exit 1; got ${r.status}`); + assert.match(r.stdout, /Commands:/); + }); + + it('verify with no run-id exits 1 with its Usage line (guard unchanged by exit-code fix)', () => { + const r = runCli(['verify']); + assert.equal(r.status, 1, `verify no-args must exit 1; got ${r.status}`); + assert.match(r.stderr, /Usage: swarm verify/); + }); + + it('persist with no run-id exits 1 with its Usage line', () => { + const r = runCli(['persist']); + assert.equal(r.status, 1, `persist no-args must exit 1; got ${r.status}`); + assert.match(r.stderr, /Usage: swarm persist/); + }); +}); diff --git a/packages/dogfood-swarm/meta-amendB-state-machines.test.js b/packages/dogfood-swarm/meta-amendB-state-machines.test.js new file mode 100644 index 0000000..b596ed4 --- /dev/null +++ b/packages/dogfood-swarm/meta-amendB-state-machines.test.js @@ -0,0 +1,310 @@ +/** + * 
meta-amendB-state-machines.test.js + * + * Stage-C (hardening) state-machines / DB-layer amend (sm-p-001..sm-p-004). + * One co-located regression test per confirmed LOW finding. Each test fails + * against the pre-fix code and passes against the surgical fix. + * + * sm-p-001 (LOW) lib/domains.js — sampleGlobPath() did not expand brace + * alternations `{a,b}` or char classes + * `[abc]`, so the freeze-time overlap + * probe (findOwnedGlobOverlaps) was + * BLIND to operator-authored brace/class + * owned globs. Two identical + * `src/{a,b}/**` owners froze silently. + * sm-p-002 (LOW) db/connection.js — openDb()'s schema-version ladder had + * no branch for version > SCHEMA_VERSION + * (a DB written by a NEWER build); it + * opened silently against an unknown + * shape. Now refuses loudly. + * sm-p-003 (LOW) lib/worktree.js — stale-worktree-remove (precedes the + * dependent `worktree add`) and + * `worktree prune` swallowed failures + * with no audit row. Now logStage'd to + * the NDJSON stream (no control-flow + * change — still best-effort). + * sm-p-004 (LOW) db/connection.js — foreign_keys=ON parity between the + * file-backed (openDb) and in-memory + * (openMemoryDb) factories, now routed + * through one applyConnectionPragmas + * helper so the integrity pragma cannot + * silently drift. openDb ALREADY + * enforced FKs pre-fix; this pins parity. 
+ */ + +import { describe, it, beforeEach, afterEach } from 'node:test'; +import assert from 'node:assert/strict'; +import { mkdtempSync, rmSync, mkdirSync, writeFileSync, existsSync } from 'node:fs'; +import { execFileSync } from 'node:child_process'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + +import { openDb, openMemoryDb, closeDb } from './db/connection.js'; +import { SCHEMA_VERSION } from './db/schema.js'; +import { saveDomainDraft, freezeDomains, aredomainsFrozen } from './lib/domains.js'; +import { createWorktree, cleanupAllWorktrees } from './lib/worktree.js'; + +function seedRun(db, runId, domains) { + db.prepare('INSERT INTO runs (id, repo, local_path, commit_sha) VALUES (?, ?, ?, ?)') + .run(runId, 'org/r', '/tmp/r', 'a'.repeat(40)); + saveDomainDraft(db, runId, domains); +} + +/** + * Run `fn` while capturing everything written to console.error, returning the + * captured lines. logStage writes NDJSON to stderr via console.error, so this + * is the forensic-channel observation point. Restores the original in finally + * so a throwing `fn` cannot leak the stub into sibling tests. 
+ */ +function captureStderr(fn) { + const lines = []; + const original = console.error; + console.error = (...args) => { lines.push(args.join(' ')); }; + try { + fn(); + } finally { + console.error = original; + } + return lines; +} + +// ═══════════════════════════════════════════ +// sm-p-001 — freeze overlap probe sees brace/char-class owned globs +// ═══════════════════════════════════════════ + +describe('sm-p-001 — sampleGlobPath brace/char-class awareness', () => { + let db; + const RUN_ID = 'smp1'; + + beforeEach(() => { db = openMemoryDb(); }); + afterEach(() => { db.close(); }); + + it('two identical brace-alternation owned globs THROW at freeze (probe is no longer blind)', () => { + // Pre-fix: sampleGlobPath('src/{a,b}/**') kept the literal braces, minimatch + // returned false against its own glob, bestMatchingGlob → null on both sides, + // the `if (!here || !there) continue` skipped the pair, and the equal- + // specificity criss-cross froze silently. The fix collapses `{a,b}` to its + // first alternative so the probe path (`src/a/x`) matches and the overlap + // fires — exactly the sm-r-001 "identical owned globs still throw" contract, + // now extended to brace globs. + seedRun(db, RUN_ID, [ + { name: 'alpha', globs: ['src/{a,b}/**'], ownership_class: 'owned' }, + { name: 'beta', globs: ['src/{a,b}/**'], ownership_class: 'owned' }, + ]); + assert.throws( + () => freezeDomains(db, RUN_ID), + /overlapping owned domains/i, + 'identical brace-alternation owned globs are an equal-specificity criss-cross', + ); + }); + + it('two identical char-class owned globs THROW at freeze', () => { + // `[ab]` char class — same blindness, same fix path (collapse → one char). 
+ seedRun(db, RUN_ID, [ + { name: 'one', globs: ['src/[ab]/**'], ownership_class: 'owned' }, + { name: 'two', globs: ['src/[ab]/**'], ownership_class: 'owned' }, + ]); + assert.throws(() => freezeDomains(db, RUN_ID), /overlapping owned domains/i); + }); + + it('DISJOINT brace globs still freeze (no false positive from the expansion)', () => { + // The first-alternative collapse must not invent a collision: `src/{a,b}/**` + // probes to `src/a/x`, `lib/{c,d}/**` probes to `lib/c/x` — disjoint, so the + // map is freezable. Guards against the expansion over-firing. + seedRun(db, RUN_ID, [ + { name: 'alpha', globs: ['src/{a,b}/**'], ownership_class: 'owned' }, + { name: 'beta', globs: ['lib/{c,d}/**'], ownership_class: 'owned' }, + ]); + freezeDomains(db, RUN_ID); + assert.equal(aredomainsFrozen(db, RUN_ID), true); + }); +}); + +// ═══════════════════════════════════════════ +// sm-p-002 — openDb refuses a future schema version +// ═══════════════════════════════════════════ + +describe('sm-p-002 — openDb future-schema-version downgrade guard', () => { + let tmp; + let dbPath; + + beforeEach(() => { + tmp = mkdtempSync(join(tmpdir(), 'smp2-')); + dbPath = join(tmp, 'control-plane.db'); + }); + afterEach(() => { + closeDb(dbPath); + rmSync(tmp, { recursive: true, force: true }); + }); + + it('refuses to open a DB written by a NEWER schema version', () => { + // Create the DB at the current version, then forge a future version into kv + // and drop the cached handle so the next openDb re-reads from disk. + const db = openDb(dbPath); + db.prepare("INSERT OR REPLACE INTO kv (key, value) VALUES ('schema_version', ?)") + .run(String(SCHEMA_VERSION + 1)); + closeDb(dbPath); + + // Pre-fix: neither `version < 1` nor `version < SCHEMA_VERSION` is true, so + // openDb proceeded silently against the unknown-newer shape. The fix adds an + // explicit `version > SCHEMA_VERSION` refusal. 
+ assert.throws( + () => openDb(dbPath), + (e) => { + assert.match(e.message, new RegExp(`v${SCHEMA_VERSION + 1}`), + 'error must name the on-disk version'); + assert.match(e.message, new RegExp(`v${SCHEMA_VERSION}\\b`), + 'error must name the build version'); + assert.match(e.message, /pull the latest/i, 'error must tell the operator what to do'); + return true; + }, + 'opening a future-schema DB must refuse loudly', + ); + }); + + it('a refused open does not leave a poisoned handle in the pool', () => { + // The guard closes the handle + deletes the pool entry before throwing, so a + // later open against a corrected DB is not served a dead cached handle. + const db = openDb(dbPath); + db.prepare("INSERT OR REPLACE INTO kv (key, value) VALUES ('schema_version', ?)") + .run(String(SCHEMA_VERSION + 1)); + closeDb(dbPath); + assert.throws(() => openDb(dbPath), /schema/i); + + // Open a sibling DB (`sane.db`) pinned to the current version; the open must + // succeed, showing the refused attempt did not break later opens.
+ const fix = openDb(join(tmp, 'sane.db')); + fix.prepare("INSERT OR REPLACE INTO kv (key, value) VALUES ('schema_version', ?)") + .run(String(SCHEMA_VERSION)); + closeDb(join(tmp, 'sane.db')); + const reopened = openDb(join(tmp, 'sane.db')); + assert.equal(reopened.prepare('SELECT 1 AS ok').get().ok, 1); + closeDb(join(tmp, 'sane.db')); + }); + + it('still opens a DB at the current schema version (no false refusal)', () => { + const db = openDb(dbPath); + assert.equal(db.prepare('SELECT 1 AS ok').get().ok, 1); + }); +}); + +// ═══════════════════════════════════════════ +// sm-p-003 — swallowed worktree-cleanup failures are now observable +// ═══════════════════════════════════════════ + +describe('sm-p-003 — worktree cleanup failures emit a forensic NDJSON row', () => { + let repo; + + function initRepo() { + const root = mkdtempSync(join(tmpdir(), 'smp3-')); + const git = (args) => execFileSync('git', args, { cwd: root, stdio: ['pipe', 'pipe', 'pipe'] }); + git(['init', '-q']); + git(['config', 'user.email', 't@t.t']); + git(['config', 'user.name', 't']); + git(['commit', '--allow-empty', '-q', '-m', 'root']); + return root; + } + + beforeEach(() => { repo = initRepo(); }); + afterEach(() => { + try { cleanupAllWorktrees(repo); } catch { /* best-effort */ } + rmSync(repo, { recursive: true, force: true }); + }); + + it('a failed stale-worktree-remove logs worktree_stale_remove_failed before the add throws', () => { + // Force the line-82 stale-remove to FAIL while the wtDir exists: pre-create + // the worktree directory as a plain (non-worktree) dir with a file in it, so + // `git worktree remove --force` errors ("is not a working tree") and the + // downstream `git worktree add` then fails on the occupied path. Pre-fix the + // operator saw only the `add` error; the fix emits a greppable breadcrumb + // for the precursor failure first. 
+ const wtDir = join(repo, '.swarm', 'worktrees', 'w1-backend'); + mkdirSync(wtDir, { recursive: true }); + writeFileSync(join(wtDir, 'occupied.txt'), 'not a worktree\n'); + + let threw = null; + const lines = captureStderr(() => { + try { + createWorktree(repo, { runId: 'swarm-smp3', waveNumber: 1, domainName: 'backend' }); + } catch (e) { + threw = e; + } + }); + + // The downstream add still fails (control flow unchanged — best-effort cleanup). + assert.ok(threw, 'the occupied path still makes worktree add fail (behavior unchanged)'); + + // The new breadcrumb landed on the NDJSON stream FIRST. + const row = lines + .map((l) => { try { return JSON.parse(l); } catch { return null; } }) + .find((o) => o && o.stage === 'worktree_stale_remove_failed'); + assert.ok(row, 'a worktree_stale_remove_failed NDJSON row must be emitted'); + assert.equal(row.component, 'dogfood-swarm'); + assert.equal(row.branch, 'swarm/smp3/w1-backend'); + assert.ok(typeof row.err === 'string' && row.err.length > 0, 'the underlying git error is captured'); + }); + + it('the happy path emits NO worktree_stale_remove_failed row', () => { + // A clean create (no stale dir) must not log the failure breadcrumb — + // proves the row is a genuine failure signal, not happy-path noise. 
+ const lines = captureStderr(() => { + createWorktree(repo, { runId: 'swarm-smp3', waveNumber: 2, domainName: 'docs' }); + }); + const noisy = lines + .map((l) => { try { return JSON.parse(l); } catch { return null; } }) + .some((o) => o && o.stage === 'worktree_stale_remove_failed'); + assert.equal(noisy, false, 'no failure row on the happy path'); + }); +}); + +// ═══════════════════════════════════════════ +// sm-p-004 — foreign_keys=ON parity across both connection factories +// ═══════════════════════════════════════════ + +describe('sm-p-004 — foreign_keys parity (file-backed and in-memory)', () => { + let tmp; + let dbPath; + + beforeEach(() => { + tmp = mkdtempSync(join(tmpdir(), 'smp4-')); + dbPath = join(tmp, 'control-plane.db'); + }); + afterEach(() => { + closeDb(dbPath); + rmSync(tmp, { recursive: true, force: true }); + }); + + it('openMemoryDb sets foreign_keys = ON (the in-mem path enforces FKs)', () => { + const db = openMemoryDb(); + assert.equal(db.pragma('foreign_keys', { simple: true }), 1, + 'foreign_keys must be ON — it is per-connection and defaults OFF'); + db.close(); + }); + + it('openDb ALSO sets foreign_keys = ON (file-backed path; this was already true pre-fix)', () => { + // sm-p-004 confirmation: openDb already enforced FKs before this amend. The + // amend routes both factories through applyConnectionPragmas so the two + // cannot drift again — this assertion pins the file-backed half of parity. + const db = openDb(dbPath); + assert.equal(db.pragma('foreign_keys', { simple: true }), 1, + 'the file-backed connection must enforce FKs in lockstep with the in-mem one'); + }); + + it('FK enforcement actually bites: an orphan finding_events insert is rejected', () => { + // Behavioral proof that foreign_keys=ON is load-bearing, not decorative. + // finding_events.finding_id REFERENCES findings(id); inserting against a + // nonexistent finding must raise FOREIGN KEY constraint failure on BOTH + // connection factories. 
+ const orphanInsert = (db) => + db.prepare("INSERT INTO finding_events (finding_id, event_type) VALUES (999999, 'created')").run(); + + const mem = openMemoryDb(); + assert.throws(() => orphanInsert(mem), /FOREIGN KEY/i, + 'in-memory: orphan FK insert must be rejected'); + mem.close(); + + const file = openDb(dbPath); // closed by afterEach via closeDb(dbPath) + assert.throws(() => orphanInsert(file), /FOREIGN KEY/i, + 'file-backed: orphan FK insert must be rejected'); + }); +}); diff --git a/packages/dogfood-swarm/meta-amendB-verify-engine.test.js b/packages/dogfood-swarm/meta-amendB-verify-engine.test.js new file mode 100644 index 0000000..a95cb4c --- /dev/null +++ b/packages/dogfood-swarm/meta-amendB-verify-engine.test.js @@ -0,0 +1,292 @@ +/** + * meta-amendB-verify-engine.test.js — Amend wave B, verify-engine domain. + * + * Regression tests for the Stage-C hardening / operator-UX findings on the + * verify engine (lib/verify/** runner + registry + adapters). Each block names + * the finding id and pins the *post-fix* behaviour so a future regression + * points straight at the defect that returned. + * + * These are diagnosability findings, not correctness bugs in the Stage-A sense: + * the wave gate stays SAFE (no false ADVANCE) in every case. What changes is + * the operator's ability to tell apart honest "cannot verify here" outcomes + * from real failures. The contract with the display layer (cli.js + + * commands/verify.js, owned by the operator-output agent) is that verdicts come + * from the set { pass, fail, skip, no_tests, tool_missing }, each carrying a + * `reason` string when non-pass, plus per-step `tool_missing` / `timed_out` / + * `truncated` flags. The assertions below pin that contract. 
+ * + * ve-p-001 (MED, degradation) — a MISSING build tool on PATH (ENOENT / + * "not recognized" / "command not found") is a DISTINCT `tool_missing` + * verdict, NOT an ordinary `fail`, so a host lacking the toolchain does not + * look like "the fix broke the build". + * ve-p-005 (LOW, observability) — a step that hit its timeout is tagged + * `timed_out: true` (+ reason) so a 5-min hang is distinguishable from a + * fast exit-1 failure. + * ve-p-006 (LOW, future-proofing) — probeAll() has a deterministic, + * documented tie-break on equal score (rust > python > node) instead of + * incidental Map-insertion order. + * ve-p-007 (LOW, observability) — an output truncation sets `truncated: true` + * on the StepResult (and aggregates to the run) so the operator knows the + * captured log is partial. + */ + +import { describe, it } from 'node:test'; +import assert from 'node:assert/strict'; +import { join } from 'node:path'; +import { tmpdir } from 'node:os'; +import { mkdtempSync, writeFileSync, rmSync } from 'node:fs'; + +import { runStep, runSteps } from './lib/verify/runner.js'; +import { probeAll } from './lib/verify/registry.js'; +import { rustAdapter } from './lib/verify/adapters/rust.js'; + +// A binary name that cannot exist on PATH on any host. +const MISSING = 'definitely-not-a-real-binary-xyz'; + +// ═══════════════════════════════════════════════════════════════════════ +// ve-p-001 — missing tool on PATH → `tool_missing`, never a misleading FAIL +// ═══════════════════════════════════════════════════════════════════════ + +describe('ve-p-001 — a missing build tool degrades to tool_missing, not fail', () => { + it('runStep tags a nonexistent cmd with tool_missing + a reason naming it', () => { + // Pre-fix: this returned exit_code:1, passed:false with the raw shell + // "not recognized" / "command not found" text and NO way to tell it apart + // from a real compile failure. Post-fix the StepResult is self-identifying. 
+ const r = runStep('.', { name: 'check', cmd: MISSING, args: ['--version'] }); + assert.equal(r.passed, false); + assert.equal(r.tool_missing, true, 'the missing-tool case must be flagged'); + assert.notEqual(r.timed_out, true, 'a missing tool is not a timeout'); + assert.match(r.reason, new RegExp(MISSING)); + assert.match(r.reason, /not found on PATH/i); + }); + + it('runSteps maps a REQUIRED missing-tool step to verdict tool_missing (not fail)', () => { + // The central gap: a required step whose tool is absent must NOT read as a + // FAIL ("my fix broke the build"). It degrades to a distinct verdict and + // the wave correctly stays un-advanced (the gate advances only on `pass`). + const result = runSteps('.', [ + { name: 'check', cmd: MISSING, args: ['build'] }, + ]); + assert.notEqual(result.verdict, 'fail', 'a missing toolchain must not look like a real failure'); + assert.equal(result.verdict, 'tool_missing'); + assert.match(result.reason, /not found on PATH/i); + assert.match(result.reason, new RegExp(MISSING)); + }); + + it('a REAL required failure dominates a co-occurring missing tool → fail', () => { + // If one required step genuinely fails (real non-zero exit) while another + // is merely tool-missing, the honest, actionable signal is `fail` — the + // real failure must not be masked as `tool_missing`. + const result = runSteps('.', [ + { name: 'check', cmd: 'node', args: ['-e', '"process.exit(2)"'] }, // real fail + { name: 'test', cmd: MISSING, args: ['run'] }, // tool missing + ], { continueOnError: true }); + assert.equal(result.verdict, 'fail'); + }); + + it('through the rust adapter: a host without cargo reports tool_missing', () => { + // The motivating scenario from the finding: auditing a Rust repo on a host + // that lacks `cargo`. The required `check`/`test` steps are tool-missing, + // so the adapter result is `tool_missing`, not a FAIL that reads as a + // broken build. (Skips on the rare host that actually has cargo installed.) 
+ const dir = mkdtempSync(join(tmpdir(), 'vep001-rust-')); + writeFileSync( + join(dir, 'Cargo.toml'), + '[package]\nname = "probe-fixture"\nversion = "0.0.0"\n', + 'utf-8' + ); + try { + const result = rustAdapter.run(dir); + if (result.steps.some(s => s.tool_missing)) { + assert.equal(result.verdict, 'tool_missing'); + assert.match(result.reason, /cargo/); + } else { + // cargo is installed here — the engine ran for real; nothing to assert + // about tool_missing. The verdict is then a real pass/fail/no-op. + assert.ok(['pass', 'fail', 'skip', 'no_tests'].includes(result.verdict)); + } + } finally { + rmSync(dir, { recursive: true, force: true }); + } + }); +}); + +// ═══════════════════════════════════════════════════════════════════════ +// ve-p-005 — a timed-out step is distinguishable from a fast failure +// ═══════════════════════════════════════════════════════════════════════ + +describe('ve-p-005 — a step that hit its timeout is tagged timed_out', () => { + it('runStep flags timed_out (+ reason) when the per-step budget is exceeded', () => { + // Use a tiny per-step timeoutMs so the test does not wait the real 5 min. + // The child keeps the event loop alive ~3s; the 120ms budget kills it. + // Pre-fix this looked identical to a fast exit-1 failure; post-fix the + // hang is self-identifying so an operator reads "TIMED OUT", not "failed". 
+ const r = runStep('.', { + name: 'test', + cmd: 'node', + args: ['-e', '"setTimeout(function(){}, 3000)"'], + timeoutMs: 120, + }); + assert.equal(r.passed, false); + assert.equal(r.timed_out, true, 'a killed-on-timeout step must be flagged'); + assert.notEqual(r.tool_missing, true, 'a hang is not a missing tool'); + assert.match(r.reason, /timed out/i); + assert.match(r.reason, /120ms/); + }); + + it('the timeout signal aggregates to the runSteps result', () => { + const result = runSteps('.', [ + { name: 'test', cmd: 'node', args: ['-e', '"setTimeout(function(){}, 3000)"'], timeoutMs: 120 }, + ]); + assert.equal(result.timed_out, true); + // A timed-out required step is still a (real) failure at the gate — safe + // direction — but the run carries the hang signal + reason for the operator. + assert.equal(result.verdict, 'fail'); + assert.match(result.reason, /timed out/i); + }); + + it('a fast clean pass is NOT flagged as timed_out', () => { + const r = runStep('.', { name: 'ok', cmd: 'node', args: ['-e', '"process.exit(0)"'] }); + assert.equal(r.passed, true); + assert.notEqual(r.timed_out, true); + }); +}); + +// ═══════════════════════════════════════════════════════════════════════ +// ve-p-006 — deterministic tie-break for equal-confidence probes +// ═══════════════════════════════════════════════════════════════════════ + +describe('ve-p-006 — probeAll has a deterministic tie-break on equal score', () => { + function mkPolyglotRepo() { + // A repo that scores 50 under BOTH node (package.json, no test script) and + // python (pyproject.toml, no pytest marker) — a genuine equal-score tie. 
+ const dir = mkdtempSync(join(tmpdir(), 'vep006-poly-')); + writeFileSync( + join(dir, 'package.json'), + JSON.stringify({ name: 'poly', version: '0.0.0' }, null, 2), + 'utf-8' + ); + writeFileSync(join(dir, 'pyproject.toml'), '[project]\nname = "poly"\n', 'utf-8'); + return dir; + } + + it('breaks an equal score by documented priority (python > node), not Map order', () => { + const dir = mkPolyglotRepo(); + try { + const ranked = probeAll(dir); + const node = ranked.find(p => p.name === 'node'); + const python = ranked.find(p => p.name === 'python'); + // Precondition: the fixture really is a tie at this score. + assert.equal(node.score, 50); + assert.equal(python.score, 50); + // The tie-break must put the more marker-exclusive adapter first + // deterministically. python (pyproject.toml) outranks node (package.json). + assert.equal(ranked[0].name, 'python'); + assert.ok( + ranked.findIndex(p => p.name === 'python') < ranked.findIndex(p => p.name === 'node'), + 'python must rank ahead of node on a score tie' + ); + } finally { + rmSync(dir, { recursive: true, force: true }); + } + }); + + it('is stable: repeated probes of the same tie yield the same winner', () => { + const dir = mkPolyglotRepo(); + try { + const a = probeAll(dir).map(p => p.name); + const b = probeAll(dir).map(p => p.name); + assert.deepEqual(a, b); + } finally { + rmSync(dir, { recursive: true, force: true }); + } + }); + + it('a strictly higher score still wins regardless of the tie-break', () => { + // node here also has a `test` script (+20 → 70) and beats a bare pyproject + // (50). The tie-break must only break TIES, never override a real lead. 
+ const dir = mkdtempSync(join(tmpdir(), 'vep006-lead-')); + writeFileSync( + join(dir, 'package.json'), + JSON.stringify({ name: 'lead', version: '0.0.0', scripts: { test: 'node --test' } }, null, 2), + 'utf-8' + ); + writeFileSync(join(dir, 'pyproject.toml'), '[project]\nname = "lead"\n', 'utf-8'); + try { + const ranked = probeAll(dir); + assert.equal(ranked[0].name, 'node'); + assert.ok(ranked[0].score > 50); + } finally { + rmSync(dir, { recursive: true, force: true }); + } + }); +}); + +// ═══════════════════════════════════════════════════════════════════════ +// ve-p-007 — output truncation is signaled to the operator +// ═══════════════════════════════════════════════════════════════════════ + +describe('ve-p-007 — a truncated step output sets truncated: true', () => { + it('flags truncated on a step whose stdout exceeds the cap', () => { + // Emit ~20k chars to stdout; the cap is 8000. Pre-fix the truncation was + // only visible in-band ("... (truncated)"); post-fix the StepResult carries + // a top-level `truncated` flag the display layer can surface. 
+ const r = runStep('.', { + name: 'test', + cmd: 'node', + args: ['-e', '"process.stdout.write(\'x\'.repeat(20000))"'], + }); + assert.equal(r.passed, true); + assert.equal(r.truncated, true, 'an over-cap stdout must set truncated'); + assert.ok(r.stdout.length < 20000, 'stdout must actually be bounded'); + assert.match(r.stdout, /truncated/); + }); + + it('does NOT flag truncated on small output', () => { + const r = runStep('.', { name: 'ok', cmd: 'node', args: ['-e', '"console.log(\'small\')"'] }); + assert.equal(r.passed, true); + assert.notEqual(r.truncated, true); + }); + + it('the truncation signal aggregates to the runSteps result', () => { + const result = runSteps('.', [ + { name: 'test', cmd: 'node', args: ['-e', '"process.stdout.write(\'x\'.repeat(20000)); console.log(\'\\n# tests 1\')"'] }, + ]); + assert.equal(result.truncated, true); + }); +}); + +// ═══════════════════════════════════════════════════════════════════════ +// Contract — the verdict vocabulary exposed to the display layer +// ═══════════════════════════════════════════════════════════════════════ + +describe('contract — runSteps exposes verdicts in { pass, fail, skip, no_tests, tool_missing }', () => { + it('every runSteps verdict is in the agreed set, with a reason on non-pass', () => { + const allowed = new Set(['pass', 'fail', 'skip', 'no_tests', 'tool_missing']); + + const pass = runSteps('.', [ + { name: 'test', cmd: 'node', args: ['-e', '"console.log(\'# tests 1\')"'] }, + ]); + const fail = runSteps('.', [ + { name: 'test', cmd: 'node', args: ['-e', '"process.exit(1)"'] }, + ]); + const skip = runSteps('.', []); // no required steps + const toolMissing = runSteps('.', [ + { name: 'test', cmd: MISSING, args: ['run'] }, + ]); + + for (const r of [pass, fail, skip, toolMissing]) { + assert.ok(allowed.has(r.verdict), `verdict "${r.verdict}" must be in the contract set`); + } + assert.equal(pass.verdict, 'pass'); + assert.equal(fail.verdict, 'fail'); + assert.equal(skip.verdict, 
'skip'); + assert.equal(toolMissing.verdict, 'tool_missing'); + + // Non-pass verdicts the engine originates here carry a reason string. + assert.equal(typeof skip.reason, 'string'); + assert.equal(typeof toolMissing.reason, 'string'); + // A plain `pass` adds no reason (keeps the existing shape unchanged). + assert.equal(pass.reason, undefined); + }); +}); diff --git a/packages/dogfood-swarm/package.json b/packages/dogfood-swarm/package.json index a1d1fb5..f2fc856 100644 --- a/packages/dogfood-swarm/package.json +++ b/packages/dogfood-swarm/package.json @@ -1,6 +1,6 @@ { "name": "@dogfood-lab/dogfood-swarm", - "version": "1.3.1", + "version": "1.3.2", "type": "module", "description": "10-phase parallel-agent protocol runner for testing-os. SQLite-backed control plane, durable receipts, domain-aware orchestration. Three R's recovery contract: revalidate / rewind / redrive.", "main": "cli.js", @@ -32,6 +32,7 @@ "dependencies": { "@dogfood-lab/findings": "^1.2.0", "@dogfood-lab/report": "^1.2.0", + "@dogfood-lab/schemas": "^1.2.0", "ajv": "^8.18.0", "ajv-formats": "^3.0.1", "better-sqlite3": "^12.10.0", diff --git a/packages/findings/package.json b/packages/findings/package.json index 3b3d79b..51b3e14 100644 --- a/packages/findings/package.json +++ b/packages/findings/package.json @@ -1,6 +1,6 @@ { "name": "@dogfood-lab/findings", - "version": "1.3.1", + "version": "1.3.2", "type": "module", "description": "Finding contract spine for testing-os. Validates, reads, lists, and queries evidence-bound findings — the fourth contract alongside record, scenario, and policy.", "main": "index.js", diff --git a/packages/ingest/package.json b/packages/ingest/package.json index e6a0b8a..f46c46c 100644 --- a/packages/ingest/package.json +++ b/packages/ingest/package.json @@ -1,6 +1,6 @@ { "name": "@dogfood-lab/ingest", - "version": "1.3.1", + "version": "1.3.2", "type": "module", "description": "Ingestion pipeline for testing-os. 
Thin glue: dispatch → verifier → persist → indexes.", "main": "run.js", diff --git a/packages/portfolio/package.json b/packages/portfolio/package.json index 9cec586..e0a3aed 100644 --- a/packages/portfolio/package.json +++ b/packages/portfolio/package.json @@ -1,6 +1,6 @@ { "name": "@dogfood-lab/portfolio", - "version": "1.3.1", + "version": "1.3.2", "private": true, "type": "module", "description": "Cross-repo portfolio generator for testing-os. Aggregates the latest record per repo into reports/dogfood-portfolio.json.", diff --git a/packages/report/package.json b/packages/report/package.json index 11ddc00..5aa8ab3 100644 --- a/packages/report/package.json +++ b/packages/report/package.json @@ -1,6 +1,6 @@ { "name": "@dogfood-lab/report", - "version": "1.3.1", + "version": "1.3.2", "type": "module", "description": "Submission builder for testing-os — turns swarm/run results into the JSON envelope expected by the verifier.", "main": "build-submission.js", diff --git a/packages/schemas/package.json b/packages/schemas/package.json index fe1a119..b950d91 100644 --- a/packages/schemas/package.json +++ b/packages/schemas/package.json @@ -1,6 +1,6 @@ { "name": "@dogfood-lab/schemas", - "version": "1.3.1", + "version": "1.3.2", "description": "JSON schemas for the testing-os contract spine — record, finding, pattern, recommendation, doctrine, policy, scenario, submission.", "type": "module", "main": "dist/index.js", diff --git a/scripts/agent-output.schema.json b/packages/schemas/src/json/agent-output.schema.json similarity index 98% rename from scripts/agent-output.schema.json rename to packages/schemas/src/json/agent-output.schema.json index 2c35f01..01139c8 100644 --- a/scripts/agent-output.schema.json +++ b/packages/schemas/src/json/agent-output.schema.json @@ -1,6 +1,6 @@ { "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://github.com/dogfood-lab/testing-os/scripts/agent-output.schema.json", + "$id": 
"https://github.com/dogfood-lab/testing-os/packages/schemas/src/json/agent-output.schema.json", "title": "Swarm Agent Output (canonical envelope)", "description": "JSON contract for agent JSON outputs written to swarms//wave-N/.json. F-252713-017 (Phase 7 wave 1): collect-time schema-conformance gate. Replaces silent normalization in collect.js with a structured error pointing the agent at the canonical shape. Validates the OUTER envelope — every agent output regardless of phase has { domain, summary } at minimum. Phase-specific oneOf branches govern the inner shape: audit outputs carry findings[], feature outputs carry features[], amend outputs carry fixes[] + files_changed[]. Mirrors AUDIT_CATEGORIES + FEATURE_CATEGORIES + SEVERITY_ENUM in packages/dogfood-swarm/lib/output-schema.js — kept in lockstep.", "type": "object", diff --git a/packages/verify/package.json b/packages/verify/package.json index b54b8fa..0275926 100644 --- a/packages/verify/package.json +++ b/packages/verify/package.json @@ -1,6 +1,6 @@ { "name": "@dogfood-lab/verify", - "version": "1.3.1", + "version": "1.3.2", "type": "module", "description": "Central verifier for testing-os. Validates submissions against schema and policy, produces persisted records.", "main": "index.js", diff --git a/scripts/check-validator-cache-singleton.test.mjs b/scripts/check-validator-cache-singleton.test.mjs index 3760e42..b3c669d 100644 --- a/scripts/check-validator-cache-singleton.test.mjs +++ b/scripts/check-validator-cache-singleton.test.mjs @@ -27,10 +27,12 @@ * 1. STATIC: no `new Ajv` outside `packages/schemas/`. This is the * visible-source half — if any consumer regresses and adds its * own Ajv instance back, this half trips immediately. (The - * build-output schema validator in dogfood-swarm/lib/validate- - * agent-output.js uses scripts/agent-output.schema.json which - * is NOT in @dogfood-lab/schemas — see that file's comment. - * Allowlisted explicitly here.) 
+ * agent-output schema validator in dogfood-swarm/lib/validate- + * agent-output.js compiles a local Ajv: the agent-output schema now + * ships via @dogfood-lab/schemas/json/ but is NOT one of the eight + * payload schemas wired through the canonical compileSchema/ + * validatePayload seam — it is a swarm output envelope resolved as a + * raw JSON file. Allowlisted explicitly here; see that file's comment.) * * 2. DYNAMIC: import `compileSchema` from `@dogfood-lab/schemas` * from this script's resolution path; trigger each consumer's @@ -79,10 +81,13 @@ const repoRoot = resolve(here, '..'); // ────────────────────────────────────────────────────────────────────── test('D2B-015 STATIC: no `new Ajv` outside @dogfood-lab/schemas (single canonical compile site)', () => { - // The scripts/agent-output.schema.json validator (dogfood-swarm/lib/ - // validate-agent-output.js) uses a NON-@dogfood-lab schema — that - // file's header comment explicitly states it stands apart until the - // schema graduates to the schemas package. Allowlisted by path. + // The agent-output schema validator (dogfood-swarm/lib/ + // validate-agent-output.js) compiles its own Ajv. The agent-output + // schema now ships via @dogfood-lab/schemas/json/, but it is NOT one of + // the eight payload schemas wired through the canonical compileSchema/ + // validatePayload seam — it is a swarm output envelope resolved as a raw + // JSON file, so it is compiled with a local Ajv by design. Allowlisted by + // path (see that file's header comment). // // The schemas package itself houses the SOLE permitted `new Ajv` call // (src/validate.ts + dist/validate.js). 
No other package may diff --git a/scripts/doc-drift-patterns.json b/scripts/doc-drift-patterns.json index f61fc70..0fbd57c 100644 --- a/scripts/doc-drift-patterns.json +++ b/scripts/doc-drift-patterns.json @@ -240,9 +240,9 @@ { "id": "agent-output-schema", "kind": "schema-conformance", - "title": "Agent JSON outputs (audit / feature / amend) conform to scripts/agent-output.schema.json", + "title": "Agent JSON outputs (audit / feature / amend) conform to packages/schemas/src/json/agent-output.schema.json", "description": "The canonical envelope is { domain, summary, [findings|features|fixes], ... }. Phase-7-wave-1 productization (F-252713-017): replaces silent normalization in collect.js with a structured AgentOutputValidationError pointing the agent at the canonical shape. The schema is permissive at top level (only `domain` + `summary` required) so it can govern audit, feature, AND amend outputs, with phase-specific $defs for the inner shapes. Stage C Wave A2 D4-004 closed the vacuous-gate root cause: previously this check carried `allowEmpty: true` and both target globs matched ZERO committed files — the gate was structurally green forever. Now `swarms/__schema-fixtures__/` ships 3 hand-curated positive fixtures (basic, audit-with-findings, amend-with-fixes) plus 1 negative fixture (invalid-missing-domain.json). The handler reads `negativeFilenamePattern` (default `^invalid-`): basenames matching that regex MUST fail Ajv validation. If a negative fixture passes, the gate emits `NEGATIVE_FIXTURE_PASSED` — proof that the schema loosened or the fixture went stale.", - "schema": "scripts/agent-output.schema.json", + "schema": "packages/schemas/src/json/agent-output.schema.json", "targets": [ "swarms/__schema-fixtures__/*.json", "swarms/swarm-*/wave-*/outputs/*.json" @@ -250,7 +250,7 @@ "allowlist": [], "errorClass": "AgentOutputValidationError", "negativeFilenamePattern": "^invalid-", - "hint": "Match scripts/agent-output.schema.json. 
Required at top level: domain, summary. Audit outputs add `findings[]` (with severity ∈ CRITICAL/HIGH/MEDIUM/LOW and category ∈ AUDIT_CATEGORIES); feature outputs add `features[]`; amend outputs add `fixes[]` + `files_changed[]`. Live-run target glob expanded 2026-05-14 (study-swarm Prevention-A): wave-2 amend's coordinator-brief drift (fixes_applied vs fixes) was caught at collect-time after agents had already run — moving the schema-conformance check to CI fires the gate at push time so brief-vs-schema drift surfaces before agents are dispatched. Stage C Wave A2 follow-up: a positive fixture that no longer validates means the schema tightened without updating the fixture (fix one side); a negative fixture that validates means the schema loosened without updating the negative test (fix the schema or the fixture)." + "hint": "Match packages/schemas/src/json/agent-output.schema.json. Required at top level: domain, summary. Audit outputs add `findings[]` (with severity ∈ CRITICAL/HIGH/MEDIUM/LOW and category ∈ AUDIT_CATEGORIES); feature outputs add `features[]`; amend outputs add `fixes[]` + `files_changed[]`. Live-run target glob expanded 2026-05-14 (study-swarm Prevention-A): wave-2 amend's coordinator-brief drift (fixes_applied vs fixes) was caught at collect-time after agents had already run — moving the schema-conformance check to CI fires the gate at push time so brief-vs-schema drift surfaces before agents are dispatched. Stage C Wave A2 follow-up: a positive fixture that no longer validates means the schema tightened without updating the fixture (fix one side); a negative fixture that validates means the schema loosened without updating the negative test (fix the schema or the fixture)." 
}, { diff --git a/site/public/screenshots/verify-output.svg b/site/public/screenshots/verify-output.svg index 8753ff8..88fbaac 100644 --- a/site/public/screenshots/verify-output.svg +++ b/site/public/screenshots/verify-output.svg @@ -41,7 +41,7 @@ $ npm run verify - > testing-os@1.3.1 verify + > testing-os@1.3.2 verify > npm run sync-version:check && npm run check-doc-drift && npm run check-regression-pins && npm run test:scripts && npm run build && npm run test [sync-version] README version block matches package.json — clean. diff --git a/swarms/PROTOCOL.md b/swarms/PROTOCOL.md index 5702e3d..c0c46fa 100644 --- a/swarms/PROTOCOL.md +++ b/swarms/PROTOCOL.md @@ -294,7 +294,7 @@ When parallel agents each run `npm test` / `npm run verify` against a worktree c The serial-final-verify discipline closes the gap: 1. **Coordinator dispatches with `--skip-verify`** when running an amend wave with parallel agents: `swarm dispatch health-amend-a --skip-verify` (see `packages/dogfood-swarm/commands/dispatch.js` SKIP_VERIFY_DIRECTIVE). -2. **Agents make edits, write their output JSON, and stop.** No per-agent `npm test`. Agents emit `verification_skipped: true` at the top level of their output JSON to make the contract explicit (`scripts/agent-output.schema.json` `verification_skipped`). +2. **Agents make edits, write their output JSON, and stop.** No per-agent `npm test`. Agents emit `verification_skipped: true` at the top level of their output JSON to make the contract explicit (`packages/schemas/src/json/agent-output.schema.json` `verification_skipped`). 3. **`swarm collect` propagates the flag.** When any agent marks `verification_skipped: true`, the collect report sets `serial_verify_required: true` and the CLI surfaces a Next-step hint (`packages/dogfood-swarm/commands/collect.js` `serial_verify_required`). 4. **Coordinator runs ONE `npm run verify` against the cumulative tree** before promoting the wave. This is the only authoritative verification for the wave.
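
The propagation rule in step 3 can be sketched as a small function. This is a minimal illustration under stated assumptions, not the code in `packages/dogfood-swarm/commands/collect.js`: the function name `summarizeCollect`, the `agents`, `skipped_domains`, and `next_step` report fields, and the hint wording are hypothetical. Only `domain`, `verification_skipped`, and `serial_verify_required` come from the text above.

```javascript
// Hypothetical sketch of the step-3 rule: if ANY agent output carries
// `verification_skipped: true`, the collect report must demand one serial
// verify of the cumulative tree. Not the real collect.js implementation.
function summarizeCollect(agentOutputs) {
  // Only a strict boolean `true` counts; a missing or falsy flag means the
  // agent made no claim about skipping verification.
  const skippedDomains = agentOutputs
    .filter((output) => output.verification_skipped === true)
    .map((output) => output.domain);

  const report = {
    agents: agentOutputs.length,                       // assumed field
    serial_verify_required: skippedDomains.length > 0, // named in the protocol
    skipped_domains: skippedDomains,                   // assumed field
  };

  if (report.serial_verify_required) {
    // Assumed wording for the Next-step hint the CLI surfaces.
    report.next_step =
      'Run ONE `npm run verify` against the cumulative tree before promoting the wave.';
  }
  return report;
}

// Usage: one of two agents skipped its own verification.
const report = summarizeCollect([
  { domain: 'backend', summary: 'applied fixes', verification_skipped: true },
  { domain: 'docs', summary: 'updated README' },
]);
console.log(report.serial_verify_required); // true
```

Treating the flag as "any agent" rather than "all agents" is the conservative reading of step 3: a single unverified edit is enough to make the per-agent results non-authoritative for the wave.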