ZFP Overhaul — 13-layer zero-false-positive pipeline #1
VoidChecksum wants to merge 4 commits into PurpleAILAB:main
Add a full Zero-False-Positive (ZFP) pipeline in front of the existing
Vigilo workflow so that High/Critical findings are only promoted after
surviving independent PoC, dup, severity, adversarial, and vaccine-loop
gates.
## New agents (packages/claude/agents/)
- verifier.md — single ZFP quality gate, runs 8 gates including L13 RCA
distinctness semantic check
- judge.md — cross-family severity calibrator using C4/Sherlock
rubrics; auditor-family ≠ judge-family
- griller.md — adversarial FP hunter, 3 rounds, variant: max
- poc-generator.md — Foundry PoC emitter (gpt-5.2-codex)
- patcher.md — minimal fix (≤10 lines) tied to Root Cause
- re-verifier.md — vaccine loop closer; post-patch PoC must FAIL to
confirm bug is real (opus-4-5, different tier)
- economic-auditor.md — GPT-primary auditor for invariant violations
(LTV/share-price/no-free-lunch)
- invariant-tester.md — Foundry + Medusa invariant fuzz generator
- dup-detector.md — corpus similarity (haiku) with ~20k finding index
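The vaccine loop closed by re-verifier.md can be read as a small decision rule: the PoC must succeed against the vulnerable code and fail once the patch is applied. The sketch below is illustrative only. The type names, the three-way verdict, and the treatment of compile errors are assumptions, not taken from the agent prompts.

```typescript
// Hypothetical outcome of one PoC run; "pass" means the exploit succeeded.
type PocRun = "pass" | "fail" | "compile-error";

// Hypothetical verdict labels, chosen for this sketch only.
type VaccineVerdict = "confirmed" | "rejected" | "inconclusive";

// A real bug should be exploitable before the patch and not after it.
function vaccineVerdict(prePatch: PocRun, postPatch: PocRun): VaccineVerdict {
  if (prePatch === "compile-error" || postPatch === "compile-error") {
    return "inconclusive"; // assumption: a broken build proves nothing either way
  }
  if (prePatch !== "pass") {
    return "rejected"; // the exploit never worked on the vulnerable code
  }
  // Post-patch FAIL confirms the bug. A post-patch PASS means the PoC is not
  // tied to the patched root cause, so the finding is not confirmed.
  return postPatch === "fail" ? "confirmed" : "rejected";
}
```

Under these assumptions, only the pair (pre-patch pass, post-patch fail) confirms a finding.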
## 13-layer ZFP pipeline (vigilo.md Phase 3)
L1 static pre-pass deprio known-class
L2 auditor hypothesis w/ RCA
L3 PoC generation
L4 PoC compile
L5 PoC passes vulnerable state
L5' invariant fuzzer counterexamples
L6 determinism (two runs)
L7 corpus dup-check
L8 non-vacuous assertion + impact match
L9 post-patch PoC FAIL = bug real
L10 severity judge (cross-family)
L11 3-round adversarial grill (variant: max)
L12 cross-auditor consensus boost
L13 RCA semantic distinctness
Findings promote only when every applicable gate PASSes.
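The promotion rule stated above can be sketched as a fold over per-layer results. This is a minimal illustration, assuming each gate reports PASS, FAIL, or N/A (not applicable). The `GateResult` shape and the function names are hypothetical and do not come from vigilo.md.

```typescript
// Hypothetical per-layer result; "N/A" marks a gate that does not apply.
type GateStatus = "PASS" | "FAIL" | "N/A";

interface GateResult {
  layer: string; // e.g. "L4" or "L5'"
  status: GateStatus;
}

// Promote only when at least one gate applied and every applicable gate passed.
function shouldPromote(results: GateResult[]): boolean {
  const applicable = results.filter((r) => r.status !== "N/A");
  return applicable.length > 0 && applicable.every((r) => r.status === "PASS");
}

// Name of the first failing layer, useful when reporting a rejection.
function firstFailure(results: GateResult[]): string | undefined {
  const failed = results.filter((r) => r.status === "FAIL");
  return failed.length > 0 ? failed[0].layer : undefined;
}
```

Requiring at least one applicable gate is a design choice of this sketch: it keeps a finding with no evidence at all from being promoted by default.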
## Model routing rewrite (src/shared/model-requirements.ts)
- Opus-4-6 critical path (cheaper than 4-7 while keeping reasoning depth);
Opus-4-5 secondary, Opus-3 reserve fallback
- GPT-5.2 / gpt-5.2-codex primary for code-gen + cross-family auditors
- pickJudgeForAuditor() helper enforces family diversity between auditor
and judge to break shared-prior collusion
- `variant: max` reserved for griller only (single most expensive role)
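The family-diversity rule behind pickJudgeForAuditor() can be sketched as below. This is not the helper from src/shared/model-requirements.ts. The `ModelSpec` shape, the candidate list, the family labels, and the name `pickJudgeSketch` are assumptions made for illustration. Only the model ids are taken from the list above.

```typescript
// Hypothetical model description; the real routing table may differ.
type Family = "claude" | "gpt";

interface ModelSpec {
  id: string;
  family: Family;
}

// Candidate judges in preference order (ids from the routing notes above).
const JUDGE_CANDIDATES: ModelSpec[] = [
  { id: "opus-4-6", family: "claude" },
  { id: "gpt-5.2", family: "gpt" },
];

// Pick the first judge whose family differs from the auditor's family,
// so auditor and judge do not share the same priors.
function pickJudgeSketch(
  auditor: ModelSpec,
  candidates: ModelSpec[] = JUDGE_CANDIDATES,
): ModelSpec {
  const judge = candidates.filter((c) => c.family !== auditor.family)[0];
  if (judge === undefined) {
    throw new Error(`no judge available outside family "${auditor.family}"`);
  }
  return judge;
}
```

Throwing when no cross-family judge exists, rather than falling back to a same-family judge, keeps the diversity guarantee explicit. Whether the real helper does the same is not stated in the source.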
## Finding schema (skills/vulnerability-base/SKILL.md)
- New Iron Law #5: Root Cause ≠ Symptom
- Top-level `## Root Cause` section required
- L13 semantic check: Verifier rejects findings where RCA paraphrases the
symptom; two worked RCA examples (reentrancy + oracle) showing good vs
bad framings
- Quality checklist extended
## Scripts
- scripts/static-prepass.sh — Slither + Semgrep + Aderyn parallel run,
outputs .vigilo/prepass.md; handles missing tools gracefully
- scripts/corpus-ingest.py — clones top-N Code4rena + Sherlock findings
repos in parallel, extracts severity via 5 strategies
- scripts/corpus-stats.sh — corpus dashboard (source/severity/protocol/year)
- scripts/dup-query.py — kNN query with ngram Jaccard + token overlap +
protocol filter; JSON output consumed by dup-detector agent
- scripts/corpus-bootstrap.sh — wrapper + pgvector schema init for v2
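The n-gram Jaccard scoring with a protocol filter that dup-query.py is described as using can be illustrated as follows. The actual script is Python. This TypeScript sketch assumes character trigrams over lowercased titles, omits the token-overlap term, and uses a hypothetical `Finding` shape.

```typescript
// Hypothetical corpus record; the real index schema is not shown in the PR.
interface Finding {
  id: string;
  title: string;
  protocol: string;
}

// Character n-grams of a normalized string (assumption: n = 3).
function charNgrams(text: string, n: number = 3): Set<string> {
  const norm = text.toLowerCase().replace(/\s+/g, " ").trim();
  const grams = new Set<string>();
  for (let i = 0; i + n <= norm.length; i++) {
    grams.add(norm.slice(i, i + n));
  }
  return grams;
}

// Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((g) => {
    if (b.has(g)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Score every finding (optionally restricted to one protocol) and keep the top k.
function topK(query: string, corpus: Finding[], k: number, protocol?: string) {
  const q = charNgrams(query);
  return corpus
    .filter((f) => protocol === undefined || f.protocol === protocol)
    .map((f) => ({ id: f.id, score: jaccard(q, charNgrams(f.title)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```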
## Infrastructure
- pgvector container on :5433 ready for v2 semantic similarity
- vigilo-corpus/ structure documented in docs/ZFP-OVERHAUL.md
## CI
- .github/workflows/zfp-bench.yml — runs ScaBench regression on pushes +
PRs; fails if valid-finding rate regresses >2% vs baseline
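The regression gate above reduces to one comparison. The sketch below assumes the 2% threshold means an absolute drop of two percentage points in the valid-finding rate. The workflow description does not say whether the threshold is absolute or relative, and the function names are hypothetical.

```typescript
// Valid-finding rate as a fraction in [0, 1].
function validRate(valid: number, total: number): number {
  if (total <= 0) throw new Error("total must be positive");
  return valid / total;
}

// Assumption: "regresses >2%" is an absolute drop of more than 0.02.
function hasRegressed(
  baselineRate: number,
  currentRate: number,
  maxDrop: number = 0.02,
): boolean {
  return baselineRate - currentRate > maxDrop;
}
```

A CI step built on this would exit non-zero when `hasRegressed` returns true. Improvements and small drops within the threshold pass.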
## Build
- packages/opencode/build.mjs switched from `bun build` CLI to Bun.build()
API because `bun build` collides with the `build` script slot on
bun >= 1.3
## Docs
- docs/ZFP-OVERHAUL.md — design rationale, 13-layer table, roadmap
- docs/INSTALL-LOCAL.md — how to point opencode-web3 / Claude Code at the
local build; cost budgeting per role
## Corpus (external, not in tree)
Populated at ~/.vigilo-corpus/ with 20,789 indexed findings across 120
repos (60 C4 + 60 Sherlock, 2022–2025). Severity extracted from path,
filename suffix (-G/-Q), title tags [H-01], explicit "Severity:" lines,
and Sherlock "Issue H-1" patterns.
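Three of the text-based extraction strategies listed above (title tags, explicit "Severity:" lines, and Sherlock "Issue H-1" patterns) can be sketched as an ordered chain of regular expressions. The real ingest script is Python. The regexes, the strategy order, and the High/Medium/Low label set below are assumptions, and the path and filename-suffix strategies are left out.

```typescript
type Severity = "High" | "Medium" | "Low";

const BY_LETTER: Record<string, Severity> = { H: "High", M: "Medium", L: "Low" };
const BY_WORD: Record<string, Severity> = { high: "High", medium: "Medium", low: "Low" };

// Each strategy returns a severity or undefined; the first hit wins.
const STRATEGIES: Array<(text: string) => Severity | undefined> = [
  // Title tag such as "[H-01]"
  (text) => {
    const m = /\[([HML])-\d+\]/i.exec(text);
    return m ? BY_LETTER[m[1].toUpperCase()] : undefined;
  },
  // Explicit line such as "Severity: Medium"
  (text) => {
    const m = /^\s*Severity:\s*(high|medium|low)\b/im.exec(text);
    return m ? BY_WORD[m[1].toLowerCase()] : undefined;
  },
  // Sherlock-style heading such as "Issue H-1"
  (text) => {
    const m = /\bIssue\s+([HML])-\d+\b/i.exec(text);
    return m ? BY_LETTER[m[1].toUpperCase()] : undefined;
  },
];

function extractSeverity(text: string): Severity | undefined {
  for (const strategy of STRATEGIES) {
    const found = strategy(text);
    if (found !== undefined) return found;
  }
  return undefined;
}
```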
The 'plugins' array-of-objects shape was the legacy schema; current
opencode-web3 requires 'plugin' as a flat array of paths/specs and rejects
the old shape with:
Error: Configuration is invalid at packages/opencode/opencode.json
Unrecognized key: 'plugins'
Migrate to the current schema so the plugin loads in fresh sessions.
Smoke-test findings: pre-existing upstream blocker
Ran the linked local build against the current 1.
The plugin bundle was built with `--target bun` and called Bun.* APIs
directly at module top-level, which broke when opencode ran under a Node
runtime:
Cannot destructure property 'spawn' of 'globalThis.Bun' as it is undefined
## Compat shim (new: src/shared/bun-compat.ts)
- spawn() — prefers Bun.spawn, falls back to child_process.spawn with
a Bun-compatible handle shape (stdout/stderr as WebStream,
exited promise, exitCode, kill)
- spawnSync() — prefers Bun.spawnSync, falls back to child_process.spawnSync
- readFileText() — Bun.file().text() → fs/promises.readFile(..., 'utf8')
- writeFile() — Bun.write(...) → fs/promises.writeFile(...)
- type Subprocess — generic alias, source-compat with 'bun' import
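The pattern behind the shim is to resolve `globalThis.Bun` lazily inside each function instead of at module top level, then fall back to a Node built-in. The sketch below shows only the readFileText() case. The `BunLike` interface and the `getBun` helper are hypothetical names for this illustration, not the contents of src/shared/bun-compat.ts.

```typescript
import { readFile } from "node:fs/promises";

// Minimal structural type for the single Bun API used here
// (an assumption for this sketch, not Bun's full typings).
interface BunLike {
  file(path: string): { text(): Promise<string> };
}

// Look up Bun at call time so importing this module never throws under Node.
function getBun(scope: object = globalThis): BunLike | undefined {
  return (scope as { Bun?: BunLike }).Bun;
}

// Prefer Bun.file().text(); otherwise use fs/promises.readFile(..., "utf8").
async function readFileText(path: string): Promise<string> {
  const bun = getBun();
  return bun ? bun.file(path).text() : readFile(path, "utf8");
}
```

Because the lookup happens per call, a top-level destructuring such as the one in the error above never runs when the bundle is imported under Node.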
## Call-site migration (11 files)
- src/tools/ast-grep/cli.ts
- src/tools/interactive-bash/utils.ts
- src/tools/interactive-bash/tools.ts
- src/tools/grep/cli.ts
- src/tools/grep/downloader.ts
- src/tools/lsp/client.ts (incl. 'type Subprocess')
- src/tools/foundry/utils.ts
- src/tools/glob/cli.ts
- src/shared/tmux/tmux-utils.ts
- src/shared/zip-extractor.ts
- src/features/claude-code-mcp-loader/loader.ts
All 'from "bun"' imports redirected to shared bun-compat layer. CLI-only
files (src/cli/*.ts) still use Bun.* directly — they're not part of the
plugin bundle and run under the bun runtime.
## Build
build.mjs tolerates tsc declaration-emit errors (test files import
'bun:test', a few type nits in lsp/client.ts). Bundler still emits a
usable .js; .d.ts is emitted where possible. Fails the build only if the
Bun.build() bundler itself errors.
## ZFP agent TS factories (new: src/agents/zfp-factories.ts)
9 factories (verifier, judge, griller, poc-generator, patcher, re-verifier,
economic-auditor, invariant-tester, dup-detector) that read the full
agent prompt from the co-located Claude plugin (../claude/agents/*.md) at
factory time and register into the opencode agent registry via the
existing createBuiltinAgents() pipeline.
Falls back to a stub prompt (pointing at the MD path) if the Claude plugin
isn't present — preserves graceful degradation.
Wired into src/agents/utils.ts so 'opencode run' sees all ZFP agents and
vigilo.md's Phase 3 delegate_task() calls actually resolve.
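The load-with-stub-fallback behaviour described above might look roughly like this. It is a sketch under stated assumptions: the `AgentConfig` shape, the injected `ReadText` reader, the stub wording, and the name `createZfpAgentSketch` are hypothetical, and registration through createBuiltinAgents() is not shown.

```typescript
// Hypothetical minimal agent config; the real registry type is richer.
interface AgentConfig {
  name: string;
  prompt: string;
}

// Reader injected by the caller; returns undefined when the file is missing.
type ReadText = (path: string) => string | undefined;

// Build one agent config, falling back to a stub that points at the MD path.
function createZfpAgentSketch(
  name: string,
  read: ReadText,
  agentsDir: string = "../claude/agents",
): AgentConfig {
  const path = `${agentsDir}/${name}.md`;
  const loaded = read(path);
  const prompt =
    loaded !== undefined
      ? loaded
      : `Prompt file not found. See ${path} for the full agent instructions.`;
  return { name, prompt };
}
```

Injecting the reader keeps the factory testable without touching the filesystem. A real factory would more likely read the file directly.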
## Verified
opencode-web3 now lists all 9 ZFP agents alongside the 12 existing ones.
Plugin loads without the prior 'globalThis.Bun is undefined' error.
`scoreBaseline()` called `matchTruthFinding()`, which invokes `sendPrompt()`, but unlike `runScorer()`, `scoreBaseline()` never called `initOpenCodeClient()` first. Result: every run exited with
[bench] ERROR: OpenCode client not initialized. Call initOpenCodeClient() first.
regardless of whether baseline and truth data were present.
Call `initOpenCodeClient(config.model)` at the top of `scoreBaseline()` so the two scoring paths have equivalent init behavior.
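The failure mode and the fix can be reproduced with a toy model of the client. The functions below reuse the names from the commit message but are simplified, synchronous stand-ins written for this illustration. They are not the code in packages/bench.

```typescript
// Toy stand-in for the shared client state (assumption: one module-level client).
let client: { model: string } | undefined;

function initOpenCodeClient(model: string): void {
  client = { model };
}

// Fails in the same way as the bench error when no client was initialized.
function sendPrompt(prompt: string): string {
  if (client === undefined) {
    throw new Error("OpenCode client not initialized. Call initOpenCodeClient() first.");
  }
  return `[${client.model}] ${prompt}`;
}

// With the fix: initialize the client before any prompt is sent.
function scoreBaseline(config: { model: string }, findings: string[]): string[] {
  initOpenCodeClient(config.model);
  return findings.map((f) => sendPrompt(`match: ${f}`));
}
```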
Runtime-compat landed: plugin loads under opencode
Two follow-up commits on this PR:
- d6a8642: runtime-compat shim + ZFP agent TS factories
- 563a17a: bench fix
Verified under opencode-web3 1.14.20
All 21 agents register. Plugin loads without the
E2E alchemix-v3 in progress
Kicked off Phase 1 recon (explorator + speculator) against the existing
Bench E2E: partial
## Summary
Adds a Zero-False-Positive (ZFP) pipeline in front of the existing Vigilo
audit workflow so that High/Critical findings promote only after surviving
independent PoC, dup, severity, adversarial, and vaccine-loop gates.
Goal: raise the valid-finding / Critical-accept rate on Cantina, Sherlock,
Code4rena, and Immunefi submissions.
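## 13-layer ZFP pipeline (`vigilo.md` Phase 3)
| Layer | Check | Owner |
|---|---|---|
| L1 | static pre-pass, deprioritize known classes | `scripts/static-prepass.sh` |
| L2 | auditor hypothesis with RCA | |
| L3 | PoC generation | `poc-generator` |
| L4 | PoC compile | `verifier` (G3) |
| L5 | PoC passes in vulnerable state | `verifier` (G4) |
| L5' | invariant fuzzer counterexamples | `invariant-tester` (parallel) |
| L6 | determinism (two runs) | `verifier` (G5) |
| L7 | corpus dup-check | `dup-detector` |
| L8 | non-vacuous assertion + impact match | `verifier` (G6, G7) |
| L9 | post-patch PoC FAIL = bug real | `re-verifier` |
| L10 | severity judge (cross-family) | `judge-{claude,gpt}` |
| L11 | 3-round adversarial grill | `griller` (`variant: max`) |
| L12 | cross-auditor consensus boost | |
| L13 | RCA semantic distinctness | `verifier` (G8) |

Findings promote only when every applicable gate returns PASS.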
## New agents (9, `packages/claude/agents/`)
`verifier`, `judge`, `griller`, `poc-generator`, `patcher`, `re-verifier`, `economic-auditor`, `invariant-tester`, `dup-detector`.
Auto-discovered by the existing Claude plugin manifest; no registration change needed.
## Routing rewrite (`src/shared/model-requirements.ts`)
- `pickJudgeForAuditor()` enforces auditor-family ≠ judge-family to break shared-prior collusion
- `variant: max` reserved for griller only (highest-cost role, runs last)
## Finding schema (`skills/vulnerability-base/SKILL.md`)
- `## Root Cause` section required
- Verifier rejects findings where the RCA paraphrases the symptom; worked RCA examples (reentrancy + oracle) included
## Tooling
- `scripts/static-prepass.sh`: Slither + Semgrep + Aderyn parallel runner
- `scripts/corpus-ingest.py`: clones top-N C4 + Sherlock findings repos, indexes severity via 5 extraction strategies
- `scripts/corpus-stats.sh`: corpus dashboard
- `scripts/dup-query.py`: kNN query (ngram Jaccard + protocol filter)
- `scripts/corpus-bootstrap.sh`: top-level wrapper + pgvector schema init
## CI
`.github/workflows/zfp-bench.yml` runs the `packages/bench` ScaBench regression on pushes + PRs; fails if valid-rate regresses >2% vs recorded baseline. Replay-mode only (no live LLMs in CI).
## Docs
- `docs/ZFP-OVERHAUL.md`: full design rationale + roadmap
- `docs/INSTALL-LOCAL.md`: local dev setup + cost budgeting per role
## Build change
`packages/opencode/build.mjs` switched from the `bun build` CLI to the `Bun.build()` API: the `bun build` subcommand collides with the `build` script slot on bun ≥ 1.3.
## External corpus (not in tree)
20,789 findings indexed at `~/.vigilo-corpus/` across 60 C4 + 60 Sherlock contests (2022–2025). Used by `dup-detector`.
## Test plan
- […] `packages/opencode` (`npx tsc --noEmit`)
- […] (`bun build.mjs` → `build ok`)
- `static-prepass.sh` runs end-to-end on `alchemix-v3` (Slither, Semgrep, Aderyn all detected and invoked)
- `dup-query.py` returns semantically relevant hits (tested with "Reentrancy in withdraw" → top hit matches)
- `/audit` run on `alchemix-v3` with new pipeline (requires interactive opencode-web3 session; not runnable in CI)
- […] `baseline-summary.json`)
## Cost budget (per candidate finding, with ZFP full pipeline)
~$3 per candidate. Griller is the single largest line item at ~$1.80
(3 × $0.60 Opus max). Disable via `--no-grill` when iterating on non-Critical findings.
## Backwards compatibility
- […] `vigilo`, `quaestor`, `explorator`, `speculator`, 8 specialist auditors) unchanged in behavior
- […] validation with delegated gate chain)
- `/audit` flow still works; ZFP is additive
## Kept as draft
Marking draft pending live-E2E validation. Once `alchemix-v3` regression numbers are in (TP/FP rate vs prior audit), will mark ready.