feat(signum-evolve): add v1 candidate review layer by t3chn · Pull Request #115 · heurema/signum

t3chn · 2026-05-16T14:13:34Z

Summary

Adds signum-evolve v1 candidate review evidence for offline policy catalog experiments.
Adds bounded multi-prefix scope candidates through maxMutationDepth while preserving v0 default behavior.
Archives deterministic catalog_diff.json per candidate and surfaces compact diff/ranking metadata in the leaderboard.
Extends adoption bundles with catalog diff evidence and checklist coverage.
Adds a v1 smoke test for multi-prefix candidate generation, diff output, export, and scanner/catalog immutability.
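
The immutability part of the v1 smoke test can be illustrated with a content-hash comparison. This is a minimal Python sketch of the idea, not the logic inside tests/test-signum-evolve-v1.sh, which is a shell script whose internals are not shown in this PR. The helper names `snapshot_digests` and `assert_unchanged` are hypothetical, and the choice of which scanner/catalog files count as protected is left to the caller.

```python
import hashlib
from pathlib import Path


def snapshot_digests(paths):
    """Map each path to the SHA-256 of its bytes, or None if it is missing (hypothetical helper)."""
    digests = {}
    for p in paths:
        path = Path(p)
        digests[str(path)] = (
            hashlib.sha256(path.read_bytes()).hexdigest() if path.exists() else None
        )
    return digests


def assert_unchanged(before, paths):
    """Raise if any file recorded in `before` was modified or deleted since the snapshot."""
    after = snapshot_digests(paths)
    changed = sorted(p for p in before if before[p] != after.get(p))
    if changed:
        raise AssertionError(f"protected files modified: {changed}")
```

In use, the snapshot is taken before the experiment run and `assert_unchanged` is called after it, so any write to a source scanner or catalog file fails the check.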

Issue: Closes #114

Why

The v0 loop proved candidate generation, eval, compare, replay, and export.
Reviewers now need clearer candidate review evidence before any future adoption PR: what changed in the catalog, how candidates rank, and whether critical rules stayed untouched.

What changed

experiments/signum_evolve/candidate.py can generate deterministic prefix combinations when config sets maxMutationDepth > 1.
experiments/signum_evolve/catalog_diff.py records rule-level catalog diffs without touching source catalogs.
experiments/signum_evolve/report.py adds stable rank/score fields and compact catalog diff metadata to leaderboards.
experiments/signum_evolve/export.py includes catalog diff evidence in adoption bundles.
experiments/signum_evolve/configs/evolve.v1.json enables depth 2 for v1 experiments.
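
A bounded prefix-combination generator could look like the following minimal sketch. It assumes candidates are tuples drawn from a flat list of scope prefixes and that depth N yields every combination of size 1 through N. The function name `scope_candidates`, the example prefixes, and whether candidate.py still emits single-prefix candidates at depth 2 are assumptions, not details taken from this PR.

```python
from itertools import combinations


def scope_candidates(prefixes, max_mutation_depth=1):
    """Yield deterministic tuples of scope prefixes, one tuple per candidate.

    Hypothetical sketch: depth 1 yields one prefix per candidate (v0-style);
    larger depths add multi-prefix combinations up to the bound.
    """
    if max_mutation_depth < 1:
        raise ValueError("maxMutationDepth must be >= 1")
    ordered = sorted(set(prefixes))  # dedupe + sort so output ignores input order
    for size in range(1, max_mutation_depth + 1):
        yield from combinations(ordered, size)
```

Sorting and de-duplicating before calling `itertools.combinations` makes the candidate list independent of the order in which prefixes were discovered, which is what archived runs need in order to be comparable. For example, `list(scope_candidates(["src/", "docs/", "tests/"], 2))` yields the three single-prefix tuples followed by `("docs/", "src/")`, `("docs/", "tests/")`, and `("src/", "tests/")`.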

Review focus

Confirm v0 compatibility: evolve.v0.json still defaults to one scope mutation per candidate.
Confirm v1 remains scope-only and non-critical through existing mutation validation.
Confirm catalog diff output is deterministic and reviewable.
Confirm no runtime scanner/catalog/Codex/CI behavior changed.
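
To make the determinism point concrete, here is a minimal sketch of a rule-level diff. It assumes a catalog can be treated as a mapping from rule id to a JSON-serializable rule body. The real catalog schema and the layout of catalog_diff.json are not shown in this PR, so `catalog_diff`, `render_diff`, and the `added`/`removed`/`changed` keys are hypothetical.

```python
import json


def catalog_diff(base_rules, candidate_rules):
    """Rule-level diff between two catalogs keyed by rule id (hypothetical shape).

    Both inputs are only read; nothing is written back to either catalog.
    """
    base_ids, cand_ids = set(base_rules), set(candidate_rules)
    changed = [
        {"id": rid, "before": base_rules[rid], "after": candidate_rules[rid]}
        for rid in sorted(base_ids & cand_ids)
        if base_rules[rid] != candidate_rules[rid]
    ]
    return {
        "added": sorted(cand_ids - base_ids),
        "removed": sorted(base_ids - cand_ids),
        "changed": changed,
    }


def render_diff(diff):
    """Serialize with sorted keys and a trailing newline so reruns are byte-identical."""
    return json.dumps(diff, sort_keys=True, indent=2) + "\n"
```

Because the id lists are sorted and serialization uses `sort_keys=True`, two runs over the same catalogs produce identical bytes regardless of dict insertion order, which keeps the archived diff easy to review and compare.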

Test plan

python3 -m py_compile experiments/signum_evolve/*.py
bash tests/test-signum-evolve-v0.sh
bash tests/test-signum-evolve-replay.sh
bash tests/test-signum-evolve-v1.sh
bash tests/test-policy-scanner-evals.sh
bash tests/test-policy-scanner-eval-compare.sh
bash tests/test-codex-prompt-evals.sh
bash tests/test-codex-prompt-eval-compare.sh
python3 evals/policy_scanner/run_policy_scanner_eval.py --repo-root . --json-output /tmp/policy-current-v1.json
python3 evals/policy_scanner/compare_policy_scanner_eval.py --baseline evals/policy_scanner/baselines/current.json --candidate /tmp/policy-current-v1.json
PATH=/opt/homebrew/bin:$PATH bash scripts/run-deterministic-tests.sh

Not changed

Scanner behavior
Source policy rule catalog
Claude overlay runtime
Codex prompt
CI wiring
Eval baselines
Candidate auto-apply behavior

Risks

narrow - Candidate ranking fields are new review metadata; future changes should preserve deterministic ordering for archived run comparisons.
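
One way to keep leaderboard ordering deterministic is to rank on a total order with an explicit tie-breaker. The sketch below assumes each leaderboard row carries a numeric `score` and a unique `candidate_id`; those field names, the function name `rank_candidates`, and the tie-break rule are assumptions, not the fields report.py actually writes.

```python
def rank_candidates(rows):
    """Return copies of leaderboard rows with a 1-based "rank" field.

    Hypothetical row shape: {"candidate_id": str, "score": float}. Sorting by
    (-score, candidate_id) is a total order, so ties never depend on input order.
    """
    ordered = sorted(rows, key=lambda r: (-r["score"], r["candidate_id"]))
    return [{**row, "rank": i} for i, row in enumerate(ordered, start=1)]
```

Python's `sorted` is stable, so without the `candidate_id` tie-breaker, equal-score candidates would keep whatever order they arrived in; a rerun that enumerated candidates differently could then reshuffle ranks between archived runs.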

Rollout / migration

N/A. This is an offline experiment harness change only.

Breaking changes

None.

Follow-ups

Add richer candidate scoring only after this review layer is accepted.
Keep future mutation families separate from this PR.

Merge strategy recommendation

Recommended: squash
Reason: this is one logical feature slice with one commit.

Why:
- Reviewers need more than fixture pass/fail when assessing candidate policy catalogs from signum-evolve.
- Issue #114 tracks adding bounded candidate review evidence without changing scanner behavior or auto-applying candidate catalogs.

What changed:
- Add bounded multi-prefix scope candidates through maxMutationDepth while preserving the v0 default depth of 1.
- Add deterministic catalog_diff.json archives, compact leaderboard diff metadata, candidate ranking scores, and adoption bundle diff reporting.
- Add evolve.v1.json and a v1 smoke test covering multi-prefix output, catalog diff export, and source scanner/catalog immutability.

Testing:
- python3 -m py_compile experiments/signum_evolve/*.py
- bash tests/test-signum-evolve-v0.sh
- bash tests/test-signum-evolve-replay.sh
- bash tests/test-signum-evolve-v1.sh
- bash tests/test-policy-scanner-evals.sh
- bash tests/test-policy-scanner-eval-compare.sh
- bash tests/test-codex-prompt-evals.sh
- bash tests/test-codex-prompt-eval-compare.sh
- python3 evals/policy_scanner/run_policy_scanner_eval.py --repo-root . --json-output /tmp/policy-current-v1.json
- python3 evals/policy_scanner/compare_policy_scanner_eval.py --baseline evals/policy_scanner/baselines/current.json --candidate /tmp/policy-current-v1.json
- PATH=/opt/homebrew/bin:$PATH bash scripts/run-deterministic-tests.sh

Risk:
- narrow - Changes are isolated to the offline experiment harness, but candidate ordering and leaderboard scoring should be preserved deliberately because reviewers may compare archived runs.

Constraint: Do not modify scanner/catalog/Codex/runtime/CI behavior, eval baselines, or auto-apply candidates.
Related: #114

github-actions bot added the intake/pass label (PR intake passed) May 16, 2026

t3chn marked this pull request as ready for review May 16, 2026 14:18

t3chn merged commit d0deda1 into main May 16, 2026
4 checks passed

t3chn deleted the codex/signum-evolve-v1-candidate-review branch May 16, 2026 14:18

t3chn merged 1 commit into main from codex/signum-evolve-v1-candidate-review
