test(policy-scanner): expand generated path eval corpus by t3chn · Pull Request #116 · heurema/signum

t3chn · 2026-05-16T17:16:34Z

Summary

Expands the policy scanner eval corpus from 75 to 82 fixtures.
Adds generated-path dependency fixtures to make the current policy boundary explicit.
Keeps top-level generated/ manifest files as dependency review signals.
Keeps generated dependency examples under the existing non-production prefixes (docs/, examples/, tests/) excluded.
Updates the frozen policy scanner baseline and README metrics.

Why

generated/package.json is ambiguous: it may be disposable output, but it may also affect runtime or supply-chain surfaces.
Rather than changing scanner behavior immediately, this PR records the current intended boundary in the eval corpus.
This gives future scanner changes a measured baseline for deciding whether generated manifest paths should be narrowed later.

What changed

Added adversarial generated manifest fixtures for:
- generated/client/package.json
- generated/requirements.txt
- generated/go.mod
Added negative generated-path fixtures for:
- docs/generated/package.json
- examples/generated/requirements.txt
- tests/generated/go.mod
- generated/metadata.txt
Updated evals/policy_scanner/baselines/current.json for the 82-fixture corpus.
Updated evals/policy_scanner/README.md with the generated-path policy note.

Review focus

Confirm this is eval-only and does not change scanner behavior.
Confirm the generated-path boundary is the intended current policy:
- top-level generated/ manifests still produce dependency review signals;
- generated examples under docs/, examples/, and tests/ remain excluded;
- non-manifest generated text remains ignored.
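The boundary above can be read as a small path predicate. The following is only a sketch for reviewers, not the scanner's actual code: the function name, the manifest set, and the prefix set are assumptions taken from the fixture paths listed in this PR, and the real scanner may recognise more manifests or prefixes.

```python
from pathlib import PurePosixPath

# Assumption: only the three manifest names used by this PR's fixtures.
MANIFEST_NAMES = {"package.json", "requirements.txt", "go.mod"}
# Assumption: the "existing non-production prefixes" are the three seen in the fixtures.
NON_PRODUCTION_PREFIXES = {"docs", "examples", "tests"}


def is_dependency_review_signal(path: str) -> bool:
    """Hypothetical stand-in for the scanner's path decision."""
    parts = PurePosixPath(path).parts
    if not parts or parts[-1] not in MANIFEST_NAMES:
        return False  # non-manifest files (e.g. generated/metadata.txt) are ignored
    if parts[0] in NON_PRODUCTION_PREFIXES:
        return False  # manifests under docs/, examples/, tests/ stay excluded
    return True  # anything else, including top-level generated/, is a signal


# The seven fixture paths from this PR with the outcome the description states.
EXPECTED = {
    "generated/client/package.json": True,
    "generated/requirements.txt": True,
    "generated/go.mod": True,
    "docs/generated/package.json": False,
    "examples/generated/requirements.txt": False,
    "tests/generated/go.mod": False,
    "generated/metadata.txt": False,
}

for fixture_path, expected in EXPECTED.items():
    assert is_dependency_review_signal(fixture_path) is expected, fixture_path
```

If the intended policy differs from this predicate for any of the seven paths, that difference is the thing to raise in review.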

Test plan

bash tests/test-policy-scanner-evals.sh
bash tests/test-policy-scanner-eval-compare.sh
bash tests/test-signum-evolve-v1.sh
PATH=/opt/homebrew/bin:$PATH bash scripts/run-deterministic-tests.sh

Not changed

Scanner behavior
Policy rule catalog
Claude overlay runtime
Codex prompt
CI wiring
policy_scan.json output shape

Current metrics

fixtureCount: 82
passed: 82
failed: 0
precision: 1.0
recall: 1.0
f1: 1.0
criticalRecall: 1.0
criticalFalseNegatives: 0
determinismScore: 1.0
comparison status: equivalent
regressions: []
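For readers checking the numbers, here is a rough sketch of how precision, recall, and F1 relate to raw counts, assuming the eval harness uses the standard definitions. The PR does not show the harness code or the split of true and false positives across the 82 fixtures, so the counts below are hypothetical, as is the convention for empty denominators.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard definitions; returning 1.0 for an empty denominator is an assumption."""
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: with no false positives and no false negatives,
# all three metrics are exactly 1.0, matching the shape of the reported values.
assert precision_recall_f1(tp=10, fp=0, fn=0) == (1.0, 1.0, 1.0)

# A single missed signal lowers recall and F1 while precision stays at 1.0.
precision, recall, f1 = precision_recall_f1(tp=9, fp=0, fn=1)
assert precision == 1.0 and recall == 0.9 and f1 < 1.0
```

Under these definitions, a future scanner change that stops flagging one of the top-level generated/ manifests would show up as a recall drop against this baseline.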

Risks

narrow - Baseline fixture count changes, but runtime scanner behavior is unchanged.

Rollout / migration

N/A. Eval-only corpus expansion.

Breaking changes

None.

Follow-ups

If real historical artifacts show generated/ manifests are mostly disposable noise, propose a separate measured scanner behavior PR.

Merge strategy recommendation

Recommended: squash
Reason: this is one eval-only corpus expansion commit.

Why:
- Generated manifest paths are ambiguous because they may be disposable output or runtime/supply-chain inputs.
- The scanner should keep that policy boundary explicit in the frozen eval corpus before any future behavior change.

What changed:
- Add generated-path policy scanner fixtures that preserve top-level generated manifest files as dependency review signals.
- Add negative fixtures for generated dependency examples under existing non-production prefixes and for generated non-manifest text.
- Update the policy scanner baseline and README metrics for the 82-fixture corpus.

Testing:
- bash tests/test-policy-scanner-evals.sh
- bash tests/test-policy-scanner-eval-compare.sh
- bash tests/test-signum-evolve-v1.sh
- PATH=/opt/homebrew/bin:$PATH bash scripts/run-deterministic-tests.sh

Risk:
- narrow - This changes eval coverage and baseline fixture count only; scanner behavior and policy rule catalog are unchanged.

Constraint: Do not change scanner behavior, policy rules, Codex prompt, Claude overlay runtime, CI wiring, or generated experiment output.

github-actions bot added the intake/pass label (PR intake passed) May 16, 2026

t3chn marked this pull request as ready for review May 16, 2026 17:22

t3chn merged commit 46091cb into main May 16, 2026
4 checks passed

t3chn deleted the codex/policy-scanner-generated-path-corpus branch May 16, 2026 17:22

t3chn merged 1 commit into main from codex/policy-scanner-generated-path-corpus
