
test(policy-scanner): expand generated path eval corpus #116

Merged: t3chn merged 1 commit into main from codex/policy-scanner-generated-path-corpus on May 16, 2026.

t3chn (Contributor) commented on May 16, 2026

Summary

  • Expands the policy scanner eval corpus from 75 to 82 fixtures.
  • Adds generated-path dependency fixtures to make the current policy boundary explicit.
  • Keeps top-level generated/ manifest files as dependency review signals.
  • Keeps generated dependency examples under existing non-production prefixes excluded.
  • Updates the frozen policy scanner baseline and README metrics.

Why

  • generated/package.json is ambiguous: it may be disposable output, but it may also affect runtime or supply-chain surfaces.
  • Rather than changing scanner behavior immediately, this PR records the current intended boundary in the eval corpus.
  • This gives future scanner changes a measured baseline for deciding whether generated manifest paths should be narrowed later.

What changed

  • Added adversarial generated manifest fixtures for:
    • generated/client/package.json
    • generated/requirements.txt
    • generated/go.mod
  • Added negative generated-path fixtures for:
    • docs/generated/package.json
    • examples/generated/requirements.txt
    • tests/generated/go.mod
    • generated/metadata.txt
  • Updated evals/policy_scanner/baselines/current.json for the 82-fixture corpus.
  • Updated evals/policy_scanner/README.md with the generated-path policy note.

Review focus

  • Confirm this is eval-only and does not change scanner behavior.
  • Confirm the generated-path boundary is the intended current policy:
    • top-level generated/ manifests still produce dependency review signals;
    • generated examples under docs/, examples/, and tests/ remain excluded;
    • non-manifest generated text remains ignored.
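The boundary listed above can be sketched as a small path classifier. This is a minimal sketch, not the scanner's actual implementation: the function name `dependency_review_signal`, the manifest set, and the excluded-prefix set are hypothetical and are populated only from the paths named in this PR (the real rule catalog is likely broader). It checks the seven new fixtures against their intended labels.

```python
from pathlib import PurePosixPath

# Hypothetical: manifest basenames taken from this PR's fixtures only;
# the real scanner's manifest catalog is likely larger.
MANIFEST_NAMES = frozenset({"package.json", "requirements.txt", "go.mod"})

# Hypothetical: the non-production prefixes named in this PR (docs/, examples/, tests/).
EXCLUDED_PREFIXES = frozenset({"docs", "examples", "tests"})


def dependency_review_signal(path: str) -> bool:
    """Return True if a changed path should raise a dependency review signal."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    # Non-production prefixes are excluded first, whether or not they contain generated/.
    if parts[0] in EXCLUDED_PREFIXES:
        return False
    # Everything else, including top-level generated/, is judged by filename alone,
    # so generated/metadata.txt (not a manifest) is ignored.
    return parts[-1] in MANIFEST_NAMES


# Intended labels for the seven fixtures added in this PR.
NEW_FIXTURES = {
    "generated/client/package.json": True,
    "generated/requirements.txt": True,
    "generated/go.mod": True,
    "docs/generated/package.json": False,
    "examples/generated/requirements.txt": False,
    "tests/generated/go.mod": False,
    "generated/metadata.txt": False,
}

if __name__ == "__main__":
    for path, expected in NEW_FIXTURES.items():
        actual = dependency_review_signal(path)
        status = "ok" if actual == expected else "MISMATCH"
        print(f"{status:8} {path} -> {actual}")
```

Checking the prefix before the filename is what lets docs/generated/package.json drop out while generated/client/package.json still signals; narrowing generated/ later would amount to adding one more prefix rule and flipping three fixture labels.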

Test plan

  • bash tests/test-policy-scanner-evals.sh
  • bash tests/test-policy-scanner-eval-compare.sh
  • bash tests/test-signum-evolve-v1.sh
  • PATH=/opt/homebrew/bin:$PATH bash scripts/run-deterministic-tests.sh

Not changed

  • Scanner behavior
  • Policy rule catalog
  • Claude overlay runtime
  • Codex prompt
  • CI wiring
  • policy_scan.json output shape

Current metrics

  • fixtureCount: 82
  • passed: 82
  • failed: 0
  • precision: 1.0
  • recall: 1.0
  • f1: 1.0
  • criticalRecall: 1.0
  • criticalFalseNegatives: 0
  • determinismScore: 1.0
  • comparison status: equivalent
  • regressions: []
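For readers unfamiliar with the metric names, the sketch below shows one conventional way precision, recall, F1, a determinism score, and a baseline comparison could be computed. It is not the eval harness's code: the names `Counts`, `score`, `determinism_score`, and `compare`, the zero-denominator convention, and the `regressed`/`improved` status labels are hypothetical; only `equivalent` and the metric names come from the report above. `criticalRecall` would be the same recall formula restricted to fixtures tagged as critical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion counts over fixtures whose expected verdict is signal / no signal."""
    tp: int  # signal expected, scanner emitted it
    fp: int  # no signal expected, scanner emitted one anyway
    fn: int  # signal expected, scanner missed it


def ratio(num: float, den: float, empty: float) -> float:
    # Hypothetical convention: return `empty` when the denominator is zero.
    return num / den if den else empty


def score(c: Counts) -> dict[str, float]:
    precision = ratio(c.tp, c.tp + c.fp, empty=1.0)  # no emitted signals: nothing wrong
    recall = ratio(c.tp, c.tp + c.fn, empty=1.0)     # no expected signals: nothing missed
    f1 = ratio(2 * precision * recall, precision + recall, empty=0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


def determinism_score(runs: list[dict[str, bool]]) -> float:
    """Fraction of fixtures whose verdict is identical across repeated runs."""
    if not runs:
        return 1.0
    fixtures = list(runs[0])
    stable = sum(1 for f in fixtures if len({run.get(f) for run in runs}) == 1)
    return ratio(stable, len(fixtures), empty=1.0)


def compare(baseline: dict[str, float], current: dict[str, float]) -> tuple[str, list[str]]:
    """List metrics that dropped below the frozen baseline."""
    regressions = [name for name, value in baseline.items() if current.get(name, 0.0) < value]
    if regressions:
        return "regressed", regressions
    if all(current.get(name) == value for name, value in baseline.items()):
        return "equivalent", []
    return "improved", []


if __name__ == "__main__":
    # The seven fixtures added in this PR: 3 expected signals, 4 expected non-signals, all passing.
    current = score(Counts(tp=3, fp=0, fn=0))
    print(current)                    # {'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
    print(compare(current, current))  # ('equivalent', [])
```

With every fixture passing there are no false positives or false negatives, so all three ratios are 1.0, which is why any single missed or spurious signal in a future scanner change would show up immediately as a listed regression.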

Risks

  • narrow - Baseline fixture count changes, but runtime scanner behavior is unchanged.

Rollout / migration

  • N/A. Eval-only corpus expansion.

Breaking changes

  • None.

Follow-ups

  • If real historical artifacts show generated/ manifests are mostly disposable noise, propose a separate measured scanner behavior PR.

Merge strategy recommendation

  • Recommended: squash
  • Reason: this is one eval-only corpus expansion commit.

Why:
- Generated manifest paths are ambiguous because they may be disposable output or runtime/supply-chain inputs.
- That policy boundary should be made explicit in the frozen eval corpus before any future scanner behavior change.

What changed:
- Add generated-path policy scanner fixtures that preserve top-level generated manifest files as dependency review signals.
- Add negative fixtures for generated dependency examples under existing non-production prefixes and for generated non-manifest text.
- Update the policy scanner baseline and README metrics for the 82-fixture corpus.

Testing:
- bash tests/test-policy-scanner-evals.sh
- bash tests/test-policy-scanner-eval-compare.sh
- bash tests/test-signum-evolve-v1.sh
- PATH=/opt/homebrew/bin:$PATH bash scripts/run-deterministic-tests.sh

Risk:
- narrow - This changes eval coverage and baseline fixture count only; scanner behavior and policy rule catalog are unchanged.

Constraint: Do not change scanner behavior, policy rules, Codex prompt, Claude overlay runtime, CI wiring, or generated experiment output.

github-actions (bot) added the intake/pass label (PR intake passed) on May 16, 2026.
t3chn marked this pull request as ready for review on May 16, 2026 17:22.
t3chn merged commit 46091cb into main on May 16, 2026; 4 checks passed.
t3chn deleted the codex/policy-scanner-generated-path-corpus branch on May 16, 2026 17:22.