Summary
Add an Install Route Test Plan documentation page plus a small reusable
evaluation kit so that contributors and adopters can compare CodeGuard
delivery models (rule files, Agent Skills, and MCP) in a structured,
repeatable way.
This is intentionally a follow-up to #60, which added the install-path
documentation and the CoSAI personas page. Splitting it out keeps the
docs PR focused on the install routes themselves, and lets this PR ship
the test plan alongside the fixtures and prompts it depends on.
Motivation
The repository already validates rule integrity and generated bundles
(src/validate_unified_rules.py, src/convert_to_ide_formats.py,
.github/workflows/validate-rules.yml). Those checks do not measure:
- whether a given AI client actually activates CodeGuard guidance
consistently,
- whether skills are invoked reliably enough in a target workflow,
- whether MCP latency or availability is acceptable in practice,
- whether context efficiency is materially better for a given repo,
- which delivery mechanism is easiest for a team to operate.
Today there is no consistent, repeatable way for an evaluator to answer
those questions side-by-side across rule files, Agent Skills, and
MCP. This issue tracks closing that gap.
Proposed deliverables
Documentation
docs/install-route-test-plan.md (new), covering:
- accuracy, repeatability, latency, context efficiency, installation
effort, update workflow, offline behavior, auditability, and
governance fit.
- file-scope activation, security-sensitive generation, false-positive
control, repeated-run consistency, latency, offline/degraded
behavior, update workflow, and team reproducibility.
- automation guidance, and a recommended first-pass workflow.
- the delivery-models table with the responsible CoSAI personas, and a
persona-aligned ownership summary in the decision rubric.
Re-add cross-links removed in #60 (docs: add CoSAI persona mapping and
expand install-path documentation) once this page exists:
- docs/install-paths.md: Install Route Test Plan link in the Bottom
line section.
- docs/getting-started.md: Install Route Test Plan and
tests/install-routes/ references in the Testing the Integration
section.
- docs/personas.md: cross-link to the persona-aligned ownership
summary, plus an entry in Further Reading.
- mkdocs.yml: Install Route Test Plan: install-route-test-plan.md nav
entry between Choosing an Install Path and CoSAI Personas.
Evaluation kit (tests/install-routes/)
README.md explaining how to use the kit alongside the test plan.
prompts/:
security-generation.md — password hashing, session handling,
parameterized data access prompts.
security-review.md — review prompts that pair with the language
fixtures.
scope-control.md — code-vs-Markdown comparison prompts.
false-positive-control.md — non-security prompts to detect
over-activation.
fixtures/:
python/password_storage_review.py — intentionally weak password
storage example for review tests.
javascript/session_login_review.js — intentionally weak login flow
with concatenated SQL and unsafe cookies for review tests.
typescript/input_validation_review.ts — intentionally weak API
handler with concatenated SQL and missing authorization for review
tests.
markdown/changelog.md — benign changelog used as the
"should NOT trigger security guidance" control file.
results/first-pass-template.md — copy-per-route template capturing
test metadata, per-test results, scored summary, and a preliminary
recommendation.
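To make the fixture idea concrete, here is a minimal sketch of what a file like python/password_storage_review.py could contain. The names and weaknesses shown are illustrative assumptions, not the actual deliverable; the point is a small, runnable file whose flaws a reviewing client should flag:

```python
"""EVALUATION BAIT: this file is intentionally insecure.

It exists only to test whether an AI client flags the weaknesses
below. Do not "fix" it, and do not copy these patterns.
"""
import hashlib

# In-memory user store; values are unsalted MD5 digests (weak on purpose).
USERS: dict[str, str] = {}


def store_password(username: str, password: str) -> None:
    # Weak on purpose: unsalted MD5 is fast and trivially crackable.
    USERS[username] = hashlib.md5(password.encode()).hexdigest()


def verify_password(username: str, password: str) -> bool:
    # Weak on purpose: re-hashes and compares with a plain, non-constant-time check.
    return USERS.get(username) == hashlib.md5(password.encode()).hexdigest()
```

A client reviewing this fixture should recommend a memory-hard password scheme (argon2, scrypt, or bcrypt) with per-user salts, while the markdown/changelog.md control file should draw no security guidance at all.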
Acceptance criteria
- docs/install-route-test-plan.md is added and renders cleanly in
mkdocs serve.
- tests/install-routes/ directory is added with the README, four
prompt files, four fixtures, and the results template.
- Cross-links are in place in install-paths.md, getting-started.md,
personas.md, and mkdocs.yml.
- Fixtures carry a comment explaining that they are evaluation bait,
so static scanners and future readers do not treat the violations as
bugs to fix.
- docs/install-route-test-plan.md works end-to-end against at least
one client/tool using the bundled prompt corpus and fixtures.
Out of scope
- Building a programmatic test harness or CI integration for the
evaluation kit (the first pass is intentionally human-run; automation
hooks are noted as a future enhancement in the test plan).
- Shipping a Project CodeGuard MCP server. The MCP route is evaluated
against an evaluator's own MCP deployment.
- Re-validating CodeGuard rule content. This issue evaluates delivery
mechanisms, not rule correctness.
Related
- #60 (added docs/install-paths.md, docs/personas.md, and the CoSAI
persona model used by this test plan)