Skip to content

feat(v0.1.3): audience classifier + --audit-profile suppression + p1-env-hints Pattern 2#26

Merged
brettdavies merged 6 commits intodevfrom
feat/v013-audience-classifier
Apr 22, 2026
Merged

feat(v0.1.3): audience classifier + --audit-profile suppression + p1-env-hints Pattern 2#26
brettdavies merged 6 commits intodevfrom
feat/v013-audience-classifier

Conversation

@brettdavies
Copy link
Copy Markdown
Owner

@brettdavies brettdavies commented Apr 22, 2026

Summary

CLI-side implementation of v0.1.3 handoff 5 (H5). Three orthogonal features share a release vehicle because
agentnative-site's H6 leaderboard launch reads whatever this release emits — bundling lets the site regen wave
pick up all three changes in a single pass.

  1. Audience classifier — new read-only derivation that labels a scorecard run as agent_optimized, mixed, or
    human_primary based on Warn counts across 4 signal behavioral checks (p1-non-interactive, p2-json-output,
    p7-quiet, p6-no-color-behavioral). Returns null when any signal check is missing so partial signal cannot
    produce a misleading verdict.
  2. --audit-profile flag — accepts one of human-tui, file-traversal, posix-utility, diagnostic-only.
    Suppresses checks that don't apply to the category (TUI apps legitimately intercept the TTY; POSIX utilities
    satisfy stdin-primary vacuously; diagnostic tools have no writes). Suppressed checks appear in results[] as
    Skip with structured evidence "suppressed by audit_profile: <category>" so readers see what was excluded.
  3. p1-env-hints Pattern 2 — extends detection beyond clap's [env: FOO] annotations to also recognize
    bash-style $FOO / TOOL_FOO references near flag definitions. Catches ripgrep, aider, and similar tools
    that document env bindings in free prose. Three mitigations against false positives: tool-scoped identifier shape
    (must be $-prefixed OR contain _), same-paragraph proximity (±4-line window or ENVIRONMENT section), shell-env
    blacklist.

Plan + implementation log: docs/plans/2026-04-21-v013-handoff-5-cli-audience-classifier.md (already on dev).

Schema-stable: schema_version stays "1.1". audience and audit_profile move from null to populated values
on the reserved fields, backwards-compatible with v1.1 consumers.

Changelog

Added

  • audience field on scorecard JSON now emits a kebab-case label (agent-optimized / mixed / human-primary) when all four signal behavioral checks ran, or null when any are missing.
  • --audit-profile <category> flag on anc check accepts human-tui, file-traversal, posix-utility, or diagnostic-only. The applied value echoes as the top-level audit_profile field on scorecard JSON, and suppressed checks drop out of coverage_summary.{must,should,may}.verified so site leaderboards don't overstate per-tool coverage under audit profiles.

Changed

  • p1-env-hints now recognizes bash-style env-var references ($FOO / TOOL_FOO) near flag definitions in addition to clap [env: FOO] annotations. Tools like ripgrep and aider that document env bindings in free prose now Pass instead of Warn. $PAGER and uppercase section headers like DOCKER_CONFIG: are excluded so tools like git / gh / man and pages with structured help output don't produce false positives.

Type of Change

  • feat: New feature (non-breaking change which adds functionality)

Related Issues/Stories

  • Plan: docs/plans/2026-04-21-v013-handoff-5-cli-audience-classifier.md
  • Compounded learning: docs/solutions/best-practices/cli-env-var-shape-heuristic-2026-04-21.md
  • Origin (combined H5): agentnative-site/docs/plans/2026-04-20-v013-handoff-5-audience-leaderboard.md
  • H4 handoff that designed Pattern 2: docs/plans/2026-04-20-v012-handoff-4-behavioral-checks.md
  • Todo 013 (pending→done): .context/compound-engineering/todos/013-done-p3-p1-env-hints-pattern-2-bash-style-detection.md

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing completed
  • All tests passing

Test Summary:

  • Unit tests: 382 passing (15 new in scorecard::audience, 6 new in principles::registry::tests covering
    SUPPRESSION_TABLE + suppresses(), 6 new in runner::help_probe::tests covering Pattern 2)
  • Integration tests: 47 passing (6 new test_audit_profile_* cases covering rejection of unknown values, JSON echo,
    Skip evidence format, absent-flag behavior, denominator shrink, and dogfood edge case)
  • cargo clippy --all-targets -- -Dwarnings: clean
  • cargo fmt --check: clean
  • cargo deny check: advisories/bans/licenses/sources ok
  • anc generate coverage-matrix --check: exit 0 (no registry drift)
  • Full scripts/hooks/pre-push: all checks passed

Dogfood verdicts:

  • anc check . --output jsonschema_version: "1.1", audience: "agent_optimized", audit_profile: null,
    27 pass / 2 warn / 4 skip / 0 fail. No regression.
  • anc check . --audit-profile diagnostic-only --output json → clean run, echoes "audit_profile": "diagnostic-only",
    suppresses p5-dry-run with structured evidence, no panic.

Live-binary smoke on Pattern 2:

  • rg: p1-env-hints Warn → Pass (catches RIPGREP_CONFIG_PATH and RIPGREP_COLOR in --config description prose).
  • aider: p1-env-hints Warn → Pass (catches $FOO-style prose mentions).
  • gh: still Warn. Named limitation — gh's env docs live in gh help environment, not gh --help, which is
    what the behavioral check probes. Out of scope for v0.1.3.

Files Modified

Created:

  • src/scorecard/audience.rsclassify(), SIGNAL_CHECK_IDS, 13 unit tests including drift guards
  • docs/solutions/best-practices/cli-env-var-shape-heuristic-2026-04-21.md (in shared solutions-docs repo)

Modified:

  • src/scorecard.rssrc/scorecard/mod.rs — directory-module promotion (git rename), format_json signature
    extended to accept audience + audit_profile
  • src/cli.rs--audit-profile flag + AuditProfile ValueEnum + From<AuditProfile> for ExceptionCategory
  • src/main.rs — suppression wiring in check execution loop, audience computation, scorecard field threading
  • src/principles/registry.rsSUPPRESSION_TABLE, suppresses() helper, ExceptionCategory::as_kebab_case(),
    DiagnosticDiagnosticOnly rename (to match "diagnostic-only" serde output), removed the
    #[allow(dead_code)] reservation
  • src/runner/help_probe.rs — Pattern 2 implementation (parse_env_hints_bash_style, extract_env_tokens,
    strip_clap_env_annotations, find_env_section, SHELL_ENV_BLACKLIST, PATTERN2_WINDOW)
  • tests/integration.rs — 6 new test_audit_profile_* cases
  • CLAUDE.md — removed the "reserved for v0.1.3" callout on ExceptionCategory; updated the scorecard-fields
    section to document the now-consumed fields
  • completions/anc.{bash,zsh,fish,elvish,powershell} — regenerated for the new --audit-profile flag surface

Key Features

  • Informational audience label, not a gate. Per CEO review Finding ci: add provenance guard workflow + Protect dev ruleset #3: the classifier is read-only over
    results[]. It does not mutate per-check verdicts, summary counts, or exit codes. Label mismatches on a given tool
    are fixed by adding a registry audit_profile or a new MUST — never by tweaking classifier logic. The classifier
    module's docstring records this doctrine verbatim.
  • Suppression is categorical, not per-tool. The four categories are fixed for v0.1.3. Adding a fifth requires a
    plan revision. Per-tool audit_profile lookups happen at the caller (site's regen script), keeping the CLI
    repo-agnostic — it only knows what the caller tells it.
  • Pattern 2 keeps Confidence::Medium. Widening detection does not raise confidence; the heuristic still has
    blind spots (e.g., help topics outside --help). Honest calibration over ambition.
  • Drift guards everywhere. SIGNAL_CHECK_IDS validated against the behavioral catalog; SUPPRESSION_TABLE
    validated against the full check catalog; ExceptionCategory::as_kebab_case() cross-checked against serde's
    rendering. A rename on either side of any mapping surfaces at test time.

Breaking Changes

  • No breaking changes

Additive to the v1.1 scorecard schema — reserved null fields now populate with real values. Consumers already
feature-detect audience and audit_profile (per site's H2 renderer); pre-v1.1 scorecards remain valid.

Deployment Notes

  • No special deployment steps required (for this PR into dev)

Release-branch mechanics per RELEASES.md happen in a follow-up PR: branch off origin/main, cherry-pick these
commits, bump Cargo.toml 0.1.2 → 0.1.3, regenerate changelog via generate-changelog.sh, PR to main. The
Homebrew-tap finalize-release dispatch will again land on the wrong repo (todo 006) — manual recovery command is
pre-staged in the plan's operational notes section.

Checklist

  • Code follows project conventions and style guidelines
  • Commit messages follow Conventional Commits
  • Self-review of code completed (including a code-simplicity-reviewer pass that surfaced three redundant
    guards — landed as the final refactor(help_probe) commit)
  • Tests added/updated and passing
  • No new warnings or errors introduced
  • Changes are backward compatible

Additional Context

The plan doc (docs/plans/2026-04-21-v013-handoff-5-cli-audience-classifier.md) already landed on dev as a
separate docs-only commit so it can serve as the citable record independent of this PR's merge. The
Implementation Log section in the plan captures every deviation from the original scope (unit-bundling due to
-Dwarnings, enum renames, suppression-table semantics, Pattern 2 heuristic tightening).

## Changelog

### Added

- `audience` field on scorecard JSON now emits a real label
  (`agent_optimized` / `mixed` / `human_primary`) when all 4 signal
  behavioral checks ran (`p1-non-interactive`, `p2-json-output`,
  `p7-quiet`, `p6-no-color-behavioral`). Reserved null when any signal
  is missing so partial signal cannot produce a misleading verdict.

### Changed

- Promote `src/scorecard.rs` to a directory module so the classifier
  can live alongside existing scorecard code without bloating one file.

Classifier is read-only over results and does not gate per-check
verdicts, totals, or exit codes. Per CEO review Finding #3
(2026-04-20), any "label is wrong for tool X" fix lands in the registry
or a new MUST, not in the classifier rules.

Drift guards: `SIGNAL_CHECK_IDS` validated against the behavioral
catalog and the requirement registry at test time, so a rename breaks
the build instead of silently mislabeling tools.
## Changelog

### Added

- `--audit-profile <category>` flag on `anc check` accepts `human-tui`,
  `file-traversal`, `posix-utility`, or `diagnostic-only`. Suppresses
  checks that don't apply to the target category — TUI apps legitimately
  intercept the TTY, POSIX utilities satisfy stdin-primary vacuously,
  diagnostic tools have no writes. Suppressed checks appear in
  `results[]` as `Skip` with evidence `"suppressed by audit_profile:
  <category>"` so readers see what was excluded and why.
- Scorecard JSON echoes the applied profile as the top-level
  `audit_profile` field.

### Changed

- Audience classifier now drops any signal check whose Skip evidence
  starts with `"suppressed by audit_profile:"` from the denominator,
  forcing `audience: null` when the classifier can't see all 4 signals.
  Organic Skips (no flags, no runner) still count as not-a-Warn.

The `ExceptionCategory` enum in the registry is now consumed; the
`#[allow(dead_code)]` reservation that landed in v0.1.1 is removed.
Every variant has a row in `SUPPRESSION_TABLE` (empty slices are
intentional for categories with no currently-covered checks to
suppress). Drift tests guard against check IDs in the table that
don't exist in the catalog.

No schema change — `schema_version` stays `"1.1"`.
## Changelog

### Changed

- `p1-env-hints` now recognizes bash-style env-var references near flag
  definitions in addition to clap's `[env: FOO]` annotations. Catches
  tools like `ripgrep` (free-prose `RIPGREP_CONFIG_PATH` mentions),
  `gh` (dedicated `ENVIRONMENT` section), and `aider` (`$FOO` next to
  each flag). Those targets previously Warned for missing env bindings
  even though they document them — the check now Passes.

Pattern 2 uses three mitigations against false positives:
- Tool-scoped shape — the token must be `$`-prefixed OR contain an
  underscore, which separates `RIPGREP_CONFIG_PATH` / `$PATH` from
  placeholders like `OPTIONS`, `COMMAND`, `HTTP`.
- Same-paragraph co-occurrence — a ±4-line window around each flag
  definition, plus the `ENVIRONMENT` / `ENV VARS` section when present.
- Shell-env blacklist — `PATH`, `HOME`, `USER`, `SHELL`, `PWD`, `LANG`,
  `TERM`, `TMPDIR` are never flag-bound env vars even when mentioned in
  prose.

Pattern 1 (clap-style `[env: FOO]`) behavior is frozen — the parser
strips those annotations before Pattern 2 runs so rejected clap tokens
can't be salvaged by Pattern 2. Confidence on `p1-env-hints` stays
Medium; widening detection does not raise confidence.
Remove the "reserved for v0.1.3" callout on ExceptionCategory — the
enum is now consumed by the SUPPRESSION_TABLE and `--audit-profile`
wiring. Document the drift guards and the committed four-category
surface (human-tui, file-traversal, posix-utility, diagnostic-only).

Update the scorecard-fields section: `audience` now derives from
`src/scorecard/audience.rs::classify()` with the 4 signal checks and
CEO review Finding #3 constraints; `audit_profile` echoes the applied
flag value. File paths reflect the `scorecard.rs → scorecard/`
promotion.
The greedy uppercase/digit/underscore scan already guarantees
`[A-Z_][A-Z0-9_]*` shape, so `is_env_var_name(candidate)` is a
tautology at the call site — dropped with a comment explaining
why. `i = end.max(i + 1)` folded to `i = end` since `end >= i + 1`
holds every time execution reaches that branch. `Vec::with_capacity`
on the merged hints vector is tuning noise for a list that's never
large — dropped for readability.

No behavior change — all 382 unit + 47 integration tests pass.
Picks up the new --audit-profile value-enum surface across bash /
zsh / fish / elvish / powershell. Generated via
`./scripts/generate-completions.sh`; no manual edits.
@brettdavies brettdavies merged commit dc5d741 into dev Apr 22, 2026
6 checks passed
@brettdavies brettdavies deleted the feat/v013-audience-classifier branch April 22, 2026 18:39
brettdavies added a commit that referenced this pull request Apr 22, 2026
…-case (#27)

## Summary

Resolves all 26 findings from the ce-review of commit `dc5d74168577` and unifies `audience` JSON values to kebab-case (a
post-review decision that eliminates a mixed-casing asymmetry flagged during review).

## Changelog

### Added

- Add `audience_reason` field on scorecard JSON — populated only when `audience` is `null`, with `"suppressed"` (signal
  check masked by `--audit-profile`) or `"insufficient_signal"` (signal check never produced) so consumers can see why
  the classifier withheld a label.
- Add `audit_profiles` array in `coverage/matrix.json` — each entry carries `{name, description, suppresses[]}`, letting
  agents and site renderers enumerate the four `--audit-profile` categories and what each one suppresses without
  scraping `--help`.

### Changed

- Change `audience` JSON values from snake_case to kebab-case (`agent-optimized` / `mixed` / `human-primary`) so both
  `audience` and `audit_profile` share a single casing within the same scorecard document. Consumers pinning on v0.1.2's
  `null` fallback are unaffected; v0.1.3 consumers must match the kebab-case shape.
- Change `coverage_summary.{must,should,may}.verified` to exclude `--audit-profile`-suppressed checks. Suppression now
  correctly reads as "not verified" in the metric, so site leaderboards no longer overstate per-tool coverage under
  audit profiles.
- Change suppressed and errored `results[].label` values to show the check's human-readable label (e.g., "Respects
  NO_COLOR") instead of falling back to the check id.

### Fixed

- Fix `p1-env-hints` Pattern 2 treating `$PAGER` as a flag-bound env reference. `PAGER` was always named in the
  `SHELL_ENV_BLACKLIST` doc comment as the archetypal exclusion, but was missing from the slice; tools like `git`, `gh`,
  and `man` that document `$PAGER` no longer produce a false-positive `p1-env-hints` warn.
- Fix Pattern 2 capturing uppercase-underscored section-header lines (e.g., `DOCKER_CONFIG:`) as env hints. Section
  headers are now excluded via a shared predicate before token extraction.

### Documentation

- Update README.md, AGENTS.md, and CLAUDE.md to describe the shipped v0.1.3 scorecard surface: `audience` +
  `audience_reason` + `audit_profile` field semantics, the `--audit-profile` flag with examples, and the
  `audit_profiles` section of `coverage/matrix.json` as the programmatic source for category enumeration.

## Type of Change

- [x] `feat`: New feature (non-breaking change which adds functionality)
- [x] `fix`: Bug fix (non-breaking change which fixes an issue)
- [x] `refactor`: Code refactoring (no functional changes)
- [x] `test`: Adding or updating tests
- [x] `docs`: Documentation update

## Related Issues/Stories

- Plan (CLI-side): `docs/plans/2026-04-21-v013-handoff-5-cli-audience-classifier.md` — Implementation Log captures the
  post-review kebab-case deviation
- Plan (site-side coordination): `agentnative-site/docs/plans/2026-04-21-v013-handoff-6-site-leaderboard-launch.md` —
  Unit 0.5 covers the kebab-case absorption that must land before the v0.1.3 scorecard regen wave
- Compounded learning: `docs/solutions/best-practices/module-directory-promotion-pattern-2026-04-22.md` — canonical `git
  mv` recipe for `foo.rs` → `foo/mod.rs` promotions (documented off the back of this PR's `help_probe` extraction)
- Compounded-trilogy cross-links: `docs/solutions/best-practices/rust-module-splitting-srp-not-loc-20260327.md`,
  `docs/solutions/best-practices/rust-pub-crate-fields-for-cross-module-impl-pattern-2026-04-20.md`
- Related PR (base commit under review): #26
- CE-review artifacts: `.context/compound-engineering/ce-review/20260422-134229-565fc6d7/` (10 reviewer JSONs +
  `metadata.json`; local-only by convention)

## Testing

- [x] Unit tests added/updated
- [x] Integration tests added/updated
- [x] Manual testing completed
- [x] All tests passing

**Test Summary:**

- Unit tests: 402 passing (net +8 new across `scorecard::audience`, `scorecard::tests`, `principles::registry`,
  `principles::matrix`, `cli::tests`, and `runner::help_probe::env_hints_bash::tests`)
- Integration tests: 50 passing (net +3 new: concrete non-null audience on self-dogfood, `--principle` forces `audience:
  null`, full-shape JSON snapshot lock)
- `cargo fmt --check`: clean
- `cargo clippy --all-targets -- -Dwarnings`: clean
- `cargo deny check`: advisories, bans, licenses, sources all OK
- `cargo run -- generate coverage-matrix --check`: no drift
- `scripts/hooks/pre-push` (full local-CI mirror): all checks passed
- GitHub CI on the PR head (`7a384b8`): all 5 required checks green
- Dogfood without profile: `anc check . --output json` → `schema_version: "1.1"`, `audience: "agent-optimized"`,
  `audit_profile: null`, `audience_reason` absent (omitted via `skip_serializing_if`).
- Dogfood with profile: `anc check . --audit-profile human-tui` → `audience: null`, `audience_reason: "suppressed"`,
  `coverage_summary.must.verified` drops 17 → 15 (suppressed P1 checks no longer inflate the metric).

## Files Modified

**Modified:**

- `AGENTS.md`, `CLAUDE.md`, `README.md` — v0.1.3 scorecard surface sync
- `coverage/matrix.json` — regenerated (adds top-level `audit_profiles` array)
- `src/check.rs` — new abstract `label(&self) -> &'static str` trait method
- `src/checks/**/*.rs` (40 files) — mechanical `label()` impl + `run()` body migration from `label: "…".into()` literals
  to `label: self.label().into()`
- `src/cli.rs` — `AuditProfile` ↔ `ExceptionCategory` parity drift tests
- `src/main.rs` — `SUPPRESSION_EVIDENCE_PREFIX` usage + `check.label()` in error and `--audit-profile` suppression
  branches
- `src/principles/registry.rs` — `SUPPRESSION_EVIDENCE_PREFIX` constant, `ALL_EXCEPTION_CATEGORIES` slice + compile-time
  variant guard, `ExceptionCategory::description()`, doc additions on `SUPPRESSION_TABLE` (trust boundary, drift-test
  scope)
- `src/principles/matrix.rs` — `AuditProfileEntry` struct, `audit_profiles` section + per-variant coverage test,
  `matrix_json_includes_audit_profiles_section`
- `src/scorecard/audience.rs` — kebab-case `classify()`, new `classify_reason()` helper, `is_audit_profile_suppression`
  promoted to `pub(crate)`, `debug_assert!` against duplicate signal IDs, module doc + suppression-prefix references
- `src/scorecard/mod.rs` — `audience_reason: Option<String>` field (derived in `build_scorecard`, `skip_serializing_if`
  when `audience` has a label), `build_coverage_summary` filter, Scorecard struct doc, `exit_code` trust-boundary doc,
  snapshot + casing + duplicate-signal regression tests
- `tests/integration.rs` — convention test blocking `CheckResult { … }` outside `fn run()` in `src/checks/**`, non-null
  audience dogfood, `--principle` + audience interaction, full-shape JSON snapshot, tightened `diagnostic-only`
  assertion, Pattern 2 bracketed-rejection and mixed-case negative fixtures

**Created:**

- `src/runner/help_probe/env_hints_bash.rs` — Pattern 2 submodule extracted from the 931-line `help_probe.rs`; holds
  `parse_env_hints_bash_style()`, `extract_env_tokens()`, `find_env_section()`, `is_section_header_line()`,
  `strip_clap_env_annotations()`, `SHELL_ENV_BLACKLIST`, `PATTERN2_WINDOW`, and 9 Pattern 2 tests (including 3 new for
  PAGER, section-header exclusion, and mixed-case / placeholder negatives).

**Renamed:**

- `src/runner/help_probe.rs` → `src/runner/help_probe/mod.rs` (via `git mv`; git detects rename at 58% similarity, blame
  preserved).

## Key Features

- Single source of truth for the suppression evidence string (`SUPPRESSION_EVIDENCE_PREFIX`): producer (`main.rs`),
  consumer (`audience::is_audit_profile_suppression`), and filter (`build_coverage_summary`) all reference the same
  constant so a rename can't silently desync the three sites.
- `Check::label()` on the trait means every emitted `CheckResult` — including the runtime layer's error and suppression
  branches — now uses a single source of truth for each check's human label. No more "scorecard row shows the cryptic
  id" when a check errors out or gets masked by a profile.
- `EnvHintSource` enum (`clap_annotation` / `proximity` / `env_section`) tags every emitted `EnvHint` with the pattern
  that matched, giving agents a structured signal when debugging a p1-env-hints false positive. Pattern 1 wins on
  duplicates; proximity loses to env_section when both apply.
- Compile-time guard (`_all_categories_covers_every_variant`) + runtime drift tests
  (`audit_profile_and_exception_category_variants_are_isomorphic`, `suppression_table_check_ids_exist_in_catalog`)
  triangulate the `AuditProfile` (CLI) ↔ `ExceptionCategory` (registry) ↔ `SUPPRESSION_TABLE` (registry) relationship so
  a fifth category can't land via partial edits.

## Benefits

- **Correctness of published metrics.** Before this PR, every v0.1.3 scorecard emitted with `--audit-profile` overstated
  `coverage_summary.verified` — the site's H6 leaderboard would have rendered inflated coverage numbers that didn't
  match the results-level evidence. Shipped the fix before the site's regen wave.
- **JSON consistency.** Unifying `audience` to kebab-case closes the only document-level casing asymmetry in the
  scorecard shape, so consumers no longer need two-rule case logic for one document.
- **Agent discoverability.** Both the `audit_profiles` matrix section and the expanded AGENTS.md "Agent-facing JSON
  surface" section turn previously help-text-scraped knowledge into first-class, committed, machine-readable artifacts.
- **Regression surface.** Eight new positive / negative / drift tests pin intentional behavior that future refactors
  might unknowingly break: exit-code-under-suppression, JSON casing contract, `CheckResult` construction scope,
  bracketed-env-var rejection, AuditProfile↔ExceptionCategory parity, signal-ID uniqueness, full-shape JSON snapshot,
  `--principle` + audience interaction.

## Breaking Changes

- [x] No breaking changes for v0.1.2 consumers (which always received `audience: null` and `audit_profile: null`).
- [x] Breaking for pre-release v0.1.3 consumers that had pinned on snake_case `audience` values. The only such consumer
  is in-house (agentnative-site's H6 leaderboard), which hasn't shipped yet. Unit 0.5 of the site's H6 plan covers the
  adaptation and must land before the v0.1.3 scorecard regen wave.

## Deployment Notes

- [x] No special deployment steps required for this PR into `dev`.

Release-branch mechanics per `RELEASES.md` happen in a follow-up PR:
branch off `origin/main`, cherry-pick these commits (plus `6fdca00` and `dc5d741` from dev), bump `Cargo.toml` `0.1.2` →
`0.1.3`, regenerate changelog via `generate-changelog.sh`, PR → `main`. The Homebrew-tap finalize-release dispatch will
land on the wrong repo per todo `006` in `brettdavies/.github` — the manual recovery command is pre-staged in the v0.1.3
plan's operational notes section.

## Checklist

- [x] Code follows project conventions and style guidelines
- [x] Commit messages follow [Conventional Commits](https://www.conventionalcommits.org/)
- [x] Self-review of code completed (via ce-review 10-reviewer pass against `dc5d741`; this PR resolves all findings)
- [x] Tests added/updated and passing (net +11 new)
- [x] No new warnings or errors introduced (`-Dwarnings` clean)
- [x] Changes are backward compatible for v0.1.2 consumers (see Breaking Changes for the v0.1.3 pre-release note)

## Additional Context

The ce-review process produced 10 per-reviewer JSON artifacts under
`.context/compound-engineering/ce-review/20260422-134229-565fc6d7/` capturing every finding's full detail-tier fields
(`why_it_matters` + `evidence[]`). These are local-only by convention and useful for debugging which finding drove which
change.

The `module-directory-promotion-pattern-2026-04-22.md` solution doc published to `docs/solutions/best-practices/` as
part of this session (not this PR — lives in the shared solutions-docs repo) compounds the
three-instances-of-the-same-refactor evidence (v0.1.2 H4 `runner.rs` split, v0.1.3 H5 `scorecard.rs` split, and this
PR's `help_probe.rs` split) into a canonical recipe. The companion trilogy cross-links across two older Rust
module-organization docs landed in the same solutions-docs push.
@brettdavies brettdavies mentioned this pull request Apr 23, 2026
16 tasks
brettdavies added a commit that referenced this pull request Apr 23, 2026
## Summary

Release branch for v0.1.3. Cherry-picked from `dev`:

- **PR #26** — `audience` classifier shipped (`agent-optimized` /
`mixed` / `human-primary`), `--audit-profile` flag (`human-tui`,
`file-traversal`, `posix-utility`, `diagnostic-only`) with suppression
wiring, `p1-env-hints` Pattern 2 for bash-style `$FOO` / `TOOL_FOO`
references near flag definitions, plus `audience_reason` +
`audit_profiles` matrix section.
- **PR #27** — 26 ce-review findings resolved, `audience` snake_case →
kebab-case unification, `$PAGER` / section-header false-positive fixes
in Pattern 2, suppressed/errored `results[].label` now shows human
labels, `coverage_summary.verified` correctly excludes
`--audit-profile`-suppressed checks.

Plus standard release mechanics: `Cargo.toml` bump to `0.1.3`,
`Cargo.lock` refresh, regenerated `CHANGELOG.md` (git-cliff from
squash-commit `## Changelog` bodies). No completions drift — PR #26
already regenerated them.

## Changelog

<!-- Release PR itself has no user-facing content — the upstream PRs #26
and #27 carry the `## Changelog` sections that git-cliff extracted into
CHANGELOG.md. Leaving this section empty per release-branch convention.
-->

## Type of Change

- [x] `feat`: New feature (non-breaking change which adds functionality)

## Related Issues/Stories

- Story: v0.1.3 H5 — audience classifier + audit-profile suppression
- Related PRs: #26 (feature), #27 (review resolutions)
- Architecture: see `docs/plans/v0.1.3-h5-plan.md` on `dev`

## Testing

- [x] Unit tests added/updated
- [x] Integration tests added/updated
- [x] Manual testing completed
- [x] All tests passing

**Test Summary:**

- Pre-push hook green on release branch: fmt / clippy `-Dwarnings` /
test / cargo-deny / Windows compat
- CI green on PR head: `ci / Fmt, clippy, test`, `ci / Package check`,
`ci / Security audit (advisories)`, `ci / Security audit (bans licenses
sources)`, `ci / Windows check`, `ci / Changelog`, `guard-docs /
check-forbidden-docs`, `guard-provenance / check-provenance`,
`guard-release / check-release-branch-name`
- Coverage matrix drift check passes against committed artifacts
- Smoke-tested `anc check .` (dogfood) with and without each
`--audit-profile` category

## Files Modified

**Modified:**

- `Cargo.toml`, `Cargo.lock` — version bump 0.1.2 → 0.1.3
- `CHANGELOG.md` — regenerated via `scripts/generate-changelog.sh`
- See cherry-pick commits `4a18051` (PR #26) and `db11e97` (PR #27) for
the full code-change surface (56 files, +2954 / −533).

**Created:**

- `src/scorecard/audience.rs`
- `src/runner/help_probe/env_hints_bash.rs`

**Deleted:**

- None (files renamed via `git mv` as part of module reorganization —
see cherry-pick bodies).

## Key Features

- Populated `audience` scorecard field with kebab-case labels
(`agent-optimized` / `mixed` / `human-primary`) when all four signal
behavioral checks ran; `null` with `audience_reason` when any are
missing.
- `--audit-profile <category>` flag on `anc check` suppresses checks
that don't apply to specific tool classes (human-TUI, file-traversal
utilities, POSIX utilities, diagnostic-only tools).
- `p1-env-hints` Pattern 2 recognizes bash-style `$FOO` / `TOOL_FOO` env
references near flag definitions — tools like `ripgrep` and `aider` that
document env bindings in free prose now Pass instead of Warn.
- `audit_profiles[]` section in `coverage/matrix.json` — consumers can
enumerate profiles programmatically instead of scraping `--help`.

## Benefits

- Scorecards now carry enough metadata for agents to route tools by
audience without running every check.
- `--audit-profile` prevents false-negative "fails" for tools that
legitimately don't satisfy a given principle (e.g., a file-traversal
utility isn't expected to output structured JSON).
- Pattern 2 removes a whole class of false positives on non-clap Rust
CLIs and non-Rust CLIs that document env vars in help prose.

## Breaking Changes

- [x] No breaking changes

Scorecard schema stays at `1.1`. New fields (`audience_reason`,
`audit_profile`, `audit_profiles[]` in matrix) are additive. `audience`
JSON values move from snake_case to kebab-case — v0.1.2 consumers only
ever saw `null` for this field, so consumers pinning on the v0.1.2 shape
are unaffected; v0.1.3 consumers must match the kebab-case form.

## Deployment Notes

- [ ] No special deployment steps required
- [x] Deployment steps documented below:

Post-merge:

1. Annotated tag + push — `git tag -a -m "Release v0.1.3" v0.1.3 && git
push origin main --tags`.
2. Tag push triggers `release.yml` → `cargo publish` (Trusted
Publishing) → GitHub Release (draftless, `make_latest: false` during
bottle window) → Homebrew dispatch → `finalize-release` flips
`make_latest: true`.
3. Follow-on on `agentnative-site`: install v0.1.3, regenerate
scorecards (new `audience` kebab-case + `audit_profile` +
`audience_reason` fields will populate), run
`scripts/sync-coverage-matrix.sh` to pull the new `audit_profiles`
section into `src/data/coverage-matrix.json`.

## Screenshots/Recordings

N/A — CLI-only change.

## Checklist

- [x] Code follows project conventions and style guidelines
- [x] Commit messages follow [Conventional
Commits](https://www.conventionalcommits.org/)
- [x] Self-review of code completed
- [x] Tests added/updated and passing
- [x] No new warnings or errors introduced
- [x] Changes are backward compatible (or breaking changes documented)
- [x] Cherry-picks touched no guarded paths (`docs/plans/`,
`docs/solutions/`, `docs/brainstorms/`, `docs/reviews/`)
- [x] `Cargo.toml` version matches the tag this branch will produce

## Additional Context

The two upstream feature PRs (#26 and #27) each carry their own `##
Changelog` sections — those are what `scripts/generate-changelog.sh`
extracts into `CHANGELOG.md`. This release PR's empty `## Changelog` is
intentional: duplicating the upstream bullets here would double-count
them in the next release cycle.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant