custom-stack-examples PR 2: real compliance-release skill behavior by garagon · Pull Request #204 · garagon/nanostack

garagon · 2026-04-26T19:39:40Z

Summary

Second PR of the Custom Stack Examples v1 round. PR 1 #203 landed the manifest, READMEs, and placeholder skills behind a 49-check static contract. This PR replaces the three bin/*.sh placeholders with real behavior that PR 3's runtime E2E will exercise on a live project.

Skill behavior

`license-audit/bin/audit.sh`

- Auto-detects the project stack (package.json, requirements.txt, pyproject.toml, go.mod). Optional positional arg overrides the detection.
- Walks direct dependencies only. For Node, reads license metadata from each module's package.json under node_modules/ when available; falls back to unknown otherwise. Python and Go manifests do not declare license metadata, so those deps always classify as unknown until the user runs a deeper auditor.
- Go scanner reads both require forms: indented modules inside require (...) blocks AND single-line require <module> <version> statements. The first revision missed the single-line form.
- Classifies into permissive (MIT/BSD/Apache/ISC/etc), weak_copyleft (LGPL/MPL/EPL), strong_copyleft (GPL/AGPL), or unknown.
- Emits { stack, counts: {total, permissive, weak_copyleft, strong_copyleft, unknown}, flagged: [{name, license}] }.
- Always exits 0; the calling skill computes summary.status (OK / WARN / BLOCKED) from counts + flagged.
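
The four-way classification can be sketched as a small shell function. This is a minimal illustration, not the code in audit.sh: the function name `classify_license` and the exact prefix patterns are assumptions, and real license strings (SPDX expressions, "SEE LICENSE IN ...") would need more handling.

```shell
# Hypothetical helper: maps a license string to one of the four classes
# named in the PR description. The prefix patterns are illustrative only.
classify_license() {
  local lic
  # Uppercase first so "mit" and "MIT" classify the same way.
  lic=$(printf '%s' "${1:-}" | tr '[:lower:]' '[:upper:]')
  case "$lic" in
    "")                     echo "unknown" ;;
    LGPL*|MPL*|EPL*)        echo "weak_copyleft" ;;
    AGPL*|GPL*)             echo "strong_copyleft" ;;
    MIT*|BSD*|APACHE*|ISC*) echo "permissive" ;;
    *)                      echo "unknown" ;;
  esac
}
```

Calling `classify_license "GPL-3.0"` prints `strong_copyleft`; an empty or unrecognized string falls through to `unknown`, which matches the description's fallback for deps without license metadata.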

`privacy-check/bin/check.sh`

- Bounded source-tree scan (src/, app/, pages/, server/, api/, lib/) for personal-data tokens (email, name, phone, address, payment, ssn, api_key, access_token, file_upload) and telemetry library references (analytics, tracking, telemetry, segment, posthog, ga, mixpanel, sentry).
- Reads env templates (.env.example, .env.sample, .env.template) for keys hinting at collection (EMAIL_*, PHONE_*, PAYMENT_*, SECRET_*, TOKEN_*, API_KEY_*). Never reads .env, .env.local, .env.production, or credential JSON; the bash guard's G-035 already blocks those at the host layer.
- Resolves "privacy note exists" via PRIVACY.md, a ## Privacy / ## Privacidad / ## Data handling H2 in README.md, or TELEMETRY.md when the only signal class is telemetry.
- Emits { signals: [{kind, file, evidence}], missing: [...] }. Not a legal review.
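
The "privacy note exists" resolution can be sketched as follows. This is a sketch under stated assumptions, not check.sh itself: the function name `has_privacy_note`, its two arguments, and the strict heading regex (heading text must match exactly) are all hypothetical.

```shell
# Hypothetical helper: returns 0 when project dir $1 carries a privacy note.
# $2 is "1" when telemetry is the only signal class that was detected.
has_privacy_note() {
  local dir="$1" telemetry_only="${2:-0}"
  # A dedicated PRIVACY.md always counts.
  [ -f "$dir/PRIVACY.md" ] && return 0
  # So does one of the three accepted H2 headings in README.md.
  if [ -f "$dir/README.md" ] &&
     grep -Eq '^## (Privacy|Privacidad|Data handling)[[:space:]]*$' "$dir/README.md"; then
    return 0
  fi
  # TELEMETRY.md only counts when no personal-data signal was found.
  [ "$telemetry_only" = "1" ] && [ -f "$dir/TELEMETRY.md" ] && return 0
  return 1
}
```

A caller would add `privacy_note` to `missing` only when at least one signal exists and this function returns non-zero.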

`release-readiness/bin/summarize.sh`

- Reads upstream artifacts via $NANOSTACK_ROOT/bin/find-artifact.sh --verify for review, qa, security, license-audit, privacy-check. Two-step lookup distinguishes "never saved" from "saved but tampered":
  1. find-artifact.sh phase 30 (does any artifact exist?)
  2. find-artifact.sh phase 30 --verify (does its integrity verify?)
- Per-upstream status: MISSING (artifact absent), TAMPERED (hash mismatch with evidence: "integrity_failure" OR .integrity field absent with evidence: "missing_integrity"), OK / WARN / BLOCKED (passed through from the verified artifact's summary.status).
- Rollup is monotonic worst-case: any BLOCKED / TAMPERED / MISSING → rollup BLOCKED; any WARN → rollup WARN; otherwise OK. The composer never softens a failure.
- Emits { checks: [{phase, status, evidence}], rollup_status }. Read-only. Does not run /ship, open PRs, commit, or deploy.
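
The monotonic worst-case rollup can be sketched as a pure function over the per-check statuses. The name `rollup_status` is hypothetical, and treating an unrecognized status as a failure is an assumption of this sketch rather than documented behavior.

```shell
# Hypothetical helper: prints the rollup for a list of per-check statuses.
rollup_status() {
  local has_failure=0 has_warn=0 s
  for s in "$@"; do
    case "$s" in
      BLOCKED|TAMPERED|MISSING) has_failure=1 ;;
      WARN)                     has_warn=1 ;;
      OK)                       ;;
      *)                        has_failure=1 ;;  # assumption: fail closed
    esac
  done
  # A failure always wins over a warning, regardless of argument order.
  if [ "$has_failure" -eq 1 ]; then
    echo "BLOCKED"
  elif [ "$has_warn" -eq 1 ]; then
    echo "WARN"
  else
    echo "OK"
  fi
}
```

For example, `rollup_status OK WARN MISSING OK OK` prints `BLOCKED`: once any upstream fails, no later OK can soften the result.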

Smoke harnesses

Each skill's bin/smoke.sh exercises real cases on /tmp projects, no network, no installs.

| Skill | Cases | What's covered |
| --- | --- | --- |
| `license-audit` | 6 | Node MIT permissive (read from `node_modules/lodash/package.json`), Node GPL-3.0 in `flagged`, Python `requirements.txt`, Go block-form `require (...)`, empty project, Go single-line `require <module> <version>`. |
| `privacy-check` | 8 | Clean, email collection without note (signal + missing), email + `PRIVACY.md`, email + `## Privacy` H2, telemetry-only + `TELEMETRY.md`, `.env.example` with `EMAIL_API_KEY`, name-only collection (covers `name` token), `ga` import (covers Google Analytics). |
| `release-readiness` | 7 | All-OK, one WARN, one BLOCKED, one MISSING, mixed WARN+MISSING, tampered artifact (wrong hash → `integrity_failure`), stripped integrity field → `missing_integrity`. The `write_artifact` helper computes a real sha256 hash so OK/WARN/BLOCKED cases stay honest. |

Total: 21 case-level assertions across the three skills.

Test plan

Out of scope (PRs 3-4)

PR 3: ci/e2e-custom-stack-examples.sh — full install → resolve → journal → analytics → discard → conductor scheduling on a real /tmp project, including subdir + no-git scaffold paths and the spec's ≥35-assertion target.
PR 4: README.md + README.es.md + EXTENDING.md "build your own workflow stack" repositioning, only after PR 3's harness proves it.

Second PR of the Custom Stack Examples v1 round. PR 1 landed the manifest, READMEs, and placeholder skills with a 49-check static contract. This PR replaces the three bin/*.sh placeholders with real behavior that the next PR's runtime E2E can build on.

license-audit/bin/audit.sh

- Auto-detects the project stack from package.json, requirements.txt, pyproject.toml, or go.mod in cwd. Optional positional arg overrides the detection when more than one manifest is present.
- Walks direct dependencies. For Node, reads license metadata from each module's package.json under node_modules/ when available; falls back to "unknown" when not installed. Python and Go manifests do not declare license metadata, so deps from those stacks always classify as unknown.
- Classifies licenses into permissive (MIT/BSD/Apache/ISC/etc), weak_copyleft (LGPL/MPL/EPL), strong_copyleft (GPL/AGPL), unknown.
- Emits JSON: { stack, counts: {total, permissive, weak_copyleft, strong_copyleft, unknown}, flagged: [{name, license}] }.
- Always exits 0; the artifact's summary.status (OK / WARN / BLOCKED) is computed by the calling skill from counts + flagged.

privacy-check/bin/check.sh

- Bounded source-tree scan (src/, app/, pages/, server/, api/, lib/) for personal-data tokens (email/phone/address/payment/ssn/api_key/access_token/file_upload) and telemetry library references (analytics/tracking/telemetry/segment/posthog/mixpanel/sentry).
- Reads env templates (.env.example, .env.sample, .env.template) for keys hinting at collection (EMAIL_*, PHONE_*, PAYMENT_*, SECRET_*, TOKEN_*, API_KEY_*). Never reads .env, .env.local, .env.production, or credential JSON; the bash guard already blocks those at the host layer.
- Resolves "privacy note exists" via PRIVACY.md, a "## Privacy" or "## Privacidad" or "## Data handling" H2 in README.md, or TELEMETRY.md when the only signal class is telemetry.
- Emits JSON: { signals: [{kind, file, evidence}], missing: [...] }.
- Read-only.

release-readiness/bin/summarize.sh

- Reads upstream artifacts via $NANOSTACK_ROOT/bin/find-artifact.sh for review, qa, security, license-audit, privacy-check.
- Maps each artifact's summary.status to a per-check entry. Missing artifact -> MISSING. Artifact with no declared status -> WARN. Otherwise the declared status passes through.
- Rolls up using monotonic worst-case logic: any BLOCKED -> rollup BLOCKED; any MISSING required upstream -> rollup BLOCKED; any WARN -> rollup WARN; otherwise OK.
- Emits JSON: { checks: [{phase, status, evidence}], rollup_status }.
- Read-only. Does not run /ship, open PRs, commit, or deploy.

Each skill's bin/smoke.sh now exercises real cases:

- license-audit: 5 cases (Node MIT permissive, Node GPL flagged, Python requirements.txt, Go go.mod, empty project).
- privacy-check: 6 cases (clean, leak without note, leak with PRIVACY.md, leak with README '## Privacy' H2, telemetry-only with TELEMETRY.md, env-template hint).
- release-readiness: 5 cases (all-OK, one WARN, one BLOCKED, one MISSING, mixed WARN+MISSING).

The static contract count stays at 49; this PR adds runtime behavior, not new structural surface. PR 3 wires ci/e2e-custom-stack-examples.sh to exercise the end-to-end install + resolve + journal + analytics + discard + conductor journey.

… signals

Codex's PR 2 review caught three real cases the smoke tests did not cover. All three are fixed and locked.

P2.1 release-readiness now uses find-artifact.sh --verify.

The release gate composes the final pre-ship decision from local artifacts, but the previous version called find-artifact without integrity verification. A modified .nanostack/security or any other upstream artifact could roll the gate up to OK without detection. The composer now does a two-step lookup per upstream:

1. find-artifact.sh phase 30 (does any artifact exist?)
2. find-artifact.sh phase 30 --verify (is its integrity intact?)

A tampered artifact (mtime untouched, content rewritten) emits a new TAMPERED status in the per-check entry and forces the rollup to BLOCKED, separately from MISSING (artifact never saved). The rollup logic is still monotonic worst-case but the failure flag is now a single HAS_FAILURE that combines BLOCKED, TAMPERED, and MISSING.

New smoke case 6: write a security artifact with a deliberately-wrong .integrity hash, assert the per-check status is TAMPERED and the rollup is BLOCKED.

P2.2 license-audit go.mod scanner now reads single-line require.

The original regex `grep -E '^[[:space:]]+...'` only matched indented modules inside `require (...)` blocks. A common minimal go.mod uses the single-line form `require github.com/spf13/cobra v1.8.0` at column 0 with no block, which the scanner silently dropped (counts.total == 0 for a real Go dependency set). scan_go now uses an awk state machine that handles both forms:

- in_block: `require (...)` block + indented module names
- top-level: `require <module> <version>` single-line statement

Smoke case 6 added: go.mod with a single-line require, asserts counts.total == 1 and the dep classifies as unknown.

P2.3 privacy-check now matches the full personal_data and telemetry contracts.

The SKILL.md says it covers `name` (personal data) and `ga` (telemetry library / Google Analytics), but the regexes omitted both. A signup form that only collected `name` or a telemetry import that used `ga` passed with no signal. PERSONAL_RE adds `name`. TELEMETRY_RE adds `ga`. Both tokens are known false-positive triggers (name appears in lots of unrelated identifiers, ga is short and noisy), but the contract claims coverage and the user triages the per-file evidence list. The trade-off is documented inline. Smoke cases 7 and 8 lock the claim.

Smoke totals: license-audit 6 cases (was 5), privacy-check 8 (was 6), release-readiness 6 (was 5). 20 case-level assertions total. Static contract count unchanged (49). All other suites unchanged.
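
The two-form go.mod scan described under P2.2 can be sketched with a small awk state machine. This is an illustration, not the PR's scan_go: the function name `scan_go_requires` is hypothetical, and skipping lines marked `// indirect` is an assumption about how "direct dependencies" is decided.

```shell
# Hypothetical sketch: prints one module path per line for both require forms.
scan_go_requires() {
  awk '
    # Assumption: indirect deps carry a trailing "// indirect" comment; skip them.
    /\/\/[ \t]*indirect/ { next }
    # Opening of a require ( ... ) block.
    /^require[ \t]*\(/ { in_block = 1; next }
    # Closing parenthesis ends the block.
    in_block && /^[ \t]*\)/ { in_block = 0; next }
    # Inside a block: "<module> <version>", ignoring comment-only lines.
    in_block && NF >= 2 && $1 !~ /^\/\// { print $1; next }
    # Top level: "require <module> <version>" on a single line.
    !in_block && $1 == "require" && NF >= 3 { print $2 }
  ' "$1"
}
```

Counting the output lines (`scan_go_requires go.mod | wc -l`) would give the `counts.total` value for a Go project in this sketch.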

Codex's PR 2 follow-up review caught a subtle but real escalation of the same class as the previous --verify fix: find-artifact.sh --verify silently accepts artifacts whose .integrity field is ABSENT; it only fails on a hash MISMATCH. So an attacker who can write the file can:

1. Open the artifact JSON
2. Delete the .integrity field
3. Set summary.status to "OK"
4. Save

release-readiness then treats that as verified clean evidence and the gate rolls up to OK. The previous "deliberately wrong hash" fix did not catch this because the missing-integrity path skips the hash check entirely.

bin/save-artifact.sh always writes .integrity (line 137), so a legitimate artifact never trips this. The release gate now requires the field to be present after find-artifact --verify succeeds:

    stored_integrity=$(jq -r '.integrity // ""' "$verified")
    if [ -z "$stored_integrity" ]; then
      status="TAMPERED"
      evidence="missing_integrity"
    fi

The per-check entry distinguishes the two failure modes:

    evidence="integrity_failure" -> hash mismatch (old TAMPERED case)
    evidence="missing_integrity" -> field absent (new TAMPERED case)

Both force rollup BLOCKED via the existing HAS_FAILURE flag.

Smoke harness updates:

- write_artifact now computes a real sha256 hash the same way bin/save-artifact.sh computes it (sha256 of the canonical jq -Sc form before adding the integrity field). Existing OK/WARN/BLOCKED/MISSING cases stay legitimate.
- New case 7: write all five upstreams properly, then run `jq 'del(.integrity)'` on the security artifact. Asserts per-check status=TAMPERED, evidence=missing_integrity, rollup BLOCKED.

Smoke totals: license-audit 6, privacy-check 8, release-readiness 7 (was 6). 21 case-level assertions across the round.
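
The per-upstream decision order (absent, then hash mismatch, then missing field, then pass-through) can be sketched as a pure function. Everything about its interface is an assumption for illustration: the name `upstream_status`, the four positional arguments, the idea that the --verify lookup signals a mismatch through a non-zero exit code, and the `artifact_absent` and `verified` evidence strings. Only `integrity_failure` and `missing_integrity` come from the commit message.

```shell
# Hypothetical helper. Arguments (all assumptions of this sketch):
#   $1  path returned by the existence lookup ("" when nothing was saved)
#   $2  exit code of the --verify lookup (0 = hash check passed)
#   $3  value of the artifact's .integrity field ("" when absent)
#   $4  summary.status declared by the artifact
# Prints "<status> <evidence>".
upstream_status() {
  local found="$1" verify_rc="$2" integrity="$3" declared="$4"
  if [ -z "$found" ]; then
    echo "MISSING artifact_absent"
  elif [ "$verify_rc" -ne 0 ]; then
    echo "TAMPERED integrity_failure"
  elif [ -z "$integrity" ]; then
    # Verification passed only because there was no hash to compare.
    echo "TAMPERED missing_integrity"
  else
    echo "$declared verified"
  fi
}
```

The order matters: the missing-field check runs only after verification succeeds, which is exactly the path the stripped-integrity attack relied on.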

#205)

Third PR of the Custom Stack Examples v1 round. PR 1 #203 landed the manifest + 49-check static contract; PR 2 #204 wired the real skill behavior. This PR adds ci/e2e-custom-stack-examples.sh, the runtime contract that proves the compliance-release stack composes end-to-end on a real /tmp project.

15 cells, 51 assertions, no network:

[1] fixture project (Node app, README, .env.example, src with email + name fields, MIT lodash under node_modules)
[2] install all three skills via bin/create-skill.sh --from with the documented --concurrency + --depends-on flags; assert skills land in $NANOSTACK_STORE/skills and config registers all three phases
[3] bin/check-custom-skill.sh validates each scaffolded skill
[4] save fake review/qa/security artifacts via save-artifact.sh (real .integrity hashes)
[5] run license-audit/bin/audit.sh + save artifact; assert MIT lodash classifies as permissive
[6] run privacy-check/bin/check.sh + save artifact; assert email AND name signals are detected, missing privacy_note flagged
[7] bin/resolve.sh release-readiness; assert phase_kind=custom and all five declared upstream keys resolve to a path
[8] run release-readiness/bin/summarize.sh; assert rollup is WARN (privacy-check is WARN, others OK), all five checks present
[9] bin/sprint-journal.sh emits ## /<phase> sections for all three custom phases
[10] bin/analytics.sh --json counts the three under sprints.custom and reports custom_total >= 3
[11] bin/discard-sprint.sh --dry-run lists all three custom files
[12] conductor sprint.sh start with the stack.json phase_graph; assert sprint has 10 nodes (think+plan+build+review+qa+security+3 custom+ship)
[13] conductor sprint.sh batch; assert build precedes license-audit and privacy-check, both schedule as type=read, release-readiness follows after both, ship follows after release-readiness
[14] scaffold from a git subdirectory; assert skills land in repo root .nanostack/skills (not subdir/.nanostack)
[15] scaffold without git (fake HOME); assert skills land in $HOME/.nanostack/skills (not cwd)

The new e2e-custom-stack-examples GitHub Actions job runs the harness on workflow_dispatch, alongside the existing e2e-custom-stack and the rest of the e2e jobs.

Bug fix surfaced by the harness: privacy-check's source-tree scanner used `head -1` on per-line token extraction, so a single line that collected both `email` and `name` (the e2e fixture's src/signup.js) only reported the first match. The smoke case 2 asserted `any` rather than `all`, so the bug was invisible until the runtime harness asserted both signals explicitly.

Fix: scan_personal_data and scan_telemetry now emit one signal per unique matching token in each line. The PR 2 smoke cases for name-only and ga continue to pass; the runtime e2e assertion that both email and name surface from the same line now passes too.
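
The "one signal per unique matching token in each line" fix can be sketched with `grep -o` plus `sort -u` in place of `head -1`. This is an illustration only: the function name `tokens_in_line` is hypothetical, the token list is copied from the PR 2 description, and the case-insensitive, substring-level matching is an assumption.

```shell
# Token list from the PR 2 description; matching rules are assumptions.
PERSONAL_RE='email|name|phone|address|payment|ssn|api_key|access_token|file_upload'

# Hypothetical sketch: prints each distinct personal-data token found in
# the single line passed as $1, one token per output line.
tokens_in_line() {
  # grep -o emits every match; "|| true" keeps a no-match line from failing.
  { printf '%s\n' "$1" | grep -oiE "$PERSONAL_RE" || true; } |
    tr '[:upper:]' '[:lower:]' | sort -u
}
```

With `head -1` the line `const { email, name } = req.body;` would yield only `email`; here it yields both `email` and `name`, and a repeated token is reported once.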

* custom-stack-examples PR 4: public copy positions stack story

Final PR of the Custom Stack Examples v1 round. PR 1 #203 landed the manifest + static contract. PR 2 #204 wired the real skill behavior. PR 3 #205 added the 51-assertion runtime harness. With the harness green, the framework spec's "do not reposition the README hero before runtime E2E lands" rule unblocks. This PR updates the public copy to position the stack story without overclaim.

README.md "Build on nanostack" splits into Single skill + Workflow stack subsections. The single-skill block stays as-is (proven by ci/e2e-custom-stack-flows.sh). The new workflow-stack block points at examples/custom-stack-template/compliance-release/ as the reference shape, names the three skills (license-audit + privacy-check + release-readiness composer), claims the compliance-release example proves save / resolve / journal / analytics / discard / conductor compose, and names the harness that proves it (ci/e2e-custom-stack-examples.sh, 15 cells, 51 assertions).

README.es.md gets the same shape under "Construí tu propio workflow stack" so Spanish parity holds. Same tokens, same claims, no new claims English-only.

EXTENDING.md "Quickest way to start" expands to a two-row table: single-skill via examples/custom-skill-template/audit-licenses, vs. workflow stack via examples/custom-stack-template/compliance-release. Each row has its own quickstart block. Both link to their proving harness.

The compliance-release/README.md status banner switches from "PR 1, install commands run once PR 3 ships" to "end-to-end working, 49 contract + 51 runtime assertions". The custom-stack-template overview README replaces its PR-by-PR breakdown with a "what's covered" summary that names every harness behind the claim.

Lint adds custom-stack-examples-public-copy:

- README.md + README.es.md each mention compliance-release, phase_graph, workflow stack, ci/e2e-custom-stack-examples.sh, and the example path.
- EXTENDING.md links both starting points (single-skill template and the stack template), the spec doc, and the runtime harness.
- Public copy does NOT contain disallowed phrases: marketplace, plugin ecosystem, GDPR ready, SOC2 ready, compliance certified, "works in every agent identically".

After this lands, Custom Stack Examples v1 is complete and the round closes.

* docs: drop em dash + scope claim to opt-in E2E workflow

Codex's PR 4 review caught two real items.

P1 CI red: examples/custom-stack-template/README.md line 47 used an em dash before "all proven by ci/e2e-custom-stack-examples.sh". The "No em-dashes in public copy" lint scans top-level *.md plus every examples/**/README.md, and this line tripped it. Replaced with a period.

P2 overclaim: three sites said the runtime harness runs "on every workflow run" or implied auto-on-PR coverage. The harness lives in .github/workflows/e2e.yml as a workflow_dispatch job, not part of the on-PR lint matrix, so the claim does not match the actual config. Softened to "in the opt-in E2E workflow":

- README.md "Build on nanostack" workflow-stack subsection.
- README.es.md "Construí tu propio workflow stack" subsection (same shape, same scope language).
- examples/custom-stack-template/compliance-release/README.md status banner: now distinguishes the static contract (runs on every PR) from the runtime harness (runs in the opt-in E2E workflow).
- examples/custom-stack-template/README.md "what's covered" list.

The custom-stack-examples-public-copy lock added in PR 4 still passes (it requires the harness path token to appear, not a specific frequency claim). Em-dash sweep clean across every file in scope.
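
The disallowed-phrase half of the custom-stack-examples-public-copy lint can be sketched as a grep over the public files. This is a hedged illustration, not the repository's lint: the function name `lint_public_copy`, the case-insensitive matching, and the error message are assumptions; only the phrase list comes from the commit message.

```shell
# Phrase list from the PR 4 commit message; matching rules are assumptions.
DISALLOWED='marketplace|plugin ecosystem|GDPR ready|SOC2 ready|compliance certified|works in every agent identically'

# Hypothetical sketch: returns non-zero when any given file contains a
# disallowed phrase, printing the offending lines with their line numbers.
lint_public_copy() {
  local rc=0 f
  for f in "$@"; do
    if grep -niE "$DISALLOWED" "$f"; then
      echo "disallowed phrase in $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

A CI step could call it as `lint_public_copy README.md README.es.md EXTENDING.md` and fail the job on a non-zero exit.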

garagon added 4 commits April 26, 2026 16:39

release-readiness/SKILL.md: document TAMPERED + integrity rules

adaac84

garagon merged commit 7d7476a into main Apr 26, 2026
52 checks passed

garagon deleted the cse-2-skill-behavior branch April 26, 2026 20:56

garagon mentioned this pull request Apr 26, 2026

custom-stack-examples PR 3: 15-cell runtime E2E for compliance-release #205

Merged

12 tasks

garagon mentioned this pull request Apr 26, 2026

custom-stack-examples PR 4: public copy positions the stack story #206

Merged

13 tasks
