custom-stack-examples PR 2: real compliance-release skill behavior #204
Merged
Second PR of the Custom Stack Examples v1 round. PR 1 landed the
manifest, READMEs, and placeholder skills with a 49-check static
contract. This PR replaces the three bin/*.sh placeholders with
real behavior that the next PR's runtime E2E can build on.
license-audit/bin/audit.sh
- Auto-detects the project stack from package.json,
requirements.txt, pyproject.toml, or go.mod in cwd. Optional
positional arg overrides the detection when more than one
manifest is present.
- Walks direct dependencies. For Node, reads license metadata
from each module's package.json under node_modules/ when
available; falls back to "unknown" when not installed. Python
and Go manifests do not declare license metadata, so deps from
those stacks always classify as unknown.
- Classifies licenses into permissive (MIT/BSD/Apache/ISC/etc),
weak_copyleft (LGPL/MPL/EPL), strong_copyleft (GPL/AGPL),
unknown.
- Emits JSON: { stack, counts: {total, permissive, weak_copyleft,
strong_copyleft, unknown}, flagged: [{name, license}] }.
- Always exits 0; the artifact's summary.status (OK / WARN /
BLOCKED) is computed by the calling skill from counts + flagged.
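A minimal sketch of the four-way classification described above. The function name `classify_license` and the exact glob patterns are assumptions for illustration; the real audit.sh may match license strings differently.

```shell
# Hypothetical helper: maps a license string to one of the four classes.
# The patterns below are illustrative, not the skill's actual rules.
classify_license() {
  case "$1" in
    AGPL*|GPL*)            echo strong_copyleft ;;
    LGPL*|MPL*|EPL*)       echo weak_copyleft ;;
    MIT|BSD*|Apache*|ISC)  echo permissive ;;
    *)                     echo unknown ;;   # includes empty / not installed
  esac
}

classify_license MIT        # prints permissive
classify_license GPL-3.0    # prints strong_copyleft
classify_license LGPL-2.1   # prints weak_copyleft
classify_license ""         # prints unknown
```

LGPL does not fall into the GPL branch because `case` patterns match the whole string from its first character.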
privacy-check/bin/check.sh
- Bounded source-tree scan (src/, app/, pages/, server/, api/,
lib/) for personal-data tokens (email/phone/address/payment/
ssn/api_key/access_token/file_upload) and telemetry library
references (analytics/tracking/telemetry/segment/posthog/
mixpanel/sentry).
- Reads env templates (.env.example, .env.sample, .env.template)
for keys hinting at collection (EMAIL_*, PHONE_*, PAYMENT_*,
SECRET_*, TOKEN_*, API_KEY_*). Never reads .env, .env.local,
.env.production, or credential JSON; the bash guard already
blocks those at the host layer.
- Resolves "privacy note exists" via PRIVACY.md, a "## Privacy"
or "## Privacidad" or "## Data handling" H2 in README.md, or
TELEMETRY.md when the only signal class is telemetry.
- Emits JSON: { signals: [{kind, file, evidence}], missing: [...] }.
- Read-only.
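The "privacy note exists" resolution above can be sketched as follows. The function name, its arguments, and the choice of exact, case-sensitive heading matching are assumptions for illustration, not check.sh's actual code.

```shell
# Hypothetical helper.
# $1 = project directory
# $2 = "yes" when telemetry is the only signal class (assumed flag)
has_privacy_note() {
  dir=$1
  telemetry_only=${2:-no}
  # A dedicated PRIVACY.md always counts.
  [ -f "$dir/PRIVACY.md" ] && return 0
  # So does one of the three accepted H2 headings in README.md.
  if [ -f "$dir/README.md" ] &&
     grep -Eq '^## (Privacy|Privacidad|Data handling)[[:space:]]*$' "$dir/README.md"; then
    return 0
  fi
  # TELEMETRY.md only counts when telemetry is the sole signal class.
  [ "$telemetry_only" = "yes" ] && [ -f "$dir/TELEMETRY.md" ] && return 0
  return 1
}
```

Called as `has_privacy_note . yes`, it returns 0 when a note is found and 1 otherwise, which the caller could translate into an entry in `missing`.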
release-readiness/bin/summarize.sh
- Reads upstream artifacts via $NANOSTACK_ROOT/bin/find-artifact.sh
for review, qa, security, license-audit, privacy-check.
- Maps each artifact's summary.status to a per-check entry.
Missing artifact -> MISSING. Artifact with no declared status ->
WARN. Otherwise the declared status passes through.
- Rolls up using monotonic worst-case logic: any BLOCKED -> rollup
BLOCKED; any MISSING required upstream -> rollup BLOCKED; any
WARN -> rollup WARN; otherwise OK.
- Emits JSON: { checks: [{phase, status, evidence}], rollup_status }.
- Read-only. Does not run /ship, open PRs, commit, or deploy.
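The monotonic worst-case rollup above can be sketched as a small function. The name `rollup_status` is hypothetical, and the sketch assumes every MISSING entry is a required upstream.

```shell
# Hypothetical helper: takes per-check statuses as arguments and prints
# the rollup. Assumes every MISSING upstream is required.
rollup_status() {
  has_blocked=0
  has_warn=0
  for s in "$@"; do
    case "$s" in
      BLOCKED|MISSING) has_blocked=1 ;;
      WARN)            has_warn=1 ;;
    esac
  done
  if [ "$has_blocked" -eq 1 ]; then
    echo BLOCKED
  elif [ "$has_warn" -eq 1 ]; then
    echo WARN
  else
    echo OK
  fi
}

rollup_status OK OK WARN OK OK       # prints WARN
rollup_status OK WARN MISSING OK OK  # prints BLOCKED
```

Because the flags only ever move from 0 to 1, a later OK can never soften an earlier failure, which is what makes the rollup monotonic.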
Each skill's bin/smoke.sh now exercises real cases:
- license-audit: 5 cases (Node MIT permissive, Node GPL flagged,
Python requirements.txt, Go go.mod, empty project).
- privacy-check: 6 cases (clean, leak without note, leak with
PRIVACY.md, leak with README '## Privacy' H2, telemetry-only
with TELEMETRY.md, env-template hint).
- release-readiness: 5 cases (all-OK, one WARN, one BLOCKED, one
MISSING, mixed WARN+MISSING).
The static contract count stays at 49; this PR adds runtime
behavior, not new structural surface. PR 3 wires
ci/e2e-custom-stack-examples.sh to exercise the end-to-end install
+ resolve + journal + analytics + discard + conductor journey.
… signals
Codex's PR 2 review caught three real cases the smoke tests did not
cover. All three are fixed and locked.
P2.1 release-readiness now uses find-artifact.sh --verify.
The release gate composes the final pre-ship decision from local
artifacts, but the previous version called find-artifact without
integrity verification. A modified .nanostack/security or any other
upstream artifact could roll the gate up to OK without detection.
The composer now does a two-step lookup per upstream:
1. find-artifact.sh phase 30 (does any artifact exist?)
2. find-artifact.sh phase 30 --verify (is its integrity intact?)
A tampered artifact (mtime untouched, content rewritten) emits a new
TAMPERED status in the per-check entry and forces the rollup to
BLOCKED, separately from MISSING (artifact never saved). The rollup
logic is still monotonic worst-case but the failure flag is now a
single HAS_FAILURE that combines BLOCKED, TAMPERED, and MISSING.
New smoke case 6: write a security artifact with a deliberately-wrong
.integrity hash, assert the per-check status is TAMPERED and the
rollup is BLOCKED.
P2.2 license-audit go.mod scanner now reads single-line require.
The original regex `grep -E '^[[:space:]]+...'` only matched indented
modules inside `require (...)` blocks. A common minimal go.mod uses
the single-line form `require github.com/spf13/cobra v1.8.0` at
column 0 with no block, which the scanner silently dropped
(counts.total == 0 for a real Go dependency set).
scan_go now uses an awk state machine that handles both forms:
- in_block: `require (...)` block + indented module names
- top-level: `require <module> <version>` single-line statement
Smoke case 6 added: go.mod with a single-line require, asserts
counts.total == 1 and the dep classifies as unknown.
P2.3 privacy-check now matches the full personal_data and telemetry
contracts.
The SKILL.md says it covers `name` (personal data) and `ga`
(telemetry library / Google Analytics), but the regexes omitted both.
A signup form that only collected `name` or a telemetry import that
used `ga` passed with no signal.
PERSONAL_RE adds `name`. TELEMETRY_RE adds `ga`. Both tokens are
known false-positive triggers (name appears in lots of unrelated
identifiers, ga is short and noisy), but the contract claims coverage
and the user triages the per-file evidence list. The trade-off is
documented inline. Smoke cases 7 and 8 lock the claim.
Smoke totals: license-audit 6 cases (was 5), privacy-check 8 (was 6),
release-readiness 6 (was 5). 20 case-level assertions total. Static
contract count unchanged (49). All other suites unchanged.
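A minimal sketch of the two-form go.mod scan described in P2.2. The function name `scan_go_requires` is a hypothetical stand-in for scan_go, and the awk rules are an illustration of the state-machine idea, not the shipped code.

```shell
# Hypothetical stand-in for scan_go: prints one module path per line.
scan_go_requires() {
  awk '
    # "require (" opens a block.
    /^require[ \t]*\(/                   { in_block = 1; next }
    # ")" closes it.
    in_block && /^[ \t]*\)/              { in_block = 0; next }
    # Inside a block: "<module> <version>", skipping comment lines.
    in_block && NF >= 2 && $1 !~ /^\/\// { print $1; next }
    # Top level: "require <module> <version>".
    $1 == "require" && NF >= 3           { print $2 }
  ' "$1"
}
```

Running `scan_go_requires go.mod | wc -l` would then give the dependency count for either form. The explicit `in_block` flag is what lets one pass treat column-0 and indented entries differently.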
Codex's PR 2 follow-up review caught a subtle but real escalation
of the same class as the previous --verify fix: find-artifact.sh
--verify silently accepts artifacts whose .integrity field is
ABSENT — it only fails on a hash MISMATCH. So an attacker who can
write the file can:
1. Open the artifact JSON
2. Delete the .integrity field
3. Set summary.status to "OK"
4. Save
release-readiness then treats that as verified clean evidence and
the gate rolls up to OK. The previous "deliberately wrong hash"
fix did not catch this because the missing-integrity path skips
the hash check entirely.
bin/save-artifact.sh always writes .integrity (line 137), so a
legitimate artifact never trips this. The release gate now requires
the field to be present after find-artifact --verify succeeds:
stored_integrity=$(jq -r '.integrity // ""' "$verified")
if [ -z "$stored_integrity" ]; then
status="TAMPERED"
evidence="missing_integrity"
fi
The per-check entry distinguishes the two failure modes:
evidence="integrity_failure" -> hash mismatch (old TAMPERED case)
evidence="missing_integrity" -> field absent (new TAMPERED case)
Both force rollup BLOCKED via the existing HAS_FAILURE flag.
Smoke harness updates:
- write_artifact now computes a real sha256 hash the same way
bin/save-artifact.sh computes it (sha256 of the canonical jq -Sc
form before adding the integrity field). Existing OK/WARN/
BLOCKED/MISSING cases stay legitimate.
- New case 7: write all five upstreams properly, then run
`jq 'del(.integrity)'` on the security artifact. Asserts
per-check status=TAMPERED, evidence=missing_integrity, rollup
BLOCKED.
Smoke totals: license-audit 6, privacy-check 8, release-readiness 7
(was 6). 21 case-level assertions across the round.
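The two TAMPERED modes and the smoke helper's hashing can be sketched together. This assumes `jq` and `sha256sum` are on PATH and that the hash covers the `jq -Sc` output including its trailing newline; the function names are hypothetical, and bin/save-artifact.sh may differ in those details.

```shell
# Hash of the canonical form with the integrity field removed.
# Assumption: the trailing newline jq emits is part of the hashed bytes.
artifact_hash() {
  jq -Sc 'del(.integrity)' "$1" | sha256sum | cut -d' ' -f1
}

# Hypothetical analogue of write_artifact: adds a real hash in place.
seal_artifact() {
  h=$(artifact_hash "$1")
  tmp=$(mktemp)
  jq -Sc --arg h "$h" '. + {integrity: $h}' "$1" > "$tmp" && mv "$tmp" "$1"
}

# Prints verified, integrity_failure, or missing_integrity.
integrity_evidence() {
  stored=$(jq -r '.integrity // ""' "$1")
  if [ -z "$stored" ]; then
    echo missing_integrity          # field absent (new TAMPERED case)
  elif [ "$stored" != "$(artifact_hash "$1")" ]; then
    echo integrity_failure          # hash mismatch (old TAMPERED case)
  else
    echo verified
  fi
}
```

Checking for the absent field before comparing hashes is the point of the fix: an artifact with no `.integrity` never reaches the comparison, so it has to be rejected explicitly.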
garagon added a commit that referenced this pull request on Apr 26, 2026
#205)
Third PR of the Custom Stack Examples v1 round. PR 1 #203 landed the
manifest + 49-check static contract; PR 2 #204 wired the real skill
behavior. This PR adds ci/e2e-custom-stack-examples.sh, the runtime
contract that proves the compliance-release stack composes end-to-end
on a real /tmp project.
15 cells, 51 assertions, no network:
[1] fixture project (Node app, README, .env.example, src with email +
name fields, MIT lodash under node_modules)
[2] install all three skills via bin/create-skill.sh --from with the
documented --concurrency + --depends-on flags; assert skills land in
$NANOSTACK_STORE/skills and config registers all three phases
[3] bin/check-custom-skill.sh validates each scaffolded skill
[4] save fake review/qa/security artifacts via save-artifact.sh (real
.integrity hashes)
[5] run license-audit/bin/audit.sh + save artifact; assert MIT lodash
classifies as permissive
[6] run privacy-check/bin/check.sh + save artifact; assert email AND
name signals are detected, missing privacy_note flagged
[7] bin/resolve.sh release-readiness; assert phase_kind=custom and all
five declared upstream keys resolve to a path
[8] run release-readiness/bin/summarize.sh; assert rollup is WARN
(privacy-check is WARN, others OK), all five checks present
[9] bin/sprint-journal.sh emits ## /<phase> sections for all three
custom phases
[10] bin/analytics.sh --json counts the three under sprints.custom and
reports custom_total >= 3
[11] bin/discard-sprint.sh --dry-run lists all three custom files
[12] conductor sprint.sh start with the stack.json phase_graph; assert
sprint has 10 nodes (think+plan+build+review+qa+security+3 custom+ship)
[13] conductor sprint.sh batch; assert build precedes license-audit
and privacy-check, both schedule as type=read, release-readiness
follows after both, ship follows after release-readiness
[14] scaffold from a git subdirectory; assert skills land in repo root
.nanostack/skills (not subdir/.nanostack)
[15] scaffold without git (fake HOME); assert skills land in
$HOME/.nanostack/skills (not cwd)
The new e2e-custom-stack-examples GitHub Actions job runs the harness
on workflow_dispatch, alongside the existing e2e-custom-stack and the
rest of the e2e jobs.
Bug fix surfaced by the harness: privacy-check's source-tree scanner
used `head -1` on per-line token extraction, so a single line that
collected both `email` and `name` (the e2e fixture's src/signup.js)
only reported the first match. The smoke case 2 asserted `any` rather
than `all`, so the bug was invisible until the runtime harness
asserted both signals explicitly.
Fix: scan_personal_data and scan_telemetry now emit one signal per
unique matching token in each line. The PR 2 smoke cases for name-only
and ga continue to pass; the runtime e2e assertion that both email and
name surface from the same line now passes too.
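The per-line fix above (one signal per unique matching token instead of `head -1`) can be sketched as follows. `PERSONAL_RE` here is a shortened illustrative subset and `tokens_in_line` is a hypothetical name; lowercasing the matches is also an assumption.

```shell
# Illustrative subset of the personal-data tokens, not the skill's regex.
PERSONAL_RE='email|name|phone|address'

# Hypothetical helper: prints each distinct token found in one source line.
tokens_in_line() {
  printf '%s\n' "$1" |
    grep -oiE "$PERSONAL_RE" |      # every match, one per output line
    tr '[:upper:]' '[:lower:]' |    # fold case so Email and email dedupe
    sort -u                         # unique tokens; replaces head -1
}

tokens_in_line 'const { email, name } = req.body;'
```

With `head -1` in place of `sort -u`, the same line would report only `email`, which is the bug the e2e fixture exposed.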
garagon added a commit that referenced this pull request on Apr 26, 2026
* custom-stack-examples PR 4: public copy positions stack story
Final PR of the Custom Stack Examples v1 round. PR 1 #203 landed the
manifest + static contract. PR 2 #204 wired the real skill behavior.
PR 3 #205 added the 51-assertion runtime harness. With the harness
green, the framework spec's "do not reposition the README hero before
runtime E2E lands" rule unblocks. This PR updates the public copy to
position the stack story without overclaim.
README.md "Build on nanostack" splits into Single skill + Workflow
stack subsections. The single-skill block stays as-is (proven by
ci/e2e-custom-stack-flows.sh). The new workflow-stack block points at
examples/custom-stack-template/compliance-release/ as the reference
shape, names the three skills (license-audit + privacy-check +
release-readiness composer), claims the compliance-release example
proves save / resolve / journal / analytics / discard / conductor
compose, and names the harness that proves it
(ci/e2e-custom-stack-examples.sh, 15 cells, 51 assertions).
README.es.md gets the same shape under "Construí tu propio workflow
stack" so Spanish parity holds. Same tokens, same claims, no new
English-only claims.
EXTENDING.md "Quickest way to start" expands to a two-row table:
single-skill via examples/custom-skill-template/audit-licenses, vs.
workflow stack via examples/custom-stack-template/compliance-release.
Each row has its own quickstart block. Both link to their proving
harness.
The compliance-release/README.md status banner switches from "PR 1,
install commands run once PR 3 ships" to "end-to-end working, 49
contract + 51 runtime assertions". The custom-stack-template overview
README replaces its PR-by-PR breakdown with a "what's covered" summary
that names every harness behind the claim.
Lint adds custom-stack-examples-public-copy:
- README.md + README.es.md each mention compliance-release,
phase_graph, workflow stack, ci/e2e-custom-stack-examples.sh, and the
example path.
- EXTENDING.md links both starting points (single-skill template and
the stack template), the spec doc, and the runtime harness.
- Public copy does NOT contain disallowed phrases: marketplace, plugin
ecosystem, GDPR ready, SOC2 ready, compliance certified, "works in
every agent identically".
After this lands, Custom Stack Examples v1 is complete and the round
closes.
* docs: drop em dash + scope claim to opt-in E2E workflow
Codex's PR 4 review caught two real items.
P1 CI red: examples/custom-stack-template/README.md line 47 used an em
dash before "all proven by ci/e2e-custom-stack-examples.sh". The "No
em-dashes in public copy" lint scans top-level *.md plus every
examples/**/README.md, and this line tripped it. Replaced with a
period.
P2 overclaim: three sites said the runtime harness runs "on every
workflow run" or implied auto-on-PR coverage. The harness lives in
.github/workflows/e2e.yml as a workflow_dispatch job, not part of the
on-PR lint matrix, so the claim does not match the actual config.
Softened to "in the opt-in E2E workflow":
- README.md "Build on nanostack" workflow-stack subsection.
- README.es.md "Construí tu propio workflow stack" subsection (same
shape, same scope language).
- examples/custom-stack-template/compliance-release/README.md status
banner: now distinguishes the static contract (runs on every PR) from
the runtime harness (runs in the opt-in E2E workflow).
- examples/custom-stack-template/README.md "what's covered" list.
The custom-stack-examples-public-copy lock added in PR 4 still passes
(it requires the harness path token to appear, not a specific
frequency claim). Em-dash sweep clean across every file in scope.
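The disallowed-phrase half of the custom-stack-examples-public-copy lint could be sketched as below. The function name, the case-insensitive matching, and the exit convention are assumptions; only the phrase list comes from the commit message.

```shell
# Hypothetical lint sketch: returns 1 if any file contains a disallowed
# phrase (matched case-insensitively as a fixed string), else 0.
check_public_copy() {
  rc=0
  for f in "$@"; do
    while IFS= read -r phrase; do
      if grep -qiF -- "$phrase" "$f"; then
        echo "disallowed phrase in $f: $phrase"
        rc=1
      fi
    done <<'EOF'
marketplace
plugin ecosystem
GDPR ready
SOC2 ready
compliance certified
works in every agent identically
EOF
  done
  return "$rc"
}
```

It could be invoked as `check_public_copy README.md README.es.md EXTENDING.md`. Fixed-string matching (`-F`) keeps phrases with spaces or punctuation from being read as regular expressions.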
Summary
Second PR of the Custom Stack Examples v1 round. PR 1 #203 landed the manifest, READMEs, and placeholder skills behind a 49-check static contract. This PR replaces the three bin/*.sh placeholders with real behavior that PR 3's runtime E2E will exercise on a live project.
Skill behavior
license-audit/bin/audit.sh
- Auto-detects the project stack (package.json, requirements.txt, pyproject.toml, go.mod). Optional positional arg overrides the detection.
- Walks direct dependencies. For Node, reads license metadata from each module's package.json under node_modules/ when available; falls back to unknown otherwise. Python and Go manifests do not declare license metadata, so those deps always classify as unknown until the user runs a deeper auditor.
- The Go scanner handles both require forms: indented modules inside require (...) blocks AND single-line require <module> <version> statements. The first revision missed the single-line form.
- Classifies each license as permissive (MIT/BSD/Apache/ISC/etc), weak_copyleft (LGPL/MPL/EPL), strong_copyleft (GPL/AGPL), or unknown.
- Emits JSON: { stack, counts: {total, permissive, weak_copyleft, strong_copyleft, unknown}, flagged: [{name, license}] }.
- Always exits 0; the calling skill computes summary.status (OK / WARN / BLOCKED) from counts + flagged.
privacy-check/bin/check.sh
- Bounded source-tree scan (src/, app/, pages/, server/, api/, lib/) for personal-data tokens (email, name, phone, address, payment, ssn, api_key, access_token, file_upload) and telemetry library references (analytics, tracking, telemetry, segment, posthog, ga, mixpanel, sentry).
- Reads env templates (.env.example, .env.sample, .env.template) for keys hinting at collection (EMAIL_*, PHONE_*, PAYMENT_*, SECRET_*, TOKEN_*, API_KEY_*). Never reads .env, .env.local, .env.production, or credential JSON; the bash guard's G-035 already blocks those at the host layer.
- Resolves "privacy note exists" via PRIVACY.md, a ## Privacy / ## Privacidad / ## Data handling H2 in README.md, or TELEMETRY.md when the only signal class is telemetry.
- Emits JSON: { signals: [{kind, file, evidence}], missing: [...] }. Not a legal review.
release-readiness/bin/summarize.sh
- Reads upstream artifacts via $NANOSTACK_ROOT/bin/find-artifact.sh --verify for review, qa, security, license-audit, privacy-check. Two-step lookup distinguishes "never saved" from "saved but tampered":
  1. find-artifact.sh phase 30 (does any artifact exist?)
  2. find-artifact.sh phase 30 --verify (does its integrity verify?)
- Per-check status: MISSING (artifact absent), TAMPERED (hash mismatch with evidence: "integrity_failure" OR .integrity field absent with evidence: "missing_integrity"), OK / WARN / BLOCKED (passed through from the verified artifact's summary.status).
- Monotonic worst-case rollup: any BLOCKED / TAMPERED / MISSING → rollup BLOCKED; any WARN → rollup WARN; otherwise OK. The composer never softens a failure.
- Emits JSON: { checks: [{phase, status, evidence}], rollup_status }. Read-only. Does not run /ship, open PRs, commit, or deploy.
Smoke harnesses
Each skill's bin/smoke.sh exercises real cases on /tmp projects, no network, no installs.
- license-audit (6 cases): Node MIT permissive (node_modules/lodash/package.json), Node GPL-3.0 in flagged, Python requirements.txt, Go block-form require (...), empty project, Go single-line require <module> <version>.
- privacy-check (8 cases): clean, leak without note, email + PRIVACY.md, email + ## Privacy H2, telemetry-only + TELEMETRY.md, .env.example with EMAIL_API_KEY, name-only collection (covers name token), ga import (covers Google Analytics).
- release-readiness (7 cases): all-OK, one WARN, one BLOCKED, one MISSING, mixed WARN+MISSING, wrong .integrity hash (integrity_failure), stripped integrity field → missing_integrity. The write_artifact helper computes a real sha256 hash so OK/WARN/BLOCKED cases stay honest.
Total: 21 case-level assertions across the three skills.
Test plan
Out of scope (PRs 3-4)
- ci/e2e-custom-stack-examples.sh: full install → resolve → journal → analytics → discard → conductor scheduling on a real /tmp project, including subdir + no-git scaffold paths and the spec's ≥35-assertion target.
- README.md + README.es.md + EXTENDING.md "build your own workflow stack" repositioning, only after PR 3's harness proves it.