custom-stack-examples PR 2: real compliance-release skill behavior #204

Merged
garagon merged 4 commits into main from cse-2-skill-behavior on Apr 26, 2026

@garagon (Owner) commented Apr 26, 2026

Summary

Second PR of the Custom Stack Examples v1 round. PR 1 #203 landed the manifest, READMEs, and placeholder skills behind a 49-check static contract. This PR replaces the three bin/*.sh placeholders with real behavior that PR 3's runtime E2E will exercise on a live project.

Skill behavior

license-audit/bin/audit.sh

  • Auto-detects the project stack (package.json, requirements.txt, pyproject.toml, go.mod). Optional positional arg overrides the detection.
  • Walks direct dependencies only. For Node, reads license metadata from each module's package.json under node_modules/ when available; falls back to unknown otherwise. Python and Go manifests do not declare license metadata, so those deps always classify as unknown until the user runs a deeper auditor.
  • Go scanner reads both require forms: indented modules inside require (...) blocks AND single-line require <module> <version> statements. The first revision missed the single-line form.
  • Classifies into permissive (MIT/BSD/Apache/ISC/etc), weak_copyleft (LGPL/MPL/EPL), strong_copyleft (GPL/AGPL), or unknown.
  • Emits { stack, counts: {total, permissive, weak_copyleft, strong_copyleft, unknown}, flagged: [{name, license}] }.
  • Always exits 0; the calling skill computes summary.status (OK / WARN / BLOCKED) from counts + flagged.
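Here is a minimal sketch of the bucket mapping, for illustration only. `classify_license` is a hypothetical name and the substring patterns are an assumption, not audit.sh's actual rules. Real SPDX expressions (for example `MIT OR GPL-3.0`) would deserve more careful parsing. The sketch mainly shows why order matters: LGPL has to be tested before GPL.

```shell
# Hypothetical helper: map a license string to one of the four buckets
# license-audit reports. LGPL is tested before GPL because "LGPL-2.1"
# also contains the substring "GPL".
classify_license() {
  local id
  id=$(printf '%s' "${1:-}" | tr '[:lower:]' '[:upper:]')
  case "$id" in
    *LGPL*|*MPL*|*EPL*)         echo weak_copyleft ;;
    *GPL*)                      echo strong_copyleft ;;  # also matches AGPL
    *MIT*|*BSD*|*APACHE*|*ISC*) echo permissive ;;
    *)                          echo unknown ;;          # empty or unrecognized
  esac
}
```

`classify_license LGPL-2.1` prints `weak_copyleft`. If the first two branches were swapped, it would print `strong_copyleft`.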

privacy-check/bin/check.sh

  • Bounded source-tree scan (src/, app/, pages/, server/, api/, lib/) for personal-data tokens (email, name, phone, address, payment, ssn, api_key, access_token, file_upload) and telemetry library references (analytics, tracking, telemetry, segment, posthog, ga, mixpanel, sentry).
  • Reads env templates (.env.example, .env.sample, .env.template) for keys hinting at collection (EMAIL_*, PHONE_*, PAYMENT_*, SECRET_*, TOKEN_*, API_KEY_*). Never reads .env, .env.local, .env.production, or credential JSON; the bash guard's G-035 already blocks those at the host layer.
  • Resolves "privacy note exists" via PRIVACY.md, a ## Privacy / ## Privacidad / ## Data handling H2 in README.md, or TELEMETRY.md when the only signal class is telemetry.
  • Emits { signals: [{kind, file, evidence}], missing: [...] }. Not a legal review.
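The "privacy note exists" resolution reduces to three file checks. This is a minimal sketch that assumes the README heading must match one of the three H2 titles exactly. `has_privacy_note` and its `telemetry_only` argument are hypothetical, not check.sh's interface.

```shell
# Hypothetical helper: succeed (exit 0) when the project has a privacy note.
#   $1  project root (default: current directory)
#   $2  "telemetry_only" when telemetry is the only signal class
has_privacy_note() {
  local root=${1:-.} mode=${2:-}
  [ -f "$root/PRIVACY.md" ] && return 0
  if [ -f "$root/README.md" ] &&
     grep -Eq '^## (Privacy|Privacidad|Data handling)[[:space:]]*$' "$root/README.md"; then
    return 0
  fi
  # TELEMETRY.md only counts when telemetry is the sole signal class.
  if [ "$mode" = telemetry_only ] && [ -f "$root/TELEMETRY.md" ]; then
    return 0
  fi
  return 1
}
```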

release-readiness/bin/summarize.sh

  • Reads upstream artifacts via $NANOSTACK_ROOT/bin/find-artifact.sh --verify for review, qa, security, license-audit, privacy-check. Two-step lookup distinguishes "never saved" from "saved but tampered":
    1. find-artifact.sh phase 30 (does any artifact exist?)
    2. find-artifact.sh phase 30 --verify (does its integrity verify?)
  • Per-upstream status: MISSING (artifact absent), TAMPERED (hash mismatch with evidence: "integrity_failure" OR .integrity field absent with evidence: "missing_integrity"), OK / WARN / BLOCKED (passed through from the verified artifact's summary.status).
  • Rollup is monotonic worst-case: any BLOCKED / TAMPERED / MISSING → rollup BLOCKED; any WARN → rollup WARN; otherwise OK. The composer never softens a failure.
  • Emits { checks: [{phase, status, evidence}], rollup_status }. Read-only. Does not run /ship, open PRs, commit, or deploy.
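The worst-case rollup can be expressed as a single pass over the per-check statuses. This sketch rests on assumptions: `rollup_status` is a hypothetical name, and treating an unrecognized status as WARN is a conservative choice that this PR does not specify.

```shell
# Hypothetical helper: fold per-check statuses into one rollup status.
# Any failure class wins over WARN, and WARN wins over OK, so the result
# can never be softer than the worst input.
rollup_status() {
  local has_failure=0 has_warn=0 s
  for s in "$@"; do
    case "$s" in
      BLOCKED|TAMPERED|MISSING) has_failure=1 ;;
      WARN)                     has_warn=1 ;;
      OK)                       ;;
      *)                        has_warn=1 ;;  # assumption: unknown values warn
    esac
  done
  if [ "$has_failure" = 1 ]; then
    echo BLOCKED
  elif [ "$has_warn" = 1 ]; then
    echo WARN
  else
    echo OK
  fi
}
```

For example, `rollup_status OK WARN MISSING OK OK` prints BLOCKED, which mirrors the mixed WARN+MISSING smoke case.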

Smoke harnesses

Each skill's bin/smoke.sh exercises real cases on /tmp projects, no network, no installs.

| Skill | Cases | What's covered |
| --- | --- | --- |
| license-audit | 6 | Node MIT permissive (read from node_modules/lodash/package.json), Node GPL-3.0 in flagged, Python requirements.txt, Go block-form `require (...)`, empty project, Go single-line `require <module> <version>`. |
| privacy-check | 8 | Clean, email collection without note (signal + missing), email + PRIVACY.md, email + ## Privacy H2, telemetry-only + TELEMETRY.md, .env.example with EMAIL_API_KEY, name-only collection (covers name token), ga import (covers Google Analytics). |
| release-readiness | 7 | All-OK, one WARN, one BLOCKED, one MISSING, mixed WARN+MISSING, tampered artifact (wrong hash → integrity_failure), stripped integrity field → missing_integrity. The write_artifact helper computes a real sha256 hash so OK/WARN/BLOCKED cases stay honest. |

Total: 21 case-level assertions across the three skills.

Test plan

  • license-audit smoke: 6/6
  • privacy-check smoke: 8/8
  • release-readiness smoke: 7/7
  • tests/run.sh: 83/83
  • ci/e2e-user-flows.sh: 100/100
  • ci/e2e-custom-stack-flows.sh: 30/30
  • ci/check-custom-stack-examples.sh: 49/49 (static contract still passes; this PR adds runtime behavior, not new structural surface)
  • ci/e2e-think-flows.sh: 32/32
  • ci/e2e-think-archetypes.sh: 25/25
  • ci/e2e-onboarding-flows.sh: 34/34
  • ci/e2e-delivery-matrix.sh: 17/17
  • ci/check-examples.sh: 32/32

Out of scope (PRs 3-4)

  • PR 3: ci/e2e-custom-stack-examples.sh — full install → resolve → journal → analytics → discard → conductor scheduling on a real /tmp project, including subdir + no-git scaffold paths and the spec's ≥35-assertion target.
  • PR 4: README.md + README.es.md + EXTENDING.md "build your own workflow stack" repositioning, only after PR 3's harness proves it.

garagon added 4 commits April 26, 2026 16:39
Second PR of the Custom Stack Examples v1 round. PR 1 landed the
manifest, READMEs, and placeholder skills with a 49-check static
contract. This PR replaces the three bin/*.sh placeholders with
real behavior that the next PR's runtime E2E can build on.

license-audit/bin/audit.sh
  - Auto-detects the project stack from package.json,
    requirements.txt, pyproject.toml, or go.mod in cwd. Optional
    positional arg overrides the detection when more than one
    manifest is present.
  - Walks direct dependencies. For Node, reads license metadata
    from each module's package.json under node_modules/ when
    available; falls back to "unknown" when not installed. Python
    and Go manifests do not declare license metadata, so deps from
    those stacks always classify as unknown.
  - Classifies licenses into permissive (MIT/BSD/Apache/ISC/etc),
    weak_copyleft (LGPL/MPL/EPL), strong_copyleft (GPL/AGPL),
    unknown.
  - Emits JSON: { stack, counts: {total, permissive, weak_copyleft,
    strong_copyleft, unknown}, flagged: [{name, license}] }.
  - Always exits 0; the artifact's summary.status (OK / WARN /
    BLOCKED) is computed by the calling skill from counts + flagged.
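The detection step amounts to an explicit override followed by a manifest probe. In this hypothetical sketch, the `detect_stack` name, the stack labels (`node`, `python`, `go`, `none`), and the Node-then-Python-then-Go precedence are all assumptions, not audit.sh's documented behavior.

```shell
# Hypothetical helper: choose the stack from an explicit override,
# otherwise from the first manifest found in the project root.
#   $1  optional override (empty string means "auto-detect")
#   $2  project root (default: current directory)
detect_stack() {
  local override=${1:-} root=${2:-.}
  if [ -n "$override" ]; then
    echo "$override"
  elif [ -f "$root/package.json" ]; then
    echo node
  elif [ -f "$root/requirements.txt" ] || [ -f "$root/pyproject.toml" ]; then
    echo python
  elif [ -f "$root/go.mod" ]; then
    echo go
  else
    echo none
  fi
}
```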

privacy-check/bin/check.sh
  - Bounded source-tree scan (src/, app/, pages/, server/, api/,
    lib/) for personal-data tokens (email/phone/address/payment/
    ssn/api_key/access_token/file_upload) and telemetry library
    references (analytics/tracking/telemetry/segment/posthog/
    mixpanel/sentry).
  - Reads env templates (.env.example, .env.sample, .env.template)
    for keys hinting at collection (EMAIL_*, PHONE_*, PAYMENT_*,
    SECRET_*, TOKEN_*, API_KEY_*). Never reads .env, .env.local,
    .env.production, or credential JSON; the bash guard already
    blocks those at the host layer.
  - Resolves "privacy note exists" via PRIVACY.md, a "## Privacy"
    or "## Privacidad" or "## Data handling" H2 in README.md, or
    TELEMETRY.md when the only signal class is telemetry.
  - Emits JSON: { signals: [{kind, file, evidence}], missing: [...] }.
  - Read-only.
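Here is a sketch of the env-template pass, assuming keys are matched by prefix at the start of a line. `scan_env_templates` is a hypothetical name. The point it illustrates is that the loop only ever opens the three template filenames, so .env, .env.local, and .env.production are never read.

```shell
# Hypothetical helper: print "<template>: <KEY>" for every key in an env
# template whose name starts with a collection-hinting prefix.
scan_env_templates() {
  local root=${1:-.} f
  # Only these three filenames are ever opened; real .env files are skipped.
  for f in .env.example .env.sample .env.template; do
    [ -f "$root/$f" ] || continue
    grep -oE '^(EMAIL|PHONE|PAYMENT|SECRET|TOKEN|API_KEY)_[A-Za-z0-9_]*' "$root/$f" |
      sed "s|^|$f: |"
  done
}
```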

release-readiness/bin/summarize.sh
  - Reads upstream artifacts via $NANOSTACK_ROOT/bin/find-artifact.sh
    for review, qa, security, license-audit, privacy-check.
  - Maps each artifact's summary.status to a per-check entry.
    Missing artifact -> MISSING. Artifact with no declared status ->
    WARN. Otherwise the declared status passes through.
  - Rolls up using monotonic worst-case logic: any BLOCKED -> rollup
    BLOCKED; any MISSING required upstream -> rollup BLOCKED; any
    WARN -> rollup WARN; otherwise OK.
  - Emits JSON: { checks: [{phase, status, evidence}], rollup_status }.
  - Read-only. Does not run /ship, open PRs, commit, or deploy.

Each skill's bin/smoke.sh now exercises real cases:
  - license-audit: 5 cases (Node MIT permissive, Node GPL flagged,
    Python requirements.txt, Go go.mod, empty project).
  - privacy-check: 6 cases (clean, leak without note, leak with
    PRIVACY.md, leak with README '## Privacy' H2, telemetry-only
    with TELEMETRY.md, env-template hint).
  - release-readiness: 5 cases (all-OK, one WARN, one BLOCKED, one
    MISSING, mixed WARN+MISSING).

The static contract count stays at 49; this PR adds runtime
behavior, not new structural surface. PR 3 wires
ci/e2e-custom-stack-examples.sh to exercise the end-to-end install
+ resolve + journal + analytics + discard + conductor journey.
… signals

Codex's PR 2 review caught three real cases the smoke tests did not
cover. All three are fixed and locked.

P2.1 release-readiness now uses find-artifact.sh --verify. The
release gate composes the final pre-ship decision from local
artifacts, but the previous version called find-artifact without
integrity verification. A modified .nanostack/security or any other
upstream artifact could roll the gate up to OK without detection.

The composer now does a two-step lookup per upstream:
  1. find-artifact.sh phase 30           (does any artifact exist?)
  2. find-artifact.sh phase 30 --verify  (is its integrity intact?)

A tampered artifact (mtime untouched, content rewritten) emits a
new TAMPERED status in the per-check entry and forces the rollup
to BLOCKED, separately from MISSING (artifact never saved). The
rollup logic is still monotonic worst-case but the failure flag is
now a single HAS_FAILURE that combines BLOCKED, TAMPERED, and
MISSING. New smoke case 6: write a security artifact with a
deliberately-wrong .integrity hash, assert the per-check status is
TAMPERED and the rollup is BLOCKED.

P2.2 license-audit go.mod scanner now reads single-line require.
The original regex `grep -E '^[[:space:]]+...'` only matched indented
modules inside `require (...)` blocks. A common minimal go.mod uses
the single-line form `require github.com/spf13/cobra v1.8.0` at
column 0 with no block, which the scanner silently dropped
(counts.total == 0 for a real Go dependency set).

scan_go now uses an awk state machine that handles both forms:
  - in_block:    `require (...)` block + indented module names
  - top-level:   `require <module> <version>` single-line statement

Smoke case 6 added: go.mod with a single-line require, asserts
counts.total == 1 and the dep classifies as unknown.

P2.3 privacy-check now matches the full personal_data and telemetry
contracts. The SKILL.md says it covers `name` (personal data) and
`ga` (telemetry library / Google Analytics), but the regexes
omitted both. A signup form that only collected `name` or a
telemetry import that used `ga` passed with no signal.

PERSONAL_RE adds `name`. TELEMETRY_RE adds `ga`. Both tokens are
known false-positive triggers (name appears in lots of unrelated
identifiers, ga is short and noisy), but the contract claims
coverage and the user triages the per-file evidence list. The
trade-off is documented inline. Smoke cases 7 and 8 lock the claim.

Smoke totals: license-audit 6 cases (was 5), privacy-check 8 (was
6), release-readiness 6 (was 5). 20 case-level assertions total.
Static contract count unchanged (49). All other suites unchanged.

Codex's PR 2 follow-up review caught a subtle but real escalation
of the same class as the previous --verify fix: find-artifact.sh
--verify silently accepts artifacts whose .integrity field is
ABSENT; it only fails on a hash MISMATCH. So an attacker who can
write the file can:
  1. Open the artifact JSON
  2. Delete the .integrity field
  3. Set summary.status to "OK"
  4. Save

release-readiness then treats that as verified clean evidence and
the gate rolls up to OK. The previous "deliberately wrong hash"
fix did not catch this because the missing-integrity path skips
the hash check entirely.

bin/save-artifact.sh always writes .integrity (line 137), so a
legitimate artifact never trips this. The release gate now requires
the field to be present after find-artifact --verify succeeds:

  stored_integrity=$(jq -r '.integrity // ""' "$verified")
  if [ -z "$stored_integrity" ]; then
    status="TAMPERED"
    evidence="missing_integrity"
  fi

The per-check entry distinguishes the two failure modes:
  evidence="integrity_failure" -> hash mismatch (old TAMPERED case)
  evidence="missing_integrity" -> field absent (new TAMPERED case)
Both force rollup BLOCKED via the existing HAS_FAILURE flag.
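The per-upstream decision can be separated from the file I/O. In this sketch, the four inputs stand for what the real script would get from the two find-artifact.sh lookups and the jq read of .integrity and summary.status. `classify_upstream` and the `-` placeholder evidence for MISSING and pass-through entries are hypothetical. The fallback to WARN when no status is declared follows the first commit's description.

```shell
# Hypothetical helper: print "<STATUS> <evidence>" for one upstream.
#   $1  1 if the plain lookup found an artifact, else 0
#   $2  1 if the --verify lookup succeeded, else 0
#   $3  stored .integrity value ("" when the field is absent)
#   $4  declared summary.status ("" when absent)
classify_upstream() {
  local exists=${1:-0} verified=${2:-0} integrity=${3:-} declared=${4:-}
  if [ "$exists" != 1 ]; then
    echo "MISSING -"                   # artifact never saved
  elif [ "$verified" != 1 ]; then
    echo "TAMPERED integrity_failure"  # hash mismatch
  elif [ -z "$integrity" ]; then
    echo "TAMPERED missing_integrity"  # --verify passed but field is absent
  else
    echo "${declared:-WARN} -"         # pass through; no status means WARN
  fi
}
```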

Smoke harness updates:
- write_artifact now computes a real sha256 hash the same way
  bin/save-artifact.sh computes it (sha256 of the canonical jq -Sc
  form before adding the integrity field). Existing OK/WARN/
  BLOCKED/MISSING cases stay legitimate.
- New case 7: write all five upstreams properly, then run
  `jq 'del(.integrity)'` on the security artifact. Asserts
  per-check status=TAMPERED, evidence=missing_integrity, rollup
  BLOCKED.
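Here is a sketch of the hash-then-embed idea using jq and sha256sum. It assumes the hash covers the `jq -Sc` output of the artifact with its integrity field removed. The exact bytes that save-artifact.sh feeds to sha256 (the trailing newline, for example) may differ, so these hypothetical helpers are not guaranteed to reproduce its hashes.

```shell
# Hypothetical: sha256 of the canonical (sorted keys, compact) form of
# FILE, computed with any existing integrity field removed.
compute_integrity() {
  jq -Sc 'del(.integrity)' "$1" | sha256sum | cut -d' ' -f1
}

# Hypothetical test helper: write JSON to FILE, then embed its hash.
write_test_artifact() {
  local file=$1 json=$2 hash
  printf '%s\n' "$json" > "$file"
  hash=$(compute_integrity "$file")
  jq --arg h "$hash" '. + {integrity: $h}' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Editing summary.status afterwards without recomputing leaves .integrity stale, and that stale value is the mismatch --verify reports. Deleting the field instead is the case only the new presence check catches.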

Smoke totals: license-audit 6, privacy-check 8, release-readiness 7
(was 6). 21 case-level assertions across the round.
@garagon merged commit 7d7476a into main Apr 26, 2026
52 checks passed
@garagon deleted the cse-2-skill-behavior branch April 26, 2026 20:56
garagon added a commit that referenced this pull request Apr 26, 2026
#205)

Third PR of the Custom Stack Examples v1 round. PR 1 #203 landed the
manifest + 49-check static contract; PR 2 #204 wired the real skill
behavior. This PR adds ci/e2e-custom-stack-examples.sh, the runtime
contract that proves the compliance-release stack composes
end-to-end on a real /tmp project.

15 cells, 51 assertions, no network:

  [1]  fixture project (Node app, README, .env.example, src with
       email + name fields, MIT lodash under node_modules)
  [2]  install all three skills via bin/create-skill.sh --from with
       the documented --concurrency + --depends-on flags; assert
       skills land in $NANOSTACK_STORE/skills and config registers
       all three phases
  [3]  bin/check-custom-skill.sh validates each scaffolded skill
  [4]  save fake review/qa/security artifacts via save-artifact.sh
       (real .integrity hashes)
  [5]  run license-audit/bin/audit.sh + save artifact; assert MIT
       lodash classifies as permissive
  [6]  run privacy-check/bin/check.sh + save artifact; assert email
       AND name signals are detected, missing privacy_note flagged
  [7]  bin/resolve.sh release-readiness; assert phase_kind=custom
       and all five declared upstream keys resolve to a path
  [8]  run release-readiness/bin/summarize.sh; assert rollup is WARN
       (privacy-check is WARN, others OK), all five checks present
  [9]  bin/sprint-journal.sh emits ## /<phase> sections for all
       three custom phases
  [10] bin/analytics.sh --json counts the three under sprints.custom
       and reports custom_total >= 3
  [11] bin/discard-sprint.sh --dry-run lists all three custom files
  [12] conductor sprint.sh start with the stack.json phase_graph;
       assert sprint has 10 nodes (think+plan+build+review+qa+
       security+3 custom+ship)
  [13] conductor sprint.sh batch; assert build precedes
       license-audit and privacy-check, both schedule as type=read,
       release-readiness follows after both, ship follows after
       release-readiness
  [14] scaffold from a git subdirectory; assert skills land in repo
       root .nanostack/skills (not subdir/.nanostack)
  [15] scaffold without git (fake HOME); assert skills land in
       $HOME/.nanostack/skills (not cwd)

The new e2e-custom-stack-examples GitHub Actions job runs the
harness on workflow_dispatch, alongside the existing
e2e-custom-stack and the rest of the e2e jobs.

Bug fix surfaced by the harness: privacy-check's source-tree
scanner used `head -1` on per-line token extraction, so a single
line that collected both `email` and `name` (the e2e fixture's
src/signup.js) only reported the first match. Smoke case 2
asserted `any` rather than `all`, so the bug was invisible until
the runtime harness asserted both signals explicitly.

Fix: scan_personal_data and scan_telemetry now emit one signal per
unique matching token in each line. The PR 2 smoke cases for
name-only and ga continue to pass; the runtime e2e assertion that
both email and name surface from the same line now passes too.
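The difference between the old and new extraction is easy to see on a single line. This is a hedged sketch: `line_tokens` and the token list are assumptions modeled on the PR description. The match is a plain substring search, so `name` also fires on identifiers such as `username`. That is the false-positive trade-off accepted in PR 2.

```shell
# Hypothetical token list modeled on the PR text; check.sh's PERSONAL_RE may differ.
PERSONAL_RE='email|name|phone|address|payment|ssn|api_key|access_token|file_upload'

# Print each distinct personal-data token found in one source line.
# grep -o emits every match on its own line, whereas piping it into
# head -1 (the old behavior) would keep only the first one.
line_tokens() {
  printf '%s\n' "$1" | grep -oiE "$PERSONAL_RE" | tr '[:upper:]' '[:lower:]' | sort -u
}
```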
garagon added a commit that referenced this pull request Apr 26, 2026
* custom-stack-examples PR 4: public copy positions stack story

Final PR of the Custom Stack Examples v1 round. PR 1 #203 landed
the manifest + static contract. PR 2 #204 wired the real skill
behavior. PR 3 #205 added the 51-assertion runtime harness. With
the harness green, the framework spec's "do not reposition the
README hero before runtime E2E lands" rule is satisfied. This PR
updates the public copy to position the stack story without
overclaiming.

README.md "Build on nanostack" splits into Single skill + Workflow
stack subsections. The single-skill block stays as-is (proven by
ci/e2e-custom-stack-flows.sh). The new workflow-stack block points
at examples/custom-stack-template/compliance-release/ as the
reference shape, names the three skills (license-audit +
privacy-check + release-readiness composer), claims the
compliance-release example proves save / resolve / journal /
analytics / discard / conductor compose, and names the harness
that proves it (ci/e2e-custom-stack-examples.sh, 15 cells, 51
assertions).

README.es.md gets the same shape under "Construí tu propio
workflow stack" so Spanish parity holds: same tokens, same claims,
and no English-only claims.

EXTENDING.md "Quickest way to start" expands to a two-row table:
single-skill via examples/custom-skill-template/audit-licenses, vs.
workflow stack via examples/custom-stack-template/compliance-release.
Each row has its own quickstart block. Both link to their proving
harness.

The compliance-release/README.md status banner switches from "PR
1, install commands run once PR 3 ships" to "end-to-end working,
49 contract + 51 runtime assertions". The custom-stack-template
overview README replaces its PR-by-PR breakdown with a "what's
covered" summary that names every harness behind the claim.

Lint adds custom-stack-examples-public-copy:
  - README.md + README.es.md each mention compliance-release,
    phase_graph, workflow stack, ci/e2e-custom-stack-examples.sh,
    and the example path.
  - EXTENDING.md links both starting points (single-skill template
    and the stack template), the spec doc, and the runtime harness.
  - Public copy does NOT contain disallowed phrases:
    marketplace, plugin ecosystem, GDPR ready, SOC2 ready,
    compliance certified, "works in every agent identically".
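The disallowed-phrase rule can be pictured as one grep over the public files. This is only a sketch: `lint_disallowed_phrases` is hypothetical, and case-insensitive matching is an assumption about the real lint.

```shell
# Phrases listed in the PR as disallowed in public copy.
DISALLOWED_RE='marketplace|plugin ecosystem|GDPR ready|SOC2 ready|compliance certified|works in every agent identically'

# Hypothetical lint: print offending lines and fail if any file matches.
lint_disallowed_phrases() {
  local f rc=0
  for f in "$@"; do
    if grep -HniE "$DISALLOWED_RE" "$f"; then
      rc=1
    fi
  done
  return "$rc"
}
```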

After this lands, Custom Stack Examples v1 is complete and the
round closes.

* docs: drop em dash + scope claim to opt-in E2E workflow

Codex's PR 4 review caught two real items.

P1 CI red: examples/custom-stack-template/README.md line 47 used
an em dash before "all proven by ci/e2e-custom-stack-examples.sh".
The "No em-dashes in public copy" lint scans top-level *.md plus
every examples/**/README.md, and this line tripped it. Replaced
with a period.
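Here is a sketch of an em-dash scan over the same scope (top-level *.md plus examples/**/README.md). The function name is hypothetical. The em dash is produced from its UTF-8 bytes via printf, so the script itself never contains the character.

```shell
# U+2014 encoded as UTF-8 bytes.
EM_DASH=$(printf '\342\200\224')

# Hypothetical lint: print file:line hits and fail if any em dash is found.
lint_em_dashes() {
  local root=${1:-.} hits
  hits=$(
    {
      find "$root" -maxdepth 1 -type f -name '*.md'
      if [ -d "$root/examples" ]; then
        find "$root/examples" -type f -name README.md
      fi
    } | while IFS= read -r f; do grep -nHF "$EM_DASH" "$f"; done
  )
  [ -z "$hits" ] && return 0
  printf '%s\n' "$hits"
  return 1
}
```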

P2 overclaim: three sites said the runtime harness runs "on every
workflow run" or implied auto-on-PR coverage. The harness lives in
.github/workflows/e2e.yml as a workflow_dispatch job, not part of
the on-PR lint matrix, so the claim does not match the actual
config. Softened to "in the opt-in E2E workflow":

- README.md "Build on nanostack" workflow-stack subsection.
- README.es.md "Construí tu propio workflow stack" subsection (same
  shape, same scope language).
- examples/custom-stack-template/compliance-release/README.md
  status banner: now distinguishes the static contract (runs on
  every PR) from the runtime harness (runs in the opt-in E2E
  workflow).
- examples/custom-stack-template/README.md "what's covered" list.

The custom-stack-examples-public-copy lock added in PR 4 still
passes (it requires the harness path token to appear, not a
specific frequency claim). Em-dash sweep clean across every file
in scope.