manuelschipper · moyukhc · Mar 17, 2026
diff --git a/.claude/commands/nah-demo.md b/.claude/commands/nah-demo.md
@@ -0,0 +1,283 @@
+# nah test-live — Security Demo
+
+Demonstrate nah's protection by running curated test cases through the live classification pipeline. The user watches nah intercept tool calls in real time.
+
+## CRITICAL EXECUTION RULES
+
+**Execute ONE case at a time. This is non-negotiable.**
+
+For each case:
+1. Print the story context and narration (the threat being demonstrated)
+2. Execute the case (live tool call or dry-run classification)
+3. Print the result with match indicator and technical explanation
+4. Print a `---` separator before the next case
+
+**NEVER batch cases into shell scripts or for-loops.** Each case gets its own individual tool call with narration. If you find yourself writing a loop or a script that runs multiple cases, STOP — you are violating the execution rules. The narration between cases IS the demo.
+
+**NEVER include comments in Bash tool calls.** No `# description` lines before commands. Comments become the first token and change nah's classification. Put all narration in your text output, not in the command string.
+
+## Phase 0: Introduction & Mode Selection
+
+### Introduce the demo
+
+Print this introduction (adapt the tone, don't copy verbatim):
+
+> nah is a safety guard that intercepts every tool call Claude makes — Bash commands, file reads, writes, edits, searches — and classifies them in milliseconds with zero tokens. It blocks dangerous patterns (like remote code execution), asks for confirmation on risky operations (like force-pushing), and stays invisible for safe everyday work.
+>
+> This demo runs curated test cases through the live pipeline. For safe cases and blocked cases, you'll see nah's real interception in your terminal. For dangerous commands that shouldn't actually execute, we use dry-run classification.
+
+### Permission setup check
+
+Before proceeding, remind the user about the recommended permission setup (adapt the tone, don't copy verbatim):
+
+> **Quick setup note:** Don't use `--dangerously-skip-permissions` — in bypass mode, hooks fire asynchronously, so commands can execute before nah blocks them.
+>
+> Make sure `Bash`, `Read`, `Glob`, and `Grep` are in `permissions.allow` in your `~/.claude/settings.json` — nah is guarding them. For **Write** and **Edit**, your call — nah inspects their content either way.
+>
+> This way, nah's live blocks and asks will show up properly during the demo.
+
+### Config check
+
+Run `nah config show` via Bash and inspect the output. If the user has any custom configuration (action overrides, classify entries, custom sensitive paths, content pattern changes, etc.), warn them:
+
+> **Heads up:** You have a custom nah config. The expected results in this demo assume default settings (full profile, no overrides). Your custom config may change some decisions — I'll note any mismatches as we go, but they might be intentional on your part rather than bugs.
+
+If the config is default/empty, say so briefly and move on.
+
+### Understanding config vs defaults
+
+This is important context for interpreting results during the demo:
+
+- The **test battery expected values assume default config** (full profile, no overrides).
+- **Live tool calls use the active config** because they exercise the real hook path. If a user's config changes a policy (e.g., adds `~/.ssh` to allowed paths, or relaxes `network_outbound` to allow), live results may differ from the battery's expected value. **This is the config working correctly, not a bug in the test battery.**
+- **Dry-run base cases use `nah test --defaults`** so they ignore global/project config and compare against packaged defaults.
+- **Config variants use `nah test --config`** because they intentionally test a temporary override. Do not combine `--defaults` and `--config`.
+- Treat a mismatch as a real issue only if a dry-run base case still mismatches under `nah test --defaults`, or if a live case mismatches in a way the active config cannot explain.
+
+### Select mode
+
+Check `$ARGUMENTS`:
+
+- If argument is **`full`**: use full mode (90 base + 21 config variants)
+- If argument is **`story:NAME`**: use story mode for that story
+- If **no argument**: ask the user which mode they want:
+
+> **Which mode would you like?**
+> - **Demo** (recommended) — 25 curated cases across 8 security stories. Takes ~5 minutes. Covers all the highlights.
+> - **Full** — All 90 base cases + 21 config variants + log verification. Comprehensive regression suite.
+> - **Single story** — Deep-dive into one threat category. Stories: `safe_operations`, `remote_code_execution`, `data_exfiltration`, `obfuscated_execution`, `path_boundary_protection`, `destructive_operations`, `credential_secret_detection`, `network_context`.
+
+Wait for the user's answer before proceeding.
+
+### Select pacing
+
+After the user picks a mode, ask:
+
+> **Pacing?**
+> - **Pause between cases** — I'll stop after each case so you can inspect the result. Say "next" or "continue" to advance.
+> - **Run straight through** — I'll run all cases back-to-back without stopping.
+
+Wait for the user's answer. If they choose to pause, after printing each case's result, wait for the user to respond before continuing to the next case.
+
+---
+
+## Phase 1: Setup
+
+1. Read `src/nah/data/test_battery.json`
+2. Filter cases based on selected mode:
+   - **demo**: base cases with `quick: true` (25 cases)
+   - **full**: all base cases (90) + all variants (21)
+   - **story:NAME**: base cases where `story` field matches NAME
+3. Print header:
+
+```
+## nah test-live — N cases (demo|full|story:NAME)
+```
+
+---
+
+## Phase 2: Story-Based Execution
+
+Group selected base cases by their `story` field. Process stories in this order:
+
+| Story key | Header |
+|-----------|--------|
+| `safe_operations` | Safe Operations — nah stays out of your way |
+| `remote_code_execution` | Remote Code Execution — download-and-execute pipelines |
+| `data_exfiltration` | Data Exfiltration — stealing sensitive data |
+| `obfuscated_execution` | Obfuscated Execution — hiding malicious intent |
+| `path_boundary_protection` | Path & Boundary Protection — sensitive files and directories |
+| `destructive_operations` | Destructive Operations — irreversible changes |
+| `credential_secret_detection` | Credential & Secret Detection — scanning for secrets |
+| `network_context` | Network Context — who are you talking to? |
+
+For each story group, print a story header:
+
+```
+## [Story Header]
+```
+
+Then for each case in the story, print:
+
+```
+### [N/total] `input_summary`
+
+**Threat:** [narration field from JSON]
+```
+
+Execute the case (see Execution Mechanics below), then print:
+
+```
+**Result:** decision ✓
+**Why:** [description field from JSON]
+
+---
+```
+
+If the result does not match, print `**Result:** actual ✗ (expected: expected)` instead, filling in the actual and expected decisions.
+
+---
+
+## Execution Mechanics
+
+### Live cases (`mode: "live"`)
+
+Actually invoke the real tool. The user sees nah's real decision in their terminal.
+
+- **Bash**: Use the Bash tool with `input.command`.
+- **Read**: Use the Read tool with `input.file_path`.
+- **Write**: Use the Write tool with `input.file_path` and `input.content`. nah intercepts at the tool-call level, so blocked writes never need a prior Read.
+- **Glob**: Use the Glob tool with `input.pattern` and `input.path` (if present).
+- **Grep**: Use the Grep tool with `input.pattern` and `input.path` (if present).
+
+Detection:
+- Tool denied with reason starting with `nah.` → record as **block**
+- Tool denied with reason starting with `nah?` → record as **ask**
+- Tool executed normally → record as **allow**
+
+### Dry-run cases (`mode: "dry_run"`)
+
+Never execute the real tool. For base and story dry-run cases, run `nah test --defaults` via the Bash tool and parse the `Decision:` line from its output. Map `ALLOW` → allow, `ASK` → ask, `BLOCK` → block.
+
+**Bash**:
+```bash
+nah test --defaults "the command here"
+```
+
+**Write** (with content inspection):
+```bash
+nah test --defaults --tool Write --path ./config.py --content "AWS_SECRET_ACCESS_KEY=AKIA1234567890ABCDEF"
+```
+
+**Edit** (with content inspection):
+```bash
+nah test --defaults --tool Edit --path ./app.py --content "api_secret = \"hunter2hunter2\""
+```
+
+**Read/Glob** (path-only):
+```bash
+nah test --defaults --tool Read ~/.ssh/id_rsa
+nah test --defaults --tool Glob ~/.ssh
+```
+
+**Grep** (with search pattern for credential detection):
+```bash
+nah test --defaults --tool Grep --path /tmp --pattern "password\s*="
+```
+
+**MCP tools**:
+```bash
+nah test --defaults --tool mcp__example__tool
+```
+
+---
+
+## Phase 3: Summary
+
+After all cases, print a summary.
+
+**Demo mode:**
+```
+## Demo Complete
+
+| Story | Cases | Passed |
+|-------|-------|--------|
+| Safe Operations | N/N | ✓ |
+| Remote Code Execution | N/N | ✓ |
+| ... | | |
+
+**Passed:** N/N | **Allow:** N | **Ask:** N | **Block:** N
+```
+
+If there are any mismatches, list each one with its case ID, expected decision, and actual decision.
+
+**Full mode:** Same summary, plus note the report file path.
+
+---
+
+## Full Mode: Additional Phases
+
+These phases run only in `full` mode, after the base battery.
+
+### Config Variants
+
+Run each variant using `nah test --config` with the variant's `config` object as inline JSON. This applies a temporary config override for that single invocation — no file writes, no backup/restore needed. Do not add `--defaults`; config variants are intentionally override-based.
+
+For each variant, announce it:
+```
+### [V#] Config variant (feature): `input_summary`
+Config: config_description
+Expected: expected (default: default_expected)
+```
+
+Then execute using `nah test --config '<JSON>' ...` — convert the variant's `config` object to a JSON string and pass it via the `--config` flag. Construct the rest of the command from the variant's `tool` and `input` fields.
+
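+Examples (the `#` lines below only label each example; per the execution rules, never include them in the Bash tool call):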
+```bash
+# Bash variant with classify override
+nah test --config '{"classify": {"git_safe": ["git push --force"]}}' "git push --force"
+
+# Bash variant with profile: none
+nah test --config '{"profile": "none"}' "git status"
+
+# Write variant with content pattern suppression
+nah test --tool Write --path ./config.py --content "secret=abc" --config '{"content_patterns": {"suppress": ["private key"]}}'
+```
+
+### Log Cross-Check
+
+1. Run `nah log -n 50 --json` via Bash
+2. For each base case that resulted in block or ask, check for a matching log entry
+3. Report: `Log verified: ✓` or list missing entries
+
+### Report File
+
+Create `command_test_runs/` directory if it doesn't exist, then write a markdown report to `command_test_runs/YYYY-MM-DD_HHMMSS.md`:
+
+```markdown
+# nah test report
+
+**Date:** YYYY-MM-DD HH:MM:SS
+**Mode:** full
+**Cases:** N total (X base + Y variants)
+
+## Results
+
+| # | Story | Tool | Input | Expected | Actual | Mode | Match |
+|---|-------|------|-------|----------|--------|------|-------|
+
+## Config Variants
+
+| V# | Feature | Config | Input | Expected | Actual | Match |
+|----|---------|--------|-------|----------|--------|-------|
+
+## Summary
+
+- **Passed:** N/N (X%)
+- **Live/Dry-run:** N/N
+- **Log verification:** ✓ or N missing
+
+## Mismatches
+
+(only if any — include case ID, tool, input, expected vs actual, full output)
+```
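The detection rules under Execution Mechanics map raw tool results to the battery's `allow`/`ask`/`block` vocabulary. Below is a minimal shell sketch of that mapping; `decision_from_dry_run` and `decision_from_denial` are hypothetical helper names (not part of nah), and the parser assumes the decision word is the second whitespace-separated field on the line that begins with `Decision:`.

```bash
# Hypothetical helpers (not part of nah) that normalize results to the
# battery's allow/ask/block vocabulary.

# Dry-run: read `nah test` output on stdin. Assumes the decision is the
# second whitespace-separated field of the line whose first field is "Decision:".
decision_from_dry_run() {
  awk '$1 == "Decision:" { print tolower($2); exit }'
}

# Live: classify a tool result by its denial reason ("" means not denied).
decision_from_denial() {
  case "$1" in
    "")      echo allow ;;    # tool executed normally
    "nah."*) echo block ;;    # nah hard block
    "nah?"*) echo ask ;;      # nah asked for confirmation
    *)       echo unknown ;;  # denied by something other than nah
  esac
}
```

For example, `nah test --defaults "git status" | decision_from_dry_run` should print `allow`, `ask`, or `block`. Empty output means `nah test` formats its `Decision:` line differently than assumed, and the `awk` pattern needs adjusting.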
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -0,0 +1,11 @@
+## Summary
+
+<!-- Brief description of what this PR does -->
+
+## Test plan
+
+<!-- How was this tested? -->
+
+---
+
+By submitting this pull request, I confirm that I have read and agree to the [Contributor License Agreement](../CLA.md).
diff --git a/.github/workflows/deploy-docs.yml b/.github/workflows/deploy-docs.yml
@@ -0,0 +1,40 @@
+name: Deploy docs to schipper.ai
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'site/**'
+      - 'mkdocs.yml'
+
+  workflow_dispatch:
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+
+      - uses: actions/setup-python@v6
+        with:
+          python-version: '3.12'
+
+      - name: Install mkdocs
+        run: pip install mkdocs-material
+
+      - name: Build docs
+        run: python -m mkdocs build
+
+      - name: Push to schipper.ai
+        run: |
+          git clone --depth 1 https://x-access-token:${{ secrets.SCHIPPER_AI_DEPLOY }}@github.com/manuelschipper/schipper.ai.git /tmp/schipper.ai
+          rm -rf /tmp/schipper.ai/static/nah
+          mkdir -p /tmp/schipper.ai/static
+          cp -r _build /tmp/schipper.ai/static/nah
+          cd /tmp/schipper.ai
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+          git add static/nah
+          git diff --cached --quiet && echo "No changes" && exit 0
+          git commit -m "Update nah docs from nah@${GITHUB_SHA::7}"
+          git push
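The "Push to schipper.ai" step only commits when the staged docs actually changed: `git diff --cached --quiet` exits 0 when the index matches `HEAD`, so the `&& exit 0` chain ends the step early. The commit message's `${GITHUB_SHA::7}` is bash substring expansion (the first seven characters), which works because `run:` blocks on GitHub-hosted Ubuntu runners use bash by default. Below is a minimal sketch for rehearsing that guard in a throwaway repository; `commit_if_changed` and `short_sha` are hypothetical names, and staging with `git add -A` (instead of the workflow's `git add static/nah`) is a simplification for the sketch.

```bash
# Hypothetical helper mirroring the workflow's
# "git diff --cached --quiet && echo 'No changes' && exit 0" guard.
# Run inside a git work tree; $1 is the commit message.
commit_if_changed() {
  local msg=$1
  git add -A                      # sketch stages everything; the workflow stages static/nah only
  if git diff --cached --quiet; then
    echo "No changes"             # index matches HEAD: nothing to commit
    return 0
  fi
  git commit -q -m "$msg"
  echo "committed"
}

# Same substring expansion as "nah@${GITHUB_SHA::7}" (bash-only syntax).
short_sha() {
  printf '%s\n' "${1::7}"
}
```

Running the deploy twice on unchanged docs should therefore leave a single commit in the target repository rather than an empty one.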
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
@@ -0,0 +1,48 @@
+name: Publish to PyPI
+
+on:
+  push:
+    tags: ['v*']
+
+  workflow_dispatch:
+
+permissions:
+  contents: write
+  id-token: write
+
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    environment: pypi
+    steps:
+      - uses: actions/checkout@v6
+
+      - uses: actions/setup-python@v6
+        with:
+          python-version: '3.12'
+
+      - name: Install build tools
+        run: pip install build
+
+      - name: Build package
+        run: python -m build
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+
+      - name: Extract changelog for release
+        id: changelog
+        run: |
+          VERSION="${GITHUB_REF_NAME#v}"
+          # Extract the section between ## [VERSION] and the next ## [
+          NOTES=$(awk "/^## \\[${VERSION}\\]/{found=1; next} /^## \\[/{if(found) exit} found{print}" CHANGELOG.md)
+          # Write to file to preserve newlines
+          echo "$NOTES" > /tmp/release_notes.md
+
+      - name: Create GitHub Release
+        run: |
+          gh release create "$GITHUB_REF_NAME" \
+            --title "$GITHUB_REF_NAME" \
+            --notes-file /tmp/release_notes.md
+        env:
+          GH_TOKEN: ${{ github.token }}
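The "Extract changelog for release" step can be rehearsed locally before pushing a tag. Below is a minimal sketch that wraps the step's `awk` program, unchanged, in a hypothetical `extract_notes` function; it assumes `CHANGELOG.md` uses `## [X.Y.Z]` section headings, which is what the workflow's pattern expects.

```bash
# Hypothetical wrapper around the workflow's awk program (copied verbatim).
# Usage: extract_notes VERSION CHANGELOG_PATH   (VERSION without the leading "v")
extract_notes() {
  local VERSION=$1 changelog=$2
  # Print the lines after "## [VERSION]" up to, but not including,
  # the next "## [" heading.
  awk "/^## \\[${VERSION}\\]/{found=1; next} /^## \\[/{if(found) exit} found{print}" "$changelog"
}
```

If the tag has no matching heading, the output is empty; as written, the workflow would still go on to create the release, just with empty notes, rather than failing.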