From 0436976e9d160d2f059e91ccacab4c39456425eb Mon Sep 17 00:00:00 2001
From: YuhaoLin2005
Date: Sat, 27 Jun 2026 23:49:06 +0800
Subject: [PATCH 1/8] =?UTF-8?q?feat(skills):=20add=20session-quality-gate?=
 =?UTF-8?q?=E2=80=94=20self-audit=20+=20learning=20capture=20before=20s?=
 =?UTF-8?q?hipping?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 skills/session-quality-gate/SKILL.md | 105 +++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)
 create mode 100644 skills/session-quality-gate/SKILL.md

diff --git a/skills/session-quality-gate/SKILL.md b/skills/session-quality-gate/SKILL.md
new file mode 100644
index 000000000..1401b0881
--- /dev/null
+++ b/skills/session-quality-gate/SKILL.md
@@ -0,0 +1,105 @@
+---
+name: session-quality-gate
+description: Verifies session quality before ending — catches rationalized incompleteness, stale learning logs, and low disk space. Use whenever ending a complex coding session. Use whenever the agent has made multiple file edits and is about to stop. Use proactively: before you close the terminal, verify you learned something. Use to build the habit of capturing insights over time.
+---
+
+# Session Quality Gate
+
+Before you close the terminal: **did I learn something, or did I just ship and forget?**
+
+## Quick Start
+
+```bash
+mkdir -p ~/.claude/projects/$(echo $PWD | sed 's/[\\/:]/\\-/g')/memory/growth-log
+```
+
+Done. One directory. Add other libraries as the habit builds.
+
+## Why
+
+Code passes tests. Thinking doesn't. `shipping-and-launch` checks production. `code-review-and-quality` checks correctness. This checks **whether you learned**.
+
+Addy Osmani on the verification bottleneck: "As AI generates code faster than humans can read it, the primary bottleneck in software engineering is shifting from creation to code review and verification." This skill is a verification gate at session end.
+
+## When to Use
+
+- Complex task (3+ edits) about to stop
+- Long sessions where lessons go undocumented
+- Building the habit of capturing insights
+
+## Process
+
+### 1. Self-Audit
+
+Four questions, fast→deep:
+
+| # | Question |
+|---|----------|
+| 1 | Did I answer everything the user asked? |
+| 2 | Did I contradict myself or the rules? |
+| 3 | Did I show evidence, or just claim things work? |
+| 4 | Am I being honest about the limits? |
+
+Fail any → fix → re-ask. See `self-audit` (anthropics/skills).
+
+### 2. Learning Capture
+
+**Minimum: growth-log.** One directory is enough to start.
+
+```
+~/.claude/projects//memory/
+├── growth-log/ # ← START HERE
+├── ratings-tracker.md # Optional
+├── decisions/log.md # Optional
+├── output-index.md # Optional
+└── tooling_capabilities.md # Optional
+```
+
+No memory directory → pass. Directory exists, nothing updated → flag. Addy's principle: "First do it, then do it right, then do it better." Start with one. Iterate.
+
+### 3. Disk Check
+
+- <15GB: Block (dev machine). CI/headless: 5GB.
+- <50GB: Warn
+- 50GB+: Pass
+
+### 4. Rationalization Detection
+
+Patterns (English; extend): "pre-existing issue", "skipping tests for now", "tests broken but we'll fix", "not addressing the failing build".
+
+False positive rate: ~1 in 20. Tune if higher.
+
+## Rules
+
+- Never block without concrete reason (all 5 stale + complex)
+- Disk fails open on headless
+- Memory dir absent → pass (don't block unenrolled users)
+
+## Anti-Patterns
+
+| What | Why Wrong |
+|------|----------|
+| Blocking on single stale lib | Only block when all 5 are stale |
+| False-alarm on headless disk | No home dir → pass silently |
+| Requiring all 5 libs to start | One directory is enough to begin |
+
+## Red Flags
+
+- Agent stopping without self-audit
+- Complex edits, no growth-log update
+- Disk <15GB, no cleanup
+- Same mistake across sessions (learning not captured)
+
+## Verification
+
+- [ ] Self-audit clear
+- [ ] Growth-log updated (or dir absent)
+- [ ] Disk above threshold
+- [ ] No rationalization patterns
+
+## See Also
+
+- `shipping-and-launch` — Production readiness
+- `code-review-and-quality` — Code correctness
+- `doubt-driven-development` — Surface uncertainty
+- `self-audit` (anthropics/skills) — Standalone framework

From c0ee2d52647bf37a15fd8ad4267af0eefd369869 Mon Sep 17 00:00:00 2001
From: YuhaoLin2005
Date: Sun, 28 Jun 2026 00:02:01 +0800
Subject: [PATCH 2/8] improve(session-quality-gate): add Common Rationalizations section

---
 skills/session-quality-gate/SKILL.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/skills/session-quality-gate/SKILL.md b/skills/session-quality-gate/SKILL.md
index 1401b0881..c710109a6 100644
--- a/skills/session-quality-gate/SKILL.md
+++ b/skills/session-quality-gate/SKILL.md
@@ -69,6 +69,16 @@ Patterns (English; extend): "pre-existing issue", "skipping tests for now", "tes
 
 False positive rate: ~1 in 20. Tune if higher.
 
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "Just a few lines changed, skip the audit" | Small changes are the most dangerous — no test coverage, one line can break everything. 30s now saves hours later. |
+| "Too tired, I'll write growth-log next session" | Every session that says this → zero sessions that actually do. If you learned something, capture it now. |
+| "I'll remember what I learned" | No, you won't. Knowledge not written down is knowledge lost. Same mistake next week proves it. |
+| "Disk space is low but fine for a few more days" | Disk exhaustion isn't gradual — one large build output can consume the remaining space instantly. |
+| "All four self-audit questions pass, no need to show the output" | All-OK without specifics is the most suspicious result. Complex tasks always find at least one thing. Show the pass explicitly. |
+
 ## Rules
 
 - Never block without concrete reason (all 5 stale + complex)

From 6e02b39d60ef8d02d5edf66dc1854f327f472265 Mon Sep 17 00:00:00 2001
From: YuhaoLin2005
Date: Sun, 28 Jun 2026 00:14:24 +0800
Subject: [PATCH 3/8] fix: rewrite Quick Start, remove name-drops, justify disk thresholds

---
 skills/session-quality-gate/SKILL.md | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/skills/session-quality-gate/SKILL.md b/skills/session-quality-gate/SKILL.md
index c710109a6..17cbc4cca 100644
--- a/skills/session-quality-gate/SKILL.md
+++ b/skills/session-quality-gate/SKILL.md
@@ -9,17 +9,19 @@ Before you close the terminal: **did I learn something, or did I just ship and f
 
 ## Quick Start
 
+Create one directory to begin capturing session insights:
+
 ```bash
-mkdir -p ~/.claude/projects/$(echo $PWD | sed 's/[\\/:]/\\-/g')/memory/growth-log
+# Replace with a safe name for your project
+PROJECT=$(echo "$PWD" | sed 's/[\\/:]/-/g')
+mkdir -p ~/.claude/projects/$PROJECT/memory/growth-log
 ```
 
 Done. One directory. Add other libraries as the habit builds.
 
 ## Why
 
-Code passes tests. Thinking doesn't. `shipping-and-launch` checks production. `code-review-and-quality` checks correctness. This checks **whether you learned**.
-
-Addy Osmani on the verification bottleneck: "As AI generates code faster than humans can read it, the primary bottleneck in software engineering is shifting from creation to code review and verification." This skill is a verification gate at session end.
+Code passes tests. Thinking doesn't. `shipping-and-launch` checks production. `code-review-and-quality` checks correctness. This checks **whether you learned**. As AI generates code faster than humans can read it, the bottleneck in software engineering is shifting from creation to verification — this skill is a verification gate at session end.
 
 ## When to Use
 
@@ -55,13 +57,15 @@ Fail any → fix → re-ask. See `self-audit` (anthropics/skills).
 └── tooling_capabilities.md # Optional
 ```
 
-No memory directory → pass. Directory exists, nothing updated → flag. Addy's principle: "First do it, then do it right, then do it better." Start with one. Iterate.
+No memory directory → pass. Directory exists, nothing updated → flag. Start with one. Iterate.
 
 ### 3. Disk Check
 
-- <15GB: Block (dev machine). CI/headless: 5GB.
-- <50GB: Warn
-- 50GB+: Pass
+Thresholds based on typical developer workstation profiles:
+- **<15GB block**: Below this, a single Docker image pull, `node_modules` install, or large build artifact can exhaust the remaining space. Same threshold used by VS Code and IntelliJ for low-disk warnings.
+- **<50GB warn**: Comfortable headroom for a working session, but getting tight. Proactive cleanup recommended.
+- **≥50GB**: Adequate for development workloads.
+- **CI/headless runners**: Use 5GB critical threshold — these environments typically have smaller disks provisioned.
 
 ### 4. Rationalization Detection
 

From 4f99ec8e553e011a1a6731cc1174f0e491bc1dc2 Mon Sep 17 00:00:00 2001
From: YuhaoLin2005
Date: Sun, 28 Jun 2026 00:23:40 +0800
Subject: [PATCH 4/8] fix: replace paraphrased quote with original framing

---
 skills/session-quality-gate/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/session-quality-gate/SKILL.md b/skills/session-quality-gate/SKILL.md
index 17cbc4cca..5f0d9ed51 100644
--- a/skills/session-quality-gate/SKILL.md
+++ b/skills/session-quality-gate/SKILL.md
@@ -21,7 +21,7 @@ Done. One directory. Add other libraries as the habit builds.
 
 ## Why
 
-Code passes tests. Thinking doesn't. `shipping-and-launch` checks production. `code-review-and-quality` checks correctness. This checks **whether you learned**. As AI generates code faster than humans can read it, the bottleneck in software engineering is shifting from creation to verification — this skill is a verification gate at session end.
+Code passes tests. Thinking doesn't. `shipping-and-launch` checks production. `code-review-and-quality` checks correctness. This checks **whether you learned**. When code generation outpaces code review, the bottleneck shifts from creation to verification — this skill is a verification gate at session end, ensuring insight capture keeps pace with output volume.
 
 ## When to Use
 

From 54d8f20f6a86e18ecc352619b0f0ec02a6f4cdf9 Mon Sep 17 00:00:00 2001
From: YuhaoLin2005
Date: Mon, 29 Jun 2026 15:25:21 +0800
Subject: [PATCH 5/8] chore: add version 1.0.0 and tags to frontmatter

---
 skills/session-quality-gate/SKILL.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/skills/session-quality-gate/SKILL.md b/skills/session-quality-gate/SKILL.md
index 5f0d9ed51..4e3064a63 100644
--- a/skills/session-quality-gate/SKILL.md
+++ b/skills/session-quality-gate/SKILL.md
@@ -1,6 +1,8 @@
 ---
 name: session-quality-gate
 description: Verifies session quality before ending — catches rationalized incompleteness, stale learning logs, and low disk space. Use whenever ending a complex coding session. Use whenever the agent has made multiple file edits and is about to stop. Use proactively: before you close the terminal, verify you learned something. Use to build the habit of capturing insights over time.
+version: 1.0.0
+tags: [quality, audit, session, delivery, learning]
 ---
 
 # Session Quality Gate

From 73bbef4f0581ef51f636b7112a11502e829e539c Mon Sep 17 00:00:00 2001
From: YuhaoLin2005
Date: Wed, 1 Jul 2026 22:37:09 +0800
Subject: [PATCH 6/8] fix: remove cross-repo self-audit (anthropics/skills) references

---
 skills/session-quality-gate/SKILL.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/skills/session-quality-gate/SKILL.md b/skills/session-quality-gate/SKILL.md
index 4e3064a63..0337dd9da 100644
--- a/skills/session-quality-gate/SKILL.md
+++ b/skills/session-quality-gate/SKILL.md
@@ -44,7 +44,7 @@ Four questions, fast→deep:
 | 3 | Did I show evidence, or just claim things work? |
 | 4 | Am I being honest about the limits? |
 
-Fail any → fix → re-ask. See `self-audit` (anthropics/skills).
+Fail any → fix → re-ask. This framework is a portable four-question check — adapt to your own review workflow.
 
 ### 2. Learning Capture
 
@@ -118,4 +118,3 @@ False positive rate: ~1 in 20. Tune if higher.
 - `shipping-and-launch` — Production readiness
 - `code-review-and-quality` — Code correctness
 - `doubt-driven-development` — Surface uncertainty
-- `self-audit` (anthropics/skills) — Standalone framework

From 643b359a7a72295c27e73b380cbfc03074b37046 Mon Sep 17 00:00:00 2001
From: YuhaoLin2005
Date: Wed, 1 Jul 2026 22:47:40 +0800
Subject: [PATCH 7/8] feat: add C/C/G/H dimension labels to self-audit framework

---
 skills/session-quality-gate/SKILL.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/skills/session-quality-gate/SKILL.md b/skills/session-quality-gate/SKILL.md
index 0337dd9da..5afb0a21a 100644
--- a/skills/session-quality-gate/SKILL.md
+++ b/skills/session-quality-gate/SKILL.md
@@ -33,18 +33,18 @@ Code passes tests. Thinking doesn't. `shipping-and-launch` checks production. `c
 
 ## Process
 
-### 1. Self-Audit
+### 1. Self-Audit (C/C/G/H Framework)
 
-Four questions, fast→deep:
+Four dimensions, each a single question. Fail any → fix → re-ask:
 
-| # | Question |
-|---|----------|
-| 1 | Did I answer everything the user asked? |
-| 2 | Did I contradict myself or the rules? |
-| 3 | Did I show evidence, or just claim things work? |
-| 4 | Am I being honest about the limits? |
+| # | Dimension | Question |
+|---|-----------|----------|
+| 1 | **Completeness** | Did I answer everything the user asked? |
+| 2 | **Consistency** | Did I contradict myself or the rules? |
+| 3 | **Groundedness** | Did I show evidence, or just claim things work? |
+| 4 | **Honesty** | Am I being honest about the limits? |
 
-Fail any → fix → re-ask. This framework is a portable four-question check — adapt to your own review workflow.
+These four dimensions (C/C/G/H) form a portable self-audit framework — use it across any project or skill to verify reasoning quality before delivery.
 
 ### 2. Learning Capture
 
@@ -108,7 +108,7 @@ False positive rate: ~1 in 20. Tune if higher.
 
 ## Verification
 
-- [ ] Self-audit clear
+- [ ] Self-audit clear (C/C/G/H: all four dimensions pass)
 - [ ] Growth-log updated (or dir absent)
 - [ ] Disk above threshold
 - [ ] No rationalization patterns

From a0a5bcf81c25382064fe351e923dbffcfa19e570 Mon Sep 17 00:00:00 2001
From: YuhaoLin2005
Date: Wed, 1 Jul 2026 22:49:25 +0800
Subject: [PATCH 8/8] =?UTF-8?q?revert:=20restore=20original=20self-audit?=
 =?UTF-8?q?=20=E2=80=94=20four=20dimensions=20already=20implicit=20in=20qu?=
 =?UTF-8?q?estions?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 skills/session-quality-gate/SKILL.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/skills/session-quality-gate/SKILL.md b/skills/session-quality-gate/SKILL.md
index 5afb0a21a..0337dd9da 100644
--- a/skills/session-quality-gate/SKILL.md
+++ b/skills/session-quality-gate/SKILL.md
@@ -33,18 +33,18 @@ Code passes tests. Thinking doesn't. `shipping-and-launch` checks production. `c
 
 ## Process
 
-### 1. Self-Audit (C/C/G/H Framework)
+### 1. Self-Audit
 
-Four dimensions, each a single question. Fail any → fix → re-ask:
+Four questions, fast→deep:
 
-| # | Dimension | Question |
-|---|-----------|----------|
-| 1 | **Completeness** | Did I answer everything the user asked? |
-| 2 | **Consistency** | Did I contradict myself or the rules? |
-| 3 | **Groundedness** | Did I show evidence, or just claim things work? |
-| 4 | **Honesty** | Am I being honest about the limits? |
+| # | Question |
+|---|----------|
+| 1 | Did I answer everything the user asked? |
+| 2 | Did I contradict myself or the rules? |
+| 3 | Did I show evidence, or just claim things work? |
+| 4 | Am I being honest about the limits? |
 
-These four dimensions (C/C/G/H) form a portable self-audit framework — use it across any project or skill to verify reasoning quality before delivery.
+Fail any → fix → re-ask. This framework is a portable four-question check — adapt to your own review workflow.
 
 ### 2. Learning Capture
 
@@ -108,7 +108,7 @@ False positive rate: ~1 in 20. Tune if higher.
 
 ## Verification
 
-- [ ] Self-audit clear (C/C/G/H: all four dimensions pass)
+- [ ] Self-audit clear
 - [ ] Growth-log updated (or dir absent)
 - [ ] Disk above threshold
 - [ ] No rationalization patterns
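The series describes the disk thresholds (section 3, as they stand after PATCH 3/8) and the rationalization patterns (section 4) only in prose, so a minimal sketch of how they might be checked from a shell may help reviewers. The function names `free_gb`, `classify_disk`, and `has_rationalization` and the `pass`/`warn`/`block` labels are hypothetical and not defined anywhere in the patches. The sketch also assumes that CI/headless runners have only the 5GB block threshold and no warn tier, which the SKILL.md does not state explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; none of these names come from the patch series.

# free_gb [PATH]
# Prints whole gigabytes available on the filesystem holding PATH
# (defaults to $HOME), using POSIX-format df output in 1024-byte blocks.
free_gb() {
  df -Pk "${1:-$HOME}" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# classify_disk FREE_GB [MODE]
# MODE is "dev" (default) or "ci". Thresholds follow section 3 of SKILL.md:
# dev: <15GB block, <50GB warn, otherwise pass; ci: <5GB block (assumed: no warn tier).
classify_disk() {
  local free=$1 mode=${2:-dev}
  if [ "$mode" = "ci" ]; then
    if [ "$free" -lt 5 ]; then echo block; else echo pass; fi
    return
  fi
  if [ "$free" -lt 15 ]; then
    echo block
  elif [ "$free" -lt 50 ]; then
    echo warn
  else
    echo pass
  fi
}

# has_rationalization TEXT
# Exits 0 if TEXT contains one of the English patterns listed in section 4
# (case-insensitive), non-zero otherwise.
has_rationalization() {
  printf '%s\n' "$1" | grep -qiE \
    -e 'pre-existing issue' \
    -e 'skipping tests for now' \
    -e "tests broken but we'll fix" \
    -e 'not addressing the failing build'
}
```

With these definitions, `classify_disk "$(free_gb)"` prints the verdict for the home filesystem, and `has_rationalization "$summary"` can gate on a session summary string. Keeping the classification separate from the `df` call makes the thresholds testable without touching a real disk.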