
feat(commands): improve learn-eval with checklist-based holistic verdict #360

Merged
affaan-m merged 4 commits into affaan-m:main from shimo4228:feat/commands/learn-eval-v2 on Mar 11, 2026
Conversation

shimo4228 (Contributor) commented Mar 8, 2026

Summary

Improves the /learn-eval command by replacing the 5-dimension numeric scoring rubric with a checklist-based holistic verdict system.

Motivation

The original learn-eval used a scoring table (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage — each scored 1–5). In practice with modern frontier models (Opus 4.6+), this approach has a key limitation: forcing rich qualitative signals into numeric scores loses nuance and can produce misleading totals. For example, a pattern might score 3/5 on Coverage but 5/5 on everything else — the total (23/25) looks great, but the Coverage gap might be critical.

Modern models have strong contextual judgment. Rather than quantizing that judgment into numbers, this update lets the model weigh all factors holistically while an explicit checklist ensures no critical verification step is skipped.

Key Changes

1. Explicit pre-save checklist (Step 5a)

Before any verdict, the model must actually verify:

  • Grep ~/.claude/skills/ for content overlap
  • Check MEMORY.md for overlap
  • Consider appending to an existing skill
  • Confirm reusability (not a one-off fix)
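A minimal shell sketch of the overlap checks in Step 5a, under stated assumptions: the keyword and all paths here are illustrative placeholders, and the project-local `.claude/skills/` path is included per the review feedback on this PR rather than the original checklist wording.

```shell
#!/bin/sh
# Sketch of the pre-save overlap check. KEYWORD stands in for the
# candidate pattern's main topic; replace it per evaluation.
KEYWORD="retry-backoff"

# 1. List skill files (global and project-local) whose content
#    mentions the keyword, to surface potential overlap.
grep -ril "$KEYWORD" ~/.claude/skills/ .claude/skills/ 2>/dev/null || true

# 2. Check MEMORY.md (global and project) for the same keyword.
for f in ~/.claude/MEMORY.md ./MEMORY.md; do
  if [ -f "$f" ]; then
    grep -in "$KEYWORD" "$f" || true
  fi
done
```

Any hit is a signal to consider the Absorb verdict instead of creating a new file.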

2. Four-way verdict system (Step 5b)

Replaces binary save/don't-save with nuanced outcomes:

| Verdict | When | Action |
| --- | --- | --- |
| Save | Unique, specific, well-scoped | Save as new skill |
| Improve then Save | Valuable but needs refinement | Revise once, then re-evaluate |
| Absorb into [X] | Should be part of existing skill | Append to existing file |
| Drop | Trivial, redundant, or abstract | Explain and stop |

The Absorb verdict is particularly important — it prevents skill file proliferation by directing related knowledge into existing skills rather than creating new files.

3. Verdict-specific confirmation flows (Step 6)

Each verdict type has a tailored output format, showing exactly the information needed for that decision.
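As a purely illustrative sketch (the save path, keyword, and exact wording here are hypothetical, not taken from the command file), a Save confirmation might render as:

```markdown
## Verdict: Save

**Checklist**
- [x] No overlap found in ~/.claude/skills/ (grepped "retry-backoff")
- [x] No overlap in MEMORY.md
- [x] Standalone pattern; nothing suitable to absorb into
- [x] Reusable beyond this one fix

**Rationale:** Specific, actionable pattern not covered by existing skills.

**Save path:** ~/.claude/skills/learned/retry-backoff.md

Save? (y/n)
```

The Absorb and Drop flows would swap in a target-file diff or a reasoning-only block, respectively.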

4. Guideline dimensions (non-scored)

The same quality dimensions (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage) remain as guidance for the holistic judgment, but are no longer scored numerically.

Type

  • [ ] Skill / [x] Command / [ ] Agent / [ ] Hook

Testing

  • Tested locally with Claude Code CLI across multiple extraction sessions
  • Verified the checklist catches duplicates that numeric scoring missed
  • Confirmed Absorb verdict correctly identifies merge-worthy patterns

Checklist

  • Follows file naming conventions
  • Includes required sections
  • No hardcoded paths or personal information
  • English only

Summary by cubic

Upgrades /learn-eval to a checklist-driven, holistic verdict flow with knowledge-placement awareness to prevent duplicates and reduce skill file sprawl. Clarifies confirmation steps and output format, and adds a brief design rationale for safer, smarter saves.

  • New Features

    • Pre-save checklist: grep global/project skills, check global/project MEMORY.md, consider appending, confirm reusability
    • Four verdicts: Save, Improve then Save, Absorb into [X], Drop — with a short design rationale for holistic judgment
    • Verdict-specific confirmations and a structured output block (checklist + verdict + 1–2 sentence rationale)
  • Bug Fixes

    • Fixed markdownlint MD001 by changing sub-steps 5a/5b from #### to ###

Written for commit 712ae16. Summary will update on new commits.

Summary by CodeRabbit

  • Documentation
    • Switched evaluation workflow to a checklist-based quality gate and holistic verdicts (replacing numeric scoring).
    • Introduced four verdicts — Save, Improve then Save, Absorb into existing content, Drop — with verdict-specific confirmation flows and actions.
    • Added guideline dimensions, explicit pre-evaluation steps (overlap/reusability checks), consolidated checklist+verdict output formatting, and saving decision steps.
    • Documented design rationale and rule to append absorbed items to existing skills.

Replace the 5-dimension numeric scoring rubric with a checklist + holistic
verdict system (Save / Improve then Save / Absorb into [X] / Drop).

Key improvements:
- Explicit pre-save checklist: grep skills/ for duplicates, check MEMORY.md,
  consider appending to existing skills, confirm reusability
- 4-way verdict instead of binary save/don't-save: adds "Absorb into [X]"
  to prevent skill file proliferation, and "Improve then Save" for iterative
  refinement
- Verdict-specific confirmation flows tailored to each outcome
- Design rationale explaining why holistic judgment outperforms numeric
  scoring with modern frontier models
coderabbitai bot commented Mar 8, 2026

📝 Walkthrough

Replaces the numeric 5-dimension self-evaluation rubric in commands/learn-eval.md with a checklist-first "Quality gate — Checklist + Holistic verdict" workflow, adds pre-evaluation steps, and introduces four verdicts (Save / Improve then Save / Absorb / Drop) with verdict-specific confirmation flows and updated output formatting.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Evaluation workflow documentation: `commands/learn-eval.md` | Replaced numeric self-evaluation rubric with a checklist-driven "Quality gate — Checklist + Holistic verdict" model. Added explicit pre-evaluation checks, a Required checklist (5a) and Holistic verdict (5b), four verdict outcomes with confirmation flows and formatted outputs; added rule to append on "Absorb" and updated design rationale and notes. |

Sequence Diagram(s)

sequenceDiagram
  participant User as User
  participant CLI as CLI / Commands
  participant Eval as Evaluator (Checklist + Verdict)
  participant KB as KnowledgeBase / Skills
  participant FS as FileSystem

  rect rgba(200,220,255,0.5)
    User->>CLI: invoke /learn (submit artifact)
    CLI->>Eval: run pre-evaluation checks + checklist
    Eval->>Eval: evaluate checklist -> compute guideline dimensions + holistic verdict
  end

  alt Verdict: Save
    Eval->>FS: create or update learning file
    FS-->>CLI: confirmation
    CLI-->>User: "Saved"
  else Verdict: Improve then Save
    Eval-->>CLI: return improvement suggestions
    User->>CLI: apply improvements
    CLI->>FS: save after updates
    CLI-->>User: "Improved and saved"
  else Verdict: Absorb
    Eval->>KB: append content to existing skill
    KB-->>CLI: confirmation
    CLI-->>User: "Absorbed into [X]"
  else Verdict: Drop
    Eval-->>CLI: show reason and abort save
    CLI-->>User: "Dropped"
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes


Poem

🐰
I hopped through lines and checked each clue,
A checklist glittered, tidy and new,
Save, Improve, Absorb, or let it drop—
I nibble logic, then I hop! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately summarizes the main change: replacing a numeric rubric with a checklist-based holistic verdict system for the learn-eval command. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


affaan-m (Owner) left a comment:
Automated review: checks are failing. Please fix failures before review.

cubic-dev-ai bot left a comment:
1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="commands/learn-eval.md">

<violation number="1" location="commands/learn-eval.md:60">
P2: Checklist grep only checks global skills (`~/.claude/skills/`), missing project-level skills (`.claude/skills/`). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global" — apply the same pattern here.</violation>
</file>


coderabbitai bot (Contributor) left a comment:
Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@commands/learn-eval.md`:
- Around line 83-89: Step 6's "Verdict-specific confirmation flow" lists
behaviors for only three verdicts while Step 5b introduces four; add an explicit
"Improve then Save" branch under that list that describes running the one
allowed re-evaluation/improvement, presenting the revised draft along with save
path, checklist results, and a 1-line verdict rationale, and then saving only
after user confirmation so the post-revision path is unambiguous (refer to the
"Verdict-specific confirmation flow" section and the four verdicts introduced in
Step 5b to place the new branch).
- Around line 56-63: The checklist currently only searches the global skills
directory (`~/.claude/skills/`) and can miss project-local saves; update the
"Grep `~/.claude/skills/` by keyword to check for content overlap" item in
commands/learn-eval.md (the required checklist under "Execute all of the
following") to also search the project-local skills folder (e.g.,
`.claude/skills/learned/` or `.claude/skills/`) so duplicates saved by Step 3
are detected — change the bullet to instruct grepping both the global path and
the project-local `.claude/skills/learned/` (or perform a recursive search in
both locations).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ed275d9f-bba2-4032-b36f-2e3fe1f23aee

📥 Commits

Reviewing files that changed from the base of the PR and between 6090401 and 08db389.

📒 Files selected for processing (1)
  • commands/learn-eval.md

affaan-m (Owner) left a comment:
Automated review: checks are failing. Please fix failures before review.

Change h4 (####) to h3 (###) for sub-steps 5a and 5b to comply with
heading increment rule (headings must increment by one level at a time).
coderabbitai bot (Contributor) left a comment:

♻️ Duplicate comments (2)
commands/learn-eval.md (2)

83-89: ⚠️ Potential issue | 🟠 Major

Add the Improve then Save branch to Step 6.

Step 5b defines four verdicts, but this flow only explains three of them. Without an explicit post-revision branch, the command is ambiguous after the one allowed re-evaluation. Based on learnings, review todo lists to identify out of order steps.

Suggested wording
 6. **Verdict-specific confirmation flow**
 
+   - **Improve then Save**: Present the required improvements + revised draft + updated checklist/verdict after one re-evaluation; if the revised verdict is **Save**, save after user confirmation, otherwise follow the new verdict
    - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
    - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
    - **Drop**: Show checklist results + reasoning only (no confirmation needed)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` around lines 83 - 89, Step 6 is missing the "Improve
then Save" verdict from Step 5b; add a fourth branch named "Improve then Save"
to the verdict-specific confirmation flow that (1) presents suggested revisions
or a short improvement plan + diff of changes, (2) runs the single allowed
re-evaluation/revision to produce an updated draft, (3) shows the updated
checklist results + 1-line verdict rationale + full revised draft for user
review, and (4) prompts the user to confirm before saving—update the Step 6 text
to enumerate this branch alongside "Save", "Absorb into [X]" and "Drop".

60-60: ⚠️ Potential issue | 🟠 Major

Check project-local skills for overlap too.

This checklist only searches ~/.claude/skills/, but Step 3 also allows saving under .claude/skills/learned/. That can miss duplicates and turn an Absorb into an incorrect Save. Based on learnings, review todo lists to identify out of order steps.

Suggested wording
-   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Grep `~/.claude/skills/` and relevant project `.claude/skills/` files by keyword to check for content overlap
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` at line 60, Update the checklist item that currently
reads "Grep `~/.claude/skills/` by keyword to check for content overlap" to also
search the project-local skills directories and learned subfolder: change the
instruction to grep both `~/.claude/skills/` and `./.claude/skills/` (including
`./.claude/skills/learned/`) by keyword so duplicates saved locally aren’t
missed; ensure the wording clearly instructs checking both global and
project-local paths to avoid misclassifying Absorb vs Save.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@commands/learn-eval.md`:
- Around line 83-89: Step 6 is missing the "Improve then Save" verdict from Step
5b; add a fourth branch named "Improve then Save" to the verdict-specific
confirmation flow that (1) presents suggested revisions or a short improvement
plan + diff of changes, (2) runs the single allowed re-evaluation/revision to
produce an updated draft, (3) shows the updated checklist results + 1-line
verdict rationale + full revised draft for user review, and (4) prompts the user
to confirm before saving—update the Step 6 text to enumerate this branch
alongside "Save", "Absorb into [X]" and "Drop".
- Line 60: Update the checklist item that currently reads "Grep
`~/.claude/skills/` by keyword to check for content overlap" to also search the
project-local skills directories and learned subfolder: change the instruction
to grep both `~/.claude/skills/` and `./.claude/skills/` (including
`./.claude/skills/learned/`) by keyword so duplicates saved locally aren’t
missed; ensure the wording clearly instructs checking both global and
project-local paths to avoid misclassifying Absorb vs Save.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99600bbc-4739-4575-804e-8e22dfad0128

📥 Commits

Reviewing files that changed from the base of the PR and between 08db389 and ae94edc.

📒 Files selected for processing (1)
  • commands/learn-eval.md

coderabbitai bot (Contributor) left a comment:

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
commands/learn-eval.md (1)

23-27: ⚠️ Potential issue | 🟠 Major

Step 3 determines location before knowing if a new file will be created.

Step 3 asks you to choose Global vs Project save location, but Step 5b's verdict can result in "Absorb into [X]" (append to existing skill) or "Drop" (no file created). When the verdict is Absorb, the location determined in Step 3 is never used, making the decision premature.

Consider moving Step 3 after Step 5, or making it conditional: "If the verdict is Save or Improve then Save, determine save location..." This ensures the location decision only happens when creating a new file. Based on learnings, todo lists should be checked for out-of-order steps.

♻️ Suggested reordering

Option 1: Move Step 3 to happen conditionally after Step 5 verdict is known.

Option 2: Add a note clarifying that Step 3 is provisional:

 3. **Determine save location:**
+   *(Note: This decision applies only if the verdict in Step 5 is Save or Improve then Save; skip if Absorbing or Dropping.)*
    - Ask: "Would this pattern be useful in a different project?"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` around lines 23 - 27, Step 3 currently forces
choosing Global vs Project before Step 5b’s verdict (which can be "Absorb" or
"Drop"), so make the location decision conditional or move it after the verdict:
either relocate the entire Step 3 to occur immediately after Step 5 (so you only
choose when a new file will actually be created) or change Step 3 wording to “If
the verdict is Save or Improve then Save, determine save location (Global vs
Project)…”, and update any references to "Absorb" / "Drop" in Step 5b to clarify
that those outcomes skip the location selection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@commands/learn-eval.md`:
- Around line 90-91: Reword Step 7 to clearly describe conditional actions based
on the verdict: state that for verdict "Drop" no save/absorb occurs; "Absorb"
appends content to an existing skill file (not the Step 3 location); and only
"Save" and "Improve then Save" write to the location determined in Step 3; use
the terms used in the doc (Step 3 location, verdict names
Drop/Absorb/Save/Improve then Save) so readers understand which verdict triggers
which storage behavior.

---

Outside diff comments:
In `@commands/learn-eval.md`:
- Around line 23-27: Step 3 currently forces choosing Global vs Project before
Step 5b’s verdict (which can be "Absorb" or "Drop"), so make the location
decision conditional or move it after the verdict: either relocate the entire
Step 3 to occur immediately after Step 5 (so you only choose when a new file
will actually be created) or change Step 3 wording to “If the verdict is Save or
Improve then Save, determine save location (Global vs Project)…”, and update any
references to "Absorb" / "Drop" in Step 5b to clarify that those outcomes skip
the location selection.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f282fb86-7f11-493b-941b-85edb64555e4

📥 Commits

Reviewing files that changed from the base of the PR and between ae94edc and 1d4ab47.

📒 Files selected for processing (1)
  • commands/learn-eval.md

Comment on lines +90 to +91
7. Save / Absorb to the determined location


⚠️ Potential issue | 🟠 Major

Step 7 doesn't account for all verdicts.

"Save / Absorb to the determined location" is incomplete and misleading:

  • Drop verdict performs no save or absorb action
  • Absorb verdict appends to an existing skill file (not the location determined in Step 3)
  • Only Save and Improve then Save use the Step 3 location

Reword to clarify the conditional nature:

📝 Suggested rewording
-7. Save / Absorb to the determined location
+7. **Execute the verdict action**
+   - **Save** / **Improve then Save**: Save new skill file to the location determined in Step 3
+   - **Absorb into [X]**: Append additions to the existing target skill file
+   - **Drop**: No file action required
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` around lines 90 - 91, Reword Step 7 to clearly
describe conditional actions based on the verdict: state that for verdict "Drop"
no save/absorb occurs; "Absorb" appends content to an existing skill file (not
the Step 3 location); and only "Save" and "Improve then Save" write to the
location determined in Step 3; use the terms used in the doc (Step 3 location,
verdict names Drop/Absorb/Save/Improve then Save) so readers understand which
verdict triggers which storage behavior.

affaan-m (Owner) left a comment:

Addressed the checklist and verdict-flow gaps, then refreshed the branch onto current main so the CI validator fix is included.

affaan-m merged commit 973be02 into affaan-m:main on Mar 11, 2026
38 of 39 checks passed