
feat(commands): improve learn-eval with checklist-based holistic verdict #385

Closed
affaan-m wants to merge 5 commits into main from codex/pr-360-learn-eval-v2

Conversation

@affaan-m
Owner

@affaan-m affaan-m commented Mar 11, 2026

Summary

  • carry forward the /learn-eval holistic-verdict improvements from #360
  • address the reviewer-requested gaps by checking project-local skills for overlap and adding the explicit Improve then Save confirmation branch
  • refresh onto current main and include the CI validator dependency fix so the branch validates on current runners

Supersedes

Test plan

  • npm ci --ignore-scripts
  • node scripts/ci/validate-agents.js
  • node scripts/ci/validate-hooks.js
  • node scripts/ci/validate-commands.js
  • node scripts/ci/validate-skills.js
  • node scripts/ci/validate-rules.js
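
For local runs, the test plan above could be chained so that the first failing step stops the sequence. The following is a minimal sketch, not part of this PR: `TEST_PLAN`, `spawnStep`, and `runTestPlan` are hypothetical names, and the executor is injectable so the ordering logic can be checked without actually invoking npm.

```javascript
// Hypothetical helper (not in the repository): runs the test-plan commands
// in order and reports the first step that exits non-zero.
const { spawnSync } = require('node:child_process');

// Commands copied from the test plan above.
const TEST_PLAN = [
  ['npm', ['ci', '--ignore-scripts']],
  ['node', ['scripts/ci/validate-agents.js']],
  ['node', ['scripts/ci/validate-hooks.js']],
  ['node', ['scripts/ci/validate-commands.js']],
  ['node', ['scripts/ci/validate-skills.js']],
  ['node', ['scripts/ci/validate-rules.js']],
];

// Default executor: run the command with inherited stdio and return its exit code.
// A null status (spawn error or signal) is treated as a failure.
function spawnStep(cmd, args) {
  const result = spawnSync(cmd, args, { stdio: 'inherit' });
  return result.status === null ? 1 : result.status;
}

// Runs each step in order; stops at the first non-zero exit code.
function runTestPlan(plan = TEST_PLAN, exec = spawnStep) {
  for (const [cmd, args] of plan) {
    if (exec(cmd, args) !== 0) {
      return { ok: false, failedStep: [cmd, ...args].join(' ') };
    }
  }
  return { ok: true, failedStep: null };
}
```

Calling `runTestPlan()` from the repository root would run the real commands; passing a stub as the second argument keeps the run dry.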

Summary by cubic

Upgrades /learn-eval to a checklist-based, holistic verdict system that reduces duplicate skills and guides where and how to save knowledge. Improves reliability of CI validators on current runners.

  • New Features

    • Replace numeric scoring with a checklist + 4-way verdict: Save / Improve then Save / Absorb into [X] / Drop.
    • Add overlap checks against project/global skills and MEMORY.md, and consider appending to existing skills.
    • Add verdict-specific confirmation flows and a concise output template.
  • Dependencies

    • Workflows now install validation dependencies (npm ci --ignore-scripts) before running validators.
    • Bump ajv to ^8.18.0.
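
To make the four-way verdict in the feature list above concrete, here is a rough sketch of how checklist answers might map onto an outcome. It is illustrative only: the PR describes the verdict as a holistic judgment, so the fixed priority order below, the field names (`reusable`, `coveredByMemory`, `overlappingSkill`, `needsRevision`), and the choice to treat a MEMORY.md hit as "Drop" are all assumptions, not the contents of commands/learn-eval.md.

```javascript
// Illustrative only: the real /learn-eval verdict is a holistic judgment.
// All field names here are hypothetical.
function suggestVerdict(checklist) {
  const { reusable, coveredByMemory, overlappingSkill, needsRevision } = checklist;

  // Assumption: knowledge that is not reusable, or already recorded in MEMORY.md, is dropped.
  if (!reusable || coveredByMemory) {
    return 'Drop';
  }
  // An existing skill that covers the same ground absorbs the new material,
  // which avoids creating another skill file.
  if (overlappingSkill) {
    return `Absorb into ${overlappingSkill}`;
  }
  // Worth keeping, but the draft needs another pass first.
  if (needsRevision) {
    return 'Improve then Save';
  }
  return 'Save';
}

// Example: a reusable pattern that overlaps an existing (made-up) skill file.
const example = suggestVerdict({
  reusable: true,
  coveredByMemory: false,
  overlappingSkill: 'skills/ci-validation.md',
  needsRevision: false,
});
console.log(example);
```

The ordering puts "Absorb" ahead of "Save" because the stated goal is to limit skill file proliferation; a different priority would be equally defensible.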

Written for commit c6e7552. Summary will update on new commits.

Summary by CodeRabbit

Release Notes

  • Documentation

    • Redesigned skill evaluation guide with a holistic verdict system (supporting Save, Improve then Save, Absorb, or Drop outcomes) and comprehensive pre-save verification steps.
  • Chores

    • Enhanced CI/CD validation workflow with dependency installation improvements.
    • Updated development dependencies.

shimo4228 and others added 5 commits March 8, 2026 19:35
Replace the 5-dimension numeric scoring rubric with a checklist + holistic
verdict system (Save / Improve then Save / Absorb into [X] / Drop).

Key improvements:
- Explicit pre-save checklist: grep skills/ for duplicates, check MEMORY.md,
  consider appending to existing skills, confirm reusability
- 4-way verdict instead of binary save/don't-save: adds "Absorb into [X]"
  to prevent skill file proliferation, and "Improve then Save" for iterative
  refinement
- Verdict-specific confirmation flows tailored to each outcome
- Design rationale explaining why holistic judgment outperforms numeric
  scoring with modern frontier models

Change h4 (####) to h3 (###) for sub-steps 5a and 5b to comply with
heading increment rule (headings must increment by one level at a time).

@coderabbitai
Contributor

coderabbitai bot commented Mar 11, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough


Two CI workflow files add an npm dependency installation step for the validation jobs. The learn-eval documentation replaces the numeric evaluation rubric with a checklist-based holistic verdict system. The ajv dev dependency range is updated from ^8.17.0 to ^8.18.0.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Workflows: .github/workflows/ci.yml, .github/workflows/reusable-validate.yml | Added "Install validation dependencies" step running npm ci --ignore-scripts after Node.js setup in both the validate job and reusable workflow. |
| Documentation Update: commands/learn-eval.md | Restructured evaluation process from 5-dimension numeric rubric to checklist-based holistic verdict system. Added verdict categories (Save, Improve then Save, Absorb, Drop), expanded quality gate section, and introduced design rationale and additional pre-save verification guidance. |
| Dependencies: package.json | Updated ajv dev dependency from ^8.17.0 to ^8.18.0. |
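
The pre-save verification summarized above includes an overlap check, which the commit message describes as grepping skills/ for duplicates and checking MEMORY.md. A minimal sketch of that idea follows, assuming the caller has already loaded the candidate files into memory. The function name `findOverlaps` and the `{ path, text }` shape are hypothetical, and plain keyword matching is a stand-in for whatever search the command actually performs.

```javascript
// Hypothetical sketch of the "grep skills/ for duplicates" checklist item.
// `skills` is an array of { path, text } objects the caller has already read,
// e.g. project-local skills, global skills, and MEMORY.md.
function findOverlaps(keywords, skills) {
  // Normalize keywords once; ignore empty strings.
  const needles = keywords
    .map((keyword) => keyword.trim().toLowerCase())
    .filter((keyword) => keyword.length > 0);

  return skills
    .map(({ path, text }) => {
      const haystack = text.toLowerCase();
      // Record which keywords appear in this file (case-insensitive substring match).
      const hits = needles.filter((keyword) => haystack.includes(keyword));
      return { path, hits };
    })
    .filter(({ hits }) => hits.length > 0)
    // Files matching more keywords are stronger "Absorb into [X]" candidates.
    .sort((a, b) => b.hits.length - a.hits.length);
}

// Example with made-up file contents.
const overlaps = findOverlaps(['ajv', 'schema'], [
  { path: 'skills/validation.md', text: 'Use AJV to validate a JSON Schema.' },
  { path: 'skills/git.md', text: 'Rebase onto main before opening a PR.' },
  { path: 'MEMORY.md', text: 'ajv was bumped for the CI validators.' },
]);
console.log(overlaps.map((entry) => entry.path));
```

A non-empty result would be a prompt to consider appending to the top match rather than saving a new skill.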

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

🐰 With workflows now swift and dependencies bright,
The validation now checks things quite right!
From numbers to verdicts, a qualitative way,
The learn-eval docs see the light of the day! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: improving learn-eval documentation with a checklist-based holistic verdict system, which is the primary focus across the workflow files and the commands/learn-eval.md changes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 5 files

@affaan-m
Owner Author

Superseded by #360, which is now merged on main. Closing this duplicate follow-up to keep the review surface clean.

@affaan-m affaan-m closed this Mar 11, 2026
@affaan-m affaan-m deleted the codex/pr-360-learn-eval-v2 branch March 11, 2026 22:59
@affaan-m affaan-m restored the codex/pr-360-learn-eval-v2 branch March 11, 2026 23:00
@affaan-m affaan-m deleted the codex/pr-360-learn-eval-v2 branch March 12, 2026 15:56