feat: add evaluation report contract verifier #927

Closed
luochen211 wants to merge 4 commits into santifer:main from luochen211:codex/report-contract-linter
@luochen211 luochen211 commented Jun 11, 2026

Contributor

Summary

  • add verify-reports.mjs to validate evaluation report headers and Machine Summary blocks
  • expose npm run verify:reports and include the self-test in test-all.mjs
  • document accepted report contract fields in docs/SCRIPTS.md

Validation

  • node verify-reports.mjs --self-test
  • npm run verify:reports
  • node test-all.mjs --quick (232 passed, 0 failed; existing README personal-data warnings)

Summary by CodeRabbit

  • New Features

    • Added verify:reports command to validate evaluation report markdown files, checking for required headers, accepted legitimacy tier values, proper score formats, PDF status, and Machine Summary sections.
  • Documentation

    • Added comprehensive documentation for the report verification script with usage examples and exit code reference.
  • Tests

    • Integrated automated report verification into the test suite.

coderabbitai Bot commented Jun 11, 2026


Walkthrough

A new report validation CLI script, verify-reports.mjs, validates evaluation report markdown files in the reports/ directory. It enforces required headers (URL, Legitimacy, Score), legitimacy tier values, PDF status presence for pipeline reports, and Machine Summary section YAML parsing. The script is registered as an npm script, integrated into the test suite with self-test validation, and documented in SCRIPTS.md.
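The checks listed in the walkthrough could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: only the Score pattern is quoted from the review thread, while the `**URL:**` and `**Legitimacy:**` patterns, the tier names in `ALLOWED_TIERS`, the `## Machine Summary` heading level, and the choice to report a missing summary as a warning are all assumptions. The PDF-status check is omitted because its rule is not shown here.

```javascript
// Hypothetical sketch of a report contract check (not the PR's implementation).
const ALLOWED_TIERS = ['high', 'medium', 'low']; // assumed tier names, illustration only

// Build the fence marker at runtime so this example can sit inside a fenced block.
const FENCE = '`'.repeat(3);
// Assumed layout: a "## Machine Summary" heading followed by a fenced YAML block.
const SUMMARY_RE = new RegExp(
  `## Machine Summary\\s*${FENCE}(?:ya?ml)?\\n([\\s\\S]*?)${FENCE}`,
);

function validateReport(content) {
  const errors = [];
  const warnings = [];

  // Required headers; the URL and Legitimacy patterns are assumptions.
  if (!/\*\*URL:\*\*\s*\S+/.test(content)) errors.push('missing **URL:** header');

  const tier = content.match(/\*\*Legitimacy:\*\*\s*([^\n]+)/)?.[1]?.trim();
  if (!tier) errors.push('missing **Legitimacy:** header');
  else if (!ALLOWED_TIERS.includes(tier.toLowerCase())) {
    errors.push(`unknown legitimacy tier: ${tier}`);
  }

  // Score pattern as quoted in the review comment on verify-reports.mjs.
  const score = content.match(/\*\*Score:\*\*\s*(\d(?:\.\d)?\/5|N\/A|SKIP|DUP)/i)?.[1];
  if (!score) errors.push('missing or invalid **Score:** header');

  // Treating a missing summary block as a warning is an assumption.
  if (!SUMMARY_RE.test(content)) warnings.push('missing Machine Summary block');

  return { errors, warnings };
}

// Usage: a report that has none of the required headers.
console.log(validateReport('# Some report without headers'));
```

Collecting errors and warnings in arrays, rather than throwing on the first failure, lets a single run report every contract violation in a file.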

Changes

Report Validation Script with Self-Test Integration

| Layer / File(s) | Summary |
|---|---|
| Report validation contract and implementation / verify-reports.mjs | Defines validation constants (default reports directory, allowed legitimacy tiers, required fields), helper utilities to enumerate report files, extract/parse Machine Summary fenced blocks, and detect PDF requirements, and validateReport() implementation that collects errors and warnings for mandatory headers, conditional PDF presence, and Machine Summary structure. |
| Self-test and CLI entry point / verify-reports.mjs | selfTest() creates temporary reports directory, writes representative valid/invalid/malformed markdown files, asserts expected error/warning outcomes, and cleans up; main() dispatches to self-test mode or aggregates per-file results into JSON output with exit code 1 on errors. |
| Script registration, test integration, and documentation / package.json, test-all.mjs, docs/SCRIPTS.md | Registers verify:reports npm script to invoke verify-reports.mjs, adds --self-test execution to test-all.mjs suite, and documents the validation contract rules, run commands, and exit code semantics. |
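The entry-point behaviour summarized above (aggregate per-file results into JSON, exit with code 1 on errors) might look something like the sketch below. The `summarize` helper, the result shape, the JSON field names, and the sample file names are hypothetical, and treating warnings as non-fatal is an assumption.

```javascript
// Hypothetical aggregation step; names and JSON layout are assumptions.
function summarize(results) {
  // Count errors and warnings across all per-file results.
  const totals = results.reduce(
    (acc, r) => ({
      errors: acc.errors + r.errors.length,
      warnings: acc.warnings + r.warnings.length,
    }),
    { errors: 0, warnings: 0 },
  );
  return {
    json: JSON.stringify({ files: results.length, ...totals, results }, null, 2),
    // Any error fails the run; warnings alone do not (assumed).
    exitCode: totals.errors > 0 ? 1 : 0,
  };
}

// Usage with two made-up per-file results.
const { json, exitCode } = summarize([
  { file: 'reports/001-example.md', errors: [], warnings: ['missing Machine Summary block'] },
  { file: 'reports/002-example.md', errors: ['missing or invalid **Score:** header'], warnings: [] },
]);
console.log(json);
console.log('exit code:', exitCode); // a CLI would assign this to process.exitCode
```

Returning the exit code as a value instead of calling `process.exit()` inside the helper keeps the aggregation testable from a self-test mode.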

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

🔧 scripts, 📄 docs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a new verification script for evaluation report contracts, which is the primary focus across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@verify-reports.mjs`:
- Around line 56-57: The Score parsing regex in verify-reports.mjs (the const
score assignment) only allows zero or one decimal place and thus rejects values
like 4.25/5; update the regex used in the score extraction (the expression
assigned to score) to allow up to two decimal places (e.g., make the fractional
part \.\d{1,2} optional) while keeping the existing alternatives (N/A, SKIP,
DUP) and case-insensitivity so reports like 4.25/5 are accepted.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 17785419-9a17-451e-967d-6dc8b791ed6f

📥 Commits

Reviewing files that changed from the base of the PR and between 46a5c59 and 8fea658.

📒 Files selected for processing (4)
  • docs/SCRIPTS.md
  • package.json
  • test-all.mjs
  • verify-reports.mjs

Comment thread verify-reports.mjs
Comment on lines +56 to +57
const score = content.match(/\*\*Score:\*\*\s*(\d(?:\.\d)?\/5|N\/A|SKIP|DUP)/i)?.[1];
if (!score) errors.push('missing or invalid **Score:** header');


🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for scores with two decimal places in reports/
rg -n '\*\*Score:\*\*\s*\d\.\d{2}/5' --glob 'reports/*.md' -C1

Repository: santifer/career-ops

Length of output: 212


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Locate verify-reports.mjs to confirm directory structure / validation code context
echo "== verify-reports.mjs locations =="
rg -n --files-with-matches "verify-reports\.mjs" . 2>/dev/null || true
# fallback: find file directly
echo "== find verify-reports.mjs =="
fd -a "verify-reports.mjs" . 2>/dev/null || true

# 2) Locate "reports" directory/files
echo "== find reports/ directories =="
fd -t d -a "reports" . 2>/dev/null || true
echo "== list markdown under any reports-like directories (depth<=3) =="
fd -a -t f "*.md" . | rg -n "(/|^)reports(/|$)" || true

# 3) Search for Score headers with decimal formats in all markdown (limit to likely files)
echo "== Search for **Score:** with two decimals in any md/html/js templates =="
rg -n --glob '**/*.md' --glob '**/*.html' --glob '**/*.mjs' \
  '\*\*Score:\*\*\s*\d+\.\d{2}/5' -S . || true

echo "== Search for **Score:** with 1 decimal =="
rg -n --glob '**/*.md' --glob '**/*.html' --glob '**/*.mjs' \
  '\*\*Score:\*\*\s*\d+\.\d/5' -S . || true

echo "== Search for **Score:** with no decimals (e.g., 4/5) =="
rg -n --glob '**/*.md' --glob '**/*.html' --glob '**/*.mjs' \
  '\*\*Score:\*\*\s*\d/5' -S . || true

Repository: santifer/career-ops

Length of output: 986


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== verify-reports.mjs regex area (lines ~50-80) =="
nl -ba verify-reports.mjs | sed -n '40,90p'

echo "== Search in reports/ for two-decimal scores like 4.25/5 =="
rg -n '\*\*Score:\*\*\s*\d+\.\d{2}/5' reports -S || true

echo "== Search in reports/ for one-decimal scores like 4.2/5 =="
rg -n '\*\*Score:\*\*\s*\d+\.\d/5' reports -S || true

echo "== Search in reports/ for integer scores like 4/5 =="
rg -n '\*\*Score:\*\*\s*\d/5' reports -S || true

Repository: santifer/career-ops

Length of output: 175


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== verify-reports.mjs total lines =="
wc -l verify-reports.mjs

echo "== verify-reports.mjs lines 45-75 =="
awk 'NR>=45 && NR<=75 {printf "%d:%s\n", NR, $0}' verify-reports.mjs

echo "== Search in reports/ for two-decimal scores like 4.25/5 =="
rg -n '\*\*Score:\*\*\s*\d+\.\d{2}/5' reports -S || true

echo "== Search in reports/ for one-decimal scores like 4.2/5 =="
rg -n '\*\*Score:\*\*\s*\d+\.\d/5' reports -S || true

echo "== Search in reports/ for integer scores like 4/5 =="
rg -n '\*\*Score:\*\*\s*\d/5' reports -S || true

Repository: santifer/career-ops

Length of output: 1591


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== reports/ markdown files (count + first 50) =="
fd -t f "*.md" reports . | head -n 50
echo "== reports/ md count =="
fd -t f "*.md" reports . | wc -l

echo "== Any **Score:** occurrences in reports/ =="
rg -n '\*\*Score:\*\*' reports -S || true

echo "== Any two-decimal scores in reports/ (4.25/5 etc) =="
rg -n '\*\*Score:\*\*\s*\d+\.\d{2}/5' reports -S || true

echo "== Any one-decimal scores in reports/ (4.2/5 etc) =="
rg -n '\*\*Score:\*\*\s*\d+\.\d/5' reports -S || true

echo "== Any integer scores in reports/ (4/5) =="
rg -n '\*\*Score:\*\*\s*\d/5' reports -S || true

Repository: santifer/career-ops

Length of output: 414


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== reports/ directory listing (top 3 levels) =="
find reports -maxdepth 3 -type d -print
echo "== reports/ md files count =="
find reports -type f -name '*.md' | wc -l
echo "== reports/ md files (first 50) =="
find reports -type f -name '*.md' | head -n 50

echo "== Any **Score:** occurrences in reports/ =="
rg -n '\*\*Score:\*\*' reports -S || true

echo "== Two-decimal scores (**Score:** 4.25/5) in reports/ =="
rg -n '\*\*Score:\*\*\s*\d+\.\d{2}/5' reports -S || true

echo "== One-decimal scores (**Score:** 4.2/5) in reports/ =="
rg -n '\*\*Score:\*\*\s*\d+\.\d/5' reports -S || true

echo "== Integer scores (**Score:** 4/5) in reports/ =="
rg -n '\*\*Score:\*\*\s*\d/5' reports -S || true

Repository: santifer/career-ops

Length of output: 394


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Any **Score:** values with two decimals across repo =="
rg -n '\*\*Score:\*\*\s*\d+\.\d{2}/5' -S . || true

echo "== Any **Score:** values with one decimal across repo =="
rg -n '\*\*Score:\*\*\s*\d+\.\d/5' -S . || true

echo "== Any **Score:** values with integer across repo =="
rg -n '\*\*Score:\*\*\s*\d/5' -S . || true

Repository: santifer/career-ops

Length of output: 380


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "== verify-reports.mjs lines 90-150 =="
awk 'NR>=90 && NR<=150 {print NR ":" $0}' verify-reports.mjs
echo "== examples/sample-report.md =="
awk 'NR<=60 {print NR ":" $0}' examples/sample-report.md

Repository: santifer/career-ops

Length of output: 4755


Score regex only supports 0–1 decimal digit; no two-decimal scores found in current repo fixtures

verify-reports.mjs uses \d(?:\.\d)?\/5, which accepts 4/5 and 4.2/5 but rejects 4.25/5. Repo-wide search found only **Score:** 4.2/5 (and 4.1/5 in the self-test), with no existing **Score:** values containing two decimal places—so this is only a gap if 4.xx/5 is expected in real reports.

🛠️ Proposed fix to support two decimal places
-  const score = content.match(/\*\*Score:\*\*\s*(\d(?:\.\d)?\/5|N\/A|SKIP|DUP)/i)?.[1];
+  const score = content.match(/\*\*Score:\*\*\s*(\d(?:\.\d{1,2})?\/5|N\/A|SKIP|DUP)/i)?.[1];
   if (!score) errors.push('missing or invalid **Score:** header');
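To see what the change in the diff above does in practice, the two patterns can be run side by side. Both regexes are copied verbatim from the diff; the sample values are made up for illustration and do not come from the repository's reports.

```javascript
// Patterns copied from the diff; sample values are illustrative only.
const current = /\*\*Score:\*\*\s*(\d(?:\.\d)?\/5|N\/A|SKIP|DUP)/i;
const proposed = /\*\*Score:\*\*\s*(\d(?:\.\d{1,2})?\/5|N\/A|SKIP|DUP)/i;

const samples = ['4/5', '4.2/5', '4.25/5', '4.255/5', 'n/a', 'SKIP'];

// Print, per sample, whether each pattern accepts the header line.
for (const value of samples) {
  const line = `**Score:** ${value}`;
  console.log(value.padEnd(8), 'current:', current.test(line), 'proposed:', proposed.test(line));
}
```

The only sample on which the two patterns disagree is `4.25/5`: the current pattern rejects it and the proposed one accepts it. Three decimal places (`4.255/5`) are rejected by both, and the `i` flag lets a lowercase `n/a` through in either case.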
@santifer

Owner

@luochen211 — closing this one quickly: it's a re-submit of #892 (same verify-reports.mjs, same npm script), which was closed three hours ago under the core-acceptance criterion explained in full on #890, along with its sibling issue #876. My guess is your Codex queue re-generated it from a stale state rather than this being intentional — worth syncing your automation against the repo before the next batch.

To be crystal clear about scope: this close is ONLY about the re-submit. #929 (the networkidle fix) and #928 (latex menu) are welcome and in my review queue right now — they're exactly the candidate-facing work the criterion exists to prioritize. And if you disagree with the criterion itself, the right venue is a reply on #890 — I'll engage with arguments there, not with re-pushes. 🙏

@santifer santifer closed this Jun 11, 2026