
Conversation


@Qard Qard commented Jan 13, 2026

Summary

Comprehensive documentation improvements addressing multiple user requests for better examples and reference materials.

Changes

1. Custom JSON Scorers (#137)

  • Added detailed JSDoc documentation to TypeScript JSON scorers (JSONDiff, ValidJSON)
  • Added module-level documentation with practical examples for both TypeScript and Python
  • Included examples for (see the sketch after this list):
    • Composing existing scorers (schema validation + semantic comparison)
    • Custom validation logic (API response validation)
  • Improved consistency between TypeScript and Python documentation
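
For illustration, a minimal TypeScript sketch of the composition pattern described above, assuming the autoevals `ValidJSON` and `JSONDiff` scorers; the `responseSchema` and the `apiResponseScorer` wrapper are hypothetical names, not part of the library:

```typescript
import { JSONDiff, ValidJSON } from "autoevals";

// Hypothetical JSON schema for the API response being validated.
const responseSchema = {
  type: "object",
  properties: { name: { type: "string" }, age: { type: "number" } },
  required: ["name"],
};

// Composed scorer: gate on schema validity first, then fall back to a
// semantic, field-by-field comparison against the expected payload.
async function apiResponseScorer({ output, expected }: { output: string; expected: string }) {
  const valid = await ValidJSON({ output, schema: responseSchema });
  if (valid.score !== 1) {
    return { name: "APIResponse", score: 0, metadata: { reason: "failed schema validation" } };
  }
  const diff = await JSONDiff({ output, expected });
  return { name: "APIResponse", score: diff.score, metadata: diff.metadata };
}
```

Gating on `ValidJSON` before `JSONDiff` keeps structurally invalid outputs from earning partial credit.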

2. Context-Based Evaluators with Braintrust Eval (#82)

  • Added comprehensive RAGAS module documentation with Braintrust Eval integration
  • Demonstrated how to pass context through metadata in Eval runs
  • Included practical examples for both TypeScript and Python showing (see the sketch after this list):
    • Direct usage of context-based evaluators
    • Integration with Braintrust Eval runs
    • Proper extraction of context from dataset metadata
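
As a rough sketch of that flow in TypeScript, assuming a Braintrust `Eval` setup in which each score function receives `{ input, output, expected, metadata }`; the project name, dataset, and task below are placeholders:

```typescript
import { Eval } from "braintrust";
import { Faithfulness } from "autoevals";

// Hypothetical dataset: the retrieved context rides along in each example's metadata.
const data = [
  {
    input: "What is the capital of France?",
    expected: "Paris",
    metadata: { context: "France is a country in Europe. Its capital is Paris." },
  },
];

Eval("rag-demo", {
  data: () => data,
  // Placeholder task; a real one would run the RAG pipeline under test.
  task: async (input: string) => "Paris",
  scores: [
    // Pull the context back out of the example's metadata and pass it to the scorer.
    ({ input, output, metadata }: any) =>
      Faithfulness({ input, output, context: metadata?.context }),
  ],
});
```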

3. Complete Scorer Reference (#101)

  • Created comprehensive SCORERS.md reference documentation
  • Documented all 30+ available scorers organized by category:
    • LLM-as-a-Judge scorers (Factuality, Battle, ClosedQA, etc.)
    • RAG scorers (ContextRelevancy, Faithfulness, AnswerCorrectness, etc.)
    • Heuristic scorers (Levenshtein, NumericDiff, etc.)
    • JSON scorers (JSONDiff, ValidJSON)
    • List scorers (ListContains)
  • For each scorer, documented:
    • All parameters with descriptions
    • Score ranges and interpretation
    • Practical usage examples
  • Added score interpretation guidelines (a brief usage sketch follows this list)
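
To give a feel for how entries in the reference are used, here is a small hypothetical TypeScript walkthrough touching one scorer from three of the categories; the inputs are invented, and exact score values depend on the scorer defaults and judge model:

```typescript
import { Factuality, Levenshtein, NumericDiff } from "autoevals";

async function main() {
  // LLM-as-a-judge: returns a 0-1 score plus the judge's rationale in metadata.
  const fact = await Factuality({
    input: "When was the Eiffel Tower completed?",
    output: "The Eiffel Tower was completed in 1889.",
    expected: "1889",
  });

  // Heuristic scorers: deterministic 0-1 scores, no LLM call involved.
  const lev = await Levenshtein({ output: "color", expected: "colour" });
  const num = await NumericDiff({ output: 99, expected: 100 });

  console.log({ factuality: fact.score, levenshtein: lev.score, numericDiff: num.score });
}

main();
```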

Documentation Files

  • js/json.ts - Enhanced with module and JSDoc examples
  • py/autoevals/json.py - Enhanced with module examples
  • js/ragas.ts - Added comprehensive module documentation
  • py/autoevals/ragas.py - Added Braintrust Eval integration examples
  • SCORERS.md - New comprehensive scorer reference (650+ lines)
  • TODO.md - Updated to track documentation progress

Issues Addressed

Closes #137 - Custom scorer for JSONs
Closes #82 - Better docs for context-based evaluators
Closes #101 - Document supported scores


github-actions bot commented Jan 13, 2026

Braintrust eval report

Autoevals (json-scorer-docs-1768323434)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 73.7% (-1pp) | 8 🟢 | 9 🔴 |
| Time_to_first_token | 1.43tok (0tok) | 62 🟢 | 57 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 19.3tok (+1.15tok) | 33 🟢 | 44 🔴 |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 298.54tok (+1.15tok) | 33 🟢 | 44 🔴 |
| Estimated_cost | 0$ (+0$) | - | - |
| Duration | 1.44s (-0.1s) | 161 🟢 | 57 🔴 |
| Llm_duration | 2.92s (-0.19s) | 73 🟢 | 46 🔴 |

@Qard Qard changed the title from "Add comprehensive documentation for custom JSON scorers" to "Comprehensive documentation improvements for scorers and evaluators" on Jan 13, 2026
@Qard Qard force-pushed the json-scorer-docs branch 2 times, most recently from 95ca6c8 to e2f72e5 on January 13, 2026 at 01:00
Addresses multiple documentation requests with practical examples and
complete reference materials.

- Custom JSON scorers (#137): Added detailed JSDoc/docstrings with examples
  showing how to compose scorers (schema validation + semantic comparison)
  and create custom validators (API response validation)

- Context-based evaluators with Braintrust Eval (#82): Added RAGAS module
  documentation demonstrating how to pass context through metadata in Eval
  runs, with practical examples for both TypeScript and Python

- Complete scorer reference (#101): Created comprehensive SCORERS.md
  documenting all 30+ available scorers with parameters, score ranges,
  interpretation guidelines, and usage examples organized by category
  (LLM-as-judge, RAG, heuristic, JSON, list)

Files modified:
- js/json.ts, py/autoevals/json.py: Enhanced with module-level examples
- js/ragas.ts, py/autoevals/ragas.py: Added Eval integration examples
- SCORERS.md: New 650+ line comprehensive reference
@Qard Qard force-pushed the json-scorer-docs branch from e2f72e5 to 421fe0e on January 13, 2026 at 16:56
@Qard Qard merged commit f4e62ec into main Jan 13, 2026
7 checks passed
@Qard Qard deleted the json-scorer-docs branch January 13, 2026 18:20

github-actions bot commented Jan 13, 2026

Braintrust eval report

Autoevals (main-1768328440)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 73.4% (+1pp) | 3 🟢 | 2 🔴 |
| Time_to_first_token | 1.35tok (+0.03tok) | 38 🟢 | 81 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 19.3tok (+0tok) | - | - |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 298.54tok (+0tok) | - | - |
| Estimated_cost | 0$ (+0$) | - | - |
| Duration | 2.87s (-0.36s) | 94 🟢 | 124 🔴 |
| Llm_duration | 2.82s (+0.11s) | 32 🟢 | 86 🔴 |

