Add evaluation topic benchmark dataset by tine1117 · Pull Request #39 · federicodeponte/opendraft

tine1117 · 2026-05-31T18:20:29Z

Summary

This adds the fixed benchmark topic set requested in #37:

- data/eval_topics.json with 20 topics across CS, medicine, economics, social science, physics, biology, meta-science, and environmental science
- canonical sources for each topic, including DOI or arXiv IDs where available
- expected terms for lightweight coverage checks
- a short schema note in EVALUATION.md

The structure is meant to be easy for the planned evaluation scripts to consume while still being readable for manual review.
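For reviewers, here is a rough sketch of how an evaluation script might use the expected terms for a lightweight coverage check. It is an illustration only: the `expected_terms` key, the `term_coverage` helper, and the sample entry are assumptions and may not match the actual field names in data/eval_topics.json or the planned scripts.

```python
def term_coverage(draft_text: str, expected_terms: list[str]) -> float:
    """Return the fraction of expected terms found in the draft (case-insensitive)."""
    if not expected_terms:
        return 0.0
    text = draft_text.lower()
    hits = sum(1 for term in expected_terms if term.lower() in text)
    return hits / len(expected_terms)


# Hypothetical topic entry; the real schema is described in EVALUATION.md.
topic = {"expected_terms": ["attention", "transformer", "self-attention"]}
draft = "The Transformer relies on attention rather than recurrence."

score = term_coverage(draft, topic["expected_terms"])
print(f"coverage: {score:.2f}")
```

Plain substring matching keeps the check cheap and easy to audit by hand, at the cost of missing synonyms and inflected forms.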

Closes #37.

Checked that the file parses as valid JSON:

python -m json.tool data\eval_topics.json

I also checked that the dataset contains 20 entries and that every entry has the required keys.
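The entry-count and required-key check could be scripted along these lines. This is a minimal sketch, not the check that was actually run: it assumes the file's top-level value is a list of objects, and the names in `REQUIRED_KEYS` are placeholders for whatever keys EVALUATION.md specifies.

```python
import json

# Placeholder key names; replace with the keys listed in EVALUATION.md.
REQUIRED_KEYS = {"id", "topic", "field", "canonical_sources", "expected_terms"}


def validate_topics(topics, required_keys=REQUIRED_KEYS, expected_count=20):
    """Return a list of human-readable problems; an empty list means the data passed."""
    if not isinstance(topics, list):
        return ["top-level JSON value is not a list"]
    problems = []
    if len(topics) != expected_count:
        problems.append(f"expected {expected_count} entries, found {len(topics)}")
    for index, entry in enumerate(topics):
        if not isinstance(entry, dict):
            problems.append(f"entry {index} is not an object")
            continue
        missing = sorted(set(required_keys) - set(entry))
        if missing:
            problems.append(f"entry {index} is missing keys: {', '.join(missing)}")
    return problems


def validate_file(path):
    """Load a JSON file and run validate_topics on its contents."""
    with open(path, encoding="utf-8") as handle:
        return validate_topics(json.load(handle))
```

Calling `validate_file("data/eval_topics.json")` from the repository root would then return an empty list when the dataset passes both checks.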

Add evaluation topic benchmark dataset

e8ed2ed

federicodeponte mentioned this pull request Jun 1, 2026

feat: add benchmark topic dataset for evaluation suite #38

Closed