CommonEval

Code for staging LLM evaluation benchmarks in a variety of standard formats for common evaluation.

The focus of this library is reading and writing benchmark data, but it includes one example benchmark dataset in data/eng for illustration purposes. Please do not use these files for fine-tuning, since that compromises their ability to measure LLM performance fairly.

Repository contents:

commoneval
data/eng/bible_qa-death-bool
docs
tests
.gitignore
CLAUDE.md
LICENSE
README.md
mkdocs.yml
poetry.lock
pyproject.toml