Simple Bench

Run Instructions

Run benchmark:

python run_benchmark.py --model_name=gpt-4o --dataset_path=simple_bench_public.json
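To run the same command for more than one model, the argument list can be assembled in a small wrapper. This is only a sketch: the two flags are the ones shown above, while the helper names `build_command` and `run_models` are hypothetical and not part of the repo.

```python
import subprocess
import sys


def build_command(model_name, dataset_path="simple_bench_public.json"):
    """Return the argv list for one benchmark run (hypothetical helper)."""
    return [
        sys.executable,  # the interpreter of the active venv
        "run_benchmark.py",
        f"--model_name={model_name}",
        f"--dataset_path={dataset_path}",
    ]


def run_models(model_names, runner=subprocess.run):
    """Run the benchmark once per model, stopping at the first failure."""
    for name in model_names:
        runner(build_command(name), check=True)
```

Calling `run_models(["gpt-4o"])` from the repo directory is then equivalent to the single command above.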

Setup Instructions

Complete these steps before running the benchmark. Clone the GitHub repo and cd into it.

Make sure you are using the correct Python version (3.10.11) inside a virtual environment:

pyenv local 3.10.11
python -m venv llm_env
source llm_env/bin/activate
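To confirm that the activated environment is the expected one, a quick check can be run from the Python prompt. This is a minimal sketch using only the standard library; the function names are hypothetical, and the required version is the one named above.

```python
import sys

REQUIRED_VERSION = (3, 10, 11)  # version named in the setup step above


def version_matches(current=None, required=REQUIRED_VERSION):
    """True if the (major, minor, micro) version equals the required one."""
    if current is None:
        current = sys.version_info[:3]
    return tuple(current) == tuple(required)


def in_virtualenv():
    """True if the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    return sys.prefix != sys.base_prefix
```

If either function returns False after `source llm_env/bin/activate`, the shell is probably still using a different interpreter.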

Install dependencies:

The recommended way to install dependencies is with uv. If it is not installed in your environment, install it with pip install uv.

uv pip install -r pyproject.toml

Create a .env file with the following:

OPENAI_API_KEY=<your key>
ANTHROPIC_API_KEY=<your key>
...
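Before starting a run, it can help to check that the keys are actually present in the file. The sketch below is an assumption-laden illustration, not how run_benchmark.py loads its configuration: it parses simple KEY=value lines with the standard library only, and the helper names `parse_env_text` and `missing_keys` are hypothetical. Only the two key names shown above are checked; any further keys depend on the providers you use.

```python
def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blanks, comments and other lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=("OPENAI_API_KEY", "ANTHROPIC_API_KEY")):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env_text(open(".env").read()))` returns an empty list when both keys are set.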
