Usage:
python lm-eval.py --model-path [model path on HF] --dataset-name [dataset name] --sample-output-file [json file name]
Supported models:
- models under deepseek-ai (e.g., deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)
Supported datasets:
- AIME-2024
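For example, to evaluate the distilled 7B model on AIME-2024 (the output file name here is arbitrary):

python lm-eval.py --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --dataset-name AIME-2024 --sample-output-file samples.json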
To add a model, add the corresponding generation config under the load_model function in utils.py, for instance:
if "deepseek-ai" in model_name:
model = AutoModelForCausalLM.from_pretrained(
model_name, device_map="auto", trust_remote_code=True
).eval()
model.generation_config = GenerationConfig.from_pretrained(
model_name, trust_remote_code=True
)
model.generation_config.temperature = 0.6
model.generation_config.top_p = 0.95
model.generation_config.max_new_tokens = 32768
tokenizer = AutoTokenizer.from_pretrained(
model_name, trust_remote_code=True, bf16=True, use_flash_attn=True
)
return model, tokenizer
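For reference, temperature 0.6 and top_p 0.95 match the sampling settings DeepSeek recommends for its R1-series models, and max_new_tokens is set high (32768) so that long chain-of-thought generations are not truncated.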
To add a customized dataset, add the corresponding dataset class in the task folder.
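As a rough illustration of what such a class might look like (the class shape, field names, and Hugging Face dataset path below are assumptions for the sketch, not the repository's actual interface):

# task/my_dataset.py -- illustrative sketch only; match the actual
# base-class interface used by the other classes in the task folder.
from datasets import load_dataset as hf_load_dataset

class MyDataset:
    """Hypothetical dataset class: yields problems and gold answers."""

    name = "My-Dataset"

    def __init__(self):
        # Assumed: the dataset is hosted on the Hugging Face Hub.
        self.data = hf_load_dataset("org/my-dataset", split="test")

    def __iter__(self):
        for row in self.data:
            # Assumed field names; adjust to the dataset's schema.
            yield {"question": row["problem"], "answer": row["answer"]}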
To add a customized prompt, add the template in template.py and update the load_dataset function in utils.py accordingly.
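A minimal, hypothetical sketch of that wiring (the template variable name, dataset class, and dispatch logic below are assumptions about the repository layout):

# template.py -- a hypothetical prompt template
MY_DATASET_TEMPLATE = "Solve the following problem step by step.\n\n{question}"

# utils.py -- inside load_dataset, dispatch on the dataset name
from template import MY_DATASET_TEMPLATE
from task.my_dataset import MyDataset

def load_dataset(dataset_name):
    if dataset_name == "My-Dataset":
        # Caller can render a prompt with, e.g.:
        # MY_DATASET_TEMPLATE.format(question=example["question"])
        return MyDataset(), MY_DATASET_TEMPLATE
    ...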