Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation

Project Website | 📝Paper | 📐LiveAopsBench | 📊AoPS-Ins (Third party)

This repository contains the code and instructions for reproducing the data collection and processing pipeline described in our paper "Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation".

Environment Setup

To install the required dependencies:

pip install -r requirements.txt

By default, we use 4 A100 GPUs with 80GB of memory each.

Data Collection

To construct the AoPS-Instruct dataset:

First, crawl the data by running the following script:
```
bash scripts/crawl_raw.sh
```
This produces a raw JSON Lines file, out/items_raw.jl, which is processed in the next steps.
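To sanity-check the crawl output before moving on, a short script along these lines may help. It is a minimal sketch that assumes only that out/items_raw.jl holds one JSON object per line. The helper name `summarize_jsonl` is hypothetical and not part of this repository, and no particular field names are assumed.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Count records and top-level keys in a JSON Lines file.

    Hypothetical helper (not part of this repository). It assumes one JSON
    value per line and skips blank lines.
    """
    num_records = 0
    key_counts = Counter()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            num_records += 1
            # Only dict records contribute key statistics.
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return num_records, key_counts


if __name__ == "__main__":
    raw_path = Path("out/items_raw.jl")  # file produced by the crawl step
    if raw_path.exists():
        n, keys = summarize_jsonl(raw_path)
        print(f"{n} records")
        for key, count in keys.most_common():
            print(f"  {key}: present in {count} records")
    else:
        print(f"{raw_path} not found; run the crawl first")
```

Printing the key names is a quick way to see what the crawler actually stored without relying on any assumed schema.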

Note: To check that the whole pipeline works, you can do a test run by passing the test_mode option to the crawling script, which crawls only 1000 data points:
```
bash scripts/crawl_raw.sh --test_mode True
```
You can then run the rest of the pipeline to confirm that everything works before launching a full crawl. The pipeline is designed to resume from existing outputs, so delete the entire test-run folder ./out before rerunning it from scratch.
Next, set the following variables in the reproduction script (scripts/reproduce.sh):
- WORKDIR: The working directory for the script.
- NUM_GPUS: The number of GPUs to use for the parsing models. This should be an even number, and at least 2 GPUs are needed. We assume 80GB A100 or H100 GPUs. Ensure that CUDA_VISIBLE_DEVICES is set correctly.
- ITEMS_RAW_PATH: The path to the raw crawled data produced by the crawling step.
- Optionally, change the parsing and rewriting models (the defaults are Qwen 2.5 models).
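The GPU constraints above can be checked before launching a long run. The following is a minimal sketch, not part of this repository: `visible_gpu_ids` and `check_gpu_config` are hypothetical helper names, and the reproduction script may perform its own checks. It only inspects the CUDA_VISIBLE_DEVICES environment variable and does not query the hardware.

```python
import os


def visible_gpu_ids(env=None):
    """Return the device entries listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, in which case CUDA applies
    no restriction and this sketch cannot tell how many GPUs exist.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [part.strip() for part in value.split(",") if part.strip()]


def check_gpu_config(num_gpus, env=None):
    """Validate NUM_GPUS against the constraints listed above.

    Hypothetical helper. Raises ValueError if num_gpus is below 2, odd,
    or larger than the number of devices in CUDA_VISIBLE_DEVICES.
    """
    if num_gpus < 2:
        raise ValueError("at least 2 GPUs are needed")
    if num_gpus % 2 != 0:
        raise ValueError("NUM_GPUS should be an even number")
    visible = visible_gpu_ids(env)
    if visible is not None and len(visible) < num_gpus:
        raise ValueError(
            f"NUM_GPUS={num_gpus} but CUDA_VISIBLE_DEVICES exposes "
            f"only {len(visible)} device(s)"
        )
```

For example, `check_gpu_config(4)` passes silently when CUDA_VISIBLE_DEVICES is `0,1,2,3` and raises a `ValueError` when it is `0,1`.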
Then run
```
bash scripts/reproduce.sh
```
This will process the raw crawled data and create the final training dataset in the specified format. The script supports resuming, so if interrupted, it will pick up where it left off.
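For readers curious how a resumable JSON Lines stage can work in general, here is a minimal sketch of one common pattern: skip any record whose identifier already appears in the output file, and append the rest. This is an illustration under stated assumptions, not the implementation used by scripts/reproduce.sh. The function names, the `transform` callback, and the `"id"` field are all hypothetical.

```python
import json
from pathlib import Path


def load_done_ids(output_path, id_key="id"):
    """Collect the ids of records already written to the output file."""
    done = set()
    path = Path(output_path)
    if path.exists():
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    done.add(json.loads(line)[id_key])
    return done


def process_resumable(input_path, output_path, transform, id_key="id"):
    """Apply transform to each unprocessed record and append the result.

    Hypothetical sketch: transform must keep the id_key field so that
    later runs can recognize finished records. Returns the number of
    records processed in this call.
    """
    done = load_done_ids(output_path, id_key)
    processed = 0
    with Path(input_path).open(encoding="utf-8") as src, \
            Path(output_path).open("a", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if record[id_key] in done:
                continue  # finished in an earlier run
            dst.write(json.dumps(transform(record)) + "\n")
            dst.flush()  # keep the output current if interrupted
            processed += 1
    return processed
```

Under this pattern, a second call on the same files processes nothing, which also illustrates why stale outputs must be deleted to force a fresh run. The sketch does not handle a final output line that was cut off mid-write.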

Processed Data

We provide the full code for reproducing AoPS-Ins and LiveAoPSBench. Processed data is available through a third-party community reproduction, accessible here: Hugging Face Dataset. We encourage the use of this dataset, but please be aware that we disclaim any liability for its use and for any issues that may arise from it.

Evaluation Code

To run the evaluation on LiveAoPSBench, please refer to the eval directory.

Questions/Issues

Please submit a GitHub issue if you have any questions or find any bugs.

Citation

If you use this code or dataset in your research, please cite our paper:

@misc{aopsdataset,
      title={Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation}, 
      author={Sadegh Mahdavi and Muchen Li and Kaiwen Liu and Christos Thrampoulidis and Leonid Sigal and Renjie Liao},
      year={2025},
      eprint={2501.14275},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.14275}, 
}

