CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings • Paper • 2501.01257 • Published 9 days ago • 45
ProcessBench: Identifying Process Errors in Mathematical Reasoning • Paper • 2412.06559 • Published Dec 9, 2024 • 74
InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining • Paper • 2003.13198 • Published Mar 30, 2020
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts • Paper • 2305.14688 • Published May 24, 2023
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese • Paper • 2211.01335 • Published Nov 2, 2022 • 1
Transferring General Multimodal Pretrained Models to Text Recognition • Paper • 2212.09297 • Published Dec 19, 2022
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement • Paper • 2409.12122 • Published Sep 18, 2024 • 3