CodeLLMEval

Evaluation based on programming scenarios

[ English | 中文 ]

👋 Join our WeChat

| Defect scenario | Severe consequence | Cases |
| --- | --- | --- |
| Infinite loop (dead loop) | CPU usage reaches 100% and the service crashes | 2 |
| Memory leak / memory overflow | Out-of-memory (OOM) errors and service crashes | 2 |
| Thread deadlock | Concurrent threads deadlock while competing for resources, which can drive CPU usage to 100% or cause OOM, leaving the service unavailable or failing | 2 |
| Inconsistent concurrent data | Improper operations in multi-threaded code produce inconsistent or dirty data | 1 |
| Long-context / token capability | Tests the accuracy and maximum capacity of long-text processing | 1 |
| In-context learning capability | Tests the accuracy of context understanding and reasoning | 1 |
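
To make the thread deadlock scenario concrete, here is a minimal Python sketch of the classic lock-ordering problem. It is an illustration only and is not taken from this repository's test cases; the function name `demonstrate_lock_order_deadlock` and the lock names are hypothetical. Two threads take the same two locks in opposite order, and a timeout is used on the second acquisition so the demo reports the conflict instead of hanging forever.

```python
import threading


def demonstrate_lock_order_deadlock(timeout=0.2):
    """Show two threads blocking each other by taking locks in opposite order.

    Returns a dict mapping thread name to whether it obtained its second lock.
    """
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    barrier = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            # Wait until both threads hold their first lock.
            barrier.wait()
            # Without a timeout this call would block forever (a real deadlock).
            acquired = second.acquire(timeout=timeout)
            results[name] = acquired
            # Keep holding `first` until both attempts have finished,
            # so neither thread can succeed by accident.
            barrier.wait()
            if acquired:
                second.release()

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


if __name__ == "__main__":
    print(demonstrate_lock_order_deadlock())
```

In this sketch neither thread obtains its second lock. A common fix is to have every thread acquire the locks in one agreed order.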
