Yi Cui PRO

onekq

https://onekq.ai

AI & ML interests

Benchmark, Code Generation Model

Recent Activity

updated a collection about 21 hours ago

QLora-ready Coding Models

Articles

Does Daily Software Engineering Work Need Reasoning Models?

Sep 24, 2024

• 5

All LLMs Write Great Code, But Some Make (A Lot) Fewer Mistakes

Sep 12, 2024

• 4

Organizations

Posts 7

Post

3020

🐋 DeepSeek 🐋 v3 achieves a solid 7-point jump over v2.5, surpassing GPT-4o, but is still behind 🍓 o1 🍓 and Claude 3.5.

onekq-ai/WebApp1K-models-leaderboard

Post

586

The October version of Claude 3.5 lifts the SOTA (set by its June version) by 7 points.
onekq-ai/WebApp1K-models-leaderboard

Closed-source models are widening the gap again.

Note: Our frontier leaderboard now uses double test scenarios because the single-scenario test suite has been saturated.

Papers 3

arxiv:2409.13773

arxiv:2409.05177

arxiv:2408.00019

models

None public yet

datasets

None public yet