DeepSeek v3 achieves a solid 7-point jump over v2.5, surpassing GPT-4o, but is still behind o1 and Claude 3.5. onekq-ai/WebApp1K-models-leaderboard
The October version of Claude 3.5 lifts SOTA (set by its June version) by 7 points. onekq-ai/WebApp1K-models-leaderboard
Closed-source models are widening the gap again.
Note: Our frontier leaderboard now uses double test scenarios because the single-scenario test suite has been saturated.