
Commit 014366d

Merge branch 'main' into kevin
2 parents 8f3d8da + 5458ebb commit 014366d

File tree

2 files changed: +12 −16 lines


docs/modules/usage/llms/llms.md

Lines changed: 5 additions & 14 deletions
@@ -5,23 +5,14 @@ OpenHands can connect to any LLM supported by LiteLLM. However, it requires a po
 ## Model Recommendations
 
 Based on our evaluations of language models for coding tasks (using the SWE-bench dataset), we can provide some
-recommendations for model selection. Some analyses can be found in [this blog article comparing LLMs](https://www.all-hands.dev/blog/evaluation-of-llms-as-coding-agents-on-swe-bench-at-30x-speed) and
-[this blog article with some more recent results](https://www.all-hands.dev/blog/openhands-codeact-21-an-open-state-of-the-art-software-development-agent).
-
-When choosing a model, consider both the quality of outputs and the associated costs. Here's a summary of the findings:
-
-- Claude 3.5 Sonnet is the best by a fair amount, achieving a 53% resolve rate on SWE-Bench Verified with the default agent in OpenHands.
-- GPT-4o lags behind, and o1-mini actually performed somewhat worse than GPT-4o. We went in and analyzed the results a little, and briefly it seemed like o1 was sometimes "overthinking" things, performing extra environment configuration tasks when it could just go ahead and finish the task.
-- Finally, the strongest open models were Llama 3.1 405 B and deepseek-v2.5, and they performed reasonably, even besting some of the closed models.
-
-Please refer to the [full article](https://www.all-hands.dev/blog/evaluation-of-llms-as-coding-agents-on-swe-bench-at-30x-speed) for more details.
+recommendations for model selection. Our latest benchmarking results can be found in [this spreadsheet](https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0).
 
 Based on these findings and community feedback, the following models have been verified to work reasonably well with OpenHands:
 
-- claude-3-5-sonnet (recommended)
-- gpt-4 / gpt-4o
-- llama-3.1-405b
-- deepseek-v2.5
+- anthropic/claude-3-5-sonnet-20241022 (recommended)
+- anthropic/claude-3-5-haiku-20241022
+- deepseek/deepseek-chat
+- gpt-4o
 
 :::warning
 OpenHands will issue many prompts to the LLM you configure. Most of these LLMs cost money, so be sure to set spending
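The new docs list models in LiteLLM's provider-prefixed form (e.g. `anthropic/claude-3-5-sonnet-20241022`). To illustrate where such an ID would be used, here is a hypothetical sketch of an OpenHands `config.toml` fragment; the `[llm]` table and key names are assumptions based on the project's config template, not part of this commit:

```toml
# Hypothetical sketch, not from this commit: selecting a verified model
# via the [llm] section of config.toml. Key names are assumptions;
# check the project's config.template.toml for the authoritative layout.
[llm]
model = "anthropic/claude-3-5-sonnet-20241022"
api_key = "your-api-key"
```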

frontend/src/utils/verified-models.ts

Lines changed: 7 additions & 2 deletions
@@ -1,6 +1,10 @@
 // Here are the list of verified models and providers that we know work well with OpenHands.
-export const VERIFIED_PROVIDERS = ["openai", "azure", "anthropic"];
-export const VERIFIED_MODELS = ["gpt-4o", "claude-3-5-sonnet-20241022"];
+export const VERIFIED_PROVIDERS = ["openai", "azure", "anthropic", "deepseek"];
+export const VERIFIED_MODELS = [
+  "gpt-4o",
+  "claude-3-5-sonnet-20241022",
+  "deepseek-chat",
+];
 
 // LiteLLM does not return OpenAI models with the provider, so we list them here to set them ourselves for consistency
 // (e.g., they return `gpt-4o` instead of `openai/gpt-4o`)
@@ -21,6 +25,7 @@ export const VERIFIED_ANTHROPIC_MODELS = [
   "claude-2.1",
   "claude-3-5-sonnet-20240620",
   "claude-3-5-sonnet-20241022",
+  "claude-3-5-haiku-20241022",
   "claude-3-haiku-20240307",
   "claude-3-opus-20240229",
   "claude-3-sonnet-20240229",
