Open Deep Research blog #2644
Just some suggestions to align the section headers.
Co-authored-by: Albert Villanova del Moral <[email protected]>
open-deep-research.md
Outdated
## TLDR

Yesterday, OpenAI released [Deep Research](https://openai.com/index/introducing-deep-research/), a system that browses the web to summarize content and answer questions based on the summary. The system is impressive and blew our mind when we tried it for the first time.
mind -> minds
open-deep-research.md
Outdated
One of the main results in the blog post is a strong improvement of performances on the [General AI Assistants benchmark (GAIA)](https://huggingface.co/gaia-benchmark), a benchmark we’ve been playing with recently as well, where they successfully reached near 67% correct answers on 1-shot on average, and 47.6% on especially challenging “level 3” questions that involve multiple steps of reasoning and tool usage (see below for a presentation of GAIA).

DeepResearch is composed of an LLM (which can be selected from the current list of LLM provided by OpenAI, 4o, o1, o3, etc) and an internal “agentic framework” which guide the LLM to use tools like web search and organize its actions in steps.
LLM -> LLMs, guide -> guides
open-deep-research.md
Outdated
DeepResearch is composed of an LLM (which can be selected from the current list of LLM provided by OpenAI, 4o, o1, o3, etc) and an internal “agentic framework” which guide the LLM to use tools like web search and organize its actions in steps.

While powerful LLM are now freely available in open-source (see e.g. [the recent DeepSeek R1 model](https://huggingface.co/deepseek-ai/DeepSeek-R1)), OpenAI didn’t disclose much about the agentic framework underlying Deep Research…
LLM -> LLMs
So we decided to embark on a mission to reproduce their results and open-source the needed framework along the way!

The clock is ticking, let’s go! ⏱️
You didn't mention the 24-hour thing so this clock reference is kind of a non-sequitur
Oh, I did, but someone removed it.
open-deep-research.md
Outdated
Now we need to provide the agent with the right set of tools.

**1.** A web browser. While a fully fledged web browser interaction like [Operator](https://openai.com/index/introducing-operator/) will be needed to reach full performance, we started with an extremely simple text-based web browser for now for our first PoC. You can find the code [here](https://github.com/huggingface/smolagents/blob/gaia-submission-r1/examples/open_deep_research/scripts/text_web_browser.py)
maybe just write out proof-of-concept
open-deep-research.md
Outdated
Here is a short roadmap of improvements which we feel would really improve these tools’ performance (feel free to open a PR and contribute!):

- extending the number of file-formats which can be read.
file-formats -> file formats
open-deep-research.md
Outdated
## Results 🏅

In our 24h+ reproduction sprint, we’ve already seen steady improvements in the performances of our agent on GAIA!
performances -> performance
open-deep-research.md
Outdated
In our 24h+ reproduction sprint, we’ve already seen steady improvements in the performances of our agent on GAIA!

We’ve quickly gone up from the previous SOTA with an open framework, around 46% for Magentic-One, to our current performances of 54% on the validation set.
SOTA -> SoTA if you want to be pedantic
open-deep-research.md
Outdated
We’ve quickly gone up from the previous SOTA with an open framework, around 46% for Magentic-One, to our current performances of 54% on the validation set.

This bump in performance is due mostly to letting our agents write their actions in code! Indeed, when switching to a standard agent that writes actions in Json instead of code, performance of the same setup is instantly degraded to 33% average on the validation set.
Json -> JSON
Co-authored-by: Aymeric Roucher <[email protected]>
Should we merge, @aymeric-roucher?
cc @aymeric-roucher @thomwolf @clefourrier @albertvillanova @NathanHB you can check from hf.co/new-blog