Open Deep Research blog #2644

Merged
merged 21 commits into from
Feb 4, 2025

merveenoyan (Contributor)

cc @aymeric-roucher @thomwolf @clefourrier @albertvillanova @NathanHB: you can check it at hf.co/new-blog

@albertvillanova (Member) left a comment


Just some suggestions to align the section headers.

open-deep-research.md (4 outdated, resolved review threads)
merveenoyan and others added 5 commits February 4, 2025 17:39
Co-authored-by: Albert Villanova del Moral

## TLDR

Yesterday, OpenAI released [Deep Research](https://openai.com/index/introducing-deep-research/), a system that browses the web to summarize content and answer questions based on the summary. The system is impressive and blew our mind when we tried it for the first time.

mind -> minds

One of the main results in the blog post is a strong improvement in performance on the [General AI Assistants benchmark (GAIA)](https://huggingface.co/gaia-benchmark), a benchmark we’ve been playing with recently as well, where they reached nearly 67% correct answers on average in 1-shot, and 47.6% on the especially challenging “level 3” questions that involve multiple steps of reasoning and tool usage (see below for a presentation of GAIA).


DeepResearch is composed of an LLM (which can be selected from the current list of LLM provided by OpenAI, 4o, o1, o3, etc) and an internal “agentic framework” which guide the LLM to use tools like web search and organize its actions in steps.

LLM -> LLMs, guide -> guides
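The quoted text describes Deep Research as an LLM plus an "agentic framework" that guides the model through tool calls step by step. OpenAI has not disclosed that framework, so the following is only a minimal sketch of a generic tool-use loop under our own assumptions: `web_search`, `Step`, `scripted_llm`, and `run_agent` are hypothetical names, and the "LLM" is a scripted stub rather than a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tool: a canned lookup standing in for a real web search backend.
def web_search(query: str) -> str:
    fake_index = {"GAIA benchmark": "GAIA is a benchmark for General AI Assistants."}
    return fake_index.get(query, "no results")

TOOLS: dict[str, Callable[[str], str]] = {"web_search": web_search}

@dataclass
class Step:
    tool: Optional[str]  # name of the tool to call, or None for a final answer
    argument: str        # tool argument, or the final answer text

def scripted_llm(history: list[str]) -> Step:
    """Stub standing in for a real model: search once, then answer from the last observation."""
    if not any(line.startswith("Observation:") for line in history):
        return Step(tool="web_search", argument="GAIA benchmark")
    return Step(tool=None, argument=history[-1].removeprefix("Observation: "))

def run_agent(task: str, llm: Callable[[list[str]], Step], max_steps: int = 5) -> str:
    """Generic loop: ask the model for a step, run the tool, record the observation, repeat."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        step = llm(history)
        if step.tool is None:  # the model decided it has enough information
            return step.argument
        observation = TOOLS[step.tool](step.argument)
        history.append(f"Action: {step.tool}({step.argument!r})")
        history.append(f"Observation: {observation}")
    return "max steps reached"

print(run_agent("What is GAIA?", scripted_llm))
```

Replacing `scripted_llm` with a real model client is where an actual framework does most of its work (prompting, parsing the model's output, recovering from errors); this sketch deliberately leaves all of that out.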



While powerful LLM are now freely available in open-source (see e.g. [the recent DeepSeek R1 model](https://huggingface.co/deepseek-ai/DeepSeek-R1)), OpenAI didn’t disclose much about the agentic framework underlying Deep Research…

LLM -> LLMs


So we decided to embark on a mission to reproduce their results and open-source the needed framework along the way!

The clock is ticking, let’s go! ⏱️

You didn't mention the 24-hour thing, so this clock reference is kind of a non-sequitur

Contributor

Oh, I did, but someone removed it


Now we need to provide the agent with the right set of tools.

**1.** A web browser. While a fully fledged web browser interaction like [Operator](https://openai.com/index/introducing-operator/) will be needed to reach full performance, we started with an extremely simple text-based web browser for now for our first PoC. You can find the code [here](https://github.com/huggingface/smolagents/blob/gaia-submission-r1/examples/open_deep_research/scripts/text_web_browser.py)

maybe just write out proof-of-concept
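The linked `text_web_browser.py` is the actual proof-of-concept; the sketch below is not that code. It only illustrates the general idea of a text-based browser under our own assumptions: HTML is reduced to its visible text with the standard-library `html.parser`, and the agent reads that text through fixed-size viewports it can page through. `TextBrowser`, `load_html`, and `page_down` are hypothetical names, and fetching pages over the network is left out.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""
    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag: str) -> None:
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data: str) -> None:
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

class TextBrowser:
    """Hypothetical text-only browser: holds one page and exposes it in fixed-size viewports."""
    def __init__(self, viewport_chars: int = 200) -> None:
        self.viewport_chars = viewport_chars
        self.text = ""
        self.position = 0

    def load_html(self, html: str) -> str:
        parser = _TextExtractor()
        parser.feed(html)
        parser.close()
        self.text = " ".join(parser.parts)
        self.position = 0
        return self.current_viewport()

    def current_viewport(self) -> str:
        page = self.position // self.viewport_chars + 1
        total = max(1, -(-len(self.text) // self.viewport_chars))  # ceiling division
        chunk = self.text[self.position:self.position + self.viewport_chars]
        return f"[viewport {page}/{total}] {chunk}"

    def page_down(self) -> str:
        # Stay on the last viewport instead of scrolling past the end.
        if self.position + self.viewport_chars < len(self.text):
            self.position += self.viewport_chars
        return self.current_viewport()

browser = TextBrowser(viewport_chars=20)
print(browser.load_html("<body><h1>Deep Research</h1><p>Open reproduction of agents.</p></body>"))
print(browser.page_down())
```

Paging keeps each observation short enough to fit comfortably in the model's context, at the cost of extra steps; a fuller browser would also need link following, in-page search, and handling of non-HTML content.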


Here is a short roadmap of improvements that we feel would significantly boost these tools’ performance (feel free to open a PR and contribute!):

- extending the number of file-formats which can be read.

file-formats -> file formats
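One common way to make "read more file formats" easy to extend is to route each file to a converter chosen by its extension, so supporting a new format means registering one more function. The sketch below assumes that design; `register`, `CONVERTERS`, and `file_to_text` are hypothetical names, not part of smolagents, and it only covers text-based formats (PDFs, spreadsheets, or audio would need dedicated binary parsers).

```python
import csv
import io
import json
from pathlib import Path
from typing import Callable

Converter = Callable[[str], str]
# Hypothetical registry mapping a lowercase file extension to a text converter.
CONVERTERS: dict[str, Converter] = {}

def register(*extensions: str) -> Callable[[Converter], Converter]:
    """Decorator that registers a converter for one or more extensions."""
    def decorator(fn: Converter) -> Converter:
        for ext in extensions:
            CONVERTERS[ext.lower()] = fn
        return fn
    return decorator

@register(".txt", ".md")
def plain_text(raw: str) -> str:
    return raw

@register(".json")
def json_to_text(raw: str) -> str:
    # Re-indent so nested structure is easy for the model to read.
    return json.dumps(json.loads(raw), indent=2)

@register(".csv")
def csv_to_text(raw: str) -> str:
    # Render each row as pipe-separated cells.
    rows = csv.reader(io.StringIO(raw))
    return "\n".join(" | ".join(row) for row in rows)

def file_to_text(filename: str, raw: str) -> str:
    """Pick a converter by file extension; unknown formats fail loudly."""
    ext = Path(filename).suffix.lower()
    if ext not in CONVERTERS:
        raise ValueError(f"unsupported file format: {ext or '(none)'}")
    return CONVERTERS[ext](raw)

print(file_to_text("table.csv", "a,b\n1,2\n"))
```

Raising on an unknown extension gives the agent an explicit error it can react to, rather than silently feeding it unreadable content.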


## Results 🏅

In our 24h+ reproduction sprint, we’ve already seen steady improvements in the performances of our agent on GAIA!

performances -> performance



We’ve quickly gone up from the previous SOTA with an open framework, around 46% for Magentic-One, to our current performances of 54% on the validation set.

SOTA -> SoTA if you want to be pedantic



This bump in performance is due mostly to letting our agents write their actions in code! Indeed, when switching to a standard agent that writes actions in Json instead of code, performance of the same setup is instantly degraded to 33% average on the validation set.

Json -> JSON
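The quoted line attributes most of the gain to letting the agent write its actions as code rather than JSON. A commonly given explanation is that a JSON action usually encodes a single tool call, while a code action can chain several calls, loop, and keep intermediate results in variables within one step. The sketch below only illustrates that difference in action format under our own assumptions: it is not how smolagents parses or executes actions, `web_search` is a hypothetical stub, and the bare `exec` used here is not a safe sandbox for model-generated code.

```python
import json

# Hypothetical tool stub: returns a canned string instead of real search results.
def web_search(query: str) -> str:
    return f"results for {query}"

TOOLS = {"web_search": web_search}

def run_json_action(action_text: str) -> str:
    """JSON-style action: the model emits exactly one tool call per step."""
    action = json.loads(action_text)
    return TOOLS[action["name"]](**action["arguments"])

def run_code_action(code: str) -> object:
    """Code-style action: one snippet can call tools repeatedly, loop, and keep variables.
    The snippet is expected to store its output in a variable named `result`.
    WARNING: exec() with an emptied __builtins__ is still NOT a sandbox; never run untrusted code this way."""
    namespace = {"__builtins__": {}, **TOOLS}
    exec(code, namespace)
    return namespace.get("result")

# One step, one tool call.
json_action = '{"name": "web_search", "arguments": {"query": "GAIA level 3"}}'
print(run_json_action(json_action))

# One step, three tool calls plus a loop and an intermediate variable.
code_action = """
results = []
for level in (1, 2, 3):
    results.append(web_search(f"GAIA level {level}"))
result = results
"""
print(run_code_action(code_action))
```

Expressing the same three searches as JSON would take three separate model turns, each with its own round trip through the LLM.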

merveenoyan and others added 2 commits February 4, 2025 17:58
Co-authored-by: Albert Villanova del Moral
Co-authored-by: Aymeric Roucher
@merveenoyan (Contributor, Author)

Should we merge, @aymeric-roucher?

@merveenoyan merged commit 82890f5 into main Feb 4, 2025
1 check passed
@merveenoyan deleted the add-open-deep-research branch February 4, 2025 17:19