Open Deep Research blog #2644

merveenoyan · 2025-02-04T16:27:58Z

cc @aymeric-roucher @thomwolf @clefourrier @albertvillanova @NathanHB you can check from hf.co/new-blog

albertvillanova

Just some suggestions to align the section headers.

open-deep-research.md

Co-authored-by: Albert Villanova del Moral <[email protected]>

open-deep-research.md

craffel · 2025-02-04T16:43:47Z

open-deep-research.md

+
+## TLDR
+
+Yesterday, OpenAI released [Deep Research](https://openai.com/index/introducing-deep-research/), a system that browses the web to summarize content and answer questions based on the summary. The system is impressive and blew our mind when we tried it for the first time.


mind -> minds

open-deep-research.md

craffel · 2025-02-04T16:44:39Z

open-deep-research.md

+One of the main results in the blog post is a strong improvement of performances on the [General AI Assistants benchmark (GAIA)](https://huggingface.co/gaia-benchmark), a benchmark we’ve been playing with recently as well, where they successfully reached near 67% correct answers on 1-shot on average, and 47.6% on especially challenging “level 3” questions that involve multiple steps of reasoning and tool usage (see below for a presentation of GAIA).
+
+
+DeepResearch is composed of an LLM (which can be selected from the current list of LLM provided by OpenAI, 4o, o1, o3, etc) and an internal “agentic framework” which guide the LLM to use tools like web search and organize its actions in steps. 


LLM -> LLMs, guide -> guides

craffel · 2025-02-04T16:44:49Z

open-deep-research.md

+
+DeepResearch is composed of an LLM (which can be selected from the current list of LLM provided by OpenAI, 4o, o1, o3, etc) and an internal “agentic framework” which guide the LLM to use tools like web search and organize its actions in steps. 
+
+While powerful LLM are now freely available in open-source (see e.g. [the recent DeepSeek R1 model](https://huggingface.co/deepseek-ai/DeepSeek-R1)), OpenAI didn’t disclose much about the agentic framework underlying Deep Research…


LLM -> LLMs

craffel · 2025-02-04T16:45:12Z

open-deep-research.md

+
+So we decided to embark on a mission to reproduce their results and open-source the needed framework along the way!
+
+The clock is ticking, let’s go! ⏱️


You didn't mention the 24-hour thing, so this clock reference is kind of a non sequitur.

Oh, I did, but someone removed it.

craffel · 2025-02-04T16:49:26Z

open-deep-research.md

+
+Now we need to provide the agent with the right set of tools. 
+
+**1.** A web browser. While a fully fledged web browser interaction like [Operator](https://openai.com/index/introducing-operator/) will be needed to reach full performance, we started with an extremely simple text-based web browser for now for our first PoC. You can find the code [here](https://github.com/huggingface/smolagents/blob/gaia-submission-r1/examples/open_deep_research/scripts/text_web_browser.py)


maybe just write out proof-of-concept

craffel · 2025-02-04T16:50:03Z

open-deep-research.md

+
+Here is a short roadmap of improvements which we feel would really improve these tools’ performance (feel free to open a PR and contribute!):
+
+- extending the number of file-formats which can be read.


file-formats -> file formats

craffel · 2025-02-04T16:50:11Z

open-deep-research.md

+
+## Results 🏅
+
+In our 24h+ reproduction sprint, we’ve already seen steady improvements in the performances of our agent on GAIA!


performances -> performance

craffel · 2025-02-04T16:50:22Z

open-deep-research.md

+
+In our 24h+ reproduction sprint, we’ve already seen steady improvements in the performances of our agent on GAIA!
+
+We’ve quickly gone up from the previous SOTA with an open framework, around 46% for Magentic-One, to our current performances of 54% on the validation set.


SOTA -> SoTA if you want to be pedantic

craffel · 2025-02-04T16:50:38Z

open-deep-research.md

+
+We’ve quickly gone up from the previous SOTA with an open framework, around 46% for Magentic-One, to our current performances of 54% on the validation set.
+
+This bump in performance is due mostly to letting our agents write their actions in code! Indeed, when switching to a standard agent that writes actions in Json instead of code, performance of the same setup is instantly degraded to 33% average on the validation set.


Json -> JSON
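
A note for readers of this thread on the code-versus-JSON claim in the hunk above: the sketch below only illustrates the general idea and is not the smolagents implementation. The names `search`, `run_json_action`, and `run_code_action` are hypothetical and were invented for this example. It shows that a JSON action encodes a single tool call per step, while a code action can chain several calls and keep intermediate values in variables.

```python
import json


# Hypothetical tool standing in for a real web-search tool.
def search(query: str) -> str:
    return f"results for {query!r}"


TOOLS = {"search": search}


def run_json_action(action: str) -> str:
    """Execute one JSON-formatted action: exactly one tool call per step."""
    call = json.loads(action)
    return TOOLS[call["tool"]](**call["arguments"])


def run_code_action(action: str) -> dict:
    """Execute one code-formatted action: the snippet may chain several tool calls."""
    namespace = dict(TOOLS)  # expose the tools as plain functions
    exec(action, namespace)  # illustration only; a real agent needs a sandboxed interpreter
    return namespace


json_action = '{"tool": "search", "arguments": {"query": "GAIA benchmark"}}'

code_action = """
queries = ["GAIA benchmark", "Magentic-One"]
answers = [search(q) for q in queries]
"""

json_result = run_json_action(json_action)
code_result = run_code_action(code_action)["answers"]
```

Here the code action issues two searches in a single step, where the JSON format would need two separate steps. The sketch does not by itself explain the reported gap between 54% and 33%.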

open-deep-research.md

Co-authored-by: Aymeric Roucher <[email protected]>

open-deep-research.md

merveenoyan · 2025-02-04T17:18:17Z

Should we merge, @aymeric-roucher?

merveenoyan added 3 commits February 4, 2025 17:27

initial commit

9cbf1e0

fix tip block

b030d06

fix

ab9dc54

albertvillanova reviewed Feb 4, 2025

merveenoyan and others added 5 commits February 4, 2025 17:39

call to action

95f61b4

Update open-deep-research.md

7177de5

Co-authored-by: Albert Villanova del Moral <[email protected]>

Update open-deep-research.md

a6f89da

Co-authored-by: Albert Villanova del Moral <[email protected]>

Update open-deep-research.md

5113e8a

Co-authored-by: Albert Villanova del Moral <[email protected]>

Update open-deep-research.md

018e4c9

Co-authored-by: Albert Villanova del Moral <[email protected]>

albertvillanova reviewed Feb 4, 2025

craffel reviewed Feb 4, 2025

albertvillanova reviewed Feb 4, 2025

merveenoyan and others added 3 commits February 4, 2025 17:56

address comments

e9c425d

Update open-deep-research.md

1b0db5d

Update open-deep-research.md

40e8c49

Co-authored-by: Albert Villanova del Moral <[email protected]>

albertvillanova reviewed Feb 4, 2025

merveenoyan and others added 2 commits February 4, 2025 17:58

more comments

646489e

Update open-deep-research.md

f1e30e9

Co-authored-by: Albert Villanova del Moral <[email protected]>

albertvillanova reviewed Feb 4, 2025

aymeric-roucher reviewed Feb 4, 2025

aymeric-roucher and others added 2 commits February 4, 2025 18:01

Update open-deep-research.md

6003d6d

Update open-deep-research.md

b526fb8

Co-authored-by: Albert Villanova del Moral <[email protected]>

albertvillanova approved these changes Feb 4, 2025

add blog.yml

3e99d5b

albertvillanova reviewed Feb 4, 2025

merveenoyan and others added 2 commits February 4, 2025 18:04

Merge branch 'main' into add-open-deep-research

0a788f7

Update open-deep-research.md

6e30367

Co-authored-by: Albert Villanova del Moral <[email protected]>

aymeric-roucher reviewed Feb 4, 2025

Update open-deep-research.md

c64a973

aymeric-roucher reviewed Feb 4, 2025

Update open-deep-research.md

7bba75e

Co-authored-by: Aymeric Roucher <[email protected]>

aymeric-roucher reviewed Feb 4, 2025

Update open-deep-research.md

cb24c0a

merveenoyan merged commit 82890f5 into main Feb 4, 2025
1 check passed

merveenoyan deleted the add-open-deep-research branch February 4, 2025 17:19
