
Commit 804d53e

Authors, centering, links to models/datasets
1 parent acef5ba commit 804d53e

1 file changed: intel-deepmath.md (23 additions, 19 deletions)

@@ -5,34 +5,38 @@ authors:
 - user: danf
   guest: true
   org: Intel
+- user: mber
+  guest: true
+  org: Intel
+- user: moshew
+  guest: true
+  org: Intel
+
 ---
 
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/intel-deepmath/deepmath-figure.jpg" style="width:600" alt="An LLM is using a calculator to answer questions." />
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/intel-deepmath/deepmath-figure.jpg" width=700 alt="An LLM is using a calculator to answer questions." />
 
 # DeepMath: A Lightweight Math Reasoning Agent for LLMs
 
-*By Intel AI — Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat*
-
-
 Large language models (LLMs) have made impressive strides in reasoning tasks, yet mathematical problem-solving remains a challenge. Traditional "chain-of-thought" reasoning often produces verbose explanations and error-prone arithmetic. **DeepMath** tackles this by combining a small Python executor with a fine-tuned LLM, enabling concise, computation-driven reasoning.
 
 ## The Big Idea
 
-DeepMath is built on **Qwen3-4B Thinking** and fine-tuned with **GRPO (Group Relative Policy Optimization)**. Instead of verbose text, the model emits **tiny Python snippets** for intermediate steps, runs them in a secure sandbox, and folds the results back into its reasoning, reducing errors and output length.
+DeepMath is built on **[Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)** and fine-tuned with **GRPO (Group Relative Policy Optimization)**. Instead of verbose text, the model emits **tiny Python snippets** for intermediate steps, runs them in a secure sandbox, and folds the results back into its reasoning, reducing errors and output length.
 
 ✅ No file I/O, no network calls, strict timeouts.
 
 ✅ Safe, deterministic, and auditable.
 
-We evaluate DeepMath on four math datasets: **MATH500, AIME, HMMT, and HLE,** and show that:
+We evaluate DeepMath on four math datasets: **[MATH500](https://huggingface.co/datasets/HuggingFaceH4/MATH-500), [AIME](https://huggingface.co/datasets/opencompass/AIME2025), [HMMT](https://huggingface.co/datasets/MathArena/hmmt_feb_2025), and [HLE](https://huggingface.co/datasets/cais/hle),** and show that:
 
 - The math agent alone improves accuracy and reduces verbosity.
 
 - GRPO training alone biases outputs toward brevity and correctness.
 
 - Combining the agent with GRPO yields the largest gains.
 
-👉 Code and evaluation scripts: <https://github.com/IntelLabs/DeepMath>
+👉 Code and evaluation scripts: <https://github.com/IntelLabs/DeepMath> \
 👉 Model: <https://huggingface.co/Intel/deepmath-v1>
 
 ## Why DeepMath?
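The sandbox constraints above (no file I/O, no network calls, strict timeouts) are easy to picture with a short sketch. This is not the DeepMath executor, only a minimal illustration of the idea: run an emitted snippet in a child process with a restricted set of builtins and a hard timeout. The helper name `run_snippet` is invented for the example.

```python
# Illustrative only: a minimal sandboxed-executor sketch, not the DeepMath implementation.
# Idea: run a model-emitted snippet in a child process with a restricted set of builtins
# and a hard timeout.
import multiprocessing

SAFE_BUILTINS = {"abs": abs, "min": min, "max": max, "sum": sum, "range": range, "len": len}

def _exec_snippet(code, queue):
    local_vars = {}
    try:
        # No file I/O or network helpers are exposed to the snippet.
        exec(code, {"__builtins__": SAFE_BUILTINS}, local_vars)
        queue.put(("ok", local_vars.get("result")))
    except Exception as err:
        queue.put(("error", repr(err)))

def run_snippet(code, timeout_s=2.0):
    """Execute `code` in a child process and return its `result` variable, or an error."""
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_exec_snippet, args=(code, queue))
    proc.start()
    proc.join(timeout_s)
    if proc.is_alive():  # strict timeout: kill runaway computations
        proc.terminate()
        proc.join()
        return ("error", "timeout")
    return queue.get() if not queue.empty() else ("error", "no output")

if __name__ == "__main__":
    print(run_snippet("result = sum(k**2 for k in range(1, 11))"))  # ('ok', 385)
```

Killing the child process on timeout is what keeps a runaway snippet from stalling generation, which is the property the strict-timeout bullet refers to.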
@@ -52,10 +56,10 @@ DeepMath implements both. The model learns to generate short Python snippets, wh
 - Inference: based on [SmolAgents](https://github.com/huggingface/smolagents/), a math agent was created. vLLM is used as the inference engine.
 - Training: based on the GRPO trainer in [TRL](https://github.com/huggingface/trl), we modified TRL's vLLM client and server to generate GRPO completions using our DeepMath agent.
 
-<figure>
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/intel-deepmath/trl-grpo-vllm-deepmath.png" style="width:400" alt="Changes to vLLM client and server in TRL library." />
-<figcaption><p>Figure 1: The vLLM client and server were modified to use the DeepMath agent in generating the candidates, while using the vLLM backend.</p></figcaption>
-</figure>
+<p align="center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/intel-deepmath/trl-grpo-vllm-deepmath.png" width=600 alt="Changes to vLLM client and server in TRL library." /><br>
+<em>Figure 1: The vLLM client and server were modified to use the DeepMath agent in generating the candidates, while using the vLLM backend.</em>
+</p>
 
 - **Agent Interface:** During inference, the model can output normal tokens or special agent calls containing Python snippets.
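To make the inference setup concrete, here is a minimal sketch of a code-executing agent built on smolagents and served by vLLM. It is an assumption about how the pieces fit together rather than the repository's actual agent; the endpoint URL, `max_steps` value, and prompt are illustrative, and the real wiring lives in the IntelLabs/DeepMath repo.

```python
# A minimal sketch of a code-executing math agent, assuming smolagents' CodeAgent and a
# local vLLM OpenAI-compatible endpoint (e.g. started with `vllm serve Intel/deepmath-v1`).
# This is not the repository's agent; endpoint, max_steps, and the prompt are illustrative.
from smolagents import CodeAgent, OpenAIServerModel

model = OpenAIServerModel(
    model_id="Intel/deepmath-v1",
    api_base="http://localhost:8000/v1",
    api_key="EMPTY",
)

# CodeAgent lets the model write short Python snippets that are executed, with their
# outputs fed back into the reasoning trace.
agent = CodeAgent(tools=[], model=model, max_steps=8)

answer = agent.run("What is the sum of the first 100 positive odd integers?")
print(answer)  # expected: 10000
```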

@@ -69,10 +73,10 @@ DeepMath implements both. The model learns to generate short Python snippets, wh
 
 - **Interpretability:** Snippets are readable and auditable.
 
-<figure>
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/intel-deepmath/output-example.png" style="width:700" alt="Output example: it contains a short python snippet as well as its output which is used in the reasoning process."/>
-<figcaption><p>Figure 2: Output example where python code is generated, evaluated and the answer is inserted into the trace and used for context.</p></figcaption>
-</figure>
+<p align="center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/intel-deepmath/output-example.png" width=800 alt="Output example: it contains a short python snippet as well as its output which is used in the reasoning process."/><br>
+<em>Figure 2: Output example where python code is generated, evaluated and the answer is inserted into the trace and used for context.</em>
+</p>
 
 ## Training with GRPO
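For readers curious what a GRPO run looks like in TRL, the sketch below is illustrative rather than the DeepMath training script. The actual pipeline swaps plain vLLM generation for the modified client/server so the agent can execute snippets during rollouts, and the real reward definition is in the repo; the correctness-plus-brevity reward and the dataset choice here are hypothetical placeholders.

```python
# Illustrative GRPO setup with TRL (not the DeepMath training script). The real pipeline
# routes rollout generation through a modified vLLM client/server so the agent can execute
# snippets mid-completion; the reward and dataset below are hypothetical placeholders.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_correct_and_brief(completions, answer, **kwargs):
    """Hypothetical reward: +1 if the reference answer string appears in the completion,
    minus a small penalty that grows with completion length."""
    rewards = []
    for completion, ref in zip(completions, answer):
        correct = 1.0 if str(ref) in completion else 0.0
        brevity_penalty = 0.2 * min(len(completion) / 4000.0, 1.0)
        rewards.append(correct - brevity_penalty)
    return rewards

dataset = load_dataset("HuggingFaceH4/MATH-500", split="test")
dataset = dataset.map(lambda ex: {"prompt": ex["problem"]})  # GRPOTrainer expects a "prompt" column

config = GRPOConfig(
    output_dir="deepmath-grpo",
    use_vllm=True,  # requires a running vLLM server, e.g. started with `trl vllm-serve`
    max_completion_length=2048,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen3-4B-Thinking-2507",
    reward_funcs=reward_correct_and_brief,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```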

@@ -98,7 +102,9 @@ We benchmarked DeepMath against baselines on four datasets. Metrics include:
 
 - **Mean output length** (brevity).
 
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/intel-deepmath/main-results.png" style="width:800" alt="Main results table."/>
+<p align="center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/intel-deepmath/main-results.png" width=1000 alt="Main results table."/>
+</p>
 
 **Key Insight:** DeepMath reduces output length by up to **66%** while improving accuracy on challenging datasets.
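As a reading aid for the brevity metric, the reduction is measured against the baseline's mean output length on the same dataset; the numbers below are hypothetical and only show the arithmetic behind a figure such as 66%.

```python
# Hypothetical values, only to illustrate how a reduction like "66%" is computed.
baseline_mean_tokens = 12_000
deepmath_mean_tokens = 4_080

reduction = 1 - deepmath_mean_tokens / baseline_mean_tokens
print(f"Output length reduction: {reduction:.0%}")  # -> 66%
```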

@@ -116,9 +122,7 @@ DeepMath demonstrates a practical and lightweight way to combine a small executo
 
 ## Try It Yourself
 
-Check out the GitHub repo and share your feedback! Contributions welcome. 🚀
-
-<https://github.com/intel/DeepMath>.
+Check out the [GitHub repo](https://github.com/IntelLabs/DeepMath) and share your feedback! Contributions welcome. 🚀
 
 ## Citation
 