Integrate GRPO for Replicating DeepSeek-R1 and Other Inference Models, Achieving a HuggingFace open‑r1–like Effect #2306

submartingales · 2025-02-02T12:35:53Z

⚠️ Please check that this feature request hasn't been suggested before.

I searched previous Ideas in Discussions and didn't find any similar feature requests.
I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

We would like to be able to replicate DeepSeek‑R1's performance on our existing models. The main requirement is integrating the GRPO algorithm, which is already implemented in TRL (git+https://github.com/huggingface/trl.git).

✔️ Solution

Integrate the GRPO implementation from TRL (git+https://github.com/huggingface/trl.git) so that DeepSeek‑R1-style training can be run on our existing models.
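For reference, here is a rough sketch of what a GRPO run looks like when calling TRL directly, loosely following TRL's `GRPOTrainer` quickstart. The model name, the dataset, the output directory, and the toy `reward_len` function are placeholders chosen for illustration and are not part of this request. How this would be wired into this project's config is left open.

```python
def reward_len(completions, **kwargs):
    """Toy reward (illustrative only): prefer completions near 20 characters."""
    return [float(-abs(20 - len(c))) for c in completions]


def build_trainer(model_name="Qwen/Qwen2-0.5B-Instruct", output_dir="grpo-sketch"):
    """Build a TRL GRPOTrainer; model and dataset names are placeholder assumptions."""
    # Imported lazily so the reward function can be used without TRL installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("trl-lib/tldr", split="train")  # prompt-only dataset
    args = GRPOConfig(output_dir=output_dir)
    return GRPOTrainer(
        model=model_name,
        reward_funcs=reward_len,
        args=args,
        train_dataset=dataset,
    )
```

Training would then be started with `build_trainer().train()`. A real R1-style setup would replace `reward_len` with task-specific rewards such as answer correctness or output format checks.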

❓ Alternatives

There is an ongoing effort to reproduce DeepSeek-R1 at https://github.com/huggingface/open-r1.

📝 Additional Context

The GRPO algorithm has already been implemented in git+https://github.com/huggingface/trl.git.
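As background on what the integration would compute, the core idea of GRPO is that each prompt gets a group of sampled completions, and each completion's advantage is its reward normalized against the rest of its group, so no separate value model is needed. Below is a minimal sketch of that step only. The function name and the `eps` value are assumptions, and the population standard deviation is used here for simplicity; actual implementations may use a different estimator.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's completion group.

    rewards: list of scalar rewards, one per sampled completion.
    eps: small constant (assumed value) to avoid division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions scored by some reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above their group's mean get a positive advantage and those below get a negative one. If all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.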

Acknowledgements

My issue title is concise, descriptive, and in title casing.
I have searched the existing issues to make sure this feature has not been requested yet.
I have provided enough information for the maintainers to understand and evaluate this request.

submartingales added the enhancement New feature or request label Feb 2, 2025