Integrate GRPO for Replicating DeepSeek-R1 and Other Inference Models, Achieving a HuggingFace open‑r1–like Effect #2306

Open
5 tasks done
submartingales opened this issue Feb 2, 2025 · 0 comments
Labels
enhancement New feature or request

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

We need to support replicating DeepSeek‑R1's performance with our existing models. The main requirement is integrating the GRPO (Group Relative Policy Optimization) algorithm, which is already implemented in TRL (git+https://github.com/huggingface/trl.git).

✔️ Solution

Integrate GRPO by using the implementation that already exists in TRL (git+https://github.com/huggingface/trl.git), so that existing models can be trained to replicate DeepSeek‑R1's performance.
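
For context, the core idea of GRPO is that several completions are sampled for the same prompt and each one is scored relative to the rest of its group, so no separate value model is needed. Below is a minimal sketch of that normalization step only, not the TRL implementation. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's group of sampled completions.

    Assumption: population standard deviation plus a small eps for
    numerical stability; a real implementation may differ in detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions above the group mean get a positive advantage,
    # completions below it get a negative one.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one prompt, scored by some reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every completion in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.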

❓ Alternatives

There is an ongoing effort to reproduce DeepSeek-R1 at https://github.com/huggingface/open-r1.

📝 Additional Context

The GRPO algorithm has already been implemented in git+https://github.com/huggingface/trl.git.
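
As far as I understand the TRL implementation, its GRPO trainer accepts custom reward functions that take a list of completions and return one float per completion. Here is a hedged sketch of what such a function could look like for plain-string completions. The name `format_reward` and the `<think>`/`<answer>` tag layout are assumptions chosen for illustration, not something this project or TRL prescribes.

```python
import re

# Hypothetical output layout: reasoning in <think>, final result in <answer>.
THINK_ANSWER = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL
)


def format_reward(completions, **kwargs):
    """Return 1.0 for completions that follow the layout, else 0.0.

    Assumes each completion is a plain string; extra keyword arguments
    (for example dataset columns) are accepted and ignored.
    """
    return [1.0 if THINK_ANSWER.match(c.strip()) else 0.0 for c in completions]


scores = format_reward([
    "<think>2 + 2 is 4</think>\n<answer>4</answer>",
    "The answer is 4.",
])
```

An integration would need a way for users to point their config at reward functions like this one.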

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
@submartingales submartingales added the enhancement New feature or request label Feb 2, 2025