⚠️ Please check that this feature request hasn't been suggested before.
I searched previous Ideas in Discussions and didn't find any similar feature requests.
I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
We need to support replicating the performance of DeepSeek‑R1 on our existing model. The main requirement is to integrate the GRPO algorithm, which has already been implemented in TRL (git+https://github.com/huggingface/trl.git).
✔️ Solution
Integrate the GRPO trainer already implemented in TRL (git+https://github.com/huggingface/trl.git) into our training stack, so that DeepSeek‑R1‑style training can be reproduced on the existing model.
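For context on what the integration would need to compute: the core of GRPO is that it replaces a learned value network with group-relative advantages, normalizing each completion's reward by the mean and standard deviation of its sampling group. A minimal sketch of that step (plain Python, not taken from the TRL codebase):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as used in GRPO: for a group of
    completions sampled from the same prompt, normalize each reward by
    the group's mean and standard deviation (no value network)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# One prompt, four sampled completions scored by a reward function.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

These per-completion advantages then weight the policy-gradient update for every token of the corresponding completion.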
❓ Alternatives
There have been ongoing efforts to reproduce DeepSeek-R1 at https://github.com/huggingface/open-r1.
📝 Additional Context
The GRPO algorithm has already been implemented in TRL (git+https://github.com/huggingface/trl.git).
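TRL's GRPO trainer scores completions with user-supplied reward functions, so the main thing this project would need to provide is a reward callable. A hedged sketch in the shape TRL documents for custom rewards (the function name and the length-based reward are illustrative, not from this project):

```python
# Sketch of a custom reward function in the interface TRL's GRPOTrainer
# documents: a callable that takes the batch of `completions` and returns
# one float score per completion. The 20-character target is arbitrary,
# purely for illustration.
def reward_len(completions, **kwargs):
    target_len = 20
    return [-float(abs(target_len - len(c))) for c in completions]

# It would then be wired into the trainer roughly as:
#   trainer = GRPOTrainer(model=..., reward_funcs=reward_len, ...)
print(reward_len(["12345678901234567890", "short"]))
```

For R1-style reasoning training, this toy length reward would be replaced by task rewards such as answer correctness and format checks.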