⚠️ Please check that this feature request hasn't been suggested before.
I searched previous Ideas in Discussions and didn't find any similar feature requests.
I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
This project is very good, and DeepSeek-V3's techniques are likely to become the foundation of a new generation of general-purpose models.
Adopting these techniques early would benefit the project and, of course, make it easier for us to use.
✔️ Solution
For example:
1. FP8 mixed-precision training
2. Multi-token prediction training, similar to the Medusa support axolotl had about a year ago
3. Latent attention (compressed KV vector c_t^KV): q_lora_rank, kv_lora_rank
4. Distillation from a large model into a small model
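
To make item 4 concrete, here is a minimal sketch of one common form of distillation: matching the student's temperature-softened output distribution to the teacher's. This is an assumption about what the feature could look like, not a description of how DeepSeek distills its models; the function names `softmax` and `distillation_loss` and the default temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature**2.

    Hypothetical helper for illustration only: a real trainer would compute
    this per token over batched tensors and usually mix it with the normal
    cross-entropy loss.
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.5, 1.5, 0.2]
    print("loss:", distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge.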
https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/model.py
These techniques are very efficient in DeepSeek-V3 and can be extended to other models.
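
As a rough illustration of why the latent KV approach is efficient, the sketch below compares how many values per token and per layer need to be cached with ordinary multi-head attention versus a compressed latent of width `kv_lora_rank` plus a shared rotary key. The function names are hypothetical, and the numbers in the example are illustrative assumptions rather than values read from any specific config.

```python
def kv_cache_elems_mha(n_heads, qk_head_dim, v_head_dim):
    """Cached values per token per layer for standard multi-head attention:
    one full key and one full value vector for every head."""
    return n_heads * (qk_head_dim + v_head_dim)


def kv_cache_elems_latent(kv_lora_rank, qk_rope_head_dim):
    """Cached values per token per layer when only the compressed KV latent
    and a single shared rotary key are stored."""
    return kv_lora_rank + qk_rope_head_dim


if __name__ == "__main__":
    # Illustrative values (assumptions, not taken from a particular model).
    n_heads = 128
    qk_nope_head_dim = 128
    qk_rope_head_dim = 64
    v_head_dim = 128
    kv_lora_rank = 512

    mha = kv_cache_elems_mha(n_heads, qk_nope_head_dim + qk_rope_head_dim, v_head_dim)
    latent = kv_cache_elems_latent(kv_lora_rank, qk_rope_head_dim)
    print(f"MHA cache per token/layer:    {mha}")
    print(f"Latent cache per token/layer: {latent}")
    print(f"Reduction factor:             {mha / latent:.1f}x")
```

Under these assumed sizes the cache shrinks by well over an order of magnitude, which is the main reason this design would be worth porting to other model families.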
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements