When will training support for DeepSeek-V3 be added? It's a big update over Llama-style models #2228

Open
5 tasks done
sankexin opened this issue Dec 31, 2024 · 2 comments · May be fixed by #2230
Labels
enhancement New feature or request

Comments

sankexin commented Dec 31, 2024

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

This project is very good, and DeepSeek-V3's techniques are likely to become the foundation of a new generation of general-purpose models.
Supporting them early would benefit this project and, of course, would also make them easier for us to use.

✔️ Solution

For example:
1. FP8 mixed-precision training
2. Multi-token prediction training, similar to Medusa in axolotl about a year ago
3. Latent KV vector c_t^{KV}: q_lora_rank, kv_lora_rank
4. Distillation from a large model into a small model
https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/model.py

These techniques are very efficient in DeepSeek-V3 and can be extended to other models.
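
As a rough illustration of item 3, the sketch below caches one small latent vector per token (of size `kv_lora_rank`) and expands it to keys and values only when they are read. It is a toy in plain Python under simplifying assumptions: the class name `LatentKVCache`, the sizes, and the random weights are all hypothetical, and it leaves out the query-side compression (`q_lora_rank`) and the rotary position handling found in the linked model.py.

```python
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    """Small random weights standing in for trained projections."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LatentKVCache:
    """Toy low-rank KV cache: store one latent per token, expand on read."""

    def __init__(self, dim, kv_lora_rank, n_heads, head_dim, seed=0):
        rng = random.Random(seed)
        self.w_down = rand_matrix(kv_lora_rank, dim, rng)  # hidden -> latent
        self.w_up_k = rand_matrix(n_heads * head_dim, kv_lora_rank, rng)  # latent -> keys
        self.w_up_v = rand_matrix(n_heads * head_dim, kv_lora_rank, rng)  # latent -> values
        self.latents = []

    def append(self, hidden):
        # Only the compressed latent is kept for each token.
        self.latents.append(matvec(self.w_down, hidden))

    def keys_values(self):
        # Keys and values for all heads are rebuilt from the latents.
        keys = [matvec(self.w_up_k, c) for c in self.latents]
        values = [matvec(self.w_up_v, c) for c in self.latents]
        return keys, values

    def cached_floats(self):
        return sum(len(c) for c in self.latents)


dim, kv_lora_rank, n_heads, head_dim = 64, 8, 4, 16  # illustrative sizes only
cache = LatentKVCache(dim, kv_lora_rank, n_heads, head_dim)
rng = random.Random(1)
for _ in range(5):
    cache.append([rng.uniform(-1, 1) for _ in range(dim)])

keys, values = cache.keys_values()
full_cache_floats = 5 * 2 * n_heads * head_dim  # what a plain K/V cache would hold
print(cache.cached_floats(), "floats cached vs", full_cache_floats)
```

With these toy sizes the cache holds `kv_lora_rank` numbers per token instead of `2 * n_heads * head_dim`, which is where the memory saving comes from.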

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
sankexin added the enhancement label Dec 31, 2024
@NanoCode012 (Collaborator)

Are you able to try #2230? I think this minimal change should allow it to work with packing.

NanoCode012 linked a pull request Jan 2, 2025 that will close this issue
@ehartford (Collaborator)

Need native FP8, I don't wanna train in 16-bit.

3 participants