Description
The model to consider.
https://ai.gitcode.com/ascend-tribe/openPangu-Ultra-MoE-718B-V1.1
The architecture of openPangu-Ultra-MoE-718B-V1.1 adopts mainstream Multi-head Latent Attention (MLA), Multi-Token Prediction (MTP), and high MoE sparsity, and features several distinctive designs:
- Depth-Scaled Sandwich-Norm and TinyInit: these techniques adjust the layer-normalization structure and the parameter initialization to improve training stability (see the first sketch after this list).
- EP-Group load-balancing loss: this technique computes the load-balancing loss at the expert-parallel (EP) group level, achieving better expert specialization (see the second sketch after this list).
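To make the Depth-Scaled Sandwich-Norm and TinyInit point concrete, here is a minimal PyTorch sketch of a sandwich-norm block: each sublayer is normalized on both its input and its output, the post-norm gain is damped by depth, and weights start small. The `1/sqrt(2L)` scaling and the init std are illustrative assumptions, not the published openPangu recipe.

```python
import math
import torch
import torch.nn as nn


class SandwichNormBlock(nn.Module):
    """One transformer sublayer with sandwich norm, depth-scaled post-norm
    gain, and TinyInit-style small weight initialization (assumed forms)."""

    def __init__(self, hidden_size: int, num_layers: int):
        super().__init__()
        self.pre_norm = nn.RMSNorm(hidden_size)
        self.post_norm = nn.RMSNorm(hidden_size)
        # Stand-in for the real sublayer (MLA attention or an MoE FFN).
        self.sublayer = nn.Linear(hidden_size, hidden_size)

        # Depth-scaled post-norm: damp each layer's contribution so the
        # residual stream stays stable as depth L grows (assumed scaling).
        with torch.no_grad():
            self.post_norm.weight.fill_(1.0 / math.sqrt(2 * num_layers))

        # TinyInit-style small initialization (assumed std).
        nn.init.normal_(self.sublayer.weight,
                        std=0.02 / math.sqrt(2 * num_layers))
        nn.init.zeros_(self.sublayer.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sandwich norm: normalize both the sublayer input and its output,
        # then add the residual.
        return x + self.post_norm(self.sublayer(self.pre_norm(x)))
```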
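Similarly, here is a single-process sketch of an EP-group load-balancing loss: the usual auxiliary loss over routed-token fractions and mean router probabilities, but with the statistics pooled across all ranks of an expert-parallel group before the dot product, so balance is enforced per EP group rather than per micro-batch. The leading group dimension stands in for a real all-gather over the EP group; shapes and the aggregation step are assumptions.

```python
import torch


def ep_group_aux_loss(router_logits: torch.Tensor,
                      top_k: int,
                      num_experts: int) -> torch.Tensor:
    # router_logits: [ep_group_size, num_tokens, num_experts], i.e. the
    # logits gathered from every rank in the expert-parallel group
    # (real code would all-gather these across the EP group).
    probs = torch.softmax(router_logits, dim=-1)
    topk_idx = probs.topk(top_k, dim=-1).indices

    # f_i: fraction of tokens whose top-k includes expert i,
    # pooled over the whole EP group rather than one micro-batch.
    one_hot = torch.zeros_like(probs).scatter_(-1, topk_idx, 1.0)
    f = one_hot.mean(dim=(0, 1))            # [num_experts]

    # p_i: mean router probability for expert i, pooled over the group.
    p = probs.mean(dim=(0, 1))              # [num_experts]

    # Scaled so a perfectly uniform router yields a loss of 1.0.
    return num_experts * torch.sum(f * p) / top_k


# Toy usage: 2 ranks in the EP group, 16 tokens each, 8 experts, top-2.
logits = torch.randn(2, 16, 8)
loss = ep_group_aux_loss(logits, top_k=2, num_experts=8)
```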
The closest model vllm already supports.
The closest model is DeepSeek-V3.
What's your difficulty of supporting the model you want?
Most of the related mainstream modules are already well implemented in vLLM, but the Depth-Scaled Sandwich-Norm structure and the EP-Group router still need some adaptation (a registration sketch follows below).
Furthermore, more dense models in the openPangu series will be added in the future.
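For wiring such an adaptation into vLLM without forking the tree, one option is the out-of-tree model registration path. The sketch below assumes a hypothetical class `PanguUltraMoEForCausalLM` in a hypothetical package `pangu_vllm` that reuses vLLM's DeepSeek-V3 components while overriding the decoder layer's normalization structure; the registry call itself is vLLM's documented plugin mechanism.

```python
from vllm import ModelRegistry

# Register the (hypothetical) architecture string from the checkpoint's
# config.json so vLLM can resolve it to the custom implementation.
# "pangu_vllm.model:PanguUltraMoEForCausalLM" is an illustrative
# module path, not an existing package.
ModelRegistry.register_model(
    "PanguUltraMoEForCausalLM",
    "pangu_vllm.model:PanguUltraMoEForCausalLM",
)
```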
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.