Auxiliary-loss-free expert routing
#56, opened by qing9
https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py#:~:text=group_scores%20%3D%20(,%23%20%5Bn%2C%20n_group%5D
Is the `topk(2, dim=-1)` at this spot a bug? Shouldn't it be `topk(topk_group)`?
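
For readers following the question, here is a minimal pure-Python sketch of how group-limited routing of this kind can be read. It is an illustration, not the code from `modeling_deepseek.py`: the function name `group_limited_topk`, the parameter `per_group_k`, and the sample scores are all hypothetical. It assumes that the `2` in the linked line is the number of experts summed inside each group to produce a per-group score of shape `[n, n_group]`, and that `topk_group` is applied in a separate, later step that chooses which groups to keep. Under that reading the two values play different roles, so the `2` would not need to equal `topk_group`.

```python
def group_limited_topk(scores, n_group, topk_group, top_k, per_group_k=2):
    """Route one token: pick top_k experts, restricted to the best topk_group groups.

    scores      -- per-expert affinity scores for a single token
    per_group_k -- hypothetical name for the constant 2 in the linked line
    """
    group_size = len(scores) // n_group
    groups = [scores[g * group_size:(g + 1) * group_size] for g in range(n_group)]

    # Step 1: score each group by the sum of its per_group_k highest expert scores.
    group_scores = [sum(sorted(g, reverse=True)[:per_group_k]) for g in groups]

    # Step 2: keep the topk_group groups with the highest group scores.
    kept = sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)[:topk_group]

    # Step 3: choose the top_k experts among those that belong to a kept group.
    candidates = [i for i in range(len(scores)) if i // group_size in kept]
    chosen = sorted(candidates, key=lambda i: scores[i], reverse=True)[:top_k]
    return group_scores, sorted(kept), chosen


# 12 experts in 3 groups of 4 (made-up scores).
scores = [0.9, 0.1, 0.1, 0.1,   # group 0
          0.5, 0.6, 0.4, 0.3,   # group 1
          0.2, 0.3, 0.1, 0.0]   # group 2
group_scores, kept, chosen = group_limited_topk(scores, n_group=3, topk_group=2, top_k=3)
```

In this example groups 0 and 1 are kept and experts 0, 5, and 4 are chosen. Changing `per_group_k` only changes how a group is scored, while `topk_group` only changes how many groups survive.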