GPU training is very slow #39

Closed
syyxsxx opened this issue Oct 18, 2024 · 3 comments

Comments

@syyxsxx

syyxsxx commented Oct 18, 2024

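Hello, I have a question.
Training on ImageNet with 10 × 4090 GPUs looks like it will take more than 20 days. Is this speed expected?
With the JAX version, training takes about 1 day (300 epochs).
Thanks!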

@RobertLuo1
Collaborator

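Hello. For the second version, we trained both the tokenizer and the second-stage Transformer on 910B NPUs. For the first version we used 32 V100 GPUs, and training took about 4-5 days. Does "the JAX version" mean you ported the code to JAX? If so, does the accuracy match?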

@wyf0912

wyf0912 commented Dec 16, 2024

Hello~ How many devices and how much time does it take to train the smallest IBQ model? @RobertLuo1

@RobertLuo1
Collaborator

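Hello, thank you for your interest in our work. We trained all IBQ first-stage models on 64 910B NPUs. Training time varies with codebook size and should be about 4-7 days. For the second stage, we used 32 devices for the 300M model and 96 devices for the largest 2B model. Training takes about 9 days for the 2B model and about 4 days for the 300M model.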
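
To put the figures quoted in this thread side by side, here is a minimal sketch that converts each one into device-days (device count × days). The labels and the `device_days` helper are illustrative names, not part of the repository. The numbers are the rough ones given above, and the "20+ days" figure is treated as a lower bound. Device-days on a 4090, a V100, and a 910B are not directly comparable, so this only gives a sense of scale.

```python
# Device counts and durations as quoted in this thread (all approximate).
# Each entry: (label, number of devices, (min_days, max_days)).
RUNS = [
    ("reported run, 10x 4090 (20+ days, lower bound)", 10, (20, 20)),
    ("first version, 32x V100", 32, (4, 5)),
    ("IBQ stage 1, 64x 910B", 64, (4, 7)),
    ("IBQ stage 2, 300M model, 32 devices", 32, (4, 4)),
    ("IBQ stage 2, 2B model, 96 devices", 96, (9, 9)),
]


def device_days(num_devices, day_range):
    """Return the (low, high) device-days for a run."""
    low_days, high_days = day_range
    return num_devices * low_days, num_devices * high_days


if __name__ == "__main__":
    for label, num_devices, day_range in RUNS:
        low, high = device_days(num_devices, day_range)
        span = f"{low}" if low == high else f"{low}-{high}"
        print(f"{label}: ~{span} device-days")
```

The JAX run mentioned in the first comment is left out because the thread does not state what hardware it used.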
