The kat model training time problem #30

Open
yuhangxu666 opened this issue Mar 25, 2025 · 2 comments
Comments

@yuhangxu666

Hi, I read in the paper that you trained KAT on a single A5000. I am using a single A6000. When I train kat_base with the batch size raised to 512, one epoch takes up to a day. I then tried a smaller model, kat_tiny, with the batch size set to 1024, and one epoch still takes up to 10 hours, which is very time-consuming. Is this normal, or have I made a mistake somewhere?

Image

@Adamdad
Owner

Adamdad commented Mar 25, 2025

Can you check your GPU utilization? That is still slow. I use 8xA5000 and training takes around 1-2 days for 300 epochs.
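To put the two sets of numbers side by side, here is a rough back-of-the-envelope sketch. It assumes perfect linear scaling across the 8 GPUs and treats an A5000 and an A6000 as roughly comparable, neither of which is stated in the thread, and the reply does not say which KAT variant the 1-2 day figure refers to. The function names are made up for illustration.

```python
def minutes_per_epoch(total_days, epochs):
    """Wall-clock minutes per epoch for a run of `total_days` over `epochs`."""
    return total_days * 24 * 60 / epochs


def single_gpu_estimate(total_days, epochs, num_gpus):
    """Estimated minutes per epoch on one GPU.

    Assumption: perfect linear scaling, so one GPU is num_gpus times slower.
    Real multi-GPU scaling is usually somewhat worse than linear.
    """
    return minutes_per_epoch(total_days, epochs) * num_gpus


if __name__ == "__main__":
    # Figures from the thread: 8xA5000, 1-2 days, 300 epochs.
    fast = single_gpu_estimate(1, 300, 8)
    slow = single_gpu_estimate(2, 300, 8)
    print(f"Estimated single-GPU epoch: {fast:.1f} to {slow:.1f} minutes")

    # Reported in the issue: about 10 hours per epoch for kat_tiny.
    reported = 10 * 60
    print(f"Reported epoch is {reported / slow:.1f}x to {reported / fast:.1f}x longer")
```

Under these assumptions a single-GPU epoch would be expected to take on the order of an hour, not 10 hours, which is consistent with the reply that the reported speed is still slow.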

@yuhangxu666
Author

Can you check your GPU utilization? This is still slow. I use 8xA5000 and train for around 1-2 days for 300 epochs.

Image
My GPU utilization is shown above.
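For anyone who wants to share utilization as text rather than a screenshot, a minimal sketch along these lines may help. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output; the helper names `parse_gpu_stats` and `query_gpu_stats` are hypothetical and not part of the KAT repository.

```python
import subprocess

# Fields requested from nvidia-smi; values come back as comma-separated numbers.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field.strip()) for field in line.split(","))
        stats.append(
            {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return stats


def query_gpu_stats():
    """Run nvidia-smi once and return the parsed per-GPU stats."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(query_gpu_stats()):
        print(
            f"GPU {index}: {gpu['util_pct']:.0f}% util, "
            f"{gpu['mem_used_mib']:.0f}/{gpu['mem_total_mib']:.0f} MiB"
        )
```

Sampling this a few times during training gives a clearer picture than a single snapshot. If utilization stays low while an epoch is running, the bottleneck may be outside the model itself, for example in data loading, though that is only a common pattern and not something confirmed in this thread.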
