The kat model training time problem #30

yuhangxu666 · 2025-03-25T08:28:42Z

Hi, I read in the paper that you trained kat on a single A5000. I am using a single A6000. When I train a kat model such as kat_base with the batch size raised to 512, one epoch takes up to a day. I then tried a smaller model, kat_tiny, with the batch size set to 1024, and one epoch still takes up to 10 hours, which is very time-consuming. Is this normal, or am I doing something wrong?

Adamdad · 2025-03-25T08:31:47Z

Can you check your GPU utilization? This is still slow. I use 8xA5000 and train for around 1-2 days for 300 epochs.
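For a rough sense of scale, the figures above can be turned into a per-epoch estimate. This is only a back-of-the-envelope sketch: it assumes ideal linear scaling across the 8 GPUs and ignores any difference between an A5000 and an A6000, and the reply does not say which model size the 1-2 days refers to. The function names are made up for illustration.

```python
def per_epoch_hours(total_days: float, epochs: int) -> float:
    """Average wall-clock hours per epoch for a run lasting `total_days`."""
    return total_days * 24.0 / epochs


def single_gpu_estimate_hours(multi_gpu_hours: float, num_gpus: int) -> float:
    """Scale a multi-GPU per-epoch time to one GPU.

    Assumption: ideal linear scaling, which real runs rarely achieve.
    """
    return multi_gpu_hours * num_gpus


# Figures quoted in this thread: 8 GPUs, 300 epochs, 1 to 2 days in total.
for days in (1.0, 2.0):
    multi = per_epoch_hours(days, 300)
    single = single_gpu_estimate_hours(multi, 8)
    print(
        f"{days:.0f} day(s) total: {multi * 60:.1f} min/epoch on 8 GPUs, "
        f"roughly {single:.2f} h/epoch on 1 GPU"
    )
```

Under those assumptions a single GPU would need somewhere around 0.64 to 1.28 hours per epoch, so 10 hours per epoch for kat_tiny is several times slower than the multi-GPU numbers alone would suggest.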

yuhangxu666 · 2025-03-25T08:38:23Z

> Can you check your GPU utilization? This is still slow. I use 8xA5000 and train for around 1-2 days for 300 epochs.

My GPU utilization is shown above.
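For anyone who wants to share utilization as text rather than a screenshot, here is a minimal sketch that reads it from `nvidia-smi`. It assumes `nvidia-smi` is on the PATH and returns plain integers for both fields; the helper names are hypothetical and not part of the kat repository. Utilization that stays low during training often means the GPU is waiting on something else, such as data loading.

```python
import subprocess


def parse_gpu_stats(csv_text: str) -> list[tuple[int, int]]:
    """Parse lines of 'utilization, memory_used' into (percent, MiB) tuples."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def query_gpu_stats() -> list[tuple[int, int]]:
    """Ask nvidia-smi for per-GPU utilization (%) and memory used (MiB)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=utilization.gpu,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Calling `query_gpu_stats()` every few seconds while an epoch is running gives one `(utilization, memory)` pair per GPU, which is easier to compare across machines than a single snapshot.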
