About Training Costs #4

Open
HUAFOR opened this issue Sep 14, 2024 · 6 comments
Comments

@HUAFOR

HUAFOR commented Sep 14, 2024

Hi, thanks for your great work reproducing the training code for GR-1! I wonder how long it takes to train GR-1 from scratch in the ABC->D setting?

@StarCycle
Collaborator

@HUAFOR

Sorry! My fault! I just saw your issue here...

It's not recommended to use this repo to train from scratch. Some developers tried it, but the performance was not as good as the original version, even though I tried my best to recover every training detail they used.

Instead, you can train from the pretrained checkpoint provided by GR-MG.

For a faster response you can send me an email... sorry again... I am working on a video generation model and my own MimicTest policy toolbox these days.

Best,
Zhuoheng

@StarCycle
Collaborator

For your original question, please refer to this issue

They use 32 V100 32GB GPUs. But no worries: in my experience you can achieve roughly the same speed with 8×4090 GPUs. If you enable the torch compile option in my repo, it can be even 50% faster!
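
To make the numbers quoted above concrete, here is a back-of-the-envelope sketch of what they would imply per GPU if taken at face value for this training setup. It assumes that "roughly the same speed" means equal total throughput and that "50% faster" means 1.5x throughput rather than half the wall-clock time; the helper `per_gpu_speedup` is a hypothetical name used only for illustration.

```python
# Rough arithmetic only; the GPU counts come from the comment above,
# everything else is an assumption stated in the lead-in.

def per_gpu_speedup(ref_gpus: int, new_gpus: int, cluster_speed_ratio: float = 1.0) -> float:
    """Throughput of one new GPU relative to one reference GPU."""
    return cluster_speed_ratio * ref_gpus / new_gpus

v100_count = 32      # reference cluster
rtx4090_count = 8    # cluster used with this repo

eager = per_gpu_speedup(v100_count, rtx4090_count)          # torch compile off
compiled = per_gpu_speedup(v100_count, rtx4090_count, 1.5)  # torch compile on (assumed 1.5x)

print(f"one 4090 ~ {eager:.1f} V100s (eager), ~ {compiled:.1f} V100s (compiled)")
```

These ratios only restate the claim in per-GPU terms; they say nothing about the absolute training time, which is covered in the linked issue.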

@1786707378

@HUAFOR

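Sorry! My fault! I just saw your issue here...

It's not recommended to use this repo to train from scratch. Some developers tried it, but the performance was not as good as the original version, even though I tried my best to recover every training detail they used.

By contrast, you can train from the pretrained checkpoint provided by GR-MG.

For a faster response you can send me an email... sorry again... these days I am working on a video generation model and my own MimicTest policy toolbox.

Best, Zhuoheng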

Hello, I would like to know whether the pretrained checkpoint provided by GR-MG is "pretrained.pt". I used that checkpoint for training, but the results were very poor, far from the performance achieved with ByteDance's "snapshot_ABC.pt". I also want to know if your GR-Chunk is trained based on "snapshot_ABC.pt" or "pretrained.pt". I’m wondering if my training method is causing the issue. Thank you.

@StarCycle
Collaborator

Hello @1786707378,

I haven't tried the pretrained checkpoint provided by GR-MG... Can you load it into my code easily? My GR-Chunk is based on "snapshot_ABC.pt".

Could you please let me know which training method you are using? We can have a call on WeChat if you have time (my ID: StarRingSpace).

@1786707378

I think I made a mistake. I just made a simple attempt with your code and "pretrained.pt", and it seems that the model is incorrect. Thank you for your reply.
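
For anyone hitting the same problem, one way to check whether a checkpoint such as "pretrained.pt" actually matches the model in the code is to compare parameter names before training. The sketch below is framework-free and uses made-up parameter names; `diff_keys` is a hypothetical helper, not part of this repo. With PyTorch, the two key lists would typically come from `model.state_dict().keys()` and from the dict returned by `torch.load(path, map_location="cpu")`, which may nest the weights under another key depending on how the checkpoint was saved.

```python
from typing import Dict, Iterable, List


def diff_keys(
    model_keys: Iterable[str],
    ckpt_keys: Iterable[str],
    strip_prefix: str = "module.",
) -> Dict[str, List[str]]:
    """Compare parameter names of a model and a checkpoint.

    `strip_prefix` drops a leading wrapper prefix before comparing.
    DistributedDataParallel saves names as "module.<name>", which is a
    common source of spurious mismatches.
    """
    def normalize(key: str) -> str:
        if strip_prefix and key.startswith(strip_prefix):
            return key[len(strip_prefix):]
        return key

    model = {normalize(k) for k in model_keys}
    ckpt = {normalize(k) for k in ckpt_keys}
    return {
        "missing_in_checkpoint": sorted(model - ckpt),
        "unexpected_in_checkpoint": sorted(ckpt - model),
        "matched": sorted(model & ckpt),
    }


# Hypothetical parameter names, for illustration only (not the real GR-1 layout).
model_keys = ["encoder.weight", "encoder.bias", "action_head.weight"]
ckpt_keys = ["module.encoder.weight", "module.encoder.bias", "video_head.weight"]

report = diff_keys(model_keys, ckpt_keys)
print("missing:", report["missing_in_checkpoint"])
print("unexpected:", report["unexpected_in_checkpoint"])
```

If many keys are missing or unexpected, the checkpoint was probably saved from a different architecture or needs a key remapping, and training on top of it would start from largely random weights.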

@StarCycle
Collaborator

Emm, I don't fully understand, but good luck!
