Any plans for multi-gpu training support? #88

Open
joeljang opened this issue Nov 27, 2024 · 3 comments

Comments

@joeljang

joeljang commented Nov 27, 2024

Hey Genmo team, thank you so much for open-sourcing the fine-tuning code.

Are there any plans for releasing a multi-GPU fine-tuning version?
Was there a specific reason only single-GPU fine-tuning code was released? Is it nontrivial to extend it to multi-GPU training?

Thanks in advance!

@paras-genmo
Contributor

Hi @joeljang, at the moment we're not planning to release a multi-GPU trainer. One of the top community requests has been to reduce the minimum resource requirements for using and fine-tuning Mochi, so we made sure the LoRA tuner could run on a single GPU.

@vedantroy
Contributor

@joeljang I don't think it should be terribly difficult to do. You'll need to implement the backward passes for the context-parallel ops in https://github.com/genmoai/mochi/blob/main/src/genmo/mochi_preview/dit/joint_model/context_parallel.py and tweak the training loop a bit.
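
For reference, here is a minimal sketch (not the Mochi implementation) of how a collective used in a context-parallel forward pass can be given a matching backward pass with a custom `torch.autograd.Function`. The Ulysses-style all-to-all shown here, which trades a sequence shard for a head shard, is assumed purely for illustration; the actual ops in `context_parallel.py` may differ.

```python
import torch
import torch.distributed as dist


class SeqToHeadAllToAll(torch.autograd.Function):
    """Illustrative Ulysses-style all-to-all: trade a sequence shard (full heads)
    for a head shard (full sequence), with the inverse all-to-all as its backward."""

    @staticmethod
    def forward(ctx, x, group):
        # x: (L, H, D) where L = S / world_size is this rank's sequence shard.
        ctx.group = group
        world_size = dist.get_world_size(group)
        # Split the head dimension, exchange chunks, reassemble the sequence dim.
        inputs = [t.contiguous() for t in torch.chunk(x, world_size, dim=1)]
        outputs = [torch.empty_like(t) for t in inputs]
        dist.all_to_all(outputs, inputs, group=group)
        return torch.cat(outputs, dim=0)  # (S, H / world_size, D)

    @staticmethod
    def backward(ctx, grad_out):
        world_size = dist.get_world_size(ctx.group)
        # The backward of an all-to-all is the transposed all-to-all: split the
        # sequence dimension, exchange, and reassemble the head dimension.
        inputs = [t.contiguous() for t in torch.chunk(grad_out, world_size, dim=0)]
        outputs = [torch.empty_like(t) for t in inputs]
        dist.all_to_all(outputs, inputs, group=ctx.group)
        return torch.cat(outputs, dim=1), None
```

The forward pass would call `SeqToHeadAllToAll.apply(x, cp_group)` around attention, and the same pattern (a custom autograd Function whose backward is the inverse collective) applies to whichever collectives the file actually uses.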

@alexnwang

@vedantroy Following up on this -- wouldn't it just be a matter of adding DDP? Since the forward and backward passes fit fully on a single GPU, shouldn't sharding the data and averaging gradient updates be sufficient?
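
Something like the following minimal sketch is what I have in mind (not part of the Mochi repo; the model, dataset, and loss are stand-ins for the LoRA-wrapped DiT and the real data pipeline): each rank gets a disjoint data shard via `DistributedSampler`, runs forward/backward locally, and `DistributedDataParallel` averages gradients across ranks during backward.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun supplies RANK / WORLD_SIZE / LOCAL_RANK / MASTER_* env vars.
    dist.init_process_group("nccl")
    device = torch.device("cuda", int(os.environ["LOCAL_RANK"]))
    torch.cuda.set_device(device)

    # Stand-in for the LoRA-wrapped model; in the real tuner only the LoRA
    # parameters would require grad, and DDP only syncs those gradients.
    model = torch.nn.Linear(1024, 1024).to(device)
    model = DDP(model, device_ids=[device.index])

    # Stand-in dataset; DistributedSampler gives each rank a disjoint shard.
    dataset = TensorDataset(torch.randn(64, 1024), torch.randn(64, 1024))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=1, sampler=sampler)

    opt = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )

    for epoch in range(2):
        sampler.set_epoch(epoch)          # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()               # DDP all-reduces (averages) grads here
            opt.step()
            opt.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=<N> train_ddp.py`, this gives plain data parallelism with no change to the single-GPU forward/backward.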
