[Reland] Add Dynamic Model Import and ModelSpec Definition #837

fegin · 2025-02-12T07:48:09Z

`ghstack` didn't land #814 correctly, so this PR relands it. For the detailed discussion, please refer to #814.

What does this PR do?

This PR introduces `ModelSpec`, which describes a model and how to parallelize it.
- All models should call `register_model_spec()`.
- Users can also pass `--experimental.custom_model_path` to dynamically import a model that is not implemented in TorchTitan. That module should also call `register_model_spec()`.
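
The registration flow above can be sketched as a process-wide registry plus an importer for the custom model path. This is a minimal, hypothetical sketch: the `ModelSpec` fields, `get_model_spec()`, `import_custom_model()`, and the accepted path forms (a `.py` file, a package directory, or a dotted module name) are assumptions for illustration, not TorchTitan's actual API.

```python
import importlib
import importlib.util
import os
import sys
from dataclasses import dataclass, field
from types import ModuleType
from typing import Any, Callable, Dict, Optional


@dataclass
class ModelSpec:
    # Hypothetical, simplified fields; not TorchTitan's actual definition.
    name: str  # key the trainer uses to look the model up
    cls: type  # model class to instantiate
    config: Dict[str, Any] = field(default_factory=dict)  # flavor -> model args
    parallelize_fn: Optional[Callable[..., Any]] = None  # e.g. applies TP/FSDP
    pipelining_fn: Optional[Callable[..., Any]] = None  # e.g. splits into PP stages


# Process-wide registry filled in by register_model_spec().
_MODEL_SPECS: Dict[str, ModelSpec] = {}


def register_model_spec(spec: ModelSpec) -> None:
    """Record a spec under its name, rejecting duplicate names."""
    if spec.name in _MODEL_SPECS:
        raise ValueError(f"Model spec {spec.name!r} is already registered")
    _MODEL_SPECS[spec.name] = spec


def get_model_spec(name: str) -> ModelSpec:
    """Look up a previously registered spec by name."""
    if name not in _MODEL_SPECS:
        raise ValueError(f"Model {name!r} is not registered")
    return _MODEL_SPECS[name]


def import_custom_model(path: str) -> ModuleType:
    """Import a user module so that its top-level register_model_spec() runs.

    Accepts a .py file, a package directory with __init__.py, or a dotted
    module name (an assumption of this sketch).
    """
    path = os.path.expanduser(path)
    if os.path.isdir(path):
        module_name = os.path.basename(os.path.normpath(path))
        location = os.path.join(path, "__init__.py")
        if not os.path.isfile(location):
            raise ImportError(f"{path!r} is a directory without __init__.py")
    elif os.path.isfile(path):
        module_name = os.path.splitext(os.path.basename(path))[0]
        location = path
    else:
        # Not on disk: treat it as an importable dotted module name.
        return importlib.import_module(path)
    spec = importlib.util.spec_from_file_location(module_name, location)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import custom model from {path!r}")
    module = importlib.util.module_from_spec(spec)
    # Add to sys.modules before executing so package-relative imports resolve.
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(module_name, None)  # don't leave a half-initialized module
        raise
    return module
```

A custom model module would then build a `ModelSpec` and call `register_model_spec()` at import time, so importing it is enough to make the model available by name. Rejecting duplicate names is a design choice of this sketch that surfaces accidental double registration early.
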

This PR also refactors `OptimizersContainer` and `LRSchedulersContainers`:
- Fixes an issue where optimizers accepted parameters whose `requires_grad` is `False`.
- Improves typing and docstrings.
- Improves the reusability of functions and classes.
- `OptimizersContainer` now inherits from `torch.optim.Optimizer`.
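
The `requires_grad` fix and the new base class can be illustrated with a simplified container that builds one optimizer per model part (for example, per pipeline stage) and skips frozen parameters. This is a hypothetical sketch assuming a constructor that takes `model_parts`, `optimizer_cls`, and `optimizer_kwargs`; it is not TorchTitan's actual implementation, and a real container would also need to handle things omitted here, such as `state_dict()` for checkpointing.

```python
from typing import Any, Dict, List, Type

import torch
import torch.nn as nn
from torch.optim import Optimizer


class OptimizersContainer(Optimizer):
    """Hypothetical simplified container: one optimizer per model part."""

    def __init__(
        self,
        model_parts: List[nn.Module],
        optimizer_cls: Type[Optimizer],
        optimizer_kwargs: Dict[str, Any],
    ) -> None:
        self.model_parts = model_parts
        self.optimizers: List[Optimizer] = []
        all_params: List[torch.Tensor] = []
        for part in model_parts:
            # Only trainable parameters reach an optimizer; frozen ones are skipped.
            params = [p for p in part.parameters() if p.requires_grad]
            self.optimizers.append(optimizer_cls(params, **optimizer_kwargs))
            all_params.extend(params)
        # Initialize the base class so Optimizer attributes (param_groups,
        # defaults, hook dictionaries) exist on the container itself.
        Optimizer.__init__(self, all_params, optimizer_kwargs)

    def step(self, closure=None) -> None:
        # Delegate to each wrapped optimizer.
        for opt in self.optimizers:
            opt.step()

    def zero_grad(self, set_to_none: bool = True) -> None:
        for opt in self.optimizers:
            opt.zero_grad(set_to_none=set_to_none)
```

Subclassing `torch.optim.Optimizer` means the container passes `isinstance(obj, torch.optim.Optimizer)` checks and inherits the base class's hook registration methods such as `register_step_post_hook()`; in this sketch those hooks fire around the container's own `step()`.
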

This PR also moves `parallelize_llama` and `pipelining_llama` to the `llama` folder.

Why do we need this PR?

This allows users to use TorchTitan with a new model without intrusively changing TorchTitan's code.

Next steps

1. The dataloader is not included yet.
2. Checkpoint customization is not included yet.

[ghstack-poisoned]


fegin added 24 commits January 31, 2025 10:40

- df1bc6a Update
- 705c1f8 Update (base update)
- dfc1649 Update
- 720f12a Update
- 225bfcc Update
- 650152e Update
- 687fda9 Update
- 5cffd5a Update (base update)
- 6a51325 Update
- 5b33b65 Update
- 22ac944 Update (base update)
- 2e569d7 Update
- bab9bf5 Update
- 210707a Update
- 6fb1d74 Update
- 02c87b2 Update
- 4234a26 Update
- a5491da Update
- b5cd485 Update
- 078d4ad Update
- caf5b97 Update
- 5467f2b Update (base update)
- c131309 Update
- 2f4d1ce Update

pytorch-bot bot added the ci-no-td label Feb 12, 2025

facebook-github-bot added the CLA Signed label Feb 12, 2025

fegin requested review from tianyu-l and fduwjj February 12, 2025 07:55

fegin merged commit fb0a942 into main Feb 12, 2025
9 checks passed
