[Reland] Add Dynamic Model Import and ModelSpec Definition #837

Merged 24 commits on Feb 12, 2025


@fegin fegin commented Feb 12, 2025

ghstack didn't land #814 correctly, so this PR relands it. For the detailed discussion, please refer to #814.

What does this PR do?

  1. This PR introduces ModelSpec to describe a model and how to parallelize it.
    • All models should call register_model_spec().
    • Users can also use --experimental.custom_model_path to dynamically import a model that is not implemented by TorchTitan. The imported module should also call register_model_spec().
  2. This PR also refactors OptimizersContainer and LRSchedulersContainers.
    • Fixes an issue where optimizers accepted parameters whose requires_grad is False.
    • Improves typing and docstrings.
    • Improves function and class reusability.
    • OptimizersContainer now inherits from torch.optim.Optimizer.
  3. This PR also moves parallelize_llama and pipelining_llama to the llama folder.

Why do we need this PR?
This allows users to use TorchTitan with a new model without making intrusive changes to TorchTitan code.

Next steps

  1. Dataloader customization is not included yet.
  2. Checkpoint customization is not included yet.

fegin added 24 commits January 31, 2025 10:40
[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added the ci-no-td label Feb 12, 2025
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Feb 12, 2025
@fegin fegin requested review from tianyu-l and fduwjj February 12, 2025 07:55
@fegin fegin merged commit fb0a942 into main Feb 12, 2025
9 checks passed
garrett361 pushed a commit to garrett361/torchtitan that referenced this pull request Feb 12, 2025