Commit 392b4b0 ("update readme", parent a9b29db)

2 files changed: +253 −81 lines

CONTRIBUTING.md (new file, +224 lines)
<!---
Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# How to contribute to 🤗 Nanotron?

Everyone is welcome to contribute, and we value everybody's contribution. Code
is thus not the only way to help the community. Answering questions, helping
others, reaching out, and improving the documentation are immensely valuable to
the community.

It also helps us if you spread the word: reference the library from blog posts
on the awesome projects it made possible, shout out on Twitter every time it has
helped you, or simply star the repo to say "thank you".

Whichever way you choose to contribute, please be mindful to respect our
[code of conduct](CODE_OF_CONDUCT.md).
## You can contribute in so many ways!

Some of the ways you can contribute to nanotron:
* Fixing outstanding issues with the existing code;
* Contributing to the examples or to the documentation;
* Submitting issues related to bugs or desired new features.
## Submitting a new issue or feature request

Do your best to follow these guidelines when submitting an issue or a feature
request. It will make it easier for us to come back to you quickly and with good
feedback.
### Did you find a bug?

The 🤗 Nanotron library is robust and reliable thanks to the users who notify us of
the problems they encounter. So thank you for reporting an issue.

First, we would really appreciate it if you could **make sure the bug was not
already reported** (use the search bar on GitHub under Issues).

Did not find it? :( So we can act quickly on it, please follow these steps:

* Include your **OS type and version** and the versions of **Python** and **PyTorch**;
* Provide a short, self-contained code snippet that allows us to reproduce the bug in
  less than 30s;
* Provide the Nanotron configuration used for the run;
* Describe the expected behavior and the actual behavior.
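As a sketch of the first bullet, the following commands gather those environment details in one go (the `torch` line assumes PyTorch is importable by `python3`; the fallback just prints a hint):

```shell
# Collect the OS, Python, and PyTorch versions for a bug report.
uname -sr
python3 --version
python3 -c "import torch; print('PyTorch', torch.__version__)" 2>/dev/null \
  || echo "PyTorch: not installed"
```

Paste the output at the top of the issue, followed by the reproduction snippet and your config.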
### Do you want a new feature?

A good feature request addresses the following points:

1. Motivation first:
   * Is it related to a problem/frustration with the library? If so, please explain
     why. Providing a code snippet that demonstrates the problem is best.
   * Is it related to something you would need for a project? We'd love to hear
     about it!
   * Is it something you worked on and think could benefit the community?
     Awesome! Tell us what problem it solved for you.
2. Write a *full paragraph* describing the feature;
3. Provide a **code snippet** that demonstrates its future use;
4. In case this is related to a paper, please attach a link;
5. Attach any additional information (drawings, screenshots, etc.) you think may help.

If your issue is well written, we're already 80% of the way there by the time you
post it.
## Submitting a pull request (PR)

Before writing code, we strongly advise you to search through the existing PRs or
issues to make sure that nobody is already working on the same thing. If you are
unsure, it is always a good idea to open an issue to get some feedback.

You will need basic `git` proficiency to be able to contribute to
🤗 Nanotron. `git` is not the easiest tool to use but it has the greatest
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.

Follow these steps to start contributing:
1. Fork the [repository](https://github.com/huggingface/nanotron) by
   clicking on the 'Fork' button on the repository's page. This creates a copy of the code
   under your GitHub user account.

2. Clone your fork to your local disk, and add the base repository as a remote. The following commands
   assume you have your public SSH key uploaded to GitHub. See the GitHub guide on
   [cloning a repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository) for more information.

   ```bash
   $ git clone git@github.com:<your GitHub handle>/nanotron.git
   $ cd nanotron
   $ git remote add upstream https://github.com/huggingface/nanotron.git
   ```
3. Create a new branch to hold your development changes, and do this for every new PR you work on.

   Start by synchronizing your `main` branch with the `upstream/main` branch (more details in the [GitHub Docs](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/syncing-a-fork)):

   ```bash
   $ git checkout main
   $ git fetch upstream
   $ git merge upstream/main
   ```

   Once your `main` branch is synchronized, create a new branch from it:

   ```bash
   $ git checkout -b a-descriptive-name-for-my-changes
   ```

   **Do not** work on the `main` branch.
4. Set up a development environment by running the following commands in a conda or virtual environment you've created for working on this library:

   ```bash
   $ pip install -e ".[dev]"
   $ pip install -e ".[test]"
   $ pre-commit install
   ```

   (If nanotron was already installed in the virtual environment, remove
   it with `pip uninstall nanotron` before reinstalling it in editable
   mode with the `-e` flag.)

   Alternatively, if you are using [Visual Studio Code](https://code.visualstudio.com/Download), the fastest way to get set up is to use
   the provided Dev Container. Documentation on how to get started with dev containers is available [here](https://code.visualstudio.com/docs/remote/containers).
5. Develop the features on your branch.

   As you work on the features, you should make sure that the test suite
   passes. Run the tests impacted by your changes like this:

   ```bash
   $ pytest tests/<TEST_TO_RUN>.py
   ```

   `nanotron` relies on `ruff` to format its source code consistently. After you
   make changes, apply automatic style corrections and code verifications with
   `pre-commit`. The hooks installed earlier run automatically on the files each
   commit touches; to run all checks over the whole codebase in one go, use:

   ```bash
   $ pre-commit run --all-files
   ```
   Once you're happy with your changes, add the changed files using `git add` and
   make a commit with `git commit` to record your changes locally:

   ```bash
   $ git add modified_file.py
   $ git commit
   ```

   Please write [good commit messages](https://chris.beams.io/posts/git-commit/).

   It is a good idea to sync your copy of the code with the original
   repository regularly. This way you can quickly account for changes:

   ```bash
   $ git fetch upstream
   $ git rebase upstream/main
   ```

   Push the changes to your account using:

   ```bash
   $ git push -u origin a-descriptive-name-for-my-changes
   ```
6. Once you are satisfied (**and the checklist below is happy too**), go to the
   webpage of your fork on GitHub. Click on 'Pull request' to send your changes
   to the project maintainers for review.

7. It's OK if maintainers ask you for changes. It happens to core contributors
   too! So that everyone can see the changes in the pull request, work in your
   local branch and push the changes to your fork. They will automatically appear
   in the pull request.
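Before opening the PR, a quick sanity pass over the setup from steps 2 and 4 can save a review round-trip. This sketch only reads state, so it is safe to run from the repo root; the fallbacks merely print a hint when something is missing:

```shell
# Verify the remotes from step 2 and the editable install from step 4.
git remote -v 2>/dev/null || echo "not inside a git repository"
python3 -c "import nanotron; print('nanotron at', nanotron.__file__)" 2>/dev/null \
  || echo "nanotron is not importable - re-run: pip install -e ."
pre-commit --version 2>/dev/null || echo "pre-commit is not on PATH"
```

You should see both `origin` (your fork) and `upstream` (huggingface/nanotron) in the remote list.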
### Checklist

1. The title of your pull request should be a summary of its contribution;
2. If your pull request addresses an issue, please mention the issue number in
   the pull request description to make sure they are linked (and people
   consulting the issue know you are working on it);
3. To indicate a work in progress, please prefix the title with `[WIP]` or mark
   the PR as a draft. This helps avoid duplicated work and distinguishes it from
   PRs that are ready to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality testing = no merge.

See an example of a good PR here: https://github.com/huggingface/nanotron/pull/155
### Tests

An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/nanotron/tree/main/tests).

We use `pytest` to run the tests. From the root of the
repository, here's how to run them:

```bash
# Run all tests, 12 at a time in parallel
$ pytest -n 12 tests
```

You can specify a smaller set of tests in order to test only the feature
you're working on.
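As an illustration of the selection flags, here is a self-contained sketch: it creates a throwaway test file via `mktemp`, so nothing in the repo is touched (inside Nanotron you would point at real files under `tests/` instead):

```shell
# Hypothetical throwaway tests, not part of the nanotron suite.
tmpdir=$(mktemp -d)
cat > "$tmpdir/test_demo.py" <<'EOF'
def test_addition():
    assert 1 + 1 == 2

def test_subtraction():
    assert 2 - 1 == 1
EOF
python3 -m pytest -q "$tmpdir/test_demo.py"                 # the whole file
python3 -m pytest -q "$tmpdir/test_demo.py::test_addition"  # a single test
python3 -m pytest -q -k "add" "$tmpdir/test_demo.py"        # keyword filter
```

The `file::test_name` node-id form and the `-k` keyword filter compose with `-n` for parallel runs as well.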

README.md (+29 −81 lines)
````diff
@@ -23,99 +23,47 @@
 <h3 align="center">
 <a href="https://huggingface.co/nanotron"><img style="float: middle; padding: 10px 10px 10px 10px;" width="60" height="55" src="https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo.png" /></a>
 </h3>
+<h3 align="center">
+<p>Pretraining models made easy
+</h3>
 
 
+Nanotron is a library for pretraining transformer models. It provides a simple and flexible API to pretrain models on custom datasets. Nanotron is designed to be easy to use, fast, and scalable. It is built with the following principles in mind:
 
-#
-
-The objective of this library is to provide easy distributed primitives in order to train a variety of models efficiently using 3D parallelism. For more information about the internal design of the library or 3D parallelism in general, please check out [[docs.md]](./docs/docs.md) and [[3d_parallelism.md]](./docs/3d_parallelism.md).
-
-
-# Philosophy
-
-- Make it fast. At least as fast as other open source versions.
-- Make it minimal. We don't actually need to support all techniques and all versions of 3D parallelism. What matters is that we can efficiently use the "best" ones.
-- Make everything explicit instead of transparent. As we move forward, making things transparent works well when it works well but is a horrible debugging experience if one doesn't understand the implications of techniques used. In order to mitigate this, we choose to be explicit in the way it does things
-
-# Core Features
-
-We support the following:
-- 3D parallelism, including one-forward-one-backward pipeline engine
-- ZeRO-1 optimizer
-- FP32 gradient accumulation
-- Parameter tying/sharding
-- Spectral µTransfer parametrization for scaling up neural networks
-
-# Installation
+- **Simplicity**: Nanotron is designed to be easy to use. It provides a simple and flexible API to pretrain models on custom datasets.
+- **Performance**: Optimized for speed and scalability, Nanotron uses the latest techniques to train models faster and more efficiently.
 
-Requirements:
-- Python >= 3.10
-- PyTorch >= 2.0.0
-- Flash-Attention >= 2.5.0
+## Installation
 
-To install (in a new env):
 ```bash
-pip install torch
-pip install packaging; pip install "flash-attn>=2.5.0" --no-build-isolation
-pip install nanotron
+# Requirements: Python>=3.10
+git clone https://github.com/huggingface/nanotron
+cd nanotron
+pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
+pip install -e .
+
+# Install dependencies if you want to use the example scripts
+pip install datasets transformers
+pip install "flash-attn>=2.5.0" --no-build-isolation
 ```
+> [!NOTE]
+> If you get `undefined symbol: ncclCommRegister` error you should install torch 2.1.2 instead: `pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu121`
 
-Also nice to have: `pip install transformers datasets python-etcd tensorboardX`
-
-We also support a set of flavors that you can install using `pip install -e [$FLAVOR]`:
-- `dev`: Used is you are developping in `nanotron`. It installs in particular our linter mechanism. On top of that you have to run `pre-commit install` afterwards.
-- `test`: We use `pytest` in order to run out testing suite. In order to run tests in parallel, it will install `pytest-xdist`, which you can leverage by running `pytest -n 12 tests` (12 is the number of parallel test)
-
-
-# Quick examples
-
-In the `/examples` directory, you can find a few example configuration file, and a script to run it.
-
-You can run a sample training using:
-```bash
-torchrun --nproc_per_node=8 run_train.py --config-file examples/train_tiny_llama.sh
-```
+> [!TIP]
+> We log to wandb automatically if it's installed. For that you can use `pip install wandb`. If you don't want to use wandb, you can run `wandb disabled`.
 
-And run a sample generation using:
+## Quick Start
+### Training a tiny Llama model
+The following command will train a tiny Llama model on a single node with 8 GPUs. The model will be saved in the `checkpoints` directory as specified in the config file.
 ```bash
-torchrun --nproc_per_node=8 run_generation.py --ckpt-path checkpoints/text/4
+CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file examples/config_tiny_llama.yaml
 ```
 
-# Development guidelines
-
-If you plan on developing on `nanotron`, we suggest you install the `dev` flavor: `pip install -e ".[dev]"`
-
-We use pre-commit to run a bunch of callbacks on each commit, mostly normalization code in order for the codebase to stay consistent. Please do run `pre-commit install`.
-
-For the linting:
+### Run generation from your checkpoint
 ```bash
-pre-commit install
-pre-commit run --config .pre-commit-config.yaml --all-files
+torchrun --nproc_per_node=1 run_generate.py --ckpt-path checkpoints/10/ --pp 1 --tp 1
 ```
+> [!TIP]
+> We could set a larger TP for faster generation, and a larger PP in case of very large models.
 
-*As a part of making sure we aren't slowed down as the codebase grows, we will not merge a PR if the features it introduces do not have test coverage.*
-
-We have extensions built on top of Nanotron, with their tests located in the `/examples` folder. Since VSCode defaults to discovering tests only in the `/tests` folder, please run tests from both `/examples` and `/tests` to ensure your PR does not break these extensions. Please run `make tests` to execute all the nanotron tests and the tests in the `/examples` directory that you need to pass.
-
-Features we would like to add:
-- [ ] Support `torch.compile`
-- [ ] More optimized kernels
-- [ ] Support Zero3
-- [ ] Other PP schedules (such as Interleaved 1f1b...)
-- [ ] Ring attention / Sequence Parallelism
-- [ ] 3D Parallel MoEs
-- [ ] Supporting more architectures (Mamba..)
-- [ ] ...
-
-
-# Useful scripts
-- `scripts/log_lighteval_to_wandb.py`: logs the evaluation results of LightEval to wandb, including summary statistics.
-
-
-# Environment Variables
-- `NANOTRON_BENCHMARK=1`: if you want to log the throughput during training
-
-
-# Credits
-
-We would like to thank everyone working on LLMs, especially those sharing their work openly from which we took great inspiration: Nvidia for `Megatron-LM/apex`, Microsoft for `DeepSpeed`, HazyResearch for `flash-attn`
+## Config file description
````
