Commit
First release readme (pytorch#227)
Reworked readme to highlight the first release and feature set.
Question: use our logo? (I think it adds some spark.)

Visual preview:
<img width="898" alt="Screenshot 2024-04-14 at 7 02 39 PM"
src="https://github.com/pytorch/torchtitan/assets/46302957/60b4b6a8-c4f3-41a8-8d8d-27b924f8de15">
lessw2020 authored Apr 16, 2024
1 parent f86bfb2 commit a10262a
Showing 3 changed files with 39 additions and 11 deletions.
45 changes: 36 additions & 9 deletions README.md
@@ -1,18 +1,45 @@
# torchtitan
<p align="center">
<picture>
<source media="(prefers-color-scheme: light)" srcset="https://github.com/lessw2020/TorchTitan/blob/1ab9828ae6aa0e6508d9a7002d743d96d85e8599/assets/images/TorchTitan_logo_main.jpg">
<img alt="TorchTitan_Logo" width=35%>
</picture>
</p>

Note: This repository is currently under heavy development.
## torchtitan is still in pre-release!
`torchtitan` is currently in a pre-release state and under extensive development.

`torchtitan` is a proof-of-concept for Large-scale LLM training using native PyTorch. It is (and will continue to be) a repo to showcase PyTorch's latest distributed training features in a clean, minimal codebase. torchtitan is complementary to and not a replacement for any of the great large-scale LLM training codebases such as Megatron, Megablocks, LLM Foundry, Deepspeed, etc. Instead, we hope that the features showcased in torchtitan will be adopted by these codebases quickly. torchtitan is unlikely to ever grow a large community around it.
`torchtitan` is a native PyTorch reference architecture showcasing some of the latest PyTorch techniques for large-scale model training.
* Designed to be easy to understand, use and extend for different training purposes.
* Minimal changes to the model code when applying 1D, 2D, or (soon) 3D Parallel.
* Modular components instead of monolithic codebase.
* Get started in minutes, not hours!

## Design Principles
Please note: `torchtitan` is a proof-of-concept for Large-scale LLM training using native PyTorch. It is (and will continue to be) a repo to showcase PyTorch's latest distributed training features in a clean, minimal codebase. torchtitan is complementary to and not a replacement for any of the great large-scale LLM training codebases such as Megatron, Megablocks, LLM Foundry, Deepspeed, etc. Instead, we hope that the features showcased in torchtitan will be adopted by these codebases quickly. torchtitan is unlikely to ever grow a large community around it.

While torchtitan utilizes the PyTorch ecosystem for things like data loading (e.g., HuggingFace datasets), the core functionality is written in PyTorch.
## Pre-Release Updates:
#### (4/16/2024): TorchTitan is now public, but in a pre-release state and under development. Currently, we showcase pre-training Llama2 models (LLMs) of various sizes from scratch.

Key features available:<br>
1 - [FSDP2 (per-param sharding)](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md) <br>
2 - Tensor Parallel (FSDP + Tensor Parallel) <br>
3 - Selective layer and op activation checkpointing <br>
4 - Distributed checkpointing (async pending) <br>
5 - 3 datasets pre-configured (47K - 144M) <br>
6 - GPU usage, MFU, tokens per second, and other metrics are all reported and displayed via TensorBoard. <br>
7 - Optional fused RMSNorm, learning rate scheduler, meta init, and more. <br>
8 - All options easily configured via TOML files. <br>


## Coming soon:
1 - Async checkpointing <br>
2 - FP8 support <br>
3 - Context Parallel <br>
4 - 3D parallelism (Pipeline Parallel) <br>
5 - `torch.compile` support <br>

* Designed to be easy to understand, use and extend for different training purposes.
* Minimal changes to the model code, when applying 1D/2D or 3D Parallelisms.
* Modular components instead of monolithic codebase

# Installation
## Installation

Install PyTorch from source or install the latest PyTorch nightly, then install the requirements.
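As a rough illustration (the CUDA wheel index URL and the `requirements.txt` path below are assumptions, not taken from this commit), the sequence might look like:

```bash
# install a recent PyTorch nightly build (CUDA 12.1 wheel used here only as an example)
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121

# install the repo's remaining Python dependencies (assumes a requirements.txt at the repo root)
pip3 install -r requirements.txt
```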

@@ -31,7 +58,7 @@ run the llama debug model locally to verify the setup is correct:
./run_llama_train.sh
```

# TensorBoard
## TensorBoard

To visualize TensorBoard metrics of models trained on a remote server via a local web browser:
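One common pattern (the user, host, and log directory below are illustrative placeholders, not values from this repo) is to forward the TensorBoard port over SSH and open it in a local browser:

```bash
# on the local machine: forward local port 6006 to port 6006 on the training host
ssh -L 6006:127.0.0.1:6006 <user>@<remote-host>

# on the remote host: serve the TensorBoard log directory written during training
tensorboard --logdir <path-to-tb-logs> --port 6006

# then open http://localhost:6006 in the local browser
```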

1 change: 1 addition & 0 deletions assets/images/readme.md
@@ -0,0 +1 @@
images folder for main repo
4 changes: 2 additions & 2 deletions train.py
@@ -390,8 +390,8 @@ def loss_fn(pred, labels):
)

if torch.distributed.get_rank() == 0:
    logger.info("Sleeping 1 second for other ranks to complete")
    time.sleep(1)
    logger.info("Sleeping for 2 seconds for other ranks to complete")
    time.sleep(2)

metric_logger.close()
logger.info("Training completed")
