From 7c3a1fac24ec5f0ce31b2b801d28ed22299f2200 Mon Sep 17 00:00:00 2001
From: Less Wright
Date: Mon, 14 Oct 2024 17:21:10 -0700
Subject: [PATCH] add arxiv badge and link to torchtitan paper (#618)

This PR:
1 - adds an arXiv badge underneath the two CI badges that links to the torchtitan arXiv paper.
2 - adds a two-sentence section in the main body summarizing our paper, with a link.
3 - makes a minor grammar correction regarding using the latest nightly.
---
 README.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 48ac1b25..51ae1bf9 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 
 # torchtitan
 
-`torchtitan` is currently in a pre-release state and under extensive development. Currently we showcase pre-training **Llama 3.1**, **Llama 3**, and **Llama 2** LLMs of various sizes from scratch. To use the latest features of `torchtitan`, we recommend latest PyTorch nightly.
+`torchtitan` is currently in a pre-release state and under extensive development. Currently we showcase pre-training **Llama 3.1**, **Llama 3**, and **Llama 2** LLMs of various sizes from scratch. To use the latest features of `torchtitan`, we recommend using the most recent PyTorch nightly.
 
 `torchtitan` is a proof-of-concept for Large-scale LLM training using native PyTorch. It is (and will continue to be) a repo to showcase PyTorch's latest distributed training features in a clean, minimal codebase. torchtitan is complementary to and not a replacement for any of the great large-scale LLM training codebases such as Megatron, Megablocks, LLM Foundry, Deepspeed, etc. Instead, we hope that the features showcased in torchtitan will be adopted by these codebases quickly. torchtitan is unlikely to ever grow a large community around it.
 
@@ -18,6 +18,12 @@ Our guiding principles when building `torchtitan`:
 
 [![Welcome to torchtitan!](assets/images/titan_play_video.png)](https://youtu.be/ee5DOEqD35I?si=_B94PbVv0V5ZnNKE "Welcome to torchtitan!")
 
+### Our torchtitan paper on arXiv
+
+[![arXiv](https://img.shields.io/badge/arXiv-2410.06511-b31b1b.svg?style=plastic)](https://arxiv.org/abs/2410.06511)
+
+We provide a detailed look into the parallelisms and optimizations available in `torchtitan`, along with summary advice on when to use various techniques: [TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training](https://arxiv.org/abs/2410.06511)
+
 ### Dive into the code
 
 You may want to see how the model is defined or how parallelism techniques are applied. For a guided tour, see these files first: