Commit 7042d7a
TE Gemma tutorial attempt#2 (#1839)
* add tutorial files and other local changes
Signed-off-by: Sudhakar Singh <[email protected]>
* remove extraneous code for easy debugging
Signed-off-by: Sudhakar Singh <[email protected]>
* make cuda graphs work with non-paged and paged attention
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* perf improvement for kv cache ops
Signed-off-by: Sudhakar Singh <[email protected]>
* add code for calibration
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* optimize kv_cache reindex and copy kernels
Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* changes to make quantizers work with fp8_calibration
Signed-off-by: Sudhakar Singh <[email protected]>
* avoid reindexing from python side
Signed-off-by: Charlene Yang <[email protected]>
* rename variable from previous commit
Signed-off-by: Charlene Yang <[email protected]>
* minor fix
Signed-off-by: Charlene Yang <[email protected]>
* minor fix
Signed-off-by: Charlene Yang <[email protected]>
* use quantizer only if needed
Signed-off-by: Sudhakar Singh <[email protected]>
* functionality of the tutorial tested and perf checked
Signed-off-by: Sudhakar Singh <[email protected]>
* remove files and update headers/licenses
Signed-off-by: Sudhakar Singh <[email protected]>
* update header/license
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update tutorial for review
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* make weights downloadable on the fly; remove extra print statements
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix lint and update comments
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add comma back, typo
Signed-off-by: Sudhakar Singh <[email protected]>
* sequence_start_positions should be None for training
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add paged attention numbers and update requirements.txt file
Signed-off-by: Sudhakar Singh <[email protected]>
* more fixes
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* make tutorial work on blackwell
Signed-off-by: Sudhakar Singh <[email protected]>
* remove gemma FT tutorial for now
Signed-off-by: Sudhakar Singh <[email protected]>
* fixing the headings placement and rewording attention -> kv caching
Signed-off-by: Sudhakar Singh <[email protected]>
* fixes from comments
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix the images
Signed-off-by: Sudhakar Singh <[email protected]>
* misc fixes
Signed-off-by: Sudhakar Singh <[email protected]>
* add more comments to te_gemma.py and cleanup utils.py
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add more information about the hierarchy of the classes used in the tutorial
Signed-off-by: Sudhakar Singh <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add better cuda graphs picture
Signed-off-by: Sudhakar Singh <[email protected]>
* add updated cuda graphs pictures
Signed-off-by: Sudhakar Singh <[email protected]>
* add illustrated cuda graphs
Signed-off-by: Sudhakar Singh <[email protected]>
* fix
Signed-off-by: Sudhakar Singh <[email protected]>
* small fixes in documentation
Signed-off-by: Sudhakar Singh <[email protected]>
* add torch.no_grad() to force reduced memory usage
Signed-off-by: Sudhakar Singh <[email protected]>
* some fixes from recent comments
Signed-off-by: Sudhakar Singh <[email protected]>
* more fixes from remaining comments
Signed-off-by: Sudhakar Singh <[email protected]>
* add te_rope_emb to class desc
Signed-off-by: Sudhakar Singh <[email protected]>
* fix tutorial wording; add calibration fix to grouped_linear.py
Signed-off-by: Sudhakar Singh <[email protected]>
---------
Signed-off-by: Sudhakar Singh <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <[email protected]>
1 parent ba37529 commit 7042d7a
File tree
23 files changed, +5152 -33 lines changed
- docs
  - examples
    - te_gemma
      - media
    - te_llama
- transformer_engine/pytorch
  - attention
  - csrc/extensions
  - module