This repository contains the code, experiments, and report for a study of the Liger Kernel and its application to memory-efficient training of Large Language Models (LLMs). The report analyzes the Liger Kernel's techniques, including fused kernel operations and chunking strategies, and demonstrates their impact on memory usage and throughput.
```
.
├── .gitignore                  # Git ignore file
├── all_gpu_liger_kernel.py     # Script for the single-GPU Liger Kernel experiment
├── deepspeed_liger_kernel.py   # Script for the DeepSpeed experiment: CPU offloading, CPU Adam, gradient checkpointing, and Liger Kernel
├── drawing.py                  # Script for generating comparison plots
├── requirements.txt            # Python package requirements
├── README.md                   # This README file
├── run_ds.sh                   # Shell script for running the DeepSpeed experiment
├── src                         # Source code directory for the DeepSpeed experiment
│   ├── __init__.py
│   ├── ds_utils.py
│   ├── model_utils.py
│   ├── optimizer_utils.py
│   └── utils.py
└── doc                         # Report and slides directory
    ├── image                       # Images used in the report
    ├── report.md                   # Report in Markdown format
    ├── group_10_midterm.docx       # Report in Word format
    ├── group_10_midterm.pdf        # Report in PDF format
    ├── group_10_liger_kernel.pptx  # Presentation slides
    └── group_10_liger_kernel.pdf   # Presentation slides in PDF format
```
- **Python environment**: Ensure you have a Python environment with the required packages installed. You can create a virtual environment and install the dependencies using:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```
- **Hugging Face Hub**: You'll need to be logged in to the Hugging Face Hub to download the Llama model. Follow the instructions at huggingface.co to create an account and obtain an access token, then log in:

  ```bash
  huggingface-cli login
  ```
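Alternatively, you can authenticate from Python with the `login` helper from `huggingface_hub`; a minimal sketch (the token shown is a placeholder for your own):

```python
from huggingface_hub import login

# Token is a placeholder; get yours from huggingface.co/settings/tokens,
# or set the HF_TOKEN environment variable instead of calling login().
login(token="hf_...")
```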
To run the single-GPU experiment using the Liger Kernel:

```bash
python all_gpu_liger_kernel.py
```
This script runs a fake training iteration with the Liger Kernel enabled and outputs peak memory usage, iteration latency, and throughput. It also saves a memory snapshot. Modify the `ENABLE_ROPE`, `ENABLE_SWIGLU`, `ENABLE_RMS`, and `ENABLE_FLCE` variables in the script to control which Liger Kernel optimizations are applied.
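The sketch below shows one plausible way such flags could map onto Liger Kernel's model-patching API. `apply_liger_kernel_to_llama` is the library's real entry point, but the exact wiring inside `all_gpu_liger_kernel.py` is an assumption here, not a copy of the script:

```python
# Sketch: mapping the ENABLE_* flags onto Liger Kernel's Llama patches.
# The flag names come from all_gpu_liger_kernel.py; the mapping is assumed.
from liger_kernel.transformers import apply_liger_kernel_to_llama

ENABLE_ROPE = True    # fused rotary position embedding
ENABLE_SWIGLU = True  # fused SwiGLU MLP
ENABLE_RMS = True     # fused RMSNorm
ENABLE_FLCE = True    # fused linear + cross-entropy (chunked logits)

# Patches the Hugging Face Llama modules in place; call before loading the model.
apply_liger_kernel_to_llama(
    rope=ENABLE_ROPE,
    swiglu=ENABLE_SWIGLU,
    rms_norm=ENABLE_RMS,
    fused_linear_cross_entropy=ENABLE_FLCE,
)
```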
This experiment uses the DeepSpeed library to offload all static memory to CPU memory, and also uses CPU Adam to move the optimizer operations to the CPU. It additionally applies gradient checkpointing to reduce activation memory usage. With all of these optimizations in place, the incremental impact of the Liger Kernel can be observed.
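For orientation, this combination roughly corresponds to a DeepSpeed configuration along the lines of the sketch below. All values are illustrative assumptions; the actual settings are produced by `run_ds.sh` and the helpers in `src/`:

```python
# Sketch of a DeepSpeed config for ZeRO with CPU offload of parameters and
# optimizer state (the CPU optimizer offload selects DeepSpeedCPUAdam).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
    "bf16": {"enabled": True},
}

# Typical usage (names are placeholders):
# model_engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config
# )
# model.gradient_checkpointing_enable()  # Hugging Face activation checkpointing
```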
To run the experiment:
- Modify the parameters in `run_ds.sh`, such as `NUM_GPUS`, `PER_DEVICE_TRAIN_BATCH_SIZE`, `MAX_SEQ_LENGTH`, etc., to adjust the experiment settings.
- Run the script:

  ```bash
  bash run_ds.sh
  ```
This script runs a fake training iteration using DeepSpeed and the Liger Kernel. The results (peak memory, throughput, latency) are printed to the console and saved to `results/<snapshot_name>.txt`. A memory snapshot is also saved to the `snap_shots` directory, and TensorBoard profiling logs are written to the `logs` directory.
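For reference, these three numbers can be measured with PyTorch's built-in counters; the sketch below is a generic pattern, not the scripts' exact instrumentation (`model`, `batch`, and `optimizer` are placeholders):

```python
import time
import torch

def measure_step(model, batch, optimizer):
    """Run one training step and report peak memory, latency, and throughput."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()

    loss = model(**batch).loss  # one fake training iteration
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    torch.cuda.synchronize()
    latency = time.perf_counter() - start
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    tokens_per_s = batch["input_ids"].numel() / latency
    print(f"peak memory: {peak_gib:.2f} GiB | latency: {latency:.3f} s | "
          f"throughput: {tokens_per_s:.0f} tokens/s")
```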
To visualize the profiler logs, use TensorBoard:

```bash
tensorboard --logdir=logs/
```
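Traces in this format are typically produced with `torch.profiler`; a minimal sketch, where the `train_step` callable and the `logs` path are assumptions:

```python
# Sketch: writing TensorBoard-readable traces with torch.profiler.
from torch.profiler import ProfilerActivity, profile, tensorboard_trace_handler

def profile_step(train_step, logdir="logs"):
    """Trace one training step into a TensorBoard log directory."""
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        on_trace_ready=tensorboard_trace_handler(logdir),
        profile_memory=True,
    ) as prof:
        train_step()
        prof.step()
```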
Use the PyTorch memory visualizer, as described at pytorch.org/memory_viz, to analyze the memory snapshots saved in the `snap_shots` directory.
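Snapshots in this format are usually captured with PyTorch's (private but documented) memory-history API; a minimal sketch, where the `run_iteration` callable and the output filename are assumptions:

```python
import torch

def capture_snapshot(run_iteration, path="snap_shots/snapshot.pickle"):
    """Record CUDA allocations during one iteration and dump a snapshot."""
    torch.cuda.memory._record_memory_history(max_entries=100_000)
    run_iteration()
    torch.cuda.memory._dump_snapshot(path)  # open at pytorch.org/memory_viz
    torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
```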
The script `drawing.py` generates comparison plots of throughput and peak memory usage across strategies (e.g., with and without the Liger Kernel). The data for these plots are hardcoded within `drawing.py`; you'll need to update these values with the results of your own experiments to generate accurate comparisons. Run the script:

```bash
python drawing.py
```

The plots will be saved in the current directory as PNG files.
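For illustration, a minimal sketch of this kind of comparison plot; the numbers below are placeholders, not measured results:

```python
import matplotlib.pyplot as plt

strategies = ["baseline", "+ Liger Kernel"]
peak_mem_gib = [40.0, 28.0]      # placeholder values; use your own measurements
throughput_tok_s = [1500, 1900]  # placeholder values; use your own measurements

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(strategies, peak_mem_gib)
ax1.set_ylabel("Peak memory (GiB)")
ax2.bar(strategies, throughput_tok_s)
ax2.set_ylabel("Throughput (tokens/s)")
fig.tight_layout()
fig.savefig("comparison.png")
```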
The `doc` directory contains the report (`report.md`, `group_10_midterm.docx`, `group_10_midterm.pdf`) and the presentation slides (`group_10_liger_kernel.pptx`, `group_10_liger_kernel.pdf`) summarizing the study on the Liger Kernel. For further details on the study, please refer to these documents.