# Distributed Multi‑GPU LLM Fine‑Tuning Projects

This repository showcases hands-on projects that use distributed multi-GPU training to fine-tune large language models (LLMs), covering PyTorch Distributed, DeepSpeed, Ray (Train and Tune), and MosaicML's LLM Foundry. Each project includes detailed experiment tracking, evaluation, and final model weights.

## Projects Overview

| Project | Framework / Tool | Model | Hardware | Experiment Tracking | Resources |
| --- | --- | --- | --- | --- | --- |
| PyTorch DDP Multi-GPU Training | PyTorch DDP | Qwen2-0.5B-Instruct | 2×T4 16GB | MLflow | |
| PyTorch FSDP Multi-GPU Training | PyTorch FSDP | OPT-1.3B | 2×T4 16GB | W&B | |
| DeepSpeed ZeRO-2 Offload Training | DeepSpeed ZeRO-2 Offload | Llama-3.2-1B-Instruct | 1×P100 16GB[^1] | W&B | |
| DeepSpeed Pipeline Parallelism | DeepSpeed Pipeline + ZeRO-1 | Llama-3.2-1B-Instruct | 2×T4 16GB | W&B | |
| LLM Foundry FSDP Fine-tuning | MosaicML's LLM Foundry, FSDP | OPT-1.3B | 2×T4 16GB | W&B | |
| Ray Train with DeepSpeed ZeRO-3 | Ray Train, DeepSpeed ZeRO-3 | BLOOMZ-1b1 | 2×T4 16GB | W&B | |
| Ray Tune Hyperparameter Optimization | Ray Tune, PyTorch | Qwen2-0.5B-Instruct | 2×T4 16GB | W&B | |

Most experiments were run on Kaggle with 2×T4 16GB GPUs.
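
For orientation, the sketch below shows roughly what a minimal single-node DDP fine-tuning script for the first project in the table might look like. It is an illustrative sketch, not the project's actual code: the launch command, dummy dataset, batch size, and learning rate are assumptions; only the model name and the 2-GPU setup come from the table.

```python
# Minimal PyTorch DDP fine-tuning sketch (assumptions noted in comments).
# Launch with: torchrun --nproc_per_node=2 train_ddp.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset
from transformers import AutoModelForCausalLM, AutoTokenizer


def main():
    dist.init_process_group(backend="nccl")        # one process per GPU, spawned by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)

    model_name = "Qwen/Qwen2-0.5B-Instruct"        # model listed in the table above
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])    # gradient all-reduce across both T4s

    # Dummy data stands in for the project's real fine-tuning dataset (an assumption).
    enc = tokenizer(["Hello, distributed world."] * 64, return_tensors="pt", padding=True)
    dataset = TensorDataset(enc["input_ids"], enc["attention_mask"])
    sampler = DistributedSampler(dataset)          # each rank trains on a disjoint shard
    loader = DataLoader(dataset, batch_size=2, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for epoch in range(1):
        sampler.set_epoch(epoch)                   # reshuffle shards each epoch
        for input_ids, attention_mask in loader:
            input_ids = input_ids.cuda(local_rank)
            attention_mask = attention_mask.cuda(local_rank)
            out = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids)
            out.loss.backward()                    # DDP syncs gradients during backward
            optimizer.step()
            optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```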

[^1]: DeepSpeed ZeRO-2 offload peaked at ~37 GB of CPU RAM, exceeding Kaggle’s 30 GB CPU RAM limit, so this project was run on Vast.ai.
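
The footnote reflects how ZeRO-2 optimizer offload trades GPU memory for host memory. Below is a minimal, hedged sketch of that kind of configuration; the Hugging Face model id, batch sizes, precision, and optimizer settings are assumptions rather than the project's actual config.

```python
# DeepSpeed ZeRO-2 CPU-offload sketch (illustrative; real project config may differ).
# Launch with: deepspeed train_zero2.py
import deepspeed
from transformers import AutoModelForCausalLM

# With offload_optimizer set to "cpu", ZeRO-2 keeps optimizer state (including fp32
# master weights) in host memory, which is how CPU RAM usage can reach ~37 GB even
# though the GPU is a single 16 GB P100.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,    # assumed value
    "gradient_accumulation_steps": 8,       # assumed value
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                         # ZeRO-2: partition optimizer state and gradients
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
}

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# In the training loop, engine.backward(loss) and engine.step() replace the usual
# loss.backward() / optimizer.step() pair:
#   loss = engine(input_ids=batch, labels=batch).loss
#   engine.backward(loss)
#   engine.step()
```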
