This repository showcases hands-on projects leveraging distributed multi-GPU training to fine-tune large language models (LLMs), demonstrating expertise in PyTorch Distributed, DeepSpeed, Ray (Tune, Train), and MosaicML's LLM Foundry. Each project includes detailed experiment tracking, evaluation, and final model weights.
| Project | Framework / Tool | Model | Hardware | Experiment Tracking | Resources |
|---|---|---|---|---|---|
| PyTorch DDP Multi-GPU Training | PyTorch DDP | Qwen2-0.5B-Instruct | 2×T4 16GB | MLflow | |
| PyTorch FSDP Multi-GPU Training | PyTorch FSDP | OPT-1.3B | 2×T4 16GB | W&B | |
| DeepSpeed ZeRO-2 Offload Training | DeepSpeed ZeRO-2 Offload | Llama-3.2-1B-Instruct | 1×P100 16GB[^1] | W&B | |
| DeepSpeed Pipeline Parallelism | DeepSpeed Pipeline + ZeRO-1 | Llama-3.2-1B-Instruct | 2×T4 16GB | W&B | |
| LLM Foundry FSDP Fine-tuning | MosaicML's LLM Foundry, FSDP | OPT-1.3B | 2×T4 16GB | W&B | |
| Ray Train with DeepSpeed ZeRO-3 | Ray Train, DeepSpeed ZeRO-3 | BLOOMZ-1b1 | 2×T4 16GB | W&B | |
| Ray Tune Hyperparameter Optimization | Ray Tune, PyTorch | Qwen2-0.5B-Instruct | 2×T4 16GB | W&B | |
Most experiments were run on Kaggle with 2×T4 16GB GPUs.
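
To give a flavour of the multi-GPU setup used in the first project, here is a minimal PyTorch DDP fine-tuning sketch. It is illustrative only, not the repository's actual training script: the model ID matches the table, but the tiny in-memory dataset and hyperparameters are placeholders.

```python
# train_ddp.py -- minimal DDP sketch (illustrative; not the repo's exact script).
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset
from transformers import AutoModelForCausalLM, AutoTokenizer


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model_name = "Qwen/Qwen2-0.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across GPUs

    # Toy dataset standing in for the real fine-tuning data
    texts = ["Hello, distributed world!"] * 128
    enc = tokenizer(texts, padding="max_length", truncation=True,
                    max_length=32, return_tensors="pt")
    dataset = TensorDataset(enc["input_ids"], enc["attention_mask"])
    sampler = DistributedSampler(dataset)  # shards batches across the 2 GPUs
    loader = DataLoader(dataset, batch_size=4, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    for epoch in range(1):
        sampler.set_epoch(epoch)  # reshuffle shards every epoch
        for input_ids, attention_mask in loader:
            input_ids = input_ids.cuda(local_rank)
            attention_mask = attention_mask.cuda(local_rank)
            out = model(input_ids=input_ids, attention_mask=attention_mask,
                        labels=input_ids)
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

A script like this would be launched with one process per GPU, e.g. `torchrun --nproc_per_node=2 train_ddp.py`.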
[^1]: DeepSpeed ZeRO-2 offload peaked at ~37 GB CPU RAM, exceeding Kaggle's 30 GB CPU RAM limit, so the project was run on Vast.ai.
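
For context on that footnote, the sketch below shows what a ZeRO-2 CPU-offload configuration typically looks like. The values are placeholders rather than the project's actual settings; moving optimizer state into host memory via `offload_optimizer` is what drives CPU RAM usage well past Kaggle's limit.

```python
# Illustrative DeepSpeed ZeRO-2 CPU-offload setup (placeholder values, not the
# project's exact config). fp16 is used because the P100 does not support bf16.
import deepspeed
from transformers import AutoModelForCausalLM

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Optimizer state lives in host RAM instead of GPU memory
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```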