# ⚡️ A100 CUDA 12.8 — GRPO Training Stack (PyTorch + TRL + vLLM + FlashAttention + W&B)
This guide assumes **Ubuntu 22.04+**, a working **CUDA 12.8** driver/toolkit, and an **A100** GPU.
The same steps work on H100 & Blackwell cards; adjust compute capability flags if you build from source.
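If you do need to adjust those flags, the compute capability can be read straight from the driver (an A100 reports 8.0; the `compute_cap` query field requires a reasonably recent driver), and `TORCH_CUDA_ARCH_LIST` is the usual knob for PyTorch-ecosystem source builds:

```bash
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
# e.g. before a source build: export TORCH_CUDA_ARCH_LIST="8.0"
```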
---
## 0 Prerequisites
```bash
# Sanity-check that Ubuntu sees the GPU & CUDA 12.8
nvidia-smi      # should list the A100 + a driver built against CUDA 12.8
nvcc --version  # CUDA compilation tools, release 12.8
```

---

## 1 System packages

```bash
sudo apt update && sudo apt -y full-upgrade

# Core build chain + Python headers (needed for PyTorch / FlashAttention wheels)
sudo apt install -y build-essential pkg-config git curl wget ca-certificates \
    libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev \
    libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev \
    liblzma-dev ninja-build

# Python headers + venv support
sudo apt install -y python3-dev python3.12-dev python3.12-venv
```

We already have Python 3.12.3 system-wide, but a tool like pyenv lets us isolate project envs and pin micro-versions as needed.
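If you do want pyenv for micro-version pinning, a minimal sketch (the installer script is pyenv's own; you still need to add the init lines it prints to your shell rc):

```bash
curl -fsSL https://pyenv.run | bash   # installs pyenv under ~/.pyenv
# add the printed init lines to ~/.bashrc, then restart the shell
pyenv install 3.12.3
pyenv local 3.12.3                    # writes .python-version for this repo
```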
---

## 2 Python virtual environment

```bash
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip wheel setuptools
```

---

## 3 PyTorch nightly (cu128)

```bash
pip install --pre torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/nightly/cu128  # cu128 wheels
```

Official cu128 wheels are distributed via the PyTorch nightly index.
Verify that the wheels see the GPU:

```bash
python - <<'PY'
import torch
print("PyTorch", torch.__version__, "CUDA", torch.version.cuda)
print("GPU OK:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0))
PY
```

---

## 4 TRL + Hugging Face stack

```bash
pip install trl transformers datasets evaluate accelerate
```

(The current TRL 0.19.x line supports GRPO, DPO, reward modeling, and more.)
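Optional: confirm which versions landed in the venv:

```bash
python -c "import trl, transformers; print('TRL', trl.__version__, '| Transformers', transformers.__version__)"
```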
---

## 5 vLLM

```bash
# Pull the wheel compiled against cu128 and the PyTorch you already installed
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
```

vLLM ships wheels built with CUDA 12.8, so no manual compile is needed.
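A quick import smoke test; note that current TRL releases can also offload GRPO rollout generation to vLLM (via `use_vllm=True` in `GRPOConfig`, but check the docs for your version):

```bash
python -c "import vllm; print('vLLM', vllm.__version__)"
```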
---

## 6 FlashAttention

Flash‑Attn requires a short source build to link against your local CUDA toolkit (2.8+ targets CUDA ≥ 12.3, with 12.8 recommended for best perf):

```bash
pip install flash-attn --no-build-isolation  # fastest path
# or, if you need FlashAttention-3 for Hopper/H100:
# git clone https://github.com/Dao-AILab/flash-attention.git && cd flash-attention
# python setup.py install
```
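Once the build finishes, a quick import check; to actually use it, opt in per model with Transformers' `attn_implementation` argument:

```bash
python - <<'PY'
import flash_attn
print("flash-attn", flash_attn.__version__)

# Usage sketch: opt in at load time, e.g.
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "facebook/opt-1.3b", attn_implementation="flash_attention_2",
# )
PY
```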
---

## 7 Liger Kernel

Then let's add Liger Kernel for some extra speed and memory savings:

```bash
pip install liger-kernel
```
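Liger's fused kernels can be switched on through the Hugging Face trainer stack; a minimal sketch, assuming a Transformers version that exposes the `use_liger_kernel` flag and a Liger-supported model architecture:

```bash
python - <<'PY'
# Sketch: TRL configs inherit from transformers.TrainingArguments,
# so the use_liger_kernel flag should flow through to GRPO runs too.
from trl import GRPOConfig

args = GRPOConfig(output_dir="out", use_liger_kernel=True)
print("Liger enabled:", args.use_liger_kernel)
PY
```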
---

## 8 Weights & Biases

```bash
pip install wandb
wandb login  # paste your API key
```

Lightweight install, no GPU code.
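You can also route runs to the right place up front via environment variables (the values below are examples, not defaults):

```bash
export WANDB_PROJECT="grpo-a100"   # hypothetical project name
export WANDB_ENTITY="your-team"    # your W&B username or team
```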
---

## 9 Data tooling

```bash
pip install pandas pyarrow numpy matplotlib seaborn tqdm rich
```

These cover Parquet IO (pyarrow), general data munging (pandas), and quick exploratory plots (matplotlib, seaborn) without clashing with the deep-learning stack.
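A quick Parquet round-trip to confirm pandas and pyarrow work together (the file path is arbitrary):

```bash
python - <<'PY'
import pandas as pd

df = pd.DataFrame({"prompt": ["hello", "world"], "score": [0.1, 0.9]})
df.to_parquet("/tmp/sanity.parquet")           # pyarrow engine by default
print(pd.read_parquet("/tmp/sanity.parquet"))  # should round-trip cleanly
PY
```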
---

## 10 Launch a GRPO run

🎉 You’re ready to launch GRPO runs using TRL. Want to know more? https://huggingface.co/docs/trl/main/en/grpo_trainer
A GRPO run is driven by a short Python script built around `GRPOTrainer` (see the sketch below). For distributed multi-GPU launches on the A100, wrap the same script with `accelerate launch` or a DeepSpeed config as needed.
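A minimal sketch, assuming the public `trl-lib/tldr` prompt dataset and a toy length-based reward; `facebook/opt-1.3b`, the logging cadence, and the W&B reporting mirror the settings used in this guide, and the reward function is a placeholder you would swap for your own callable or a reward-model ID:

```bash
python - <<'PY'
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any prompt-style dataset works; trl-lib/tldr is a small public example.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward: prefer completions close to 20 characters.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model="facebook/opt-1.3b",
    reward_funcs=reward_len,        # or a reward-model ID string
    args=GRPOConfig(
        output_dir="opt-1.3b-grpo",
        logging_steps=10,
        report_to="wandb",
    ),
    train_dataset=dataset,
)
trainer.train()
PY
```

Save the heredoc body as e.g. `train_grpo.py` and run `accelerate launch train_grpo.py` once you scale past one GPU.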
Happy training!
---
**Anything missing?**
If you need Triton, cuDNN, or additional monitoring (e.g., `gpustat`, `nvtop`), just add them under a new bullet—this scaffold is meant to stay minimal and conflict‑free.
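For example, a monitoring add-on could look like this (package names as published on apt and PyPI):

```bash
sudo apt install -y nvtop   # interactive GPU process viewer
pip install gpustat
gpustat -i                  # watch mode, refreshes periodically
```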