
LuminaLLM: Domain-Specific Fine-Tuning Pipeline

Author: Adam Silva
GitHub: https://github.com/adaumsilva/
Specialization: AI Engineering, RAG Architectures, and MLOps


LuminaLLM is a research-grade repository dedicated to adapting open-source Large Language Models (LLMs) to specialized domains. Using Parameter-Efficient Fine-Tuning (PEFT) and 4-bit quantization (QLoRA), the project demonstrates how to reach strong performance on niche tasks while keeping compute and VRAM requirements modest.


Key Features

  • Efficient Fine-Tuning: Implementation of QLoRA to reduce VRAM requirements, allowing 7B+ parameter models to be tuned on accessible hardware.
  • Modular Ingestion: Custom data pipelines for transforming unstructured text into instruction-following or chat-completion formats.
  • Experiment Tracking: Integrated support for Weights & Biases (W&B) to monitor gradient norms, loss curves, and GPU utilization.
  • Quantization & Merging: Scripts for loading models in 4/8-bit and merging LoRA weights back into the base model for production deployment.
  • Performance Benchmarking: Comparative evaluation tools to measure the delta between base models and fine-tuned versions.

Requirements

  • Python 3.10+
  • CUDA-capable GPU (≥ 16 GB VRAM recommended for Mistral-7B; 24 GB for larger models)
  • CUDA 11.8 or 12.1

Setup

# 1. Clone and enter the repo
git clone https://github.com/adaumsilva/LuminaLLM.git
cd LuminaLLM

# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# 3. Install PyTorch (CUDA 12.1 wheel)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# 4. Install project dependencies
pip install -r requirements.txt

# 5. (Recommended) Install Flash Attention 2 for faster training
pip install flash-attn --no-build-isolation

# 6. Configure secrets
cp .env.example .env
# Edit .env and fill in HF_TOKEN, WANDB_API_KEY, etc.

Data Preparation

# Convert Alpaca-format JSON to JSONL with a val split
python scripts/prepare_data.py \
    --input  data/raw/alpaca_data.json \
    --output data/train.jsonl \
    --format alpaca \
    --val-output data/val.jsonl \
    --val-split 0.05

# Convert ShareGPT JSONL
python scripts/prepare_data.py \
    --input  data/raw/sharegpt.jsonl \
    --output data/train.jsonl \
    --format sharegpt

# Convert OpenAI chat-format JSONL
python scripts/prepare_data.py \
    --input  data/raw/chat_data.jsonl \
    --output data/train.jsonl \
    --format openai

Expected output schema per line:

{"instruction": "Explain quantum entanglement", "input": "", "output": "Quantum entanglement is..."}

Exploratory Data Analysis

jupyter notebook notebooks/01_eda.ipynb

The notebook produces:

  • Token-length histograms and percentile tables for max_seq_length selection
  • Field coverage report
  • Duplicate detection
  • GPU VRAM estimate per batch size
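The percentile table is what drives the max_seq_length choice: pick a length that covers most examples without padding everything to the longest outlier. A minimal sketch of that computation, using synthetic token counts in place of real tokenizer output:

```python
import random
import statistics

# Hypothetical per-example token counts; in practice this would be
# len(tokenizer(text).input_ids) over the training set.
random.seed(0)
lengths = [int(random.lognormvariate(5.5, 0.6)) for _ in range(10_000)]

# 99 percentile cut points; qs[p - 1] is the p-th percentile
qs = statistics.quantiles(lengths, n=100)
for p in (50, 90, 95, 99):
    print(f"p{p}: {qs[p - 1]:.0f} tokens")

# A common heuristic: cover ~95% of examples, rounded up to a multiple of 64
p95 = qs[94]
max_seq_length = ((int(p95) // 64) + 1) * 64
print("suggested max_seq_length:", max_seq_length)
```

The multiple-of-64 rounding is a convenience for tensor-core-friendly shapes, not a requirement.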

Training

Edit configs/finetune.yaml to set your base model, LoRA rank, learning rate, etc., then run:

python scripts/train.py --config configs/finetune.yaml

Override any YAML field inline:

python scripts/train.py --config configs/finetune.yaml \
    model.base_model_id=meta-llama/Meta-Llama-3-8B-Instruct \
    lora.r=32 \
    training.num_train_epochs=1 \
    training.report_to=tensorboard
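Dot-path overrides like these take only a few lines to support. The parser below is an illustrative sketch, not the repo's actual implementation; it walks the nested config dict and does best-effort type casting:

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply one 'section.key=value' override to a nested config dict in place."""
    path, _, raw = override.partition("=")
    keys = path.split(".")
    node = cfg
    for k in keys[:-1]:
        node = node.setdefault(k, {})
    # Best-effort typing: int first, then float, then bool, else plain string
    for cast in (int, float):
        try:
            node[keys[-1]] = cast(raw)
            return
        except ValueError:
            pass
    node[keys[-1]] = {"true": True, "false": False}.get(raw.lower(), raw)

cfg = {"lora": {"r": 64}, "training": {"num_train_epochs": 3}}
apply_override(cfg, "lora.r=32")
apply_override(cfg, "training.report_to=tensorboard")
print(cfg)
```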

Key config knobs (configs/finetune.yaml):

| Section | Key | Default | Notes |
|---|---|---|---|
| model | base_model_id | mistralai/Mistral-7B-Instruct-v0.2 | Any HF Hub causal LM |
| model | attn_implementation | flash_attention_2 | Set eager if Flash Attention is not installed |
| quantization | bnb_4bit_quant_type | nf4 | nf4 empirically outperforms fp4 |
| lora | r | 64 | Higher = more capacity, more VRAM |
| lora | lora_alpha | 128 | Effective scale = alpha / r |
| lora | use_rslora | true | Rank-stabilised LoRA (recommended) |
| training | optim | paged_adamw_8bit | Saves ~2 GB vs standard AdamW |
| data | packing | true | ConstantLengthDataset; maximises GPU utilisation |
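The interaction between lora_alpha, r, and use_rslora is worth spelling out: standard LoRA scales the low-rank update by alpha / r, while rank-stabilised LoRA scales by alpha / sqrt(r), which keeps the effective scale from collapsing as you raise the rank. A quick arithmetic illustration (not repo code):

```python
import math

def lora_scale(alpha: int, r: int, rslora: bool = False) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / math.sqrt(r) if rslora else alpha / r

# Defaults from configs/finetune.yaml: alpha=128, r=64
print(lora_scale(128, 64))               # standard LoRA
print(lora_scale(128, 64, rslora=True))  # rsLoRA
# Doubling r halves the standard scale but shrinks the rsLoRA scale only ~1.41x
print(lora_scale(128, 128), lora_scale(128, 128, rslora=True))
```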

Evaluation

# Perplexity on the validation split
python scripts/evaluate.py \
    --adapter outputs/mistral-7b-qlora/final_adapter \
    --config  configs/finetune.yaml \
    --mode    perplexity
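Perplexity here is the exponential of the mean per-token negative log-likelihood over the validation split; lower is better, and 1.0 would mean a perfectly confident model. A tiny numerical sketch (the NLL values are made up):

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token NLLs collected during a validation pass
nlls = [1.2, 0.8, 2.1, 0.5, 1.0]
print(round(perplexity(nlls), 3))
```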

# ROUGE-1/2/L on a held-out test file
python scripts/evaluate.py \
    --adapter    outputs/mistral-7b-qlora/final_adapter \
    --config     configs/finetune.yaml \
    --mode       rouge \
    --test-file  data/test.jsonl \
    --output     outputs/rouge_results.json
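ROUGE-1 measures unigram overlap between a generated answer and the reference. The sketch below shows the core F1 computation with plain whitespace tokenisation; production scorers (and presumably scripts/evaluate.py, which is not shown here) also handle stemming and ROUGE-2/L:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 with naive whitespace tokenisation."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat", "the cat lay on the mat"))
```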

# LLM-as-a-judge (requires OPENAI_API_KEY)
python scripts/evaluate.py \
    --adapter    outputs/mistral-7b-qlora/final_adapter \
    --config     configs/finetune.yaml \
    --mode       judge \
    --test-file  data/test.jsonl \
    --judge-model gpt-4o-mini \
    --output     outputs/judge_results.json

Inference

# Interactive REPL with the merged model
python scripts/inference.py \
    --model outputs/mistral-7b-merged \
    --mode  interactive

# Interactive REPL with base model + adapter (no merge required)
python scripts/inference.py \
    --base-model mistralai/Mistral-7B-Instruct-v0.2 \
    --adapter    outputs/mistral-7b-qlora/final_adapter \
    --mode       interactive \
    --template   mistral

# Batch inference
python scripts/inference.py \
    --model       outputs/mistral-7b-merged \
    --mode        batch \
    --input-file  data/test.jsonl \
    --output-file outputs/predictions.jsonl \
    --greedy

Repository Structure

LuminaLLM/
├── configs/
│   └── finetune.yaml          # All hyperparameters (single source of truth)
├── scripts/
│   ├── train.py               # QLoRA SFT training entry point
│   ├── evaluate.py            # Perplexity / ROUGE / LLM-as-a-judge
│   ├── inference.py           # Interactive and batch inference
│   └── prepare_data.py        # Raw → JSONL conversion
├── src/
│   ├── data/
│   │   ├── __init__.py
│   │   └── dataset.py         # DatasetPipeline, prompt formatters
│   └── model/
│       ├── __init__.py
│       ├── builder.py         # build_bnb_config, build_model_and_tokenizer, build_peft_model
│       └── utils.py           # Parameter counting, GPU telemetry, merge_and_save
├── notebooks/
│   └── 01_eda.ipynb           # Exploratory data analysis
├── outputs/                   # Checkpoints, adapters, merged models (gitignored)
├── data/                      # Local datasets (gitignored)
├── .env.example
├── .gitignore
└── requirements.txt

Experiment Tracking

Set training.report_to: wandb in the config and provide WANDB_API_KEY in .env.

Logged metrics:

  • train/loss, train/grad_norm, train/learning_rate
  • eval/loss, eval/perplexity (computed post-eval)
  • GPU VRAM usage via custom callback

For TensorBoard, set the following in configs/finetune.yaml:

training.report_to: tensorboard

then launch the dashboard:

tensorboard --logdir outputs/

Memory Optimisation Reference

| Technique | VRAM saving | Config key |
|---|---|---|
| 4-bit NF4 quantization | ~75 % of model weights | quantization.load_in_4bit |
| Double quantization | ~0.4 GB extra | quantization.bnb_4bit_use_double_quant |
| Gradient checkpointing | ~30–40 % of activations | training.gradient_checkpointing |
| Paged AdamW 8-bit | ~2 GB of optimizer states | training.optim |
| Flash Attention 2 | ~20 % of activations | model.attn_implementation |
| Sequence packing | Maximises batch utilisation | data.packing |
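Sequence packing concatenates tokenised examples (separated by EOS) into one stream and slices it into fixed-length blocks, so no training step wastes compute on padding. A simplified sketch of what ConstantLengthDataset does, not the actual trl implementation:

```python
def pack_sequences(examples: list[list[int]], block_size: int, eos_id: int = 2):
    """Concatenate token lists with EOS separators, then cut into equal blocks."""
    stream: list[int] = []
    for toks in examples:
        stream.extend(toks + [eos_id])
    # Drop the ragged tail; every emitted block is exactly block_size tokens
    return [stream[i:i + block_size]
            for i in range(0, len(stream) - block_size + 1, block_size)]

examples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
blocks = pack_sequences(examples, block_size=4)
print(blocks)
```

Note that examples can straddle block boundaries; that is the trade-off packing makes in exchange for full GPU utilisation.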

Citation

@misc{luminallm2024,
  title  = {LuminaLLM: QLoRA Fine-Tuning Pipeline},
  year   = {2024},
  url    = {https://github.com/adaumsilva/LuminaLLM}
}

