
AlignTDS

Analyzing the Alignment of LLMs through the lens of Token Distribution Shift (TDS). Part of the Re-Align project by AI2 Mosaic. More info on our website and in our paper.

Alignment as Token Distribution Shifts 🔄

[Figure: alignment as token distribution shift (overview)]

What changes does alignment tuning bring? 🧐

Analysis of TDS: we compare the token distributions of a base LLM and its aligned counterpart to understand what alignment tuning actually changes.

The Analysis Pipeline ⚙️

  1. Choose a pair of LLMs (e.g., Llama-2 and Llama-2-chat).
  2. Given a query q, get the answer o = [o_1, o_2, ...] from the aligned LLM.
  3. For each position t, feed q plus the aligned prefix o_<t to the base LLM and record its next-token distribution P_base.
  4. Compare the rank of the aligned token o_t under P_base to measure the shift caused by alignment tuning (a minimal code sketch follows the taxonomy below).

Types of Token Positions Based on TDS 📊

  • Unshifted positions: 🏠 the aligned token is also the top-1 token in P_base.
  • Marginal positions: 🌿 the aligned token ranks 2nd or 3rd in P_base.
  • Shifted positions: 🚀 the aligned token falls outside the top 3 of P_base.
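
To make the taxonomy concrete, here is a minimal sketch of the classification, assuming Hugging Face transformers and Llama-2-7B as the base model; the model name and the classify_positions helper are illustrative, not the repo's actual API.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: Llama-2-7B as the base model; any causal LM works the same way.
base_name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name)
base.eval()

def classify_positions(query_ids, answer_ids):
    """Rank each aligned token o_t under P_base(. | q + o_<t) and bucket it."""
    input_ids = torch.tensor([query_ids + answer_ids])
    with torch.no_grad():
        logits = base(input_ids).logits[0]   # (seq_len, vocab_size)
    labels, offset = [], len(query_ids)
    for t, token_id in enumerate(answer_ids):
        dist = logits[offset + t - 1]        # the distribution that predicts o_t
        rank = int((dist > dist[token_id]).sum()) + 1
        labels.append("unshifted" if rank == 1
                      else "marginal" if rank <= 3
                      else "shifted")
    return labels

# Usage: tokenize the query with special tokens, the answer continuation without.
q = tok("What is the capital of France?").input_ids
a = tok("Paris is the capital of France.", add_special_tokens=False).input_ids
print(list(zip(tok.convert_ids_to_tokens(a), classify_positions(q, a))))

The repo's src/logit_analysis.py computes and saves these base-model logits at scale; see Procedures below.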

Web Demos for TDS analysis 🌐

Interactive demos are available via the project website.

Key Findings 🔑

  1. Only a small fraction of token positions are affected by alignment; the base and aligned LLMs usually share the same top-ranked token.
  2. Alignment mainly changes stylistic elements, which account for roughly 5-8% of positions.
  3. Earlier tokens matter most for alignment; the aligned model's top token is often within the base model's top 5.
  4. Base LLMs are already primed to follow instructions when given an appropriate context (e.g., in-context examples).

Token Distribution Shift Analysis

  1. Knowledge content comes from base LLMs.
     [Figure: knowledge content analysis]

  2. TDS across different LLM pairs.
     [Figures: TDS comparison across LLM pairs]

  3. Learnings from alignment tuning.
     [Figure: what alignment tuning changes]

  4. TDS diminishes over time during decoding.
     [Figures: TDS over decoding positions]

Procedures 🛠️

Generate outputs from aligned models

We use a generated output file containing the responses of the aligned model (example filepath: data/Llama-2-7b-chat-hf.json). See the URIAL repo for generation details.
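
For orientation, a minimal sketch for inspecting that file; whether it is a JSON list and which fields each example carries are assumptions to verify against the URIAL repo.

import json

# Peek at the aligned-model outputs. NOTE: the per-example field names are
# an assumption, not a documented schema -- inspect before relying on them.
with open("data/Llama-2-7b-chat-hf.json") as f:
    examples = json.load(f)

print(len(examples), "examples")
print(sorted(examples[0].keys()))  # check the real schema here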

Run Logit Analysis 📊

Save the token logits of aligned models

# i2i mode: score the aligned model's own outputs under the aligned model itself
instruct_data_file="data/Llama-2-7b-chat-hf.json"
logits_folder="saved_logits/just_eval_1000/llama2/shards/"
mkdir -p $logits_folder
n_shards=4     # or 1 if you only have one GPU
shard_size=250 # or 1000 if you only have one GPU
start_gpu=0
# Launch one background job per GPU; each job covers shard_size examples.
for ((start = 0, end = shard_size, gpu = start_gpu; gpu < n_shards + start_gpu; start += shard_size, end += shard_size, gpu++)); do
    CUDA_VISIBLE_DEVICES=$gpu python src/logit_analysis.py \
                --data_file $instruct_data_file \
                --logits_folder $logits_folder \
                --pair llama2 \
                --mode i2i \
                --start $start --end $end &
done
wait # block until every shard job has finished
# Merge the shards into a single pickle
python src/scripts/merge_logits.py saved_logits/just_eval_1000/llama2/ llama2 i2i
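
Each background job scores a contiguous slice of shard_size examples on its own GPU; merge_logits.py then concatenates the per-shard outputs into a single pickle (llama2-i2i.pkl), which the later steps consume.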

Save the token logits of base models

# i2b mode: feed the query plus the aligned model's output to the base model
logits_folder="saved_logits/just_eval_1000/llama2_tp/shards/"
mkdir -p $logits_folder
n_shards=4
shard_size=250
start_gpu=0
for ((start = 0, end = shard_size, gpu = start_gpu; gpu < n_shards + start_gpu; start += shard_size, end += shard_size, gpu++)); do
    CUDA_VISIBLE_DEVICES=$gpu python src/logit_analysis.py \
                --data_file $instruct_data_file \
                --enable_template \
                --logits_folder $logits_folder \
                --pair llama2 \
                --mode i2b \
                --i2i_pkl_file saved_logits/just_eval_1000/llama2/llama2-i2i.pkl \
                --start $start --end $end &
done
wait # block until every shard job has finished
# Merge the shards into a single pickle
python src/scripts/merge_logits.py saved_logits/just_eval_1000/llama2_tp/ llama2 i2b
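
After this merge, both llama2-i2i.pkl and llama2-i2b.pkl exist; these are exactly the two inputs the reformatting step below takes.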

Data Reformatting

python src/demo/data_prep.py llama2_tp saved_logits/just_eval_1000/llama2/llama2-i2i.pkl saved_logits/just_eval_1000/llama2_tp/llama2-i2b.pkl
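
The first argument (llama2_tp here) is the run name; it matches the argument passed to generate_html.py in the next step.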

Generate HTML pages for visualization

python src/demo/generate_html.py llama2_tp

TODOs 📝

  • Integrate model generation into the logit computation process.
  • Use the vLLM library for efficiency improvements.
  • Create an interactive demo.
  • Add more data from larger LLMs.
  • Compare models fine-tuned in different ways.

Citation 📄

@article{Lin2023ReAlign,
    author = {Bill Yuchen Lin and Abhilasha Ravichander and Ximing Lu and Nouha Dziri and Melanie Sclar and Khyathi Chandu and Chandra Bhagavatula and Yejin Choi},
    journal = {ArXiv preprint},
    title = {The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning},
    year = {2023}
}
