SAINLP/DivDetox

Overview

Code for "Detoxifying Large Language Models via the Diversity of Toxic Samples"

Requirements

conda create -n DivDetox python=3.10
conda activate DivDetox
pip install -r requirements.txt

Multi-Category-Induced Personalized Sample Generation Strategy (MPSG)

Generate the training dataset with the following commands:

# generate train dataset
bash run_gen_traindata.sh
# select the training dataset; change 'id' to 'icl' or 'noprompt' to obtain negative or positive samples, respectively
python chose_traindataset.py

To build the dataset for a different model, change the model name. Alternatively, use the datasets we have already processed in ./train_datasets.

Scaled Contrastive DPO (SC-DPO)

Fine-tune the models on the training dataset:

bash run_ft.sh

You can also use the trained model on Hugging Face: "zhaozhao9898/DivDetox-Llama-3-8B"
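A minimal sketch of loading the released checkpoint for inference. It is not part of this repository. It assumes that the Hugging Face repository hosts full causal-LM weights loadable with the `transformers` Auto classes, and that `transformers`, `torch`, and `accelerate` (needed for `device_map="auto"`) are installed. The helper name `generate` and its defaults are hypothetical.

```python
MODEL_ID = "zhaozhao9898/DivDetox-Llama-3-8B"  # model name given in this README


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Continue `prompt` with the released model (hypothetical helper).

    Assumes the checkpoint is a full causal LM; the weights are downloaded
    on the first call.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding keeps the output deterministic for quick checks.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("The weather today is")` returns only the continuation, not the prompt. The generation settings used in the paper's evaluation may differ from the ones in this sketch.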

Evaluation

Generate sentences by cdpt and evaluate the performance of the trained model. To use the Perspective API, enter YOUR_API_KEY in config.yml.

bash run_generations.sh
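For readers who want to check an API key or score a single sentence outside the script, the following is a minimal sketch of a direct Perspective API request using only the Python standard library. It is not necessarily how `run_generations.sh` calls the API. The helper names (`build_request`, `parse_toxicity`, `score_toxicity`) are hypothetical, and the key is passed as an argument because the layout of config.yml is not described here.

```python
import json
import urllib.request

# Public endpoint of the Perspective API "comments:analyze" method.
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text: str) -> dict:
    """Build the JSON body that asks for the TOXICITY attribute of `text`."""
    return {
        "comment": {"text": text},
        "languages": ["en"],  # assumption: English generations
        "requestedAttributes": {"TOXICITY": {}},
    }


def parse_toxicity(response: dict) -> float:
    """Extract the summary toxicity score (between 0 and 1) from a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_toxicity(text: str, api_key: str) -> float:
    """Send one request to the Perspective API and return the toxicity score."""
    request = urllib.request.Request(
        f"{PERSPECTIVE_URL}?key={api_key}",
        data=json.dumps(build_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_toxicity(json.load(response))
```

For example, `score_toxicity("Have a nice day.", api_key)` issues a single request. The API enforces a per-key rate limit, so batch evaluation should throttle its calls.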
