Overview

This repository contains the code for "Detoxifying Large Language Models via the Diversity of Toxic Samples".

Requirements

conda create -n DivDetox python=3.10
conda activate DivDetox
pip install -r requirement.txt

Multi-Category-Induced Personalized Sample Generation Strategy (MPSG)

Generate the training dataset with the following commands:

# generate train dataset
bash run_gen_traindata.sh
# select the training data; change 'id' to 'icl' or 'noprompt' to obtain negative or positive samples, respectively
python chose_traindataset.py

Change the model name to generate the dataset for a different model. Alternatively, use the datasets we have already processed in ./train_datasets.

Scaled Contrastive DPO (SC-DPO)

Fine-tune the model on the training dataset:

bash run_ft.sh
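
For orientation, the sketch below computes the standard DPO loss for a single preference pair. It is not the SC-DPO objective: the scaling and contrastive terms that SC-DPO adds are not reproduced here. The function name `dpo_loss`, its arguments, and the default `beta` are illustrative assumptions, not names taken from this repository.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    NOTE: this is plain DPO, shown only as background for SC-DPO.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) written as a numerically stable softplus(-logits).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


if __name__ == "__main__":
    # The policy prefers the chosen (non-toxic) response more than the
    # reference does, so the loss falls below log(2).
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy and the reference agree exactly, the loss equals log(2); it decreases as the policy assigns relatively more probability to the chosen response than to the rejected one.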

You can also use our trained model from Hugging Face: "zhaozhao9898/DivDetox-Llama-3-8B"
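
A minimal sketch of loading the released checkpoint, assuming it is a standard causal-LM checkpoint that loads with the stock Hugging Face `transformers` library and that `accelerate` is installed for `device_map="auto"`. The helper names `load_divdetox` and `generate` are hypothetical and do not come from this repository.

```python
MODEL_ID = "zhaozhao9898/DivDetox-Llama-3-8B"


def load_divdetox(model_id=MODEL_ID):
    """Download (or read from cache) the tokenizer and model weights."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # assumption: requires the `accelerate` package
    )
    return tokenizer, model


def generate(prompt, max_new_tokens=20):
    """Continue `prompt` with the detoxified model (hypothetical helper)."""
    tokenizer, model = load_divdetox()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The weather today is")` downloads the 8B-parameter weights on first use, so expect a large download and a GPU with enough memory.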

Evaluation

Generate sentences by cdpt and evaluate the performance of the trained model. Set YOUR_API_KEY in config.yml to use the Perspective API.

bash run_generations.sh
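
To check that your API key works before running the full evaluation, a single Perspective API request can be sketched as follows. This is an illustration using only the Python standard library; it does not show how the scripts in this repository call the API, and the helper names `build_request`, `parse_toxicity`, and `score_text` are hypothetical.

```python
import json
import urllib.request

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text, api_key):
    """Build a POST request asking for the TOXICITY attribute of `text`."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    return urllib.request.Request(
        f"{PERSPECTIVE_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_toxicity(response_json):
    """Extract the summary toxicity score (0.0 to 1.0) from a response."""
    return response_json["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_text(text, api_key):
    """Send one request to the Perspective API and return the toxicity score."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return parse_toxicity(json.load(resp))
```

For example, `score_text("Have a nice day.", "YOUR_API_KEY")` returns a float between 0 and 1; an HTTP error at this step usually means the key is missing or the API is not enabled for it.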
