Code for "Detoxifying Large Language Models via the Diversity of Toxic Samples"
conda create -n DivDetox python=3.10
conda activate DivDetox
pip install -r requirements.txtYou can generate the train dataset by this code
# generate train dataset
bash run_gen_traindata.sh
# select train dataset, change 'id' to 'icl'/'noprompt' to obtain negative/positive samples.
python chose_traindataset.pyChange the model name to obtain the dataset for different models.
You can also use the dataset processed by us in ./train_datasets.
Fine-tune the models with the train dataset
bash run_ft.shYou can alse use the trained model in huggingface by "zhaozhao9898/DivDetox-Llama-3-8B"
Generate sentences by cdpt and evaluate the performance of the trained model.
Please enter YOUR_API_KEY in config.yml to use the perspective API.
bash run_generations.sh