comnetsAD/LLMSafetyGuardrails


MultiTaskBench: COLING 2025

Citation

Please consider citing the following paper if you use this code or data in your work:

@misc{jan2024multitaskmayhemunveilingmitigating,
      title={Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning}, 
      author={Essa Jan and Nouar AlDahoul and Moiz Ali and Faizan Ahmad and Fareed Zaffar and Yasir Zaki},
      year={2024},
      eprint={2409.15361},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.15361}, 
} 

MultiTaskBench Safety Dataset

The MultiTaskBench dataset is available in datasets/multitaskbench.

The directory contains a .csv file with the combined 2,000 queries: 500 for each task (Classification, Text Generation, Code Generation, and Translation).
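As a quick sanity check on the split described above, the following minimal sketch counts rows per task in a CSV. The column name `task` is an assumption made for illustration, not the dataset's documented schema; inspect the CSV header and pass the real column name.

```python
import csv
from collections import Counter
from typing import Dict, TextIO


def count_queries_per_task(handle: TextIO, task_column: str = "task") -> Dict[str, int]:
    """Count rows per task label in an open CSV file handle.

    `task_column` is a hypothetical column name; replace it with the
    actual header used in the MultiTaskBench CSV.
    """
    reader = csv.DictReader(handle)
    # Fail loudly if the assumed column is absent (or the file is empty).
    if reader.fieldnames is None or task_column not in reader.fieldnames:
        raise KeyError(f"column {task_column!r} not found; check the CSV header")
    return dict(Counter(row[task_column] for row in reader))
```

To use it, open the CSV from datasets/multitaskbench with `open(path, newline="", encoding="utf-8")` and pass the handle in. If the description above holds, each of the four tasks should map to 500.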

Harmful Testing Dataset

The dataset used to measure the Attack Success Rate (ASR) of a model is available in datasets/harmful.
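For reference, ASR is commonly computed as the fraction of harmful prompts for which the model's response is judged harmful. The sketch below assumes each response has already been reduced to a binary verdict; the function name and the boolean input format are illustrative and may differ from how the scripts in this repository score responses.

```python
from typing import Iterable


def attack_success_rate(judged_harmful: Iterable[bool]) -> float:
    """Return the fraction of responses judged harmful.

    Assumes one boolean per harmful prompt: True if the attack
    succeeded (the model complied), False if the model refused.
    """
    verdicts = list(judged_harmful)
    if not verdicts:
        raise ValueError("no verdicts supplied")
    # True counts as 1 and False as 0, so the sum is the number of successes.
    return sum(verdicts) / len(verdicts)
```

For example, `attack_success_rate([True, False, False, True])` gives 0.5, meaning half of the harmful prompts elicited a harmful response.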

Fine-Tuning Code

The code used to fine-tune open-source models is available in code/Finetuning.

GPT Judge Evaluation Code

The code used to evaluate LLM responses is available in code/evaluation.

Fine-Tuned Model Access

All models fine-tuned for the paper are available on our Hugging Face page.

Licensing

Code is licensed under the MIT License.
