This is a repository for my Final Year Project at NUS, on adversarial attacks on LLMs.
Unzipping the file yields two folders:
- llm_attacks
- model_red_teaming
llm_attacks uses a modified version of the code from Zou et al. (2023) and focuses on token-level attacks. model_red_teaming is based on a version of the code from Mehrotra et al. (2023) and Chao et al. (2023). The system prompts are taken from Mehrotra et al. (2023).