Skip to content

sleeepeer/PIArena

Repository files navigation

PIArena

A Platform for Prompt Injection Evaluation

ProjectPage HuggingFace LeaderBoard Paper Star

PIArena is an easy-to-use toolbox and also a comprehensive benchmark for researching prompt injection attacks and defenses. It provides:

  • Plug-and-play Attacks & Defenses – Easily integrate state-of-the-art defenses into your workflow to protect your LLM system against prompt injection attacks. You can also play with existing attack strategies to perform a better research.
  • Systematic Evaluation Benchmark – End-to-end evaluation pipeline enables you to easily evaluate attacks / defenses on various datasets.
  • Add Your Own – You can also easily integrate your own attack or defense into our benchmark to systematically assess how well it perform.

Table of Contents

πŸ“ Quick Start

βš™οΈ Installation

Clone the project and setup python environment:

git clone [email protected]:sleeepeer/PIArena.git
cd PIArena
conda create -n piarena python=3.10 -y
conda activate piarena
pip install -r requirements.txt
pip install --upgrade setuptools pip
pip install -e .   # Install piarena as an editable package

Login to HuggingFace πŸ€— with your HuggingFace Access Token, you can find it at this link:

huggingface-cli login

πŸ“Œ Ready-to-use Tools

You can simply import attacks and defenses and integrate them into your own code. Please see details in Attack docs and Defense docs.

from piarena.attacks import get_attack
from piarena.defenses import get_defense
from piarena.llm import Model

llm = Model("Qwen/Qwen3-4B-Instruct-2507")
defense = get_defense("promptguard")
attack = get_attack("combined")

πŸ“ˆ Run Evaluation

Use main.py to run the benchmark:

# Using CLI arguments
python main.py --dataset squad_v2 --attack direct --defense none

# Using a YAML config file
python main.py --config configs/experiments/my_experiment.yaml

# Run many experiments in parallel across GPUs
# Edit the configuration section in scripts/run.py to set GPUs, datasets, attacks, defenses
# The scheduler automatically assigns jobs to the least-loaded GPU
python scripts/run.py

Available Datasets: Please see HuggingFace/PIArena.

Available Attacks:

Available Defenses:

πŸ” Search-based Attacks

PIArena supports search-based attacks (PAIR, TAP, Strategy Search) that iteratively refine injected prompts using an attack LLM. Use main_search.py for these attacks:

# --attack can be tap, pair, strategy_search
python main_search.py --dataset squad_v2 --attack strategy_search --defense datafilter \
  --backend_llm Qwen/Qwen3-4B-Instruct-2507 --attacker_llm Qwen/Qwen3-4B-Instruct-2507

# Run many search experiments in parallel
# Edit scripts/run_search.py to configure GPUs, attacks, defenses, datasets
python scripts/run_search.py

See Strategy Search for details.

πŸ” Reinforcement Learning-based Attacks

Building upon PIArena (including defenses and benchmarks), this repository provides the code for PISmith, a reinforcement learning-based framework for red teaming prompt injection defenses.

πŸ€– Agent Benchmarks

PIArena also supports agentic benchmarks: InjecAgent, AgentDojo and AgentDyn.

Setup Agent Benchmarks

# AgentDojo / AgentDyn
cd agents/agentdojo && pip install -e . && cd ../..

InjecAgent Evaluation

python main_injecagent.py --model meta-llama/Llama-3.1-8B-Instruct --defense none

AgentDojo / AgentDyn Evaluation

# Original AgentDojo suite with OpenAI API
export OPENAI_API_KEY="Your API Key Here"
python main_agentdojo.py --model gpt-5-mini --attack none --suite workspace

# Original AgentDojo suite with a PIArena defense
python main_agentdojo.py --model meta-llama/Llama-3.1-8B-Instruct --attack tool_knowledge --defense datafilter --suite workspace

# Merged AgentDyn suite with a PIArena defense
python main_agentdojo.py --model gpt-4o-2024-08-06 --attack important_instructions --defense datafilter --suite shopping

# Benchmark-native defense from the merged AgentDojo / AgentDyn tree
python main_agentdojo.py --model gpt-4o-2024-08-06 --attack important_instructions --defense prompt_guard_2_detector --suite shopping

The same main_agentdojo.py entrypoint is used for both benchmark families:

  • AgentDojo suites: workspace, slack, travel, banking
  • AgentDyn suites: shopping, github, dailylife

PIArena integrates defenses to work in AgentDojo and AgentDyn. Benchmark-native defenses such as tool_filter, repeat_user_prompt, piguard_detector, and prompt_guard_2_detector are also available through the same runner.

πŸ™‹πŸ»β€β™€οΈ Add your own attacks / defenses

Please see Extending PIArena for full details.

About

PIArena: A Platform for Prompt Injection Evaluation

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors