GitHub - sleeepeer/PIArena: PIArena: A Platform for Prompt Injection Evaluation

A Platform for Prompt Injection Evaluation

PIArena is an easy-to-use toolbox and also a comprehensive benchmark for researching prompt injection attacks and defenses. It provides:

Plug-and-play Attacks & Defenses – Easily integrate state-of-the-art defenses into your workflow to protect your LLM system against prompt injection attacks. You can also play with existing attack strategies to perform a better research.
Systematic Evaluation Benchmark – End-to-end evaluation pipeline enables you to easily evaluate attacks / defenses on various datasets.
Add Your Own – You can also easily integrate your own attack or defense into our benchmark to systematically assess how well it perform.

📝 Quick Start

⚙️ Installation

Clone the project and setup python environment:

git clone [email protected]:sleeepeer/PIArena.git
cd PIArena
conda create -n piarena python=3.10 -y
conda activate piarena
pip install -r requirements.txt
pip install --upgrade setuptools pip
pip install -e .   # Install piarena as an editable package

Login to HuggingFace 🤗 with your HuggingFace Access Token, you can find it at this link:

huggingface-cli login

📌 Ready-to-use Tools

You can simply import attacks and defenses and integrate them into your own code. Please see details in Attack docs and Defense docs.

from piarena.attacks import get_attack
from piarena.defenses import get_defense
from piarena.llm import Model

llm = Model("Qwen/Qwen3-4B-Instruct-2507")
defense = get_defense("promptguard")
attack = get_attack("combined")

📈 Run Evaluation

Use main.py to run the benchmark:

# Using CLI arguments
python main.py --dataset squad_v2 --attack direct --defense none

# Using a YAML config file
python main.py --config configs/experiments/my_experiment.yaml

# Run many experiments in parallel across GPUs
# Edit the configuration section in scripts/run.py to set GPUs, datasets, attacks, defenses
# The scheduler automatically assigns jobs to the least-loaded GPU
python scripts/run.py

Available Datasets: Please see HuggingFace/PIArena.

Available Attacks:

none - No attack (baseline)
direct - Directly attack using injected prompt (default)
combined - Formalizing and Benchmarking Prompt Injection Attacks and Defenses
ignore - Ignore Previous Prompt: Attack Techniques For Language Models
completion - Prompt injection attacks against GPT-3
character - Delimiters won’t save you from prompt injection
nanogcg - GCG and nanoGCG
tap - TAP: A Query-Efficient Method for Jailbreaking Black-Box LLMs
pair - PAIR: Jailbreaking black box large language models in twenty queries
strategy_search - Strategy search attack based on defense feedback introduced in PIArena.

Available Defenses:

none - No defense (baseline, default)
datasentinel - DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
attentiontracker - Attention Tracker: Detecting Prompt Injection Attacks in LLMs
piguard - PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free
promptguard - Meta Prompt Guard
secalign - SecAlign: Defending Against Prompt Injection with Preference Optimization (uses Meta-SecAlign model)
promptlocate - PromptLocate: Localizing Prompt Injection Attacks
promptarmor - PromptArmor: Simple yet Effective Prompt Injection Defenses
pisanitizer - PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
datafilter - Defending Against Prompt Injection with DataFilter

🔍 Search-based Attacks

PIArena supports search-based attacks (PAIR, TAP, Strategy Search) that iteratively refine injected prompts using an attack LLM. Use main_search.py for these attacks:

# --attack can be tap, pair, strategy_search
python main_search.py --dataset squad_v2 --attack strategy_search --defense datafilter \
  --backend_llm Qwen/Qwen3-4B-Instruct-2507 --attacker_llm Qwen/Qwen3-4B-Instruct-2507

# Run many search experiments in parallel
# Edit scripts/run_search.py to configure GPUs, attacks, defenses, datasets
python scripts/run_search.py

See Strategy Search for details.

🔍 Reinforcement Learning-based Attacks

Building upon PIArena (including defenses and benchmarks), this repository provides the code for PISmith, a reinforcement learning-based framework for red teaming prompt injection defenses.

🤖 Agent Benchmarks

PIArena also supports agentic benchmarks: InjecAgent, AgentDojo and AgentDyn.

Setup Agent Benchmarks

# AgentDojo / AgentDyn
cd agents/agentdojo && pip install -e . && cd ../..

InjecAgent Evaluation

python main_injecagent.py --model meta-llama/Llama-3.1-8B-Instruct --defense none

AgentDojo / AgentDyn Evaluation

# Original AgentDojo suite with OpenAI API
export OPENAI_API_KEY="Your API Key Here"
python main_agentdojo.py --model gpt-5-mini --attack none --suite workspace

# Original AgentDojo suite with a PIArena defense
python main_agentdojo.py --model meta-llama/Llama-3.1-8B-Instruct --attack tool_knowledge --defense datafilter --suite workspace

# Merged AgentDyn suite with a PIArena defense
python main_agentdojo.py --model gpt-4o-2024-08-06 --attack important_instructions --defense datafilter --suite shopping

# Benchmark-native defense from the merged AgentDojo / AgentDyn tree
python main_agentdojo.py --model gpt-4o-2024-08-06 --attack important_instructions --defense prompt_guard_2_detector --suite shopping

The same main_agentdojo.py entrypoint is used for both benchmark families:

AgentDojo suites: workspace, slack, travel, banking
AgentDyn suites: shopping, github, dailylife

PIArena integrates defenses to work in AgentDojo and AgentDyn. Benchmark-native defenses such as tool_filter, repeat_user_prompt, piguard_detector, and prompt_guard_2_detector are also available through the same runner.

🙋🏻‍♀️ Add your own attacks / defenses

Please see Extending PIArena for full details.

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
agents		agents
assets		assets
configs		configs
datasets		datasets
docs		docs
piarena		piarena
scripts		scripts
website		website
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
main.py		main.py
main_agentdojo.py		main_agentdojo.py
main_injecagent.py		main_injecagent.py
main_search.py		main_search.py
print_results.ipynb		print_results.ipynb
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A Platform for Prompt Injection Evaluation

Table of Contents

📝 Quick Start

⚙️ Installation

📌 Ready-to-use Tools

📈 Run Evaluation

🔍 Search-based Attacks

🔍 Reinforcement Learning-based Attacks

🤖 Agent Benchmarks

Setup Agent Benchmarks

InjecAgent Evaluation

AgentDojo / AgentDyn Evaluation

🙋🏻‍♀️ Add your own attacks / defenses

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

A Platform for Prompt Injection Evaluation

Table of Contents

📝 Quick Start

⚙️ Installation

📌 Ready-to-use Tools

📈 Run Evaluation

🔍 Search-based Attacks

🔍 Reinforcement Learning-based Attacks

🤖 Agent Benchmarks

Setup Agent Benchmarks

InjecAgent Evaluation

AgentDojo / AgentDyn Evaluation

🙋🏻‍♀️ Add your own attacks / defenses

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages