PIArena is an easy-to-use toolbox and also a comprehensive benchmark for researching prompt injection attacks and defenses. It provides:
- Plug-and-play Attacks & Defenses β Easily integrate state-of-the-art defenses into your workflow to protect your LLM system against prompt injection attacks. You can also play with existing attack strategies to perform a better research.
- Systematic Evaluation Benchmark β End-to-end evaluation pipeline enables you to easily evaluate attacks / defenses on various datasets.
- Add Your Own β You can also easily integrate your own attack or defense into our benchmark to systematically assess how well it perform.
Clone the project and setup python environment:
git clone [email protected]:sleeepeer/PIArena.git
cd PIArena
conda create -n piarena python=3.10 -y
conda activate piarena
pip install -r requirements.txt
pip install --upgrade setuptools pip
pip install -e . # Install piarena as an editable packageLogin to HuggingFace π€ with your HuggingFace Access Token, you can find it at this link:
huggingface-cli loginYou can simply import attacks and defenses and integrate them into your own code. Please see details in Attack docs and Defense docs.
from piarena.attacks import get_attack
from piarena.defenses import get_defense
from piarena.llm import Model
llm = Model("Qwen/Qwen3-4B-Instruct-2507")
defense = get_defense("promptguard")
attack = get_attack("combined")Use main.py to run the benchmark:
# Using CLI arguments
python main.py --dataset squad_v2 --attack direct --defense none
# Using a YAML config file
python main.py --config configs/experiments/my_experiment.yaml
# Run many experiments in parallel across GPUs
# Edit the configuration section in scripts/run.py to set GPUs, datasets, attacks, defenses
# The scheduler automatically assigns jobs to the least-loaded GPU
python scripts/run.pyAvailable Datasets: Please see HuggingFace/PIArena.
Available Attacks:
none- No attack (baseline)direct- Directly attack using injected prompt (default)combined- Formalizing and Benchmarking Prompt Injection Attacks and Defensesignore- Ignore Previous Prompt: Attack Techniques For Language Modelscompletion- Prompt injection attacks against GPT-3character- Delimiters wonβt save you from prompt injectionnanogcg- GCG and nanoGCGtap- TAP: A Query-Efficient Method for Jailbreaking Black-Box LLMspair- PAIR: Jailbreaking black box large language models in twenty queriesstrategy_search- Strategy search attack based on defense feedback introduced in PIArena.
Available Defenses:
none- No defense (baseline, default)datasentinel- DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacksattentiontracker- Attention Tracker: Detecting Prompt Injection Attacks in LLMspiguard- PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Freepromptguard- Meta Prompt Guardsecalign- SecAlign: Defending Against Prompt Injection with Preference Optimization (uses Meta-SecAlign model)promptlocate- PromptLocate: Localizing Prompt Injection Attackspromptarmor- PromptArmor: Simple yet Effective Prompt Injection Defensespisanitizer- PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitizationdatafilter- Defending Against Prompt Injection with DataFilter
PIArena supports search-based attacks (PAIR, TAP, Strategy Search) that iteratively refine injected prompts using an attack LLM. Use main_search.py for these attacks:
# --attack can be tap, pair, strategy_search
python main_search.py --dataset squad_v2 --attack strategy_search --defense datafilter \
--backend_llm Qwen/Qwen3-4B-Instruct-2507 --attacker_llm Qwen/Qwen3-4B-Instruct-2507
# Run many search experiments in parallel
# Edit scripts/run_search.py to configure GPUs, attacks, defenses, datasets
python scripts/run_search.pySee Strategy Search for details.
Building upon PIArena (including defenses and benchmarks), this repository provides the code for PISmith, a reinforcement learning-based framework for red teaming prompt injection defenses.
PIArena also supports agentic benchmarks: InjecAgent, AgentDojo and AgentDyn.
# AgentDojo / AgentDyn
cd agents/agentdojo && pip install -e . && cd ../..python main_injecagent.py --model meta-llama/Llama-3.1-8B-Instruct --defense none# Original AgentDojo suite with OpenAI API
export OPENAI_API_KEY="Your API Key Here"
python main_agentdojo.py --model gpt-5-mini --attack none --suite workspace
# Original AgentDojo suite with a PIArena defense
python main_agentdojo.py --model meta-llama/Llama-3.1-8B-Instruct --attack tool_knowledge --defense datafilter --suite workspace
# Merged AgentDyn suite with a PIArena defense
python main_agentdojo.py --model gpt-4o-2024-08-06 --attack important_instructions --defense datafilter --suite shopping
# Benchmark-native defense from the merged AgentDojo / AgentDyn tree
python main_agentdojo.py --model gpt-4o-2024-08-06 --attack important_instructions --defense prompt_guard_2_detector --suite shoppingThe same main_agentdojo.py entrypoint is used for both benchmark families:
- AgentDojo suites:
workspace,slack,travel,banking - AgentDyn suites:
shopping,github,dailylife
PIArena integrates defenses to work in AgentDojo and AgentDyn.
Benchmark-native defenses such as tool_filter, repeat_user_prompt, piguard_detector, and prompt_guard_2_detector are also available through the same runner.
Please see Extending PIArena for full details.