Breaking AI Systems: From Image Classifiers to LLM Agents

Hands-on adversarial attacks across the full AI stack -- from white-box image perturbations to LLM jailbreaking and multimodal exploitation -- plus a 186-jailbreak case study.

Guest Lecture -- CS 6501: Security of AI Systems | University of Virginia, February 2026

Raja Sekhar Rao Dheekonda | Distinguished Engineer @ Dreadnode

Slides (PDF)

Prerequisites

Accounts

Dreadnode Platform -- platform.dreadnode.io
- Sign up for a free account
- Copy your API key from the dashboard
- Required for Demo 1 (Crucible challenge) and result tracking
Groq -- console.groq.com
- API key for Llama 4 Maverick model access
- Used for all LLM demos (attacker, evaluator, and target)

System Requirements

Python 3.10 -- 3.13
uv (recommended) or pip

Setup

1. Install uv (if not already installed)

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Clone the repository

git clone https://github.com/rdheekonda/breaking-ai-systems.git
cd breaking-ai-systems

3. Create the environment and install dependencies

uv sync

4. Set your API keys

Copy the example environment file and fill in your keys:

cp .env-example .env
# Edit .env with your actual API keys

5. Launch Jupyter

uv run jupyter notebook demos/

6. Verify setup

Open and run demos/00_setup_verification.ipynb to confirm everything is working.

Demos

#	Notebook	What It Does
Setup	00_setup_verification.ipynb	Verify your environment, dependencies, and API keys are configured correctly before running demos
1	adversarial_model_evasion.ipynb	Force a MobileNetV2 classifier to misclassify a wolf as a "Granny Smith apple" using white-box PGD and black-box HopSkipJump attacks
2	llm_text_model_probing.ipynb	Use Tree of Attacks with Pruning (TAP) to automatically discover jailbreak prompts that bypass an LLM's safety guardrails
3	multimodal_probing.ipynb	Exploit cross-modal attack surfaces by splitting harmful intent across text and image inputs to bypass multimodal safety filters
Case Study	case_study_186_jailbreaks.ipynb	Analyze results from running TAP, GOAT, and Crescendo at scale -- 186 jailbreaks discovered in 137 minutes of automated red teaming

Attack Surface Coverage

Image Classifiers --> LLM Text --> Multimodal

Demo	Attack Type	Technique	Access Level
1	Adversarial perturbation	PGD	White-box (full gradients)
1	Adversarial evasion	HopSkipJump	Black-box (query only)
2	Jailbreak generation	TAP	API access
3	Cross-modal exploitation	Text + image splitting	API access

Models Used

All LLM calls are routed through LiteLLM, providing a unified interface to 100+ providers. Swap the groq/ prefix (e.g., openai/, anthropic/, azure/, bedrock/) to target any model.

Model	Provider	Role
Llama 4 Maverick 17B 128E Instruct	Groq	Attacker, Evaluator, Target (Demos 2--3, Case Study)
MobileNetV2 (ImageNet)	Crucible / Dreadnode	Target (Demo 1)

Core Dependencies

Library	Purpose
LiteLLM >= 1.40	Unified LLM API interface
torchvision >= 0.17	Pre-trained models and image transforms
Adversarial Robustness Toolbox >= 1.18	PGD, HopSkipJump, and other attack implementations
Rich >= 13.0	Terminal output formatting

Project Structure

breaking-ai-systems/
├── core/
│   ├── __init__.py         # Public API — all exports
│   ├── models.py           # Shared model loading + preprocessing (MobileNetV2)
│   ├── pgd.py              # White-box PGD attack (PGDResult + run_pgd)
│   ├── hop_skip_jump.py    # Black-box HopSkipJump attack (HSJResult + run_hsj)
│   ├── tap.py              # Tree of Attacks with Pruning (TAP) implementation
│   ├── multimodal.py       # Multimodal attack interface (cross-modal probing)
│   ├── display.py          # Shared Rich comparison tables
│   ├── visual.py           # Matplotlib visualization helpers
│   ├── transforms.py       # Image transforms for multimodal attacks
│   └── utils.py            # Crucible flag submission helpers
├── data/
│   ├── reference.png       # Wolf image for Demo 1 (adversarial evasion)
│   └── meth.png            # Reference image for Demo 3 (multimodal probing)
├── demos/
│   ├── 00_setup_verification.ipynb
│   ├── adversarial_model_evasion.ipynb
│   ├── llm_text_model_probing.ipynb
│   ├── multimodal_probing.ipynb
│   └── case_study_186_jailbreaks.ipynb
├── Breaking AI Systems - UVA Guest Lecture 2026 .pdf
├── .env-example            # Template for API keys
└── pyproject.toml

Resources

Slides: Breaking AI Systems -- UVA Guest Lecture 2026 (PDF)
Blog: 186 Jailbreaks: Applying MLOps to AI Red Teaming
Full eval notebook: AI Red Teaming Eval -- complete implementation orchestrating TAP, GOAT, and Crescendo at scale
Crucible CTF: platform.dreadnode.io/crucible

Frameworks Referenced

Framework	Focus
MITRE ATLAS	Adversarial Threat Landscape for AI Systems
OWASP Top 10 for LLM Applications	Security risks specific to LLM deployments

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
core		core
data		data
demos		demos
.env-example		.env-example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
.secrets.baseline		.secrets.baseline
Breaking AI Systems - UVA Guest Lecture 2026 .pdf		Breaking AI Systems - UVA Guest Lecture 2026 .pdf
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Breaking AI Systems: From Image Classifiers to LLM Agents

Prerequisites

Accounts

System Requirements

Setup

1. Install uv (if not already installed)

2. Clone the repository

3. Create the environment and install dependencies

4. Set your API keys

5. Launch Jupyter

6. Verify setup

Demos

Attack Surface Coverage

Models Used

Core Dependencies

Project Structure

Resources

Frameworks Referenced

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Breaking AI Systems: From Image Classifiers to LLM Agents

Prerequisites

Accounts

System Requirements

Setup

1. Install uv (if not already installed)

2. Clone the repository

3. Create the environment and install dependencies

4. Set your API keys

5. Launch Jupyter

6. Verify setup

Demos

Attack Surface Coverage

Models Used

Core Dependencies

Project Structure

Resources

Frameworks Referenced

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages