Hands-on adversarial attacks across the full AI stack -- from white-box image perturbations to LLM jailbreaking and multimodal exploitation -- plus a 186-jailbreak case study.
Guest Lecture -- CS 6501: Security of AI Systems | University of Virginia, February 2026
Raja Sekhar Rao Dheekonda | Distinguished Engineer @ Dreadnode
-
Dreadnode Platform -- platform.dreadnode.io
- Sign up for a free account
- Copy your API key from the dashboard
- Required for Demo 1 (Crucible challenge) and result tracking
-
Groq -- console.groq.com
- API key for Llama 4 Maverick model access
- Used for all LLM demos (attacker, evaluator, and target)
- Python 3.10 -- 3.13
- uv (recommended) or pip
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"git clone https://github.com/rdheekonda/breaking-ai-systems.git
cd breaking-ai-systemsuv syncCopy the example environment file and fill in your keys:
cp .env-example .env
# Edit .env with your actual API keysuv run jupyter notebook demos/Open and run demos/00_setup_verification.ipynb to confirm everything is working.
| # | Notebook | What It Does |
|---|---|---|
| Setup | 00_setup_verification.ipynb | Verify your environment, dependencies, and API keys are configured correctly before running demos |
| 1 | adversarial_model_evasion.ipynb | Force a MobileNetV2 classifier to misclassify a wolf as a "Granny Smith apple" using white-box PGD and black-box HopSkipJump attacks |
| 2 | llm_text_model_probing.ipynb | Use Tree of Attacks with Pruning (TAP) to automatically discover jailbreak prompts that bypass an LLM's safety guardrails |
| 3 | multimodal_probing.ipynb | Exploit cross-modal attack surfaces by splitting harmful intent across text and image inputs to bypass multimodal safety filters |
| Case Study | case_study_186_jailbreaks.ipynb | Analyze results from running TAP, GOAT, and Crescendo at scale -- 186 jailbreaks discovered in 137 minutes of automated red teaming |
Image Classifiers --> LLM Text --> Multimodal
| Demo | Attack Type | Technique | Access Level |
|---|---|---|---|
| 1 | Adversarial perturbation | PGD | White-box (full gradients) |
| 1 | Adversarial evasion | HopSkipJump | Black-box (query only) |
| 2 | Jailbreak generation | TAP | API access |
| 3 | Cross-modal exploitation | Text + image splitting | API access |
All LLM calls are routed through LiteLLM, providing a unified interface to 100+ providers. Swap the groq/ prefix (e.g., openai/, anthropic/, azure/, bedrock/) to target any model.
| Model | Provider | Role |
|---|---|---|
| Llama 4 Maverick 17B 128E Instruct | Groq | Attacker, Evaluator, Target (Demos 2--3, Case Study) |
| MobileNetV2 (ImageNet) | Crucible / Dreadnode | Target (Demo 1) |
| Library | Purpose |
|---|---|
| LiteLLM >= 1.40 | Unified LLM API interface |
| torchvision >= 0.17 | Pre-trained models and image transforms |
| Adversarial Robustness Toolbox >= 1.18 | PGD, HopSkipJump, and other attack implementations |
| Rich >= 13.0 | Terminal output formatting |
breaking-ai-systems/
├── core/
│ ├── __init__.py # Public API — all exports
│ ├── models.py # Shared model loading + preprocessing (MobileNetV2)
│ ├── pgd.py # White-box PGD attack (PGDResult + run_pgd)
│ ├── hop_skip_jump.py # Black-box HopSkipJump attack (HSJResult + run_hsj)
│ ├── tap.py # Tree of Attacks with Pruning (TAP) implementation
│ ├── multimodal.py # Multimodal attack interface (cross-modal probing)
│ ├── display.py # Shared Rich comparison tables
│ ├── visual.py # Matplotlib visualization helpers
│ ├── transforms.py # Image transforms for multimodal attacks
│ └── utils.py # Crucible flag submission helpers
├── data/
│ ├── reference.png # Wolf image for Demo 1 (adversarial evasion)
│ └── meth.png # Reference image for Demo 3 (multimodal probing)
├── demos/
│ ├── 00_setup_verification.ipynb
│ ├── adversarial_model_evasion.ipynb
│ ├── llm_text_model_probing.ipynb
│ ├── multimodal_probing.ipynb
│ └── case_study_186_jailbreaks.ipynb
├── Breaking AI Systems - UVA Guest Lecture 2026 .pdf
├── .env-example # Template for API keys
└── pyproject.toml
- Slides: Breaking AI Systems -- UVA Guest Lecture 2026 (PDF)
- Blog: 186 Jailbreaks: Applying MLOps to AI Red Teaming
- Full eval notebook: AI Red Teaming Eval -- complete implementation orchestrating TAP, GOAT, and Crescendo at scale
- Crucible CTF: platform.dreadnode.io/crucible
| Framework | Focus |
|---|---|
| MITRE ATLAS | Adversarial Threat Landscape for AI Systems |
| OWASP Top 10 for LLM Applications | Security risks specific to LLM deployments |