VCA-AI-201: AI & Agentic Security II
AI-201 takes AI-101's OWASP-LLM-Top-10 foundation into production agentic-system territory. Two named signature CVEs anchor the chapter: CVE-2025-68664 (LangGrinch deserialization, CVSS 9.3). A critical-severity Python pickle-deserialization chain in a popular agentic-orchestration library; and CVE-2025-9556 (LangChainGo Gonja SSTI) - the Go-language cousin of CVE-2025-65106's Jinja2 SSTI pattern, demonstrating that the bug class generalises across template-engine languages. Students reproduce both end-to-end and produce coordinated-disclosure-style writeups. The chapter references the Virtus DVLA (Daily Vulnerable LLM Application) testbed for hands-on agentic-pentesting practice; current DVLA findings are model-intrinsic and require no external coordination.
virtus-llm-owasp); free-tier-cloud-GPU pathway (Google Colab + Kaggle + HuggingFace Spaces + Lightning AI) for adversarial-suffix-search labs (GCG / AutoDAN / PAIR all benefit from GPU); multi-modal model access (LLaVA + Whisper local inference via Ollama + faster-whisper); NO hardware (see hardware platform · we update this as the kit firms up)Course Overview
AI-201 moves from individual-vulnerability awareness (AI-101's OWASP-LLM-Top-10 frame) to production-pentest discipline. Students learn to scope an agentic-system engagement, identify the trust boundaries that production systems consistently get wrong, build defensible reproduction tools, write coordinated-disclosure-quality reports, and reason about cross-language bug-class generalisation. The two signature CVEs (LangGrinch + LangChainGo Gonja) demonstrate that the canonical agentic-system bugs generalise: a Python deserialization pattern reappears in Go; a Jinja2 SSTI pattern reappears in Gonja.
The Virtus DVLA (Daily Vulnerable LLM Application) testbed is the chapter's primary lab
environment. DVLA is a deliberately-vulnerable agentic chatbot the academy maintains; current
findings are all model-intrinsic. Bug classes that would land against any
comparable model behind any comparable wrapper, so no external coordination is required for
academy students to study them. The 9-model L3-regression baseline is published in the
virtus-llm-owasp public repo.
Position relative to peer offerings. AI-201 is the first formal curriculum that treats agentic-system pentesting as a discipline in its own right (rather than as "LLM security with extra labs") and that ships the Virtus DVLA as a reproducible-by-students target. Industry workshops (HackerOne, Snyk, OffSec) offer pieces of this material; AI-201 assembles the pieces into a coherent track with CVE-coordinated-disclosure discipline as part of the curriculum.
Structuring frame, MITRE ATLAS. Where AI-101 anchored to the OWASP LLM Top 10 (a 10-item list good for Belt-3 introduction), AI-201 anchors to MITRE ATLAS (the Adversarial Threat Landscape for Artificial-Intelligence Systems; v5.1.0, November 2025; 16 tactics + 84 techniques + 56 sub-techniques + 32 mitigations + 42 case studies; backed by 16 member organisations including Microsoft and JPMorgan Chase via the MITRE Secure AI Program). ATLAS is the ATT&CK-of-AI-security: a 100+-technique knowledge base that gives a Belt-4 adversarial-AI engagement a shared vocabulary with the rest of the AI-red-team field. Every AI-201 module maps to one or more ATLAS tactics; the capstone deliverable is a MITRE-ATLAS-mapped pentest report. ATLAS's October 2025 expansion (in collaboration with Zenity Labs) added 14 new attack techniques specifically for AI Agents and Generative AI systems, the autonomous-agent attack surface AI-201 students operate against.
Pedagogy. AI-201 follows AI-101's teaching habits and adds: foundational
textbook readings continued at intermediate depth (Mitchell Ch 7-13 on NLP / reasoning /
ethics; Christian Ch 1-4 from the Prophecy section as a forward-pointer into AI-301; Karpathy
makemore Videos 1-2 + nanoGPT as the substrate-companion path);
cross-language bug-class analysis (LangChain Jinja2 + LangChainGo Gonja + LangChainJS Eta
+ LangChain4J FreeMarker as a 4-language SSTI study); production-system threat modelling
(the chapter's capstone is a threat model for a real-world LLM-app the student
doesn't control); responsible-disclosure discipline (students walk through a real
CVE coordination process from report to public disclosure); Microsoft AI Red Team
"diverse-team-first" doctrine (per their 100+ GenAI products red-teamed by
October 2024 and the AI Red Teaming Playground Labs open-source training infrastructure).
Curriculum Outline
Fourteen modules across ~12 weeks (12 originals + 2 NEW insertions: Module 4.5 academic jailbreak corpus + Module 7.5 multi-modal adversarial). Each module maps to one or more MITRE ATLAS tactics.
| Module | Topic | MITRE ATLAS tactic | Project |
|---|---|---|---|
| 1 | From OWASP to ATLAS. Production-pentest scoping | Reconnaissance + Resource Development | Scope a hypothetical engagement; produce a 2-page MITRE-ATLAS-mapped rules-of-engagement document |
| 2 | The Virtus DVLA testbed | ML Model Access + ML Attack Staging | Clone virtus-llm-owasp; reproduce one published L3-regression finding |
| 3 | Pickle / cloudpickle / dill deserialization in agentic systems | Initial Access (ML Supply Chain Compromise) | CVE-2025-68664 (LangGrinch CVSS 9.3) end-to-end reproduction |
| 4 | Cross-language SSTI, the bug-class generalisation | Execution (User Execution + Tool-Chain Compromise) | CVE-2025-9556 (LangChainGo Gonja SSTI) reproduction; pair against CVE-2025-65106 from AI-101 |
| 4.5 (NEW) | The 2023-2026 academic jailbreak corpus | Defense Evasion (Adversarial Examples + Model Bypass) | Reproduce GCG (Zou et al. 2023; arxiv 2307.15043; universal-and-transferable adversarial suffixes), AutoDAN (Liu et al. 2023; ICLR 2024; hierarchical-genetic semantically-meaningful prompts), PAIR (Chao et al. 2023; black-box jailbreak in <20 queries via LLM-as-attacker); evaluate against HarmBench (Mazeika et al. 2024; ICML; standardized 400-behavior eval) + JailbreakBench + AdvBench; primary-paper required reading |
| 5 | Tool-calling exploit patterns | Discovery + Lateral Movement | Construct a permissive tool; observe agency-confusion exploits |
| 6 | RAG-poisoning + indirect prompt injection at scale | Persistence + Collection | Build a poisoned vector-store; demonstrate the exfiltration chain |
| 7 | Agentic web-scraping + SSRF in LLM-rendered URLs | Command and Control (LLM-Mediated C2) | SSRF via LLM-generated URL; defend with allow-list |
| 7.5 (NEW) | Multi-modal adversarial attacks | Initial Access (Multi-Modal Adversarial Inputs) | Reproduce visual prompt injection against LLaVA-v1.5-13B + GPT-4o mini (the Virtual Scenario Hypnosis result reports 82.6% / 89.0% harmful-output rates. Bypasses text-only filters because the malicious instructions arrive as pixels not text); reproduce Whisper transcription-chain attack (cascaded mic → ASR → LLM pipeline; the canonical "your favorite chatbot listens to wrong instructions"); compositional multi-modal attacks (Chain of Attack CVPR 2025) |
| 8 | Coordinated disclosure discipline + NIST AI 600-1 GenAI Profile | cross-cuts to ATLAS Mitigations | Walk a hypothetical CVE coordination from report to public disclosure; reference the NIST AI Risk Management Framework + the NIST AI 600-1 GenAI Profile (released July 26, 2024) as the regulatory companion to MITRE ATLAS's tactical framework |
| 9 | Defensive architecture: input validation, output filters, sandboxing | ATLAS Mitigations | Build a defensible LangChain agent; pass red-team review |
| 10 | Model-intrinsic vs application-layer findings | ATLAS evaluation methodology | Classify 10 published LLM CVEs; map to the right defence layer |
| 11 | Multi-model regression testing | ATLAS evaluation methodology | Run a finding against 9 models (the DVLA L3 baseline); produce a MITRE-ATLAS-mapped comparison |
| 12 | Capstone. Full agentic-pentest engagement | full ATLAS spectrum | Pentest an open-source LangChain-based application; produce a 12-page MITRE-ATLAS-mapped report |
Forward-pointer to AI-301 (Anthropic SAE / interpretability). Anthropic's interpretability team has used sparse autoencoders to extract tens of millions of features from production models (Claude 3 Sonnet), with safety-relevant features explicitly identified (deception, sycophancy, bias, dangerous content; per Scaling Monosemanticity, May 2024). If you want to understand WHY a particular jailbreak works at the feature level, that's interpretability research, and AI-301's substrate↔language thesis Module 4.5 is where it lands.
AI-201 vs ADV-102 division of labour. AI-201 is the methodology + tradecraft + multi-CVE pattern depth course (12 weeks; 14 modules; full agentic-pentest engagement capstone). ADV-102 is the single-CVE deep-dive microcurriculum over CVE-2025-65106 (the shared signature lab with AI-101 Module 8). Both exist; both are valuable; students can take either or both.
How the Course Teaches: Foundational Readings (continued from AI-101)
AI-101's paired-textbook system continues at intermediate depth. Mitchell's AI: A
Guide for Thinking Humans Chapters 7-13 (NLP / reasoning / game-playing / transfer
learning / ethics) supply the down-to-earth narrative substrate; Karpathy's
makemore Videos 1-2 + the nanoGPT video carry the build-it-yourself
substrate-companion path; Christian's The Alignment Problem Chapters 1-4 (the
Prophecy section) enter as a forward-pointer into AI-301. Students at Belt 4 are also expected to
read primary academic papers, the GCG / AutoDAN / PAIR / HarmBench papers named in the
Module 4.5 row above are required reading, not background.
Sample weave (Mitchell, AI: A Guide for Thinking Humans, Ch 7, On Trustworthy AI). Mitchell's argument in Chapter 7 is that "trustworthiness" in AI systems is not a property a model has or doesn't have, but a property a deployment context produces or fails to produce. The pedagogical point for AI-201 is that the same model behind two different deployment wrappers exhibits two different security postures. Trust boundaries are drawn at the application layer, not at the weights. This is what makes the "model-intrinsic vs application-layer findings" module (Module 10) feasible: a Belt-4 pentester walks into an engagement and asks where the wrapper authors drew the boundaries, not just whether the underlying model has been jailbroken before. Mitchell's framing supplies the mental model that the engagement-scoping work in Module 1 puts into practice.
Sample weave (Zou et al. 2023, "Universal and Transferable Adversarial Attacks on Aligned Language Models", the GCG paper). Zou and colleagues introduce the Greedy Coordinate Gradient method as the foundational fully-automated approach to LLM-adversarial-suffix generation: optimise a short adversarial suffix token-by-token against the model's gradient signal until the suffix elicits the target harmful behaviour. The paper's central claim, the one that justifies its place as AI-201 required reading. Is that the discovered suffixes transfer across models they were never optimised against. A suffix found on Vicuna-7B works on GPT-3.5, on Claude, on Gemini. This transferability is what made GCG the canonical citation for the next two years of adversarial-suffix work and what makes "defend against GCG" a real Belt-4 production requirement, not a Vicuna-only curiosity. Lab 4.5 has you reproduce GCG against an open-weight model in the academy cloud-GPU pathway, then evaluate transferability against the DVLA L3 regression baseline.
The full per-chapter weave catalog and the Module 4.5 primary-paper reading list publish as
handouts/cross-chapter-ai-track-anchor-reading-guide.md and
handouts/ai-201-academic-jailbreak-corpus-required-reading.md respectively.
Learning Outcomes
- Remember. Identify the named CVEs from the chapter (CVE-2025-65106 / CVE-2025-68664 / CVE-2025-9556) and their bug classes.
- Understand. Explain why bug classes generalise across template-engine languages (Jinja2 / Gonja / Eta / FreeMarker).
- Apply. Reproduce CVE-2025-68664 (LangGrinch deserialization) end-to-end.
- Apply. Reproduce CVE-2025-9556 (LangChainGo Gonja SSTI) end-to-end.
- Apply. Use the DVLA testbed; reproduce a published L3-regression finding across 9 models.
- Analyze. Classify a finding as model-intrinsic vs application-layer; pick the right defence-layer.
- Synthesize. Pentest an open-source LangChain agent + write a coordinated-disclosure-style 10-page report.
Hands-On Labs
- Lab 2.1: DVLA clone + first L3-regression reproduction.
- Lab 3.1 (signature): CVE-2025-68664 end-to-end pickle deserialization chain.
- Lab 4.1 (signature): CVE-2025-9556 Gonja SSTI; cross-reference Jinja2 + Eta + FreeMarker.
- Lab 5.1: permissive-tool agent; agency-confusion exploit.
- Lab 6.1: RAG-poisoning lab; document-loader exfiltration chain.
- Lab 7.1: SSRF via LLM-generated URLs.
- Lab 8.1: hypothetical CVE coordination walkthrough.
- Lab 9.1: defensible LangChain agent; passes red-team review.
- Lab 11.1: multi-model regression test; 9-model comparison.
- Lab 12 (capstone): full pentest engagement + 10-page report.
Assessment
First, your project must work. both CVE reproductions land; capstone pentest report submitted. Then we score the report on three dimensions (40/30/30). reproduction depth (40%) · report quality at coordinated-disclosure practices (30%) · cross-language bug-class generalisation articulation (30%). B− minimum on Tier 2 for the certificate.
Career Outcomes & Cross-Course Bridges
- → VCA-AI-301. Capstone adversarial-AI; the substrate↔language thesis literalized via Anthropic Sparse Autoencoder + activation steering / representation engineering work; multi-track capstone slate.
- → VCA-ADV-102. Sibling-but-distinct single-CVE microcurriculum over CVE-2025-65106; AI-201 graduates take it for the deeper coordinated-disclosure walk.
- → VCA-PEN-101. Pentesting register transfers; agentic-system pentest is a sub-discipline of pentest.
- → VCA-RE-101. Reverse-engineering practices transfers; reading agentic-system source is a sub-discipline of binary analysis. AI-201's ML-in-malware cross-cut prepares students for ML-classifier RE work in re-201.
- Industry. AI red-team senior; agentic-system pentest lead; LLM-app security architect; AI red-team engineer at Microsoft AI Red Team (100+ GenAI products by Oct 2024); NVIDIA AI Red Team (garak maintainers); production AI-product security at vendor red teams (Lakera, HiddenLayer, Robust Intelligence, Adversa AI).
- Peer university references. Berkeley CS294/194-196 Agentic AI (Fall 2025) + CS294/194-280 Advanced LLM Agents (Spring 2025) cover overlapping territory at ML-research set; AI-201 differs by being cybersecurity-anchored.
Tool Journal: AI-201 Originating Entries
~15 tool-journal entries originate in AI-201; the AI-101 corpus continues at advanced depth (Karpathy makemore + nanoGPT are watched in the substrate-companion path).
- LangChainGo, the Go cousin of LangChain; runtime for CVE-2025-9556 reproduction
- Gonja, the Go template engine the SSTI lands in
- Pickle introspection tooling (pickletools, fickling). Pickle-deserialization analysis
- Virtus DVLA, the testbed itself, treated as a tool
- Promptfoo. Prompt-test framework for regression-test work
- NVIDIA garak. Carried over from AI-101 Module 7.5; deeper probe authoring at AI-201
- HarmBench, UC Berkeley + Google DeepMind + Center for AI Safety; THE standardized eval framework for automated red teaming; 400 behaviors across 7 risk categories; de facto standard for quantitative reproducible attack/defense comparison post-2024 (github.com/centerforaisafety/HarmBench)
- JailbreakBench, Chao et al. 2024; complementary lighter-weight benchmark to HarmBench (jailbreakbench.github.io)
- Microsoft PyRIT (multi-turn orchestration). Carried over from AI-101 Module 7.5; AI-201 elevates to multi-turn Crescendo / TAP / Skeleton-Key attack-strategy orchestration. The same tool 100+ Microsoft AI Red Team operations have run.
- MITRE ATLAS Navigator, the framework's canonical exploration tool; map your engagement against the 16-tactic / 84-technique knowledge base (mitre-atlas.github.io/atlas-navigator)
- Microsoft AI Red Teaming Playground Labs. Open-source training infrastructure for AI red-team trainings; canonical hands-on instructor-onboarding asset (github.com/microsoft/AI-Red-Teaming-Playground-Labs)
- vulnverifier-cve-2025-68664, the academy's reproduction harness
- Coordinated-disclosure documentation template. Standard CVE report format; cross-references NIST AI 600-1 GenAI Profile
- multi-model regression runner, 9-model L3 baseline runner
- OpenSSF Scorecard. Supply-chain security scoring for AI dependencies
- LLaVA + Whisper local inference (Ollama + faster-whisper). Multi-modal adversarial-lab substrate (Module 7.5)
Before You Start
- Have you completed AI-101? (If no → AI-101 is central prereq.)
- Have you completed PEN-101 or have equivalent pentesting experience? (If no → PEN-101 strongly recommended.)
- Are you comfortable installing Go 1.21+? (If no → go.dev/doc/install.)
- Can you read CVE writeups + vendor patch diffs fluently? (If no → AI-101 review.)
- Are you comfortable with the responsible-disclosure norms (no public exploitation against systems you don't own)? (If no → SEC-101 ethics module.)
Format Prescriptions
Hour budget: ~26 lec hr + ~50 lab hr + ~64 indep hr (= ~140 hr total).
Live
2 sessions/wk × 90 min over 12 weeks.
Night class
1-2 sessions/wk evenings; ~24 weeks.
Bootcamp
40 hr/wk × ~3.5 weeks intensive.
Async self-paced
Recorded video; AI-API budget guidance; 1:1 tutoring premium for CVE reproduction + reporting.
High school / homeschool co-op
Adapted live cadence over a school semester (~16 weeks).