This repository contains minimal demo implementations of the six security design patterns discussed in the paper "Design Patterns for Securing LLM Agents against Prompt Injections". Each example is a self-contained Chainlit application demonstrating a specific pattern.
The goal is to provide clear, runnable code that showcases how to build more secure and resilient LLM agents by imposing structural constraints on their operation.
This code is for educational and demonstration purposes only.
The implementations in this repository are simplified to clearly illustrate the core principles of each security pattern. They are not production-ready and should not be used in a live environment.
Follow these steps to set up your environment to run the examples. All commands should be run from the root of this project directory.
It is highly recommended to use a virtual environment to manage dependencies.
# Create the virtual environment
python3 -m venv venv
# Activate it (on macOS/Linux)
source venv/bin/activate
# Or on Windows
# venv\Scripts\activate
Install all required Python packages from the requirements.txt
file.
pip install -r requirements.txt
The examples use the OpenAI API. You need to provide your API key in a .env
file.
Create a file named .env
in the root of this project and add your key like this:
OPENAI_API_KEY="sk-..."
Each numbered subfolder contains the implementation for one security pattern. All chainlit run
commands should be executed from the root directory of this project.
This pattern restricts the LLM to only selecting a pre-defined tool and its arguments. It cannot generate conversational text for the user, making it resilient to prompt injection.
To run this example, execute the following command:
chainlit run 01_action-selector/app1.py
01_action-selector/app2.py
is an extended version of this pattern where the LLM can pass some parameters to the tools, but in a very controlled fashion (in this case only valid order ids for the current user) so to prevent any chance of arbitrary inputs.
This pattern separates an agent's operation into two phases. First, an LLM creates a fixed, immutable plan of action based only on the user's initial prompt. Then, a separate execution process carries out that plan, preventing prompt injections encountered during execution from altering the fundamental control flow.
To run this example, execute the following command:
chainlit run 02_plan-then-execute/app.py
This pattern is used to securely process a batch of untrusted documents. Each document is "mapped" in isolation to a structured, sanitized format by one LLM. Then, a second "reducer" LLM aggregates only the clean, structured data to produce a final result, ensuring an injection in one document cannot affect the others.
To run this example, execute the following command:
chainlit run 03_llm-map-reduce/app.py
This pattern uses two distinct LLM roles to create a security firewall. A stateful "Privileged" LLM orchestrates tasks and calls tools but never sees untrusted data. It uses a separate, tool-less "Quarantined" LLM to process any untrusted content. Communication is handled via symbolic variables, ensuring the Privileged LLM's context is never tainted.
To run this example, execute the following command:
chainlit run 04_dual-llm/app.py
This pattern has an LLM generate a complete Python script that is then executed in a sandboxed interpreter. This example is presented in two versions:
app_v1.py
demonstrates the base pattern.app_v2.py
enhances the pattern by introducing a rudimentary provenance tracking system. This system is inspired by the security concepts in the paper by Debenedetti et al. (see references), where data is tagged with its source. A security policy is then enforced at the most critical point—thequarantined_llm
tool—to block it from processing data that has been concatenated from multiple different untrusted sources.
To run the version with provenance tracking, execute the following command:
chainlit run 05_code-then-execute/app_v2.py
This pattern defends against injections in the user's prompt by separating request parsing from response generation. A 'retriever' LLM first extracts only the necessary, sanitized information (e.g., a service plan name) from the user's full request. A second 'summarizer' LLM then generates the final answer using a new, clean context that contains only the retrieved data, making it impossible for it to act on the original injection.
To run this example, execute the following command:
chainlit run 06_context-minimization/app.py
-
"Design Patterns for Securing LLM Agents against Prompt Injections"
- Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, and Václav Volhejn.
- https://arxiv.org/abs/2506.08837
- This paper introduces the six core design patterns that this repository implements as practical code examples.
-
"The Dual LLM Pattern for Building AI Assistants That Can Resist Prompt Injection"
- Simon Willison.
- https://simonwillison.net/2023/Apr/25/dual-llm-pattern/
- This blog post served as the inspiration for Pattern 4 (Dual LLM).
-
"Defeating Prompt Injections by Design"
- Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr.
- https://arxiv.org/abs/2503.18813
- The concept of provenance tracking and data flow policies, demonstrated in the enhanced version of Pattern 5 (Code-Then-Execute), is inspired by the security principles discussed in this paper.