
Commit ad5d962

feat: add composable guardrails system for input/output validation
Introduce a guardrail layer that intercepts content before it reaches an agent (input guards) and before it returns to the caller (output guards). Guards are composable, ordered, and follow the same thread-safe, stateless design as Tools.

A guard's `call` method returns one of three outcomes:
- **pass** (nil or GuardResult.pass): content proceeds unchanged
- **rewrite** (GuardResult.rewrite): content is replaced before continuing to the next guard or the LLM
- **tripwire** (GuardResult.tripwire): the run is aborted immediately with a dedicated error and metadata on the RunResult

Key design decisions:
- Guards are agent-scoped (`input_guards:` / `output_guards:` kwargs), not global, enabling fine-grained per-agent policies
- Fail-open by default: a guard that raises an unexpected exception logs and passes. `strict: true` converts exceptions to tripwires
- Input guards run once before the first LLM call; output guards run only on the final response (not intermediate tool-call turns)
- Guard chains execute in array order; each guard sees the output of the previous guard's potential rewrite
- Structured output (Hash/Array from response_schema) is serialized to JSON before the guard chain and deserialized back after rewrite
- GuardRunner.run tracks rewrites across the chain and returns action: :rewrite so callers can detect changes
- Dedup check (last_message_matches?) runs after input guards so rewritten input is compared against history
- Tripwire rescue uses finalize_run with guardrail_tripwire kwarg; StandardError rescue has a safety-net re-raise for Tripwire

New files:
- lib/agents/guard.rb — base class, Tripwire exception, DSL
- lib/agents/guard_result.rb — value object (pass/rewrite/tripwire)
- lib/agents/guard_runner.rb — ordered chain executor

Integration points:
- Agent: accepts input_guards/output_guards, propagated through clone
- Runner: input guards before LLM, output guards before finalize_run, Guard::Tripwire rescue with guardrail_tripwire metadata on RunResult
- RunResult: new `guardrail_tripwire` field and `tripwired?` predicate
- CallbackManager: new `guard_triggered` event type
- AgentRunner: `on_guard_triggered` callback registration
- Instrumentation: `agents.run.guard.*` OTel spans with phase/action attributes, compatible with Langfuse

Tests: 12 new examples covering input guard rewrites, output guard rewrites, structured output guards (redact/tripwire/pass-through), dedup regression, and tripwire metadata and callback emission. Existing specs updated to stub the new guard attributes.
1 parent 993e6e7 commit ad5d962

19 files changed (+1730 / -20 lines)

CLAUDE.md

Lines changed: 13 additions & 1 deletion
````diff
@@ -30,6 +30,9 @@ This project is a Ruby SDK for building multi-agent AI workflows. It allows deve
 - `lib/agents/tool.rb`: Defines the `Tool` class, the base for creating custom tools for agents.
 - `lib/agents/agent_runner.rb`: Thread-safe agent execution manager for multi-agent conversations.
 - `lib/agents/runner.rb`: Internal orchestrator that handles individual conversation turns.
+- `lib/agents/guard.rb`: Base class for guardrails — stateless input/output validators.
+- `lib/agents/guard_result.rb`: Value object for guard outcomes (pass/rewrite/tripwire).
+- `lib/agents/guard_runner.rb`: Ordered chain executor for guards with fail-open/closed modes.
 - `spec/`: Contains the RSpec tests for the project.
 - `examples/`: Includes example implementations of multi-agent systems, such as an ISP customer support demo.
 - `Gemfile`: Manages the project's Ruby dependencies.
@@ -65,7 +68,9 @@ This will start a command-line interface where you can interact with the multi-a
 - **Handoff**: The process of transferring a conversation from one agent to another. This is a core feature of the SDK.
 - **Runner**: Internal component that manages individual conversation turns (used by AgentRunner).
 - **Context**: A shared state object that stores conversation history and agent information, fully serializable for persistence.
-- **Callbacks**: Event hooks for monitoring agent execution, including agent thinking, tool start/complete, and handoffs.
+- **Callbacks**: Event hooks for monitoring agent execution, including agent thinking, tool start/complete, handoffs, and guard triggers.
+- **Guard**: A stateless validator that intercepts content before (input) or after (output) agent execution. Returns pass, rewrite (modify content), or tripwire (abort run).
+- **GuardRunner**: Executes an ordered chain of guards. Supports fail-open (default) and fail-closed (strict) error handling.

 ## Development Commands

@@ -118,6 +123,9 @@ ruby examples/isp-support/interactive.rb
 - **Agents::Context**: Shared state management across agent interactions
 - **Agents::Handoff**: Manages seamless transfers between agents
 - **Agents::CallbackManager**: Centralized event handling for real-time monitoring
+- **Agents::Guard**: Base class for guardrails (input/output content validation)
+- **Agents::GuardResult**: Value object for guard outcomes (pass/rewrite/tripwire)
+- **Agents::GuardRunner**: Ordered guard chain executor with fail-open/closed modes

 ### Key Design Principles

@@ -143,6 +151,9 @@ lib/agents/
 ├── tool_context.rb # Tool execution context
 ├── tool_wrapper.rb # Thread-safe tool wrapping
 ├── callback_manager.rb # Centralized callback event handling
+├── guard.rb # Base class for guardrails (input/output validators)
+├── guard_result.rb # Value object for guard outcomes (pass/rewrite/tripwire)
+├── guard_runner.rb # Ordered guard chain executor
 ├── message_extractor.rb # Conversation history processing
 └── version.rb # Gem version
 ```
@@ -231,6 +242,7 @@ The SDK includes a comprehensive callback system for monitoring agent execution
 - `on_tool_start`: Triggered when a tool begins execution
 - `on_tool_complete`: Triggered when a tool finishes execution
 - `on_agent_handoff`: Triggered when control transfers between agents
+- `on_guard_triggered`: Triggered when a guard produces a non-pass result (rewrite or tripwire)

 ### Callback Integration

````

docs/concepts/guardrails.md

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@

---
layout: default
title: Guardrails
parent: Concepts
nav_order: 8
---

# Guardrails

Guardrails are composable validation layers that intercept content before it reaches an agent (input guards) and before it returns to the caller (output guards). They allow you to enforce policies, redact sensitive data, and abort runs when content violates your rules.

## How Guards Work

A guard is a stateless class that receives content and returns one of three outcomes:

- **Pass** (return `nil` or `GuardResult.pass`): Content is acceptable; execution continues.
- **Rewrite** (`GuardResult.rewrite`): Replace the content with a modified version.
- **Tripwire** (`GuardResult.tripwire`): Abort the run immediately with an error.

```ruby
class PiiRedactor < Agents::Guard
  guard_name "pii_redactor"
  description "Redacts Social Security numbers from content"

  def call(content, context)
    redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")
    # Returning nil (no SSN found) counts as a pass
    Agents::GuardResult.rewrite(redacted, message: "SSN redacted") if redacted != content
  end
end
```
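
To make the three outcomes concrete without loading the SDK, here is a minimal stand-in value object. `SketchResult` and its `normalize` helper are hypothetical names; the real `Agents::GuardResult` may store different fields. The sketch only shows how a guard that returns `nil` can be treated exactly like an explicit pass.

```ruby
# Hypothetical stand-in for Agents::GuardResult; not the SDK's actual class.
SketchResult = Struct.new(:action, :output, :message, :metadata, keyword_init: true) do
  def self.pass
    new(action: :pass)
  end

  def self.rewrite(output, message: nil)
    new(action: :rewrite, output: output, message: message)
  end

  def self.tripwire(message:, metadata: {})
    new(action: :tripwire, message: message, metadata: metadata)
  end

  # A guard may return nil; treat that exactly like an explicit pass.
  def self.normalize(value)
    value.nil? ? pass : value
  end

  def pass?
    action == :pass
  end
end

# A guard written as a plain lambda, mirroring PiiRedactor#call above.
ssn_guard = lambda do |content, _context|
  redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")
  SketchResult.rewrite(redacted, message: "SSN redacted") if redacted != content
end

clean = SketchResult.normalize(ssn_guard.call("hello", {}))
dirty = SketchResult.normalize(ssn_guard.call("SSN 123-45-6789", {}))
```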

## Input Guards vs Output Guards

**Input guards** run once, before the first LLM call. They validate or transform the user's message before the agent sees it. Use them for prompt injection detection, input sanitization, or content filtering.

**Output guards** run only on the agent's final response (not on intermediate tool-call turns), before it returns to the caller. They validate or transform what the agent says back. Use them for PII redaction, topic fencing, or response quality checks.

```ruby
agent = Agents::Agent.new(
  name: "Support",
  instructions: "You are a helpful support agent.",
  input_guards: [PromptInjectionGuard.new],
  output_guards: [PiiRedactor.new, TopicFence.new]
)
```

Guards execute in array order. Each guard sees the output of the previous guard's potential rewrite, forming a processing pipeline.
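
A minimal sketch of these pipeline semantics, assuming guards are plain callables that return `nil` or a small result hash. `run_chain` is a hypothetical helper, not the actual interface of `Agents::GuardRunner`: it threads each rewrite into the next guard, stops at the first tripwire, and reports `:rewrite` if any guard changed the content (the commit message notes that `GuardRunner.run` reports `action: :rewrite` in that case).

```ruby
# Hypothetical chain executor illustrating ordered guards; not the SDK's GuardRunner.
# Each guard is a callable returning nil (pass) or a Hash with an :action key.
def run_chain(guards, content, context = {})
  current = content
  rewritten = false

  guards.each do |guard|
    result = guard.call(current, context)
    next if result.nil? || result[:action] == :pass

    case result[:action]
    when :rewrite
      current = result[:output] # the next guard sees the rewritten text
      rewritten = true
    when :tripwire
      # Short-circuit: guards later in the array never run.
      return { action: :tripwire, output: current, message: result[:message] }
    end
  end

  { action: rewritten ? :rewrite : :pass, output: current }
end

truncate = ->(text, _ctx) { text.length > 5 ? { action: :rewrite, output: text[0, 5] } : nil }
suffix = ->(text, _ctx) { { action: :rewrite, output: "#{text} [checked]" } }
deny = ->(text, _ctx) { text.include?("secret") ? { action: :tripwire, message: "secret detected" } : nil }
```

Order matters: `[truncate, suffix]` turns `"hello world"` into `"hello [checked]"`, while `[suffix, truncate]` cuts the suffix off again and yields `"hello"`.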

## Writing a Guard

Extend `Agents::Guard` and implement `call(content, context)`, returning `nil` (pass) or a `GuardResult`:

```ruby
class MaxLengthGuard < Agents::Guard
  guard_name "max_length"
  description "Tripwires if content exceeds maximum length"

  def initialize(max:)
    super()
    @max = max # configuration only, set once at construction
  end

  def call(content, context)
    if content.length > @max
      Agents::GuardResult.tripwire(
        message: "Content exceeds #{@max} characters",
        metadata: { length: content.length, max: @max }
      )
    end
  end
end
```

Guards follow the same thread-safety principles as Tools:

- No execution state in instance variables (only configuration like `@max` above)
- All shared state flows through the `context` parameter
- Guard instances are immutable after creation
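
A small illustration of these rules, assuming (as the bullets describe) that per-run state lives in the `context` argument. `KeywordCounter` is hypothetical and stands in for an `Agents::Guard` subclass; it keeps only frozen configuration on the instance and records its per-run tally in a context hash, so one frozen instance can serve several concurrent runs.

```ruby
# Hypothetical guard-like class; a real guard would subclass Agents::Guard.
class KeywordCounter
  def initialize(keyword:)
    @keyword = keyword.dup.freeze # configuration only
    freeze                        # no execution state can be added later
  end

  # Per-run state lives in the context, never in instance variables.
  def call(content, context)
    hits = content.scan(@keyword).size
    context[:keyword_hits] = context.fetch(:keyword_hits, 0) + hits
    nil # always a pass; this guard only observes
  end
end

guard = KeywordCounter.new(keyword: "refund")

# Several threads share one frozen instance, each with its own context.
contexts = Array.new(4) { {} }
threads = contexts.each_with_index.map do |ctx, i|
  Thread.new { guard.call("refund " * (i + 1), ctx) }
end
threads.each(&:join)
```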

## Tripwires

When a guard tripwires, the run aborts immediately. The `RunResult` exposes a `tripwired?` predicate and a `guardrail_tripwire` hash with structured metadata about what happened:

```ruby
result = runner.run("Tell me a secret")

if result.tripwired?
  puts result.guardrail_tripwire[:guard_name] # => "content_policy"
  puts result.guardrail_tripwire[:message]    # => "Response violates content policy"
  puts result.guardrail_tripwire[:metadata]   # => { category: "secrets" }
end
```

Tripwires short-circuit the guard chain: in a chain of three guards, if the first one tripwires, the second and third never run.
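
The commit describes tripwires as a dedicated exception (`Guard::Tripwire`) that the runner rescues and turns into `guardrail_tripwire` metadata on the `RunResult`. The following self-contained sketch shows that flow; `TripwireError`, `SketchRunResult`, and `run_guarded` are hypothetical names, and the hash keys mirror the ones in the example above (`:guard_name`, `:message`, `:metadata`).

```ruby
# Hypothetical stand-ins; the SDK's Guard::Tripwire and RunResult may differ.
class TripwireError < StandardError
  attr_reader :guard_name, :metadata

  def initialize(message, guard_name:, metadata: {})
    super(message)
    @guard_name = guard_name
    @metadata = metadata
  end
end

SketchRunResult = Struct.new(:output, :guardrail_tripwire, keyword_init: true) do
  def tripwired?
    !guardrail_tripwire.nil?
  end
end

# Runs named guards in order; the first tripwire aborts the run.
def run_guarded(guards, content)
  guards.each do |name, guard|
    verdict = guard.call(content)
    next unless verdict

    raise TripwireError.new(verdict[:message], guard_name: name, metadata: verdict.fetch(:metadata, {}))
  end
  SketchRunResult.new(output: content)
rescue TripwireError => e
  # Convert the exception into structured metadata instead of crashing the caller.
  SketchRunResult.new(
    output: nil,
    guardrail_tripwire: { guard_name: e.guard_name, message: e.message, metadata: e.metadata }
  )
end

policy = lambda do |text|
  next unless text.include?("secret")

  { message: "Response violates content policy", metadata: { category: "secrets" } }
end
```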

## Fail-Open vs Fail-Closed

By default, guards are **fail-open**: if a guard raises an unexpected exception (anything other than a Tripwire), the error is logged and the guard is skipped, which is treated as a pass. This prevents a buggy guard from breaking your entire application.

For high-security contexts, `GuardRunner` supports a **fail-closed** (strict) mode via `strict: true`. In strict mode, any unexpected guard exception is converted to a tripwire. The `Agents::Agent` constructor in this commit has no strict option; only `input_guards:` and `output_guards:` are set per agent.

```ruby
# Fail-open (default): a buggy guard is skipped and the run continues
agent = Agents::Agent.new(
  name: "Support",
  input_guards: [PotentiallyBuggyGuard.new]
)

# Fail-closed: any guard error aborts the run
# (configured via GuardRunner strict: true, typically set at the runner level)
```
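
The two modes differ only in what happens when a guard raises something other than a tripwire. The sketch below assumes the behavior described in this section (log and pass when fail-open; convert to a tripwire when strict). `invoke_guard`, its `strict:` keyword, and `TripwireSignal` are hypothetical and only mirror the `strict: true` option mentioned for `GuardRunner`.

```ruby
require "logger"
require "stringio"

# Hypothetical marker for deliberate tripwires, which must never be swallowed.
class TripwireSignal < StandardError; end

# Hypothetical helper mirroring the documented fail-open / fail-closed behavior.
def invoke_guard(guard, content, strict: false, logger: Logger.new($stderr))
  guard.call(content) || { action: :pass }
rescue TripwireSignal
  raise # deliberate tripwires always propagate
rescue StandardError => e
  if strict
    # Fail-closed: an unexpected error becomes a tripwire that aborts the run.
    { action: :tripwire, message: "Guard error: #{e.class}: #{e.message}" }
  else
    # Fail-open: log the error and let the content through unchanged.
    logger.warn("Guard failed open: #{e.class}: #{e.message}")
    { action: :pass }
  end
end

buggy = ->(_content) { raise ArgumentError, "boom" }
log_io = StringIO.new
quiet_logger = Logger.new(log_io)
```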

## Structured Output

When an agent uses `response_schema`, the LLM returns structured data (a Hash or Array). Output guards still receive a String: the SDK serializes the structured value to JSON before the guard chain and deserializes it back after any rewrite. Your guards therefore always operate on Strings, regardless of output format.

```ruby
# This guard works on both plain text and structured output
class ContentFilter < Agents::Guard
  guard_name "content_filter"

  def call(content, context)
    # content is always a String; for structured output it is JSON
    if content.include?("forbidden")
      Agents::GuardResult.tripwire(message: "Forbidden content detected")
    end
  end
end
```
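
The serialize/rewrite/deserialize step can be sketched with Ruby's standard `json` library. `guard_structured` is a hypothetical helper, and the sketch assumes a guard returns either `nil` or a `{ action: :rewrite, output: ... }` hash. How the SDK handles a rewrite that is not valid JSON is not described here; this sketch simply lets `JSON::ParserError` propagate.

```ruby
require "json"

# Hypothetical helper: run a String-based guard over structured output.
def guard_structured(value, guard)
  structured = value.is_a?(Hash) || value.is_a?(Array)
  text = structured ? JSON.generate(value) : value

  result = guard.call(text)
  return value unless result && result[:action] == :rewrite

  # Structured output goes back through the parser; plain text is returned as-is.
  structured ? JSON.parse(result[:output]) : result[:output]
end

redact_email = lambda do |text|
  redacted = text.gsub(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/, "[EMAIL REDACTED]")
  { action: :rewrite, output: redacted } if redacted != text
end

payload = { "name" => "Ada", "email" => "ada@example.com" }
safe = guard_structured(payload, redact_email)
```

A JSON round trip turns Symbol keys into String keys, so the sketch uses String keys throughout.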

## Guards Across Handoffs

Guards are agent-scoped. When agent A hands off to agent B:

- Agent A's **input guards** ran once on the original user input (before the handoff decision).
- Agent A's **output guards** do NOT run: the handoff interrupts before a final response.
- Agent B's **output guards** run on agent B's final response.

This means each agent enforces its own policies independently.

## Callbacks and Instrumentation

Guard activity is observable through the callback system:

```ruby
runner = Agents::Runner.with_agents(agent)
  .on_guard_triggered { |guard_name, phase, action, message, ctx|
    puts "Guard #{guard_name} (#{phase}): #{action} - #{message}"
  }
```

The callback fires for every non-pass result (rewrites and tripwires). It does not fire when guards pass.
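
A minimal sketch of that dispatch rule, assuming callbacks receive the same five arguments documented for `on_guard_triggered` (`guard_name, phase, action, message, context`). `emit_guard_event` is a hypothetical helper, not part of the SDK's `CallbackManager`, and passing `phase` as a Symbol is an assumption.

```ruby
# Hypothetical dispatcher: forwards guard outcomes to callbacks, skipping passes.
def emit_guard_event(callbacks, guard_name:, phase:, action:, message:, context: nil)
  return 0 if action == :pass # passes never reach callbacks

  callbacks.each { |cb| cb.call(guard_name, phase, action, message, context) }
  callbacks.size
end

events = []
callbacks = [->(*args) { events << args.first(4) }]

emit_guard_event(callbacks, guard_name: "pii_redactor", phase: :output, action: :pass, message: nil)
emit_guard_event(callbacks, guard_name: "pii_redactor", phase: :output, action: :rewrite, message: "PII redacted")
emit_guard_event(callbacks, guard_name: "prompt_injection", phase: :input, action: :tripwire,
                            message: "Potential prompt injection detected")
```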

If OpenTelemetry instrumentation is installed, guard events produce `agents.run.guard.*` spans with attributes for guard name, phase (input/output), action (rewrite/tripwire), and message.

## Complete Example

```ruby
class PromptInjectionGuard < Agents::Guard
  guard_name "prompt_injection"
  description "Detects common prompt injection patterns"

  # Frozen constant: configuration shared safely across threads
  PATTERNS = [
    /ignore\s+(all\s+)?previous\s+instructions/i,
    /you\s+are\s+now\s+a/i,
    /disregard\s+(all\s+)?prior/i
  ].freeze

  def call(content, context)
    if PATTERNS.any? { |p| content.match?(p) }
      Agents::GuardResult.tripwire(
        message: "Potential prompt injection detected",
        metadata: { input_length: content.length }
      )
    end
  end
end

class PiiRedactor < Agents::Guard
  guard_name "pii_redactor"
  description "Redacts SSNs and email addresses"

  def call(content, context)
    redacted = content
      .gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[SSN REDACTED]")
      .gsub(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/, "[EMAIL REDACTED]")

    Agents::GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content
  end
end

agent = Agents::Agent.new(
  name: "Support",
  instructions: "You are a helpful customer support agent.",
  input_guards: [PromptInjectionGuard.new],
  output_guards: [PiiRedactor.new]
)

runner = Agents::Runner.with_agents(agent)
  .on_guard_triggered { |name, phase, action, msg|
    Rails.logger.info("Guard #{name} (#{phase}): #{action} - #{msg}")
  }

result = runner.run("What is my email?")
# Output PII is automatically redacted before reaching the user
```

lib/agents.rb

Lines changed: 3 additions & 0 deletions
```diff
@@ -112,6 +112,9 @@ def configured?
 require_relative "agents/tool"
 require_relative "agents/handoff"
 require_relative "agents/helpers"
+require_relative "agents/guard_result"
+require_relative "agents/guard"
+require_relative "agents/guard_runner"
 require_relative "agents/agent"

 # Execution components
```

lib/agents/agent.rb

Lines changed: 8 additions & 3 deletions
```diff
@@ -50,7 +50,8 @@
 # )
 module Agents
   class Agent
-    attr_reader :name, :instructions, :model, :tools, :handoff_agents, :temperature, :response_schema, :headers, :params
+    attr_reader :name, :instructions, :model, :tools, :handoff_agents, :temperature, :response_schema, :headers, :params,
+                :input_guards, :output_guards

     # Initialize a new Agent instance
     #
@@ -64,7 +65,7 @@ class Agent
     # @param headers [Hash, nil] Default HTTP headers applied to LLM requests
     # @param params [Hash, nil] Default provider-specific parameters applied to LLM requests (e.g., service_tier)
     def initialize(name:, instructions: nil, model: "gpt-4.1-mini", tools: [], handoff_agents: [], temperature: 0.7,
-                   response_schema: nil, headers: nil, params: nil)
+                   response_schema: nil, headers: nil, params: nil, input_guards: [], output_guards: [])
       @name = name
       @instructions = instructions
       @model = model
@@ -74,6 +75,8 @@ def initialize(name:, instructions: nil, model: "gpt-4.1-mini", tools: [], hando
       @response_schema = response_schema
       @headers = Helpers::HashNormalizer.normalize(headers, label: "headers", freeze_result: true)
       @params = Helpers::HashNormalizer.normalize(params, label: "params", freeze_result: true)
+      @input_guards = input_guards.dup.freeze
+      @output_guards = output_guards.dup.freeze

       # Mutex for thread-safe handoff registration
       # While agents are typically configured at startup, we want to ensure
@@ -170,7 +173,9 @@ def clone(**changes)
         temperature: changes.fetch(:temperature, @temperature),
         response_schema: changes.fetch(:response_schema, @response_schema),
         headers: changes.fetch(:headers, @headers),
-        params: changes.fetch(:params, @params)
+        params: changes.fetch(:params, @params),
+        input_guards: changes.fetch(:input_guards, @input_guards),
+        output_guards: changes.fetch(:output_guards, @output_guards)
       )
     end

```
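
The `dup.freeze` calls and the `changes.fetch` pattern in `clone` are what keep guard lists immutable and inheritable. A stripped-down sketch of the same pattern follows; `GuardedAgent` is a hypothetical class that keeps only a name and the two guard attributes.

```ruby
# Hypothetical reduction of Agent to its guard attributes.
class GuardedAgent
  attr_reader :name, :input_guards, :output_guards

  def initialize(name:, input_guards: [], output_guards: [])
    @name = name
    # Copy, then freeze: later mutation of the caller's array cannot leak in,
    # and nothing can be appended to the agent's list after construction.
    @input_guards = input_guards.dup.freeze
    @output_guards = output_guards.dup.freeze
  end

  # Unspecified attributes fall back to the current values, as in Agent#clone.
  def clone(**changes)
    self.class.new(
      name: changes.fetch(:name, @name),
      input_guards: changes.fetch(:input_guards, @input_guards),
      output_guards: changes.fetch(:output_guards, @output_guards)
    )
  end
end

guards = [:injection_check]
agent = GuardedAgent.new(name: "Support", input_guards: guards)
guards << :added_later # does not affect the agent
copy = agent.clone(name: "Billing")
```

The copy is shallow: the arrays are new, but the guard objects inside them are shared, which is safe precisely because guards are stateless.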

lib/agents/agent_runner.rb

Lines changed: 14 additions & 1 deletion
```diff
@@ -54,7 +54,8 @@ def initialize(agents)
         agent_thinking: [],
         agent_handoff: [],
         llm_call_complete: [],
-        chat_created: []
+        chat_created: [],
+        guard_triggered: []
       }
     end

@@ -195,6 +196,18 @@ def on_chat_created(&block)
       self
     end

+    # Register a callback for guard triggered events.
+    # Called when a guardrail produces a non-pass result (rewrite or tripwire).
+    #
+    # @param block [Proc] Callback block that receives (guard_name, phase, action, message, context_wrapper)
+    # @return [self] For method chaining
+    def on_guard_triggered(&block)
+      return self unless block
+
+      @callbacks_mutex.synchronize { @callbacks[:guard_triggered] << block }
+      self
+    end
+
     private

     # Build agent registry from provided agents only.
```
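
`on_guard_triggered` follows the same shape as the other registration methods: a no-op when no block is given, a mutex around the shared callback hash, and `self` returned for chaining. The sketch below reproduces that shape in isolation; `CallbackRegistry` and its `count` method are hypothetical, and the concurrent registration at the end exists only to exercise the mutex.

```ruby
# Hypothetical registry mirroring AgentRunner's callback registration pattern.
class CallbackRegistry
  def initialize
    @callbacks = { guard_triggered: [] }
    @callbacks_mutex = Mutex.new
  end

  def on_guard_triggered(&block)
    return self unless block # calling without a block is a harmless no-op

    @callbacks_mutex.synchronize { @callbacks[:guard_triggered] << block }
    self # enables registry.on_x { ... }.on_y { ... }
  end

  def count(event)
    @callbacks_mutex.synchronize { @callbacks[event].size }
  end
end

registry = CallbackRegistry.new
registry.on_guard_triggered { |*| :first }.on_guard_triggered # second call has no block

# Register from several threads at once; the mutex keeps the appends safe.
10.times.map { Thread.new { registry.on_guard_triggered { |*| :threaded } } }.each(&:join)
```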

lib/agents/callback_manager.rb

Lines changed: 1 addition & 0 deletions
```diff
@@ -22,6 +22,7 @@ class CallbackManager
       agent_handoff
       llm_call_complete
       chat_created
+      guard_triggered
     ].freeze

     def initialize(callbacks = {})
```
