
feat: add composable guardrails system for input/output validation#59

Open
sergiobayona wants to merge 1 commit into chatwoot:main from sergiobayona:feat/guardrails

@sergiobayona
Contributor

Introduce a guardrail layer that intercepts content before it reaches
an agent (input guards) and before it returns to the caller (output
guards). Guards are composable, ordered, and follow the same
thread-safe, stateless design as Tools.

A guard's `call` method returns one of three outcomes:
- **pass** (`nil` or `GuardResult.pass`): content proceeds unchanged
- **rewrite** (`GuardResult.rewrite`): content is replaced before
  continuing to the next guard or the LLM
- **tripwire** (`GuardResult.tripwire`): the run is aborted immediately
  via a dedicated `Guard::Tripwire` error, and the tripwire metadata is
  recorded on the `RunResult`
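
To make the three outcomes concrete, here is a minimal sketch of a pass/rewrite/tripwire value object and one stateless guard. It is not the code in `lib/agents/guard_result.rb`: the `OutcomeSketch` module, the `action`/`output`/`message` fields, the factory signatures and `SampleInputGuard` are all hypothetical, chosen only to illustrate the contract described above.

```ruby
# Hypothetical sketch: names follow the PR description, but the fields and
# method signatures are assumptions, not the PR's actual GuardResult API.
module OutcomeSketch
  class GuardResult
    attr_reader :action, :output, :message

    def initialize(action:, output: nil, message: nil)
      @action = action
      @output = output
      @message = message
      freeze # immutable, so one result can be shared safely across threads
    end

    def self.pass
      new(action: :pass)
    end

    def self.rewrite(output)
      new(action: :rewrite, output: output)
    end

    def self.tripwire(message)
      new(action: :tripwire, message: message)
    end

    def pass?
      action == :pass
    end

    def rewrite?
      action == :rewrite
    end

    def tripwire?
      action == :tripwire
    end
  end

  # Hypothetical stateless guard: trips on an obvious injection phrase,
  # masks e-mail addresses, and otherwise returns nil (treated as a pass).
  class SampleInputGuard
    EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/

    def call(content)
      if content.include?("ignore previous instructions")
        return GuardResult.tripwire("prompt injection")
      end
      return GuardResult.rewrite(content.gsub(EMAIL, "[REDACTED]")) if content.match?(EMAIL)

      nil
    end
  end
end

guard = OutcomeSketch::SampleInputGuard.new
guard.call("hello")                                          # => nil
guard.call("write to a@b.com").output                        # => "write to [REDACTED]"
guard.call("please ignore previous instructions").tripwire? # => true
```

Because the guard keeps no instance state and its results are frozen, a single guard instance can be reused by concurrent runs, in the spirit of the "thread-safe, stateless design as Tools" noted above.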

Key design decisions:
- Guards are agent-scoped (`input_guards:` / `output_guards:` kwargs),
  not global, enabling fine-grained per-agent policies
- Fail-open by default: if a guard raises an unexpected exception, the
  exception is logged and the guard is treated as a pass. `strict: true`
  converts such exceptions into tripwires
- Input guards run once before the first LLM call; output guards run
  only on the final response (not intermediate tool-call turns)
- Guard chains execute in array order; each guard receives the content
  as rewritten (if at all) by the guards before it
- Structured output (a Hash/Array produced via `response_schema`) is
  serialized to JSON before the guard chain runs and deserialized again
  after a rewrite
- `GuardRunner.run` tracks rewrites across the chain and returns
  `action: :rewrite` when any guard changed the content, so callers can
  detect changes
- The dedup check (`last_message_matches?`) runs after input guards, so
  the rewritten input (not the raw input) is compared against history
- The `Guard::Tripwire` rescue calls `finalize_run` with a
  `guardrail_tripwire` kwarg; the `StandardError` rescue re-raises
  `Guard::Tripwire` as a safety net
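
The chain semantics in the list above (array order, cumulative rewrites, fail-open unless strict, a `:rewrite` action when anything changed, JSON round-tripping of structured output) can be sketched as follows. This is an illustrative model, not `lib/agents/guard_runner.rb`: the `ChainSketch` module, the `run(guards, content, strict:, logger:)` signature, the `Struct`-based `GuardResult` and the `guard_name` field on `Tripwire` are assumptions.

```ruby
require "json"

# Hypothetical sketch of an ordered guard chain. GuardResult, Tripwire and
# the run(...) signature are assumptions modelled on the PR description.
module ChainSketch
  GuardResult = Struct.new(:action, :output, :message, keyword_init: true)

  # Raised to abort the run; carries the name of the guard that fired.
  class Tripwire < StandardError
    attr_reader :guard_name

    def initialize(message, guard_name:)
      super(message)
      @guard_name = guard_name
    end
  end

  module GuardRunner
    module_function

    # Runs guards in array order; each guard sees the content as rewritten
    # by the guards before it. Returns action: :rewrite if any guard changed
    # the content, action: :pass otherwise. Raises Tripwire to abort.
    def run(guards, content, strict: false, logger: nil)
      # Structured output (Hash/Array) is checked as a JSON string.
      structured = content.is_a?(Hash) || content.is_a?(Array)
      current = structured ? JSON.generate(content) : content
      rewritten = false

      guards.each do |guard|
        name = guard.class.name
        result =
          begin
            guard.call(current)
          rescue Tripwire
            raise # never swallow a deliberate tripwire
          rescue StandardError => e
            raise Tripwire.new("#{name} raised: #{e.message}", guard_name: name) if strict

            logger&.warn("#{name} failed open: #{e.message}")
            nil # fail-open: treat the broken guard as a pass
          end

        case result&.action
        when nil, :pass
          next
        when :tripwire
          raise Tripwire.new(result.message, guard_name: name)
        when :rewrite
          current = result.output
          rewritten = true
        end
      end

      # Nothing changed: hand back the original object untouched.
      return GuardResult.new(action: :pass, output: content) unless rewritten

      # Deserialize only when a guard actually rewrote structured content.
      output = structured ? JSON.parse(current) : current
      GuardResult.new(action: :rewrite, output: output)
    end
  end
end
```

One consequence of a plain `JSON.generate`/`JSON.parse` round trip is that symbol keys come back as string keys (`{ card: "4111" }` becomes `{ "card" => "****" }` after a masking rewrite). The PR description does not say how the real implementation handles key types, so treat that detail as specific to this sketch.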

New files:
- lib/agents/guard.rb         — base class, Tripwire exception, DSL
- lib/agents/guard_result.rb  — value object (pass/rewrite/tripwire)
- lib/agents/guard_runner.rb  — ordered chain executor

Integration points:
- Agent: accepts `input_guards`/`output_guards`, propagated through `clone`
- Runner: input guards before the first LLM call, output guards before
  `finalize_run`, `Guard::Tripwire` rescue with `guardrail_tripwire`
  metadata on `RunResult`
- RunResult: new `guardrail_tripwire` field and `tripwired?` predicate
- CallbackManager: new `guard_triggered` event type
- AgentRunner: `on_guard_triggered` callback registration
- Instrumentation: `agents.run.guard.*` OTel spans with phase/action
  attributes, compatible with Langfuse
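
The Runner and RunResult integration points can be pictured with the reduced model below: input guards run once before the (stubbed) LLM call, output guards run on the final reply, a tripwire ends the run and its metadata lands on the result, and every rewrite or tripwire fires a `guard_triggered` callback. Everything here is hypothetical: `RunSketch`, `run_agent`, the `llm:` callable, the `callbacks` hash and the shape of the `guardrail_tripwire` hash stand in for the gem's `Runner`, `RunResult` and `CallbackManager`, whose real signatures are not shown in this PR description. Fail-open handling and OTel spans are left out to keep it short.

```ruby
# Hypothetical sketch of where guards sit in a run; not the gem's API.
module RunSketch
  GuardResult = Struct.new(:action, :output, :message, keyword_init: true)

  class Tripwire < StandardError
    attr_reader :guard_name

    def initialize(message, guard_name:)
      super(message)
      @guard_name = guard_name
    end
  end

  # Mirrors the new guardrail_tripwire field and tripwired? predicate.
  RunResult = Struct.new(:output, :guardrail_tripwire, keyword_init: true) do
    def tripwired?
      !guardrail_tripwire.nil?
    end
  end

  module_function

  # Applies one phase's guards in order, firing a guard_triggered
  # callback for every rewrite or tripwire.
  def apply(guards, phase, text, callbacks)
    guards.each_with_index do |guard, index|
      result = guard.call(text)
      next if result.nil? || result.action == :pass

      callbacks[:guard_triggered]&.call(phase, index, result.action)
      if result.action == :tripwire
        raise Tripwire.new(result.message, guard_name: "#{phase}[#{index}]")
      end

      text = result.output
    end
    text
  end

  def run_agent(input, llm:, input_guards: [], output_guards: [], callbacks: {})
    prompt = apply(input_guards, :input, input, callbacks) # once, before the LLM
    reply = llm.call(prompt) # intermediate tool-call turns are not modelled
    RunResult.new(output: apply(output_guards, :output, reply, callbacks))
  rescue Tripwire => e
    # The run ends early; the tripwire's metadata travels on the result.
    RunResult.new(guardrail_tripwire: { guard: e.guard_name, message: e.message })
  end
end
```

Returning a result object instead of letting the exception escape means callers branch on `tripwired?` rather than wrapping every run in a `rescue`, which matches the PR's choice to surface tripwires through `RunResult`.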

Tests: 12 new examples covering input guard rewrites, output guard
rewrites, structured output guards (redact/tripwire/pass-through),
dedup regression, and tripwire metadata and callback emission.
Existing specs updated to stub the new guard attributes.
@netlify

netlify bot commented Mar 18, 2026

Deploy Preview for ruby-ai-agents ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | ad5d962 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/ruby-ai-agents/deploys/69babcd126f4f400083cf29e |
| 😎 Deploy Preview | https://deploy-preview-59--ruby-ai-agents.netlify.app |

@sergiobayona
Contributor Author

Any feedback welcome.

@aakashb95
Contributor

Hi Sergio, this looks interesting. I will take a look at this in the next few days.
