feat: add composable guardrails system for input/output validation #59
Open
sergiobayona wants to merge 1 commit into chatwoot:main from
Introduce a guardrail layer that intercepts content before it reaches an agent (input guards) and before it returns to the caller (output guards). Guards are composable, ordered, and follow the same thread-safe, stateless design as Tools.

A guard's `call` method returns one of three outcomes:

- **pass** (nil or GuardResult.pass): content proceeds unchanged
- **rewrite** (GuardResult.rewrite): content is replaced before continuing to the next guard or the LLM
- **tripwire** (GuardResult.tripwire): the run is aborted immediately with a dedicated error and metadata on the RunResult

Key design decisions:

- Guards are agent-scoped (`input_guards:` / `output_guards:` kwargs), not global, enabling fine-grained per-agent policies
- Fail-open by default: a guard that raises an unexpected exception logs and passes. `strict: true` converts exceptions to tripwires
- Input guards run once before the first LLM call; output guards run only on the final response (not intermediate tool-call turns)
- Guard chains execute in array order; each guard sees the output of the previous guard's potential rewrite
- Structured output (Hash/Array from response_schema) is serialized to JSON before the guard chain and deserialized back after rewrite
- GuardRunner.run tracks rewrites across the chain and returns action: :rewrite so callers can detect changes
- Dedup check (last_message_matches?) runs after input guards so rewritten input is compared against history
- Tripwire rescue uses finalize_run with guardrail_tripwire kwarg; StandardError rescue has a safety-net re-raise for Tripwire

New files:

- lib/agents/guard.rb: base class, Tripwire exception, DSL
- lib/agents/guard_result.rb: value object (pass/rewrite/tripwire)
- lib/agents/guard_runner.rb: ordered chain executor

Integration points:

- Agent: accepts input_guards/output_guards, propagated through clone
- Runner: input guards before LLM, output guards before finalize_run, Guard::Tripwire rescue with guardrail_tripwire metadata on RunResult
- RunResult: new `guardrail_tripwire` field and `tripwired?` predicate
- CallbackManager: new `guard_triggered` event type
- AgentRunner: `on_guard_triggered` callback registration
- Instrumentation: `agents.run.guard.*` OTel spans with phase/action attributes, compatible with Langfuse

Tests: 12 new examples covering input guard rewrites, output guard rewrites, structured output guards (redact/tripwire/pass-through), dedup regression, and tripwire metadata and callback emission. Existing specs updated to stub the new guard attributes.
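For readers skimming the description, the chain semantics above (ordered execution, pass/rewrite/tripwire, fail-open versus `strict: true`, and the JSON round trip for structured output) can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: the `Sketch` module, the `run_chain` method, and the `GuardResult` constructor signatures are all assumptions made for the example, and guards are modelled as plain lambdas rather than subclasses of the PR's base class.

```ruby
require "json"

# Hypothetical, self-contained sketch; names and signatures are assumed.
module Sketch
  # Value object for the three outcomes named in the description.
  GuardResult = Struct.new(:action, :content, :message) do
    def self.pass
      new(:pass, nil, nil)
    end

    def self.rewrite(content)
      new(:rewrite, content, nil)
    end

    def self.tripwire(message)
      new(:tripwire, nil, message)
    end
  end

  # Raised to abort the run when a guard trips.
  class Tripwire < StandardError; end

  # Runs guards in array order; each guard sees the previous guard's rewrite.
  def self.run_chain(guards, content, strict: false)
    structured = content.is_a?(Hash) || content.is_a?(Array)
    current = structured ? JSON.generate(content) : content
    rewritten = false

    guards.each do |guard|
      result = begin
        guard.call(current)
      rescue Tripwire
        raise # a tripwire is never swallowed by the fail-open path
      rescue StandardError => e
        raise Tripwire, "guard raised: #{e.message}" if strict
        warn "guard failed open: #{e.message}"
        nil # fail-open: treat the exception as a pass
      end

      next if result.nil? || result.action == :pass
      raise Tripwire, result.message if result.action == :tripwire

      current = result.content
      rewritten = true
    end

    # Structured content is parsed back only if something rewrote it;
    # note that JSON.parse returns string keys.
    final = if !structured then current
            elsif rewritten then JSON.parse(current)
            else content
            end
    { action: rewritten ? :rewrite : :pass, content: final }
  end
end

# Usage: a redacting guard followed by a blocking guard.
ssn = /\d{3}-\d{2}-\d{4}/
redact = ->(text) { Sketch::GuardResult.rewrite(text.gsub(ssn, "[REDACTED]")) if text.match?(ssn) }
block  = ->(text) { Sketch::GuardResult.tripwire("blocked term") if text.include?("forbidden") }

Sketch.run_chain([redact, block], "my ssn is 123-45-6789")
```

In this sketch a guard that returns `nil` is treated the same as an explicit pass, and the returned hash carries `action: :rewrite` whenever any guard in the chain replaced the content, which matches the behaviour the description attributes to GuardRunner.run.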
Contributor, Author:
any feedback welcome

Contributor:
Hi Sergio, This looks interesting. I will take a look at this in the next few days.