
mozilla-ai/any-guardrail


any-guardrail



A single interface to use different guardrail models.

any-guardrail provides a unified interface for AI safety guardrails, letting you detect toxic content, jailbreak attempts, and other risks in LLM inputs and outputs. You can switch between guardrail providers, including encoder-based (discriminative) models and decoder-based (generative) models such as Llama Guard and ShieldGemma, without changing your code.

Some guardrails are highly customizable, and any-guardrail exposes their full set of options. See the complete list of supported providers and customization examples in our docs.

Why any-guardrail?

  • Unified API: Switch between an ever-growing list of guardrail providers
  • Production-ready: Built for real-world LLM applications
  • Flexible: Use encoder-based (fast) or decoder-based (customizable) models

Quickstart

Requirements

  • Python 3.11 or newer

Installation

Install with pip:

pip install any-guardrail

Basic Usage

AnyGuardrail provides a seamless interface for interacting with the guardrail models. It allows you to see a list of all the supported guardrails, and to instantiate each supported guardrail. Here is a full example:

from any_guardrail import AnyGuardrail, GuardrailName, GuardrailOutput

# Initialize guardrail
guardrail = AnyGuardrail.create(GuardrailName.DEEPSET)

user_input = "How do I hack into a system?"

# Validate input before sending to your LLM
result: GuardrailOutput = guardrail.validate(user_input)

if not result.valid:
    print(f"Blocked: {result.explanation}")
else:
    # Safe to proceed; your_llm is a placeholder for your own LLM call
    response = your_llm(user_input)
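
Because guardrails apply to both LLM inputs and outputs, the same check can wrap both sides of a call. The sketch below is a hypothetical helper, not part of any-guardrail: it assumes only that validate accepts a single string and returns an object with valid and explanation attributes, as in the example above. KeywordGuardrail and ToyVerdict are toy stand-ins so the snippet runs without downloading a model; in practice you would pass the object returned by AnyGuardrail.create.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class Verdict(Protocol):
    """The two GuardrailOutput attributes this sketch relies on."""

    valid: bool
    explanation: Optional[str]


class SupportsValidate(Protocol):
    """Assumption: any guardrail exposing validate(text) -> Verdict."""

    def validate(self, text: str) -> Verdict: ...


class GuardrailBlocked(Exception):
    """Hypothetical exception raised when a guardrail rejects text."""


def guarded_completion(
    guardrail: SupportsValidate, llm: Callable[[str], str], prompt: str
) -> str:
    """Validate the prompt, call the LLM, then validate its response."""
    verdict = guardrail.validate(prompt)
    if not verdict.valid:
        raise GuardrailBlocked(f"input blocked: {verdict.explanation}")

    response = llm(prompt)

    verdict = guardrail.validate(response)
    if not verdict.valid:
        raise GuardrailBlocked(f"output blocked: {verdict.explanation}")
    return response


# Toy stand-ins so the sketch runs offline (hypothetical, not library classes).
@dataclass
class ToyVerdict:
    valid: bool
    explanation: Optional[str] = None


class KeywordGuardrail:
    """Flags any text mentioning 'hack'; a placeholder for a real guardrail."""

    def validate(self, text: str) -> ToyVerdict:
        if "hack" in text.lower():
            return ToyVerdict(valid=False, explanation="mentions hacking")
        return ToyVerdict(valid=True)
```

With a real guardrail, the call would look like guarded_completion(guardrail, your_llm, user_input). Raising an exception rather than returning a flag makes it harder for a blocked request to continue silently.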

Every guardrail returns the same GuardrailOutput shape, so you can swap models without changing application code:

result.valid       # bool verdict — True means the content passed
result.score       # risk score in ~[0, 1], higher = more likely violating (when available)
result.categories  # per-category results: CategoryResult(name, description, triggered, score, severity)
result.explanation # human-readable rationale (judge reasoning, raw generation)
result.action      # provider-recommended action (e.g. "block"), advisory; None if none
result.usage       # provenance: model_id, latency_ms, token counts
result.extra       # guardrail-specific structured extras; result.raw holds the backend payload

flagged = [c.name for c in result.categories if c.triggered]
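
The flagged list can be extended into a simple application-side policy. The sketch below uses plain dataclasses as stand-ins that mirror the documented field names; they are not the library's Pydantic models, and the field types (notably severity) are assumptions. The 0.8 threshold is an arbitrary, hypothetical value to tune for your own risk tolerance.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CategoryResult:
    """Stand-in mirroring the documented CategoryResult fields (types assumed)."""

    name: str
    description: Optional[str] = None
    triggered: bool = False
    score: Optional[float] = None
    severity: Optional[str] = None


@dataclass
class Output:
    """Stand-in carrying only the GuardrailOutput fields used below."""

    valid: bool
    score: Optional[float] = None
    categories: List[CategoryResult] = field(default_factory=list)


def triggered_by_score(output: Output) -> List[str]:
    """Names of triggered categories, highest score first; unscored ones last."""
    hits = [c for c in output.categories if c.triggered]
    # (False, -score) sorts scored hits descending; (True, ...) pushes None scores to the end.
    hits.sort(key=lambda c: (c.score is None, -(c.score or 0.0)))
    return [c.name for c in hits]


def should_block(output: Output, threshold: float = 0.8) -> bool:
    """Hypothetical policy: block on an invalid verdict or a score at/above threshold."""
    if not output.valid:
        return True
    return output.score is not None and output.score >= threshold
```

Because these helpers only read attributes, they should also accept the real result objects, provided the attribute names match the list above.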

A machine-readable JSON Schema for this output is published in the repo (generated from the Pydantic models). Reference it at the raw URL below; to pin a specific version, replace main in the path with a release tag:

https://raw.githubusercontent.com/mozilla-ai/any-guardrail/main/schemas/guardrail_output.schema.json
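
A minimal, standard-library-only sketch for building the pinned URL and downloading the schema. The function names are hypothetical, and the ref you pass should be a branch or a tag that actually exists in the repo's releases.

```python
import json
import urllib.request

# Same path as the URL above, with the branch/tag segment made a parameter.
SCHEMA_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/mozilla-ai/any-guardrail/"
    "{ref}/schemas/guardrail_output.schema.json"
)


def schema_url(ref: str = "main") -> str:
    """Build the raw URL for the schema at a branch or release tag."""
    return SCHEMA_URL_TEMPLATE.format(ref=ref)


def load_schema(ref: str = "main", timeout: float = 10.0) -> dict:
    """Download and parse the schema (requires network access)."""
    with urllib.request.urlopen(schema_url(ref), timeout=timeout) as resp:
        return json.load(resp)
```

The parsed dict can then be handed to a JSON Schema validator, for example jsonschema.validate(instance, schema) from the third-party jsonschema package.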

Documentation

Full guides are available in the docs.

Troubleshooting

Some models on HuggingFace are gated and require extra permissions. To use them, create a HuggingFace account and request access on each model's page. Then install the huggingface_hub library and log in. One way to do this is:

pip install --upgrade huggingface_hub

hf auth login

More information can be found in the HuggingFace Hub documentation.
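
In CI or other non-interactive environments, where hf auth login cannot prompt for a token, one option is to supply the token through the HF_TOKEN environment variable, which huggingface_hub reads. The sketch below is a hypothetical helper that reads that variable and, if it is set, calls huggingface_hub.login(token=...) so the token is also saved for later sessions. huggingface_hub is imported lazily, so checking for a missing token does not require it to be installed.

```python
import os
from typing import Optional


def hf_token_from_env() -> Optional[str]:
    """Return HF_TOKEN with surrounding whitespace removed, or None if unset/empty."""
    token = os.environ.get("HF_TOKEN", "").strip()
    return token or None


def login_from_env() -> bool:
    """Log in to the HuggingFace Hub using HF_TOKEN; return False if no token is set."""
    token = hf_token_from_env()
    if token is None:
        return False
    # Imported lazily so the no-token path works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=token)
    return True
```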

Contributing to any-guardrail

The guardrail space is ever-growing. If there is a guardrail you'd like us to support, please see our CONTRIBUTING.md for details.

About

Guardrails to support any-agent
