feat: add EncoderfileProvider for running guardrails via encoderfile binaries by dni138 · Pull Request #160 · mozilla-ai/any-guardrail

dni138 · 2026-05-20T17:31:13Z

Summary

Adds EncoderfileProvider so guardrails can run against Mozilla AI's encoderfile binaries — single self-contained executables that bundle a transformer encoder + classification head, no Python ML stack required at inference time.

Auto-downloads the right per-platform .encoderfile from mozilla-ai/encoderfile (supports aarch64-apple-darwin, x86_64-apple-darwin, aarch64-linux-gnu, x86_64-linux-gnu); accepts binary_path= override for locally-built binaries.
Spawns the binary in HTTP serve mode on a free port, polls /predict for readiness, and tears the subprocess down via close() / atexit.
Works with Protectai, Jasper, Deepset, and DuoGuard out of the box — the four guardrails currently published as encoderfiles.
New encoderfile extra in pyproject.toml (pulls in huggingface-hub, hf-xet, numpy — no torch, no transformers).
Interactive cookbook at docs/cookbook/encoderfile_guardrail.ipynb runs each guardrail through both providers side-by-side, plus batched inference.
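The spawn-and-poll step described above can be sketched with the standard library alone. This is a hypothetical illustration, not the provider's code. The helper names `free_port` and `wait_until_ready`, the plain GET probe, and the rule that any HTTP status counts as "ready" are all assumptions.

```python
import socket
import time
import urllib.error
import urllib.request

# Talk to the local child directly, ignoring any HTTP(S)_PROXY settings.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port.

    The probe socket is closed before returning, so another process could
    grab the port before the server binds it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def wait_until_ready(
    base_url: str,
    timeout: float = 30.0,
    interval: float = 0.2,
    request_timeout: float = 1.0,
) -> None:
    """Poll ``{base_url}/predict`` until something answers over HTTP."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _DIRECT.open(f"{base_url}/predict", timeout=request_timeout):
                return  # 2xx response: the server is up
        except urllib.error.HTTPError:
            return  # an error status still proves a server is listening
        except (urllib.error.URLError, ConnectionError, TimeoutError, socket.timeout):
            time.sleep(interval)  # nothing listening yet; try again
    raise TimeoutError(f"{base_url} did not answer within {timeout} s")
```

The sketch deliberately treats an error status as "ready". A body-less GET may well be rejected by a `/predict` route, but any HTTP reply at all proves the server has bound its port.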

The PR is structured as atomic commits, bottom-up:

refactor(providers) — Push softmax/sigmoid + argmax + id2label lookup down from match_injection_label[_batch] into each provider's infer(). Every provider now returns the same shape (logits, scores, predicted_indices, predicted_labels), so post-processing in guardrails stops reaching into provider.model.config.id2label. Adds multi_label=True on HuggingFaceProvider for DuoGuard's sigmoid head. Touches 10 source files + the batch-inference unit tests; all 85 unit tests stay green at this commit.
feat(providers) — New EncoderfileProvider + artifact mapping + encoderfile extra + unit tests (subprocess and HTTP mocked).
docs(encoderfile) — Cookbook notebook, provider reference page, SUMMARY nav.
test(integration) — Real-binary end-to-end tests for all 4 guardrails, CI-skipped.
fix(lint) — Satisfy ruff in encoderfile provider, notebook, and tests.
build(docs) — Emit Provider reference pages from generate_api_docs.py so the GitBook validator finds the new SUMMARY entry.
fix(providers) — Gate numpy import behind the huggingface extra so the base install stays importable.
fix(providers) — Restrict downloaded encoderfile permissions to owner (Copilot review).
fix(providers) — Close stale subprocess on repeated load_model calls (Copilot review).
chore(providers) — Clarify S310 suppression comment in encoderfile HTTP calls (Copilot review).

Closes #143.

Test plan

pytest tests/unit — 85 passed locally
pre-commit run --all-files — clean (ruff, mypy, codespell, nbstripout)
pytest tests/integration/test_encoderfile.py — all 5 tests pass on macOS arm64 with real binaries downloaded from HF
Notebook executes end-to-end with all 4 guardrails (HF vs encoderfile produce equivalent valid verdicts on the same inputs)
CI lint + test job passes (all 9 checks green: docs, run-docs-tests, run-linter, run-unit-tests × 6 OS/Python combinations)
All Copilot review threads resolved (4 of 4)
Reviewer can swap any of {Protectai, Jasper, Deepset, DuoGuard} between provider=HuggingFaceProvider() and provider=EncoderfileProvider() and get equivalent results — verified manually via the cookbook notebook

🤖 Generated with Claude Code

Push label-resolution (softmax/sigmoid + argmax + id2label lookup) from `match_injection_label[_batch]` down into the provider's `infer()`. Every provider now returns the same shape (`logits`, `scores`, `predicted_indices`, `predicted_labels`), so post-processing in guardrails no longer reaches into `provider.model.config.id2label` — a HF-specific path that wouldn't work for the upcoming EncoderfileProvider. Adds a `multi_label=True` flag on `HuggingFaceProvider` so multi-label heads (DuoGuard) get sigmoid'd scores. HarmGuard and DuoGuard now read `scores` directly from the uniform output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
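The sketch below makes the new contract concrete. It is a dependency-free version of the post-processing that now lives in `infer()`: softmax for single-label heads, sigmoid when `multi_label=True`, then argmax and an `id2label` lookup. The helper name `uniform_output`, the plain-list types, and the example labels are illustrative; the PR's provider works on NumPy arrays.

```python
import math
from typing import Any


def _softmax(row: list[float]) -> list[float]:
    top = max(row)  # shift by the max so exp() cannot overflow
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def _sigmoid(row: list[float]) -> list[float]:
    # Numerically stable logistic: never calls exp() on a large positive number.
    out = []
    for x in row:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            z = math.exp(x)
            out.append(z / (1.0 + z))
    return out


def uniform_output(
    logits: list[list[float]], id2label: dict[int, str], multi_label: bool = False
) -> dict[str, Any]:
    """Turn a (batch, num_classes) logit matrix into the shared output shape."""
    squash = _sigmoid if multi_label else _softmax
    scores = [squash(row) for row in logits]
    # argmax per row; ties resolve to the lowest index
    predicted_indices = [max(range(len(row)), key=row.__getitem__) for row in scores]
    return {
        "logits": logits,
        "scores": scores,
        "predicted_indices": predicted_indices,
        "predicted_labels": [id2label[i] for i in predicted_indices],
    }
```

Because every provider hands back this shape, a guardrail can compare `predicted_labels` or threshold `scores` without ever touching `provider.model.config`.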

Run guardrails against [encoderfile](https://github.com/mozilla-ai/encoderfile) binaries instead of HuggingFace models. The provider auto-downloads the right per-platform `.encoderfile` artifact from `mozilla-ai/encoderfile`, spawns it in HTTP serve mode on a free port, polls `/predict` for readiness, and tears the subprocess down via `close()` or `atexit`. Users opt in per-guardrail:

```python
from any_guardrail.guardrails.protectai.protectai import Protectai
from any_guardrail.providers.encoderfile import EncoderfileProvider

# Swap the backend for this guardrail; the guardrail calls load_model() itself during __init__.
guardrail = Protectai(provider=EncoderfileProvider())
```

A `binary_path=` override is available for locally-built encoderfiles. Supported model IDs (matching the published HF artifacts) are mapped in `_encoderfile_artifacts.py`. Adds the `encoderfile` extra (`huggingface-hub`, `hf-xet`, `numpy`) and wires it into `all`. Refs #143.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
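Per-platform selection can be pictured as mapping the host OS and CPU onto the four target triples listed in the summary. This is a hypothetical sketch: `platform_triple` and its alias handling do not come from the PR. Only the triples themselves are from the description. How they combine with the model-ID mapping in `_encoderfile_artifacts.py` is not shown.

```python
from __future__ import annotations

import platform

# The four triples the PR says are published as encoderfiles.
_SUPPORTED = {
    ("Darwin", "aarch64"): "aarch64-apple-darwin",
    ("Darwin", "x86_64"): "x86_64-apple-darwin",
    ("Linux", "aarch64"): "aarch64-linux-gnu",
    ("Linux", "x86_64"): "x86_64-linux-gnu",
}


def _normalize_arch(machine: str) -> str:
    # macOS reports "arm64" and Windows reports "AMD64"; fold them into one spelling.
    m = machine.lower()
    if m in ("arm64", "aarch64"):
        return "aarch64"
    if m in ("x86_64", "amd64"):
        return "x86_64"
    return m


def platform_triple(system: str | None = None, machine: str | None = None) -> str:
    """Return the target triple for this host (or for the given system/machine)."""
    system = system or platform.system()
    arch = _normalize_arch(machine or platform.machine())
    try:
        return _SUPPORTED[(system, arch)]
    except KeyError:
        raise RuntimeError(
            f"no published encoderfile for {system}/{arch}; build one and pass binary_path="
        ) from None
```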

- New interactive cookbook walks through running Protectai, Jasper, Deepset, and DuoGuard via EncoderfileProvider, side-by-side with HuggingFaceProvider, plus native batched inference and lifecycle notes.
- Provider reference page documents constructor knobs, the auto-download flow, supported model IDs, and platform/lifecycle caveats.
- SUMMARY.md gains a new "Providers" section and the cookbook entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Real-binary integration tests for Protectai, Jasper, Deepset, and DuoGuard backed by EncoderfileProvider. Each test validates a safe and an unsafe prompt, plus a batched-validate case that exercises the binary's native `/predict` batching path. CI-skipped (binaries are ~800 MB each). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds an EncoderfileProvider to run supported guardrails via Mozilla AI encoderfile self-contained binaries (served locally over HTTP), and refactors post-processing so guardrails consume a provider-agnostic, uniform inference output shape.

Changes:

Introduces EncoderfileProvider (auto-download + subprocess lifecycle + HTTP inference) and an encoderfile artifact mapping.
Updates HuggingFaceProvider.infer() to return normalized outputs (logits, scores, predicted_indices, predicted_labels) and adds multi_label=True support (sigmoid scoring) for DuoGuard.
Adds unit + (CI-skipped) integration tests and documentation/cookbook content for encoderfile usage.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| tests/unit/test_unit_encoderfile_provider.py | Unit tests for encoderfile provider behavior (platform detection, download selection, HTTP parsing, lifecycle). |
| tests/unit/test_unit_batch_inference.py | Updates batch inference tests to use the new uniform inference output shape (no direct id2label / torch logits). |
| tests/integration/test_encoderfile.py | End-to-end tests that download real encoderfile binaries and validate guardrail verdicts (skipped in CI). |
| src/any_guardrail/providers/huggingface.py | Normalizes HF inference outputs; adds softmax/sigmoid scoring and `multi_label` option. |
| src/any_guardrail/providers/encoderfile.py | New provider that manages an encoderfile binary subprocess and proxies `/predict` over HTTP. |
| src/any_guardrail/providers/_encoderfile_artifacts.py | Maps supported model IDs to encoderfile artifact paths on HuggingFace. |
| src/any_guardrail/guardrails/utils.py | Refactors injection-label helpers to consume `predicted_labels` + `scores` rather than raw logits/id2label. |
| src/any_guardrail/guardrails/sentinel/sentinel.py | Switches sentinel post-processing to the new utils signature. |
| src/any_guardrail/guardrails/protectai/protectai.py | Switches protectai post-processing to the new utils signature. |
| src/any_guardrail/guardrails/pangolin/pangolin.py | Switches pangolin post-processing to the new utils signature. |
| src/any_guardrail/guardrails/jasper/jasper.py | Switches jasper post-processing to the new utils signature. |
| src/any_guardrail/guardrails/injec_guard/injec_guard.py | Switches injec_guard post-processing to the new utils signature. |
| src/any_guardrail/guardrails/harm_guard/harm_guard.py | Updates HarmGuard to read unsafe probability from `scores` instead of recomputing softmax. |
| src/any_guardrail/guardrails/duo_guard/duo_guard.py | Updates DuoGuard to use provider-provided sigmoid `scores`; sets HF provider to `multi_label=True`. |
| src/any_guardrail/guardrails/deepset/deepset.py | Switches deepset post-processing to the new utils signature. |
| pyproject.toml | Adds `encoderfile` optional extra and includes it in `all`. |
| docs/SUMMARY.md | Adds cookbook + provider reference entries for encoderfile. |
| docs/cookbook/encoderfile_guardrail.ipynb | New cookbook demonstrating swapping providers + batch inference + lifecycle cleanup. |
| docs/api/providers/encoderfile.md | Adds API reference stub for encoderfile provider. |

Comments suppressed due to low confidence (1)

src/any_guardrail/providers/encoderfile.py:262

Same as above: urlopen() is suppressed as "localhost only", but the request target is derived from self.host. If non-loopback hosts are allowed, this suppression can hide real SSRF-style lint findings. Consider enforcing loopback-only defaults (and rejecting non-loopback unless explicitly opted in), or adjust/remove the suppression.

```python
payload = json.dumps(model_inputs.data).encode("utf-8")
request = urllib.request.Request(  # noqa: S310 - localhost only
    f"{self.base_url}/predict",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request, timeout=self.request_timeout) as resp:  # noqa: S310 - localhost only
    body = resp.read()
```
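The reviewer's alternative is to make the provider loopback-only by default, with an explicit opt-in. That could look roughly like the sketch below. The PR instead kept the configurable host and reworded the suppression comment (the chore(providers) commit). Treat `is_loopback`, `check_bind_host`, and the `allow_non_loopback` flag as hypothetical.

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """True for 'localhost' and any loopback IP literal (127.0.0.0/8, ::1)."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames are not resolved here; treat as non-loopback


def check_bind_host(host: str, allow_non_loopback: bool = False) -> str:
    """Reject non-loopback hosts unless the caller explicitly opts in."""
    if not allow_non_loopback and not is_loopback(host):
        raise ValueError(
            f"refusing non-loopback host {host!r}; pass allow_non_loopback=True to override"
        )
    return host
```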


- Add `strict=True` to the batched `zip()` in the cookbook notebook (B905).
- Document `__del__` and silence the deliberate broad-except in cleanup (D105, BLE001, S110) — `__del__` must never raise.
- Split compound `assert` statements in the integration tests into one assertion per condition (PT018).
- Ruff format also reformatted the encoderfile provider and its unit tests; no functional changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The GitBook build validates every SUMMARY.md link against ``site/``; provider pages weren't being generated, so the new EncoderFile entry in SUMMARY.md was failing the validator on CI. Add a ``PROVIDERS`` registry alongside ``GUARDRAILS`` and a ``_provider_page`` helper that surfaces the provider's constructor and its ``load_model`` / ``pre_process`` / ``infer`` / ``close`` methods. Output goes to ``api/providers/``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
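A page generator of the kind described (constructor plus `load_model` / `pre_process` / `infer` / `close`) could lean on `inspect.signature` and `inspect.getdoc`. The function `provider_page` and its Markdown layout are assumptions for illustration, not the contents of `generate_api_docs.py`.

```python
import inspect

# Methods the reference page surfaces, per the commit message.
PROVIDER_METHODS = ("load_model", "pre_process", "infer", "close")


def provider_page(cls: type) -> str:
    """Render a small Markdown page: constructor signature plus selected methods."""
    lines = [f"# {cls.__name__}", "", f"`{cls.__name__}{inspect.signature(cls)}`", ""]
    for name in PROVIDER_METHODS:
        method = getattr(cls, name, None)
        if method is None:
            continue  # skip methods this provider does not define
        lines.append(f"## `{name}{inspect.signature(method)}`")
        doc = inspect.getdoc(method)
        if doc:
            lines.extend(["", doc])
        lines.append("")
    return "\n".join(lines)
```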

`any_guardrail/providers/__init__.py` eagerly imports `HuggingFaceProvider`, so a top-level `import numpy as np` made `import any_guardrail` fail on the base install (no `[huggingface]` extra). This broke the docs-tests job, which only installs the base package. Move numpy into the existing `MISSING_PACKAGES_ERROR` try-block alongside torch and transformers. `_softmax` and `_sigmoid` reference `np` only at call time, and they're only called from inside `infer()` — which already fails earlier with the same import error if huggingface deps are missing. Also re-normalizes one notebook cell's source as a JSON list per nbstripout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
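The gating pattern looks roughly like the sketch below, reduced to numpy alone (the real try-block also covers torch and transformers). Here `infer` is a stand-in function, not the provider method.

```python
# The import error is remembered instead of raised, so importing this module
# always succeeds, even on a base install without the extra.
MISSING_PACKAGES_ERROR = None
try:
    import numpy as np
except ImportError as exc:
    MISSING_PACKAGES_ERROR = exc


def _softmax(x):
    # `np` is looked up only when this body runs, so merely defining the
    # function is safe even when numpy is absent.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)


def infer(logits):
    # Fail early, chaining the original import error, if the extra is missing.
    if MISSING_PACKAGES_ERROR is not None:
        msg = "install the huggingface extra to use this provider"
        raise ImportError(msg) from MISSING_PACKAGES_ERROR
    return _softmax(np.asarray(logits, dtype=float))
```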

`_ensure_executable` was setting the execute bit for owner, group, and other on auto-downloaded binaries. That's broader than necessary on multi-user hosts. Only the owner-execute bit is needed for the provider to spawn the subprocess. Addresses Copilot review comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
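In `os`/`stat` terms, the change amounts to OR-ing in only `stat.S_IXUSR` rather than `S_IXUSR | S_IXGRP | S_IXOTH`. Below is a minimal sketch, assuming a POSIX filesystem. The name `ensure_owner_executable` is hypothetical; it stands in for `_ensure_executable`, whose body is not shown in the PR.

```python
import os
import stat


def ensure_owner_executable(path: str) -> None:
    """Add the owner-execute bit only; group and other bits are left untouched."""
    mode = os.stat(path).st_mode
    if not mode & stat.S_IXUSR:
        os.chmod(path, mode | stat.S_IXUSR)
```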

Calling `load_model()` twice on the same `EncoderfileProvider` instance would overwrite `self.process` and leak the previously-spawned binary and its port. Tear down the existing subprocess at the top of `load_model()` so the provider can be reused across model_ids cleanly. Addresses Copilot review comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
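The fix boils down to "close before you spawn". The sketch below shows that pattern with a plain `subprocess.Popen`. `SubprocessOwner` is a hypothetical stand-in, not `EncoderfileProvider`, and the terminate-then-kill escalation is an assumption about reasonable cleanup.

```python
from __future__ import annotations

import subprocess


class SubprocessOwner:
    """Hypothetical stand-in for a provider that owns at most one child process."""

    def __init__(self) -> None:
        self.process: subprocess.Popen[bytes] | None = None

    def load_model(self, argv: list[str]) -> None:
        self.close()  # reap any earlier child first so it cannot leak
        self.process = subprocess.Popen(argv)

    def close(self) -> None:
        proc, self.process = self.process, None
        if proc is None or proc.poll() is not None:
            return  # nothing running
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if the child ignored the terminate request
            proc.wait()
```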

The "localhost only" noqa comments were misleading because the bind host is configurable. The actual reason the suppression is safe is that the URL targets the encoderfile subprocess this provider spawned, with a host/port owned by the provider. Update the comments to say so. Addresses Copilot review comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ing label resolution

The uniform-shape refactor of `HuggingFaceProvider.infer` (commit f2a2b53) assumed 2D logits of shape `(batch, num_classes)` and unconditionally computed `predicted_indices = scores.argmax(axis=-1).tolist()` followed by `[id2label[i] for i in predicted_indices]`. For causal-LM-backed guardrails like ShieldGemma the logits are `(batch, seq, vocab)`, so `predicted_indices` came back as a list of lists and the dict lookup crashed with `TypeError: cannot use 'list' as a dict key`. The ShieldGemma integration test was therefore broken on this branch (it still passes on main, where `infer` returns the raw model output).

Fix: detect higher-rank logits and return the raw torch tensor as `data["logits"]` with the classification-only fields (`scores`, `predicted_indices`, `predicted_labels`) set to `None`. ShieldGemma already reads `logits[0, -1, [vocab["Yes"], vocab["No"]]]` directly and runs its own `torch.nn.functional.softmax` on the selection, so it needs a torch tensor (not numpy) and doesn't care about the classification fields. The 2D classification path is unchanged.

Verified locally:
- ShieldGemma integration test now passes (was failing).
- ProtectAI, Jasper, DuoGuard integration tests still pass (2D path).

Adds three unit tests in `test_unit_huggingface_provider.py` covering the 2D classification path, the 3D causal-LM path, and the multi-label sigmoid branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
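The rank check can be sketched with NumPy arrays standing in for model logits. `classify_or_passthrough` is a hypothetical name. For a real torch tensor, the 2D branch may also need detaching and moving to CPU before conversion; this sketch omits that step.

```python
import numpy as np


def classify_or_passthrough(logits, id2label):
    """Return the uniform dict; pass higher-rank (causal-LM) logits through untouched."""
    if logits.ndim > 2:
        # (batch, seq, vocab): hand back the original object (e.g. a torch tensor)
        # so the caller can index it and keep using its own ops on it.
        return {
            "logits": logits,
            "scores": None,
            "predicted_indices": None,
            "predicted_labels": None,
        }
    arr = np.asarray(logits, dtype=float)  # (batch, num_classes)
    shifted = arr - arr.max(axis=-1, keepdims=True)
    scores = np.exp(shifted)
    scores /= scores.sum(axis=-1, keepdims=True)
    indices = scores.argmax(axis=-1).tolist()  # flat list of ints for 2D input
    return {
        "logits": arr,
        "scores": scores,
        "predicted_indices": indices,
        "predicted_labels": [id2label[i] for i in indices],
    }
```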

… uniform infer() shape

Now that this branch introduces EncoderfileProvider and reshapes HuggingFaceProvider.infer() to a uniform dict, update CLAUDE.md so Claude Code has accurate guidance for working on this branch (and beyond, once it merges).

Changes:
- Two-layer intro: name both providers explicitly so the multi-backend story is the first thing the reader sees.
- Providers section: add the EncoderfileProvider entry with what it does, where artifacts come from, platform support, and that it's drop-in for HuggingFaceProvider on supported classifiers.
- New "Uniform infer() shape" subsection: documents the dict contract both providers honor (`logits`, `scores`, `predicted_indices`, `predicted_labels`) and the causal-LM exception where HuggingFaceProvider returns a raw torch tensor + None fields. This is the contract guardrail authors need to know to write provider-agnostic `_post_processing`.
- Dependencies: add the `encoderfile` extra and note it's torch/transformers-free.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
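One way to write that contract down is as a `TypedDict`, with a post-processing helper that only reads the shared fields. Both `UniformInferOutput` and `flagged_rows` are hypothetical names for illustration, and the `"INJECTION"` label used with them is made up.

```python
from typing import Any, List, Optional, TypedDict


class UniformInferOutput(TypedDict):
    """Hypothetical typing of the dict both providers return from infer()."""

    logits: Any  # array, or the raw torch tensor for causal LMs
    scores: Optional[Any]  # softmax/sigmoid probabilities, or None
    predicted_indices: Optional[List[int]]  # argmax per row, or None
    predicted_labels: Optional[List[str]]  # id2label lookup per row, or None


def flagged_rows(out: UniformInferOutput, unsafe_label: str) -> List[bool]:
    """Provider-agnostic post-processing: compare labels, never touch model config."""
    labels = out["predicted_labels"]
    if labels is None:
        raise ValueError("causal-LM output: read out['logits'] directly instead")
    return [label == unsafe_label for label in labels]
```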

Also updated cookbook script to avoid skipping bash commands

javiermtorres

Maybe we can add a grpc client in some followup PR, wdyt?

dni138 · 2026-05-22T15:33:59Z

> Maybe we can add a grpc client in some followup PR, wdyt?

Ahhh, I forgot about gRPC! Let me see how much of a lift it would be to add it here. I'm open to doing it either in this PR or in a new one that supports both llamafile and encoderfile.

…generate_cookbooks

`tests/unit/test_generate_cookbooks.py` was added on this branch to validate the cookbook-rendering script. It imports `generate_cookbooks` from `scripts/` via `sys.path.insert`, which trips two lint checks:
- Ruff E402: `import` not at top of file (it can't be — `sys.path` must be modified first).
- Mypy `import-not-found`: `scripts/` isn't on the import path, so mypy can't resolve the module.

Add `# noqa: E402` on the import line for ruff. For mypy, the inline `# type: ignore[import-not-found]` isn't honored under our strict + `follow_untyped_imports` config; use the canonical `[[tool.mypy.overrides]]` entry with `module = ["generate_cookbooks"]` and `ignore_missing_imports = true` instead.

Also picks up incidental trailing-whitespace cleanup in `encoderfile_guardrail.ipynb` that pre-commit's trim hook applied in the same pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@javiermtorres

`EncoderfileProvider` previously relied on `atexit.register(self.close)` for subprocess cleanup, which only fires on interpreter exit. For exception-safe use in scripts and request handlers, callers want a `with` block. Implement `__enter__` (returning `self`) and `__exit__` (calling `self.close()`). `atexit` is kept as a safety net for notebook/REPL usage where `with` would be awkward, and explicit `provider.close()` still works.

`__enter__` deliberately does *not* call `load_model()`: providers are typically constructed before the caller knows which `model_id` to load, and guardrail classes (`Protectai`, `Jasper`, etc.) call `load_model` themselves in their own `__init__`.

Updates the cookbook lifecycle section to show the `with` pattern as the recommended cleanup approach.

Tests added:
- `test_context_manager_returns_self_from_enter`
- `test_context_manager_calls_close_on_exit`
- `test_context_manager_calls_close_even_when_block_raises`

Addresses @javiermtorres's review comment on PR #160.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
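The lifecycle contract is easier to see in isolation. The sketch below is a hypothetical stand-in (`ManagedProvider` is not the real class). It shows `__enter__` returning `Self` (imported under `TYPE_CHECKING`), `__exit__` delegating to `close()`, and `atexit` as the fallback. Unregistering the atexit hook inside `close()` is an extra assumption of this sketch.

```python
from __future__ import annotations

import atexit
from types import TracebackType
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from typing import Self  # only needed by type checkers


class ManagedProvider:
    """Hypothetical stand-in showing the cleanup contract, not the real provider."""

    def __init__(self) -> None:
        self.closed = False
        atexit.register(self.close)  # safety net for REPL/notebook use

    def close(self) -> None:
        # A real provider would terminate its subprocess here; close() must be
        # safe to call more than once (explicitly, from __exit__, and at exit).
        self.closed = True
        atexit.unregister(self.close)  # nothing left for the safety net to do

    def __enter__(self) -> Self:
        return self  # deliberately does not call load_model()

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc: BaseException | None,
        tb: TracebackType | None,
    ) -> None:
        self.close()  # returning None lets any exception propagate
```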

@javiermtorres

`_free_port()` opens a socket to find a free port, closes it, and returns the number. There's a small window before the encoderfile binary calls `bind()` during which another process can grab that port. @javiermtorres reported hitting this in lumigator.

When we auto-pick the port (`self.port is None`), run the spawn-and-wait sequence up to 3 times in total, with a fresh port on each attempt. When the caller pinned a port via `EncoderfileProvider(port=NNNN)`, no retry: a bind failure is a config problem to surface, not a race.

Implementation:
- New `_BIND_RACE_RETRIES` class constant (= 3) so the retry budget is visible and adjustable.
- Pulled the Popen + state-setup out of `load_model` into a private `_spawn_subprocess` helper so the retry loop reads cleanly.
- Retry on both `RuntimeError` (subprocess exited prematurely — the common bind-race signature) and `TimeoutError` (slow startup, also plausibly port-related).

Tests added:
- `test_load_model_retries_on_bind_race_when_port_auto_picked`: first Popen returns a proc with `poll()==1` (dead), second returns a live proc; assert two Popen calls and final success.
- `test_load_model_does_not_retry_when_port_pinned`: assert exactly one Popen attempt when `port=` was supplied.
- `test_load_model_gives_up_after_max_bind_race_retries`: every retry also dies; assert `_BIND_RACE_RETRIES` total attempts and the final RuntimeError surfaces.

Addresses @javiermtorres's review comment on PR #160.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
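The retry policy can be sketched as a small function. The two callables are injected so the example runs without a real binary. `start_with_retries`, `spawn_and_wait`, and `pick_port` are hypothetical names; they are not `_spawn_subprocess` or `_free_port`.

```python
from typing import Callable, Optional

BIND_RACE_RETRIES = 3  # total attempts when the port is auto-picked


def start_with_retries(
    spawn_and_wait: Callable[[int], None],
    pick_port: Callable[[], int],
    port: Optional[int] = None,
    attempts: int = BIND_RACE_RETRIES,
) -> int:
    """Start the server, retrying with a fresh port only if the port was auto-picked.

    ``spawn_and_wait(port)`` should raise RuntimeError if the child exits early
    (typical of a lost bind race) or TimeoutError if it never becomes ready.
    """
    if port is not None:
        spawn_and_wait(port)  # pinned port: a failure is a config problem, so no retry
        return port
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        candidate = pick_port()  # fresh port on every attempt
        try:
            spawn_and_wait(candidate)
        except (RuntimeError, TimeoutError):
            if attempt == attempts:
                raise  # budget exhausted: surface the last failure
        else:
            return candidate
    raise AssertionError("unreachable: the loop either returns or re-raises")
```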

CI runs ruff 0.15.12 (pinned in .pre-commit-config.yaml), but my local pre-commit cache had 0.15.10. The newer version flagged three rules my local run let through:
- **EM101** on `raise RuntimeError("boom")` in `test_context_manager_calls_close_even_when_block_raises` — assign the message to a variable first.
- **PT012** on the same test's `pytest.raises()` block holding two statements (load_model + raise). Pull the body out into a small helper function so the `pytest.raises` block only contains one call.
- **PYI034** on `EncoderfileProvider.__enter__ -> EncoderfileProvider` — context manager methods should return `Self` so subclasses keep the right inferred type. Import `Self` under `TYPE_CHECKING` since the module already uses `from __future__ import annotations`.

Also reword the comment block above `import generate_cookbooks` in `test_generate_cookbooks.py`: the prose contained the literal string "# noqa: E402" inside backticks, which ruff 0.15.12 parses as a real (but invalid) noqa directive and emits a warning about. Rephrasing to "the ruff E402 suppression" sidesteps the false positive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@javiermtorres

…ark.heavy markers (#163)

* refactor(tests): replace RUNNING_IN_CI gating with @pytest.mark.e2e marker

@javiermtorres flagged that integration tests today decide for themselves whether to skip based on `RUNNING_IN_CI = os.environ.get("CI") == "true"`, and pointed out that the right place for that decision is the runner — either via `pytest -m ...` selection or via the CI workflow. Refactor to move the gating out of the test files.

Changes:
- `pyproject.toml`: register the `e2e` and `heavy` markers under `[tool.pytest.ini_options]`. Add `--strict-markers` so typos surface as collection errors instead of silently selecting nothing.
- `tests/integration/conftest.py`: new directory-scope conftest that auto-applies `@pytest.mark.e2e` to every test under `tests/integration/`. By convention everything in this directory hits real binaries, downloaded models, or external APIs — so marking by directory keeps individual test files boilerplate-free and ensures new e2e tests pick up the marker automatically.
- `tests/integration/test_huggingface_guardrails.py`: drop the module-level `RUNNING_IN_CI` import and the per-param `skipif(RUNNING_IN_CI, ...)` for the three big-model parameters. Replace with the marker applied by the conftest, plus per-param `marks=pytest.mark.heavy` on shield_gemma / glider / granite_guardian (the ones that need >5 GB RAM).
- `tests/integration/test_granite_guardian.py`: drop `RUNNING_IN_CI` and the module-level `pytestmark = skipif(...)`. Add `pytestmark = pytest.mark.heavy` (`e2e` comes from the conftest).

Selection matrix after the change (verified locally):
- `pytest -m e2e tests/integration` → 21 tests (everything).
- `pytest -m "e2e and not heavy" tests/integration` → 15 tests (deepset, duo_guard, harm_guard, injec_guard, jasper, pangolin, llama_guard, protectai, sentinel, off_topic, alinia, any_llm, 3x azure).
- `pytest -m "e2e and heavy" tests/integration` → 6 tests (shield_gemma, glider, granite_guardian, 3x granite_guardian).
- `pytest -m "not e2e" tests/integration` → 0 tests (clean).

Note: `tests/integration/test_encoderfile.py` (which lands on the encoderfile-provider branch, PR #160) still uses the old `RUNNING_IN_CI` pattern; it'll be migrated to the marker as part of that branch's rebase onto main once both PRs land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(integration): skip heavy e2e tests by default; add include_heavy dispatch input

Before this commit the integration workflow ran every test in `tests/integration/`, relying on each test file's `RUNNING_IN_CI` skipif to filter out the >5 GB-model ones. The previous commit replaced that pattern with the `e2e` / `heavy` pytest markers, so the workflow now needs to do the selection.
- Default invocation (push to main, or manual without checking the box): `pytest -m "e2e and not heavy" tests/integration` — runs exactly the set that used to run with `CI=true`. No behavior change for normal CI.
- Manual `workflow_dispatch` with `include_heavy=true`: `pytest -m e2e tests/integration` — also runs ShieldGemma, Glider, Granite Guardian. Timeout bumped to 1800s for those (the previous 600s was tight even before, with reruns).

The `workflow_dispatch.inputs.include_heavy` checkbox surfaces the intent @javiermtorres asked for: an e2e CI task that can be manually triggered when you actually want to exercise the heavy models.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… CI version (#166)

`pre-commit run` reuses a cached virtualenv per hook revision, which means after a ruff version bump in `.pre-commit-config.yaml` your local cache can keep running the old version until you `pre-commit clean`. That's how PR #160 hit a CI lint failure on rules (EM101, PT012, PYI034) that the local pre-commit had silently allowed: local cache was on ruff 0.15.10, CI's pinned ruff is 0.15.12.

Add a sub-bullet under the Ruff entry in the Code Quality section documenting:
- The mismatch mode and how to spot it (CI lints fail on rules the local run didn't flag).
- The fix: `pre-commit clean && pre-commit run --all-files`.
- A one-liner for confirming which ruff version each cached env is actually on: `find ~/.cache/pre-commit -name ruff -type f -exec {} --version \;`

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@javiermtorres

Forward-port of the same change @javiermtorres requested for EncoderfileProvider in PR #160 (commit 87ade6a). LlamafileProvider mirrors EncoderfileProvider's subprocess+HTTP pattern and benefits from the same exception-safe cleanup.
- Add `__enter__` (returning `Self`) and `__exit__` (calling `self.close()`) to LlamafileProvider. `Self` imported under `TYPE_CHECKING` (file already has `from __future__ import annotations`).
- Keep `atexit.register(self.close)` as the non-`with` safety net.
- Class docstring grows a `with LlamafileProvider() as provider: ...` example.
- Cookbook lifecycle cell promotes the `with` block to bullet 1 (matches what the encoderfile cookbook now does).
- CLAUDE.md provider bullets for both encoderfile and llamafile now mention "Implements the context manager protocol" for consistency (encoderfile already has the method post-#160-merge; just the doc bullet was lagging).

Tests added:
- `test_context_manager_returns_self_from_enter`
- `test_context_manager_calls_close_on_exit`
- `test_context_manager_calls_close_even_when_block_raises` (uses the helper-function pattern to satisfy ruff 0.15.12's PT012).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…s for llamafile

Forward-port of PR #160 commit 9203c78 to LlamafileProvider. Same TOCTOU concern with `_free_port()`: it returns a port that was free a moment ago, but another process can grab it before the llamafile binary binds. The encoderfile fix benefited both providers in principle, but the encoderfile commit only touched `src/any_guardrail/providers/encoderfile.py`, so llamafile needed its own copy of the retry logic.

Implementation:
- New `_BIND_RACE_RETRIES = 3` class constant with the same docstring rationale.
- Pulled the Popen + state-setup out of `load_model` into a private `_spawn_subprocess` helper so the retry loop reads cleanly.
- Retry on `(RuntimeError, TimeoutError)` only when `self.port is None` (auto-picked). When the caller pinned a port, surface failures immediately — that's a config issue, not a race.

Tests added (mirror the encoderfile suite):
- `test_load_model_retries_on_bind_race_when_port_auto_picked`
- `test_load_model_does_not_retry_when_port_pinned`
- `test_load_model_gives_up_after_max_bind_race_retries`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #160 (encoderfile-provider) merged to main via squash, which made main's history disjoint from the merge chain that llamafile-provider had been using to reach the same encoderfile content. Conflicts surfaced in 6 files but in every case llamafile-provider's HEAD was the strict superset of main's squashed snapshot:
- `CLAUDE.md` (4 hunks): HEAD has both LlamafileProvider mentions and the context-manager note on both provider bullets; main has neither.
- `docs/SUMMARY.md` (2 hunks): HEAD has the llamafile cookbook + API page entries.
- `pyproject.toml` (2 hunks): HEAD has the `llamafile` optional extra plus its entry in the `all` aggregator.
- `scripts/generate_api_docs.py` (1 hunk): HEAD adds `LlamafileProvider` to the `PROVIDERS` registry.
- `src/any_guardrail/providers/huggingface.py` (1 hunk): HEAD adds the `generate_chat()` method.
- `tests/unit/test_unit_huggingface_provider.py` (1 hunk): HEAD adds the `_make_chat_provider` helper and the `generate_chat` test suite.

All resolved by taking HEAD. The resulting tree is byte-identical to HEAD before the merge; this commit only fixes the topology so the PR can merge.

Verified: 133/133 unit tests pass; pre-commit (ruff 0.15.12, mypy strict, codespell, nbstripout) is clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@javiermtorres

…161)

* refactor(providers): uniform inference output across providers
* feat(providers): add EncoderfileProvider for encoderfile binaries
* docs(encoderfile): add cookbook notebook and provider reference page
* test(integration): cover encoderfile-backed guardrails end-to-end
* fix(lint): satisfy ruff in encoderfile provider, notebook, and tests
* build(docs): emit Provider reference pages from generate_api_docs.py
* fix(providers): gate numpy import behind the huggingface extra
* fix(providers): restrict downloaded encoderfile permissions to owner
* fix(providers): close stale subprocess on repeated load_model calls
* chore(providers): clarify S310 suppression in encoderfile HTTP calls
* fix(huggingface): handle causal-LM 3D logits in infer() without crashing label resolution
* feat(providers): add Provider.generate_chat() opt-in chat method

Introduces a `generate_chat()` method on the Provider base class for chat-style decoder LLM workflows that don't fit the existing pre_process / infer pipeline. The default raises NotImplementedError naming the provider class, so encoder-only and API-shaped providers (Encoderfile, AzureContentSafety, etc.) are unaffected. The contract returns a uniform `GuardrailInferenceOutput[AnyDict]` with `generated_text` (decoded new tokens only), `prompt_token_count`, `completion_token_count`, and `raw`. Pushing decoding into the provider lets downstream guardrails operate on a string and stay provider-agnostic — the same call site can be backed by HuggingFaceProvider (tensor + tokenizer) or LlamafileProvider (HTTP).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(huggingface): implement generate_chat() on HuggingFaceProvider

Wires up the chat-generation path that was previously duplicated inside each decoder-LLM guardrail (apply_chat_template + model.generate + slice + decode). Defaults match the existing inline behavior: `add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"`. Callers can override any of these via `chat_template_kwargs`, which is also how RAG documents and tool lists are forwarded. Sampling is opt-in (`do_sample=False` greedy by default); when sampling, `temperature` is forwarded only if also provided. `generation_kwargs` is a pass-through for less common params like `pad_token_id` that some models need. Adds 5 unit tests covering decode-only-new-tokens, kwarg pass-through, sampling vs greedy, and the raw-output field.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(granite_guardian): route generation through provider.generate_chat() GraniteGuardian was reaching directly into `provider.tokenizer` and `provider.model.generate(...)`, which made it impossible to back with any provider other than HuggingFace. Move all the tensor / tokenizer plumbing into `HuggingFaceProvider.generate_chat()` so the guardrail just shapes messages, dispatches to the provider, and parses a string. What stays: - `_build_guardian_block` and `_build_messages` (Granite-specific prompt assembly) - The yes/no/think regex parsing and `_parse_generation` helper - All public behavior — the wire-format prompt to the HF model is identical pre/post refactor What moves out: - `apply_chat_template`, device placement, `torch.no_grad() + generate`, output slicing, and `tokenizer.decode` — all now live inside `HuggingFaceProvider.generate_chat()` - `import torch` is gone from this file — granite_guardian.py is now torch-free and provider-agnostic Tests updated to mock `provider.generate_chat` instead of `apply_chat_template` / `model.generate` / `tokenizer.decode`. Adds a test confirming that a non-chat-capable provider surfaces a clean NotImplementedError from `validate()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(llama_guard): route generation through provider.generate_chat() Mirrors the GraniteGuardian refactor: replace direct `provider.tokenizer.apply_chat_template` / `provider.model.generate` / `provider.tokenizer.decode` calls with a single call to `provider.generate_chat(...)`. The guardrail now just builds the chat conversation per model variant, dispatches, and substring-checks the returned text for "unsafe". Behavior preservation: - The per-variant conversation shape (multimodal-content list vs. plain string) is unchanged. 
- Llama Guard 3 explicitly passes `add_generation_prompt=False` via chat_template_kwargs to match the prior tokenizer_params; v4 keeps the new default of True. - Llama Guard 3 keeps `pad_token_id=0` via generation_kwargs to preserve its prior generation behavior. - `_cached_model_inputs` and the v3-vs-v4 post-processing branching are gone — `generate_chat` returns pre-sliced, pre-decoded text in a uniform shape. Drops `import torch`. Refactoring LlamaGuard alongside GraniteGuardian proves the `generate_chat` abstraction works for more than one caller without changing public behavior. ShieldGemma stays on direct provider access (its token-level-logits use case can't be expressed through a chat-completion API); Glider uses HF pipelines and is out of scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(providers): add LlamafileProvider for decoder-LLM guardrails `llamafile` packages a decoder LLM (llama.cpp + GGUF weights) into a single Cosmopolitan APE executable that runs on Linux, macOS, and BSD. This provider auto-downloads the binary from HuggingFace, spawns it as a subprocess server, and routes `generate_chat()` calls to `POST /v1/chat/completions`. Together with the GraniteGuardian / LlamaGuard refactors, this lets users back a chat-style guardrail with llamafile instead of HF + torch with no code changes to the guardrail. Implementation notes: - Multi-platform binary, so no per-arch artifact tag (unlike encoderfile). The curated `LLAMAFILE_ARTIFACTS` map starts with `ibm-granite/granite-guardian-4.1-8b` -> the mozilla-ai 0.10 alpha artifact; power users can pass `binary_path=` or `repo_id`+`filename` to bypass the map. - macOS arm64 cannot exec Cosmopolitan APE binaries directly (the kernel doesn't recognize the `MZ` magic). The APE prelude is valid POSIX shell that exec's into the binary, so we spawn via `sh`, which works portably across Linux and macOS. 
The shell exec's, so Popen's PID is still the llama server PID — terminate() works. - Llamafile 0.10 uses modern llama.cpp flags: `--server` is implicit when `--port` is given, `--nobrowser` was removed (use `--no-webui`), and `--jinja` is the default. Surfaces `n_gpu_layers`, `context_size`, and `extra_args` for advanced use. - `pre_process` and `infer` raise NotImplementedError. Llamafile is a chat-style backend; `generate_chat` is the only inference entry point. Decoder-LLM guardrails route through it automatically. Includes 20 unit tests (mocked subprocess + urlopen + hf_hub_download) and a `skipif(RUNNING_IN_CI)` integration test that runs the real ~6.92 GB Granite Guardian llamafile end-to-end against safe + unsafe prompts. Adds the `llamafile` extra to pyproject.toml and folds it into `all`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(cookbook): add llamafile vs HuggingFace Granite Guardian comparison Side-by-side cookbook showing the same `GraniteGuardian` class backed by `HuggingFaceProvider` and `LlamafileProvider`, with the provider swap as the only code change. Walks through the HARM criterion, a JAILBREAK criterion reusing the running llamafile, a bring-your-own custom criterion, and lifecycle (atexit, explicit close, local binary_path, repo_id/filename overrides, GPU offload via n_gpu_layers). Calls out the 6.92 GB first-run download up front so readers can plan disk space. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cookbook): pass AutoModelForCausalLM when building HF provider for Granite Guardian `HuggingFaceProvider()` defaults to `AutoModelForSequenceClassification`, which can't load Granite Guardian (a causal LM). 
The default provider that `GraniteGuardian.__init__` builds internally picks the right classes, but when the cookbook supplied its own provider it bypassed that, so the first code cell raised: ValueError: Unrecognized configuration class GraniteConfig for this kind of AutoModel: AutoModelForSequenceClassification. Pass `model_class=AutoModelForCausalLM` and `tokenizer_class=AutoTokenizer` explicitly in the side-by-side example, and add a comment explaining when the explicit classes are needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(llamafile): don't silently force greedy when sampling without an explicit temperature `LlamafileProvider.generate_chat()` was unconditionally sending `temperature` in the request body, defaulting to `0` whenever the caller didn't supply an explicit value. With `do_sample=True` but no `temperature=`, this collapsed to greedy decoding (temperature=0) on the llamafile side — silently divergent from `HuggingFaceProvider`, which only forwards `temperature` when explicitly set. Now: - `do_sample=False` (greedy, the default) pins `temperature=0`. - `do_sample=True` forwards `temperature` only when the caller provides one; otherwise the field is omitted and the llamafile server uses its own default. Adds two regression tests: - `test_generate_chat_omits_temperature_when_sampling_without_explicit_value` - `test_generate_chat_pins_temperature_zero_in_greedy_mode` Addresses Copilot PR review comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(guardrails): lazy-import transformers in granite_guardian and llama_guard Both modules eagerly imported `transformers` at module load time, which made `from any_guardrail.guardrails.granite_guardian import GraniteGuardian` (and the LlamaGuard equivalent) crash with ImportError on installs that have only `any-guardrail[llamafile]` (no `transformers`/`torch`). 
That defeats the whole point of `LlamafileProvider`: a user can't pass `provider=LlamafileProvider(...)` if importing the guardrail class itself already requires the HF stack. Move the `transformers` imports inside the `provider is None` branch of each guardrail's `__init__`, so they only run when the default HuggingFaceProvider is actually being constructed. LlamaGuard's `__init__` is restructured slightly: the v3-vs-v4 chat_template_kwargs decision now happens before the provider branch (it's a guardrail-level concern, not a provider-level one), and only the default-provider construction depends on the lazy import. Addresses Copilot PR review comments. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: register LlamafileProvider in API doc generator and SUMMARY `scripts/convert_to_gitbook.py` excludes `docs/api/**` from the GitBook output and instead relies on `scripts/generate_api_docs.py` to emit fully-rendered provider pages into `site/api/`. The mkdocstrings stub at `docs/api/providers/llamafile.md` was therefore unreachable in the published docs — it only worked for local `mkdocs serve`. Add `LlamafileProvider` to the generator's `PROVIDERS` registry so it emits an `api/providers/llamafile.md` reference page, and link both the API page and the new cookbook from `docs/SUMMARY.md` so they appear in GitBook navigation. Addresses Copilot PR review comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style(tests): apply ruff 0.15.12 formatting to llamafile provider tests The pinned ruff version in `.pre-commit-config.yaml` (0.15.12) collapses single-line-eligible function signatures that older ruff versions left wrapped. Local pre-commit was running an older cached version, so the diff slipped through; CI's `run-linter` job caught it. Pure formatting change — no semantic edits. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(cookbook): add per-call timings to HF vs llamafile comparison Wrap each `validate()` call in `time.perf_counter()` so readers can see the actual per-prompt cost on both backends and a totals/ratio line at the end. Adds a note that the first prompt pays a one-time warm-up cost on each side (model load into caches, kv-cache allocation), so the second and third rows are the steady-state numbers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(claude.md): document EncoderfileProvider, encoderfile extra, and uniform infer() shape Now that this branch introduces EncoderfileProvider and reshapes HuggingFaceProvider.infer() to a uniform dict, update CLAUDE.md so Claude Code has accurate guidance for working on this branch (and beyond, once it merges). Changes: - Two-layer intro: name both providers explicitly so the multi-backend story is the first thing the reader sees. - Providers section: add the EncoderfileProvider entry with what it does, where artifacts come from, platform support, and that it's drop-in for HuggingFaceProvider on supported classifiers. - New "Uniform infer() shape" subsection: documents the dict contract both providers honor (`logits`, `scores`, `predicted_indices`, `predicted_labels`) and the causal-LM exception where HuggingFaceProvider returns a raw torch tensor + None fields. This is the contract guardrail authors need to know to write provider-agnostic _post_processing. - Dependencies: add the `encoderfile` extra and note it's torch/transformers-free. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(claude.md): document LlamafileProvider, generate_chat() contract, and llamafile extra Mirrors the encoderfile update for the llamafile addition on this branch: - Two-layer intro: name LlamafileProvider alongside the other providers so the multi-backend story is exhaustive. 
- Providers section: add the LlamafileProvider entry with the chat-only contract, the sh-bootstrap rationale for macOS arm64, the GPU offload knob, and the artifact-map location. - New "Opt-in generate_chat() for chat-style decoder LLMs" subsection (parallel to the existing "Uniform infer() shape" one) documenting the contract both HF and llamafile providers honor and that GraniteGuardian / LlamaGuard route through it instead of touching tokenizer/model directly. - Adding-a-new-guardrail step 5: call out the lazy-import pattern decoder-LLM guardrails must follow so a `[llamafile]`-only install doesn't ImportError at module load. - Dependencies: add the `llamafile` extra. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: clarify encoderfile artifact availability and cleanup examples Also updated cookbook script to avoid skipping bash commands * fix(tests): suppress lint warnings for dynamic script import in test_generate_cookbooks `tests/unit/test_generate_cookbooks.py` was added on this branch to validate the cookbook-rendering script. It imports `generate_cookbooks` from `scripts/` via `sys.path.insert`, which trips two lint checks: - Ruff E402: `import` not at top of file (it can't be — `sys.path` must be modified first). - Mypy `import-not-found`: `scripts/` isn't on the import path, so mypy can't resolve the module. Add `# noqa: E402` on the import line for ruff. For mypy, the inline `# type: ignore[import-not-found]` isn't honored under our strict + `follow_untyped_imports` config; use the canonical `[[tool.mypy.overrides]] module = ["generate_cookbooks"] ignore_missing_imports = true` instead. Also picks up incidental trailing-whitespace cleanup in `encoderfile_guardrail.ipynb` that pre-commit's trim hook applied in the same pass. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(providers): add context manager protocol to EncoderfileProvider `EncoderfileProvider` currently relied on `atexit.register(self.close)` for subprocess cleanup, which only fires on interpreter exit. For exception-safe use in scripts and request handlers, callers want a `with` block. Implement `__enter__` (returning `self`) and `__exit__` (calling `self.close()`). `atexit` is kept as a safety net for notebook/REPL usage where `with` would be awkward, and explicit `provider.close()` still works. `__enter__` deliberately does *not* call `load_model()`: providers are typically constructed before the caller knows which `model_id` to load, and guardrail classes (`Protectai`, `Jasper`, etc.) call `load_model` themselves in their own `__init__`. Updates the cookbook lifecycle section to show the `with` pattern as the recommended cleanup approach. Tests added: - `test_context_manager_returns_self_from_enter` - `test_context_manager_calls_close_on_exit` - `test_context_manager_calls_close_even_when_block_raises` Addresses @javiermtorres's review comment on PR #160. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(providers): retry on TOCTOU port-bind race when auto-picking ports `_free_port()` opens a socket to find a free port, closes it, returns the number. There's a small window before the encoderfile binary calls `bind()` during which another process can grab that port. @javiermtorres reported hitting this in lumigator. When we auto-pick the port (`self.port is None`), retry the spawn-and- wait sequence up to 3 times with a fresh port on each attempt. When the caller pinned a port via `EncoderfileProvider(port=NNNN)`, no retry: a bind failure is a config problem to surface, not a race. Implementation: - New `_BIND_RACE_RETRIES` class constant (= 3) so the retry budget is visible and adjustable. 
- Pulled the Popen + state-setup out of `load_model` into a private `_spawn_subprocess` helper so the retry loop reads cleanly. - Retry on both `RuntimeError` (subprocess exited prematurely — the common bind-race signature) and `TimeoutError` (slow startup, also plausibly port-related). Tests added: - `test_load_model_retries_on_bind_race_when_port_auto_picked`: first Popen returns a proc with `poll()==1` (dead), second returns a live proc; assert two Popen calls and final success. - `test_load_model_does_not_retry_when_port_pinned`: assert exactly one Popen attempt when `port=` was supplied. - `test_load_model_gives_up_after_max_bind_race_retries`: every retry also dies; assert `_BIND_RACE_RETRIES` total attempts and the final RuntimeError surfaces. Addresses @javiermtorres's review comment on PR #160. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(lint): satisfy ruff 0.15.12 lints introduced after my local cache CI runs ruff 0.15.12 (pinned in .pre-commit-config.yaml), but my local pre-commit cache had 0.15.10. The newer version flagged three rules my local run let through: - **EM101** on `raise RuntimeError("boom")` in `test_context_manager_calls_close_even_when_block_raises` — assign the message to a variable first. - **PT012** on the same test's `pytest.raises()` block holding two statements (load_model + raise). Pull the body out into a small helper function so the `pytest.raises` block only contains one call. - **PYI034** on `EncoderfileProvider.__enter__ -> EncoderfileProvider` — context manager methods should return `Self` so subclasses keep the right inferred type. Import `Self` under `TYPE_CHECKING` since the module already uses `from __future__ import annotations`. Also reword the comment block above `import generate_cookbooks` in `test_generate_cookbooks.py`: the prose contained the literal string "# noqa: E402" inside backticks, which ruff 0.15.12 parses as a real (but invalid) noqa directive and emits a warning about. 
Rephrasing to "the ruff E402 suppression" sidesteps the false positive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(providers): add context manager protocol to LlamafileProvider Forward-port of the same change @javiermtorres requested for EncoderfileProvider in PR #160 (commit 87ade6a). LlamafileProvider mirrors EncoderfileProvider's subprocess+HTTP pattern and benefits from the same exception-safe cleanup. - Add `__enter__` (returning `Self`) and `__exit__` (calling `self.close()`) to LlamafileProvider. `Self` imported under `TYPE_CHECKING` (file already has `from __future__ import annotations`). - Keep `atexit.register(self.close)` as the non-`with` safety net. - Class docstring grows a `with LlamafileProvider() as provider: ...` example. - Cookbook lifecycle cell promotes the `with` block to bullet 1 (matches what the encoderfile cookbook now does). - CLAUDE.md provider bullets for both encoderfile and llamafile now mention "Implements the context manager protocol" for consistency (encoderfile already has the method post-#160-merge; just the doc bullet was lagging). Tests added: - `test_context_manager_returns_self_from_enter` - `test_context_manager_calls_close_on_exit` - `test_context_manager_calls_close_even_when_block_raises` (uses the helper-function pattern to satisfy ruff 0.15.12's PT012). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(providers): retry on TOCTOU port-bind race when auto-picking ports for llamafile Forward-port of PR #160 commit 9203c78 to LlamafileProvider. Same TOCTOU concern with `_free_port()`: it returns a port that was free a moment ago, but another process can grab it before the llamafile binary binds. The encoderfile fix benefited both providers in principle but the encoderfile commit only touched `src/any_guardrail/providers/encoderfile.py` — llamafile needed its own copy of the retry logic. 
Implementation: - New `_BIND_RACE_RETRIES = 3` class constant with the same docstring rationale. - Pulled the Popen + state-setup out of `load_model` into a private `_spawn_subprocess` helper so the retry loop reads cleanly. - Retry on `(RuntimeError, TimeoutError)` only when `self.port is None` (auto-picked). When the caller pinned a port, surface failures immediately — that's a config issue, not a race. Tests added (mirror the encoderfile suite): - `test_load_model_retries_on_bind_race_when_port_auto_picked` - `test_load_model_does_not_retry_when_port_pinned` - `test_load_model_gives_up_after_max_bind_race_retries` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(tests): migrate test_llamafile.py from RUNNING_IN_CI to e2e marker PR #163 (now merged to main) replaced the `RUNNING_IN_CI = os.environ.get("CI") == "true"` gating pattern across all integration tests with a `@pytest.mark.e2e` marker auto-applied by `tests/integration/conftest.py`. The conftest flowed into llamafile-provider via the main → encoderfile-provider → llamafile-provider merge chain, but `test_llamafile.py` was added on this branch *before* PR #163 existed, so it still carried the old `RUNNING_IN_CI` pattern. With both in place, the test was redundantly gated (auto-marked `e2e` by directory AND module-level `skipif` on `RUNNING_IN_CI`). - Drop `import os`, `RUNNING_IN_CI` constant, and the `pytest.mark.skipif(RUNNING_IN_CI, ...)` from `pytestmark`. - Keep the platform `skipif` (`darwin`/`linux` only) — that's a real capability gate, not a CI-skip. - `e2e` is now applied via the directory conftest; the test gets collected only under `pytest -m e2e tests/integration` (or the workflow_dispatch path). Verified locally: - `pytest -m "e2e and not heavy" --collect-only tests/integration/test_llamafile.py` collects 1 test (the binary is ~6.92 GB but doesn't need GPU — not `heavy`). 
- `pytest -m "not e2e" --collect-only tests/integration/test_llamafile.py` collects 0 tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: angpt <anushrigupta@gmail.com>

dni138 and others added 4 commits May 20, 2026 13:29

dni138 requested a review from besaleli May 20, 2026 17:32

dni138 assigned javiermtorres and dni138 and unassigned javiermtorres May 20, 2026

dni138 requested review from Copilot and javiermtorres May 20, 2026 17:32

Copilot started reviewing on behalf of dni138 May 20, 2026 17:33

Copilot AI reviewed May 20, 2026

Comment thread on src/any_guardrail/providers/huggingface.py (outdated)

Comment thread on src/any_guardrail/providers/encoderfile.py (outdated)

Comment thread on src/any_guardrail/providers/encoderfile.py

Comment thread on src/any_guardrail/providers/encoderfile.py

dni138 requested a review from angpt May 20, 2026 17:41

dni138 and others added 10 commits May 20, 2026 13:42

Merge branch 'main' into encoderfile-provider (bd2e1b2)

docs: clarify encoderfile artifact availability and cleanup examples (6d577d9)

Also updated cookbook script to avoid skipping bash commands