feat(llm): add catalog controls and speed billing #249
Open
brynary wants to merge 8 commits into
Resolve LLM credentials and adapter registration through the runtime catalog so settings-defined providers can be used for requests. This keeps built-in behavior default-equivalent while supporting provider IDs, aliases, extra headers, header-only auth, base URLs, and provider API model IDs at the adapter boundary.
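For reviewers, a minimal sketch of what catalog-driven provider and model lookup at the adapter boundary might look like. All type, field, and method names here (`Catalog`, `ProviderEntry`, `ModelEntry`, `wire_model_id`) are hypothetical and are not taken from the diff; the sketch only illustrates alias resolution and the split between the canonical model ID and the provider API model ID.

```rust
use std::collections::HashMap;

/// Hypothetical catalog entry for one provider (illustrative fields only).
#[derive(Debug, Clone)]
pub struct ProviderEntry {
    pub id: String,
    pub aliases: Vec<String>,
    pub base_url: String,
    pub extra_headers: Vec<(String, String)>,
}

/// Hypothetical catalog entry for one model.
#[derive(Debug, Clone)]
pub struct ModelEntry {
    /// Canonical catalog model ID.
    pub id: String,
    /// Canonical provider ID.
    pub provider: String,
    /// Model ID sent on the wire, when it differs from the canonical ID.
    pub api_id: Option<String>,
}

#[derive(Debug, Default)]
pub struct Catalog {
    providers: HashMap<String, ProviderEntry>,
    alias_to_provider: HashMap<String, String>,
    models: HashMap<String, ModelEntry>,
}

impl Catalog {
    /// Register a provider and index each of its aliases.
    pub fn add_provider(&mut self, p: ProviderEntry) {
        for alias in &p.aliases {
            self.alias_to_provider.insert(alias.clone(), p.id.clone());
        }
        self.providers.insert(p.id.clone(), p);
    }

    pub fn add_model(&mut self, m: ModelEntry) {
        self.models.insert(m.id.clone(), m);
    }

    /// Look up a provider by canonical ID or by alias.
    pub fn provider(&self, id_or_alias: &str) -> Option<&ProviderEntry> {
        let id = self
            .alias_to_provider
            .get(id_or_alias)
            .map(String::as_str)
            .unwrap_or(id_or_alias);
        self.providers.get(id)
    }

    /// Model ID to put in the provider request: `api_id` if set,
    /// otherwise the canonical ID.
    pub fn wire_model_id(&self, canonical: &str) -> Option<&str> {
        let m = self.models.get(canonical)?;
        Some(m.api_id.as_deref().unwrap_or(m.id.as_str()))
    }
}
```

In this sketch the substitution happens only in `wire_model_id`, so everything upstream of the adapter keeps working with canonical IDs.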
Propagate run-level model controls into workflow LLM requests, type speed at the request boundary, and reject unsupported speed or reasoning controls before provider dispatch.
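A rough sketch of the "reject before dispatch" check, assuming the catalog exposes the speeds and reasoning effort levels each model supports. The names `Speed::Fast`, `ModelControls`, `RequestControls`, `ControlError`, and `validate_controls` are placeholders chosen for illustration, not identifiers from this PR.

```rust
/// Typed speed at the request boundary. `Fast` is an assumed variant name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Speed {
    Standard,
    Fast,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReasoningEffort {
    Low,
    Medium,
    High,
}

/// What the catalog says a model supports (hypothetical shape).
pub struct ModelControls {
    pub speeds: Vec<Speed>,
    pub reasoning_efforts: Vec<ReasoningEffort>,
}

/// Controls requested for a single LLM call (hypothetical shape).
pub struct RequestControls {
    pub speed: Option<Speed>,
    pub reasoning_effort: Option<ReasoningEffort>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ControlError {
    UnsupportedSpeed(Speed),
    UnsupportedReasoningEffort(ReasoningEffort),
}

/// Fail fast on controls the model does not support, so the request
/// never reaches the provider adapter.
pub fn validate_controls(
    model: &ModelControls,
    req: &RequestControls,
) -> Result<(), ControlError> {
    if let Some(speed) = req.speed {
        if !model.speeds.contains(&speed) {
            return Err(ControlError::UnsupportedSpeed(speed));
        }
    }
    if let Some(effort) = req.reasoning_effort {
        if !model.reasoning_efforts.contains(&effort) {
            return Err(ControlError::UnsupportedReasoningEffort(effort));
        }
    }
    Ok(())
}
```

Unset controls pass through untouched in this sketch, which would keep default requests equivalent to the previous behavior.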
Add explicit model feature metadata for reasoning effort levels and prompt caching, while preserving the legacy effort flag as a compatibility alias. Gate Anthropic prompt-cache request encoding on the catalog feature and keep request serialization details in the adapter.
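One way the feature metadata and the legacy alias could fit together, as a sketch only. `ModelFeatures`, `EffortLevel`, `from_legacy`, and `cache_marker` are hypothetical names, and the rule that a legacy `effort = true` flag expands to all three levels is an assumption made for the example, not something this commit states.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EffortLevel {
    Low,
    Medium,
    High,
}

/// Explicit per-model feature metadata (hypothetical shape).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ModelFeatures {
    pub reasoning_effort_levels: Vec<EffortLevel>,
    pub prompt_caching: bool,
}

impl ModelFeatures {
    /// Legacy boolean view: true when any effort level is supported.
    pub fn effort(&self) -> bool {
        !self.reasoning_effort_levels.is_empty()
    }

    /// Build metadata from the legacy flag.
    /// Assumption: `effort = true` expands to every level.
    pub fn from_legacy(effort: bool, prompt_caching: bool) -> Self {
        let reasoning_effort_levels = if effort {
            vec![EffortLevel::Low, EffortLevel::Medium, EffortLevel::High]
        } else {
            Vec::new()
        };
        Self { reasoning_effort_levels, prompt_caching }
    }
}

/// Adapter-side gate: return a cache-control type to encode only when
/// the catalog marks the model as supporting prompt caching.
pub fn cache_marker(features: &ModelFeatures) -> Option<&'static str> {
    features.prompt_caching.then_some("ephemeral")
}
```

Keeping `cache_marker` next to the adapter mirrors the commit's intent that serialization details stay out of the catalog types.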
Other tests in the workspace (e.g. secret_list_json_returns_metadata_only) write ANTHROPIC_API_KEY to the shared session daemon's vault. When that test runs before bulk_skip_exits_zero_and_prints_summary, the shared server resolves Anthropic as configured via vault env-lookup fallback, attempts to call the real Anthropic API with the leaked test value, and fails with a 401 "invalid x-api-key" instead of returning a skip.
Spawn an isolated server with a fresh vault so the bulk test's "no credentials configured" precondition holds regardless of which other tests share the nextest session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thread the resolved server LLM catalog through workflow validation, model resolution, credential lookup, request construction, and worker startup so request-serving paths no longer depend on the builtin catalog.
…o phase-7-controls-validation
Summary
This PR advances the catalog-driven LLM work from #210 by making the resolved model catalog the source of truth for provider registration, request control validation, and billing identity. Runs now preserve canonical provider/model/speed identity through pricing and API responses instead of collapsing billing around provider API aliases or model IDs alone.
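To make the speed-aware pricing concrete, here is a minimal sketch of a lookup that uses base costs for standard speed, consults per-speed overrides otherwise, and yields `None` (an unknown estimate) rather than zero when no price exists. The types `Cost` and `ModelPricing`, the `Speed::Fast` variant, the per-million-token units, and `estimate_usd` are all assumptions for illustration, not the PR's actual definitions.

```rust
use std::collections::HashMap;

/// `Fast` is an assumed variant name for a non-standard speed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Speed {
    Standard,
    Fast,
}

/// USD per one million tokens (assumed unit).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Cost {
    pub input_per_mtok: f64,
    pub output_per_mtok: f64,
}

/// Pricing for one canonical catalog model (hypothetical shape).
pub struct ModelPricing {
    pub base: Cost,
    pub speed_overrides: HashMap<Speed, Cost>,
}

impl ModelPricing {
    /// Standard speed uses the base cost; any other speed needs an
    /// explicit override, otherwise the price is unknown.
    pub fn cost_for(&self, speed: Speed) -> Option<Cost> {
        match speed {
            Speed::Standard => Some(self.base),
            other => self.speed_overrides.get(&other).copied(),
        }
    }
}

/// Dollar estimate for a call, or `None` when the model/speed
/// combination has no known price. Never silently returns 0.0.
pub fn estimate_usd(
    pricing: &ModelPricing,
    speed: Speed,
    input_tokens: u64,
    output_tokens: u64,
) -> Option<f64> {
    let cost = pricing.cost_for(speed)?;
    let total = input_tokens as f64 * cost.input_per_mtok
        + output_tokens as f64 * cost.output_per_mtok;
    Some(total / 1_000_000.0)
}
```

Returning `Option<f64>` lets callers keep the token counts while serializing the dollar estimate as null, which is the same shape the event conversion paths without catalog access would need.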
What Changed
ModelRef values, uses base model costs for standard speed, applies per-speed cost overrides, and returns an unknown estimate instead of silently billing zero for unsupported combinations.
claude-opus-4-6 and claude-opus-4-7.
Notes for Review
Billing lookup intentionally uses canonical catalog model IDs. Provider api_id substitution remains limited to provider request construction, so aliases can be used on the wire without changing billing identity. Event conversion paths that do not have catalog access now preserve token counts with a null dollar estimate rather than falling back to the bootstrap catalog.
Verification
cargo build -p fabro-api
cd lib/packages/fabro-api-client && bun run generate
cargo +nightly-2026-04-14 fmt --check --all
cargo +nightly-2026-04-14 clippy --workspace --all-targets -- -D warnings
ulimit -n 4096 && cargo nextest run -p fabro-model -p fabro-workflow -p fabro-server -p fabro-api -p fabro-cli --no-fail-fast
ulimit -n 4096 && cargo nextest run --workspace --no-fail-fast
cd apps/fabro-web && bun run typecheck
cd apps/fabro-web && bun test
git diff --check
🤖 Generated with GPT-5 via Codex