
feat(llm): add catalog controls and speed billing #249

Open
brynary wants to merge 8 commits into main from phase-7-controls-validation

feat(llm): add catalog controls and speed billing#249
brynary wants to merge 8 commits into
mainfrom
phase-7-controls-validation

Conversation


@brynary brynary commented May 13, 2026

Summary

This PR advances the catalog-driven LLM work from #210 by making the resolved model catalog the source of truth for provider registration, request control validation, and billing identity. Runs now preserve canonical provider/model/speed identity through pricing and API responses instead of collapsing billing around provider API aliases or model IDs alone.

What Changed

  • Register LLM provider adapters from the resolved catalog, including custom OpenAI-compatible providers and their credential resolution paths.
  • Validate effective model request controls, including run-level defaults and node overrides, before dispatching LLM requests.
  • Add catalog-aware billing lookup that prices canonical ModelRef values, uses base model costs for standard speed, applies per-speed cost overrides, and returns an unknown estimate instead of silently billing zero for unsupported combinations.
  • Move Anthropic Opus fast-mode pricing into the built-in catalog for claude-opus-4-6 and claude-opus-4-7.
  • Thread the injected catalog and effective speed controls through workflow billing, including API-mode and CLI-mode handlers.
  • Update billing APIs, server aggregation, generated clients, and the web billing view to expose provider/model/speed billing identity and keep standard and fast usage in separate rows.
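
The catalog-aware billing lookup described above can be sketched roughly as follows. All type and function names here (`ModelRef`, `CatalogEntry`, `estimate_cost`, …) are illustrative stand-ins, not the actual fabro types — the point is the lookup order: per-speed override first, base cost only for standard speed, and an explicit unknown estimate (never a silent zero) for anything else:

```rust
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ModelRef {
    provider: String,
    model: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Speed {
    Standard,
    Fast,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Cost {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

struct CatalogEntry {
    base_cost: Option<Cost>,
    speed_overrides: std::collections::HashMap<Speed, Cost>,
}

#[derive(Debug, PartialEq)]
enum CostEstimate {
    Known(f64),
    /// Token counts are still preserved upstream; only the dollar figure is absent.
    Unknown,
}

fn estimate_cost(
    catalog: &std::collections::HashMap<ModelRef, CatalogEntry>,
    model: &ModelRef,
    speed: Speed,
    input_tokens: u64,
    output_tokens: u64,
) -> CostEstimate {
    // Unknown model: report an unknown estimate rather than silently billing zero.
    let Some(entry) = catalog.get(model) else {
        return CostEstimate::Unknown;
    };
    // Per-speed override wins; standard speed falls back to the base model cost.
    let cost = match entry.speed_overrides.get(&speed) {
        Some(c) => *c,
        None if speed == Speed::Standard => match entry.base_cost {
            Some(c) => c,
            None => return CostEstimate::Unknown,
        },
        // e.g. fast mode requested for a model with no fast pricing.
        None => return CostEstimate::Unknown,
    };
    let dollars = (input_tokens as f64 / 1e6) * cost.input_per_mtok
        + (output_tokens as f64 / 1e6) * cost.output_per_mtok;
    CostEstimate::Known(dollars)
}
```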

Notes for Review

Billing lookup intentionally uses canonical catalog model IDs. Provider api_id substitution remains limited to provider request construction, so aliases can be used on the wire without changing billing identity. Event conversion paths that do not have catalog access now preserve token counts with a null dollar estimate rather than falling back to the bootstrap catalog.
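
A minimal illustration of that alias boundary — the provider `api_id` is consulted only when constructing the outbound request, while billing always reads the canonical catalog ID. Field and function names here are hypothetical, not the actual fabro API:

```rust
struct CatalogModel {
    canonical_id: String,
    /// Optional provider-side alias, used only on the wire.
    api_id: Option<String>,
}

/// Model name placed in the provider HTTP request body.
fn wire_model_id(model: &CatalogModel) -> &str {
    model.api_id.as_deref().unwrap_or(model.canonical_id.as_str())
}

/// Identity used for pricing and usage rows: always canonical.
fn billing_model_id(model: &CatalogModel) -> &str {
    model.canonical_id.as_str()
}
```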

Verification

  • cargo build -p fabro-api
  • cd lib/packages/fabro-api-client && bun run generate
  • cargo +nightly-2026-04-14 fmt --check --all
  • cargo +nightly-2026-04-14 clippy --workspace --all-targets -- -D warnings
  • ulimit -n 4096 && cargo nextest run -p fabro-model -p fabro-workflow -p fabro-server -p fabro-api -p fabro-cli --no-fail-fast
  • ulimit -n 4096 && cargo nextest run --workspace --no-fail-fast
  • cd apps/fabro-web && bun run typecheck
  • cd apps/fabro-web && bun test
  • git diff --check

Compound Engineering
🤖 Generated with GPT-5 via Codex

brynary added 3 commits May 12, 2026 19:58
Resolve LLM credentials and adapter registration through the runtime catalog so settings-defined providers can be used for requests. This keeps built-in behavior default-equivalent while supporting provider IDs, aliases, extra headers, header-only auth, base URLs, and provider API model IDs at the adapter boundary.
Propagate run-level model controls into workflow LLM requests, type speed at the request boundary, and reject unsupported speed or reasoning controls before provider dispatch.
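
A rough sketch of that pre-dispatch validation: run-level defaults are merged under node overrides, then the effective controls are rejected if the model's feature set does not support them. All names are illustrative:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Speed { Standard, Fast }

#[derive(Clone, Copy, PartialEq, Debug)]
enum Effort { Low, Medium, High }

#[derive(Default, Clone, Copy)]
struct Controls {
    speed: Option<Speed>,
    effort: Option<Effort>,
}

struct ModelFeatures {
    supported_speeds: Vec<Speed>,
    supports_reasoning_effort: bool,
}

/// Node overrides win over run-level defaults, field by field.
fn effective(run_default: Controls, node_override: Controls) -> Controls {
    Controls {
        speed: node_override.speed.or(run_default.speed),
        effort: node_override.effort.or(run_default.effort),
    }
}

/// Reject unsupported controls before any provider dispatch.
fn validate(controls: Controls, features: &ModelFeatures) -> Result<(), String> {
    if let Some(speed) = controls.speed {
        if !features.supported_speeds.contains(&speed) {
            return Err(format!("unsupported speed: {speed:?}"));
        }
    }
    if controls.effort.is_some() && !features.supports_reasoning_effort {
        return Err("model does not support reasoning effort".to_string());
    }
    Ok(())
}
```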

@claude (Bot) left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

brynary and others added 5 commits May 13, 2026 08:25
Add explicit model feature metadata for reasoning effort levels and prompt caching, while preserving the legacy effort flag as a compatibility alias. Gate Anthropic prompt-cache request encoding on the catalog feature and keep request serialization details in the adapter.
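
The feature gate might look roughly like this sketch — the cache marker is emitted only when the catalog flags prompt caching for the model, keeping the serialization decision inside the adapter. The surrounding types are hypothetical; only the `cache_control` field name mirrors Anthropic's request format:

```rust
struct ModelFeatures {
    prompt_caching: bool,
}

#[derive(Debug, PartialEq)]
struct ContentBlock {
    text: String,
    /// Serialized as Anthropic's `cache_control: {"type": "ephemeral"}`
    /// when present; omitted from the request body otherwise.
    cache_control: Option<&'static str>,
}

fn encode_system_block(text: &str, features: &ModelFeatures) -> ContentBlock {
    ContentBlock {
        text: text.to_string(),
        // Only emit the cache marker when the catalog says the model supports it.
        cache_control: features.prompt_caching.then_some("ephemeral"),
    }
}
```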
Other tests in the workspace (e.g. secret_list_json_returns_metadata_only)
write ANTHROPIC_API_KEY to the shared session daemon's vault. When that
test runs before bulk_skip_exits_zero_and_prints_summary, the shared
server resolves Anthropic as configured via vault env-lookup fallback,
attempts to call the real Anthropic API with the leaked test value, and
fails with a 401 "invalid x-api-key" instead of returning a skip.

Spawn an isolated server with a fresh vault so the bulk test's "no
credentials configured" precondition holds regardless of which other
tests share the nextest session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thread the resolved server LLM catalog through workflow validation,
model resolution, credential lookup, request construction, and worker
startup so request-serving paths no longer depend on the builtin catalog.
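
In sketch form, that threading amounts to sharing one resolved catalog handle across the request-serving components instead of each reaching for a builtin static. Names here are illustrative only:

```rust
use std::sync::Arc;

/// Resolved at startup from built-in plus settings-defined providers.
struct Catalog {
    model_ids: Vec<String>,
}

struct Validator {
    catalog: Arc<Catalog>,
}

#[allow(dead_code)]
struct CredentialResolver {
    catalog: Arc<Catalog>,
}

impl Validator {
    fn knows_model(&self, id: &str) -> bool {
        self.catalog.model_ids.iter().any(|m| m == id)
    }
}

/// Construct every consumer from the same injected catalog handle.
fn wire_up(catalog: Catalog) -> (Validator, CredentialResolver) {
    let shared = Arc::new(catalog);
    (
        Validator { catalog: Arc::clone(&shared) },
        CredentialResolver { catalog: shared },
    )
}
```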