fix(openai): clamp max_tokens to per-model limits to prevent overflow errors by leseb · Pull Request #5696 · ogx-ai/ogx

leseb · 2026-05-04T13:53:36Z

What does this PR do?

Fixes BadRequestError: max_tokens is too large when clients (e.g. Claude Code) send max_tokens values that exceed what the target OpenAI model supports. For example, Claude Code requests max_tokens: 32000 but gpt-4o-mini only supports 16384.

Adds a static per-model max_output_tokens map to the OpenAI provider adapter and clamps incoming max_tokens at request time. Supports prefix matching for dated snapshot variants (e.g. gpt-4o-2024-08-06 inherits from gpt-4o). Logs a warning once per unknown model so operators know the map needs updating when new models are released.

Also populates max_output_tokens in model metadata via construct_model_from_identifier(), exposing it through the /v1/models endpoint's custom_metadata field.
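
The metadata side can be pictured roughly as follows. This is a sketch under assumptions, not the signature of construct_model_from_identifier(): the `ModelEntry` dataclass, the `build_model_entry` helper and the `KNOWN_LIMITS` map are all hypothetical stand-ins. It mirrors what the test names suggest: known models gain a max_output_tokens entry, unknown models and embedding models do not.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical limit map; the 16384 value for gpt-4o-mini is from the PR text.
KNOWN_LIMITS = {"gpt-4o-mini": 16384}


@dataclass
class ModelEntry:
    """Hypothetical stand-in for the model object returned by the provider."""

    identifier: str
    model_type: str = "llm"
    custom_metadata: dict = field(default_factory=dict)


def build_model_entry(identifier: str, model_type: str = "llm") -> ModelEntry:
    """Build a model entry, attaching max_output_tokens when it is known."""
    entry = ModelEntry(identifier=identifier, model_type=model_type)
    limit = KNOWN_LIMITS.get(identifier)
    # Embedding models and unknown models are left without the key.
    if model_type == "llm" and limit is not None:
        entry.custom_metadata["max_output_tokens"] = limit
    return entry
```

A client listing models could then read the limit from `custom_metadata` before choosing a max_tokens value, instead of relying on server-side clamping alone.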

Test Plan

uv run pytest tests/unit/providers/inference/test_remote_openai.py -v --tb=short

Output:

tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_clamps_when_request_exceeds_model_limit PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_keeps_lower_request_value PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_no_clamping_when_max_tokens_is_none PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_does_not_mutate_original_params PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_different_models_have_different_limits PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_no_clamping_for_unknown_model PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_dated_snapshot_model_uses_base_limit PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIModelMetadata::test_construct_model_includes_max_output_tokens PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIModelMetadata::test_construct_model_unknown_has_no_max_output_tokens PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIModelMetadata::test_construct_model_embedding_unchanged PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxOutputTokensWarning::test_warns_once_for_unknown_model PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxOutputTokensWarning::test_all_known_models_have_limits PASSED
12 passed in 0.12s

… errors

Clients like Claude Code may request max_tokens values that exceed what the target OpenAI model supports (e.g. 32000 for gpt-4o-mini, which caps at 16384), causing a BadRequestError from the OpenAI API. Add a static per-model max_output_tokens map and clamp incoming requests accordingly, with prefix matching for dated snapshot variants and a warning log for unknown models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>

leseb · 2026-05-04T14:20:32Z

With this patch I can run:

ANTHROPIC_BASE_URL="http://localhost:8321" ANTHROPIC_API_KEY="fake" claude --model "openai/gpt-4o-mini" -p "say hi in one word" 2>&1 | head
Hello.

cdoern

lots of new unit tests but lgtm

leseb requested review from bbrowning, cdoern, franciscojavierarceo, mattf and raghotham as code owners May 4, 2026 13:53

fix(openai): Clamp both completion token fields and prefer specific s…

98779d0

…napshot prefixes. Signed-off-by: Sébastien Han <seb@redhat.com>

style(openai): Apply ruff formatting for pre-commit compliance.

175b6a4

Signed-off-by: Sébastien Han <seb@redhat.com>

cdoern approved these changes May 4, 2026

leseb added this pull request to the merge queue May 4, 2026

Merged via the queue into ogx-ai:main with commit 8fcda2f May 4, 2026
71 checks passed

leseb deleted the leseb/openai-max-tokens-clamp branch May 4, 2026 14:57

leseb merged 3 commits into ogx-ai:main from leseb:leseb/openai-max-tokens-clamp
