fix(openai): clamp max_tokens to per-model limits to prevent overflow errors #5696

Merged
leseb merged 3 commits into ogx-ai:main from leseb:leseb/openai-max-tokens-clamp
May 4, 2026

@leseb leseb commented May 4, 2026

Member

What does this PR do?

Fixes the error "BadRequestError: max_tokens is too large", which occurs when clients (e.g. Claude Code) send max_tokens values that exceed what the target OpenAI model supports. For example, Claude Code requests max_tokens: 32000, but gpt-4o-mini supports only 16384.

Adds a static per-model max_output_tokens map to the OpenAI provider adapter and clamps incoming max_tokens at request time. Supports prefix matching for dated snapshot variants (e.g. gpt-4o-2024-08-06 inherits from gpt-4o). Logs a warning once per unknown model so operators know the map needs updating when new models are released.

Also populates max_output_tokens in model metadata via construct_model_from_identifier(), exposing it through the /v1/models endpoint's custom_metadata field.
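
The metadata side can be sketched in the same spirit. The real signature of construct_model_from_identifier() is not shown in this conversation, so the helper below (build_custom_metadata, a hypothetical name) only illustrates the idea suggested by the test names in the test plan: a known model gets a max_output_tokens entry, and an unknown model gets none.

```python
# Hypothetical limits map. The 16384 value for gpt-4o-mini is the one quoted in this PR.
MAX_OUTPUT_TOKENS = {"gpt-4o-mini": 16384}


def build_custom_metadata(identifier: str) -> dict:
    """Build a custom_metadata dict, adding max_output_tokens only when it is known."""
    metadata = {}
    limit = MAX_OUTPUT_TOKENS.get(identifier)
    if limit is not None:
        metadata["max_output_tokens"] = limit
    return metadata
```

Omitting the key for unknown models, rather than writing a placeholder value, lets a client distinguish "no limit recorded" from an actual limit.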

Test Plan

uv run pytest tests/unit/providers/inference/test_remote_openai.py -v --tb=short

Output:

tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_clamps_when_request_exceeds_model_limit PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_keeps_lower_request_value PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_no_clamping_when_max_tokens_is_none PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_does_not_mutate_original_params PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_different_models_have_different_limits PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_no_clamping_for_unknown_model PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxTokensClamping::test_dated_snapshot_model_uses_base_limit PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIModelMetadata::test_construct_model_includes_max_output_tokens PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIModelMetadata::test_construct_model_unknown_has_no_max_output_tokens PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIModelMetadata::test_construct_model_embedding_unchanged PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxOutputTokensWarning::test_warns_once_for_unknown_model PASSED
tests/unit/providers/inference/test_remote_openai.py::TestOpenAIMaxOutputTokensWarning::test_all_known_models_have_limits PASSED
12 passed in 0.12s

… errors

Clients like Claude Code may request max_tokens values that exceed what
the target OpenAI model supports (e.g. 32000 for gpt-4o-mini which caps
at 16384), causing BadRequestError from the OpenAI API. Add a static
per-model max_output_tokens map and clamp incoming requests accordingly,
with prefix matching for dated snapshot variants and a warning log for
unknown models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
…napshot prefixes.

Signed-off-by: Sébastien Han <seb@redhat.com>
leseb commented May 4, 2026

Member Author

With this patch I can do:

ANTHROPIC_BASE_URL="http://localhost:8321" ANTHROPIC_API_KEY="fake" claude --model "openai/gpt-4o-mini" -p "say hi in one word" 2>&1 | head
Hello.

Signed-off-by: Sébastien Han <seb@redhat.com>

@cdoern cdoern left a comment

Collaborator

lots of new unit tests but lgtm

@leseb leseb added this pull request to the merge queue May 4, 2026
Merged via the queue into ogx-ai:main with commit 8fcda2f May 4, 2026
71 checks passed
@leseb leseb deleted the leseb/openai-max-tokens-clamp branch May 4, 2026 14:57