[Bug]: vllm-openai nightly Docker image still fails due to missing pytest during EngineCore startup #43528

@yasu-oh

Your current environment

Docker image:
vllm/vllm-openai:nightly
https://hub.docker.com/layers/vllm/vllm-openai/nightly/images/sha256-2b5f940431016b25c461761cb813cebd1f02a9e4ba1069226a5c1c9ffb6834c6

vLLM version:
0.21.1rc1.dev262+g33d7cbe02

Model:
RedHatAI/gemma-4-31B-it-NVFP4

Related issue:
#43480

🐛 Describe the bug

I previously reported a similar startup failure in #43480, where the nightly Docker image failed because pytest was not installed and was imported indirectly via humming / cupy.testing.

After pulling a newer nightly image, the original failure path seems to have changed, but the server still fails to start because pytest is missing.

In this newer build, the model is loaded successfully, but EngineCore fails during startup while vLLM is initializing KV caches and running the profiling dummy run.

The failure path is now roughly:

EngineCore startup
  -> _initialize_kv_caches
  -> determine_available_memory
  -> gpu_worker.profile_run
  -> gpu_model_runner._dummy_run
  -> torch._dynamo AOT compile
  -> torch.distributed.tensor.experimental._context_parallel._cp_custom_ops
  -> torch.library.custom_op / _register_fake
  -> inspect.getframeinfo / inspect.getmodule
  -> cupy.testing
  -> import pytest
  -> ModuleNotFoundError: No module named 'pytest'

So this appears to be the same underlying runtime dependency / import side-effect issue as #43480, but it is now triggered from a different code path during EngineCore initialization rather than during the earlier quantization config verification path.
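
The `inspect.getmodule` step in this chain can be reproduced in isolation. The sketch below is an illustration only: it assumes `cupy.testing` is present in `sys.modules` as some kind of lazily loaded module object, which I have not confirmed from the CuPy source. `LazyModule`, `probe`, and the package name `pytest_missing_for_demo` are hypothetical names made up for the demo. It shows that `hasattr(module, '__file__')` only swallows `AttributeError`, so an import error raised during the deferred import propagates to the caller.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical stand-in for a lazily imported submodule."""

    def __getattr__(self, name):
        # Called only when normal lookup fails, e.g. for a missing __file__.
        # The deferred import happens on this first attribute access.
        importlib.import_module("pytest_missing_for_demo")  # hypothetical name
        raise AttributeError(name)


def probe(module):
    """Mimic the hasattr(module, '__file__') check seen in the traceback."""
    try:
        return "ok" if hasattr(module, "__file__") else "no __file__"
    except ModuleNotFoundError as exc:
        # hasattr() does not catch this, so it escapes to the caller.
        return f"ModuleNotFoundError: {exc.name}"


lazy = LazyModule("demo_pkg.testing")
result = probe(lazy)
print(result)
```

If that assumption holds, any code that scans `sys.modules` this way could trigger the import, which would explain why the failure moved from the humming path to the `torch.library.custom_op` path.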

Since pytest is normally a test dependency, the official runtime Docker image should not require it for normal vLLM server startup.

Startup arguments

The log shows that the server was started with these non-default arguments:

{
  'model_tag': 'RedHatAI/gemma-4-31B-it-NVFP4',
  'default_chat_template_kwargs': {'enable_thinking': True},
  'enable_auto_tool_choice': True,
  'tool_call_parser': 'gemma4',
  'host': '0.0.0.0',
  'model': 'RedHatAI/gemma-4-31B-it-NVFP4',
  'trust_remote_code': True,
  'max_model_len': 256000,
  'served_model_name': ['gemma4-31b'],
  'reasoning_parser': 'gemma4',
  'kv_cache_dtype': 'fp8',
  'mm_processor_kwargs': {'max_soft_tokens': 1120},
  'max_num_batched_tokens': 8192,
  'max_num_seqs': 32,
  'scheduler_reserve_full_isl': False,
  'async_scheduling': True,
  'optimization_level': '3'
}

Expected behavior

vllm/vllm-openai:nightly should start successfully without requiring the test dependency pytest.

If some runtime dependency indirectly imports cupy.testing, the Docker image should either include the required dependency or avoid importing test-only modules during normal server startup.
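
As a rough way to check an image for this class of problem, one could look for the module without importing it. This is a minimal sketch, not an existing vLLM check: `missing_modules` is a hypothetical helper, and the list of names to probe is an assumption. It relies only on `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    # find_spec() locates a top-level module without executing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Run inside the container; an empty list means pytest is installed.
    print(missing_modules(["pytest"]))
```

This only reports whether `pytest` is installed. It does not show whether `cupy.testing` is still imported during startup.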

Actual behavior

The server fails during EngineCore initialization.

The important part of the traceback is:

File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 396, in determine_available_memory
    self.model_runner.profile_run()

File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 6164, in profile_run
    hidden_states, last_hidden_states = self._dummy_run(

File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5824, in _dummy_run
    outputs = self.model(

File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gemma4_mm.py", line 1487, in forward
    hidden_states = self.language_model.model(

File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
    self.aot_compiled_fn = self.aot_compile(*args, **kwargs)

File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
    return aot_compile_fullgraph(

File "/usr/local/lib/python3.12/dist-packages/torch/distributed/tensor/experimental/_context_parallel/_cp_custom_ops.py", line 8, in <module>
    @torch.library.custom_op("cplib::flex_cp_allgather", mutates_args=())

File "/usr/local/lib/python3.12/dist-packages/torch/_library/utils.py", line 45, in get_source
    frame = inspect.getframeinfo(sys._getframe(stacklevel))

File "/usr/lib/python3.12/inspect.py", line 1007, in getmodule
    if ismodule(module) and hasattr(module, '__file__'):

File "/usr/local/lib/python3.12/dist-packages/cupy/testing/__init__.py", line 50, in <module>
    from cupy.testing._random import fix_random  # NOQA

File "/usr/local/lib/python3.12/dist-packages/cupy/testing/_random.py", line 11, in <module>
    import pytest

ModuleNotFoundError: No module named 'pytest'

Then the API server exits with:

RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Notes

This does not look like a model download or Hugging Face authentication issue. The model checkpoint is loaded successfully before the failure.

This also does not look specific to the earlier humming import path reported in #43480. The new failure path goes through torch._dynamo / torch.distributed.tensor.experimental / cupy.testing, but it reaches the same root cause:

ModuleNotFoundError: No module named 'pytest'

Could you please check whether the nightly runtime image should include pytest, or whether cupy.testing should be avoided during normal vLLM server startup?

