[Bug]: vllm-openai nightly Docker image still fails due to missing pytest during EngineCore startup

### Your current environment

Docker image:
vllm/vllm-openai:nightly
https://hub.docker.com/layers/vllm/vllm-openai/nightly/images/sha256-2b5f940431016b25c461761cb813cebd1f02a9e4ba1069226a5c1c9ffb6834c6

vLLM version:
0.21.1rc1.dev262+g33d7cbe02

Model:
RedHatAI/gemma-4-31B-it-NVFP4

Related issue:
#43480 

### 🐛 Describe the bug

I previously reported a similar startup failure in #43480, where the nightly Docker image failed because `pytest` was not installed and was imported indirectly via `humming` / `cupy.testing`.

After pulling a newer nightly image, the original failure path seems to have changed, but the server still fails to start because `pytest` is missing.

In this newer build, the model is loaded successfully, but `EngineCore` fails during startup while vLLM is initializing KV caches and running the profiling dummy run.

The failure path is now roughly:

```text
EngineCore startup
  -> _initialize_kv_caches
  -> determine_available_memory
  -> gpu_worker.profile_run
  -> gpu_model_runner._dummy_run
  -> torch._dynamo AOT compile
  -> torch.distributed.tensor.experimental._context_parallel._cp_custom_ops
  -> torch.library.custom_op / _register_fake
  -> inspect.getframeinfo / inspect.getmodule
  -> cupy.testing
  -> import pytest
  -> ModuleNotFoundError: No module named 'pytest'
```

So this appears to be the same underlying runtime dependency / import side-effect issue as #43480, but it is now triggered from a different code path during EngineCore initialization rather than during the earlier quantization config verification path.

Since `pytest` is normally a test dependency, the official runtime Docker image should not require it for normal vLLM server startup.
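One possible reading of the `inspect.getmodule` step in the failure path: `inspect.getmodule` scans `sys.modules` and calls `hasattr(module, '__file__')` on each entry, and if one of those entries is a lazily loaded module, that attribute access can trigger its deferred import. The sketch below only illustrates that mechanism and is not the actual cupy or torch code. `LazyStub`, `scan_modules`, `reproduce`, and the module name `lazy_stub_demo` are hypothetical names introduced for illustration, and the scan is a simplified stand-in for the loop in `inspect.py`.

```python
import inspect
import sys
import types


class LazyStub(types.ModuleType):
    """Hypothetical stand-in for a lazily loaded module such as cupy.testing."""

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails, e.g. for the
        # missing __file__. A real lazy module would run its deferred import
        # here; this stub simulates that import hitting a missing dependency.
        raise ModuleNotFoundError("No module named 'pytest'")


def scan_modules():
    """Simplified stand-in for the sys.modules scan in inspect.getmodule."""
    for module in sys.modules.copy().values():
        # hasattr() only swallows AttributeError, so a ModuleNotFoundError
        # raised during the attribute lookup propagates to the caller.
        if inspect.ismodule(module) and hasattr(module, "__file__"):
            pass


def reproduce():
    """Register the stub, run the scan, and return the error message, if any."""
    name = "lazy_stub_demo"  # hypothetical module name
    sys.modules[name] = LazyStub(name)
    try:
        scan_modules()
    except ModuleNotFoundError as exc:
        return str(exc)
    finally:
        del sys.modules[name]
    return None


if __name__ == "__main__":
    print(reproduce())
```

If this reading is right, any code that introspects `sys.modules` can surface the missing `pytest`, which would explain why the failure moved to a different code path between nightly builds.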

### Startup arguments

The server was started with the following non-default arguments, as reported in the startup log:

```python
{
  'model_tag': 'RedHatAI/gemma-4-31B-it-NVFP4',
  'default_chat_template_kwargs': {'enable_thinking': True},
  'enable_auto_tool_choice': True,
  'tool_call_parser': 'gemma4',
  'host': '0.0.0.0',
  'model': 'RedHatAI/gemma-4-31B-it-NVFP4',
  'trust_remote_code': True,
  'max_model_len': 256000,
  'served_model_name': ['gemma4-31b'],
  'reasoning_parser': 'gemma4',
  'kv_cache_dtype': 'fp8',
  'mm_processor_kwargs': {'max_soft_tokens': 1120},
  'max_num_batched_tokens': 8192,
  'max_num_seqs': 32,
  'scheduler_reserve_full_isl': False,
  'async_scheduling': True,
  'optimization_level': '3'
}
```

### Expected behavior

`vllm/vllm-openai:nightly` should start successfully without requiring the test dependency `pytest`.

If some runtime dependency indirectly imports `cupy.testing`, the Docker image should either include the required dependency or avoid importing test-only modules during normal server startup.

### Actual behavior

The server fails during `EngineCore` initialization.

The important part of the traceback is:

```text
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 396, in determine_available_memory
    self.model_runner.profile_run()

File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 6164, in profile_run
    hidden_states, last_hidden_states = self._dummy_run(

File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5824, in _dummy_run
    outputs = self.model(

File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gemma4_mm.py", line 1487, in forward
    hidden_states = self.language_model.model(

File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
    self.aot_compiled_fn = self.aot_compile(*args, **kwargs)

File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
    return aot_compile_fullgraph(

File "/usr/local/lib/python3.12/dist-packages/torch/distributed/tensor/experimental/_context_parallel/_cp_custom_ops.py", line 8, in <module>
    @torch.library.custom_op("cplib::flex_cp_allgather", mutates_args=())

File "/usr/local/lib/python3.12/dist-packages/torch/_library/utils.py", line 45, in get_source
    frame = inspect.getframeinfo(sys._getframe(stacklevel))

File "/usr/lib/python3.12/inspect.py", line 1007, in getmodule
    if ismodule(module) and hasattr(module, '__file__'):

File "/usr/local/lib/python3.12/dist-packages/cupy/testing/__init__.py", line 50, in <module>
    from cupy.testing._random import fix_random  # NOQA

File "/usr/local/lib/python3.12/dist-packages/cupy/testing/_random.py", line 11, in <module>
    import pytest

ModuleNotFoundError: No module named 'pytest'
```

Then the API server exits with:

```text
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
```

### Notes

This does not look like a model download or Hugging Face authentication issue. The model checkpoint is loaded successfully before the failure.

This also does not look specific to the earlier `humming` import path reported in #43480. The new failure path goes through `torch._dynamo` / `torch.distributed.tensor.experimental` / `cupy.testing`, but it reaches the same root cause:

```text
ModuleNotFoundError: No module named 'pytest'
```

Could you please check whether the nightly runtime image should include `pytest`, or whether `cupy.testing` should be avoided during normal vLLM server startup?
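To help with that check, here is a minimal sketch of a script that could be run inside the container to report which modules are not installed. It only uses `importlib.util.find_spec` from the standard library, so it does not import the modules themselves. The helper name `missing_modules` and the list of names passed to it are assumptions made for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialized modules;
            # treat those as missing for the purpose of this check.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Top-level names only, so no parent package is imported as a side effect.
    print(missing_modules(["pytest", "cupy"]))
```

On the affected image, `pytest` would be expected to show up in the printed list, since the traceback ends in `ModuleNotFoundError: No module named 'pytest'`.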


### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
