[Core][AMD] Propagate shutdown timeout to MultiprocExecutor by rjrock · Pull Request #43154 · vllm-project/vllm

rjrock · 2026-05-19T22:10:13Z

Purpose

rocprofv3 requires a grace period during process shutdown in order to emit trace data. This PR adds the environment variable VLLM_WORKER_SHUTDOWN_TIMEOUT_SECONDS, which sets the shutdown grace period for the worker processes of MultiprocExecutor. The same value is also passed to the engine manager shutdown.
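As a rough illustration of how such a variable could be consumed, here is a minimal sketch. The helper name `worker_shutdown_timeout` and the integer parsing are assumptions made for this example; the actual definition lives in vllm/envs.py and may differ. The default of 5 seconds comes from the commit "Default VLLM_WORKER_SHUTDOWN_TIMEOUT_SECONDS to 5" in this PR.

```python
import os
from collections.abc import Mapping

# Default taken from the commit title in this PR; everything else is illustrative.
DEFAULT_WORKER_SHUTDOWN_TIMEOUT_SECONDS = 5


def worker_shutdown_timeout(environ: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: read the worker shutdown grace period in seconds."""
    raw = environ.get("VLLM_WORKER_SHUTDOWN_TIMEOUT_SECONDS")
    if raw is None or raw.strip() == "":
        # Unset or empty: fall back to the default grace period.
        return DEFAULT_WORKER_SHUTDOWN_TIMEOUT_SECONDS
    # Assumption: the value is a whole number of seconds.
    return int(raw)
```

With the variable unset this returns 5; after `export VLLM_WORKER_SHUTDOWN_TIMEOUT_SECONDS=120` it returns 120.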

Previously, running a command like the one below would fail.

rocprofv3 \
  --disable-signal-handlers \
  --output-format pftrace \
  -r -- \
    vllm \
      bench throughput \
      --shutdown-timeout 60 \
      --model Qwen/Qwen3-32B \
      --num-prompts=1 \
      --tensor-parallel-size 2

Similarly, any rocprofv3 trace command that needed longer than the 4-second shutdown period in multiproc_executor.py::_ensure_worker_termination would fail.
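To make the failure mode concrete, the sketch below shows a generic grace-period termination loop: wait for workers to exit on their own, then escalate to terminate and finally kill. It is not the code of `_ensure_worker_termination`; the function names, the polling interval, and the returned labels are assumptions for illustration. If a worker is still flushing profiler output when the grace period ends, escalation cuts it off, which is why a configurable period helps.

```python
import time
from typing import Protocol, Sequence


class ProcLike(Protocol):
    """Minimal process interface (matches multiprocessing.Process)."""

    def is_alive(self) -> bool: ...
    def terminate(self) -> None: ...
    def kill(self) -> None: ...


def wait_until_exited(procs: Sequence[ProcLike], timeout: float) -> bool:
    """Poll until every process has exited or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not any(p.is_alive() for p in procs):
            return True
        time.sleep(0.01)
    return not any(p.is_alive() for p in procs)


def ensure_worker_termination(procs: Sequence[ProcLike], grace_seconds: float) -> str:
    """Hypothetical sketch: graceful wait, then terminate, then kill."""
    # 1. Give workers the grace period to clean up and exit by themselves.
    if wait_until_exited(procs, grace_seconds):
        return "exited"
    # 2. Ask the remaining workers to stop (SIGTERM on POSIX).
    for p in procs:
        if p.is_alive():
            p.terminate()
    if wait_until_exited(procs, grace_seconds):
        return "terminated"
    # 3. Force-stop anything still running (SIGKILL on POSIX).
    for p in procs:
        if p.is_alive():
            p.kill()
    return "killed"
```

A short grace period reaches step 2 quickly, so a profiler that needs tens of seconds to write its trace never finishes.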

With this change merged, a successful run looks like the one below.

export VLLM_WORKER_SHUTDOWN_TIMEOUT_SECONDS=120
rocprofv3 \
  --disable-signal-handlers \
  --output-format pftrace \
  -r -- \
    vllm \
      bench throughput \
      --shutdown-timeout 60 \
      --model Qwen/Qwen3-32B \
      --num-prompts=1 \
      --tensor-parallel-size 2

Test Plan

pytest tests/v1/executor/test_executor.py::test_multiproc_executor_worker_termination_timeout
pytest -s -v tests/v1/engine/test_core_engine_actor_manager.py::test_background_resources_passes_worker_shutdown_timeout

Test Result

Success

gemini-code-assist

Code Review

This pull request implements a configurable shutdown timeout for the V1 engine and multiprocess executor. It adds a shutdown_timeout attribute to BackgroundResources and updates the MultiprocExecutor to use this value, ensuring a minimum grace period during worker termination. A review comment correctly identified a potential TypeError in multiproc_executor.py that could occur if the timeout configuration is None, suggesting a default value to prevent the crash.

rjrock · 2026-05-20T18:01:23Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a configurable shutdown timeout for the MultiprocExecutor in the V1 engine. Changes include adding a shutdown_timeout field to BackgroundResources, passing this value to the engine manager during shutdown, and updating MultiprocExecutor to use the configured timeout with a 4-second minimum. Unit tests were added to verify worker termination behavior. Feedback points out a potential TypeError in MultiprocExecutor if the shutdown_timeout is None and provides a suggestion to handle this case safely.
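The TypeError mentioned in the review arises because Python 3 cannot compare `None` with a number inside `max`. A minimal sketch of a None-safe version with a 4-second floor follows. The function name and signature are hypothetical, and this reflects the intermediate revision discussed in the review rather than the merged code, which uses an environment variable instead.

```python
from typing import Optional


def effective_shutdown_timeout(
    configured: Optional[float], minimum: float = 4.0
) -> float:
    """Hypothetical helper: apply a floor to an optional timeout."""
    # max(None, 4.0) raises TypeError, so substitute 0.0 when unset.
    value = configured if configured is not None else 0.0
    return max(value, minimum)
```

An unset timeout or a value below the floor yields 4.0; a larger value such as 60 is returned unchanged.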

mergify · 2026-05-20T18:41:47Z

Hi @rjrock, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

AndreasKaratzas · 2026-06-01T18:51:38Z

cc @njhill PTAL

dllehr-amd

Can you take a quick peek at my note? I'm trying to confirm that we won't negatively impact the default operation mode if we don't set the time ourselves.

rjrock · 2026-06-01T21:46:02Z

Added a max call to BackgroundResources to maintain the previous behavior.

njhill · 2026-06-03T22:27:00Z

Thanks @rjrock. The shutdown_timeout option in the config is for a global graceful shutdown where we wait for in-flight requests to complete rather than immediately aborting them.

So I'm not sure we should use that value here. By the time we are shutting down the executor we are in tear-down mode and the 4 second timeout is just to allow the resources to be released/process to exit cleanly. Perhaps for this purpose it would be better to just add a new VLLM_WORKER_SHUTDOWN_TIMEOUT env var in envs.py?

mergify · 2026-06-04T18:47:19Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @rjrock.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

rjrock · 2026-06-06T01:22:18Z

> Thanks @rjrock. The shutdown_timeout option in the config is for a global graceful shutdown where we wait for in-flight requests to complete rather than immediately aborting them.
>
> So I'm not sure we should use that value here. By the time we are shutting down the executor we are in tear-down mode and the 4 second timeout is just to allow the resources to be released/process to exit cleanly. Perhaps for this purpose it would be better to just add a new VLLM_WORKER_SHUTDOWN_TIMEOUT env var in envs.py?

That makes sense. I rewrote it to use the env var VLLM_WORKER_SHUTDOWN_TIMEOUT_SECONDS. Please take another look when you get a chance, @njhill.

njhill

Thanks @rjrock, just have a couple of minor comments.

mergify · 2026-06-11T22:30:50Z

Hi @rjrock, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

dllehr-amd

Thanks @rjrock! approving as well!

mergify Bot added the rocm (Related to AMD ROCm) and v1 labels May 19, 2026

github-project-automation Bot added this to AMD May 19, 2026

github-project-automation Bot moved this to Todo in AMD May 19, 2026

gemini-code-assist Bot reviewed May 19, 2026

Comment thread vllm/v1/executor/multiproc_executor.py Outdated

rjrock force-pushed the rocprof-worker-shutdown branch from a947bd5 to d895571 Compare May 20, 2026 17:50

gemini-code-assist Bot reviewed May 20, 2026

Comment thread vllm/v1/executor/multiproc_executor.py Outdated

rjrock force-pushed the rocprof-worker-shutdown branch from 1a048aa to dbb1bf8 Compare May 20, 2026 18:18

rjrock marked this pull request as ready for review May 20, 2026 18:36

rjrock requested a review from njhill as a code owner May 20, 2026 18:36

rjrock force-pushed the rocprof-worker-shutdown branch from dbb1bf8 to eaf54b2 Compare May 20, 2026 19:33

jwzheng96 mentioned this pull request May 30, 2026

[Bugfix] Reject non-positive values for ParallelConfig int knobs #44057

Merged

3 tasks

AndreasKaratzas added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 1, 2026

dllehr-amd requested changes Jun 1, 2026

Comment thread vllm/v1/engine/core_client.py Outdated

rjrock and others added 5 commits June 1, 2026 15:52

[Core][AMD] Propagate shutdown timeout to MultiprocExecutor

e528a29

rocprofv3 requires a grace period during process shutdown in order to emit trace data. Signed-off-by: Ryan Rock <ryan.rock@amd.com>

Update vllm/v1/executor/multiproc_executor.py

759c168

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Ryan Rock <ryan.rock@amd.com>

Revert "Update vllm/v1/executor/multiproc_executor.py"

c27614f

This reverts commit c20b9a8. Signed-off-by: Ryan Rock <ryan.rock@amd.com>

Add tests

fd61043

Signed-off-by: Ryan Rock <ryan.rock@amd.com>

Condense pytest.param lines

0a59310

Signed-off-by: Ryan Rock <ryan.rock@amd.com>

rjrock force-pushed the rocprof-worker-shutdown branch from eaf54b2 to 0a59310 Compare June 1, 2026 20:52

Set GPU worker shutdown to at least 5 seconds

309fb76

Signed-off-by: Ryan Rock <ryan.rock@amd.com>

rjrock requested a review from dllehr-amd June 1, 2026 21:47

Fix test_engine_core_client failure

5f911de

Signed-off-by: Ryan Rock <ryan.rock@amd.com>

mergify Bot added the needs-rebase label Jun 4, 2026

rjrock added 4 commits June 4, 2026 19:48

Revert changes

0b9052f

Signed-off-by: Ryan Rock <ryan.rock@amd.com>

Use env var instead of CLI option

fc00cf7

Signed-off-by: Ryan Rock <ryan.rock@amd.com>

Add tests

f2059ff

Signed-off-by: Ryan Rock <ryan.rock@amd.com>

Merge branch 'main' into rocprof-worker-shutdown

a1f12ae

Signed-off-by: Ryan Rock <ryan.rock@amd.com>

mergify Bot removed the needs-rebase label Jun 5, 2026

Remove newline from merge

f1dd9a1

Signed-off-by: Ryan Rock <ryan.rock@amd.com>

njhill reviewed Jun 11, 2026

Comment thread vllm/envs.py Outdated

Comment thread vllm/envs.py Outdated

rjrock and others added 2 commits June 11, 2026 16:52

Update vllm/envs.py

680b7d1

Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Ryan Rock <ryan.rock@amd.com>

Default VLLM_WORKER_SHUTDOWN_TIMEOUT_SECONDS to 5

1145ca5

Signed-off-by: Ryan Rock <ryan.rock@amd.com>

rjrock requested a review from njhill June 11, 2026 22:03

njhill approved these changes Jun 11, 2026

njhill enabled auto-merge (squash) June 11, 2026 22:25

Merge branch 'main' into rocprof-worker-shutdown

3509f4f

Merge branch 'main' into rocprof-worker-shutdown

ac6ee07

dllehr-amd approved these changes Jun 12, 2026

njhill merged commit aab639c into vllm-project:main Jun 12, 2026
80 checks passed

github-project-automation Bot moved this from Todo to Done in AMD Jun 12, 2026

rjrock deleted the rocprof-worker-shutdown branch June 12, 2026 20:14
