@ngxson
Collaborator

@ngxson ngxson commented Dec 22, 2025

Fix #18237

This PR introduces --shutdown-timeout, which forcefully terminates the server N seconds after the first shutdown request (i.e. SIGINT, SIGTERM) is received.

The most common known use case is when update_slots() is stuck on a large batch of tokens that can take minutes to finish.

NOTE: this feature works in both router and single-model modes.
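
To illustrate the idea only (a minimal sketch, not the code in this PR): the first shutdown request arms a timer, and if graceful shutdown has not finished when the timer expires, a force-terminate action runs. The class name `shutdown_watchdog` and its methods `request_shutdown()` and `notify_done()` are hypothetical names made up for this example.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper for illustration; not the implementation in this PR.
class shutdown_watchdog {
public:
    shutdown_watchdog(std::chrono::milliseconds timeout, std::function<void()> on_timeout)
        : timeout_(timeout), on_timeout_(std::move(on_timeout)) {}

    ~shutdown_watchdog() {
        // Release the watchdog thread (if any) before destroying members.
        notify_done();
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    // Call when a shutdown request arrives. Only the first call arms the
    // timer, so repeated requests do not restart the countdown.
    // Returns true if this call armed the watchdog.
    // NOTE: this spawns a thread, so it is not async-signal-safe; a real
    // signal handler should only set a flag and let normal code call this.
    bool request_shutdown() {
        if (armed_.exchange(true)) {
            return false;
        }
        worker_ = std::thread([this] {
            std::unique_lock<std::mutex> lock(mutex_);
            // Wait until graceful shutdown completes or the timeout expires.
            const bool finished = cv_.wait_for(lock, timeout_, [this] { return done_; });
            if (!finished) {
                lock.unlock();
                on_timeout_(); // e.g. force-terminate the process
            }
        });
        return true;
    }

    // Call when graceful shutdown has completed; cancels the forced action.
    void notify_done() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }

private:
    std::chrono::milliseconds timeout_;
    std::function<void()>     on_timeout_;
    std::mutex                mutex_;
    std::condition_variable   cv_;
    bool                      done_ = false;
    std::atomic<bool>         armed_{false};
    std::thread               worker_;
};
```

In a real server the `on_timeout` callback would presumably end the process immediately (for example via `std::_Exit`), since the whole point is that the normal shutdown path is blocked.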

How I tested this PR:

  • Start server with -ngl 0 -t 1 to make it super slow
  • Send a long input prompt
  • Press Ctrl+C to stop it; the server is expected to be force-terminated after 10s

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Dec 22, 2025

I'm putting this branch directly in production; we'll see how it performs under real-use conditions and stress testing. If I understand correctly, no child process should get stuck, so the router will continue functioning normally. I'll also test with GLM Air 4.5 on large contexts where I've had hanging issues

@ServeurpersoCom
Collaborator

As it stands, I can no longer zombify my llama-server router with the layer 7 DoS script. I haven't seen any regressions in basic usage

@ngxson
Collaborator Author

ngxson commented Dec 22, 2025

Thanks for testing!

I feel like there can be different (equivalent) approaches to implementing this functionality, so I will probably need to ask @ggerganov for a review when he comes back (no rush btw)

@ngxson ngxson marked this pull request as ready for review December 22, 2025 18:01
@ngxson ngxson requested a review from ggerganov as a code owner December 22, 2025 18:01
@ggerganov
Member

The most common known use case is when update_slots() is stuck on a large batch of tokens that can take minutes to finish.

Currently, do we know other cases in which a child can become unresponsive?

@ngxson
Collaborator Author

ngxson commented Dec 22, 2025

Currently, do we know other cases in which a child can become unresponsive?

Another case that I can think of is when model loading takes too long (the shutdown_handler only gets registered after the model is loaded), so the user may need to wait until the model is loaded just to exit the application. This is currently the case for child processes managed by the router, but a quick fix could be to force-terminate the process at that stage if the user asks to unload it.

This case may not happen in the CLI application, as Ctrl+C at this stage will force-terminate it (the SIGINT handler is not yet registered).

Edit: while writing this, I realized that the solution proposed in this PR doesn't address this case. Does libllama provide a reliable way to stop model loading?
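
One avenue that may be worth checking: as far as I can tell, `llama_model_params` in `llama.h` has a `progress_callback` of the form `bool (*)(float progress, void * user_data)`, where returning false aborts loading. Treat that as an assumption to verify against the header. The sketch below shows only the pattern, using a stand-in loader; `fake_load_model`, `abort_on_stop`, `on_stop_signal` and `g_stop_requested` are hypothetical names, and nothing here calls libllama.

```cpp
#include <atomic>
#include <csignal>

// Assumed to match the shape of llama_progress_callback in llama.h
// (to be verified): return true to continue loading, false to abort.
using progress_callback_t = bool (*)(float progress, void * user_data);

// Set when a stop is requested (e.g. by an early-registered signal handler).
std::atomic<bool> g_stop_requested{false};

// Signal handler that only flips a flag, keeping the handler minimal.
void on_stop_signal(int /*signo*/) {
    g_stop_requested.store(true);
}

// Progress callback: asks the loader to abort once a stop was requested.
bool abort_on_stop(float /*progress*/, void * /*user_data*/) {
    return !g_stop_requested.load();
}

// Hypothetical stand-in for a model loader, used only to demonstrate the
// pattern. It invokes the callback after each step and stops early if the
// callback returns false. Returns true if loading ran to completion.
bool fake_load_model(int n_steps, progress_callback_t cb, void * user_data) {
    for (int i = 1; i <= n_steps; ++i) {
        // ... a real loader would read a chunk of tensors here ...
        if (cb != nullptr && !cb(static_cast<float>(i) / n_steps, user_data)) {
            return false; // aborted by the callback
        }
    }
    return true;
}
```

With this pattern, the stop flag would have to be set by a handler registered before model loading starts, which is exactly the part that is missing today.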

Development

Successfully merging this pull request may close these issues.

Feature Request: Add watchdog with force-kill timeout for hung child processes in router mode

3 participants