server: implement --shutdown-timeout #18292
base: master
I'm putting this branch directly into production; we'll see how it performs under real-world conditions and stress testing. If I understand correctly, no child process should get stuck, so the router will keep functioning normally. I'll also test with GLM Air 4.5 on large contexts, where I've had hanging issues.
As it stands, I can no longer zombify my llama-server router with the layer 7 DoS script. I haven't seen any regressions in basic usage.
Thanks for testing! I feel like there are different (equivalent) ways to implement this functionality, so I'll probably need to ask @ggerganov for a review when he comes back (no rush btw).
Currently, do we know of other cases in which a child can become unresponsive?
Another case I can think of is when model loading takes too long (the …). This case may not happen in the CLI application, as Ctrl+C at this stage force-terminates it (the SIGINT handler is not registered yet). Edit: while writing this, I realized that the solution proposed in this PR doesn't address this case; Do …
Fix #18237
This PR introduces `--shutdown-timeout` to forcefully terminate the server after N seconds of waiting since the first shutdown request (i.e. SIGINT, SIGTERM) was received.

The most commonly known use case is when `update_slots()` is stuck on a large batch of tokens that can take minutes to finish.

NOTE: this feature works in both router and single-model modes.
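The timeout mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `ShutdownWatchdog`, `request_shutdown()`, `mark_finished()`, `on_shutdown_signal` and `g_shutdown_signal` are hypothetical names. The sketch assumes the signal handler only sets a flag (spawning a thread is not async-signal-safe), that the main loop calls `request_shutdown()` when it first sees that flag, and that the graceful shutdown path calls `mark_finished()` once cleanup completes. If that does not happen within the timeout, the watchdog force-terminates the process.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <csignal>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical flag set from the signal handler. Only async-signal-safe work
// (writing a sig_atomic_t) happens inside the handler itself.
static volatile std::sig_atomic_t g_shutdown_signal = 0;
void on_shutdown_signal(int) { g_shutdown_signal = 1; }

// Hypothetical watchdog illustrating a shutdown timeout; not the PR's code.
class ShutdownWatchdog {
public:
    explicit ShutdownWatchdog(std::chrono::milliseconds timeout,
                              std::function<void()> force_exit = [] { std::_Exit(1); })
        : timeout_(timeout), force_exit_(std::move(force_exit)) {
        assert(timeout.count() > 0 && "shutdown timeout must be positive");
    }

    ~ShutdownWatchdog() {
        mark_finished();  // wake the timer thread so it can exit
        if (thread_.joinable()) thread_.join();
    }

    // Start the countdown on the first shutdown request; later calls are no-ops.
    void request_shutdown() {
        bool expected = false;
        if (!requested_.compare_exchange_strong(expected, true)) return;
        thread_ = std::thread([this] {
            std::unique_lock<std::mutex> lock(mu_);
            // True if the graceful path finished before the deadline expired.
            bool finished = cv_.wait_for(lock, timeout_, [this] { return finished_; });
            lock.unlock();
            if (!finished) force_exit_();  // graceful shutdown is stuck: terminate
        });
    }

    // Called by the graceful shutdown path once cleanup has completed.
    void mark_finished() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            finished_ = true;
        }
        cv_.notify_all();
    }

private:
    std::chrono::milliseconds timeout_;
    std::function<void()> force_exit_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool finished_ = false;  // guarded by mu_
    std::atomic<bool> requested_{false};
    std::thread thread_;
};
```

In a server, one would install `on_shutdown_signal` for SIGINT and SIGTERM with `std::signal`, poll `g_shutdown_signal` from the main loop, and call `request_shutdown()` the first time it is set. A call such as `update_slots()` that never returns then cannot keep the process alive past the timeout. The default `std::_Exit` skips destructors and `atexit` handlers, which is the point: the normal cleanup path is what got stuck.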
How I tested this PR:

- Run with `-ngl 0 -t 1` to make it super slow