Increase graceful timeout and hardcode AWS_PROFILE #306
Conversation
@@ -507,6 +507,7 @@ def get_endpoint_resource_arguments_from_request(
     main_env = []
     if isinstance(flavor, RunnableImageLike) and flavor.env:
         main_env = [{"name": key, "value": value} for key, value in flavor.env.items()]
+    main_env.append({"name": "AWS_PROFILE", "value": build_endpoint_request.aws_role})
Should we add a test for this (to cover the various merging behaviors with user-provided AWS profiles)?
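A minimal sketch of the merging behavior the reviewer is asking about. `build_main_env` is a hypothetical stand-in that mirrors the diff, not the actual `get_endpoint_resource_arguments_from_request` code: the PR copies the user's env and then unconditionally appends `AWS_PROFILE`, so a user-provided `AWS_PROFILE` entry ends up duplicated rather than overridden.

```python
def build_main_env(user_env: dict, aws_role: str) -> list[dict]:
    # Mirrors the diff: copy the user-provided env entries, then
    # unconditionally append AWS_PROFILE from the endpoint's role.
    main_env = [{"name": k, "value": v} for k, v in user_env.items()]
    main_env.append({"name": "AWS_PROFILE", "value": aws_role})
    return main_env


def test_user_provided_aws_profile_is_duplicated():
    env = build_main_env({"AWS_PROFILE": "user-profile"}, "ml-worker")
    names = [e["name"] for e in env]
    # Both entries survive; which one the container runtime applies
    # depends on how the env list is consumed downstream.
    assert names.count("AWS_PROFILE") == 2
```

A test along these lines would pin down which value wins when the user already sets `AWS_PROFILE`.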
@@ -10,7 +10,7 @@

 def start_server():
     parser = argparse.ArgumentParser()
-    parser.add_argument("--graceful-timeout", type=int, default=600)
+    parser.add_argument("--graceful-timeout", type=int, default=1800)
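For context, a self-contained sketch of the changed flag (the surrounding `start_server` wiring is elided in the diff, so this only reproduces the argparse piece): the graceful-shutdown window is raised from 600s to 1800s so long-running requests can drain before the process is killed.

```python
import argparse


def parse_args(argv=None):
    # Sketch of the flag changed in this PR: default raised 600 -> 1800.
    parser = argparse.ArgumentParser()
    parser.add_argument("--graceful-timeout", type=int, default=1800)
    return parser.parse_args(argv)


args = parse_args([])
print(args.graceful_timeout)  # 1800 unless overridden on the CLI
```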
Is this what's being used for async tasks?
I thought we were going to patch llm-engine/model-engine/model_engine_server/inference/async_inference/celery.py to listen for SIGTERM?
Yeah, IIRC this code serves the user container for both sync and async tasks for artifact-like bundles, and async_inference/celery.py
shouldn't be used anymore, at least for the user containers (not sure about celery-forwarder, though).
We'd also have to patch celery-forwarder to listen for SIGTERM.
> Is this what's being used for async tasks?

This is what Frances's async endpoint uses.

> We'd also have to patch celery-forwarder to listen for SIGTERM.

From the Celery documentation: "When shutdown is initiated the worker will finish all currently executing tasks before it actually terminates", which I think means we're good?
Is it possible to simulate such a scenario by sending traffic while restarting a pod?
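One way to run the suggested experiment, sketched here with placeholder values: hammer the endpoint with concurrent requests and count failures, while separately triggering a pod restart (e.g. via `kubectl rollout restart`, run outside this script). The URL, request count, and worker count are all illustrative.

```python
import concurrent.futures
import urllib.request


def send_traffic(url: str, n_requests: int = 100) -> dict:
    # Fire n_requests concurrent GETs and tally successes vs failures.
    results = {"ok": 0, "failed": 0}

    def one(_):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except Exception:
            return False

    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        for ok in pool.map(one, range(n_requests)):
            results["ok" if ok else "failed"] += 1
    return results
```

With graceful shutdown working, `failed` should stay at 0 across the restart; dropped in-flight requests would show up as failures.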