Advanced High-Availability considerations for model service #3051

@kyujin-cho

Description

Currently, Model Service relies on the health check information provided by the kernel runner operating inside each container. Because the container itself is the only source of this information, the health status cannot be determined when an entire GPU node shuts down. To guarantee the liveness of each model service, it is crucial to detect when a container has become unresponsive and, if so, to reconcile the replica count. We suggest the following improvements to resolve the issue:

  • Make AppProxy the health checker
  • Add an option to automatically terminate unhealthy sessions after a certain grace period
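
The two suggestions above could be combined roughly as follows. This is only a minimal sketch under stated assumptions, not the actual AppProxy or Backend.AI API: `ExternalHealthChecker`, `ReplicaState`, `probe`, `grace_period`, and `replicas_to_create` are all hypothetical names chosen for illustration. The idea is that a component outside the container probes each replica, treats any probe error (for example, a node that has gone away) as unhealthy, and drops a replica once it has stayed unhealthy for the grace period, so that the replica count can be reconciled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ReplicaState:
    # Time at which the replica was first observed unhealthy; None while healthy.
    unhealthy_since: Optional[float] = None


@dataclass
class ExternalHealthChecker:
    """Hypothetical checker living outside the container (e.g. in a proxy)."""

    probe: Callable[[str], bool]  # assumed: returns True if the replica responds
    grace_period: float           # seconds a replica may stay unhealthy
    replicas: Dict[str, ReplicaState] = field(default_factory=dict)

    def register(self, replica_id: str) -> None:
        self.replicas[replica_id] = ReplicaState()

    def tick(self, now: float) -> List[str]:
        """Probe every replica once and return the IDs dropped in this round."""
        expired: List[str] = []
        for replica_id, state in list(self.replicas.items()):
            try:
                healthy = self.probe(replica_id)
            except Exception:
                # An unreachable node looks the same as an unhealthy replica.
                healthy = False
            if healthy:
                state.unhealthy_since = None
            elif state.unhealthy_since is None:
                state.unhealthy_since = now
            elif now - state.unhealthy_since >= self.grace_period:
                expired.append(replica_id)
                del self.replicas[replica_id]
        return expired


def replicas_to_create(desired: int, checker: ExternalHealthChecker) -> int:
    """Number of new replicas needed to get back to the desired size."""
    return max(0, desired - len(checker.replicas))
```

In a real deployment, the caller would terminate the sessions returned by `tick()` and then start `replicas_to_create(...)` new ones; how those steps map onto the existing session lifecycle is left open here.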
