-
Notifications
You must be signed in to change notification settings - Fork 163
feat(BA-992): Offload health check capability to AppProxy #5134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors how health checks and route updates are offloaded to AppProxy. Key changes include:
- Introducing
ModelServiceHelper
and reading health check config frommodel-definition.yaml
- Switching route updates to Redis-based notifications (
valkey_live
) instead of direct AppProxy HTTP calls - Adding Pydantic
HealthCheckConfig
and YAML-driven health check extraction
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/ai/backend/manager/services/processors.py | Injected ValkeyLiveClient into ServiceArgs |
src/ai/backend/manager/services/model_serving/types.py | Added RouteConnectionInfo dataclass |
src/ai/backend/manager/services/model_serving/services/model_serving.py | Refactored endpoint creation and route update flows to use ModelServiceHelper and Redis |
src/ai/backend/manager/registry.py | Implemented AppProxy endpoint creation, health-check support, and Redis notifications |
src/ai/backend/common/data/config/types.py | Added HealthCheckConfig Pydantic model |
Comments suppressed due to low confidence (3)
src/ai/backend/manager/registry.py:3586
- Add unit tests for get_health_check_info to cover both standard runtime variants and CUSTOM model-definition.yaml parsing to ensure health check configurations are correctly extracted.
async def get_health_check_info(
src/ai/backend/manager/registry.py:3640
- [nitpick] Add or update the docstring for create_appproxy_endpoint to explain parameters, return value, and health_check payload format for better maintainability.
async with session.post(
src/ai/backend/manager/registry.py:3658
- The create_appproxy_endpoint payload no longer includes any 'apps' or inference routing info; ensure AppProxy can derive routing solely from Redis notifications or re-add necessary fields to the payload if required.
"health_check": health_check_information.model_dump(mode="json")
@dataclass | ||
class RouteConnectionInfo: | ||
app: str | ||
kernel_host: str | ||
kernel_port: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The RouteConnectionInfo dataclass is defined but never used; consider removing it or integrating it into generate_redis_route_info for consistent typing.
@dataclass | |
class RouteConnectionInfo: | |
app: str | |
kernel_host: str | |
kernel_port: int | |
# Removed the unused RouteConnectionInfo dataclass as it is not referenced anywhere in the code. |
Copilot uses AI. Check for mistakes.
src/ai/backend/manager/registry.py
Outdated
@@ -3671,6 +3688,19 @@ async def delete_appproxy_endpoint(self, db_sess: AsyncSession, endpoint: Endpoi | |||
): | |||
pass | |||
|
|||
async def notify_endpoint_route_update_to_appproxy(self, endpoint: EndpointRow) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider wrapping Redis operations in notify_endpoint_route_update_to_appproxy within try/except to handle potential valkey_live errors and avoid unhandled exceptions disrupting routing updates.
Copilot uses AI. Check for mistakes.
session_id: SessionId | ||
model_name: str | ||
new_status: ModelServiceStatus | ||
class RouteCreationEvent(AbstractAnycastEvent): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RouteLifecycleEvent?
Please fix this conflict tomorrow. @kyujin-cho |
Please resolve conflicts. |
ec25b1b
to
0258e66
Compare
0258e66
to
192e55a
Compare
Co-authored-by: HyeockJinKim <[email protected]>
Co-authored-by: HyeockJinKim <[email protected]>
Co-authored-by: HyeockJinKim <[email protected]>
Resolves #3051 (BA-992).
Summary
Breaking Changes
Important: This PR breaks health check capability on OSS AppProxy. Future work will restore support, but for now disable the health check feature in
model-definition.yaml
to use model service on Open Source Backend.AI.Changes
API Specifications
ModelServiceStatusEventArgs
base classEndpointRouteListUpdatedEvent
for endpoint route synchronizationHealth Check System
ModelServiceHelper
class replacingModelServicePredicateChecker
Route Management
generate_redis_route_info
method for serializable connection datanotify_endpoint_route_update_to_appproxy
for real-time updatesDatabase Changes
Technical Details
The health check system now reads configuration from either:
Health check configuration is passed to AppProxy during endpoint creation, allowing AppProxy to handle health checking independently. Route connection information is stored in Redis for AppProxy consumption with the key pattern:
endpoint.{endpoint_id}.route_connection_info
Testing
Existing tests should continue to pass. New functionality requires model service integration testing with AppProxy components.