Skip to content

Conversation

kyujin-cho
Copy link
Member

Follow-up PR of #5134. This PR updates session status transition logic to generate up-to-date health check information upon model service's target session status updates.

@Copilot Copilot AI review requested due to automatic review settings July 21, 2025 06:14
@github-actions github-actions bot added size:M 30~100 LoC comp:manager Related to Manager component comp:common Related to Common component labels Jul 21, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the model service health check synchronization with AppProxy by updating session status transition logic to provide real-time health check information. The changes ensure that AppProxy receives up-to-date health check configurations whenever session statuses change.

Key changes include:

  • Enhanced health check information storage in Redis with separate enabled flag and configuration
  • Updated session filtering to only include RUNNING sessions in route generation
  • Consolidated Redis operations for both connection info and health check data

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/ai/backend/manager/registry.py Updated to use new consolidated Redis storage method and improved variable naming
src/ai/backend/manager/models/endpoint.py Added session status filtering to only include RUNNING sessions in active routes
src/ai/backend/common/clients/valkey_client/valkey_live/client.py Added new method to atomically update both connection info and health check data in Redis
changes/.feat.md Added changelog entry for the feature

@@ -543,7 +544,9 @@ async def generate_redis_route_info(
[
r.session
for r in active_routes
if r.status in RouteStatus.active_route_statuses() and r.session
if r.status in RouteStatus.active_route_statuses()
and r.session
Copy link
Preview

Copilot AI Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code accesses r.session_row.status but there's no guarantee that session_row is loaded or available on the route object r. This could result in an AttributeError if the session relationship isn't properly loaded.

Suggested change
and r.session
and r.session
and r.session_row

Copilot uses AI. Check for mistakes.

@kyujin-cho kyujin-cho requested a review from HyeockJinKim July 21, 2025 06:16
Comment on lines 486 to 501
await self._client.client.set(
f"endpoint.{endpoint_id}.route_connection_info",
json.dumps(connection_info),
expiry=ExpirySet(ExpiryType.SEC, 3600),
)
await self._client.client.set(
f"endpoint.{endpoint_id}.health_check_enabled",
"true" if health_check_config is not None else "false",
expiry=ExpirySet(ExpiryType.SEC, 3600),
)
if health_check_config:
await self._client.client.set(
f"endpoint.{endpoint_id}.health_check_config",
health_check_config.model_dump_json(),
expiry=ExpirySet(ExpiryType.SEC, 3600),
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use batch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block was originally written in pipeline but I just unwrapped it because there seemed to be no alternative option available for valkey... But I guess batch would fit. Thanks for the heads up.

@HyeockJinKim HyeockJinKim enabled auto-merge July 21, 2025 06:34
@HyeockJinKim HyeockJinKim added this pull request to the merge queue Jul 21, 2025
Merged via the queue into main with commit 372253f Jul 21, 2025
29 checks passed
@HyeockJinKim HyeockJinKim deleted the feature/sync-health-check-info-real-time branch July 21, 2025 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:common Related to Common component comp:manager Related to Manager component size:M 30~100 LoC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants