fix: keep a strong reference to the websocket stream task in autogen-studio#7825
fix: keep a strong reference to the websocket stream task in autogen-studio#7825kratos0718 wants to merge 2 commits into
Conversation
The run websocket handler starts ws_manager.start_stream() with a bare asyncio.create_task() and discards the returned task. The event loop only keeps a weak reference to a task, so it can be garbage-collected before start_stream finishes — silently dropping a user's run mid-stream. Track the task in a module-level set and remove it on completion via add_done_callback so a strong reference is held until it finishes.
avinashkamat48-design
left a comment
There was a problem hiding this comment.
Keeping a strong reference fixes the GC risk, but this still drops task failures silently. If ws_manager.start_stream(...) raises, the done callback only discards the task from _background_tasks; nobody retrieves task.exception(), logs it, or notifies the websocket/client. Could the callback inspect/log failures so background stream errors do not become invisible?
Per review feedback: keeping a strong reference fixed the GC risk but swallowed exceptions. The done callback now logs any non-cancelled task exception via logger.error(exc_info=...), so a failing start_stream is no longer dropped silently.
|
Good catch @avinashkamat48-design — you're right, the strong reference alone still let failures pass silently. I've updated the done-callback to log any non-cancelled task exception via def _on_stream_done(t: asyncio.Task, run_id: int = run_id) -> None:
_background_tasks.discard(t)
if not t.cancelled() and (exc := t.exception()) is not None:
logger.error(f"Stream task for run {run_id} failed: {exc}", exc_info=exc)
stream_task.add_done_callback(_on_stream_done)Happy to route it to the client over the socket instead of (or in addition to) logging if you'd prefer that. Thanks for the review! |
|
@microsoft-github-policy-service agree |
Why are these changes needed?
The run websocket handler in
autogen-studio(web/routes/ws.py) starts the stream with a bareasyncio.create_task(...)and discards the returned task:Per the asyncio docs, the event loop only keeps a weak reference to a task. A task with no other reference can be garbage-collected before it finishes — here that means a user's streaming run can be silently dropped mid-flight under GC pressure, with no error surfaced.
Fix
Keep a strong reference to the task in a module-level set and remove it on completion via
add_done_callback(the pattern recommended in the asyncio docs):No behavior change beyond ensuring the stream task is retained until it completes. Single file.
Related issue number
N/A — small self-contained reliability fix.
Checks
python -m py_compile).