[None][fix] Add synchronize for kvcache cleanup #7537

Closed

pengbowang-nv wants to merge 1 commit into NVIDIA:release/1.0 from pengbowang-nv:fix-add-synchronize-for-kvcache-cleanup

tensorrt_llm/_torch/pyexecutor/model_engine.py

@@ -657,6 +657,7 @@ def clean_up_kv_cache():
                     # Zero the KV cache; NaNs may be introduced during warmup
                     for layer_idx in kv_cache_manager.layer_offsets.keys():
                         kv_cache_manager.get_buffers(layer_idx).zero_()
+                    torch.cuda.synchronize()
                 stack.callback(clean_up_kv_cache)
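
For context, here is a minimal sketch of the pattern this change touches. It is not the code from `model_engine.py`. `zero_and_sync` and `FakeBuffer` are hypothetical names introduced only for illustration, and the fake buffer stands in for a KV cache tensor so the sketch also runs without a GPU. The idea it assumes is that `zero_()` on a CUDA tensor only enqueues work, so a `torch.cuda.synchronize()` afterwards makes the host wait until the zeroing has finished before the cleanup callback returns.

```python
import contextlib

try:
    import torch  # optional here; without it the sync step is skipped
except ImportError:
    torch = None


def zero_and_sync(buffers):
    """Zero each buffer in place, then wait for queued GPU work to finish."""
    for buf in buffers:
        buf.zero_()  # on a CUDA tensor this only enqueues a kernel
    if torch is not None and torch.cuda.is_available():
        # Block the host until all queued work on the current device is done.
        torch.cuda.synchronize()


class FakeBuffer:
    """Hypothetical stand-in for a KV cache tensor (assumption, CPU only)."""

    def __init__(self, values):
        self.values = list(values)

    def zero_(self):
        self.values = [0.0] * len(self.values)
        return self


buffers = [FakeBuffer([float("nan"), 1.0]), FakeBuffer([2.0, 3.0])]

with contextlib.ExitStack() as stack:
    # Registered callbacks run when the with-block exits, as in the diff above.
    stack.callback(zero_and_sync, buffers)
    # ... warmup work that may leave NaNs in the buffers would run here ...
```

Registering the cleanup on an `ExitStack` means it runs even if the warmup raises, and the synchronize keeps the zeroing from still being in flight when later work starts.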