Closed
@@ -612,7 +612,7 @@ class ContinuousBatchingAsyncIOs:
between the two batches, which means twice as more VRAM is used for static input tensors and CUDA graph. If your GPU
is large enough or you want to generate long sequences, this is a good trade-off to make.

- Asynchronous batching works by creating two pairs of host - device inputs and ouputs:
+ Asynchronous batching works by creating two pairs of host - device inputs and outputs:

inputs
┌──────────┐ ────────► ┌────────────┐