Will multiple sequences be supported for LLamaCpp backend? #1022
alfie-nsugh started this conversation in Ideas
With something like this, the different generations can be seen as different sequences. With something like LLamaCpp, you can do parallel decoding by setting `batch.n_seq_id[j] = 1` and giving each token in the batch its own sequence id. Will optimizations like this be supported at some point?
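
For context, here is a minimal sketch of what multi-sequence (parallel) decoding looks like against llama.cpp's C batch API. This is an illustration rather than anything from this backend: the `decode_two_sequences` function, the prompt handling, and the sequence count are invented for the example, and error handling is omitted.

```c
// Sketch: decode one prompt under two independent sequences in a single
// llama_decode call (assumes the model and context are already created).
#include "llama.h"

void decode_two_sequences(struct llama_context *ctx,
                          const llama_token *prompt, int n_prompt) {
    const int n_seqs = 2;
    struct llama_batch batch = llama_batch_init(n_prompt * n_seqs, 0, 1);

    for (int s = 0; s < n_seqs; s++) {
        for (int i = 0; i < n_prompt; i++) {
            const int j = batch.n_tokens++;
            batch.token[j]     = prompt[i];
            batch.pos[j]       = i;   // position within its own sequence
            batch.n_seq_id[j]  = 1;   // this token belongs to one sequence...
            batch.seq_id[j][0] = s;   // ...namely sequence s
            batch.logits[j]    = (i == n_prompt - 1); // logits for last token only
        }
    }

    // One decode call processes both sequences; their KV-cache entries stay
    // separate because the tokens carry different sequence ids.
    if (llama_decode(ctx, batch) != 0) {
        // handle failure
    }

    llama_batch_free(batch);
}
```

The point of the pattern is that the expensive forward pass is shared across all sequences in the batch, while sampling still happens per sequence from each one's last-token logits.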