Replies: 1 comment

WIP
turbomind is very fast for the LLM part, but for InternVL the ViT part still runs in the auto-mapped, pipelined inference mode, which greatly increases end-to-end latency.
For the pytorch engine, the ViT part already supports TP, but I ran into difficulties while trying to integrate the two.
Could someone kindly explain how to extract the standalone ViT TP implementation and adapt it to run inside vl_encoder.async_infer before turbomind inference?
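For context, tensor parallelism for a ViT block comes down to splitting its linear layers column-wise and row-wise across ranks. The sketch below is not LMDeploy's implementation; it is a minimal single-process numpy illustration of the math (the function names and the two-rank split are my own assumptions), showing why a column-parallel projection followed by a row-parallel one reproduces the full matmul with a single all-reduce at the end:

```python
import numpy as np

def column_parallel(x, W, n_ranks):
    # Each rank holds a column shard of W; shard outputs are
    # concatenated (an all-gather in a real multi-GPU setup).
    shards = np.split(W, n_ranks, axis=1)
    return np.concatenate([x @ w for w in shards], axis=1)

def row_parallel(x, W, n_ranks):
    # Each rank holds a row shard of W plus the matching input slice;
    # partial products are summed (an all-reduce in a real setup).
    x_shards = np.split(x, n_ranks, axis=1)
    w_shards = np.split(W, n_ranks, axis=0)
    return sum(xs @ ws for xs, ws in zip(x_shards, w_shards))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # a batch of ViT tokens (hypothetical sizes)
W1 = rng.standard_normal((8, 16))    # e.g. MLP up-projection
W2 = rng.standard_normal((16, 8))    # down-projection

# Reference: dense forward pass on one device.
y_ref = (x @ W1) @ W2

# Tensor-parallel forward pass over 2 "ranks".
h_tp = column_parallel(x, W1, n_ranks=2)
y_tp = row_parallel(h_tp, W2, n_ranks=2)

assert np.allclose(y_tp, y_ref)
```

In a real integration the shards live on different GPUs and the concatenate/sum steps become `torch.distributed` collectives; the key point is that the column-then-row ordering needs only one communication per MLP, which is what makes TP attractive for cutting the ViT latency described above.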