[TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler #5328
Pull Request Overview
This PR adds support for disaggregated serving in the TRTLLM Sampler while addressing a bug in the conversion from FinishedState to finish reason. Key changes include:
- Adding a new test configuration and test case for "trtllm_sampler" alongside the existing "overlap" configuration.
- Refactoring the finish reason conversion in the sampler to use the FinishedState abstraction.
- Updating the resource management in the py_executor to invoke seq_slot_manager during disaggregated generation initialization.
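As a rough illustration of the second point, the conversion from a finished-state value to a finish reason could look like the sketch below. This is not the code from this PR: the `FinishedState` bit values, the `FinishReason` members, the `to_finish_reason` method name, and the priority order are all assumptions made for the example.

```python
from enum import Enum, IntFlag


class FinishReason(Enum):
    """Hypothetical finish reasons reported back to the caller."""
    NOT_FINISHED = "not_finished"
    END_ID = "end_id"
    STOP_WORDS = "stop_words"
    LENGTH = "length"


class FinishedState(IntFlag):
    """Hypothetical bit flags describing why a sequence stopped.

    The bit layout here is an assumption for illustration only.
    """
    NONE = 0
    EOS = 1
    STOP_WORDS = 2
    MAX_LENGTH = 4

    def to_finish_reason(self) -> FinishReason:
        # Check flags in a fixed priority order so that a state with
        # several bits set always maps to the same reason.
        if self & FinishedState.EOS:
            return FinishReason.END_ID
        if self & FinishedState.STOP_WORDS:
            return FinishReason.STOP_WORDS
        if self & FinishedState.MAX_LENGTH:
            return FinishReason.LENGTH
        return FinishReason.NOT_FINISHED


# Convert raw integer states (e.g. read from a decoder buffer) to reasons.
raw_states = [0, 1, 2, 4]
reasons = [FinishedState(s).to_finish_reason() for s in raw_states]
```

Keeping the mapping in one method means the sampler only has to wrap the raw integer and call it, instead of comparing against magic numbers at each call site.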
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `tests/integration/defs/disaggregated/test_disaggregated.py` | Updated test conditions to include the new `trtllm_sampler` case, with a new YAML config. |
| `tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml` | Added a new configuration file for TRTLLM Sampler tests. |
| `tensorrt_llm/_torch/pyexecutor/seq_slot_manager.py` | Adjusted request slot allocation logic for disaggregated generation. |
| `tensorrt_llm/_torch/pyexecutor/sampler.py` | Updated the finish reason conversion logic to use `FinishedState`. |
| `tensorrt_llm/_torch/pyexecutor/py_executor.py` | Added resource preparation for the `seq_slot_manager`. |
| `tensorrt_llm/_torch/pyexecutor/finish_reason.py` | Introduced the `FinishedState` class to better encapsulate finish reason logic. |
Signed-off-by: Daniel Campora <[email protected]>
…VIDIA#5328) Signed-off-by: Daniel Campora <[email protected]> Signed-off-by: Daniel Cámpora <[email protected]> Co-authored-by: Copilot <[email protected]>
Support disaggregated serving in TRTLLM Sampler
This PR brings disaggregated serving to the TRTLLM Sampler.