tests: add test_chunked_prefill for llama4 #5549
Conversation
Force-pushed: 0eedd1c → e1449bb
Force-pushed: e1449bb → 26010dc
@schetlur-nv llama4 with chunked prefill hits OOM (see the test log). Is this a known issue? It seems the feature was completed in https://jirasw.nvidia.com/browse/TRTLLM-5484.
Force-pushed: 26010dc → ef7047d
📝 Walkthrough: A new parameterized test method, test_chunked_prefill, is added to TestLlama4MaverickInstruct, covering the FLASHINFER and TRTLLM attention backends; the corresponding entries are registered in the QA test lists and waived pending a known OOM bug (a hedged sketch of the test shape is shown below).
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~6 minutes
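For orientation, here is a minimal sketch of what such a parameterized chunked-prefill test could look like. It is modeled on the config fragment quoted in the review below; the model path, prompt, and the exact LLM constructor arguments are assumptions for illustration, not the PR's actual code.

```python
# Hypothetical sketch only -- not the PR's actual implementation.
import pytest
from tensorrt_llm import LLM  # assumed public LLM API entry point


class TestLlama4MaverickInstruct:
    # Placeholder model location; the real test resolves this from the harness.
    MODEL_PATH = "meta-llama/Llama-4-Maverick-17B-128E-Instruct"

    @pytest.mark.parametrize("attn_backend", ["FLASHINFER", "TRTLLM"])
    def test_chunked_prefill(self, attn_backend):
        # Mirrors the fragment in the review: overlap scheduling is disabled
        # while chunked prefill is exercised.
        pytorch_config = dict(attn_backend=attn_backend,
                              disable_overlap_scheduler=True)
        llm = LLM(self.MODEL_PATH,
                  enable_chunked_prefill=True,
                  max_num_tokens=256,  # small token budget so long prompts are split into chunks
                  **pytorch_config)
        outputs = llm.generate(["Summarize the history of GPUs in one sentence."])
        # Sanity check only; the real accuracy test evaluates benchmark tasks instead.
        assert outputs[0].outputs[0].text
```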
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (1)

414-415: Document why overlap scheduler is disabled.

The `disable_overlap_scheduler=True` configuration should be documented to explain why it is necessary for chunked prefill testing:

```diff
 pytorch_config = dict(attn_backend=attn_backend,
-                      disable_overlap_scheduler=True)
+                      disable_overlap_scheduler=True)  # Required for chunked prefill stability
```

tests/integration/test_lists/qa/examples_test_list.txt (1)

457-458: Missing TIMEOUT annotation & asymmetry with preceding block.

For parity with the Llama-3 chunked-prefill entries (lines 440-441), add explicit timeouts, e.g.:

```diff
-accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=FLASHINFER]
-accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM]
+accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=FLASHINFER] TIMEOUT (60)
+accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM] TIMEOUT (90)
```

Without a timeout the runner may default to an unsuitable global value or silently use the previous test's timeout.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (4)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunks)
- tests/integration/test_lists/qa/examples_test_list.txt (3 hunks)
- tests/integration/test_lists/qa/llm_sanity_test.txt (1 hunks)
- tests/integration/test_lists/waives.txt (1 hunks)
🔇 Additional comments (3)
tests/integration/test_lists/qa/llm_sanity_test.txt (1)

21-21: Ensure Llama4 chunked_prefill tests are included where intended.

We’ve located all chunked_prefill entries across the QA test lists:

- tests/integration/test_lists/qa/llm_sanity_test.txt
  - accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_chunked_prefill[attn_backend=FLASHINFER] TIMEOUT (60)
- tests/integration/test_lists/qa/examples_test_list.txt
  - accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_chunked_prefill[attn_backend=FLASHINFER] TIMEOUT (60)
  - accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_chunked_prefill[attn_backend=TRTLLM] TIMEOUT (90)
  - accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=FLASHINFER]
  - accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM]

It appears only examples_test_list.txt includes the new Llama4 tests, while llm_sanity_test.txt remains limited to Llama3.1. Please confirm whether:

- llm_sanity_test.txt should also register the TestLlama4MaverickInstruct::test_chunked_prefill entries, or
- the Llama4 tests belong solely in examples_test_list.txt per the PR’s scope.

tests/integration/test_lists/waives.txt (1)

435-437: Looks good – waiver entries correctly protect the new chunked-prefill tests.

The two test_chunked_prefill parametrisations are now guarded behind bug 5345391, preventing the OOM from breaking CI. No further action required here.
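For context, a waiver entry in this file typically pairs the full test id with a skip marker and the tracking bug. The lines below are a hedged sketch of that pattern under that assumption, not a quote from the PR's waives.txt:

```text
# Hypothetical waiver entries illustrating the usual "<test id> SKIP (<bug>)" pattern
accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=FLASHINFER] SKIP (https://nvbugs/5345391)
accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM] SKIP (https://nvbugs/5345391)
```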
tests/integration/test_lists/qa/examples_test_list.txt (1)

440-441: Nice addition, but keep timeout style consistent.

The Llama-3 chunked-prefill tests have explicit TIMEOUT (60|90) tags – perfect. No change required other than making sure the surrounding grouping comment explains these are new chunked-prefill sanity tests.
Force-pushed: 0d5dd5c → c964f77
Force-pushed: aaa21e8 → fd58042
/bot run
PR_Github #12667 [ run ] triggered by Bot
PR_Github #12667 [ run ] completed with state
Force-pushed: fd58042 → e2e5e5b
/bot run
PR_Github #12675 [ run ] triggered by Bot
PR_Github #12675 [ run ] completed with state
Force-pushed: e2e5e5b → 310ed76
PR_Github #12844 [ run ] triggered by Bot
PR_Github #12844 [ run ] completed with state
Force-pushed: cec28aa → 979cc1e
/bot reuse-pipeline
PR_Github #12872 [ reuse-pipeline ] triggered by Bot
PR_Github #12872 [ reuse-pipeline ] completed with state
Force-pushed: 368e91d → a29ae50
Force-pushed: a29ae50 → c06792a
Signed-off-by: Xin He (SW-GPU) <[email protected]>
Force-pushed: c06792a → a8717f2
/bot reuse-pipeline
PR_Github #12931 [ reuse-pipeline ] triggered by Bot
PR_Github #12931 [ reuse-pipeline ] completed with state
Signed-off-by: Xin He (SW-GPU) <[email protected]> Signed-off-by: Shreyas Misra <[email protected]>
Signed-off-by: Xin He (SW-GPU) <[email protected]> Signed-off-by: Ransiki Zhang <[email protected]>
Signed-off-by: Xin He (SW-GPU) <[email protected]> Signed-off-by: Lanyu Liao <[email protected]>
Add chunked prefill tests for llama4. Test report: https://prod.blsm.nvidia.com/swqa-tensorrt-qa-test/job/LLM_FUNCTION_CLUSTER_TEST/912/testReport/B200.accuracy.test_llm_api_pytorch/TestLlama4MaverickInstruct/