[nvbug/5354946][fix] Fix mtp vanilla draft inputs (#5568)

lfr-0531 · web-flow · commit 6cbc9a5297d1 · 2025-06-30T15:59:12.000+08:00
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
diff --git a/tensorrt_llm/_torch/speculative/mtp.py b/tensorrt_llm/_torch/speculative/mtp.py
@@ -522,7 +522,6 @@ def forward(
                 "position_ids": draft_inputs["position_ids"],
                 "hidden_states": draft_hidden_states,
                 "attn_metadata": draft_inputs["attn_metadata"],
-                "spec_metadata": draft_inputs["spec_metadata"],
             }
         next_draft_tokens = torch.stack(next_draft_tokens, dim=1)
 
diff --git a/tests/integration/test_lists/waives.txt b/tests/integration/test_lists/waives.txt
@@ -422,7 +422,6 @@ accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_nvfp4_4gpus[moe_backe
 accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_nvfp4_4gpus[moe_backend=TRTLLM-mtp_nextn=2-ep4-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False] SKIP (https://nvbugs/5349343)
 full:B200/test_e2e.py::test_ptp_quickstart_advanced_deepseek_multi_nodes[DeepSeek-R1/DeepSeek-R1-0528-FP4] SKIP (https://nvbugs/5344688)
 accuracy/test_llm_api.py::TestLlama3_1_8BInstruct::test_guided_decoding_4gpus[xgrammar] SKIP (https://nvbugs/5346443)
-accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales[mtp=vanilla-fp8kv=False-attention_dp=False-cuda_graph=False-overlap_scheduler=False-torch_compile=False] SKIP (https://nvbugs/5354946)
 examples/test_multimodal.py::test_llm_multimodal_general[kosmos-2-pp:1-tp:1-float16-bs:1-cpp_e2e:True-nb:1] SKIP (https://nvbugs/5354936)
 examples/test_multimodal.py::test_llm_multimodal_general[fuyu-8b-pp:1-tp:1-float16-bs:1-cpp_e2e:True-nb:1] SKIP (https://nvbugs/5354936)
 accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus_static_eplb SKIP (https://nvbugs/5354925)
