
Commit fa4311d

Authored by nopperl, Alnusjaponica, and tdoublep
[V1] v1 engine + full CUDA graph support for PLaMo2 (vllm-project#23998)
Signed-off-by: Hemmi Shinichi <[email protected]>
Signed-off-by: nopperl <[email protected]>
Co-authored-by: Hemmi Shinichi <[email protected]>
Co-authored-by: Thomas Parnell <[email protected]>
1 parent 6d80ae8 commit fa4311d

6 files changed (+350, -126 lines)

docs/models/supported_models.md
Lines changed: 1 addition & 1 deletion

@@ -395,7 +395,7 @@ th {
 | `PhiMoEForCausalLM` | Phi-3.5-MoE | `microsoft/Phi-3.5-MoE-instruct`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Phi4FlashForCausalLM` | Phi-4-mini-flash-reasoning | `microsoft/microsoft/Phi-4-mini-instruct`, etc. | | | |
 | `PersimmonForCausalLM` | Persimmon | `adept/persimmon-8b-base`, `adept/persimmon-8b-chat`, etc. | | ✅︎ | ✅︎ |
-| `Plamo2ForCausalLM` | PLaMo2 | `pfnet/plamo-2-1b`, `pfnet/plamo-2-8b`, etc. | | ✅︎ | |
+| `Plamo2ForCausalLM` | PLaMo2 | `pfnet/plamo-2-1b`, `pfnet/plamo-2-8b`, etc. | | ✅︎ | ✅︎ |
 | `QWenLMHeadModel` | Qwen | `Qwen/Qwen-7B`, `Qwen/Qwen-7B-Chat`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Qwen2ForCausalLM` | QwQ, Qwen2 | `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Qwen2MoeForCausalLM` | Qwen2MoE | `Qwen/Qwen1.5-MoE-A2.7B`, `Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc. | ✅︎ | ✅︎ | ✅︎ |
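With the V1 column now checked, PLaMo2 can be exercised on the V1 engine via standard offline inference. A minimal sketch (the explicit `VLLM_USE_V1` opt-in and the sampling settings are illustrative assumptions, not part of this commit):

```python
# Sketch: run PLaMo2 on the vLLM V1 engine (assumes a CUDA-capable GPU).
import os

os.environ["VLLM_USE_V1"] = "1"  # opt in to the V1 engine explicitly

from vllm import LLM, SamplingParams

llm = LLM(model="pfnet/plamo-2-1b", trust_remote_code=True)
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["PLaMo2 is a hybrid Mamba/attention model that"], params)
print(outputs[0].outputs[0].text)
```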

docs/usage/v1_guide.md
Lines changed: 1 addition & 1 deletion

@@ -110,7 +110,7 @@ Models using selective state-space mechanisms instead of standard transformer at
 Models that use Mamba-2 and Mamba-1 layers (e.g., `Mamba2ForCausalLM`, `MambaForCausalLM`,`FalconMambaForCausalLM`) are supported.
 
 Hybrid models that combine Mamba-2 and Mamba-1 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
-`Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`, `JambaForCausalLM`).
+`Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`, `JambaForCausalLM`, `Plamo2ForCausalLM`).
 
 Hybrid models with mechanisms different to Mamba are also supported (e.g, `MiniMaxText01ForCausalLM`, `MiniMaxM1ForCausalLM`, `Lfm2ForCausalLM`).

tests/models/language/generation/test_hybrid.py
Lines changed: 3 additions & 2 deletions

@@ -25,8 +25,7 @@
 
 HYBRID_MODELS = [
     "ai21labs/Jamba-tiny-dev",
-    # skipping until vLLM implementation issues are resolved
-    # "pfnet/plamo-2-1b",
+    "pfnet/plamo-2-1b",
     "Zyphra/Zamba2-1.2B-instruct",
     "hmellor/tiny-random-BambaForCausalLM",
     "ibm-granite/granite-4.0-tiny-preview",
@@ -37,6 +36,7 @@
 V1_SUPPORTED_MODELS = [
     "state-spaces/mamba-130m-hf",
     "ai21labs/Jamba-tiny-dev",
+    "pfnet/plamo-2-1b",
     "yujiepan/mamba2-codestral-v0.1-tiny-random",
     "Zyphra/Zamba2-1.2B-instruct",
     "hmellor/tiny-random-BambaForCausalLM",
@@ -47,6 +47,7 @@
 
 FULL_CUDA_GRAPH_MODELS = [
     "ai21labs/Jamba-tiny-dev",
+    "pfnet/plamo-2-1b",
     "Zyphra/Zamba2-1.2B-instruct",
 ]
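The new `FULL_CUDA_GRAPH_MODELS` entry means the suite also runs PLaMo2 with full CUDA graph capture and compares it against eager execution. A rough sketch of what such a check looks like (the `full_cuda_graph` knob name and the exact comparison are assumptions; the real test uses the suite's runner fixtures and a logprob-closeness check):

```python
# Sketch: the shape of a full-CUDA-graph smoke test for a hybrid model.
# Assumption: compilation_config accepts a "full_cuda_graph" flag; the exact
# knob may differ across vLLM versions. In practice each engine would run in
# its own process to free GPU memory between the two configurations.
from vllm import LLM, SamplingParams

MODEL = "pfnet/plamo-2-1b"  # newly added to FULL_CUDA_GRAPH_MODELS above
params = SamplingParams(temperature=0.0, max_tokens=32)
prompts = ["The capital of Japan is"]

baseline = LLM(model=MODEL, trust_remote_code=True, enforce_eager=True)
expected = baseline.generate(prompts, params)[0].outputs[0].text

graphed = LLM(model=MODEL, trust_remote_code=True,
              compilation_config={"full_cuda_graph": True})
actual = graphed.generate(prompts, params)[0].outputs[0].text

# The real suite compares logprobs for closeness rather than exact strings.
assert actual == expected
```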
tests/models/registry.py
Lines changed: 0 additions & 2 deletions

@@ -287,8 +287,6 @@ def check_available_online(
     "PhiMoEForCausalLM": _HfExamplesInfo("microsoft/Phi-3.5-MoE-instruct",
                                          trust_remote_code=True),
     "Plamo2ForCausalLM": _HfExamplesInfo("pfnet/plamo-2-1b",
-                                         max_transformers_version="4.53",
-                                         transformers_version_reason="vLLM impl inherits PreTrainedModel and clashes with get_input_embeddings",  # noqa: E501
                                          trust_remote_code=True),
     "QWenLMHeadModel": _HfExamplesInfo("Qwen/Qwen-7B-Chat",
                                        max_transformers_version="4.53",

vllm/config/compilation.py
Lines changed: 1 addition & 0 deletions

@@ -340,6 +340,7 @@ class CompilationConfig:
         "vllm.mamba_mixer",
         "vllm.short_conv",
         "vllm.linear_attention",
+        "vllm.plamo2_mamba_mixer",
     ]
 
     def compute_hash(self) -> str:
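`splitting_ops` is the list of custom ops at which vLLM cuts the piecewise-compiled graph; registering `vllm.plamo2_mamba_mixer` makes PLaMo2's Mamba mixer a graph boundary by default, which is what allows CUDA graph capture around it. A sketch of overriding the list explicitly (normally unnecessary once this default lands; `vllm.unified_attention` is assumed from vLLM's default list):

```python
# Sketch: overriding splitting_ops via compilation_config.
# In piecewise compilation, the captured graph is cut at these custom ops,
# so kernels such as the PLaMo2 Mamba mixer run at piece boundaries.
from vllm import LLM

llm = LLM(
    model="pfnet/plamo-2-1b",
    trust_remote_code=True,
    compilation_config={
        # Redundant with the new default added in this commit; shown only
        # to illustrate the knob.
        "splitting_ops": [
            "vllm.unified_attention",
            "vllm.plamo2_mamba_mixer",
        ],
    },
)
```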
