examples : set n_seq_max = 2 for ctx3

danbev · danbev · commit 6693df846c86 · 2026-02-06T13:19:54.000+01:00
This commit updates the save-load-state example to set the n_seq_max
parameter to 2 when initializing the ctx3 context.

The motivation for this change is that with n_parallel/n_seq_max set
to 1 the context only supports one sequence, but the test later tries
to use a second sequence, which results in the following error:
```console
main : loaded state with 4 tokens
main : seq 0 copied, 225760 bytes
main : kv cache cleared
find_slot: seq_id=1 >= n_seq_max=1 Try using a bigger --parallel value
state_read_meta: failed to find available cells in kv cache
```
This seems to only happen for recurrent/hybrid models.
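
As an illustration only, the error above can be read as a bounds check on
the sequence id. The sketch below is a toy model, not llama.cpp code:
toy_context, toy_seq_id_allowed and toy_demo are hypothetical names, and
the only assumption taken from the log is that a seq_id must be smaller
than n_seq_max to be accepted.
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a context; only the sequence limit is modelled.
struct toy_context {
    uint32_t n_seq_max;
};

// True when seq_id lies in [0, n_seq_max), the condition the log line
// reports as violated (seq_id=1 >= n_seq_max=1).
static bool toy_seq_id_allowed(const toy_context & ctx, int32_t seq_id) {
    return seq_id >= 0 && static_cast<uint32_t>(seq_id) < ctx.n_seq_max;
}

// Walks through the two configurations discussed in this commit.
static void toy_demo() {
    const toy_context before{1}; // n_seq_max = 1
    const toy_context after{2};  // n_seq_max = 2, as set for ctx3

    assert( toy_seq_id_allowed(before, 0));
    assert(!toy_seq_id_allowed(before, 1)); // the failing case from the log
    assert( toy_seq_id_allowed(after,  1)); // accepted once the limit is raised
}
```
With a limit of 2 the context accepts sequence ids 0 and 1, which is all
the example needs.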
diff --git a/examples/save-load-state/save-load-state.cpp b/examples/save-load-state/save-load-state.cpp
@@ -151,7 +151,9 @@ int main(int argc, char ** argv) {
     }
 
     // make new context
-    llama_context * ctx3 = llama_init_from_model(model, common_context_params_to_llama(params));
+    auto params_ctx3 = common_context_params_to_llama(params);
+    params_ctx3.n_seq_max = 2;
+    llama_context * ctx3 = llama_init_from_model(model, params_ctx3);
 
     llama_sampler * smpl3 = llama_sampler_chain_init(sparams);
 
