sched : fix multiple evaluations of the same graph with pipeline parallelism #14855


Merged: 1 commit into master on Jul 25, 2025

Conversation

@slaren (Member) commented on Jul 24, 2025

Fixes incorrect results when using LLAMA_SET_ROWS=1 with pipeline parallelism. The bug was caused by incrementing cur_copy in sched_graph_compute, which resulted in an incorrect copy being used when the same graph was evaluated multiple times.
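
For context: with pipeline parallelism the scheduler keeps several copies of the graph input buffers so that multiple micro-batches can be in flight at once, and an index selects which copy a given evaluation uses. Below is a minimal, self-contained C sketch of the failure mode. All names (sched, sched_alloc_graph, sched_graph_compute_buggy, N_COPIES) are hypothetical; this is not the actual ggml-backend code, only an illustration of why advancing the copy index at compute time breaks repeated evaluation of the same graph, together with one possible fix: advancing it at allocation time instead.

```c
#include <stdio.h>

#define N_COPIES 2

// One input slot per pipeline copy; cur_copy selects which slot the
// next graph evaluation reads from.
struct sched {
    int cur_copy;
    int input[N_COPIES];
};

// Inputs are written into the copy that is current at allocation time.
static void sched_alloc_graph(struct sched *s, int value) {
    s->input[s->cur_copy] = value;
}

// Buggy: advancing cur_copy inside compute means a second evaluation of
// the same graph reads a slot that was never written for it.
static int sched_graph_compute_buggy(struct sched *s) {
    int out = s->input[s->cur_copy];
    s->cur_copy = (s->cur_copy + 1) % N_COPIES; // the problematic increment
    return out;
}

// Fixed variant: advance the copy index when a new graph is allocated, so
// any number of compute calls on the same allocation read the same copy.
static void sched_alloc_graph_fixed(struct sched *s, int value) {
    s->cur_copy = (s->cur_copy + 1) % N_COPIES;
    s->input[s->cur_copy] = value;
}

static int sched_graph_compute_fixed(const struct sched *s) {
    return s->input[s->cur_copy];
}

int main(void) {
    struct sched s = { .cur_copy = 0, .input = { -1, -1 } };
    sched_alloc_graph(&s, 42);
    printf("buggy, 1st compute: %d\n", sched_graph_compute_buggy(&s)); // 42
    printf("buggy, 2nd compute: %d\n", sched_graph_compute_buggy(&s)); // -1 (stale slot)

    struct sched t = { .cur_copy = 0, .input = { -1, -1 } };
    sched_alloc_graph_fixed(&t, 42);
    printf("fixed, 1st compute: %d\n", sched_graph_compute_fixed(&t)); // 42
    printf("fixed, 2nd compute: %d\n", sched_graph_compute_fixed(&t)); // 42
    return 0;
}
```

In the buggy variant the second compute reads input[1], which was never written for this graph; in the fixed variant both computes return 42 because the index only advances when a new graph is allocated.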

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Jul 24, 2025
@ggerganov merged commit c12bbde into master on Jul 25, 2025
54 of 55 checks passed
taronaeo pushed a commit to taronaeo/llama.cpp-s390x that referenced this pull request Jul 25, 2025
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 25, 2025
* origin/master:
docs : update HOWTO-add-model.md for ModelBase and new model classes (ggml-org#14874)
ggml : remove invalid portPos specifiers from dot files (ggml-org#14838)
context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (ggml-org#14870)
mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (ggml-org#14503)
rpc : check for null buffers in get/set/copy tensor endpoints (ggml-org#14868)
sched : fix multiple evaluations of the same graph with pipeline parallelism (ggml-org#14855)
musa: upgrade musa sdk to rc4.2.0 (ggml-org#14498)
sync : ggml
cmake : fix usage issues (ggml/1257)
ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
context : perform output reorder lazily upon access after sync (ggml-org#14853)
chat : fix kimi-k2 chat template (ggml-org#14852)
sycl: fixed semantics of block offset calculation (ggml-org#14814)
llama : fix MiniCPM inference after Granite Four changes (ggml-org#14850)
docs: add libcurl-dev install hint for Linux distros (ggml-org#14801)
metal : fix fusion across different encoders (ggml-org#14849)
sycl: fix undefined variable in work group size check (ggml-org#14843)
convert : text-only support for GLM-4.1V-9B-Thinking (ggml-org#14823)
CUDA: fix overflow in FA, tune performance (ggml-org#14840)
CUDA: fix compilation with GGML_CUDA_F16 (ggml-org#14837)