Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Proposed Enhancement
Add a new command-line flag --no-warmup to disable the internal warm-up in llama-bench.
When this flag is set, llama-bench should skip the prompt/generation warm-up phases and go directly to the timed trials.
This would eliminate redundant warm-up runs in automated benchmarking pipelines and save time for users running long optimization loops.
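A minimal sketch of the proposed control flow, written in Python for brevity rather than llama-bench's C++. `BenchParams`, `run_test_case`, and the `run_prompt`/`run_gen` callbacks are hypothetical names, and the warm-up is simplified, as an assumption, to one untimed prompt pass plus a single generated token. With `no_warmup` set, only the timed repetitions run.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchParams:
    # Hypothetical mirror of a few llama-bench settings; names are illustrative.
    n_prompt: int = 512
    n_gen: int = 128
    repetitions: int = 5
    no_warmup: bool = False  # models the proposed --no-warmup flag


def run_test_case(params: BenchParams,
                  run_prompt: Callable[[int], None],
                  run_gen: Callable[[int], None]) -> List[float]:
    """Run one test case and return wall-clock seconds per timed repetition."""
    if not params.no_warmup:
        # Untimed warm-up (assumed shape): one prompt pass and one generated token.
        if params.n_prompt > 0:
            run_prompt(params.n_prompt)
        if params.n_gen > 0:
            run_gen(1)

    timings = []
    for _ in range(params.repetitions):
        start = time.perf_counter()
        if params.n_prompt > 0:
            run_prompt(params.n_prompt)
        if params.n_gen > 0:
            run_gen(params.n_gen)
        timings.append(time.perf_counter() - start)
    return timings
```

For example, with `repetitions=3` and `no_warmup=True`, the callbacks are invoked exactly six times (one prompt call and one generation call per repetition); with the default `no_warmup=False`, two extra untimed calls come first.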
Motivation
Currently, llama-bench automatically performs an internal warm-up before each benchmark test case.
However, in multi-stage tuning workflows such as llama-optimus, this leads to redundant warm-ups. A tool that searches for the best llama.cpp flags (such as llama-optimus) can already perform a heavy warm-up phase at the start.
In optimization loops that call llama-bench many times, repeating the warm-up for every benchmark trial wastes time and resources.
The flag would also make it easy to skip the warm-up during debugging and development testing.
Possible Implementation
llama.cpp already supports a --no-warmup option, but only in llama-cli and llama-server. The flag was introduced in PR [#8712] to bypass the internal llama_decode warm-up call during CLI/server invocation.
The codebase already has parsing and internal logic to disable warm-up in related components.
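On the parsing side, one option is to mirror the existing flag: keep a positive `warmup` setting that defaults to on and let `--no-warmup` clear it, so current llama-bench invocations behave exactly as before. The sketch below models this with Python's `argparse`; the parser and its option subset are hypothetical stand-ins for llama-bench's own C++ argument handling, not its actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of llama-bench options, for illustration only.
    parser = argparse.ArgumentParser(prog="llama-bench-sketch")
    parser.add_argument("-r", "--repetitions", type=int, default=5,
                        help="number of timed repetitions per test case")
    # Proposed switch: store_false writes False into `warmup`.
    # Its implicit default is True, so warm-up stays on unless the flag is given.
    parser.add_argument("--no-warmup", dest="warmup", action="store_false",
                        help="skip the untimed warm-up before the timed trials")
    return parser
```

Defaulting to warm-up enabled keeps existing scripts and published numbers comparable; only callers that explicitly pass the flag change behavior.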