Feature Request: Add --no-warmup to llama-bench #14224

@BrunoArsioli

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Proposed Enhancement

Add a new command-line flag --no-warmup to disable the internal warm-up in llama-bench.

When the flag is set, llama-bench should skip the prompt and generation warm-up phases and proceed directly to the timed trials.

This will eliminate redundant operations in automated benchmarking pipelines, and improve efficiency for users running long optimization loops.
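
To make the request concrete, here is a minimal sketch of how such a switch could be parsed. It is not llama-bench's actual argument parser: `bench_params`, `parse_bench_args`, and the default repetition count are hypothetical names and values chosen for illustration, and only `--no-warmup` is the flag proposed in this issue.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical subset of benchmark options; not llama-bench's real struct.
struct bench_params {
    std::string model;              // value of -m / --model
    int         reps      = 5;      // value of -r / --repetitions (illustrative default)
    bool        no_warmup = false;  // set by the proposed --no-warmup flag
};

// Parse a flat argument list (program name excluded).
// Throws std::invalid_argument on an unknown flag or a missing value.
bench_params parse_bench_args(const std::vector<std::string> & args) {
    bench_params p;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string & a = args[i];
        if (a == "--no-warmup") {
            p.no_warmup = true;  // boolean switch, consumes no value
        } else if (a == "-m" || a == "--model") {
            if (++i >= args.size()) throw std::invalid_argument("missing value for " + a);
            p.model = args[i];
        } else if (a == "-r" || a == "--repetitions") {
            if (++i >= args.size()) throw std::invalid_argument("missing value for " + a);
            p.reps = std::stoi(args[i]);
        } else {
            throw std::invalid_argument("unknown argument: " + a);
        }
    }
    return p;
}
```

Because the switch defaults to `false`, existing invocations would keep their current behavior, and only callers that pass `--no-warmup` would opt out of the warm-up.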

Motivation

Currently, llama-bench automatically performs an internal warm-up before each benchmark test case.

However, in multi-stage tuning workflows such as llama-optimus, this leads to redundant warm-ups. A tool that searches for the best llama.cpp flags can already perform a heavy warm-up phase once at the start.

In optimization loops that call llama-bench many times, repeating the warm-up for every benchmark trial wastes time and resources.

Skipping the warm-up would also shorten runs during debugging and development testing.

Possible Implementation

llama.cpp already supports a --no-warmup option, but only in llama-cli and llama-server. The flag was introduced in PR [#8712] to bypass the internal llama_decode warm-up call during CLI/server invocation.

The codebase therefore already has argument parsing and internal logic for disabling warm-up in related components, which llama-bench could follow.
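
As a rough illustration of the requested behavior, the sketch below gates the warm-up runs on a boolean before the timed repetitions. It is not taken from llama-bench: `bench_case`, `run_case`, and the `run_prompt` / `run_gen` callbacks are hypothetical stand-ins for whatever the tool actually calls, and the single-token generation warm-up is an assumption.

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical description of one benchmark case.
struct bench_case {
    int  n_prompt  = 0;      // prompt tokens to process (0 = skip prompt test)
    int  n_gen     = 0;      // tokens to generate (0 = skip generation test)
    int  reps      = 1;      // number of timed repetitions
    bool no_warmup = false;  // mirrors the proposed --no-warmup flag
};

// Run one case and return the wall time of each timed repetition in nanoseconds.
// run_prompt / run_gen are caller-supplied stand-ins for the real workload.
std::vector<long long> run_case(const bench_case & c,
                                const std::function<void(int)> & run_prompt,
                                const std::function<void(int)> & run_gen) {
    using clock = std::chrono::steady_clock;

    // Untimed warm-up, skipped entirely when no_warmup is set.
    if (!c.no_warmup) {
        if (c.n_prompt > 0) run_prompt(c.n_prompt);
        if (c.n_gen    > 0) run_gen(1);  // assumption: one token is enough to warm up
    }

    std::vector<long long> samples_ns;
    for (int r = 0; r < c.reps; ++r) {
        const auto t0 = clock::now();
        if (c.n_prompt > 0) run_prompt(c.n_prompt);
        if (c.n_gen    > 0) run_gen(c.n_gen);
        const auto t1 = clock::now();
        samples_ns.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
    }
    return samples_ns;
}
```

With this shape, an outer optimizer that has already warmed up the backend pays only for the timed repetitions on each call, while the default path still warms up before measuring.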

Labels: enhancement (New feature or request)