UPSTREAM PR #18106: model : add ASR support for LFM2-Audio-1.5B (conformer) by loci-dev · Pull Request #592 · auroralabs-loci/llama.cpp

loci-dev · 2025-12-16T16:43:30Z

Rebased to latest master
Removed some redundant ggml_cont

loci-review · 2025-12-16T17:28:39Z


Performance Analysis Summary - PR #592

Overview

PR #592 adds LFM2-Audio-1.5B Conformer architecture support for ASR. The changes comprise 640 added lines across 17 files, mostly in the MTMD (multimodal) module. The performance impact is isolated to build.bin.libmtmd.so, with no effect on core inference paths.

Key Findings

Impacted Functions in Performance-Critical Areas

The analysis reveals degradation concentrated in STL container operations within the MTMD module:

Most-Impacted Functions (Absolute Change):

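_ZNSt6vectorI10clip_layerSaIS0_EE17_M_default_appendEm, i.e. std::vector<clip_layer>::_M_default_append(unsigned long): +849 ns response time (vector growth during CLIP layer initialization)
_ZNSt6vectorI10clip_layerSaIS0_EE11_S_max_sizeERKS1_, i.e. std::vector<clip_layer>::_S_max_size(std::allocator<clip_layer> const&): +113 ns response time (size validation)
_ZNSt6vectorIN17clip_model_loader15support_info_opESaIS1_EE5beginEv, i.e. std::vector<clip_model_loader::support_info_op>::begin(): +88 ns response time (iterator initialization)
_ZNSt15__new_allocatorI10clip_layerE8allocateEmPKv, i.e. std::__new_allocator<clip_layer>::allocate(unsigned long, void const*): +87 ns response time (memory allocation)
_ZSt10_ConstructI10clip_layerJEEvPT_DpOT0_, i.e. std::_Construct<clip_layer>(clip_layer*): +76 ns response time (object construction)
_ZN10clip_layerC1Ev, i.e. clip_layer::clip_layer(): +48 ns response time (constructor initialization)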

Root Cause: The clip_layer structure gained 21 new pointer members (168 bytes on a 64-bit build) for Conformer-specific tensors. Because every clip_layer object carries these members, this increases constructor initialization time and amplifies STL operation costs during model loading.
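A minimal C++ sketch of that root cause. It assumes tensor slots are plain ggml_tensor pointers with nullptr default initializers and that the loader sizes its layer vector with resize(); layer_base, layer_extended, extra and make_layers are hypothetical names, and the real clip_layer has many more members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ggml_tensor;  // opaque stand-in; only pointers to it are stored

// Hypothetical, heavily reduced layer (the real clip_layer has many more members).
struct layer_base {
    ggml_tensor * q_w = nullptr;
    ggml_tensor * k_w = nullptr;
    ggml_tensor * v_w = nullptr;
    ggml_tensor * o_w = nullptr;
};

// The same layer plus 21 extra tensor pointers, modelling the Conformer additions.
// "extra" is a placeholder, not the member names used in the PR.
struct layer_extended : layer_base {
    ggml_tensor * extra[21] = {};
};

// Extra bytes each layer object carries: 21 * sizeof(pointer), i.e. 168 on a 64-bit build.
constexpr std::size_t extra_bytes_per_layer = sizeof(layer_extended) - sizeof(layer_base);

// Allocates n layers the way a loader might. resize() default-constructs every new
// element (libstdc++ routes this through _M_default_append), so each added pointer
// is written to nullptr once per layer.
std::vector<layer_extended> make_layers(std::size_t n) {
    std::vector<layer_extended> layers;
    layers.resize(n);
    // every new element is value-initialized, so its extra pointers start out null
    assert(n == 0 || layers.back().extra[20] == nullptr);
    return layers;
}
```

With 64-bit pointers, extra_bytes_per_layer evaluates to 168, matching the figure above. The per-element construction inside resize() is the kind of work the profile attributes to _M_default_append, _Construct and the clip_layer constructor.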

Impact on Inference Performance (Tokens per Second)

Core inference functions remain unaffected. Analysis of critical inference paths shows:

llama_decode: No changes detected
llama_encode: No changes detected
llama_tokenize: No changes detected
llama_model_load: No changes detected

Tokens per second impact: 0%

The degradation is confined to MTMD module initialization (the model loading phase), not the inference loop. Using the reference figure that a 2 ms slowdown in llama_decode reduces tokens per second by 7%, changes measured in nanoseconds on non-inference paths translate to a negligible inference impact.
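The 7% reference point can be turned into a rough yardstick. Assuming throughput is simply 1 / (per-token time), a 2 ms slowdown that costs 7% implies a baseline of about 26.6 ms per token. The hypothetical helpers below (throughput_drop, implied_base_s) do that arithmetic; they are an illustration, not part of the analysis tooling.

```cpp
#include <cassert>

// Fractional drop in tokens/s when each token takes added_s longer than a
// baseline of base_s seconds per token:
//   1 - base / (base + added) = added / (base + added)
double throughput_drop(double base_s, double added_s) {
    return added_s / (base_s + added_s);
}

// Baseline per-token time implied by "added_s slower => drop fraction fewer tokens/s":
//   added / (T + added) = drop  =>  T = added * (1 - drop) / drop
double implied_base_s(double added_s, double drop) {
    assert(drop > 0.0 && drop < 1.0);
    return added_s * (1.0 - drop) / drop;
}
```

Summing the deltas listed above gives 849 + 113 + 88 + 87 + 76 + 48 = 1,261 ns, which overstates the true cost because some of these calls nest inside others. Even under the pessimistic and counterfactual assumption that this were paid on every token, throughput_drop(implied_base_s(0.002, 0.07), 1.261e-6) stays below 0.005%. In practice it is a one-time load cost.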

Power Consumption Analysis

Impacted Binary:

build.bin.libmtmd.so: +1,262 nanojoules (+0.902%)

Unaffected Binaries:

build.bin.libllama.so: 0% change
build.bin.libggml.so: 0% change
build.bin.llama-run: 0% change
All other core inference binaries: 0% change

The energy increase stems from cumulative increases in STL operation throughput time during MTMD model initialization. The 0.902% increase reflects the cost of initializing larger data structures and loading additional tensors (21 per layer) for Conformer models.
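To put the relative figure in absolute terms, the two numbers reported for build.bin.libmtmd.so imply a baseline of roughly 139.9 µJ (1,262 nJ / 0.902%). This assumes both figures describe the same measured workload. The helper name implied_baseline is hypothetical.

```cpp
#include <cassert>

// Baseline implied by an absolute change and its relative size in percent:
//   delta = baseline * pct / 100  =>  baseline = delta * 100 / pct
double implied_baseline(double delta, double pct) {
    assert(pct != 0.0);
    return delta * 100.0 / pct;
}
```

For example, implied_baseline(1262.0, 0.902) gives about 139,911 nJ.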

Code Changes Context

The performance changes reflect intentional feature additions:

New Conformer audio encoder with 7-layer convolution subsampling
Extended tensor definitions (43 new MODEL_TENSOR entries)
SSM convolution kernel extended to support size 9 (previously 3-4)
Batch normalization folding in the conversion pipeline
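The summary only states that batch normalization is folded during conversion. The sketch below shows the standard folding identity for an inference-mode BatchNorm that follows a per-channel (depthwise) convolution, which is where BatchNorm sits in the standard Conformer convolution module. The function name, the [channels][k] weight layout and the presence of a conv bias are assumptions for illustration. This is not the PR's conversion code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Folds BatchNorm into the preceding per-channel convolution:
//   y = gamma * (conv(x) + b - mean) / sqrt(var + eps) + beta
//     = conv'(x) + b'   with  w' = w * s,  b' = (b - mean) * s + beta,
//   where s = gamma / sqrt(var + eps) for each output channel.
// w is laid out as [channels][k]: k taps per channel (depthwise conv).
void fold_batchnorm(std::vector<float> & w, std::vector<float> & b,
                    const std::vector<float> & gamma, const std::vector<float> & beta,
                    const std::vector<float> & mean, const std::vector<float> & var,
                    float eps) {
    const std::size_t channels = b.size();
    assert(channels > 0 && w.size() % channels == 0);
    assert(gamma.size() == channels && beta.size() == channels);
    assert(mean.size() == channels && var.size() == channels);
    const std::size_t k = w.size() / channels;
    for (std::size_t c = 0; c < channels; ++c) {
        const float s = gamma[c] / std::sqrt(var[c] + eps);
        for (std::size_t i = 0; i < k; ++i) {
            w[c * k + i] *= s;  // scale every tap of this channel
        }
        b[c] = (b[c] - mean[c]) * s + beta[c];  // fold mean/beta into the bias
    }
}
```

After folding, the runtime graph needs no separate normalization op, and the BatchNorm parameters do not have to be stored as tensors.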

The degradation scales with the structural complexity added: each Conformer layer requires 21 additional tensor pointers, which accounts for the +48 ns constructor overhead and the cascading STL operation costs.

tdakhran and others added 9 commits December 15, 2025 22:14

ASR with LFM2-Audio-1.5B (145b628)
Set rope_theta (4f5d521)
Fix comment (0e8779a)
Remove rope_theta setting (f5b132a)
Address PR feedback (ba9e597)
rename functions to conformer (cea578b)
remove some redundant ggml_cont (a3ebc93)
Merge branch 'master' into tarek/feat/lfm2-asr-upstream (7865a15)
fix missing tensor (72a41fd)

loci-dev temporarily deployed to PROD__AL_DEMO with GitHub Actions on December 16, 2025 16:43

loci-dev force-pushed the main branch 19 times, most recently from a014a6b to eda9f43 on December 18, 2025 10:09

loci-dev force-pushed the main branch 30 times, most recently from 15838f1 to 006b713 on December 24, 2025 23:08

loci-dev wants to merge 9 commits into main from upstream-PR18106-branch_ggml-org-tarek/feat/lfm2-asr-upstream
