UPSTREAM PR #18106: model : add ASR support for LFM2-Audio-1.5B (conformer) #592

Open
loci-dev wants to merge 9 commits into main from upstream-PR18106-branch_ggml-org-tarek/feat/lfm2-asr-upstream

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18106

Supersedes ggml-org/llama.cpp#17694

  • Rebased to latest master
  • Removed some redundant ggml_cont

@loci-review

loci-review Bot commented Dec 16, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary - PR #592

Overview

PR #592 adds LFM2-Audio-1.5B Conformer architecture support for ASR. The changes introduce 640 additions across 17 files, primarily in the MTMD (multimodal) module. Performance impact is isolated to build.bin.libmtmd.so with no effect on core inference paths.

Key Findings

Impacted Functions in Performance-Critical Areas

The analysis reveals degradation concentrated in STL container operations within the MTMD module:

Most-Impacted Functions (Absolute Change):

  • _ZNSt6vectorI10clip_layerSaIS0_EE17_M_default_appendEm: +849 ns response time (vector growth during CLIP layer initialization)
  • _ZNSt6vectorI10clip_layerSaIS0_EE11_S_max_sizeERKS1_: +113 ns response time (size validation)
  • _ZNSt6vectorIN17clip_model_loader15support_info_opESaIS1_EE5beginEv: +88 ns response time (iterator initialization)
  • _ZNSt15__new_allocatorI10clip_layerE8allocateEmPKv: +87 ns response time (memory allocation)
  • _ZSt10_ConstructI10clip_layerJEEvPT_DpOT0_: +76 ns response time (object construction)
  • _ZN10clip_layerC1Ev: +48 ns response time (constructor initialization)
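
The symbols above are Itanium-ABI mangled names; the last one, for example, is the default constructor `clip_layer::clip_layer()`, and the first is the `_M_default_append` member of `std::vector<clip_layer>`. A minimal sketch for decoding them, assuming a GCC or Clang toolchain that ships `<cxxabi.h>` (the helper name `demangle` is hypothetical and not part of the PR):

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if demangling fails.
std::string demangle(const char * mangled) {
    assert(mangled != nullptr);
    int status = 0;
    char * out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);   // the buffer returned by __cxa_demangle is malloc-allocated
    return result;
}
```

Calling `demangle("_ZN10clip_layerC1Ev")` yields `clip_layer::clip_layer()`. The same result is available from the command line by piping the names through `c++filt`.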

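Root Cause: The clip_layer structure gained 21 new pointer members (21 × 8 = 168 bytes with 8-byte pointers) for Conformer-specific tensors. Each clip_layer is therefore larger to construct, which increases constructor initialization time and raises the cost of every std::vector<clip_layer> operation that allocates or default-constructs elements during model loading.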
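
The size arithmetic can be checked with a small stand-in. This is a sketch only: `tensor`, `layer_before`, and `layer_after` are hypothetical types, not the real `ggml_tensor` or `clip_layer`, and the 21 members are modelled as an array for brevity, whereas the PR presumably adds individually named members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct tensor;  // hypothetical opaque type; only pointers to it are stored

// Hypothetical stand-in for a layer before the change.
struct layer_before {
    tensor * q_w = nullptr;
    tensor * k_w = nullptr;
};

// Same layer with 21 extra pointer members.
struct layer_after {
    tensor * q_w = nullptr;
    tensor * k_w = nullptr;
    tensor * conformer[21] = {};  // all null-initialized
};

// Extra bytes per layer: 21 * sizeof(void *), i.e. 168 with 8-byte pointers.
constexpr std::size_t added_bytes() {
    return sizeof(layer_after) - sizeof(layer_before);
}

// Growing a vector with resize() default-constructs every new element;
// in libstdc++ this path goes through vector::_M_default_append.
bool all_null_after_resize(std::size_t n) {
    std::vector<layer_after> layers;
    layers.resize(n);
    assert(layers.size() == n);
    for (const layer_after & l : layers) {
        for (const tensor * p : l.conformer) {
            if (p != nullptr) {
                return false;
            }
        }
    }
    return true;
}
```

Every element appended this way has all of its pointer members written once, so the per-element construction cost grows with the member count.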

Impact on Inference Performance (Tokens per Second)

Core inference functions remain unaffected. Analysis of critical inference paths shows:

  • llama_decode: No changes detected
  • llama_encode: No changes detected
  • llama_tokenize: No changes detected
  • llama_model_load: No changes detected

Tokens per second impact: 0%

The degradation is confined to MTMD module initialization (the model loading phase), not the inference loop. For reference, a 2 ms slowdown in llama_decode corresponds to roughly a 7% drop in tokens per second. The observed changes are measured in nanoseconds and occur outside the inference path, so their effect on inference throughput is negligible.
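
To put the 2 ms / 7% reference point in scale, here is a back-of-the-envelope sketch. It assumes throughput is inversely proportional to per-token llama_decode latency; the report does not state how the 7% figure was derived, so that model and both function names are assumptions.

```cpp
#include <cassert>

// Baseline latency t (seconds) implied by "adding delta_s costs a fraction
// `loss` of throughput". With throughput proportional to 1/t:
//   loss = delta / (t + delta)  =>  t = delta * (1 - loss) / loss
double implied_baseline_s(double delta_s, double loss) {
    assert(loss > 0.0 && loss < 1.0);
    return delta_s * (1.0 - loss) / loss;
}

// Fractional throughput loss when extra_s is added to a baseline latency.
double throughput_loss(double baseline_s, double extra_s) {
    assert(baseline_s > 0.0 && extra_s >= 0.0);
    return extra_s / (baseline_s + extra_s);
}
```

Under this model, `implied_baseline_s(0.002, 0.07)` gives a baseline of about 26.6 ms per decode. Even if the largest listed delta (+849 ns) were paid on every decode call, which the report says it is not, `throughput_loss` would come out near 0.003%.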

Power Consumption Analysis

Impacted Binary:

  • build.bin.libmtmd.so: +1,262 nanojoules (+0.902%)

Unaffected Binaries:

  • build.bin.libllama.so: 0% change
  • build.bin.libggml.so: 0% change
  • build.bin.llama-run: 0% change
  • All other core inference binaries: 0% change

The power increase stems from cumulative throughput time increases in STL operations during MTMD model initialization. The 0.902% increase represents the energy cost of initializing larger data structures and loading additional tensors (21 per layer) for Conformer models.

Code Changes Context

The performance changes reflect intentional feature additions:

  • New Conformer audio encoder with 7-layer convolution subsampling
  • Extended tensor definitions (43 new MODEL_TENSOR entries)
  • SSM convolution kernel extended to support size 9 (previously 3-4)
  • Batch normalization folding in conversion pipeline
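
Batch normalization folding, as a general technique, rewrites a convolution followed by inference-time batch norm into a single convolution with adjusted weights and bias. The following sketch shows the arithmetic for one output channel. It is not the PR's conversion code: the names `bn_params`, `fold_bn`, `conv_at`, and `batch_norm` are hypothetical, and it is written in C++ purely for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Inference-time batch-norm statistics for one channel (hypothetical layout).
struct bn_params {
    float gamma;
    float beta;
    float mean;
    float var;
    float eps;
};

// Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the
// weights w and bias b of the preceding convolution, in place.
void fold_bn(std::vector<float> & w, float & b, const bn_params & bn) {
    assert(bn.var + bn.eps > 0.0f);
    const float scale = bn.gamma / std::sqrt(bn.var + bn.eps);
    for (float & v : w) {
        v *= scale;
    }
    b = (b - bn.mean) * scale + bn.beta;
}

// One output sample of a 1-D convolution: dot(w, x) + b.
float conv_at(const std::vector<float> & w, float b, const std::vector<float> & x) {
    assert(w.size() == x.size());
    float acc = b;
    for (std::size_t i = 0; i < w.size(); ++i) {
        acc += w[i] * x[i];
    }
    return acc;
}

// Reference (unfolded) batch norm applied to a single value.
float batch_norm(float y, const bn_params & bn) {
    return bn.gamma * (y - bn.mean) / std::sqrt(bn.var + bn.eps) + bn.beta;
}
```

After `fold_bn`, `conv_at` with the folded weights matches `batch_norm(conv_at(...))` with the original ones up to floating-point rounding, so the batch-norm tensors no longer need to be stored or evaluated at inference time.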

The degradation is proportional to the structural complexity added: each Conformer layer requires 21 additional tensor pointers, directly explaining the +48 ns constructor overhead and cascading STL operation costs.

loci-dev force-pushed the main branch 19 times, most recently from a014a6b to eda9f43, on December 18, 2025 10:09
loci-dev force-pushed the main branch 30 times, most recently from 15838f1 to 006b713, on December 24, 2025 23:08