A Survey of Efficient Attention Methods

Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
PDF: https://attention-survey.github.io/files/Attention_Survey.pdf
Paper webpage: https://attention-survey.github.io

This paper provides a comprehensive survey of Efficient Attention Methods, categorizing them into four classes.


Updates

  • [2025/8/19] 🎉 Our survey paper is now publicly available on GitHub! If you find our resources helpful, please cite our paper.

Class 1: Hardware-efficient Attention

💡 Core Idea: Accelerate attention by leveraging hardware characteristics.

📝 Overall Formulations:

Hardware-efficient attention for the prefilling stage can be formulated as:

Where $\Psi(\cdot), \Theta(\cdot)$ are preprocessing functions that accelerate computation, e.g., the quantization functions in SageAttention.
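Concretely, assuming the standard scaled-dot-product skeleton (a sketch for orientation; the paper's exact notation may differ):

$$ O = \mathrm{softmax}\!\left(\frac{\Psi(Q)\,\Theta(K)^{\top}}{\sqrt{d}}\right) V $$

Here $\Psi$ and $\Theta$ might, for instance, quantize $Q$ and $K$ to low-precision formats before the matrix multiplies.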

Hardware-efficient attention for the decoding stage can be formulated as:

Where $\Psi(\cdot), \Theta(\cdot)$ are KV cache preprocessing functions.
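For decoding, with a single query vector $q$ and the cached $K, V$, one hedged sketch (not necessarily the paper's exact notation) is:

$$ o = \mathrm{softmax}\!\left(\frac{q\,\Psi(K)^{\top}}{\sqrt{d}}\right) \Theta(V) $$

where $\Psi$ and $\Theta$ preprocess the cached keys and values, e.g., dequantizing a compressed KV cache on the fly.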


An example is FlashAttention, which tiles $Q, K, V$ into blocks and computes the attention output $O$ progressively. This strategy avoids the I/O of the $N \times N$ matrices $S$ and $P$.
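A minimal NumPy sketch of this tiling idea (illustrative only, not FlashAttention's actual fused kernel; the point is the online-softmax bookkeeping that avoids materializing $S$ and $P$):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference implementation: materializes the full N x N score matrix S
    # and probability matrix P.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # FlashAttention-style loop: iterate over K/V blocks, keeping a running
    # max (m), normalizer (l), and unnormalized output (acc) per query row,
    # so no N x N matrix is ever stored.
    N, d = Q.shape
    O = np.empty((N, V.shape[1]))
    scale = 1.0 / np.sqrt(d)
    for qs in range(0, N, block):
        q = Q[qs:qs + block]
        m = np.full(q.shape[0], -np.inf)       # running row max
        l = np.zeros(q.shape[0])               # running softmax denominator
        acc = np.zeros((q.shape[0], V.shape[1]))
        for ks in range(0, N, block):
            s = q @ K[ks:ks + block].T * scale # one block of scores
            m_new = np.maximum(m, s.max(axis=-1))
            p = np.exp(s - m_new[:, None])
            corr = np.exp(m - m_new)           # rescale stats from prior blocks
            l = l * corr + p.sum(axis=-1)
            acc = acc * corr[:, None] + p @ V[ks:ks + block]
            m = m_new
        O[qs:qs + block] = acc / l[:, None]
    return O
```

The tiled loop produces exactly the same output as the naive version; the benefit on real hardware comes from keeping each block in fast on-chip memory.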


The Table below summarizes various hardware-efficient attention methods. 👇


Class 2: Compact Attention

💡 Core Idea: Compress the attention KV cache via weight sharing or low-rank decomposition, while keeping the computational cost the same as with a full-sized KV cache.
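As one concrete instance of weight sharing, grouped-query attention (GQA) lets several query heads share a single KV head, shrinking the cache by the group factor while the per-head attention math is unchanged. A minimal NumPy sketch (shapes and names are illustrative):

```python
import numpy as np

def gqa_attention(Q, K, V, n_q_heads, n_kv_heads):
    # Q: (n_q_heads, N, d); K, V: (n_kv_heads, N, d).
    # Each group of n_q_heads // n_kv_heads query heads shares one KV head,
    # so the KV cache is n_kv_heads / n_q_heads the size of full MHA.
    group = n_q_heads // n_kv_heads
    outs = []
    for h in range(n_q_heads):
        kv = h // group  # index of the shared KV head for this query head
        S = Q[h] @ K[kv].T / np.sqrt(Q.shape[-1])
        P = np.exp(S - S.max(-1, keepdims=True))
        P /= P.sum(-1, keepdims=True)
        outs.append(P @ V[kv])
    return np.stack(outs)
```

With `n_kv_heads == n_q_heads` this reduces to ordinary multi-head attention; with `n_kv_heads == 1` it becomes multi-query attention.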

📝 Overall Formulations:


The Table below summarizes various compact attention approaches. 👇


Class 3: Sparse Attention

💡 Core Idea: Selectively perform a subset of the attention computation while omitting the rest.
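A simple instance is causal sliding-window attention, where each query attends only to its `window` most recent keys, cutting the cost from $O(N^2)$ to $O(N \cdot w)$. A minimal single-head NumPy sketch (illustrative, not any particular method's implementation):

```python
import numpy as np

def sliding_window_attention(Q, K, V, window=4):
    # Each query i attends only to keys in [i - window + 1, i] (causal),
    # so cost is O(N * window) instead of O(N^2).
    N, d = Q.shape
    O = np.zeros((N, V.shape[1]))
    for i in range(N):
        lo = max(0, i - window + 1)
        s = Q[i] @ K[lo:i + 1].T / np.sqrt(d)  # scores over the local window
        p = np.exp(s - s.max())
        p /= p.sum()
        O[i] = p @ V[lo:i + 1]
    return O
```

When `window >= N` this is exactly full causal attention; other sparse patterns (strided, block-sparse, learned selection) change only which key indices each query visits.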

📝 Overall Formulations:


The Table below summarizes various sparse attention methods. 👇


Class 4: Linear Attention

💡 Core Idea: Redesigning the computational formulation of attention to achieve $\mathcal{O}(N)$ time complexity.

📝 Overall Formulations:


Computational Forms

Linear Attention can be implemented in three forms: parallel, recurrent, and chunkwise.
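The parallel and recurrent forms compute identical outputs; the recurrent one carries a $d \times d_v$ state instead of an $N \times N$ score matrix, and the chunkwise form interpolates between the two. A minimal NumPy sketch, assuming the common feature map $\phi(x) = \mathrm{elu}(x) + 1$ (an illustrative choice, not a fixed definition from the paper):

```python
import numpy as np

def phi(x):
    # Positive feature map elu(x) + 1, one common choice in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attn_parallel(Q, K, V):
    # Causal parallel form with an explicit mask (O(N^2) here, as reference).
    Qf, Kf = phi(Q), phi(K)
    N = Q.shape[0]
    mask = np.tril(np.ones((N, N)))
    A = (Qf @ Kf.T) * mask
    return (A @ V) / (A.sum(-1, keepdims=True) + 1e-9)

def linear_attn_recurrent(Q, K, V):
    # Recurrent form: carry state S = sum_s phi(k_s) v_s^T and z = sum_s phi(k_s);
    # O(N) in sequence length, constant-size state per step.
    Qf, Kf = phi(Q), phi(K)
    d, dv = Q.shape[1], V.shape[1]
    S = np.zeros((d, dv))
    z = np.zeros(d)
    out = np.zeros((Q.shape[0], dv))
    for t in range(Q.shape[0]):
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = (Qf[t] @ S) / (Qf[t] @ z + 1e-9)
    return out
```

The equivalence follows from reassociating $(\phi(Q)\phi(K)^{\top})V$ as $\phi(Q)(\phi(K)^{\top}V)$ under the causal mask.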


Gating Mechanisms

Many linear attention methods incorporate forget gates and select gates.

Based on the presence of these gates, we can classify linear attention methods as follows:

  1. Naive Linear Attention (No Gates)

    📝 The Table below summarizes naive linear attention methods. 👇

  2. Linear Attention with a Forget Gate

    📝 This Table compares methods that use a forget gate. 👇

  3. Linear Attention with Forget and Select Gates

    📝 This Table compares methods that utilize both the forget gate and the select gate. 👇
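The forget-gate pattern these methods share can be sketched as a gated recurrence $S_t = g_t S_{t-1} + k_t v_t^{\top}$; a scalar per-step gate is used below for simplicity, whereas real methods typically use learned, often per-channel, gates:

```python
import numpy as np

def gated_linear_attn(Q, K, V, g):
    # Forget-gated linear attention recurrence (illustrative pattern, not any
    # single method's exact update): S_t = g_t * S_{t-1} + k_t v_t^T,
    # o_t = q_t S_t. g holds per-step scalar gates in (0, 1].
    d, dv = Q.shape[1], V.shape[1]
    S = np.zeros((d, dv))
    out = np.zeros((Q.shape[0], dv))
    for t in range(Q.shape[0]):
        S = g[t] * S + np.outer(K[t], V[t])  # decay old state, add new pair
        out[t] = Q[t] @ S
    return out
```

With `g` fixed at 1 this reduces to the ungated (naive) cumulative state; smaller gates let the state forget distant context. A select gate would additionally modulate what enters the state, e.g., scaling the $k_t v_t^{\top}$ term.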

A Special Case: Test-Time Training (TTT)

A unique approach, Test-Time Training (TTT), treats the hidden states of linear attention as learnable parameters.
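In the simplest (linear-model) variant, this amounts to one gradient-descent step per token on a reconstruction loss over $(k_t, v_t)$; a hedged NumPy sketch, with `lr` and the squared-error loss as illustrative choices:

```python
import numpy as np

def ttt_layer(Q, K, V, lr=0.1):
    # TTT sketch: the "hidden state" is a weight matrix W, updated at each
    # step by one SGD step on the loss ||W k_t - v_t||^2; the output is the
    # freshly updated model applied to q_t.
    d, dv = K.shape[1], V.shape[1]
    W = np.zeros((dv, d))
    out = np.zeros((Q.shape[0], dv))
    for t in range(Q.shape[0]):
        err = W @ K[t] - V[t]             # prediction error on (k_t, v_t)
        W -= lr * np.outer(err, K[t])     # gradient step at test time
        out[t] = W @ Q[t]
    return out
```

Seen this way, the ungated linear attention recurrence is the special case where the state update is a raw accumulation rather than a gradient step.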


Citation

If you find our work helpful, please cite our paper:

@article{zhangsurvey,
  title={Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention},
  author={Zhang, Jintao and Su, Rundong and Liu, Chunyu and Wei, Jia and Wang, Ziteng and Zhang, Pengle and Wang, Haoxu and Jiang, Huiqiang and Huang, Haofeng and Xiang, Chendong and others}
}
