
Conversation

yaoyaoding (Member)

This PR adds an example of a fused kernel for the decoding stage of flash linear attention:

Sigmoid Gating Delta Rule Update Benchmark Results:

| name   | (B, T, H, K)    | (HV, V)  | latency (ms) |
|--------|-----------------|----------|--------------|
| torch  | (1, 1, 4, 128)  | (8, 128) | 1.164        |
| triton | (1, 1, 4, 128)  | (8, 128) | 0.010        |
| tilus  | (1, 1, 4, 128)  | (8, 128) | 0.006        |
| torch  | (1, 2, 4, 128)  | (8, 128) | 2.270        |
| triton | (1, 2, 4, 128)  | (8, 128) | 0.011        |
| tilus  | (1, 2, 4, 128)  | (8, 128) | 0.007        |
| torch  | (1, 4, 4, 128)  | (8, 128) | 4.475        |
| triton | (1, 4, 4, 128)  | (8, 128) | 0.013        |
| tilus  | (1, 4, 4, 128)  | (8, 128) | 0.009        |
| torch  | (1, 8, 4, 128)  | (8, 128) | 8.848        |
| triton | (1, 8, 4, 128)  | (8, 128) | 0.018        |
| tilus  | (1, 8, 4, 128)  | (8, 128) | 0.013        |
| torch  | (1, 16, 4, 128) | (8, 128) | 17.589       |
| triton | (1, 16, 4, 128) | (8, 128) | 0.029        |
| tilus  | (1, 16, 4, 128) | (8, 128) | 0.022        |
| torch  | (1, 32, 4, 128) | (8, 128) | 35.413       |
| triton | (1, 32, 4, 128) | (8, 128) | 0.051        |
| tilus  | (1, 32, 4, 128) | (8, 128) | 0.044        |

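For readers who want a rough sense of what the benchmarked kernel computes, below is a minimal, un-optimized PyTorch sketch of a gated delta-rule recurrence for the decoding path. Everything here is an assumption for illustration only: the function name `gated_delta_rule_decode`, the tensor layout, and the exact gating parameterization (a log-space decay `g` applied via `exp`, plus a per-step write strength `beta`) are not taken from the PR; the authoritative version is the fused example added by this PR.

```python
import torch


def gated_delta_rule_decode(q, k, v, beta, g, state=None):
    """Naive reference for a gated delta-rule recurrence (decoding path).

    Assumed shapes, mirroring the benchmark's (B, T, H, K) / (HV, V) layout:
        q, k:  [B, T, H, K]    query / key heads
        v:     [B, T, HV, V]   value heads (HV is a multiple of H)
        beta:  [B, T, HV]      per-step write strength in (0, 1)
        g:     [B, T, HV]      per-step log-space decay gate (<= 0)
        state: [B, HV, K, V]   recurrent memory, created lazily if None
    Returns (o, state) with o: [B, T, HV, V].
    """
    B, T, H, K = q.shape
    HV, V = v.shape[2], v.shape[3]
    reps = HV // H  # each q/k head is shared by several value heads
    if state is None:
        state = q.new_zeros(B, HV, K, V)
    o = q.new_empty(B, T, HV, V)
    for t in range(T):
        q_t = q[:, t].repeat_interleave(reps, dim=1)      # [B, HV, K]
        k_t = k[:, t].repeat_interleave(reps, dim=1)      # [B, HV, K]
        v_t = v[:, t]                                      # [B, HV, V]
        # decay the memory, then write the gated residual (delta rule)
        state = state * g[:, t].exp()[..., None, None]
        pred = torch.einsum('bhk,bhkv->bhv', k_t, state)   # memory's prediction for k_t
        delta = (v_t - pred) * beta[:, t, :, None]
        state = state + torch.einsum('bhk,bhv->bhkv', k_t, delta)
        o[:, t] = torch.einsum('bhk,bhkv->bhv', q_t, state)
    return o, state


if __name__ == "__main__":
    # Shapes from the first benchmark row: (B, T, H, K) = (1, 1, 4, 128), (HV, V) = (8, 128)
    B, T, H, K, HV, V = 1, 1, 4, 128, 8, 128
    q, k = torch.randn(B, T, H, K), torch.randn(B, T, H, K)
    v = torch.randn(B, T, HV, V)
    beta = torch.rand(B, T, HV)
    g = torch.nn.functional.logsigmoid(torch.randn(B, T, HV))
    o, state = gated_delta_rule_decode(q, k, v, beta, g)
    print(o.shape, state.shape)
```

The torch baseline in the table corresponds to this kind of step-by-step recurrence, which is why a fused kernel (triton or tilus) that keeps the state on-chip is orders of magnitude faster at these decode shapes.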
Signed-off-by: Yaoyao Ding <[email protected]>
@yaoyaoding changed the title from "[Example] Add the fused kernel for decoding stage of flash linear attention" to "[Example] Add the fused kernel for decoding of flash linear attention" on Sep 1, 2025
@yaoyaoding merged commit 28dd71f into main on Sep 1, 2025 (8 of 9 checks passed)
@yaoyaoding deleted the yaoyao/fla branch on September 3, 2025 at 02:34