
[mlir][hoisting] Currently linalg hoisting can not optimize memref.assume_alignment #144825

Open
@xiangzh1

Description


Last month (2025-05-18), Kleiman updated AssumeAlignmentOp so that it now has an AnyMemRef result.
This makes the following change:

%2 = hal.interface.binding.subspan layout ... : memref<4096x4096xf16, #hal.descriptor_type<storage_buffer>>
 memref.assume_alignment %2, 64 : memref<4096x4096xf16, #hal.descriptor_type<storage_buffer>>
use %2

change to

%2 = hal.interface.binding.subspan layout ... : memref<4096x4096xf16, #hal.descriptor_type<storage_buffer>>
%assume_align = memref.assume_alignment %2, 64 : memref<4096x4096xf16, #hal.descriptor_type<storage_buffer>>
use %assume_align

Problem:
This affects the linalg hoisting optimization, because memref.assume_alignment now inherits the
ViewLikeOpInterface, which linalg hoisting excludes.

For example, in the following MLIR, the
"%1 = vector.transfer_read %assume_align_0[%c0, %c0] ..." and
"vector.transfer_write %3, %assume_align_0[%c0, %c0]"
read from and write to the same location, so we can hoist them out of the loop:

%m0 = hal.interface.binding.subspan layout ...: memref<4096x4096xf16>
%m1 = hal.interface.binding.subspan layout ...: memref<4096x4096xf16>
%assume_align_0 = memref.assume_alignment %m0, 64 : memref<4096x4096xf16>
%assume_align_1 = memref.assume_alignment %m1, 64 : memref<4096x4096xf16>
scf.for %arg0 = %c256 to %c4096 step %c256 {
  %1 = vector.transfer_read %assume_align_0[%c0, %c0], %cst_0 {in_bounds = [true, true]} : memref<4096x4096xf16>, vector<16x16xf16>
  %2 = vector.transfer_read %m1[%arg0, %arg0], %cst_0 {in_bounds = [true, true]} : memref<4096x4096xf16>, vector<16x16xf16>
  %3 = vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %2, %2, %1 : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>
  vector.transfer_write %3, %assume_align_0[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<4096x4096xf16>
}

But because the transfer_read/transfer_write read from/write to the result of an assume_alignment operation, linalg hoisting does not perform this optimization.
(I do not fully understand why linalg hoisting does this; I am a beginner in MLIR.)
However, assume_alignment only marks the memref's alignment, so linalg hoisting should check its memref operand, not the op itself.
We therefore expect the MLIR above to be optimized to:

%m0 = hal.interface.binding.subspan layout ...: memref<4096x4096xf16>
%m1 = hal.interface.binding.subspan layout ...: memref<4096x4096xf16>
%assume_align_0 = memref.assume_alignment %m0, 64 : memref<4096x4096xf16>
%assume_align_1 = memref.assume_alignment %m1, 64 : memref<4096x4096xf16>
%0 = vector.transfer_read %assume_align_0[%c0, %c0], %cst {in_bounds = [true, true]} : memref<4096x4096xf16>, vector<16x16xf16> // out of loop
%1 = scf.for %arg0 = %c256 to %c4096 step %c256 iter_args(%arg1 = %0) -> (vector<16x16xf16>) {
  %2 = vector.transfer_read %assume_align_1[%arg0, %arg0], %cst {in_bounds = [true, true]} : memref<4096x4096xf16>, vector<16x16xf16>
  %3 = vector.contract {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %2, %2, %arg1 : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>
  scf.yield %3 : vector<16x16xf16>
}
vector.transfer_write %1, %assume_align_0[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<4096x4096xf16> // out of loop
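One way the pass could handle this is to resolve the underlying memref before comparing transfer sources. This is only a sketch (the helper name is mine, and it assumes the memref::AssumeAlignmentOp accessors from its ODS definition), not the actual Hoisting.cpp code:

```cpp
// Sketch: memref.assume_alignment only annotates alignment and does not
// create a new view of memory, so when deciding whether two transfers
// touch the same memref, look through it to the original operand.
static Value lookThroughAssumeAlignment(Value v) {
  while (auto assume = v.getDefiningOp<memref::AssumeAlignmentOp>())
    v = assume.getMemref();
  return v;
}
```

The hoisting logic could then compare `lookThroughAssumeAlignment(source)` values instead of bailing out on any op implementing ViewLikeOpInterface.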

For a detailed example, please refer to example.
(I don't know how to write hal.interface.binding for mlir-opt, so in the example I use memref.alloc() instead.)
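A self-contained variant of the input IR above, along those lines, might look like the following (a sketch: memref.alloc() stands in for the hal bindings, and the constants are assumed):

```mlir
func.func @hoist_over_assume_alignment() {
  %c0 = arith.constant 0 : index
  %c256 = arith.constant 256 : index
  %c4096 = arith.constant 4096 : index
  %cst = arith.constant 0.000000e+00 : f16
  %m0 = memref.alloc() : memref<4096x4096xf16>
  %m1 = memref.alloc() : memref<4096x4096xf16>
  %assume_align_0 = memref.assume_alignment %m0, 64 : memref<4096x4096xf16>
  %assume_align_1 = memref.assume_alignment %m1, 64 : memref<4096x4096xf16>
  scf.for %arg0 = %c256 to %c4096 step %c256 {
    // Loop-invariant read/write pair on %assume_align_0 that should hoist.
    %1 = vector.transfer_read %assume_align_0[%c0, %c0], %cst {in_bounds = [true, true]} : memref<4096x4096xf16>, vector<16x16xf16>
    %2 = vector.transfer_read %assume_align_1[%arg0, %arg0], %cst {in_bounds = [true, true]} : memref<4096x4096xf16>, vector<16x16xf16>
    %3 = vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %2, %2, %1 : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>
    vector.transfer_write %3, %assume_align_0[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<4096x4096xf16>
  }
  return
}
```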
