Commit d59a51e

jayfoad authored and pradt2 committed
[AMDGPU] GFX12 VMEM loads can write VGPR results out of order (llvm#105549)
Fix SIInsertWaitcnts to account for this by adding extra waits to avoid WAW dependencies.

(cherry picked from commit 5506831)
1 parent 16112bd commit d59a51e
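
The write-after-write (WAW) hazard named in the commit message can be illustrated with a toy model. This is a minimal sketch, not the SIInsertWaitcnts implementation: `Load`, `Wait`, `insert_waw_waits`, and the `writes_in_order` flag are hypothetical names, and it assumes a single counter that retires in-flight loads oldest first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    """Toy VMEM load that writes one destination register (hypothetical)."""
    dst: str


@dataclass(frozen=True)
class Wait:
    """Toy wait: block until at most `max_outstanding` loads are in flight."""
    max_outstanding: int


def insert_waw_waits(loads, writes_in_order):
    """Return the load stream with waits added in front of WAW hazards.

    `pending` holds the destinations of in-flight loads, oldest first.
    Assumption: the counter retires loads oldest first, so waiting until at
    most `len(pending) - 1 - idx` loads remain guarantees entry `idx` is done.
    """
    out = []
    pending = []
    for load in loads:
        if not writes_in_order and load.dst in pending:
            # Youngest in-flight load that writes the same register.
            idx = len(pending) - 1 - pending[::-1].index(load.dst)
            out.append(Wait(len(pending) - 1 - idx))
            # Everything up to and including `idx` has completed after the wait.
            pending = pending[idx + 1:]
        out.append(load)
        pending.append(load.dst)
    return out


if __name__ == "__main__":
    stream = [Load("v8"), Load("v9"), Load("v8")]
    for item in insert_waw_waits(stream, writes_in_order=False):
        print(item)
```

When `writes_in_order` is true the stream is returned unchanged, because an older load's result cannot land after a younger one's. When it is false, a wait is placed before any load whose destination still has a load in flight, so the older result cannot overwrite the newer one.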

1 file changed: +2 -0 lines changed

llvm/test/CodeGen/AMDGPU/load-global-i32.ll

Lines changed: 2 additions & 0 deletions
@@ -3154,8 +3154,10 @@ define amdgpu_kernel void @global_sextload_v32i32_to_v32i64(ptr addrspace(1) %ou
 ; SI-NOHSA-NEXT: buffer_store_dwordx4 v[36:39], off, s[0:3], 0 offset:240
 ; SI-NOHSA-NEXT: buffer_store_dwordx4 v[32:35], off, s[0:3], 0 offset:192
 ; SI-NOHSA-NEXT: buffer_load_dword v8, off, s[12:15], 0 ; 4-byte Folded Reload
+; SI-NOHSA-NEXT: s_waitcnt vmcnt(0)
 ; SI-NOHSA-NEXT: buffer_load_dword v9, off, s[12:15], 0 offset:4 ; 4-byte Folded Reload
 ; SI-NOHSA-NEXT: buffer_load_dword v10, off, s[12:15], 0 offset:8 ; 4-byte Folded Reload
+; SI-NOHSA-NEXT: s_waitcnt vmcnt(0)
 ; SI-NOHSA-NEXT: buffer_load_dword v11, off, s[12:15], 0 offset:12 ; 4-byte Folded Reload
 ; SI-NOHSA-NEXT: s_waitcnt vmcnt(0)
 ; SI-NOHSA-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:208
