Remove decomposition of insert_slice_async in favour of SPIR-V operations #546

@prathams417

Currently the insert_slice_async op is not actually performed asynchronously; instead, it is decomposed into a plain masked load followed by an insert_slice:
// insert_slice_async %src, %dst, %idx, %mask, %other
// =>
// %tmp = load %src, %mask, %other
// %res = insert_slice %tmp into %dst[%idx]
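
To make the semantics of the decomposition concrete, here is a minimal Python sketch of what the two resulting ops compute. The function names `masked_load`, `insert_slice`, and `insert_slice_async_decomposed` are hypothetical helpers for illustration only; they model tensors as plain lists and are not the compiler's actual lowering code.

```python
def masked_load(src, mask, other):
    """Model of `load %src, %mask, %other`:
    take src[i] where mask[i] is true, otherwise fall back to other[i]."""
    return [s if m else o for s, m, o in zip(src, mask, other)]


def insert_slice(tmp, dst, idx):
    """Model of `insert_slice %tmp into %dst[%idx]`:
    return a copy of dst whose slice at position idx is replaced by tmp."""
    res = [list(row) for row in dst]
    res[idx] = list(tmp)
    return res


def insert_slice_async_decomposed(src, dst, idx, mask, other):
    """The decomposition described above: a synchronous masked load
    followed by an insert_slice. Nothing here is asynchronous."""
    tmp = masked_load(src, mask, other)
    return insert_slice(tmp, dst, idx)


if __name__ == "__main__":
    dst = [[9, 9, 9, 9], [9, 9, 9, 9]]
    res = insert_slice_async_decomposed(
        src=[1, 2, 3, 4],
        dst=dst,
        idx=1,
        mask=[True, False, True, False],
        other=[0, 0, 0, 0],
    )
    print(res)
```

The point of the sketch is that the load completes before the insert starts, so the copy cannot overlap with other work.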

NVIDIA has already replaced this decomposition with the PTX assembly instruction cp.async.cg.shared.global.

We should remove the decomposition in favour of the similar operations that already exist in SPIR-V:
OpGroupAsyncCopy and OpGroupWaitEvents
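
As a rough illustration of the issue-then-wait pattern these two instructions provide, the Python sketch below models an async copy that returns an event and a separate wait that guarantees completion. Everything here is an assumption made for illustration: `Event`, `group_async_copy`, and `group_wait_events` are hypothetical names, and the copy is modelled as deferred until the wait, which is only one behaviour an implementation might exhibit. It is not SPIR-V and not the proposed lowering.

```python
class Event:
    """Hypothetical handle for an in-flight copy."""

    def __init__(self, pending):
        self._pending = pending  # callable that performs the copy
        self.done = False

    def wait(self):
        # Complete the copy if it has not been completed yet.
        if not self.done:
            self._pending()
            self.done = True


def group_async_copy(dst, dst_offset, src, num_elements):
    """Model of issuing an async copy: returns immediately with an event.
    The destination must not be read until the event has been waited on."""

    def do_copy():
        dst[dst_offset:dst_offset + num_elements] = src[:num_elements]

    return Event(do_copy)


def group_wait_events(events):
    """Model of waiting on events: after this returns, all copies are visible."""
    for event in events:
        event.wait()


if __name__ == "__main__":
    shared = [0] * 8
    ev = group_async_copy(shared, 4, [1, 2, 3, 4], 4)
    # Other work could be scheduled here while the copy is in flight.
    group_wait_events([ev])
    print(shared)
```

Unlike the decomposed form, the copy and the wait are separate steps, which is what would let independent work be placed between them.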
