
yaoyaoding
Member

This PR adds the following TMA-related instructions:

  1. cp_async_tensor_global_to_shared
  2. cp_async_tensor_shared_to_global
  3. cp_async_tensor_commit_group
  4. cp_async_tensor_wait_group
  5. fence_proxy_copy_async
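The commit/wait pair follows the bulk async-group pattern of the PTX ISA. Copies issued since the last commit are bundled into one group, and a wait with argument N blocks until at most N of the most recent groups are still pending. As we understand PTX, this group mechanism tracks shared-to-global bulk tensor copies, while global-to-shared copies signal completion through an mbarrier instead. The following is a pure-Python model of that bookkeeping only. The class and method names are hypothetical and are not the tilus API.

```python
from collections import deque


class BulkAsyncGroupModel:
    """Toy model of commit/wait-group bookkeeping (hypothetical, not tilus code)."""

    def __init__(self):
        self.uncommitted = []   # copies issued since the last commit
        self.pending = deque()  # committed groups, oldest first
        self.completed = []     # copies known to be finished

    def issue_copy(self, name):
        # Stand-in for an async shared-to-global tensor copy: enqueue, never block.
        self.uncommitted.append(name)

    def commit_group(self):
        # Bundle every copy issued so far into one group (possibly empty).
        self.pending.append(list(self.uncommitted))
        self.uncommitted.clear()

    def wait_group(self, n):
        # Block until at most n of the most recent groups are pending;
        # the oldest groups are the ones forced to complete.
        while len(self.pending) > n:
            self.completed.extend(self.pending.popleft())


m = BulkAsyncGroupModel()
m.issue_copy("tile0")
m.commit_group()
m.issue_copy("tile1")
m.commit_group()
m.wait_group(1)  # tile0 is guaranteed done; tile1 may still be in flight
```

Waiting with a nonzero N is what allows double buffering: the newest group can stay in flight while the buffer from an older group is reused.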

Internally,

  1. add a linear decomposition utility to extract the strides of global tensor access
  2. add InvariantTrackingContext in codegen to track the grid- and block-invariant variables
  3. add GlobalViewContext in codegen to track all global tensors created via GlobalView with a kernel parameter as the pointer
  4. add the low-level hidet primitives for TMA-related PTX instructions
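To make the linear decomposition utility concrete: a global-tensor access such as `A[i, j]` lowers to an address expression like `i * 64 + j + 128`, and the per-axis coefficients of that affine form are the strides a TMA descriptor needs. Below is a minimal sketch of such a decomposition over a toy expression tree. The node classes and `decompose` are hypothetical stand-ins, not the contents of `lineardec.py`; this version simply rejects non-linear terms and unknown variables, where a real pass would likely keep symbolic offsets.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class Mul:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, Const, Add, Mul]


def decompose(expr: Expr, axes) -> Tuple[Dict[str, int], int]:
    """Rewrite expr as sum(coeffs[a] * a for a in axes) + offset (hypothetical helper)."""
    if isinstance(expr, Const):
        return {a: 0 for a in axes}, expr.value
    if isinstance(expr, Var):
        if expr.name not in axes:
            raise ValueError(f"unknown variable {expr.name}")
        coeffs = {a: 0 for a in axes}
        coeffs[expr.name] = 1
        return coeffs, 0
    lc, lo = decompose(expr.lhs, axes)
    rc, ro = decompose(expr.rhs, axes)
    if isinstance(expr, Add):
        return {a: lc[a] + rc[a] for a in axes}, lo + ro
    if isinstance(expr, Mul):
        # Linear only if at least one factor does not depend on any axis.
        if all(c == 0 for c in lc.values()):
            return {a: lo * rc[a] for a in axes}, lo * ro
        if all(c == 0 for c in rc.values()):
            return {a: ro * lc[a] for a in axes}, lo * ro
        raise ValueError("product of two index-dependent terms is not linear")
    raise TypeError(f"unsupported node {expr!r}")


# Row-major A[i, j] with row stride 64 and a base offset of 128 elements.
addr = Add(Add(Mul(Var("i"), Const(64)), Var("j")), Const(128))
strides, offset = decompose(addr, ["i", "j"])  # ({'i': 64, 'j': 1}, 128)
```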

yaoyaoding requested a review from Copilot on September 17, 2025 06:05

Copilot AI left a comment


Pull Request Overview

This PR adds TMA (Tensor Memory Accelerator) related instructions to the framework, enabling asynchronous tensor copy operations between global and shared memory. The PR introduces 5 new TMA instructions, utilities for linear expression decomposition, and tracking contexts for grid/block invariant variables and global tensor views.

  • Add 5 new TMA instructions: cp_async_tensor_global_to_shared, cp_async_tensor_shared_to_global, cp_async_tensor_commit_group, cp_async_tensor_wait_group, fence_proxy_copy_async
  • Introduce linear decomposition utility for extracting tensor access strides
  • Add InvariantTrackingContext and GlobalViewContext for TMA code generation
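One way to picture the invariant tracking: kernel parameters are the same for every thread in the grid, `blockIdx` is the same within a block, and `threadIdx` varies per thread, so a derived variable is only as invariant as the least invariant value it reads. A plausible reason this matters for TMA (our assumption, not stated in the PR) is that tensor-map arguments must be uniform across the threads that use them. The sketch below propagates these levels through let-bindings given in program order. `Variance`, `classify`, and the seed table are hypothetical names, and only the `.x` index variables are seeded for brevity.

```python
from enum import IntEnum


class Variance(IntEnum):
    # Ordered so that max() picks the least invariant level.
    GRID_INVARIANT = 0   # same value for every thread in the grid
    BLOCK_INVARIANT = 1  # same value within one thread block
    THREAD_VARYING = 2   # may differ between threads of a block


# Hypothetical seed levels for built-in index variables (only .x shown).
BUILTINS = {
    "blockIdx.x": Variance.BLOCK_INVARIANT,
    "threadIdx.x": Variance.THREAD_VARYING,
}


def classify(params, bindings):
    """Classify variables from (name, vars_read) pairs in program order.

    Kernel parameters are grid-invariant; a binding inherits the least
    invariant level among the variables its defining expression reads.
    """
    level = dict(BUILTINS)
    for p in params:
        level[p] = Variance.GRID_INVARIANT
    for name, uses in bindings:
        level[name] = max((level[u] for u in uses), default=Variance.GRID_INVARIANT)
    return level


levels = classify(
    ["ptr", "n"],
    [("stride", ["n"]), ("tile_off", ["blockIdx.x", "stride"]), ("elem", ["tile_off", "threadIdx.x"])],
)
```

Here `stride` stays grid-invariant, `tile_off` becomes block-invariant, and `elem` is thread-varying.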

Reviewed Changes

Copilot reviewed 49 out of 51 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/lang/test_copy_async_tensor.py | Test case for TMA copy operations |
| python/tilus/ir/utils/lineardec.py | Linear expression decomposition utility |
| python/tilus/ir/instructions/cuda/cp_async_tensor.py | TMA instruction definitions |
| python/tilus/backends/emitters/cuda/cp_async_tensor.py | TMA instruction code generation |
| python/tilus/extensions/hidet/ir/primitives/cuda/tensor_map.py | TensorMap primitive definitions |
| python/tilus/extensions/hidet/ir/primitives/cuda/copy_async_tensor.py | TMA primitive functions |
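The extracted strides ultimately feed a tensor map. As we understand the CUDA driver's `cuTensorMapEncodeTiled`, it takes dimensions innermost-first, requires a contiguous innermost dimension, receives only rank - 1 global strides in bytes (each a multiple of 16), supports ranks 1 to 5, and limits each box dimension to 1..256. The hypothetical helper below converts an outermost-first view with element strides into arguments shaped that way and checks only that subset of constraints. It is not the code in `tensor_map.py`, and the real driver call validates more (for example, address alignment).

```python
def tensor_map_args(shape, strides, box, elem_bytes):
    """Hypothetical helper: outermost-first view (strides in elements) ->
    innermost-first (global_dim, global_strides_bytes, box_dim)."""
    rank = len(shape)
    if not 1 <= rank <= 5:
        raise ValueError("TMA tensor maps support rank 1 to 5")
    if not len(strides) == len(box) == rank:
        raise ValueError("shape, strides and box must have the same rank")
    if strides[-1] != 1:
        raise ValueError("innermost dimension must be contiguous")
    global_dim = list(reversed(shape))
    # The innermost stride is implied by the element size, so only the
    # remaining rank - 1 strides are passed, converted to bytes.
    global_strides = [s * elem_bytes for s in reversed(strides[:-1])]
    for s in global_strides:
        if s % 16 != 0:
            raise ValueError(f"global stride of {s} bytes is not a multiple of 16")
    box_dim = list(reversed(box))
    if any(not 1 <= b <= 256 for b in box_dim):
        raise ValueError("box dimensions must be in [1, 256]")
    return global_dim, global_strides, box_dim


# A 128 x 64 fp16 matrix (row stride 64 elements) copied in 32 x 32 tiles.
dims, byte_strides, box_dims = tensor_map_args((128, 64), (64, 1), (32, 32), 2)
```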
Comments suppressed due to low confidence (1)

python/tilus/extensions/hidet/transforms/lower_subbyte_type.py:1

  • Unreachable code detected. Line 229 will never execute because line 228 always raises an exception.
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.


yaoyaoding and others added 15 commits September 17, 2025 06:18
Signed-off-by: Yaoyao Ding <[email protected]>