yaoyaoding

In Tilus Script, you can partition the threads of a thread block into smaller thread groups and restrict instructions to a specific group. Thread groups nest: inside a group, the remaining threads can be split again into sub-groups, which gives fine-grained control over which threads execute each instruction.

Example:

import tilus


class MyScript(tilus.Script):
    def __init__(self):
        super().__init__()

    def __call__(self, ...):
        # Specify 4 warps = 128 threads total
        self.attrs.warps = 4

        # First level: split into 4 groups of 32 threads each
        with self.thread_group(0, num_groups=4):
            # Only first group (threads 0-31) enters here

            # Second level: further split into 2 sub-groups of 16 threads each
            with self.thread_group(0, num_groups=2):
                # Only threads 0-15 execute this
                fine_grained_work()

            with self.thread_group(1, num_groups=2):
                # Only threads 16-31 execute this
                different_fine_grained_work()

            # Back to first level - all threads 0-31 execute this
            self.sync()
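
The thread ranges in the comments above follow from simple index arithmetic, assuming each split divides the enclosing group into equal, contiguous chunks (which is what the example's comments suggest). The pure-Python sketch below models only that arithmetic. `active_threads` is a hypothetical helper written for illustration, not part of Tilus, and it does not reflect how Tilus implements thread groups internally.

```python
def active_threads(total_threads, selections):
    """Return the range of thread indices still active after nested splits.

    `selections` is a list of (group_index, num_groups) pairs, outermost first.
    Assumption: every split cuts the current range into equal contiguous chunks.
    """
    start, size = 0, total_threads
    for group_index, num_groups in selections:
        if size % num_groups != 0:
            raise ValueError("group size must be divisible by num_groups")
        if not 0 <= group_index < num_groups:
            raise ValueError("group_index out of range")
        size //= num_groups              # threads per group at this level
        start += group_index * size      # offset of the selected group
    return range(start, start + size)


# 4 warps = 128 threads, mirroring the example above
outer = active_threads(128, [(0, 4)])            # first level, group 0
inner0 = active_threads(128, [(0, 4), (0, 2)])   # second level, sub-group 0
inner1 = active_threads(128, [(0, 4), (1, 2)])   # second level, sub-group 1
print(outer, inner0, inner1)
```

Under this model, the outer selection covers threads 0-31 and the two inner selections cover threads 0-15 and 16-31, matching the comments in the example.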

Signed-off-by: Yaoyao Ding <[email protected]>
@yaoyaoding yaoyaoding merged commit 6b82fe4 into main Sep 13, 2025
8 checks passed
@yaoyaoding yaoyaoding deleted the add-thread-group branch September 13, 2025 05:29