Conversation
Pull Request Overview
This PR adds support for grouped GEMM (General Matrix Multiplication) operations to the TraceLens performance modeling framework. Grouped GEMM applies group-specific weight matrices to partitions of input tensors, which is useful for certain neural network architectures.
- Implements a new `GroupedGemm` performance model class with forward/backward FLOP and byte calculations
- Adds a custom implementation `custom_grouped_gemm` in the Megatron extension for parsing grouped GEMM events
- Updates the extension mappings to register the new grouped GEMM operation
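For context, the grouped GEMM operation described above can be sketched as a reference implementation (the helper name and calling convention below are hypothetical illustrations, not TraceLens APIs; a real kernel would of course fuse this into a single efficient launch):

```python
import numpy as np

def grouped_gemm_reference(X, weights, group_sizes):
    """Reference grouped GEMM: multiply each row-partition of X by its
    group's weight matrix and concatenate the results along dim 0.

    X           : (M, K) input, rows partitioned by group_sizes
    weights     : list of G matrices, each (K, N)
    group_sizes : list of G ints summing to M
    """
    outputs = []
    start = 0
    for W, m in zip(weights, group_sizes):
        # (m, K) @ (K, N) -> (m, N) for this group's row slice
        outputs.append(X[start:start + m] @ W)
        start += m
    return np.concatenate(outputs, axis=0)  # (M, N)
```

Each group sees only its own weight matrix, which is what distinguishes grouped GEMM from a single batched matmul with shared weights.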
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| TraceLens/PerfModel/perf_model.py | Implements the core GroupedGemm performance model class with comprehensive documentation and computation methods |
| examples/example_megatron_extension.py | Adds custom_grouped_gemm implementation and registers it in the performance model mappings |
```python
if bpe_in == 1 or bpe_in == 2:
    bpe_out = 2
else:
    raise ValueError(f"Expected bpe_in to be 1 or 2, got {bpe_in}")
```
The error message should be more informative about what data types are supported. Consider mentioning the supported data types or referring to the name2bpe function documentation.
```diff
- raise ValueError(f"Expected bpe_in to be 1 or 2, got {bpe_in}")
+ raise ValueError(
+     f"Unsupported bpe_in value: {bpe_in}. Supported bpe_in values are "
+     "1 (e.g. fp8/int8) and 2 (e.g. fp16/bf16). "
+     "Please ensure that the input types are supported. "
+     "See the name2bpe function documentation for details."
+ )
```
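To make the dtype-to-bytes relationship concrete, here is a hypothetical sketch of what a `name2bpe`-style mapping and the output-width rule could look like (the table contents and helper name are assumptions for illustration; the actual TraceLens function may differ):

```python
# Hypothetical dtype-name -> bytes-per-element table (illustrative only).
NAME2BPE = {
    "float8_e4m3": 1,
    "int8": 1,
    "float16": 2,
    "bfloat16": 2,
    "float32": 4,
}

def bpe_out_for(bpe_in: int) -> int:
    """Output width rule from the snippet under review: 1- and 2-byte
    inputs produce a 2-byte output; anything else is unsupported."""
    if bpe_in in (1, 2):
        return 2
    raise ValueError(
        f"Unsupported bpe_in value: {bpe_in}. Supported bpe_in values are "
        "1 (e.g. fp8/int8) and 2 (e.g. fp16/bf16)."
    )
```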
```python
Y : tensor, shape (M, N)
    The concatenated result of the groupwise multiplications.

Computation is functionally equivalent to (implementation detail will ofcourse be efficient):
```
There's a spelling error: 'ofcourse' should be 'of course'.
```diff
- Computation is functionally equivalent to (implementation detail will ofcourse be efficient):
+ Computation is functionally equivalent to (implementation detail will of course be efficient):
```
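Since the PR's main contribution is the forward/backward FLOP and byte calculations, here is a minimal sketch of how the forward-pass formulas for a grouped GEMM might look (function names and the exact traffic model are assumptions for illustration; the actual `GroupedGemm` class may count differently):

```python
def grouped_gemm_fwd_flops(group_sizes, K, N):
    """Forward FLOPs: each group's (m_g, K) @ (K, N) matmul costs
    2 * m_g * K * N FLOPs, so the total is 2 * M * K * N with
    M = sum(group_sizes)."""
    return sum(2 * m * K * N for m in group_sizes)

def grouped_gemm_fwd_bytes(group_sizes, K, N, bpe_in, bpe_out):
    """Forward bytes moved (ideal, no re-reads): read X (M*K elements)
    and all G weight matrices (G*K*N elements) at bpe_in bytes each,
    write Y (M*N elements) at bpe_out bytes each."""
    M, G = sum(group_sizes), len(group_sizes)
    return (M * K + G * K * N) * bpe_in + M * N * bpe_out
```

An arithmetic-intensity estimate (FLOPs / bytes) built from these two quantities is what lets a roofline-style perf model decide whether the kernel is compute- or bandwidth-bound.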