Conversation
Pull Request Overview
This PR adds support for grouped GEMM (General Matrix Multiplication) operations to the TraceLens performance modeling framework. Grouped GEMM applies group-specific weight matrices to partitions of input tensors, which is useful for certain neural network architectures.
- Implements a new `GroupedGemm` performance model class with forward/backward FLOP and byte calculations
- Adds a custom implementation `custom_grouped_gemm` in the Megatron extension for parsing grouped GEMM events
- Updates the extension mappings to register the new grouped GEMM operation
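For context, the grouped GEMM operation described above can be sketched as a reference implementation (the helper name and calling convention below are hypothetical illustrations, not TraceLens APIs; a real kernel would of course fuse this into a single efficient launch):

```python
import numpy as np

def grouped_gemm_reference(X, weights, group_sizes):
    """Reference grouped GEMM: multiply each row-partition of X by its
    group's weight matrix and concatenate the results along dim 0.

    X           : (M, K) input, rows partitioned by group_sizes
    weights     : list of G matrices, each (K, N)
    group_sizes : list of G ints summing to M
    """
    outputs = []
    start = 0
    for W, m in zip(weights, group_sizes):
        # (m, K) @ (K, N) -> (m, N) for this group's row slice
        outputs.append(X[start:start + m] @ W)
        start += m
    return np.concatenate(outputs, axis=0)  # (M, N)
```

Each group sees only its own weight matrix, which is what distinguishes grouped GEMM from a single batched matmul with shared weights.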
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| TraceLens/PerfModel/perf_model.py | Implements the core GroupedGemm performance model class with comprehensive documentation and computation methods |
| examples/example_megatron_extension.py | Adds custom_grouped_gemm implementation and registers it in the performance model mappings |
```python
if bpe_in == 1 or bpe_in == 2:
    bpe_out = 2
else:
    raise ValueError(f"Expected bpe_in to be 1 or 2, got {bpe_in}")
```
The error message should be more informative about what data types are supported. Consider mentioning the supported data types or referring to the name2bpe function documentation.
```diff
- raise ValueError(f"Expected bpe_in to be 1 or 2, got {bpe_in}")
+ raise ValueError(
+     f"Unsupported bpe_in value: {bpe_in}. Supported bpe_in values are "
+     "1 (e.g. fp8/int8) and 2 (e.g. fp16/bf16). "
+     "Please ensure that the input types are supported. "
+     "See the name2bpe function documentation for details."
+ )
```
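To make the dtype-to-bytes relationship concrete, here is a hypothetical sketch of what a `name2bpe`-style mapping and the output-width rule could look like (the table contents and helper name are assumptions for illustration; the actual TraceLens function may differ):

```python
# Hypothetical dtype-name -> bytes-per-element table (illustrative only).
NAME2BPE = {
    "float8_e4m3": 1,
    "int8": 1,
    "float16": 2,
    "bfloat16": 2,
    "float32": 4,
}

def bpe_out_for(bpe_in: int) -> int:
    """Output width rule from the snippet under review: 1- and 2-byte
    inputs produce a 2-byte output; anything else is unsupported."""
    if bpe_in in (1, 2):
        return 2
    raise ValueError(
        f"Unsupported bpe_in value: {bpe_in}. Supported bpe_in values are "
        "1 (e.g. fp8/int8) and 2 (e.g. fp16/bf16)."
    )
```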
```python
Y : tensor, shape (M, N)
    The concatenated result of the groupwise multiplications.

Computation is functionally equivalent to (implementation detail will ofcourse be efficient):
```
There's a spelling error: 'ofcourse' should be 'of course'.
```diff
- Computation is functionally equivalent to (implementation detail will ofcourse be efficient):
+ Computation is functionally equivalent to (implementation detail will of course be efficient):
```
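Since the PR's main contribution is the forward/backward FLOP and byte calculations, here is a minimal sketch of how the forward-pass formulas for a grouped GEMM might look (function names and the exact traffic model are assumptions for illustration; the actual `GroupedGemm` class may count differently):

```python
def grouped_gemm_fwd_flops(group_sizes, K, N):
    """Forward FLOPs: each group's (m_g, K) @ (K, N) matmul costs
    2 * m_g * K * N FLOPs, so the total is 2 * M * K * N with
    M = sum(group_sizes)."""
    return sum(2 * m * K * N for m in group_sizes)

def grouped_gemm_fwd_bytes(group_sizes, K, N, bpe_in, bpe_out):
    """Forward bytes moved (ideal, no re-reads): read X (M*K elements)
    and all G weight matrices (G*K*N elements) at bpe_in bytes each,
    write Y (M*N elements) at bpe_out bytes each."""
    M, G = sum(group_sizes), len(group_sizes)
    return (M * K + G * K * N) * bpe_in + M * N * bpe_out
```

An arithmetic-intensity estimate (FLOPs / bytes) built from these two quantities is what lets a roofline-style perf model decide whether the kernel is compute- or bandwidth-bound.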