feat: transformer engine ver 1 GEMM ops te_gemm_ts added #125
Conversation
Refs #28
TraceLens/PerfModel/perf_model.py
Outdated
    def bytes(self):
        dtype_A_B = self.param_details['dtype_A_B']
        if dtype_A_B[0] != dtype_A_B[1]:
            raise ValueError(f"Data types of A and B are different: {dtype_A_B}")
Why do the data types have to be the same? With fp8 we might have dtype_A=fp8 and dtype_B=bf8 and this should be fine. You probably need to generalize this bytes calculation for different A and B data types.
The base class GEMM seems to support different data types, so there should be no reason to require the same data type here. Is this correct @ajassani?
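To illustrate the suggestion above, here is a minimal sketch of a bytes calculation that allows A and B to have different element sizes. The names (`DTYPE_BYTES`, `gemm_bytes`) and the dtype table are illustrative assumptions, not code from TraceLens:

```python
# Hypothetical sketch: per-operand element sizes instead of a single
# shared dtype, so mixed-precision GEMMs (e.g. fp8 A with bf8 B) work.
DTYPE_BYTES = {
    'fp32': 4, 'tf32': 4,
    'fp16': 2, 'bf16': 2,
    'fp8': 1, 'bf8': 1,
}

def gemm_bytes(M, N, K, dtype_A, dtype_B, dtype_out):
    """Bytes moved for C[M,N] = A[M,K] @ B[K,N], counting each
    operand with its own bytes-per-element."""
    bpe_A = DTYPE_BYTES[dtype_A]
    bpe_B = DTYPE_BYTES[dtype_B]
    bpe_C = DTYPE_BYTES[dtype_out]
    return M * K * bpe_A + K * N * bpe_B + M * N * bpe_C

# Mixed-precision example: fp8 A, bf8 B, bf16 output
print(gemm_bytes(128, 256, 64, 'fp8', 'bf8', 'bf16'))
```

With per-operand sizes, the equal-dtype check becomes unnecessary rather than an error condition.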
TraceLens/PerfModel/perf_model.py
Outdated
            bpe_output=self.bpe)

    def flops_bwd(self):
        raise NotImplementedError("Backward pass for aten::addmm is not defined.")
"Backward pass for aten::addmm is not defined." -> "Backward pass for tex_ts::te_gemm_ts is not defined."
Has this flops_bwd been implemented for any ops? Are there examples of how this is calculated?
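As one answer to the question above, a hedged sketch of the standard FLOP accounting for a GEMM backward pass; this is not TraceLens code, just the textbook derivation:

```python
# For C[M,N] = A[M,K] @ B[K,N]:
#   forward:  2*M*N*K FLOPs (one multiply + one add per MAC)
#   backward: dA = dC @ B^T and dB = A^T @ dC, each the same size
#             GEMM as the forward pass, so backward is 2x forward.

def flops_fwd(M, N, K):
    return 2 * M * N * K

def flops_bwd(M, N, K):
    # two GEMMs, each costing the same as the forward GEMM
    return 2 * flops_fwd(M, N, K)

print(flops_bwd(128, 256, 64))  # 4 * 128 * 256 * 64
```

A bias term (as in aten::addmm) adds only M*N FLOPs, which is usually negligible next to the 2*M*N*K matmul term.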
TraceLens/PerfModel/perf_model.py
Outdated
    def flops_bwd(self):
        raise NotImplementedError("Backward pass for aten::addmm is not defined.")

    def bytes_bwd(self, bytes_per_element):
        raise NotImplementedError("Backward pass for aten::addmm is not defined.")
"Backward pass for aten::addmm is not defined." -> "Backward pass for tex_ts::te_gemm_ts is not defined."
-Add transformer engine ver 1 GEMM op tex_ts::te_gemm_ts kernel computation
-Modify the GEMM base class init to parse transpose information before the matrix dimension calculation, since the calculation depends on it