
add feature for extension#249

Merged
ajassani merged 1 commit into main from feat/tl_ext
Aug 1, 2025

Conversation

Collaborator

@ajassani ajassani commented Aug 1, 2025

No description provided.

@ajassani ajassani requested a review from Copilot August 1, 2025 17:15
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds an extension system for the performance report generation, allowing users to inject custom hooks for tree post-processing and performance model modifications. The main purpose is to enable experimental workflows with transformer models (specifically Transformer Engine v2) by adding support for pseudo operations and custom performance models.

Key changes:

  • Adds --extension_file argument to load Python extension files with custom logic
  • Implements extension loading mechanism with support for tree post-processing, performance model updates, and operation categorization
  • Includes a comprehensive Megatron extension example for Transformer Engine v2 compatibility
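The hook points listed above (tree post-processing, performance model updates, operation categorization) can be sketched as a minimal extension file. This is an illustrative sketch only: the hook names (`postprocess_tree`, `update_perf_models`, `categorize_op`) and the `MyGemmPerfModel` class are assumptions, since the PR discussion does not spell out the exact interface here.

```python
# Hypothetical sketch of a file passed via --extension_file.
# Hook names and MyGemmPerfModel are illustrative, not the PR's actual API.

class MyGemmPerfModel:
    """Placeholder for a custom performance model class."""
    pass

def postprocess_tree(trace_tree):
    """Rewrite the event tree, e.g. insert pseudo GEMM operations for TE v2."""
    return trace_tree

def update_perf_models(op_to_perf_model):
    """Register custom performance models for extra operation names."""
    op_to_perf_model["te_pseudo_gemm"] = MyGemmPerfModel
    return op_to_perf_model

def categorize_op(op_name):
    """Assign a report category to an operation name, or None to use defaults."""
    if op_name.startswith("te_pseudo"):
        return "GEMM"
    return None
```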

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File | Description
examples/generate_perf_report.py | Adds extension loading functionality and command-line argument
examples/generate_perf_report.md | Documents the new extension system with usage examples
examples/example_megatron_extension.py | Provides a complete extension example for Transformer Engine v2
TraceLens/TreePerf/tree_perf.py | Makes performance model mappings extensible by storing them as instance attributes
Comments suppressed due to low confidence (1)

examples/generate_perf_report.py:125

  • Using .get() without a default value will return None for unknown operations, which will cause a TypeError when trying to instantiate perf_model_class(event, ...) on line 126. The original code used direct dictionary access which would raise a more informative KeyError.
                        help='Include short kernel study in the report.')
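The pitfall flagged in the suppressed comment can be shown in isolation: `.get()` without a default defers the failure to a confusing `TypeError`, while direct indexing (or an explicit check) fails early and names the missing key. The `perf_models` map and `resolve_model` helper below are illustrative stand-ins, not the PR's code.

```python
# Stand-in for the op-name -> performance-model-class map (illustrative).
perf_models = {"aten::mm": object}

# .get() silently yields None for unknown ops; the failure then surfaces
# later as "TypeError: 'NoneType' object is not callable" when the class
# is instantiated, far from the real cause.
cls = perf_models.get("unknown_op")  # -> None, no error yet

# Direct access fails early and names the missing key:
# perf_models["unknown_op"]  # KeyError: 'unknown_op'

def resolve_model(op_name):
    """Defensive middle ground: fail early with a descriptive message."""
    try:
        return perf_models[op_name]
    except KeyError:
        raise KeyError(f"No performance model registered for op '{op_name}'")
```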

df_grouped = df_grouped.head(topk)
return df_hist, df_grouped

def apply_extension(perf_analyzer, extension_path):

Copilot AI Aug 1, 2025


The function lacks error handling for file loading failures. If the extension file doesn't exist or has syntax errors, this will cause an unhandled exception that could crash the script.

Copilot uses AI. Check for mistakes.
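A guarded loader along the lines this comment asks for might look as follows. This is a sketch under assumptions: the hook name `postprocess_tree` and the module name `tl_extension` are illustrative, not the PR's actual implementation.

```python
import importlib.util
import os

def apply_extension(perf_analyzer, extension_path):
    """Load an extension file and invoke its optional hooks (sketch only)."""
    if not os.path.isfile(extension_path):
        raise FileNotFoundError(f"Extension file not found: {extension_path}")
    spec = importlib.util.spec_from_file_location("tl_extension", extension_path)
    module = importlib.util.module_from_spec(spec)
    try:
        # exec_module surfaces syntax errors in the extension with context
        spec.loader.exec_module(module)
    except Exception as exc:
        raise RuntimeError(
            f"Failed to load extension '{extension_path}': {exc}") from exc
    # Call the hook only if the extension defines it (name is illustrative).
    if hasattr(module, "postprocess_tree"):
        module.postprocess_tree(perf_analyzer)
    return module
```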
"""
Context: In Transformer Engine v1, the blas GEMM calls are made by tex_ts::te_gemm_ts CPU ops.
As a result we can parse the gemm shapes from these CPU ops.
Hoewever, in Transformer Engine v2, the GEMM calls are made directly by the_Linear and _LayerNormLinear in fwd pass

Copilot AI Aug 1, 2025


Typo: 'Hoewever' should be 'However'

Suggested change
Hoewever, in Transformer Engine v2, the GEMM calls are made directly by the_Linear and _LayerNormLinear in fwd pass
However, in Transformer Engine v2, the GEMM calls are made directly by the_Linear and _LayerNormLinear in fwd pass

def get_launcher_start(kernel_evt):
launcher = trace_tree.get_parent_event(kernel_evt)
return launcher.get('ts')
bprop_gemm_kernels = sorted(bprop_gemm_kernels, key=lambda e: get_launcher_start(e)) #which

Copilot AI Aug 1, 2025


Incomplete comment: '#which' appears to be an unfinished thought and should be completed or removed

Suggested change
bprop_gemm_kernels = sorted(bprop_gemm_kernels, key=lambda e: get_launcher_start(e)) #which
bprop_gemm_kernels = sorted(bprop_gemm_kernels, key=lambda e: get_launcher_start(e)) # sort by launcher start time to determine xgrad and wgrad order
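The ordering in question can be illustrated with simplified stand-in events: each kernel event is sorted by the start timestamp of its CPU launcher. `get_parent_event` and the event dicts below are mock-ups, not TraceLens APIs.

```python
# Mock-up: kernel UID -> launcher (parent CPU op) event with a start ts.
parents = {1: {"ts": 300}, 2: {"ts": 100}}

def get_parent_event(kernel_evt):
    return parents[kernel_evt["UID"]]

def get_launcher_start(kernel_evt):
    return get_parent_event(kernel_evt).get("ts")

kernels = [{"UID": 1}, {"UID": 2}]
# Sort by launcher start time so the earliest-launched GEMM comes first,
# which is what lets xgrad and wgrad be told apart in the backward pass.
ordered = sorted(kernels, key=get_launcher_start)
```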

children.remove(launcher_evt['UID'])
children.append(pseudo_evt['UID'])

# we also need to

Copilot AI Aug 1, 2025


Incomplete comment: 'we also need to' is an unfinished sentence that should be completed or removed

Suggested change
# we also need to

Comment on lines +289 to +290
# ref TransformerEngine/transformer_engine/pytorch/cpp_extensions/fused_attn.py
# https://github.com/NVIDIA/TransformerEngine/blob/51cd441501e8e6dee18c00056f008e1b53b89ebd/transformer_engine/pytorch/attention/dot_product_attention/backends.py#L881

Copilot AI Aug 1, 2025


The method has inconsistent indentation compared to the class definition. The docstring and method body should be properly indented to match the class structure.

Suggested change
# ref TransformerEngine/transformer_engine/pytorch/cpp_extensions/fused_attn.py
# https://github.com/NVIDIA/TransformerEngine/blob/51cd441501e8e6dee18c00056f008e1b53b89ebd/transformer_engine/pytorch/attention/dot_product_attention/backends.py#L881
(The suggestion differs only in indentation, which is not preserved in this plain-text rendering.)

@ajassani ajassani merged commit a0974c3 into main Aug 1, 2025
1 check passed
@ajassani ajassani deleted the feat/tl_ext branch August 1, 2025 17:21
