
add feature for extension#249

Merged
ajassani merged 1 commit into main from feat/tl_ext
Aug 1, 2025

Conversation

Collaborator

@ajassani ajassani commented Aug 1, 2025

No description provided.

@ajassani ajassani requested a review from Copilot August 1, 2025 17:15
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds an extension system for the performance report generation, allowing users to inject custom hooks for tree post-processing and performance model modifications. The main purpose is to enable experimental workflows with transformer models (specifically Transformer Engine v2) by adding support for pseudo operations and custom performance models.

Key changes:

  • Adds --extension_file argument to load Python extension files with custom logic
  • Implements extension loading mechanism with support for tree post-processing, performance model updates, and operation categorization
  • Includes a comprehensive Megatron extension example for Transformer Engine v2 compatibility
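The hook points listed above (tree post-processing, performance model updates, operation categorization) can be sketched as a minimal extension file. This is an illustrative sketch only: the hook names (`postprocess_tree`, `update_perf_models`, `categorize_op`) and the `MyGemmPerfModel` class are assumptions, since the PR discussion does not spell out the exact interface here.

```python
# Hypothetical sketch of a file passed via --extension_file.
# Hook names and MyGemmPerfModel are illustrative, not the PR's actual API.

class MyGemmPerfModel:
    """Placeholder for a custom performance model class."""
    pass

def postprocess_tree(trace_tree):
    """Rewrite the event tree, e.g. insert pseudo GEMM operations for TE v2."""
    return trace_tree

def update_perf_models(op_to_perf_model):
    """Register custom performance models for extra operation names."""
    op_to_perf_model["te_pseudo_gemm"] = MyGemmPerfModel
    return op_to_perf_model

def categorize_op(op_name):
    """Assign a report category to an operation name, or None to use defaults."""
    if op_name.startswith("te_pseudo"):
        return "GEMM"
    return None
```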

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File | Description
examples/generate_perf_report.py | Adds extension loading functionality and command-line argument
examples/generate_perf_report.md | Documents the new extension system with usage examples
examples/example_megatron_extension.py | Provides a complete extension example for Transformer Engine v2
TraceLens/TreePerf/tree_perf.py | Makes performance model mappings extensible by storing them as instance attributes
Comments suppressed due to low confidence (1)

examples/generate_perf_report.py:125

  • Using .get() without a default value will return None for unknown operations, which will cause a TypeError when trying to instantiate perf_model_class(event, ...) on line 126. The original code used direct dictionary access which would raise a more informative KeyError.
                        help='Include short kernel study in the report.')
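The pitfall flagged in the suppressed comment can be shown in isolation: `.get()` without a default defers the failure to a confusing `TypeError`, while direct indexing (or an explicit check) fails early and names the missing key. The `perf_models` map and `resolve_model` helper below are illustrative stand-ins, not the PR's code.

```python
# Stand-in for the op-name -> performance-model-class map (illustrative).
perf_models = {"aten::mm": object}

# .get() silently yields None for unknown ops; the failure then surfaces
# later as "TypeError: 'NoneType' object is not callable" when the class
# is instantiated, far from the real cause.
cls = perf_models.get("unknown_op")  # -> None, no error yet

# Direct access fails early and names the missing key:
# perf_models["unknown_op"]  # KeyError: 'unknown_op'

def resolve_model(op_name):
    """Defensive middle ground: fail early with a descriptive message."""
    try:
        return perf_models[op_name]
    except KeyError:
        raise KeyError(f"No performance model registered for op '{op_name}'")
```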

df_grouped = df_grouped.head(topk)
return df_hist, df_grouped

def apply_extension(perf_analyzer, extension_path):

Copilot AI Aug 1, 2025


The function lacks error handling for file loading failures. If the extension file doesn't exist or has syntax errors, this will cause an unhandled exception that could crash the script.

Copilot uses AI. Check for mistakes.
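A guarded loader along the lines this comment asks for might look as follows. This is a sketch under assumptions: the hook name `postprocess_tree` and the module name `tl_extension` are illustrative, not the PR's actual implementation.

```python
import importlib.util
import os

def apply_extension(perf_analyzer, extension_path):
    """Load an extension file and invoke its optional hooks (sketch only)."""
    if not os.path.isfile(extension_path):
        raise FileNotFoundError(f"Extension file not found: {extension_path}")
    spec = importlib.util.spec_from_file_location("tl_extension", extension_path)
    module = importlib.util.module_from_spec(spec)
    try:
        # exec_module surfaces syntax errors in the extension with context
        spec.loader.exec_module(module)
    except Exception as exc:
        raise RuntimeError(
            f"Failed to load extension '{extension_path}': {exc}") from exc
    # Call the hook only if the extension defines it (name is illustrative).
    if hasattr(module, "postprocess_tree"):
        module.postprocess_tree(perf_analyzer)
    return module
```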
"""
Context: In Transformer Engine v1, the blas GEMM calls are made by tex_ts::te_gemm_ts CPU ops.
As a result we can parse the gemm shapes from these CPU ops.
Hoewever, in Transformer Engine v2, the GEMM calls are made directly by the_Linear and _LayerNormLinear in fwd pass

Copilot AI Aug 1, 2025


Typo: 'Hoewever' should be 'However'

Suggested change
Hoewever, in Transformer Engine v2, the GEMM calls are made directly by the_Linear and _LayerNormLinear in fwd pass
However, in Transformer Engine v2, the GEMM calls are made directly by the_Linear and _LayerNormLinear in fwd pass

def get_launcher_start(kernel_evt):
launcher = trace_tree.get_parent_event(kernel_evt)
return launcher.get('ts')
bprop_gemm_kernels = sorted(bprop_gemm_kernels, key=lambda e: get_launcher_start(e)) #which

Copilot AI Aug 1, 2025


Incomplete comment: '#which' appears to be an unfinished thought and should be completed or removed

Suggested change
bprop_gemm_kernels = sorted(bprop_gemm_kernels, key=lambda e: get_launcher_start(e)) #which
bprop_gemm_kernels = sorted(bprop_gemm_kernels, key=lambda e: get_launcher_start(e)) # sort by launcher start time to determine xgrad and wgrad order
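The ordering in question can be illustrated with simplified stand-in events: each kernel event is sorted by the start timestamp of its CPU launcher. `get_parent_event` and the event dicts below are mock-ups, not TraceLens APIs.

```python
# Mock-up: kernel UID -> launcher (parent CPU op) event with a start ts.
parents = {1: {"ts": 300}, 2: {"ts": 100}}

def get_parent_event(kernel_evt):
    return parents[kernel_evt["UID"]]

def get_launcher_start(kernel_evt):
    return get_parent_event(kernel_evt).get("ts")

kernels = [{"UID": 1}, {"UID": 2}]
# Sort by launcher start time so the earliest-launched GEMM comes first,
# which is what lets xgrad and wgrad be told apart in the backward pass.
ordered = sorted(kernels, key=get_launcher_start)
```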

children.remove(launcher_evt['UID'])
children.append(pseudo_evt['UID'])

# we also need to

Copilot AI Aug 1, 2025


Incomplete comment: 'we also need to' is an unfinished sentence that should be completed or removed

Suggested change
# we also need to

Comment on lines +289 to +290
# ref TransformerEngine/transformer_engine/pytorch/cpp_extensions/fused_attn.py
# https://github.com/NVIDIA/TransformerEngine/blob/51cd441501e8e6dee18c00056f008e1b53b89ebd/transformer_engine/pytorch/attention/dot_product_attention/backends.py#L881

Copilot AI Aug 1, 2025


The method has inconsistent indentation compared to the class definition. The docstring and method body should be properly indented to match the class structure.

Suggested change
# ref TransformerEngine/transformer_engine/pytorch/cpp_extensions/fused_attn.py
# https://github.com/NVIDIA/TransformerEngine/blob/51cd441501e8e6dee18c00056f008e1b53b89ebd/transformer_engine/pytorch/attention/dot_product_attention/backends.py#L881
(The suggestion differs only in indentation, which is not preserved in this plain-text rendering.)

@ajassani ajassani merged commit a0974c3 into main Aug 1, 2025
1 check passed
@ajassani ajassani deleted the feat/tl_ext branch August 1, 2025 17:21
