Conversation
Pull Request Overview
This PR adds an extension system for the performance report generation, allowing users to inject custom hooks for tree post-processing and performance model modifications. The main purpose is to enable experimental workflows with transformer models (specifically Transformer Engine v2) by adding support for pseudo operations and custom performance models.
Key changes:
- Adds `--extension_file` argument to load Python extension files with custom logic
- Implements extension loading mechanism with support for tree post-processing, performance model updates, and operation categorization
- Includes a comprehensive Megatron extension example for Transformer Engine v2 compatibility
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| examples/generate_perf_report.py | Adds extension loading functionality and command-line argument |
| examples/generate_perf_report.md | Documents the new extension system with usage examples |
| examples/example_megatron_extension.py | Provides a complete extension example for Transformer Engine v2 |
| TraceLens/TreePerf/tree_perf.py | Makes performance model mappings extensible by storing them as instance attributes |
Comments suppressed due to low confidence (1)
examples/generate_perf_report.py:125
- Using `.get()` without a default value will return `None` for unknown operations, which will cause a `TypeError` when trying to instantiate `perf_model_class(event, ...)` on line 126. The original code used direct dictionary access, which would raise a more informative `KeyError`.
```python
help='Include short kernel study in the report.')
```
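A minimal sketch of the failure mode flagged above; the mapping contents and the stand-in for `perf_model_class` are illustrative, not the PR's actual identifiers.

```python
# Hypothetical mapping from op name to a perf-model class (illustrative only).
perf_model_mapping = {"aten::mm": "GemmPerfModel"}

# dict.get() with no default silently returns None for unknown operations...
model = perf_model_mapping.get("aten::unknown_op")
assert model is None

# ...so calling it (as perf_model_class(event, ...) would) fails with an
# uninformative TypeError that does not mention the missing operation:
try:
    model("event")
except TypeError as exc:
    print(type(exc).__name__)

# Direct indexing instead raises a KeyError that names the unknown operation:
try:
    perf_model_mapping["aten::unknown_op"]
except KeyError as exc:
    print(type(exc).__name__, exc)
```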
```python
    df_grouped = df_grouped.head(topk)
    return df_hist, df_grouped


def apply_extension(perf_analyzer, extension_path):
```
The function lacks error handling for file loading failures. If the extension file doesn't exist or has syntax errors, this will cause an unhandled exception that could crash the script.
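One way to address this would be to wrap the file loading in explicit error handling; the sketch below is illustrative, assuming the extension is loaded via `importlib` — the helper name `load_extension_module` and its messages are not from the PR.

```python
import importlib.util
import sys


def load_extension_module(extension_path):
    """Load a user extension file as a module, failing with a clear message.

    Hypothetical helper (not the PR's apply_extension); shows one defensive
    loading pattern using importlib.
    """
    try:
        spec = importlib.util.spec_from_file_location("user_extension", extension_path)
        if spec is None or spec.loader is None:
            raise ImportError(f"Cannot create import spec for {extension_path}")
        module = importlib.util.module_from_spec(spec)
        # exec_module raises FileNotFoundError for a missing file and
        # SyntaxError for an unparsable one.
        spec.loader.exec_module(module)
        return module
    except FileNotFoundError:
        sys.exit(f"Extension file not found: {extension_path}")
    except SyntaxError as exc:
        sys.exit(f"Syntax error in extension file {extension_path}: {exc}")
```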
```python
    """
    Context: In Transformer Engine v1, the blas GEMM calls are made by tex_ts::te_gemm_ts CPU ops.
    As a result we can parse the gemm shapes from these CPU ops.
    Hoewever, in Transformer Engine v2, the GEMM calls are made directly by the_Linear and _LayerNormLinear in fwd pass
```
Typo: 'Hoewever' should be 'However'
```suggestion
However, in Transformer Engine v2, the GEMM calls are made directly by the_Linear and _LayerNormLinear in fwd pass
```
```python
def get_launcher_start(kernel_evt):
    launcher = trace_tree.get_parent_event(kernel_evt)
    return launcher.get('ts')
bprop_gemm_kernels = sorted(bprop_gemm_kernels, key=lambda e: get_launcher_start(e)) #which
```
Incomplete comment: '#which' appears to be an unfinished thought and should be completed or removed
```suggestion
bprop_gemm_kernels = sorted(bprop_gemm_kernels, key=lambda e: get_launcher_start(e)) # sort by launcher start time to determine xgrad and wgrad order
```
```python
children.remove(launcher_evt['UID'])
children.append(pseudo_evt['UID'])

# we also need to
```
Incomplete comment: 'we also need to' is an unfinished sentence that should be completed or removed
```suggestion
```
```python
# ref TransformerEngine/transformer_engine/pytorch/cpp_extensions/fused_attn.py
# https://github.com/NVIDIA/TransformerEngine/blob/51cd441501e8e6dee18c00056f008e1b53b89ebd/transformer_engine/pytorch/attention/dot_product_attention/backends.py#L881
```
The method has inconsistent indentation compared to the class definition. The docstring and method body should be properly indented to match the class structure.
```suggestion
# ref TransformerEngine/transformer_engine/pytorch/cpp_extensions/fused_attn.py
# https://github.com/NVIDIA/TransformerEngine/blob/51cd441501e8e6dee18c00056f008e1b53b89ebd/transformer_engine/pytorch/attention/dot_product_attention/backends.py#L881
```
No description provided.