Conversation
| 4: ['float', 'scalar'], | ||
| 2: ['c10::half', 'c10::bfloat16'], | ||
| 1: ['c10::float8_e4m3fnuz'], | ||
| 1: ['c10::float8_e4m3fnuz', 'unsigned char'], |
There was a problem hiding this comment.
Do we have any other data types that we could already add here?
ajassani
left a comment
There was a problem hiding this comment.
I dont think we need to make an assumptionabut the out dtype as
output dtype is the index 10 in event["args"]["Input type"]
TraceLens/PerfModel/perf_model.py
Outdated
| else: | ||
| # assume output dtype lowest of inputs, ignore scalars alpha and beta for now | ||
| # TODO: correct later if better way found | ||
| self.bpe_output = min(self.bpe_mat1, self.bpe_mat2, self.bpe_bias) |
There was a problem hiding this comment.
I dont think we need to make an assumption here as
output dtype is the index 10 in event["args"]["Input type"]
There was a problem hiding this comment.
Or is it bias dtype?
There was a problem hiding this comment.
Here we use index 10 to get shapes for output matrix
https://github.com/AMD-AIG-AIMA/TraceLens/blob/main/TraceLens/PerfModel/perf_model.py#L474
Also we can confirm here that arg index 10 is output matrix
https://github.com/ROCm/TransformerEngine/blob/e9772d4d18b2980e8e0643c94591a94cad9bb8b7/transformer_engine/pytorch/cpp_extensions/gemm.py#L254
There was a problem hiding this comment.
Now included output dtype
There was a problem hiding this comment.
Pull Request Overview
This PR fixes the missing dtype issue in the calculation of bytes per element by updating the dtype mappings and how input types are handled.
- Extend dtype mapping to include 'unsigned char'
- Update input type tuple to include output and bias dtypes
- Adjust the bytes computation logic to use the new tuple structure
Comments suppressed due to low confidence (2)
TraceLens/PerfModel/perf_model.py:503
- Verify that the ordering in the input type tuple accurately reflects the intended mapping for A, B, output, and bias, particularly ensuring that index 10 corresponds to the output dtype and index 18 to the bias dtype.
dtype_A_B = (event['args']['Input type'][0], event['args']['Input type'][5], event['args']['Input type'][10], event['args']['Input type'][18])
TraceLens/PerfModel/perf_model.py:519
- Confirm that the removal of the min() function and the direct mapping for output dtype are correct and consistent with the intended behavior, ensuring no additional logic is now required.
self.bpe_output = name2bpe(dtype_A_B[2])
| @@ -39,7 +39,7 @@ def name2bpe(name): | |||
| 8: ['double', 'long int'], | |||
| 4: ['float', 'scalar'], | |||
| 2: ['c10::half', 'c10::bfloat16'], | |||
There was a problem hiding this comment.
[nitpick] Consider adding a comment to clarify the inclusion of 'unsigned char' in the dtype mapping to improve future maintainability.
| 2: ['c10::half', 'c10::bfloat16'], | |
| 2: ['c10::half', 'c10::bfloat16'], | |
| # 'unsigned char' is included as a representation for 1-byte data types, often used as a fallback or for specific low-precision formats. |
Fix missing dtype in bytes calculation when calculating output dtype
Fix missing dtype in bytes calculation when calculating output dtype