Releases: AMD-AGI/TraceLens
v0.4.0
Switching the repo to public
What's Changed
- Alternative subtract_intervals function optimized for scalability by @tjkemp in #89
- Add MIT license to all meaningful files by @gabeweisz in #92
- Bugfix/conv output shape by @RElbers in #95
- dim eff analysis by @ajassani in #94
- Stagingv0.4 by @ajassani in #96
- pass arch and detail to all op by @ajassani in #97
- Add additional Jax data analyses by @gabeweisz in #98
- Add in GEMM shape analysis from XLA file by @gabeweisz in #100
- move convert import up by @gabeweisz in #101
- only include linked kernels in gpu timeline by @ajassani in #110
- Feat/event replay by @ajassani in #102
- include nn module in kernel view and refine by @ajassani in #116
- add aten::bmm perf model by @ajassani in #117
- Feat/topk ops by @ajassani in #120
- Feat/nn module parent by @ajassani in #121
- feat: TraceLens UI by @mpashkovskii in #99
- Fix get_breakdown_df_multigpu to filter out CPU events by @gabeweisz in #122
- include args in perf metrics table by @ajassani in #130
- short term fix for te linear gemms by @ajassani in #126
- add perf model for aten baddbmm by @ajassani in #131
- annotate gpu events by stream index by @ajassani in #135
- Reorg examples by @ajassani in #136
- Feat/unique args by @ajassani in #137
- feat: transformer engine ver 1 GEMM ops te_gemm_ts added by @olehtika in #125
- parse trans from args and add te gemm name to list in examples by @ajassani in #140
- Add GEMM performance model support for Jax by @gabeweisz in #139
- Integrate gemmologist for modeling gemm efficiencies. by @araina-amd in #133
- use correct dtype in cmd by @ajassani in #141
- Jax support for gemmologist integration by @gabeweisz in #143
- fix: tex_ts::te_gemm_ts missing dtype by @olehtika in #142
- allow causal attention by @ajassani in #145
- support gqa by @ajassani in #147
- tev2 native support by @ajassani in #148
- native cat 2 op names by @ajassani in #149
- megatron lm custom flow by @ajassani in #152
- fix flash true in fusedattnfunc by @ajassani in #153
- Fix/unique args kernel names by @ajassani in #154
- Fix/megatron gemm dtype by @ajassani in #156
- Fix/te dtype by @ajassani in #157
- Feat/qkv stride by @ajassani in #159
- torch_op_mapping: fix typo, add clamp_max by @lauri9 in #162
- Add script to filter trace to range of given user annotation by @lauri9 in #163
- Feat/custom report by @ajassani in #166
- Feat/nn flops by @ajassani in #169
- Feat/replay from report by @ajassani in #168
- Feat/aten sdp eff atten by @ajassani in #172
- Test/event replay by @ajassani in #173
- Fix/event replay by @ajassani in #174
- Fix/torch import by @ajassani in #175
- Test/event replay improve by @ajassani in #176
- Feat/evt replay example improve by @ajassani in #177
- add conditional strenum as suggested by @jakki-amd by @gabeweisz in #161
- Fix/lib deps by @ajassani in #178
- continue on fn call error by @ajassani in #179
- explicit openpyxl import error by @ajassani in #180
- sort false to avoid error by @ajassani in #181
- Add roofline_analyzer 0.1.1 by @tykow in #182
- regression test for perf report by @ajassani in #183
- fix is tensor type by @ajassani in #188
- Feat/perf report compare by @ajassani in #193
- NCCLAnalyzer: add missing allgather_into_tensor_coalesced collective name by @lauri9 in #194
- Enabling batch gemm through gemmologist. by @araina-amd in #189
- NcclAnalyser: add gzip support by @lauri9 in #196
- add flash_attn::_flash_attn_forward by @lauri9 in #197
- feat: extend performance report for multiple GPU ranks and trace files by @olehtika in #186
- Revert "feat: extend performance report for multiple GPU ranks and tr… by @olehtika in #198
- fix: fallback for StrEnum ImportError for Python 3.10 by @olehtika in #205
- feat: use util.DataLoader for all data loading by @olehtika in #208
- fix: JaxAnalysis FP8 GEMM kernel missing issue by @olehtika in #203
- SDPA/Flash attention changes for the perf model by @araina-amd in #191
- Partition on K and V instead of Q in SDPA backward pass by @araina-amd in #211
- option to print cpu op dispatch args by @ajassani in #213
- add aten::_scaled_dot_product_flash_attention to perf model by @ajassani in #217
- Feat/call stack analysis by @ajassani in #220
- dropout not needed for perf metrics by @ajassani in #221
- warn on dtype mismatch in gemms by @ajassani in #223
- feat: jax analysis reporting command line tool by @olehtika in #210
- Fix/jax analyses do not parse file header by @jujaykka in #218
- Fix/processing performance fixes by @jujaykka in #219
- minimize installation dependencies by @ajassani in #224
- Add GitHub Actions Workflows by @spandoesai in #222
- torch op categorization by @ajassani in #228
- fix error on dtype mismatch by @ajassani in #231
- fix edge cases for perf report by @ajassani in #235
- quick fix for execute by @ajassani in #236
- add openpyxl as requirement by @stephen-youn in #230
- overcome openpyxl dependency by @ajassani in #238
- gemms conceptual by @ajassani in #134
- notebook for autocast exploration by @ajassani in #240
- Commandline interface for roofline and code refactoring for perf model by @araina-amd in #229
- Added Perf Model for aiter::flash_attn by @spandoesai in #242
- Docs/trace2tree by @ajassani in #243
- fix perf report comparison script by @ajassani in #244
- TraceMap: interactive HTML dashboards for analyzing vLLM workloads (added to custom_workflows) by @hyukjlee in #246
- warn when compute perf metrics fails by @ajassani in #248
- Check if m, n and k are not none in gemm. by @araina-amd in #247
- add feature for extension by @ajassani in #249
- kernel detail update by @ajassani in #245
- Added the TraceDiff API to TraceLens by @spandoesai in #241
- add support for grouped gemm by @ajassani in #254
- allow different d_h_qk and d_h_v by @ajass...