Commit 0ee77dd

Feat/perf report compare (#193)
1 parent 00afab9 commit 0ee77dd

2 files changed

+491
-0
lines changed

examples/compare_perf_reports.md

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
## Compare TraceLens Performance Reports

---

### 1. What It Takes In

| Input              | Description |
| ------------------ | ----------- |
| `*.xlsx` files     | TraceLens reports you want to compare. Provide **at least two**. |
| Optional `--names` | Human-readable tags for each report. If omitted, the script falls back to the base filenames (handy, but sometimes messy). |

---

### 2. How to Call It

```bash
python compare_tracelens_reports.py \
    baseline.xlsx \
    candidate.xlsx \
    --names baseline candidate \
    --sheets all \
    -o comparison.xlsx
```

Common flags:

| Flag           | Default           | Purpose |
| -------------- | ----------------- | ------- |
| `-o, --output` | `comparison.xlsx` | Name of the merged workbook. |
| `--names`      | `<file stem>`     | Custom tags (must match the number of reports). |
| `--sheets`     | `all`             | Limit processing to a subset:<br>`gpu_timeline`, `ops_summary`, `ops_all`, `roofline`, or `all`. |
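
The flags above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual parser: the flag names and defaults come from the table, while `build_parser`, `parse_args`, the positional name `reports`, and the choice to let `--sheets` accept several values are assumptions made for illustration.

```python
import argparse
from pathlib import Path

# Sheet groups listed in the flags table.
SHEET_CHOICES = ["gpu_timeline", "ops_summary", "ops_all", "roofline", "all"]


def build_parser():
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        description="Compare TraceLens performance reports"
    )
    # Positional name "reports" is an assumption; the first file is the baseline.
    parser.add_argument("reports", nargs="+", help="two or more .xlsx reports")
    parser.add_argument("--names", nargs="+", default=None,
                        help="one tag per report")
    # Assumption: several sheet groups may be passed at once.
    parser.add_argument("--sheets", nargs="+", choices=SHEET_CHOICES,
                        default=["all"])
    parser.add_argument("-o", "--output", default="comparison.xlsx")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if len(args.reports) < 2:
        parser.error("provide at least two reports")
    if args.names is None:
        # Fall back to the file stem, e.g. "baseline.xlsx" -> "baseline".
        args.names = [Path(p).stem for p in args.reports]
    elif len(args.names) != len(args.reports):
        parser.error("--names must match the number of reports")
    return args


args = parse_args(["baseline.xlsx", "candidate.xlsx", "-o", "comparison.xlsx"])
print(args.names, args.sheets, args.output)
```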

---

### 3. What Comes Out

The script writes **one workbook** (`comparison.xlsx` unless you override it) containing multiple sheets:

| Sheet          | When You Get It                  | What It Shows |
| -------------- | -------------------------------- | ------------- |
| `gpu_timeline` | `--sheets gpu_timeline` or `all` | End-to-end GPU activity by **type** (`compute`, `memcpy`, etc.) with per-report timings plus:<br>- `time ms__<tag>_diff`<br>- `time ms__<tag>_pct` |
| `ops_summary`  | `--sheets ops_summary` or `all`  | Per-op aggregates keyed on **`name`**. Sorted by the baseline’s `total_direct_kernel_time_ms`. Unhelpful columns (e.g., cumulative %) are stripped from non-baseline reports. |
| `ops_all_*`    | `--sheets ops_all` or `all`      | Three sheets **per variant tag**:<br>• `ops_all_intersect_<tag>` – op instances present in both baseline and variant.<br>• `ops_all_only_baseline_<tag>` – ops only the baseline ran.<br>• `ops_all_only_variant_<tag>` – ops only the variant ran.<br>Columns irrelevant to a given view are hidden, not deleted. |
| `<roofline>_*` | `--sheets roofline` or `all`     | Same intersect / only\_\* breakdown for each roofline group:<br>`GEMM`, `SDPA_fwd`, `SDPA_bwd`, `CONV_fwd`, `CONV_bwd`, `UnaryElementwise`, `BinaryElementwise`. |

Hidden columns stay in the file (for power users) but are invisible in Excel by default.
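
The intersect / only-baseline / only-variant split used by the `ops_all_*` and roofline sheets can be pictured with a small sketch in plain Python. It assumes rows are dictionaries matched on a single `name` key; how the script actually identifies matching op instances is not specified here, and `split_by_presence` and the sample rows are hypothetical.

```python
def split_by_presence(baseline_rows, variant_rows, key="name"):
    """Partition rows into (intersect, only_baseline, only_variant).

    Assumption: each row is a dict and `key` uniquely identifies an op.
    Duplicate keys would collapse to the last row seen.
    """
    base = {row[key]: row for row in baseline_rows}
    var = {row[key]: row for row in variant_rows}

    # Ops present in both reports, paired as (baseline_row, variant_row).
    intersect = [(base[k], var[k]) for k in base if k in var]
    # Ops that only the baseline ran.
    only_baseline = [base[k] for k in base if k not in var]
    # Ops that only the variant ran.
    only_variant = [var[k] for k in var if k not in base]
    return intersect, only_baseline, only_variant


# Hypothetical sample data for illustration only.
baseline = [
    {"name": "aten::mm", "time_ms": 10.0},
    {"name": "aten::cudnn_convolution", "time_ms": 4.0},
]
variant = [
    {"name": "aten::mm", "time_ms": 8.0},
    {"name": "aten::miopen_convolution", "time_ms": 5.0},
]

both, base_only, variant_only = split_by_presence(baseline, variant)
print([b["name"] for b, _ in both])        # ops in both reports
print([r["name"] for r in base_only])      # ops only in the baseline
print([r["name"] for r in variant_only])   # ops only in the variant
```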

---

### 4. Diff Math

For every metric you ask it to track (`diff_cols` in the code), the script computes:

```text
metric__<tag>_diff   # variant - baseline
metric__<tag>_pct    # 100 * diff / baseline
```
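
As a worked example of the two formulas, the sketch below computes both columns for one row. The helper `diff_and_pct`, the per-report column names `time ms__baseline` and `time ms__candidate`, and the choice to return NaN when the baseline is zero are assumptions for illustration; the script's own handling of a zero baseline is not described here.

```python
import math
from typing import Tuple


def diff_and_pct(baseline: float, variant: float) -> Tuple[float, float]:
    """Return (variant - baseline, 100 * diff / baseline)."""
    diff = variant - baseline
    # Assumption: a zero baseline yields NaN rather than raising.
    pct = 100.0 * diff / baseline if baseline != 0 else math.nan
    return diff, pct


# Hypothetical row with one metric measured in two reports.
row = {"time ms__baseline": 120.0, "time ms__candidate": 90.0}

diff, pct = diff_and_pct(row["time ms__baseline"], row["time ms__candidate"])
row["time ms__candidate_diff"] = diff   # -30.0 (candidate is 30 ms faster)
row["time ms__candidate_pct"] = pct     # -25.0 (a 25 % reduction)
print(row)
```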

---

### 5. Design Decisions You Should Know

* **Outer merge, never inner** – if an op vanished, you’ll see it.
* **Baseline = first report** – choose wisely.
* **Column prefixing** – every metric becomes `<tag>::metric`, so you can safely concatenate arbitrary reports.
* **Sheet-specific pruning** – the script aggressively hides noise (e.g., median, UID) to keep the output readable. You can always unhide them in Excel if you need them.
* **Excel 31-char rule** – sheet names are truncated to fit; no data loss, just shorter labels.
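
Three of these decisions (outer merge, column prefixing, and sheet-name truncation) can be sketched in plain Python. The code below is illustrative only: `outer_merge`, `excel_sheet_name`, and the sample reports are hypothetical, rows are assumed to be dictionaries keyed on `name`, and the sketch does not handle two sheet names that truncate to the same label.

```python
def outer_merge(reports, key="name"):
    """Outer-merge reports given as {tag: [row, ...]}; the first tag is the baseline.

    Every metric is renamed to "<tag>::metric". Ops missing from a report
    keep their row, with None in that report's columns.
    """
    merged = {}    # op name -> merged row
    columns = []   # prefixed metric columns, in first-seen order
    for tag, rows in reports.items():
        for row in rows:
            out = merged.setdefault(row[key], {key: row[key]})
            for col, val in row.items():
                if col == key:
                    continue
                prefixed = f"{tag}::{col}"
                out[prefixed] = val
                if prefixed not in columns:
                    columns.append(prefixed)
    # Fill gaps with None so vanished ops stay visible.
    return [
        {key: op, **{c: r.get(c) for c in columns}}
        for op, r in merged.items()
    ]


def excel_sheet_name(name, limit=31):
    """Truncate to Excel's 31-character sheet-name limit."""
    return name[:limit]


# Hypothetical sample data for illustration only.
reports = {
    "baseline": [
        {"name": "aten::mm", "total_direct_kernel_time_ms": 10.0},
        {"name": "aten::add", "total_direct_kernel_time_ms": 2.0},
    ],
    "candidate": [
        {"name": "aten::mm", "total_direct_kernel_time_ms": 8.0},
    ],
}

for merged_row in outer_merge(reports):
    print(merged_row)
print(excel_sheet_name("ops_all_only_baseline_candidate_run_42"))
```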

---

### 6. Future Enhancements

1. Morphology-aware diffing – understand the call-stack tree and compare ops at the lowest common call-stack level. For example, if a baseline leaf op is `cudnn_convolution` and the variant leaf op is `miopen_convolution`, the diff algorithm should recognize that the lowest common level is `convolution` and compare the two ops there.
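
One possible reading of "lowest common call-stack level" is the deepest frame shared by two root-to-leaf call stacks. The sketch below illustrates that idea only; it is not part of the script, and `lowest_common_level` and the example stacks are hypothetical.

```python
def lowest_common_level(stack_a, stack_b):
    """Return the deepest frame shared by two call stacks, or None.

    Assumption: each stack is a list ordered from root to leaf.
    """
    common = None
    for frame_a, frame_b in zip(stack_a, stack_b):
        if frame_a != frame_b:
            break
        common = frame_a
    return common


# Hypothetical call stacks for the convolution example.
baseline_stack = ["conv2d", "convolution", "cudnn_convolution"]
variant_stack = ["conv2d", "convolution", "miopen_convolution"]

level = lowest_common_level(baseline_stack, variant_stack)
print(level)  # the level at which the two ops would be compared
```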
