| 1 | +## Compare TraceLens Performance Reports |
| 2 | + |
| 3 | +--- |
| 4 | + |
| 5 | +### 1. What It Takes In |
| 6 | + |
| 7 | +| Input | Description | |
| 8 | +| ------------------ | -------------------------------------------------------------------------------------------------------------------------- | |
| 9 | +| `*.xlsx` files | TraceLens reports you want to compare. Provide **at least two**. | |
| 10 | +| Optional `--names` | Human-readable tags for each report. If omitted, the script falls back to the base filenames (handy, but sometimes messy). | |
| 11 | + |
| 12 | +--- |
| 13 | + |
| 14 | +### 2. How to Call It |
| 15 | + |
| 16 | +```bash |
| 17 | +python compare_tracelens_reports.py \ |
| 18 | + baseline.xlsx \ |
| 19 | + candidate.xlsx \ |
| 20 | + --names baseline candidate \ |
| 21 | + --sheets all \ |
| 22 | + -o comparison.xlsx |
| 23 | +``` |
| 24 | + |
| 25 | +Common flags: |
| 26 | + |
| 27 | +| Flag | Default | Purpose | |
| 28 | +| -------------- | ----------------- | ------------------------------------------------------------------------------------------------ | |
| 29 | +| `-o, --output` | `comparison.xlsx` | Name of the merged workbook. | |
| 30 | +| `--names` | `<file stem>` | Custom tags (must match the number of reports). | |
| 31 | +| `--sheets` | `all` | Limit processing to a subset:<br>`gpu_timeline`, `ops_summary`, `ops_all`, `roofline`, or `all`. | |
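
As a rough illustration of how the flags in the table fit together, here is a minimal `argparse` sketch. It is not the script's actual parser: the function name `parse_args`, the use of `nargs="+"` for `--sheets`, and the error messages are assumptions. Only the flag names, defaults, and the "at least two reports" and "names must match" rules come from the tables above.

```python
import argparse
from pathlib import Path

# Sheet groups listed in the flag table above.
SHEET_CHOICES = ["gpu_timeline", "ops_summary", "ops_all", "roofline", "all"]


def parse_args(argv=None):
    """Hypothetical parser mirroring the documented flags (not the real one)."""
    parser = argparse.ArgumentParser(
        description="Compare TraceLens performance reports"
    )
    parser.add_argument("reports", nargs="+",
                        help="TraceLens .xlsx reports; the first is the baseline")
    parser.add_argument("--names", nargs="+", default=None,
                        help="one tag per report; defaults to the file stems")
    # Assumption: several sheet groups may be passed at once.
    parser.add_argument("--sheets", nargs="+", choices=SHEET_CHOICES,
                        default=["all"])
    parser.add_argument("-o", "--output", default="comparison.xlsx")
    args = parser.parse_args(argv)

    if len(args.reports) < 2:
        parser.error("provide at least two reports")
    if args.names is None:
        # Fall back to the base filenames without extension.
        args.names = [Path(p).stem for p in args.reports]
    elif len(args.names) != len(args.reports):
        parser.error("--names must match the number of reports")
    return args
```

Calling `parse_args(["baseline.xlsx", "candidate.xlsx"])` in this sketch yields the tags `baseline` and `candidate` and the default output name `comparison.xlsx`.
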
| 32 | + |
| 33 | +--- |
| 34 | + |
| 35 | +### 3. What Comes Out |
| 36 | + |
| 37 | +The script writes **one workbook** (`comparison.xlsx` unless you override it) containing multiple sheets: |
| 38 | + |
| 39 | +| Sheet | When You Get It | What It Shows | |
| 40 | +| -------------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | |
| 41 | +| `gpu_timeline` | `--sheets gpu_timeline` or `all` | End-to-end GPU activity by **type** (`compute`, `memcpy`, etc.) with per-report timings plus:<br>- `time ms__<tag>_diff`<br>- `time ms__<tag>_pct` | |
| 42 | +| `ops_summary` | `--sheets ops_summary` or `all` | Per-op aggregates keyed on **`name`**. Sorted by the baseline’s `total_direct_kernel_time_ms`. Unhelpful columns (e.g., cumulative %) are stripped from non-baseline reports. | |
| 43 | +| `ops_all_*` | `--sheets ops_all` or `all` | Three sheets **per variant tag**:<br>• `ops_all_intersect_<tag>` – op instances present in both baseline and variant.<br>• `ops_all_only_baseline_<tag>` – ops only the baseline ran.<br>• `ops_all_only_variant_<tag>` – ops only the variant ran.<br>Columns irrelevant to a given view are hidden, not deleted. | |
| 44 | +| `<roofline>_*` | `--sheets roofline` or `all` | Same intersect / only\_\* breakdown for each roofline group:<br>`GEMM`, `SDPA_fwd`, `SDPA_bwd`, `CONV_fwd`, `CONV_bwd`, `UnaryElementwise`, `BinaryElementwise`. | |
| 45 | + |
| 46 | +Hidden columns stay in the file (for power users) but are invisible in Excel by default. |
| 47 | + |
| 48 | +--- |
| 49 | + |
| 50 | +### 4. Diff Math |
| 51 | + |
| 52 | +For every metric you ask it to track (`diff_cols` in the code), the script computes: |
| 53 | + |
| 54 | +```text |
| 55 | +metric__<tag>_diff # variant - baseline |
| 56 | +metric__<tag>_pct # 100 * diff / baseline |
| 57 | +``` |
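
To make the two formulas concrete, here is a small worked sketch in plain Python. It assumes a merged row is a dict whose input columns follow the `<tag>::metric` prefixing described in this README. The helper name `add_diff_columns` and the choice to return `None` for a missing value or a zero baseline are assumptions, not necessarily what the script does.

```python
def add_diff_columns(row, metric, baseline_tag, variant_tag):
    """Return a copy of `row` with the _diff and _pct columns for one variant."""
    base = row.get(f"{baseline_tag}::{metric}")
    var = row.get(f"{variant_tag}::{metric}")
    out = dict(row)
    if base is None or var is None:
        # Op missing on one side: nothing to compare (assumed behaviour).
        diff = pct = None
    else:
        diff = var - base                              # variant - baseline
        pct = 100.0 * diff / base if base else None    # guard against baseline == 0
    out[f"{metric}__{variant_tag}_diff"] = diff
    out[f"{metric}__{variant_tag}_pct"] = pct
    return out


row = {"baseline::time ms": 200.0, "candidate::time ms": 150.0}
result = add_diff_columns(row, "time ms", "baseline", "candidate")
print(result["time ms__candidate_diff"])  # -50.0
print(result["time ms__candidate_pct"])   # -25.0
```

A negative `_diff` therefore means the variant spent less time than the baseline on that metric.
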
| 58 | +--- |
| 59 | + |
| 60 | +### 5. Design Decisions You Should Know |
| 61 | + |
| 62 | +* **Outer merge, never inner** – if an op vanished, you’ll see it. |
| 63 | +* **Baseline = first report** – choose wisely. |
| 64 | +* **Column prefixing** – every metric becomes `<tag>::metric`, so you can safely concatenate arbitrary reports. |
| 65 | +* **Sheet-specific pruning** – the script aggressively hides noise (e.g., median, UID) to keep the output readable. You can always unhide them in Excel if you need them. |
| 66 | +* **Excel 31-char rule** – sheet names are truncated to fit; no data loss, just shorter labels. |
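
The outer-merge and 31-character points can be sketched without any spreadsheet library. The snippet below is illustrative only: `split_by_presence` and `sheet_name` are hypothetical helpers, ops are matched purely by name, and truncation is assumed to be a plain slice. The script's real matching key and truncation scheme may differ.

```python
EXCEL_MAX_SHEET_NAME = 31  # Excel rejects sheet names longer than this


def split_by_presence(baseline_ops, variant_ops):
    """Partition op names the way an outer merge exposes them."""
    base, var = set(baseline_ops), set(variant_ops)
    return {
        "intersect": sorted(base & var),       # present in both reports
        "only_baseline": sorted(base - var),   # vanished in the variant
        "only_variant": sorted(var - base),    # new in the variant
    }


def sheet_name(kind, tag):
    """Build an `ops_all_<kind>_<tag>` sheet name, cut to Excel's limit."""
    return f"ops_all_{kind}_{tag}"[:EXCEL_MAX_SHEET_NAME]


groups = split_by_presence(
    ["aten::mm", "aten::add", "aten::cudnn_convolution"],
    ["aten::mm", "aten::add", "aten::miopen_convolution"],
)
print(groups["only_baseline"])  # ['aten::cudnn_convolution']
print(sheet_name("only_baseline", "candidate_with_long_tag"))
```

An inner merge would keep only the `intersect` group, which is why the outer merge is what makes vanished ops visible. Note that a plain slice can make two long tags collide on the same sheet name, so short `--names` values are the safer choice.
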
| 67 | + |
| 68 | +--- |
| 69 | + |
| 70 | +### 6. Future Enhancements |
| 71 | +1. Morphology-aware diffing – Understand the call stack tree and compare at the lowest common call stack level. For example, if the baseline leaf op is 'cudnn_convolution' and the variant leaf op is 'miopen_convolution', the diff algorithm should recognize that the lowest common level is 'convolution' and compare the two ops at that level.
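
One possible shape for the "lowest common level" step is sketched below. This is not implemented in the script. The function name and the representation of a call stack as a root-to-leaf list of op names are assumptions made for illustration.

```python
def lowest_common_level(baseline_stack, variant_stack):
    """Return the deepest frame shared by two root-to-leaf call stacks.

    Returns None when the stacks already differ at the root.
    """
    common = None
    for base_frame, var_frame in zip(baseline_stack, variant_stack):
        if base_frame != var_frame:
            break  # stacks diverge here; keep the last shared frame
        common = base_frame
    return common


# Hypothetical stacks for the convolution example above.
baseline_stack = ["conv2d", "convolution", "cudnn_convolution"]
variant_stack = ["conv2d", "convolution", "miopen_convolution"]
print(lowest_common_level(baseline_stack, variant_stack))  # convolution
```

Under this scheme the two backend-specific leaf ops would be aggregated and diffed under the shared `convolution` frame instead of landing in the only-baseline and only-variant sheets.
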