Tune CCMP for better Perf #116445

Merged: 19 commits, Jul 23, 2025

khushal1996
Member

@khushal1996 khushal1996 commented Jun 9, 2025

This PR enables more paths for CCMP (#111072) by doing the following:

  • Control when and how many switches are converted to CCMP. A switch conversion can span multiple blocks, but CCMP did not check across blocks, so it partially converted potential switch candidates to CCMP and reduced the effectiveness of the switch. This PR handles that case so that existing switches do not regress.

  • Let all candidates for CCMP go through lowering. There were preconditions for a CCMP to happen: although it can handle all node types, it was limited to certain node types. I have enabled CCMP on all nodes while carefully checking which nodes to convert to CCMP.
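
To make the trade-off in the first bullet concrete, here is a minimal sketch in ordinary C++ (not JIT code) of the two shapes a chain of equality tests can take. `matchesChain` is the chain of compares that conditional-compare chaining targets; `matchesBitTest` is the single range check plus bit test that a recognized switch can typically be reduced to. The function names and the constants are hypothetical and chosen only for illustration. If only part of the chain were turned into CCMP, the remaining tests would no longer form the full set needed for the second form.

```cpp
#include <cassert>
#include <cstdint>

// Chain of equality tests: the shape that conditional-compare chaining targets.
bool matchesChain(int x)
{
    return x == 1 || x == 3 || x == 4 || x == 7 || x == 9;
}

// The same predicate as one range check plus one bit test.
bool matchesBitTest(int x)
{
    constexpr uint32_t minValue = 1;
    constexpr uint32_t range    = 9 - 1; // max - min
    // One bit per accepted value, at position (value - minValue).
    constexpr uint32_t mask = (1u << 0) | (1u << 2) | (1u << 3) | (1u << 6) | (1u << 8);

    // Unsigned subtraction wraps for x < minValue, so one compare covers both bounds.
    uint32_t offset = static_cast<uint32_t>(x) - minValue;
    return offset <= range && ((mask >> offset) & 1u) != 0;
}

// Usage: both forms agree on every input in a small range.
void checkEquivalence()
{
    for (int x = -20; x <= 20; ++x)
    {
        assert(matchesChain(x) == matchesBitTest(x));
    }
}
```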

SuperPMI run

Clean SuperPMI replay

PS C:\Git_repos\runtime\src\coreclr\scripts> python .\superpmi.py replay -arch x64 -core_root "C:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root" -jitoption JitBypassApxCheck=1 -jitoption EnableApxConditionalChaining=1
[17:06:30] ================ Logging to C:\Git_repos\runtime\artifacts\spmi\superpmi.6.log
[17:06:30] Using JIT/EE Version from jiteeversionguid.h: 124f7514-194f-4924-9d70-25d41ca17947
[17:06:30] Found download cache directory "C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64" and --force_download not set; skipping download
[17:06:30] SuperPMI replay
[17:06:30] JIT Path: C:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll
[17:06:30] Using MCH files:
[17:06:30]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\aspnet.run.windows.x64.checked.mch
[17:06:30]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\benchmarks.run.windows.x64.checked.mch
[17:06:31]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\benchmarks.run_pgo.windows.x64.checked.mch
[17:06:31]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\benchmarks.run_pgo_optrepeat.windows.x64.checked.mch
[17:06:31]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\coreclr_tests.run.windows.x64.checked.mch
[17:06:31]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\libraries.crossgen2.windows.x64.checked.mch
[17:06:31]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\libraries.pmi.windows.x64.checked.mch
[17:06:31]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\libraries_tests.run.windows.x64.Release.mch
[17:06:31]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch
[17:06:31]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\realworld.run.windows.x64.checked.mch
[17:06:31]   C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\smoke_tests.nativeaot.windows.x64.checked.mch
[17:06:31] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\aspnet.run.windows.x64.checked.mch
[17:07:35] Clean SuperPMI replay (191141 contexts processed)
[17:07:35] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\benchmarks.run.windows.x64.checked.mch
[17:07:41] Clean SuperPMI replay (28534 contexts processed)
[17:07:41] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\benchmarks.run_pgo.windows.x64.checked.mch
[17:08:09] Clean SuperPMI replay (163914 contexts processed)
[17:08:09] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\benchmarks.run_pgo_optrepeat.windows.x64.checked.mch
[17:08:15] Clean SuperPMI replay (38923 contexts processed)
[17:08:15] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\coreclr_tests.run.windows.x64.checked.mch
[17:10:10] Clean SuperPMI replay (622667 contexts processed)
[17:10:10] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\libraries.crossgen2.windows.x64.checked.mch
[17:10:31] Clean SuperPMI replay (272864 contexts processed)
[17:10:31] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\libraries.pmi.windows.x64.checked.mch
[17:11:01] Clean SuperPMI replay (295522 contexts processed)
[17:11:01] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\libraries_tests.run.windows.x64.Release.mch
[17:12:59] SuperPMI encountered missing data for 3 out of 830168 contexts
[17:12:59] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch
[17:14:05] SuperPMI encountered missing data for 14 out of 371357 contexts
[17:14:05] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\realworld.run.windows.x64.checked.mch
[17:14:11] SuperPMI encountered missing data for 1 out of 29296 contexts
[17:14:11] Running SuperPMI replay of C:\Git_repos\runtime\artifacts\spmi\mch\124f7514-194f-4924-9d70-25d41ca17947.windows.x64\smoke_tests.nativeaot.windows.x64.checked.mch
[17:14:14] Clean SuperPMI replay (34167 contexts processed)
[17:14:14] Replay summary:
[17:14:14]   All replays clean
PS C:\Git_repos\runtime\src\coreclr\scripts>

TESTING

SDE test RUN
image

APX + CCMP + PR SDE test RUN

image

SuperPMI results:
Base: APX + CCMP
Diff: APX + CCMP + PR

image

Overall (-53,202 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs | Base Instruction Count | Diff Instruction Count |
|---|---|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 71,458,057 | +1,184 | -1.90% | 16260020 | -1,749 (-0.38%) (-0.54%) |
| benchmarks.run.windows.x64.checked.mch | 8,843,331 | -127 | -4.34% | 2222946 | -395 (-0.65%) (-0.82%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 72,194,319 | -37,291 | -1.47% | 16737044 | -11,689 (-0.47%) (-0.58%) |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 12,482,222 | -247 | -3.99% | 3132513 | -567 (-0.65%) (-0.82%) |
| coreclr_tests.run.windows.x64.checked.mch | 410,916,520 | -7,890 | -0.93% | 85153004 | -6,370 (-0.64%) (-0.76%) |
| libraries.crossgen2.windows.x64.checked.mch | 38,546,410 | +350 | -6.83% | 10482053 | -2,038 (-1.21%) (-1.39%) |
| libraries.pmi.windows.x64.checked.mch | 58,454,180 | +2 | -6.46% | 14792343 | -2,235 (-0.86%) (-1.03%) |
| libraries_tests.run.windows.x64.Release.mch | 387,147,737 | -7,820 | -0.84% | 85190114 | -15,490 (-0.42%) (-0.52%) |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 154,406,173 | -1,008 | -5.68% | 35744823 | -2,342 (-0.79%) (-0.96%) |
| realworld.run.windows.x64.checked.mch | 11,740,196 | +9 | -5.19% | 2845601 | -621 (-0.69%) (-0.80%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,512,086 | -364 | -11.38% | 1544817 | -411 (-1.57%) (-1.72%) |
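
The "Overall" figure is the sum of the per-collection size diffs listed above. A quick arithmetic check, with the values copied from the "Diff size (bytes)" column in row order; the names `sizeDiffs`, `overallSizeDiff`, and `checkOverall` are illustrative and not part of SuperPMI:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>

// Per-collection size diffs in bytes, in the order the collections are listed.
constexpr std::array<int64_t, 11> sizeDiffs = {
    +1184, -127, -37291, -247, -7890, +350, +2, -7820, -1008, +9, -364,
};

// Sum of all per-collection diffs.
int64_t overallSizeDiff()
{
    return std::accumulate(sizeDiffs.begin(), sizeDiffs.end(), int64_t{0});
}

// Usage: the sum matches the reported overall figure of -53,202 bytes.
void checkOverall()
{
    assert(overallSizeDiff() == -53202);
}
```
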
Details

Instruction Count improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (#instructions) | Regressions (#instructions) |
|---|---|---|---|---|---|---|
| aspnet.run.windows.x64.checked.mch | 982 | 711 | 33 | 238 | -1,993 | +244 |
| benchmarks.run.windows.x64.checked.mch | 224 | 186 | 5 | 33 | -410 | +15 |
| benchmarks.run_pgo.windows.x64.checked.mch | 4,238 | 3,213 | 11 | 1,014 | -11,708 | +19 |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 333 | 273 | 10 | 50 | -597 | +30 |
| coreclr_tests.run.windows.x64.checked.mch | 2,493 | 2,071 | 7 | 415 | -6,393 | +23 |
| libraries.crossgen2.windows.x64.checked.mch | 1,169 | 964 | 20 | 185 | -2,099 | +61 |
| libraries.pmi.windows.x64.checked.mch | 1,203 | 999 | 21 | 183 | -2,308 | +73 |
| libraries_tests.run.windows.x64.Release.mch | 7,622 | 5,930 | 323 | 1,369 | -16,793 | +1,303 |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 1,132 | 924 | 25 | 183 | -2,454 | +112 |
| realworld.run.windows.x64.checked.mch | 327 | 273 | 4 | 50 | -632 | +11 |
| smoke_tests.nativeaot.windows.x64.checked.mch | 181 | 161 | 0 | 20 | -411 | +0 |
| | 19,904 | 15,705 | 459 | 3,740 | -45,798 | +1,891 |

@github-actions bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jun 9, 2025
@dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) Jun 9, 2025
@khushal1996 khushal1996 marked this pull request as ready for review June 10, 2025 05:24
@Copilot Copilot AI review requested due to automatic review settings June 10, 2025 05:24
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances the JIT’s CCMP optimization by ensuring full switch chains are detected across basic blocks and by broadening the lowering phase to consider more compare operations for CCMP.

  • Introduces a testingForConversion mode in optSwitchDetectAndConvert with a minimum-test threshold to avoid partial CCMP.
  • Declares and defines CanConvertOpToCCMP and IsOpPreferredForCCMP to guide lowering choices.
  • Adds BBF_SWITCH_CONVERSION_LIKELY to mark candidate blocks and clears it on block splits.
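
The interaction between the threshold and the block flag described in these bullets can be sketched roughly as follows. This is a simplified model, not the JIT implementation: only the identifiers `BBF_SWITCH_CONVERSION_LIKELY` and `CONVERT_SWITCH_TO_CCMP_MIN_TEST` (with its value of 5) come from the PR. The `Block` struct, the helper functions, the bit position of the flag, and the reading that a chain with at least the threshold number of tests is reserved for switch conversion are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Value quoted in the review; its exact use in the JIT is assumed here.
constexpr int CONVERT_SWITCH_TO_CCMP_MIN_TEST = 5;

enum BlockFlags : uint32_t
{
    BBF_NONE                     = 0,
    BBF_SWITCH_CONVERSION_LIKELY = 1u << 0, // bit position is arbitrary in this sketch
};

// Hypothetical stand-in for a basic block.
struct Block
{
    uint32_t flags         = BBF_NONE;
    int      equalityTests = 0; // number of `x == constant` tests chained from this block
};

// Assumption: a chain with at least the threshold number of tests is treated
// as a likely switch and is marked so later phases leave it alone.
void markIfSwitchLikely(Block& block)
{
    if (block.equalityTests >= CONVERT_SWITCH_TO_CCMP_MIN_TEST)
    {
        block.flags |= BBF_SWITCH_CONVERSION_LIKELY;
    }
}

// Assumption: lowering skips CCMP for blocks marked as likely switches.
bool mayLowerToCcmp(const Block& block)
{
    return (block.flags & BBF_SWITCH_CONVERSION_LIKELY) == 0;
}

// Splitting a block invalidates the earlier analysis, so the flag is cleared
// on the original block and the new tail block starts unflagged.
Block splitBlock(Block& block)
{
    block.flags &= ~BBF_SWITCH_CONVERSION_LIKELY;
    return Block{};
}

// Usage: a short chain stays eligible for CCMP, a long chain does not,
// and a split resets the decision.
void example()
{
    Block shortChain;
    shortChain.equalityTests = 3;
    markIfSwitchLikely(shortChain);
    assert(mayLowerToCcmp(shortChain));

    Block longChain;
    longChain.equalityTests = 5;
    markIfSwitchLikely(longChain);
    assert(!mayLowerToCcmp(longChain));

    Block tail = splitBlock(longChain);
    assert(mayLowerToCcmp(longChain));
    assert(mayLowerToCcmp(tail));
}
```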

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| switchrecognition.cpp | Added conversion-testing path, BBF_SWITCH_CONVERSION_LIKELY, and CONVERT_SWITCH_TO_CCMP_MIN_TEST. |
| lower.h | Declared CanConvertOpToCCMP and IsOpPreferredForCCMP. |
| lower.cpp | Defined CCMP helper methods and updated TryLowerAndOrToCCMP. |
| fgbasic.cpp | Cleared BBF_SWITCH_CONVERSION_LIKELY on block splits. |
| compiler.h | Extended optSwitchConvert and optSwitchDetectAndConvert APIs. |
| block.h | Defined new BBF_SWITCH_CONVERSION_LIKELY flag. |

Comments suppressed due to low confidence (2)

src/coreclr/jit/switchrecognition.cpp:15

  • There are no tests covering the new minimum-test threshold for CCMP conversion (fewer than 5 comparisons). Please add unit tests that verify behavior both below and above this threshold to prevent regressions.
#define CONVERT_SWITCH_TO_CCMP_MIN_TEST 5

src/coreclr/jit/lower.cpp:11664

  • The newly added 'else { return false; }' appears to pair with the preceding debug-only block (e.g., after JITDUMP), causing TryLowerAndOrToCCMP to return false in normal builds. This likely disables CCMP lowering when not in verbose mode. Adjust the else so it’s scoped to the intended condition or remove it.
else

@khushal1996
Member Author

@dotnet/intel for further review.

@khushal1996
Member Author

@jakobbotsch let me know if you have further reviews.
@dotnet/intel @tannergooding for more reviews.

@jakobbotsch
Member

/azp run runtime, runtime-coreclr superpmi-diffs


Azure Pipelines successfully started running 2 pipeline(s).

@JulieLeeMSFT
Member

Fixes #114564.

@khushal1996
Member Author

> Looks like there is an assert during codegen in superpmi-diffs from this log: https://helixr1107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-116445-merge-3673ad9030974e6692/linux-arm64-3/1/console.5ca266d8.log?helixlogtype=result
>
> [18:13:32] ISSUE: <ASSERT> #352654 /Users/runner/work/1/s/src/coreclr/jit/codegenarm64.cpp (3402) - Assertion failed 'unreached' in 'Runtime_113337:issue3()' during 'Generate code' (IL size 80; hash 0x074ed573; FullOpts)
>
> Is it related to this PR? I don't see it in other runs, but let me retry the run here to check.

Yes, I re-ran the CI to see if it fails again. The build analysis was clear in the previous run.

@khushal1996
Member Author

> Looks like there is an assert during codegen in superpmi-diffs from this log: https://helixr1107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-116445-merge-3673ad9030974e6692/linux-arm64-3/1/console.5ca266d8.log?helixlogtype=result
>
> [18:13:32] ISSUE: <ASSERT> #352654 /Users/runner/work/1/s/src/coreclr/jit/codegenarm64.cpp (3402) - Assertion failed 'unreached' in 'Runtime_113337:issue3()' during 'Generate code' (IL size 80; hash 0x074ed573; FullOpts)
>
> Is it related to this PR? I don't see it in other runs, but let me retry the run here to check.

@jakobbotsch the errors have been resolved. Other CI errors look unrelated at this time.

@EgorBo
Member

EgorBo commented Jul 17, 2025

Is it expected that the diffs are empty?

@jakobbotsch
Member

> Is it expected that the diffs are empty?

The changes only kick in when ccmp on x64 is supported, which we don't have in SPMI.

@khushal1996
Member Author

> Is it expected that the diffs are empty?

@EgorBo @jakobbotsch
Yes. The diffs will show up once CCMP is supported on Intel hardware. SuperPMI diffs with APX enabled are shown below.

image

Member

@jakobbotsch jakobbotsch left a comment

LGTM, thanks. Retried the CI jobs that timed out.

@khushal1996
Member Author

Thanks for the review @jakobbotsch

@jakobbotsch
Member

/ba-g Infra timeout issues

@jakobbotsch jakobbotsch merged commit 8277598 into dotnet:main Jul 23, 2025
102 of 110 checks passed