[CombToSynth] Use parallel-prefix tree for unsigned comparisons #9048

uenoku · 2025-10-03T10:51:55Z

This commit extends the unsigned comparison lowering to support multiple parallel-prefix architectures (Sklansky, Kogge-Stone, Brent-Kung) in addition to the existing ripple-carry implementation. There is no functional change in the prefix-tree/adder lowering.

Previously, all comparisons used a ripple-carry style implementation that processed bits sequentially from LSB to MSB, resulting in O(n) depth for n-bit comparisons. This was a significant performance bottleneck for wide comparisons. The comparison lowering is now refactored to use the same parallel-prefix tree algorithms as the adder, reducing depth to O(log n).

The comparison logic is formulated as a prefix computation: per-bit equality is computed as ~(a_i ^ b_i) and the per-bit "greater" signal as ~a_i & b_i (set when b_i is 1 and a_i is 0). The propagate and generate signals of the prefix tree are based on these equality and greater-than conditions.
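The formulation above can be illustrated with a small Python model. This is only a sketch, not the CombToSynth code: the function names are hypothetical, equality is assumed to act as propagate and the `~a_i & b_i` signal as generate, and the tree variant is a plain balanced reduction rather than one of the named prefix architectures.

```python
from typing import List, Tuple

Signal = Tuple[int, int]  # (eq, lt) for a contiguous range of bits


def bit_signals(a: int, b: int, width: int) -> List[Signal]:
    """Per-bit (eq, lt) pairs, least significant bit first."""
    out = []
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        eq = 1 - (ai ^ bi)   # ~(a_i ^ b_i)
        lt = (1 - ai) & bi   # ~a_i & b_i
        out.append((eq, lt))
    return out


def combine(hi: Signal, lo: Signal) -> Signal:
    """Associative operator; `hi` covers the more significant bits."""
    eq_hi, lt_hi = hi
    eq_lo, lt_lo = lo
    return (eq_hi & eq_lo, lt_hi | (eq_hi & lt_lo))


def ult_ripple(a: int, b: int, width: int) -> int:
    """Sequential LSB-to-MSB evaluation: `width` combine steps in a chain."""
    acc = (1, 0)  # identity element: "all equal so far, not less"
    for sig in bit_signals(a, b, width):
        acc = combine(sig, acc)
    return acc[1]


def ult_tree(a: int, b: int, width: int) -> Tuple[int, int]:
    """Balanced reduction; returns (a < b, number of combine levels)."""
    level = bit_signals(a, b, width)
    depth = 0
    while len(level) > 1:
        nxt = [combine(level[i + 1], level[i])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
        depth += 1
    return level[0][1], depth
```

Because `combine` is associative, both evaluation orders give the same answer; only the number of levels differs (64 chained steps versus 6 levels for a 64-bit comparison in this model).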

Integration tests for all architectures verify logical equivalence via circt-lec.

Before/after for a 64-bit unsigned comparison:

uenoku@pop-os ~/d/circt-synth (dev/hidetou/icmp-parallel) [1]> ../circt/build/bin/circt-synth bar.mlir -output-longest-path=- -o /dev/null
# Longest Path Analysis result for "icmp_unsigned"
Found 128 paths
Found 1 unique fanout points
Maximum path delay: 128
## Showing Levels
Level = 128       . Count = 1         . 100.00    %
## Top 0 (out of 1) fan-out points

uenoku@pop-os ~/d/circt-synth (dev/hidetou/icmp-parallel)> ./build/bin/circt-synth bar.mlir -output-longest-path=- -o /dev/null 
# Longest Path Analysis result for "icmp_unsigned"
Found 128 paths
Found 1 unique end points 
Maximum path delay: 14
## Showing Levels
Level = 14        . Count = 1         . 100.00    %
## Top 0 (out of 1) end points

Small comparisons (fewer than 8 bits) continue to use ripple-carry by default, since the overhead of parallel-prefix structures is not worthwhile at that width; larger comparisons use parallel-prefix trees. The architecture can be explicitly specified via the synth.test.arch attribute. Signed comparisons extract the sign bit and compare magnitudes separately using the unsigned comparison infrastructure. The resulting O(log n) depth matches the delay characteristics of parallel-prefix adders.
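One way to read the signed-comparison sentence is sketched below in Python. It assumes two's-complement operands and treats the low `width - 1` bits as the "magnitude"; the helper names are hypothetical and the actual lowering may differ in detail.

```python
def ult(a: int, b: int, width: int) -> int:
    """Reference unsigned less-than on `width`-bit patterns."""
    mask = (1 << width) - 1
    return int((a & mask) < (b & mask))


def slt_via_unsigned(a: int, b: int, width: int) -> int:
    """Signed less-than on two's-complement bit patterns.

    If the sign bits differ, the operand with the sign bit set is smaller.
    If they agree, the remaining low bits order the values the same way
    an unsigned comparison does.
    """
    sign_a = (a >> (width - 1)) & 1
    sign_b = (b >> (width - 1)) & 1
    if sign_a != sign_b:
        return sign_a
    low_mask = (1 << (width - 1)) - 1
    return ult(a & low_mask, b & low_mask, width - 1)


def to_signed(x: int, width: int) -> int:
    """Interpret a `width`-bit pattern as a two's-complement integer."""
    return x - (1 << width) if (x >> (width - 1)) & 1 else x
```

For example, with `width = 4` the patterns `0b1111` (-1) and `0b0001` (1) have different sign bits, so the result is decided without looking at the low bits.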

uenoku · 2025-10-03T10:54:04Z

A FileCheck test is missing from the non-integration tests; I'll add one for the parallel-prefix tree.

cowardsa

Very neat, nice work, and pleasing to see the longest-path improvements. It's ideal that we can reuse the prefix computation.

For reduced overhead - could pass a flag to prevent the prefix computation generating a lot of gates that will then need to be removed by DCE? Namely, we are only interested in generating the carry-out and propagate out? (However, the code already generates unused gates, so it's already far from optimal in terms of efficiency.)

integration_test/circt-synth/comb-lowering-compare.mlir

uenoku · 2025-10-03T16:22:52Z

could pass a flag to prevent the prefix computation generating a lot of gates that will then need to be removed by DCE? Namely, we are only interested in generating the carry-out and propagate out?

Good points, but I'm not sure there is an uncomplicated way to prune the prefix computation, since it's hard to know beforehand which indices are actually used by the carry-out and propagate-out in the last stage. I think we could change the prefix computation for comparison to use a recursive function with memoization, which by nature lazily computes only the necessary prefixes, but that requires quite a few changes to the prefix-tree generation functions we have now. So please let me stick with the current implementation for this PR.
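As a rough illustration of the lazy idea (not code from this PR), the Python sketch below walks a Kogge-Stone-shaped network backwards from the single output a comparison needs and counts how many combine nodes get built. All names are hypothetical, and the (eq, lt) operator is the same assumption as in the PR description.

```python
from functools import lru_cache
from typing import Tuple


def kogge_stone_lazy(a: int, b: int, width: int) -> Tuple[int, int]:
    """Return (a < b, combine nodes built) for the final prefix only."""
    levels = max(width - 1, 0).bit_length()  # ceil(log2(width))
    built = 0

    @lru_cache(maxsize=None)  # a node requested twice is built once
    def node(level: int, i: int) -> Tuple[int, int]:
        nonlocal built
        if level == 0:
            ai, bi = (a >> i) & 1, (b >> i) & 1
            return (1 - (ai ^ bi), (1 - ai) & bi)
        stride = 1 << (level - 1)
        hi = node(level - 1, i)
        if i < stride:
            return hi  # already covers bits [0, i]; no gate needed
        lo = node(level - 1, i - stride)
        built += 1
        return (hi[0] & lo[0], hi[1] | (hi[0] & lo[1]))

    return node(levels, width - 1)[1], built


def full_kogge_stone_nodes(width: int) -> int:
    """Combine nodes in an eagerly built Kogge-Stone network."""
    levels = max(width - 1, 0).bit_length()
    return sum(width - (1 << k) for k in range(levels))
```

In this model an 8-bit comparison builds 7 combine nodes lazily, whereas the eager network has 17, which is the kind of saving the suggestion is after. How much of that DCE already recovers in practice is not something this sketch measures.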

cowardsa · 2025-10-03T17:10:31Z

Absolutely non-blocking for sure - just a thought for future improvements if we hit against performance issues.

uenoku requested a review from cowardsa October 3, 2025 10:54

cowardsa approved these changes Oct 3, 2025

Address commments

9fc0b51

Add a test case

fdc58d5

uenoku merged commit ba41ee0 into llvm:main Oct 3, 2025
7 checks passed
