[Comb][circt-synth] Implement BalanceMux pass for optimizing mux chains #9044
Force-pushed from b8a6f8a to caeb6fb.
This pass performs two main optimizations on mux chains: enhanced mux chain folding, which converts chains of muxes with index comparisons into array operations, and priority encoder rebalancing, which transforms linear chains into balanced tree structures, reducing depth from O(n) to O(log n). The pass handles both false-side and true-side chain patterns and includes comprehensive test cases covering priority encoding, duplicate conditions, and index comparison folding.
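To make the priority-encoder rebalancing concrete, here is a small behavioral model in plain C++ (not MLIR, and not the pass's actual code). It assumes one plausible balanced construction: a binary tree over the n + 1 data inputs where each 2:1 mux is selected by the OR of the conditions in its left half. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linear priority chain: c0 ? v0 : (c1 ? v1 : (... : dflt)).
// The value paired with the first true condition wins; n mux levels.
int linearPriorityMux(const std::vector<bool> &conds,
                      const std::vector<int> &vals, int dflt) {
  for (size_t i = 0; i < conds.size(); ++i)
    if (conds[i])
      return vals[i];
  return dflt;
}

// Balanced tree over the n + 1 data leaves (v0 .. v{n-1}, dflt). A node
// covering leaves [lo, hi) is a 2:1 mux whose select is OR(c[lo .. mid)).
// Invariant: no condition before `lo` is true, so if any condition in the
// left half fires, the answer lies in the left half.
static int balancedRec(const std::vector<bool> &conds,
                       const std::vector<int> &vals, int dflt, size_t lo,
                       size_t hi, unsigned &depth) {
  if (hi - lo == 1) {
    depth = 0; // a leaf is a data input, no mux
    return lo < conds.size() ? vals[lo] : dflt;
  }
  size_t mid = lo + (hi - lo) / 2;
  bool anyLeft = false; // an OR-reduction in hardware
  for (size_t i = lo; i < mid; ++i)
    anyLeft = anyLeft || conds[i];
  unsigned leftDepth = 0, rightDepth = 0;
  int left = balancedRec(conds, vals, dflt, lo, mid, leftDepth);
  int right = balancedRec(conds, vals, dflt, mid, hi, rightDepth);
  depth = 1 + std::max(leftDepth, rightDepth);
  return anyLeft ? left : right;
}

// Returns the selected value; `depth` receives the number of mux levels,
// which is ceil(log2(n + 1)) instead of n for the linear chain.
int balancedPriorityMux(const std::vector<bool> &conds,
                        const std::vector<int> &vals, int dflt,
                        unsigned &depth) {
  assert(conds.size() == vals.size());
  return balancedRec(conds, vals, dflt, 0, conds.size() + 1, depth);
}
```

Checking `balancedPriorityMux` against `linearPriorityMux` over every condition pattern for a small n is a cheap way to convince yourself the OR-driven selects preserve first-match-wins semantics. Note that the mux depth drops to logarithmic, but each select now needs an OR-reduction, which a real timing model must also account for.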
Force-pushed from caeb6fb to c7011fb.
fabianschuiki left a comment
LGTM! Very useful pass to have 💯
```cpp
/// Enum for mux chain folding styles.
enum MuxChainWithComparisonFoldingStyle { None, BalancedMuxTree, ArrayGet };
/// Mux chain folding that converts chains of muxes with index
/// comparisons into array operations or balanced mux trees. `styleFn` is a
/// callback that returns the desired folding style based on the index
/// width and number of entries.
bool foldMuxChainWithComparison(
    PatternRewriter &rewriter, MuxOp rootMux, bool isFalseSide,
    llvm::function_ref<MuxChainWithComparisonFoldingStyle(size_t indexWidth,
                                                         size_t numEntries)>
        styleFn);
```
Nice idea to have this as a utility on the Comb dialect 💯
```cpp
LDBG() << "Rebalanced priority mux with " << conditions.size()
       << " conditions, using " << (useFalseSideChain ? "false" : "true")
       << "-side chain.\n";
```
Never seen LDBG() before. Very handy! 😄
@uenoku, I didn't see any timing guidance after a brief scan. Do you plan to integrate timing analysis into this pass? The discussion you pointed to highlighted that the optimal mux tree structure depends on arrival times.
Yes, with the current code structure it would not be too hard to integrate timing analysis; I'll do that in a follow-up. For priority muxes we can choose ideal split points based on timing. For index comparisons it's a bit trickier, since we need to know the don't-care bits of the index.
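The idea of choosing priority-mux split points from timing can be illustrated with a small interval dynamic program. This is a hypothetical sketch under a made-up unit-delay model (a 2:1 mux costs 1; an OR-reduction of k conditions is ready ⌈log2 k⌉ after its latest input), not the follow-up implementation; `SplitPlan` and `planPriorityMux` are invented names. Any split point is functionally valid for a priority mux, so the DP is free to pick whichever minimizes the estimated output arrival.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result of timing-driven planning for a priority mux with n
// conditions and n + 1 data leaves (v0 .. v{n-1}, default).
struct SplitPlan {
  // arrival[lo][hi]: best output arrival time for the subtree over leaves
  // [lo, hi); split[lo][hi]: the chosen split point `mid` for that subtree.
  std::vector<std::vector<unsigned>> arrival;
  std::vector<std::vector<size_t>> split;
};

// Number of levels in a balanced 2-input OR tree over k signals.
static unsigned ceilLog2(size_t k) {
  unsigned r = 0;
  while ((size_t{1} << r) < k)
    ++r;
  return r;
}

// Unit-delay model (an assumption of this sketch): a 2:1 mux costs 1, and the
// select OR(c[lo..mid)) is ready ceilLog2(mid - lo) after its latest input.
// Interval DP over all split points, O(n^3).
SplitPlan planPriorityMux(const std::vector<unsigned> &condArrival,
                          const std::vector<unsigned> &dataArrival) {
  size_t n = condArrival.size();
  assert(dataArrival.size() == n + 1);
  size_t leaves = n + 1;
  SplitPlan plan;
  plan.arrival.assign(leaves + 1, std::vector<unsigned>(leaves + 1, 0));
  plan.split.assign(leaves + 1, std::vector<size_t>(leaves + 1, 0));
  for (size_t i = 0; i < leaves; ++i)
    plan.arrival[i][i + 1] = dataArrival[i];
  for (size_t len = 2; len <= leaves; ++len) {
    for (size_t lo = 0; lo + len <= leaves; ++lo) {
      size_t hi = lo + len;
      unsigned best = ~0u;
      size_t bestMid = lo + 1;
      unsigned latestCond = 0; // max arrival over c[lo .. mid)
      for (size_t mid = lo + 1; mid < hi; ++mid) {
        latestCond = std::max(latestCond, condArrival[mid - 1]);
        unsigned sel = latestCond + ceilLog2(mid - lo);
        unsigned out = 1 + std::max({sel, plan.arrival[lo][mid],
                                     plan.arrival[mid][hi]});
        if (out < best) {
          best = out;
          bestMid = mid;
        }
      }
      plan.arrival[lo][hi] = best;
      plan.split[lo][hi] = bestMid;
    }
  }
  return plan;
}
```

With three conditions and all inputs arriving at time 0, the model picks the midpoint split (root split 2, output arrival 2). If `c0` alone arrives late at time 5, it instead tests `c0` by itself at the root (split 1, arrival 6), beating the midpoint split's arrival of 7.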
cowardsa left a comment
LGTM - nice work and certainly seems like a valuable addition to the synthesis pipeline.
Only potentially larger suggestion: could you add circt-lec tests, and use longest-path analysis to check that the reduction in logic levels is as expected, i.e. O(n) -> O(log n)?
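For reference, a back-of-the-envelope count of 2:1 mux levels for a priority chain with n conditions (n + 1 data inputs), assuming a perfectly balanced binary tree and ignoring the OR-reductions that drive the balanced tree's select lines:

```math
d_{\text{linear}}(n) = n, \qquad d_{\text{balanced}}(n) = \lceil \log_2 (n + 1) \rceil
```

For chains of 16 and 64 conditions (the values passed to `BalanceMuxOptions` in the pipeline snippet), that is 16 vs. ⌈log2 17⌉ = 5 and 64 vs. ⌈log2 65⌉ = 7 mux levels, which is roughly the reduction a longest-path check would be expected to show, modulo the added select logic.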
```cpp
/// `array_get (array_create(A, B, C), VAL)` or a balanced mux tree which is far
/// more compact and allows synthesis tools to do more interesting
/// optimizations.
```
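The doc comment above describes folding an index-comparison chain into an array lookup or a balanced mux tree. The following is a plain C++ behavioral sketch of that idea under stated assumptions (integer values, narrow indices), not the Comb folder itself; `Chain`, `evalChain`, `buildTable`, and `muxTree` are hypothetical names. The chain is flattened into a dense table (the contents an `array_create` would hold), with earlier comparisons winning for duplicates, and then indexed by a tree with one 2:1 mux level per index bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of a comparison chain:
//   idx == k0 ? v0 : (idx == k1 ? v1 : (... : dflt))
// stored as (k_i, v_i) pairs in priority order.
using Chain = std::vector<std::pair<uint64_t, int>>;

// Reference semantics: the first matching comparison wins.
int evalChain(const Chain &chain, int dflt, uint64_t idx) {
  for (const auto &entry : chain)
    if (idx == entry.first)
      return entry.second;
  return dflt;
}

// Dense table indexed by idx. Entries are written in reverse priority order
// so that, for duplicate comparisons, the earliest one in the chain ends up
// in the table. Unmatched slots hold dflt.
std::vector<int> buildTable(const Chain &chain, int dflt, unsigned indexWidth) {
  assert(indexWidth < 32 && "sketch only handles narrow indices");
  std::vector<int> table(size_t{1} << indexWidth, dflt);
  for (auto it = chain.rbegin(); it != chain.rend(); ++it)
    if (it->first < table.size()) // constants wider than idx never match
      table[it->first] = it->second;
  return table;
}

// Balanced mux tree over the table: each level is a 2:1 mux driven by one
// index bit (MSB at the root), so the depth equals `width`. Both subtrees
// are evaluated, as in hardware.
int muxTree(const std::vector<int> &table, size_t lo, unsigned width,
            uint64_t idx) {
  if (width == 0)
    return table[lo];
  size_t half = size_t{1} << (width - 1);
  int low = muxTree(table, lo, width - 1, idx);
  int high = muxTree(table, lo + half, width - 1, idx);
  return ((idx >> (width - 1)) & 1) ? high : low;
}
```

For example, the chain `idx == 1 ? 10 : (idx == 3 ? 30 : (idx == 1 ? 99 : -1))` with a 2-bit index becomes the table `{-1, 10, -1, 30}`; the duplicate `idx == 1` comparison is dropped because the earlier one always wins, which is one of the cases the PR's duplicate-condition tests cover.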
In case it's of interest: Synopsys documentation favours priority encoders (see slide 88).
Thanks, yeah. I wish there were an SV built-in for priority muxes so that we could capture design intent. Eventually we may also want to introduce a dedicated operation for priority muxes.
```cpp
// a tremendous number of replicated entries in the array. Some sparsity is
// ok though, so we require the table to be at least 5/8 utilized.
uint64_t tableSize = 1ULL << indexWidth;
if (numEntries >= tableSize * 5 / 8)
```
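As an illustration of how the 5/8 utilization rule could be expressed through the `styleFn` callback quoted earlier, here is a hypothetical style function. Which style the real callers return for sparse tables is an assumption here (this sketch falls back to `BalancedMuxTree`), as is the guard against wide indices, where `1ULL << indexWidth` would overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Enum as quoted in the review above.
enum MuxChainWithComparisonFoldingStyle { None, BalancedMuxTree, ArrayGet };

// Hypothetical styleFn: use an array lookup only when the 2^indexWidth
// table would be at least 5/8 populated; otherwise build a balanced mux tree.
MuxChainWithComparisonFoldingStyle exampleStyleFn(size_t indexWidth,
                                                  size_t numEntries) {
  // Guard (an assumption of this sketch): for very wide indices
  // `1ULL << indexWidth` would overflow, and the table would be
  // impractically large anyway.
  if (indexWidth >= 32)
    return BalancedMuxTree;
  uint64_t tableSize = 1ULL << indexWidth;
  // Same integer arithmetic as the snippet: floor(tableSize * 5 / 8).
  if (numEntries >= tableSize * 5 / 8)
    return ArrayGet;
  return BalancedMuxTree;
}
```

With a 3-bit index the table has 8 slots, so the threshold is 5 entries: `exampleStyleFn(3, 5)` yields `ArrayGet`, while `exampleStyleFn(3, 4)` yields `BalancedMuxTree`. Keeping the threshold in one small function like this would also make it easy to benchmark alternatives to 5/8.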
Nit: Interesting. Is there any data to support this 5/8? If not, perhaps it's worth adding a TODO to benchmark it, so someone can pick it up.
This parameter was introduced in df37fa1 a while ago. I'm not sure of the rationale behind it, and I think it's fine to change the behavior. This is a general Comb folder, where creating a balanced tree may not always be ideal (especially when emitting SystemVerilog, where it would also be easier for commercial tools to pattern-match).
```cpp
// Balance mux chains. For area oriented flow, we want to keep the mux chains
// unless they are very deep.
comb::BalanceMuxOptions balanceOptions{OptimizationStrategyTiming ? 16 : 64};
pm.addPass(comb::createBalanceMux(balanceOptions));
```
Nice!
Force-pushed from 52dad7a to bf531be.
This pass performs two main optimizations on mux chains: enhanced mux
chain folding, which converts chains of muxes with index comparisons into
balanced mux trees, and priority encoder rebalancing, which transforms
linear chains into balanced tree structures, reducing depth from O(n) to
O(log n).
Unlike the FIRRTL pipeline, which avoids mux balancing to preserve design
intent (chipsalliance/chisel#1199), the synthesis pipeline implements these
optimizations because it is focused on circuit optimization rather than on
maintaining the original design structure.
The pass handles both false-side and true-side chain patterns and includes
comprehensive test cases covering priority encoding, duplicate conditions,
and index comparison folding.