Skip to content

Conversation

@uenoku
Copy link
Member

@uenoku uenoku commented Aug 25, 2025

This commit introduces IncrementalLongestPathAnalysis to support lazy delay computation during IR mutations in synthesis passes, with the DatapathToComb conversion pass as the first use case.

The implementation adds the IncrementalLongestPathAnalysis class that extends the existing analysis with incremental capabilities through the PatternRewriter::Listener interface to track mutations and maintain validity. New getOrComputeDelay() and getOrComputePaths() methods enable on-demand computation, while an OperationAnalyzer class handles dynamic analysis of Comb and HW dialect operations through temporary module synthesis.

The analysis requires a top-down rewriting approach to maintain correctness, as bottom-up mutations can invalidate paths that depend on erased values. The incremental framework validates mutations to prevent breaking existing timing paths and provides early error detection for inappropriate modifications to critical path values.

@uenoku uenoku force-pushed the dev/hidetou/incremental branch from e01ee81 to fd778a3 Compare August 25, 2025 08:55
@uenoku uenoku requested a review from cowardsa August 25, 2025 08:56
@uenoku uenoku force-pushed the dev/hidetou/incremental branch from fd778a3 to 0f473d2 Compare August 25, 2025 08:58
Copy link
Contributor

@cowardsa cowardsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this connection Hideto - looks really valuable for improving synthesis results! I also like the interface but wanted to clarify usage a little:

  1. Is it the responsibility of the pass author to call notifyOperation... whenever you make changes to the IR? Or is this automatically hooked up with existing MLIR infra?
  2. Does knowing precisely where changes have been made (due to IR transformations) mean we can only look at local changes to the delay paths?

P.s. if you hook up the comb delay model interface then others (myself included) can add additional or more precise models as needed - similar to the KnownBits Analysis

@uenoku
Copy link
Member Author

uenoku commented Aug 26, 2025

Is it the responsibility of the pass author to call notifyOperation... whenever you make changes to the IR? Or is this automatically hooked up with existing MLIR infra?

Yes, it's user's responsibility to notify the mutation but MLIR's DialectConversion/PartialConversion/PatternRewriter automatically send notification to the listener as long as the listener is registered (config.listener = analysis; is the corresponding code). So it's shouldn't be a much problem if MLIR infra was used.

Does knowing precisely where changes have been made (due to IR transformations) mean we can only look at local changes to the delay paths?

Yes, that's current limitation. Currently IncrementalLongestPathAnalysis will only accept local scope (hw::HWModuleOp).
User could use normal LongestPathAnalysis for mlir::ModuleOp but it doesn't support incremental updates now.

For inter-module timing analysis I'm inclined to prefer synth.required_times/arrival_times operation (or attribute?) to attach timing information across modules (similar SDC) so that local timing analysis can use information of global timing analysis, and periodically sync the timing info globally.

hw.module @Foo(in %x: i3) {
    // bit 0/1/2 arrive at time 1/5/10.
    %x_with_timing = synth.arrival_times %x [10, 5, 1] : i3
    %out = hw.instance @Bar(%x_with_timing): (i3) -> i2
    %out_with_timing = synth.arrival_times %out_timing [100, 20] : i2
    
}

… IR transformations

This commit introduces IncrementalLongestPathAnalysis to support lazy computation of delay information during IR mutations in synthesis passes. The analysis is updated incrementally so that we can lazily compute the delay while performing IR mutations.
A first use case is the DatapathToComb conversion pass.

Key changes include adding the IncrementalLongestPathAnalysis class that extends the existing longest path analysis with incremental update capabilities. The implementation includes the PatternRewriter::Listener interface to track IR mutations and maintain analysis validity. New getOrComputeDelay() and getOrComputePaths() methods enable on-demand delay computation, and the incremental analysis is integrated into DatapathCompressOpConversion for timing-aware compress operation lowering.

The longest path analysis is inherently complex and use-def chains must be taken into account when rewriting the IR. A top-down rewriting approach is necessary to maintain analysis correctness, as bottom-up mutations can invalidate paths that depend on erased values.

The incremental analysis validates that IR mutations don't break existing timing paths and provides early error detection when values used in critical paths are inappropriately modified.
@uenoku uenoku force-pushed the dev/hidetou/incremental branch 4 times, most recently from efa345e to da0e7d3 Compare August 29, 2025 10:31
@uenoku
Copy link
Member Author

uenoku commented Aug 29, 2025

@cowardsa LongestPathAnalysis.cpp requires more clean up and testing, so not ready for review but the analysis now returns timing model (= logic depth in AIG) for most of comb/hw operations (so it should be usable for datapath lowering).
Certainly we can introduce an op interface to override some of the analysis but I'd like to do that in the follow-up.
I made changes on DatapathToComb which requires analysis to work. Specifically I replaced DialectConversion with GreedyRewirter due to the lack of top-down rewrite in DialectConversion. I hope that's fine change.

This is a change of AIG depth for your benchmark set for the current PR FYI.

add_three 12 -> 12
blend 49 -> 49
dot_product 43 -> 43
-----------
fmaa 72 -> 67
fmaa_shae 45 -> 43
fma 64  -> 61

@cowardsa
Copy link
Contributor

cowardsa commented Aug 29, 2025

This is fantastic! Very pleasing to see this play out in the results as well - you need the top-down traversal so that we have lowered everything which the current operator depends on to ensure that we can compute the timing information. Is that correct? I actually like this a lot as it also means that partial_product operators will be lowered before we reach the compressor trees that consume their outputs. Nice!

I'm hoping to have a timing-driven compressor tree PR ready today or early next week - but will perhaps separate out the datapathToComb changes to avoid conflicts :)

@uenoku
Copy link
Member Author

uenoku commented Aug 30, 2025

ensure that we can compute the timing information. Is that correct?

Exactly! It's necessary to topologically sort the IR beforehand because HWModule has a graph region but after that top-down lowering should ensure that operand is lowered before the operation.

@uenoku uenoku force-pushed the dev/hidetou/incremental branch 2 times, most recently from e4b2787 to e0d8429 Compare September 1, 2025 04:07
@uenoku uenoku marked this pull request as ready for review September 1, 2025 04:09
@uenoku uenoku force-pushed the dev/hidetou/incremental branch 2 times, most recently from d990563 to 831be2d Compare September 1, 2025 04:40
…able timing-aware DatapathToComb optimization

This commit enhances the AIG longest path analysis infrastructure to support operations from the Comb and HW dialects, enabling comprehensive timing analysis across mixed-dialect circuits. The DatapathToComb pass has been updated to use timing-aware optimization strategies through the incremental longest path analysis framework.

The longest path analysis now handles comb.and, comb.or, comb.xor, and hw.constant operations alongside existing AIG operations, providing more accurate delay calculations for circuits that haven't been fully lowered to AIG. An Oracle class has been introduced to dynamically analyze unsupported operations by creating temporary modules and running synthesis pipelines to extract timing characteristics.

The DatapathToComb pass integrates IncrementalLongestPathAnalysis as a listener in its greedy rewrite driver, enabling timing-aware transformations. The compressor lowering logic now sorts addends by arrival time to minimize critical path delays during Wallace tree construction. Configuration options for timing-aware datapath optimizations have been added to the synthesis pipeline.

Test coverage has been expanded to verify longest path analysis behavior with mixed dialects and ensure proper timing optimization during datapath-to-combinational lowering. This enables more sophisticated timing optimization strategies throughout the CIRCT synthesis pipeline by providing accurate delay information across
@uenoku uenoku force-pushed the dev/hidetou/incremental branch from 831be2d to 0cf5b31 Compare September 1, 2025 04:48
@uenoku uenoku requested a review from cowardsa September 1, 2025 08:35
Copy link
Contributor

@cowardsa cowardsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental review - will continue but hit a weird GitHub interface bug...

Comment on lines +91 to +107
for (size_t j = 0; j < addends[0].size(); ++j) {
SmallVector<std::pair<int64_t, Value>> delays;
for (auto &addend : addends) {
auto delay = analysis->getOrComputeMaxDelay(addend[j], 0);
if (failed(delay))
return rewriter.notifyMatchFailure(op,
"Failed to get delay for input");
delays.push_back(std::make_pair(*delay, addend[j]));
}
std::stable_sort(delays.begin(), delays.end(),
[](const std::pair<int64_t, Value> &a,
const std::pair<int64_t, Value> &b) {
return a.first < b.first;
});
for (size_t i = 0; i < addends.size(); ++i)
addends[i][j] = delays[i].second;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI - will pass the "delays" directly to the constructor in an upcoming PR - but current placeholder looks good

Comment on lines +289 to +292
// Set the listener to update timing information
// HACK: Setting max iterations to 2 to ensure that the patterns are one-shot,
// making sure target operations are datapath operations are replaced.
config.setMaxIterations(2).setListener(analysis).setUseTopDownTraversal(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add a check after applying the rewriter to ensure that there are no "Illegal" operators from the Datapath dialect - to ensure we satisfy the conditions imposed by the conversion pass?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in runOnOperation 👍

Comment on lines +441 to +455
//===----------------------------------------------------------------------===//
// OperationAnalyzer
//===----------------------------------------------------------------------===//

// OperationAnalyzer handles timing analysis for individual operations that
// haven't been converted to AIG yet. It creates isolated modules for each
// unique operation type/signature combination, runs the conversion pipeline
// (HW -> Comb -> AIG), and analyzes the resulting AIG representation.
//
// This is used as a fallback when the main analysis encounters operations
// that don't have direct AIG equivalents. The analyzer:
// 1. Creates a wrapper HW module for the operation
// 2. Runs the standard conversion pipeline to lower it to AIG
// 3. Analyzes the resulting AIG to compute timing paths
// 4. Caches results based on operation name and function signature
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! It really nicely captures any improvements to the lowering code itself e.g. adding a new faster adder architecture! This assumes uniform arrival times for all inputs is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, input arrival times are assumed to be uniform.

Comment on lines +506 to +508
// Temporary module used to hold wrapper modules during analysis
// Each operation gets its own wrapper module created inside this parent
mlir::OwningOpRef<mlir::ModuleOp> moduleOp;
Copy link
Contributor

@cowardsa cowardsa Sep 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking! Presumably the scope is a single operation, therefore no input limits are known e.g. if I perform:
{4'd0, a} + {4'd0, b}

Does this get viewed as an 8-bit adder for the purpose of this local lowering? I think this is a perfectly reasonable over-approximation for guiding initial lowering decisions. If we envision a future iterative timing closure loop then we may need a more accurate delay model.

Copy link
Member Author

@uenoku uenoku Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this get viewed as an 8-bit adder for the purpose of this local lowering?

Yes, constants are not visible so it will be viewed as 8-bit adder. Maybe we can include the known bits as a part of analysis, e.g. cache[{opName, funcType, inputKnownBits]} that makes things more accurate.

Copy link
Contributor

@cowardsa cowardsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great! I'll be interested to test the performance but the interface looks ideal and will track changes to the lowering code. The tests are also very nice - look forward to seeing how we might incorporate this across the synthesis pipeline :)

auto opName = op->getName();
auto functionType = getFunctionTypeForOp(op);
auto key = std::make_pair(opName, functionType);
auto it = cache.find(key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines 1340 to 1352
// Connect module inputs to cloned operation operands
// Handle type mismatches with bitcast operations
for (auto arg : hwModule.getBodyBlock()->getArguments()) {
Value input = arg;
auto idx = arg.getArgNumber();

// Insert bitcast if input port type differs from operand type
if (input.getType() != cloned->getOperand(idx).getType())
input = builder.create<hw::BitcastOp>(
op->getLoc(), cloned->getOperand(idx).getType(), input);

cloned->setOperand(idx, input);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why do we get type mismatches? Is this always safe to cast to different bitwidths?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LongestPathAnalysis doesn't look dataflow through array/struct, so it's necessary to use integers for ports. Integers will be bit-cast to original type and they always have the same bitwidth.

@uenoku uenoku merged commit 330598e into main Sep 2, 2025
7 checks passed
@uenoku uenoku deleted the dev/hidetou/incremental branch September 2, 2025 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants