-
Notifications
You must be signed in to change notification settings - Fork 4
simpler dominance frontiers #12
Conversation
let mut dfs = IndexVec::from_elem_n(BitSet::new_empty(num_nodes), num_nodes); | ||
|
||
for b in (0..num_nodes).map(|i| G::Node::new(i)) { | ||
if graph.predecessors(b).take(2).count() > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the maximum number of predecessors? Or is there no upper bound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we only care if it's a join point -- i.e. it has more than one predecessor.
In general there can be unbounded predecessors.
As you can see from a later comment, this code isn't as nice as I'd hoped, but hey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was partly wondering if there's a nicer way of saying "how many predecessors does b
have"? rather than firing up an iterator. It might not be worth worrying about (depending on how expensive the iterator is to set up).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was hoping for this too!
Predecessors are expressed by a trait with a sole method which returns an iterator:
https://github.com/rust-lang/rust/blob/3750348daff89741e3153e0e120aa70a45ff5b68/src/librustc_data_structures/graph/mod.rs#L35-L43
And then the MIR implementation of this trait is in terms of predecessors_for()
:
https://github.com/rust-lang/rust/blob/3750348daff89741e3153e0e120aa70a45ff5b68/src/librustc/mir/mod.rs#L233-L236
Let me have a look if I can use any of this stuff directly, but I might be hard to access as we are behind parameterised types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so at the point where we are fiddling with those iterators, we are working with a ControlFlowGraph
, which a MIR
is an instance of. We can't get directly at the underlying predecessors_for()
of the MIR
struct, I'm afraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, then I think this is as good as it gets. Using collect
would almost certainly be worse than what we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could implement this function (elsewhere, probably in emit_tir.rs
) as a concrete function over a MIR of course. I only implemented it here because the rest of the code from that paper was also here and already generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not worth it. I just wanted to check if something easy had been missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
src/librustc_yk_sections/emit_tir.rs
Outdated
// We first need a mapping from block to the variables it defines. Appel calls this | ||
// `A_{orig}`. We can derive this from our definition sites. | ||
let (a_orig, num_tir_blks) = { | ||
let num_tir_blks = blocks.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable name is as long as the RHS expression. I'd personally remove the variable and just use the RHS expression where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That code is removed in a later commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
bors r+ |
12: simpler dominance frontiers r=ltratt a=vext01 These changes are optimisations and simplifications to the machinery around computing dominance frontiers and inserting phi nodes. Most notably instead of using the frontier algorithm from Appel book, we instead use the one from the paper "A Simple,Fast Dominance Algorithm. Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy": https://www.doc.ic.ac.uk/~livshits/classes/CO444H/reading/dom14.pdf The Rust compiler already implements most of this paper, I just had to add the last little bit. I had been using grmtools as a benchmark for these changes and was underwhelmed when I saw that they made no performance difference whatsoever :( I was going to suggest we include these changes anyway, as they make the code shorter and easier to read. But then I tried with these optimisations on the compiler itself and found it shaved off about 2 hours and 15 minutes! I thought it was a fluke, so I repeated the build another two times over the weekend. Sure enough, consistent speedup! http://bencher12.soft-dev.org:8010/#/builders/2/builds/117 http://bencher12.soft-dev.org:8010/#/builders/2/builds/118 http://bencher12.soft-dev.org:8010/#/builders/2/builds/119 And here's a build from before where the build was slower: http://bencher12.soft-dev.org:8010/#/builders/2/builds/110 So it was about 6 hours 15 mins, now it's about 4 hours. The build is now faster than when we had a hand-rolled serialiser. There is one more potential optimisation: remove `RefCell`s. I did a little benchmark and found that refcells have a fair overhead if you use them in a tight loop (especially in release mode). However, to remove the `RefCell`s from our conversion code, we'd need quite a bit of re-factoring. Since I'm getting itchy to move on, I suggest we don't worry about that for now. Co-authored-by: Edd Barrett <[email protected]>
Build succeeded |
sync with rust-lang/rust branch master
This is a combination of 18 commits. Commit softdevteam#2: Additional examples and some small improvements. Commit softdevteam#3: fixed mir-opt non-mir extensions and spanview title elements Corrected a fairly recent assumption in runtest.rs that all MIR dump files end in .mir. (It was appending .mir to the graphviz .dot and spanview .html file names when generating blessed output files. That also left outdated files in the baseline alongside the files with the incorrect names, which I've now removed.) Updated spanview HTML title elements to match their content, replacing a hardcoded and incorrect name that was left in accidentally when originally submitted. Commit softdevteam#4: added more test examples also improved Makefiles with support for non-zero exit status and to force validation of tests unless a specific test overrides it with a specific comment. Commit softdevteam#5: Fixed rare issues after testing on real-world crate Commit softdevteam#6: Addressed PR feedback, and removed temporary -Zexperimental-coverage -Zinstrument-coverage once again supports the latest capabilities of LLVM instrprof coverage instrumentation. Also fixed a bug in spanview. Commit softdevteam#7: Fix closure handling, add tests for closures and inner items And cleaned up other tests for consistency, and to make it more clear where spans start/end by breaking up lines. Commit softdevteam#8: renamed "typical" test results "expected" Now that the `llvm-cov show` tests are improved to normally expect matching actuals, and to allow individual tests to override that expectation. Commit softdevteam#9: test coverage of inline generic struct function Commit softdevteam#10: Addressed review feedback * Removed unnecessary Unreachable filter. * Replaced a match wildcard with remining variants. * Added more comments to help clarify the role of successors() in the CFG traversal Commit softdevteam#11: refactoring based on feedback * refactored `fn coverage_spans()`. * changed the way I expand an empty coverage span to improve performance * fixed a typo that I had accidently left in, in visit.rs Commit softdevteam#12: Optimized use of SourceMap and SourceFile Commit softdevteam#13: Fixed a regression, and synched with upstream Some generated test file names changed due to some new change upstream. Commit softdevteam#14: Stripping out crate disambiguators from demangled names These can vary depending on the test platform. Commit softdevteam#15: Ignore llvm-cov show diff on test with generics, expand IO error message Tests with generics produce llvm-cov show results with demangled names that can include an unstable "crate disambiguator" (hex value). The value changes when run in the Rust CI Windows environment. I added a sed filter to strip them out (in a prior commit), but sed also appears to fail in the same environment. Until I can figure out a workaround, I'm just going to ignore this specific test result. I added a FIXME to follow up later, but it's not that critical. I also saw an error with Windows GNU, but the IO error did not specify a path for the directory or file that triggered the error. I updated the error messages to provide more info for next, time but also noticed some other tests with similar steps did not fail. Looks spurious. Commit softdevteam#16: Modify rust-demangler to strip disambiguators by default Commit softdevteam#17: Remove std::process::exit from coverage tests Due to Issue #77553, programs that call std::process::exit() do not generate coverage results on Windows MSVC. Commit softdevteam#18: fix: test file paths exceeding Windows max path len
These changes are optimisations and simplifications to the machinery around computing dominance frontiers and inserting phi nodes.
Most notably instead of using the frontier algorithm from Appel book, we instead use the one from the paper "A Simple,Fast Dominance Algorithm. Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy":
https://www.doc.ic.ac.uk/~livshits/classes/CO444H/reading/dom14.pdf
The Rust compiler already implements most of this paper, I just had to add the last little bit.
I had been using grmtools as a benchmark for these changes and was underwhelmed when I saw that they made no performance difference whatsoever :( I was going to suggest we include these changes anyway, as they make the code shorter and easier to read.
But then I tried with these optimisations on the compiler itself and found it shaved off about 2 hours and 15 minutes! I thought it was a fluke, so I repeated the build another two times over the weekend. Sure enough, consistent speedup!
http://bencher12.soft-dev.org:8010/#/builders/2/builds/117
http://bencher12.soft-dev.org:8010/#/builders/2/builds/118
http://bencher12.soft-dev.org:8010/#/builders/2/builds/119
And here's a build from before where the build was slower:
http://bencher12.soft-dev.org:8010/#/builders/2/builds/110
So it was about 6 hours 15 mins, now it's about 4 hours. The build is now faster than when we had a hand-rolled serialiser.
There is one more potential optimisation: remove
RefCell
s. I did a little benchmark and found that refcells have a fair overhead if you use them in a tight loop (especially in release mode). However, to remove theRefCell
s from our conversion code, we'd need quite a bit of re-factoring. Since I'm getting itchy to move on, I suggest we don't worry about that for now.