Skip to content
This repository was archived by the owner on Jan 7, 2022. It is now read-only.

Conversation

vext01
Copy link
Member

@vext01 vext01 commented Apr 8, 2019

These changes are optimisations and simplifications to the machinery around computing dominance frontiers and inserting phi nodes.

Most notably instead of using the frontier algorithm from Appel book, we instead use the one from the paper "A Simple,Fast Dominance Algorithm. Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy":
https://www.doc.ic.ac.uk/~livshits/classes/CO444H/reading/dom14.pdf

The Rust compiler already implements most of this paper, I just had to add the last little bit.

I had been using grmtools as a benchmark for these changes and was underwhelmed when I saw that they made no performance difference whatsoever :( I was going to suggest we include these changes anyway, as they make the code shorter and easier to read.

But then I tried with these optimisations on the compiler itself and found it shaved off about 2 hours and 15 minutes! I thought it was a fluke, so I repeated the build another two times over the weekend. Sure enough, consistent speedup!

http://bencher12.soft-dev.org:8010/#/builders/2/builds/117
http://bencher12.soft-dev.org:8010/#/builders/2/builds/118
http://bencher12.soft-dev.org:8010/#/builders/2/builds/119

And here's a build from before where the build was slower:
http://bencher12.soft-dev.org:8010/#/builders/2/builds/110

So it was about 6 hours 15 mins, now it's about 4 hours. The build is now faster than when we had a hand-rolled serialiser.

There is one more potential optimisation: remove RefCells. I did a little benchmark and found that refcells have a fair overhead if you use them in a tight loop (especially in release mode). However, to remove the RefCells from our conversion code, we'd need quite a bit of re-factoring. Since I'm getting itchy to move on, I suggest we don't worry about that for now.

let mut dfs = IndexVec::from_elem_n(BitSet::new_empty(num_nodes), num_nodes);

for b in (0..num_nodes).map(|i| G::Node::new(i)) {
if graph.predecessors(b).take(2).count() > 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the maximum number of predecessors? Or is there no upper bound?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we only care if it's a join point -- i.e. it has more than one predecessor.

In general there can be unbounded predecessors.

As you can see from a later comment, this code isn't as nice as I'd hoped, but hey.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was partly wondering if there's a nicer way of saying "how many predecessors does b have"? rather than firing up an iterator. It might not be worth worrying about (depending on how expensive the iterator is to set up).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was hoping for this too!

Predecessors are expressed by a trait with a sole method which returns an iterator:
https://github.com/rust-lang/rust/blob/3750348daff89741e3153e0e120aa70a45ff5b68/src/librustc_data_structures/graph/mod.rs#L35-L43

And then the MIR implementation of this trait is in terms of predecessors_for():
https://github.com/rust-lang/rust/blob/3750348daff89741e3153e0e120aa70a45ff5b68/src/librustc/mir/mod.rs#L233-L236

Let me have a look if I can use any of this stuff directly, but I might be hard to access as we are behind parameterised types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so at the point where we are fiddling with those iterators, we are working with a ControlFlowGraph, which a MIR is an instance of. We can't get directly at the underlying predecessors_for() of the MIR struct, I'm afraid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then I think this is as good as it gets. Using collect would almost certainly be worse than what we have.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could implement this function (elsewhere, probably in emit_tir.rs) as a concrete function over a MIR of course. I only implemented it here because the rest of the code from that paper was also here and already generic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not worth it. I just wanted to check if something easy had been missed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

// We first need a mapping from block to the variables it defines. Appel calls this
// `A_{orig}`. We can derive this from our definition sites.
let (a_orig, num_tir_blks) = {
let num_tir_blks = blocks.len();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable name is as long as the RHS expression. I'd personally remove the variable and just use the RHS expression where needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code is removed in a later commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@ltratt
Copy link
Member

ltratt commented Apr 8, 2019

bors r+

bors bot added a commit that referenced this pull request Apr 8, 2019
12: simpler dominance frontiers r=ltratt a=vext01

These changes are optimisations and simplifications to the machinery around computing dominance frontiers and inserting phi nodes.

Most notably instead of using the frontier algorithm from Appel book, we instead use the  one from the paper "A Simple,Fast Dominance Algorithm. Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy":
https://www.doc.ic.ac.uk/~livshits/classes/CO444H/reading/dom14.pdf

The Rust compiler already implements most of this paper, I just had to add the last little bit.

I had been using grmtools as a benchmark for these changes and was underwhelmed when I saw that they made no performance difference whatsoever :( I was going to suggest we include these changes anyway, as they make the code shorter and easier to read.

But then I tried with these optimisations on the compiler itself and found it shaved off about 2 hours and 15 minutes! I thought it was a fluke, so I repeated the build another two times over the weekend. Sure enough, consistent speedup!

http://bencher12.soft-dev.org:8010/#/builders/2/builds/117
http://bencher12.soft-dev.org:8010/#/builders/2/builds/118
http://bencher12.soft-dev.org:8010/#/builders/2/builds/119

And here's a build from before where the build was slower:
http://bencher12.soft-dev.org:8010/#/builders/2/builds/110

So it was about 6 hours 15 mins, now it's about 4 hours. The build is now faster than when we had a hand-rolled serialiser.

There is one more potential optimisation: remove `RefCell`s. I did a little benchmark and found that refcells have a fair overhead if you use them in a tight loop (especially in release mode). However, to remove the `RefCell`s from our conversion code, we'd need quite a bit of re-factoring. Since I'm getting itchy to move on, I suggest we don't worry about that for now.

Co-authored-by: Edd Barrett <[email protected]>
@bors
Copy link
Contributor

bors bot commented Apr 8, 2019

Build succeeded

@bors bors bot merged commit 29bf888 into master Apr 8, 2019
@bors bors bot deleted the yk-simpler-frontiers branch April 8, 2019 14:26
bors bot pushed a commit that referenced this pull request Sep 22, 2019
sync with rust-lang/rust branch master
vext01 pushed a commit to vext01/ykrustc that referenced this pull request Oct 12, 2020
This is a combination of 18 commits.

Commit softdevteam#2:

Additional examples and some small improvements.

Commit softdevteam#3:

fixed mir-opt non-mir extensions and spanview title elements

Corrected a fairly recent assumption in runtest.rs that all MIR dump
files end in .mir. (It was appending .mir to the graphviz .dot and
spanview .html file names when generating blessed output files. That
also left outdated files in the baseline alongside the files with the
incorrect names, which I've now removed.)

Updated spanview HTML title elements to match their content, replacing a
hardcoded and incorrect name that was left in accidentally when
originally submitted.

Commit softdevteam#4:

added more test examples

also improved Makefiles with support for non-zero exit status and to
force validation of tests unless a specific test overrides it with a
specific comment.

Commit softdevteam#5:

Fixed rare issues after testing on real-world crate

Commit softdevteam#6:

Addressed PR feedback, and removed temporary -Zexperimental-coverage

-Zinstrument-coverage once again supports the latest capabilities of
LLVM instrprof coverage instrumentation.

Also fixed a bug in spanview.

Commit softdevteam#7:

Fix closure handling, add tests for closures and inner items

And cleaned up other tests for consistency, and to make it more clear
where spans start/end by breaking up lines.

Commit softdevteam#8:

renamed "typical" test results "expected"

Now that the `llvm-cov show` tests are improved to normally expect
matching actuals, and to allow individual tests to override that
expectation.

Commit softdevteam#9:

test coverage of inline generic struct function

Commit softdevteam#10:

Addressed review feedback

* Removed unnecessary Unreachable filter.
* Replaced a match wildcard with remining variants.
* Added more comments to help clarify the role of successors() in the
CFG traversal

Commit softdevteam#11:

refactoring based on feedback

* refactored `fn coverage_spans()`.
* changed the way I expand an empty coverage span to improve performance
* fixed a typo that I had accidently left in, in visit.rs

Commit softdevteam#12:

Optimized use of SourceMap and SourceFile

Commit softdevteam#13:

Fixed a regression, and synched with upstream

Some generated test file names changed due to some new change upstream.

Commit softdevteam#14:

Stripping out crate disambiguators from demangled names

These can vary depending on the test platform.

Commit softdevteam#15:

Ignore llvm-cov show diff on test with generics, expand IO error message

Tests with generics produce llvm-cov show results with demangled names
that can include an unstable "crate disambiguator" (hex value). The
value changes when run in the Rust CI Windows environment. I added a sed
filter to strip them out (in a prior commit), but sed also appears to
fail in the same environment. Until I can figure out a workaround, I'm
just going to ignore this specific test result. I added a FIXME to
follow up later, but it's not that critical.

I also saw an error with Windows GNU, but the IO error did not
specify a path for the directory or file that triggered the error. I
updated the error messages to provide more info for next, time but also
noticed some other tests with similar steps did not fail. Looks
spurious.

Commit softdevteam#16:

Modify rust-demangler to strip disambiguators by default

Commit softdevteam#17:

Remove std::process::exit from coverage tests

Due to Issue #77553, programs that call std::process::exit() do not
generate coverage results on Windows MSVC.

Commit softdevteam#18:

fix: test file paths exceeding Windows max path len
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants