Don't @inbounds AbstractArray's iterate method; optimize checkbounds instead #58793
Conversation
Do you have any benchmarks handy?
Using @jishnub's benchmark. Nightly @ f61c640 (which has the …):

```julia
julia> using BenchmarkTools
Precompiling BenchmarkTools finished.
  8 dependencies successfully precompiled in 12 seconds. 9 already precompiled.

julia> using LinearAlgebra

julia> A = rand(1000,1000); v2 = view(A, 1:2:lastindex(A));

julia> @btime norm(Iterators.map(splat(-), zip($v2, $v2)));
  1.941 ms (0 allocations: 0 bytes)
```

vs. b09514e:

```julia
julia> @btime norm(Iterators.map(splat(-), zip($v2, $v2)));
  312.167 μs (0 allocations: 0 bytes)
```
```diff
 @inline iterate(A::AbstractArray, state = iterate_starting_state(A)) = _iterate(A, state)
 @inline function _iterate(A::AbstractArray, state::Tuple)
     y = iterate(state...)
     y === nothing && return nothing
     A[y[1]], (state[1], tail(y)...)
 end
-function _iterate(A::AbstractArray, state::Integer)
+@inline function _iterate(A::AbstractArray, state::Integer)
```
are these functions really so big that they aren't being inlined automatically?
Before b09514e, yes, they were (well, kinda: `length` itself was not inlining; had I added `@inline` to it, though, it'd make this method not inline). That branch was enough to push things over the inlining limit. As I wrote in #58785 (comment), this calls abstract infrastructure that could be large (and itself `@inline`'d)... which would then require these generic methods to be similarly `@inline`.
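Not from the thread itself, but as a quick way to check whether a call actually ends up inlined, one can inspect the typed IR (a sketch; `f` is a hypothetical wrapper, not code from this PR):

```julia
using InteractiveUtils  # provides @code_typed

f(A) = iterate(A)              # hypothetical wrapper around the method in question
ci, _ = @code_typed f([1, 2, 3])

# If `iterate` was inlined, its body appears directly in the IR; if not,
# an :invoke statement referencing it remains:
println(any(st -> Meta.isexpr(st, :invoke), ci.code))
```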
@nanosoldier

Test failures (quite a lot) look relevant 😢
This is not valid because it only checks the resulting index into the parent (which we explicitly say we do not check) but skips the checks into the indices (which are the important ones!)
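A small illustration of that distinction (my own sketch, not code from the diff): for a view, an index can be out of bounds for the view's indices even though the parent index it would compute is in bounds for the parent.

```julia
A = collect(1:10)
v = view(A, 2:3)           # axes(v) == (Base.OneTo(2),)

checkbounds(Bool, v, 5)    # false: 5 is out of bounds for the view itself

# ...but the parent index that position would map to is in bounds for A,
# so checking only the computed parent index would miss the error:
pidx = first(parentindices(v)[1]) + 5 - 1   # == 6
checkbounds(Bool, parent(v), pidx)          # true
```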
Theoretically this is not guaranteed -- indeed, an unreachable branch is not currently present for CodeUnits.
@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Very good PR. Benchmarks are an interesting mixed bag of slowdowns and infinite speedups :) Not sure what to do about it.

The regressions that look meaningful to me are in the iteration of …
OK, I don't think any of the regressions are actually real. I can't reproduce the iteration ones locally, and BenchmarkTools is doing something wildly wrong for the in-place BitSet ones. Just looking at nightly:

```julia
julia> using Random, StableRNGs, BenchmarkTools

julia> const RNG = StableRNG(1)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000003)

julia> const iterlen = 1000;

julia> const ints = rand(RNG, 1:iterlen, iterlen);

julia> c = BitSet(ints);

julia> const newints = [rand(RNG, ints, 10); rand(RNG, 1:iterlen, 10); rand(RNG, iterlen:2iterlen, 10);];

julia> c2 = BitSet(newints);

julia> @benchmark union!(x, $c2) setup=(x=copy($c)) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  0.001 ns … 916.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     42.000 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.112 ns ±  33.456 ns ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄ ▇█ ▂ ▁▃ ▁
  █▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▆▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▇ █
  0.001 ns      Histogram: log(frequency) by time        208 ns <

 Memory estimate: 736 bytes, allocs estimate: 1.
```

That is... weird. @nanosoldier
@nanosoldier

🤷 @nanosoldier

I don't think nanosoldier knows how to distribute …

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
That looks much better. I think there are only two (non-scalar) regressions over 1.1x that are on both pages:

The perf improvements are stable across the two runs. This is looking good to me.
Just for posterity, as I understand it, there are three separate considerations, each of which helps increase the chance that the compiler does something smart here:
I'm not deeply familiar with this part of the code, but the changes look sensible: the diff in `base` (excluding `test`) is net negative, we get new tests and more consistently better performance. Looks like a win-win situation.
Co-authored-by: N5N3 <[email protected]>
OK, given that the …

```julia
julia> using BenchmarkTools, LinearAlgebra

julia> A = rand(1000,1000); v2 = view(A, 1:2:lastindex(A));

julia> @btime norm(Iterators.map(splat(-), zip($v2, $v2)));
  931.917 μs (0 allocations: 0 bytes)
```
Split off from #58785, this simplifies `iterate` and removes the `@inbounds` call that was added in #58635. It achieves the same (or better!) performance, however, by targeting optimizations in `checkbounds` and, in particular, the construction of a linear `eachindex` (against which the bounds are checked).
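As a minimal sketch of the idea (my own simplified example with a hypothetical `sum_iterate`, not the actual Base implementation): when an index provably comes from `eachindex(A)`, the optimizer can elide the bounds check without any `@inbounds` annotation.

```julia
# Hypothetical simplified version of the pattern described above:
# iterate over eachindex and index without @inbounds, relying on the
# compiler to prove the access in-bounds and drop the check itself.
function sum_iterate(A::AbstractArray)
    s = zero(eltype(A))
    for i in eachindex(A)
        s += A[i]   # bounds check can be elided by the optimizer
    end
    return s
end

sum_iterate([1.0, 2.0, 3.0])   # 6.0
```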