By dealing with the `alpha == 0` case separately, we ensure that if
`alpha::Bool`, it must be `true`. This reduces the branches in
`_lscale_add` from 4 to 2 in the common case of 3-argument `mul!`. This
leads to a latency reduction, as each branch has to compile a different
broadcast expression, and we currently compile four but use only one.
The main measurable effect of this PR is a reduction in allocations on
the first (compiling) call:
```julia
julia> using LinearAlgebra
julia> v = 1:4; w = similar(v);
julia> @time mul!(w, 1, v);
0.171120 seconds (1.04 M allocations: 52.799 MiB, 99.98% compilation time) # nightly
0.163178 seconds (702.63 k allocations: 35.533 MiB, 99.98% compilation time) # this PR
```
Applying the same change to the `_rscale_add` method usually doesn't
lead to a comparable gain, as `s * alpha` often has the same type as
`s`, and therefore the branches on `alpha` already compile the same code.
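
For illustration, the branch reduction described above might be sketched
roughly as follows. This is a simplified stand-in, not the actual
`LinearAlgebra` source: the names `lscale_add_sketch!` and
`scaled_mul_sketch!` are hypothetical, and using a separate method for
`alpha::Bool` is an assumption about one way the two-branch path could
be expressed.

```julia
# Hypothetical general path: four broadcast expressions, one per
# (alpha, beta) branch, each of which has to be compiled.
function lscale_add_sketch!(C, s, X, alpha, beta)
    if isone(alpha)
        if iszero(beta)
            @. C = s * X
        else
            @. C = s * X + C * beta
        end
    else
        if iszero(beta)
            @. C = s * X * alpha
        else
            @. C = s * X * alpha + C * beta
        end
    end
    return C
end

# Hypothetical Bool path: the caller has already ruled out alpha == false,
# so alpha must be true and only the two branches on beta remain.
function lscale_add_sketch!(C, s, X, alpha::Bool, beta)
    if iszero(beta)
        @. C = s * X
    else
        @. C = s * X + C * beta
    end
    return C
end

# Hypothetical entry point: deal with alpha == 0 separately, then delegate.
function scaled_mul_sketch!(C, s, X, alpha=true, beta=false)
    if iszero(alpha)
        if iszero(beta)
            fill!(C, zero(eltype(C)))
        else
            C .*= beta
        end
        return C
    end
    return lscale_add_sketch!(C, s, X, alpha, beta)
end

v = 1:4
w = similar(v)
scaled_mul_sketch!(w, 2, v)  # default alpha = true takes the Bool method
```

With the default `alpha = true`, only the `Bool` method is reached, so
only its two broadcast expressions need to be compiled for that call.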