special: trig: smaller polynomials for sinpi,cospi kernels for f16,f32 #58977
Conversation
Regenerate the polynomials used in the implementation of the `sinpi` and `cospi` kernels for `Float16` and `Float32`. The main difference is that smaller polynomials are used, which should help performance. There should be no loss of accuracy: the `Float16` and `Float32` kernels are "wide", computed in increased precision before the final result is rounded back to the original narrow type, and the current minimax polynomials for these wide kernels have unnecessarily high degree.

Another difference, for `cospi`, is that coefficient 0 is no longer constrained to be exactly equal to `1`. This means the polynomial approximation is no longer exact for zero inputs, but that should be fine, since the result is rounded to a narrower type before being returned to the user in any case. Relaxing coefficient 0 potentially allows further minimizing the overall maximum relative error at a fixed polynomial degree.

The polynomials for the kernel implementations are generated using Sollya. Apart from generating the minimax polynomial via the Remez algorithm, Sollya also takes care of rounding the high-precision coefficients into machine precision in a way that loses less accuracy than naive coefficient-wise rounding. All relevant Sollya code is included in the relevant doc strings.
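As a rough illustration of what a "wide" kernel looks like (this is a sketch, not the PR's actual code; the coefficients below are truncated Taylor coefficients of sin(πx), standing in for the Sollya-generated minimax ones):

```julia
# Illustrative sketch only: a "wide" Float32 sinpi kernel evaluates its
# polynomial in Float64 and rounds just once at the end. The coefficients are
# Taylor coefficients (π, -π³/6, π⁵/120) used here as placeholders; the real
# kernel uses minimax coefficients produced by Sollya.
function sinpi_kernel_sketch(x::Float32)
    xd = Float64(x)
    x2 = xd * xd
    # odd polynomial in x: x · (c₁ + c₃·x² + c₅·x⁴), evaluated via Horner
    p = evalpoly(x2, (3.141592653589793, -5.167712780049969, 2.5501640398773455))
    Float32(xd * p)  # single rounding back to the narrow type
end
```

Because all intermediate arithmetic happens in `Float64`, the only rounding the `Float32` caller observes is the final conversion, which is why a lower-degree polynomial can still deliver a faithfully rounded narrow result.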
Experiments indicate there is no loss of accuracy:

module RelativeError
export relative_error
@noinline function throw_invalid()
throw(ArgumentError("invalid"))
end
function relative_error(accurate::AbstractFloat, approximate::AbstractFloat)
if isnan(accurate)
@noinline throw_invalid()
end
if isnan(approximate)
@noinline throw_invalid()
end
# the relative error is usually not required to great accuracy, so `Float32` should be wide enough
zero_return = Float32(0)
inf_return = Float32(Inf)
if isinf(accurate) || iszero(accurate) # handle floating-point edge cases
if isinf(accurate)
if isinf(approximate) && (signbit(accurate) == signbit(approximate))
return zero_return
end
return inf_return
end
# `iszero(accurate)`
if iszero(approximate)
return zero_return
end
return inf_return
end
# assuming `precision(BigFloat)` is great enough
acc = if accurate isa BigFloat
accurate
else
BigFloat(accurate)::BigFloat
end
# https://nhigham.com/2017/08/14/how-and-how-not-to-compute-a-relative-error/
err = abs(Float32((approximate - acc) / acc))
if isnan(err)
@noinline throw_invalid() # unexpected
end
err
end
function relative_error(accurate::Acc, approximate::App, x::AbstractFloat) where {Acc, App}
acc = accurate(x)
app = approximate(x)
relative_error(acc, app)
end
function relative_error(func::Func, x::AbstractFloat) where {Func}
relative_error(func ∘ BigFloat, func, x)
end
end
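# Usage sketch (not part of the PR's harness): the one-argument convenience
# method compares `func` in the narrow type against `func ∘ BigFloat` as the
# high-precision reference, returning the relative error as a Float32.
using .RelativeError
let err = relative_error(sinpi, 0.1f0)
    @assert err isa Float32 && err >= 0
end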
module RelativeErrorMaximum
export relative_error_maximum
using ..RelativeError
function relative_error_maximum(func::Func, iterator) where {Func}
function f(x::AbstractFloat)
relative_error(func, x)
end
maximum(f, iterator)
end
end
module FloatingPointExhaustiveIterators
export FloatingPointExhaustiveIterator
@noinline function throw_err_not_finite()
throw(ArgumentError("not finite"))
end
struct FloatingPointExhaustiveIterator{T <: AbstractFloat}
start::T
stop::T
function FloatingPointExhaustiveIterator(start::AbstractFloat, stop::AbstractFloat)
if !isfinite(start)
@noinline throw_err_not_finite()
end
if !isfinite(stop)
@noinline throw_err_not_finite()
end
new{typeof(start)}(start, stop)
end
end
function Base.IteratorSize(::Type{<:FloatingPointExhaustiveIterator})
Base.SizeUnknown()
end
function Base.IteratorEltype(::Type{<:FloatingPointExhaustiveIterator})
Base.HasEltype()
end
function Base.eltype(::Type{<:FloatingPointExhaustiveIterator{T}}) where {T}
T
end
function _iterate(iterator::FloatingPointExhaustiveIterator, state::AbstractFloat)
start = iterator.start
stop = iterator.stop
if !isfinite(start)
@noinline throw_err_not_finite()
end
if !isfinite(stop)
@noinline throw_err_not_finite()
end
if stop < state
return nothing
end
if !isfinite(state)
@noinline throw_err_not_finite()
end
(state, nextfloat(state))
end
function Base.iterate(iterator::FloatingPointExhaustiveIterator, state = iterator.start)
_iterate(iterator, state)
end
end
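# Usage sketch (not part of the PR): the iterator yields every representable
# value of `start`'s type from `start` through `stop`, endpoints included,
# stepping with `nextfloat`. For Float16 this makes exhaustive testing cheap.
using .FloatingPointExhaustiveIterators
let xs = collect(FloatingPointExhaustiveIterator(Float16(1), Float16(1.01)))
    # consecutive Float16 values; the spacing near 1 is eps(Float16(1)) = 2^-10
    @assert issorted(xs) && first(xs) == Float16(1) && all(isfinite, xs)
end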
setprecision(BigFloat, 100)
function max_err_exhaustive(func::Func, start::AbstractFloat, stop::AbstractFloat) where {Func}
iterator = FloatingPointExhaustiveIterators.FloatingPointExhaustiveIterator(start, stop)
RelativeErrorMaximum.relative_error_maximum(func, iterator)
end
function max_err_nonexhaustive(func::Func, start::AbstractFloat, stop::AbstractFloat) where {Func}
iterator = range(; start, stop, length = 2^12)
RelativeErrorMaximum.relative_error_maximum(func, iterator)
end
function max_err(func::Func, start::Float16, stop::Float16) where {Func}
max_err_exhaustive(func, start, stop)
end
function max_err(func::Func, start::Float32, stop::Float32) where {Func}
max_err_nonexhaustive(func, start, stop)
end
function f()
for func ∈ (cospi, sinpi)
print(' ' ^ (2*1))
@show func
for type ∈ (Float16, Float32)
print(' ' ^ (2*2))
@show type
for (a, b) ∈ ((0.0, 0.125), (0.125, 0.25))
print(' ' ^ (2*3))
@show (a, b)
print(' ' ^ (2*3))
@show max_err(func, type(a), type(b))
end
end
end
print('\n')
end
println("before")
f()
include("/tmp/j.jl") # load the PR changes
println("after")
f()

Results (same before and after):
Now I notice the coefficient counts actually stay the same for the
@@ -725,6 +725,146 @@ function acos(x::T) where T <: Union{Float32, Float64}
end
end

# Not used at run time, so prevent unnecessary specialization/inference.
Base.@nospecializeinfer function _exact_string_to_floating_point((@nospecialize type::Type{<:AbstractFloat}), string::String, precision::Int = 1000)
This function isn't useful. Julia's parser parses floating point numbers correctly.
I know. The reason this function existed is just to ensure I didn't make any copy-paste error, as Sollya provides the coefficients exactly.
However, I'll have to get rid of this anyway because bootstrap doesn't like BigFloat.
Fair. IMO it shouldn't be committed, though. Once the kernel is tested, it is a lot easier to read to just embed the normal versions of the coefficients.
Can you give the error in ULPs rather than absolute/relative? It's much more easily interpretable. That said, nice work! I've never really liked Sollya, but it is excellent for its job.
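For reference, a ULP-based error measure could be sketched like this (a hypothetical helper, not part of the PR; it assumes `BigFloat`'s default precision is wide enough for the reference):

```julia
# Hypothetical sketch: measure error in ULPs of the narrow result type, using
# a BigFloat reference. `eps(y)` gives the spacing of floats at `y`, i.e. one
# ULP at that magnitude.
function ulp_error(func, x::AbstractFloat)
    ref = func(BigFloat(x))          # high-precision reference value
    app = func(x)                    # value under test, in the narrow type
    Float64(abs(BigFloat(app) - ref) / eps(abs(app)))
end
```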
Also, I do think it's worth checking whether we can compute