special: trig: smaller polynomials for sinpi,cospi kernels for f16,f32 #58977

Draft · wants to merge 2 commits into master

Conversation

@nsajko (Contributor) commented on Jul 12, 2025

Regenerate the polynomials used for polynomial approximation in the implementation of the sinpi and cospi kernels for Float16 and Float32. The main difference is that smaller polynomials are used, which should help performance. There should be no loss of accuracy: the kernels for Float16 and Float32 are "wide", computed in increased precision before the final result is rounded back to the original narrow type, and the current minimax polynomials for these wide kernels have unnecessarily high degree.

Another difference, for cospi, is that the degree-0 coefficient is no longer constrained to be exactly equal to 1. This means the polynomial approximation is no longer exact for zero inputs; however, that should be fine, since the result is rounded to a narrower type before being returned to the user in any case. Relaxing the degree-0 coefficient potentially allows the overall maximum relative error to be minimized further while keeping the polynomial degree fixed.
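
A minimal sketch of why this is harmless (not the actual Base kernel; the coefficients below are truncated Taylor coefficients of cos(pi*x), standing in for the Sollya-generated ones): the wide kernel evaluates the polynomial in Float64 and only rounds once at the end, so a degree-0 coefficient that is merely close to 1 is absorbed by that final rounding.

# Sketch only: a "wide" cospi kernel for Float32, evaluating an even
# polynomial in x^2 at Float64 precision before the single final rounding.
function cospi_kernel_sketch(x::Float32)
    y = abs2(Float64(x))
    p = evalpoly(y, (1.0, -4.934802200544679, 4.058712126416768))
    return Float32(p)  # the only rounding back to the narrow type
end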

The polynomials for the kernel implementations are generated using Sollya. Apart from generating the minimax polynomial using the Remez algorithm, Sollya also takes care of rounding the arbitrary-precision coefficients to machine precision in a way that loses less accuracy than naive coefficient-wise rounding.

All relevant Sollya code is included in the corresponding doc strings.

@nsajko added the "performance: Must go faster" and "maths: Mathematical functions" labels on Jul 12, 2025
@nsajko (Contributor, Author) commented on Jul 12, 2025

Experiments indicate there is no loss of accuracy:

module RelativeError
    export relative_error
    @noinline function throw_invalid()
        throw(ArgumentError("invalid"))
    end
    function relative_error(accurate::AbstractFloat, approximate::AbstractFloat)
        if isnan(accurate)
            @noinline throw_invalid()
        end
        if isnan(approximate)
            @noinline throw_invalid()
        end
        # the relative error is usually not needed to high accuracy, so `Float32` should be wide enough
        zero_return = Float32(0)
        inf_return = Float32(Inf)
        if isinf(accurate) || iszero(accurate)  # handle floating-point edge cases
            if isinf(accurate)
                if isinf(approximate) && (signbit(accurate) == signbit(approximate))
                    return zero_return
                end
                return inf_return
            end
            # `iszero(accurate)`
            if iszero(approximate)
                return zero_return
            end
            return inf_return
        end
        # assuming `precision(BigFloat)` is high enough
        acc = if accurate isa BigFloat
            accurate
        else
            BigFloat(accurate)::BigFloat
        end
        # https://nhigham.com/2017/08/14/how-and-how-not-to-compute-a-relative-error/
        err = abs(Float32((approximate - acc) / acc))
        if isnan(err)
            @noinline throw_invalid()  # unexpected
        end
        err
    end
    function relative_error(accurate::Acc, approximate::App, x::AbstractFloat) where {Acc, App}
        acc = accurate(x)
        app = approximate(x)
        relative_error(acc, app)
    end
    function relative_error(func::Func, x::AbstractFloat) where {Func}
        relative_error(func ∘ BigFloat, func, x)
    end
end

module RelativeErrorMaximum
    export relative_error_maximum
    using ..RelativeError
    function relative_error_maximum(func::Func, iterator) where {Func}
        function f(x::AbstractFloat)
            relative_error(func, x)
        end
        maximum(f, iterator)
    end
end

module FloatingPointExhaustiveIterators
    export FloatingPointExhaustiveIterator
    @noinline function throw_err_not_finite()
        throw(ArgumentError("not finite"))
    end
    struct FloatingPointExhaustiveIterator{T <: AbstractFloat}
        start::T
        stop::T
        function FloatingPointExhaustiveIterator(start::AbstractFloat, stop::AbstractFloat)
            if !isfinite(start)
                @noinline throw_err_not_finite()
            end
            if !isfinite(stop)
                @noinline throw_err_not_finite()
            end
            new{typeof(start)}(start, stop)
        end
    end
    function Base.IteratorSize(::Type{<:FloatingPointExhaustiveIterator})
        Base.SizeUnknown()
    end
    function Base.IteratorEltype(::Type{<:FloatingPointExhaustiveIterator})
        Base.HasEltype()
    end
    function Base.eltype(::Type{<:FloatingPointExhaustiveIterator{T}}) where {T}
        T
    end
    function _iterate(iterator::FloatingPointExhaustiveIterator, state::AbstractFloat)
        start = iterator.start
        stop = iterator.stop
        if !isfinite(start)
            @noinline throw_err_not_finite()
        end
        if !isfinite(stop)
            @noinline throw_err_not_finite()
        end
        if stop < state
            return nothing
        end
        if !isfinite(state)
            @noinline throw_err_not_finite()
        end
        (state, nextfloat(state))
    end
    function Base.iterate(iterator::FloatingPointExhaustiveIterator, state = iterator.start)
        _iterate(iterator, state)
    end
end

setprecision(BigFloat, 100)

function max_err_exhaustive(func::Func, start::AbstractFloat, stop::AbstractFloat) where {Func}
    iterator = FloatingPointExhaustiveIterators.FloatingPointExhaustiveIterator(start, stop)
    RelativeErrorMaximum.relative_error_maximum(func, iterator)
end

function max_err_nonexhaustive(func::Func, start::AbstractFloat, stop::AbstractFloat) where {Func}
    iterator = range(; start, stop, length = 2^12)
    RelativeErrorMaximum.relative_error_maximum(func, iterator)
end

# `Float16` has few enough values to check exhaustively
function max_err(func::Func, start::Float16, stop::Float16) where {Func}
    max_err_exhaustive(func, start, stop)
end

# exhaustive checking over `Float32` intervals would be slow, so sample instead
function max_err(func::Func, start::Float32, stop::Float32) where {Func}
    max_err_nonexhaustive(func, start, stop)
end

function f()
    for func ∈ (cospi, sinpi)
        print(' ' ^ (2*1))
        @show func
        for type ∈ (Float16, Float32)
            print(' ' ^ (2*2))
            @show type
            for (a, b) ∈ ((0.0, 0.125), (0.125, 0.25))
                print(' ' ^ (2*3))
                @show (a, b)
                print(' ' ^ (2*3))
                @show max_err(func, type(a), type(b))
            end
        end
    end
    print('\n')
end

println("before")
f()

include("/tmp/j.jl")  # load the PR changes

println("after")
f()

Results (same before and after):

before
  func = cospi
    type = Float16
      (a, b) = (0.0, 0.125)
      max_err(func, type(a), type(b)) = 0.00026347567f0
      (a, b) = (0.125, 0.25)
      max_err(func, type(a), type(b)) = 0.0003340189f0
    type = Float32
      (a, b) = (0.0, 0.125)
      max_err(func, type(a), type(b)) = 3.1983575f-8
      (a, b) = (0.125, 0.25)
      max_err(func, type(a), type(b)) = 4.169794f-8
  func = sinpi
    type = Float16
      (a, b) = (0.0, 0.125)
      max_err(func, type(a), type(b)) = 0.045070343f0
      (a, b) = (0.125, 0.25)
      max_err(func, type(a), type(b)) = 0.00048023803f0
    type = Float32
      (a, b) = (0.0, 0.125)
      max_err(func, type(a), type(b)) = 5.9350548f-8
      (a, b) = (0.125, 0.25)
      max_err(func, type(a), type(b)) = 5.8340195f-8

after
  func = cospi
    type = Float16
      (a, b) = (0.0, 0.125)
      max_err(func, type(a), type(b)) = 0.00026347567f0
      (a, b) = (0.125, 0.25)
      max_err(func, type(a), type(b)) = 0.0003340189f0
    type = Float32
      (a, b) = (0.0, 0.125)
      max_err(func, type(a), type(b)) = 3.1983575f-8
      (a, b) = (0.125, 0.25)
      max_err(func, type(a), type(b)) = 4.169794f-8
  func = sinpi
    type = Float16
      (a, b) = (0.0, 0.125)
      max_err(func, type(a), type(b)) = 0.045070343f0
      (a, b) = (0.125, 0.25)
      max_err(func, type(a), type(b)) = 0.00048023803f0
    type = Float32
      (a, b) = (0.0, 0.125)
      max_err(func, type(a), type(b)) = 5.9350548f-8
      (a, b) = (0.125, 0.25)
      max_err(func, type(a), type(b)) = 5.8340195f-8

@nsajko (Contributor, Author) commented on Jul 12, 2025

Now I notice the coefficient counts actually stay the same for the Float32 implementations, contrary to what I said above. I suppose it would still be preferable to merge the entire PR for the improved internal documentation. That said, for the record, I might try to make other extensive changes to the trig functions soon if I get the time, which could make this PR unnecessary if all of them end up merged.

@@ -725,6 +725,146 @@ function acos(x::T) where T <: Union{Float32, Float64}
end
end

# Not used at run time, so prevent unnecessary specialization/inference.
Base.@nospecializeinfer function _exact_string_to_floating_point((@nospecialize type::Type{<:AbstractFloat}), string::String, precision::Int = 1000)
A Member commented:

This function isn't useful. Julia's parser parses floating point numbers correctly.

@nsajko (Contributor, Author) replied:

I know. This function existed just to ensure I didn't make any copy-paste errors, since Sollya provides the coefficients exactly.

@nsajko (Contributor, Author) added:

However, I'll have to get rid of this anyway because bootstrap doesn't like BigFloat.

@oscardssmith (Member) replied on Jul 12, 2025:

Fair. IMO it shouldn't be committed, though. Once the kernel is tested, it is a lot easier to read if the normal versions of the coefficients are just embedded.

@oscardssmith (Member) commented on Jul 12, 2025

Can you give the error in ULPs rather than absolute/relative? It's much more easily interpretable.

That said, nice work! I've never really liked Sollya, but it is excellent for its job.
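
For reference, a hedged sketch of how that could be measured (a hypothetical helper, not part of this PR): divide the absolute error by the ULP size at the reference value's magnitude.

# Hypothetical helper, not part of this PR: error of `approximate` in ULPs,
# measured against a high-precision reference `accurate`. Assumes the inputs
# are finite and nonzero; edge cases would need handling as in the script above.
function ulp_error(accurate::BigFloat, approximate::T) where {T<:AbstractFloat}
    ulp = eps(T(accurate))  # one ULP at the magnitude of the rounded reference
    Float64(abs(approximate - accurate) / ulp)
end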

@oscardssmith (Member) added:

Also, I do think it's worth checking whether we can compute cospi without extended precision. The vibes of the function suggest that it should be possible.
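
For what it's worth, such a kernel would look something like the speculative sketch below (placeholder Taylor-like coefficients, not tuned ones); whether a polynomial evaluated entirely in Float32 can meet the accuracy target is exactly the open question.

# Speculative sketch: a cospi kernel on [0, 0.25] with no Float64 intermediate.
function cospi_kernel_f32only(x::Float32)
    y = x * x
    evalpoly(y, (1.0f0, -4.934802f0, 4.058712f0, -1.3352627f0))
end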
