Skip to content

Commit 56a893c

Browse files
authored
Merge pull request #694 from daniestevez/divide-aarch64
volk_32fc_x2_divide_32fc: add documentation about numerical accuracy
2 parents f2535d0 + c364110 commit 56a893c

File tree

1 file changed

+38
-0
lines changed

1 file changed

+38
-0
lines changed

kernels/volk/volk_32fc_x2_divide_32fc.h

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -54,6 +54,44 @@
5454
* volk_free(input_vector);
5555
* volk_free(out);
5656
* \endcode
57+
*
58+
* \b Numerical accuracy
59+
*
60+
* The division of complex numbers (a + bj)/(c + dj) is calculated as
61+
* (a*c + b*d)/(c*c + d*d) + (b*c - a*d)/(c*c + d*d) j.
62+
*
63+
* Since Volk is built using unsafe math optimizations
64+
* (-fcx-limited-range -funsafe-math-optimizations with gcc), after computing
65+
* the two numerators and the denominator in the above formula, the compiler
66+
* can either:
67+
*
68+
* a) compute the two divisions;
69+
* b) compute the inverse of the denominator, 1.0/(c*c + d*d), and then
70+
* multiply this number by each of the numerators.
71+
*
72+
* Under gcc, b) is allowed by -freciprocal-math, which is included in
73+
* -funsafe-math-optimizations.
74+
*
75+
* Depending on the architecture and the estimated cost of multiply and divide
76+
* instructions, the compiler can perform a) or b). gcc performs a) under x86_64
77+
* and armv7, and b) under aarch64.
78+
*
79+
* To avoid obtaining inf or nan in some architectures, care should be taken
80+
* that c*c + d*d is not too small. In particular, if c*c + d*d < FLT_MAX, then
81+
* the calculation of 1.0/(c*c + d*d) will yield inf.
82+
*
83+
* For more information about numerical accuracy of complex division, see the
84+
* following:
85+
*
86+
* - https://godbolt.org/z/z9z5b9oM5 Compiler Explorer example showing how
87+
* complex division is done in different architectures and with different
88+
* optimization options.
89+
*
90+
* -
91+
* https://github.com/gcc-mirror/gcc/blob/7e3c58bfc1d957e48faf0752758da0fed811ed58/libgcc/libgcc2.c#L2707
92+
* gcc's implementation of the __divsc3 function, which implements C99 complex
93+
* division semantics and handles several potential numerical problems. This
94+
* function is used whenever -fcx-limited-range is not enabled.
5795
*/
5896

5997
#ifndef INCLUDED_volk_32fc_x2_divide_32fc_u_H

0 commit comments

Comments
 (0)