|
54 | 54 | * volk_free(input_vector); |
55 | 55 | * volk_free(out); |
56 | 56 | * \endcode |
| 57 | + * |
| 58 | + * \b Numerical accuracy |
| 59 | + * |
| 60 | + * The division of complex numbers (a + bj)/(c + dj) is calculated as |
| 61 | + * (a*c + b*d)/(c*c + d*d) + (b*c - a*d)/(c*c + d*d) j. |
| 62 | + * |
| 63 | + * Since Volk is built using unsafe math optimizations |
| 64 | + * (-fcx-limited-range -funsafe-math-optimizations with gcc), after computing |
| 65 | + * the two numerators and the denominator in the above formula, the compiler |
| 66 | + * can either: |
| 67 | + * |
| 68 | + * a) compute the two divisions; |
| 69 | + * b) compute the inverse of the denominator, 1.0/(c*c + d*d), and then |
| 70 | + * multiply this number by each of the numerators. |
| 71 | + * |
| 72 | + * Under gcc, b) is allowed by -freciprocal-math, which is included in |
| 73 | + * -funsafe-math-optimizations. |
| 74 | + * |
| 75 | + * Depending on the architecture and the estimated cost of multiply and divide |
| 76 | + * instructions, the compiler can perform a) or b). gcc performs a) under x86_64 |
| 77 | + * and armv7, and b) under aarch64. |
| 78 | + * |
| 79 | + * To avoid obtaining inf or nan in some architectures, care should be taken |
| 80 | + * that c*c + d*d is not too small. In particular, if c*c + d*d < FLT_MAX, then |
| 81 | + * the calculation of 1.0/(c*c + d*d) will yield inf. |
| 82 | + * |
| 83 | + * For more information about numerical accuracy of complex division, see the |
| 84 | + * following: |
| 85 | + * |
| 86 | + * - https://godbolt.org/z/z9z5b9oM5 Compiler Explorer example showing how |
| 87 | + * complex division is done in different architectures and with different |
| 88 | + * optimization options. |
| 89 | + * |
| 90 | + * - |
| 91 | + * https://github.com/gcc-mirror/gcc/blob/7e3c58bfc1d957e48faf0752758da0fed811ed58/libgcc/libgcc2.c#L2707 |
| 92 | + * gcc's implementation of the __divsc3 function, which implements C99 complex |
| 93 | + * division semantics and handles several potential numerical problems. This |
| 94 | + * function is used whenever -fcx-limited-range is not enabled. |
57 | 95 | */ |
58 | 96 |
|
59 | 97 | #ifndef INCLUDED_volk_32fc_x2_divide_32fc_u_H |
|
0 commit comments