Our benchmark shows that certain FFT routines are more accurate than others. In other cases, a routine is accurate on one machine but not on another. This page attempts to explain why these phenomena occur.
We first comment on general accuracy features of FFT algorithms, then on the effects of floating-point types, and finally on specific codes. See also our accuracy benchmark methodology.
Most FFT implementations in the benchmark are based on the Cooley-Tukey algorithm, whose floating-point error grows as O(log N) in the worst case (Gentleman & Sande, 1966) and as O(√log N) on average (Schatzman, 1996), for a 1d transform of size N. For these bounds to hold, certain trigonometric constants used by the algorithm (the twiddle factors) must be computed accurately. Inaccurate twiddle factors are the most likely reason for the inaccuracy of an FFT routine.
Inaccurate twiddle factors are usually caused by poor trigonometric recurrences. A trigonometric recurrence is a trick that many codes use to avoid the time and memory overhead of precomputing and storing an array of accurate twiddle factors. Instead, these codes use a formula that starts with a small number of twiddle factors and iteratively combines them to produce the other constants needed by the algorithm. Unfortunately, most such recurrence formulas accumulate errors as O(√N), O(N), or even O(N2), much faster than the FFT itself (Tasche & Zeuner, 2002).
In other cases, poor accuracy is inherent in the algorithm. For example, the QFT code uses accurate precomputed trigonometric functions, but the algorithm seems to be intrinsically less stable than Cooley-Tukey, probably because of its reliance on the secant function, which is singular.
Even inaccurate recurrence formulas can produce accurate results if they are implemented in higher precision (see also Kahan, 2001). A single-precision FFT routine may be accurate even if it uses an inaccurate recurrence, if the recurrence is explicitly computed in double precision.
Certain machines/compilers may implicitly compute certain formulas
at a higher precision than explicitly requested by the user. For
example, the x86 family of processors supports 32-bit
single-precision, 64-bit double-precision, and 80-bit
extended-precision floating-point numbers. If the floating-point unit
is set in extended-precision mode (which is the default in GNU/Linux),
variables that are register-allocated by the compiler become
extended-precision even if they were declared float
or
double
. Consequently, even double-precision routines
that use inaccurate recurrences may become accurate on x86 machines
(but they are still inaccurate on other processors).
On x86, one cannot in general predict which precision will be used
for recurrence formulas simply by looking at the source code, because
the final precision depends on the register-allocation choices of the
compiler. This uncertainty could be removed by using long
double
types explicitly, but none of the benchmarked codes do
this. Consequently, you should be wary of a routine that happens to
be accurate only on x86 with our compiler, because the routine may no
longer be accurate with your compiler.
x86 SIMD instructions (i.e., SSE, SSE2, and 3DNow!) do not offer extended precision. This causes a very slight degradation in the accuracy of SIMD codes on x86 compared to their non-SIMD variants. The SIMD accuracy on x86 processors is nevertheless the same as the accuracy on other machines for the same precision.
We now comment on the accuracy results for some specific codes out of the many we benchmark, in view of the previous discussion about recurrence formulas and floating-point types. We have not performed a detailed error analysis of these codes, but have merely inspected the code's choice of trigonometric recurrence and drawn the most likely conclusions.
Both the Numerical Recipes (nr-c and nr-f) and
ransom codes use the same O(√N)-accuracy recurrence
algorithm with explicitly double-precision variables, which for
single-precision transforms produces accurate results (for the sizes
benchmarked). The Bloodworth (bloodworth and
bloodworth-fht) codes use this recurrence with variables of the
same precision as the FFT, but on x86 our compiler is apparently able
to keep these in extended-precision registers, producing accurate
results in both single and double precision; its accuracy degrades on
other CPUs, however. (The author, C. Bloodworth, was aware of this
issue: the original version of the code was in double precision, with
an option to use explicit extended-precision types for the
recurrence.) Numerical Recipes is only benchmarked in its
"native" single precision, but would exhibit the same problems if you
followed its authors' recommendation to make a double-precision code
by s/float/double/g
.
The as117, as83, and as97 routines, as well as the singleton code, all use the O(√N)-accuracy recurrence as in N. R. above, but with single-precision variables and correspondingly large errors. (as117 uses a different FFT algorithm due to Bluestein, which is more expensive and which apparently compounds the trig. errors.)
The gsl-radix2 routines also use the O(√N)-accuracy recurrence, with the recurrence in the same precision as the FFT so that the transform accuracy is degraded. The faster gsl-mixed-radix routines (based on fftpack), on the other hand, use accurate precomputed twiddle tables.
The monnier code uses precomputed twiddle factors, but computes the angles in the table via repeated addition instead of multiplications, which gratuitously degrades the accuracy.
The double-precision mixfft code uses an O(N)-accuracy
recurrence algorithm, and employs small arrays as part of its
recurrence code that apparently prevent extended-precision registers
from being fully exploited on x86. The accuracy of mixfft is
further degraded because the code specifies certain constants with
fewer digits than the full double-precision accuracy (e.g.,
7.0710678118655E-01
). The glassman,
napack, and sorensen-rsrfft codes use the same
recurrence algorithm in single precision.
The double-precision rmayer-unstable routine uses the O(N)-accuracy method above, while rmayer-buneman is a variant of the same code that uses a recurrence algorithm by Buneman (1987) achieving O(√log N) mean accuracy (like the FFT itself) via a precomputed table of O(log N) twiddle factors.
The cross code uses an O(N2)-accuracy recurrence algorithm, performed in double precision. For single-precision transforms, the recurrence is still accurate enough not to degrade the FFT accuracy until N=262144. For double-precision transforms (for which the code is apparently unable to fully exploit extended-precision registers on x86), however, this code is typically the least accurate of all the routines we benchmark.
The accuracy of fftw3 is slightly better on x86 machines without SSE/SSE2/3DNow! than on machines with these instructions, because the standard x86 floating-point unit computes intermediate results in extended-precision, whereas SSE and SSE2 use single and double precision, respectively. You can see the difference if you compare e.g. the Pentium II results (no SIMD) to the Pentium IV results (SIMD in single and double precision) or the AMD results (single-precision SIMD).