
FFT Accuracy Benchmark Comments

Our benchmark shows that certain FFT routines are more accurate than others, and that in some cases a routine is accurate on one machine but not on another. This page attempts to explain why these phenomena occur.

We first comment on general accuracy features of FFT algorithms, then on the effects of floating-point types, and finally on specific codes. See also our accuracy benchmark methodology.

Generalities

Most FFT implementations in the benchmark are based on the Cooley-Tukey algorithm, whose floating-point error grows as O(log N) in the worst case (Gentleman & Sande, 1966) and as O(√log N) on average (Schatzman, 1996), for a 1d transform of size N. For these bounds to hold, certain trigonometric constants used by the algorithm (the twiddle factors) must be computed accurately. Inaccurate twiddle factors are the most likely reason for the inaccuracy of an FFT routine.

Inaccurate twiddle factors are usually caused by poor trigonometric recurrences. A trigonometric recurrence is a trick that many codes use to avoid the time and memory overhead of precomputing and storing an array of accurate twiddle factors. Instead, these codes use a formula that starts with a small number of twiddle factors and iteratively combines them to produce the other constants needed by the algorithm. Unfortunately, most such recurrence formulas accumulate errors as O(√N), O(N), or even O(N²), much faster than the FFT itself (Tasche & Zeuner, 2002).
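As a concrete illustration, here is a minimal C sketch (hypothetical code, not taken from any benchmarked routine) contrasting the two approaches: a recurrence that builds each twiddle factor from the previous one by a complex multiplication, so that rounding errors compound with k, versus a precomputed table that pays one accurate libm call per entry.

    #include <complex.h>
    #include <math.h>

    /* Two ways of producing the N twiddle factors w^k = exp(-2*pi*i*k/N).
     * The recurrence does one complex multiply per factor, so each step
     * adds new rounding error on top of the error already present in
     * w^(k-1); the table pays one accurate libm call per entry instead. */
    void twiddles_recurrence(complex double *w, int n)
    {
        const double pi = acos(-1.0);
        complex double step = cexp(-2.0 * pi * I / n);
        w[0] = 1.0;
        for (int k = 1; k < n; ++k)
            w[k] = w[k - 1] * step;   /* errors accumulate with k */
    }

    void twiddles_table(complex double *w, int n)
    {
        const double pi = acos(-1.0);
        for (int k = 0; k < n; ++k)   /* each entry accurate to ~1 ulp */
            w[k] = cexp(-2.0 * pi * I * k / n);
    }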

In other cases, poor accuracy is inherent in the algorithm. For example, the QFT code uses accurate precomputed trigonometric functions, but the algorithm seems to be intrinsically less stable than Cooley-Tukey, probably because of its reliance on the secant function, which is singular (it blows up near odd multiples of π/2).

Effects of Floating-point Types

Even inaccurate recurrence formulas can produce accurate results if they are implemented in higher precision (see also Kahan, 2001). A single-precision FFT routine may be accurate even if it uses an inaccurate recurrence, if the recurrence is explicitly computed in double precision.
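A minimal sketch of this idea, in the style of the recurrence used by Numerical Recipes (discussed below); the function and variable names are ours:

    #include <math.h>

    /* The recurrence state (wr, wi) is carried in double, and only the
     * final twiddle factors are rounded to float.  The recurrence error
     * still grows with k, but it starts from double-precision epsilon,
     * far below what the float data can resolve. */
    void float_twiddles_double_recurrence(float *cr, float *ci, int n)
    {
        const double theta = -2.0 * acos(-1.0) / n;
        const double wpr = -2.0 * sin(0.5 * theta) * sin(0.5 * theta); /* = cos(theta) - 1 */
        const double wpi = sin(theta);
        double wr = 1.0, wi = 0.0;
        for (int k = 0; k < n; ++k) {
            cr[k] = (float)wr;          /* rounded to single precision only here */
            ci[k] = (float)wi;
            double wtemp = wr;
            wr += wr * wpr - wi * wpi;  /* w *= exp(i*theta) */
            wi += wi * wpr + wtemp * wpi;
        }
    }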

Certain machines/compilers may implicitly compute certain formulas at a higher precision than explicitly requested by the user. For example, the x86 family of processors supports 32-bit single-precision, 64-bit double-precision, and 80-bit extended-precision floating-point numbers. If the floating-point unit is set in extended-precision mode (which is the default in GNU/Linux), variables that are register-allocated by the compiler become extended-precision even if they were declared float or double. Consequently, even double-precision routines that use inaccurate recurrences may become accurate on x86 machines (but they are still inaccurate on other processors).
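The following toy program illustrates the effect. It assumes compilation for the x87 FPU in its default extended-precision mode (e.g. 32-bit x86 GNU/Linux without -mfpmath=sse) with enough optimization that the sum stays in a register; on SSE hardware or other machines, both differences print as 0.

    #include <stdio.h>

    /* 1e16 + 1 is not representable as a 64-bit double (it rounds back
     * to 1e16), but it is representable in the 80-bit extended format,
     * so a sum kept in an x87 register preserves the 1. */
    int main(void)
    {
        volatile double a = 1e16, b = 1.0;  /* volatile: prevent constant folding */
        double s = a + b;       /* may be kept in an 80-bit register: 1e16 + 1 */
        volatile double t = s;  /* the store rounds the value to 64 bits: 1e16 */
        printf("in register: %g   in memory: %g\n", s - a, t - a);
        return 0;
    }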

On x86, one cannot in general predict which precision will be used for recurrence formulas simply by looking at the source code, because the final precision depends on the register-allocation choices of the compiler. This uncertainty could be removed by using long double types explicitly, but none of the benchmarked codes do this. Consequently, you should be wary of a routine that happens to be accurate only on x86 with our compiler, because the routine may no longer be accurate with your compiler.

x86 SIMD instructions (i.e., SSE, SSE2, and 3DNow!) do not offer extended precision. This causes a very slight degradation in the accuracy of SIMD codes on x86 compared to their non-SIMD variants. The SIMD accuracy on x86 processors is nevertheless the same as the accuracy on other machines for the same precision.

Comments on Specific Codes

We now comment on the accuracy results for some specific codes out of the many we benchmark, in view of the previous discussion about recurrence formulas and floating-point types. We have not performed a detailed error analysis of these codes, but have merely inspected each code's choice of trigonometric recurrence and drawn the most likely conclusions.

Both the Numerical Recipes (nr-c and nr-f) and ransom codes use the same O(√N)-accuracy recurrence algorithm with explicitly double-precision variables, which for single-precision transforms produces accurate results (for the sizes benchmarked). The Bloodworth (bloodworth and bloodworth-fht) codes use this recurrence with variables of the same precision as the FFT, but on x86 our compiler is apparently able to keep these in extended-precision registers, producing accurate results in both single and double precision; their accuracy degrades on other CPUs, however. (The author, C. Bloodworth, was aware of this issue: the original version of the code was in double precision, with an option to use explicit extended-precision types for the recurrence.) Numerical Recipes is benchmarked only in its "native" single precision, but it would exhibit the same problems if you followed its authors' recommendation to make a double-precision code by s/float/double/g.

The as117, as83, and as97 routines, as well as the singleton code, all use the O(√N)-accuracy recurrence described above for Numerical Recipes, but with single-precision variables and correspondingly large errors. (as117 uses a different FFT algorithm due to Bluestein, which is more expensive and which apparently compounds the trigonometric errors.)

The gsl-radix2 routines also use the O(√N)-accuracy recurrence, with the recurrence in the same precision as the FFT so that the transform accuracy is degraded. The faster gsl-mixed-radix routines (based on fftpack), on the other hand, use accurate precomputed twiddle tables.

The monnier code uses precomputed twiddle factors, but computes the angles in the table by repeatedly adding a fixed increment instead of multiplying the increment by the index, which gratuitously degrades the accuracy.
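A sketch of the difference (hypothetical code, not monnier's source): with repeated addition, the rounding error in the increment is effectively multiplied by k, while multiplying the increment by the index incurs only about one rounding per entry.

    #include <math.h>

    void angles_by_addition(double *angle, int n)
    {
        double dtheta = 2.0 * acos(-1.0) / n, a = 0.0;
        for (int k = 0; k < n; ++k) {
            angle[k] = a;
            a += dtheta;              /* error grows with k */
        }
    }

    void angles_by_multiplication(double *angle, int n)
    {
        double dtheta = 2.0 * acos(-1.0) / n;
        for (int k = 0; k < n; ++k)
            angle[k] = k * dtheta;    /* one rounding per entry */
    }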

The double-precision mixfft code uses an O(N)-accuracy recurrence algorithm, and employs small arrays as part of its recurrence code that apparently prevent extended-precision registers from being fully exploited on x86. The accuracy of mixfft is further degraded because the code specifies certain constants with fewer digits than the full double-precision accuracy (e.g., 7.0710678118655E-01). The glassman, napack, and sorensen-rsrfft codes use the same recurrence algorithm in single precision.
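To see how much the shortened constant costs, one can compare it against a correctly rounded 1/√2 (IEEE 754 guarantees that sqrt is correctly rounded, and 0.5 is exact); the literal turns out to be off by roughly 20 ulps.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double truncated = 7.0710678118655E-01;  /* the constant as it appears in mixfft */
        double accurate  = sqrt(0.5);            /* correctly rounded 1/sqrt(2) */
        printf("error = %.3g = %.1f ulps\n",
               truncated - accurate, (truncated - accurate) / 0x1p-53);
        return 0;
    }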

The double-precision rmayer-unstable routine uses the O(N)-accuracy method above, while rmayer-buneman is a variant of the same code that uses a recurrence algorithm by Buneman (1987) achieving O(√log N) mean accuracy (like the FFT itself) via a precomputed table of O(log N) twiddle factors.
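The following is a sketch of the bisection idea behind such stable table generation, as we understand it (our paraphrase, not the rmayer-buneman source): each level of the recursion needs just one new trigonometric constant, and every entry is a scaled average of two previously computed entries, so errors are damped rather than compounded along a chain of length N.

    #include <math.h>

    /* Fill s[k] = sin(k*pi/(2*n)) for k = 0..n (n a power of 2) from the
     * ends inward, using sin(a) = (sin(a-d) + sin(a+d)) / (2*cos(d)).
     * Only log2(n) cosines are evaluated directly. */
    void stable_sine_table(double *s, int n)
    {
        const double pi = acos(-1.0);
        s[0] = 0.0;
        s[n] = 1.0;                            /* sin(pi/2) */
        for (int step = n; step > 1; step /= 2) {
            double c = 0.5 / cos(pi * (step / 2) / (2.0 * n));
            for (int k = step / 2; k < n; k += step)
                s[k] = (s[k - step / 2] + s[k + step / 2]) * c;
        }
    }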

The cross code uses an O(N²)-accuracy recurrence algorithm, performed in double precision. For single-precision transforms, the recurrence is still accurate enough not to degrade the FFT accuracy until N=262144. For double-precision transforms (for which the code is apparently unable to fully exploit extended-precision registers on x86), however, this code is typically the least accurate of all the routines we benchmark.

The accuracy of fftw3 is slightly better on x86 machines without SSE/SSE2/3DNow! than on machines with these instructions, because the standard x86 floating-point unit computes intermediate results in extended precision, whereas SSE and SSE2 use single and double precision, respectively. You can see the difference if you compare, e.g., the Pentium II results (no SIMD) to the Pentium IV results (SIMD in single and double precision) or the AMD results (single-precision SIMD).
