Go back to the benchFFT accuracy results page.

Our benchmark shows that certain FFT routines are more accurate than others. In other cases, a routine is accurate on one machine but not on another. This page attempts to explain why these phenomena occur.

We first comment on general accuracy features of FFT algorithms, then on the effects of floating-point types, and finally on specific codes. See also our accuracy benchmark methodology.

Most FFT implementations in the benchmark are based on the
Cooley-Tukey algorithm, whose floating-point error grows as O(log N)
in the worst case (Gentleman & Sande, 1966) and as O(√log N)
on average (Schatzman, 1996), for a 1d transform of size N. For these
bounds to hold, certain trigonometric constants used by the algorithm
(the *twiddle factors*) must be computed accurately. Inaccurate
twiddle factors are the most likely reason for the inaccuracy of an
FFT routine.

Inaccurate twiddle factors are usually caused by poor trigonometric
recurrences. A *trigonometric recurrence* is a trick that many
codes use to avoid the time and memory overhead of precomputing and
storing an array of accurate twiddle factors. Instead, these codes
use a formula that starts with a small number of twiddle factors and
iteratively combines them to produce the other constants needed by the
algorithm. Unfortunately, most such recurrence formulas accumulate
errors as O(√N), O(N), or even O(N^{2}), much faster
than the FFT itself (Tasche & Zeuner, 2002).

In other cases, poor accuracy is inherent in the algorithm. For example, the QFT code uses accurate precomputed trigonometric functions, but the algorithm seems to be intrinsically less stable than Cooley-Tukey, probably because of its reliance on the secant function, which is singular.

** Even inaccurate recurrence formulas can produce accurate
results if they are implemented in higher precision** (see also
Kahan, 2001). A single-precision FFT routine may be accurate even if
it uses an inaccurate recurrence, if the recurrence is explicitly
computed in double precision.

Certain machines/compilers may implicitly compute certain formulas
at a higher precision than explicitly requested by the user. For
example, the x86 family of processors supports 32-bit
single-precision, 64-bit double-precision, and 80-bit
extended-precision floating-point numbers. If the floating-point unit
is set in extended-precision mode (which is the default in GNU/Linux),
variables that are register-allocated by the compiler become
extended-precision even if they were declared `float`

or
`double`

. Consequently, even double-precision routines
that use inaccurate recurrences may become accurate on x86 machines
(but they are still inaccurate on other processors).

On x86, one cannot in general predict which precision will be used
for recurrence formulas simply by looking at the source code, because
the final precision depends on the register-allocation choices of the
compiler. This uncertainty could be removed by using ```
long
double
```

types explicitly, but none of the benchmarked codes do
this. Consequently, you should be wary of a routine that happens to
be accurate only on x86 with our compiler, because the routine may no
longer be accurate with *your* compiler.

x86 SIMD instructions (i.e., SSE, SSE2, and 3DNow!) do not offer extended precision. This causes a very slight degradation in the accuracy of SIMD codes on x86 compared to their non-SIMD variants. The SIMD accuracy on x86 processors is nevertheless the same as the accuracy on other machines for the same precision.

We now comment on the accuracy results for some specific codes out of the many we benchmark, in view of the previous discussion about recurrence formulas and floating-point types. We have not performed a detailed error analysis of these codes, but have merely inspected the code's choice of trigonometric recurrence and drawn the most likely conclusions.

Both the Numerical Recipes (**nr-c** and **nr-f**) and
**ransom** codes use the same O(√N)-accuracy recurrence
algorithm with explicitly double-precision variables, which for
single-precision transforms produces accurate results (for the sizes
benchmarked). The Bloodworth (**bloodworth** and
**bloodworth-fht**) codes use this recurrence with variables of the
same precision as the FFT, but on x86 our compiler is apparently able
to keep these in extended-precision registers, producing accurate
results in both single and double precision; its accuracy degrades on
other CPUs, however. (The author, C. Bloodworth, was aware of this
issue: the original version of the code was in double precision, with
an option to use explicit extended-precision types for the
recurrence.) *Numerical Recipes* is only benchmarked in its
"native" single precision, but would exhibit the same problems if you
followed its authors' recommendation to make a double-precision code
by `s/float/double/g`

.

The **as117**, **as83**, and **as97** routines, as well as
the **singleton** code, all use the O(√N)-accuracy recurrence
as in *N. R.* above, but with single-precision variables and
correspondingly large errors. (as117 uses a different FFT algorithm
due to Bluestein, which is more expensive and which apparently
compounds the trig. errors.)

The **gsl-radix2** routines also use the O(√N)-accuracy
recurrence, with the recurrence in the same precision as the FFT so
that the transform accuracy is degraded. The faster
**gsl-mixed-radix** routines (based on fftpack), on the other hand,
use accurate precomputed twiddle tables.

The **monnier** code uses precomputed twiddle factors, but
computes the angles in the table via repeated addition instead of
multiplications, which gratuitously degrades the accuracy.

The double-precision **mixfft** code uses an O(N)-accuracy
recurrence algorithm, and employs small arrays as part of its
recurrence code that apparently prevent extended-precision registers
from being fully exploited on x86. The accuracy of **mixfft** is
further degraded because the code specifies certain constants with
fewer digits than the full double-precision accuracy (e.g.,
`7.0710678118655E-01`

). The **glassman**,
**napack**, and **sorensen-rsrfft** codes use the same
recurrence algorithm in single precision.

The double-precision **rmayer-unstable** routine uses the
O(N)-accuracy method above, while **rmayer-buneman** is a variant of
the same code that uses a recurrence algorithm by Buneman (1987)
achieving O(√log N) mean accuracy (like the FFT itself) via a
precomputed table of O(log N) twiddle factors.

The **cross** code uses an O(N^{2})-accuracy recurrence
algorithm, performed in double precision. For single-precision
transforms, the recurrence is still accurate enough not to degrade the
FFT accuracy until N=262144. For double-precision transforms (for which
the code is apparently unable to fully exploit extended-precision registers
on x86), however, this code is typically the least accurate of all the
routines we benchmark.

The accuracy of **fftw3** is slightly better on x86 machines
without SSE/SSE2/3DNow! than on machines with these instructions, because the
standard x86 floating-point unit computes intermediate results in
extended-precision, whereas SSE and SSE2 use single and double
precision, respectively. You can see the difference if you compare
e.g. the Pentium II results (no SIMD) to the Pentium IV results (SIMD
in single and double precision) or the AMD results (single-precision
SIMD).

- W. M. Gentleman and G. Sande, "Fast Fourier transforms—for fun and
profit,"
*Proc. AFIPS***29**, 563–578 (1966). - O. Buneman, "Stable online creation of sines or cosines of successive angles,"
*Proc. IEEE***75**(10), 1434–1435 (1987). - James C. Schatzman, "Accuracy of the discrete Fourier transform
and the fast Fourier transform,"
*SIAM J. Sci. Comput.***17**(5), 1150–1166 (1996). - Manfred Tasche and Hansmartin Zeuner, "Improved roundoff error
analysis for precomputed twiddle factors,"
*J. Computational Analysis and Applications***4**(1), 1–18 (2002). - William Kahan, "How Java's floating-point hurts everyone everywhere," http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf (2001), retrieved Aug. 15 2003. (See comments on accuracy and precision choices.)

Go back to the benchFFT accuracy results page.