Go back to the FFTW home page.

Experimental Results

In order to quantify the performance of FFTW versus that of other Fourier transform codes, we performed extensive benchmarks on a wide variety of platforms, for both one and three-dimensional transforms.

Many public-domain (and a few proprietary) FFTs were benchmarked along with FFTW. There are a staggering number of FFT implementations floating around; hopefully, this benchmark will put an end to the confusion and allow most of the FFTs to slip quietly into oblivion.

In addition to benchmarking the performance, we also measured the numerical accuracy of each of the codes.

Description of the Benchmark

The benchmark program calls each FFT for many iterations on an array of some size N and measures the time elapsed per iteration. This process was repeated for a variety of N in both one and three dimensions.

In order to keep the times comparable for different values of N, we graphed the time for one iteration divided by N log₂ N. (Recall that, in principle, the FFT takes O(N log N) time to compute. Thus, we are plotting the "constant" factor in this relation.)

It should be noted that, due to flaws in modern architectures and compilers, there are uncertainties in the benchmark results that have nothing to do with the timing accuracy.

Description of the Accuracy Measurements

In order to measure the numerical accuracy of an FFT, we performed the transform followed by the inverse transform of an array consisting of random numbers in the range [-1,1]. The array resulting from the two transforms was compared to the initial array, and the mean relative error of all the elements was computed. (The relative error is defined as |value - correct value| / |correct value|. Array elements for which the correct value was zero were ignored.)

For both one and three dimensional transforms, we calculated the mean error vs. array size for all the FFTs. The results are only presented for one platform since they were similar on all the machines. (Actually, there were a few codes that did not provide inverse transform routines; the error for these routines was not computed.)

FFT Software Used

We tried to incorporate as many FFT implementations as we could into the benchmark, drawing on repositories of mathematical software such as GAMS, Netlib, and others. A comprehensive list of these has been assembled, along with links to where they may be downloaded (for public-domain codes).

We would like to take this opportunity to thank the authors of the various FFTs, especially those who have placed their code in the public domain. Without their aid as a reference, their experiences to learn from, and the stimulation of their competition, FFTW would never have been developed.

Benchmark Results

Graphs of the results on the various platforms can be found through the links below. To summarize our findings, FFTW outperformed nearly all other transform codes for most input sizes. There were two exceptions to this rule.

First, the CWP prime factor algorithm implementation was sometimes faster for very large arrays (in one dimension). It should be noted that prime factor algorithms place strong restrictions on the size of the input array. In particular, CWP does not work for sizes that are powers of two, and thus all the results for CWP were measured on input arrays slightly bigger than the ones used for the other FFTs. In some sense, then, this is comparing apples and oranges. Nevertheless, in many applications people do not really care about the exact size of the problem, and therefore CWP becomes a valid alternative to FFTW. We hope to incorporate some sort of prime factor algorithm in a future release of FFTW. (Actually, FFTW v1.1 does use the prime-factor algorithm in the codelets for composite base cases such as 6, 12, and 15.)

Second, the Cray C90 SCILIB beat FFTW on that machine, but this is a special case (see the C90 results for more information.) We also benchmarked other vendor-optimized codes such as SUNPERF and ESSL; FFTW compares very favorably with these codes--see the charts for more details. ESSL was faster for 3d transforms of sizes that weren't powers of two, as it also apparently uses a prime factor algorithm.

The reader should be aware that not all of the FFTs listed in the graph legends will be visible in every plot. There are several reasons for this. First of all, some routines (fft2 and Beauregard especially) were so slow that they are usually completely off the scale of the graphs (which were designed for comparison of the fastest routines, not the slowest). Second, in the graphs for array sizes that are not powers of two, some routines do not appear because they do not support those transform sizes. Finally, the data points for FFTW_ESTIMATE are often on top of those for FFTW, obscuring the latter.

Benchmark Results by Platform

(In all cases, the code was compiled with the maximum available level of optimization.)

Apple PowerMac 9500/120 (120 MHz PowerPC 604). Compiled with CodeWarrior 11.
IBM RS/6000. Compiled with cc -O3 -qarch=pwrx -qtune=pwrx.
166MHz Pentium (under Linux). Compiled with gcc.
200MHz Pentium Pro (under Linux). Compiled with gcc.
167MHz Sun UltraSPARC-I. Compiled with cc dalign -fast -native -xO5.
SGI Onyx, with an R10000 processor. Compiled with gcc.
Cray C90 at the San Diego Supercomputing Center. Compiled with cc -O3 -h align.

Accuracy Results

Go back to the FFTW home page.