FFTW FAQ - Section 3
Using FFTW

Q3.1. Why not support the FFTW 2 interface in FFTW 3?
Q3.2. Why do FFTW 3 plans encapsulate the input/output arrays and not just the algorithm?
Q3.3. FFTW seems really slow.
Q3.4. FFTW slows down after repeated calls.
Q3.5. An FFTW routine is crashing when I call it.
Q3.6. My Fortran program crashes when calling FFTW.
Q3.7. FFTW gives results different from my old FFT.
Q3.8. FFTW gives different results between runs
Q3.9. Can I save FFTW's plans?
Q3.10. Why does your inverse transform return a scaled result?
Q3.11. How can I make FFTW put the origin (zero frequency) at the center of its output?
Q3.12. How do I FFT an image/audio file in foobar format?
Q3.13. My program does not link (on Unix).
Q3.14. I included your header, but linking still fails.
Q3.15. My program crashes, complaining about stack space.
Q3.16. FFTW seems to have a memory leak.
Q3.17. The output of FFTW's transform is all zeros.
Q3.18. How do I call FFTW from the Microsoft language du jour?
Q3.19. Can I compute only a subset of the DFT outputs?
Q3.20. Can I use FFTW's routines for in-place and out-of-place matrix transposition?

Question 3.1. Why not support the FFTW 2 interface in FFTW 3?

FFTW 3 has semantics incompatible with earlier versions: its plans can only be used for a given stride, multiplicity, and other characteristics of the input and output arrays; these stronger semantics are necessary for performance reasons. Thus, it is impossible to efficiently emulate the older interface (whose plans can be used for any transform of the same size). We believe that it should be possible to upgrade most programs without any difficulty, however.

Question 3.2. Why do FFTW 3 plans encapsulate the input/output arrays and not just the algorithm?

There are several reasons:

It was important for performance reasons that the plan be specific to array characteristics like the stride (and alignment, for SIMD), and requiring that the user maintain these invariants is error prone.
In most high-performance applications, as far as we can tell, you are usually transforming the same array over and over, so FFTW's semantics should not be a burden.
If you need to transform another array of the same size, creating a new plan once the first exists is a cheap operation.
If you need to transform many arrays of the same size at once, you should really use the plan_many routines in FFTW's "advanced" interface.
If the abovementioned array characteristics are the same, you are willing to pay close attention to the documentation, and you really need to, we provide a "new-array execution" interface to apply a plan to a new array.

Question 3.3. FFTW seems really slow.

You are probably recreating the plan before every transform, rather than creating it once and reusing it for all transforms of the same size. FFTW is designed to be used in the following way:

First, you create a plan. This will take several seconds.
Then, you reuse the plan many times to perform FFTs. These are fast.

If you don't need to compute many transforms and the time for the planner is significant, you have two options. First, you can use the FFTW_ESTIMATE option in the planner, which uses heuristics instead of runtime measurements and produces a good plan in a short time. Second, you can use the wisdom feature to precompute the plan; see Q3.9 `Can I save FFTW's plans?'

Question 3.4. FFTW slows down after repeated calls.

Probably, NaNs or similar are creeping into your data, and the slowdown is due to the resulting floating-point exceptions. For example, be aware that repeatedly FFTing the same array is a diverging process (because FFTW computes the unnormalized transform).

Question 3.5. An FFTW routine is crashing when I call it.

Did the FFTW test programs pass (make check, or cd tests; make bigcheck if you want to be paranoid)? If so, you almost certainly have a bug in your own code. For example, you could be passing invalid arguments (such as wrongly-sized arrays) to FFTW, or you could simply have memory corruption elsewhere in your program that causes random crashes later on. Please don't complain to us unless you can come up with a minimal self-contained program (preferably under 30 lines) that illustrates the problem.

Question 3.6. My Fortran program crashes when calling FFTW.

As described in the manual, on 64-bit machines you must store the plans in variables large enough to hold a pointer, for example integer*8. We recommend using integer*8 on 32-bit machines as well, to simplify porting.

Question 3.7. FFTW gives results different from my old FFT.

People follow many different conventions for the DFT, and you should be sure to know the ones that we use (described in the FFTW manual). In particular, you should be aware that the FFTW_FORWARD/FFTW_BACKWARD directions correspond to signs of -1/+1 in the exponent of the DFT definition. (Numerical Recipes uses the opposite convention.)

You should also know that we compute an unnormalized transform. In contrast, Matlab is an example of program that computes a normalized transform. See Q3.10 `Why does your inverse transform return a scaled result?'.

Finally, note that floating-point arithmetic is not exact, so different FFT algorithms will give slightly different results (on the order of the numerical accuracy; typically a fractional difference of 1e-15 or so in double precision).

Question 3.8. FFTW gives different results between runs

If you use FFTW_MEASURE or FFTW_PATIENT mode, then the algorithm FFTW employs is not deterministic: it depends on runtime performance measurements. This will cause the results to vary slightly from run to run. However, the differences should be slight, on the order of the floating-point precision, and therefore should have no practical impact on most applications.

If you use saved plans (wisdom) or FFTW_ESTIMATE mode, however, then the algorithm is deterministic and the results should be identical between runs.

Question 3.9. Can I save FFTW's plans?

Yes. Starting with version 1.2, FFTW provides the wisdom mechanism for saving plans; see the FFTW manual.

Question 3.10. Why does your inverse transform return a scaled result?

Computing the forward transform followed by the backward transform (or vice versa) yields the original array scaled by the size of the array. (For multi-dimensional transforms, the size of the array is the product of the dimensions.) We could, instead, have chosen a normalization that would have returned the unscaled array. Or, to accomodate the many conventions in this matter, the transform routines could have accepted a "scale factor" parameter. We did not do this, however, for two reasons. First, we didn't want to sacrifice performance in the common case where the scale factor is 1. Second, in real applications the FFT is followed or preceded by some computation on the data, into which the scale factor can typically be absorbed at little or no cost.

Question 3.11. How can I make FFTW put the origin (zero frequency) at the center of its output?

For human viewing of a spectrum, it is often convenient to put the origin in frequency space at the center of the output array, rather than in the zero-th element (the default in FFTW). If all of the dimensions of your array are even, you can accomplish this by simply multiplying each element of the input array by (-1)^(i + j + ...), where i, j, etcetera are the indices of the element. (This trick is a general property of the DFT, and is not specific to FFTW.)

Question 3.12. How do I FFT an image/audio file in foobar format?

FFTW performs an FFT on an array of floating-point values. You can certainly use it to compute the transform of an image or audio stream, but you are responsible for figuring out your data format and converting it to the form FFTW requires.

Question 3.13. My program does not link (on Unix).

The libraries must be listed in the correct order (-lfftw3 -lm for FFTW 3.x) and after your program sources/objects. (The general rule is that if A uses B, then A must be listed before B in the link command.).

Question 3.14. I included your header, but linking still fails.

You're a C++ programmer, aren't you? You have to compile the FFTW library and link it into your program, not just #include <fftw3.h>. (Yes, this is really a FAQ.)

Question 3.15. My program crashes, complaining about stack space.

You cannot declare large arrays with automatic storage (e.g. via fftw_complex array[N]); you should use fftw_malloc (or equivalent) to allocate the arrays you want to transform if they are larger than a few hundred elements.

Question 3.16. FFTW seems to have a memory leak.

After you create a plan, FFTW caches the information required to quickly recreate the plan. (See Q3.9 `Can I save FFTW's plans?') It also maintains a small amount of other persistent memory. You can deallocate all of FFTW's internally allocated memory, if you wish, by calling fftw_cleanup(), as documented in the manual.

Question 3.17. The output of FFTW's transform is all zeros.

You should initialize your input array after creating the plan, unless you use FFTW_ESTIMATE: planning with FFTW_MEASURE or FFTW_PATIENT overwrites the input/output arrays, as described in the manual.

Question 3.18. How do I call FFTW from the Microsoft language du jour?

Please do not ask us Windows-specific questions. We do not use Windows. We know nothing about Visual Basic, Visual C++, or .NET. Please find the appropriate Usenet discussion group and ask your question there. See also Q2.2 `Does FFTW run on Windows?'.

Question 3.19. Can I compute only a subset of the DFT outputs?

In general, no, an FFT intrinsically computes all outputs from all inputs. In principle, there is something called a pruned FFT that can do what you want, but to compute K outputs out of N the complexity is in general O(N log K) instead of O(N log N), thus saving only a small additive factor in the log. (The same argument holds if you instead have only K nonzero inputs.)

There are some specific cases in which you can get the O(N log K) performance benefits easily, however, by combining a few ordinary FFTs. In particular, the case where you want the first K outputs, where K divides N, can be handled by performing N/K transforms of size K and then summing the outputs multiplied by appropriate phase factors. For more details, see pruned FFTs with FFTW.

There are also some algorithms that compute pruned transforms approximately, but they are beyond the scope of this FAQ.

Question 3.20. Can I use FFTW's routines for in-place and out-of-place matrix transposition?

You can use the FFTW guru interface to create a rank-0 transform of vector rank 2 where the vector strides are transposed. (A rank-0 transform is equivalent to a 1D transform of size 1, which. just copies the input into the output.) Specifying the same location for the input and output makes the transpose in-place.

For double-valued data stored in row-major format, plan creation looks like this:

fftw_plan plan_transpose(int rows, int cols, double *in, double *out)
{
    const unsigned flags = FFTW_ESTIMATE; /* other flags are possible */
    fftw_iodim howmany_dims[2];

    howmany_dims[0].n  = rows;
    howmany_dims[0].is = cols;
    howmany_dims[0].os = 1;

    howmany_dims[1].n  = cols;
    howmany_dims[1].is = 1;
    howmany_dims[1].os = rows;

    return fftw_plan_guru_r2r(/*rank=*/ 0, /*dims=*/ NULL,
                              /*howmany_rank=*/ 2, howmany_dims,
                              in, out, /*kind=*/ NULL, flags);
}

(This entry was written by Rhys Ulerich.)

Next: Internals of FFTW.
Back: Installing FFTW.
Return to contents.

Matteo Frigo and Steven G. Johnson / fftw@fftw.org - 04 March 2014

FFTW FAQ - Section 3 Using FFTW

FFTW FAQ - Section 3
Using FFTW