
Parallel FFTW on a Sun HPC 5000

First, we measured the speedup of the Cilk and threads parallel FFTW codes relative to the uniprocessor FFTW. The speedup is plotted below as a function of transform size for 1, 2, 4, and 8 processors, for both one- and three-dimensional transforms.
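Speedup here is the time of the uniprocessor FFTW divided by the time of the parallel code on p processors, for the same transform size. Because the baseline is the uniprocessor FFTW and not the parallel code run on one processor, any overhead in the parallel code appears as a speedup below 1 at p = 1. The following is a minimal sketch of that arithmetic, assuming timings in seconds. The function names and the dictionary input format are hypothetical illustrations, not part of FFTW.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup of a parallel run relative to the uniprocessor baseline.

    t_serial:   time of the uniprocessor FFTW for one transform size
    t_parallel: time of the parallel code on p processors, same size
    """
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Parallel efficiency: speedup divided by the number of processors."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t_serial, t_parallel) / p


def speedup_table(t_serial: float, t_parallel_by_p: dict) -> dict:
    """Map each processor count p to its speedup for one transform size.

    t_parallel_by_p: hypothetical format {p: time on p processors}
    """
    return {p: speedup(t_serial, t) for p, t in sorted(t_parallel_by_p.items())}
```

For example, a transform taking 8 s on the uniprocessor FFTW and 2 s on 8 processors has a speedup of 4 and an efficiency of 0.5. These numbers are illustrative only, not measurements from the plots.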

Cilk and Threads FFTW, 1D Transforms

Cilk and Threads FFTW, 3D Transforms

Next, we made the same measurements for the MPI version of FFTW, using three-dimensional transforms. The MPI library used here is a shared-memory implementation of MPI.

We ran two versions of this transform. The first returns its output in the same format as its input. The second returns its output in a transposed format, which is faster because it saves the cost of an extra transpose.
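To see where the extra transpose comes from, consider a serial two-dimensional analogue. A multidimensional transform can be computed as 1D transforms along the rows, a transpose, and 1D transforms along the new rows. The result then comes out in transposed order, and restoring the input's layout costs one more transpose. In a distributed setting, assuming each process holds a slab of the array, a transpose means exchanging data among processes, so skipping it is a real saving. This is a minimal sketch under stated assumptions, not FFTW's MPI code. It uses a naive DFT instead of an FFT, a 2D array instead of 3D, and a single process. The names `dft`, `transpose`, and `fft2d` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) forward 1D DFT, for illustration only (not an FFT)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def fft2d(a, transposed_output=False):
    """2D DFT as row transforms, a transpose, then row transforms again.

    After the second pass the result is indexed [k2][k1] (transposed order).
    With transposed_output=True it is returned as-is, skipping the final
    transpose; otherwise one extra transpose restores the [k1][k2] layout.
    """
    rows_done = [dft(row) for row in a]       # transform along the last dimension
    t = transpose(rows_done)                  # transpose needed by the algorithm
    cols_done = [dft(row) for row in t]       # transform along the first dimension
    if transposed_output:
        return cols_done                      # element [k2][k1]
    return transpose(cols_done)               # extra transpose -> element [k1][k2]
```

For example, `fft2d([[1, 2, 3], [4, 5, 6]])` returns a 2×3 array whose `[0][0]` entry is the sum of all inputs (21), while `fft2d([[1, 2, 3], [4, 5, 6]], transposed_output=True)` returns the same values laid out as a 3×2 array.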

MPI FFTW, 3D Transforms
