
Internally, FFTW’s MPI transform algorithms work by first computing
transforms of the data local to each process, then globally
*transposing* the data in some fashion to redistribute it among the
processes, transforming the new data local to each process, and
transposing back. For example, a two-dimensional `n0` by `n1` array,
distributed across the `n0` dimension, is transformed by: (i)
transforming the `n1` dimension, which is local to each process; (ii)
transposing to an `n1` by `n0` array, distributed across the `n1`
dimension; (iii) transforming the `n0` dimension, which is now local
to each process; (iv) transposing back.

However, in many applications it is acceptable to compute a
multidimensional DFT whose results are produced in transposed order
(e.g., `n1` by `n0` in two dimensions). This provides a significant
performance advantage, because it means that the final transposition
step can be omitted. FFTW supports this optimization, which you
specify by passing the flag `FFTW_MPI_TRANSPOSED_OUT` to the planner
routines. To compute the inverse transform of transposed output, you
specify `FFTW_MPI_TRANSPOSED_IN` to tell the planner that the input is
transposed. In this section, we explain how to interpret the output
format of such a transform.

Suppose you are transforming multi-dimensional data with (at least
two) dimensions n_{0} × n_{1} × n_{2} × … × n_{d-1}. As always, it is
distributed along the first dimension n_{0}. Now, if we compute its
DFT with the `FFTW_MPI_TRANSPOSED_OUT` flag, the resulting output data
are stored with the first *two* dimensions transposed:
n_{1} × n_{0} × n_{2} × … × n_{d-1}, distributed along the n_{1}
dimension. Conversely, if we take the
n_{1} × n_{0} × n_{2} × … × n_{d-1} data and transform it with the
`FFTW_MPI_TRANSPOSED_IN` flag, then the format goes back to the
original n_{0} × n_{1} × n_{2} × … × n_{d-1} array.
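
To make the two layouts concrete, here is a minimal serial sketch for the three-dimensional case. It assumes ordinary row-major storage with no padding, and the helper names (`offset_natural`, `offset_transposed`, `transpose_first_two`) are hypothetical illustrations rather than FFTW functions. It only shows where a given element lives in each layout; it says nothing about how FFTW actually performs the distributed transposition.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of element (i0, i1, i2) in an n0 x n1 x n2 array. */
static ptrdiff_t offset_natural(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                                ptrdiff_t n1, ptrdiff_t n2)
{
    return (i0 * n1 + i1) * n2 + i2;
}

/* Row-major offset of the same element after the first two dimensions
   are swapped, i.e. in an n1 x n0 x n2 array. */
static ptrdiff_t offset_transposed(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                                   ptrdiff_t n0, ptrdiff_t n2)
{
    return (i1 * n0 + i0) * n2 + i2;
}

/* Copy src (n0 x n1 x n2) into dst (n1 x n0 x n2).
   The two buffers must be distinct; this is an out-of-place copy. */
static void transpose_first_two(const double *src, double *dst,
                                ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2)
{
    assert(src != dst);
    for (ptrdiff_t i0 = 0; i0 < n0; ++i0)
        for (ptrdiff_t i1 = 0; i1 < n1; ++i1)
            for (ptrdiff_t i2 = 0; i2 < n2; ++i2)
                dst[offset_transposed(i0, i1, i2, n0, n2)] =
                    src[offset_natural(i0, i1, i2, n1, n2)];
}
```

Only the first two indices trade places; the trailing dimension (and any further dimensions in a higher-dimensional array) keeps its position and its contiguity.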

There are two ways to find the portion of the transposed array that
resides on the current process. First, you can simply call the
appropriate ‘`local_size`’ function, passing
n_{1} × n_{0} × n_{2} × … × n_{d-1} (the transposed dimensions). This
would mean calling the ‘`local_size`’ function twice, once for the
transposed and once for the non-transposed dimensions. Alternatively,
you can call one of the ‘`local_size_transposed`’ functions, which
return both the non-transposed and transposed data distribution from
a single call. For example, for a 3d transform with transposed output
(or input), you might call:

    ptrdiff_t fftw_mpi_local_size_3d_transposed(
        ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
        ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
        ptrdiff_t *local_n1, ptrdiff_t *local_1_start);

Here, `local_n0` and `local_0_start` give the size and starting index
of the `n0` dimension for the *non*-transposed data, as in the
previous sections. For *transposed* data (e.g. the output for
`FFTW_MPI_TRANSPOSED_OUT`), `local_n1` and `local_1_start` give the
size and starting index of the `n1` dimension, which is the first
dimension of the transposed data (`n1` by `n0` by `n2`).
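
As a rough illustration of how `local_n1` and `local_1_start` might be used, the sketch below maps a global index triple to an offset within the local block of the transposed data. It assumes a complex transform whose local block is stored as a contiguous row-major `local_n1` by `n0` by `n2` array with no padding (real-data transforms pad the last dimension and are not covered here). The helper names `owns_transposed_row` and `local_offset_transposed_3d` are hypothetical, not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if this process holds global index i1 of the n1 dimension,
   given the local_n1 and local_1_start values for this process. */
static int owns_transposed_row(ptrdiff_t i1,
                               ptrdiff_t local_n1, ptrdiff_t local_1_start)
{
    return i1 >= local_1_start && i1 < local_1_start + local_n1;
}

/* Offset of global element (i0, i1, i2) inside the local block of the
   transposed (n1 by n0 by n2) data.  Assumption: the block is a
   row-major local_n1 x n0 x n2 array with no padding. */
static ptrdiff_t local_offset_transposed_3d(ptrdiff_t i0, ptrdiff_t i1,
                                            ptrdiff_t i2,
                                            ptrdiff_t n0, ptrdiff_t n2,
                                            ptrdiff_t local_n1,
                                            ptrdiff_t local_1_start)
{
    assert(owns_transposed_row(i1, local_n1, local_1_start));
    assert(i0 >= 0 && i0 < n0);
    assert(i2 >= 0 && i2 < n2);
    return ((i1 - local_1_start) * n0 + i0) * n2 + i2;
}
```

A process would typically check `owns_transposed_row` first, since an element whose `i1` index falls outside `[local_1_start, local_1_start + local_n1)` is stored on some other process.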

(Note that `FFTW_MPI_TRANSPOSED_IN` is completely equivalent to
performing `FFTW_MPI_TRANSPOSED_OUT` and passing the first two
dimensions to the planner in reverse order, or vice versa. If you
pass *both* the `FFTW_MPI_TRANSPOSED_IN` and
`FFTW_MPI_TRANSPOSED_OUT` flags, it is equivalent to swapping the
first two dimensions passed to the planner and passing *neither*
flag.)
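
The equivalences in this note can be checked with a small bookkeeping model that tracks only the shape of the first two dimensions of the input and output arrays. Everything in it is a hypothetical stand-in: the `SKETCH_` constants are not FFTW's flag values, and `shapes_for` is not an FFTW function. It models the data layout only, not the transform.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two planner flags.
   These are NOT the numeric values FFTW uses. */
enum { SKETCH_TRANSPOSED_IN = 1, SKETCH_TRANSPOSED_OUT = 2 };

/* First two dimensions of a distributed array, in storage order. */
typedef struct { ptrdiff_t rows, cols; } shape2;

/* Input and output layouts implied by one planner call. */
typedef struct { shape2 in, out; } plan_shapes;

/* Layouts for a planner call given dimensions (n0, n1) and flags:
   a TRANSPOSED flag swaps the first two dimensions of that array. */
static plan_shapes shapes_for(ptrdiff_t n0, ptrdiff_t n1, unsigned flags)
{
    plan_shapes s;
    int tin = (flags & SKETCH_TRANSPOSED_IN) != 0;
    int tout = (flags & SKETCH_TRANSPOSED_OUT) != 0;
    s.in.rows = tin ? n1 : n0;
    s.in.cols = tin ? n0 : n1;
    s.out.rows = tout ? n1 : n0;
    s.out.cols = tout ? n0 : n1;
    return s;
}

/* Nonzero if two planner calls imply identical input and output layouts. */
static int same_shapes(plan_shapes a, plan_shapes b)
{
    return a.in.rows == b.in.rows && a.in.cols == b.in.cols &&
           a.out.rows == b.out.rows && a.out.cols == b.out.cols;
}
```

Under this model, `shapes_for(n0, n1, SKETCH_TRANSPOSED_IN)` matches `shapes_for(n1, n0, SKETCH_TRANSPOSED_OUT)`, and passing both flags with `(n0, n1)` matches passing neither flag with `(n1, n0)`.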