Advanced distributed-transpose interface (FFTW 3.3.10)

6.7.2 Advanced distributed-transpose interface

The above routines are for a transpose of a matrix of numbers (of type double), using FFTW’s default block sizes. More generally, one can perform transposes of tuples of numbers, with user-specified block sizes for the input and output:

fftw_plan fftw_mpi_plan_many_transpose
                (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                 ptrdiff_t block0, ptrdiff_t block1,
                 double *in, double *out, MPI_Comm comm, unsigned flags);

In this case, one is transposing an n0 by n1 matrix of howmany-tuples (e.g., howmany = 2 for complex numbers). The input is distributed along the n0 dimension with block size block0, and the n1 by n0 output is distributed along the n1 dimension with block size block1. If FFTW_MPI_DEFAULT_BLOCK (0) is passed for a block size, FFTW uses its default block size. To get the local size of the data on each process, call fftw_mpi_local_size_many_transposed with the same block sizes.
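To make the tuple layout and the block distribution concrete, the following is a minimal serial sketch in plain C. It is not FFTW code: transpose_tuples and rows_on_rank are hypothetical helpers introduced only for illustration. The sketch assumes row-major storage with the howmany components of each tuple stored contiguously, and it assumes that rows are assigned to processes in rank order as contiguous blocks. In a real program the authoritative local sizes come from fftw_mpi_local_size_many_transposed, not from this arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial reference model (not part of FFTW).
 * Transposes an n0 x n1 row-major matrix whose elements are
 * howmany-tuples of doubles into an n1 x n0 row-major matrix.
 * Assumption: the howmany components of a tuple are contiguous. */
void transpose_tuples(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      const double *in, double *out)
{
    assert(n0 >= 0 && n1 >= 0 && howmany >= 0);
    for (ptrdiff_t i = 0; i < n0; ++i) {
        for (ptrdiff_t j = 0; j < n1; ++j) {
            /* Tuple (i, j) of the input becomes tuple (j, i) of the output. */
            const double *src = in + (i * n1 + j) * howmany;
            double *dst = out + (j * n0 + i) * howmany;
            for (ptrdiff_t k = 0; k < howmany; ++k)
                dst[k] = src[k];
        }
    }
}

/* Hypothetical helper (not part of FFTW).
 * Number of rows held by `rank` when n rows are split into contiguous
 * blocks of `block` rows, assigned in rank order. This is an assumed
 * model of a block distribution; the last nonempty block may be short,
 * and later ranks may hold no rows at all. */
ptrdiff_t rows_on_rank(ptrdiff_t n, ptrdiff_t block, int rank)
{
    assert(n >= 0 && block > 0 && rank >= 0);
    ptrdiff_t start = (ptrdiff_t)rank * block;
    if (start >= n)
        return 0;
    ptrdiff_t remaining = n - start;
    return remaining < block ? remaining : block;
}
```

For example, with n0 = 2, n1 = 3 and howmany = 2, the input tuple at row 1, column 0 lands at row 0, column 1 of the output with its two components in the same order. Under the assumed block model, splitting 100 rows with a block size of 7 leaves 2 rows on rank 14 and none on rank 15, which is why the local sizes should always be queried rather than assumed equal.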