The FFTW Release Notes
This document describes the new features and changes in each release
of FFTW.
FFTW 3.3.3
Nov 25, 2012
- Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
bug report and patch, and to Graham Dennis for the bug report).
- Use 128-bit ARM NEON instructions instead of 64-bit instructions. This change
appears to speed up even ARM processors with a 64-bit NEON pipe.
- Speed improvements for single-precision AVX.
- Speed up planner on machines without "official" cycle counters, such as ARM.
FFTW 3.3.2
May 6, 2012
- Removed an archaic stack-alignment hack that was failing with
gcc-4.7/i386.
- Added stack-alignment hack necessary for gcc on Windows/i386. We
will regret this in ten years (see previous change).
- Fix incompatibility with Intel icc which pretends to be gcc
but does not support quad precision.
- Make libfftw{threads,mpi} depend upon libfftw when using libtool;
this is consistent with most other libraries and simplifies the life
of various distributors of GNU/Linux.
FFTW 3.3.1
Feb 25, 2012
- Reduced planning time in estimate mode for sizes with large
prime factors.
- Added AVX autodetection under Visual Studio. Thanks Carsten
Steger for submitting the necessary code.
- Modern Fortran interface now uses a separate fftw3l.f03 interface
file for the long double interface, which is not supported by
some Fortran compilers. Provided new fftw3q.f03 interface file
to access the quadruple-precision FFTW routines with recent
versions of gcc/gfortran.
FFTW 3.3.1-beta1
Aug 21, 2011
- Added support for the NEON extensions to the ARM ISA. (Note to beta
users: an ARM cycle counter is not yet implemented; please contact
fftw@fftw.org if you know how to do it right.)
- MPI code now compiles even if mpicc is a C++ compiler; thanks to
Kyle Spyksma for the bug report.
FFTW 3.3
Jul 26, 2011
- Compiling OpenMP support (--enable-openmp) now installs a
fftw3_omp library, instead of fftw3_threads, so that OpenMP and
POSIX threads (--enable-threads) libraries can be built and
installed at the same time.
- Various minor compilation fixes, corrections of manual typos, and
improvements to the benchmark test program.
FFTW 3.3-beta1
June 27, 2011
- Add support for the AVX extensions to x86 and x86-64. The AVX code
works with 16-byte alignment (as opposed to 32-byte alignment),
so there is no ABI change compared to FFTW 3.2.2.
- Added Fortran 2003 interface, which should be usable on most modern
Fortran compilers (e.g. gfortran) and provides type-checked access
to the C FFTW interface. (The legacy Fortran-77 interface is
still included also.)
- Added MPI distributed-memory transforms. Compared to 3.3alpha,
the major changes in the MPI transforms are:
- Fixed some deadlock and crashing bugs.
- Added Fortran 2003 interface.
- Added new-array execute functions for MPI plans.
- Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
thanks to Jonathan Bentz for the bug report.
- Expanded documentation.
- make check now runs MPI tests
- Some ABI changes — not binary-compatible with 3.3alpha MPI.
- Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
x86-64, and Itanium). The new routines use the fftwq_ prefix.
- Removed support for MIPS paired-single instructions due to lack of
available hardware for testing. Users who want this functionality
should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
on MIPS; this only concerns special instructions available on some
MIPS chips.)
- Removed support for the Cell Broadband Engine. Cell users should
use FFTW 3.2.x.
- New convenience functions fftw_alloc_real and fftw_alloc_complex
to use fftw_malloc for real and complex arrays without typecasts
or sizeof.
- New convenience functions fftw_export_wisdom_to_filename and
fftw_import_wisdom_from_filename that export/import wisdom
to a file, which don't require you to open/close the file yourself.
- New function fftw_cost to return FFTW's internal cost metric for
a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
suggestion.
- The --enable-sse2 configure flag now works in both double and single
precision (and is equivalent to --enable-sse in the latter case).
- Remove --enable-portable-binary flag: we now produce portable binaries
by default.
- Remove the automatic detection of native architecture flag for gcc
which was introduced in fftw-3.1, since new gcc supports
-mtune=native.
Remove the --with-gcc-arch flag; if you want to specify a particular
arch to configure, use ./configure CC="gcc -mtune=...".
- --with-our-malloc16 configure flag is now renamed --with-our-malloc.
- Fixed build failure when the srand48 declaration is missing;
thanks to Ralf Wildenhues for the bug report.
- Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
is equivalent to no timelimit in all cases. Thanks to William Andrew
Burnson for the bug report.
- Fixed stack-overflow problem on OpenBSD caused by using alloca with
too large a buffer.
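The fftw_set_timelimit fix above comes down to one rule: a negative limit must behave as if no limit were set. The following is a minimal sketch of that rule only. The names timelimit_expired and NO_TIMELIMIT are hypothetical, chosen for illustration, and the code is not taken from FFTW's planner.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel for this sketch: any negative value means "no limit". */
#define NO_TIMELIMIT (-1.0)

/* Returns true when the (rough) planning budget has been used up.
   A negative limit never expires, whatever its magnitude. */
bool timelimit_expired(double elapsed_seconds, double limit_seconds)
{
    assert(elapsed_seconds >= 0.0);
    if (limit_seconds < 0.0)
        return false;   /* negative limit: behave as if no limit were set */
    return elapsed_seconds >= limit_seconds;
}
```

Treating every negative value the same way, rather than comparing against one particular sentinel, is what makes "a negative timelimit" and "no timelimit" equivalent in all cases.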
FFTW 3.3alpha1
November 15, 2008
- Added back in MPI code from 3.2alpha3.
FFTW 3.2.2
July 14, 2009
- Improve performance of some copy operations of complex arrays on
x86 machines.
- Add configure flag to disable alloca(), which is broken in mingw64.
- Planning in FFTW_ESTIMATE mode for r2r transforms became slower
between fftw-3.1.3 and 3.2. This regression has now been fixed.
FFTW 3.2.1
February 14, 2009
- Performance improvements for some multidimensional r2c/c2r transforms;
thanks to Eugene Miloslavsky for his benchmark reports.
- Compile with icc on MacOS X, use better icc compiler flags.
- Compilation fixes for systems where snprintf is defined as a macro;
thanks to Marcus Mae for the bug report.
- Fortran documentation now recommends not using dfftw_execute,
because of reports of problems with various Fortran compilers;
it is better to use dfftw_execute_dft etcetera.
- Some documentation clarifications, e.g. of the fact that --enable-openmp
and --enable-threads are mutually exclusive (thanks to Long To),
and document slightly odd behavior of plan_guru_r2r in Fortran
(thanks to Alexander Pozdneev).
- FAQ was accidentally omitted from 3.2 tarball.
- Remove some extraneous (harmless) files accidentally included in
a subdirectory of the 3.2 tarball.
FFTW 3.2
November 15, 2008
- Worked around apparent glibc bug that leads to rare hangs when freeing
semaphores.
- Fixed segfault due to unaligned access in certain obscure problems
that use SSE and multiple threads.
- MPI transforms not included, as they are still in alpha; the alpha
versions of the MPI transforms have been moved to FFTW 3.3alpha1.
FFTW 3.2-alpha3
Nov 13, 2007
- Performance improvements for sizes with factors of 5 and 10.
- Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
Emmenlauer and Phil Dumont.
- Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
- Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
for the suggestions.
- Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
- Fixed incorrect type prefix in MPI code that prevented wisdom routines
from working in single precision (thanks to Eric A. Borisch for the report).
- Added make check for MPI code (which still fails in a couple corner
cases, but should be much better than in alpha2).
- Many other small fixes.
FFTW 3.2-alpha2
Mar 19, 2007
- Support for the Cell processor, donated by IBM Research; see also
README.Cell and the Cell section of the manual.
- New 64-bit API: for every "plan_guru" function there is a new
"plan_guru64" function with the same semantics, but which takes
fftw_iodim64 instead of fftw_iodim. fftw_iodim64 is the same as
fftw_iodim, except that it takes ptrdiff_t integer types as parameters,
which is a 64-bit type on 64-bit machines. This is only useful for
specifying very large transforms on 64-bit machines. (Internally, FFTW
uses ptrdiff_t everywhere regardless of what API you choose.)
- Experimental MPI support. Complex one- and multi-dimensional FFTs,
multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
distributed transpose operations, with 1d block distributions.
(This is an alpha preview: routines have not been exhaustively
tested, documentation is incomplete, and some functionality is
missing, e.g. Fortran support.) See mpi/README and also the MPI
section of the manual.
- Significantly faster r2c transforms, especially on machines with SIMD.
- Rewritten multi-threaded support for better performance by
re-using a fixed pool of threads rather than continually
respawning and joining (which nowadays is much slower).
- Support for MIPS paired-single SIMD instructions, donated by
Codesourcery.
- FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
available and return NULL otherwise.
- Removed k7 support, which only worked in 32-bit mode and is
becoming obsolete. Use --enable-sse instead.
- Added --with-g77-wrappers configure option to force inclusion
of g77 wrappers, in addition to whatever is needed for the
detected Fortran compilers. This is mainly intended for GNU/Linux
distros switching to gfortran, but wishing to include both
gfortran and g77 support in FFTW.
- In manual, renamed "guru execute" functions to "new-array execute"
functions, to reduce confusion with the guru planner interface.
(The programming interface is unchanged.)
- Add missing __declspec attribute to threads API functions when compiling
for Windows (thanks to Robert O. Morris for the bug report).
- Fixed missing return value from dfftw_init_threads in Fortran;
thanks to Markus Wetzstein for the bug report.
FFTW 3.1.3
Oct 7, 2008
- Bug fix: FFTW computes incorrect results when the user plans both
REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
by incorrect sharing of twiddle-factor tables between the two
transforms, and only occurs when both are used. Thanks to Paul
A. Valiant for the bug report.
FFTW 3.1.2
July 5, 2006
- Correct bug in configure script: --enable-portable-binary option was ignored!
Thanks to Andrew Salamon for the bug report.
- Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
either if we are using gcc. Thanks to Guy Moebs for the bug report.
- Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
and suggest a workaround.
- configure script now detects Core/Duo arch.
- Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
thanks to Markus Dittrich.
FFTW 3.1.1
March 18, 2006
- Performance improvements for Intel EM64T.
- Performance improvements for large-size transforms with SIMD.
- Cycle counter support for Intel icc and Visual C++ on x86-64.
- In fftw-wisdom tool, replaced obsolete --impatient with --measure.
- Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
- Windows DLL support for Fortran API (added missing
__declspec(dllexport)).
- SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
CPUs lacking a CPUID instruction; thanks to Eric Korpela.
FFTW 3.1
January 28, 2006
- Faster FFTW_ESTIMATE planner.
- New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
- "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
- Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
- Faster in-place non-square transpositions (FFTW uses these internally
for in-place FFTs, and you can also perform them explicitly using
the guru interface).
- Faster prime-size DFTs: implemented Bluestein's algorithm, as well
as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
- SIMD support for split complex arrays.
- Much faster Altivec/VMX performance.
- New fftw_set_timelimit function to specify a (rough) upper bound to the
planning time (does not affect ESTIMATE mode).
- Removed --enable-3dnow support; use --enable-k7 instead.
- FMA (fused multiply-add) version is now included in "standard" FFTW,
and is enabled with --enable-fma (the default on PowerPC and Itanium).
- Automatic detection of native architecture flag for gcc. New
configure options: --enable-portable-binary and --with-gcc-arch=,
for people distributing compiled binaries of FFTW (see manual).
- Automatic detection of Altivec under Linux with gcc 3.4 (so that
same binary should work on both Altivec and non-Altivec PowerPCs).
- Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
Solaris/Intel.
- Various documentation clarifications.
- 64-bit clean. (Fixes a bug affecting the split guru planner on
64-bit machines, reported by David Necas.)
- Fixed Debian bug #259612: inadvertent use of SSE instructions on
non-SSE machines (causing a crash) for --enable-sse binaries.
- Fixed bug that caused HC2R transforms to destroy the input in
certain cases, even if the user specified FFTW_PRESERVE_INPUT.
- Fixed bug where wisdom would be lost under rare circumstances,
causing excessive planning time.
- FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
- Fixed accidentally exported symbol that prohibited simultaneous
linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
- Support Win32 threads under MinGW (thanks to Alessio Massaro).
- Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.
- Fix build failure if no Fortran compiler is found (thanks to Charles
Radley for the bug report).
- Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
detection of icc architecture flag (e.g. -xW).
- Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).
- Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).
- Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
but its malloc is 16-byte aligned).
- Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
reports/fixes). Added x86-64 cycle counter for PGI compilers,
courtesy Cristiano Calonaci.
- Fix compilation problem in test program due to C99 conflict.
- Portability fix for import_system_wisdom with djgpp (thanks to Juan
Manuel Guerrero).
- Fixed compilation failure on MacOS 10.3 due to getopt conflict.
- Work around Visual C++ (version 6/7) bug in SSE compilation;
thanks to Eddie Yee for his detailed report.
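Several entries above (the FreeBSD malloc note and the SSE/SSE2 fixes) depend on whether a buffer is 16-byte aligned. As a rough illustration, an alignment check in C can be sketched as below. The helper is_aligned is a hypothetical name for this example and is not an FFTW function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if ptr is a multiple of `alignment` bytes.
   `alignment` must be a nonzero power of two (16 in the SSE case). */
bool is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power of two, the low bits of the address are the remainder. */
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

A caller could, for example, test is_aligned(buf, 16) on a buffer returned by the system malloc to see whether that allocator already provides the alignment that SIMD loads expect.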
Changes from FFTW 3.1 beta 2
- Several minor compilation fixes.
- Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
fftw_set_timelimit function. Make wisdom work with time-limited plans.
Changes from FFTW 3.1 beta 1
- Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.
- Fixed more 64-bit problems, thanks to John Pavel for the bug report.
- Further speed improvements for Altivec/VMX.
- Further speed improvements for non-square transpositions.
- Many minor tweaks.
FFTW 3.0.1
July 6, 2003.
- Some speed improvements in SIMD code.
- --without-cycle-counter option is removed. If no cycle counter is found,
then the estimator is always used. A --with-slow-timer option is provided
to force the use of lower-resolution timers.
- Several fixes for compilation under Visual C++, with help from Stefane Ruel.
- Added x86 cycle counter for Visual C++, with help from Morten Nissov.
- Added S390 cycle counter, courtesy of James Treacy.
- Added missing static keyword that prevented simultaneous linkage
of different-precision versions; thanks to Rasmus Larsen for the bug report.
- Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
- Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.
- Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
preprocessor limits; thanks to Peter Vouras for the bug report.
- Removed non-portable use of tempfile in fftw-wisdom-to-conf script;
thanks to Nicolas Decoster for the patch.
- Added make smallcheck target in tests/ directory, at the request of
James Treacy.
FFTW 3.0
April 20, 2003.
Major goals of this release:
- Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
- Complete rewrite, to make it easier to add new algorithms and transforms.
- New API, to support more general semantics.
Other enhancements:
- SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
(With special thanks to Franz Franchetti for many experimental prototypes
and to Stefan Kral for the vectorizing generator from fftwgel.)
- True in-place 1d transforms of large sizes (as well as compressed
twiddle tables for additional memory/cache savings).
- More arbitrary placement of real & imaginary data, e.g. including
interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
- Efficient prime-size transforms of real data.
- Multidimensional transforms can operate on a subset of a larger matrix,
and/or transform selected dimensions of a multidimensional array.
- By popular demand, simultaneous linking to double precision (fftw),
single precision (fftwf), and long-double precision (fftwl) versions
of FFTW is now supported.
- Cycle counters (on all modern CPUs) are exploited to speed planning.
- Efficient transforms of real even/odd arrays, a.k.a. discrete
cosine/sine transforms (types I-IV). (Currently work via pre/post
processing of real transforms, ala FFTPACK, so are not optimal.)
- DHTs (Discrete Hartley Transforms), again via post-processing
of real transforms (and thus suboptimal, for now).
- Support for linking to just those parts of FFTW that you need,
greatly reducing the size of statically linked programs when
only a limited set of transform sizes/types are required.
- Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
with a command-line tool (fftw-wisdom) to generate/update it.
- Fortran API can be used with both g77 and non-g77 compilers
simultaneously.
- Multi-threaded version has optional OpenMP support.
- Authors' good looks have greatly improved with age.
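The list above mentions two placements of complex data: interleaved (real and imaginary parts alternating in one array, as in FFTW 2.x) and separate real/imag arrays. The sketch below only illustrates the difference between the two layouts by copying from one to the other. The function split_complex is a hypothetical helper and not part of FFTW, which can operate on either layout directly.

```c
#include <assert.h>
#include <stddef.h>

/* Copies n complex values stored as re0, im0, re1, im1, ... into two
   separate arrays, one holding the real parts and one the imaginary parts. */
void split_complex(const double *interleaved, double *re, double *im, size_t n)
{
    assert(interleaved != NULL && re != NULL && im != NULL);
    for (size_t i = 0; i < n; ++i) {
        re[i] = interleaved[2 * i];      /* even slots hold real parts */
        im[i] = interleaved[2 * i + 1];  /* odd slots hold imaginary parts */
    }
}
```

With input {1, 2, 3, 4, 5, 6} and n = 3, the real array receives 1, 3, 5 and the imaginary array receives 2, 4, 6.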
Changes from 3.0beta3:
- Separate FMA distribution to better exploit fused multiply-add instructions
on PowerPC (and possibly other) architectures.
- Performance improvements via some inlining tweaks.
- fftw_flops now returns double arguments, not int, to avoid overflows
for large sizes.
- Workarounds for automake bugs.
Changes from 3.0beta2:
- The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
we replaced it with a slower routine that is more accurate.
- The guru planner and execute functions now have two variants, one that
takes complex arguments and one that takes separate real/imag pointers.
- Execute and planner routines now automatically align the stack on x86,
in case the calling program is misaligned.
- README file for test program.
- Fixed bugs in the combination of SIMD with multi-threaded transforms.
- Eliminated internal fftw_threads_init function, which some people were
calling accidentally instead of the fftw_init_threads API function.
- Check for -openmp flag (Intel C compiler) when --enable-openmp is used.
- Support AMD x86-64 SIMD and cycle counter.
- Support SSE2 intrinsics in forthcoming gcc 3.3.
Changes from 3.0beta1:
- Faster in-place 1d transforms of non-power-of-two sizes.
- SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
transforms.
- Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
default distribution only includes hard-coded size-8 DCT-II/III, however.
- Many minor improvements to the manual. Added section on using the
codelet generator to customize and enhance FFTW.
- The default 'make check' should now only take a few minutes; for more
strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
- fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
the latter uses stdout.
- Fixed ability to compile with a C++ compiler.
- Fixed support for C99 complex type under glibc.
- Fixed problems with alloca under MinGW, AIX.
- Workaround for gcc/SPARC bug.
- Fixed multi-threaded initialization failure on IRIX due to lack of
user-accessible PTHREAD_SCOPE_SYSTEM there.
FFTW 2.1.5
March 24, 2003.
- Bug fix: Fortran wrappers were disabled in version 2.1.4.
FFTW 2.1.4
March 16, 2003.
- Upgraded to newer versions of autoconf, etcetera, to fix compilation
problems on various recent systems.
- The configure script no longer picks the wrong architecture flags
(which caused FFTW to crash) on newer IBM POWER machines running AIX.
- Multi-threaded transforms should now utilize multiple CPUs on
Solaris (which creates threads in single-processor mode by default).
- Added experimental support for OpenMP (and SGI MP) compiler
parallelization directives in the multi-threaded transforms,
instead of using explicit thread spawning. Enable by configuring
--with-openmp or --with-sgi-mp in addition to --enable-threads.
- Expanded FAQ.
FFTW 2.1.3
November 7, 1999.
- The configure script no longer overrides the CFLAGS environment
variable if it is defined. (Thanks to Diab Jerius.)
- Experimental Fortran-callable wrapper routines for MPI FFTW.
See mpi/README.f77 for more information.
- The configure script now detects and works around a stack alignment bug
in gcc 2.95.x on x86.
- configure attempts to guess the appropriate -mcpu flag on
Linux/PPC systems, improving performance (especially on G3s with
gcc 2.95 or later).
- Fixed integer overflow bug for complex transforms of large prime
sizes (> 32768). Thanks to Ezio Riva for the bug report.
- Fixed memory leak in the Matlab wrappers; thanks to Matthew Davis
for the bug report.
- Fixed bugs in the configure script when detecting POSIX threads
libraries on AIX and Tru64 (née Digital) Unix.
- Fixed bug in multi-threaded transforms on AIX (which strangely
creates threads in non-joinable mode by default). Thanks to
Jim Lindsay for the bug report, and for allowing us to debug on
Northwestern University's IBM SP2.
- Slight fix to help build DLL's on Win32 (thanks to Andrew Sterian).
FFTW 2.1.2
May 18, 1999.
- Fixed bug in our MPI test programs which made them fail under MPICH with
the p4 device (TCP/IP). (The 2.1.1 transforms worked, but the test
programs crashed.)
- Added missing fftw_f77_threads_init function to the Fortran wrappers
for the multi-threaded transforms. Thanks to V. Sundararajan for
the bug report.
- The codelet generator can now output efficient hard-coded DCT/DST
transforms. As a side effect of this work, we slightly reduced the
code size of rfftw.
- Test programs now support GNU-style long options when used with glibc.
- Added some more ideas to our TODO list.
- Improved codelet generator speed.
FFTW 2.1.1
March 31, 1999.
- Fixed bug in the complex transforms for certain sizes with
intermediate-length prime factors (17-97), which under some
(hopefully rare) circumstances could cause incorrect results.
Thanks to Ming-Chang Liu for the bug report and patch. (The test
program will now catch this sort of problem when it is run in
paranoid mode.)
FFTW 2.1
March 8, 1999.
- Various documentation fixes and improvements.
- The --enable-type-prefix option to configure makes it easy to install
both single- and double-precision versions of FFTW on the same
(Unix) system. (See the installation section of the manual.)
- The MPI FFTW routines now include parallel one-dimensional transforms
for complex data. (See the fftw_mpi documentation in the FFTW
manual.)
- The MPI FFTW routines now include parallel multi-dimensional transforms
specialized for real data. (See the rfftwnd_mpi documentation in the
FFTW manual.)
- The MPI FFTW routines are now documented in the main manual (in the
doc directory). On Unix systems, they are also
automatically configured, compiled, and installed along with the main
FFTW library when you include --enable-mpi in the flags to the
configure script. (See the FFTW manual.)
- Largely-rewritten MPI code. It is now cleaner and (sometimes) faster.
It also supports the option of a user-supplied workspace for (often)
greater performance (using the MPI_Alltoall primitive). Beware that
the interfaces have changed slightly, however.
- The multi-threaded FFTW routines now include parallel one- and
multi-dimensional transforms of real data. (See the rfftw_threads
documentation in the FFTW manual.)
- The multi-threaded FFTW routines are now documented in the main
manual (in the doc directory). On Unix systems, they are also
automatically configured, compiled, and installed along with the main
FFTW library when you include --enable-threads in the flags to the
configure script. (See the FFTW manual.)
- The multi-threaded FFTW routines now include support for Mach C
threads (used, for example, in Apple's MacOS X).
- The Fortran-callable wrapper routines are now incorporated into
the ordinary FFTW libraries by default (although you can
disable this with the --disable-fortran option to configure) and
are documented in the main FFTW manual.
- Added Fortran-callable wrapper routines for the multi-threaded
transforms.
- Added an illustration of the data layout to the rfftwnd tutorial
section of the manual, in the hope of preventing future confusion
on this subject.
- The test programs now allow you to specify multidimensional sizes
(e.g. 128x54x81) for the -c and -s correctness and speed test options.
FFTW 2.0.1
September 29, 1998.
- (bug fix) Due to a poorly-parenthesized expression, rfftwnd overflowed
32-bit integer precision for rank > 1 transforms with a final
dimension >= 65536. This is now fixed. (Thanks to Walter Brisken
for the bug report.)
- (bug fix) Added definition of FFTW_OUT_OF_PLACE to fftw.h. The
flag is mentioned several times in the documentation, but its
definition was accidentally omitted since FFTW_OUT_OF_PLACE is the
default behavior.
- Corrected various small errors in the documentation. Thanks to
Geir Thomassen and Jeremy Buhler for their comments.
- Improved speed of the codelet generator by orders of magnitude,
since a user needed a hard-coded fft of size 101.
- Modified buffering in multidimensional transforms for some speed
improvements (only when fftwnd_create_plan_specific is used).
Thanks to Geert van Kempen for his suggestions.
- Added Andrew Sterian's patch to allow FFTW to be used as a shared
library more easily on Win32.
FFTW 2.0
September 11, 1998.
- Completely rewritten real-complex transforms, now using specialized
codelets and an inherently real-complex algorithm for greatly
increased speed. Also, rfftw can now handle odd sizes and
strided transforms. Beware that the output format for 1D rfftw
transforms has changed. See the manual for more details.
- The complex transforms now use a fast algorithm for large prime
factors, working in O(N lg N) time even for prime sizes.
(Previously, the complexity contained an O(p^2) term, where p
is the largest prime factor of N. This is still the case for
the rfftw transforms.) Small prime factors are still more
efficient, however.
- Added functions fftw_one, fftwnd_one, rfftw_one, etcetera, to simplify
and clarify the use of fftw for single, unit-stride transforms.
- Renamed FFTW_COMPLEX, FFTW_REAL to fftw_complex, fftw_real (for
greater consistency in capitalization). The all-caps names will
continue to be supported indefinitely, but are deprecated. (Also,
support for the COMPLEX and REAL types from FFTW 1.0 is now
disabled by default.)
- There are now Fortran-callable wrappers for the rfftw real-complex
transforms.
- New section of the manual discussing the use of FFTW with multiple
threads, and a new FFTW_THREADSAFE flag (described therein).
- Added shared library support. Use configure --enable-shared to
produce a shared library instead of a static library (the default).
- Dropped support for the operation-count (*_op_count) routines
introduced in v1.3, as these were little-used and were a pain to
keep up-to-date as FFTW changed internally.
- Made it easier to support floating-point types other than float
and double (e.g. long double). (See the file fftw-int.h.)
FFTW 1.3
April 9, 1998.
- FFTW is now released under the GNU General Public License (GPL).
(Non-free licenses are also available.)
- Multi-dimensional transforms contain significant performance
improvements for dimensions >= 3.
- Performance improvements in multi-dimensional transforms
with howmany > 1 and stride > dist.
- Improved parallelization and performance in the threads
code for dimensions >= 3.
- Changed the wisdom import/export format (the new wisdom remembers
the stride of the plan that generated it, for use with the new
create_plan_specific functions). (You should regenerate any stored
wisdom you have anyway, since this is a new version of FFTW.)
- Several small fixes to aid compilation on some systems.
- Fixed a bug in the MPI transform (in the transpose routine) that
caused errors for some array sizes.
- Fixed the (hopefully) last few things causing problems with C++
compilers.
- Hack for x86/gcc to properly align local double-precision variables.
- Completely rewritten codelet generator. Now it produces
better code for non powers of 2, and is ready to produce
real->complex transforms.
- Testing algorithm is now more robust, and has a more rigorous
theoretical foundation. (Bugs in testing large transforms or
in single precision are now fixed--these bugs were only in the
test programs and not in the FFTW library itself.)
- Added "specific" planners, which allow plan optimization for a
specific array/stride. They also reduce the memory requirements
of the planner, and permit new optimizations in the multi-dimensional
case. (See the *_create_plan_specific functions.)
- FFTW can now compute a count of the number of arithmetic operations
it requires, which is useful for some academic purposes. (See the
*_count_plan_ops functions.)
- Adapted for use with GNU autoconf to aid installation on UNIX systems.
(Installation on non-UNIX systems should be the same as before.)
- Used gettimeofday function if available. (This function typically
has much higher accuracy than clock(), permitting plans to be
created much more quickly than before on many machines.)
- Made timing algorithm (hopefully) more robust in the face of
system interrupts, etc.
- Added wrapper routines for calling FFTW from MATLAB (in the
matlab/ directory).
- Added wrapper routines for calling FFTW from Fortran (in the
fortran/ directory). (These were available separately before.)
FFTW 1.2.1
December 4, 1997.
- Fixed bugs in the MPI routines (parallel transforms for
distributed memory architectures). Thanks to Eric Skyllingstad for the
bug reports.
FFTW 1.2
September 8, 1997.
- Added a FAQ.
- Fixed bug in rfftwnd routines where a block was accidentally
allocated to be too small, causing random memory to be
overwritten (yikes!). (Amazingly, this bug only caused the
test program to fail on one system that we could find. Our
test suite can now catch this sort of bug.)
- Abstracted taking differences of times (with fftw_time_diff
macro/function) to allow more general timer data structures.
- Added "wisdom" mechanism for saving plans & related info.
- Made timing mechanism more robust and maintainable. (Instead of
using a fixed number of iterations, we now repeatedly double
the number of iterations until a specified time interval
(FFTW_TIME_MIN) is reached.)
- Fixed header files to prevent difficulties when a mix of C and
C++ compilers is used, and to prevent problems with multiple
inclusions.
- Added experimental distributed-memory transforms using MPI.
- Fixed memory leak in fftwnd_destroy_plan (reported
by Richard Sullivan). Our test programs now all check for leaks.
FFTW 1.1
May 8, 1997.
- Improved speed (yes!) [Some clever tricks with twiddle factors
and better code generator]
- Renamed "blocks" to "codelets," just to be fashionable
- Rewrote planner and executor--much simpler and more readable
code. Reference-counter garbage collection employed throughout.
- Much improved codelet generator. The ML code should now be
readable by humans, and easier to modify.
- Support for Prime Factor transforms in the codelet generator.
- Renamed COMPLEX to FFTW_COMPLEX to avoid clashes with
existing packages. COMPLEX is still supported
for compatibility with 1.0.
- Added experimental real-complex transform (quick hack,
use at your own risk).
- Added experimental parallel transforms using Cilk.
- Added experimental parallel transforms using threads (currently,
POSIX threads and Solaris threads are implemented and tested).
- Added DOS support, in the sense that we now support 8.3 filenames.
FFTW 1.0
March 24, 1997.