In the 40th Annual Symposium on Foundations of Computer Science, FOCS '99, 17-18 October, 1999, New York, NY, USA. [Full text]

Abstract

This paper presents asymptotically optimal
algorithms for rectangular matrix transpose, FFT, and sorting on
computers with multiple levels of caching. Unlike previous optimal
algorithms, these algorithms are cache oblivious: no
variables dependent on hardware parameters, such as cache size and
cache-line length, need to be tuned to achieve optimality.
Nevertheless, these algorithms use an optimal amount of work and move
data optimally among multiple levels of cache. For a cache with size
Z and cache-line length L where Z=\Omega(L^2) the
number of cache misses for an m\times n matrix transpose is
\Theta(1+mn/L). The number of cache misses for either an
n-point FFT or the sorting of n numbers is
\Theta(1+(n/L)(1+\logZ n)). We also give an
\Theta(mnp)-work algorithm to multiply an m\times n
matrix by an n\times p matrix that incurs \Theta(1 +
(mn+np+mp)/L + mnp/L\sqrt{Z}) cache faults.

We introduce an ``ideal-cache'' model to analyze
our algorithms, and we prove that an optimal cache-oblivious algorithm
designed for two levels of memory is also optimal for multiple levels.
We also prove that any optimal cache-oblivious algorithm is also
optimal in the previously studied HMM and SUMH models. Algorithms
developed for these earlier models are perforce
cache-aware: their behavior varies as a function of
hardware-dependent parameters which must be tuned to attain
optimality. Our cache-oblivious algorithms achieve the same
asymptotic optimality, but without any tuning.

This page maintained by: Matteo Frigo. Last updated: Sun Jun 21 09:29:38 2009