To appear in Theory of Computing Systems.
An earlier version of this paper appears in the
Proceedings of the 18th ACM Symposium on Parallelism in Algorithms and
Architectures (SPAA'06). [Full text]

Abstract

We present a technique for analyzing the number of cache misses
incurred by multithreaded cache oblivious algorithms on an idealized
parallel machine in which each processor has a private cache. We
specialize this technique to computations executed by the Cilk
work-stealing scheduler on a machine with dag-consistent shared
memory. We show that a multithreaded cache oblivious matrix
multiplication incurs $O(n^3/\sqrt{Z} + (P n)^{1/3}
n^2)$ cache misses when executed by the Cilk scheduler on a machine
with~$P$ processors, each with a cache of size~$Z$,
with high probability. This bound is tighter than previously
published bounds. We also present a new multithreaded cache
oblivious algorithm for 1D stencil computations incurring
$O(n^2/Z + n + \sqrt{P n^{3+\epsilon}})$ cache misses
with high probability, one for Gaussian elimination and back
substitution, and one for the length computation part of the longest
common subsequence problem incurring $O(n^2/Z +
\sqrt{P n^{3.58}})$ cache misses with high probability.

This page maintained by: Matteo Frigo. Last updated: Sun Jun 21 09:29:38 2009