Due to these issues (we will be kind and not refer to them as "bugs"), the benchmark results can occasionally vary significantly between runs in unexpected ways, though usually by no more than 10%. For the most part, we feel that the benchmark is still reliable for comparing the rough performance of different codes, but the reader should not expect the exact numbers to be easily reproducible.
Particularly bad in this regard, however, was the Pentium Pro (with the gcc compiler). The speed of the CWP code on the Pentium Pro, for example, was observed to change by almost a factor of 20 when changes were made to the completely unrelated FFTW code. Other subroutines exhibited similar variations, though not as large (typically "only" a factor of 2). For this reason, we do not consider the Pentium Pro results to be reliable. (It would be interesting to run the benchmark on this platform using a different compiler.)
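Readers who want to gauge this kind of run-to-run variation on their own machine could time the same routine several times and compare the fastest and slowest runs. The following is only a minimal sketch, not the method used by the benchmark: the names `time_runs`, `relative_spread`, and `workload` are hypothetical, `workload` is a stand-in for whatever transform is being measured, and the 10% threshold simply echoes the typical figure quoted above.

```python
import time


def time_runs(fn, repeats=5):
    """Call fn `repeats` times and return the elapsed seconds of each call."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times


def relative_spread(times):
    """Return (slowest - fastest) / fastest for a list of positive timings."""
    fastest = min(times)
    slowest = max(times)
    return (slowest - fastest) / fastest


def workload():
    # Hypothetical stand-in for the routine under test (e.g. one transform).
    total = 0.0
    for i in range(200_000):
        total += i * 0.5
    return total


if __name__ == "__main__":
    times = time_runs(workload, repeats=5)
    spread = relative_spread(times)
    print(f"run-to-run spread: {spread:.1%}")
    if spread > 0.10:
        print("spread exceeds 10%; treat these timings with caution")
```

A spread computed this way only captures variation between repeated calls within one process. It would not reveal the effect described for the Pentium Pro, where the speed of one code changed after unrelated code in the same program was modified; detecting that requires rebuilding and rerunning the whole benchmark.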