Benchmarks don't lie (TM), part 2
Christian.Bauer at Uni-Mainz.DE
Sun Sep 28 19:30:15 CEST 2003
On Wed, Sep 24, 2003 at 12:07:40AM +0200, Richard B. Kreckel wrote:
> AMD released the Opteron processor family today leaving people with the
> budget to buy new hardware wondering what exactly to purchase next.
Well, for those among us who don't have the budget to always buy the latest
kick-ass machines (with their "SDRAM memory" and "hardware accelerated 3D"
and other crazy stuff), the GiNaC Retro Hardware Testing Labs are proud to
present what you've all been waiting for:
The ultimate CAS shootout at 2x200 MHz
- No rules, no mercy. Two CPUs enter, one CPU leaves.
(then, after a while, the other CPU leaves, as soon as I manage to get
the heat sink off the f*cking thing...)
System 1 - ppc:
  Umax Pulsar, Dual PowerPC 604e ("Extreme"?) at 200 MHz
  L1 cache: 32KB I, 32KB D per CPU
  Apple Tsunami board (also used in PowerMac 9500)
  L2 cache: 512KB for both CPUs, at 50 MHz
  50 MHz system bus
  144MB EDO RAM, 60ns
  Yellow Dog Linux 2.3 (based on Red Hat 7.2)

System 2 - x86:
  Dual Pentium Pro 512K at 200 MHz
  L1 cache: 8KB I, 8KB D per CPU
  L2 cache: 512KB per CPU, at 200 MHz
  Intel Providence (PR440FX) board
  66 MHz system bus
  256MB registered EDO RAM, 60ns
  Red Hat Linux 7.3
Both machines were equipped with Matrox Millennium graphics cards and
SCSI hard disks (ppc: 4GB IBM Fast Narrow; x86: 2GB Conner Fast Wide).
The Umax Pulsar features a fan that appears to be optimized for maximum
noise output. Jet pilots should feel right at home with this computer.
The Intel machine, on the other hand, sports a hard disk that I could still
hear while standing under the shower. Ear protection should be worn at
all times when running both systems in the same room.
But on to the benchmarks...
The tests consisted of compiling GiNaC 1.0.15 (GiNaC >=1.1 would have
required GCC 3), and running its standard benchmark suite. The compiler
options used were
  ppc: -g -O2 -mcpu=604e
  x86: -g -O2 -march=pentiumpro
and GiNaC was configured with the --disable-static option (the shared
library will be the one used most by applications, anyway).
For the compilation test, only the time required for compiling the library
and tools (ginsh/viewgar) was measured, not the time for compiling the
benchmark suite. The library was built with "make -j 2" ("make -j 3" was
slower by about 30s on both machines).
                                                             ppc       x86
compile GiNaC 1.0.15                                     25m 34s   16m 42s
The Pentium Pro really shines here, which may be due to its faster and
larger (combined) L2 cache. But this comparison isn't really fair, as the
compilers are of course using a different backend on each system and
producing different output.
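To put a number on that gap, the two wall-clock times can be converted to
seconds and divided. A minimal Python sketch, with a made-up helper name
(to_seconds) that is not part of the original test setup:

```python
# Convert the two "make -j 2" wall-clock times to seconds and compare them.
# to_seconds() is a hypothetical helper, named here only for illustration.

def to_seconds(minutes, seconds):
    return 60 * minutes + seconds

ppc = to_seconds(25, 34)   # Dual PowerPC 604e
x86 = to_seconds(16, 42)   # Dual Pentium Pro

ratio = ppc / x86
print(f"ppc: {ppc}s, x86: {x86}s, ppc/x86 = {ratio:.2f}")
```

By this measure the PowerPC box needs roughly one and a half times as long
for the build.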
So, without further ado, on to the real tests:
                                                             ppc       x86
commutative expansion and substitution, size 100           1.43s     1.62s
commutative expansion and substitution, size 200           7.32s     7.14s
ratio                                                     [5.12]    [4.41]
Laurent series expansion of Gamma function, order 20       9.91s    7.429s
Laurent series expansion of Gamma function, order 25      38.74s   28.339s
ratio                                                     [3.91]    [3.81]
determinant of symbolic 10x10 Vandermonde matrix           6.55s     6.86s
determinant of symbolic 12x12 Vandermonde matrix          56.57s    63.28s
ratio                                                     [8.64]    [9.22]
determinant of symbolic 8x8 Toeplitz matrix                4.82s     5.65s
determinant of symbolic 9x9 Toeplitz matrix               18.98s    21.12s
ratio                                                     [3.94]    [3.74]
Lewis-Wester test A (divide factorials)                    0.38s     0.56s
Lewis-Wester test B (sum of rational numbers)              0.04s    0.059s
Lewis-Wester test C (gcd of big integers)                   0.4s    0.619s
Lewis-Wester test D (normalized sum of rational fcns)       1.5s    1.689s
Lewis-Wester test E (normalized sum of rational fcns)      1.28s    1.489s
Lewis-Wester test F (gcd of 2-var polys)                   0.17s     0.19s
Lewis-Wester test G (gcd of 3-var polys)                   3.91s    4.459s
Lewis-Wester test H (det of 80x80 Hilbert)                23.12s    27.66s
Lewis-Wester test I (invert rank 40 Hilbert)               7.37s      8.6s
Lewis-Wester test K (invert rank 70 Hilbert)              47.17s    54.45s
ratio                                                     [6.40]    [6.33]
Lewis-Wester test J (check rank 40 Hilbert)                3.95s     5.05s
Lewis-Wester test L (check rank 70 Hilbert)               22.25s    28.36s
ratio                                                     [5.63]    [5.62]
Lewis-Wester test M1 (26x26 sparse, det)                   0.88s    1.189s
Lewis-Wester test O1 (three 15x15 dets) (average)       109.783s   90.246s
Lewis-Wester test P (det of sparse rank 101)               2.86s     4.19s
Lewis-Wester test P' (det of less sparse rank 101)        14.66s    17.51s
computation of antipodes in Yukawa theory (total)        192.64s   172.27s
timing Fateman's polynomial expand benchmark             362.21s  293.579s
Now, this comes as a bit of a surprise. After reading the MuPAD benchmarks
published at http://www.heise.de/ct/english/96/11/270/ running on machines
very similar to mine, I really expected the Pentium Pro to wipe the floor
with the PowerPC here, but it's actually the other way round. The 604e wins
almost all categories, with some notable exceptions: the Gamma series
expansion, O1, the Yukawa thing, and the expand benchmark (plus the size 200
commutative expansion, by a hair).
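The win/loss picture can be tallied mechanically from the timings. The
following Python sketch simply retypes them (ppc first, x86 second; all
variable names are illustrative) and counts every row as a separate contest,
so both Gamma orders show up individually and narrow results count as much
as decisive ones:

```python
# Timings in seconds, copied from the benchmark table: (label, ppc, x86).
timings = [
    ("expansion/substitution, size 100", 1.43, 1.62),
    ("expansion/substitution, size 200", 7.32, 7.14),
    ("Gamma series, order 20", 9.91, 7.429),
    ("Gamma series, order 25", 38.74, 28.339),
    ("Vandermonde det, 10x10", 6.55, 6.86),
    ("Vandermonde det, 12x12", 56.57, 63.28),
    ("Toeplitz det, 8x8", 4.82, 5.65),
    ("Toeplitz det, 9x9", 18.98, 21.12),
    ("Lewis-Wester A", 0.38, 0.56),
    ("Lewis-Wester B", 0.04, 0.059),
    ("Lewis-Wester C", 0.4, 0.619),
    ("Lewis-Wester D", 1.5, 1.689),
    ("Lewis-Wester E", 1.28, 1.489),
    ("Lewis-Wester F", 0.17, 0.19),
    ("Lewis-Wester G", 3.91, 4.459),
    ("Lewis-Wester H", 23.12, 27.66),
    ("Lewis-Wester I", 7.37, 8.6),
    ("Lewis-Wester K", 47.17, 54.45),
    ("Lewis-Wester J", 3.95, 5.05),
    ("Lewis-Wester L", 22.25, 28.36),
    ("Lewis-Wester M1", 0.88, 1.189),
    ("Lewis-Wester O1 (average)", 109.783, 90.246),
    ("Lewis-Wester P", 2.86, 4.19),
    ("Lewis-Wester P'", 14.66, 17.51),
    ("Yukawa antipodes (total)", 192.64, 172.27),
    ("Fateman expand", 362.21, 293.579),
]

# A row goes to whichever machine finished in less time.
ppc_wins = [label for label, ppc, x86 in timings if ppc < x86]
x86_wins = [label for label, ppc, x86 in timings if x86 < ppc]

print(f"ppc wins {len(ppc_wins)} rows, x86 wins {len(x86_wins)} rows")
for label in x86_wins:
    print("  x86 faster:", label)
```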
On the other hand, judging from the "ratio" lines above, the performance of
the Pentium Pro appears to scale better with larger data sets (again with
one exception: the Vandermonde determinants). This, no doubt, is due to the
faster cache and generally better memory interface of the Intel machine.
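The "ratio" lines can be recomputed from the paired timings as the time for
the larger problem divided by the time for the smaller one, per machine; a
lower ratio means the machine slows down less as the problem grows. A small
Python sketch under that reading (the names are illustrative):

```python
# (small, large) timings in seconds from the table: ppc pair first, x86 second.
pairs = {
    "commutative expansion, size 100 to 200": ((1.43, 7.32), (1.62, 7.14)),
    "Gamma series, order 20 to 25": ((9.91, 38.74), (7.429, 28.339)),
    "Vandermonde det, 10x10 to 12x12": ((6.55, 56.57), (6.86, 63.28)),
    "Toeplitz det, 8x8 to 9x9": ((4.82, 18.98), (5.65, 21.12)),
    "invert Hilbert, rank 40 to 70": ((7.37, 47.17), (8.6, 54.45)),
    "check Hilbert, rank 40 to 70": ((3.95, 22.25), (5.05, 28.36)),
}

# Growth ratio per machine: time(large) / time(small).
ratios = {}
for name, (ppc, x86) in pairs.items():
    ratios[name] = (ppc[1] / ppc[0], x86[1] / x86[0])

for name, (r_ppc, r_x86) in ratios.items():
    better = "x86" if r_x86 < r_ppc else "ppc"
    print(f"{name:40s} ppc {r_ppc:5.2f}  x86 {r_x86:5.2f}  {better} scales better")

# Pairs where the PowerPC grows more slowly than the Pentium Pro.
ppc_scales_better = [n for n, (r_ppc, r_x86) in ratios.items() if r_ppc < r_x86]
```

Note that several of the differences are tiny (5.63 vs 5.62, for instance),
so the ratios on their own say little about how large the effect is.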
But still, my next personal machine won't be a Pentium Pro, and it won't be
a "G2" PowerMac, either. The VCS 2600 is going cheap on eBay, though...