As 2003 draws to a close, we are summing up the SPEC CPU2000 performance contest between the most powerful processors from the leading companies: Intel's Pentium 4 and Pentium 4 EE (Pentium 4 Processor with HT Technology Extreme Edition) and AMD's latest Athlon 64 and Athlon 64 FX.
Next year we should see more new products from Intel and AMD, both "overclocked" versions of current chips and genuinely new designs. We are also going to examine the recently announced Intel 8.0 compilers. And, by the way, we might learn more about the new SPEC CPU2004, which is already a source of interesting rumours (40 subtests, a 3GB distribution, 2GB of RAM required, 20 hours per pass).
The software includes Microsoft Windows XP SP1 and the test settings as of March 12, 2003, built with the Intel 7.1 and Microsoft Visual Studio .NET 2003 compilers. Note that of all the processors examined here only the Athlon XP lacks SSE2 support, which sets it clearly apart from the other test subjects. Also note that this time the Athlon 64/64 FX/Opteron processors are examined only in 32-bit mode. Besides, here we consider the Athlon 64 FX-51 to be the same as the Opteron 148, as the only difference between them that we know of is their market position.
First come the charts of the overall (integral) marks; the most interesting pairwise comparisons follow.
So in the integer field Intel is ahead again with its Pentium 4 EE. Note, however, that second place goes to the AMD Athlon 64 FX-51, which lags by 9% but still outruns the "usual" Pentium 4 by 7%. Despite its single-channel memory controller and a clock speed 1.5 times lower, the Athlon 64 3200+ is close on the heels of the 3.2GHz Pentium 4 in this test. The Athlon XP 3200+ comes last (partly because it lacks SSE2 support).
According to the SPECfp integral mark, the AMD processors moved a step down. Now the Pentium 4 competes with the Athlon 64 FX, which outruns the Athlon 64 by 12%. Perhaps an AMD64 processor with a dual-channel controller supporting ordinary (unregistered) DDR400 will change this situation when it is released.
Pentium 4 EE vs. Pentium 4
Unfortunately, unlike continuous parameters such as clock speed or memory performance, L3 cache on the one hand affects the results significantly, but on the other makes them considerably less predictable outside these test tasks. We have already examined how SPEC CPU2000 results depend on L2 cache size and saw that CINT2000 tasks react to cache size differently each time. Sometimes a task shows considerable growth when moved from 128KB to 256KB, but does not react to a further L2 increase to 512KB. The result therefore depends strongly on the task and its algorithm. CFP2000 applications, for their part, generally showed a weaker dependency on L2 cache size. This, in particular, made us think that even inexpensive small-cache processor series might be enough for computational tasks (the drawback, however, would be their slow CPU bus).
So these results indicate only the probable "dividends" from L3 cache and are hardly transferable to other tasks.
On the first chart we see that L3 cache provided a significant performance boost for most CINT2000 tasks. The maximum boost was observed in the tasks most dependent on RAM performance. The remaining five don't seem to have "noticed" anything. I see two possible reasons for this: either L2 (or even L1) cache is enough, or 2MB of L3 is not enough :)
Although the CFP2000 integral mark grew by 16% (close to the 17% obtained in CINT2000), here we see a vast dispersion of the individual subtest marks, especially for 179.art. This is actually not so surprising: if you look at the results published by Sun Microsystems at www.spec.org, you will notice that it was 179.art that enabled the company's processors, with their relatively low clock rates (up to 1.2GHz), to show a high integral mark on the level of the 3.2GHz Pentium 4. Sun's processors use 1MB to 8MB of L2 cache and achieve 8000+ points in this test.
As we can see, on the Pentium 4 EE 179.art ran more than 2.5 times faster and achieved a record 2389 points vs. its predecessor's 950. Thus the neural network model again confirmed its hunger for fast memory.
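To put the 179.art jump in perspective, here is a rough sketch of how much a single subtest can move the integral mark. It assumes that the CFP2000 integral mark is the geometric mean of the marks of its 14 subtests, as SPEC computes its overall scores, and that only 179.art changes. The two scores are taken from the text above; the function name is purely illustrative.

```python
# Sketch: effect of one subtest on a geometric-mean integral mark.
# Assumption: the CFP2000 integral mark is the geometric mean of 14 subtest marks.

CFP2000_SUBTESTS = 14  # number of benchmarks in the CFP2000 suite


def geomean_gain_from_one_subtest(old_score, new_score, n_subtests):
    """Relative growth of a geometric mean when only one of n scores changes."""
    ratio = new_score / old_score
    return ratio ** (1.0 / n_subtests) - 1.0


# 179.art marks quoted in the article: Pentium 4 vs. Pentium 4 EE
art_speedup = 2389 / 950
art_only_gain = geomean_gain_from_one_subtest(950, 2389, CFP2000_SUBTESTS)

print(f"179.art speedup: {art_speedup:.2f}x")
print(f"integral mark gain from 179.art alone: {art_only_gain * 100:.1f}%")
```

Under these assumptions 179.art alone would lift the integral mark by roughly 7%, so a good part of the 16% growth, but by no means all of it, comes from this one subtest.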
As for the other subtests that stood out (178.galgel, computational fluid dynamics, and 188.ammp, computational chemistry), the first has not shown such preferences before, while the second reacted with a performance boost to the transition to the Athlon XP (Barton) with 512KB of L2 cache.
It is strange that 171.swim didn't stand out this time either. It seems this amount of cache is still not enough for it.
Athlon 64 FX/Opteron clock rate scalability
Having tested the Athlon 64 FX-51 and the Opteron 146, we are able to estimate the clock rate scalability of this CPU series. For comparison, we can use the data obtained last year for Pentium 4 processors. Note that such data describes the peculiarities of the subtests more than those of CPUs and systems. Also consider that system performance has almost doubled in a year.
CINT2000 shows results similar to last year's. Almost all subtests were happy with the higher clock rate, except 181.mcf, whose result increased by just 1.7% for a 10% clock rate increase. The integral mark changed by 8% (vs. the previous 7%).
In CFP2000 there are some slight differences from the previous examination. 173.applu, 188.ammp and 189.lucas showed less dependency on clock rate. However, 171.swim and 179.art gained 1.7% and 2.8%, respectively, vs. the previous 1.0% and 0.5%. The integral mark again grew by 6% for a 10% clock rate increase.
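One simple way to compare these figures is a scaling efficiency: the share of the clock rate increase that actually shows up in the result. The sketch below uses only the percentages quoted in the two paragraphs above; the metric and the function name are our illustration, not part of SPEC's methodology.

```python
# Sketch: share of a clock rate increase that reaches the benchmark result.
# 1.0 would mean perfect scaling, values near 0 mean the task is bound elsewhere.


def scaling_efficiency(result_gain_pct, clock_gain_pct=10.0):
    """Result growth divided by clock rate growth (both in percent)."""
    return result_gain_pct / clock_gain_pct


# Gains quoted in the article for a 10% clock rate increase
gains_pct = {
    "CINT2000 integral": 8.0,
    "181.mcf": 1.7,
    "CFP2000 integral": 6.0,
    "171.swim": 1.7,
    "179.art": 2.8,
}

for name, gain in gains_pct.items():
    print(f"{name}: {scaling_efficiency(gain):.2f}")
```

Values far below the integral figure, such as those of 181.mcf and 171.swim, point to subtests limited by something other than the core clock, most likely memory.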
DDR400 memory for AMD Opteron
As you remember, the AMD Opteron was announced with dual-channel DDR333 support (registered; ECC as an option). However, when registered DDR400 modules were unveiled, it turned out that the Opteron works well with these too. Let's see how the bandwidth increase affects performance in SPEC CPU2000. By the way, you can compare these charts with those published in our earlier article.
Although the bus clock increased by 20%, the CINT2000 subtests improved by significantly smaller values. Only 181.mcf stood out with its 12%. This agrees with the peculiarities of this test noted earlier.
The CFP2000 tasks reacted more actively and showed result growth of over 3% (except 177.mesa and 200.sixtrack). 171.swim and 179.art stood out again. Besides, the aforementioned 189.lucas turned out to be clearly more focused on bandwidth.
The four charts above can be considered an up-to-date estimate of the influence of core and bus clocks on SPEC CPU2000 results. By the way, if you take the values from the last two charts, divide them by 2 and add the values from the first two charts, you get values of about 10% again :)
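The rule of thumb above can be written down as a tiny calculation. Halving the memory gain assumes that the result scales linearly with memory clock, so that half of a 20% memory clock increase corresponds to a 10% one; this is a simplification. The only subtest for which the text quotes both numbers is 181.mcf, so the sketch is limited to it, and the function name is illustrative.

```python
# Sketch of the rule of thumb: core gain + half of the memory gain.
# Assumption: the memory gain scales linearly, so half of the gain from a
# 20% memory clock increase approximates the gain from a 10% increase.


def combined_estimate(core_gain_pct, memory_gain_pct):
    """Estimated gain if both core and memory clocks rose by 10%.

    core_gain_pct:   measured gain for a 10% higher core clock
    memory_gain_pct: measured gain for a 20% higher memory clock
    """
    return core_gain_pct + memory_gain_pct / 2.0


# 181.mcf: 1.7% from the core clock, 12% from DDR333 -> DDR400
mcf = combined_estimate(core_gain_pct=1.7, memory_gain_pct=12.0)
print(f"181.mcf combined estimate: {mcf:.1f}%")
```

For 181.mcf the sum comes to 7.7%, somewhat short of 10%, which suggests the rule holds only approximately for individual subtests.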
Athlon 64 and Athlon 64 FX/Opteron memory controllers
The next pair to examine is the Athlon 64 and the Opteron, both based on the AMD64 architecture. The first is coupled with fast unregistered DDR400 memory, while the second is distinguished by a dual-channel memory controller that requires registered modules. We examined the Athlon 64 3200+ and Opteron 146 processors, both at 2GHz.
As you can see, in general the processors are on the same level, and the results differ by up to 3.5%. Note the Opteron's advantage in the bandwidth-sensitive 181.mcf subtest. It also seems that 175.vpr and 255.vortex depend more on the lower latency of unregistered modules (we haven't yet examined how "registrability" affects performance, but we'll try to in future articles).
The advantage of the dual-channel controller is noticeable in CFP2000 tasks. Two of them sped up by 30% and 35%, respectively, while another five ran over 5% faster. Still, even here we can see a performance drop in two tasks, namely 188.ammp and 301.apsi. Let's consider them highly dependent on memory latency. We'll try to estimate this dependency in our next articles.
So, in late 2003 the SPEC CPU2000 leader is Intel's flagship processor, the 3.2GHz Pentium 4 EE. The AMD Athlon 64 FX-51 follows, lagging by 9% and 13% in SPECint_base2000 and SPECfp_base2000, respectively (however, its clock rate is almost 1/3 lower).
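The remark about clock rates can be turned into a rough per-clock comparison. The sketch assumes that "lagging by 9%" means a score of 0.91 of the leader's, and that the Athlon 64 FX-51 runs at 2.2GHz (10% above the 2GHz Opteron 146, in line with the scalability section). The helper name is hypothetical.

```python
# Sketch: score per unit of clock rate, relative to the Pentium 4 EE.
# Assumptions: "lagging by X%" means score = (1 - X/100) of the leader's score;
# Athlon 64 FX-51 clock is 2.2GHz, Pentium 4 EE clock is 3.2GHz.

FX51_CLOCK_GHZ = 2.2
P4EE_CLOCK_GHZ = 3.2


def per_clock_ratio(relative_score, clock_ghz, reference_clock_ghz):
    """Relative score divided by relative clock rate."""
    return relative_score / (clock_ghz / reference_clock_ghz)


int_ratio = per_clock_ratio(1 - 0.09, FX51_CLOCK_GHZ, P4EE_CLOCK_GHZ)
fp_ratio = per_clock_ratio(1 - 0.13, FX51_CLOCK_GHZ, P4EE_CLOCK_GHZ)

print(f"SPECint_base2000 per clock: {int_ratio:.2f}x")
print(f"SPECfp_base2000 per clock:  {fp_ratio:.2f}x")
```

Under these assumptions the Athlon 64 FX-51 delivers roughly 1.3 times the Pentium 4 EE's score per gigahertz in both suites, which is why it stays so close despite the lower clock rate.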
Note how these two products stand apart from the standard desktop series of both companies. The Pentium 4 EE features 2MB of L3 cache, while the Athlon 64 FX uses registered modules, which are unusual for desktop systems. I can state that both companies used their server developments in their battle for the desktop market. Still, AMD's approach is more obvious - the specs of the Athlon 64 FX-51 are identical to those of the Opteron 148. There are also the 200 and 800 processor series with the same specs, designed for multiway systems. The Pentium 4 EE, in its turn, has no analogs among Xeons.
Third place is occupied by the 3.2GHz Intel Pentium 4 without the "EE" suffix. It loses 7% to the Athlon 64 FX-51 in the integer tasks, while remaining on a par with it in the floating-point ones.
The purely desktop Athlon 64 3200+ is fourth, performing similarly to the 3.2GHz Pentium 4 in CINT2000 tasks and losing 12% to it in CFP2000 subtests. This is primarily caused by its single-channel controller, as these subtests are very bandwidth-sensitive.
As for the Athlon XP, its use is well founded only for older programs that don't fully use the SIMD capabilities of modern processors. In such situations it might compete with solutions based on the Athlon 64/64 FX.
Kirill Kochetkov email@example.com