This may be the last article in our series analyzing various platforms in SPEC CPU2000, because SPEC has announced the long-awaited SPEC CPU2006. Nevertheless, it still has a right to exist, and it is devoted to the top processor recently presented by Intel: the Intel Core 2 Extreme X6800.
Unfortunately, this processor is not very "extreme" this time (compared to earlier extreme processors from Intel). Setting aside minor details (like the unlocked multiplier), the only difference between this processor and the previous "non-extreme" representative, the Intel Core 2 Duo E6700, is a higher clock rate: 2.93 GHz, which is 266 MHz (one multiplier step) higher. The extreme processor does not have a higher FSB clock (333 MHz) to distinguish it from other Core 2 desktop processors. Thus, this analysis actually comes down to the question "what can an additional 266 MHz give to the Conroe core?", to be answered by SPEC CPU2000 tests.
SPEC CPU2000 tasks were compiled with the following compilers:
In all cases (for every optimization option) we compiled the code with the same switches, using two-pass compilation with profile-guided optimization (PGO):
Intel Core 2 Extreme X6800
As usual, we first analyze SPEC CPU2000 performance in pure form, that is, in absolute values with every available optimization, including the new option for Intel Core 2 processors. We use the usual single-threaded method of running the tests (base metrics).
The SPECint tests gave us a surprise: the usual failures, the non-optimized 175.vpr and 176.gcc tasks, were joined by 252.eon built with -QxT, the optimization specific to Core 2 processors. This task did not fail on the Core 2 Duo E6700 that took part in our previous analysis.
Here is the ranking of the optimizations for SPECint 2000 tasks by SPECint_base2000: no opt. < -QxK < -QxW < -QxT < -QxB < -QxN < -QxP. Compared to the Core 2 Duo E6700 results, the -QxT optimization moved down in this row, settling between -QxW and -QxB. This is probably because 252.eon dropped out of its list, which lowered the total score (this task contributes a lot to it). Comparing individual results of the integer tasks, we can see that -QxT (the native optimization for Core 2 processors) is no worse, or even better, in most of them than the absolute leader, the -QxP optimization, which targets Intel Pentium 4/D, Core Solo/Duo, and compatible Intel processors with SSE3 support.
SPECfp 2000 floating-point tasks offered no surprises. Here is the ranking by average SPECfp_base2000 results: no opt. < -QxK < -QxB < -QxW < -QxT < -QxN < -QxP, that is, the same sequence as on the Core 2 Duo E6700.
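To illustrate why a missing task can shift the total score: SPEC CPU2000 composite scores are geometric means of the per-task ratios, so removing a task with an above-average ratio lowers the mean. The sketch below assumes the -QxT score was simply computed over the remaining tasks. The per-task ratios are hypothetical placeholders, not values measured in this article.

```python
import math

def geometric_mean(ratios):
    """Geometric mean of per-task ratios (the form of a SPEC CPU2000 composite score)."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-task ratios, for illustration only (NOT measured values).
ratios = {
    "164.gzip": 1500.0,
    "181.mcf": 1000.0,
    "252.eon": 3000.0,   # assumed to be a high-scoring task
    "256.bzip2": 1600.0,
}

full_score = geometric_mean(ratios.values())
score_without_eon = geometric_mean(
    value for name, value in ratios.items() if name != "252.eon"
)

print(f"with 252.eon:    {full_score:.0f}")
print(f"without 252.eon: {score_without_eon:.0f}")
```

With these placeholder numbers, the score without 252.eon comes out lower, which matches the direction of the effect described above.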
Comparison with Intel Core 2 Duo E6700
Let's proceed to the next stage of our analysis: comparing the results with the previous leader, the Core 2 Duo E6700. Remember that we are actually comparing the same Conroe cores with different clock rates, 2.93 GHz and 2.66 GHz respectively. The core revisions differ, though: the Core 2 Duo E6700 was an engineering sample with the earlier B0 revision, versus B1 here.
SPECint 2000. The Core 2 Extreme X6800 outperforms the Core 2 Duo E6700 in all integer tasks, though by varying margins. The smallest advantage is in 181.mcf (according to our previous tests, it is critical to memory bandwidth). It gains just 0.2% in non-optimized code and 4.7%-4.9% with optimizations for modern processors. The maximum advantage is in 256.bzip2, where it reaches 13.9% with -QxB. Note that the maximum gain expected from the core clock difference is 2.93/2.66 = 1.10 times, that is, approximately 10%. The excess may be the effect of some changes in the newer B1 core revision, or it may be measurement error. In any case, the performance gain in integer tasks (SPECint_base2000) generally amounts to 8.5%-9.6% (if we disregard the incorrect results of the -QxT optimization, which lack 252.eon), so it falls within the 10% dictated by the CPU clock difference.
As usual, floating-point tests paint a less homogeneous picture. We can see a stable drop in 171.swim performance with all optimizations (from -4.8% to -6.0%), which is rather difficult to explain (perhaps it has to do with differences in core revisions, this time not to the credit of the newer B1), and a large spread in values, for example in 178.galgel, where the gain ranges from 9.0% to 16.3%. Quite a few tasks also gain little on the new extreme processor, for example 173.applu, 183.equake, and 189.lucas. Strange as it may seem, gains in the average SPECfp_base2000 results fall within a narrow interval of 5.5%-6.0%, smaller than the gains in integer tasks. This quite possibly has to do with the greater demands of SPEC CPU2000 floating-point tasks on memory bandwidth, which is identical on both test platforms (the peak memory bandwidth of dual-channel DDR2-800, actually limited to 8.53 GB/sec by the throughput of the 266 MHz FSB).
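The clock-scaling argument above can be checked with a few lines of arithmetic. This is a minimal sketch: the gains are the figures quoted in the text, while the function name and dictionary labels are ours.

```python
def expected_gain_pct(new_ghz, old_ghz):
    """Upper bound on the gain, in percent, if performance scaled purely with core clock."""
    return (new_ghz / old_ghz - 1.0) * 100.0

clock_bound = expected_gain_pct(2.93, 2.66)

# Gains quoted in the text, in percent.
quoted_gains = {
    "181.mcf, no opt.": 0.2,
    "256.bzip2, -QxB": 13.9,
    "SPECint_base2000, low": 8.5,
    "SPECint_base2000, high": 9.6,
}

print(f"clock bound: {clock_bound:.2f}%")
for name, gain in quoted_gains.items():
    share = gain / clock_bound
    note = "exceeds the clock bound" if gain > clock_bound else "within the clock bound"
    print(f"{name}: {gain:.1f}% ({share:.0%} of the clock gain, {note})")
```

Only the 256.bzip2 figure lands above the bound, which is why it stands out as a possible revision effect or measurement error.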
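The 8.53 GB/sec figure can be reproduced from the bus parameters. The sketch below assumes a quad-pumped, 64-bit FSB with a 266.67 MHz base clock and 64-bit DDR2 channels. These widths and transfer rates are general platform facts not stated in the article, so treat them as assumptions.

```python
def peak_bandwidth_gb_s(transfers_per_sec, bus_width_bytes, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_sec * bus_width_bytes * channels / 1e9

# FSB: 266.67 MHz base clock, 4 transfers per clock, 8 bytes wide (assumed).
fsb = peak_bandwidth_gb_s(266.67e6 * 4, 8)

# Dual-channel DDR2-800: 800 million transfers/s, 8 bytes per channel (assumed).
ddr2_800 = peak_bandwidth_gb_s(800e6, 8, channels=2)

# The processor can only see as much bandwidth as the narrower link allows.
effective = min(fsb, ddr2_800)

print(f"FSB peak:      {fsb:.2f} GB/s")
print(f"DDR2-800 peak: {ddr2_800:.2f} GB/s")
print(f"effective:     {effective:.2f} GB/s")
```

Under these assumptions the FSB is the narrower link, so the same ceiling applies to both processors regardless of core clock.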
Efficiency of dual cores
And finally, by analogy with our previous analyses of dual-core processors, let's evaluate the efficiency of running two SPEC CPU2000 instances, using the rate metrics. Single-instance results obtained with the same metric are taken as the reference point.
The efficiency of running two instances of integer tasks is very high in practically all cases, except for 181.mcf. According to our previous results, this task shows poor "parallel" efficiency on other dual-core processors as well, including the Intel Pentium Extreme Edition, Intel Core Duo, and Intel Core 2 Duo. We have already assumed that the low efficiency of parallel execution of this task has to do with the reduction of available L2 cache per core (in this case, from 4 MB to 2 MB), while the task places high demands on cache/memory bandwidth. Judging by this task, all the other SPECint 2000 tasks, and the average SPECint_rate2000, the efficiency of parallel execution of two task instances on the Core 2 Extreme X6800 is a tad lower than on the Core 2 Duo E6700. For example, according to the average results, the gain from running two instances on the Core 2 Extreme is 76-78%, while it was 78-82% on the Core 2 Duo.
The general picture for two instances of floating-point tasks versus a single instance on the Core 2 Extreme X6800 looks qualitatively the same as on the Core 2 Duo E6700. As in the integer tests, the differences are quantitative, and the Core 2 Extreme is again defeated. According to the average SPECfp_rate2000 results, the performance gain from running two instances of the tasks amounts to 47-56%, which is a tad lower than the results obtained on the Core 2 Duo (54-63%).
The results obtained in this article are quite natural. An increase in the Conroe core's clock frequency from 2.66 GHz to 2.93 GHz (approximately 10%) is generally accompanied by a nearly proportional performance gain in SPEC CPU2000: from 8.5% to 9.6% for integer tasks and from 5.5% to 6.0% for floating-point tasks, which depend more on memory bandwidth than on CPU clock. At the same time, the efficiency of parallel execution of tasks on the higher-clocked extreme modification of the Conroe core is a tad lower than on the previously reviewed "non-extreme" processor (a lower-clocked, earlier revision of the core). The performance gain from running two instances of the tasks amounts to 76-78% for integer tasks and 47-56% for floating-point tests.
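The two-instance gain quoted above is the relative increase of the rate score over the single-instance reference. Here is a minimal sketch of that calculation, plus a comparison of the ranges quoted in the text. The rate scores passed to the function are hypothetical placeholders, and comparing range midpoints is our simplification.

```python
def two_instance_gain_pct(rate_one, rate_two):
    """Gain in percent from running two instances; 100% would be ideal scaling."""
    return (rate_two / rate_one - 1.0) * 100.0

# Hypothetical rate scores for one and two instances (NOT measured values).
example_gain = two_instance_gain_pct(10.0, 17.7)
print(f"example gain: {example_gain:.0f}%")

# Ranges of average gains quoted in the text, in percent: (low, high).
quoted = {
    "SPECint_rate2000": {"X6800": (76, 78), "E6700": (78, 82)},
    "SPECfp_rate2000": {"X6800": (47, 56), "E6700": (54, 63)},
}

def midpoint(bounds):
    low, high = bounds
    return (low + high) / 2.0

for suite, cpus in quoted.items():
    diff = midpoint(cpus["E6700"]) - midpoint(cpus["X6800"])
    print(f"{suite}: E6700 ahead by {diff:.1f} points (range midpoints)")
```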
Dmitri Besedin (firstname.lastname@example.org)
November 13, 2006