SPEC CPU2000. Part 27: Intel Core 2 Extreme X6800, Intel C++/Fortran 9.1 Compilers

This may be the last article in our series analyzing various platforms in SPEC CPU2000, because SPEC has announced the long-awaited SPEC CPU2006. Nevertheless, it still has a right to exist. It is devoted to the top processor recently launched by Intel, the Intel Core 2 Extreme X6800.

Unfortunately, this time the processor is not very "extreme" (compared to Intel's earlier Extreme processors). Leaving aside minor details (such as the unlocked clock multiplier), its only difference from the previous top "non-extreme" model, the Intel Core 2 Duo E6700, is a higher clock rate: 2.93 GHz, which is 266 MHz (one multiplier step) higher. Nor does this Extreme processor get a higher FSB clock (333 MHz) that would set it apart from the other Core 2 desktop processors. So this analysis comes down to one question: what does an additional 266 MHz give the Conroe core in SPEC CPU2000?
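The "one step" mentioned above equals one FSB clock, because the core clock is the FSB clock multiplied by a fixed ratio. A minimal sketch of that arithmetic follows. It assumes the nominal 266.67 MHz FSB and the stock multipliers of 10 (E6700) and 11 (X6800); the multipliers are not stated in this article.

```python
# Assumed inputs: nominal FSB clock of 266.67 MHz (800/3) and the stock
# clock multipliers of each model; neither value is given in this article.
FSB_MHZ = 800 / 3
MULTIPLIERS = {"Core 2 Duo E6700": 10, "Core 2 Extreme X6800": 11}


def core_clock_mhz(multiplier: int, fsb_mhz: float = FSB_MHZ) -> float:
    """Core clock in MHz = clock multiplier x FSB clock."""
    return multiplier * fsb_mhz


clocks = {cpu: core_clock_mhz(m) for cpu, m in MULTIPLIERS.items()}
step = clocks["Core 2 Extreme X6800"] - clocks["Core 2 Duo E6700"]

for cpu, mhz in clocks.items():
    print(f"{cpu}: {mhz:.1f} MHz")
# One multiplier step is exactly one FSB clock.
print(f"difference: {step:.1f} MHz")
```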

SPEC CPU2000 tasks were compiled with the following compilers:

Intel(R) C++ Compiler for 32-bit applications, Version 9.1 Build 20060323Z Package ID: W_CC_P_9.1.022
Intel(R) Fortran Compiler for 32-bit applications, Version 9.1 Build 20060323Z Package ID: W_FC_C_9.1.024

In all cases (with every optimization option) we used the same compiler flags: two-pass compilation with profile-guided optimization (PGO):

PASS1_CFLAGS= -Qipo -O3 -Qprof_gen
PASS2_CFLAGS= -Qipo -O3 -Qprof_use

Intel Core 2 Extreme X6800

As usual, we first analyze raw SPEC CPU2000 performance, that is, absolute scores with every available optimization, including the new option for Intel Core 2 processors (-QxT). We use the usual single-instance runs (base metrics).

	No Opt.	-QxK	-QxW	-QxN	-QxB	-QxP	-QxT
164.gzip	1618	1797	1808	1795	1798	1802	1803
175.vpr	-	2143	2273	2263	2261	2322	2242
176.gcc	-	3333	3360	3375	3378	3391	3385
181.mcf	3961	3669	3672	5084	5113	5093	5102
186.crafty	2337	2356	2707	2714	2690	2677	2704
197.parser	1654	1652	1617	1647	1647	1655	1653
252.eon	2965	3255	3744	3775	3566	3766	-
253.perlbmk	3281	3260	3227	3307	3325	3261	3302
254.gap	2869	2869	3054	3066	3048	3052	3050
255.vortex	4776	4767	4806	4771	4801	4856	4872
256.bzip2	2472	2349	2317	2340	2338	2332	2341
300.twolf	2622	3090	3197	3175	3175	3334	3342
SPECint_base2000	2711	2757	2854	2943	2930	2960	2895

The SPECint tests held a surprise: the non-optimized 175.vpr and 176.gcc, which usually fail to run, were joined by 252.eon compiled with -QxT, the optimization specific to Core 2 processors. This task had no such problem on the Core 2 Duo E6700 in our previous analysis.

Ranked by SPECint_base2000, the optimization options line up as follows: No Opt. < -QxK < -QxW < -QxT < -QxB < -QxN < -QxP. Compared with the Core 2 Duo E6700 results, -QxT has dropped in this ranking and now sits between -QxW and -QxB. This is most likely because 252.eon is missing from the -QxT results: this task scores well above the average, so leaving it out lowers the overall score. Comparing the individual integer tasks shows that -QxT (the native optimization for Core 2 processors) beats the absolute leader in 7 of the 11 tasks that ran and trails it noticeably only in 175.vpr. The leader is -QxP, described as optimization for Intel Pentium 4/D, Core Solo/Duo, and compatible Intel processors with SSE3 support.
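SPECint_base2000 is the geometric mean of the per-task ratios, so dropping an above-average task such as 252.eon pulls the composite down. The sketch below recomputes the -QxP and -QxT composites from the table above. It assumes, as the table values suggest, that a task that failed to run is simply left out of the mean; the helper names are illustrative.

```python
import math

# Per-task SPECint_base2000 ratios from the table above.
# None marks a task that failed to run (252.eon with -QxT).
TASKS = ["164.gzip", "175.vpr", "176.gcc", "181.mcf", "186.crafty",
         "197.parser", "252.eon", "253.perlbmk", "254.gap", "255.vortex",
         "256.bzip2", "300.twolf"]
QXP = [1802, 2322, 3391, 5093, 2677, 1655, 3766, 3261, 3052, 4856, 2332, 3334]
QXT = [1803, 2242, 3385, 5102, 2704, 1653, None, 3302, 3050, 4872, 2341, 3342]


def geomean(values):
    """Geometric mean of the ratios that are present (None entries skipped)."""
    present = [v for v in values if v is not None]
    return math.exp(sum(math.log(v) for v in present) / len(present))


def drop(values, index):
    """Copy of `values` with one entry removed, to mimic a missing task."""
    return values[:index] + values[index + 1:]


eon = TASKS.index("252.eon")
print(round(geomean(QXP)))             # compare with 2960 in the table
print(round(geomean(QXT)))             # compare with 2895 in the table
print(round(geomean(drop(QXP, eon))))  # -QxP as if 252.eon had failed too
```

With 252.eon removed, the -QxP composite falls to about 2896, practically the -QxT level, which supports the explanation above.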

	No Opt.	-QxK	-QxW	-QxN	-QxB	-QxP	-QxT
168.wupwise	3838	3660	3943	4527	4283	4604	4598
171.swim	2625	2999	3071	3071	3070	3066	3047
172.mgrid	1431	1783	1857	1868	1834	1870	1868
173.applu	1566	1662	1697	2224	2103	2234	2230
177.mesa	1925	2699	2833	2827	2479	2847	2678
178.galgel	2748	5092	6402	7057	6363	7076	7068
179.art	8242	9153	9301	9286	9229	8438	8442
183.equake	2723	2680	2728	2714	2693	3094	3089
187.facerec	2399	2991	3005	2985	2968	3038	3032
188.ammp	1944	1949	2098	2079	1998	2095	1991
189.lucas	2537	2495	2941	2887	2535	2898	2891
191.fma3d	1725	1727	2187	2208	1924	2185	2214
200.sixtrack	769	746	1163	1144	726	1136	1159
301.apsi	1713	1741	1834	1846	1840	1853	1833
SPECfp_base2000	2223	2493	2764	2858	2633	2875	2853

The floating-point SPECfp 2000 tasks held no surprises. Ranked by SPECfp_base2000, the options line up as No Opt. < -QxK < -QxB < -QxW < -QxT < -QxN < -QxP, the same order as on the Core 2 Duo E6700.

Comparison with Intel Core 2 Duo E6700

Let's proceed to the next stage of our analysis: comparing the results with the previous leader, the Core 2 Duo E6700. Remember that we are essentially comparing the same Conroe core in two revisions (the Core 2 Duo E6700 was an engineering sample with the earlier B0 revision, versus B1 here) running at different clock rates: 2.93 GHz and 2.66 GHz, respectively.

SPECint 2000. The Core 2 Extreme X6800 outperforms the Core 2 Duo E6700 in every integer task; only the size of the advantage varies. The smallest advantage is in 181.mcf (according to our previous tests, this task is critical to memory bandwidth): just 0.2% with non-optimized code and 4.7%-4.9% with optimizations for modern processors. The largest advantage is in 256.bzip2, reaching 13.9% with -QxB. Note that the maximum gain expected from the core clock alone is 2.93/2.66 = 1.10 times, that is, about 10%. Exceeding this ceiling may be the effect of changes in the newer B1 core revision, or it may be a measurement error. In any case, the gain in SPECint_base2000 generally amounts to 8.5%-9.6% (disregarding the incorrect -QxT result, which lacks 252.eon), which falls within the 10% dictated by the clock difference.
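The 10% ceiling can be checked with simple arithmetic. This sketch uses the rounded clock rates quoted above (which give a ceiling of about 10.2%) and a few of the quoted gains; it only compares numbers from the text and adds no new measurements.

```python
OLD_GHZ, NEW_GHZ = 2.66, 2.93  # clock rates as quoted in the text


def gain_pct(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


ceiling = gain_pct(NEW_GHZ, OLD_GHZ)

# Some X6800-over-E6700 integer gains quoted in the text, in percent.
OBSERVED = {
    "181.mcf, no opt.": 0.2,
    "256.bzip2, -QxB": 13.9,
    "SPECint_base2000, low end": 8.5,
    "SPECint_base2000, high end": 9.6,
}

print(f"clock ceiling: {ceiling:.1f}%")
for label, pct in OBSERVED.items():
    verdict = "above" if pct > ceiling else "within"
    print(f"{label}: {pct:.1f}% ({verdict} the ceiling)")
```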

As usual, the floating-point tests paint a less uniform picture. 171.swim performance drops consistently with all optimizations (by 4.8% to 6.0%), which is rather difficult to explain (perhaps it has to do with the different core revisions, this time not to the credit of the newer B1). The values also spread widely: in 178.galgel, for example, the gain ranges from 9.0% to 16.3%. Quite a few tasks gain little on the new Extreme processor, for example 173.applu, 183.equake, and 189.lucas. Despite this spread, the gains in the average SPECfp_base2000 results fall within a narrow interval, 5.5%-6.0%, which is smaller than the gains in the integer tasks. This quite possibly reflects the greater memory-bandwidth demands of the floating-point SPEC CPU2000 tasks, while memory bandwidth was identical in both analyses (the peak bandwidth of dual-channel DDR2-800, effectively limited by the throughput of the 266 MHz FSB to 8.53 GB/s).
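The 8.53 GB/s figure follows from the FSB width and transfer rate. A minimal sketch, assuming a 64-bit (8-byte) FSB running quad-pumped at 266.67 MHz and two 64-bit DDR2-800 channels:

```python
# Assumed bus width: both the FSB and each DDR2 channel are 64 bits (8 bytes).
BUS_BYTES = 8


def peak_gb_per_s(mega_transfers: float, width_bytes: int = BUS_BYTES,
                  channels: int = 1) -> float:
    """Peak bandwidth in GB/s (10**9 bytes per second)."""
    return mega_transfers * 1e6 * width_bytes * channels / 1e9


fsb = peak_gb_per_s(4 * 800 / 3)         # 266.67 MHz x 4 = about 1066.7 MT/s
memory = peak_gb_per_s(800, channels=2)  # dual-channel DDR2-800
effective = min(fsb, memory)             # the slower link caps CPU access

print(f"FSB: {fsb:.2f} GB/s, memory: {memory:.1f} GB/s, "
      f"effective: {effective:.2f} GB/s")
```

The memory subsystem could deliver more than the FSB can carry, so the FSB is the limit in both test systems.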

Efficiency of dual cores

Finally, as in our previous analyses of dual-core processors, let's evaluate the efficiency of running two SPEC CPU2000 instances simultaneously, using the rate metrics. The result of a single instance in these metrics serves as the reference point.

Running two instances of the integer tasks is highly efficient in practically all cases except 181.mcf. According to our previous results, this task shows low "parallel" efficiency on other dual-core processors as well, including Intel Pentium Extreme Edition, Intel Core Duo, and Intel Core 2 Duo. We have already suggested that this low efficiency is due to the reduced amount of L2 cache available per core (here, from 4 MB to roughly 2 MB, since both instances share the 4 MB L2 cache), combined with this task's high demands on cache and memory bandwidth. Judging by this task, all the other SPECint 2000 tasks, and the average SPECint_rate2000, the efficiency of running two task instances in parallel on the Core 2 Extreme X6800 is slightly lower than on the Core 2 Duo E6700. For example, according to the average results, running two instances gives a 76-78% gain on the Core 2 Extreme versus 78-82% on the Core 2 Duo.

The overall picture of running two instances versus a single instance of the floating-point tasks on the Core 2 Extreme X6800 is qualitatively the same as on the Core 2 Duo E6700. As with the integer tests, the differences are quantitative, and the Core 2 Extreme again comes out slightly behind. According to the average SPECfp_rate2000 results, running two instances gives a 47-56% gain, somewhat lower than on the Core 2 Duo (54-63%).
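A two-instance gain translates directly into per-core scaling efficiency: a perfect dual-core speedup would be a 100% gain, or 100% efficiency per core. The sketch below converts the average gain ranges quoted for both processors; the function name is illustrative.

```python
def per_core_efficiency(gain_pct: float, cores: int = 2) -> float:
    """Throughput of `cores` copies relative to `cores` x one copy (1.0 = perfect)."""
    return (1.0 + gain_pct / 100.0) / cores


# Average two-instance gains quoted in the text, in percent (low, high).
GAINS = {
    ("Core 2 Extreme X6800", "SPECint_rate2000"): (76, 78),
    ("Core 2 Duo E6700", "SPECint_rate2000"): (78, 82),
    ("Core 2 Extreme X6800", "SPECfp_rate2000"): (47, 56),
    ("Core 2 Duo E6700", "SPECfp_rate2000"): (54, 63),
}

for (cpu, metric), (low, high) in GAINS.items():
    print(f"{cpu}, {metric}: {per_core_efficiency(low):.1%} to "
          f"{per_core_efficiency(high):.1%} per core")
```

For example, the 76-78% integer gain on the Core 2 Extreme corresponds to 88-89% efficiency per core.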

Conclusion

The results obtained in this article are as expected. Raising the Conroe core's clock frequency from 2.66 GHz to 2.93 GHz (approximately 10%) yields a nearly proportional SPEC CPU2000 gain in the integer tasks (8.5% to 9.6%) and a smaller one in the floating-point tasks (5.5% to 6.0%), which depend more on memory bandwidth than on CPU clock. At the same time, the efficiency of parallel task execution on the higher-clocked Extreme modification of the Conroe core is slightly lower than on the previously reviewed "non-extreme" processor (a lower-clocked, earlier revision of the core). Running two task instances gives a 76-78% gain in the integer tasks and 47-56% in the floating-point tests.

Dmitri Besedin (dmitri_b@ixbt.com)
November 13, 2006
