SPEC CPU2000. Part 26: Engineering Sample of Intel Core 2 Duo E6700 (Conroe), Intel C++/Fortran 9.1 Compilers

Although desktop Intel Core 2 processors with the Conroe core are currently available only as engineering samples, their performance and microarchitectural features have already been analyzed quite thoroughly in various synthetic tests and real applications. It is high time to complete the picture with SPEC CPU2000 results for these processors (represented by an engineering sample of the Intel Core 2 Duo E6700, 2.66 GHz). This is all the more timely because the recently released Intel C++/Fortran Compiler 9.1 now features code optimization for Intel Core 2 processors (not quite officially yet :) the documentation, for example, does not mention this feature).

So, we have recompiled the SPEC CPU2000 tasks with the following compilers:

Intel(R) C++ Compiler for 32-bit applications, Version 9.1 Build 20060323Z Package ID: W_CC_P_9.1.022
Intel(R) Fortran Compiler for 32-bit applications, Version 9.1 Build 20060323Z Package ID: W_FC_C_9.1.024

In all cases (that is, for every optimization option) we compiled the code with the same switches: two-pass compilation with profile-guided optimization (PGO):

PASS1_CFLAGS= -Qipo -O3 -Qprof_gen
PASS2_CFLAGS= -Qipo -O3 -Qprof_use
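The two-pass scheme above can be sketched as a small helper that assembles the flag strings for each tested variant. This is only an illustration: the article lists just the common part of the flags, so appending the processor-specific switch to both passes is an assumption, and the helper name `pgo_flags` is hypothetical.

```python
# Common flags quoted in the article: pass 1 instruments the code,
# pass 2 recompiles it using the collected profile.
COMMON = ["-Qipo", "-O3"]

# Variants tested in the article; None stands for the "No Opt." column.
VARIANTS = [None, "-QxK", "-QxW", "-QxN", "-QxB", "-QxP", "-QxT"]


def pgo_flags(variant):
    """Return (pass1, pass2) flag strings for one optimization variant.

    Assumption: the -Qx... switch is simply appended to both passes.
    """
    extra = [variant] if variant else []
    pass1 = " ".join(COMMON + ["-Qprof_gen"] + extra)
    pass2 = " ".join(COMMON + ["-Qprof_use"] + extra)
    return pass1, pass2


for v in VARIANTS:
    p1, p2 = pgo_flags(v)
    print(f"{v or 'No Opt.'}: PASS1_CFLAGS= {p1} | PASS2_CFLAGS= {p2}")
```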

Intel Core 2 Duo E6700 (Engineering Sample)

As usual, we start with SPEC CPU2000 performance in its pure form, that is, in absolute values with every available optimization, including the new option for Intel Core 2 processors. The tests are run in the usual single-threaded mode (base metrics).

After publishing our previous analysis of 65nm processors with Presler and Yonah cores, we found out why 255.vortex would not run: the input data for this task contained errors in that version of the test package. In this review we worked around the problem by using the input data from the previous version, SPEC CPU2000 1.2, so results for this task are also available in the table below. Nevertheless, as in all previous reviews (including those done with the older SPEC CPU2000 1.2), the "non-optimized" variants of 175.vpr and 176.gcc terminate abnormally, so their results are still missing from the table.

	No Opt.	-QxK	-QxW	-QxN	-QxB	-QxP	-QxT
164.gzip	1461	1644	1656	1637	1647	1646	1645
175.vpr	-	1980	2075	2076	2097	2136	2083
176.gcc	-	3064	3084	3089	3068	3097	3089
181.mcf	3952	3611	3620	4846	4877	4863	4869
186.crafty	2123	2145	2467	2470	2450	2438	2465
197.parser	1509	1515	1484	1509	1500	1514	1512
252.eon	2700	2956	3410	3442	3251	3430	3313
253.perlbmk	2951	2984	2975	3009	3014	2991	2999
254.gap	2696	2691	2838	2854	2833	2834	2832
255.vortex	4419	4318	4456	4301	4311	4522	4548
256.bzip2	2259	2152	2057	2104	2052	2076	2081
300.twolf	2356	2821	2904	2878	2844	3012	3017
SPECint_base2000	2495	2542	2626	2694	2672	2715	2706

You can see a new code optimization option in the table above: "-QxT". It's easy to guess that it corresponds to Intel Core 2 processors, though its designation is not quite obvious. Considering that all previous options were named after processor core codenames (Katmai, Willamette, Northwood, Banias, Prescott), it would have been natural to expect a "-QxC" option for Core 2 processors (Conroe). Nevertheless, the new option is written as "-QxT", which presumably refers to the Tejas core, which was never released.

Anyway, let's analyze the results. Ranked by the overall SPECint_base2000 score, the options line up as follows: "no opt." < -QxK < -QxW < -QxB < -QxN < -QxT < -QxP. Two peculiarities stand out. First, the best optimization, even if only by a small margin, is still "-QxP", which you shouldn't take literally as "optimization for Prescott". The latest documentation states that the -QxP option produces code optimized for Intel Core Duo, Intel Core Solo, and Intel Pentium 4 processors with SSE3 support, as well as for all compatible Intel processors supporting these instructions. In other words, it is an optimization for an instruction set rather than for a CPU microarchitecture. From this point of view, it is not surprising that Intel Core Duo and Intel Core 2 show their best results with -QxP. As for the latter, let's hope that future revisions of the 9.1 compilers will be able to squeeze the maximum out of these processors with the native -QxT option. Second, the specific (this time) code optimization for the Banias core (-QxB) is also worse than the non-specific optimization for processors supporting the SSE2 instruction set (-QxN). That is, the -QxP versus -QxT situation repeats itself here.
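For reference, the overall SPECint_base2000 score is a geometric mean of the per-task ratios. The sketch below recomputes it for the -QxT column of the integer table; the function name `geometric_mean` is ours, and the result should land close to the 2706 shown in the table (small differences can come from rounding of the published per-task values).

```python
import math

# Per-task base ratios for the -QxT build, copied from the integer table above
# (164.gzip through 300.twolf, in table order).
QXT_INT = [1645, 2083, 3089, 4869, 2465, 1512, 3313, 2999, 2832, 4548, 2081, 3017]


def geometric_mean(ratios):
    """Geometric mean computed via logarithms to avoid a huge intermediate product."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


print(round(geometric_mean(QXT_INT)))
```

Because the mean is geometric, a large gain in a single task (such as 181.mcf) moves the overall score less than it would move an arithmetic average.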

	No Opt.	-QxK	-QxW	-QxN	-QxB	-QxP	-QxT
168.wupwise	3709	3514	3790	4408	4134	4499	4487
171.swim	2763	3189	3227	3227	3224	3225	3207
172.mgrid	1330	1682	1756	1763	1722	1763	1762
173.applu	1558	1642	1685	2186	2037	2195	2193
177.mesa	1758	2479	2602	2604	2284	2614	2466
178.galgel	2521	4587	5557	6341	5769	6365	6075
179.art	7465	8341	8455	8460	8421	7679	7682
183.equake	2636	2595	2647	2645	2609	3051	3037
187.facerec	2194	2723	2745	2717	2692	2768	2772
188.ammp	1794	1794	1944	1918	1844	1934	1840
189.lucas	2450	2393	2903	2847	2440	2847	2867
191.fma3d	1637	1639	2106	2124	1835	2100	2135
200.sixtrack	696	678	1061	1043	661	1034	1055
301.apsi	1600	1597	1683	1730	1731	1695	1685
SPECfp_base2000	2101	2351	2610	2710	2486	2722	2697

Let's proceed to the SPEC CPU2000 floating-point results. By the mean SPECfp_base2000 score the options rank in a different order: "no opt." < -QxK < -QxB < -QxW < -QxT < -QxN < -QxP. But the general tendency remains and even becomes more pronounced: the native -QxT option loses both to the best option, -QxP, and to -QxN, the SSE2 optimization for the "new" processors (Northwood and higher). The specific code optimization for the Banias core (-QxB) is worse than the non-specific -QxN as well as the older non-specific SSE2 optimization, -QxW. Thus, the Intel C++/Fortran compilers still need fine-tuning to optimize better for Intel Core 2 processors. For now you can use the best option, -QxP, especially as it is not yet possible to compare Core 2 Duo E6700 with other processors under the -QxT optimization.
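Both rankings can be reproduced from the overall scores in the two tables. The sketch below is only a convenience for checking the order and the size of the gaps; the dictionary layout and the helper names `ranking` and `gap_percent` are ours.

```python
# Overall scores copied from the tables: (SPECint_base2000, SPECfp_base2000).
OVERALL = {
    "No Opt.": (2495, 2101),
    "-QxK": (2542, 2351),
    "-QxW": (2626, 2610),
    "-QxN": (2694, 2710),
    "-QxB": (2672, 2486),
    "-QxP": (2715, 2722),
    "-QxT": (2706, 2697),
}


def ranking(index):
    """Option names sorted from worst to best; index 0 = integer, 1 = floating point."""
    return sorted(OVERALL, key=lambda name: OVERALL[name][index])


def gap_percent(better, worse, index):
    """How much higher (in percent) the score of `better` is than that of `worse`."""
    return (OVERALL[better][index] / OVERALL[worse][index] - 1.0) * 100.0


print(" < ".join(ranking(0)))
print(" < ".join(ranking(1)))
print(f"-QxP over -QxT: int {gap_percent('-QxP', '-QxT', 0):.1f}%, "
      f"fp {gap_percent('-QxP', '-QxT', 1):.1f}%")
```

The gap between -QxP and -QxT stays under one percent in both overall scores, which puts the "slightly inferior" wording into perspective.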

Comparison with Intel Pentium Extreme Edition 965

We decided to compare the above results with this processor because, despite significant microarchitectural and clock differences, the SPEC CPU2000 results of Intel Pentium Extreme Edition 965 are... the closest to those of the processor under review!

SPEC CPU2000 integer tests. Intel Core 2 Duo executes every task faster, despite its much lower clock (2.66 GHz versus 3.73 GHz, that is, 1.4 times lower). The smallest advantage is in 164.gzip (28-31%), the largest in 181.mcf (87-105%). The relative values vary little across optimizations, which is especially noticeable in the mean SPECint_base2000 results: they stay within 54-55%. Taking the difference in clock frequencies into account, the advantage of the new Intel Core microarchitecture over NetBurst in SPEC CPU2000 integer tasks is more than twofold (2.17-fold, to be more exact)!

Floating-point tests show a more complicated picture, but on the whole Conroe is still superior to Presler. The results depend more strongly on the optimization and on the task, and some tasks gain practically nothing from Intel Core 2 (for example, 171.swim). 168.wupwise and especially 179.art are interesting: the advantage there reaches 145% and 184% respectively. The smallest advantage in these tasks is obtained with the -QxP optimization, which probably indicates that this optimization is more efficient on NetBurst cores supporting SSE3. The same optimization also gives Intel Core 2 the smallest advantage in the overall SPECfp_base2000 score, about 26%, while the other optimizations give a 33-38% advantage. Taking the difference in clock frequencies into account, the advantage of the Intel Core microarchitecture over NetBurst in SPEC CPU2000 floating-point tasks is also nearly twofold (up to 1.93 times).
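The clock-for-clock figures quoted for the integer and floating-point scores follow from scaling the measured advantage by the clock ratio. The sketch below assumes that performance scales linearly with clock frequency, which is a simplification; the function name is ours, and the inputs are the rounded percentages from the text, so the last digit may differ slightly from the article's figures.

```python
CORE2_GHZ = 2.66        # Intel Core 2 Duo E6700
PENTIUM_XE_GHZ = 3.73   # Intel Pentium Extreme Edition 965


def per_clock_advantage(perf_ratio, clock_new_ghz, clock_old_ghz):
    """Scale a measured performance ratio to equal clocks.

    Assumption: performance is proportional to clock frequency.
    """
    return perf_ratio * clock_old_ghz / clock_new_ghz


# 55% advantage in SPECint_base2000, up to 38% in SPECfp_base2000.
int_adv = per_clock_advantage(1.55, CORE2_GHZ, PENTIUM_XE_GHZ)
fp_adv = per_clock_advantage(1.38, CORE2_GHZ, PENTIUM_XE_GHZ)
print(f"integer: {int_adv:.2f}x per clock, floating point: up to {fp_adv:.2f}x per clock")
```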

Efficiency of dual cores

And finally, as Intel Core 2 Duo E6700 is a dual-core processor, let's evaluate the efficiency of running two SPEC CPU2000 instances, using the rate metrics. The reference point is the result obtained in the same metric with one instance running.

The efficiency of running two instances of integer tasks is very high in practically all cases except 181.mcf. According to our previous results, this task does not show high "parallel" efficiency on other dual-core processors either, such as Intel Pentium Extreme Edition (Presler) and Intel Core Duo (Yonah). In our previous analysis dedicated to Yonah, we assumed that this low efficiency stems from the reduction of the L2 cache available per core (in this case, from 4 MB to 2 MB), given that the task requires high cache/memory bandwidth. Nevertheless, the task demonstrates higher parallel efficiency on Intel Core 2 Duo (Conroe) than on Intel Core Duo (Yonah): the relative result is at least non-negative with most optimizations. That is probably the effect of a much larger L2 cache (4 MB versus 2 MB in Yonah). It is also reasonable to assume that the larger L2 cache is what improves the relative results in all other tasks compared to Yonah. For example, the mean SPECint_rate2000 score shows a 78-82% gain when two instances of the SPEC CPU2000 integer tasks run on Conroe, while the same conditions on Yonah yield only a 67-70% gain.

A noticeably better picture (compared to Yonah) is also seen in the SPEC CPU2000 floating-point tasks. Firstly, there are no results with a negative gain at all (on Yonah such a result was shown, for example, by 179.art). Secondly, the overall efficiency of running two instances of these tasks is also higher in most cases. According to the mean SPECfp_rate2000 score, the performance gain from running two instances on Conroe is 54-63%, which is also higher than on Yonah (48-53%).
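The gains quoted in this section compare the rate score with two instances against the rate score with one instance. A minimal sketch of that arithmetic follows; the function names are ours, and the input scores are hypothetical values chosen only to illustrate an 80% gain, not measurements from this review.

```python
def two_instance_gain(rate_one, rate_two):
    """Percentage gain of the two-instance rate score over the one-instance score."""
    return (rate_two / rate_one - 1.0) * 100.0


def parallel_efficiency(rate_one, rate_two, cores=2):
    """Fraction of ideal scaling achieved (1.0 means a perfect doubling on two cores)."""
    return rate_two / (rate_one * cores)


# Hypothetical rate scores, for illustration only.
one_instance, two_instances = 30.0, 54.0
print(f"gain: {two_instance_gain(one_instance, two_instances):.0f}%, "
      f"efficiency: {parallel_efficiency(one_instance, two_instances):.2f}")
```

A negative gain, as mentioned for 179.art on Yonah, would mean the two-instance score is lower than the one-instance score, that is, the second instance costs more than it contributes.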

Conclusion

According to the SPEC CPU2000 performance tests, the new Intel Core 2 Duo E6700 processor, represented by an engineering sample, is currently unrivalled. Compared to the previous performance leader (Intel Pentium Extreme Edition 965), the new processor shows a 55% advantage in integer tasks and a 26-38% advantage in floating-point tasks. Taking the clock ratio of these processors into account, the advantage of the new Intel Core microarchitecture over the old NetBurst is more than twofold in the first case and nearly twofold in the second. The new processor is also notable for its high efficiency when running two instances of SPEC CPU2000 tasks, which is noticeably higher than that of its closest counterpart, Intel Core Duo.

Analysis of the currently available revisions of the Intel C++/Fortran 9.1 compilers shows that the best optimization for Intel Core 2 processors is still -QxP, which corresponds to Yonah and Prescott-class cores (Smithfield, Presler) supporting SSE3 instructions. The new specific -QxT optimization for Intel Core 2 processors, which support the new SSSE3 instructions (referred to as "SSE4" in early materials), is slightly inferior in SPEC CPU2000 performance. It obviously requires some fine-tuning in these compilers.

Dmitri Besedin (dmitri_b@ixbt.com)
July 04, 2006
