iXBT Labs - Computer Hardware in Detail


SPEC CPU2000. Part 26: Engineering Sample of Intel Core 2 Duo E6700 (Conroe), Intel C++/Fortran 9.1 Compilers

July 4, 2006




Although desktop processors of the Intel Core 2 series with the Conroe core are so far available only as engineering samples, their performance and microarchitectural features have already been analyzed quite thoroughly in various synthetic tests and real applications. It's high time we completed the picture with SPEC CPU2000 results for these processors (represented by an engineering sample of the Intel Core 2 Duo E6700, 2.66 GHz), especially as Intel has recently released C++/Fortran Compiler 9.1, which now features code optimization for Intel Core 2 processors (not quite officially yet :) for example, the documentation does not mention this feature).

So, we have recompiled the SPEC CPU2000 tasks with the following compilers:

  • Intel(R) C++ Compiler for 32-bit applications, Version 9.1 Build 20060323Z Package ID: W_CC_P_9.1.022
  • Intel(R) Fortran Compiler for 32-bit applications, Version 9.1 Build 20060323Z Package ID: W_FC_C_9.1.024

In all cases (that is, for every optimization option) we used the same switches to compile the code: two-pass compilation with profile-guided optimization (PGO):

PASS1_CFLAGS= -Qipo -O3 -Qprof_gen
PASS2_CFLAGS= -Qipo -O3 -Qprof_use
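
As a rough illustration of how the two passes combine with each processor-specific option, here is a minimal Python sketch that only assembles flag strings. The helper name `pgo_flags` and the position of the -Qx switch within the flag list are assumptions made for this example; the article quotes only the common flags above.

```python
# Common flags quoted in the article for both passes.
BASE_FLAGS = ["-Qipo", "-O3"]

# Processor-specific options tested in the article; None stands for "No Opt.".
TARGETS = [None, "-QxK", "-QxW", "-QxN", "-QxB", "-QxP", "-QxT"]


def pgo_flags(target=None):
    """Return (pass1, pass2) flag strings for one optimization option.

    Hypothetical helper: placing the -Qx switch right after the common
    flags is an assumption, not something stated in the article.
    """
    common = BASE_FLAGS + ([target] if target else [])
    pass1 = " ".join(common + ["-Qprof_gen"])  # instrumented build, collects a profile
    pass2 = " ".join(common + ["-Qprof_use"])  # final build, consumes the profile
    return pass1, pass2


for target in TARGETS:
    p1, p2 = pgo_flags(target)
    print(f"{target or 'No Opt.'}:")
    print(f"  PASS1_CFLAGS= {p1}")
    print(f"  PASS2_CFLAGS= {p2}")
```

The point of the sketch is only that the profile-generation and profile-use switches stay fixed while the -Qx option is the single variable between test runs.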

Intel Core 2 E6700 (Engineering sample)

As usual, we shall first analyze SPEC CPU2000 performance as such, that is, in absolute values for all available optimizations, including the new option for Intel Core 2 processors. We use the usual single-threaded method of running the tests (base metrics).

After publishing our previous analysis of 65nm processors with Presler and Yonah cores, we found out why 255.vortex would not run: the input data for this task contained errors in this version of the test package. We have fixed this in the present review by using the input data from the previous version, SPEC CPU2000 1.2, so results for this task are also available in the table below. Nevertheless, as in all previous reviews (including those that used the older SPEC CPU2000 1.2), the "non-optimized" variants of 175.vpr and 176.gcc terminate abnormally, so their results are still missing from the table below.

Task              No Opt.  -QxK  -QxW  -QxN  -QxB  -QxP  -QxT
164.gzip             1461  1644  1656  1637  1647  1646  1645
175.vpr                 -  1980  2075  2076  2097  2136  2083
176.gcc                 -  3064  3084  3089  3068  3097  3089
181.mcf              3952  3611  3620  4846  4877  4863  4869
186.crafty           2123  2145  2467  2470  2450  2438  2465
197.parser           1509  1515  1484  1509  1500  1514  1512
252.eon              2700  2956  3410  3442  3251  3430  3313
253.perlbmk          2951  2984  2975  3009  3014  2991  2999
254.gap              2696  2691  2838  2854  2833  2834  2832
255.vortex           4419  4318  4456  4301  4311  4522  4548
256.bzip2            2259  2152  2057  2104  2052  2076  2081
300.twolf            2356  2821  2904  2878  2844  3012  3017
SPECint_base2000     2495  2542  2626  2694  2672  2715  2706

The table above includes a new code optimization option, "-QxT". It's easy to guess that it corresponds to Intel Core 2 processors, though its designation is not obvious. Considering that all previous options were named after the codenames of processor cores (Katmai, Willamette, Northwood, Banias, Prescott), it would have been natural to expect a "-QxC" option for Core 2 processors (Conroe). Nevertheless, the new option is written as "-QxT", which presumably refers to the Tejas core that was never released.

Anyway, let's analyze the results. Here is the ranking by the overall SPECint_base2000 score in integer tasks: "no opt." < -QxK < -QxW < -QxB < -QxN < -QxT < -QxP. Two peculiarities stand out. First, the best optimization, though only by a small margin, is still "-QxP", which you shouldn't take literally as "optimization for Prescott". The latest documentation states that the -QxP option produces code optimized for Intel Core Duo, Intel Core Solo, and Intel Pentium 4 processors with SSE3 support, as well as for all Intel-compatible processors supporting these instructions. In other words, it is an optimization for an instruction set, not for a CPU microarchitecture. From this point of view, the best results demonstrated by Intel Core Duo and Intel Core 2 with the -QxP option are not surprising. As for the latter, let's hope that future revisions of the 9.1 compilers will be able to squeeze the maximum from these processors with the native -QxT option. The second peculiarity is that the optimization specific to the Banias core (-QxB) is also worse than the non-specific optimization for processors supporting the SSE2 instruction set (-QxN). That is, the situation described above (-QxP vs. -QxT) is repeated here.
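
For readers who want to check the overall scores, SPEC CPU2000 defines SPECint_base2000 as the geometric mean of the per-task ratios. The following Python sketch recomputes it from two columns of the table; the dictionary layout and function name are our own choices for illustration, and the result can differ from the published figure by a point or so because the table holds rounded ratios.

```python
import math

# Per-task ratios copied from the integer table above.
# None marks the tasks that terminated abnormally without optimization.
no_opt = {
    "164.gzip": 1461, "175.vpr": None, "176.gcc": None, "181.mcf": 3952,
    "186.crafty": 2123, "197.parser": 1509, "252.eon": 2700,
    "253.perlbmk": 2951, "254.gap": 2696, "255.vortex": 4419,
    "256.bzip2": 2259, "300.twolf": 2356,
}
qxk = {
    "164.gzip": 1644, "175.vpr": 1980, "176.gcc": 3064, "181.mcf": 3611,
    "186.crafty": 2145, "197.parser": 1515, "252.eon": 2956,
    "253.perlbmk": 2984, "254.gap": 2691, "255.vortex": 4318,
    "256.bzip2": 2152, "300.twolf": 2821,
}


def geometric_mean(ratios):
    """Geometric mean of the available ratios, skipping missing (None) entries."""
    values = [r for r in ratios if r is not None]
    return math.exp(sum(math.log(v) for v in values) / len(values))


score_no_opt = geometric_mean(no_opt.values())
score_qxk = geometric_mean(qxk.values())
print(f"No Opt.: {score_no_opt:.1f}   -QxK: {score_qxk:.1f}")
```

Both values land within about one point of the published 2495 and 2542. This also suggests that the "No Opt." overall score is averaged over only the ten tasks that completed, so it is not strictly comparable with the other columns.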

Task              No Opt.  -QxK  -QxW  -QxN  -QxB  -QxP  -QxT
168.wupwise          3709  3514  3790  4408  4134  4499  4487
171.swim             2763  3189  3227  3227  3224  3225  3207
172.mgrid            1330  1682  1756  1763  1722  1763  1762
173.applu            1558  1642  1685  2186  2037  2195  2193
177.mesa             1758  2479  2602  2604  2284  2614  2466
178.galgel           2521  4587  5557  6341  5769  6365  6075
179.art              7465  8341  8455  8460  8421  7679  7682
183.equake           2636  2595  2647  2645  2609  3051  3037
187.facerec          2194  2723  2745  2717  2692  2768  2772
188.ammp             1794  1794  1944  1918  1844  1934  1840
189.lucas            2450  2393  2903  2847  2440  2847  2867
191.fma3d            1637  1639  2106  2124  1835  2100  2135
200.sixtrack          696   678  1061  1043   661  1034  1055
301.apsi             1600  1597  1683  1730  1731  1695  1685
SPECfp_base2000      2101  2351  2610  2710  2486  2722  2697

Let's proceed to the results of the SPEC CPU2000 floating-point tests. By the mean SPECfp_base2000 score, the options rank in a different order: "no opt." < -QxK < -QxB < -QxW < -QxT < -QxN < -QxP. But the general tendency remains and becomes even more pronounced: the native -QxT option is worse than both the best option, -QxP, and -QxN, the SSE2 optimization for the "new" processors (Northwood and higher). The optimization specific to the Banias core (-QxB) is worse than the non-specific -QxN as well as -QxW, the older version of the non-specific SSE2 optimization. Thus, Intel C++/Fortran compilers still need fine-tuning to optimize better for Intel Core 2 processors. For now, you can use the best option, -QxP, especially as it is not yet possible to compare the Core 2 Duo E6700 with other processors using the -QxT optimization.
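
Both rankings can be reproduced directly from the overall scores in the two tables. The short Python sketch below sorts the options by score and reports how far -QxT trails -QxP; the variable and function names are ours, chosen only for this illustration.

```python
# Overall scores copied from the last rows of the two tables.
specint = {"no opt.": 2495, "-QxK": 2542, "-QxW": 2626, "-QxN": 2694,
           "-QxB": 2672, "-QxP": 2715, "-QxT": 2706}
specfp = {"no opt.": 2101, "-QxK": 2351, "-QxW": 2610, "-QxN": 2710,
          "-QxB": 2486, "-QxP": 2722, "-QxT": 2697}


def ranking(scores):
    """Return option names ordered from the lowest to the highest score."""
    return sorted(scores, key=scores.get)


def deficit_percent(scores, option, best):
    """How far `option` trails `best`, as a percentage of the best score."""
    return (scores[best] - scores[option]) / scores[best] * 100


for name, scores in (("SPECint_base2000", specint), ("SPECfp_base2000", specfp)):
    print(name, ":", " < ".join(ranking(scores)))
    print(f"  -QxT trails -QxP by {deficit_percent(scores, '-QxT', '-QxP'):.2f}%")
```

In both suites the gap between -QxT and -QxP stays below one percent, which is consistent with the description of -QxT as only slightly inferior.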

Comparison with Intel Pentium Extreme Edition 965

We decided to compare the above results with this particular processor because, despite significant differences in microarchitecture and clock rate, the Intel Pentium Extreme Edition 965 comes closest in SPEC CPU2000 tasks to the processor under review.

SPEC CPU2000 Integer Tests. All tasks without exception run faster on Intel Core 2 Duo, despite its much lower clock rate (2.66 GHz versus 3.73 GHz, that is, 1.4 times lower). The smallest advantage is demonstrated in 164.gzip (28-31%), the largest (87-105%) in 181.mcf. The relative values hardly vary between optimizations. This is especially noticeable in the mean SPECint_base2000 results, where the advantage stays within 54-55%. Taking the difference in clock rates into account, the advantage of the new Intel Core microarchitecture over NetBurst in SPEC CPU2000 integer tasks is more than twofold (2.17-fold, to be more exact).

The floating-point tests paint a less simple picture, but on the whole Conroe is still superior to Presler. These results depend more heavily on the particular optimization and task; some tasks gain practically nothing from Intel Core 2 (for example, 171.swim). 168.wupwise and especially 179.art demonstrate interesting results, reaching advantages of 145% and 184% respectively. The smallest advantage in these tasks is demonstrated with the -QxP optimization, which probably indicates that this optimization is more efficient on NetBurst cores supporting SSE3. This optimization also gives Intel Core 2 its smallest advantage in the SPECfp_base2000 total score, about 26%, while the other optimizations show a 33-38% advantage. Taking the difference in clock rates into account, the advantage of the Intel Core microarchitecture over NetBurst in SPEC CPU2000 floating-point tasks is also nearly twofold (up to 1.93 times).
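
The per-clock figures quoted above follow from scaling the measured advantage by the ratio of clock rates. Here is a minimal Python sketch of that arithmetic, assuming the percentage gains can be treated as exact; the function name is hypothetical, and the article's own figures were presumably computed from unrounded scores, so the last digit may differ.

```python
CONROE_GHZ = 2.66   # Intel Core 2 Duo E6700
PRESLER_GHZ = 3.73  # Intel Pentium Extreme Edition 965


def per_clock_advantage(gain_percent):
    """Convert a score advantage in percent into a per-clock speedup factor.

    A gain of 55% means Conroe scores 1.55 times higher; multiplying by the
    clock ratio shows how much more work it does per GHz than Presler.
    """
    speedup = 1 + gain_percent / 100
    return speedup * (PRESLER_GHZ / CONROE_GHZ)


int_advantage = per_clock_advantage(55)  # upper end of the SPECint_base2000 gain
fp_advantage = per_clock_advantage(38)   # upper end of the SPECfp_base2000 gain
print(f"integer: {int_advantage:.2f}x per clock")
print(f"floating point: {fp_advantage:.2f}x per clock")
```

With these inputs the integer result comes out at roughly 2.17 and the floating-point result at a little over 1.93, in line with the figures in the text.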

Efficiency of dual cores

And finally, as the Intel Core 2 Duo E6700 is a dual-core processor, let's evaluate the efficiency of running two SPEC CPU2000 instances, using the rate metrics. The reference point here is the result obtained with the same metric when one instance is running.

The efficiency of running two instances of integer tasks is very high in practically all cases except 181.mcf. According to our previous results, this task does not show high "parallel" efficiency on other cores either, such as Intel Pentium Extreme Edition (Presler) and Intel Core Duo (Yonah). In our previous analysis dedicated to Yonah, we assumed that the low efficiency of parallel processing in this task was caused by the reduction of L2 cache available to each core (in this case, from 4 MB to 2 MB), given that the task requires high cache/memory bandwidth. Nevertheless, the task demonstrates higher parallel efficiency on Intel Core 2 Duo (Conroe) than on Intel Core Duo (Yonah): the relative result is at least non-negative with most optimizations. This is probably the effect of a much larger L2 cache (4 MB versus 2 MB in Yonah). It is also reasonable to assume that the larger L2 cache is what improves the relative results in all other tasks compared to Yonah. For example, the SPECint_rate2000 mean score shows a 78-82% gain when two instances of SPEC CPU2000 integer tasks run on Conroe, while the same conditions on Yonah yield only a 67-70% gain.

A noticeably better picture (compared to Yonah) is also demonstrated in SPEC CPU2000 floating-point tasks. Firstly, there are no results with a negative gain at all (as demonstrated on Yonah, for example, by 179.art). Secondly, the overall efficiency of running two instances of these tasks is also higher in most cases. According to the SPECfp_rate2000 mean score, the performance gain from running two instances on Conroe is 54-63%, which is also higher than the Yonah results (48-53%).
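
To put these gains in perspective, they can be converted into scaling efficiency, where 100% would mean that two instances deliver exactly twice the throughput of one. The Python sketch below applies this conversion to the gain ranges quoted above; the function name and dictionary labels are ours, and the conversion itself is a common convention rather than something the SPEC rate metrics report.

```python
def scaling_efficiency(gain_percent, instances=2):
    """Fraction of ideal throughput reached when running several instances.

    A gain of 100% with two instances is perfect scaling (efficiency 1.0);
    a gain of 0% means the second core adds nothing (efficiency 0.5).
    """
    return (1 + gain_percent / 100) / instances


# Gain ranges (low, high) in percent, as quoted in the text.
gains = {
    "Conroe SPECint_rate2000": (78, 82),
    "Yonah SPECint_rate2000": (67, 70),
    "Conroe SPECfp_rate2000": (54, 63),
    "Yonah SPECfp_rate2000": (48, 53),
}

for name, (low, high) in gains.items():
    print(f"{name}: {scaling_efficiency(low):.1%} to {scaling_efficiency(high):.1%}")
```

Seen this way, Conroe reaches roughly nine tenths of ideal scaling in integer tasks and around four fifths in floating-point tasks, a few points ahead of Yonah in both cases.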

Conclusion

According to SPEC CPU2000 performance tests, the new Intel Core 2 Duo E6700 processor, represented by an engineering sample, is currently unrivalled. Compared to the previous performance leader (Intel Pentium Extreme Edition 965), the new processor demonstrates a 54-55% advantage in integer tasks and a 26-38% advantage in floating-point tasks. Taking the clock ratio of these processors into account, the advantage of the new Intel Core microarchitecture over the old NetBurst is more than twofold in the first case and nearly twofold in the second. The new processor is also notable for its high efficiency when running two instances of SPEC CPU2000 tasks, which is noticeably higher than that of its closest counterpart, Intel Core Duo.

Analysis of the currently available revisions of Intel C++/Fortran 9.1 compilers shows that the best optimization for Intel Core 2 processors is still -QxP, which corresponds to Yonah and Prescott cores (Smithfield, Presler) supporting SSE3 instructions. The new -QxT optimization, specific to Intel Core 2 processors with their Supplemental SSE3 (SSSE3) instructions, is slightly inferior in SPEC CPU2000 performance. It evidently requires some fine-tuning in these compilers.



Dmitri Besedin (dmitri_b@ixbt.com)
July 04, 2006


