Even though desktop processors of the Intel Core 2 series with the Conroe core are currently represented solely by engineering samples, their performance and microarchitectural features have already been analyzed rather thoroughly in various synthetic tests and real applications. It is high time to complete the picture with SPEC CPU2000 results for these processors (represented here by an engineering sample of the Intel Core 2 Duo E6700, 2.66 GHz), especially since the recently released Intel C++/Fortran Compiler 9.1 now features code optimization for Intel Core 2 processors (not quite officially yet :), as the documentation does not mention this feature).
So we recompiled the SPEC CPU2000 tasks with the following compilers:
In all cases (across the various optimization options) we compiled the code with the same switches, using two-pass compilation with profile-guided optimization (PGO):
PASS1_CFLAGS= -Qipo -O3 -Qprof_gen
Intel Core 2 E6700 (Engineering sample)
As usual, we shall first analyze SPEC CPU2000 performance in pure form, that is, in absolute values with all available optimizations, including the new option for Intel Core 2 processors. We run the tests in the usual single-threaded way (base metrics).
After publishing our previous analysis of 65nm processors with Presler and Yonah cores, we found out why 255.vortex would not work: the input data for this task contained errors in this version of the test package. We have fixed this in the present review by using the input data from the previous version, SPEC CPU2000 1.2, so performance results for this task are also available in the table below. Nevertheless, as in all previous reviews (including those that used the older SPEC CPU2000 1.2), the "non-optimized" variants of 175.vpr and 176.gcc terminate abnormally. That is why their performance ratings are still missing from the table below.
You can see a new code optimization in the above tables: "-QxT". It is easy to guess that it corresponds to Intel Core 2 processors, though its designation is not quite obvious. Considering that all previous optimizations were named after the codenames of processor cores (Katmai, Willamette, Northwood, Banias, Prescott), it would have been natural to expect a "-QxC" option for Core 2 processors (Conroe). Nevertheless, the new option is written as "-QxT", which presumably corresponds to the Tejas core that was never released.
Anyway, let's analyze the results. Here is the ranking by the overall SPECint_base2000 score in integer tasks: "no opt." < -QxK < -QxW < -QxB < -QxN < -QxT < -QxP. We can note two peculiarities. First, the best optimization, even if not by a wide margin, is still "-QxP", which you shouldn't take literally as "optimization for Prescott". The latest documentation states that the -QxP option produces code optimized for Intel Core Duo, Intel Core Solo, and Intel Pentium 4 processors with SSE3 support, as well as for all Intel-compatible processors supporting these instructions. In other words, it is an optimization for an instruction set, not for a CPU microarchitecture. From this point of view, the best results demonstrated by Intel Core Duo and Intel Core 2 with the -QxP option are not surprising. As for the latter, let's hope that future revisions of Compiler 9.1 will be able to squeeze the maximum out of these processors with the native -QxT option. The second peculiarity is that the specific code optimization for the Banias core (-QxB) is also worse than the non-specific optimization for processors supporting the SSE2 instruction set (-QxN). That is, the situation described above (-QxP vs. -QxT) is repeated here.
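For readers who want to reproduce this kind of ranking: the overall SPECint_base2000 and SPECfp_base2000 scores are geometric means of the per-task ratios, so a single outlier task shifts the total less than it would shift an arithmetic mean. Below is a minimal sketch of that aggregation. The function name and the three ratios are illustrative assumptions, not measurements from this review.

```python
import math


def spec_overall_score(task_ratios):
    """Geometric mean of per-task SPEC ratios (every ratio must be > 0)."""
    if not task_ratios:
        raise ValueError("need at least one task ratio")
    # Averaging logarithms avoids overflow when many ratios are multiplied.
    log_sum = sum(math.log(ratio) for ratio in task_ratios)
    return math.exp(log_sum / len(task_ratios))


# Hypothetical ratios for three tasks; NOT values measured in this review.
example_ratios = [1500.0, 2000.0, 4000.0]
overall = spec_overall_score(example_ratios)
arithmetic = sum(example_ratios) / len(example_ratios)
print(f"geometric mean: {overall:.0f}, arithmetic mean: {arithmetic:.0f}")
```

With one unusually high ratio in the list, the geometric mean comes out below the arithmetic mean, which is why a single task such as 181.mcf cannot dominate the total.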
Let's proceed to SPEC CPU2000 performance in floating-point (real-number) tasks. By the mean SPECfp_base2000 results, the ranking is in a different order: "no opt." < -QxK < -QxB < -QxW < -QxT < -QxN < -QxP. But the general tendency remains and becomes even more pronounced: the native -QxT option is worse than both the best option, -QxP, and the -QxN SSE2 optimization for the "new" processors (Northwood and higher). The specific code optimization for the Banias core (-QxB) is worse than the non-specific -QxN as well as -QxW, the older version of the non-specific SSE2 optimization. Thus, Intel C++/Fortran compilers still need fine-tuning to optimize better for Intel Core 2 processors. For now, you can use the best option, -QxP, especially as it is so far impossible to compare Core 2 Duo E6700 with other processors under the -QxT optimization.
Comparison with Intel Pentium Extreme Edition 965
We chose this processor for comparison because, despite the significant differences in microarchitecture and clock frequency, the SPEC CPU2000 results of Intel Pentium Extreme Edition 965 are... the least inferior to those of the processor under review!
SPEC CPU2000 Integer Tests. All tasks without exception are executed faster by Intel Core 2 Duo, despite its much lower clock (2.66 GHz versus 3.73 GHz, that is, 1.4 times lower). The minimal advantage is demonstrated in 164.gzip (28-31%), the maximal one (87-105%) in 181.mcf. There is practically no significant spread in the relative values across the various optimizations. This is especially noticeable in the mean SPECint_base2000 results, which stay tightly within 54-55%. Considering the difference in clock frequencies, the advantage of the new Intel Core microarchitecture over NetBurst in SPEC CPU2000 integer tasks is over twofold! (2.17-fold, to be more exact.)
Floating-point tests demonstrate a less simple picture, but on the whole Conroe is still superior to Presler. These results depend more on the particular optimization and task, and some tasks gain practically nothing from Intel Core 2 (for example, 171.swim). 168.wupwise and especially 179.art demonstrate interesting results, reaching advantages of 145% and 184% respectively. The smallest advantage in these tasks is demonstrated with the -QxP optimization, which probably indicates that this optimization is more efficient on NetBurst cores supporting SSE3. This optimization also gives Intel Core 2 its smallest advantage in the SPECfp_base2000 total score, about 26%, while the other optimizations demonstrate a 33-38% advantage. Considering the difference in clock frequencies of these processors, the advantage of the Intel Core microarchitecture over NetBurst in SPEC CPU2000 floating-point tasks is also nearly twofold (up to 1.93 times).
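The clock-normalized figures quoted in the two paragraphs above can be rechecked with a few lines of arithmetic: the score ratio is multiplied by the ratio of clock frequencies. The sketch below is only an illustration. The function and constant names are our own, and because it uses the exact 3.73/2.66 ratio rather than the rounded 1.4, the last digit may differ slightly from the figures in the text.

```python
CORE2_CLOCK_GHZ = 2.66       # Intel Core 2 Duo E6700
PENTIUM_XE_CLOCK_GHZ = 3.73  # Intel Pentium Extreme Edition 965


def per_clock_advantage(advantage_percent,
                        clock_ghz=CORE2_CLOCK_GHZ,
                        rival_clock_ghz=PENTIUM_XE_CLOCK_GHZ):
    """Turn a raw score lead (in percent) into a per-clock speed ratio."""
    score_ratio = 1.0 + advantage_percent / 100.0  # e.g. 55% -> 1.55
    clock_ratio = rival_clock_ghz / clock_ghz      # about 1.4 here
    return score_ratio * clock_ratio


# Leads taken from the text: 55% (integer) and up to 38% (floating point).
for label, lead in [("SPECint_base2000", 55), ("SPECfp_base2000", 38)]:
    print(f"{label}: {lead}% lead -> {per_clock_advantage(lead):.2f}x per clock")
```

The same function applied to the 26% lead obtained with -QxP gives the lower bound of the per-clock advantage in floating-point tasks.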
Efficiency of dual cores
And finally, as Intel Core 2 Duo E6700 is a dual-core processor, let's evaluate the efficiency of running two SPEC CPU2000 instances, using the rate metrics. The reference point here is the result obtained in the same metric with one instance running.
Efficiency of running two instances of integer tasks is very high in practically all cases, except for 181.mcf. According to our previous results, this task cannot boast of high "parallel" efficiency on other cores either, such as Intel Pentium Extreme Edition (Presler) and Intel Core Duo (Yonah). In our previous analysis dedicated to Yonah, we assumed that such low efficiency of parallel processing in this task had to do with the reduction of available L2 Cache per core (in this case, from 4 MB to 2 MB), while the task requires high cache/memory bandwidth. Nevertheless, it must be noted that the task demonstrates higher parallel efficiency on Intel Core 2 Duo (Conroe) than on Intel Core Duo (Yonah): the relative result is at least non-negative in most optimizations. That is probably the effect of a much larger L2 Cache (4 MB versus 2 MB in Yonah). It is also reasonable to assume that the larger L2 Cache has a positive effect on the relative results in all other tasks, which are better than those for Yonah. For example, the SPECint_rate2000 mean score demonstrates a 78-82% gain when you run two instances of SPEC CPU2000 integer tasks on Conroe, while the same conditions on Yonah yield only a 67-70% gain.
A noticeably better picture (compared to Yonah) is also demonstrated in SPEC CPU2000 floating-point tasks. Firstly, there are absolutely no results with negative gain (demonstrated on Yonah, for example, by 179.art). Secondly, the overall efficiency of running two instances of these tasks is also higher in most cases. According to the SPECfp_rate2000 mean score, the performance gain from running two instances of tasks on Conroe is 54-63%, which is also higher than the Yonah results (48-53%).
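The gain figures in this section are simply the two-instance rate score relative to the one-instance rate score. The sketch below shows that calculation. The function name and the sample scores are hypothetical and are not taken from our tables.

```python
def two_instance_gain_percent(rate_one_instance, rate_two_instances):
    """Gain (in percent) from running two instances instead of one.

    100% would mean perfect scaling on two cores, 0% no benefit at all,
    and a negative value means two instances finish less work than one.
    """
    if rate_one_instance <= 0:
        raise ValueError("one-instance rate score must be positive")
    return (rate_two_instances / rate_one_instance - 1.0) * 100.0


# Hypothetical rate scores, chosen only to illustrate the calculation.
gain = two_instance_gain_percent(rate_one_instance=10.0,
                                 rate_two_instances=18.0)
print(f"gain from the second instance: {gain:.0f}%")
```

A negative result from this function corresponds to the "negative gain" cases mentioned above for Yonah.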
According to SPEC CPU2000 performance tests, the new Intel Core 2 Duo E6700 processor, represented by an engineering sample, is currently unrivalled. Compared to the previous performance leader (Intel Pentium Extreme Edition 965), the new processor demonstrates a 55% advantage in integer tasks and a 26-38% advantage in floating-point tasks. Considering the clock ratio of these processors, the advantage of the new Intel Core microarchitecture over the old NetBurst is more than twofold in the first case and nearly twofold in the second. The new processor is also notable for its high efficiency when running two instances of SPEC CPU2000 tasks, which is noticeably higher than that of its closest counterpart, Intel Core Duo.
Analysis of the currently available revisions of Intel C++/Fortran 9.1 compilers shows that the best optimization for Intel Core 2 processors is still -QxP, which corresponds to Yonah and to Prescott-class cores (Prescott, Smithfield, Presler) supporting SSE3 instructions. The new specific -QxT optimization for Intel Core 2 processors, which add the SSSE3 (Supplemental SSE3) instructions, is slightly inferior in SPEC CPU2000 performance. It obviously requires some fine-tuning in these compilers.
Dmitri Besedin (email@example.com)
July 04, 2006