In this review we'll examine the latest significant addition to Intel's processor series - the Intel Core 2 Extreme QX9650 processor based on the Yorkfield core. The core belongs to a new family, generally codenamed Penryn, designed for mobile, desktop and server processors. Frankly speaking, the majority of users have been most eagerly awaiting the updated dual-core solutions on the Wolfdale core, since quad-core products interest a considerably smaller percentage of potential buyers. But this time Intel did it "AMD style", having begun to promote a new processor series with a product for enthusiasts and servers. Thus after the Core 2 Extreme QX9650 we should see a triplet of Yorkfield-based Core 2 Quad CPUs:
And a quadruplet of Wolfdale-based Core 2 Duo processors:
But this is to happen in the year 2008. And now we'll examine the only processor in the new series already available... Available to our test lab, at least. :)
Intel Core architecture: a mild upgrade
The transition to a new process technology always has a beneficial effect on developers' imagination, because it means more transistor space, natural reduction of power consumption and heat without any original ideas involved. So, it's no wonder that they've decided to modify the good old Conroe as well - in a number of rather ordinary ways at that.
The maximum size of the shared L2 cache was increased to 6 MB (vs. the older 4 MB). The associativity was increased correspondingly, from 16 to 24 ways (6/4=24/16). Also, the cache became "smarter" thanks to the new Enhanced Cache Line Split Load technology, which tries to speed up reads of data that is split across different cache lines. Theoretically, it can speed up applications that actively scan large volumes of memory, codecs and archivers included.
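To illustrate the 6/4=24/16 arithmetic: when a cache grows by adding ways rather than sets, the number of sets stays the same. Below is a minimal Python sketch of that calculation. The 64-byte line size is our assumption (it is not stated above), and cache_sets is a hypothetical helper written for this illustration only.

```python
MB = 1024 * 1024


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache.

    line_bytes=64 is an assumption made for this sketch.
    """
    return size_bytes // (ways * line_bytes)


# Older configuration: 4 MB, 16-way
old_sets = cache_sets(4 * MB, ways=16)
# Newer configuration: 6 MB, 24-way
new_sets = cache_sets(6 * MB, ways=24)

# Both values are equal: capacity and associativity grew by the same 1.5x factor
print(old_sets, new_sets)
```

Under these assumptions the extra 2 MB comes entirely from the 8 additional ways per set, which would explain why capacity and associativity scale by the same factor.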
Newer instruction set
Intel has long been the trendsetter in x86 instruction set extensions that don't require a major redesign - MMX, SSE, SSE2, SSE3... There was a time when AMD tried to compete by creating the 3DNow! set. But that was also the last of its efforts, and now AMD prefers just to license instruction sets from Intel.
The new extension is named SSE4.1, which hints at some incompleteness and implies at least the announcement of SSE4.2. SSE4.1 features 47 new instructions aimed at speeding up streaming data processing, video encoding, and scientific calculations. We won't dwell on it further in this review, since SSE4.1 deserves a whole dedicated article. We'll just say that, among popular software, at least DivX v6.7 already supports SSE4.1.
Newer functional units
Most of the changes concern the fast division and shuffle units, with the introduction of the newer Fast Radix-16 Divider and Super Shuffle Engine. Conroe's radix-4 divider processed 2 bits per pass, while the newer Fast Radix-16 divider processes 4 bits per pass. In turn, the newer Super Shuffle Engine can perform any shuffle operation on a 128-bit register in a single clock. According to Intel, this should considerably speed up both the newer SSE4.1 and "older" SSE3 instructions. Besides, we are promised the usual virtualization improvements.
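To put the divider figures in perspective, here is a rough Python sketch of how many passes an iterative divider needs when it retires 2 bits versus 4 bits of the quotient per pass. This is an idealized count only: it ignores setup cycles, early-exit optimizations and everything else a real divider does, and passes_needed is a hypothetical helper, not anything taken from Intel's documentation.

```python
def passes_needed(quotient_bits: int, bits_per_pass: int) -> int:
    """Idealized pass count: ceiling of quotient_bits / bits_per_pass."""
    return -(-quotient_bits // bits_per_pass)


# 64-bit quotient, 2 bits per pass (the older divider)
print(passes_needed(64, 2))  # 32
# 64-bit quotient, 4 bits per pass (the newer divider)
print(passes_needed(64, 4))  # 16
```

In this simplified model doubling the bits per pass halves the pass count, so division-heavy code is where we would expect the change to show.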
In general, this is suspiciously similar to the Prescott, don't you think? :) But we still hope the resemblance is purely formal.
Hardware and software
* - "2 x ..." means per core;
Essential foreword to charts
Our test method has two peculiarities of data representation: (1) all data types are reduced to one - integer relative score (performance of a given processor relative to that of Intel Core 2 Duo E4300, given its performance is 100 points), and (2) detailed results are published in this Microsoft Excel table, while the article contains only summary charts by benchmark classes. We will nevertheless focus your attention on detailed results, when needed.
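For readers who want to reproduce the scoring, the reduction to a relative score can be sketched in a few lines of Python. This is our reading of the method rather than the exact formula behind the table: the rounding rule and the handling of "lower is better" results (such as run time) are assumptions, relative_score is a hypothetical helper, and the numbers below are made up purely for illustration.

```python
def relative_score(result: float, baseline: float,
                   higher_is_better: bool = True) -> int:
    """Integer score relative to a baseline that counts as 100 points.

    For time-like results (lower is better) the ratio is inverted.
    Rounding to the nearest integer is an assumption of this sketch.
    """
    ratio = result / baseline if higher_is_better else baseline / result
    return round(100 * ratio)


# Made-up example values, not measurements from this article:
# frames per second, higher is better
print(relative_score(60.0, 40.0))                            # 150
# run time in seconds, lower is better
print(relative_score(80.0, 120.0, higher_is_better=False))   # 150
```

The point of the inversion is that a score of 150 always means "1.5 times faster than the baseline", whatever unit the original benchmark reported.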
3D modelling suites
The QX9650 set out to win from the very outset, outperforming its closest competitor by 6.5%. The value itself is modest, but let's not forget that the QX9650 and QX6850 have identical clock rates, so the difference must come from something else.
The suites reacted to the new processor very differently. MATLAB even liked the older QX6850 more, with it slightly outperforming the QX9650. SolidWorks remained nearly indifferent to the new CPU, showing a mere 3% performance boost. But Pro/ENGINEER met the QX9650 with full approval, with it outperforming the QX6850 by nearly 6%.
Digital photo processing
Even if we look at the detailed results, no subtest stands out. The QX9650 was always a bit faster than the QX6850, which made it slightly faster overall. Thus it's hard to say why exactly the new CPU won - either the cache size, or the improved calculation units.
Since compilers like large caches, the result is predictable. On the one hand, it's good. On the other, the rest of its advantages remained hidden behind the QX9650's evident advantage in L2 cache size.
The issue we've already mentioned in this article negates all the architectural, and especially L2 cache, advantages of Intel's modern quad-core processors.
Simply a stunning result. If we look at the details, we'll see that the key was the QX9650's nearly twofold leap forward in the Solver, an unparallelized physical model calculator. I think we'll ask the CPU RightMark team for a dedicated examination of this, but the main conclusion is so obvious that it is most likely true. What we've seen here is the result of improving the processor's calculation units, since CPU RM is rather indifferent to cache size (and that has been proven by a large number of previous tests).
A rather modest result, considering the 1.5-fold increase in L2 cache. We suppose the testbed memory has become a bottleneck here.
Here, judging by the upper three lines, we're bottlenecked by something other than the processor. The memory subsystem, perhaps? Then it should show in future tests involving DDR3-1333 modules.
An old group of tests that has nearly lost its importance due to high predictability of results. No comments.
The QX9650 hasn't shown any special advantages, though we'd had some hopes for the improved calculation units after the CPU RightMark tests. Not satisfied with our standard tests, we wanted to uncover the delights of DivX 6.7 with SSE4 support. This "custom" test was conducted with the following settings:
As you can see, DivX 6.7 introduced a new Experimental SSE4 full search option. Frankly speaking, the word "experimental" means a lot to experienced people. Translated from geekspeak to English, it usually means: "Impressed with the new features, we've programmed something, but we don't take it seriously."
The results we've got look rather strange. See for yourself:
Enabling that option actually slows down encoding. However, while the performance drop is significant with SSE2, it's nearly imperceptible with SSE4.
Therefore, based on our results and Occam's razor, we can suppose that the DivX developers had a kind of "dream feature" that ran too slowly to be worth implementing until SSE4.1. And suddenly the opportunity arose.
We just hope that the "dream feature" does actually improve the encoded picture or compression level, since otherwise it's not clear why implement it in the first place. :)
Despite the impressive victory of the QX9650, we'd still like to draw your attention to the detailed results. It's clear that the new processor showed its key advantage in the Low and sometimes Medium Quality modes. This speaks for some very nice prospects (meaning the newcomer has more than enough power for games as well). But it also means that the QX9650 is, most likely, excessive for a gaming rig of today, since at the highest resolutions and quality settings performance is bottlenecked by the graphics subsystem anyway. This simply negates the difference between the QX9650 and QX6850.
The charts are rather clear. We'll just mention that the QX9650 showed a somewhat greater performance boost in "professional" applications than in "home" software. In general, this is a purely positive trend: let the performance monsters do better where it's really needed.
Supposed power consumption
Despite the claimed 130 W TDP (identical to the QX6850's), the real power consumption of the QX9650 equals 76 W at 100% load. It's even lower than that of the 65nm lower-clocked QX6700. And even if we consider this value as absolute, 76 W is really low for a top CPU in the series. It's been a long time since we saw a high-end processor not exceeding the symbolic 100-watt line.
Fortunately, they haven't made a Prescott out of a Penryn. At the same core clock the new processor performed 8% faster than the old one (on average). Besides, some details indicate that this has been achieved not only through the extensive approach (e.g., increasing L2 cache size), but also thanks to an actual speed-up of the calculation units. This doesn't make it a new architecture, of course, but the modification is still significant. Provided the lower-end products based on this core become available as planned, Intel's main rival has a reason to worry. While we still wait for a real AMD K10 in a desktop, Intel's new architecture again surmounts the next performance ridge. If this continues, someone may find their new processors slower than a competitor's older ones...
Testbed memory modules provided by
Corsair Memory Russia
Stanislav Garmayuk (firstname.lastname@example.org)
November 26, 2007