iXBT Labs - How CPU Features Affect CPU Performance, Part 6

Conclusions

The moment of truth

Well, it's time to explain the last unsolved phenomenon: the many cases of superlinear performance growth at the 2.26 to 2.66 GHz transition. This may sound tedious to some :), but let's go through it together from the very beginning.

So, what does it take for a superlinear performance gain to appear at one of the transitions? It looks like a silly question, since the answer seems to lie on the surface: the next processor must be superlinearly faster. But let's not jump to conclusions: being faster is not the only answer. If we express our hypothetical ideal performance as a function of frequency, that is speed (S) = frequency (F) * some coefficient (K), superlinear performance growth is impossible. So what does it need in order to appear? Either the next-in-frequency processor gets a heavenly bonus (+B), or the previous processor gets a negative bonus (-B), that is, it is slower than its clock rate says it should be. Do you feel the change? Now we have to answer one of two questions: either "why is the 2.66 GHz processor so fast?" or "why is the 2.26 GHz processor so slow?" We also cannot rule out the possibility that both questions have answers. (And that is quite possible.)
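To make this reasoning concrete, here is a minimal Python sketch of the ideal model S = F * K and of the two ways a transition can look superlinear. The coefficient K and the bonus B are made-up illustrative numbers, not measured results, and the function names are our own.

```python
def ideal_speed(freq_ghz, k):
    """Ideal model from the text: S = F * K."""
    return freq_ghz * k


def is_superlinear(f_prev, s_prev, f_next, s_next, tol=1e-9):
    """True if speed grew by a larger factor than frequency did.

    tol absorbs floating-point rounding so that an exactly linear
    gain is not reported as superlinear.
    """
    return (s_next / s_prev) - (f_next / f_prev) > tol


K = 50.0  # hypothetical points per GHz, for illustration only
B = 5.0   # hypothetical bonus/penalty in points, for illustration only

s_226 = ideal_speed(2.26, K)
s_266 = ideal_speed(2.66, K)

# Pure S = F * K: the gain is exactly linear, never superlinear.
ideal_case = is_superlinear(2.26, s_226, 2.66, s_266)
# The faster processor gets a bonus (+B).
bonus_case = is_superlinear(2.26, s_226, 2.66, s_266 + B)
# The slower processor gets a penalty (-B).
penalty_case = is_superlinear(2.26, s_226 - B, 2.66, s_266)

print(ideal_case, bonus_case, penalty_case)
```

With these made-up numbers, the pure model is never flagged, while both the bonus and the penalty produce a superlinear transition, so the measured gain alone cannot tell the two causes apart.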

We would probably have looked for the answers much longer if not for one lucky fact: we realized that physically it was one and the same processor. We only changed the multiplier to obtain different core clock rates. So if we discard the magic of little green men, there is only one answer: the problem is in the multiplier. But that's not the answer yet. It's only the search area.

We were also lucky that the slow and fast processors had different multipliers: 17 and 20 respectively. The former is a prime number, that is, it is divisible only by itself and by one. The latter is divisible by 2, 4, 5, and 10. We cried eureka as soon as we saw the number four. That's right: the uncore multiplier was always 16, and that number is also divisible by four!

To draw the bottom line: the overhead of coordinating a core and an uncore that operate at different frequencies is probably a significant factor in CPU speed. The 2.26 GHz processor has an awkward core-to-uncore multiplier ratio of 17:16. As 17 is a prime number, this fraction cannot be reduced. For the 2.66 GHz processor the ratio is 20:16, which reduces easily to 5:4. To all appearances, the universal rule "the stronger the asynchronism, the worse" works here as well. Indirect evidence is the second diagram, which compares the ideal and the real average performance gains: the 2.66 GHz processor is much closer to its ideal than the 2.26 GHz CPU.
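The arithmetic behind the two ratios is easy to check. The short Python sketch below reduces each core-to-uncore multiplier ratio with the greatest common divisor; the multipliers 17, 20 and 16 come from the text, while the helper name `reduced_ratio` is ours. It only illustrates the fractions and says nothing about how the hardware actually synchronizes the two clock domains.

```python
from math import gcd

UNCORE_MULT = 16  # uncore multiplier, as stated in the text


def reduced_ratio(core_mult, uncore_mult=UNCORE_MULT):
    """Return the core:uncore multiplier ratio in lowest terms."""
    d = gcd(core_mult, uncore_mult)
    return core_mult // d, uncore_mult // d


# Core multipliers from the text: 17 for 2.26 GHz, 20 for 2.66 GHz.
for label, mult in (("2.26 GHz", 17), ("2.66 GHz", 20)):
    num, den = reduced_ratio(mult)
    print(f"{label}: {mult}:{UNCORE_MULT} -> {num}:{den}")
```

In lowest terms, the 2.66 GHz setting repeats its pattern every 5 core cycles (4 uncore cycles), whereas the 2.26 GHz setting needs 17 core cycles (16 uncore cycles) before the two clocks line up again. Whether that longer period is what costs performance is exactly the hypothesis under discussion.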

We don't insist that this hypothesis is true: the phenomenon requires further investigation, perhaps with low-level tests, which in such cases give higher precision and a wider spread of results than tests with real software. However, the hypothesis looks plausible. Besides, we don't have any other explanation.

As for the two cases of superlinear performance growth at the 2.66 to 3.06 GHz transition, alas, we can only declare them artifacts: they cannot be explained logically, and there are so few of them that we can still write them off as chance.

We didn't expect the gap between the ideal performance growth (proportional to the frequency growth) and the real one to widen so abruptly at 3.06 GHz. It means that even in the best case the performance of a hypothetical Core i7 3.46 GHz will be about 156 points (3.46 multiplied by the assumed efficiency of about 45 points per GHz), and that's an optimistic forecast. On the other hand, the enlarged L3 cache may raise overall efficiency, so it's too early to sound the alarm. Intel's calm position confirms this indirectly: the company is in no hurry to announce new processor architectures and is focusing on other aspects, for example graphics solutions and their integration with the CPU. So we can say that today's tests did not surprise us: as a rule, within the same architecture, the higher the frequency, the lower the efficiency. It's common knowledge, and our tests confirm it.
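For readers who want to redo the forecast, the estimate is a single multiplication. The Python sketch below uses the two numbers quoted in the text (3.46 GHz and an assumed efficiency of about 45 points per GHz); the function name is ours, and the result is only as good as that assumed efficiency.

```python
POINTS_PER_GHZ = 45.0  # assumed efficiency quoted in the text


def forecast_points(freq_ghz, points_per_ghz=POINTS_PER_GHZ):
    """Linear best-case forecast: frequency times assumed efficiency."""
    return freq_ghz * points_per_ghz


forecast = forecast_points(3.46)
print(f"Hypothetical Core i7 3.46 GHz: about {forecast:.0f} points")
```

Since efficiency tends to fall as frequency rises within one architecture, holding it constant at 45 points per GHz gives an upper bound rather than a prediction.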

However, having run such large-scale tests, it would be silly to stick to the CPU theme only and leave the programs aside. Let's see how the groups of software in our test procedure respond to a growing processor clock rate. First of all, let's take the difference between the two extremes: 1.86 GHz and 3.06 GHz.

The outcome is as expected: scientific and engineering computations, rendering, archiving, and video encoding lead the list. We were surprised, though, to see audio encoding near the bottom. The lowest position of games, on the other hand, only confirms that we selected the right test options: performance in games with normal graphics settings shouldn't depend much on the processor.

Now let's look at the same rating, but in terms of the difference between the last two positions: 2.66 GHz and 3.06 GHz. This diagram will help us answer the question of which applications keep scaling well even at top frequencies.

The first surprise concerns first place: Java applications scale best at higher frequencies. There are no more surprises after that: rendering, video encoding, scientific and engineering computations. On the whole, we can state that the last two diagrams do not contradict our intuition: even without looking at the results, using only logic and common sense, any editor would have named the five leaders.

Summing it all up, we can develop the idea from the paragraph above: there were no revelations, but our tests were still very useful precisely because the results matched theoretical predictions. There is nothing better than making sure from time to time that the good old theoretical trends still hold.

Memory modules for our testbeds provided by Corsair Memory.
