CPU Performance Test Procedure 2005: Drawing a Bottom Line

Frankly speaking, when we announced our unified procedure for testing desktop x86 CPU performance a year ago, we didn't know what would come of it. We had an idea and a reason: "It can be convenient and useful from some points of view". We wanted to try the idea in practice and see the outcome. Of course, we wanted to do a great job (why do it at all otherwise?). A year has passed. Throughout that year, despite some negative feedback, we persistently tested processors strictly according to the new procedure. It's time to look at what we've got.

No new tests were carried out for this article. That's important, because the article exists only thanks to the concept we adopted a year ago: test all processors with the same software and the same options, a fully unified approach. That's why we can now compare the performance of forty (forty!) processors from the two leading manufacturers based entirely on old results. Obviously, no other approach would have let us gather such a volume of comparable results in reasonable time. It's up to our readers to decide how this advantage weighs against the shortcomings of the approach...

Testing

Testbed configurations

Processor	Socket	Clock (GHz)	L2 Cache (KB)	Memory	Motherboard	Video card
AMD Athlon 64 FX-60	Socket 939	2x2.6	2x1024	DDR400*	ASUS A8N-SLI Deluxe	Radeon X800
AMD Athlon 64 FX-57	Socket 939	2.8	1024	DDR400*	EPoX EP-9NPA+ Ultra	Radeon X800
AMD Athlon 64 FX-55	Socket 939	2.6	1024	DDR400*	EPoX EP-9NPA+ Ultra	Radeon X800
AMD Athlon 64 X2 4800+	Socket 939	2x2.4	2x1024	DDR400*	ASUS A8N-SLI	Radeon X800
AMD Athlon 64 X2 3800+	Socket 939	2x2.0	2x512	DDR400*	ASUS A8N-SLI	Radeon X800
AMD Athlon 64 4000+	Socket 939	2.4	1024	DDR400*	EPoX EP-9NPA+ Ultra	Radeon X800
AMD Athlon 64 3700+	Socket 939	2.2	1024	DDR400*	EPoX EP-9NPA+ Ultra	Radeon X800
AMD Athlon 64 3200+	Socket 754	2.0	1024	DDR400*	ASUS K8V Deluxe	Radeon 9800Pro
AMD Athlon 64 3000+	Socket 754	2.0	512	DDR400*	ASUS K8V Deluxe	Radeon 9800Pro
AMD Sempron 3400+	Socket 754	2.0	256	DDR400*	ASUS K8V Deluxe	Radeon 9800Pro
AMD Sempron 3300+	Socket 754	2.0	128	DDR400*	ASUS K8V Deluxe	Radeon 9800Pro
AMD Sempron 3100+	Socket 754	1.8	256	DDR400*	ASUS K8V Deluxe	Radeon 9800Pro
AMD Sempron 3000+	Socket 754	1.8	128	DDR400*	ASUS K8V Deluxe	Radeon 9800Pro
AMD Athlon XP 3200+	Socket 462	2.2	512	DDR400*	Albatron KX18D Pro II	Radeon X800
AMD Athlon XP 3000+	Socket 462	2.167	512	DDR400*	Albatron KX18D Pro II	Radeon X800
AMD Sempron 3000+	Socket 462	2.0	512	DDR400*	Albatron KX18D Pro II	Radeon X800
AMD Sempron 2800+	Socket 462	2.0	256	DDR400*	Albatron KX18D Pro II	Radeon X800
Intel Pentium XE 955	LGA775	2x3.46	2x2048	DDR2-533**	Intel D975XBX	Radeon X800
Intel Pentium XE 840	LGA775	2x3.2	2x1024	DDR2-533**	Intel D955XBK	Radeon X800
Intel Pentium D 840	LGA775	2x3.2	2x1024	DDR2-533**	Intel D955XBK	Radeon X800
Intel Pentium D 820	LGA775	2x2.8	2x1024	DDR2-533**	Intel D955XBK	Radeon X800
Intel Pentium 4 XE 3.73	LGA775	3.73	2048	DDR2-533**	Intel D955XBK	Radeon X800
Intel Pentium 4 670	LGA775	3.8	2048	DDR2-533**	Intel D955XBK	Radeon X800
Intel Pentium 4 560	LGA775	3.6	1024	DDR2-533**	Intel D955XBK	Radeon X800
Intel Pentium 4 540J	LGA775	3.2	1024	DDR2-533**	Intel D955XBK	Radeon X800
Intel Pentium 4 520	LGA775	2.8	1024	DDR2-533**	ASUS P5GDC-V	Radeon X800
Intel Pentium 4 520	LGA775	2.8	1024	DDR400*	ASUS P5GDC-V	Radeon X800
Intel Pentium 4 2.8E	Socket 478	2.8	512	DDR400*	Gigabyte GA-8IPE1000	Radeon 9800Pro
Intel Pentium 4 2.8C	Socket 478	2.8	512	DDR400*	Gigabyte GA-8IPE1000	Radeon 9800Pro
Intel Pentium 4 2.8A	Socket 478	2.8	1024	DDR400*	Gigabyte GA-8IPE1000	Radeon 9800Pro
Intel Celeron D 345J	LGA775	3.06	256	DDR2-533**	ASUS P5GDC-V	Radeon X800
Intel Celeron D 340J	LGA775	2.93	256	DDR2-533**	ASUS P5GDC-V	Radeon X800
Intel Celeron D 335J	LGA775	2.8	256	DDR2-533**	ASUS P5GDC-V	Radeon X800
Intel Celeron D 335	Socket 478	2.8	256	DDR400*	Gigabyte GA-8IPE1000	Radeon 9800Pro
Intel Celeron D 330J	LGA775	2.66	256	DDR2-533**	ASUS P5GDC-V	Radeon X800
Intel Celeron D 325J	LGA775	2.53	256	DDR2-533**	ASUS P5GDC-V	Radeon X800
Intel Celeron 2.8	Socket 478	2.8	128	DDR400*	Gigabyte GA-8IPE1000	Radeon 9800Pro
Intel Pentium M 780	Socket 478	2.26	2048	DDR400*	ASUS P4GPL-X	Radeon X800
Intel Pentium M 770	Socket 478	2.13	2048	DDR400*	ASUS P4GPL-X	Radeon X800
Intel Pentium M 760	Socket 478	2.0	2048	DDR400*	ASUS P4GPL-X	Radeon X800

* — 2-2-2-5 timings, Corsair
** — 3-3-3-8 timings, Corsair

In this article we decided to skip diagrams with detailed test results (over 60 diagrams, 40 processors each: that's too much...). But if you want to look at the detailed results for a given processor, you'll find the link without leaving this page: each processor name in the testbed configuration table is a link to the detailed test results for that CPU.

Results

A few words about the designations on the diagrams. We sometimes specify a socket after a processor name to avoid confusion: "Sempron 3000+/462" stands for Sempron 3000+ for Socket 462 (Socket A). Some processors are marked with an asterisk. This means the configuration was tested with a less powerful video card (ATI Radeon 9800 Pro AGP), while our standard video card is the ATI Radeon X800 PCI Express x16. So if application performance may depend on the video card as well as on the processor, this has to be taken into account in the comparison.

SPECapc for 3ds max 6 + 3ds max 7

The situation in this benchmark was traditional: AMD processors won the Interactive sub-test, Intel processors won the Rendering sub-test. AMD processors would most often have the higher total score, because (1) their advantage in their favorite sub-test was often larger than Intel's advantage in rendering; (2) the SPEC test gives a higher weight to the Interactive sub-test. The situation grew even worse for Intel with the appearance of dual core processors: AMD's dual core clocks are close to those of the company's top single core processors, which cannot be said of Intel's. As a result, Intel's defeat in the Interactive sub-test is aggravated by a defeat in rendering: the 3ds max render engine distributes well across processors, so AMD's dual core processors gained the advantage there too.

The consequences can be seen on the diagram: the first three places are all taken by AMD processors, two of them top dual core models and the third a top single core model. Interestingly, Intel processors rank in exactly the same order relative to each other (two top dual core processors followed by a top single core processor); they just take Places 4, 8, and 9 respectively. On the whole, Intel processors look rather pale in SPECapc for 3ds max 7: eight of the bottom ten processors are from Intel, one of them even a Pentium 4, while AMD has only two processors in that group, both Semprons for Socket A. The Pentium M posts quite a good result: the top processor in this series is practically on a par with the Pentium 4 670 and the Athlon 64 3700+ for Socket 939.

SPECapc for Maya 6 + Maya 6.5

SPECapc for Maya has no rendering sub-test, so dual core processors cannot really show themselves: there are only three of them in the top ten. Besides, the program's interactive engine is evidently friendlier to Intel than that of 3ds max 7, and it shows in the results. Victory is out of the question, though; it's just a less noticeable defeat. The two top places are still captured by AMD, but more than half of the top ten are Intel processors: 6 models. It's the Intel CPUs that reveal Maya's love of a large L2 cache: all four of the company's best-performing processors (Pentium 4 670, Pentium 4 XE 3.73 GHz, Pentium XE 955, and even Pentium M 780) are equipped with a 2 MB L2 cache.

Mid-rank processors from Intel and AMD are more or less on a par. But all Celeron models without exception landed in the bottom ten, joined by only a few Athlon XP and Sempron models for Socket A. Interestingly, the respectable old Athlon XP 3200+ managed to avoid the worst ten positions; moreover, it outperformed the Sempron 3000+ for Socket 754. However, if we recall the hypothesis about Maya's love of large caches, everything becomes crystal clear: though old, the AXP 3200+ has a 512 KB L2 cache, while the newer Sempron 3000+ for Socket 754 has just 128 KB.

Lightwave 8.2, rendering

While Maya has only the Interactive sub-test in our test procedure, Lightwave, on the contrary, offers only the rendering sub-test. Consequently, we have new leaders at the top of the diagram: there are only three single core processors in the top ten, and the rest are all dual core models. The first two places are formally taken by AMD, but the lead of the A64 FX-60 and A64 X2 4800+ over the top dual core Intel Pentium XE 955 is not as large as in 3ds max 7 and Maya 6.5. Interestingly, Pentium D 820 and Pentium 4 670 demonstrate similar results. If we disregard the difference in cache sizes (which is reasonable, as we haven't previously seen any significant difference between 1 MB and 2 MB L2 caches in Lightwave), we can see how a second core trades off against clock: a 2.8 GHz dual core CPU demonstrates the same rendering performance as a 3.8 GHz single core processor. Of course, this gain is far from twofold, but it's still impressive...
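The Pentium D 820 versus Pentium 4 670 comparison implies a rough dual-core scaling factor. Here is a minimal sketch of that arithmetic, assuming (as the text does) that cache size can be ignored and that single-core rendering performance scales linearly with clock within one architecture. The function name is ours, for illustration only.

```python
def implied_dual_core_speedup(single_core_ghz, dual_core_ghz):
    """Speedup the second core must deliver for a dual-core chip at
    dual_core_ghz to match a single-core chip at single_core_ghz.
    Assumption: one core's performance is proportional to its clock."""
    return single_core_ghz / dual_core_ghz


# Pentium 4 670 (3.8 GHz, single core) vs Pentium D 820 (2.8 GHz, dual core)
speedup = implied_dual_core_speedup(3.8, 2.8)
print(f"{speedup:.2f}x")  # prints 1.36x
```

Under these assumptions the second core is worth roughly a 36% gain in this renderer: well short of twofold, but a sizeable step.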

The bottom ten picture is all too familiar: all Celeron processors (including new models for LGA775) and several AMD processors for the rapidly aging Socket 462. Looking at this constant refrain on the diagrams, it's hard not to agree with the opinion popular among AMD fans that Intel offers no decent low-end processors. But we would shift the emphasis: Intel's low-end processors are the classic low end, inexpensive and low-performance. It's AMD that still plays the rebel in memory of its aggressive youth and launches good mid-range processors as low-end ones. That's not the best policy in market terms, as the low end bites off a part of mid-range sales, though mid-range profits are higher. However, users only benefit from this policy, so nobody (including us) minds such disrespect to marketing :).

SPECapc for SolidWorks 2003

SolidWorks 2003 obviously prefers the AMD K8 architecture, and the results show it well. Rarely does a diagram show such unanimity: even though there are more Intel processors in the top ten than in some other cases, the first six positions are taken by AMD CPUs. The number of cores seems to be of no importance: only the clock matters in the competition between CPUs of the same architecture. Well, not only the clock; cache size is also important. That's easy to make out in the results of Intel processors (the best CPUs are equipped with a 2 MB L2 cache) and of AMD Sempron for Socket 754: the 3100+ outperforms the 3000+, and the 3400+ outperforms the 3300+ (the processors in each pair differ only in L2 cache size). But the cache size didn't save Pentium M: it demonstrates no more than average results in this test, approximately on the level of the Athlon 64 3200+. That may be the fault of its relatively weak FPU. The bottom ten looks traditional: sometimes we even get the impression that Celeron is included in this article only to fill the bottom positions on the diagrams ;).

Adobe Photoshop CS (8)

Let's get the obligatory part out of the way first: there is no official benchmark (from the manufacturer) for Adobe Photoshop, so each test lab that uses this program for testing CPU performance chooses one of two ways: it either uses another test lab's (or some tester's) project, or develops its own test procedure. We've taken the second road, though we cannot say we chose it completely on our own. The main concept of the Adobe Photoshop test script from iXBT.com is to use as few complex filters as possible. Instead, we employ the most popular functions: various Blur and Sharpen filters, color space conversions (RGB -> CMYK -> LAB), lighting effects, rotation, resize, and transformation operations. These may not be the heaviest modes, but they are used very often by the majority of designers. Thus, the execution time of these operations is what irritates Adobe Photoshop users most often (if an operation is too slow).

The other side of our concept is that the developer tries to optimize and accelerate precisely the functions we use in our script, so they carry the maximum number of optimizations. As a rule, all of them can take advantage of the latest additional instruction sets. As a rule, all of them are also optimized for multi-processor systems, unlike the majority of complex filters (which are often not developed by Adobe, by the way).

Competent optimization naturally leads Pentium XE 955 to first place. It's no secret that thorough code optimization for the NetBurst architecture gives it a serious advantage (unlike AMD processors, which are notable for high performance with non-optimized code). But if we analyze the top ten positions instead of only the top three, Intel's advantage gets less prominent: four of the ten best performers in Adobe Photoshop are AMD processors. On the whole, though, Intel is "in the saddle" in this test: in the bottom ten, even the Celeron 2.8 GHz (old model, 128 KB cache) managed to outperform both Semprons for Socket A and even both Athlon XPs, while the new Celeron D models confidently outscore Sempron for Socket 754. There can be only one comment here: that's what you can get from NetBurst if you invest proper time in competent optimization.

Adobe Acrobat 6.0 Distiller

It's the simplest and at the same time the most unpredictable test in our Test Procedure 2005. No matter how hard we tried to find a pattern in the results, it still escaped us. Clock frequency? But why did Athlon 64 FX-60 and X2 4800+ outscore Athlon 64 FX-57? Dual cores? But why did Pentium 4 670 outscore Pentium XE 955? Cache size? But AMD Sempron 3000+ and 3100+ (as well as 3300+ and 3400+) have practically the same results! In short, the program behaves erratically and it seems impossible to do anything about it. Thus, it makes absolutely no sense to try and comment on these results. We just have to take them as they are.

All-purpose data compression (archiving)

Strange as it may seem, the most interesting part of the archiving diagram is not at the top but at the bottom. It's a very rare case where Intel Celeron processors demonstrate their worth. To be more exact, Intel Celeron D. All the hardware designers had to do was equip the high-frequency integer unit of the NetBurst core with a more or less decent L2 cache (Celeron D has a 256 KB cache, unlike the first Celeron models based on the NetBurst architecture, which had 128 KB), and now the low-end processor from Intel can at least compete with the old AMD Socket 462 platform in archiving performance. But that's where the new Celeron's success ends: Sempron for Socket 754 is too tough for it.

The top part is dominated by AMD, both dual core and single core processors. Intel is represented in the top ten by two dual core processors and one single core processor. In fact we can pick out another pattern: two of these three Intel processors are equipped with a 2 MB cache and a 1066 MHz FSB. Which of the two played the main role in the Intel CPU ratings? The second, the bus, seems the likelier answer: Pentium 4 XE 3.73 and Pentium XE 955 stand near each other, while Pentium D 840 is significantly lower. Despite its large cache, Pentium M does not demonstrate brilliant results: that may be an effect of its relatively slow 533 MHz FSB.

Multimedia lossy compression (MP3/MPEG2-4)

One of the biggest flaws in Test Procedure 2005 is the inadequate representation of Intel and AMD CPU performance in the average score of the media encoding test. It happens because AMD processors always lose to Intel processors nearly twofold in the LAME test with the Q=0 option. On the one hand, everything looks correct: they are outperformed, so what? On the other hand, an AMD processor may often win 5 tests out of 6 but still get a lower average score because of one heavy loss. Nothing can be done about it now. We have learnt this lesson for the future, but for this article we have to evaluate audio and video encoding performance according to last year's procedure.
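To see how a single heavy loss can outweigh five wins, consider a small sketch with invented numbers. The ratios below are hypothetical (they are not taken from our results), and we show both an arithmetic and a geometric mean, because the exact averaging formula of the media encoding score is not restated here.

```python
from statistics import geometric_mean, mean

# Hypothetical per-test ratios (CPU A score / CPU B score); >1 means A wins.
# Five narrow wins and one near-twofold loss, invented for illustration.
ratios = [1.05, 1.05, 1.05, 1.05, 1.05, 0.5]

wins = sum(r > 1 for r in ratios)   # number of tests CPU A wins
arith = mean(ratios)                # arithmetic mean of the ratios
geo = geometric_mean(ratios)        # geometric mean of the ratios

print(f"A wins {wins} of {len(ratios)} tests")
print(f"arithmetic mean of ratios: {arith:.3f}")
print(f"geometric mean of ratios:  {geo:.3f}")
```

With these numbers both means come out below 1.0, so CPU A ranks lower on the summary diagram despite winning five tests out of six.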

Strange as it may seem, AMD Athlon 64 FX-60 has broken through into the top three, taking second place. It's not hard to guess the reason: excellent video encoding performance in SMP-optimized applications (especially Windows Media Video and Canopus ProCoder 2) and in the DivX 5 codec (which favors AMD) helps to make up for the LAME results. But the situation in the top ten is much worse: there are only three CPUs from AMD there. The bottom ten is the reverse: three processors from Intel against seven from AMD. In the middle of the diagram, Celeron D 345J absurdly outperforms Athlon 64 3700+ for Socket 939. Of course, we have published the summary diagram for the media encoding tests to preserve a complete picture, but you shouldn't use it to evaluate the real advantages and disadvantages of processors without looking into the detailed results.

CPU RightMark 2004B

The CPU RightMark render engine is optimized for multi-processing. Judging from the summary results of all the CPUs tested, the optimization is really good. Judge for yourself: not one single core processor managed to outperform any dual core processor! The top three places are distributed to AMD's advantage, while the top ten is dominated by Intel (it has more processors there), but only because we tested more dual core processors from Intel. Celerons mostly flock at the bottom, which is not surprising: these processors lack even such a traditional advantage of NetBurst-based Intel processors as Hyper-Threading (using it in CPU RightMark squeezes several additional percent of performance out of a processor). Otherwise, within the same architecture it all comes down to clocks, as the diagrams clearly show. Like many compute/render benchmarks, CPU RightMark is nearly indifferent to L2 cache size and bus throughput (except for the "killer" combo of a NetBurst core, 400 MHz bus, and 128 KB L2 cache: Celeron 2.8).

3D games and graphics visualization in professional packages

Total score in games

For your information, the fps shown on the diagram is the geometric mean of fps in four different games: Far Cry, DOOM 3, Painkiller, and Unreal Tournament 2004. For this article we decided to use the total-score diagram for 800x600 and average quality settings, as our shootout includes quite a lot of low-end CPUs. Besides, we use testbeds with different video cards here, one of which is noticeably weaker than the other. So theoretically, a relatively low resolution and an easy graphics quality mode should smooth out differences in video performance and bring CPU performance to the foreground. You can easily see that the top three places are taken by AMD processors only. That's not surprising, as we all know the preferences of modern games. The top ten game results deal a final blow to NetBurst: there are only three processors from Intel in the top ten, two of them being... Pentium M! Of the NetBurst models, only the top Pentium 4 eXtreme Edition managed to break through, to Place 8(!).
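For readers unfamiliar with the geometric mean, here is a minimal sketch of how a four-game score of this kind can be computed. The fps values are hypothetical placeholders, not results from our tests, and the helper name `games_score` is ours.

```python
from statistics import geometric_mean

# Hypothetical fps values for one CPU; not taken from the article's results.
fps = {
    "Far Cry": 120.0,
    "DOOM 3": 90.0,
    "Painkiller": 150.0,
    "Unreal Tournament 2004": 80.0,
}


def games_score(fps_by_game):
    """Geometric mean of per-game fps: the n-th root of their product."""
    return geometric_mean(list(fps_by_game.values()))


score = games_score(fps)
print(f"total games score: {score:.1f} fps")
```

Unlike the arithmetic mean, the geometric mean is not dominated by whichever game happens to run at the highest frame rate: a 10% gain in any one game moves the score by the same factor.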

The bottom ten positions are the coup de grace: eight processors from Intel, including all the Celeron models tested, and just two AMD Semprons, both for the outdated Socket A (and tested with a less powerful video card than, for example, Celeron D xxxJ). Thus, the situation in games is so straightforward that it requires no wordy comments. In this comparison it would have been very difficult to change the balance of power even by adjusting the test software: if you read reviews from other independent test labs, you'll see that AMD processors lead in practically all modern games, not only in those included in our test procedure.

SPEC viewperf

Even though 3D visualization tasks in games and professional packages are similar in many respects, SPEC viewperf results paint a slightly different picture: more Intel processors appear in the top ten (4 models), and Pentium M didn't make it; only high-clocked Pentium 4 processors did, three of them with a 2 MB L2 cache. The cache alone looks coincidental, as Pentium M also has a 2 MB L2 cache (these processors are in the middle of this diagram). But remember that, unlike Pentium M, a Pentium 4 processor has a much faster bus (at least 800 MHz in the top ten models; Pentium 4 XE 3.73 and Pentium XE 955 offer 1066 MHz). Thus, we can only assume that a combination of two factors helped NetBurst look a tad better in professional packages: a 2 MB cache (huge for desktop CPUs) and a fast processor bus. The worst ten results indirectly confirm the love of cache in professional 3D visualization packages: they mostly belong to processors with a small L2 cache. Celeron D 325J (2.53 GHz, 256 KB L2 cache, 533 MHz FSB) seriously outperforms Celeron 2.8 GHz (a higher clock, but a 128 KB L2 cache and a 400 MHz FSB). Intel and AMD processors are on even terms in the middle of the diagram. We can note a compact group of five Intel processors in a row, which includes all three desktop Pentium M models.

Total score

In the past, we would use the results in this diagram. It's a geometric mean of all the results in the previous diagrams (results where higher is better enter as "X"; results where lower is better enter as "1/X"). On the whole, it's like the proverbial average temperature of all patients in a hospital. Of course, like any primitive averaging (everything is included, no weights), this average score hardly reflects average CPU performance across various tasks correctly. But... we simply decided: let's calculate the average score first and then see what comes of it. Strange as it may seem, our intuition did not fail us on the processor ranking: we had pictured a similar overall situation purely intuitively. That's why we decided to leave this diagram in the article: let it be...
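The "X" and "1/X" rule can be sketched as follows. This is only an illustration under our own assumptions: the sample values are invented, the function name `total_score` is hypothetical, and any normalization the real diagrams may apply is left out.

```python
from statistics import geometric_mean

# Hypothetical results for one CPU as (value, higher_is_better) pairs.
results = [
    (2.5, True),     # e.g. a benchmark score: enters as X
    (60.0, True),    # e.g. frames per second: enters as X
    (200.0, False),  # e.g. seconds to finish: enters as 1/X
]


def total_score(results):
    """Unweighted geometric mean; lower-is-better results are inverted."""
    terms = [x if higher_is_better else 1.0 / x
             for x, higher_is_better in results]
    return geometric_mean(terms)


base = total_score(results)
# The same CPU finishing the timed test twice as fast (100 s instead of 200 s).
faster = total_score([(2.5, True), (60.0, True), (100.0, False)])
print(f"total score improves by {faster / base:.3f}x")
```

Because the mean is geometric, the absolute value of such a score mixes units and means little by itself; only the ratio between two processors is informative. Halving one time out of three results raises the total by the cube root of two.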

The top three positions are taken by the three most powerful dual core processors. It's proof that our test procedure anticipates industry trends rather than merely registers them, as it contains quite a lot of SMP-optimized applications. This peculiarity is also noticeable in the top ten, where the competitors demonstrate touching unanimity: five processors from each manufacturer, only two out of five being single core models. On the whole, we can say that our Test Procedure 2005 is SMP-optimized or even SMP-optimistic. Well, it really is in line with modern trends in the processor sector. Though some readers may reproach us for putting the cart before the horse...

Intel "leads" in the bottom ten positions (six processors versus four from AMD). But that's not the funniest part. If you take a closer look, you'll see that this group includes four Celeron processors for Intel's latest platform, LGA775, while AMD is represented... solely by the outdated Socket 462 platform (Socket A). No comment: he that hath ears to hear, let him hear...

In the middle of the diagram, Intel and AMD processors are present in equal proportions. We can just point out a long line of Intel processors (closer to the top, five positions in a row), which includes a compact group of all three desktop Pentium M models. It's an interesting sign, especially as we have already seen this situation in many diagrams (though with slightly different NetBurst participants).

Readers who feel pessimistic about the development of modern x86 CPUs should study the diagram as a whole. You will see that the average performance of the slowest and the fastest processors differs by a factor of more than 2.5. What is that, if not progress?..

Conclusion

The article devoted to a fully unified procedure for testing x86 desktop CPU performance, the first in iXBT history, was published a year ago. The first material based on the updated procedure followed about eight months ago. The latest material was published quite recently. Thus, the test procedure has lived through a full life cycle of approximately one year. In that time we have tested over 40 processors (this article does not include the results of Xeon- and Opteron-based servers/workstations). The accumulated information is sufficient to draw certain conclusions. So, what are the key advantages of a unified test procedure that is not modified for quite a long time?

Wide discussion of the pre-release test procedure as an integral set of tests, options, and data representation methods lets us, on the one hand, explain our CPU test objectives to our readers and, on the other hand, get their feedback on the test procedure as a finished, integral system.
The unified nature of benchmarks and options means no processor has to be tested twice, which reduces article preparation time and increases the test lab's throughput.
It also makes it possible to compare the results of processors from different articles.
And finally, a unified test procedure, unchanged for a long time, makes it possible to write articles such as this one in reasonable time and without excessive effort.

What are the shortcomings of this approach?

Once we start actively using a procedure for testing processors, it's practically impossible to upgrade any benchmark to a newer version, as that would cancel the main advantage: cross-comparison of results from different articles and reuse of old results in new articles.
The same applies to introducing new tests. We still added Maya 6.5 rendering a little later, but as you can see, we haven't included it in this final article, as there are no results of this test for older processors.
The unified nature inevitably swells the test procedure, as the set of benchmarks must reveal the strengths and weaknesses of various processor classes. As a result, we must test Celeron and Sempron processors in 3ds max and Maya, while dual-processor systems based on Opteron and Xeon must be tested in DOOM 3 and Far Cry.
Sometimes reality imposes changes that make the results not quite comparable. Such a thing happened this time, as it was impossible to use the same video card in systems with AGP and PCI Express buses.

As you can see, we don't pretend that our approach is ideal and free of drawbacks. But we still think that its advantages outweigh its disadvantages. That's why we are currently working on a new version, Unified Test Procedure 2006. It's too early to discuss it, but if you have comments on the approach as such, there is a link below to discuss this article in our forum.

Stanislav Garmatiuk (nawhi@ixbt.com)
January 30, 2006.
