Test procedures vary widely. Despite that diversity, we can single out one important and widespread mistake that plagues their authors: breaking the single fundamental rule that at any point, from the alpha stage to release, an author must be well aware of the objective. This objective can be unrealistic (to create a procedure that will satisfy everybody), realistic (to create a procedure that will satisfy the target audience), or absent altogether, when the author has no idea of the user who will be interested in the results of his tests and runs all benchmarks in sight in the hope of an accidental hit. Unfortunately, we most frequently come across Options One and Three. Option Three is probably the more frequent ;).
The main rule of the test procedures for desktop processors developed in our iXBT.com testlab is to offer as much data as possible to people who use their computers for work. To people who choose a processor not because "I have recently seen an ad saying it's cool", not "to outdo a neighbor", and not because "it's a fun thing to play with." Our target audience is users who don't actually need a computer per se: they need a tool for their tasks. It may be a computer, a slide rule, or a magic wand. The main properties of this tool are very simple:
As you can see, everything is simple. The simplicity of this approach is our main boast. Of course, it has drawbacks: it does not suit users who are interested in "the prestige of their computers" rather than in the practical use of their systems. But their interests are served by so many reviewers that, frankly speaking, we don't want to join the competition in such a crowded market.
The new Test Procedure 2006 (the second revision in our series of unified x86 CPU performance test procedures) was developed as an organic successor to the previous procedure, absorbing its advantages and eliminating as many drawbacks as possible, and as a modern procedure for benchmarking CPU performance that reflects the key hardware and software development trends. In our view, the test results below prove that our approach is correct. Let's put it like this: our purely subjective opinion (based on personal experience) of the relative performance of different CPUs does not contradict the results obtained. Not bad, by the way: it's well known that a true reviewer, woken in the dead of night, will tell you which processor is faster even half asleep, without consulting any diagrams, without trying to remember test results, without even waking up :).
Hardware and Software
* — "2x..." means "per each core"
You can easily see that there are only two newcomers in our tests: Pentium D 805 and Pentium XE 965. They have no conceptual differences from the previously reviewed Intel processors. Pentium D 805 is the new junior model in the budget 800 series of dual-core Pentium D processors, adapted to a slower bus and operating at a lower clock speed. Pentium XE 965 is a Pentium XE 955 overclocked to 3.73 GHz. Perhaps the more interesting newcomer is Pentium D 805, a price record-breaker among dual-core processors: this dual-core CPU can be bought for less than $150.
Description of the procedure with examples from real tests
Necessary preface to the diagrams
The new test procedure features two peculiarities of data representation: (1) all data types are reduced to one — integer relative score, and (2) the data representation form is modified — there will be one table in Microsoft Excel format available for downloading instead of multiple diagrams. Let's analyze our reasons for these innovations.
So why do we need scores? First of all, because the dimensions and types of the quantities to be compared often differ even within application groups that share one summary diagram in the body of an article. Take, for example, the summary diagram for 3D modeling packages. Each column averages SPECapc results for 3ds max 7.0 and Maya 6.5 (type: score) with scene render times in Maya 6.5 and Lightwave 8.5 (type: time). For scores, higher is better; for times it's the opposite, so we account for that by taking 1/X. The resulting mean figure expresses... nothing. It's nothing more than a symbolic score: the higher it is, the faster the processor. So why not put some meaning into this score? That's what we decided to do: take the performance of some processor as a reference point of 100. That reference point is Intel Pentium D 805, the slowest desktop dual-core x86-64 CPU (for now, but it seems likely to keep this position for good). Thus, if you see that one processor scored 150 points and another scored 120, the first is 1.5 times as fast as Pentium D 805 in the given applications and the second is 1.2 times as fast, so the former is 1.5/1.2 = 1.25 times as fast as the latter. The reference point will remain unchanged in all reviews conforming to this test procedure, to allow cross-article comparison of processors. The score calculation procedures are open: they are published in the spreadsheet with the results, and reading them requires only minimal knowledge of Microsoft Excel formulas.
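As an illustration, the normalization described above can be sketched in a few lines of Python. This is our own hypothetical sketch: the function name and the sample numbers are invented, not actual test data.

```python
REFERENCE = "Pentium D 805"  # reference CPU, pinned at 100 points

def relative_scores(raw, higher_is_better=True):
    """Convert raw results to integer scores relative to the reference CPU.

    For "lower is better" metrics (e.g. render time) the ratio is
    inverted, which is the 1/X adjustment mentioned above.
    """
    ref = raw[REFERENCE]
    return {cpu: round((value / ref if higher_is_better else ref / value) * 100)
            for cpu, value in raw.items()}

# A "higher is better" metric, e.g. a SPECapc score:
print(relative_scores({"Pentium D 805": 2.0, "CPU A": 3.0}))
# {'Pentium D 805': 100, 'CPU A': 150}

# A "lower is better" metric, e.g. render time in seconds:
print(relative_scores({"Pentium D 805": 600, "CPU B": 400}, higher_is_better=False))
# {'Pentium D 805': 100, 'CPU B': 150}
```

A 150-point CPU is then 150/120 = 1.25 times as fast as a 120-point one, exactly as in the worked example above.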
Why a spreadsheet instead of a long web page with several dozen diagrams? Firstly, the new procedure generates even more data than the old one, so plotting that many diagrams (unfortunately, the process is not yet automated) takes much time and effort and noticeably delays publication. Secondly, and no less importantly, it's much easier to extract data for further processing from a spreadsheet than from diagrams (in the latter case, copying numbers off the screen is the only option). So we decided that publishing test results as a table will actually be more convenient for those of our readers who pay attention to detailed results and are ready to work with them. Besides, as already mentioned, it's also very convenient for us. Later on, this spreadsheet will probably be replaced with an online database with an interactive engine that lets visitors plot any diagrams and tables they like by choosing processors for comparison from drop-down menus. It will hardly be ready tomorrow, but the work is in progress.
3D Modeling and Rendering
This diagram averages the results of four tests into a mean score: SPECapc for 3ds max 7, SPECapc for Maya 6.5, and render speed tests in Maya 6.5 and Lightwave 8.5 (the latter package is a 64-bit version). Changes from the previous version are rather insignificant: a new revision of SPECapc for 3ds max 7 and a new version of Lightwave — 8.5 instead of 8.2. Besides, Lightwave 8.5 now supports AMD64/EM64T, so we have taken advantage of this fact and now use the 64-bit version of this package.
CAD (computer-aided design)
This section was significantly smaller in the previous procedure. In fact, there was no section at all, just a single application (SolidWorks 2003) standing in for the whole class. We chose to expand the CAD/CAM/CAE presence in our CPU performance procedure, and now there are three packages: the updated SolidWorks 2005 (a SPEC test), the world-famous Pro/ENGINEER Wildfire 2.0 (a test script also from SPEC), and MATLAB 7.1 (we use the built-in "bench" command).
A brand new section that appeared in response to numerous requests from our readers. Unfortunately, we stumbled upon some difficulties while developing the compilation speed test. Problem One: it would have been strange to introduce a single test for another OS into a procedure oriented toward Microsoft Windows. On one hand, its results would tell nothing to the main audience, who read our reviews to learn about Windows software. On the other hand, users of alternative operating systems will hardly start reading us for the sake of a single test. So the compilation test had to run under Windows as well.
Problem Two was finding open source code of sufficient size (so that it takes more than a few seconds to compile) for the standard Windows compiler, Microsoft Visual C++ (or .NET). That proved not so simple. Most such packages require either additional tools like Cygwin or MinGW (which is definitely a perversion, as they may affect performance and skew our measurements) or the GCC compiler (show me people who routinely use GCC under Windows for anything but fun...). Fortunately, we managed to find a suitable package: ACE + TAO, with large open sources available for download.
Problem Three: open source packages, even those adapted for native (without Cygwin/MinGW) compilation under Windows with a Microsoft compiler, are not intended for its newer versions and cannot be built with them without significant modifications. Thus, the compilation speed test is currently a compromise between the desirable and the available: compiling ACE+TAO with Microsoft Visual C++ Professional 6.0 (without loading the IDE, by running the compiler and linker from the command line).
Many people consider CPU RightMark a synthetic test. In our opinion, it is not quite synthetic, or rather a peculiar kind of synthetic if viewed from a different angle. The point is that, on one hand, it uses algorithms actually employed, for example, in physics computations and rendering (in this respect it is very close to game engines). On the other hand, CPU RM is a sort of "performance pump": it supports all modern extended instruction sets (optimized with hand-written assembler inserts) and its render engine supports up to 32 simultaneously executing threads. In game-engine terms, it is a thoroughly optimized engine that can squeeze maximum performance out of any processor. The only synthetic thing about CPU RM is that it demonstrates maximum CPU performance with all possible optimizations enabled. This test procedure uses CPU RightMark 2005 Lite with improved multiprocessing support and adjusted default scene parameters (more objects, to distribute the load evenly between the physics and render units).
Processing Bitmap Images (Photos)
Bitmap image processing is still represented by a single program, which makes sense: Adobe Photoshop has long been the de facto standard for professional photo processing. The test script for Adobe Photoshop has not been changed, only supplemented: we added filters from the Blur, Color, Light, Rotate, Sharpen, Size, and Transform groups, several dozen bundled filters in total. You can download the script from our web site. The image processed by this script has been enlarged to 4096x3072; you can download it as well as the script (attention: it's about 20 MB). Besides, the program itself was updated: we now use Adobe Photoshop CS2 (Version 9.0).
Another brand new benchmark and class. We use the benchmarking tool bundled with the Apache web server: AB (Apache Benchmark). The command is executed three times for three different files: small (33 KB), average (137 KB), and large (1.8 MB). The small file is a typical front page, the average file is one of iXBT.com's articles, and the large file is an entire novel (the size of a solid reference book). Requests go to localhost (127.0.0.1), and we measure the number of processed requests per second. Results for the small and large pages enter the total score with a coefficient of 0.15 (15%) each; results for the average page, with a coefficient of 0.70 (70%). Naturally, this test uses multiprocessing: it loads all the processors in a system.
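The weighting described above can be sketched as follows. This is a hypothetical illustration: the requests-per-second figures are invented, not measured.

```python
# Weights per the description above: 15% small, 70% average, 15% large.
WEIGHTS = {"small": 0.15, "average": 0.70, "large": 0.15}

def web_score(req_per_sec):
    """Weighted sum of requests/sec over the three test pages."""
    return sum(req_per_sec[page] * weight for page, weight in WEIGHTS.items())

# Invented example figures (requests per second for each page):
print(web_score({"small": 900.0, "average": 400.0, "large": 60.0}))  # 424.0
```

The average-sized page dominates the result, which matches the intent: a typical article is the most representative load for the server.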
The old benchmark, practically unchanged; only the software versions are updated: 7-Zip 4.32 (64-bit version) and WinRAR 3.51 (unfortunately, this archiver has no 64-bit version). The file set (Attention! 75 MB!) remains the same: 53 MB of BMP files, 48 MB of DBF files, 49 MB of DLL files, 49 MB of DOC files, 49 MB of PDF files, and 49 MB of TXT files. RAR still does not support multiprocessing; only 7-Zip does (and, as far as we understand, it is not multiprocessing proper but dual-processing: the archiver can use a second CPU, nothing more).
We have moved audio encoding (in response to multiple readers' requests) into a separate section and significantly expanded the set of tests. We now test MP3 encoding speed in LAME 3.98, encoding into the lossless Monkey's Audio and Windows Media Audio formats (MAC 4.01 and Windows Media Encoder 9 x64 Edition), into OGG in OGG Encoder 2.8 (Lancer), and into lossy WMA ("CD Quality" in Windows Media Encoder terms, that is 64 kbps 2-pass VBR). All the tests have equal weight in the total score. Unfortunately, we cannot publish the file used in the audio encoding tests: publishing an original commercial Audio CD is illegal. We use a WAV image of Jacques Loussier Trio's "The Best of Play Bach". You can rip this album on your own: if you have the original CD, you will get a 100% exact copy of what we use.
Our video encoding test has changed slightly. It is now a separate subgroup (audio and video encoding used to share one group). Canopus ProCoder is the only tool still at the same version (there are no newer ones), and the file encoded into MPEG2 is the same (a fragment of amateur footage from a digital video camera, kindly provided by Mikhail Afanasenkov). The source video for MPEG4 encoding is now an HD video fragment (the Taxi 3 trailer), available for download from Microsoft (1080p, 00:02:42). The DivX codec is updated to Version 6.1.1; XviD is represented by the 1.1.0 release (we downloaded the binary from Koepi's Media Development Homepage). There have been no newer versions of the Windows Media Video codec (VCM), hence the old version. The x264 codec (Version 438) was downloaded from http://x264.nl/. The VirtualDubMod profiles can be downloaded from our web site. A profile note: as the x264 codec only allows setting the number of encoding threads manually, there are three profiles: for single-, dual-, and quad-processor configurations. A separate mention should be made of the existing 64-bit VirtualDub. Unfortunately, it is no good in our case: as far as we understand, it can work only with 64-bit codecs, so the list of available codecs includes only those bundled with Windows XP x64 Edition.
Another new benchmark group, so far represented by a single application. According to its developers, FineReader 8.0 Professional supports multiprocessing, but frankly speaking, we saw no sign of it on the core load curves of dual-core processors. A 200-page PDF file with text and graphics is used as the OCR source. This seems reasonable to us: scanned page images would take up much more room and thus increase the load on the disk system, while we are interested in CPU performance. As usual, our test PDF file is available for download.
3D Shooter Games
The only game surviving from the previous test procedure is Unreal Tournament 2004, as its new version has not been released yet. The others are state-of-the-art games: F.E.A.R., Half-Life 2, and Quake 4. As in the previous version of our procedure, we monitor performance at three settings: 640x480 with minimum graphics quality, 800x600 with medium quality, and 1024x768 with high quality. Higher resolutions still do not seem like a good idea, as we aim to benchmark processors, not video cards. The mean score includes (with equal weights) only the results in the medium mode (800x600); let's not forget that we test low-end processors as well as top models. The other results are available in the XLS spreadsheet. For F.E.A.R. we use its built-in benchmark; for UT 2004, the ONS_dria demo; for Half-Life 2, a demo by Andrey Vorobiev, our video editor; for Quake 4, our own demo (you shouldn't really download it out of sheer curiosity: it's very long and takes up 200 MB in an archive).
Yep, we actually did it: each review will contain diagrams with the mean temperature "across the hospital" :). There will be three of them: a "Professional" score, a "Home" score, and an Overall score. As the titles suggest, they differ in which tests they include. The Professional score takes into account only the results in 3D modeling packages, CAD, the compilation speed test, CPU RightMark, image processing in Photoshop (a professional tool; amateurs use simpler programs), and the web server performance test. The Home score includes video and audio encoding, file archiving, OCR, and, of course, games. The Overall score includes absolutely all tests.
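The three summary scores amount to plain averages over different subsets of the per-group scores, which can be sketched like this (a hypothetical illustration: the group labels and the all-100 sample data are ours, not real results):

```python
# Group labels are shorthand for the test groups named above.
PROFESSIONAL = ["3D modeling", "CAD", "Compilation", "CPU RightMark",
                "Photoshop", "Web server"]
HOME = ["Video encoding", "Audio encoding", "Archiving", "OCR", "Games"]

def summary_score(group_scores, groups):
    """Plain average of the per-group relative scores for the selected groups."""
    return round(sum(group_scores[g] for g in groups) / len(groups))

# Dummy data: a CPU exactly as fast as the reference in every group.
scores = {g: 100 for g in PROFESSIONAL + HOME}
print(summary_score(scores, PROFESSIONAL))         # "Professional" score: 100
print(summary_score(scores, HOME))                 # "Home" score: 100
print(summary_score(scores, PROFESSIONAL + HOME))  # Overall score: 100
```

The exact formulas actually used, including any weighting, are published in the downloadable spreadsheet mentioned earlier; this sketch only shows the general shape of the calculation.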
Of course, the value of these data is for you to decide; some people will certainly dismiss them as useless. That's their right, and we don't force anyone to browse diagrams they are not interested in. On the other hand, these figures are not abstract data calculated by unknown formulas (all the formulas are available: just download the spreadsheet); they are based on real tests in specific applications. Let's put it like this: if you cannot or don't want to analyze anything more complex, you'd better use our total scores than ad booklets. Wouldn't you agree?
As for the results on the diagrams, we have already mentioned that they illustrate the properties and preferences of the new test procedure rather than anything interesting in themselves. All the processors in these tests, except for two models (Pentium D 805 and Pentium XE 965), are well known to our readers from previous articles, and the new models offer no surprises; they are simple and predictable. But of course, this was not done for nothing: in our opinion, we'll need all these results for other articles about more interesting new processors. The fact that there is just one AMD processor is also natural: Athlon 64 FX-60 embodies the achievements of state-of-the-art multicore x86 CPUs; it is the current leader and the fastest dual-core processor among existing models, according to both our old and new tests. Let it symbolize the close link between our test procedure and the real state of affairs on the CPU market, and confirm that our test results are adequate ;).
Of course, it's too late to change anything fundamental in this test procedure. But we have tried (as far as possible) to take into account all the wishes of our readers, so we hope the public response will be generally positive. If you have observations on specific tests that can be accommodated painlessly, we'll be much obliged to receive them by email.
Memory modules for our testbeds are kindly provided by
Russian representatives of Corsair Memory
Stanislav Garmatiuk (firstname.lastname@example.org)
May 4, 2006