Test Method for Evaluating PC Performance
We present our new test method. Our readers may have already noticed that it bears a somewhat different title: what used to be called the "Test Method for Evaluating Processor Performance" is now the "Test Method for Evaluating PC Performance." Processors have, in a sense, been placed on the same shelf as other PC components. This is both true and not true. On the one hand, we agree that the CPU plays one of the dominant roles in the performance of a modern computer, but only one of them, not the ultimate one. On the other hand, we cannot deny that the CPU remains the central part of the modern architecture, and its role hasn't changed: processors still affect the performance of every subsystem of a modern PC. This contradiction (a dialectic unity of opposites, even) leaves us no choice but to admit two facts: 1) the performance of a complete PC depends on more than the CPU; 2) PC performance depends on the CPU so heavily that it would be foolish to ignore this influence, and it can be analyzed in isolation, as an independent value, though not quite a self-contained one.
So this test method has been developed to separate three independent characteristics from each other:
- CPU performance (speed);
- Performance of other subsystems;
- Performance of the entire PC.
We couldn't shed the CPU-centric nature of our old test method overnight, of course: processors still play a dominant role in this version. However, the general direction has changed fundamentally, and its effect will grow stronger in future versions of our method: we are heading towards an all-purpose method for testing computer performance, one in which the CPU's influence is certainly taken into account, but in proportion to its effect on the performance of the entire computer, not as a self-sufficient factor. In other words, we'll gradually stop answering the simple but mostly pointless question of which processor is better, and will instead try to give an adequate answer to a more complex one: which processor is better for you, given your tasks and your preferences for the other PC components.
The key principles of our approach to testing
Why resource-intensive applications?
Benchmarking PC components and complete systems by running resource-intensive applications has its traditional opponents. They ask a fair question: why do we need these tests? What will an average user learn from the results if he doesn't recognize even half the names of these applications (while the next user doesn't recognize the other half)? This question sounds quite convincing: who actually needs such tests, and do users really need them at all? We'll try to answer it in detail, in three parts.
- First, don't take averaging so far that it becomes absurd: it's genuinely hard to find a professional car designer among random passers-by, yet the abundance of cars around us proves that such people exist. A typical user (the one marketing specialists mention so often) really has very few tasks where PC speed has a dominant influence on how quickly they get done (as opposed to his or her own working speed). So we fully agree that a modern typical user couldn't care less about performance problems. So what? We don't force anyone to read our articles. Our reviews seem to find those very "designers" who don't walk the streets. We do not try to please everybody.
- Second, the success of a product from the consumer's point of view (consumers vote with their money) correlates directly with its real technical properties in most cases, and the fact that most users have no idea of those properties does not affect this trend. In other words, product comparisons based on well-posed tests and opinion polls about those products often give the same or similar results. However, tests have a number of undeniable advantages: they are much easier and faster to perform than public opinion polls. Besides, a product can be tested right after its rollout (or even before its official appearance on the market), when few users have yet formed an opinion on it. So even if you don't understand how an evaluation method works, that doesn't mean it doesn't work.
- Third, it's theoretically possible to build a test method on a paradigm other than resource-intensive applications, and we don't claim such a method couldn't be useful. However, this is already the fourth version of the classic method (plus intermediate editions), and we still found enough reasons to improve it. So don't rush to announce half-baked alternatives: something should be discussed only when it's ready for discussion.
Grouping tests by common properties
We've been grouping applications by certain common properties for a long time already, publishing only diagrams with total scores per group; the detailed results of each test go into a separate spreadsheet linked from the article. The reason for this approach is obvious: we obtain a great many results. For example, the test method described in this article outputs 88 test results. Can you imagine an article with 88 diagrams? It would be a disaster, and harder still to read to the end.
Thus, grouping tests and their results in a detailed test method (one with many tests) is practically inevitable. However, the outcome depends directly on careful group planning: you may end up with several heaps of results instead of one big heap, or you may get something well-knit, logical, and meaningful. In all previous versions of our test method we grouped applications either by popular software classes or by similarity of tasks. In this version we decided to give up grouping by the first attribute and focus on tasks. The next logical step of this idea is to abandon separate total scores for the non-professional and professional segments: if users are interested in performance in a certain class of tasks, we hope they can identify it by name; if you are interested in the average score, it is represented by a single value.
There are fewer groups in the fourth version of the test method: only 11 (the previous version had 14), while the number of applications and benchmarks has increased. This was achieved through well thought-out grouping. We have also accommodated those who cannot get used to the new format: the table with results now contains three tabbed pages: raw test results, a summary in the new style, and a summary in the old style, so you can choose which one to use. Now we shall describe the test groups and the ideas behind the new style, as you are probably already familiar with the old one.
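To illustrate the pipeline described above (88 raw results collapsed into 11 group scores and one overall value), here is a minimal sketch. The article does not specify the aggregation formula, so the geometric mean is an assumption (it is a common choice in benchmarking because one outlier test cannot dominate a group); the group and test names below are purely hypothetical placeholders, not the actual test set.

```python
from math import prod

# Hypothetical raw results: {group name: {test name: score relative
# to some reference PC}}. Real data would hold 88 results in 11 groups.
raw_results = {
    "Video encoding": {"x264": 1.12, "x265": 0.97},
    "3D rendering": {"Blender": 1.05, "POV-Ray": 1.21, "Cinebench": 1.08},
    "Archiving": {"7-Zip": 0.97},
}

def geometric_mean(values):
    """Geometric mean of normalized scores; dampens single outliers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))

# One score per group -- the per-group diagrams published in the article.
group_scores = {
    group: geometric_mean(tests.values())
    for group, tests in raw_results.items()
}

# A single overall value, as described for the combined summary.
overall_score = geometric_mean(group_scores.values())

for group, score in sorted(group_scores.items()):
    print(f"{group}: {score:.2f}")
print(f"Overall: {overall_score:.2f}")
```

A reader could swap in an arithmetic or weighted mean with a one-line change; the point is only that each published diagram bar is one aggregated number per group, with raw per-test results kept in the linked spreadsheet.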