More than a year and a half has passed since we published an article on the performance of dual-core processors in 3D games. Since then, quad-core processors have become widespread, and AMD has launched triple-core processors as well (physically they have four cores, one of which is disabled). The share of 3D games that support multi-core processors has grown, along with the share of multiplatform games and the corresponding tools. This is happening because the Microsoft Xbox 360 is based on a triple-core CPU, and the Sony PlayStation 3 uses a Cell processor consisting of a single general-purpose core and several computing units.
Multiprocessor systems used to have weak support in games, because they were rarely found in gaming computers. Now it's next to impossible to find a good single-core processor in stores. The minimum for a modern gaming PC is a dual-core processor, or even a triple- or quad-core one. Besides, such processors are no longer expensive and are quite affordable for ordinary users. So users wonder how they benefit from the second, third, or fourth CPU core in a 3D game. In this article we'll go into this topic and show the performance gain provided by each additional core (up to four) in a few games.
Note that performance gains in multi-core systems are possible even for single-threaded applications, or when secondary threads cannot use the full capacity of the processor cores. 3D applications that can use the resources of only a single core may sometimes work faster on multi-core processors, because the Direct3D API and graphics drivers can distribute part of the load between several cores. The operating system may also contribute by distributing threads between physical CPU cores, except when an application handles this on its own. So, proceeding from theory to practice, we've tested several 3D games with a quad-core processor and analyzed the results.
Testbeds and settings
We've chosen one of the popular quad-core processors from Intel. It would have been better to disable cores in an AMD Phenom processor, because the two designs differ in how their cores interact with cache memory: a Core 2 Quad consists of two dual-core halves, while a Phenom has an L3 cache shared by all cores. So, if we run across strange relative performance results for three and four Core 2 cores, we'll verify them with an AMD Phenom.
When we were preparing for the tests, we found out that using "Set affinity" in Windows Task Manager for a selected application does not give correct test results. Many games detect the number of physical processor cores available to the operating system, and when the operating system forbids the use of some of them, the remaining cores have to execute several demanding application threads simultaneously. As a result, performance may drop even below the level of a real single-core CPU.
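The effect described above can be illustrated with a toy model. It assumes (this is our simplification, not something measured in the article) that a game starts one heavy worker thread per detected core, and that the operating system may run those threads only on the cores allowed by the affinity mask. The function name is hypothetical.

```python
import math


def threads_per_usable_core(detected_cores: int, usable_cores: int) -> int:
    """Worst-case number of busy worker threads sharing one usable core.

    Toy model: the game starts one heavy worker thread per *detected* core,
    and the OS may schedule them only on the *usable* (affinity-allowed) cores.
    """
    if detected_cores < 1 or usable_cores < 1:
        raise ValueError("core counts must be positive")
    if usable_cores > detected_cores:
        raise ValueError("usable cores cannot exceed detected cores")
    return math.ceil(detected_cores / usable_cores)


# Affinity limited to one core: the game still detects four cores.
print(threads_per_usable_core(detected_cores=4, usable_cores=1))  # 4
# Cores limited at boot: the game detects one core and starts one worker.
print(threads_per_usable_core(detected_cores=1, usable_cores=1))  # 1
```

In the first case four threads compete for one core and pay for context switches and synchronization, which is one plausible reason for results below those of a real single-core CPU.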
So we used bcdedit to disable processor cores (in Windows Vista it replaces editing boot.ini, a file that exists in Windows XP but not in Vista). In order to use two processors in Vista, type the following line in the command prompt:
After a reboot, the system and applications will detect only the specified number of cores. This gets us as close as possible to a processor that physically has that many cores, apart from some differences, including the specifics of Intel Core 2 quad-core processors described above.
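As a minimal sketch of this step, the helper below builds a bcdedit call that limits the number of processors at boot. It assumes that the limit is set through bcdedit's `numproc` boot option on the `{current}` boot entry, which may differ from the exact line used for these tests. The function names are hypothetical.

```python
import subprocess


def numproc_command(cores: int) -> list[str]:
    """Build the bcdedit call that caps the number of processors used at boot."""
    if cores < 1:
        raise ValueError("at least one core must stay enabled")
    return ["bcdedit", "/set", "{current}", "numproc", str(cores)]


def restore_command() -> list[str]:
    """Build the bcdedit call that removes the cap again."""
    return ["bcdedit", "/deletevalue", "{current}", "numproc"]


def apply(command: list[str]) -> None:
    """Run the command; needs an elevated prompt on Windows, then a reboot."""
    subprocess.run(command, check=True)


print(" ".join(numproc_command(2)))  # bcdedit /set {current} numproc 2
```

Nothing is executed until `apply` is called, so the command can be inspected first. The cap takes effect only after a reboot, and `restore_command` gives the call that brings all cores back.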
We used default video driver settings, with texture filtering set to High quality. We used three test resolutions, all of them widescreen: 1280x720(800), 1680x1050, and 1920x1200, which are standard modes for widespread LCD monitors.
Tests were run with 16x anisotropic filtering and 4x MSAA selected in game options, if supported by a given application. The effect of CPU power would have been more noticeable if anisotropic filtering and antialiasing had been disabled. But that goes against the idea of testing in real conditions, because users with a powerful computer play games at high graphics quality settings.
Our game tests include both standard built-in benchmarks, often used in articles, and games that do not offer standard tools to measure performance. So we have to use FRAPS to measure performance in some games, even though this method has several drawbacks described in our articles: Problems of Testing 3D Performance with FRAPS and Problems of Testing 3D Performance with FRAPS, Part 2. It will do for one-off tests, although its measurement error is larger.
Along with the average frame rate in three resolutions with different numbers of enabled CPU cores, we measured the average and maximum usage of each core in our quad-core processor. These values show how effectively an application, the graphics API, the video driver, and the operating system use the available cores. To cut down on the volume of data displayed in the table, we used only one resolution, 1280x720(800), where CPU usage should be the highest. Anisotropic filtering and multisampling remain enabled.
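The per-core figures described above can be derived from a series of usage readings. The sketch below assumes that the monitoring tool yields one reading per sampling interval with a usage value in percent for each core. The function name and the sample values are hypothetical and are not measured data.

```python
from statistics import mean


def summarize_core_usage(samples):
    """Return [(average, maximum), ...] with one pair per core.

    `samples` is a list of readings; each reading holds one usage value
    (in percent) per core, always in the same core order.
    """
    if not samples:
        raise ValueError("no samples")
    per_core = list(zip(*samples))  # transpose: one tuple of readings per core
    return [(mean(values), max(values)) for values in per_core]


# Hypothetical readings for a quad-core CPU (not measured data).
readings = [
    (10, 20, 0, 100),
    (30, 40, 10, 100),
    (20, 60, 20, 100),
]
for core, (avg, peak) in enumerate(summarize_core_usage(readings), start=1):
    print(f"Core {core}: average {avg:.0f}%, maximum {peak}%")
```

A core whose average is close to its maximum of 100% was busy almost all the time, while a high maximum with a low average points to short bursts of work.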
We used PIX for Windows from the DirectX SDK in these tests. For applications that don't work with PIX (Crysis, ETQW, Lost Planet, DMC4, GRID), we used the monitoring features of RivaTuner.
Crysis is the sum of technical progress in many fields, including multiprocessing. Unfortunately, rendering performance in Crysis is limited mostly by the graphics card. Even in an open scene with lots of geometry and active physics computations, performance is still limited by the power of the graphics card, not by the CPU.
Let's see what our multi-core configurations can offer in such conditions. We set game options to High instead of the maximum level to keep performance within decent limits. Multisampling was disabled for the same reason.
As we can see, Crysis runs fine on two cores, even though its CPU load is rather high. The difference between two, three, and four CPU cores does not exceed the measurement error. Going from one core to two, however, gives a good performance boost in all modes; there is a difference even at 1920x1200.
The two lower resolutions show that performance is apparently limited by the speed of a single processor core. Perhaps the boost from the second core appears when physics computations and D3D API call processing are moved to it. Let's see the load on CPU cores during our tests:
We get interesting results right away: even with significant usage of the second core and some load on the third and fourth cores, overall performance is still limited by the performance of a single core. The average Core 4 usage was 100%, which means that it was operating at full capacity almost all the time. That is, the overall speed of the game was most likely limited by the processor.
Performance in games is often limited by the processor core that processes Direct3D API draw calls. Up to and including Direct3D 10, there is just no way of distributing these computations between several cores. We can only wait for DirectX 11 and the corresponding optimizations in future games. For now, we confirm that the best choice for Crysis is a fast dual-core processor.
If we take the power of a single core in the Core 2 Quad Q6600 as 100%, the average processor usage in this test was about 145%. That is another indirect proof that a single-core processor won't be enough, and that a dual-core CPU will be up to the task.
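The arithmetic behind this estimate can be sketched as follows. The total is the sum of the per-core averages, and the number of cores needed is that total divided by 100% and rounded up. In the example, only the 100% for Core 4 and the total of about 145% come from the text above. The other three values are placeholders chosen to add up to that total, and the function names are hypothetical.

```python
import math


def total_usage_percent(per_core_averages):
    """Sum per-core average usage, with one fully loaded core counted as 100%."""
    return sum(per_core_averages)


def cores_needed(total_percent):
    """Smallest number of equally fast cores that could carry this load."""
    return max(1, math.ceil(total_percent / 100))


# Illustrative split: only the 100% for Core 4 and the total of about 145%
# come from the article; the other three values are placeholders.
averages = [10, 15, 20, 100]
total = total_usage_percent(averages)
print(total, cores_needed(total))  # 145 2
```

This is an idealized estimate: it assumes the load can be spread evenly over the cores, which a single saturated thread does not allow, so it gives a lower bound on the core count rather than a guarantee.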