We decided to analyze NVIDIA G8x performance with different numbers of active shader units because of the existing cut-down Mid- and Low-End solutions based on this architecture, as well as possible future solutions from both chipmakers with a different number of unified processors (and probably a different ratio of various units).
We wanted to see the real performance difference in existing games as the number of units executing vertex and pixel shaders changes. This analysis will also help us evaluate how shader performance affects rendering speed, and how much texture and fill rates limit modern games.
Top GPUs integrate a lot of ALUs to execute all shader types. But do existing applications really need so many units? Do existing games really load these units with enough pixel and vertex shader work (we are not even talking about geometry shaders yet)? Or could they do with fewer units, which would justify the reduced number of units in Mid- and Low-End GPUs?
Today we'll try to answer all these questions by analyzing the performance of the GeForce 8800 GTX in several popular games with various numbers of active units. As you may already know, the total number of such units in the G80 is 128. We used RivaTuner, written by Alexei Nikolaychuk, to disable these units. We benchmarked performance with 32, 64, 96, and 128 active units in order to compare rendering speed in popular games. The list of these games is published below. Unfortunately, a clean analysis is not possible, because the G8x architecture does not allow disabling ALUs separately from TMUs.
Testbed configuration and settings
We used the following testbed configuration:
We used two video modes: the standard 1280x1024 mode (or 1280x960 for games that do not support the former resolution) without anisotropic filtering and antialiasing; and the High Quality mode: 1600x1200, 4x multisampling, and the maximum anisotropic filtering level, 16x. These two modes were selected in order to rule out the possibility that the CPU or other GPU units limit performance.
The list of games used in our tests includes only standard benchmarks frequently used in articles: Quake 4, F.E.A.R., Serious Sam 2, Call of Juarez, Company of Heroes, S.T.A.L.K.E.R.: Shadow of Chernobyl. We haven't used games without standard benchmarking tools this time; the above-mentioned games are quite enough for our analysis. We also used RivaTuner 2.02.
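To get a rough idea of how much heavier the second mode is, we can count pixels and multisample samples per frame. The sketch below is only an illustration: the function names are ours, and raw counts are just a proxy for the load. With multisampling, pixel shaders typically run once per pixel, while ROPs and memory have to deal with every sample; the extra texturing work of anisotropic filtering is not counted here at all.

```python
# Rough per-frame workload of the two test modes (a proxy, not a measurement).

def pixels(width, height):
    """Number of pixels rendered per frame at the given resolution."""
    return width * height

def samples(width, height, msaa=1):
    """Number of multisample samples per frame with msaa-x antialiasing."""
    return pixels(width, height) * msaa

easy = samples(1280, 1024)            # no antialiasing
heavy = samples(1600, 1200, msaa=4)   # 4x multisampling

pixel_ratio = pixels(1600, 1200) / pixels(1280, 1024)
sample_ratio = heavy / easy

print(f"easy mode:  {easy} samples per frame")
print(f"heavy mode: {heavy} samples per frame")
print(f"pixel ratio:  {pixel_ratio:.2f}x")
print(f"sample ratio: {sample_ratio:.2f}x")
```

By this count, the heavy mode has about 1.46 times as many pixels and about 5.86 times as many samples per frame as the easy mode.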
Quake 4 1.3
The game uses a graphics engine that appeared in DOOM 3 three years ago. It's rather old, so our results shouldn't depend much on the number of shader units. Besides, Quake 4 depends heavily on the processor, because it uses the CPU to compute and apply shadows, as well as to compute physics. Let's analyze the "easy" mode first:
Just as expected, performance in 1280x1024 is evidently limited by the CPU. We noticed a performance drop only with 32 active shader units; the results in the other cases are similar. Now for the heavy mode:
Now we can see that performance in this mode, with anisotropic filtering and antialiasing, is also limited by the number and power of shader and texture units. Only at the step from 96 to 128 units is performance limited by something else, depending little on the number of active ALUs. Interestingly, even such an old game as Quake 4 depends heavily on the power of execution units; we might have expected a stronger dependence on ROPs. Although performance does not depend fully on the number of execution units, the game will evidently do fine with fewer shader units than the G80 offers.
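The reasoning about where the gains flatten out can be expressed as a small calculation. This is a minimal sketch under our own assumptions: the helper names and the 5% threshold are arbitrary, and the framerates are hypothetical numbers chosen for illustration, not our measurements.

```python
def step_gains(fps_by_units):
    """Relative fps gain for each step between consecutive unit counts."""
    counts = sorted(fps_by_units)
    gains = {}
    for prev, cur in zip(counts, counts[1:]):
        gains[(prev, cur)] = fps_by_units[cur] / fps_by_units[prev] - 1.0
    return gains

def first_flat_step(fps_by_units, threshold=0.05):
    """First step whose gain is below the threshold, or None if there is none."""
    for step, gain in step_gains(fps_by_units).items():
        if gain < threshold:
            return step
    return None

# Hypothetical framerates (NOT measured results), keyed by active unit count.
example = {32: 40.0, 64: 70.0, 96: 90.0, 128: 92.0}

for (prev, cur), gain in step_gains(example).items():
    print(f"{prev:>3} -> {cur:<3} units: {gain:+.1%}")
print("gains flatten at step:", first_flat_step(example))
```

For these illustrative numbers, the step from 96 to 128 units is the first one with a gain below the threshold, which is the kind of picture described above.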
F.E.A.R.
F.E.A.R. also uses shadow algorithms similar to those in DOOM 3. It loads the CPU with physics, which may limit performance in the easy mode. However, rendering speed in F.E.A.R. mostly depends on the graphics card's power, its fillrate and memory bandwidth. Let's see what the built-in benchmark shows in the first test mode:
As in the previous case, performance is limited by the CPU. The number of active ALUs in the G80 does not have a strong effect on the overall performance. Let's have a look at the heavy mode:
Our early assumption that performance is mostly limited by fillrate is not confirmed. Performance depends to some extent on the number of shader and texture units. This closely resembles what we've already seen in Quake 4: the performance gain is small only at the last step. Before that, the power of execution units evidently affects rendering speed, although the effect is not proportional to their number.
Serious Sam 2
It's another relatively old game whose performance depends mostly on the graphics card, especially in game levels with few enemies. The game engine does not load shader units much, but it generates a heavy load on texture units, so we'll most likely see a picture similar to the previous two games.
The easy mode is traditionally limited by CPU performance. Only with 32 active ALUs can we see a heavy drop in rendering speed. Things should be different in the heavy mode:
The situation is practically the same as in the other games we tested, although performance depends on the number of active units even more noticeably, especially with 32 and 64 shader units. The dependence remains noticeable further on as well: the game gains rendering speed when the number of active ALUs is increased to 96 and then to 128. By all appearances, it is not the ALUs themselves that limit performance but the texture units tied to them.
Call of Juarez
It's a relatively new game with a high-tech engine. We used the Direct3D 9 version of the game. As we found out in the technological review, performance in Call of Juarez is mostly limited by the graphics card, which bears the largest share of the load. The game involves a lot of geometry and pixel processing, which is done by the very units we are experimenting with. The processor may limit performance only because of the many draw calls, but the game features optimizations to reduce this effect. So CPU performance should not be a serious limiting factor.
Perhaps it's the first game to show dependence on the number of shader and texture units even in the easy mode. This dependence is almost identical in both modes. The game proves again that its rendering speed depends on the graphics card in the first place. The speed grows considerably with each step, although not proportionally to the number of units. This indicates that the number and power of execution units processing vertex and pixel shaders in Call of Juarez have a strong effect on the resulting framerates.
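"Not proportionally to the number of units" can be put into numbers by comparing the actual speedup with ideal linear scaling. The sketch below is an illustration only: the function name is ours and the framerates are hypothetical, not our measured results.

```python
def scaling_efficiency(fps_by_units):
    """Actual speedup divided by ideal linear speedup, relative to the
    configuration with the fewest active units (1.0 means perfect scaling)."""
    base_units = min(fps_by_units)
    base_fps = fps_by_units[base_units]
    result = {}
    for units in sorted(fps_by_units):
        speedup = fps_by_units[units] / base_fps   # measured gain
        ideal = units / base_units                 # gain if fps followed unit count
        result[units] = speedup / ideal
    return result

# Hypothetical framerates (NOT measured results), keyed by active unit count.
example = {32: 20.0, 64: 35.0, 96: 45.0, 128: 52.0}

for units, eff in scaling_efficiency(example).items():
    print(f"{units:>3} units: {eff:.0%} of linear scaling")
```

An efficiency that keeps falling as units are added, while the framerate still grows at every step, corresponds to the behaviour described above: the execution units matter a lot, but something else takes a share of the frame time as well.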
Company of Heroes
This game is not a first-person shooter, but a real-time strategy. Unfortunately, the benchmark built into Company of Heroes does not reflect game performance. It shows a scripted scene, which does not look much like the game itself. But it will still be interesting to have a look at the difference in cinematics rendering speed with different numbers of active GPU units. We used the Direct3D 9 version of the game. Let's analyze the easy mode first:
Even if the dependence is not as clear-cut as in Call of Juarez, it's still more noticeable than in the older games. The CPU limits performance only with 96 and 128 units executing vertex and pixel shaders.
The situation in the heavy mode is almost identical to the previous case: big performance gains from increasing the number of ALUs are always present. Although the framerate also depends on the power of other GPU units, the game depends heavily on the power of shader and texture units, especially with 32 and 64 processors.
S.T.A.L.K.E.R.: Shadow of Chernobyl
This recent game is included in the article because of its popularity and advanced technologies. It uses many interesting technical solutions. Fortunately, the latest patches add an option to record and play demos, which gives us an almost full-fledged benchmark. I said "almost full-fledged", because a demo cannot record gameplay. You just fly about a location. So this is not a gameplay benchmark. Still, it's better than nothing. Let's see what we've got in 1280x1024 with high quality settings:
As in the first games, rendering speed is limited by the CPU. A small difference in the framerate is noticeable only with the fewest active units. Let's see what happens in the heavy mode:
Unfortunately, the Direct3D 9 engine does not allow the use of multisampling, so the load on the graphics card is much lighter. Performance drops only with 32 active shader units; the results in the other cases are similar. As you can see, performance changes only a little when we change the mode. This indicates that performance is limited by the CPU or some other factor, but evidently not by the GPU. It renders Direct3D 9 tests almost useless for our article. In this case performance does not depend on the number of ALUs and TMUs, except for the weakest configuration.
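The rule of thumb used here (if the framerate hardly reacts to a heavier video mode, the GPU is probably not the limit) can be sketched as follows. The function names and the 10% threshold are our own arbitrary assumptions, and the framerates are hypothetical.

```python
def mode_sensitivity(fps_easy, fps_heavy):
    """Relative fps drop when switching from the easy to the heavy mode."""
    return 1.0 - fps_heavy / fps_easy

def likely_gpu_limited(fps_easy, fps_heavy, threshold=0.10):
    """Heuristic: a drop above the threshold suggests a GPU limit; a smaller
    drop suggests the CPU or some other factor limits performance."""
    return mode_sensitivity(fps_easy, fps_heavy) > threshold

# Hypothetical framerates (NOT measured results).
print(likely_gpu_limited(60.0, 57.0))   # small drop: probably not GPU-limited
print(likely_gpu_limited(60.0, 30.0))   # large drop: probably GPU-limited
```

This is only a coarse check. It says nothing about which GPU units are the limit, and it assumes the heavy mode really does add GPU work, which is only partly true here since multisampling is unavailable.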
Alexei Berillo (email@example.com)
September 6, 2007