iXBT Labs - Computer Hardware in Detail






Dependence of NVIDIA G8x Architecture Performance on the Number of Shader Units

We decided to analyze NVIDIA G8x performance with different numbers of active shader units because of the existing cut-down Mid- and Low-End solutions based on this architecture, as well as possible future solutions from both chipmakers with a different number of unified processors (and probably a different ratio between the various units).

We wanted to see the real performance difference in existing games when the number of units executing vertex and pixel shaders changes. This analysis will also help us evaluate how shader performance affects rendering speed and how much texture and fill rates limit modern games.

Top GPUs integrate many ALUs to execute all shader types. But do existing applications really need so many units? Do existing games actually load these units with enough pixel and vertex shader work (to say nothing of geometry shaders for now)? Or could they make do with fewer units, which would justify the reduced unit counts in Mid- and Low-End GPUs?

Today we'll try to answer these questions by analyzing the performance of the GeForce 8800 GTX in several popular games with various numbers of active units. As you may already know, the G80 has 128 such units in total. We used RivaTuner, written by Alexei Nikolaychuk, to disable them, and benchmarked with 32, 64, 96, and 128 active units to compare rendering speed in popular games. The list of games is given below. Unfortunately, a pure analysis is not possible, because the G8x architecture does not allow ALUs to be disabled separately from TMUs.

Testbed configuration and settings

We used the following testbed configuration:

  • CPU: AMD Athlon 64 X2 4600+ Socket 939
  • Motherboard: Foxconn WinFast NF4SK8AA-8KRS (NVIDIA nForce4 SLI)
  • RAM: 2048 MB DDR SDRAM PC3200
  • Graphics card: NVIDIA GeForce 8800 GTX 768MB
  • HDD: Seagate Barracuda 7200.7 120 GB SATA
  • Operating system: Microsoft Windows XP Professional SP2
  • Video driver: NVIDIA ForceWare 158.22

We used two video modes: the standard mode, 1280x1024 (or 1280x960 for games that do not support that resolution) without anisotropic filtering or antialiasing; and the High Quality mode, 1600x1200 with 4x multisampling and the maximum anisotropic filtering level, 16x. These two modes were selected to account for the possibility of the CPU or other GPU units limiting performance.

The list of games includes only those with standard benchmarks frequently used in articles: Quake 4, F.E.A.R., Serious Sam 2, Call of Juarez, Company of Heroes, S.T.A.L.K.E.R.: Shadow of Chernobyl. We did not use games without standard benchmarking tools this time; the games above are quite enough for our analysis. We used RivaTuner 2.02.
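To put the two modes in perspective, the per-frame pixel and sample counts can be worked out directly from the resolutions above. The snippet below is a minimal sketch: the function names are our own illustration and not part of any benchmark tool, and it assumes the simplification that multisampling multiplies the number of stored color samples but not the number of shaded pixels.

```python
def pixels(width, height):
    """Number of pixels in one frame at the given resolution."""
    return width * height


def stored_samples(width, height, msaa=1):
    """Color samples stored per frame (simplified: pixels x MSAA factor)."""
    return pixels(width, height) * msaa


standard_pixels = pixels(1280, 1024)                 # standard mode, no AA
heavy_pixels = pixels(1600, 1200)                    # High Quality mode
heavy_samples = stored_samples(1600, 1200, msaa=4)   # with 4x multisampling

print(standard_pixels, heavy_pixels, heavy_samples)
print(round(heavy_pixels / standard_pixels, 2))      # growth in shaded pixels
print(round(heavy_samples / standard_pixels, 2))     # growth in stored samples
```

Under this simplification the heavy mode shades roughly 1.46 times as many pixels per frame, while the number of stored samples grows almost sixfold, so this mode is likely to stress ROPs and memory bandwidth as well as shader and texture units.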

Test Results

Quake 4 1.3

The game uses the graphics engine that first appeared in DOOM 3 three years ago. It is rather old, so our results should not depend much on the number of shader units. Besides, Quake 4 depends heavily on the CPU, because it uses the CPU to compute and apply shadows as well as to compute physics. Let's analyze the "easy" mode first:

As expected, performance in 1280x1024 clearly depends on the CPU. We noticed a performance drop only with 32 active shader units; the results in the other cases are similar. Now for the heavy mode:

Performance in this mode, with anisotropic filtering and anti-aliasing enabled, is also limited by the number and power of shader and texturing units. Only when we go from 96 to 128 units does something else become the limit, and performance depends little on the number of active ALUs. Interestingly, even a game as old as Quake 4 depends heavily on the power of execution units; we might have expected a stronger dependence on ROPs. Although performance does not depend entirely on the number of execution units, the game will evidently do fine with fewer shader units than the G80 has.


F.E.A.R.

F.E.A.R. also uses shadow algorithms similar to those in DOOM 3. It loads the CPU with physics, which may limit performance in the easy mode. However, rendering speed in F.E.A.R. depends mostly on the power of the graphics card, on its fillrate and memory bandwidth. Let's see what the built-in benchmark shows in the first test mode:

As in the previous case, performance is limited by the CPU. The number of active ALUs in the G80 does not have a strong effect on overall performance. Let's have a look at the heavy mode:

Our earlier assumption that performance is mostly limited by fillrate is not confirmed. Performance does depend on the number of shader and texture units, and the situation closely resembles what we saw in Quake 4: only the last step (from 96 to 128 units) brings a small gain. Before that, the power of execution units clearly affects rendering speed, although the effect is not proportional to their number.

Serious Sam 2

This is another relatively old game, whose performance depends mostly on the graphics card, especially in levels with few enemies. The engine does not load shader units much, but it puts a heavy load on texture units, so we will most likely see a picture similar to the previous two games.

The easy mode is, as usual, limited by CPU performance. Only with 32 active ALUs do we see a heavy drop in rendering speed. The CPU should not be the limit in the heavy mode:

The situation is practically the same as in the other games we tested, although performance depends on the number of active units even more noticeably, especially with 32 and 64 shader units. The dependence remains noticeable beyond that as well: the game keeps gaining rendering speed when the number of active ALUs is increased to 96 and 128. To all appearances, it is not the ALUs themselves that limit performance, but the texturing units tied to them.

Call of Juarez

This is a relatively new game with a high-tech engine. We used the Direct3D 9 version of the game. As we found out in the technological review, performance in Call of Juarez is mostly limited by the graphics card, which bears the largest share of the load. The game involves a lot of geometry and pixel processing, which is done by the very units we are experimenting with. The CPU may limit performance only because of the large number of draw calls, but the game features optimizations to reduce this effect, so CPU performance should not be a serious limiting factor.

This is perhaps the first game to show a dependence on the number of shader and texture units even in the easy mode, and the dependence is almost identical in both modes. The game proves again that its rendering speed depends primarily on the graphics card. The speed grows considerably with each step, although not in proportion to the number of units. This indicates that the number and power of the execution units processing vertex and pixel shaders in Call of Juarez have a strong effect on the resulting framerates.
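The phrase "not proportionally to the number of units" can be made concrete by comparing the measured speedup with the ideal one. The sketch below uses made-up framerates purely for illustration (they are not our measurements), and `scaling_efficiency` is a hypothetical helper name.

```python
def scaling_efficiency(fps_by_units):
    """Return {units: (speedup, efficiency)} relative to the smallest configuration.

    speedup    = fps / fps of the smallest configuration
    efficiency = speedup / (units / units of the smallest configuration)
    An efficiency of 1.0 means the framerate scales in proportion to the units.
    """
    base_units = min(fps_by_units)
    base_fps = fps_by_units[base_units]
    result = {}
    for units in sorted(fps_by_units):
        speedup = fps_by_units[units] / base_fps
        ideal = units / base_units
        result[units] = (speedup, speedup / ideal)
    return result


# Hypothetical framerates, NOT results from this article.
example_fps = {32: 20.0, 64: 36.0, 96: 48.0, 128: 56.0}

for units, (speedup, efficiency) in scaling_efficiency(example_fps).items():
    print(f"{units:3d} units: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

In this invented example the framerate keeps growing with every step, yet efficiency falls from 100% at 32 units to 70% at 128 units, which is the kind of less-than-proportional growth described above.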

Company of Heroes

This game is not a first-person shooter but a real-time strategy game. Unfortunately, the benchmark built into Company of Heroes does not reflect in-game performance: it shows a scripted scene that does not look much like the game itself. Still, it will be interesting to look at the difference in cinematic rendering speed with different numbers of active GPU units. We used the Direct3D 9 version of the game. Let's analyze the easy mode first:

Even though the dependence is not as pronounced as in Call of Juarez, it is still more noticeable than in the older games. The CPU limits performance only with 96 and 128 units executing vertex and pixel shaders.

The situation in the heavy mode is almost identical to the previous case: increasing the number of ALUs always brings big performance gains. Although the framerate also depends on the power of other GPU units, the game depends heavily on the power of shader and texture units, especially with 32 and 64 processors.

S.T.A.L.K.E.R.: Shadow of Chernobyl

This recent game is included in the article because of its popularity and advanced technologies; it uses many interesting technical solutions. Fortunately, the latest patches add an option to record and play back demos, which gives us an almost full-fledged benchmark. I say "almost" because a demo cannot record gameplay: you just fly around a location, so this is not a gameplay benchmark. Still, it is better than nothing. Let's see what we get in 1280x1024 with high quality settings:

As in the first games, rendering speed is limited by the CPU. A small difference in framerate is noticeable only with the fewest active units. Let's see what happens in the heavy mode:

Unfortunately, the game's Direct3D 9 engine does not support multisampling, so the load on the graphics card is much lighter. Performance drops only with 32 active shader units and is similar in the other cases. As you can see, performance changes only a little when we change the mode. This indicates that performance is limited by the CPU or some other factor, but evidently not by the GPU, which renders these Direct3D 9 tests almost useless for our article. Here performance does not depend on the number of ALUs and TMUs, except in the weakest configuration.
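The reasoning used throughout these tests (a flat curve means something other than the shader and texture units is the limit) can be expressed as a simple check. This is a rough sketch with an arbitrary 5% threshold and invented framerates; the function name and the threshold are assumptions made for illustration only.

```python
def flat_steps(fps_by_units, threshold=0.05):
    """List the unit counts at which adding units gained less than `threshold`.

    Each configuration is compared with the next smaller one. A small relative
    gain suggests the limit lies elsewhere (CPU, ROPs, memory bandwidth).
    """
    ordered = sorted(fps_by_units)
    flat = []
    for smaller, larger in zip(ordered, ordered[1:]):
        gain = fps_by_units[larger] / fps_by_units[smaller] - 1.0
        if gain < threshold:
            flat.append(larger)
    return flat


# Hypothetical framerates, NOT results from this article.
cpu_limited_example = {32: 40.0, 64: 50.0, 96: 51.0, 128: 51.5}
print(flat_steps(cpu_limited_example))
```

With these invented numbers only the step from 32 to 64 units brings a gain above the threshold, which matches the pattern where performance drops only in the weakest configuration.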


Conclusions

  • Rendering speed of modern games in easy modes is often limited by the CPU. In this case, increasing the number of shader and texture units brings no performance gain. So the reduced number of unified processors in Mid- and Low-End cards, which are used in similar conditions (a relatively low resolution, medium quality settings, no anti-aliasing or anisotropic filtering), is quite justified.
  • As for heavy modes in high resolutions with maximum settings, performance is often limited by the number and speed of shader and texture units. We can see a performance difference between configurations with 32 and 64 and between 64 and 96 active units in almost all games, but the difference between 96 and 128 is much smaller. Hence the main conclusion: the number of shader and texture units is very important for modern games in heavy modes.
  • There is a noticeable correlation with release dates: newer games, such as Call of Juarez and Company of Heroes, place higher demands on the power and number of execution units. These games show that the importance of powerful computing units in GPUs will only grow, especially in future games that will use geometry shaders alongside heavy use of vertex and pixel shaders. S.T.A.L.K.E.R. is an exception: its performance in the heavy mode is so limited by the CPU that there is almost no difference between the configurations with 64 and 128 stream processors. However, most games do not behave this way; they are rarely limited by the CPU in the heavy mode.
Alexei Berillo (sbe@ixbt.com)
September 6, 2007

Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.