iXBT Labs - Computer Hardware In Detail

Dependence of NVIDIA G8x Architecture Performance on the Number of Shader Units

September 6, 2007



This analysis of NVIDIA G8x performance with different numbers of active shader units was prompted by the existing cut-down Mid- and Low-End products based on this architecture, as well as by possible future products from both chipmakers with a different number of unified processors (and probably a different ratio of various units).

We wanted to see how much performance in existing games really changes with the number of units that execute vertex and pixel shaders. This analysis will also help us evaluate the effect of shader performance on rendering speed, and how much texturing and fill rate limit modern games.

Top GPUs integrate a lot of ALUs to execute all shader types. But do existing applications really need so many units? Do existing games really load these units with enough pixel and vertex shaders (geometry shaders are not even in the picture yet)? Or could they do with fewer units, which would justify the reduced number of units in Mid- and Low-End GPUs?

Today we'll try to answer these questions by analyzing the performance of the GeForce 8800 GTX in several popular games with various numbers of active units. As you may already know, the G80 has 128 such units in total. We used RivaTuner, written by Alexei Nikolaychuk, to disable some of them, and benchmarked performance with 32, 64, 96, and 128 active units to compare rendering speed in popular games. The list of these games is published below. Unfortunately, a clean analysis of ALU count alone is not possible, because the G8x architecture does not allow disabling ALUs separately from TMUs.
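To make the four test points concrete, here is a minimal sketch of how they might map onto the chip. It assumes (this is not stated above) that the 128 stream processors of the G80 are grouped into 8 clusters of 16, that each cluster carries 8 texture filtering units, and that units are switched off a whole cluster at a time. The names `active_config`, `ALUS_PER_CLUSTER` and `TMUS_PER_CLUSTER` are illustrative only and are not RivaTuner settings.

```python
# Assumed G80 layout (an assumption, not taken from the article):
# 8 clusters, each with 16 stream processors and 8 texture filtering units.
CLUSTERS_TOTAL = 8
ALUS_PER_CLUSTER = 16
TMUS_PER_CLUSTER = 8


def active_config(active_alus):
    """Describe a test point given the number of active ALUs."""
    if active_alus % ALUS_PER_CLUSTER != 0:
        raise ValueError("this sketch only models whole clusters")
    clusters = active_alus // ALUS_PER_CLUSTER
    if not 1 <= clusters <= CLUSTERS_TOTAL:
        raise ValueError("cluster count out of range")
    return {
        "clusters": clusters,
        "alus": clusters * ALUS_PER_CLUSTER,
        # Texture units go away together with the ALUs of the same cluster.
        "tmus": clusters * TMUS_PER_CLUSTER,
        "fraction": clusters / CLUSTERS_TOTAL,
    }


for n in (32, 64, 96, 128):
    print(active_config(n))
```

Under these assumptions, every step in ALU count is also an equal relative step in texturing power, which is why the results below cannot separate the two.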

Testbed configuration and settings

We used the following testbed configuration:

  • CPU: AMD Athlon 64 X2 4600+ Socket 939
  • Motherboard: Foxconn WinFast NF4SK8AA-8KRS (NVIDIA nForce4 SLI)
  • RAM: 2048 MB DDR SDRAM PC3200
  • Graphics card: NVIDIA GeForce 8800 GTX 768MB
  • HDD: Seagate Barracuda 7200.7 120 GB SATA
  • Operating system: Microsoft Windows XP Professional SP2
  • Video driver: NVIDIA ForceWare 158.22

We used two video modes: the standard 1280x1024 mode (or 1280x960 for games that do not support that resolution) without anisotropic filtering and antialiasing; and the High Quality mode - 1600x1200 with 4x multisampling and the maximum anisotropic filtering level, 16x. We chose two modes to reduce the chance that the CPU or other GPU units would limit performance in every test.

The list of games used in our tests includes only standard benchmarks frequently used in articles: Quake 4, F.E.A.R., Serious Sam 2, Call of Juarez, Company of Heroes, S.T.A.L.K.E.R.: Shadow of Chernobyl. We haven't used games without standard benchmarking tools this time; the games listed above are quite enough for our analysis. We also used RivaTuner 2.02.

Test Results

Quake 4 1.3

The game uses a graphics engine that appeared in DOOM 3 three years ago. It's rather old, so our results shouldn't depend much on the number of shader units. Besides, Quake 4 depends heavily on the CPU, which it uses to compute and apply shadows as well as to compute physics. Let's analyze the "easy" mode first:

As expected, performance at 1280x1024 is evidently limited by the CPU. We noticed a performance drop only with 32 active shader units; results in the other cases are similar. Now for the heavy mode:

Now we can see that performance in this mode, with anisotropic filtering and antialiasing, is also limited by the number and power of shader and texture units. Only in the step from 96 to 128 units is performance limited by something else, so that it depends little on the number of active ALUs. Interestingly, even a game as old as Quake 4 depends heavily on the power of execution units; we might have expected a stronger dependence on ROPs. Although performance does not depend entirely on the number of execution units, the game will evidently do fine with fewer shader units than the G80 offers.

F.E.A.R.

F.E.A.R. also uses shadow algorithms similar to those in DOOM 3. It loads the CPU with physics, which may limit performance in the easy mode. However, rendering speed in F.E.A.R. mostly depends on the graphics card's power, on its fill rate and memory bandwidth. Let's see what the built-in benchmark shows in the first test mode:

As in the previous case, performance is limited by the CPU. The number of active ALUs in the G80 does not have a strong effect on overall performance. Let's have a look at the heavy mode:

Our initial assumption that performance is mostly limited by fill rate is not confirmed. Performance does depend, to a degree, on the number of shader and texture units. The situation closely resembles what we've already seen in Quake 4: only the last step, from 96 to 128 units, brings a small gain. Before that, the power of execution units evidently affects rendering speed, although the effect is not proportional to their number.

Serious Sam 2

It's another relatively old game, whose performance depends mostly on the graphics card, especially in levels with few enemies. The game engine does not load shader units much, but it generates a heavy load on texture units, so we'll most likely see a picture similar to the previous two games.

The easy mode is traditionally limited by CPU performance. We can see a heavy drop in rendering speed only with 32 active ALUs. The heavy mode should not be CPU-limited:

The situation is practically the same as in the other games we tested, although performance depends on the number of active units even more noticeably, especially with 32 and 64 shader units. The dependence remains noticeable beyond that point as well: the game keeps gaining rendering speed when the number of active ALUs is increased to 96 and 128. By all appearances, the ALUs themselves do not limit performance; the texture units tied to them do.

Call of Juarez

It's a relatively new game with a high-tech engine. We used the Direct3D 9 version of the game. As we found out in the technological review, performance in Call of Juarez is mostly limited by the graphics card, which carries the largest share of the load. The game involves heavy geometry and pixel processing, which is done by the units we are experimenting with. The CPU may limit performance only because of the many draw calls, but the game features optimizations to reduce this effect. So CPU performance should not be a serious limiting factor.

This is perhaps the first game to show a dependence on the number of shader and texture units even in the easy mode. The dependence is almost identical in both modes. The game proves again that its rendering speed depends on the graphics card in the first place. The speed grows considerably with each step, although not in proportion to the number of units. This indicates that the number and power of the execution units processing vertex and pixel shaders have a strong effect on the resulting frame rates in Call of Juarez.
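One way to put a number on "not in proportion to the number of units" is to compare the measured speedup with the ideal linear one. The sketch below is a minimal illustration; the function name `scaling_report` is ours, and the frame rates in `example` are hypothetical values chosen for the arithmetic, not measurements from these tests.

```python
def scaling_report(fps_by_units):
    """Compare measured speedup with ideal linear speedup.

    fps_by_units maps the number of active units to the average FPS.
    Returns a list of (units, speedup, efficiency) tuples, where
    efficiency 1.0 means FPS grew in proportion to the unit count and
    lower values point to some other limiting factor.
    """
    base_units = min(fps_by_units)
    base_fps = fps_by_units[base_units]
    report = []
    for units in sorted(fps_by_units):
        speedup = fps_by_units[units] / base_fps  # measured gain
        ideal = units / base_units                # gain if scaling were linear
        report.append((units, round(speedup, 2), round(speedup / ideal, 2)))
    return report


# Hypothetical numbers for illustration only, not results from this article.
example = {32: 20.0, 64: 36.0, 96: 48.0, 128: 56.0}
for units, speedup, efficiency in scaling_report(example):
    print(units, speedup, efficiency)
```

An efficiency that falls steadily as units are added would match the pattern described here: each step still helps, but less than the raw unit count suggests.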

Company of Heroes

This game is not a first-person shooter, but a real-time strategy game. Unfortunately, the benchmark built into Company of Heroes does not reflect in-game performance: it shows a scripted scene that does not look much like the game itself. But it will still be interesting to see how cinematics rendering speed changes with the number of active GPU units. We used the Direct3D 9 version of the game. Let's analyze the easy mode first:

Even if the dependence is not as pronounced as in Call of Juarez, it's still more noticeable than in the old games. The CPU limits performance only with 96 and 128 units executing vertex and pixel shaders.

The situation in the heavy mode is almost identical to the previous case: increasing the number of ALUs always brings big performance gains. Although the frame rate also depends on the power of other GPU units, the game depends heavily on the power of shader and texture units, especially with 32 and 64 processors.

S.T.A.L.K.E.R.: Shadow of Chernobyl

This recent game is included in the article because of its popularity and advanced technologies. It uses many interesting technical solutions. Fortunately, the latest patches add an option to record and play demos, which gives us an almost full-fledged benchmark. I say "almost" because a demo cannot record gameplay: you just fly around a location. So this is not a gameplay benchmark, but it's still better than nothing. Let's see what we've got in 1280x1024 with high quality settings:

As in the first games, rendering speed is limited by the CPU. A small difference in frame rate is noticeable only with the fewest active units. Let's see what happens in the heavy mode:

Unfortunately, the game's Direct3D 9 engine does not support multisampling, so the load on the graphics card is much lighter. Performance drops only with 32 active shader units and is similar in the other cases. As you can see, performance changes only a little when we change the mode. This indicates that performance is limited by the CPU or some other factor, but evidently not by the GPU. It renders the tests of this Direct3D 9 game almost useless for our article: performance does not depend on the number of ALUs and TMUs, except in the weakest configuration.

Conclusions

  • Rendering speed of modern games in easy modes is often limited by the CPU. In this case, increasing the number of shader and texture units brings no performance gain. So the reduced number of unified processors in Mid- and Low-End cards, which are used in similar conditions (a relatively low resolution, medium quality settings, no antialiasing and anisotropic filtering), is quite justified.
  • As for heavy modes in high resolutions with maximum settings, performance is often limited by the number and speed of shader and texture units. We can see a performance difference between configurations with 32 and 64, and with 64 and 96 active units, in almost all games. But the difference between 96 and 128 is much smaller. Here is the main conclusion - the number of shader and texture units is very important for modern games in heavy modes.
  • There is a noticeable correlation with release dates: newer games, for example Call of Juarez and Company of Heroes, place higher demands on the power and number of execution units. These games show that the importance of powerful computing units in GPUs will only grow, especially in future games that use geometry shaders alongside heavy vertex and pixel shader work. S.T.A.L.K.E.R. is an exception: its performance in the heavy mode is so limited by the CPU that there is almost no difference between the configurations with 64 and 128 stream processors. However, most games do not behave this way; they are rarely limited by the CPU in the heavy mode.
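The two regimes described in these conclusions can be reproduced with a toy bottleneck model. This is only a sketch under simplifying assumptions that are ours, not the article's: CPU and GPU work overlap completely, shader and texture time scales inversely with the number of active units, and the remaining GPU work (ROPs, memory) does not scale at all. The function name and all timings are hypothetical.

```python
def modeled_fps(active_units, cpu_ms, shader_ms_full, other_gpu_ms, full_units=128):
    """Toy model: a frame takes as long as the slower of CPU and GPU.

    shader_ms_full is the shader/texture time per frame with all units
    active; it is assumed to grow as units are disabled.
    other_gpu_ms is GPU work that does not depend on the unit count.
    """
    gpu_ms = shader_ms_full * full_units / active_units + other_gpu_ms
    frame_ms = max(cpu_ms, gpu_ms)
    return 1000.0 / frame_ms


# Hypothetical timings in milliseconds, chosen only to show the two regimes.
easy = dict(cpu_ms=10.0, shader_ms_full=3.0, other_gpu_ms=1.0)
heavy = dict(cpu_ms=10.0, shader_ms_full=8.0, other_gpu_ms=6.0)

for units in (32, 64, 96, 128):
    print(units,
          round(modeled_fps(units, **easy), 1),
          round(modeled_fps(units, **heavy), 1))
```

With these made-up inputs, the easy mode is flat from 64 units upward and drops only at 32, while the heavy mode gains at every step with the gains shrinking toward 128. That is the same shape as the pattern described in the first two conclusions.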
Alexei Berillo (sbe@ixbt.com)
September 6, 2007

Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.