iXBT Labs - Computer Hardware in Detail

NVIDIA GeForce 9800 GTX Graphics Card



Using the following cards as examples:
BFG GeForce 9800 GTX 512MB PCI-E,
MSI GeForce 9800 GTX 512MB PCI-E (N9800GTX-T2D512),
Zotac GeForce 9800 GTX 512MB PCI-E

Part I: Theory and Architecture

We've tested three graphics cards based on GeForce 9800 GTX made by BFG, MSI, and Zotac. All of them operate at nominal frequencies. They are reference cards that partners buy from NVIDIA, manufactured at Flextronics and PC Partner plants to the order of the Californian chip maker.

Graphics Cards

BFG GeForce 9800 GTX 512MB PCI-E
  • GPU: GeForce 9800 GTX (G92)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 675/1688 MHz (nominal - 675/1688 MHz)
  • Memory frequencies (physical (effective)): 1100 (2200) MHz (nominal - 1100 (2200) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 128
  • Texture processors: 64 (BLF/TLF)
  • ROPs: 16
  • Dimensions: 270x100x32 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: black
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).
MSI GeForce 9800 GTX 512MB PCI-E (N9800GTX-T2D512)
  • GPU: GeForce 9800 GTX (G92)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 675/1688 MHz (nominal - 675/1688 MHz)
  • Memory frequencies (physical (effective)): 1100 (2200) MHz (nominal - 1100 (2200) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 128
  • Texture processors: 64 (BLF/TLF)
  • ROPs: 16
  • Dimensions: 270x100x32 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: black
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).

Zotac GeForce 9800 GTX 512MB PCI-E
  • GPU: GeForce 9800 GTX (G92)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 675/1688 MHz (nominal - 675/1688 MHz)
  • Memory frequencies (physical (effective)): 1100 (2200) MHz (nominal - 1100 (2200) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 128
  • Texture processors: 64 (BLF/TLF)
  • ROPs: 16
  • Dimensions: 270x100x32 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: black
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).
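
The peak figures implied by these specifications can be worked out with simple arithmetic. The sketch below is only an illustration: it assumes the usual definitions of peak rates (one bilinear texel per texture unit per clock, one pixel per ROP per clock), and the function names are ours, not part of any tool used in this review.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz / 1000.0


def peak_rate_giga(units: int, clock_mhz: float) -> float:
    """Peak rate in giga-operations per second, assuming one operation per unit per clock."""
    return units * clock_mhz / 1000.0


# GeForce 9800 GTX figures from the specification list above
bandwidth = memory_bandwidth_gb_s(256, 2200)   # 256-bit bus, 2200 MHz effective
texel_rate = peak_rate_giga(64, 675)           # 64 texture units at 675 MHz
pixel_rate = peak_rate_giga(16, 675)           # 16 ROPs at 675 MHz

print(f"Memory bandwidth: {bandwidth:.1f} GB/s")
print(f"Peak texel rate:  {texel_rate:.1f} Gtexels/s")
print(f"Peak pixel rate:  {pixel_rate:.1f} Gpixels/s")
```

These are theoretical ceilings only; real results depend on the workload.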



Each graphics card has 512 MB of GDDR3 SDRAM in eight chips on the front side of the PCB.

Samsung memory chips (GDDR3). 0.8 ns memory access time, which corresponds to 1250 (2500) MHz.
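
The conversion from access time to rated frequency is a plain reciprocal. Here is a minimal sketch, assuming that the access time equals one clock period and that GDDR3 transfers data twice per clock; the function name is hypothetical.

```python
def rated_clock_mhz(cycle_time_ns: float) -> float:
    """Convert a memory cycle time in nanoseconds to a clock frequency in MHz."""
    return 1000.0 / cycle_time_ns


rated = rated_clock_mhz(0.8)     # rated physical clock of the chips
effective = 2 * rated            # double data rate: two transfers per clock
headroom = rated / 1100 - 1      # margin over the 1100 MHz nominal memory clock

print(f"Rated: {rated:.0f} ({effective:.0f}) MHz, headroom over nominal: {headroom:.1%}")
```

In other words, the chips are rated noticeably higher than the frequency they actually run at on these cards.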





Comparison with the reference design (NVIDIA GeForce 8800 GTS 512), front view: Zotac, BFG, and MSI GeForce 9800 GTX 512MB PCI-E

Comparison with the reference design (NVIDIA GeForce 8800 GTS 512), back view: Zotac, BFG, and MSI GeForce 9800 GTX 512MB PCI-E

As you can see, 9800 GTX cards do not differ much from 8800 GTS 512, but they have a very long PCB, which takes us back to the times of 8800 GTX/Ultra. In that case the length was at least justified by the 384-bit bus and a very complex power supply circuit (power consumption was very high). But the card under review is just as long, and it has only a standard 256-bit bus. It's not quite clear what made the engineers change the power supply circuits so radically (they take up much of the PCB): the core is the same, and there are not many differences from 8800 GTS 512. Was it only because of slightly faster memory? Or 3-way SLI support? We don't think so.

In my opinion, the engineers just didn't have much else to do (the only reason that comes to mind is that they simplified the PCB to use fewer layers)... Nevertheless, users have to think about large PC cases again to accommodate such a card.

We'll talk about the cooler below.

Graphics cards of this series are equipped with an audio connector that plugs into a sound card in order to pass the audio stream to HDMI (via a DVI-to-HDMI adapter). That is, the graphics card does not contain an audio codec; it receives the audio signal from an external sound card. So if this function is important to you, make sure the bundle includes the special audio cable.

All cards have TV-Out with a unique jack. You will need a special bundled adapter to output video to a TV set via S-Video or RCA. You can read about the TV-Out in more detail here.

Analog monitors with a D-Sub (VGA) interface are connected with special DVI-to-D-Sub adapters. The bundle also includes DVI-to-HDMI adapters (these graphics cards support video/audio transfer to HDMI receivers), so there should be no problems with such monitors. Maximum resolutions and frequencies:

  • 240 Hz max refresh rate
  • 2048 x 1536 x 32-bit @ 85 Hz max - analog interface
  • 2560 x 1600 @ 60 Hz max - digital interface (all DVIs with Dual-Link)

As for MPEG2 playback (DVD-Video), we analyzed this issue back in 2002, and little has changed since then. CPU load during video playback on modern graphics cards does not exceed 25%.

As for HDTV, you can read one review here.

These cards require additional power (two connectors!), so each card is bundled with a Molex-to-6-pin adapter, even though all modern PSUs are equipped with these cables.

Now about the cooling system. All these graphics cards are equipped with the reference cooler, so we'll examine the one used in the Zotac product.

Zotac GeForce 9800 GTX 512MB PCI-E
Reference cooler GeForce 8800 GTS

As we can see, this cooling system features a traditional long closed heat sink and a fan that blows air through it. Good news: the hot air is blown out of the PC case.

I've compared two cooling systems: the one in 8800 GTS 512 and the new device in 9800 GTX. The only difference is in heat sink lengths. The fan is slow, so the cooling system is noiseless.

We monitored temperatures using RivaTuner (written by A.Nikolaychuk AKA Unwinder). Here are the results:

GeForce 9800 GTX 512MB PCI-E

All cards with the reference cooling system do their job well. Their temperatures never come close to the critical level.

You can see the G92 chip (GeForce 9800 GTX) below. It is revision A2, and it differs from other G92 chips in its marking - 420.



Let's proceed to bundles.

Bundle

All bundles include a User Manual, a CD with drivers and utilities, an external power splitter, and DVI-to-VGA, DVI-to-HDMI, and component output (TV-out) adapters. Now let's see what other accessories each vendor adds.



BFG GeForce 9800 GTX 512MB PCI-E
The bundle does not include an HDMI adapter, although it contains two DVI-to-VGA adapters (I cannot imagine why a user would want to connect two CRT monitors to a new graphics card these days). As usual, the bundle contains a pile of fliers and useless documents instead of a single User Manual. Besides, there is no audio cable to pass the audio signal from a sound card to HDMI. It seems that the American company neglects the HDMI feature.
MSI GeForce 9800 GTX 512MB PCI-E (N9800GTX-T2D512)
It's the basic bundle; the manufacturer added only an audio cable.
Zotac GeForce 9800 GTX 512MB PCI-E
The same bundle. Plus a bonus - the Lost game.


Packages

BFG GeForce 9800 GTX 512MB PCI-E

It's a traditional small black box. Bundled cables and adapters will just pour out of it, if you open it. The card itself is packed well in a makeshift box. The company paid heed to criticism, and now we can see an updated box design, clear and comprehensible.

MSI GeForce 9800 GTX 512MB PCI-E (N9800GTX-T2D512)

The company sticks to the traditional bag design of the package - a jacket with a white cardboard box inside. All bundled components are at the bottom of the box under the card. The card itself is inside a foamed polyurethane package, so damage in transit is out of the question. The package has a nice, attractive design.

Zotac GeForce 9800 GTX 512MB PCI-E

The box is designed in Zotac's traditional orange palette. Unfortunately, the dragon theme is abandoned again. Besides, there is no window to show the card. The box itself is made of thick cardboard. All bundled components are arranged in compartments inside. The card itself is inside a foamed polyurethane box for protection in transit.

Installation and Drivers

Testbed configuration:

  • Intel Core2 (775 Socket) based computer
    • CPU: Intel Core2 Extreme QX9650 (3000 MHz)
    • Motherboard: Gigabyte GA-X38-DQ6 on the Intel X38 chipset
    • RAM: 2 GB DDR2 SDRAM Corsair 1142MHz (CAS (tCL)=5; RAS to CAS delay (tRCD)=5; Row Precharge (tRP)=5; tRAS=15)
    • HDD: WD Caviar SE WD1600JD 160GB SATA
    • PSU: Tagan 1100-U95 (1100W)

  • OS: Windows XP SP2; DirectX 9.0c
  • OS: Windows Vista 32bit; DirectX 10.0
  • Monitor: Dell 3007WFP (30")
  • Drivers: ATI CATALYST 8.3; NVIDIA 174.74.
  • VSync is disabled.

Synthetic tests

Our synthetic benchmarks are available here:

  • D3D RightMark Beta 4 (1050) with its description at http://3d.rightmark.org
  • D3D RightMark Pixel Shading 2 and D3D RightMark Pixel Shading 3 - tests of Pixel Shaders 2.0 and 3.0 link.
  • RightMark3D 2.0 with a brief description: link

RightMark3D 2.0 requires MS Visual Studio 2005 runtime as well as the latest update of DirectX runtime.

Synthetic tests were run with the following graphics cards:

  • NVIDIA GeForce 9800 GTX with standard parameters (GF9800GTX)
  • NVIDIA GeForce 8800 Ultra with standard parameters (GF8800U)
  • NVIDIA GeForce 8800 GTS 512MB with standard parameters (GF8800GTS 512)
  • RADEON HD 3870 with standard parameters (HD3870)

We selected them for comparison with GeForce 9800 GTX for the following reasons. GeForce 8800 GTS 512MB is a practically identical graphics card with similar clock rates. The old GeForce 8800 Ultra will help us evaluate the effect of memory bandwidth on our synthetic tests and see what the minor architectural changes give us. RADEON HD 3870 is the fastest single-GPU solution from AMD.

However, we suspect that our performance analysis with synthetic tests will not reveal anything interesting, because nothing has changed from the architectural point of view: it's still the same G92, just operating at different frequencies. We are still waiting for new architectures.

Direct3D 9: Pixel Filling tests

This test determines peak texel rate in FFP mode for different numbers of textures applied to a pixel:

Not all graphics cards can demonstrate results close to their theoretical maximum; results of synthetic tests are most often a tad lower. Graphics cards based on G80 and RV670 come closer to this threshold than the other cards: they are just 10-12% short of the target line. The G92-based NVIDIA cards, notable for their improved TMUs, fail to reach their theoretical maximum in our old test. G92 can look up over 32 texels per cycle from 32-bit textures with bilinear filtering, although theoretically it can do better.
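
To relate a measured texel rate to the theoretical maximum, it helps to convert it into texels per clock. The sketch below assumes a peak of one bilinear texel per texture unit per clock; the function names are hypothetical, and the input of 21600 Mtexels/s is not a measured result but simply the rate that corresponds to 32 texels per cycle at 675 MHz.

```python
def texels_per_clock(measured_mtexels_s: float, core_mhz: float) -> float:
    """Convert a texel rate in Mtexels/s into texels fetched per core clock."""
    return measured_mtexels_s / core_mhz


def tmu_efficiency(measured_mtexels_s: float, texture_units: int, core_mhz: float) -> float:
    """Fraction of the theoretical peak (one texel per unit per clock) actually reached."""
    return texels_per_clock(measured_mtexels_s, core_mhz) / texture_units


# Illustrative input: the rate equivalent to 32 texels per cycle on a 675 MHz core
rate = 32 * 675  # Mtexels/s
print(texels_per_clock(rate, 675))     # texels per clock
print(tmu_efficiency(rate, 64, 675))   # share of the 64-unit theoretical peak
```

Anything "over 32 texels per cycle" therefore means more than half of the theoretical peak of a 64-unit chip.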

With few textures per pixel, GeForce 9800 GTX is outperformed by GeForce 8800 Ultra, and with one texture it's close to RADEON HD 3870. In such cases all graphics cards are limited by video memory bandwidth. ROP capacities are revealed better in heavier conditions. The fastest graphics card based on G92 performs more than twice as fast as HD 3870 in this case. Let's have a look at the fill rate results:

The second synthetic test measures the fill rate. It shows the same situation adjusted for the number of pixels written into the frame buffer. With 0 and 1 textures, performance is apparently limited by memory bandwidth as well as by the number and frequency of ROPs. The situation resembles the previous test: as the number of textures per pixel grows, GeForce 9800 GTX heavily outperforms its competitor from AMD and, starting from three textures per pixel, noticeably outperforms even GeForce 8800 Ultra. With 1 and 2 textures, the latter has a significant advantage in memory bandwidth and fill rate.

Direct3D 9: Geometry Processing Speed Tests

Let's analyze a couple of stress geometry tests. The first test uses the simplest vertex shader that shows maximum triangle throughput:

As all these GPUs are based on unified architectures, their unified processors are busy only with geometry processing in this test. So all solutions demonstrate high results, which are evidently limited not by the peak performance of unified processors but by the performance of other units, for example, triangle setup.

Test results prove again that AMD GPUs process geometry faster than NVIDIA GPUs. The performance difference between GeForce 9800 GTX and RADEON HD 3870 is not big so far, but the AMD solution is faster even in such a simple task. G80 and RV670 execute this test with similar efficiency in various modes: peak performance in FFP, VS 1.1, and VS 2.0 does not differ much. FFP mode is noticeably faster in all representatives of the G9x architecture.

We've removed intermediate geometry tests with a single light source. So we proceed straight to the most complex geometry task with three light sources and static/dynamic branching:

Now we can see the performance difference better: the gap between AMD and NVIDIA solutions has grown. RADEON HD 3870 outperforms all other solutions, and even this most complex geometry task does not reveal its full potential: test results in various modes are practically identical. Besides, with three mixed light sources, optimized FFP emulation in G92 becomes even more noticeable.

GeForce 9800 GTX demonstrates correspondingly better results than GeForce 8800 Ultra, which can be explained by the increased GPU frequency. The performance difference from GeForce 8800 GTS also corresponds to the theoretical difference. On the whole, all solutions perform well in these tests, as they can use all unified stream processors for geometry tasks. In real applications, however, unified processors are busy mostly with pixels. We proceed to such tests now.

Direct3D 9: Pixel Shaders Tests

The first group of pixel shaders to be reviewed here is too simple for modern GPUs. It includes various versions of pixel programs of relatively low complexity: 1.1, 1.4, and 2.0.

These tests are too easy for modern architectures and fail to reveal their true power. Performance in simple tests is limited by texel and fill rates. We can see it in the low results of RADEON HD 3870, which is outperformed by all NVIDIA cards in all tests. It's apparently the effect of the relatively few TMUs in the AMD GPU, because it's still outperformed even in more complex PS 2.0 tests, such as Phong with three light sources.

GeForce 9800 GTX demonstrates excellent results on a par with GeForce 8800 Ultra: it is slightly outperformed in tasks where performance is limited by the fill rate and is faster in other tests. GeForce 8800 GTS is slightly slower than both cards. Let's have a look at results in more complex pixel programs of intermediate versions:

The procedural water test (which depends heavily on texturing performance) uses dependent texture lookups of high nesting depth, so the only RADEON card lags far behind the NVIDIA solutions based on G92 and G80: it's 2.5-3 times as slow. The graphics card under review is the leader here; as expected, it outperforms all its siblings, in line with the theoretical data.

The second test (arithmetic-intensive) apparently favors the R6xx architecture with lots of arithmetic units. In this test the AMD card catches up with GeForce 8800 Ultra, but both G92-based cards are still faster. If we compare GeForce 9800 GTX and 8800 Ultra, the new card from NVIDIA is faster, although the performance difference is only 5%. Compared to GeForce 8800 GTS 512MB, the gap matches the theoretical 3% value.

Direct3D 9: New Pixel Shaders Tests

These tests of DirectX 9 pixel shaders are even more complex. They are divided into two categories. We'll start with the easier shaders - SM 2.0:

  • Parallax Mapping - a texturing method used in many modern games
  • Frozen Glass - a complex procedural texture that visualizes frozen glass with adjustable parameters

There are two modifications of these shaders: arithmetic intensive and texture sampling intensive. Let's analyze the arithmetic-intensive modifications first, as they are more promising from the point of view of future applications:

The situation in the Frozen Glass test is similar to that in the previous group of Water tests. Even though these are arithmetic tests, which depend on shader unit frequency, GeForce 9800 GTX is slightly outperformed by 8800 Ultra. Performance seems to be limited not only by arithmetic and texel rate, but also by fill rate. RADEON HD 3870 in this test is more than twice as slow as any other card in this review.

In return, the AMD card is noticeably faster in the second Parallax Mapping test, although it's still outperformed by NVIDIA cards. Even though G92 has improved TMUs (parallax mapping needs an additional texture lookup), GeForce 8800 Ultra is still faster than GeForce 8800 GTS and GeForce 9800 GTX, but only a little. Let's analyze results obtained in the texture sampling intensive tests, where the G92-based cards should demonstrate higher results:

The situation has changed a little. Performance in this test is limited by the speed of texture units more than ever, so the new G92-based card heavily outperforms RADEON HD 3870 in both tests. The performance difference reaches 2-2.5 times. GeForce 8800 Ultra is defeated this time, GeForce 9800 GTX is slightly faster here. All contenders execute arithmetic-intensive shaders faster than their modifications with lots of texture lookups.

Let's have a look at results of another two pixel shader tests - SM 3.0. They are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches:

  • Steep Parallax Mapping is a much heavier modification of parallax mapping
  • Fur - a procedural shader that visualizes fur

The load on graphics cards in these tests is heavy, and only such powerful GPUs can cope with it and keep up acceptable performance. Although AMD cards efficiently execute complex Pixel Shaders 3.0 with a lot of branches, GeForce 9800 GTX is more than twice as fast as the RV670-based solution in both tests, which can be explained by faster bilinear texture lookups in the G9x architecture and more efficient usage of available resources (scalar and superscalar architectures). The performance difference from GeForce 8800 GTS 512MB is not big, even smaller than the theoretical value. But GeForce 8800 Ultra is defeated again, by 9% and 17% in the first and the second test respectively.

When we analyze results of such synthetic tests, we should take into account that the situation may be different in real applications, if they use trilinear and/or anisotropic texture filtering.

Direct3D 10: PS 4.0 Tests (texturing, loops)

New RightMark3D 2.0 includes two old PS 3.0 tests (Direct3D 9), rewritten for DirectX 10, and two brand new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase their load on GPUs.

These tests measure efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundreds of lookups per pixel in the heaviest mode!) and a relatively low ALU load. In other words, they measure a texture sampling rate and branching efficiency in a pixel shader.

The first pixel shader test will be the Fur test. When used with the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. When shader supersampling is enabled, the number of lookups grows to 60-120. The heaviest mode is the High mode with SSAA - 160-320 lookups from a bump map.
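
The lookup counts for the four Fur modes follow from the two base ranges and the fourfold supersampling factor. A small sketch of that arithmetic follows; the mode labels and helper name are ours, chosen only for illustration.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]

SSAA_FACTOR = 4           # shader supersampling quadruples the number of lookups
LOW: Range = (15, 30)     # bump map lookups per pixel, lowest settings
HIGH: Range = (40, 80)    # bump map lookups per pixel, High Effect Detail


def scale(lookups: Range, factor: int) -> Range:
    """Multiply both ends of a lookup range by the same factor."""
    return (lookups[0] * factor, lookups[1] * factor)


modes: Dict[str, Range] = {
    "Low": LOW,
    "High": HIGH,
    "Low + SSAA": scale(LOW, SSAA_FACTOR),
    "High + SSAA": scale(HIGH, SSAA_FACTOR),
}

for name, (lo, hi) in modes.items():
    print(f"{name:12s} {lo}-{hi} lookups per pixel")
```

The computed ranges match the figures quoted above: 60-120 with supersampling at low detail and 160-320 in the heaviest mode.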

Let's see what happens in modes without supersampling - they are relatively simple, and the correlation of results in Low/High modes must be similar.

All results in the High mode are approximately 1.5 times as low as in the Low mode. Interestingly, GeForce 8800 Ultra demonstrates a bigger difference. Perhaps, the Low mode is affected by the fill rate and higher memory bandwidth. The procedural Fur tests for Direct3D 10 with lots of texture lookups again show a huge advantage of NVIDIA solutions over AMD cards.

Now we can say for sure that performance of this test depends not only on the number and speed of TMUs, but also on the fill rate and memory bandwidth. Our proof is the comparison of GeForce 9800 GTX and 8800 Ultra - their performance difference is great, especially in the Low mode. Let's have a look at the results in this test with enabled shader supersampling, which quadruples the load. Perhaps it will change the situation, and memory bandwidth/fill rate will produce a weaker effect:

Theoretically, supersampling quadruples the load. But the performance drop of NVIDIA cards is deeper than that of AMD cards, so the gap between them narrows, and HD 3870 gains a little ground. But this does not save the card from defeat - NVIDIA cards enjoy an overwhelming advantage. Also, the performance gap between GeForce 9800 GTX and GeForce 8800 Ultra narrows as the shader grows more complex. That is, the fill rate and memory bandwidth have a weaker effect on the overall performance. The fastest G80-based card is still in the lead.

The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:

This test is more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavy modifications, such as our steep parallax mapping, are already used in the latest releases, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing that doubles the GPU load (High mode).

Even though AMD solutions used to be traditionally strong in our Direct3D 9 tests of parallax mapping, they cannot cope with our updated D3D10 test without supersampling on a par with the best GeForces. Besides, self-shadowing causes a bigger performance drop in AMD products than in NVIDIA solutions.

GeForce 9800 GTX under review again failed to outperform GeForce 8800 Ultra, although it comes close to that in the High mode. There is almost no difference with GeForce 8800 GTS 512MB. Let's see what supersampling will change. It caused a bigger performance drop in NVIDIA cards in the previous test...

Supersampling and self-shadowing increase the load on graphics cards by almost eight times, causing a great performance drop. Performance differences between the cards are not what they used to be. Supersampling has the same effect as in the previous case - the AMD card significantly improves its results versus NVIDIA cards. However, RADEON HD 3870 is still outperformed by all GeForces by more than twofold. As for the comparison of GeForce 9800 GTX with the old G80-based card, it finally outperforms GeForce 8800 Ultra, but only in the heaviest mode. The difference between the two G92-based cards is too small to take into account: their results are very close, as they theoretically should be.

Direct3D 10: PS 4.0 Tests (computing)

The next pair of pixel shader tests contain a minimum of texture lookups to reduce the effect of TMU performance. They use a lot of arithmetic operations, so they measure the arithmetic performance of GPUs, that is, how fast they execute arithmetic instructions in pixel shaders.

The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.

We already noted many times in the analysis of our Direct3D 9 synthetic test results, that the modern AMD architecture often performs better in complex arithmetic tasks than the competing architecture from NVIDIA. But time flies, and the situation changes. Modifications in G92 really helped improve performance. For example, in this test RADEON HD 3870 is outperformed by any GeForce contender, although not much.

GeForce 9800 GTX offers the best results. This card is slightly faster than GeForce 8800 Ultra and GeForce 8800 GTS 512MB. Everything corresponds to theoretical data (the number and clock rate of unified shader units.) Results are also affected by memory bandwidth (it also affects the fill rate), because GeForce 8800 Ultra is very close to GeForce 8800 GTS, although the theoretical difference is bigger...

The second shader test is called Fire, and it's even harder for ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:

AMD cards failed this test in 2007, demonstrating very low results. But the bug has been fixed since RADEON HD 3870 X2, so AMD results are finally close to the theoretical values. And now RADEON HD 3870 performs in this test even better than all GeForce 8800 and 9800 cards.

As for the relative performance of NVIDIA cards, their rendering performance in this case is apparently limited by shader units. GeForce 8800 GTS 512MB outperforms GeForce 8800 Ultra. GeForce 9800 GTX is faster than both of them; its results fully agree with the theoretical ALU performance, that is, with the number of execution units and their frequency.
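
The expected ordering of the three NVIDIA cards in a purely ALU-bound test can be estimated as the number of unified processors multiplied by the shader clock. In the sketch below only the GeForce 9800 GTX figures (128 units, 1688 MHz) come from this article; the shader clocks of the two 8800 cards are our assumptions based on their commonly published reference specifications, so treat the output as a rough guide.

```python
from typing import Dict, Tuple

# (unified processors, shader clock in MHz)
CARDS: Dict[str, Tuple[int, int]] = {
    "GF9800GTX": (128, 1688),      # from the specification list in this article
    "GF8800GTS 512": (128, 1625),  # assumption: published reference clock
    "GF8800U": (128, 1512),        # assumption: published reference clock
}


def alu_throughput(units: int, shader_mhz: int) -> int:
    """Relative ALU throughput: execution units multiplied by their clock rate."""
    return units * shader_mhz


baseline = alu_throughput(*CARDS["GF9800GTX"])
relative = {name: alu_throughput(*spec) / baseline for name, spec in CARDS.items()}

for name, value in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} {value:.3f}")
```

Under these assumptions the ordering is the same as in the Fire test: 9800 GTX first, then 8800 GTS 512MB, then 8800 Ultra.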

Direct3D 10: Geometry Shader Tests

RightMark3D 2.0 includes two geometry shader tests. The first one is called Galaxy; it's similar to point sprites from previous Direct3D versions. It animates a system of particles on the GPU: a geometry shader creates four vertices from each point, forming a particle. Similar algorithms should be used in future DirectX 10 games.
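
The point-to-quad expansion performed by such a geometry shader can be modeled on the CPU. This is only a sketch of the general technique, not the shader used in RightMark3D: it assumes the points are already in view space and that each particle is a screen-aligned square, and all names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Corner offsets in triangle-strip order: two triangles form one quad
CORNERS = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]


def expand_point(center: Vec3, half_size: float) -> List[Vec3]:
    """Emit the four vertices of a screen-aligned quad around one point."""
    cx, cy, cz = center
    return [(cx + dx * half_size, cy + dy * half_size, cz) for dx, dy in CORNERS]


def expand_particles(points: List[Vec3], half_size: float) -> List[Vec3]:
    """Expand every input point into four output vertices."""
    vertices: List[Vec3] = []
    for point in points:
        vertices.extend(expand_point(point, half_size))
    return vertices


quad = expand_point((0.0, 0.0, 5.0), 1.0)
print(quad)
print(len(expand_particles([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)], 0.5)))
```

The output vertex count is always four times the number of input points, which is why the amount of work stays the same whether the vertex or the geometry shader does it.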

A change of balance in geometry tests does not affect rendering results, the image is always identical, only scene processing methods differ. GS load value determines what shader will be busy - vertex or geometry. The amount of work is always the same.

Let's analyze the first modification of Galaxy with vertex processing for three levels of geometric complexity:

So, the correlation of results with different complexity levels of the scene is approximately the same. Performance corresponds to the number of points: FPS is halved at each step. Previous reviews proved that it was not a hard task for modern graphics cards. Performance in this test is apparently limited not by shader ALUs alone: memory bandwidth and fill rate also play a part, though to a lesser degree. The performance difference between the two cards on G92 is very small, and both of them outperform GeForce 8800 Ultra. The competing graphics card from AMD is nearly twice as slow in this test. Perhaps the situation will change when some work is moved to a geometry shader.

But the difference is small, nothing has changed much. All graphics cards from NVIDIA demonstrate almost the same results with various GS load values, which are responsible for moving some of the load to the geometry shader. The only difference consists in slightly higher results of AMD RADEON HD 3870, this time it's not outperformed that much. Let's see what will change in the next test, which generates a heavier load on geometry shaders...

Hyperlight is the second geometry test that uses several techniques: instancing, stream output, buffer load. It employs dynamic generation of geometry by rendering into two buffers, as well as a new Direct3D 10 feature - stream output. The first shader generates ray directions, their speed and growth vectors. These data are stored in a buffer, which is used by the second shader for rendering. Each ray point is used to generate 14 vertices in a circle, up to a million output points.

The new type of shader programs is used to generate rays. If "GS load" is set to "Heavy", it's also used for rendering. That is, in Balanced mode geometry shaders are used only to generate and grow rays, and output is handled by instancing. In Heavy mode the geometry shader also outputs the data. Let's analyze the easy mode first:

Relative results in various modes correspond to the load: performance scales well in all cases. It's close to theoretical parameters, according to which, each next level of Polygon count must be twice as slow. GeForce 9800 GTX again performs a tad better than the old GeForce 8800 Ultra, and this difference grows together with the load. The card from AMD is again outperformed by all NVIDIA solutions with any geometry complexity. Moreover, the performance difference is even bigger than in the previous test, more than twofold already.
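
Whether a card scales "close to theoretical parameters" can be checked by comparing each result with ideal halving. A minimal sketch follows; the FPS values are placeholders invented for illustration, not results from this review, and the function names are hypothetical.

```python
from typing import List


def ideal_fps(base_fps: float, level: int) -> float:
    """Expected FPS if every complexity level exactly halves performance."""
    return base_fps / (2 ** level)


def scaling_efficiency(measured: List[float]) -> List[float]:
    """Ratio of measured FPS to ideal halving, relative to the first level."""
    base = measured[0]
    return [fps / ideal_fps(base, level) for level, fps in enumerate(measured)]


# Placeholder numbers for three polygon count levels (not measured data)
example = [200.0, 98.0, 47.0]
for level, ratio in enumerate(scaling_efficiency(example)):
    print(f"level {level}: {ratio:.2f} of ideal")
```

A ratio near 1.0 at every level means performance scales with the load as expected; values well below 1.0 point to some other bottleneck.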

The situation is generally similar to the previous test, but the results may change on the next diagram for the test that uses geometry shaders more actively. It will be also interesting to compare test results obtained in Balanced and Heavy modes.

This is the first time in the geometry tests that the performance correlation changes significantly. It turns out that the AMD GPU executes complex geometry shaders more efficiently than NVIDIA GPUs. But the difference is very small. It looks like NVIDIA has fixed some bugs in its drivers, and now GeForce 9800 GTX outperforms GeForce 8800 Ultra by almost 10% and practically catches up with the AMD card based on RV670 in the heaviest conditions. The performance gap used to be much wider...

As for the comparison of results obtained in different modes, everything is as usual. The graphics card from AMD is outperformed even though NVIDIA cards lose much performance as they switch from instancing to a geometry shader. All GeForces based on G92 and G80 perform better in Balanced mode than RADEON HD 3870 does in Heavy mode. You should keep in mind that the image does not differ (visually) in these modes.

Direct3D 10: Vertex texture fetch rate

Vertex Texture Fetch tests measure the speed of many vertex texture fetches. These tests are essentially similar, and the correlation of their results in Earth and Waves tests must also be similar. Both tests use displacement mapping based on texture fetches. The only major difference is that the Waves test uses conditional branches, while the Earth test does not.
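
Displacement mapping based on texture fetches boils down to sampling a height map for each vertex and moving the vertex along its normal. The sketch below models that on the CPU without conditional branches (closer to the Earth test than to Waves). It is a simplification under our own assumptions: a single-channel height map, clamp addressing, and texel positions mapped linearly to the [0, 1] range rather than the exact GPU texel-center rules.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def bilinear_fetch(tex: List[List[float]], u: float, v: float) -> float:
    """Sample a single-channel texture at normalized (u, v) with bilinear filtering."""
    height, width = len(tex), len(tex[0])
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def displace(position: Vec3, normal: Vec3, uv: Tuple[float, float],
             height_map: List[List[float]], scale: float) -> Vec3:
    """Move a vertex along its normal by the height fetched from the texture."""
    h = bilinear_fetch(height_map, uv[0], uv[1])
    return tuple(p + n * h * scale for p, n in zip(position, normal))


height_map = [[0.0, 1.0],
              [0.0, 1.0]]
print(displace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5), height_map, 2.0))
```

Each such fetch reads four texels, so a test with many fetches per vertex loads both the texture units and memory bandwidth.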

Let's analyze the first test (Earth) in Effect detail Low mode:

It's interesting to note that relative results in various modes differ much. Judging by our previous reviews, this test is heavily affected by memory bandwidth - the easier the mode, the stronger its effect on performance. It's apparent in relative results of GeForce 9800 GTX and GeForce 8800 Ultra - the latter is victorious in the Low mode (owing to its much higher memory bandwidth), the average results are similar. And they are practically identical in the most complex mode. Performance difference between GeForce 8800 GTS and 9800 GTX corresponds to the theoretical difference, RADEON HD 3870 is outperformed nearly twofold. Let's have a look at results of this test with more texture lookups:

The situation hasn't changed much: GeForce 8800 Ultra still wins in the easy modes, while GeForce 9800 GTX pulls ahead in the heavy mode. GeForce 8800 GTS trails both by almost the same margin as in the previous case, and the AMD card has changed neither its position nor its lag. Just like in the previous case, the results of the graphics cards grow closer as the task gets more complex.
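The bandwidth argument above can be checked with simple arithmetic: peak memory bandwidth is the bus width in bytes multiplied by the effective memory frequency. The short Python sketch below does this for the cards compared here. The GeForce 9800 GTX figures (256-bit bus, 2200 MHz effective) come from the specifications listed in this article; the GeForce 8800 Ultra and GeForce 8800 GTS 512MB figures are assumptions based on their commonly published reference specifications, not on this article.

```python
def memory_bandwidth_gbs(bus_width_bits, effective_mhz):
    """Peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9


# (bus width in bits, effective memory frequency in MHz)
CARDS = {
    "GeForce 9800 GTX": (256, 2200),        # from this article's spec list
    "GeForce 8800 Ultra": (384, 2160),      # assumed reference spec
    "GeForce 8800 GTS 512MB": (256, 1940),  # assumed reference spec
}

if __name__ == "__main__":
    baseline = memory_bandwidth_gbs(*CARDS["GeForce 9800 GTX"])
    for name, (bus, mhz) in CARDS.items():
        bw = memory_bandwidth_gbs(bus, mhz)
        print(f"{name}: {bw:.1f} GB/s ({bw / baseline:.2f}x of 9800 GTX)")
```

Under these assumptions the Ultra has roughly one and a half times the peak bandwidth of GeForce 9800 GTX, which is consistent with its lead in the easy, bandwidth-bound modes.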

Let's have a look at the results of the second vertex texture fetch test. The Waves test executes fewer texture lookups, but it uses conditional branches. The number of bilinear texture lookups in this case reaches 14 (Effect detail Low) or 24 (Effect detail High) per vertex. Geometry complexity changes just like in the previous test.

The Waves test favors AMD products. RADEON HD 3870 looks very good, outperforming G92-based solutions in the easy modes and being only slightly slower in the heavy mode. Performance in this test seems to depend not on TMUs but on memory bandwidth and fill rate, because both G92-based cards are outperformed by the previous-generation GeForce 8800 Ultra. The heavier the texture fetch test, the closer the results. Interestingly, this time GeForce 8800 GTS 512MB is even slightly faster in all modes than GeForce 9800 GTX, the card meant to replace it. Perhaps something is wrong with the driver optimizations, as this cannot be explained by theoretical differences. Let's analyze the second modification of the test:

Again, there are almost no changes, although RADEON HD 3870 looks better as the test grows more complex, because NVIDIA cards lose much more speed. The other conclusions also hold true: performance is limited by memory bandwidth in all modes, especially in the Low mode. TMUs start to play an important role in the High mode, so GeForce 9800 GTX almost catches up with GeForce 8800 GTS. But it fails to catch up with RADEON HD 3870, which takes second place after the Ultra card. AMD cards have noticeably improved their positions in the VTF tests of late. We previously noted that NVIDIA cards coped better with vertex texture fetches, but now the situation has changed.

Conclusions on the synthetic tests

Synthetic tests of GeForce 9800 GTX and other products from both competitors show that the new solution from NVIDIA is very powerful. But it does not differ much in performance from GeForce 8800 GTS 512MB, which it is designed to replace. The new product often outperforms the former top card, GeForce 8800 Ultra, in synthetic tests, and it is almost always ahead of RADEON HD 3870. An efficient architecture, a sufficient number of ALUs, TMUs, and ROPs, and high operating frequencies allow this GPU to demonstrate excellent results in all our synthetic tests.

This is the effect of the improvements in the G9x architecture over G8x. The architecture is notable for high computing performance, which is important for modern and future applications with lots of complex shaders of all types. Compared to the previous G8x architecture, the new G9x features modified TMUs and ROPs: the texture units can fetch twice as much data in certain conditions, and the ROPs support a new compression technique, which improves the efficiency of video memory utilization.

The graphics card is well balanced and has enough execution units of all types. Its only potential drawbacks relative to GeForce 8800 Ultra are lower memory capacity and a narrower memory bus (together with fewer ROPs), and consequently lower memory throughput. This is what prevents it from beating GeForce 8800 Ultra in some tests. But the card under review is not designed as a replacement for the Ultra, and it comes at a lower price, so its performance is quite sufficient for successful competition in its price range.

In the next part of the article we'll analyze how this NVIDIA card performs in modern games. The results should agree with the conclusions we drew from the synthetic tests, adjusted for the stronger effect of fill rate and memory bandwidth. Gaming results should be much more interesting than the synthetic ones, because rendering speed in games almost always depends more on texel and fill rates than on ALUs and geometry processors.

Part 3: Gaming tests

PSU provided by TAGAN
Monitor provided by NVIDIA


Andrey Vorobiev (anvakams@ixbt.com)
Alexei Berillo (sbe@ixbt.com)
May 7, 2008


Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.