iXBT Labs - Computer Hardware in Detail

NVIDIA GeForce 9800 GX2 Graphics Card



New Dual-GPU Challenger of the 3D Throne
Based on the examples of
Gigabyte GeForce 9800 GX2 2x512MB PCI-E,
XFX GeForce 9800 GX2 2x512MB PCI-E

NVIDIA GeForce 9800 GX2 Graphics Card. Part 1: Theory and Architecture

We've tested two graphics cards based on GeForce 9800 GX2, one from Gigabyte and one from XFX. Both operate at the nominal frequencies. They are reference cards that partners buy from NVIDIA, manufactured at Flextronics and PC Partner plants to the order of the Californian chip maker.

Graphics Cards

Gigabyte GeForce 9800 GX2 2x512MB PCI-E
  • GPU: 2 x GeForce 9800 GTX (2 x G92)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 600/1512 MHz (nominal - 600/1512 MHz)
  • Memory frequencies (physical (effective)): 1000 (2000) MHz (nominal - 1000 (2000) MHz)
  • Memory bus width: 2 x 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 2 x 128
  • Texture processors: 2 x 64 (BLF/TLF)
  • ROPs: 2 x 16
  • Dimensions: 270x100x33 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: black
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), HDMI.
  • VIVO: not available
  • TV-out: not installed.
  • Multi-GPU operation: SLI (Hardware).
XFX GeForce 9800 GX2 2x512MB PCI-E
  • GPU: 2 x GeForce 9800 GTX (2 x G92)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 600/1512 MHz (nominal - 600/1512 MHz)
  • Memory frequencies (physical (effective)): 1000 (2000) MHz (nominal - 1000 (2000) MHz)
  • Memory bus width: 2 x 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 2 x 128
  • Texture processors: 2 x 64 (BLF/TLF)
  • ROPs: 2 x 16
  • Dimensions: 270x100x33 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: black
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), HDMI.
  • VIVO: not available
  • TV-out: not installed.
  • Multi-GPU operation: SLI (Hardware).


Gigabyte GeForce 9800 GX2 2x512MB PCI-E
XFX GeForce 9800 GX2 2x512MB PCI-E
Each graphics card has 1024 MB of GDDR3 SDRAM in sixteen chips (8 chips on the front side of each PCB).

Samsung memory chips (GDDR3). 0.8 ns memory access time, which corresponds to 1250 (2500) MHz.
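The rated frequency quoted above follows directly from the access time: a 0.8 ns cycle corresponds to 1/0.8 ns = 1250 MHz, or 2500 MHz effective for double-data-rate memory. Here is a minimal sketch of that arithmetic; the function names are ours and serve only as an illustration.

```python
def rated_clock_mhz(access_time_ns: float) -> float:
    """Physical clock (MHz) implied by a memory cycle time given in nanoseconds."""
    return 1000.0 / access_time_ns


def effective_clock_mhz(physical_mhz: float, transfers_per_clock: int = 2) -> float:
    """Effective data rate; GDDR3 transfers data twice per clock."""
    return physical_mhz * transfers_per_clock


rated = rated_clock_mhz(0.8)            # rating of the chips
effective = effective_clock_mhz(rated)  # double data rate
actual = 1000.0                         # physical clock the cards really use (from the specs)
headroom = rated / actual - 1.0         # margin between rating and actual clock

print(f"rated: {rated:.0f} ({effective:.0f}) MHz, headroom over actual: {headroom:.0%}")
```

So the chips are rated noticeably higher than the 1000 (2000) MHz at which the memory on these cards actually operates.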



Comparison with the reference design, front view
Gigabyte GeForce 9800 GX2 2x512MB PCI-E
Reference card NVIDIA GeForce 9800 GTX
XFX GeForce 9800 GX2 2x512MB PCI-E


Comparison with the reference design, back view
Gigabyte GeForce 9800 GX2 2x512MB PCI-E
Reference card NVIDIA GeForce 9800 GTX
XFX GeForce 9800 GX2 2x512MB PCI-E


These cards clearly have a unique design. Unlike with the dual-GPU predecessor, the 7950 GX2, NVIDIA approached the new product more seriously this time. It has to do with the high heat output of each G92 in the first place, as well as with fast memory, which also needs to be cooled. So the old, relatively simple cooling systems cannot be used.

It's difficult and expensive to place both GPUs with their memory on a single PCB. However, having assessed the complexity of the 9800 GX2, the company had doubts whether the dual-PCB design would be cheaper than a single-PCB design like that of RADEON 3870 X2.





It looks like a box with connectors rather than a graphics card. NVIDIA engineers decided to place the two PCBs along the walls of the housing with a cooler between them. So the PCBs are installed with their front sides facing each other, and the chips are inside.

Engineers took care of the integrity of the design, so the housing is very difficult to remove. One half of the housing is fixed in an intricate way, and it cannot be removed without force.

There are two PCBs inside (as in 9800 GTX) with cuts to let the air in to the turbine in the tail part of the device.

As the photos show, air is sucked in at the sides in the rear part of the card. There is no PCB there, only a large turbine rotating at a relatively low speed. Where does the hot air go? Such turbines usually blow it out of the PC case through vent grids that take up the second slot. But in this case, the two PCBs leave room for this exhaust only in the bottom part of the device (have a look at the first photos that show the connectors, and you will see the vent grid that lets the hot air out).

But this opening is not sufficient, so the engineers decided to use the upper part of the device as well. There are vent grids on top of the graphics card too, so the hot air ejected through them stays inside the PC case. A more convenient and relatively compact design (just two slots) meant sacrificing the removal of hot air from the PC case.

This will certainly raise operating temperatures of the card and neighboring components. Moreover, after a long session the housing becomes too hot to touch. It needs to cool down before you can remove the card. Take this into account: if you want to buy this product, you must have a large, well-ventilated PC case.

The card is 270 mm long, just like 8800 GTX/Ultra. So the PC case should be large enough to accommodate this device. The width of the housing does not change along the card, so the motherboard should have a zone (30 mm wide) behind the PCI-E x16 slot that is free of ports and tall capacitors, not only behind the PCI-E slot itself, but also behind the neighboring slot.

Graphics cards of this series are equipped with an audio connector that plugs into a sound card in order to transmit the audio stream to HDMI (via a DVI-to-HDMI adapter). That is, the graphics card does not contain an audio codec; it receives the audio signal from an external sound card. So if this function is important to you, make sure the bundle includes the special audio cable.

The graphics card uses TWO power connectors, 6-pin and 8-pin. So you should make sure the bundle includes an 8-pin power cable adapter.

Unfortunately, these power connectors are installed with their latches facing inwards. So it will be very difficult to unplug the PSU power cables, because you have to press the latch to remove the plug (it's almost impossible to push a finger between the connectors, so you will have to use a screwdriver for this operation).



These cards DO NOT have a TV-Out.

Analog monitors with a D-Sub (VGA) interface are connected via special DVI-to-D-Sub adapters. The bundle also includes DVI-to-HDMI adapters (these graphics cards support video/audio transfer to HDMI receivers), so there should be no problems with such monitors. There is also a third output, directly to an HDMI receiver.

Maximum resolutions and frequencies:

  • 240 Hz Max Refresh Rate
  • 2048 x 1536 x 32bit x 85 Hz Max - analog interface
  • 2560 x 1600 @ 60 Hz Max - digital interface (all DVIs with Dual-Link)
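
The Dual-Link requirement for the top digital mode is easy to check with a rough calculation. The sketch below counts visible pixels only (blanking intervals are ignored, so the result is a lower bound on the pixel clock) and compares it with the 165 MHz single-link ceiling from the DVI specification. That ceiling and the function names are not taken from this article; treat them as our assumptions.

```python
SINGLE_LINK_DVI_MAX_MHZ = 165.0  # single-link pixel clock ceiling per the DVI spec (assumption, not from the article)


def active_pixel_rate_mhz(width: int, height: int, refresh_hz: float) -> float:
    """Lower bound on the pixel clock: visible pixels only, blanking ignored."""
    return width * height * refresh_hz / 1e6


def exceeds_single_link(width: int, height: int, refresh_hz: float) -> bool:
    """True if even the visible pixels alone do not fit into a single DVI link."""
    return active_pixel_rate_mhz(width, height, refresh_hz) > SINGLE_LINK_DVI_MAX_MHZ


for mode in [(1920, 1200, 60), (2560, 1600, 60)]:
    rate = active_pixel_rate_mhz(*mode)
    print(mode, f"{rate:.2f} MHz", "needs Dual-Link" if exceeds_single_link(*mode) else "may fit a single link")
```

A False result does not guarantee that a single link is enough, because real timings add blanking on top of the visible pixels.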

As for MPEG2 playback features (DVD-Video), we analyzed this issue in 2002. Little has changed since then. CPU load during video playback on modern graphics cards does not exceed 25%.

As for HDTV, you can read one review here.

We monitored temperatures using RivaTuner (written by A.Nikolaychuk AKA Unwinder). Here are the results:

GeForce 9800 GX2 2x512MB PCI-E

It's possible to monitor the temperature reported by an external sensor on the motherboard (the first GPU temperature value), as well as the temperature of each core, read through the NVIDIA driver (the second and the third temperatures). Practice confirms our assumption that this cooling system is not very efficient: it exhausts only part of the hot air outside, and most of it stays inside. So the card heats up to 90°C. This does not affect noise, though, as the fan rotates at a relatively low speed. The cooling system is not completely noiseless, of course, and you can still make out some rustling. But it's not critical.

Let's proceed to bundles.

Bundles

All bundles include a User Manual, a CD with drivers and utilities, two external power splitters (one for 6 pins, the other for 8 pins), DVI-to-VGA and DVI-to-HDMI adapters, and an audio cable to transfer the signal from a sound card to HDMI. Now let's see what other accessories each vendor adds.

Gigabyte GeForce 9800 GX2 2x512MB PCI-E
A basic bundle.
XFX GeForce 9800 GX2 2x512MB PCI-E
There is no HDMI adapter, although that's irrelevant, as the card has a dedicated HDMI connector. Company Of Heroes with the DX10 patch is included as a bonus.

Packages

Gigabyte GeForce 9800 GX2 2x512MB PCI-E

A traditional vertically-designed box. A bright glossy jacket in Gigabyte style with a box made of thick cardboard inside. The card itself sits in a foamed polyurethane compartment, so damage in transit is out of the question. Bundled accessories are stored in a compartment over the card.

XFX GeForce 9800 GX2 2x512MB PCI-E

This company is famous for its thick packages for Hi-End products, and this one is no exception. There is a cutout in a thick piece of foam rubber inside the box to accommodate the card. So the card will be well protected in transit.

Bundled accessories are stored in a special compartment.



Installation and Drivers

Testbed configuration:

  • Intel Core2 (775 Socket) based computer
    • CPU: Intel Core2 Extreme QX9650 (3000 MHz)
    • Motherboard: Gigabyte GA-X38-DQ6 on the Intel X38 chipset
    • RAM: 2 GB DDR2 SDRAM Corsair 1142 MHz (CAS (tCL)=5; RAS to CAS delay (tRCD)=5; Row Precharge (tRP)=5; tRAS=15)
    • HDD: WD Caviar SE WD1600JD 160GB SATA
    • PSU: Tagan 1100-U95 (1100W)

  • OS: Windows XP SP2; DirectX 9.0c
  • OS: Windows Vista 32bit; DirectX 10.0
  • Monitor: Dell 3007WFP (30")
  • Drivers: ATI CATALYST 8.3; NVIDIA 174.74.
  • VSync disabled.

    Synthetic tests

    Our synthetic benchmarks are available here:

    • D3D RightMark Beta 4 (1050) with its description at http://3d.rightmark.org
    • D3D RightMark Pixel Shading 2 and D3D RightMark Pixel Shading 3 - tests of Pixel Shaders 2.0 and 3.0 link.
    • RightMark3D 2.0 with a brief description: link

    RightMark3D 2.0 requires MS Visual Studio 2005 runtime as well as the latest update of DirectX runtime.

    Synthetic tests were run with the following graphics cards:

    • NVIDIA GeForce 9800 GX2 with standard parameters (GF9800GX2)
    • NVIDIA GeForce 9800 GTX with standard parameters (GF9800GTX)
    • NVIDIA GeForce 8800 Ultra with standard parameters (GF8800U)
    • RADEON HD 3870 X2 with standard parameters (HD3870X2)

    These graphics cards were chosen for comparison with GeForce 9800 GX2 for the following reasons. The new GeForce 9800 GTX and the old GeForce 8800 Ultra are the fastest single-GPU graphics cards based on the same architecture, and comparing against them will also help us evaluate the effect of the second GPU in synthetic tests. As for RADEON HD 3870 X2, it's a similar dual-GPU solution from AMD, the only direct competitor to the graphics card under review.

    Direct3D 9: Pixel Filling tests

    This test determines peak texel rate in FFP mode for different numbers of textures applied to a pixel:

    Traditionally, not all graphics cards can demonstrate results close to their theoretical maximum. Results of synthetic tests are most often a tad lower than the theoretical maximum. Graphics cards based on G80 and RV670 come closer to this threshold than the other cards. NVIDIA cards, notable for improved TMUs, fail to reach their theoretical maximum in our old test.
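
For reference, the theoretical maximum can be estimated from the specifications listed at the beginning of the article. The sketch below assumes one texel per texture unit and one pixel per ROP per clock, and it simply sums both GPUs, which is an idealization. The class and function names are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    gpus: int
    tmus_per_gpu: int
    rops_per_gpu: int
    core_mhz: float

    def peak_texel_rate_mtex(self) -> float:
        """Peak texel rate in Mtexels/s, assuming one texel per TMU per clock."""
        return self.gpus * self.tmus_per_gpu * self.core_mhz

    def peak_fill_rate_mpix(self) -> float:
        """Peak fill rate in Mpixels/s, assuming one pixel per ROP per clock."""
        return self.gpus * self.rops_per_gpu * self.core_mhz


def fraction_of_peak(measured: float, peak: float) -> float:
    """How close a measured rate comes to the theoretical maximum (1.0 = peak)."""
    return measured / peak


# Values from the specification list above: 2 x 64 texture units, 2 x 16 ROPs, 600 MHz.
GX2 = GpuSpec(gpus=2, tmus_per_gpu=64, rops_per_gpu=16, core_mhz=600.0)

print(f"peak texel rate: {GX2.peak_texel_rate_mtex():.0f} Mtexels/s")
print(f"peak fill rate:  {GX2.peak_fill_rate_mpix():.0f} Mpixels/s")
```

A measured result can then be passed to fraction_of_peak to see how far a card falls short of its theoretical limit.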

    As for the graphics card under review, for some strange reason related to multi-GPU rendering, it has failed our fillrate tests. RADEON HD 3870 X2 performs very well, demonstrating results in between GeForce 9800 GTX and GeForce 8800 Ultra when graphics cards are not limited by video memory bandwidth. Let's have a look at the fill rate results:

    The second synthetic test measures the fill rate. It shows the same situation adjusted for the number of pixels written into the frame buffer. In case of 0 and 1 texture, performance is still limited by memory bandwidth as well as by the number and frequency of ROPs. The competing solution from AMD again demonstrates results close to single-GPU cards from NVIDIA.

    We can clearly see that GeForce 9800 GX2 hits a performance limit in this test, which keeps the card from demonstrating results in line with the theory. However, our synthetic tests for DX9 are getting old, so we don't use them to determine relative performance. Still, we can assume that the same situation may be repeated in some of the newer tests.

    Direct3D 9: Geometry Processing Speed Tests

    Let's analyze a couple of stress geometry tests. The first test uses the simplest vertex shader that shows maximum triangle throughput:

    All these GPUs are based on unified architectures, their unified processors in this test are busy with geometry processing only. So all solutions demonstrate high results, which are evidently limited not by peak performance of unified processors, but by performance of other units, for example, triangle setup.

    Test results prove again that AMD GPUs are faster at processing geometry than NVIDIA GPUs. The difference between all GeForce and RADEON HD 3870 X2 cards is quite big. The AMD solution is faster even in such a simple task. However, GeForce 9800 GX2 is almost twice as fast as the single-GPU GeForce 9800 GTX, considering the lower operating frequency of the GPU in the former. Performance in geometry tasks in AFR mode is practically doubled, and single-GPU solutions from the same manufacturer are left far behind.
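
To illustrate why AFR (alternate frame rendering) can practically double performance, here is a minimal model of it: whole frames are handed to the GPUs in turn and rendered in parallel. The model is idealized (identical frame times, no synchronization or transfer overhead), and the function names are ours rather than any part of the NVIDIA driver.

```python
from typing import List


def afr_assign(frame_count: int, gpu_count: int = 2) -> List[int]:
    """Index of the GPU that renders each frame under round-robin AFR."""
    return [frame % gpu_count for frame in range(frame_count)]


def afr_total_time_ms(frame_count: int, frame_time_ms: float, gpu_count: int = 2) -> float:
    """Wall time when the GPUs render their frames in parallel (ideal case, no overhead)."""
    assignment = afr_assign(frame_count, gpu_count)
    busiest = max(assignment.count(gpu) for gpu in range(gpu_count))
    return busiest * frame_time_ms


single = afr_total_time_ms(100, 10.0, gpu_count=1)
dual = afr_total_time_ms(100, 10.0, gpu_count=2)
print(f"1 GPU: {single:.0f} ms, 2 GPUs: {dual:.0f} ms, speedup: {single / dual:.1f}x")
```

Real results fall below this ideal whenever frames depend on each other or the load is limited by something other than the GPUs.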

    Test execution efficiency of all NVIDIA representatives in various modes is slightly different. G9x cards, including the dual-GPU product, work faster in FFP mode than in VS 1.1 and VS 2.0. These results do not differ much in the G80 product. We've removed intermediate geometry tests with a single light source. So we proceed straight to the most complex geometry task with three light sources and static/dynamic branching:

    In this case performance difference between AMD and NVIDIA products has grown. RADEON HD 3870 X2 outperforms the other solutions, including GeForce 9800 GX2. Even the most complex geometry task does not reveal full potential of RV670. Its results are similar in various modes, they are almost identical to those on the previous diagram. Optimized FFP emulation in G92 becomes even more noticeable in this test with three mixed light sources.

    Efficiency of dual-GPU rendering in GeForce 9800 GX2 is lower this time, and the twofold mark is not always reached. However, the card is noticeably faster than the single-GPU GeForce 8800 Ultra and 9800 GTX, as it should be. On the whole, all GPUs perform well in these tests, as they can use all unified stream processors to solve geometry tasks. In real applications, however, unified processors are busy mostly with pixels. We proceed to such tests now.

    Direct3D 9: Pixel Shaders Tests

    The first group of pixel shaders to be reviewed here is too simple for modern GPUs. It includes various versions of pixel programs of relatively low complexity: 1.1, 1.4, and 2.0.

    These tests are too easy for modern architectures and fail to reveal their true power. Performance in these tests is often limited by texel and fill rates, we can see it well in weak results of RADEON HD 3870 X2, which performs on the level of single-GPU cards from NVIDIA in many tests. Sometimes it's even slower. It's apparently the effect of relatively few TMUs in the AMD GPU, because it's still outperformed by the dual-GPU card from NVIDIA in relatively complex PS 2.0 tests, such as Phong with three light sources.

    GeForce 9800 GX2 demonstrates excellent results in simple tests of pixel shaders: it's approximately twice as fast as its single-GPU counterpart, considering its lower frequencies. The RADEON card is always heavily outperformed, except in the most complex shader, where performance is no longer limited by the fill and texel rates. Let's have a look at results in more complex pixel programs of intermediate versions:

    The procedural water test (which depends much on texturing performance) uses dependent texture lookups of high nesting depth, so the dual-GPU card from AMD is outperformed even by single-GPU solutions based on G92 and G80. The card under review works efficiently in AFR mode, doubling its frame rate versus the single-GPU GeForce 9800 GTX (in full compliance with frequencies and theory).

    The second test (arithmetic-intensive) apparently favors the AMD architecture with lots of arithmetic units: the card from AMD not only outperforms single-GPU cards from NVIDIA here, but is also slightly faster than GeForce 9800 GX2. In turn, the card under review honestly performs twice as fast as the single-G92 card operating at the same frequency.

    Direct3D 9: New Pixel Shaders Tests

    These tests of DirectX 9 pixel shaders are even more complex, they are divided into two categories. We'll start with easier shaders - SM 2.0:

    • Parallax Mapping - a texturing method used in many modern games
    • Frozen Glass - a complex procedural texture that visualizes frozen glass with adjustable parameters

    There are two modifications of these shaders: arithmetic intensive and texture sampling intensive. Let's analyze arithmetic-intensive modifications, they are more promising from the point of view of future applications:

    RADEON HD 3870 X2 fares poorly in the Frozen Glass test. The situation here resembles what happened in the Water test of the previous block. In this case the AMD solution is defeated even by the single-GPU GeForce 9800 GTX and GeForce 8800 Ultra, to say nothing of the dual-GPU 9800 GX2. Even though these arithmetic tests should depend mostly on shader units, their performance seems to be limited not only by arithmetic and texel rate, but also by fill rate.

    In the second test of Parallax Mapping, the AMD card almost catches up with GeForce 9800 GX2, which is the leader of this test. Both dual-GPU cards effectively double their performance relative to single-GPU cards. Let's analyze results obtained in the texture sampling intensive tests, where the G92-based cards should demonstrate higher results:



    All contenders execute shaders with many texture lookups more slowly than arithmetic-intensive modifications. The situation has changed a little. Performance in this test is limited by the speed of texture units more than ever, so the dual-G92 card heavily outperforms RADEON HD 3870 X2 in both tests. The performance difference reaches twofold. The AMD card can compete only with single-GPU cards from NVIDIA here. And GeForce 9800 GX2 is twice as fast as its single-GPU counterpart, considering the difference in frequencies.

    Let's have a look at results of another two pixel shader tests - SM 3.0. They are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches:

    • Steep Parallax Mapping is a much heavier modification of parallax mapping
    • Fur - a procedural shader that visualizes fur

    The GPU load grows very much. Although AMD GPUs execute complex Pixel Shaders 3.0 with lots of branches efficiently, the dual-RV670 card performs similarly to the single-GPU GeForce 9800 GTX. Judging by results of G80- and G92-based cards, it can be explained by accelerated bilinear texture lookups in the G9x architecture, as well as higher utilization efficiency of available resources.

    Performance difference between the dual-GPU GeForce 9800 GX2 and the single-GPU GeForce 9800 GTX (based on the same graphics processor) corresponds to the theoretical data, performance is doubled with the help of AFR, if we take into account reduced operating frequencies of the graphics card under review.
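
Since we keep saying "taking frequencies into account", here is how such a correction can be computed. The 1512 MHz shader clock of GeForce 9800 GX2 comes from the specifications above. The 1688 MHz shader clock of GeForce 9800 GTX is not listed in this part of the article and is an assumption here, and the FPS values are purely hypothetical.

```python
def clock_normalized_scaling(fps_dual: float, fps_single: float,
                             clock_dual_mhz: float, clock_single_mhz: float) -> float:
    """Speedup of the dual-GPU card after removing the clock difference (2.0 = ideal)."""
    raw_speedup = fps_dual / fps_single
    return raw_speedup * (clock_single_mhz / clock_dual_mhz)


def second_gpu_gain_percent(scaling: float) -> float:
    """Gain contributed by the second GPU, in percent."""
    return (scaling - 1.0) * 100.0


GX2_SHADER_MHZ = 1512.0  # from the specification list
GTX_SHADER_MHZ = 1688.0  # assumed reference clock of GeForce 9800 GTX

# Hypothetical frame rates, for illustration only.
scaling = clock_normalized_scaling(fps_dual=175.0, fps_single=100.0,
                                   clock_dual_mhz=GX2_SHADER_MHZ,
                                   clock_single_mhz=GTX_SHADER_MHZ)
print(f"normalized scaling: {scaling:.2f}x, gain from second GPU: {second_gpu_gain_percent(scaling):.0f}%")
```

The same formula applies with core clocks instead of shader clocks when a test is limited by texturing or fill rate rather than by arithmetic.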

    Direct3D 10: PS 4.0 Tests (texturing, loops)

    New RightMark3D 2.0 includes two old PS 3.0 tests (Direct3D 9), rewritten for DirectX 10, and two brand new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase their load on GPUs.

    These tests measure efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundreds of lookups per pixel in the heaviest mode!) and a relatively low ALU load. In other words, they measure a texture sampling rate and branching efficiency in a pixel shader.

    The first pixel shader test will be the Fur test. When used with the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. When shader supersampling is enabled - the number of lookups grows to 60-120. And the High mode with SSAA is the heaviest mode - 160-320 lookups from a bump map.

    Let's see what happens in modes without supersampling - they are relatively simple, and the correlation of results in Low/High modes must be similar.

    Results of all cards in the High mode are approximately 1.5 times lower than in the Low mode. As GeForce 8800 Ultra is a tad faster than the single-G92 solution, the test is affected by the fill rate and higher memory bandwidth. The procedural Fur tests for Direct3D 10 with lots of texture lookups again show a huge advantage of NVIDIA solutions over AMD cards. The dual-GPU solution from the latter cannot compete with NVIDIA cards in this test.

    GeForce 9800 GX2 (considering its lower frequencies) performs almost precisely twice as fast as the single-GPU GeForce 9800 GTX, so AFR is very efficient in this synthetic test. Let's have a look at the results in this test with enabled shader supersampling, which quadruples the load. Perhaps it will change the situation, and memory bandwidth/fill rate will produce a weaker effect:

    Theoretically, supersampling quadruples the load. But the performance drop of NVIDIA cards is deeper than that of AMD cards. The performance gap between HD 3870 X2 and the single-GPU GeForces narrows, but the AMD card is still slower, and NVIDIA cards have a clear advantage here. In other respects the situation is the same: as the GPU load grows, performance differences between the cards decrease, and GeForce 9800 GX2 is still almost twice as fast as the new single-GPU card from NVIDIA.

    The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:
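
The lookup counts for the test modes follow from two multipliers: self-shadowing doubles the number of bump map lookups, and supersampling quadruples it. A minimal sketch of that arithmetic (the function name is ours, and the three lookups from main textures are left out):

```python
from typing import Tuple


def lookup_range(base_low: int, base_high: int,
                 self_shadowing: bool = False,
                 supersampling: bool = False) -> Tuple[int, int]:
    """Bump map lookups per pixel for a given combination of test settings."""
    factor = (2 if self_shadowing else 1) * (4 if supersampling else 1)
    return base_low * factor, base_high * factor


BASE = (10, 50)  # low settings of Steep Parallax Mapping

for shadow in (False, True):
    for ssaa in (False, True):
        low, high = lookup_range(*BASE, self_shadowing=shadow, supersampling=ssaa)
        print(f"self-shadowing={shadow!s:5} supersampling={ssaa!s:5} -> {low}-{high} lookups")
```

With both options enabled the factor is 2 x 4 = 8, which gives the 80-400 lookups of the heaviest mode.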

    This test is more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavy modifications, such as our steep parallax mapping, are already used in the latest releases, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing that doubles the GPU load (High mode).

    Although AMD solutions have been traditionally strong in our Direct3D 9 tests of parallax mapping, even the dual-GPU card from AMD fails to match the single-GPU GeForce 9800 GTX and 8800 Ultra in our updated D3D10 test without supersampling. Self-shadowing causes a bigger performance drop in AMD products.

    GeForce 9800 GX2 demonstrates high FPS, almost twice as high as in 9800 GTX, taking their frequencies into account. It all indicates that SLI AFR copes with these synthetic tests perfectly. Let's see what supersampling will change. Performance drop from supersampling was bigger in NVIDIA cards in the previous test.

    Supersampling and self-shadowing increase the load on graphics cards by almost eight times, causing a great performance drop. Performance differences between graphics cards change. Supersampling has a similar effect as in the previous case: the AMD card significantly improves its results versus NVIDIA cards. RADEON HD 3870 X2 practically catches up with single-GPU GeForces. But the dual-GPU GeForce 9800 GX2 is far ahead: it's precisely twice as fast as the equally-clocked GeForce 9800 GTX, in full compliance with theoretical data.

    Direct3D 10: PS 4.0 Tests (computing)

    The next couple of pixel shader tests contains minimum texture lookups to reduce the effect of TMU performance. They use a lot of arithmetic operations, so they measure arithmetic performance of GPUs, how fast they execute arithmetic instructions in pixel shaders.

    The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.

    We already noted in the analysis of our Direct3D 9 synthetic test results that the modern AMD architecture often performs better in complex arithmetic tasks than the competing architecture from NVIDIA. But modifications in G92 really improve its performance, as you can see in the comparison of two single-GPU cards. Now RADEON HD 3870 X2 is even outperformed by GeForce 9800 GX2, although insignificantly.

    Nominally, the highest framerate is demonstrated by GeForce 9800 GX2, but SLI efficiency has dropped a little compared to the previous tests, so the performance gain from the second GPU does not reach 95-100%. The other aspects correspond to the theoretical data on the number and frequency of unified shader units, as well as memory bandwidth and fill rate, as the performance difference between GeForce 8800 Ultra and GeForce 9800 GTX is very small.

    The second shader test is called Fire, it's even harder for ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:

    AMD cards failed this test last year, demonstrating very low results. But the bug has been fixed since RADEON HD 3870 X2, so AMD results are close to the theoretical values. For example, RADEON HD 3870 X2 demonstrates maximum speed in this test. It's 20% faster than GeForce 9800 GX2.

    As for relative performance of NVIDIA cards, rendering speed of the dual-GPU configuration is precisely twice as high as that of the single-GPU counterpart operating at the reduced frequencies. That is, performance is apparently limited by shader performance, so the results are fully compliant with the theoretical difference.

    Direct3D 10: Geometry Shader Tests

    RightMark3D 2.0 includes two geometry shader tests. The first one is called Galaxy, it's similar to point sprites from previous Direct3D versions. It animates a system of particles using a GPU, a geometry shader creates four vertices from each dot, forming a particle. Similar algorithms should be used in future DirectX 10 games.
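
The point expansion described above can be sketched on the CPU. This is only an illustration of the idea, not the actual RightMark3D shader, which is not shown here: each particle center is turned into four vertices of a screen-aligned quad, emitted in triangle strip order. All names and the choice of view-space coordinates are our assumptions.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Corner offsets in triangle strip order, in units of the particle half-size.
STRIP_CORNERS = ((-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0))


def expand_point_to_quad(center: Vec3, half_size: float) -> List[Vec3]:
    """Emit the four view-space vertices of a screen-aligned quad around a point."""
    cx, cy, cz = center
    return [(cx + dx * half_size, cy + dy * half_size, cz) for dx, dy in STRIP_CORNERS]


def expand_particles(points: List[Vec3], half_size: float) -> List[Vec3]:
    """Run the expansion for every particle, as a geometry shader would per input point."""
    vertices: List[Vec3] = []
    for point in points:
        vertices.extend(expand_point_to_quad(point, half_size))
    return vertices


particles = [(0.0, 0.0, 5.0), (2.0, 1.0, 7.0)]
quads = expand_particles(particles, half_size=0.5)
print(f"{len(particles)} points -> {len(quads)} vertices")
```

The output is always four times larger than the input, which is why the amount of geometry work in this test scales directly with the number of particles.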

    A change of balance in geometry tests does not affect rendering results, the image is always identical, only scene processing methods differ. GS load value determines what shader will be busy - vertex or geometry. The amount of work is always the same.

    Let's analyze the first modification of Galaxy with vertex processing for three levels of geometric complexity:

    The ratio of results is almost the same at every level of scene complexity. Performance corresponds to the number of points: FPS is halved at each step. Previous reviews showed that this is not a hard task for modern graphics cards. Performance in this test is apparently not limited by shader ALUs alone: the task also depends on memory bandwidth and fill rate, although to a lesser degree.
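
The "FPS is halved each step" rule is easy to check against any set of results. The sketch below compares each complexity level with half of the previous one; a ratio of 1.0 means ideal scaling. The function names are ours, and the frame rates are hypothetical, not taken from our diagrams.

```python
from typing import List


def expected_fps(base_fps: float, step: int) -> float:
    """Ideal FPS at a given complexity step when the load doubles at each step."""
    return base_fps / (2 ** step)


def halving_ratios(measured_fps: List[float]) -> List[float]:
    """Ratio of each step's FPS to half of the previous step's FPS (1.0 = ideal halving)."""
    return [cur / (prev / 2.0) for prev, cur in zip(measured_fps, measured_fps[1:])]


# Hypothetical results for three levels of geometric complexity.
measured = [400.0, 205.0, 100.0]
print([round(ratio, 3) for ratio in halving_ratios(measured)])
```

Ratios above 1.0 suggest that the easier level was held back by something other than the geometry load.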

    The dual-GPU card from AMD demonstrates results in between those of single- and dual-GPU cards from NVIDIA. The dual-GPU GeForce 9800 GX2 leads here, being precisely twice as fast as its single-GPU counterpart, taking the frequency difference into account. Perhaps the situation will change, when some work is moved to a geometry shader.

    But the difference is small, and not much has changed. All graphics cards from NVIDIA demonstrate almost the same results regardless of the GS load setting, which is responsible for moving some of the load to the geometry shader. AMD RADEON HD 3870 X2 has improved its results a little: it now trails the dual-GPU card under review by a noticeably smaller margin, especially in heavy modes. Let's see what will change in the next test, which generates a heavier load on geometry shaders.

    Hyperlight is the second geometry test that uses several techniques: instancing, stream output, buffer load. It employs dynamic generation of geometry by rendering into two buffers, as well as a new Direct3D 10 feature - stream output. The first shader generates ray directions, their speed and growth vectors. These data are stored in a buffer, which is used by the second shader for rendering. Each ray point is used to generate 14 vertices in a circle, up to a million output points.
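
The amplification step, where each ray point produces 14 vertices in a circle, can be illustrated with a short CPU-side sketch. It is not the actual test shader: for simplicity the circle lies in the XY plane, and we assume that "a million output points" refers to the generated vertices. All names are ours.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def ring_vertices(center: Vec3, radius: float, count: int = 14) -> List[Vec3]:
    """Vertices evenly spaced on a circle in the XY plane around a ray point."""
    cx, cy, cz = center
    step = 2.0 * math.pi / count
    return [(cx + radius * math.cos(i * step), cy + radius * math.sin(i * step), cz)
            for i in range(count)]


def max_ray_points(output_limit: int = 1_000_000, per_point: int = 14) -> int:
    """How many ray points fit under the output limit at a given amplification."""
    return output_limit // per_point


ring = ring_vertices((0.0, 0.0, 0.0), radius=1.0)
print(f"{len(ring)} vertices per ray point, up to {max_ray_points()} ray points")
```

The 14-fold amplification is what makes this test much heavier for the geometry shader than Galaxy, where each point yields only four vertices.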

    The new type of shader programs is used to generate rays. If "GS load" is set to "Heavy", it's also used for rendering. That is, in Balanced mode geometry shaders are used only to generate and grow rays, and output is handled by instancing. In the Heavy mode the geometry shader outputs the data as well. Let's analyze the easy mode first:

    Here is the first unexpected result in Direct3D 10 tests. We have the same problem here as in the fill rate tests for DX9: GeForce 9800 GX2 demonstrates clearly anomalous results. Such results are impossible under any conditions, so it's evidently a bug in the drivers. Performance of the dual-GPU competitor from AMD is also disappointing, although there are no bugs in this case. Its results are quite low, over 1.5 times lower than those of the single-GPU GeForce 9800 GTX and 8800 Ultra at any geometry complexity.

    In other respects, relative results in various modes correspond to the load: performance scales well in all cases. It's close to theoretical parameters, according to which, each next level of Polygon count must be twice as slow. The situation is similar to the previous test, but the results may change on the next diagram for the test that actively uses geometry shaders. It will be also interesting to compare test results obtained in Balanced and Heavy modes.

    GeForce 9800 GX2 still has problems with SLI AFR in this test. So we cannot compare the cards with each other. Relative performance of other cards has changed. AMD GPUs execute complex geometry shaders more efficiently than NVIDIA GPUs. However, if SLI had worked correctly in GeForce 9800 GX2, this solution would have demonstrated results similar to those of RADEON HD 3870 X2. It looks like NVIDIA fixed some problems in the drivers, and now all single-GPU GeForces perform well, much faster than they used to.

    As for the comparison of results obtained in different modes, everything is as usual. The graphics card from AMD is outperformed even though NVIDIA cards lose much performance as they switch from instancing to a geometry shader. RADEON HD 3870 X2 is still outperformed by the best single-G92 GeForce, which performs faster in Balanced mode than the RADEON does in Heavy. You should keep in mind that the image does not differ (visually) in these modes.

    Direct3D 10: Vertex texture fetch rate

    Vertex Texture Fetch tests measure the speed of many vertex texture fetches. These tests are similar, and the correlation of their results in Earth and Waves tests must be also similar. Both tests use displacement mapping based on texture fetches. The only major difference is that the Waves test uses conditional branches, while the Earth test does not.
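
Displacement mapping based on vertex texture fetches can be sketched as follows: for each vertex, a height value is sampled from a texture with bilinear filtering, and the vertex is shifted by that value. This is a simplified CPU-side illustration under our own assumptions. Texture coordinates map edge to edge onto the grid, the displacement goes along the Y axis instead of the vertex normal, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def bilinear_fetch(height_map: Sequence[Sequence[float]], u: float, v: float) -> float:
    """Bilinearly filtered sample of a 2D grid at normalized coordinates u, v in [0, 1]."""
    rows, cols = len(height_map), len(height_map[0])
    x = u * (cols - 1)
    y = v * (rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = height_map[y0][x0] * (1 - fx) + height_map[y0][x1] * fx
    bottom = height_map[y1][x0] * (1 - fx) + height_map[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def displace(vertices: List[Tuple[float, float, float, float, float]],
             height_map: Sequence[Sequence[float]],
             scale: float) -> List[Tuple[float, float, float]]:
    """Shift each (x, y, z, u, v) vertex along +Y by the fetched height times scale."""
    return [(x, y + scale * bilinear_fetch(height_map, u, v), z)
            for x, y, z, u, v in vertices]


heights = [[0.0, 1.0],
           [1.0, 2.0]]
flat_grid = [(0.0, 0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 1.0, 0.5, 0.5), (2.0, 0.0, 2.0, 1.0, 1.0)]
print(displace(flat_grid, heights, scale=0.5))
```

Every vertex costs at least one filtered texture fetch here, so the load grows with both geometric complexity and the number of lookups per vertex.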

    Let's analyze the first test (Earth) in Effect detail Low mode:

    Relative results in various modes differ more than usual, because results in this test are affected by memory bandwidth. The easier the mode, the stronger its effect on performance. We can see it well in test results of GeForce 8800 Ultra, which demonstrates good results in all modes, on the level of the dual-GPU card from AMD.

    We are not pleased with GeForce 9800 GX2 in this test either. This is the first time (apart from the invalid results) that the performance difference between this card and GeForce 9800 GTX deviates noticeably from the theoretical value. The card under review still remains the leader, but its advantage amounts to just 20-50%, and that is without taking the frequency difference into account. Let's have a look at results of this test with more texture lookups:
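
    To put the 20-50% gain into perspective, it helps to compare it with the ideal dual-GPU speedup adjusted for clock rates. The sketch below is only an illustration: it takes the 600 MHz core clock of GeForce 9800 GX2 from the specifications above and assumes a 675 MHz core clock for GeForce 9800 GTX, a value that is not stated in this part of the article. It also assumes that performance scales linearly with the core clock. The function names are hypothetical.

```python
def ideal_speedup(gpus: int, dual_clock_mhz: float, single_clock_mhz: float) -> float:
    """Best-case speedup of a multi-GPU card over a single-GPU card,
    assuming perfect AFR scaling and linear scaling with clock rate."""
    return gpus * dual_clock_mhz / single_clock_mhz


def afr_efficiency(measured_speedup: float, ideal: float) -> float:
    """Share of the ideal speedup that was actually achieved."""
    return measured_speedup / ideal


GX2_CORE_MHZ = 600.0  # per-GPU core clock from the card specifications
GTX_CORE_MHZ = 675.0  # assumed clock of GeForce 9800 GTX, not given here

ideal = ideal_speedup(2, GX2_CORE_MHZ, GTX_CORE_MHZ)
# A 20% gain is a speedup of 1.2, a 50% gain is a speedup of 1.5.
efficiency_low = afr_efficiency(1.2, ideal)
efficiency_high = afr_efficiency(1.5, ideal)
```

    Under these assumptions the ideal speedup is about 1.78 rather than 2.0, so a 20-50% gain corresponds to roughly two thirds to five sixths of what two G92 chips could deliver at these clocks.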

    The situation hasn't changed much. GeForce 8800 Ultra looks great in easy modes because of its high memory bandwidth, and it still outperforms RADEON HD 3870 X2. The card from AMD hasn't changed its position or lag. GeForce 9800 GX2 demonstrates the best result, but its advantage over the single-GPU counterpart is still far from twofold. It's not that easy to distribute synthetic vertex texture fetch tests between several GPUs.

    Let's have a look at results of the second vertex texture fetch test. The Waves test executes fewer texture lookups, but it uses conditional branches. The number of bilinear texture lookups in this case reaches 14 (Effect detail Low) or 24 (Effect detail High) per vertex. Geometry complexity changes just like in the previous test.
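
    The lookup counts above translate directly into the amount of texture fetch work per frame. The sketch below is a rough estimate under the assumption that every vertex performs the maximum number of lookups, which conditional branches may reduce in practice. The vertex count and frame rate are hypothetical, as are the names.

```python
# Bilinear texture lookups per vertex in the Waves test, as quoted above.
WAVES_LOOKUPS = {"low": 14, "high": 24}


def fetches_per_second(vertices_per_frame: int, fps: float, detail: str) -> float:
    """Upper bound on vertex texture fetches issued per second."""
    return vertices_per_frame * WAVES_LOOKUPS[detail] * fps


# Hypothetical scene: 100,000 vertices per frame rendered at 50 fps.
low_rate = fetches_per_second(100_000, 50.0, "low")

# Extra fetch work per vertex when switching from Low to High detail.
high_to_low = WAVES_LOOKUPS["high"] / WAVES_LOOKUPS["low"]
```

    If the test were limited by texture fetches alone, switching from Low to High would reduce the frame rate by a factor of about 1.7. A smaller drop points to another bottleneck, such as memory bandwidth.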

    The Waves test favors AMD products. The dual-GPU RADEON HD 3870 X2 performs great here, outperforming all NVIDIA cards by up to 1.5 times. Performance in this test seems to depend not on TMUs, but on memory bandwidth and fill rate, judging by results of the previous-generation solution - GeForce 8800 Ultra. The dual-G92 card performs only on a par with the Ultra!

    The heavier the texture fetch test, the closer the results. However, GeForce 9800 GX2 is outperformed by the competing card from AMD by a wide margin. Perhaps there is something wrong with the implementation of SLI in the drivers - the performance difference between the single- and dual-GPU GeForce 9800 is just 30-60%. Judging by the G80-based card, memory bandwidth has a stronger effect here. Let's analyze the second modification of the test:

    There are almost no changes, although relative results of the dual-GPU cards improve as test complexity grows (especially those of RADEON HD 3870 X2). The other cards from NVIDIA suffer from a heavier performance drop than the AMD card. The other conclusions also hold true - performance is limited by memory bandwidth in all modes, especially in the Low mode. TMUs start to play an important role in the High mode.

    GeForce 9800 GX2 demonstrates very weak results. It fails to double performance versus GeForce 9800 GTX. Besides, AMD cards have improved their positions in VTF tests of late. And now the fastest card from AMD copes with vertex texture fetch tests better than all NVIDIA cards.

    Conclusions on the synthetic tests

    Synthetic tests of GeForce 9800 GX2 and other products from both competitors show us that the new dual-GPU product from NVIDIA is a very powerful card. In many cases it demonstrates the highest frame rate, ahead of both the best single-GPU GeForces and the competing dual-GPU solution from AMD - RADEON HD 3870 X2. Highly efficient architecture, a sufficient number of ALUs, TMUs, and ROPs, as well as high operating frequencies allow this solution to demonstrate good results in almost all synthetic tests.

    That's the effect of improvements in the G9x architecture versus G8x. It's notable for high computing performance, which is important for modern and future applications with lots of complex shaders of all types. Compared to the previous G8x architecture, the new G9x features modified TMUs and ROPs. Texture units can fetch twice as much data in certain conditions, and its ROPs support the new compression technique, which improves efficiency of video memory utilization.

    GeForce 9800 GX2 is a well-balanced card. Its main drawback is the SLI technology itself. Our tests revealed some problems of the dual-GPU system operating in the AFR mode. The new graphics card failed DX9 tests of fill rate and one of the DX10 tests of geometry shaders, which indicates insufficient optimizations in the drivers. The same concerns VTF tests, although to a lesser degree - demonstrated results are too low, and the card apparently suffers from low efficiency of dual-GPU rendering.

    In the next part of the article we'll analyze tests of this card from NVIDIA in modern games. Those results should be consistent with the conclusions we've drawn from synthetic tests. But game results will be more interesting than synthetic ones, as efficiency of the dual-GPU AFR in games should not be as high as in synthetic tests.

    NVIDIA GeForce 9800 GX2 Graphics Card. Part 3: Gaming tests

    PSU provided by TAGAN
    Monitor provided by NVIDIA

    Andrey Vorobiev (anvakams@ixbt.com)
    Alexey Berillo (sbe@ixbt.com)
    April 30, 2008




    Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.