iXBT Labs - Computer Hardware in Detail






NVIDIA GeForce 8800 GTS 512MB (G92)

Part 2: Features, synthetic tests

We've already covered all architectural features in the first part of this review.

Today we'll review graphics cards from BFG and Zotac. However, in this case we actually examined the same reference card twice. The BFG card differs a little in frequencies.

Graphics Cards

BFG GeForce 8800 GTS OC 512MB PCI-E
  • GPU: GeForce 8800 GTS (G92)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 675/1674 MHz (nominal - 650/1620 MHz)
  • Memory frequencies (physical (effective)): 970 (1940) MHz (nominal - 970 (1940) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 128
  • Texture processors: 64 (28) (BLF)
  • ROPs: 16
  • Dimensions: 220x100x32 mm (the last figure is the maximum thickness of a graphics card).
  • PCB color: light green
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).
Zotac GeForce 8800 GTS 512MB PCI-E
  • GPU: GeForce 8800 GTS (G92)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 650/1620 MHz (nominal - 650/1620 MHz)
  • Memory frequencies (physical (effective)): 970 (1940) MHz (nominal - 970 (1940) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 128
  • Texture processors: 64 (28) (BLF)
  • ROPs: 16
  • Dimensions: 220x100x32 mm (the last figure is the maximum thickness of a graphics card).
  • PCB color: light green
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).
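The factory overclock of the BFG card can be quantified from the frequencies listed above. This is only a minimal sketch; the function name is ours and is not part of any tool used in the review.

```python
def overclock_percent(actual_mhz: float, nominal_mhz: float) -> float:
    """Return how far a clock exceeds its nominal value, in percent."""
    return (actual_mhz / nominal_mhz - 1.0) * 100.0

# Frequencies taken from the specification lists above (BFG card).
bfg_core = overclock_percent(675, 650)      # ROP (core) domain
bfg_shader = overclock_percent(1674, 1620)  # shader domain

print(f"core: +{bfg_core:.1f}%, shader: +{bfg_shader:.1f}%")
```

Both domains are raised by less than 4%, so the BFG and Zotac cards should behave almost identically.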

[Photos: BFG GeForce 8800 GTS OC 512MB PCI-E and Zotac GeForce 8800 GTS 512MB PCI-E]
Each graphics card has 512 MB of GDDR3 SDRAM housed in eight chips on the front side of the PCB.

The cards use Qimonda GDDR3 memory chips with 1.0 ns access time, which corresponds to 1000 (2000) MHz.
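The relation between access time and rated clock is simple arithmetic. Here is a minimal sketch (function names are ours, for illustration only) that also compares the rating with the actual 970 MHz memory clock from the specifications above:

```python
def rated_clock_mhz(access_time_ns: float) -> float:
    """Rated memory clock in MHz for a given access (cycle) time in ns."""
    return 1000.0 / access_time_ns

def ddr_effective_mhz(physical_mhz: float) -> float:
    """GDDR3 transfers data twice per clock, so the effective rate doubles."""
    return 2.0 * physical_mhz

rated = rated_clock_mhz(1.0)          # physical clock the chips are rated for
effective = ddr_effective_mhz(rated)  # effective (DDR) rate
# Margin of the rating over the actual 970 MHz memory clock of both cards.
headroom_percent = (rated / 970.0 - 1.0) * 100.0

print(f"{rated:.0f} ({effective:.0f}) MHz, headroom {headroom_percent:.1f}%")
```

In other words, the memory runs about 3% below its rating on both cards.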

[Photos: comparison with the reference design, front view. Zotac GeForce 8800 GTS 512MB PCI-E and BFG GeForce 8800 GTS OC 512MB PCI-E versus the reference NVIDIA GeForce 8800 GT]

[Photos: comparison with the reference design, back view. Zotac GeForce 8800 GTS 512MB PCI-E and BFG GeForce 8800 GTS OC 512MB PCI-E versus the reference NVIDIA GeForce 8800 GT]

The photos show that the 8800 GT and the new 8800 GTS use the same PCB. The latter is reinforced with additional power elements, which are left unsoldered on the former.

Each card requires a single 6-pin PCI-E power cable from the PSU, so keep this in mind. PSU requirements: 400W or higher, with the 12V rail supplying at least 18-20A.
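To put the 12V requirement into watts, here is a minimal sketch of the arithmetic (power = voltage x current). The function name is ours, and the result is only a rough estimate of what the rail must deliver, not a measurement of the card's consumption.

```python
def rail_power_watts(volts: float, amps: float) -> float:
    """Power available on a PSU rail: P = U * I."""
    return volts * amps

low = rail_power_watts(12, 18)   # lower bound of the recommendation
high = rail_power_watts(12, 20)  # upper bound of the recommendation

# Share of a 400W PSU that has to be available on the 12V rail.
share_low = low / 400 * 100
share_high = high / 400 * 100

print(f"12V rail: {low:.0f}-{high:.0f} W ({share_low:.0f}-{share_high:.0f}% of 400 W)")
```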

The cards have TV-Out with a unique connector. You will need a special adapter (usually shipped with a card) to output video to a TV-set via S-Video or RCA. You can read about TV Out here.

The cards are equipped with a couple of DVIs. Dual link DVI allows resolutions above 1600x1200 via the digital interface. Analog monitors with d-Sub (VGA) interface are connected with special DVI-to-d-Sub adapters. Maximum resolutions and frequencies:

  • 240 Hz Max Refresh Rate
  • 2048 x 1536 x 32bit @ 85Hz Max - analog interface
  • 2560 x 1600 @ 60Hz Max - digital interface

As for MPEG2 playback (DVD-Video), we analyzed this issue in 2002, and little has changed since then. CPU load during video playback on modern graphics cards does not exceed 25%.

As for HDTV and other trendy video features, you can read one of our reviews here.

Now about the cooling systems. Let's examine a reference cooler on the card from Zotac.

Zotac GeForce 8800 GTS 512MB PCI-E

NVIDIA made a mistake when it decided not to use two-slot coolers in the 8800 GT, although they were used even in relatively cheap 8800 GTS 320MB cards. The slim single-slot cooler is not efficient. Moreover, it does not throw hot air out of a system unit, and it's a bit noisy.

So engineers returned to the old device in the 8800 GTS 512MB - we can see a big two-slot device that cools not only the core, but also memory chips and some elements in the power circuit. I don't understand why they designed a new cooler instead of using the old one from the 8800 GTS (those devices were efficient and quiet), changing only the contact plate for the rearranged memory chips.

The new cooler is more angular, and its fan is tilted relative to the PCB surface. In other respects, the coolers are identical. Air is pumped through the massive heat sink soldered to the base and connected to it with heat pipes. The hot air is then thrown out of a system unit.

The fan is slow, below 1000 rpm, so there is no noise at all.

As the new version of RivaTuner (written by A.Nikolaychuk AKA Unwinder) already supports G92, let's have a look at monitoring results.

Zotac GeForce 8800 GTS 512MB PCI-E

BFG GeForce 8800 GTS OC 512MB PCI-E

The core temperature exceeds 80°C, but the fan speed does not grow (NVIDIA engineers were probably reluctant to speed up the fan, lest it become noisy). I guess you can use RivaTuner to speed up the fan a little without breaking the silence, and the core temperature will drop. However, even 85°C is not a problem when the hot air is thrown out of the PC case.

Now let's have a look at the processors.

GeForce 8800 GTS 512 (G92)

It's a medium-sized die (we've seen larger), even though it contains a great many transistors.


BFG GeForce 8800 GTS OC 512MB PCI-E
The bundle includes a user's manual, a CD with drivers, two DVI-to-d-Sub adapters, an S-Video-to-RCA adapter, TV cord extensions, a component output adapter, and a power cable.

Zotac GeForce 8800 GTS 512MB PCI-E
It's a similar bundle without the fliers and registration cards that abound in the BFG box. There is only one DVI-to-VGA adapter (but in my opinion, few users will use these new cards with two analog CRT monitors).


BFG GeForce 8800 GTS OC 512MB PCI-E
I already mentioned that BFG was looking for ways to distinguish itself. They would remove and return the "clasped head" from time to time. They seem to have run out of design ideas. I don't like this design: inscriptions are printed so haphazardly that you want to set them straight. On the other hand, retailers will like the box dimensions: it is neither big nor tiny.

The manufacturer uses a cute way to package all cables: they are wrapped in a package and put next to the card. The card itself is placed in a rigid compartment in two packets, so it's protected from damage in transit.

The general tendency to decrease the packaging quality and bundles is apparent: it all started with stylish boxes with T-shirts and other presents, and now it ends with simple black boxes.

Zotac GeForce 8800 GTS 512MB PCI-E

I mentioned in the 8800 GT review that ZOTAC changed its package design. It's bright and stylish now. However, the previous box had a window to show the card, and this one doesn't. Besides, the attractive dragon theme has also been abandoned for some reason.

The entire bundle is arranged into rigid plastic compartments, so the card won't be damaged in transit.

Installation and Drivers

Testbed configuration:

  • Intel Core2 Duo (775 Socket) based computer
    • CPU: Intel Core2 Duo Extreme X6800 (2930 MHz) (L2=4096K)
    • Motherboard: EVGA nForce 680i SLI on NVIDIA nForce 680i
    • RAM: 2 GB DDR2 SDRAM Corsair 1142MHz (CAS (tCL)=5; RAS to CAS delay (tRCD)=5; Row Precharge (tRP)=5; tRAS=15)
    • HDD: WD Caviar SE WD1600JD 160GB SATA
  • Operating system: Windows XP SP2; DirectX 9.0c
  • Operating system: Windows Vista Ultimate; DirectX 10.0
  • Monitor: Dell 3007WFP (30").
  • Drivers: ATI CATALYST 7.10; NVIDIA Drivers 169.06.

VSync is disabled.

Synthetic tests

Our synthetic benchmarks can be downloaded here:

  • D3D RightMark Beta 4 (1050) with its description on http://3d.rightmark.org
  • D3D RightMark Pixel Shading 2 and D3D RightMark Pixel Shading 3 - tests of Pixel Shaders 2.0 and 3.0
  • RightMark3D 2.0 with a brief description

RightMark3D 2.0 requires MS Visual Studio 2005 runtime and the latest DirectX runtime update.

Synthetic tests were run with the following graphics cards:

  • NVIDIA GeForce 8800 GTS 512MB with the standard parameters (GF8800GTS 512)
  • NVIDIA GeForce 8800 GT with standard parameters (GF8800GT)
  • NVIDIA GeForce 8800 GTX with standard parameters (GF8800GTX)
  • NVIDIA GeForce 8800 GTS 640MB with standard parameters (GF8800GTS 640)
  • NVIDIA GeForce 8800 GTS 320MB with standard parameters (GF8800GTS 320)
  • RADEON HD 3870 with standard parameters (HD3870)

We selected them to compare with the GeForce 8800 GTS 512MB for the following reasons: the GeForce 8800 GT interests us as a graphics card from the other price segment based on the same GPU; old models of the GeForce 8800 GTS help us evaluate the effect of architectural changes (the number of execution units, modified TMUs) and overclocking; and the RADEON HD 3870 is the fastest product from AMD, although from a different price segment. Well, and the GeForce 8800 GTX is interesting as one of the fastest G80-based cards.

Direct3D 9: Pixel Filling tests

This test determines peak texel rate in FFP mode for different numbers of textures applied to a pixel:

Only some graphics cards demonstrate results close to the theoretical maximum; synthetic results are often lower than theoretical values. The closest results are provided by the G80-based GeForce 8800 cards. Besides, the AMD solution is closer to them than the NVIDIA cards with improved TMUs, which are far from the theoretical maximum in our old test. The performance difference between the GeForce 8800 GT and the new GeForce 8800 GTS 512MB is similar to the difference in GPU clock rates. Judging by the results, the G92 fetches over 30 texels per cycle for 32-bit textures (bilinear filtering), although it should have been twice as fast here.
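The "texels per cycle" figure is derived by dividing the measured texel rate by the core clock. Below is a minimal sketch of that arithmetic for the GeForce 8800 GTS 512MB, using 64 texture units and the 650 MHz core clock from the specifications above. It assumes one bilinear texel per unit per clock, and the "measured" value is hypothetical, chosen only to illustrate the calculation (our charts are not reproduced here).

```python
def peak_texel_rate(tmus: int, core_mhz: float) -> float:
    """Theoretical texel rate in Mtexels/s, assuming 1 texel per TMU per clock."""
    return tmus * core_mhz

def texels_per_clock(measured_mtexels: float, core_mhz: float) -> float:
    """Average number of texels fetched per core clock."""
    return measured_mtexels / core_mhz

peak = peak_texel_rate(64, 650)                # theoretical maximum
measured = 19500.0                             # HYPOTHETICAL result, Mtexels/s
per_clock = texels_per_clock(measured, 650)    # texels per cycle
efficiency = measured / peak * 100             # percent of the theoretical peak

print(f"peak {peak:.0f} Mtexels/s, {per_clock:.0f} texels/clock, {efficiency:.0f}% of peak")
```

With such a result the card would reach only about half of its theoretical peak, which is the situation described above.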

With few textures per pixel, the GeForce 8800 GTS 512MB looks worse than the old GeForce 8800 cards. It has insufficient video memory bandwidth here, lower than that of the GTX card. But why is the new card outperformed by the old GTS in the test with a single texture? These solutions have the same memory bandwidth. Perhaps it's the effect of the number of ROPs (the old card has more of these units), or perhaps the point is in optimizations for many textures. The new card becomes faster than all its competitors in heavier conditions, including the GeForce 8800 GTX. Let's have a look at results in the fill rate test:

The second synthetic test measures the fill rate. It shows the same situation adjusted for the number of pixels written into the frame buffer. In case of 0 and 1 texture, the new solution from NVIDIA is outperformed by the older card, which can be explained with lower memory bandwidth and fewer ROPs. The new card shoots forward again when the number of textures per pixel grows. The GeForce 8800 GTS 512MB is up to twice as fast as the only card from AMD, when its results are not limited by memory bandwidth.
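The two limits mentioned here, memory bandwidth and ROP throughput, follow directly from the specifications above (256-bit bus, 1940 MHz effective memory clock, 16 ROPs, 650 MHz core). This is a minimal sketch with helper names of our own; it assumes one pixel per ROP per clock and counts 1 GB as 10^9 bytes.

```python
def memory_bandwidth_gbs(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9

def peak_pixel_fill(rops: int, core_mhz: float) -> float:
    """Theoretical fill rate in Mpixels/s, assuming 1 pixel per ROP per clock."""
    return rops * core_mhz

bandwidth = memory_bandwidth_gbs(256, 1940)  # GeForce 8800 GTS 512MB
fill = peak_pixel_fill(16, 650)              # GeForce 8800 GTS 512MB

print(f"bandwidth {bandwidth:.2f} GB/s, fill rate {fill:.0f} Mpixels/s")
```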

Direct3D 9: Geometry Processing Speed Tests

Let's analyze extreme geometry tests. The first test uses the simplest vertex shader that shows maximum triangle throughput:

We can see that all these GPUs are based on unified architectures, all their unified processors are busy with geometry in this test. All solutions demonstrate high results. They are evidently limited not by peak performance of unified processors, but by other units, e.g. triangle setup.

Our GPUs execute the test in various modes with similar efficiency: peak performance in FFP, VS 1.1, and VS 2.0 modes differs little, although FFP is a tad faster on NVIDIA cards. We cannot say anything definite about these test results, only that AMD GPUs traditionally process geometry faster. Let's see what has changed in a more complex test with a single diffuse light source:

Almost the same situation. These solutions apparently have a higher potential. This time the FFP mode is even faster on all cards from NVIDIA. However, all GeForces are still outperformed by the RADEON card, but not much. Let's see what will happen in heavier conditions - complex lighting with a single light source and glares:

The difference between AMD and NVIDIA has grown a little. The RV670 remains the leader in geometry performance, and all cards from NVIDIA rank in a straight line, the new GeForce 8800 GTS 512MB being a little faster than the others. Optimized FFP emulation in G8x/G9x becomes more apparent with a mixed light source. Let's analyze the most complex geometry task with three light sources, including static and dynamic branches:

OK, now we can see the difference between all participants of the tests. The RADEON HD 3870 is still in the lead, its potential is not fully revealed even in our most complex geometry task, its results almost match those demonstrated above. We note traditional opposite weaknesses of vertex units in AMD and NVIDIA architectures - dynamic branches cause a deeper performance drop in the former, while static branches do it with the latter.

The GeForce 8800 GTS 512MB is faster than the other GeForce 8800 cards in all tests owing to a higher clock rate of the G92. On the whole, all GPUs perform well in these tests owing to advantages of the unified architecture. They can use all their unified stream processors to solve geometry tasks. But unified shader processors will be busy mostly with pixels in real applications. So we proceed to such tests now.

Direct3D 9: Pixel Shaders Tests

The first group of pixel shaders to be reviewed here is too simple for modern GPUs. It includes various versions of pixel programs of relatively low complexity: 1.1, 1.4, and 2.0.

We can see that these tests are too easy for modern architectures and fail to reveal their true capacity. In simple tests, performance is limited by the texel rate, we can see it well in low results of the RADEON HD 3870. Results become more interesting in more complex PS 2.0 tests. The card from AMD even shoots forward in the most complex test.

The GeForce 8800 GTS 512MB is always faster than other GeForce 8800 cards. It's especially noticeable, when we compare this card with the old GTS product, the difference is quite big in the most complex tests. The new GTS card even outperforms the GTX model in full compliance with theory - all its characteristics are better, except for the memory bandwidth and fill rate. They do not affect results very much in these simple tests. Let's have a look at test results of more complex pixel programs of intermediate versions:

The water test depends on the texel rate: it uses dependent texture lookups of high nesting depth, so the RADEON lags far behind all NVIDIA solutions. Our card under review is almost three times as fast. The new GTS card with 512 MB of memory is always faster than the other GeForce 8800 cards. For example, it outperforms the old GTS 640MB by more than 1.5 times. See what happens next...

The AMD card shoots forward in the second more arithmetic-intensive test. This task fits its architecture with more unified processors. But look, the difference between this card and the new product based on the G92 is already small, just 10%. As NVIDIA upgraded from G80 to G92, it fixed some architectural problems. So the fastest of its graphics cards performs almost on a par with the best RADEON card. They fared much worse not long ago.

Direct3D 9: New Pixel Shaders Tests

These tests of DirectX 9 pixel shaders are even more complex, they are divided into two categories. We'll start with easier shaders - SM 2.0:

  • Parallax Mapping - a texturing method used in many games, which is described in detail in our article Modern 3D Graphics Terms
  • Frozen Glass - a complex procedural texture that visualizes frozen glass with adjustable parameters

There are two modifications of these shaders: arithmetic intensive and texture sampling intensive. Let's analyze arithmetic-intensive modifications, they are more promising from the point of view of future applications:

Situation with the NVIDIA cards in the Frozen Glass test is similar to that in the previous group of tests. The old GeForce 8800 GTS is still much slower than the new card, which always outperforms even the 8800 GTX. NVIDIA cards based on the G80 and G92 outperform the HD 3870 in this test, which proves that performance is limited by texel rate in the first place.

AMD solutions traditionally lead in the second test of Parallax Mapping. Only the recently reviewed GeForce 8800 GT is dangerously close to them, outperforming the GeForce 8800 GTX. Improved TMUs play their role in this test, because parallax mapping requires an additional texture lookup. As the GeForce 8800 GTS 512MB operates at a higher clock rate, and it has more unlocked TMUs and ALUs, this card becomes a leader in this test, outperforming the HD 3870! Let's analyze results obtained in the texture sampling intensive tests, where the GeForce 8800 GTS 512MB may perform even better:

The situation is a tad different here, because performance in these tests is limited by texturing speed. That's why the new GeForce 8800 GTS outperforms the old product in one of the tests by more than twofold, and the GeForce 8800 GTX by 1.5 times! The RADEON HD 3870 is outperformed in both tests by almost all GeForce 8800 cards, except for the old GTS card in Parallax Mapping. Arithmetic-intensive shaders work faster on all graphics cards, so it does not make sense to use texturing-intensive shaders with modern GPUs.

Let's have a look at results of another two pixel shader tests - SM 3.0. They are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches:

  • Steep Parallax Mapping is a much heavier modification of parallax mapping, which is also described in the article Modern 3D Graphics Terms
  • Fur - a procedural shader that visualizes fur

The GPU load in these two tests is too great even for such powerful chips as the RV670 and G92. Even though the AMD card efficiently executes complex pixel shaders (SM 3.0) with a lot of branches, the HD 3870 is twice as slow as the new GeForce 8800 GTS 512MB. The G92-based cards again demonstrate noticeably better results in these tests than even the GeForce 8800 GTX. The advantage of the GeForce 8800 GTS 512MB is very high, over 1.5 times. It can be explained with the increased frequency as well as faster bilinear texture fetches. When you analyze results of such synthetic tests, you should take into account that the situation will be different in real applications, because they often use trilinear and/or anisotropic filtering of textures. Besides, performance is often limited by the fill rate and memory bandwidth, so the GeForce 8800 GTX may get an advantage here.

Direct3D 10: PS 4.0 Tests (texturing, loops)

New RightMark3D 2.0 includes two old Direct3D 9 PS 3.0 tests, rewritten for DirectX 10, and two brand new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase their load on GPUs.

These tests measure efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundreds of lookups per pixel in the heaviest mode!) and a relatively low ALU load. In other words, they measure a texture sampling rate and branching efficiency in a pixel shader.

The first pixel shader test will be the Fur test. When used with the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. When shader supersampling is enabled - the number of lookups grows to 60-120. And the High mode with SSAA is the heaviest mode - 160-320 lookups from a bump map.
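The lookup counts above are consistent with supersampling simply quadrupling the number of bump-map lookups. A minimal sketch that reproduces the four modes from the two base ranges (the names are ours and have nothing to do with the RightMark3D source code):

```python
# Bump-map lookups per pixel without supersampling, as (min, max).
FUR_BUMP_LOOKUPS = {"low": (15, 30), "high": (40, 80)}
SUPERSAMPLING_FACTOR = 4  # shader supersampling quadruples the load

def fur_lookups(detail: str, supersampling: bool = False) -> tuple:
    """Return the (min, max) number of bump-map lookups for a test mode."""
    lo, hi = FUR_BUMP_LOOKUPS[detail]
    factor = SUPERSAMPLING_FACTOR if supersampling else 1
    return lo * factor, hi * factor

for detail in ("low", "high"):
    for ss in (False, True):
        print(detail, "SSAA" if ss else "no SSAA", fur_lookups(detail, ss))
```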

Let's see what happens in modes without supersampling - they are relatively simple, and the correlation of results in Low/High modes must be similar.

The Fur tests with lots of texture lookups show a huge advantage of NVIDIA solutions over AMD cards. There is no point in comparing them: a lag this large is impossible even in theory. Perhaps AMD hasn't fixed bugs in its Direct3D 10 drivers yet.

All results of NVIDIA solutions in the High mode are approximately 1.5 times as low as in the Low mode. A comparison of results demonstrated by GeForce 8800 GTS cards, the new and old models, shows a big advantage of the 512 MB modification. Judging by the results demonstrated by the GeForce 8800 GTS 512MB, GT, and GTX cards, performance in this test depends not only on the number and speed of TMUs, or the disparity would have been different. To all appearances, rendering speed is limited by the fill rate (ROPs) and memory bandwidth. Let's have a look at the results in this test with enabled shader supersampling, which quadruples the load. Perhaps it will change the situation:

Only top GPUs from NVIDIA can cope with this load; the AMD card lags behind. Supersampling quadruples the load, but the performance drop on G8x-based cards is bigger than on RV670-based solutions, so the HD 3870 comes close to the old GTS cards. As the shader grows more complex and increases the GPU load, the gap between the GeForce 8800 GTX and the new GTS card narrows, because the effect of the fill rate and memory bandwidth on overall performance decreases.

The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:

This test is more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavy modifications, such as our steep parallax mapping, are used in the latest games, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing that doubles the GPU load (High mode).
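The load factors of the Steep Parallax Mapping modes combine multiplicatively: self-shadowing doubles the bump-map lookups, and supersampling quadruples them. A minimal sketch of this arithmetic, starting from the 10-50 lookups of the low mode (the function is hypothetical and only illustrates the numbers quoted above):

```python
BASE_BUMP_LOOKUPS = (10, 50)  # low mode: (min, max) bump-map lookups per pixel

def steep_parallax_lookups(self_shadowing: bool = False,
                           supersampling: bool = False) -> tuple:
    """Return the (min, max) number of bump-map lookups for a test mode."""
    factor = (2 if self_shadowing else 1) * (4 if supersampling else 1)
    lo, hi = BASE_BUMP_LOOKUPS
    return lo * factor, hi * factor

heaviest = steep_parallax_lookups(self_shadowing=True, supersampling=True)
print(heaviest)  # eight times the low mode
```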

Although AMD solutions used to be strong in our Direct3D 9 tests of parallax mapping, the RADEON HD 3870 is not very fast in our updated D3D10 test without supersampling. Self-shadowing causes a deeper performance drop on this card than on NVIDIA solutions. The GeForce 8800 GTS 512MB outperforms all other cards in the High mode. And it keeps almost on a par with the GTX card in the Low mode. Let's see what supersampling will change. Performance drop from supersampling was bigger in NVIDIA cards in the previous test, so it brought the G80/G92-based cards closer to each other.

FPS values obtained with enabled supersampling and self-shadowing again indicate a very heavy GPU load. These two options enabled together increase the load by almost eight times, causing a very big performance drop. The performance difference between our graphics cards remains. However, when supersampling is enabled, the AMD card improves its results relative to NVIDIA, just like in the previous case. NVIDIA cards drop performance by four times, while the performance drop on the HD 3870 reaches threefold. But it still lags behind the other cards.

Both "old" modifications of the GeForce 8800 GTS demonstrate identical results, being 1.5 times as slow as the new card. As for the comparison of the GeForce 8800 GTS 512MB and the GTX, the overhauled GTS card is a tad faster than the GTX product this time. As the ALU load grows, the situation changes in favor of the G92. We can see that the modified TMUs of the G92, with more address units, actually have no advantages over the G80 in real conditions.

Direct3D 10: PS 4.0 Tests (computing)

The next couple of pixel shader tests contains a minimum number of texture lookups to reduce the effect of TMUs on performance. They use a lot of arithmetic operations, so they measure arithmetic performance of GPUs, how fast they execute arithmetic instructions in pixel shaders.

The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.

In our synthetic Direct3D 9 tests we noted that the AMD architecture often performs better in arithmetic-intensive tasks than the NVIDIA architecture. But the RADEON HD 3870 is slower here than the best card on the G80 and both cards on the G92. The GeForce 8800 GTS 512MB is much faster than the old GeForce 8800 GTS cards. The overhauled card on the new GPU from NVIDIA outperforms absolutely all cards, while the GT card is just a little slower than the GeForce 8800 GTX. It all agrees with the performance (the number and clock rate) of unified processors.

The second shader test is called Fire, it's even harder for ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:

The RADEON HD 3870 traditionally fails this test; the bug in AMD drivers hasn't been fixed. On the other hand, judging by how long it has remained unfixed, this bug may actually be a hardware problem. As for the comparison of NVIDIA cards, the situation hasn't changed much. The GeForce 8800 GTS 512MB is more than 1.5 times as fast as the older cards of the same name with different memory capacities. The GTX card is now 14% slower. There is a similar difference in ALU power, frequencies, and the number of these units between these cards.
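A rough way to check these ratios is to multiply the number of unified processors by the shader clock. The sketch below does that. Only the GeForce 8800 GTS 512MB figures come from this review; the values for the other cards are the commonly published reference specifications and should be treated as assumptions here.

```python
# (unified processors, shader clock in MHz)
CARDS = {
    "GF8800GTS 512": (128, 1620),  # from the specifications in this review
    "GF8800GTX": (128, 1350),      # assumption: published reference specs
    "GF8800GT": (112, 1500),       # assumption: published reference specs
    "GF8800GTS 640": (96, 1200),   # assumption: published reference specs
}

def alu_throughput(units: int, shader_mhz: float) -> float:
    """Relative ALU throughput: processors times shader clock."""
    return units * shader_mhz

def relative_to(card: str, baseline: str) -> float:
    """Theoretical ALU throughput of `card` relative to `baseline`."""
    return alu_throughput(*CARDS[card]) / alu_throughput(*CARDS[baseline])

for name in CARDS:
    print(f"{name}: {relative_to(name, 'GF8800GTS 512'):.2f}")
```

By this estimate the GTX should trail the new GTS by about 17%, and the old GTS cards by almost twofold, which is reasonably close to the measured differences.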

Direct3D 10: Geometry Shader Tests

RightMark3D 2.0 includes two geometry shader tests. The first one is called Galaxy, it's similar to point sprites from previous Direct3D versions. It animates a system of particles using a GPU, a geometry shader creates four vertices from each particle. Similar algorithms should be used in future DirectX 10 games.

A change of balance in geometry tests does not affect rendering results, the image is always identical, only scene processing methods differ. GS load value determines what shader will be busy - vertex or geometry. The amount of work is always the same.
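The core operation of the Galaxy test, turning each particle into four vertices, can be illustrated on the CPU. The sketch below is not the RightMark3D shader; it assumes particles are already in view space and builds a screen-aligned quad in triangle-strip order, and all names are ours.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def expand_particle(center: Vec3, half_size: float) -> List[Vec3]:
    """Emit four corner vertices of a screen-aligned quad around a particle."""
    cx, cy, cz = center
    # Corner offsets in triangle-strip order.
    offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return [(cx + dx * half_size, cy + dy * half_size, cz) for dx, dy in offsets]

def expand_system(particles: List[Vec3], half_size: float) -> List[Vec3]:
    """Expand every particle, as a geometry shader would do per input point."""
    vertices: List[Vec3] = []
    for particle in particles:
        vertices.extend(expand_particle(particle, half_size))
    return vertices

quad = expand_particle((0.0, 0.0, 5.0), 0.5)
print(quad)
```

The output always contains four times as many vertices as there are particles, whichever shader stage does the work.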

Let's analyze the first modification of Galaxy with vertex computing for three levels of geometric complexity:

The correlation of results with different complexity levels of the scene is almost the same, only absolute values are different. Performance demonstrated corresponds to the number of points, FPS is halved each step. The new GeForce 8800 GTS demonstrates the highest results, being slightly faster than the GeForce 8800 GTX. However, the difference between the GT, GTX, and GTS 512MB cards is small.

This task is not very heavy for modern graphics cards. Even the GeForce 8600 GTS demonstrates a high result in this test, which indicates that shader ALUs do not limit performance. Perhaps the situation will change, when some work is moved to a geometry shader.

But no, there are actually no changes. All graphics cards demonstrate almost the same results with modified GS load, which is responsible for offloading some work to the geometry shader. The GeForce 8800 GTS 512MB is still in the lead, outperforming both RADEON HD 3870 and GeForce 8800 GTX. There is a little difference between results of graphics cards with different numbers of execution units and frequencies. Perhaps, it will change in the second test.

Hyperlight is the second geometry test that uses several techniques: instancing, stream output, buffer load. It employs dynamic generation of geometry by rendering into two buffers, as well as a new Direct3D 10 feature - stream output. The first shader generates ray directions, their speed and growth vectors. These data are stored in a buffer, which is used by the second shader for rendering. Each ray point is used to generate 14 vertices in a circle, up to a million output points.
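The last step, generating 14 vertices in a circle around each ray point, can be sketched as follows. This is a CPU illustration under our own assumptions (the circle lies in the XY plane and the radius is arbitrary), not the actual shader from the test.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
VERTICES_PER_POINT = 14  # as stated in the test description

def ring_vertices(center: Vec3, radius: float,
                  count: int = VERTICES_PER_POINT) -> List[Vec3]:
    """Place `count` vertices evenly on a circle around a ray point."""
    cx, cy, cz = center
    ring = []
    for i in range(count):
        angle = 2.0 * math.pi * i / count
        ring.append((cx + radius * math.cos(angle),
                     cy + radius * math.sin(angle),
                     cz))
    return ring

ring = ring_vertices((1.0, 2.0, 3.0), 0.5)
print(len(ring), ring[0])
```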

The new type of shader programs is used to generate rays. If "GS load" is set to "Heavy", it's also used for rendering. That is, in the Balanced mode geometry shaders are used only to generate and grow rays, and output is handled by instancing. In the Heavy mode, the geometry shader also outputs data. Let's analyze the easy mode first:

The difference between results of NVIDIA graphics cards remains small at any geometry complexity. The AMD card is more than twice as slow as the GeForce 8800 GTS 512MB. Performance scales well, and the difference between the modes is close to the theoretical one: each level of Polygon count is twice as slow as the previous one.

In general, the situation is similar to the previous test - the GeForce 8800 GTS 512MB outperforms all its competitors, and the GT card works on a par with the GTX. Results may change in the next test that uses geometry shaders more actively. It will be also interesting to compare results obtained in the Balanced and Heavy modes.

The correlation of performance results has changed much this time. The AMD GPU executes complex geometry shaders more efficiently than NVIDIA GPUs. But only if we compare with all old models of the latter. NVIDIA GPUs demonstrate results strictly according to the number of their unified shader processors and their operating frequencies. The GT card is slightly slower than the GTX. And the GeForce 8800 GTS 512MB performs approximately on a par with the RADEON HD 3870! We again see how NVIDIA catches up with AMD in previously unfavorable tests. The company gradually eliminates the problems in its GPUs. The higher the complexity of a test, the faster the GTS 512MB relative to the other cards.

If we compare results in various modes, all GeForces perform better in Balanced mode than the RADEON HD 3870 in Heavy mode. There is no visual difference between the images obtained in various modes. The AMD solution is outperformed, even though NVIDIA cards lose much performance as they switch from instancing to a geometry shader, while AMD cards profit from it.

Here is our main conclusion on geometry shaders - even though different geometry tests may yield different results, the GeForce 8800 GTS 512MB always demonstrates very high results and always outperforms its competitors. NVIDIA now wins even as geometry complexity grows, although AMD GPUs were traditionally in the lead here.

Direct3D 10: Vertex texture fetch rate

Vertex Texture Fetch tests measure the speed of many vertex texture fetches. These tests are essentially similar, and the correlation of their results in Earth and Waves tests must also be similar. Both tests use displacement mapping based on texture fetches. The only major difference is that the Waves test uses conditional branches, while the Earth test does not.
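Displacement mapping based on vertex texture fetches boils down to sampling a height texture for each vertex and moving the vertex along its normal. Here is a minimal CPU sketch with a bilinear fetch; the texture layout (a list of rows), the clamping, and all names are our assumptions for illustration, not the code of the Earth or Waves tests.

```python
import math
from typing import List, Sequence, Tuple

def bilinear_sample(tex: List[List[float]], u: float, v: float) -> float:
    """Bilinear fetch from a 2D height texture with clamped [0, 1] coordinates."""
    height, width = len(tex), len(tex[0])
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def displace(position: Sequence[float], normal: Sequence[float],
             uv: Tuple[float, float], heightmap: List[List[float]],
             scale: float) -> Tuple[float, ...]:
    """Move a vertex along its normal by the sampled height."""
    h = bilinear_sample(heightmap, uv[0], uv[1])
    return tuple(p + n * h * scale for p, n in zip(position, normal))

heightmap = [[0.0, 1.0], [2.0, 3.0]]
moved = displace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5), heightmap, 2.0)
print(moved)
```

Each displaced vertex costs at least one such fetch (four texel reads with bilinear filtering), which is why these tests are sensitive to both texture units and memory bandwidth.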

Let's analyze the first test (Earth) in Effect detail Low mode:

Results in all modes show similar performance of graphics cards relative to each other. Judging by our previous reviews, results of this test are affected by memory bandwidth. It's especially noticeable in the comparison of the GTX and GTS 512MB cards, because the performance difference cannot be explained solely with a different number of texture units, especially as the G92-based card outperforms the G80 card as the load grows.

The HD 3870 is slightly outperformed, and it looks good in the High mode. Let's have a look at the results of the same test with more texture lookups:

The situation is almost the same. The GeForce 8800 GTX is in the lead in the Low mode, the GTS 512MB with better TMUs but lower memory bandwidth comes forward as the test gets more complex. Then goes the GeForce 8800 GT (because of the difference in the number of TMUs and memory bandwidth), the RADEON HD 3870, and both old GeForce 8800 GTS cards.

Let's have a look at results of the second vertex texture fetch test. The Waves test executes fewer texture lookups, but it uses conditional branches. The number of bilinear texture lookups in this case reaches 14 (Effect detail Low) or 24 (Effect detail High) per each vertex. Geometry complexity changes just like in the previous test.

The situation in the Waves test resembles that in the Earth test. But the difference between graphics cards based on the G92 and the G80 has grown wider - perhaps memory bandwidth is even more important here. The GeForce 8800 GTS 512MB is always slower than the GTX here. It's even outperformed by the old GTS in the Low mode. But as geometry complexity grows, this card regains its position and trails the GTX card in the most complex test by just 5%. Nevertheless, the new GTS card is always faster than the AMD card. Let's analyze the second mode:

Results are also similar to the previous case. The only changes are that the difference between the GTS 512MB and the GTX has become smaller, and that the RADEON HD 3870 is outperformed by all NVIDIA cards, including the old GTS products. On the whole, the G92 copes very well with vertex texture fetch tests. When geometry complexity is low, its performance is limited by the lower bandwidth of local video memory. When the amount of geometry data grows, its performance comes close to that of the GeForce 8800 GTX, and in some cases the GeForce 8800 GTS 512MB is even faster.
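The bandwidth limit mentioned above can be put into numbers with a simple calculation. The sketch below derives peak memory bandwidth from the bus width and effective memory frequency. The GeForce 8800 GTS 512MB figures (256-bit bus, 1940 MHz effective) come from the specifications at the start of this part. The GeForce 8800 GTX figures (384-bit bus, 1800 MHz effective) are not listed in this part and are an assumption based on that card's reference specifications.

```python
def memory_bandwidth_gb_s(bus_width_bits, effective_mhz):
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9

# GeForce 8800 GTS 512MB: 256-bit bus, 1940 MHz effective (from this review).
gts_512 = memory_bandwidth_gb_s(256, 1940)

# GeForce 8800 GTX: 384-bit bus, 1800 MHz effective
# (assumption: reference specifications, not listed in this part).
gtx = memory_bandwidth_gb_s(384, 1800)

print(f"GTS 512MB: {gts_512:.1f} GB/s")
print(f"GTX:       {gtx:.1f} GB/s")
print(f"GTS 512MB has {gts_512 / gtx:.0%} of the GTX bandwidth")
```

Under these assumptions the new card has roughly 62 GB/s against roughly 86 GB/s for the GTX, which is consistent with it falling behind in the least complex modes.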

Conclusions on the synthetic tests

  • We'll repeat the main conclusion of the previous part: the G92 architecture has changed little from the G8x. It's notable for high arithmetic performance and designed for modern and future applications with many complex shaders of all types. High efficiency of unified processors, many TMUs and ROPs, as well as high operating frequencies allow this GPU to demonstrate excellent results in all synthetic tests. Improved texturing units are of assistance here as well. In synthetic tests the GeForce 8800 GTS 512MB performs on a par with the more expensive GeForce 8800 GTX or better, especially when the load grows and memory bandwidth does not limit its performance.
  • The architecture was improved with modified TMUs. Together with the increased clock rate, this gives the GeForce 8800 GTS 512MB a higher theoretical texel rate in certain conditions and helps it outperform the GeForce 8800 GTX in many cases. The weak spots of the GeForce 8800 GTS 512MB versus the GeForce 8800 GTX are a narrower memory bus, and thus lower bandwidth, and fewer ROPs. These are exactly what it lacks in some tests to demonstrate results higher than those of the GTX card. However, the card we examine today is cheaper, and its performance is sufficient to compete with AMD graphics cards and with more expensive cards from NVIDIA.

Synthetic tests prove that the GeForce 8800 GTS 512MB is a very powerful graphics card. It can compete well even with more expensive graphics cards from NVIDIA and AMD. The next part of this review will be devoted to tests of the new NVIDIA card in modern games. These tests should prove that our synthetic conclusions are true.

Andrey Vorobiev (anvakams@ixbt.com)
Alexei Berillo (sbe@ixbt.com)
December 20, 2007

Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.