Part 2: Features, Synthetic Tests
We've covered all architectural details in the first part of this review.
Our test lab examined three graphics cards from BFG, Forsa, and ZOTAC. Jumping ahead, we can tell you that all three are practically identical reference cards. They differ only in operating frequencies, and the ZOTAC product has a different cooler.
With the return of the 256-bit bus, we see the old design again, with memory chips arranged in a circle around the core, similar to the 7900 family. Of course, new realities and faster memory made the card longer than the 7900GS, but not longer than the 7900GTX/8800GTS.
The photos show that the board requires one 6-pin PCI-E cable from the PSU, so keep that in mind. PSU requirements: a 400W or higher PSU whose 12V rail supports at least 18-20A.
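To put the 12V requirement into watts, here is a minimal sketch using P = V x I. The function name is ours, and the inputs are just the 18-20A range quoted above:

```python
def rail_power_watts(volts: float, amps: float) -> float:
    """Power available on a single PSU rail: P = V * I."""
    return volts * amps

# The 18-20 A range recommended for the 12 V rail
for amps in (18, 20):
    print(f"12 V at {amps} A -> {rail_power_watts(12, amps):.0f} W")
```

This is the power the 12V rail alone should be able to deliver, which is why the total wattage printed on a PSU label is not the whole story.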
The cards have TV-Out with a unique jack. You will need a special adapter (usually shipped with a card) to output video to a TV-set via S-Video or RCA. You can read about TV-Out in more detail here.
The cards are equipped with a couple of DVIs. Dual link DVI allows resolutions above 1600x1200 via the digital interface. Analog monitors with d-Sub (VGA) interface are connected with special DVI-to-d-Sub adapters. Maximum resolutions and frequencies:
As for the MPEG2 playback features (DVD-Video), we analyzed this issue in 2002. Little has changed since that time. CPU load during video playback on modern graphics cards does not exceed 25%.
As for HDTV and other trendy video features, you can read one of our reviews here.
BFG and Forsa products have identical cooling systems:
Even though the cooler from ZOTAC has the same dimensions, it's more efficient, so it copes with cooling without high rotational speed. It has a bigger fan and its heat sink has more fins. Let's compare them.
First of all, the latest beta of RivaTuner from Alexei Nikolaychuk now supports G92.
GeForce 8800 GT 600/1512/1800 MHz
GeForce 8800 GT 700/1674/2000 MHz
The reference cooler barely copes with its task even at a higher fan speed, while the ZOTAC cooler quietly cools the core and memory operating at much higher frequencies.
Have a look at the processor.
The die is quite big, which suggests that it contains a practically full-fledged G80, cut down only in its bus. The core in the 8800GT is cut down to bring its speed in line with its price.
Our Forsa GeForce 8800 GT (G92) 512MB PCI-E is OEM, so we cannot describe its bundle and package.
Installation and Drivers
VSync is disabled.
Starting from this review, we'll use the new RightMark3D 2.0 for Direct3D 10 applications in MS Windows Vista. Some previously known tests were rewritten for DX10, and new types of synthetic tests were added: modified tests of pixel shaders rewritten for SM 4.0, tests of geometry shaders, and vertex texture fetch tests. However, previous versions of RightMark will also be used until low level fill rate and other tests appear in the new version.
All our synthetic benchmarks can be downloaded here:
Synthetic tests were run with the following graphics cards:
We selected them to compare with the GeForce 8800 GT for the following reasons: the GeForce 8600 GTS is the best previous Mid-End product; the old GeForce 8800 will help us evaluate the effect of architectural changes (a different number of ROPs, modified TMUs), higher frequencies, and lower video memory bandwidth; comparison with the RADEON HD 2900 XT will be interesting, because new Mid-End solutions from AMD are based on the R600.
Direct3D 9: Pixel Filling tests
This test determines peak texel rate in FFP mode for different numbers
of textures applied to a pixel:
Many graphics cards demonstrate results close to the theoretical maximum. Results of synthetic tests are most often a tad lower than the theoretical maximum in modes with many textures. The old GeForce 8800 and the top AMD card (with some reservations) come closer to this threshold than the other cards. The two graphics cards from NVIDIA feature GPUs with improved TMUs, but they fail to reach the theoretical maximum in our old test.
Judging by these results, the GeForce 8800 GT only adds more confusion. The situation with the G92 does not wholly repeat what happened to the G84. Judging by the figures, the new GPU looks up over 30 texels per cycle for 32bit textures with bilinear filtering. Theoretically, it must be higher with bilinear filtering (56) than with trilinear filtering (28). What's especially interesting, the test with trilinear filtering gave the same results.
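The theoretical figures above can be turned into peak texel rates with a rough sketch. The 600 MHz core clock is the reference 8800 GT frequency listed earlier, the per-clock figures are the ones quoted in this section, and the measured value is rounded down to 30 texels per cycle purely for illustration:

```python
CORE_CLOCK_MHZ = 600  # reference 8800 GT core clock listed in this review

def texel_rate_gtexels(texels_per_clock: float, clock_mhz: float) -> float:
    """Peak texel rate in gigatexels per second."""
    return texels_per_clock * clock_mhz / 1000.0

theoretical = {"bilinear": 56, "trilinear": 28}  # texels per clock, from the text
measured_per_clock = 30  # "over 30 texels per cycle", rounded down (illustrative)

for mode, per_clock in theoretical.items():
    peak = texel_rate_gtexels(per_clock, CORE_CLOCK_MHZ)
    share = measured_per_clock / per_clock
    print(f"{mode}: peak {peak:.1f} GTexel/s, measured/theory = {share:.0%}")
```

A ratio above 100% in the trilinear row is the oddity noted above: the measured rate does not drop to the theoretical trilinear figure.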
With few textures per pixel, the GeForce 8800 GT looks worse than the other GeForce 8800 cards because its video memory bandwidth is insufficient (lower than in the GTX and GTS cards). But in heavier conditions, the new graphics card starts outperforming all its competitors, revealing its higher frequency and the architectural changes in its TMUs. Have a look at the fill rate test:
The second synthetic test measures the fill rate. It shows the same situation adjusted for the number of pixels written into the frame buffer. In case of 0, 1, and 2 textures, the new Mid-End solution from NVIDIA is outperformed by the old top cards, coming forward only with many textures per pixel. Compared to the future competitors from AMD, the GeForce 8800 GT has a chance to offer faster texel and fill rates, when they are not limited by video memory bandwidth.
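Since memory bandwidth keeps coming up as the limiting factor, here is a minimal sketch of how peak bandwidth follows from bus width and effective memory clock. The 8800 GT numbers (256-bit bus, 1800 MHz effective memory clock) come from this review; the GTS and GTX entries are assumptions based on their public reference specifications and are not stated in this part:

```python
def memory_bandwidth_gb_s(bus_width_bits: int, effective_clock_mhz: float) -> float:
    """Peak bandwidth = bytes per transfer * transfers per second, in GB/s."""
    return bus_width_bits / 8 * effective_clock_mhz * 1e6 / 1e9

cards = {
    "GeForce 8800 GT": (256, 1800),   # from this review
    "GeForce 8800 GTS": (320, 1600),  # assumption: public reference specs
    "GeForce 8800 GTX": (384, 1800),  # assumption: public reference specs
}

for name, (bus, clock) in cards.items():
    print(f"{name}: {memory_bandwidth_gb_s(bus, clock):.1f} GB/s")
```

Under these assumptions the narrower bus leaves the GT with less bandwidth than either G80 card, which fits its weaker showing with 0-2 textures.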
Direct3D 9: Geometry Processing Speed Tests
Let's analyze extreme geometry tests. The first test uses the simplest
vertex shader that shows maximum triangle throughput:
As all the GPUs are based on unified architectures, all unified processors in this test are busy with geometry processing. So all solutions demonstrate high results, which are evidently not limited by peak performance of unified processors, but by performance of other units, for example, triangle setup.
Execution efficiency of the various GPUs in the various modes is approximately the same, and peak performance in FFP, VS 1.1, and VS 2.0 modes differs little. These results do not show anything for certain. But we can see that the AMD solution is traditionally faster at processing geometry than NVIDIA GPUs. Let's see what will change in a more complex test with a single diffuse light source:
We can see some difference here, although the potential of these solutions is evidently higher. The GeForce 8600 GTS does not lag far behind the more powerful solutions. The FFP mode is a tad faster on all graphics cards this time, except for the G84-based card. GeForce cards are outperformed by the top RADEON product in all modes, although the performance difference is not very big. Let's see what will happen in heavier conditions - complex lighting with a single light source:
It's a similar situation. The leader in geometry performance is still
the R600. So future Mid-End solutions from AMD will be evidently faster
than the G92 at processing geometry. In case of a mixed light source,
the effect of optimized FFP emulation is apparent in most solutions.
This time the GeForce 8600 GTS falls even further behind, and the GeForce 8800 GT is no worse than its brothers. Let's analyze the most complex geometry task with three light sources, including static and dynamic branches:
We can see differences between all contenders. The RADEON HD 2900 XT is even at a greater advantage here. This most complex geometry task seems not to reveal its full potential. We traditionally note opposite weaknesses of vertex units in AMD and NVIDIA architectures - dynamic branches cause a deeper performance drop in the former, while static branches do it with the latter.
The GeForce 8800 GT will be analyzed separately. Higher frequency of the G92 versus both G80 units makes itself felt in FFP mode. So this GPU is faster, because its triangle setup units work faster. Perhaps, other reasons for this behavior may include new architectural optimizations, larger caches, etc. In all other cases, when the main bottleneck is in shader units, graphics cards perform strictly in compliance with their theoretical maximum, and the G92 is slightly outperformed by the top G80.
A brief conclusion on geometry tests: all GPUs perform well in these tests owing to their unified architecture, as they can use all unified stream processors to solve geometry tasks. As for real applications, unified processors will be busy mostly with pixels there. We proceed to such tests now.
Direct3D 9: Pixel Shaders Tests
The first group of pixel shaders to be reviewed here is too simple
for modern GPUs. It includes various versions of pixel programs of
relatively low complexity: 1.1, 1.4, and 2.0.
We can see that the tests are too easy for modern architectures and fail to reveal their true capacity. Performance in simple tests is limited by texture lookups and fill rate, as the low results of the RADEON HD 2900 XT show. Results get more interesting in the more complex PS 2.0 tests: the GeForce 8800 GT always outperforms the GTS product and is only slightly slower than the top GTX card, in full compliance with the theory.
Parity between the GeForce 8600 GTS and the 8800 GT is out of the question: the previous Mid-End solution is outperformed more than twofold. Its performance is limited by the fill rate and texture lookups in the first place. Let's have a look at results in more complex pixel programs of intermediate complexity:
The water test depends on the texel rate, as it uses dependent texture lookups of high nesting depth, so the RADEON lags far behind the NVIDIA solutions. The GeForce 8600 GTS is again noticeably slower than the GeForce 8800 GT. The AMD card shoots forward in the second, more compute-intensive test. This task fits its architecture with more unified processors. The difference in results demonstrated by the GeForce 8800 GT and the GTS/GTX cards appears owing to the performance differences of shader units and TMUs. These results agree well with the theory.
Direct3D 9: New Pixel Shaders Tests
These tests of DirectX 9 pixel shaders are even more complex, they are divided into two categories. We'll start with easier shaders - SM 2.0:
There are two modifications of these shaders: arithmetic intensive and texture
sampling intensive. Let's analyze arithmetic-intensive modifications,
they are more promising from the point of view of future applications:
The situation with the NVIDIA cards in the Frozen Glass test is similar to that in the previous group of tests. The GeForce 8600 GTS is still outperformed by the 8800 GT more than twofold. The latter keeps very close to the 8800 GTX. NVIDIA cards based on the G80 and G92 outperform the HD 2900 XT, which confirms that performance here is limited by the texel rate.
Although the HD 2900 XT leads in the Parallax Mapping test (the second test),
the GeForce 8800 GT is only a little slower, outperforming the GeForce
8800 GTX! To all appearances, that's the effect of improved TMUs,
as parallax mapping requires an additional texture lookup. Let's analyze
results obtained in the texture sampling intensive tests, where the
GeForce 8800 GT may perform even better:
The situation changes quite radically. Performance is limited by the speed of texture units more than ever, so the GeForce 8800 GT is faster than the GeForce 8800 GTX by almost one third! And the RADEON HD 2900 XT is outperformed by the GeForce 8800 cards in the Parallax Mapping test, where AMD cards have always been very strong. You should be aware that the situation in real applications will be different, because you almost always enable trilinear and/or anisotropic filtering on such powerful graphics cards. So the GeForce 8800 GT will most likely be slower than the GTX card.
As usual, arithmetic-intensive shaders work faster on all graphics cards. Texturing-intensive shaders make no sense for modern GPU architectures, as new products from AMD and NVIDIA prefer arithmetic operations to texturing.
Let's have a look at results of another two pixel shader tests - SM 3.0. They are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches:
The load on graphics cards in these two tests is rather large for such powerful GPUs as R600 and G80, and the G80 outperforms the G84 by more than twofold. Although the R600 apparently executes complex Pixel Shaders 3.0 with a lot of branches more efficiently than the G80, its advantage over the new G92 almost disappears in our synthetic tests. What's interesting, the GeForce 8800 GT again performs noticeably better than the GeForce 8800 GTX in both tests. This acceleration relative to the G80 can be explained only by bilinear texture lookups, because the new GPU does not have another 20-40% theoretical advantage over the G80.
Direct3D 10: PS 4.0 Tests (texturing, loops)
New RightMark3D 2.0 includes two old Direct3D 9 PS 3.0 tests, rewritten for DirectX 10, and two brand new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase their load on GPUs.
These tests measure efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundreds of lookups per pixel in the heaviest mode!) and a relatively low ALU load. In other words, they measure a texture sampling rate and branching efficiency in a pixel shader.
The first pixel shader test will be the Fur test. When used with the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. When shader supersampling is enabled - the number of lookups grows to 60-120. And the High mode with SSAA is the heaviest mode - 160-320 lookups from a bump map.
Let's see what happens in modes without supersampling - they are relatively simple, and the correlation of results in Low/High modes must be similar.
The Fur tests with lots of texture lookups show a huge advantage of NVIDIA solutions over the RADEON HD 2900 XT. The previous Mid-End card demonstrates a higher result than the top solution from AMD. Such a lag should be theoretically impossible. Perhaps AMD still hasn't fine-tuned its Direct3D 10 drivers.
All results in the High mode are approximately 1.5 times lower than in the Low mode. Results of both GeForce 8800 GTS cards indicate that memory size has no influence on the tests. The GeForce 8600 GTS is slower than the G80-based solutions by as much as it should be, as its cut-down execution units seriously affect its performance. Judging by the results demonstrated by the GeForce 8800 GT and the GTX cards, performance in this test depends not only on the number and speed of TMUs, or the disparity would have been different. It's 20-25% in the test. Only the fill rate and memory bandwidth differ by this value.
Let's have a look at the results in this test with enabled shader supersampling, which quadruples the load. Perhaps it will change the situation:
Only top GPUs from NVIDIA can cope with such complexity. The GeForce 8600 GTS still outperforms the best of AMD, being two or three times as slow as the G80-based cards. And the GeForce 8800 GT is still outperformed by the GTX card. However, it closes the gap as the shader gets more complex and the GPU load grows. Supersampling quadruples the load. But it slows down G8x-based cards approximately by five times, and R6xx-based cards - only by 3.5 times. So the HD 2900 XT almost catches up with the GeForce 8600 GTS.
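The way unequal slowdowns close the gap between two cards can be sketched as follows. The slowdown factors (about 5x for G8x and 3.5x for R6xx) come from the text; the starting gap of 1.5 is a made-up number for illustration, not a measured result:

```python
def gap_after(gap_before: float, slowdown_leader: float, slowdown_chaser: float) -> float:
    """Lead of the faster card after both cards slow down by different factors."""
    return gap_before * slowdown_chaser / slowdown_leader

# Hypothetical starting lead of the 8600 GTS over the HD 2900 XT
before = 1.5
after = gap_after(before, slowdown_leader=5.0, slowdown_chaser=3.5)
print(f"lead before: {before:.2f}x, after supersampling: {after:.2f}x")
```

Whatever the real starting lead is, it shrinks by the ratio 3.5 / 5 = 0.7 once supersampling is enabled.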
The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:
This test is more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavy modifications, such as our steep parallax mapping, will soon be used, e.g. in Crysis. Along with supersampling, this test can enable self-shadowing that doubles the GPU load (High mode).
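The lookup counts of the four Steep Parallax Mapping modes follow from two multipliers, which a small sketch makes explicit. The base range and the factors are the ones given above; the function and constant names are ours:

```python
BASE_LOOKUPS = (10, 50)  # bump-map lookups per pixel in the low mode

def bump_lookups(self_shadowing: bool, supersampling: bool) -> tuple:
    """Min/max bump-map lookups per pixel for a given test mode."""
    factor = (2 if self_shadowing else 1) * (4 if supersampling else 1)
    return (BASE_LOOKUPS[0] * factor, BASE_LOOKUPS[1] * factor)

for shadow in (False, True):
    for ssaa in (False, True):
        low, high = bump_lookups(shadow, ssaa)
        print(f"self-shadowing={shadow}, supersampling={ssaa}: {low}-{high} lookups")
```

With both options on, the factor is 2 x 4 = 8, which gives the 80-400 range of the heaviest mode.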
Even though AMD solutions have been traditionally strong in our Direct3D 9 tests of parallax mapping, the RADEON HD 2900 XT performs only on a par with the GeForce 8600 GTS in the updated test without supersampling. Besides, self-shadowing causes a bigger performance drop in AMD products, over two times versus 1.5 in NVIDIA solutions.
The GeForce 8800 GT is noticeably faster than the old GeForce 8600 GTS, their performance difference reaches three times. The GeForce 8800 GTX is still in the lead, but the GT card is slower only by 13-16%. Let's see what supersampling will change. Performance drop from supersampling was bigger in NVIDIA cards in the previous test, so it brought the 8800 GT and the GTX cards closer to each other.
FPS values obtained with enabled supersampling and self-shadowing again indicate a very heavy GPU load. These two options enabled together increase the load by almost eight times, causing a catastrophic performance drop. The performance difference between our graphics cards remains. However, when supersampling is enabled, the AMD card improves its results versus NVIDIA, just like in the previous case. NVIDIA drops performance fourfold, AMD - only threefold. But the R600 is still faster only than the G84, being outperformed by other graphics cards.
Both GeForce 8800 GTS cards demonstrate identical results, while the GeForce 8600 GTS is twice as slow. As for a comparison of the GeForce 8800 GT and GTX, the new product is outperformed by the GTX card, but the difference is just 10-12% in this complex test! The modified TMUs of the G92, with more address units, actually have no advantages over the G80 in real conditions.
Direct3D 10: PS 4.0 Tests (computing)
The next couple of pixel shader tests contains very few texture lookups to minimize the effect of TMUs on performance. They use a lot of arithmetic operations, so they measure arithmetic performance of GPUs, how fast they execute arithmetic instructions in pixel shaders.
The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.
According to our synthetic Direct3D 9 tests, the AMD R6xx performs better than the competing architecture from NVIDIA in complex arithmetic tasks. The RADEON HD 2900 XT is apparently faster in this test, G80/G92-based solutions cannot compete with it here.
The GeForce 8800 GT performs almost three times as fast as the GeForce 8600 GTS, the traditional ratio again. The new product from NVIDIA outperforms both GTS cards and is slower than the GeForce 8800 GTX by less than 5%, in line with the performance (the number and clock rate) of their unified processors.
The second shader test is called Fire, it's even harder for ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:
We cannot compare the new NVIDIA card with the RADEON HD 2900 XT and future cards from AMD, because the bug in AMD drivers has not been fixed yet. Another possible reason is a hardware problem. In this case AMD is doomed in this test.
As for a comparison with graphics cards from the same camp, the situation is the same: the GeForce 8800 GT is three times as fast as the 8600 GTS, it's faster than any 8800 GTS by 40-50%, and it's outperformed by the GTX card by only 3%, which matches the difference in the number of ALUs and their frequencies.
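The link between these ratios and raw ALU resources can be checked with a back-of-the-envelope sketch: relative arithmetic throughput is roughly the number of stream processors times the shader clock. Only the 1512 MHz shader clock of the 8800 GT appears in this part; the processor counts and the GTS/GTX shader clocks below are assumptions taken from public reference specifications:

```python
def alu_throughput(stream_processors: int, shader_clock_mhz: float) -> float:
    """Relative arithmetic throughput: processors * shader clock."""
    return stream_processors * shader_clock_mhz

cards = {
    "GeForce 8800 GT": (112, 1512),   # clock from this review; SP count assumed
    "GeForce 8800 GTS": (96, 1200),   # assumption: public reference specs
    "GeForce 8800 GTX": (128, 1350),  # assumption: public reference specs
}

gt = alu_throughput(*cards["GeForce 8800 GT"])
for name, spec in cards.items():
    print(f"{name}: {alu_throughput(*spec) / gt:.2f}x the 8800 GT")
```

Under these assumptions the GT comes out well over 40% ahead of the GTS and a couple of percent behind the GTX, in the same ballpark as the measured results.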
Direct3D 10: Geometry Shader Tests
RightMark3D 2.0 includes two geometry shader tests. The first one is called Galaxy, and it's similar to point sprites from previous Direct3D versions. It animates a system of particles on the GPU, with a geometry shader creating four vertices from each particle. Similar algorithms should be used in future DirectX 10 games.
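The idea of expanding each particle into four vertices can be illustrated on the CPU with a minimal sketch. This is not the RightMark shader code: it is a simplified 2D version with hypothetical names, whereas a real geometry shader would typically build the quad in view space so that it faces the camera:

```python
def expand_particle(center, half_size):
    """Turn one point into the four corners of a quad (triangle-strip order)."""
    cx, cy = center
    return [
        (cx - half_size, cy - half_size),  # bottom-left
        (cx + half_size, cy - half_size),  # bottom-right
        (cx - half_size, cy + half_size),  # top-left
        (cx + half_size, cy + half_size),  # top-right
    ]

particles = [(0.0, 0.0), (3.0, 1.0)]
quads = [expand_particle(p, 0.5) for p in particles]
print(f"{len(particles)} particles -> {sum(len(q) for q in quads)} vertices")
```

The output geometry is four times the input, which is why moving this step between the vertex and geometry stages shifts the load without changing the image.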
A change of balance in geometry tests does not affect rendering results, the image is always identical, only scene processing methods differ. GS load value determines what shader will be busy - vertex or geometry. The amount of work is always the same.
Let's analyze the first modification of Galaxy with vertex computing for three levels of geometric complexity:
The correlation of results with different complexity levels of the scene is almost the same, only absolute values are different. Performance corresponds to the number of points: FPS is halved at each step. The GeForce 8800 GT demonstrates high results, being insignificantly faster than the GeForce 8800 GTX.
Memory size does not affect the results. This is an easy task for modern graphics cards. The GeForce 8600 GTS offers high results, which may indicate that ALUs don't limit performance. Perhaps the situation will change, when some work is moved to a geometry shader.
There are no great changes here. All graphics cards, except for the GeForce 8600 GTS, demonstrate the same results with a different GS load, when some of the load is moved to the geometry shader. The GeForce 8800 GT is still faster than the HD 2900 XT, the GeForce 8800 GTS, and it's very close to the GTX. Interestingly, there is a small difference between graphics cards with different numbers of execution units and clock rates. Perhaps the situation will change in the second test.
Hyperlight is the second geometry test that uses several techniques: instancing, stream output, buffer load. It employs dynamic generation of geometry by rendering into two buffers, as well as a new Direct3D 10 feature - stream output. The first shader generates ray directions, their speed and growth vectors. These data are stored in a buffer, which is used by the second shader for rendering. Each ray point is used to generate 14 vertices in a circle, up to a million output points.
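Generating 14 vertices in a circle around each ray point can be sketched on the CPU as follows. This is only an illustration of the geometry, not the actual Hyperlight shader; the function name, the fixed XY plane, and the radius are our assumptions:

```python
import math

def ring_vertices(center, radius, count=14):
    """Place `count` vertices evenly on a circle around `center` in the XY plane."""
    cx, cy, cz = center
    vertices = []
    for i in range(count):
        angle = 2.0 * math.pi * i / count
        vertices.append((cx + radius * math.cos(angle),
                         cy + radius * math.sin(angle),
                         cz))
    return vertices

ring = ring_vertices((0.0, 0.0, 0.0), 1.0)
print(f"{len(ring)} vertices, first at {ring[0]}")
```

Each input point fans out into 14 output vertices, so the amount of generated geometry grows quickly with the number of ray points.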
The new type of shader programs is used to generate rays. If "GS load" is set to "Heavy", it's also used for rendering. That is in Balanced mode, geometry shaders are used only to generate and grow rays. Output is up to instancing. The geometry shader also outputs data in the Heavy mode. Let's analyze the easy mode first:
Results of various NVIDIA graphics cards do not differ much for any geometry complexity. And the AMD solution is outperformed by the GeForce 8600 GTS, when the geometry load is not very heavy. Performance scales well in all cases. It's close to theoretical parameters, according to which, each next level of Polygon count must be twice as slow.
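The "twice as slow per level" expectation can be written down as a small sketch for checking how close a card stays to ideal scaling. The FPS values in the example are hypothetical and are not taken from our charts:

```python
def ideal_fps(base_fps: float, level: int) -> float:
    """Expected FPS if each polygon-count level halves performance."""
    return base_fps / (2 ** level)

def scaling_efficiency(measured_fps: float, base_fps: float, level: int) -> float:
    """Measured FPS as a fraction of the ideal value at this level."""
    return measured_fps / ideal_fps(base_fps, level)

# Hypothetical card: 100 FPS at the lowest level, 24 FPS two levels up
print(f"ideal: {ideal_fps(100, 2):.1f} FPS")
print(f"efficiency: {scaling_efficiency(24, 100, 2):.0%}")
```

An efficiency near 100% means the card follows the theoretical halving; a noticeably different value points to some other bottleneck or fixed overhead.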
If we ignore poor results of the HD 2900 XT, the situation is exactly the same - the GeForce 8800 GT outperforms all three GTS cards and fares on a par with the GTX. However, results may change in the next test that uses geometry shaders more actively. We can compare results obtained in the Balanced and Heavy modes.
The relation of performance results hasn't changed much. The AMD R600 apparently performs faster than the NVIDIA GPUs, its advantage reaches 1.5-2 times. We confirm our older conclusion that AMD solutions profit from complexity of a geometry shader compared to NVIDIA graphics cards. This time all NVIDIA GPUs lined up in their results - the G84 is 2-3 times as slow as the G80, depending on a model. The GeForce 8800 GT is slightly outperformed by the top GTX, but it's much better than the GeForce 8800 GTS.
If we compare results in various modes, the GeForce 8800 GT performs better in Balanced mode than the RADEON HD 2900 XT in Heavy mode. You should keep in mind that the image does not differ in these modes. The AMD solution is outperformed, even though NVIDIA cards lose much performance as they switch from instancing to a geometry shader, while AMD cards profit from it.
Here is our main conclusion on geometry shaders - even though different geometry tests may yield different results, the GeForce 8800 GT always demonstrates very high results and outperforms its closest competitors. The RADEON HD 2900 XT shoots forward as geometry complexity grows. But these are synthetic tests, real performance will be demonstrated in games, which will be analyzed in the next part of our review.
Direct3D 10: Vertex texture fetch rate
Vertex Texture Fetch tests measure the speed of many vertex texture fetches. These tests are similar, and the correlation of their results in Earth and Waves tests must be also similar. Both tests use displacement mapping based on texture fetches. The only major difference is that the Waves test uses conditional branches, while the Earth test does not.
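Displacement mapping based on texture fetches can be illustrated with a minimal CPU sketch: a height value is fetched from a texture with bilinear filtering and the vertex is moved along its normal. This is not the RightMark shader code; the names, the tiny 2x2 height map, and the scale parameter are hypothetical:

```python
def bilinear_fetch(tex, u, v):
    """Bilinearly sample a 2D height map (list of rows) at u, v in [0, 1]."""
    height, width = len(tex), len(tex[0])
    x, y = u * (width - 1), v * (height - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def displace(position, normal, tex, u, v, scale=1.0):
    """Move a vertex along its normal by the fetched height."""
    h = bilinear_fetch(tex, u, v) * scale
    return tuple(p + n * h for p, n in zip(position, normal))

height_map = [[0.0, 1.0],
              [2.0, 3.0]]
print(displace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), height_map, 0.5, 0.5))
```

In the real tests this fetch runs many times per vertex, so the vertex stage ends up bound by texture units and memory rather than by arithmetic.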
Let's analyze the first test (Earth) in Effect detail Low mode:
All three graphs demonstrate a similar picture of performance. The performance ratio between top solutions and Mid-End graphics cards remains the same: approximately twofold between the GeForce 8600 GTS and the 8800 GTS, and up to 1.5 times between the GTX and GTS cards. There is quite a big performance difference between the GeForce 8800 GT and GTX cards. It cannot be explained by the different number of texture units. This test seems to be affected by memory bandwidth, which differs significantly in these products.
Let's have a look at results of this test with more texture lookups:
The situation hasn't changed much. The GeForce 8800 GTX is still in the lead, followed by the GeForce 8800 GT (it lags slightly behind because of the difference in the number of TMUs and memory bandwidth). Next come the RADEON HD 2900 XT and both GeForce 8800 GTS cards. The GeForce 8600 GTS falls far behind and reveals all its weaknesses.
Let's have a look at results of the second vertex texture fetch test. The Waves test executes fewer texture lookups, but it uses conditional branches. The number of bilinear texture lookups in this case reaches 14 (Effect detail Low) or 24 (Effect detail High) per each vertex. Geometry complexity changes just like in the previous test.
The Waves test demonstrates a similar situation (as in the Earth test), but the difference between the G92 and G80 has grown bigger. The GeForce 8800 GT is sometimes outperformed by the GTS cards and by the RADEON. But as geometry complexity grows, it regains its strength, being outperformed in the most complex test by less than 20%. Let's analyze the second mode:
These results agree with those demonstrated in the previous tests, only the RADEON HD 2900 XT falls back relative to NVIDIA solutions. The GeForce 8800 GT generally copes well with vertex texture fetch tests. When geometry complexity is low, its performance is limited by lower memory bandwidth. When the amount of geometry data grows, performance of this card comes close to the GeForce 8800 GTX.
Conclusions on the synthetic tests
Synthetic tests of the GeForce 8800 GT and older products from various price segments show us that the new Mid-End product from NVIDIA is a very powerful card. It can compete with more expensive cards from NVIDIA and AMD, especially as this GPU is manufactured by a better process technology, gaining additional advantages in power consumption and heat dissipation. It will be very interesting to compare the GeForce 8800 GT with its competitors from AMD.
The next part of the article will contain tests of the new Mid-End solution from NVIDIA in modern games, which should confirm our conclusions based on synthetic tests. The gaming part has always been the most important segment of our reviews. Users should choose graphics cards on the basis of real gaming tests.