Part 2: Features, synthetic tests
We've already covered all architectural features in the first part of this review.
Today we'll review graphics cards from BFG and Zotac. In effect, we examined the same reference card twice: the BFG card differs only slightly in clock rates.
The photos show that the 8800 GT and the new 8800 GTS use the same PCB. The latter is reinforced with additional power components, which are left unpopulated on the former card.
Each card requires a single 6-pin PCI-E power cable from the PSU, so keep that in mind. PSU requirements: 400W or higher, with the 12V rail supporting at least 18-20A.
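As a rough sanity check on the current requirement, the power available on the 12V rail follows from P = V x I. The sketch below is only illustrative arithmetic; the function name is ours, and the result says nothing about how much of that power the card itself draws.

```python
def rail_power_watts(volts: float, amps: float) -> float:
    """Power available on a PSU rail, from P = V * I."""
    return volts * amps


# The 18-20A range recommended above, on the 12V rail.
low = rail_power_watts(12, 18)
high = rail_power_watts(12, 20)
print(f"12V rail: {low:.0f}-{high:.0f} W")
```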
The cards have TV-Out with a proprietary connector. You will need a special adapter (usually shipped with a card) to output video to a TV set via S-Video or RCA. You can read about TV Out here.
The cards are equipped with two DVI ports. Dual-link DVI allows resolutions above 1600x1200 via the digital interface. Analog monitors with a d-Sub (VGA) interface are connected via DVI-to-d-Sub adapters. Maximum resolutions and frequencies:
As for MPEG2 playback (DVD-Video), we analyzed this issue in 2002, and little has changed since then. CPU load during video playback on modern graphics cards does not exceed 25%.
As for HDTV and other trendy video features, you can read one of our reviews here.
Now about the cooling systems. Let's examine the reference cooler on the card from Zotac.
As the new version of RivaTuner (written by A.Nikolaychuk AKA Unwinder) already supports G92, let's have a look at monitoring results.
Zotac GeForce 8800 GTS 512MB PCI-E
BFG GeForce 8800 GTS OC 512MB PCI-E
The core temperature exceeds 80°C, but the fan speed does not increase (NVIDIA engineers were probably reluctant to speed up the fan, lest it become noisy). You can probably use RivaTuner to speed the fan up a little without making it audible, and the core temperature will drop. However, even 85°C is not a problem when the hot air is exhausted from the PC case.
Now let's have a look at the processors.
It's a medium-sized die (we've seen larger), even though it contains a great many transistors.
Installation and Drivers
VSync is disabled.
Our synthetic benchmarks can be downloaded here:
Synthetic tests were run with the following graphics cards:
We selected them for comparison with the GeForce 8800 GTS 512MB for the following reasons: the GeForce 8800 GT interests us as a graphics card from a different price segment based on the same GPU; the old GeForce 8800 GTS models help us evaluate the effect of architectural changes (the number of execution units, modified TMUs) and overclocking; and the RADEON HD 3870 is the fastest product from AMD, although from a different price segment. Finally, the GeForce 8800 GTX is interesting as one of the fastest G80-based cards.
Direct3D 9: Pixel Filling tests
This test determines peak texel rate in FFP mode for different numbers of textures applied to a pixel:
Only some graphics cards come close to their theoretical maximum; synthetic results are often lower than theoretical values. The G80-based GeForce 8800 cards come closest. The AMD solution is also closer to its theoretical peak than the NVIDIA cards with improved TMUs, which fall far short of theirs in our old test. The performance difference between the GeForce 8800 GT and the new GeForce 8800 GTS 512MB is similar to the difference in their GPU clock rates. Judging by the results, the G92 fetches over 30 texels per cycle for 32-bit textures (bilinear filtering), although in theory it should be twice as fast here.
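The per-cycle figure quoted above is simply the measured texel rate divided by the core clock. A minimal sketch of that arithmetic follows. The 650 MHz clock and the 20,800 Mtexels/s rate are illustrative assumptions rather than measurements from this review, and the 64-texel peak merely encodes the statement that the chip should have been about twice as fast.

```python
def texels_per_clock(mtexels_per_s: float, core_mhz: float) -> float:
    """Texels fetched per GPU clock cycle (Mtexels/s divided by MHz)."""
    return mtexels_per_s / core_mhz


def efficiency(measured_per_clock: float, theoretical_per_clock: float) -> float:
    """Fraction of the theoretical peak actually achieved."""
    return measured_per_clock / theoretical_per_clock


# Hypothetical inputs, chosen only to illustrate the calculation.
measured = texels_per_clock(mtexels_per_s=20_800, core_mhz=650)
print(measured)                  # texels per clock
print(efficiency(measured, 64))  # share of an assumed 64-texel peak
```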
With few textures per pixel, the GeForce 8800 GTS 512MB looks worse than the old GeForce 8800 cards. Its video memory bandwidth is insufficient here, being lower than that of the GTX card. But why is the new card outperformed by the old GTS in the single-texture test? These solutions have the same memory bandwidth. Perhaps it's the effect of the number of ROPs (the old card has more of these units), or perhaps the new GPU is optimized for many textures. In heavier conditions the new card becomes faster than all its competitors, including the GeForce 8800 GTX. Let's have a look at results in the fill rate test:
The second synthetic test measures the fill rate. It shows the same situation, adjusted for the number of pixels written into the frame buffer. With 0 and 1 textures, the new solution from NVIDIA is outperformed by the older card, which can be explained by its lower memory bandwidth and fewer ROPs. The new card pulls ahead again as the number of textures per pixel grows. The GeForce 8800 GTS 512MB is up to twice as fast as the sole AMD card when its results are not limited by memory bandwidth.
Direct3D 9: Geometry Processing Speed Tests
Let's analyze extreme geometry tests. The first test uses the simplest vertex shader that shows maximum triangle throughput:
All these GPUs are based on unified architectures, so all their unified processors are busy with geometry in this test. All solutions demonstrate high results, evidently limited not by the peak performance of the unified processors but by other units, e.g. triangle setup.
The GPUs execute the test in the various modes with similar efficiency: peak performance in FFP, VS 1.1, and VS 2.0 modes differs little, although FFP is a tad faster on NVIDIA cards. We cannot say anything definite about these results, only that AMD GPUs traditionally process geometry faster. Let's see what changes in a more complex test with a single diffuse light source:
Almost the same situation. These solutions apparently have a higher potential. This time the FFP mode is even faster on all cards from NVIDIA. All GeForces are still outperformed by the RADEON card, though not by much. Let's see what happens in heavier conditions - complex lighting with a single light source and specular highlights:
The gap between AMD and NVIDIA has grown a little. The RV670 remains the leader in geometry performance, and all cards from NVIDIA are closely grouped, the new GeForce 8800 GTS 512MB being a little faster than the others. Optimized FFP emulation in G8x/G9x becomes more apparent with a mixed light source. Let's analyze the most complex geometry task with three light sources, including static and dynamic branches:
OK, now we can see the difference between all the test participants. The RADEON HD 3870 is still in the lead; its potential is not fully revealed even in our most complex geometry task, and its results almost match those demonstrated above. We note the traditional, opposite weaknesses of vertex units in the AMD and NVIDIA architectures: dynamic branches cause a deeper performance drop in the former, while static branches do so in the latter.
The GeForce 8800 GTS 512MB is faster than the other GeForce 8800 cards in all tests owing to a higher clock rate of the G92. On the whole, all GPUs perform well in these tests owing to advantages of the unified architecture. They can use all their unified stream processors to solve geometry tasks. But unified shader processors will be busy mostly with pixels in real applications. So we proceed to such tests now.
Direct3D 9: Pixel Shaders Tests
The first group of pixel shaders to be reviewed here is too simple for modern GPUs. It includes various versions of pixel programs of relatively low complexity: 1.1, 1.4, and 2.0.
We can see that these tests are too easy for modern architectures and fail to reveal their true capacity. In the simple tests, performance is limited by the texel rate, which is evident in the low results of the RADEON HD 3870. Results become more interesting in the more complex PS 2.0 tests, and the card from AMD even pulls ahead in the most complex one.
The GeForce 8800 GTS 512MB is always faster than the other GeForce 8800 cards. This is especially noticeable in comparison with the old GTS product: the difference is quite big in the most complex tests. The new GTS card even outperforms the GTX model, in full compliance with theory - all its characteristics are better, except for memory bandwidth and fill rate, which do not affect results much in these simple tests. Let's have a look at test results of more complex pixel programs of intermediate versions:
The water test depends on the texel rate: it uses dependent texture lookups of high nesting depth, so the RADEON lags far behind all NVIDIA solutions. Our card under review is almost three times as fast. The new GTS card with 512 MB of memory is always faster than the other GeForce 8800 cards. For example, it outperforms the old GTS 640MB by more than 1.5 times. See what happens next...
The AMD card pulls ahead in the second, more arithmetic-intensive test. This task suits its architecture with more unified processors. But look, the difference between this card and the new G92-based product is already small, just 10%. As NVIDIA moved from G80 to G92, it fixed some architectural problems, so the fastest of its graphics cards performs almost on a par with the best RADEON card. Not long ago, NVIDIA cards fared much worse.
Direct3D 9: New Pixel Shaders Tests
These tests of DirectX 9 pixel shaders are even more complex. They are divided into two categories, and we'll start with the easier shaders - SM 2.0:
There are two modifications of these shaders: arithmetic-intensive and texture sampling intensive. Let's analyze the arithmetic-intensive modifications first, as they are more promising from the point of view of future applications:
The situation with the NVIDIA cards in the Frozen Glass test is similar to that in the previous group of tests. The old GeForce 8800 GTS is still much slower than the new card, which always outperforms even the 8800 GTX. NVIDIA cards based on the G80 and G92 outperform the HD 3870 in this test, which proves that performance is limited by texel rate in the first place.
AMD solutions traditionally lead in the second test, Parallax Mapping. Only the recently reviewed GeForce 8800 GT comes dangerously close to them, outperforming the GeForce 8800 GTX. Improved TMUs play their role in this test, because parallax mapping requires an additional texture lookup. As the GeForce 8800 GTS 512MB operates at a higher clock rate and has more enabled TMUs and ALUs, this card becomes the leader in this test, outperforming the HD 3870! Let's analyze results obtained in the texture sampling intensive tests, where the GeForce 8800 GTS 512MB may perform even better:
The situation is a tad different, because performance in these tests is limited by texturing speed. That's why the new GeForce 8800 GTS outperforms the old product in one of the tests by more than twofold, and the GeForce 8800 GTX by 1.5 times! The RADEON HD 3870 is outperformed in both tests by almost all GeForce 8800 cards, the only exception being the old GTS card in Parallax Mapping. Arithmetic-intensive shaders work faster on all graphics cards, so it does not make sense to use texturing-intensive shaders with modern GPUs.
Let's have a look at results of another two pixel shader tests - SM 3.0. They are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches:
The GPU load in these two tests is too great even for such powerful chips as the RV670 and G92. Even though the AMD card efficiently executes complex pixel shaders (SM 3.0) with a lot of branches, the HD 3870 is only half as fast as the new GeForce 8800 GTS 512MB. The G92-based cards again demonstrate noticeably better results in these tests than even the GeForce 8800 GTX. The advantage of the GeForce 8800 GTS 512MB is very high, over 1.5 times. It can be explained by the increased frequency as well as faster bilinear texture fetches. When you analyze results of such synthetic tests, keep in mind that the situation will be different in real applications, because they often use trilinear and/or anisotropic texture filtering. Besides, performance is often limited by the fill rate and memory bandwidth, so the GeForce 8800 GTX may get an advantage there.
Direct3D 10: PS 4.0 Tests (texturing, loops)
New RightMark3D 2.0 includes two old Direct3D 9 PS 3.0 tests, rewritten for DirectX 10, and two brand new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase their load on GPUs.
These tests measure efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundreds of lookups per pixel in the heaviest mode!) and a relatively low ALU load. In other words, they measure a texture sampling rate and branching efficiency in a pixel shader.
The first pixel shader test is Fur. At the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. With shader supersampling enabled, the number of lookups grows to 60-120. The High mode with SSAA is the heaviest - 160-320 lookups from a bump map.
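The supersampled figures above are the base ranges multiplied by four. A short sketch of that arithmetic, using only the lookup counts quoted in the text (the names are ours):

```python
SUPERSAMPLING_FACTOR = 4  # shader supersampling quadruples the lookups


def with_supersampling(lookup_range):
    """Scale a (min, max) bump-map lookup range by the supersampling factor."""
    lo, hi = lookup_range
    return (lo * SUPERSAMPLING_FACTOR, hi * SUPERSAMPLING_FACTOR)


# Bump-map lookups per pixel in the Fur test, per Effect Detail mode.
fur_bump_lookups = {"Low": (15, 30), "High": (40, 80)}
for mode, lookups in fur_bump_lookups.items():
    print(mode, lookups, "->", with_supersampling(lookups))
```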
Let's see what happens in modes without supersampling - they are relatively simple, and the correlation of results in Low/High modes must be similar.
The Fur tests with lots of texture lookups show a huge advantage of NVIDIA solutions over the AMD card, so there is no point in comparing them. A gap this large cannot be explained even theoretically. Perhaps AMD hasn't fixed bugs in its Direct3D 10 drivers yet.
All results of NVIDIA solutions in the High mode are approximately 1.5 times lower than in the Low mode. A comparison of the new and old GeForce 8800 GTS models shows a big advantage of the 512 MB modification. Judging by the results of the GeForce 8800 GTS 512MB, GT, and GTX cards, performance in this test depends not only on the number and speed of TMUs, or the disparity would have been different. To all appearances, rendering speed is also limited by the fill rate (ROPs) and memory bandwidth. Let's have a look at the results in this test with enabled shader supersampling, which quadruples the load. Perhaps it will change the situation:
Only top GPUs from NVIDIA can cope with this load; the AMD card lags behind. Supersampling quadruples the load, but the performance drop on G8x-based cards is bigger than on the RV670-based solution, so the HD 3870 comes close to the old GTS cards. As the shader grows more complex and increases the GPU load, the gap between the GeForce 8800 GTX and the new GTS card shrinks, because the effect of the fill rate and memory bandwidth on overall performance decreases.
The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:
This test is more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavy modifications, such as our steep parallax mapping, are used in the latest games, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing that doubles the GPU load (High mode).
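The looped lookups described above can be pictured as a simple ray march through a depth map. The sketch below is not the RightMark3D shader. It is a generic steep parallax mapping loop written in Python, and the layer-count heuristic, `height_scale`, and all function names are our assumptions. Only the 10-50 lookup range comes from the text.

```python
def steep_parallax_march(depth_at, uv, view_dir,
                         min_layers=10, max_layers=50, height_scale=0.05):
    """March a view ray through a depth map; return (hit_uv, lookups).

    depth_at(u, v) returns a depth in [0, 1], where 0 is the surface top.
    view_dir is a unit tangent-space vector, z pointing away from the surface.
    """
    vx, vy, vz = view_dir
    cos_view = min(abs(vz), 1.0)
    # Assumed heuristic: more layers at grazing angles, fewer head-on.
    layers = round(max_layers + (min_layers - max_layers) * cos_view)
    layer_depth = 1.0 / layers
    vz_safe = max(abs(vz), 1e-3)  # avoid division by zero at grazing angles
    du = -vx / vz_safe * height_scale / layers
    dv = -vy / vz_safe * height_scale / layers

    u, v = uv
    ray_depth = 0.0
    lookups = 0
    for _ in range(layers):
        lookups += 1                     # one bump-map lookup per step
        if ray_depth >= depth_at(u, v):  # the ray has reached the surface
            break
        ray_depth += layer_depth
        u += du
        v += dv
    return (u, v), lookups


flat = lambda u, v: 0.0  # surface at the top: hit on the first lookup
deep = lambda u, v: 1.0  # surface at the bottom: every layer is sampled
print(steep_parallax_march(flat, (0.5, 0.5), (0.0, 0.0, 1.0))[1])
print(steep_parallax_march(deep, (0.5, 0.5), (0.0, 0.0, 1.0))[1])
```

The early exit is why the lookup count is a range rather than a fixed number: the loop stops as soon as the ray reaches the surface. Self-shadowing repeats a similar march toward the light, and supersampling runs the whole shader four times per pixel, which is how 10-50 lookups become 80-400.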
Although AMD solutions used to be strong in our Direct3D 9 tests of parallax mapping, the RADEON HD 3870 is not very fast in our updated D3D10 test without supersampling. Self-shadowing causes a deeper performance drop on this card than on NVIDIA solutions. The GeForce 8800 GTS 512MB outperforms all other cards in the High mode, and it keeps almost on a par with the GTX card in the Low mode. Let's see what supersampling changes. In the previous test, the performance drop from supersampling was bigger on NVIDIA cards, and it brought the G80/G92-based cards closer to each other.
FPS values obtained with enabled supersampling and self-shadowing again indicate a very heavy GPU load. These two options enabled together increase the load almost eightfold, causing a very big performance drop. The performance difference between our graphics cards remains. However, when supersampling is enabled, the AMD card improves its results relative to NVIDIA, just like in the previous case. NVIDIA cards become four times slower, while the HD 3870 becomes three times slower. But it still lags behind the other cards.
Both "old" modifications of the GeForce 8800 GTS demonstrate identical results, being 1.5 times slower than the new card. As for the GeForce 8800 GTS 512MB versus the GTX, the overhauled GTS card is a tad faster than the GTX this time. As the ALU load grows, the situation changes in favor of the G92. We can see that the G92's modified TMUs with more address units actually have no advantage over the G80 in real conditions.
Direct3D 10: PS 4.0 Tests (computing)
The next two pixel shader tests contain a minimal number of texture lookups to reduce the effect of TMUs on performance. They use a lot of arithmetic operations, so they measure the arithmetic performance of GPUs, that is, how fast they execute arithmetic instructions in pixel shaders.
The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.
In our synthetic Direct3D 9 tests we noted that the AMD architecture often performs better in arithmetic-intensive tasks than the NVIDIA architecture. But the RADEON HD 3870 is slower here than the best G80-based card and both G92-based cards. The GeForce 8800 GTS 512MB is much faster than the old GeForce 8800 GTS cards. The overhauled card on the new GPU from NVIDIA outperforms absolutely all cards, while the GT card is just a little slower than the GeForce 8800 GTX. It all agrees with the performance (number and clock rate) of the unified processors.
The second shader test, Fire, is even harder for ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:
The RADEON HD 3870 traditionally fails this test; the bug in AMD drivers hasn't been fixed. On the other hand, judging by how long it has remained unfixed, this bug may actually be a hardware problem. As for the NVIDIA cards, the situation hasn't changed much. The GeForce 8800 GTS 512MB is more than 1.5 times as fast as the older cards of the same name with different memory capacities. The GTX card is now 14% slower. There is a similar difference in ALU power, frequencies, and the number of these units between these cards.
Direct3D 10: Geometry Shader Tests
RightMark3D 2.0 includes two geometry shader tests. The first one, Galaxy, is similar to point sprites from previous Direct3D versions. It animates a particle system on the GPU, with a geometry shader creating four vertices from each particle. Similar algorithms should be used in future DirectX 10 games.
A change of balance in geometry tests does not affect rendering results, the image is always identical, only scene processing methods differ. GS load value determines what shader will be busy - vertex or geometry. The amount of work is always the same.
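To make the "four vertices from each particle" step concrete, here is a minimal CPU-side sketch of point-sprite expansion. It assumes view-space positions and a screen-aligned square sprite. The function name, the corner order, and the sample particles are illustrative and are not taken from RightMark3D.

```python
def expand_particle(center, half_size):
    """Emit four corner vertices (triangle-strip order) for one particle."""
    cx, cy, cz = center
    corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
    return [(cx + ox * half_size, cy + oy * half_size, cz) for ox, oy in corners]


# Two hypothetical particles, each expanded into a quad.
particles = [(0.0, 0.0, 5.0), (2.0, 1.0, 7.0)]
vertices = [v for p in particles for v in expand_particle(p, 0.5)]
print(len(vertices))
```

A geometry shader does the same per input point on the GPU, so the vertex count grows fourfold without any extra data being sent from the CPU.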
Let's analyze the first modification of Galaxy with vertex computing for three levels of geometric complexity:
The correlation of results with different complexity levels of the scene is almost the same, only absolute values are different. Performance demonstrated corresponds to the number of points, FPS is halved each step. The new GeForce 8800 GTS demonstrates the highest results, being slightly faster than the GeForce 8800 GTX. However, the difference between the GT, GTX, and GTS 512MB cards is small.
This task is not very heavy for modern graphics cards. Even the GeForce 8600 GTS demonstrates a high result in this test, which indicates that shader ALUs do not limit performance. Perhaps the situation will change, when some work is moved to a geometry shader.
But no, there are actually no changes. All graphics cards demonstrate almost the same results with a modified GS load, the setting responsible for offloading some work to the geometry shader. The GeForce 8800 GTS 512MB is still in the lead, outperforming both the RADEON HD 3870 and the GeForce 8800 GTX. There is little difference between the results of graphics cards with different numbers of execution units and frequencies. Perhaps that will change in the second test.
Hyperlight is the second geometry test that uses several techniques: instancing, stream output, buffer load. It employs dynamic generation of geometry by rendering into two buffers, as well as a new Direct3D 10 feature - stream output. The first shader generates ray directions, their speed and growth vectors. These data are stored in a buffer, which is used by the second shader for rendering. Each ray point is used to generate 14 vertices in a circle, up to a million output points.
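The "14 vertices in a circle" amplification can be sketched as follows. For simplicity, this assumes the ring lies in the XY plane around the ray point. A real shader would presumably orient it perpendicular to the ray direction. The function name and the radius are hypothetical.

```python
import math


def ring_vertices(center, radius, count=14):
    """Generate `count` vertices evenly spaced on a circle around `center`."""
    cx, cy, cz = center
    step = 2 * math.pi / count
    return [(cx + radius * math.cos(i * step),
             cy + radius * math.sin(i * step),
             cz)
            for i in range(count)]


ring = ring_vertices((1.0, 2.0, 3.0), radius=0.25)
print(len(ring))
```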
The new type of shader programs is used to generate rays. If "GS load" is set to "Heavy", it's also used for rendering. That is, in Balanced mode geometry shaders are used only to generate and grow rays, and output is handled by instancing. In Heavy mode, the geometry shader also handles output. Let's analyze the easy mode first:
The difference between results of NVIDIA graphics cards remains small at any geometry complexity. The AMD card is less than half as fast as the GeForce 8800 GTS 512MB. Performance scales well, and the difference between the modes is close to theoretical: each Polygon count level is twice as slow as the previous one.
In general, the situation is similar to the previous test - the GeForce 8800 GTS 512MB outperforms all its competitors, and the GT card works on a par with the GTX. Results may change in the next test that uses geometry shaders more actively. It will be also interesting to compare results obtained in the Balanced and Heavy modes.
The correlation of performance results has changed much this time. The AMD GPU executes complex geometry shaders more efficiently than NVIDIA GPUs, but only in comparison with the older NVIDIA models. NVIDIA GPUs demonstrate results strictly according to the number of their unified shader processors and their operating frequencies. The GT card is slightly slower than the GTX, and the GeForce 8800 GTS 512MB performs approximately on a par with the RADEON HD 3870! We again see how NVIDIA catches up with AMD in previously unfavorable tests. The company gradually eliminates the problems in its GPUs. The higher the complexity of a test, the faster the GTS 512MB is relative to the other cards.
If we compare results in various modes, all GeForces perform better in Balanced mode than the RADEON HD 3870 in Heavy mode. There is no visual difference between the images obtained in various modes. The AMD solution is outperformed, even though NVIDIA cards lose much performance as they switch from instancing to a geometry shader, while AMD cards profit from it.
Here is our main conclusion on geometry shaders - even though different geometry tests may yield different results, the GeForce 8800 GTS 512MB always demonstrates very high results and always outperforms its competitors. NVIDIA now wins even as geometry complexity grows, although AMD GPUs were traditionally in the lead here.
Direct3D 10: Vertex texture fetch rate
Vertex Texture Fetch tests measure the speed of many vertex texture fetches. These tests are essentially similar, and the correlation of their results in Earth and Waves tests must also be similar. Both tests use displacement mapping based on texture fetches. The only major difference is that the Waves test uses conditional branches, while the Earth test does not.
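As an illustration of displacement mapping driven by vertex texture fetches, the sketch below samples a height texture bilinearly and moves a vertex along its normal. It is a CPU-side approximation under our own assumptions: the tiny texture, the clamped addressing, and the `threshold` branch are hypothetical, and the branch only stands in for the kind of conditional work the Waves test adds.

```python
import math


def bilinear_fetch(tex, u, v):
    """Bilinear sample of a 2D list `tex` at normalized (u, v) in [0, 1]."""
    h, w = len(tex), len(tex[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)  # clamp at the edges
    fx, fy = x - x0, y - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def displace(vertex, normal, uv, tex, scale=1.0, threshold=None):
    """Move a vertex along its normal by the fetched height.

    If `threshold` is set, heights below it are skipped (a conditional branch).
    """
    height = bilinear_fetch(tex, *uv)
    if threshold is not None and height < threshold:
        return vertex
    return tuple(p + n * height * scale for p, n in zip(vertex, normal))


heights = [[0.0, 1.0],
           [0.0, 1.0]]
print(displace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5), heights))
```

Each bilinear fetch reads four texels, so the memory traffic per vertex grows quickly as the number of fetches per vertex increases.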
Let's analyze the first test (Earth) in Effect detail Low mode:
Results in all modes show similar performance of graphics cards relative to each other. Judging by our previous reviews, results of this test are affected by memory bandwidth. It's especially noticeable in the comparison of the GTX and GTS 512MB cards, because the performance difference cannot be explained solely by a different number of texture units, especially as the G92-based card outperforms the G80 card as the load grows.
The HD 3870 lags only slightly and looks good in the High mode. Let's have a look at the results of the same test with more texture lookups:
The situation is almost the same. The GeForce 8800 GTX is in the lead in the Low mode, while the GTS 512MB, with better TMUs but lower memory bandwidth, comes forward as the test gets more complex. Next come the GeForce 8800 GT (because of the difference in the number of TMUs and memory bandwidth), the RADEON HD 3870, and both old GeForce 8800 GTS cards.
Let's have a look at results of the second vertex texture fetch test. The Waves test executes fewer texture lookups, but it uses conditional branches. The number of bilinear texture lookups in this case reaches 14 (Effect detail Low) or 24 (Effect detail High) per vertex. Geometry complexity changes just like in the previous test.
The situation in the Waves test resembles that in the Earth test. But the difference between graphics cards based on the G92 and the G80 has grown wider - perhaps memory bandwidth is even more important here. The GeForce 8800 GTS 512MB is always slower than the GTX here. It's even outperformed by the GTS in the Low mode. But as geometry complexity grows, this card regains ground, being outperformed by the GTX card in the most complex test by just 5%. Nevertheless, the new GTS card is always faster than the AMD card. Let's analyze the second mode:
Results are also similar to the previous case. Only the difference between the GTS 512MB and GTX has become smaller, and the RADEON HD 3870 is outperformed by all NVIDIA cards, including the old GTS products. On the whole, the G92 copes very well with vertex texture fetch tests. When geometry complexity is low, its performance is limited by lower bandwidth of local video memory. And when the amount of geometry data grows, its performance comes close to that of the GeForce 8800 GTX, in some cases the GeForce 8800 GTS 512MB is even faster.
Conclusions on the synthetic tests
Synthetic tests prove that the GeForce 8800 GTS 512MB is a very powerful graphics card. It can compete well even with more expensive graphics cards from NVIDIA and AMD. The next part of this review will be devoted to tests of the new NVIDIA card in modern games. These tests should prove that our synthetic conclusions are true.