Direct3D 9: Pixel Filling tests
This test determines peak texel rate in FFP mode for different numbers of textures applied to a pixel:
As usual, not all graphics cards come close to their theoretical maximum; results of synthetic tests are usually a little lower. Graphics cards based on G80 and RV670 come closer to this threshold than the other cards: they are just 10-15% short of the target. NVIDIA cards with improved TMUs, however, fail to reach their theoretical maximum in our old test, and we see no improvement in the GT200. G92 looks up about 32 texels per cycle from 32-bit textures with bilinear filtering, and GT200 also falls short of its theoretical maximum. Perhaps the problem lies in our old test.
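To make the comparison with the theoretical maximum concrete, here is a minimal Python sketch of the arithmetic. It assumes one bilinear texel per TMU per clock. The unit counts and clocks (80 TMUs at 602 MHz for GeForce GTX 280, 64 TMUs at 675 MHz for GeForce 9800 GTX) are commonly published specifications rather than values from this article, and the function names are purely illustrative.

```python
def peak_texel_rate(tmus: int, core_mhz: float) -> float:
    """Theoretical texel rate in Mtexels/s, assuming 1 bilinear texel per TMU per clock."""
    return tmus * core_mhz


def texels_per_clock(measured_mtexels: float, core_mhz: float) -> float:
    """Convert a measured texel rate (Mtexels/s) into texels fetched per core clock."""
    return measured_mtexels / core_mhz


def fraction_of_peak(measured_mtexels: float, tmus: int, core_mhz: float) -> float:
    """Share of the theoretical maximum that a measured result reaches."""
    return measured_mtexels / peak_texel_rate(tmus, core_mhz)


if __name__ == "__main__":
    # Assumed specifications (not taken from the article).
    print("GTX 280 peak, Mtexels/s:", peak_texel_rate(80, 602))
    print("9800 GTX peak, Mtexels/s:", peak_texel_rate(64, 675))

    # About 32 texels per clock on an assumed 64-TMU, 675 MHz part.
    measured = 32 * 675
    print("texels per clock:", texels_per_clock(measured, 675))
    print("fraction of peak:", fraction_of_peak(measured, 64, 675))
```

In these terms, a card that is "10-15% short" of its target sits at a fraction of roughly 0.85-0.90.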
Nevertheless, GeForce GTX 280 is too close to GeForce 9800 GTX. With one texture, it's even outperformed by GeForce 8800 Ultra, although it has higher memory bandwidth! In such cases the cards are usually limited by video memory bandwidth. With more textures per pixel, ROPs reveal their worth a little better. In heavier conditions, the GT200-based card shoots forward (bearing in mind that the result of the dual-GPU card from NVIDIA is incorrect). As for the dual-GPU card from AMD, the new product outperforms it in all test modes. Let's have a look at the fill rate results:
The second synthetic test measures the fill rate. It shows the same situation adjusted for the number of pixels written into the frame buffer. With 0 or 1 texture, GeForce GTX 280 demonstrates a strangely low result. Performance in these modes is usually limited by memory bandwidth as well as by the number and frequency of ROPs, and the new solution is not lacking in either respect.
But the situation resembles the previous test: GeForce GTX 280 slightly outperforms its closest competitors only with many textures per pixel, although the difference should be bigger.
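The two limits named above, ROP throughput and memory bandwidth, can be put into a rough model. This is only a sketch under simplifying assumptions: one pixel per ROP per clock, a fixed amount of frame buffer traffic per pixel, and no compression or caching. The GeForce GTX 280 figures used (32 ROPs, 602 MHz, 141.7 GB/s) are published specifications, not measurements from this article, and all names are hypothetical.

```python
def rop_limit_mpix(rops: int, core_mhz: float) -> float:
    """Fill rate ceiling set by the ROPs, in Mpixels/s (1 pixel per ROP per clock)."""
    return rops * core_mhz


def bandwidth_limit_mpix(bandwidth_gbs: float, bytes_per_pixel: float) -> float:
    """Fill rate ceiling set by memory bandwidth, in Mpixels/s."""
    return bandwidth_gbs * 1000.0 / bytes_per_pixel


def fill_rate_bound(rops, core_mhz, bandwidth_gbs, bytes_per_pixel):
    """Return (ceiling in Mpixels/s, name of the limiting factor)."""
    rop = rop_limit_mpix(rops, core_mhz)
    bw = bandwidth_limit_mpix(bandwidth_gbs, bytes_per_pixel)
    return (rop, "ROP") if rop <= bw else (bw, "bandwidth")


if __name__ == "__main__":
    # Assumed GTX 280 specifications; bytes per pixel is an illustrative knob:
    # 4 bytes = 32-bit color write only, 8 bytes = color write plus Z write.
    for bpp in (4, 8):
        ceiling, limiter = fill_rate_bound(32, 602, 141.7, bpp)
        print(f"{bpp} bytes/pixel: ~{ceiling:.0f} Mpixels/s, limited by {limiter}")
```

With these assumed inputs the limiting factor flips from ROPs to bandwidth as the per-pixel traffic grows, which is why both are worth checking when a result looks strangely low.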
Direct3D 9: Geometry Processing Speed Tests
Let's analyze a couple of stress geometry tests. The first test uses the simplest vertex shader that shows maximum triangle throughput:
All these GPUs are based on unified architectures, and their unified processors are busy with geometry processing only in this test. So all solutions demonstrate high results, which are evidently limited not by peak performance of unified processors, but by performance of other units, for example, triangle setup.
Test results again confirm that AMD GPUs process geometry faster than NVIDIA GPUs, and dual-GPU solutions effectively double their frame rates in AFR mode. GeForce GTX 280 is outperformed by the dual-GPU cards, outperforms the G80-based card, and fares on a par with the fastest single-G92 card. So this test depends solely on GPU clock rates. Interestingly, test execution efficiency of GT200 in various modes resembles that of the G80 rather than the G92.
We've removed intermediate geometry tests with a single light source. So we proceed straight to the most complex geometry task with three light sources and static/dynamic branching:
In this case the difference between AMD and NVIDIA cards is more pronounced: the gap has grown wider. GeForce GTX 280 shows the best result among NVIDIA cards, slightly outperforming GeForce 9800 GTX and 8800 Ultra, except for the FFP test, which is of no interest to anyone now. On the whole, the new GPU performs well in these geometry tests. As for real applications, unified processors are busy mostly with pixels there. We proceed to such tests now.
Direct3D 9: Pixel Shaders Tests
The first group of pixel shaders to be reviewed here is too simple for modern GPUs. It includes various versions of pixel programs of relatively low complexity: 1.1, 1.4, and 2.0.
These tests are too easy for modern architectures and fail to reveal their true power. This is clearly visible in the first two tests (Wood and Psychodelic), where almost all solutions demonstrate identical results. Besides, performance in simple tests is limited by the texel rate. We can see it in the weak results of RADEON HD 3870 X2, which performs on a par with single-GPU cards from NVIDIA.
GeForce GTX 280 shows good results in more complex tests, outperforming the top G92-based card and the G80-based card. As the task grows more complex, GT200 breaks away from the old GPUs even further. However, it fails to catch up with the 9800 GX2 in all tests. Let's have a look at results in more complex pixel programs of intermediate versions:
The procedural water test (which depends heavily on texturing performance) uses dependent texture lookups of high nesting depth, so the cards line up strictly according to their texel rates, as in the first graph. The only RADEON card is outperformed by all cards based on G92, G80, and GT200, even though it's a dual-GPU solution. The graphics card under review is outperformed only by the dual-GPU 9800 GX2 card and is faster than the single-GPU cards from NVIDIA, in strict compliance with the theory.
The second test (arithmetic-intensive) apparently favors the R6xx and GT200 architectures with lots of arithmetic units. The AMD card demonstrates the best result in this test, followed by the dual-GPU card from NVIDIA. But the best news here is that GeForce GTX 280 is only a tiny bit slower! It's a good result: GT200 is 1.7 times as fast in this test as a single G92, just like NVIDIA mentioned in its presentations. And the 9800 GX2 suffers from low efficiency of SLI.
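Both figures of merit used in this paragraph, the speedup of one GPU over another and the scaling efficiency of a dual-GPU card, are simple ratios. The sketch below shows how they could be computed from frame rates. The scores in the example are hypothetical placeholders chosen only to illustrate the formulas, not results from our graphs, and the function names are illustrative.

```python
def speedup(new_score: float, old_score: float) -> float:
    """How many times faster the new result is than the old one."""
    return new_score / old_score


def multi_gpu_efficiency(multi_score: float, single_score: float, gpus: int = 2) -> float:
    """Fraction of ideal scaling achieved: 1.0 means a perfect N-fold gain."""
    return multi_score / (gpus * single_score)


if __name__ == "__main__":
    # Hypothetical frame rates, for illustration only.
    single_g92 = 100.0
    gt200 = 170.0
    dual_g92 = 175.0

    print("GT200 vs single G92:", speedup(gt200, single_g92))
    print("dual-GPU scaling efficiency:", multi_gpu_efficiency(dual_g92, single_g92))
```

For a fair efficiency figure, the single-GPU baseline should run at the same clocks as each GPU on the dual-GPU card, which is not the case for a retail 9800 GTX and 9800 GX2.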
Direct3D 9: New Pixel Shaders Tests
These tests of DirectX 9 pixel shaders are even more complex. They are divided into two categories. We'll start with the easier shaders - SM 2.0:
There are two modifications of these shaders: arithmetic-intensive and texture sampling intensive. Let's analyze the arithmetic-intensive modifications first, as they are more promising from the point of view of future applications:
Results of our graphics cards in the Frozen Glass test differ from results in the previous tests. Even though these are arithmetic tests, which depend on shader unit frequency, GeForce GTX 280 outperforms the 9800 GTX only a little, and the dual-GPU 9800 GX2 card is much faster than both. Performance seems to be limited not only by arithmetic, but also by the texel rate. RADEON HD 3870 X2 demonstrates the weakest result.
In return, the AMD card is noticeably faster in the second test, Parallax Mapping, although it's still outperformed by the best NVIDIA cards. But this time it's outperformed only by the new card and the dual-GPU solution. Improvements in TMUs and caches affect results of the GTX 280. It outperforms the dual-GPU RADEON and is only slightly slower than the similar solution based on two G92 chips. Let's analyze results obtained in the texture sampling intensive tests, where the G92-based cards should demonstrate higher relative results:
The situation has changed a little: performance is apparently limited by the speed of texture units. GeForce GTX 280 significantly outperforms the AMD card in all tests, and it's slightly faster than all single-GPU cards from NVIDIA. But GeForce 9800 GX2 is ahead of them all. It must be noted that all solutions execute arithmetic-intensive shaders 1.5-2 times as fast as their modifications with lots of texture lookups.
Let's have a look at results of another two pixel shader tests - SM 3.0. They are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches:
Although AMD cards efficiently execute complex Pixel Shaders 3.0 with a lot of branches, GeForce 9800 GTX performs on a par with the dual-RV670 card. This can be explained by faster bilinear texture lookups in the G9x architecture and by more efficient use of available resources, owing to the difference between scalar and superscalar architectures.
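The resource-usage argument can be illustrated with a toy model. It assumes the superscalar design issues bundles of up to five scalar operations per unit (as in the R6xx family) and that a scalar design issues one operation per ALU per clock. The instruction mix below is invented for illustration, and real shader compilers pack operations across instructions far more cleverly, so treat this as a sketch of the idea rather than a performance prediction.

```python
def utilization(ops_per_bundle, slots: int) -> float:
    """Average fraction of ALU slots doing useful work.

    ops_per_bundle lists, for each issued bundle, how many independent
    scalar operations were available to fill it.
    """
    used = sum(min(n, slots) for n in ops_per_bundle)
    return used / (slots * len(ops_per_bundle))


if __name__ == "__main__":
    # Hypothetical shader fragment: a vec4 op, a scalar op, a vec3 op,
    # another scalar op and a vec2 op, each issued as its own bundle.
    mix = [4, 1, 3, 1, 2]

    # 5-wide superscalar unit: slots the bundle cannot fill stay idle.
    print("5-wide utilization:", utilization(mix, slots=5))

    # Scalar design: every component is issued separately, so no slot idles.
    scalar_stream = [1] * sum(mix)
    print("scalar utilization:", utilization(scalar_stream, slots=1))
```

The scalar design needs more issue cycles for the same work, but none of its units sit idle, which is the kind of efficiency advantage referred to above.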
GeForce 9800 GX2 almost doubles its performance, being the leader in both tests. And GeForce GTX 280 naturally fares in between these solutions. We'd like to see a bigger performance difference between GT200 and G92, of course - at least 1.6-1.7 times.