Direct3D 10: PS 4.0 Tests (texturing, loops)
The new RightMark3D 2.0 includes two old PS 3.0 tests (from Direct3D 9), rewritten for DirectX 10, and two brand-new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase the load on the GPU.
These tests measure how efficiently a GPU executes looped pixel shaders with many texture lookups (up to several hundred per pixel in the heaviest mode!) and a relatively low ALU load. In other words, they measure texture sampling rate and branching efficiency in a pixel shader.
The first pixel shader test is Fur. At the lowest settings, it uses 15-30 texture lookups from a bump map and two lookups from the main texture. The High Effect Detail mode increases the number of bump-map lookups to 40-80. With shader supersampling enabled in the Low mode, the number of lookups grows to 60-120. The heaviest mode is High with SSAA: 160-320 lookups from a bump map.
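The lookup counts quoted above follow from simple arithmetic: supersampling multiplies the base range of each detail mode by four. A minimal sketch of that calculation is given below; the names `BASE_LOOKUPS` and `fur_lookups` are ours and purely illustrative, they are not part of RightMark3D.

```python
# Bump-map lookup ranges per pixel for the Fur test, as quoted in the text.
BASE_LOOKUPS = {"low": (15, 30), "high": (40, 80)}

# The text states that shader supersampling quadruples the load.
SUPERSAMPLING_FACTOR = 4


def fur_lookups(detail, supersampling=False):
    """Return the (min, max) bump-map lookups for a given test mode."""
    lo, hi = BASE_LOOKUPS[detail]
    factor = SUPERSAMPLING_FACTOR if supersampling else 1
    return lo * factor, hi * factor


if __name__ == "__main__":
    for detail in ("low", "high"):
        for ss in (False, True):
            print(detail, "SSAA" if ss else "no SSAA", fur_lookups(detail, ss))
```

The two lookups from the main texture are left out here, because the article gives the supersampled totals for the bump map only.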
Let's start with the modes without supersampling. They are relatively simple, and the relative standing of the cards should be similar in the Low and High modes.
Results in the High mode are almost 1.5 times lower than in the Low mode. Otherwise, the procedural Fur test for Direct3D 10, with its many texture lookups, again shows a huge advantage of NVIDIA solutions over AMD cards. Performance in this test depends not only on the number and speed of TMUs, but also on the fill rate and memory bandwidth. A comparison of the GeForce 9800 GTX and 8800 Ultra results proves it.
GeForce GTX 280 shows very good results in this test. It's only a little slower than the GeForce 9800 GX2 and outperforms the single-G92 card by 60-70%. Let's have a look at the results in this test with shader supersampling enabled, which quadruples the load. Perhaps it will change the situation, and memory bandwidth and fill rate will have a weaker effect:
Theoretically, supersampling quadruples the load. In practice, the performance drop is deeper for NVIDIA cards than for AMD cards, so the gap between them narrows, and the HD 3870 and its X2 modification gain some ground. However, NVIDIA cards still enjoy an overwhelming advantage.
Otherwise, as the shader grows more complex and the GPU load increases, the gap between the GeForce GTX 280 and all other NVIDIA cards becomes very large. The new GTX card now outperforms the old one by a factor of 2.5! That's the effect of an architecture modified for executing the most complex shaders. The dual-GPU 9800 GX2 is also beaten by a wider margin.
The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:
This test is more interesting from the practical point of view. Various parallax mapping methods have long been used in games. Heavy modifications, such as our steep parallax mapping, are already used in some projects, e.g. Crysis and Lost Planet. In addition to supersampling, our test can enable self-shadowing, which doubles the GPU load (High mode).
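To show why such a shader needs a variable number of lookups per pixel, here is a generic sketch of the steep parallax mapping loop, written in Python rather than HLSL for readability. It is not the RightMark3D shader: the `height` function, the one-dimensional texture coordinate, the `scale` parameter and the step limits of 10 and 50 (chosen only to mirror the range quoted in the text) are all our assumptions.

```python
import math


def height(u):
    """Hypothetical 1D height field in [0, 1]; stands in for one bump-map lookup."""
    return 0.5 + 0.5 * math.sin(8.0 * u)


def steep_parallax(u, view_dir, min_steps=10, max_steps=50, scale=0.1):
    """March a view ray through depth layers and return (hit_u, lookups).

    view_dir is a (tangent, normal) pair in tangent space with normal > 0.
    """
    vx, vz = view_dir
    # Use more layers at grazing angles (vz near 0), fewer when looking straight down.
    steps = round(max_steps + (min_steps - max_steps) * vz)
    layer_depth = 1.0 / steps
    # Texture-space shift per layer along the projected view direction.
    du = -vx / vz * scale * layer_depth
    depth = 0.0
    lookups = 0
    for _ in range(steps):
        lookups += 1                      # one height-map lookup per layer
        surface_depth = 1.0 - height(u)
        if depth >= surface_depth:        # the ray has gone below the surface
            break
        u += du
        depth += layer_depth
    return u, lookups


if __name__ == "__main__":
    print(steep_parallax(0.3, (0.0, 1.0)))     # looking straight down
    print(steep_parallax(0.3, (0.995, 0.1)))   # grazing angle
```

The loop exits as soon as the ray dips below the height field, so the lookup count depends on both the view angle and the surface, which is what makes this a test of dynamic branching. Self-shadowing can be implemented as a second, similar march from the hit point toward the light source, which would be consistent with the doubled load mentioned above.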
The situation is similar to what we saw in the previous test. Even though AMD solutions used to be strong in our Direct3D 9 parallax mapping tests, they cannot keep up with the GeForces in our updated D3D10 test without supersampling. Besides, self-shadowing causes a bigger performance drop in AMD products than in NVIDIA solutions.
GeForce GTX 280 is the fastest card here without supersampling; it outperforms even the GeForce 9800 GX2. It's more than twice as fast as the 9800 GTX and 8800 Ultra in the High mode. Let's see what supersampling changes. In the previous test, it caused a bigger performance drop on NVIDIA cards.
Together, supersampling and self-shadowing increase the load on graphics cards almost eightfold, causing a great performance drop. Performance differences between graphics cards change. Supersampling has the same effect here as in the Fur test: AMD cards improve their results relative to NVIDIA solutions. The HD 3870 is still outperformed by all GeForces, but the X2 card is almost on the same level as the 8800 Ultra and 9800 GTX.
Compared with the GeForce GTX 280, the old top cards based on a single G80 or G92 are both 2-3 times slower! The new card is also much faster than the dual-G92 solution in the High mode. It's another excellent result that shows how well GT200 copes with such complex tasks.
Direct3D 10: PS 4.0 Tests (computing)
The next pair of pixel shader tests contains a minimum of texture lookups to reduce the effect of TMU performance. They use a lot of arithmetic operations, so they measure the arithmetic performance of GPUs, that is, how fast they execute arithmetic instructions in pixel shaders.
The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.
We already noted in the analysis of our synthetic test results that the modern AMD architecture often performs better in complex arithmetic tasks than the competing architecture from NVIDIA. But time goes on and the situation changes: now the RADEON HD 3870 is outperformed by every GeForce. On the other hand, the HD 3870 X2 performs very well (thanks to AFR); it's almost on a par with the dual-GPU GeForce 9800 GX2.
But today we are interested in the performance of GeForce GTX 280, and it's brilliant. The card based on the new GT200 GPU almost catches up with the dual-GPU solutions of the previous generation, outperforming the "old" GeForce 8800 Ultra and the "almost new" GeForce 9800 GTX by 60-70%. This corresponds to the difference in pure shader power, that is, in the number of shader units and their clock rates.
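A rough way to estimate this difference in pure shader power is to multiply the number of shader units by the shader clock. The sketch below does exactly that. Note that the unit counts and clock rates are reference specifications we assume here; they are not taken from this article, and the product ignores any per-clock architectural differences between the GPUs.

```python
# Assumed reference specifications (NOT taken from the article):
# card name -> (number of shader units, shader clock in MHz)
SPECS = {
    "GeForce GTX 280": (240, 1296),
    "GeForce 9800 GTX": (128, 1688),
    "GeForce 8800 Ultra": (128, 1512),
}


def shader_throughput(card):
    """Peak shader throughput in arbitrary units: units multiplied by clock."""
    units, clock_mhz = SPECS[card]
    return units * clock_mhz


def relative_throughput(card, baseline):
    """How many times faster `card` is than `baseline` on paper."""
    return shader_throughput(card) / shader_throughput(baseline)


if __name__ == "__main__":
    for old in ("GeForce 9800 GTX", "GeForce 8800 Ultra"):
        ratio = relative_throughput("GeForce GTX 280", old)
        print(f"GTX 280 vs {old}: {ratio:.2f}x")
```

Under these assumed figures, the paper advantage of the GTX 280 comes out at roughly 1.4x to 1.6x, which is in the same ballpark as the measured gains.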
The second shader test is called Fire, and it's even harder on ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:
Rendering performance in this test is apparently limited by the speed of shader units. The bug in AMD drivers has been fixed since the rollout of the RADEON HD 3870 X2, so the results of AMD solutions now agree with the theory, and the RADEON HD 3870 outperforms all GeForce 8800 and 9800 cards in this test.
But GeForce GTX 280 is still faster. It outperforms older single-GPU cards from NVIDIA by a factor of 1.5, which is also close to the theoretical difference in shader performance. The RADEON HD 3870 X2 is the leader in this test. Perhaps new AMD solutions will take the lead in arithmetic tests.