Direct3D 10: PS 4.0 (texturing, loops)
RightMark3D 2.0 includes two old PS 3.0 tests (Direct3D 9), rewritten for DirectX 10, and two new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase their load on GPUs.
These tests measure the efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundred lookups per pixel in the heaviest mode) and a relatively low ALU load. In other words, they measure texture sampling rate and branching efficiency in a pixel shader.
The first pixel shader test is Fur. With the lowest settings, it uses 15-30 texture lookups from a bump map and two lookups from the main texture. The High Effect Detail mode increases the number of bump-map lookups to 40-80. When shader supersampling is enabled in the Low mode, the number of lookups grows to 60-120. The heaviest mode is High with SSAA: 160-320 lookups from a bump map.
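The lookup counts quoted for the Fur test follow from simple multiplication, and a minimal sketch can reproduce them. The base ranges and the fourfold supersampling factor come from the text; the names `FUR_BASE`, `SSAA_FACTOR` and `fur_lookups` are hypothetical and exist only for this illustration.

```python
# Bump-map lookups per pixel quoted in the text for the Fur test.
FUR_BASE = {"Low": (15, 30), "High": (40, 80)}
SSAA_FACTOR = 4  # the text states that shader supersampling quadruples the load


def fur_lookups(detail, supersampling=False):
    """Return the (min, max) bump-map lookups per pixel for a given mode."""
    lo, hi = FUR_BASE[detail]
    factor = SSAA_FACTOR if supersampling else 1
    return lo * factor, hi * factor


if __name__ == "__main__":
    for detail in ("Low", "High"):
        for ssaa in (False, True):
            label = "SSAA" if ssaa else "no SSAA"
            print(detail, label, fur_lookups(detail, ssaa))
```

The two lookups from the main texture are left out here, since the text quotes the ranges for bump-map lookups only.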
Let's start with the modes without supersampling. They are relatively simple, and the cards should rank in much the same order in the Low and High modes.
If we ignore the difference between AMD and NVIDIA solutions, performance in this test depends on the number and speed of TMUs as well as on fill rate and memory bandwidth. Results in the High mode are approximately 1.5 times lower than in the Low mode, as they should be. NVIDIA cards are traditionally strong in the Direct3D 10 Fur tests with their many texture lookups, and the new graphics cards from AMD can barely compete with them: the HD 5770 performs on a par with the GTS 250.
On the other hand, the HD 5770 is faster than the HD 4870, which is good news. The graphics card based on the old RV770 again ranks between the two models of the new HD 5700 series. The HD 5870 is almost twice as fast as the HD 5770, in full agreement with the theory. Let's have a look at the results in this test with shader supersampling enabled, which quadruples the load. Perhaps it will change the situation, and memory bandwidth and fill rate will have a weaker effect:
Theoretically, supersampling quadruples the load, and this time the Cypress-based card is exactly twice as fast as the top solution based on Juniper. Interestingly, the HD 4870 now competes only with the HD 5750, while the HD 5770 is even faster. Memory bandwidth has less effect on the results here, so performance seems to be limited by ALUs and branching efficiency. The GTS 250 loses a lot from supersampling and drops to last place, while the GTX 260 remains in second place. It is apparently stronger than the HD 5770 in this test.
The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:
This test is even more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavy modifications, such as our steep parallax mapping, are already used in some projects, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing that doubles the GPU load (High mode).
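To illustrate why such a shader needs a variable number of lookups per pixel, here is a rough sketch of a generic steep parallax ray march. It is not the RightMark shader: `sample_depth` is a hypothetical procedural stand-in for a bump-map lookup, and the step limits of 10 and 50 are assumed only because they match the low-mode lookup range quoted in the text. The `mode_lookups` helper applies the multipliers named in the article (x2 for self-shadowing, x4 for supersampling).

```python
import math


def sample_depth(u, v):
    """Hypothetical procedural depth map in [0, 1]; stands in for one bump-map lookup."""
    return 0.5 + 0.5 * math.sin(8.0 * u) * math.cos(8.0 * v)


def steep_parallax(u, v, view, min_steps=10, max_steps=50, scale=0.05):
    """March along the view ray through depth layers; return (u, v, lookups)."""
    vx, vy, vz = view  # unit tangent-space view vector, vz > 0
    # Use more layers at grazing angles (small vz), fewer when looking straight down.
    steps = round(max_steps - (max_steps - min_steps) * vz)
    layer = 1.0 / steps
    du = -vx / vz * scale * layer
    dv = -vy / vz * scale * layer
    depth, lookups = 0.0, 0
    for _ in range(steps):
        lookups += 1  # one texture lookup per loop iteration
        if sample_depth(u, v) <= depth:  # the ray has gone below the surface
            break
        u, v, depth = u + du, v + dv, depth + layer
    return u, v, lookups


def mode_lookups(base, self_shadowing=False, supersampling=False):
    """Scale a base lookup count by the multipliers quoted in the text."""
    return base * (2 if self_shadowing else 1) * (4 if supersampling else 1)


if __name__ == "__main__":
    print(steep_parallax(0.1, 0.2, (0.0, 0.0, 1.0)))
    print(steep_parallax(0.1, 0.2, (0.8, 0.0, 0.6)))
    print(mode_lookups(10, True, True), mode_lookups(50, True, True))
```

The early exit from the loop is the data-dependent branch that makes this kind of test sensitive to branching efficiency as well as to texture sampling rate.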
The situation is very close to what we saw in the previous test. The results are similar even in absolute values, except for the drop of the GTS 250. The RADEON HD 5870 is twice as fast as the new HD 5770, as it should be in the updated D3D10 version of the synthetic test without supersampling. The RADEON HD 5750 successfully competes with the GTS 250, while the HD 5770 cannot match its competitor, the GeForce GTX 260.
The new graphics card (HD 5770) falls just short of the old top HD 4870 in the light mode but wins in the heavy mode. Let's see what supersampling changes. In the previous test it caused a performance drop in NVIDIA cards, especially in the lower-end model.
Together, supersampling and self-shadowing increase the load on graphics cards almost eightfold, causing a great performance drop. The relative standing of the cards has changed as well. Supersampling has the same effect here as in the previous test: NVIDIA cards lose ground relative to AMD solutions, especially the GTS 250, which lags behind.
The difference between the HD 5770 and the HD 5870 is again approximately twofold, so our synthetic tests confirm the theoretical difference once more. The RADEON HD 4870 again performs closer to the lower HD 5750 from the new series. The HD 5770 is faster than both of them, but it is outperformed by the GeForce GTX 260.
Direct3D 10: PS 4.0 (computing)
The next two pixel shader tests contain a minimum of texture lookups to reduce the effect of TMU performance. They use a lot of arithmetic operations, so they measure the arithmetic performance of GPUs, that is, how fast they execute arithmetic instructions in pixel shaders.
The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.
We are tired of repeating in the analysis of our synthetic test results that the modern AMD architecture performs much better in complex arithmetic tasks than the competing products from NVIDIA. Readers are used to it by now. As you can see in the diagram, this still holds true: even the relatively weak new HD 5700 solutions easily outperform the corresponding NVIDIA cards in this test.
There is again a twofold performance difference between the HD 5770 and the HD 5870. Note the excellent result of the HD 4870. In theory, the HD 5770 should have outperformed it: both cards have the same number of ALUs, and the HD 4870 runs at a lower clock rate. Is it the effect of higher memory bandwidth or some hardware peculiarity of the GPU? Or is it a lack of driver optimizations for the RV8xx? Either way, we expected better results from Cypress and Juniper versus the RV770 in this test.
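The theoretical expectation behind this comparison can be sketched as a peak-throughput calculation. The ALU counts and core clocks below are not given in the article; they are the commonly published reference specifications and should be treated as assumptions, as should the simplification of counting one multiply-add (2 FLOPs) per ALU per clock. The names `CARDS` and `peak_gflops` are hypothetical.

```python
# Assumed reference specs (not stated in the article): (ALU count, core clock in MHz).
CARDS = {
    "HD 4870": (800, 750),
    "HD 5770": (800, 850),
    "HD 5870": (1600, 850),
}


def peak_gflops(alus, clock_mhz, flops_per_alu_per_clock=2):
    """Theoretical peak, counting one multiply-add (2 FLOPs) per ALU per clock."""
    return alus * clock_mhz * flops_per_alu_per_clock / 1000.0


if __name__ == "__main__":
    base = peak_gflops(*CARDS["HD 5770"])
    for name, spec in CARDS.items():
        gflops = peak_gflops(*spec)
        print(f"{name}: {gflops:.0f} GFLOPS, {gflops / base:.2f}x the HD 5770")
```

Under these assumptions the HD 5870 has exactly twice the peak of the HD 5770, and the HD 5770 sits somewhat above the HD 4870, which is why a measured result the other way round points to a bottleneck other than raw ALU throughput.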
Let's take a look at the second test called Fire. It's even heavier for ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:
In the second test, rendering performance is limited solely by shader units; there are no other restrictions. Perhaps the previous arithmetic test was affected by the higher memory bandwidth of the HD 4870 versus the HD 5770. This time the results are similar to those of the PS 3.0 tests, and the RADEON HD 5870 is almost twice as fast as the HD 5770.
As for the RADEON HD 5700 cards, they expectedly outperform the corresponding NVIDIA products, which are left far behind at almost half the speed. Peak arithmetic results remain unchanged once again: all AMD solutions have an indisputable advantage.