Direct3D 9: new pixel shader tests
These DirectX 9 pixel shader tests are even more complex, and they are divided into two categories. We'll start with the easier shaders - SM 2.0:
There are two modifications of these shaders: arithmetic-intensive and texture sampling intensive. Let's analyze the arithmetic-intensive modifications first, as they are more promising from the point of view of future applications:
These universal tests depend on both ALU speed and texturing speed, so it's the overall GPU balance that matters here. Performance in the Frozen Glass test is limited not only by arithmetic speed, but also by texel rate. This situation is similar to what we have seen in Cook-Torrance. The new HD 5870 is outperformed by the HD 4870 X2, and it's only twice as fast as the HD 4890 (memory bandwidth does not limit performance here). However, it's significantly faster than both cards from NVIDIA.
Results in the second test, Parallax Mapping, depend a little on memory bandwidth, and the HD 5870 gets close to the GTX 295. The performance gain over the RV770 at the same frequencies is even below 1.5-fold. That's the first sign that we should not expect twofold performance gains in all tests, especially in games, which are more complex than synthetic applications. The new GPU apparently cannot unlock its entire capacity in these tests of textural and arithmetic performance. Let's analyze results obtained in the texture sampling intensive tests to make sure our conclusions are correct:
The situation is similar, but the HD 5870 copes with texture lookups better, almost catching up with the HD 4870 X2 in both tests. It's a very good result, even though it's not twice as high as the result of the HD 4890 at the same frequencies. Performance is limited by TMU speed to a higher degree here, and this test reveals the texturing capacity of the HD 5870 much more readily than its arithmetic capacity.
Let's have a look at results of another two pixel shader tests -- SM 3.0. They are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches:
Here is another situation altogether! Neither PS 3.0 test depends on memory bandwidth at all, so the new solution shows its worth and outperforms dual-GPU cards. That is no surprise in the case of NVIDIA, but it also outperforms the HD 4870 X2 a little -- a very good result. The R7xx architecture already offered huge performance gains in PS 3.0 tests. The same applies to Cypress, as the RADEON HD 5870 is noticeably faster than the other contenders, especially NVIDIA cards.
With computing power more than doubled, the actual performance gain is only a little short of twofold here. That's very good, and we expect strong results in the other arithmetic tests. The overhauled architecture from AMD again demonstrates excellent results thanks to its large number of ALUs.
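The "short of twofold" reasoning used throughout this section can be written down as a simple ratio. Below is a minimal sketch in Python. The function name, the default theoretical ratio of 2.0 and the frame rates in the usage lines are our hypothetical illustrations, not measured results from the charts.

```python
def scaling_efficiency(old_fps, new_fps, theoretical_ratio=2.0):
    """Return the fraction of the theoretical speedup actually achieved.

    old_fps, new_fps  -- results of the old and new card in the same test
    theoretical_ratio -- assumed ratio of raw computing power (2.0 here is
                         an assumption standing for "doubled" power)
    """
    if old_fps <= 0 or theoretical_ratio <= 0:
        raise ValueError("old_fps and theoretical_ratio must be positive")
    measured_ratio = new_fps / old_fps
    return measured_ratio / theoretical_ratio


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(scaling_efficiency(100.0, 150.0))  # a 1.5-fold gain
    print(scaling_efficiency(100.0, 190.0))  # a gain just short of twofold
```

A value close to 1.0 means the extra ALUs are almost fully used. A 1.5-fold gain against a theoretical 2.0 gives 0.75, which suggests that something other than arithmetic power limits the result.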
Direct3D 10: PS 4.0 tests (texturing, loops)
RightMark3D 2.0 includes two old PS 3.0 tests (Direct3D 9), rewritten for DirectX 10, and two new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase their load on GPUs.
These tests measure the efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundred lookups per pixel in the heaviest mode) and a relatively low ALU load. In other words, they measure texture sampling rate and branching efficiency in a pixel shader.
The first pixel shader test will be the Fur test. When used with the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. When shader supersampling is enabled, the number of lookups grows to 60-120. The heaviest mode is the High mode with SSAA -- 160-320 lookups from a bump map.
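The lookup counts quoted above follow from simple multiplication. Here is a minimal sketch in Python. The function name is ours, and the fixed 4x supersampling factor is an assumption derived from the figures in the text, not taken from the RightMark3D source.

```python
# Bump-map lookup ranges per pixel for the Fur test, as quoted in the text.
FUR_LOW = (15, 30)
FUR_HIGH = (40, 80)

# Assumption: shader supersampling multiplies the lookup count by 4.
SUPERSAMPLING_FACTOR = 4


def with_supersampling(lookup_range, factor=SUPERSAMPLING_FACTOR):
    """Scale a (min, max) lookup range by the supersampling factor."""
    low, high = lookup_range
    return (low * factor, high * factor)


if __name__ == "__main__":
    print("Low + SSAA: ", with_supersampling(FUR_LOW))
    print("High + SSAA:", with_supersampling(FUR_HIGH))
```

The two additional lookups from the main texture are left out here, since the ranges in the text count bump-map lookups only.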
Let's see what happens in modes without supersampling - they are relatively simple, and the ratio between Low and High results should be similar for all cards.
Performance in this test depends on the number and speed of TMUs, and a little on the fill rate and memory bandwidth. Results in the High mode are approximately 1.5 times lower than in the Low mode. That's how it should be. NVIDIA cards are traditionally strong in Direct3D 10 Fur tests with lots of texture lookups. But this is the first time the new solution from AMD performs on a par with the GeForce GTX 285 and outperforms the HD 4870 X2.
The performance difference between the HD 4890 and the HD 5870 gets close to twofold. The RADEON HD 5870 is slower than only the dual-GPU solution from NVIDIA, which has always demonstrated excellent results here. Let's have a look at the results in this test with shader supersampling enabled, which quadruples the load. Perhaps it will change the situation, and memory bandwidth/fill rate will produce a weaker effect:
Theoretically, supersampling quadruples the load, and the HD 5870 is a tad stronger this time. Performance difference from the HD 4890 in the most complex situations reaches more than twofold, but the new card from AMD fails to catch up with the GTX 295. However, it outscores the GTX 285. Memory bandwidth does not affect test results here, so performance seems to be limited by ALUs and branching efficiency.
The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:
This test is even more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavy modifications, such as our steep parallax mapping, are already used in some projects, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing that doubles the GPU load (High mode).
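To illustrate why the number of lookups varies from pixel to pixel (10-50 in the low mode), here is a minimal Python sketch of the layer-by-layer ray march that steep parallax mapping relies on. It is not the RightMark3D shader. The 1D depth function, all names and the layer count are hypothetical, chosen only to show that the loop exits at a data-dependent point, which loads both texture sampling and branching.

```python
def march_depth_layers(depth_at, u_start, u_step, num_layers):
    """Step a view ray through depth layers until it dips below the surface.

    depth_at   -- callable returning surface depth in [0, 1] at coordinate u
                  (stands in for a bump/height map lookup)
    u_start    -- texture coordinate where the ray enters the surface volume
    u_step     -- coordinate shift per layer (depends on the view angle)
    num_layers -- maximum number of layers, i.e. the maximum lookup count

    Returns (u, lookups): the coordinate where marching stopped and the
    number of map lookups that were needed to get there.
    """
    u = u_start
    lookups = 0
    for layer in range(num_layers):
        surface_depth = depth_at(u)  # one texture lookup per iteration
        lookups += 1
        # Data-dependent branch: stop once the ray is at or below the surface.
        if layer / num_layers >= surface_depth:
            break
        u += u_step
    return u, lookups


if __name__ == "__main__":
    # Hypothetical flat surfaces at different depths, 8 layers.
    for depth in (0.0, 0.5, 1.0):
        _, lookups = march_depth_layers(lambda u, d=depth: d, 0.0, 0.125, 8)
        print(f"surface depth {depth}: {lookups} lookups")
```

Deeper surfaces and shallower view angles keep the ray in the loop longer, so neighbouring pixels can need very different numbers of lookups. A self-shadowing pass would presumably run a second march of this kind toward the light source, which is consistent with the doubled load mentioned above.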
The situation from the previous test repeats here, and even the absolute results are similar. The RADEON HD 5870 is the best among its relatives in the updated D3D10 version of the test without supersampling, leaving even the HD 4870 X2 behind. The new graphics card almost catches up with the single-GPU solution from NVIDIA in the light mode, and it wins in the heavy mode.
Let's see what supersampling will change. NVIDIA cards suffered a bigger performance drop from supersampling in the previous test.
Supersampling and self-shadowing increase the load on graphics cards by almost eight times, causing a great performance drop. Performance differences between the cards have changed. Supersampling has a similar effect here -- AMD cards improve their results relative to the NVIDIA solution.
Although the GTX 295 keeps its leading position, the RADEON HD 5870 is noticeably faster than the NVIDIA GeForce GTX 285 and much faster than the powerful HD 4870 X2. There is again an almost twofold performance difference between the HD 4890 and the HD 5870. The theoretically doubled power is noticeable in synthetic tests.