Direct3D 10: PS 4.0 tests (texturing, loops)
The new RightMark3D 2.0 includes two familiar PS 3.0 tests from Direct3D 9, rewritten for DirectX 10, and two brand-new tests. The first two tests can now enable self-shadowing and shader supersampling, both of which increase the load on GPUs.
These tests measure the efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundred lookups per pixel in the heaviest mode!) and a relatively low ALU load. In other words, they measure texture sampling rate and branching efficiency in a pixel shader.
The first pixel shader test is Fur. At the lowest settings it uses 15-30 texture lookups from a bump map and two lookups from the main texture. The High Effect Detail mode increases the number of bump map lookups to 40-80. Enabling shader supersampling at the lowest settings raises the number of lookups to 60-120. The heaviest mode is the High mode with SSAA -- 160-320 lookups from a bump map.
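The lookup counts above follow a simple pattern: each supersampled range is four times its non-supersampled counterpart. A minimal sketch of that arithmetic, assuming the factor of four applies uniformly to the bump map lookups (the function and constant names are hypothetical, not part of RightMark):

```python
from typing import Tuple

# Bump map lookups per pixel quoted in the article for the Fur test.
BASE_LOOKUPS = {"Low": (15, 30), "High": (40, 80)}
SSAA_FACTOR = 4  # shader supersampling quadruples the lookup count


def fur_lookups(detail: str, supersampling: bool) -> Tuple[int, int]:
    """Return the (min, max) bump map lookups per pixel for a Fur test mode."""
    low, high = BASE_LOOKUPS[detail]
    factor = SSAA_FACTOR if supersampling else 1
    return low * factor, high * factor


if __name__ == "__main__":
    for detail in BASE_LOOKUPS:
        for ssaa in (False, True):
            label = "SSAA" if ssaa else "no SSAA"
            print(detail, label, fur_lookups(detail, ssaa))
```

The two lookups from the main texture are left out of the sketch, since the article quotes them only for the lowest settings.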
Let's start with the modes without supersampling. They are relatively simple, and the ratio between the cards' results should be similar in the Low and High modes.
Unfortunately, we use a special version of RightMark 2.0 that has not been publicly released, and the multi-GPU rendering mode did not work in its D3D10 tests. The small difference between GTX 280 and GTX 280 SLI can be put down to measurement error. At least, this is what happens with the drivers NVIDIA provided for GeForce GTX 295. We regret to say that the rest of our article devoted to Direct3D 10 tests will be rather dull: we cannot compare GeForce GTX 295 with a single-GPU GTX 280 or with the HD 4870 X2, which will most likely win all the tests.
Everything worked fine with GeForce 9800 GX2, where the SLI mode demonstrated the expected advantage. But this time the drivers do not seem to enable SLI for our RightMark 2.0 executable. So we publish the results of this test only to show the performance ratios between GTX 295 and the SLI systems based on GTX 280 and GTX 260. Even though all of them render in single-GPU mode, their relative results will most likely hold for dual-GPU performance as well. Alas, comparisons with a single GTX 280 and RADEON HD 4870 X2 are impossible.
Performance in the Fur test depends not only on the number and speed of TMUs, but also on the fill rate and memory bandwidth. AMD fixed the bugs in its drivers some time ago, and now the dual-GPU card from AMD performs on a par with single-GPU NVIDIA cards in Direct3D 10 tests of procedural fur visualization with a lot of texture lookups.
Now let's look at the results in this test with shader supersampling enabled, which quadruples the load. Perhaps it will change the situation, and memory bandwidth and fill rate will have a weaker effect:
Theoretically, supersampling quadruples the load. In practice, almost nothing has changed in the standings this time -- the new dual-GPU card from AMD slightly outperforms single-chip cards from NVIDIA. It's a good thing AMD finally improved its results in this test -- they had been really bad. If only NVIDIA would enable SLI for RightMark again.
The second test that measures the efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. At low settings it uses 10-50 texture lookups from a bump map and three lookups from the main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples it. The most complex test mode, with both supersampling and self-shadowing, uses 80-400 texture lookups, that is, eight times as many as in the low mode. Let's analyze the simple modes without supersampling first:
Let's see what supersampling will change. In the previous test, the performance drop from supersampling was bigger on NVIDIA cards. Together, supersampling and self-shadowing increase the load on graphics cards almost eightfold, causing a large performance drop.
This test is more interesting from a practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavier variants, such as our steep parallax mapping, are already used in some projects, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing, which doubles the GPU load (High mode).
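To show why such a shader needs a variable number of texture lookups, and why self-shadowing roughly doubles them, here is a minimal CPU-side sketch of the general steep parallax mapping idea. It is not RightMark's shader: `depth_map`, `DEPTH_SCALE` and the rule for choosing the step count are assumptions made for illustration, and only the 10-50 range comes from the article.

```python
import math

MIN_STEPS, MAX_STEPS = 10, 50  # lookup range the article quotes for low settings
DEPTH_SCALE = 0.05             # assumed parallax strength, in texture units


def depth_map(u, v):
    """Hypothetical procedural depth texture in [0, 1] (0 = top of the surface)."""
    return 0.5 + 0.5 * math.sin(12.0 * u) * math.cos(12.0 * v)


def steps_for(cos_angle):
    """Assumed rule: more layers at grazing angles, fewer near the normal."""
    c = min(max(cos_angle, 0.0), 1.0)
    return round(MAX_STEPS - (MAX_STEPS - MIN_STEPS) * c)


def trace_view(u, v, view):
    """March the view ray down through depth layers until it enters the surface.

    Returns the hit coordinates, the ray depth at the hit and the lookups made.
    """
    vx, vy, vz = view                      # tangent-space vector towards the eye
    vz = max(vz, 0.05)                     # avoid division by zero at the horizon
    n = steps_for(vz)
    du, dv = -vx / vz * DEPTH_SCALE / n, -vy / vz * DEPTH_SCALE / n
    ray_depth, lookups = 0.0, 1
    surface = depth_map(u, v)              # first lookup
    while ray_depth < surface and lookups < n:
        u, v, ray_depth = u + du, v + dv, ray_depth + 1.0 / n
        surface = depth_map(u, v)          # one lookup per loop iteration
        lookups += 1
    return u, v, ray_depth, lookups


def trace_shadow(u, v, ray_depth, light):
    """March from the hit point towards the light; return (occluded, lookups)."""
    lx, ly, lz = light                     # tangent-space vector towards the light
    lz = max(lz, 0.05)
    n = steps_for(lz)
    du = lx / lz * DEPTH_SCALE * ray_depth / n
    dv = ly / lz * DEPTH_SCALE * ray_depth / n
    step = ray_depth / n                   # climb from the hit depth back to 0
    lookups = 0
    for _ in range(n):
        u, v, ray_depth = u + du, v + dv, ray_depth - step
        lookups += 1
        if depth_map(u, v) < ray_depth - 1e-3:  # ray passes through material
            return True, lookups
    return False, lookups


def pixel_lookups(u, v, view, light, self_shadowing=False):
    """Total depth-map lookups for one pixel, with or without self-shadowing."""
    hit_u, hit_v, depth, lookups = trace_view(u, v, view)
    if self_shadowing:
        _, shadow_lookups = trace_shadow(hit_u, hit_v, depth, light)
        lookups += shadow_lookups
    return lookups
```

In this sketch the loop exits as soon as the ray hits the surface, so the per-pixel count depends on both the viewing angle and the depth map, which is one plausible reason for quoting a range rather than a single number. The second march towards the light has a similar length, hence the roughly doubled load with self-shadowing.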
The results of the previous test repeat themselves here. The card from AMD wins when supersampling is enabled, even though self-shadowing causes a big performance drop in AMD products. Even in single-GPU mode, GeForce GTX 295 is not much slower than the HD 4870 X2, mostly in the heavy modes. Comparisons with the other cards from NVIDIA agree with the theoretical data -- GTX 295 ranks between GTX 280 and GTX 260.
Direct3D 10: PS 4.0 tests (computing)
The next two pixel shader tests contain a minimum of texture lookups to reduce the effect of TMU performance. They use a lot of arithmetic operations, so they measure the arithmetic performance of GPUs, that is, how fast they execute arithmetic instructions in pixel shaders.
The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.
As we always note when analyzing our synthetic test results, modern NVIDIA solutions often perform worse than competing cards from AMD in complex arithmetic tasks. Even if all NVIDIA solutions (except for a single GTX 280) had performed twice as fast, they would still have been defeated by RADEON HD 4870 X2, which demonstrated very high results in the Mineral test. GeForce GTX 295, with SLI not working, performs close to GTX 280 SLI.
The second shader test, Fire, is even harder on ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:
Rendering performance in this test is also limited solely by the speed of shader units, so it favors AMD architectures. The advantage of this company's solutions is even more impressive this time. Even if we double the results of GeForce GTX 295, it will still be over 1.5 times as slow as the dual-GPU card from AMD, which means the measured gap exceeds a factor of three. Still, with SLI disabled, GTX 295 demonstrates high results among NVIDIA cards.
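The instruction counts quoted for Mineral and Fire give a rough idea of how heavily these shaders lean on arithmetic. A minimal sketch, assuming that the number of sin/cos instructions per texture lookup is a fair proxy for the ALU-to-texture balance (the `ShaderProfile` class is hypothetical and not part of RightMark):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderProfile:
    name: str
    texture_lookups: int
    trig_instructions: int  # sin and cos instructions, as quoted in the article

    def trig_per_lookup(self) -> float:
        """Sin/cos instructions executed per texture lookup."""
        return self.trig_instructions / self.texture_lookups


MINERAL = ShaderProfile("Mineral", texture_lookups=2, trig_instructions=65)
FIRE = ShaderProfile("Fire", texture_lookups=1, trig_instructions=130)

if __name__ == "__main__":
    for shader in (MINERAL, FIRE):
        print(f"{shader.name}: {shader.trig_per_lookup():.1f} sin/cos per lookup")
```

By this measure Fire does four times as much trigonometry per lookup as Mineral, while the Fur and Steep Parallax Mapping tests sit at the opposite extreme with tens to hundreds of lookups per pixel.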