Direct3D 10: PS 4.0 tests (texturing, loops)
RightMark3D 2.0 includes two old PS 3.0 tests (Direct3D 9), rewritten for DirectX 10, and two brand new tests. The first two tests can now enable self-shadowing and shader supersampling, both of which increase the load on GPUs.
These tests measure the efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundred lookups per pixel in the heaviest mode) and a relatively low ALU load. In other words, they measure texture sampling rate and branching efficiency in a pixel shader.
The first pixel shader test is Fur. At the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. When shader supersampling is enabled at the lowest settings, the number of lookups grows to 60-120. The heaviest mode is the High mode with SSAA -- 160-320 lookups from bump maps.
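The lookup counts above follow from supersampling multiplying the per-pixel shader work by four. Here is a minimal Python sketch of that arithmetic, using only the ranges quoted in the text. The names `FUR_LOOKUPS` and `with_supersampling` are made up for illustration and are not part of RightMark3D; the two lookups from the main texture are left out.

```python
# Bump-map lookups per pixel in the Fur test, as (min, max) ranges quoted in the text.
FUR_LOOKUPS = {
    "Low": (15, 30),
    "High": (40, 80),
}

SSAA_FACTOR = 4  # shader supersampling quadruples the load, per the text


def with_supersampling(lookups, factor=SSAA_FACTOR):
    """Scale a (min, max) lookup range by the supersampling factor."""
    low, high = lookups
    return (low * factor, high * factor)


if __name__ == "__main__":
    for mode, rng in FUR_LOOKUPS.items():
        ss = with_supersampling(rng)
        print(f"{mode}: {rng[0]}-{rng[1]} lookups, with SSAA: {ss[0]}-{ss[1]}")
```

Running it reproduces the 60-120 and 160-320 ranges for the two supersampled modes.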
Let's see what happens in modes without supersampling -- they are relatively simple, and the ratios between results in Low and High modes should be similar.
Results in High mode are lower than in Low mode by a factor of less than 1.5, and the ratios between cards are similar in both modes. Performance in this test depends not only on the number and speed of TMUs, but also on fill rate and memory bandwidth. On the whole, the procedural Fur test for Direct3D 10, with its many texture lookups, shows some advantage for NVIDIA solutions even over dual-GPU AMD cards.
GeForce GTX 285 beats the older GTX 280 (by 12-13%, which agrees with the theoretical figures) as well as HD 4850 X2 and HD 4870 X2 in both modes. Let's take a look at the results in this test with shader supersampling enabled, which quadruples the load. Perhaps it will change the situation -- memory bandwidth and fill rate should have a weaker effect there:
Theoretically, supersampling quadruples the load, but the performance drop on NVIDIA cards is traditionally deeper than on AMD cards. The performance gap between solutions from the two manufacturers narrows, and GTX 285 performs on a par with HD 4850 X2. However, HD 4870 X2 is still faster.
The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:
This test is more interesting from a practical point of view. Various parallax mapping methods have long been used in games. Heavier variants, such as our steep parallax mapping, are already used in some projects, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing, which doubles the GPU load (High mode).
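The mode multipliers described for Steep Parallax Mapping compose multiplicatively. The sketch below enumerates all four combinations, assuming the ideal scaling stated in the text (x2 for self-shadowing, x4 for supersampling). The function names are hypothetical and only illustrate the arithmetic; the three lookups from the main textures are ignored.

```python
from itertools import product

BASE_BUMP_LOOKUPS = (10, 50)  # (min, max) bump-map lookups in Low mode, per the text


def load_multiplier(self_shadowing, supersampling):
    """Theoretical multiplier on bump-map lookups for a given mode."""
    multiplier = 1
    if self_shadowing:
        multiplier *= 2  # self-shadowing doubles the lookups
    if supersampling:
        multiplier *= 4  # supersampling quadruples them
    return multiplier


def bump_lookups(self_shadowing, supersampling):
    """Return the (min, max) bump-map lookup range for a given mode."""
    m = load_multiplier(self_shadowing, supersampling)
    return (BASE_BUMP_LOOKUPS[0] * m, BASE_BUMP_LOOKUPS[1] * m)


if __name__ == "__main__":
    for shadow, ssaa in product((False, True), repeat=2):
        lo, hi = bump_lookups(shadow, ssaa)
        print(f"self-shadowing={shadow!s:5} supersampling={ssaa!s:5} -> {lo}-{hi} lookups")
```

With both options enabled the multiplier is 8, which gives the 80-400 range quoted for the heaviest mode.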
We've got the same situation as in the previous test, and GeForce GTX 285 breaks away from its competitors even further. Its lead over GTX 280 now reaches 15%. Perhaps this test is limited by the speed of stream processors to a greater degree. In the updated D3D10 version of the parallax mapping test without supersampling, AMD cards still cope with the task worse than NVIDIA products. Besides, self-shadowing causes a bigger performance drop on AMD products than on NVIDIA solutions.
Let's see what supersampling will change. In the previous test, the performance drop from supersampling was bigger on NVIDIA cards.
Together, supersampling and self-shadowing increase the load on graphics cards almost eightfold, causing a great performance drop. Performance ratios between graphics cards have changed, and supersampling has the same effect here as in the previous test -- AMD cards improve their results relative to NVIDIA solutions.
As for the comparison of the updated GeForce GTX 285 with the former top card, the new card outperforms its predecessor by 13-14%, which agrees with the theoretical figures if test speed is limited by arithmetic performance. The graphics card copes with the test well in all modes. It performs on a par with its dual-GPU competitors; only in Low mode is the new card slightly outperformed by HD 4870 X2, which is not its direct competitor.
Direct3D 10: PS 4.0 tests (computing)
The next pair of pixel shader tests contain a minimum of texture lookups to reduce the effect of TMU performance. They use a lot of arithmetic operations, so they measure the arithmetic performance of GPUs, that is, how fast they execute arithmetic instructions in pixel shaders.
The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.
We have already noted in our articles that the modern AMD architecture often performs better in complex arithmetic tasks than the competing architecture from NVIDIA. That's what happened this time as well. Owing to their dual-GPU design (which works here thanks to AFR mode), AMD cards are more than twice as fast as both GeForce cards.
GeForce GTX 285 is faster than GeForce GTX 280 by 15%, which agrees with the theoretical difference in stream processor frequencies. Unfortunately, a graphics card with a single GT200b cannot compete with dual-GPU cards from AMD: the RV770 architecture is too good at arithmetic.
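The theoretical difference mentioned here can be estimated from the ratio of shader clocks. The sketch below assumes reference shader-domain clocks of 1296 MHz for GeForce GTX 280 and 1476 MHz for GeForce GTX 285. These figures are not given in this section; they are taken from the cards' published reference specifications, so treat them as an assumption (retail boards may be clocked differently). The function name is illustrative only.

```python
# Assumed reference shader clocks in MHz (not stated in this section of the article).
SHADER_CLOCK_MHZ = {
    "GeForce GTX 280": 1296,
    "GeForce GTX 285": 1476,
}


def theoretical_gain_percent(old_clock_mhz, new_clock_mhz):
    """Expected speedup in percent if performance scales linearly with clock."""
    return (new_clock_mhz / old_clock_mhz - 1.0) * 100.0


if __name__ == "__main__":
    gain = theoretical_gain_percent(
        SHADER_CLOCK_MHZ["GeForce GTX 280"],
        SHADER_CLOCK_MHZ["GeForce GTX 285"],
    )
    print(f"Theoretical ALU-bound gain: {gain:.1f}%")
```

Under these assumed clocks the estimate comes out a little under 14%, which is in the same range as the gains measured in the arithmetic-bound tests.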
The second shader test is called Fire, and it's even harder on ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:
Rendering performance in the second test is also limited solely by the speed of shader units. Performance difference between NVIDIA and AMD solutions is even bigger here. The latter cards would be faster even if they had a single GPU.
GeForce GTX 285 outperforms GTX 280 by more than 16%, which is close to the theoretical difference in shader performance caused by the increased frequencies. But the new solution still cannot compete with RADEON HD 4850 X2 and HD 4870 X2.
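To put the instruction mix of the two computing tests side by side, the sketch below computes how many sin/cos instructions each shader issues per texture lookup, using only the counts quoted in the text. The `ShaderMix` class is a hypothetical helper for illustration; it says nothing about how RightMark3D itself is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderMix:
    name: str
    texture_lookups: int
    trig_instructions: int  # sin/cos instructions per pixel, per the text

    @property
    def trig_per_lookup(self):
        """sin/cos instructions issued per texture lookup."""
        return self.trig_instructions / self.texture_lookups


MINERAL = ShaderMix("Mineral", texture_lookups=2, trig_instructions=65)
FIRE = ShaderMix("Fire", texture_lookups=1, trig_instructions=130)

if __name__ == "__main__":
    for test in (MINERAL, FIRE):
        print(f"{test.name}: {test.trig_per_lookup:.1f} sin/cos per texture lookup")
```

By this simple measure Fire is four times as arithmetic-heavy per lookup as Mineral, which fits the observation that it is limited by the speed of shader units.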