Direct3D 10: PS 4.0 texturing, loops
The new RightMark3D 2.0 includes two old PS 3.0 tests (Direct3D 9), rewritten for DirectX 10, as well as two brand new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase their load on GPUs.
These tests measure how efficiently GPUs execute looped pixel shaders with many texture lookups (up to several hundred per pixel in the heaviest mode) and a relatively low ALU load. In other words, they measure texture sampling rate and branching efficiency in a pixel shader.
The first pixel shader test is the Fur test. With the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. When shader supersampling is enabled, the number of lookups grows to 60-120. The heaviest mode is the High mode with SSAA -- 160-320 lookups from a bump map.
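The lookup counts above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the 60-120 figure refers to the lowest settings with supersampling, and that supersampling multiplies the shader work by four (the factor the article mentions for this test). The names FUR_LOOKUPS and with_supersampling are made up for this example.

```python
# Bump-map lookups per pixel in the Fur test, as quoted in the text.
# Keys are (effect detail, supersampling enabled). The two main-texture
# lookups are left out. Mapping 60-120 to "low + supersampling" is an assumption.
FUR_LOOKUPS = {
    ("low", False): (15, 30),
    ("high", False): (40, 80),
    ("low", True): (60, 120),
    ("high", True): (160, 320),
}

SSAA_FACTOR = 4  # assumption: supersampling quadruples the shader work


def with_supersampling(lookup_range, factor=SSAA_FACTOR):
    """Scale a (min, max) lookup range by the supersampling factor."""
    low, high = lookup_range
    return low * factor, high * factor


for detail in ("low", "high"):
    base = FUR_LOOKUPS[(detail, False)]
    scaled = with_supersampling(base)
    quoted = FUR_LOOKUPS[(detail, True)]
    print(detail, base, "->", scaled, "| matches quoted figure:", scaled == quoted)
```

Under these assumptions both supersampled ranges come out as exactly four times the corresponding base range.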
Let's see what happens in modes without supersampling -- they are relatively simple, and the ratio between the Low and High results should be similar from card to card.
In this test performance depends on both the number and efficiency of TMUs and on fillrate and memory bandwidth (to a lesser degree). The results of the High effect detail part are about 1.5 times lower than those of the Low effect detail part -- all according to the theory. NVIDIA solutions are traditionally strong in the Direct3D 10 Fur test with a lot of texture fetches, but AMD is catching up.
GeForce GTX 480 is about 1/3 faster than GeForce GTX 285, but still lags behind GeForce GTX 295 -- as it did in the DX9 tests. This shows the effect of fillrate and memory bandwidth, an area where NVIDIA's new card has an advantage over the previous-generation single-GPU card. GF100's position relative to the two RV870-based products is similar.
Now take a look at the results of the same test with shader supersampling enabled, which should increase the load on the graphics cards fourfold. Perhaps it will somewhat reduce the effect of fillrate and memory bandwidth.
Strangely enough, this time GeForce GTX 480 falls behind, while both Radeons do a bit better. The difference between GeForce GTX 480 and GeForce GTX 285 is very small, meaning that performance is most likely limited by texturing, or by memory bandwidth -- that of GeForce GTX 480 is not much higher than that of GeForce GTX 285. What this test doesn't show is the effect of ALU performance and efficient branching.
The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:
This test is more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavy modifications, such as our steep parallax mapping, are already used in some projects, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing that doubles the GPU load (the High mode).
The diagram is very similar to the previous one, even down to absolute results. In the non-supersampling variant of the test GeForce GTX 480 does a bit better than the previous-generation single-GPU card, but it still loses to the dual-GPU GeForce GTX 295. GeForce GTX 480 also slightly outperforms Radeon HD 5870, but AMD's dual-GPU solution is still the absolute winner.
Let's see how supersampling will affect the results. NVIDIA products have always suffered a bit more from it.
Supersampling and self-shadowing increase the load on graphics cards by almost eight times, causing a great performance drop. Performance differences between graphics cards change as well. Supersampling has a similar effect here as in the Fur test -- AMD cards improve their results relative to NVIDIA solutions.
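The eightfold figure follows directly from the lookup counts quoted for Steep Parallax Mapping. A minimal sketch of that arithmetic, using only the numbers from the text (10-50 bump-map lookups at low settings, x2 for self-shadowing, x4 for supersampling); the function name spm_bump_lookups is hypothetical, and the three main-texture lookups are ignored:

```python
BASE_BUMP_LOOKUPS = (10, 50)  # low settings, from the text
SELF_SHADOW_FACTOR = 2        # self-shadowing doubles the lookups
SSAA_FACTOR = 4               # supersampling quadruples them


def spm_bump_lookups(self_shadowing=False, supersampling=False):
    """Return the (min, max) bump-map lookups per pixel for a test mode."""
    factor = 1
    if self_shadowing:
        factor *= SELF_SHADOW_FACTOR
    if supersampling:
        factor *= SSAA_FACTOR
    low, high = BASE_BUMP_LOOKUPS
    return low * factor, high * factor


for shadow in (False, True):
    for ssaa in (False, True):
        print("self-shadowing:", shadow, "| supersampling:", ssaa,
              "->", spm_bump_lookups(shadow, ssaa))
```

With both options enabled the range becomes 80-400, eight times the low-mode range, which agrees with the figures given for the most complex mode.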
Both dual-GPU cards outperform GeForce GTX 480. But this time the new card also loses a bit to Radeon HD 5870, its direct rival. This is probably what we'll see in games -- in some cases GeForce GTX 480 will considerably outperform the competitor, and in others it will lose a bit. Well, at least GeForce GTX 480 outperforms its predecessor -- noticeably in the light-load mode, and slightly in the heavy-load mode. Unfortunately, architectural changes haven't provided any special boost in these tests.
Direct3D 10: PS 4.0 computing
The next two pixel shader tests contain a minimum of texture lookups to reduce the effect of TMU performance. They use a lot of arithmetic operations, so they measure the arithmetic performance of GPUs, that is, how fast they execute arithmetic instructions in pixel shaders.
The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.
Math tests should demonstrate major changes, because GF100 has doubled ALU performance compared to GT200. However, theoretically, AMD cards should be even faster in these synthetic tests, because AMD's architecture is better suited to complex computing tasks. This time is no exception. Though GeForce GTX 480 is catching up, the difference between NVIDIA and AMD is still more than 1.5 times.
It's also interesting to compare the new card with GeForce GTX 285 and GeForce GTX 295. This time it can neither outperform the former by 2 times nor even outperform the latter. This confirms that this test doesn't depend on ALU performance alone, though the difference in memory bandwidth isn't the only reason either. GF100 gains only 38% over GeForce GTX 285, which is strangely poor.
The second shader test is called Fire, and it's even harder on ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:
In the second test rendering performance is almost solely limited by the performance of shader units. However, the difference between GeForce GTX 285 and GeForce GTX 480 is too small -- just 58% -- though, theoretically, it should be close to 100%. But at least the novelty catches up with GeForce GTX 295, something it couldn't do in the previous test. Anyway, the competing Radeon HD 5870, not to mention Radeon HD 5970, does even better.
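To put the two measured gains side by side with the theoretical one, here is a small sketch. It uses only the figures quoted in the text (38% in Mineral, 58% in Fire, and a theoretical doubling of ALU performance over GT200); the names THEORETICAL_SPEEDUP and realized_fraction are assumptions made for this illustration.

```python
THEORETICAL_SPEEDUP = 2.0  # GF100 vs GT200 ALU performance, per the text

# Measured gain of GeForce GTX 480 over GeForce GTX 285, in percent.
MEASURED_GAIN_PERCENT = {"Mineral": 38, "Fire": 58}


def realized_fraction(gain_percent, theoretical=THEORETICAL_SPEEDUP):
    """Share of the theoretical extra performance that was actually measured."""
    theoretical_gain = theoretical - 1.0  # a 2x speedup is a 100% gain
    return (gain_percent / 100.0) / theoretical_gain


for test_name, gain in MEASURED_GAIN_PERCENT.items():
    share = realized_fraction(gain)
    print(f"{test_name}: +{gain}% measured, {share:.0%} of the theoretical gain")
```

In other words, even in the more ALU-bound Fire test the card delivers a bit more than half of the gain that the doubled ALU throughput would suggest.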
Let's draw an intermediate conclusion on the D3D10 math tests. All NVIDIA cards lose; even the new GeForce GTX 480 loses by about 2 times under peak synthetic load. Never mind that, theoretically, it should be twice as fast as GeForce GTX 285: the real gains are much smaller.
In other words, the peak-load math tests are still won by AMD, and even the rollout of the GeForce GTX 400 series cannot change that.
But let's proceed to geometry shaders. The novelty promises to be very strong in those tests.