Direct3D 10: PS 4.0 tests (texturing, loops)
The new RightMark3D 2.0 includes two old PS 3.0 tests (Direct3D 9), rewritten for DirectX 10, and two brand new tests. The first two tests can now enable self-shadowing and shader supersampling, both of which increase the load on GPUs.
These tests measure the efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundred lookups per pixel in the heaviest mode!) and a relatively low ALU load. In other words, they measure texture sampling rate and branching efficiency in a pixel shader.
The first pixel shader test is the Fur test. With the lowest settings, it uses 15-30 texture lookups from a bump map and two lookups from the main texture. The High Effect Detail mode increases the number of bump map lookups to 40-80. When shader supersampling is enabled in the Low mode, the number of lookups grows to 60-120. The heaviest mode is the High mode with SSAA: 160-320 lookups from a bump map.
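The lookup counts above follow from simple arithmetic, which the following minimal sketch reproduces. It assumes, as the text states, that supersampling multiplies the per-pixel lookup count by four; the names `BASE_LOOKUPS`, `SUPERSAMPLING_FACTOR` and `fur_lookups` are hypothetical and are not part of RightMark3D.

```python
# Bump-map lookup ranges per pixel quoted in the text for the Fur test.
BASE_LOOKUPS = {"Low": (15, 30), "High": (40, 80)}

# Assumption taken from the text: supersampling quadruples the lookup count.
SUPERSAMPLING_FACTOR = 4


def fur_lookups(detail, supersampling=False):
    """Return the (min, max) bump-map lookups per pixel for a test mode."""
    lo, hi = BASE_LOOKUPS[detail]
    factor = SUPERSAMPLING_FACTOR if supersampling else 1
    return lo * factor, hi * factor


if __name__ == "__main__":
    for detail in ("Low", "High"):
        for ss in (False, True):
            label = "SSAA" if ss else "no SSAA"
            print(detail, label, fur_lookups(detail, ss))
```

The two lookups from the main texture are left out here, because the text does not say how they scale with the settings.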
Let's start with the modes without supersampling. They are relatively simple, and the ratio between results in the Low and High modes should be about the same for all cards.
Performance in this test depends mostly on the number and speed of TMUs, and a little on the fill rate and memory bandwidth. Results in the High mode are approximately 1.5 times lower than in the Low mode, as they should be. NVIDIA cards are traditionally strong in Direct3D 10 Fur tests with lots of texture lookups. But the new solution from AMD performs on a par with GeForce 9800 GT, with which it competes in this price range, just as in the DX9 texturing tests.
At its nominal frequencies, RADEON HD 4770 performs on a par with GeForce 9800 GT and outperforms the old HD 4830. The equally-clocked HD 4770 and HD 4830 demonstrate identical results, as they theoretically should, which confirms that the cards have the same architecture.
Let's have a look at the results in this test with shader supersampling enabled, which quadruples the load. Perhaps it will change the situation, and memory bandwidth and fill rate will have a weaker effect.
In theory, supersampling quadruples the load, and this time the advantage of the NVIDIA card is gone. It's outperformed by the HD 4830 (just a little) and by the new HD 4770. As the shader grows more complex and the GPU load increases, the performance difference decreases, but it remains significant. The HD 4770 operating at its nominal frequencies is much faster than the HD 4830. When operating at the same frequencies, the new solution is just a bit faster (although a 2-3% difference is within the measurement error).
The second test that measures the efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples it. The most complex test mode, with both supersampling and self-shadowing, uses 80-400 texture lookups, eight times as many as in the low mode. Let's analyze the simple modes without supersampling first.
This test is even more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time. Heavy modifications, such as our steep parallax mapping, are already used in some projects, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing, which doubles the GPU load (High mode).
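To illustrate why such a shader needs a variable number of bump map lookups per pixel, here is a minimal CPU-side sketch of the general steep parallax mapping idea: a linear search along the view ray through depth layers, with an early exit once the ray drops below the surface. It is not the RightMark3D shader. The height function, the `scale` value and the rule that picks between 10 and 50 steps depending on the view angle are assumptions made for illustration.

```python
import math


def height(u):
    """Hypothetical height field in [0, 1]; stands in for one bump-map lookup."""
    return 0.5 + 0.5 * math.sin(8.0 * u)


def steep_parallax(u, view_x, view_z, scale=0.1, min_steps=10, max_steps=50):
    """March the view ray through depth layers; return (u_hit, lookups).

    view_x, view_z are tangent-space components of a normalized view
    vector, with view_z in (0, 1] pointing away from the surface.
    """
    # Assumed heuristic: few steps when looking straight down (view_z = 1),
    # many steps at grazing angles (view_z near 0).
    steps = round(max_steps + (min_steps - max_steps) * view_z)
    layer_depth = 1.0 / steps
    # Texture-coordinate shift per layer along the projected view ray.
    du = -view_x / view_z * scale / steps

    lookups = 0
    ray_height = 1.0  # the ray starts at the top of the height volume
    for _ in range(steps):
        lookups += 1  # one height-map lookup per loop iteration
        if height(u) >= ray_height:
            break  # the ray is at or below the surface: stop searching
        ray_height -= layer_depth
        u += du
    return u, lookups


if __name__ == "__main__":
    print(steep_parallax(0.0, 0.0, 1.0))     # looking straight down
    print(steep_parallax(0.0, 0.995, 0.1))   # grazing angle
```

The loop with a data-dependent exit is what makes this kind of test sensitive to branching efficiency as well as to texture sampling rate. A self-shadowing pass would typically repeat a similar march toward the light source, which is consistent with the doubled lookup count mentioned above.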
The situation resembles what we saw in the previous test with supersampling enabled. RADEON HD 4770 is the best in the updated D3D10 version of the test without supersampling, leaving GeForce 9800 GT and HD 4830 behind. This time, results of the GeForce card are identical to results of AMD cards operating at the same frequencies. By the way, the latter demonstrate similar results, just as they theoretically should.
Let's see what supersampling will change. The performance drop from supersampling was bigger for the NVIDIA card in the previous test.
Supersampling and self-shadowing together increase the load on graphics cards almost eightfold, causing a great performance drop. Performance differences between the cards have changed. Supersampling has the same effect here as in the previous test: AMD cards improve their results relative to the NVIDIA solution.
RADEON HD 4770 is twice as fast as NVIDIA GeForce 9800 GT, and it's much faster than the HD 4830. Comparison of the RV740 and RV770LE operating at the same frequencies again proves that RV740 and RV770 have the same architecture.
Direct3D 10: PS 4.0 tests (computing)
The next couple of pixel shader tests contain a minimum of texture lookups to reduce the effect of TMU performance. They use a lot of arithmetic operations, so they measure the arithmetic performance of GPUs, that is, how fast they execute arithmetic instructions in pixel shaders.
The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.
We always note in the analysis of our synthetic test results that the modern AMD architecture performs better in complex arithmetic tasks than the competing products from NVIDIA. Our tests confirm it one more time. The HD 4770 is almost twice as fast as the equally-priced GeForce 9800 GT. The new RADEON demonstrates excellent performance, on a par with competing graphics cards from a higher price range.
But that was to be expected. More interesting results appeared when we compared the HD 4770 and HD 4830 at equal frequencies. The difference of 5% is a tad bigger than a measurement error. It may be caused by different driver optimizations for different graphics cards. Alternatively, this test may also depend a little on video memory bandwidth, as we can see in the comparison of the HD 4830 at different frequencies. Or perhaps it's the effect of the higher latencies of GDDR5 memory.
The second shader test is called Fire, and it's even harder on ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows.
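One rough way to compare how ALU-heavy the two computing tests are is to divide the number of sin/cos instructions by the number of texture lookups, using only the figures quoted above. The sketch below does that. The names `TESTS` and `sincos_per_lookup` are hypothetical, and this ratio is only an illustration, not a metric reported by RightMark3D.

```python
# Instruction and lookup counts quoted in the text for the two computing tests.
TESTS = {
    "Mineral": {"sincos": 65, "lookups": 2},
    "Fire": {"sincos": 130, "lookups": 1},
}


def sincos_per_lookup(name):
    """Return sin/cos instructions per texture lookup for a test."""
    test = TESTS[name]
    return test["sincos"] / test["lookups"]


if __name__ == "__main__":
    for name in TESTS:
        print(name, sincos_per_lookup(name))
    # Fire does four times as much trigonometry per lookup as Mineral.
    print(sincos_per_lookup("Fire") / sincos_per_lookup("Mineral"))
```

By this measure Fire has four times the arithmetic per texture lookup, so TMU speed and memory bandwidth should matter even less in it than in Mineral.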
Rendering speed in the second test is limited more by shader performance. The results are similar overall, but RADEON HD 4770 demonstrates better relative performance in this test than in the previous one, which suggests that the card was limited by memory in the first test. But the difference is not very big. This time HD 4770 is more than twice as fast as GeForce 9800 GT! Equally-clocked solutions based on RV740 and RV770LE show similar results, which agrees with the theory. Our conclusion after the arithmetic tests remains the same: AMD solutions rule supreme here.