Direct3D 10: PS 4.0 tests (computing)
The next two pixel shader tests keep texture lookups to a minimum to reduce the effect of TMU performance. They consist mostly of arithmetic operations, so they measure how fast GPUs execute arithmetic instructions in pixel shaders.
The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.
We always note in the analysis of our synthetic test results that the modern AMD architecture performs better in complex arithmetic tasks than the competing products from NVIDIA. Our tests confirm it one more time. The new HD 5870 is twice as fast as the best single-GPU card from NVIDIA, and it even outperforms the dual-GPU card from the same company.
The comparison with the HD 4890 and HD 4870 X2 is interesting: the new card is neither twice as fast as the single-GPU solution, nor faster than the prev-gen dual-GPU card. Perhaps the test is not completely dependent on ALU speed. But it does not depend on memory bandwidth either. A performance gain of only 44% over the HD 4890 is very strange.
But let's take a look at the second test called Fire. It's even heavier for ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:
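To show what kind of load this is, here is a minimal Python sketch of a procedural pattern built only from sin/cos terms, with no texture reads at all. It is not the Mineral or Fire shader: the function name, the frequency progression and the weighting are our assumptions, chosen only to illustrate a workload whose cost grows with the number of trigonometric instructions rather than with memory accesses.

```python
import math

def procedural_value(u, v, trig_calls=65):
    """Hypothetical procedural pattern: a sum of `trig_calls` sin/cos terms.

    Each loop iteration evaluates exactly one sin or one cos, so the cost
    scales with the trig instruction count, not with texture fetches.
    """
    total = 0.0
    for i in range(trig_calls):
        freq = 1.0 + i  # assumed frequency progression, for illustration only
        if i % 2 == 0:
            total += math.sin(freq * u) / freq
        else:
            total += math.cos(freq * v) / freq
    return total

# Doubling the trig count (65 -> 130) roughly doubles the arithmetic work per pixel.
light = procedural_value(0.3, 0.7, trig_calls=65)
heavy = procedural_value(0.3, 0.7, trig_calls=130)
```

In a real pixel shader this would run once per pixel, which is why such tests stress the ALUs and leave the TMUs almost idle.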
In the second test, rendering performance is limited almost solely by shader units, so results are similar to those in PS 3.0 tests. RADEON HD 5870 is a tad faster in this test than the HD 4870 X2, and it's almost twice as fast as the HD 4890. So, it was apparently hampered by something in the first test.
NVIDIA cards are way behind. The single-GPU modification of the GTX 285 is over 2.5 times as slow as the new card. That's exactly how much faster a new product from NVIDIA must be compared to the GT200 in order to at least catch up with AMD in arithmetic performance. Peak arithmetic results remain unchanged once again -- AMD solutions are at an indisputable advantage, and the Cypress only adds strength to it.
Direct3D 10: geometry shader tests
RightMark3D 2.0 includes two geometry shader tests. The first one is called Galaxy. It's similar to point sprites from previous Direct3D versions. It animates a particle system on the GPU: a geometry shader creates four vertices from each point, forming a particle. Similar algorithms should be used in future DirectX 10 games.
A change of balance in geometry tests does not affect rendering results: the image is always identical, only scene processing methods differ. The GS load value determines which shader does the work, vertex or geometry. The amount of work is always the same.
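The point-to-particle expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Galaxy shader: the function name, the 2D coordinates and the `half_size` parameter are hypothetical.

```python
def expand_point_to_quad(center, half_size):
    """Return four corner vertices of a screen-aligned quad around `center`.

    Mimics what a geometry shader does when it turns one input point
    into a four-vertex particle.
    """
    cx, cy = center
    # Triangle-strip order: bottom-left, bottom-right, top-left, top-right.
    return [
        (cx - half_size, cy - half_size),
        (cx + half_size, cy - half_size),
        (cx - half_size, cy + half_size),
        (cx + half_size, cy + half_size),
    ]

# Two hypothetical particles, each expanded into a quad.
particles = [(0.0, 0.0), (2.0, 1.0)]
quads = [expand_point_to_quad(p, 0.5) for p in particles]
```

The output vertex count is always four times the input point count, so the geometric load grows linearly with the number of particles.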
Let's analyze the first modification of Galaxy with vertex processing for three levels of geometric complexity:
Performance ratios are roughly the same at all levels of geometric complexity. Performance corresponds to the number of points: FPS is halved at each step. It's not a hard task for modern graphics cards. Performance in this test is not limited by streaming processors. It is limited by something other than memory bandwidth, as we can see from the results of the HD 5870 at various frequencies. Very strange.
Perhaps the problem is in the drivers, as the HD 5870 performs exactly(!) like the HD 4890, and the GeForce GTX 285 shows similar results. Only the HD 4870 X2 and GTX 295 gain from AFR, so they are practically twice as fast as single-GPU cards. So these results are useless. Perhaps the situation will change when some work is moved to a geometry shader.
Test results barely change as the load grows. None of our graphics cards seems to notice changes in the GS load value, which is responsible for moving some of the load to the geometry shader: all of them demonstrate similar results. Let's see what changes in the next test, which generates a heavier load on geometry shaders. And it's time to remove the Galaxy test from our test procedure, as it's useless.
Hyperlight is the second geometry test. It uses several techniques: instancing, stream output, buffer load. It employs dynamic generation of geometry by rendering into two buffers, as well as a new Direct3D 10 feature, stream output. The first shader generates ray directions, their speed and growth vectors. These data are stored in a buffer, which is used by the second shader for rendering. Each ray point is used to generate 14 vertices in a circle, up to a million output points.
The new type of shader programs is used to generate rays. If "GS load" is set to "Heavy", it's also used for rendering. That is, in Balanced mode geometry shaders are used only to generate and grow rays, while output is left to instancing. In Heavy mode the geometry shader outputs the data as well. Let's analyze the easy mode first:
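The "14 vertices in a circle" step can be sketched as follows. This is a minimal Python illustration, not the Hyperlight shader: the function name, the `radius` parameter and the choice to place the circle in the XY plane are our assumptions.

```python
import math

def ring_vertices(center, radius, count=14):
    """Generate `count` vertices evenly spaced on a circle around `center`.

    Assumption: the circle lies in the XY plane of the ray point; a real
    shader would orient it relative to the ray direction.
    """
    cx, cy, cz = center
    verts = []
    for k in range(count):
        angle = 2.0 * math.pi * k / count
        verts.append((cx + radius * math.cos(angle),
                      cy + radius * math.sin(angle),
                      cz))
    return verts

# One hypothetical ray point expanded into its ring of 14 vertices.
ring = ring_vertices((1.0, 2.0, 3.0), radius=0.25)
```

Every ray point stored by the first pass multiplies into 14 output vertices in the second, which is how the vertex count climbs so quickly.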
How odd. Dual-GPU configurations showed their true face, both from NVIDIA and AMD. In other respects, relative results in various modes correspond to the load: performance scales well in all cases. It's close to the theoretical expectation that each next level of Polygon count must be twice as slow.
The speed of the RADEON HD 5870 is a tad higher than that of the prev-gen solution in all tests, and the GeForce GTX 285 performs on a similar level. The difference between the two HD 5870 configurations shows that performance is not limited by memory bandwidth.
On the whole, judging by the two tests, performance is limited by something other than memory bandwidth, fillrate, and computing power. It does not depend much on frequency either. Perhaps we've reached the limit of the API and/or the video driver. However, the HD 5870 fares well against all cards here. Results must change on the next diagram for the test that actively uses geometry shaders. It will also be interesting to compare test results obtained in Balanced and Heavy modes.
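The theoretical scaling mentioned above is simple arithmetic: if each level doubles the amount of geometry, the frame rate should halve. A minimal Python sketch, with hypothetical function names and made-up FPS numbers used purely as an example (they are not our measurements):

```python
def expected_fps(base_fps, level):
    """FPS expected at a complexity level if each level doubles the work."""
    return base_fps / (2 ** level)

def scaling_efficiency(measured_fps, base_fps, level):
    """Ratio of measured to expected FPS; 1.0 means ideal halving per level."""
    return measured_fps / expected_fps(base_fps, level)

# Hypothetical example: 100 FPS at the lowest level, 25 FPS two levels up.
ideal = expected_fps(100.0, 2)
efficiency = scaling_efficiency(25.0, 100.0, 2)
```

An efficiency close to 1.0 at every level means the card follows the theoretical curve; values well above 1.0 would suggest that something other than geometry throughput limits the lower levels.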
There is only a minor change again. The RV7xx executes geometry shaders better. Engineers finally solved the problem of previous architectures. The same concerns Cypress. But performance is limited by something invisible here, so we cannot evaluate the difference. It looks like we are out of geometry shader tests -- none of them shows realistic results. Probably limited by raster performance? By CPU? We cannot say for sure without additional research.