Direct3D 10: geometry shader tests
RightMark3D 2.0 includes two geometry shader tests. The first one, Galaxy, is similar to point sprites from previous Direct3D versions. It animates a particle system on the GPU: a geometry shader creates four vertices from each point, forming a particle. Similar algorithms should be used in future DirectX 10 games.
A change of balance in geometry tests does not affect rendering results: the image is always identical, only scene processing methods differ. The GS load value determines which shader does the work, vertex or geometry. The amount of work is always the same.
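The point-to-quad expansion described above can be sketched on the CPU as follows. This is only an illustration of the idea, not RightMark3D's shader code: the function name, the screen-aligned orientation, and the triangle-strip vertex order are all assumptions.

```python
# Hypothetical CPU-side sketch of what a geometry shader does per particle.
# Names and vertex order are assumptions, not taken from RightMark3D.

def expand_point_to_quad(center, size):
    """Return the four corner vertices of a screen-aligned quad around `center`."""
    cx, cy, cz = center
    h = size / 2.0
    # Triangle-strip order: bottom-left, bottom-right, top-left, top-right.
    return [
        (cx - h, cy - h, cz),
        (cx + h, cy - h, cz),
        (cx - h, cy + h, cz),
        (cx + h, cy + h, cz),
    ]


quad = expand_point_to_quad((0.0, 0.0, 5.0), 2.0)
for vertex in quad:
    print(vertex)
```

One input point yields four output vertices, so the geometry stage emits four times as many vertices as it receives.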
Let's analyze the first modification of Galaxy with vertex processing for three levels of geometric complexity.
Performance ratios are roughly the same at different levels of geometry complexity. Performance corresponds to the number of points: FPS is halved at each step. It's not a hard task for modern graphics cards. Performance in this test is apparently not limited by streaming processors. The task is also limited by memory bandwidth, as we can see from the results of the HD 4830, which outperforms the HD 4770. The latter does not accelerate when its clock rate is decreased either.
There may be some differences in drivers as well, as the results of this test are too strange. The HD 4770 performs almost on a par with the GeForce 9800 GT, with the HD 4830 being faster than both cards. The HD 4830 also has an advantage at equal frequencies -- it looks like GDDR5 memory does not always provide the same performance as GDDR3 with a bus twice as wide. Perhaps the situation will change when some work is moved to a geometry shader.
As the load grows, the situation changes. There is a difference between modifications of this test -- the HD 4770 almost catches up with the HD 4830. And now the new card from AMD outperforms its main competitor from NVIDIA. Interestingly, the card from NVIDIA and the HD 4830 demonstrate the same results with various GS load values, which are responsible for moving some of the load to the geometry shader. The new HD 4770, in contrast, accelerates a little. Let's see what will change in the next test, which generates a heavier load on geometry shaders.
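To make the bus width argument concrete, here is a minimal sketch of the usual peak bandwidth formula (bus width in bytes times effective transfer rate). The transfer rates below are illustrative placeholders, not the actual memory clocks of the cards in this review.

```python
# Peak theoretical memory bandwidth. The rates used in the example are
# illustrative assumptions, NOT the real specifications of the tested cards.

def peak_bandwidth_gb_s(bus_width_bits, effective_rate_mtps):
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_width_bits      -- memory bus width in bits
    effective_rate_mtps -- effective data rate per pin, in millions of transfers/s
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mtps / 1000.0


wide_bus = peak_bandwidth_gb_s(256, 1800)    # hypothetical GDDR3-style setup
narrow_bus = peak_bandwidth_gb_s(128, 3600)  # hypothetical GDDR5-style setup
print(wide_bus, narrow_bus)
```

A bus half as wide needs twice the per-pin data rate to reach the same peak figure. The formula says nothing about latency, so two configurations with equal peak bandwidth can still perform differently.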
Hyperlight is the second geometry test that uses several techniques: instancing, stream output, buffer load. It employs dynamic generation of geometry by rendering into two buffers, as well as a new Direct3D 10 feature -- stream output. The first shader generates ray directions, their speed and growth vectors. These data are stored in a buffer, which is used by the second shader for rendering. Each ray point is used to generate 14 vertices in a circle, up to a million output points.
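The step of generating 14 vertices in a circle around each ray point can be sketched like this. It is a simplified illustration: the function name is hypothetical, and the ring is placed in the XY plane, whereas a real implementation would presumably orient it relative to the ray direction.

```python
import math

# Hypothetical sketch: emit `count` vertices evenly spaced on a circle
# around one ray point. The ring lies in the XY plane for simplicity,
# which is an assumption made for this example.

def ring_vertices(center, radius, count=14):
    cx, cy, cz = center
    vertices = []
    for i in range(count):
        angle = 2.0 * math.pi * i / count
        vertices.append((cx + radius * math.cos(angle),
                         cy + radius * math.sin(angle),
                         cz))
    return vertices


ring = ring_vertices((1.0, 2.0, 3.0), 0.5)
print(len(ring))
```

With 14 vertices per ray point, the output vertex count grows 14 times faster than the number of ray points stored in the buffer.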
The new type of shader programs is used to generate rays. If "GS load" is set to "Heavy", it's also used for rendering. That is, in Balanced mode, geometry shaders are used only to generate and grow rays, and output is handled by instancing. In Heavy mode, the geometry shader outputs the data as well. Let's analyze the easy mode first.
Relative results in various modes correspond to the load: performance scales well in all cases. It's close to the theoretical expectation that each next level of Polygon count must be twice as slow. This time the RADEON HD 4770 is slower than the equally-priced GeForce 9800 GT in all subtests. The HD 4770 and HD 4830 demonstrate almost identical speed both at nominal and equal frequencies, although the RV770LE is a tad faster.
Judging by all previous tests, performance here is limited by something other than memory bandwidth, fill rate, or computing power. It does not even depend much on the frequency. Perhaps the speed is limited by the video driver. However, the HD 4770 fares well against the only GeForce card here. Results must change on the next diagram for the test that actively uses geometry shaders. It will also be interesting to compare test results obtained in Balanced and Heavy modes.
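The "twice as slow per level" expectation can be checked against measured numbers with a small helper. The FPS values below are made up purely for illustration and are not results from this review. The helper name is hypothetical.

```python
# Compare measured FPS across complexity levels with ideal halving.
# A value of 1.0 means the step scaled exactly as theory predicts;
# above 1.0 means the card lost less than half its speed.

def halving_efficiency(fps_by_level):
    """For each step, return measured_fps / (previous_fps / 2)."""
    return [cur / (prev / 2.0)
            for prev, cur in zip(fps_by_level, fps_by_level[1:])]


# Made-up numbers for illustration only; NOT measurements from the article.
example_fps = [400.0, 200.0, 110.0]
ratios = halving_efficiency(example_fps)
print(ratios)
```

Ratios well above 1.0 at the lighter levels would suggest that something other than geometry throughput (for example, the driver or CPU overhead) limits the speed there.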
Now we can at least see the difference between NVIDIA and AMD. The RV7xx executes geometry shaders better. Engineers finally solved the problem of previous architectures. Now the new solution is approximately twice as fast as GeForce 9800 GT in these conditions. Interestingly, the HD 4830 outperforms the HD 4770 here, and the difference is apparently bigger than a measurement error. That may be the difference between GDDR3 and GDDR5 memory. Or the drivers are optimized differently for different models. There are no other explanations.
As for the comparison of results obtained in different modes, everything is as usual. The graphics cards from AMD improve their results as they switch from instancing to a geometry shader, while NVIDIA cards get slower. If we compare results obtained in different modes (upon condition of identical rendered images), we can see that NVIDIA and AMD cards perform on a par in different modes.
Direct3D 10: vertex texture fetch rate
Vertex Texture Fetch tests measure the speed of many vertex texture fetches. These tests are essentially similar, and the correlation of their results in Earth and Waves tests must also be similar. Both tests use displacement mapping based on texture fetches. The only major difference is that the Waves test uses conditional branches, while the Earth test does not.
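Displacement mapping based on a vertex texture fetch boils down to reading a height value for each vertex and moving the vertex along its normal. Below is a minimal CPU-side sketch of that idea. The function names, the nearest-neighbour fetch, and the `scale` parameter are assumptions for illustration and do not reproduce the Earth or Waves shaders.

```python
# Hypothetical sketch of displacement mapping via a per-vertex texture fetch.

def fetch_height(heightmap, u, v):
    """Nearest-neighbour fetch from a 2D height map; u and v are in [0, 1]."""
    rows, cols = len(heightmap), len(heightmap[0])
    x = min(int(u * cols), cols - 1)
    y = min(int(v * rows), rows - 1)
    return heightmap[y][x]


def displace(position, normal, uv, heightmap, scale=1.0):
    """Move `position` along `normal` by the fetched height times `scale`."""
    h = fetch_height(heightmap, uv[0], uv[1]) * scale
    return tuple(p + n * h for p, n in zip(position, normal))


heights = [[0.0, 0.5],
           [1.0, 0.25]]
moved = displace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.9, 0.1), heights, scale=2.0)
print(moved)
```

Every vertex costs at least one texture read, which is why such tests stress texturing units and memory from the vertex stage rather than the pixel stage.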
Let's analyze the first test (Earth) in the Effect detail Low mode.
Judging by our previous reviews, this test is affected by texturing speed and by memory bandwidth. The easier the mode, the stronger the effect of memory bandwidth on performance. We can see it well in the comparison of graphics cards from AMD operating at the same frequencies, where the HD 4830 is victorious. Perhaps, the HD 4770 is limited mostly by memory bandwidth.
However, speaking of nominal frequencies, it's the new low-end card from AMD that takes the lead in all modes except for the easiest one, where the RADEON HD 4830 comes forward. Its competitors are close behind. Let's have a look at results of this test with more texture lookups.
The situation hasn't changed much; the main difference is that the NVIDIA card's results have deteriorated. The HD 4770 demonstrates the best results in two modes, and the HD 4830 still outperforms it in the easiest mode. Comparison at the same frequencies reveals the reason for the difference -- the RADEON HD 4770 gets outperformed as the task becomes easier and performance is limited by video memory. It looks like latencies (GDDR5 has higher latencies) are as important as memory bandwidth.
Let's have a look at results of the second vertex texture fetch test. The Waves test executes fewer texture lookups, but it uses conditional branches. The number of bilinear texture lookups in this case reaches 14 (Effect detail Low) or 24 (Effect detail High) per vertex. Geometry complexity changes just like in the previous test.
Waves test results resemble what we saw last time. AMD products get an even bigger advantage here: now the GeForce 9800 GT is up to 1.5 times as slow. The new RADEON HD 4770 looks good, outperforming its rival, although it's slower than its predecessor in all modes. The heavier the mode, the smaller the gap. Let's analyze the second modification of the test.
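As a rough illustration of the cost involved, the sketch below shows one bilinear lookup (four texel reads blended by the fractional coordinates) and the total number of lookups for a mesh. Only the 14 and 24 per-vertex figures come from the text above. The function names, the edge clamping, and the vertex count are assumptions.

```python
# One bilinear lookup and the resulting per-frame lookup count.
# Only the 14 / 24 per-vertex figures come from the article.

LOOKUPS_PER_VERTEX = {"Low": 14, "High": 24}


def bilinear_sample(tex, u, v):
    """Bilinearly filtered read from a 2D texture; u and v are in [0, 1]."""
    rows, cols = len(tex), len(tex[0])
    x, y = u * (cols - 1), v * (rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)  # clamp at the edge
    fx, fy = x - x0, y - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def lookups_per_frame(vertex_count, effect_detail):
    return vertex_count * LOOKUPS_PER_VERTEX[effect_detail]


texture = [[0.0, 1.0],
           [2.0, 3.0]]
print(bilinear_sample(texture, 0.5, 0.5))
print(lookups_per_frame(1000, "High"))  # 1000 vertices is a hypothetical count
```

Switching from Low to High raises the per-vertex lookup count by a factor of 24/14, or about 1.7, at the same geometry complexity.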
There are almost no changes, although AMD results grow even better versus NVIDIA performance as complexity grows. GeForce 9800 GT is again 1.5 times as slow as RADEON HD 4770. The new card is again outperformed by the HD 4830 (based on the cut-down RV770) in VTF tests. Results in this test are limited by memory speed, even though this restriction is not hard. It remains a mystery why the NVIDIA card fails to demonstrate higher results.
Conclusions on the synthetic tests
Synthetic tests of the new budget RADEON HD 4770 card, based on RV740, as well as other graphics cards from both chipmakers show us that the new solution from AMD is very powerful, the best choice in its price range. The HD 4770 should also perform very well in games and outperform the HD 4830 practically in all cases.
Although the new GPU is based on the R7xx architecture, being little different from solutions with the cut-down RV770, its apparent advantage provided by the new fabrication process is a higher clock rate. So the new card should perform in games almost on a par with RADEON HD 4850. And when the speed is limited by the fill rate, it may even shoot forward. Games are more often limited by ROPs than our tests, and the RV740 may have an advantage here.
It's very important that the 128-bit memory bus and GDDR5 memory reduce manufacturing costs without losing performance. The old solution with the 256-bit bus and GDDR3 memory is a little faster in just a few synthetic tests, while the HD 4770 is fully justified in most cases.
As for competition with NVIDIA, the RADEON HD 4770 significantly outperforms the only card from this company in this price range (GeForce 9800 GT) in most synthetic tests. The RV740-based card demonstrates the best result in almost all tests and performs on a par with more expensive solutions. It's the effect of the successful R7xx architecture and the new 40nm fabrication process.
The next part of our article contains tests of the new solution from AMD in modern games. Results in games should comply with our conclusions on synthetic tests, even taking into account that rendering speed in games depends more on the fill rate and memory. So the HD 4770 may become even faster sometimes. We can assume that the HD 4770 will outperform the HD 4830 in games. It will almost catch up with the HD 4850. And it will most certainly be faster than NVIDIA GeForce 9800 GT.