3DMark Vantage Feature tests
That's all for our RightMark tests for now, so we have added synthetic tests from 3DMark Vantage to this review. It is a new benchmark, and its feature tests are interesting and differ from ours. GPU designers also pay close attention to how their cards perform in this benchmark. Analyzing these results may lead us to some new, useful conclusions.
Feature Test 1: Texture Fill
The first test measures texture fill rate. It fills a rectangle with values read from a small texture, using multiple texture coordinates that change every frame.
The balance of results is very interesting, and it does not quite agree with our own test results. Perhaps Futuremark uses unusual conditions under which NVIDIA cards cannot take advantage of their numerous TMUs. Otherwise they should not have lagged so far behind. The old dual-GPU card from AMD is outperformed by the new single-GPU product, which takes the lead in this test.
The single-GPU model from NVIDIA lags far behind, and the new HD 5870 is more than twice as fast as the card based on the previous-generation GPU. Cypress shows very good results: the AMD architecture has very efficient TMUs.
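The texture fill results are easier to read against theoretical peak texel rates. Below is a back-of-the-envelope sketch that assumes one bilinear texel per TMU per core clock. The TMU counts and core clocks in the table are reference specifications we assume for illustration. They were not measured in this review.

```python
# Theoretical peak texture fill rate: one bilinear texel per TMU per core clock.
# The TMU counts and core clocks below are assumed reference specs.

def texel_rate_gtexels(tmus: int, core_mhz: float) -> float:
    """Peak bilinear texel rate in gigatexels per second."""
    return tmus * core_mhz / 1000.0

CARDS = {
    "RADEON HD 5870": (80, 850),   # assumed: 80 TMUs at 850 MHz
    "RADEON HD 4890": (40, 850),   # assumed: 40 TMUs at 850 MHz
    "GeForce GTX 285": (80, 648),  # assumed: 80 TMUs at 648 MHz
}

if __name__ == "__main__":
    for name, (tmus, mhz) in CARDS.items():
        print(f"{name}: {texel_rate_gtexels(tmus, mhz):.1f} Gtexel/s")
```

Under these assumptions, the HD 5870 has exactly twice the peak texel rate of the HD 4890 (68 vs 34 Gtexel/s). It has only about 30% more than the GTX 285 (roughly 51.8 Gtexel/s), so on paper NVIDIA's deficit should be modest.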
Feature Test 2: Color Fill
It uses a simple pixel shader that does not limit performance. An interpolated color value is written into an off-screen buffer (render target) with alpha blending. The buffer uses the FP16 format (16 bits per channel), which is common in games that employ HDR, so this test is quite relevant.
The results here are very interesting. Once again they disagree with our synthetic tests, even allowing for the fact that we use an integer buffer with 8 bits per component, while the Vantage test uses a 16-bit floating-point buffer. These numbers mostly reflect memory bandwidth (doubled for dual-GPU cards) rather than ROP performance. In this respect the results agree with theoretical data and depend mainly on memory bus width, memory type, and memory clock rate. In this test the new HD 5870 reveals what is very nearly its only (relative) weakness.
The RV770 architecture has efficient ROPs, which Cypress inherited unchanged (a comparison with the HD 4890 at equal clock rates confirms this). GDDR5 memory also provides higher bandwidth. Even so, the new card is only a tad faster than the GTX 285 with its 512-bit bus and GDDR3 memory. This is a potential performance bottleneck for HDR buffers, which almost all games use.
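Simple arithmetic can check the claim that this test tracks memory bandwidth. The sketch below rests on several assumptions of ours, not parameters documented by Futuremark:

- an RGBA16F target (8 bytes per pixel),
- one read plus one write per blended pixel,
- no framebuffer compression,
- reference memory configurations for each card.

```python
# Rough upper bound on blended FP16 color fill when memory bandwidth is the limit.
# Assumptions: RGBA16F target = 8 bytes/pixel, blending reads and writes every
# pixel, no framebuffer compression, reference memory configurations.

def bandwidth_gbs(bus_bits: int, effective_mtps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_bits / 8 * effective_mtps / 1000.0

def blended_fill_gpix(bw_gbs: float, bytes_per_pixel: int = 8) -> float:
    """Billions of pixels per second if every pixel costs one read plus one write."""
    return bw_gbs / (2 * bytes_per_pixel)

CARDS = {
    "RADEON HD 5870": (256, 4800),   # assumed: 256-bit GDDR5, 4.8 GT/s
    "GeForce GTX 285": (512, 2484),  # assumed: 512-bit GDDR3, 2.484 GT/s
    "RADEON HD 4890": (256, 3900),   # assumed: 256-bit GDDR5, 3.9 GT/s
}

if __name__ == "__main__":
    for name, (bits, mtps) in CARDS.items():
        bw = bandwidth_gbs(bits, mtps)
        print(f"{name}: {bw:.1f} GB/s -> at most {blended_fill_gpix(bw):.1f} Gpix/s")
```

With these inputs, the HD 5870 (153.6 GB/s) and the GTX 285 (about 159 GB/s) end up with nearly identical bandwidth-bound fill rates. This is consistent with how close their results are.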
Feature Test 3: Parallax Occlusion Mapping
This is one of the most interesting feature tests, because games already use this technique. It draws a single quadrangle (two triangles, to be exact) using Parallax Occlusion Mapping, which imitates complex geometry. The test relies on resource-intensive ray tracing operations and high-resolution depth (Z) maps, and the surface is shaded with the heavy Strauss lighting algorithm. The result is a very complex and heavy pixel shader with multiple texture lookups during ray tracing and dynamic branches.
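The core of such a shader is a ray march through the height field. Below is a minimal CPU sketch of the common linear-search variant of parallax occlusion mapping. It rests on our own assumptions: a depth map function, a tangent-space view vector, and a hypothetical `scale` parameter. It omits the final hit refinement and the Strauss lighting used in the benchmark.

```python
# Sketch of the linear ray march used in parallax occlusion mapping (POM).
# Assumptions: depth_at(u, v) returns depth in [0, 1] (0 = top of the surface),
# the view vector is in tangent space with view_z > 0, and `scale` is a
# hypothetical height scale. Real POM shaders also refine the hit and shade it.

def pom_march(depth_at, u, v, view_x, view_y, view_z, num_layers=32, scale=0.1):
    """Step along the view ray until it passes below the depth map.

    Returns the shifted texture coordinates and the number of steps taken.
    The step count varies per pixel, which makes the loop a dynamic branch.
    """
    layer = 1.0 / num_layers
    # Texture-coordinate shift per depth layer, opposite to the view direction.
    du = -view_x / view_z * scale * layer
    dv = -view_y / view_z * scale * layer
    current_depth = 0.0
    steps = 0
    while steps < num_layers and current_depth < depth_at(u, v):
        u += du
        v += dv
        current_depth += layer
        steps += 1
    return u, v, steps


def bump(u, v):
    """Hypothetical depth map: a deep groove in the middle of the texture."""
    return 0.8 if 0.4 < u < 0.6 else 0.2


if __name__ == "__main__":
    for u0 in (0.1, 0.5):
        print(u0, pom_march(bump, u0, 0.5, 0.3, 0.0, 1.0))
```

The number of steps depends on the local depth and the view angle, so neighbouring pixels can run the loop a different number of times.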
This test depends on shader power, branching efficiency, and texture fetch rate combined, so only a well-balanced GPU and card can reach high speed. The efficiency of dynamic branching in shaders, the so-called execution granularity, is of primary importance. Let's see what has changed in Cypress.
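A toy model shows why execution granularity matters. Pixels are processed in fixed-size batches, and a batch stays busy until its slowest pixel exits the loop. The batch sizes and per-pixel iteration counts below are illustrative assumptions, not documented values for Cypress or GT200.

```python
# Toy model of branch granularity: pixels are shaded in fixed-size batches, and a
# batch keeps all of its lanes busy until its slowest pixel finishes its loop.
# Batch sizes are illustrative, not documented values for any particular GPU.

def lane_slots(iterations, batch_size):
    """Lane-iterations consumed when each batch runs as long as its longest loop."""
    total = 0
    for start in range(0, len(iterations), batch_size):
        batch = iterations[start:start + batch_size]
        total += max(batch) * batch_size  # idle lanes still occupy the batch
    return total

def efficiency(iterations, batch_size):
    """Fraction of lane-iterations doing useful work."""
    return sum(iterations) / lane_slots(iterations, batch_size)

if __name__ == "__main__":
    # Hypothetical case: one expensive pixel (32 steps) among 63 cheap ones (1 step).
    work = [1] * 63 + [32]
    for size in (64, 32, 16, 4):
        print(size, lane_slots(work, size), f"{efficiency(work, size):.2f}")
```

In this hypothetical case, a single expensive pixel costs 2048 lane-iterations in a 64-wide batch but only 560 when batches are 16 wide. The coarser the granularity, the more lanes sit idle.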
This is a very good result compared to the HD 4890: the new card is 2.2 times as fast, which is the most we could expect from it. It significantly outperforms the dual-RV770 card and the single-GPU GeForce GTX 285. Only the GTX 295 is faster, because dual-GPU rendering is very effective in this test. NVIDIA solutions execute branched code better than the older AMD cards, but the new HD 5870 is even more efficient and almost catches up with the dual-GPU GTX 295. A very good result!
Feature Test 4: GPU Cloth
This test computes physical interactions (cloth simulation) on the GPU. It simulates vertices with vertex and geometry shaders over several passes, and stream out passes the vertices from one pass to the next. This feature test therefore measures the execution speed of vertex and geometry shaders as well as stream out speed.
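Here is a minimal sketch of what such a multi-pass simulation might look like, run on the CPU for clarity. Each pass reads one buffer and returns a new one, loosely mirroring how stream out carries vertices from one pass to the next. The model is our own assumption, not Futuremark's implementation: Verlet integration plus one distance-constraint pass on a hanging 1-D strand. All names are ours as well.

```python
# Sketch of a multi-pass cloth update on a hanging 1-D strand (y coordinates only).
# Each pass reads one buffer and returns a new one, loosely mirroring stream out.
# Verlet integration plus one Jacobi-style constraint sweep is an assumed model.

GRAVITY = -9.81  # m/s^2, acting along y

def integrate(pos, prev, dt):
    """Pass 1: Verlet integration. Returns (new positions, new previous positions)."""
    new_pos = [p + (p - q) + GRAVITY * dt * dt for p, q in zip(pos, prev)]
    return new_pos, list(pos)

def constrain(pos, rest):
    """Pass 2: pull neighbouring vertices back toward the rest length.

    Vertex 0 is pinned, so the first edge moves only the vertex below it.
    """
    out = list(pos)
    for i in range(len(pos) - 1):
        err = (pos[i] - pos[i + 1]) - rest  # > 0 means the edge is stretched
        if i == 0:
            out[i + 1] += err
        else:
            out[i] -= err / 2
            out[i + 1] += err / 2
    return out

def step(pos, prev, dt, rest):
    """One frame: integrate, re-pin the top vertex, then run the constraint pass."""
    moved, new_prev = integrate(pos, prev, dt)
    moved[0] = pos[0]
    return constrain(moved, rest), new_prev

if __name__ == "__main__":
    strand = [0.0, -1.0, -2.0, -3.0]
    prev = list(strand)
    for _ in range(3):
        strand, prev = step(strand, prev, 1 / 60, rest=1.0)
    print(strand)
```

The per-vertex arithmetic is light, while whole buffers move between passes every frame. That fits the suspicion in the text that stream out or memory, rather than shader speed, sets the pace.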
Last time this test produced strange results for dual-GPU cards. This time the HD 4870 X2 is the fastest card, while the GTX 295 is slower than its single-GPU counterpart, the GTX 285. Apart from that, there seems to be a strange bottleneck somewhere (stream out? geometry shaders?).
The HD 5870 is only about a quarter faster than the HD 4890, which does not even match the difference in memory bandwidth. Besides, memory bandwidth does not limit performance much here, as a comparison of the HD 5870 at different memory clock rates shows. So the new card performs on a par with the single-GPU GeForce GTX 285, which is a good result. GPU Cloth performance seems to depend not on shader speed but on stream out speed and/or memory bandwidth. Perhaps the same applies to some of our RightMark 2.0 tests.
Feature Test 5: GPU Particles
This physics simulation test is based on particle systems computed on the GPU. It also uses vertex simulation, with each vertex representing a single particle, and stream out serves the same purpose as in the previous test. The test computes hundreds of thousands of particles, each animated separately, together with their collisions with a bump map. As in one of our RightMark3D 2.0 tests, particles are drawn with a geometry shader that creates four vertices from each point and forms a particle from them. However, the heaviest load falls on the shader units (vertex calculations), and stream out is used as well.
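The two stages described above can be sketched on the CPU. A per-particle update with a height-field collision stands in for the vertex simulation pass. A point-to-quad expansion stands in for the geometry shader. The 2-D layout, the restitution factor and the height function are hypothetical.

```python
# Sketch of the two halves of a GPU particle system, run on the CPU for clarity.
# update_particle() stands in for the per-vertex simulation pass (one vertex = one
# particle); expand_to_quad() mimics a geometry shader turning one point into four
# vertices. The 2-D layout, restitution and height function are assumptions.

GRAVITY = -9.81

def update_particle(p, dt, height_at, restitution=0.5):
    """Advance one particle (x, y, vx, vy) and bounce it off the height field."""
    x, y, vx, vy = p
    vy += GRAVITY * dt
    x += vx * dt
    y += vy * dt
    ground = height_at(x)
    if y < ground:  # collision with the height map
        y = ground
        vy = -vy * restitution
    return (x, y, vx, vy)

def expand_to_quad(x, y, size):
    """Four corners of a square sprite, in triangle-strip order (two triangles)."""
    h = size / 2
    return [(x - h, y - h), (x + h, y - h), (x - h, y + h), (x + h, y + h)]

def flat_ground(x):
    """Hypothetical height field: a flat floor at y = 0."""
    return 0.0

if __name__ == "__main__":
    particles = [(0.0, 1.0, 0.5, 0.0), (1.0, 0.02, 0.0, -1.0)]
    particles = [update_particle(p, 1 / 60, flat_ground) for p in particles]
    vertices = [v for (x, y, _, _) in particles for v in expand_to_quad(x, y, 0.1)]
    print(len(vertices))  # 8: four vertices per particle
```

The expansion multiplies the vertex count by four after simulation, so the heavy per-particle math runs only once per point.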
Dual-GPU solutions show strange results again. The AMD card is twice as fast and takes the lead, while the GTX 295 falls behind even the single-GPU GTX 285, so AFR apparently does not work on NVIDIA cards in this test. Otherwise the situation is similar to the previous test, except that the new AMD cards perform better.
This time NVIDIA solutions lag behind, and the new AMD card is only a little faster than the HD 4890. Performance in this test depends on memory bandwidth to some degree. The card under review outperforms all single-GPU cards and is slower only than the HD 4870 X2. Is performance limited by stream out and/or memory bandwidth again?
Feature Test 6: Perlin Noise
The last feature test is arithmetically intensive. It calculates several octaves of Perlin noise in a pixel shader, and each color channel uses its own noise function to increase the GPU load. Perlin noise is a standard, mathematically complex algorithm that is often used for procedural texturing.
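Here is a CPU sketch of the kind of work this test puts on the pixel shader. It sums 2-D gradient (Perlin) noise over several octaves and uses an independent noise function for each color channel. The gradient set, octave count, lacunarity, gain and seeds are our assumptions, and Futuremark's actual shader may differ.

```python
# CPU sketch: gradient (Perlin) noise summed over several octaves, with a separate
# noise function per color channel. The gradient set, octave count, lacunarity,
# gain and seeds are assumptions, not the benchmark's shader.
import math
import random

GRADIENTS = [(1, 1), (-1, 1), (1, -1), (-1, -1), (1, 0), (-1, 0), (0, 1), (0, -1)]

def make_permutation(seed):
    """Shuffled 0..255 table, doubled so lookups never need wrap-around."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table + table

def fade(t):
    """Perlin's quintic smoothstep 6t^5 - 15t^4 + 10t^3."""
    return t * t * t * (t * (t * 6 - 15) + 10)

def lerp(a, b, t):
    return a + t * (b - a)

def grad(h, x, y):
    """Dot product of a hashed gradient direction with the offset (x, y)."""
    gx, gy = GRADIENTS[h & 7]
    return gx * x + gy * y

def perlin2(x, y, perm):
    """2-D gradient noise; zero at every integer lattice point."""
    xi, yi = math.floor(x), math.floor(y)
    xf, yf = x - xi, y - yi
    X, Y = xi & 255, yi & 255
    aa = perm[perm[X] + Y]
    ab = perm[perm[X] + Y + 1]
    ba = perm[perm[X + 1] + Y]
    bb = perm[perm[X + 1] + Y + 1]
    u, v = fade(xf), fade(yf)
    bottom = lerp(grad(aa, xf, yf), grad(ba, xf - 1, yf), u)
    top = lerp(grad(ab, xf, yf - 1), grad(bb, xf - 1, yf - 1), u)
    return lerp(bottom, top, v)

def fbm(x, y, perm, octaves=4, lacunarity=2.0, gain=0.5):
    """Sum of octaves: each doubles the frequency and halves the amplitude."""
    total, amplitude, frequency = 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amplitude * perlin2(x * frequency, y * frequency, perm)
        amplitude *= gain
        frequency *= lacunarity
    return total

# One independent noise function per color channel (R, G, B).
CHANNEL_PERMS = [make_permutation(seed) for seed in (1, 2, 3)]

def pixel_color(x, y):
    return tuple(fbm(x, y, perm) for perm in CHANNEL_PERMS)

if __name__ == "__main__":
    print(pixel_color(0.37, 1.91))
```

Every pixel repeats the hashing, fading and interpolation once per octave and once per channel, with few texture fetches. That is why such a test mostly measures raw ALU throughput.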
So the last feature test in the Futuremark benchmark shows the pure arithmetic performance of GPUs. No wonder its results generally agree with what we saw earlier in our RightMark 2.0 arithmetic tests. However, the gap between the HD 5870 and the GTX 285 is much larger here: almost 3.5-fold!
AMD cards naturally outperform NVIDIA cards in this test, and the RADEON HD 5870 is the clear leader. It outscores even the previous-generation dual-GPU card by a factor of 1.5. The HD 4890 is 2.5 times slower in this test, which is even more than the theoretical difference. This may be the effect of modifications in the new GPU intended to raise the efficiency of arithmetic operations.
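The theoretically possible value can be estimated as ALU count times flops per clock times core clock. This sketch assumes reference specs of 1600 stream processors for the HD 5870 and 800 for the HD 4890, both at 850 MHz. It also assumes each ALU issues one multiply-add (2 flops) per clock.

```python
# Theoretical peak arithmetic throughput: ALUs x flops per clock x core clock.
# Assumed reference specs: 1600 stream processors (HD 5870) vs 800 (HD 4890),
# both at 850 MHz, each ALU issuing one multiply-add (2 flops) per clock.

def peak_gflops(alus: int, core_mhz: float, flops_per_clock: int = 2) -> float:
    return alus * flops_per_clock * core_mhz / 1000.0

HD5870 = peak_gflops(1600, 850)  # assumed specs
HD4890 = peak_gflops(800, 850)   # assumed specs

if __name__ == "__main__":
    print(HD5870, HD4890, HD5870 / HD4890)
```

Under these assumptions the theoretical ratio is exactly 2 (2720 vs 1360 GFLOPS), so a measured 2.5-fold gap does exceed it.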
Conclusions on the synthetic tests
Our synthetic tests of the new Cypress-based RADEON HD 5870 and of other graphics cards from both chipmakers show that the new AMD solution is the most powerful single-GPU product. It performs very well even against previous-generation dual-GPU cards. We expect the HD 5870 to demonstrate very strong results in games and at least to outperform NVIDIA's single-GPU competitors, to say nothing of the previous-generation HD 4870 and HD 4890. As for dual-GPU cards, the HD 5870 comes close to the HD 4870 X2 in most cases, which does not surprise us.
Although the new GPU belongs to the new R8xx architecture supporting the DirectX 11 API, it differs little from the RV770/RV790 at the architectural level. Its obvious advantages come from the new 40nm process technology: twice as many execution units and a higher clock rate (compared to the HD 4870). That's why we expect excellent results in games, similar to those of the RADEON HD 4870 X2. Architectural improvements play a smaller role.
In rare cases, the performance of Cypress-based solutions may be limited by memory bandwidth. Like the RV770, this GPU has a 256-bit memory bus, so faster GDDR5 memory increases bandwidth by only one third. That may be insufficient when rendering speed is limited by the effective fill rate, which sometimes happens in games. Still, losing to the GTX 285 is out of the question. There will simply be fewer cases of a twofold advantage over the HD 4890 in real tests.
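Because the bus width is the same, the bandwidth gain comes down to the ratio of memory data rates. We assume reference effective data rates of 4.8 GT/s for the HD 5870 and 3.6 GT/s for the HD 4870. These figures are our assumption and are not stated in this review. On that basis the arithmetic is:

$$
\frac{B_{\text{HD 5870}}}{B_{\text{HD 4870}}}
= \frac{(256/8)\ \text{bytes} \times 4.8\ \text{GT/s}}{(256/8)\ \text{bytes} \times 3.6\ \text{GT/s}}
= \frac{153.6\ \text{GB/s}}{115.2\ \text{GB/s}}
\approx 1.33
$$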
As for competition with NVIDIA, there is practically none. In most synthetic tests, the RADEON HD 5870 is significantly faster than NVIDIA's single-GPU competitor and even than the dual-GPU GTX 295. The GTX 295 will certainly perform better in games than in our theoretical tests, but the comparison will still favor the AMD solution.
The next part of our article covers tests of AMD's new top solution in modern games. Game results should agree with our conclusions from the synthetic tests. However, rendering speed in games often depends more on fill rate and memory bandwidth, so the HD 5870 will sometimes find it very difficult to catch up with the HD 4870 X2. Even so, it will certainly be faster than the HD 4890 and the NVIDIA GeForce GTX 285 in most cases.