Our synthetic benchmarks:
Since we don't have our own synthetic DirectX 11 tests yet, we took a few samples and demos from various SDKs. These include HDRToneMappingCS11.exe and NBodyGravityCS11.exe from the DirectX SDK (February 2010). We also took two samples each from the NVIDIA and AMD SDKs, so no one could accuse us of bias. From the ATI Radeon SDK we took DetailTessellation11.exe and PNTriangles11.exe (these are also found in the DirectX SDK, by the way). From NVIDIA's SDK we borrowed Realistic Character Hair and Realistic Water Terrain. These two should be available for download in the near future.
Tested graphics cards:
Let us explain the choice of graphics cards. Radeon HD 5870 and HD 5970 are AMD's fastest single-GPU and dual-GPU products, and they are priced close to GeForce GTX 480. GeForce GTX 285 is the fastest single-GPU graphics card of the previous generation, so we'll use it to examine the architectural changes. Finally, GeForce GTX 295 is NVIDIA's fastest dual-GPU graphics card today.
Direct3D 9: pixel filling
This test determines peak texel rate in FFP mode for different numbers of textures applied to a pixel:
This test is fairly old, and graphics cards do not reach their theoretical peaks in it, but it still gives a fair picture of peak texturing speed. It turns out that GeForce GTX 480 fetches up to 40 texels per clock from 32-bit textures with bilinear filtering. This is 1.5 times lower than the theoretical peak of 60 filtered texels per clock.
This isn't enough to catch up even with GeForce GTX 285, which fetches texture data 5-7% faster, let alone with the competing Radeon HD 5870, which is 1.5 times faster in almost every mode. As for the dual-GPU solution from NVIDIA, it is obviously a victim of software issues. Radeon HD 5970, in turn, is even faster than Radeon HD 5870.
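The gap between measured and theoretical texel rates can be checked with a few lines of arithmetic. This is only a sketch using the two per-clock figures quoted above; the function and constant names are our own and do not come from any test tool.

```python
# Figures quoted in the text for GeForce GTX 480 (bilinear filtering,
# 32-bit textures). Names below are illustrative, not from any tool.
MEASURED_TEXELS_PER_CLOCK = 40
THEORETICAL_TEXELS_PER_CLOCK = 60


def tmu_efficiency(measured: float, theoretical: float) -> float:
    """Fraction of the theoretical texel rate actually reached."""
    return measured / theoretical


def shortfall_factor(measured: float, theoretical: float) -> float:
    """How many times lower the measured rate is than the peak."""
    return theoretical / measured


efficiency = tmu_efficiency(MEASURED_TEXELS_PER_CLOCK,
                            THEORETICAL_TEXELS_PER_CLOCK)
shortfall = shortfall_factor(MEASURED_TEXELS_PER_CLOCK,
                             THEORETICAL_TEXELS_PER_CLOCK)

print(f"efficiency: {efficiency:.1%}, shortfall: {shortfall:.2f}x")
# prints "efficiency: 66.7%, shortfall: 1.50x"
```

In other words, the card reaches about two thirds of its theoretical texturing rate in this test.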
The difference between GeForce GTX 480 and GeForce GTX 285 is almost always the same, except for cases with few textures, where bandwidth limitations have a stronger effect. Radeon HD 5870 isn't that far ahead in those cases either. But when 4-8 textures are involved the difference grows, indicating that GF100 lacks the texturing performance to stay ahead of its rivals in all old games. Let's look at the fillrate test results.
The fillrate test shows the same situation, this time expressed as the number of pixels written to the frame buffer. AMD products still lead the way, having more TMUs and using them more efficiently. When 0-3 textures are mapped, the difference between solutions is noticeably smaller, because bandwidth is the key bottleneck in such modes.
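To illustrate why the cards bunch together with few textures and spread apart with many, here is a rough bottleneck model. It is a sketch under strong assumptions: the card parameters are made up and do not describe any product in this review, texture data is assumed to come from cache, and each pixel is assumed to cost one 4-byte frame buffer write.

```python
def pixel_rate(num_textures, texels_per_s, rop_pixels_per_s,
               bandwidth_bytes_per_s, bytes_per_pixel=4):
    """Upper bound on pixels per second and the unit that sets it.

    Simplified model: the slowest of ROP throughput, memory bandwidth
    (frame buffer writes only) and TMU throughput decides the rate.
    """
    limits = {
        "rop": rop_pixels_per_s,
        "bandwidth": bandwidth_bytes_per_s / bytes_per_pixel,
    }
    if num_textures > 0:
        # Each pixel needs one texel fetch per applied texture.
        limits["tmu"] = texels_per_s / num_textures
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


# Hypothetical card: 80 Gtexels/s, 40 Gpixels/s ROP rate, 100 GB/s.
CARD = dict(texels_per_s=80e9, rop_pixels_per_s=40e9,
            bandwidth_bytes_per_s=100e9)

for n in (0, 1, 3, 4, 8):
    rate, bottleneck = pixel_rate(n, **CARD)
    print(f"{n} textures: {rate / 1e9:.1f} Gpixels/s, "
          f"limited by {bottleneck}")
```

With these made-up numbers the bound is set by bandwidth for up to three textures and by the TMUs from four textures onward, which is the pattern described above: extra TMUs only pay off once texturing, not bandwidth, becomes the limit.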
Direct3D 9: PS 1.1, 1.4, 2.0, 2.a
The first group of pixel shaders to be reviewed here is too simple for modern GPUs. It includes various versions of pixel programs of relatively low complexity: 1.1, 1.4, and 2.0.
These tests are very easy for modern architectures, so they cannot demonstrate all of their capabilities. But they are still useful for assessing the balance between texture fetches and math computing, especially when there's a new architecture to examine.
In these tests performance is primarily bottlenecked by TMU performance, but with efficiency and texture caching in real applications taken into account. Let's see what effect the architectural changes have. Obviously, the new GeForce GTX 480 shows better results than the previous-generation single-GPU card. Also note that in most tests GeForce GTX 480 catches up with the dual-GPU GeForce GTX 295. This is nice.
Memory bandwidth limits the new products only slightly. Performance depends mostly on texturing, which prevents GF100 from catching up even with Radeon HD 5870, not to mention the dual-GPU card. NVIDIA solutions are outperformed in these tests. This is a warning sign for other tests where texturing performance is important. Let's see the results of more complex pixel programs:
The SM 2.a tests produce even worse results, even compared to competitors' performance. The Water test, which depends strongly on texturing performance, uses dependent sampling of heavily nested textures, so the graphics cards are always ranked by texturing performance, adjusted for differences in TMU efficiency.
RV870-based cards show maximum results, while GeForce GTX 480 fits between the single and dual-GPU products of the previous generation. Hmm, not very good. But at least it outperforms GeForce GTX 285, meaning that its TMUs are more efficient.
The results of the second test are very similar, though it's more computing-intensive. This test has always favored AMD's architecture, which has more computing units. So, as you can see, AMD's solutions are far ahead of their rivals, especially the dual-GPU card.
GeForce GTX 480 outperforms GeForce GTX 285 by only 25% and lags behind the dual-GPU model by the same amount. This obviously indicates that the performance of GeForce GTX 480 is limited by the smaller number of TMUs. This is really the key drawback of the new architecture.
Direct3D 9: PS 2.0
These tests of DirectX 9 pixel shaders are even more complex. They are divided into two categories, and we'll start with the easier SM 2.0 shaders.
There are two modifications of these shaders: arithmetic-intensive and texture sampling-intensive. Let's analyze the arithmetic-intensive modifications first, as they are more promising from the point of view of future applications.
These tests are universal: they depend on both ALU performance and texturing speed, so the balance is the key. As you can see, the Frozen Glass results are limited not only by computing, but also by texture fetch speed. The situation is similar to what we saw in Cook-Torrance, but this time GeForce GTX 480 is much closer to GeForce GTX 295. On the other hand, even the single-GPU Radeon HD 5870 is still far ahead.
The results of Parallax Mapping are also similar, although Radeon HD 5870 is not as far ahead this time. Games are usually more diverse than synthetic tests, and texturing is not the only thing they use. Still, GF100 doesn't have enough TMUs for old tasks like these. Now let's examine the same tests, modified so that texture fetches are preferred to computing. This will show whether our intermediate conclusions are correct.
The situation is similar, but AMD cards do better in fetching textures. The dual-GPU Radeon HD 5970 is especially good. GeForce GTX 480 fits between GeForce GTX 285 and GeForce GTX 295 again, because its performance is still limited by TMUs. We never tire of repeating that GF100 doesn't have the number of TMUs a powerful new architecture should have.
But those were old tasks aimed at texturing, not very complex at that. Let's have a look at the results of two more pixel shader tests -- SM 3.0. These are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches.
Finally! Both PS 3.0 tests are very complex. They don't depend on bandwidth or texturing, they are pure math, but with lots of transitions and branches. And it seems GF100 does a fine job in this case.
This is where GeForce GTX 480 shows what it's capable of. It outperforms all the competitors except for AMD's dual-GPU product. Moreover, GeForce GTX 295 is only half as fast and GeForce GTX 285 only a third as fast in these most complex tests. The results are clearly boosted by the GF100 architectural changes aimed at increasing computing efficiency.
So, the new GF100 architecture demonstrates a very impressive performance boost in the complex PS 3.0 tests, which favor efficient execution of complex shaders with transitions and branches rather than the peak math power of AMD solutions. The math power, doubled compared to GT200, does its job as well. This is a very good result, because outperforming AMD solutions that have more ALUs means a lot.