iXBT Labs - Computer Hardware in Detail

NVIDIA GeForce GTX 460 Graphics Card




Direct3D 10: PS 4.0 texturing, loops

The new RightMark3D 2.0 includes two old PS 3.0 tests (Direct3D 9), rewritten for DirectX 10, as well as two brand new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase their load on GPUs.

These tests measure how efficiently a GPU executes looped pixel shaders with a lot of texture lookups (up to several hundred per pixel in the heaviest mode) and a relatively low ALU load. In other words, they measure texture sampling rate and branching efficiency in a pixel shader.

The first pixel shader test will be the Fur test. When used with the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. When shader supersampling is enabled, the number of lookups grows to 60-120. The heaviest mode is the High mode with SSAA -- 160-320 lookups from a bump map.
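
The lookup counts quoted above follow from simple multiplication. Here is a minimal sketch that reproduces them, assuming that shader supersampling multiplies the per-pixel work by 4 (the factor mentioned in this article); the function and constant names are ours, not RightMark3D's.

```python
# Bump-map lookup ranges per pixel for the Fur test, as quoted in the text.
BASE_LOOKUPS = {"Low": (15, 30), "High": (40, 80)}

# Assumption: shader supersampling repeats the shader loop 4 times per pixel.
SSAA_FACTOR = 4


def fur_lookups(detail, ssaa):
    """Return the (min, max) number of bump-map lookups for a test mode."""
    lo, hi = BASE_LOOKUPS[detail]
    factor = SSAA_FACTOR if ssaa else 1
    return lo * factor, hi * factor


for detail in ("Low", "High"):
    for ssaa in (False, True):
        label = "SSAA" if ssaa else "no SSAA"
        print(detail, label, fur_lookups(detail, ssaa))
```

The two lookups from the main texture are left out here, because the text only gives ranges for the bump map.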

Let's see what happens in modes without supersampling -- they are relatively simple, and the correlation of results in Low/High modes should be similar.



In this test performance depends on both the number and efficiency of TMUs and on fillrate and memory bandwidth (to a lesser degree). The results of the High effect detail part are about 1.5 times lower than those of the Low effect detail part -- all according to the theory. NVIDIA solutions are traditionally strong in the Direct3D 10 Fur test with a lot of texture fetches, but ATI is catching up.

GTX 460 still lags behind GTX 465 and HD 5830, but outperforms HD 5770. This can be explained by effective fillrate and bandwidth, in terms of which GTX 460 is behind every competitor except for HD 5770. This is also the reason for the leadership of GTX 480. Now take a look at the results of the same test with shader supersampling enabled, which should quadruple the load on the graphics cards. Perhaps it will somewhat reduce the effect of fillrate and memory bandwidth.



Enabling supersampling theoretically quadruples load, and NVIDIA solutions back off a little. In turn, both Radeons look better, HD 5830 even competes with GTX 480. And GTX 460 now yields to both GTX 465 and HD 5770. This test doesn't expose the effects of ALU performance and effective branching.

The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first.



This test is more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavy modifications, such as our steep parallax mapping, are already used in some projects, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing that doubles the GPU load (the High mode).
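
For readers who haven't seen the technique, here is a rough CPU-side sketch of the generic steep parallax mapping loop. It is not the RightMark3D shader, which is not shown in this article. The function name, the `scale` parameter and the way the step budget is interpolated between 10 and 50 are our assumptions; only the 10-50 range comes from the test description.

```python
def steep_parallax(depth_at, u, v, view_dir, min_steps=10, max_steps=50, scale=0.05):
    """March a view ray through depth layers until it goes below the surface.

    depth_at(u, v) returns the depth map value in [0, 1] (one texture lookup).
    view_dir is a normalized tangent-space vector, z pointing at the viewer.
    Returns the shifted texture coordinates and the number of lookups made.
    """
    vx, vy, vz = view_dir
    vz = min(max(abs(vz), 1e-3), 1.0)  # avoid division by zero at grazing angles
    # Assumption: more layers at grazing angles, fewer when looking head-on.
    steps = round(max_steps + (min_steps - max_steps) * vz)
    du = vx / vz * scale / steps
    dv = vy / vz * scale / steps

    lookups = 0
    for layer in range(steps):
        lookups += 1
        if depth_at(u, v) <= layer / steps:
            break  # the ray is now below the surface
        u -= du
        v -= dv
    return u, v, lookups


# Head-on view of a surface with constant depth 0.55: the loop exits early.
print(steep_parallax(lambda u, v: 0.55, 0.5, 0.5, (0.0, 0.0, 1.0)))
```

The early exit is the data-dependent branch these tests stress: the number of lookups differs from pixel to pixel, so both texture sampling rate and branching efficiency matter.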

The diagram is very similar to the previous, even down to absolute results. In the non-supersampling variant of the test GTX 460 does a bit better than HD 5830. But it still loses to GTX 465. GTX 480 is the leader again with a clear advantage in every feature except for texturing.

Let's see how supersampling will change the picture. NVIDIA cards should slow down.



Supersampling and self-shadowing together increase the load on graphics cards by almost eight times, causing a great performance drop. Performance differences between graphics cards change as well. Supersampling has the same effect here as in the Fur test -- ATI cards improve their results relative to NVIDIA solutions.

HD 5830 has almost caught up with GTX 480, and HD 5770 performs on a par with GTX 465. GTX 460 loses to both HD 5770 and GTX 465. We hope this will change in gaming tests. For now we have to conclude that architectural changes introduced into GF104 do not provide any benefits in these synthetic tests.

Direct3D 10: PS 4.0 computing

The next couple of pixel shader tests contain a minimum of texture lookups to reduce the effect of TMU performance. They use a lot of arithmetic operations, so they measure the arithmetic performance of GPUs, that is, how fast they execute arithmetic instructions in pixel shaders.

The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.



Purely mathematical tests should be interesting, because GF104 differs from GF100 in terms of architecture. Otherwise, ATI products are obviously faster, because the company's architecture is much better suited to calculation-heavy tasks than NVIDIA's. Today's tests confirmed this once again. The gap between NVIDIA and ATI cards is quite large. Radeon HD 5830 is the leader, while Radeon HD 5770 has almost caught up with the top-class GeForce GTX 480.

But all that was clear from the beginning. Let's pay more attention to GeForce GTX 460 and GeForce GTX 465. These two perform more or less according to theoretical assumptions. GeForce GTX 460 is a bit faster, although theoretically the difference should have been larger.

This benchmark didn't fully depend on ALU performance in our previous tests, so let's consider another shader test -- Fire. It's even harder for ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows.
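
To put the two computing tests side by side, the sketch below compares their ratio of sin/cos instructions to texture lookups, using only the counts quoted in the text. The real shaders contain other arithmetic instructions as well, so treat this as a rough indicator rather than a full instruction mix.

```python
# Instruction counts quoted in the article for the two computing tests.
TESTS = {
    "Mineral": {"sincos": 65, "tex": 2},
    "Fire": {"sincos": 130, "tex": 1},
}


def alu_to_tex_ratio(name):
    """Return the number of sin/cos instructions per texture lookup."""
    test = TESTS[name]
    return test["sincos"] / test["tex"]


for name in TESTS:
    print(name, alu_to_tex_ratio(name))
```

By this measure Fire is four times as ALU-bound as Mineral, which is why it is the better indicator of pure shader unit performance.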



Well, nothing much changes. In the second test rendering performance is also limited almost entirely by shader unit performance, so the gap between GeForce GTX 460 and GeForce GTX 465 becomes a bit larger. But it's still insufficient -- it should be about 6%, not 2%. Anyway, the newcomer still loses to Radeon HD 5770, not to mention Radeon HD 5830, which is the leader again.
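
The expected gap of about 6% can be estimated from peak ALU throughput. The sketch below assumes the reference specifications of both cards (336 stream processors at 1350 MHz for GeForce GTX 460 and 352 at 1215 MHz for GeForce GTX 465). These figures are not given on this page, and the model deliberately ignores everything except the number of units multiplied by the shader clock.

```python
# Assumed reference specifications (not stated on this page).
CARDS = {
    "GeForce GTX 460": {"cores": 336, "shader_mhz": 1350},
    "GeForce GTX 465": {"cores": 352, "shader_mhz": 1215},
}


def peak_alu_rate(card):
    """Relative peak ALU rate: stream processors times shader clock."""
    spec = CARDS[card]
    return spec["cores"] * spec["shader_mhz"]


def gap_percent(faster, slower):
    """Theoretical advantage of one card over another, in percent."""
    return (peak_alu_rate(faster) / peak_alu_rate(slower) - 1.0) * 100.0


print(f"{gap_percent('GeForce GTX 460', 'GeForce GTX 465'):.1f}%")  # prints 6.1%
```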

The conclusion on math tests has remained the same for several years now: ATI solutions clearly win, and even the GeForce GTX 400 series can't change the situation. In peak math tests even mid-end ATI products perform like NVIDIA's top-class cards. That's architectural difference for you. Anyway, let's proceed to geometry shaders, where GeForce GTX 460 is bound to put on a better show.

Direct3D 10: geometry shaders

RightMark3D 2.0 includes two geometry shader tests. The first one is called Galaxy. It's similar to point sprites from previous Direct3D versions: it animates a system of particles on the GPU, and a geometry shader creates 4 vertices from each point, forming a particle. Similar algorithms should be used in future DirectX 10 games.
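
The point-to-particle expansion can be illustrated on the CPU. This is a simplified 2D sketch of what such a geometry shader does for each input point; the function name, the triangle-strip vertex order and the screen-aligned square shape are our assumptions, not details taken from the Galaxy shader.

```python
def expand_point(cx, cy, half_size):
    """Expand one point into the 4 corner vertices of a square particle.

    The vertices are returned in triangle-strip order, so the two
    triangles of the quad share the middle two vertices.
    """
    return [
        (cx - half_size, cy - half_size),
        (cx + half_size, cy - half_size),
        (cx - half_size, cy + half_size),
        (cx + half_size, cy + half_size),
    ]


points = [(0.0, 0.0), (2.0, 1.0)]
quads = [expand_point(x, y, 0.5) for x, y in points]
print(len(points), "points ->", sum(len(q) for q in quads), "vertices")
```

The output vertex count is always four times the input point count, so the geometry load scales linearly with the number of particles.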

A change of balance in geometry tests does not affect rendering results: the image is always identical, only scene processing methods differ. The GS load value determines which shader does the work -- vertex or geometry. The amount of work is always the same.

Let's analyze the first modification of Galaxy with vertex processing for three levels of geometric complexity:



The correlation of results with different complexity levels of the scene is almost the same. Performance corresponds to the number of points: FPS is halved at each step. It's not a hard task for modern graphics cards. Apparently, performance in this test is not limited that much by streaming processors. The task is also limited by memory bandwidth and fill rate, although to a lesser degree.

GeForce GTX 480 outperforms other graphics cards in all modes. Based on the same GF100 GPU and having more geometry units, GeForce GTX 465 outperforms GeForce GTX 460. The latter, in turn, outperforms both Radeon cards. As we expected, GF104 remains efficient in handling geometry shaders, though not as efficient as GF100. Let's see what changes if we move some of the math to the geometry shader.



NVIDIA solutions perform almost the same, while both Radeon cards speed up a bit, and Radeon HD 5830 performs a bit better than GeForce GTX 460. NVIDIA cards don't react to changes to the GS load parameter that is responsible for moving a part of the math to the geometry shader. Now let's take a look at what happens in the next test, which puts a heavy load on geometry shaders.

Hyperlight is the second geometry test that uses several techniques: instancing, stream output, buffer load. It employs dynamic generation of geometry by rendering into two buffers, as well as a new Direct3D 10 feature -- stream output. The first shader generates ray directions, their speed and growth vectors. These data are stored in a buffer, which is used by the second shader for rendering. Each ray point is used to generate 14 vertices in a circle, up to a million output points.
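
The per-point amplification can be sketched as follows. This is a CPU illustration of generating 14 vertices in a circle around a ray point. We assume the circle lies in the XY plane around the point, whereas a real shader would presumably orient it relative to the ray direction; the function name and the `radius` parameter are hypothetical.

```python
import math


def ring_vertices(center, radius, count=14):
    """Generate `count` vertices evenly spaced on a circle around a point.

    Assumption: the circle lies in the XY plane of the point.
    """
    cx, cy, cz = center
    vertices = []
    for i in range(count):
        angle = 2.0 * math.pi * i / count
        vertices.append((cx + radius * math.cos(angle),
                         cy + radius * math.sin(angle),
                         cz))
    return vertices


ring = ring_vertices((0.0, 0.0, 0.0), 1.0)
print(len(ring), "vertices per ray point")
```

With 14 output vertices per input point, the upper limit of a million output points mentioned above is reached at roughly 71 thousand ray points.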

Geometry shaders, the new type of shader program, are used to generate rays. If "GS load" is set to "Heavy", they are also used for rendering. That is, in the Balanced mode geometry shaders are used only to generate and grow rays, while output is handled by instancing. In the Heavy mode the geometry shader also outputs data. Let's analyze the easy mode first.



Relative results in the various modes correlate with load once again. Performance scales pretty well in all cases, being close to theoretical peaks, according to which each new polygon count level should be less than 2 times slower.

In this test rendering performance isn't that limited by geometry performance. The new GeForce GTX 460 is only a bit slower than GeForce GTX 465, especially in the heavy-load mode. And it still outperformed both Radeon HD 5830 and Radeon HD 5770 in all modes (especially, the light-load ones). GeForce GTX 480 is a leader, but the gap between it and cheaper graphics cards isn't that large.

Results may change on the next diagram for the test that actively uses geometry shaders. It will also be interesting to compare test results obtained in the Balanced and Heavy modes.



Now we can see the real difference between GF100 and GF104 in geometry shader performance: GF104 has 2 Raster Engines versus 4 in GF100. As you can see, GF100 is about twice as fast in handling geometry and processing geometry shaders.

The new GTX 460 remains only a bit faster than Radeon HD 5830 and Radeon HD 5770. Compare this diagram with the previous one -- when the load on geometry units increases, GF104 behaves similarly to ATI GPUs, not GF100, because it doesn't have that many geometry units. GeForce GTX 465, having more Raster Engines, speeds up noticeably in this test, outperforming GeForce GTX 460 by 1.5 times.

This is another potential bottleneck (compared with GF100). But that's not critical for a mid-end solution, because GF104 is efficient enough in terms of geometry shaders. Let's hope that rendering performance won't drop in tessellation tests due to fewer geometry units.

Direct3D 10: vertex shaders texture fetch

Vertex texture fetch tests measure the speed of many vertex texture fetches. These tests are essentially similar, and the correlation of their results in Earth and Waves tests must also be similar. Both tests use displacement mapping based on texture fetches. The only major difference is that the Waves test uses conditional branches, while the Earth test does not.
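
As an illustration of what such a vertex shader does, here is a small CPU sketch of displacement mapping: a bilinear height lookup followed by moving the vertex along its normal. The function names, the `scale` parameter and the [0, 1] coordinate convention are our assumptions; the actual Earth and Waves shaders perform many such fetches per vertex.

```python
def bilinear(tex, u, v):
    """Bilinearly sample a 2D height map (list of rows) at u, v in [0, 1]."""
    height, width = len(tex), len(tex[0])
    x, y = u * (width - 1), v * (height - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def displace(position, normal, tex, u, v, scale=1.0):
    """Move a vertex along its normal by the sampled height."""
    h = bilinear(tex, u, v) * scale
    return tuple(p + n * h for p, n in zip(position, normal))


heights = [[0.0, 1.0],
           [2.0, 3.0]]
print(displace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), heights, 0.5, 0.5, scale=2.0))
```

Each bilinear lookup reads four texels, which is one reason why memory bandwidth can matter in these tests as much as TMU speed.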

Let's analyze the first test, Earth, in the Effect Detail Low mode.



According to previous examinations, the results of this test are affected by both texturing performance and memory bandwidth. The difference between all graphics cards isn't that big. Only GeForce GTX 480 is considerably ahead of every other card. GeForce GTX 460 and GeForce GTX 465 are bottlenecked by something in the light-load mode, but both outperform ATI products in the heavy-load mode. It seems like fetching vertices is easier for NVIDIA cards. Let's see what will change if we increase the number of texture fetches.



Nothing much changed. Now all NVIDIA cards are bottlenecked by something mysterious in the light-load mode. Besides, GeForce GTX 460 outperforms Radeon HD 5830 and Radeon HD 5770, but lags behind GeForce GTX 465 even more in the heavy-load mode. The lower memory bandwidth must be the reason. We can't think of any other reasons.

Let's have a look at results of the second vertex texture fetch test. The Waves test executes fewer texture lookups, but it uses conditional branches. The number of bilinear texture lookups in this case reaches 14 (Effect Detail Low) or 24 (Effect Detail High) per each vertex. The complexity of geometry changes just like in the previous test.



It's interesting that the Waves results are different from those we've seen on the previous diagrams. Now ATI cards are clearly ahead of their rivals. In this test GeForce GTX 460 outperforms Radeon HD 5770, but even GeForce GTX 465 is behind Radeon HD 5830. It seems like it's bandwidth that's critical here, not texture fetch performance. Now let's look at the second variant of this test.



No considerable changes again this time, though the GF104 GPU has improved its relative results as the test complexity has grown. Now GeForce GTX 460 clearly outperforms Radeon HD 5770. It also catches up a bit with Radeon HD 5830 in the heavy-load mode, but still lags behind it in the light-load mode.

Frankly speaking, these vertex fetch tests seem almost useless, because almost all graphics cards do well. And performance depends more on things like memory bandwidth than on TMU speed.


Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.