iXBT Labs - Computer Hardware in Detail


NVIDIA GeForce 9600 GT



Cards reviewed:
BFG GeForce 9600 GT OC 512MB PCI-E
ECS GeForce 9600 GT Accelero Edition 512MB PCI-E
Forsa GeForce 9600 GT 512MB PCI-E
Galaxy GeForce 9600 GT Overclocked 512MB PCI-E
Point Of View GeForce 9600 GT 512MB PCI-E
Zotac GeForce 9600 GT AMP! Edition 512MB PCI-E

Part 1: Theory and architecture

We've tested six graphics cards based on the GeForce 9600 GT from BFG, ECS, Forsa, Galaxy, Point of View (PoV), and Zotac. All of them except the PoV and Forsa products run at increased frequencies. Only ECS and Galaxy manufacture their cards themselves. The others are reference cards that the partners buy from NVIDIA; they are manufactured at Flextronics and PC Partner plants under contract to the Californian chipmaker.

Graphics Cards

BFG GeForce 9600 GT OC 512MB PCI-E
  • GPU: GeForce 9600 GT (G94)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 675/1700 MHz (nominal - 650/1625 MHz)
  • Memory frequencies (physical (effective)): 900 (1800) MHz (nominal - 900 (1800) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 64
  • Texture processors: 32 (BLF)
  • ROPs: 16
  • Dimensions: 210x100x15 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: green
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).

ECS GeForce 9600 GT Accelero Edition 512MB PCI-E
  • GPU: GeForce 9600 GT (G94)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 680/1750 MHz (nominal - 650/1625 MHz)
  • Memory frequencies (physical (effective)): 930 (1860) MHz (nominal - 900 (1800) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 64
  • Texture processors: 32 (BLF)
  • ROPs: 16
  • Dimensions: 210x130x30 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: blue
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).

Forsa GeForce 9600 GT 512MB PCI-E
  • GPU: GeForce 9600 GT (G94)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 650/1625 MHz (nominal - 650/1625 MHz)
  • Memory frequencies (physical (effective)): 900 (1800) MHz (nominal - 900 (1800) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 64
  • Texture processors: 32 (BLF)
  • ROPs: 16
  • Dimensions: 210x100x15 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: green
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).

Point Of View GeForce 9600 GT 512MB PCI-E
  • GPU: GeForce 9600 GT (G94)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 650/1625 MHz (nominal - 650/1625 MHz)
  • Memory frequencies (physical (effective)): 900 (1800) MHz (nominal - 900 (1800) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 64
  • Texture processors: 32 (BLF)
  • ROPs: 16
  • Dimensions: 210x100x15 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: green
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).

Galaxy GeForce 9600 GT Overclocked 512MB PCI-E
  • GPU: GeForce 9600 GT (G94)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 675/1625 MHz (nominal - 650/1625 MHz)
  • Memory frequencies (physical (effective)): 1000 (2000) MHz (nominal - 900 (1800) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 64
  • Texture processors: 32 (BLF)
  • ROPs: 16
  • Dimensions: 200x100x32 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: blue
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).

Zotac GeForce 9600 GT AMP! Edition 512MB PCI-E
  • GPU: GeForce 9600 GT (G94)
  • Interface: PCI-Express x16
  • GPU frequencies (ROPs/Shaders): 725/1750 MHz (nominal - 650/1625 MHz)
  • Memory frequencies (physical (effective)): 1000 (2000) MHz (nominal - 900 (1800) MHz)
  • Memory bus width: 256bit
  • Vertex processors: -
  • Pixel processors: -
  • Unified processors: 64
  • Texture processors: 32 (BLF)
  • ROPs: 16
  • Dimensions: 210x100x15 mm (the last figure is maximum thickness of the graphics card).
  • PCB color: green
  • RAMDACs/TMDS: integrated into GPU.
  • Output connectors: 2xDVI (Dual-Link/HDMI), TV-Out.
  • VIVO: not available
  • TV-out: integrated into GPU.
  • Multi-GPU operation: SLI (Hardware).
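The spec lists above are enough to work out each card's peak memory bandwidth and factory overclock. The Python sketch below is only arithmetic on the listed clocks; it assumes peak bandwidth = bus width in bytes × effective memory clock (1 GB = 10^9 bytes) and measures overclocks against the nominal 650/1625/1800 MHz.

```python
# Clocks from the spec lists: (core MHz, shader MHz, effective memory MHz)
CARDS = {
    "BFG OC": (675, 1700, 1800),
    "ECS Accelero": (680, 1750, 1860),
    "Forsa": (650, 1625, 1800),
    "Point Of View": (650, 1625, 1800),
    "Galaxy OC": (675, 1625, 2000),
    "Zotac AMP!": (725, 1750, 2000),
}
NOMINAL = (650, 1625, 1800)  # reference GeForce 9600 GT clocks
BUS_WIDTH_BITS = 256


def bandwidth_gb_s(effective_mhz, bus_bits=BUS_WIDTH_BITS):
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers per second."""
    return bus_bits / 8 * effective_mhz * 1e6 / 1e9


def overclock_pct(actual, nominal):
    """Factory overclock relative to the nominal clock, in percent."""
    return (actual / nominal - 1) * 100


for name, clocks in CARDS.items():
    core, shader, mem = (overclock_pct(a, n) for a, n in zip(clocks, NOMINAL))
    print(f"{name:14s} {bandwidth_gb_s(clocks[2]):5.1f} GB/s  "
          f"core {core:+5.1f}%  shader {shader:+5.1f}%  memory {mem:+5.1f}%")
```

At the nominal 1800 MHz this gives 57.6 GB/s; the 2000 MHz memory of the Galaxy and Zotac cards raises it to 64 GB/s.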



BFG GeForce 9600 GT OC 512MB PCI-E
Point Of View GeForce 9600 GT 512MB PCI-E
Forsa GeForce 9600 GT 512MB PCI-E
Galaxy GeForce 9600 GT Overclocked 512MB PCI-E
ECS GeForce 9600 GT Accelero Edition 512MB PCI-E
Zotac GeForce 9600 GT AMP! Edition 512MB PCI-E
Each graphics card has 512 MB of GDDR3 SDRAM in eight chips on the front side of the PCB.

The cards use Samsung GDDR3 memory chips with 1.0 ns access time, which corresponds to 1000 (2000) MHz.
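A rough sketch of that conversion, assuming the common convention that the nanosecond rating is the chip's minimum clock period and that GDDR3 transfers data twice per clock. It also shows how close each card's memory clock (from the spec lists) runs to that rating.

```python
def rated_clock_mhz(access_time_ns):
    """Physical and effective clock implied by a chip's ns rating (assumed to be its clock period)."""
    physical = 1000.0 / access_time_ns  # 1 / (t ns) expressed in MHz
    effective = 2 * physical            # GDDR3 is double data rate
    return physical, effective


physical, effective = rated_clock_mhz(1.0)
# Effective memory clocks of the six cards, from the spec lists
card_effective_mhz = {"BFG": 1800, "ECS": 1860, "Forsa": 1800,
                      "PoV": 1800, "Galaxy": 2000, "Zotac": 2000}
for card, mhz in card_effective_mhz.items():
    print(f"{card:7s} {mhz} MHz = {mhz / effective:.0%} of the rated {effective:.0f} MHz")
```

By this reading, the Galaxy and Zotac cards run their memory exactly at the chips' rating, while the other cards leave some headroom.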



Comparison with the reference design, front view
Point Of View GeForce 9600 GT 512MB PCI-E
Reference card NVIDIA GeForce 8800 GT
BFG GeForce 9600 GT OC 512MB PCI-E
Galaxy GeForce 9600 GT Overclocked 512MB PCI-E
Forsa GeForce 9600 GT 512MB PCI-E
ECS GeForce 9600 GT Accelero Edition 512MB PCI-E
Zotac GeForce 9600 GT AMP! Edition 512MB PCI-E


Comparison with the reference design, back view
Point Of View GeForce 9600 GT 512MB PCI-E
Reference card NVIDIA GeForce 8800 GT
BFG GeForce 9600 GT OC 512MB PCI-E
Galaxy GeForce 9600 GT Overclocked 512MB PCI-E
Forsa GeForce 9600 GT 512MB PCI-E
ECS GeForce 9600 GT Accelero Edition 512MB PCI-E
Zotac GeForce 9600 GT AMP! Edition 512MB PCI-E


Since the G94 has the same 256-bit memory bus as the G92, the 9600 GT PCB layout shouldn't differ much from that of the 8800 GT. Most changes were made to the power circuitry, because the two GPUs differ in power consumption. The PCB is simplified, and many component pads on these cards are left unpopulated. This applies to the reference design of the 9600 GT, which is used on five of the six cards (ECS introduced only minor changes to it).

The Galaxy card has its own PCB layout. I suppose it's based on Galaxy's own simplified 8800 GT PCB (no second power connector, no core voltage switch). As a result, this card is a tad shorter than the reference product.

We'll talk about coolers below.

Cards of this series have an audio input connector for a cable from a sound card, so that the audio stream can be passed on to HDMI (via a DVI-to-HDMI adapter). In other words, the card has no audio codec of its own; it receives the audio signal from an external sound card. So if this feature is important to you, make sure the bundle includes the special audio cable.

All cards have a TV-Out with a proprietary jack. You will need the bundled adapter to output video to a TV set via S-Video or RCA. You can read about the TV-Out in more detail here.

Analog monitors with a D-Sub (VGA) interface are connected via DVI-to-D-Sub adapters. DVI-to-HDMI adapters, included in some of the bundles, let you connect HDMI receivers (these cards support video and audio output over HDMI), so such displays should pose no problems. Maximum resolutions and frequencies:

  • 240 Hz Max Refresh Rate
  • 2048 x 1536 x 32bit @ 85Hz Max - analog interface
  • 2560 x 1600 @ 60Hz Max - digital interface (all DVIs with Dual-Link)
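Why Dual-Link matters for 2560 x 1600: a minimal check in Python, assuming the DVI 1.0 single-link pixel-clock limit of 165 MHz (dual link doubles it) and, for the second estimate, CVT reduced-blanking totals of 2720 x 1646 pixels. Even the active pixels alone exceed what one link can carry.

```python
SINGLE_LINK_MAX_MHZ = 165.0                  # DVI 1.0 single-link pixel-clock limit
DUAL_LINK_MAX_MHZ = 2 * SINGLE_LINK_MAX_MHZ  # dual link carries pixels over two TMDS links


def pixel_clock_mhz(h_total, v_total, refresh_hz):
    """Pixels per second the link must carry, in MHz."""
    return h_total * v_total * refresh_hz / 1e6


# Lower bound: active pixels only, ignoring blanking intervals
active = pixel_clock_mhz(2560, 1600, 60)
# Assumed CVT reduced-blanking totals for 2560 x 1600 @ 60 Hz
cvt_rb = pixel_clock_mhz(2720, 1646, 60)

for label, clk in (("active only", active), ("CVT-RB", cvt_rb)):
    print(f"{label:11s} {clk:6.1f} MHz  single link: {clk <= SINGLE_LINK_MAX_MHZ}  "
          f"dual link: {clk <= DUAL_LINK_MAX_MHZ}")
```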

As for MPEG2 playback (DVD-Video), we analyzed this issue back in 2002, and little has changed since then: CPU load during video playback on modern graphics cards does not exceed 25%.

As for HDTV, you can read one of our reviews here.

These cards require additional power, so each card is bundled with a Molex-to-6-pin adapter, even though all modern PSUs already have 6-pin cables.

Now about the cooling systems. Four of the six cards are copies of the reference design, so they have identical coolers; we'll examine this cooler on the PoV card. The other two cards, from ECS and Galaxy, have unique cooling systems, which we'll also examine here.

Point Of View GeForce 9600 GT 512MB PCI-E
Reference cooler GeForce 8800 GT


As we can see, this cooling system features a traditional long enclosed heat sink and a fan that blows air through it. Alas, the hot air is not exhausted out of the PC case: that would have required a two-slot design, and NVIDIA wanted to boast a combination of high performance and a slim cooler. So in some PC cases the 8800 GT may heat up to almost 100°C, warming all other components inside.

I've compared these two cooling systems for a reason: the old cooler of the 8800 GT and the new one of the 9600 GT. Experienced users may remember that the 8800 GT cooler had serious noise and efficiency problems because of its small fan. In our first article about the 8800 GT, we examined an alternative cooler on the Zotac card, which differed from the reference device only in fan size. The effect was striking: lower temperatures and no noise. NVIDIA has adopted this solution and now installs large fans in the reference coolers of the 9600 GT.

Galaxy GeForce 9600 GT Overclocked 512MB PCI-E
Galaxy's engineers have once again installed a two-slot cooling system. In previous products it was very noisy, but here the problem has been solved with temperature and fan-speed monitoring, so the cooler runs at low speed and makes no noise.
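Temperature-driven fan control of this kind is usually a curve mapping GPU temperature to fan duty cycle. The sketch below is a generic, hypothetical illustration: the curve points are invented for the example and are not Galaxy's actual settings, which we don't know.

```python
# Hypothetical curve points: (GPU temperature in °C, fan duty in %); not Galaxy's values
CURVE = [(40, 30), (60, 40), (75, 70), (90, 100)]


def fan_duty(temp_c, curve=CURVE):
    """Piecewise-linear interpolation between curve points, clamped at both ends."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    return curve[-1][1]


for t in (35, 50, 70, 95):
    print(f"{t} °C -> {fan_duty(t):.0f}% duty")
```

The low duty cycle at idle and moderate temperatures is what keeps such a cooler quiet; the fan only speeds up when the GPU actually gets hot.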
ECS GeForce 9600 GT Accelero Edition 512MB PCI-E
ECS traditionally equips its cards with Arctic Cooling systems (here, a passive heat sink without a fan). We've already examined a similar cooling system, so I'll skip the description. The huge heat sink rises more than three centimeters above the card, so it's a two-slot device. Its heat pipes draw heat away from the GPU only; the memory chips have their own small heat sinks.


We monitored temperatures using RivaTuner (written by A.Nikolaychuk AKA Unwinder). Here are the results:

ECS GeForce 9600 GT Accelero Edition 512MB PCI-E




BFG GeForce 9600 GT OC 512MB PCI-E




Point Of View GeForce 9600 GT 512MB PCI-E
Forsa GeForce 9600 GT 512MB PCI-E




Galaxy GeForce 9600 GT Overclocked 512MB PCI-E




Zotac GeForce 9600 GT AMP! Edition 512MB PCI-E




All cards with the reference cooling system, regardless of their operating frequencies, run cooler than the 8800 GT. Their temperatures never come close to the critical level.

The passively cooled ECS card demonstrates the most impressive results, considering that it's completely noiseless.

You can see the G94 GPU (GeForce 9600 GT) below. For some people, "500 million transistors" is an empty phrase. But not long ago we couldn't even imagine so many transistors in a Mid-End graphics card! By the way, the die itself is rotated 45° relative to its substrate.






Let's proceed to bundles.

A typical bundle includes a user manual, a CD with drivers and utilities, an external power splitter, and DVI-to-VGA, DVI-to-HDMI, and component output (TV-out) adapters. Now let's see what each vendor adds or leaves out.



BFG GeForce 9600 GT OC 512MB PCI-E
The bundle does not include an HDMI adapter, although it contains two DVI-to-VGA adapters (I cannot imagine why anyone would want to connect two CRT monitors to a new graphics card these days). A BFG representative did not comment on the missing adapter, but pointed out the color of the envelopes and the logos on the adapters. The bundle contains a pile of fliers and useless papers instead of a single user manual. Besides, there is no audio cable to route the audio signal from a sound card to HDMI. It seems that the American company neglects the HDMI feature.


ECS GeForce 9600 GT Accelero Edition 512MB PCI-E
A basic bundle; the manufacturer added only an audio cable.


Forsa GeForce 9600 GT 512MB PCI-E
There is no HDMI adapter, but we found a TV cable in the bundle. I'm not sure whether the missing adapter is a problem. Probably not, considering the low popularity of HDMI in Russia.


Point Of View GeForce 9600 GT 512MB PCI-E
The bundle lacks a component output adapter and an audio cable.


Galaxy GeForce 9600 GT Overclocked 512MB PCI-E
The bundle resembles that of Forsa: there is neither an HDMI adapter nor an audio cable. Perhaps this company also believes that HDMI is unnecessary for this card :)


Zotac GeForce 9600 GT AMP! Edition 512MB PCI-E
A good bundle, like that of the ECS card.




Unfortunately, none of the companies added bonuses to their bundles!

Packages

BFG GeForce 9600 GT OC 512MB PCI-E

It's the traditional small black box. You just open it and pour everything out. The card itself is well packed in a plain box. The overall impression is of a no-name product rather than a card from a famous American brand, to say nothing of the box design. I cannot understand BFG's designers.




ECS GeForce 9600 GT Accelero Edition 512MB PCI-E

The company sticks to its traditional vertical box: a jacket with a white cardboard box inside. All components are arranged in compartments. The card itself is packed in foamed polyurethane, so damage in transit is out of the question. The package has a nice, attractive design.




Forsa GeForce 9600 GT 512MB PCI-E

When I saw the box from a distance, my first thought was Baba Yaga (the witch) on a broom. I thought Chinese designers had started using Russian folk tales in their work. But no: a closer look revealed a humanoid with a poleaxe (a heavy sigh: another monster with a weapon, again...). The box is made of thick cardboard. The bundled components are simply loose inside: there is only a divider between the card and the accessories, so they rattle around in transit.




Point Of View GeForce 9600 GT 512MB PCI-E

The designer at this company was inspired by something supernatural :-). It smacks of avant-garde and goes well with the company's name. That's how this company sees its customers. :)

All components are arranged into compartments inside. The card itself is secured with struts, so dangling in transit is out of the question.




Galaxy GeForce 9600 GT Overclocked 512MB PCI-E

The traditional blue design of this company. Not bad at all! A blurred cheetah goes well with the performance of this graphics card. As for Xtreme Tuner, mentioned on the box, we'll examine it below.

The entire bundle, including the card, is secured well in multiple compartments, so the card won't be damaged in transit.




Zotac GeForce 9600 GT AMP! Edition 512MB PCI-E

A nice dragon-themed design. The package consists of a jacket with a cardboard box inside. All components are arranged in compartments. The card itself is packed in foamed polyurethane, so there will be no damage in transit.





Installation and Drivers

Testbed configuration:

  • Intel Core2 (Socket 775) based computer
    • CPU: Intel Core2 Extreme QX9650 (3000 MHz)
    • Motherboard: Gigabyte GA-X38-DQ6 on the Intel X38 chipset
    • RAM: 2 GB DDR2 SDRAM Corsair 1142MHz (CAS (tCL)=5; RAS to CAS delay (tRCD)=5; Row Precharge (tRP)=5; tRAS=15)
    • HDD: WD Caviar SE WD1600JD 160GB SATA
    • PSU: Tagan 1100-U95 (1100W)

  • OS: Windows XP SP2; DirectX 9.0c
  • OS: Windows Vista 32bit; DirectX 10.0
  • Monitor: Dell 3007WFP (30")
  • Drivers: ATI CATALYST 8.1; NVIDIA 169.28/174.12

VSync is disabled.






As I have already mentioned above, the Galaxy card is bundled with a utility for overclocking and monitoring temperatures. I think that it's important, especially for MS Windows Vista users, because NVIDIA drivers for this operating system do not have tabbed pages with overclocking options.

Synthetic tests

Our synthetic benchmarks are available here:

  • D3D RightMark Beta 4 (1050) with its description at http://3d.rightmark.org
  • D3D RightMark Pixel Shading 2 and D3D RightMark Pixel Shading 3: tests of Pixel Shaders 2.0 and 3.0 (link)
  • RightMark3D 2.0 with a brief description (link)

RightMark3D 2.0 requires the MS Visual Studio 2005 runtime as well as the latest DirectX runtime update.

Synthetic tests were run with the following graphics cards:

  • NVIDIA GeForce 9600 GT with standard parameters (GF9600GT)
  • NVIDIA GeForce 8800 GT operating at 9600 GT frequencies, that is 650/1625/1800 MHz (GF8800GT (650))
  • NVIDIA GeForce 8800 GT with standard parameters (GF8800GT)
  • NVIDIA GeForce 8600 GTS with standard parameters (GF8600GTS)
  • RADEON HD 3870 with standard parameters (HD3870)
  • RADEON HD 3850 with standard parameters (HD3850)

We selected these cards for comparison with the GeForce 9600 GT for the following reasons: the GeForce 8800 GT is a similar solution from a similar price segment; the GeForce 8800 GT running at the frequencies of the card under review will reveal possible architectural changes and the effect of fewer ALUs and TMUs; the GeForce 8600 GTS is the direct predecessor on the previous architecture (G8x), which lets us evaluate the gain from the larger number of execution units and the architectural modifications; the RADEON HD 3870 is the fastest single-GPU solution from AMD; and the HD 3850 is a direct competitor of the GeForce 9600 GT, judging by recommended prices.

Unfortunately, the synthetic tests won't be very interesting this time either, because architecturally nothing has changed in the G94 versus the G92. It has the same features; the two differ only in the number of ALUs and TMUs and in operating frequencies. We'll have to wait until summer or autumn for new GPU architectures to appear... It would have been great to compare the G94 with a G92 running at the same frequencies with some of its ALUs and TMUs disabled, to find out whether the G94 differs in any way. But that's impossible, because we cannot disable execution units in the G92. And comparing our GPU with the cut-down G80 makes little sense, because they cannot have the same number of ROPs.

Direct3D 9: Pixel Filling tests

This test determines peak texel rate in FFP mode for different numbers of textures applied to a pixel:




Not all graphics cards demonstrate results close to their theoretical maximum; synthetic test results are usually a tad lower. G8x-based cards and AMD cards come closer to this threshold than the others. NVIDIA's G9x-based cards, notable for their improved TMUs, fail to reach their theoretical maximum in our old test, although the GeForce 9600 GT shows a better ratio than the GeForce 8800 GT. Judging by our results, the G94 can fetch just over 16 texels per clock with 32-bit textures and bilinear filtering, although theoretically it can do up to 32. This may be the effect of insufficient memory bandwidth...
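The gap between theory and practice in numbers, as a short Python sketch. It assumes the nominal 650 MHz core clock and treats the measured result as the roughly 16 texels per clock quoted above.

```python
TMUS = 32                       # texture units with bilinear filtering, from the spec list
CORE_MHZ = 650                  # nominal GeForce 9600 GT core clock
MEASURED_TEXELS_PER_CLOCK = 16  # approximate result quoted in the text


def texel_rate_gtex(texels_per_clock, core_mhz=CORE_MHZ):
    """Texel rate in Gtexels/s for a given per-clock throughput."""
    return texels_per_clock * core_mhz / 1000


peak = texel_rate_gtex(TMUS)
measured = texel_rate_gtex(MEASURED_TEXELS_PER_CLOCK)
print(f"theoretical {peak:.1f} Gtex/s, measured ~{measured:.1f} Gtex/s "
      f"({measured / peak:.0%} of peak)")
```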

With few textures per pixel, the GeForce 9600 GT performs almost on a par with the GeForce 8800 GT and the RADEON HD 3850, because in such cases all cards are limited by video memory bandwidth. Texturing capacity shows up better under heavier loads: here the card based on the new Mid-End GPU is much faster than the HD 3850 and more than twice as fast as the GeForce 8600 GTS, while the GeForce 8800 GT is only 1.5 times as fast as the 9600 GT. Let's have a look at the fill rate results:




The second synthetic test measures the fill rate. It shows the same situation adjusted for the number of pixels written into the frame buffer. In case of 0 and 1 texture, performance is apparently limited by memory bandwidth as well as by the number and frequency of ROPs. The situation resembles the previous test - as the number of textures per pixel grows, the new card outperforms its competitor from AMD and noticeably outperforms its predecessor (GeForce 8600 GTS).
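A back-of-the-envelope Python sketch of why fill rate with 0 or 1 texture runs into memory bandwidth. It assumes 32-bit color, one color write per pixel and, for blending, an extra read of the same pixel; it ignores Z traffic and any framebuffer compression, and we don't know exactly which of these the test exercises.

```python
ROPS = 16
CORE_MHZ = 650
BANDWIDTH_GB_S = 256 / 8 * 1800 / 1000  # 256-bit bus at 1800 MHz effective = 57.6 GB/s
BYTES_PER_PIXEL = 4                     # 32-bit color

pixel_rate = ROPS * CORE_MHZ / 1000        # peak fill rate, Gpixels/s
write_only = pixel_rate * BYTES_PER_PIXEL  # GB/s for plain color writes
blended = 2 * write_only                   # GB/s when each pixel is also read back

print(f"peak fill rate: {pixel_rate:.1f} Gpix/s")
print(f"plain writes need {write_only:.1f} GB/s, blended writes {blended:.1f} GB/s, "
      f"available {BANDWIDTH_GB_S:.1f} GB/s")
# If blending were used, bandwidth (not the ROPs) would cap the fill rate:
print(f"bandwidth-limited blended rate: {BANDWIDTH_GB_S / (2 * BYTES_PER_PIXEL):.1f} Gpix/s")
```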

Direct3D 9: Geometry Processing Speed Tests

Let's analyze a couple of stress geometry tests. The first test uses the simplest vertex shader that shows maximum triangle throughput:




As all these GPUs are based on unified architectures, their unified processors in this test are busy with geometry processing only. So all solutions demonstrate high results, which are evidently limited not by peak performance of unified processors, but by performance of other units, for example, triangle setup.

These results confirm once again that AMD GPUs are traditionally faster at geometry processing than NVIDIA GPUs. Although the difference between the GeForce 9600 GT and the RADEON HD 3850 is small, the AMD card is faster in this task. The G84 and RV670 execute this test with similar efficiency in all modes: peak performance in FFP, VS 1.1, and VS 2.0 does not differ much. Only the FFP mode is noticeably faster on representatives of the G9x architecture.

We've removed intermediate geometry tests with a single light source. So we proceed straight to the most complex geometry task with three light sources and static/dynamic branching:




Now we can see the difference between all our contenders, especially between AMD and NVIDIA solutions. The RADEON HD 3850 outperforms all other cards, and even this most complex geometry task does not reveal its full potential: its results in various modes are identical, except for the test with dynamic branches. We can see the traditional opposite weaknesses of vertex processing in the AMD and NVIDIA architectures: dynamic branches cause a deeper performance drop in the former, static branches in the latter. Besides, with three mixed light sources, the optimized FFP emulation in the G9x becomes even more noticeable.

Even though the GeForce 9600 GT is a tad slower than the HD 3850, it's much faster than the GeForce 8600 GTS (almost twice as fast), which agrees well with the theory. It is slower than the GeForce 8800 GT running at the same frequencies, which is also expected, although the gap falls short of the theoretical 1.75-fold difference. On the whole, all GPUs perform well in these tests, because they can use all their unified processors for geometry. But in real applications unified processors are mostly busy with pixels, so let's proceed to such tests now.

Direct3D 9: Pixel Shaders Tests

The first group of pixel shaders to be reviewed here is too simple for modern GPUs. It includes various versions of pixel programs of relatively low complexity: 1.1, 1.4, and 2.0.




These tests are too easy for modern architectures and fail to reveal their true power. Performance in simple tests is limited by texel and fill rates, which explains the relatively low results of the RADEON HD 3850: the GeForce 9600 GT easily outperforms it here, because the AMD GPU has fewer TMUs. Results become more interesting in the complex PS 2.0 tests, where the AMD card even takes the lead in the most complex one (Phong with three light sources), outperforming the G92-based card as well.

GeForce 9600 GT demonstrates excellent results. It's more than twice as fast as its predecessor (GeForce 8600 GTS) in complex PS 2.0 tests. Performance difference between them grows large in the most complex tests. The new card is not outperformed by the GeForce 8800 GT as much as it theoretically should be, although the GPU hasn't changed much relative to the G92. Perhaps, these tests are also heavily affected by memory bandwidth, which is almost the same in these graphics cards. Let's have a look at results in more complex pixel programs of intermediate versions:




The procedural water test, which depends heavily on texturing performance, uses dependent texture lookups with deep nesting, so the RADEON lags far behind the G9x-based NVIDIA solutions and competes only with the GeForce 8600 GTS. Our card under review is over 1.5 times as fast as the RADEON. This is the first test where the practical difference between the GeForce 8800 GT and the 9600 GT comes so close to the theoretical value (73% in practice versus 75% in theory, with both cards at identical frequencies).

The AMD card shoots forward in the second more compute-intensive test. Even the G92-based card is outperformed. This task apparently favors the R6xx architecture with more unified processors. If we compare the GeForce 9600 GT and the 8600 GTS, the new Mid-End card from NVIDIA outperforms the old solution in the compute-intensive test - performance difference reaches 2.5 times. And the GeForce 8800 GT operating at the 9600 GT frequencies is only 1.5 times as fast, not 1.75 times, as it theoretically should have been.

Direct3D 9: New Pixel Shaders Tests

These tests of DirectX 9 pixel shaders are even more complex, they are divided into two categories. We'll start with easier shaders - SM 2.0:

  • Parallax Mapping - a texturing method used in many modern games, which is described in detail in our article Modern 3D Graphics Terms
  • Frozen Glass - a complex procedural texture that visualizes frozen glass with adjustable parameters

There are two modifications of these shaders: arithmetic intensive and texture sampling intensive. Let's analyze arithmetic-intensive modifications, they are more promising from the point of view of future applications:




A situation in the Frozen Glass test is similar to that in the previous group of Water tests. The GeForce 9600 GT is outperformed by the 8800 GT by 60%, and it's twice as fast as the GeForce 8600 GTS. NVIDIA cards based on the G9x GPUs outperform the HD 3850 in this test, which confirms the fact that their performance is limited by the texel rate in the first place.

AMD cards used to lead the second test, Parallax Mapping, but the G9x-based solutions with improved TMUs have changed the situation. Parallax Mapping requires an additional texture lookup, so the GeForce 8800 GT outperforms the HD 3850, while the GeForce 9600 GT is outperformed by both cards. Still, the G92 is only 1.5 times as fast, and the 9600 GT is again 2.5 times as fast as the G84! Let's analyze the results of the texture sampling intensive tests, where the GeForce 9600 GT should perform even better:




The situation changes. Performance in these tests is limited by the speed of texture units more than ever, so the new GeForce 9600 GT is faster than the RADEON HD 3850 in both tests. To say nothing of the GeForce 8600 GTS, performance difference between them reaches 2.85 times in the second test! The 9600 GT is outperformed by the more expensive G92-based card, of course, but not as much as it should have been theoretically. By the way, arithmetic-intensive shaders work faster even on the GeForce 8600 GTS.

Let's have a look at results of another two pixel shader tests - SM 3.0. They are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches:

  • Steep Parallax Mapping is a much heavier modification of parallax mapping, which is also described in the article Modern 3D Graphics Terms
  • Fur - a procedural shader that visualizes fur



The load on graphics cards in these tests is great, only such powerful GPUs as the RV670 and the G92 could cope with it. And now the G94 has joined them. The card with this GPU outperforms the G84-based card by more than two-fold in both tests. It's outperformed by the GeForce 8800 GT operating at the same frequency by more than 70%, which is close to the theoretical difference in the number of unified processors. Although AMD cards efficiently execute complex Pixel Shaders 3.0 with a lot of branches, the new card from NVIDIA outperforms the AMD solution in both tests, which can be explained with faster bilinear texture lookups in the G9x architecture. When we analyze results of such synthetic tests, we should take into account that the situation may be different in real applications, if they use trilinear and/or anisotropic texture filtering.
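The measured 8800 GT (650) to 9600 GT ratios quoted in this section can be set against the theoretical 1.75, which follows from 112 versus 64 unified processors and 56 versus 32 TMUs at equal clocks (reference G92 and G94 configurations, not listed in this article). The sketch below assumes the gain above 1.0× should scale with the extra units and expresses each measurement as the share of that theoretical gain it captures; the 1.70 entry is our reading of "more than 70%".

```python
THEORETICAL = 112 / 64  # = 56 / 32 = 1.75 at equal clocks

# 8800 GT (650) / 9600 GT speed ratios quoted in this section
MEASURED = {
    "Procedural water (PS 2.x)": 1.73,
    "Compute-intensive PS 2.x": 1.50,
    "SM 3.0 Steep Parallax, Fur": 1.70,  # our reading of "more than 70%"
}


def scaling_efficiency(ratio, theoretical=THEORETICAL):
    """Share of the theoretical extra speed (above 1.0x) that shows up in practice."""
    return (ratio - 1) / (theoretical - 1)


for test, ratio in MEASURED.items():
    print(f"{test:28s} x{ratio:.2f} -> {scaling_efficiency(ratio):.0%} of the theoretical gain")
```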

Direct3D 10: PS 4.0 Tests (texturing, loops)

The new RightMark3D 2.0 includes two old PS 3.0 tests (Direct3D 9) rewritten for DirectX 10, plus two brand-new tests. The first two tests can now enable self-shadowing and shader supersampling, which increase the GPU load.

These tests measure efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundreds of lookups per pixel in the heaviest mode!) and a relatively low ALU load. In other words, they measure a texture sampling rate and branching efficiency in a pixel shader.

The first pixel shader test will be the Fur test. When used with the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. When shader supersampling is enabled, the number of lookups grows to 60-120. The heaviest mode is the High mode with SSAA - 160-320 lookups from a bump map.
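To put those lookup counts in perspective, here is a rough upper bound on frame rate if texture fetches were the only cost. The assumptions are all ours: the 9600 GT's theoretical 32 × 650 MHz bilinear rate, every lookup costing one bilinear fetch, only the bump-map lookups counted, and a hypothetical 1280 x 1024 resolution. Real results are far lower because of ALU work, branching, and memory bandwidth.

```python
# Bump-map lookups per pixel quoted for the Fur test (Low also adds 2 main-texture lookups)
FUR_LOOKUPS = {
    "Low": (15, 30),
    "High": (40, 80),
    "Low + SSAA": (60, 120),
    "High + SSAA": (160, 320),
}
PEAK_TEXELS_PER_S = 32 * 650e6  # 9600 GT: 32 TMUs x 650 MHz (theoretical)
WIDTH, HEIGHT = 1280, 1024      # hypothetical test resolution


def fps_ceiling(lookups_per_pixel, w=WIDTH, h=HEIGHT, texel_rate=PEAK_TEXELS_PER_S):
    """Upper bound on frame rate if every lookup were one bilinear fetch and nothing else cost time."""
    return texel_rate / (lookups_per_pixel * w * h)


for mode, (lo, hi) in FUR_LOOKUPS.items():
    print(f"{mode:12s} {lo:3d}-{hi:3d} lookups/pixel -> at most "
          f"{fps_ceiling(hi):6.1f}-{fps_ceiling(lo):6.1f} fps")
```

The SSAA rows are exactly four times the corresponding non-SSAA rows, which matches the statement that supersampling quadruples the load.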

Let's see what happens in modes without supersampling - they are relatively simple, and the correlation of results in Low/High modes must be similar.

All results in the High mode are approximately 1.5 times lower than in the Low mode. The procedural Fur tests for Direct3D 10 with lots of texture lookups again show a huge advantage of NVIDIA solutions over AMD cards. In theory, this gap should not exist: how can the RADEON HD 3870 be slower than the GeForce 8600 GTS, if it's theoretically stronger in every respect?

Judging by our test results, performance of this test depends not only on the number and speed of TMUs, but also on the fill rate and memory bandwidth. What concerns a comparison of results demonstrated by the GeForce 9600 GT and the 8800 GT, the more expensive card is approximately 1.5 times as fast, and the GeForce 8600 GTS is almost twice as slow as the card under review. Let's have a look at the results in this test with enabled shader supersampling, which quadruples the load. Perhaps it will change the situation:

Theoretically, supersampling quadruples the load, but NVIDIA cards suffer a deeper performance drop than AMD cards. So the gap between them narrows, and the HD 3870 catches up with the weakest GeForce. Otherwise, nothing changes as the more complex shader raises the GPU load. The difference between the GeForce 8800 GT and the new 9600 GT remains the same, although fill rate and memory bandwidth no longer affect overall performance as much.

The second test that measures efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples this number. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is eight times as many as in the low mode. Let's analyze simple modes without supersampling first:
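The multipliers quoted for Steep Parallax Mapping compose multiplicatively, which the sketch below spells out: self-shadowing ×2 and supersampling ×4 give the ×8 of the heaviest mode. If a card were purely texture-bound, its frame rate would ideally fall by the same factor; that ideal is our assumption, not a measurement.

```python
LOW_LOOKUPS = (10, 50)  # bump-map lookups per pixel at low settings
SELF_SHADOW_FACTOR = 2  # self-shadowing doubles the lookups
SSAA_FACTOR = 4         # shader supersampling quadruples them


def steep_parallax_lookups(self_shadowing=False, supersampling=False, base=LOW_LOOKUPS):
    """Return the lookup range and the load factor relative to the low mode."""
    factor = (SELF_SHADOW_FACTOR if self_shadowing else 1) * (SSAA_FACTOR if supersampling else 1)
    return tuple(n * factor for n in base), factor


for shadow in (False, True):
    for ssaa in (False, True):
        (lo, hi), k = steep_parallax_lookups(shadow, ssaa)
        # A purely texture-bound card would ideally run at 1/k of its low-mode frame rate
        print(f"self-shadowing={shadow!s:5} SSAA={ssaa!s:5} x{k}: "
              f"{lo}-{hi} lookups/pixel, ideal fps = {1 / k:.3g} x low mode")
```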

This test is more interesting from the practical point of view. Various parallax mapping methods have been used in games for a long time already. Heavy modifications, such as our steep parallax mapping, are already used in the latest releases, e.g. in Crysis and Lost Planet. Along with supersampling, our test can enable self-shadowing that doubles the GPU load (High mode).

Even though AMD solutions have been traditionally strong in our Direct3D 9 tests of parallax mapping, they cope with our updated D3D10 test without supersampling only on a par with the GeForce 8600 GTS. Besides, self-shadowing causes a bigger performance drop in AMD products - over two times versus 1.5 in NVIDIA solutions.

The GeForce 9600 GT is twice as fast as its predecessor. Its gap to the GeForce 8800 GT has widened a little, exceeding 1.6 times. Let's see what supersampling changes; in the previous test, NVIDIA cards lost more performance from supersampling.

Frame rates with both supersampling and self-shadowing enabled again indicate a very heavy GPU load. Together, these two options increase the load on graphics cards almost eightfold, causing a large performance drop. The performance gaps between the cards remain. Supersampling has the same effect as in the previous case: AMD cards significantly improve their results relative to NVIDIA solutions. However, even the HD 3870 is still outperformed by the GeForce 9600 GT. As for the other NVIDIA cards, the 9600 GT is again twice as fast as the GeForce 8600 GTS, and it is outperformed by the equally clocked 8800 GT by a margin close to the theoretical one. At stock frequencies, the 8800 GT is 1.5 times as fast.
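The "theoretical" gap at equal clocks can be sketched from unit counts. The snippet below assumes a test is bound by a single unit type and computes the resulting 8800 GT / 9600 GT ratio: a workload limited by ALUs or TMUs would approach 1.75, while one limited by ROPs or bus width would show no gap at equal clocks. The 8800 GT ALU and TMU counts follow from the "75% more" stated in the conclusions; its 16 ROPs and 256-bit bus are our assumption, not figures from this section.

```python
# Execution-unit counts. The 9600 GT figures come from the spec list at the
# start of the article; the 8800 GT ALU/TMU figures follow from its "75% more
# ALUs and TMUs", and its ROP count and bus width are assumptions.
UNITS = {
    "GeForce 9600 GT": {"alus": 64, "tmus": 32, "rops": 16, "bus_bits": 256},
    "GeForce 8800 GT": {"alus": 112, "tmus": 56, "rops": 16, "bus_bits": 256},
}


def per_clock_ratio(faster: str, slower: str, unit: str) -> float:
    """Theoretical speed ratio at equal clocks if `unit` were the only bottleneck."""
    return UNITS[faster][unit] / UNITS[slower][unit]


for unit in ("alus", "tmus", "rops", "bus_bits"):
    ratio = per_clock_ratio("GeForce 8800 GT", "GeForce 9600 GT", unit)
    print(f"{unit}: {ratio:.2f}")
```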

Direct3D 10: PS 4.0 Tests (computing)

The next two pixel shader tests contain a minimum of texture lookups to reduce the effect of TMU performance. They use a lot of arithmetic operations, so they measure the arithmetic performance of GPUs, i.e. how fast they execute arithmetic instructions in pixel shaders.

The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.

We already noted in our analysis of Direct3D 9 synthetic test results that the modern AMD architecture often performs better in complex arithmetic tasks than the competing NVIDIA architecture. This holds true for this test as well: the RADEON HD 3850 is faster than the GeForce 9600 GT, and the HD 3870 outperforms the best G92-based solution.

Performance of the GeForce 9600 GT is again twice as high as that of the older GeForce 8600 GTS. The new G94-based Mid-End card trails the 8800 GT by less than 50%, a smaller gap than the difference in the number and frequency of unified processors would suggest. Perhaps these results are also affected by memory bandwidth...
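To see why a gap of under 50% is smaller than expected, the ALU-bound ratio at stock clocks can be estimated. The sketch assumes one scalar operation per unified processor per shader clock and a 1500 MHz reference shader clock for the 8800 GT, which is not given in this section; under those assumptions a purely ALU-bound test would put the 8800 GT roughly 60% ahead.

```python
def alu_rate_gops(alus: int, shader_mhz: float) -> float:
    """Peak scalar instruction rate in Gops/s, assuming one op per ALU per shader clock."""
    return alus * shader_mhz / 1000.0


gf9600gt = alu_rate_gops(64, 1625)   # nominal shader clock from the spec list
gf8800gt = alu_rate_gops(112, 1500)  # ASSUMPTION: reference 8800 GT shader clock
expected_ratio = gf8800gt / gf9600gt

print(f"9600 GT: {gf9600gt:.0f} Gops/s, 8800 GT: {gf8800gt:.0f} Gops/s, "
      f"expected gap: {100 * (expected_ratio - 1):.0f}%")
```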

The second shader test, called Fire, is even harder on the ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:

Until 2008, AMD cards performed poorly in this test, showing very low results that pointed to some hardware bug. It was only in our tests of the RADEON HD 3870 X2 that they first showed decent results. As you can see, the RADEON HD 3870 and HD 3850 are now faster than the GeForce 8800 GT and the GeForce 9600 GT. We shall verify these results next time by modifying the test to rule out specific optimizations...

For now, we simply note that the GeForce 9600 GT is outperformed in this test by its direct competitor (HD 3850) by more than 60%. Among NVIDIA cards, the situation hasn't changed: the GeForce 8800 GT is over 50% faster than the 9600 GT, which in turn is twice as fast as the GeForce 8600 GTS. This agrees with the difference in ALU/TMU performance, i.e. in the number and frequencies of execution units.
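A rough way to compare the two computing tests is their ratio of arithmetic to texture work. Counting only the sin/cos instructions and texture lookups named in the test descriptions, Fire has four times as much trigonometric work per texture fetch as Mineral, which is why it leans harder on ALU throughput. This is a simplified metric under those assumptions, not a full instruction count.

```python
from fractions import Fraction

# (sin/cos instructions, texture lookups) per shader, from the test descriptions.
# Other arithmetic instructions are not counted here.
SHADERS = {
    "Mineral": (65, 2),
    "Fire": (130, 1),
}


def trig_per_lookup(name: str) -> Fraction:
    trig_ops, lookups = SHADERS[name]
    return Fraction(trig_ops, lookups)


for name in SHADERS:
    print(f"{name}: {float(trig_per_lookup(name))} sin/cos per texture lookup")

print("Fire vs Mineral:", trig_per_lookup("Fire") / trig_per_lookup("Mineral"))
```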

Direct3D 10: Geometry Shader Tests

RightMark3D 2.0 includes two geometry shader tests. The first one, called Galaxy, is similar to point sprites in previous Direct3D versions. It animates a particle system on the GPU, with a geometry shader creating four vertices from each particle. Similar algorithms are expected to be used in future DirectX 10 games.
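What the Galaxy geometry shader does per particle can be mimicked on the CPU. In this Python sketch each particle, already in view space, is expanded into the four corners of a screen-aligned quad in triangle-strip order. The Particle type and function names are hypothetical, and a real shader would also apply projection and emit texture coordinates.

```python
from typing import NamedTuple


class Particle(NamedTuple):
    x: float
    y: float
    z: float
    size: float


def expand_to_quad(p: Particle) -> list[tuple[float, float, float]]:
    """Emit the four corners of a view-aligned quad in triangle-strip order."""
    h = p.size / 2.0
    return [
        (p.x - h, p.y - h, p.z),  # bottom-left
        (p.x - h, p.y + h, p.z),  # top-left
        (p.x + h, p.y - h, p.z),  # bottom-right
        (p.x + h, p.y + h, p.z),  # top-right
    ]


def expand_all(particles: list[Particle]) -> list[tuple[float, float, float]]:
    # One input point becomes four output vertices, as in the Galaxy test.
    return [vertex for p in particles for vertex in expand_to_quad(p)]


particles = [Particle(0.0, 0.0, 1.0, 2.0), Particle(3.0, 1.0, 2.0, 0.5)]
print(len(particles), "particles ->", len(expand_all(particles)), "vertices")
```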

Changing the load balance in the geometry tests does not affect rendering results: the image is always identical, only the scene processing method differs. The GS load setting determines which shader does the work, vertex or geometry. The amount of work is always the same.

Let's analyze the first modification of Galaxy with vertex processing for three levels of geometric complexity:

Relative results are almost the same at all scene complexity levels. Performance follows the number of points: FPS is halved at each step. Previous reviews showed that this is not a hard task for modern graphics cards. Performance in this test is apparently limited not by shader ALUs but by memory bandwidth and fill rate. The fact that the GeForce 9600 GT performs on a par with the GeForce 8800 GT at equal frequencies shows that their performance is not limited by ALU/TMU speed.
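The "FPS is halved at each step" observation can be checked with a simple ratio: divide the FPS drop between two complexity levels by the growth in work. The helper below is a hypothetical utility, not part of RightMark3D, and the frame rates in the usage lines are made-up illustration values, not measured results.

```python
def scaling_efficiency(fps_light: float, fps_heavy: float, work_ratio: float = 2.0) -> float:
    """Compare the observed FPS drop with the increase in work between two levels.

    1.0 means FPS falls exactly in proportion to the work (e.g. halves when the
    number of points doubles); values below 1.0 mean FPS fell less than the work grew.
    """
    return (fps_light / fps_heavy) / work_ratio


# Hypothetical frame rates, for illustration only (not measured results).
print(scaling_efficiency(120.0, 60.0))  # FPS halves as points double
print(scaling_efficiency(120.0, 80.0))  # FPS drops less than the work grows
```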

The AMD competitors of the GeForce 9600 GT perform almost like the GeForce 8600 GTS, and the GeForce 9600 GT outperforms them by more than 50%. At nominal frequencies, the new card even outperforms the 8800 GT in this test, which can be explained by its higher clock rates. Perhaps the situation will change when some of the work is moved to a geometry shader.

Not much has changed here. All graphics cards demonstrate almost the same results with the different GS load setting, which moves part of the work to the geometry shader. The only change among NVIDIA cards is a drop in the results of the GeForce 8600 GTS, whereas AMD cards gain a little performance. The slowest card is now the GF 8600 GTS, which is again half as fast as the GF 9600 GT. Let's see what changes in the next test...

Hyperlight is the second geometry test that uses several techniques: instancing, stream output, buffer load. It employs dynamic generation of geometry by rendering into two buffers, as well as a new Direct3D 10 feature - stream output. The first shader generates ray directions, their speed and growth vectors. These data are stored in a buffer, which is used by the second shader for rendering. Each ray point is used to generate 14 vertices in a circle, up to a million output points.
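The ring generation step can be illustrated on the CPU. The Python sketch below places 14 vertices on a circle around a ray point, perpendicular to the ray direction; the perpendicular orientation, the radius parameter and the helper names are our assumptions, not RightMark3D's shader code. It also shows that, if the one-million figure counts output vertices, at most about 71 thousand ray points fit.

```python
import math

Vec3 = tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(sum(c * c for c in v))
    return (v[0] / length, v[1] / length, v[2] / length)


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ring_vertices(center: Vec3, direction: Vec3, radius: float,
                  segments: int = 14) -> list[Vec3]:
    """Place `segments` vertices on a circle perpendicular to the ray direction."""
    d = normalize(direction)
    # Any axis not parallel to d works as a helper for building a basis.
    helper = (0.0, 0.0, 1.0) if abs(d[2]) < 0.9 else (1.0, 0.0, 0.0)
    u = normalize(cross(d, helper))
    w = cross(d, u)  # unit length, since d and u are orthonormal
    ring = []
    for i in range(segments):
        angle = 2.0 * math.pi * i / segments
        cu, sw = radius * math.cos(angle), radius * math.sin(angle)
        ring.append(tuple(center[k] + cu * u[k] + sw * w[k] for k in range(3)))
    return ring


MAX_OUTPUT_POINTS = 1_000_000
print("ray points that fit:", MAX_OUTPUT_POINTS // 14)
```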

This new type of shader program is used to generate rays. If "GS load" is set to "Heavy", it's also used for rendering. In other words, in Balanced mode geometry shaders only generate and grow the rays and the output is handled by instancing, whereas in Heavy mode the geometry shader also outputs the data. Let's analyze the easy mode first:

Relative results in the various modes again correspond to the load: performance scales well in all cases and is close to the theoretical expectation that each subsequent Polygon count level should halve the frame rate. The GeForce 9600 GT performs even a tad better in this test than the equally clocked GF 8800 GT, although the difference falls within measurement error... AMD cards are again outperformed by all NVIDIA solutions at any geometry complexity. Even the GeForce 8600 GTS (which is about half as fast as the card under review) beats the RADEON cards.

The situation is similar to the previous test: the GeForce 9600 GT performs on a par with the 8800 GT. But the results may change in the next test, which uses geometry shaders more actively. It will also be interesting to compare the results obtained in Balanced and Heavy modes.

The picture has changed considerably. All AMD GPUs execute complex geometry shaders more efficiently than all NVIDIA GPUs. But NVIDIA has once again eliminated some weaknesses in its architecture, so the GeForce 9600 GT not only outperforms the GeForce 8800 GT by 20%, it practically catches up with the cheaper AMD RV670-based card and is slower only than the more expensive modification! The performance gap used to be much wider... It's an excellent result for NVIDIA!

We can also note that the GeForce 8600 GTS is hopeless in this test: it's almost four times as slow as the new Mid-End card, and even the GeForce 8800 GT is outperformed by the new solution... As for the comparison between modes, everything is as usual. NVIDIA cards lose a lot of performance when switching from instancing to a geometry shader, yet AMD cards are still outperformed overall: every G9x-based GeForce is faster in Balanced mode than the RADEON HD 3870 is in Heavy mode. Keep in mind that the image does not differ visually between these modes.

Direct3D 10: Vertex Texture Fetch Rate

Vertex Texture Fetch tests measure the speed of a large number of texture fetches from vertex shaders. The two tests are essentially similar, so the relative results in Earth and Waves should also be similar. Both use displacement mapping based on texture fetches. The only major difference is that the Waves test uses conditional branches, while the Earth test does not.
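The core of a vertex texture fetch test is sampling a height map per vertex and displacing the vertex along its normal. The sketch below does this on the CPU with clamp-to-edge bilinear filtering; it maps u = 0 and u = 1 to the first and last texel centers, which simplifies how GPUs actually address textures, and the function names are hypothetical rather than taken from the test.

```python
import math


def bilinear_sample(height: list[list[float]], u: float, v: float) -> float:
    """Bilinearly filter a 2D height map at normalized (u, v), clamping to the edges."""
    rows, cols = len(height), len(height[0])
    x = min(max(u, 0.0), 1.0) * (cols - 1)
    y = min(max(v, 0.0), 1.0) * (rows - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    # Blend the 2x2 texel footprint: first horizontally, then vertically.
    top = height[y0][x0] * (1 - fx) + height[y0][x1] * fx
    bottom = height[y1][x0] * (1 - fx) + height[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def displace(position, normal, uv, height, scale):
    """Move a vertex along its normal by the sampled height times `scale`."""
    h = bilinear_sample(height, *uv) * scale
    return tuple(p + n * h for p, n in zip(position, normal))


heights = [[0.0, 1.0],
           [2.0, 3.0]]
print(displace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5), heights, 2.0))
```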

Let's analyze the first test (Earth) in Effect detail Low mode:

Results in the various modes again show similar relative performance of the graphics cards. Judging by our previous reviews, this test is heavily affected by memory bandwidth: the easier the mode, the stronger its effect on performance. This time, at equal frequencies the comparison between the G92 and the G94 slightly favors the top model. But on the whole, the GeForce 9600 GT performs almost on a par with the GeForce 8800 GT. The GeForce 8600 GTS is almost three times as slow, the RADEON HD 3850 is outperformed by about one third, and the HD 3870 performs on a similar level. Let's have a look at the results of this test with more texture lookups:

The situation hasn't changed much; only the nominal leader is different, as the GeForce 9600 GT again slightly outperforms the G92-based card. The GeForce 8600 GTS lags by the same margin, and both AMD cards keep their positions. Results of the graphics cards grow closer as the task gets more complex.

Now let's look at the results of the second vertex texture fetch test. Waves executes fewer texture lookups, but it uses conditional branches. The number of bilinear texture lookups per vertex reaches 14 (Effect detail Low) or 24 (Effect detail High). Geometry complexity changes just as in the previous test.
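To put these lookup counts in perspective, each bilinear lookup filters a 2x2 block of texels. The sketch below turns the quoted per-vertex maximums into texel counts. It is an upper bound under our assumptions: conditional branches may skip some lookups, and the texture cache absorbs much of the actual memory traffic. The vertex count is a free parameter, since the test's mesh sizes are not given here.

```python
TEXELS_PER_BILINEAR = 4  # bilinear filtering blends a 2x2 texel footprint

# Maximum bilinear lookups per vertex in the Waves test, per the description.
WAVES_LOOKUPS_PER_VERTEX = {"Low": 14, "High": 24}


def texel_reads(detail: str, vertices: int) -> int:
    """Upper bound on texels filtered per frame, ignoring branches and caching."""
    return WAVES_LOOKUPS_PER_VERTEX[detail] * TEXELS_PER_BILINEAR * vertices


for detail in ("Low", "High"):
    print(detail, texel_reads(detail, 1), "texels per vertex")
print("High/Low:", round(texel_reads("High", 1) / texel_reads("Low", 1), 2))
```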

The Waves test favors AMD cards, and the GeForce 9600 GT is now naturally outperformed by the GeForce 8800 GT. Performance in this test seems to depend on TMUs as well, not only on memory bandwidth and fill rate. Both RADEON HD 3800 cards fare very well. The cheaper card performs on a par with the GeForce 8800 GT, almost catching up with it in heavy modes. And the HD 3870 is the leader here, except for the GF 8800 GT with increased frequencies. As usual, the GeForce 8600 GTS is half as fast as the GeForce 9600 GT in all modes. Let's analyze the second modification of the test:

There are almost no changes here, although the RADEON HD 3800 cards do better relative to NVIDIA cards as the test grows more complex, because the latter lose much more speed. The other conclusions also hold: performance is somewhat limited by memory bandwidth in Low mode, while TMUs and ROPs play an important role in High mode. AMD cards improve their positions in the VTF tests. We previously noted that NVIDIA cards coped better with vertex texture fetches than AMD cards, but now the situation has become much more favorable to AMD.

Conclusions on the synthetic tests

Synthetic tests of the GeForce 9600 GT against products from various price segments show that NVIDIA's new Mid-End product is a very powerful and competitive card. It's approximately twice as fast as its direct predecessor (GeForce 8600 GTS). Moreover, in most tests it stays ahead of the RADEON HD 3850, which has a similar recommended price. It can also compete with a more expensive AMD card, the RADEON HD 3870, especially in its overclocked versions.

The G9x architecture hasn't changed much compared to the G8x. It's notable for high computing performance, being designed for modern and future applications with lots of complex shaders of all types. High efficiency of unified processors, a sufficient number of ALUs, TMUs, and ROPs, as well as high operating frequencies allow this GPU to demonstrate excellent results in all synthetic tests.

Compared to the previous G8x, the GPU architecture was improved: TMUs and ROPs were modified. Under certain conditions its texture units can fetch twice as much data as those of the G80, and its ROPs support a new compression technique that improves the efficiency of video memory utilization. In many cases, these features and the higher frequencies help the GeForce 9600 GT perform on a par with the GeForce 8800 GT, which has 75% more ALUs and TMUs. Sometimes the new card is even faster! This graphics card has practically no weaknesses: it's well balanced, with enough execution units, a wide memory bus (for this price segment), and high memory bandwidth. So we can hope that the GeForce 9600 GT will demonstrate excellent results in games and become a powerful competitor to AMD solutions.

The next part of the article will contain the most important tests of the new NVIDIA card, its performance in modern games, which should confirm the conclusions drawn from the synthetic tests. These results should be much more interesting, because game performance almost always depends more on texel and fill rates than on the ALU and geometry performance that our synthetic tests stress...

Part 3: Performance in games

PSU was kindly provided by TAGAN,
Dell 3007WFP monitor was kindly provided by NVIDIA.


Andrey Vorobiev (anvakams@ixbt.com),
Alexey Berillo (sbe@ixbt.com)
February 26, 2008

Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.