Using the following cards as examples:
BFG GeForce 9600 GT OC 512MB PCI-E
ECS GeForce 9600 GT Accelero Edition 512MB PCI-E
Forsa GeForce 9600 GT 512MB PCI-E
Galaxy GeForce 9600 GT Overclocked 512MB PCI-E
Point Of View GeForce 9600 GT 512MB PCI-E
Zotac GeForce 9600 GT AMP! Edition 512MB PCI-E
In our opinion, the new chip is definitely a success. It's an excellent product. But let's not put the cart before the horse.
Part 1: Theory and architecture
Much time has passed since the launch of the Mid-End NVIDIA GeForce 8600 GTS (G84). Unlike the top solution (G80), it had very few ALUs and TMUs, so it failed to provide the expected performance level, and the gap between the GeForce 8800 GTX and the GeForce 8600 GTS was too wide. Later on, AMD and NVIDIA launched graphics cards of a higher level: the GeForce 8800 GT and the RADEON HD 3870. But AMD also offered the lower-end HD 3850 to compete with the GeForce 8600 GTS. A better fabrication process and a much later launch gave AMD a performance advantage, and the HD 3850 was much faster than the GeForce 8600 GTS in many applications.
And now NVIDIA announces the G94 chip, based on the overhauled G9x unified architecture. The GeForce 9600 GT built on this GPU pushes the GeForce 8600 GTS down the price line and ranks between the 8800 GT and the 8600 GTS. The G94 differs from the G92 only in having fewer unified processors and texture units, and it brings a 256-bit bus into the segment of cards below $200. Thus, the key features of the G94 are a 256-bit memory bus and fewer ALUs and TMUs. Let's examine the new solution from NVIDIA...
Before you read this article, you may want to study the baseline theoretical articles: DX Current, DX Next, and Longhorn. They describe various aspects of modern graphics cards and architectural peculiarities of products from NVIDIA and AMD.
These articles predicted the current situation with GPU architectures, and many assumptions about future solutions were confirmed. Detailed information about the unified architecture of NVIDIA G8x/G9x solutions can be found in the following articles:
As we mentioned in previous articles, G9x GPUs are based on the GeForce 8 (G8x) architecture and enjoy all its advantages: unified shader architecture, full support for the DirectX 10 API, high-quality anisotropic filtering, and CSAA with up to sixteen samples. The new GPUs feature improved units (TMU, ROP, PureVideo HD) and a 65-nm fabrication process, which made it possible to reduce costs and launch such powerful solutions in the Mid-End segment. Let's review the characteristics of the new GPU and the only graphics card currently available:
GeForce 9600 GT
- Codename: G94
- Fabrication process: 65 nm
- 505 million transistors
- Unified architecture with an array of common processors for streaming processing of vertices and pixels, as well as other data
- Hardware support for DirectX 10, including new Shader Model 4.0, geometry generation, and stream output
- 256-bit memory bus, four independent 64-bit controllers
- Core clock: 650 MHz (GeForce 9600 GT)
- ALUs operate at more than double the core frequency (1.625 GHz for the GeForce 9600 GT)
- 64 scalar floating-point ALUs (integer and floating-point formats, support for FP32 according to IEEE 754, MAD+MUL without penalties)
- 32 texture address units, support for FP16 and FP32 components in textures
- 32 bilinear filtering units (as in the G84 and the G92, this gives more bilinear samples, but neither free trilinear filtering nor efficient anisotropic filtering)
- Dynamic branching in pixel and vertex shaders
- 4 wide ROPs (16 pixels) supporting antialiasing with up to 16 samples per pixel, including with an FP16 or FP32 frame buffer. Each unit consists of an array of flexibly configurable ALUs and is responsible for Z generation and comparison, MSAA, and blending. Peak performance of the entire subsystem is up to 64 MSAA samples (+ 64 Z) per cycle, or 128 samples per cycle in Z-only mode
- Multiple render targets (up to 8 buffers)
- All interfaces (2xRAMDAC, 2xDual DVI, HDMI, DisplayPort) are integrated into the chip
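As a rough cross-check of the figures in the list above, a few derived peak numbers can be computed directly. This is only a minimal sketch in Python: the variable names are ours, and counting MAD+MUL as three floating-point operations per ALU per clock is an assumption about how theoretical peaks are usually tallied, not a measured result.

```python
# Figures taken from the G94 list above.
core_clock_mhz = 650
alu_clock_mhz = 1625
alu_count = 64
rop_count = 4
rop_pixels_per_clock = 16        # "4 wide ROPs (16 pixels)"
msaa_samples_per_clock = 64      # peak for the entire ROP subsystem

# Shader clock expressed as a multiple of the core clock.
alu_multiplier = alu_clock_mhz / core_clock_mhz

# Assumption: a MAD counts as 2 operations and the extra MUL as 1.
flops_per_alu_per_clock = 3
peak_gflops = alu_count * alu_clock_mhz * flops_per_alu_per_clock / 1000

# How the ROP totals break down per unit and per pixel.
pixels_per_rop = rop_pixels_per_clock // rop_count
samples_per_pixel_at_peak = msaa_samples_per_clock // rop_pixels_per_clock

print(f"ALU clock multiplier: {alu_multiplier}x")
print(f"Theoretical shader peak: {peak_gflops} GFLOPS")
print(f"Pixels per ROP per clock: {pixels_per_rop}")
print(f"MSAA samples per pixel at peak rate: {samples_per_pixel_at_peak}")
```

These are theoretical ceilings only; real shaders rarely keep the extra MUL busy on every clock.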
Specifications of the reference GeForce 9600 GT
- Core clock: 650 MHz
- Frequency of unified processors: 1625 MHz
- Unified processors: 64
- 32 texture units, 16 blending units
- Effective memory clock: 1.8 GHz (2*900 MHz)
- Memory type: GDDR3
- Memory: 512 MB
- Memory bandwidth: 57.6 GB/sec
- Maximum theoretical fill rate: 10.4 gigapixel per second
- Theoretical texture sampling rate: 20.8 gigatexel per second
- 2 x DVI-I Dual Link, 2560x1600 video output
- SLI connector
- PCI Express 2.0
- TV-Out, HDTV-Out, support for HDMI and DisplayPort with HDCP
- Power consumption: up to 95 W
- Recommended price: $169-$189
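The bandwidth and fill rate figures in the list above follow from the clocks, bus width, and unit counts. The short Python sketch below reproduces that arithmetic; the variable names are ours, and the results are theoretical peaks rather than measurements.

```python
# Reference GeForce 9600 GT figures from the specification list above.
bus_width_bits = 256
effective_memory_clock_mhz = 1800   # 2 * 900 MHz (GDDR3, double data rate)
core_clock_mhz = 650
blending_units = 16                 # pixels written per core clock
texture_units = 32                  # bilinear texels per core clock

# Bytes per transfer times transfers per second.
bandwidth_gb_s = bus_width_bits / 8 * effective_memory_clock_mhz / 1000

# One pixel per blending unit and one texel per texture unit, per core clock.
fill_rate_gpix_s = blending_units * core_clock_mhz / 1000
texel_rate_gtex_s = texture_units * core_clock_mhz / 1000

print(f"Memory bandwidth: {bandwidth_gb_s:.1f} GB/s")
print(f"Fill rate: {fill_rate_gpix_s:.1f} gigapixels/s")
print(f"Texture sampling rate: {texel_rate_gtex_s:.1f} gigatexels/s")
```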
The new Mid-End graphics card from NVIDIA differs greatly from the previous GeForce 8600 GTS: it has twice as many execution units (ALUs, ROPs, and TMUs). Its shader frequencies are also a tad higher. Video memory bandwidth has grown almost twofold as well. That's the effect of the 256-bit bus, which appears in this price segment for the first time in NVIDIA cards. As a result, the new GeForce 9600 GT is almost twice as powerful in all parameters: shader execution speed, texture lookups, fill rate, and memory bandwidth.
It's also important that the card comes with 512 MB of on-board video memory instead of 256 MB. According to our research, 256 MB is apparently insufficient for modern games. That may be the reason why NVIDIA decided to manufacture 512-MB cards, the golden mean for modern games, which place high demands on video memory size and use up to approximately 500-600 MB.
From the architectural point of view, the G94 differs from the G92 only in quantitative characteristics: it has fewer execution units. There are not that many differences from the G8x either. As we wrote in previous articles, the G9x family of GPUs is a modified G8x family, moved to a new fabrication process with minor architectural changes. The new Mid-End GPU has 4 large shader units (64 ALUs) and 32 texture units, as well as four wide ROPs.
So, there are not many architectural changes in this GPU, and we've described almost all of them. Everything written in reviews of previous solutions still holds. Here is a basic diagram of the G94:
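To put rough numbers on this comparison, the sketch below computes the ratios between the two cards. The GeForce 9600 GT values come from the lists above; the GeForce 8600 GTS values (32 ALUs, 16 TMUs, 8 ROP pixels per clock, 675 MHz core, 1450 MHz shader clock, 128-bit bus, 2000 MHz effective memory clock) are not given in this article and are assumed here from NVIDIA's published reference specifications.

```python
# GeForce 9600 GT: figures from the specification lists above.
gf9600gt = {"alus": 64, "tmus": 32, "rop_pixels": 16,
            "core_mhz": 650, "shader_mhz": 1625,
            "bus_bits": 256, "mem_mhz": 1800}

# GeForce 8600 GTS: assumed reference specifications (not from this article).
gf8600gts = {"alus": 32, "tmus": 16, "rop_pixels": 8,
             "core_mhz": 675, "shader_mhz": 1450,
             "bus_bits": 128, "mem_mhz": 2000}


def derived(card):
    """Return theoretical peak rates in arbitrary but consistent units."""
    return {
        "shader_rate": card["alus"] * card["shader_mhz"],
        "texel_rate": card["tmus"] * card["core_mhz"],
        "fill_rate": card["rop_pixels"] * card["core_mhz"],
        "bandwidth": card["bus_bits"] / 8 * card["mem_mhz"],
    }


new, old = derived(gf9600gt), derived(gf8600gts)
ratios = {name: round(new[name] / old[name], 2) for name in new}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the GeForce 8600 GTS")
```

Under these assumptions, every ratio lands near two, with bandwidth slightly lower because the older card uses faster memory on its narrower bus.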
Let's analyze some of the architectural changes in the G9x one more time. Texture units in the G94 are exactly the same as those in the G84/G86 and the G92. They can look up twice as many bilinear filtered texture samples as the G80. But 32 texture units in the GeForce 9600 GT will not work faster in real applications than 32 units in the GeForce 8800 GTX only because of higher GPU frequencies. It may happen only with disabled trilinear and anisotropic filtering, which is a rare situation found only in algorithms that use unfiltered lookups, e.g. parallax mapping.
According to NVIDIA, another advantage of the G9x, and the GeForce 9600 GT in particular, is a new compression technology implemented in ROPs. It's 15% more efficient than the one used in previous GPUs. Perhaps this is the result of those architectural modifications in the G9x which, as described in previous articles, should make the 256-bit memory bus more efficient relative to 320/384-bit ones. There won't be a big difference in real applications, though. Even according to NVIDIA, the new ROPs are just 5% faster in most cases.
The number of transistors in the GPU is quite large, which reflects the changes that make the G9x architecture more complex. Such complexity can probably be explained by the integrated NVIO, which used to be a separate chip, by more complex TMUs and ROPs, and by other hidden modifications such as changed cache sizes.
The G94 is equipped with the same second-generation integrated video processor that is used in the G84/G86 and the G92, notable for improved PureVideo HD support. It almost completely offloads the CPU when decoding the most popular types of video data, including H.264, VC-1, and MPEG-2, at resolutions up to 1920x1080 and bitrates of 30-40 Mbps. Video is decoded completely on the hardware level. Although NVIDIA solutions don't decode VC-1 as efficiently as H.264, and a small part of the process still uses the CPU, this feature makes it possible to play all existing HD DVD and Blu-Ray discs even on not very powerful computers. You can read about the second-gen video processor in our reviews of the G84/G86 and the G92. Links to these articles are published in the beginning of the review.
Let's analyze the software improvements of PureVideo HD in the GeForce 9600 GT. The latest innovations in PureVideo HD include dual-thread decoding and dynamic adjustment of contrast and saturation. These changes are not exclusive to the GeForce 9600 GT: ForceWare 174 and higher will introduce these features to all GPUs with PureVideo HD. Along with the card under review, this list includes the GeForce 8600 GT/GTS, GeForce 8800 GT, and GeForce 8800 GTS 512.
Dynamic contrast adjustment is often used in consumer electronics (TV sets and video players). It can improve the image quality in videos shot with a nonoptimal combination of exposure and aperture. When each frame is decoded, this function analyzes its histogram. If a frame has bad contrast, the histogram is recalculated and applied to the image. Here is an example (original image is on the left, processed image is on the right):
The same applies to dynamic saturation improvement in PureVideo HD. Consumer electronics have been using some improvement algorithms for a long time already, unlike computer monitors, which reproduce everything as is (so the image is often dull). Automatic color balance for each new frame improves the image for human eyes:
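The two per-frame adjustments described above can be illustrated with a short sketch. This is not NVIDIA's algorithm, whose details are not published here; it is a generic histogram-based contrast stretch and a simple saturation boost. The function names, the percentile thresholds, and the gain value are all hypothetical choices made for illustration.

```python
import colorsys


def stretch_contrast(luma, low_pct=0.01, high_pct=0.99):
    """Remap 8-bit luma values so the chosen histogram percentiles span 0..255."""
    hist = [0] * 256
    for value in luma:
        hist[value] += 1

    total = len(luma)
    cumulative = 0
    low, high = 0, 255
    found_low = False
    # Walk the cumulative histogram to find the dark and bright cut-off levels.
    for level, count in enumerate(hist):
        cumulative += count
        if not found_low and cumulative >= low_pct * total:
            low, found_low = level, True
        if cumulative >= high_pct * total:
            high = level
            break

    if high <= low:
        return list(luma)  # flat frame: nothing to stretch

    scale = 255 / (high - low)
    return [min(255, max(0, round((v - low) * scale))) for v in luma]


def boost_saturation(rgb, gain=1.2):
    """Scale the saturation of one 8-bit RGB pixel, keeping hue and brightness."""
    r, g, b = (channel / 255 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * gain)
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


# A low-contrast frame: all luma values are squeezed into 100..140.
dull_frame = [100, 110, 120, 130, 140]
print(stretch_contrast(dull_frame))
print(boost_saturation((200, 100, 100)))
```

A real video pipeline would also smooth these parameters over time so that brightness and color do not flicker from frame to frame.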
Dual-thread decoding accelerates decoding and post processing of two different video data streams simultaneously. It may come in handy in such modes as "picture in picture", which are used in some Blu-Ray and HD DVD discs (for example, the second video stream may show a producer giving his or her comments on scenes playing in the main window). Such features are used in WAR and Resident Evil: Extinction.
Another interesting new feature in the latest version of PureVideo HD is support for the Aero interface in Windows Vista during hardware-accelerated video playback in a window. This was not possible in previous versions. It's not a very important feature, but it is still nice to have.
Support for external interfaces
Support for external interfaces in the GeForce 9600 GT is identical to that in the GeForce 8800 GT, except for the integrated support for DisplayPort. An additional NVIO chip for external interfaces in the GeForce 8800 has been integrated into the G94.
Reference GeForce 9600 GT cards come with two Dual Link DVIs with HDCP support. HDMI and DisplayPort support is fully implemented on the hardware level, and NVIDIA partners can add these ports in non-reference graphics cards. According to NVIDIA, unlike the G92, DisplayPort support is also integrated into the GPU, so there is no need for external transmitters. In fact, HDMI and DisplayPort connectors are not mandatory. They can be replaced with adapters from DVI to HDMI or DisplayPort, which are sometimes bundled with modern graphics cards.
So, we've briefly covered all architectural features of the GeForce 9600 GT. The next part of this article will be devoted to synthetic tests. We'll see how the new Mid-End card from NVIDIA performs in stress tests.