NVIDIA GeForce 6200 TurboCache 16/32MB (128MB) (NV44)

GeForce 6200 TC official specification

Codename of the chip is NV44
110nm process (TSMC)
77 million transistors
FC packaging (flip-chip, without a metal cap)
64-bit dual-channel memory interface
Up to 64 MB DDR/GDDR-2/GDDR-3
PCI Express x16 bus interface integrated into the chip
Interface conversion to AGP 8x using a bidirectional PCI Express<->AGP HSI bridge
Advanced capabilities for addressing system memory via PCI Express to store the frame buffer, textures, and other data traditionally stored in local memory.
4 pixel processors, each with a texture unit supporting arbitrary filtering of floating point and integer textures (anisotropy up to 16x).
3 vertex processors, each with a texture unit that does not filter sampled values (point sampling)
Calculation, blending, and writing of up to 2 full (color, depth, stencil buffer) pixels per clock
Calculation and writing of up to 4 values of depth and stencil buffer per clock (no operations with color)
Support for a "double-sided" stencil buffer
Support for special geometry rendering optimizations that accelerate stencil-buffer-based shadow algorithms (the so-called UltraShadow II technology), which are used particularly widely in the Doom III engine.
Everything necessary to support Pixel and Vertex Shaders 3.0, including dynamic branching in pixel and vertex processors, vertex texture fetch, etc.
Floating point filtering of textures
Floating point frame buffer (including blending operations)
MRT
2 RAMDAC 400 MHz
2 DVI interfaces (require interface chips)
TV-Out and TV-In (require interface chips)
Programmable streaming video processor (for encoding, decoding, and video postprocessing purposes)
2D accelerator supporting all GDI+ functions
Integrated thermal and power monitoring.

Reference GeForce 6200 TC-16 specification

Core frequency 350 MHz
Effective memory frequency 700 MHz (2*350 MHz)
32-bit memory bus
GDDR memory type
16 MB of memory
2.8 GB/sec Memory Bandwidth
Theoretical fill rate of 700 megapixel/sec.
Theoretical texture fetch speed 1.4 gigatexel/sec
1 x VGA (D-Sub) and 1 x DVI-I connector
TV-Out
PCI-Express card does not require an additional power connector

Reference GeForce 6200 TC-32 specification

Core frequency 350 MHz
Effective memory frequency 700 MHz (2*350 MHz)
64-bit memory bus
GDDR memory type
32 MB of memory
5.6 GB/sec Memory Bandwidth
Theoretical fill rate of 700 megapixel/sec.
Theoretical texture fetch speed 1.4 gigatexel/sec
1 x VGA (D-Sub) and 1 x DVI-I connector
TV-Out
PCI-Express card does not require an additional power connector
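
The reference figures above follow from simple arithmetic, and a minimal sketch makes the relationships explicit. It assumes 1 GB = 10^9 bytes, takes the 2 full pixels per clock and the 4 texture units from the chip specification, and assumes one texture fetch per unit per clock. The function names are ours, purely for illustration.

```python
# Back-of-the-envelope check of the reference specifications.
# Function names are illustrative; 1 GB is taken as 10**9 bytes.

CORE_MHZ = 350           # core frequency from the reference specification
EFFECTIVE_MEM_MHZ = 700  # effective (DDR) memory frequency, 2 * 350 MHz


def memory_bandwidth_gb_s(bus_width_bits, effective_mhz):
    """Peak local memory bandwidth in GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9


def rate_per_second(units_per_clock, core_mhz):
    """Peak throughput of a unit producing a fixed number of results per clock."""
    return units_per_clock * core_mhz * 1e6


tc16_bandwidth = memory_bandwidth_gb_s(32, EFFECTIVE_MEM_MHZ)  # 32-bit bus
tc32_bandwidth = memory_bandwidth_gb_s(64, EFFECTIVE_MEM_MHZ)  # 64-bit bus
fill_rate = rate_per_second(2, CORE_MHZ)   # 2 full pixels written per clock
texel_rate = rate_per_second(4, CORE_MHZ)  # 4 texture units, assumed 1 fetch per clock each

print(f"TC-16: {tc16_bandwidth:.1f} GB/s, TC-32: {tc32_bandwidth:.1f} GB/s")
print(f"Fill rate: {fill_rate / 1e6:.0f} Mpixel/s, "
      f"texel rate: {texel_rate / 1e9:.1f} Gtexel/s")
```

Halving the bus width halves the bandwidth at the same memory clock, which is the whole difference between the TC-16 and TC-32 rows above.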

Chip architecture

There are no global architectural differences between NV44 and NV40/NV43; there are just some innovations in the pixel pipeline aimed at effective operations with system memory as a stencil buffer, but we shall cover this topic later.

On the whole NV44 is a scaled (reduced number of vertex and pixel processors and memory controller channels) solution based on the NV40 architecture. The differences are quantitative (bold elements on the diagram) but not qualitative – from the architectural point of view the chip remains practically unchanged.

So, we have 3 vertex processors, as in NV43, and one (instead of two) independent pixel processor operating on one quad (a 2x2 pixel fragment). PCI Express is native, as in NV43 (that is, implemented in the chip itself). There seem to be no plans for an AGP 8x card based on this chip, at least in its TC (TurboCache) modification, because the very idea of using system memory effectively for rendering requires adequate graphics bus throughput in both directions. Even if cards with a 64 MB buffer and an additional bidirectional PCI-E <-> AGP bridge (dotted line) do appear, their price will be outrageously high for the low-end segment, and they won't be able to compete there with native AGP 8x solutions.

Besides, note an important limiting factor: the two-channel controller and the 64-bit (!) memory bus. We'll analyze and discuss this later on. Judging from the chip package and the pin count, 64 bits is a hardware limit for NV44, and there will be no 128-bit cards.

The architecture of vertex and pixel processors and of the video processor remained the same – these elements were described in detail in our review of GeForce 6800 Ultra. And now let's consider potential quantitative and qualitative changes relative to NV43:

Considerations about what and how has been cut down

Obviously, even taking into account that system memory is used via PCI Express, all 6200 TC models will suffer first and foremost from insufficient memory bandwidth, both system and local. The peak fill rate of the 6200 TC is 700 megapixels/sec, while the local memory bandwidth is less than half of what is needed to sustain it (and that is without taking into account potentially reduced caches and the two-channel memory controller).
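
To put a rough number on that gap, here is a sketch of the write traffic the chip would generate at peak fill rate. The per-pixel sizes (32-bit color, 32-bit combined depth and stencil) are our assumption, and reads for depth testing, blending, and texturing are ignored, so the real demand would be higher still.

```python
# Rough estimate of frame buffer write traffic at peak fill rate.
# ASSUMPTIONS (ours, not NVIDIA's): 32-bit color and 32-bit depth+stencil
# per pixel, writes only, no compression and no caching effects.

PEAK_FILL_RATE = 700e6    # pixels per second, from the specification
BYTES_COLOR = 4           # assumed 32-bit color
BYTES_DEPTH_STENCIL = 4   # assumed 24-bit depth + 8-bit stencil

LOCAL_BANDWIDTH_GB_S = {"TC-16": 2.8, "TC-32": 5.6}  # from the specification


def write_demand_gb_s(pixels_per_second, bytes_per_pixel):
    """Bandwidth needed to write every pixel once, in GB/s (10**9 bytes)."""
    return pixels_per_second * bytes_per_pixel / 1e9


demand = write_demand_gb_s(PEAK_FILL_RATE, BYTES_COLOR + BYTES_DEPTH_STENCIL)

for card, bandwidth in LOCAL_BANDWIDTH_GB_S.items():
    ratio = demand / bandwidth
    print(f"{card}: needs {demand:.1f} GB/s, has {bandwidth:.1f} GB/s "
          f"(demand is {ratio:.1f}x the local bandwidth)")
```

Under these assumptions the TC-16 would need twice its local bandwidth for writes alone, before any read traffic is counted.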

Thus, we can predict that everything will come down to two parameters: system memory performance and its addressing via PCI Express (which depends on the chipset and other system parameters), and the complexity of the executed shaders (complex shaders with a great number of instructions will be less demanding on memory and will probably manage to reveal the full computing and fill rate potential of the pixel processors).

The weak point of the 6200 TC series, as with the 6600 series, will be high resolutions and full-screen antialiasing modes, especially in simple applications. The strong point will be programs with long, complex shaders and anisotropic filtering without simultaneous MSAA. This tendency will be much more pronounced than in the 6600; in other words, these accelerators depend strongly on the application. To all appearances, their results relative to the competing series (for example, the X300 from ATI) will vary severalfold depending on the test, resolution, and settings. In some applications the 6200 TC will look on a par or even better, and in others it will be noticeably outscored.

Later on we shall verify this assumption with tests.

It's hard to judge how justified 64-bit and even 32-bit memory buses are. On the one hand, this move cuts the price of the chip package and reduces the number of rejected chips; on the other hand, the price difference between PCBs for 64-bit and 32-bit buses is even smaller than between those for 256-bit and 128-bit! And this difference is more than offset by the price difference between the usually inexpensive 200 MHz DDR and the still expensive 350 MHz GDDR memory. Again, as with the 6600, from the card manufacturers' point of view a solution with a 128-bit bus and cheaper DDR memory (as in the X300) would be more convenient, at least if they had a choice. But from the point of view of NVIDIA, which manufactures the chips and very often sells them together with memory, the 64-bit or even 32-bit solution with GDDR is more profitable.

To all appearances, the vertex and pixel processors in NV44 remain the same, apart from the declared revisions for effective system memory addressing from the texturing and blending units. However, these are only words. In reality there are reasons to believe that these features are not that fundamental and are implemented at the level of the common cache manager and crossbar. They were included in the NV4X family from the start; there was just no point in using them (at the driver level) in senior cards with faster and larger local memory. There is also no point in this technology on cards with the AGP interface, which would inevitably become the bottleneck (because of its low write speed into system memory, comparable to PCI).

That's how NVIDIA explains the differences in its articles:

... regular architecture and NV44 with TurboCache:

You can see the difference due to data feed for textures and the additional way to write frame data (blending) into the system memory. However, the initial chip architecture with a crossbar treating the graphics bus practically as the fifth channel of the memory controller may have been capable of this from the very beginning (starting from NV40 or even earlier). And it's difficult to say whether NV44 actually has architectural changes in writing and reading data or these features are implemented on the level of drivers.

On the other hand, we won't deny that the optimal solution would be a paging MMU with dynamic swapping of data from system to local memory, with the latter treated as an L3 cache. With such an architecture everything falls into place: the efficiency will be noticeably higher than with discrete allocation of objects, and minor hardware revisions will be justified. Moreover, once tested, this paging unit can be reused in future architectures, which to all appearances will have to include such units (because of DX Next requirements).

In the first case everything depends on the driver: it determines which textures, geometry data, and frame buffers to place where. In the case of paging, everything may be determined by hardware alone, or some objects can be marked as "not cacheable" (for example, geometry data or some frame buffers) if copying them into the local memory of the video card would provide no special gain and would in fact decrease the final efficiency. This distribution looks like a very difficult task, and it actually requires constant (per-frame) analysis of the chip's internal performance counters. It must be done separately for the various objects in memory, because it's very difficult to predict the influence of access patterns in different cases. In the peak case, especially with the TC-16, system memory can be read even faster than local memory. But system memory latencies and its real performance (DDR200-300 x 64 bit, less what the CPU needs and what is lost to arbitration between competing accesses) will still result in lower efficiency than that of the 32-bit 350 MHz local option.
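
The peak case mentioned above can be checked against the nominal PCI Express numbers. The sketch below assumes a first-generation x16 link (2.5 GT/s per lane with 8b/10b encoding) and ignores packet overhead, latency, and system memory contention, so it gives an upper bound rather than a prediction.

```python
# Nominal one-way PCI Express x16 bandwidth versus local memory bandwidth.
# ASSUMPTIONS: first-generation PCI Express (2.5 GT/s per lane, 8b/10b
# encoding); protocol overhead, latency, and CPU contention are ignored.

RAW_BITS_PER_LANE = 2.5e9     # raw line rate per lane, bits per second
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 payload bits per 10 line bits

LOCAL_BANDWIDTH_GB_S = {"TC-16": 2.8, "TC-32": 5.6}  # from the specification


def pcie_one_way_gb_s(lanes):
    """Nominal payload bandwidth in one direction, in GB/s (10**9 bytes)."""
    payload_bits = lanes * RAW_BITS_PER_LANE * ENCODING_EFFICIENCY
    return payload_bits / 8 / 1e9


link = pcie_one_way_gb_s(16)

for card, local in LOCAL_BANDWIDTH_GB_S.items():
    verdict = "faster than" if link > local else "slower than"
    print(f"PCI Express x16 ({link:.1f} GB/s one way) is nominally "
          f"{verdict} {card} local memory ({local:.1f} GB/s)")
```

On paper the link outruns the 32-bit local memory of the TC-16 but not the 64-bit memory of the TC-32; how much of that survives real system memory behavior is exactly the open question.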

So, only practice will show how justified NVIDIA's solution was in case of 32-bit TC-16 and in case of 64-bit TC-32. But one thing we can say for sure – everything will be actually up to the price and applications popular among users. If the card costs $40 – that's one story, if $70 – another, because competing 128-bit solutions will strangle TurboCache with their 128-bit memory, which is more efficient anyway. Applications will also make themselves felt – intensive shaders or UltraShadow as in Doom 3 – and the 6200 TC goes great guns; simple shaders and full screen antialiasing – and it fails.

We shall check all these assumptions and talk about the prices further.

Video Cards

Both cards are completely identical in PCB layout, the only difference is that the 16 MB card does not have one of the two memory chips.

NVIDIA GeForce 6200 TurboCache 16/32 MB (128MB) (NV44)

The cards have a PCI-Express x16 interface and 16 MB or 32 MB of DDR SDRAM in one or two chips located on the front and back sides of the PCB. The memory chips are from Samsung, with a 2.8 ns access time, which corresponds to the 350 (700) MHz at which the memory operates. GPU frequency: 350 MHz. Memory bus: 32-bit on the 16 MB card and 64-bit on the 32 MB card. 4 pixel pipelines and 3 vertex pipelines.
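
The link between the access time printed on a memory chip and its rated clock is a simple reciprocal. A quick sketch, with a helper name of our own choosing:

```python
# Convert a memory chip's rated cycle time into its rated clock frequency.
# The helper name is illustrative only.


def rated_clock_mhz(cycle_time_ns):
    """Rated clock in MHz for a given cycle time in nanoseconds."""
    return 1000.0 / cycle_time_ns


for label, cycle_ns in (("2.8 ns BGA chip", 2.8), ("5 ns TSOP chip", 5.0)):
    clock = rated_clock_mhz(cycle_ns)
    # DDR memory transfers data twice per clock, hence the doubled figure.
    print(f"{label}: about {clock:.0f} MHz ({2 * clock:.0f} MHz effective)")
```

A 2.8 ns chip is rated for roughly 357 MHz, so the 350 (700) MHz on these cards is within its rating, while the 5 ns chips mentioned later in this review would run at 200 MHz.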

Comparison with the reference design, front view
NVIDIA GeForce 6200 TurboCache 16/32 MB (128MB) (NV44)	Reference card NVIDIA GeForce 6200 128MB

Comparison with the reference design, back view
NVIDIA GeForce 6200 TurboCache 16/32 MB (128MB) (NV44)	Reference card NVIDIA GeForce 6200 128MB

We can see that the layout has been vastly modified. Personally, I am perplexed why such a primitive card with two memory chips has such a complex PCB. We still remember old cards with the same memory capacity, which were significantly simpler. Yes, sure, the frequencies are high. But why on Earth cut the bus down so much and raise the frequency? In my personal opinion, this approach seems unreasonable and even silly. Only an amateur can install ONE(!) but expensive BGA 2.8ns chip and an absolutely horrible bus (32 bit) instead of installing 4 or 8 old chips (TSOP, 5ns, for example), which will be CHEAPER, and leaving the bus alone. So, I have a lot of questions left unanswered after I examined the design of this card.

The cooling system is primitive: just a heatsink, so we shall skip it.

Let's have a look at the graphics processor.

You may notice that the chip is a tad smaller than its predecessors (for example, GeForce 6200/6600, NV43). I mean the substrate, not the die itself, although that has shrunk too (naturally, since the number of pipelines has been halved). This indicates that NV44 has hardware support for a memory bus no wider than 64 bits. And another question of mine: why such a complex PCB for a 64-bit bus?

Installation and Drivers

Testbed configurations:

Athlon64 based computer
- CPU: AMD Athlon64 2400 MHz (218MHz x 11; L2=512K) (~3800+);
- motherboard: ASUS A8N SLI Deluxe based on NVIDIA nForce4 SLI; ONE video card mode!
- RAM: 1 GB DDR SDRAM 400MHz
- HDD: WD Caviar SE WD1600JD 160GB SATA
Operating system: Windows XP SP2; DirectX 9.0c
Monitors: ViewSonic P810 (21") and Mitsubishi Diamond Pro 2070sb (21").
ATI drivers 6.497 (CATALYST 4.12); NVIDIA drivers 71.20.

VSync is disabled.

Both companies have enabled trilinear filtering optimizations in their drivers by default.

It should be noted that Alexei Nickolaichuk, the author of RivaTuner, quickly added NV44 support to his utility. It turned out that the chip has two bits in its registers responsible for each quad. That is, it would be logical for the chip to have 8 pipelines but 2 ROPs. This is certainly beyond any logic and common sense (besides, fill rate tests demonstrated that there are actually 4 pipelines). That's why the only sound conclusion is that NV44 operates on pixel pairs rather than quads. We are going to investigate this issue.

Note that the only way to learn how much memory is actually installed on the video card is RivaTuner :), because the drivers report the total capacity: local memory plus part of the main RAM.

Test results

Before giving a brief evaluation of 2D, I will repeat that at present there is NO valid method for objective evaluation of this parameter due to the following reasons:

2D quality in most modern 3D accelerators dramatically depends on a specific sample, and it's impossible to evaluate all the cards.
2D quality depends not only on the video card, but also on the monitor and a cable.
A great impact on this parameter has been recently demonstrated by monitor-card pairs. That is there are monitors, which just won't "work" with specific video cards.

As for our samples under review, paired with the Mitsubishi Diamond Pro 2070sb they demonstrated identically excellent quality at the following resolutions and refresh rates:

NVIDIA GeForce 6200 TurboCache 16/32 MB (128MB) (NV44)

1600x1200x75Hz, 1280x1024x85Hz, 1024x768x100Hz

Test results: performance comparison

We used the following test applications:

Unreal Tournament 2004 v.3225 (Digital Extremes/Epic Games) – Direct3D, Vertex Shaders, Hardware T&L, Dot3, cube texturing, max quality; demo (MOON)
Tomb Raider: Angel of Darkness v.49 (Core Design/Eidos Interactive) – DirectX 9.0, Paris5_4 demo. The tests were conducted with the quality set to maximum, only Depth of Field PS20 was disabled.
Half-Life2 (Valve/Sierra) – DirectX 9.0, demos (ixbt01, ixbt02, ixbt03). The tests were carried out with maximum quality, option -dxlevel 90, presets for video card types removed from dxsupport.cfg.
FarCry 1.3 (Crytek/UbiSoft) – DirectX 9.0, multitexturing, 3 demos from the Research, Pier, and Regulator levels (the game is started with the -DEVMODE option), Very High test settings.
DOOM III (id Software/Activision) – OpenGL, multitexturing, test settings – High Quality (ANIS8x), demo ixbt1 (33MB!). We have a sample batch file to start the game automatically with increased speed and reduced jerking (precaching) d3auto.rar. (DO NOT BE AFRAID of the black screen after the first menu, that's how it should be! It will last 5-10 seconds and then the demo should start)
3DMark05 (FutureMark) – DirectX 9.0, multitexturing, test settings – trilinear,

Unreal Tournament 2004

Test results: Unreal Tournament 2004

GeForce 6200TC 16MB (79USD) versus RADEON X300SE (70USD) – defeat
GeForce 6200TC 32MB (99USD) versus RADEON X300 (90USD) – the same picture.

It's a sad picture so far for the new video cards, which are also more expensive at that (preliminary prices).

TR:AoD, Paris5_4 DEMO

Test results: TRAOD

The same results as in the previous test, failure. Certainly, the new cards look good against a more expensive PCX5750, but this card is already leaving the market pushed by the GF6200/6600.

FarCry, Research

Test results: FarCry Research

The results are still obviously negative. The only exception is that the 32 MB card outscored the X300SE in 1024x768. But the prices do not allow it to compete. The picture would improve only if NVIDIA dropped the prices of the new video cards by 30-40 dollars.

FarCry, Regulator

Test results: FarCry Regulator

GeForce 6200TC 16MB (79USD) versus RADEON X300SE (70USD) – that's the first test where the new cards demonstrate better results, even victory!
GeForce 6200TC 32MB (99USD) versus RADEON X300 (90USD) – the success is even greater!

FarCry, Pier

Test results: FarCry Pier

GeForce 6200TC 16MB (79USD) versus RADEON X300SE (70USD) – defeated again
GeForce 6200TC 32MB (99USD) versus RADEON X300 (90USD) – the results are not that simple here. We can just call it parity with the X300.

Half-Life2: ixbt01 demo

Test results: Half-Life2, ixbt01

GeForce 6200TC 16MB (79USD) versus RADEON X300SE (70USD) – defeat
GeForce 6200TC 32MB (99USD) versus RADEON X300 (90USD) – victory only over the X300SE, but the card was outscored by its competitor.

Half-Life2: ixbt02 demo

Test results: Half-Life2, ixbt02

GeForce 6200TC 16MB (79USD) versus RADEON X300SE (70USD) – still defeated
GeForce 6200TC 32MB (99USD) versus RADEON X300 (90USD) – similar to the previous test

Half-Life2: ixbt03 demo

Test results: Half-Life2, ixbt03

The same picture.

DOOM III

Test results: DOOM III

GeForce 6200TC 16MB (79USD) versus RADEON X300SE (70USD) – we can cautiously announce a victory.
GeForce 6200TC 32MB (99USD) versus RADEON X300 (90USD) – significant success.

At least the new cards managed to demonstrate good results in this game.

The 32 MB video card is obviously victorious, though a cheaper modification with 16 MB is slightly outscored.

Conclusions

The main weak point is memory bandwidth: the two-channel 64-bit controller and writing results to the frame buffer. TurboCache is helpful, but it's not a cure-all. A good 128-bit solution may have a comparable price while being a certain winner in performance.

Complete compliance with NV40 in the set of supported features and in the architecture of the vertex and pixel processors is laudable: there is no need to take NV44 peculiarities into account in programs.

Lesser complexity and consumption played their positive role. The clock frequency is high, and, judging from the technology, it can still be raised much higher in future – the only bottleneck is memory.

I want to believe that such a narrow memory bus and simple package will soon result in lower costs.

At such preliminary prices (79 and 99 USD) the cards are obviously not faring well. The tests demonstrated that they will be in demand only if they are cheaper than their competitors (considerably cheaper, with prices not exceeding $50-70).

In this case the video cards' performance obviously depends very much on the drivers, since TurboCache is implemented by them. That's why their performance may still grow.

Besides, as you can see, TurboCache was of little help to these cards. The objective was good – to reduce the cards' costs, because 128 and 256 MB of memory in a low end segment make up a considerable part of the cost. But what's the result? Higher prices instead of lower ones? Let's hope the prices will settle down.

If the final retail prices for these products are not 79, 99 and 129 USD but 49, 69 and 89, these video cards will have a chance of being in demand; they will even be attractive.

That's why we are not drawing final conclusions, they are impossible so far because everything is up to the prices for the new cards.

In our 3Digest you can find more detailed comparisons of various video cards.

Theoretical materials and reviews of video cards, which concern functional properties of the GPU ATI RADEON X800 (R420)/X850 (R480)/6200tc (nv44)/X700 (RV410) and NVIDIA GeForce 6800 (NV40/45)/6600 (NV43)

Andrey Vorobiev (anvakams@ixbt.com)

Alexander Medvedev (unclesam@ixbt.com)

January 12, 2005
