Installation and Drivers
VSync is disabled.
There is no point in describing the driver settings, because nothing has changed since the GeForce 6800.
To study the features of the GeForce 6600GT, it was very interesting to compare these cards with their senior counterparts based on the GeForce 6800/6800GT, operating at the same frequencies and with the number of pipelines reduced to that of the GeForce 6600GT. That's why we took a Palit GeForce 6800 for this purpose and modified it using RivaTuner:
Thus, having limited the number of pipelines to that of the GeForce 6600GT and having set the frequencies to 325/175 (350) MHz, we can test this 6800-based 6600 emulator and compare the results with those of a GeForce 6600GT operating at 325/350 (700) MHz. The memory frequency of the GeForce 6800 is halved because its memory bus is twice as wide as that of the 6600GT. As a result, the memory bandwidth is the same:
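The arithmetic behind this equality can be sketched as follows. This is only an illustration: the bus widths (256 bits for the GeForce 6800, 128 bits for the GeForce 6600GT) are assumptions taken from the cards' commonly published specifications, not from the text above, and the function name is hypothetical.

```python
def memory_bandwidth_gbs(memory_clock_mhz: float, bus_width_bits: int,
                         transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock=2 models DDR memory, i.e. 175 MHz -> 350 MHz effective.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer / 1e9


# Assumed bus widths: 256 bits (GeForce 6800) and 128 bits (GeForce 6600GT).
geforce_6800_bw = memory_bandwidth_gbs(175, 256)    # 175 (350) MHz
geforce_6600gt_bw = memory_bandwidth_gbs(350, 128)  # 350 (700) MHz

print(f"GeForce 6800 at 175 (350) MHz:   {geforce_6800_bw:.1f} GB/s")
print(f"GeForce 6600GT at 350 (700) MHz: {geforce_6600gt_bw:.1f} GB/s")
```

Under these assumptions both configurations come out at 11.2 GB/s, since halving the clock exactly offsets the doubled bus width.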
But you should bear in mind that the GeForce 6800, as well as some other video cards with which we'll compare the 6600GT, have the AGP interface, and that's why they can be tested only on another testbed (an Athlon64 3400+ based platform). Our preliminary tests of similar video cards on AGP/PCX interfaces, e.g. X800XT, demonstrated that the results obtained on the Athlon64 3400+ and the Pentium4 3600 MHz differ only insignificantly. And when the cards are loaded with AA or AF (or both), the difference disappears.
That's why we compared our cards with RADEON 9800 PRO 128MB and GeForce 6800LE (8 pixel and 4 vertex pipelines) despite the difference in interfaces and platforms. The percentage differences obtained without AA and AF (so-called pure speed) are highlighted in the tables in dark blue, which makes the figures difficult to read. In this symbolic way we marked the most "unfair" comparison, where the platform influence on performance is the greatest. If you wish, you may ignore these results.
Then, not content with indirect comparisons that lacked a PCX-based GeForce 6800, we conducted another comparison with a GeForce 6800GT PCI-E, where we limited the number of pipelines in its core (8/3 as in the 6600) and set its frequencies to 350/250 (500) MHz. Thus, the comparison with the GeForce 6600GT (operating at 350/500 (1000) MHz) is completely justified in terms of core frequency, number of pipelines, and memory bandwidth. Only the difference in memory size and type may have an effect.
Thus, in tables with comparison percentages in gaming and some synthetic tests we introduced several colour notation conventions:
Before giving a brief evaluation of 2D, I will repeat that at present there is NO valid method for objective evaluation of this parameter due to the following reasons:
As for the combination of our sample under review and the Mitsubishi Diamond Pro 2070sb monitor, this card demonstrated excellent quality in the following resolutions and frequencies:
Synthetic tests in D3D RightMark
The version of the synthetic benchmark package D3D RightMark Beta 4 (1050), which we used, and its description are available on the web site http://3d.rightmark.org
A list of video cards:
To detect specific differences between NV40 and NV43 we carried out tests not only with the original 6600 GT (500/500 frequencies) but also with a reference pair of cards: a 6800 GT (350/250) with 8 pixel and 3 vertex pipelines and a 6600 GT (350/500), also with the 8/3 pipeline scheme and equivalent memory bandwidth (16 GB/sec in both cards with these memory frequency settings). Thus, we hope to see the differences caused by various possible quantitative and qualitative modifications introduced in NV43. Besides, for general comparison we publish the results of the 6800 Ultra. It serves as a reference point for absolute comparisons, to find out how much the mainstream solution loses to the expensive high end.
First, let's check whether the claimed characteristics (8 pixels per clock, etc.) conform with reality. So:
Pixel Filling Test
Peak texel rate, FFP mode, for various numbers of textures applied to one pixel:
The theoretical maximum of NV43-500 in this test is 4 gigatexels/sec. In reality we reached 3.4 gigatexels/sec, which unambiguously points to 8 texture units. With one texture the result is lower than with two: here we are held back by insufficient frame buffer bandwidth. After that the dependence is smooth: with each additional texture the fill rate gradually drops. The results of the reference 6800 GT match those of the 6600 starting with two textures. Either this is the effect of the narrower two-channel memory controller in the 6600, or (which is also possible) we are right about its simplified process of blending and writing the results into the frame buffer.
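The reasoning about the number of texture units can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that each texture unit delivers one texel per core clock; the function and constant names are hypothetical.

```python
def peak_texel_rate_gtexels(texture_units: int, core_clock_mhz: float) -> float:
    """Theoretical texel rate in gigatexels/sec, assuming 1 texel per unit per clock."""
    return texture_units * core_clock_mhz / 1000.0


CORE_CLOCK_MHZ = 500      # NV43-500, as in the text
MEASURED_GTEXELS = 3.4    # measured peak from the test

for units in (4, 8):
    peak = peak_texel_rate_gtexels(units, CORE_CLOCK_MHZ)
    feasible = MEASURED_GTEXELS <= peak
    print(f"{units} texture units: peak {peak:.1f} Gtexels/s, "
          f"measured result feasible: {feasible}")

# Share of the 8-unit theoretical peak that was actually reached.
efficiency = MEASURED_GTEXELS / peak_texel_rate_gtexels(8, CORE_CLOCK_MHZ)
print(f"Efficiency against the 8-unit peak: {efficiency:.0%}")
```

A 4-unit chip at 500 MHz could not exceed 2 gigatexels/sec, so the measured 3.4 gigatexels/sec is only consistent with 8 units, at about 85% of the theoretical peak.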
And now – the fill rate of the frame buffer, FFP mode, for various numbers of textures applied to one pixel:
Let's see how the fill rate depends on a shader version.
So, firstly, the fill rate does not depend on shader versions (which perfectly correlates with our idea of the NV40 architecture). And speaking of the fill efficiency, we've got the following picture:
This again unambiguously testifies to a considerable difference in the efficiency of writing data to the frame buffer. We cannot say for certain what is to blame:
But the fact remains: the NV40-based 6800 GT (350/250), which is absolutely identical in memory bandwidth and number of pipelines, performs much better than the NV43-based 6600 GT (350/500) when the number of textures is small (and, consequently, the write load on the frame buffer is at its maximum). That is, the 6600 processes 8 pixels per clock, but writes only 4(!) per clock to the frame buffer. While the pixel shader processes the next values, it writes the remaining 4 pixels. This may mislead somebody into thinking that NV43 has 4 pipelines with 8 texture units. This is wrong. Actually, this measure is quite reasonable, because shaders consist of at least two instructions.
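Why a 4-pixel write limit rarely hurts can be illustrated with a deliberately simplified throughput model. Everything here is an assumption made for illustration: each pipeline is taken to execute one shader instruction per clock, the reduced NV40 is taken to write 8 pixels per clock, and the function name is hypothetical. It is not a description of the real hardware scheduling.

```python
def pixels_per_clock(pipelines: int, writes_per_clock: int,
                     instructions_per_pixel: int) -> float:
    """Sustained pixel output per clock in a simplified two-stage model.

    The shading stage finishes pipelines / instructions_per_pixel pixels per
    clock; the write stage accepts at most writes_per_clock pixels per clock.
    The slower of the two stages sets the overall rate.
    """
    shading_rate = pipelines / instructions_per_pixel
    return min(shading_rate, writes_per_clock)


# Assumed configurations: 8 pipelines each, 4 vs 8 frame buffer writes per clock.
for instructions in (1, 2, 4):
    nv43 = pixels_per_clock(8, 4, instructions)
    nv40 = pixels_per_clock(8, 8, instructions)
    print(f"{instructions} instruction(s) per pixel: "
          f"NV43-like {nv43:.0f}, NV40-like {nv40:.0f} pixels/clock")
```

In this model the two configurations differ only for single-instruction pixels. As soon as a shader takes two or more instructions, the shading stage becomes the limit and the halved write rate costs nothing, which agrees with the gap appearing only at a small number of textures.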
Geometry Processing Speed Test
The simplest shader – maximum throughput for triangles:
The bottleneck here is obviously not the accelerator, but the processor, software, and the platform. Peak geometrical throughput of modern accelerators is more than sufficient.
More complex shader – one simple point light source:
Everything depends on the core frequency here. But the frequency proved sufficient both for the 6800 Ultra and for the 6600 GT: as in the previous test, their results were limited by other factors, such as the processor and the system.
Note the equality of NV43 and NV40 at the 350 core frequency. It just confirms our assumptions about the complete identity of vertex units in these chips.
Let's complicate the task even further:
In more intensive calculations the chips rank strictly according to their clock frequency and number of vertex units, no surprises here. The only figure attracting attention is the high FFP result, which is practically comparable between the 6-unit and 3-unit configurations. Special units accelerating FFP emulation allowed the 6800 Ultra to be limited by the processor even in this test.
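The expected ranking can be estimated with a simple proportionality: vertex throughput scales with core clock times the number of vertex units. The sketch below is only an estimate under that assumption. The 400 MHz core clock for the 6800 Ultra is an assumption based on its commonly published specification (it is not given in the text), and all names are hypothetical.

```python
# (core clock in MHz, number of vertex units) per tested configuration.
CONFIGS = {
    "6600 GT (500/500)": (500, 3),
    "6600 GT (350/500)": (350, 3),
    "6800 GT (350/250), 8/3": (350, 3),
    "6800 Ultra": (400, 6),  # assumed clock, not stated in the text
}


def relative_vertex_throughput(name: str, baseline: str = "6600 GT (500/500)") -> float:
    """Expected throughput relative to the baseline, proportional to clock * units."""
    clock, units = CONFIGS[name]
    base_clock, base_units = CONFIGS[baseline]
    return (clock * units) / (base_clock * base_units)


for name in CONFIGS:
    print(f"{name}: {relative_vertex_throughput(name):.2f}x")
```

Under these assumptions the two 350 MHz cards should tie at 0.70x of the stock 6600 GT, and the 6800 Ultra should lead by a factor of 1.6. A result well outside this pattern, like the FFP figure, points to a limit elsewhere in the system.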
And now the most complicated task, three light sources, for comparison without branching, with static and dynamic control:
First, let's note the equal shader performance in the 2.a profile compilation with dynamic branching and in the 3.0 profile compilation, which is quite expected because dynamic branching is implemented at the hardware level by the same method in both profiles. The only principal difference of 3.0 in this case is the slightly expanded instruction set and texture fetch, whose performance we have not tested yet (this test will be implemented in the next version of RightMark D3D, but we already know that NV4X chips execute this function rather slowly). At the very least, it's good that the latest DX compilation of Shaders 3.0 is optimized and does not lose to 2.x profiles.
On the whole the cards fared strictly according to their clock frequencies and numbers of pipelines, with the only exception of FFP in 6800 Ultra, where the chip demonstrated phenomenal performance and was limited only by the testbed capacity.
Pixel Shaders Test
The first group of shaders (1.1, 1.4 and 2.0) is rather easy to execute in real time:
All the results match the clock frequency and the number of pipelines well, and the results of the chips at the 350 MHz core frequency match perfectly. Thus, we conclude that the pixel processors in NV40 and NV43 are identical. As expected, due to its high core frequency the 6600 GT fares well in pixel shaders even against the much more expensive 6800 Ultra. We can only be glad for the buyers of mainstream solutions: they will be able to play the latest games, though in moderate resolutions.
And now let's have a look at complex shaders:
The picture is the same. Interestingly, the 6600 is always faster (though not by much) than the 6800 GT in a similar configuration. There seem to be some minor differences related to pixel pipeline optimizations, which added these several percent of constant advantage. However, the difference is so small that it may be due to the lower GDDR3 latencies.
Thus, concerning pixel shaders:
Firstly, peak efficiency (without textures and with textures) depending on the geometry complexity:
The HSR algorithm is very similar and the results match almost everywhere. However, without textures NV43 demonstrates results different from NV40, especially in scenes with a high overdraw factor. What's the matter? Let's see the absolute figures:
You can see that 6800 GT demonstrated a little higher speed of skipping invisible blocks and writing results, where the texturing was not enabled. It looks more like the effect of reduced write efficiency of the frame buffer (which we noticed during fill rate tests) than the reduced efficiency of the HSR subsystem in NV43.
Point Sprites Test
You can clearly see more effective operation of NV40 with blending and writing to frame buffer – with all sprite sizes it outscores NV43 at the equal frequency.
With large sprite sizes NV40 outscores NV43, again due to the higher maximum speed of blending and writing to the frame buffer. With average sizes the results are identical: the limiting factor is the calculation of colour, lighting, and texture fetch. Here NV40 and NV43 are on a par.
And again we can see that MSAA 4x in NV40 is considerably more effective. This is the result of the two-channel memory controller in NV43 and/or its simplified unit for rendering, writing, and post-processing of results. As we have already assumed above, the reason may lie in the complete (or partial?) lack of screen buffer compression algorithms. Perhaps they are disabled in the driver. In our coming reviews we'll try to examine this issue more thoroughly and find an answer.