iXBT Labs - Computer Hardware in Detail


NVIDIA GeForce 6600GT and 6600 (NV43):
Part 1 - Performance.








Contents

  1. Official specifications
  2. Architecture
  3. Video cards' features
  4. Testbed configurations, benchmarks, 2D quality
  5. Synthetic tests in D3D RightMark
  6. Synthetic tests in 3DMark03: FillRate Multitexturing
  7. Synthetic tests in 3DMark03: Vertex Shaders
  8. Synthetic tests in 3DMark03: Pixel Shaders
  9. Test results: Quake3 ARENA
  10. Test results: Serious Sam: The Second Encounter
  11. Test results: Return to Castle Wolfenstein
  12. Test results: Code Creatures DEMO
  13. Test results: Unreal Tournament 2003
  14. Test results: Unreal II: The Awakening
  15. Test results: RightMark 3D
  16. Test results: TRAOD
  17. Test results: FarCry
  18. Test results: Call of Duty
  19. Test results: HALO: Combat Evolved
  20. Test results: Half-Life 2 (beta)
  21. Test results: Splinter Cell
  22. Test results: DOOM III
  23. Test results: 3DMark03 Game1
  24. Test results: 3DMark03 Game2
  25. Test results: 3DMark03 Game3
  26. Test results: 3DMark03 Game4
  27. Test results: 3DMark03 MARKS
  28. Conclusions



Installation and Drivers

Testbed configurations:

  • Pentium4 3200 MHz (Prescott, overclocked to 3600 MHz) based computer
    • Intel Pentium4 3600 MHz CPU (225MHz x 16; L2=1024K, LGA775); Hyper-Threading enabled
    • ABIT AA8 DuraMAX mainboard based on i925X
    • 1 GB DDR2 SDRAM 300MHz
    • WD Caviar SE WD1600JD 160GB SATA HDD

  • Athlon 64 3400+ based computer
    • AMD Athlon 64 3200+ (L2=1024K) CPU
    • ASUS K8V SE Deluxe mainboard based on VIA K8T800
    • 1 GB DDR SDRAM PC3200
    • Seagate Barracuda 7200.7 80GB SATA HDD

  • Operating system – Windows XP SP2; DirectX 9.0c
  • Monitors: ViewSonic P810 (21") and Mitsubishi Diamond Pro 2070sb (21").
  • ATI drivers v6.476 (CATALYST 4.9); NVIDIA drivers v65.76.

VSync is disabled.




There is no point in describing the driver settings, because nothing has changed since the GeForce 6800.

ATTENTION!

To study the features of the GeForce 6600GT, we found it very interesting to compare these cards with their senior counterparts based on GeForce 6800/6800GT, running at the same frequencies and with the number of pipelines reduced to that of the GeForce 6600GT. That's why we used a Palit GeForce 6800 for this purpose and modified it using RivaTuner:




Thus, having limited the number of pipelines to that of the GeForce 6600GT and set the frequencies to 325/175 (350) MHz, we can test this 6800-based 6600 emulator and compare the results with those of a GeForce 6600GT running at 325/350 (700) MHz. The memory clock of the GeForce 6800 is halved because its memory bus is twice as wide (256-bit versus 128-bit). As a result, the memory bandwidth is the same: 128 bit x 700 MHz = 256 bit x 350 MHz.

Bear in mind, however, that the GeForce 6800, as well as some other video cards we compare with the 6600GT, uses the AGP interface, so these cards can be tested only on the other testbed (the Athlon64 3400+ based platform). Our preliminary tests of similar video cards with AGP and PCX interfaces, e.g. X800XT, showed only an insignificant difference between the results obtained on the Athlon64 3400+ and on the Pentium4 3600 MHz. When the cards are loaded with AA or AF (or both), the difference disappears.

That's why we compared our cards with RADEON 9800 PRO 128MB and GeForce 6800LE (8 pixel and 4 vertex pipelines) despite the differences in interfaces and platforms. The percentage differences obtained without AA and AF (so-called pure speed) are highlighted in the tables in dark blue, which makes these figures hard to read. This is how we symbolically mark the most "unfair" comparisons, where the platform has the greatest influence on performance. You may ignore these results if you wish.

Not content with indirect comparisons lacking a PCX-based GeForce 6800, we ran another comparison with a GeForce 6800GT PCI-E, limiting its core to 8 pixel and 3 vertex pipelines (as in the 6600) and its frequencies to 350/250 (500) MHz. The comparison with a GeForce 6600GT running at 350/500 (1000) MHz is thus fully matched in core frequency, number of pipelines, and memory bandwidth. Only the differences in memory size and type may affect the results.
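The bandwidth matching used in both emulated configurations can be checked with a short Python sketch. It assumes peak bandwidth equals the bus width in bytes multiplied by the effective (DDR) memory clock, with 1 GB = 10^9 bytes; the function name `bandwidth_gb_s` is purely illustrative.

```python
def bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_width_bits / 8 gives the bytes moved per transfer; effective_mhz is the
    effective (DDR) data rate in millions of transfers per second.
    """
    return bus_width_bits / 8 * effective_mhz * 1e6 / 1e9


# (label, bus width in bits, effective memory clock in MHz), taken from the text
PAIRS = [
    (("6600GT @ 325/350 (700)", 128, 700), ("6800 emulator @ 325/175 (350)", 256, 350)),
    (("6600GT @ 350/500 (1000)", 128, 1000), ("6800GT @ 350/250 (500)", 256, 500)),
]

for (name_a, bus_a, mhz_a), (name_b, bus_b, mhz_b) in PAIRS:
    print(f"{name_a}: {bandwidth_gb_s(bus_a, mhz_a):.1f} GB/s | "
          f"{name_b}: {bandwidth_gb_s(bus_b, mhz_b):.1f} GB/s")
```

Both pairs come out equal: 11.2 GB/s for the first and 16.0 GB/s for the second, the latter being the 16 GB/sec figure that applies to the 6800 GT (350/250) and 6600 GT (350/500) reference pair.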

Thus, in the tables of comparison percentages for the gaming and some synthetic tests, we use several colour conventions:

  • ALL BLUE COLOURS mean comparisons with video cards operating on a different (Athlon64) platform;
  • DARK BLUE COLOUR - comparison in the "pure speed" mode without AA and AF (you can ignore it);
  • BLUE COLOUR - comparison with GeForce 6800 operating at the same frequencies and with the same number of pipelines, and the same memory bandwidth as in GeForce 6600GT;
  • DARK YELLOW COLOUR means a similar comparison with GeForce 6800GT PCX;
  • LIGHT-GREEN COLOUR - GeForce 6600 (300/300 (600) MHz) analysis;
  • WHITE COLOUR - all other comparisons.

Test results:

Before giving a brief evaluation of 2D, I will repeat that at present there is NO valid method for objective evaluation of this parameter due to the following reasons:

  1. 2D quality in most modern 3D accelerators depends dramatically on the specific sample, and it's impossible to evaluate every card.
  2. 2D quality depends not only on the video card, but also on the monitor and the cable.
  3. Monitor-card combinations have recently been shown to have a great impact on this parameter: some monitors just won't "work" with specific video cards.

As for the combination of our sample under review and the Mitsubishi Diamond Pro 2070sb, the card demonstrated excellent quality at the following resolutions and refresh rates:

NVIDIA GeForce 6600GT: 1600x1200x85Hz, 1280x1024x120Hz, 1024x768x160Hz

Synthetic tests in D3D RightMark

The version of the D3D RightMark synthetic benchmark package we used, Beta 4 (1050), and its description are available at http://3d.rightmark.org

A list of video cards:

  • 6600 GT (500/500)
  • 6600 GT (350/500)
  • 6800 GT (350/250)
  • 6800 Ultra (400/550)

To detect specific differences between NV40 and NV43, we tested not only the original 6600 GT (500/500) but also a reference pair of cards: a 6800 GT (350/250) with 8 pixel and 3 vertex pipelines, and a 6600 GT (350/500) with the same 8/3 pipeline configuration and equivalent memory bandwidth (16 GB/sec in both cards at these memory frequencies). This way we hope to reveal differences caused by the various quantitative and qualitative modifications that may have been introduced in NV43. For general comparison we also publish the results of the 6800 Ultra, a reference point for absolute comparisons that shows how much the mainstream solution loses to the expensive high end.

First, let's check whether the claimed characteristics (8 pixels per clock, etc.) match reality:

Pixel Filling Test

Peak texel rate, FFP mode, for various numbers of textures applied to one pixel:




The theoretical maximum of NV43 at 500 MHz in this test is 4 gigatexels/sec. In practice we reached 3.4 gigatexels/sec, which clearly indicates 8 texture units. With one texture the result is lower than with two: we are limited by insufficient frame buffer bandwidth. After that the dependence is smooth, with the fill rate gradually dropping with each additional texture. The results of the reference 6800 GT match those of the 6600 starting from two textures, so the single-texture shortfall is either the effect of the insufficient two-channel memory controller in the 6600 or (which is also possible) a confirmation of our assumption about a simplified ("lite") blending and frame buffer write stage.
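The step from 3.4 gigatexels/sec to 8 texture units is simple arithmetic. The Python sketch below assumes each texture unit outputs at most one (bilinear) texel per clock; `min_texture_units` is a hypothetical helper name.

```python
import math


def min_texture_units(measured_mtexels: float, core_mhz: float) -> int:
    """Smallest whole number of texture units able to reach the measured rate,
    assuming each unit delivers at most one (bilinear) texel per clock."""
    return math.ceil(measured_mtexels / core_mhz)


CORE_MHZ = 500                  # 6600 GT core clock
MEASURED_MTEXELS = 3400         # peak texel rate measured in this test
THEORETICAL = 8 * CORE_MHZ      # 8 texture units x 500 MHz = 4000 Mtexels/s

print(min_texture_units(MEASURED_MTEXELS, CORE_MHZ))  # 7: far more than 4 units
print(f"{MEASURED_MTEXELS / THEORETICAL:.0%} of theoretical")  # 85% of theoretical
```

Since 3400 / 500 = 6.8, no configuration with fewer than 7 units could reach the measured rate; with pixel pipelines organized in quads, the next possible count is 8, running at about 85% of its theoretical peak.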

And now the frame buffer fill rate, FFP mode, for various numbers of textures applied to one pixel:




Let's see how the fill rate depends on the shader version.










So, firstly, the fill rate does not depend on the shader version (which agrees perfectly with our understanding of the NV40 architecture). As for fill efficiency, we get the following picture:



Video card (core/memory, MHz)   Theoretical limit (Mpixels/s)   Practical limit (Mpixels/s)
6600 GT (500/500)               4000                            1887
6600 GT (350/500)               2800                            1259
6800 GT (350/250)               2800                            2515
6800 U (400/550)                6400                            5032

This again clearly points to a considerable difference in frame buffer write efficiency. We cannot say for sure what is to blame:

  1. The two-channel memory controller;
  2. Some buffers reduced in NV43 compared with NV40;
  3. Reduced throughput of the blending and write unit;
  4. Frame buffer compression technology, which may be disabled in the drivers and held in reserve as a future performance boost, or may have been cut from NV43 altogether, as was already done with NV34.
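The efficiency gap in the table can be quantified by dividing the practical limit by the theoretical one (core clock x number of pixel pipelines). A minimal Python sketch using only the table's figures; the helper name `fill_efficiency` is illustrative.

```python
# (card, core MHz, pixel pipelines, measured fill rate in Mpixels/s) from the table
RESULTS = [
    ("6600 GT (500/500)", 500, 8, 1887),
    ("6600 GT (350/500)", 350, 8, 1259),
    ("6800 GT (350/250)", 350, 8, 2515),
    ("6800 U (400/550)", 400, 16, 5032),
]


def fill_efficiency(core_mhz: float, pipelines: int, measured: float) -> float:
    """Fraction of the theoretical fill rate (core clock x pixel pipelines) reached."""
    return measured / (core_mhz * pipelines)


for card, mhz, pipes, measured in RESULTS:
    eff = fill_efficiency(mhz, pipes, measured)
    print(f"{card}: {measured} of {mhz * pipes} Mpixels/s ({eff:.0%})")
```

This gives roughly 47% and 45% for the two NV43 configurations, about 90% for the NV40-based 6800 GT at the same 350 MHz and memory bandwidth, and about 79% for the 6800 Ultra.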

But the fact remains: the NV40-based 6800 GT (350/250), identical in memory bandwidth and number of pipelines, performs much better than the NV43-based 6600 GT (350/500) with a small number of textures (and, consequently, the maximum write load on the frame buffer). In other words, the 6600 processes 8 pixels per clock but writes only 4(!) pixels per clock to the frame buffer; the remaining 4 pixels are written while the pixel shader processes the next values. This may mislead some into thinking that NV43 has 4 pipelines but 8 texture units, which is wrong. Actually, this design decision is quite reasonable, because shaders typically take at least two instructions per pixel.
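The "8 processed, 4 written" explanation can be expressed as a simple two-stage bottleneck model. The Python sketch below is an illustrative assumption, not a description of the actual NV43 hardware: it treats texturing and frame buffer writes as independent limits and ignores memory bandwidth. The write limits are chosen to fit the measured fill rates: 2515 Mpixels/s at 350 MHz is about 7.2 pixels per clock for the cut-down 6800 GT, while the 6600 GT stays below 4 (1259/350 is about 3.6 and 1887/500 about 3.8).

```python
def pixels_per_clock(textures: int, pipelines: int = 8, writes_per_clock: int = 8) -> float:
    """Simplified bottleneck model (an assumption, not a documented design).

    Each pipeline applies one texture per clock, so the pipelines finish
    pipelines / textures pixels per clock; the back end can store at most
    writes_per_clock pixels per clock. The slower stage sets the rate.
    """
    shaded = pipelines / textures
    return min(shaded, writes_per_clock)


for n in range(1, 5):
    nv40_like = pixels_per_clock(n, writes_per_clock=8)  # 6800 GT cut to 8 pipelines
    nv43_like = pixels_per_clock(n, writes_per_clock=4)  # 6600 GT: 4 writes per clock
    print(f"{n} texture(s): NV40-like {nv40_like:.2f}, NV43-like {nv43_like:.2f} pixels/clock")
```

With one texture the NV43-like configuration is capped at 4 pixels per clock against 8 for the NV40-like one; from two textures on, both are limited by texturing and give identical figures, which matches the behaviour seen in the texel rate test.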

Geometry Processing Speed Test

The simplest shader – maximum throughput for triangles:




The bottleneck here is obviously not the accelerator but the processor, software, and platform. The peak geometry throughput of modern accelerators is more than sufficient.

More complex shader – one simple point light source:




Everything depends on the core frequency here. However, it was high enough on both the 6800 Ultra and the 6600 GT that, as in the previous test, the results were limited by other factors (processor, system, etc.).

Note the equal results of NV43 and NV40 at a 350 MHz core frequency. This confirms our assumption that the vertex units in these chips are identical.

Let's complicate the task even further:




In more intensive calculations the chips rank strictly according to their clock frequency and number of vertex units, with no surprises here. The only figure that stands out is the high FFP result, which is practically comparable with the results of the 6-unit and 3-unit configurations. Special units accelerating FFP emulation allowed the 6800 Ultra to remain limited by the processor even in this test.
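The "clock frequency x number of vertex units" rule can be turned into expected relative figures. A minimal Python sketch, assuming purely linear scaling; it does not apply to the CPU-limited FFP case, and `relative_vertex_rate` is a hypothetical helper name.

```python
# (card, core MHz, active vertex units) as configured in these tests
CONFIGS = [
    ("6600 GT (500/500)", 500, 3),
    ("6600 GT (350/500)", 350, 3),
    ("6800 GT (350/250)", 350, 3),
    ("6800 Ultra (400/550)", 400, 6),
]


def relative_vertex_rate(mhz: float, units: int,
                         ref_mhz: float = 400, ref_units: int = 6) -> float:
    """Expected shader-bound vertex throughput relative to a reference card
    (6800 Ultra by default), assuming linear scaling with clock x vertex units."""
    return (mhz * units) / (ref_mhz * ref_units)


for card, mhz, units in CONFIGS:
    print(f"{card}: {relative_vertex_rate(mhz, units):.2f} x 6800 Ultra")
```

Under this assumption the 6600 GT at 500 MHz should deliver 62.5% of the 6800 Ultra's vertex shader throughput, and the two 350 MHz reference cards should be exactly equal.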

And now the most complex task: three light sources, tested without branching and with static and dynamic flow control:




First, note that the shader performs equally when compiled for the 2.a profile with dynamic branching and for the 3.0 profile. This is quite expected, because dynamic branching is implemented in hardware the same way in both profiles. The only fundamental difference of 3.0 here is the slightly expanded instruction set and texture fetch, whose performance we haven't tested yet (this test will appear in the next version of D3D RightMark, but we already know that NV4X chips execute this function slowly). At the very least, it's good that the latest DX compiler for Shaders 3.0 is optimized and does not lose to the 2.x profiles.

On the whole, the cards performed strictly according to their clock frequencies and numbers of pipelines. The only exception is FFP on the 6800 Ultra, where the chip demonstrated phenomenal performance and was limited only by the testbed capacity.

So:

  1. Vertex architecture of NV43 completely matches that of NV40.
  2. At equal frequencies, an equal number of vertex units gives equal performance.
  3. Vertex unit performance, especially in FFP emulation and some other simple tests, is very high and comfortably exceeds the capacity of our testbed.
  4. As already noted in our previous reviews, in NV4X chips dynamic branching is preferable to static, and no branching at all is preferable to either.
  5. Vertex Shaders 3.0 are compiled and work no less effectively than 2.x, as expected.

Pixel Shaders Test

The first group of shaders (1.1, 1.4 and 2.0) is rather easy to execute in real time:




All the results correspond well to clock frequency and number of pipelines, and the results of both chips at a 350 MHz core frequency match perfectly. Thus, we conclude that the pixel processors in NV40 and NV43 are identical. As expected, thanks to its high core frequency, the 6600 GT does well in pixel shaders even against the much more expensive 6800 Ultra. Buyers of mainstream solutions can only be glad: they will be able to play the latest games, though at moderate resolutions.

And now let's have a look at complex shaders:




The picture is the same. Interestingly, the 6600 is always slightly faster than the 6800 GT in the equivalent configuration. There may be some minor pixel pipeline optimizations that add these few percent of consistent advantage. However, the difference is so small that it may be due to the lower GDDR3 latencies.

Thus, concerning pixel shaders:

  1. Performance is up to the mark, even compared with more expensive solutions. The pixel shader headroom is considerable and sufficient even for the most demanding applications.
  2. Architecturally, the pixel processors in NV40 and NV43 are almost identical; the difference is within a couple of percent and may stem from the different memory types.

HSR Test

First, peak efficiency (with and without textures) depending on geometry complexity:








The HSR algorithm is very similar, and the results match almost everywhere. However, without textures NV43 produces results different from NV40, especially in scenes with a high overlap factor. Why? Let's look at the absolute figures:








You can see that, with texturing disabled, the 6800 GT was slightly faster at skipping invisible blocks and writing results. This looks more like an effect of the reduced frame buffer write efficiency (which we noticed in the fill rate tests) than of reduced HSR efficiency in NV43.

Conclusion:

  1. The HSR algorithm is unchanged.
  2. In some cases its performance in NV43 differs from that in NV40, most likely for indirect reasons (frame buffer write efficiency and the two-channel memory controller).

Point Sprites Test




You can clearly see that NV40 handles blending and frame buffer writes more efficiently: at equal frequency it outscores NV43 with all sprite sizes.




With large sprites NV40 outscores NV43, again due to its higher peak blending and frame buffer write rate. With medium sizes the results are identical: the limiting factor is the calculation of colour, lighting, and texture fetch, where NV40 and NV43 are on a par.

MSAA Test




And again we can see that MSAA 4x is considerably more efficient in NV40. This is the result of the two-channel memory controller and/or a simplified unit for rendering, writing, and post-processing results. As we assumed above, the reason may be a complete (or partial?) lack of frame buffer compression algorithms. Perhaps they are disabled in the driver. In our upcoming reviews we'll try to examine this issue more thoroughly and find an answer.
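To see why frame buffer compression matters so much for MSAA on a 128-bit bus, a back-of-the-envelope Python sketch helps. It assumes 32-bit colour and 32-bit depth/stencil per sample and no compression at all; real traffic also depends on overdraw, Z reads, and the final resolve, which are left out. The helper name `msaa_buffer_bytes` is hypothetical.

```python
def msaa_buffer_bytes(width: int, height: int, samples: int,
                      color_bytes: int = 4, depth_bytes: int = 4) -> int:
    """Uncompressed size of a multisampled colour + depth/stencil buffer.

    Assumes 32-bit colour and 32-bit depth/stencil per sample; any frame buffer
    compression is deliberately ignored.
    """
    return width * height * samples * (color_bytes + depth_bytes)


BANDWIDTH = 16e9  # bytes/s: 128-bit bus at 1000 MHz effective (6600 GT at 500/500)

for samples in (1, 4):
    size = msaa_buffer_bytes(1024, 768, samples)
    write_ms = size / BANDWIDTH * 1e3  # time to touch every sample once at peak rate
    print(f"{samples}x: {size / 2**20:.0f} MiB, {write_ms:.2f} ms per full pass")
```

At 1024x768, 4x MSAA quadruples the uncompressed sample data (from 6 MiB to 24 MiB), so every full pass over the buffer costs about 1.57 ms of peak bandwidth instead of about 0.39 ms; colour and Z compression exist precisely to cut this kind of traffic.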





Andrey Vorobiev (anvakams@ixbt.com)
Alexander Medvedev (unclesam@ixbt.com)

07 September, 2004





Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.