iXBT Labs - Computer Hardware in Detail

NVIDIA GeForce FX 5600 Ultra (NV31)
& GeForce FX 5200 Ultra (NV34)






Before analyzing the new accelerators, we recommend reading our analytical article on the architecture and specifications of NVIDIA GeForce FX (NV30) and our practical review of the NVIDIA GeForce FX 5800 Ultra, because the accelerators tested here are based on NV30 technologies.

CONTENTS

  1. General information
  2. Peculiarities of NVIDIA GeForce FX 5600 Ultra and 5200 Ultra video cards 
  3. Test system configuration and drivers' settings 
  4. Test results: briefly on 2D
  5. RightMark3D synthetic tests: philosophy and tests description 
  6. Test results: RightMark3D: Pixel Filling 
  7. Test results: RightMark3D: Geometry Processing Speed 
  8. Test results: RightMark3D: Hidden Surface Removal 
  9. Test results: RightMark3D: Pixel Shading 
  10. Test results: RightMark3D: Point Sprites 
  11. Test results: 3DMark2001 SE synthetic tests 
  12. Summary on the synthetic tests
  13. Test results: 3DMark2001 SE: Game1 
  14. Test results: 3DMark2001 SE: Game2 
  15. Test results: 3DMark2001 SE: Game3 
  16. Test results: 3DMark2001 SE: Game4 
  17. Test results: 3DMark03: Game1 
  18. Test results: 3DMark03: Game2 
  19. Test results: 3DMark03: Game3 
  20. Test results: 3DMark03: Game4 
  21. Test results: Quake3 ARENA 
  22. Test results: Serious Sam: The Second Encounter 
  23. Test results: Return to Castle Wolfenstein 
  24. Test results: Code Creatures DEMO 
  25. Test results: Unreal Tournament 2003 DEMO 
  26. Test results: AquaMark 
  27. Test results: RightMark 3D 
  28. Test results: DOOM III Alpha version 
  29. 3D quality: Anisotropic filtering
  30. 3D quality in general
  31. Conclusion

General information

We recently tested NVIDIA's new flagship, the GeForce FX 5800 Ultra, earlier known as NV30. But what is NV30: a chip or a family? Actually, there are two chips working at different clock speeds, hence 5800 Ultra and 5800. But NV30 can also be considered the founder of a new family, as it offers a new architecture and a new approach to making graphics processors.




When we tested NV30 last time, we found that it is time to give up the traditional ideas about accelerator design, in particular about the rendering architecture.

A chip no longer has a fixed number of pipelines: the driver can configure the VPU's operation separately for each scene. This makes accelerators quite flexible in modern games. Unfortunately, some old games that use single texturing cannot exploit all the advantages of the new processors, and a lot depends on the driver's settings for a particular game.

Besides flexible configuration, NV30 supports DX9 and, therefore, shaders 2.0 and 2.0+. All the advantages of Microsoft's new API are listed in the reviews mentioned above.




History shows that progressive 3D technologies were first integrated into top High-End accelerators, which came at high prices and were affordable for a very narrow range of users. That is why game developers preferred old, tried techniques in new games over new technologies. Why spend time and money on shaders for more realistic images if only 5% of users are going to see them? Developing flexible games that adjust to different accelerators can be too costly. As a result, the development of the game market was held back by the GeForce2 and TNT2 cards flooding the stores: there were a great number of 3D games, but they had rather moderate polygon counts per scene and unimpressive effects.

Until 2003 NVIDIA brought only weak solutions to the sub-$100 market, such as GeForce4 MX, which lacks even DX8 support. Even today all of NVIDIA's accelerators with shader support cost more than $120-130. ATI, by contrast, brought its RADEON 9000 (PRO) with DirectX 8.1 support into this sector back in summer 2002. It was the first modern accelerator in the Low-End sector. But RADEON 9000 failed to gain wide popularity because of driver problems, incompatibility with some mainboards and the low build quality of some boards from ATI's partners (you can forget about ATI's famous 2D quality). Finally, it was a mistake to give the weaker solution a higher number in its name: RADEON 9000 and RADEON 8500 support the same technologies (except that the latter supports hardware TrueForm, in contrast to 9000).

Although GeForce4 MX was becoming less popular, RADEON 9000 was too weak a competitor. Still, NVIDIA was worried that new technologies were penetrating the market too slowly because the low and middle sectors were flooded with DX7 accelerators (not even DX8). That is why the first step in developing NV30 was to extend DX9 techniques to Low-level and Mid-level accelerators.

ATI offers something similar (RADEON 9500 64MB/128MB, 9500 PRO), but the lowest price its solutions have reached (9500 64MB) is $150. Moreover, ATI has too wide a gap between its models: RADEON 9000 PRO costs less than $100, and in the $100-150 range ATI has only the old RADEON 8500, which is of little interest today. NVIDIA's line codenamed NV34 targets exactly this sector and also aims at pushing out RADEON 9000/PRO. All GeForce4 MX cards will either go down to $50 or leave the market altogether. When RADEON 9200 arrives (the same RADEON 9000 with AGP 8x), it will be fighting against the cheapest NV34 models.

The NV31 line is positioned against RADEON 9500/9500 PRO and the upcoming RADEON 9600/PRO. NVIDIA GeForce4 Ti 4600 (4800) and 4200 (4200-8x) will be replaced with GeForce FX 5600 Ultra and 5600.

So, in spring 2003 NVIDIA rolls out a new batch of accelerators for Low and Middle market sectors. 

  • GeForce FX 5600 Ultra - 350 MHz chip, 128/256 MB 350 MHz (DDR 700) 128-bit memory (~$199); 
  • GeForce FX 5600 - ??? MHz chip, 128 MB ??? MHz (DDR) 128-bit memory  (~$179); 
  • GeForce FX 5200 Ultra - 325 MHz chip, 64/128 MB 325 MHz (DDR 650) 128-bit memory  (~$100-149); 
  • GeForce FX 5200 - ??? MHz chip, 64/128 MB ??? MHz (DDR) 128-bit memory (~$79-99). 

The prices given in parentheses are approximate and may fall.




Now have a look at the architecture of GeForce FX 5600 and 5200: 
 
 

| | NV30 | NV31 | NV34 |
|---|---|---|---|
| Technology, nm | 130 | 130 | 150 |
| Transistors, M | 125 | 75 | 47 |
| Pixel pipelines | 4 | 2/4 (1) | 2 |
| Texture units | 8 | 4 | 4 |
| Core clock speed, MHz | 400/500 (Ultra) | 350 (Ultra) | 250/325 (Ultra) |
| Memory bus, bit | 128 (DDR II) | 128 (DDR) (2) | 128 (DDR) |
| Memory bus clock speed (eff.), MHz | 800/1000 (Ultra) | 700 (Ultra) | 333...650 (Ultra) |
| Pixel shader | 2.0+ | 2.0+ | 2.0+ (3) |
| Vertex shaders | 2.0+ | 2.0+ | 2.0+ |
| Memory bandwidth, GB/s | 16 | 11.2 | Up to 10.4 (Ultra) |
| HSR | Yes | Yes | Yes |
| Early Z test | Yes | Yes | Yes |
| Z compression | Yes | Yes | Yes |
| Color compression in MSAA modes | Up to 1:4 | Up to 1:4 | No |
| Hardware geometrical unit | Yes | Yes (4) | Yes (4) |
| RAMDAC, MHz | 2x400 | 2x400 | 2x350 |
| TV-out | Integrated ? | External ? | External ? |
| DVI | External (5) | Integrated | Integrated |
| Form-factor | FCPGA | BGA (6) | BGA (6) |
| External power supply | Necessary | Desirable | Optional |

Note: 

  1. NV31 can work in 4x1 mode (pipelines x texture units) or 2x2 (see further for more information). 
  2. DDR2 supported
  3. Pixel shaders in NV34 are a bit different from NV31, but they do comply with 2.0. 
  4. The hardware geometry processors of NV31 and NV34 have identical computational power, which is less than half that of NV30. The difference in geometry processing speed between NV31 and NV34 comes from the memory controllers and the sizes of caches and internal queues.
  5. Up to 2 DVI interfaces, one can have two channels. 
  6. NV25 compliant

Besides: 

  1. NV31 possesses all NV30 features as well as some optimizations. NV35 will probably also have faster floating-point pixel operations.
  2. Cache and queue size: NV30 > NV31 > NV34, the correlation is probably 4 > 2 > 1 
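
The memory bandwidth figures in the table follow from the bus width and the effective memory clock. Below is a minimal Python sketch of that arithmetic; the function name and the use of decimal gigabytes (10^9 bytes) are assumptions made for illustration, while the bus widths and clocks are taken from the table.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, effective_clock_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes, an assumption)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz * 1e6 / 1e9


# (bus width in bits, effective memory clock in MHz) from the table above.
cards = {
    "NV30 (GeForce FX 5800 Ultra)": (128, 1000),
    "NV31 (GeForce FX 5600 Ultra)": (128, 700),
    "NV34 (GeForce FX 5200 Ultra)": (128, 650),
}

for name, (bits, mhz) in cards.items():
    print(f"{name}: {memory_bandwidth_gb_s(bits, mhz):.1f} GB/s")
```

With these inputs the sketch gives 16.0, 11.2 and 10.4 GB/s, which agrees with the "Memory bandwidth" row.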

Cards

The cards have AGP x2/x4/x8 interface, 128 MB DDR SDRAM (8 chips on both PCB sides). 
 

NVIDIA GeForce FX 5600 Ultra; NVIDIA GeForce FX 5200 Ultra 







 
Hynix memory HY5DU283222-AF25, BGA form-factor. Maximum clock speed is 400 (800) MHz, 2.5 ns access time. The memory works at 350 (700) MHz in case of GeForce FX 5600 Ultra and at 325 (650) MHz in case of GeForce FX 5200 Ultra.
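
The link between the 2.5 ns figure and the 400 (800) MHz rating can be checked with a short calculation. This is only a sketch: it assumes the quoted 2.5 ns is the clock cycle time of the chips, and the helper names are hypothetical.

```python
def rated_clock_mhz(cycle_time_ns: float) -> float:
    """Clock frequency in MHz whose period equals the given time in ns."""
    return 1000.0 / cycle_time_ns


def ddr_effective_mhz(clock_mhz: float) -> float:
    """DDR memory transfers data twice per clock cycle."""
    return 2.0 * clock_mhz


def headroom_percent(rated_mhz: float, actual_mhz: float) -> float:
    """How far the rating exceeds the actual clock, in percent of the actual clock."""
    return 100.0 * (rated_mhz - actual_mhz) / actual_mhz


rated = rated_clock_mhz(2.5)  # assumption: 2.5 ns is the cycle time
print(f"Rated: {rated:.0f} ({ddr_effective_mhz(rated):.0f}) MHz")

# Actual memory clocks of the two cards, as stated in the text.
for card, actual in (("GeForce FX 5600 Ultra", 350.0), ("GeForce FX 5200 Ultra", 325.0)):
    print(f"{card}: rating is {headroom_percent(rated, actual):.1f}% above the stock clock")
```

Under that assumption both cards run their memory below its rating, which leaves some margin for overclocking.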


Both cards are identical as they are assembled on the same PCB. 
 

NVIDIA GeForce FX 5600 Ultra









NVIDIA GeForce FX 5200 Ultra









NVIDIA GeForce FX 5800 Ultra






NVIDIA GeForce4 Ti 4600






Their PCB is shorter than that of GeForce FX 5800 and GeForce4 Ti 4600. Nevertheless, the cards still provide for an external power supply: NV31 and NV34 boards have a 4-pin connector for the power supply unit, like NV30. If you don't connect it, the driver displays the following message while the OS boots:




But while NV30 performs much worse without an external power supply, NV31/34 are indifferent to it.

Unlike FX 5800 Ultra, the new cards come with plain coolers like the one on GeForce4 Ti 4600:







The cooler consists of a closed heatsink with a fan pumping air through it. The heatsink has an almost mirror-like cover.

The processors hidden under the heatsinks traditionally have a plastic cover with a metallic plate in the center. The packages are of the same size. 







The cards have an external TV coder - Philips 7114. 




Overclocking

With NV31 the drivers do not separate the 2D and 3D clock speeds, so only a single frequency can be adjusted; the card reached 405/810 MHz. With NV34 the frequencies are separated (as with NV30), even though the card belongs to the low-end sector. Both are set identically, to 325/650 MHz, but the card wouldn't go higher than 355/720 MHz because of glitches (though it doesn't hang). Therefore, in some tests we took measurements at 350/700 MHz to compare NV31 and NV34 at equal clock speeds.

Testbed and drivers

Testbed: 

  • Pentium 4 (Socket 478) based computer: 
    • Intel Pentium 4 3066 (HT=ON); 
    • ASUS P4G8X (iE7205) mainboard; 
    • 1024 MB DDR SDRAM; 
    • Seagate Barracuda IV 40GB; 
    • ViewSonic P810 (21") and ViewSonic P817 (21") monitors
    • Windows XP SP1. 

We used NVIDIA's drivers v42.72, VSync off, texture compression off in applications. DirectX 9.0 installed. 

Video cards used for comparison: 

  • Gainward Powerpack Ultra/750 (GeForce4 Ti 4600, 300/325 (650) MHz, 128 MB); 
  • ABIT Siluro GFTi4200-8x (GeForce4 Ti 4200-8x, 250/256 (512) MHz, 128 MB); 
  • Prolink PixelView GeForce4 MX 440-8x (275/256 (512) MHz, 128 MB); 
  • Hercules 3D Prophet 9000 PRO (RADEON 9000 PRO, 275/275 (550) MHz, 128 MB, driver 6.292); 
  • Hercules 3D Prophet 9500 PRO (RADEON 9500 PRO, 275/270 (540) MHz, 128 MB, driver 6.292). 

Drivers settings

The driver settings are given in GeForce FX 5800 Ultra Review

Note that you can access some disabled tabs using this patch for Windows XP registry. 

Test results

2D graphics

The quality is superb. Both cards work excellently at 1600x1200@85Hz.

Remember that 2D quality depends heavily on the particular sample, monitor and cable; moreover, certain monitors might not work properly with particular video cards. The 2D tests were carried out with a ViewSonic P817-E monitor and a Bargo BNC cable.

RightMark 3D synthetic tests (DirectX 9)

Today we will describe and run the suite of synthetic tests we are currently developing for the API DX9. 

The developed RightMark3D test suite now includes the following synthetic tests:

  1. Pixel Filling Test; 
  2. Geometry Processing Speed Test; 
  3. Hidden Surface Removal Test; 
  4. Pixel Shader Test; 
  5. Point Sprites Test. 

The philosophy of these synthetic tests and their description are given in NV30 Review

Those who are eager to try RightMark 3D synthetic tests can download the "command-line" test versions which record the final XLS file in the XML format accepted in Microsoft Office XP: 

Every archive contains description of test parameters and an example of a .bat file used for benchmarking accelerators. We welcome your comments and ideas as well as information on errors or improper behavior of the tests. 

Mailto: unclesam@ixbt.com

Practical Tests

Below are the data obtained with budget and mainstream accelerators based on two major families (ATI and NVIDIA). 

  • ATI: 
    • RADEON 9000 PRO 128 MB 
    • RADEON 9500 64 MB 
    • RADEON 9500 PRO 128 MB 

  • NVIDIA: 
    • GeForce 4 MX 440-8x 
    • GeForce 4 Ti 4200-8x 
    • GeForce 4 Ti 4600 
    • GeForce FX 5200 Ultra 
    • GeForce FX 5600 Ultra 

Number of pixel pipelines and their configuration

First of all, let's find out the actual number of pipelines and texture units of new GeForce FX models. 

NV34: 




The 2x2 configuration - two pixel pipelines with two texture units each - is obvious. The figures are very close to the theoretical values for this formula. Now we add pixel shaders 1.1 and 2.0:







The results are, surprisingly, the same! Either all operations are executed by a single ALU (whether it is floating-point or fixed-point is yet to be found out), or the drivers optimize aggressively and reduce the shaders to their minimal functional equivalent, i.e. tasks that fit within the blend stages (register combiners), supported as far back as DX7 cards, are executed that way. Later we will look at the performance of complex shaders 2.0 to verify this.

NV31: 










In programs without shader support this chip works as 4x1 with one texture and as 2x2 (!) with three textures. Consequently, the 2x2 formula is also used for 2 and 4 textures. The same formula is adopted for shaders 1.1. With shaders 2.0 the scheme is also 2x2, as in NV30, but operation is slower because textures can't be sampled simultaneously with the shader's computational operations. NV31 can be called a 2x2 chip with a 4x1 optimization for the particular case when applications don't use pixel shaders. It seems that the chip's pixel unit consists of an array of flexibly configurable ALUs (stages) which can form different numbers of pipelines depending on the situation (stage or shader settings).

Summary: 

  • NV31 - 4x1 chip in applications without pixel shaders and 2x2 chip in applications with them. 
  • NV34 - 2x2 chip always, performance doesn't depend on pixel shader version in this test (!). 
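
The observed behavior can be written down as a small lookup, which also gives the theoretical peak rates as pipelines times core clock. This is a rough Python sketch of what the test results suggest, not a description of the actual driver logic; the function names are hypothetical, and the 4x1 case is limited to a single texture without pixel shaders, as measured above.

```python
from typing import Optional, Tuple


def pipeline_config(chip: str, textures: int,
                    pixel_shader: Optional[str] = None) -> Tuple[int, int]:
    """Return (pixel pipelines, texture units per pipeline) as observed in this test."""
    if chip == "NV34":
        return (2, 2)  # always 2x2, regardless of shader version
    if chip == "NV31":
        if pixel_shader is None and textures <= 1:
            return (4, 1)  # single texture, no pixel shaders
        return (2, 2)      # multitexturing or any pixel shader version
    raise ValueError(f"no data for chip {chip!r}")


def peak_rates(chip: str, core_mhz: int, textures: int,
               pixel_shader: Optional[str] = None) -> Tuple[int, int]:
    """Theoretical peak (Mpixels/s, Mtexels/s) for the selected configuration."""
    pipes, tmus = pipeline_config(chip, textures, pixel_shader)
    return pipes * core_mhz, pipes * tmus * core_mhz


print(peak_rates("NV31", 350, textures=1))                      # 4x1 mode
print(peak_rates("NV31", 350, textures=2))                      # 2x2 mode
print(peak_rates("NV31", 350, textures=1, pixel_shader="2.0"))  # 2x2 mode
print(peak_rates("NV34", 325, textures=2))                      # 2x2 mode
```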

Pixel Filling

  1. The test measures a frame buffer fill rate (Pixel Fillrate). Constant color, no texture sampling. Scores are given in million pixels per second for different resolutions both in the standard mode and in 4x MSAA: 



  2. RADEON 9500 PRO takes the lead with its 8 pixel pipelines (fill rate without textures depends on the number of pixel pipelines, the core clock speed, and the memory throughput and how effectively it is used). It is followed by NV31 (GeForce FX 5600 Ultra), which has 4 pipelines in this test, yet the gap is less than twofold: RADEON 9500 PRO is limited by memory, in contrast to 5600 Ultra, which shows that the latter is better balanced. Will it remain as well balanced with more intensive shaders? We will see later.

    5600 Ultra is, as we can see, a DX9 replacement for the Ti 4600, and one that sells at a much lower price. With MSAA both chips slow down equally strongly.

    NV34 performs much worse because it has only two pixel pipelines. This chip will be a perfect DX9 replacement for GeForce4 MX 440. It remains to be seen whether the enhanced architecture helps the new chip make a giant leap forward in real applications.
     

  3. Frame buffer fill rate with simultaneous texturing. One simple bilinear texture sample is added. We will estimate how much the competing read stream from memory reduces fill efficiency. Data are in million pixels per second for different resolutions in the standard mode and at 4x MSAA: 



  4. The picture is essentially the same, though the peak values are a little lower. But the scores of RADEON 9500 PRO do not correspond to its capabilities, probably because of this particular combination of test and drivers. We are currently discussing it with ATI's experts. Note that neither RADEON 9700 nor 9700 PRO (see RADEON 9700 PRO DX9 Part 2) had such bugs! 

    Let's see whether reality matches the theoretical limits based on the core frequency and the number of pipelines: 

    | Product | Theoretical maximum | Measured maximum (no texture) | Measured maximum (one texture) |
    |---|---|---|---|
    | RADEON 9000 PRO | 1100 | 1049 | 1026 |
    | RADEON 9500 | 1100 | 1049 | 228 ??? |
    | RADEON 9500 PRO | 2200 | 1746 | 450 ??? |
    | GeForce4 MX 440-8x | 550 | 537 | 534 |
    | GeForce4 Ti 4200-8x | 1000 | 978 | 945 |
    | GeForce4 Ti 4600 | 1200 | 1175 | 1150 |
    | GeForce FX 5600 Ultra | 1400 | 1371 | 1315 |
    | GeForce FX 5200 Ultra | 650 | 630 | 622 |

    Every card except RADEON 9500 PRO realized its potential. But, as mentioned above, the RADEON 9500 family behaves strangely in the single-texture mode. 

  5. Texturing Rate dependence (pixels sampled and filtered from textures, per second) on the number of 256x256 textures applied in a pass: 



  6. The 4x2 configuration of GeForce4 Ti 4200 was expected to give it an advantage in multitexturing modes (2 to 4 textures). We will see later whether this advantage shows in real applications. 

    For comparison we took the results of NV34 overclocked to NV31's memory and core clock speeds, and those of NV30 at NV31's frequency. With a single texture, when the blend stages need not be used, these chips are on the same level, but in the other cases NV30 is twice as efficient as NV31. 

    | Product | Theoretical maximum | Maximum reached |
    |---|---|---|
    | GeForce FX 5200 Ultra | 1300 | 1142 (2 textures) |
    | GeForce FX 5600 Ultra | 1400 | 1315 (2 textures) |

  7. Dependence on filtering type: 



  8. With tougher anisotropy settings NV25 (4200 and 4600) loses performance, as discussed in depth in our previous reviews. NV31 and NV34, however, slow down only a bit faster than ATI's chips. RADEON 9500 again demonstrates strange performance. 
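
To make the comparison between theoretical and measured fill rates easier to read, the figures from the fill rate table can be turned into utilization percentages. The Python sketch below only reuses the numbers published above (million pixels per second); the `utilization` helper is a hypothetical name introduced for illustration.

```python
# (theoretical, measured without texture, measured with one texture), Mpixels/s,
# copied from the fill rate table in this section.
fillrate = {
    "RADEON 9000 PRO":       (1100, 1049, 1026),
    "RADEON 9500":           (1100, 1049, 228),
    "RADEON 9500 PRO":       (2200, 1746, 450),
    "GeForce4 MX 440-8x":    (550, 537, 534),
    "GeForce4 Ti 4200-8x":   (1000, 978, 945),
    "GeForce4 Ti 4600":      (1200, 1175, 1150),
    "GeForce FX 5600 Ultra": (1400, 1371, 1315),
    "GeForce FX 5200 Ultra": (650, 630, 622),
}


def utilization(measured: float, theoretical: float) -> float:
    """Share of the theoretical maximum actually reached, in percent."""
    return 100.0 * measured / theoretical


print(f"{'Product':24s} {'no texture':>10s} {'one texture':>11s}")
for card, (theory, no_tex, one_tex) in fillrate.items():
    print(f"{card:24s} {utilization(no_tex, theory):9.1f}% "
          f"{utilization(one_tex, theory):10.1f}%")
```

In the no-texture column every card lands above 95% except RADEON 9500 PRO at roughly 79%, and the one-texture column makes the anomaly of the RADEON 9500 family stand out.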

Geometry Processing Speed

  1. Fixed TCL performance (for NV30 and R300 - performance of the shader that emulates it): 



  2. The scores are sorted by the complexity of the lighting model used. The lowest group is the simplest variant, which corresponds to the accelerator's peak vertex throughput. The geometry performance of NV31 and NV34 is identical in most cases; they differ a little only in the simplest variant, because of vertex cache sizes rather than geometry processing speed. In the other tasks NV31 and NV34 work 2.5 times slower than NV30 at the same clock speed, so evidently they have 2.5 times fewer vertex ALUs. The test confirms that both chips support hardware geometry processing. Here NV31 and NV34 lag behind RADEON 9500 PRO by a factor of 1.5 on average. But remember that RADEON 9600 PRO will have only half the geometry performance! 
     

  3. Vertex Shaders 1.1: 



  4. The relative standing of NV30, NV31 and NV34 hasn't changed - the chips use geometry units of similar organization. RADEON 9500 shines again: its geometry power is not cut down and equals that of RADEON 9700, which makes it look far superior to NV31 and NV34. 
     

  5. Shaders 2.0 with loops: 



  6. It's similar to vertex shaders 1.1, with the gap from ATI being a little bigger. 

    So, the hardware T&L emulation implemented by ATI is less efficient than NVIDIA's, and it is comparable to vertex shaders 2.0. The strongest point of NV3x is TCL emulation; the weakest is loops. In this respect ATI's drivers optimize more flexibly: static execution of branches and loops allows for more aggressive optimization. 

    NV31 and NV34 have identical performance in vertex tasks, but they are 2.5 times slower than NV30 at the same frequency. 
     

  7. Cross dependence on geometry detail level and shader's complexity: 









  8. As shaders get more complicated and detail levels higher, NV30 improves its position (due to vertex caches and other balancing aspects). In contrast, NV31 and NV34 seem better suited to scenes of moderate detail levels. 

Hidden Surface Removal

  1. HSR support and maximum performance percentagewise depending on resolution and number of triangles, no-texture scene (early Z cull is not counted): 



  2. NV31 and NV34 support HSR, and in NV34 it is a bit more efficient. Note that GeForce4 MX is also equipped with it! RADEON 9000 PRO, 9500 and GeForce4 Ti have it disabled. In RADEON 9500 PRO, as in RADEON 9700 PRO, HSR works at full capacity. 

    R300 uses a hierarchical structure in which surfaces are often removed at a higher level, while NV30 has only one decision-making level combined with the tiles used for depth compression. At 1600x1200 R300's HSR becomes much less effective - probably the hierarchical depth buffer is no longer used (e.g. to save memory), and decisions are made, as in NV30, only at the lowest level combined with the compressed blocks in the depth buffer. 
     

  3. HSR support and maximum performance percentagewise depending on resolution and number of triangles, textures enabled (early Z cull is accounted): 



  4. Both NV31/NV34 and R300 boost their performance. NVIDIA's NV31 and NV34 almost reach the record set by RADEON 9500 PRO, and at the maximum resolution they even outscore it! 

Pixel Shading

This test is run only on R300 and NV3x because hardware execution of pixel shaders 2.0 is the minimum requirement here. For reference, software emulation of pixel shaders 2.0 delivers only one frame every 2 seconds in a small window on the good old GeForce4 Ti 4600 coupled with a 2 GHz Pentium 4. 

  1. Shaders 2.0: 



  2. At the same frequency NV31 and NV34 run shoulder to shoulder with pixel shaders 2.0, but at half the speed of NV30. R300 leads in some cases thanks to its 4 (8) pixel pipelines against 2 in NV31 and NV34. 

  3. Dependence on resolution: 



  4. All is well - the dependences coincide, and performance depends only on the number of shaded pixels. 

    Performance doesn't change whether we force 16-bit or 32-bit precision in DirectX; most likely, the drivers choose the precision themselves, ignoring the respective instruction modifiers in the shaders. 
     
     

Point Sprites

  1. Lighting on and off, dependence on size: 






  2. As expected, lighting affects only small sprites; as they grow, performance becomes limited by the fill rate (at a size of approximately 8). So, for particle systems with a large number of particles the sprite size should be less than 8. 

    NV31 and NV34 trail NV30 clocked at the same frequency by the same margin we saw above. With bigger sprites the difference shrinks because of memory throughput (we deliberately slowed down NV30), i.e. memory throughput becomes more and more influential. 

    The peak is reached without lighting: 8 M sprites per second for NV31/34 and about 17 M for RADEON 9500 (PRO) and for NV30 brought down to the speed of GeForce FX 5600 Ultra. 

    But point sprites are no cure-all: the figures are not much better than with ordinary polygons. However, point sprites are more convenient to program, above all for all kinds of particle systems. 
     
     

3D graphics, 3DMark2001 SE synthetic tests

All measurements in 3D are taken at 32bit color depth.

Fillrate

 





The dependence is very similar to Pixel Filling from RightMark 3D, but here it is less pronounced. Besides, the drivers couldn't make NV31 work in the 4x1 mode, and the chip revealed only half of its potential. 

Multitexturing: 




The results also resemble those obtained in the RightMark 3D tests. NV31 does work as 2x2, which is preferable for multitexturing. 

Pixel Shader

Simple variant: 




NV31 works much faster than NV34. This test actively samples textures but makes few calculations; besides, all calculations are made in the integer format (shaders 1.1). Probably, big texture caches of NV31 help it outpace NV34. 

With twice as many pipelines (four times as many in the PRO version), RADEON 9500 takes the top positions. On the other hand, despite having half as many pipelines, NV31 keeps up with RADEON 9500 thanks to its high core frequency and more efficient pipelines. 

Let's see if the picture changes with more intensive calculations of the pixel shaders: 




Now R300 leads in both versions (8 and 4 pipelines), which shows that the NV3x family is weaker in calculations and stronger with textures. 

Vertex Shaders

 





The VS test brings results similar to what we got before. Be that as it may, we are inclined to trust the RightMark 3D synthetic tests, which do not show such a strong dependence on resolution. 

In general, the 3DMark2001 scores match RightMark 3D, though they carry less information because the parameters of its synthetic tests cannot be adjusted. 

Summary on synthetic tests

Let's sum up the detailed examination of various units of NV31 and NV34 in the synthetic tests. 

NV31 is able to work in 4x1 and 2x2 configurations depending on textures and applications. NVIDIA has thus removed NV30's drawback that hurt performance in those old and modern applications that work mostly in single-texture mode. 

Both NV34 and NV31 support hardware geometry processing. Interestingly, its performance is 2.5 times (not 2 or 3) lower than that of NV30 running at the same clock speed. NVIDIA has probably incorporated a wide VLIW geometry processor similar to that of 3dlabs P10, with an ALU array able to process several vertices simultaneously. Such a design is optimal in terms of efficiency (ALU utilization) and allows geometry performance to be scaled even by a non-integer factor. 

While NV34's pixel architecture resembles NV30, NV31 is more similar to NV35 - a more flexibly configurable chip. However, NV31 shows no computational advantage over NV34 with pixel shaders 2.0: it delivers half of NV30's computational performance, so its pixel part probably differs from NV30's. It probably uses universal ALUs for texture sampling, stages and pixel shaders of both versions; this saves transistors and lets us hope for a performance gain with newer drivers. 

NV34's pixel architecture needs further study: in the fill rate test it shows equal results for all shader types, in contrast to NV30 or NV31. At the same time, the character of its results is similar to NV31's with shaders 2.0. NV34 probably relies on certain aggressive optimizations at the driver level. 

So, in real applications performance difference between NV31 and NV34 will depend on queue and cache sizes. Regarding basic units (geometrical, pixel) these chips perform equally at the same clock speed. 

Now it's clear how separate units are going to work; real applications and prices will decide the rest. 

3D graphics, 3DMark2001 game tests

Anisotropy was set to 16x for ATI's cards and to 8x for NVIDIA's because the algorithms behind this function differ considerably (we discussed this in the NV30 Review). There is just one criterion: maximum quality. The screenshots have been shown several times already. It is also interesting to compare NVIDIA's different anisotropy modes (Application, Balanced, Aggressive) with ATI's high-quality mode; readers can judge how speed and quality correlate by looking at the screenshots demonstrating anisotropic quality in the NV30 Review. 

The tables give all the necessary data. Just remember that we compare GeForce FX 5200 Ultra with RADEON 9500 64MB although they are positioned for different market sectors. However, the starting price of the most expensive FX 5200 Ultra card might go below $149, i.e. close to that of RADEON 9500 64MB. 
 
 

3DMark2001, 3DMARKS



















3DMark2001, Game1 Low details




















 

3DMark2001, Game2 Low details




















 

3DMark2001, Game3 Low details




















 

3DMark2001, Game4



















In 3DMark2001 NV31 easily competes with its rivals in the tough modes with AA and/or anisotropy, and NV34 holds up well against RADEON 9000 PRO and GeForce4 MX 440-8x, the former low-end market leader. Sometimes it loses to RADEON 9500 64MB, but production of RADEON 9500 will cease at the end of March, and its price is not going to fall, in contrast to NV34's. 

3D graphics, 3DMark03 game tests

 


3DMark03, 3DMARKS








3DMark03, Game1

Wings of Fury Test characteristics: 

  • DirectX 7.0; approx. 32000 polygons on the scene, 16 MB memory used for textures, 6 MB for buffers for vertices and 1 MB for indices. 
  • All geometrical operations are based on Vertex Shaders 1.1 which can be emulated with CPU (if there is no hardware support). 
  • All planes have 4 texture layers, that is why accelerators able to process 4 textures in a pass will benefit. 
  • Fire and tail effects are made with the point sprite and other techniques. 






3DMark03, Game2

Battle of Proxycon: 

  • DirectX 8.1; Approx. 250 000 polygons on the scene with Pixel Shaders 1.1 (and 150 000 polygons on the scene with Shaders 1.4), 80 MB memory used for textures, 6 MB for buffers for vertices and 1 MB for indices.
  • All geometrical operations are based on Vertex Shaders 1.1 which can be emulated with CPU (if there is no hardware support). 
  • All characters are also skinned with vertex shaders. 
  • Some light sources cast dynamic shadows using a stencil buffer. 
  • All pixel operations are carried out with shaders 1.1, and if possible with shaders 1.4. 
  • Calculation of per-pixel lighting for haze effects and other components. 
  • Accelerators supporting pixel shaders 1.1 use one pass for determining Z buffer, then 3 passes for each light source. If an accelerator supports shaders 1.4, it needs one pass for each light source. 
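
The pass counts described in the last item can be expressed as a small function. This Python sketch is an illustration only: it assumes that the initial Z-buffer pass is also performed on the pixel shader 1.4 path, which the description above does not state explicitly, and the function name is hypothetical.

```python
def render_passes(light_sources: int, supports_ps14: bool) -> int:
    """Number of rendering passes per frame for the described scheme."""
    depth_pass = 1  # assumption: the Z-buffer pass applies to both paths
    passes_per_light = 1 if supports_ps14 else 3
    return depth_pass + passes_per_light * light_sources


for lights in (1, 2, 3):
    print(f"{lights} light(s): PS 1.1 -> {render_passes(lights, False)} passes, "
          f"PS 1.4 -> {render_passes(lights, True)} passes")
```

Fewer passes mean the geometry is submitted fewer times, which is consistent with the lower polygon count quoted for the shader 1.4 path.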






3DMark03, Game3

Trolls' Lair: 

  • DirectX 8.1; approx. 560 000 polygons on the scene with Pixel Shaders 1.1 (and 280 000 polygons on the scene with Shaders 1.4), 64 MB memory used for textures, 19MB for buffers for vertices and 2 MB for indices. 
  • All geometrical operations are based on Vertex Shaders 1.1 which can be emulated via CPU (if there is no hardware support). 
  • All characters are also skinned with vertex shaders. 
  • Some light sources cast dynamic shadows using a stencil buffer. 
  • All pixel operations are carried out with shaders 1.1, and if possible with shaders 1.4. 
  • Calculation of per-pixel lighting for haze effects and other components. 
  • Realism of the heroine's hair is achieved with physical models and anisotropic lighting. 






3DMark03, Game4

Mother Nature: 

  • DirectX 9.0; approx. 780 000 polygons on the scene, 50 MB memory used for textures, 54MB for buffers for vertices and 9 MB for indices. 
  • Every leaf is separately animated with Vertex Shaders 2.0. Grass is animated with vertex shaders 1.1. 
  • Lake's surface is formed with pixel shaders 2.0. 
  • Sky is made with pixel shaders 2.0, sun glints are formed with extra-precision calculations in DX9. 
  • Earth surface is made with shaders 1.4. 






3DMark03 revealed that NVIDIA's newcomers lose in tough scenes that use all the latest techniques. True, games quite like these tests will never appear. Still, we can see that the low shader speed of NV3x has a strong effect on overall performance in tests based on DX9 technologies. But... by the time such (DX9) games come to market, we will have a new generation of accelerators even in the Low/Middle sectors. 

3D graphics, game tests

3D games used to estimate 3D performance: 

  • Return to Castle Wolfenstein (MultiPlayer) (id Software/Activision) - OpenGL, multitexturing, Checkpoint demo, test settings - maximum, S3TC OFF, the configurations can be downloaded from here
  • Serious Sam: The Second Encounter v.1.05 (Croteam/GodGames) - OpenGL, multitexturing, Grand Cathedral demo, test settings: quality, S3TC OFF 
  • Quake3 Arena v.1.17 (id Software/Activision) - OpenGL, multitexturing, Quaver demo, test settings - maximum: detail level - High, texture detail level - #4, S3TC OFF, curve smoothness is greatly increased through the variables r_subdivisions "1" and r_lodCurveError "30000" (the default r_lodCurveError is 250!), the configurations can be downloaded from here 
  • Unreal Tournament 2003 Demo (Digital Extremes/Epic Games) - Direct3D, Vertex Shaders, Hardware T&L, Dot3, cube texturing, default quality 
  • Code Creatures Benchmark Pro (CodeCult) - DirectX 8.1, shaders, HW T&L 
  • AquaMark (Massive Development) - DirectX 8.1, shaders, HW T&L 
  • RightMark 3D v.0.4 (one of the game scenes) - DirectX 8.1, Dot3, cube texturing, shadow buffers, vertex and pixel shaders (1.1, 1.4). 

Quake3 Arena, Quaver

 




















NVIDIA's drivers are perfectly optimized for games based on the Quake3 engine, and the newcomers win this battle. 

Serious Sam: The Second Encounter, Grand Cathedral

 




















Although ATI's drivers are optimized for this game, NV31 manages to win in the tough modes. But NV34 loses even to RADEON 9000 PRO. 

Return to Castle Wolfenstein (Multiplayer), Checkpoint

 




















Strangely enough, the picture is similar to the previous test even though the game is built on the Quake3 engine. It all comes down to the drivers. If you remember, it was in this very test that NV30's performance jumped from 70 to 120 fps at 1600x1200 between driver versions of the 42.* series. 
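To put the 70 to 120 fps jump quoted above into perspective, it helps to express it as a relative gain and as time per frame. This is a small arithmetic sketch; the function names are ours and nothing here is measured data beyond the two fps figures from the text.

```python
def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def relative_gain(old_fps, new_fps):
    """Fractional speedup of new_fps over old_fps (0.5 means +50%)."""
    return (new_fps - old_fps) / old_fps


old_fps, new_fps = 70, 120  # figures quoted in the text above

print(f"Gain from the driver update: {relative_gain(old_fps, new_fps):.0%}")
print(f"Frame time before: {frame_time_ms(old_fps):.1f} ms")
print(f"Frame time after:  {frame_time_ms(new_fps):.1f} ms")
```

A driver update alone gave roughly a 71% gain, cutting the average frame time from about 14.3 ms to about 8.3 ms, which is why results in this game say as much about the driver as about the chip.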

Code Creatures

 





The slow shaders push the cards to the bottom again... 

Unreal Tournament 2003 DEMO

 




















It's different here: sometimes the cards lose but the victory comes in the tough modes again. 

AquaMark

 




















This test shows how well anisotropic filtering is optimized. Compare NVIDIA's newer and older solutions in this mode. In general, the newcomers keep to the tradition and win in the tougher modes. 

RightMark 3D

 




















This test strongly depends on shaders, and NVIDIA's models do not shine here. 

DOOM III Alpha

 











This test shows that the outcome can be perfect if the processor and the game are properly optimized for each other. This game makes the best use of the capabilities of the new NV3x architecture. 

3D graphics quality

ANISOTROPIC FILTERING

Anisotropic filtering in the new family was carefully studied in the NV30 review, so today we give just a few examples. 

NV3x actually has three types of anisotropic filtering: one is controlled by the application, another balances performance and quality, and the third prefers performance to quality (the aggressive type). The balanced one is used by default. We measured the speed of both the balanced type and the aggressive one. In some driver versions the Aggressive mode is labeled Performance instead. 

Have a look at the difference between these types. 


 
Screenshots compared in the Application, Balanced and Aggressive modes: RightMark3D, Anisotester (from Unwinder), Serious Sam: TSE.
 

The anisotropic filtering types differ in both speed and quality. They manipulate MIP-mapping and trilinear filtering. This is noticeable in XMAS, a program that tests anisotropy in OpenGL. 


 
XMAS screenshots compared in the Application, Balanced and Aggressive modes. Example 1: bilinear, trilinear and anisotropic 8x filtering. Example 2: anisotropic 8x filtering.
In Performance/Aggressive mode there are almost no traces of trilinear filtering. However, on RADEON 9000/8500/9200/9100 trilinear and anisotropic filtering can't coexist at all. Moreover, these cards have inferior quality of anisotropy. That is why I think the Balanced mode is a good compromise between speed and quality (as you know, ATI's fans have to put up with the fact that on RADEON 9500/9700 anisotropy doesn't work to its full capacity at all angles). 
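The difference between bilinear and trilinear filtering across MIP levels can be shown with a toy model. The sketch below is a simplification and does not model what the NV3x driver does internally: each MIP level is reduced to a single grey value (an assumption for illustration), and `lod` stands for the level of detail computed for a pixel. With bilinear filtering the nearest MIP level is picked, so the result changes in steps; with trilinear filtering the two neighbouring levels are blended, so the transition is smooth.

```python
import math


def sample_bilinear(mip_values, lod):
    """Bilinear filtering with MIP-mapping: snap to the nearest MIP level."""
    level = int(math.floor(lod + 0.5))
    level = min(max(level, 0), len(mip_values) - 1)
    return mip_values[level]


def sample_trilinear(mip_values, lod):
    """Trilinear filtering: blend linearly between two adjacent MIP levels."""
    lod = min(max(lod, 0.0), float(len(mip_values) - 1))
    lower = int(math.floor(lod))
    upper = min(lower + 1, len(mip_values) - 1)
    t = lod - lower  # blend weight of the upper (smaller) MIP level
    return (1.0 - t) * mip_values[lower] + t * mip_values[upper]


# Toy MIP chain: one grey value per level instead of a full texture.
MIP_VALUES = [1.0, 0.75, 0.5, 0.25]

for lod in (0.0, 0.4, 0.6, 1.0, 1.5):
    print(f"lod={lod:.1f}  bilinear={sample_bilinear(MIP_VALUES, lod):.3f}  "
          f"trilinear={sample_trilinear(MIP_VALUES, lod):.3f}")
```

Between lod 0.4 and 0.6 the bilinear result jumps from one level to the next, which is the visible MIP border that trilinear filtering hides. A mode that leaves "almost no traces of trilinear filtering" would look closer to the stepped result.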

3D quality as a whole

AA performance was carefully studied in the GeForce FX 5800 review. Here I must only note that we couldn't take screenshots in several AA modes: in those modes AA is applied by a post-filter after the data leave the frame buffer on the way to the RAMDAC, while any screenshot is just a copy of the frame buffer. 

Conclusion

First comes the speed aspect. The cards sometimes lose and sometimes win. Clearly, even a user who wants to buy a mid-level accelerator should know about 3D techniques such as anisotropy and AA, because it is only in these high-quality modes that NVIDIA's newcomers yield their best results. 

It's quite possible that, together with NV35, the company will release a more powerful NV31-based card with a higher core clock speed and DDR memory support. The card will stand against new mid-range RADEONs which haven't reached the stores yet: 

  1. At the starting prices, which are surely going to be very high, the top solution in both the NV31 and NV34 families will lose to its competitors in some tests. 
  2. On the upside, these are the latest accelerators and support all up-to-date technologies. They markedly outdo the previous generation (GeForce4 Ti) in many tests, even though the chips' specs look more modest by the traditional measure of peak fillrate. 
  3. The launch of ATI's RV350 is still at least a month away, so even if ATI's solution proves faster than NV31, the latter will have enough time to come down in price significantly. 
  4. However, there are some disadvantages. The low shader speed typical of the whole NV3x family slows the cards down in some modern applications, and only enabling AA and/or anisotropy improves their relative standing. But if a game actively uses texturing, NV31/NV34 will sweep the field (for example, in DOOM III). 
  5. The biggest advantage of these chips is that for less than $100 users can admire the beauty of modern games, tests and demo versions. 
  6. I wish more of NV30's weaknesses had been done away with in these chips. 


 
 
 

Andrey Vorobiev  (anvakams@ixbt.com)  
Alexander Medvedev  (unclesam@ixbt.com)



Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.