We recently tested NVIDIA's new flagship, the GeForce FX 5800 Ultra, previously known as NV30. But what is NV30: a chip or a family? Strictly speaking, there are two cards whose chips run at different clock speeds, hence the 5800 Ultra and the 5800. But NV30 can also be considered the founder of a new family, as it offers a new architecture and a new approach to making graphics processors.
A chip can now present a different number of pipelines. In fact, the driver can configure the VPU's operation separately for each scene, which makes the accelerators quite flexible in modern games. Unfortunately, some old games that use single texturing cannot exploit all the advantages of the new processors, and a lot depends on the driver settings for a given game.
Besides flexible configuration, NV30 supports DX9 and, therefore, shaders 2.0 and 2.0+. All the advantages of Microsoft's new API are listed in the reviews mentioned above.
History shows that all progressive 3D technologies first appeared in top High-End accelerators, which sold at high prices and were affordable only to a very narrow range of users. That is why game developers preferred old, proven techniques in new games over new technologies. Why spend time and money on shaders for more realistic images if only 5% of users are going to see them? Developing flexible games that adjust to different accelerators can be too costly. As a result, the game market was held back by the GeForce2 and TNT2 cards flooding the stores: despite the great number of 3D games, they had rather moderate polygon counts per scene and unimpressive effects.
Until 2003 NVIDIA offered only weak solutions in the sub-$100 market, such as GeForce4 MX, which lacks even DX8 support. Even today all of NVIDIA's accelerators with shader support cost more than $120-130. ATI, by contrast, brought its RADEON 9000 (PRO) with DirectX 8.1 support into this sector back in summer 2002. It was the first modern accelerator in the Low-End sector. But RADEON 9000 failed to gain wide popularity because of driver problems, incompatibility with some mainboards and the low build quality of some boards from ATI's partners (you can forget about ATI's famous 2D quality). Finally, it was a mistake to give the weaker solution a higher number in its name: both RADEON 9000 and RADEON 8500 support the same technologies (but the latter supports hardware TrueForm, in contrast to the 9000).
Although GeForce4 MX was losing popularity, RADEON 9000 was too weak a competitor. But NVIDIA was worried that new technologies were penetrating the market too slowly because the low and middle sectors were flooded with DX7 accelerators (not even DX8). That's why the first step in developing NV30 was to extend DX9 techniques to Low-level and Mid-level accelerators.
ATI offers something similar (RADEON 9500 64MB/128MB, 9500 PRO). But the lowest price its solutions have reached (9500 64MB) is $150. Moreover, ATI has too wide a gap between its models: RADEON 9000 PRO costs less than $100. In the $100-150 range ATI has only the old RADEON 8500, which is of little interest today. NVIDIA's line codenamed NV34 targets exactly this sector and also aims to push out RADEON 9000/PRO. All GeForce4 MX cards will either go down to $50 or leave the market altogether. When RADEON 9200 arrives (the same RADEON 9000 with AGP 8x), it will be fighting against the cheapest NV34 models.
The NV31 line is positioned against RADEON 9500/9500 PRO and the upcoming RADEON 9600/PRO. NVIDIA GeForce4 Ti 4600 (4800) and 4200 (4200-8x) will be replaced with GeForce FX 5600 Ultra and 5600.
So, in spring 2003 NVIDIA is rolling out a new batch of accelerators for the Low and Middle market sectors.
The prices given in parentheses are approximate and may fall.
Now let's have a look at the architecture of GeForce FX 5600 and 5200:
The cards have
Both cards are identical as they are assembled on the same PCB.
Their PCB is shorter than that of GeForce FX 5800 and GeForce4 Ti 4600. Nevertheless, an external power supply is required. NV31 and NV34 have 4-pin connectors for the power supply unit, like NV30. If you don't connect it, the driver will display the following (while the OS boots):
But while NV30 performs much worse without the external power supply, NV31/34 are unaffected.
Unlike the FX 5800 Ultra, the new cards come with plain coolers like the one on GeForce4 Ti 4600:
The cooler consists of a closed heatsink with a fan pumping air through it. The heatsink has an almost mirror-like cover.
The processors hidden under the heatsinks traditionally have a plastic cover with a metallic plate in the center. The packages are of the same size.
The cards have an external TV coder - Philips 7114.
In the case of NV31, clock speeds are not separated for 2D and 3D in the drivers, so only a single frequency can be adjusted; the card reached 405/810 MHz. In NV34 the frequencies are separated (as in NV30), even though the card belongs to the low-end sector. The frequencies are identical - 325/650 MHz, but the card doesn't go higher than 355/720 MHz because of bugs (though it doesn't hang). Therefore, in some tests the measurements were taken at 350/700 MHz to compare NV31 and NV34 at equal clock speeds.
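To put the overclocking figures into proportion, the headroom can be expressed as a percentage over the stock clocks. The short sketch below does this arithmetic. Note that the NV31 stock clocks of 350/700 MHz are an assumption inferred from the comparison point mentioned above, and the function and dictionary names are purely illustrative.

```python
def headroom_percent(stock_mhz: float, reached_mhz: float) -> float:
    """Return the overclocking gain over the stock clock, in percent."""
    return (reached_mhz / stock_mhz - 1.0) * 100.0


# (core MHz, memory MHz) pairs. The NV31 stock clocks are an assumption
# inferred from the 350/700 MHz comparison point; the other values are
# the ones quoted in the text.
CARDS = {
    "NV31": {"stock": (350, 700), "reached": (405, 810)},
    "NV34": {"stock": (325, 650), "reached": (355, 720)},
}

if __name__ == "__main__":
    for name, clocks in CARDS.items():
        core = headroom_percent(clocks["stock"][0], clocks["reached"][0])
        mem = headroom_percent(clocks["stock"][1], clocks["reached"][1])
        print(f"{name}: core +{core:.1f}%, memory +{mem:.1f}%")
```

Under these assumptions the NV31 sample gains roughly 16% on both core and memory, while the NV34 sample gains roughly 9-11%.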
Testbed and drivers
We used NVIDIA's drivers v42.72 with VSync off and texture compression off in applications. DirectX 9.0 was installed.
Video cards used for comparison:
The driver settings are given in GeForce FX 5800 Ultra Review.
Note that you can access some disabled tabs using this patch for Windows XP registry.
Quality is superb. Both cards work excellently at 1600x1200@85Hz.
Remember that 2D quality depends heavily on the particular sample, monitor and cable; moreover, certain monitors might not work properly with particular video cards. 2D tests were carried out with a ViewSonic P817-E monitor and a Bargo BNC cable.
RightMark 3D synthetic tests (DirectX 9)
Today we will describe and run the suite of synthetic tests we are currently developing for the DX9 API.
The RightMark3D test suite currently includes the following synthetic tests:
The philosophy of these synthetic tests and their description are given in NV30 Review.
Those who are eager to try the RightMark 3D synthetic tests can download the "command-line" test versions, which write the results to an XLS file in the XML format accepted by Microsoft Office XP:
Every archive contains a description of the test parameters and an example .bat file for benchmarking accelerators. We welcome your comments and ideas as well as information on errors or improper behavior of the tests.
Below are the data obtained with budget and mainstream accelerators based on two major families (ATI and NVIDIA).
Number of pixel pipelines and their configuration
First of all, let's find out the actual number of pipelines and texture units of the new GeForce FX models.
The 2x2 configuration - two pixel pipelines with two texture units each - is obvious. The figures are very close to the theoretical values for this formula. Now we add pixel shaders 1.1 and 2.0:
The results are surprisingly the same! Either all operations are executed by a single ALU (whether it is floating-point or fixed-point is yet to be found out), or the drivers optimize aggressively and reduce the shaders to a minimal functional version. That is, tasks which fit within the blend stages (register combiners) supported as far back as DX7 cards are optimized this way. Later we will look at the performance of complex 2.0 shaders to verify this.
In programs without shader support this chip works as 4x1 with one texture and as 2x2 (!) with three textures. Consequently, the 2x2 formula is also used for 2 and 4 textures. The same formula is adopted for shaders 1.1. With shaders 2.0 the scheme is 2x2, as in NV30, but operation is slower because textures can't be sampled simultaneously with the shader's computational operations. NV31 can be called a 2x2 chip with a 4x1 optimization for the particular case when applications do not use pixel shaders. It seems that the chip's pixel unit consists of an array of flexibly configurable ALUs (stages) which can form different numbers of pipelines depending on the situation (stage or shader settings).
RADEON 9500 PRO takes the lead with its 8 pixel pipelines (fill rate without textures depends on the number of pixel pipelines, the core clock speed, memory throughput and how effectively it is used). It is followed by NV31 (GeForce FX 5600 Ultra), which has 4 pipelines in this test, but the gap is less than twofold: RADEON 9500 PRO is limited by memory, in contrast to the 5600 Ultra, which shows that the latter is better balanced. Will it remain so well balanced with more intensive shaders? We'll see later.
The 5600 Ultra is, as we can see, a DX9 replacement for the 4600, but it sells at a much lower price. With MSAA both chips slow down equally strongly.
NV34 performs much worse because it has only two pixel pipelines. This chip will be a perfect DX9 replacement for GeForce4 MX 440. But it will also be interesting to see whether the enhanced architecture can help the new chip make a giant leap forward in real applications.
The picture is essentially the same, though the peak values are a little lower. But the scores of RADEON 9500 PRO do not match its capabilities, probably because of this particular combination of test and drivers. We are currently discussing it with experts at ATI. Note that neither RADEON 9700 nor 9700 PRO (see RADEON 9700 PRO DX9 Part 2) had such bugs!
Let's see whether reality matches the theoretical limits based on the core frequency and the number of pipelines:
Every card except RADEON 9500 PRO realized its potential. But, as we mentioned above, the RADEON 9500 family behaves strangely in single-texture mode.
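The theoretical limits referred to here follow from simple arithmetic: peak pixel rate is the number of pipelines times the core clock, and peak texel rate additionally multiplies by the texture units per pipeline. The sketch below illustrates this for the configurations described in this review. It assumes a 350 MHz core for NV31 and 325 MHz for NV34 (the clocks quoted earlier); the class and constant names are illustrative only, and real results also depend on memory and drivers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelConfig:
    """A 'pipelines x texture units' configuration such as 4x1 or 2x2."""
    pipelines: int
    tmus_per_pipeline: int

    def peak_pixels(self, core_mhz: float) -> float:
        """Theoretical peak pixel rate in Mpixels/s."""
        return self.pipelines * core_mhz

    def peak_texels(self, core_mhz: float) -> float:
        """Theoretical peak texel rate in Mtexels/s."""
        return self.pipelines * self.tmus_per_pipeline * core_mhz


# Configurations as described in the text (names are illustrative).
NV31_SINGLE_TEXTURE = PixelConfig(4, 1)  # 4x1, no shaders, one texture
NV31_MULTITEXTURE = PixelConfig(2, 2)    # 2x2 in all other cases
NV34 = PixelConfig(2, 2)                 # 2x2

if __name__ == "__main__":
    # Core clocks are assumptions taken from the figures quoted earlier.
    for label, cfg, mhz in [
        ("NV31 4x1", NV31_SINGLE_TEXTURE, 350),
        ("NV31 2x2", NV31_MULTITEXTURE, 350),
        ("NV34 2x2", NV34, 325),
    ]:
        print(f"{label}: {cfg.peak_pixels(mhz):.0f} Mpix/s, "
              f"{cfg.peak_texels(mhz):.0f} Mtex/s")
```

In this simplified view, switching NV31 from 2x2 to 4x1 doubles the single-texture pixel rate while leaving the peak texel rate unchanged.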
The 4x2 configuration of GeForce4 Ti 4200 was supposed to give it an advantage in multitexturing modes (2 to 4 textures). Now let's see whether this advantage shows up in real applications.
For comparison we took the results of NV34 overclocked to NV31's memory and core clock speeds, and those of NV30 running at NV31's frequency. With a single texture, when the blend stages may not be used, these chips are on the same level, but in other cases NV30 is twice as efficient as NV31.
With tougher anisotropy settings NV25 (4200 and 4600) loses performance; this was discussed in depth in our previous reviews. But NV31 and NV34 slow down only a bit faster than ATI's cards. RADEON 9500 again demonstrates strange performance.
Geometry Processing Speed
The scores are sorted according to the complexity of the lighting model used. The lowest group is the simplest variant, which corresponds to the accelerator's peak vertex throughput. The geometry performance of NV31 and NV34 is identical in most applications; they differ a little only in the simplest variant, because of vertex cache sizes rather than geometry processing speed. In other tasks NV31 and NV34 work 2.5 times slower than NV30 in spite of the same clock speed; evidently they have 2.5 times fewer vertex ALUs. The test implies that both chips support hardware geometry processing. Here NV31 and NV34 lag behind RADEON 9500 PRO by a factor of 1.5 on average. But remember that RADEON 9600 PRO will have half the geometry performance!
The layout of NV30, NV31 and NV34 hasn't changed - the chips use geometry units which are similar in organization. RADEON 9500 shines again - its geometry power is not limited and matches RADEON 9700, which makes it look far superior to NV31 and NV34.
The results are similar to those for vertex shaders 1.1, with the gap to ATI being a little bigger.
So, the hardware T&L emulation implemented by ATI is less efficient than NVIDIA's, and its speed is comparable to vertex shaders 2.0. The strongest point of NV3x is TCL emulation; the weakest is loops. In this respect ATI's drivers optimize more flexibly - static execution of branches and loops allows for more aggressive optimization.
NV31 and NV34 have identical performance in vertex tasks but they are
2.5 times slower than NV30 despite the same frequency.
As shaders get more complicated and detail levels higher, NV30 improves its position (due to vertex caches and other balancing aspects). In contrast, NV31 and NV34 are optimized not for scenes of moderate detail levels.
Hidden Surface Removal
NV31 and NV34 support HSR, and in NV34 it is a bit more efficient. Besides, GeForce4 MX is also equipped with it! RADEON 9000 PRO, 9500 and GeForce4 Ti have it disabled. In RADEON 9500 PRO, as in RADEON 9700 PRO, HSR works at full capacity.
R300 uses a hierarchical structure, so surfaces are often removed at a higher level, while NV30 has only one decision-making level combined with tiles used for depth information compression. At 1600x1200 R300's HSR becomes much less effective - probably the hierarchical depth buffer is no longer used (e.g. to save memory), and decisions are made, as in NV30, only at the lowest level combined with the compressed blocks in the depth buffer.
Both NV31/NV34 and R300 boost their performance. NVIDIA's NV31 and NV34 almost reach the record set by RADEON 9500 PRO, and at the maximum resolution they even outscore it!
This test is run only on R300 and NV3x, as hardware execution of pixel shaders 2.0 is the minimum requirement here. For reference, software emulation of pixel shaders 2.0 yields only one frame per 2 seconds in a small window on the good old GeForce4 Ti 4600 coupled with a 2 GHz Pentium 4.
Running at the same frequency, NV31 and NV34 perform on a par in pixel shaders 2.0, but at half the speed of NV30. R300 sometimes leads thanks to its 4 (8) pixel pipelines against 2 for NV31 and NV34.
All is as expected - the curves coincide, and performance depends only on the number of shaded pixels.
Performance doesn't change whether we force 16-bit or 32-bit precision in DirectX; most likely, the drivers choose the precision themselves, ignoring the respective instruction modifiers in the shaders.
As expected, lighting affects only small sprites; as they grow, performance becomes limited by the fillrate (at a size of approximately 8). So, for rendering systems consisting of a large number of particles, the sprite size should be less than 8.
NV31 and NV34 trail NV30 clocked at the same frequency by the same margin we saw above. With bigger sprites the difference shrinks because of memory throughput (we deliberately slowed down NV30). It means that memory throughput is becoming more and more influential.
The peak is reached without lighting: 8 M sprites per second for NV31/34, and about 17 M for NV30 slowed down to the speed of GeForce FX 5600 Ultra and for RADEON 9500 (PRO).
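The transition from a geometry-limited to a fillrate-limited regime can be illustrated with a very simple two-limit model: the sprite rate is the lower of a per-sprite processing limit and the fill limit for a sprite of a given size. This is only a sketch under stated assumptions, not a description of how the hardware works: it assumes square sprites of size x size pixels, takes the 8 M sprites/s peak and the crossover at size 8 from the text, and derives an effective fill rate that is a model output rather than a measured value. All function names are hypothetical.

```python
import math


def sprite_rate(size_px: float, setup_rate: float, fill_rate: float) -> float:
    """Sprites per second under a two-limit model.

    setup_rate: per-sprite processing limit (sprites/s).
    fill_rate:  effective pixel fill limit (pixels/s).
    Assumes square sprites covering size_px * size_px pixels.
    """
    return min(setup_rate, fill_rate / (size_px * size_px))


def crossover_size(setup_rate: float, fill_rate: float) -> float:
    """Sprite size at which the fill limit starts to dominate."""
    return math.sqrt(fill_rate / setup_rate)


def implied_fill_rate(setup_rate: float, crossover_px: float) -> float:
    """Effective fill rate implied by an observed crossover size."""
    return setup_rate * crossover_px ** 2


if __name__ == "__main__":
    peak = 8e6  # sprites/s for NV31/34 without lighting, from the text
    fill = implied_fill_rate(peak, 8)  # model output, not a measurement
    for size in (2, 4, 8, 16, 32):
        rate = sprite_rate(size, peak, fill)
        print(f"size {size:>2}: {rate / 1e6:.2f} M sprites/s")
```

In this model the rate stays flat up to the crossover size and then falls with the square of the sprite size, which matches the recommendation to keep particles smaller than 8.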
But point sprites are no cure-all: the figures are not much better than with ordinary polygons. However, point sprites are more convenient to program, above all for all kinds of particle systems.
3D graphics, 3DMark2001 SE synthetic tests
All measurements in 3D are taken at 32bit color depth.
The dependence is very similar to Pixel Filling in RightMark 3D, but here it is less pronounced. Besides, the drivers couldn't make NV31 work in 4x1 mode, and the chip revealed only half of its potential.
The results also resemble those obtained in the RightMark 3D tests. NV31 does work in 2x2, which is preferable for multitexturing.
NV31 works much faster than NV34. This test actively samples textures but makes few calculations; besides, all calculations are made in the integer format (shaders 1.1). Probably the large texture caches of NV31 help it outpace NV34.
Twice as many pipelines (four times as many for the PRO version) let RADEON 9500 take the top positions. On the other hand, despite having half as many pipelines, NV31 keeps up with RADEON 9500 thanks to its high core frequency and more efficient pipelines.
Let's see if the picture changes with more intensive calculations of the pixel shaders:
Now R300 leads in all versions (8 and 4 pipelines), proving that the NV3x family is weaker in calculations and stronger with textures.
The VS test brings results similar to what we got before. Be that as it may, we are inclined to trust the RightMark 3D synthetic tests, which do not show such a strong dependence on resolution.
Summary on synthetic tests
Let's sum up the detailed examination of various units of NV31 and NV34 in the synthetic tests.
NV31 is able to work in 4x1 and 2x2 configurations depending on textures and applications. NVIDIA thus removed NV30's drawback that affected performance in old and modern applications which actively use single-texture mode.
Both NV34 and NV31 support hardware geometry processing. Interestingly, their geometry performance is 2.5 times (not 2 or 3) lower than that of NV30 running at the same clock speed. Probably NVIDIA incorporated a wide VLIW geometry processor similar to the 3dlabs P10, with an ALU array able to process several vertices simultaneously. Such a design is optimal in terms of efficiency (ALU utilization) and allows geometry performance to be scaled even by a non-integer factor.
While NV34's pixel architecture resembles NV30, NV31 is more similar to NV35 - a more flexibly configurable chip. However, NV31 shows no computational advantage over NV34 in pixel shaders 2.0 - it demonstrates half of NV30's computational performance - so its pixel part probably differs from NV30's. It probably uses universal ALUs for texture sampling, stages and pixel shaders of both versions; this saves transistors and lets us hope for performance gains with newer drivers.
NV34's pixel architecture requires further study - in the fillrate test it demonstrates equal results for all shader types, in contrast to NV30 or NV31. At the same time, the character of the results is similar to NV31 in the case of shaders 2.0. Probably NV34 uses certain aggressive optimizations at the driver level.
So, in real applications the performance difference between NV31 and NV34 will depend on queue and cache sizes. As for the basic units (geometry, pixel), these chips perform equally at the same clock speed.
3D graphics, 3DMark2001 game tests
Anisotropy was set to 16x for ATI's cards and to 8x for NVIDIA's because the algorithms behind this function differ considerably (we discussed it in the NV30 Review). There is just one criterion: maximum quality. The screenshots have been shown several times already. Besides, it's interesting to compare NVIDIA's different anisotropy modes (Application, Balanced, Aggressive) with ATI's high-quality mode; readers can judge how speed and quality correlate by looking at the screenshots in the NV30 Review that demonstrate anisotropic quality.
The tables give us all necessary data. Just remember that we compared
GeForce FX 5200 Ultra and RADEON 9500 64MB though they are positioned for
different market sectors. However, the starting price of the most expensive
FX 5200 Ultra card might go below $149, i.e. close to RADEON 9500 64MB.
3DMark2001, Game1 Low details
3DMark2001, Game2 Low details
3DMark2001, Game3 Low details
In 3DMark2001 NV31 easily competes with its rivals in the tough modes with AA and/or anisotropy, and NV34 performs well against RADEON 9000 PRO and GeForce4 MX 440-8x, the former low-end market leader. Sometimes it loses to RADEON 9500 64MB, but production of RADEON 9500 will cease at the end of March, and prices for this card are not going to fall, in contrast to NV34.
3D graphics, 3DMark03 game tests
Wings of Fury Test characteristics:
Battle of Proxycon:
3DMark03 revealed that NVIDIA's newcomers lose in tough scenes that use all the latest techniques. Games exactly like these tests will never appear, however. Still, we can see that the low shader speed of NV3x strongly affects overall performance in tests based on DX9 technologies. But by the time such (DX9) games come to market, we'll have a new generation of accelerators even in the Low/Middle sectors.
3D graphics, game tests
3D games used to estimate 3D performance:
Quake3 Arena, Quaver
Serious Sam: The Second Encounter, Grand Cathedral
Return to Castle Wolfenstein (Multiplayer), Checkpoint
Strangely enough, the picture is similar even though the game is built on the Quake3 engine. It's all because of the drivers... If you remember, it was in exactly this test that NV30's performance jumped from 70 to 120 fps at 1600x1200 between 42.* driver versions.
The slow shaders are pushing to the bottom again...
Unreal Tournament 2003 DEMO
DOOM III Alpha
3D graphics quality
The operation of anisotropic filtering in the new family was carefully studied in the NV30 Review. That is why today we give just a few examples.
NV3x actually has three types of anisotropic filtering: one is controlled by the application, another balances performance and quality, and the third favors performance over quality (the aggressive type). The balanced one is used by default. We measured the speed of both this type and the aggressive one. In some driver versions the Aggressive mode is called Performance mode.
Have a look at the difference between these types.
Anisotester (from Unwinder)
Serious Sam: TSE
The anisotropic types differ in both speed and quality. They manipulate MIP-mapping and trilinear filtering. This is noticeable in XMAS, a program that tests anisotropy in OpenGL.
In Performance/Aggressive mode there are almost no traces of trilinear filtering. However, on RADEON 9000/8500/9200/9100 trilinear and anisotropic filtering can't coexist at all. Moreover, these cards have inferior anisotropy quality. That is why I think the Balanced mode is a good compromise between speed and quality (as you know, ATI's fans have to put up with the fact that anisotropy doesn't work at full capacity at all angles on RADEON 9500/9700).
AA performance was carefully studied in the GeForce FX 5800 Review; here I must only say that we couldn't take screenshots in several AA modes because AA is applied by a post-filter after all data are transferred from the frame buffer to the RAMDAC, and any screenshot is a copy of the frame buffer.
First comes the speed aspect. Well, the cards sometimes lose and sometimes win. Obviously, users should know about 3D techniques such as anisotropy and AA even if they want to buy a mid-level accelerator, because it is only in these high-quality modes that NVIDIA's newcomers yield their best results.
It's quite possible that the company will release a more powerful NV31-based card with a higher core clock speed and DDR memory support together with NV35. The card will compete against the new mid-range RADEONs which haven't reached the stores yet: