We have recently seen NVIDIA's press releases on optimizing GeForce3 operation for the most efficient processors, the Intel Pentium 4 and AMD Athlon. Now we are going to examine how effective such optimization really is.

Why performance still depends on the processor

At first sight it seems that the more powerful the T&L unit of a modern accelerator is, the less it depends on the processor. But that is wrong. Plenty of work still exhausts the processor despite hardware T&L. As a rule, those calculations depend on the particular application and are not directly connected with the drivers. What, then, can be optimized at the driver level? The answer is as follows:
Experience shows that even the drivers of the latest accelerators such as the GeForce3 can make themselves felt. The huge numbers of polygons and textures that modern accelerators process successfully must be not only generated but also transferred to the accelerator, and geometrical data have to be transferred with every frame. Such a load can fully occupy an average processor trying to feed the insatiable GeForce3 with data: say, 4-10 million polygons per second (a realistic figure for the GeForce2 Ultra or the GeForce3). Now let's estimate the volume of geometrical data: 10 million triangles at 1-2 vertices per triangle on average (vertex sharing in strips and indexed meshes keeps it below three). Each vertex is not only XYZ coordinates in floating-point format, but also texture coordinates (two for each texture) and possibly a normal vector or two arbitrary attribute vectors (colors). The result is around 0.5 GBytes/s, and that figure doesn't even account for textures! The limiting factor is obvious: in such a situation everything depends on the processor, the chipset and AGP, i.e. on the platform. Many modern games (including Quake3, which hardly uses T&L, and Unreal/Unreal Tournament) transfer unnecessary data over the bus. It is rational to store all graphics data in the accelerator's local memory and process it right there; this trend is starting to be applied in DX8 (pure hardware vertex processing). But the data still have to be updated. A worked version of this bandwidth estimate is given in the first sketch below.

By the way, the API is a severe limiting factor in this situation: plain OpenGL is not capable of transferring even a tenth of the aforementioned number of triangles. And here, at the OpenGL ICD level, driver optimization for a particular platform plays a considerable role. The GLmark test has raised many questions: its code, written for the new NVIDIA extensions, gives a great gain on this company's accelerators. Such a program cannot serve as a universal benchmark, but GLmark itself is an example of the OpenGL programs of the future; the second sketch below shows the extension path it exercises.
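To make the arithmetic above concrete, here is a minimal sketch that reproduces the estimate. The exact vertex layout (position, two sets of 2D texture coordinates, a normal) and the 1.5 vertices-per-triangle average are assumptions chosen to match the description in the text, not figures from NVIDIA:

```c
/* Back-of-the-envelope estimate of the per-frame geometry stream.
 * Assumed vertex layout, following the text: 3 floats position,
 * 2 x 2 floats texture coordinates, 3 floats normal = 40 bytes. */
#include <stdio.h>

int main(void)
{
    const double triangles_per_sec = 10e6; /* upper bound from the text   */
    const double verts_per_tri     = 1.5;  /* strips/indexing share verts */
    const double bytes_per_vertex  = (3 + 2 * 2 + 3) * sizeof(float);

    double gbytes_per_sec =
        triangles_per_sec * verts_per_tri * bytes_per_vertex / 1e9;
    printf("Geometry stream: %.2f GBytes/s\n", gbytes_per_sec);
    /* Prints 0.60 GBytes/s -- the "around 0.5 GBytes/s" order of
     * magnitude, before textures, indices or state changes. */
    return 0;
}
```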
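The extension path that GLmark-style code relies on lets an application allocate AGP or video memory through the driver and source vertex arrays from it directly, so the card pulls vertices itself instead of the CPU copying them every frame. Below is a minimal sketch using the real NV_vertex_array_range extension; the memory-allocation hints and the omitted error handling are simplifications, not a definitive implementation:

```c
/* Sketch: vertex arrays served from driver-allocated AGP/video memory
 * via NV_vertex_array_range (Windows/wgl entry points). */
#include <windows.h>
#include <GL/gl.h>
#include <GL/glext.h>
#include <GL/wglext.h>

static PFNGLVERTEXARRAYRANGENVPROC glVertexArrayRangeNV;
static PFNWGLALLOCATEMEMORYNVPROC  wglAllocateMemoryNV;

void setup_fast_geometry_path(GLsizei buf_size)
{
    glVertexArrayRangeNV = (PFNGLVERTEXARRAYRANGENVPROC)
        wglGetProcAddress("glVertexArrayRangeNV");
    wglAllocateMemoryNV  = (PFNWGLALLOCATEMEMORYNVPROC)
        wglGetProcAddress("wglAllocateMemoryNV");

    /* readFreq 0, writeFreq 0, priority 1.0 asks the driver for memory
     * the GPU can read at full speed (video memory if available). */
    void *vram = wglAllocateMemoryNV(buf_size, 0.0f, 0.0f, 1.0f);

    glVertexArrayRangeNV(buf_size, vram);          /* declare the range   */
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV); /* arrays read from it */
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, vram);         /* no per-frame copy   */
}
```

After this setup, glDrawArrays/glDrawElements pull vertex data straight from the allocated range; only modified vertices need to be written back, which is exactly the "store it in local memory and only update it" approach described above.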
Tests

The tests were carried out with version 10.80 of the drivers (Developer's Release for the GeForce3). The operating system and other platform parameters correspond to our 3Digest (Windows 98 SE + DX8). There is no significant performance increase compared with the earlier 10.50 (and 10.40 for Windows 2000), but compatibility with applications and stability have improved considerably. It is still unclear why there is no volume texture support: although it is included in the driver, it remains locked.

Testbeds

We also used Quake3 and 3D Mark. Detailed information on the settings can be found in the 3Digest.

3D Mark 2000
[Charts: NVIDIA GeForce3; NVIDIA GeForce2 Ultra]

So, there is a clear dependence on the platform. The Pentium 4 gains the most, with the Athlon second. Interestingly, the Game1 test favors the GeForce3 much more than the GeForce2 Ultra, while the Game2 test is not the best argument for buying this card. In any case, with 3D Mark 2001 about to be released, do not pay too much attention to the overall scores: the game tests are too strongly tied to DX7, and the synthetic tests are more telling. Let's look at the results of the T&L test:

[Charts: 1 light source; 4 light sources; 8 light sources]

With one light source the GeForce3 scales well, raising its result on the more powerful processors and outshining the GeForce2 Ultra, whose T&L works at a higher clock frequency. Improved caching and enhanced triangle culling help the GeForce3 unlock the full potential of the more powerful processors by interacting with them successfully. As the number of light sources grows, the GeForce3's advantage over the GeForce2 Ultra shrinks (taking the difference in clock frequency into account). The GeForce3 has a hidden reserve of T&L performance that only the latest processors can tap. Let's look at the textures:

[Charts: 16 MBytes textures; 32 MBytes textures; 64 MBytes textures]

The GeForce3 is the unquestionable leader:
The Pentium 4 is the leader, with the Athlon just slightly behind. With a small volume of textures the Athlon paired with the GeForce3 falls behind even the Pentium III, while with a huge volume it takes the lead even on the GeForce2 Ultra.

Quake3 Arena
The results depend on the CPU, and owners of the GeForce3 should keep this in mind. The Pentium 4 with its wide bus goes ahead, followed by the Athlon and then the Pentium III. As the resolution increases the gap narrows, but at the most popular gaming resolution (1024x768) the difference is still great. Be that as it may, we should wait for the new generation of games and tests (for example, 3D Mark 2001); then we can determine whether performance will depend on the processor even more strongly or not. Both outcomes are possible: the more complicated geometry of future games may be balanced by the use of shaders, by storing geometry in the accelerator's local memory, and by API calls optimized for higher bandwidth. We are expecting the 11th Detonator and a new 3D Mark some day soon.