NVIDIA GeForce4 Ti 4400 and GeForce4 Ti 4600 (NV25) Review

By Andrew Worobyew
and Alexander Medvedev

ANISOTROPIC FILTERING

MIP mapping improves image quality in scenes where objects extend from the foreground deep into the background. For each texture it creates a set of MIP levels (copies of the texture at progressively lower detail), and one of them is chosen according to the size of the object and the resulting scale. The farther away a triangle is, the blurrier the MIP level used for it. Trilinear filtering smooths the sharp boundaries between MIP levels. Thus, while bilinear filtering removes sharp edges between texture pixels, trilinear filtering softens the image further, so that only nearby objects look sharp. At the same time, walls viewed at a very sharp angle come out too blurry. Anisotropic filtering is meant to cope with such surfaces, which are awkward for bilinear and trilinear filtering.
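The choice of MIP level described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the scale is expressed as texels per screen pixel, and real hardware derives this value from texture-coordinate gradients in ways that differ between chips.

```python
import math

def mip_level(texels_per_pixel, num_levels):
    """Fractional MIP level for a given texture-to-screen scale (assumed model)."""
    # Level 0 is the full-size texture; every next level is half the size,
    # so the level grows with log2 of the minification factor.
    lod = math.log2(max(texels_per_pixel, 1.0))
    return min(lod, float(num_levels - 1))

def trilinear_blend(lod, num_levels):
    """Return (lower level, upper level, weight of the upper level)."""
    lower = int(math.floor(lod))
    upper = min(lower + 1, num_levels - 1)
    return lower, upper, lod - lower

# A surface shrunk 3x on screen falls between levels 1 and 2.
lod = mip_level(3.0, num_levels=10)
print(trilinear_blend(lod, num_levels=10))
```

Bilinear filtering would read only one level; trilinear filtering mixes the two neighbouring levels with the returned weight, which is what hides the boundary between them.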

Different chip makers implement this function differently. The speed of anisotropic filtering also differs considerably between ATI and NVIDIA. Only the resulting quality is similar.

But is that true? As you know, NVIDIA's anisotropy (on the GeForce3) delivers high quality, but it is expensive: the performance drop can reach 50%! ATI's anisotropy (on the RADEON 8500) is much cheaper and apparently provides the same quality.

The quality of anisotropy can be judged on walls, floors and similar surfaces. Our attentive readers know that the RADEON 8500 applies no anisotropy to some surfaces located at angles different from 90 degrees. Look at the screenshots from Serious Sam:

ATI RADEON 8500

NVIDIA GeForce4

Here are animated GIF files:

ATI RADEON 8500	NVIDIA GeForce4

At some angles the RADEON 8500 provides no sharpness. The NVIDIA GeForce3 and GeForce4 have no such problem. It isn't good when a user can't choose between full anisotropy with its heavy performance losses and a cheaper approximation of it.

The GeForce4 uses, in fact, the same anisotropy method as the GeForce3: three levels, each with its own maximum number of texture samples used for anisotropic filtering (Level2: 8, Level4: 16, Level8: 32 samples).

NVIDIA's and ATI's approaches to implementing anisotropic filtering

While bilinear and trilinear filtering are strictly defined mathematically (though some time ago NVIDIA gave the name of trilinear filtering to an approximation, dithering of values from different MIP levels), the concept of anisotropic filtering doesn't imply any definite algorithm. NVIDIA and ATI approach this issue differently. Let me show you some figures:

NVIDIA: the figure shows how bilinear samples are fetched in texture space during anisotropic filtering. Depending on the filtering quality setting and the inclination of the surface, standard bilinear (or trilinear) filtering is performed from one to four times, for points lying on a straight line that divides the pixel, projected from the screen onto the texture surface, along its long side (the line is shown with an arrow in the figure). The values obtained this way (blue circles) are averaged to give the filtering result. Each value is based on the four closest discrete texture values (rectangles) and can have its own independent coordinates. This approach suits arbitrarily oriented textures, but it demands a lot of performance: for the visible part of triangles that are not parallel to the screen, the number of fetched texture samples grows several times, and so does the shading time.
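A minimal Python sketch of the scheme just described: several bilinear probes are placed along the long axis of the pixel's footprint in texture space and averaged. The function names, the representation of the footprint as a vector (axis_u, axis_v) in texels, the rule for choosing the number of probes and the even spacing are all assumptions made for illustration, not a description of the actual hardware logic.

```python
import math

def bilinear(tex, u, v):
    """Bilinear sample of a 2D list `tex` at texel coordinates (u, v)."""
    h, w = len(tex), len(tex[0])
    u = min(max(u, 0.0), w - 1.0)   # clamp to the texture borders
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = u - x0, v - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def probe_count(axis_u, axis_v, max_probes=4):
    """Assumed rule: one probe per texel of footprint length, from 1 to max_probes."""
    length = math.hypot(axis_u, axis_v)
    return max(1, min(max_probes, math.ceil(length)))

def aniso_sample(tex, u, v, axis_u, axis_v, max_probes=4):
    """Average bilinear probes spread evenly along the footprint's long axis."""
    probes = probe_count(axis_u, axis_v, max_probes)
    total = 0.0
    for i in range(probes):
        t = (i + 0.5) / probes - 0.5          # offsets within (-0.5, 0.5)
        total += bilinear(tex, u + t * axis_u, v + t * axis_v)
    return total / probes

# Horizontal ramp texture: the value of a texel equals its x coordinate.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
print(aniso_sample(ramp, 3.0, 2.0, axis_u=4.0, axis_v=0.0))
```

Each probe reads four texels, so in this model a pixel filtered with four probes costs sixteen texel reads instead of four, which is where the performance drop comes from.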

ATI's approach is more limited but also more efficient:

The values are fetched along a line that can lie either horizontally or vertically in the texture's plane. When the projection vector (the arrow in the figure) is close to one of the texture's coordinate axes, the filtering quality is high, but as the vector turns away the effect decreases until the method stops making sense at all. In real applications the filtering works well on walls or ceilings, but on surfaces located at angles other than a right angle the result becomes less and less noticeable until the critical angle of 45 degrees is reached. However, this approach is beneficial from the computational point of view. First of all, we can select regular strips of 2xN texture points (squares in the figure), which can be fetched efficiently in N/2 cycles by the standard texture units meant for bilinear filtering. Then we filter the values (circles in the figure), each time using the same offsets relative to the discrete points of the original texture. This operation can be performed in one clock by a special circuit of ten multipliers integrated into the texture unit; fortunately, the interpolation parameters are calculated only once and stay the same for all 1..5 calculated points. Besides, this already efficient algorithm can be sped up further by precalculating texture variants specially compressed along the axes (so-called RIP mapping).
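Why the effect fades toward 45 degrees can be illustrated with a small Python sketch. It assumes a simplified model in which the footprint's long axis is a unit vector and the sampling line is its projection onto the nearer texture axis; the function names and the "missed part" measure are hypothetical and serve only as a rough indicator, not as ATI's actual quality metric.

```python
import math

def snap_to_texture_axis(axis_u, axis_v):
    """Replace the footprint's long axis by its projection on the nearer texture axis."""
    if abs(axis_u) >= abs(axis_v):
        return (axis_u, 0.0)   # horizontal sampling line
    return (0.0, axis_v)       # vertical sampling line

def missed_part(angle_deg):
    """Length of the component of a unit footprint axis that the snapped line leaves out."""
    a = math.radians(angle_deg)
    au, av = math.cos(a), math.sin(a)
    su, sv = snap_to_texture_axis(au, av)
    return math.hypot(au - su, av - sv)

for angle in (0, 15, 30, 45):
    print(angle, round(missed_part(angle), 3))
```

In this model nothing is missed when the axis coincides with a texture axis, and the missed component grows steadily up to about 0.71 of the axis length at 45 degrees, after which the other texture axis becomes the nearer one.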

NVIDIA's approach needs more time to get the result, but it handles surfaces at any angle equally well, not only those positioned strictly horizontally or vertically. ATI's method has a rational core, since most modern games use mostly horizontal and vertical surfaces.

Quake3

Return to Castle Wolfenstein

3DMark2001, Game1 Low details

3DMark2001, Game2 Low details

3DMark2001, Game3 Low details

3DMark2001, Game4

As you can see, the GeForce4's performance falls by a greater margin than the GeForce3's. Level8 kills all the speed advantages of the GeForce4. But is it possible to get, at least at Level4, the same quality as on the RADEON 8500? Yes! There will be losses relative to Level8, but high quality can be achieved by lowering the LOD BIAS value. Until recently this parameter could be changed only in Direct3D with the help of tweakers such as RivaTuner. The 27.* drivers allow changing it in OpenGL as well, but only in the Registry. Let's see what we can get by setting LOD BIAS to -1 in Serious Sam: The Second Encounter.

Anisotropic filtering Level 8

Anisotropic filtering Level 4, LOD BIAS = 0

Anisotropic filtering Level 4, LOD BIAS = -1

Anisotropic filtering Level 2, LOD BIAS = -1

Well, the effect is achieved. There are some side effects, namely moire and texture noise, but the RADEON 8500 shows the same when anisotropy is enabled (at its maximum level). The performance drop here is not so great. At Level2, lowering the LOD BIAS doesn't help any more, though the quality of Level4 at LOD BIAS = 0 can be achieved.
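Why a negative LOD BIAS sharpens the picture and at the same time brings moire can be seen from a small numeric sketch in Python. It assumes the usual model in which the bias is simply added to the computed MIP level; the function names and the texels-per-pixel parameter are hypothetical, and nothing here claims to reproduce the drivers' exact behaviour.

```python
import math

def selected_level(texels_per_pixel, lod_bias=0.0, num_levels=10):
    """MIP level after adding the bias, clamped to the available levels."""
    lod = math.log2(max(texels_per_pixel, 1.0)) + lod_bias
    return min(max(lod, 0.0), float(num_levels - 1))

def sampling_density(texels_per_pixel, lod_bias=0.0, num_levels=10):
    """Texels of the selected level that one screen pixel spans along one axis."""
    level = selected_level(texels_per_pixel, lod_bias, num_levels)
    return texels_per_pixel / 2.0 ** level

# A surface minified 8x: compare no bias with LOD BIAS = -1.
for bias in (0.0, -1.0):
    print(bias, selected_level(8.0, bias), sampling_density(8.0, bias))
```

With a bias of -1 a level twice as detailed is chosen, so each pixel spans two texels of that level instead of one. The extra detail makes the image sharper, but texels that the filter does not cover are skipped, which shows up as the moire and texture noise mentioned above.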

ANTI-ALIASING (AA)

This function is used to remove the stair-step effect. With AA enabled the performance drop is even more considerable.

The Quincunx level is fast, but it often makes textures look soapy. On the GeForce4 we can use the next AA level (4x), which gives excellent quality.

Let's look at the quality of the two most interesting AA types on the GeForce3 Ti 500 and GeForce4 and compare them.

Screenshots (GeForce3 Ti 500 on the left, GeForce4 on the right):

3DMark2001, Game 1: No AA, AA Quincunx, AA 4x, AA 4x vs. AA 4xS
3DMark2001, Game 2: No AA, AA Quincunx, AA 4x, AA 4x vs. AA 4xS
3DMark2001, Game 3: No AA, AA Quincunx, AA 4x, AA 4x vs. AA 4xS

There is not much difference between the GF3 and GF4 at the AA 4x and AA Quincunx levels. AA 4xS doesn't improve visual quality either.

New hybrid AA mode: 4xS.

The new hybrid mode of full-screen anti-aliasing (MS and SS simultaneously) is available on NV25-based cards. In every original 2x2 AA unit, two 2x1 subunits, positioned one above the other and each obtained in the way typical of 2x MSAA, are averaged (a usual 4x MSAA unit is shown on the right for comparison):

S1 is the first 2x1 subunit, and S2 is the second one. Within a subunit the samples are calculated by the multisampling method, i.e. from one selected texture value, but the texture values of the upper and lower subunits can differ, unlike in a usual 4x MSAA. From the accelerator's standpoint, we simply calculate a vertically doubled image in the standard 2x MSAA mode (2x1 units). This mode can also be enabled on the NV20, but only through undocumented driver parameters in the Registry. NV25-based cards allow setting it in the driver's control panel. It should be noted that the NV25 performs excellently: although the number of interpolated texture values is now twice as large, performance differs from 4x by just a couple of percent, and visual quality is much better. This method can't considerably improve the situation on polygon edges, where SSAA and MSAA look similar, but textures should now be less blurry. Moreover, for horizontal surfaces (landscapes, floors, ceilings) this method provides some of the effect of anisotropic filtering (2x quality). Later we will take a close look at how 4xS works on real images and at performance results.
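The difference between a usual 4x MSAA unit and the 4xS unit can be sketched in Python as follows. This is an illustrative model only: colours are single numbers, coverage is a list of 0/1 flags per sample, and the function names are hypothetical.

```python
def resolve_4x_msaa(coverage, tex_value, background):
    """4x MSAA: four coverage samples share a single texture value."""
    covered = sum(coverage)
    return (covered * tex_value + (4 - covered) * background) / 4.0

def resolve_4xs(cov_top, cov_bottom, tex_top, tex_bottom, background):
    """4xS: two 2x1 subunits (S1 above S2), each with its own texture value."""
    def subunit(cov, tex_value):
        # Inside a subunit this is plain 2x MSAA: two coverage samples, one texture value.
        covered = sum(cov)
        return (covered * tex_value + (2 - covered) * background) / 2.0
    # The two subunits are then averaged into the final pixel.
    return (subunit(cov_top, tex_top) + subunit(cov_bottom, tex_bottom)) / 2.0

# Inside a polygon: 4xS averages two different texture values per pixel.
print(resolve_4xs([1, 1], [1, 1], tex_top=0.25, tex_bottom=0.75, background=0.0))
# On an edge covering the upper half of the pixel, both modes give the same result.
print(resolve_4x_msaa([1, 1, 0, 0], 1.0, 0.0), resolve_4xs([1, 1], [0, 0], 1.0, 1.0, 0.0))
```

In this model the two modes agree on polygon edges, while inside a polygon 4xS mixes two vertically separated texture values, which is the source of the extra texture filtering on horizontal surfaces.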

And now let's estimate the performance drop.

Quake3

Return to Castle Wolfenstein

3DMark2001, Game1 Low details

3DMark2001, Game2 Low details

3DMark2001, Game3 Low details

It is interesting that performance in AA 2x and AA Quincunx is almost the same (thanks to the large memory bandwidth of the GeForce4 Ti 4600 and its optimization for the multisampling mode). The other modes have also become more attractive thanks to the higher performance of the GeForce4.

The 4xS mode (which works only in Direct3D) turned out to be rather strange: its speed and quality are approximately at the 4x level. This mode will probably be improved.

The combined operation of anisotropic filtering and AA will be examined in our upcoming reviews of production video cards based on the GeForce4 Ti.

Conclusion

The bugs and disadvantages of the NV20 in comparison with the R200 have been fixed and optimized away in the NV25.
Although the technology is the same and the number of transistors is only slightly higher, the chip is much more efficient than the previous model at the same frequency, especially in intensive tasks, and has a much higher frequency limit.
AA has become cheaper, especially in Quincunx mode.
Dual-monitor support is excellent at both the software and hardware levels.
The chip doesn't belong to a new generation; it is just a debugged and improved version of the previous one.
Similar technology and complexity promise that cards based on this chip won't be priced higher than those based on the Ti500. In this respect the chip can be considered a success: much higher performance and wider capabilities for the same money.
The use of BGA memory is a successful and justified move.
Some features (anisotropy, EMBM) are a little worse than on the NV20, but given the much higher clock speed this can be forgiven.
I know that many fans and owners of NVIDIA cards expected more of this chip, as almost a year has passed since the GeForce3 was released. However, we consider the strategy of gradual growth and optimization more justified while activity on the IT market is declining. Most of the new DirectX 8 capabilities that arrived with the GeForce3 are only beginning to make their way into real applications.

The Ti 4400 will probably be positioned as a direct competitor to the RADEON 8500, including on price. The Ti 4600 will take a higher position, and the lack of competitors will let its price go up without limit.

NVIDIA's new chip (NV25) is able to sit firmly in the upper-level gaming market and will probably be the main carrier of advanced DX8 technologies.
