NVIDIA GeForce4 Ti 4400 and GeForce4 Ti 4600 (NV25) Review

By Andrew Worobyew
and Alexander Medvedev

The NV25 chip has long been awaited: by some as an echo of the 3dfx deal, by others as a competitor to the ATI RADEON 8500, and by others still as the second, optimized and enriched incarnation of the NV20. Let us get straight to the heart of the matter...

Attention! Before reading this review, you should look through our earlier articles on the NVIDIA GeForce3 (NV20) and ATI Radeon 8500 (R200).

Product line

The GeForce 4 line is based on two chips, NV17 and NV25; the latter is today's main hero:

GeForce 4 Ti4600 - NV25 300 MHz core, 128 MBytes 325(650) MHz 128-bit DDR memory.
GeForce 4 Ti4400 - NV25 275 MHz core, 128 MBytes 275(550) MHz 128-bit DDR memory.
One more junior card on the NV25 will be announced later.
GeForce 4 MX460 - 300 MHz core, 64 MBytes 275(550) MHz 128-bit DDR memory.
GeForce 4 MX440 - 270 MHz core, 64 MBytes 200(400) MHz 128-bit DDR memory.
GeForce 4 MX420 - 200 MHz core, 64 MBytes 166 MHz 128-bit DDR memory.
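
As a rough guide to what these memory figures mean, the theoretical peak bandwidth can be estimated as bus width in bytes times the effective transfer rate. The sketch below is only arithmetic on the numbers listed above; the function name is ours, and the MX420 is left out because the list gives no effective rate for it.

```python
# Peak memory bandwidth = bus width in bytes * effective transfer rate.
BUS_WIDTH_BITS = 128  # every card in the list above uses a 128-bit bus

# Effective (DDR) memory rates in MHz, copied from the values in parentheses.
effective_mhz = {
    "GeForce 4 Ti4600": 650,
    "GeForce 4 Ti4400": 550,
    "GeForce 4 MX460": 550,
    "GeForce 4 MX440": 400,
}


def peak_bandwidth_gb_s(bus_bits, rate_mhz):
    """Theoretical peak in GB/s, taking 1 GB as 10**9 bytes."""
    return bus_bits / 8 * rate_mhz * 1e6 / 1e9


bandwidth = {
    card: peak_bandwidth_gb_s(BUS_WIDTH_BITS, mhz)
    for card, mhz in effective_mhz.items()
}

for card, gb_s in bandwidth.items():
    print(f"{card}: {gb_s:.1f} GB/s")
```

These are upper bounds only; the bandwidth actually available to the chip depends on the memory controller and the bandwidth-saving techniques discussed in the notes.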

Note:

The GeForce 3 line will quickly be replaced by the GeForce 4 line.
NV17 doesn't and won't support pixel and vertex shaders.
NV17 has a hardware MPEG2 decoder and a dynamic power management system; NV25 has neither.
NV17 has only two fill pipelines, while NV25 has four.
NV25 has a superscalar (dual) T&L unit, NV17 has a single one.
NV17 and NV25 have similar memory controllers (a 2-channel one in the NV17 and a 4-channel one in the NV25).
Both chips have the same set of techniques for increasing the effective memory bandwidth (Z buffer compression and fast Z clear, MSAA, HSR).
NV17 has 2 integrated LCD-panel controllers.
Both chips have two independent RAMDACs, CRTC controllers, and integrated TV-Out and DVI interfaces.

This pretty monster will help promote and advertise NV25-based products by demonstrating advanced soft lighting, skeletal animation, hair and fur rendered with vertex shaders, and per-pixel relief.

Theory

NV25

Main architectural innovations of the NV25 (vs. NV20)

2 independent CRTC controllers. Flexible support for all possible modes and for outputting two frame buffers, independent in resolution and in contents, to any available display devices.
2 standard 350 MHz RAMDACs integrated into the chip (with a 10-bit palette).
Integrated TV-Out.
Integrated TMDS transmitter (for the DVI interface).
2 units for interpreting and executing vertex shaders. They promise a considerable speedup on scenes with complex geometry. The units cannot run different shader microcode, so the only benefit of processing two vertices simultaneously is higher performance.
Improved fill pipelines provide hardware support for pixel shaders up to and including v1.3.
According to NVIDIA, the effective fillrate in the MSAA modes is higher, so the 2x AA and Quincunx AA modes now cause a much smaller performance drop. Quincunx AA is improved (the sample positions are shifted). A new AA method, 4xS, has appeared.
Improved separate caching system (4 separate caches for geometry, frame buffer and Z-buffer).
Improved lossless Z-buffer compression (4:1) and fast Z clear.
Improved hidden surface removal algorithm (Z Cull HSR).

Later on we will check all these declared advantages of the new chip.

The above changes are evolutionary rather than revolutionary compared with NVIDIA's previous product (NV20). But it is typical of NVIDIA to first release a product carrying a great deal of new technology and then follow it with an improved (optimized) variant. Just take TNT and TNT2, GF256 and GF2, and now GF3 and GF4. Experience shows that the second variant usually meets with great success.

Performance characteristics

First of all, a bit of explanation:

An accelerator cannot be examined in isolation from its drivers. What the chip can do depends on how the drivers expose it to applications through the two main APIs. Many characteristics in the table depend on the drivers and may hold only for a particular version. Moreover, some features can work even though the drivers do not report them (for example, clipping planes in D3D on NVIDIA cards). We treat such features as absent, because a correctly written application must not try to use capabilities the driver does not report.
Most of the data relate to Direct3D; in OpenGL these parameters can differ. There are several reasons for this choice. First of all, this gaming API is closer to the accelerator's hardware. Besides, the capabilities of modern accelerators depend heavily on the D3D specification.
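
The rule that an application should rely only on what the driver reports can be illustrated with a small sketch. Everything here is hypothetical: the dictionary stands in for a real capability query, the field name is ours, and the values are simply the "Clipping planes" row of the summary table in this review.

```python
# Hypothetical snapshot of driver-reported capabilities. The field name is
# illustrative; the values are the "Clipping planes" row of the summary table.
REPORTED_CAPS = {
    "GeForce3 Ti 500": {"clip_planes": 0},
    "RADEON 8500": {"clip_planes": 6},
    "GeForce 4 Ti 4600": {"clip_planes": 0},
}


def can_use_clip_planes(caps, needed):
    """True only if the driver reports at least `needed` user clipping planes."""
    return caps["clip_planes"] >= needed


def choose_clipping_path(caps, needed):
    """Pick a rendering path from the reported capabilities only."""
    if can_use_clip_planes(caps, needed):
        return "hardware clip planes"
    return "fallback"  # e.g. the application clips the geometry itself


for card, caps in REPORTED_CAPS.items():
    print(card, "->", choose_clipping_path(caps, needed=1))
```

An application written this way never touches a feature that happens to work but is not reported, which is why we count such features as absent.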

And now take a look at the summary table of the key characteristics of the chips and cards tested today. Keep in mind that in the coming quarter the ATI RADEON 8500 will be the main competitor of the NV25-based cards (GeForce 4 Ti 4600 and Ti 4400), because the release of the RADEON 8500XT has been postponed and the first R300-based products will not appear very soon.

Card	GeForce3 Ti 500	RADEON 8500	GeForce 4 Ti 4600 (GeForce 4 Ti 4400)
Chip, revision, driver version
Chip	NV20	R200	NV25
Revision	A5	A23	A03
Driver version	27.30	6.018	27.30
Main parameters
Pipelines	4	4	4
Texture units	2	2	2
Textures in a single pass	4	6	4
Core frequency, MHz	240	275	300 (275)
Technology, micron	0.15	0.15	0.15
Fill rate (million pixels/s)	960	1100	1200 (1100)
Fill rate (million texels/s)	1920	2200	2400 (2200)
RAMDAC, MHz	350	400 (+ external 240)	350*2
Local memory parameters
Memory frequency, MHz	250	275	325 (275)
Memory bus, bits	128 (DDR)	128 (DDR)	128 (DDR)
Memory size, MB	64	64	128
Memory speed, ns	3.8	3.6	2.8 (3.6)
APIs
OpenGL version	1.3	1.3	1.3
DirectX version	8.1	8.1	8.1
GDI+ acceleration	Yes	Yes	Yes
Pixel pipeline
Pixel shaders	1.0, 1.1	1.0..1.4	1.0..1.3
Range of calculated color values	-1.0..+1.0	-8.0..+8.0	-1.0..+1.0
Texture fetch stages	4	8	4
Blend stages	8	8	8
Multisampling	2,3,4 samples	No	2,3,4 samples
Clipping planes	0	6	0
Vertex shader
Vertex shaders	1.0, 1.1	1.0, 1.1	1.0, 1.1
Vertex streams	16	8	16
Constants for vertex shader	96	192	96
Matrices for HW blending	4	4	4
Indexed blending	No	Up to 57 matrices	No
Light sources	8	8	8
N-Patches	No	Yes	No
RT-Patches	No	No	No
Primitives	1048575	65536	1048575
Vertices	1048575	16777215	1048575
Other parameters
Pure Device	Yes	Yes	Yes
Sprite scaling up to	64	256	8192
3D textures	Yes (with anisotropy)	Yes (without MIPMAP)	Yes (with anisotropy)
Reflection mapping	Yes (with anisotropy)	Yes (without MIPMAP)	Yes (with anisotropy)
Anisotropic filtering	Yes	Yes	Yes
Anisotropy degree up to	2,3,4 bi/trilinear sampling	2,3 bilinear sampling in a line	2,3,4 bi/trilinear sampling
Fog	FOGVERTEX FOGRANGE FOGTABLE	FOGVERTEX FOGRANGE	FOGVERTEX FOGRANGE FOGTABLE
Frame buffer
Rendering buffer formats	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5	A8R8G8B8 X8R8G8B8 R5G6B5 A1R5G5B5 A4R4G4B4 R3G3B2	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5
Z-buffer formats	D32 D24S8 D16 D24X8	D32 D24S8 D16 D24X8	D32 D24S8 D16 D24X8
Texture formats
Maximum texture size (maximum repeat)	4096x4096(8192)	2048x2048(2048)	4096x4096(8192)
2D texture formats	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 P8 V8U8 L6V5U5 X8L8V8U8 DXT1 DXT2 DXT3 DXT4 DXT5 D24S8 D16 D24X8	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 R3G3B2 L8 A8L8 V8U8 L6V5U5 X8L8V8U8 Q8W8V8U8 V16U16 W11V11U10 DXT1 DXT2 DXT3 DXT4 DXT5	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 P8 V8U8 L6V5U5 X8L8V8U8 DXT1 DXT2 DXT3 DXT4 DXT5 D24S8 D16 D24X8
3D texture formats	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 P8	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 R3G3B2 L8 A8L8 Q8W8V8U8 W11V11U10 DXT1 DXT2 DXT3 DXT4 DXT5	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 P8
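
The fill rate rows of the table follow directly from the pipeline count, the texture units per pipeline and the core clock. A minimal sketch of that arithmetic, using only values from the table (the names are ours):

```python
# Theoretical fill rates from the "Main parameters" rows of the table.
PIPELINES = 4                   # same for all three chips
TEXTURE_UNITS_PER_PIPELINE = 2  # same for all three chips

core_mhz = {
    "GeForce3 Ti 500": 240,
    "RADEON 8500": 275,
    "GeForce 4 Ti 4600": 300,
    "GeForce 4 Ti 4400": 275,
}


def fill_rates(mhz):
    """Return (million pixels/s, million texels/s) for a given core clock."""
    pixels = PIPELINES * mhz
    texels = pixels * TEXTURE_UNITS_PER_PIPELINE
    return pixels, texels


rates = {card: fill_rates(mhz) for card, mhz in core_mhz.items()}

for card, (pixels, texels) in rates.items():
    print(f"{card}: {pixels} Mpixels/s, {texels} Mtexels/s")
```

On paper, then, the Ti 4400 and the RADEON 8500 are identical here, and only the Ti 4600 has a clock advantage.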

Comments:

The GeForce4 Ti4600 has higher core and memory clock speeds than the RADEON 8500, while the GeForce4 Ti4400 has the same core and memory clock speeds as the RADEON 8500.
At last NVIDIA products have dual-monitor support and, unlike the R200, both standard 350 MHz RAMDACs are integrated into the NV25 chip.
The NV25's RAMDACs run at a lower frequency than the primary RAMDAC of the R200 (350 against 400 MHz).
The internal architecture of the NV25 is close to that of the NV20 and R200: 4 fill pipelines with two texture units each. In the R200, however, the results of their operation can be accumulated twice, which allows up to 6 textures to be combined in a single pass; the NV25 is limited to 4. So far no application gains noticeably from 6 textures per pass, although the next Doom is expected to be one.
Again, NVIDIA supports neither 1.4 pixel shaders (see the R200 review for details) nor a more flexible mechanism of dependent texture sampling. Shaders are, in fact, translated into settings of the sampling and combination pipelines. The number of stages in the texture sampling pipeline remains the same (4 in the NV25/NV20 against 8 in the R200), while some slight changes in the combination pipeline make it possible to support 1.2 and 1.3 shaders in hardware. These differ from 1.1 shaders not in more flexible dependent sampling but in the use and modification of Z values and other useful options.
Combination pipelines of all chips have 8 stages and support all specified DirectX 8.1 operations.
The current NV25 drivers do not expose a larger number of vertex shader constants (96 against 192 for the R200) or of vertex shader instructions (128). It seems that, apart from the second T&L unit (which is also a vertex shader interpreter), no other qualitative changes were made to the pipeline.
On the GeForce 4 Ti 4400, the NV25 memory now runs at the same frequency as that of the R200, and the rated access time is the same as well. That does not mean the same efficiency, however, because the R200 and the NV20/NV25 handle memory operations differently. The NV25 prefers smaller blocks and an efficient 4-channel crossbar controller, while the R200 uses larger blocks and intensive combined caching. We will see later which approach is more viable in modern tests and applications.
All the cards have proper DirectX 8.1 and OpenGL 1.3 drivers. ATI's OpenGL driver is considered less efficient than NVIDIA's, but the gap is narrowing, and at present it depends on how the OpenGL application handles geometry and whether it uses index buffers: the R200 itself is less efficient at fetching geometry over AGP than the NV20/NV25.
For some reason, the current NV20 and NV25 drivers report that there are no clipping planes, although in our tests they work excellently. The explanation is that NVIDIA implements clipping planes with a special pixel shader that occupies most of the slots of the combination pipeline, so the application can no longer use its own pixel shader and some other resources. This does not comply with the DirectX standard, which is why clipping planes were disabled at the level of reported capabilities.
Once again, the NV25 does not support N-Patches in hardware.
The NV20 and NV25 drivers do not support hardware tessellation of smooth surfaces (HOS based on RT-Patches). When a card does not support N-Patches in hardware, the API tries to emulate them using RT-Patches, which makes N-Patches very slow. NVIDIA therefore had to disable RT-Patches so that games supporting N-Patches would not run too slowly.
Like the NV20, the NV25 does not support indexed matrix blending, since shaders make it possible to implement any matrix blending scheme flexibly.
Multisampling hasn't changed since the NV20: the same 2 to 4 samples, which the R200 is not capable of.
The implementation of anisotropic filtering in the NV25/NV20 differs from that of the R200, and each approach has its advantages and disadvantages.
The range of pixel shader values of the NV25 is still -1.0 to 1.0; NVIDIA has not responded to the higher precision of the R200.
All cards support a standard set of texture formats. The R200 supports a few more formats for additional shader data (normal and displacement maps) with increased component precision (11 and 16 bits: W11V11U10, V16U16), while the NV25 and NV20 allow textures in Z-buffer formats (D32, D24S8, D16, D24X8), which are needed for Depth Buffer Shadows algorithms. This use, which is peculiar to NVIDIA products, is nonstandard for DirectX applications.
The NV25 does not allow compression of 3D textures. This is a serious drawback of either the drivers or the chip. At the same time, NVIDIA's OpenGL drivers have their own 3D texture compression format.
NV25 supports all types of fog, like the NV20 does.
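
The remark above about rated access time can be checked with simple arithmetic. The sketch below assumes that the "Memory speed, ns" row gives the rated cycle time of the memory chips, which is the usual convention but is our assumption here; the rated clock is then 1000 divided by that time.

```python
# Actual memory clocks (MHz) and chip ratings (ns) from the summary table.
cards = {
    "GeForce3 Ti 500": (250, 3.8),
    "RADEON 8500": (275, 3.6),
    "GeForce 4 Ti 4600": (325, 2.8),
    "GeForce 4 Ti 4400": (275, 3.6),
}


def rated_mhz(cycle_time_ns):
    """Clock implied by a rated cycle time: f [MHz] = 1000 / t [ns].

    Treating the table's ns figure as a cycle time is an assumption.
    """
    return 1000.0 / cycle_time_ns


for card, (actual, ns) in cards.items():
    rated = rated_mhz(ns)
    print(f"{card}: runs at {actual} MHz, rated for about {rated:.0f} MHz")
```

Under that assumption every card runs its memory slightly below the rating of the chips, with the largest margin on the Ti 4600.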

Here is a complete list of OpenGL extensions supported by the NV25 in the current drivers:

GL_VENDOR: NVIDIA Corporation
GL_RENDERER: GeForce4 Ti 4400/AGP/SSE2
GL_VERSION: 1.3.1
GL_EXTENSIONS:

GL_ARB_imaging
GL_ARB_multisample
GL_ARB_multitexture
GL_ARB_texture_border_clamp
GL_ARB_texture_compression
GL_ARB_texture_cube_map
GL_ARB_texture_env_add
GL_ARB_texture_env_combine
GL_ARB_texture_env_dot3
GL_ARB_transpose_matrix
GL_S3_s3tc
GL_EXT_abgr
GL_EXT_bgra
GL_EXT_blend_color
GL_EXT_blend_minmax
GL_EXT_blend_subtract
GL_EXT_compiled_vertex_array
GL_EXT_draw_range_elements
GL_EXT_fog_coord
GL_EXT_multi_draw_arrays
GL_EXT_packed_pixels
GL_EXT_paletted_texture
GL_EXT_point_parameters
GL_EXT_rescale_normal
GL_EXT_secondary_color
GL_EXT_separate_specular_color
GL_EXT_shared_texture_palette
GL_EXT_stencil_wrap
GL_EXT_texture3D
GL_EXT_texture_compression_s3tc
GL_EXT_texture_edge_clamp
GL_EXT_texture_env_add
GL_EXT_texture_env_combine
GL_EXT_texture_env_dot3
GL_EXT_texture_cube_map
GL_EXT_texture_filter_anisotropic
GL_EXT_texture_lod
GL_EXT_texture_lod_bias
GL_EXT_texture_object
GL_EXT_vertex_array
GL_EXT_vertex_weighting
GL_HP_occlusion_test
GL_IBM_texture_mirrored_repeat
GL_KTX_buffer_region
GL_NV_blend_square
GL_NV_copy_depth_to_color
GL_NV_evaluators
GL_NV_fence
GL_NV_fog_distance
GL_NV_light_max_exponent
GL_NV_multisample_filter_hint
GL_NV_occlusion_query
GL_NV_packed_depth_stencil
GL_NV_point_sprite
GL_NV_register_combiners
GL_NV_register_combiners2
GL_NV_texgen_reflection
GL_NV_texture_compression_vtc
GL_NV_texture_env_combine4
GL_NV_texture_rectangle
GL_NV_texture_shader
GL_NV_texture_shader2
GL_NV_texture_shader3
GL_NV_vertex_array_range
GL_NV_vertex_array_range2
GL_NV_vertex_program
GL_NV_vertex_program1_1
GL_SGIS_generate_mipmap
GL_SGIS_multitexture
GL_SGIS_texture_lod
GL_SGIX_depth_texture
GL_SGIX_shadow
GL_WIN_swap_hint
WGL_EXT_swap_control

The same list for the latest R200 drivers:

GL_VENDOR: ATI Technologies Inc.
GL_RENDERER: Radeon 8500 DDR x86/SSE2
GL_VERSION: 1.3.2475 WinXP Release
GL_EXTENSIONS:

GL_ARB_multitexture
GL_ARB_texture_border_clamp
GL_ARB_texture_compression
GL_ARB_texture_cube_map
GL_ARB_texture_env_add
GL_ARB_texture_env_combine
GL_ARB_texture_env_crossbar
GL_ARB_texture_env_dot3
GL_ARB_transpose_matrix
GL_ARB_vertex_blend
GL_ARB_window_pos
GL_S3_s3tc
GL_ATI_element_array
GL_ATI_envmap_bumpmap
GL_ATI_fragment_shader
GL_ATI_map_object_buffer
GL_ATI_pn_triangles
GL_ATI_texture_mirror_once
GL_ATI_vertex_array_object
GL_ATI_vertex_streams
GL_ATIX_texture_env_combine3
GL_ATIX_texture_env_route
GL_ATIX_vertex_shader_output_point_size
GL_EXT_abgr
GL_EXT_bgra
GL_EXT_blend_color
GL_EXT_blend_func_separate
GL_EXT_blend_minmax
GL_EXT_blend_subtract
GL_EXT_clip_volume_hint
GL_EXT_compiled_vertex_array
GL_EXT_draw_range_elements
GL_EXT_fog_coord
GL_EXT_packed_pixels
GL_EXT_point_parameters
GL_ARB_point_parameters
GL_EXT_rescale_normal
GL_EXT_secondary_color
GL_EXT_separate_specular_color
GL_EXT_stencil_wrap
GL_EXT_texgen_reflection
GL_EXT_texture_env_add
GL_EXT_texture3D
GL_EXT_texture_compression_s3tc
GL_EXT_texture_cube_map
GL_EXT_texture_edge_clamp
GL_EXT_texture_env_combine
GL_EXT_texture_env_dot3
GL_EXT_texture_lod_bias
GL_EXT_texture_filter_anisotropic
GL_EXT_texture_object
GL_EXT_vertex_array
GL_EXT_vertex_shader
GL_KTX_buffer_region
GL_NV_texgen_reflection
GL_NV_blend_square
GL_SGI_texture_edge_clamp
GL_SGIS_texture_border_clamp
GL_SGIS_texture_lod
GL_SGIS_generate_mipmap
GL_SGIS_multitexture
GL_WIN_swap_hint
WGL_EXT_extensions_string
WGL_EXT_swap_control
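
Two extension lists of this length are easier to compare mechanically than by eye. The sketch below parses a whitespace-separated listing into a set and takes intersections and differences; to stay short it uses only a handful of names copied from the two lists above, so its output is an illustration and not the complete comparison.

```python
def parse_extensions(text):
    """Split a whitespace-separated extension listing into a set of names."""
    return set(text.split())


# Short excerpts only; paste the full listings to compare everything.
nv25_excerpt = """
GL_ARB_multitexture GL_ARB_multisample GL_EXT_texture3D
GL_NV_register_combiners GL_NV_texture_shader GL_NV_vertex_program
"""

r200_excerpt = """
GL_ARB_multitexture GL_EXT_texture3D GL_ATI_fragment_shader
GL_ATI_pn_triangles GL_EXT_vertex_shader GL_ARB_vertex_blend
"""

nv25 = parse_extensions(nv25_excerpt)
r200 = parse_extensions(r200_excerpt)

common = nv25 & r200      # reported by both drivers
only_nv25 = nv25 - r200   # reported by the NV25 driver only
only_r200 = r200 - nv25   # reported by the R200 driver only

print("Common:   ", sorted(common))
print("NV25 only:", sorted(only_nv25))
print("R200 only:", sorted(only_r200))
```

Comparing whole names as set members, and not searching for substrings, matters here: GL_NV_texture_shader is a prefix of GL_NV_texture_shader2 and GL_NV_texture_shader3.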

Now let's turn to the video cards based on the two NV25 versions: GeForce4 Ti 4400 and Ti 4600.

[ Part II ]
