NVIDIA GeForce4 Ti 4400
and GeForce4 Ti 4600 (NV25) Review
Many have long awaited the NV25 chip: as an echo of 3dfx's deals, as a
competitor to the ATI RADEON 8500, and as the second, optimized and
enriched incarnation of the NV20. Let me get straight to the heart of
the matter...
Attention! Before reading this review you should read our previous
articles on the NVIDIA GeForce3 (NV20) and the ATI Radeon 8500 (R200).
Product line
The GeForce 4 line is based on two chips: the NV17 and the NV25,
which is today's main hero:
- GeForce 4 Ti4600 - NV25 300 MHz core, 128 MBytes 325(650) MHz
128-bit DDR memory.
- GeForce 4 Ti4400 - NV25 275 MHz core, 128 MBytes 275(550) MHz
128-bit DDR memory.
- One more junior card on the NV25 will be announced later.
- GeForce 4 MX460 - 300 MHz core, 64 MBytes 275(550) MHz 128-bit
DDR memory.
- GeForce 4 MX440 - 270 MHz core, 64 MBytes 200(400) MHz 128-bit
DDR memory.
- GeForce 4 MX420 - 200 MHz core, 64 MBytes 166 MHz 128-bit DDR
memory.
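To put the memory figures in this list in perspective, peak theoretical bandwidth follows from the bus width and the effective (DDR) clock. The sketch below is only a back-of-the-envelope calculation: the function name and the decimal-gigabyte convention are our assumptions, and real-world throughput is lower than these peaks.

```python
def peak_bandwidth_gb_s(effective_mhz: float, bus_bits: int) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9


# (effective DDR clock in MHz, bus width in bits), taken from the list above
cards = {
    "GeForce 4 Ti4600": (650, 128),
    "GeForce 4 Ti4400": (550, 128),
    "GeForce 4 MX460": (550, 128),
    "GeForce 4 MX440": (400, 128),
}

for name, (mhz, bits) in cards.items():
    print(f"{name}: {peak_bandwidth_gb_s(mhz, bits):.1f} GB/s")
```

By this estimate the Ti4600 peaks at about 10.4 GB/s and the Ti4400 at about 8.8 GB/s.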
Note:
- The GeForce 3 line will quickly be replaced by the GeForce 4 line.
- NV17 doesn't and won't support pixel and vertex shaders.
- NV17 will have a hardware MPEG2 decoder and a dynamic power management
system; the NV25 has neither.
- NV17 has only two fill pipelines; the NV25 has four.
- NV25 has a superscalar (dual) T&L unit; the NV17 has a single one.
- NV17 and NV25 have similar memory controllers (a 2-channel one in
the NV17 and a 4-channel one in the NV25).
- Both chips have the same set of techniques for increasing effective
memory bandwidth (Z buffer compression and fast Z clear, MSAA, HSR).
- NV17 has 2 integrated LCD-panel controllers.
- Both chips have two independent RAMDACs, CRTC controllers,
and integrated TV-Out and DVI interfaces.
This pretty monster will help promote and advertise NV25-based
products by demonstrating advanced soft illumination, skeletal
animation, hair and fur made with vertex shaders, and per-pixel relief.
Theory
NV25
Main architectural innovations of the NV25 (vs.
NV20)
- 2 independent CRTC controllers. Flexible support for all possible
modes, with two frame buffers, independent in resolution and in
content, output to any available display devices.
- 2 standard 350 MHz RAMDACs integrated into the chip (with a 10-bit
palette).
- Integrated TV-Out.
- Integrated TMDS transmitter (for the DVI interface).
- 2 vertex shader execution units. They promise a considerable increase
in processing speed for scenes with complex geometry. The two units
cannot run different shader microcode, so the only advantage of
processing two vertices simultaneously is higher performance.
- Improved fill pipelines provide hardware support of pixel shaders
up to v1.3 inclusive.
- According to NVIDIA, the effective fill rate in the MSAA modes
is higher, and the 2x AA and Quincunx AA modes now cause a much
smaller performance drop. Quincunx AA has been improved (the sample
positions are shifted). A new AA method has appeared: 4xS.
- Improved separate caching system (4 separate caches for geometry,
frame buffer and Z-buffer).
- Improved lossless compression (1:4) and fast Z clear.
- Improved hidden surface removal algorithm (Z Cull HSR).
Later on we will check all these claimed advantages of the new chip.
The changes above are evolutionary rather than revolutionary compared
with NVIDIA's previous product (NV20). But it is typical of NVIDIA to
first release a product carrying a great deal of new technology and
then an improved (optimized) variant of it. Just take TNT and TNT2,
GF256 and GF2, and now GF3 and GF4. Experience shows that the second
variant usually meets with great success.
Performance characteristics
First of all, a bit of explanation:
- An accelerator can't be examined in isolation from its drivers. The
chip's capabilities depend on what the drivers expose to applications
through the two main APIs. Many characteristics in the table can depend
on the drivers and may hold only for a particular version. Moreover,
some features can be usable even if the drivers do not report them (for
example, clipping planes in D3D on NVIDIA cards). We will consider such
features absent, as a correctly written application must not try to use
options the driver doesn't report.
- Most of the data relate to Direct3D; in OpenGL these parameters can
differ. There are several reasons for this; first of all, it should be
noted that this gaming API is closer to the accelerator's hardware.
Besides, the capabilities of modern accelerators depend heavily on the
D3D specification.
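The rule from the first point (use only what the driver reports) can be sketched as follows. The dictionary layout and the helper name are hypothetical; the key MaxUserClipPlanes mirrors the field of that name in the Direct3D 8 capability structure, and the values are the ones listed in the "Clipping planes" row of the table in this article.

```python
# Hypothetical capability reports, keyed like the D3DCAPS8 field
# MaxUserClipPlanes. Values come from the "Clipping planes" table row.
REPORTED_CAPS = {
    "NV25": {"MaxUserClipPlanes": 0},
    "R200": {"MaxUserClipPlanes": 6},
}


def can_use_clip_planes(caps: dict, needed: int) -> bool:
    """True only if the driver reports at least `needed` user clip planes."""
    return caps.get("MaxUserClipPlanes", 0) >= needed


for chip, caps in REPORTED_CAPS.items():
    verdict = "use clip planes" if can_use_clip_planes(caps, 1) else "fall back"
    print(f"{chip}: {verdict}")
```

With these reported values an application takes the fallback path on the NV25, even though the feature may happen to work there.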
And now take a look at the summary table of the key characteristics
of the chips and cards tested today. Keep in mind that in the coming
quarter the ATI RADEON 8500 will be the main competitor of the
NV25-based cards (GeForce 4 Ti 4600 and Ti 4400), because the release
of the RADEON 8500XT has been postponed and the first R300-based
products won't appear very soon.
Card | GeForce3 Ti 500 | RADEON 8500 | GeForce 4 Ti 4600 (GeForce 4 Ti 4400)
Chip, revision, driver version
Chip | NV20 | R200 | NV25
Revision | A5 | A23 | A03
Driver version | 27.30 | 6.018 | 27.30
Main parameters
Pipelines | 4 | 4 | 4
Texture blocks | 2 | 2 | 2
Textures at a pass | 4 | 6 | 4
Core frequency, MHz | 240 | 275 | 300 (275)
Fill rate (million pixels) | 960 | 1100 | 1200 (1100)
Fill rate (million texels) | 1920 | 2200 | 2400
RAMDAC, MHz | 350 | 400 (+ external 240) | 350*2
Local memory parameters
Memory frequency, MHz | 250 | 275 | 325 (275)
Memory bus, bits | 128 (DDR) | 128 (DDR) | 128 (DDR)
Technology, micron | 0.15 | 0.15 | 0.15
Memory size, MB | 64 | 64 | 128
Memory speed, ns | 3.8 | 3.6 | 2.8 (3.6)
OpenGL version | 1.3 | 1.3 | 1.3
DirectX version | 8.1 | 8.1 | 8.1
GDI+ acceleration | Yes | Yes | Yes
Pixel pipeline
Pixel shaders | 1.0, 1.1 | 1.0..1.4 | 1.0..1.3
Range of calculated color values | -1.0..+1.0 | -8.0..+8.0 | -1.0..+1.0
Texture stages | 4 | 8 | 4
Combination stage | 8 | 8 | 8
Multisampling | 2,3,4 samples | No | 2,3,4 samples
Clipping planes | 0 | 6 | 0
Vertex shader
Vertex shaders | 1.0, 1.1 | 1.0, 1.1 | 1.0, 1.1
Vertex streams | 16 | 8 | 16
Constants of vertex shader | 96 | 192 | 96
Matrices for blending (max.) | 4 | 4 | 4
Indexed blending | No | Up to 57 matrices | No
Light sources | 8 | 8 | 8
N-Patches | No | Yes | No
RT-Patches | No | No | No
Primitives | 1048575 | 65536 | 1048575
Vertices | 1048575 | 16777215 | 1048575
Other parameters
Pure Device | Yes | Yes | Yes
Sprite scaling up to | 64 | 256 | 8192
3D textures | Yes (with anisotropy) | Yes (without MIPMAP) | Yes (with anisotropy)
Reflection mapping | Yes (with anisotropy) | Yes (without MIPMAP) | Yes (with anisotropy)
Anisotropic filtering | Yes | Yes | Yes
Anisotropy degree up to | 2,3,4 bi/trilinear sampling | 2,3 bilinear sampling in a line | 2,3,4 bi/trilinear sampling
Fog | FOGVERTEX FOGRANGE FOGTABLE | FOGVERTEX FOGRANGE | FOGVERTEX FOGRANGE FOGTABLE
Frame buffer
Rendering buffer formats | A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 | A8R8G8B8 X8R8G8B8 R5G6B5 A1R5G5B5 A4R4G4B4 R3G3B2 | A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5
Z-buffer formats | D32 D24S8 D16 D24X8 | D32 D24S8 D16 D24X8 | D32 D24S8 D16 D24X8
Texture formats
Maximum texture size (maximum repeat) | 4096x4096(8192) | 2048x2048(2048) | 4096x4096(8192)
2D texture formats | A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 P8 V8U8 L6V5U5 X8L8V8U8 DXT1 DXT2 DXT3 DXT4 DXT5 D24S8 D16 D24X8 | A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 R3G3B2 L8 A8L8 V8U8 L6V5U5 X8L8V8U8 Q8W8V8U8 V16U16 W11V11U10 DXT1 DXT2 DXT3 DXT4 DXT5 | A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 P8 V8U8 L6V5U5 X8L8V8U8 DXT1 DXT2 DXT3 DXT4 DXT5 D24S8 D16 D24X8
3D texture formats | A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 P8 | A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 R3G3B2 L8 A8L8 Q8W8V8U8 W11V11U10 DXT1 DXT2 DXT3 DXT4 DXT5 | A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 P8
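The fill-rate rows of the table can be reproduced from the pipeline count, the texture blocks per pipeline and the core clock. A minimal sketch, assuming the theoretical peak is simply their product (the function names are ours):

```python
def pixel_fill_rate(pipelines: int, core_mhz: int) -> int:
    """Theoretical peak in million pixels per second."""
    return pipelines * core_mhz


def texel_fill_rate(pipelines: int, texture_blocks: int, core_mhz: int) -> int:
    """Theoretical peak in million texels per second."""
    return pipelines * texture_blocks * core_mhz


# (pipelines, texture blocks per pipeline, core MHz) from the table
chips = {
    "GeForce3 Ti 500": (4, 2, 240),
    "RADEON 8500": (4, 2, 275),
    "GeForce 4 Ti 4600": (4, 2, 300),
    "GeForce 4 Ti 4400": (4, 2, 275),
}

for name, (pipes, blocks, mhz) in chips.items():
    print(f"{name}: {pixel_fill_rate(pipes, mhz)} Mpixels/s, "
          f"{texel_fill_rate(pipes, blocks, mhz)} Mtexels/s")
```

By the same formula the Ti 4400 comes out at 2200 million texels, a value the table does not list separately.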
Comments:
- The GeForce4 Ti4600 has higher core and memory clock speeds than
the RADEON 8500. The GeForce4 Ti4400 has the same core and memory
clock speeds as the RADEON 8500.
- At last NVIDIA products have dual-monitor support and, unlike the
R200, both standard 350 MHz RAMDACs are integrated into the NV25 chip.
- The NV25's RAMDACs run at a lower frequency than the primary RAMDAC
of the R200 (350 against 400 MHz).
- The internal architecture of the NV25 is close to that of the NV20
and R200: 4 fill pipelines with two texture blocks each. However, in
the R200 the results of their operation can be accumulated twice, which
allows up to 6 textures to be combined in a pass; the NV25 is limited
to 4. For now there are no applications that gain noticeably from 6
textures in a pass, although the next Doom is going to be one.
- Again, NVIDIA doesn't support 1.4 pixel shaders (see the R200
review for details) or a more flexible mechanism for dependent texture
sampling. Shaders are, in fact, translated into settings of the
sampling and combination pipelines; the number of stages in the texture
sampling pipeline remains the same: 4 in the NV25/NV20 against 8 in the
R200. Some slight changes in the combination pipeline make it possible
to support 1.2 and 1.3 shaders at the hardware level. They differ from
1.1 shaders not in more flexible dependent sampling but in the use and
modification of Z values and other useful options.
- Combination pipelines of all chips have 8 stages and support
all specified DirectX 8.1 operations.
- The current NV25 drivers do not expose a larger number of vertex
shader constants (96 against 192 on the R200) or of vertex shader
instructions (128). It seems that no qualitative changes other than the
second T&L unit (which is also a vertex shader interpreter) have been
made to the pipeline.
- On the Ti 4400, the NV25 memory now runs at the same frequency as
the R200's, with the same rated access time. However, that doesn't mean
the same efficiency, as the R200 and the NV20/NV25 approach memory
operation differently. The NV25 prefers smaller blocks and an efficient
4-channel crossbar controller, while the R200 uses larger blocks and
intensive combined caching. We will see later which approach is more
viable in modern tests and applications.
- All the cards have proper DirectX 8.1 and OpenGL 1.3 drivers.
ATI's OpenGL driver is considered less efficient than NVIDIA's. But the
gap is narrowing, and at present it depends on how the OpenGL
application works with geometry and whether it uses index buffers: the
R200 itself is less efficient at fetching geometry over AGP than the
NV20/NV25.
- The current NV20 and NV25 drivers report that there are no clipping
planes, though in our tests they work excellently. The reason is that
NVIDIA implements clipping planes with a special pixel shader that
occupies most of the slots of the combination pipeline, so an
application can no longer use its own pixel shader and some other
resources. This doesn't comply with the DirectX standard, which is why
clipping planes were disabled at the level of reported capabilities.
- Once again, the NV25 doesn't support N-Patches at the hardware level.
- The NV20 and NV25 drivers do not support hardware tessellation
of smooth surfaces (HOS based on RT-Patches). When a card doesn't
support N-Patches in hardware, the API tries to emulate them using
RT-Patches, which makes N-Patches very slow. NVIDIA therefore had to
disable RT-Patches so that games supporting N-Patches wouldn't be too
slow.
- Like the NV20, the NV25 doesn't support indexed matrix blending,
as shaders make it possible to organize any matrix blending scheme
flexibly.
- Multisampling hasn't changed since the NV20: the same 2 to 4 samples,
which the R200 is not capable of.
- The implementation of anisotropic filtering in the NV25/NV20 differs
from that of the R200, and each approach has its advantages and
disadvantages.
- The range of pixel shader values of the NV25 is still -1.0 to 1.0;
NVIDIA has not responded to the higher precision of the R200.
- All cards support a standard set of texture formats, though the
R200 supports some additional formats for auxiliary shader data (normal
and displacement maps) with increased component precision (11 and 16
bits: V16U16, W11V11U10). The NV25 and NV20 allow textures in Z buffer
formats (D32, D24S8, D16, D24X8), which are needed to implement Depth
Buffer Shadows algorithms. This technique is specific to NVIDIA
products, and the way the DirectX drivers expose it to applications is
nonstandard.
- NV25 doesn't allow compressing 3D textures. This is a serious
drawback of either the drivers or the chip. At the same time, NVIDIA's
OpenGL drivers have their own 3D texture compression format.
- NV25 supports all types of fog, like the NV20 does.
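The remark above about memory frequency and rated access time can be checked with simple arithmetic: a chip rated at t ns is specified for a clock of roughly 1000/t MHz. The sketch below relies on that approximation only; the function name is ours, and the numbers come from the "Memory speed" and "Memory frequency" rows of the table.

```python
def rated_mhz(access_time_ns: float) -> float:
    """Approximate rated clock (MHz) of a memory chip from its access time."""
    return 1000.0 / access_time_ns


# (rated access time in ns, actual memory clock in MHz) from the table
cards = {
    "GeForce3 Ti 500": (3.8, 250),
    "RADEON 8500": (3.6, 275),
    "GeForce 4 Ti 4600": (2.8, 325),
    "GeForce 4 Ti 4400": (3.6, 275),
}

for name, (ns, actual) in cards.items():
    rated = rated_mhz(ns)
    print(f"{name}: rated ~{rated:.0f} MHz, runs at {actual} MHz, "
          f"headroom {rated - actual:+.0f} MHz")
```

Under this approximation every card runs its memory slightly below the rated clock, and the Ti 4600's 2.8 ns chips leave the largest margin.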
Here is a complete list of OpenGL extensions supported
by the NV25 in the current drivers:
- GL_VENDOR: NVIDIA Corporation
- GL_RENDERER: GeForce4 Ti 4400/AGP/SSE2
- GL_VERSION: 1.3.1
- GL_EXTENSIONS:
- GL_ARB_imaging
- GL_ARB_multisample
- GL_ARB_multitexture
- GL_ARB_texture_border_clamp
- GL_ARB_texture_compression
- GL_ARB_texture_cube_map
- GL_ARB_texture_env_add
- GL_ARB_texture_env_combine
- GL_ARB_texture_env_dot3
- GL_ARB_transpose_matrix
- GL_S3_s3tc
- GL_EXT_abgr
- GL_EXT_bgra
- GL_EXT_blend_color
- GL_EXT_blend_minmax
- GL_EXT_blend_subtract
- GL_EXT_compiled_vertex_array
- GL_EXT_draw_range_elements
- GL_EXT_fog_coord
- GL_EXT_multi_draw_arrays
- GL_EXT_packed_pixels
- GL_EXT_paletted_texture
- GL_EXT_point_parameters
- GL_EXT_rescale_normal
- GL_EXT_secondary_color
- GL_EXT_separate_specular_color
- GL_EXT_shared_texture_palette
- GL_EXT_stencil_wrap
- GL_EXT_texture3D
- GL_EXT_texture_compression_s3tc
- GL_EXT_texture_edge_clamp
- GL_EXT_texture_env_add
- GL_EXT_texture_env_combine
- GL_EXT_texture_env_dot3
- GL_EXT_texture_cube_map
- GL_EXT_texture_filter_anisotropic
- GL_EXT_texture_lod
- GL_EXT_texture_lod_bias
- GL_EXT_texture_object
- GL_EXT_vertex_array
- GL_EXT_vertex_weighting
- GL_HP_occlusion_test
- GL_IBM_texture_mirrored_repeat
- GL_KTX_buffer_region
- GL_NV_blend_square
- GL_NV_copy_depth_to_color
- GL_NV_evaluators
- GL_NV_fence
- GL_NV_fog_distance
- GL_NV_light_max_exponent
- GL_NV_multisample_filter_hint
- GL_NV_occlusion_query
- GL_NV_packed_depth_stencil
- GL_NV_point_sprite
- GL_NV_register_combiners
- GL_NV_register_combiners2
- GL_NV_texgen_reflection
- GL_NV_texture_compression_vtc
- GL_NV_texture_env_combine4
- GL_NV_texture_rectangle
- GL_NV_texture_shader
- GL_NV_texture_shader2
- GL_NV_texture_shader3
- GL_NV_vertex_array_range
- GL_NV_vertex_array_range2
- GL_NV_vertex_program
- GL_NV_vertex_program1_1
- GL_SGIS_generate_mipmap
- GL_SGIS_multitexture
- GL_SGIS_texture_lod
- GL_SGIX_depth_texture
- GL_SGIX_shadow
- GL_WIN_swap_hint
- WGL_EXT_swap_control
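When an application tests for one of these extensions, it should match whole names rather than substrings, because several names are prefixes of others (GL_NV_texture_shader, GL_NV_texture_shader2, GL_NV_texture_shader3). A minimal sketch, assuming the extension string has already been obtained from the driver as space-separated text; the helper name and the shortened sample string are hypothetical:

```python
def has_extension(extensions: str, name: str) -> bool:
    """Whole-token match against a space-separated extension string."""
    return name in extensions.split()


# Shortened, made-up sample: it deliberately omits GL_NV_texture_shader.
SAMPLE = "GL_ARB_multitexture GL_NV_texture_shader2 GL_NV_vertex_program1_1"

print(has_extension(SAMPLE, "GL_NV_texture_shader2"))  # True
print("GL_NV_texture_shader" in SAMPLE)                # True (false positive)
print(has_extension(SAMPLE, "GL_NV_texture_shader"))   # False
```

The naive substring test reports an extension that the sample string does not contain, which is exactly the mistake the token-based check avoids.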
The same list for the latest R200 drivers:
- GL_VENDOR: ATI Technologies Inc.
- GL_RENDERER: Radeon 8500 DDR x86/SSE2
- GL_VERSION: 1.3.2475 WinXP Release
- GL_EXTENSIONS:
- GL_ARB_multitexture
- GL_ARB_texture_border_clamp
- GL_ARB_texture_compression
- GL_ARB_texture_cube_map
- GL_ARB_texture_env_add
- GL_ARB_texture_env_combine
- GL_ARB_texture_env_crossbar
- GL_ARB_texture_env_dot3
- GL_ARB_transpose_matrix
- GL_ARB_vertex_blend
- GL_ARB_window_pos
- GL_S3_s3tc
- GL_ATI_element_array
- GL_ATI_envmap_bumpmap
- GL_ATI_fragment_shader
- GL_ATI_map_object_buffer
- GL_ATI_pn_triangles
- GL_ATI_texture_mirror_once
- GL_ATI_vertex_array_object
- GL_ATI_vertex_streams
- GL_ATIX_texture_env_combine3
- GL_ATIX_texture_env_route
- GL_ATIX_vertex_shader_output_point_size
- GL_EXT_abgr
- GL_EXT_bgra
- GL_EXT_blend_color
- GL_EXT_blend_func_separate
- GL_EXT_blend_minmax
- GL_EXT_blend_subtract
- GL_EXT_clip_volume_hint
- GL_EXT_compiled_vertex_array
- GL_EXT_draw_range_elements
- GL_EXT_fog_coord
- GL_EXT_packed_pixels
- GL_EXT_point_parameters
- GL_ARB_point_parameters
- GL_EXT_rescale_normal
- GL_EXT_secondary_color
- GL_EXT_separate_specular_color
- GL_EXT_stencil_wrap
- GL_EXT_texgen_reflection
- GL_EXT_texture_env_add
- GL_EXT_texture3D
- GL_EXT_texture_compression_s3tc
- GL_EXT_texture_cube_map
- GL_EXT_texture_edge_clamp
- GL_EXT_texture_env_combine
- GL_EXT_texture_env_dot3
- GL_EXT_texture_lod_bias
- GL_EXT_texture_filter_anisotropic
- GL_EXT_texture_object
- GL_EXT_vertex_array
- GL_EXT_vertex_shader
- GL_KTX_buffer_region
- GL_NV_texgen_reflection
- GL_NV_blend_square
- GL_SGI_texture_edge_clamp
- GL_SGIS_texture_border_clamp
- GL_SGIS_texture_lod
- GL_SGIS_generate_mipmap
- GL_SGIS_multitexture
- GL_WIN_swap_hint
- WGL_EXT_extensions_string
- WGL_EXT_swap_control
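One rough way to compare the two lists above is to group the names by their vendor prefix (ARB, EXT, NV, ATI and so on). The sketch below uses a short hand-picked sample from each list rather than the complete lists, so its counts are illustrative only; the function name and the grouping rule are our assumptions.

```python
from collections import Counter


def prefix_counts(names):
    """Count extensions by the token after 'GL_' or 'WGL_' (ARB, EXT, NV...)."""
    return Counter(name.split("_")[1] for name in names)


# Small samples picked from the two lists above, not the full lists.
nv25_sample = [
    "GL_ARB_multisample",
    "GL_EXT_texture3D",
    "GL_NV_texture_shader",
    "GL_NV_vertex_program",
    "GL_SGIX_shadow",
]
r200_sample = [
    "GL_ARB_vertex_blend",
    "GL_EXT_vertex_shader",
    "GL_ATI_fragment_shader",
    "GL_ATI_pn_triangles",
    "GL_ATIX_texture_env_route",
]

print("NV25 sample:", dict(prefix_counts(nv25_sample)))
print("R200 sample:", dict(prefix_counts(r200_sample)))
```

Running the same function over the full lists would give the actual split between vendor-neutral and vendor-specific extensions for each driver.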
Most of the NV25's extensions remain standard, which points to
NVIDIA's stronger influence on OpenGL. Now let's turn to the video
cards based on the two NV25 versions: the GeForce4 Ti 4400 and 4600.
[ Part II ]