ATI RADEON 8500: Part I

By Andrey Vorobiev
and Alexander Medvedev

Today we start climbing the RADEON 8500 mountain. The expedition promises to be tough and risky. But while we are still at its foot, we should study the theory involved. If you are a beginner, you had better look at our previous reviews first. Besides, all participants should look through the architecture outlined in the first R200 preview.

Technical data

Here is a comparison table of the general characteristics of the chips and the capabilities exposed by the current DirectX 8.1 drivers:

Who is who
Card	GeForce3 Ti 500	RADEON 8500
Chip	NV20	R200
Chip revision	A05	A13
Basic parameters
Pipelines	4	4
Texture units	2	2
Textures per pass	4	6
Core frequency, MHz	240	275
Fillrate, million pixels/s	960	1100
Fillrate, million texels/s	1920	2200
RAMDAC, MHz	350	400 (+ external 240)
Memory
Memory frequency, MHz	250	275
Memory bus, bit	128 (DDR)	128 (DDR)
Technology, micron	.15	.15
Memory size, MB	64	64
Memory speed, ns	3.5	3.6
API
OpenGL version	1.3	1.2 (1.3?)
DirectX version	8.1	8.1
GDI+	Yes	Yes
Pixel pipeline
Pixel shaders	1.0, 1.1	1.0, 1.1, 1.4
Maximum color value in pixel shader registers	1.0	8.0
Texture sampling stages	4	8
Texture combination stages	8	8
Vertex pipeline
Vertex shaders	1.0, 1.1	1.0, 1.1
Vertex streams	16	8
Vertex shader constants	96	192
Other
Texture size (max.)	2048X2048 (4096X4096?)	2048X2048
Matrices for blending (max.)	4	4
Indexed blending	No	up to 57 matrices
Sprite scaling up to	64	64
Light sources	8	8
Clipping planes	0 (6?)	6
Pure Device	Yes	Yes
N-Patches	No	Yes
RT-Patches	No (!)	No
Multisampling	2, 3, 4	No
3D textures	Yes	Yes (without MIPMAP)
Environment maps	Yes	Yes (without MIPMAP)
Anisotropic filtering	Yes (without MIPMAP)	Yes (without MIPMAP)
Anisotropy degree up to	8	16
Fog	FOGVERTEX FOGRANGE FOGTABLE	FOGVERTEX FOGRANGE
Frame buffer
Rendering buffer formats	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5	A8R8G8B8 X8R8G8B8 R5G6B5 A1R5G5B5 A4R4G4B4 R3G3B2
Z-buffer formats	D32 D24S8 D16 D24X8	D32 D24S8 D16 D24X8
Texture formats
2D texture formats	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 DXT1 DXT2 DXT3 DXT4 DXT5 V8U8 L6V5U5 X8L8V8U8 Q8W8V8U8 P8 D32 D24S8 D16 D24X8	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 DXT1 DXT2 DXT3 DXT4 DXT5 V8U8 L6V5U5 X8L8V8U8 Q8W8V8U8 L8 R3G3B2 A8L8 V16U16 W11V11U10
3D texture formats	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 P8	A8R8G8B8 X8R8G8B8 R5G6B5 X1R5G5B5 A1R5G5B5 A4R4G4B4 R3G3B2 L8 A8L8 DXT1 DXT2 DXT3 DXT4 DXT5
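
The fillrate rows of the table follow directly from the number of pipelines, the texture units per pipeline and the core frequency. A minimal sketch of that arithmetic, assuming the theoretical peak of one pixel per pipeline per clock (the function and dictionary names are ours, for illustration only):

```python
def fillrates(pipelines: int, texture_units: int, core_mhz: int) -> tuple[int, int]:
    """Return the theoretical peak (million pixels/s, million texels/s)."""
    mpixels = pipelines * core_mhz      # one pixel per pipeline per clock
    mtexels = mpixels * texture_units   # each pipeline samples texture_units texels per clock
    return mpixels, mtexels


# (pipelines, texture units per pipeline, core MHz) taken from the table above
CHIPS = {
    "GeForce3 Ti 500 (NV20)": (4, 2, 240),
    "RADEON 8500 (R200)": (4, 2, 275),
}

for name, spec in CHIPS.items():
    print(name, fillrates(*spec))
```

These are peak figures only; real throughput also depends on memory bandwidth and on how many textures each pixel needs.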

Let me draw your attention to the following points:

The RADEON 8500 has higher core and memory clock speeds;
The primary RAMDAC of the R200 has a higher maximum frequency, 400 MHz, and there is a secondary, external RAMDAC working at 240 MHz, which makes it possible to send a VGA signal to two receivers;
The internal architecture of the chip is similar to that of the NV20: 4 shading pipelines with two texture units each. But this time their results can be accumulated twice, so up to 6 textures can be combined in a single pass. Of course, this costs at least two penalty cycles for 6 textures and at least one for 4 (as on the NV20);
Support for pixel shaders 1.4 (see the R200 preview for details) and a more flexible texture sampling mechanism. Since shaders are translated into settings of the sampling and combination pipelines, the texture sampling pipeline has been extended to 8 stages and its capabilities have been widened;
The combination pipelines of both chips have up to 8 stages and support all operations declared in DirectX 8.1;
The number of vertex shader constants has increased to 192 against 96 on the NV20. This allows implementing more complex blending and vertex processing algorithms;
Despite a modest access time (3.6 ns against 3.5 ns on the GeForce3 Ti 500), the RADEON 8500 memory works reliably at a higher frequency, even without heatsinks. The R200 and the NV20 take different approaches to memory access. While the NV20 prefers smaller blocks and an efficient crossbar controller, the R200 uses larger blocks and intensive caching. Later we will see which approach is more viable in modern tests, but for now it should be noted that the R200 is gentler on the memory, which may increase its overclocking potential;
The R200 outdoes the NV20 in the set of implemented DirectX 8.1 features, but the NV20 has an excellent OpenGL 1.3 driver. The current OpenGL driver of the R200 corresponds to version 1.2 (ATI says it already has an alpha version of an OpenGL 1.3 driver for XP) and is not as efficient as NVIDIA's;
The R200 can set 6 arbitrary clipping planes, while the NV20 lacks this capability (in the current drivers);
The R200 supports N-Patches in hardware, the NV20 does not;
The current NV20 drivers no longer support hardware tessellation of smooth surfaces (RT-Patches);
The R200 can perform indexed matrix blending with a palette of 57 matrices (4 can be enabled at a time). But this feature doesn't seem so important when vertex shaders are available, since shaders can implement any blending scheme with any number of matrices. But is blending with shaders as efficient as the hardware one? Later we will try to answer this question;
The R200 doesn't support Multisampling (!);
The R200 doesn't support mip-mapping (and, therefore, trilinear filtering) for environment and 3D textures;
Neither accelerator allows mip-mapping and anisotropic filtering simultaneously in DirectX, i.e. it is impossible to enable trilinear and anisotropic filtering together. On the other hand, the NV20 supports this mode in OpenGL, while DirectX applications as a rule do not use anisotropic filtering;
The maximum anisotropy degree of the R200 is twice as high;
Pixel shaders of the R200 can operate on values exceeding 1.0 (i.e. 255), namely from 0 to 8.0. This is the OverBright approach. Complex calculations get an additional reserve of range, i.e. you can accumulate intermediate values, for example, to render bright lighting more adequately;
Texture formats are almost the same, but while the R200 has several exotic formats for passing additional data to shaders (normal and bump maps) with increased component precision (11 and 16 bits: W11V11U10, V16U16), the NV20 can use textures in Z-buffer formats (D32, D24S8, D16, D24X8), which are needed to implement Shadow Buffer class algorithms;
All texture compression formats are supported, but while the R200 also allows compressing 3D textures into the same formats, the NV20 does not! Taking into account the considerable size of 3D textures, we can consider this a serious drawback of the drivers or of the chip. In OpenGL, NVIDIA successfully uses its own 3D texture compression format;
In DirectX the R200 supports all kinds of fog except the table one.
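
The penalty cycles mentioned in the list for 4 and 6 textures per pass can be estimated with simple arithmetic. The sketch below assumes that each pipeline samples two textures per clock and that every additional pair of textures costs exactly one more clock; this is the minimum cost (the real one is "at least" this much) and memory bandwidth is ignored entirely. The function names are ours.

```python
import math


def cycles_per_pixel(textures: int, texture_units: int = 2) -> int:
    """Clocks one pipeline needs to sample `textures` textures for one pixel."""
    return max(1, math.ceil(textures / texture_units))


def penalty_cycles(textures: int, texture_units: int = 2) -> int:
    """Extra clocks compared with a pixel that fits into a single cycle."""
    return cycles_per_pixel(textures, texture_units) - 1


def effective_mpixels(pipelines: int, core_mhz: int, textures: int,
                      texture_units: int = 2) -> float:
    """Upper bound on the pixel fillrate (million pixels/s) with multitexturing."""
    return pipelines * core_mhz / cycles_per_pixel(textures, texture_units)


for n in (2, 4, 6):
    print(n, "textures:", penalty_cycles(n), "penalty cycle(s),",
          round(effective_mpixels(4, 275, n), 1), "Mpixels/s on the R200")
```

Under these assumptions the R200 drops from 1100 to 550 million pixels per second with 4 textures, but it still completes 6 textures in one pass, where a chip limited to 4 textures per pass would need a second pass.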

Here is a list of OpenGL extensions supported by current drivers of the Radeon 8500:

GL_VENDOR: ATI Technologies Inc.
GL_RENDERER: Radeon 8500 DDR x86/SSE
GL_VERSION: 1.2.2357 Win9x Release
GL_EXTENSIONS:
GL_ARB_multitexture
GL_ARB_texture_border_clamp
GL_ARB_texture_compression
GL_ARB_texture_cube_map
GL_ARB_texture_env_add
GL_EXT_texture_env_add
GL_ARB_texture_env_combine
GL_ARB_texture_env_crossbar
GL_ARB_texture_env_dot3
GL_ARB_transpose_matrix
GL_ARB_vertex_blend
GL_S3_s3tc
GL_ATI_element_array
GL_ATI_envmap_bumpmap
GL_ATI_fragment_shader
GL_ATI_pn_triangles
GL_ATI_texture_mirror_once
GL_ATI_vertex_array_object
GL_EXT_vertex_shader
GL_ATI_vertex_streams
GL_ATIX_texture_env_combine3
GL_ATIX_texture_env_route
GL_ATIX_vertex_shader_output_point_size
GL_EXT_abgr
GL_EXT_bgra
GL_EXT_blend_color
GL_EXT_blend_func_separate
GL_EXT_blend_minmax
GL_EXT_blend_subtract
GL_EXT_clip_volume_hint
GL_EXT_compiled_vertex_array
GL_EXT_draw_range_elements
GL_EXT_fog_coord
GL_EXT_packed_pixels
GL_EXT_point_parameters
GL_ARB_point_parameters
GL_EXT_rescale_normal
GL_EXT_secondary_color
GL_EXT_separate_specular_color
GL_EXT_stencil_wrap
GL_EXT_texgen_reflection
GL_EXT_texture3D
GL_EXT_texture_compression_s3tc
GL_EXT_texture_cub
GL_MAX_TEXTURE_SIZE: 1024
GL_MAX_ACTIVE_TEXTURES_ARB: 6

The current situation differs from the one we witnessed a year ago when the R100 was released. The technological advantage is obvious, but it is not as great as it was between the R100 and the NV15 (GeForce2). But with a similar pipeline configuration, the chip should be more efficient since it operates at a higher clock speed.

Well, it's time to start climbing our mountain. First of all, let's take a gander at the ATI RADEON 8500 video card.

Card

The senior model of ATI's gaming cards has the same name as the graphics processor.

The card has an AGP 2x/4x interface and 64 MBytes of DDR SDRAM in 8 chips located on the front and back sides of the PCB. The layout is, in fact, very close to that of the RADEON 64 MBytes DDR.

Hynix (formerly Hyundai Semiconductor) produces the memory chips, which have a 3.6 ns access time, corresponding to 277 (554) MHz.

The memory operates at 275 (550) MHz, but the chips have no heatsinks. The lack of them is a peculiarity of the whole RADEON 7500/8500 series. While NVIDIA cards require obligatory cooling of their memory chips, which at the same time work at a frequency much lower than the rated one (take, for example, 230 MHz with a 3.8 ns access time, which corresponds to 263 MHz), the ATI engineers equipped the chip with an excellent "ecological" memory controller.
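
The rated frequencies quoted above come from the access time: a chip with a cycle of t nanoseconds is rated for 1000 / t MHz, and DDR doubles the effective rate. A small sketch of this conversion, and of how close to the rating each card runs its memory (the helper names are ours):

```python
def rated_mhz(access_time_ns: float) -> float:
    """Rated clock in MHz for a memory chip with the given access time."""
    return 1000.0 / access_time_ns


def load_percent(actual_mhz: float, access_time_ns: float) -> float:
    """How much of the rated frequency the card actually uses, in percent."""
    return actual_mhz / rated_mhz(access_time_ns) * 100.0


# (actual memory clock in MHz, access time in ns) as quoted in the text
for label, mhz, ns in (("RADEON 8500", 275, 3.6), ("NVIDIA example", 230, 3.8)):
    print(f"{label}: rated {rated_mhz(ns):.1f} ({2 * rated_mhz(ns):.1f} DDR) MHz, "
          f"running at {load_percent(mhz, ns):.0f}% of the rating")
```

The 3.6 ns chips come out at about 277.8 MHz, so the RADEON 8500 runs its memory at almost the full rating, while the 230 MHz example uses under 90% of what 3.8 ns chips are rated for.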

The ATI RADEON 7500 card, which we studied some time ago, also has fast 4 ns memory. According to some sources, the memory controller of the RADEON 7500 is the same as in the RADEON 8500. And although the memory works at 230 MHz, the same controller lets it do without any cooling (the GeForce2 Ultra cards, for example, have memory chips which warm up considerably). Moreover, the memory of the RADEON 7500 has a greater overclocking potential than that of the GeForce2 Ultra.

Now let me compare the design of both cards: RADEON 7500 (on the top) and RADEON 8500 (on the bottom):

First of all, the cards have different positions of the DVI and VGA connectors. Besides, the RADEON 8500 has a RAGE THEATER chip which is in charge of VIVO (Video In Video Out). Here only the Video Out is enabled. I think ATI refused to equip the 3D accelerator with a full set of VIVO multimedia functions in order to separate strictly the capabilities of the All-In-Wonder series from those of other cards (otherwise an All-In-Wonder would differ from a usual video card with VIVO only in its TV tuner, and it would be unprofitable to produce it). Now the complete VIVO capabilities are available only on All-In-Wonder cards.

The cooler of the Radeon 8500 processor is glued on rather than mounted on the PCB. ATI cards have no holes for attaching coolers, so if the fan fails you have to break it off the chip (which is dangerous, as you can damage the chip and the card), find new thermal grease and clean the surface before installing a new heatsink and fan, risking erasing the marking on the GPU:

When we removed the cooler, the dense layer of glue let us make out only "RADEON 8500". The revision, however, is known, as PowerStrip reported the A13 stepping.

As you know, the RADEON 8500 is equipped with a RAGE THEATER coprocessor which controls multimedia functions. The card has complete multi-monitor support, i.e. it offers all the features of the 7500, including a TV-out that works excellently independently of the monitor. But while the 7500 implements it through the GPU, in the RADEON 8500 it is the RAGE THEATER that controls the TV-out.

Most video streams are recorded in interlaced mode: each frame is split into two fields, one of even lines and one of odd lines. On monitors and TV screens with interlacing support the first field is drawn first, and then the second field in the second pass. But modern monitors do not use interlacing, which is why deinterlacing is needed.

There are two methods of doing it: Bob and Weave. In the first case two frames are built: one from the odd lines and the other from the even ones, with each line copied twice. This approach is good for video with intensive motion. Here is an example:

The other approach, Weave, is suitable for still frames. Here the lines of the two fields are interleaved, which results in one frame with twice the vertical resolution. Here is an example:
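
Both methods are easy to express in code. The sketch below is only an illustration of the two ideas described above, not the way any driver implements them; a frame is modelled as a plain list of rows and all function names are ours.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its even-line and odd-line fields."""
    return frame[0::2], frame[1::2]


def bob(field):
    """Bob: build a full-height frame from one field by copying each line twice."""
    out = []
    for row in field:
        out.append(row)
        out.append(row)
    return out


def weave(even_field, odd_field):
    """Weave: interleave the lines of two fields into one full-height frame."""
    out = []
    for even_row, odd_row in zip(even_field, odd_field):
        out.append(even_row)
        out.append(odd_row)
    return out


frame = ["row0", "row1", "row2", "row3"]
even, odd = split_fields(frame)
print(bob(even))          # first output frame, built from the even lines only
print(bob(odd))           # second output frame, built from the odd lines only
print(weave(even, odd))   # a single frame with the full vertical resolution
```

Bob produces two frames per interlaced frame, so motion stays smooth at the cost of vertical detail. Weave keeps all the detail, but if the two fields were captured at different moments, moving objects show comb-like artifacts.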

ATI offers its own method of per-pixel deinterlacing with a much higher image quality:

The quality of the TV-out and video playback of the RADEON 8500 is among the best.

Like the RADEON 7500, the 8500 card can display an image on two screens since the chip has two integrated CRTC modules (and a transmitter for digital monitors). The secondary RAMDAC (it sits between the GPU and the RAGE THEATER on the photo) turned out to be an external 10-bit chip working at a maximum of 240 MHz. Though this is not much, the frequency is still enough for 1600X1200 at 100 Hz on the second monitor. All peculiarities of the RADEON 7500, including the HydraVision technology, also apply to the 8500 (see the RADEON 7500 review for details on displaying an image on two screens).

The video card ships in both OEM and Retail packages. In the box you can find:

User Manual;
CD with drivers and utilities;
CDs with games and demo-products;
S-Video-to-RCA adapter;
DVI-to-VGA adapter;
S-Video, RCA cables.

Overclocking

With additional cooling, the card worked flawlessly at 310/295 (590) MHz for the core and memory. I think ATI's new chip has excellent potential. The memory didn't speed up much. Nevertheless, given that the card is well balanced, an increase of the GPU frequency is preferable.
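
To put these numbers into perspective, here is the relative gain over the stock 275/275 MHz clocks from the table at the beginning of the article (a trivial sketch, the function name is ours):

```python
def gain_percent(stock_mhz: float, overclocked_mhz: float) -> float:
    """Relative frequency gain of an overclock, in percent."""
    return (overclocked_mhz - stock_mhz) / stock_mhz * 100.0


print(f"core:   +{gain_percent(275, 310):.1f}%")
print(f"memory: +{gain_percent(275, 295):.1f}%")
```

The core gains roughly 13%, while the memory gains only about 7%.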

Installation and drivers

Test system:

Athlon based system:

AMD Athlon 1400 MHz;
Chaintech 7KJD (AMD760);
512 MBytes DDR SDRAM PC2100;
IBM DTLA HDD, 45 GBytes;
OS Windows 98 SE.

Pentium 4 based system:

Intel Pentium 4 1500 MHz;
ASUS P4T (i850);
512 MBytes RDRAM PC800;
Quantum FB AS HDD, 20 GBytes;
OS Windows 98 SE.

The test systems were also supplemented with ViewSonic P810 (21") and ViewSonic P817 (21") monitors.

For the tests we used the ATI 7.191 drivers. For comparison we used the results of NVIDIA GPU based cards obtained with the NVIDIA 21.85 drivers. The competitors are:

NVIDIA GeForce3 Ti 500 (240/250 (500) MHz, 64 MBytes DDR);
NVIDIA GeForce3 Ti 200 (175/200 (400) MHz, 64 MBytes DDR);
ABIT Siluro GF3 VIO (GeForce3, 200/230 (460) MHz, 64 MBytes DDR).

The VSync was disabled in the drivers of all cards.

Attention!!! You all know that such powerful video cards are meant for normal operation in 32-bit color. We omit the analysis of the results obtained in 16-bit color, as the latest generation (GeForce3/RADEON 8500) has given up on it.

[ Part II ]
