The Same Candy in a New Wrapper and Probably Sweeter... ATI RADEON X1950 XTX (R580+)

Part 1. Theory and architecture

ATI has again raised the index in a product name, and again by 50: the RADEON X1900 is now followed by the X1950. The older models are not discontinued right away; they simply go down in price. Later on the X1900 XTX will just disappear.

We can already guess that the new name comes with only minor modifications. Still, it's an important event on the whole: first, it brings GDDR4 memory to the market; second, it introduces a modified cooling system that is more efficient and finally not so noisy under heavy load. But let's not put the cart before the horse.

The R580+ is a minor modification of the R580 chip. It has very few changes and no major innovations. The only significant modification is to the memory controller, where some bugs concerning operation with the new GDDR4 memory type were fixed. The updated memory controller in the R580+ now supports three memory types: DDR2, GDDR3, and GDDR4. According to ATI, the R580+ also has some other minor changes: some caches are larger, and HyperZ now supports 2560x1600. Everything else is the same: the number of transistors; the number of pixel, texture, and vertex processors; and the process technology. Many sources assumed that the R580+ would be manufactured on the 80nm process to reduce chip cost and power consumption, and probably to raise frequencies in new products. But it was not to be. Perhaps the new 80nm process will be used for the next-generation chips (R600) and for chips in other price segments to be launched between the R580+ and the next generation.

As the R580+ is nearly a complete copy of the R580, which was in turn a modified R520 with no major innovations either, we recommend reading the corresponding reviews: RADEON X1800 (R520) and RADEON X1900 (R580). This article focuses primarily on the small differences between the R580+ and the R580.

Official RADEON X1950 Specifications

Codename: R580+
Process technology: 90 nm
384 million transistors
Flip-chip package (flipped chip without a metal cap)
256bit memory interface
Up to 1 GB of DDR2, GDDR3 or GDDR4 memory
PCI-Express x16
48 pixel processors
16 texture units
8 vertex processors
Calculating, blending, and writing up to 16 full (color, depth, stencil buffer) pixels per clock
Precision of vertex and pixel calculations — FP32
Support for SM 3.0, including dynamic branching in pixel and vertex processors. The only limitation is the lack of vertex texture fetch
Efficient implementation of jumps and dynamic branching in pixel processors
Support for rendering into an FP16 frame buffer, including blending and multisampling operations; a new integer frame buffer format, RGBA (10:10:10:2), is added for higher-quality rendering without using FP16
Support for FP16 textures, including texture compression for FP16 textures, including the 3Dc+ technology. Hardware filtering for FP16 texture fetch is not supported
Fetching four neighboring texture samples instead of one per clock in case of no filtering (it accelerates filtering, programmed in a pixel shader, for example, for FP16 format)
High-quality anisotropic filtering algorithm; the user can choose between faster and higher-quality anisotropy options; improved trilinear filtering
Support for two-sided stencil buffer
MRT rendering (Multiple Render Targets — rendering into several buffers)
Memory controller with a 512-bit internal ring bus (two 256-bit rings, one in each direction), 4 memory channels, programmable arbitration
Efficient caching and more efficient HyperZ (according to ATI, onchip HyperZ buffers are again increased compared to the R580)
2 x 400 MHz RAMDAC
2 x DVI Dual Link supporting HDCP and HDMI
TV-Out and TV-In, HDTV-Out
The latest generation of the GPU's hardware video unit, responsible for compressing, decompressing, and post-processing video data, with hardware decode acceleration of H.264, the most advanced video format
2D accelerator supporting all GDI+ functions
ATI CrossFire support

Specifications on the RADEON X1950 XTX reference card

Core frequency: 650 MHz
Effective memory frequency: 2.0 GHz (2*1000 MHz)
Memory type: GDDR4, 0.91ns (standard frequency: up to 2*1100 MHz)
Memory: 512 MB
Memory bandwidth: 64.0 GB/sec
Theoretical fillrate: 10.4 gigapixel per second
Theoretical texture sampling rate: 10.4 gigatexel per second
2 x DVI-I (Dual Link, 2560x1600 video output)
PCI-Express 16x bus
TV-Out, HDTV-Out, HDCP support
Power consumption: more than 100 W, about the same as the RADEON X1900 XTX
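
The fillrate and texture sampling figures in this list follow directly from the unit counts and the core clock. Here is a minimal sketch of that arithmetic, assuming the usual convention of one pixel written per clock per output unit and one texel fetched per clock per texture unit (the function and constant names are ours, chosen for illustration):

```python
def theoretical_rate_giga(units: int, core_mhz: float) -> float:
    """Peak rate in giga-operations per second: units * clock (MHz) / 1000."""
    return units * core_mhz / 1000.0

# Figures taken from the specification lists in this article.
PIXELS_PER_CLOCK = 16   # full pixels written per clock
TEXTURE_UNITS = 16      # texture units
CORE_MHZ = 650          # X1950 XTX core frequency

fillrate = theoretical_rate_giga(PIXELS_PER_CLOCK, CORE_MHZ)
texel_rate = theoretical_rate_giga(TEXTURE_UNITS, CORE_MHZ)
print(f"{fillrate:.1f} gigapixels/s, {texel_rate:.1f} gigatexels/s")
```

These are peak values only; real throughput depends on memory bandwidth and the rendering mode.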

Specifications on the RADEON X1950 CrossFire Edition reference card

Core frequency: 650 MHz
Effective memory frequency: 2.0 GHz (2*1000 MHz)
Memory type: GDDR4, 0.91ns (standard frequency: up to 2*1100 MHz)
Memory: 512 MB
Memory bandwidth: 64.0 GB/sec
Theoretical fillrate: 10.4 gigapixel per second
Theoretical texture sampling rate: 10.4 gigatexel per second
1 x DVI-I (Dual Link, supporting 2560x1600)
PCI-Express 16x bus
CrossFire
Power consumption: more than 100 W, just like the RADEON X1900 XTX

As we can see, the specifications of the R580+ and RADEON X1950 XTX are almost a complete copy of those of the R580 and RADEON X1900 XTX. The only difference from ATI's previous top model is GDDR4 memory. The chip clock remains the same: the RADEON X1950 XTX runs at 650 MHz, just like the RADEON X1900 XTX. But the frequency of local video memory has been raised to 1000(2000) MHz, a value unachievable in the recent past. Such a high frequency is possible only thanks to the new memory type. The RADEON X1950 XTX reference card uses GDDR4 memory chips with 0.91ns access time, which corresponds to 1100(2200) MHz, a tad higher than the operating frequency of the card under review.

GDDR4 (Graphics Double Data Rate, Version 4) is a new generation of "video" memory designed for 3D video cards; in its fastest announced grades it is nearly twice as fast as GDDR3. The key differences between GDDR4 and GDDR3 that matter most to users are higher operating frequencies (hence higher bandwidth) and reduced power consumption. Technically, GDDR4 is not that much different from GDDR3; it's just another development stage. This significantly simplifies adapting the existing chips and designing future products for the new memory type. RADEON X1950 XTX cards are the first models with GDDR4 chips. NVIDIA plans to launch products with such memory a tad later. They will most likely be video cards based on the NVIDIA G80.

The new memory type was designed by Samsung and Hynix in cooperation with ATI, which directed the development within JEDEC. Both companies already manufacture GDDR4 chips, but only Samsung has started mass production. Shipments of these chips to video card manufacturers started not long ago. Memory chips operating at up to 1.2(2.4) GHz were launched in June. The company also announced 1.6(3.2) GHz chips, twice as fast as the fastest GDDR3 chips. Samsung presently manufactures three types of GDDR4 chips: 0.71ns, 0.83ns, and 0.91ns ones, with operating frequencies from 1100(2200) MHz to 1400(2800) MHz. For now this memory is manufactured in limited amounts; let's hope that the current problems with GDDR4 availability will be solved.
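
The nanosecond ratings and the frequencies quoted for these chips are two views of the same number. A small sketch of the conversion, assuming the "ns" rating is the minimum clock period and that the effective (DDR) rate is twice the real clock (the function name is ours):

```python
def clock_mhz_from_cycle_ns(cycle_ns: float) -> float:
    """Real clock in MHz for a given minimum cycle time in nanoseconds."""
    return 1000.0 / cycle_ns

# The three Samsung GDDR4 grades mentioned in the article.
for ns in (0.71, 0.83, 0.91):
    real = clock_mhz_from_cycle_ns(ns)
    print(f"{ns} ns -> {real:.0f} MHz real, {2 * real:.0f} MHz effective")
```

The results land slightly off the round figures in the text (about 1099 MHz for the 0.91ns grade, for example), because the marketing frequencies are rounded.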

The advantages of the new memory chips over GDDR3 are not limited to performance: their power consumption is approximately 30-40% lower. Lower power consumption of GDDR4 makes it possible either to relax the requirements for power supply and cooling or to raise GPU power consumption while keeping the power consumption of the video card as a whole unchanged. Part of the saving comes from the lower VDD voltage of GDDR4, 1.5 V. But the early chips installed on RADEON X1950 cards use 1.8 V, the same voltage as GDDR3, and the fastest solutions may use 1.9 V. That's why the X1950 XTX currently consumes no less than the X1900 XTX, even though GDDR4 is potentially less demanding on the power supply than the previous version of video memory.
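
To get a rough feel for how much the voltage alone could matter, here is a first-order estimate. It assumes the textbook CMOS relation that dynamic power scales with the square of the supply voltage at a fixed clock and capacitance. That relation is our assumption, not a figure from ATI or Samsung, and it ignores I/O termination and leakage:

```python
def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Dynamic power ratio at equal clock and capacitance: (v_new / v_old)^2."""
    return (v_new / v_old) ** 2

# 1.5 V is the nominal GDDR4 VDD; 1.8 V is what the early chips (and GDDR3) use.
ratio = relative_dynamic_power(1.5, 1.8)
saving = 1.0 - ratio
print(f"1.5 V vs 1.8 V: about {saving:.0%} less dynamic power")
```

Under this simplification the drop from 1.8 V to 1.5 V gives a saving of roughly 30%, which is in line with the quoted range and also suggests why chips running at 1.8 V show no such benefit.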

The increased memory frequency results in higher memory bandwidth: 64 GB/s for the RADEON X1950 XTX, which is higher than in any other single-GPU video card. For example, the NVIDIA GeForce 7900 GTX offers 51.2 GB/s, and the GeForce 7800 GTX 512MB, equipped with the fastest GDDR3 memory, offers 54.4 GB/s. GDDR4 memory gives the RADEON X1950 XTX a nearly 30% advantage in bandwidth over the previous top model from ATI. This should allow the new solution to outperform the X1900 XTX by about 15% under heavy load on video memory, such as high resolutions with antialiasing enabled. We shall check this in the practical part of this article.
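
The bandwidth figures can be reproduced from the bus width and the effective memory frequency. A minimal sketch, assuming 1 GB = 10^9 bytes and taking the X1900 XTX memory clock of 775(1550) MHz quoted elsewhere in this article (the function name is ours):

```python
def bandwidth_gb_s(bus_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer * transfers per second."""
    return bus_bits / 8 * effective_mhz / 1000.0

x1950_xtx = bandwidth_gb_s(256, 2000)   # GDDR4 at 1000(2000) MHz
x1900_xtx = bandwidth_gb_s(256, 1550)   # GDDR3 at 775(1550) MHz
gain = x1950_xtx / x1900_xtx - 1.0

print(f"X1950 XTX: {x1950_xtx:.1f} GB/s, X1900 XTX: {x1900_xtx:.1f} GB/s")
print(f"Bandwidth advantage: {gain:.0%}")
```

This gives 64.0 GB/s against 49.6 GB/s, an advantage of about 29%.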

This time the CrossFire modification of the card differs less from the regular model. GPU and memory frequencies are the same in both. The only difference is a single DVI and a special CrossFire connector instead of two DVIs and a TV-out. Another interesting fact: the recommended prices for the two models are also identical, both video cards will come at $449.

In the next part of our review we shall use synthetic tests to see the effect of the increased memory bandwidth. We shall compare the RADEON X1900 XTX and RADEON X1950 XTX at their standard frequencies to see how performance changes in various modes due to the increased memory frequency. We shall pay special attention to the comparison between the RADEON X1900 XTX at standard frequencies and the RADEON X1950 XTX at the X1900 XTX frequencies, that is, with its video memory frequency reduced to 775(1550) MHz. This comparison will let us confirm that the chip has no major changes and show how GDDR4 performs versus GDDR3 at the same frequencies. The RADEON X1950 XTX may well be outperformed by its predecessor in some cases because of higher GDDR4 memory latencies. We shall check this in synthetic and gaming tests, of course.

R580+ Architecture

We again recommend reading the theoretical part of the R520 review, so this article will not cover the architecture in the same detail. Here we'll just sum up the features of the new ATI chips. Let's have a look at the diagram offered by the company:

The diagram is taken from the article about the R580, which differs from the R520 in the number of pixel processors but contains the same number of texture processors (four quads, that is 16 textures per cycle). The R580+ is actually no different. The chip contains eight standard ATI vertex processors (Vertex Shader Processors on the diagram) that meet SM 3.0 requirements, except for texture fetch. The ALU of each vertex processor can execute two different operations simultaneously: one over three vector components and another over the fourth component or a scalar.

The pixel architecture differs from that used by the main competitor: texture units are brought outside the common pipeline. There is no common long pipeline here that processes quads. The texture part, consisting of texture address units and TMUs, exists separately. Pixel processors that perform arithmetic operations and register arrays are also independent. A special Ultra Threading Dispatch Processor controls the execution process: 512 quads are processed simultaneously, and each one can be at a different shader execution stage. Each quad is stored together with its status, current shader command, and the values of previously checked conditions. The processor constantly monitors free resources (texture and pixel units) and directs queued quads to free units. If a quad did not pass a condition and should not be processed by a given part of a shader, it skips the unnecessary commands and does not load a texture or pixel unit. If a quad is waiting for data from a texture unit, it lets other quads go ahead, and they keep the pixel units busy.
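
The benefit of keeping many quads in flight can be illustrated with a toy simulation. This is only a sketch of the general idea, not a model of ATI's hardware: the texture latency, the unit counts, the "oldest quad first" policy, and all names are hypothetical.

```python
from dataclasses import dataclass

TEX_LATENCY = 4  # hypothetical texture fetch latency, in cycles

@dataclass
class Quad:
    program: str       # 'T' = texture fetch, 'A' = arithmetic (ALU) op
    pc: int = 0        # index of the next instruction to issue
    ready_at: int = 0  # first cycle at which the next instruction may issue

def run(quads, alu_units=1, tex_units=1):
    """Return the cycle at which every quad has finished its program."""
    cycle = 0
    while any(q.pc < len(q.program) for q in quads):
        free = {"A": alu_units, "T": tex_units}  # issue slots this cycle
        for q in quads:
            if q.pc >= len(q.program) or q.ready_at > cycle:
                continue  # finished, or still waiting for data
            op = q.program[q.pc]
            if free[op] == 0:
                continue  # unit already taken this cycle; try another quad
            free[op] -= 1
            q.pc += 1
            q.ready_at = cycle + (TEX_LATENCY if op == "T" else 1)
        cycle += 1
    return max(q.ready_at for q in quads)

# Four quads, each doing one texture fetch followed by two ALU ops.
serial = sum(run([Quad("TAA")]) for _ in range(4))   # one quad at a time
overlapped = run([Quad("TAA") for _ in range(4)])    # all four in flight
print(serial, overlapped)
```

With these made-up parameters the one-at-a-time case takes 24 cycles and the overlapped case 12: while one quad waits for its texture data, the dispatcher issues work from the others, so the ALU is idle far less often.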

The Same Candy in a New Wrapper and Probably Sweeter... ATI RADEON X1950 XTX (R580+). Part 2: Video card's features, synthetic tests

Andrey Vorobiev (anvakams@ixbt.com)
Alexei Berillo (sbe@ixbt.com)

August 29, 2006
