Graphics Card Buyer's Guide

Version 1.2, 29.02.2008

Key features of a graphics card

Modern GPUs consist of many units, whose number and features determine rendering speed and, in turn, how smoothly games run. Comparing the number of these units across graphics cards gives a rough idea of how fast a given GPU is. Graphics processors have many features; this section covers the most important ones.

GPU clock rate

GPU operating frequency is measured in megahertz, that is, millions of cycles per second. It directly affects GPU performance: the higher the frequency, the more vertices and pixels a graphics processor can process per second. For example, the GPU clock rate of the RADEON X1900 XTX is 650 MHz, while the same processor in the RADEON X1900 XT operates at 625 MHz, so all major performance characteristics differ accordingly (by about 4%). But clock rate is not the only factor that determines performance. Performance also depends heavily on architecture: the number of execution units, their characteristics, etc.

In recent designs, individual GPU units may run at clock rates that differ from the clock rate of the chip as a whole. This is done to increase efficiency, as some units can operate at higher frequencies while others cannot. A recent example is the NVIDIA GeForce 8800: the GTS graphics processor operates at 512 MHz, but its shader units run at a much higher 1200 MHz.

Fill rate

Fill rate shows how fast a graphics processor can draw pixels. There are two types of fill rate: pixel fill rate and texel rate. The former characterizes the speed of drawing pixels and depends on the operating frequency and the number of ROP (raster operations pipeline) units. The latter is the texture fetch rate, which depends on the operating frequency and the number of texture units.

For example, the GeForce 7900 GTX pixel fill rate is 650 (GPU clock rate in MHz) * 16 (ROPs) = 10400 megapixels per second. Its texel rate is 650 * 24 (texture units) = 15600 megatexels per second. The greater the first value, the faster a graphics card draws pixels; the greater the second, the faster it fetches textures. Both are important in modern games, but they must be balanced. That's why G7x GPUs, used in GeForce 7 cards, have fewer ROPs than texture and pixel units.
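The arithmetic above can be written as a short Python sketch. The function name `fill_rates` and its parameters are illustrative only; the figures are the GeForce 7900 GTX values quoted in the text, and the results are theoretical peaks rather than measured performance.

```python
def fill_rates(clock_mhz, rops, texture_units):
    """Return theoretical peak (pixel fill rate, texel rate).

    Results are in megapixels/s and megatexels/s, because the clock
    is given in MHz and each unit is assumed to handle one pixel or
    texel per cycle.
    """
    pixel_fill = clock_mhz * rops
    texel_rate = clock_mhz * texture_units
    return pixel_fill, texel_rate


# GeForce 7900 GTX: 650 MHz, 16 ROPs, 24 texture units
pixel_fill, texel_rate = fill_rates(650, 16, 24)
print(pixel_fill, texel_rate)  # 10400 15600
```

The same function can be applied to any two cards of one series to compare their theoretical fill rates side by side.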

The number of pixel shaders (or pixel processors)

Pixel processors are important GPU units that execute special programs known as pixel shaders. The number of pixel processors and their frequency can be used to compare the shader performance of graphics cards. As most games are now limited by pixel shader execution performance (read our reviews of game technologies), the number of these units is very important! If one graphics card has a GPU with 8 pixel processors and another card from the same series has 16, the second card will process pixel shaders twice as fast (all other things being equal), so it will be faster overall. But you cannot draw final conclusions from this number alone. You must also consider clock rates and the different architectures used in various GPU generations. These values can only be used to compare processors of the same series from the same vendor: AMD(ATI) or NVIDIA. In other cases you should look at performance results in the games you are interested in.

The number of vertex shader units (vertex processors)

Like the units in the previous subsection, these units execute shaders, but in this case vertex shaders. This feature matters for some games, but not as much as the previous one, because modern games do not load vertex processors even to half of their capacity. As vendors balance the numbers of the various units and do not let them get out of proportion, you can practically ignore the number of vertex processors in a card. Consider it only when everything else is equal.

The number of unified shader units (unified processors)

Unified processors combine both unit types mentioned above. They can execute vertex and pixel shaders (as well as the geometry shaders introduced in DirectX 10). The unified architecture was first used in the Microsoft Xbox 360, whose GPU was designed by ATI. In PC GPUs, unified processors are fairly new; they were introduced with the NVIDIA GeForce 8800. To all appearances, all DirectX 10 compatible GPUs will be based on this kind of unified architecture. Unifying the shader units means that all shader code (vertex, pixel, and geometry) is universal, so unified processors can execute any of the shaders mentioned above. Thus, in new architectures the numbers of pixel, vertex, and geometry units are merged into a single value - the number of unified processors.

Texture Mapping Units (TMU)

These units work together with all types of shader processors, fetching and filtering the texture data needed to render a scene. The number of texture units in a graphics processor determines texturing performance, that is, the texture fetch rate. Although most computing is done by shader units these days, the load on TMUs is still rather heavy. As some games are limited by TMU performance, the number of TMUs and the resulting texturing performance remain important GPU parameters. They especially affect performance in modes with trilinear and anisotropic filtering enabled, which require additional texture fetches.

Raster OPeration units (ROP)

ROPs write computed pixels into buffers and blend them. As noted above, ROP performance affects the fill rate, one of the key features of any graphics card. Although its importance has declined a little of late, there are still situations in which performance depends heavily on the speed and number of ROPs (read our technological game reviews). This can almost always be explained by heavy use of post-processing filters and by antialiasing enabled at high-quality game settings.

Note again that modern GPUs cannot be evaluated solely by the number of their units and their frequencies. Each GPU series uses a new architecture, whose units differ considerably from older versions. The proportion of different units may vary as well. ATI was the first to use an architecture with far more pixel processors than texturing units. In our opinion, it was a premature step. But some applications use pixel processors more actively than other units, so they benefit from this design, to say nothing of future applications. Besides, the penultimate architecture from AMD(ATI) has no separate pixel pipelines: pixel processors are not "tied" to TMUs. The NVIDIA GeForce 8800, however, is even more complex...

Let's analyze the case of the GeForce 7900 GT and GeForce 7900 GS. Both cards have the same operating frequencies, the same memory interface, and even identical GPUs. But the 7900 GS GPU has 20 active pixel shader and texturing units, while the 7900 GT has 24 units of each type. Consider how these two cards compare in Prey.

The 20% difference in the number of main execution units yields different performance gains at different test resolutions. The full 20% is unattainable, because Prey performance on these cards is not limited by pixel processors and TMUs alone. The difference in 1024x768 is smaller than 8%. In higher resolutions it reaches 12%, which is closer to the theoretical difference in unit numbers.

Video memory size

Graphics cards use on-board video memory to store the data the GPU needs: textures, vertices, buffers, etc. It seems that the more memory, the better. But it's not that simple. Judging video card performance by memory size is the most common mistake! Inexperienced users often overestimate the value of memory size and use it to compare different cards. That's understandable: they think that if a value (one of the first to be specified in all sources) is twice as high, performance should be twice as high as well. The reality is different. Increasing memory size boosts performance only up to a point, after which additional memory yields no gains.

Each game needs a certain amount of video memory to hold all its data. If you install, say, 4 gigabytes, the graphics card will not render the game any faster; its performance will be limited by the GPU units. Everything else being equal, a 320 MB video card will run at the same speed as a 640 MB card in almost all situations. There are cases in which a larger memory size results in noticeable performance gains, namely demanding games at high resolutions with maximum settings. But this happens rarely, so while you should take memory size into account, don't forget that the boost it provides is limited. There are more important parameters, such as memory bus width and frequency. Read about video memory sizes in the next sections of the article.

Memory bus width

Memory bus width is a very important feature that affects memory bandwidth. A wider bus transfers more data per second between video memory and the GPU, which has a positive effect on performance in most cases. Theoretically, a 128-bit bus can transfer twice as much data per cycle as a 64-bit one. In practice, the rendering performance difference is not twofold, but it often comes very close when performance is limited by video memory bandwidth.

Modern graphics cards use various bus widths, from 64 to 512 bits, depending on the price range and GPU release date. Low-End graphics cards mostly use 64-bit and (much more rarely) 128-bit buses. Mid-End cards use 128-bit and sometimes 256-bit buses. High-End cards use 256-bit to 512-bit buses.

Video memory frequency

Another parameter that affects memory bandwidth is memory frequency. As noted above, higher memory bandwidth directly benefits 3D performance. Memory frequency in modern graphics cards ranges from 500 MHz to 2000 MHz, a fourfold spread. As memory bandwidth depends on both memory frequency and bus width, 1000 MHz memory on a 256-bit bus provides more bandwidth than 1400 MHz memory on a 128-bit bus.
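The comparison above follows from the usual peak-bandwidth estimate: data rate multiplied by bus width in bytes. Here is a minimal Python sketch of it. The helper name `bandwidth_gb_s` is hypothetical, and the sketch assumes the quoted frequencies are effective data rates and that 1 GB is 10^9 bytes.

```python
def bandwidth_gb_s(effective_mhz, bus_width_bits):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes effective_mhz is the effective data rate, i.e. one
    transfer of bus_width_bits per effective cycle.
    """
    bytes_per_transfer = bus_width_bits / 8
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9


wide = bandwidth_gb_s(1000, 256)    # slower memory, wider bus
narrow = bandwidth_gb_s(1400, 128)  # faster memory, narrower bus
print(f"{wide:.1f} GB/s vs {narrow:.1f} GB/s")  # 32.0 GB/s vs 22.4 GB/s
```

Under these assumptions, doubling the bus width outweighs a 40% higher memory frequency.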

Let's compare the performance of graphics cards with different memory bandwidths, using the RADEON X1900 XTX and RADEON X1950 XTX as an example. These cards have nearly identical GPUs with the same features and frequencies. The key difference is memory type and frequency: GDDR3 at 775(1550) MHz and GDDR4 at 1000(2000) MHz, respectively.

The card with lower memory bandwidth lags behind, but the difference never reaches the theoretical 29%. The framerate difference grows with resolution, starting from 8% at 1024x768 and reaching 12-13% at maximum resolutions. Keep in mind that these graphics cards have only slightly different memory bandwidths. You should pay special attention to memory bus width and frequency when you buy an inexpensive card. Many of these come with only 64-bit interfaces, which strongly limits performance. In fact, we do not recommend buying 64-bit solutions for gaming at all.
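The theoretical 29% figure can be reproduced from the two effective memory frequencies, since both cards are described as otherwise nearly identical. The sketch below is illustrative: `percent_gain` is a hypothetical helper, and the observed values are simply the rounded framerate differences quoted above.

```python
def percent_gain(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1) * 100


# Effective memory frequencies: X1950 XTX (2000 MHz) vs X1900 XTX (1550 MHz)
theoretical = percent_gain(2000, 1550)
print(f"theoretical bandwidth gain: {theoretical:.0f}%")  # 29%

# Framerate differences quoted in the text, from low to high resolution
for observed in (8, 12, 13):
    share = observed / theoretical
    print(f"observed {observed}% is {share:.0%} of the theoretical gain")
```

Even at the highest resolutions, less than half of the extra bandwidth shows up as extra framerate, which is consistent with performance being limited by more than memory bandwidth.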

Memory types

Graphics cards can be equipped with different types of memory. We shall not consider old SDR memory, as you won't find it anymore. All modern types of DDR and GDDR memory transfer twice as much data per clock cycle as SDR, so the clock rate given in specification lists is usually doubled. For example, if DDR frequency is specified as 1400 MHz, the memory physically operates at 700 MHz. Vendors specify the so-called "effective frequency", the one at which SDR memory would have to run to provide the same bandwidth.
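The relationship between effective and physical frequency is a simple division by the number of transfers per clock cycle. A minimal Python sketch, with a hypothetical function name and the double data rate factor of 2 described above as the default:

```python
def physical_clock_mhz(effective_mhz, transfers_per_clock=2):
    """Convert a vendor-quoted effective frequency to the physical clock.

    transfers_per_clock is 2 for double data rate memory (DDR, GDDR),
    and would be 1 for SDR memory.
    """
    return effective_mhz / transfers_per_clock


print(physical_clock_mhz(1400))  # 700.0
```

Specifications that list both values usually write the physical clock first and the effective one in brackets.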

The main advantage of DDR2 memory is its ability to operate at high frequencies, hence the increased memory bandwidth compared to previous technologies. This comes at the cost of higher latencies, which are not very important for graphics cards. The first card to use DDR2 was the NVIDIA GeForce FX 5800 Ultra. In fact, it was equipped with GDDR2, which is not real DDR2 but a cross between DDR and DDR2. After the GeForce FX 5800, the next NVIDIA cards went back to DDR memory, but GDDR2 returned in the GeForce FX 5700 Ultra and some later Mid-End graphics cards. Video memory technologies have advanced much further since then. Vendors released GDDR3, which was close to DDR2 but specially modified for graphics cards.

GDDR3 uses the same technologies as DDR2, but has improved power consumption and heat dissipation, which makes it possible to build chips that operate at higher frequencies. Even though it was ATI that developed the standard, the first graphics card to use GDDR3 was the second modification of the NVIDIA GeForce FX 5700 Ultra, followed by the GeForce 6800 Ultra.

GDDR4 is the latest generation of video memory and is almost twice as fast as GDDR3. The key differences from GDDR3 that matter to users are, again, higher frequencies and lower power consumption (by approximately one third), the latter achieved by lowering the nominal voltage. Technically, GDDR4 does not differ much from GDDR3, being a further development of the same ideas. The first GDDR4-equipped graphics card was the RADEON X1950 XTX. NVIDIA hasn't launched any products with this memory type yet.

So, modern memory types (GDDR3 and GDDR4) differ from DDR in some parameters and provide data transfer rates twice as high. Such video memory uses special technologies that make higher operating frequencies possible. For example, GDDR2 memory usually operates at higher frequencies than DDR. GDDR3 works at even higher frequencies. And GDDR4 provides the highest frequency and bandwidth.


Alexei Berillo aka SomeBody Else (sbe@ixbt.com)

April 8, 2008
