iXBT Labs - Graphics Card Buying Guide - Page 4: Key features of a graphics card

Key features of a graphics card

Modern GPUs consist of many units, and their number and features determine the rendering speed that affects gameplay. Comparing the number of these units across graphics cards gives a rough idea of how fast a given GPU is. Graphics processors have many characteristics, and this section covers the most important ones.

GPU clock rate

GPU operating frequency is measured in megahertz (millions of cycles per second) and directly affects GPU performance: the higher it is, the more vertices and pixels the graphics processor can process per second. For example, the RADEON HD 4870 GPU runs at 750 MHz, while the same processor in the RADEON HD 4850 runs at 625 MHz. So all clock-dependent performance figures, such as fill rate and shader throughput, differ in the same proportion (750/625 = 1.2). But clock rate is not the only factor. Performance also depends heavily on architecture: the number of execution units, their characteristics, and so on.
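Since the HD 4870 and HD 4850 use the same chip, the clock ratio predicts the gap in clock-bound throughput figures. A minimal sketch, assuming identical unit counts and ignoring memory and other board differences; the constant and function names are illustrative:

```python
# Clock rates quoted in the text for the same GPU on two different cards.
HD4870_CLOCK_MHZ = 750
HD4850_CLOCK_MHZ = 625


def relative_throughput(clock_a_mhz: float, clock_b_mhz: float) -> float:
    """Clock-bound throughput of GPU A relative to GPU B.

    Assumes both cards use the same chip with the same number of units,
    so every per-clock figure (pixels, texels, shader ops) scales with clock.
    """
    return clock_a_mhz / clock_b_mhz


speedup = relative_throughput(HD4870_CLOCK_MHZ, HD4850_CLOCK_MHZ)
print(f"HD 4870 vs HD 4850: {speedup:.2f}x")  # HD 4870 vs HD 4850: 1.20x
```

Framerates in actual games can differ by less than this, because other parts of the system also limit performance.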

Some GPU units run at clock rates that differ from the rest of the chip. This improves efficiency, since some units can run at higher frequencies while others cannot. A recent example is the NVIDIA GeForce GTX series: the GTX 285 graphics processor runs at 648 MHz, while its shader units run at a much higher 1476 MHz.

Fill rate

Fill rate shows how fast a graphics processor can draw pixels. There are two types of fill rate: pixel fill rate and texel rate. The former characterizes the speed of drawing pixels and depends on the operating frequency and the number of ROP (raster operations pipeline) units. The latter is the texture fetch rate, which depends on the operating frequency and the number of texture units.

For example, the GeForce GTX 275's pixel fill rate is 633 MHz (GPU clock rate) × 28 (ROPs) = 17724 megapixels per second, and its texel rate is 633 × 80 (texture units) = 50640 megatexels per second. The higher the first value, the faster a graphics card draws pixels; the higher the second, the faster it fetches textures. Both matter in modern games, but they must be balanced. Since a typical pixel shader fetches several texels for each pixel it writes, modern GPUs have fewer ROPs than texture units.
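Both fill rates are simple products of clock rate and unit count. A sketch that reproduces the GTX 275 numbers; the one-result-per-unit-per-clock assumption is a simplification, and the function names are hypothetical:

```python
def pixel_fill_rate(clock_mhz: int, rops: int) -> int:
    """Pixel fill rate in megapixels/s, assuming each ROP outputs one pixel per clock."""
    return clock_mhz * rops


def texel_rate(clock_mhz: int, tmus: int) -> int:
    """Texel rate in megatexels/s, assuming each TMU fetches one texel per clock."""
    return clock_mhz * tmus


# GeForce GTX 275 figures from the text.
GTX275_CLOCK_MHZ, GTX275_ROPS, GTX275_TMUS = 633, 28, 80

print(pixel_fill_rate(GTX275_CLOCK_MHZ, GTX275_ROPS), "Mpixels/s")  # 17724 Mpixels/s
print(texel_rate(GTX275_CLOCK_MHZ, GTX275_TMUS), "Mtexels/s")       # 50640 Mtexels/s
```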

The number of pixel shaders (or pixel processors)

Pixel processors are important GPU units that execute special programs known as pixel shaders. The number of pixel processors and their clock rate can be used to compare the shader performance of graphics cards. Since most games are now limited by pixel shader performance, the number of these units is very important! If one graphics card has a GPU with 8 pixel processors and another card from the same series has 16, the second card will process pixel shaders twice as fast, all other things being equal, and will therefore run faster in shader-limited games. But you cannot draw final conclusions from this number alone. You must also consider clock rates and the different architectures used in different GPU generations. These values are comparable only between processors of the same series from the same vendor: AMD (ATI) or NVIDIA. In other cases, look at performance results in the games you are interested in.
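A sketch of the same-series comparison described above, assuming peak shader throughput is proportional to processor count × shader clock. The `Gpu` class and all card figures are hypothetical; card C shows how a lower clock erodes the advantage of having more processors:

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    pixel_processors: int
    shader_clock_mhz: float  # may differ from the core clock (see the GTX 285 example)


def relative_shader_throughput(a: Gpu, b: Gpu) -> float:
    """Peak pixel shader throughput of `a` relative to `b`.

    Only meaningful for GPUs of the same series and vendor, where one
    processor does the same amount of work per clock on both chips.
    """
    return (a.pixel_processors * a.shader_clock_mhz) / (
        b.pixel_processors * b.shader_clock_mhz
    )


# Hypothetical cards from one series; the clock rates are made up for illustration.
card_a = Gpu("card A", pixel_processors=8, shader_clock_mhz=500)
card_b = Gpu("card B", pixel_processors=16, shader_clock_mhz=500)
card_c = Gpu("card C", pixel_processors=16, shader_clock_mhz=400)

print(relative_shader_throughput(card_b, card_a))  # 2.0
print(relative_shader_throughput(card_c, card_a))  # 1.6
```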

The number of vertex shader units (vertex processors)

Like the units in the previous subsection, these units execute shaders, but vertex shaders. This parameter matters for some games, but much less than the previous one, because modern games rarely load vertex processors even to half their capacity. Since vendors balance the numbers of the various units so that none of them becomes a bottleneck, you can largely ignore the number of vertex processors and consider it only when everything else is equal.

The number of unified shader units (unified processors)

Unified processors combine both unit types mentioned above. They can execute vertex, pixel, and geometry programs (as well as the other types that will appear in DirectX 11). The unified architecture was first used in the Microsoft Xbox 360, whose GPU was designed by ATI. Among PC GPUs, unified processors first appeared in the NVIDIA GeForce 8800. All DirectX 10 compatible GPUs are based on this kind of unified architecture. Unification means that vertex, pixel, and geometry shaders all run on the same universal units, so any unified processor can execute any of these shaders. Thus, in new architectures the separate numbers of pixel, vertex, and geometry units are merged into a single value: the number of unified processors.
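A toy model of why unification helps, under loudly simplified assumptions: work is measured in abstract unit-cycles, the split design's frame time is set by whichever pool of units finishes last, and the unified design balances load perfectly. The unit counts and the workload are hypothetical. In this pixel-heavy frame the eight vertex units are busy only a third of the time, which matches the earlier point that games rarely load vertex processors even by half:

```python
def split_time(vertex_work: float, pixel_work: float,
               vertex_units: int, pixel_units: int) -> float:
    """Cycles per frame with separate vertex and pixel units:
    the slower of the two pools determines the frame time."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_time(vertex_work: float, pixel_work: float, units: int) -> float:
    """Cycles per frame when every unit can run either shader type.
    Assumes ideal load balancing, which real schedulers only approximate."""
    return (vertex_work + pixel_work) / units


# Hypothetical pixel-heavy frame: 10% vertex work, 90% pixel work.
v, p = 1_000, 9_000
print(split_time(v, p, vertex_units=8, pixel_units=24))  # 375.0
print(unified_time(v, p, units=32))                       # 312.5
```

With the same 32 units in total, the unified design finishes this frame in 312.5 cycles instead of 375. Because real load balancing is imperfect, that gap is the best case for this model, not a prediction.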

Texture Mapping Units (TMU)

These units work with all types of shader processors, fetching and filtering the texture data needed to render a scene. The number of texture units in a graphics processor determines texturing performance, that is, the texture fetch rate. Although most computing is now done by shader units, the load on TMUs is still rather heavy, and some games are limited by TMU performance. So the number of TMUs and the resulting texturing performance remain important GPU parameters. This parameter especially affects performance with trilinear and anisotropic filtering enabled, since these modes require additional texture fetches.
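A simplified cost model for these extra fetches. It assumes each TMU returns one bilinearly filtered sample (four texels) per clock, trilinear filtering takes two such samples (one per mip level), and N× anisotropic filtering takes up to N trilinear probes in the worst case. It also treats the GTX 275 texel rate from the fill-rate example as bilinear samples per second. Real hardware varies, and the function names are hypothetical:

```python
def bilinear_samples_per_lookup(mode: str, aniso: int = 1) -> int:
    """Bilinear samples needed for one filtered texture lookup (simplified, worst case)."""
    if mode == "bilinear":
        return 1  # 4 texels from one mip level
    if mode == "trilinear":
        return 2  # one bilinear sample from each of two mip levels
    if mode == "anisotropic":
        return 2 * aniso  # up to `aniso` trilinear probes
    raise ValueError(f"unknown filtering mode: {mode}")


def filtered_lookups(texel_rate_m: float, mode: str, aniso: int = 1) -> float:
    """Filtered lookups in millions/s, assuming one bilinear sample per TMU per clock."""
    return texel_rate_m / bilinear_samples_per_lookup(mode, aniso)


GTX275_TEXEL_RATE_M = 50_640  # 633 MHz x 80 TMUs, from the fill-rate example

for mode, aniso in [("bilinear", 1), ("trilinear", 1), ("anisotropic", 16)]:
    rate = filtered_lookups(GTX275_TEXEL_RATE_M, mode, aniso)
    print(f"{mode} (x{aniso}): {rate:.1f} M lookups/s")
```

In this worst-case model, 16× anisotropic filtering cuts the filtered lookup rate to 1/32 of the bilinear rate, which is why TMU count matters most at high filtering settings.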

Raster OPeration units (ROP)

ROPs write computed pixels into buffers and blend them. As noted above, ROP performance determines the pixel fill rate, one of the key characteristics of any graphics card. Although its importance has declined somewhat recently, there are still situations where performance depends heavily on the speed and number of ROPs (read our technological game reviews). This is almost always due to heavy use of post-processing filters and antialiasing at high-quality game settings.

Note again that modern GPUs cannot be evaluated only by the number of their units and their clock rates. Each GPU series uses a new architecture whose units differ considerably from those of earlier ones, and the proportions of the various units differ as well. ATI was the first to use an architecture with far more pixel processors than texture units. In some architectures there are no separate pixel pipelines at all, so pixel processors are not "tied" to TMUs.

Video memory size

GPUs use their own local memory to store the data they need: textures, vertices, buffers, etc. It seems that the more of it, the better. But it's not that simple. Judging graphics card performance by memory size is the most common mistake! Inexperienced users often overestimate the value of memory size and use it to compare different products. That's understandable: memory size is one of the first values listed in every specification, and they assume that if it is twice as high, performance should be twice as high as well. In reality, adding memory improves performance only up to a point; beyond that, additional memory yields no gains.

Each game needs a certain amount of video memory to hold all its data. If a graphics card has more than that, say 4 GB, it will not render the game any faster; its performance will be limited by the GPU units. All else being equal, a 1 GB graphics card will run as fast as a 2 GB card in many situations.

There are cases when a larger memory size results in noticeable performance gains. This happens in demanding games at high resolutions with maximum settings. But such cases are rare, so while you should take memory size into account, remember that the gains it brings are limited. There are more important parameters, such as memory bus width and frequency. Read about video memory sizes in the next parts of this article.
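A rough illustration of why memory needs grow with resolution and antialiasing. The sketch estimates only the render targets, assuming 32-bit color and 32-bit depth/stencil per sample plus one single-sample buffer for the displayed image; textures and geometry come on top of this, and the function name is hypothetical:

```python
def render_target_mib(width: int, height: int, msaa: int = 1,
                      color_bytes: int = 4, depth_bytes: int = 4) -> float:
    """Rough size of the render targets in MiB.

    Counts a color buffer and a depth/stencil buffer with `msaa` samples per
    pixel, plus one single-sample buffer for the displayed image. Ignores
    compression, alignment, and extra targets used for post-processing.
    """
    pixels = width * height
    per_sample = pixels * msaa * (color_bytes + depth_bytes)
    displayed = pixels * color_bytes
    return (per_sample + displayed) / 2**20


print(render_target_mib(1280, 1024))          # 15.0
print(render_target_mib(2560, 1600, msaa=4))  # 140.625
```

Under these assumptions, 1280x1024 without antialiasing needs about 15 MiB of render targets, while 2560x1600 with 4x multisampling needs over 140 MiB, a noticeable share of a 512 MB card before any textures are loaded.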

Memory bus width

Memory bus width is a very important feature that affects memory bandwidth. A wider bus transfers more data between video memory and the GPU per second, which improves performance in most cases. Theoretically, a 128-bit bus can transfer twice as much data per cycle as a 64-bit one. In practice, the difference in rendering performance is not twofold, but it is often close to that when rendering is limited by video memory bandwidth.

Modern graphics cards use bus widths from 64 to 512 bits, depending on their price range and GPU release date. The cheapest Low-End graphics cards mostly use 64-bit and, more rarely, 128-bit buses. Mid-End cards use 128-bit and sometimes 256-bit buses, and High-End cards use 256-bit to 512-bit buses. A narrower bus can be partially compensated for by faster modern memory types (see below).

Video memory frequency

Another parameter that affects memory bandwidth is memory frequency. As noted above, higher memory bandwidth directly improves 3D performance. Memory frequencies in modern graphics cards range from 500 MHz to 2000 MHz, a fourfold spread. Since memory bandwidth depends on both memory frequency and bus width, 1000 MHz memory on a 256-bit bus has higher bandwidth than 1400 MHz memory on a 128-bit bus.
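The comparison above follows from a simple formula: peak bandwidth equals bytes per transfer (bus width / 8) times transfers per second. A sketch, treating the quoted frequencies as effective (data-rate) frequencies; the comparison holds either way, since both configurations scale alike. The function name is hypothetical:

```python
def bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (decimal, as in vendor specifications).

    bytes per transfer = bus width / 8
    MB/s = bytes per transfer x effective frequency in MHz; divide by 1000 for GB/s
    """
    return bus_width_bits / 8 * effective_mhz / 1000


print(bandwidth_gb_s(256, 1000))  # 32.0
print(bandwidth_gb_s(128, 1400))  # 22.4
```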

Let's compare the performance of graphics cards with different memory bandwidths, using the RADEON X1900 XTX and RADEON X1950 XTX as an example. These cards have nearly identical GPUs with the same features and clock rates. The key difference is memory type and frequency: GDDR3 at 775 MHz (1550 MHz effective) and GDDR4 at 1000 MHz (2000 MHz effective), respectively.

The results show that the card with lower memory bandwidth lags behind, but the difference never reaches the theoretical 29% (2000/1550 ≈ 1.29). The framerate difference grows with resolution, from 8% at 1024x768 to 12-13% at the maximum resolutions. However, these two cards differ only slightly in memory bandwidth. You should pay special attention to memory bus width and frequency when you buy an inexpensive card: many of these come with 64-bit interfaces only, which strongly limit performance. In fact, we do not recommend buying 64-bit cards for gaming at all.
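The theoretical 29% figure follows directly from the memory clocks. A sketch, assuming both cards use a 256-bit memory bus (the bus width cancels out of the ratio anyway); the function name is hypothetical:

```python
def bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer x effective MHz / 1000."""
    return bus_width_bits / 8 * effective_mhz / 1000


x1900_xtx = bandwidth_gb_s(256, 1550)  # GDDR3, 775 MHz physical
x1950_xtx = bandwidth_gb_s(256, 2000)  # GDDR4, 1000 MHz physical
gap_pct = (x1950_xtx / x1900_xtx - 1) * 100

print(f"X1900 XTX: {x1900_xtx:.1f} GB/s, X1950 XTX: {x1950_xtx:.1f} GB/s")
print(f"theoretical gap: {gap_pct:.0f}%")  # theoretical gap: 29%
```

Compared with the measured 8% to 13%, only about a quarter to less than half of the extra bandwidth turns into framerate in this test, since other factors also limit performance.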

Memory types

Graphics cards can be equipped with different types of memory. We will not consider old SDR memory, since you won't find it anyway. All modern types of DDR and GDDR memory transfer two or four times as much data as SDR at the same clock frequency, so specifications usually list a doubled or quadrupled clock rate. For example, if DDR frequency is specified as 1400 MHz, the memory physically runs at 700 MHz. Vendors specify the so-called "effective frequency", the frequency at which SDR memory would have to run to provide the same bandwidth.
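Converting between physical and effective frequency is a single multiplication. A sketch using the multipliers implied by this article's examples (1400/700 for DDR, 775/1550 for GDDR3, 1000/2000 for GDDR4, 1000 × 4 for GDDR5); the table and function names are hypothetical:

```python
# Data transfers per clock cycle, relative to the physical clock as quoted in this article.
TRANSFERS_PER_CLOCK = {
    "SDR": 1,
    "DDR": 2,
    "GDDR2": 2,
    "GDDR3": 2,
    "GDDR4": 2,
    "GDDR5": 4,
}


def effective_mhz(physical_mhz: float, memory_type: str) -> float:
    """'Effective frequency': the SDR clock that would give the same bandwidth."""
    return physical_mhz * TRANSFERS_PER_CLOCK[memory_type]


def physical_mhz(effective: float, memory_type: str) -> float:
    """Physical clock behind a quoted effective frequency."""
    return effective / TRANSFERS_PER_CLOCK[memory_type]


print(physical_mhz(1400, "DDR"))     # 700.0
print(effective_mhz(775, "GDDR3"))   # 1550
print(effective_mhz(1000, "GDDR4"))  # 2000
print(effective_mhz(1000, "GDDR5"))  # 4000
```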

The main advantage of DDR2 memory is its ability to run at high frequencies, and hence its higher bandwidth compared to earlier technologies. This comes at the cost of higher latencies, which are not very important for graphics cards. The first card to use DDR2 was the NVIDIA GeForce FX 5800 Ultra. Strictly speaking, it was equipped with GDDR2, which was not true DDR2 but a cross between DDR and DDR2. After the GeForce FX 5800, NVIDIA's next cards used DDR memory again, but GDDR2 returned in the GeForce FX 5700 Ultra and some later Mid-End graphics cards. Video memory technology has advanced much further since then. Vendors released GDDR3, which was close to DDR2 but specially modified for graphics cards.

GDDR3 uses the same technologies as DDR2 but has improved power consumption and heat dissipation, which made it possible to build chips that run at higher frequencies. Although ATI developed the standard, the first graphics card to use GDDR3 was a later revision of the NVIDIA GeForce FX 5700 Ultra, followed by the GeForce 6800 Ultra.

GDDR4 is the next generation of video memory, almost twice as fast as GDDR3. The key differences that matter to users are, again, higher frequencies and lower power consumption. Technically, GDDR4 does not differ much from GDDR3; it develops the same ideas further. The first graphics card with GDDR4 was the ATI RADEON X1950 XTX, and NVIDIA has not released any products with this memory type at all. GDDR4 chips consume roughly a third less power than GDDR3, thanks to a lower nominal voltage.

However, GDDR4 was not widely adopted even by AMD (ATI). Starting with the RV7x0 GPUs, the integrated memory controllers support a new memory type, GDDR5, whose effective frequency is quadrupled to 4 GHz and higher (theoretically up to 7 GHz). At an effective 4 GHz, a 256-bit interface delivers 32 bytes × 4 GHz = 128 GB/s. To get comparable bandwidth from GDDR3 or GDDR4, engineers had to use a 512-bit bus, while GDDR5 doubles the bandwidth with fewer chips and lower power consumption. The first GDDR5 implementations run at 1.5 V (versus 2.0 V for GDDR3, for example) and reach 1000 MHz × 4 = 4.0 GHz effective.
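Plugging these figures into the usual bandwidth formula (bytes per transfer × effective frequency) shows why a 256-bit GDDR5 bus can replace a 512-bit GDDR3 one. A sketch; the 2000 MHz GDDR3 configuration is an illustrative assumption, not a specific product, and the function name is hypothetical:

```python
def bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer x effective MHz / 1000."""
    return bus_width_bits / 8 * effective_mhz / 1000


configs = {
    "GDDR3, 512-bit, 2000 MHz effective": (512, 2000),
    "GDDR5, 256-bit, 4000 MHz effective": (256, 4000),
    "GDDR5, 256-bit, 7000 MHz effective (theoretical)": (256, 7000),
}
for name, (width, mhz) in configs.items():
    print(f"{name}: {bandwidth_gb_s(width, mhz):.1f} GB/s")
```

The first two configurations both reach 128 GB/s, so GDDR5 gets the same bandwidth from half the bus width, which keeps the GPU package and board simpler.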

So, modern memory types (GDDR3 and GDDR5) differ from DDR in several parameters and transfer data twice (GDDR3) or four times (GDDR5) per clock cycle. Such video memory uses special technologies that allow higher operating frequencies: GDDR2 usually runs at higher frequencies than DDR, GDDR3 at higher frequencies still, and GDDR5 provides the highest frequency and bandwidth.
