The entire year passed under the sign of the ATI-AMD merger, the slow decline of the joint company's graphics business, and the corresponding growth of NVIDIA's market share. ATI started to lose ground back in 2006. After 3dfx faded away in 2000-2001, there came a period of NVIDIA's domination. It all changed after NVIDIA made some mistakes and ATI launched the R3xx GPUs and RADEON 9x00 cards. From then until 2006, we witnessed an interesting struggle between two strong competitors. Common users only benefited from it: they got decent products from both companies.
Then ATI entered a period of stagnation. Since then, the company has often delayed its new products: it happened with R5xx-based solutions, and the same occurred with many R6xx GPUs. The R600 flagship was held back by several problems and launched much later than NVIDIA's G80. Mid-End solutions were delayed as well; even the successful RV670 came out later than its competitor from NVIDIA. So NVIDIA stayed ahead of AMD throughout the year. Let's recall all the GPUs and corresponding solutions announced and launched in 2007.
April 2007 - G84
NVIDIA introduced its unified graphics architecture into Mid-End solutions in Spring 2007. Much time had passed since the announcement of the GeForce 8800, but there was still no response from AMD. So NVIDIA made another strong move and launched new GPUs and several cards based on them. The key technological change was the 80 nm fabrication process; the new GPUs (G84 and G86) were essentially cut-down modifications of the G80 for the Low and Mid-End price segments.
The G84/G86 are based on the famous architecture and enjoy all advantages of the top GPU: unified shaders, full support for DirectX 10, high-quality anisotropic filtering and a new antialiasing method (CSAA). Some units were even improved versus the G80. Such solutions appeared in Low and Mid-End segments in April 2007 (from $89 to $229).
The GeForce 8600 came in two modifications, GTS and GT, which differed first of all in operating frequencies. The GeForce 8600 GTS was designed to replace the old GeForce 7600 GT; the GT card ranks a tad lower. As for the Low-End segment, NVIDIA presented the GeForce 8500 GT with a correspondingly cut-down GPU. Other G86-based cards appeared later: GeForce 8400 GS and GeForce 8300 GS.
The G84 was something between 1/4 and 1/3 of the G80: 1/4 in terms of unified processors and 1/3 in terms of ROPs. The G86 was just 1/8 of the G80 in arithmetic power and 1/3 in the number of ROPs. The number of these units was cut down too much. However, the transistor count of these GPUs is still quite high: the G84 has almost half the transistors of the G80, the G86 about a third. It's apparently a compromise. If the G84 had contained half of the G80's units, it would have been too expensive to manufacture and would have competed with the cheaper modifications of the GeForce 8800.
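These fractions can be double-checked with a quick calculation. Note that the unit and transistor counts used below are the commonly cited reference specs for these chips, assumed here rather than taken from the text:

```python
# Sanity check of the unit-count ratios quoted above.
# Assumed reference specs: G80 has 128 stream processors, 24 ROPs,
# ~681M transistors; G84 has 32 SPs, 8 ROPs, ~289M transistors;
# G86 has 16 SPs, 8 ROPs, ~210M transistors.
from fractions import Fraction

g80 = {"sp": 128, "rop": 24, "transistors_m": 681}
g84 = {"sp": 32,  "rop": 8,  "transistors_m": 289}
g86 = {"sp": 16,  "rop": 8,  "transistors_m": 210}

for name, gpu in (("G84", g84), ("G86", g86)):
    sp_ratio = Fraction(gpu["sp"], g80["sp"])       # share of G80's ALUs
    rop_ratio = Fraction(gpu["rop"], g80["rop"])    # share of G80's ROPs
    tr_share = gpu["transistors_m"] / g80["transistors_m"]
    print(f"{name}: SPs {sp_ratio}, ROPs {rop_ratio}, "
          f"transistors {tr_share:.0%} of G80")
```

Running this reproduces the article's fractions: the G84 is 1/4 of the G80 in shader processors and 1/3 in ROPs, yet carries roughly 42% of its transistors.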
The key change from the G80 was the modified texture units. Each texture unit in the G80 can compute four texture addresses and filter eight textures per cycle. The new GPUs have twice as many address units, so they can do more texture lookups under certain conditions: the G80 gets trilinear filtering "for free" and loses less speed with anisotropic filtering, while the G84 and G86 have the advantage with plain bilinear-filtered texture lookups.
Another innovation in both GPUs is an improved video processor with advanced PureVideo HD support. It offloads the CPU almost completely when decoding video of all common types, including H.264, VC-1, and MPEG-2, at up to 1920×1080 with bitrates up to 30-40 Mbit/s. That is, you can play all HD-DVD and Blu-ray discs even on mediocre computers.
Test results show that the weak spots of the new GPUs are too few ALUs and TMUs, insufficient memory bandwidth, and relatively few ROPs. As a result, cards based on the new GPUs demonstrate very low execution speed of pixel and vertex shaders, as well as low performance when limited by texel rate.
May 2007 - R600 + RV630/RV610
AMD launched its unified DirectX 10 solutions in May. However, the announcement lacked an offer in the Upper High-End segment. Besides, Mid and Low-End cards were presented only on paper; no products appeared in the market. The key difference between the RV630/RV610 and the R600 is the 65 nm fabrication process, which helped reduce production costs but delayed the appearance of these products in the market.
It was initially announced that the entire series of AMD GPUs possessed identical 3D graphics and video decoding features. Later on we found out that this was not true: the RV630/RV610 are equipped with more capable video decoding units. But in May we had information only about the R600. Its key features were 700 million transistors, a 512-bit memory bus, 320 stream processors, and a programmable hardware tessellator. These GPUs have a unified architecture, full support for DirectX 10, and a number of functions that later appeared in DirectX 10.1.
The arithmetic power of the R600 is based on 64 superscalar stream processors, each containing five ALUs and a dedicated branching unit (320 stream processors in total). Shader processors in older solutions from ATI contained vector and scalar execution units to execute two instructions per cycle over 3+1 or 4+1 components. Each processor in the R600 can execute five instructions over five components. That is, each stream processor consists of five independent scalar ALUs, which can execute five MAD instructions per cycle. One of the five ALUs can also execute a more complex instruction: SIN, COS, LOG, EXP, etc.
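The ALU math above can be sketched as a back-of-the-envelope calculation. The 742 MHz core clock of the RADEON HD 2900 XT used below is the commonly cited reference spec, an assumption on our part rather than a figure from the text:

```python
# R600 ALU arithmetic: 64 superscalar processors x 5 scalar ALUs = 320.
# Peak programmable shader throughput, assuming each ALU issues one
# MAD (multiply-add, 2 flops) per cycle at an assumed ~742 MHz clock.
processors = 64
alus_per_processor = 5
total_alus = processors * alus_per_processor        # 320 stream processors
core_clock_hz = 742e6                               # assumed HD 2900 XT clock
flops_per_mad = 2                                   # multiply + add
peak_gflops = total_alus * flops_per_mad * core_clock_hz / 1e9

print(total_alus)          # 320
print(round(peak_gflops))  # ~475 GFLOPS peak
```

This peak assumes the compiler can keep all five ALUs of every processor busy, which, as discussed below, is exactly the weak point of a superscalar design.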
The R600 seems to have a great advantage over the G80 in the number of execution units. But we should take into account that the shader processors of the competing product are clocked nearly twice as high. Besides, it's very difficult to compare scalar and superscalar architectures; both have weak and strong points. Although each unit of the superscalar architecture can process several independent instructions per cycle, this architecture has a weakness: an application and a driver must feed as many independent instructions to the GPU as possible to make sure that execution units are not idle and efficiency stays high. The scalar architecture is more flexible and its efficiency is consistently higher, because it's not easy to provide chains of 4-5 independent instructions.
The main weakness of all R6xx solutions is the very small number of TMUs. High arithmetic performance matters, but texture fetch and filtering rates are no less important for 3D graphics. Modern applications use complex pixel and vertex computations, but they also apply several textures per pixel. Sixteen TMUs are insufficient for such a powerful GPU as the R600 to reveal its potential.
We learned about another ambiguous decision a tad later: the implementation of full-screen antialiasing with unified processors. This solution is more flexible, and its support appeared in DirectX 10.1. However, it's not the best option for existing applications, as it potentially takes away some of the performance of the unified shader units. In return, it allows new FSAA modes with up to 24 samples, special modes with a programmable layout of sub-pixels, lookups outside pixel borders, and various sample weights.
The entire family was announced to feature better hardware acceleration of decoding resource-intensive video formats (H.264 and VC-1) with high bitrates: ATI Avivo HD. Another improvement is a built-in audio chip used to transfer audio via HDMI, so graphics cards based on the R600, RV610, and RV630 do not need an external audio source and corresponding cables. A special DVI-HDMI adapter is used to transfer the HDMI signal (audio and video) via DVI.
July 2007 - RV630 and RV610
As we mentioned above, only top R600-based solutions appeared in the market, although the AMD R6xx family had been announced in May. All other graphics cards and GPUs with the unified architecture were postponed until July. That's when AMD launched its Low-End and Mid-End solutions with DirectX 10 support. The main difference between the RV630/RV610 and the R600 is the thinner 65 nm fabrication process, which helped reduce manufacturing costs and improve thermal and electrical characteristics. AMD was the first to enter the graphics market with 65 nm GPUs.
We learned one interesting detail only after the announcement of the top RADEON HD 2900 XT: it turns out that not all R6xx solutions from AMD are functionally identical as far as hardware video decoding is concerned. Just as in NVIDIA's case, the new Low-End and Mid-End GPUs from AMD have better video decoding features, because the R600 had some bugs in its improved video decoding unit (UVD).
The RV630 actually differs from the R600 only in the number of execution units: ALUs, ROPs, TMUs. Everything else is identical to the more expensive GPU. The RV610 has a few more differences, both quantitative (even fewer ALUs and TMUs) and qualitative (no hierarchical Z-buffer, no L2 texture cache, a shared L1 cache for vertices and pixels). Quantitative changes: the number of shader processors was cut down to 24 (120 stream processors) in the RV630 and to 8 (40 stream processors) in the RV610; the number of texturing units was reduced to 8 and 4, respectively, with four ROPs in both cheaper GPUs.
As in NVIDIA's case, this was done to reduce transistor count and GPU complexity, and it had a negative effect on performance relative to the top solution. Just like NVIDIA, AMD offered weak Low-End and Mid-End graphics cards with DirectX 10 support. It's probably the first time both companies offered Low and Mid-End solutions that differ so much in performance from their top products.
October 2007 - G92 (GT)
When the Mid-End GPUs of the NVIDIA G8x family were launched, we complained that the number of execution units was cut down too much. Users expected better solutions. We were also disappointed by the lack of a 256-bit bus even in the Upper Mid-End card. Those graphics cards demonstrated weak results in modern games and offered only nominal support for DirectX 10.
The G92 was launched six months after the announcement of the G84 to improve the situation. The first graphics cards on this GPU were designed for the Upper Mid-End segment, from $200 to $250. This GPU powered graphics cards that brought a 256-bit memory bus and a large number of unified execution units into this price segment. G92-based solutions came in two modifications that differed in memory size and frequencies: the memory frequency of the more expensive modification with 512 MB of local video memory was intended to be higher than in the cheaper 256 MB card.
I don't understand why this GPU was called G92: it has no significant changes except for the 65 nm fabrication process. The architecture of the updated GPU is based on the old GeForce 8 (G8x); the changes concern the texturing units and the video processor. It's the new fabrication process that helped reduce manufacturing costs and brought such powerful solutions into this price segment.
The G92 is actually a modified flagship of the G80 series manufactured on a new fabrication process, and it's much more powerful than the G84. The GPU preserved eight big shader units and 64 texture units, as well as four wide ROP units. The increased complexity of the chip is explained by the NVIO, which is now integrated into the GPU, and a new-generation video processor.
The texturing units in the G92 are copies of the TMUs in the G84 and G86, unlike those in the G80, where each TMU can compute up to four texture addresses and do up to eight texture filtering operations per cycle. The units in the G92 can do twice as many texture lookups. However, in real tasks the 56 units of the GeForce 8800 GT are no stronger than the 32 units of the GeForce 8800 GTX: when trilinear and/or anisotropic filtering is enabled, the G80 will be faster, because it can do more texture filtering per cycle and its performance is not limited by the texture lookup rate.
Other important changes in the G92 include a built-in video processor of the second generation with extended PureVideo HD support, which is also used in the G84/G86. And the additional NVIO chip used in the GeForce 8800 cards (to support external interfaces) was integrated into the G92.
The most interesting new feature is support for PCI Express 2.0, which doubles the bandwidth: an x16 slot can transfer data at up to 8 GB/s in each direction versus the 4 GB/s provided by a 1.x slot. Importantly, PCI Express 2.0 is compatible with PCI Express 1.1. Old graphics cards can work in new motherboards, and new graphics cards supporting the second version can work in motherboards without such support, provided the external power supply is sufficient. The interface throughput will remain at the old level, of course.
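The 4 GB/s and 8 GB/s figures follow directly from the per-lane signaling rates and the 8b/10b line code used by both generations of the interface:

```python
# Where the PCI Express bandwidth figures come from.
# PCIe 1.x signals at 2.5 GT/s per lane, PCIe 2.0 at 5.0 GT/s;
# both use 8b/10b encoding, so only 8 of every 10 transferred
# bits carry payload.
def pcie_bandwidth_gbs(lanes: int, gt_per_s: float) -> float:
    """Peak payload bandwidth per direction, in GB/s."""
    encoding_efficiency = 8 / 10          # 8b/10b line code
    payload_bits_per_s = lanes * gt_per_s * 1e9 * encoding_efficiency
    return payload_bits_per_s / 8 / 1e9   # bits -> bytes

print(pcie_bandwidth_gbs(16, 2.5))  # PCIe 1.x x16: 4.0 GB/s
print(pcie_bandwidth_gbs(16, 5.0))  # PCIe 2.0 x16: 8.0 GB/s
```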
Game and synthetic tests of the GeForce 8800 GT show that the new Mid-End solution from NVIDIA is very powerful. It competes well even with more expensive graphics cards from NVIDIA and AMD, especially as this GPU is manufactured with a thinner fabrication process and offers better power consumption and heat release. This performance advantage, combined with temporary shortages, led to inflated prices for the GeForce 8800 GT: the cards appeared in the market almost at once, but there were too few of them for the huge demand, so they were often sold well above the recommended price.
November 2007 - RV670
The launch of the GeForce 8800 GT was followed a few days later by similar products from AMD. Previous Mid-End solutions from AMD, based on the RV630, also suffered from too few ALUs, TMUs, and ROPs, as well as a narrow 128-bit memory bus, which looked especially bad versus the 512-bit bus of the RADEON HD 2900 XT. By then it had become clear that neither manufacturer would offer strong Mid-End solutions on GPUs produced by 80 nm or 90 nm fabrication processes: DirectX 10 support and unified architectures impose certain constraints on GPU complexity. Even cheap GPUs must contain complex units, so there are not many transistors left for execution units.
That's why AMD launched an updated Mid-End solution based on a thinner 55 nm fabrication process, which helped reduce manufacturing costs and bring this solution into the intended price segment. There is more than a twofold difference between the die areas of the R600 (80 nm) and RV670 (55 nm) GPUs with similar transistor counts (700 and 666 million): 408 and 192 mm². As a result, the RV670-based solution consumes half as much power as the R600 card while demonstrating similar performance. However, the advantage of the smaller die area and thinner process is less pronounced against the competing product from NVIDIA.
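The die-size arithmetic is worth spelling out, since it shows how much the process shrink buys. The figures below are the ones quoted above:

```python
# Die-area and transistor-density comparison between the 80 nm R600
# and the 55 nm RV670, using the figures quoted in the text.
r600 = {"transistors_m": 700, "area_mm2": 408}   # 80 nm
rv670 = {"transistors_m": 666, "area_mm2": 192}  # 55 nm

area_ratio = r600["area_mm2"] / rv670["area_mm2"]
density_r600 = r600["transistors_m"] / r600["area_mm2"]     # Mtransistors/mm^2
density_rv670 = rv670["transistors_m"] / rv670["area_mm2"]

print(f"area ratio: {area_ratio:.2f}x")       # more than a twofold difference
print(f"density: {density_r600:.2f} vs {density_rv670:.2f} Mtr/mm^2")
```

In other words, the 55 nm die packs roughly twice as many transistors per square millimeter, which is what makes a near-R600 feature set viable at Mid-End prices.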
The updated RV670 is based on the same R6xx architecture and possesses the key features of this family. Moreover, it supports the new DirectX 10.1 and offers improved hardware-assisted video decoding using the UVD, which is not available in the R600. This GPU was used in new Mid-End products priced at $179-$229, i.e. cheaper than competing cards from NVIDIA.
Starting from the RADEON HD 3870 and 3850, based on the RV670, AMD changed the naming scheme of ATI RADEON cards. The first digit in a card name stands for the graphics card's generation, the second digit for the family (or market range), and the third and fourth digits for the card's rank within a given generation and family. This naming scheme is logical and easy to understand; users just have to get used to it until the next change.
The RV670 is actually no different from the R600: it has just as many execution units (ALUs, ROPs, TMUs). The only step back in the new Mid-End GPU is the lack of a 512-bit bus; it has just a 256-bit one. AMD announced that the memory controller in the RV670 was optimized to use the bandwidth of the 256-bit bus more efficiently.
The RV670 is the first GPU to support DirectX 10.1. This API version will be available only in the first half of 2008, together with Service Pack 1 for MS Windows Vista. The key changes include Shader Model 4.1, independent blending modes for MRT, cube map arrays, reading and writing data from/into buffers with MSAA, Gather4, mandatory blending of integer 16-bit formats and filtering of 32-bit floating-point formats, as well as support for MSAA with at least four samples, etc. The new features of DirectX 10.1 are really useful and convenient. But we shouldn't forget that the updated API will appear only in about six months, and graphics cards supporting it will also take some time to spread.
As in the GeForce 8800 GT, one of the main innovations in the RADEON HD 3800 is support for PCI Express 2.0. The real effect of the higher PCI Express bandwidth on performance hardly exceeds 5-10% in playable modes, but it may be important for CrossFire systems, which exchange some data via PCI Express.
Speaking of CrossFire, RV670-based solutions claim to be the first graphics cards to support cooperative operation of four cards (or two dual-GPU cards). The updated technology is called ATI CrossFireX. Along with quad-card operation, we should also mention overclocking support for multi-GPU solutions, including automatic detection of operating frequencies, and support for new multi-monitor modes.
Unlike the rarely used CrossFire, a real improvement is ATI PowerPlay, a dynamic power management technology adopted from notebook GPUs. A special control circuit in the GPU monitors its load and selects an appropriate operating mode, controlling the frequencies and voltages of the GPU and memory, among other parameters, in order to optimize power consumption and heat release.
The voltage and frequency of the GPU are minimized in 2D mode; the same concerns the fan on the GPU heat sink. All parameters are set to medium under low 3D load and to maximum when the GPU operates at full capacity. Unlike previous solutions, these modes are controlled by the GPU itself, not by the driver, which means shorter delays and fewer problems with detecting 2D/3D modes.
The HD 3870 is outperformed by the GeForce 8800 GT in most tests, but its recommended price is a tad lower, and its power consumption and heat release are lower thanks to the thinner fabrication process and lower chip complexity. Besides, it has a higher potential for price reduction. So Mid-End graphics cards from AMD turned out very good. The market reacted to these cards immediately, and their real prices rose well above the recommended level.
December 2007 - G92 (GTS)
At the end of the year the NVIDIA GeForce 8800 GTS 512MB was launched. It's based on the G92 with all execution units (ALUs and TMUs) unlocked. You can distinguish the overhauled GeForce 8800 GTS from the cheaper G80-based solutions by the amount of installed video memory (it cannot be 320 MB or 640 MB because of the modified memory bus width), which is why the card is called GeForce 8800 GTS 512MB. Unlike the two modifications of the GeForce 8800 GT with recommended prices of $200-250, the new solution has a recommended price of $349-399.
The new GeForce 8800 GTS 512MB differs much from the old cards: it has more execution units and a much higher GPU clock rate, including the clock rate of the shader units. Despite the narrower memory bus (256-bit versus 320-bit in the old cards), its memory bandwidth remains the same, because engineers raised the memory clock rate. As a result, the new GTS card possesses much more shader power and is much faster at texture lookups, while its fill rate and memory bandwidth remain at a similar level.
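The bandwidth parity is easy to verify. The effective memory clocks used below (1600 MHz for the old 320-bit GTS, 1940 MHz for the GTS 512MB) are the commonly cited reference specs and are an assumption on our part:

```python
# Why the narrower bus doesn't hurt bandwidth: a GDDR3 bandwidth
# calculation. Assumed reference specs: old GeForce 8800 GTS has a
# 320-bit bus at 1600 MHz effective; the GTS 512MB has a 256-bit
# bus at 1940 MHz effective.
def mem_bandwidth_gbs(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9

print(round(mem_bandwidth_gbs(320, 1600), 1))  # old GTS: 64.0 GB/s
print(round(mem_bandwidth_gbs(256, 1940), 1))  # GTS 512MB: ~62.1 GB/s
```

With these clocks the two cards land within a couple of GB/s of each other, which matches the claim that bandwidth "remains the same" despite the narrower bus.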
This solution is so good that it sometimes outperforms even the GTX card, to say nothing of both old GTS modifications. In addition to the increased number of ALUs and TMUs and the noticeably higher clock rates, the improved texturing units also help. The GeForce 8800 GTS 512MB almost always performs on a par with the more expensive GeForce 8800 GTX or even higher, as long as its performance is not limited by memory bandwidth.
Memory bandwidth is its only weak spot compared to the top cards, along with the smaller number of ROPs, which is insufficient in some tests to beat the GTX. However, the new graphics card is cheaper, and its performance is sufficient to compete with top graphics cards from AMD and even more expensive cards from NVIDIA.
What awaits us in the new year? We'll probably see AMD slowly slide down as a manufacturer of discrete GPUs. Multi-GPU solutions will appear from both manufacturers, as well as new families based on modified architectures. In the first half of the year NVIDIA should announce GeForce 9 graphics cards for several price segments (codenamed D9x), supporting DirectX 10.1 and PCI Express 2.0.
The D9E flagship and D9P/D9M cards will most likely appear at the beginning of the year, though some of them may come out only by summer. The first new graphics solutions may be a Mid-End GeForce 9600 GT (D9P) to replace the GeForce 8600 GTS, as well as a dual-GPU GeForce 9800 GX2 based on two G92 chips to compete with the corresponding card from AMD. Considering the performance difference between the G92 and the RV670, we can assume that NVIDIA will most likely preserve its leadership in this case.
As for AMD, the RADEON HD 3650 (RV635), RADEON HD 3470 (RV620), and RADEON HD 3450 (RV620) may come out at the beginning of the year. These are Low and Mid-End solutions. To all appearances, the company will rely on multi-GPU and multi-core technologies for the top segments. The first graphics card with two RV670 GPUs will be called RADEON HD 3870 X2.
This accelerator will probably come out at the end of January 2008, with a recommended price above $300. The graphics card will support PCI Express 2.0, and multi-GPU operation will be provided by a special bridge. There are plans to integrate this logic into future GPU models to simplify PCB layouts and reduce costs. The first solution of this kind may be a graphics card with next-gen GPUs codenamed R700.
Almost all the cards mentioned actually belong to existing architectures. We can expect more serious changes and really interesting solutions only in autumn: the G100 from NVIDIA and the R700 from AMD. We are somewhat disappointed by the manufacturers' rush toward multi-GPU products and SLI/CrossFire. Even though it's very convenient to build products for various price segments from different numbers of the same GPU, single-GPU solutions still possess apparent advantages: they are faster in all applications, not only in those optimized for SLI/CF; they do not contain redundant units in each GPU; and they offer better power consumption and heat release. Besides, single-GPU solutions have no problems with the render latencies that appear in the most popular SLI and CrossFire modes. Let's hope that manufacturers won't forget about top single-GPU solutions. Fortunately, graphics processing can be divided into parallel threads.
Alexei Berillo (firstname.lastname@example.org)
January 10, 2008