A Detailed Analysis of the ATI RADEON HD 2400XT/2600PRO/2600XT (RV610/630)

Theory and architecture
Graphics cards' features, synthetic tests
Results of game tests (performance) and conclusions

Part 2. Graphics cards' features, synthetic tests

In the first part of this article we covered all of the architectural details of the RV610 and the RV630 GPUs. In short, a number of cards are based on the new RV610 and RV630 GPUs. They come in the following variants:

ATI RADEON HD 2400 XT (RV610), 256MB GDDR3 (64 bit), 700/700/1600 MHz, 40 unified processors/4 TMUs/4 ROPs - $59-69
ATI RADEON HD 2600 PRO (RV630), 256/512MB GDDR2/3 (128 bit), 600/600/800 MHz, 120 unified processors/8 TMUs/4 ROPs - $99-119
ATI RADEON HD 2600 XT (RV630), 256/512MB GDDR3 (128 bit), 800/1400 MHz, 120 unified processors/8 TMUs/4 ROPs - $129
ATI RADEON HD 2600 XT (RV630), 256/512MB GDDR4 (128 bit), 800/2200 MHz, 120 unified processors/8 TMUs/4 ROPs - $169
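
As a rough illustration of what the bus widths and memory clocks in the list above mean in practice, peak theoretical memory bandwidth can be estimated as bus width in bytes multiplied by the effective memory clock. The sketch below is only a back-of-the-envelope calculation; the function name is ours, and we assume the last figure in each clock triple above is the effective memory clock.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # MHz * bytes gives MB/s; divide by 1000 to get GB/s.
    return bytes_per_transfer * effective_mhz / 1000


# (bus width in bits, effective memory clock in MHz), taken from the list above.
cards = {
    "HD 2400 XT (GDDR3)": (64, 1600),
    "HD 2600 PRO (nominal)": (128, 800),
    "HD 2600 XT (GDDR3)": (128, 1400),
    "HD 2600 XT (GDDR4)": (128, 2200),
}

for name, (width, clock) in cards.items():
    print(f"{name}: {memory_bandwidth_gb_s(width, clock):.1f} GB/s")
```

These are upper bounds only; real throughput depends on access patterns and the memory controller.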

Today we are testing five graphics cards. Two of the cards are reference cards, and the three other cards are from Sapphire, TUL, and HIS. This article will not focus solely on the products themselves, but will take a stronger look at the architectures behind these cards.

It is important to note that Alexei Nikolaychuk, the author of RivaTuner, has already added support for these new products to his utility:

RADEON 2400 XT

RADEON 2600 PRO

RADEON 2600 XT

Graphics Cards

ATI RADEON 2600 XT (RV630) 256MB GDDR4 PCI-E
GPU: RADEON HD 2600 XT (RV630)
Interface: PCI-Express x16
GPU frequencies (ROPs/Shaders): 800/800 (nominal - 800/800 MHz)
Memory clock rate (physical (effective)): 1100 (2200) MHz (nominal - 1100 (2200) MHz)
Memory bus width: 128bit
Vertex processors: -
Pixel processors: -
Unified processors: 120
Texture processors: 8
ROPs: 4
Dimensions: 210x100x15 mm (the last figure is the maximum thickness of the graphics card)
PCB color: red
RAMDACs/TMDS: integrated into GPU
Output connectors: 2xDVI (1xHDMI via an adapter), TV-Out
VIVO: not available
TV-out: integrated into GPU
Multi-GPU operation: CrossFire (integrated into GPU)

HIS RADEON 2600 PRO (RV630) IceQ III Turbo 256MB GDDR2 PCI-E
GPU: RADEON HD 2600 PRO (RV630)
Interface: PCI-Express x16
GPU frequencies (ROPs/Shaders): 655/655 (nominal - 600/600 MHz)
Memory clock rate (physical (effective)): 530 (1060) MHz (nominal - 400 (800) MHz)
Memory bus width: 128bit
Vertex processors: -
Pixel processors: -
Unified processors: 120
Texture processors: 8
ROPs: 4
Dimensions: 170x100x32 mm (the last figure is the maximum thickness of the graphics card)
PCB color: blue
RAMDACs/TMDS: integrated into GPU
Output connectors: 2xDVI (1xHDMI via an adapter), TV-Out
VIVO: not available
TV-out: integrated into GPU
Multi-GPU operation: CrossFire (Software)

PowerColor RADEON 2600 PRO (RV630) 256MB GDDR2 PCI-E
GPU: RADEON HD 2600 PRO (RV630)
Interface: PCI-Express x16
GPU frequencies (ROPs/Shaders): 600/600 (nominal - 600/600 MHz)
Memory clock rate (physical (effective)): 414 (828) MHz (nominal - 400 (800) MHz)
Memory bus width: 128bit
Vertex processors: -
Pixel processors: -
Unified processors: 120
Texture processors: 8
ROPs: 4
Dimensions: 170x100x15 mm (the last figure is the maximum thickness of the graphics card)
PCB color: red
RAMDACs/TMDS: integrated into GPU
Output connectors: 2xDVI (1xHDMI via an adapter), TV-Out
VIVO: not available
TV-out: integrated into GPU
Multi-GPU operation: CrossFire (Software)

Sapphire RADEON 2600 PRO (RV630) 256MB GDDR3 PCI-E
GPU: RADEON HD 2600 PRO (RV630)
Interface: PCI-Express x16
GPU frequencies (ROPs/Shaders): 700/700 (nominal - 600/600 MHz)
Memory clock rate (physical (effective)): 700 (1400) MHz (nominal - 400 (800) MHz)
Memory bus width: 128bit
Vertex processors: -
Pixel processors: -
Unified processors: 120
Texture processors: 8
ROPs: 4
Dimensions: 170x100x15 mm (the last figure is the maximum thickness of the graphics card)
PCB color: blue
RAMDACs/TMDS: integrated into GPU
Output connectors: 2xDVI (1xHDMI via an adapter), TV-Out
VIVO: not available
TV-out: integrated into GPU
Multi-GPU operation: CrossFire (integrated into GPU)

ATI RADEON 2400 XT (RV610) 256MB GDDR3 PCI-E
GPU: RADEON HD 2400 XT (RV610)
Interface: PCI-Express x16
GPU frequencies (ROPs/Shaders): 700/700 (nominal - 700/700 MHz)
Memory clock rate (physical (effective)): 800 (1600) MHz (nominal - 800 (1600) MHz)
Memory bus width: 64bit
Vertex processors: -
Pixel processors: -
Unified processors: 40
Texture processors: 4
ROPs: 4
Dimensions: 170x100x15 mm (the last figure is the maximum thickness of the graphics card)
PCB color: red
RAMDACs/TMDS: integrated into GPU
Output connectors: 1xDVI, 1xVGA, TV-Out
VIVO: not available
TV-out: integrated into GPU
Multi-GPU operation: CrossFire (Software)

ATI RADEON 2600 XT (RV630) 256MB GDDR4 PCI-E
The ATI Radeon 2600 XT has 256 MB of GDDR4 SDRAM in four memory chips located on the front side of the PCB. The chips are manufactured by Samsung. They have a 0.9 ns memory access time, which corresponds to a rated clock of about 1100 (2200 effective) MHz.
HIS RADEON 2600 PRO (RV630) IceQ III Turbo 256MB GDDR2 PCI-E
The HIS RADEON 2600 PRO has 256 MB of GDDR2 SDRAM in eight memory chips located on the front and back sides of the PCB. The Qimonda (Infineon) memory chips have a 2.0 ns memory access time, which corresponds to a rated clock of 500 (1000 effective) MHz.
PowerColor RADEON 2600 PRO (RV630) 256MB GDDR2 PCI-E
The PowerColor RADEON 2600 PRO has 256 MB of GDDR2 SDRAM in eight memory chips located on the front and back sides of the PCB. The Hynix memory chips have a 2.5 ns memory access time, which corresponds to a rated clock of 400 (800 effective) MHz.
Sapphire RADEON 2600 PRO (RV630) 256MB GDDR3 PCI-E
The Sapphire RADEON 2600 PRO has 256 MB of GDDR3 SDRAM in four chips on the front side of the PCB. The Samsung memory chips have a 1.2 ns memory access time, which corresponds to about 800 (1600 effective) MHz.
ATI RADEON 2400 XT (RV610) 256MB GDDR3 PCI-E
The ATI RADEON 2400 XT reference card has 256 MB of GDDR3 SDRAM allocated in four chips on the front and back sides of the PCB. The manufacturer of the memory is Hynix. The chips have a 1.1ns memory access time, which corresponds to 900 (1800 effective) MHz.
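
The relationship between the access times and the rated clocks quoted above is simply f = 1/t. The sketch below, with function names of our own choosing, converts each access time to a nominal clock and compares it with the actual memory clock from the specifications listed earlier. We assume here that the quoted access time is the chip's cycle time; the rated figures in the text appear to be rounded to standard speed grades.

```python
def rated_clock_mhz(access_time_ns: float) -> float:
    """Nominal clock for a given cycle time: f [MHz] = 1000 / t [ns]."""
    return 1000.0 / access_time_ns


def clock_margin_percent(actual_mhz: float, access_time_ns: float) -> float:
    """Positive: the card runs its memory above the nominal rating."""
    return (actual_mhz / rated_clock_mhz(access_time_ns) - 1.0) * 100.0


# (access time in ns, actual physical memory clock in MHz) from this article.
cards = {
    "ATI 2600 XT GDDR4": (0.9, 1100),
    "HIS 2600 PRO GDDR2": (2.0, 530),
    "PowerColor 2600 PRO GDDR2": (2.5, 414),
    "Sapphire 2600 PRO GDDR3": (1.2, 700),
    "ATI 2400 XT GDDR3": (1.1, 800),
}

for name, (t_ns, actual) in cards.items():
    print(f"{name}: rated ~{rated_clock_mhz(t_ns):.0f} MHz, "
          f"actual {actual} MHz ({clock_margin_percent(actual, t_ns):+.1f}%)")
```

By this estimate, the two GDDR2 cards run their memory slightly above the nominal rating, while the GDDR3 cards leave some headroom.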

Comparison with the reference design, front view
ATI RADEON 2600 XT (RV630) 256MB GDDR4 PCI-E	Reference card ATI RADEON X1650 XT 256MB PCI-E


PowerColor RADEON 2600 PRO (RV630) 256MB GDDR2 PCI-E


HIS RADEON 2600 PRO (RV630) IceQ III Turbo 256MB GDDR2 PCI-E

Sapphire RADEON 2600 PRO (RV630) 256MB GDDR3 PCI-E


ATI RADEON 2400 XT (RV610) 256MB GDDR3 PCI-E	Reference card ATI RADEON X1300 256MB PCI-E

Comparison with the reference design, back view
ATI RADEON 2600 XT (RV630) 256MB GDDR4 PCI-E	Reference card ATI RADEON X1650 XT 256MB PCI-E

PowerColor RADEON 2600 PRO (RV630) 256MB GDDR2 PCI-E

HIS RADEON 2600 PRO (RV630) IceQ III Turbo 256MB GDDR2 PCI-E

Sapphire RADEON 2600 PRO (RV630) 256MB GDDR3 PCI-E

ATI RADEON 2400 XT (RV610) 256MB GDDR3 PCI-E	Reference card ATI RADEON X1300 256MB PCI-E

As is evident from the above photographs, these cards have a new design that has very little to do with previous ATI cards.

It is noteworthy that the most expensive 2600 XT card, which has GDDR4 memory, also happens to have the most complex design, probably because of its more complex power supply circuitry. Incidentally, the 2600 XT variant with GDDR3 memory will have the same design as the Radeon 2600 PRO-based Sapphire card.

It is interesting that HIS is manufacturing its graphics cards with a blue PCB. There are several possible reasons for the change in color, but the most likely one is that HIS has changed its manufacturing partner.

The cards come with a TV-Out port that uses a proprietary jack. A special adapter is required (usually shipped with the card) to output video to a TV-set via S-Video or RCA. You can read additional information regarding the TV Out here.

The 2600 cards fully support HDMI, which means that they come equipped with their own audio codec. The video and audio signals go through a special DVI-to-HDMI adapter, which comes bundled with the cards, to an HDMI receiver. We've already reviewed how modern graphics cards decode HD video in this article. In the near future, we plan to update the article with information about the HD 2600 family.

The cards are equipped with two DVI ports, save for the 2400 XT. The ports are dual-link DVI, so users can get resolutions above 1600x1200. Analog monitors with a D-Sub (VGA) interface are connected via special DVI-to-D-Sub adapters. The maximum resolutions and frequencies of the cards are:

240 Hz Max Refresh Rate
2048 x 1536 x 32bit @ 85Hz Max - analog interface
2560 x 1600 @ 60Hz Max - digital interface
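
To see why dual-link DVI matters for the top digital mode above, one can compare the rate of visible pixels with the 165 MHz pixel clock limit of a single DVI link. The sketch below is a lower-bound check only: it ignores blanking intervals, so the real pixel clock of a mode is always somewhat higher than the figure computed here. The function names are ours.

```python
SINGLE_LINK_DVI_MAX_MHZ = 165.0  # pixel clock limit of one DVI (TMDS) link


def active_pixel_rate_mhz(width: int, height: int, refresh_hz: float) -> float:
    """Rate of visible pixels in MHz, ignoring blanking intervals."""
    return width * height * refresh_hz / 1e6


def exceeds_single_link(width: int, height: int, refresh_hz: float) -> bool:
    """True if even the visible pixels alone do not fit into a single link."""
    return active_pixel_rate_mhz(width, height, refresh_hz) > SINGLE_LINK_DVI_MAX_MHZ


for mode in [(1600, 1200, 60), (2560, 1600, 60)]:
    rate = active_pixel_rate_mhz(*mode)
    verdict = "needs dual link" if exceeds_single_link(*mode) else "may fit a single link"
    print(f"{mode[0]}x{mode[1]} @ {mode[2]} Hz: {rate:.2f} MHz active -> {verdict}")
```

A mode that passes this check may still need dual link once blanking is added, so the result is conclusive only when the limit is exceeded.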

HIS traditionally uses an IceQ cooler for its cards, and these new cards based on the RV630/610 are no exception. We have examined the IceQ cooler on many occasions, so a detailed description is not necessary here. In short, it is a very quiet and efficient cooler. Here are our thoughts on the other cooling solutions:

ATI RADEON 2600 XT (RV630) 256MB GDDR4 PCI-E
The Radeon 2600 XT's cooler is quite complex and heavy. The heatsink is made completely of copper alloys. The complexity of the cooling system in this case is an advantage, as even a slow, quiet fan is sufficient to cool the GPU. We have seen cooling solutions like this one many times. Basically, they consist of a large flat heatsink with a turbine at one end that pumps the air inside the housing and drives it over the core. It should be kept in mind, though, that the hot air is not expelled from the PC case and remains in the system. You have to rely on other system fans for that task.

PowerColor RADEON 2600 PRO (RV630) 256MB GDDR2 PCI-E
The 2600 PRO cards are not very demanding when it comes to cooling, and thus they only need simple coolers. Just take a look at the PowerColor RADEON 2600 PRO. All it requires for cooling is a simple plate heatsink with a slow fan. The latter is placed slightly off-center, and the air flows along the heatsink.

Sapphire RADEON 2600 PRO (RV630) 256MB GDDR3 PCI-E
The underlying concept of this device is similar to that of the cooler used in the 2600 XT. It is, however, smaller and does not need copper alloys. In addition, the fan runs at a slow RPM, meaning there is less noise.

ATI RADEON 2400 XT (RV610) 256MB GDDR3 PCI-E
The cooler of the Radeon 2400 XT is another simple solution. What we don’t understand, though, is why the heatsink is closed here. Whatever the reason is, the cooling solution is a low-noise device. We should also keep in mind that these cards can come passively cooled, which would most likely mean that the whole surface is covered by a heatsink.

Now let's have a look at the monitoring results.

ATI RADEON 2600 XT

From the above graphs, we can plainly see that the cooling solution of the ATI Radeon 2600 XT reference board works quite well. For the most part, the operating temperatures stay quite low.

The GPU itself was manufactured in Week 20 of 2007, that is, in May, which suggests that the GPU we are testing is the latest revision. This may also explain why AMD delayed the release of these cards for so long: the company was waiting for the latest revision of the GPU to be completed.

ATI RADEON 2600 PRO

The above tests indicate that a simple cooling solution works well with the ATI Radeon 2600 PRO reference board. Despite the simplified cooler, the core temperature does not rise above 60°C during the tests.

Looking at the GPU, it is plain to see that the ATI Radeon 2600 XT and 2600 PRO are identical chips operating at different frequencies.

ATI RADEON 2400 XT

In these tests we again see that only a simple cooling solution is needed to cool these cards.

As some of the cards reviewed today are engineering samples, we cannot comment on their boxes and bundles. Let's just say that the bundle includes DVI-to-VGA, DVI-to-HDMI, and VIVO adapters, as well as TV cords.

Sapphire and PowerColor products came in retail packages.

Sapphire RADEON 2600 PRO (RV630) 256MB GDDR3 PCI-E

PowerColor RADEON 2600 PRO (RV630) 256MB GDDR2 PCI-E

Installation and Drivers

Testbed configuration:

Intel Core2 Duo (775 Socket) based computer
- CPU: Intel Core2 Duo Extreme X6800 (2930 MHz) (L2=4096K)
- Motherboard: EVGA nForce 680i SLI on NVIDIA nForce 680i
- RAM: 2 GB DDR2 SDRAM Corsair 1142MHz (CAS (tCL)=5; RAS to CAS delay (tRCD)=5; Row Precharge (tRP)=5; tRAS=15)
- HDD: WD Caviar SE WD1600JD 160GB SATA
OS: Windows XP SP2 DirectX 9.0c; Windows Vista DirectX 10
Monitor: Dell 3007WFP (30").
Drivers: ATI CATALYST 7.6; NVIDIA Drivers 160.02.

VSync is disabled.

Synthetic tests

The D3D RightMark Beta 4 (1050), the program we use for our synthetic tests, is available for download along with a description of it at http://3d.rightmark.org.

We used complex pixel shader tests for SM 2.0 and 3.0 - D3D RightMark Pixel Shading 2 and D3D RightMark Pixel Shading 3. Some tasks that appear in these tests are already used in real applications. The rest of the tasks will almost certainly be used in future applications. Our test sets can be downloaded here.

We are going to switch to the new version of our benchmark soon - RightMark3D 2.0. It's written to test Direct3D 10 compatible graphics cards on MS Windows Vista. Some of its tests were rewritten for DX10, and new synthetic tests were added: modified pixel shader tests, rewritten for SM 4.0, geometry shader tests, vertex texture fetch tests. First of all we'll publish an article about RightMark3D 2.0 with test results of many graphics cards. Then we'll start using this test in our baseline articles.

The synthetic tests are run on the following graphics cards:

RADEON HD 2900 XT with the standard parameters (HD2900XT)
RADEON HD 2600 XT with the standard parameters (HD2600XT)
RADEON HD 2600 PRO with the standard parameters (HD2600PRO)
RADEON HD 2400 XT with the standard parameters (HD2400XT)
NVIDIA GeForce 8600 GT with the standard parameters (GF8600GT)
NVIDIA GeForce 8500 GT with the standard parameters (GF8500GT)

We decided to compare the RV630-based cards against these GeForce models because they have the same market position, even if the prices are different. The RADEON HD 2400 XT is included in order to analyze the effects the scaled-back technical features of the RV610 have on performance. The performance results of the high-end solutions are also necessary in order to evaluate the relative performance of the slower cards.

Pixel Filling Test

This test determines the peak texel rate in FFP mode for various numbers of textures per pixel:

All low-end cards from AMD give results close to the theoretical maximum performance level. Their results in synthetic tests are only a little lower than the theoretical values, especially in modes with lots of textures. On the other hand, both graphics cards from NVIDIA are not even close to their theoretical maximum value, just like in our baseline review of the G84. Either our test is not quite correct, or NVIDIA provides wrong information about its low-end chips. Whatever the case is, we'll soon find out.

Judging by our results, AMD chips can fetch 8 and 4 texels per cycle (for the RV630 and RV610 respectively) from 32-bit textures and apply bilinear filtering to them. Interestingly, NVIDIA GPUs look better with few textures per pixel, and they start lagging behind in heavier conditions. In short, AMD GPUs perform better than competing NVIDIA solutions in our texture fetch test, given that we compare the GeForce 8500 GT against the HD 2600 PRO and the GeForce 8600 GT against the ATI Radeon HD 2600 XT.

The second synthetic test from RightMark measures fillrate performance. The test takes into account the number of pixels written into a frame buffer. NVIDIA chips perform better in the fillrate test with 0 and 1 textures, either because they have higher efficiency with the frame buffer or NVIDIA has made some special optimizations to the chips.
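
For reference, the theoretical peaks that such texel rate and fillrate results are usually compared against can be estimated from the specifications given earlier: texture units multiplied by the core clock for texels, and ROPs multiplied by the core clock for pixels. The sketch below assumes one bilinear-filtered texel per TMU per clock and one pixel per ROP per clock at nominal clocks; the function names are ours, and any measured value passed to `efficiency` must come from your own benchmark run.

```python
def peak_texel_rate_mtexels(tmus: int, core_mhz: float) -> float:
    """Theoretical peak texel rate in Mtexels/s: one texel per TMU per clock."""
    return tmus * core_mhz


def peak_pixel_rate_mpixels(rops: int, core_mhz: float) -> float:
    """Theoretical peak fillrate in Mpixels/s: one pixel per ROP per clock."""
    return rops * core_mhz


def efficiency(measured: float, theoretical: float) -> float:
    """Fraction of the theoretical peak reached by a measured result."""
    return measured / theoretical


# (TMUs, ROPs, nominal core clock in MHz) as listed in this article.
cards = {
    "HD 2600 XT": (8, 4, 800),
    "HD 2600 PRO": (8, 4, 600),
    "HD 2400 XT": (4, 4, 700),
}

for name, (tmus, rops, mhz) in cards.items():
    print(f"{name}: {peak_texel_rate_mtexels(tmus, mhz):.0f} Mtexels/s, "
          f"{peak_pixel_rate_mpixels(rops, mhz):.0f} Mpixels/s")
```

Since all three cards have four ROPs, their theoretical pixel fillrate differs only by core clock, while the texel rate also depends on the number of TMUs.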

Starting at two textures per pixel, the RV630 and G84 perform on par with each other. With more textures, the fastest mid-range solution from AMD pulls ahead. The new architecture from AMD seems to perform better with more texture fetches.

As usual, we run the same task, this time executed in Pixel Shader 2.0 mode:

There are no changes this time either. FFP and Shaders 2.0 perform equally well (perhaps, FFP is emulated by a more efficient shader) on all graphics cards. All of the tested solutions show results similar to the previous test.

Geometry Processing Speed Test

We'll start our tests of execution units with a traditional warning: you should always treat synthetic tests of unified architectures with caution. This is because synthetic tests place loads on selected parts of a GPU only. Real applications, on the other hand, use all GPU resources simultaneously. While GPUs with older architectures can display almost peak results in well-balanced 3D applications, GPUs with unified architectures usually show worse results in real applications than in synthetic tests.

The first geometry test contains a simple vertex shader that shows maximum triangle throughput:

All GPUs in this review are based on unified architectures and since the processing load in this test mainly deals with geometry, all of the GPUs show high results. The RADEON HD 2600 PRO performs almost on a par with the RADEON X1950 XTX. Evidently, performance is limited by the API and the platform, not by the peak performance of the unified processors as this task is quite easy for them. The GPUs show similar execution efficiency of this test in various modes. There is a very small difference in peak performance in FFP, VS 1.1 and VS 2.0.

We cannot draw any definitive conclusions from the results of this test. The RV630 isn't on par with the R600, even theoretically, unless we take into account some API limitations. Despite the limitations, we can easily see that AMD products process geometry much faster than NVIDIA GPUs.

Let's see what impact a more complex test with a single diffuse light source will have on performance:

The performance displayed by the cards in this test is slightly closer to reality, but problems still exist. The high-end Radeon HD 2900XT still performs on par with the mid-range Radeon HD 2600XT; naturally, this wouldn't be true in a real-world situation. The only thing we can say is that something is most likely limiting the performance of the HD 2900XT. FFP mode is slightly faster on some AMD graphics cards than in the last test, and a lot slower on NVIDIA cards. The slower FFP performance may be a problem with NVIDIA's drivers, or NVIDIA might have decided that FFP is not very important. Both the GeForce 8500GT and the GeForce 8600GT are outperformed by their price-point competitors by more than a factor of two.

Although this synthetic test does not hold much practical use, it shows the peak capacity of the GPUs. Let's see what happens to performance in more demanding lighting conditions. The third diagram offers a more complex computation of lighting with a single light source and a specular component:

The gap between the Radeon HD 2900XT/2600 cards and the GeForce graphics cards grows even larger under more complex lighting conditions. The HD2900XT (R600) GPU remains the leader in geometry performance and the HD2600XT is only slightly behind.

Looking at the other video cards, the GeForce 8600 GT is outperformed even by the Radeon HD 2600 PRO. The GeForce 8500 GT ends up barely competing with the Radeon HD 2400 XT. Clearly, GPUs from AMD process geometry much faster than NVIDIA GPUs.
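
To give an idea of the per-vertex work behind the diffuse and specular lighting tests above, here is a minimal sketch of one common formulation: Lambert diffuse plus a Blinn-Phong specular term. This is an assumption on our part for illustration only; the exact shaders used by D3D RightMark are not reproduced here, and all names and the `shininess` default are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length (a zero vector is returned unchanged)."""
    length = math.sqrt(sum(c * c for c in v))
    return v if length == 0 else tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def diffuse(normal, to_light):
    """Lambert term: cosine of the angle between normal and light direction."""
    return max(0.0, dot(normalize(normal), normalize(to_light)))


def specular(normal, to_light, to_viewer, shininess=32.0):
    """Blinn-Phong term based on the half vector between light and viewer."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    if dot(n, l) <= 0.0:
        return 0.0  # the light is behind the surface
    half = normalize(tuple(a + b for a, b in zip(l, v)))
    return max(0.0, dot(n, half)) ** shininess


# One vertex lit by a single light: the specular term adds a normalization,
# a dot product and a power function on top of the diffuse term.
n, l, v = (0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, 1.0)
print(f"diffuse={diffuse(n, l):.3f} specular={specular(n, l, v):.3f}")
```

The extra arithmetic per vertex in the specular case is the kind of load that grows with each added light source.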

Next, let's analyze the most complex geometry task, which consists of three light sources with static and dynamic branches. The performance difference might grow even larger in this test:

But no, the situation is almost the same, except that the R600 heavily outperforms the RV630 here. To us, it seems that the potential of the R600 is not completely shown by even our heaviest task. From what we are seeing, its performance is limited by the API. This is further proof that the unified architecture of the R6xx series is indeed quite a strong one that is capable of processing geometry efficiently. The more difficult a task is, the better the results are compared to the unified architecture of the NVIDIA G8x family. One thing to remember, though, is that AMD and NVIDIA GPUs have opposite weak spots in vertex units - AMD loses more performance because of dynamic branching, while NVIDIA suffers more from static branching.

Our conclusion on the geometry tests: The new entry-level and mid-range products from AMD show some very strong results in the synthetic geometry tests. These GPUs perform well in such tests primarily because of their strong unified architecture and special modifications. In the synthetic tests, we see that they are able to use all their unified stream processors to solve geometry tasks. The new unified architecture from AMD demonstrates its aptitude with complex vertex shaders as well. The AMD GPUs are able to outperform competing NVIDIA ones by significant margins. Remember, though, that these are only synthetic tests. In real applications, unified processors are usually busier with computing pixels. Keeping this in mind, we now proceed to the synthetic pixel performance tests.

Pixel Shaders Test

As our comparison does not include NVIDIA GPUs of the older architectures, which gain an advantage when the number of temporary registers and their precision are reduced, we do not publish FP16 results. All of today's GPUs should execute pixel shaders with reduced precision at the same speed as full-precision FP32 ones.

The first group of pixel shaders is very simple for modern GPUs. It includes various versions of simple shaders: 1.1, 1.4, and 2.0.

The G8x and R6xx GPUs have very little trouble with these tests, which don't show the true worth of unified architectures. Performance in the simplest tests is limited by texture fetches and fillrate, which is why the mid-range GPUs from AMD and NVIDIA end up with similar results. In the PS 2.0 tests the RV630 performs best. The situation with the GeForce 8500 GT/HD 2600 PRO pair is noticeably simpler: the AMD solution is always faster than its main competitor, which in the end is comparable only to the cheapest of the AMD contenders tested today.

Let's have a look at the results of more complex pixel programs, in between SM 2.0 and 3.0:

The procedural water test depends heavily on texturing performance and uses dependent fetches of highly nested textures. In this test the familiar trend repeats: the GeForce 8600 GT and the HD 2600 XT again show similar results. The HD 2900 XT is naturally way ahead, but the difference from the RV630 is less than twofold. The GeForce 8500 GT is again much slower than the HD 2600 PRO; it is outperformed even by the HD 2400 XT. The second test is more computation-intensive. In it, all AMD products pull ahead, and the HD 2600 PRO ends up outperforming even the GF 8600 GT. This task seems to favor the superscalar architecture of the R6xx, which is made up of many unified processors in these GPUs.

Compared to the R600, all inexpensive cards expectedly demonstrate much lower results. Although the RV630 is more than twice as slow as the high-end AMD solution, it still performs well for its class. The HD 2400 XT's performance mainly suffers from being heavily cut down on the side of technical features. Although it demonstrates decent results for its price segment, this HD 2400 XT will hardly be able to provide comfortable frame rates in real-world DirectX 10 applications with quality settings set to high.

New Pixel Shaders tests

The new pixel shader tests were introduced not long ago, and they are more GPU-intensive and demanding than the above tests. In fact, we plan to discontinue synthetic tests for old shader models (below 2.0) and use only SM 2.x, 3.0 and 4.0 shaders written in HLSL. Performance of the old shader versions can be evaluated in games (they have been used there for a long time already).

These tests are divided into two categories. We'll start with Shaders 2.0. There are two tests with popular effects from modern 3D applications:

Parallax Mapping is a texturing method we've seen in several modern games (Splinter Cell: Chaos Theory, F.E.A.R., TES4: Oblivion, Prey, etc) - it's described in detail in the article Modern 3D Graphics Terms
Frozen Glass is a complex procedural texture with adjustable parameters. Similar effects (but less complex) have been used in games for some time already.
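
For readers unfamiliar with the first effect in the list above, basic parallax mapping shifts the texture coordinates along the tangent-space view direction by an amount derived from a height map sample. The sketch below shows that offset computation in its common textbook form; it is not the RightMark shader itself, and the function name and the `scale`, `bias` and `offset_limiting` parameters are illustrative assumptions.

```python
import math


def parallax_offset_uv(uv, view_ts, height, scale=0.04, bias=-0.02,
                       offset_limiting=True):
    """Shift texture coordinates for basic parallax mapping.

    uv       -- original (u, v) texture coordinates
    view_ts  -- tangent-space vector from the surface to the eye (z > 0)
    height   -- height map sample in [0, 1] at uv
    """
    length = math.sqrt(sum(c * c for c in view_ts))
    vx, vy, vz = (c / length for c in view_ts)
    h = height * scale + bias
    if not offset_limiting:
        # Classic form: the offset grows without bound at grazing angles.
        h /= vz
    return (uv[0] + vx * h, uv[1] + vy * h)


# Looking straight down at the surface produces no shift at all.
print(parallax_offset_uv((0.5, 0.5), (0.0, 0.0, 1.0), height=1.0))
# At an oblique angle the lookup moves along the view direction.
print(parallax_offset_uv((0.5, 0.5), (1.0, 0.0, 1.0), height=1.0))
```

The shifted coordinates are then used for all further texture lookups, which makes every such fetch a dependent one.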

Both shaders are tested in two modes: mathematical computation intensive and texture sampling intensive. Let's analyze mathematical computation intensive modes, as they are more promising from the point of view of future applications:

The situation is similar to what we have seen in the previous group of tests. The R600 is the leader here, but the high-end RV630-based card is just 1.7 times as slow in the Frozen Glass test. The G84-based card from NVIDIA is only slightly outperformed by the HD 2600 XT in this test. It confirms our assumption that performance is limited mainly by the texture sampling rate (texture sampling is inevitable in any tests). The GeForce 8500 GT traditionally competes only with the weakest card from AMD, which is outperformed by the RV630-based solution by more than a factor of two.

The situation in the second test (Parallax Mapping) is different. The GeForce 8500 GT shows even worse results than the HD 2400 XT, and the GeForce 8600 GT is almost on par with the cheaper (according to preliminary data) HD 2600 PRO. That's what different tests with different TMU and ALU loads can do - the advantage can end up being larger in one test than the other. Assuming that the same may happen in real applications, NVIDIA solutions may end up with the advantage in some applications while AMD cards may perform better in the others.

Let's analyze modes of the same tests that prefer texture sampling to mathematical calculations:

The situation changes here, but not drastically. Performance in these tests is limited mostly by the texturing units, which is why AMD cards do not break away from NVIDIA cards in the Parallax Mapping test.

In any case, all of the GPUs prefer math-intensive shaders and work faster with them. For modern architectures, focusing on texturing does not make much sense: GPUs of both architectures (G8x and R6xx) prefer mathematical computations over texturing. AMD (ATI) solutions have traditionally preferred mathematical computations, especially considering their small number of TMUs.

Take a look at the results of two more pixel shader 3.0 tests. These are our most complex synthetic pixel shader tests for Direct3D 9. These tests place a heavy load not only on the ALUs, but also on the texture units. Both shader programs are relatively complex, long, and have lots of branches:

Steep Parallax Mapping is a much heavier modification of parallax mapping. It's not used in games so far. You can read about it in our article Modern 3D Graphics Terms
Fur is a procedural shader that visualizes fur
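
The reason Steep Parallax Mapping is so much heavier than the basic variant is that it searches for the intersection of the view ray with the height field layer by layer, with one texture fetch and one branch per step. The sketch below is a simplified linear search in its usual textbook form, not the RightMark shader; `depth_at` is a hypothetical callable standing in for a depth map lookup (0 at the surface, 1 at the deepest point), and the parameter defaults are our own.

```python
def steep_parallax_uv(uv, view_ts, depth_at, num_layers=16, scale=0.05):
    """Linear-search steep parallax mapping.

    uv        -- original (u, v) texture coordinates
    view_ts   -- tangent-space vector from the surface to the eye (z > 0)
    depth_at  -- callable (u, v) -> depth in [0, 1], stands in for a texture
    """
    vx, vy, vz = view_ts
    layer_step = 1.0 / num_layers
    # UV shift per layer: the ray travels into the surface, opposite to view_ts.
    du = -vx / vz * scale * layer_step
    dv = -vy / vz * scale * layer_step

    u, v = uv
    for i in range(num_layers):
        # One "texture fetch" and one dynamic branch per layer.
        if i * layer_step >= depth_at(u, v):
            break  # the ray has gone below the surface
        u += du
        v += dv
    return (u, v)


# A flat surface (depth 0 everywhere) leaves the coordinates untouched.
print(steep_parallax_uv((0.5, 0.5), (1.0, 0.0, 1.0), lambda u, v: 0.0))
```

The number of loop iterations depends on the data, which is why shaders like this stress dynamic branching as well as the texture units.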

The GeForce 8600 GT based on the G84 fares slightly better than the 8500 GT. It is, however, more expensive than its direct competitor. Nevertheless, at least it provides performance similar to that of the Radeon HD 2600 PRO.

These two tests show that cheaper AMD R6xx solutions execute complex pixel shaders 3.0 with many branches more efficiently than NVIDIA's entry-level products. The performance advantage of ATI cards over NVIDIA G8x cards reaches a factor of 1.5-2 in our synthetic tests.

Conclusion on the pixel shader tests: The RV610 and RV630 chips are based on the efficient R6xx architecture, which from what we have seen is well suited for complex pixel shaders. The more complex calculations there are in the game, the more efficient the new architecture from AMD is. In this case, NVIDIA solutions cannot be saved by their higher theoretical texture sampling rate, which is important even in synthetic tests of pixel shaders, to say nothing of real games where the texturing speed plays an even more important role. As games are lagging behind the progress of technology, the situation in games may be slightly different, and the performance results may change. We will have a look at the test results of the RADEON HD 2600 and the HD 2400 in modern games in the next part of this article and will see whether our assumptions are true.

Conclusion on the synthetic tests

The R6xx architecture is notable for high computing performance. It is intended for modern and future 3D applications with many complex shaders of all types. High efficiency and the large number of unified processors help even inexpensive ATI GPUs deliver good results in all synthetic tests, especially in geometry and complex pixel tests. It should be noted that the advantage of AMD GPUs over competing solutions grows with the load. While NVIDIA solutions can compete with ATI GPUs in less intensive conditions, AMD gains an advantage in situations with heavier GPU load.
Entry-level and mid-range GPUs from AMD, as can be expected, do have their weak spots - namely the relatively small number of TMUs and ROPs. In our opinion, the number of TMUs in ATI's mid-range and entry-level GPUs may be insufficient for modern games. Even some synthetic tests show that the R6xx chips are sometimes limited by low texture sampling performance. However, competing GPUs from NVIDIA do not have many TMUs or ROPs either, so in terms of technical features the R6xx isn't far behind its competition.
The RADEON HD 2600 and the HD 2400 XT/PRO demonstrate very good results in our synthetic tests. They evidently outscore NVIDIA solutions at similar prices. The HD 2600 XT outperforms the GeForce 8600 GT in practically all tests, and the HD 2600 PRO is evidently faster than the GeForce 8500 GT. But the situation in real games may not be that peachy - texture sampling rate, fillrate, and memory bandwidth still play an important role in modern games. AMD and NVIDIA are more or less on a par here. We'll see various situations in games: sometimes NVIDIA cards will be faster, sometimes AMD cards.

As we have already noted in our baseline review of the new AMD unified architecture, the architecture is very powerful and designed for complex calculations. The architecture scales quite well and the entry-level products reviewed today are quite competitive judging by the synthetic test results. Since they are also manufactured by a finer process technology, the new entry-level and mid-range ATI chips get additional advantages in terms of power consumption and heat release. We are somewhat surprised, though, by the fact that AMD has no card to contend with the GeForce 8600 GTS. Was ATI not able to squeeze enough performance from the RV630 to perform well in modern applications and compete with the top G84 solution?

In the next part of this article we will test the new low-end and mid-range solutions from AMD in modern games and see whether our conclusions made after the synthetic tests are true. The gaming section is the main part of the next article. In our opinion, you should make your choice of whether you want any of these new GPUs using the real-world game performance results. We conduct the synthetic tests primarily in order to reveal the potential of new architectures, and don't intend them to be the main deciding factor when purchasing a product.

The PSU for the testbed was kindly provided by TAGAN

The Dell 3007WFP monitor for the testbeds was kindly provided by NVIDIA

Andrey Vorobiev (anvakams@ixbt.com)
Alexei Berillo (sbe@ixbt.com)
July 30, 2007
