We tested four 3870 X2 cards from the above-mentioned companies (AMD partners) in our test lab. Three of them are copies of the reference design and differ little from it (except for the higher operating frequencies of the MSI card). That is, HIS, MSI and TUL (PowerColor) have nothing to do with manufacturing these cards; they just buy ready-made products from AMD. MSI then selected cards with GPUs capable of operating at 860 MHz to offer them as OC editions.
Only GeCube presented a product of its own: a graphics card with a truly unique design, in both the PCB and the cooler. All details are provided below.
Multi-GPU operation: the 3870 X2 has integrated CrossFire; external CrossFire is also possible (hardware).
HIS RADEON HD 3870 X2 2x512MB PCI-E
PowerColor RADEON HD 3870 X2 2x512MB PCI-E
MSI R3870X2-T2D1Q-OC (RADEON HD 3870 X2) 2x512MB PCI-E
GeCube RADEON HD 3870 X2 X-Turbo Dual 2x512MB PCI-E
Each graphics card has 2x512=1024 MB of GDDR3 SDRAM in sixteen chips on the front and back sides of the PCB.
Samsung memory chips (GDDR3). 1.0 ns memory access time, which corresponds to 1000 MHz (2000 MHz effective).
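The arithmetic behind these figures can be checked with a short sketch. It assumes that the quoted 1.0 ns is the clock period of the chips and that GDDR3 transfers data twice per clock; the function names are illustrative and not taken from any tool.

```python
def clock_mhz(cycle_time_ns: float) -> float:
    """Clock frequency in MHz for a given cycle time in nanoseconds."""
    return 1000.0 / cycle_time_ns


def effective_mhz(clock: float, transfers_per_clock: int = 2) -> float:
    """Effective data rate, assuming double data rate signalling."""
    return clock * transfers_per_clock


def mb_per_chip(total_mb: int, chips: int) -> float:
    """Capacity of one memory chip when the total is split evenly."""
    return total_mb / chips


print(clock_mhz(1.0))          # 1000.0
print(effective_mhz(1000.0))   # 2000.0
print(mb_per_chip(1024, 16))   # 64.0
```

With sixteen chips sharing 1024 MB, each chip holds 64 MB, that is, eight chips per GPU.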
Comparison with the reference design (reference card: ATI RADEON HD 3870), front and back views:
HIS RADEON HD 3870 X2 2x512MB PCI-E
GeCube RADEON HD 3870 X2 X-Turbo Dual 2x512MB PCI-E
MSI R3870X2-T2D1Q-OC (RADEON HD 3870 X2) 2x512MB PCI-E
PowerColor RADEON HD 3870 X2 2x512MB PCI-E
These cards have no direct counterparts, so there is nothing to compare them to. Still, we'll risk a comparison with a single-GPU graphics card based on the same 3870 GPU. The developers had to move half of the memory chips from the front to the back side of the board, where they are covered with a metal plate for cooling. The GeCube card, on the other hand, works fine without such a plate, although it has the same memory chips operating at an even higher frequency.
GeCube designed its own PCB. This solution has its pros and cons. Pro: the card is shorter than the reference model, so it is more likely to fit inside your PC case. Con: the layout of the external power connectors. On the reference card they face down (when the card is installed in the motherboard), so it's easy to plug power cables into them. On the GeCube card they face the rear, so the power cables effectively make the card a tad longer.
The GeCube card has another peculiarity: four DVI ports instead of the two on the reference card. Moreover, the non-standard BIOS of this card does not let the driver enable CrossFire by default (on the reference card, CrossFire cannot be disabled). That is, users can leave CrossFire disabled and get what amounts to two cards, each with two DVI ports, and connect four monitors. The flexibility of this card is praiseworthy.
As for the reference cards from HIS, PowerColor, and MSI, their PCBs are as long as that of the 8800 GTX (270 mm). Take this into account if you want to buy one of these cards.
The cards have TV-Out with a unique jack. You will need special bundled adapters to output video to a TV-set via S-Video or RCA. You can read about the TV-Out in more detail here.
Analog monitors with a D-Sub (VGA) interface are connected via special DVI-to-D-Sub adapters. The bundle also includes DVI-to-HDMI adapters (these graphics cards can transfer video and audio data to HDMI receivers), so there should be no problems with such monitors. Maximum resolutions and frequencies:
240 Hz max refresh rate
2048 x 1536 x 32 bit @ 85 Hz max - analog interface
2560 x 1600 @ 60 Hz max - digital interface (all DVIs with Dual-Link)
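Why the top digital mode needs Dual-Link can be estimated with a rough sketch. It assumes the 165 MHz pixel clock limit of single-link DVI and counts only visible pixels, ignoring blanking intervals, so real requirements are somewhat higher. The function names are illustrative.

```python
SINGLE_LINK_DVI_MHZ = 165.0  # pixel clock limit of one DVI link


def active_pixel_rate_mhz(width: int, height: int, refresh_hz: int) -> float:
    """Visible pixels per second in MHz (blanking intervals ignored)."""
    return width * height * refresh_hz / 1e6


def needs_dual_link(width: int, height: int, refresh_hz: int) -> bool:
    """True if even the visible pixels exceed a single link's capacity."""
    return active_pixel_rate_mhz(width, height, refresh_hz) > SINGLE_LINK_DVI_MHZ


print(active_pixel_rate_mhz(2560, 1600, 60))
print(needs_dual_link(2560, 1600, 60))
print(needs_dual_link(1920, 1200, 60))
```

Under these assumptions, 2560 x 1600 at 60 Hz exceeds what a single link can carry, while 1920 x 1200 at 60 Hz does not.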
As for MPEG2 playback (DVD-Video), we analyzed this issue in 2002, and little has changed since then. CPU load during video playback on modern graphics cards does not exceed 25%.
HDTV and other trendy video features. You can read one review here.
These cards require additional power, so each card is bundled with Molex-to-6-pin adapters, although all modern PSUs offer such cables. The reference cards have two power connectors, one of which is an 8-pin connector. Don't let that confuse you: a regular 6-pin cable is sufficient. The two additional pins only enable overclocking via the drivers.
Now about the cooling systems.
HIS RADEON HD 3870 X2 2x512MB PCI-E
PowerColor RADEON HD 3870 X2 2x512MB PCI-E
MSI R3870X2-T2D1Q-OC (RADEON HD 3870 X2) 2x512MB PCI-E
Let's examine the reference cooling
system on the card made by HIS, because almost all graphics cards
of this class will be equipped with such coolers.
It's a usual turbine design: a long heat sink covered with a
plastic housing to direct the airflow. There is a turbine at
one end, which pumps air through the heat sink. The other
end has a vent to exhaust the hot air from the system unit.
We've already seen such coolers on the HD 2900 XT and the HD
3870. This particular device is very long, so there are two heat
sinks inside, one on each GPU.
Note that the plate and the heat sink near the turbine are
made of aluminum, while the rest is made of copper. Here
is the idea: the first heat sink to meet the airflow gets cooler
air, so aluminum works well here. The second heat sink is cooled
by air already warmed up by the first one, so it must be more
efficient and is made of copper. We'll see the results below.
To conclude our examination, note that different heat sinks
are used ONLY in the first batches of such coolers. Products
made by MSI and PowerColor come with two copper heat sinks,
so they are heavier. Perhaps HIS was one of the first partners
to get the cards from AMD, so its coolers have different
heat sinks.
The turbine is slow, so there is practically no noise. The
fan rotates fast only for several seconds at startup.
GeCube RADEON HD 3870 X2 X-Turbo Dual
2x512MB PCI-E
GeCube engineers came up with a different solution. As the
PCB is of a unique design, the cooler is also different. It's
either designed in-house or ordered from a third party.
It's an all-copper device, so it's quite heavy. The copper
base touches the GPUs as well as the memory chips (surprisingly,
memory chips on the back of the PCB are left without cooling)
and carries two dome-shaped heat sinks, where the domes are
formed by heat pipes with small fins on them.
There are two fans inside the domes. They are slow, so the
cooler is absolutely noiseless. However, you should be careful
with fragile fan latches. Don't touch them when you install
the card.
As in the previous case, this cooling system is very wide,
so the graphics card takes up two slots on a motherboard.
We monitored temperatures using RivaTuner (written by A.Nikolaychuk AKA Unwinder). Here are the results:
HIS RADEON HD 3870 X2 2x512MB PCI-E
PowerColor RADEON HD 3870 X2 2x512MB PCI-E
MSI R3870X2-T2D1Q-OC (RADEON HD 3870 X2) 2x512MB PCI-E
GeCube RADEON HD 3870 X2 X-Turbo Dual 2x512MB PCI-E
Our results show that the combination of different heat sinks (aluminum and copper) is less efficient than the all-copper version. The card from HIS, operating at lower frequencies, ran hotter than the card from MSI (its frequencies are much higher, but it comes with an all-copper reference cooler).
Cooling efficiency of the GeCube card is praiseworthy! And don't forget that the cooler is noiseless.
This is the chip that provides hardware support for CrossFire on the card. It's actually a PCI-E-to-PCI-E bridge. According to its specifications, it supports only PCI-E 1.1, not the 2.0 that each GPU supports. That's why the GPUs exchange data via the PCI-E 1.1 bus.
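What the PCI-E 1.1 limit means for bandwidth can be estimated with a small sketch. The signalling rates (2.5 GT/s for 1.1, 5 GT/s for 2.0) and 8b/10b encoding come from the PCI Express specifications; the x16 link width is an assumption made only for illustration, since the width of the links between the bridge and the GPUs is not stated here.

```python
# Raw signalling rate per lane, in megatransfers per second.
TRANSFER_RATE_MT_S = {"1.1": 2500, "2.0": 5000}


def lane_bandwidth_mb_s(generation: str) -> int:
    """Payload bandwidth of one lane, one direction, in MB/s.

    8b/10b encoding carries 8 payload bits in every 10 transferred bits,
    and 8 bits make one byte.
    """
    payload_mbit_s = TRANSFER_RATE_MT_S[generation] * 8 // 10
    return payload_mbit_s // 8


def link_bandwidth_mb_s(generation: str, lanes: int = 16) -> int:
    """Payload bandwidth of a whole link (x16 assumed by default)."""
    return lane_bandwidth_mb_s(generation) * lanes


print(link_bandwidth_mb_s("1.1"))  # 4000
print(link_bandwidth_mb_s("2.0"))  # 8000
```

Under the x16 assumption, the bridge halves the per-direction bandwidth that a PCI-E 2.0 link between the GPUs could have offered.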
It's the RV670 (RADEON HD 3870); each card has two such GPUs.
Here is the X2 reference card inside our testbed.
Bundles
HIS RADEON HD 3870 X2 2x512MB PCI-E
The box contains a User's Manual, a CD with drivers,
a component output adapter, a DVI-to-VGA adapter, a DVI-to-HDMI
adapter, an external power adapter, and a CrossFire bridge. Most
curiously, HIS continues to bundle bonuses with its graphics cards,
in this case a screwdriver with interchangeable heads. The
screwdriver has a built-in flashlight to illuminate the screw you
are working on, as well as a level. It's a nice bonus for users
who assemble or upgrade computers on their own. Unfortunately,
there is no 6-pin-to-8-pin adapter, although one of the on-board
power connectors has 8 pins.
GeCube RADEON HD 3870 X2 X-Turbo Dual
2x512MB PCI-E
The bundle includes: User's Manual, CD with drivers,
component output adapter, DVI-to-VGA adapter, DVI-to-HDMI adapter,
external power adapter, CrossFire bridge. It's a small bundle.
Nothing like the luxurious looking card and box.
PowerColor RADEON HD 3870 X2 2x512MB
PCI-E
User's Manual, CD with drivers, component output
adapter, two DVI-to-VGA adapters, DVI-to-HDMI adapter, external
power adapter, CrossFire bridge.
MSI R3870X2-T2D1Q-OC (RADEON HD 3870
X2) 2x512MB PCI-E
The same bundle, plus (VERY IMPORTANT!) a 6-pin-to-8-pin
adapter. Thus, the MSI card can be connected in such a way that
AMD drivers unlock the overclocking options.
Boxes
HIS RADEON HD 3870 X2 2x512MB PCI-E
We have to scold HIS here. The company has great experience
in packaging, but in this case it stuffed a huge card into
a thin box, which bulges when you close it. The design is
tasteless, too. The box gives the impression of a cheap card
inside, even of a no-name product made in China.
That's sad.
Bundled components are secured in a plastic section inside.
GeCube RADEON HD 3870 X2 X-Turbo Dual
2x512MB PCI-E
Designers from GeCube did a great job. It's an excellent box
with a window to show off the card, which looks gorgeous.
Bundled components are arranged into cardboard sections inside.
PowerColor RADEON HD 3870 X2 2x512MB
PCI-E
Designers still prefer vertically oriented boxes. It's actually
a jacket with a white cardboard box inside. The box contains
all bundled components arranged into cardboard sections in a
pile of cardboard padding. I don't understand why so much
cardboard is wasted when a plastic tray would do.
The box has a stylish design.
MSI R3870X2-T2D1Q-OC (RADEON HD 3870
X2) 2x512MB PCI-E
Here we see the famous huge bag used for all expensive
cards from MSI. However, the bag is half-empty, because the
bundle is small (we remember the time when such boxes held
11 CDs with software and various bonuses).
The graphics card is secured inside a foamed polyurethane box.
Installation and Drivers
Testbed configuration:
Intel Core2 (775 Socket) based computer
CPU: Intel Core2 Extreme QX9650 (3000 MHz)
Motherboard: Gigabyte GA-X38-DQ6 on the Intel X38 chipset
RAM: 2 GB DDR2 SDRAM Corsair 1142MHz (CAS (tCL)=5; RAS to CAS delay (tRCD)=5; Row Precharge (tRP)=5; tRAS=15)
HDD: WD Caviar SE WD1600JD 160GB SATA
PSU: Tagan 1100-U95 (1100W).
Operating system: Windows XP SP2; DirectX 9.0c
Operating system: Windows Vista 32-bit; DirectX 10.0
Monitor: Dell 3007WFP (30").
Drivers: ATI CATALYST 7.12/8.1; NVIDIA 169.21/169.25.
Synthetic tests were run with the following graphics cards:
RADEON HD 3870 X2 with standard parameters (HD3870 X2)
RADEON HD 3870 with standard parameters (HD3870)
NVIDIA GeForce 8800 GTS 512MB with standard parameters (GF8800GTS 512)
We selected these solutions for comparison with the new dual-GPU card from AMD for the following reasons. The RADEON HD 3870 is a full single-GPU counterpart that differs only in the operating frequencies of the GPU and memory. The latest GeForce 8800 GTS (G92) is used for two reasons: first, this graphics card is close to the HD 3870 X2 in price; second, it's one of the fastest solutions from the competitor.
We should warn you that the analysis is going to be boring, because nothing has changed architecturally. The GPU is the same; the card simply has two of them. We are already familiar with CrossFire, too. Besides, the results are easy to predict, as CrossFire always uses the AFR mode. Performance (that is, the frame rate) will almost double in most cases, which is far from the situation in real games: synthetic tests are simple, and their frames do not depend on previous results (render targets). So we expect a doubled frame rate in almost all cases.
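The expectation of a doubled frame rate follows from how AFR distributes work. The sketch below is an idealized model, not a description of the actual driver: it assumes that frames are fully independent, that every frame takes the same time to render, and that frame N simply goes to GPU N modulo the GPU count. All names are illustrative.

```python
def afr_completion_times_ms(num_frames: int, render_time_ms: float,
                            num_gpus: int = 2) -> list:
    """Completion time of each frame when frames alternate between GPUs."""
    gpu_busy_until = [0.0] * num_gpus
    completion = []
    for frame in range(num_frames):
        gpu = frame % num_gpus            # alternate frame rendering
        gpu_busy_until[gpu] += render_time_ms
        completion.append(gpu_busy_until[gpu])
    return completion


def average_fps(num_frames: int, render_time_ms: float,
                num_gpus: int = 2) -> float:
    """Average frame rate over the whole run in the idealized model."""
    last = afr_completion_times_ms(num_frames, render_time_ms, num_gpus)[-1]
    return num_frames * 1000.0 / last


print(average_fps(4, 20.0, num_gpus=1))  # 50.0
print(average_fps(4, 20.0, num_gpus=2))  # 100.0
```

As soon as a frame has to wait for the result of the previous one, the second GPU idles and the gain shrinks, which is why games rarely reach this ideal.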
Direct3D 9: Pixel Filling tests
This test determines peak texel rate in FFP mode for different numbers of textures applied to a pixel:
AMD graphics cards demonstrate results close to the theoretical maximum, although still somewhat below it. As usual, results of synthetic tests fail to reach theoretical values even in modes with many textures. However, the NVIDIA card with improved TMUs and a high texturing rate does even worse. Judging by the results of our old tests, the GPU fetches half as many texels per cycle as its theoretical value for 32-bit textures with bilinear filtering.
With a single texture, performance of all solutions is limited by memory bandwidth and by the number of ROPs. With many textures per pixel, the situation gets better. The GeForce 8800 GTS and the HD 3870 X2 are nearly on a par, although the dual-GPU monster from AMD is still outperformed by the single-GPU solution from NVIDIA. That's the effect of the relatively few TMUs in the AMD R6xx architecture... There is no twofold difference between the HD 3870 X2 and the HD 3870 in any test, even though the difference between them grows with the number of textures. Let's have a look at the fill rate test:
The second synthetic test measures the fill rate. It shows the same situation, adjusted for the number of pixels written into the frame buffer. With 0 and 1 textures, all solutions demonstrate similar results, revealing their true power only with many textures per pixel. On the whole, RV670-based graphics cards are expectedly outperformed by G92-based solutions in texture fetch rate and fill rate when performance is not limited by memory bandwidth.
Direct3D 9: Geometry Processing Speed Tests
Let's analyze a couple of extreme geometry tests. The first test uses the simplest vertex shader that shows maximum triangle throughput:
All GPUs are based on unified architectures, and all unified processors in this test are busy with geometry processing. So all solutions demonstrate high results, which are evidently limited not by the peak performance of the unified processors but by the performance of other units, for example, triangle setup.
Our GPUs execute this test in the various modes with similar efficiency. Peak performance in FFP, VS 1.1, and VS 2.0 does not differ much; only the G92 is a tad faster in the FFP mode. We cannot say much about these results, except that AMD GPUs traditionally process geometry faster than NVIDIA GPUs. The card based on two RV670 GPUs becomes the evident leader in geometry performance; nothing stops it from demonstrating a doubled frame rate in this test.
We removed two intermediate geometry tests with a single light source, because they don't show anything interesting. So we proceed straight to the most complex geometry task with three light sources, including static and dynamic branches:
The difference between AMD and NVIDIA has grown. The HD 3870 X2 still leads in geometry performance: it's twice as fast as its single-GPU modification in all modes. These GPUs do not reveal their full potential even in our most complex geometry task; they could cope with a heavier load. The results of the only NVIDIA card here are lower: it's significantly outperformed even by the regular HD 3870. Interestingly, the optimized FFP emulation in the G92 becomes even more noticeable with three mixed light sources.
Let's draw a bottom line under the geometry tests: the HD 3870 X2 has the same GPUs and uses CrossFire to double the frame rate, so we get predictable results that are twice as high as those of the single-GPU HD 3870. But don't forget that the situation may change considerably in real applications...
Direct3D 9: Pixel Shaders Tests
The first group of pixel shaders to be reviewed here is too simple for modern GPUs. It includes various versions of pixel programs of relatively low complexity: 1.1, 1.4, and 2.0.
These tests are too easy for modern architectures and fail to reveal their true capacity. Performance in the simplest tests is limited by texture lookups and the fill rate, as we can see in the weak results of AMD cards compared to the GeForce 8800 GTS. That's again the effect of relatively few TMUs... However, the results get better in the more complex PS 2.0 tests. For example, in the most complex Phong test with three light sources, the GeForce 8800 GTS is slightly outperformed even by the regular HD 3870.
The twofold difference between the HD 3870 X2 and the HD 3870 is preserved; it's smaller only in the illumination tests. Let's have a look at results in more complex pixel programs of intermediate versions:
The procedural water test depends heavily on texturing performance. It uses dependent texture lookups of high nesting depth, so both cards from AMD lag behind the only representative of NVIDIA. The GeForce 8800 GTS is much faster than even the dual-GPU solution here. The HD 3870 X2 is more than twice as fast as the single-GPU HD 3870.
In the second, compute-intensive test, AMD solutions are ahead. The GeForce 8800 GTS is a tad slower than the regular HD 3870, to say nothing of the X2 card, which is again twice as fast as the HD 3870. So the second task suits the AMD architecture with its many unified processors.
Direct3D 9: New Pixel Shaders Tests
These tests of DirectX 9 pixel shaders are even more complex, they are divided into two categories. We'll start with easier shaders - SM 2.0:
Parallax Mapping - a texturing method used in many games, which is described in detail in our article Modern 3D Graphics Terms
Frozen Glass—a complex procedural texture that visualizes frozen glass with adjustable parameters
There are two modifications of these shaders: arithmetic-intensive and texture-sampling-intensive. Let's analyze the arithmetic-intensive modifications first, as they are more promising from the point of view of future applications:
NVIDIA leads in the Frozen Glass test. The GeForce 8800 GTS outperforms even the new card with two RV670 GPUs, and the single-GPU HD 3870 is 2.5 times slower. This indicates that performance in this test is limited primarily by the texture fetch rate.
AMD solutions were traditionally faster in the second test, Parallax Mapping. But NVIDIA G92 solutions with improved TMUs (parallax mapping requires an additional texture lookup) changed the situation, so the new GeForce 8800 GTS outperforms the HD 3870. However, the twin RV670 represented by the HD 3870 X2 demonstrates twice as many frames per second in this test as the single-GPU solution. Let's analyze the texturing-intensive modifications of the same tests:
The situation has changed. Now performance is limited by the texturing units even more, so the GeForce 8800 GTS pulls even further ahead, being only a little slower than the dual-GPU card in the second test. The twofold difference between the HD 3870 X2 and the HD 3870 is preserved...
As usual, arithmetic-intensive shaders run faster on all graphics cards. Texturing-intensive shaders are a poor fit for modern GPU architectures: new products from AMD and NVIDIA prefer arithmetic operations to texturing.
Let's have a look at results of another two pixel shader tests—SM 3.0. They are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches:
Steep Parallax Mapping is a much heavier modification of parallax mapping, which is also described in the article Modern 3D Graphics Terms
Fur—a procedural shader that visualizes fur
This load is too heavy even for the most powerful solutions. Although the AMD solutions execute complex pixel shaders (3.0) with many branches efficiently, the single-GPU HD 3870 is almost twice as slow as the new GeForce 8800 GTS 512MB, which can be explained by accelerated bilinear texture fetches in the latter. The situation in real applications will be different, of course: they use trilinear and/or anisotropic texture filtering, and their performance is limited by the fill rate and memory bandwidth much more often.
The RADEON HD 3870 X2 is still twice as fast as its single-GPU modification, but it has only a very small advantage over the GeForce 8800 GTS. What will happen when NVIDIA launches a similar card with two G92 GPUs? No need to guess: AMD will be defeated again, provided the SLI and CrossFire modes are equally efficient.
Direct3D 10: PS 4.0 Tests (texturing, loops)
New RightMark3D 2.0 includes two old PS 3.0 tests for Direct3D 9, rewritten for DirectX 10, and two brand new tests. The first two tests can now enable self-shadowing and shader supersampling, which increases the GPU load.
These tests measure efficiency of executing looped pixel shaders with a lot of texture lookups (up to several hundreds of lookups per pixel in the heaviest mode!) and a relatively low ALU load. In other words, they measure a texture sampling rate and branching efficiency in a pixel shader.
The first pixel shader test is the Fur test. With the lowest settings, it uses 15-30 texture lookups from bump maps and two lookups from the main texture. The High Effect Detail mode increases the number of lookups to 40-80. Enabling shader supersampling raises the number of lookups to 60-120. The High mode with SSAA is the heaviest: 160-320 lookups from a bump map.
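The lookup counts of the four Fur modes follow one rule: supersampling multiplies the bump map lookups of the corresponding mode by four. A minimal sketch that reproduces the figures quoted above (the table and function names are illustrative, not part of RightMark3D):

```python
# Bump map lookups per pixel without supersampling: (min, max).
FUR_BASE_LOOKUPS = {"Low": (15, 30), "High": (40, 80)}
SSAA_FACTOR = 4  # shader supersampling quadruples the lookups


def fur_bump_lookups(effect_detail: str, supersampling: bool) -> tuple:
    """Range of bump map lookups per pixel for a Fur test mode."""
    low, high = FUR_BASE_LOOKUPS[effect_detail]
    factor = SSAA_FACTOR if supersampling else 1
    return (low * factor, high * factor)


for detail in ("Low", "High"):
    for ssaa in (False, True):
        print(detail, "SSAA" if ssaa else "no SSAA",
              fur_bump_lookups(detail, ssaa))
```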
Let's see what happens in modes without supersampling. They are relatively simple, and the correlation of results in the Low and High modes should be similar.
All results in the High mode are approximately 1.5 times lower than in the Low mode. Procedural fur tests with many texture lookups traditionally show a great advantage of NVIDIA over AMD. None of the RADEONs can compete with the GeForce 8800 GTS; even two GPUs are of no help here. This defeat cannot be explained even theoretically; perhaps the problem lies in bugs in the Direct3D 10 driver. CrossFire itself works well in Direct3D 10: the HD 3870 X2 is twice as fast as the single-GPU solution.
By the way, judging by previous reviews, performance in this test depends not only on the number and speed of TMUs; rendering speed is also limited by the fill rate and memory bandwidth. Let's have a look at the results of this test with shader supersampling enabled, which quadruples the load. Perhaps it will change the situation:
Theoretically, supersampling quadruples the load, but performance drops more in NVIDIA solutions than in AMD cards, so the gap gets smaller. Still, only the GeForce 8800 GTS copes with tests of such complexity; the other solutions demonstrate very low results. Nothing changes between the HD 3870 X2 and the HD 3870: the dual-GPU card is again twice as fast owing to the AFR CrossFire mode.
The second test that measures the efficiency of executing complex looped pixel shaders with many texture lookups is called Steep Parallax Mapping. With low settings it uses 10-50 texture lookups from a bump map and three lookups from main textures. The heavy mode with self-shadowing doubles the number of texture lookups, and supersampling quadruples it. The most complex test mode with supersampling and self-shadowing uses 80-400 texture lookups, that is, eight times as many as in the low mode. Let's analyze simple modes without supersampling first:
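The eightfold figure is simply the product of the two options. A short sketch of that arithmetic, with illustrative names and the base range quoted above:

```python
SPM_BASE_LOOKUPS = (10, 50)  # bump map lookups per pixel with low settings


def load_factor(self_shadowing: bool, supersampling: bool) -> int:
    """Multiplier on the number of bump map lookups for the chosen options."""
    factor = 1
    if self_shadowing:
        factor *= 2   # self-shadowing doubles the lookups
    if supersampling:
        factor *= 4   # supersampling quadruples them
    return factor


def spm_bump_lookups(self_shadowing: bool, supersampling: bool) -> tuple:
    """Range of bump map lookups per pixel for a test mode."""
    factor = load_factor(self_shadowing, supersampling)
    return (SPM_BASE_LOOKUPS[0] * factor, SPM_BASE_LOOKUPS[1] * factor)


print(load_factor(True, True))       # 8
print(spm_bump_lookups(True, True))  # (80, 400)
```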
This test is more interesting from the practical point of view. Various parallax mapping methods have long been used in games. Heavy modifications, such as our steep parallax mapping, are used in the latest games, e.g. Crysis and Lost Planet. Besides, along with supersampling, this test also allows enabling self-shadowing, which doubles the GPU load (High mode).
Although AMD solutions used to be strong in Direct3D 9 parallax mapping tests, they fail to keep up with the GeForce 8800 GTS in the updated DX10 test without supersampling. Even the HD 3870 X2 is outperformed by the NVIDIA card here. Besides, self-shadowing causes a bigger performance drop in AMD products: over twofold versus 1.5 times in NVIDIA solutions.
The X2 card is again twice as fast as the regular HD 3870. That is, CrossFire AFR works fine in our tests. Let's see what supersampling changes, as it slowed down NVIDIA cards much more than AMD solutions in the previous test.
That's one more heavy task for GPUs, with two options enabled: supersampling and self-shadowing. The GPU load grows almost eightfold, so the performance drop is enormous. The performance difference between our cards is generally preserved, and supersampling has the same effect as in the previous case: AMD cards improve their results relative to the NVIDIA solution. However, even the HD 3870 X2 is still outperformed by the GeForce 8800 GTS 512MB. As for the comparison between the HD 3870 X2 and the HD 3870, we have nothing to add: the dual-GPU card keeps its usual twofold advantage.
Direct3D 10: PS 4.0 Tests (computing)
The next couple of pixel shader tests contains very few texture lookups to minimize the effect of TMUs on performance. They use a lot of arithmetic operations, so they measure arithmetic performance of GPUs, how fast they execute arithmetic instructions in pixel shaders.
The first computing test is called Mineral. It's a complex procedural texturing test, which uses only two texture lookups and 65 sin and cos instructions.
We've already noted in our synthetic Direct3D 9 tests that the latest architecture from AMD often performs better than NVIDIA's architecture in compute-intensive tasks. Although the RADEON HD 3870 is still slower than the best G92-based solution in this test, the dual-GPU card doubles its frame rate and significantly outperforms one of the fastest cards from NVIDIA.
The second shader test is called Fire, and it's even harder on ALUs. It contains only a single texture lookup, while the number of sin/cos instructions is doubled to 130. Let's see what changes as the load grows:
AMD cards failed this test in all previous articles, demonstrating very low results, which indicated an apparent bug in the drivers. Judging by today's test results, the bug has finally been fixed (almost a year after it was reported!), and now the HD 3870 performs on a par with the 512 MB modification of the GeForce 8800 GTS. The RADEON HD 3870 X2 is twice as fast.
Direct3D 10: Geometry Shader Tests
RightMark3D 2.0 includes two geometry shader tests. The first one is called Galaxy; it's similar to point sprites from previous Direct3D versions. It animates a system of particles on the GPU, with a geometry shader creating four vertices from each particle. Similar algorithms should be used in future DirectX 10 games.
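The expansion of one particle into four vertices can be illustrated on the CPU. The sketch below is not the RightMark3D shader; it only shows the general idea of turning a point into a screen-aligned quad, with hypothetical names and vertices listed in triangle-strip order.

```python
def expand_particle_to_quad(center_x: float, center_y: float,
                            half_size: float) -> list:
    """Return the four corner vertices of a quad around a particle.

    The order (bottom-left, bottom-right, top-left, top-right) lets the
    quad be drawn as a two-triangle strip.
    """
    return [
        (center_x - half_size, center_y - half_size),
        (center_x + half_size, center_y - half_size),
        (center_x - half_size, center_y + half_size),
        (center_x + half_size, center_y + half_size),
    ]


particles = [(0.0, 0.0), (2.0, 1.0)]
vertices = [v for (x, y) in particles
            for v in expand_particle_to_quad(x, y, 0.5)]
print(len(vertices))  # 8
```

A geometry shader performs this step on the GPU for every particle, so the vertex count sent to the rasterizer is four times the particle count.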
A change of balance in geometry tests does not affect rendering results: the image is always identical, only the scene processing methods differ. The GS load value determines which shader does the work, vertex or geometry. The amount of work is always the same.
Let's analyze the first modification of Galaxy with vertex computing for three levels of geometric complexity:
The correlation of results at different complexity levels of the scene is almost the same; only absolute values differ. Performance corresponds to the number of points: FPS is halved at each step. Only the dual-GPU card from AMD can compete with the GeForce 8800 GTS in this test. It's twice as fast as its single-GPU modification based on the RV670, which in turn is slower than the NVIDIA card. However, it's not a hard task for modern graphics cards. Our previous tests demonstrate that performance is not limited by shader ALUs here; the task is limited more by memory bandwidth than by the GPU. Perhaps the situation will change when some work is moved to the geometry shader.
There are no significant changes in this case. All graphics cards demonstrate practically the same results when GS load changes (the parameter responsible for moving some computations into the geometry shader). The GeForce 8800 GTS still outperforms the HD 3870, while the HD 3870 X2 is twice as fast as the single-GPU card. Perhaps this is the effect of different clock rates and measurement error. We'll see what happens in the next test...
Hyperlight is the second geometry test, which uses several techniques: instancing, stream output, and buffer load. It employs dynamic generation of geometry by rendering into two buffers, as well as a new Direct3D 10 feature, stream output. The first shader generates ray directions, their speed and growth vectors. These data are stored in a buffer, which is used by the second shader for rendering. Each ray point is used to generate 14 vertices in a circle, up to a million output points.
The new type of shader programs is used to generate rays. If "GS load" is set to "Heavy", it's also used for rendering. That is, in Balanced mode geometry shaders are used only to generate and grow rays, and output is handled by instancing; in Heavy mode the geometry shader also outputs the data. Let's analyze the easy mode first:
Relative results in the various modes correspond to the load: performance scales well in all cases. It's close to the theoretical expectation that each next level of Polygon count should be twice as slow. Performance of the NVIDIA card in this test is again much higher than that of both AMD cards at any geometry complexity. The single-GPU card from AMD is outperformed by the GeForce 8800 GTS 512MB by more than a factor of two, and the dual-GPU card cannot even get close to it.
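Whether a set of results follows this "twice as slow per level" expectation can be checked mechanically. The sketch below uses made-up placeholder frame rates, not our measurements, and an arbitrary 15% tolerance; both are assumptions for illustration only.

```python
def step_ratios(fps_by_level: list) -> list:
    """Ratio of each level's FPS to the FPS of the next, heavier level."""
    return [a / b for a, b in zip(fps_by_level, fps_by_level[1:])]


def halves_each_step(fps_by_level: list, tolerance: float = 0.15) -> bool:
    """True if every step is within the tolerance of a twofold slowdown."""
    return all(abs(ratio - 2.0) <= 2.0 * tolerance
               for ratio in step_ratios(fps_by_level))


# Placeholder values for three Polygon count levels (NOT measured data).
example_fps = [120.0, 60.0, 30.0]
print(step_ratios(example_fps))       # [2.0, 2.0]
print(halves_each_step(example_fps))  # True
```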
It's the first test in our review where the RADEON HD 3870 X2 outperforms the single-GPU card by less than a factor of two: the performance ratio is just 1.5. This is a synthetic test in which even the most FPS-favorable CrossFire mode does not yield a twofold performance gain. It will be worse in game tests...
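One way to put a number on this is to ask how much of the second GPU actually turns into extra frames. The definition below is a common convention rather than anything specific to CrossFire, and the function name is illustrative.

```python
def multi_gpu_efficiency(speedup: float, num_gpus: int = 2) -> float:
    """Fraction of the additional GPUs' potential that shows up as speedup.

    A speedup equal to num_gpus gives 1.0; no speedup at all gives 0.0.
    """
    return (speedup - 1.0) / (num_gpus - 1)


print(multi_gpu_efficiency(2.0))  # 1.0
print(multi_gpu_efficiency(1.5))  # 0.5
```

By this measure, the 1.5 ratio seen here means that only half of the second GPU's potential is realized in this test.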
Results may change in the next test, which uses geometry shaders more actively. It will be also interesting to compare results obtained in Balanced and Heavy modes.
The balance of results has changed considerably. AMD GPUs execute more complex geometry shaders more efficiently than the NVIDIA GPU. As a result, the GeForce 8800 GTS 512MB performs almost on a par with the RADEON HD 3870, while the gap used to be much larger. The HD 3870 X2 significantly outperforms both competitors, although the performance difference does not reach twofold.
As for the comparison of results in different modes, the new GeForce 8800 GTS in Balanced mode demonstrates better results than the RADEON HD 3870 X2 in Heavy mode. Keep in mind that the image does not differ between these modes. That is, AMD solutions perform better in the second mode (using the geometry shader for output instead of instancing), while NVIDIA prefers the first one. However, when we compare results in each card's best mode, the GeForce 8800 GTS is a tad faster than the new dual-GPU card from AMD.
Direct3D 10: Vertex texture fetch rate
Vertex Texture Fetch tests measure the speed of a large number of vertex texture fetches. The two tests, Earth and Waves, are similar, so the correlation of their results should also be similar. Both use displacement mapping based on texture lookups. The only major difference is that the Waves test uses conditional branches, while the Earth test does not.
Let's analyze the first test (Earth) in Effect detail Low mode:
Results in the different modes again show a similar picture of relative performance. Judging by our previous reviews, results of this test are heavily affected by memory bandwidth. The easier the mode, the stronger the effect on performance.
The GeForce 8800 GTS outperforms the RADEON HD 3870, but the dual-GPU card pulls well ahead. The difference between the AMD cards reaches twofold, but only in heavy modes. Memory bandwidth is insufficient in the easiest mode, and the difference in the average mode also fails to reach twofold. Let's have a look at the results of this test with more texture lookups:
The situation hasn't changed much: the RADEON HD 3870 X2 is still ahead. It's up to twice as fast as the HD 3870 as the task grows more complex. The NVIDIA GeForce 8800 GTS is somewhere in between.
Let's have a look at the results of the second vertex texture fetch test. The Waves test executes fewer texture lookups, but it uses conditional branches. The number of bilinear texture lookups in this case reaches 14 (Effect detail Low) or 24 (Effect detail High) per vertex. Geometry complexity changes just like in the previous test.
The situation in the Waves test is slightly different from the previous results. Both HD 3800 cards look good. The single-GPU product performs on a par with the GeForce 8800 GTS, being faster in the Low mode and slower in the High mode. The HD 3870 X2 is twice as fast, so it becomes the leader owing to CrossFire. Let's analyze the second mode:
There are almost no changes. But as the test grows more complex, results of the single-GPU HD 3870 get evidently better than those of the GeForce 8800 GTS: it wins in all modes, not just in the Low mode. The other conclusions still hold true: performance in the Low mode is a tad limited by memory bandwidth, while TMUs and ROPs play a more important role in the High mode. AMD drivers seem to have been optimized, because NVIDIA solutions used to cope with vertex texture fetch tests better than AMD cards. Now the situation is much better, especially as the X2 card is again more than twice as fast as the regular HD 3870.
Conclusions on the synthetic tests
There is nothing new or interesting in our conclusions. The RADEON HD 3870 X2 is based on two GPUs that have already been reviewed. They changed little compared to the R600; all architectural strengths and weaknesses remain the same. The dual-GPU solution is notable for high computing performance, especially in modern and future applications with many complex shaders of all types. The weakest link is the relatively small number of texturing units, which prevents all R6xx-based cards from demonstrating higher performance in tests that depend heavily on texturing speed.
As for the performance gain of the HD 3870 X2 relative to the single-GPU modification, all results can be explained by the AFR (Alternate Frame Rendering) CrossFire mode. As frames do not depend on each other in our tests, the frame rate grows approximately twofold, except in several tests where the card suffers from insufficient memory bandwidth. Besides, we should keep in mind that it's not easy to double performance in real games even in the AFR mode, which has its own problems: latencies are higher than with a genuine twofold performance gain. In other words, in many cases 60 FPS provided by CrossFire or SLI will feel only as playable as 30 FPS on a single-GPU card.
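The latency argument can be made concrete with an idealized model. It assumes that in AFR each GPU still needs the full single-GPU render time for its own frame and that nothing else adds delay; real systems have further sources of latency, so this is only a rough sketch with illustrative names.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between displayed frames at a given frame rate."""
    return 1000.0 / fps


def afr_render_latency_ms(displayed_fps: float, num_gpus: int = 2) -> float:
    """Time one GPU spends on a single frame when AFR delivers displayed_fps.

    Each GPU produces only every num_gpus-th frame, so it works on its
    frame num_gpus times longer than the interval between displayed frames.
    """
    return num_gpus * frame_interval_ms(displayed_fps)


print(frame_interval_ms(60.0))      # interval between frames at 60 FPS
print(afr_render_latency_ms(60.0))  # time each frame takes to render in AFR
print(frame_interval_ms(30.0))      # render time of a single GPU at 30 FPS
```

In this model a frame shown at 60 FPS in AFR took as long to render as a frame on a single GPU running at 30 FPS, so the response to input is no quicker even though motion looks smoother.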
On the whole, having analyzed the synthetic test results, we should admit that the RADEON HD 3870 X2 is quite a fast solution. In certain conditions it can even compete with more expensive graphics cards. It outperforms the single-GPU GeForce 8800 GTS 512MB (with a similar price tag) in many cases. But don't forget that when you choose between a single-GPU and a dual-GPU graphics card, you must take into account the above-mentioned problems of CrossFire/SLI systems. Besides, what matters are the results demonstrated in real games, not in synthetic or semi-synthetic tests like 3DMark.
That's why it's vital to pay attention to the next part of the article, devoted to tests of the new dual-GPU card from AMD in modern games. Those results should be much more interesting than the synthetic results from this part, because it's not that easy to double performance in games.