iXBT Labs - Computer Hardware in Detail

CMP vs. SMP on AMD Platform: SMP Systems Based on Opteron 250 and Dual Core Opteron 275

August 10, 2005






As we have written before, the arrival of dual core processors from both leading x86 CPU manufacturers raises a natural question: how does their performance compare with that of "honest" SMP systems, where two equally clocked cores sit in separate processors? We have already analyzed the competition between Pentium XE 840 and Dual Xeon 3.2 GHz and drawn the evident conclusion: the dual core processor is preferable in terms of performance, though most likely not because of any performance difference between the processors themselves, but because the Xeon platform is more "conservative". What's the situation in the AMD camp?

At first sight, Opteron systems are in a better position in this competition than the Xeon system we tested earlier: they use the same DDR400 memory as Athlon 64 X2. However, there are nuances. The memory modules are Registered and run at higher (that is, slower) timings, probably a tribute to the notorious "server reliability". You should also remember that a dual processor Opteron system has two memory controllers (one in each processor) instead of the single controller of Athlon 64 X2. Moreover, in the most general case you can install memory in such systems in two ways. In the first case, memory is distributed between the processors: each one has its own memory plus "another's" memory, which it can access only through the memory controller of the second processor over the HyperTransport bus. In the second case, all the modules are connected to one processor, so all the memory is its own and fast, while the other processor has no memory of its own at all and reaches everything through its neighbour, which is slow.
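To make the difference between the two layouts more tangible, here is a minimal Python sketch of a toy model. It assumes a single-threaded task whose data is spread evenly over all installed modules. The constants `LOCAL_NS` and `REMOTE_NS` and both function names are illustrative placeholders, not measured values.

```python
# Toy model of the two module layouts described above.
# The latency figures are hypothetical placeholders, not measurements.
LOCAL_NS = 60.0    # assumed latency to memory on the CPU's own controller
REMOTE_NS = 100.0  # assumed latency via the other CPU's controller (HyperTransport hop)


def average_latency(share_local: float) -> float:
    """Average access latency when `share_local` of accesses hit local memory."""
    if not 0.0 <= share_local <= 1.0:
        raise ValueError("share_local must be between 0 and 1")
    return share_local * LOCAL_NS + (1.0 - share_local) * REMOTE_NS


def local_fraction(modules_per_cpu: tuple, running_on: int) -> float:
    """Share of installed memory that is local to the CPU the thread runs on."""
    return modules_per_cpu[running_on] / sum(modules_per_cpu)


layouts = {"2 x 2": (2, 2), "4 x 1": (4, 0)}
for name, modules in layouts.items():
    for cpu in (0, 1):
        frac = local_fraction(modules, cpu)
        print(f"{name}, thread on CPU{cpu}: {frac:.0%} local, "
              f"~{average_latency(frac):.0f} ns on average")
```

Under these assumptions the 4 x 1 layout is either the best case (the thread runs on the CPU that owns the memory) or the worst case (it runs on the CPU without memory), while 2 x 2 always lands in between.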

In fact, there are also BIOS settings (Node Memory Interleave, DRAM Bank Interleave, Swizzle Memory Banks) that introduce additional options to play with :), but we decided to leave them to our recognized expert in this field, Dmitri Besedin (expect an additional article...). In this review we analyzed only the two polar options: four modules on one processor and two modules on each processor (4 x 1 and 2 x 2 on the diagrams, respectively). Other settings are either Auto or BIOS Optimal Defaults. Besides, as the results turned out to be obvious, we compared both layouts only on the system with two single-core Opteron 250 processors, while the dual-core Opteron 275 was tested in the 2 x 2 configuration only.

We could not compare the dual core Opteron system with the other contenders quite correctly, as the top dual core Opteron has a lower clock than the top desktop dual core Athlon 64 X2 4800+. However, we thought it better to test currently available processors than to delay the article while waiting for ideal comparability between all the configurations. It also seemed appropriate to add another processor to our diagrams: the top single-core Athlon 64 FX-57. It's quite logical: it's up to our readers to decide in which tasks a single-core CPU at a higher clock is better than any CMP/SMP modification.

Dual core processors from Intel, on the contrary, would look absolutely out of place in this article: how Pentium XE 840 performs against the top dual core solutions from AMD is quite clear from our previous article on Intel's top dual core CPU. However, if you want to refresh your memory about this comparison, you can open that article in a separate window and consult it from time to time: the results in both articles were obtained using the same methods and are 100% comparable.

Testing

Testbed configurations

  • Processors
    • AMD Athlon 64 FX-57 (1 MB L2, 2.8 GHz core)
    • AMD Athlon 64 X2 4800+ (2 x 1 MB L2, 2 x 2.4 GHz core)
    • AMD Opteron 250 (1 MB L2, 2.4 GHz core)
    • AMD Opteron 275 (2 x 1 MB L2, 2 x 2.2 GHz core)

  • Motherboards
    • ASUS A8N-SLI Deluxe (NVIDIA nForce4 SLI)
    • Tyan Thunder K8WE (S2895) (AMD-8131 + NVIDIA nForce Pro 2200/2050)
    • OEM H8DCE (NVIDIA nForce Pro 2200/2050)

  • Memory
    • 2x512 MB PC3200 (DDR400) DDR SDRAM DIMM 2-2-2-5 (Corsair TwinX)
    • 4x512 MB Registered PC3200 (DDR400) DDR SDRAM DIMM 3-3-3-8 (Corsair)

  • Video card: ATI Radeon X800 (256 MB)
  • HDD: Samsung SP1614C (SATA), 7200 rpm, 8 MB Cache
  • AC power adapter: FSP 550-60PLN (500-550W)
  • Windows XP Professional SP2, DirectX 9.0c
  • ATI CATALYST 5.4 (Display Driver 6.14.10.6525)

In case you didn't notice: Opteron-based systems used twice as much memory as our test standard, a total of 2 GB instead of 1 GB. This was required to use both channels of the memory controller in each Opteron, which gives a total of four modules instead of the standard two. But as our test method does not include tests that require more than 1 GB, Opteron results are quite comparable to the Athlon 64 X2 results obtained earlier: a further increase of memory capacity on a testbed does not result in a performance gain in our tests; we have already checked that. You should also note that as we tested platforms as a whole, not processors as such, we decided to meet the manufacturers' wishes and test their systems with as few changes as possible. That's why Opteron 250 was tested on a motherboard from Tyan, and Opteron 275 on the OEM H8DCE (this motherboard is described below). However, considering the same chipset and identical memory timings in both testbeds, we can be sure that the influence of the motherboards on performance was negligible (if there was any at all).
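The channel arithmetic behind this choice can be sketched in a few lines of Python. The sketch uses the standard PC3200 figure of 3200 MB/s per 64-bit channel and the dual-channel memory controllers of these processors. The function name is hypothetical, and the results are theoretical peaks, not measured throughput; Registered modules and 3-3-3-8 timings affect latency rather than this peak figure.

```python
# Peak theoretical memory bandwidth for the layouts used in the tests.
PC3200_MB_S = 3200           # PC3200 (DDR400): 3200 MB/s per 64-bit channel
CHANNELS_PER_CONTROLLER = 2  # dual-channel controller in each processor


def peak_bandwidth_mb_s(controllers_with_memory: int) -> int:
    """Peak bandwidth summed over every controller that has modules attached."""
    return controllers_with_memory * CHANNELS_PER_CONTROLLER * PC3200_MB_S


print("Athlon 64 X2, 2 modules:     ", peak_bandwidth_mb_s(1), "MB/s")
print("Dual Opteron, 2 x 2 layout:  ", peak_bandwidth_mb_s(2), "MB/s")
print("Dual Opteron, 4 x 1 layout:  ", peak_bandwidth_mb_s(1), "MB/s")
```

With only two modules in a dual Opteron system, one module per processor, each controller would run single-channel, which is why four modules were needed.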





Tyan Thunder K8WE motherboard with Opteron 250 processors and Titan TTC-K8ATB/825 coolers

I'd like to note another thing: we don't often come across motherboards based on chipsets from two manufacturers at once. But considering that chipsets for AMD Athlon 64 / Opteron usually connect their chips with the same bus that the processors use (HyperTransport), "cross compatibility" between them is only to be expected. In this case both motherboards use nForce Professional 2200 and 2050 from NVIDIA, which provide two full-fledged PCI-E x16 slots (so two NVIDIA-based cards can be installed in SLI mode), plus a SATA 2.0 controller supporting RAID and NCQ. Hence the workstation orientation of these motherboards: server motherboards usually have the most primitive integrated video (servers don't need powerful video, most of the time they operate with monitors turned off or even without them). Besides, the Tyan motherboard uses the AMD-8131 chip (here is the mix of chipsets!), which provides three PCI-X slots.





Supermicro H8DCE motherboard with Opteron 275 processors and Supermicro SNK-P0014AP4 coolers

Gurus who remember the times of AMD Athlon Slot A will be amused to learn that the OEM motherboard was shipped in a classic (for those who remember...) white "badgeless" box, bundled only with a driver CD and a small green piece of paper bearing the name of the motherboard (no mention of a manufacturer!). But it was still no problem for us to find out that this motherboard was most likely manufactured by Supermicro. So one of the last bastions of faithful Intel supporters among motherboard manufacturers seems about to fall.

Test results

Diagrams with all test results are published on a separate web page without comments, "as is". The article itself provides only summary diagrams that condense the results of entire test groups into average scores. This approach satisfies the curiosity of the most inquisitive readers, who are against cutting down the number of test results published in our articles, and still makes the article less motley and graphics-intense. As for our comments, real professionals (who are interested in details) are expected to need none of them.
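For readers who want to reproduce such a summary from the detailed diagrams, here is one possible way to fold a test group into a single number. This is only a sketch: the article does not specify which kind of average is used, so the geometric mean of ratios to a reference system is an assumption, as are the function name `group_score` and all the sample timings. The one rule taken from the article is that a group score becomes unavailable as soon as a single test of the group fails, as happens with Photoshop later on.

```python
from statistics import geometric_mean
from typing import Mapping, Optional


def group_score(results: Mapping[str, Optional[float]],
                reference: Mapping[str, float],
                lower_is_better: bool = False) -> Optional[float]:
    """Fold one test group into a single score (reference system = 100).

    Returns None when any test of the group is missing or failed, so the
    group score is reported as unavailable instead of being skewed.
    """
    ratios = []
    for test, ref_value in reference.items():
        value = results.get(test)
        if value is None:  # failed or missing test: no group score at all
            return None
        ratios.append(ref_value / value if lower_is_better else value / ref_value)
    return 100.0 * geometric_mean(ratios)


# Hypothetical timings in seconds (lower is better), for illustration only.
reference = {"Blur": 20.0, "Rotate": 10.0, "Lighting Effects": 30.0}
complete = {"Blur": 10.0, "Rotate": 10.0, "Lighting Effects": 30.0}
failed = {"Blur": 10.0, "Rotate": 10.0, "Lighting Effects": None}

print(group_score(complete, reference, lower_is_better=True))
print(group_score(failed, reference, lower_is_better=True))  # None
```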

SPECapc for 3ds max 6 + 3ds max 7.0

A complete set of diagrams

The results are quite predictable. The dual processor Opteron 250 system turned out the slowest, probably due to registered memory with the worst timings (have a look at the full set of diagrams to see that it's mostly the effect of the interactive test). The four-processor system based on two dual core Opteron 275 CPUs got the highest score, solely due to its striking rendering speed: it's slower than Athlon 64 X2 4800+ in the interactive test, but it couldn't possibly be faster, as the interactive test does not use more than one CPU and the core clock of Opteron 275 is lower.

Dual Opteron in the 4 x 1 mode looks like an absolute snail, but pay attention to this asterisk: the fact is that in order to complete this test correctly, we had to introduce changes into the hardware configuration of this testbed, so this result cannot be considered absolutely correct and it would be right to ignore it. You can read about the reasons at the end of the article, "Petty annoyances" section.

Maya 6.5

A complete set of diagrams

SPEC for Maya is actually a counterpart of the Interactive test in SPEC for 3ds max, so the victory of the single-core FX-57 is easy to explain. We can just as easily explain the imposing advantage of the dual Opteron 275 over the other systems in the rendering test: this process is distributed between processors perfectly, so four practically full-fledged processors, each only 200 MHz slower than the cores of the dual Opteron 250 system and of Athlon 64 X2 4800+, are beyond competition.

Lightwave 8.2, rendering

Lightwave doesn't get along well with four processors. The dual Opteron 275 is still victorious, with a noticeable lead over its competitors, but its result makes you doubt whether the Lightwave renderer is distributed well between the processors: theoretically, the lead should have been larger (just look at the rendering results in Maya and 3ds max on the page with detailed diagrams).
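How large could the lead be in theory? A rough estimate can be sketched with Amdahl's law, which is our choice of model here, not something the benchmark itself reports. The sketch assumes that performance scales linearly with core clock and ignores memory differences between the platforms; the parallel fractions in the loop are arbitrary illustrative values.

```python
def amdahl_speedup(parallel_fraction: float, cpus: int) -> float:
    """Amdahl's law: speedup over one CPU when only part of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cpus)


def relative_render_speed(parallel_fraction: float, cpus: int, clock_ghz: float) -> float:
    """Clock multiplied by the Amdahl speedup (assumes linear clock scaling)."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cpus)


# Dual Opteron 275 (4 cores at 2.2 GHz) against a 2 x 2.4 GHz system.
for p in (1.0, 0.95, 0.80):
    four = relative_render_speed(p, cpus=4, clock_ghz=2.2)
    two = relative_render_speed(p, cpus=2, clock_ghz=2.4)
    print(f"parallel fraction {p:.2f}: expected lead x{four / two:.2f}")
```

With a perfectly parallel renderer the four slower cores should lead by a factor of about 1.83 (8.8 GHz against 4.8 GHz in total); every bit of serial work shrinks that figure, which is one possible reading of the Lightwave result.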

SPECapc for SolidWorks 2003

A complete set of diagrams

This picture is typical of a single-processor test: the top single-core processor outperformed all its competitors (remember that the lowest result is the best in this test). The other processors are approximately on a par, and Opteron 250 in the 4 x 1 mode again lags behind. Running a few steps ahead, the 4 x 1 result is not actually typical of a single-threaded application: for some reason the benchmark occupied the second CPU (CPU1), even though the memory was installed on the first processor (CPU0 in Windows terms). However, this information is useful anyway: that's what happens when a processor executes code from the other processor's memory.

Adobe Photoshop CS (8)

A complete set of diagrams

Unfortunately, the average score for the system based on two Opteron 275 processors is unavailable, as this system did not pass all the tests: it failed Lighting Effects (this situation is reviewed at the end of the article, in the "Petty annoyances" section). But the diagrams with the other test results (see the complete set of diagrams) show that the four-processor system achieves its potential only in Blur, and merely catches up with Athlon 64 X2 4800+ in the Rotate and Sharpen operations. So the usual dual core processor is still the best.

Adobe Acrobat 6.0

This seemingly low-information test is the first candidate to be struck off from our method, as it's 100% predictable. But as it's still part of our method, we shall publish it anyway...

All-purpose data compression (archiving)

A complete set of diagrams

Our recommendation here is to look through the detailed diagrams, as the summary shows only who the winner is, but not the reasons. 7-zip favours the dual core Athlon 64 X2 4800+ by a rather large margin. Opteron 250 turned out significantly slower than Athlon 64 X2 4800+ in the 7-zip test, and slower still in the 4 x 1 mode than in the 2 x 2 mode. That's only natural: 7-zip can make use of multiprocessing, but when one of the CPUs has no memory of its own and still actively uses memory, performance must drop.

But despite four CPUs installed in the system, Opteron 275 cannot outperform even Opteron 250. It most likely means that 7-zip cannot effectively use four processors (unlike two). Single-threaded WinRAR naturally favours the single-core A64 FX-57, but the other ratings have changed: the worst result belongs to Opteron 250 in the 2 x 2 mode. That's probably the effect of memory being split between two CPUs while the program is executed by only one of them. It's also confirmed by the better result in the 4 x 1 mode: being the only active process, the program naturally "occupied" CPU0, the one with memory.

Multimedia lossy compression (MP3/MPEG2-4)

A complete set of diagrams

Behind the seeming parity, the details differ a lot. Athlon 64 FX-57 is victorious in all single-threaded tests (and even in one of the LAME tests, although LAME is partially optimized for SMP). However, it's significantly outperformed by all SMP/CMP systems in the two tests that can make use of several processors: Windows Media Video 9 and Canopus ProCoder. But even these two tests are different: the Windows Media Video codec made good use of two processors, but turned out completely indifferent to four CPUs (dual Opteron 275): the results of this system are no better (even a tad worse) than those of the dual processor systems. Only Canopus ProCoder benefits from four CPUs.
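The pattern above can be illustrated with a very crude Python sketch in which an application simply cannot use more cores than it has worker threads. Core counts and clocks come from the testbed list; the thread counts (1, 2 and 4) and the function name are assumptions made for illustration, since we do not know how many threads each encoder actually starts.

```python
def usable_throughput(app_threads: int, cores: int, clock_ghz: float) -> float:
    """Aggregate GHz an application can use, capped by its own thread count."""
    return min(app_threads, cores) * clock_ghz


# (cores, clock in GHz) for the tested systems.
systems = {
    "Athlon 64 FX-57": (1, 2.8),
    "Athlon 64 X2 4800+": (2, 2.4),
    "2 x Opteron 250": (2, 2.4),
    "2 x Opteron 275": (4, 2.2),
}

for threads in (1, 2, 4):  # hypothetical thread counts
    print(f"application with {threads} thread(s):")
    for name, (cores, clock) in systems.items():
        print(f"  {name:20s} {usable_throughput(threads, cores, clock):.1f} GHz usable")
```

In this model an application limited to two threads gets slightly less out of the dual Opteron 275 than out of the 2.4 GHz dual systems, simply because each of its cores runs 200 MHz slower. Memory effects are ignored entirely.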

CPU RightMark 2004B

A complete set of diagrams

RightMark supports up to 16 render threads, so the dual system based on dual core Opteron 275 is expectedly victorious. But among the dual processor testbeds Athlon 64 X2 4800+ is again the best, certainly due to the more conservative memory in the Opteron 250 testbeds.

3D games and graphics visualization in professional packages

DOOM 3

Far Cry

Painkiller

Unreal Tournament 2004

Total score in games

A complete set of diagrams

The general Opteron tendency in single-threaded tasks is obvious, and it favours the 4 x 1 mode: a single-threaded task is executed much faster by a processor that has all the memory to itself than in a configuration where memory is distributed between two processors. And of course the absolute champion is Athlon 64 FX-57: a single-core, single-processor system, but the highest clocked one.

SPEC viewperf

A complete set of diagrams

This time SPEC viewperf agrees with games as far as its performance verdict is concerned. That's what actually happens most of the time.

Execution speed drop of the main process versus the number of background processes

It's the only test where we decided to include the Pentium XE 840 results, as it was very interesting to compare "virtual four-processing" (2 CPUs + Hyper-Threading) with "real" (2 dual-core Opteron 275 processors).

You can easily see that four real CPUs (even if integrated into two packages) are quite enough for the main application not to notice the effect of four background tasks. Our explanation is this: background tasks "roam freely" among three of the processors due to their background status, while the foreground application occupies one CPU and sticks to it. The small performance drop of the foreground process can be explained by multiple memory calls from background tasks, which still slightly hinder the foreground process from getting its data from memory promptly. We deliberately omit the curve of the dual processor Opteron 250, as it's essentially no different from the curve of the dual core Athlon 64 X2 4800+.

Petty annoyances

It has been a long time since we last encountered a testbed that could not pass some of the tests. But this time we ran into exactly this situation, so let's analyze the problems thoroughly.

Tyan Thunder K8WE

At first the testbed with this motherboard failed to pass SPECapc 3ds max in the 4 x 1 configuration, that is when all the four DIMM modules were installed into the first processor, while the second operated without its own memory. Multiple runs produced the same result: the testbed spontaneously switched off after running the test for some time (but not at the same stage, though the time span from the start of the test to switching off was approximately the same, ± one minute). This phenomenon made us take a good look at the motherboard design; that's what we found out:

  1. Tyan engineers (Supermicro did the same thing, though) based SLI support, that is two full-fledged PCI Express x16 slots, on two NVIDIA chips, each of them containing one PCI-E x16 controller.
  2. That's why the chips had to be connected to the HyperTransport links of different processors. Thus, the first PCI-E x16 slot is tied to the first processor, and the second slot to the second processor.
  3. We installed a video card into the first slot, and in the 4 x 1 mode the first processor also acted as the memory controller for the entire system, that is, all four DIMMs were connected to it.
  4. We assume that when all four DIMMs are connected to a processor and the PCI-E x16 controller with an installed video card is connected to its HT link, the processor... just doesn't cope with this load!

This assumption has been backed up experimentally: the SPEC results for 3ds max published on the diagrams were obtained after moving all four DIMMs into the slots connected to the memory controller of the second processor (not the first one, as in the other tests in the 4 x 1 configuration). To all appearances, distributing the load between the two CPUs (the first one was responsible only for the video card, the second one only for memory) solved the problem of excessive load, as the system managed to pass the SPEC test for 3ds max right after the changes were made.
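The two hardware layouts can be written down as a small Python sketch that lists which heavy duties each processor carries. It only restates our hypothesis in code form; the function name and the duty labels are illustrative, and nothing here models the actual electrical or thermal load.

```python
def duties_per_cpu(dimms_per_cpu, video_card_cpu):
    """List the heavy duties each CPU carries for a given hardware layout."""
    duties = {cpu: [] for cpu in range(len(dimms_per_cpu))}
    for cpu, dimms in enumerate(dimms_per_cpu):
        if dimms > 0:
            duties[cpu].append(f"memory controller for {dimms} DIMMs")
    # The video card sits in the slot tied to this CPU's HyperTransport link.
    duties[video_card_cpu].append("HT link to PCI-E x16 slot with video card")
    return duties


# Video card in the first slot (tied to CPU0) in both cases.
failing = duties_per_cpu(dimms_per_cpu=(4, 0), video_card_cpu=0)
workaround = duties_per_cpu(dimms_per_cpu=(0, 4), video_card_cpu=0)

for name, layout in (("failing 4 x 1", failing), ("workaround 4 x 1", workaround)):
    print(name)
    for cpu, jobs in layout.items():
        print(f"  CPU{cpu}: {', '.join(jobs) or 'no heavy duties'}")
```

In the failing layout CPU0 serves both all the memory and the video card while CPU1 serves neither; in the workaround each processor carries exactly one of the two duties.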

Of course, after we detected this unpleasant peculiarity, we carried out an out-of-competition test of the Supermicro motherboard in the same situation. To this company's credit, the H8DCE aced the test: it passed the 3ds max test in the 4 x 1 mode, with a video card in the first slot and four DIMMs on the first CPU, without failures.

Besides, we want to note that both chips of the nForce Professional core logic on the Tyan motherboard get extremely hot. You cannot keep a finger on the chip heatsinks for more than a second without burning it. Supermicro's approach to this problem looks reasonable: it equipped one of the chips with active cooling and covered the second one with a much larger heatsink than that on the Tyan motherboard. Anyway, both motherboards must be installed into PC cases with powerful fans (the Tyan motherboard would be better off with two fans).

Supermicro H8DCE

...Unfortunately, it featured another bug, a totally incurable one: the testbed based on this motherboard failed the Adobe Photoshop / Lighting Effects test. Thus, as you may have already noticed, the overall Photoshop score automatically becomes unavailable. We currently have neither information on the reasons for this failure, nor any guesses; we can only state the fact. This time the failure always occurred at the same place and looked the same. That's how it happened:




Conclusion

Well, as the issue of dual core processors is covered well in previous articles, we can only list the new bits of information obtained in today's tests. There are only three important points:

  1. As in the case of dual core Intel processors, the dual core desktop AMD Athlon 64 X2 processor is generally preferable in terms of performance to a dual processor system based on equally clocked single-core Opterons. This has to do with the generally conservative nature of server and workstation platforms, which prefer more reliable but slower memory, as well as with memory access specifics in Opteron systems, where single-threaded and multi-threaded tasks prefer different module installations (as always, there is no configuration that is ideal for every use).
  2. Building a workstation on dual core Opteron processors is hardly a universally good idea, as even those applications that profit from two processors often cannot use a greater number of CPUs (for example, four). So the dual core Opteron is mostly a server-oriented processor (AMD actually positions it this way). Unfortunately, we have no streamlined method for testing CPU performance in server tasks so far, so we can say nothing about the performance of systems based on dual core Opteron processors in their element.
  3. It's hard to tell whether it's the fault of our motherboard samples or of Opteron memory specifics (or both at once), but the failures during standard tests are somewhat disturbing. However, they just prove that when you buy a server or a workstation, you should make sure it is compatible with your software. This recommendation is actually all-purpose and applies to non-AMD systems as well. Servers and workstations are expensive, and retailers usually accept the let's-test-and-then-buy approach.

P.S. It seems obvious to us, but we shall say it anyway in case somebody has missed it: neither dual core processors nor SMP systems can be recommended for gaming yet, as no game profits from more than one processor in a system (except probably for chess :). And a single-core processor is always cheaper than an equally clocked dual core processor.






Stanislav Garmatiuk (nawhi@ixbt.com)
August 10, 2005.



Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.