iXBT Labs :: Computer Hardware In Detail

CMP vs. SMP on Intel's Platform: Comparing Low-Level Memory Characteristics in RightMark Memory Analyzer

Just two days ago we published an article comparing the performance of the new dual core Intel platform (Pentium Extreme Edition 840 processor, Intel 955X chipset) with traditional dual processor platforms: SMP systems based on equally clocked Intel Xeon processors (Nocona and Irwindale cores) and the Intel E7525 workstation chipset. Some tests gave really interesting results. The system based on the 3.2 GHz dual core Pentium Extreme Edition 840 (a close analog of a dual processor system based on 3.2 GHz Xeon (Nocona) processors) outperformed not only that platform, but also the platform built on Intel Xeon processors with the Irwindale core, which have twice as much L2 cache per core (2 MB in each processor, compared to 2 MB for the entire Pentium Extreme Edition 840, that is, 1 MB per core).

Such a result could be explained by the faster DDR2-533 memory on the desktop dual core platform compared to the Registered ECC DDR2-400 used in server platforms. The reason is clearly not the higher bandwidth of DDR2-533 as such: in dual channel mode its potential cannot be fully used because of the 200 MHz FSB. Registered modules are partly to blame, but the most likely reason is that the memory controller in the new i955X chipset has better characteristics than the one in the older E7525. Well, enough guessing. In this short article we shall compare the main memory characteristics of the two platforms quantitatively, with the help of the recently released RightMark Memory Analyzer 3.55.
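The bandwidth ceilings behind this argument can be worked out with simple arithmetic. The sketch below is only an illustration: it assumes a 64-bit (8-byte) data path for both the front-side bus and each DDR2 channel, a quad-pumped FSB (200 MHz clock, 800 MT/s), and dual channel memory on both platforms. The function name is made up for this example.

```python
# Assumption: the FSB and each DDR2 channel are 64 bits (8 bytes) wide.
BUS_WIDTH_BYTES = 8

def peak_bandwidth_mb_s(mega_transfers_per_s, channels=1):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return mega_transfers_per_s * BUS_WIDTH_BYTES * channels

# 200 MHz FSB clock, four transfers per clock (quad-pumped) = 800 MT/s.
fsb_peak = peak_bandwidth_mb_s(200 * 4)
# Dual channel DDR2-400 (server platform, assumed dual channel).
ddr2_400_peak = peak_bandwidth_mb_s(400, channels=2)
# Dual channel DDR2-533 (desktop platform); 533.33 MT/s nominal.
ddr2_533_peak = peak_bandwidth_mb_s(1600 / 3, channels=2)

print(f"FSB:               {fsb_peak:.0f} MB/s")
print(f"2 x DDR2-400:      {ddr2_400_peak:.0f} MB/s")
print(f"2 x DDR2-533:      {ddr2_533_peak:.0f} MB/s")
print(f"Desktop ceiling:   {min(fsb_peak, ddr2_533_peak):.0f} MB/s (FSB-bound)")
```

Under these assumptions both platforms share the same 6400 MB/s ceiling, since the FSB, not DDR2-533, is the narrower link on the desktop system.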

Testbed configurations

Testbed 1

  • CPU: Intel Pentium Extreme Edition 840 (Smithfield core, 2 x 1 MB L2, 800 MHz FSB, 2 x 3.2 GHz core)
  • Motherboard: ASUS P5WD2-Premium (Intel 955X chipset, BIOS 0205 dated 04/22/2005)
  • Memory: 2x512 MB PC2-5400 Corsair XMS2 PRO DDR2-533, 3-3-3-8
  • Video card: ATI Radeon X800 (256 MB)
  • HDD: Samsung SP1614C (SATA), 7200 rpm, 8 MB Cache
  • AC power adapter: FSP 550-60PLN (500-550W)

Testbed 2

  • Processors: 2 x Intel Xeon 3.2 GHz (Irwindale core, 2 MB L2, 800 MHz FSB)
  • Motherboard: ASUS NCT-D (Intel E7525 chipset, BIOS 1006 dated 02/23/2005)
  • Memory: 2x512 MB PC2-3200 Samsung DDR2-400, ECC, 3-3-3-8
  • Video card: ATI Radeon X800 (256 MB)
  • HDD: Samsung SP1614C (SATA), 7200 rpm, 8 MB Cache
  • AC power adapter: FSP 550-60PLN (500-550W)

Software

Real Memory Bandwidth

Real read and write memory bandwidth was measured along two axes. First, each test was run with hardware prefetch enabled (the normal processor mode) and with hardware prefetch disabled. Second, the real memory read/write bandwidth results were obtained without software prefetch, while the maximum real memory read bandwidth was obtained with software prefetch (PREFETCHNTA instructions with the optimal prefetch distance) and the maximum real memory write bandwidth with the Non-Temporal Store method (instructions such as MOVNTPS/MOVNTDQ).

To avoid confusion in interpreting relative percentages, the tables below give a value in parentheses for the slower platform that shows how many times worse a given parameter is on that platform than on the other one.

| Characteristic | Pentium XE 840 (Smithfield) | Xeon (Irwindale) |
| --- | --- | --- |
| Real Memory Read Bandwidth, MB/s | 5747 | 4345 (1.32) |
| Real Memory Write Bandwidth, MB/s | 2153 | 1878 (1.15) |
| Real Memory Read Bandwidth without Hardware Prefetch, MB/s | 3605 | 2422 (1.49) |
| Real Memory Write Bandwidth without Hardware Prefetch, MB/s | 2229 | 1725 (1.29) |
| Maximum Real Memory Read Bandwidth, MB/s | 6501 | 5641 (1.15) |
| Maximum Real Memory Write Bandwidth, MB/s | 4281 | 4232 (1.01) |
| Maximum Real Memory Read Bandwidth without Hardware Prefetch, MB/s | 6532 | 5614 (1.16) |
| Maximum Real Memory Write Bandwidth without Hardware Prefetch, MB/s | 4281 | 4233 (1.01) |

The absolute results of the Pentium Extreme Edition 840 desktop platform are impressive: its real memory read bandwidth (5747 MB/s) is higher (!) than the maximum real memory read bandwidth obtained on the Xeon (Irwindale) platform, 5641 MB/s. By the way, the latter is only 88% of the theoretical FSB bandwidth and of the theoretical DDR2-400 bandwidth. According to our many reviews of Intel Pentium 4 platforms, tests with software prefetch practically always reach 100% of the theoretical memory bandwidth, regardless of the chipset type and its operating mode (sometimes they even exceed it, due to a higher FSB frequency as well as a relatively large L2 or L3 cache). Thus, we can conclude that the memory bandwidth loss of approximately 15% on dual processor Intel Xeon platforms has to do solely with registered modules and the error correction code (ECC).
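The percentages quoted above can be re-derived from the measured maximum read bandwidths. The sketch below assumes a theoretical ceiling of 6400 MB/s (an 800 MT/s FSB with an assumed 64-bit data path); the variable names are illustrative only.

```python
# Assumption: 800 MT/s x 8 bytes = 6400 MB/s theoretical FSB bandwidth.
FSB_PEAK_MB_S = 6400

# Maximum real memory read bandwidth (software prefetch), MB/s, from the table.
max_read = {"Pentium XE 840": 6501, "Xeon (Irwindale)": 5641}

# Fraction of the theoretical ceiling reached by each platform.
efficiency = {name: bw / FSB_PEAK_MB_S for name, bw in max_read.items()}

# How many times lower the Xeon result is than the Pentium XE 840 result.
gap = max_read["Pentium XE 840"] / max_read["Xeon (Irwindale)"]

for name, eff in efficiency.items():
    print(f"{name}: {eff:.1%} of theoretical")
print(f"Pentium XE 840 / Xeon ratio: {gap:.2f}")
```

The Xeon platform lands at about 88% of the ceiling, while the desktop platform slightly exceeds it, which gives the factor of roughly 1.15 between them.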

As mentioned above, another important factor that influences memory performance is the chipset itself (more precisely, its built-in memory controller). Performance losses in the older E7525 chipset are more prominent in the real memory read bandwidth tests. The excellent hardware prefetch algorithm partially hides the gap between the i955X and the E7525: with it enabled, the memory bandwidth of the Xeon platform is 1.32 times lower than that of the Pentium XE 840 platform. Disabling hardware prefetch exposes the advantage of the latest desktop chipset over the older workstation chipset (E7525): in this case the Xeon platform trails the dual core platform by a factor of almost 1.5.
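To see how hardware prefetch narrows the gap, one can compare the real read bandwidth figures with prefetch on and off. This is a small illustrative calculation over the table values; the dictionary layout and helper name are invented for the example.

```python
# Real memory read bandwidth, MB/s, from the table.
read_bw = {
    "Pentium XE 840": {"prefetch_on": 5747, "prefetch_off": 3605},
    "Xeon (Irwindale)": {"prefetch_on": 4345, "prefetch_off": 2422},
}

def ratio(a, b):
    """Ratio a/b rounded to two decimals, as in the tables."""
    return round(a / b, 2)

pentium = read_bw["Pentium XE 840"]
xeon = read_bw["Xeon (Irwindale)"]

# Gap between platforms with and without hardware prefetch.
gap_on = ratio(pentium["prefetch_on"], xeon["prefetch_on"])
gap_off = ratio(pentium["prefetch_off"], xeon["prefetch_off"])

# How much each platform gains from hardware prefetch.
prefetch_gain = {
    name: ratio(v["prefetch_on"], v["prefetch_off"]) for name, v in read_bw.items()
}

print(f"Gap with prefetch:    {gap_on}")
print(f"Gap without prefetch: {gap_off}")
print(f"Prefetch gain:        {prefetch_gain}")
```

The Xeon platform gains proportionally more from hardware prefetch than the desktop platform does, which is why the gap shrinks when prefetch is enabled.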

Results of the maximum real memory write bandwidth tests are the least interesting. In this case everything is limited to 2/3 of the theoretical memory bandwidth, a level that is always below the maximum real memory bandwidth, even for registered DDR2-400. That's why the differences between the platforms in this parameter are negligibly small.
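The 2/3 limit is easy to check against the measured values. The sketch below assumes the same 6400 MB/s theoretical figure as before (800 MT/s FSB, assumed 64-bit data path) and simply compares it with the table.

```python
# Assumption: 6400 MB/s theoretical bandwidth (800 MT/s x 8 bytes).
FSB_PEAK_MB_S = 6400

# Write ceiling stated in the text: 2/3 of the theoretical bandwidth.
write_ceiling = FSB_PEAK_MB_S * 2 / 3

# Maximum real memory write bandwidth, MB/s, from the table.
max_write = {"Pentium XE 840": 4281, "Xeon (Irwindale)": 4232}

# Measured value relative to the 2/3 ceiling.
relative = {name: bw / write_ceiling for name, bw in max_write.items()}

print(f"2/3 ceiling: {write_ceiling:.0f} MB/s")
for name, rel in relative.items():
    print(f"{name}: {rel:.1%} of the ceiling")
```

Both platforms sit within about 1% of the same ceiling, so the write test does not separate them.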

Memory Latency

Memory latency for pseudo-random access (random within one page, but sequential at the level of full pages) and for random access was also measured in two modes, with hardware prefetch enabled and disabled. Remember that the first mode gives the "real" memory latency, while the second gives a sort of ideal latency that depends only on the memory modules and the chipset, not on the CPU.
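The difference between the two access patterns can be sketched as the order in which cache lines are visited. This is not RightMark Memory Analyzer's actual code, only an illustration of the definition given above; the page size (4096 bytes), line stride (64 bytes) and function names are assumptions.

```python
import random

# Assumptions for illustration: 4 KB pages walked in 64-byte steps.
PAGE_BYTES = 4096
LINE_BYTES = 64
LINES_PER_PAGE = PAGE_BYTES // LINE_BYTES

def pseudo_random_order(n_pages, rng):
    """Pages are visited sequentially; lines inside each page are shuffled."""
    order = []
    for page in range(n_pages):
        offsets = list(range(LINES_PER_PAGE))
        rng.shuffle(offsets)
        order.extend(page * LINES_PER_PAGE + off for off in offsets)
    return order

def random_order(n_pages, rng):
    """Every line in the whole block is visited in a fully shuffled order."""
    order = list(range(n_pages * LINES_PER_PAGE))
    rng.shuffle(order)
    return order

rng = random.Random(0)  # fixed seed so the example is repeatable
pseudo = pseudo_random_order(4, rng)
full = random_order(4, rng)

# Page index of each access: monotonic for the pseudo-random walk.
pseudo_pages = [line // LINES_PER_PAGE for line in pseudo]
print("pseudo-random page sequence is sorted:", pseudo_pages == sorted(pseudo_pages))
```

In a latency test each visited line would hold the address of the next one, so that every load depends on the previous load completing.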

| Characteristic | Pentium XE 840 (Smithfield) | Xeon (Irwindale) |
| --- | --- | --- |
| Pseudo Random Access Latency (min - max), ns | 47.4 - 55.3 | 77.7 - 86.1 (1.56 - 1.64) |
| Pseudo Random Access Latency (min - max) without Hardware Prefetch, ns | 72.8 - 95.2 | 125.8 - 149.5 (1.57 - 1.73) |
| Random Access Latency (min - max), ns | 93.7 - 114.9 | 137.4 - 159.5 (1.39 - 1.46) |
| Random Access Latency (min - max) without Hardware Prefetch, ns | 94.7 - 118.0 | 138.7 - 163.3 (1.38 - 1.46) |

While the memory bandwidth disadvantage of the Xeon (Irwindale) platform reaches a factor of 1.5 at most, the situation with memory latency is even worse. Interestingly, it hardly depends on whether hardware prefetch is enabled or disabled (prefetch naturally affects the absolute values, but the relative standing of the platforms stays the same). On average, the Xeon platform's random access latency is 1.4 times higher than that of the Pentium XE 840 desktop platform. For the pseudo-random walk, the gap grows to 1.55 to 1.7 times.
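Since both platforms run at 3.2 GHz, the latency figures translate directly into processor cycles spent waiting on memory. A minimal conversion, using the pseudo-random access latencies with hardware prefetch enabled (names are illustrative):

```python
CPU_GHZ = 3.2  # both testbeds use 3.2 GHz cores, so 1 ns = 3.2 cycles

# Pseudo-random access latency (min, max) in ns, hardware prefetch enabled.
pseudo_random_ns = {
    "Pentium XE 840": (47.4, 55.3),
    "Xeon (Irwindale)": (77.7, 86.1),
}

def to_cycles(ns):
    """Convert nanoseconds to whole CPU cycles at CPU_GHZ."""
    return round(ns * CPU_GHZ)

cycles = {
    name: tuple(to_cycles(ns) for ns in span)
    for name, span in pseudo_random_ns.items()
}

for name, (lo, hi) in cycles.items():
    print(f"{name}: {lo} to {hi} cycles per access")
```

At the lower bound the Xeon platform costs roughly a hundred more cycles per memory access than the desktop platform.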

Conclusion

Thus, the reason for the lower performance of server Intel Xeon dual processor platforms (with Irwindale as the example) compared to the desktop dual core Intel Pentium Extreme Edition is now established. The weak spot of Intel's server platforms is their memory system. Firstly, it requires registered DDR2-400 modules with ECC. Secondly, it is based on the older E7525 chipset, whose memory controller is noticeably inferior to that of the new desktop i955X chipset.

Memory bandwidth losses due to registered memory modules amount to a factor of 1.15 (relative to the maximum theoretical value, which can actually be reached on the Pentium XE 840/i955X). The memory controller in the E7525 chipset has a noticeably stronger influence: the average memory performance drop due to the chipset amounts to a factor of 1.3 (regardless of whether the modules are registered or not), and in some cases it even reaches 1.5.

In conclusion I want to note that despite the significant differences in the low-level memory characteristics of these platforms, performance differences in real tests are much smaller. This can be explained by the fact that real applications and tests are far from 100% sensitive to memory bandwidth and latency.

Dmitri Besedin (dmitri_b@ixbt.com)

June 22, 2005.


Copyright © Byrds Research & Publishing, Ltd., 1997—2010. All rights reserved.