iXBT Labs - Computer Hardware in Detail

CMP vs. SMP on Intel's Platform: Comparing Low-Level Memory Characteristics in RightMark Memory Analyzer




Just two days ago we published an article comparing the performance of the new dual core Intel platform (Pentium Extreme Edition 840 processor, Intel 955X chipset) with traditional dual processor platforms: SMP systems based on equally clocked Intel Xeon processors (Nocona and Irwindale cores) and the Intel E7525 workstation chipset. Some tests produced really interesting results. The system based on the 3.2 GHz dual core Pentium Extreme Edition 840 (a close analog of a dual processor system based on the 3.2 GHz Xeon (Nocona)) outperformed not only that platform, but also the platform built on Intel Xeon processors with the Irwindale core, which have twice as much L2 cache per core (2 MB in each processor, compared to 2 MB for the entire Pentium Extreme Edition 840, that is, 1 MB per core).

Such a result could be explained by the faster DDR2-533 memory on the desktop dual core platform compared to the Registered ECC DDR2-400 used in server platforms. Clearly the reason is not the higher bandwidth of DDR2-533 as such: in dual channel mode its potential cannot be fully used because of the 200 MHz FSB. Registered modules are partly to blame, but the most likely reason is that the memory controller in the new i955X chipset is better than the one in the older E7525. Enough guessing: in this short article we compare the main memory characteristics of the two platforms quantitatively, with the help of the recently released RightMark Memory Analyzer 3.55.
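The FSB ceiling mentioned above follows from simple arithmetic. The sketch below is only an illustration: it assumes an 8-byte (64-bit) data path for both the quad-pumped FSB and each DDR2 channel, and the helper name `peak_bandwidth_mb_s` is made up for this example.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s: float,
                        bytes_per_transfer: int = 8,
                        channels: int = 1) -> float:
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes).

    Assumption: every transfer moves 8 bytes (64-bit bus).
    """
    return mega_transfers_per_s * bytes_per_transfer * channels


# 200 MHz FSB clock, quad-pumped: 800 million transfers per second.
fsb = peak_bandwidth_mb_s(200 * 4)
# Dual-channel DDR2-533 (nominally 533.33 MT/s) and dual-channel DDR2-400.
ddr2_533 = peak_bandwidth_mb_s(1600 / 3, channels=2)
ddr2_400 = peak_bandwidth_mb_s(400, channels=2)

print(f"FSB:                   {fsb:.0f} MB/s")
print(f"Dual-channel DDR2-533: {ddr2_533:.0f} MB/s")
print(f"Dual-channel DDR2-400: {ddr2_400:.0f} MB/s")
# The CPU can never see more than the narrower of the two links.
print(f"Usable with DDR2-533:  {min(fsb, ddr2_533):.0f} MB/s")
```

Under these assumptions the bus and dual-channel DDR2-400 both peak at 6400 MB/s, so the extra headroom of DDR2-533 cannot show up as raw bandwidth.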

Testbed configurations

Testbed 1

  • CPU: Intel Pentium Extreme Edition 840 (Smithfield core, 2 x 1 MB L2, 800 MHz FSB, 2 x 3.2 GHz core)
  • Motherboard: ASUS P5WD2-Premium (Intel 955X chipset, BIOS 0205 dated 04/22/2005)
  • Memory: 2x512 MB PC2-5400 Corsair XMS2 PRO DDR2-533, 3-3-3-8
  • Video card: ATI Radeon X800 (256 MB)
  • HDD: Samsung SP1614C (SATA), 7200 rpm, 8 MB Cache
  • AC power adapter: FSP 550-60PLN (500-550W)

Testbed 2

  • Processors: 2 x Intel Xeon 3.2 GHz (Irwindale core, 2 MB L2, 800 MHz FSB)
  • Motherboard: ASUS NCT-D (Intel E7525 chipset, BIOS 1006 dated 02/23/2005)
  • Memory: 2x512 MB PC2-3200 Samsung DDR2-400, ECC, 3-3-3-8
  • Video card: ATI Radeon X800 (256 MB)
  • HDD: Samsung SP1614C (SATA), 7200 rpm, 8 MB Cache
  • AC power adapter: FSP 550-60PLN (500-550W)

Software

Real Memory Bandwidth

Real read and write memory bandwidth was tested in two modes: with hardware prefetch enabled, which is the normal processor mode, and with hardware prefetch disabled. In each mode, the real memory read/write bandwidth results were obtained without software prefetch. The maximum real memory read bandwidth was obtained with software prefetch (PREFETCHNTA instructions with the optimal prefetch distance), and the maximum real memory write bandwidth was obtained with the Non-Temporal Store method (instructions such as MOVNTPS/MOVNTDQ).

To avoid confusion in interpreting relative percentages, the tables below give a value in parentheses for the lower performing platform, showing how many times worse it is on a given parameter than the other platform.
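As an illustration of how the parenthetic values can be reproduced, here is a minimal sketch. The function name `times_worse` is our own; the inputs are taken from the tables in this article.

```python
def times_worse(value_a: float, value_b: float) -> float:
    """Return how many times the weaker result trails the stronger one.

    The ratio is always >= 1, so the same helper works for bandwidth
    (higher is better) and latency (lower is better).
    """
    return max(value_a, value_b) / min(value_a, value_b)


# Real memory read bandwidth, MB/s: Pentium XE 840 vs. Xeon (Irwindale)
print(round(times_worse(5747, 4345), 2))  # 1.32
# Real memory write bandwidth, MB/s
print(round(times_worse(2153, 1878), 2))  # 1.15
# Minimum pseudo-random access latency, ns
print(round(times_worse(47.4, 77.7), 2))  # 1.64
```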

Characteristic | Pentium XE 840 (Smithfield) | Xeon (Irwindale)
Real Memory Read Bandwidth, MB/s | 5747 | 4345 (1.32)
Real Memory Write Bandwidth, MB/s | 2153 | 1878 (1.15)
Real Memory Read Bandwidth without Hardware Prefetch, MB/s | 3605 | 2422 (1.49)
Real Memory Write Bandwidth without Hardware Prefetch, MB/s | 2229 | 1725 (1.29)
Maximum Real Memory Read Bandwidth, MB/s | 6501 | 5641 (1.15)
Maximum Real Memory Write Bandwidth, MB/s | 4281 | 4232 (1.01)
Maximum Real Memory Read Bandwidth without Hardware Prefetch, MB/s | 6532 | 5614 (1.16)
Maximum Real Memory Write Bandwidth without Hardware Prefetch, MB/s | 4281 | 4233 (1.01)

The absolute results of the Pentium Extreme Edition 840 desktop platform are impressive: its real memory read bandwidth (5747 MB/s) is higher (!) than the maximum real memory read bandwidth obtained on the Xeon (Irwindale) platform, 5641 MB/s. The latter, by the way, is only 88% of the theoretical FSB bandwidth and of the theoretical DDR2-400 bandwidth. In our many reviews of Intel Pentium 4 platforms, tests with software prefetch have practically always reached 100% of the theoretical memory bandwidth, regardless of the chipset type and its operating mode (sometimes even more, thanks to a higher FSB frequency or a relatively large L2 or L3 cache). We can therefore conclude that the loss of approximately 15% in memory performance on dual processor Intel Xeon platforms is due solely to registered modules and error correction code (ECC).
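The percentages in this paragraph can be checked with a few lines. This is a sketch only: the article does not state the theoretical figure, so we assume 6400 MB/s (an 800 MHz FSB moving 8 bytes per transfer, which also matches dual-channel DDR2-400), and the name `efficiency` is ours.

```python
# Assumption: 800 MT/s FSB x 8 bytes per transfer = 6400 MB/s.
THEORETICAL_MB_S = 800 * 8


def efficiency(measured_mb_s: float,
               theoretical_mb_s: float = THEORETICAL_MB_S) -> float:
    """Fraction of the theoretical peak actually reached."""
    return measured_mb_s / theoretical_mb_s


# Maximum real memory read bandwidth, MB/s, from the table above.
max_read = {"Pentium XE 840": 6501, "Xeon (Irwindale)": 5641}

for platform, measured in max_read.items():
    print(f"{platform}: {efficiency(measured):.1%} of theoretical")

gap = max_read["Pentium XE 840"] / max_read["Xeon (Irwindale)"]
print(f"Pentium XE 840 over Xeon: {gap:.2f} times")
```

Under this assumption the Xeon platform reaches about 88% of the peak, while the Pentium XE 840 platform slightly exceeds it; the ratio between the two is about 1.15.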

As mentioned above, another important factor in memory performance is the chipset itself, or more exactly its built-in memory controller. The losses in the older E7525 chipset are more prominent in the real memory read bandwidth tests. The excellent hardware prefetch algorithm partially hides the gap between the i955X and the E7525 (here the memory bandwidth of the Xeon platform is 1.32 times lower than that of the Pentium XE 840 platform), whereas disabling hardware prefetch exposes the advantage of the latest desktop chipset over the older workstation chipset (E7525). In this case the Xeon platform falls behind the dual core platform by a factor of almost 1.5.

Results of the maximum real memory write bandwidth tests are the least interesting. Here everything is limited to 2/3 of the theoretical memory bandwidth, which is always lower than the maximum real memory bandwidth achievable even with registered DDR2-400. That is why the differences between the platforms in this parameter are negligible.
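A quick check of the 2/3 limit, again assuming a theoretical peak of 6400 MB/s (a figure not given in the article itself):

```python
THEORETICAL_MB_S = 6400  # assumption: 800 MT/s FSB x 8 bytes
write_ceiling = THEORETICAL_MB_S * 2 / 3

# Maximum real memory write bandwidth, MB/s, from the table above.
max_write = {"Pentium XE 840": 4281, "Xeon (Irwindale)": 4232}

print(f"2/3 of theoretical: {write_ceiling:.0f} MB/s")
for platform, measured in max_write.items():
    print(f"{platform}: {measured} MB/s, "
          f"{measured / write_ceiling:.1%} of that ceiling")
```

Both platforms land within about one percent of the same ceiling, which is consistent with the write path rather than the memory modules being the limiting factor here.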

Memory Latency

Memory latency for pseudo-random access (random within one page, but sequential at the level of whole pages) and random access was also measured in two modes, with hardware prefetch enabled and disabled. Recall that the first mode gives the "real" memory latency, while the second gives a kind of ideal latency that depends only on the memory modules and the chipset, not on the CPU.
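The difference between the two access modes can be sketched as follows. This is not RightMark Memory Analyzer's actual code, only an illustration of the two visiting orders; the function names and the small page geometry in the demo are assumptions (a real test would use something like 4 KB pages and 64-byte lines).

```python
import random


def pseudo_random_order(num_pages: int, lines_per_page: int, seed: int = 0) -> list:
    """Pages are visited sequentially; lines inside each page are shuffled."""
    rng = random.Random(seed)
    order = []
    for page in range(num_pages):
        lines = list(range(lines_per_page))
        rng.shuffle(lines)
        order.extend(page * lines_per_page + line for line in lines)
    return order


def random_order(num_pages: int, lines_per_page: int, seed: int = 0) -> list:
    """Every line in the whole block is shuffled, ignoring page boundaries."""
    rng = random.Random(seed)
    order = list(range(num_pages * lines_per_page))
    rng.shuffle(order)
    return order


# Tiny demo geometry: 3 pages of 4 lines each (hypothetical values).
pseudo = pseudo_random_order(3, 4)
full = random_order(3, 4)
print("pseudo-random, page of each access:", [i // 4 for i in pseudo])
print("random, page of each access:       ", [i // 4 for i in full])
```

In the pseudo-random order the page number never decreases, so page-level effects such as TLB misses are amortized over a whole page; in the fully random order nearly every access can land on a different page.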

Characteristic | Pentium XE 840 (Smithfield) | Xeon (Irwindale)
Pseudo Random Access Latency (min - max), ns | 47.4 - 55.3 | 77.7 - 86.1 (1.56 - 1.64)
Pseudo Random Access Latency (min - max) without Hardware Prefetch, ns | 72.8 - 95.2 | 125.8 - 149.5 (1.57 - 1.73)
Random Access Latency (min - max), ns | 93.7 - 114.9 | 137.4 - 159.5 (1.39 - 1.46)
Random Access Latency (min - max) without Hardware Prefetch, ns | 94.7 - 118.0 | 138.7 - 163.3 (1.38 - 1.46)

While the memory bandwidth disadvantage of the Xeon (Irwindale) platform reaches a factor of 1.5 at most, the situation with memory latency is worse still. Interestingly, the gap hardly depends on whether hardware prefetch is enabled: prefetch naturally affects the absolute values, but the relative standing of the platforms stays the same. On average, the Xeon platform trails the Pentium XE 840 desktop platform by a factor of 1.4 in random access latency. For the pseudo-random walk, the gap grows to 1.55 - 1.7 times.
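To put these latencies in perspective, the sketch below recomputes the platform ratios and converts nanoseconds into core clock cycles at 3.2 GHz, the clock of both testbeds. The function name `summarize` is ours, and ratios computed from the rounded table values may differ from the published ones in the last digit.

```python
CORE_GHZ = 3.2  # both testbeds run 3.2 GHz cores

# (min, max) latency in ns with hardware prefetch enabled, from the table above.
LATENCY_NS = {
    "pseudo-random": {"Pentium XE 840": (47.4, 55.3),
                      "Xeon (Irwindale)": (77.7, 86.1)},
    "random": {"Pentium XE 840": (93.7, 114.9),
               "Xeon (Irwindale)": (137.4, 159.5)},
}


def summarize(mode: str):
    """Return (sorted Xeon/Pentium ratios, latency in core cycles per platform)."""
    pxe = LATENCY_NS[mode]["Pentium XE 840"]
    xeon = LATENCY_NS[mode]["Xeon (Irwindale)"]
    ratios = sorted(x / p for p, x in zip(pxe, xeon))
    # cycles = nanoseconds * GHz
    cycles = {name: tuple(round(ns * CORE_GHZ) for ns in span)
              for name, span in LATENCY_NS[mode].items()}
    return ratios, cycles


for mode in LATENCY_NS:
    ratios, cycles = summarize(mode)
    print(f"{mode}: Xeon slower by {ratios[0]:.2f} to {ratios[1]:.2f} times")
    for name, (lo, hi) in cycles.items():
        print(f"  {name}: about {lo} to {hi} core cycles")
```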

Conclusion

Thus, the reason for the lower performance of dual processor Intel Xeon server platforms (with Irwindale as the example) compared to the desktop dual core Intel Pentium Extreme Edition is now established. The weak spot of Intel's server platforms is the memory system. Firstly, it requires registered DDR2-400 modules with ECC. Secondly, it is based on the older E7525 chipset, whose memory controller is noticeably inferior to the one in the new desktop i955X chipset.

Memory bandwidth losses due to registered memory modules amount to a factor of 1.15 (relative to the maximum theoretical value, which can actually be reached on the Pentium XE 840/i955X). The memory controller in the E7525 chipset has a noticeably stronger influence: the average memory performance drop due to the chipset is a factor of 1.3 (regardless of whether the modules are registered), and in some cases it reaches 1.5.

In conclusion, I want to note that despite the significant differences in the low-level memory characteristics of these platforms, performance differences in real tests are much smaller. This is because real applications and tests are far from 100% sensitive to memory bandwidth and latency.

Dmitri Besedin (dmitri_b@ixbt.com)

June 22, 2005.


Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.