iXBT Labs :: Computer Hardware In Detail

CMP vs. SMP on Intel's Platform: Comparing Low-Level Memory Characteristics in RightMark Memory Analyzer


 


Just two days ago we published an article comparing the performance of the new dual-core Intel platform (Pentium Extreme Edition 840 processor, Intel 955X chipset) with traditional dual-processor platforms: SMP systems based on equally clocked Intel Xeon processors (Nocona and Irwindale cores) and the Intel E7525 workstation chipset. Some tests produced really interesting results. The system based on the 3.2 GHz dual-core Pentium Extreme Edition 840, which is a close analog of a dual-processor system with 3.2 GHz Xeon (Nocona) processors, outperformed not only that platform but also the platform built on Intel Xeon processors with the Irwindale core, which have twice as much L2 cache (2 MB per processor, compared to 2 MB for the entire Pentium Extreme Edition 840, that is, 1 MB per core).

Such a result could be explained by the faster DDR2-533 memory on the desktop dual-core platform compared to the Registered ECC DDR2-400 used in server platforms. The reason is clearly not the higher bandwidth of DDR2-533 as such: in dual-channel mode its potential cannot be fully used because of the 200 MHz (800 MHz quad-pumped) FSB. Registered modules are partly to blame, but the most likely reason is that the memory controller in the new i955X chipset is better than the one in the older E7525. Enough guessing: in this short article we compare the main memory characteristics of the two platforms quantitatively, with the help of the recently released RightMark Memory Analyzer 3.55.
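The FSB argument above can be checked with simple arithmetic. The sketch below is only an illustration: it assumes the usual 64-bit (8-byte) data width for both the FSB and each DDR2 channel, and the function name is ours, not part of any tool.

```python
# Peak theoretical bandwidth in MB/s:
# millions of transfers per second x bytes per transfer x number of channels.
def peak_bandwidth_mb_s(mega_transfers_per_s, bus_width_bytes=8, channels=1):
    return mega_transfers_per_s * bus_width_bytes * channels

# 200 MHz FSB clock, quad-pumped, on an assumed 64-bit (8-byte) bus.
fsb = peak_bandwidth_mb_s(200 * 4)

# Dual-channel memory; DDR2-533 nominally runs at 533.33 MT/s (1600 / 3).
ddr2_533_dual = peak_bandwidth_mb_s(1600 / 3, channels=2)
ddr2_400_dual = peak_bandwidth_mb_s(400, channels=2)

# The CPU can never see more than the slower of the two links.
usable_i955x = min(fsb, ddr2_533_dual)
usable_e7525 = min(fsb, ddr2_400_dual)

print(f"FSB:                     {fsb:.0f} MB/s")
print(f"Dual-channel DDR2-533:   {ddr2_533_dual:.0f} MB/s")
print(f"Dual-channel DDR2-400:   {ddr2_400_dual:.0f} MB/s")
print(f"Usable peak, i955X:      {usable_i955x:.0f} MB/s")
print(f"Usable peak, E7525:      {usable_e7525:.0f} MB/s")
```

Under these assumptions both platforms have the same usable peak, so the extra raw bandwidth of DDR2-533 alone cannot explain the difference.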

Testbed configurations

Testbed 1

  • CPU: Intel Pentium Extreme Edition 840 (Smithfield core, 2 x 1 MB L2, 800 MHz FSB, 2 x 3.2 GHz core)
  • Motherboard: ASUS P5WD2-Premium (Intel 955X chipset, BIOS 0205 dated 04/22/2005)
  • Memory: 2x512 MB PC2-5400 Corsair XMS2 PRO DDR2-533, 3-3-3-8
  • Video card: ATI Radeon X800 (256 MB)
  • HDD: Samsung SP1614C (SATA), 7200 rpm, 8 MB Cache
  • Power supply: FSP 550-60PLN (500-550 W)

Testbed 2

  • Processors: 2 x Intel Xeon 3.2 GHz (Irwindale core, 2 MB L2, 800 MHz FSB)
  • Motherboard: ASUS NCT-D (Intel E7525 chipset, BIOS 1006 dated 02/23/2005)
  • Memory: 2x512 MB PC2-3200 Samsung DDR2-400, ECC, 3-3-3-8
  • Video card: ATI Radeon X800 (256 MB)
  • HDD: Samsung SP1614C (SATA), 7200 rpm, 8 MB Cache
  • Power supply: FSP 550-60PLN (500-550 W)

Software

Real Memory Bandwidth

Real read and write memory bandwidth was measured with hardware prefetch enabled (the normal processor mode) and with hardware prefetch disabled. In each of these modes, the real memory read/write bandwidth was obtained without software prefetch, while the maximum real memory read bandwidth was obtained with software prefetch (PREFETCHNTA instructions with the optimal prefetch distance). Finally, the maximum real memory write bandwidth was obtained with the Non-Temporal Store method (instructions such as MOVNTPS/MOVNTDQ).

To avoid confusion in interpreting relative percentages, the tables below give, in parentheses next to the result of the slower platform, the factor by which it is worse than the other platform.

Characteristic | Pentium XE 840 (Smithfield) | Xeon (Irwindale)
Real Memory Read Bandwidth, MB/s | 5747 | 4345 (1.32)
Real Memory Write Bandwidth, MB/s | 2153 | 1878 (1.15)
Real Memory Read Bandwidth without Hardware Prefetch, MB/s | 3605 | 2422 (1.49)
Real Memory Write Bandwidth without Hardware Prefetch, MB/s | 2229 | 1725 (1.29)
Maximum Real Memory Read Bandwidth, MB/s | 6501 | 5641 (1.15)
Maximum Real Memory Write Bandwidth, MB/s | 4281 | 4232 (1.01)
Maximum Real Memory Read Bandwidth without Hardware Prefetch, MB/s | 6532 | 5614 (1.16)
Maximum Real Memory Write Bandwidth without Hardware Prefetch, MB/s | 4281 | 4233 (1.01)
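The values in parentheses are simply the Pentium XE 840 result divided by the Xeon result. A minimal sketch that recomputes them from the table above (the short row labels are ours, chosen for brevity):

```python
# (Pentium XE 840, Xeon Irwindale) bandwidth in MB/s, copied from the table above.
bandwidth = {
    "read": (5747, 4345),
    "write": (2153, 1878),
    "read, no HW prefetch": (3605, 2422),
    "write, no HW prefetch": (2229, 1725),
    "max read": (6501, 5641),
    "max write": (4281, 4232),
    "max read, no HW prefetch": (6532, 5614),
    "max write, no HW prefetch": (4281, 4233),
}

# Factor by which the slower (Xeon) platform trails the faster one.
ratios = {name: round(pentium / xeon, 2)
          for name, (pentium, xeon) in bandwidth.items()}

for name, ratio in ratios.items():
    print(f"{name:28s} {ratio:.2f}")
```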

Absolute results of the Pentium Extreme Edition 840 desktop platform are impressive: its real memory read bandwidth (5747 MB/s) is higher (!) than the maximum real memory read bandwidth obtained on the Xeon (Irwindale) platform, 5641 MB/s. By the way, the latter is only 88% of the theoretical FSB bandwidth, which equals the theoretical bandwidth of dual-channel DDR2-400 (6400 MB/s). According to our many reviews of Intel Pentium 4 platforms, tests with software prefetch almost always reach 100% of the theoretical memory bandwidth, regardless of the chipset type and its operating mode (sometimes even more, due to a higher FSB frequency or a relatively large L2 or L3 cache). Thus we can conclude that the memory performance losses of approximately 15% on dual-processor Intel Xeon platforms are due solely to registered modules and the error correction code (ECC).
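The percentages quoted here follow directly from the measured figures. A small sketch, assuming a 64-bit FSB at 800 MT/s as the theoretical ceiling for both platforms (the labels are ours):

```python
# Theoretical peak: 800 MT/s x 8 bytes (assumed 64-bit FSB), in MB/s.
PEAK_MB_S = 800 * 8

# Measured read bandwidth in MB/s, taken from the table above.
measured = {
    "Pentium XE 840, read": 5747,
    "Pentium XE 840, max read": 6501,
    "Xeon (Irwindale), read": 4345,
    "Xeon (Irwindale), max read": 5641,
}

# Fraction of the theoretical peak that each test actually reaches.
efficiency = {name: value / PEAK_MB_S for name, value in measured.items()}

for name, eff in efficiency.items():
    print(f"{name:28s} {eff * 100:5.1f}% of peak")
```

The Xeon maximum read result comes out at about 88% of the peak, while the Pentium XE 840 result with software prefetch slightly exceeds 100%, which matches the behaviour described for Pentium 4 platforms.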

As mentioned above, another important factor that affects memory performance is the chipset itself, or more precisely its built-in memory controller. The losses of the older E7525 chipset are most prominent in the real memory read bandwidth tests. With hardware prefetch enabled, the excellent prefetch algorithm partially hides the gap between the i955X and the E7525 (the memory bandwidth of the Xeon platform is 1.32 times lower than that of the Pentium XE 840 platform). With hardware prefetch disabled, the advantage of the latest desktop chipset over the older workstation chipset becomes evident: the Xeon platform is almost 1.5 times slower than the dual-core platform.

Results of the maximum real memory write bandwidth tests are the least interesting. Here everything is limited to 2/3 of the theoretical memory bandwidth, which is below the maximum real memory bandwidth even of registered DDR2-400. That is why the differences between the platforms in this parameter are negligible.

Memory Latency

Memory latency for pseudo-random access (random within one page, but sequential at the level of whole pages) and random access was also measured in two modes, with hardware prefetch enabled and disabled. Remember that the first mode gives the "real" memory latency, while the second gives a sort of ideal latency that depends only on the memory modules and the chipset, not on the CPU.
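The difference between the two access modes can be illustrated by generating the order in which memory offsets are visited. This is only a sketch of the idea, not the benchmark's actual code: the 4 KB page size, the 64-byte line size, and all function names are our assumptions.

```python
import random

PAGE_SIZE = 4096   # assumed page size, bytes
LINE_SIZE = 64     # assumed cache line size, bytes
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE

def pseudo_random_order(num_pages, rng):
    """Pages are visited sequentially; lines inside each page in random order."""
    order = []
    for page in range(num_pages):
        lines = list(range(LINES_PER_PAGE))
        rng.shuffle(lines)
        order.extend(page * PAGE_SIZE + line * LINE_SIZE for line in lines)
    return order

def random_order(num_pages, rng):
    """All lines of the whole block are visited in one random order."""
    order = [i * LINE_SIZE for i in range(num_pages * LINES_PER_PAGE)]
    rng.shuffle(order)
    return order

def page_switches(order):
    """Count consecutive accesses that land in different pages."""
    return sum(1 for a, b in zip(order, order[1:])
               if a // PAGE_SIZE != b // PAGE_SIZE)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
pseudo = pseudo_random_order(16, rng)
fully_random = random_order(16, rng)

print("page switches, pseudo-random:", page_switches(pseudo))
print("page switches, random:       ", page_switches(fully_random))
```

Both orders touch exactly the same set of lines, but the pseudo-random walk changes pages only once per page, while the random walk changes pages on almost every access. That is why random access latency is noticeably higher in the results.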

Characteristic | Pentium XE 840 (Smithfield) | Xeon (Irwindale)
Pseudo Random Access Latency (min-max), ns | 47.4-55.3 | 77.7-86.1 (1.56-1.64)
Pseudo Random Access Latency (min-max) without Hardware Prefetch, ns | 72.8-95.2 | 125.8-149.5 (1.57-1.73)
Random Access Latency (min-max), ns | 93.7-114.9 | 137.4-159.5 (1.39-1.46)
Random Access Latency (min-max) without Hardware Prefetch, ns | 94.7-118.0 | 138.7-163.3 (1.38-1.46)

While the memory bandwidth disadvantage of the Xeon (Irwindale) platform reaches a factor of 1.5 at most, the situation with memory latency is even worse. Interestingly, it hardly depends on whether hardware prefetch is enabled (prefetch naturally affects the absolute values, but the relative standing of the platforms stays the same with it disabled). On average, the Xeon platform has about 1.4 times higher random access latency than the Pentium XE 840 desktop platform. For pseudo-random access the gap grows to 1.55-1.7 times.
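The averages quoted above can be reproduced from the latency table. A minimal sketch, in which the dictionary keys and the helper name are ours:

```python
# (min, max) latency in ns for (Pentium XE 840, Xeon Irwindale), from the table above.
latency = {
    "pseudo-random": ((47.4, 55.3), (77.7, 86.1)),
    "pseudo-random, no HW prefetch": ((72.8, 95.2), (125.8, 149.5)),
    "random": ((93.7, 114.9), (137.4, 159.5)),
    "random, no HW prefetch": ((94.7, 118.0), (138.7, 163.3)),
}

def ratio_range(pentium, xeon):
    """Smallest and largest Xeon/Pentium ratio over the min and max values."""
    ratios = [x / p for p, x in zip(pentium, xeon)]
    return min(ratios), max(ratios)

ranges = {name: ratio_range(p, x) for name, (p, x) in latency.items()}

# Mean gap for random access, averaged over both prefetch modes.
random_ratios = [r for name, pair in ranges.items()
                 if name.startswith("random") for r in pair]
random_avg = sum(random_ratios) / len(random_ratios)

# Overall span of the gap for pseudo-random access.
pseudo_low = min(pair[0] for name, pair in ranges.items() if name.startswith("pseudo"))
pseudo_high = max(pair[1] for name, pair in ranges.items() if name.startswith("pseudo"))

print(f"random access, average gap: {random_avg:.1f}x")
print(f"pseudo-random access, gap:  {pseudo_low:.2f}x to {pseudo_high:.2f}x")
```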

Conclusion

Thus, the reason for the lower performance of server Intel Xeon dual-processor platforms (Irwindale in our example) compared to the desktop dual-core Intel Pentium Extreme Edition is now established. The weak spot of Intel's server platforms is their memory system. Firstly, it requires registered DDR2-400 modules with ECC. Secondly, it is based on the older E7525 chipset, whose memory controller is noticeably inferior to the one in the new desktop i955X chipset.

Memory bandwidth losses due to registered memory modules amount to a factor of 1.15 (relative to the maximum theoretical value, which can actually be reached on Pentium XE 840/i955X). The memory controller in the E7525 chipset has a noticeably stronger effect: the average memory performance drop due to the chipset is about 1.3 times (regardless of whether the modules are registered), and in some cases it reaches 1.5 times.

In conclusion, I want to note that despite the significant differences in the low-level memory characteristics of these platforms, performance differences in real tests are much smaller. Real applications and tests are far from 100% sensitive to memory bandwidth and latency.

Dmitri Besedin (dmitri_b@ixbt.com)

June 22, 2005.





Copyright © Byrds Research & Publishing, Ltd., 1997—2009. All rights reserved.