Although Core i7 processors with an integrated memory controller have already been announced and are available in stores, their market share will remain insignificant (according to Intel's forecast). There is still some time left before the i5 is launched, so system integrators will keep using processors based on the previous microarchitecture, and finding an optimal configuration remains important for Core 2-based systems. In this article we'll analyze several memory configurations to understand how fast memory should be, and how much of it is required, to reveal the full potential of the fastest processors without overpaying.

Overpaying is a real concern here. Only 'common' manufacturers (such as Samsung and Hynix) offer JEDEC-compliant modules, whose specifications may list only the maximum supported frequencies. Makers of 'elite' memory modules (Corsair, OCZ, GeIL, and others) easily exceed the standard limits on frequency and voltage (usually both at once), so their higher prices are reasonable. Moreover, many platforms for Intel processors use DDR3 memory. Besides being more expensive than DDR2, this memory type tempts users to buy 'elite' modules that run beyond the standard limits. Incidentally, such memory will most likely have poor upgrade prospects, because the official recommendation is not to raise DDR3 voltage above 1.65V with Nehalem processors.

For our tests we'll take motherboards with two top chipsets: Intel X48 and NVIDIA nForce 790i Ultra SLI. Both support top Core 2 configurations: full PCI Express 2.0 support, all DDR3 memory standards (at least for modules with SPD extensions: EPP 2.0 or XMP), and a 400 (1600) MHz FSB. How relevant is the last feature for common users, considering that only one processor with a 1600MHz FSB has been launched so far? It's not relevant, of course. But tests of this mode will help us get a better idea of what is going on.
Besides, this mode can be regarded as a special case of overclocking, so we can estimate which memory should be used to overclock a processor.

As mentioned above, both chipsets are designed for DDR3 memory. Fortunately, there are already many motherboards based on the Intel chipset that accept DDR2 memory or support both types, like our MSI model.

So what configurations shall we test? We should digress a little here and explain that the speed of memory operations is limited not only by memory frequency and timings but also by the characteristics of the processor bus, because its bandwidth caps the maximum data transfer rate to and from memory. Indeed, ever since dual-channel DDR access appeared, memory bandwidth has been no lower than that of the system bus, and since DDR2 it has been significantly higher (for example, the 1066MHz FSB provides ~8533MB/s, which matches the bandwidth of dual-channel DDR2-533).

But will it suffice to pair a processor with a 1066MHz FSB with two DDR2-533 memory modules? We cannot answer this question for certain without taking another parameter into account: memory timings. It's common knowledge that the higher the memory clock rate, the higher the relative latencies (in cycles) have to be, simply because each cycle is shorter. In practice, however, timings may be preserved at higher frequencies (because the absolute latency still fits into the specified number of cycles). Conversely, depending on the chip organization and other parameters, the relative latency cannot always be reduced when the frequency is lowered, because it has already reached its limit. Thus, a computer with a 1066MHz FSB and two DDR2-533 memory modules (CL=4) should theoretically be a tad slower than the same computer with two DDR2-667 memory modules operating at the same latency (CL=4).
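The bandwidth and latency arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 64-bit (8-byte) FSB and memory channel, a quad-pumped FSB, peak theoretical figures with MB meaning 10^6 bytes, and function names invented here for the example (they do not come from any tool used in the tests).

```python
# Illustrative arithmetic only: peak theoretical bandwidth and absolute
# CAS latency. All names below are hypothetical helpers for this sketch.

BUS_WIDTH_BYTES = 8  # assumption: 64-bit FSB and 64-bit memory channel


def fsb_bandwidth_mb_s(base_clock_mhz: float) -> float:
    """Peak FSB bandwidth in MB/s; the bus moves data 4 times per clock."""
    return base_clock_mhz * 4 * BUS_WIDTH_BYTES


def ddr_bandwidth_mb_s(memory_clock_mhz: float, channels: int = 2) -> float:
    """Peak DDR bandwidth in MB/s; data moves twice per clock per channel."""
    return memory_clock_mhz * 2 * BUS_WIDTH_BYTES * channels


def cas_latency_ns(memory_clock_mhz: float, cl_cycles: int) -> float:
    """Absolute CAS latency in nanoseconds for a given CL in clock cycles."""
    return cl_cycles * 1000.0 / memory_clock_mhz


# Real clocks behind the marketing names: 266.67 MHz and 333.33 MHz.
CLOCK_266 = 800 / 3   # 1066MHz FSB base clock, also the DDR2-533 memory clock
CLOCK_333 = 1000 / 3  # DDR2-667 memory clock

if __name__ == "__main__":
    print(f"1066MHz FSB:           {fsb_bandwidth_mb_s(CLOCK_266):.0f} MB/s")
    print(f"Dual-channel DDR2-533: {ddr_bandwidth_mb_s(CLOCK_266):.0f} MB/s")
    print(f"DDR2-533 CL=4:         {cas_latency_ns(CLOCK_266, 4):.1f} ns")
    print(f"DDR2-667 CL=4:         {cas_latency_ns(CLOCK_333, 4):.1f} ns")
```

Under these assumptions, the 1066MHz FSB and dual-channel DDR2-533 both come out at about 8533 MB/s, while CL=4 corresponds to 15 ns at DDR2-533 and 12 ns at DDR2-667. That shorter absolute latency is why the faster modules should be a tad quicker at the same CL.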
In our analysis we tried to test different combinations of FSB frequencies and memory timings/frequencies, supplementing or verifying results obtained with two chipsets.

Testbeds
Test results: 1066MHz FSB

We'll start with a processor with a 1066MHz FSB. As mentioned above, dual-channel DDR2-533 memory provides sufficient bandwidth at this frequency. However, we did not include this memory configuration in our tests, because DDR2-533 is practically extinct and its price is no longer justified. DDR2-667 and DDR2-800 modules are much more common, though we cannot say for sure that they differ in price. Nevertheless, we are going to test a computer with dual-channel DDR2-667 memory out of curiosity.

We already noted in our previous articles that the NVIDIA chipset outperforms Intel solutions under equal conditions, sometimes especially noticeably in synthetic tests. Besides, DDR3 memory in modern computers is usually a tad slower than DDR2 (at the same frequencies and timings). From now on we will not dwell on these issues unless the difference shows up in the memory configuration aspect we are interested in.

As usual, we'll start with a low-level memory analysis using our RightMark Memory Analyzer. The diagram shows that performance grows in all cases as memory frequency is raised to 1066MHz, even if timings are increased as well, sometimes evidently disproportionately (for example, absolute latencies of DDR3-1066@7-7-7-20-1T memory are worse than those of DDR3-800@5-5-5-16-1T). Simply raising the memory frequency to 1333MHz yields nothing (or the gain is cancelled out by the increased timings). The same applies to memory write speed. Small wonder that the memory read latency test shows the same ratios, although DDR3-1333 did manage to outperform DDR3-1066 a little in random access time in this case.

Now let's see whether the situation differs with multi-threaded memory access: perhaps two cores competing for memory will use the bus bandwidth more efficiently?
For this purpose we use RMMT (RightMark Multi-Threaded Memory Test) from the RMMA suite. (We allocate 32 MB to each thread and adjust the data prefetch distance individually to maximize results.)
Results have changed a little (multi-threaded reading is a tad faster, multi-threaded writing is a tad slower), but the relative standing of our contenders hasn't.

Now we'll verify our data in a couple of real applications and see how large the difference is in practice.

Given the results of the synthetic tests, we did not expect any other outcome. Archiving performance (a group of real tests whose speed depends heavily on memory performance) really does improve as memory frequency is raised to 1066MHz, even if timings grow disproportionately. DDR3-1333, on the other hand, brings no noticeable dividends, although it does not hurt performance either, as long as timings do not grow too high. Performance in games follows the same rules, at least in modes where speed is limited by the CPU and memory rather than by the graphics card.

Let's take a look at the absolute performance gains. In 7-Zip, the fastest configuration on Intel X48 (DDR2-1066@5-5-5-16-2T) speeds up a system with a 1066MHz FSB by 6.5% versus DDR2-667@4-4-4-12-2T. Not bad: the difference is equivalent to a 0.5 step in CPU multiplier. That is, other things being equal, this performance gain is worth as much as a CPU one step faster. In Doom 3 the effect reaches +8.3%.

Here is the main conclusion for this group of tests: contrary to purely theoretical assumptions, faster memory accelerates the system up to DDR2/DDR3-1066. Is it a coincidence that the maximum effective memory frequency matches the FSB frequency? We'll try to find an answer in the next parts of this article.
Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.