iXBT Labs - Computer Hardware In Detail

RightMark Memory Analyzer 3.1: Changes and New Performance Tests

May 1, 2004



The time has come to announce a new version of the universal RightMark Memory Analyzer benchmark. The trend that we noted in RMMA 3.0 continues here and, hopefully, will continue in the future: this version brings more new features than corrections. Of course, some changes to existing tests have been introduced too, but they are mostly aimed at making the tests more convenient to use.

Changes in RMMA tests

The first change concerns the D-Cache Latency test that initially had a large number of settings.




You can see that the Minimal Walk Step Size and Maximal Walk Step Size parameters and, consequently, the test variant Variable Parameter = Walk Step Size have disappeared. We didn't remove anything, though: this function is now implemented as a separate subtest described below.

The second change concerns the I-Cache Latency test.




This test, on the contrary, has gained a new variant, Variable Parameter = Stride Size, which plots the execution latency of a jump instruction as a function of the stride size between two successive jumps. The option may be useful for measuring the "effective" size of the I-cache line.

New RMMA tests

That is all for the changes. Now let us examine the new tests implemented in RMMA 3.1.

Memory Walk test




As mentioned above, this test is in fact a variant of the D-Cache Latency test. It was moved to a separate tab in order to simplify the settings. The test has the following parameters:

Strides Count: the number of strides in the dependent access chain. Because the stride size itself varies over a wide range in this test, it makes more sense to specify this parameter directly rather than the block size, as in all other tests.

NOP Count: the number of NOPs (operations unrelated to cache/memory access) inserted between successive accesses to the cache/memory.

Minimal Stride Size: the minimal stride size, in bytes, in the dependent access chain.

Maximal Stride Size: the maximal stride size, in bytes, in the dependent access chain.

Selected Tests define the reading modes used when testing latency:

Forward Read Latency

Backward Read Latency

Random Read Latency

So, the Memory Walk test reads a fixed number of chain elements separated by an offset that varies between Minimal Stride Size and Maximal Stride Size. This procedure makes it possible to estimate the size of a segment of the data cache level under test: once the stride reaches the segment size, access latency for that level increases significantly. Evidently, to find the segment size of a particular data cache level, you have to select a Strides Count that exceeds the associativity of that level by at least one.
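The role of Strides Count can be illustrated with a small model. The sketch below is not RMMA code. It assumes a hypothetical 32 KB, 8-way set-associative cache with 64-byte lines and plain modulo set indexing, so that one segment (cache size divided by associativity) is 4096 bytes. The function names `lines_per_set` and `thrashes` are made up for this illustration.

```python
from collections import defaultdict

def lines_per_set(stride, strides_count, cache_size, ways, line_size=64):
    """Map each cache set index to the distinct lines a strided chain puts in it."""
    num_sets = cache_size // (ways * line_size)
    per_set = defaultdict(set)
    for i in range(strides_count):
        line = (i * stride) // line_size   # line number of the i-th chain element
        per_set[line % num_sets].add(line) # assumed: simple modulo set indexing
    return per_set

def thrashes(stride, strides_count, cache_size, ways, line_size=64):
    """True if some set receives more distinct lines than it has ways."""
    per_set = lines_per_set(stride, strides_count, cache_size, ways, line_size)
    return any(len(lines) > ways for lines in per_set.values())

if __name__ == "__main__":
    cache_size, ways = 32 * 1024, 8        # hypothetical cache parameters
    for stride in (1024, 2048, 4096, 8192):
        print(stride, thrashes(stride, ways + 1, cache_size, ways))
```

In this model, nine chain elements spaced 4096 bytes apart all fall into one set and no longer fit into its eight ways, while eight elements at the same stride, or nine elements at a 2048-byte stride, still fit. That is the reason Strides Count has to exceed the associativity by at least one.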

I-ROB test




The next microarchitecture test is designed to determine the size of the instruction reorder buffer (I-ROB). This buffer is present in all modern CPUs that execute code out of order.

The test is based on the following principle. To make a CPU reorder instruction execution, it is enough to issue a very slow but simple operation that does not occupy the CPU's execution resources, and then feed the CPU a long chain of other simple instructions that depend neither on each other nor on the result of the first operation. In our case, a dependent data load from memory suits perfectly as the "simple but slow" operation, and NOPs (xchg eax, eax) serve as the "simple independent" instructions. Thus, the instruction sequence executed by this test looks as follows:

// a simple but slow operation: a dependent load from memory
mov eax, [eax]
// a variable number of NOPs
nop
...
nop

Because the first instruction takes at least hundreds of CPU clocks under the right conditions (a large data chain, random reading mode), such a load is enough to reorder the execution of at least two or three hundred NOPs, i.e. to run them concurrently with the memory access (given that modern CPUs execute them at 2-3 operations per clock). This number exceeds the size of any existing I-ROB. I-ROB exhaustion shows up as memory access latency that starts growing from a certain number of NOPs, because the NOPs that did not fit into the buffer have to be executed afterwards, in sequence.
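The reasoning above can be turned into a toy model. The sketch below is only an illustration under stated assumptions, not a description of how RMMA measures anything: it assumes a hypothetical 128-entry reorder buffer, a 200-clock memory access and a NOP throughput of 3 per clock, and it assumes that every NOP that does not fit into the buffer next to the load runs only after the load has completed. The names `modeled_latency` and `find_knee` are invented for this example.

```python
def modeled_latency(nop_count, rob_size=128, mem_latency=200.0, nops_per_clock=3.0):
    """Toy estimate of clocks per iteration (one dependent load plus nop_count NOPs)."""
    hidden = min(nop_count, rob_size - 1)  # NOPs that fit in the buffer beside the load
    overflow = nop_count - hidden          # NOPs that wait until the load completes
    return max(mem_latency, hidden / nops_per_clock) + overflow / nops_per_clock

def find_knee(nop_counts, latencies, tolerance=0.0):
    """Return the first NOP count whose latency exceeds the baseline by more than tolerance."""
    baseline = latencies[0]
    for count, latency in zip(nop_counts, latencies):
        if latency > baseline * (1.0 + tolerance):
            return count
    return None

if __name__ == "__main__":
    counts = list(range(0, 257))
    latencies = [modeled_latency(n) for n in counts]
    # In this model the latency curve stays flat, then starts rising at the buffer size.
    print(find_knee(counts, latencies))
```

With real measurements the curve is noisy, so a non-zero `tolerance` would be needed, and the detected knee then lands slightly past the actual buffer size.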

This test has parameters mostly similar to typical settings of cache/memory latency tests.

Stride Size: the stride size, in bytes, in the dependent access chain.

Block Size: the size of the memory block, in KB, used for building and reading the chain.

Minimal NOP Count: the minimal number of NOPs executed by the CPU.

Maximal NOP Count: the maximal number of NOPs executed by the CPU.

Selected Tests define the reading modes:

Forward Read Latency

Backward Read Latency

Random Read Latency

Pseudo-Random Read Latency

The latter two modes are preferable for this test because memory access latency is usually higher under these conditions than with forward/backward reading, where the Hardware Prefetch algorithm works effectively.
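The reading modes differ only in the order in which the dependent chain links its elements. The following sketch shows one possible way to build such a chain. It models the chain as a list of indices rather than real pointers, covers only the forward, backward and random orders, and is not taken from RMMA. The names `build_chain` and `walk` and the `seed` parameter are assumptions made for this example.

```python
import random

def build_chain(block_size, stride, mode="forward", seed=0):
    """Return a list nxt where nxt[i] is the slot visited right after slot i."""
    n = block_size // stride               # number of chain elements in the block
    order = list(range(n))
    if mode == "backward":
        order.reverse()
    elif mode == "random":
        random.Random(seed).shuffle(order)
    elif mode != "forward":
        raise ValueError("unknown mode: " + mode)
    nxt = [0] * n
    for pos, slot in enumerate(order):
        nxt[slot] = order[(pos + 1) % n]   # the last element links back to the first
    return nxt

def walk(nxt, start=0):
    """Follow the chain for len(nxt) steps; each step depends on the previous one."""
    visited = []
    i = start
    for _ in range(len(nxt)):
        visited.append(i)
        i = nxt[i]                         # analogous to: mov eax, [eax]
    return visited

if __name__ == "__main__":
    print(walk(build_chain(256, 64, "forward")))
    print(walk(build_chain(256, 64, "backward")))
```

Because every element is visited exactly once per pass in any mode, the modes touch the same memory and differ only in how predictable the next address is for a hardware prefetcher.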

Memory performance tests

The following tests implemented in RMMA 3.1 are comparative benchmarks intended for comparing memory performance. It is essential to note that these tests place heavy demands on both real memory bandwidth and the CPU's computing power. Thus, they are better classed as mixed tests that measure the performance of the CPU/RAM combination as a whole.




The first test (Checksum) computes CRC32 and Adler32 checksums using the algorithms implemented by Mark Adler in zlib. It has the following parameters: Min Block Size (KB), Max Block Size (KB), and Selected Tests (CRC32 Checksum and Adler32 Checksum). By default, the test uses large data volumes that exceed the CPU data cache size.
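A rough equivalent of such a measurement can be sketched with Python's standard `zlib` module, which exposes the `crc32` and `adler32` functions. This is only an approximation of the idea, not the RMMA test itself. The helper name `run_checksum`, the block sizes and the use of random input data are assumptions made for this example.

```python
import os
import time
import zlib

def run_checksum(data, func):
    """Return (checksum, throughput in MB/s) for one pass of func over data."""
    start = time.perf_counter()
    value = func(data)
    elapsed = time.perf_counter() - start
    megabytes = len(data) / (1024 * 1024)
    throughput = megabytes / elapsed if elapsed > 0 else float("inf")
    return value, throughput

if __name__ == "__main__":
    # Hypothetical block sizes, chosen to exceed typical CPU data cache sizes.
    for size_kb in (1024, 4096, 16384):
        block = os.urandom(size_kb * 1024)
        for name, func in (("CRC32", zlib.crc32), ("Adler32", zlib.adler32)):
            value, mb_per_s = run_checksum(block, func)
            print(f"{name:8s} {size_kb:6d} KB  {value:08X}  {mb_per_s:10.1f} MB/s")
```

A single timed pass is noisy, so a more careful version would repeat each measurement several times and keep the best result.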




The other test (Substring Search) simply searches for a substring of a given size (the Substring Length parameter, in bytes) in a large text array (whose size is bounded by the Min Block Size and Max Block Size parameters, in KB). In this version of the test, the text array consists of random symbols in the range 0x20 to 0x7F, that is, symbols common in a text containing digits, upper- and lower-case Latin letters, punctuation marks, etc., while the substring is a text fragment built from the program title. For example, a 64-symbol substring looks like this:

    0 1 2 3 4 5 6 7 8 9 A B C D E F
00  R i g h t M a r k   M e m o r y
10    A n a l y z e r   R i g h t M
20  a r k   M e m o r y   A n a l y
30  z e r   R i g h t M a r k   M e

The test supports two search modes specified by Selected Tests: Case-Sensitive (takes the case of the symbols into account) and Case-Insensitive. The latter mode requires converting the case of each symbol of the text array, so it runs much slower than the first one, which needs no such conversion.
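The data layout and the two search modes can be sketched as follows. This is an illustration, not the RMMA implementation: the search here simply relies on Python's built-in `bytes.find`, the upper bound 0x7F is assumed to be exclusive so that only printable ASCII is generated, and the names `make_substring`, `make_text` and `search` are invented for this example.

```python
import random

TITLE = b"RightMark Memory Analyzer "

def make_substring(length):
    """Repeat the program title and cut it to the requested length."""
    repeats = length // len(TITLE) + 1
    return (TITLE * repeats)[:length]

def make_text(size, seed=0):
    """Random symbols from 0x20 up to (assumed: not including) 0x7F."""
    rng = random.Random(seed)
    return bytes(rng.randrange(0x20, 0x7F) for _ in range(size))

def search(text, pattern, case_sensitive=True):
    """Return the offset of the first match, or -1 if there is none."""
    if not case_sensitive:
        # The whole text array has to be case-converted before comparing,
        # which is the extra work the case-insensitive mode pays for.
        text, pattern = text.lower(), pattern.lower()
    return text.find(pattern)

if __name__ == "__main__":
    pattern = make_substring(64)
    text = make_text(64 * 1024)
    print(pattern.decode("ascii"))
    print(search(text, pattern), search(text, pattern, case_sensitive=False))
```

Since the text array is random, the substring is almost certainly absent from it, so each search scans the whole array, and the run time reflects a full pass over the data.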

Dmitri Besedin (dmitri_b@ixbt.com)

29.04.2004



Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.