iXBT Labs - Computer Hardware in Detail






RightMark Memory Analyzer 3.1: Changes and New Performance Tests

May 1, 2004

Time has come to announce a new version of a universal RightMark Memory Analyzer benchmark. The trend that we noted in RMMA 3.0 continues here and hopefully, will also continue in the future. Namely, there are more innovations than just corrections in this version. Of course, some changes have been introduced too, but they are mostly focused on making the tests more convenient in use.

Changes in RMMA tests

The first change concerns the D-Cache Latency test that initially had a large number of settings.

You can see that Mininal Walk Step Size and Maximal Walk Step Size parameters and consequently, the test variant Variable Parameter = Walk Step Size have disappeared. But we didn't remove anything, we just realised this function as a separate subtest described further.

The second change concerns the I-Cache Latency test.

This test, on the contrary, has been enlarged by a new variant: Variable Parameter = Stride Size, that enables to form a dependence of jump instruction execution latency from the stride size between two successive jumps. The option may be useful for measuring the "effective" size of the I-cache line.

New RMMA tests

And that's about all for the changes. Now it's high time we examined new tests realised in RMMA 3.1.

Memory Walk test

As has been mentioned above, this test is, in fact, a variant of the D-Cache Latency test. It was placed into a seaprate tap in order to simplify the settings. The test has the following parameters:

Strides Count

Strides Count — the number of strides in the dependent access chain. Because the stride size itself varies within a wide range in this test, it is much more sensible here to specify this parameter proper instead of the block size as it was in all other tests.

NOP Count

NOP Count — the number of voids (operations not related to the cache/memory access) inserted between each successive accesses to the cache/memory.

Minimal Stride Size

Min Stride Size, bytes, in the dependent access chain.

Maximal Stride Size

Max Stride Size, bytes, in the dependent access chain.

Selected Tests

Selected Tests define reading modes when testing latency:

Forward Read Latency

Backward Read Latency

Random Read Latency

So, the Memory Walk test reads a fixed number of chain elements that are separated by an offset with the value between Minimal Stride Size and Maximal Stride Size. The procedure allows to estimate the size of the segment belonging to the data cache level we're dealing with. And that, in turn, increases significantly latency of the access to this level. Evidently, if you want to know the size of a segment belonging to a particular data cache level you have to select the Strides Count that would exceed at least by one the associativity of the level.

I-ROB test

The next microarchitecture test is designed to identify the size of the instruction reorganisation buffer. This buffer is installed in all modern CPUs that execute the code in an out-of-order way.

The test is based on the following principle. If we want to make a CPU reorganise an instruction execution, we only have to load some very slow but simple operation that wouldn't occupy the CPU's executive resources. And then we give the CPU a long chain of other simple instructions that wouldn't depend on each other or the result of the first operation. In our case, a dependent data loading from memory will suit perfectly for the "simple but slow" operation; and NOPs (xchg eax, eax) could serve as the "simple independent" instructions. Thus, the succession of instructions executed by this test looks as follows:

// a simple but slow operation dealing with the memory access
mov eax, [eax]
// a variable number of voids

Because the first instruction will be executed in at least hundreds of CPU clocks in the right conditions (a large size of the data chain, random reading mode), such load will be enough to reorganise the execution (i.e. — to launch it simultaneously with the memory access) of at least two or three hundred NOPs (considering that modern CPUs execute them at 2-3 operations/clock). This number will exceed the size of any existing I-ROB. And I-ROB exhaustion will manifest itself in an increasing latency of the memory access starting from a certain numbre of NOPs, as it will entail a consecutive execution of other NOPs that haven't found room in the buffer.

This test has parameters mostly similar to typical settings of cache/memory latency tests.

Stride Size

Stride Size, bytes, in the dependent access chain.

Block Size

Block Size, KB — the memory size used for building and reading the chain.

Minimal NOP Count

Min NOP Count — the minimal number of voids executed by the CPU.

Maximal NOP Count

Max NOP Count — the maximal number of voids executed by the CPU.

Selected Tests

Selected Tests define reading modes:

Forward Read Latency

Backward Read Latency

Random Read Latency

Pseudo-Random Read Latency

The latter two modes are preferrable for this test because memory access latency is usually higher in these conditions than in the case of forward/backward reading that enables an effective activity of the Hardware Prefetch algorithm.

Memory performance tests

The following tests realised in RMMA 3.1 are competitive and serve for comparative testing of memory performance. It is essential to note that the tests have strict requirements for both the real memory bandwith and for the CPU's computing power. Thus, they can rather be refered to a mixed type that measures performance of CPU/RAM as a whole.

The first test (Checksum) estimates the CRC32 and Adler32 checksums using algorithms that were realised by Mark Adler in zlib. It has the following parameters: Min Block Size (KB), Max Block Size (KB), Selected TestsCRC32 Checksum and Adler32 Checksum. By default, the test uses large data volumes that exceed the CPU data cache size.

The other test (Substring Search) simply realises a search for the substring of a text of a given size (parameter Substring Length, bytes) in a large-size text array (limited by parameters Min Block Size, KB and Max Block Size, KB). In this test version, the text array is made of random symbols within the range (0x20 — 0x7F). That is, the symbols are common for a text that contains figures, capital and small Latin letters, punctuation marks, etc., while the substring is represented by a text fragment made of the program title. For exapmle, a substring 64 symbols long will look like this:

  0 1 2 3 4 5 6 7 8 9 A B C D E F
00 R i g h t M a r k   M e m o r y
10   A n a l y z e r   R i g h t M
20 a r k   M e m o r y   A n a l y
30 z e r   R i g h t M a r k   M e

The test supports two searching modes specified by Selected Tests: Case-Sensitive (considers the case of the symbols) and Case-Insensitive. The latter mode requires that the case of each symbol of the text array be transformed and thus, this test is executed at a much lower speed than the first one devoid of such transformations.

Dmitri Besedin (dmitri_b@ixbt.com)


Write a comment below. No registration needed!

Article navigation:

blog comments powered by Disqus

  Most Popular Reviews More    RSS  

AMD Phenom II X4 955, Phenom II X4 960T, Phenom II X6 1075T, and Intel Pentium G2120, Core i3-3220, Core i5-3330 Processors

Comparing old, cheap solutions from AMD with new, budget offerings from Intel.
February 1, 2013 · Processor Roundups

Inno3D GeForce GTX 670 iChill, Inno3D GeForce GTX 660 Ti Graphics Cards

A couple of mid-range adapters with original cooling systems.
January 30, 2013 · Video cards: NVIDIA GPUs

Creative Sound Blaster X-Fi Surround 5.1

An external X-Fi solution in tests.
September 9, 2008 · Sound Cards

AMD FX-8350 Processor

The first worthwhile Piledriver CPU.
September 11, 2012 · Processors: AMD

Consumed Power, Energy Consumption: Ivy Bridge vs. Sandy Bridge

Trying out the new method.
September 18, 2012 · Processors: Intel
  Latest Reviews More    RSS  

i3DSpeed, September 2013

Retested all graphics cards with the new drivers.
Oct 18, 2013 · 3Digests

i3DSpeed, August 2013

Added new benchmarks: BioShock Infinite and Metro: Last Light.
Sep 06, 2013 · 3Digests

i3DSpeed, July 2013

Added the test results of NVIDIA GeForce GTX 760 and AMD Radeon HD 7730.
Aug 05, 2013 · 3Digests

Gainward GeForce GTX 650 Ti BOOST 2GB Golden Sample Graphics Card

An excellent hybrid of GeForce GTX 650 Ti and GeForce GTX 660.
Jun 24, 2013 · Video cards: NVIDIA GPUs

i3DSpeed, May 2013

Added the test results of NVIDIA GeForce GTX 770/780.
Jun 03, 2013 · 3Digests
  Latest News More    RSS  

Platform  ·  Video  ·  Multimedia  ·  Mobile  ·  Other  ||  About us & Privacy policy  ·  Twitter  ·  Facebook

Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.