SPEC CPU2000 Test. Part 1. Introduction

General description

The Standard Performance Evaluation Corporation (SPEC) was founded in 1988 by several computer vendors to develop and maintain a wide range of programs for measuring computer system performance. Today the corporation has more than 60 well-known member companies.

SPEC offers benchmarks for mail servers, Internet servers, file servers, supercomputers and clusters, compute systems, professional graphics applications and so on. Some tests are free to download, others are quite expensive; the most popular is SPECviewperf, which estimates the performance of OpenGL applications. Today, however, we will talk about the less known but equally interesting SPEC CPU2000 test.

CPU2000 is designed to estimate the performance of the central processor (or processors). Since the CPU always works together with RAM and the chipset, it is more accurate to say that CPU2000 measures the performance of the whole computing system on compute-intensive tasks.

The results barely depend on components such as the video card, the hard drive or the CD-ROM drive. First, the test runs from the command line and displays nothing graphical. Second, most operations are carried out in RAM so as not to load the disk subsystem.

To keep the test portable across platforms, it is distributed as source code and workloads in C, C++ and Fortran. On the one hand, this makes it possible to compare systems as different as, for example, an AMD Athlon computer running Windows NT and a cluster of 32 dual-processor Intel Xeon machines running a Unix clone. On the other hand, it brings one more factor into the test: the choice of compilers and their settings used to build the executables for the chosen platform.

All the applications are divided into two groups. CINT2000 includes 12 applications which work mainly with integer data (and logical operations): 11 are written in C and 1 in C++. The second suite, CFP2000, consists of 14 applications (6 in Fortran 77, 4 in Fortran 90 and 4 in C) which make intensive use of floating-point operations. The final scores are based on the measured run times of these applications.

SPEC CPU is, in fact, a synthetic test. Although all the tasks are taken from real life (e.g. archiving and compilation), they differ from real programs. The differences come from improved algorithms and from the chosen compiler: the test is naturally built with the latest version, while a real application was probably compiled with last year's. That is why the test results cannot be directly generalized to your favorite application.

But precisely because no test can account for every task, SPEC CPU has become an industry standard: it gives an average score that serves as a common reference point in performance comparisons.

Unfortunately, tests are usually based either on a script driving a real application (and then there are endless arguments about which version should be used) or on synthetic code (and then the results must be generalized very carefully, since such tasks may never occur in practice).

SPEC tried to find a compromise by using real applications in source form (which means that the algorithms are frozen and code optimization is limited). We will try to see whether the attempt was successful.

We will not discuss whether it is correct to use a single final score. It should be noted, however, that SPEC CPU2000 is a synthetic test, and a single final figure is better suited to estimating performance over a wide range of tasks: individual applications can be untypical of the test platform (for example, the software implementation of OpenGL), and their results are hard to generalize even to similar tasks.

Test system and utilization

The SPEC CPU2000 CD contains:

source code of the applications
input data for the tasks
utilities for compiling and running the benchmarks
documentation

The total size of the files is over 300 MBytes.

System requirements:

256 MBytes of RAM (per processor)
1 GByte of hard disk space
Unix or Windows NT (2000/XP)
C, C++ and Fortran compilers

In other words, almost any system can be evaluated with this test.

Running the test involves the following stages:

installing the package
reading the documentation
creating the configuration file
compiling the sources
running the tests
publishing the results on the SPEC site (if desired)

If you need to compare several similar systems which differ, for example, only in the processor, there is no need to compile the tests several times. You can build the executables on one system and then run them on each test configuration.

The main tools of the test are the utilities that compile and run the benchmarks. For better portability they are written mostly in Perl, and the interpreter is supplied with the test. The same utilities are used to obtain official results, which can then be published on the SPEC site.

Major vendors often cite this respected test, but publishing results is not trivial: the test uses techniques similar to digital signatures. The utilities generate and verify checksums of both the executables and their results while compiling and running the tests, which guarantees that the reported results were obtained with exactly these program versions and that no figures have been altered.

Of course, each application is also checked for correct execution: its output is compared with reference output.
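
The two checks described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of the SPEC utilities: the function names are invented for the example, MD5 is used merely as a stand-in checksum, and the token-by-token comparison with a relative tolerance is an assumption about how floating-point output might be validated.

```python
import hashlib
import math


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string (e.g. an executable's contents)."""
    return hashlib.md5(data).hexdigest()


def binary_is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """True if the binary still matches the digest recorded when it was built."""
    return md5_of_bytes(data) == recorded_digest


def outputs_match(actual: str, reference: str, rel_tol: float = 0.0) -> bool:
    """Compare benchmark output with reference output token by token.

    Identical tokens always match; differing tokens match only if both
    parse as numbers that agree within rel_tol (hypothetical rule).
    """
    actual_tokens, reference_tokens = actual.split(), reference.split()
    if len(actual_tokens) != len(reference_tokens):
        return False
    for a, r in zip(actual_tokens, reference_tokens):
        if a == r:
            continue
        try:
            if not math.isclose(float(a), float(r), rel_tol=rel_tol):
                return False
        except ValueError:
            # At least one token is not a number and the strings differ.
            return False
    return True
```

A result would be accepted only if both checks pass: the executable that ran is the one that was built, and its output agrees with the reference.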

The most important step is creating the configuration file. It contains all the parameters needed to compile the tests, including the compilers used, optimization flags, libraries and so on. Publishing results without this file makes no sense, because its contents have the greatest effect on the results.

A disadvantage of the test is that it does not detect the system configuration automatically; the configuration file alone is therefore not enough to repeat the test under the same conditions. In principle, everything can be written into the descriptive part of this file, but that is inconvenient: to test processors that differ only in clock frequency, for example, you must prepare several files that differ in a single line.

Since each suite includes many subtests, different tasks may benefit from different optimization flags and even different compilers. To keep comparisons fair, the results are divided into base and peak ones.

Base results have stricter limits on compilation: a single compiler is allowed (for tests in the same language) with the same optimization flags (no more than 4). Two-pass compilation is allowed (for intermodule optimization). For peak results, different flags and even different compilers can be used for each subtest. In some configurations this raises the final score by approximately 7%.

Another choice to be made is the metric. The test offers two: speed and rate. The speed metric compares how quickly a computer completes single tasks and expresses the result as a percentage of the base system's speed. A compiler that generates multithreaded code may be used, but since the sources were not written with this in mind, no noticeable gain is obtained.

The rate metric measures the throughput of a machine carrying out a number of tasks, and the result is expressed in "tasks per hour". The number of tasks run simultaneously is usually equal to the number of processors (unless there are as many as 32, in which case one processor can be left for the operating system). A small drawback of this approach is that identical tasks are started at the same time. The rate test can also be run with two or more tasks on a single processor, which is useful for estimating behavior in a multitasking system. It is also interesting to run the rate test on a dual-processor system while specifying a single user (one task): comparing that figure with the one for two simultaneous tasks shows how well the architecture of the computing system scales.
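
The scalability comparison described above comes down to a ratio of two throughputs. A minimal Python sketch follows; the function names and the sample timings are invented for the illustration, and raw "tasks per hour" is used instead of the normalized SPEC rate score.

```python
def tasks_per_hour(copies: int, elapsed_seconds: float) -> float:
    """Throughput when `copies` identical tasks all finish in `elapsed_seconds`."""
    return copies * 3600.0 / elapsed_seconds


def scaling_factor(one_copy_seconds: float, n_copies: int, n_copies_seconds: float) -> float:
    """Throughput with n copies divided by throughput with a single copy.

    A value close to n_copies means the system scales almost ideally.
    """
    return tasks_per_hour(n_copies, n_copies_seconds) / tasks_per_hour(1, one_copy_seconds)


# Hypothetical timings: one copy takes 600 s, two simultaneous copies take 660 s.
example = scaling_factor(600.0, 2, 660.0)
print(f"two copies give {example:.2f}x the throughput of one")
```

With these made-up timings the second processor adds roughly 80% more throughput rather than 100%, which is the kind of loss that contention for memory can cause.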

The following formulas are used to calculate the final scores of the test:

"speed" SPEC int/fp = GEOMEAN(reftime/runtime * 100)

"rate" SPEC int/fp = GEOMEAN(1.16 * N * reftime/runtime)

where:

GEOMEAN: geometric mean over all subtests
reftime: run time on the base (reference) system
runtime: run time on the tested system
N: number of tasks run simultaneously

A Sun Ultra 10 workstation is used as the base system. Remember that for an official publication each subtest must be run at least three times, and the runtime used in the formula is the median of those runs.
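
As a rough illustration of the two formulas, here is a small Python sketch. The function names and the sample times are made up for the example and are not part of the SPEC utilities; the runtimes are assumed to be already reduced to one value per subtest.

```python
from statistics import geometric_mean


def speed_score(ref_times, run_times):
    """Speed metric: GEOMEAN(reftime / runtime * 100) over all subtests."""
    ratios = [100.0 * ref / run for ref, run in zip(ref_times, run_times)]
    return geometric_mean(ratios)


def rate_score(ref_times, run_times, copies):
    """Rate metric: GEOMEAN(1.16 * N * reftime / runtime) over all subtests."""
    ratios = [1.16 * copies * ref / run for ref, run in zip(ref_times, run_times)]
    return geometric_mean(ratios)


# Hypothetical times in seconds for three subtests.
ref_times = [1400.0, 1900.0, 1100.0]   # base (reference) system
run_times = [700.0, 950.0, 550.0]      # tested system, twice as fast everywhere

print(f"speed: {speed_score(ref_times, run_times):.1f}")
print(f"rate with 2 copies: {rate_score(ref_times, run_times, copies=2):.2f}")
```

Because the tested system in this example is exactly twice as fast as the base system on every subtest, the speed score comes out at 200, i.e. 200% of the base system.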

Using the geometric mean instead of the arithmetic one smooths over differences in the runtimes of different tests. This matters because the test suite does not change often: CPU2000 replaced CPU95, and at present applications are being gathered for CPU2004.
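
The smoothing effect is easy to see on a toy example. In the Python sketch below the four ratios are hypothetical: three subtests score 100 and one, perhaps thanks to an especially lucky compiler optimization, scores 1000.

```python
from statistics import geometric_mean, mean

# Hypothetical per-subtest ratios (reftime / runtime * 100); the last is an outlier.
ratios = [100.0, 100.0, 100.0, 1000.0]

arithmetic = mean(ratios)            # pulled strongly towards the outlier
geometric = geometric_mean(ratios)   # grows only with the fourth root of the outlier

print(f"arithmetic mean: {arithmetic:.1f}")   # 325.0
print(f"geometric mean:  {geometric:.1f}")    # 177.8
```

A single tenfold win more than triples the arithmetic mean, while the geometric mean rises by less than a factor of two, so one unusually well-optimized subtest cannot dominate the final score.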

CPU2000 applications

As I mentioned before, the test consists of two suites of applications, which measure the speed of processing integer and floating-point data. Each subtest has its own name and a unique number and is usually written as, for example, 176.gcc.

Below are brief descriptions of all the applications used.

CINT2000 test	Language	Description
164.gzip	C	A version of the popular gzip compression utility. The test performs several compression/decompression operations on a 28 MByte set of files that includes a large TIFF image, a webserver log, a program binary, random data and a source tar file. All operations happen entirely in memory, which helps confine the work to the CPU and the memory subsystem.
175.vpr	C	Integrated circuit design program. VPR is a placement and routing program; it automatically implements a technology-mapped circuit in a Field-Programmable Gate Array (FPGA) chip. During the test it selects, places and connects the circuit's blocks using a particular algorithm.
176.gcc	C	Optimizing C compiler. 176.gcc is based on gcc version 2.7.2.2 and generates code for a Motorola 88100 processor. The benchmark runs as a compiler with many of its optimization flags enabled. There are 5 input workloads consisting of preprocessed C code (.i files), 3.7 MBytes.
181.mcf	C	Transportation optimization program. It solves vehicle scheduling problems that arise in the planning process of public transportation companies (minimizing costs, building a timetable with arrival times and so on).
186.crafty	C	Chess program. Because its structure is far from linear, this application can be used to estimate the effectiveness of branch prediction in modern processors. It solves 5 different chess board layouts, searching the tree of possible moves to varying depths to find the next move.
197.parser	C	Syntactic parser of English. The parser has a dictionary of about 60000 word forms and analyzes a 770 KByte input set of phrases.
252.eon	C++	Computer visualization program. Eon is a probabilistic ray tracer used to create images of 3D objects. It renders an image of a chair standing in front of a corner of a room. 3 different algorithms are applied in turn to solve the problem.
253.perlbmk	C	Perl implementation. A cut-down version of Perl v5.005_03, the popular scripting language, is used to solve 4 problems: email-to-HTML conversion with a freeware converter, running specdiff (a part of the SPEC suite, with some changes), finding perfect numbers and generating a random-number sequence.
254.gap	C	A program for analytical calculations in discrete algebra. The test includes several combinatorial problems, operations on permutation groups and others.
255.vortex	C	Object-oriented database. Work with three interrelated databases is simulated (mailing list, parts list and geometric data). The program was modified to reduce the influence of the disk subsystem, so most operations are carried out in RAM only. The 255.vortex benchmark is run three times, each time with a different mix of database inserts, deletes and lookups to simulate different usage patterns.
256.bzip2	C	Compression utility. One more variation on a compression program. An image, a program and a source text are used as input files. The total data volume is almost 20 MBytes.
300.twolf	C	Integrated circuit design. The original program is used in creating the lithography artwork needed for the production of microchips. It determines the placement and global connections for groups of transistors which constitute the microchip.

As you can see, the suite is far from monotonous. SPEC spent a long time carefully selecting the applications, and giants such as AMD, Compaq, HP and Intel agreed that they are of interest to users.

Now let's take a look at the CFP2000 tests, which consist primarily of computations on double-precision floating-point numbers.

CFP2000 test	Language	Description
168.wupwise	Fortran 77	Physics / Quantum Chromodynamics. Solves one of the most important equations in the theory of strong interactions between quarks, the inhomogeneous lattice-Dirac equation, via the BiCGStab method.
171.swim	Fortran 77	Meteorology. Solves the shallow-water equations with a finite-difference scheme. It was used earlier to compare the performance of supercomputers.
172.mgrid	Fortran 77	Multigrid Solver. Computes a three-dimensional potential field. Also used as a standard supercomputer test.
173.applu	Fortran 77	Computational Fluid Dynamics and Computational Physics. Solution of five coupled nonlinear PDEs on a 3-dimensional logically structured grid, using an implicit pseudo-time marching scheme.
177.mesa	C	3-D Graphics Library. Mesa is a free OpenGL work-alike library.
178.galgel	Fortran 90	Computational Fluid Dynamics. Numerical calculation of the parameters of fluid flow in a closed space.
179.art	C	Neural Networks. A neural network model is used to recognize objects.
183.equake	C	Seismic Wave Propagation Simulation. The program simulates the propagation of elastic waves in large, highly heterogeneous valleys using a finite element method.
187.facerec	Fortran 90	Image Processing. An implementation of a face recognition system.
188.ammp	C	Computational Chemistry. Models large systems of molecules, usually associated with biology.
189.lucas	Fortran 90	Number Theory. Performs the Lucas-Lehmer test to check the primality of Mersenne numbers.
191.fma3d	Fortran 90	Mechanical Response Simulation. A finite-element program designed to simulate the inelastic, transient dynamic response of three-dimensional solids and structures subjected to impulsively or suddenly applied loads.
200.sixtrack	Fortran 77	High Energy Nuclear Physics. The program models a particle accelerator and checks the Dynamic Aperture (DA), i.e. the long-term stability of the beam.
301.apsi	Fortran 77	Weather Prediction. Calculates the spread of a pollutant depending on weather conditions.
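
The naming convention and the language breakdown quoted earlier (11 C and 1 C++ in CINT2000; 6 Fortran 77, 4 Fortran 90 and 4 C in CFP2000) can be checked against the two tables with a short Python sketch. The dictionaries simply transcribe the tables; the helper names are invented for the example.

```python
from collections import Counter

CINT2000 = {
    "164.gzip": "C", "175.vpr": "C", "176.gcc": "C", "181.mcf": "C",
    "186.crafty": "C", "197.parser": "C", "252.eon": "C++", "253.perlbmk": "C",
    "254.gap": "C", "255.vortex": "C", "256.bzip2": "C", "300.twolf": "C",
}

CFP2000 = {
    "168.wupwise": "Fortran 77", "171.swim": "Fortran 77", "172.mgrid": "Fortran 77",
    "173.applu": "Fortran 77", "177.mesa": "C", "178.galgel": "Fortran 90",
    "179.art": "C", "183.equake": "C", "187.facerec": "Fortran 90", "188.ammp": "C",
    "189.lucas": "Fortran 90", "191.fma3d": "Fortran 90", "200.sixtrack": "Fortran 77",
    "301.apsi": "Fortran 77",
}


def split_benchmark(label: str):
    """Split a label such as '176.gcc' into its number and its name."""
    number, name = label.split(".", 1)
    return int(number), name


def language_counts(suite: dict) -> Counter:
    """Count how many subtests of a suite are written in each language."""
    return Counter(suite.values())


print(split_benchmark("176.gcc"))
print(dict(language_counts(CINT2000)))
print(dict(language_counts(CFP2000)))
```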

The CFP2000 suite is also varied. Note, however, that most of the tasks are highly specialized. Besides, as you know, most algorithms are improved over time and become much faster. That is why, in this test, it is more appropriate to use the single final CFP2000 score than the scores of individual subtests.

Conclusion

In the next parts of this SPEC CPU2000 series we will try to find out what the test actually measures and how, what the results depend on and what the resulting figures mean. We will also show a great many interesting pictures.
