The Standard Performance Evaluation Corporation (SPEC) released the long-awaited SPEC CPU2006 on August 24, 2006, replacing the six-year-old SPEC CPU2000. SPEC is a non-profit group whose membership is drawn from hardware and software manufacturers as well as academic and research organizations. SPEC's CPU benchmarks have been the worldwide standard for measuring compute-intensive performance since their introduction in 1989. In this article we analyze the contents of the new version of the test (SPEC CPU2006), its main differences from the previous version (SPEC CPU2000), and our first experience with installing and using it.
SPEC CPU2006 is a useful tool for anyone interested in how hardware systems will perform under compute-intensive workloads based on real applications. This includes computer users, buyers evaluating system options, hardware system vendors, researchers, and application developers. Those who do not own a SPEC CPU2006 license can track performance results on SPEC's web site.
The SPEC CPU2006 benchmarks are based on actual applications provided as source code. They are intended to evaluate the compute-intensive performance of a given system, which is determined chiefly by the following components:
We should highlight the last two components in this list: SPEC CPU performance deliberately depends on more than the CPU alone. This is especially true of compilers; since the applications are provided as source code, their performance may depend on the optimizations a given compiler applies to the generated binary code. At the same time, the other system components (such as I/O, graphics, networking, and the operating system) have a negligible effect on test results, especially when the tests are run on a single processor.
SPEC CPU2006 includes two benchmark suites:
So, SPEC CPU2006 closely resembles the previous version of the product (SPEC CPU2000) in its purpose, structure, and load on system components. Why release a new version of the benchmark, then? The main reason is the constant development of technology: benchmarks should improve along with it. SPEC kept the following key issues in mind when developing SPEC CPU2006:
1. Run-time. As of summer 2006, many of the CPU2000 benchmarks finish in less than a minute on leading-edge processors and systems, so small changes or fluctuations in system state or measurement conditions can have a significant impact on the observed run time. SPEC chose to make run times for the CPU2006 benchmarks longer, anticipating future performance gains and preventing this from becoming an issue for the lifetime of the suites (judging by the successful lifetime of SPEC CPU2000, that should be at least five years).
2. Application size. As applications grow in complexity and size, CPU2000 becomes less representative of what runs on current systems. For CPU2006, SPEC included some programs with both larger resource requirements and more complex source code.
3. Application type. SPEC felt that there were additional application areas that should be included in CPU2006 to increase variety and representation within the suites. For example, video compression and speech recognition have been added, and molecular biology has been significantly expanded.
SPEC CPU2006 package
SPEC provides the following on the SPEC CPU2006 media (a single DVD):
To run and install SPEC CPU2006, you will need:
1. A computer system running UNIX, Microsoft Windows, or Mac OS X. The benchmark suite includes a toolset. Pre-compiled versions of the toolset are provided that are expected to work with:
For systems not listed above, such as earlier or later versions of the listed ones, you may find that the tools also work, but SPEC has not tested them.
2. A DVD drive (the package is shipped on a DVD).
3. Memory. SPEC CPU2006's memory requirements have grown significantly: the typical memory size is 1 GB for 32-bit systems, exclusive of OS overhead, but more may be required. 64-bit environments will typically require 2 GB for some of the benchmarks in the suite. More memory will be needed if you run multi-copy SPECrates: generally 1 GB for 32-bit, or 2 GB for 64-bit, for each copy you plan to run.
4. Disk space. Typically you will need at least 8 GB of disk space to install and run the suite. However, space needs can vary greatly depending on your usage and system. The 8 GB estimate is based on the following:
The minimum disk space requirement is 5 GB, if: you run only single-CPU metrics; you delete the build directories after the build is done; and you clean the run directories between tests.
5. Compilers. Since SPEC supplies only source code for the benchmarks, you will need C99 and C++98 compilers for CINT2006 and a Fortran-95 compiler for CFP2006. If you have no compilers, you may use a pre-compiled set of benchmark executables provided by another user of the same revision of SPEC CPU2006, along with any run-time libraries those executables may require.
As we have already mentioned, SPEC CPU2006 contains two components that focus on two different types of compute-intensive performance. The first suite (CINT2006) measures compute-intensive integer performance, and the second suite (CFP2006) measures compute-intensive floating-point performance. CINT2006 contains 12 benchmarks based on real applications written in C and C++, while CFP2006 contains 17 benchmarks written in C, C++, various versions of Fortran, and mixed C/Fortran.
Here is the list of SPEC CPU2006 benchmarks, their programming languages, and brief descriptions.
Table 1. CINT2006 benchmarks
Table 2. CFP2006 benchmarks
Some of the SPEC CPU2006 benchmark names sound familiar. Indeed, many of the SPEC benchmarks have been derived from publicly available programs (or cut-down versions of commercial applications). But SPEC benchmarks are not identical to those applications, so directly comparing their performance (for example, the 403.gcc benchmark and the gcc 3.2 compiler) is not valid. Some SPEC CPU2006 benchmarks may also seem familiar to users of the old SPEC CPU2000 (for example, 403.gcc "resembles" 176.gcc, and 429.mcf resembles 181.mcf). Nevertheless, these benchmarks are not identical either: as a rule, the new version (SPEC CPU2006) uses the latest program code and, in all cases, different input data that reflect today's typical computational load. Therefore, it is not valid to compare results of "similar" benchmarks from SPEC CPU2006 and SPEC CPU2000 (for example, 403.gcc and 176.gcc). Incidentally, that is why the updated SPEC CPU2006 benchmarks derived from SPEC CPU2000 bear different numeric indices.
As we have already mentioned, SPEC CPU2006 benchmarks are divided into two large groups — CINT2006 (for integer compute intensive performance comparisons) and CFP2006 (for floating point compute intensive performance comparisons.) Thus, SPEC CPU2006 provides two fundamental performance ratings, generally called SPECint2006 and SPECfp2006.
Each of these ratings provides two important metrics of system performance. The first measures how fast a system can solve a single task (that is, how much time it takes); this metric is called speed. The second reflects how many tasks a system can solve in a certain period of time; this metric is called throughput or rate. For the speed metrics, a single copy of a task is run (the task can be automatically parallelized by an optimizing compiler to use all available system processors; it is still the speed metric). For the rate metrics, multiple copies of the benchmarks are run simultaneously. Typically the number of copies equals the number of CPUs in the machine, but this is not a requirement.
And finally, each of these metrics (speed and rate) can be measured with two kinds of builds: base and peak. This division is based on compiler usage scenarios. Base favors simplicity: you must use a single set of switches and a single-pass build for all benchmarks, and its compilation rules are stricter. The base metric is required when you submit performance results to the SPEC web site. The peak metric is optional and reflects an "experimental" approach to compiling benchmark code for maximum performance: you may use more than one compiler, different optimization flags per benchmark, and a multi-pass build with the training workloads stipulated by SPEC CPU2006. Speaking of the latter, a multi-pass build now belongs to peak metrics only; it is no longer admissible for compiling benchmarks for base results (as it was in SPEC CPU2000).
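To illustrate the base/peak split, a SPEC CPU2006 config file describes the two tunings in separate sections. The sketch below follows the config-file section syntax as we understand it, with Intel-compiler-style flags for the two-pass PGO build; the specific option values are illustrative, not taken from a tested configuration:

```
# base: one set of switches, single-pass build, shared by all benchmarks
default=base=default=default:
OPTIMIZE = -O2

# peak: per-benchmark flags and a two-pass PGO build are allowed
default=peak=default=default:
PASS1_CFLAGS = /Qprof-gen
PASS2_CFLAGS = /Qprof-use
```

The first pass instruments the binary, a training run collects a profile, and the second pass recompiles using that profile.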
So, considering the above, SPEC CPU2006 allows you to measure up to eight performance metrics, listed in Table 3.
Table 3. SPEC CPU2006 Metrics
Performance results are always reported in a normalized form, that is, as a ratio between the performance of the tested computer and a reference performance. SPEC uses a historical Sun system, the Ultra Enterprise 2 introduced in 1997, as the reference machine. It uses a 296 MHz UltraSPARC II processor, as did the reference machine for CPU2000. But the two reference machines are not identical: the CPU2006 reference machine has substantially better caches, and the CPU2000 reference machine could not have held enough memory to run CPU2006 (recall this version's significantly higher memory requirements). It takes about 12 days to do a rule-conforming run of the base metrics for CINT2006 and CFP2006 on the CPU2006 reference machine.
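The normalization can be sketched in a few lines. This is a minimal illustration, not SPEC's tooling: the run times below are made-up numbers, and the overall score is computed as the geometric mean of the per-benchmark ratios, which is how SPEC aggregates them:

```python
from math import prod  # Python 3.8+

def spec_ratio(ref_seconds: float, measured_seconds: float) -> float:
    """Normalized score for one benchmark: reference time / measured time."""
    return ref_seconds / measured_seconds

def overall_metric(ratios: list[float]) -> float:
    """Overall suite score: geometric mean of the per-benchmark ratios."""
    return prod(ratios) ** (1.0 / len(ratios))

# Made-up run times (seconds) for three benchmarks:
# reference machine vs. the system under test.
ref_times = [9650.0, 8050.0, 10490.0]
run_times = [700.0, 460.0, 820.0]

ratios = [spec_ratio(r, m) for r, m in zip(ref_times, run_times)]
print([round(x, 1) for x in ratios], round(overall_metric(ratios), 1))
```

By construction, the reference machine itself scores 1 on every benchmark, so a score of 10 means roughly ten times the reference performance.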
First Tests. Installation, compilation, and experimental run
Let's proceed to the description of our first experience with SPEC CPU2006. Here are the general instructions from installation to getting valid results:
Let's analyze installation, compilation, and test run on our testbed under Microsoft Windows XP (SP2) with the following configuration:
It's very easy to install the benchmark: just insert the SPEC CPU2006 DVD and run install.bat at the command prompt:
install.bat destination_drive destination_directory
F:\>install.bat E: \CPU2006
The next installation step (more precisely, preliminary configuration) is to edit shrc.bat in the root folder of the benchmark. You should either specify paths to the installed compilers (set SHRC_COMPILER_PATH_SET=yes) or indicate that you are going to use precompiled benchmarks (set SHRC_PRECOMPILED=yes). Otherwise, you won't be able to run shrc.bat, which is the first thing you should do every time you start working with the benchmark.
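For reference, the relevant part of shrc.bat looks roughly like this; this is an illustrative excerpt based on the description above, not a verbatim copy of the file:

```bat
rem shrc.bat (root of the SPEC CPU2006 installation)
rem Set exactly one of these before running shrc.bat:

rem compilers are installed and their paths are set up
set SHRC_COMPILER_PATH_SET=yes

rem ...or: you will run precompiled benchmark binaries instead
rem set SHRC_PRECOMPILED=yes
```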
Since our first SPEC CPU2000 tests, we have been using the latest available compilers. In this case we used Intel C++ Compiler and Intel Fortran Compiler 9.1 (9.1.034) with various code optimization parameters to achieve maximum performance on both Intel and other processors. We used the Microsoft Platform SDK for Windows Server 2003 SP1 (build 3790) for headers and Windows API libraries. Standard tools from Microsoft Visual Studio 2005 Professional Edition were used to build the object code.
It was easy to choose a proper config file for Windows (x86) and the Intel compilers: the config folder contains the self-explanatory windows-ia32-icl.cfg. The comment inside it (that it was tested with Intel Compiler 9.1 and MS Visual Studio .NET 2003) was almost a match for our situation. Nevertheless, our attempt to compile the benchmarks with this config file (after modifying it to meet our test requirements) was not fully successful.
First of all, there were compilation errors in 483.xalancbmk. The errors were caused by the use of the built-in wchar_t data type, the default in Microsoft Visual Studio 2005 (as well as in Intel C++ Compiler 9.1 with the Visual Studio 2005 /Qvc8 compatibility key), whereas this task expects wchar_t to be an ordinary short int. We first added a compatibility option borrowed from the similar config file for x64 platforms (windows-em64t-icl.cfg): "CXXPORTABILITY = -Qoption,cpp,--no_wchar_t_keyword". After that step, the benchmark compiled successfully on our x86 platform. The compatibility problem can also be solved by adding the "CXXPORTABILITY = -Zc:wchar_t-" option, which likewise allowed the task to compile on our testbed.
The second problem was in 454.calculix, with peak code compilation (tune=peak). In our case it differed from the base build (tune=base) by a two-pass compilation with Profile-Guided Optimization (PGO). In fact, the error appeared not during compilation but when we tried to run the compiled binary with any input data: the benchmark produced a strange error stating that it couldn't allocate memory for its data (about 20 million 4-byte elements, i.e. about 80 MB in total). To solve this problem we had to examine the configuration file thoroughly. In it we found the following option, common to all SPECfp2006 tasks:
It reserves 950 million bytes for the stack (which is just 1 MB by default), whereas the stack for the SPECint2006 integer tasks is only 512 million bytes:
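The option lines themselves did not survive in this copy of the article. Reconstructed from the surrounding description (the linker's /F switch sets the stack reserve size in bytes, and the values are inferred from the text, so treat this as our reading rather than verified file contents), they would read approximately:

```
# common to all CFP2006 tasks: reserve 950 million bytes of stack
EXTRA_LDFLAGS = /F950000000

# the CINT2006 counterpart reserves 512 million bytes
EXTRA_LDFLAGS = /F512000000
```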
After some thought we added the line "EXTRA_LDFLAGS = /F512000000" right under the 454.calculix compatibility options. This reduced its stack reservation and left the remaining free system memory for the heap, which this task uses to store its data. As a result, the two-pass compilation of 454.calculix finished successfully, and we could run it with any input data.
Table 4. SPEC CPU2006 Task Compilation Time
As for the compilation time of the entire suite (Table 4 lists compilation times with specific SSE3 optimizations, the /QxP option), it is rather long. For example, the complete compilation of the base build (tune=base) takes more than 3 hours. Individual task compilation times vary greatly, from 1 second to more than 1.5 hours. There is a large group of benchmarks that compile in 2-3 minutes or less, several that compile in 10-20 minutes, and, finally, 481.wrf, which compiles in nearly 2 hours.
The two-pass compilation with Profile-Guided Optimization (tune=peak) significantly increases the total compilation time (by 1.5 times, to about 5 hours). It also noticeably changes the distribution of individual benchmark compilation times. Nevertheless, even in this case most benchmarks compile in a reasonable 10 minutes or less, and only some require 20 to 50 minutes. Interestingly, the absolute compilation-time leader (the 481.wrf base build) compiles almost twice as fast in its peak modification. It seems the two-pass compilation with a training run significantly reduces code analysis time during the multi-file inter-procedural optimization used in both cases.
You should pay attention not only to compilation times but also to the amount of RAM used. As mentioned above, SPEC CPU2006 has high memory requirements: at least 1 GB of free RAM for a 32-bit platform. It turned out that compilation is even more demanding, at least in our conditions (i.e. Intel compilers with high-level optimizations): total memory usage reached about 1.8 GB for single-pass compilation and about 1.9 GB for two-pass compilation. We first discovered this qualitatively, when we attempted to use our dual-core processor at full capacity by compiling two builds simultaneously (for example, non- and SSE-optimized). This quickly used up all 2 GB of installed RAM and resulted in heavy hard drive swapping. We had to give up this "speed-up" idea and instead measured total memory usage throughout a complete compilation on one CPU core using Windows Task Manager.
Running benchmarks
Like its previous version, SPEC CPU2006 provides two sizes of input and output data sets for the benchmarks: the test run (size=test) and the reference run (size=ref). The former is a quick way to check that the benchmarks work, while the latter is used to evaluate system performance. Moreover, according to SPEC rules, to obtain valid results eligible for publication on the SPEC web site, each benchmark must be run at least three times.
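In practice, the two run sizes correspond to runspec invocations along these lines. This is a sketch: the config file name matches the one discussed above, and the flag spellings follow the SPEC CPU2006 tools as we recall them, so check the suite's own documentation before relying on them:

```
rem quick check that everything works (test data set, integer suite)
runspec --config=windows-ia32-icl.cfg --size=test --tune=base int

rem reference run of both suites, three iterations as SPEC rules require
runspec --config=windows-ia32-icl.cfg --size=ref --tune=base --iterations=3 int fp
```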
As for the test run, its name is still justified in SPEC CPU2006: our testbed completed all the benchmarks (CINT2006 and CFP2006) in less than 6 minutes. Peak memory usage in this mode also fit within the official requirements, at approximately 1.15 GB (of which about 0.25 GB was used by the operating system).
The reference run of SPEC CPU2006 required a bit more memory (1.4 GB) but took significantly more time to complete. The results obtained on our testbed with SSE3-optimized benchmark code are provided in Table 5.
Table 5. SPEC CPU2006 Runtimes
So, the total runtime on our system is approximately 10 hours. Note that this is only a single run; it would take about 30 hours of CPU time to obtain valid results according to SPEC requirements. Complete platform benchmarking in SPEC CPU2006 with our test method, taking into account our code-optimization variants (non-optimized; optimized for SSE and SSE2; optimized for Northwood, Prescott, and Conroe) as well as base and peak runs (12 variants in all), may therefore take 1 to 2 weeks of pure runtime.
We have analyzed the contents of SPEC CPU2006 and its main peculiarities and differences from the previous SPEC CPU2000, which we had used in our testlab for several years to evaluate the performance of various platforms. We have also tried SPEC CPU2006 out, that is, we estimated whether we could use it in our testlab (compilation and runs) and evaluated its typical demands on system resources. SPEC CPU2006's memory requirements are quite high: up to 1.9 GB to compile the benchmarks (fortunately, this procedure is performed far less often than the tests themselves) and about 1.4 GB for reference runs to obtain platform performance ratings. These figures are for 32-bit platforms; SPEC honestly warns that 64-bit platforms may require twice as much memory (we'll try to verify this soon). Considering that memory size in typical modern platforms usually does not exceed 2 GB, this significantly hampers parallel runs of the benchmarks to evaluate the "full" performance of a platform with multi-core processors (in the general case, memory usage must be multiplied by the number of running instances). Accordingly, our first SPEC CPU2006 performance results, to be published in the near future, will concern only SPECint2006/SPECfp2006 obtained in single-core mode.
Dmitri Besedin (email@example.com)
January 26, 2007
Updated on May 2, 2007