We have been testing AMD64 CPUs for quite a while now, and each of our articles on them draws a lot of responses reproaching us for an unfair attitude towards the 64-bit mode of these models.
Indeed, we have little to offer here except a couple of synthetic tests and benchmarks. Even our attempt to find AMD's document listing 64-bit software came to nothing. Most benchmarks turned out to be still in development, and in other cases the authors themselves knew nothing of any 64-bit versions :). However, we did manage to find a couple of real applications, and we'll certainly try to make the most of them in future articles. As for today, we'll once again test compilers for AMD64.
We have already tried testing AMD Opteron CPUs under a 64-bit OS with 64-bit compilers. It made no profound impression on us then, although performance in some SPEC CPU2000 subtests was quite promising.
Seven months have passed since that material appeared, and today we'll try again, hoping that the OSs and compilers have matured.
Tests were carried out on the following platform:
In Linux, we used the SuSE 9.0 Pro and SuSE 9.0 Pro for AMD64 distributions. The standard workstation package set was installed, after which we updated the kernels and gcc compilers (using ready-made SuSE RPMs). These are the resulting versions:
The tests were conducted with the standard gcc/g77/g++ compilers as well as with the Portland Group (PGI) compiler version 5.1-3 (released January 14, 2004).
The gcc suite does not include a Fortran 90 compiler, so we can't obtain official SPECfp_base2000 results; only 10 of the 14 subtests are reported here. For PGI, we did obtain fully official results.
The following optimisation switches were used in the tests:
Since there are too many figures and we're mostly interested in the changes brought by the transition to 64 bits, we'll confine ourselves to tables.
We'll start with CINT2000, as usual.
With the gcc compiler the situation is better than in the previous test (gcc 3.3.1): only two subtests now show a performance drop (vs. four with gcc 3.3.1), and the overall integer score has risen by 9.6 percent (vs. 4.8 percent with gcc 3.3.1). The new PGI version, however, looks worse than its predecessor: the overall score has grown by only 3 percent (vs. 11 percent for version 5.0-1), while the drops in 181.mcf and 300.twolf have become deeper.
However, keep in mind that the previous test was run on Opteron 240 CPUs, which have a 1.4 GHz clock and DDR333 memory.
Now let's take a look at CFP2000.
As for pgi, 179.art drops while the other tasks gain, just as last time. The overall score shows a 14.2-percent increase (vs. 12.5 percent for version 5.0-1).
We also managed to test the much-discussed PathScale EKO Compiler Suite version 1.0, albeit only in 64-bit mode, since 32-bit code generation is still at the alpha stage. Nevertheless, the "-m32" switch is officially used for peak results in some SPEC CPU2000 subtests. As for optimisation switches, we used the supplied configuration file, which is almost identical to the one used for publishing results on SPEC's site. Note that the manufacturer was wise enough to install four DIMMs into its test station and employ interleaving, which noticeably improves results (exactly the effect we demonstrated last summer). We use only two DIMMs, so keep that headroom in mind :). For comparison, we'll take the results of the 64-bit gcc and pgi versions.
The results show that psc can compete with gcc in integer calculations and with pgi in floating-point arithmetic. PathScale is clearly demonstrating that a newcomer doesn't have to stumble before it gets going. AMD64 seems to have found solid support in this company.
Unfortunately, just as we finished the PathScale part, we found out that a new compiler version (1.1) had appeared (such things happen quite often) :), so we decided to delay the article for several days in order to include the new results (especially since the bugfix list is long and many of the fixes concern SPEC CPU2000 tasks). We also used the new supplied configuration file for version 1.1. Apart from the bug fixes, this version promotes 32-bit code support from alpha to beta. A test run in that mode showed that almost all SPEC CPU2000 tasks (except 178.galgel, which never finished) compiled and passed validation. On average, the results were 1.5-2 times lower than in 64-bit mode. Compared to version 1.0, the results changed little: SPECint_base2000 rose by 2.4 percent, SPECfp_base2000 fell by 0.2 percent. Interestingly, the AMD ACML 2.0 mathematical library was used for the peak run of 178.galgel, which evidently explains its almost 5-percent gain.
We normally don't use peak results in our tests. This is partly due to our conviction that fine-tuning per-subtest settings is the business of compiler and CPU manufacturers, while most users seldom bother with it. For example, could you guess that it is "-O3 -ipa -LNO:fusion=2:interchange=OFF:blocking=OFF:ou_prod_max=10:ou_max=5: prefetch=2 -OPT:IEEE_arith=1:ro=3:unroll_size=0 -TENV:X=4 -WOPT: mem_opnds=on:retype_expr=on:val=0" that shows the best result? :) And once it comes down to such subtle option juggling, one can often achieve a better result on a user program by rewriting the code (e.g. based on a profiler's findings). Thus, peak results in the synthetic SPEC CPU2000 tests measure the compiler's "capabilities" rather than allow a precise comparison of CPU performance. But this time we'll please AMD fans :) and include the PathScale product's peak results in our table, comparing them with those of Intel's fastest IA32 compiler running under Windows XP.
We finally got a small sensation: for the first time, an Intel compiler loses to a 64-bit rival in SPECfp_base2000 (strictly speaking, this also applies to psc version 1.0). Reactions to this fact may vary. Some will decide that the era of 64-bit computing has arrived and everybody must rush in that direction :). Others will calmly analyse the situation and note that users now have one more reason to try AMD64 on their tasks. The gap is not that big, especially considering that the Intel compiler was tested under a different OS and its Linux result might differ slightly (see this article).
The 64-bit OS was the Windows XP AMD64 version released in February 2004 (build 1069). We found two compilers: one from the DDK for Windows 2003 Server build 3790, released in March 2003 (version 14.00.2207.0), and the other from the Visual Studio «Whidbey» preview (version 14.0.30702.27); the latter is labelled msvc8 in the table.
Unfortunately, there are fewer figures in this chapter. First, only a C/C++ compiler was available; second, some tests couldn't be compiled or run under the 64-bit OS. All results in this chapter are unofficial, partly because each test was run only once.
Interestingly, the significant performance drops of the 64-bit code occur in exactly the same places as with gcc/pgi: 181.mcf and 300.twolf.
Only four CFP2000 tests are written in C, so we'll examine only those.
Once again, the new compiler generates faster 64-bit code than last year's version, although the 183.equake result is still rather poor.
In our opinion, comparing MSVC results with the Linux compilers makes little sense. While the overall SPEC CPU2000 scores could be compared to some degree, comparisons of individual subtests would be uninteresting and far-fetched (e.g. MSVC scores better in 179.art but is visibly behind gcc in 32-bit 177.mesa).
First of all, judging by the overall scores, all the tested compilers (except PathScale in CFP2000) lose to Intel's 32-bit compiler. That alone can spoil the pleasure of the increased performance.
The fact that the compilers can hardly be compared with each other points to their immaturity (as well as to the rather poor adaptation of the codes to AMD64). Still, we can note some progress in the development of the standard compilers for the Linux and Windows platforms, although in their case compilers are expected first of all to simply work rather than to squeeze maximal efficiency out of the generated code.
Compilers (good compilers :)) for AMD64 face an unclear future. On the one hand, Intel has announced support for the 64-bit mode and its instructions in its CPUs; on the other hand, the company's compilers may well end up working with Intel CPUs only.
Concerning the products we have tested, gcc has its license as an advantage and will keep developing, while PGI holds a fairly solid position on the market of compilers for cluster systems. As for the PathScale product, it has been showing solid results since its very first version, and we hope it will remain competitive with its more famous rivals.
As for the Windows platform, its standard Microsoft compiler aims at high compatibility and timely developer support rather than at setting performance records.
Kirill Kochetkov firstname.lastname@example.org,