iXBT Labs - Computer Hardware in Detail

SPEC CPU2000. Part 15.
AMD64 and the 64-bit Code. Second Try.

June 3, 2004



It's been a long time since testing of AMD64 CPUs began, and each of our articles on them draws many responses reproaching us for an unfair attitude to the 64-bit mode of these processors.

Indeed, we have little to offer here beyond a couple of synthetic tests and benchmarks. Even combing through AMD's document listing 64-bit software was to no avail: most benchmarks turned out to be still in development, and in other cases the authors themselves knew nothing of any 64-bit version :). However, we did manage to find a couple of real applications, and we'll certainly try to make the most of them in future articles. As for today, we'll once again test compilers made for AMD64.

We have already tried to test AMD Opteron CPUs in a 64-bit OS with 64-bit compilers. It made no profound impression on us then, although performance in some of SPEC CPU2000 subtests was quite promising.

It has been seven months since that material appeared, and today we'll try to do it once again hoping that OSs and compilers have grown more mature.

Tests were carried out on the following platform:

  • AMD Athlon 64 FX-53 CPU
  • ASUS SK8V motherboard (VIA K8T800 chipset)
  • two Corsair PC3200 Registered ECC 512-MB DIMMs (timings: 2-3-2-5)

Linux

In Linux, we used the SuSE 9.0 Pro and SuSE 9.0 Pro for AMD64 distributions. A standard workstation package set was installed, after which we updated the kernels and gcc compilers (using ready-made SuSE rpm's). These are the resulting versions:

  • i386 platform: 2.4.21-209 kernel, 3.3.2-26 compiler
  • x86-64 platform: 2.4.21-199 kernel, 3.3.2-29 compiler

The tests were conducted with the standard gcc/g77/g++ compilers as well as with the Portland Group (PGI) compiler version 5.1-3 (released January 14, 2004).

The gcc suite does not include a Fortran 90 compiler, so we can't obtain an official SPECfp_base2000 result with it. Therefore, results are given for only 10 of the 14 subtests. With PGI, we managed to obtain fully official results.

The following optimisation switches were used in the tests:

  • gcc/g77/g++: -O3 -funroll-all-loops +PGO (-fprofile-arcs/-fbranch-probabilities)
  • pgi: -fastsse -Mipa=fast (two compiler passes for using IPA)
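
The "+PGO" in the gcc line implies a two-pass, profile-guided build: compile with instrumentation, run a training workload, then recompile using the collected profile. A minimal sketch of that command sequence follows. Python is used here only to assemble the command lines; the file name `bench.c`, the binary name `bench` and the helper `pgo_commands` are hypothetical, and in the real tests the SPEC harness drives these steps.

```python
from typing import List

# Optimisation switches quoted in the article for gcc/g77/g++
BASE_FLAGS = ["-O3", "-funroll-all-loops"]


def pgo_commands(source: str, binary: str) -> List[List[str]]:
    """Return the three steps of a gcc 3.x style profile-guided build.

    `source` and `binary` are hypothetical names used for illustration.
    """
    # Pass 1: build an instrumented binary that records arc execution counts
    instrument = ["gcc", *BASE_FLAGS, "-fprofile-arcs", source, "-o", binary]
    # Training run: executing the binary writes the profile data files
    train = [f"./{binary}"]
    # Pass 2: rebuild, letting gcc use the recorded branch probabilities
    optimise = ["gcc", *BASE_FLAGS, "-fbranch-probabilities", source, "-o", binary]
    return [instrument, train, optimise]


if __name__ == "__main__":
    for step in pgo_commands("bench.c", "bench"):
        print(" ".join(step))
```

The sketch only prints the commands rather than executing them, so it can be read without a compiler installed.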

Because there are too many figures and we're mostly interested in the changes caused by the transition to 64 bits, we'll confine ourselves to tables only.

We'll start with CINT2000, as usual.

Subtest             gcc 32  pgi 32  gcc 64  pgi 64  gcc, change, %  pgi, change, %
164.gzip              1027     766    1246     978           21.32           27.68
175.vpr               1086    1011    1150    1006            5.89           -0.49
176.gcc               1429    1306    1475    1248            3.22           -4.44
181.mcf               1079    1016     673     670          -37.63          -34.06
186.crafty            1327    1019    2011    1521           51.54           49.26
197.parser            1011     813    1105     786            9.30           -3.32
252.eon               1143     314    1963     387           71.74           23.25
253.perlbmk           1533    1163    1655    1183            7.96            1.72
254.gap               1150     955    1296     951           12.70           -0.42
255.vortex            1527    1479    1748    1662           14.47           12.37
256.bzip2             1081     940    1233    1011           14.06            7.55
300.twolf             1424    1303    1148    1074          -19.38          -17.57
SPECint_base2000      1221     950    1338     979            9.58            3.05


Well, the transition to 64 bits gives a rather ambiguous picture. The overall (integral) score has risen, but the individual subtests show a wide spread of results.
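
For reference, the "change, %" columns appear to be the plain relative difference between the 64-bit and 32-bit scores. A minimal sketch, assuming that definition (the helper name `change_percent` is ours), reproduces three of the gcc entries from the table:

```python
def change_percent(score_32: float, score_64: float) -> float:
    """Relative change of the 64-bit score against the 32-bit one, in percent."""
    return (score_64 / score_32 - 1.0) * 100.0


# (32-bit, 64-bit) gcc scores copied from the CINT2000 table
gcc_scores = {
    "164.gzip": (1027, 1246),
    "181.mcf": (1079, 673),
    "186.crafty": (1327, 2011),
}

for name, (s32, s64) in gcc_scores.items():
    print(f"{name}: {change_percent(s32, s64):+.2f}%")
```

A positive value means the 64-bit build is faster; a negative one, such as 181.mcf, means it is slower.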

The situation with the gcc compiler is better than in the previous testing (gcc 3.3.1): only two subtests now show a performance decrease (vs. four with gcc 3.3.1), and the overall score has risen by 9.6 percent (vs. 4.8 percent with gcc 3.3.1). But the new PGI version seems worse than the previous one: the overall score has increased by only 3 percent (vs. 11 percent in version 5.0-1), while the drops in 181.mcf and 300.twolf have become deeper.
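
The integral reading itself is SPEC's composite metric: SPECint_base2000 is defined as the geometric mean of the twelve subtest ratios. A short sketch recomputes it from the gcc 64-bit column. Because the table entries are already rounded, the result can only be expected to land close to the published 1338, not to match it digit for digit.

```python
import math
from typing import Iterable


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean, computed through logarithms for numerical safety."""
    vals = list(values)
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# gcc 64-bit CINT2000 subtest scores, in table order (164.gzip ... 300.twolf)
gcc64 = [1246, 1150, 1475, 673, 2011, 1105, 1963, 1655, 1296, 1748, 1233, 1148]

print(round(geometric_mean(gcc64)))
```

A geometric mean keeps one very fast subtest (such as 186.crafty here) from dominating the composite, which is why a single deep drop like 181.mcf still pulls the overall figure down noticeably.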

However, it must be taken into account that the previous testing was carried out on Opteron 240 CPUs that have a 1.4 GHz frequency and DDR333 memory.

Now let's take a look at CFP2000.

Subtest             gcc 32  pgi 32  gcc 64  pgi 64  gcc, change, %  pgi, change, %
168.wupwise           1174    1519    1287    1725            9.63           13.56
171.swim              1411    1910    1437    1999            1.84            4.66
172.mgrid              798    1155     916    1357           14.79           17.49
173.applu              870    1150     947    1292            8.85           12.35
177.mesa              1199    1129    1749    1263           45.87           11.87
178.galgel               -    2077       -    2636               -           26.91
179.art                741    1344    1429    1272           92.85           -5.36
183.equake            1435    1174    1428    1262           -0.49            7.50
187.facerec              -    1416       -    2072               -           46.33
188.ammp              1117     932    1368    1267           22.47           35.94
189.lucas                -    1403       -    1415               -            0.86
191.fma3d                -    1333       -    1459               -            9.45
200.sixtrack           451     688     597     718           32.37            4.36
301.apsi               784    1129     984    1386           25.51           22.76
SPECfp_base2000          -    1266       -    1446               -           14.22


And again, 179.art stands out with the gcc compiler. The 64-bit mode all but doubles its result (although this is due more to a bad 32-bit result than to a good 64-bit one). Most other subtests gain as well; averaged over all ten gcc-compiled subtests (179.art included), the increase is 25.4 percent. Also noteworthy is the better 171.swim reading: a 1.8-percent rise instead of an 8.6-percent fall last time. Thus, CFP2000, too, demonstrates a general performance increase of 64-bit code for gcc.
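
The 25.4-percent figure can be checked against the gcc change column of the CFP2000 table. A quick sketch, using a plain arithmetic mean (our assumption about how the average was taken), suggests it corresponds to all ten gcc-built subtests with 179.art included; leaving that outlier out gives a visibly lower average.

```python
from statistics import mean

# gcc 32->64 changes (%) for the ten CFP2000 subtests gcc could build
gcc_fp_change = {
    "168.wupwise": 9.63, "171.swim": 1.84, "172.mgrid": 14.79,
    "173.applu": 8.85, "177.mesa": 45.87, "179.art": 92.85,
    "183.equake": -0.49, "188.ammp": 22.47, "200.sixtrack": 32.37,
    "301.apsi": 25.51,
}

# Average over every subtest, then again without the 179.art outlier
with_art = mean(gcc_fp_change.values())
without_art = mean(v for k, v in gcc_fp_change.items() if k != "179.art")

print(f"all ten subtests:  {with_art:.1f}%")
print(f"excluding 179.art: {without_art:.1f}%")
```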

As for pgi, 179.art falls and the other tasks rise, just as last time. The overall score shows a 14.2-percent increase (vs. 12.5 percent in version 5.0-1).

We also managed to test the much-talked-about PathScale EKO Compiler Suite version 1.0, though only in the 64-bit mode, as 32-bit code generation is still at the alpha stage (even so, the "-m32" switch is officially used for the peak results of some SPEC CPU2000 subtests). As for optimisation switches, we used the supplied configuration file, which is almost fully identical to the one used for the results published on SPEC's site. Note that the manufacturers were wise enough to install four DIMMs in the test station and to enable interleaving mode, which led to a significant increase in results (exactly the effect we showed last summer). Unfortunately, we use only two DIMMs now, so keep that headroom in mind :). For comparison purposes, we'll take the results of the 64-bit gcc and pgi versions.

Subtest             psc 1.0   gcc 64   pgi 64
164.gzip               1376     1246      978
175.vpr                1084     1150     1006
176.gcc                1585     1475     1248
181.mcf                 671      673      670
186.crafty             2026     2011     1521
197.parser             1047     1105      786
252.eon                1738     1963      387
253.perlbmk            1596     1655     1183
254.gap                1261     1296      951
255.vortex             2287     1748     1662
256.bzip2              1245     1233     1011
300.twolf              1204     1148     1074
SPECint_base2000       1361     1338      979
 
Subtest             psc 1.0   gcc 64   pgi 64
168.wupwise            1641     1287     1725
171.swim               2070     1437     1999
172.mgrid              1428      916     1357
173.applu              1369      947     1292
177.mesa               1777     1749     1263
178.galgel             2510        -     2636
179.art                1649     1429     1272
183.equake             1428     1428     1262
187.facerec            1606        -     2072
188.ammp               1356     1368     1267
189.lucas              1387        -     1415
191.fma3d              1343        -     1459
200.sixtrack            673      597      718
301.apsi               1434      984     1386
SPECfp_base2000        1493        -     1446

The results show that psc can compete with gcc in integer calculations and with pgi in floating-point arithmetic. So PathScale is definitely telling us that you don't always have to spoil before you spin: a first release need not be a failure. AMD64 can be said to have found solid support in this vendor.

Unfortunately, as soon as we were done with the PathScale part, we found out that a new compiler version (1.1) had just appeared (such things happen quite often :)), so we decided to hold the article for several days in order to include the new results (especially as the list of bugfixes is long and many of them concern SPEC CPU2000 tasks). We also used the new configuration file supplied with version 1.1. Apart from fixing bugs, the new version moved 32-bit code support from the alpha to the beta stage. A test run in that mode showed that almost all SPEC CPU2000 tasks (except 178.galgel, which ran for an indefinite time) compiled and passed validation. On average, the 32-bit results were 1.5-2 times lower than the 64-bit ones. Compared to version 1.0, the 64-bit results changed little: SPECint_base2000 rose by 2.4 percent and SPECfp_base2000 fell by 0.2 percent. Interestingly, the AMD ACML 2.0 math library was used for the peak run of the 178.galgel test. Obviously, this was the cause of its almost 5-percent increase.

We normally don't use peak readings in our tests. This is partly due to our conviction that tuning the settings of individual subtests is the business of compiler and CPU manufacturers, while most users seldom do it. For example, would you guess that it is "-O3 -ipa -LNO:fusion=2:interchange=OFF:blocking=OFF:ou_prod_max=10:ou_max=5: prefetch=2 -OPT:IEEE_arith=1:ro=3:unroll_size=0 -TENV:X=4 -WOPT: mem_opnds=on:retype_expr=on:val=0" that will show the best result? :) And when it comes down to a subtle selection of multiple options, one can often get the maximum out of a user program by rewriting the code instead (e.g. based on an analyser's findings). Thus, peak readings in the SPEC CPU2000 synthetic tests serve to measure a compiler's "capabilities" rather than to compare CPU performance precisely. But this time around, we'll please AMD fans :) and include the PathScale product's peak readings in our table. We'll compare them with Intel's fastest compiler for IA32, which ran under Windows XP.

Subtest              ic 8.0  psc 1.1  psc-peak 1.1
164.gzip               1303     1413          1413
175.vpr                1350     1124          1152
176.gcc                1239     1597          1597
181.mcf                1156      674          1056
186.crafty             1694     2043          2043
197.parser             1487     1048          1222
252.eon                2538     1795          1864
253.perlbmk            1598     1619          1728
254.gap                1626     1403          1403
255.vortex             2444     2303          2440
256.bzip2              1283     1274          1274
300.twolf              1654     1218          1553
SPECint_base2000       1566     1393          1518
 
Subtest              ic 8.0  psc 1.1  psc-peak 1.1
168.wupwise            1601     1636          1876
171.swim               2210     2059          2004
172.mgrid              1224     1422          1569
173.applu              1201     1344          1450
177.mesa               1771     1777          1930
178.galgel             2146     2468          2716
179.art                1864     1631          2286
183.equake             1505     1415          1393
187.facerec            1647     1747          1907
188.ammp               1193     1359          1372
189.lucas              1824     1349          1553
191.fma3d              1404     1301          1359
200.sixtrack            631      680           700
301.apsi               1373     1444          1455
SPECfp_base2000        1480     1490          1612

We finally got a small-scale sensation: for the first time, an Intel compiler loses to a 64-bit rival in SPECfp_base2000 (to be precise, this also applies to psc version 1.0). Reactions to this fact may be mixed. Some may think that the era of 64-bit calculations has come and everybody has to rush in that direction :). Others may calmly analyse the situation and say that users now have one more reason to try AMD64 on their own tasks. The gap is not so big, especially considering that the Intel compiler was tested in another OS and its result in Linux may be a little different (see this article).

Windows

The AMD64 version of Windows XP released in February 2004 (build 1069) served as the 64-bit OS. We found two compilers: one from the DDK for Windows 2003 Server build 3790 released in March 2003 (version 14.00.2207.0), the other from the Visual Studio «Whidbey» preview (version 14.0.30702.27), named msvc8 in the table.

Unfortunately, there are fewer figures in this section: first, because only a C/C++ compiler was used, and second, because some of the tests couldn't be compiled or run for the 64-bit OS. All the results in this section are unofficial, partly because each test was run only once.

Subtest             msvc8 32  msvc8 64    ddk 32    ddk 64  msvc8, change, %  ddk, change, %
164.gzip                1233      1173      1154      1023             -4.87          -11.35
175.vpr                 1132      1195      1183      1113              5.57           -5.92
176.gcc                 1554      1534      1549      1534             -1.29           -0.97
181.mcf                 1152       769      1158       747            -33.25          -35.49
186.crafty              1612      2021      1576      1699             25.37            7.80
197.parser              1133      1089      1134       940             -3.88          -17.11
252.eon                 1465         -      1402         -                 -               -
253.perlbmk             1530         -      1517         -                 -               -
254.gap                 1279         -      1261         -                 -               -
255.vortex              1557      1611      1556      1433              3.47           -7.90
256.bzip2               1206      1221      1202      1143              1.24           -4.91
300.twolf               1434      1146      1437      1103            -20.08          -23.24
SPECint_base2000        1346         -      1333         -                 -               -


Two conclusions can be drawn from the results. First, a transition to 64 bits is not always good in terms of performance. Second, the newer compiler is better adapted to the 64-bit mode. But we can't draw really serious conclusions about performance based on nothing but the results of the compilers' beta versions. However, it is good news that nine out of twenty tasks written over three years ago could be compiled to work correctly in the 64-bit mode.

Interestingly, significant performance falls of the 64-bit code occur in exactly the same places as with gcc/pgi: 181.mcf and 300.twolf.
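
This overlap can be made explicit by intersecting the sets of regressing subtests per compiler. The sketch below uses a handful of rows from the CINT2000 tables of this article; the -15 percent cut-off for a "significant" fall is our own assumption, not a threshold stated anywhere in the tests.

```python
# 32->64 changes (%) copied from the CINT2000 tables (subset of rows)
changes = {
    "gcc":   {"164.gzip": 21.32,  "181.mcf": -37.63, "197.parser": 9.30,   "300.twolf": -19.38},
    "pgi":   {"164.gzip": 27.68,  "181.mcf": -34.06, "197.parser": -3.32,  "300.twolf": -17.57},
    "msvc8": {"164.gzip": -4.87,  "181.mcf": -33.25, "197.parser": -3.88,  "300.twolf": -20.08},
    "ddk":   {"164.gzip": -11.35, "181.mcf": -35.49, "197.parser": -17.11, "300.twolf": -23.24},
}

THRESHOLD = -15.0  # assumed cut-off for a "significant" fall


def significant_falls(per_test: dict) -> set:
    """Subtests whose 64-bit result drops below the threshold."""
    return {name for name, delta in per_test.items() if delta < THRESHOLD}


# Subtests that fall significantly with every one of the four compilers
common = set.intersection(*(significant_falls(t) for t in changes.values()))
print(sorted(common))  # ['181.mcf', '300.twolf']
```

Note that 197.parser also drops sharply with the DDK compiler, but not with the other three, so it does not make it into the common set.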

Only four CFP2000 tests are written in C, so we'll examine no others.

Subtest             msvc8 32  msvc8 64    ddk 32    ddk 64  msvc8, change, %  ddk, change, %
168.wupwise                -         -         -         -                 -               -
171.swim                   -         -         -         -                 -               -
172.mgrid                  -         -         -         -                 -               -
173.applu                  -         -         -         -                 -               -
177.mesa                 858      1652       811       979             92.54           20.72
178.galgel                 -         -         -         -                 -               -
179.art                 1752      1711      1647      1391             -2.34          -15.54
183.equake              1466      1103      1471      1046            -24.76          -28.89
187.facerec                -         -         -         -                 -               -
188.ammp                1175      1440      1159       404             22.55          -65.14
189.lucas                  -         -         -         -                 -               -
191.fma3d                  -         -         -         -                 -               -
200.sixtrack               -         -         -         -                 -               -
301.apsi                   -         -         -         -                 -               -
SPECfp_base2000            -         -         -         -                 -               -

And again, the new compiler delivers better 64-bit code performance than last year's version, although its result in 183.equake is rather bad too.

In our opinion, there is no point in comparing the MSVC results with those of the Linux compilers. While the SPEC CPU2000 overall scores could be compared after a fashion, a comparison of separate subtests would be uninteresting and far-fetched (e.g. MSVC scores better in 179.art but is visibly inferior to gcc in 32-bit 177.mesa).

Conclusions

First of all, according to the overall scores, all the tested compilers (except PathScale in CFP2000) lose to Intel's 32-bit compiler. This alone can spoil the pleasure of the increased performance.

The fact that the compilers can't really be compared with one another indicates their immaturity (as well as rather poor adaptation of the code to AMD64). But we can certainly also note some progress in the development of the standard compilers for the Linux and Windows platforms, although such compilers are expected to just work rather than to produce maximally efficient code.

Compilers (good compilers :)) for AMD64 have an unclear future ahead of them. On the one hand, Intel has announced support for the 64-bit mode and its instructions in its CPUs; on the other hand, it is possible that the company's compilers will work with Intel CPUs only.

As for the products we have tested, gcc has its license as an advantage and will continue to be developed, while PGI is relatively well established in the market for cluster-system compilers. The PathScale product has shown adequate results since its first version appeared, and hopefully it will remain competitive with its more famous rivals.

As for the Windows platform and its standard Microsoft compiler, it aims at high compatibility and timely support for developers rather than at setting performance records.


Kirill Kochetkov kochet@ixbt.com,

01.06.2004


Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.