iXBT Labs - Computer Hardware in Detail


SPEC CPU2000. Part 17.
AMD64/EM64T and 64-bit code.
The third attempt.

October 6, 2004




 

Four months have passed since we tested 64-bit compilers on the AMD64 platform. Today we continue those tests to see what has changed since then, because the 32-bit product from Intel has often been ahead of the 64-bit compilers. The main story this time is the release of a 64-bit compiler from Intel. It was of course created for the company's own CPUs with EM64T support, but it also works fine on AMD64.

Testbed configuration:

  • AMD Athlon 64 3500+ CPU (2.2 GHz, 512 KB L2, Socket 939)
  • Gigabyte K8NSNXP-939 mainboard
  • 2 x Kingston HyperX KHX4000/512 (operating as DDR400)
  • Operating system: SuSE Linux 9.1 x86-64 (Kernel 2.6.5-7.108, gcc 3.3.3-33 compiler)

We used the following compilers:

  • GNU gcc
  • PGI Workstation 5.2-2
  • Pathscale EKO Compiler Suite 1.3-108
  • Intel Compilers 8.0 (C/C++ 8.0.070, Fortran 8.0.050)
  • Intel Compilers 8.1 for EM64T (C/C++/Fortran 8.1.020)

We'll briefly introduce the participants for those who have not been following our publications:

The standard compiler for Linux systems, gcc remains the most popular compiler for non-commercial use. We used the version from the SuSE package (more precisely, from its update), because the newest version at the time of our tests (we tried 3.4.2) did not provide a considerable performance gain in the SPEC CPU2000 tests.

PGI is the first commercial 64-bit compiler for AMD64. The PGI Workstation 5.2 package includes C, C++, Fortran, and Fortran 90 compilers (as well as a debugger and a profiler) and supports OpenMP and MPI. It has gone through several versions since the first release, and the current one is 5.2-2. Note that new builds of PGI are in fact released almost every day, but unfortunately the developer does not always increase the version number, so the only way to tell how fresh a release is is by the date of the installation package file. The Fortran compiler from this package is also used for commercial applications. In particular, it was used to compile the 64-bit version of LS-DYNA for the AMD64 platform.

PathScale EKO Compiler Suite appeared relatively recently: the first version (1.0) was released this spring. Interestingly, this product was developed for the AMD64 platform from the start. The package includes C, C++, and Fortran 77/90/95 compilers, but no debugger or other utilities. The compiler runs only on 64-bit versions of Linux (Red Hat, Fedora, and SuSE are claimed to be supported). Version 1.3 was released in late August. The developer is trying to draw attention to the quality of the product (the speed of the compiled code) by holding contests along the lines of "You'll Win if Your Code Runs 10% Faster". In addition, its web site publishes numerous test results for various applications (a "64bit Commercial Compiler" often appears in them :) ). This compiler has no separate 32-bit version (although it has an option for compiling 32-bit code, which is even used in the peak metrics), and the company kindly includes a full SPEC CPU2000 config file in the package, so we additionally obtained peak metrics for it. Note that these results are hors concours, because we use only base metrics for the other compilers.

Intel compilers have always demonstrated high-quality code optimization, both in synthetic tests and in real life. The largest processor manufacturer has managed to become a serious competitor to pure software companies. One can of course complain that Intel has always known exactly how its own hardware works, but considerable investment in R&D also played its role. We have been testing Intel compilers since version 5.0, and each new version has brought considerable performance gains in the compiled code.

It is largely thanks to these compilers that Intel Pentium 4 processors demonstrate high results in resource-intensive tasks, provided the developer was not too lazy to use a compiler from Intel :). The wide adoption of SIMD can surely be attributed to them as well.

Interestingly, Intel compilers produce excellent code for other processors as well :). Starting with version 8.0 the company introduced a CPU vendor check, but the optimization options for "generic" processors (including vectorization and SIMD) remained. By the way, 64-bit compilers for AMD64 have only recently managed to outscore the 32-bit Intel compiler in SPECfp_base2000, while in SPECint_base2000 it is still the leader.

Everybody has been looking forward to the release of compilers for EM64T, Intel's version of the 64-bit extension of IA-32. Since processors with EM64T were introduced this summer, the company could not leave them without software support, and the corresponding compiler was released this autumn. This special version of the compiler generates code both for Intel CPUs with EM64T (Prescott core, SSE3) and for compatible 64-bit processors from other companies (but without SSE3 and some fine-tuning options for memory operations). The latter surely means AMD Athlon 64/Opteron :).

We used the following optimization options in our tests:

  • gcc/g++/g77: -O3 -funroll-all-loops +PGO (-fprofile-arcs/-fbranch-probabilities), plus -m32 for testing 32-bit code
  • PGI: -fastsse -Mipa=fast (two compiler passes to use IPA)
  • Pathscale: config file from the package
  • Intel 8.0: -xW -O3 -ipo +FDO
  • Intel 8.1/EM64T: -xW -O3 -ipo +FDO

As before, these results should be referred to as "estimated" in SPEC terminology, because not all compilers managed to complete the full set of tests (gcc has no Fortran 90 compiler, and Intel/EM64T failed to compile 252.eon). All other formal requirements have been satisfied.

Test                        gcc32   gcc64   pgi32   pgi64  psc1.3  psc1.3 peak  ic80.xW  ic81e.xW
164.gzip                     1006    1164     777    1009    1347         1351     1192      1214
175.vpr                       871     930     801     804     910          936      982       897
176.gcc                      1190    1217    1105    1062    1218         1219     1047      1156
181.mcf                      1029     667     977     670     665         1043     1041       683
186.crafty                   1284    1873    1013    1498    1881         1898     1542      1930
197.parser                   1028     974     748     707     927         1110     1175       838
252.eon                      1173    1868     293     390    2035         2184     1589       n/a
253.perlbmk                  1438    1507    1234    1217    1490         1597     1486      1450
254.gap                      1200    1162     967     977    1386         1383     1497      1221
255.vortex                   1513    1575    1308    1381    2139         2331     2064      2040
256.bzip2                     979    1078     893     962    1125         1133     1070      1076
300.twolf                    1005     868     960     827     926         1108     1074       883
SPECint_base2000 (est.)      1128    1186     875     907    1261         1380     1280       n/a
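
The integral mark in the last row is not a simple average: SPEC CPU2000 defines the overall metric as the geometric mean of the per-test ratios. A minimal Python sketch, using the gcc32 column of the table above (the helper name `integral_mark` is ours and purely illustrative), suggests that the per-test figures reproduce the reported mark to within rounding:

```python
import math

# gcc32 column of the CINT2000 table above (Athlon 64 3500+), in table order
gcc32_cint = [1006, 871, 1190, 1029, 1284, 1028,
              1173, 1438, 1200, 1513, 979, 1005]

def integral_mark(ratios):
    """Geometric mean of per-test ratios (illustrative helper name)."""
    log_sum = sum(math.log(r) for r in ratios)
    return math.exp(log_sum / len(ratios))

mark = integral_mark(gcc32_cint)
print(f"SPECint_base2000 (est.) for gcc32: {mark:.1f}")
```

Because a geometric mean multiplies the ratios together, one very low subtest result drags the whole mark down much more than it would in an arithmetic average.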

In SPECint_base2000 all compilers behave similarly, except for PGI, whose record-low results in 252.eon rule out a decent integral mark (whether it is worth digging deep into individual synthetic subtests or better to stick to overall marks is a vexed question that we leave to our readers). If you want to name a leader, the integral mark points to the 32-bit Intel compiler (remember that the Pathscale peak result is hors concours). Its serious competitor is Pathscale: in individual tests the difference between them ranges from -36% to +28%, and the Pathscale integral mark is lower by just 1.5%.

On the whole, gcc and PGI are not that bad, and in some tasks they demonstrate good speed. Intel/EM64T is (so far?) outscored by its predecessor and can for now be considered only a potentially interesting compiler.

So this time the bottom line for CINT2000 is as follows: if speed is important to you, test all of the above compilers with your application. No doubt one of them will considerably speed up your code.

As for compatibility, reliable compilation of twelve different applications (with a single exception) gives no cause for doubt about the quality of the reviewed products.
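
The per-test spread quoted above can be checked directly against the table. The sketch below is only a rough illustration, with variable names of our own choosing. It compares the psc1.3 (base) and ic80.xW columns test by test and reports the extremes, along with the gap between the two integral marks:

```python
# psc1.3 (base) and ic80.xW columns of the CINT2000 table (Athlon 64 3500+)
tests = ["164.gzip", "175.vpr", "176.gcc", "181.mcf", "186.crafty",
         "197.parser", "252.eon", "253.perlbmk", "254.gap",
         "255.vortex", "256.bzip2", "300.twolf"]
psc = [1347, 910, 1218, 665, 1881, 927, 2035, 1490, 1386, 2139, 1125, 926]
ic80 = [1192, 982, 1047, 1041, 1542, 1175, 1589, 1486, 1497, 2064, 1070, 1074]

# Relative difference of Pathscale against the 32-bit Intel compiler, in percent
diff = {t: (p / i - 1.0) * 100.0 for t, p, i in zip(tests, psc, ic80)}

worst = min(diff, key=diff.get)
best = max(diff, key=diff.get)
print(f"{worst}: {diff[worst]:+.0f}%")
print(f"{best}: {diff[best]:+.0f}%")

# Gap between the integral marks reported in the last row of the table
mark_gap = (1261 / 1280 - 1.0) * 100.0
print(f"integral mark: {mark_gap:+.1f}%")
```

The extremes come from 181.mcf and 252.eon, which is a reminder that a near-tie in the integral mark can hide large differences in individual applications.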

Test                        gcc32   gcc64   pgi32   pgi64  psc1.3  psc1.3 peak  ic80.xW  ic81e.xW
168.wupwise                  1089    1245    1456    1668    1694         1965     1713      1863
171.swim                     1413    1443    2041    2111    2372         2433     2026      1983
172.mgrid                     749     857    1138    1236    1465         1470     1165      1237
173.applu                     836     949    1253    1335    1512         1820     1234      1356
177.mesa                     1063    1626     983    1198    1668         1869     1708      1532
178.galgel                    n/a     n/a    2027    2264    2202         2421     1786      1815
179.art                       600    1129    1200    1169    1315         1858     1367      2197
183.equake                   1471    1380    1193    1193    1496         1525     1412      1620
187.facerec                   n/a     n/a    1481    2124    1751         1929     1456      1436
188.ammp                      825    1022     843     981    1015         1083      910       996
189.lucas                     n/a     n/a    1557    1613    1535         1774     1625      1774
191.fma3d                     n/a     n/a    1346    1461    1358         1441     1309      1384
200.sixtrack                  455     550     654     673     665          669      558       550
301.apsi                      680     828    1042    1169    1201         1211     1020      1083
SPECfp_base2000 (est.)        n/a     n/a    1245    1373    1455         1598     1317      1414

In the CFP2000 tests Pathscale is still the leader. Second place goes to the new product from Intel, which is outscored by its 32-bit "brother" in only four subtests (171.swim, 177.mesa, 187.facerec, and 200.sixtrack). Its integral mark rose by 7.4%, but that was not enough to take the lead: about 3% separates it from first place.

PGI is 5.6% behind the leader in SPECfp_base2000, and in individual tests the difference ranges from -28% to +21%.

Judging by these results, there is no urgent need to switch to another compiler. However, as in CINT2000, it makes sense to try other compilers for computational tasks, since the spread in execution speed across the subtests is rather wide.
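
As a quick cross-check of the standings, here is a small Python sketch that ranks the compilers by their SPECfp_base2000 (est.) marks from the table and shows how far each trails the leader. It is an illustration only. It assumes that only base metrics compete (the Pathscale peak result is left out as hors concours) and omits gcc, which has no integral mark because it cannot build the Fortran 90 tests:

```python
# SPECfp_base2000 (est.) integral marks from the table above, base metrics only
specfp_base = {"pgi32": 1245, "pgi64": 1373, "psc1.3": 1455,
               "ic80.xW": 1317, "ic81e.xW": 1414}

ranking = sorted(specfp_base.items(), key=lambda kv: kv[1], reverse=True)
leader_name, leader_mark = ranking[0]

# Percentage by which each compiler trails the leader
behind = {name: (1.0 - mark / leader_mark) * 100.0 for name, mark in ranking}

for name, mark in ranking:
    print(f"{name:9s} {mark:5d}  {behind[name]:4.1f}% behind {leader_name}")
```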



Besides the tests on the AMD64 platform, we also managed to take some readings on Intel Xeon/Nocona. We used the same version of the operating system from SuSE. Note that we installed the initial release, dated April 2004. We did of course update the OS after installation, but we had no problems getting it to work. The test system consisted of 2 x Intel Xeon 3.0 GHz (Nocona), Supermicro X6DA8-G2 (Intel E7525), 2 x 512 MB DDR2-400 SDRAM, and a Western Digital WD360 HDD (SATA). This machine was not heavily loaded, so in our opinion you should not read these results as "everything is 100% working!", but the fact of compatibility is undoubtedly a positive one.

This system was used for the SPEC CPU2000 tests with the gcc, PGI, and Intel compilers. Unfortunately we had no time to test Pathscale, but we'll try to make up for it in the next article :).

Optimization options and other settings are the same as those listed above for AMD64 (except that for IC we used -xP instead of -xW). Note that the table does not contain ic81e.xP results for the 252.eon, 253.perlbmk, 254.gap, and 255.vortex tests. Most likely there will never be 252.eon results (the test uses an old method of working with streams, which will probably not be supported by new compiler versions), while the other three tests will probably be covered by new releases.

There is practically no point in considering the absolute results in light of the eternal Intel vs. AMD dispute: we did not use the fastest processor, and the use of DDR2 is so far rather a disadvantage.

Test                          gcc32   ic80.xP     pgi32     gcc64  ic81e.xP     pgi64
164.gzip                        729       984       737       846      1016       791
175.vpr                         739       822       685       708       743       658
176.gcc                        1378      1593      1224      1244      1366      1123
181.mcf                         845       853       806       498       564       488
186.crafty                      868      1109       698      1249      1305       940
197.parser                      840      1096       704       841       881       634
252.eon                         756      1167       188      1168       n/a       208
253.perlbmk                    1201      1531      1192      1268       n/a      1130
254.gap                        1182      1588      1148      1239       n/a      1087
255.vortex                     1239      2125      1195      1449       n/a      1245
256.bzip2                       776       935       795       893       964       810
300.twolf                      1004      1147       986       791       850       716
SPECint_base2000 (est.)         940      1197       790       974       n/a       749
 
Test                          gcc32   ic80.xP     pgi32     gcc64  ic81e.xP     pgi64
168.wupwise                    1042      2314      1241      1016      2561      1026
171.swim                       1559      1919      1949      1453      1909      1915
172.mgrid                       714      1226      1116       695      1317      1035
173.applu                       752      1227      1276       816      1287      1063
177.mesa                        769      1300       934      1324      1526      1030
178.galgel                      n/a      2063      2197       n/a      2030      2100
179.art                         413       891       846       800      2725       807
183.equake                     1555      1538      1409      1413      1814      1282
187.facerec                     n/a      1571      1275       n/a      1643      1590
188.ammp                        561       669       712       736       846       660
189.lucas                       n/a      1704      1534       n/a      1648      1353
191.fma3d                       n/a      1177      1143       n/a      1352      1039
200.sixtrack                    286       539       514       406       530       339
301.apsi                        582       911       864       620       972       816
SPECfp_base2000 (est.)          n/a      1260      1136       n/a      1461      1050

As the results show, the preferable compiler for Xeon/Nocona is the one from Intel. That could have been assumed even before the tests, though :). Still, it is certainly pleasing that one of the first 64-bit versions is quite usable.

Note that the code produced by gcc and PGI ran on the new Intel processor without any shamanic rituals. That is very nice and gives hope that other software already ported to AMD64 will run on EM64T without complications.

It's interesting to compare the effect of the 64-bit transition on the different platforms. This comparison is certainly conditional: the choice of processors, platforms, and compiler options is far from unambiguous. That's why we recommend refraining from far-reaching conclusions and treating these figures as an additional piece of information about 64 vs. 32, Intel vs. AMD, gcc vs. IC, etc. It is impossible to equalize all the parameters anyway, so these figures will have to do. The following table shows the percentage change in results caused by the transition from 32-bit to 64-bit code.
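
The percentages in the table appear to be the plain relative change of the 64-bit result against the 32-bit one. As a minimal sketch (the function name `pct_change` is our own, and the formula is an assumption that matches the published figures), two gcc results on the Athlon 64 can be recomputed like this:

```python
def pct_change(result_32, result_64):
    """Percentage change when moving from 32-bit to 64-bit code."""
    return (result_64 / result_32 - 1.0) * 100.0

# (32-bit, 64-bit) results for gcc on the Athlon 64, from the CINT2000 table
gcc_amd = {"186.crafty": (1284, 1873), "181.mcf": (1029, 667)}

changes = {t: round(pct_change(r32, r64), 2) for t, (r32, r64) in gcc_amd.items()}
for test, change in changes.items():
    print(f"{test}: {change:+.2f}%")
```

A positive value means that the 64-bit build scored higher, a negative value means that it scored lower.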

Test                      gcc/Intel   gcc/AMD  ic/Intel    ic/AMD pgi/Intel   pgi/AMD
164.gzip                      16.05     15.71      3.25      1.85      7.33     29.86
175.vpr                       -4.19      6.77     -9.61     -8.66     -3.94      0.37
176.gcc                       -9.72      2.27    -14.25     10.41     -8.25     -3.89
181.mcf                      -41.07    -35.18    -33.88    -34.39    -39.45    -31.42
186.crafty                    43.89     45.87     17.67     25.16     34.67     47.88
197.parser                     0.12     -5.25    -19.62    -28.68     -9.94     -5.48
252.eon                       54.50     59.25       n/a       n/a     10.64     33.11
253.perlbmk                    5.58      4.80       n/a     -2.42     -5.20     -1.38
254.gap                        4.82     -3.17       n/a    -18.44     -5.31      1.03
255.vortex                    16.95      4.10       n/a     -1.16      4.18      5.58
256.bzip2                     15.08     10.11      3.10      0.56      1.89      7.73
300.twolf                    -21.22    -13.63    -25.89    -17.78    -27.38    -13.85
SPECint_base2000               3.62      5.14       n/a       n/a     -5.19      3.66
 
Test                      gcc/Intel   gcc/AMD  ic/Intel    ic/AMD pgi/Intel   pgi/AMD
168.wupwise                   -2.50     14.33     10.67      8.76    -17.32     14.56
171.swim                      -6.80      2.12     -0.52     -2.12     -1.74      3.43
172.mgrid                     -2.66     14.42      7.42      6.18     -7.26      8.61
173.applu                      8.51     13.52      4.89      9.89    -16.69      6.54
177.mesa                      72.17     52.96     17.38    -10.30     10.28     21.87
178.galgel                      n/a       n/a     -1.60      1.62     -4.42     11.69
179.art                       93.70     88.17    205.84     60.72     -4.61     -2.58
183.equake                    -9.13     -6.19     17.95     14.73     -9.01      0.00
187.facerec                     n/a       n/a      4.58     -1.37     24.71     43.42
188.ammp                      31.19     23.88     26.46      9.45     -7.30     16.37
189.lucas                       n/a       n/a     -3.29      9.17    -11.80      3.60
191.fma3d                       n/a       n/a     14.87      5.73     -9.10      8.54
200.sixtrack                  41.96     20.88     -1.67     -1.43    -34.05      2.91
301.apsi                       6.53     21.76      6.70      6.18     -5.56     12.19
SPECfp_base2000                 n/a       n/a     15.98      7.37     -7.57     10.28

These figures show that gcc behaves the same way on both processors: considerable gains and drops (where there are any) almost always appear on both platforms. So the effect of moving to an off-the-shelf 64-bit Linux will not depend on which 64-bit platform you choose.

The situation with the Intel compilers is more interesting. First of all, note the considerable drop in many CINT2000 tests on both platforms. Let's hope these issues will be fixed in new compiler versions. The effect is sometimes "a tad more positive" on AMD. As for CFP2000, almost +16% in the integral mark on Intel looks quite good. On AMD the effect is smaller (+7.4%), but there is nothing to be done here :(. We'll just have to use other compilers.

PGI worked quite well on the Intel processor in 64-bit mode. Alas, this combination cannot be recommended for computational tasks: the 64-bit code is slower than the 32-bit code in most CFP2000 subtests. It should be noted, though, that the compiler may be "corrected" as EM64T processors become more widespread. On the AMD processor, the CFP2000 tests of the product from Portland Group showed performance gains in most tasks.

Conclusion

The appearance of a new competitor on the market of 64-bit compilers for the AMD64/EM64T platforms has shaken up what could have become a stagnant field. Of course, when running on the AMD platform, Intel 8.1/EM64T does not unlock the full potential of the CPU. But that does not prevent it from taking second place after Pathscale in SPECfp_base2000 on AMD Athlon 64. It does worse in the SPECint_base2000 tests, where the new product from Intel is unfortunately outscored even by its 32-bit sibling.

As for the 64-bit processor from Intel, the first tests demonstrated that existing 64-bit software for AMD64 works fine on the new competing processor. The availability of a full set of compilers and their compatibility with AMD64 are particularly pleasing. Thus, porting software to EM64T will most likely come down to testing that the software works on the new core from Intel.


Kirill Kochetkov (kochet@ixbt.com),
October 6, 2004


Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.