SPEC CPU2000. Part 17. AMD64/EM64T and 64-bit code. The third attempt


Four months have passed since we last tested 64-bit compilers on the AMD64 platform. Today we continue those tests to see what has changed since then, given that the 32-bit compiler from Intel has often been ahead of the 64-bit compilers. The main event this time is the release of a 64-bit compiler from Intel. It was of course created for the company's own CPUs supporting the EM64T technology, but it also works fine on AMD64.

Testbed configuration:

AMD Athlon 64 3500+ CPU (2.2 GHz, 512 KB L2, Socket 939)
Gigabyte K8NSNXP-939 mainboard
2 x Kingston HyperX KHX4000/512 (operating as DDR400)
Operating system: SuSE Linux 9.1 x86-64 (Kernel 2.6.5-7.108, gcc 3.3.3-33 compiler)

We used the following compilers:

GNU gcc
PGI Workstation 5.2-2
Pathscale EKO Compiler Suite 1.3-108
Intel Compilers 8.0 (C/C++ 8.0.070, Fortran 8.0.050)
Intel Compilers 8.1 for EM64T (C/C++/Fortran 8.1.020)

We'll briefly introduce the participants for those who have not been following our publications:

The standard compiler for Linux systems, gcc remains the most popular compiler for non-commercial use. We used the version from the SuSE distribution (to be exact, from its update), because the newest version at the time of our tests (we tried 3.4.2) did not provide a considerable performance gain in the SPEC CPU2000 tests.

PGI was the first commercial 64-bit compiler for AMD64. The PGI Workstation 5.2 package includes C, C++, Fortran, and Fortran 90 compilers (as well as a debugger and a profiler) and supports OpenMP and MPI. It has gone through several versions since the first release; the current one is 5.2-2. Note that new builds of PGI are released almost every day, but unfortunately the developer does not always increase the version number, so the date of the installation package file is the only indication of how fresh a release is. The Fortran compiler from this package is also used for commercial applications. In particular, it was used to compile the 64-bit version of LS-DYNA for the AMD64 platform.

PathScale EKO Compiler Suite appeared relatively recently: the first version (1.0) was released in spring this year. Interestingly, this product was developed for the AMD64 platform from the start. The package includes C, C++, and Fortran 77/90/95 compilers, but no debuggers or other utilities. The compiler works only in 64-bit versions of Linux (RedHat, Fedora, and SuSE are claimed to be supported). Version 1.3 was released in late August. The developer is trying to draw attention to the quality of the product (the speed of the compiled code) by running contests along the lines of "You'll Win if Your Code Runs 10% Faster". Besides, the web site provides multiple test results in various applications (they often refer to a "64bit Commercial Compiler" :) ). Since this compiler has no separate 32-bit version (although it has an option to compile 32-bit code, which is moreover used in the peak metrics) and the company kindly includes a complete config file for SPEC CPU2000 in the package, we additionally obtained peak metrics for it. Note that these results are hors concours, because we use only base metrics for the other compilers.

Intel compilers have always demonstrated high-quality code optimization both in synthetic tests and in real life. The largest processor manufacturer has managed to become a serious competitor to pure software companies. Of course, one can complain that Intel has always known in fine detail how its own hardware works, but considerable investment in R&D has also played its role. We have been testing Intel compilers since version 5.0, and each new version has demonstrated considerable performance gains in the compiled code.

It is largely thanks to these compilers that Intel Pentium 4 processors demonstrate high results in resource-critical tasks, provided the developer was not too lazy to use a compiler from Intel :). The wide popularity of SIMD can surely be attributed to them as well.

Interestingly, Intel compilers demonstrate excellent speed on other processors as well :). Starting from version 8.0 the company introduced a CPU vendor check, but the option to optimize for "generic" processors (including vectorization and SIMD) remained. By the way, 64-bit compilers for AMD64 have only recently managed to outscore the 32-bit Intel compiler in SPECfp_base2000, while in SPECint_base2000 it is still the leader.

Everybody has been looking forward to the release of compilers for EM64T, Intel's 64-bit extension of IA32. Since processors with EM64T were introduced this summer, the company could not leave them without software support, and the corresponding compiler was released in autumn. This special version of the compiler generates code both for Intel CPUs with EM64T (Prescott core, SSE3) and for processors from other companies that are compatible with the 64-bit mode (but without SSE3 and some fine-tuning options for memory operations). The latter surely means AMD Athlon 64/Opteron :).

We used the following optimization options in our tests:

gcc/g++/g77: -O3 -funroll-all-loops +PGO (-fprofile-arcs/-fbranch-probabilities), with the additional option -m32 for testing 32-bit code;
PGI: -fastsse -Mipa=fast (two compiler passes to use IPA);
Pathscale: config file from the package;
Intel 8.0: -xW -O3 -ipo +FDO;
Intel 8.1/EM64T: -xW -O3 -ipo +FDO.
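
To make the "+PGO" part of the gcc line more concrete, here is a minimal sketch of a two-pass profile-guided build, written as a small Python helper that only assembles the command lines. The flags come from the list above; the file names (bench.c, bench, train.in) and the helper itself are hypothetical and are not part of the actual SPEC CPU2000 configuration.

```python
# Flags taken from the gcc line in the list above.
BASE_FLAGS = ["-O3", "-funroll-all-loops"]


def pgo_commands(source, binary, training_args, use_32bit=False):
    """Return the three steps of a two-pass PGO build as argument lists.

    source, binary and training_args are hypothetical placeholders.
    """
    flags = BASE_FLAGS + (["-m32"] if use_32bit else [])
    # Pass 1: build an instrumented binary that records arc (branch) counts.
    instrumented = ["gcc", *flags, "-fprofile-arcs", source, "-o", binary]
    # Training run: executing the instrumented binary writes the profile data.
    training_run = ["./" + binary, *training_args]
    # Pass 2: rebuild so that gcc can use the recorded branch probabilities.
    optimized = ["gcc", *flags, "-fbranch-probabilities", source, "-o", binary]
    return [instrumented, training_run, optimized]


# Print the command lines instead of running them.
for step in pgo_commands("bench.c", "bench", ["train.in"]):
    print(" ".join(step))
```

Passing use_32bit=True adds -m32 to both compiler passes, which corresponds to the 32-bit gcc runs.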

As before, these results should be referred to as "estimated" in SPEC terminology, because not every compiler managed to complete the full set of tests (gcc has no Fortran 90 compiler, and Intel/EM64T failed to compile 252.eon). All other formal requirements have been satisfied.

	gcc32	gcc64	pgi32	pgi64	psc1.3	psc1.3 peak	ic80.xW	ic81e.xW
164.gzip	1006	1164	777	1009	1347	1351	1192	1214
175.vpr	871	930	801	804	910	936	982	897
176.gcc	1190	1217	1105	1062	1218	1219	1047	1156
181.mcf	1029	667	977	670	665	1043	1041	683
186.crafty	1284	1873	1013	1498	1881	1898	1542	1930
197.parser	1028	974	748	707	927	1110	1175	838
252.eon	1173	1868	293	390	2035	2184	1589
253.perlbmk	1438	1507	1234	1217	1490	1597	1486	1450
254.gap	1200	1162	967	977	1386	1383	1497	1221
255.vortex	1513	1575	1308	1381	2139	2331	2064	2040
256.bzip2	979	1078	893	962	1125	1133	1070	1076
300.twolf	1005	868	960	827	926	1108	1074	883
SPECint_base2000 (est.)	1128	1186	875	907	1261	1380	1280

In SPECint_base2000 all compilers behave in a similar way, except for PGI, whose record-low result in 252.eon rules out a decent integral mark (whether it is worth "digging deep" into synthetic tests or better to stick to overall marks is a vexed question that we leave to our readers). If you want to name a leader, the integral mark points to the 32-bit Intel compiler (remember that the peak result of Pathscale is hors concours). Its serious competitor is Pathscale: in individual tests the difference between the two ranges from -36% to +28%, while the integral mark of Pathscale is only 1.5% lower.
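
For readers who want to check an integral mark themselves: the overall SPEC CPU2000 metric is the geometric mean of the per-test ratios. The sketch below recomputes it for the gcc32 column of the table above; the variable and function names are our own, and the result comes out close to the 1128 shown in the table (small differences are down to rounding).

```python
import math

# Base ratios from the gcc32 column of the CINT2000 table above.
gcc32_cint = {
    "164.gzip": 1006, "175.vpr": 871, "176.gcc": 1190, "181.mcf": 1029,
    "186.crafty": 1284, "197.parser": 1028, "252.eon": 1173,
    "253.perlbmk": 1438, "254.gap": 1200, "255.vortex": 1513,
    "256.bzip2": 979, "300.twolf": 1005,
}


def geometric_mean(values):
    """Geometric mean, computed through logarithms to avoid huge products."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(f"SPECint_base2000 (est.) for gcc32: {geometric_mean(gcc32_cint.values()):.1f}")
```

Because the mean is geometric, one very low ratio (such as the PGI result in 252.eon) pulls the overall mark down much more than it would in a simple average.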

On the whole, gcc and PGI are not that bad and demonstrate good speed in some tasks. Intel/EM64T is (so far?) outscored by its predecessor and for now can be considered only a potentially interesting compiler.

So this time the bottom line for CINT2000 is as follows: if speed is important to you, test all the above-mentioned compilers with your application. Chances are that one of them will considerably raise its execution speed.

From the compatibility point of view, the reliable compilation of twelve different applications (with a single exception) leaves no doubt about the quality of the reviewed products.

	gcc32	gcc64	pgi32	pgi64	psc1.3	psc1.3 peak	ic80.xW	ic81e.xW
168.wupwise	1089	1245	1456	1668	1694	1965	1713	1863
171.swim	1413	1443	2041	2111	2372	2433	2026	1983
172.mgrid	749	857	1138	1236	1465	1470	1165	1237
173.applu	836	949	1253	1335	1512	1820	1234	1356
177.mesa	1063	1626	983	1198	1668	1869	1708	1532
178.galgel			2027	2264	2202	2421	1786	1815
179.art	600	1129	1200	1169	1315	1858	1367	2197
183.equake	1471	1380	1193	1193	1496	1525	1412	1620
187.facerec			1481	2124	1751	1929	1456	1436
188.ammp	825	1022	843	981	1015	1083	910	996
189.lucas			1557	1613	1535	1774	1625	1774
191.fma3d			1346	1461	1358	1441	1309	1384
200.sixtrack	455	550	654	673	665	669	558	550
301.apsi	680	828	1042	1169	1201	1211	1020	1083
SPECfp_base2000 (est.)			1245	1373	1455	1598	1317	1414

In the CFP2000 tests Pathscale is still the leader. Second place goes to the new product from Intel, which is outscored by its 32-bit "brother" in only a few tests (171.swim, 177.mesa, 187.facerec, and 200.sixtrack). Its integral mark has risen by 7.4%, but that was not enough to take the lead: about 3% still separates it from first place.

PGI is 5.6% behind the leader in SPECfp_base2000, but in individual tests the difference ranges from -28% to +21%.
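
The spread quoted above can be reproduced from the psc1.3 and pgi64 columns of the CFP2000 table. The following is only an illustrative sketch (the names are ours): it computes the per-test difference of PGI relative to Pathscale and picks out the two extremes.

```python
# (psc1.3, pgi64) base ratios from the CFP2000 table above.
cfp = {
    "168.wupwise": (1694, 1668), "171.swim": (2372, 2111),
    "172.mgrid": (1465, 1236), "173.applu": (1512, 1335),
    "177.mesa": (1668, 1198), "178.galgel": (2202, 2264),
    "179.art": (1315, 1169), "183.equake": (1496, 1193),
    "187.facerec": (1751, 2124), "188.ammp": (1015, 981),
    "189.lucas": (1535, 1613), "191.fma3d": (1358, 1461),
    "200.sixtrack": (665, 673), "301.apsi": (1201, 1169),
}

# Difference of PGI relative to Pathscale, in percent, for every test.
diffs = {
    test: (pgi / psc - 1.0) * 100.0
    for test, (psc, pgi) in cfp.items()
}

worst = min(diffs, key=diffs.get)  # test where PGI lags furthest behind
best = max(diffs, key=diffs.get)   # test where PGI is furthest ahead

print(f"largest lag:  {worst} {diffs[worst]:+.1f}%")
print(f"largest lead: {best} {diffs[best]:+.1f}%")
```

The extremes fall on 177.mesa and 187.facerec, which is where the -28% and +21% figures come from.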

Judging by these results, there is no need to switch to another compiler urgently. However, as in CINT2000, it makes sense to try other compilers for computational tasks, since the spread in execution speed across the subtests is rather wide.

Besides the tests on the AMD64 platform, we also managed to take some readings on Intel Xeon/Nocona. We used the same version of the operating system from SuSE. Note that we installed the initial release, dated April 2004. We updated the OS after installation, of course, and had no problems with its operation. It should be noted that the computer was not heavily loaded (2 x Intel Xeon 3.0 GHz (Nocona), Supermicro X6DA8-G2 (Intel E7525), 2 x 512 MB DDR2-400 SDRAM, and a Western Digital WD360 HDD (SATA)), so in our opinion these results should not be read as "everything is 100% working!", but the fact of compatibility is certainly positive.

This system was used for the SPEC CPU2000 tests with the gcc, PGI, and Intel compilers. Unfortunately, we had no time to test Pathscale, but we'll try to make up for it in the next article :).

The optimization options and other settings are the same as those listed above for AMD64 (except that for IC we used the -xP option instead of -xW). Note that the table does not contain ic81e.xP results for the 252.eon, 253.perlbmk, 254.gap, and 255.vortex tests. Most likely there will never be 252.eon results (the test uses an old method of working with streams, which will probably not be supported by new compiler versions), while the other three tests will probably work with new releases.

There is practically no point in considering the absolute results in light of the eternal Intel vs AMD dispute: we did not use the fastest processor, and the use of DDR2 is so far rather a disadvantage.

	gcc32	ic80.xP	pgi32	gcc64	ic81e.xP	pgi64
164.gzip	729	984	737	846	1016	791
175.vpr	739	822	685	708	743	658
176.gcc	1378	1593	1224	1244	1366	1123
181.mcf	845	853	806	498	564	488
186.crafty	868	1109	698	1249	1305	940
197.parser	840	1096	704	841	881	634
252.eon	756	1167	188	1168		208
253.perlbmk	1201	1531	1192	1268		1130
254.gap	1182	1588	1148	1239		1087
255.vortex	1239	2125	1195	1449		1245
256.bzip2	776	935	795	893	964	810
300.twolf	1004	1147	986	791	850	716
SPECint_base2000 (est.)	940	1197	790	974		749

	gcc32	ic80.xP	pgi32	gcc64	ic81e.xP	pgi64
168.wupwise	1042	2314	1241	1016	2561	1026
171.swim	1559	1919	1949	1453	1909	1915
172.mgrid	714	1226	1116	695	1317	1035
173.applu	752	1227	1276	816	1287	1063
177.mesa	769	1300	934	1324	1526	1030
178.galgel		2063	2197		2030	2100
179.art	413	891	846	800	2725	807
183.equake	1555	1538	1409	1413	1814	1282
187.facerec		1571	1275		1643	1590
188.ammp	561	669	712	736	846	660
189.lucas		1704	1534		1648	1353
191.fma3d		1177	1143		1352	1039
200.sixtrack	286	539	514	406	530	339
301.apsi	582	911	864	620	972	816
SPECfp_base2000 (est.)		1260	1136		1461	1050

As the results show, the preferable compiler for Xeon/Nocona is the one from Intel. That could have been assumed even before the tests, though :). But the fact that one of the first 64-bit versions is fully operable is certainly pleasing.

Note that the code produced by gcc and PGI ran on the new Intel processor without any shamanic rituals. This is very nice and gives hope that other software already ported to AMD64 will run on EM64T without complications.

It's interesting to compare the effect of the 64-bit transition on the two platforms. This comparison is certainly conditional, since the choice of processors, platforms, and compiler options is far from unambiguous. That's why we recommend holding back any far-reaching conclusions and treating these figures as an additional piece of information about 64 vs 32, Intel vs AMD, gcc vs IC, etc. Besides, it is impossible to equalize all the parameters anyway, so we have to content ourselves with these figures. The following table shows the percentage change in each result caused by the transition from 32-bit to 64-bit code.

	gcc/Intel	gcc/AMD	ic/Intel	ic/AMD	pgi/Intel	pgi/AMD
164.gzip	16.05	15.71	3.25	1.85	7.33	29.86
175.vpr	-4.19	6.77	-9.61	-8.66	-3.94	0.37
176.gcc	-9.72	2.27	-14.25	10.41	-8.25	-3.89
181.mcf	-41.07	-35.18	-33.88	-34.39	-39.45	-31.42
186.crafty	43.89	45.87	17.67	25.16	34.67	47.88
197.parser	0.12	-5.25	-19.62	-28.68	-9.94	-5.48
252.eon	54.50	59.25			10.64	33.11
253.perlbmk	5.58	4.80		-2.42	-5.20	-1.38
254.gap	4.82	-3.17		-18.44	-5.31	1.03
255.vortex	16.95	4.10		-1.16	4.18	5.58
256.bzip2	15.08	10.11	3.10	0.56	1.89	7.73
300.twolf	-21.22	-13.63	-25.89	-17.78	-27.38	-13.85
SPECint_base2000	3.62	5.14			-5.19	3.66

	gcc/Intel	gcc/AMD	ic/Intel	ic/AMD	pgi/Intel	pgi/AMD
168.wupwise	-2.50	14.33	10.67	8.76	-17.32	14.56
171.swim	-6.80	2.12	-0.52	-2.12	-1.74	3.43
172.mgrid	-2.66	14.42	7.42	6.18	-7.26	8.61
173.applu	8.51	13.52	4.89	9.89	-16.69	6.54
177.mesa	72.17	52.96	17.38	-10.30	10.28	21.87
178.galgel			-1.60	1.62	-4.42	11.69
179.art	93.70	88.17	205.84	60.72	-4.61	-2.58
183.equake	-9.13	-6.19	17.95	14.73	-9.01	0.00
187.facerec			4.58	-1.37	24.71	43.42
188.ammp	31.19	23.88	26.46	9.45	-7.30	16.37
189.lucas			-3.29	9.17	-11.80	3.60
191.fma3d			14.87	5.73	-9.10	8.54
200.sixtrack	41.96	20.88	-1.67	-1.43	-34.05	2.91
301.apsi	6.53	21.76	6.70	6.18	-5.56	12.19
SPECfp_base2000			15.98	7.37	-7.57	10.28
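
Each cell in the table above is simply the relative change between the 32-bit and the 64-bit result of the same compiler on the same platform. A minimal sketch of that arithmetic, using a few pairs of values from the result tables (the function name is ours):

```python
def percent_change(score_32, score_64):
    """Relative change, in percent, when moving from 32-bit to 64-bit code."""
    return (score_64 / score_32 - 1.0) * 100.0


# (32-bit, 64-bit) base ratios taken from the result tables above.
samples = {
    "164.gzip, gcc/Intel": (729, 846),
    "181.mcf, gcc/AMD": (1029, 667),
    "179.art, ic/Intel": (891, 2725),
}

for name, (s32, s64) in samples.items():
    print(f"{name}: {percent_change(s32, s64):+.2f}%")
```

A positive value means the 64-bit build is faster, a negative one means it is slower.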

These figures show that gcc behaves the same way on both processors: considerable gains and drops (where there are any) almost always appear on both platforms. So the effect of moving to an off-the-shelf 64-bit Linux will not depend on which 64-bit platform you choose.

The situation with the Intel compilers is more interesting. First of all, note the considerable drop in many CINT2000 tests on both platforms. Let's hope these issues will be fixed in new compiler versions. The effect is sometimes "a tad more positive" on AMD. As for CFP2000, almost +16% in the integral mark on Intel looks quite good. On AMD the effect is smaller (+7.4%), but there is nothing to be done about it :(. We'll just have to use other compilers.

PGI in 64-bit mode performed rather poorly on the Intel processor: both integral marks dropped (by about 5% in CINT2000 and almost 8% in CFP2000). Alas, this combination cannot be recommended for computational tasks, though the compiler may well be "corrected" as EM64T processors become more widespread. On the AMD processor, the CFP2000 tests of the product from Portland Group demonstrated performance gains in most tasks.

Conclusion

The appearance of a new competitor on the market of 64-bit compilers for the AMD64/EM64T platforms has shaken up what could have become stagnation. Of course, when running on the AMD platform, Intel 8.1/EM64T does not unlock the full potential of the CPU. But that does not prevent it from taking second place after Pathscale in SPECfp_base2000 on the AMD Athlon 64. It does worse in the SPECint_base2000 tests, where the new product from Intel is unfortunately outscored even by its 32-bit sibling.

As for the 64-bit version of the processor from Intel, the first tests demonstrated that existing 64-bit software for AMD64 works fine on the new competing processor. The availability of a full set of compilers and their compatibility with AMD64 are particularly pleasing. Thus, porting software to EM64T will most likely come down to testing that the software runs on the new core from Intel.

Kirill Kochetkov (kochet@ixbt.com),
October 6, 2004
