SPEC CPU2000. Part 19. EM64T in Intel Pentium 4

We have already tried to evaluate the performance of a 64-bit Intel processor in SPEC CPU2000, but that was the server Xeon. Interesting as it is to many users, it is still intended for a different market.

The launch of the Intel Pentium 4 600 series prompted many people to ask: "Whose 64 bits are better?" Unfortunately, such seemingly simple questions are not always easy to answer. Much depends on parameters that are left out of the picture: the operating system, the compiler, the general system architecture, and certainly the price.

Thus, a more general formulation would be more accurate, to my mind: "the performance of a system based on hardware (H) under operating system (O) with compiler (C) on task (T)". Formally, the advantages of the 64-bit architecture are the classic "lots of memory for a single process" and a larger set of wider registers. But SPEC CPU2000 does not care about the former, and the latter depends heavily on the compiler.

In this article we analyze the performance of the Intel Pentium 4 660 processor under Linux and Windows. Linux has been a 64-bit system for a long time already and is often used for heavy computing tasks. Windows XP x64 found its way into this article as a way to estimate what we can expect from 64 bits under this OS. To put it mildly, there are no special applications for it yet (just a special update of FarCry, but that is of no help here either).

Intel Pentium 4 660

We have already published a detailed review of this model in popular applications under a 32-bit OS. Our conclusions in brief: the Pentium 4 660 is on a par with the Athlon 64 FX-55 as far as the generalized term "performance" is concerned.

As testing this processor turned out to be unexpectedly eventful, I'd like to share my impressions with our readers.

The first impression: it's really hot… Much has been written about this problem already, and strictly speaking it's a problem of the cooling system, but… the stock cooler just could not cope with this processor. Interestingly, I did not notice any outward signs of trouble. But TM2 technology kicked in and the test results were much lower: performance losses amounted to 18%. The only way to learn whether the CPU is throttling is to use special utilities like RMClock. So, a piece of advice to all concerned: pay close attention to overheating.

For the next tests we replaced the Intel cooler with a Foxconn CMI-775-1N (which looked absolutely the same) and ran all the tests with RMClock running alongside. This cooler noticeably improved the situation: the CPU no longer overheated, but the additional 80- and 120-mm fans were noisy.

After some research we settled on the Zalman 7700Cu (recommended by many authors). We also changed the motherboard from the Albatron PX925XE Pro-R to the ABIT Fatal1ty AA8XE and replaced the Corsair XMS2 memory with the more advanced XMS2 Pro.

And there was peace for some time. However, fan headers on the new motherboard were placed very inconveniently, so I decided to rely on my Zalman and didn't install additional fans. That was a mistake. When I compiled tests with three different compilers simultaneously, the system informed me that it was hot and that it would rather turn off…

Note that the results were noticeably lower with this hardware configuration. I tried to fix the problem in BIOS, but in vain. In the end I went back to the Albatron but kept the new memory. Despite the formally matching timings, two tests showed that the new memory was 5-6% slower. As our objective in this article is not to break any records, we decided to stick with this configuration. But it left a bitter aftertaste.

SuSE 9.2

As always, the new release (new for our tests, at least) of the popular distribution from Novell makes a nice impression: a convenient installation procedure, a good software bundle, kernel 2.6, and installers for two architectures (IA32 and AMD64/EM64T) on the same disc (it's a dual-layer DVD now, so a backup copy will be much more expensive). There is no point in dwelling on EM64T compatibility: SuSE has supported the 64-bit Intel architecture since April 2004. All tested compilers also work well with this operating system. Note that the manufacturer currently offers Version 9.3 of this product.

As for the shutdown message mentioned above, the explanation turned out to be simple: the OS follows the relevant standards and supports modern ACPI technologies. So, when the temperature exceeded the 85 degrees specified in BIOS, the system just powered off (having written the reason to the log).

Probably the only thing that didn't work out of the box was temperature and fan monitoring, but that may have to do with the BIOS, and we had no plans to configure additional monitoring packages.

PC configuration

So, the final tests were carried out under the following configuration:

CPU: Intel Pentium 4 660 (3.6 GHz, 2 MB L2, Socket 775)
Motherboard: Albatron PX925XE Pro-R
RAM: 2 x Corsair CM2X512A-4300C3PRO (operating as DDR2-533 with 3-3-3-6 timings)
OS: SuSE Linux 9.2, i386 and x86-64 versions
OS: Windows XP Pro SP2 and Windows XP Pro x64.

Cooling: Zalman 7700Cu in tandem with 80- and 120-mm fans (operating at reduced rpm). As we already know, SPEC CPU2000 results do not depend on a video card and a hard drive. But we shall specify anyway (just for the record) that we used ATI Radeon X600 and Seagate Barracuda V SATA. Power supply: 460W power supply unit from FSP.

Linux Tests

As our tests take several days to run and developers update their compilers very often, we decided to settle on a single set of versions and leave new releases for the next articles (especially as we know from experience that code performance does not change much between minor releases).

So, we came up with the following set of well-known products:

GNU gcc 3.3.4 (32/64-bit versions included with the OS)
PGI Workstation 5.2-4 (32/64-bit versions)
Pathscale EKO Compiler Suite 2.1-280 (64-bit version)
Intel Compilers 8.1 (C/C++ 8.1.030, Fortran 8.1.026)
Intel Compilers 8.1e for EM64T (C/C++/Fortran 8.1.026)

All these compilers have long been familiar to our regular readers, so we shall not dwell on their descriptions.

gcc/g77 is the classic compiler on Linux systems. It goes without saying that it supports AMD64/EM64T. For its developers, efficiency comes second to compatibility. The package does not include a Fortran 90 compiler, so it cannot be used to get complete results in the floating-point tests.

The Portland Group product is rather popular, due to its OpenMP and MPI support in particular. It was one of the first commercial compilers with AMD64 support. The latest available version is 6.0-2. It features PGO support, but in SPEC CPU2000 tests PGO currently makes the code slower. Work on it is in full swing, and I hope this problem will be solved in the next versions. For now, we'll keep using build 5.2.

Pathscale is a relatively new product, created from the start with 64-bit calculations on AMD64/EM64T architectures in mind. It is developing rapidly (its version number has grown to 2.1 within a year). However, some versions have minor problems, e.g. library functions that are not implemented. We shall publish its base results as well as peak results hors concours, because this product has no 32-bit version. Note that we modified the base configuration file (included in the package) for the peak runs: it now uses ACML 2.6.0 (well… funny… ACML for an Intel processor… let's see what happens), and 3DNow! is disabled in two tests.

Intel's compilers have earned their title of the best products from "the author of Pentium". Indeed, who knows all the ins and outs of the internal architecture, and can develop an effective compiler for it, better than its author? Note that it is currently the only commercial compiler that supports SSE3 on the Prescott core.

Traditionally, we do not try to squeeze out maximum performance: we use base metrics and, where possible, identical optimization options for all tests. The subtleties mostly have to do with the settings for porting code to various operating systems. We can e-mail the configuration files to all interested readers. Here are the main optimization options that we used:

gcc: -O3 -funroll-all-loops -fprofile-arcs/-fbranch-probabilities;
PGI: -fastsse -Mipa=fast;
Pathscale: -Ofast -fb_create fbdata/-fb_opt fbdata;
Intel: -fast -prof_gen/-prof_use.
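Three of the four option sets above are written as pairs separated by a slash: the first option of each pair builds an instrumented binary, and the second one rebuilds the program using the profile collected during a training run (PGI has no such pair in our settings). A minimal Python sketch of how such a two-pass build could be scripted; the flags are the ones listed above, while the driver name, source file and output name passed in are placeholders and not part of our actual SPEC configuration files:

```python
# Option sets as listed in the article: (common flags, pass-1 flags, pass-2 flags).
PGO_FLAGS = {
    "gcc": (["-O3", "-funroll-all-loops"], ["-fprofile-arcs"], ["-fbranch-probabilities"]),
    "pathscale": (["-Ofast"], ["-fb_create", "fbdata"], ["-fb_opt", "fbdata"]),
    "intel": (["-fast"], ["-prof_gen"], ["-prof_use"]),
}


def two_pass_commands(compiler, driver, source, output):
    """Return the three command lines of a profile-guided build.

    `driver`, `source` and `output` are hypothetical placeholders supplied
    by the caller (for example "gcc", "bench.c", "bench").
    """
    common, gen, use = PGO_FLAGS[compiler]
    build_instrumented = [driver, *common, *gen, source, "-o", output]
    training_run = ["./" + output]  # running the binary writes the profile data
    build_optimized = [driver, *common, *use, source, "-o", output]
    return [build_instrumented, training_run, build_optimized]


if __name__ == "__main__":
    for cmd in two_pass_commands("gcc", "gcc", "bench.c", "bench"):
        print(" ".join(cmd))
```

The sketch only assembles the command lines; it does not run them. In the real suite the training run is driven by the SPEC tools, so this is an illustration of the idea rather than of our setup.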

Attention: some of the results should again be treated as "estimated" (in SPEC terms), because they were obtained with beta and in-house compiler versions. However, they probably won't differ much from the official results (at least I haven't come across such cases recently).

	Linux 32			Linux 64					Comparison 64 vs. 32 (%)
	gcc	ic8.1	pgi5.2	gcc	ic8.1	pgi5.2	path2.1	path2.1 peak	gcc	ic8.1	pgi5.2
164.gzip	886	1152	881	1000	1216	950	1086	1082	12.9	5.6	7.9
175.vpr	1068	1223	989	1095	1180	1002	1052	1091	2.5	-3.5	1.3
176.gcc	1723	2102	1585	1704	1899	1514	1677	1676	-1.1	-9.7	-4.5
181.mcf	1499	1977	1395	780	925	755	776	1494	-48.0	-53.2	-45.9
186.crafty	1032	1347	832	1502	1625	1122	1351	1434	45.5	20.6	34.9
197.parser	1071	1457	895	1155	1219	831	1067	1204	7.8	-16.3	-7.2
252.eon	885	1873	221	1404	2251	266	1442	1500	58.6	20.2	20.4
253.perlbmk	1413	2103	1448	1575	2199	1387	1564	1729	11.5	4.6	-4.2
254.gap	1447	1944	1389	1554	1930	1374	1620	1609	7.4	-0.7	-1.1
255.vortex	1586	2560	1516	1896	2786	1621	2460	2597	19.6	8.8	7.0
256.bzip2	1075	1308	1098	1265	1413	1136	1232	1207	17.7	8.0	3.5
300.twolf	1454	1664	1469	1421	1662	1260	1295	1623	-2.3	-0.1	-14.2
SPECint_base2000	1231	1676	1038	1326	1614	1015	1330	1478	7.7	-3.7	-2.2

168.wupwise	1250	2768	1507	1238	3230	1698	2066	2244	-1.0	16.7	12.7
171.swim	1881	2585	2784	2001	2569	2758	2708	2704	6.4	-0.6	-0.9
172.mgrid	881	1612	1432	864	1854	1692	1324	1467	-2.0	15.0	18.2
173.applu	914	1583	1681	1006	1623	1764	1612	1611	10.1	2.5	5.0
177.mesa	913	1519	1102	1560	2044	1232	1506	1776	70.9	34.6	11.8
178.galgel		3461	3157		3544	3397	2535	3002		2.4	7.6
179.art	990	3629	1949	1828	5824	1819	3945	4665	84.7	60.5	-6.7
183.equake	2100	2141	1793	1951	2485	1996	2161	2083	-7.1	16.1	11.3
187.facerec		2078	1653		2210	2252	2378	2518		6.4	36.2
188.ammp	831	991	1130	1198	1426	1322	1099	1169	44.2	43.9	17.0
189.lucas		2232	1960		2245	2063	2179	2168		0.6	5.3
191.fma3d		1414	1452		1725	1624	1219	1227		22.0	11.9
200.sixtrack	343	645	618	480	634	661	559	531	40.0	-1.7	7.0
301.apsi	779	1336	1190	845	1363	1317	1186	1245	8.5	2.0	10.7
SPECfp_base2000		1814	1554		2076	1711	1707	1803		14.4	10.1
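
For readers who want to check the summary rows: a SPEC CPU2000 total is the geometric mean of the per-benchmark scores, and the comparison columns are plain relative differences. A short Python sketch using the gcc columns of the CINT2000 part of the table; the function and variable names are ours and purely illustrative:

```python
import math


def geometric_mean(values):
    """Geometric mean, the way SPEC aggregates per-benchmark ratios."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gain_percent(new, old):
    """Relative difference of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# gcc scores from the table above, in benchmark order (164.gzip ... 300.twolf).
GCC_32 = [886, 1068, 1723, 1499, 1032, 1071, 885, 1413, 1447, 1586, 1075, 1454]
GCC_64 = [1000, 1095, 1704, 780, 1502, 1155, 1404, 1575, 1554, 1896, 1265, 1421]

score_32 = geometric_mean(GCC_32)
score_64 = geometric_mean(GCC_64)

print("SPECint_base2000, gcc 32-bit:", round(score_32))
print("SPECint_base2000, gcc 64-bit:", round(score_64))
print("164.gzip, 64 vs. 32 (%):", round(gain_percent(GCC_64[0], GCC_32[0]), 1))
print("total, 64 vs. 32 (%):", round(gain_percent(score_64, score_32), 1))
```

The values it prints should agree with the gcc entries of the SPECint_base2000 and 164.gzip rows to within rounding.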

You can see from the CINT2000 tests that the total results would have been much better if not for the roughly 50% slump in 181.mcf. We know from previous tests that this benchmark depends heavily on the speed of memory operations, and something evidently goes wrong for the 64-bit code. Perhaps its data no longer fits into the cache, or some peculiarity of 64-bit mode prevents efficient operation. This assumption is also supported by the 181.mcf results for dual-processor configurations.
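
To get a feel for how much a single outlier weighs in the total, here is a small Python sketch that recomputes the 64 vs. 32 difference for the Intel compiler with and without 181.mcf. The scores are taken from the table above; dropping a test is done here only for illustration and does not produce a valid SPEC metric:

```python
import math

# ic8.1 CINT2000 scores from the Linux table: benchmark -> (32-bit, 64-bit).
IC81 = {
    "164.gzip": (1152, 1216), "175.vpr": (1223, 1180), "176.gcc": (2102, 1899),
    "181.mcf": (1977, 925), "186.crafty": (1347, 1625), "197.parser": (1457, 1219),
    "252.eon": (1873, 2251), "253.perlbmk": (2103, 2199), "254.gap": (1944, 1930),
    "255.vortex": (2560, 2786), "256.bzip2": (1308, 1413), "300.twolf": (1664, 1662),
}


def geometric_mean(values):
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gain_64_vs_32(scores, exclude=()):
    """Percent difference of the 64-bit geometric mean over the 32-bit one."""
    kept = [pair for name, pair in scores.items() if name not in exclude]
    mean_32 = geometric_mean(pair[0] for pair in kept)
    mean_64 = geometric_mean(pair[1] for pair in kept)
    return (mean_64 / mean_32 - 1.0) * 100.0


print("all 12 tests:    %+.1f%%" % gain_64_vs_32(IC81))
print("without 181.mcf: %+.1f%%" % gain_64_vs_32(IC81, exclude=("181.mcf",)))
```

With all twelve tests the difference is negative; with 181.mcf left out it turns positive, which is the effect described in the paragraph above.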

Note that in CINT2000 the switch to 64 bits is most advantageous for the non-commercial gcc compiler. It does not do badly in CFP2000 either.

As expected, the Intel product delivers the highest results on processors from this company. The good news is that it shows no serious slowdowns in CFP2000 when switching to 64 bits. The situation in CINT2000 is worse, though it still has the best total score in these tests. As for floating-point arithmetic, it was beaten in four tests by products from Portland Group and Pathscale. Its defeat in 171.swim looks strange, because this test depends heavily on the speed of memory operations, and Intel should have been the best in this respect.

PGI still shows low results in several CINT2000 tests, which rules it out of the competition for total score. However, it is no champion in the other integer tests either: it cannot even catch up with gcc. It is better at floating-point arithmetic, where in general it is on a par with Pathscale and even beats Intel in four tests.

The Pathscale EKO Compiler Suite, though quite new, competes well with such classic compilers as Intel and PGI. It stands between Intel and PGI in CINT2000, loses 8 out of 14 CFP2000 tests to PGI, and beats Intel in two of them. Though it is intended for 64-bit platforms, five tests in the peak configuration use the -m32 option (including the ill-fated 181.mcf), which means that not all tasks benefit from 64-bit mode so far. By the way, ACML provides an almost 20% gain in the galgel test.

As for switching to 64 bits on the Intel platform in general, it probably makes sense for tasks like those in CFP2000. There is some gain, though not a large one: the 14% and 10% gains in SPECfp_base2000 for the Intel and Portland Group compilers may be important for some users. Especially as we lose nothing, thanks to full compatibility with 32-bit code (according to our tests, 32-bit code runs under a 64-bit OS at practically the same speed as under the native system).

Windows XP Pro x64 Edition

We also managed to test a couple of compilers under the recently released 64-bit version of the Microsoft OS. We used the April PSDK (3790.1830) for 64-bit libraries. Regular Windows XP Pro SP2 served as the 32-bit counterpart.

We used the following compilers:

Intel Compiler 8.1 (C 027, Fortran 030)
Compiler from Microsoft Visual Studio 2003 (13.10.3077)
Intel Compiler 8.1e for EM64T (C 018, Fortran 017)
64-bit compiler from Microsoft PSDK (14.00.40310.41)

It should be noted that Portland Group also provided a beta version of its compiler for the x64 version of Windows. But it was Fortran only (so far), and we could not get any results quickly, so we'll have to wait for the release.

	Windows 32 bit		Windows 64 bit		Comparison 64 vs. 32 (%)
	ic8.1	msvc	ic8.1	msvc	ic8.1	msvc
164.gzip	1209	972	1213	1057	0.3	8.7
175.vpr	1248	1068	1233	1094	-1.2	2.4
176.gcc	2044		1921	1349	-6.0
181.mcf	2096	1574	2057	1000	-1.9	-36.5
186.crafty	1352	1149	1589	1417	17.5	23.3
197.parser	1498	1119	1411	786	-5.8	-29.8
252.eon	2265	1154	2487	1485	9.8	28.7
253.perlbmk	1898	1507	2069	1498	9.0	-0.6
254.gap	1942	1691	1915	1660	-1.4	-1.8
255.vortex	2913	1719	2881	1580	-1.1	-8.1
256.bzip2	1320	1152	1408	1209	6.7	5.0
300.twolf	1791	1418	1791	1448	0.0	2.1
SPECint_base2000	1737		1770	1271	1.9

168.wupwise	2798		3081		10.1
171.swim	2569		2534		-1.4
172.mgrid	1621		1862		14.9
173.applu	1596		1672		4.8
177.mesa	1588	869	2095	1535	31.9	76.7
178.galgel	3661		3662		0.0
179.art	4501	2086	6118	1969	35.9	-5.6
183.equake	2120	1937	2441	1785	15.1	-7.9
187.facerec	2017		2180		8.1
188.ammp	1352	1143	1298	903	-4.0	-21.0
189.lucas	2278		2235		-1.9
191.fma3d	1576		1721		9.2
200.sixtrack	651		642		-1.4
301.apsi	1358		1350		-0.6
SPECfp_base2000	1915		2069		8.0

It would be wrong to compare these results with our previous tests under Windows, because we used an AMD processor back then. On the whole, the Intel compiler's transition to 64 bits can be described as "it has become a tad better", while the Microsoft results depend heavily on the application: compared with Intel, its fluctuations are noticeably larger in both directions.

One interesting point: unlike under Linux, the 181.mcf results with the Intel compiler hardly drop at all when switching to 64 bits. Perhaps this is the effect of a different memory model (pointer size and the sizes of int/long).
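
To make the int/long remark concrete: 64-bit Linux uses the LP64 data model (long and pointers are 8 bytes), while 64-bit Windows uses LLP64 (pointers are 8 bytes, but long stays at 4). The Python sketch below computes the size of a record under each model with natural alignment. The field list is invented for illustration; it is not the actual 181.mcf data layout, and whether this explains the measured difference remains an assumption:

```python
# Sizes in bytes of the basic C types under three common data models.
DATA_MODELS = {
    "ILP32 (32-bit Linux/Windows)": {"int": 4, "long": 4, "ptr": 4},
    "LP64 (64-bit Linux)": {"int": 4, "long": 8, "ptr": 8},
    "LLP64 (64-bit Windows)": {"int": 4, "long": 4, "ptr": 8},
}

# Hypothetical record: three pointers, two longs and one int.
FIELDS = ["ptr", "ptr", "ptr", "long", "long", "int"]


def struct_size(fields, model):
    """Size of a C-like struct where each field is aligned to its own size."""
    offset = 0
    max_align = 1
    for field in fields:
        size = model[field]
        # Round the offset up to the next multiple of the field's alignment.
        offset = (offset + size - 1) // size * size
        offset += size
        max_align = max(max_align, size)
    # Pad the tail so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


for name, model in DATA_MODELS.items():
    print(name, struct_size(FIELDS, model), "bytes")
```

Under these assumptions the record takes 24, 48 and 40 bytes respectively, so the same number of records occupies noticeably more cache under LP64 than under LLP64.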

As for the Linux vs. Windows comparison, the results of the Intel compilers are close, but performance under Windows is still a tad higher, especially if we compare the 32-bit versions.

Bottom line

The development of 64-bit platforms is taking its normal course. The launch of processors with EM64T technology has "suddenly" shown that all projects developed for AMD64 processors work well on their Intel twins. The release of Windows x64 should accelerate this process, especially as software support is already available at a decent level.

Under Linux, programmers can use the standard gcc compiler, unless commercial projects in Fortran are involved. But when it comes to the performance race, you cannot do without commercial products, and Intel's compiler is an obvious and justified choice for processors from this company. New compiler versions from Portland Group and Pathscale may well compete with it in performance, but they are probably intended to run as part of high-performance clusters. Unfortunately, SPEC CPU2000 cannot measure that. So when you choose a compiler for "large-scale" systems, you should take into account not only its performance results, but also its support for modern technologies, programming interfaces and standards.

Test results under Windows are contradictory. It is very difficult to forecast how real programs will run under the new system. On the one hand, the situation with integer applications is not too bad (though in the case of MSVC that is down to luck). On the other hand, this is not reason enough for a total hardware upgrade. CAD and similar complex applications will use Intel's compiler and be happy. But it is hard to imagine a game written in Fortran and compiled with the Intel compiler. Performance-critical code fragments will most likely be written in 64-bit assembler (especially as the "right" software already has all the necessary asm snippets), and the rest will be left to MSVC.

We express our gratitude to Novell
for the provided distribution disc of SuSE Linux

Kirill Kochetkov (kochet@ixbt.com)
June 14, 2005.
