iXBT Labs - Computer Hardware in Detail

SPEC CPU2000. Part 19.
EM64T in Intel Pentium 4

June 14, 2005




We have already evaluated the performance of a 64-bit Intel processor in SPEC CPU2000, but that was the server Xeon. Though it interests many users, it is intended for a different market.

The launch of the Intel Pentium 4 600 series prompted many people to ask: "Whose 64 bits are better?" Unfortunately, such seemingly simple questions are not always easy to answer. Much depends on parameters that are left out of the picture: operating system, compiler, general system architecture, and certainly price.

To my mind, a more general formulation would be more correct: "What is the performance of a system based on hardware (H) under operating system (O) with compiler (C) at task (T)?" Of course, the 64-bit architecture formally comes down to its classic advantages: a lot of memory for a single process and a larger set of wider registers. But SPEC CPU2000 does not care about the former, and the latter depends heavily on the compiler.

In this article we analyze the performance of the Intel Pentium 4 660 processor under Linux and Windows. Linux has been a 64-bit system for a long time and is often used for heavy computing tasks. Windows XP x64 made its way into the article so that we could estimate what to expect from 64 bits under this OS. To put it mildly, there are no special applications for it yet (just a special update of FarCry, which is of no help here either).

Intel Pentium 4 660

We have already published a detailed review of this model with popular applications under a 32-bit OS. Our conclusions in brief: the Pentium 4 660 is on a par with the Athlon 64 FX-55 as far as the generalized term "performance" is concerned.

As testing this processor turned out to be unexpectedly eventful, I'd like to share my impressions with our readers.

The first impression: it's really hot. Much has been written about this problem already, and strictly speaking it is a problem of the cooling system, but the stock cooler simply could not cope with this processor. Interestingly, I noticed no outward signs of trouble. But the TM2 technology kicked in and the test results dropped noticeably: performance losses reached 18%. The only way to find out whether the CPU is throttling is to use special utilities such as RMClock. So, a piece of advice to all concerned: pay close attention to overheating.

For the next tests we replaced the Intel cooler with a Foxconn CMI-775-1N (which looked exactly the same) and ran all the tests with RMClock running alongside. This cooler noticeably improved the situation: the CPU did not overheat, but the additional 80- and 120-mm fans were noisy.

After some research we settled on the Zalman 7700Cu (recommended by many authors). We also changed the motherboard from the Albatron PX925XE Pro-R to the ABIT Fatal1ty AA8XE and replaced the Corsair XMS2 memory with the more advanced XMS2 Pro.

And there was peace for some time. However, fan headers on the new motherboard were placed very inconveniently, so I decided to rely on my Zalman and didn't install additional fans. That was a mistake. When I compiled tests with three different compilers simultaneously, the system informed me that it was hot and that it would rather turn off…

Note that the results were noticeably lower with this hardware configuration. I tried to fix the problem in the BIOS, but in vain. In the end I went back to the Albatron board but kept the new memory. Despite the formally matching timings, two tests showed the new memory to be 5-6% slower. As our objective in this article is not to break records, we decided to stick to this configuration. Still, it left a bitter aftertaste.

SuSE 9.2

As always, the new release (new for our tests, at least) of the popular distribution from Novell makes a good impression: a convenient installation procedure, a good software bundle, kernel 2.6, and installers for two architectures (IA32 and AMD64/EM64T) on the same disc (it's a dual-layer DVD now, so a backup copy will cost much more). There is no point in dwelling on EM64T compatibility: SuSE has supported the 64-bit Intel architecture since April 2004. All the tested compilers also work well with this operating system. Note that the vendor currently offers version 9.3 of this product.

As for the shutdown message mentioned above, the explanation turned out to be simple: the OS follows the relevant standards and supports modern ACPI features. So when the temperature exceeded the 85 degrees specified in the BIOS, the system simply powered off (after writing the reason to the log).

Probably the only thing that didn't work out of the box was temperature and fan monitoring, but that may have to do with the BIOS, and we had no plans to configure additional monitoring packages.

PC configuration

So, the final tests were carried out under the following configuration:

  • CPU: Intel Pentium 4 660 (3.6 GHz, 2 MB L2, Socket 775)
  • Motherboard: Albatron PX925XE Pro-R
  • RAM: 2 x Corsair CM2X512A-4300C3PRO (operating as DDR2-533 with 3-3-3-6 timings)
  • OS: SuSE Linux 9.2, i386 and x86-64 versions
  • OS: Windows XP Pro SP2 and Windows XP Pro x64.

Cooling: Zalman 7700Cu in tandem with 80- and 120-mm fans (operating at reduced rpm). As we already know, SPEC CPU2000 results do not depend on a video card and a hard drive. But we shall specify anyway (just for the record) that we used ATI Radeon X600 and Seagate Barracuda V SATA. Power supply: 460W power supply unit from FSP.

Linux Tests

As our tests take several days to run and developers update their compilers very often, we decided to settle on a single set of versions and to use new releases only in future articles (especially as we know from experience that code performance does not change much between minor releases).

So, we came up with the following set of well-known products:

  • GNU gcc 3.3.4 (32/64-bit versions included into OS)
  • PGI Workstation 5.2-4 (32/64-bit versions)
  • Pathscale EKO Compiler Suite 2.1-280 (64-bit version)
  • Intel Compilers 8.1 (C/C++ 8.1.030, Fortran 8.1.026)
  • Intel Compilers 8.1e for EM64T (C/C++/Fortran 8.1.026)

All these compilers have long been familiar to our regular readers, so we shall not dwell on their descriptions.

gcc/g77 is the classic compiler on Linux systems. It goes without saying that it supports AMD64/EM64T. For its developers, efficiency comes second to compatibility. The package does not include a Fortran 90 compiler, so it cannot produce complete results in the floating-point tests.

The Portland Group product is rather popular, in particular thanks to its OpenMP and MPI support. It was one of the first commercial compilers with AMD64 support. The latest available version is 6.0-2. That version features PGO support, but it results in slower code in SPEC CPU2000 tests. Work on it is in full swing, and I hope the problem will be solved in the next versions. For now, we'll keep using build 5.2.

Pathscale is a relatively new product, created from the start with 64-bit computing on AMD64/EM64T architectures in mind. It is developing rapidly (it reached version 2.1 within a year). However, some versions have minor problems, e.g. unimplemented library functions. We publish its base and peak results hors concours, because this compiler has no 32-bit version. Note that we modified the base configuration file (included in the package) for the peak runs: it now uses ACML 2.6.0 (well… funny… ACML for an Intel processor… let's see what happens), and 3DNow! is disabled in two tests.

Intel's compilers have earned their reputation as the best products for processors from "the author of Pentium". Indeed, who knows all the ins and outs of the internal architecture, and can develop an effective compiler for it, better than its author? Note that it's currently the only commercial compiler that supports SSE3 in the Prescott core.

By tradition, we don't try to squeeze out maximum performance: we use base metrics and, where possible, identical optimization flags for all tests. The subtleties mostly concern settings for porting code to various operating systems. We can e-mail the configuration files to all interested readers. Here are the main optimization flags we used (where two flags are separated by a slash, the first is for the profile-generation pass and the second for the final build):

  • gcc: -O3 -funroll-all-loops -fprofile-arcs/-fbranch-probabilities;
  • PGI: -fastsse -Mipa=fast;
  • Pathscale: -Ofast -fb_create fbdata/-fb_opt fbdata;
  • Intel: -fast -prof_gen/-prof_use.
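The two-pass builds implied by the slash-separated flags can be sketched as follows. This is only an illustration under stated assumptions: the flags are copied from the list above, while the driver names (gcc, pgcc, pathcc, icc), the file names bench.c and bench, and the helper build_steps are hypothetical and are not taken from our SPEC configuration files.

```python
# Flags are copied from the list above; a pair means
# (profile-generation pass, final build).
# Driver names and file names are assumptions made for illustration only.
COMPILERS = {
    "gcc": {"driver": "gcc",
            "common": ["-O3", "-funroll-all-loops"],
            "profile": ("-fprofile-arcs", "-fbranch-probabilities")},
    "pgi": {"driver": "pgcc",
            "common": ["-fastsse", "-Mipa=fast"],
            "profile": None},  # no profile feedback in the list above
    "pathscale": {"driver": "pathcc",
                  "common": ["-Ofast"],
                  "profile": ("-fb_create fbdata", "-fb_opt fbdata")},
    "intel": {"driver": "icc",
              "common": ["-fast"],
              "profile": ("-prof_gen", "-prof_use")},
}


def build_steps(name, source="bench.c", output="bench"):
    """Return the ordered shell commands for one compiler (hypothetical helper)."""
    cfg = COMPILERS[name]
    base = [cfg["driver"], *cfg["common"]]
    tail = [source, "-o", output]
    if cfg["profile"] is None:
        return [" ".join(base + tail)]
    gen_flag, use_flag = cfg["profile"]
    return [
        " ".join(base + [gen_flag] + tail),  # pass 1: instrumented build
        f"./{output}",                       # training run that writes the profile
        " ".join(base + [use_flag] + tail),  # pass 2: build that reads the profile
    ]


if __name__ == "__main__":
    for compiler in COMPILERS:
        print(compiler)
        for step in build_steps(compiler):
            print("   ", step)
```

In a real SPEC run the harness performs these steps itself from the configuration file; the sketch only shows the order of the passes.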

Attention: some of the results should again be taken as "estimated" (in SPEC terms), because they were obtained with beta and in-house compiler versions. However, they probably won't differ much from the official results (at least I haven't come across such situations recently).

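| Test | Linux 32 gcc | Linux 32 ic8.1 | Linux 32 pgi5.2 | Linux 64 gcc | Linux 64 ic8.1 | Linux 64 pgi5.2 | Linux 64 path2.1 | Linux 64 path2.1 peak | 64 vs. 32 (%) gcc | 64 vs. 32 (%) ic8.1 | 64 vs. 32 (%) pgi5.2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 164.gzip | 886 | 1152 | 881 | 1000 | 1216 | 950 | 1086 | 1082 | 12.9 | 5.6 | 7.9 |
| 175.vpr | 1068 | 1223 | 989 | 1095 | 1180 | 1002 | 1052 | 1091 | 2.5 | -3.5 | 1.3 |
| 176.gcc | 1723 | 2102 | 1585 | 1704 | 1899 | 1514 | 1677 | 1676 | -1.1 | -9.7 | -4.5 |
| 181.mcf | 1499 | 1977 | 1395 | 780 | 925 | 755 | 776 | 1494 | -48.0 | -53.2 | -45.9 |
| 186.crafty | 1032 | 1347 | 832 | 1502 | 1625 | 1122 | 1351 | 1434 | 45.5 | 20.6 | 34.9 |
| 197.parser | 1071 | 1457 | 895 | 1155 | 1219 | 831 | 1067 | 1204 | 7.8 | -16.3 | -7.2 |
| 252.eon | 885 | 1873 | 221 | 1404 | 2251 | 266 | 1442 | 1500 | 58.6 | 20.2 | 20.4 |
| 253.perlbmk | 1413 | 2103 | 1448 | 1575 | 2199 | 1387 | 1564 | 1729 | 11.5 | 4.6 | -4.2 |
| 254.gap | 1447 | 1944 | 1389 | 1554 | 1930 | 1374 | 1620 | 1609 | 7.4 | -0.7 | -1.1 |
| 255.vortex | 1586 | 2560 | 1516 | 1896 | 2786 | 1621 | 2460 | 2597 | 19.6 | 8.8 | 7.0 |
| 256.bzip2 | 1075 | 1308 | 1098 | 1265 | 1413 | 1136 | 1232 | 1207 | 17.7 | 8.0 | 3.5 |
| 300.twolf | 1454 | 1664 | 1469 | 1421 | 1662 | 1260 | 1295 | 1623 | -2.3 | -0.1 | -14.2 |
| SPECint_base2000 | 1231 | 1676 | 1038 | 1326 | 1614 | 1015 | 1330 | 1478 | 7.7 | -3.7 | -2.2 |
| 168.wupwise | 1250 | 2768 | 1507 | 1238 | 3230 | 1698 | 2066 | 2244 | -1.0 | 16.7 | 12.7 |
| 171.swim | 1881 | 2585 | 2784 | 2001 | 2569 | 2758 | 2708 | 2704 | 6.4 | -0.6 | -0.9 |
| 172.mgrid | 881 | 1612 | 1432 | 864 | 1854 | 1692 | 1324 | 1467 | -2.0 | 15.0 | 18.2 |
| 173.applu | 914 | 1583 | 1681 | 1006 | 1623 | 1764 | 1612 | 1611 | 10.1 | 2.5 | 5.0 |
| 177.mesa | 913 | 1519 | 1102 | 1560 | 2044 | 1232 | 1506 | 1776 | 70.9 | 34.6 | 11.8 |
| 178.galgel | - | 3461 | 3157 | - | 3544 | 3397 | 2535 | 3002 | - | 2.4 | 7.6 |
| 179.art | 990 | 3629 | 1949 | 1828 | 5824 | 1819 | 3945 | 4665 | 84.7 | 60.5 | -6.7 |
| 183.equake | 2100 | 2141 | 1793 | 1951 | 2485 | 1996 | 2161 | 2083 | -7.1 | 16.1 | 11.3 |
| 187.facerec | - | 2078 | 1653 | - | 2210 | 2252 | 2378 | 2518 | - | 6.4 | 36.2 |
| 188.ammp | 831 | 991 | 1130 | 1198 | 1426 | 1322 | 1099 | 1169 | 44.2 | 43.9 | 17.0 |
| 189.lucas | - | 2232 | 1960 | - | 2245 | 2063 | 2179 | 2168 | - | 0.6 | 5.3 |
| 191.fma3d | - | 1414 | 1452 | - | 1725 | 1624 | 1219 | 1227 | - | 22.0 | 11.9 |
| 200.sixtrack | 343 | 645 | 618 | 480 | 634 | 661 | 559 | 531 | 40.0 | -1.7 | 7.0 |
| 301.apsi | 779 | 1336 | 1190 | 845 | 1363 | 1317 | 1186 | 1245 | 8.5 | 2.0 | 10.7 |
| SPECfp_base2000 | - | 1814 | 1554 | - | 2076 | 1711 | 1707 | 1803 | - | 14.4 | 10.1 |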
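For readers who want to check the arithmetic, here is a minimal sketch of how the comparison columns and the totals relate to the per-test scores. It assumes the usual SPEC convention that the overall metric is the geometric mean of the per-test ratios; the function names are ours, and the input numbers are the 32-bit gcc CINT2000 scores from the Linux results.

```python
import math


def pct_change(score_64, score_32):
    """Relative difference of the 64-bit score over the 32-bit score, in percent."""
    return (score_64 / score_32 - 1.0) * 100.0


def geometric_mean(scores):
    """Geometric mean, the aggregate SPEC uses for its overall metrics."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# 32-bit gcc results for the twelve CINT2000 tests, in table order.
GCC_32_CINT = [886, 1068, 1723, 1499, 1032, 1071,
               885, 1413, 1447, 1586, 1075, 1454]

if __name__ == "__main__":
    print(round(pct_change(1000, 886), 1))     # 164.gzip, gcc (table: 12.9)
    print(round(pct_change(780, 1499), 1))     # 181.mcf, gcc (table: -48.0)
    print(round(geometric_mean(GCC_32_CINT)))  # SPECint_base2000, gcc (table: 1231)
```

Because the total is a geometric mean, a single test that loses half its score, as 181.mcf does, pulls the total down by about 6% on its own (the twelfth root of 0.5 is roughly 0.94).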

You can see from the CINT2000 tests that the total results would have been much better if not for the roughly 50% slump in 181.mcf. We know from previous tests that this benchmark depends heavily on the speed of memory operations, and something evidently goes wrong for the 64-bit code. Perhaps its working set no longer fits in the cache, or peculiarities of 64-bit mode prevent efficient operation. This assumption is also supported by the 181.mcf results for dual-processor configurations.

Note that the switch to 64 bits in CINT2000 tests looks the most advantageous for the non-commercial gcc compiler. It's also not bad in CFP2000.

As expected, the Intel compiler delivers the highest results on this company's processors. The good news: it shows no serious drops in code execution speed in CFP2000 when switching to 64 bits. The situation in CINT2000 is worse, though it still has the best total score there. In the floating-point tests, it lost four benchmarks to the products from Portland Group and Pathscale. Its defeat in 171.swim looks strange, because this test depends heavily on the speed of memory operations, and Intel should have been the best in this respect.

PGI still shows low results in several CINT2000 tests, which rules it out of the competition for the total score. It is no champion in the other integer tests either and cannot even catch up with gcc. It does better in floating-point arithmetic: overall this compiler is on a par with Pathscale, and it even beats Intel in four tests.

Pathscale EKO Compiler Suite, though quite new, competes well with such classic compilers as Intel and PGI. It stands between Intel and PGI in CINT2000, loses 8 out of 14 CFP2000 tests to PGI, and beats Intel in two of them. Though it's intended for 64-bit platforms, five tests in the peak configuration use the -m32 flag (including the ill-fated 181.mcf), which means that not all tasks benefit from 64-bit mode so far. By the way, ACML provides an almost 20% gain in the galgel test.

As for switching to 64 bits on the Intel platform in general, it probably makes sense for CFP2000-like tasks. The gain is not large, but the 14% and 10% that the Intel and Portland Group compilers gain in SPECfp_base2000 may matter to some users. Especially as we lose nothing, thanks to complete compatibility with 32-bit code (according to our tests, 32-bit code runs under a 64-bit OS at practically the same speed as under the native system).

Windows XP Pro x64 Edition

We also managed to test a couple of compilers under the recently released 64-bit version of the Microsoft OS. We used the April PSDK (3790.1830) for the 64-bit libraries. Regular Windows XP Pro SP2 served as the 32-bit counterpart.

We used the following compilers:

  • Intel Compiler 8.1 (C 027, Fortran 030)
  • Compiler from Microsoft Visual Studio 2003 (13.10.3077)
  • Intel Compiler 8.1e for EM64T (C 018, Fortran 017)
  • 64-bit compiler from Microsoft PSDK (14.00.40310.41)

It should be noted that Portland Group also provided a beta version of its compiler for x64 Windows. But it is Fortran only (so far), and we didn't manage to get any results quickly, so we'll have to wait for the release.

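| Test | Windows 32 bit ic8.1 | Windows 32 bit msvc | Windows 64 bit ic8.1 | Windows 64 bit msvc | 64 vs. 32 (%) ic8.1 | 64 vs. 32 (%) msvc |
|---|---|---|---|---|---|---|
| 164.gzip | 1209 | 972 | 1213 | 1057 | 0.3 | 8.7 |
| 175.vpr | 1248 | 1068 | 1233 | 1094 | -1.2 | 2.4 |
| 176.gcc | 2044 | - | 1921 | 1349 | -6.0 | - |
| 181.mcf | 2096 | 1574 | 2057 | 1000 | -1.9 | -36.5 |
| 186.crafty | 1352 | 1149 | 1589 | 1417 | 17.5 | 23.3 |
| 197.parser | 1498 | 1119 | 1411 | 786 | -5.8 | -29.8 |
| 252.eon | 2265 | 1154 | 2487 | 1485 | 9.8 | 28.7 |
| 253.perlbmk | 1898 | 1507 | 2069 | 1498 | 9.0 | -0.6 |
| 254.gap | 1942 | 1691 | 1915 | 1660 | -1.4 | -1.8 |
| 255.vortex | 2913 | 1719 | 2881 | 1580 | -1.1 | -8.1 |
| 256.bzip2 | 1320 | 1152 | 1408 | 1209 | 6.7 | 5.0 |
| 300.twolf | 1791 | 1418 | 1791 | 1448 | 0.0 | 2.1 |
| SPECint_base2000 | 1737 | - | 1770 | 1271 | 1.9 | - |
| 168.wupwise | 2798 | - | 3081 | - | 10.1 | - |
| 171.swim | 2569 | - | 2534 | - | -1.4 | - |
| 172.mgrid | 1621 | - | 1862 | - | 14.9 | - |
| 173.applu | 1596 | - | 1672 | - | 4.8 | - |
| 177.mesa | 1588 | 869 | 2095 | 1535 | 31.9 | 76.7 |
| 178.galgel | 3661 | - | 3662 | - | 0.0 | - |
| 179.art | 4501 | 2086 | 6118 | 1969 | 35.9 | -5.6 |
| 183.equake | 2120 | 1937 | 2441 | 1785 | 15.1 | -7.9 |
| 187.facerec | 2017 | - | 2180 | - | 8.1 | - |
| 188.ammp | 1352 | 1143 | 1298 | 903 | -4.0 | -21.0 |
| 189.lucas | 2278 | - | 2235 | - | -1.9 | - |
| 191.fma3d | 1576 | - | 1721 | - | 9.2 | - |
| 200.sixtrack | 651 | - | 642 | - | -1.4 | - |
| 301.apsi | 1358 | - | 1350 | - | -0.6 | - |
| SPECfp_base2000 | 1915 | - | 2069 | - | 8.0 | - |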

It would be wrong to compare these results with our previous tests under Windows, because we used an AMD processor then. On the whole, the Intel compiler's transition to 64 bits can be described as "it has become a tad better", while Microsoft's results depend heavily on the application: compared with Intel, its swings are noticeably larger in both directions.
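The difference in swings can be put into numbers with a small sketch. The lists hold the 64-bit vs. 32-bit changes (%) for the CINT2000 tests from the Windows results; the MSVC list has no entry for 176.gcc because there is no 32-bit result to compare against. The function name spread is our own.

```python
# 64-bit vs. 32-bit change (%) per CINT2000 test, copied from the Windows results.
IC_DELTA = [0.3, -1.2, -6.0, -1.9, 17.5, -5.8, 9.8, 9.0, -1.4, -1.1, 6.7, 0.0]
MSVC_DELTA = [8.7, 2.4, -36.5, 23.3, -29.8, 28.7, -0.6, -1.8, -8.1, 5.0, 2.1]


def spread(deltas):
    """Return (worst loss, best gain, distance between them) in percentage points."""
    worst, best = min(deltas), max(deltas)
    return worst, best, best - worst


if __name__ == "__main__":
    for name, deltas in (("ic8.1", IC_DELTA), ("msvc", MSVC_DELTA)):
        worst, best, width = spread(deltas)
        print(f"{name}: from {worst:+.1f}% to {best:+.1f}% (width {width:.1f})")
```

With these inputs the Intel compiler stays between -6.0% and +17.5%, while MSVC ranges from -36.5% to +28.7%.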

One interesting point: with the Intel compiler, 181.mcf does not slump under 64-bit Windows the way it does under Linux. Perhaps it's the effect of a different data model (pointer size and int/long).
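The int/long remark refers to the C data models: 32-bit systems use ILP32, 64-bit Linux uses LP64 (long and pointers are 64 bits wide), and 64-bit Windows uses LLP64 (long stays 32 bits, pointers are 64 bits wide). The sketch below shows how the size of a record changes between them. The field list is purely hypothetical and is not taken from the 181.mcf sources, and natural alignment (alignment equal to size) is assumed.

```python
# Sizes in bytes of the basic C types under each data model.
DATA_MODELS = {
    "ILP32": {"int": 4, "long": 4, "ptr": 4},  # 32-bit Linux and Windows
    "LP64":  {"int": 4, "long": 8, "ptr": 8},  # 64-bit Linux
    "LLP64": {"int": 4, "long": 4, "ptr": 8},  # 64-bit Windows
}

# Hypothetical graph-node record: a counter, a flag, three links, a cost.
NODE_FIELDS = ["long", "int", "ptr", "ptr", "ptr", "long"]


def struct_size(field_types, model):
    """Size of a C struct with naturally aligned fields under a data model."""
    sizes = DATA_MODELS[model]
    offset = 0
    max_align = 1
    for field in field_types:
        size = sizes[field]
        offset = (offset + size - 1) // size * size  # pad up to the field alignment
        offset += size
        max_align = max(max_align, size)
    # Pad the tail so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


if __name__ == "__main__":
    for model in DATA_MODELS:
        print(model, struct_size(NODE_FIELDS, model), "bytes")
```

For this made-up record the size doubles from 24 to 48 bytes under LP64 but grows only to 40 bytes under LLP64, so the same cache holds fewer records in either 64-bit model. Whether this accounts for the measured difference is only a guess.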

As for the Linux vs. Windows comparison, the results of the Intel compilers are close, but performance under Windows is still a tad higher, especially for the 32-bit versions.

Bottom line

The development of 64-bit platforms takes its normal course. The launch of processors with EM64T technology has "suddenly" shown that all projects developed for AMD64 processors work well on their twin CPUs. The release of Windows x64 should accelerate this process, especially as the software support is already available at a decent level.

Under Linux, programmers can use the standard gcc compiler, commercial Fortran projects aside. But in the performance race you cannot do without commercial products, and Intel's compiler is an obvious and justified choice for this company's processors. New compiler versions from Portland Group and Pathscale may well compete with it in performance, but they are probably intended to run as part of high-performance clusters, and unfortunately SPEC CPU2000 cannot measure that. So when you choose a compiler for "large-scale" systems, you should take into account not only its performance results, but also its support for modern technologies, programming interfaces and standards.

Test results under Windows are contradictory. It's very difficult to forecast how real programs will run under the new system. On the one hand, the situation with integer applications is not very bad (though in the case of MSVC that is just luck). On the other hand, this is not reason enough for a total upgrade of your hardware. CAD and similar complex applications will use Intel's compiler and be happy. But it's hard to imagine a game written in Fortran and compiled with IC. Labour-intensive code fragments will most likely be written in 64-bit assembler (especially as the "right" software features all necessary code snippets in asm), and the rest will be left to MSVC.


We express our gratitude to Novell
for the provided distribution disc of SuSE Linux


Kirill Kochetkov
(kochet@ixbt.com)
June 14, 2005.

Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.