We have already evaluated a 64-bit Intel processor in SPEC CPU2000, but that was the server Xeon. Interesting as it is to many users, it is aimed at a different market.
The launch of the Intel Pentium 4 600 series prompted many people to ask: "Whose 64 bits are better?" Unfortunately, such seemingly simple questions are not always easy to answer. Much depends on parameters that stay out of the picture: the operating system, the compiler, the overall system architecture, and of course the price.
To my mind, a more precise formulation would be: "Performance of a system based on hardware (H) under operating system (O) with compiler (C) on task (T)". Formally, of course, the advantages of the 64-bit architecture are the classic "lots of memory for a single process" and a larger set of wider registers. But SPEC CPU2000 does not care about the former, and the benefit of the latter depends heavily on the compiler.
In this article we analyze the performance of the Intel Pentium 4 660 processor under Linux and Windows. Linux has been a 64-bit system for a long time and is often used for heavy computing tasks. Windows XP x64 is included as a way to estimate what we can expect from 64 bits under this OS. To put it mildly, there are no dedicated applications for it yet (only a special update of FarCry, and that is of no help here).
Intel Pentium 4 660
We have already published a detailed review of this model with popular applications under a 32-bit OS. Our conclusions in brief: the Pentium 4 660 is on a par with the Athlon 64 FX-55 as far as the generalized term "performance" is concerned.
Testing this processor turned out to be an unexpectedly memorable experience, so I'd like to share my impressions with our readers.
The first impression: it's really hot… Much has been written about this problem already, and strictly speaking it is a problem of the cooling system, but the stock cooler simply could not cope with this processor. Interestingly, I noticed no outward signs of trouble. But TM2 (Thermal Monitor 2) kicked in and the test results were much lower: performance losses reached 18%. The only way to learn whether the CPU is throttling is to use special utilities like RMClock. So, a piece of advice to all concerned: pay close attention to overheating.
For the next tests we replaced the Intel cooler with the Foxconn CMI-775-1N (which looked exactly the same) and ran all the tests with RMClock running alongside. This cooler noticeably improved the situation: the CPU no longer overheated, but the additional 80- and 120-mm fans were noisy.
After some research we settled on the Zalman 7700Cu (recommended by many authors). We also changed the motherboard from the Albatron PX925XE Pro-R to the ABIT Fatal1ty AA8XE, and replaced the Corsair XMS2 memory with the more advanced XMS2 Pro.
And there was peace for some time. However, fan headers on the new motherboard were placed very inconveniently, so I decided to rely on my Zalman and didn't install additional fans. That was a mistake. When I compiled tests with three different compilers simultaneously, the system informed me that it was hot and that it would rather turn off…
Note that the results were noticeably lower with this hardware configuration. I tried to fix the problem in the BIOS, but in vain. In the end I went back to the Albatron board but kept the new memory. Despite the formally matching timings, two tests showed that the new memory was 5-6% slower. As our objective in this article is not to break any records, we decided to stick with this configuration. But it left a bitter aftertaste.
As always, the latest release of the popular Novell distribution to reach our test lab makes a nice impression: a convenient installation procedure, a good software bundle, kernel 2.6, and installers for two architectures (IA32 and AMD64/EM64T) on the same disc (it is a dual-layer DVD now, so a backup copy will be much more expensive). There is no point in dwelling on EM64T compatibility: SuSE has supported the 64-bit Intel architecture since April 2004. All tested compilers also work well with this operating system. Note that the manufacturer currently offers version 9.3 of this product.
As for the shutdown message mentioned above, the explanation turned out to be simple: the OS follows the standards and supports modern ACPI features. So when the temperature exceeded the 85 degrees specified in the BIOS, the system simply powered off (having written the reason to the log).
Probably the only thing that didn't work "out of the box" was temperature and fan monitoring, but that may be down to the BIOS, and we did not plan to configure additional monitoring packages.
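For readers who want a quick look at the temperatures anyway, here is a minimal sketch of such a check. It assumes a present-day Linux kernel that exposes thermal zones under /sys/class/thermal in millidegrees Celsius, which is an assumption about current systems rather than the configuration tested here; the function names are ours, and the 85-degree threshold is simply the BIOS value quoted above. It only reports readings and is not the mechanism the OS itself uses to power off.

```python
from pathlib import Path

TRIP_CELSIUS = 85.0  # threshold quoted in the article; normally configured in the BIOS


def parse_millidegrees(raw):
    """Convert a sysfs-style reading such as '61000\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def exceeds_trip(temp_celsius, trip_celsius=TRIP_CELSIUS):
    """True when the reading is strictly above the trip point."""
    return temp_celsius > trip_celsius


def read_zone_temperatures(root="/sys/class/thermal"):
    """Return {zone name: degrees Celsius} for every readable thermal zone.

    The sysfs location is an assumption; a machine without it yields {}.
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    readings = {}
    for temp_file in sorted(base.glob("thermal_zone*/temp")):
        try:
            readings[temp_file.parent.name] = parse_millidegrees(temp_file.read_text())
        except (OSError, ValueError):
            continue  # the zone exists but cannot be read right now
    return readings


for zone, temp in read_zone_temperatures().items():
    status = "ABOVE trip point" if exceeds_trip(temp) else "ok"
    print(f"{zone}: {temp:.1f} C ({status})")
```

On a board whose BIOS reports nothing, the loop simply prints nothing.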
So, the final tests were carried out under the following configuration:
Cooling: Zalman 7700Cu in tandem with 80- and 120-mm fans (operating at reduced rpm). As we already know, SPEC CPU2000 results do not depend on the video card or the hard drive, but for the record we used an ATI Radeon X600 and a Seagate Barracuda V SATA. Power supply: a 460W unit from FSP.
As our tests take several days to run and developers update their compilers very often, we decided to settle on a single set of versions and leave new releases for the next articles (especially as we know from experience that code performance does not change much between minor releases).
So, we came up with the following set of well-known products:
All these compilers have long been familiar to our regular readers, so we shall not dwell on their descriptions.
gcc/g77 is the classic compiler on Linux systems. It goes without saying that it supports AMD64/EM64T. For its developers, compatibility comes first and efficiency second. The package does not include a Fortran 90 compiler, so it cannot produce complete results in the floating-point tests.
The Portland Group product is rather popular, in particular due to its OpenMP and MPI support. It was one of the first commercial compilers with AMD64 support. The latest available version is 6.0-2. It features PGO support, but PGO slows the code down in SPEC CPU2000 tests. Research is in full swing, and I hope this problem will be solved in the next versions. For now, we'll keep using Build 5.2.
Pathscale is a relatively new product, designed from the start for 64-bit computing on AMD64/EM64T architectures. It is developing rapidly (it reached version 2.1 within a year). However, some versions have minor problems, such as unimplemented library functions. We shall publish its base results as well as its peak results out of competition, because this product has no 32-bit version. Note that we modified the base configuration file (included in the package) for peak runs: it now uses ACML 2.6.0 (well… funny… ACML for an Intel processor… let's see what happens), and 3DNow! is disabled in two tests.
Intel's compilers have earned their title of the best products for processors from "the author of Pentium". Indeed, who knows all the ins and outs of the internal architecture, and can develop an effective compiler for it, better than its author? Note that it is currently the only commercial compiler that supports SSE3 on the Prescott core.
By tradition we do not try to squeeze out maximum performance: we use base metrics, and identical optimization switches for all tests where possible. The subtleties mostly concern settings for porting the code to different operating systems. We can e-mail the configuration files to all interested readers. Here are the main optimization switches that we used:
Attention: some of the results should again be taken as "estimated" (in terms of SPEC), because they are obtained on beta and in-house compiler versions. However, they probably won't be very different from the official results (at least I haven't recently come across such situations).
You can see from the CINT2000 tests that the total results would have been much better if not for the significant 50% slump in 181.mcf. We know from previous tests that this test depends heavily on memory speed, and something evidently goes wrong for the 64-bit code. Perhaps its working set no longer fits into the cache, or peculiarities of the 64-bit mode prevent efficient operation. This assumption is also supported by the 181.mcf results for dual-processor configurations.
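To see how much a single test can drag the total down, recall that SPEC CPU2000 totals are geometric means of per-test ratios (reference time divided by measured time, times 100). The sketch below uses made-up times, not our measurements, and assumes that the 50% slump means one ratio out of the twelve CINT2000 components is halved; the function names are ours, for illustration only.

```python
import math


def spec_ratio(reference_seconds, measured_seconds):
    """SPEC CPU2000-style ratio: reference time over measured time, scaled by 100."""
    return 100.0 * reference_seconds / measured_seconds


def geometric_mean(values):
    """Geometric mean, computed through logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical run: all 12 CINT2000 components get the same ratio (not measured data).
baseline = [spec_ratio(1500.0, 100.0)] * 12

# The same run where one component (think of 181.mcf) takes twice as long,
# which halves its ratio.
with_slump = baseline[:-1] + [spec_ratio(1500.0, 200.0)]

total_before = geometric_mean(baseline)
total_after = geometric_mean(with_slump)
loss_percent = 100.0 * (1.0 - total_after / total_before)

print(f"total before: {total_before:.1f}")
print(f"total after:  {total_after:.1f}")
print(f"loss in total score: {loss_percent:.1f}%")
```

Under these assumptions, halving one result out of twelve costs roughly 5.6% of the total, whatever the absolute values of the other ratios.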
Note that the switch to 64 bits in CINT2000 tests looks the most advantageous for the non-commercial gcc compiler. It's also not bad in CFP2000.
As expected, the Intel product delivers the highest results on this company's processors. The good news is that it shows no serious drops in code execution speed in CFP2000. The situation in CINT2000 is worse, yet it still has the best total score in these tests. As for floating-point arithmetic, it lost four such tests to products from Portland Group and Pathscale. Its defeat in 171.swim looks strange, because this test depends heavily on memory speed, and Intel should have been the best in this respect.
PGI still shows low results in several CINT2000 tests, which rules it out of the competition for the total score. It is no champion in the other tests either: it cannot even catch up with gcc. It does better at floating-point arithmetic, where it is generally on a par with Pathscale and even beats Intel in four tests.
Pathscale EKO Compiler Suite, though quite new, competes well with such classic compilers as Intel and PGI. It stands between Intel and PGI in CINT2000, loses 8 out of 14 CFP2000 tests to PGI, and beats Intel in two of them. Though it is intended for 64-bit platforms, five tests in the peak configuration use the -m32 switch (including the ill-fated 181.mcf), which means that not all tasks benefit from the 64-bit mode so far. By the way, ACML provides an almost 20% gain in the galgel test.
As for switching to 64 bits on the Intel platform in general, it probably makes sense for tasks like those in CFP2000. The gain is not that large, but the 14% and 10% improvements in SPECfp_base2000 for the Intel and Portland Group compilers may matter to some users. Especially as we lose nothing, thanks to complete compatibility with 32-bit code (according to our tests, the speed of 32-bit code under a 64-bit OS is practically the same as under the native 32-bit system).
Windows XP Pro x64 Edition
We also managed to test a couple of compilers under the recently released 64-bit version of the Microsoft OS. We used the April PSDK (3790.1830) for 64-bit libraries. Regular Windows XP Pro SP2 served as the 32-bit baseline.
We used the following compilers:
It should be noted that Portland Group also provided a beta version of its compiler for x64 Windows. But it is Fortran only (so far), and we could not get any results quickly, so we'll have to wait for the release.
It would be wrong to compare these results with our previous tests under Windows, because we used an AMD processor back then. On the whole, the Intel compiler's transition to 64 bits can be described as "it has become a tad better", while Microsoft's results depend heavily on the application: compared with Intel, its fluctuations are noticeably larger in both directions.
One interesting point: unlike under Linux, the 181.mcf results do not drop in 64-bit mode. Perhaps this is the effect of a different data model (pointer size and int/long sizes).
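The int/long remark can be illustrated with a rough sketch. 64-bit Linux uses the LP64 data model (long and pointers are 64 bits wide), while 64-bit Windows uses LLP64 (pointers are 64 bits, long stays 32 bits). The structure below is a hypothetical graph node, not the actual 181.mcf layout, and pointers are emulated with fixed-width integers so that the script prints the same numbers on any host.

```python
import ctypes


def node_size(long_type, pointer_type):
    """Size in bytes of a hypothetical node with two 'long' fields and two pointers."""

    class Node(ctypes.Structure):
        _fields_ = [
            ("cost", long_type),       # 'long' in the emulated data model
            ("flow", long_type),
            ("next", pointer_type),    # pointer emulated by an integer of pointer width
            ("parent", pointer_type),
        ]

    return ctypes.sizeof(Node)


# (type standing in for 'long', type standing in for a pointer) per data model
models = {
    "ILP32 (32-bit Linux and Windows)": (ctypes.c_int32, ctypes.c_uint32),
    "LP64 (64-bit Linux)": (ctypes.c_int64, ctypes.c_uint64),
    "LLP64 (64-bit Windows)": (ctypes.c_int32, ctypes.c_uint64),
}

sizes = {name: node_size(*types) for name, types in models.items()}
for name, size in sizes.items():
    print(f"{name}: {size} bytes per node")
```

Under these assumptions the node grows from 16 to 32 bytes under LP64 but only to 24 bytes under LLP64, so the same cache holds more nodes in the Windows build. Whether this fully explains the 181.mcf behaviour remains a guess.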
As for the Linux vs. Windows comparison, the results of the Intel compilers are close, but performance under Windows is still a tad higher, especially if we compare the 32-bit versions.
The development of 64-bit platforms takes its normal course. The launch of processors with EM64T technology has "suddenly" shown that all projects developed for AMD64 processors work well on their twin CPUs. The release of Windows x64 should accelerate this process, especially as the software support is already available at a decent level.
Under Linux, programmers can use the standard gcc compiler, unless they deal with commercial projects in Fortran. But in the performance race you cannot do without commercial products, and Intel's compiler is an obvious and justified choice for this company's processors. New compiler versions from Portland Group and Pathscale may well compete with it in performance, but they are probably intended to run as part of high-performance clusters, which SPEC CPU2000 unfortunately cannot measure. So when you choose a compiler for "large-scale" systems, take into account not only its performance results, but also its support for modern technologies, programming interfaces and standards.
Test results under Windows are contradictory, and it is very difficult to forecast how real programs will run under the new system. On the one hand, the situation with integer applications is not very bad (though in the case of MSVC it is just luck). On the other hand, this is not reason enough for a total hardware upgrade. CAD and similar complex applications will use Intel's compiler and be happy. But it is hard to imagine a game written in Fortran and compiled with the Intel compiler. Compute-intensive code fragments will most likely be written in 64-bit assembler (especially as the "right" software already features all the necessary asm snippets), and the rest will be left to MSVC.
We express our gratitude to Novell
for the provided distribution disc of SuSE Linux
Kirill Kochetkov (firstname.lastname@example.org)
June 14, 2005.