The appearance of 64-bit AMD64/EM64T processors from AMD and Intel is undoubtedly one of the most interesting events in the CPU market of the last five years. Of course, a mass migration to the new platforms will not happen in the near future, especially given Microsoft's deliberate pace. Fortunately, the new processors are fully compatible with the IA32 software we use today.
The same cannot be said about IA64 (the 64-bit architecture from Intel), announced in 2001, and its flagship product, the Itanium processor. We are not going to recount the history of this series, its debut and development. Let's only note that IA64 was based on the EPIC architecture and, as often happens with new products, it was promised a brilliant future: marketing materials brimmed with cliches such as "high availability, scalability and performance needed for high-end enterprise and technical computing applications". Over four years this processor grew from a 733 MHz core with 4 MB of L3 cache and a 2.1 GB/sec FSB to 1.6 GHz / 9 MB / 6.4 GB/sec. Still, we can hardly say that every "high-end enterprise" has already bought a pile of Itaniums. Moreover, with the arrival of AMD64/EM64T products, columnists increasingly rendered the name of this processor as "Itanic".
Supporting two obviously competing architectures will certainly be very difficult. However, Intel is a huge corporation, and I think it will succeed :), especially if its products are professionally distributed among the markets.
Speaking of markets: these days customers are spoilt by the universal nature of IA32/x86. This architecture works in AMD Geode-based thin clients as well as in 8-processor servers built on Intel Xeon MP. The efficiency of this architecture is quite another matter :). But the fact remains: IA32 is the most popular architecture today, and it is practically impossible to introduce significant changes into it.
But should we really change it?
Of course, when we speak of a John Doe who loves to play games on his Pentium in the evening and couldn't care less about L1 cache associativity or the compiler used by game developers, what matters is that all the software on his 400 GB hard disk keeps working. But once we leave the entertainment level and look into the high-end enterprise sphere, the situation changes considerably. Efficient solution of predefined tasks becomes paramount here.
In this situation a company doesn't care how many processors, or which ones, are installed in a new server, what OS runs on it, or who developed the database. What really matters is the combination of all these factors: a ready solution for a given problem. And of course the cost (of the solution!), reliability, support and other "grown-up" considerations are also taken into account.
That is all very well if a company is large and wealthy enough to try one solution after another and then make a choice based on maximally precise and detailed information.
Other companies have to rely on various resources, for example articles in magazines and online media, or test results from manufacturers. You shouldn't pay attention only to the conclusions in such articles, because the notion of performance they use, though often objective, is meaningless without further qualifications: cost, reliability and deployment time in particular.
Of course, there are purely scientific measures in this field, for example "megaflops", but their beauty and simplicity are probably more interesting to those who deal with science (well known as "the best modern state-paid way to satisfy the curiosity of a few persons" (C) :)).
Everyone else has to weigh many other factors, both objective (e.g. cost) and subjective (prospects), followed by their endless combinations :) (choosing a provider based on objective costs and subjective reliability, say).
Of course, for most home PC users everything is usually very simple: they define the sum of money they may/must/can spend on a new device and then choose the fastest option within it (everyone has his or her own definition of "fast", though).
Performance evaluation of large-scale solutions has two peculiarities. Firstly, to find the optimum one should try to connect performance with profits, which is not always easy. Secondly, one can use the notion of sufficient performance, a notion overclockers are allergic to. For example, users of a video surveillance system do not need to know what resources are used to digitize data from 16 video cameras; their main concern is that the system works. More general problems (for example, "a file server for 1000 clients") will require several models.
One of the modeling options is implemented in the SPEC tests. Here we are interested in the scientific field. We have already written a lot of articles on this subject :), so we are not going to repeat ourselves, but will instead try to evaluate how attractive IA64 is for computation, based on SPEC CPU2000 results.
The following system was used in the tests:
It's certainly no 16-processor Bull NovaScale 6320 HPC, but in our experience this system will do fine for evaluating computational performance and making forecasts.
The configuration file for IA64 was based on the file provided by Dell for www.spec.org.
First, the aggregate results of the system in IA64 mode:
Is that good or bad? In integer calculations, Itanium 2 is on a par with a 2.8 GHz Pentium 4 (setting aside platform, compiler versions, etc.), but in floating point it outscores even the Pentium 4 570J (3.8 GHz).
A more detailed analysis would be of little use because of the differences between the architectures. We can only note that on individual applications the spread may reach ±40% for CINT2000, and even more for CFP2000: -40% to +90% (179.art, of course, caused a stir thanks to the 3 MB cache and gained 217%; with it excluded from the aggregate, the result of our Itanium 2 is merely on a par with the 3.8 GHz P4).
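A note on why one runaway benchmark like 179.art distorts the aggregate far less than its own 217% gain suggests: SPEC composites are geometric means of per-benchmark ratios. A minimal sketch (the ratios below are invented for illustration, not taken from our runs):

```python
from math import prod

def spec_composite(ratios):
    # SPEC-style aggregate: the geometric mean of per-benchmark ratios
    return prod(ratios) ** (1.0 / len(ratios))

# Hypothetical per-benchmark ratios; the appended value is a
# cache-friendly outlier running about three times faster than its peers.
base = [10.0, 12.0, 9.0, 11.0]
print(round(spec_composite(base), 2))           # aggregate without the outlier
print(round(spec_composite(base + [30.0]), 2))  # one 3x outlier lifts it only ~23%
```

This is why excluding 179.art moves the FP aggregate noticeably, but nowhere near threefold.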
So one can say the processor under review showed itself to advantage in floating-point calculations. Considering that we tested not the fastest model, IA64 can reach even higher performance. The top result with Intel's compiler published at www.spec.org is 2712 points (SPECfp_base2000) on an Itanium 2 at 1.6 GHz with 9 MB of L3 cache (under RHAS; for some reason no one runs the tests under Windows 2003 AS). Thus it outscores all desktop processors, and if you need ultimate performance there is no alternative (this applies to all existing systems, by the way; note though that IBM POWER5 is only slightly behind, but it carries 36 MB of L3 cache :)). It is also the undisputed leader in multi-processor configurations.
The situation is a tad worse with integer calculations. In single-CPU mode, the fastest Itanium 2 is outscored by the current leader (AMD Athlon 64 FX-55) by approximately 10%. But in multi-processor configurations it takes its revenge and comes out the winner again.
A fly in the ointment is the high price of Itanium 2-based systems. While you can buy a dual-processor AMD Opteron system at a reasonable price, a workstation based on Itanium 2 is rather hard to find. Interestingly, HP has removed information about Itanium workstations from its web site (the servers, of course, remain), though you can still find "Itanium" in the META tags of the HTML code :). Even Intel itself seems not to believe in this positioning of the processor, because the list of partners published on the official web site turns out to be wrong: none of the 16 companies listed offers a workstation on Itanium 2. Only servers are available. As for workstations, it is Opteron and Xeon everywhere, occasionally Pentium 4 and PowerPC (and it was such a nice start...).
So, we have examined the native operating mode of Itanium 2. Now let's try to evaluate its capabilities for running 32-bit (IA32) applications. Of course, under normal conditions no one will actually use such expensive computers for 32-bit calculations, but it's still interesting to find out what we can expect from 32-bit software.
In practice, Itanium 2 can execute 32-bit code out of the box, including MMX and SSE instructions (that is, it is compatible with the Pentium III). Ordinary applications that we usually run under a usual 32-bit operating system work quite well (we didn't even try 16-bit Windows and MS-DOS... though it would have been fun to launch Norton Commander on an Itanium 2 :)).
In the second half of 2003, Intel announced a software layer for IA64 systems that raises the performance of 32-bit code execution: the IA-32 Execution Layer. Intel promised the performance level of a 1.5 GHz Xeon. IA32EL dynamically translates 32-bit code into native IA64 code. Besides the performance increase, support for the SSE2 instruction set was added (interestingly, even the CPUID instruction is emulated, so the Intel 8.x compiler takes the processor for a friendly one and enables Northwood optimizations). At present this software can be downloaded for free from the Microsoft web site for Microsoft operating systems, and it ships with many IA64 Linux distributions (according to Intel).
So, without IA32EL, the Itanium 2 results in SPEC CPU2000 for 32-bit code are contemptible: SPECint_base2000/SPECfp_base2000 = 306/170, which corresponds to a Pentium III at 700/450 MHz. Yes, emulating the most popular instruction set is hard... On the other hand, the clock rate of the processor under review is only two to four times as high as those, so it is not that bad after all. The heritage of the past costs dear.
But using IA32EL considerably raises the spirits: SPECint_base2000/SPECfp_base2000 = 569/530. That is at least something, approximately the level of a 1.7 GHz Pentium 4 or an Athlon XP 1600+/1700+. The company seems to have kept its promise to match the 1.5 GHz Xeon. So IA32EL is recommended for anyone who needs to run 32-bit applications on Itanium 2, even if it's just Far or pkzip :). The overall performance gain in IA32 applications with IA32EL amounts to 20-410% (a small part of which comes from the SSE2 support).
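The composite scores quoted above let us put the IA32EL gain in plain numbers (simple arithmetic on the figures from the text, nothing more):

```python
def gain_percent(with_el, without_el):
    # Percentage speedup from enabling IA32EL, given two composite scores
    return (with_el / without_el - 1.0) * 100.0

# Our measured composites: SPECint_base2000 306 -> 569, SPECfp_base2000 170 -> 530
print(round(gain_percent(569, 306)))  # ~86% on the integer composite
print(round(gain_percent(530, 170)))  # ~212% on the FP composite
```

Both figures sit comfortably inside the 20-410% per-application range quoted above.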
Our tests of the Intel Itanium 2 platform demonstrated that this processor deservedly occupies the leading performance positions in floating-point operations among all modern processors.
Unfortunately, we had no opportunity to test 128-processor systems, but the similarity of our results to the official data allows us to trust the figures published at www.spec.org for the SGI Altix 3700 and HP Integrity Superdome. However, we shouldn't forget that such high performance costs dear, like everything top-notch :).
The IA32 compatibility mode in this processor works well, so your favourite programs like an office suite and notepad will run (as will the third-party monitoring and control systems mandatory for servers). But if the software must do intensive calculations, it is better to port it to IA64. As a last resort, IA32EL can serve as a temporary solution to the performance problem.
Kirill Kochetkov (email@example.com)
February 15, 2005