iXBT Labs - Computer Hardware in Detail


SPEC CPU2000. Part 20. Intel C++/Fortran Compiler 9.0, Intel Pentium 4 670, Pentium M 770 and AMD Athlon 64 FX-57

September 5, 2005



In mid-June, Intel released version 9.0 of its C++ and Fortran compilers. The new version does not differ fundamentally from the previous version 8.1. Its main new features are the integration of the compilers for the IA-32, IA-64, and EM64T (x86-64) platforms into a single package and additional code optimization options for processors with Hyper-Threading and for multi-core processors, in particular Software-based Speculative Pre-Computation (SSP).

In this article we shall analyze how fast code compiled with the new version is, compared to the previous version, on top (or nearly top) single-core processors, both from Intel (Pentium 4 and Pentium M) and... from AMD (Athlon 64 FX-57, with some code adjustments, of course; see below).

We used the following compilers:

  • Intel(R) C++ Compiler for 32-bit applications, Version 9.0 Build 20050624Z Package ID: W_CC_C_9.0.020
  • Intel(R) Fortran Compiler for 32-bit applications, Version 9.0 Build 20050624Z Package ID: W_FC_C_9.0.019

As a reference, we used the test code compiled by Intel C++ Compiler 8.1.022 and Intel Fortran Compiler 8.1.025.

As usual, we used identical general compilation keys in all cases (Compilers 8.1 and 9.0, different code optimizations):

PASS1_CFLAGS= -Qipo -O3 -Qprof_gen
PASS2_CFLAGS= -Qipo -O3 -Qprof_use
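These two lines describe a two-pass, profile-guided build: the first pass (-Qprof_gen) produces an instrumented binary that records a profile when it runs, the second pass (-Qprof_use) recompiles using that profile, and -Qipo enables interprocedural optimization across source files. The Python sketch below shows how the flags for each tested variant might be assembled. It assumes that the processor-specific -Qx key is simply appended to both passes; the helper build_flags and the target descriptions are ours, not part of the SPEC or Intel tool chain (requires Python 3.9+ for the type hints).

```python
# Processor-specific keys tested in this article and what they target
QX_TARGETS = {
    "": "no processor-specific optimization",
    "-QxK": "SSE (Pentium III)",
    "-QxW": "SSE2 (Pentium 4 / Willamette)",
    "-QxN": "SSE2 (Pentium 4 / Northwood)",
    "-QxB": "SSE2 (Pentium M / Banias)",
    "-QxP": "SSE3 (Pentium 4 / Prescott)",
}

PASS1_CFLAGS = ["-Qipo", "-O3", "-Qprof_gen"]  # instrumented build that collects a profile
PASS2_CFLAGS = ["-Qipo", "-O3", "-Qprof_use"]  # final build that uses the profile


def build_flags(qx_key: str) -> tuple[list[str], list[str]]:
    """Return (pass 1, pass 2) flag lists for one variant (hypothetical helper)."""
    if qx_key not in QX_TARGETS:
        raise ValueError(f"unknown key: {qx_key!r}")
    extra = [qx_key] if qx_key else []
    # List concatenation creates new lists, so the shared defaults stay intact
    return PASS1_CFLAGS + extra, PASS2_CFLAGS + extra


for key, target in QX_TARGETS.items():
    p1, p2 = build_flags(key)
    print(f"{key or 'No Opt.':8} {target:32} {' '.join(p1)} / {' '.join(p2)}")
```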

Pentium 4 670

Let's start with the results of the "native" processor, the Pentium 4 670 (3.8 GHz) with the Prescott core, which supports all the necessary instruction sets and can execute code compiled with every processor-specific optimization key: -QxK, -QxW, -QxN, -QxB, and -QxP.


Each cell: ic8.1 / ic9.0 (relative change); x = sub-test failed; - = no result
Sub-test | No Opt. | -QxK | -QxW | -QxN | -QxB | -QxP
164.gzip | 1150 / 1130 (-1.7%) | 1253 / 1239 (-1.1%) | 1255 / 1248 (-0.6%) | 1265 / 1251 (-1.1%) | - / 1247 | 1267 / 1241 (-2.1%)
175.vpr | x / x | 1207 / 1201 (-0.5%) | 1290 / 1283 (-0.5%) | 1288 / 1272 (-1.2%) | - / 1255 | 1286 / 1270 (-1.2%)
176.gcc | x / x | 2142 / 2119 (-1.1%) | 2132 / 2122 (-0.5%) | 2146 / 2125 (-1.0%) | - / 2116 | 2155 / 2116 (-1.8%)
181.mcf | 1595 / 1594 (-0.1%) | 1599 / 1599 (0.0%) | 1598 / 1600 (0.1%) | 2125 / 2125 (0.0%) | - / 2113 | 2131 / 2115 (-0.8%)
186.crafty | 1251 / 1260 (0.7%) | 1272 / 1285 (1.0%) | 1371 / 1398 (2.0%) | 1375 / 1406 (2.3%) | - / 1387 | 1387 / 1389 (0.1%)
197.parser | 1553 / 1030 (-33.7%) | 1562 / 1031 (-34.0%) | 1562 / 1026 (-34.3%) | 1560 / 1025 (-34.3%) | - / 1019 | 1560 / 1031 (-33.9%)
252.eon | 1640 / 1762 (7.4%) | 1795 / 1836 (2.3%) | 2188 / 2153 (-1.6%) | 2391 / 2360 (-1.3%) | - / 2101 | 2359 / 2320 (-1.7%)
253.perlbmk | 1997 / 2021 (1.2%) | 1954 / 2015 (3.1%) | 1923 / 2012 (4.6%) | 1940 / 1991 (2.6%) | - / 2018 | 1947 / 2006 (3.0%)
254.gap | 2033 / 2110 (3.8%) | 1936 / 1990 (2.8%) | 2019 / 2035 (0.8%) | 2022 / 2061 (1.9%) | - / 2029 | 2032 / 2049 (0.8%)
255.vortex | 2876 / 2941 (2.3%) | 2871 / 2971 (3.5%) | 2869 / 2970 (3.5%) | 2854 / 2970 (4.1%) | - / 2852 | 2833 / 2948 (4.1%)
256.bzip2 | 1423 / 1428 (0.4%) | 1390 / 1399 (0.6%) | 1378 / 1372 (-0.4%) | 1360 / 1348 (-0.9%) | - / 1354 | 1372 / 1415 (3.1%)
300.twolf | 1867 / 1526 (-18.3%) | 1840 / 1880 (2.2%) | 1859 / 1898 (2.1%) | 1865 / 1910 (2.4%) | - / 1879 | 1869 / 1908 (2.1%)
SPECint_base2000 | 1682 / 1604 (-4.6%) | 1682 / 1642 (-2.4%) | 1734 / 1687 (-2.7%) | 1790 / 1739 (-2.8%) | - / 1708 | 1792 / 1739 (-3.0%)

168.wupwise | 1882 / 1843 (-2.1%) | 2031 / 2074 (2.1%) | 2235 / 2304 (3.1%) | 2198 / 1735 (-21.1%) | - / 1762 | 2860 / 2914 (1.9%)
171.swim | 2089 / 2088 (0.0%) | 2362 / 2544 (7.7%) | 2524 / 2596 (2.9%) | 2525 / 2595 (2.8%) | - / 2553 | 2526 / 2595 (2.7%)
172.mgrid | 1022 / 1023 (0.1%) | 1237 / 1216 (-1.7%) | 1518 / 1511 (-0.5%) | 1674 / 1661 (-0.8%) | - / 1306 | 1675 / 1661 (-0.8%)
173.applu | 1419 / 1438 (1.3%) | 1404 / 1414 (0.7%) | 1481 / 1472 (-0.6%) | 1655 / 1670 (0.9%) | - / 1555 | 1638 / 1691 (3.2%)
177.mesa | 1399 / 1371 (-2.0%) | 1496 / 1476 (-1.3%) | 1666 / 1669 (0.2%) | 1662 / 1668 (0.4%) | - / 1574 | 1659 / 1653 (-0.4%)
178.galgel | 1445 / 1440 (-0.3%) | 3036 / 3119 (2.7%) | 3581 / 3637 (1.6%) | 3564 / 3866 (8.5%) | - / 3626 | 3603 / 3889 (7.9%)
179.art | 2716 / 2356 (-13.3%) | 2370 / 2393 (1.0%) | 2918 / 2613 (-10.5%) | 2987 / 2655 (-11.1%) | - / 2524 | 4648 / 4597 (-1.1%)
183.equake | 2074 / 2105 (1.5%) | 2143 / 2118 (-1.2%) | 2155 / 2154 (0.0%) | 2158 / 2148 (-0.5%) | - / 2092 | 2156 / 2420 (12.2%)
187.facerec | 1736 / 1773 (2.1%) | 2035 / 2148 (5.6%) | 2049 / 2151 (5.0%) | 2037 / 2165 (6.3%) | - / 2114 | 2075 / 2179 (5.0%)
188.ammp | 1305 / 1226 (-6.1%) | 1240 / 1213 (-2.2%) | 1365 / 1345 (-1.5%) | 1371 / 1346 (-1.8%) | - / 1210 | 1369 / 1346 (-1.7%)
189.lucas | 2109 / 2101 (-0.4%) | 2007 / 2025 (0.9%) | 2285 / 2320 (1.5%) | 2279 / 2331 (2.3%) | - / 1984 | 2302 / 2306 (0.2%)
191.fma3d | 1316 / 1342 (2.0%) | 1291 / 1342 (4.0%) | 1600 / 1648 (3.0%) | 1581 / 1683 (6.5%) | - / 1371 | 1606 / 1646 (2.5%)
200.sixtrack | 604 / 606 (0.3%) | 597 / 605 (1.3%) | 678 / 754 (11.2%) | 679 / 746 (9.9%) | - / 621 | 683 / 748 (9.5%)
301.apsi | 1309 / 1277 (-2.4%) | 1317 / 1301 (-1.2%) | 1386 / 1370 (-1.2%) | 1408 / 1357 (-3.6%) | - / 1300 | 1410 / 1357 (-3.8%)
SPECfp_base2000 | 1511 / 1489 (-1.5%) | 1636 / 1657 (1.3%) | 1826 / 1842 (0.9%) | 1854 / 1845 (-0.5%) | - / 1690 | 1956 / 2007 (2.6%)

But first, the non-optimized variant. Note an important point: this code version, compiled by both the previous and the new compiler versions, caused errors in the 175.vpr and 176.gcc sub-tests, regardless of the processor type. That is why we ran the tests with the --noreportable key and ignored the errors in these sub-tests (--ignore_errors). Integer tests: the new version is faster in some sub-tests (252.eon, 253.perlbmk, 254.gap, 255.vortex), but this cannot compensate for the significant performance drop in 197.parser (about 34%!) and 300.twolf (18.3%). As a result, the total SPECint_base2000 score is 1604, which is 4.6% lower than with Version 8.1 (1682). In floating point tests the new version shows only a minor advantage in some sub-tests, while others slow down noticeably (by 13.3% in 179.art). As a result, the total SPECfp_base2000 score (1489) is 1.5% lower than with the previous version (1511).
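The percentages in the tables are plain relative changes, and SPEC CPU2000 computes each integral score as the geometric mean of the sub-test ratios. The Python sketch below (helper names are ours) recomputes a few Pentium 4 numbers from the table. Under the assumption that the failed sub-tests were simply left out of the mean, the non-optimized SPECint_base2000 figure comes out as the geometric mean of the ten sub-tests that ran, so it is not strictly comparable with the twelve-test scores of the optimized variants.

```python
import math


def rel_change(old: float, new: float) -> float:
    """Relative change of ic9.0 versus ic8.1 in percent, as in the parentheses."""
    return (new - old) / old * 100.0


def geomean(values):
    """Geometric mean, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Pentium 4 670, ic9.0, -QxK: all twelve SPECint2000 ratios from the table
qxk_int_ic90 = [1239, 1201, 2119, 1599, 1285, 1031, 1836, 2015, 1990, 2971, 1399, 1880]

# Pentium 4 670, ic8.1, no optimization: 175.vpr and 176.gcc failed,
# so only ten ratios are available
noopt_int_ic81 = [1150, 1595, 1251, 1553, 1640, 1997, 2033, 2876, 1423, 1867]

print(f"{rel_change(1682, 1604):.1f}%")  # prints -4.6%
print(round(geomean(qxk_int_ic90)))      # compare with 1642 in the table
print(round(geomean(noopt_int_ic81)))    # compare with 1682 in the table
```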

Next comes the optimization variant that uses SSE instructions (-QxK). The situation in integer tests is similar: an insignificant advantage of the new version in some sub-tests and a 1.5-fold performance drop in 197.parser. This time, however, 300.twolf runs faster with the new version (by 2.2%). The integral score is approximately 2.5% lower than with Version 8.1. The situation in floating point tests is different: most sub-tests run faster with Version 9.0, with the largest gains in 171.swim (7.7%) and 187.facerec (5.6%). The integral SPECfp_base2000 score is 1.3% higher than with the previous version.

As for the other code optimization variants (-QxW, -QxN, and -QxP), the situation in integer tests is similar to the -QxK variant: the 1.5-fold performance drop in 197.parser remains, resulting in a lower integral SPECint_base2000 score. In floating point tests these variants differ, both in the integral score and in individual sub-tests. For example, SSE2/Willamette (-QxW) shows a noticeable gain in 200.sixtrack (11.2%) and 187.facerec (5.0%) together with a significant drop in 179.art (-10.5%); the new version wins just 0.9% in SPECfp_base2000. SSE2/Northwood (-QxN), on the contrary, loses to the previous version in the total score (by 0.5%) because of significant drops in 168.wupwise (-21.1%) and 179.art (-11.1%), which the gains in several sub-tests (178.galgel, 187.facerec, 191.fma3d, and 200.sixtrack) do not offset. And finally, the native variant for Prescott SSE3 (-QxP) wins 2.6% in the total score thanks to gains in 178.galgel (7.9%), 183.equake (12.2%), 187.facerec (5.0%), and 200.sixtrack (9.5%), while a few other sub-tests slow down only slightly (by 3.8% at most, in 301.apsi).

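Judging by the integral scores, absolute performance on the whole grows in the row -QxK < -QxB < -QxW < -QxN < -QxP, which is reasonable for the Prescott core. Strictly speaking, this order holds for SPECfp_base2000; in SPECint_base2000 -QxB is slightly ahead of -QxW, and -QxN and -QxP are tied.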
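The ranking can be checked against the integral ic9.0 scores of the Pentium 4 table with a few lines of Python. The dictionaries simply copy the table's numbers, and the helper name ranking is ours.

```python
# Integral ic9.0 scores of the Pentium 4 670, copied from the table above
int_scores = {"-QxK": 1642, "-QxW": 1687, "-QxN": 1739, "-QxB": 1708, "-QxP": 1739}
fp_scores = {"-QxK": 1657, "-QxW": 1842, "-QxN": 1845, "-QxB": 1690, "-QxP": 2007}


def ranking(scores):
    """Order the keys from the lowest to the highest score (ties keep dict order)."""
    return " < ".join(sorted(scores, key=scores.get))


print("SPECint_base2000:", ranking(int_scores))  # -QxK < -QxW < -QxB < -QxN < -QxP
print("SPECfp_base2000: ", ranking(fp_scores))   # -QxK < -QxB < -QxW < -QxN < -QxP
```

Because Python's sort is stable, the tied -QxN and -QxP integer scores (1739 each) keep their dictionary order.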

Pentium M 770

We proceed to Intel's second "nearly flagship" processor, the Pentium M 770 with the Dothan core (2.13 GHz). It was tested in a desktop-mobile system: a DFI 855GME-MGF motherboard with the Intel 855GM chipset, which is not the fastest one, or, to be more exact, does not have the fastest memory system (single-channel DDR-333).


Each cell: ic8.1 / ic9.0 (relative change); x = sub-test failed; - = no result
Sub-test | No Opt. | -QxK | -QxW | -QxN | -QxB
164.gzip | 1143 / 1091 (-4.5%) | 1248 / 1245 (-0.2%) | 1236 / 1238 (0.2%) | 1247 / 1246 (-0.1%) | - / 1251
175.vpr | x / x | 1321 / 1316 (-0.4%) | 1367 / 1381 (1.0%) | 1364 / 1377 (1.0%) | - / 1361
176.gcc | x / x | 1822 / 1805 (-0.9%) | 1805 / 1803 (-0.1%) | 1825 / 1806 (-1.0%) | - / 1814
181.mcf | 1042 / 1059 (1.6%) | 1054 / 1052 (-0.2%) | 1051 / 1047 (-0.4%) | 1504 / 1507 (0.2%) | - / 1507
186.crafty | 1320 / 1303 (-1.3%) | 1312 / 1313 (0.1%) | 1455 / 1460 (0.3%) | 1455 / 1456 (0.1%) | - / 1631
197.parser | 1381 / 1004 (-27.3%) | 1392 / 1002 (-28.0%) | 1392 / 990 (-28.9%) | 1388 / 1008 (-27.4%) | - / 1001
252.eon | 1589 / 1736 (9.3%) | 1688 / 1668 (-1.2%) | 1922 / 1930 (0.4%) | 2096 / 2066 (-1.4%) | - / 2127
253.perlbmk | 1724 / 1716 (-0.5%) | 1736 / 1755 (1.1%) | 1750 / 1775 (1.4%) | 1752 / 1760 (0.5%) | - / 1811
254.gap | 1163 / 1282 (10.2%) | 1151 / 1168 (1.5%) | 1280 / 1302 (1.7%) | 1282 / 1298 (1.2%) | - / 1337
255.vortex | 2456 / 2484 (1.1%) | 2492 / 2497 (0.2%) | 2466 / 2492 (1.1%) | 2491 / 2488 (-0.1%) | - / 2482
256.bzip2 | 1225 / 1238 (1.1%) | 1156 / 1178 (1.9%) | 1196 / 1176 (-1.7%) | 1192 / 1178 (-1.2%) | - / 1205
300.twolf | 2102 / 1823 (-13.3%) | 2111 / 2149 (1.8%) | 2223 / 2252 (1.3%) | 2220 / 2256 (1.6%) | - / 2241
SPECint_base2000 | 1459 / 1416 (-2.9%) | 1489 / 1453 (-2.4%) | 1544 / 1507 (-2.4%) | 1605 / 1564 (-2.6%) | - / 1591

168.wupwise | 1249 / 1264 (1.2%) | 1327 / 1356 (2.2%) | 1133 / 1145 (1.1%) | 1149 / 1045 (-9.1%) | - / 1285
171.swim | 713 / 722 (1.3%) | 854 / 782 (-8.4%) | 841 / 822 (-2.3%) | 845 / 821 (-2.8%) | - / 821
172.mgrid | 777 / 786 (1.2%) | 835 / 839 (0.5%) | 817 / 829 (1.5%) | 818 / 820 (0.2%) | - / 842
173.applu | 612 / 617 (0.8%) | 631 / 638 (1.1%) | 611 / 608 (-0.5%) | 701 / 703 (0.3%) | - / 729
177.mesa | 898 / 906 (0.9%) | 1379 / 1506 (9.2%) | 1578 / 1570 (-0.5%) | 1579 / 1552 (-1.7%) | - / 1651
178.galgel | 1753 / 1694 (-3.4%) | 2499 / 2503 (0.2%) | 2224 / 2237 (0.6%) | 2218 / 2428 (9.5%) | - / 2803
179.art | 2600 / 2495 (-4.0%) | 2388 / 2360 (-1.2%) | 2472 / 2437 (-1.4%) | 2645 / 2575 (-2.6%) | - / 2634
183.equake | 888 / 906 (2.0%) | 905 / 901 (-0.4%) | 898 / 898 (0.0%) | 900 / 899 (-0.1%) | - / 900
187.facerec | 1165 / 1156 (-0.8%) | 1244 / 1274 (2.4%) | 1237 / 1275 (3.1%) | 1252 / 1273 (1.7%) | - / 1268
188.ammp | 1019 / 980 (-3.8%) | 983 / 968 (-1.5%) | 922 / 905 (-1.8%) | 904 / 891 (-1.4%) | - / 963
189.lucas | 799 / 809 (1.3%) | 793 / 791 (-0.3%) | 891 / 899 (0.9%) | 895 / 898 (0.3%) | - / 897
191.fma3d | 808 / 821 (1.6%) | 801 / 812 (1.4%) | 829 / 840 (1.3%) | 839 / 853 (1.7%) | - / 845
200.sixtrack | 542 / 540 (-0.4%) | 533 / 513 (-3.8%) | 464 / 474 (2.2%) | 452 / 475 (5.1%) | - / 528
301.apsi | 916 / 903 (-1.4%) | 916 / 913 (-0.3%) | 851 / 853 (0.2%) | 856 / 846 (-1.2%) | - / 902
SPECfp_base2000 | 963 / 960 (-0.3%) | 1038 / 1038 (0.0%) | 1015 / 1018 (0.3%) | 1031 / 1030 (-0.1%) | - / 1085

Integer tests without code optimizations: the new version shows the largest gain in 254.gap (~10%) and the largest drop, again, in 197.parser (slightly smaller than on Pentium 4, about 27%). On average, the total SPECint_base2000 score is 3% lower than with the previous version. Floating point results show a small spread in both directions, but judging by the integral score, code compiled with ICC/IFC 8.1 and 9.0 runs at practically the same speed. Surprisingly, the absolute results in some floating point sub-tests and the total SPECfp_base2000 score are much lower than on Pentium 4, while the integer results are only slightly lower. This is probably because these tests are sensitive to memory bandwidth, which is much lower in the Pentium M system with single-channel DDR-333 (2.67 GB/s versus 6.4 GB/s). It certainly has nothing to do with the FPU, which in Pentium M is not only no worse than in Pentium 4, but rather much better.
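Both bandwidth figures follow from one product: peak bandwidth = transfer rate x bus width x number of channels. The sketch below reproduces them, assuming a 64-bit (8-byte) bus per channel. The article does not state the memory configuration of the Pentium 4 system; 6.4 GB/s equals 800 MT/s x 8 bytes, which is both the bandwidth of the Pentium 4 670's 800 MHz front-side bus and that of dual-channel DDR2-400, so both cases are shown. The function name is ours.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_bytes: int = 8, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


# Single-channel DDR-333 (333.33 MT/s) in the Pentium M 770 system
print(f"{peak_bandwidth_gb_s(1000 / 3):.2f} GB/s")              # 2.67 GB/s
# 800 MT/s on an 8-byte bus, e.g. an 800 MHz front-side bus
print(f"{peak_bandwidth_gb_s(800):.2f} GB/s")                   # 6.40 GB/s
# Dual-channel DDR2-400 gives the same peak figure
print(f"{peak_bandwidth_gb_s(400, channels=2):.2f} GB/s")       # 6.40 GB/s
```

That is a difference of roughly 2.4 times in peak bandwidth, before any latency differences are considered.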

Optimization keys (this processor supports -QxK, -QxW, -QxN, and -QxB) do not change the situation significantly, apart from increasing overall performance. In SPECint_base2000 it grows in the row -QxK < -QxW < -QxN < -QxB (and -QxB is also the fastest in SPECfp_base2000), that is, the native code optimization for the Banias core turns out to be the best for the Dothan core as well. Integer tests still show slightly lower results than the previous version (by approximately 2.5%) because of the noticeably reduced performance in 197.parser and the lack of noticeable gains in other sub-tests, while in floating point tests the two versions perform practically the same. But the latter is again achieved through a self-compensating spread in results, both upward and downward (especially prominent with -QxK and -QxN: up to 10% in some sub-tests), rather than through identical results.

Athlon 64 FX-57

We have saved the most interesting part for the end of the article: test results of Intel C++/Fortran Compiler 8.1/9.0 on the competitor's latest single-core processor, the AMD Athlon 64 FX-57. You may wonder how we did it. It is very simple: all it took was studying the algorithm that an application compiled by Intel compilers uses to check the processor type. Here is what it looks like:

1. Vendor String validation for "GenuineIntel";

2. Detecting the processor family (Pentium III/Pentium M: Family 6; Pentium 4/Xeon: Family 15);

3. Determining the availability of necessary extended instruction sets (SSE, SSE2, SSE3).

Judging from this algorithm, all you have to do to make AMD processors execute code compiled with Intel C++/Fortran Compiler is to remove Check #1, provided the processor supports the necessary instruction sets. This works because Intel and AMD processors report matching CPUID family numbers: 6 corresponds to AMD K7 processors (most of which support SSE), and 15 to AMD K8 processors (which support SSE and SSE2; the latest core revision E also supports SSE3). Even if the numbers had not matched, we could just as well have removed Check #2. In that case, whether an application runs would depend solely on whether the processor supports the necessary extensions.
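The three checks, and the effect of removing the first one or the first two, can be modeled in a few lines of Python. This is a simplified model under our own assumptions, not Intel's actual dispatch code: the Cpu class, the may_run function and its check_vendor/check_family switches are hypothetical, and the family and extension values are the ones named in the text (an Athlon XP stands in for a K7 processor with SSE only).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    """Hypothetical summary of what CPUID reports about a processor."""
    vendor: str          # vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    family: int          # CPUID family: 6 (P6 / AMD K7) or 15 (NetBurst / AMD K8)
    features: frozenset  # supported extensions, e.g. {"SSE", "SSE2", "SSE3"}


def may_run(cpu, family, required, check_vendor=True, check_family=True):
    """Model of the three checks; the switches emulate removing checks 1 and 2."""
    if check_vendor and cpu.vendor != "GenuineIntel":  # check 1: vendor string
        return False
    if check_family and cpu.family != family:          # check 2: processor family
        return False
    return set(required) <= cpu.features               # check 3: extensions


pentium_m_770 = Cpu("GenuineIntel", 6, frozenset({"SSE", "SSE2"}))
athlon64_fx57 = Cpu("AuthenticAMD", 15, frozenset({"SSE", "SSE2", "SSE3"}))
athlon_xp = Cpu("AuthenticAMD", 6, frozenset({"SSE"}))

print(may_run(pentium_m_770, 6, {"SSE2"}))                       # True
print(may_run(athlon64_fx57, 15, {"SSE3"}))                      # False: check 1 fails
print(may_run(athlon64_fx57, 15, {"SSE3"}, check_vendor=False))  # True
print(may_run(athlon_xp, 15, {"SSE3"}, check_vendor=False))      # False: check 2 fails
```

With check_family=False as well, only the extension test remains, which is the vendor-neutral behavior the text describes.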

Binary files can be patched manually, but we have written a small utility, ICC Patcher (you can download it here). It scans a binary file for suspicious GenuineIntel checks and replaces them with NOPs. The utility can patch not only compiled executables but also the libraries supplied with Intel C++/Fortran Compiler, including those for EM64T. In that case, compiled applications will always run on processors from both Intel and AMD. Note that this patching is not crude: for example, code compiled with the -QxP key will run on AMD processors only if they are Athlon 64/Opteron with core revision E, and on earlier core revisions and on AMD K7 processors it will display a warning that it cannot be executed.
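On x86, CPUID leaf 0 returns the vendor string in the EBX, EDX, and ECX registers, four ASCII characters each, so a compiled "GenuineIntel" check can compare those registers with three 32-bit constants. The Python sketch below only locates such constants in a binary; it is not ICC Patcher and patches nothing. A real tool would also have to disassemble the surrounding code to tell a genuine vendor check from a coincidental byte match before replacing anything with NOPs. The function name find_vendor_constants is hypothetical.

```python
import struct

# CPUID leaf 0 returns the vendor string in EBX, EDX, ECX (in that order),
# four ASCII characters per register, stored little-endian.
VENDOR_DWORDS = {
    "EBX": struct.unpack("<I", b"Genu")[0],  # 0x756E6547
    "EDX": struct.unpack("<I", b"ineI")[0],  # 0x49656E69
    "ECX": struct.unpack("<I", b"ntel")[0],  # 0x6C65746E
}


def find_vendor_constants(data: bytes):
    """Yield (offset, register) wherever one of the three "GenuineIntel"
    dwords occurs. This only searches; it patches nothing."""
    for reg, value in VENDOR_DWORDS.items():
        pattern = struct.pack("<I", value)
        pos = data.find(pattern)
        while pos != -1:
            yield pos, reg
            pos = data.find(pattern, pos + 1)


# Hand-made example: nop; nop; cmp ebx, 0x756E6547; jnz +0
sample = bytes([0x90, 0x90, 0x81, 0xFB]) + b"Genu" + bytes([0x75, 0x00])
print(list(find_vendor_constants(sample)))  # [(4, 'EBX')]
```

For a real file, read it with open(path, "rb") and pass the resulting bytes to the function.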

Now let's proceed to the test results. To save time, we decided not to recompile all test sources with the "correct" Intel libraries and patched the existing binaries instead. That is why we set the check_md5=0 option in the config files of the tests: patching executables changes their checksums.


Each cell: ic8.1 / ic9.0 (relative change); x = sub-test failed; - = no result
Sub-test | No Opt. | -QxK | -QxW | -QxN | -QxB | -QxP
164.gzip | 1437 / 1363 (-5.1%) | 1568 / 1571 (0.2%) | 1546 / 1546 (0.0%) | 1566 / 1540 (-1.7%) | - / 1584 | 1574 / 1558 (-1.0%)
175.vpr | x / x | 1429 / 1406 (-1.6%) | 1515 / 1510 (-0.3%) | 1516 / 1503 (-0.9%) | - / 1483 | 1514 / 1486 (-1.8%)
176.gcc | x / x | 2178 / 2184 (0.3%) | 2161 / 2173 (0.6%) | 2182 / 2143 (-1.8%) | - / 2192 | 2199 / 2158 (-1.9%)
181.mcf | 1149 / 1150 (0.1%) | 1153 / 1149 (-0.3%) | 1152 / 1148 (-0.3%) | 1498 / 1500 (0.1%) | - / 1501 | 1506 / 1505 (-0.1%)
186.crafty | 1892 / 1877 (-0.8%) | 1903 / 1921 (0.9%) | 1952 / 1945 (-0.4%) | 1935 / 1939 (0.2%) | - / 2011 | 2011 / 1992 (-0.9%)
197.parser | 1733 / 1257 (-27.5%) | 1773 / 1275 (-28.1%) | 1754 / 1253 (-28.6%) | 1766 / 1267 (-28.3%) | - / 1256 | 1764 / 1251 (-29.1%)
252.eon | 2216 / 2622 (18.3%) | 2463 / 2410 (-2.2%) | 2973 / 2901 (-2.4%) | 3220 / 3124 (-3.0%) | - / 3176 | 3177 / 3133 (-1.4%)
253.perlbmk | 2105 / 2104 (0.0%) | 2093 / 2121 (1.3%) | 2123 / 2148 (1.2%) | 2142 / 2132 (-0.5%) | - / 2209 | 2137 / 2250 (5.3%)
254.gap | 1858 / 1869 (0.6%) | 1889 / 1910 (1.1%) | 1960 / 1999 (2.0%) | 1974 / 1968 (-0.3%) | - / 1990 | 1990 / 1952 (-1.9%)
255.vortex | 2875 / 2799 (-2.6%) | 2823 / 2829 (0.2%) | 2797 / 2719 (-2.8%) | 2856 / 2881 (0.9%) | - / 2797 | 2835 / 2902 (2.4%)
256.bzip2 | 1480 / 1514 (2.3%) | 1462 / 1460 (-0.1%) | 1431 / 1437 (0.4%) | 1433 / 1430 (-0.2%) | - / 1442 | 1451 / 1445 (-0.4%)
300.twolf | 1934 / 1777 (-8.1%) | 1940 / 1939 (-0.1%) | 1958 / 1950 (-0.4%) | 1959 / 1962 (0.2%) | - / 1944 | 1947 / 1953 (0.3%)
SPECint_base2000 | 1814 / 1761 (-2.9%) | 1837 / 1787 (-2.7%) | 1879 / 1823 (-3.0%) | 1943 / 1879 (-3.3%) | - / 1894 | 1950 / 1893 (-2.9%)

168.wupwise | 2121 / 2131 (0.5%) | 2166 / 2200 (1.6%) | 2128 / 2174 (2.2%) | 2456 / 2085 (-15.1%) | - / 2197 | 2385 / 2366 (-0.8%)
171.swim | 1448 / 1448 (0.0%) | 2130 / 1944 (-8.7%) | 2136 / 2110 (-1.2%) | 2138 / 2110 (-1.3%) | - / 2118 | 2134 / 2111 (-1.1%)
172.mgrid | 1231 / 1244 (1.1%) | 1330 / 1471 (10.6%) | 1432 / 1463 (2.2%) | 1458 / 1554 (6.6%) | - / 1486 | 1418 / 1566 (10.4%)
173.applu | 1230 / 1251 (1.7%) | 1224 / 1243 (1.6%) | 1205 / 1196 (-0.7%) | 1530 / 1498 (-2.1%) | - / 1530 | 1538 / 1513 (-1.6%)
177.mesa | 1569 / 1587 (1.1%) | 1893 / 1939 (2.4%) | 2075 / 2046 (-1.4%) | 2072 / 2075 (0.1%) | - / 2018 | 2077 / 2046 (-1.5%)
178.galgel | 2080 / 2056 (-1.2%) | 2437 / 2459 (0.9%) | 2495 / 2464 (-1.2%) | 2445 / 2928 (19.8%) | - / 2980 | 2475 / 2915 (17.8%)
179.art | 1798 / 1804 (0.3%) | 1785 / 1811 (1.5%) | 1844 / 1839 (-0.3%) | 1852 / 1847 (-0.3%) | - / 1839 | 2686 / 2910 (8.3%)
183.equake | 1657 / 1680 (1.4%) | 1678 / 1669 (-0.5%) | 1685 / 1680 (-0.3%) | 1674 / 1671 (-0.2%) | - / 1693 | 1679 / 1788 (6.5%)
187.facerec | 1862 / 1722 (-7.5%) | 1896 / 2024 (6.8%) | 1902 / 2030 (6.7%) | 1955 / 2036 (4.1%) | - / 1989 | 1963 / 2001 (1.9%)
188.ammp | 1390 / 1331 (-4.2%) | 1333 / 1298 (-2.6%) | 1319 / 1299 (-1.5%) | 1276 / 1277 (0.1%) | - / 1285 | 1298 / 1301 (0.2%)
189.lucas | 1615 / 1624 (0.6%) | 1570 / 1570 (0.0%) | 1727 / 1734 (0.4%) | 1729 / 1724 (-0.3%) | - / 1722 | 1730 / 1723 (-0.4%)
191.fma3d | 1525 / 1537 (0.8%) | 1462 / 1483 (1.4%) | 1566 / 1564 (-0.1%) | 1593 / 1607 (0.9%) | - / 1570 | 1614 / 1630 (1.0%)
200.sixtrack | 779 / 778 (-0.1%) | 781 / 791 (1.3%) | 757 / 779 (2.9%) | 750 / 779 (3.9%) | - / 820 | 748 / 793 (6.0%)
301.apsi | 1493 / 1456 (-2.5%) | 1475 / 1484 (0.6%) | 1484 / 1492 (0.5%) | 1519 / 1471 (-3.2%) | - / 1474 | 1510 / 1464 (-3.0%)
SPECfp_base2000 | 1515 / 1506 (-0.6%) | 1596 / 1613 (1.1%) | 1633 / 1642 (0.6%) | 1681 / 1693 (0.7%) | - / 1698 | 1725 / 1776 (3.0%)

Non-optimized code: 197.parser is noticeably slower in integer tests on this processor as well (by 27.5%, almost the same as on Pentium M). The same applies to 300.twolf (8.1%), which is partly compensated by the breakaway in 252.eon (18.3%). The total SPECint_base2000 score is approximately 3% lower than with the previous compiler version, which is again close to the Pentium M results. Floating point results are again close to those of the previous version, but again because of a self-compensating spread in results rather than equal performance in the sub-tests. As a result, the total SPECfp_base2000 score is just 0.6% lower than for code compiled with ICC/IFC 8.1.

Optimized variants of integer tests do not change the picture we got on the other processors. Namely, the noticeable lag in 197.parser (28-29%) remains, while there is no breakaway in any sub-test (the only exception is 253.perlbmk compiled with -QxP, which gains 5.3%). The 197.parser lag causes the roughly 3% drop in the total SPECint_base2000 score in all cases. As for absolute performance, it grows in the row -QxK < -QxW < -QxN < -QxP < -QxB. That is, the best optimization (though not by much, and only in some tests and in the total score) is the one for the Banias core. This result is not surprising, considering that the AMD K8 architecture is similar to Intel Pentium III/Pentium M rather than to Pentium 4 (NetBurst).

Let's proceed to the optimized SPECfp code. As with the Intel processors, Athlon 64 FX-57 always gains performance with the new compiler version. The size of the gain depends on the optimization type, and so does the way it is obtained. For example, the SSE variant (-QxK) shows a noticeable 8.7% drop in 171.swim (note that Pentium 4 gained in this sub-test), while 172.mgrid gains 10.6% and 187.facerec gains 6.8%; the total SPECfp_base2000 score grows by 1.1%. In the old SSE2 variant for the Willamette core (-QxW, which can run on AMD K8 even without patching), a clear lead remains only in 187.facerec (6.7%), and the overall advantage is just 0.6%. The new SSE2 variant for the Northwood core shows a small increase in SPECfp_base2000 (0.7%), but the spread in some sub-tests is noticeable (-15.1%(!) in 168.wupwise, +6.6% in 172.mgrid, and +19.8% in 178.galgel). And finally, the best optimization, SSE3 for the Prescott core (-QxP), shows almost no performance drops (except for the 3% drop in 301.apsi) and a considerable performance increase in a number of sub-tests (172.mgrid: 10.4%, 178.galgel: 17.8%, 179.art: 8.3%, 183.equake: 6.5%). As a result, the total SPECfp_base2000 score is 3% higher than with the previous version. As for code efficiency, we have already noted that it is the highest with SSE3. It is followed by SSE2 for the Banias core (-QxB), which again agrees with our idea of the AMD K8 architecture, and then by -QxN, -QxW, and -QxK.

Conclusions

The new Intel C++/Fortran Compiler 9.0 shows a mixed picture in "typical" code compilation (by which we mean compiling with profiles). In general, the resulting integer code is slightly slower (by 3-5%) than code compiled with the previous Version 8.1. A significant performance drop occurs in only one task, but it is a heavy one: from 27 to 34% depending on the processor. You will be lucky if your code does not resemble this task :).

Nevertheless, the new compilers show some advantages over the previous version in calculations with real numbers (where SSE, SSE2, and SSE3 instructions are used), though quite small ones (from 0 to 3%). The choice of optimization keys for a given processor microarchitecture remains the same: -QxP for Pentium 4/Prescott, -QxB for Pentium M/Dothan; for AMD K8 processors we recommend experimenting with -QxB and -QxP.

By the way, a few words on AMD processors. According to our research, both ICC/IFC versions (8.1 and 9.0) produce code that performs very well (in some cases even best) on AMD processors... provided we "patch" it :), or patch the compiler libraries. It would be great if Intel replaced the current processor type check with a smarter one, similar to the one we have used.

Such a modification would benefit end users in the first place. Even if a software developer used "automatic" optimizations like -Qax*, the most optimized code would be chosen for execution depending only on the availability of the necessary instruction set extensions, not on the CPU manufacturer. Note that one of AMD's charges against Intel is that AMD processors may run much slower than competing processors when executing such "automatic" code, even though they support the necessary extensions.
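The vendor-neutral selection described above amounts to picking the most advanced code path whose required extensions are all present. The sketch below assumes three processor-specific paths plus a generic fallback; the path names and the pick_path function are hypothetical and do not reproduce the code that -Qax* actually generates.

```python
# Hypothetical code paths, from the most to the least optimized,
# each with the instruction set extensions it needs.
CODE_PATHS = [
    ("SSE3 path", {"SSE", "SSE2", "SSE3"}),
    ("SSE2 path", {"SSE", "SSE2"}),
    ("SSE path", {"SSE"}),
]


def pick_path(features):
    """Choose the most optimized path the CPU can run.

    Only the supported extensions matter; the vendor string is never consulted.
    """
    features = set(features)
    for name, required in CODE_PATHS:
        if required <= features:
            return name
    return "generic path"  # needs no extensions at all


print(pick_path({"SSE", "SSE2", "SSE3"}))  # SSE3 path (e.g. Athlon 64 FX-57, core rev. E)
print(pick_path({"SSE", "SSE2"}))          # SSE2 path (e.g. Pentium M)
print(pick_path({"SSE"}))                  # SSE path (e.g. Athlon XP)
```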

It would be no less beneficial to software developers and testers: there would be no need to use different compilers for different processors or to develop applications for processors of a particular manufacturer.

And of course, AMD itself would gain a lot: there would be no need to develop its own compiler, which has been in the works for a long time already :).



Dmitri Besedin (dmitri_b@ixbt.com)
September 5, 2005.




Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.