While running our first tests of the AMD Phenom X4 9850 under the updated test procedure (with a newer version of MathWorks MATLAB), we noticed an odd phenomenon that defied logic and common sense: in one of the MATLAB benchmarks, the Phenom X4 9850 scored many times worse than the Phenom X3 8750, which has one core fewer and runs at a lower frequency. We could have ignored this, since we are neither MATLAB developers nor Phenom engineers; all we can do is benchmark performance and publish results. Still, a sense of unfinished business nagged at us each time we looked at the AMD Phenom test results.
One of our readers suggested a possible explanation. In his opinion, MATLAB failed to detect the processor type correctly on Phenom, so it did not load the proper optimized library (Intel Math Kernel Library for Intel CPUs, AMD Core Math Library for AMD CPUs). This assumption was wrong, but it led us to another interesting idea: what happens if we force MATLAB to use Intel's library on an AMD Phenom? Having modified the blas.spec file**, we managed to launch MATLAB. A trial run of the built-in benchmark was also successful, so, encouraged by the lack of problems, we proceeded to our tests. The results were so interesting that we decided to publish them in a separate article.
** Original blas.spec:
GenuineIntel Family * Model * mkl.dll mklcompat.dll # Intel processors
AuthenticAMD Family * Model * acml.dll # AMD processors
Modified blas.spec, with the DLL names on the AuthenticAMD line replaced:
GenuineIntel Family * Model * mkl.dll mklcompat.dll # Intel processors
AuthenticAMD Family * Model * mkl.dll mklcompat.dll # AMD processors
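The substitution above can also be sketched as a small text transformation. This is only an illustration: the function name use_mkl_for_amd is hypothetical, it works on the file contents as a string, and it assumes the two-line layout shown above. It does not locate or modify a MATLAB installation, since the article does not say where blas.spec lives.

```python
def use_mkl_for_amd(spec_text):
    """Return blas.spec text with the AuthenticAMD entry pointing at the MKL DLLs.

    Hypothetical helper: assumes the AMD entry names acml.dll, as in the
    original file quoted in this article.
    """
    out = []
    for line in spec_text.splitlines():
        # Only the AMD vendor line is changed; the Intel line is left alone.
        if line.lstrip().startswith("AuthenticAMD"):
            line = line.replace("acml.dll", "mkl.dll mklcompat.dll")
        out.append(line)
    return "\n".join(out)


# The original file contents as quoted in the article.
ORIGINAL = (
    "GenuineIntel Family * Model * mkl.dll mklcompat.dll # Intel processors\n"
    "AuthenticAMD Family * Model * acml.dll # AMD processors"
)

if __name__ == "__main__":
    print(use_mkl_for_amd(ORIGINAL))
```

Keeping a backup copy of the original file before making such a change is a sensible precaution, because the edit makes MATLAB load a library it was not configured to use on AMD processors.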
Tests
Note that we present the results of the built-in MATLAB benchmark in their original raw form, so don't be surprised that most numbers are below 1. The lower the result, the better.
We'd also like to explain the difference between the charts and the graphs. The charts are simple: they show the results of AMD Phenom X3 8750 and Phenom X4 9850 running with the AMD ACML and Intel MKL libraries in the nominal mode, with all cores enabled. The graphs offer a different angle: since Phenom allows any core to be disabled, we tested Phenom X3 with one, two, and three cores, and Phenom X4 with one, two, three, and four cores. The graphs thus show how performance in each test changes with the number of enabled cores, and they turned out to be very interesting.
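For readers who want to build similar per-core graphs from their own runs, here is a minimal sketch of how lower-is-better results can be turned into speedup relative to the single-core run. The function name and the sample numbers are hypothetical placeholders, not measurements from this article.

```python
from typing import Dict


def speedup_vs_one_core(times_by_cores: Dict[int, float]) -> Dict[int, float]:
    """Speedup of each configuration relative to the one-core run.

    Inputs are raw benchmark results where lower is better, so the
    speedup is the one-core result divided by the n-core result.
    """
    base = times_by_cores[1]
    return {n: base / t for n, t in sorted(times_by_cores.items())}


# Placeholder values for illustration only, NOT results from the article.
example = {1: 0.8, 2: 0.5, 3: 0.4, 4: 0.4}

if __name__ == "__main__":
    for cores, s in speedup_vs_one_core(example).items():
        print(cores, "core(s):", round(s, 2), "x")
```

A value that stops growing, or falls, when another core is enabled corresponds to a flat or downward step on such a graph.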
LU decomposition (LU)
With the first test comes the first shock. Intel MKL accelerates every Phenom configuration without exception, both triple-core and quad-core. The performance gain ranges from 30% for Phenom X3 to 38% for Phenom X4, that is, nearly one third even in the worst case.
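Because the raw results are lower-is-better, a "gain" percentage can be computed in two common ways that give different numbers. The sketch below shows both; the article does not state which definition is used for its percentages, so treat this as an aid to reading them rather than a reconstruction. The function names and the sample values are hypothetical.

```python
def time_reduction_percent(t_old, t_new):
    """How much smaller the new result is, as a share of the old one."""
    return (t_old - t_new) / t_old * 100.0


def speedup_percent(t_old, t_new):
    """How much faster the new run is, treating 1/result as performance."""
    return (t_old / t_new - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT results from the article.
    t_acml, t_mkl = 1.0, 0.5
    print(time_reduction_percent(t_acml, t_mkl))  # result halved
    print(speedup_percent(t_acml, t_mkl))         # performance doubled
```

With these placeholder inputs the same change reads as a 50% reduction in the result or a 100% speedup, which is why it matters which definition a review uses.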
The quad-core and triple-core Phenoms behave identically on the graphs. Performance clearly grows with the number of cores, all the way up to four. However, the largest gain comes from going from one core to two; each further core adds less. Curiously, ACML gains more from each added core than Intel MKL does, even though its absolute performance is lower at any core count.
Fast Fourier transform (FFT)
The native AMD library copes well with the Fast Fourier Transform on AMD processors; Intel MKL even makes the results slightly worse. But the most interesting part is on the graph.
First of all, the triple-core and quad-core Phenoms perform differently even with the same ACML library and the same number of active cores. Phenom X4 speeds up steadily, if only slightly, as we go from one to three cores; only enabling the fourth core slows it down. Phenom X3, by contrast, loses performance when we switch from one core to two, then speeds up again when we enable the third. MKL behaves less strangely, but the fourth core slows it down as well.
Solving the van der Pol equation using the ODE45 method (ODE)
"Curiouser and curiouser!" is the best comment on this chart. Phenom X4 improves its result by 38% when the native ACML is replaced with Intel MKL, while the same replacement makes things worse by 11% for Phenom X3. Of course, we could stretch for an explanation: Intel has no multi-core processors with an odd number of cores, so Intel MKL has no reason to work well with them. But that is hardly a serious argument.
Comparing ACML and MKL at different core counts shows an already familiar picture: MKL copes reasonably well with the fourth core of Phenom X4, while ACML loses performance when it is enabled.