Once upon a time there were two companies. Of course, there were more competitors than that, but there were only two sworn rivals: Intel and AMD. Intel was usually first to roll out innovations, while AMD followed with cheaper solutions and so kept its share of the market. Intel didn't like that and took action, both conventional, like releasing new chips or cutting prices, and "original." The latter led AMD to sue Intel, accusing its rival of unfair competition and violation of the antitrust laws. There's no need to describe every twist and turn of a dispute that has swung back and forth since the 80s (counterclaims included). But there's one method in Intel's arsenal that interests us here.
Many software developers consider Intel compilers the best in many respects, code optimization included, and use them for performance-critical programs. Intel also supplies many optimized function libraries for various professional applications, and often there are simply no comparable alternatives. But the same developers noticed that Intel compilers and libraries often ran suspiciously slowly on processors made by other companies. The reason is that the generated code (hand-written, in the case of libraries) contains several versions of its most critical parts, each optimized for a specific architecture and instruction set (most often the SSEx series). The code also contains a function, the CPU dispatcher (not to be confused with the scheduler, a pipeline stage that is sometimes also called a "dispatcher"), that determines which processor it is running on and selects the appropriate path. The problem is that Intel's dispatcher checks both instruction set support and the manufacturer name reported by the processor. If the processor isn't made by Intel, the dispatcher selects the code path that provides maximum compatibility at the expense of performance, even if the processor supports all the necessary instructions.
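The difference between a vendor-gated dispatcher and a purely feature-based one can be illustrated with a minimal Python sketch. It is only a model of the selection logic described above, not Intel's actual code: the function names, path labels and feature sets are hypothetical, and a real dispatcher would obtain the vendor string and feature flags from the CPUID instruction.

```python
# Hypothetical model of a CPU dispatcher; all names are illustrative.

# Code paths ordered from fastest to slowest, each with the
# instruction sets it requires.
CODE_PATHS = [
    ("sse3_path", {"sse", "sse2", "sse3"}),
    ("sse2_path", {"sse", "sse2"}),
    ("generic_path", set()),  # maximum compatibility, lowest speed
]


def feature_based_dispatch(features):
    """Pick the fastest path whose required instruction sets are all supported."""
    supported = set(features)
    for name, required in CODE_PATHS:
        if required <= supported:
            return name
    return "generic_path"


def vendor_gated_dispatch(vendor, features):
    """Same selection, but only when the vendor string is Intel's."""
    if vendor != "GenuineIntel":
        # Non-Intel processors get the compatible path regardless of features.
        return "generic_path"
    return feature_based_dispatch(features)


if __name__ == "__main__":
    caps = {"sse", "sse2", "sse3"}
    for vendor in ("GenuineIntel", "AuthenticAMD", "CentaurHauls"):
        print(vendor, vendor_gated_dispatch(vendor, caps), feature_based_dispatch(caps))
```

With identical feature sets, the two functions differ only for vendor strings other than "GenuineIntel", which is exactly the behavior the complaints were about.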
This is nothing new; it has been going on for years. Intel hasn't changed anything, although it has also never advertised its compilers as working best only on its own processors. As a result, software developers may not be aware that users with the "wrong" processor often get reduced performance. But as soon as those users switch to Intel CPUs, everything speeds up. If all developers knew about this, perhaps some of them would optimize their applications by hand or switch to other compilers and libraries. The official article on using the Intel Performance Primitives (IPP) library with AMD processors states that the latest IPP version optimizes code honestly. But the article doesn't mention that some other libraries do not.
Intel deliberately sets low (sometimes even zero) prices for its programming tools and provides a high level of support. In other words, selling compilers is most likely unprofitable in itself, but it improves processor sales. There's almost no sense in adding new instructions to processors without providing corresponding compiler support, because assembly language is rarely used these days. AMD provides a compiler too, Open64, but only for Linux.
You probably know about the settlement that Intel and AMD reached in November 2009. It went in AMD's favor, obligating Intel to pay a handsome sum. Less well known is that AMD also insisted on a settlement agreement listing many methods of unfair competition that Intel promised not to use anymore. The list included "CPU substitution" as well: the document obligated Intel to make the dispatcher code vendor-neutral (probably in the next version). A month later, the U.S. Federal Trade Commission (FTC) filed an antitrust complaint against Intel that contained stinging accusations related to the same substitution. The FTC even introduced a term for it: "defective compiler." The FTC demanded that Intel release a free replacement or a patch for the existing compiler, cover the costs of recompiling and redistributing the affected applications, and announce the replacement of old versions with new ones.
Almost a year has passed since then. Intel has released a new version of MKL (Math Kernel Library), v10.3, but the CPU dispatcher has remained almost the same. Scalar, vector and 64-bit versions of functions still use non-optimized code paths on "wrong" processors. Moreover, many functions have gained a new branch for the upcoming AVX instruction set, again only for Intel processors. So this branch will be used on Sandy Bridge, but not on AMD Bulldozer CPUs, which are due at about the same time and will also execute AVX instructions. There is now a new branch for non-Intel processors that support SSE2, but for many functions it is slower than the SSE2 code for Intel processors, and it exists only in the 32-bit versions of MKL.
Brave sir knight
You may ask how we know all this. The credit goes to Agner Fog, an experienced software developer and researcher known for his optimization and microarchitecture manuals, which are freely available on his website. In 2007, Fog informed Intel of the results of his compiler research, which had led to the conclusions above. A long exchange of letters followed, in which the company denied that the problem existed even as Fog continued to prove that it did. Other experts complained about the same problem and received similar responses. The situation didn't change even with version 11.1.054, released right after the agreement with AMD had been signed.
Strangely enough, Intel stated that it deliberately optimized for specific CPU architectures rather than instruction sets, because that made the optimization better. This was supposed to dismiss the charges of unfair competition, since it would be unreasonable to demand that the company optimize for other vendors' processors. But it would also mean that every new Intel CPU, even one supporting the same instruction sets, required software to be recompiled with a new compiler; otherwise the old dispatcher, run on the new CPU, wouldn't recognize it as a supported Intel processor. Unless Intel was stretching the truth again. Fog decided to check and ran a program built with an old compiler on a new, supposedly unknown, Intel processor. As one might expect, it worked perfectly. The reason was that Intel manipulated the family numbers of new processors so that old software would still recognize them. In particular, the company added the extended family and extended model fields.
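For reference, the family and model numbers can be decoded from the signature that CPUID leaf 1 returns in the EAX register. The sketch below follows the field layout and combining rule as documented by Intel; the function name and the sample value are chosen for illustration, and AMD's rule differs slightly (there the extended model applies only when the base family is 0xF). It shows how a chip can keep a base family number that older software already knows while the additional bits go into the extended fields.

```python
def decode_signature(eax):
    """Decode a CPUID leaf 1 EAX signature using Intel's documented rule."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF.
    if base_family == 0xF:
        family = base_family + ext_family
    else:
        family = base_family

    # The extended model supplies the high four bits for families 6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model

    return {
        "base_family": base_family,
        "family": family,
        "model": model,
        "stepping": stepping,
    }


if __name__ == "__main__":
    # Sample signature: base family 6, extended model 1, base model 0xA.
    info = decode_signature(0x000106A5)
    print({key: hex(value) for key, value in info.items()})
```

A check that looks only at the base family field sees the familiar value 6 for this sample, even though the full model number needs the extended model bits.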
After Intel refused to solve the issue, Fog decided to give it some publicity. However, contacting a few IT publications didn't yield much; the issue was probably too specialized for the average user. But why hadn't AMD, which had suffered commercially from this, posted anything on its website? Had it decided that doing so might affect its case against Intel? And what about VIA/Centaur?
Meanwhile, Fog continued to provide new facts (links are available on his blog). For example, according to CNET, Skype agreed with Intel to temporarily limit the functionality of its software on machines with "alternative processors"; that limitation was later removed. In other words, it's clear whose fault it is. So what can be done about it? Fog suggests three options:
- Do not use the Intel compiler. The GNU compiler for Linux optimizes as well as Intel's, although the glibc function library needs to be improved. As for Windows tools, there are no alternatives.
- Use the Intel compiler and fix the dispatcher manually. In his C++ manual, Fog presents fair dispatcher code along with instructions for building it into programs. However, this option relies on undocumented features of Intel compilers that change from version to version.
- Change the processor manufacturer string by means of virtualization instructions. AMD's version of this technology is known to be capable of this, as has been demonstrated. But no one has written a full-fledged program for it yet. One advantage of this method is that it can be used by end users without access to source code, as well as by journalists willing to write damning reviews.
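The manufacturer string that the third option would change is what CPUID leaf 0 returns: twelve ASCII characters spread across the EBX, EDX and ECX registers, in that order. The Python sketch below only decodes and prints such register values; it does not execute CPUID or alter anything, and the helper name is hypothetical.

```python
import struct


def vendor_from_registers(ebx, edx, ecx):
    """Assemble the 12-character vendor string from CPUID leaf 0 registers."""
    # Each register holds four ASCII bytes, least significant byte first.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


if __name__ == "__main__":
    samples = {
        "Intel": (0x756E6547, 0x49656E69, 0x6C65746E),
        "AMD": (0x68747541, 0x69746E65, 0x444D4163),
        "VIA": (0x746E6543, 0x48727561, 0x736C7561),
    }
    for maker, regs in samples.items():
        print(maker, vendor_from_registers(*regs))
```

A dispatcher that compares this string against Intel's will treat any processor reporting a different value as "wrong", whatever instruction sets it supports.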
Strangely enough, VIA Technologies was the first to run out of patience. All 64-bit versions of Windows up to Windows 7 (v6.1), as well as FreeBSD, used to work only with specific processors, and the list included just AMD and Intel. VIA joined the two only from Vista SP2 on. But before VIA was admitted to the "high caste" (the company actually had a good reason to file an antitrust case), it came up with an original way out. The new VIA Nano processor got a function for changing the manufacturer name along with some other things (described further). Strictly speaking, this feature was quietly announced for the C3 based on the Nehemiah core, rolled out at about the same time as the first 64-bit operating systems in 2003 (which shows how old the problem is), but either no one cared to check or the information was incorrect. In any case, it didn't cause a sensation.
However, it has long been rumored and suspected that the dispatcher issue affects benchmarks as well, and benchmark results get published on websites that help people choose what to buy. One confirmed example did hurt VIA. Futuremark PCMark 2005 (the predecessor of PCMark Vantage) used manufacturer-based optimizations in its memory performance test, and the performance difference between certain branches could be as much as 1.5 times, other things being equal!
Perhaps PCMark 2005 assumed that non-Intel processors couldn't use SSE2 and the newer versions of this instruction set? But, released in 2005, it should at least have taken into account the VIA C7, which supports SSE2 and SSE3, especially given that the problem was confirmed for the latest version, 1.2.0, dated November 29, 2006. Besides, the difference between the optimizations for AMD and Intel was also significant. Futuremark could argue that VIA processors weren't as popular as competing products. But the AMD K8 architecture didn't merely exist by 2005; it had already gained SSE3 support.
But these are all examples from the past. Nano became the first processor that made an independent "fairness" test possible, because Agner Fog gained access to its special registers and released a CPUID manipulation program (for VIA only). Moreover, Mr. Fog was kind enough to send us a motherboard with a Nano processor, because our local VIA distributor didn't even want to hear about Nano. Having completed the tests, we understand why.
So, this is what CPU-Z says about the original VIA Nano (at full clock rate):
Now behold as we turn our Nano into...
...an Intel Core 2! What an easy way to add virtualization, halve both L1 caches and even move to the 45nm process technology and change socket, isn't it? All right, let's get to the bottom of this.