K8L Architecture on the 65nm SOI Process Technology in Quad-Core Processors Already in Q2 2007?

It's no secret that practically nothing companies announce at official press conferences comes as a surprise to those who keep tabs on IT news and unofficial leaks. But the official position is valuable not only because it ties future products to certain dates, but also because it is laconic (strange, but true), unlike the leaks. The grapevine from some Taiwanese and English-language web sites usually describes only possible paths of development. Real plans come from choosing the best of those paths or, to be more exact, the one that is technically easiest to implement. In these days of hot competition, official plans are published as soon as the engineers have a grip on what they can deliver and give the green light to the marketing department.

At the beginning of September, AMD shared its latest plans with representatives of the Moscow IT press.

The official part was conducted by Pierre Brunswick, Regional Vice-Director (marketing and sales in the CIS, Eastern Europe, and Turkey). Judging by how often he attends press events, he has settled in Moscow. By the way, assigning such a senior executive to "our" region is an indirect but objective sign that AMD regards it as strategically important.

The main task for AMD's marketing specialists today is certainly to show that the company is alive and evolving despite the launch of a competing product with a better architecture. The company is not going to give up its market share or slow down its growth. There are plenty of economic and technological reasons for that, as we found out in the course of the presentation.

Of course, securing a position in the conservative server market is one of the key achievements of the last three years.

Dell's choice of AMD processors is also important, as it brings in another large customer (especially considering that the share of AMD-based computers at Dell may reach 50%). But this long-awaited event is even more important as a political example for those still wavering. Interestingly, IBM started introducing Opteron processors into its server line at the same time. For reasons that are no longer relevant, AMD's old technology partner had until then stayed loyal to Intel in the multi-processor server sector.

The summer's much less anticipated (frankly speaking, absolutely unexpected!) event, the start of the merger with ATI, also plays its political role, although we'll have to wait for the joint operation to affect the characteristics of end products. For now, we can see only one counterproductive consequence: no CrossFire support in Intel's 965-series chipsets. The first engineering samples of motherboards on this chipset supported the technology. There were some bugs to fix, but the company declined to turn to ATI for help "just in case".

As NVIDIA decided not to change its priorities abruptly and to act on its own, SLI support has not appeared in Intel chipsets either. Moreover, NVIDIA's nForce 500 chipsets have problems of their own: the announced models are still not available (some manufacturers have even removed mentions of them from their official web sites). As usual, nobody comments on the reasons for the delays. Add to this the scant choice and high prices of Core 2 Duo motherboards with at least one graphics port, and the desktop platform from AMD ends up as the choice of "normal people", while Intel is for "enthusiasts". A very unusual situation.

All the fabs producing AMD64 processors still fit on a single slide about AMD's manufacturing strength (except for co-production at Chartered Semiconductor, which is already working at full capacity). The news that the old Fab 30 will be converted to 65nm and finer process technologies was rather unexpected. But representatives of the European office keep silent about larger-scale projects in the USA, probably for patriotic reasons.

Then the floor was given to Giuseppe Amato, Technical Director of Sales and Marketing in Europe, the Middle East, and Africa.

He shared the key architectural features of the quad-core processors to be launched in 2007:

Native Quad-Core Design on the 65nm SOI process technology, keeping heat dissipation at the level of modern dual-core processors
Independent dynamic control of the clock and voltage of each core
Integrated Memory Controller (dual-channel DDR2-667/800) and Direct Connect Architecture for direct connection of processors in a multi-CPU system.

But the most interesting information for competition analysis has to do with the official innovations in the quad-core architecture.

We have all seen a similar picture illustrating the changes in the K8L core. Practically all of them (at least those that were mentioned) are to be included in the first generation of quad-core processors, which are planned to ship (to be more exact, to appear in stores) in Q2 2007! It means that AMD intends to move its "riposte to Intel Core 2" to an earlier date, even compared to the most optimistic forecasts (mid-2007). Opteron and Athlon 64 FX processors will be the first to move to the new core. That's certainly the most important piece of news.

There will be three levels of cache in quad-core processors. It has been stressed (many times) that the focus is on efficient memory handling instead of "brute force" cache sizes. Does it mean that the associativity of the L1 cache will grow and that the L1-L2 bus will be widened to 256 bits? Most probably yes: how else can AMD deliver on its promises, if not with these improvements, which are technically easy to implement and have long suggested themselves?

The popular division into exclusive and inclusive models of cache organization is gradually losing its importance. (To be more exact, there have always been more than two structure types; AMD's marketing people simply popularized this division when they were the first to report the advantages of the exclusive cache in K7 processors.) In practice, it would be strange for dual- and multi-core processors with a shared cache to strive for the purely exclusive model. In the three-level structure, the L1-L2 pair will most likely inherit the exclusive type: the data in one level supplements the data in the other, and a given block may reside in only one of the two caches at a time, so their sizes effectively add up. As for L3, it will be able to hold copies of blocks from the higher-level caches along with its own "exclusive" data. Nevertheless, unlike in processors with a purely inclusive cache, it will not be required to store data that is already available on the other levels. This type can be called "non-inclusive" or "non-exclusive".
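The difference between the three models is easiest to see as arithmetic on effective capacity. The sketch below is only an illustration of the description above: the function names and the cache sizes (64 KB, 512 KB, 2 MB) are our own assumptions, not figures announced by AMD.

```python
def unique_capacity_bounds(l1_kb, l2_kb, l3_kb):
    """Return (min_kb, max_kb) of distinct data one core can keep on chip
    with an exclusive L1-L2 pair and a non-inclusive L3."""
    # Exclusive pair: a block lives in only one of L1/L2, so sizes add up.
    pair_kb = l1_kb + l2_kb
    # Best case: L3 holds nothing that L1/L2 already hold.
    max_kb = pair_kb + l3_kb
    # Worst case: L3 duplicates as much L1/L2 content as it can.
    min_kb = max(pair_kb, l3_kb)
    return min_kb, max_kb


def inclusive_capacity(l1_kb, l2_kb, l3_kb):
    """Purely inclusive hierarchy: everything in L1/L2 is also in L3,
    so distinct data is limited to L3 (assumed to be the largest level)."""
    return l3_kb


# Hypothetical sizes, chosen only for illustration.
L1_KB, L2_KB, L3_KB = 64, 512, 2048

print(unique_capacity_bounds(L1_KB, L2_KB, L3_KB))  # (2048, 2624)
print(inclusive_capacity(L1_KB, L2_KB, L3_KB))      # 2048
```

With these assumed sizes, the non-inclusive scheme lands somewhere between the inclusive figure and the plain sum of all three levels, depending on how much the L3 happens to duplicate.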

On the whole, it seems like a sound idea. With an integrated memory controller at its disposal, AMD indeed does not have to increase the size of on-die cache. A more universal approach is to speed up data exchange with the entire array of external memory, especially as the potential of DDR2 has not been fully realized yet. According to our tests, modern memory modules can exchange data with a processor at rates similar to those inside the processor. In this case, it makes sense to keep the cache small but make it as fast as possible.

Power and clock management will be available for each core separately. The control procedure will allow each core to run at its own frequency, depending on its workload at a given moment.
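AMD did not describe the control algorithm, so the following is only a minimal sketch of what per-core frequency selection might look like. The frequency table and the policy (pick the lowest step that still covers the current load) are assumptions made for illustration.

```python
# Hypothetical frequency steps in MHz; real products define their own tables.
PSTATES_MHZ = [1000, 1800, 2200, 2600]


def pick_frequency(load, pstates=PSTATES_MHZ):
    """Pick the lowest frequency that covers `load`, where `load` is the
    fraction (0.0 to 1.0) of the top frequency the core currently needs."""
    needed_mhz = load * max(pstates)
    for freq in sorted(pstates):
        if freq >= needed_mhz:
            return freq
    return max(pstates)


# Each core is evaluated independently of its neighbours.
core_loads = [0.1, 0.5, 0.8, 1.0]
print([pick_frequency(load) for load in core_loads])
# [1000, 1800, 2200, 2600]
```

The point of the sketch is that a lightly loaded core can drop to a low step while its neighbour stays at full speed, which a single shared clock cannot do.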

The server roadmap for 2007 and 2008 hasn't changed much compared to the CeBIT slide. However, there is one interesting addition: Direct Connect Architecture will be updated to its second version by 2008, and one of the innovations is called Probe Filter. HyperTransport 3.0 is scheduled to raise bandwidth to 10.4 GB/s in each direction, but when the link connects processors (as opposed to peripherals), inter-processor traffic may still turn it into a bottleneck as the number of CPUs grows. And with the scheduled appearance of a fourth HT link in top Opteron models, AMD facilitates (and encourages) building 8-CPU configurations. The filter should unload the bus by exchanging only the necessary information between CPU caches! Technical details are not yet available, so server gurus may come up with their own implementations (to compare with the original idea later).
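Taking up that invitation, here is one guess at the idea, in the spirit of a directory that remembers which CPUs hold each cache line. It is not AMD's design: the class name and the simplifications (reads only, no evictions, one probe message per target CPU) are our assumptions. The model simply counts probe messages with and without the filter.

```python
from collections import defaultdict


class ProbeFilterModel:
    """Toy model: count probes sent on cache misses in an N-CPU system."""

    def __init__(self, n_cpus):
        self.n_cpus = n_cpus
        self.holders = defaultdict(set)  # cache line -> CPUs caching it
        self.broadcast_probes = 0        # without a filter: ask everyone
        self.filtered_probes = 0         # with a filter: ask known holders

    def read(self, cpu, line):
        if cpu in self.holders[line]:
            return  # local hit, no probes needed
        self.broadcast_probes += self.n_cpus - 1
        self.filtered_probes += len(self.holders[line])
        self.holders[line].add(cpu)


model = ProbeFilterModel(n_cpus=8)
model.read(0, "A")  # nobody holds A yet
model.read(1, "A")  # only CPU 0 holds A
model.read(0, "A")  # hit in CPU 0's own cache
model.read(2, "B")  # nobody holds B yet
print(model.broadcast_probes, model.filtered_probes)  # 21 1
```

Even in this tiny example the broadcast scheme sends 21 probes across the links where the filtered one sends a single probe, and the gap widens with every CPU added.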

According to AMD, the principal difference between its quad-core processors and the Intel CPUs on Kentsfield/Clovertown cores, to be launched in Q4 2006, is their native quad-core design: all four cores are located on the same die and share the cache. Intel, by contrast, only "glues" together two dice from the existing dual-core family (Conroe/Woodcrest) to bring the product to market faster. As the two dice have to interact via the external bus and the chipset, this approach resembles the one used in Pentium D. It's certainly far from technically elegant. Besides, Intel will have to reduce the FSB clock from 1333 MHz to 1066 MHz, which is also a step back. On the other hand, end users are usually indifferent to how elegantly a product is engineered, so performance tests will have the final say in this matter.
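The cost of the slower bus is easy to estimate. The sketch below assumes the usual 64-bit (8-byte) FSB data path and treats the quoted "MHz" figures as millions of transfers per second; the helper name is ours, and the per-core split is a simplification that ignores how traffic is actually distributed.

```python
FSB_WIDTH_BYTES = 8  # assumed 64-bit data path


def fsb_peak_mb_s(mega_transfers_per_s, width_bytes=FSB_WIDTH_BYTES):
    """Theoretical peak FSB bandwidth in MB/s."""
    return mega_transfers_per_s * width_bytes


dual_core = fsb_peak_mb_s(1333)  # two cores on a 1333 bus
quad_core = fsb_peak_mb_s(1066)  # four cores on a 1066 bus

print(dual_core, dual_core // 2)  # 10664 5332
print(quad_core, quad_core // 4)  # 8528 2132
```

Under these assumptions, each core of the "glued" quad-core processor gets well under half the peak bus bandwidth available to a core of its dual-core predecessor.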

But "glued" cores will have the side effect of increased power consumption, which can be compared even now, because TDP values for processors and other system components (chipsets, memory) are already available. Turning to power consumption, the speaker noted two issues:

This parameter matters to a customer (one who owns a lot of servers and wants to save power) only when we compare the power consumption of the entire platform, not just the processors. When we do that, it turns out that even a platform on the current dual-core Opteron (90nm SOI) beats Xeon-Woodcrest (65nm), as it does not need an external memory controller and uses DDR2 memory, which consumes less power than the first generation of FBDIMM.
The AMD platform gains a further advantage with the move to quad-core processors, because Intel will have to raise thermal dissipation versus its dual-core family for the above-mentioned reasons, while AMD will keep it at the old level (though that level exceeds Intel's in absolute terms for the middle models in the series).

Interestingly, AMD's evaluation is based on conservative data. According to independent tests, the real power consumption of the first quad-core processors from Intel is practically twice that of the dual-core prototypes, reaching the level of the Extreme models from the old NetBurst family. This may pose problems if you plan to upgrade the processors in a Conroe/Woodcrest platform and the voltage regulators on your motherboard are not powerful enough. In practice, users sometimes manage to run new processors on rather old motherboards, but specially optimized models remain the best choice. Interested in selling its own products (not only processors), Intel will not miss a chance to offer updated motherboards, and then chipsets, designed specifically for the new platform.

In conclusion, Giuseppe Amato turned to an ideological issue: the effect of multi-core processors on multi-processor systems.

The number of cores per processor grows each year. On the face of it, two or three dual-CPU servers should be enough in the era of quad-core processors to run very large projects. Expensive 4- and 8-CPU systems would then migrate to scientific computing and modeling, and simpler systems would suffice for business use. Right? Not quite.

A multi-processor server offers lower maintenance costs compared to several dual-processor systems, though it's more expensive. In practice, you can often do with fewer physical processors.

This "instruction" again demonstrates that different classes of tasks have their own preferable server organizations (based on single-, dual-, and multi-processor configurations).

Part 2. Benchmarking Performance and Power Consumption

Dmitry Laptev (lpt@ixbt.com)
September 22, 2006
