iXBT Labs - 'New Old' Core i7 Architecture - Page 1: A bit of history, key features


We've all been looking forward to the rollout of Core i7, aka Nehalem. But when it finally appeared, many users were puzzled. We had been waiting for a "simple and clear" event: Intel launches its first processor with an integrated memory controller, we grumble a bit that it took the company so long to follow the path AMD had already beaten, then we look at the performance test results, melt, rejoice, and forgive everything. And here it is: Intel launches Core i7. However, it turns out to be very different from the "Core 2 with an integrated memory controller" we had all expected, and we don't know how to react. In the first part of this article, devoted to Intel's new architecture, we'll try to work out what to expect from the new processor without running any tests, proceeding only from its specifications. The second part, to be published a little later, will verify our assumptions in practice.

A bit of history

The main reason for designing the Nehalem core had nothing to do with Intel's intention to integrate a memory controller into its processors. An integrated memory controller per se makes sense only as a way to raise performance, and Intel has absolutely no reason to make its processors even faster right now: its main competitor has already been left far behind and will hardly catch up in the next couple of years. Thus, it would be a grave mistake to treat Nehalem simply as "Core 2 with an integrated memory controller". Starting our analysis from this false assumption could lead us all to completely wrong conclusions. So what's the real reason behind the new core? To answer that, we'll have to take a closer look at its predecessor, Core 2 (preferably with a fresh eye). For the sake of completeness, our analysis should be not only technical but also historical.

What was the situation Intel faced prior to Core 2? No, let's go back even further: what was the situation at Intel when the company had only just started the Core 2 project? Frankly speaking, it was stressful. That's only an assumption, of course (but a strictly logical one). Intel was probably aware that the NetBurst architecture had reached a dead end much earlier than ordinary users or even independent test labs realized it. The PR and marketing departments could presumably buy some time, but it was crystal clear that it wouldn't last long. So it's logical to assume that Conroe was a rush job*.

* Even the first Intel Pentium M (Banias core) was actually designed in a short time. But it was a mobile processor. And when Intel's Israeli branch faced the task of turning this mobile product into a full-fledged desktop processor (on a tight schedule at that), that was the real fun...

On the other hand, if we consider how long it takes to design an entirely new core (one that is even ideologically different from the old architecture) and compare the announcement dates of the first Pentium D and Athlon 64 X2 processors with that of the first Core 2 processor, we can make a second significant assumption: quite possibly, the Conroe core was not originally intended as a basis for multi-core processors. The original strategy was most likely this: "let's design a good single processor core, and if we need a dual-core processor, we'll just put two of them together". In theory, Conroe's design supports this assumption, and even the Shared L2 Cache concept fits it well (jumping ahead: this intermediate solution was dropped in Nehalem, where only L3 is shared). Moreover, some details (macrofusion, for example) suggest a truly seditious idea: that Conroe may not even have been a 64-bit processor at first! But more on that later.

Thus, we have a curious technical paradox here: Intel's fastest dual- and quad-core processors, Core 2 Duo and Core 2 Quad, are conceptually much older than even the aging AMD Athlon 64 X2, to say nothing of Phenom X3/X4. In fact, AMD and Intel approached the task of designing a modern x86-64 CPU with different priorities. Intel, the traditionalist, focused on a fast processor core. AMD, the pioneer, built many new features even into the single-core Athlon 64, features that were simply crying out for multiple cores (or at least multiprocessing). This time Intel's instinct didn't let it down: processors that would have made good single-core CPUs turned out to be the best multi-core products of the transition period. For example, some test results imply that one of the most useful features of the Shared L2 Cache in Core 2 is its ability to give almost the entire L2 cache to a single core when only one core is busy. However, if we think about the nature of that transition period, it would have been strange to see a different outcome: multiprocessing support was a hot topic at the time, but software developers were in no hurry to do anything about it.

But the transition period is coming to an end, and Intel's R&D department has to answer the question of what to do next. The Core 2 architecture offers a very strong execution core (in fact, the strongest of all x86 solutions), well-balanced dual-core processors, and architecturally much more problematic quad-core models. So what's next? What if the industry's appetite grows and it is ready to "swallow" even 8-core processors? AMD is experiencing certain problems, of course, including technological ones, so we can hardly expect it to be the first to ship an 8-core solution. But on the other hand, in purely architectural terms the modern Phenom is ready for 8 or even 16 cores, and Core 2 was not quite ready for that. That's why Intel needed a new core, or even a new architecture: perfectly scalable, modular, and designed for multi-core systems from the start (Intel uses the term "design-scalable microarchitecture"). Nehalem is the first implementation of this architecture.

Key features of the new architecture

As we have already mentioned, the main feature of the new architecture is its modular design. By the standards of not-so-distant times, the main module is a classic single-core x86 CPU: an execution core, a 64-KB L1 cache split into two equal parts for data and instructions, and a 256-KB L2 cache. Halve the L1 and you get a Pentium III Coppermine, don't you agree? :)
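To make the Coppermine comparison concrete, here is a minimal sketch that models the single-core building block as a record of its cache sizes. The class, field, and method names are hypothetical and chosen only for illustration. The Nehalem figures come from the paragraph above; the Coppermine figures (16 KB + 16 KB L1, 256 KB on-die L2) are that core's commonly cited specifications.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CoreModule:
    """Hypothetical model of the single-core building block; sizes in KB."""
    name: str
    l1_instruction_kb: int
    l1_data_kb: int
    l2_kb: int

    @property
    def l1_total_kb(self) -> int:
        # L1 is split into separate instruction and data halves.
        return self.l1_instruction_kb + self.l1_data_kb

    def cache_layout(self) -> tuple:
        # Everything except the name, so two cores can be compared by layout.
        return (self.l1_instruction_kb, self.l1_data_kb, self.l2_kb)

    def with_half_l1(self) -> "CoreModule":
        # "Halve the L1": shrink both L1 parts, keep L2 unchanged.
        return replace(
            self,
            name=f"{self.name}, L1 halved",
            l1_instruction_kb=self.l1_instruction_kb // 2,
            l1_data_kb=self.l1_data_kb // 2,
        )


# Nehalem core module as described: 64 KB L1 split evenly, 256 KB L2.
nehalem_core = CoreModule("Nehalem core", 32, 32, 256)
# Pentium III Coppermine: 16 KB + 16 KB L1, 256 KB on-die L2.
coppermine = CoreModule("Pentium III Coppermine", 16, 16, 256)

halved = nehalem_core.with_half_l1()
print(nehalem_core.l1_total_kb, halved.l1_total_kb)  # 64 32
print(halved.cache_layout() == coppermine.cache_layout())  # True
```

The comparison deliberately ignores everything but cache sizes; the execution cores themselves are, of course, very different.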

The other units may include:

Shared L3 Cache
Memory controller
QPI bus controller (QuickPath Interconnect)
PCI Express bus controller (not yet implemented)
Power control unit (PCU) and clock generator
Integrated graphics unit (reportedly, it will sit in the same package as the CPU, but on a separate die)

However, we don't think this list is set in stone or that Core i7 cannot include other units. This list of basic components more likely shows Intel's near-term plans for developing the architecture: it's no accident that it includes an integrated graphics controller, which no Core i7 processor has yet. All these components can be combined arbitrarily: some can be removed, and the number of others can be increased. The model we are going to test a bit later (Core i7 920) looks like this:

As you can see, it includes four processor cores, one three-channel DDR3 memory controller, one QPI controller to communicate with the chipset, and a module that generates the clock frequencies the CPU needs and handles power management. However, this is just one of the possible combinations. For example, if the new architecture is used for a server processor, it won't hurt to increase the number of cores and QPI controllers. A mobile CPU may have fewer cores to reduce power consumption, and the fast QPI controller can be replaced with an ordinary PCI Express controller. In theory, L3 can be removed as well, leaving only the bare minimum: one core, a memory controller, and PCI-E (a Celeron?..). Thus, Intel seems to have achieved its main objective: the company now has a modular architecture with several main units that can be combined to build either a low-end processor for a nettop or a multi-core server processor. And it can all be done with the same units: that's the main attraction! However, there are other important changes as well. Let's take a closer look at them.
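The "same units, different counts" idea can be sketched as a multiset of building blocks. This is only an illustration under stated assumptions: the `Unit` names and the `build` helper are hypothetical, the Core i7 920 composition follows the description above (plus its shared L3), and the extra core and QPI counts for the server variant, as well as the core count for the mobile variant, are made-up illustrative numbers, since the text only says "more" or "fewer".

```python
from collections import Counter
from enum import Enum


class Unit(Enum):
    """Building blocks from the list above (names are hypothetical)."""
    CORE = "core (execution core + L1 + L2)"
    SHARED_L3 = "shared L3 cache"
    MEMORY_CONTROLLER = "memory controller"
    QPI = "QPI controller"
    PCIE = "PCI Express controller"
    PCU_CLOCK = "power control unit and clock"
    GRAPHICS = "integrated graphics"


def build(**counts: int) -> Counter:
    """Assemble a hypothetical chip, e.g. build(CORE=4, QPI=1)."""
    return Counter({Unit[name]: n for name, n in counts.items() if n > 0})


def describe(label: str, chip: Counter) -> None:
    parts = ", ".join(f"{n} x {unit.value}" for unit, n in chip.items())
    print(f"{label}: {parts}")


# Core i7 920 as described: 4 cores, shared L3, one three-channel DDR3
# memory controller, one QPI link to the chipset, clock/power module.
i7_920 = build(CORE=4, SHARED_L3=1, MEMORY_CONTROLLER=1, QPI=1, PCU_CLOCK=1)

# Hypothetical server part: more cores and QPI links (counts are illustrative).
server = i7_920 + Counter({Unit.CORE: 4, Unit.QPI: 1})

# Hypothetical mobile part: fewer cores, QPI swapped for PCI Express.
# Counter subtraction drops units whose count falls to zero.
mobile = i7_920 - Counter({Unit.CORE: 2, Unit.QPI: 1}) + Counter({Unit.PCIE: 1})

# The bare minimum from the text: one core, memory controller, PCI-E.
minimal = build(CORE=1, MEMORY_CONTROLLER=1, PCIE=1)

for label, chip in [("Core i7 920", i7_920), ("server", server),
                    ("mobile", mobile), ("minimal", minimal)]:
    describe(label, chip)
```

Every configuration draws from the one shared `Unit` set and differs only in counts, which is the point the paragraph makes about reusing the same units from a nettop chip up to a server processor.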

