Dual Core Yonah: Not Pentium III Anymore, Not Conroe Yet

Last but not least

We have finally seen the launch of the last representative of the long-running P6 line (Pentium Pro, Pentium II, Pentium III, Pentium M), whose history spans more than ten years (from 1995 to 2006). It has been launched under a strange and lightweight name, Intel Core Duo. But that does not change its essence: it is still a Pentium M, albeit heavily modified and implemented as a dual core processor with a shared L2 Cache.

To stay in line with the usual P6 CPU names and to highlight the noticeable microarchitectural differences between the new processor and its immediate predecessor, the Pentium M (Dothan), we'll call it Pentium M2 in this article. We'll also use abbreviations for all processors in this series: PPro, P-II, P-III, P-M, P-M2.

But first let's agree on terminology, since the notions "dual-core" and "dual-processor" are ambiguous. Following Intel's example, popular literature uses the following interpretation: everything in a single package (and inserted into a single socket) is called a processor, while each of the two devices in this package (not necessarily on the same die), which are not connected with each other at the microarchitectural level and execute separate instruction streams, is called a core. On the other hand, each of these cores is a full-fledged processor in the strictest sense of the word. IBM, for example, uses only the strict terminology: two processors on one POWER4 chip. To avoid ambiguity without departing from the emerging terminology, we shall use the term "processor core" or "CPU" for each independent processor and "processor chip" or "socket" for the dual-core processor proper. Accordingly, a system with two processor sockets should be called a dual socket system rather than a dual processor system (regardless of the number of cores in each processor chip).

This terminology issue was not spelled out in the previous article about dual core processors of the new architecture, which drew some criticism.

The key properties of the dual core Yonah are shown on the image above. At first glance, the new processor contains nothing special: the die size is what one would expect from the 65-nanometer process technology, as are the shared 2 MB L2 Cache and the improved power saving technologies. The clock frequency is also as expected: 2.16 GHz now and 2.33 GHz in the second quarter. But if we dig deeper, we'll see that Yonah has surprisingly many important microarchitectural differences from its predecessor, the P-M processor on the Dothan core. Information about them appeared back at the Spring and Fall IDF 2005, but it was overshadowed by news about the new Merom/Conroe CPU architecture. Most interestingly, nothing suggested that such microarchitectural changes were needed for mobile use. I guess Intel has chosen a path of careful evolutionary development of the existing P6+ (P-M) architecture in order to polish various innovations before introducing them into the next-gen Merom/Conroe processors. These innovations may well fail to contribute significantly to performance growth, as they leave many weak spots of the P-III/P-M architecture untouched. But they concern those processor blocks that seemed critical to the future high-performance architecture and about which we made various assumptions.

In this article the new processor will be reviewed mainly from the microarchitectural point of view — that is in operational terms of internal processor units and execution efficiency of user applications. Such characteristics and innovations as power management and virtualization technology (VT) will be just briefly mentioned.

What's been changed?
Main differences of the new processor

The main microarchitectural changes in the new processor are grouped under the Digital Media Boost title.

These changes are quite fragmentary and practically don't concern functional units. They include:

Micro-ops fusion for SSE instructions of all types (SSE/SSE2/SSE3)
SSE instructions are now handled by all three decoders
SSE3 instruction set
Faster execution of some SSE2 instructions as well as integer divide
Enhanced data prefetch

The most radical changes were made in the decoder, where it concerns the handling of SSE instructions. As we mentioned in the previous article, the P-M processor introduced micro-op fusion for decoding x86 instructions. In some cases the decoder generates a single micro-operation (sometimes called a "macro-operation") containing two elementary operations instead of two separate µops. This mostly concerns Load-Ops, where the µop that reads from memory and the µop that executes the operation are replaced by a single macro-op. Such a macro-op is split into two separate actions at a later stage, right before being issued for execution. This procedure resembles that in AMD K7/K8 processors, where any µop may contain an operation for loading from memory or for writing the result back to memory.

But the P-M processor had no micro-op fusion for SSE Load-Ops. Such instructions were still split into separate load and operation µops. Besides increasing the number of µops, this reduced decoder throughput, because such instructions could be handled only in the first, "complex" decoder channel (the other two "simple" channels could generate only a single µop each). Moreover, even regular SSE instructions without a memory operand could be processed only in the first channel of the decoder (this has been the case since the P-III). This limitation applied not only to packed SSE instructions, which generate two 64-bit µops, but also to scalar SSE instructions. Perhaps it stemmed from the difficulty of radically overhauling the decoder to support SSE instructions in the P-III processor (by the way, one of the project managers responsible for creating SSE was our fellow-countryman V. Pentkovsky).

The P-M2 (Yonah) decoder has been completely overhauled: it now supports micro-op fusion for Load-Ops of all types (including SSE) and handles SSE instructions (both scalar and packed) in all three decoder channels. Thus, the maximum decoder throughput for SSE instructions has tripled, from one instruction per cycle to three.
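To make the threefold figure concrete, here is a minimal Python sketch of a deliberately simplified decoder model. It assumes three decoder channels that accept instructions strictly in program order, and it ignores µop counts, instruction lengths, and fetch limits. The function name decode_cycles and the "sse"/"int" labels are invented for this illustration and do not come from Intel documentation.

```python
def decode_cycles(stream, sse_on_all_decoders):
    """Count cycles needed to decode `stream` with three in-order decoder channels.

    stream              -- list of labels, "sse" or "int" (hypothetical labels)
    sse_on_all_decoders -- False models the P-III/P-M restriction (SSE only in
                           the first channel), True models the Yonah decoder
    """
    cycles = 0
    i = 0
    while i < len(stream):
        cycles += 1
        i += 1  # the first ("complex") channel accepts any instruction
        for _ in range(2):  # the two remaining ("simple") channels
            if i >= len(stream):
                break
            if stream[i] == "sse" and not sse_on_all_decoders:
                break  # this SSE instruction must wait for the first channel
            i += 1
    return cycles


pure_sse = ["sse"] * 12
print(decode_cycles(pure_sse, sse_on_all_decoders=False))  # old restriction
print(decode_cycles(pure_sse, sse_on_all_decoders=True))   # Yonah-style decoder
```

In this model a stream of pure SSE instructions decodes at one instruction per cycle under the old restriction and at three per cycle without it. A stream in which every SSE instruction happens to land in the first channel is not slowed down at all, which is why the restriction hurt SSE-heavy code the most.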

This picture shows an example of handling the most difficult SSE instruction, a packed (128-bit) Load-Op. Such an instruction was split into four µops in the P-M (Dothan) processor. According to the picture, the P-M2 (Yonah) processor uses only one fused µop that contains a 128-bit load from memory and a 128-bit packed operation. It's hard to say whether the decoder outputs a single combined 128-bit macro-op to be split into four separate actions at a later stage, or generates two 64-bit macro-ops (each to be split later into two actions). Either way, we can say that the new decoder works by the 4-2-2 scheme, as each decoder channel can now process packed 128-bit SSE instructions, which are handled by functional units as two 64-bit operations. This enhancement brings the P-M2 (Yonah) closer to the AMD K8.

Along with decoder changes, the new processor now supports SSE3 instructions. Besides, some SSE2 unpack and pack instructions were accelerated. Unfortunately, available documents tell us nothing about the execution speed of 64-bit floating point SSE2 multiplication (scalar mulsd and packed mulpd). The P-M processor executes such operations at half speed, producing one result per two cycles for scalar operations and completing one packed instruction per four cycles. That is half the speed of 64-bit SSE2 additions and of 32-bit SSE multiplications and additions. Executing 64-bit multiplication at full speed is essential for a processor that is meant to compete in floating point applications.

As Intel's marketing documents do not mention such an enhancement, we can be fairly sure that these operations are still executed slowly. Under this assumption, the floating point situation of the new processor resembles that of the P-III when it first appeared: execution speed of 64-bit arithmetic remained low due to the halved multiplication speed (for the P-III in x87 mode, for the P-M and P-M2 in both x87 and SSE2 modes). All efforts were focused on increasing the efficiency of 32-bit arithmetic (SSE for the P-III; SSE plus an improved decoder, which is no longer a bottleneck, for the P-M2). But it should be noted that the P-III was a multi-purpose processor, oriented to all applications, while the P-M2 (Yonah) is a processor for mobile applications.

This picture shows an example of improved idiv performance when the number of significant bits in the dividend and divisor is limited to 16 or 24. The most interesting case is the common division of a 24-bit number by a 16-bit number: this operation now takes quite an acceptable time, just four cycles.

Let's proceed from microarchitectural changes to functional ones that have to do with placing two processor cores on the same die. The main element of the new processor, which distinguishes it both from the single core P-M and from other dual core processors of the x86 architecture (Intel Pentium D/XE and AMD Athlon 64 X2/FX-60), is a shared L2 Cache.

A shared L2 Cache provides a higher integration level of processor cores. In systems with separate caches, cores exchange data via a communication interface (bus or switch) that connects them outside the L2 Caches. In systems with a shared cache, such data are available to both processor cores directly from this cache. This organization reduces access times to remote data and decreases the bus (switch) traffic.

But these are not the only advantages of a shared cache. A process, executed in one CPU, may require more space in L2 Cache for its data than a process in the other CPU, so a shared adaptive cache provides more flexibility in their optimal allocation.

Cache space requirements of a process are governed by algorithm locality. If the volume of data, required by a process for a certain period of time (working set size), exceeds the currently allocated size of L2 cache subset, the process tries to capture more space in the cache and take it away from a process with lesser requirements. Let's review efficiency of a shared L2 Cache versus separate caches of the same total size for different processes.

Two processes with the same locality: each process gets half of the cache.
One process: the L2 Cache is fully available to this process.
Two processes with shared (cross) data: cache space is saved, as the data are not duplicated.
Two processes with different locality: the process with higher requirements (worse locality) gets more of the cache space.

Thus, the efficiency of shared and separate caches is the same in the first case. In the other cases, a shared cache is used more efficiently. Only when the locality of one process is much worse than that of the other may the first process push almost all the data of the second process out of the cache and thus slow down its execution. Such situations may also happen in other systems with shared resources. For example, one of the processes in any dual core processor may require many more memory accesses than the other, thus slowing the other's execution down. On the whole, the situation with extremely asymmetric processes requires additional analysis considering various criteria of "fair" resource allocation.
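The difference between the first and the last case can be illustrated with a toy simulation. The Python sketch below is not a model of the real Yonah cache: it assumes a fully associative cache with LRU replacement, two processes that each walk cyclically over their own working set, and strictly interleaved accesses. The class and function names, as well as the sizes used, are invented for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with LRU replacement (a simplifying assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)        # mark as most recently used
        else:
            self.misses += 1
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used


def count_misses(ws_a, ws_b, total_lines, n_accesses, shared):
    """Total misses for two processes with working sets of ws_a and ws_b lines."""
    if shared:
        cache_a = cache_b = LRUCache(total_lines)  # one cache for both cores
    else:
        cache_a = LRUCache(total_lines // 2)       # private halves
        cache_b = LRUCache(total_lines // 2)
    for i in range(n_accesses):
        cache_a.access(("A", i % ws_a))  # process A walks its working set
        cache_b.access(("B", i % ws_b))  # process B walks its working set
    return cache_a.misses if shared else cache_a.misses + cache_b.misses


# Equal locality: both organizations see only the cold misses.
print(count_misses(4, 4, 8, 600, shared=True), count_misses(4, 4, 8, 600, shared=False))
# Different locality: process A no longer fits into its private half.
print(count_misses(6, 2, 8, 600, shared=True), count_misses(6, 2, 8, 600, shared=False))
```

With equal working sets both organizations behave identically. With working sets of 6 and 2 lines, the shared cache of 8 lines holds everything, while the private half of 4 lines thrashes on every access of the larger process.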

According to the Yonah Performance Preview, access time to the shared L2 Cache has increased to 14 cycles versus 10 cycles in the single core Dothan processor. These additional cycles are the price of sharing: providing access from two processor cores and arbitrating between them. Of course, the increased access time may reduce the performance of the processor in some applications (all other things being equal). But our experience in using and testing various processors shows that the effect of L2 Cache access time is usually not very strong. For example, the effective access time to a heavily loaded L2 Cache in AMD K7 and K8 processors reaches 20 and 16 cycles respectively. This increase in access times stems from the exclusive cache organization, yet it has little effect on performance.
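A rough way to put the four extra cycles in perspective is the textbook estimate of average access time. The Python sketch below uses the 10 and 14 cycle L2 latencies quoted above. The L1 latency of 3 cycles and the 5% L1 miss rate are illustrative assumptions rather than measured values, and the model assumes that every L1 miss hits in L2 and that out-of-order execution hides none of the latency.

```python
def average_access_time(l1_latency, l1_miss_rate, l2_latency):
    """Average load latency in cycles when every L1 miss is served by L2."""
    return l1_latency + l1_miss_rate * l2_latency


L1_LATENCY = 3       # assumed value, in cycles
L1_MISS_RATE = 0.05  # assumed value, application dependent

dothan = average_access_time(L1_LATENCY, L1_MISS_RATE, l2_latency=10)
yonah = average_access_time(L1_LATENCY, L1_MISS_RATE, l2_latency=14)
print(f"Dothan: {dothan:.2f} cycles, Yonah: {yonah:.2f} cycles")
```

Under these assumptions the average cost of a load grows by the miss rate times four cycles, that is by a fraction of a cycle, which is consistent with the observation that L2 latency rarely dominates performance.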

Yonah performance results will require confirmation by other test procedures. It will also be necessary to determine the L2 Cache throughput. Shared access for two processor cores ideally requires doubled throughput. But the cache access speed of 32 bytes per cycle quoted for the single core P-III and P-M processors seems excessive and is usually not confirmed by direct measurement. Perhaps the maximum cache throughput of the new processor is unchanged at 32 bytes per cycle, and the requirements of simultaneous access by two CPUs are met by a more rational organization and pipelining of cache accesses.

Thus, thanks to the shared L2 Cache, Yonah is the first highly integrated dual core processor of the x86 architecture. As dual core processors start to appear on the mass market, the demands on such processors as well as on application parallelization methods are bound to rise. The appearance of processors with a shared L2 Cache should lead to new programming methods that provide tighter integration of the executed processes and faster overall execution.

What hasn't changed?
Inherited weak spots

Unfortunately, Intel has recently stopped publishing detailed information about the microarchitecture of its new processors. That's why their characteristics have to be deduced from tests and from brief marketing materials and announcements, which enumerate only the basic differences from the previous models. Testing mobile processors is also hampered by the fact that such processors are not used in servers and cannot be tested remotely, and they are not yet widespread enough for a tester to get hold of one on his own.

Another probable reason for Intel to withhold information about Pentium M processors is its desire to present them as an independent generation, different from the old P6 family (PPro, P-II, P-III): admitting that P-M processors belong to the P6 family might look like a step back compared to the main desktop architecture, NetBurst (Pentium 4).

Indeed, the P-M processor retains a lot of P6 features, including its weak spots. Very important limitations of the architecture, which have to do with the decoder (they are described in the previous section), are eliminated in the new P-M2 processor (Yonah). But the other limitations inherited from the P-III/P-M processors remain. The first test results indirectly confirm this: Yonah differs little from its predecessor (Dothan) in single-threaded performance, slightly outperforming it in some tests and falling slightly behind in others.

The main disadvantage of the P-M and P-M2 processors is the weak floating point unit. This unit is 80 bits wide for x87 operations and 64 bits wide for SSE operations. The latter means that 128-bit packed SSE (4 x 32 bit) and SSE2 (2 x 64 bit) instructions are split into two 64-bit operations that are executed one after the other, at half the speed.

If all 64-bit operations were executed at full speed, FPU performance would be acceptable for processors of this class. Unfortunately, the multiplication unit does not meet this requirement. It runs at full speed (one result per cycle) only in 32-bit SSE scalar mode. It also operates at the optimal speed in SSE packed mode (four 32-bit results per two cycles). In x87 mode and in 64-bit SSE2 scalar mode, multiplication runs at half speed, and a packed SSE2 multiplication instruction takes twice as long again.

Thus, even though P-M2 decoder can generate enough SSE2 µops per cycle and µops can be issued for execution into separate multiplication and addition units via two independent ports, the maximum execution speed of 64-bit floating point operations is reduced twofold, from two operations per cycle to one (in case of the same number of multiplication and addition operations).
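The twofold reduction follows from simple arithmetic, sketched below in Python. The model assumes that the multiplication and addition units work fully in parallel, that there are no dependencies between operations, and that the only limit is the issue interval of each unit (one cycle for addition, one or two cycles for multiplication, as described above). The function names are invented for this illustration.

```python
from fractions import Fraction


def cycles_needed(n_mul, n_add, mul_interval, add_interval=1):
    """Cycles to issue all operations when the two units work in parallel."""
    return max(n_mul * mul_interval, n_add * add_interval)


def ops_per_cycle(n_mul, n_add, mul_interval):
    """Sustained floating point operations per cycle for the given mix."""
    return Fraction(n_mul + n_add, cycles_needed(n_mul, n_add, mul_interval))


# Equal numbers of scalar multiplications and additions.
print(ops_per_cycle(100, 100, mul_interval=1))  # full-speed multiplier (32-bit SSE)
print(ops_per_cycle(100, 100, mul_interval=2))  # half-speed multiplier (64-bit SSE2, x87)
```

With a half-speed multiplier the addition unit sits idle half of the time, so the balanced mix runs at one operation per cycle instead of two.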

Another disadvantage of the P-III and P-M microarchitecture is that 100% execution speed cannot be reached even if instructions are optimally scheduled per cycle by a programmer or a compiler. It probably has to do with the organization of the µop queue (buffer), from which µops are fetched for out-of-order execution in functional units as soon as their operands are ready (such a buffer is usually called a "reservation station"). Other state-of-the-art processors (Intel Pentium 4, AMD K8, IBM PPC970) have several such queues (buffers), one per group of related functional units, so only one µop per cycle has to be fetched for execution from each queue. The P-III processor has a single buffer with 20 entries, from which up to five µops can be fetched per cycle. To all appearances, this operation is too complex and is not always performed optimally at the necessary speed. P-M test results (Banias, Dothan) show no changes in this area. The organization of this buffer in the P-M2 (Yonah) processor is presumably unchanged as well.

We can also note a number of quantitative limitations that have to do with Out-of-Order execution, in particular a relatively short queue (buffer) of 20 entries and a small Reorder Buffer of 40 entries. We assume that both buffers are a bit larger in the P-M processor than in the P-III, but neither the documentation nor the available promotional materials mention it.

And finally, another important architectural drawback of the Yonah processor (as well as of its predecessors) is the lack of support for the 64-bit EM64T mode (x86-64). Introducing this mode would have required a considerable overhaul of the integer and address functional units as well as wider registers and internal buses. It goes without saying that this couldn't be done within the evolution of the P6+ family. Thus, Yonah can be regarded only as a temporary, transitional processor on the way to the new family of all-purpose Merom/Conroe/Woodcrest processors, which will add, along with EM64T, a whole range of solutions to increase performance.

Results of the P6 evolution

In conclusion, let's review the evolution of the P6 family and sum up. For this purpose we'll single out five main representatives of the family, which demonstrate the development of the processor microarchitecture: PPro, P-II, P-III, P-M, P-M2.

Pentium Pro

The first superscalar processor of the x86 architecture with Out-of-Order execution. It has minor architectural extensions compared to the previous generation Pentium processor: new conditional move and compare instructions are added. It's designed as an assembly with separate dice for the processor and the L2 Cache. It was mostly used for server applications and came with caches of various sizes, up to 1 MB.

Pentium II

The MMX instruction set is added. Enlarged L1 Caches, improved execution of 16-bit code. It's implemented as a daughter board with a separate L2 Cache die operating at half frequency. It was also manufactured in the following modifications: OverDrive, compatible with the Pentium Pro socket; Celeron without L2 Cache; Celeron and Dixon with an integrated L2 Cache of a reduced size; Xeon with a full-frequency external cache of an increased size.

Pentium III

The SSE instruction set and memory prefetch instructions are added. It was initially implemented as a daughter board with a separate cache; the later modifications have a built-in full-frequency L2 Cache with a 256-bit bus. It was also manufactured in the following modifications: Celeron with an L2 Cache of a reduced size; Xeon with a full-frequency external cache of an increased size.

Pentium M

SSE2 instruction set is added. Micro-ops fusion, enlarged L1 Caches, faster operations with the hardware stack, significantly improved branch prediction. Radical power saving technologies are introduced due to the orientation to mobile applications. A new processor bus is introduced, copied from the Pentium 4 processor. There are a number of microarchitectural improvements, floating point multiplication/addition operations are divided between two separate execution ports. Delays due to partial register stall are fixed and the execution speed of MMX addition instruction is doubled in the second model of the P-M (Dothan) processor.

Pentium M2 (Core Duo)

SSE3 instruction set is added. A significantly improved decoder with support for SSE micro-op fusion and handling packed SSE instructions in all decoder channels. Faster execution of some instructions, improved memory prefetch. Its die contains two processor cores and a shared L2 Cache. It supports VT and improved power saving technologies.

Development of P6 microprocessors was accompanied by improvements in electronic technologies and by shrinking process geometry together with growing L2 Cache size. The performance of these processors has grown thanks to the improved microarchitecture, elimination of bottlenecks, enlarged caches, and a higher clock. In some parameters it has reached the performance of the competing products (AMD Athlon 64 and Intel Pentium 4).

Below is a summary table with main processors of the P6 family.

Processor	Codename	Process, µm	L1 I/D Caches, KB	L2 Cache, KB
P Pro	P6	0.5	8/8	256
P-II	Deschutes	0.25	16/16	512
P-III	Tualatin	0.13	16/16	512
P-M	Dothan	0.09	32/32	2048
P-M2	Yonah	0.065	32/32	2048

Processor	Form-factor	Clock, GHz	Architectural enhancements	Microarchitectural improvements
P Pro	Assembly	0.20	+conditional operations	New microarchitecture
P-II	External L2	0.45	+MMX	External L2, improved 16-bit
P-III		1.40	+SSE	Fast L2, prefetch
P-M		2.26	+SSE2	µop fusion, improved branch prediction
P-M2	2 cores	2.33	+SSE3	SSE µop fusion, shared L2 cache

Another step on the way to Conroe

We have come to the conclusion that Yonah can be regarded as a transitional model created to polish various innovations before introducing them into the next-gen Merom/Conroe processors. Now let's see which features of the future processor have already been implemented and which are yet to be added. Of course, we are speaking only of those subsystems that could be improved in processors of the old P6 architecture without a radical overhaul of the processor structure. Such fundamental changes as increasing the number of instructions decoded and processed per cycle from three to four, or supporting the 64-bit EM64T (x86-64) mode, require a considerable overhaul of many subsystems and units, so they couldn't appear in the process of evolution.

In the previous article about processors of the new architecture we reviewed various subsystems of a processor from the point of view of the need to improve them. The main emphasis was on the need to overhaul the decoder in order to support all micro-op fusion modes and to generate SSE micro-ops at the necessary rate. Judging from this review of Yonah, we can conclude that the problem has been solved successfully. Moreover, it has been solved in the most efficient way, including conversion of the most complex SSE instruction, the packed (128-bit) Load-Op, into a single macro-op.

Besides, another fundamental element of the new architecture has appeared: an L2 Cache shared between the two processor cores. Improved power management and VT support have been added as well.

What else must be changed in processors of the new architecture to provide the necessary level of performance and competitiveness? Evidently, the floating point unit. The minimum is to implement full-speed 64-bit multiplication. Given that this improvement was made in the Pentium 4 processor, which operates at a much higher clock frequency, there is no doubt that it can easily be done in the new architecture. But with that change alone, floating point performance will still be inferior to that of the old family (Pentium 4).

Thus we can assume (and some sources confirm it) that Merom/Conroe processors will have 128-bit full-speed floating point SSE arithmetic for all operand types. Under this assumption, the multiplication and addition units will become wider and retain their full speed. That is, each of them will be able to output one 128-bit result per cycle or, to be more exact, two 64-bit (SSE2) or four 32-bit (SSE) results in packed operations. In scalar mode, still only one operation of each type (multiplication and addition) will be executed per cycle. Thus, in scalar mode the new processor will just catch up (in formal FPU speed) with the AMD K8, but in packed mode it will outperform it twofold, reaching a peak of four 64-bit floating point results per cycle.
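The peak figures in this comparison can be reproduced with a few lines of Python. The sketch assumes one multiplication unit and one addition unit, both running at full speed, and treats the unit width as the only difference between the designs: 64 bits for the AMD K8 and 128 bits for the assumed Merom/Conroe units. The function name is invented for this illustration, and the result is a formal peak rather than a prediction of real performance.

```python
def peak_results_per_cycle(unit_width_bits, operand_bits, packed, units=2):
    """Formal peak of floating point results per cycle.

    units=2 stands for one multiplication and one addition unit, each
    assumed to accept a new operation every cycle.
    """
    per_unit = unit_width_bits // operand_bits if packed else 1
    return units * per_unit


for name, width in (("K8, 64-bit units", 64), ("Merom/Conroe, assumed 128-bit units", 128)):
    scalar = peak_results_per_cycle(width, 64, packed=False)
    packed = peak_results_per_cycle(width, 64, packed=True)
    print(f"{name}: scalar {scalar}, packed {packed} 64-bit results per cycle")
```

In scalar mode both designs reach two results per cycle, while in packed 64-bit mode the wider units double the peak from two to four.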

The assumed doubling of peak performance must be supported by the instruction decoder. As we have already seen above, the decoder can output the necessary number of 128-bit µops (reaching maximum FPU speed takes two such µops per cycle out of a total pipeline width of four). Besides, the doubled performance will require faster reading of data from the L1 Cache: at least one 16-byte read per cycle.

We have information that the new processors will have an additional floating point instruction set, which may be called SSE4. Like SSE3 before it, the new extension will probably not be an independent instruction set (like SSE or SSE2) but an addition that widens some bottlenecks and provides more flexibility in using the existing instruction sets. For example, we are interested in better ways of operating on the 64-bit halves of XMM registers (separate loading/storing, swapping, and other conversions), which would allow packed SSE2 instructions to be used for processing groups of arrays whose relative offsets are not multiples of 16 bytes. This limitation is a serious obstacle to the efficient use of packed instructions in calculations.

Finally, processors of the new architecture must have a seriously improved Out-of-Order execution mechanism, possibly completely redesigned in light of the experience gained from designing the Pentium 4 and from competing products. The architecture announcement spoke of enlarged buffers. This may also imply a modified structure of these buffers, for example a switch from a system with a single queue (buffer) for macro-ops to a system with several queues assigned to the corresponding functional units. This would allow operations to be issued for execution at maximum speed without complicating the structure or lengthening the processor cycle.

Taking into account the wider processor lanes, there must be an increased number of functional units of some types (integer arithmetic and logic units in the first place) as well as a modified structure of execution ports.

And in conclusion, we should remind you that the new processors will support the EM64T (x86-64) mode, that is, they will contain a full set of 64-bit integer and address units.

Thus, we can conclude that the microarchitecture of the future Merom/Conroe/Woodcrest processors will be based on the same fundamental principles as the P6/P6+ architecture, but it will be overhauled with regard to many years' experience with competing products, new electronic technologies, and new user requirements. We can say for sure that processors of the new architecture will differ more from their predecessors (P-M/P-M2) than 64-bit AMD K8 processors differ from their 32-bit predecessors (K7).

Conclusion

In conclusion we'll publish characteristics of some Yonah models. Let me repeat that this processor got a new name: "Core Duo" for the dual core modification and "Core Solo" for the single core one. The designation system for CPU models has also changed. It now consists of a letter indicating the maximum level of power consumption (U: up to 15 Watts, L: up to 25 Watts, T: up to 50 Watts, E: higher than 50 Watts) and a four-digit code that starts with 1 or 2 (the number of CPU cores). At present the top representatives in each series are the processors with normal and reduced power consumption, T2600 (2.16 GHz), T1300 (1.66 GHz) and L2400 (1.66 GHz) with a 667 MHz bus, as well as the models with ultra low power consumption, U2500 (1.06 GHz) and U1400 (1.20 GHz) with a 533 MHz bus. Higher-performance models will be released in the second quarter: T2700 (2.33 GHz), T1400 (1.83 GHz) and L2500 (1.83 GHz). There is still no information about E-indexed processors.
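The designation scheme described above is regular enough to be decoded mechanically. The following Python sketch does just that, using only the rules stated in this article; the function name decode_model and the shape of the returned dictionary are choices made for this illustration.

```python
POWER_CLASSES = {
    "U": "up to 15 W",
    "L": "up to 25 W",
    "T": "up to 50 W",
    "E": "higher than 50 W",
}


def decode_model(model):
    """Split a model designation such as "T2600" into its documented parts."""
    letter, digits = model[:1], model[1:]
    if letter not in POWER_CLASSES or len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"unrecognized model designation: {model!r}")
    cores = int(digits[0])  # the first digit is the number of CPU cores
    if cores not in (1, 2):
        raise ValueError(f"unexpected core count in {model!r}")
    return {
        "power": POWER_CLASSES[letter],
        "cores": cores,
        "name": "Core Duo" if cores == 2 else "Core Solo",
    }


for model in ("T2600", "T1300", "L2400", "U2500", "U1400"):
    print(model, decode_model(model))
```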

Somewhat later we'll see another modification of this processor, codenamed Sossaman, intended for servers with reduced power consumption and heat dissipation. It supports dual socket configurations (four CPU cores per system). The new processor supports the monitor and mwait instructions for synchronizing processes.

The first Core Duo results demonstrate a good performance level, comparable in single-threaded applications to that of the P-M (Dothan). In multi-threaded applications it noticeably exceeds the P-M at a similar power consumption level and comes close to the desktop level. For example, the SPECint_rate_base2000 and SPECfp_rate_base2000 results, which characterize the throughput of a multi-processor (multi-core) system, are 34.9 and 27.4 respectively for the top T2600 model (2.16 GHz). For comparison, the Pentium XE 955 (3.46 GHz) with 2 x 2 MB cache scores 41.1 and 36.6 in these tests. Two other dual core processors with 2 x 1 MB caches score lower than that: Pentium XE 840 (3.2 GHz), 33.5 and 31.9; AMD Opteron 280 (2.4 GHz), 35.9 and 30.8 (in 32-bit mode). Taking into account that Yonah is a transitional model on the way to a new architecture, we can conclude that it is a solid mobile processor that works no miracles but lives up to our expectations. It was evidently a good decision to entrust the creation of such processors to the team from a hot country (Israel). Let's hope that the next family, Merom/Conroe, designed by the same team, will also live up to our expectations.
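Since the processors compared here run at very different clock frequencies, it may help to normalize the quoted results by clock. The Python sketch below does only that, using the figures given in this article. Results per GHz are a crude indicator that ignores memory subsystems, compilers, and power consumption, so it should be read as an illustration rather than a ranking.

```python
# (clock in GHz, SPECint_rate_base2000, SPECfp_rate_base2000), as quoted above
RESULTS = {
    "Core Duo T2600": (2.16, 34.9, 27.4),
    "Pentium XE 955": (3.46, 41.1, 36.6),
    "Pentium XE 840": (3.20, 33.5, 31.9),
    "Opteron 280": (2.40, 35.9, 30.8),
}


def per_ghz(results):
    """Return {name: (int_rate per GHz, fp_rate per GHz)}."""
    return {
        name: (int_rate / ghz, fp_rate / ghz)
        for name, (ghz, int_rate, fp_rate) in results.items()
    }


for name, (int_per_ghz, fp_per_ghz) in per_ghz(RESULTS).items():
    print(f"{name:15s} int/GHz {int_per_ghz:5.1f}   fp/GHz {fp_per_ghz:5.1f}")
```

By this measure the T2600 has the highest integer result per GHz of the four, while in floating point it is roughly level with the Opteron 280, which fits the picture of a strong integer core with a comparatively weak FPU.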

Reference list

O. Bessonov. New Wine into Old Skins. Conroe: Grandson of Pentium III, Nephew of NetBurst? iXBT.com, 2005.
M. Eden. Innovate & Invigorate. Intel Developer Forum, IDF Spring 2005.
M. Eden. Taking Mobile Mainstream. Intel Centrino Mobile Technology. Intel Developer Forum, IDF Fall 2005.
S.L. Smith, D. Perlmutter. Intel Next Generation Multi-core Platforms. Intel Developer Forum, IDF Fall 2005.
IA-32 Intel Architecture Optimization Reference Manual. Intel, 2005.
S. Gochman et al. The Intel Pentium M Processor: Microarchitecture and Performance. Intel Technology Journal, V.7, Issue 2, 2003.
J. Keshava, V. Pentkovski. Pentium III Processor Implementation Tradeoffs. Intel Technology Journal, V.3, Q2, 1999.
A. Fog. How to optimize for the Pentium family of microprocessors. 2004.
H.H. Sean Lee. P6 & NetBurst Microarchitecture. School of ECE, Georgia Institute of Technology, 2003.
S. Wasson. IDF Fall 2005 wrap. Intel aims for more performance per watt. The Tech Report, 2005.
Samuel D. IDF Fall 2005 : Visite guidee. X86-secret, 2005.
Anand Lal Shimpi. Intel's 90nm Pentium M 755: Dothan Investigated. AnandTech, 2004.
Anand Lal Shimpi. Intel Yonah Performance Preview - Part I: The Exclusive First Look at Yonah. AnandTech, 2005.

Oleg Bessonov (bess@ipmnet.ru)
January 20, 2006
