We continue our analysis of the performance optimization and power management technologies of modern processors, which began with our articles on these technologies in Intel Pentium 4 and Intel Pentium M. It is only natural to extend this material to Intel's new processors based on the Intel Core microarchitecture. They are represented here by the latest Intel Core 2 Extreme QX6700, a quad-core (2x2) processor codenamed Kentsfield.
Intel Core 2: New Technologies and Their Peculiarities
First of all, we are going to give a recap of the new optimized performance and power management technologies implemented in Intel Core 2 processors, as well as their peculiarities that have to do with multi-core configuration of these processors.
Let's have a look at the screenshot of the latest version of RightMark CPU Clock Utility 2.2RC3 (pre-release), which fully supports dual- and quad-core processors of the Intel Core 2 family.
The main screenshot shows that the quad-core Intel Core 2 Extreme (Kentsfield) processor supports all five performance and power saving technologies: Enhanced Intel SpeedStep (EIST), Thermal Monitor 1 (TM1), Thermal Monitor 2 (TM2), the old On-Demand Clock Modulation (ODCM), and Enhanced C States (CxE). Intel Pentium 4 and Pentium D 600, 800, and 900 series processors offered only the Enhanced Halt (C1) State. In Intel Core 2 processors (as well as Intel Core Solo/Duo processors), this function has been extended to all possible idle states of a processor, including Stop Grant (C2), Deep Sleep (C3), and Deeper Sleep (C4).
A significant difference between Intel Core 2 processors and the previously reviewed Intel processors is the Digital Thermal Sensor built into each core. These sensors are also available in Intel Core Solo/Core Duo models. They can be read independently for each core, and this feature is implemented in the latest RMClock versions. From the technical point of view, the returned value is a negative temperature offset relative to the maximum admissible operating temperature of a CPU core (junction temperature, Tj). The latter is published in processor specifications: 100°C for Intel Core Solo/Core Duo as well as for Intel Core 2 processors. But we know from experience that an additional shift should be subtracted from these temperature readings. It is specified in one of the additional model-specific registers (MSR) of a processor that pertain to these functions, and it usually amounts to 16°C. It will be taken into account in future RMClock 2.2.
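The temperature calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the bit layout (digital readout in bits 22:16, "reading valid" flag in bit 31) is the one Intel documents for its thermal status MSR and is not spelled out in this article, the function name `core_temperature` is hypothetical, and the 16°C shift is the "usual" value mentioned above rather than something read from a real register.

```python
# Assumed bit layout of the thermal status MSR (per Intel's documentation,
# not taken from this article).
READOUT_SHIFT = 16   # digital readout occupies bits 22:16
READOUT_MASK = 0x7F
VALID_BIT = 1 << 31  # "reading valid" flag

TJ_MAX_C = 100       # junction temperature quoted in the article
EXTRA_SHIFT_C = 16   # "usual" additional shift; really model-specific


def core_temperature(raw_status, tj_max=TJ_MAX_C, extra_shift=EXTRA_SHIFT_C):
    """Return the core temperature in degrees C, or None if the reading is invalid."""
    if not raw_status & VALID_BIT:
        return None
    readout = (raw_status >> READOUT_SHIFT) & READOUT_MASK  # degrees below Tj
    return tj_max - readout - extra_shift


# Made-up raw value: valid reading, 20 degrees below Tj.
raw = VALID_BIT | (20 << READOUT_SHIFT)
print(core_temperature(raw))                  # 64
print(core_temperature(raw, extra_shift=0))   # 80
print(core_temperature(20 << READOUT_SHIFT))  # None (valid bit not set)
```

Passing `extra_shift=0` reproduces the reading without the additional correction, that is, plain Tj minus the sensor offset.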
Getting back to the enhanced halt states of a processor, the Advanced CPU settings screenshot shows that desktop Intel Core 2 processors have only the Enhanced Halt (C1) State (C1E) enabled by default. This is because deeper CPU sleep modes are generally not used in desktop platforms: an operating system does not detect them in the information provided by the ACPI BIOS. They are important for mobile platforms. For example, Intel Centrino Duo usually enables the enhanced modes C2E and C4E by default.
As for TM1 and TM2, the screenshot shows the main difference in their implementation compared to the previous Intel processors: Intel Core 2 processors can use both technologies simultaneously, which is done by default in this case. But as we shall show below, this makes no sense in itself, because simultaneous usage of TM1 and TM2 requires Extended Throttling to be enabled. The latter consists in emergency enabling of TM1 when TM2 is not efficient enough to keep CPU temperature within the official admissible range. Despite the well-known high efficiency of TM2, this can easily happen with quad-core Intel Core 2 Quad / Intel Core 2 Extreme processors. So we can say that extended throttling is included in these processors for a reason.
The last of the additional settings, Sync TM1 on CPU Cores, is disabled by default. It consists in using the same CPU clock modulation mode for all cores. To all appearances, this mode is indeed of little use: each core (including even those that are part of the "common core") can have its own clock modulation mode, depending on its load and temperature. On the one hand, forced TM1 sync can contribute to more efficient reduction of the core temperature on the whole. On the other hand, it can lead to performance drops in case of overheating. The manufacturer tries to keep performance high even in an overheated processor, so this mode is not used by default.
Speaking of the independence or, on the contrary, dependence of separate cores on each other, we proceed to the analysis of the performance and power management functions of Intel Core 2 processors, because their peculiarities have to do with this very fact. As is known, classic dual-core Intel Core 2 processors, as well as Intel Core Duo, are characterized by a shared L2 Cache. As this resource operates at the full CPU core frequency, it becomes clear that both cores of dual-core Intel Core 2 (and Intel Core Duo) processors should always operate at the same clock rate. Even though each core has its own MSRs to control its performance and power consumption, an attempt to set different clock rates (to be more exact, different FSB multipliers, FID) for different CPU cores results in an immediate system freeze.
As only EIST and TM2 have to do with the CPU clock rate directly, these parameters should be identical in both cores. Considering that programming TM2 is not available for Intel Core 2 processors (this MSR is read-only in these processors), the problem of syncing core clock rates comes down to controlling them with EIST. A processor driver or a control utility like RMClock should be aware of this peculiarity of managing performance of dual-core Intel processors with EIST. At the same time, as we have already mentioned above, each core can have its own throttling level (clock rate modulation, either on demand via ODCM or automatically via TM1), because throttling affects only the execution resources of a processor core and has no effect on its L2 Cache.
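The constraint above (one FID for both cores sharing an L2 cache, independent throttling levels) can be illustrated with a small sketch. The function `plan_shared_l2_pair` and the policy of choosing the highest requested multiplier are assumptions made for illustration only; the article does not describe how RMClock or a processor driver actually resolves conflicting requests.

```python
def plan_shared_l2_pair(requests):
    """Resolve per-core requests for two cores that share an L2 cache.

    requests: list of (fid, throttle_percent) tuples, one per core.
    Returns a list of the same shape in which every core gets the same FID
    (assumed policy: the highest one requested), while each core keeps its
    own throttling level.
    """
    common_fid = max(fid for fid, _ in requests)
    return [(common_fid, throttle) for _, throttle in requests]


# Hypothetical requests: core 0 wants 6x without throttling,
# core 1 wants 10x with 50% clock modulation.
print(plan_shared_l2_pair([(6, 100), (10, 50)]))  # [(10, 100), (10, 50)]
```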
On the face of it, controlling the performance and power consumption of quad-core processors, represented by Intel Core 2 Extreme QX6700, may seem more difficult. But this is so only at first glance. In fact, the core of this processor is like two independent Intel Core 2 Duo E6700 processors in a single package, so each of these independent "dual-core cores" can also be controlled independently.
Our processor is detected by an operating system as four system processors. As we know, a system processor can mean anything: a full-fledged processor core (for example, Intel Pentium M), a constituent part of a core (Intel Core Duo or Intel Core 2), or a logical processor (Intel Pentium 4 with Hyper-Threading). Each system processor should be mapped to a physical device for control. Then system processors should be grouped by the physical processor/core they belong to, so that each group can be controlled as a single device. This task is solved well by the so-called APIC ID of a processor, unlike the system processor number, which an operating system can assign arbitrarily. In quad-core Intel Core 2 processors, the least significant bit of the APIC ID stands for the index of the "dependent" ("secondary") processor core inside the "independent" ("main") core (0 or 1). The next bit of the APIC ID (the value shifted to the right by one) reflects the index of the "independent core" inside a CPU package. The table below illustrates it.
The table shows that the APIC ID of a processor may not coincide with the number of a system processor, so you shouldn't rely on this number. Note that the above layout of Intel Core 2 Extreme QX6700 cores formally coincides with that of the previous extreme dual-core processors with Hyper-Threading, for example, Intel Pentium Extreme 965. The only difference is that in that case the index of the "main" core corresponds to the core index, and the index of the "secondary" core corresponds to the index of the logical processor, which is a part of the core. The analogy is complete.
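The APIC ID layout described above can be sketched in a few lines. The helper names and the example mapping of system processor numbers to APIC IDs are hypothetical (they are not taken from the table); only the meaning of the two low bits follows the text.

```python
from collections import defaultdict


def decode_apic_id(apic_id):
    """Split an APIC ID into (main core index, secondary core index)."""
    secondary = apic_id & 1    # least significant bit
    main = (apic_id >> 1) & 1  # next bit
    return main, secondary


def group_by_main_core(apic_ids):
    """Group system processors by the "main" (dual-core) unit they belong to.

    apic_ids: dict mapping system processor number -> APIC ID.
    """
    groups = defaultdict(list)
    for cpu, apic_id in sorted(apic_ids.items()):
        main, _ = decode_apic_id(apic_id)
        groups[main].append(cpu)
    return dict(groups)


# Hypothetical enumeration in which system processor numbers and APIC IDs differ.
example = {0: 0, 1: 2, 2: 1, 3: 3}
print(group_by_main_core(example))  # {0: [0, 2], 1: [1, 3]}
```

With such a grouping, a control utility would apply one EIST setting per group and could still throttle each system processor separately.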
Intel Core 2: Tests
We have reviewed the main performance and power management technologies of Intel Core 2 processors and their peculiarities in these processors. It's high time to proceed to tests.
EIST and CxE technologies in Intel Core 2 processors behave just like in the previously reviewed Intel Pentium 4 and Intel Pentium M processors, apart from the above-mentioned peculiarities that have to do with the multi-core architecture. We are much more interested in TM1 and TM2, and in extended throttling in particular. Especially as the quad-core processor overheats quite often, as it turned out...
First of all, let's have a look at the monitoring curves in idle mode (we monitor the first core of the processor, "CPU 0"; the other cores behaved in a similar way).
We can see that the "nominal" core clock rate drops to the minimum of 1600 MHz (266 MHz FSB, minimal multiplier 6x) due to the enhanced halt state CxE (in this case, C1E). At the same time, the CPU throttling frequency reading (the real CPU core clock rate) returns a nearly maximum value, because the process of reading is itself an inevitable CPU load that switches the processor from C1E mode into C0. Core temperature is minimal (about 33°C). We should note a peculiarity of the C1E technology in Intel Core 2 processors: the CPU core voltage (VID) measured in idle mode almost always remains at the maximum level, while the FSB multiplier (FID) goes to the minimum. This seems to be a peculiarity of measuring VID on this CPU type, because the C1E technology itself puts a processor into the minimal power consumption mode, characterized by a minimum frequency (FID) and voltage (VID).
We load the processor with our simple test application (StressTest). In this case we started it in four-thread mode to load the processor completely. The clock rate and load of the processor are maximized, and the core temperature quickly grows to an impressive level.
Even with the standard (and quite noisy!) cooling system, we soon see the situation shown on the screenshot. That's right: the overheating protection system has reached its threshold. The core temperature is about 81°C.
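The relation between the FSB clock, the multiplier (FID), and the core clock rate used above is simple arithmetic. In this sketch the exact FSB value of 266.67 MHz and the maximum multiplier of 10x are assumptions inferred from the 1600 MHz and 2.67 GHz figures quoted in this article; the function name is hypothetical.

```python
FSB_MHZ = 266.67  # assumed exact value of the "266 MHz" FSB


def core_clock_mhz(fid, fsb_mhz=FSB_MHZ):
    """Nominal core clock rate for a given FSB multiplier (FID)."""
    return fsb_mhz * fid


print(round(core_clock_mhz(6)))   # 1600 (minimal multiplier, C1E / TM2 level)
print(round(core_clock_mhz(10)))  # 2667 (assumed maximal multiplier)
```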
That's how throttling works in case of standard cooling.
Clock rate curves (as well as multiplier curves) of the processor show "jumps" to the minimal performance level: 1600 MHz, 6x FSB multiplier. That's how TM2 works, as we know well from our previous tests. In this configuration (TM1, TM2, and Extended Throttling enabled), TM2 is the first to snap into action, which looks quite natural considering its high efficiency. By the way, we can judge that efficiency by the core temperature curve: it stays at the same level of 81°C. And finally, you can see the CPU Load curve drop to approximately 97%. In other words, although throttling does take place with standard CPU cooling, its effect on CPU performance is minimal.
We proceed to analyze throttling in the following way: we use SpeedFan to reduce the rotational speed of the CPU fan to the minimum. We failed to stop the fan completely; its rotational speed went down to approximately 1000 rpm.
CPU throttling reaches its maximum efficiency rather fast thanks to TM2: the average core clock rate gradually goes down practically to the minimum, approximately 1.7 GHz. This corresponds to 64% of the effective performance of the CPU core, which unfortunately cannot be seen in the CPU Load curve. The fact is that the latter is plotted relative to the current, not the "full" CPU core clock rate. Due to the measurement error (which has to do with multiple switching between the minimal and maximal CPU clock rate), it may even exceed 100% when the measured average core clock rate is actually higher than the synthetic instant clock rate. Efficiency of the thermal protection system is high: note how little the core temperature changes.
Other tests still demonstrate no fundamental differences. The only difference is that the average CPU clock rate drops to its minimum for TM2, 1.6 GHz. You cannot see Extended Throttling in action even in these conditions, with extremely low CPU fan speed. We have no choice but to stop the fan manually.
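The 64% figure, and the share of time the core must spend at the minimal clock rate to produce such an average, can be checked with a short calculation. This is a simplified model under an explicit assumption: the core only alternates between the minimal (1600 MHz) and full (about 2667 MHz) clock rates, and transition time is ignored. The function name is hypothetical.

```python
def tm2_summary(avg_mhz, min_mhz, max_mhz):
    """Return (fraction of full clock rate, fraction of time spent at minimum).

    Assumes the average is a time-weighted mix of exactly two clock levels:
    avg = t * min + (1 - t) * max, solved for t.
    """
    fraction_of_full = avg_mhz / max_mhz
    time_at_min = (max_mhz - avg_mhz) / (max_mhz - min_mhz)
    return fraction_of_full, time_at_min


perf, at_min = tm2_summary(avg_mhz=1700, min_mhz=1600, max_mhz=2667)
print(f"{perf:.0%} of full clock, {at_min:.0%} of time at minimum")
# 64% of full clock, 91% of time at minimum
```

Under this model an average of 1.7 GHz means the core sits at the minimal multiplier roughly nine tenths of the time.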
Extended throttling in action! In reality everything happened exactly as described in theory: TM1 snaps into action when TM2 is not efficient enough. TM1 consists in modulating the CPU clock rate by inserting forced idle cycles. The resulting effective clock rate, measured relative to the minimal clock rate of 1.6 GHz, drops to 0.91 GHz, that is, 57% of the minimal CPU clock rate set by TM2, or about 34% of the full CPU clock rate (2.67 GHz). Further reduction of the CPU clock rate is probably impossible even when TM2 and TM1 are used simultaneously to the full extent. Pay attention to the core temperature curve: when TM1 snaps into action, its readings "disappear". The digital temperature sensor of the CPU stops giving sensible data and sets the "reading valid" bit, which indicates whether the readings are correct, to zero. This means that the core temperature is no longer maintained at a constant level (approximately 82°C) and continues to grow. Indeed, the outdated TM1 technology, which controls only the effective clock rate but not the core voltage, cannot maintain CPU temperature like TM2. To all appearances, the real CPU core temperature in these extreme test conditions continues to grow until it reaches the threshold of the overheating sensor that powers off the system.
But we shall not wait for that to happen and restart active CPU cooling. Let's roll back to the minimal fan speed at first.
CPU throttling (TM1) slowly fades away. Then the first signs of TM2 appear: short attempts to restore the maximum CPU clock rate. And finally, CPU temperature monitoring is restored. Then we resume maximum cooling of the processor.
The situation is similar to TM2 throttling, only with the events unfolding in reverse order. We gradually reach the 95% CPU performance demonstrated under full load with the standard cooling system.
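The percentages quoted for extended throttling follow directly from the measured clock rates. The sketch below only redoes that arithmetic; the variable names are hypothetical, and treating the TM1 duty cycle as the ratio of effective to nominal clock rate is a simplifying assumption.

```python
FULL_GHZ = 2.67       # full CPU clock rate
TM2_MIN_GHZ = 1.6     # minimal clock rate set by TM2
EFFECTIVE_GHZ = 0.91  # effective clock rate measured with TM1 on top of TM2

# Share of non-idle cycles left by TM1 clock modulation (assumed model).
tm1_duty = EFFECTIVE_GHZ / TM2_MIN_GHZ
# Combined effect of TM2 and TM1 relative to the full clock rate.
overall = EFFECTIVE_GHZ / FULL_GHZ

print(f"TM1 duty cycle: {tm1_duty:.0%}")        # TM1 duty cycle: 57%
print(f"Share of full clock: {overall:.0%}")    # Share of full clock: 34%
```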
Now let's try to disable extended throttling, while keeping TM1 and TM2 enabled. Now we reduce the fan speed to minimum.
TM2 reaches its maximum efficiency, the core temperature is kept at a constant level. Now we stop the fan in the CPU cooler.
We can see the limit of TM2: first, all attempts to restore the maximum CPU clock rate cease, and the real frequency goes to the minimum. Then the readings of the digital temperature sensor disappear. But as extended throttling is no longer used, TM1 is not enabled.
Restoring active cooling of the processor results in rolling back to practically maximum CPU performance.
And finally, let's analyze the outdated mechanism of TM1 in pure form.
It's enough to reduce the fan speed to the minimum to activate TM1 throttling to the full extent. The nominal CPU core clock rate remains maximal, but the real frequency quickly goes down to 56% (1.49 GHz), like in the case of extended throttling. When the maximum throttling level is reached, the core temperature quickly goes beyond the admissible limit, so temperature readings disappear again. That's another proof that TM1 is much less efficient than TM2. In the same experimental conditions, this technology results in a more significant CPU frequency reduction and still turns out to be incapable of maintaining CPU temperature at a constant level.
When we resume active cooling of the processor, CPU core temperature quickly goes back within the admissible range, and CPU performance slowly reaches the maximum level, approximately 87% of the nominal.
Conclusion and Recommendations
Our tests demonstrated that the efficiency of the standard cooling system of Intel Core 2 Extreme QX6700 is not high enough. Besides, it's rather noisy. When all four cores are fully loaded, the standard cooling system does not cope with the load, so the thermal protection system snaps into action, which results in throttling. The recommended solution to this problem is to use a more efficient cooling system. Additional tests reveal that even non-exotic air coolers, like Zalman CNPS9700NT, are up to this task.
Throttling tests of our processor show the high efficiency of Thermal Monitor 2: it maintains the core temperature at an admissible level for a long time even with minimal cooling. But in emergencies, for example when the CPU fan stops under maximum CPU load, Thermal Monitor 2 may not be efficient enough. This leads to extended throttling: Thermal Monitor 1 snaps into action on top of Thermal Monitor 2. Our tests show that CPU temperature cannot be maintained at the admissible level in these conditions, which may lead to further temperature growth and an emergency power-off of the system. Thus, you should pay more attention to cooling quad-core Intel processors such as Intel Core 2 Extreme QX6700.
Dmitri Besedin (firstname.lastname@example.org)
December 12, 2006
Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.