Memory Subsystem
A Krait CPU has three cache levels: L0, L1 and L2. L0 and L1 are private to each core, while L2 is shared by all the cores. Each Krait core has 8 KB of L0 cache (4 KB for instructions and 4 KB for data). L0 is exclusive, so its contents don't have to be duplicated in L1. Access to L0 is very fast, just one cycle. Next, each core has 32 KB of L1 cache which, like L0, can be accessed in a single cycle. What sets L0 (a feature unique to Qualcomm) apart from L1 is power efficiency: a hit in the small L0 avoids an access to the larger L1, which saves battery.
The L2 cache is shared by all CPU cores. Krait has more of it than Scorpion did: 512 KB per core, so dual-core SoCs have 1 MB of L2 and quad-core SoCs have 2 MB.
The biggest difference between the L0/L1 caches and L2 is in how they are clocked and powered. The former two run at the core's clock rate and voltage, while L2 has a dedicated power rail and runs at its own clock rate (up to 1.3 GHz). This also helps to save power, and in some cases the L2 cache can be switched off completely when it is not in use.
Unlike Scorpion, Krait has no limitations on dual-channel access to external memory. Scorpion-based SoCs do have a dual-channel LPDDR2 controller, but only one channel can work with external memory. Krait can use both memory channels in any configuration. This should eliminate the memory bandwidth bottleneck and increase performance in certain tasks. Finally, like Cortex-A15, Qualcomm's Krait supports the Large Physical Address Extension (LPAE) and can work with up to 4 GB of RAM.
Process technology and clock rate
Snapdragon S4 SoCs powered by Krait cores are the first SoCs made using the 28-nm low-power (LP) process technology. The first batches have been produced by TSMC, although Qualcomm has a partnership with GlobalFoundries as well. For comparison, NVIDIA uses the 40-nm low-power triple gate oxide (LPG) process technology in its SoCs. What's the difference? Qualcomm considers the LP process a better choice for mobile devices, because they mostly operate at low clock rates, while triple gate oxide transistors help reduce leakage only at high clock rates and voltages. According to Qualcomm, the LP process technology provides better energy efficiency, as shown in the figure below.
Unlike Tegra, Krait (as well as Scorpion) gives each core a dedicated power rail. As a result, any core can run at a lower clock rate and consume less power, which gives Krait an advantage in energy efficiency in many types of tasks. The first Krait-based Snapdragon S4 SoC was the MSM8960, with the CPU operating at 1.55 GHz and 1.05 V. The second edition runs at 1.7-2.0 GHz at the same voltage. The new TSMC process technology has reduced power consumption from 0.432 W to 0.265 W in the same tasks.
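To put the two power figures quoted above into perspective, here is a minimal sketch of the arithmetic. The function names are ours, and the run-time estimate assumes, purely for illustration, that the CPU is the only consumer of a fixed energy budget.

```python
# Power figures quoted in the text, in watts.
OLD_POWER_W = 0.432
NEW_POWER_W = 0.265


def reduction_percent(before_w, after_w):
    """Relative power saving, in percent of the original figure."""
    return (before_w - after_w) / before_w * 100.0


def runtime_gain(before_w, after_w):
    """How many times longer a fixed energy budget lasts (illustrative:
    assumes the CPU is the only consumer)."""
    return before_w / after_w


if __name__ == "__main__":
    print(f"Power saving: {reduction_percent(OLD_POWER_W, NEW_POWER_W):.1f}%")
    print(f"Run time on the same energy: x{runtime_gain(OLD_POWER_W, NEW_POWER_W):.2f}")
```

With the numbers from the text, this works out to a saving of a little under 40% for the same tasks.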
Performance, power consumption
To compare the capabilities of the different processor cores, we used the integer Dhrystone benchmark, which reports performance in DMIPS (Dhrystone Millions of Instructions per Second). Though the test is rather old, it's good enough to compare cores that share the same instruction set architecture. The table below shows the number of DMIPS/MHz per core:
| | ARM Cortex-A5 | ARM Cortex-A8 | ARM Cortex-A9 | Qualcomm Scorpion | Qualcomm Krait | ARM Cortex-A15 |
|---|---|---|---|---|---|---|
| DMIPS/MHz | 1.6 | 2.0 | 2.5 | 2.1 | 3.3 | 3.5 |
With its 3.3 DMIPS/MHz, Krait is about one third faster than Cortex-A9 at the same clock rate and about 60% faster than Scorpion. The next Cortex-A9 SoCs promise to narrow the gap to 25%. In turn, the next generation of Qualcomm chips is expected to bring it back to 40-50%. For now, however, Krait trails the newer Cortex-A15.
Looking at the progress of Qualcomm's cores, we can see that since the company moved on from the ARM11 microarchitecture, performance has increased by about 60% with each generation:
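The percentages above follow directly from the DMIPS/MHz table. A minimal sketch of the calculation is given below. The helper names are ours, and the 1500 MHz clock in the last line is an arbitrary example value, not a figure from this article.

```python
# DMIPS/MHz figures copied from the table above.
DMIPS_PER_MHZ = {
    "ARM Cortex-A5": 1.6,
    "ARM Cortex-A8": 2.0,
    "ARM Cortex-A9": 2.5,
    "Qualcomm Scorpion": 2.1,
    "Qualcomm Krait": 3.3,
    "ARM Cortex-A15": 3.5,
}


def advantage_percent(core, baseline):
    """Per-clock advantage of `core` over `baseline`, in percent."""
    return (DMIPS_PER_MHZ[core] / DMIPS_PER_MHZ[baseline] - 1.0) * 100.0


def dmips(core, clock_mhz):
    """Single-core DMIPS estimate at the given clock rate."""
    return DMIPS_PER_MHZ[core] * clock_mhz


if __name__ == "__main__":
    for baseline in ("ARM Cortex-A9", "Qualcomm Scorpion", "ARM Cortex-A15"):
        gap = advantage_percent("Qualcomm Krait", baseline)
        print(f"Krait vs {baseline}: {gap:+.1f}% per clock")
    # 1500 MHz is a hypothetical clock rate chosen only for illustration.
    print(f"Krait at 1500 MHz: {dmips('Qualcomm Krait', 1500):.0f} DMIPS per core")
```

A negative result means the baseline core does more work per clock, which is the case for Cortex-A15.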
Also, Krait-based SoCs have the advantage of using aSMP (asynchronous symmetric multiprocessing). It means that every core has its own clock rate and voltage, so each one can run in its most efficient operating mode. Moreover, any idle core can be turned off. According to Qualcomm, its aSMP technology gives a 25-40% boost in energy efficiency and lets manufacturers use simpler chip designs. Without aSMP they would have to use supplementary cores operating at lower clock rates (as in NVIDIA's 4+1 Tegra 3 or ARM's big.LITTLE architecture).
Finally, the Krait core features improved branch prediction and a better-balanced load on the instruction pipeline. The core uses manually designed logic circuitry, which allows a very wide range of dynamic voltage and clock rate tuning. Krait cores can move gradually from minimum to maximum performance and power consumption, in theory providing the best energy efficiency on the market.
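The idea behind per-core clocks can be illustrated with a toy model. It is only a sketch: it uses the textbook approximation that dynamic power scales with frequency times voltage squared, ignores leakage and idle states, and relies on a made-up linear voltage/frequency curve. Only the 1.55 GHz / 1.05 V point comes from this article, and the output is not meant to reproduce Qualcomm's 25-40% figure.

```python
def voltage_for(freq_ghz, f_min=0.4, f_max=1.55, v_min=0.8, v_max=1.05):
    """Hypothetical linear voltage/frequency curve (not Qualcomm data).
    Only the top point, 1.55 GHz at 1.05 V, is taken from the article."""
    frac = (freq_ghz - f_min) / (f_max - f_min)
    return v_min + frac * (v_max - v_min)


def dynamic_power(freq_ghz):
    """Relative dynamic power, using the approximation P ~ f * V^2."""
    v = voltage_for(freq_ghz)
    return freq_ghz * v * v


def synchronous_power(demands_ghz):
    """All cores share one clock and voltage, set by the busiest core."""
    shared = max(demands_ghz)
    return len(demands_ghz) * dynamic_power(shared)


def asynchronous_power(demands_ghz):
    """Each core runs at its own clock and voltage; idle cores (0) are off."""
    return sum(dynamic_power(f) for f in demands_ghz if f > 0)


if __name__ == "__main__":
    # Hypothetical workload: one busy core, two lightly loaded, one idle.
    demands = [1.55, 0.4, 0.4, 0.0]
    sync = synchronous_power(demands)
    asyn = asynchronous_power(demands)
    print(f"shared clock: {sync:.2f}, per-core clocks: {asyn:.2f} (relative units)")
```

When every core asks for the same clock rate the two modes cost the same in this model. The gain appears only when the load is uneven.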