Qualcomm's SoC design team published a paper titled “A 9-mm² Ultra-Low-Power Highly Integrated 28-nm CMOS SoC for Internet of Things” in the IEEE Journal of Solid-State Circuits, vol. 53, no. 3, March 2018. It gives an overview of the Blackghost 1.0 system-on-chip (SoC) from Qualcomm Research, their first test chip, which paved the way toward the commercialization of Qualcomm’s most recent ultra-low-power Blackghost SoC family.
These review notes highlight some of the low-power techniques adopted in Blackghost 1.0. The original paper is rich in content; if you are interested in low-power SoC design, reading it is highly recommended.
First, the type of gates used.
Blackghost uses high-Vth, long-channel cells in the always-on (AON) domain and thick gate-oxide transistors in the analog front end (AFE). This looks sensible: selecting high-Vth, long-channel, and thick gate-oxide cells is common practice in low-power SoC design because they leak less. The trade-off is that they are also slower and higher in dynamic power.
Second, super-cutoff power gating design.
As the paper shows, power switches gate off the supply of a power domain, which is nothing special. The trick is that the power-switch control is driven at 1.05 V, much higher than the 0.55 V supply being gated. This dramatically reduces leakage through the power switch. (Yes, a power switch is not ideal; it still leaks even when it is off.)
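To get a feel for why overdriving the gate helps, here is a rough sketch. It assumes a PMOS header switch whose subthreshold leakage falls by one decade for every 90 mV of reverse gate bias. Both the switch type and the 90 mV/decade swing are my assumptions, not numbers from the paper. Real devices gain less than this, because gate-induced drain leakage and junction leakage do not follow the same trend.

```python
V_CTRL = 1.05    # power-switch control level quoted in the paper (V)
V_DOMAIN = 0.55  # gated supply quoted in the paper (V)

def subthreshold_leakage_reduction(overdrive_v: float,
                                   swing_mv_per_dec: float = 90.0) -> float:
    """Idealized factor by which subthreshold leakage drops when the gate of an
    off switch is driven overdrive_v beyond its source.
    swing_mv_per_dec is an assumed subthreshold swing, not a paper value."""
    return 10 ** (overdrive_v * 1000.0 / swing_mv_per_dec)

overdrive = V_CTRL - V_DOMAIN
reduction = subthreshold_leakage_reduction(overdrive)
print(f"gate overdrive: {overdrive:.2f} V")
print(f"idealized subthreshold leakage reduction: {reduction:.1e}x")
```

Even if the real benefit is orders of magnitude smaller than this idealized figure, the exponential dependence explains why a modest extra gate voltage is worth the effort.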
Third, clock structure and RC-based OSC.
A crystal oscillator (XO) can sit off-chip or be integrated into the SoC; either way, an XO is power hungry. Blackghost has an on-die RC-based oscillator, which suits low-power tasks that tolerate low clock accuracy. It consumes only 20 µW at a 50 MHz clock rate.
4th, mixed AHB and NoC SoC.
A NoC is introduced in this IoT chip to lower power. The paper argues: “Different than busses that fan out the wires to all the peripherals, the links between NoC units are point-to-point connections, which reduce the total number and distance of global interconnections, thereby lowering the associated capacitance load switched per transaction.” I am not a fan of this argument. Just like a NoC, AHB has an interconnect hub, and the links from the hub to the components are also point-to-point. There are many papers comparing NoC and bus power; some research would be wise before taking this claim for granted.
5th, hardware accelerator to CPU for heavy-duty DSP calculations.
No need to say more. It is common sense that a CPU is not optimized for heavy-duty, data-oriented tasks, so a hardware accelerator makes perfect sense. As a result, Blackghost simply selects the lowest-end Arm Cortex-M0 as its CPU for IoT applications.
6th, TSMC 28LP process
The paper gives several reasons why the TSMC 28LP process (28 nm Low Power) was chosen for Blackghost. Process selection is critical for an IoT SoC. More advanced nodes such as 16 nm and 7 nm tend to be smaller in area and faster, but higher in leakage and very costly. That is why IoT SoCs often use 28 nm or even 40 nm processes.
7th, NTC (near-threshold computing)
For this process node, their simulation shows that the lowest energy is achieved at around 0.4 V. This voltage level calls for near-threshold computing (NTC). The Blackghost NTC mode is defined as a 50 MHz clock rate at 0.55 V.
The optimal energy point found in their simulation should be no surprise. As the voltage increases, leakage energy per operation drops, because the logic finishes sooner and leaks for less time, but dynamic energy goes up. Somewhere in between, the total energy is lowest. The industry has discussed NTC at length, but process variation makes it hard to achieve, so NTC is normally not used in mass-production SoCs. The paper describes how they mitigate the risk of using NTC.
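The energy minimum can be reproduced qualitatively with a toy model. The sketch below treats dynamic energy as proportional to V², and leakage energy as leakage current times V times a cycle time that stretches as the drive current collapses near threshold. The EKV-style current expression and every constant (threshold, slope factor, activity, leakage weight) are illustrative assumptions of mine, picked so the curve has a plausible shape. None of them are fitted to the paper's data.

```python
import math

N_VT = 1.5 * 0.026    # slope factor times thermal voltage (V), assumed
VTH = 0.45            # threshold voltage (V), assumed
ALPHA = 0.02          # switching activity factor, assumed
LEAK_WEIGHT = 100.0   # gate-delays of leakage charged to each operation, assumed

def drive_current(vgs: float) -> float:
    """Normalized EKV-style current, smooth from sub- to super-threshold."""
    return math.log1p(math.exp((vgs - VTH) / (2 * N_VT))) ** 2

def energy_per_op(vdd: float) -> float:
    """Normalized energy per operation.
    Dynamic part: ALPHA * V^2.
    Leakage part: I_off * V * T with T proportional to V / I_on,
    which gives V^2 * I_off / I_on."""
    leak_over_on = drive_current(0.0) / drive_current(vdd)
    return vdd ** 2 * (ALPHA + LEAK_WEIGHT * leak_over_on)

# Sweep the supply from 0.20 V to 1.05 V in 10 mV steps.
voltages = [round(0.20 + 0.01 * i, 2) for i in range(86)]
v_opt = min(voltages, key=energy_per_op)
print(f"toy-model minimum-energy supply: {v_opt:.2f} V")
print(f"energy at 1.05 V relative to minimum: "
      f"{energy_per_op(1.05) / energy_per_op(v_opt):.1f}x")
```

The exact location of the minimum moves with the assumed constants, which is the same sensitivity that makes NTC hard to guarantee across process variation.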
8th, multiple active and sleep modes and DVFS
For active modes, Blackghost has NTC (0.55 V, 50 MHz), turbo (1.05 V, 200 MHz), and several other modes in between. This sounds straightforward but is very challenging for the back-end team, since they have to close timing across a broad voltage range and balance leakage against dynamic power. It also means Blackghost adopts dynamic voltage and frequency scaling (DVFS).
Blackghost also defines several sleep modes, ranging from standby (clock gating) through partial power gating and full power gating to retention mode.
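A quick back-of-the-envelope comparison of the two quoted active modes, using the textbook relation that dynamic power scales with V² times f. Leakage, and any change in switched capacitance between modes, are ignored here, so treat the numbers as a first-order sketch rather than the paper's measurements.

```python
# (supply in volts, clock in Hz) for the two active modes quoted above
MODES = {"ntc": (0.55, 50e6), "turbo": (1.05, 200e6)}

def relative_dynamic_power(mode: str, ref: str) -> float:
    """Dynamic power of `mode` relative to `ref`, assuming P ~ C * V^2 * f
    with the same switched capacitance in both modes."""
    v, f = MODES[mode]
    v_ref, f_ref = MODES[ref]
    return (v / v_ref) ** 2 * (f / f_ref)

def relative_energy_per_cycle(mode: str, ref: str) -> float:
    """Dynamic energy per clock cycle of `mode` relative to `ref` (E ~ C * V^2)."""
    return (MODES[mode][0] / MODES[ref][0]) ** 2

power_ratio = relative_dynamic_power("turbo", "ntc")
energy_ratio = relative_energy_per_cycle("turbo", "ntc")
print(f"turbo vs NTC dynamic power: {power_ratio:.1f}x")
print(f"turbo vs NTC dynamic energy per cycle: {energy_ratio:.1f}x")
```

So turbo buys 4x the clock rate at roughly 3.6x the dynamic energy per cycle, which is why it pays to drop back to NTC whenever the workload allows.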
9th, partially retained FFs
Blackghost adopts partial retention for flip-flops (FFs): only about 1K FFs in total are retained. The retention voltage is as low as 0.3-0.4 V.
10th, asynchronous clock domains
Blackghost adopts a globally asynchronous, locally synchronous (GALS) design partition to lower power consumption. The paper explains: “when EDA tools strived for the design closure of a fully synchronous clock domain in the NTC mode, the number of clock buffers in the clock tree and data buffers (that are needed to fix transition slew, setup time and hold time violations) went up superlinearly with the size of the clock domain.”
11th, lower IR drop
Low IR drop is critical to NTC because of the ultra-low operating voltage: a 20 mV drop, for example, is about 3.6% of a 0.55 V supply but under 2% of a 1.05 V supply. Blackghost achieves low static IR drop through power distribution network enhancements such as using thick metal layers and customizing power-switch placement. Meanwhile, dynamic IR drop is reduced by deploying decap and edge-cap cells at high density.
12th, on-die PVT (process, voltage, temperature) monitor
This is for run-time closed-loop DVFS control.
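The paper's actual control loop is not described in these notes, so the following is only a generic sketch of what closed-loop control with a monitor could look like. The function names, the step size, the margin, and the toy monitor model are all hypothetical. The idea is simply to raise the supply when the monitor says the silicon cannot meet the target clock, and lower it when there is slack to spare.

```python
def adjust_voltage(v_now, measured_mhz, target_mhz,
                   step=0.01, margin=0.05, v_min=0.55, v_max=1.05):
    """One iteration of a hypothetical bang-bang DVFS loop.
    Raise the supply if the monitor reports less than the target speed,
    lower it if there is more than `margin` of slack, otherwise hold."""
    if measured_mhz < target_mhz:
        v_next = v_now + step
    elif measured_mhz > target_mhz * (1 + margin):
        v_next = v_now - step
    else:
        v_next = v_now
    return min(v_max, max(v_min, round(v_next, 3)))

def toy_monitor(v, slow_corner=False):
    """Made-up stand-in for an on-die PVT monitor: achievable MHz at supply v.
    The linear model and its constants are illustrative only."""
    k = 260.0 if slow_corner else 300.0
    return k * (v - 0.38)

# A slow-corner die starting at the nominal NTC point needs a little extra voltage.
v_slow = 0.55
for _ in range(20):
    v_slow = adjust_voltage(v_slow, toy_monitor(v_slow, slow_corner=True), 50.0)

# A typical die already meets 50 MHz at 0.55 V in this toy model and stays put.
v_typ = 0.55
for _ in range(20):
    v_typ = adjust_voltage(v_typ, toy_monitor(v_typ), 50.0)

print(f"slow corner settles at {v_slow:.2f} V, typical stays at {v_typ:.2f} V")
```

The point of the monitor is exactly this per-die adjustment: instead of margining every chip for the worst corner, each one runs at the lowest voltage that its own silicon can sustain.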
13th, Vth tuning through body bias
CMOS body bias can change the gate Vth. Blackghost has optional Vth tuning through forward (FBB) and reverse (RBB) body bias. The paper shows how different body-bias levels affect logic clock rate and leakage.
14th, PMU with mixed buck and LDO
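For intuition, the textbook body-effect equation gives the direction and rough size of the Vth shift. The sketch below is for an NMOS device and uses assumed values for the body-effect coefficient, the surface potential, and the subthreshold swing. None of these come from the paper, and the paper's measured curves should be preferred over this model.

```python
import math

GAMMA = 0.2             # body-effect coefficient (V^0.5), assumed
TWO_PHI_F = 0.8         # surface potential 2*phi_F (V), assumed
SWING_V_PER_DEC = 0.09  # subthreshold swing (V/decade), assumed

def vth_shift(v_bs: float) -> float:
    """Threshold shift for an NMOS with body-to-source voltage v_bs.
    v_bs > 0 is forward body bias (Vth drops); v_bs < 0 is reverse (Vth rises).
    Valid only while v_bs stays below TWO_PHI_F."""
    return GAMMA * (math.sqrt(TWO_PHI_F - v_bs) - math.sqrt(TWO_PHI_F))

def leakage_factor(v_bs: float) -> float:
    """Multiplicative change in subthreshold leakage caused by the Vth shift."""
    return 10 ** (-vth_shift(v_bs) / SWING_V_PER_DEC)

for label, v_bs in (("FBB +0.3 V", 0.3), ("no bias", 0.0), ("RBB -0.3 V", -0.3)):
    print(f"{label}: dVth = {vth_shift(v_bs) * 1000:+.0f} mV, "
          f"leakage x{leakage_factor(v_bs):.2f}")
```

Forward bias trades leakage for speed and reverse bias does the opposite, which makes body bias a useful knob for pulling slow or leaky dies back toward the typical corner.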
To enable direct lithium-ion battery attachment, the Blackghost analog PMU must handle input voltages up to 4.5 V. To bridge the gap between such a high input voltage (4.5 V) and the SoC internal voltages (0.55 V, etc.), switched-mode power supply (SMPS) buck regulators are used together with LDOs.
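Why not just use an LDO from the battery? A linear regulator passes the full load current from its input, so its efficiency can never exceed Vout/Vin. The sketch below compares that bound with a buck stage followed by a small LDO drop. The 85% buck efficiency and the 0.70 V intermediate rail are assumptions for illustration, not values from the paper.

```python
V_BATTERY = 4.5         # maximum PMU input quoted in the paper (V)
V_CORE = 0.55           # NTC core supply quoted in the paper (V)
BUCK_EFFICIENCY = 0.85  # assumed switching-regulator efficiency
V_INTERMEDIATE = 0.70   # assumed buck output that feeds the LDO (V)

def ldo_efficiency_bound(v_in: float, v_out: float) -> float:
    """Best-case efficiency of a linear regulator (quiescent current ignored)."""
    return v_out / v_in

ldo_only = ldo_efficiency_bound(V_BATTERY, V_CORE)
buck_then_ldo = BUCK_EFFICIENCY * ldo_efficiency_bound(V_INTERMEDIATE, V_CORE)

print(f"LDO straight from the battery: at most {ldo_only:.1%}")
print(f"buck to {V_INTERMEDIATE:.2f} V, then LDO: about {buck_then_ldo:.1%}")
```

With an LDO alone, close to 90% of the battery energy would be burned in the regulator, so the buck does the heavy step-down and the LDO only cleans up the last stretch.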
15th, low power SRAM partition
SRAM partitioning is investigated. “The usage of smaller SRAM sub-macros … enables finer-grain leakage management. It also segments the word-lines and bit-lines … hence lowering total switching capacitance and switching power … Such power advantage is particularly predominant at low toggling rates.” Blackghost chose a small 4 KB macro based on the power/area trade-off.
16th, custom standard cell library and SRAM macro
Yeah, it is Qualcomm. For most chip companies, don’t even think about it. Anyway, below is a performance comparison of the Blackghost SRAM against the TSMC SRAM. The Blackghost SRAM is lower in both retention power and dynamic power.
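The leakage side of the partitioning argument can be illustrated with simple counting: only the sub-macros that hold live data need to stay fully powered. In the sketch below the 64 KB total and the 10 KB working set are hypothetical numbers of mine. Only the 4 KB macro size comes from the paper, and the area overhead of many small macros is not modeled.

```python
import math

def powered_fraction(total_kb: int, working_set_kb: int, macro_kb: int) -> float:
    """Fraction of the SRAM array that must stay fully powered when only the
    macros holding the working set are kept active."""
    n_macros = math.ceil(total_kb / macro_kb)
    n_active = min(n_macros, math.ceil(working_set_kb / macro_kb))
    return n_active / n_macros

TOTAL_KB = 64        # hypothetical total SRAM size
WORKING_SET_KB = 10  # hypothetical amount of live data

fine = powered_fraction(TOTAL_KB, WORKING_SET_KB, macro_kb=4)     # paper's macro size
coarse = powered_fraction(TOTAL_KB, WORKING_SET_KB, macro_kb=32)  # coarser alternative

print(f"4 KB macros: {fine:.1%} of the array stays fully powered")
print(f"32 KB macros: {coarse:.1%} of the array stays fully powered")
```

Smaller macros cost area for the duplicated periphery, which is presumably the other half of the power/area trade-off behind the 4 KB choice.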