PCIe device low power states require the system driver to explicitly put the device into a low power state, after which the PCIe link can in turn enter a low power link state. The PCIe spec also allows the link to enter low power states without the system driver getting involved. This feature is called Active State Power Management (ASPM). In general, either the system-side hardware or the device hardware can, for example, detect some idle time on the PCIe link and then initiate a transition to move the link into a low power link state.
There are two low power link states:
- L0s, also called L0 standby. Mandatory for all PCIe devices. L0s is applied to a single direction of the PCIe link, so if the device initiates L0s, it can put its tx into L0s while its rx is still in L0.
- L1 ASPM. Optional. Both directions of the PCIe link need to be in L1.
For ASPM to work, the system driver first needs to read the Link Capabilities register in the device's configuration space to learn whether this PCIe device supports ASPM. Bits [11:10] of the Link Capabilities register are the active state link PM support bits. [11:10]=00 is reserved. [11:10]=01 indicates L0s is supported. [11:10]=10 is also reserved. [11:10]=11 indicates both L0s and L1 are supported. The device also uses bits [14:12] and [17:15] to report its L0s and L1 exit latencies, respectively.
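As a minimal sketch of the bit fields just described, the snippet below decodes a raw 32-bit Link Capabilities value. The function name, the dictionary keys, and the example register value are made up for illustration. The exit latency fields are returned as raw 3-bit codes, since the mapping from code to time range is not covered here.

```python
# Encoding of Link Capabilities [11:10], following the description above.
ASPM_SUPPORT = {
    0b00: "reserved",
    0b01: "L0s",
    0b10: "reserved",
    0b11: "L0s and L1",
}


def decode_link_capabilities(link_cap):
    """Extract the ASPM-related fields from a 32-bit Link Capabilities value.

    Hypothetical helper: the name and return layout are illustrative only.
    """
    return {
        "aspm_support": ASPM_SUPPORT[(link_cap >> 10) & 0x3],   # bits [11:10]
        "l0s_exit_latency_code": (link_cap >> 12) & 0x7,        # bits [14:12]
        "l1_exit_latency_code": (link_cap >> 15) & 0x7,         # bits [17:15]
    }


# Made-up register value: both states supported, arbitrary latency codes.
example_link_cap = (0b11 << 10) | (0b010 << 12) | (0b110 << 15)
fields = decode_link_capabilities(example_link_cap)
print(fields)
```

A real driver would obtain the register value through its platform's configuration space access routine rather than a hard-coded constant.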
After reading the Link Capabilities register, the driver can write the Link Control register in configuration space to enable L0s and ASPM L1. Bits [1:0] of the Link Control register are the active state PM control bits. [1:0]=00 means both are disabled. [1:0]=01 means L0s is enabled and ASPM L1 is disabled. [1:0]=10 means L0s is disabled and ASPM L1 is enabled. [1:0]=11 means both are enabled.
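The control bits can be updated with a read-modify-write so that the other Link Control bits are left alone. The sketch below only computes the new 16-bit register value. The helper name and constants are hypothetical, and the actual configuration space read and write are left out because they are platform specific.

```python
ASPM_L0S_ENABLE = 0b01    # Link Control bit 0
ASPM_L1_ENABLE = 0b10     # Link Control bit 1
ASPM_CONTROL_MASK = 0b11  # Link Control bits [1:0]


def set_aspm_control(link_control, enable_l0s, enable_l1):
    """Return a new 16-bit Link Control value with bits [1:0] updated.

    Hypothetical helper: all bits other than [1:0] are preserved.
    """
    value = link_control & 0xFFFF & ~ASPM_CONTROL_MASK
    if enable_l0s:
        value |= ASPM_L0S_ENABLE
    if enable_l1:
        value |= ASPM_L1_ENABLE
    return value


# Made-up starting value with an unrelated bit (bit 6) already set.
current = 0x0040
both_enabled = set_aspm_control(current, enable_l0s=True, enable_l1=True)
l1_only = set_aspm_control(both_enabled, enable_l0s=False, enable_l1=True)
print(hex(both_enabled), hex(l1_only))
```

Before enabling a state, a driver would normally confirm that the Link Capabilities support bits advertise it.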
When either the system-side or device-side PCIe PHY hardware detects idle time on the link, it can put its tx into the L0s state. It does not need to tell higher-layer software to block outbound TLP transactions. L0s exit is also initiated by the corresponding PHY hardware.
L1 entry is different, since both directions of the link must end up in L1. Below is how a device requests that the link enter ASPM L1:
- The device PHY detects a certain amount of idle time on the PCIe link. This idle time is implementation dependent and is normally 7-10 us. The device then blocks new outbound transactions to the system.
- The device keeps sending PM_Active_State_Request_L1 DLLPs to the system side until it receives PM_Request_Ack from the system side.
- The system receives PM_Active_State_Request_L1, blocks new transactions to the device, and keeps sending PM_Request_Ack until it receives an electrical idle ordered set.
- The device receives PM_Request_Ack, sends an electrical idle ordered set, and puts its tx into electrical idle.
- The system receives electrical idle and puts its tx into electrical idle. The PCIe link is now in the ASPM L1 state.
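The handshake above can be sketched as an ordered event trace. This is a toy model rather than a PHY implementation: the function name, the event strings, and the `requests_before_ack` knob (how many request DLLPs the device sends before an ack arrives) are all assumptions made for illustration.

```python
def simulate_l1_entry(requests_before_ack=1):
    """Return the ordered (actor, event) steps of a device-initiated L1 entry.

    requests_before_ack is a hypothetical knob: the device repeats its
    request until the system side acknowledges it.
    """
    if requests_before_ack < 1:
        raise ValueError("the device must send at least one request")

    trace = [("device", "idle detected, block new outbound TLPs")]

    # The device keeps sending the request DLLP until it sees an ack.
    for _ in range(requests_before_ack):
        trace.append(("device", "send PM_Active_State_Request_L1"))

    # The system blocks its own traffic, then acknowledges.
    trace.append(("system", "block new TLPs to device"))
    trace.append(("system", "send PM_Request_Ack"))

    # The device goes idle first, and the system follows.
    trace.append(("device", "send electrical idle ordered set"))
    trace.append(("device", "tx in electrical idle"))
    trace.append(("system", "tx in electrical idle"))
    trace.append(("link", "L1"))
    return trace


for actor, event in simulate_l1_entry(requests_before_ack=2):
    print(f"{actor:>6}: {event}")
```

The ordering is the point of the sketch: the system acknowledges only after blocking its own traffic, and the link counts as being in L1 only once both transmitters are in electrical idle.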
Either system or device can initiate ASPM L1 exit.
Compared with the device low power state sequence described in PCIE Tutorial: Software Initiated Device Power Management, no system driver is needed to initiate L1 entry. This matters because the device can put itself into a low power state and then put the PCIe link into L1. The device knows its own power state better than anyone else.
Let’s look at a PCIe trace captured by a Teledyne LeCroy PCIe analyzer. First, the device side detects that the link has been idle for several microseconds. It sends a PM_Active_State_Request_L1 DLLP to the Root Complex (RC), that is, the system side. The RC replies with a PM_Request_Ack DLLP. The RC may send another PM_Request_Ack to the device if it has not yet received electrical idle. The analyzer may flag this extra ack DLLP as a packet error, but that is fine and actually expected, since the other side is already in electrical idle.
Here is another example. The device sends PM_Active_State_Request_L1 to the RC as before, but then immediately finds it has a memory write TLP to send to the RC, so it sends the TLP. It then has to re-send PM_Active_State_Request_L1 once link idle is detected again. Eventually, when the RC returns PM_Request_Ack, the link can go to L1.
The PCIe spec also defines the L1.1 and L1.2 low power substates.
Below is a good description of history and general idea of L1.1 and L1.2:
By the time PCI Express was developed in 2002, additional “Link States,” or “L-States,” were included in the specification. These follow the template of the existing “D-States.” “L0” reflects a PCI Express link in full operation. “L1” is a link which is not transferring data but which can relatively quickly resume normal operation. “L2” and “L3” each reflect a link with main power removed (“L2” indicates an auxiliary power supply is active to provide “keep alive” power to devices). An “L0s” state was also defined in which each direction of the PCI Express link could be shut down independently with a quick resumption to normal operation. Fully active devices in “D0” can transition between “L0”, “L0s” and “L1” with no software intervention and thus save power on their own initiative without any operating system interaction.
While this seemed like a solution to the problem of active device power savings, the Achilles heel of the “L-States” turned out to be the speed of powering up or powering down. The PCI Express specification called for devices to exit from “L0s” in less than one microsecond, and from “L1” in somewhere on the order of 2-4 microseconds. Although PHY designers could idle their receiver and transmitter logic in L1 to meet those resumption times, they were forced to keep their power-hungry common-mode voltage keepers and phase-locked loops (PLLs) powered on and running. This meant that each lane of a PCIe PHY in “L1” could still be consuming 20-30 milliwatts of power, which would clearly be too high for a battery-powered device.
By 2012 it was becoming clear that combinations of specialized hardware and software in such mobile devices could handle transitioning PCI Express components between normal and low-power states if only they had a mechanism to do so. The “L1 PM sub-states with CLKREQ” ECN to PCI Express (often referred to simply as “L1 sub-states”) was introduced to allow PCI Express devices to enter even deeper power savings states (“L1.1” and “L1.2”) while still appearing to legacy software to be in the “L1” state.
The key to L1 sub-states is providing a digital signal (“CLKREQ#”) for PHYs to use to wake up and resume normal operation. This permits PCI Express PHYs in the new sub-states to completely power off their receiver and transmitter logic as it is no longer needed for detection or signaling of link resumption. They can also power off their PLLs and potentially even common-mode voltage keepers as the ECN specifies two levels of resumption latency. The L1.1 sub-state is intended for resumption times on the order of 20 microseconds (5 to 10 times longer than the L1 sub-state allowed), while the L1.2 sub-state targets times on the order of 100 microseconds (up to 50 times longer). Both sub-states should permit good PHY designs to power off their PLLs. The L1.1 sub-state requires maintaining common-mode voltage, while the L1.2 sub-state allows it to be released. Well-designed PCI Express PHYs in the L1.1 sub-state should be able to reach power levels around 1/100 of that in L1 state. Likewise, in L1.2 sub-states, those PHYs should reduce power to about 1/1000 of L1 state.
The PCI-SIG engineering change notice "L1 PM Substates with CLKREQ" is a more thorough document.
Conventional L1 state is also the new L1.0 substate.
L1.1 consumes less power than L1.0 because PCIe ports are not required to keep electrical idle detection enabled. The link common mode voltages are still maintained, however.
L1.2 is even lower power than L1.1 since link common mode voltages are not maintained.
Both L1.1 and L1.2 require a bidirectional open-drain CLKREQ# signal for entry to and exit from the state.
The L1.1 and L1.2 entry sequence is as follows:
- First, the device detects some idle time on the PCIe link and wants to enter L1.0, which is the conventional L1 state.
- The device checks whether L1.1 and L1.2 are enabled by the system driver.
- If they are disabled, the device just goes to L1.0.
- If they are enabled, the device checks its LTR value against the pre-programmed LTR thresholds for L1.1 and L1.2. If the LTR value is no less than the LTR threshold, the device further checks CLKREQ#.
- If CLKREQ# is high, indicating the system does not need the device to be in L0, the device can take the PCIe link into L1.1 or L1.2.
The LTR threshold lives in configuration space. The LTR value is carried in the LTR message that the device sends to the system periodically. LTR stands for Latency Tolerance Reporting. It lets devices report their service latency requirements for memory reads and writes to the system side, so that the system can be power managed without hurting device functionality or performance. An LTR value no less than the LTR threshold means the device is saying "I can tolerate a long delay when reading and writing system memory", so it is fine to put the link into a deeper low power state.
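The entry decision described above can be summarized as a small function. This is a simplified sketch. The function and parameter names are hypothetical, a single threshold stands in for the pre-programmed thresholds, and falling back to L1.0 when the LTR or CLKREQ# check fails is an assumption, since the sequence above does not spell out those cases.

```python
def choose_l1_state(substates_enabled, ltr_value_ns, ltr_threshold_ns, clkreq_high):
    """Pick the link state a device may enter after detecting link idle.

    All names are illustrative. Returns "L1.0" or "L1.1 or L1.2".
    """
    # The system driver has not enabled the substates: conventional L1 only.
    if not substates_enabled:
        return "L1.0"
    # The device cannot tolerate the longer exit latency.
    # Assumption: it stays in L1.0 in this case.
    if ltr_value_ns < ltr_threshold_ns:
        return "L1.0"
    # CLKREQ# low means the reference clock is still requested.
    # Assumption: the link stays in L1.0 in this case.
    if not clkreq_high:
        return "L1.0"
    return "L1.1 or L1.2"


# Made-up numbers: the device tolerates 200 us, the threshold is 100 us.
decision = choose_l1_state(
    substates_enabled=True,
    ltr_value_ns=200_000,
    ltr_threshold_ns=100_000,
    clkreq_high=True,
)
print(decision)
```

Note that the comparison uses "no less than", so an LTR value exactly equal to the threshold still permits substate entry.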