I would like to share some experience with implementing semaphores. We have a multiprocessor system and want to protect access to a shared resource with a semaphore. Since the processors are all ARM based and the system bus is ARM’s AMBA bus, my first instinct was to look for an ARM-based semaphore solution.
After some research, I found that ARM processors provide exclusive access instructions that can be used to implement a software semaphore. One good document on this topic is ARM Synchronization Primitives. It describes how LDREX (exclusive load) and STREX (exclusive store) provide atomic read-modify-write access to memory, which can in turn be used to implement a software mutex or semaphore.
The LDREX instruction loads a word from memory, initializing the state of the exclusive monitor(s) to track the synchronization operation. For example, LDREX R1, [R0] performs a Load-Exclusive from the address in R0, places the value into R1 and updates the exclusive monitor(s).
The STREX instruction performs a conditional store of a word to memory. If the exclusive monitor(s) permit the store, the operation updates the memory location and returns the value 0 in the destination register, indicating that the operation succeeded. If the exclusive monitor(s) do not permit the store, the operation does not update the memory location and returns the value 1 in the destination register. This makes it possible to implement conditional execution paths based on the success or failure of the memory operation. For example, STREX R2, R1, [R0] performs a Store-Exclusive operation to the address in R0, conditionally storing the value from R1 and indicating success or failure in R2.
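As a rough illustration of how this load/store pair gets used, here is a minimal spin mutex sketch. It is written with C11 atomics rather than inline assembly so that it stays portable; on ARMv7 targets, compilers such as GCC and Clang typically lower the compare-and-exchange to an LDREX/STREX retry loop. The names spin_mutex_t, spin_mutex_try_lock and so on are made up for this example and do not come from the ARM document.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { UNLOCKED = 0, LOCKED = 1 };

/* Hypothetical mutex type: one word that is either UNLOCKED or LOCKED. */
typedef struct {
    atomic_int state;
} spin_mutex_t;

void spin_mutex_init(spin_mutex_t *m)
{
    assert(m != NULL);
    atomic_init(&m->state, UNLOCKED);
}

/* One attempt to take the lock. Conceptually this is:
 *   LDREX the lock word, give up if it is already LOCKED,
 *   STREX the value LOCKED, retry if the store-exclusive reports failure. */
bool spin_mutex_try_lock(spin_mutex_t *m)
{
    int expected = UNLOCKED;
    return atomic_compare_exchange_strong_explicit(
        &m->state, &expected, LOCKED,
        memory_order_acquire, memory_order_relaxed);
}

/* Busy-wait until the lock is taken. */
void spin_mutex_lock(spin_mutex_t *m)
{
    while (!spin_mutex_try_lock(m)) {
        /* spin */
    }
}

/* Releasing the lock is a plain store with release ordering. */
void spin_mutex_unlock(spin_mutex_t *m)
{
    atomic_store_explicit(&m->state, UNLOCKED, memory_order_release);
}
```

A caller would wrap the protected resource access between spin_mutex_lock() and spin_mutex_unlock(). A real implementation would usually add a low-power wait instead of spinning at full speed.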
An exclusive monitor is a simple state machine, with the possible states open and exclusive. To support synchronization between processors, a system must implement two sets of monitors, local and global. A Load-Exclusive operation updates the monitors to exclusive state. A Store-Exclusive operation accesses the monitor(s) to determine whether it can complete successfully. A Store-Exclusive can succeed only if all accessed exclusive monitors are in the exclusive state.
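To make the two-state description concrete, here is a small software model of a monitor, purely as a sketch. It assumes the simplest possible behavior: a Load-Exclusive always moves the monitor to exclusive and records the address, and any Store-Exclusive returns both monitors to open whether or not it succeeds. Real hardware has more transitions, many of them implementation defined, and all names here (monitor_t, model_ldrex, model_strex) are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MON_OPEN, MON_EXCLUSIVE } mon_state_t;

/* Hypothetical monitor: a state plus the address it currently tags. */
typedef struct {
    mon_state_t state;
    uint32_t    tag;
} monitor_t;

void monitor_reset(monitor_t *m)
{
    m->state = MON_OPEN;
    m->tag = 0;
}

/* True if this monitor would allow a Store-Exclusive to addr. */
bool monitor_permits(const monitor_t *m, uint32_t addr)
{
    return m->state == MON_EXCLUSIVE && m->tag == addr;
}

/* Model of LDREX: read the word and mark both monitors exclusive. */
uint32_t model_ldrex(monitor_t *local, monitor_t *global,
                     const uint32_t *word, uint32_t addr)
{
    assert(local != NULL && global != NULL && word != NULL);
    local->state = MON_EXCLUSIVE;
    local->tag = addr;
    global->state = MON_EXCLUSIVE;
    global->tag = addr;
    return *word;
}

/* Model of STREX: returns 0 on success and 1 on failure, like the
 * status register of the real instruction. The store happens only if
 * every accessed monitor is in the exclusive state for this address. */
int model_strex(monitor_t *local, monitor_t *global,
                uint32_t *word, uint32_t addr, uint32_t value)
{
    assert(local != NULL && global != NULL && word != NULL);
    bool ok = monitor_permits(local, addr) && monitor_permits(global, addr);
    if (ok) {
        *word = value;
    }
    /* Simplifying assumption: both monitors return to open afterwards. */
    local->state = MON_OPEN;
    global->state = MON_OPEN;
    return ok ? 0 : 1;
}
```

In this model a Store-Exclusive without a preceding Load-Exclusive fails and leaves memory untouched, which is the property the synchronization primitives rely on.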
But this document stops there without giving much detail on how the local and global monitors are implemented. After further research, I found more details in another document, the “ARMv7-M Architecture Reference Manual”. It is not public; you need an ARM account to download it.
Here is the local monitor state diagram.
Here is the global monitor state diagram.
They are similar to each other, and the basic idea is straightforward. In the local monitor case, when a process performs an exclusive load (LoadExcl) on a memory address, the local monitor state machine transitions from open access to exclusive access and tags that address, or more accurately the address block containing it. The tag does not stop anyone else from loading or storing; it only records that an exclusive sequence is in progress. A later exclusive store (StoreExcl) succeeds only if the monitor is still in the exclusive access state, and it returns the monitor to open access.
Similarly, in the global monitor case, when one processor (labeled n) performs an exclusive load (LoadExcl) on a shared memory address, the global monitor state machine for that processor transitions from open access to exclusive access and tags the address block. Other processors (labeled !n) can still load from and store to that block, but a store from one of them clears the tag, so the subsequent exclusive store (StoreExcl) from processor n fails and the sequence has to be retried. A successful StoreExcl from processor n completes the atomic update and returns the monitor to open access.
Note when a LDREX instruction is executed, the resulting tag address ignores the least significant bits of the memory address:
Tagged_address == Memory_address[31:a]
The value of a in this assignment is IMPLEMENTATION DEFINED, between a minimum value of 2 and a maximum value of 11. For example, in an implementation where a = 4, a successful LDREX of address 0x000341B4 gives a tag value of bits[31:4] of the address, giving 0x000341B. This means that the four words of memory from 0x000341B0 to 0x000341BF are tagged for exclusive access. Subsequently, a valid STREX to any address in this block will remove the tag.
The size of the tagged memory block is called the Exclusives reservation granule. The Exclusives reservation granule is IMPLEMENTATION DEFINED between:
• One word, in an implementation with a == 2.
• 512 words, in an implementation with a == 11.
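The tag arithmetic above is easy to check with a few helper functions. This is only a sketch: the function names are made up, and the parameter a stands for the implementation-defined value discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Tag recorded by the monitor: address bits [31:a]. */
uint32_t ldrex_tag(uint32_t addr, unsigned a)
{
    assert(a >= 2 && a <= 11);
    return addr >> a;
}

/* First byte address of the tagged block (the reservation granule). */
uint32_t granule_base(uint32_t addr, unsigned a)
{
    assert(a >= 2 && a <= 11);
    return addr & ~((UINT32_C(1) << a) - 1u);
}

/* Last byte address of the tagged block. */
uint32_t granule_last(uint32_t addr, unsigned a)
{
    assert(a >= 2 && a <= 11);
    return granule_base(addr, a) | ((UINT32_C(1) << a) - 1u);
}

/* Granule size in 32-bit words: 2^a bytes is 2^(a-2) words. */
uint32_t granule_words(unsigned a)
{
    assert(a >= 2 && a <= 11);
    return UINT32_C(1) << (a - 2);
}
```

With a = 4 and address 0x000341B4 these give a tag of 0x000341B and a block from 0x000341B0 to 0x000341BF, matching the example in the text, and the granule size comes out as 1 word for a = 2 and 512 words for a = 11.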
But what bothers me is that many of the transitions are specified as “implementation defined”. ARM’s Cortex-M documentation says the local monitor is implemented inside the Cortex-M core, but I could not find it when tracing through the Cortex-M core RTL. The documentation also says that the two ports below connect to an exclusive monitor, which I assume is the global monitor. I did find these two ports, but I don’t see a reference global monitor design in ARM’s SDK.
(See U1 below for an update.)
However, inspired by the local and global monitor state diagrams above, a simple hardware semaphore can be implemented as follows. It supports multiple requesters, each of which can independently request and release the semaphore. The state machine has two states: the default open access state and the exclusive access state. When requester n asserts req(n) and wins arbitration (several requests may arrive at the same time), the state transitions from open access to exclusive access. A multi-bit register stores n as the holding requester, and status(n) is asserted to 1 to inform requester n that the semaphore has been acquired. While in the exclusive access state, requests and releases from the other, non-winning requesters are ignored. When the winning requester n releases the semaphore, the state transitions back to open access and the whole process starts again.
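The state machine just described can be captured in a small behavioral model, written in C here only to check the logic before committing to RTL. Several details are my assumptions rather than part of the design: four requesters, a level-sensitive request bit per requester (1 to request or hold, 0 to release), and fixed-priority arbitration where the lowest-numbered requester wins. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REQ 4u   /* assumed number of requesters */

/* Hypothetical semaphore state: open/exclusive plus the holder index. */
typedef struct {
    bool     exclusive;  /* false = open access, true = exclusive access */
    unsigned holder;     /* valid only while exclusive */
} hw_sem_t;

void hw_sem_reset(hw_sem_t *s)
{
    s->exclusive = false;
    s->holder = 0;
}

/* Advance the model by one clock.
 * req:     bit n is requester n's request line.
 * returns: status bitmask, bit n set while requester n holds the semaphore. */
uint32_t hw_sem_clock(hw_sem_t *s, uint32_t req)
{
    assert((req >> NUM_REQ) == 0);

    if (s->exclusive) {
        /* Only the holder can change the state; everyone else is ignored. */
        if ((req & (1u << s->holder)) == 0) {
            s->exclusive = false;
        }
    } else {
        /* Open access: the lowest-numbered active request wins (assumed). */
        for (unsigned n = 0; n < NUM_REQ; n++) {
            if (req & (1u << n)) {
                s->exclusive = true;
                s->holder = n;
                break;
            }
        }
    }
    return s->exclusive ? (1u << s->holder) : 0u;
}
```

In this model a requester that keeps its request asserted while another one holds the semaphore is simply picked up by the arbiter on the clock after the release. A round-robin arbiter would be fairer than fixed priority if starvation is a concern.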
Let’s use one signal, req, to indicate both semaphore request (req=1) and release (req=0). Another signal, status, indicates whether the semaphore has been acquired (status=1) or not (status=0). The timing diagram is shown below.
The next step is hooking the hardware semaphore module up to firmware. Req can be connected to a firmware-programmable register bit, so firmware writes 1 to acquire the semaphore and 0 to release it. Status can be connected to a status register that firmware polls, and also to a processor interrupt input, so that instead of polling, the processor is interrupted when the semaphore is acquired.
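On the firmware side, the polling variant might look like the sketch below. The register layout is entirely hypothetical: I assume one request register and one status register with the same bit position for this processor, and the register addresses are passed in as pointers so nothing here depends on a real memory map. The poll limit is also my addition, so that a stuck semaphore does not hang the firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register view of the semaphore for one requester. */
typedef struct {
    volatile uint32_t *req_reg;     /* firmware writes the request bit here */
    volatile uint32_t *status_reg;  /* hardware reports the grant here */
    uint32_t           bit;         /* this requester's bit mask */
} sem_regs_t;

/* True if the hardware currently reports the semaphore as ours. */
bool sem_is_held(const sem_regs_t *r)
{
    return (*r->status_reg & r->bit) != 0;
}

/* Program req=0 to release the semaphore (or withdraw a request). */
void sem_release(const sem_regs_t *r)
{
    *r->req_reg &= ~r->bit;
}

/* Program req=1, then poll status up to max_polls times.
 * Returns true once acquired; on timeout the request is withdrawn. */
bool sem_acquire(const sem_regs_t *r, uint32_t max_polls)
{
    assert(r->bit != 0);
    *r->req_reg |= r->bit;
    for (uint32_t i = 0; i < max_polls; i++) {
        if (sem_is_held(r)) {
            return true;
        }
    }
    sem_release(r);
    return false;
}
```

For the interrupt-driven variant, firmware would set the request bit and return, and the interrupt handler attached to the status line would resume the waiting task.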
U1: how to implement a global exclusive monitor with AMBA bus and how Cortex-M processor implements local exclusive monitor