The classic spin-lock mechanism rests on instructions to atomically
exchange the contents of a memory location and a register, or equivalently,
on an instruction to atomically test the value in a memory location and
then set that location to a new value (the Test-and-Set instruction made
famous on the IBM System 360).
Implementing these primitives on CPUs intended for use in shared-memory
multiprocessors requires that each CPU be able to take exclusive use of
the shared memory through a load-store sequence, and this, in turn, must
rest on a hardware solution to the mutual exclusion problem. This is
expensive!
One of the most innovative features of the DEC Alpha computer is the
way this computer solves this problem. On the Alpha, there is no atomic
Test-and-Set or Excange-Memory-and-Register instruction. Instead, there
are the following instructions:
- LDL_L r,m and LDQ_L r,m -- Load Long or Quad Locked
- Loads the specified longword (32 bits) or the quadword (64 bits) into
the register r from memory location m. As a side
effect, it resets the CPU's lock flag and causes a special
subsystem of the CPU to begin snooping for references to location
m.
- STL_C r,m and STQ_C r,m -- Store Long or Quad Condtional
- If the lock flag is reset, stores the contents of r in
memory location m (either a longword or quadword, depending on
the instruction used). If the lock flag is set, the instruction
will not store anything in memory. In either case, the lock flag
is returned in r and reset.
- The lock flag
- This is reset by both the Load Locked and Store Conditional instructions.
It is set whenever the snooping subsystem detects a store operation by
any CPU to the address m that was most recently used as an
operand on a Load Locked instruction by this CPU.
The idea of this set of instructions is that no memory interlocking
is required in the normal case -- where normal is defined as the absence
of contention. Only when there is a collision between two CPUs trying
to claim the same spin-lock will there be overhead in claiming the lock.
In contrast, with the Test-and-Set class of operations, every test of a
lock variable inside a spin-lock will potentially block other CPUs from
using the shared memory subsystem.
The Question: Write out an algorithm using the Alpha's Load Locked
and Store Condtional instructions that will perform an atomic operation
equivalent to the atomic Exchange Memory and Register operations of classical
architectures. (Hint: Use pseudocode; the algorithm is only a few machine
instructions.)
Then: Build on what you just did by outlining the entire code for
the lock() and unlock() primitives given in Lectures 4 and 5, using the Alpha
instructions instead of an exchange instruction.
Consider a system with 3 FIFO queues shared by an arbitrary number of
processes where each queue may have multiple producers and consumers,
where the queued items are buffers, where the queues are implemented using
linked lists of buffers, where there is a single pool of free buffers
shared by all queues, and where there are a total of 10 buffers in the system.
Assuming that semaphores are used for all mutual exclusion and syncronization,
how many semaphores are required by this system, total, what is each used
for, and what is the maximum and minimum value of the count in each?
Practical Note: This kind of buffer management for FIFO queues is common
inside operating systems -- specifically, for buffers of data being held
by the system between the time the user requests I/O and the time the I/O is
completed.