Homework 3

22C:116, Fall 1997

Due Friday Sept. 12, 1997, in class

Douglas W. Jones

The classic spin-lock mechanism rests on instructions to atomically exchange the contents of a memory location and a register, or equivalently, on an instruction to atomically test the value in a memory location and then set that location to a new value (the Test-and-Set instruction made famous on the IBM System 360).
Implementing these primitives on CPUs intended for use in shared-memory multiprocessors requires that each CPU be able to take exclusive use of the shared memory through a load-store sequence, and this, in turn, must rest on a hardware solution to the mutual exclusion problem. This is expensive!
One of the most innovative features of the DEC Alpha computer is the way this computer solves this problem. On the Alpha, there is no atomic Test-and-Set or Excange-Memory-and-Register instruction. Instead, there are the following instructions:

LDL_L r,m and LDQ_L r,m -- Load Long or Quad Locked
Loads the specified longword (32 bits) or the quadword (64 bits) into the register r from memory location m. As a side effect, it resets the CPU's lock flag and causes a special subsystem of the CPU to begin snooping for references to location m.

STL_C r,m and STQ_C r,m -- Store Long or Quad Condtional
If the lock flag is reset, stores the contents of r in memory location m (either a longword or quadword, depending on the instruction used). If the lock flag is set, the instruction will not store anything in memory. In either case, the lock flag is returned in r and reset.

The lock flag
This is reset by both the Load Locked and Store Conditional instructions. It is set whenever the snooping subsystem detects a store operation by any CPU to the address m that was most recently used as an operand on a Load Locked instruction by this CPU.
The idea of this set of instructions is that no memory interlocking is required in the normal case -- where normal is defined as the absence of contention. Only when there is a collision between two CPUs trying to claim the same spin-lock will there be overhead in claiming the lock. In contrast, with the Test-and-Set class of operations, every test of a lock variable inside a spin-lock will potentially block other CPUs from using the shared memory subsystem.
The Question: Write out an algorithm using the Alpha's Load Locked and Store Condtional instructions that will perform an atomic operation equivalent to the atomic Exchange Memory and Register operations of classical architectures. (Hint: Use pseudocode; the algorithm is only a few machine instructions.)
Then: Build on what you just did by outlining the entire code for the lock() and unlock() primitives given in Lectures 4 and 5, using the Alpha instructions instead of an exchange instruction.
Consider a system with 3 FIFO queues shared by an arbitrary number of processes where each queue may have multiple producers and consumers, where the queued items are buffers, where the queues are implemented using linked lists of buffers, where there is a single pool of free buffers shared by all queues, and where there are a total of 10 buffers in the system.
Assuming that semaphores are used for all mutual exclusion and syncronization, how many semaphores are required by this system, total, what is each used for, and what is the maximum and minimum value of the count in each?
Practical Note: This kind of buffer management for FIFO queues is common inside operating systems -- specifically, for buffers of data being held by the system between the time the user requests I/O and the time the I/O is completed.
Consider a heap that runs from addresses 100 to 1000 (decimal). How many free blocks will this heap be divided into by the binary buddy system, as documented in lecture 6, prior to any allocation or after the final deallocation and merger of free blocks.
Big hint: The answer is not 1!
Write reasonably detailed pseudocode for the allocate and deallocate (or malloc() and free()) operations on a boundary-tag heap with lazy merger of free blocks.