The usual assumption surrounding such assignments as

    A := 0;
    A := -1;

is that assignment is atomic, that is, it either happens or it doesn't. If the above two statements are executed in parallel, or if one statement is interrupted in order to complete the other, a programmer will typically expect only two possible outcomes. These are expressed by the following postcondition:
    (A = 0) or (A = -1)

What if the variable A is 64 bits long on a machine where all loads and stores are in terms of 32 bit quantities? Or, on a small microcontroller, what if the variable A is 16 bits long, on a machine where all load and store operations operate on 8 bit quantities? Suddenly, there are a number of other possible outcomes! Each of the following outcomes is now possible on the 8/16 bit microcontroller:
    (A = 0) or (A = -1) or (A = 255) or (A = -256)

If the low half of the word A is A0 and the high half is A1, the following sequence of assignments will give the value 255:
    Process 1         Process 2
    A0 := 0;
                      A0 := 255;
                      A1 := 255;
    A1 := 0;

The usual solution to this is to use some kind of mutual exclusion mechanism, so that the assignments to A0 and A1 are done in a critical section, but this cannot prevent failures. Consider the following:
    Process 1         Process 2
    A0 := 0;
    A1 := 0;
                      A0 := 255;
                      < A1 := 255; >   -- fails

Such a failure, affecting one process but not the other, is easily imagined on a multiprocessor, but even on a uniprocessor, if two different users are running code that shares a memory segment, and then one user hits control-C to kill a process, this kind of consequence is possible.
The consequence of this partial failure is that the illegal value 255 still appears in the shared variable, so that, even if the mutual exclusion mechanism can recover from the failure of one process inside the critical section, and grant entry to another, the shared data is corrupted.
This problem shows up whenever the value of a shared variable can only be updated one piece at a time. For example, A might be a logical record on disk that consists of multiple physical records. The problem may occur not only on parallel computers, but also on purely sequential machines when there is asynchronous context switching.
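To make the hazard concrete, here is a minimal sketch in C that simulates byte-wide stores to a 16-bit variable using two threads. It is deliberately racy -- the race is the point -- and all names are invented for this illustration:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    static volatile uint8_t a0, a1;      /* low and high halves of A */

    static void *writer(void *arg) {
        (void)arg;
        a0 = 255; a1 = 255;              /* A := -1, one byte at a time */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, writer, NULL);
        a0 = 0; a1 = 0;                  /* A := 0, concurrently */
        pthread_join(t, NULL);
        /* besides 0 and -1, the torn results 255 (0x00FF) and
           -256 (0xFF00) are possible, depending on interleaving */
        printf("A = %d\n", (int16_t)((a1 << 8) | a0));
        return 0;
    }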
Analogous problems can occur in the absence of context switching, if a sequence of assignments is interrupted by a failure. Consider:
    A0 := 0;
    A1 := 0;
    ...                -- some time later
    A0 := 255;
    < A1 := 255; >     -- fails

Here, if the data being updated is on disk or in "persistent RAM" such as RAM with battery backup or old-fashioned core memory, the failure can leave an illegal value in memory, a value that was only partially updated.
One way to assure that a shared variable is updated on an all or none basis is to perform the updates "off line". This requires a way to atomically move the updated copy into public view. Consider the following scheme:
The publicly visible copy of a composite shared variable is pointed to by a pointer; a process wishing to update the composite variable updates a copy, and then atomically updates the pointer. We assume that updates to pointers are atomic -- that is, that pointers occupy exactly one hardware word, where a word is defined as the unit of atomic update in memory.
    P                -- a pointer to the shared value

    Inspect(V):
        V := P^;     -- return the value pointed to by P

    Update(V):
        new( temp );
        temp^ := V;
        P := temp;   -- make P point to the desired value

The above code uses a newly allocated cell from the heap for each update to the atomically updated variable. This generality is unnecessary. If N processes compete for access to the shared variable, it must be possible for each process to independently update its own private copy of the variable; thus, the atomically updatable pointer must have at least N possible values, corresponding to the private variables of each process.
Therefore, if the machine allows atomic updates of n bit quantities in memory, this scheme allows up to 2^n processes to compete.
In fact, we do not need a completely general heap! If each process has two cells able to hold a single value each, and if each process uses these cells in turn, updating the least recently updated cell each time an update is required, and finishing the update with a pointer assignment, this scheme will work. With two cells per process, the pointer needs 2N distinct values to serve N processes, so pointer assignment schemes can support atomic update on a 4-bit machine with as many as 8 processes, or on an 8-bit machine with as many as 128 processes.
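Here is a minimal sketch of the two-cell scheme in C11, shown for a single updating process; value_t, cells, inspect, and update are names invented for this example, and the single atomic pointer store is what publishes the update:

    #include <stdatomic.h>

    typedef struct { int a, b, c; } value_t;   /* a multi-word value */

    static value_t cells[2];                   /* this process's two cells */
    static value_t *_Atomic P = &cells[0];     /* the shared pointer */

    value_t inspect(void) {
        return *atomic_load(&P);               /* one atomic pointer read */
    }

    void update(value_t v) {
        static int lru = 1;                    /* least recently updated cell */
        cells[lru] = v;                        /* build the new value off line */
        atomic_store(&P, &cells[lru]);         /* publish with one pointer store */
        lru = 1 - lru;
    }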
How do you make an assignment to a multi-byte object look atomic without using pointer assignment and without using memory proportional to the number of processes? This is easy if there is no problem with failure, but it is harder when failures are possible!
Leslie Lamport developed algorithms for updating a shared variable in the face of failures. These algorithms assume that a mutual exclusion algorithm guarantees that only one process tries to update the variable at a time; given that, they guarantee that the result of the update will be correct in the face of failure.
The basic operations offered by Lamport's stable storage algorithms are:
    inspect( V ) returns value      (or V.inspect())
    update( V, value )              (or V.update( value ))

Lamport's stable storage rests on a new and redundant representation for the stored value, together with two procedures (or protocols) for inspecting and updating it.
Conceptually, it is reasonable to use a client-server world view and imagine the inspect and update procedures as being the services offered by the server to clients. If the server fails, we can easily start a new server, as long as the variable itself is stored in such a way that it survives the failure.
A stable variable V is represented as a record where every field is duplicated:
    Copy1   -- the value stored
    Time1   -- the time it was stored
    Sum1    -- a checksum over the value and time
    Copy2   -- a duplicate of the value
    Time2   -- a duplicate of the time
    Sum2    -- a duplicate of the checksum

There are two copies of the value, and for each copy, there is a record of the time at which the value was last updated and a record of the checksum computed as of the last update.
Lamport's update procedure writes the two copies one at a time, making sure each group of writes has finished before going on:

    Procedure Update( V, value )
    begin
        update time
        V.Copy1 := value
        V.Time1 := time
        V.Sum1 := Checksum( value, time )
        -- wait for the assignments to really finish
        V.Copy2 := value
        V.Time2 := time
        V.Sum2 := Checksum( value, time )
        -- wait for the assignments to really finish
    end

The utility of this code relies on keeping two (or more) copies of the value, where no failure will corrupt more than one copy at a time. The wait for one group of assignments to finish before starting the next is very important in this regard.

The fault tolerance of Lamport's scheme improves if the two (or more) copies of the variable are stored in such a way that no single failure can corrupt both, for example, in separate memory modules or on separate disk drives.
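The representation and the update procedure are easy to render in C. The following is a minimal sketch; stable_var, checksum, and now are names invented here, the checksum function is merely illustrative, and the forced write-back step is reduced to a comment:

    #include <stdint.h>

    typedef struct {            /* every field is duplicated */
        int32_t  copy1;         /* the value stored */
        uint32_t time1;         /* the time it was stored */
        uint32_t sum1;          /* checksum over value and time */
        int32_t  copy2;
        uint32_t time2;
        uint32_t sum2;
    } stable_var;

    static uint32_t checksum(int32_t value, uint32_t time) {
        return (uint32_t)value ^ (time * 2654435761u);  /* illustrative only */
    }

    void update(stable_var *v, int32_t value, uint32_t now) {
        v->copy1 = value;
        v->time1 = now;
        v->sum1  = checksum(value, now);
        /* wait here until the writes above have really finished */
        v->copy2 = value;
        v->time2 = now;
        v->sum2  = checksum(value, now);
        /* wait again before reporting the update complete */
    }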
The update and inspect operations must be performed inside critical
sections, and if failure is to be survived, these critical sections
must use some algorithm that can detect failure of a process inside
a critical section and release any associated mutual exclusion semaphores.
The assignment statements shown above will not, typically,
execute instantaneously. On a cache machine, for example, there
may be a delay before the values are written back to main memory.
If disks are involved, there may be a considerable time between
the issuing of a write request and the actual write to disk.
If disk cache software is involved, it is even possible that
the write to disk will never happen.
This illustrates something that was pointed out earlier, in the
context of the discussion of disk caches. In general, fancy
caching algorithms can improve performance, but they can get in
the way when fault tolerance is the goal! We must have a way
to force cached values out into the real memory before we can
rely on this stable storage algorithm. In systems derived from
UNIX, the kernel primitive that does this is fsync().
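For data on disk, a minimal sketch of forcing the write-back on a POSIX system follows; error handling is abbreviated, and write_stably is a name invented here:

    #include <unistd.h>

    int write_stably(int fd, const void *buf, size_t len) {
        if (write(fd, buf, len) != (ssize_t)len)
            return -1;
        return fsync(fd);    /* block until the data reaches the device */
    }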
Given this representation and update procedure, the inspect
procedure checks both copies and decides which one to return:

    Procedure Inspect( V )
    begin
        -- optionally start by reading all fields from disk
        if V.Sum1 = Checksum( V.Copy1, V.Time1 )
            if V.Sum2 = Checksum( V.Copy2, V.Time2 )
                if V.Time1 > V.Time2
                    value := V.Copy1
                else
                    value := V.Copy2
                endif
            else
                value := V.Copy1
            endif
        else
            if V.Sum2 = Checksum( V.Copy2, V.Time2 )
                value := V.Copy2
            else
                -- failure --
            endif
        endif
        return value
    end

This code is fairly simple -- there are four cases to
consider:

1) There were no errors.

In this case, the checksums on both copies will be valid,
both will be the same, and they will have the same
timestamp. Either copy may be returned.
2) There was a failure such that update managed to write one
updated copy of the data, but it didn't get to start
writing the other updated copy.
In this case, the checksums on both copies will be valid,
but their timestamps will differ. Return the copy with
the most recent timestamp.
3) There was a failure during one of the updates, or there
was a failure that corrupted one of the stored values between
updates.
In this case, the checksum on the copy in question will
be invalid. The other copy should still be OK and can be
returned.
If the failure occurs during the write of the second copy,
it will be as if the write was successful because the first
copy will have already been written and will be available
for use.
If the failure occurs during the write of the first copy,
it will be as if the write never occurred.
4) A failure or failures destroy all copies. There is nothing that
can be done about this, but it takes multiple failures to cause
this kind of damage, and the probability of this multiple failure
can be made arbitrarily low by storing more copies in more widely
distributed locations.
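In C, the case analysis collapses neatly. Here is a sketch of an
inspect operation reusing the stable_var and checksum invented
above; cases 1 and 2 merge into the timestamp comparison:

    int inspect(const stable_var *v, int32_t *out) {
        int ok1 = (v->sum1 == checksum(v->copy1, v->time1));
        int ok2 = (v->sum2 == checksum(v->copy2, v->time2));
        if (!ok1 && !ok2)
            return -1;         /* case 4: all copies destroyed */
        if (ok1 && (!ok2 || v->time1 > v->time2))
            *out = v->copy1;   /* copy 1 is valid and newer, or the only valid copy */
        else
            *out = v->copy2;   /* otherwise copy 2 is the one to use */
        return 0;
    }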
Note that the stable storage update procedure may be made to be
even more reliable if it does a write-read-verify cycle on each
copy it writes out, that is, it writes each to memory and then reads
it back and verifies correct storage before continuing.
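A sketch of such a write-read-verify step for data on disk,
assuming POSIX pwrite and pread; write_verified is a name
invented here:

    #include <string.h>
    #include <unistd.h>

    /* write len bytes (len <= 512 in this sketch), force them out,
       read them back, and compare; returns 0 only if verified */
    int write_verified(int fd, const void *buf, size_t len, off_t off) {
        char back[512];
        if (pwrite(fd, buf, len, off) != (ssize_t)len) return -1;
        if (fsync(fd) != 0) return -1;
        if (pread(fd, back, len, off) != (ssize_t)len) return -1;
        return memcmp(buf, back, len) == 0 ? 0 : -1;
    }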
Also, note that no checksum algorithm can detect all possible errors in
the stored data. Checksums can be made arbitrarily error resistant,
but they cannot be made perfect. For any checksum algorithm, there
is some combination of errors that will defeat it!
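A toy demonstration: a simple additive checksum cannot see two
errors that cancel, such as a transposition of two bytes:

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t sum8(const uint8_t *p, int n) {
        uint8_t s = 0;
        while (n--) s += *p++;
        return s;
    }

    int main(void) {
        uint8_t good[4] = { 1, 2, 3, 4 };
        uint8_t bad[4]  = { 2, 1, 3, 4 };  /* first two bytes transposed */
        /* both sums are 10, so this corruption goes undetected */
        printf("%d %d\n", sum8(good, 4), sum8(bad, 4));
        return 0;
    }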
Atomic assignment and inspection are not enough to guarantee that
a shared variable is used correctly through possible fault sequences! For
example, consider the problem of transferring money from one checking
account to another:

    Jones pays Smith some Amount

        Jones := Jones - Amount
        Smith := Smith + Amount

To avoid accidentally creating or destroying money, either both updates
must be made before an error occurs, or neither must be made!

In the general case, there may be three machines here, one belonging
to a bank where Smith has an account, one belonging to a bank
where Jones has an account, and a third machine that mediates the
transaction.
The term transaction comes from the financial world, as is
suggested by the above example. Transactions can be quite complex:
for example, if I deposit a check in a bank, the transaction involves
debiting the account of the check writer, crediting my account, and
crediting the sum representing the current day's checks received.

Each transaction typically involves three sets of variables:

    V          -- the set of variables
    I in V     -- the subset of V inspected
    U in V     -- the subset of V updated

In the following, we will make the slightly stronger assumption:

    U in I     -- updated variables are all inspected
In one sense, atomic transactions are trivial to implement. The
entire database needed for the transaction could be stored as a
single variable using Lamport's stable storage. A single mutual
exclusion semaphore suffices to guard access to this variable.
This trivial solution is unacceptable! Most interesting databases
cannot be claimed in their entirety for each update. Instead, the process
performing an update must claim only part of the structure, locking
only those variables needed while allowing other processes to operate
on other parts of the structure at the same time.
Consider, for example, the Bank of America. This is a large bank
with branches all over California. It would be impossible to
operate such a bank if each transaction required giving the teller
in charge of that transaction exclusive use of the bank's entire
database for the duration of the transaction.
The sets I (inspected variables) and U (updated variables) are not
known in advance.
Typically, some subset of I must be inspected in order
to determine the other variables in I and U.
We assume that every variable is associated with a lock,
a binary mutual exclusion semaphore; this allows a process to claim
exclusive use of that variable. If a variable is to be updated but
does not need inspection, we lock it as if we needed to inspect it first.

This leads to the following two-phase model of transactions:

    Phase 1
        Claim locks and inspect variables
        until all needed variables are locked.
        All variables must be locked prior to inspection.

    Phase 2
        Update variables and release locks!

If multiple variables are locked, deadlock is possible. Deadlocks
can be resolved by aborting and restarting transactions because
they always occur in phase 1, before the transaction has modified
any shared variables.
Deadlocks can be avoided by claiming locks in a standard order, but
this is not always practical, in the general case. For example,
if each database record contains pointers to other records that may need to
be locked and inspected, you must claim the record containing the pointer
in order to follow the pointer, and any cycles in the pointer structure are
potential sources of deadlock.
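For the funds-transfer example, a minimal two-phase sketch in C,
assuming POSIX threads and claiming the two account locks in a
standard order (here, by address) to avoid deadlock; Account and
transfer are names invented for this example:

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t lock;
        long balance;
    } Account;

    void transfer(Account *jones, Account *smith, long amount) {
        /* phase 1: claim locks in a standard order, then inspect */
        Account *first  = (jones < smith) ? jones : smith;
        Account *second = (jones < smith) ? smith : jones;
        pthread_mutex_lock(&first->lock);
        pthread_mutex_lock(&second->lock);

        /* phase 2: update the variables and release the locks */
        jones->balance -= amount;
        smith->balance += amount;
        pthread_mutex_unlock(&second->lock);
        pthread_mutex_unlock(&first->lock);
    }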
The 2 phase model guarantees that things will work, but it is overly
restrictive. There are specific problems where a greater degree of
concurrent access can be allowed by violation of the two-phase model.
For example, consider a shared lexicographic binary tree. The
two-phase model requires that a user of any node in this tree must
lock all nodes on the path from the root to the node in question
before releasing any of the locks.
In fact, the data structure will work just as well if the user only
holds a few locks at a time. Specifically, the user first claims the
lock on the root, and then claims the lock on the child of the root
that is on the path to the target node in the tree. As soon as it
is determined that the root is not the direct parent of the target,
the root can be released and the process can be continued recursively.
In the end, the process holds a lock on the target node and on the
parent of the target node.
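A sketch of this hand-over-hand locking for a search, assuming
POSIX threads; Node and find are names invented here, and the
caller is left holding the locks on the target and its parent:

    #include <pthread.h>
    #include <stddef.h>

    typedef struct Node {
        pthread_mutex_t lock;
        int key;
        struct Node *left, *right;
    } Node;

    /* search a non-empty tree; on return the caller holds the locks
       on the node found (if any) and its parent, and must release them */
    Node *find(Node *root, int key, Node **parent) {
        Node *cur = root;
        *parent = NULL;
        pthread_mutex_lock(&cur->lock);
        while (cur != NULL && cur->key != key) {
            Node *next = (key < cur->key) ? cur->left : cur->right;
            if (next != NULL)
                pthread_mutex_lock(&next->lock);        /* claim the child first */
            if (*parent != NULL)
                pthread_mutex_unlock(&(*parent)->lock); /* then release the grandparent */
            *parent = cur;
            cur = next;
        }
        return cur;
    }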
Note: The above definitions of the two-phase model of transactions
do not guarantee atomicity. They only describe the mutual exclusion model
that atomic transaction systems provide for the user. The implementation of
atomicity in this model will be the subject of the next lecture!
The 2 phase model can be improved by using more interesting locks.
Instead of 2-state locks, where the two states are free and in-use,
3-state locks can be used, with the following states:
free
read-only
in-use (read-write)
If a process only needs to read a variable, it may claim the variable
by setting the lock to read-only. This is legal if the previous
state of the lock was free or read-only. If a process may need to
write a variable, it must claim the lock as in-use, a claim that can
only be made if the lock was free. Thus, a variable may have multiple
readers. Locks that support this are said to solve the
readers-writers problem.
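POSIX threads provide such locks directly. A minimal sketch using
pthread_rwlock_t; shared_balance and the two operations on it are
invented for this example:

    #include <pthread.h>

    static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
    static long shared_balance;

    long read_balance(void) {            /* a read-only claim */
        pthread_rwlock_rdlock(&lock);    /* granted while free or read-only */
        long b = shared_balance;
        pthread_rwlock_unlock(&lock);
        return b;
    }

    void deposit(long amount) {          /* an in-use (read-write) claim */
        pthread_rwlock_wrlock(&lock);    /* granted only while free */
        shared_balance += amount;
        pthread_rwlock_unlock(&lock);
    }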