What if we use token passing to assure mutual exclusion, and the token is lost? If more than one process regenerates the token, mutual exclusion is lost! How can the processes agree on who is to regenerate the token?
Or, what if we use a token ring network, and the token is lost? If more than one machine regenerates the token, the network will not work correctly! How can the machines agree on which is to regenerate the token?
Or, what if a group of processes is providing fault-tolerant backup for a process that performs a critical service. When that process dies, one of the surviving backup processes must take over as the service provider, but if two take over, chaos results. How can the survivors agree among themselves which process is to take over.
The same algorithms may be used to solve any of the above problems! These are called election algorithms.
When a process detects that an election is needed, for example, because a timer expired indicates the loss of a token, or because it believes that a server has failed and must be replaced by one of the backups, it calls for an election.
Note that we do not assume a central authority to arbitrate the election! Typically, the call for election must be broadcast to every other process in the relevant group, and the result of the election algorithm must be that all the processes agree on two things: First, that an election has been called and that it has been completed, and second, that a particular process won the election.
Note that, unlike elections in human society, we aren't really interested in using something akin to democracy to select the best candidate. Nonetheless, we are very democratic! We assume that all voters are equally qualified to hold the office and that, as a result of the election, any voter may win. The only requirement we pose is that all voters know who won when the election is over so that the losers can continue as citizens while the winner assumes office.
Note, however, that many election algorithms do include information that can be extended to allow bidding for office, where each voter advertises his or her qualifications as the balloting is conducted, so that the best qualified candidate wins.
If each process has a unique identifier, for example, the concatenation of the host machine's unique identifier and the serial number assigned to that process, the election algorithm may select a winner by selecting the process that holds the minimum (or equivalently, the maximum) unique identifier.
For most purposes, the important thing is to elect a process, and it doesn't matter which process is elected. If there is actually a reason that one process might be preferred over another, for example, if different processes run at different speeds, processes can bid for the election by providing not just their identifier, but a numeric estimate of their suitability as a prefix on their identifier. In this case, the most suitable process is the one that gets elected, and the unique identifiers are used only to break ties.
One simple election algorithm involves circulating a message around a loop. Each process has a designated successor in the loop. If process i believes an election is needed, it sends a message saying "Process i initiated this election with bid bi". Here, the bid contains, at minimum, the process's unique ID.
When process j receives such a message, it compares it's own bid bj with the bid in the message; if the bid in the message wins, the message is simply forwarded around the ring. If the bid in the message loses, the process replaces the bid in the message with bj.
If process i receives an election request message that it originated, the bid in the message contains the unique ID of the winner. Process i now knows the identitity of the winner, but the other processes in the ring must be informed, so process i must circulate a new message saying "Process i declares the winner of the election to be j". This message must travel once around the ring before all processes know the result of the election.
What happens if two processes initiate elections at the same time with the above ring algorithm? Assuming the same processes participate in both elections, the elections will arrive at the same result, but in some cases, this can still lead to trouble because the winner will receive two notices telling it that it won.
To eliminate this problem, note that the initiation of two overlapping elections on the same ring implies that some process will receive a second notice that an election has been called before it receives a notice of the results from the first election.
It is tempting to simply suggest that processes simply ignore election notices they receive between receiving the first notice of any election and the results of that election, but this does not work because when two elections overlap, it is easy to show that, under this rule, each election notice will be discarded by some station.
The solution is to have stations that are waiting for the results of an election remember who initiated that election. If a second election notice arrives at such a station, it should be discarded if the first election was initiated by a process with a larger process ID, and the first election should be cancelled if the new notice has the larger ID. This rule guarantees that, no matter how many elections are started in parallel on a set of processes, only one election will be completed.
Thus, the minimum essential content for an election notice is the ID of the initiator and the highest bid received, plus a bit indicating whether this election notice is a request for votes or the announcement of the results. The basic election algorithm can be extended to gather other information, for example, the identities of the voters or their current states.
What happens if an election notice is lost in the above ring algorithm? If the initiator of an election fails to hear a reply after some amount of time, it may re-initiate the election. If some process votes in an election and then fails to hear the result of the election after some amount of time, it may start an election of its own. The time limits for these failures are typically determined by the expected worst-case round-trip time for the cycle of processes participating in the election.
The failure of a process is harder to deal with, because this damages the connectivity of the cycle. In this case, a new cycle must be constructed using alternative paths, and this requires that the processes in the group know not only about the identities of their successors in the original cycle, but about other members of the cycle. If each process acknowledges each message it gets with a reply to the sender, then a process can detect the failure of its successor and begin a search for a working successor.
The same meta-election algorithm may be executed on an arbitrary graph structured network by having the election initiator broadcast the request for election along a spanning tree. When the outgoing request reaches a leaf of the network, it is reflected back towards the initiator, with each node returning the winner of it's subtree towards the initiator. When the initiator receives the answers from every subtree, it selects the overall winner and sends the election result out towards every leaf.
If the spanning tree to be used is computed dynamically by the broadcast algorithm, this will find all reachable surviving processes; as such, tree-based election algorithms that are inherently at-least partly fault tolerant are fairly easy to construct.
Each tree node has the following states: