The rendezvous construct in the Ada programming language provides an interesting example of how remote procedure call semantics can be presented to high level language programmers. Ada allows a program to be composed of many processes, called tasks.
Communication between tasks takes place when code in one task calls an entry of another. A call to an entry of a task is syntactically identical to a procedure call. Here is an example:
Calling Task Called Task task T; entry E(formals); : begin : : T.E(actuals) accept E(formals); : : -- body end accept; : end;
The entry definition of a task simply declares that the entry is part of the public interface of the task. The task may contain multiple entries, and the task body may contain one or more accept statements for every entry. The body of the accept statement (between the keywords accept and end accept) is only executed when the control flow of the task reaches the accept statement and when some other caller has called that entry.
Conceptually, the caller sends a message to the called task containing the actual parameters to the rendezvous, and the called task sends a message back to the caller after the completion of the body of the accept statement. The body of the accept statement is called a rendezvous (a French word meaning get-together). The following petri net notation correctly expresses the semantics of a simple rendezvous between two processes:
Calling Task Called Task | | |______________________ | | | _V_V_ join | V Body | __V__ fork | | ______________________| | | | V VThis conceptual model is not practical and it does not deal with the possibility of multiple callers. Instead, the following model is usually used in implementing the rendezvous:
Calling Task Called Task | | _V___ send | | | | | |_______message______ | | queue | | | _V_V_ receive | | V V waiting Body | | | ___V_ send | | | | _______return_______| | | | message | _V_V_ receive | | | V VThis implementation is clearly suitable for a distributed system, where a remote procedure call protocol is clearly applicable. On a uniprocessor, the outgoing message is still required; this carries, at a bare minimum, a pointer to the calling process description, from which the parameters can be extracted. The reply message in this context may be replaced by a semaphore, perhaps one semaphore per process used only when that process is awaiting a return from a rendezvous.
In either a distributed or a centralized implementation, the entries to a task declare interprocess message queues, and the accept statements for each entry are receive operations on the associated queue.
The actual implementation of the Ada rendezvous is complicated by two factors: First, Ada supports time limits; a timed call to a rendezvous will abort if the rendezvous is not entered within the time limit, and a timed accept statement will abort if no client calls that entry within the time limit. In order to prevent the execution of the body of a rendezvous when the caller aborts, it is essential to exchange additional messages in the implementation of a timed rendezvous.
Second, Ada supports accept statements that service more than on entry to a task. If an accept statement will accept calls to either entry A or entry B, it must wait on both queues and continue when either one or the other (or both) contain data, accepting exactly one message addressed to either A or B each time it falls through. This form of the accept statement resembles a case statement, with one body for each entry that may be accepted.
When a group of processes must cooperate to perform a single task, it is sometimes necessary for one process in the group to send messages to all others. (In a shared memory environment, this is commonly implemented with what is called barrier synchronization.) Consider the case where a group of processes alternately compute and exchange results; for example, each process may be updating one entry in a matrix, where each computation stage involves inspecting the values of neighboring entries from the previous computation phase.
The synchronization relationship between these processes may be summarized by a petri net such as the following:
_________________________ | _______________ | | | _____ | | | | | | | | V V V | | | Compute Compute Compute | | | | | | | | | __V_______V_______V__ | | | | | |_____| | | | |_______________| | |_________________________|Recall that the horizontal bar in a Petri net means "wait until control arrives at the bar from all sources, and then deliver control to all destinations". For the simple single input, multiple output case, it models a process fork, a message transmission, or a V operation. For the simple multiple input, single output case, it models a process join, message receive or a P operation. Here, with the same number of inputs as outputs, it represents all processes waiting for all the others before continuing.
This synchronization operation between a group of processes is called barrier synchronization. Efficient implementations of barrier synchronization are essential to many distributed algorithms.
The simple minded implementation of barrier synchronization is to have each process send a message to each of the others and then await the messages each of the others sends back. The same basic algorithm may be expressed using semaphores, as follows:
Process a Process b Process c V(Sb) V(Sa) V(Sa) Semaphores V(Sc) V(Sc) V(Sb) Sa P(Sa) P(Sb) P(Sc) Sb P(Sa) P(Sb) P(Sc) Sc
In order to synchronize N processes, this requires the exchange of N(N-1) messages! This is O(N**2). If these messages can be exchanged in parallel, the critical path is short (one message transmission delay), but exchanging these messages in parallel is rarely possible. Implementing a barrier this way on a network with bottlenecks, for example, on an ethernet or token ring network, is very inefficient!
The class of network protocols designed to solve this problem are usually described as a class of group communication protocols.
One way to cut down the number of messages exchanged during group communication is to arrange the processes into a ring and circulate a message around this ring to synchronize the group and exchange information.
On the first round, every process adds its message to the circulating message, saying, in effect, "I'm ready and here's my contribution to our shared venture". On the second round, each process receives permission to continue, along with the complete corpus of information computed by all the other processes.
This process takes 2n-1 interprocess messages, O(n), so it scales very well, for example, on an ethernet or token ring. On the other hand, the critical path through this protocol is of length n -- no process may continue until it receives the reply to the message it sent, and this takes one complete circuit of the ring.
By using an out-and-back protocol instead of a ring protocol, the number of messages can be reduced by one at the expense of almost doubling the length of the critical path.
On an arbitrarily structured graph, an effective way to synchronize a group of processes is to use a spanning tree of the graph. Given a precomputed spanning tree and a designated root process, a barrier can be implemented by having the leaves send messages towards the root when they reach the barrier.
When an internal node in the tree reaches the barrier, it awaits messages from all its subtrees, then sends a summary to the root. When the root reaches the barrier, it awaits messages from all its subtrees and then forwards a summary to each subtree before continuing.
When a subtree receives a summary message from the root, it sends this to each subtree under it before continuing. Leaves merely await their summary messages and then continue.
This protocol takes 2(N-1) messages or O(N). The proof of this is based on the fact that any tree of N vertices has N-1 edges, and that each edge is traversed by exactly 2 messages, one towards the root (we're ready), and one from the root (OK, you can go).
Dynamic computation of the spanning tree is essential for a fault tolerant version of this protocol.
The critical path through this protocol clearly depends on the structure of the graph! If the graph is "stringy", for example, a straight line, the critical path is related to the length of this line. In a more compact tree, the critical path will typically be of length O(log N).
Some low level communications protocols include provisions to simplify group communications. For example, both Ethernet and ring networks are inherently broadcast or multicast networks -- messages from one station to another are potentially seen by all other stations or by all intermediate stations.
Given this, it is easy to design low-level broadcast protocols based on the assignment of a special network address that is accepted by all network receivers. Given such an addressing scheme, the higher level software may interpret part of the broadcast message as an indication of which group of processes is supposed to receive copies of the message.
This scheme reduces the cost of the simple minded barrier implementation from O(N**2) point-to-point messages to O(N) broadcasts.
While this presentation has emphasized barrier synchronization, many system designers have emphasized multicast or group communication at a lower level, where one process sends a message to a group of others. The protocols for solving that problem are trivially contained within the protocols discussed above.