Consider a distributed system with many machines scattered around a network. Some machines have disks, some have none. What are the alternative approaches for implementing file service on such a system? Here, we are not so concerned with what service is offered by each server, but rather with the internal architecture of the servers and with the impact of the particular communications primitives on how the servers function.
The simplest form of message addressing we will consider involves addressing messages directly to a process. This implies that all messages to a process end up in the same incoming message queue; if the process is a server, for example, a file server, this implies a structure like the following:
repeat
await request
process request
send reply
forever
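To make this concrete, here is a minimal sketch of the loop in Go, where a channel stands in for the incoming message queue; the request and reply types, and the idea of carrying a reply channel in each request, are assumptions of the sketch rather than details of any particular system:

package main

import "fmt"

// request and reply are hypothetical message formats; a real file
// server would carry operation codes, file names, and data instead.
type request struct {
    payload string
    replyTo chan reply // the reply address carried with each request
}

type reply struct {
    result string
}

// serve is the simple loop: await request, process it, send reply.
func serve(incoming chan request) {
    for req := range incoming { // await request
        res := reply{result: "processed " + req.payload} // process request
        req.replyTo <- res // send reply
    }
}

func main() {
    incoming := make(chan request)
    go serve(incoming)

    replyTo := make(chan reply)
    incoming <- request{payload: "read block 5", replyTo: replyTo}
    fmt.Println((<-replyTo).result)
}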
This simple model is sufficient for servers that provide very simple services; specifically, it may be the right model to use if the server never blocks during the processing phase. On the other hand, if the server must send messages to a subsidiary server, for example, when a file server communicates with a disk I/O server, problems arise:
repeat
* await request from client
process step 1
send request to sub server
* await reply from sub server
process step 2
send reply to client
forever
As examples, consider a directory server that stores each directory as a file, using a low-level file server to do the work, or a low-level file server that uses a disk server to store user files. Both examples suggest the above structure.
This structure fails if there is only one incoming FIFO queue of messages per process! In this case, the two starred lines (await) have no way of distinguishing between incoming requests from clients and replies from subsidiary servers.
One solution to the problem outlined above is to rewrite the server so that it classifies incoming messages as either requests or replies. Each message must therefore have a prefix identifying the message type. The software structure of such a server can get quite complex:
repeat
await message
case message-type of
request:
allocate transaction record (TR)
process step 1 relative to TR
send request + TR to sub server
reply:
get TR from reply
process step 2 relative to TR
send reply to client recorded in TR
deallocate TR
end case
forever
This scheme, although complex, offers one important advantage! It allows
multiple transactions to be processed at the same time. Consider the
situation where two transactions arrive in quick succession: The server
will process step 1 of both before it awaits answers from the subsidiary
server on behalf of either. If there are multiple independent subsidiary
servers, or if the subsidiary server can optimize the order in which it
processes requests, it is possible for the resulting system to have faster
throughput than a naive system that finishes one transaction before starting
another.
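As a sketch of how such an event processing loop might look in Go, consider the following; the single incoming channel carries both kinds of messages, and the map of transaction records (and the integer transaction IDs) are assumptions of the sketch:

package main

import "fmt"

type msgKind int

const (
    kindRequest msgKind = iota
    kindReply
)

type msg struct {
    kind    msgKind
    txnID   int         // names the transaction record on replies
    payload string
    replyTo chan string // the client's reply channel, set on requests
}

// eventLoop multiplexes client requests and sub-server replies
// arriving on one queue, using transaction records to link them.
func eventLoop(incoming chan msg, subServer chan msg) {
    txns := make(map[int]msg) // transaction records, keyed by ID
    nextID := 0
    for m := range incoming {
        switch m.kind {
        case kindRequest:
            nextID++
            txns[nextID] = m // allocate transaction record
            // process step 1, then forward to the subsidiary server
            subServer <- msg{kind: kindRequest, txnID: nextID, payload: m.payload}
        case kindReply:
            tr := txns[m.txnID] // get TR from reply
            // process step 2, then answer the client recorded in the TR
            tr.replyTo <- "done: " + m.payload
            delete(txns, m.txnID) // deallocate TR
        }
    }
}

func main() {
    incoming := make(chan msg)
    subServer := make(chan msg)
    go func() { // mock subsidiary server: echo requests back as replies
        for m := range subServer {
            incoming <- msg{kind: kindReply, txnID: m.txnID, payload: m.payload}
        }
    }()
    go eventLoop(incoming, subServer)

    replyTo := make(chan string)
    incoming <- msg{kind: kindRequest, payload: "read", replyTo: replyTo}
    fmt.Println(<-replyTo)
}

With a subsidiary server that answers asynchronously or out of order, two requests arriving in quick succession both pass through step 1 before either reply comes back, which is where the throughput gain comes from.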
Unfortunately, this "event processing loop" involves complex code, and most programmers would prefer to be able to write something simpler. This can be done, as follows:
repeat
----
| if deferred-queue empty
| await message
| (assert it is a request from a client)
| else
| dequeue request from deferred-queue
| endif
----
process step 1
send request to sub server
----
| repeat
| await message
| if it is a request from a client
| enqueue it in the deferred-queue
| endif
| until it is a reply from the sub server
----
process step 2
send reply to client
forever
This bit of code is also ugly, but the ugliness is confined to
the marked sections of code, each of which substitutes for a starred
line in the naive code. The resulting code does not offer the possibility of
overlapping the processing of multiple transactions, so the resulting
throughput may not be as good as in the previous example. On the other
hand, it may be significantly easier to debug, and it should serve well
for simple, low-performance servers.
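The same structure can be sketched in Go, with a slice serving as the deferred-queue and a boolean tag distinguishing replies from requests; as before, the message format is an assumption of the sketch:

package main

import "fmt"

type msg struct {
    isReply bool        // true for sub-server replies, false for requests
    payload string
    replyTo chan string // set on client requests
}

func serve(incoming chan msg, subServer chan msg) {
    var deferred []msg // client requests that arrived at the wrong time
    for {
        var req msg
        if len(deferred) == 0 {
            req = <-incoming // await message (must be a client request)
        } else {
            req, deferred = deferred[0], deferred[1:] // dequeue deferred request
        }
        // process step 1, then ask the subsidiary server
        subServer <- msg{payload: req.payload}
        // await the reply, deferring any client requests that arrive first
        var rep msg
        for {
            m := <-incoming
            if m.isReply {
                rep = m
                break
            }
            deferred = append(deferred, m) // enqueue in deferred-queue
        }
        // process step 2, then send the reply to the client
        req.replyTo <- "done: " + rep.payload
    }
}

func main() {
    incoming := make(chan msg)
    subServer := make(chan msg)
    go func() { // mock subsidiary server
        for m := range subServer {
            incoming <- msg{isReply: true, payload: m.payload}
        }
    }()
    go serve(incoming, subServer)

    replyTo := make(chan string)
    incoming <- msg{payload: "read", replyTo: replyTo}
    fmt.Println(<-replyTo)
}

The if/else at the top of the loop and the inner for loop correspond to the two marked sections of the pseudocode.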
In fact, the "uglyness" in the above code is a special case of a more general solution, in which the prefix on each message identifies the logical queue to which that message is directed. Thus, we might write a general wait routine as:
typed_wait( msg-type )
if empty( queue[ msg-type ] )
repeat
await an incoming message
t = type of received message
if not( t = msg-type )
enqueue message on queue[ t ]
endif
until t = msg-type
else
dequeue message from queue[ msg-type ]
endif
This is, in effect, how systems like DEMOS implement multiple mailboxes
for each process. Furthermore, it is not difficult to generalize the above
to allow a selective wait for any of a set of message types.
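A fairly direct transcription of typed_wait into Go might look as follows; the mailbox type, with one channel feeding a map of per-type logical queues, is an assumption about how such a routine could be packaged:

package main

import "fmt"

type message struct {
    kind    string
    payload string
}

// mailbox pairs a single incoming channel with per-type logical
// queues, in the style of typed_wait above.
type mailbox struct {
    incoming chan message
    queues   map[string][]message
}

func newMailbox() *mailbox {
    return &mailbox{incoming: make(chan message), queues: map[string][]message{}}
}

// typedWait returns the next message of the requested kind, queueing
// any messages of other kinds that arrive in the meantime.
func (mb *mailbox) typedWait(kind string) message {
    if q := mb.queues[kind]; len(q) > 0 {
        mb.queues[kind] = q[1:] // dequeue from queue[msg-type]
        return q[0]
    }
    for {
        m := <-mb.incoming // await an incoming message
        if m.kind == kind {
            return m
        }
        mb.queues[m.kind] = append(mb.queues[m.kind], m) // enqueue on queue[t]
    }
}

func main() {
    mb := newMailbox()
    go func() {
        mb.incoming <- message{kind: "reply", payload: "first"}
        mb.incoming <- message{kind: "request", payload: "second"}
    }()
    fmt.Println(mb.typedWait("request").payload) // second; the reply is set aside
    fmt.Println(mb.typedWait("reply").payload)   // first, taken from its queue
}

Generalizing typedWait to wait for any of a set of kinds requires only replacing the equality test with a membership test.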
There are alternatives to this model; for example, on some systems, there
are primitives that allow newly received messages to be inspected and put
back on the queue if they don't fit some criteria, or they allow a wait for
messages from a particular sender.
Given only one queue per thread instead of one queue per process, we can solve the problems outlined above in another way, using multiple threads per server (or equivalently, multiple processes sharing an address space, with one queue per process). This leads to the following server design:
repeat
await request from client
create a new process to do the following
process step 1
send request to sub server
await reply from sub server
process step 2
send reply to client
exit
end new process
forever
This piece of code has some interesting properties. First, note that
the process that receives requests from clients is not the same process
as the one that eventually sends replies to clients! The process that
receives requests creates one short-lived process per request, and all
further work resulting from that request is done by that new process.
The second useful property of this code is that it allows for parallelism in exactly the same way as the event processing code does. The processing of different requests can be carried on in parallel, but most of the benefit is likely to come from overlapping interactions with subsidiary servers.
Note that the only processes that know about the short-lived request processing processes are the main process of the server, which creates them, and the sub-server, which learns their identity as part of the reply address that is provided with the request messages. As a result, the short-lived processes never need to worry about receiving any messages other than the expected replies.
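In Go, goroutines are cheap enough to play the role of the short-lived processes; here is a sketch, in which the private reply channel created by each goroutine serves as the reply address that the sub-server learns:

package main

import "fmt"

type request struct {
    payload string
    replyTo chan string
}

// serve spawns one short-lived goroutine per request; the goroutine,
// not the main loop, awaits the sub-server reply and answers the client.
func serve(incoming chan request, subServer chan request) {
    for req := range incoming { // await request from client
        req := req
        go func() {
            // process step 1, then ask the subsidiary server; the
            // private channel is this goroutine's own reply address
            subReply := make(chan string)
            subServer <- request{payload: req.payload, replyTo: subReply}
            rep := <-subReply // await reply from sub server
            // process step 2, then send the reply to the client
            req.replyTo <- "done: " + rep
        }()
    }
}

func main() {
    incoming := make(chan request)
    subServer := make(chan request)
    go func() { // mock subsidiary server
        for m := range subServer {
            m.replyTo <- m.payload
        }
    }()
    go serve(incoming, subServer)

    replyTo := make(chan string)
    incoming <- request{payload: "read", replyTo: replyTo}
    fmt.Println(<-replyTo)
}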
This code has a problem! If requests arrive faster than they can be processed, large numbers of short-lived processes may be created. In most systems, there is a limit on the number of processes that may be created; to stay under this limit, the server can use a general semaphore to count the processes it creates:
repeat V(pop) n times to initialize the semaphore to the process limit n
repeat
P(pop)
await request from client
fork a new process to do the following
process step 1
send request to sub server
await reply from sub server
process step 2
send reply to client
V(pop)
exit
end new process
forever
On a UNIX based server, the main server process can use process termination signals to await the termination of a short-lived process in the event that too many short-lived processes have been created.
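In Go, a buffered channel makes a natural counting semaphore: filling it to capacity corresponds to the repeated initial V(pop), receiving from it is P(pop), and sending to it is V(pop). The limit of 8 workers below is an arbitrary assumption:

package main

import "fmt"

const maxWorkers = 8 // assumed limit on concurrent short-lived workers

type request struct {
    payload string
    replyTo chan string
}

func serve(incoming chan request, subServer chan request) {
    pop := make(chan struct{}, maxWorkers)
    for i := 0; i < maxWorkers; i++ {
        pop <- struct{}{} // repeated V(pop) to initialize the semaphore
    }
    for {
        <-pop // P(pop): block if too many workers already exist
        req := <-incoming // await request from client
        go func() {
            subReply := make(chan string)
            subServer <- request{payload: req.payload, replyTo: subReply}
            req.replyTo <- "done: " + <-subReply
            pop <- struct{}{} // V(pop): release the slot on exit
        }()
    }
}

func main() {
    incoming := make(chan request)
    subServer := make(chan request)
    go func() { // mock subsidiary server
        for m := range subServer {
            m.replyTo <- m.payload
        }
    }()
    go serve(incoming, subServer)

    replyTo := make(chan string)
    incoming <- request{payload: "read", replyTo: replyTo}
    fmt.Println(<-replyTo)
}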
On systems where process creation is expensive, a multithreaded server is impractical. An alternative is to introduce multiple queues. This was done on DEMOS, and the following simple server structure results:
repeat
await message in main mailbox
(assert it is a request from a client)
process step 1
create new mailbox
send request+newbox to sub server
await reply in new mailbox
destroy new mailbox
process step 2
send reply to client
forever
This code is almost identical to the naive (and incorrect) code given at the beginning of this section; the big change is the use of a new (short-lived) mailbox for each transaction with a subsidiary server.
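In Go this corresponds to creating a fresh channel per transaction; the channel is the short-lived mailbox, and letting it be garbage collected stands in for the explicit destroy step:

package main

import "fmt"

type request struct {
    payload string
    replyTo chan string
}

func serve(incoming chan request, subServer chan request) {
    for req := range incoming { // await message in main mailbox
        // process step 1, then create the new mailbox
        newBox := make(chan string)
        // send request+newbox to the subsidiary server
        subServer <- request{payload: req.payload, replyTo: newBox}
        // await reply in the new mailbox; client requests keep
        // arriving on incoming, so no confusion is possible
        rep := <-newBox
        // process step 2, then send the reply to the client
        // (dropping newBox here stands in for destroying the mailbox)
        req.replyTo <- "done: " + rep
    }
}

func main() {
    incoming := make(chan request)
    subServer := make(chan request)
    go func() { // mock subsidiary server
        for m := range subServer {
            m.replyTo <- m.payload
        }
    }()
    go serve(incoming, subServer)

    replyTo := make(chan string)
    incoming <- request{payload: "read", replyTo: replyTo}
    fmt.Println(<-replyTo)
}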
The above code does not exploit the opportunities for parallelism that were exploited by the event processing code or by the multithreaded code, but this can be done if there is a "multiwait" operation, as on DEMOS. In this case, the control structure is that of the event processing code:
boxset = { main mailbox }
repeat
await message in boxset
case box id of
main mailbox:
create new mailbox B
boxset := boxset + { B }
client[B] := client
process step 1 relative to B
send request+B to sub server
other:
B := box id
process step 2 relative to B
send reply to client[B]
destroy B
end case
forever
The advantage of this scheme over the pure event processing scheme is that the simple code can be used for simple servers, and the code need only be rewritten as above if those servers begin to limit system performance.
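A multiwait over a set of mailboxes that grows and shrinks at run time can be sketched in Go with reflect.Select, which waits on a dynamic slice of channels; the pairing of each mailbox with its client through parallel slices is an assumption of the sketch:

package main

import (
    "fmt"
    "reflect"
)

type request struct {
    payload string
    replyTo chan string
}

// serve waits on a dynamic set of mailboxes with reflect.Select,
// which plays the role of the DEMOS multiwait operation.
func serve(incoming chan request, subServer chan request) {
    boxset := []reflect.SelectCase{{
        Dir:  reflect.SelectRecv,
        Chan: reflect.ValueOf(incoming), // boxset = { main mailbox }
    }}
    clients := []chan string{nil} // clients[i] pairs with boxset[i]
    for {
        i, v, _ := reflect.Select(boxset) // await message in boxset
        if i == 0 { // main mailbox: a client request
            req := v.Interface().(request)
            b := make(chan string) // create new mailbox B
            boxset = append(boxset, reflect.SelectCase{
                Dir:  reflect.SelectRecv,
                Chan: reflect.ValueOf(b),
            })
            clients = append(clients, req.replyTo) // client[B] := client
            // process step 1, then pass B as the reply address
            subServer <- request{payload: req.payload, replyTo: b}
        } else { // other: a sub-server reply on mailbox B = boxset[i]
            // process step 2, then reply to client[B]
            clients[i] <- "done: " + v.Interface().(string)
            // removing entry i destroys mailbox B
            boxset = append(boxset[:i], boxset[i+1:]...)
            clients = append(clients[:i], clients[i+1:]...)
        }
    }
}

func main() {
    incoming := make(chan request)
    subServer := make(chan request)
    go func() { // mock subsidiary server
        for m := range subServer {
            m.replyTo <- m.payload
        }
    }()
    go serve(incoming, subServer)

    replyTo := make(chan string)
    incoming <- request{payload: "read", replyTo: replyTo}
    fmt.Println(<-replyTo)
}

Here reflect.Select is the price of a mailbox set whose size is unknown at compile time; with a fixed set of mailboxes, an ordinary select statement would do.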