Douglas W. Jones
Part A: Two low-level messages between sender and receiver are necessary in order to implement the synchronization requirements of this protocol? Assuming that the actual message being sent occupies only a few bytes and that a process ID is comparably small, the sender sends [Ready to send, sender's ID, message], and when the recipient gets the message, it sends [Received, recipient's ID].
Part B: If errors are possible, the above protocol could be extended with a unique interaction ID added to each message to disentangle messages from previously retransmitted messages. Blocked senders must periodically retransmit their [Ready to send] message until they the appropriate [Received] message arrives. In the event that the [Received] message is lost, the sender will retransmit a message that has already been received. In this event, the recipient must reply immdeiately, whether or not the process is blocked awaiting a message, and must then discard the incoming message.
Part C: Here is (briefly) a failure scenerio that demonstrates that reliable delivery is not sufficient to guarantee fault tolerance in a set of applications: The processor running one process in the distributed application dies! Reliable delivery to a faulty computer is of no value!
1) When a process is selected for migration, it is stopped, as in DEMOS. If the process was blocked awaiting receipt of a message, this does not matter. If it was blocked in a send, the message it was trying to send is retracted.
2) A destination machine is selected and a process template is allocated there.
3) The process memory image is moved, along with any other kernel data structures. There is no need to move any incoming message queues because queuing of messages in not required!
3) The process is started on the destination machine. If the process was blocked when it was stopped, it comes up blocked on the new machine. If it was blocked in a send, it re-initiates the send operation.
4) The process is deleted on the originating machine, and it is replaced with a forwarding address.
5) If, at any time, the kernel finds that another process wishes to communicate with a process that has moved, indicated by the presence of a forwarding address, it rejects the communication attempt and sends a change of address notice to the kernel hosting the process trying to communicate.
On receipt of a forwarding notice, the kernel on any machine updates message sending capabilities (or their equivalent) and retransmits requests-to-send messages on behalf of all processes that were blocked in an attempt to send to the process that moved.
demos_send( demos_addr, msg )
send( demos_addr.p, [demos_addr.box, msg] )
demos_receive( box, msg )
thread_p( box.data )
dequeue( box.queue, msg )
background_thread
repeat
receive( msg, sender )
enqueue( msg.box.queue, msg.data )
thread_v( msg.box.data )
forever
I assumed that I had a kernel thread package available to support the
background process that handles unpackaging the received messages into
the Demos-style mailboxes. If a UNIX-style signal is delivered to the
receiving process every time a message becomes available to receive,
the body of the background thread could be rewritten as a signal handler.
If there was a way to do a nonblocking receive (or to ask "would receive
block if I called it now?"), the signal mechanism isn't needed, and the
signal handler can be called from each Demos-style primitive to check
for incoming messages and unbundle them.
Part A: To implement UNIX style directories on this system, we store each directory as a file stored on the lower level file servers. A directory is simply a list of ordered pairs, each relating a symbolic file name to a file number in the lower level system.
With this minimal scheme, opening a file involves a broadcast addressed to all lower-level file servers saying "who has this file?" To avoid this, each directory entry can have an added field, the identity of the server or servers last known to hold this file. Only if the indicated low-level server says "I don't have it!" does the directory server issue the broadcast "who has this file?"; on receiving a result, the directory server updates the directory to indicate the new holder of the file.
For this design, the open service would return a tuple [file-number, server, 0] as the descriptor of the newly opened file. The 0 is the initial file position.
Part B: Here is quick-and-dirty pseudocode in terms of the above above for the user's read() operation:
read( descriptor, buffer, length)
loop
send( descriptor.server,
[read_request,
descriptor.file_number,
descriptor.position,
descriptor.length,
return_address]
)
await reply
if not(reply.status = I_dont_have_it) exit loop
-- here we account for migration
broadcast( [who's_got_it,
descriptor.file_number]
await reply saying I've got it!
descriptor.server = reply.sender
end loop
length = min( reply.bytes_read, length )
descriptor.position += length
buffer = reply.data truncated to length bytes
return length
Part C: Here is quick-and-dirty pseudocode for the implementation of the open() operation in the directory server, in terms of the above.
open( name )
directory = [root_directory_file_number,
root_directory_server, 0]
parent = null
loop while not( name = null )
component = name.first
name = name.remainder
loop
-- looking up component in directory
temp = directory
len = read( directory, tuple, tuple_length)
if len = 0 return "file not found!" to user
if not( temp.server = directory.server )
-- someone migrated, fix the directory!
if parent_descriptor = null
root_directory_server = directory.server
else
parent_tuple.server = directory.server
temp1 = parent
write( temp1, parent_tuple, tuple_length )
endif
endif
until tuple.name = component
parent = temp
parent_tuple = tuple
directory = [tuple.file_number, tuple.server, 0]
endloop
return directory
The above code has an interesting fault. It correctly accounts for
directories that migrate from server to server, but it does not account
for file migration! Also, it does not distinguish between data files and
directories. If x is a data file, this code will, if given x/path as
a file name, attempt to interpret the contents of x as a directory.
Fixing this requires adding code to account for file types.