Douglas W. Jones
Part A: Two low-level messages between sender and receiver are necessary in order to implement the synchronization requirements of this protocol? Assuming that the actual message being sent occupies only a few bytes and that a process ID is comparably small, the sender sends [Ready to send, sender's ID, message], and when the recipient gets the message, it sends [Received, recipient's ID].
Part B: If errors are possible, the above protocol could be extended with a unique interaction ID added to each message to disentangle messages from previously retransmitted messages. Blocked senders must periodically retransmit their [Ready to send] message until they the appropriate [Received] message arrives. In the event that the [Received] message is lost, the sender will retransmit a message that has already been received. In this event, the recipient must reply immdeiately, whether or not the process is blocked awaiting a message, and must then discard the incoming message.
Part C: Here is (briefly) a failure scenerio that demonstrates that reliable delivery is not sufficient to guarantee fault tolerance in a set of applications: The processor running one process in the distributed application dies! Reliable delivery to a faulty computer is of no value!
1) When a process is selected for migration, it is stopped, as in DEMOS. If the process was blocked awaiting receipt of a message, this does not matter. If it was blocked in a send, the message it was trying to send is retracted.
2) A destination machine is selected and a process template is allocated there.
3) The process memory image is moved, along with any other kernel data structures. There is no need to move any incoming message queues because queuing of messages in not required!
3) The process is started on the destination machine. If the process was blocked when it was stopped, it comes up blocked on the new machine. If it was blocked in a send, it re-initiates the send operation.
4) The process is deleted on the originating machine, and it is replaced with a forwarding address.
5) If, at any time, the kernel finds that another process wishes to communicate with a process that has moved, indicated by the presence of a forwarding address, it rejects the communication attempt and sends a change of address notice to the kernel hosting the process trying to communicate.
On receipt of a forwarding notice, the kernel on any machine updates message sending capabilities (or their equivalent) and retransmits requests-to-send messages on behalf of all processes that were blocked in an attempt to send to the process that moved.
demos_send( demos_addr, msg ) send( demos_addr.p, [demos_addr.box, msg] ) demos_receive( box, msg ) thread_p( box.data ) dequeue( box.queue, msg ) background_thread repeat receive( msg, sender ) enqueue( msg.box.queue, msg.data ) thread_v( msg.box.data ) foreverI assumed that I had a kernel thread package available to support the background process that handles unpackaging the received messages into the Demos-style mailboxes. If a UNIX-style signal is delivered to the receiving process every time a message becomes available to receive, the body of the background thread could be rewritten as a signal handler. If there was a way to do a nonblocking receive (or to ask "would receive block if I called it now?"), the signal mechanism isn't needed, and the signal handler can be called from each Demos-style primitive to check for incoming messages and unbundle them.
Part A: To implement UNIX style directories on this system, we store each directory as a file stored on the lower level file servers. A directory is simply a list of ordered pairs, each relating a symbolic file name to a file number in the lower level system.
With this minimal scheme, opening a file involves a broadcast addressed to all lower-level file servers saying "who has this file?" To avoid this, each directory entry can have an added field, the identity of the server or servers last known to hold this file. Only if the indicated low-level server says "I don't have it!" does the directory server issue the broadcast "who has this file?"; on receiving a result, the directory server updates the directory to indicate the new holder of the file.
For this design, the open service would return a tuple [file-number, server, 0] as the descriptor of the newly opened file. The 0 is the initial file position.
Part B: Here is quick-and-dirty pseudocode in terms of the above above for the user's read() operation:
read( descriptor, buffer, length) loop send( descriptor.server, [read_request, descriptor.file_number, descriptor.position, descriptor.length, return_address] ) await reply if not(reply.status = I_dont_have_it) exit loop -- here we account for migration broadcast( [who's_got_it, descriptor.file_number] await reply saying I've got it! descriptor.server = reply.sender end loop length = min( reply.bytes_read, length ) descriptor.position += length buffer = reply.data truncated to length bytes return length
Part C: Here is quick-and-dirty pseudocode for the implementation of the open() operation in the directory server, in terms of the above.
open( name ) directory = [root_directory_file_number, root_directory_server, 0] parent = null loop while not( name = null ) component = name.first name = name.remainder loop -- looking up component in directory temp = directory len = read( directory, tuple, tuple_length) if len = 0 return "file not found!" to user if not( temp.server = directory.server ) -- someone migrated, fix the directory! if parent_descriptor = null root_directory_server = directory.server else parent_tuple.server = directory.server temp1 = parent write( temp1, parent_tuple, tuple_length ) endif endif until tuple.name = component parent = temp parent_tuple = tuple directory = [tuple.file_number, tuple.server, 0] endloop return directoryThe above code has an interesting fault. It correctly accounts for directories that migrate from server to server, but it does not account for file migration! Also, it does not distinguish between data files and directories. If x is a data file, this code will, if given x/path as a file name, attempt to interpret the contents of x as a directory. Fixing this requires adding code to account for file types.