22C:116, Lecture 20, Fall 1999

Douglas W. Jones
University of Iowa Department of Computer Science

General Concepts in Distributed Systems
There is a broad (and in fact, multidimensional) spectrum of distributed systems. To explore this spectrum, consider asking the following questions:
- How many processors are in the system?
- Is memory shared between processors?
- How many processes are allowed per processor?
- How do processes communicate?
- Can processes move from processor to processor?
- Can resources move from place to place?
- What are the shared resources?
In answering these questions, four large classes of systems emerge: isolated uniprocessors (the typical subject of introductory courses), multiprocessors, networks of systems, and distributed systems.
To add to the confusion, the answers to some of these questions may change depending on the level of abstraction from which the system is examined. For example, some systems that appear to share memory at the user level are implemented on top of packet switched networks.
Multiprocessors
Some computer systems have multiple CPUs and multiple memories, but (under the constraints of some protection mechanism) any code running on any processor may access any location in memory. Some of these machines have a single ready list shared equally by all processors, while others assign processes to processors for any of a variety of reasons.
The University of Iowa has had a number of these systems since the mid 1980's. For many years, we had a number of Encore Multimax systems at the Weeg Computer Center, and now we have a number of large Silicon Graphics systems on campus that are good examples of such multiprocessors. The idea of building such multiprocessors dates back to work by Conway in the early 1960's, and such designs were widely used on Burroughs mainframes from the mid 1960's onward. In the mid 1980's, Encore and Sequent reintroduced such architectures in the marketplace, and today, such architectures are relatively common in high-end scientific computation.
It is important to note that, at some level of abstraction, multiprocessor systems may be viewed as having a communications network. Typically, this network connects processors and memory, and it is usually possible to isolate transactions on this network that resemble messages. If the messages in the interconnection network are generated by hardware and concern individual memory references, we will classify such systems as shared-memory systems, while if the messages are at a higher level, generated by software at the user or system level, we will classify such systems as distributed systems.
The discussion that follows deals with symmetrical multiprocessors, that is, with multiprocessors where all CPUs have the same instruction set and an effort is made to divide the work evenly between all. The alternative to this is to designate one CPU as a primary CPU and view the others as special purpose coprocessors, peripheral support processors, etc. Most modern computer systems have large numbers of such secondary processors that are viewed by the operating system as parts of I/O devices.
Operating systems for multiprocessors frequently differ only slightly from systems for uniprocessors. The difference is that, where a uniprocessor has a single idle process on the ready list, a multiprocessor must have enough idle processes to occupy every processor when there is no work to be done, and its ready list can be shared equally by all processors. If a system is written with a uniform mechanism for entry and exit from critical sections, most of the changes required to move such a system to a multiprocessor are confined to this mechanism. This explains why UNIX has been successfully ported to a number of multiprocessors.
The other major difference between multiprocessor and uniprocessor operating systems is visible only in the lowest level kernel code. A uniprocessor operating system kernel could guard a critical section with, for example:
```
	disable interrupts

	-- critical code

	enable interrupts
```
The same critical code on a multiprocessor must be guarded by:
```
	disable interrupts
	spin-lock

	-- critical code

	spin-unlock
	enable interrupts
```
The interrupt enable/disable prevents other activity on the same CPU from occurring, while the spin-lock (using, for example, test-and-set instructions, Dekker's solution, Lamport's minimum latency solution, or any of a number of others) prevents other CPUs from interfering.
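As a rough illustration of the spin-lock half of this guard, the following C sketch builds a lock on the C11 atomic_flag test-and-set operation. The names spinlock_t, spin_lock, spin_trylock, spin_unlock and guarded_increment are assumptions made for this example and do not come from any particular kernel. The interrupt disable and enable steps appear only as comments, because user-level code cannot perform them.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin-lock type for illustration only. */
typedef struct {
    atomic_flag held;   /* set while some CPU holds the lock */
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns true on success. */
static bool spin_trylock(spinlock_t *l)
{
    /* test-and-set returns the previous value of the flag:
       false means the lock was free and is now ours */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is obtained. */
static void spin_lock(spinlock_t *l)
{
    while (!spin_trylock(l)) {
        /* spin: another CPU is inside the critical section */
    }
}

/* Release the lock so that a waiting CPU may proceed. */
static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* A critical section guarded in the style shown above. */
static void guarded_increment(spinlock_t *l, long *counter)
{
    /* disable interrupts (kernel only; omitted in this sketch) */
    spin_lock(l);

    *counter += 1;      /* critical code */

    spin_unlock(l);
    /* enable interrupts (kernel only; omitted in this sketch) */
}
```

A lock would be declared as spinlock_t lock = SPINLOCK_INIT; and passed by address to these functions. The acquire and release memory orders are one reasonable choice for keeping the critical code between the lock and unlock operations; a real kernel would make this choice to suit its hardware.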
If all critical sections are clearly identified in the code, the conversion of a uniprocessor system to a multiprocessor system can be quite straightforward. This is why most modern systems, including UNIX, Windows NT, and MacOS, are able to handle symmetrical multiprocessing.
Network Operating Systems
A network operating system is a conventional operating system which includes provisions for attaching it to a network. Most versions of UNIX in common use today are network operating systems, and networking features have become standard in MacOS and the various Windows systems from Microsoft.
Typical features that distinguish a network operating system from a stand-alone operating system include:
Remote Command Execution, for example, as provided by the UNIX rsh command. This allows a user to issue a command to be executed on a particular remote system. For example, to run the sync command on a remote UNIX system named cow, assuming you have accounts that are configured correctly, you may type:
```
	rsh cow sync
```
Remote File Access is provided, for example, by the UNIX to UNIX rcp subsystem and by the more general ftp subsystem. These subsystems allow users to copy files from one system to another. For example, to copy a file from one UNIX system to another (assuming the appropriate accounts exist with the appropriate rights):
```
	rcp cow:src goat:dst
```
This copies the file src on cow to the file dst on goat.
Remote Login is provided, for example, by the UNIX to UNIX rlogin command and by the more general telnet subsystem. These allow users of one system to open interactive sessions on a remote system. For example, to start a session on a machine named tractor, type:
```
	rlogin tractor
```
At a lower level, network operating systems must provide user processes with access to communications protocols for communicating over the network with processes on remote machines. These provide the basis for the implementation of network-oriented commands such as those outlined above.
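On UNIX systems, this access is commonly provided through the BSD socket interface. The following C sketch is only an illustration: the helper udp_loopback is a hypothetical name, and it sends a UDP datagram to its own socket on the loopback address so that the example needs no second machine. In real use, the destination address would name a process on a remote host.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send msg to ourselves as a UDP datagram over
   the loopback interface and read it back into out.  Returns the
   number of bytes received, or -1 on error.  The received bytes are
   not NUL-terminated. */
static long udp_loopback(const char *msg, char *out, size_t outlen)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */
    addr.sin_port = 0;               /* let the system pick a port */

    socklen_t len = sizeof addr;
    long n = -1;

    /* bind to a local port, learn which port was assigned, then
       send a datagram to that port and receive it */
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(s, (struct sockaddr *)&addr, &len) == 0 &&
        sendto(s, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof addr) >= 0) {
        n = (long)recv(s, out, outlen, 0);
    }

    close(s);
    return n;
}
```

Commands such as rsh and rcp are built on the same kind of interface, although they use connection-oriented protocols and not single datagrams.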
Network file systems are a special feature of most network operating systems. These allow multiple machines in a network to share one logical file system, even though the different machines may run otherwise unrelated operating systems. For example, the NFS protocols developed originally by Sun Microsystems provide this ability for UNIX systems, and there are versions of MacOS and Windows that allow those machines to support the same protocols and share file systems.
Distributed Operating Systems
A distributed operating system differs from a network of machines each supporting a network operating system in only one way: The machines supporting a distributed operating system are all running under a single operating system that spans the network. Thus, the print spooler might, at some instant, be running on one machine, while the file system is running on others, while other machines are running other parts of the system, and under some distributed operating systems, these parts migrate from machine to machine at times.
With network operating systems, each machine runs an entire operating system. With distributed operating systems, the entire system is itself distributed across the network. As a result, distributed operating systems typically make little distinction between remote execution of a command and local execution of that command. In theory, all commands may be executed anywhere; it is up to the system to execute commands where it is convenient.
Homogeneous and Inhomogeneous Systems
Network operating systems are naturally compatible with inhomogeneous networks, that is, networks containing many different kinds of machines. Thus, for example, the Internet connects many UNIX machines, but it also connects machines running DEC's VMS operating system, IBM PCs running MS/DOS, OS/2 and Windows NT, IBM mainframes running IBM's VM operating system, and Apple Macintosh computers. This inhomogeneity is tolerated as long as each network operating system involved supports some subset of the same protocols.
Distributed operating systems have typically been implemented on homogeneous networks, where all machines support identical instruction sets. In fact, though, if there are any differences between machines, even, for example, differences in optional extensions such as floating point units, the system must distinguish between machines in exactly the same way it would have to if it supported machines with different instruction sets.
The most troublesome differences in a distributed system are those involving differences in data representation. Some machines store integers least-significant byte first, while others store integers most-significant byte first. As a result, if an array of integers is transmitted in byte sequential order from one machine to another, the integers may have their bytes shuffled! The same problem may occur with floating point numbers, even when the source and destination machine agree on a common floating point format such as the IEEE standard!
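One common way around this problem is for both machines to agree on a byte order for transmission and to convert explicitly at each end. The following C sketch assumes a most-significant-byte-first order on the wire. The function names are hypothetical, and the floating point helpers assume a 32-bit IEEE float that the host stores in the same byte order as its 32-bit integers.

```c
#include <stdint.h>
#include <string.h>

/* Store v in buf most-significant byte first, whatever byte
   order the host itself uses in memory. */
static void put_u32_be(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Rebuild a 32-bit integer from 4 bytes sent most-significant
   byte first. */
static uint32_t get_u32_be(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Assumption of this sketch: float occupies 32 bits. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Send a float by moving its bit pattern into an integer and
   then fixing the byte order of that integer. */
static void put_f32_be(unsigned char *buf, float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the bit pattern */
    put_u32_be(buf, bits);
}

/* Inverse of put_f32_be. */
static float get_f32_be(const unsigned char *buf)
{
    uint32_t bits = get_u32_be(buf);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because the conversion uses shifts and not a copy of the integer's bytes from memory, the bytes placed in the buffer are the same on either kind of machine. Copying the integer into the buffer with memcpy would instead reproduce the host's own byte order, which is exactly the shuffling described above.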