22C:116, Lecture 22, Spring 1997

Douglas W. Jones
University of Iowa Department of Computer Science

Communication Protocols
A communication protocol is an agreement between the users of a communication medium about how the medium is to be used. For example, a typical asynchronous serial communications protocol might be described as follows:
```
               time ___________________\
      V1      _ _ _ _ _ _ _ _ _ _      /
        _____| |_|_|_|_|_|_|_|_|_|_|_|____
      V0      | 1 2 3 4 5 6 7 8 P  V
              |      data       |  Stop bits
          Start bit             |
                              Parity bit.
```
The time scale must be specified as part of the protocol, as must the voltages V1 and V0 used to encode one and zero. With V1 = -15 volts and V0 = +15 volts, this is typical of the RS-232 protocol.
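As a rough illustration of the frame drawn above, the following Python sketch lists the logical bit values sent for one character. The name `frame_bits`, the least-significant-bit-first order, the even parity, and the choice of two stop bits are assumptions made for this example; the figure itself does not fix them.

```python
def frame_bits(byte, stop_bits=2):
    """Return the logical bits sent for one character, in time order."""
    data = [(byte >> i) & 1 for i in range(8)]   # 8 data bits, LSB first (assumed)
    parity = sum(data) % 2                       # even parity (assumed)
    # Assumed convention: the start bit is logical 0, stop bits are logical 1.
    return [0] + data + [parity] + [1] * stop_bits

print(frame_bits(0x41))   # bits sent for ASCII 'A'
```

Mapping each logical bit to a voltage (V1 or V0) and holding it for one bit time would then give the waveform in the figure.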
It is not sufficient to give this low level protocol. To effectively use a communications line, there must also be an agreement about the interpretation of data. For example, the protocol could specify that the ASCII character set is used.
Even specifying the character set is not sufficient, however. There must be a convention for identifying messages embedded in the string of characters. The ASCII code includes special control characters for this purpose. They were originally intended to be used to construct messages with the following format:
```
    SOH -- start of heading
       header -- for example, the address of the recipient
    STX -- start of text
       message text
    ETX -- end of text
       trailer -- for example, the message checksum
    EOT -- end of transmission
```
Other characters provided by the ASCII code related to this kind of protocol issue are:
```
    ENQ -- enquiry
    ACK -- acknowledgement
    NAK -- negative acknowledgement
```
These might be used to enquire about the previously transmitted block. An ACK might indicate that the block was received correctly, and a NAK might indicate that there was an error.
Protocol Hierarchy
In general, a protocol takes a stream of data and embeds it in a stream containing not only that data but also other information.
```
              ____________________
               DATA   DATA   DATA
              ____________________
          _________/|      |\_________
         /          |      |          \
  ______________________________________________
   DATA |   stuff   | DATA |   stuff   | DATA
  ______|___________|______|___________|________
```
This added information is not necessarily anything the user wants sent, but it is necessary (under the protocol) for the correct conveyance of the data.
Protocols exist at many different levels of abstraction, and early protocols frequently mixed and confused these levels. The ISO (International Organization for Standardization) Open Systems Interconnection model protocol hierarchy clearly separates the levels and helps system designers avoid the problems of early protocols.
An example of a confused protocol is IBM's Word Processing EBCDIC protocol. This specifies everything from the use of the BISYNC data transmission method to the use of typeballs on a Selectric printing mechanism. Issues of flow control, communications line management, and character set coding are completely mixed.
If one says RS-232, one is specifying a low-level protocol for conveying ones and zeros. If one says Asynchronous, 56 kilobaud, one is specifying a somewhat higher level protocol for conveying a stream of bytes over some low level protocol. If one says ASCII, one is specifying an even higher level but somewhat mixed up protocol for the interpretation of those bytes. The problem with saying ASCII is that the ASCII (or ISO-7, as it is now known) character set specifies both a set of interpretations of bytes for printable characters and a set of control characters that were originally intended for higher level protocols but are, today, rarely used as intended.
An open system is a system where the components are made by different manufacturers without any direct communication between them. Open systems only work when there are standards for interconnecting the components. Some standards, like the Centronics printer interface or PostScript, are the result of a single company leading the marketplace and being copied by others; other standards, such as ASCII and the ISO OSI suite, are the result of committee actions that span many users and manufacturers.
The ISO OSI Reference Model has seven layers, as follows:
```
           ------------
          |application | -- not an OS problem?
          |------------|
          |presentation| -- not an OS problem?
          |------------|
          |session     | -- establish a connection
          |------------|
          |transport   | -- multiplex sessions
          |------------|
          |network     | -- connect systems
          |------------|
          |data link   | -- manage one data link
          |------------|
          |physical    | -- electrical and mechanical
           ------------ 
```
The physical layer
Examples:
- RS-232 Asynchronous data at 9600 baud,
  - 1 start bit
  - 8 data bits
  - 1 parity bit
  - 2 stop bits
- 50 ohm BNC baseband Ethernet (thin wire) with grounded shield, terminated, running at 10 megabaud.
In both of the above examples, the type of the physical network connection is given (the RS-232 standard specifies a 25-pin D-subminiature connector, and "50 ohm BNC" is a type of connector). Both specify voltages used for signalling (RS-232 specifies +15 volts for logical 0, -15 volts for logical 1; the baseband Ethernet standard includes a set of voltage assignments). Both specifications include a basic interpretation of the data on the line, in terms of how to identify the start and stop of data and how to encode the individual bits.
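The cost of the start, parity, and stop bits in the RS-232 example can be worked out with a short calculation. This is only a sketch: the function names are made up for the example, and it assumes one bit per signalling interval, so that 9600 baud means 9600 bits per second.

```python
def chars_per_second(baud, start=1, data=8, parity=1, stop=2):
    """Characters per second when every character costs a full frame."""
    bits_per_char = start + data + parity + stop
    return baud / bits_per_char

def line_efficiency(start=1, data=8, parity=1, stop=2):
    """Fraction of the transmitted bits that are user data."""
    return data / (start + data + parity + stop)

print(chars_per_second(9600))        # 800.0
print(round(line_efficiency(), 3))   # 0.667
```

With 12 bits on the line for every 8 bits of data, a third of the line's capacity goes to the physical-layer protocol itself.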
The link layer
Consider a point to point data format where data is formed into blocks with the following structure based on the protocol suggested by ASCII's control characters:
```
     >>-- transmission direction -->>
   ______________________________________
  |_|_|_____|_|_______________|_|______|_|
 ENQ EOT |   ETX      |        STX  |   SOH
         |          data           head
      checksum                       the number of bytes
        CRC-16 computed over         in the data part
        the head and data.
```
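A block of this kind might be built and checked roughly as follows. This is a sketch under several assumptions that the figure does not settle: the head and the checksum are sent as ASCII hexadecimal digits (so that they cannot themselves look like control characters), the CRC is the common 16-bit variant with reflected polynomial 0xA001, the data is assumed to be free of control characters, and the names `crc16`, `build_frame`, and `parse_frame` are invented for the example.

```python
SOH, STX, ETX, EOT = 0x01, 0x02, 0x03, 0x04   # ASCII control codes

def crc16(data):
    """Bitwise CRC-16 (reflected polynomial 0xA001, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def build_frame(data):
    """SOH head STX data ETX checksum EOT, in transmission order."""
    if len(data) > 255:
        raise ValueError("data too long for a two-digit head")
    head = b"%02X" % len(data)              # number of bytes in the data part
    check = b"%04X" % crc16(head + data)    # CRC over the head and data
    return (bytes([SOH]) + head + bytes([STX]) + data +
            bytes([ETX]) + check + bytes([EOT]))

def parse_frame(frame):
    """Return the data part, or raise ValueError if the frame is damaged."""
    if len(frame) < 10 or frame[0] != SOH or frame[3] != STX or frame[-1] != EOT:
        raise ValueError("bad framing")
    head = frame[1:3]
    length = int(head, 16)
    if len(frame) != length + 10 or frame[4 + length] != ETX:
        raise ValueError("length field does not match frame")
    data = frame[4:4 + length]
    check = int(frame[5 + length:9 + length], 16)
    if check != crc16(head + data):
        raise ValueError("checksum mismatch")
    return data

frame = build_frame(b"hello")
print(parse_frame(frame))   # b'hello'
```

A receiver that gets a `ValueError` here would be in a position to answer a following ENQ with NAK rather than ACK.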
The problem with data that contains characters that are accidentally meaningful to the protocol is a consequence of "in-band signalling". This term comes from telephony. A protocol using out-of-band signalling relies on two separate communications channels, one to send user data, and one to send the data necessary to control the communications link.
With in-band signalling, both user data and control information are sent over the same channel, and unless care is taken, there are problems that can arise when the two are confused.
In the above example, it is important to make sure that the following control characters are not present in the data:
```
           ENQ EOT ETX STX SOH ACK NAK
```
If any of these are found in the data, the transmission software must substitute something else, typically an escape sequence (for example, ESC-1 for ENQ, ESC-2 for EOT, and so on up to ESC-8 for ESC itself), in order to prevent the data from corrupting the protocol.
The touch-tone signalling mechanism used in telephony is an example of in-band signalling. In the early 1970's, Cap'n Crunch cereal (made in Cedar Rapids) was sold with a small whistle in each box. This whistle, unfortunately, was tuned to a signalling frequency used on long distance telephone lines (2600 cycles per second). The effect of injecting this signal into a line was to cause the remote long-distance exchange to terminate the connection and listen for touch-tone signals encoding the new destination being called. Unfortunately for the telephone companies, the billing for the long-distance call ended with the 2600 cycle tone, and the new long-distance call was made at no charge to the customer.
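The substitution described above might be sketched as follows. The names `stuff` and `unstuff` are made up for the example, and the choice of the ASCII digits '1' through '8' as the second character of each escape sequence is one reading of "ESC-1 ... ESC-8", not something the protocol description pins down.

```python
ESC = 0x1B
# ENQ EOT ETX STX SOH ACK NAK, then ESC itself, in the order listed above
CONTROL = [0x05, 0x04, 0x03, 0x02, 0x01, 0x06, 0x15, ESC]
ENCODE = {c: bytes([ESC, ord("1") + i]) for i, c in enumerate(CONTROL)}
DECODE = {ord("1") + i: c for i, c in enumerate(CONTROL)}

def stuff(data):
    """Replace each protocol control character with an escape sequence."""
    out = bytearray()
    for byte in data:
        out += ENCODE.get(byte, bytes([byte]))
    return bytes(out)

def unstuff(data):
    """Undo stuff(); raise ValueError on a malformed escape sequence."""
    out = bytearray()
    stream = iter(data)
    for byte in stream:
        if byte == ESC:
            code = next(stream, None)   # character following ESC
            if code not in DECODE:
                raise ValueError("bad escape sequence")
            out.append(DECODE[code])
        else:
            out.append(byte)
    return bytes(out)

raw = bytes([0x41, 0x05, 0x1B, 0x42])   # 'A', ENQ, ESC, 'B'
print(stuff(raw))                        # b'A\x1b1\x1b8B'
print(unstuff(stuff(raw)) == raw)        # True
```

ESC must be escaped along with the seven control characters; otherwise the receiver could not tell an ESC in the data from the start of an escape sequence.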
When the telephone companies discovered their error, they got Quaker Oats to discontinue their promotional giveaway, and over the decade that followed, they moved to out-of-band signalling.
The now somewhat legitimate hacker magazine 2600 takes its name from this bit of history.
The network layer
```
          * Absolute Addressing
             _______________________
            |______________|_|______|
               data         |  fixed size address
                        bytes
                       of data

          * Path Addressing
             _________________________________
            |______________|_|____|____|____|_|
               data         |   variable     number of
                        bytes   sized        address
                       of data  address      components
```
With absolute addressing, the sender specifies the name of the destination machine, and it is up to the network layer to find a route through the network to get to the destination.
With path addressing, the sender specifies a path to the destination, and the network layer sends the data to the first machine on the path. That machine strips off its own address and forwards the message to the first machine on what is left of the path.
In any case, the network layer is concerned with routing the data from one machine to another.
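Path addressing can be illustrated with a small simulation. The `Packet` class, the `receive` and `route` functions, and the single-letter machine names are all hypothetical; a real network layer would hand the packet to a link rather than loop in one process.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    path: list     # remaining address components, nearest machine first
    data: bytes

def receive(machine, packet):
    """Handle a packet at `machine`: return (next_hop, packet), or None if delivered."""
    if not packet.path or packet.path[0] != machine:
        raise ValueError("packet misrouted")
    rest = packet.path[1:]              # strip off this machine's own address
    if not rest:
        return None                     # nothing left: this is the destination
    return rest[0], Packet(rest, packet.data)

def route(packet):
    """Follow the packet along its path and list the machines it visits."""
    visited = []
    hop = packet.path[0]
    while True:
        visited.append(hop)
        result = receive(hop, packet)
        if result is None:
            return visited
        hop, packet = result

print(route(Packet(["A", "B", "C"], b"hello")))   # ['A', 'B', 'C']
```

Note that no machine along the way needs a routing table; all of the routing knowledge is carried in the packet by the sender.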
The transport layer
The transport layer deals with movement of data between logical senders and receivers. Thus, each machine on the network may have more than one logical destination for messages.
For example, data may be transported between processes, or it may be transported from a sending process to a named network socket, an abstract named destination -- a process may be able to receive information from more than one socket.
```
  A stream between sockets
    ______________________________
   |______________|_|______|______|
      data         |   |     socket
               bytes   |     number
              of data  |
                       sequence number
```
The layers between the hardware and the transport layer don't necessarily guarantee that packets of data will arrive at their destination in the order in which they were sent. If this order matters, the transport layer must add sequence numbers to each outgoing message and it must sort incoming messages into order.
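The receiving side of this might look roughly like the following sketch. The class name `Resequencer`, the choice to start numbering at zero, and the policy of silently dropping duplicates are assumptions made for the example.

```python
class Resequencer:
    """Release messages in sequence-number order, buffering early arrivals."""

    def __init__(self):
        self.next_seq = 0     # sequence number expected next (assumed to start at 0)
        self.pending = {}     # early arrivals, keyed by sequence number

    def receive(self, seq, data):
        """Accept one message; return the messages now deliverable, in order."""
        if seq < self.next_seq:
            return []         # duplicate of something already delivered
        self.pending[seq] = data
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

r = Resequencer()
print(r.receive(1, "second"))   # []
print(r.receive(0, "first"))    # ['first', 'second']
print(r.receive(2, "third"))    # ['third']
```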
The transport layer also multiplexes messages from multiple logical sources on one machine, and it demultiplexes messages addressed to different destinations on the machine. One way of identifying the source (or destination) of a message is by socket number. Sockets identify logical destinations of a message, not the machine to which the message is addressed. It is up to the transport layer to determine (for the network layer) where the message should go, physically.
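Demultiplexing by socket number can be sketched with one queue per socket. The class `TransportDemux` and its methods are hypothetical names, and the socket numbers in the usage lines are arbitrary.

```python
from collections import defaultdict, deque

class TransportDemux:
    """Sort incoming messages on one machine into per-socket queues."""

    def __init__(self):
        self.sockets = defaultdict(deque)   # socket number -> waiting messages

    def deliver(self, socket_number, data):
        """Called for each message handed up from the network layer."""
        self.sockets[socket_number].append(data)

    def read(self, socket_number):
        """Called by a process; returns the oldest message, or None if empty."""
        queue = self.sockets[socket_number]
        return queue.popleft() if queue else None

demux = TransportDemux()
demux.deliver(7, b"for socket 7")
demux.deliver(9, b"for socket 9")
print(demux.read(9))   # b'for socket 9'
print(demux.read(9))   # None
```

A process that listens on several sockets would simply call `read` with each of its socket numbers.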
The session layer
The name of the session layer suggests that its inventors expected that this layer would implement interactive sessions between users on remote machines. Typically, the transport layer manages the delivery of messages from logical sender to logical receiver, but the session layer is given the job of organizing these into streams of bytes.
Not all applications need streams, though, so an alternate session layer might organize data into remote procedure calls or other transaction oriented structures.
Protocol Bloat
Protocol layering can cause a problem known as protocol bloat. This is because, at each layer in the protocol, it is tempting to include such information as block size, checksum, and similar information. If the same problem is solved at many different layers in the protocol, the result will be considerable extra network traffic!
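The effect is easy to estimate. In the following sketch the per-layer overheads are purely hypothetical figures chosen for illustration (say, a block size and a checksum repeated at each of four layers); they are not measurements of any real protocol suite.

```python
def payload_fraction(payload_bytes, overheads):
    """Fraction of the bytes on the wire that are user data."""
    return payload_bytes / (payload_bytes + sum(overheads))

# Hypothetical overhead, in bytes, added by each layer
layers = {"link": 4, "network": 4, "transport": 4, "session": 4}

print(payload_fraction(64, layers.values()))    # 0.8
print(payload_fraction(64, [layers["link"]]))   # fraction if only one layer adds overhead
```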
The ISO OSI model focuses on standardizing the data stream at each level in the protocol hierarchy. This is probably necessary for open systems, where components at each level may come from different sources, but it is not a good software engineering methodology for constructing an integrated system.
It is not good software engineering practice to focus on data structures, particularly at an early stage in the design. It is far better to focus on functional decomposition of a problem first, and then to concentrate on the abstract components needed.
Hierarchies are useful, but design in terms of functional layers! Think about the transparency or opacity of layers!
Transparent layers add function without adding overhead. The concept of transparency originated in a paper by D. L. Parnas in the early 1970's.
An opaque layer in a hierarchical design hides the details of lower levels in the design. Opacity is useful when one goal is to allow the lower level to be changed with no effect on upper levels.
A transparent layer allows the facilities of a lower level to be used directly by upper levels, without any need to, for example, call on procedures or functions at an intermediate level which forward the request to a lower level.
Transparent layers allow high performance. Typically, layers that add functionality but don't provide for implementation independence should be transparent. Layers that provide for implementation independence but add no functions should be opaque, and layers that do both should be partially opaque.
In the context of the ISO OSI protocol suite, a transparent session layer is a good idea. Once this establishes a new channel at the transport level, the application should be able to directly use the lower level protocol. With appropriate modularization, the transport layer at the sending end of a logical link can even let the sender deal directly with the network layer.
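The difference between the two styles can be sketched in a few lines. All of the class names here are invented for the example, and the "transport channel" merely records what it is asked to send.

```python
class TransportChannel:
    """Stand-in for a transport-level channel; it just records what is sent."""
    def __init__(self):
        self.sent = []

    def send(self, data):
        self.sent.append(data)

class OpaqueSession:
    """Opaque layer: hides the channel, so every send is forwarded through it."""
    def __init__(self):
        self._channel = TransportChannel()

    def send(self, data):
        self._channel.send(data)   # extra call on every message

class TransparentSession:
    """Transparent layer: sets up the channel, then gets out of the way."""
    def open(self):
        return TransportChannel()  # the application uses the lower level directly

channel = TransparentSession().open()
channel.send(b"hello")             # no session-layer code runs here
print(channel.sent)                # [b'hello']
```

The opaque version lets the transport implementation be replaced without touching the application, at the price of a forwarding call per message; the transparent version avoids that cost but exposes the lower-level interface.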