Chapter 10, Interrupts and Queues

Overlapping computation with input-output

Part of the notes for 22C:50, Introduction to System Software
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

The Problem with Polling

The polled input-output routines presented in the previous chapter work, but they severely limit system performance. They make no provision for overlap of input-output activity and computation, and they make no provision for simultaneous input-output to multiple devices. This may be acceptable on small personal computers, but even on single-user systems, users are sometimes annoyed by such things as the inability of the system to respond to keyboard input at the same time it is doing disk input-output.

The basic problem is that the input-output drivers can "capture" the CPU and and hold it in a polling loop until the I/O is completed. Thus, to allow and up in a polling loop until the I/O is completed. Thus, to allow overlap of computation and input-output, or to allow simultaneous operation on multiple devices, the polling loops must be moved out of the input-output drivers! To allow this, we must change our input-output driver so that a call to a device driver places data in a queue, for output, or takes data from a queue, for input; this allows data transfers between the queue and the device can be done independently, whenever polling the device status indicates that it is appropriate.

There are essentially three places that the polling burden can be moved:

First, it can be moved to the main program, so that the entire program is structured as a single giant polling loop; this places severe restrictions on the structure of the application, but it is, nonetheless, an important approach that is still widely used in some applications.
A second option is to localize the polling code in a procedure which must be periodically called from within the application program; this has no impact on the structure of the application code but it does clutter it with calls to the polling procedure.
The final alternative is to have the hardware perform the polling and raise an interrupt whenever a device becomes ready; this is the most convenient alternative for the programmer, and despite the added hardware complexity, this approach has dominated essentially all operating systems developed after 1965, with the exclusion of the first generation of systems developed on minicomputers and microcomputers.

Whenever multiple devices are supported, a second problem arises, that of buffering. If data arrives from an input device while the program is dealing with an output device, the input data must be held somewhere until the program can process it. Similarly, if a program produces more output than an output device can transfer in one operation, the excess data must be stored somewhere so that the program can continue processing. Of course, the applications program must stop and wait for input when there is nothing in the input buffer at the time data is needed, and it must stop and wait when it produces more data for output when the output buffer is full, but outside of these limiting cases, we would like the application to be able to run essentially continuously, in parallel with input-output activity.

Queues

Once we separate the actual input-output transfer from the moment when the application program request input-output, we must find a way to store the record of this request until the required transfer can be performed. In general, we do this using one or another version of the queue abstraction.

For simple character sequential devices such as keyboards or communication ports, we will use first-in first-out or FIFO queues, with the operations documented in Figure 10.1.

enqueue(q,c) or q.enqueue(c)
puts the character c at the tail of the queue q, illegal if q is full.

c = dequeue(q) or c = q.dequeue
gets the character c from head of the queue q, illegal if q is empty. The character returned it the oldest one still in the queue.

full(q) or q.full
returns true if q is full.

empty(q) or q.empty
returns true if q is empty.

Figure 10.1. An informal specification of first-in first-out queues.

As with any abstract data type, we can divorce the question of how queues are implemented from how queues are used. There are many implementations of queues, but for our purposes, all are equivalent so long as they support operations analogous to those given in Figure 10.1. It should be noted that the queues for terminal devices that combine keyboard and display, issues such as echoing keypresses and line buffering add complexity that we will ignore, for the moment.

The Main Polling Loop Approach

The oldest solution to the problem of overlapping computation with input-output is to enclose the entire program in a single big loop, the main polling loop, event loop or main loop. This distorts the logic of applications programs but it makes it easy to see how multiple devices can work independently and in parallel.

The main polling loop structure is common in some older real time or process control systems. A real-time system is one where there are deadlines by which certain actions must be taken; in a process-control system, a computer controls some process. As an example, consider the computer that controls the reactor where ethylene is converted to polyethylene in a chemical plant; here, the computer must adjust the pumps and valves every few seconds depending on the temperature and pressure in the reaction vessel to prevent an explosion or freeze up.

This approach is also common in transaction processing or event processing applications, not only old ones such as IBM's CICS system that is still used in mainframe environments, but also under many of today's window managers. In these, each user action is represented by a transaction or event that is sent to the program's input queue. The outer loop of the program gets one event notice or transaction record per iteration; these transactions may be as complex as requests to web-server or as simple as individual mouse clicks and keypresses.

Figure 10.2 illustrates the global structure of a main polling loop in the context of a single input device and a single output device, each with a status register and a data register.

repeat {main polling loop}
     if (inp( inputstatus ) = ready) and (not full( inputqueue ))
          then enqueue( inputqueue, inp( inputdata ));

     if (inp( outputstatus ) = ready) and (not empty( outputqueue ))
          then outp( outputdata, dequeue( outputqueue ));

     application;
forever;

Figure 10.2. The main polling loop control structure, in Pascal.

In the above, the procedure call to application does all the useful work. On any given call to application, nothing might be done, for example, when an insufficient amount of information is available in the input queue or when the output queue is full, or quite a bit might be done. We must usually limit the time taken by any given call to application, however, because most devices must be checked at some minimum frequency. For example, a 9600 baud communications line can transmit 960 characters per second, assuming that each character is 8 bits, to which we add a start bit and a stop bit. If such a communications line is connected to our input, one call to application must not take longer than 1/960 of a second.

To see how the use of a main polling loop distorts the structure of an application, consider the rather stupid little application described in Figure 10.3.

repeat
     ch = read;
     if ch <> 'z' then begin
          if ch = 'x' then write( ch );
          write( ch )
     end;
until done;

Figure 10.3. A simple application, in Pascal.

The application in Figure 10.3 copies its input to its output, deleting all occurrences of 'z' from the input and doubling all occurences of 'x'. So, if the input is 'azyxzyb', the output will be 'ayxxyb'. As is typical of many applications, the amount of output depends in part on the amount of input; in this case, so long as there are no 'x' or 'z' characters, the amount of output exactly matches the amount of input. However, for some inputs, the output will be much smaller than the input, for example, if there are many 'z' characters in the input, and for other inputs, the output will be much bigger than the input, for example, if there are many 'x' characters. As a result, sometimes, the input queue may tend to fill because the program is blocked by a full output queue, and other times, the output queue may run empty because the program is blocked awaiting for more input.

The application shown in Figure 10.3 can be embedded in the main polling loop shown in Figure 10.2 using the code in Figure 10.4.

procedure application;
{ ch is global, but not used elsewhere }
{ state is global, not used elsewhere, and has initial value 1 }
begin
     case state of
       1: if not empty( inputqueue ) then begin { read a character }
               dequeue( inputqueue, ch );
               if ch <> 'z' then begin
                    if ch = 'x'
                      then state := 2
                      else state := 3
               end;
          end;

       2: if not full( outputqueue ) then begin { write an extra 'x' }
               enqueue( outputqueue, ch );
               state := 3;
          end;

       3: if not full( outputqueue ) then begin { write the character }
               enqueue( outputqueue, ch );
               state := 1;
          end;
     end { case };
end { application };

Figure 10.4. The code from Figure 10.3 changed to fit Figure 10.2.

In the code in Figure 10.4, the logical control structure of the application has been thoroughly hidden as a consequence of the overall event-processing structure of the code. The logical control structure is still there, however, and can be found in the sequence of values taken on by the variable state; thus, assignments to this variable have the effect of goto statements transferring the flow of control between the alternative cases. On any particular call to application, only one case will be executed, but because application is embedded in the main polling loop, we may read the program as if the flow of control was from one case to another as determined by the assignments to state at the end of each case.

The example used in Figures 10.3 and 10.4 was very simple, but this approach, adding state variables and converting control structures into selection among cases embedded in an outer loop is universally applicable. Any program can be rewritten using just one outer loop plus a state variable per function called and case selection replacing the control structure of the body of that function, with if statements used where needed to determine the assignment of values to the state variable at the end of each case.

The Polling Procedure Solution

Although the main polling loop approach is completely general, in that any program can be converted to this form, it is hardly convenient to rewrite easily understood code into strange amalgams of case selections and state variables. Programs are far more maintainable if the obvious control structures actually relate to the algorithm being implemented and not to the strange constraints of the input-output environment. This can be done by packaging the device polling operations used in the main polling loop into a subroutine, call it poll, and by replacing the requirement that the loop iterate at least n times per second with the requirement that the application must call poll at least n times per second.

Figure 10.5 illustrates the use of a polling procedure to support the application from Figure 10.3 using the device interface code from Figure 10.2.

procedure poll;
begin
     if (inp( inputstatus ) = ready) and (not full( inputqueue ))
          then enqueue( inputqueue, inp( inputdata ));

     if (inp( outputstatus ) = ready) and (not empty( outputqueue ))
          then outp( outputdata, dequeue( outputqueue ));
end { poll };

function read: char;
begin
     while empty( inputqueue ) do poll;
     read := dequeue( inputqueue, ch );
end;

procedure write( ch: char );
begin
     while full( outputqueue ) do poll;
     enqueue( outputqueue, ch );
end;


begin { main program }
     repeat
          ch = read;
          if ch <> 'z' then begin
               if ch = 'x' then write( ch );
               write( ch );
          end;
          poll;
     forever;
end { main program };

Figure 10.5. The polling procedure approach.

Note that the main program is almost exactly the same as the code in Figure 10.3, with only the addition of a call to poll added in the body of the loop. The read, write, and poll routines, as well as the implementation of the queue abstract data type, can be viewed as the 'operating system' which is needed to support this code. Note that, within read and write, there are polling loops waiting not for the associated devices to be ready, but for space or data in the appropriate queues. Because these polling loops contain calls to poll, output can continue to be passed from the output queue to the output device while the program is blocked waiting for data to arrive in the input queue.

The polling procedure approach is not only adequate but it is reasonably easy to use. All you usually need to remember is to insert a call to poll in the body of each loop in the program. As a result, this is an excellent approach to support for multiple peripheral devices on the few systems where interrupt hardware is not available.

There are two problems with this approach. First, careless programmers may forget to add a call to poll in some loop. For example, the program might contain a short definite loop, perhaps a for loop with only a few iterations. This loop may need no call to poll during program development, but if the number of iterations depends on some configuration constant, when the time comes to change this constant, nobody may notice that a call to poll needs to be added to the for loop. This kind of error can be very hard to track down. Data will be lost occasionally, but things may well work most of the time, and the result will be a program that has a reputation for being unreliable.

Another problem with this approach arises if an attempt is made to optimize the polling, avoiding the overhead of too many calls to poll. This requires a careful check of the run-time of each section of code, and with modern computers, estimating the run-time of a section of code can be very difficult. Furthermore, slight changes in the system configuration can change these times. Install the program on a machine with a slightly slower CPU and it will become unreliable. Add to the code or data, pushing information out of cache that had previously been cached, and the added memory access time may slow the program enough to make it unreliable.

Interrupts

Interrupt hardware was invented to eliminate the need for explicit calls to a polling procedure from within applications code. Essentially all computers on the market today, from the smallest microcontrollers to the highest performance supercomputers include such hardware. In effect, what the basic interrupt mechanism does is check all of the relevant device status bits just after executing each and every machine instruction, inserting a call to an interrupt handler, analogous to our poll routine whenever some device is ready. So long as no devices need service, this allows the computer to execute instructions at full speed.

In general, an interrupt can be viewed as a hardware-initiated call to a procedure, the interrupt handler or interrupt service routine. In effect, the instruction execution loop of the central processor has been made to serve as the main polling loop of our application! Although the abstract description of an interrupt as a hardware initiated procedure call applies to most interrupt hardware, the details vary considerably from machine to machine. The address of the interrupt service routine is frequently stored in a special register or dedicated memory location, called the interrupt vector. On some machines, calls to interrupt service routines parallel procedure calls to the extent that the normal procedure return can be used to return to the code which was running at the time of the interrupt, while on others, special return-from-interrupt instructions must be used.

In most of this discussion, interrupt service routines will be coded as conventional procedures, but it must be understood that, depending on the actual hardware, small assembly language stubs may be required to call these interrupt service procedures. An assembly language stub is a bit of code that masks over the incompatability between one model of control transfer and another; in this case, allowing the use of an interrupt to call a normal function compiled by a compiler that knew nothing about interrupt service routines.

Once an interrupt service routine has been called, it is essential that the hardware request for that service be disabled or withdrawn. If it were not, an infinite (and possibly recursive) loop would result in which, after executing the first instruction of the interrupt service routine, the hardware would force a control transfer to the start of the same interrupt service routine. On some systems, the interrupt is automatically disabled by the actions of the central processor hardware when it responds to the interrupt, while on others, the first instruction of each interrupt service routine must disable the interrupt. In the examples presented here, it will be assumed that the hardware automatically disables interrupts as it calls the interrupt service routine.

Even if the hardware can disable interrupts, the software must also be able to enable or disable them. For example, on return from an interrupt service routine, after the software has done whatever the interrupt requested, the software must re-enable the interrupt as it returns to the the code which was interrupted. Furthermore, if the output queue is empty, there is no point in responding to an interrupt from the output device when it is ready to transfer more data; a similar argument can be made when the input queue is full. On most machines, there are special instructions to enable and disable interrupts; on some, these instructions apply to all interrupts at the same time, while other machines allow groups of devices to be enabled or disabled as a group. In addition, it is usual to include, in each device's control register, one or more interrupt enable bits corresponding to each condition the device can sense that might be cause for an interrupt request. If the interrupt enable bit for a condition is set, then when that condition is detected, there will be an interrupt request.

It should be noted that the disabling an output device's ability to request interrupts when the output queue is empty and enabling that device when data is put in the queue is a special purpose solution to a general problem, the producer-consumer problem. The device is the consumer, and the application is the producer, and the general problem is to prevent the consumer from attempting to use data that has not yet been produced. In any producer-consumer system, whether the system is all in hardware, all in software, or mixed between the two, there must be some synchronization mechanism to make the consumer wait until data is available!

An Example Device

The asynchronous communications line interface commonly used on the IBM PC compatable family of computers was used as an example in Chapter 9, and the interrupt interface to this device is clean enough to use as an example here. Figure 9.4 defined many of the input-output interface registers of the COM1 port, but it did not give details of the interrupt enable and interrupt identification registers for this device. These are given in Figure 10.6:

3F9 COM1IER Interrupt enable register
The bits of this one-byte read-write register determine which conditions detected by the interface will cause interrupts.
      _______________
     |_______|_|_|_|_|
              | | | |
              | | | 0 -- disable interrupt on receive data input ready
RxIE          | | | 1 -- enable      "                "
              | | 0 -- disable interrupt on transmit data register empty
TxIE          | | 1 -- enable      "                "
              | 0 -- disable interrupt on error, overrun or break
              | 1 -- enable      "                "
              0 -- disable interrupt on change in modem status inputs
              1 -- enable      "                "
3FA COM1IIR Interrupt identification register
After the serial port hardware requests an interrupt, the bits of this one-byte read-only register explain the cause. The information presented in this register could be inferred from the line status register after an interrupt occurs, but it is presented here in an easier to decode format.
      _______________
     |_________|_____|
                | | |
MSIP            0 0 0 -- modem status change interrupt pending
TxIP            0 1 0 -- transmit register empty interrupt pending
RxIP            1 0 0 -- receive data input interrupt pending
ERIP            1 1 0 -- error overrun or break interrupt pending
                x x 1 -- no interrupt pending
Figure 10.6. Interrupt control and status bits for the PC COM1 port.

It is important to understand that the interrupt enable bits of the device controller only determine whether or not that device will attempt to request an interrupt! They do not determine whether or how the CPU will respond to that request. The interrupt enable flag, one bit in the Intel 80x86 flags register, determines whether the processor will check for interrupts before each instruction execution cycle. There is a similar interrupt enable bit hidden somewhere in just about every central processor designed since the early 1960's.

On the 80x86, the CLI machine instruction clears the interrupt enable flag, preventing interrupts, and the STI sets the interrupt enable flag, with a delay of one instruction. This delay sometimes has practical consequences, for example, when a CLI instruction comes immediately before a return instruction. In addition, the 80x86 flag register, including the interrupt enable flag, is pushed on the stack along with the return address when the processor accepts an interupt request, and at the same time, the interrupt enable flag is automatically cleared. Similarly, the return from interrupt instruction not only pops the return address from the stack back into the program counter, but it also restores the flag register. The details vary considerably on other processors, but there is always a way to accomplish these same goals.

The only things the 80x86 hardware saves when an interrupt occurs are the program counter and flags word, and these are the only things restored by the return from interrupt instruction. The interrupt service routine must save and restore the values of any other registers it uses! When an interrupt service routine is given in C or Pascal, it must gloss over this because these high level languages contain no concept of register; in any practical implementation, we will typically need an assembly language stub to take care of this!

The IBM PC and compatables are typical of modern computers in that they add an extra layer of hardware between the device that requests interrupts and the central processor itself. In the case of the PC, this layer is the programmable interrupt controller. This is connected to the system as an input-output device, using device registers 020₁₆ through 02F₁₆. The programmable interrupt controller subsystem of the PC supports a minimum of 8 distinct interrupt levels, with a mask bit for each level to collectively enable or disable all interrupts from devices that share that level. Figure 10.7 describes the access to the interrupt enable mask register for the first 8 interrupt levels.

021 PIC0IMR Interrupt Mask Register
This one byte read-write register holds 1 bit for each of the first 8 interrupt lines. The least significant bit controls line 0, and the most significant bit controls line 7. If that bit is zero, the corresponding interrupt is enabled. If that bit is one, it is disabled.
Figure 10.7. The Interrupt Mask Register for the first 8 levels.

On the 80x86 family, the interrupt vector is a table of 256 entries, allowing up to 256 distinct interrupt service routines. This table begins at physical address 0 in memory! Each table entry is the 32-bit memory address of one interrupt service routine. The connection between a particular device and its interrupt service routine on a classic PC is determined in two steps:

First, which interrupt request line is the device connected to? Most device interface cards sold with the PC had jumpers that could be set to connect the device to one or another interrupt request line. These lines are inputs to the programmable interrupt controller.
Second, how is the programmable interrupt controller programmed to respond to requests on that line? The controller contains configuration registers that determine the mapping from interrupt request line to interrupt vector entry.

We will ignore all of this and assume that the COM1 port is configured normally. In the default configuration, line 4 is disabled in the programmable interrupt controller and the interrupt is mapped to interrupt vector entry C₁₆. Therefore, to connect an interrupt service routine to the COM1 port, we must first store a pointer to the first instruction of the interrupt service routine in memory location 30₁₆ (C times 4 bytes per pointer), and, with the processor's interrupt-enable flag turned off, enable interrupt line 4 in the programmable interrupt controller by anding the contents of input-output register 21₁₆ with EF₁₆ (11101111₂).

The code shown in Figure 10.8 uses interrupts to implement the read and write routines required by the example application code given in Figure 10.3, using the interface from Figure 9.4 and Figure 10.6.

procedure com1interrupt;
{ control reaches here whenever there is
  any interrupt request from the COM1 port }
var id: integer;
begin
     { first, find out what interrupt this is }
     id := inp( COM1IIR );
     { assert, because we got here by interrupt, id is even }

     { second, dispatch to the appropriate routine }
     case id of
        MSIP: ... ;
        TxIP: com1transmit; { we infer TxDE is set }
        RxIP: com1receive;  { we infer RxRD is set }
        ERIP: ... ;
     end;
end { com1interrupt };


procedure com1receive;
var stat: integer;
    data: char;
begin
     { we know the RxRD status bit is 1 }
     stat = inp( COM1LSR );
     if testbit( stat, RxRD )
     and not full( com1inqueue ) then begin
          { get the data }
          data := inp( COM1DATA );
          enqueue( com1inqueue, data );
     end;
 
     if full( com1inqueue ) then begin
          { disable the RxRD interrupt }
          stat := inp( COM1IER );
          clearbit( stat, RxIE ); 
          outp( COM1IER, stat );
     end;
end { com1receive };



procedure com1transmit;
var stat: integer;
    data: char;
begin
     { we know the TxDE status bit is 1 }
     stat := inp( COM1LSR );
     if testbit( stat, TxDE )
     and not empty( com1outqueue ) then begin
          { output the data }
          data := dequeue( com1outqueue );
          outp( COM1DATA, data );
     end;

     if empty( com1outqueue ) then begin
          { disable the TBE interrupt }
          stat := inp( COM1IER );
          clearbit( stat, TxIE );
          outp( COM1IER, stat );
     end;
end { com1transmit };



function read: char;
begin
     { wait for data to be available }
     while empty( com1inqueue ) do { nothing };

     { get the data }
     read = dequeue( com1inqueue );

     { enable receive data interrupt }
     stat := inp( COM1IER );   {  error!!  }
     setbit( stat, RxIE );     {  error!!  }
     outp( COM1IER, stat );    {  error!!  }
end { read };



procedure write( ch: char );
begin
     { wait for space in queue }
     while full( com1outqueue ) do { nothing };

     { put the data }
     enqueue( com1outqueue, ch );

     { enable transmit interrupt }
     stat := inp( COM1IER );   {  error!!  }
     setbit( stat, TxIE );     {  error!!  }
     outp( COM1IER, stat );    {  error!!  }
end { read };

Figure 10.8. Use of interrupts for the COM1 port.

The com1interrupt routine is called whenever the COM1 hardware requests any interrupt, so long as interrupts are enabled in the CPU and so long as the programmable interrupt controller is correctly configured. Its job is to check to see what the cause of the interrupt was and dispatch control to subsidiary interrupt service routines. The details of how the hardware interrupt mechanism dispatches control to the com1interrupt routine are not covered here, but it is important to know that, during the execution of com1interrupt or any routines called from it, no other interrupts from the COM1 communications port are allowed. On the IBM PC, fast devices such as disks may request interrupts during the execution of interrupt service routines for slow devices like asynchronous communications ports, but we can ignore this possibility in this discussion.

There are 4 possible reasons the COM1 port may request an interrupt, including errors and secondary input changes that will be ignored here. The remaining possibilities are that the transmit buffer is empty, allowing a new character to be transmitted by the software, or that the receive buffer contains a character ready to be processed by software. In the former case, com1interrupt calls com1transmit; in the latter case, it calls com1receive.

The com1receive routine is called each time a new character of input becomes available. It begins by verifying that input is available. Assuming that the remaining code is correct, this shouldn't be required, but it seems reasonable to refuse to read the input if the RxRD bit is not set in the status register or if the input queue has no space for the new character. If all is well, we read the character from the data register and enqueue it on the input queue.

The com1receive routine finishes with a check of the queue status. If the received character did not fill the input queue, the routine returns normally, and, on return from interrupt, the interrupted program will be resumed, and perhaps later, another interrupt will signal the arrival of the next character. If, on the other hand, the input queue is full, the RxRD bit of the COM1 port's interrupt enable register is reset, thus preventing any future interrupts announcing that input data is available until the queue is no longer full.

The possiblity that the interrupt service routine may return while leaving interrupts for that routine disabled requires that other code re-enable this interrupt when conditions allow it! Here, the receive interrupt returned with interrupts disabled if there was no space in the queue to put the next item of received data, so we should re-enable this interrupt when we successfully make space in the queue. We do this by having the read-character routine re-enable the interrupt, setting the RxRD bit of the COM1 port's interrupt enable register, after each dequeue it performs. We only need to do this if we dequeued from a full queue, but it is easier to do it after each dequeue, and enabling interrupts when they are already enabled has no dangerous side effects.

Note that the read routine begins with a little polling loop, waiting for data to become available in the input queue; interrupts during this polling loop may service any device, and the polling loop will exit as soon as a COM1 interrupt puts something in the queue.

Critical Sections

The code shown in Figure 10.8 contains serious flaws in both the read and write routines, there are 3-line sequences of code that begin by reading the interrupt enable register, then edit the value read and output that value back to the interrupt enable register. The problem is, what happens if an interrupt occurs during such a sequence of instructions, and more specifically, what if it is an input or output interrupt that, itself, makes some change to the interrupt enable register?

A sequence of instructions such as this is called a critical section. It is a section of code where it is critical that no interrupt be allowed, or more specifically, that no other activity that uses the same variable be allowed. In this case, the variable is the interrupt enable register for the COM1 port.

There are several solution to the critical section problem, depending on whether it is a long critical section or a short one, and depending on wheter the machine has just one CPU or many CPUs. The solution we will use here is one appropriate for short critical sections on uniprocessors. This is quite simple, just disable all interrupts for the duration of the critical section, as shown in Figure 10.9.

function read: char;
begin
     { wait for data to be available }
     while empty( com1inqueue ) do { nothing };

     { get the data }
     read = dequeue( com1inqueue );

     { enable RxRD interrupt -- a critical section }
     disableints;
     stat := inp( COM1IER );
     setbit( stat, IERxRD );
     outp( COM1IER, stat );
     enableints;
end { read };

procedure write( ch: char );
begin
     { wait for space in queue }
     while full( com1outqueue ) do { nothing };

     { put the data }
     enqueue( com1outqueue, ch );

     { enable TBE interrupt -- a critical section }
     disableints;
     stat := inp( COM1IER );
     setbit( stat, IETBE );
     outp( COM1IER, stat );
     enableints;
end { read };

Figure 10.9. Protecting critical section in code from Figure 10.8

The code shown in Figure 10.9 replaces the final to routines in Figure 10.8. The only change is the addition of a call to disableints before each critical section and a call to enableints after each critical section. In fact, most computers include a pair of machine instructions, one that globally enables all interrupts, and one that globally disables all interrupts. On the Intel 80x86 family of machines, for example, these are the CLI and STI instructions. Since many high level language compilers do not allow easy mixing of machine language instructions with high-level language code, we write these a pair of trivial assembly language subroutines to use these special machine instructions.

Queues Revisited

The first-in, first-out queues used in the above input-output implementations provide communication between the user code and the device using the enqueue, and dequeue operations, and they provide synchronization using the tests for a full or empty queue. These operations were informally defined in Figure 10.1, but now, it is time to give implementation details. One common implementation of the queue abstract data type uses a linked list to represent each queue. The other, given in Figure 10.10, uses bounded buffers.

type queue = record
                  head, tail, count: integer { initially zero };
                  buffer: array [0..size] of char;
             end;

procedure enqueue( var q: queue; ch: char );
begin
     q.count := q.count + 1;
     q.buffer[ q.tail ] := ch;
     q.tail := (q.tail + 1) mod (size + 1);
end { enqueue };

function dequeue( var q: queue ): char;
begin
     q.count := q.count - 1;
     dequeue := q.buffer[ q.head ];
     q.head := (q.head + 1) mod (size + 1);
end { dequeue };

function full( var q: queue ): boolean;
begin
     full := (q.count = (size + 1));
end { full };

function empty( var q: queue ): boolean;
begin
     empty := q.count = 0;
end { empty };

Figure 10.10. A simple bounded buffer queue implementation.

In the code in Figure 10.10, it has been assumed that the head pointer always points to the next item to be dequeued from the queue (if there is one), and thus must be incremented after use; similarly, the tail pointer always points to the next free space ready to be used when new data is enqueued, so it must also be incremented after use. Variations are possible where the pointers are incremented before use or where one is incremented before and one after use. Note that this code does not prevent an attempt to dequeue from an empty queue or enqueue in a full queue. Additional code could be added to check for this, but it should be unnecessary except for debugging because the users of this code are expected to provide all necessary synchronization by testing for queue full before enqueueing, and by testing for queue empty before dequeueing.

The bounded buffer implementation shown in Figure 10.10 is entirely sufficient for use in the main polling loop and polling procedure solutions to overlapping of processing with data transfers, but it will not suffice for the interrupt solution. Again, we have a critical section problem! what would happen if, as the user program was enqueueing a character on the output queue, the output device requested an interrupt causing a character to be dequeued from the output queue. If this happens while the enqueue operation is in the middle of incrementing the count of items in the queue, the sequence of operations shown in Figure 10.11 may occur.

    actions in enqueue         actions in dequeue

1) register := q.count + 1;

2)                    interrupt

3)                            q.count := q.count - 1;

4)                      return

5) q.count := register;

Figure 10.11. Interrupts in the context of Figure 10.10.

To understand Figure 10.11, note first that many modern CPUs do not include instructions to directly increment or decrement a variable in memory, and even where the instruction set manual says that there is an instruction that does this, modern optimizing compilers frequently use two instructions in order to allow faster pipelined execution. Thus, on modern processors using high performance compilers, decrementing a variable is generally done by loading it in a register, then decrementing the register, and finally storing the result in memory. It is quite possible that an interrupt will occur between any pair of consecutive instructions unless this is prevented, so here we have another example of a critical section.

We could fix the code from Figure 10.11 in exactly the way the code from Figure 10.8 was fixed in Figure 10.9, by disabling all interrupts before each critical section and enabling them after each critical section, but there is another solution, the complete elimination of the count of items in the queue. The key to doing this is to note that, for both a full queue and an empty queue, the head and tail pointers will be equal! This is illustrated in Figure 10.12.

                           _ _ _ _ _ _ _ _ _ _
An empty queue:           |_|_|_|_|_|_|_|_|_|_|
                                       |        count = 0, empty
                                  head = tail
After enqueue(X)           _ _ _ _ _ _ _ _ _ _
      enqueue(Y)          |_|_|_|_|_|_|X|Y|_|_|
                                       |   |    count = 2
                                     head tail
After enqueue(Z)           _ _ _ _ _ _ _ _ _ _
      enqueue(W)          |_|_|_|_|_|_|X|Y|Z|W|
                           |           |        count = 4
                          tail       head
                           _ _ _ _ _ _ _ _ _ _
After more enqueues       |A|B|C|D|E|F|X|Y|Z|W|
                                       |        count = 10
                                  tail = head
                           _ _ _ _ _ _ _ _ _ _
After two dequeues        |A|B|C|D|E|F|_|_|Z|W|
                                       |   |    count = 8
                                     tail head
                           _ _ _ _ _ _ _ _ _ _
After two more dequeues   |A|B|C|D|E|F|_|_|_|_|
                           |           |   |    count = 6
                          head       tail head

Figure 10.12. A sequence of operations on a queue of length 10.

If enqueueing data causes the head and tail pointers to become equal, then the queue has been filled. If dequeueing data causes the head and tail pointers to become equal, then the queue has been emptied. Note that the relative order of the head and tail pointers has absolutely no bearing on whether or not the queue is full or empty! It should also be noted that only the enqueue routine modifies the value of the tail pointer and only the dequeue routine modifies the value of the head pointer; thus, there are no critical sections involving these pointers as long as there is only one producer and one consumer for data from the queue.

Using the above observation, we can detect queue full after enqueue and queue empty after dequeue, but these rules do not allow for a separate predecate to distinguish between the full and empty conditions by comparing the head and tail pointers. We can solve this if we forbid the queue from filling, reserving the condition where the pointers are equal to mean queue empty. If we do this, the queue full condition must be indicated by the fact that there is exactly one free space, or that the head pointer is one location behind the tail pointer. The code for testing for the full and empty conditions using this approach is given in Figure 10.13.

function full( var q: queue ): boolean;
begin
     full := ((q.tail + 1) mod (size + 1)) = q.head);
end { full };

function empty( var q: queue ): boolean;
begin
     empty := (q.head = q.tail);
end { empty };

Figure 10.13. Eliminating the need for a count field.

It's interesting to note that the storage requirements for a queue of characters may even be reduced by changing to this scheme! We have eliminated the count field, an integer variable, at the expense of one item in the queue, a single character. One character is typically only 8 bits, and if the queue size is over 256 bytes, the count field would need to be larger than 8 bits!

Queues and Terminal Input/Output

When you type on the keyboard, you usually expect to see what you typed echo on the display screen. When queues are used for terminal input-output, there are a number of possible ways of handling this echoing. These are described in Figure 10.14.

             hardware  |  software
 ----------       -----------       --------------           ------
| display  |--<--| interface |-<---| output queue |-<-------|      |
 ----------       -----------       --------------    |  |  |      |
            |          |      |                       |3 |  | user |
            |1         |      |2  --------->----------   |4 |      |
            |          |      |  |                       |  | code |
 ---------- ^     ----------- ^  |  --------------       ^  |      |
| keyboard |-->--| interface |-->--| input  queue |-->------|      |
 ----------       -----------       --------------           ------
                       |

Figure 10.14. Alternate approaches to echoing.

The alternatives outlined in Figure 10.14 are:

Path 1: Half duplex. Characters are echoed by the terminal device -- what is typed on the keyboard will be directly echoed on the display without any involvement from the host computer. This mode matches the behavior that was typical of the Teletype networks of the 1920's through 1960's, and because teletype compatability was common in early computer systems, this mode remained in common use in early compter systems. This path is called half duplex because it does not require a bidirectional communications line from the host computer to the terminal. At any time, either input is being sent from keyboard to host, or output is being sent from the host to the display.
Path 2: Immediate echo. Characters are echoed by the input interrupt service routine; echoed characters take precidence over any pending output. Users will generally see very similar behavior with either Path 1 or Path 2, but with Path 2, each time a key is pressed on the keyboard, it is sent to the computer and then immediately resent back to the display. For low data rates and high typing speeds, this means that the echoing of one character may be overlapped with the typing of the next, and this requires a full duplex communications line, one that allows simultaneous (or at least, the appearance of simultaneous) bidirectional communication.
Path 3: Queued echo. Characters are echoed by the input interrupt service routine, but instead of directly handing these characters to the output interrupt service routine, they are enqueued in the output queue. This eliminates the need for some kind of high-priority communications path (possibly a small FIFO queue) from the input interrupt service routine to the output interrupt service routine. Based on empirical observations, this appears to be the scheme used by Unix.
Path 4: Echo on read. Characters are echoed at the time they are dequeued from the input queue. This means, if a user types characters while a program is busy doing something else, they will not echo until the program actually gets them. If the "something else" was outputting text to the display, the characters won't be jumbled into the output data stream, but rather, they will appear in the same place on the display as they would have appeared had they been typed later.

If a user always waits until the application program prompts for input before typing anything, it is unlikely that the user will ever notice any difference between the 4 echo paths illustrated in Figure 10.14. On the other hand, if the user types ahead, that is, types before being prompted, there will be differences. With paths 1, 2 and 3, the user will see the material that was typed ahead, while with path 4, the user will see nothing until the program prompts for it, and then all of the pre-typed material will echo as if the user had typed it in response to the prompt.

The user will be able to distinguish between paths 1 and 2, on the one hand, and path 3, on the other hand, only if the user types while the application is sending output to the screen, and even then, the result is likely to be subtle. With paths 1 and 2, typed input will be echoed immediately, while with path 3, typed input will be delayed. In both cases, however, typed input will be jumbled into the stream of output, so it may be difficult to observe which echo path is being used.

Finally, paths 1 and 2 are usually indistinguishable, except on systems that actually lock the keyboard when the communications line is configured for output, and then unlock it when input is permitted.

Of course, many applications turn off all echoing in the driver for the keyboard and display, and instead, handle echoing entirely at the application level. This is certainly the case in text editors such as Vi and Emacs (and even horrid little editors such as Pico), and it is also the case whenever applications run directly under the control of a window manager instead of running under a command-line interface.

It should also be noted that, although queues are used with most devices on large systems, the keyboard input queue is the only queue which users are usually directly aware of. As a result, it has earned a name of its own, the type-ahead buffer. This name accurately describes the function of the keyboard input queue: It allows users to type ahead of any request from the applications program for input.

Multiple Devices on one Interrupt Line

The IBM PC serial port example presented above is based on the assumption that each interrupt line will serve only one device, and that requests on each line will be vectored, by hardware, to a single interrupt service routine. These assumptions lead to one of the most common problems found in configuring PC systems! If the system is accidentally configured with two devices connected to the same interrupt line, only one of the interrupt service routines will be called when there is an interrupt, and if this is the wrong routine, the system will hang because that routine will not use the correct input-output instructions to service the device.

Many systems have been built with only a single interrupt request line. In today's marketplace, for example, programmable interface controllers such as Microchip's PIC product line frequently operate this way. On those systems, there is only one interrupt service routine, and its job is to do, in software, the job done by the programmable interrupt controller on the PC. Figure 10.15 gives an outline of the code we might use if we attached all of our serial communication ports to the same interrupt line on the PC; we assume here that the ports are identical, except that their device registers are assigned different device addresses.

void cominterrupt()
/* control reaches here whenever there is
   any interrupt request from any of the 4 COM ports we attached */
{
    { find out what interrupt this is! }
    if ((inp( COM1IIR ) & 1) == 0) {
        com1interrupt();
    } else if ((inp( COM2IIR ) & 1) == 0) {
        com2interrupt();
    } else if ((inp( COM3IIR ) & 1) == 0) {
        com3interrupt();
    } else if ((inp( COM4IIR ) & 1) == 0) {
        com4interrupt();
    }
}

Figure 10.15. Multiple devices sharing one interrupt line.

The code in Figure 10.15 assumes that, despite the fact that our devices are basically identical and differ only in their device addresses, we have written entirely different interrupt service routines for each of them. In fact, because we have assumed identical devices, we should probably use one routine, with parameters that tell it which device interface registers to use and what input-output queue data structures to use.

The code in Figure 10.15 is typical of the interrupt handling framework used on programmable interrupt controllers of today, or on the smaller minicomputers of the late 1960's. Given that we have known how to do this for so many decades, why do so many of today's PC-based systems hang when users accidentally attach multiple devices to one interrupt request line? The answer is a matter of software architecture. The basic framework for interrupt handlers under Microsoft's old DOS system was not carefully thought out and has proven to be an inadequate foundation for later development. If Microsoft had enforced appropriate standards for interrupt handlers in the early 1980's, the story today would have turned out quite differently.

Here is one time-tested rule that works quite well: Initially, the system installs default handlers for each interrupt request line. If, by default, no devices are attached to that line, the default handler panics when it gets an interrupt. A typical thing to do on panic is to reboot the system. To install a handler for a new device on top of the default system, save the pointer to the previously installed handler and replace it with a pointer to the new handler. When there is an interrupt, the new handler must check the appropriate device status to see if the new handler's device has requested the interrupt. If not, the new handler must call the old handler and then return. If so, the new handler takes care of its own device and returns.

Queues and High Performance Sequential Devices

On devices such as magnetic tape, where data is transferred one block at a time instead of one character at a time, first-in first-out queues of characters are rarely used. Instead, it is common to use queues of buffers or of input-output request blocks. In both cases, the queues are usually implemented using linked lists, but the two approaches are otherwise quite distinct.

When queues of buffers are used, the interrupt service routine must have a state, either reading or writing. When it is reading, each interrupt signals that a buffer has been filled and is ready to be enqueued for later use by a user program reading that tape. When it is writing, each interrupt signals that a buffer has been emptied and a new buffer will be dequeued if a new write is started. Because these queues are linked lists of buffers, there can be a common pool of free buffers shared by all block-transfer devices using the same buffer size. When this is done, there is a problem: How long should each queue be allowed to get before either the device or the user program is stopped? This is an example load balancing problem.

Consider the problems caused when a user program opens two files attached to different tape drives, but reads many blocks from one of them while reading few blocks from the other. Both tape interrupt service routines will quickly start enqueueing buffers holding blocks which have been read from tape, but the user program will only remove blocks from one queue. If there is no provision to stop reading from the tape which is not being used much, that tape interrupt service routine will eventually claim most of the buffers, fill them with data, and enqueue them on the queue from which the user program is not taking data. As a result, very few buffers will be available for the queue which is handling most of the traffic, and the benefits of queueing will be largely lost. In effect, the input-output load has been placed on one queue while the resources have been claimed by the other.

The solution to this load balancing problem is to limit the size of each queue. A variable limit is desirable because a fixed limit per queue eliminates many of the advantages of sharing a pool of free buffers. Some older systems require the user to give the queue size when a file is opened; this requires that the user understand the input-output patterns of the program. A more interesting solutions to this problem is to have the queue size adjusted adaptively by the system as it observes the input-output behavior of the user program. This sounds complex, but some surprisingly simple heuristics were developed in the mid 1970s that lead to excellent performance.

The Demos system used a heuristic where each input or output queue initially had a limit of two buffers, but this limit was adjusted whenever the queue filled or emptied. If the queue never filled or emptied, Demos assumed that the queue length was appropriate. For input queues, the fact that the interrupt service routine filled the queue suggested that the queue was too big, so the limit was decremented each time this happened, but never below a size of two. If the user program ever emptied an input queue, it was assumed that the queue was too small, so the limit was incremented. The net result was a system that, at low cost, constantly adjusted the queue sizes as it attempted to keep the program from having to wait for input while not wasting buffers. A similar heuristic was used for output.

The use of queues of buffers is complicated if the user is allowed to switch from reading to writing. If there are I buffers in the input queue and the user requests a write, the last buffer which the user has read was I blocks prior to the current position of the read/write head on the tape, so the tape must be moved back I blocks before starting to write. The I buffers in the input queue can be discarded when this is done. Note that this means that either the input queue or the output queue will always be empty with a tape-based system, so there is really only a need for one queue with a a state variable to tell whether it is for input or for output.

The complexity of the schemes described above for managing queues of buffers is such that many older systems take an alternate approach. Instead of queueing buffers, the system queues requests to perform input-output on a particular device. Each input-output request is a record describing which input-output operations were requested and where the associated buffer is to be found. All operations will be performed in exactly the order that the user requests, but not necessarily at the time of the request. A user request for input or output specifies the buffer and the operation, and the user must explicitly wait for the operation to be completed. If the user wishes any overlap of input-output and computation, the user must maintain the queue of buffers. Few programs on such systems use more than two buffers; as a result, the term double buffering has come to be widely used to describe schemes which allow overlap of computation and input-output.

Queues and Files

The discussion of queues and interrupts presented up to this point in this chapter has ignored all of the issues of device independent input-output, file variable structures, and device description records which were discussed in the previous chapter. In fact, it is not difficult to incorporate queues into the device description record in such a way that most of the structures discussed in Chapter 9 remain unchanged. As an example, consider the design of an interrupt driven serial port driver based on that given in Figure 9.6.

First, each open serial port record needs a new piece of information, the addresses of the input and output queues associated with that device. The user of the open file no longer needs to know the data register and status register for this device; those are needed only in the interrupt service routine, but the user does need to use the interrupt enable register. This leads to the design of a keyboard interface documented in Figure 10.16.

/* definition of comportvariable based on filevariable from Figure 9.1 */
struct comportvariable {
     /* pointers to device specific routines */
     void (* rewind)( struct filevariable * f );
     char (* read)( struct filevariable * f );
     void (* write)( struct filevariable * f, char c );
          .
	  .   /* pointers to routines for other operations */
	  .

     /* device specific information */
     int comie;   /* I/O register number of interrupt enable register */
     struct queue * inq;
     struct queue * outq;
};

/* functions implementing access methods */
void comwrite( struct filevariable * f, char c )
{
     struct comportvariable * cp = (struct comportvariable *) f;
     int ie;

     while (!full( cp->inq )) /* empty loop body */;
     enqueue( cp->enq, c );

     disableints();
     ie = inp( cp->comie );
     ie = ie | TxIE;
     outp( cp->comie, ie );
     enableints();
}

          .
	  .   /* other functions */
	  .


void com1transmit()
/* interrupt service routine for COM1 output */
{
     int stat;
     char data;

     /* we know the TBE status bit is 1, but so what */
     stat = inp( COM1LSR );
     if ((stat & TxBE) != 0)
     &&  !empty( com1outqueue )) {
          /* output the data */
          data = dequeue( com1outqueue );
          outp( COM1DATA, data );
     }

     if (empty( com1outqueue )) {
          /* disable the TBE interrupt */
          stat = inp( COM1IER );
          stat = stat & ~TxIE;
          outp( COM1IER, stat );
     }
}

Figure 10.16. Device Independance and Interrupts, combining code from Figure 9.6 and from Figure 10.8.

The key thing to note about Figure 10.16 is that, above the level of the user-callable read routine, no part of the system needs to know that there is an input queue. The linkage between the open file and the interrupt service routine is tenuous! The open routine must fill in the queue pointer field of the open file description with the same queue used by the interrupt service routine, and the open routine must fill in the same interrupt enable register number as is used in the interrupt service routine.

Terminology

The reader should be familiar with the following general terms after reading this section.

main polling loop               free space list
polling procedure               circular queues
interrupts                      bounded buffers
interrupt service routines      buffering
interrupt vector                request queues
interrupt disable or enable     synchronization mechanisms
producer consumer problems      mutual exclusion problems
echoing                         type ahead buffers
queue load balancing heuristics double buffering

Exercises

Convert the following applications program to the use of a main polling loop, showing your answer in the style of Figure 10.4.

repeat
     read( ch );
     if ch = '('
          then repeat read( ch ) until ch = ')'
          else write( ch );
forever;

What use of queues or interrupts, if any, would be appropriate with the memory mapped display interface discussed in Chapter 9?
Write code to initialize the COM1 port hardware and the Programmable Interrupt Controller hardware in order to allow the use of the interrupt service routine given in Figure 10.8.
Figure 10.8 did not give code for dealing with interrupts on errors detected in the input. Consider case of overrun errors, that is, errors caused by the interrupt service routine failing to read a character from the receive data register before the next character of input arrives from the outside world. With reference to the logic of Figure 10.8, what behavior on the part of the application program will cause this error?
Figure 10.8 did not give code for dealing with interrupts on errors detected in the input. Consider case of parity errors, that is, errors caused by corruption of the data on the communication line. The parity error bit in the line status register will be set when the input data register contains the data that had a parity error, and it will be cleared when that data is read from the input data register. What should the interrupt service routine do when the interrupt is caused by an input parity error?
a) Using the style of Figure 10.10, write a queue implementation based on a linked list of one character buffers, with a global free-space list shared by all such queues to hold idle buffers. Assume that the global free-space list initially holds some number of buffers, but do not be concerned with how they got there.
b) Does your solution contain any critical sections which would cause trouble with interrupts? If so, for each, document the problem, in the style of Figure 10.11.
c) Modify your solution to part a to support the Demos load balancing heuristics.
a) Using the style of Figure 10.10, write a queue implementation based on a bounded buffer with a capacity of exactly one character. This should not involve any head or tail pointer, just a buffer of type char and a state variable with values full and empty. (OK, this sounds like a really stupid question, but writing code for the limiting case is sometimes a good way to press your understanding of a problem!)
b) Does your solution contain any critical sections which would cause trouble if interrupts were used? If so, document them in the style of Figure 10.9.
In Figure 10.16, the only code given is for output. Write the missing code to support character sequential input.
If you take the comreadline routine from Figure 9.8 and graft it onto the basic character sequential interface in Figure 10.8, which of the echo paths (if any) from from Figure 10.14 is implemented.
Describe a heuristic appropriate for adjusting the length of output queues in the style of the Demos input queue heuristic given here.

References

Many of the references listed at the end of the previous chapter are also appropriate here. Interrupts are well covered in

Software Systems Principles by Peter Freeman (SRA, 1975);

see particularly Sections 2.4.4 on interrupt hardware and 5.4.1 on interrupt handlers and polling.

There is a decent discussion of interrupt structures in Sections 9.1.6 and 9.1.7 of

Systems Programming by Donnovan (McGraw Hill, 1972)

The Demos approach to buffer management is discussed in

"The Demos File System" by M. L. Powell, in the Proceedings of the 6th Symposium on Operating Systems Principles, published as Operating Systems Review, 11, 5 (ACM SIGOPS, Nov. 1977), pages 33-41.