7. Bus Level Design

Part of the 22C:122/55:132 Lecture Notes for Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Introduction

Busses are not required in the design of a computer system. All digital systems that use busses can be built, in theory, using multiplexors and demultiplexors to serve the same function. Nonetheless, we use bus-based design in almost all modern systems, and it has been present, to some extent, even in the earliest digital computers.

Bus-oriented design has some important advantages:

Busses allow greater isolation between subsystems, so, for example, memory modules and peripherals can be plugged in or removed without requiring modifications to the CPU.
Busses distribute the logical multiplexor and demultiplexor functions over the set of attached subsystems. In effect, the cost of the multiplexor and demultiplexor is distributed over the attached subsystems.
Busses allow the physical distribution of components to match the logical design: the bus may be on the motherboard (small systems), the backplane (mid-sized systems), or interconnecting cables (large systems), with the subsystems plugged into the motherboard, backplane, or interconnecting cables.

Sidenote The term bus in electronics comes from the term bus-bar in power-plant design; this term may date back to Edison. In a classical power plant, with many dynamos powered by piston engines, a pair of copper bars was used to combine the outputs of the dynamos and transfer the output to the various power transmission lines leading from the plant. These bars, taken together, were the bus for the power plant. The topology of the entire system was quite similar to that of a computer system with a bus, except that the subsystems were dynamos and external transmission lines, and these first-generation busses had only two conductors.

Low-Level Issues

Physically, a typical bus consists of many wires. Some are ground or return wires, required to complete the electrical circuits of the underlying electronics, and some are signal lines.

Because the lines in a bus are typically long, they must be driven using more power than the short lines carrying logic signals within a subsystem. Furthermore, long lines are more likely than short ones to act as antennas, picking up interference from the outside world. Because of this, we generally use bus receivers, special circuits with better noise rejection than ordinary logic inputs. Aside from this distinction, the actual encoding of logical values on bus lines is typically similar to that used elsewhere in the digital system.

When a bus is long enough that the signal propagation time along it is significant compared to the switching time of the logic used in the system, echoes from the ends of the bus must be controlled. This is done using bus terminators. Typically, these are resistor networks at the ends of the bus, matched to the characteristic impedance of the bus lines, and connected to either ground, the logic supply voltage, or a carefully selected neutral intermediate voltage.

Bus lines can be divided into two categories:

Single-source lines. These are driven from a single source. Lines carrying the system clock signal, and sometimes, the address, write and read lines, are in this category.
The bus drivers for single-source lines are typically merely high-power versions of normal logic outputs.
Multiple-source lines. These are lines that may be driven by any of a number of sources, for example, the data lines that may carry data from the CPU to memory or from the memory to the CPU.
The bus drivers for multiple-source lines must be special, able to either deliver a logic signal to the line or to disconnect, allowing some other source to deliver a signal.

There are two kinds of drivers appropriate for a multiple-source bus line. Both are common in digital systems:

Open-collector (TTL) or open-drain (CMOS) drivers. These have two output values, low (logic zero) and disconnected. The bus terminator used with open-collector or open-drain devices is responsible for pulling the line up to a high level (logic one) when no driver is active. Because of this, the line is said to implement a wired-and function; the value on the line is one only if no drivers are sending a zero.
Open-collector or open-drain busses are limited in speed because the passive pull-up of the terminator can't drive a one onto the line as quickly as an active bus driver could. They are simpler, however, and are common on short low-performance busses.
Tri-state drivers. These have three output values, low (logic zero), disconnected, and high (logic one). The bus terminator used with tri-state devices may pull to any value; both high and intermediate values are common in practice.
Tri-state drivers are fast. A typical tri-state driver, with enable input a, data input b, and output c, behaves as follows:
```
	  a  b |  c         c |        
	 ------|------        |
	  0  0 | open         |
	  0  1 | open   a ---/_\
	  1  0 |  0           |
	  1  1 |  1         b |
```
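The difference between the two kinds of multiple-source drivers can be captured in a few lines of Python. This is a logic-level sketch, not an electrical model: a driver value of None stands for "disconnected", the function names are hypothetical, and tristate_driver simply encodes the truth table above, with a as the enable input and b as the data input.

```python
from typing import Optional, Sequence

# None means "disconnected" (high impedance); 0 and 1 are logic levels.
Drive = Optional[int]


def tristate_driver(a: int, b: int) -> Drive:
    """Truth table above: a is the enable input, b the data input."""
    return b if a else None


def open_drain_line(drives: Sequence[Drive]) -> int:
    """Resolve an open-collector or open-drain line (wired-AND).

    Each driver either pulls the line low (0) or is disconnected (None);
    the terminator pulls the line high when no driver is pulling it low.
    """
    if any(d not in (0, None) for d in drives):
        raise ValueError("open-drain drivers can only pull low or disconnect")
    return 0 if any(d == 0 for d in drives) else 1


def tristate_line(drives: Sequence[Drive], terminator: Drive = None) -> Drive:
    """Resolve a tri-state line; at most one driver may be enabled.

    With no driver enabled, the line settles at the terminator's level;
    None models an intermediate (non-logic) terminator voltage.
    """
    active = [d for d in drives if d is not None]
    if len(active) > 1:
        raise RuntimeError("bus contention: several tri-state drivers enabled")
    return active[0] if active else terminator
```

Note that the open-drain resolver never has to reject a combination of drivers, since any number of them may pull low at once, while the tri-state resolver must treat two enabled drivers as contention.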

Bus Masters

Components that attach to a bus may be divided roughly into two groups: bus masters and bus slaves. The bus master or bus masters attached to some particular bus are the ones that control the transfer of data on that bus, while the slaves react, responding to instructions from the masters.

If we use the CPU-to-memory bus of a computer system as an example, the CPU is a bus master, while each memory module is a slave. The CPU initiates all bus transactions in this system, while the memory modules passively do what the CPU instructs.

Generally, bus-based systems may be divided into two groups depending on the number of bus masters:

Those with a single bus master. In such a system, one subsystem is designated as the bus master; this subsystem controls what device has access to the bus at what time. The master initiates all data transfers, while the slaves operate only in response to the master.
Typically, the master is the only device able to drive the single-source bus lines, while the slaves all use drivers compatible with multiple-source lines. For example, the master may drive the address lines while each slave listens for its address and reads or writes data only when the address lines select it.
Those with multiple masters. In such a system, many subsystems are able to act as a master. Each potential master may initiate a bus cycle involving a transfer between that master and some other subsystem.
Multimaster busses require some way to resolve bus contention when several masters wish to use the bus at the same time. Contention resolution is a hard problem; in fact, there is no way to resolve contention in a fixed, finite time without the use of a centralized arbiter.
Most multimaster busses are not completely asynchronous; instead, they have a central clock that delivers a single synchronized stream of clock pulses to all devices on the bus. The master clock is sometimes packaged, together with a small amount of contention resolution logic and the bus terminators, in a bus controller subsystem.
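A centralized arbiter of the kind mentioned above can be sketched as follows. The notes do not say what policy the arbiter uses; this sketch assumes a simple fixed-priority policy (the lowest-numbered requesting master wins) and one grant per bus clock. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def fixed_priority_arbiter(requests: Sequence[bool]) -> Optional[int]:
    """Grant the bus to the lowest-numbered requesting master, or None."""
    for master, requesting in enumerate(requests):
        if requesting:
            return master
    return None


def simulate(transfers_wanted: List[int]) -> List[int]:
    """Run the arbiter once per bus clock until every master is done.

    transfers_wanted[i] is how many bus cycles master i still needs;
    returns the index of the master granted the bus on each clock.
    """
    pending = list(transfers_wanted)   # copy so the caller's list is unchanged
    grants: List[int] = []
    while any(n > 0 for n in pending):
        # All request lines are sampled together on one clock edge,
        # and exactly one grant is issued per clock.
        winner = fixed_priority_arbiter([n > 0 for n in pending])
        assert winner is not None
        grants.append(winner)
        pending[winner] -= 1           # the winner performs one transfer
    return grants
```

For example, simulate([0, 2, 1]) returns [1, 1, 2]: master 2 waits until master 1 is finished. Under heavy traffic from high-priority masters, a fixed-priority arbiter can starve low-priority ones, which is why round-robin policies are also used.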

In most desktop and larger computer systems, multimaster busses are quite common. Looking at the CPU-to-memory bus, for example, the CPU and the DMA controller are both bus masters, able to independently initiate memory read or write operations.

Multimaster busses are complex, so the busses in smaller computer systems are almost all single-master designs.

A Typical Single Master Bus

A typical single master bus might be designed as follows:

	 ----------
	|          |----------------------- Data  (tristate)
	|          |----------------------- Lines
	|          |
	|          |-\--------------------- Address (single source)
	|   BUS    |-/--------------------- Lines
	|  MASTER  |
	|          |->--------------------- Direction (read or write)
	|          |                        (single source)
	|          |->--------------------- Strobe or clock
	|          |
	 ----------

From the master's perspective, a bus read cycle involves first placing the desired address on the bus and setting the direction line to indicate a read cycle, and then putting a pulse out on the strobe or clock line. The slave will try to put data on the data lines for the entire duration of this pulse, and typically, the master will clock its flipflops to consume this data at the end of the strobe pulse.

A write cycle on this bus is similar. The master begins by setting the desired address and setting the direction signal to indicate a write cycle, and then forces the desired data onto the data lines. Once this data is stable, the master outputs a strobe or clock pulse to indicate that the addressed slave should consume the data.
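The ordering of the read and write cycles just described can be sketched in Python. This is a behavioral model, not a timing model: bus_cycle, make_memory_slave, and the READ/WRITE constants are hypothetical names, and the encoding direction = 1 for a write is an assumption chosen to match the timing diagram in these notes, where the direction line is high during the write cycle.

```python
from typing import Callable, Dict, List, Optional

READ, WRITE = 0, 1  # assumed encoding: direction high (1) means write

# A slave sees the address, direction and data lines during the strobe pulse
# and returns the value it drives onto the data lines, or None if it is not
# driving them (its tri-state drivers are disconnected).
Slave = Callable[[int, int, Optional[int]], Optional[int]]


def bus_cycle(slaves: List[Slave], address: int, direction: int,
              data: Optional[int] = None) -> Optional[int]:
    """One bus cycle as seen by the master; returns the latched data on a read."""
    # 1. Before the strobe: the master drives address and direction, and for
    #    a write also the data lines (on a read it leaves them undriven).
    data_lines = data if direction == WRITE else None
    # 2. Strobe pulse: every slave sees the lines; only the addressed one acts.
    driven: Optional[int] = None
    for slave in slaves:
        value = slave(address, direction, data_lines)
        if value is not None:
            if driven is not None:
                raise RuntimeError("two slaves driving the data lines")
            driven = value
    # 3. End of strobe: on a read the master latches the data lines
    #    (None means nobody drove them, so real hardware would latch garbage).
    return driven if direction == READ else None


def make_memory_slave(base: int, size: int) -> Slave:
    """Hypothetical memory slave occupying addresses base .. base+size-1."""
    cells: Dict[int, Optional[int]] = {}  # unwritten cells read as 0 here

    def slave(address: int, direction: int,
              data: Optional[int]) -> Optional[int]:
        if not base <= address < base + size:
            return None                      # not selected: stay off the bus
        if direction == WRITE:
            cells[address - base] = data     # consume the data at the strobe
            return None
        return cells.get(address - base, 0)  # drive data during the strobe

    return slave
```

For example, with mem = make_memory_slave(0x100, 16), the write cycle bus_cycle([mem], 0x104, WRITE, 42) followed by the read cycle bus_cycle([mem], 0x104, READ) returns 42.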

The following timing diagram illustrates a read cycle and a write cycle:

	                read cycle       write cycle
                     (data from slave) (data from master)
	          |         ____            ________
	     data |---------____------------________----
                  |        :   :             :   :
	          |     __________        __________
	  address |-----__________--------__________----
                  |        :   :             :   :
                  |                    _________________
        direction |___________________|
                  |        :   :             :   :
                  |         ___               ___
           strobe |________|   |_____________|   |______
                  |            
                 -|--------------------------------------
                  |

In the above, the following notation is used:

	            
	          |                  ________________ 
	          |------------------________________---
	          | invalid or no    valid (each line
	          | defined value    in this group is
	          |                  either 0 or 1)

During a read cycle, we speak of the delay between the time the address becomes valid and the time the addressed device puts valid data on the bus; this interval is the read-time of the device. The receiver of the data being read looks at the data only during the strobe pulse, so if the master allows too little time between address valid and the strobe, the device will still be putting invalid data on the bus during the strobe pulse. During a write cycle, we talk about the delay between data valid and the strobe pulse; this is the setup-time of the device. For a read to succeed, the total delay from address valid until the data can be latched, typically the read-time of the slave plus the setup-time of the receiving flipflops, must not exceed the time the master allows from address valid to the significant edge of the strobe pulse.
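The timing constraint in the last sentence is simple arithmetic, sketched here with hypothetical function names. All times are in nanoseconds, and the example values below are made up for illustration, not taken from any real device.

```python
def read_slack_ns(read_time_ns: float, setup_time_ns: float,
                  address_to_edge_ns: float) -> float:
    """Margin left in a read cycle (negative means the cycle fails).

    read_time_ns:       slave's delay from address valid to data valid
    setup_time_ns:      how long the data must be stable before the
                        receiving flipflops are clocked
    address_to_edge_ns: time the master allows from address valid to the
                        significant (latching) edge of the strobe pulse
    """
    return address_to_edge_ns - (read_time_ns + setup_time_ns)


def read_cycle_ok(read_time_ns: float, setup_time_ns: float,
                  address_to_edge_ns: float) -> bool:
    """True if read-time plus setup-time fits in the allowed interval."""
    return read_slack_ns(read_time_ns, setup_time_ns, address_to_edge_ns) >= 0


def write_cycle_ok(data_to_edge_ns: float, setup_time_ns: float) -> bool:
    """On a write, the master's data must be valid at least one setup-time
    before the strobe edge that makes the slave consume it."""
    return data_to_edge_ns >= setup_time_ns
```

For instance, a slave with a 120 ns read-time, read by flipflops needing 20 ns of setup-time, works with a master that allows 150 ns from address valid to the latching edge (10 ns of slack), but a slave with a 140 ns read-time does not.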

Bus Slave Design

Consider a very simple component, a one-word register R that can be read or written at address X. This can be realized as follows in terms of the bus master outlined above:

	--------------------------------------------- Data
	-------------------------------   ----   --- Lines
	                               | |    | |
	-\-----------------------------| |----| |--- Address
	-/-----   ---------------------| |----| |--- Lines
	       | |                     | |    | |
	->-----| |--------o------------| |----| |------ Direction
	       | |        |            | |    | |
	->-----| |-----o--|------------| |----| |------ Strobe or clock
	     __\_/__   |  |            | |    | |
	    |  =X   |  |  |   ___      | |    / \
	    |_______|  |  o--|   |   __\_/__  | |
	        |      o--|--|AND|--|<-_  R | | |
	        o------|--|--|___|  |_______| | |
	        |      |  |   ___      | |    | |
	        |      |   -O|   |     | |    / \
	        |       -----|AND|-----| |---/___\
	         ------------|___|     | |____| |
                                       |________|
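The register slave's control logic reduces to two AND terms, one per gate in the figure: R is clocked when the comparator output, direction, and strobe are all true, and R's tri-state buffer is enabled when the comparator output, inverted direction, and strobe are all true. The sketch assumes, as in the timing diagram, that direction = 1 means write; BusRegister, its step method, and the 16-bit default width are hypothetical. Each call stands for one sample of the bus lines while they are stable; the model does not say which strobe edge actually clocks R.

```python
from typing import Optional


class BusRegister:
    """Behavioral sketch of the one-word register R at bus address X."""

    def __init__(self, x: int, width: int = 16) -> None:
        self.x = x                       # address matched by the "=X" comparator
        self.mask = (1 << width) - 1
        self.r = 0                       # contents of register R

    def step(self, address: int, direction: int, strobe: int,
             data_in: Optional[int] = None) -> Optional[int]:
        """Apply one set of bus line values; return what R drives onto
        the data lines, or None when its tri-state buffer is disconnected."""
        selected = 1 if address == self.x else 0           # "=X" output
        write_clock = selected & direction & strobe        # upper AND gate
        read_enable = selected & (1 - direction) & strobe  # lower AND gate
        if write_clock:
            if data_in is None:
                raise ValueError("write cycle with undriven data lines")
            self.r = data_in & self.mask                   # R loads the data
        return self.r if read_enable else None
```

A write to address X followed by a read from address X returns the written word; any other address, or a read without the strobe, leaves the buffer disconnected.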

The above example can be trivially converted into several different useful devices. For example, if we change only the bottom part of the figure, we get a very rudimentary parallel output port:

                    strobe               
             address   | direction   data in and out
	     __\_/__   |  |            | |    | |
	    |  =X   |  |  |   ___      | |    / \
	    |_______|  |  o--|   |   __\_/__  | |
	        |      o--|--|AND|--|<-_  R | | |
	        o------|--|--|___|  |_______| | |
	        |      |  |   ___      | |    | |
	        |      |   -O|   |     | |    / \
	        |       -----|AND|-----| |---/___\
	         ------------|___|     | |____| |
                                       |  ______|
                                       | |
                                     external
                                     connector

This parallel port allows the contents of the register R to be inspected by some device in the outside world, and it also allows the host system to read the register back. This readback is useful in hardware diagnostics, to verify that the register works or to detect outputs that have been short-circuited to power or ground. With minor added hardware (series resistors between the register outputs and the inputs of the read buffer, so that an external device driving the connector can override the register), it also allows the port to be used for input as well as output. The IBM PC parallel port expands on this with additional control and status bits: a second register controls output-only bits that are used, for example, to tell the external device when data is ready, and there are input-only bits that allow the external device to indicate when it is ready.
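The diagnostic use of the readback path can be illustrated with a walking-ones and walking-zeros test, a standard technique swapped in here for illustration; the notes do not prescribe a particular test. The port model is hypothetical: writing loads R, and reading returns the levels at the connector pins, which follow R except on pins shorted to ground or power.

```python
from typing import Dict, List, Optional


class ParallelPortModel:
    """Hypothetical model of the output port with readback.

    write() loads register R; read() returns the levels at the connector
    pins, which follow R except on pins shorted to ground (0) or power (1).
    """

    def __init__(self, width: int = 8,
                 shorts: Optional[Dict[int, int]] = None) -> None:
        self.width = width
        self.r = 0
        self.shorts = shorts or {}     # bit position -> level it is stuck at

    def write(self, value: int) -> None:
        self.r = value & ((1 << self.width) - 1)

    def read(self) -> int:
        value = self.r
        for bit, level in self.shorts.items():
            if level:
                value |= 1 << bit      # pin shorted to power reads as 1
            else:
                value &= ~(1 << bit)   # pin shorted to ground reads as 0
        return value


def find_faulty_bits(port: ParallelPortModel) -> List[int]:
    """Walking-ones and walking-zeros readback test.

    Each bit is written both as a lone 1 and as a lone 0; any bit whose
    readback ever differs from what was written is reported.
    """
    all_ones = (1 << port.width) - 1
    mismatches = 0
    for bit in range(port.width):
        for pattern in (1 << bit, all_ones & ~(1 << bit)):
            port.write(pattern)
            mismatches |= port.read() ^ pattern
    return [b for b in range(port.width) if (mismatches >> b) & 1]
```

Writing each bit both ways matters: a pin shorted to ground is only caught when a 1 is written to it, and a pin shorted to power only when a 0 is written.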

As another example, consider a memory module constructed as follows:

                    strobe               
             address   | direction   data in and out
	    ___|_|___  |  |            | |    | |
	   || |   | || |  |            | |    | |
	    | |   | |  |  |          __\_/__  | |
	    | |   |  --|--|--------\|       | | |
	  __\_/__  ----|--|--------/|addr   | | |
	 |  =X   |     |  |   ___   |  MEM  | / \
	 |_______|     |  o--|   |  |       | | |
	     |         o--|--|AND|--|<-_-   | | |
	     o---------|--|--|___|  |_______| | |
	     |         |  |   ___      | |    | |
	     |         |   -O|   |     | |    / \
	     |          -----|AND|-----| |---/___\
	      ---------------|___|     | |____| |
                                       |________|

Here, we have broken the address into two parts, using the most significant bits of the address for device selection and using the least significant bits as the memory address input to the memory module itself.
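The address split can be written with two bit operations. The sketch assumes a hypothetical 16-bit bus and a module of 4096 words, so the low 12 bits go to the memory and the high 4 bits are compared with X; decode, module_range, and their parameter names are illustrative.

```python
from typing import Optional


def decode(address: int, x: int, offset_bits: int) -> Optional[int]:
    """Return the address to present to the memory module, or None when
    the module is not selected.

    x:           module select value (what the "=X" comparator matches)
    offset_bits: number of low-order address bits wired to the memory
    """
    high = address >> offset_bits                 # most significant bits
    low = address & ((1 << offset_bits) - 1)      # least significant bits
    return low if high == x else None


def module_range(x: int, offset_bits: int) -> range:
    """Bus addresses to which a module with select value x responds."""
    base = x << offset_bits
    return range(base, base + (1 << offset_bits))
```

With X = 3 and 12 offset bits, the module responds to addresses 0x3000 through 0x3FFF, and decode(0x3ABC, 3, 12) returns 0xABC.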

This is very close to the design used for small static RAM modules sold in the mid-1970s. DRAM systems are more complex because of the need for refresh logic. Typical 1970s and even early-1980s memory subsystems used jumpers or miniature switches to set the address X to which the memory module responds.

Today's SIMM modules have added complexity because they have few address pins: a 72-pin SIMM with a capacity of 16 or 32 megabytes (4 to 8 million 32-bit words) may have, for example, only 11 address input pins. The internal memory address register on the SIMM is 22 bits long, divided into two 11-bit sub-registers (referred to as page and word in page, or as row and column select registers). A memory cycle therefore requires that the address be strobed into the chips in two steps before the data can be inspected or stored. But note! 22 bits only allow addressing 4 megawords. The 8-megaword version of this SIMM has separate strobes for the two banks on the one module. The decision to clock data into the SIMM is made by an external memory controller subsystem.
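The address arithmetic for the 8-megaword SIMM can be sketched as follows. Which word-address bit selects the bank is an assumption here (the top bit), as is placing the row in the upper 11 bits of the 22-bit internal address; a real memory controller might wire these differently. The function names are hypothetical. In DRAM terminology the two strobes are usually called RAS (row address strobe) and CAS (column address strobe).

```python
from typing import List, Tuple

ROW_BITS = 11                      # row (page) sub-register width
COL_BITS = 11                      # column (word in page) sub-register width
BANK_SHIFT = ROW_BITS + COL_BITS   # 22: bits at and above this select the bank


def simm_address(word_address: int) -> Tuple[int, int, int]:
    """Split a 23-bit word address into (bank, row, column).

    Assumed layout (hypothetical): bit 22 = bank, bits 21..11 = row,
    bits 10..0 = column.
    """
    if not 0 <= word_address < 2 << BANK_SHIFT:   # 2 banks x 4 megawords
        raise ValueError("address outside 8 megawords")
    bank = word_address >> BANK_SHIFT
    row = (word_address >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = word_address & ((1 << COL_BITS) - 1)
    return bank, row, col


def address_transfers(word_address: int) -> List[Tuple[str, int, int]]:
    """The two transfers over the 11 shared address pins, in order:
    (which strobe to pulse, bank whose strobe it is, value on the pins)."""
    bank, row, col = simm_address(word_address)
    return [("row strobe", bank, row), ("column strobe", bank, col)]
```

Only 11 pins carry the address in each step, which is exactly why the controller must spend two strobes per memory cycle.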