22C:122/55:132, Notes, Lecture 24, Spring 2001

Douglas W. Jones
University of Iowa Department of Computer Science

The Pipelined Fetch-Execute Cycle
In the late 1960's, IBM introduced the 360 model 91; this was the first commercial offering of a pipelined computer, if one ignores the fact that certain aspects of the CDC 6600 computer could be viewed, in retrospect, as pipelined.
The IBM System 360 family of computers had a common architecture, that is, a common instruction set, as seen by the users, with a wide variety of implementations. Vertical microcode (with large numbers of short microinstructions per instruction) and horizontal microcode (with a small number of long microinstructions per instruction) were each used in some members of this family.
It is worth noting that the System 360 family is alive and well today. The number 360 was coined by marketing to suggest a 3rd generation machine for the 1960's, so, of course, when the 1960's ended, marketing considerations demanded a new number, and the 370 family was born. In the 1980's, a new and somewhat scrambled numbering system obscured what was going on, but in the 1990's, the 390 family carried on the tradition. Today's representatives of this family remain object-code compatible with the original for user programs, despite immense changes in implementation technology, I/O architecture, and operating system environment.
The IBM System 360 family is characterized by a 32-bit word, and 16 general purpose registers. The basic instruction format of the IBM System 360 was and is:
```
	Full-Word Instructions:

                8           4       4       4              12
         _______________ _______ _______ _______ _______________________
        |_______________|_______|_______|_______|_______________________|
        |                               |                               |
                OP          R       X       B             DISP

	  Typical examples:

             L  (load)          GPR[R] = M[ GPR[X] + GPR[B] + DISP ]
             LA (load address)  GPR[R] =    GPR[X] + GPR[B] + DISP
             ST (store)                  M[ GPR[X] + GPR[B] + DISP ] = GPR[R]
             A  (add)           GPR[R] = M[ GPR[X] + GPR[B] + DISP ] + GPR[R]
             BAL(branch and link)   T = GPR[X] + GPR[B] + DISP; GPR[R] = PC; PC = T


	Half-Word Instructions:

                8           4       4
         _______________ _______ _______
        |_______________|_______|_______|
        |                               |
                OP          R       X

	  Typical examples:

             BALR (branch and link) T = GPR[X]; GPR[R] = PC; PC = T
	
```
There are hundreds of texts on programming this family of machines (the assembly language was BAL, the Basic Assembly Language, so many texts are catalogued under that name and not under IBM 360). The above list of instructions is not meant to be anything like exhaustive! Rather, it illustrates the kinds of operations the underlying hardware must execute in each instruction execution cycle.
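To make the semantics table concrete, here is a small, non-pipelined Python sketch that splits instruction words into the fields shown in the format diagrams and executes the six example instructions. The opcode values are the real System/360 ones; everything else (the function names `decode_full`, `decode_half`, `execute`, and a memory modeled as a dict of 32-bit words keyed by byte address) is a hypothetical choice for illustration. The sketch ignores details of the real hardware such as 24-bit addressing, condition codes, and register 0 meaning "no register" in the X and B fields. The caller passes `pc` as the address of the instruction after the current one, which is the return address BAL and BALR save.

```python
# Non-pipelined sketch of the example instructions from the semantics table.
# Assumptions: memory is a dict {byte address: 32-bit word}; pc is the address
# of the following instruction; 24-bit addressing, condition codes and the
# "register 0 means no register" rule of the real 360 are ignored.

OPCODES = {0x58: "L", 0x41: "LA", 0x50: "ST", 0x5A: "A", 0x45: "BAL", 0x05: "BALR"}
MASK32 = 0xFFFFFFFF

def decode_full(word):
    """Split a 32-bit full-word instruction into OP, R, X, B, DISP."""
    return dict(op=word >> 24, r=(word >> 20) & 0xF, x=(word >> 16) & 0xF,
                b=(word >> 12) & 0xF, disp=word & 0xFFF)

def decode_half(half):
    """Split a 16-bit half-word instruction into OP, R, X."""
    return dict(op=half >> 8, r=(half >> 4) & 0xF, x=half & 0xF)

def execute(f, gpr, mem, pc):
    """Execute one decoded instruction and return the address of the next one."""
    name = OPCODES[f["op"]]
    if name == "BALR":
        target = gpr[f["x"]]           # read GPR[X] before GPR[R] may change
        gpr[f["r"]] = pc               # save the return address
        return target
    ea = (gpr[f["x"]] + gpr[f["b"]] + f["disp"]) & MASK32
    if name == "L":
        gpr[f["r"]] = mem[ea]
    elif name == "LA":
        gpr[f["r"]] = ea
    elif name == "ST":
        mem[ea] = gpr[f["r"]]
    elif name == "A":
        gpr[f["r"]] = (mem[ea] + gpr[f["r"]]) & MASK32
    elif name == "BAL":
        gpr[f["r"]] = pc               # save the return address...
        return ea                      # ...then branch to the effective address
    return pc
```

For example, with `gpr[2] = 0x100` and `gpr[3] = 0x20`, the word `0x58123008` (assembler `L 1,8(2,3)`) loads register 1 from address 0x128. Note that the branch target is computed before the return address is written, so the result is still right when R names the same register as X or B.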
During any cycle, the machine may fetch a 16-bit halfword instruction or a 32-bit full-word instruction. In executing this instruction, it may perform any of the following operations:
- Gather operands from registers and instruction word fields.
- Compute an effective address.
- Reference memory to load or store an operand.
- Perform an arithmetic operation.
- Store some result in a general purpose register.
Not all instructions will involve all of these operations, but note that this list includes several time-consuming ones. Gathering operands from registers will be fairly fast, since it involves accessing a very small and very fast RAM. If we ignore floating point operations and multiplication and division, the addition required to form an effective address is very likely to be just as time consuming as the ALU operations required for instructions such as add and subtract.
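As a quick check on the claim that not every instruction needs every operation, the following Python sketch records, for each example instruction, which of the five operations listed above it uses. The table is one reading of the semantics table, not something stated in the notes; the names `OPS`, `USES` and `skipped` are hypothetical.

```python
# One reading of the semantics table: which of the five listed operations
# each example instruction needs. Not a table taken from the notes.
OPS = ("gather", "address", "memory", "arithmetic", "register write")

USES = {
    "L":    {"gather", "address", "memory", "register write"},
    "LA":   {"gather", "address", "register write"},
    "ST":   {"gather", "address", "memory"},
    "A":    {"gather", "address", "memory", "arithmetic", "register write"},
    "BAL":  {"gather", "address", "register write"},
    "BALR": {"gather", "register write"},
}

def skipped(name):
    """Operations, in list order, that instruction `name` does not use."""
    return [op for op in OPS if op not in USES[name]]

for name in USES:
    print(f"{name:5} skips: {', '.join(skipped(name)) or 'nothing'}")
```

Only A uses all five; a pipeline built from these operations will therefore have idle stages for most instructions, which is the price of a uniform design.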
The memory reference for operands is certain to take as long as an instruction fetch, and it is quite possible to imagine a system where the time to perform arithmetic operations is comparable to the time taken to access memory. This leads to the suggestion that this architecture could be pipelined with the following pipeline stages:
- Fetch
- Gather operands and compute an effective address.
- Reference memory to load or store an operand.
- Perform an arithmetic operation and save result.
(It is important to note that this set of pipeline stages was derived from the instruction set of one computer, under one set of assumptions about the relative time taken for various operations. Other instruction sets and other assumptions about relative times lead naturally to other sets of operations!)
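A minimal Python sketch of why this breakdown pays off: it shows which instruction occupies each of the four proposed stages on each clock, assuming one stage per clock, that every instruction passes through every stage, and no stalls from hazards or branches. The stage labels and the functions `schedule` and `speedup` are hypothetical names for illustration, and any stage times passed to `speedup` are made-up values, not measurements.

```python
# Sketch: instruction flow through the four proposed stages, assuming one
# stage per clock, every instruction visiting every stage, and no stalls.
STAGES = ("Fetch", "Gather/EA", "Memory", "ALU/Save")

def schedule(n):
    """For each clock cycle, map stage name -> index of the instruction in it."""
    cycles = []
    for t in range(n + len(STAGES) - 1):
        row = {}
        for s, stage in enumerate(STAGES):
            i = t - s              # instruction i occupies stage s during cycle i + s
            if 0 <= i < n:
                row[stage] = i
        cycles.append(row)
    return cycles

def speedup(stage_times, n):
    """Unpipelined time divided by pipelined time for n instructions.
    The pipeline clock must be long enough for the slowest stage."""
    unpipelined = n * sum(stage_times)
    pipelined = (n + len(stage_times) - 1) * max(stage_times)
    return unpipelined / pipelined

for t, row in enumerate(schedule(5)):
    print(t, "  ".join(f"{st}:I{row[st]}" if st in row else f"{st}:--" for st in STAGES))
```

Five instructions finish in 8 cycles instead of 20. With four equal stage times, `speedup([1, 1, 1, 1], 100)` is 400/103, close to the ideal factor of 4; if one stage takes twice as long as the others, the clock must stretch to fit it and the gain shrinks, which is why the stages were chosen to have comparable delays.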
The Interstage Registers
Given the above breakdown of an instruction set into pipeline stages, the first question that must be resolved is: What goes in the interstage registers?
- Fetch
```
	 _______   _______
	|_______| |_______|
	   IR      NEXT PC
	
```
- Gather operands and compute an effective address.
The first stage fetches an instruction, so the interstage register that it feeds must contain this instruction word. For the IBM 360 architecture and many others, there are instructions that require knowledge of the address of the instruction following the instruction that was just fetched. In the case of the IBM 360, the BAL and BALR instructions need this so they can save a return address. Other computers need the same information for PC relative branches.
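The fetch stage and its interstage register can be sketched in Python as follows. The assumptions: memory is a big-endian `bytes` object addressed from 0, and the instruction length is taken from the two high-order opcode bits, as on the real System/360 (00 means one halfword, 01 or 10 two halfwords, 11 three halfwords for the storage-to-storage formats these notes leave out). The names `FetchLatch` and `fetch` are hypothetical.

```python
# Sketch of the fetch stage filling the IR / NEXT PC interstage register.
# Assumptions: big-endian byte-addressed memory held in a bytes object, and
# the System/360 rule that the two high-order opcode bits give the length.
from dataclasses import dataclass

LENGTH_BY_HIGH_BITS = {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}   # in bytes

@dataclass
class FetchLatch:
    ir: int        # the instruction just fetched
    next_pc: int   # address of the following instruction (BAL/BALR return address)

def fetch(memory: bytes, pc: int) -> FetchLatch:
    length = LENGTH_BY_HIGH_BITS[memory[pc] >> 6]
    ir = int.from_bytes(memory[pc:pc + length], "big")
    return FetchLatch(ir, pc + length)

program = bytes([0x05, 0xEF,                # BALR 14,15 (half word)
                 0x58, 0x12, 0x30, 0x08])   # L 1,8(2,3) (full word)
first = fetch(program, 0)
second = fetch(program, first.next_pc)
print(hex(first.ir), first.next_pc)     # 0x5ef 2
print(hex(second.ir), second.next_pc)   # 0x58123008 6
```

Because the length is known from the first byte, NEXT PC is available as soon as the opcode arrives, so the fetch stage can hand both values to the next stage in the same cycle.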
- Gather operands and compute an effective address.
```
	 _______   _______   _______
	|___////| |_______| |_______|
	   IR         R        EA
	
```
- Reference memory to load or store an operand.
The gather operands stage collects the operands of the instruction from the general registers and from the NEXT PC interstage register, and feeds the addressing adder, which produces an effective address. Exactly which operands are loaded into the R and EA interstage registers is determined by the opcode.
Certain fields of the instruction register are no longer needed at this point. The displacement and base register B have been used, and for all of the example instructions, there is no further need for the index register X.
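One way to fill the IR, R and EA interstage registers is sketched below in Python. The opcode values for BAL and BALR are the real System/360 ones; the latch layout (`GatherLatch`), the function `gather`, and the `half` flag telling it whether IR holds a half word are hypothetical. Which value travels in R is one plausible choice, not something the notes specify: branch-and-link instructions carry NEXT PC forward so it can be saved later, and every other instruction carries GPR[R].

```python
# Sketch of the gather-operands / effective-address stage: reads the IR and
# NEXT PC latch, fills a hypothetical (OP, R field, R, EA) latch.
from dataclasses import dataclass

BAL, BALR = 0x45, 0x05      # real System/360 opcodes
MASK32 = 0xFFFFFFFF

@dataclass
class GatherLatch:
    op: int        # opcode, still needed by later stages
    r_field: int   # register number R, still needed to save the result
    r: int         # operand value carried in the R interstage register
    ea: int        # effective address (the branch target for BAL/BALR)

def gather(ir, half, next_pc, gpr):
    if half:                                    # OP R X
        op, r_field, x = ir >> 8, (ir >> 4) & 0xF, ir & 0xF
        ea = gpr[x]
    else:                                       # OP R X B DISP
        op, r_field = ir >> 24, (ir >> 20) & 0xF
        x, b, disp = (ir >> 16) & 0xF, (ir >> 12) & 0xF, ir & 0xFFF
        ea = (gpr[x] + gpr[b] + disp) & MASK32  # X, B and DISP are now used up
    # Branch-and-link carries the return address; others carry GPR[R].
    r = next_pc if op in (BAL, BALR) else gpr[r_field]
    return GatherLatch(op, r_field, r, ea)
```

After this stage only the opcode and the R field of the instruction remain useful, which matches the partially shaded IR in the diagram above.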
- Reference memory to load or store an operand.
```
	 _______   _______   _______
	|___////| |_______| |_______|
	   IR         R        OP
	
```
- Perform an arithmetic operation and save result.
The memory reference may store a register to memory, or it may load an operand from memory. In the former case, most of the work of the instruction has been completed and the ALU stage is only needed for cleanup. In the latter case, the ALU will be used to combine the register operand R with the memory operand OP (not to be confused with the opcode field OP of the instruction format) to compute some result. The opcode field is still needed, as is the register number, so the result can be saved.
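The last two stages can be sketched in Python as below. The opcode values are the real System/360 ones; the functions `memory_stage` and `alu_stage`, the tuple used as the interstage register, and the dict used for memory are hypothetical. Since the register between these stages holds only IR, R and OP, this sketch assumes that LA, BAL and BALR, which make no memory reference, pass their effective address along in the OP slot.

```python
# Sketch of the memory-reference and ALU/save stages. The latch between them
# is a hypothetical (op, r_field, r, operand) tuple; memory is a dict.
L, LA, ST, A, BAL, BALR = 0x58, 0x41, 0x50, 0x5A, 0x45, 0x05
MASK32 = 0xFFFFFFFF

def memory_stage(op, r_field, r, ea, mem):
    """Store R or load the memory operand; return the latch for the ALU stage."""
    operand = 0
    if op == ST:
        mem[ea] = r                 # store: the real work is now done
    elif op in (L, A):
        operand = mem[ea]           # load the memory operand OP
    elif op in (LA, BAL, BALR):
        operand = ea                # no memory reference; pass EA along as OP
    return op, r_field, r, operand

def alu_stage(op, r_field, r, operand, gpr):
    """Compute and save the result; return a branch target, or None."""
    if op == A:
        gpr[r_field] = (r + operand) & MASK32   # combine R with OP
    elif op in (L, LA):
        gpr[r_field] = operand
    elif op in (BAL, BALR):
        gpr[r_field] = r            # save the return address
        return operand              # the branch target
    return None                     # ST: only cleanup remains
```

For an add, `alu_stage(*memory_stage(A, 1, 7, 0x128, mem), gpr)` leaves `gpr[1]` equal to 7 plus the word at address 0x128; for a store, the ALU stage does nothing but return.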