4. The Fetch Execute Cycle

Part of 22C:60, Computer Organization Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Overview

The Preliminary discussion of the logical design of an electronic computing instrument by Burks Goldstine and Von Neumann, written in in 1946, outlines the design of the EDVAC computer built at the Moore School of Electrical Engineering. The basic structure of this computer followed the outline given at the start of Chapter 3:

The Von Neumann model of a computer
     
Central Processing Unit
CPU
 
Registers
 
 
              Interface(s) or Bus(ses)              
 
Memory
 
Input
Output
     

Within this outline, EDVAC and other early computers varied considerably. Some had parallel CPUs in which all the data in any particular memory word were operated on at the same time. Others had serial architectures in which the bits of a word were processed sequentially. Some used genuinely random-access memory technologies, while others simulated random-access behavior using magnetic drums or delay lines. Some used binary arithmetic, Others used decimal arithmetic, using from 4 to 6 bits to encode each digit.

All of these early machines shared a very important property with modern computers. The basic idea of the central processing unit has not changed. At heart, almost all of the central processors designed from 1946 onward can be described in the following terms:

The above description is essentially an algorithm, so we can describe the central processor in algorithmic terms using programming notation. Here is the algorithm:

The fetch-execute cycle
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = memory[ program_counter ];
        program_counter = program_counter + 1;

        /* execute that instruction */
        execute( instruction_register );
}

While the basic idea of the fetch-execute cycle has not changed since the dawn of electronic computing, the details have changed immensely. Early machines could only execute a few thousand instructions per second, but by the late 1960's there were machines able to execute a million instructions per second, and today we speak casually of machines that may execute billions of instructions per second.

The above algorithm increments the program counter by one for each instruction. In fact, from the start, many computers have had instructions smaller or larger than one addressable unit of memory. By 1960, there were computers with variable-sized instructions, where the size of the instruction depended on its function.

The set of registers used inside the central processor has also varied considerably between different computers. In addition to the program counter and instruction register, many early computers had only one other register, the accumulator. Instructons did things like load the accumulator from memory, store its contents to memory, or add to it. In contrast, many modern central processors have several tens of registers in their central processors.

The Hawk CPU

The Hawk borrows from a number of computers designed from 1946 to the present. The most important of these is the IBM System 360 first sold in 1965; IBM's family of enterprose servers are descended from this. Another important source is the Digital Equipment Corporation PDP-11, a 16-bit machine from which the Motorola 68000 and the Intel 8080 and 8086 borrowed many ideas; the 8086 is the ancestor of the Pentium family.

As with the IBM System 360, The Hawk has 15 general purpose registers of 32 bits each, numbered from 1 to 1510 or 1 to F16. Each of these can be used as accumulators to hold intermediate results during computation or as index registers to hold memory addresses. In assembly language programs, these are named R1 to R15 or RF. Early computers had only a single accumulator within the CPU. The PDP-11 and Motorola 68000 had 7 general purpose registers. Several modern computers have 32 or more general purpose registers.

In addition to the general purpose registers, the Hawk has a program counter and a special 32-bit register called the processor status word that contains a variety of special fields. The most important processor status word field holds the condition codes named N, Z, V and C. These are so important that the Hawk emulator breaks them out to be displayed separately.

Whenever you run a Hawk debugger or emulator, the internal state of the processor will be displayed as follows:

The Hawk CPU state
PC:  00000000                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00000000  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 00000000  RC: 00000000
               R5: 00000000  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In this display of the Hawk processor state, the program counter and processor status word are displayed on the left, with the condition code bits broken out on the third line. The right two columns show the 15 registers, each with its name. In the Hawk debugger, the contents of all of these registers are always shown in hexadecimal, with the exception of the condition codes, which are shown in binary.

Why didn't the display shown above include the instruction register? It is there, inside the central processor, but its contents are only transient, and a typical debugger or emulator will provide a more convenient way of examining the program being executed, either by displaying the contents of memory, or better yet, by displaying a symbolic representation of the code.

Newcomers to machine-language and assembly-language programming should not seek in-depth information on the internals of the processor until they have seen an overview of a few of the simpler instructions. Nonetheless, additional information is always available in the reference manuals for the processor. In our case, Chapter 1 of the Hawk manual contains a section on the central processor that gives more detail on the registers, and it includes a section on the instruction execution cycle giving specifics of this cycle for the Hawk processor.

The Hawk Fetch-Execute Cycle

The fetch-execute cycle of the Hawk differs from the generic fetch-execute cycle because Hawk instructions are each made of one or two halfwords. Therefore, instruction addresses are always even, and the program counter is always incremented by either two or four after each fetch. If we ignore instructions longer than one halfword, we can describe the fetch-execute cycle of the Hawk as follows:

The fetch-execute cycle
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* execute that instruction */
        execute( instruction_register );
}

To understand how the central processor of any computer works, we must examine the instruciton register in more detail. In the case of the Hawk, the instruction register is arranged as follows:

Organization of the Hawk instruction register
                             
07 06 05 04 03 02 01 00
opcode dst
15 14 13 12 11 10 09 08
 
                             

First and foremost, the instruction register is best viewed as two consecutive bytes of one halfword. The bit numbers shown above are the bit numbers in the halfword, but the first byte of the halfword is shown on top of the second, to emphasize the order in which the bytes are stored in memory. In much of the Hawk documentation, the two halves of the halfword are shown as follows, deliberately placed in the wrong order in order to allow left-to-right readers to see the opcode field as the most prominent:
 

Organization of the Hawk instruction register
                 
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
opcode dst  
                 

The single most important field of the instruction register is the opcode or operation-code field. This 4-bit field determines the interpretation of the other fields of the instruction, and in some cases, it completely determines the details of the instruction to be executed. A second field, the destination register field or dst is almost always used to specify one of the 15 general purpose registers, and this register is almost always used as the destination register, that is, the register in which the result of some operation will be placed.
 

Basic Arithmetic Instructions

Consider the add instruction. On the Hawk, as on many other computers, this instruction allows the contents of two registers to be added. On the Hawk, this instruction takes the following form:
 

The Hawk ADD instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
0 0 1 1 dst s1 s2

 
In the add instruction, the opcode field is 00112 and when the central processor sees this pattern, this determines that bits 8 to 11 and bits 12 to 15 of the instruction register will be used as register numbers source two and source one, s2 and s1, specifying the registers that will provide the operands to be added. The result will then be placed in the destination register before the next instruction is executed.

In short, we use the notation r[dst]=r[s1]+r[s2] to describe the add instruction, or rather, what the CPU does when it processes the add instruction. We can describe this instruciton as a three-register instruction because it references a total of three registers. The name three-register instruction is something of a misnomer because some Hawk instructions with this format use s1 or s2 to hold constants instead of register numbers.

To illustrate the impact of executing an add instruction, consider the following initial state of the registers in the Hawk processor:
 

An initial Hawk CPU state
PC:  00001000                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00000000  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 01234567  RC: 00000000
               R5: 76543210  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In the above initial state, the program counter is set to 100016 so the next time the processor tries to fetch an instruction, it will be from the halfword in memory occupying memory addresses 100016 and 100116. Consider what will happen if this halfword of memory has the value 453216. The first byte of this value is 3216, with most significant hexadecimal digit being 00112 in binary. This opcode leads the CPU to interpret the instruction as an add instruction, summing the contents of registers 4 and 5 and storing the result in register 2. If we allow the CPU to execute exactly one instruction in this context, the state of the processor will change to:

The state after ADD R2,R4,R5
PC:  00001002                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 77777777  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 01234567  RC: 00000000
               R5: 76543210  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

Note that there are only two changes from the initial state to the new state given above. First, the program counter has been incremented by two, since a single halfword instruction occupies 2 bytes, and second, the value of register 2 has changed. This now holds the sum of the values in registers 4 and 5.

Nothing prevents an assembly language programmer from writing H #4532 to put this add instruction into memory. Recall that the H directive in our SMAL assembly language places a halfword in memory, and recall that the # is used as a prefix indicating that the operand is a hexadecimal number. While an assembly language programmer can write this, it is not very readable. Human readers, and for that matter, the authors of compilers, would rather not program in hexadecimal!

Assembly languages provide tools that allow you to use a symbolic notation. Our SMAL assembly language does not have any built-in knowledge of the Hawk instruction set, but as mentioned in the previous chapter, the relevant definitions are available in a header file. Once this header is included, we can use the convenient shorthand notation ADD R2,R4,R5. Here is a listing of an assembly language program that puts the two byte sequence 3216 4516 making up the instruction halfword 453216 in memory location 100016:

Assembly of the ADD instruction
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
 00001000: 32  45                3          ADD     R2, R4, R5

A computer with just one instruction would be difficult to use, but the 4-bit opcode field of the Hawk allows for 16 distinct instructions. For example, the SUB instruction has the same format as ADD, but it subtracts. ADD and SUB are register-to-register instructions, so called because they take operands from registers to produce a result in a register. There are also memory-reference instructions that transfer data to and from memory, and immediate instructions that have some operands that are taken directly from fields on the instruction register.

Exercises

a) What halfword does ADD R15, R12, R8 assemble to in memory.

b) What halfword does ADD R7, R9, R11 assemble to in memory.

c) If every register r[i] contains the value i, and then ADD R7, R9, R11 is executed, what register is changed and what is its new value?

d) If every register r[i] contains the value i, and then the instruction 453216 is executed, what register is changed and what is its new value?

More Hawk Instructions, Loading Constants

Register-to-register instructions alone are not sufficient to make a useful computer. We also need a way to load values into registers and to store results from registers to memory. Many arithmetic operations involve small constants. For example, the most common use of addition in real programs is to increment a variable, usually by one. Because of this, most computer architectures provide simple instructions for loading constants into registers. These are typically called load-immediate instructions. The term immediate constant refers to a constant that is either taken from the instruction register or addressed by the program counter.

In the case of the Hawk architecture, several instructions include immediate constants. There are two load immediate instructions, one for 8-bit constants and one for 24-bit constants, and there are two add immediate instructions, one for 4-bit constants and one for 16-bit constants. The simplest of these is the load immediate short instruction. This has the following format:

The Hawk LIS instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 0 1 dst const8

In the load-immediate short instruction, the opcode field is 11012 and when the central processor sees this pattern, this determines that bits 8 to 15 of the instruction register will be taken as an 8-bit constant in the range –12810 to +12710. This constant is expanded to 32 bits by sign extension and stored in the register specified by the dst field of the instruction. We can use the shorthand notation r[dst]=sxt(7,const8) to describe this.

In our shorthand notation, the sxt() function indicates sign extension, with the first argument being the bit-number of the bit that is to be duplicated to fill all higher-numbered bits. In the two's complement number system, an 8-bit constant can be expanded into a 32-bit constant by padding it, on the left, with 24 additional bits. These additional bits are copies of bit 7 of the original short constant, its sign bit. If the constant was positive, the new bits are all zero. If negative, they are all one. Many (but not all) Hawk instructions that include short constants sign extend those constants before using them.

Of course, not all constants are short enough to fit in one byte. The load immediate long instruction allows any 24-bit signed constant. Because the instruction register of the Hawk is only 16-bits long, this instruction is coded as two successive halfwords, as follows:

The Hawk LIL instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 1 0 dst const8 (bits 0-7)
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
const16 (bits 8-23)

The constant in the load-immediate long instruction is sign extended before loading it into the 32-bit destination register, so the allowed range of values is –223 to 223–1, or –8,388,608 to 8,388,607. Again, we can use the shorthand notation r[dst]=sxt(23,const) to describe this, ignoring for the moment the problem of combining the 8-bit and 16-bit parts of the constant.

It is worth remembering that the load immediate long instruction can be viewed as a sequence of 4 bytes in memory. From this perspective, it looks like this.

The Hawk LIL instruction as a byte sequence
                             
07 06 05 04 03 02 01 00
1 1 1 0 dst
15 14 13 12 11 10 09 08
const8 (bits 0-7)
07 06 05 04 03 02 01 00
const8 (bits 8-15)
15 14 13 12 11 10 09 08
const8 (bits 16-23)
                             

To demonstrate the use of these instructions, consider the following assembly language program that loads constants into registers R4 and R5 and then adds them, storing the result in R2:

 
Loading and adding some constants
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
 00001000: D4  12                3          LIS     R4, #12
 00001002: E5  654321            4          LIL     R5, #654321
 00001006: 32  45                5          ADD     R2, R4, R5

Note that, in assembling this code, the assembler allocated one halfword for the load-immediate-short instruction LIS and two halfwords for the load-immediate-long instruction LIL. As a result, the ADD instruction sits in the 4th halfword of the program. In the assembly listing, each single-halfword instruction is given in hexadecimal as two consecutive bytes, while the long LIL instruction is shown as a single byte followed by a triple-byte quantity holding the immediate constant.

If this bit of code is loaded into the memory of a Hawk computer and the program counter is set to 100016 with all of the registers equal to zero, and then the computer is started and run for 3 instruction cycles, the state of the processor will end up as:
The state after 2 loads and an add
PC:  00001008                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00654333  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 00000012  RC: 00000000
               R5: 00654321  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In this new processor state, the program counter has advanced to 100816, 8 bytes from the point at which the program began because the program itself occupied 4 halfwords. Registers R4 and R5 have been loaded as the instructions required, and the ADD instruciton has computed the sum of these two registers in R2.

The Hawk architecture does not have an instruction for loading an arbitrary 32-bit constant. Instead, there is a sequence of 2 instructions that takes up 3 halfwords of memory to load a 32-bit constant. The assembler (or rather, the hawk.h header file) knows this sequence by the name LIW, standing for load immediate word, and generally, programmers can use this as if it was a single instruction, even though it is not.

Exercises

e) What is the most appropriate instruction to use for loading each of the following constants: 4, 40, 400, 4000, 40000, 400000?

f) What halfword does LIS R5, 44 assemble to in memory?

g) Write a SMAL Hawk program to load each register with its register number, so register r[i] ends up containing i.

Adding Constants

Studies of the frequency of different operations in real programs show that the most common operator in programs is addition, that most additions add a constant to a variable, and that the most common constant added to a variable is one. It is fair to say that the most common statement in all programs is i=i+1. Because of this, most computer architectures provide a single instruction for incrementing the value stored in a register. The Hawk is no exception.

Of course, we could always increment a variable by first loading the constant required into a register and then adding that constant to the variable, but this takes two instructions. This is exactly how we do it when the constant we are adding is very large. Because small constants are so common, the Hawk provides two different add immediate instructions. Add short immediate or ADDSI can be used for signed constants that fit in 4 bits, while the more general add immediate allows 16-bit constants.

The Hawk ADDSI instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
0 0 0 1 dst 1 1 0 0 const4

We can use the shorthand notation r[dst]=r[dst]+sxt(3,const4) to describe this. Because the constant in the add short immediate instruction is sign extended before adding it to the contents of the 32-bit destination register, the allowed range is -8 to +7. (Actually, because adding zero is not useful, the Hawk CPU includes trickery to encode -8 as zero.) If a larger range of values is needed, the add immediate instruction should be used. As with LIL, this is a long instruction composed of two consecutive halfwords:

The Hawk ADDI instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 1 1 dst 0 1 1 0 x
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
const16

We can use the shorthand notation r[dst]=r[x]+sxt(15,const16) to describe this. Unlike ADDSI, ADDI does not require that the source and destinations be the same register. The constant is sign extended, so the range of allowed values is -32768 to +32767.

In the rare event that larger constants are needed, these must be loaded into a register first, and then added. The following assembly language example illustrates this:

Incrementing registers by various amounts
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
 00001000: 13  C1                3          ADDSI   R3, 1        ; R3=R3+1
 00001002: F4  63  000A          4          ADDI    R4, R3, 10   ; R4=R3+10
 00001006: E5  0186A0            5          LIL     R5, 100000
 0000100A: 35  54                6          ADD     R5, R5, R4   ; R5=R4+100000

Exercises

h) What is the most appropriate instruction or instruction sequence to use for incrementing a register by each of the following constants: 4, 40, 400, 4000, 40000, 400000?

i) What halfword does ADDSI R5, 4 assemble to in memory?

j) Translate r1 = r2 + r3 - 5 into Hawk instructions expressed in assembly language.

k) Translate r1 = (r2 - r3) + 5 into Hawk instructions expressed in assembly language.

Memory Reference Instructions

Loading constants into registers and doing arithmetic on those constants is not enough unless all of the variables used by a program can be held in registers. This is occasionally true, but most programs must hold some variables in memory. Some computer architectures allow any machine instruction to reference operands anywhere in memory or in any register in the processor. Such machines have complex instruction set complexity or CISC architectures.

On the other extreme, some architectures, the Hawk among them, require that all arithmetic be performed by register-to-register instructions, with all use of variables in memory done by loading those variables into registers or storing the contents of registers to memory. These architectures are known as load-store architectures or reduced instruction set complexity or RISC architectures. The ARM, MIPS and IBM Power architectures are commercially important RISC architectures.

On a CISC architecture, the statement a=b+c might reduce to one very complex add instruction. In contrast, on a RISC architecture, we would begin with two load instructions to get the values of the variables into registers, then an add instruction to add the values, and finally a store instruction to put the result back in memory.

The Hawk architecture has 5 distinct load instructions and 3 store instructions. The simplest of these take a memory address taken from a registers, with no modification. These are known as short-memory-reference instructions because they are encoded as a single 16-bit halfword. For each of these short instructions, there is a corresponding long forms made up of of two consecutive halfwords, where the second halfword is used to modify the address.

On the Hawk and many other computers, a register that holds a memory address is called an index register. This makes sense if you think of memory as an array and remember that one use of the word index is as a synonym for an array subscript.

The LOADS or load short instruction, for example, loads the destination register from the 32-bit word stored at the memory location addressed by the contents of the index register. This instruction has the following form:

The Hawk LOADS instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 1 1 dst 1 1 0 1 x

The STORES or store short instruction has exactly the same format, but it moves the data in the opposite direction, from the destination register to the location pointed to by the index register. Note the abuse of the term destination register in the context of this instruction.

The Hawk STORES instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 1 1 dst 1 0 1 0 x

These two instructions have identically the same opcode, 11112. This opcode value indicates that this is a memory-reference instruction and that bits 0 to 3 of the instruction register specify an index register while bits 8 to 11 specify the destination register, as usual. Bits 4 to 7 of the instruction register are used in this instruction format to specify which specific memory memory reference instruction is to be performed. These bits can be thought of as an extension of the opcode field of the instruction.

In a high-level language, the operation of LOADS can be described as r[dst]=m[r[x]]. Similarly, STORES can be described as m[r[x]]=r[dst]. As was mentioned in the introduction to the instruction register format, the name destination field is used because, in most instructions, this field identifies the destination register. The Hawk store instructions are exceptions to this generalization!

The following example shows the use of these instructions to add the contents of in memory words 1000016 and 1000416 and put the sum in word 1000816. The example begins by loading index registers with the addresses of the 3 locations before the 4 instructions that actually do the work:

Adding two variables
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
                                 3  
                                 4  ; load the addresses
 00001000: E5  010000            5          LIL     R5, #10000
 00001004: E6  010004            6          LIL     R6, #10004
 00001008: E7  010008            7          LIL     R7, #10008
                                 8  
                                 9  ; then perform the arithmetic
 0000100C: F3  D5               10          LOADS   R3, R5         ; from #10000
 0000100E: F4  D6               11          LOADS   R4, R6         ; from #10004
 00001010: 32  34               12          ADD     R2, R3, R4
 00001012: F2  A7               13          STORES  R2, R7         ; to #10008

The idea that each load from memory or store to memory must be preceeded by loading the address of the variable into an index register is rather daunting, but note that there are many optimizations that can reduce this cost.

Exercises

l) Translate a = (b - 64) + c to Hawk assembly code, assuming that the variables a, b and c are stored in memory at words 1000016 through 1000816 in that order.

m) Consider F1D516, F26116, FFFF16, F2A516 as a sequence of halfwords. What sequence of Hawk assembly language instructions, when assembled, would produce this sequence?

The Register-to-Register Move Instruction

The Hawk computer has a number of other useful instructions. For example, the move instruction moves the contents of one register to another:

The Hawk MOVE instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 1 1 dst 1 1 1 1 x

We can describe the effect of this with the notation r[dst]=r[x]. This instruction is considered a short memory reference instruction, just like LOADS because inside the processor, it performs exactly the same computation as LOADS, but instead of using the value of r[x] as the memory address of the operand, it uses that value as the operand itself.

Note that value 11111111111111112 or FFFF16, when treated as an instruction, means move register 15 to register 15, doing nothing. Such an instruction is called a no-op, short for no operation. Processors designed for use with programmable read-only memory or PROM are frequently designed so that the values 000016 and FFFF16 are no-ops. This is because each bit of a PROM can be changed exactly once from its initial value, and all bits are initially the same, either 0 or 1. If the initial value is FFFF16, such no-ops can later be changed to useful instructions by changing some of the ones to zeros. If 000016 is also a no-op, erroneous instruction can be converted to no-ops by turning the remaining ones to zeros. This allows a program in PROM to be patched without the need to replace the PROM chip.

Exercises

n) Using only one LIS instruction and a sequence of MOVE instructions, write sequence of Hawk assembly language instructions that will set all 15 registers to 1.

o) How many distinct distinct correct answers are there to the previous problem? This is really a discrete math question!

p) Using only one LIS instruction and a sequence of ADD instructions, write sequence of Hawk assembly language instructions that will set all 15 registers so that r[i] contains i.

q) Assuming you set all 15 registers in order, how many distinct solutions are there to the previous problem? If you eliminate the order constraint, in how many distinct solutions are there? This question is harder than the similar one above.

Load Immediate Word and Macros

As mentioned earlier, the Hawk architecture does not have a single instruction to load a full 32-bit constant. Instead, a two-instruction instruction sequence is used. First, an LIL instruction loads the high 24 bits of the constant. Then, an ORIS instruction slips in the low 8 bits of the constant. ORIS stands for or-immediate shifted. This first shifts the contents of the destination register 8 places left, then it ors an 8-bit constant into the register:

The Hawk ORIS instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 0 0 dst const8

This can be described formally as r[dst]=(r[dst]<<8)|const8. Note that the constant is not sign extended!

The << operator has the same definition here as it has in C, C++ and Java, as well as in our SMAL assembly language. This operator shifts the binary value of its left-hand argument to the left the number of places indicated by its right-hand argument, filling zeros into the left end of the word. In arithmetic terms, the result of this is that, a<<b is the same as a×2b.

The | or bitwise-or operator is also used here as it is in C, C++ and Java. This operator produces each bit of the result as the logical or of the corresponding bits of the operands. So, 11002 or 10102 is 11102; this carefully composed example includes within it all possible combinations of 1 and 0.

While there may be rare exceptions, the vast majority of uses of the ORIS instruction will be in the context of a previous LIL instruction. For example if we wanted to load 765432102 into register 5, we could use the following instruction sequence:

Loading a 32-bit constant
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
 00001000: E5  765432            3          LIL     R5, #765432
 00001004: C5  10                4          ORIS    R5, #10     ; R5 = #76543210

When a sequence of instructions is used as a unit, as if it were a single instruction, that sequence is known as a macro instruction or just a macro. This terminology dates back to Autocoder, the assembler for the IBM 705 computer from the mid 1950's. Most assembly languages designed since that time have included tools to package macros so that they can be used as if they were single instructions. In the SMAL assembler, the macro LIW, meaning load-immediate word, can be defined as:

Defining a macro to load a 32-bit constant
             
        MACRO   LIW =dst, =const
          LIL   dst, const >> 8
          ORIS  dst, const & #FF
        ENDMAC
             

In reading the above, note the >> operator. This is the right-shift operator. As in C, C++ and Java, it shifts the bit pattern from its left operand right the number of places indicated by its right operand, filling the most significant bits with zeros. If the operands are unsigned integers, a>>b is equivalent to a/2b. Also note the & or bitwise-and operator. As in C, C++ and Java, each bit of the result is the logical and of the corresponding bits of the operands. So 11002 & 10102 is 10002.

This macro definition is included in the standard hawk.h header file, so most assembly language programmers will never explicitly use the ORIS instruction; instead, they will write programs using LIW to load full-word constants, as follows:

Using the macro to load a 32-bit constant
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
 00001000: E5  765432  C5        3          LIW     R5, #76543210
 00001005: 10 

Of course, when the CPU executes this code, it will not see an LIW instruction, it will see an LIS followed by an ORIS, executed in two iterations of the fetch-execute cycle. If you use a debugger to examine the contents of memory, you will see two separate instructions, not one macro instruction. In the assembly listing above, the assembler listed the output from the two different instructions from left to right, filling each line before starting on the next line. The second instruction, the LIW instruction, ended up split across two lines of the listing. This has no effect on how the data is stored in memory.

The hawk.h header file includes macros that define each of the built-in instructions in the Hawk. For example, the load-immediate-long and or-immediate-and-shift instructions can be defined as follows:

Macros for two Hawk machine instructions
         
; the load-immediate-long instruction
        MACRO   LIL =dst, =const
          B     #E0 | dst
          T     const
        ENDMAC

; the or-immediate-and-shift instruction
        MACRO   ORIS =dst, =const
          B     #C0 | dst
          B     const
        ENDMAC
         

The keyword MACRO tells the assembler that the following text, up to ENDMAC, is to be substituted for the macro name whenever that name appears in the text as an instruction name. As this substitution is done, the parameters to the macro are substituted into the macro body. Macros resemble subroutines with similar terminology. We speak of macro calls and macro parameters, but unlike subroutines, macro processing is purely textual.

The macro defining the LIL instruction uses a T directive to assemble a 24-bit (three byte) constant into memory. This is the only use of the T directive in the Hawk macros.

If you look in the hawk.h file, you will see macros somewhat more complex than those presented here for LIW, LIL and ORIS. The complexity is the result of additions to prevent the use of symbols that are not register names in fields where register names are required. If all assembly language programs were error free, the code given here would suffice. Unfortunately, real programmers make errors, and the additional complexity in the hawk.h file helps detect some of these errors when a program is assembled.

Exercises

r) Give an appropriate macro definition for the Hawk LIS instruction.

s) Give an appropriate macro definition for the Hawk ADDSI instruction.

t) Give an appropriate macro definition for the Hawk ADDI instruction.

u) Give an appropriate macro definition for the Hawk LOADS instruction.

The Fetch-Execute Cycle Revisited

Our original presentation of the fetch-execute cycle was quite brief, suggesting that Hawk worked as follows:

The Hawk fetch-execute cycle
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* execute that instruction */
        execute( instruction_register );
}

We have covered enough ground that we can expand on this considerably, giving details of how the fields of the instruction are broken out and how they are decoded:

The Hawk cycle, in more detail
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* break out fields of the instruction register */
        opcode =          (instruction_register >>  4) & 15;
        dst =             (instruction_register      ) & 15;
        const8 =          (instruction_register >>  8)     ;
        s1 = opcode1 =    (instruction_register >> 12)     ;
        s2 = const4 = x = (instruction_register >>  8) & 15;

        /* execute the instruction */
        switch (opcode) {
                /* 16 cases, one per legal opcode value */
        }
}

In the above code, we have given no detail for each instruction, but we can easily add this. Note that many of the fields of the instruction have multiple names. For example, we used the name const4 in our description of the ADDSI instruction to refer to the same field of the instruction that we called x in the memory reference instructions or s2 in the ADD and SUB instructions.

The actual hardware for a CPU does not need separate registers to hold the different fields of the instruction register. Instead, there is just one instruction register, with wires (conducting traces on the surface of the chip) carrying the different bits to the required places. The variables opcode, const8, s1 and s2 are names for these bundles of wires and not names for registers. The assignment statements giving values to these fields do not describe real computations inside the processor, they describe how the bundles of wires are connected to the various bits of the instruction register.

Given the above, filling in the cases for instructions like ADD and LIS is quite simple:
 

Executing the ADD and LIS instructions
           
case 3: /* ADD */
        r[dst] = r[s1] + r[s2];
        break;

case 13: /* LIS */
        r[dst] = sxt( 7, const8 );
        break;
           

Execution of double-length instructions is more complex. This is the case with the LIL instruction. Only after we decode the opcode field to determine that it is an LIL instruction can we fetch the second halfword.

Other instructions are more complex because they involve additional decoding work. This is the case with the LOADS and STORES instructions. Here, we must decode the fact that the instruction is a memory reference instruction before we use the s1 field as a secondary opcode. Some of this complexity is described below:

Executing the LIL and memory reference instructions
       
case 14: /* LIL */
        const16 = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;
        r[dst] = sxt( 23, (const16 << 8) | const8 );
        break;

case 13: /* memory reference instructions */
        switch (opcode1) {
                case 13: /* LOADS */
                        r[dst] = word_memory[ r[x] ];
                        break;
                case 15: /* MOVE */
                        r[dst] = r[x];
                        break;
                /* 14 more cases */
        }
        break;
       

It is crucial to remember that, within any CPU, whether implemented using silicon or vacuum tubes, the fetch-execute algorithm is directly executed by hardware. As this hardware executes the instruction, it performs a sequence of register transfers, to move data from one register to another. With each register transfer, the data may be routed through components that transform the data, for example, adding values from different registers.

When these register transfers are done in sequence, we can describe each instruction in the machine language as being executed in a series of microsteps; sometimes, the control unit of the processor directs these microsteps using what can be described as micro instructions, a phrase that was coined at around the same time as the phrase macro instruction. Some processors even have a small and very fast read-only memory that is used to store a microprogram that describes the instruction set of that processor.

If, instead of building hardware, we take an existing computer and run a program on that computer that executes the algorithm given above, we have constructed what is called an emululator for one processor's instruction set that runs on another. Emulators can be very complex because of the need to provide support for input and output devices, but the basic idea is fairly simple. Commercial emulators are widely used in order to allow applications written for one vendor's computers to be run on computers built by other vendors, for example, there were several emulators allowing IBM PC code using the Intel Pentium architecture to be run on Apple Macintosh computers using the Power PC architecture.

Emulators are also used to allow new computers to run code that was originally compiled for older machines, for example, to allow code from the original Apple Macintosh computers that used Motorola 68000 processors to run on newer Apple Macintosh computers that use the Motorola or IBM Power PC processor. Apple went through this again during the transition from the Power PC to the Intel Pentium architecture. Some of the very first emulators were written with this use in mind. For example, when IBM came out with their System 360 processor in 1965, one of their first products was an emulator for the IBM 1401 computer, so that they could immediately begin replacing existing 1401 computers with 360 computers.

Exercises

v) Give the code for the SUB case (you will have to look it up in the Hawk manual).

w) Give the code for the ORIS case.

x) Give the code for the ADDI case -- and note which switch statement it goes in!

y) Write code for the sxt() function. This may take some thought!
 

Instruction Register Formats of some Other Machines

For the sake of comparison, here are the instruction register formats of some other historically important machines. The simplest of the DEC PDP-8 family of computers had a main memory consisting of 4096 12-bit words, and the processor contained a 12-bit program counter and a single 12-bit accumulator. The instruction register on this machine had the following format:

The PDP-8 instruction register
           
00 01 02 03 04 05 06 07 08 09 10 11
opcode i z address
           

With a 3-bit opcode field, this machine obviously had a very small instruction set. It shold be no surprise that computers with this architecture were among the least expensive computers on the market for many years in the 1960s and 1970s. The i and z bits determined how the 7-bit address field of the instruction was used. If the z bit was zero, the address referred to a location in the bottom 128 words of memory, while if it was one, the reference was to a location in the 128-word page selected by the top 5 bits of the program counter. If the i bit was one, the contents of the memory address selected by the combined z bit and 7-bit address was used to load a complete 12-bit address from memory. This is called indirect addressing. Notice also that the bits of the word are numbered from the left instead of from the right. This is the bigendian system of numbering the bits.

The IBM System 360, ancestor of IBM's current line of enterprise servers, was at the other extreme of price and performance. This machine had 16 general prupose registers and a 32-bit word, with the following instruction register format:

The IBM 360 instruction register
   
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
opcode r x
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
b displacement
   

Like the Hawk, this machine allowed two different instruction sizes. Short instructions were only 16 bits long, while long instructions were 32 bits. The opcode was always 8 bits, allowing 256 different instructions. The r field was comparable in use to the dst field of a Hawk instruction. It usually designated the destination register except for store instructions.

The x field was used to designate the source register for register to register arithmetic instructions, and the x field was used to designate the index register that held a memory address for short memory reference instructions. The 360 long memory reference instructions used x and b to designate two registers, the index register and the base register. The contents of these were added to the 12-bit displacement to form the memory address.

Modern architectures frequently have a dismaying variety of instruction formats. The Intel 80x86 family that dominates today's marketplace is extremely messy, while the Power PC architecture is more modest, with 5 basic instruction formats, all of which are 32 bits long. With 32 registers, this machine used 5 bits for each of the source and target register fields, and some instructions specified as many as 4 registers (typically three source registers and a target).

The Power PC instruction register
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
opcode tgt/src src/tgt immediate
opcode tgt/src src/tgt src extended opcode
opcode tgt/src src/tgt src src ext opcd Rc
opcode BO BI BD AA LK
opcode LI AA LK