4. The Fetch Execute Cycle

Part of CS:2630, Computer Organization Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Overview

The Preliminary discussion of the logical design of an electronic computing instrument by Burks, Goldstine and Von Neumann, written in in 1946, outlines the design of the EDVAC computer that was delivered to the Ballistics Research Laboratory in 1949. The basic structure of EDVAC followed the outline given at the start of Chapter 3:

The Von Neumann model of a computer
     
Central Processing Unit
CPU
 
Registers
 
 
              Interface(s) or Bus(ses)              
 
Memory
 
Input
Output
     

Within this outline, EDVAC and other early computers varied widely. Parallel machines operated on all the bits of each word at once, while serial machines processed bits one at a time. Some used real random-access memory, while others simulated random-access memory using sequential devices like drums or delay lines. Some used binary, others used binary-coded decimal with multiple bits per digit.

The basic idea of the central processing unit has not changed since these early days. At heart, almost all of the central processors designed from 1946 onward can be described in the following terms:

The above description is essentially an algorithm, so we can describe the central processor in algorithmic terms using programming notation. Here is the algorithm in C:

The fetch-execute cycle
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = memory[ program_counter ];
        program_counter = program_counter + 1;

        /* execute that instruction */
        execute( instruction_register );
}

While the basic idea of the fetch-execute cycle has not changed since the dawn of electronic computing, the details have changed immensely. Early machines could only execute a few thousand instructions per second, but by the late 1960s there were machines able to execute a million instructions per second, and today we speak casually of machines that may execute billions of instructions per second.

The above algorithm increments the program counter by one for each instruction, while from the start, many computers have had instructions smaller or larger than one addressable unit of memory. By 1960, there were computers with variable-sized instructions, where the size of the instruction depended on its function.

The set of registers used inside the central processor has also varied considerably between different computers. In addition to the program counter and instruction register, many early computers had only one other register, the accumulator. Instructons did things like load the accumulator from memory, store its contents to memory, or add to it. In contrast, many modern central processors have several tens of registers.

The Hawk CPU

The Hawk borrows from a number of computers designed from 1946 to the present. The most important of these is the IBM System 360 first sold in 1965; IBM's family of enterprose servers are descended from this. Another important source is the Digital Equipment Corporation PDP-11 first sold in 1970. The Motorola 68000 and the Intel 8080 and 8086 borrowed many ideas from the PDP-11, and the 8086 is the ancestor of the Intel Pentium family that still dominates today's desktop computing market.

As with the IBM System 360, The Hawk has 15 general purpose registers of 32 bits each, numbered from 1 to 1510 or 1 to F16. Each of these can be used as accumulators to hold intermediate results during computation or as index registers to hold memory addresses. In assembly language programs, these are named R1 to R15 or RF. Early computers had only a single accumulator within the CPU. The PDP-11 and Motorola 68000 had 7 general purpose registers. Several modern computers have 32 or more general purpose registers.

In addition to the general purpose registers, the Hawk has a program counter and a special 32-bit register called the processor status word that contains a variety of special fields. The most important PSW field holds the condition codes named N, Z, V and C. These are so important that the Hawk emulator displays them separately.

Whenever you run a Hawk debugger or emulator, the internal state of the processor will be displayed as follows:

The Hawk CPU state
PC:  00000000                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00000000  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 00000000  RC: 00000000
               R5: 00000000  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In the Hawk processor state display, the program counter and processor status word are shown on the left, with the condition codes (the low 4 bits of the status word) broken out on the third line. The right two columns show the 15 registers, each with its name. The Hawk debugger always shows the contents of the registers in hexadecimal.

Why didn't the display shown above include the instruction register? It is there, inside the central processor, but its contents are only transient, and typical debuggers or emulators provide convenient ways of examining the program being run, either by showing the contents of memory, or by showing the code in symbolic form.

Newcomers to machine-language and assembly-language programming should not seek in-depth information on the internals of the processor until they have seen an overview of a few of the simpler instructions. This information is always available in the reference manual for the processor. In our case, Chapter 1 of the Hawk manual contains a section on the central processor that gives more detail on the registers, and it includes a section on the instruction execution cycle giving specifics of this cycle for the Hawk processor.

The Hawk Fetch-Execute Cycle

The fetch-execute cycle of the Hawk differs from the generic fetch-execute cycle because each Hawk instruction is made of either one or two halfwords. As a result, instruction addresses are always even, and the program counter always increments by either two or four. If we ignore instructions longer than one halfword, we can describe the fetch-execute cycle of the Hawk as follows:

The fetch-execute cycle
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* execute that instruction */
        execute( instruction_register );
}

To understand how the central processor of any computer works, we must examine the instruciton register in more detail. In the case of the Hawk, for 16-bit instructions, the instruction register is arranged as follows:

Organization of the Hawk instruction register
                             
07 06 05 04 03 02 01 00
opcode dst
15 14 13 12 11 10 09 08
 
                             

First and foremost, the instruction register is best viewed as two consecutive bytes of one halfword. The bit numbers shown above are the bit numbers in the halfword, but the first byte of the halfword is shown on top of the second, to emphasize the order in which the bytes are stored in memory. In much of the Hawk documentation, the two halves of the halfword are shown as follows, deliberately placed in the wrong order in order to allow left-to-right readers to see the opcode field as coming first:

Organization of the Hawk instruction register
                 
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
opcode dst  
                 

The most important field of the instruction register is the opcode or operation-code field. This 4-bit field determines the interpretation of the other fields of the instruction. In some cases, it completely determines the details of the instruction to be executed. A second field, the destination register field or dst is almost always used to specify one of the 15 general purpose registers. This register is almost always used as the destination register, that is, the place where the result of some operation will be go.

Basic Arithmetic Instructions

Consider the add instruction. On the Hawk, as on many other computers, this instruction allows the contents of two registers to be added. On the Hawk, this instruction takes the following form:

The Hawk ADD instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
0 0 1 1 dst s1 s2

In the add instruction, the opcode field is 00112 and when the central processor sees this pattern, this determines that bits 8 to 11 and bits 12 to 15 of the instruction register will be used as register numbers source one and source two, s1 and s2, specifying the registers that will provide the operands to be added. The result will then be placed in the destination register before the next instruction is executed.

In short, we use the notation r[dst]=r[s1]+r[s2] to describe the add instruction, or rather, what the CPU does when it processes the add instruction. Here, r is the array of registers, with the register number being used as a subscript. We describe this instruciton as a three-register instruction because it specifies three registers. The name three-register instruction is something of a misnomer because some Hawk instructions with this format use s1 or s2 to hold small constants instead of register numbers.

To illustrate the impact of executing an add instruction, consider a Hawk processor with the following initial state:

An initial Hawk CPU state
PC:  00001000                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00000000  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 01234567  RC: 00000000
               R5: 76543210  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In the above initial state, the program counter is set to 100016 so the next time the processor tries to fetch an instruction, it will be from the halfword in memory occupying memory addresses 100016 and 100116. Suppose that this halfword has the value 453216. The Hawk is a lowbyter or little-endian machine, so first byte of this value is 3216, with most significant hexadecimal digit being 00112 in binary. This is the ADD opcode, so the CPU sees s1 = 4, s2 = 5, and dst = 2. The computation this specifies is r2=r4+r5, and in SMAL Hawk assembly language, it would have been specified as ADD R2,R4,R5. As a result, if we allow the CPU to execute exactly one instruction in this context, the state of the processor will change to:

The state after ADD R2,R4,R5
PC:  00001002                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 77777777  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 01234567  RC: 00000000
               R5: 76543210  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

Note that there are only two changes from the initial state to the new state given above. First, the program counter has been incremented by two, since a single halfword instruction occupies 2 bytes, and second, the value of register 2 has changed. This now holds the sum of the values in registers 4 and 5.

Nothing prevents an assembly language programmer from writing H #4532 to put this add instruction into memory. Recall that the H directive in our SMAL assembly language places a halfword in memory, and recall that the # is used as a prefix indicating that the operand is a hexadecimal number. While an assembly language programmer can write this, it is not very readable. People don't like reading programs written in hexadecimal.

The SMAL assembly language does not have any built-in knowledge of the Hawk instruction set, but as mentioned in the previous chapter, the relevant definitions are available in a header file. Once this header is included, we can use the convenient shorthand notation ADD R2,R4,R5. Here is a listing of an assembly language program that puts the two byte sequence 3216 4516 making up the instruction halfword 453216 in memory location 100016:

Assembly of the ADD instruction
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
 00001000: 32  45                3          ADD     R2,R4,R5

A computer with just one instruction would be difficult to use, but the 4-bit opcode field of the Hawk allows for 16 distinct instructions. For example, the SUB instruction has the same format as ADD, but it subtracts. ADD and SUB are register-to-register instructions, so called because they take operands from registers to produce a result in a register. There are also memory-reference instructions that transfer data to and from memory, and immediate instructions that have some operands that are constants taken directly from fields on the instruction register.

Exercises

a) What halfword does ADD R15,R12,R8 assemble to in memory.

b) What halfword does ADD R7,R9,R11 assemble to in memory.

c) If every register r[i] contains the value i, and then ADD R7,R9,R11 is executed, what register is changed and what is its new value?

d) If every register r[i] contains the value i, and then the instruction 563216 is executed, what register is changed and what is its new value?

More Hawk Instructions, Loading Constants

Register-to-register instructions alone are not sufficient to make a useful computer. We also need a way to load values into registers and to store results from registers to memory. The most common operands in arithmetic expressions are small constants. For example, the most common use of addition in real programs is to increment a variable, usually by one. Because of this, most computer architectures provide simple instructions for loading constants into registers. These are typically called load-immediate instructions. The term immediate constant refers to a constant that is either taken from the instruction register or addressed by the program counter.

In the case of the Hawk architecture, several instructions include immediate constants. There are two load immediate instructions, a short one for 8-bit constants and a long one for 24-bit constants. Similarly, there are two forms of add immediate, one for short 4-bit constants and one for long 16-bit constants. The simplest of these is the load-immediate-short instruction. This has the following format:

The Hawk LIS instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 0 1 dst const8

In the load-immediate-short instruction, the opcode field is 11012 and when the central processor sees this pattern, it uses bits 8 to 15 of the instruction as a two's complement 8-bit constant in the range –12810 to +12710. This constant is expanded to 32 bits by sign extension and stored in the register specified by the dst field of the instruction. We can use the shorthand notation r[dst]=sxt(7,const8) to describe this.

In our shorthand notation, the sxt() function indicates sign extension, with the first argument being the bit-number of the bit that is to be duplicated to fill all higher-numbered bits. In the two's complement number system, an 8-bit constant can be expanded into a 32-bit constant by padding it, on the left, with 24 additional bits. These additional bits are copies of bit 7 of the original short constant, its sign bit. If the constant was positive, the new bits are all zero. If negative, they are all one. Many (but not all) Hawk instructions that include short constants sign extend those constants before using them.

Of course, not all constants are short enough to fit in one byte. The load-immediate-long instruction allows any 24-bit signed constant. Because the instruction register of the Hawk is only 16-bits long, this instruction is coded as two successive halfwords, as follows:

The Hawk LIL instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 1 0 dst const8 (bits 0-7)
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
const16 (bits 8-23)

The constant in the load-immediate long instruction is sign extended before loading it into the 32-bit destination register, so the allowed range of values is –223 to 223–1, or –8,388,608 to 8,388,607. Again, we can use the shorthand notation r[dst]=sxt(23,const) to describe this, ignoring for the moment the problem of combining the 8-bit and 16-bit parts of the constant.

It is worth remembering that the load immediate long instruction can be viewed as a sequence of 4 bytes in memory, like this.

The Hawk LIL instruction as a byte sequence
                             
07 06 05 04 03 02 01 00
1 1 1 0 dst
15 14 13 12 11 10 09 08
const8 (bits 0-7)
07 06 05 04 03 02 01 00
const8 (bits 8-15)
15 14 13 12 11 10 09 08
const8 (bits 16-23)
                             

To demonstrate the use of these instructions, consider the following assembly language program that loads constants into registers R4 and R5 and then adds them, storing the result in R2:

Loading and adding some constants
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
 00001000: D4  12                3          LIS     R4,#12
 00001002: E5  654321            4          LIL     R5,#654321
 00001006: 32  45                5          ADD     R2,R4,R5

Note that, in assembling this code, the assembler allocated one halfword for the load-immediate-short instruction LIS and a byte followed by a triple byte for the load-immediate-long instruction LIL. As a result, the ADD instruction sits in the 4th halfword of the program. In the assembly listing, each single-halfword instruction is given in hexadecimal as two consecutive bytes, while the long LIL instruction is shown as a single byte followed by a triple-byte quantity holding the immediate constant.

If this bit of code is loaded into the memory of a Hawk computer and the program counter is set to 100016 with all of the registers equal to zero, and then the computer is started and run for 3 instruction cycles, the state of the processor will end up as:

The state after 2 loads and an add
PC:  00001008                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00654333  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 00000012  RC: 00000000
               R5: 00654321  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In this new processor state, the program counter has advanced to 100816, 8 bytes from the point at which the program began because the program itself occupied 4 halfwords. The first two instructions loaded registers R4 and R5, and the ADD instruciton put the sum of these two registers in R2.

In general, long instructions run slower than short ones, and even long-immediate instructions cannot handle all possible constant values. The Hawk architecture does not have an instruction for loading an arbitrary 32-bit constant. Instead, a sequence of 2 instructions taking up 3 halfwords of memory can do this. The assembler (or rather, the hawk.h header file) knows this sequence by the name LIW, standing for load immediate word, and generally, programmers can use this as if it was a single instruction, even though it is not.

Exercises

e) What is the most appropriate instruction to use for loading each of the following constants: 4, –40, 400, –4000, 40000, –400000?

f) What halfword does LIS R5,-44 assemble to in memory?

g) Write a SMAL Hawk program to load each register with its register number, so register r[i] ends up containing i.

Adding Constants

Studies of the frequency of different operations in real programs show that the most common operator in programs is addition, that most additions add a small constant to a variable, and that the most common constant added to a variable is one. So, the most common statement in all programs is i=i+1. Because of this, most computer architectures provide a single instruction for incrementing the value stored in a register. The Hawk is no exception.

Of course, we could always increment a variable by first loading the constant required into a register and then adding that constant to the variable, but this takes two instructions. This is what we do for the rare large constant, but CPU designers usually give us special small fast instructions for the common cases. The Hawk is typical, with two different add immediate instructions. Add short immediate or ADDSI can be used for signed constants that fit in 4 bits, while the more general, and longer, add immediate allowing 16-bit constants.

The Hawk ADDSI instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
0 0 0 1 dst 1 1 0 0 const4

We can use the shorthand notation r[dst]=r[dst]+sxt(3,const4) to describe this. Because the constant in the add short immediate instruction is sign extended before adding it to the contents of the 32-bit destination register, the allowed range is -8 to +7; in addition, since adding zero is not useful, the Hawk CPU includes trickery to encode -8 as zero. If a larger range of values is needed, the add immediate instruction should be used. As with LIL, this is a long instruction composed of two consecutive halfwords:

The Hawk ADDI instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 1 1 dst 0 1 1 0 x
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
const16

We can use the shorthand notation r[dst]=r[x]+sxt(15,const16) to describe this. Unlike ADDSI, ADDI does not require that the source and destinations be the same register. The constant is sign extended, so the range of allowed values is -32768 to +32767.

In the rare event that larger constants are needed, these must be loaded into a register first, and then added. The following assembly language example illustrates this:

Incrementing registers by various amounts
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
 00001000: 13  C1                3          ADDSI   R3,1         ; R3=R3+1
 00001002: F4  63  000A          4          ADDI    R4,R3,10     ; R4=R3+10
 00001006: E5  0186A0            5          LIL     R5,100000
 0000100A: 35  54                6          ADD     R5,R5,R4     ; R5=R4+100000

Exercises

h) What is the most appropriate instruction or instruction sequence to use for incrementing a register by each of the following constants: 4, –40, 400, –4000, 40000, –400000?

i) What halfword does ADDSI R5,4 assemble to in memory?

j) Translate r1 = r2 + r3 - 5 into SMAL Hawk instructions.

k) Translate r1 = (r2 - r3) + 5 into SMAL Hawk instructions.

Memory Reference Instructions

Loading constants into registers and doing arithmetic on those constants is not enough unless all of the variables used by a program can be held in registers. This is occasionally true, but most programs must hold some variables in memory. Some computer architectures allow any machine instruction to reference operands anywhere in memory or in any register in the processor. These have complex instruction set complexity or CISC architectures.

On the other extreme, some architectures, the Hawk among them, require that all arithmetic be performed by register-to-register instructions, with all use of variables in memory done by loading those variables into registers or storing the contents of registers to memory. These architectures are known as load-store architectures or reduced instruction set complexity or RISC architectures. The MIPS, IBM Power, and ARM architectures are commercially important RISC architectures; the ARM is currently the dominant architecture used in cellphones.

On a CISC architecture, the C statement a=b+c might compile to one very complex add instruction. In contrast, on a RISC architecture, it would compile to two load instructions to get the values of the variables b and c into registers, then an add instruction, and finally a store instruction to put the result in a.

The Hawk architecture has 5 distinct load instructions and 3 store instructions. The simplest of these use a memory address taken directly from a register. These are known as short-memory-reference instructions because they are encoded as a single 16-bit halfword. For each of these short instructions, there is a corresponding long form made up of of two consecutive halfwords, where the second halfword is used to modify the address.

On the Hawk and many other computers, registers holding memory addresses are called index registers. This makes sense if you think of memory as an array and note that the word index can be a synonym for array subscript.

The LOADS or load short instruction, for example, loads the destination register from the 32-bit word stored at the memory location addressed by the contents of the index register. We suggest pronouncing LOADS as load-s so it is not confused with a plural or past tense. This instruction has the following form:

The Hawk LOADS instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 1 1 dst 1 1 0 1 x

The STORES or store short instruction has the same format, but it moves data the opposite way, from the destination register to the location pointed to by the index register. Note the abuse of the term destination register in the context of this instruction. We suggest pronouncing STORES as store-s.

The Hawk STORES instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 1 1 dst 1 0 1 0 x

These two instructions have the same opcode, 11112. This value merely indicates that this is a memory-reference instruction, using bits 8 to 11 of the instruction register specify an index register, and bits 12 to 15 to specify which specific memory memory reference instruction is to be performed. These bits can be thought of as an extension of the opcode field.

In a high-level language, the operation of LOADS can be described as r[dst]=m[r[x]]. Similarly, STORES can be described as m[r[x]]=r[dst]. As mentioned in the introduction to the instruction register format, the name destination field is used because this field usually identifies the destination register. The Hawk store instructions are exceptions to this!

Suppose memory locations 1000016, 1000416 and 1000816 hold the variables a, b and c, and we want the machine code equivalent of the C statement a=b+c; on the Hawk, we do this by first loading index registers with the addresses of the varialbes, then loading the operands, adding them, and storing the result:

Adding two variables
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
                                 3  
                                 4  ; load the addresses
 00001000: E5  010000            5          LIL     R5,#10000      ; address a
 00001004: E6  010004            6          LIL     R6,#10004      ; address b
 00001008: E7  010008            7          LIL     R7,#10008      ; address c
                                 8  
                                 9  ; then perform the arithmetic
 0000100C: F3  D6               10          LOADS   R3,R6          ; get b
 0000100E: F4  D7               11          LOADS   R4,R7          ; get c
 00001010: 32  34               12          ADD     R2,R3,R4
 00001012: F2  A5               13          STORES  R2,R5          ; set a=b+c

The idea that each load from memory or store to memory must be preceeded by loading the address of the variable into an index register can be daunting, but there are many optimizations that can reduce this cost.

Exercises

l) Translate a = (b - 64) + c to SMAL Hawk code, assuming that the variables a, b and c are stored in memory at words 1000016 through 1000816 in that order.

m) Consider F1D516, F26116, FFFF16, F2A516 as a sequence of halfwords. What sequence of SMAL Hawk instructions, when assembled, would produce this sequence?

The Register-to-Register Move Instruction

The Hawk MOVE insruction is a simple register-to-register assignment, copying data from one register to another:

The Hawk MOVE instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 1 1 dst 1 1 1 1 x

We describe the effect of this with the notation r[dst]=r[x]. This is considered a short memory reference instruction, just like LOADS because inside the processor, it performs exactly the same computation as LOADS, but instead of using the value of r[x] as a memory address, it uses the register contents as the operand itself.

Note that value 11111111111111112 or FFFF16, when treated as an instruction, means move register 15 to register 15, doing nothing. Such an instruction is called a no-op, short for no operation. Processors designed for use with programmable read-only memory or PROM are frequently designed so that the values 000016 and FFFF16 are no-ops. This is because each bit of a PROM can be changed exactly once from its initial value, and all bits are initially the same, either all 0 or all 1. If the initial value is FFFF16, such no-ops can later be changed to useful instructions by changing some of the ones to zeros. If 000016 is also a no-op, erroneous instruction can be converted to no-ops by turning the remaining ones to zeros. This allows a program in PROM to be patched without the need to replace the PROM chip. In the case of flash memory, it allows patching without the need to reflash the entire memory, a significantly slower operation.

Exercises

n) Using only one LIS instruction and a sequence of MOVE instructions, write sequence of Hawk assembly language instructions that will set all 15 registers to 1.

o) How many distinct distinct correct answers are there to the previous problem? This is really a discrete math question!

p) Using only one LIS instruction and a sequence of ADD instructions, write sequence of Hawk assembly language instructions that will set all 15 registers so that r[i] contains i.

q) Assuming you set all 15 registers in order, how many distinct solutions are there to the previous problem? If you eliminate the order constraint, in how many distinct solutions are there? This question is harder than the similar one above.
 

Load Immediate Word and Macros

As mentioned earlier, the Hawk architecture does not have a single instruction to load a full 32-bit constant. Instead, a two-instruction instruction sequence is used. First, an LIL instruction loads the high 24 bits of the constant. Then, an ORIS instruction slips in the low 8 bits of the constant. ORIS stands for or-immediate shifted. This first shifts the contents of the destination register 8 places left, then it ors an 8-bit constant into the register:

The Hawk ORIS instruction
07 06 05 04 03 02 01 00   15 14 13 12 11 10 09 08
1 1 0 0 dst const8

This can be described formally as r[dst]=(r[dst]<<8)|const8. Note that the constant is not sign extended!

The << or left-shift operator has the same definition here as it has in C, C++, Java and Python, as well as in SMAL. This operator shifts the binary value of its left-hand argument to the left the number of places indicated by its right-hand argument, filling zeros into the left end of the word. In arithmetic terms, the result of this is that, a << b is the same as a × 2b.

The | or bitwise-or operator is also used here as it is in C, C++, Java, Python and SMAL. This operator produces each bit of the result as the logical or of the corresponding bits of the operands. So, 11002 | 10102 = 11102.

With rare exceptions, ORIS will always be used immediately after an LIL instruction. For example if we wanted to load 765432102 into register 5, we could use the following instruction sequence:

Loading a 32-bit constant
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
 00001000: E5  765432            3          LIL     R5,#765432
 00001004: C5  10                4          ORIS    R5,#10      ; R5 = #76543210

When a sequence of instructions is used as a unit, as if it were a single instruction, that sequence is known as a macro instruction or just a macro. This terminology dates back to Autocoder, the assembler for the IBM 705 computer from the mid 1950s. Most assembly languages designed since that time have included tools to package macros so that they can be used as if they were single instructions. In the SMAL assembler, the macro LIW, meaning load-immediate word, can be defined as:

Defining a macro to load a 32-bit constant
             
        MACRO   LIW =dst, =const
          LIL   dst, const >> 8
          ORIS  dst, const & #FF
        ENDMAC
             

The >> or right-shift operator in the macro body has the same definition it has in C, C++, Java, and Python. This operator shifts the binary value of its left-hand operand to the right the number of places indicated by its right-hand operand, filling zeros into the most significant bits. If the operands are unsigned integers, a >> b is the same as ⌊a ÷ 2b⌋.

Aside: Java and SMAL have two distinct right-shift operators. The signed right shift >> treats the operand as a signed two's complement integer, duplicating the sign bit as needed to fill in the most significant bits of the number, so shifting a negative integer will produce a negative result. The unsigned right shift >>> treats the operand as an unsigned integer, filling in the high bits with zeros, so shifting a negative integer will produce a positive result.

In contrast, in C, the >> operator does both jobs, depending on how the operand was declared. If the operand is of type int, it is a signed shift, while if the operand is of type unsigned int it is an unsigned shift.

All integers in Python are represented using signed two's complement values, so Python only has signed shifts.

Also note the & or bitwise-and operator in the macro body. As in C, C++ and Java, each bit of the result is the logical and of the corresponding bits of the operands. So 11002 & 10102 = 10002.

To explore how the macro body works, suppose that the macro parameter const has the value 7654321016. The shifting operation const >> 8 throws away the least significant 8 bits giving the result 76543216, while the masking operation const & #FF throws away all but the least significant 8 bits, giving 1016. If this is confusing, recall that the mask FF16 is ...00111111112. Shifting and masking, as illustrated here, are vary common in code that must pick out and manipulate individual bits or block of bits within a word.

In the macro definition above, the equals signs used as prefixes on the macro formal parameters tell the assembler to evaluat the expression used as a parameter and pass its value. This is called passing parameters by value. Omitting the equals sign would cause the assembler substitute the entire text of the actual parameter for the formal parameter.

The LIW macro is part of the standard hawk.h header file, so most Hawk programmers will never explicitly use the ORIS instruction; instead, they will write programs using LIW to load full-word constants, as follows:

Using the macro to load a 32-bit constant
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
 00001000: E5  765432  C5        3          LIW     R5, #76543210
 00001005: 10 

When the assembler processes this code, it expands the LIW macro call to a sequence of two instructions, so when the CPU executes this code, it will see an LIS followed by an ORIS, executed in two iterations of the fetch-execute cycle. If you use a debugger to examine the contents of memory, you will see two separate instructions, not one macro instruction. In the assembly listing above, the assembler listed the object code resulting from the macro call from left to right, filling each line before starting on the next line. The object code for the LIW instruction ended up split across two lines of the listing. This has no effect on how the data is stored in memory.

The hawk.h header file includes macros that define each of the built-in instructions in the Hawk. For example, the load-immediate-long and or-immediate-and-shift instructions can be defined as follows:

Macros for two Hawk machine instructions
         
; the load-immediate-long instruction
        MACRO   LIL =dst, =const
          B     #E0 | dst
          T     const
        ENDMAC

; the or-immediate-and-shift instruction
        MACRO   ORIS =dst, =const
          B     #C0 | dst
          B     const
        ENDMAC
         

The keyword MACRO tells the assembler that the following text, up to ENDMAC, is to be substituted for the macro name whenever that name appears in the text as an instruction name. As this substitution is done, the parameters to the macro are substituted into the macro body. Macros resemble subroutines with similar terminology. We speak of macro calls and macro parameters, but unlike subroutines, macro processing is purely textual.

The macro defining the LIL instruction uses a T directive to assemble a 24-bit (three byte) constant into memory. This is the only use of the T directive in the Hawk macros.

If you look in the hawk.h file, you will see macros somewhat more complex than those presented here for LIW, LIL and ORIS. The complexity is the result of additions to prevent the use of symbols that are not register names in fields where register names are required. If all assembly language programs were error free, the code given here would suffice. Unfortunately, real programmers make errors, and the additional complexity in the hawk.h file helps detect some of these errors when a program is assembled.

Exercises

r) Give an appropriate macro definition for the Hawk LIS instruction.

s) Give an appropriate macro definition for the Hawk ADDSI instruction.

t) Give an appropriate macro definition for the Hawk ADDI instruction.

u) Give an appropriate macro definition for the Hawk LOADS instruction.

Macros in C

C and C++ also support macros, although with a very different syntax and with weaker semantics. Here are some example C macros used in a function to compute the length of a string:

Example macros in C
#define NUL '\0'
#define succ(i) ((i) + 1)

int strlen( char* s ) {
    int len = 0;
    while (s[len] != NUL) len = succ(len);
    return len;
}

Here, the macro NUL renames the character constant '\0', so wherever the identifier NUL appears in the program, it is expanded to '\0' before the compiler sees it. The macro succ has a parameter, i, so the macro call succ(len) expands to ((len) + 1).

The succ macro definiton may seem overly parenthesized. Remember that macros expansion is done by simple textual replacement. This can lead to unexpected results if the macros are defined carelessly. Suppose the body of succ had been given as i + 1 without parentheses. As a result, -succ(5) would expand to -5 + 1, giving the value –4 where you might expect –6, and succ(5)*2 would expand to 5 + 1*2, giving the value 7 where you might expect 12.

When you run a C program, you use a toolchain that begins with the C preprocessor, followed by the actual C compiler. The compiler may produce assembly language that is then assembled make object code. Finally, a linker is used to combine the object code with various libraries to make an executable. All of this is typically hidden inside the cc command, although options on the cc command let you run just parts of the toolchain. For example, cc -E runs just the preprocessor showing you what it does to your program.

The Fetch-Execute Cycle Revisited

Our original presentation of the fetch-execute cycle was quite brief, suggesting that Hawk worked as follows:

The Hawk fetch-execute cycle
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* execute that instruction */
        execute( instruction_register );
}

This is vastly oversimplified. We have covered enough ground that we can expand on this considerably. The following code, while still simplified, gives details of how the fields of the instruction are broken out and how instructions are decoded:

The Hawk cycle, in more detail
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* break out fields of the instruction register */
        opcode =          (instruction_register >>  4) & 15;
        dst =             (instruction_register      ) & 15;
        const8 =          (instruction_register >>  8)     ;
        s1 = opcode1 =    (instruction_register >> 12)     ;
        s2 = const4 = x = (instruction_register >>  8) & 15;

        /* execute the instruction */
        switch (opcode) {
                /* 16 cases, one per legal opcode value */
        }
}

In the above code, we have given no detail for each instruction, but we can easily add this. Note that many of the fields of the instruction have multiple names. For example, we used the name const4 in our description of the ADDSI instruction to refer to the same field of the instruction that we called x in the memory reference instructions or s2 in the ADD and SUB instructions.

The actual hardware for a CPU does not need separate registers to hold the different fields of the instruction register. Instead, there is just one instruction register, with wires (conducting traces on the surface of the chip) carrying the different bits to the required places. The variables opcode, const8, s1 and s2 are names for these bundles of wires and not names for registers. The assignment statements giving values to these fields do not describe real computations inside the processor, they describe how the bundles of wires are connected to the various bits of the instruction register.

Given the above, filling in the cases for instructions like ADD and LIS is quite simple:
 

Executing the ADD and LIS instructions
           
case 3: /* ADD */
        r[dst] = r[s1] + r[s2];
        break;

case 15: /* LIS */
        r[dst] = sxt( 7, const8 );
        break;
           

Execution of double-length instructions is more complex. This is the case with the LIL instruction. Only after we decode the opcode field to determine that it is an LIL instruction can we fetch the second halfword.

Other instructions are more complex because they involve additional decoding work. This is the case with the LOADS and STORES instructions. Here, we must decode the fact that the instruction is a memory reference instruction before we use the s1 field as a secondary opcode. Some of this complexity is described below:

Executing the LIL and memory reference instructions
       
case 14: /* LIL */
        const16 = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;
        r[dst] = sxt( 23, (const16 << 8) | const8 );
        break;

case 13: /* memory reference instructions */
        switch (opcode1) {
                case 13: /* LOADS */
                        r[dst] = word_memory[ r[x] ];
                        break;
                case 15: /* MOVE */
                        r[dst] = r[x];
                        break;
                /* 14 more cases */
        }
        break;
       

It is crucial to remember that, within any CPU, whether implemented using silicon or vacuum tubes, the fetch-execute algorithm is directly executed by hardware. As this hardware executes the instruction, it performs a sequence of register transfers, to move data from one register to another. With each register transfer, the data may be routed through components that transform the data, for example, adding values from different registers.

When these register transfers are done in sequence, we can describe each instruction in the machine language as being executed in a series of microsteps; sometimes, the control unit of the processor directs these microsteps using what can be described as micro instructions, a phrase that was coined at around the same time as the phrase macro instruction. Some processors even have a small and very fast read-only memory that is used to store a microprogram that describes the instruction set of that processor.

If, instead of building hardware, we take an existing computer and run a program on that computer that executes the algorithm given above, we have constructed what is called an emululator for one processor's instruction set that runs on another. Emulators can be very complex because of the need to provide support for input and output devices, but the basic idea is fairly simple. Commercial emulators are widely used in order to allow applications written for one vendor's computers to be run on computers built by other vendors, for example, there were several emulators allowing IBM PC code using the Intel Pentium architecture to be run on Apple Macintosh computers in the days when Apple was using the Power PC architecture.

Emulators are also used to allow new computers to run code that was originally compiled for older machines, for example, to allow code from the original Apple Macintosh computers that used Motorola 68000 processors to run on newer Apple Macintosh computers that use the Motorola or IBM Power PC processor. Apple went through this again during the transition from the Power PC to the Intel Pentium architecture. Some of the very first emulators were written with this use in mind. For example, when IBM came out with their System 360 processor in 1965, one of their first products was an emulator for the IBM 1401 computer, so that they could immediately begin replacing existing 1401 computers with 360 computers.

Exercises

v) Give the code for the SUB case (you will have to look it up in the Hawk manual).

w) Give the code for the ORIS case.

x) Give the code for the ADDI case -- and note which switch statement it goes in!

y) Write code for the sxt() function. This may take some thought!
 

Instruction Register Formats of some Other Machines

For the sake of comparison, here are the instruction register formats of some other historically important machines. The simplest of the DEC PDP-8 family of computers had a main memory consisting of 4096 12-bit words, and the processor contained a 12-bit program counter and a single 12-bit accumulator. The instruction register on this machine has the following format:

The PDP-8 instruction register
           
00 01 02 03 04 05 06 07 08 09 10 11
opcode i z address
           

With a 3-bit opcode field, the PDP-8 obviously has a very small instruction set. It shold be no surprise that computers with this architecture were among the least expensive computers on the market for many years in the 1960s and 1970s. Notice also that the bits of the word are numbered from the left instead of from the right. This is the bigendian system of numbering the bits.

The i and z bits specify the addressing mode. This determines how the 7-bit address field of the instruction is used. If the z bit is zero, the address refers to a location in the bottom 128 words of memory, while if it is one, the reference is to a location in the 128-word page selected by the top 5 bits of the program counter.

If the i bit is zero, the operand is in the memory word specified by the address and the z bit. This is called direct addressing. If the i bit is one, the addressed word holds the complete 12-bit address of the operand. This is called indirect addressing. The PDP-8 can directly address only 256 words of memory, page zero and the current page. Using indirect addressing, it can address the entire 4096 word address space.

The IBM System 360, introduced about the same time as the PDP-8 is the ancestor of IBM's current line of enterprise servers. This machine was at the other extreme of price and performance from the PDP-8. Like the Hawk, the 360 has 15 general prupose registers and a 32-bit word, where instructions occupy either one or two consecutive halfwords of memory. The instruction register on the 360 has the following format:

The IBM 360 instruction register
   
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
opcode r x
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
b displacement
   

Like the PDP-8, the 360 is a bigendian machine. For short instructions, the second halfword of the 360 instruction register is unused. The opcode is always 8 bits, allowing 256 different instructions. The r field is comparable in use to the dst field of a Hawk instruction. It usually designates the destination register except for store instructions.

The x field is used to designate the source register for register to register arithmetic instructions, and it designates the index register holding a memory address for short memory reference instructions. The 360 long memory reference instructions used the x and b fields to designate two registers, the index register and the base register. The contents of these are added to the 12-bit displacement to form the memory address.

Modern architectures frequently have a dismaying variety of instruction formats. The Intel 80x86 family that dominates today's marketplace is extremely messy, while the Power PC architecture is more modest, with 5 basic instruction formats, all of which are 32 bits long. With 32 registers, this machine uses 5 bits for each of the source and target register fields, and some instructions specify as many as 4 registers (typically three source registers and a target).

The Power PC instruction register
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
opcode tgt/src src/tgt immediate
opcode tgt/src src/tgt src extended opcode
opcode tgt/src src/tgt src src ext opcd Rc
opcode BO BI BD AA LK
opcode LI AA LK

The ARM (Advanced RISC Microprocessor), common in cellphones and the Raspberry Pi family of computers supports two different instruction sets. The machine has (for our purposes) 16 registers, so like the Hawk and the IBM 360, it has 4-bit fields used for register selection. The Thumb instruction set is vaguely similar to the Hawk and IBM 360, with instruction halfwords, while full 32-bit ARM instructions are comparble to the Power PC in some ways but with some interesting differences.

One big difference is that there is a condition field in the high 4 bits of every instruction that makes that instruction conditional on the result of a previous instruction. That is, it is as if every instruction is included in an if statement to see if it should do anything at all. The other big difference is that essentially all of the arithmetic operations can have one of their operands shifted arbitrarily before it is combined with the other operand.

For most instructions, 8 bits specify the opcode, although spare bits elsewhere in the instruction register (marked o in the following) are also used as opcode bits. A single instruction can, for example, add register Rs shifted left by count bits to register Rn, and store the result in Rd, but only if the previous instruction produced a positive result. Another instruction could subtract Rn from a 12-bit immediate constant and unconditionally put the result in Rd. The presentation here is vastly oversimplified:

The ARM instruction register
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
cond opcode Rn Rd count shift o Rm
cond opcode Rn Rd Rs o shift o Rm
cond opcode Rn Rd rotate immediate
cond opcode Rn Rd immediate
cond opcode branch offset