4. The Fetch Execute Cycle

Part of 22C:60, Computer Organization Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Overview

When Burks Goldstine and Von Neumann published their seminal report, Preliminary discussion of the logical design of an electronic computing instrument in 1946, they proposed a number of details inspired by and contributing to the preliminary design of the EDVAC computer then being built at the Moore School of Electrical Engineering. The basic structure of this computer followed the basic outline introduced at the start of the previous chapter:

The Von Neumann model of a computer
Central Processing Unit
CPU
 
  Registers  
 
 
            Interface(s) or Bus(ses)            
 
Memory
 
Input
Output

Within this outline, EDVAC and other early computers varied considerably! Some early machines had parallel CPUs in which all the data in any particular memory word were operated in at the same time, while others had sequential architectures in which the bits of a word were processed sequentially. Some used various early random access memory technologies for the main memory, while others used sequential memory technologies. Some used binary arithmetic, while others used decimal arithmetic, usually using 4 bits to encode each decimal digit.

All of these early machines, however, shared a very important property with modern computers! The fundamental idea of the central processing unit has not changed. At heart, almost all of the central processors designed from 1946 onward can be described in the following terms:

The program to be executed is stored in main memory, as a sequence of instructions.

One register within the central processor, today called the program counter holds the address of the next instruction of the program to be executed.

The central processor operates by repeatedly fetching and executing instructions.

To fetch and execute one instruction, the central processor fetches the contents of the memory location addressed by the program counter into the instruction register, increments the program counter so it points to the next instruction, and performs the operation indicated by the instruction register.

Since the above description is essentially the description of a procedure, we can describe the central processor of a computer in procedural terms by saying that it is hardware that implements an algorithm described in procedureal terms. Here is the algorithm:

The fetch-execute cycle
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = memory[ program_counter ];
        program_counter = program_counter + 1;

        /* execute that instruction */
        execute( instruction_register );
}

While the basic idea of the fetch-execute cycle has not changed since the dawn of electronic computing, the details have changed immensely. Early machines could only execute a few thousand instructions per second, but by the late 1960's there were machines able to execute a million instructions per second, and today we speak casually of machines that may execute billions of instructions per second.

The code given above shows the program counter being incremented by one for each instruction fetched. In fact, throughout the history of computing, computers have been made with instructions that are smaller or larger than one addressable unit of memory, and by 1960, there were computers with variable-sized instructions, where the size of the instruction depending on what it did.

The set of registers used inside the central processor has also varied considerably between different computers. In addition to the program counter and instruction register, many early computers had only one other register, the accumulator. Instructons did things like load the accumulator from memory, store its contents to memory, or adding to it. In contrast, many modern central processors have many tens of registers in their central processors.

The Hawk CPU

The Hawk processor borrows features from a number of computers designed between 1946 and 2000. Perhaps the most important machine it borrows from is the IBM System 360 designed in the mid 1960's; IBM's family of enterprose servers are direct descendants of this. Another important source is the Digital Equipment Corporation PDP-11, from which the Motorola 68000 and the Intel 8086 also borrowed; the latter, of course, is the ancestor of the Pentium.

The Hawk has 15 general purpose registers of 32 bits each, numbered 1 to 1510 or 1 to F16. Each of these can be used as accumulators to hold intermediate results during computation or as index registers to hold memory addresses. In assembly language programs, these are named R1 to R15 or RF.

In addition to the general purpose registers, the Hawk has a program counter and a special 32-bit register called the processor status word that contains a variety of special fields. The most important processor status word field holds the condition codes named N, Z, V and C. These are so important that the Hawk emulator breaks them out to be displayed separately.

Whenever you run a Hawk debugger or emulator, the internal state of the processor will be displayed as follows:
The Hawk CPU state
PC:  00000000                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00000000  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 00000000  RC: 00000000
               R5: 00000000  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In this display of the Hawk processor state, you see the program counter and the processor status word displayed on the left, with the condition code bits broken out for separate examination on the third line. The right two columns show the 15 registers, each with its name. In the Hawk debugger, the contents of all of these registers are always shown in hexadecimal, with the exception of the condition codes, which are shown in binary.

Why didn't the display shown above include the instruction register? It is there, inside the central processor, but its contents are only transient, and a typical debugger or emulator will provide a more convenient way of examining the program being executed, either by displaying the contents of memory, or better yet, by displaying a symbolic representation of the program being executed.

Generally, newcomers to machine-language and assembly-language programming should not seek in-depth information on the internals of the processor until they have seen an overview of a few of the simpler instructions. Nonetheless, additional information is always available in the reference manuals for the processor. In our case, Chapter 1 of the Hawk manual contains a section on the central processor that gives more detail on the registers, and it includes a section on the instruction execution cycle giving specifics of this cycle for the Hawk processor.

The Hawk Fetch-Execute Cycle

The fetch-execute cycle of the Hawk differs from the generic fetch-execute cycle because Hawk instructions are each made of one or two halfwords. Therefore, instruction addresses are always even, and the program counter is always incremented by either two or four after each fetch. If we ignore instructions longer than one halfword, we can therefore describe the fetch-execute cycle of the Hawk as follows:
The fetch-execute cycle
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* execute that instruction */
        execute( instruction_register );
}

To understand how the central processor of any computer works, we must examine the instruciton register in more detail. In the case of the Hawk, the instruction register is arranged as follows:

Basic fields of the Hawk instruction register
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
opcode dst

The single most important field of the instruction register is the opcode or operation-code field. This 4-bit field determines the interpretation of the other fields of the instruction, and in some cases, it completely determines the details of the instruction to be executed. A second field, the destination register field or dst is almost always used to specify one of the 15 general purpose registers, and this register is almost always used as the destination register, that is, the register in which the result of some operation will be placed.

Basic Arithmetic Instructions

For example, consider the add instruction. On the Hawk, as on many other computers, this instruction allows the contents of two registers to be added. On the Hawk, this instruction takes the following form:

The Hawk ADD instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
0 0 1 1 dst s1 s2

In the add instruction, the opcode field is 00112 and when the central processor sees this pattern, this determines that bits 0 to 3 and bits 4 to 7 of the instruction register will be used as register numbers source two and source one, s2 and s1, specifying the registers that will provide the operands to be added. The result will then be placed in the destination register before the next instruction is executed.

In short, we can use the notation r[dst]=r[s1]+r[s2] to describe what the add instruction does, and we can describe this instruciton as a three-register instruction because it references a total of three registers. In fact, the name three-register instruction is something of a misnomer; some so-called three-register instructions on the Hawk use one of the fields s1 or s2 to hold constants, not register numbers.

To illustrate the impact of executing an add instruction, consider the following initial state of the registers in the Hawk processor:

An initial Hawk CPU state
PC:  00001000                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00000000  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 01234567  RC: 00000000
               R5: 76543210  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

If memory location 100016 contains the halfword 324516, that is, an add instruction requesting that register 2 be loaded with the sum of the contents of register 4 and register 5, and then we allow the CPU to execute exactly one instruction, the state of the processor will change to:

The state after ADD R2,R4,R5
PC:  00001002                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 77777777  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 01234567  RC: 00000000
               R5: 76543210  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

Note that there are only two changes from the previous state listed above to the new state. First, the program counter has been incremented by two, since a single halfword instruction occupies 2 bytes, and second, the value of register 2 has changed. This now holds the sum of the values in registers 4 and 5.

While nothing prevents an assembly language programmer from writing H #3245 when an instruction is needed to add what is in registers 4 and 5 and put the sum in register 2. Recall that the H directive in our SMAL assembly language places a halfword in memory, and recall that the # is used as a prefix indicating that the operand is a hexadecimal number. While an assembly language programmer can write this, it is unreadable! Human readers, and for that matter, the authors of compilers, would rather not program in hexadecimal!

Assembly languages provide tools that allow you to use a symbolic notation. Our SMAL assembly language does not have any built-in knowledge of the Hawk instruction set, but as mentioned in the previous chapter, we the relevant definitions are available in a header file, and once this header is included, we can use the convenient shorthand notation ADD R2,R4,R5. Here is a listing of an assembly language program that will put the instruction 324516 in memory location 100016 using this symbolic notation:

Assembly of the ADD instruction
                             1          USE     "hawk.macs"
                             3  .       =       #1000
 001000: 3245                4          ADD     R2, R4, R5

Of course, a computer with just one instruction is unlikely to be very useful, and as you may have guessed, the 4-bit opcode field of the Hawk allows for 16 distinct instructions. There is, for example, a subtract instruction that has the same basic format ADD, and is known, naturally, as SUB. These instructions are all register-to-register instructions, so called because they take operands from registers and deliver a result in registers.

Exercises

a) What halfword does ADD R15, R12, R8 assemble to in memory.

b) What halfword does ADD R7, R9, R11 assemble to in memory.

c) If register r[i] contains i, and then ADD R7, R9, R11 is executed, what register is changed and what is its new value?

d) If register r[i] contains i, and then the instruction 324516 is executed, what register is changed and what is its new value?

More Hawk Instructions, Loading Constants

Register-to-register instructions alone are not sufficient to make a useful computer, because we also need a way to load values into registers and to store results from registers to memory. Most arithmetic operations in real programs involve small constants. For example, the most common use of addition in real programs is to increment a variable, usually by one. Because of this, most computer architectures provide a simple instruction for loading a constant into a register. Such instructions are typically called load-immediate instructions, where the term immediate constant is used to refer to a constant that is part of the program itself, as opposed to a constant stored somewhere in memory outside the program.

In the case of the Hawk architecture, several instructions include immediate constants. There are two load-immediate instructions, one for 8-bit constants and one for 24-bit constants, and there are two add-immediate instructions, one for 4-bit constants and one for 16-bit constants. The simplest of these is the load-immediate-short instruction. This has the following format:

The Hawk LIS instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 0 1 dst const8

In the load-immediate short instruction, the opcode field is 11012 and when the central processor sees this pattern, this determines that bits 0 to 7 of the instruction register will be taken as an 8-bit constant in the range -12810 to +12710. This constant is expanded to 32 bits by sign extension and stored in the register specified by the DST field of the instruction. We can use the shorthand notation r[dst]=sxt(7,const8) to describe this.

In the two's complement number system, an 8-bit constant can be expanded into a 32-bit constant by padding it, on the left, with 24 additional bits. If the original constant was positive or zero, this padding is done with zeros, while if the original was negative, it is done with ones. All of the bits added in this padding have a value equal to the sign bit of the original constant, which is why this is called sign extension. Many (but not all) Hawk instructions that include short constants sign extend those constants before using them. In our shorthand notation, the sxt() function indicates sign extension, with the first argument being the bit-number of the constant that is to be duplicated to fill all higher-numbered bits.

Of course, not all constants are short enough to fit in one byte! The load-immediate-long instruction provides for the vast bulk of larger constants, allowing for a 24-bit constant. Because the instruction register of the Hawk is only 16-bits long, this instruction is coded as two successive halfwords, as follows:

The Hawk LIL instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 1 0 dst const8 (bits 16 to 23)
const16 (bits 0 to 15)

The constant in the load-immediate long instruction is sign extended before loading it into the 32-bit destination register, so the allowed range of values is -223 to 223-1, or -8,388,608 to 8,388,607. Again, we can use the shorthand notation r[dst]=sxt(23,const) to describe this, ignoring for the moment the problem of combining the 8-bit and 16-bit parts of the constant.

To demonstrate the use of these instructions, consider the following assembly language program that loads constants into registers R4 and R5 and then adds them, storing the result in R2:
Loading and adding some constants
                             1          USE     "hawk.macs"
                             2  .       =       #1000
 001000: D412                3          LIS     R4, #12
 001002: E565  4321          4          LIL     R5, #654321
 001006: 3245                5          ADD     R2, R4, R5

Note that, in assembling this code, the assembler allocated one halfword for the load-immediate-short instruction (LIS) and two halfwords for the load-immediate-long instruction (LIL). As a result, the ADD instruction sits in the 4th halfword of the program.

If this bit of code is loaded into the memory of a Hawk computer and the program counter is set to 100016 with all of the registers equal to zero, and then the computer is started and run for 3 instruction cycles, the state of the processor will end up as:
The state after 2 loads and an add
PC:  00001008                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00654333  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 00000012  RC: 00000000
               R5: 00654321  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In this display of the Hawk processor state, the program counter has advanced to 100816, 8 bytes from the point at which the program began because the program itself occupied 4 halfwords. Registers R4 and R5 have been loaded as the instructions required, and the ADD instruciton has computed the sum of these two registers in R2.

The Hawk architecture does not provide a single-instruction for loading an arbitrary 32-bit constant into a register. Instead, there is a sequence of 2 instructions that takes up 3 halfwords of memory to load a 32-bit constant. The assembler (or rather, the hawk.macs header file) knows this sequence by the name LIW, standing for load immediate word, and generally, programmers can use this as if it was a single instruction, even though it is not.

Exercises

e) What is the most appropriate instruction to use for loading each of the following constants: 4, 40, 400, 4000, 40000, 400000?

f) What halfword does LIS R5, 44 assemble to in memory?

g) Write a SMAL Hawk program to load each register with its register number, so register r[i] ends up containing i.

Adding Constants

Studies of the frequency of different operations in real programs show that the most common operator in programs is addition, and that most addition operations add a constant to a variable. Furthermore, the most common constant in this context is one! It is fair to say that the most common statement in all programs is i=i+1. Because of this, most computer architectures provide a single instruction for incrementing the value stored in a register. The Hawk is no exception!

Of course, we could always increment a variable by first loading the constant required into a register and then adding that constant to the variable, but this takes two instructions, and incrementing is so very common that we want a single-instruction to do this. In the case of the Hawk architecture, the instruction is called add immediate, and like the load immediate instructions, there are two versions, one allowing a short constant, this time, 4 bits, and one allowing a larger constant, this time, 16 bits.

The Hawk ADDSI instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
0 0 0 1 dst 1 1 0 0 const4

We can use the shorthand notation r[dst]=r[dst]+sxt(3,const4) to describe this. Because constant in the add-short-immediate instruction is sign extended before adding it to the contents of the 32-bit destination register, the allowed range is -8 to +7. If a larger range of values is needed, the add-immediate instruction should be used. As with load-immediate-long, this is a long instruction composed of two consecutive halfwords:

The Hawk ADDI instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 1 1 dst 0 1 1 0 x
const16

We can use the shorthand notation r[dst]=r[x]+sxt(15,const16) to describe this. Unlike the ADDSI instruciton, this add-immediate instruction does not require that the source and destinations be the same register. The constant is sign extended, so the range of allowed values is -32768 to +32767.

In the rare event that larger constants are needed, these must be loaded into a register first, and then added. The following assembly language example illustrates this:

Incrementing registers by various amounts
                             1          USE     "hawk.macs"
                             2  .       =       #1000
 001000: 13C1                3          ADDSI   R3, 1        ; R3 = R3 + 1
 001002: F463  000A          4          ADDI    R4, R3, 10   ; R4 = R3 + 10
 001006: E501  86A0          5          LIL     R5, 100000
 00100A: 3554                6          ADD     R5, R5, R4   ; R5 = R4 + 100000

Exercises

h) What is the most appropriate instruction or instruction sequence to use for incrementing a register by each of the following constants: 4, 40, 400, 4000, 40000, 400000?

i) What halfword does ADDSI R5, 4 assemble to in memory?

j) Translate r1 = r2 + r3 - 5 into Hawk instructions expressed in assembly language.

k) Translate r1 = (r2 - r3) + 5 into Hawk instructions expressed in assembly language.

Memory Reference Instructions

Loading constants into registers and doing arithmetic on those constants is not enough unless all of the variables used by a program can be held in registers. Although this is occasionally true, the vast majority of programs require that some variables be held in memory. Some computer architectures allow any machine instruction to reference operands anywhere in memory or in any register in the processor. This makes the processor very complex, leading to a class of computer architectures known as complex instruction set complexity architectures or CISC architectures.

On the other extreme, some architectures, the Hawk among them, require that all arithmetic be performed by register-to-register instructions, with all use of variables in memory done by loading those variables into registers or storing the contents of registers to memory. These architectures are known as load-store architectures or reduced instruction set complexity architectures or RISC architectures.

Where a CISC architecture might allow the statement a=b+c to be executed by one very complex instruction, on a RISC architecture, if the three variables are all in memory, this statement would require two load instructions followed by an add and a store.

The Hawk architecture has 5 distinct load instructions and 3 store instructions. Only the simplest of these instructions are discussed here, instructions that load or store one register using the contents of a second register as the memory address. These are known as short-memory-reference instructions because they are encoded as a single 16-bit halfword, while some of the others have long forms requiring a sequence of two halfwords.

On the Hawk, as on many other computer architectures, we speak of registers that hold memory addresses as index registers because such a register can be described as holding the index of a particular location in memory. This makes sense if you think of memory as an array and remember that one use of the word index is as a synonym for an array subscript.

The LOADS or load short instruction, for example, loads the destination register with the 32-bits word stored at the memory location addressed by the contents of the second register, referred to as the index register. This instruction has the following form:

The Hawk LOADS instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 1 1 dst 1 1 0 1 x

The STORES or store short instruction has exactly the same format, but it moves the data in the opposite direction, from the designated register to the location pointed to by the index register.

The Hawk STORES instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 1 1 dst 1 0 1 0 x

These two instructions have identically the same opcode, 11112. This opcode value indicates that this is a memory-reference instruction and that bits 0 to 3 of the instruction register specify an index register while bits 8 to 11 specify the destination register, as usual. Bits bits 4 to 7 of the instruction register are available in this instruction format to specify which specific memory memory reference instruction is to be performed. It is reasonable to think of these bits as an extension of the opcode field of the instruction.

If we wanted to use a high-level language notation to specify the function of these instructions, we would say that the load short instruction performs the operation described by r[dst]=m[r[x]] and the store short instruction performs the operation described by m[r[x]]=r[dst]. Notice, in the latter, that the name dst is clearly a misnomer! As was mentioned in the introduction to the instruction register format, the name destination field is used because, in most instructions, this field identifies the destination register. Store instructions are the most common exception to this generalization!

The following example illustrates the use of these instructions to add the contents of the words at memory locations 1000016 and 1000416 and place the sum in location 1000816. This code begins by loading index registers with the addresses of these locations and then performing the sequence of 4 instructions that actually does the work:
Adding two variables
                             1          USE     "hawk.macs"
                             2  .       =       #1000
                             3  
                             4  ; load the addresses
 001000: E501  0000          5          LIL     R5, #10000
 001004: E601  0004          6          LIL     R6, #10004
 001008: E701  0008          7          LIL     R7, #10008 
                             8  
                             9  ; then perform the arithmetic
 00100C: F3D5               10          LOADS   R3, R5          ; load #10000
 00100E: F4D6               11          LOADS   R4, R6          ; load #10004
 001010: 3234               12          ADD     R2, R3, R4
 001012: F2A7               13          STORES  R2, R7          ; store #10008 

The idea that each load from memory or store to memory must be preceeded by loading the address of the variable into an index register is rather daunting, but note that these are only the simplest load and store instructions offered by the Hawk computer. They are good examples for a first-time Hawk programmer, but there are other load and store instructions that we will discuss later.

Exercises

l) Translate a = (b - 64) + c to Hawk assembly language code, assuming that the variables a, b and c are stored in memory at locations 1000016 through 1000B16 in that order.

m) Consider F1D516, F26116, FFFF16, F2A516 as a sequence of halfwords. What sequence of Hawk assembly language instructions, when assembled, would produce this sequence?

The Register-to-Register Move Instruction

The Hawk computer has a number of other useful instructions. For example, the move instruction moves the contents of one register to another:

The Hawk MOVE instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 1 1 dst 1 1 1 1 x

We can describe the effect of this with the notation r[dst]=r[x]. This instruction is considered a short memory reference instruction, just like LOADS because inside the processor, it performs exactly the same computation as LOADS, except that, instead of using the value of r[x] as the memory address of the operand, it uses that value as the operand.

It is interesting to note that value 11111111111111112 or FFFF16, if treated as an instruction, means move register 15 to register 15. This does nothing; such an instruction is called a no-op, short for no operation! Computers designed for use with programmable read-only memory or PROM are frequently designed so that the values 000016 and FFFF16 are no-ops. This is because each bit of a programmable read-only memory can be changed exactly once from its initial value, and all bits are initially the same. If the initial value of each word is FFFF16, any no-op in the program with this value can later be converted to a useful instruction by changing some of the bits to zero. Similarly, if 000016 is a no-op, any instruction that turns out to have been wrong can be converted to a no-op by turning the remaining one bits in that instruction to zero. This allows an existing program in PROM to be patched without the need to purchase a new PROM module.

Exercises

n) Using only one LIS instruction, followed by a sequence of MOVE instructions, write sequence of Hawk assembly language instructions that will set all 15 registers to 1.

o) Assuming you set all 15 registers in order, how many distinct solutions are there to the previous problem? Then, in how many distinct orders could you initialize the registers? Multiply these together to get the total number of distinct correct solutions to the previous problem. This question is really a discrete math question!

p) Using only one LIS instruction, followed by a sequence of ADD instructions, write sequence of Hawk assembly language instructions that will set all 15 registers so that r[i] contains i.

q) Assuming you set all 15 registers in order, how many distinct solutions are there to the previous problem? Then, in how many distinct orders could you initialize the registers? Multiply these together to get the total number of distinct correct solutions to the previous problem. This question is really a discrete math question!

Load Immediate Word and Macros

As mentioned earlier, the Hawk architecture does not include a single instruction for loading full 32-bit constants in memory. Instead, this must be done by a sequence of two instructions. In the first of these instructions, the high 24 bits of the constant are loaded using an LIL instruction. Following this, a second instruction is used to pack the low 8 bits of the constant into place. This second instruction, or-immediate shifted operates by first shifting the contents of its destination register 8 places to the left, and then oring its 8-bit constant into the low bits of the destination register:

The Hawk ORIS instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 0 0 dst const8

This can be described formally as r[dst]=(r[dst]<<8)|const8. Note that the constant is not sign extended!

The << operator has the same definition here as it has in C, C++ and Java, as well as in our SMAL assembly language. This operator shifts the binary value of its left-hand argument to the left the number of places indicated by its right-hand argument. In arithmetic terms, a<<b is the same as a×2b.

The | or bitwise-or operator is also used here as it is in C, C++ and Java. This operator produces each bit of the result as the logical or of the corresponding bits of the operands. So, 11002 or 10102 is 11102; this carefully composed example includes within it all possible combinations of 1 and 0.

While there may be rare exceptions, the vast majority of uses of the ORIS instruction will be in the context of a previous LIL instruction. For example if we wanted to load 765432102 into register 5, we could use the following instruction sequence:

Loading a 32-bit constant
                             1          USE     "hawk.macs"
                             2  .       =       #1000
 001000: E576  5432          4          LIL     R5, #765432
 001004: C510                5          ORIS    R5, #10     ; R5 = #76543210

When a sequence of instructions is used as a unit, as if it were a single instruction, that sequence is known as a macro instruction or just a macro. This terminology dates back to Autocoder, the assembly language of the IBM 705 computer developed in the mid 1950's. Most decent assembly languages designed since that time have included tools to allow the sequence of instructions making up a macro to be named and used as if they were single instructions. For example, in the SMAL assembler, the macro LIW, meaning load-immediate word, can be defined as:
Defining a macro to load a 32-bit constant
; macro to load a full 32-bit word with a constant
        MACRO   LIW =dst, =const
          LIL   dst, const >> 8
          ORIS  dst, const & #FF
        ENDMAC

In reading the above macro, note the use of the >> operator. This is the right-shift operator, and as in C, C++ and Java, this operator shifts the bit pattern representing its left-hand operand a rightward the number of places indicated by its right-hand operand. If you interpret the operands as unsigned integers, a>>b is equivalent to a/2b.

Note also the use of the & or bitwise-and operator. As in C, C++ and Java, this operator produces each bit of the result as the logical and of the corresponding bits of the operands. So, 11002 and 10102 is 10002; this carefully composed example includes within it all possible combinations of 1 and 0.

This macro definition is included in the standard hawk.macs header file, so most assembly language programmers will never explicitly use the ORIS instruction; instead, they will write programs using LIW to load full-word constants, as follows:
Using the macro to load a 32-bit constant
                             1          USE     "hawk.macs"
                             2  .       =       #1000
 001000: E576  5432          4          LIW     R5, #76543210
 001004: C510

Of course, later on, when the CPU executes this code, it will not see the macro instruction as one unit, it will execute it using two iterations of the fetch-execute cycle, and if you use a debugger to examine the contents of memory, you will see two separate instructions, not one macro instruction.

In fact, the hawk.macs header file includes macros that define each of the built-in instructions in the Hawk! For example, the load-immediate-long and or-immediate-and-shift instructions can be defined by the following two macros:
Macros for two Hawk machine instructions
; the load-immediate-long instruction
        MACRO   LIL =dst, =const
          H     #E000 | (dst << 8) | ((const >> 16) & #FF)
          H     const & #FFFF
        ENDMAC

; the or-immediate-and-shift instruction
        MACRO   ORIS =dst, =const
          H     #C000 | (dst << 8) | (const & #FF)
        ENDMAC

If you look in the actual hawk.macs file, you will see macros somewhat more complex than those presented here for the LIW, LIL and ORIS instructions, you will see macros somewhat more complex than those presented here. These macros are complicated by additions to prevent the use of any symbol that is not a register name in the fields where register names are required, and by additions to check that each of the arguments in within the correct range of values. If all assembly language programs were error free, the code given here would suffice, but real programmers are good at making errors, and the additional complexity in the hawk.macs file helps detect some of these errors when a program is assembled.

Exercises

r) Give an appropriate macro definition for the Hawk LIS instruction.

s) Give an appropriate macro definition for the Hawk ADDSI instruction.

t) Give an appropriate macro definition for the Hawk ADDI instruction.

u) Give an appropriate macro definition for the Hawk LOADS instruction.

The Fetch-Execute Cycle Revisited

Our original presentation of the fetch-execute cycle was quite brief, suggesting that the hardware of our CPU implemented something like the following algorithm:

The Hawk fetch-execute cycle
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* execute that instruction */
        execute( instruction_register );
}

We have covered enough ground that we can expand on this considerably, giving details of how the fields of the instruction are broken out and how they are decoded:

The Hawk cycle, in more detail
while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* break out fields of the instruction register */
        opcode = instruction_register >> 12;
        dst = (instruction_register >> 8) & 15;
        const8 = instruction_register & 255;
        s1 = opcode1 = (instruction_register >> 4) & 15;
        s2 = const4 = x = instruction_register >> 4;

        /* execute the instruction */
        switch (opcode) {
                /* 16 cases, one per legal opcode value */
        }
}

In the above code, we have given no detail for each instruction, but we can easily add this. Note that many of the fields of the instruction have multiple names. For example, we used the name const4 in our description of the ADDSI instruction to refer to the same field of the instruction that we called x in the memory reference instructions or s2 in the ADD and SUB instructions. The actual hardware for a CPU does not need separate physical registers to hold the different fields of the instruction register. Instead, there is just one instruction register, with wires (or conducting traces on the surface of the silicon chip) carrying the required bits to the required places. Thus, although we have written several assignment statements to describe how we break apart these fields, these do not describe computations within the physical processor of a real machine.

Given the above, filling in the cases for some instructions is quite simple:
Executing the ADD and LIS instructions
case 3: /* ADD */
        r[dst] = r[s1] + r[s2];
        break;
case 13: /* LIS */
        r[dst] = sxt( 7, const8 );
        break;

Execution of other instructions is more complex because they involve two halfwords, as is the case with the LIL instruction. Others are more complex because they involve additional decoding work, as is the case with the LOADS and STORES instructions. Some of this complexity is described in the next figure:
Executing the LIL and memory reference instructions
case 14: /* LIL */
        const16 = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;
        r[dst] = sxt( 23, (const8 << 16) | const16 );
        break;
case 13: /* memory reference instructions */
        switch (opcode1) {
                case 13: /* LOADS */
                        r[dst] = word_memory[ r[x] ];
                        break;
                case 15: /* MOVE */
                        r[dst] = r[x];
                        break;
                /* 14 more cases */
        }
        break;

It is crucial to remember that, within a real processor, whether it is implemented in silicon or vacuum tubes, the algorithm descrubed by the code given here is directly executed by the hardware! As this hardware executes the instruction, it performs a sequence of register transfers, that is, transfers of data from one register to another. With each register transfer, the data may flow through a component that transforms the data, for example, by adding the value from one register to the value from another.

When these register transfers are done in sequence, we can describe each instruction in the machine language as being executed in a series of microsteps; sometimes, the control unit of the processor directs these microsteps using what can be described as micro instructions, a phrase that was coined at around the same time as the phrase macro instruction. Some processors even have a small and very fast read-only memory that is used to store a microprogram that describes the instruction set of that processor.

If, instead of building hardware, we take an existing computer and run a program on that computer that executes the algorithm given above, we have constructed what is called an emululator for one processor's instruction set that runs on another. Emulators can be very complex because of the need to provide support for input and output devices, but the basic idea is fairly simple. Commercial emulators are widely used in order to allow applications written for one vendor's computers to be run on computers built by other vendors, for example, there are several emulators allowing IBM PC code to be run on Apple Macintosh computers.

Emulators are also used to allow new computers to run code that was originally compiled for older machines, for example, to allow code from old Apple Macintosh computers that used Motorola 68000 processors to run on new Apple Macintosh computers that use the Motorola or IBM Power PC processor. Some of the very first emulators were written with this use in mind. For example, when IBM came out with their System 360 processor in 1965, one of their first products was an emulator for the IBM 1401 computer, so that they could immediately begin replacing existing 1401 computers with 360 computers. On some System 360 processors, this emulation was done by custom microcode for the 360 processor, but on others, it was done entirely by 360 machine language code, because only some versions of the 360 processor were microprogrammable.

Exercises

v) Give the code for the SUB case (you will have to look it up in the Hawk manual).

w) Give the code for the ORIS case.

x) Give the code for the ADDI case -- and note which switch statement it goes in!

y) Write code for the sxt() function. This may take some thought!