4. The Fetch Execute Cycle

Part of 22C:60, Computer Organization Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Overview

When Burks Goldstine and Von Neumann wrote their seminal report, Preliminary discussion of the logical design of an electronic computing instrument in 1946, they outlined many details aspects of the design of the EDVAC computer then being built at the Moore School of Electrical Engineering. The basic structure of this computer followed the outline given at the start of the previous chapter:

The Von Neumann model of a computer

Central Processing Unit
CPU

Registers

Interface(s) or Bus(ses)

Memory
Input
Output

Within this outline, EDVAC and other early computers varied considerably! Some had parallel CPUs in which all the data in any particular memory word were operated on at the same time. Others had serial architectures in which the bits of a word were processed sequentially. Some used various early random access memory technologies for the main memory. Others used sequential memory technologies. Some used binary arithmetic, Others used decimal arithmetic, usually using 4 bits to encode each digit.

All of these early machines shared a very important property with modern computers! The basic idea of the central processing unit has not changed. At heart, almost all of the central processors designed from 1946 onward can be described in the following terms:

The program to be executed is stored in main memory, as a sequence of instructions.
One register within the central processor, today called the program counter holds the address of the next instruction of the program to be executed.
The central processor operates by repeatedly fetching and executing instructions.
To fetch and execute an instruction, the central processor fetches the memory location addressed by the program counter into the instruction register, increments the program counter to point to the next instruction, and executes the contents of the instruction register.

Since the above description is essentially the description of a procedure, we can describe the central processor of a computer in procedural terms by saying that it is hardware that implements an algorithm described in procedureal terms. Here is the algorithm:

The fetch-execute cycle

while (TRUE) do { /* fetch one instruction */ instruction_register = memory[ program_counter ]; program_counter = program_counter + 1; /* execute that instruction */ execute( instruction_register ); }

**The fetch-execute cycle**
while (TRUE) do { /* fetch one instruction / instruction_register = memory[ program_counter ]; program_counter = program_counter + 1; / execute that instruction */ execute( instruction_register ); }

While the basic idea of the fetch-execute cycle has not changed since the dawn of electronic computing, the details have changed immensely. Early machines could only execute a few thousand instructions per second, but by the late 1960's there were machines able to execute a million instructions per second, and today we speak casually of machines that may execute billions of instructions per second.

The code given above shows the program counter being incremented by one for each instruction fetched. In fact, throughout the history of computing, computers have been made with instructions that are smaller or larger than one addressable unit of memory, and by 1960, there were computers with variable-sized instructions, where the size of the instruction depending on what it did.

The set of registers used inside the central processor has also varied considerably between different computers. In addition to the program counter and instruction register, many early computers had only one other register, the accumulator. Instructons did things like load the accumulator from memory, store its contents to memory, or adding to it. In contrast, many modern central processors have many tens of registers in their central processors.

The Hawk CPU

The Hawk processor borrows features from a number of computers designed between 1946 and the present. Perhaps the most important machine it borrows from is the IBM System 360 designed in the mid 1960's; IBM's family of enterprose servers are direct descendants of this. Another important source is the Digital Equipment Corporation PDP-11, a 16-bit machine from which the Motorola 68000 and the Intel 8080 and 8086 also borrowed many ideas; the Intel 8086 is the ancestor of the Pentium family.

The Hawk has 15 general purpose registers of 32 bits each, numbered 1 to 15₁₀ or 1 to F₁₆. Each of these can be used as accumulators to hold intermediate results during computation or as index registers to hold memory addresses. In assembly language programs, these are named R1 to R15 or RF. Early computers had only a single accumulator within the CPU. Several modern computers have 32 or more general purpose registers.

In addition to the general purpose registers, the Hawk has a program counter and a special 32-bit register called the processor status word that contains a variety of special fields. The most important processor status word field holds the condition codes named N, Z, V and C. These are so important that the Hawk emulator breaks them out to be displayed separately.

Whenever you run a Hawk debugger or emulator, the internal state of the processor will be displayed as follows:

The Hawk CPU state

PC: 00000000 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 00000000 RA: 00000000 R3: 00000000 RB: 00000000 R4: 00000000 RC: 00000000 R5: 00000000 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

**The Hawk CPU state**
PC: 00000000 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 00000000 RA: 00000000 R3: 00000000 RB: 00000000 R4: 00000000 RC: 00000000 R5: 00000000 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

In this display of the Hawk processor state, the program counter and processor status word are displayed on the left, with the condition code bits broken out on the third line. The right two columns show the 15 registers, each with its name. In the Hawk debugger, the contents of all of these registers are always shown in hexadecimal, with the exception of the condition codes, which are shown in binary.

Why didn't the display shown above include the instruction register? It is there, inside the central processor, but its contents are only transient, and a typical debugger or emulator will provide a more convenient way of examining the program being executed, either by displaying the contents of memory, or better yet, by displaying a symbolic representation of the code.

Newcomers to machine-language and assembly-language programming should not seek in-depth information on the internals of the processor until they have seen an overview of a few of the simpler instructions. Nonetheless, additional information is always available in the reference manuals for the processor. In our case, Chapter 1 of the Hawk manual contains a section on the central processor that gives more detail on the registers, and it includes a section on the instruction execution cycle giving specifics of this cycle for the Hawk processor.

The Hawk Fetch-Execute Cycle

The fetch-execute cycle of the Hawk differs from the generic fetch-execute cycle because Hawk instructions are each made of one or two halfwords. Therefore, instruction addresses are always even, and the program counter is always incremented by either two or four after each fetch. If we ignore instructions longer than one halfword, we can therefore describe the fetch-execute cycle of the Hawk as follows:

The fetch-execute cycle

while (TRUE) do { /* fetch one instruction */ instruction_register = halfword_memory[ program_counter ]; program_counter = program_counter + 2; /* execute that instruction */ execute( instruction_register ); }

**The fetch-execute cycle**
while (TRUE) do { /* fetch one instruction / instruction_register = halfword_memory[ program_counter ]; program_counter = program_counter + 2; / execute that instruction */ execute( instruction_register ); }

To understand how the central processor of any computer works, we must examine the instruciton register in more detail. In the case of the Hawk, the instruction register is arranged as follows:

Organization of the Hawk instruction register

07 06 05 04 03 02 01 00

opcode dst

15 14 13 12 11 10 09 08

First and foremost, the instruction register is best viewed as two consecutive bytes of one halfword. The bit numbers shown above are the bit numbers in the halfword, but the first byte of the halfword is shown on top of the second, to emphasize the order in which the bytes are stored in memory. In much of the Hawk documentation, the two halves of the halfword are shown as follows, deliberately placed in the wrong order in order to allow left-to-right readers to see the opcode field as the most prominent:

Organization of the Hawk instruction register

07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

opcode dst

The single most important field of the instruction register is the opcode or operation-code field. This 4-bit field determines the interpretation of the other fields of the instruction, and in some cases, it completely determines the details of the instruction to be executed. A second field, the destination register field or dst is almost always used to specify one of the 15 general purpose registers, and this register is almost always used as the destination register, that is, the register in which the result of some operation will be placed.

Basic Arithmetic Instructions

Consider the add instruction. On the Hawk, as on many other computers, this instruction allows the contents of two registers to be added. On the Hawk, this instruction takes the following form:

The Hawk ADD instruction
07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

0 0 1 1 dst s1 s2

**The Hawk ADD instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

0 0 1 1	dst	s1	s2

In the add instruction, the opcode field is 0011₂ and when the central processor sees this pattern, this determines that bits 8 to 11 and bits 12 to 15 of the instruction register will be used as register numbers source two and source one, s2 and s1, specifying the registers that will provide the operands to be added. The result will then be placed in the destination register before the next instruction is executed.

In short, we can use the notation r[dst]=r[s1]+r[s2] to describe what the add instruction does, or rather, what the CPU does when it processes the add instruction. We can describe this instruciton as a three-register instruction because it references a total of three registers. The name three-register instruction is something of a misnomer because some Hawk instructions with this format use s1 or s2 to hold constants instead of register numbers.

To illustrate the impact of executing an add instruction, consider the following initial state of the registers in the Hawk processor:

An initial Hawk CPU state

PC: 00001000 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 00000000 RA: 00000000 R3: 00000000 RB: 00000000 R4: 01234567 RC: 00000000 R5: 76543210 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

**An initial Hawk CPU state**
PC: 00001000 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 00000000 RA: 00000000 R3: 00000000 RB: 00000000 R4: 01234567 RC: 00000000 R5: 76543210 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

In the above initial state, the program counter is set to 1000₁₆ so the next time the processor tries to fetch an instruction, it will be from the halfword in memory occupying memory addresses 1000₁₆ and 1001₁₆. Consider what will happen if this halfword of memory has the value 4532₁₆. The first byte of this value is 32₁₆, with initial hexadecimal digit being 0011₂ in binary. This tells the CPU that this is an add instruction to compute the sum of the contents of register 4 and register 5 and store the result in register 2. If we allow the CPU to execute exactly one instruction in this context, the state of the processor will change to:

The state after ADD R2,R4,R5

PC: 00001002 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 77777777 RA: 00000000 R3: 00000000 RB: 00000000 R4: 01234567 RC: 00000000 R5: 76543210 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

**The state after ADD R2,R4,R5**
PC: 00001002 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 77777777 RA: 00000000 R3: 00000000 RB: 00000000 R4: 01234567 RC: 00000000 R5: 76543210 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

Note that there are only two changes from the initial state to the new state given above. First, the program counter has been incremented by two, since a single halfword instruction occupies 2 bytes, and second, the value of register 2 has changed. This now holds the sum of the values in registers 4 and 5.

Nothing prevents an assembly language programmer from writing H #4532 to put the same add instruction into memory. Recall that the H directive in our SMAL assembly language places a halfword in memory, and recall that the # is used as a prefix indicating that the operand is a hexadecimal number. While an assembly language programmer can write this, it is not very readable! Human readers, and for that matter, the authors of compilers, would rather not program in hexadecimal!

Assembly languages provide tools that allow you to use a symbolic notation. Our SMAL assembly language does not have any built-in knowledge of the Hawk instruction set, but as mentioned in the previous chapter, the relevant definitions are available in a header file. Once this header is included, we can use the convenient shorthand notation ADD R2,R4,R5. Here is a listing of an assembly language program that will put the instruction 3245₁₆ in memory location 1000₁₆ using this symbolic notation:

Assembly of the ADD instruction

1 USE "hawk.h" 2 . = #1000 00001000: 32 45 3 ADD R2, R4, R5

**Assembly of the ADD instruction**
1 USE "hawk.h" 2 . = #1000 00001000: 32 45 3 ADD R2, R4, R5

Of course, a computer with just one instruction is unlikely to be very useful, and as you may have guessed, the 4-bit opcode field of the Hawk allows for 16 distinct instructions. There is, for example, a subtract instruction that has the same basic format ADD, and is known, naturally, as SUB. These arithmetic instructions are all register-to-register instructions, so called because they take operands from registers and deliver a result in registers. There are also memory-reference instructions that transfer data to and from memory, and immediate instructions that have some operands that are constants taken from fields on the instruction register.

Exercises

a) What halfword does ADD R15, R12, R8 assemble to in memory.
b) What halfword does ADD R7, R9, R11 assemble to in memory.
c) If every register r[i] contains the value i, and then ADD R7, R9, R11 is executed, what register is changed and what is its new value?
d) If every register r[i] contains the value i, and then the instruction 4532₁₆ is executed, what register is changed and what is its new value?

More Hawk Instructions, Loading Constants

Register-to-register instructions alone are not sufficient to make a useful computer, because we also need a way to load values into registers and to store results from registers to memory. Most arithmetic operations in real programs involve small constants. For example, the most common use of addition in real programs is to increment a variable, usually by one. Because of this, most computer architectures provide a simple instruction for loading a constant into a register. Such instructions are typically called load-immediate instructions, where the term immediate constant is used to refer to a constant that is part of the program itself, as opposed to a constant stored somewhere in memory outside the program.

In the case of the Hawk architecture, several instructions include immediate constants. There are two load immediate instructions, one for 8-bit constants and one for 24-bit constants, and there are two add immediate instructions, one for 4-bit constants and one for 16-bit constants. The simplest of these is the load immediate short instruction. This has the following format:

The Hawk LIS instruction
07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 0 1 dst const8

**The Hawk LIS instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 0 1	dst	const8

In the load-immediate short instruction, the opcode field is 1101₂ and when the central processor sees this pattern, this determines that bits 0 to 7 of the instruction register will be taken as an 8-bit constant in the range -128₁₀ to +127₁₀. This constant is expanded to 32 bits by sign extension and stored in the register specified by the dst field of the instruction. We can use the shorthand notation r[dst]=sxt(7,const8) to describe this.

In our shorthand notation, the sxt() function indicates sign extension, with the first argument being the bit-number of the bit that is to be duplicated to fill all higher-numbered bits. In the two's complement number system, an 8-bit constant can be expanded into a 32-bit constant by padding it, on the left, with 24 additional bits. These additional bits are copies of the most significant bit of the original short constand. If the constant was positive, they are all zero. If negative, they are all one. Many (but not all) Hawk instructions that include short constants sign extend those constants before using them.

Of course, not all constants are short enough to fit in one byte. The load immediate long instruction allows any 24-bit signed constant. Because the instruction register of the Hawk is only 16-bits long, this instruction is coded as two successive halfwords, as follows:

The Hawk LIL instruction

07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 1 0 dst const8 (bits 0-7)

15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

const16 (bits 8-23)

The constant in the load-immediate long instruction is sign extended before loading it into the 32-bit destination register, so the allowed range of values is -2²³ to 2²³-1, or -8,388,608 to 8,388,607. Again, we can use the shorthand notation r[dst]=sxt(23,const) to describe this, ignoring for the moment the problem of combining the 8-bit and 16-bit parts of the constant.

To demonstrate the use of these instructions, consider the following assembly language program that loads constants into registers R4 and R5 and then adds them, storing the result in R2:

Loading and adding some constants

1 USE "hawk.h" 2 . = #1000 00001000: D4 12 3 LIS R4, #12 00001002: E5 654321 4 LIL R5, #654321 00001006: 32 45 5 ADD R2, R4, R5

**Loading and adding some constants**
1 USE "hawk.h" 2 . = #1000 00001000: D4 12 3 LIS R4, #12 00001002: E5 654321 4 LIL R5, #654321 00001006: 32 45 5 ADD R2, R4, R5

Note that, in assembling this code, the assembler allocated one halfword for the load-immediate-short instruction LIS and two halfwords for the load-immediate-long instruction LIL. As a result, the ADD instruction sits in the 4th halfword of the program. In the assembly listing, each single-halfword instruction is given in hexadecimal as two consecutive bytes, while the long LIL instruction is shown as a single byte followed by a triple-byte quantity holding the immediate constant.

If this bit of code is loaded into the memory of a Hawk computer and the program counter is set to 1000₁₆ with all of the registers equal to zero, and then the computer is started and run for 3 instruction cycles, the state of the processor will end up as:
The state after 2 loads and an add

PC: 00001008 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 00654333 RA: 00000000 R3: 00000000 RB: 00000000 R4: 00000012 RC: 00000000 R5: 00654321 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

**The state after 2 loads and an add**
PC: 00001008 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 00654333 RA: 00000000 R3: 00000000 RB: 00000000 R4: 00000012 RC: 00000000 R5: 00654321 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

In this new processor state, the program counter has advanced to 1008₁₆, 8 bytes from the point at which the program began because the program itself occupied 4 halfwords. Registers R4 and R5 have been loaded as the instructions required, and the ADD instruciton has computed the sum of these two registers in R2.

The Hawk architecture does not have an instruction for loading an arbitrary 32-bit constant. Instead, there is a sequence of 2 instructions that takes up 3 halfwords of memory to load a 32-bit constant. The assembler (or rather, the hawk.h header file) knows this sequence by the name LIW, standing for load immediate word, and generally, programmers can use this as if it was a single instruction, even though it is not.

Exercises

e) What is the most appropriate instruction to use for loading each of the following constants: 4, 40, 400, 4000, 40000, 400000?
f) What halfword does LIS R5, 44 assemble to in memory?
g) Write a SMAL Hawk program to load each register with its register number, so register r[i] ends up containing i.

Adding Constants

Studies of the frequency of different operations in real programs show that the most common operator in programs is addition, and that most addition operations add a constant to a variable. Furthermore, the most common constant in this context is one! It is fair to say that the most common statement in all programs is i=i+1. Because of this, most computer architectures provide a single instruction for incrementing the value stored in a register. The Hawk is no exception!

Of course, we could always increment a variable by first loading the constant required into a register and then adding that constant to the variable, but this takes two instructions. This is exactly how we do it when the constant we are adding is very large, but for small constants, we use an add immediate instruction. In the case of the Hawk architecture, there are two versions, add short immediate or ADDSI and the more general add immediate. On the Hawk, the short immediate version uses a 4 bit constant, while the more general version uses 16 bits.

The Hawk ADDSI instruction
07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

0 0 0 1 dst 1 1 0 0 const4

**The Hawk ADDSI instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

0 0 0 1	dst	1 1 0 0	const4

We can use the shorthand notation r[dst]=r[dst]+sxt(3,const4) to describe this. Because constant in the add short immediate instruction is sign extended before adding it to the contents of the 32-bit destination register, the allowed range is -8 to +7. If a larger range of values is needed, the add immediate instruction should be used. As with LIL, this is a long instruction composed of two consecutive halfwords:

The Hawk ADDI instruction

07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 1 1 dst 0 1 1 0 x

15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

const16

We can use the shorthand notation r[dst]=r[x]+sxt(15,const16) to describe this. Unlike ADDSI, ADDI does not require that the source and destinations be the same register. The constant is sign extended, so the range of allowed values is -32768 to +32767.

In the rare event that larger constants are needed, these must be loaded into a register first, and then added. The following assembly language example illustrates this:

Incrementing registers by various amounts

1 USE "hawk.h" 2 . = #1000 00001000: 13 C1 3 ADDSI R3, 1 ; R3=R3+1 00001002: F4 63 000A 4 ADDI R4, R3, 10 ; R4=R3+10 00001006: E5 0186A0 5 LIL R5, 100000 0000100A: 35 54 6 ADD R5, R5, R4 ; R5=R4+100000

**Incrementing registers by various amounts**
1 USE "hawk.h" 2 . = #1000 00001000: 13 C1 3 ADDSI R3, 1 ; R3=R3+1 00001002: F4 63 000A 4 ADDI R4, R3, 10 ; R4=R3+10 00001006: E5 0186A0 5 LIL R5, 100000 0000100A: 35 54 6 ADD R5, R5, R4 ; R5=R4+100000

Exercises

h) What is the most appropriate instruction or instruction sequence to use for incrementing a register by each of the following constants: 4, 40, 400, 4000, 40000, 400000?
i) What halfword does ADDSI R5, 4 assemble to in memory?
j) Translate r1 = r2 + r3 - 5 into Hawk instructions expressed in assembly language.
k) Translate r1 = (r2 - r3) + 5 into Hawk instructions expressed in assembly language.

Memory Reference Instructions

Loading constants into registers and doing arithmetic on those constants is not enough unless all of the variables used by a program can be held in registers. Although this is occasionally true, the vast majority of programs require that some variables be held in memory. Some computer architectures allow any machine instruction to reference operands anywhere in memory or in any register in the processor. This makes the processor very complex, leading to a class of computer architectures known as complex instruction set complexity architectures or CISC architectures.

On the other extreme, some architectures, the Hawk among them, require that all arithmetic be performed by register-to-register instructions, with all use of variables in memory done by loading those variables into registers or storing the contents of registers to memory. These architectures are known as load-store architectures or reduced instruction set complexity architectures or RISC architectures.

Where a CISC architecture might allow the statement a=b+c to be executed by one very complex add instruction, on a RISC architecture, if the three variables are all in memory, we need two load instructions to get the values of the variables into registers, then an add instruction to add the values, and finally a store instruction to put the result back in memory.

The Hawk architecture has 5 distinct load instructions and 3 store instructions. Only the simplest of these are discussed here, these load or store one register using the contents of a second register as the memory address. These are known as short-memory-reference instructions because they are encoded as a single 16-bit halfword. Some of the others have long forms made up of of two consecutive halfwords.

On the Hawk and many other computers, registers that hold memory addresses are called index registers. This makes sense if you think of memory as an array and remember that one use of the word index is as a synonym for an array subscript.

The LOADS or load short instruction, for example, loads the destination register from the 32-bits word stored at the memory location addressed by the contents of the second register, referred to as the index register. This instruction has the following form:

The Hawk LOADS instruction
07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 1 1 dst 1 1 0 1 x

**The Hawk LOADS instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 1 1	dst	1 1 0 1	x

The STORES or store short instruction has exactly the same format, but it moves the data in the opposite direction, from the destination register to the location pointed to by the index register. Note the abuse of the term destination register in the context of this instruction!

The Hawk STORES instruction
07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 1 1 dst 1 0 1 0 x

**The Hawk STORES instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 1 1	dst	1 0 1 0	x

These two instructions have identically the same opcode, 1111₂. This opcode value indicates that this is a memory-reference instruction and that bits 0 to 3 of the instruction register specify an index register while bits 8 to 11 specify the destination register, as usual. Bits 4 to 7 of the instruction register are available in this instruction format to specify which specific memory memory reference instruction is to be performed. It is reasonable to think of these bits as an extension of the opcode field of the instruction.

In a high-level language, the operation of LOADS can be described as r[dst]=m[r[x]]. Similarly, STORES can be described as m[r[x]]=r[dst]. As was mentioned in the introduction to the instruction register format, the name destination field is used because, in most instructions, this field identifies the destination register. The Hawk store instructions are exceptions to this generalization!

The following example shows the use of these instructions to add the contents of in memory words 10000₁₆ and 10004₁₆ and put the sum in word 10008₁₆. This begins by loading index registers with the addresses of the 3 locations before the sequence of 4 instructions that actually does the work:

Adding two variables

1 USE "hawk.h" 2 . = #1000 3 4 ; load the addresses 00001000: E5 010000 5 LIL R5, #10000 00001004: E6 010004 6 LIL R6, #10004 00001008: E7 010008 7 LIL R7, #10008 8 9 ; then perform the arithmetic 0000100C: F3 D5 10 LOADS R3, R5 ; from #10000 0000100E: F4 D6 11 LOADS R4, R6 ; from #10004 00001010: 32 34 12 ADD R2, R3, R4 00001012: F2 A7 13 STORES R2, R7 ; to #10008

**Adding two variables**
1 USE "hawk.h" 2 . = #1000 3 4 ; load the addresses 00001000: E5 010000 5 LIL R5, #10000 00001004: E6 010004 6 LIL R6, #10004 00001008: E7 010008 7 LIL R7, #10008 8 9 ; then perform the arithmetic 0000100C: F3 D5 10 LOADS R3, R5 ; from #10000 0000100E: F4 D6 11 LOADS R4, R6 ; from #10004 00001010: 32 34 12 ADD R2, R3, R4 00001012: F2 A7 13 STORES R2, R7 ; to #10008

The idea that each load from memory or store to memory must be preceeded by loading the address of the variable into an index register is rather daunting, but note that these are only the simplest load and store instructions offered by the Hawk computer. They are good examples for a first-time Hawk programmer, but there are other load and store instructions that we will discuss later.

Exercises

l) Translate a = (b - 64) + c to Hawk assembly code, assuming that the variables a, b and c are stored in memory at words 10000₁₆ through 10008₁₆ in that order.
m) Consider F1D5₁₆, F261₁₆, FFFF₁₆, F2A5₁₆ as a sequence of halfwords. What sequence of Hawk assembly language instructions, when assembled, would produce this sequence?

The Register-to-Register Move Instruction

The Hawk computer has a number of other useful instructions. For example, the move instruction moves the contents of one register to another:

The Hawk MOVE instruction
07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 1 1 dst 1 1 1 1 x

**The Hawk MOVE instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 1 1	dst	1 1 1 1	x

We can describe the effect of this with the notation r[dst]=r[x]. This instruction is considered a short memory reference instruction, just like LOADS because inside the processor, it performs exactly the same computation as LOADS, but instead of using the value of r[x] as the memory address of the operand, it uses that value as the operand itself.

Note that value 1111111111111111₂ or FFFF₁₆, when treated as an instruction, means move register 15 to register 15, doing nothing. Such an instruction is called a no-op, short for no operation. Processors designed for use with programmable read-only memory or PROM are frequently designed so that the values 0000₁₆ and FFFF₁₆ are no-ops. This is because each bit of a PROM can be changed exactly once from its initial value, and all bits are initially the same, either 0 or 1. If the initial value is FFFF₁₆, such no-ops can later be changed to useful instructions by changing some of the ones to zeros. If 0000₁₆ is also a no-op, erroneous instruction can be converted to no-ops by turning the remaining ones to zeros. This allows a program in PROM to be patched without the need to replace the PROM chip.

Exercises

n) Using only one LIS instruction and a sequence of MOVE instructions, write sequence of Hawk assembly language instructions that will set all 15 registers to 1.
o) How many distinct distinct correct answers are there to the previous problem? This is really a discrete math question!
p) Using only one LIS instruction and a sequence of ADD instructions, write sequence of Hawk assembly language instructions that will set all 15 registers so that r[i] contains i.
q) Assuming you set all 15 registers in order, how many distinct solutions are there to the previous problem? If you eliminate the order constraint, in how many distinct solutions are there? This question is harder than the similar one above.

Load Immediate Word and Macros

As mentioned earlier, the Hawk architecture does not include a single instruction for loading full 32-bit constants. Instead, this is done with a two instruction sequence. The first is an LIL instruction to load the high 24 bits of the constant. The second instruction packs the low 8 bits of the constant into the destination register. This is done with the or-immediate shifted instruction. This first shifts the contents of the destination register 8 places left, then it ors an 8-bit constant into the destination:

The Hawk ORIS instruction
07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 0 0 dst const8

**The Hawk ORIS instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 0 0	dst	const8

This can be described formally as r[dst]=(r[dst]<<8)|const8. Note that the constant is not sign extended!

The << operator has the same definition here as it has in C, C++ and Java, as well as in our SMAL assembly language. This operator shifts the binary value of its left-hand argument to the left the number of places indicated by its right-hand argument, filling zeros into the left end of the word. In arithmetic terms, the result of this is that, a<<b is the same as a×2^b.

The | or bitwise-or operator is also used here as it is in C, C++ and Java. This operator produces each bit of the result as the logical or of the corresponding bits of the operands. So, 1100₂ or 1010₂ is 1110₂; this carefully composed example includes within it all possible combinations of 1 and 0.

While there may be rare exceptions, the vast majority of uses of the ORIS instruction will be in the context of a previous LIL instruction. For example if we wanted to load 76543210₂ into register 5, we could use the following instruction sequence:

Loading a 32-bit constant

1 USE "hawk.h" 2 . = #1000 00001000: E5 765432 3 LIL R5, #765432 00001004: C5 10 4 ORIS R5, #10 ; R5 = #76543210

**Loading a 32-bit constant**
1 USE "hawk.h" 2 . = #1000 00001000: E5 765432 3 LIL R5, #765432 00001004: C5 10 4 ORIS R5, #10 ; R5 = #76543210

When a sequence of instructions is used as a unit, as if it were a single instruction, that sequence is known as a macro instruction or just a macro. This terminology dates back to Autocoder, the assembler for the IBM 705 computer from the mid 1950's. Most assembly languages designed since that time have included tools to package macros so that they can be used as if they were single instructions. In the SMAL assembler, the macro LIW, meaning load-immediate word, can be defined as:

Defining a macro to load a 32-bit constant

MACRO LIW =dst, =const LIL dst, const >> 8 ORIS dst, const & #FF ENDMAC

**Defining a macro to load a 32-bit constant**
	MACRO LIW =dst, =const LIL dst, const >> 8 ORIS dst, const & #FF ENDMAC

In reading the above macro, note the use of the >> operator. This is the right-shift operator, and as in C, C++ and Java, it shifts the bit pattern from its left-hand operand rightward the number of places indicated by its right-hand operand, filling the most significant bits with zeros. If you interpret the operands as unsigned integers, a>>b is equivalent to a/2^b.

Note also the use of the & or bitwise-and operator. As in C, C++ and Java, each bit of the result is the logical and of the corresponding bits of the operands. So, 1100₂ and 1010₂ is 1000₂.

This macro definition is included in the standard hawk.h header file, so most assembly language programmers will never explicitly use the ORIS instruction; instead, they will write programs using LIW to load full-word constants, as follows:

Using the macro to load a 32-bit constant

1 USE "hawk.h" 2 . = #1000 00001000: E5 765432 C5 3 LIW R5, #76543210 00001005: 10

**Using the macro to load a 32-bit constant**
1 USE "hawk.h" 2 . = #1000 00001000: E5 765432 C5 3 LIW R5, #76543210 00001005: 10

Of course, later on, when the CPU executes this code, it will not see the macro instruction as one unit, it will execute it using two iterations of the fetch-execute cycle, and if you use a debugger to examine the contents of memory, you will see two separate instructions, not one macro instruction. In the assembly listing above, the assembler listed the output from the two different instructions from left to right, filling each line before starting on the next line. The second instruction, the LIW instruction, ended up split across two lines of the listing. This has no effect on how the data is stored in memory.

The hawk.h header file includes macros that define each of the built-in instructions in the Hawk. For example, the load-immediate-long and or-immediate-and-shift instructions can be defined as follows:

Macros for two Hawk machine instructions

; the load-immediate-long instruction MACRO LIL =dst, =const B #E0 | dst T const ENDMAC ; the or-immediate-and-shift instruction MACRO ORIS =dst, =const B #C0 | dst B const ENDMAC

**Macros for two Hawk machine instructions**
	; the load-immediate-long instruction MACRO LIL =dst, =const B #E0 \| dst T const ENDMAC ; the or-immediate-and-shift instruction MACRO ORIS =dst, =const B #C0 \| dst B const ENDMAC

The keyword MACRO tells the assembler that the following text, up to ENDMAC, is to be substituted for the macro name, whenever that name appears in the opcode field of an instruction. As this substitution is done, the parameters given to the macro are substituted into the text. Macros resemble subroutines, and the terminology is frequently similar. We speak of macro calls and macro parameters, but unlike subroutines, macro processing is purely textual.

The macro defining the LIL instruction uses T directive to assemble a 24-bit (three byte) constant into memory. This is the only use of the T directive in the Hawk macros.

If you look in the actual hawk.h file, you will see macros somewhat more complex than those presented here for the LIW, LIL and ORIS instructions. These macros are complicated by additions to prevent the use of any symbol that is not a register name in the fields where register names are required. If all assembly language programs were error free, the code given here would suffice, but real programmers are good at making errors, and the additional complexity in the hawk.h file helps detect some of these errors when a program is assembled.

Exercises

r) Give an appropriate macro definition for the Hawk LIS instruction.
s) Give an appropriate macro definition for the Hawk ADDSI instruction.
t) Give an appropriate macro definition for the Hawk ADDI instruction.
u) Give an appropriate macro definition for the Hawk LOADS instruction.

The Fetch-Execute Cycle Revisited

Our original presentation of the fetch-execute cycle was quite brief, suggesting that the hardware of our CPU implemented something like the following algorithm:

The Hawk fetch-execute cycle

while (TRUE) do { /* fetch one instruction */ instruction_register = halfword_memory[ program_counter ]; program_counter = program_counter + 2; /* execute that instruction */ execute( instruction_register ); }

**The Hawk fetch-execute cycle**
while (TRUE) do { /* fetch one instruction / instruction_register = halfword_memory[ program_counter ]; program_counter = program_counter + 2; / execute that instruction */ execute( instruction_register ); }

We have covered enough ground that we can expand on this considerably, giving details of how the fields of the instruction are broken out and how they are decoded:

The Hawk cycle, in more detail

while (TRUE) do { /* fetch one instruction */ instruction_register = halfword_memory[ program_counter ]; program_counter = program_counter + 2; /* break out fields of the instruction register */ opcode = (instruction_register >> 4) & 15; dst = (instruction_register ) & 15; const8 = (instruction_register >> 8) ; s1 = opcode1 = (instruction_register >> 12) ; s2 = const4 = x = (instruction_register >> 8) & 15; /* execute the instruction */ switch (opcode) { /* 16 cases, one per legal opcode value */ } }

**The Hawk cycle, in more detail**
while (TRUE) do { /* fetch one instruction / instruction_register = halfword_memory[ program_counter ]; program_counter = program_counter + 2; / break out fields of the instruction register / opcode = (instruction_register >> 4) & 15; dst = (instruction_register ) & 15; const8 = (instruction_register >> 8) ; s1 = opcode1 = (instruction_register >> 12) ; s2 = const4 = x = (instruction_register >> 8) & 15; / execute the instruction / switch (opcode) { / 16 cases, one per legal opcode value */ } }

In the above code, we have given no detail for each instruction, but we can easily add this. Note that many of the fields of the instruction have multiple names. For example, we used the name const4 in our description of the ADDSI instruction to refer to the same field of the instruction that we called x in the memory reference instructions or s2 in the ADD and SUB instructions.

The actual hardware for a CPU does not need separate registers to hold the different fields of the instruction register. Instead, there is just one instruction register, with wires (conducting traces on the surface of the chip) carrying the different bits to the required places. The variables opcode, const8, s1 and s2 are names for these bundles of wires and not names for registers. The assignment statements giving values to these fields do not describe real computations inside the processor, they describe how the bundles of wires are connected to the various bits of the instruction register.

Given the above, filling in the cases for instructions like ADD and LIS is quite simple:

Executing the ADD and LIS instructions

case 3: /* ADD */ r[dst] = r[s1] + r[s2]; break; case 13: /* LIS */ r[dst] = sxt( 7, const8 ); break;

**Executing the ADD and LIS instructions**
	case 3: /* ADD / r[dst] = r[s1] + r[s2]; break; case 13: / LIS */ r[dst] = sxt( 7, const8 ); break;

Execution of instructions that involve two halfwords is more comples. This is the case with the LIL instruction. Only after we decode the opcode field to determine that it is an LIL instruction can we fetch the second halfword.

Others instructions are more complex because they involve additional decoding work. This is the case with the LOADS and STORES instructions. Here, we must first decode the fact that the instruction is a memory reference instruction before we know to use the s1 field as a secondary opcode. Some of this complexity is described in the next figure:

Executing the LIL and memory reference instructions

case 14: /* LIL */ const16 = halfword_memory[ program_counter ]; program_counter = program_counter + 2; r[dst] = sxt( 23, (const16 << 8) | const8 ); break; case 13: /* memory reference instructions */ switch (opcode1) { case 13: /* LOADS */ r[dst] = word_memory[ r[x] ]; break; case 15: /* MOVE */ r[dst] = r[x]; break; /* 14 more cases */ } break;

**Executing the LIL and memory reference instructions**
	case 14: /* LIL / const16 = halfword_memory[ program_counter ]; program_counter = program_counter + 2; r[dst] = sxt( 23, (const16 << 8) \| const8 ); break; case 13: / memory reference instructions / switch (opcode1) { case 13: / LOADS / r[dst] = word_memory[ r[x] ]; break; case 15: / MOVE / r[dst] = r[x]; break; / 14 more cases */ } break;

It is crucial to remember that, within a real processor, whether it is implemented in silicon or vacuum tubes, the algorithm descrubed by the code given here is directly executed by the hardware! As this hardware executes the instruction, it performs a sequence of register transfers, that is, transfers of data from one register to another. With each register transfer, the data may flow through a component that transforms the data, for example, by adding the value from one register to the value from another.

When these register transfers are done in sequence, we can describe each instruction in the machine language as being executed in a series of microsteps; sometimes, the control unit of the processor directs these microsteps using what can be described as micro instructions, a phrase that was coined at around the same time as the phrase macro instruction. Some processors even have a small and very fast read-only memory that is used to store a microprogram that describes the instruction set of that processor.

If, instead of building hardware, we take an existing computer and run a program on that computer that executes the algorithm given above, we have constructed what is called an emululator for one processor's instruction set that runs on another. Emulators can be very complex because of the need to provide support for input and output devices, but the basic idea is fairly simple. Commercial emulators are widely used in order to allow applications written for one vendor's computers to be run on computers built by other vendors, for example, there are several emulators allowing IBM PC code to be run on Apple Macintosh computers.

Emulators are also used to allow new computers to run code that was originally compiled for older machines, for example, to allow code from the original Apple Macintosh computers that used Motorola 68000 processors to run on newer Apple Macintosh computers that use the Motorola or IBM Power PC processor. Apple went through this again during the transition from the Power PC to the Intel Pentium architecture. Some of the very first emulators were written with this use in mind. For example, when IBM came out with their System 360 processor in 1965, one of their first products was an emulator for the IBM 1401 computer, so that they could immediately begin replacing existing 1401 computers with 360 computers. On some System 360 processors, this emulation was done by custom microcode for the 360 processor, but on others, it was done entirely by 360 machine language code, because only some versions of the 360 processor were microprogrammable.

Exercises

v) Give the code for the SUB case (you will have to look it up in the Hawk manual).
w) Give the code for the ORIS case.
x) Give the code for the ADDI case -- and note which switch statement it goes in!
y) Write code for the sxt() function. This may take some thought!

Instruction Register Formats of some Other Machines

For the sake of comparison, here are the instruction register formats of some other historically important machines. The simplest of the DEC PDP-8 family of computers had a main memory consisting of 4096 12-bit words, and the processor contained a 12-bit program counter and a single 12-bit accumulator. The instruction register on this machine had the following format:

The PDP-8 instruction register

00 01 02 03 04 05 06 07 08 09 10 11

opcode i z address

With a 3-bit opcode field, this machine obviously had a very small instruction set. It shold be no surprise that computers with this architecture were among the least expensive computers on the market for many years in the 1960s and 1970s. The i and z bits determined how the 7-bit address field of the instruction was used. If the z bit was zero, the address referred to a location in the bottom 128 words of memory, while if it was one, the reference was to a location in the 128-bit page selected by the top 5 bits of the program counter. If the i bit was one, the contents of the memory address selected by the combined z bit and 7-bit address was used to load a complete 12-bit address from memory. This is called indirect addressing. Notice also that the bits of the word are numbered from the left instead of from the right. This is the bigendian system of numbering the bits.

The IBM System 360, ancestor of IBM's current line of enterprise servers, was at the other extreme of price and performance. This machine had 16 general prupose registers and a 32-bit word, with the following instruction register format:

The IBM 360 instruction register

00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15

opcode r x

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
b displacement

Like the Hawk, this machine allowed two different instruction sizes. Short instructions were only 16 bits long, while long instructions were 32 bits. The opcode was always 8 bits, allowing 256 different instructions. The r field was comparable in use to the dst field of a Hawk instruction. It usually designated the destination register except for store instructions.

The x field was used to designate the source register for register to register arithmetic instructions, and the x field was used to designate the index register that held a memory address for short memory reference instructions. The 360 long memory reference instructions used x and b to designate two registers, the index register and the base register. The contents of these were added to the 12-bit displacement to form the memory address.

Modern architectures frequently have a dismaying variety of instruction formats. The Intel 80x86 family that dominates today's marketplace is extremely messy, while the Power PC architecture is more modest, with 5 basic instruction formats, all of which are 32 bits long. With 32 registers, this machine used 5 bits for each of the source and target register fields, and some instructions specified as many as 4 registers (typically three source registers and a target).

The Power PC instruction register

00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
opcode tgt/src src/tgt immediate
opcode tgt/src src/tgt src extended opcode
opcode tgt/src src/tgt src src ext opcd Rc
opcode BO BI BD AA LK
opcode LI AA LK