CS:2630 Notes, Chapter 4

The Preliminary discussion of the logical design of an electronic computing instrument by Burks, Goldstine and Von Neumann, written in in 1946, outlines the design of the EDVAC computer that was delivered to the Ballistics Research Laboratory in 1949. The basic structure of EDVAC followed the outline given at the start of Chapter 3:

The Von Neumann model of a computer

Central Processing Unit
CPU

Registers

Interface(s) or Bus(ses)

Memory

Input
Output

Within this outline, EDVAC and other early computers varied widely. Parallel machines operated on all the bits of each word at once, while serial machines processed bits one at a time. Some used real random-access memory, while others simulated random-access memory using sequential devices like drums or delay lines. Some used binary, others used binary-coded decimal with multiple bits per digit.

The basic idea of the central processing unit has not changed since these early days. At heart, almost all of the central processors designed from 1946 onward can be described in the following terms:

The above description is essentially an algorithm, so we can describe the central processor in algorithmic terms using programming notation. Here is the algorithm in C:

**The fetch-execute cycle**
while (TRUE) do { /* fetch one instruction / instruction_register = memory[ program_counter ]; program_counter = program_counter + 1; / execute that instruction */ execute( instruction_register ); }

While the basic idea of the fetch-execute cycle has not changed since the dawn of electronic computing, the details have changed immensely. Early machines could only execute a few thousand instructions per second, but by the late 1960s there were machines able to execute a million instructions per second, and today we speak casually of machines that may execute billions of instructions per second.

The above algorithm increments the program counter by one for each instruction, while from the start, many computers have had instructions smaller or larger than one addressable unit of memory. By 1960, there were computers with variable-sized instructions, where the size of the instruction depended on its function.

The set of registers used inside the central processor has also varied considerably between different computers. In addition to the program counter and instruction register, many early computers had only one other register, the accumulator. Instructons did things like load the accumulator from memory, store its contents to memory, or add to it. In contrast, many modern central processors have several tens of registers.

The Hawk CPU

The Hawk borrows from a number of computers designed from 1946 to the present. The most important of these is the IBM System 360 first sold in 1965; IBM's family of enterprose servers are descended from this. Another important source is the Digital Equipment Corporation PDP-11 first sold in 1970. The Motorola 68000 and the Intel 8080 and 8086 borrowed many ideas from the PDP-11, and the 8086 is the ancestor of the Intel Pentium family that still dominates today's desktop computing market.

As with the IBM System 360, The Hawk has 15 general purpose registers of 32 bits each, numbered from 1 to 15₁₀ or 1 to F₁₆. Each of these can be used as accumulators to hold intermediate results during computation or as index registers to hold memory addresses. In assembly language programs, these are named R1 to R15 or RF. Early computers had only a single accumulator within the CPU. The PDP-11 and Motorola 68000 had 7 general purpose registers. Several modern computers have 32 or more general purpose registers.

In addition to the general purpose registers, the Hawk has a program counter and a special 32-bit register called the processor status word that contains a variety of special fields. The most important PSW field holds the condition codes named N, Z, V and C. These are so important that the Hawk emulator displays them separately.

Whenever you run a Hawk debugger or emulator, the internal state of the processor will be displayed as follows:

**The Hawk CPU state**
PC: 00000000 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 00000000 RA: 00000000 R3: 00000000 RB: 00000000 R4: 00000000 RC: 00000000 R5: 00000000 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

The Hawk CPU state

PC:  00000000                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00000000  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 00000000  RC: 00000000
               R5: 00000000  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In the Hawk processor state display, the program counter and processor status word are shown on the left, with the condition codes (the low 4 bits of the status word) broken out on the third line. The right two columns show the 15 registers, each with its name. The Hawk debugger always shows the contents of the registers in hexadecimal.

Why didn't the display shown above include the instruction register? It is there, inside the central processor, but its contents are only transient, and typical debuggers or emulators provide convenient ways of examining the program being run, either by showing the contents of memory, or by showing the code in symbolic form.

Newcomers to machine-language and assembly-language programming should not seek in-depth information on the internals of the processor until they have seen an overview of a few of the simpler instructions. This information is always available in the reference manual for the processor. In our case, Chapter 1 of the Hawk manual contains a section on the central processor that gives more detail on the registers, and it includes a section on the instruction execution cycle giving specifics of this cycle for the Hawk processor.

The Hawk Fetch-Execute Cycle

The fetch-execute cycle of the Hawk differs from the generic fetch-execute cycle because each Hawk instruction is made of either one or two halfwords. As a result, instruction addresses are always even, and the program counter always increments by either two or four. If we ignore instructions longer than one halfword, we can describe the fetch-execute cycle of the Hawk as follows:

**The fetch-execute cycle**
while (TRUE) do { /* fetch one instruction / instruction_register = halfword_memory[ program_counter ]; program_counter = program_counter + 2; / execute that instruction */ execute( instruction_register ); }

The fetch-execute cycle

while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* execute that instruction */
        execute( instruction_register );
}

To understand how the central processor of any computer works, we must examine the instruciton register in more detail. In the case of the Hawk, for 16-bit instructions, the instruction register is arranged as follows:

`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`

opcode				dst

`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

First and foremost, the instruction register is best viewed as two consecutive bytes of one halfword. The bit numbers shown above are the bit numbers in the halfword, but the first byte of the halfword is shown on top of the second, to emphasize the order in which the bytes are stored in memory. In much of the Hawk documentation, the two halves of the halfword are shown as follows, deliberately placed in the wrong order in order to allow left-to-right readers to see the opcode field as coming first:

07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

opcode dst

The most important field of the instruction register is the opcode or operation-code field. This 4-bit field determines the interpretation of the other fields of the instruction. In some cases, it completely determines the details of the instruction to be executed. A second field, the destination register field or dst is almost always used to specify one of the 15 general purpose registers. This register is almost always used as the destination register, that is, the place where the result of some operation will be go.

Basic Arithmetic Instructions

Consider the add instruction. On the Hawk, as on many other computers, this instruction allows the contents of two registers to be added. On the Hawk, this instruction takes the following form:

**The Hawk `ADD` instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

0 0 1 1				dst				s1				s2

In the add instruction, the opcode field is 0011₂ and when the central processor sees this pattern, this determines that bits 8 to 11 and bits 12 to 15 of the instruction register will be used as register numbers source one and source two, s1 and s2, specifying the registers that will provide the operands to be added. The result will then be placed in the destination register before the next instruction is executed.

In short, we use the notation r[dst]=r[s1]+r[s2] to describe the add instruction, or rather, what the CPU does when it processes the add instruction. Here, r is the array of registers, with the register number being used as a subscript. We describe this instruciton as a three-register instruction because it specifies three registers. The name three-register instruction is something of a misnomer because some Hawk instructions with this format use s1 or s2 to hold small constants instead of register numbers.

To illustrate the impact of executing an add instruction, consider a Hawk processor with the following initial state:

**An initial Hawk CPU state**
PC: 00001000 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 00000000 RA: 00000000 R3: 00000000 RB: 00000000 R4: 01234567 RC: 00000000 R5: 76543210 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

An initial Hawk CPU state

PC:  00001000                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00000000  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 01234567  RC: 00000000
               R5: 76543210  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In the above initial state, the program counter is set to 1000₁₆ so the next time the processor tries to fetch an instruction, it will be from the halfword in memory occupying memory addresses 1000₁₆ and 1001₁₆. Suppose that this halfword has the value 4532₁₆. The Hawk is a lowbyter or little-endian machine, so first byte of this value is 32₁₆, with most significant hexadecimal digit being 0011₂ in binary. This is the ADD opcode, so the CPU sees s1 = 4, s2 = 5, and dst = 2. The computation this specifies is r2=r4+r5, and in SMAL Hawk assembly language, it would have been specified as ADD R2,R4,R5. As a result, if we allow the CPU to execute exactly one instruction in this context, the state of the processor will change to:

**The state after `ADD R2,R4,R5`**
PC: 00001002 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 77777777 RA: 00000000 R3: 00000000 RB: 00000000 R4: 01234567 RC: 00000000 R5: 76543210 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

PC:  00001002                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 77777777  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 01234567  RC: 00000000
               R5: 76543210  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

Note that there are only two changes from the initial state to the new state given above. First, the program counter has been incremented by two, since a single halfword instruction occupies 2 bytes, and second, the value of register 2 has changed. This now holds the sum of the values in registers 4 and 5.

Nothing prevents an assembly language programmer from writing H #4532 to put this add instruction into memory. Recall that the H directive in our SMAL assembly language places a halfword in memory, and recall that the # is used as a prefix indicating that the operand is a hexadecimal number. While an assembly language programmer can write this, it is not very readable. People don't like reading programs written in hexadecimal.

The SMAL assembly language does not have any built-in knowledge of the Hawk instruction set, but as mentioned in the previous chapter, the relevant definitions are available in a header file. Once this header is included, we can use the convenient shorthand notation ADD R2,R4,R5. Here is a listing of an assembly language program that puts the two byte sequence 32₁₆ 45₁₆ making up the instruction halfword 4532₁₆ in memory location 1000₁₆:

**Assembly of the `ADD` instruction**
1 USE "hawk.h" 2 . = #1000 00001000: 32 45 3 ADD R2,R4,R5

A computer with just one instruction would be difficult to use, but the 4-bit opcode field of the Hawk allows for 16 distinct instructions. For example, the SUB instruction has the same format as ADD, but it subtracts. ADD and SUB are register-to-register instructions, so called because they take operands from registers to produce a result in a register. There are also memory-reference instructions that transfer data to and from memory, and immediate instructions that have some operands that are constants taken directly from fields on the instruction register.

Exercises

a) What halfword does ADD R15,R12,R8 assemble to in memory.

b) What halfword does ADD R7,R9,R11 assemble to in memory.

c) If every register r[i] contains the value i, and then ADD R7,R9,R11 is executed, what register is changed and what is its new value?

d) If every register r[i] contains the value i, and then the instruction 5632₁₆ is executed, what register is changed and what is its new value?

More Hawk Instructions, Loading Constants

Register-to-register instructions alone are not sufficient to make a useful computer. We also need a way to load values into registers and to store results from registers to memory. The most common operands in arithmetic expressions are small constants. For example, the most common use of addition in real programs is to increment a variable, usually by one. Because of this, most computer architectures provide simple instructions for loading constants into registers. These are typically called load-immediate instructions. The term immediate constant refers to a constant that is either taken from the instruction register or addressed by the program counter.

In the case of the Hawk architecture, several instructions include immediate constants. There are two load immediate instructions, a short one for 8-bit constants and a long one for 24-bit constants. Similarly, there are two forms of add immediate, one for short 4-bit constants and one for long 16-bit constants. The simplest of these is the load-immediate-short instruction. This has the following format:

**The Hawk `LIS` instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 0 1				dst				const8

In the load-immediate-short instruction, the opcode field is 1101₂ and when the central processor sees this pattern, it uses bits 8 to 15 of the instruction as a two's complement 8-bit constant in the range –128₁₀ to +127₁₀. This constant is expanded to 32 bits by sign extension and stored in the register specified by the dst field of the instruction. We can use the shorthand notation r[dst]=sxt(7,const8) to describe this.

In our shorthand notation, the sxt() function indicates sign extension, with the first argument being the bit-number of the bit that is to be duplicated to fill all higher-numbered bits. In the two's complement number system, an 8-bit constant can be expanded into a 32-bit constant by padding it, on the left, with 24 additional bits. These additional bits are copies of bit 7 of the original short constant, its sign bit. If the constant was positive, the new bits are all zero. If negative, they are all one. Many (but not all) Hawk instructions that include short constants sign extend those constants before using them.

Of course, not all constants are short enough to fit in one byte. The load-immediate-long instruction allows any 24-bit signed constant. Because the instruction register of the Hawk is only 16-bits long, this instruction is coded as two successive halfwords, as follows:

07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 1 0 dst const8 (bits 0-7)

15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

const16 (bits 8-23)

The constant in the load-immediate long instruction is sign extended before loading it into the 32-bit destination register, so the allowed range of values is –2²³ to 2²³–1, or –8,388,608 to 8,388,607. Again, we can use the shorthand notation r[dst]=sxt(23,const) to describe this, ignoring for the moment the problem of combining the 8-bit and 16-bit parts of the constant.

It is worth remembering that the load immediate long instruction can be viewed as a sequence of 4 bytes in memory, like this.

`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`

1 1 1 0				dst

`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

const8 (bits 0-7)

`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`

const8 (bits 8-15)

`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

const8 (bits 16-23)

To demonstrate the use of these instructions, consider the following assembly language program that loads constants into registers R4 and R5 and then adds them, storing the result in R2:

**Loading and adding some constants**
1 USE "hawk.h" 2 . = #1000 00001000: D4 12 3 LIS R4,#12 00001002: E5 654321 4 LIL R5,#654321 00001006: 32 45 5 ADD R2,R4,R5

Note that, in assembling this code, the assembler allocated one halfword for the load-immediate-short instruction LIS and a byte followed by a triple byte for the load-immediate-long instruction LIL. As a result, the ADD instruction sits in the 4th halfword of the program. In the assembly listing, each single-halfword instruction is given in hexadecimal as two consecutive bytes, while the long LIL instruction is shown as a single byte followed by a triple-byte quantity holding the immediate constant.

If this bit of code is loaded into the memory of a Hawk computer and the program counter is set to 1000₁₆ with all of the registers equal to zero, and then the computer is started and run for 3 instruction cycles, the state of the processor will end up as:

**The state after 2 loads and an add**
PC: 00001008 R8: 00000000 PSW: 00000000 R1: 00000000 R9: 00000000 NZVC: 0 0 0 0 R2: 00654333 RA: 00000000 R3: 00000000 RB: 00000000 R4: 00000012 RC: 00000000 R5: 00654321 RD: 00000000 R6: 00000000 RE: 00000000 R7: 00000000 RF: 00000000

The state after 2 loads and an add

PC:  00001008                R8: 00000000
PSW: 00000000  R1: 00000000  R9: 00000000
NZVC: 0 0 0 0  R2: 00654333  RA: 00000000
               R3: 00000000  RB: 00000000
               R4: 00000012  RC: 00000000
               R5: 00654321  RD: 00000000
               R6: 00000000  RE: 00000000
               R7: 00000000  RF: 00000000

In this new processor state, the program counter has advanced to 1008₁₆, 8 bytes from the point at which the program began because the program itself occupied 4 halfwords. The first two instructions loaded registers R4 and R5, and the ADD instruciton put the sum of these two registers in R2.

In general, long instructions run slower than short ones, and even long-immediate instructions cannot handle all possible constant values. The Hawk architecture does not have an instruction for loading an arbitrary 32-bit constant. Instead, a sequence of 2 instructions taking up 3 halfwords of memory can do this. The assembler (or rather, the hawk.h header file) knows this sequence by the name LIW, standing for load immediate word, and generally, programmers can use this as if it was a single instruction, even though it is not.

Exercises

e) What is the most appropriate instruction to use for loading each of the following constants: 4, –40, 400, –4000, 40000, –400000?

f) What halfword does LIS R5,-44 assemble to in memory?

g) Write a SMAL Hawk program to load each register with its register number, so register r[i] ends up containing i.

Adding Constants

Studies of the frequency of different operations in real programs show that the most common operator in programs is addition, that most additions add a small constant to a variable, and that the most common constant added to a variable is one. So, the most common statement in all programs is i=i+1. Because of this, most computer architectures provide a single instruction for incrementing the value stored in a register. The Hawk is no exception.

Of course, we could always increment a variable by first loading the constant required into a register and then adding that constant to the variable, but this takes two instructions. This is what we do for the rare large constant, but CPU designers usually give us special small fast instructions for the common cases. The Hawk is typical, with two different add immediate instructions. Add short immediate or ADDSI can be used for signed constants that fit in 4 bits, while the more general, and longer, add immediate allowing 16-bit constants.

**The Hawk `ADDSI` instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

0 0 0 1				dst				1 1 0 0				const4

We can use the shorthand notation r[dst]=r[dst]+sxt(3,const4) to describe this. Because the constant in the add short immediate instruction is sign extended before adding it to the contents of the 32-bit destination register, the allowed range is -8 to +7; in addition, since adding zero is not useful, the Hawk CPU includes trickery to encode -8 as zero. If a larger range of values is needed, the add immediate instruction should be used. As with LIL, this is a long instruction composed of two consecutive halfwords:

07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 1 1 dst 0 1 1 0 x

15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

const16

We can use the shorthand notation r[dst]=r[x]+sxt(15,const16) to describe this. Unlike ADDSI, ADDI does not require that the source and destinations be the same register. The constant is sign extended, so the range of allowed values is -32768 to +32767.

In the rare event that larger constants are needed, these must be loaded into a register first, and then added. The following assembly language example illustrates this:

**Incrementing registers by various amounts**
1 USE "hawk.h" 2 . = #1000 00001000: 13 C1 3 ADDSI R3,1 ; R3=R3+1 00001002: F4 63 000A 4 ADDI R4,R3,10 ; R4=R3+10 00001006: E5 0186A0 5 LIL R5,100000 0000100A: 35 54 6 ADD R5,R5,R4 ; R5=R4+100000

Exercises

h) What is the most appropriate instruction or instruction sequence to use for incrementing a register by each of the following constants: 4, –40, 400, –4000, 40000, –400000?

i) What halfword does ADDSI R5,4 assemble to in memory?

j) Translate r1 = r2 + r3 - 5 into SMAL Hawk instructions.

k) Translate r1 = (r2 - r3) + 5 into SMAL Hawk instructions.

Memory Reference Instructions

Loading constants into registers and doing arithmetic on those constants is not enough unless all of the variables used by a program can be held in registers. This is occasionally true, but most programs must hold some variables in memory. Some computer architectures allow any machine instruction to reference operands anywhere in memory or in any register in the processor. These have complex instruction set complexity or CISC architectures.

On the other extreme, some architectures, the Hawk among them, require that all arithmetic be performed by register-to-register instructions, with all use of variables in memory done by loading those variables into registers or storing the contents of registers to memory. These architectures are known as load-store architectures or reduced instruction set complexity or RISC architectures. The MIPS, IBM Power, and ARM architectures are commercially important RISC architectures; the ARM is currently the dominant architecture used in cellphones.

On a CISC architecture, the C statement a=b+c might compile to one very complex add instruction. In contrast, on a RISC architecture, it would compile to two load instructions to get the values of the variables b and c into registers, then an add instruction, and finally a store instruction to put the result in a.

The Hawk architecture has 5 distinct load instructions and 3 store instructions. The simplest of these use a memory address taken directly from a register. These are known as short-memory-reference instructions because they are encoded as a single 16-bit halfword. For each of these short instructions, there is a corresponding long form made up of of two consecutive halfwords, where the second halfword is used to modify the address.

On the Hawk and many other computers, registers holding memory addresses are called index registers. This makes sense if you think of memory as an array and note that the word index can be a synonym for array subscript.

The LOADS or load short instruction, for example, loads the destination register from the 32-bit word stored at the memory location addressed by the contents of the index register. We suggest pronouncing LOADS as load-s so it is not confused with a plural or past tense. This instruction has the following form:

**The Hawk `LOADS` instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 1 1				dst				1 1 0 1				x

The STORES or store short instruction has the same format, but it moves data the opposite way, from the destination register to the location pointed to by the index register. Note the abuse of the term destination register in the context of this instruction. We suggest pronouncing STORES as store-s.

**The Hawk `STORES` instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 1 1				dst				1 0 1 0				x

These two instructions have the same opcode, 1111₂. This value merely indicates that this is a memory-reference instruction, using bits 8 to 11 of the instruction register specify an index register, and bits 12 to 15 to specify which specific memory memory reference instruction is to be performed. These bits can be thought of as an extension of the opcode field.

In a high-level language, the operation of LOADS can be described as r[dst]=m[r[x]]. Similarly, STORES can be described as m[r[x]]=r[dst]. As mentioned in the introduction to the instruction register format, the name destination field is used because this field usually identifies the destination register. The Hawk store instructions are exceptions to this!

Suppose memory locations 10000₁₆, 10004₁₆ and 10008₁₆ hold the variables a, b and c, and we want the machine code equivalent of the C statement a=b+c; on the Hawk, we do this by first loading index registers with the addresses of the varialbes, then loading the operands, adding them, and storing the result:

**Adding two variables**
1 USE "hawk.h" 2 . = #1000 3 4 ; load the addresses 00001000: E5 010000 5 LIL R5,#10000 ; address a 00001004: E6 010004 6 LIL R6,#10004 ; address b 00001008: E7 010008 7 LIL R7,#10008 ; address c 8 9 ; then perform the arithmetic 0000100C: F3 D6 10 LOADS R3,R6 ; get b 0000100E: F4 D7 11 LOADS R4,R7 ; get c 00001010: 32 34 12 ADD R2,R3,R4 00001012: F2 A5 13 STORES R2,R5 ; set a=b+c

Adding two variables

                                 1          USE     "hawk.h"
                                 2  .       =       #1000
                                 3  
                                 4  ; load the addresses
 00001000: E5  010000            5          LIL     R5,#10000      ; address a
 00001004: E6  010004            6          LIL     R6,#10004      ; address b
 00001008: E7  010008            7          LIL     R7,#10008      ; address c
                                 8  
                                 9  ; then perform the arithmetic
 0000100C: F3  D6               10          LOADS   R3,R6          ; get b
 0000100E: F4  D7               11          LOADS   R4,R7          ; get c
 00001010: 32  34               12          ADD     R2,R3,R4
 00001012: F2  A5               13          STORES  R2,R5          ; set a=b+c

The idea that each load from memory or store to memory must be preceeded by loading the address of the variable into an index register can be daunting, but there are many optimizations that can reduce this cost.

Exercises

l) Translate a = (b - 64) + c to SMAL Hawk code, assuming that the variables a, b and c are stored in memory at words 10000₁₆ through 10008₁₆ in that order.

m) Consider F1D5₁₆, F261₁₆, FFFF₁₆, F2A5₁₆ as a sequence of halfwords. What sequence of SMAL Hawk instructions, when assembled, would produce this sequence?

The Register-to-Register Move Instruction

The Hawk MOVE insruction is a simple register-to-register assignment, copying data from one register to another:

**The Hawk `MOVE` instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 1 1				dst				1 1 1 1				x

We describe the effect of this with the notation r[dst]=r[x]. This is considered a short memory reference instruction, just like LOADS because inside the processor, it performs exactly the same computation as LOADS, but instead of using the value of r[x] as a memory address, it uses the register contents as the operand itself.

Note that value 1111111111111111₂ or FFFF₁₆, when treated as an instruction, means move register 15 to register 15, doing nothing. Such an instruction is called a no-op, short for no operation. Processors designed for use with programmable read-only memory or PROM are frequently designed so that the values 0000₁₆ and FFFF₁₆ are no-ops. This is because each bit of a PROM can be changed exactly once from its initial value, and all bits are initially the same, either all 0 or all 1. If the initial value is FFFF₁₆, such no-ops can later be changed to useful instructions by changing some of the ones to zeros. If 0000₁₆ is also a no-op, erroneous instruction can be converted to no-ops by turning the remaining ones to zeros. This allows a program in PROM to be patched without the need to replace the PROM chip. In the case of flash memory, it allows patching without the need to reflash the entire memory, a significantly slower operation.

Exercises

n) Using only one LIS instruction and a sequence of MOVE instructions, write sequence of Hawk assembly language instructions that will set all 15 registers to 1.

o) How many distinct distinct correct answers are there to the previous problem? This is really a discrete math question!

p) Using only one LIS instruction and a sequence of ADD instructions, write sequence of Hawk assembly language instructions that will set all 15 registers so that r[i] contains i.

q) Assuming you set all 15 registers in order, how many distinct solutions are there to the previous problem? If you eliminate the order constraint, in how many distinct solutions are there? This question is harder than the similar one above.

Load Immediate Word and Macros

As mentioned earlier, the Hawk architecture does not have a single instruction to load a full 32-bit constant. Instead, a two-instruction instruction sequence is used. First, an LIL instruction loads the high 24 bits of the constant. Then, an ORIS instruction slips in the low 8 bits of the constant. ORIS stands for or-immediate shifted. This first shifts the contents of the destination register 8 places left, then it ors an 8-bit constant into the register:

**The Hawk ORIS instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 0 0				dst				const8

This can be described formally as r[dst]=(r[dst]<<8)|const8. Note that the constant is not sign extended!

The << or left-shift operator has the same definition here as it has in C, C++, Java and Python, as well as in SMAL. This operator shifts the binary value of its left-hand argument to the left the number of places indicated by its right-hand argument, filling zeros into the left end of the word. In arithmetic terms, the result of this is that, a << b is the same as a × 2^b.

The | or bitwise-or operator is also used here as it is in C, C++, Java, Python and SMAL. This operator produces each bit of the result as the logical or of the corresponding bits of the operands. So, 1100₂ | 1010₂ = 1110₂.

With rare exceptions, ORIS will always be used immediately after an LIL instruction. For example if we wanted to load 76543210₂ into register 5, we could use the following instruction sequence:

**Loading a 32-bit constant**
1 USE "hawk.h" 2 . = #1000 00001000: E5 765432 3 LIL R5,#765432 00001004: C5 10 4 ORIS R5,#10 ; R5 = #76543210

When a sequence of instructions is used as a unit, as if it were a single instruction, that sequence is known as a macro instruction or just a macro. This terminology dates back to Autocoder, the assembler for the IBM 705 computer from the mid 1950s. Most assembly languages designed since that time have included tools to package macros so that they can be used as if they were single instructions. In the SMAL assembler, the macro LIW, meaning load-immediate word, can be defined as:

**Defining a macro to load a 32-bit constant**
	MACRO LIW =dst, =const LIL dst, const >> 8 ORIS dst, const & #FF ENDMAC

The >> or right-shift operator in the macro body has the same definition it has in C, C++, Java, and Python. This operator shifts the binary value of its left-hand operand to the right the number of places indicated by its right-hand operand, filling zeros into the most significant bits. If the operands are unsigned integers, a >> b is the same as ⌊a ÷ 2^b⌋.

Aside: Java and SMAL have two distinct right-shift operators. The signed right shift >> treats the operand as a signed two's complement integer, duplicating the sign bit as needed to fill in the most significant bits of the number, so shifting a negative integer will produce a negative result. The unsigned right shift >>> treats the operand as an unsigned integer, filling in the high bits with zeros, so shifting a negative integer will produce a positive result.

In contrast, in C, the >> operator does both jobs, depending on how the operand was declared. If the operand is of type int, it is a signed shift, while if the operand is of type unsigned int it is an unsigned shift.

All integers in Python are represented using signed two's complement values, so Python only has signed shifts.

Also note the & or bitwise-and operator in the macro body. As in C, C++ and Java, each bit of the result is the logical and of the corresponding bits of the operands. So 1100₂ & 1010₂ = 1000₂.

To explore how the macro body works, suppose that the macro parameter const has the value 76543210₁₆. The shifting operation const >> 8 throws away the least significant 8 bits giving the result 765432₁₆, while the masking operation const & #FF throws away all but the least significant 8 bits, giving 10₁₆. If this is confusing, recall that the mask FF₁₆ is ...0011111111₂. Shifting and masking, as illustrated here, are vary common in code that must pick out and manipulate individual bits or block of bits within a word.

In the macro definition above, the equals signs used as prefixes on the macro formal parameters tell the assembler to evaluat the expression used as a parameter and pass its value. This is called passing parameters by value. Omitting the equals sign would cause the assembler substitute the entire text of the actual parameter for the formal parameter.

The LIW macro is part of the standard hawk.h header file, so most Hawk programmers will never explicitly use the ORIS instruction; instead, they will write programs using LIW to load full-word constants, as follows:

**Using the macro to load a 32-bit constant**
1 USE "hawk.h" 2 . = #1000 00001000: E5 765432 C5 3 LIW R5, #76543210 00001005: 10

When the assembler processes this code, it expands the LIW macro call to a sequence of two instructions, so when the CPU executes this code, it will see an LIS followed by an ORIS, executed in two iterations of the fetch-execute cycle. If you use a debugger to examine the contents of memory, you will see two separate instructions, not one macro instruction. In the assembly listing above, the assembler listed the object code resulting from the macro call from left to right, filling each line before starting on the next line. The object code for the LIW instruction ended up split across two lines of the listing. This has no effect on how the data is stored in memory.

The hawk.h header file includes macros that define each of the built-in instructions in the Hawk. For example, the load-immediate-long and or-immediate-and-shift instructions can be defined as follows:

**Macros for two Hawk machine instructions**
	; the load-immediate-long instruction MACRO LIL =dst, =const B #E0 \| dst T const ENDMAC ; the or-immediate-and-shift instruction MACRO ORIS =dst, =const B #C0 \| dst B const ENDMAC

The keyword MACRO tells the assembler that the following text, up to ENDMAC, is to be substituted for the macro name whenever that name appears in the text as an instruction name. As this substitution is done, the parameters to the macro are substituted into the macro body. Macros resemble subroutines with similar terminology. We speak of macro calls and macro parameters, but unlike subroutines, macro processing is purely textual.

The macro defining the LIL instruction uses a T directive to assemble a 24-bit (three byte) constant into memory. This is the only use of the T directive in the Hawk macros.

If you look in the hawk.h file, you will see macros somewhat more complex than those presented here for LIW, LIL and ORIS. The complexity is the result of additions to prevent the use of symbols that are not register names in fields where register names are required. If all assembly language programs were error free, the code given here would suffice. Unfortunately, real programmers make errors, and the additional complexity in the hawk.h file helps detect some of these errors when a program is assembled.

Exercises

r) Give an appropriate macro definition for the Hawk LIS instruction.

s) Give an appropriate macro definition for the Hawk ADDSI instruction.

t) Give an appropriate macro definition for the Hawk ADDI instruction.

u) Give an appropriate macro definition for the Hawk LOADS instruction.

Macros in C

C and C++ also support macros, although with a very different syntax and with weaker semantics. Here are some example C macros used in a function to compute the length of a string:

**Example macros in C**
#define NUL '\0' #define succ(i) ((i) + 1) int strlen( char* s ) { int len = 0; while (s[len] != NUL) len = succ(len); return len; }

Here, the macro NUL renames the character constant '\0', so wherever the identifier NUL appears in the program, it is expanded to '\0' before the compiler sees it. The macro succ has a parameter, i, so the macro call succ(len) expands to ((len) + 1).

The succ macro definiton may seem overly parenthesized. Remember that macros expansion is done by simple textual replacement. This can lead to unexpected results if the macros are defined carelessly. Suppose the body of succ had been given as i + 1 without parentheses. As a result, -succ(5) would expand to -5 + 1, giving the value –4 where you might expect –6, and succ(5)*2 would expand to 5 + 1*2, giving the value 7 where you might expect 12.

When you run a C program, you use a toolchain that begins with the C preprocessor, followed by the actual C compiler. The compiler may produce assembly language that is then assembled make object code. Finally, a linker is used to combine the object code with various libraries to make an executable. All of this is typically hidden inside the cc command, although options on the cc command let you run just parts of the toolchain. For example, cc -E runs just the preprocessor showing you what it does to your program.

The Fetch-Execute Cycle Revisited

Our original presentation of the fetch-execute cycle was quite brief, suggesting that Hawk worked as follows:

**The Hawk fetch-execute cycle**
while (TRUE) do { /* fetch one instruction / instruction_register = halfword_memory[ program_counter ]; program_counter = program_counter + 2; / execute that instruction */ execute( instruction_register ); }

The Hawk fetch-execute cycle

while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* execute that instruction */
        execute( instruction_register );
}

This is vastly oversimplified. We have covered enough ground that we can expand on this considerably. The following code, while still simplified, gives details of how the fields of the instruction are broken out and how instructions are decoded:

**The Hawk cycle, in more detail**
while (TRUE) do { /* fetch one instruction / instruction_register = halfword_memory[ program_counter ]; program_counter = program_counter + 2; / break out fields of the instruction register / opcode = (instruction_register >> 4) & 15; dst = (instruction_register ) & 15; const8 = (instruction_register >> 8) ; s1 = opcode1 = (instruction_register >> 12) ; s2 = const4 = x = (instruction_register >> 8) & 15; / execute the instruction / switch (opcode) { / 16 cases, one per legal opcode value */ } }

The Hawk cycle, in more detail

while (TRUE) do {
        /* fetch one instruction */
        instruction_register = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;

        /* break out fields of the instruction register */
        opcode =          (instruction_register >>  4) & 15;
        dst =             (instruction_register      ) & 15;
        const8 =          (instruction_register >>  8)     ;
        s1 = opcode1 =    (instruction_register >> 12)     ;
        s2 = const4 = x = (instruction_register >>  8) & 15;

        /* execute the instruction */
        switch (opcode) {
                /* 16 cases, one per legal opcode value */
        }
}

In the above code, we have given no detail for each instruction, but we can easily add this. Note that many of the fields of the instruction have multiple names. For example, we used the name const4 in our description of the ADDSI instruction to refer to the same field of the instruction that we called x in the memory reference instructions or s2 in the ADD and SUB instructions.

The actual hardware for a CPU does not need separate registers to hold the different fields of the instruction register. Instead, there is just one instruction register, with wires (conducting traces on the surface of the chip) carrying the different bits to the required places. The variables opcode, const8, s1 and s2 are names for these bundles of wires and not names for registers. The assignment statements giving values to these fields do not describe real computations inside the processor, they describe how the bundles of wires are connected to the various bits of the instruction register.

Given the above, filling in the cases for instructions like ADD and LIS is quite simple:

**Executing the ADD and LIS instructions**
	case 3: /* ADD / r[dst] = r[s1] + r[s2]; break; case 15: / LIS */ r[dst] = sxt( 7, const8 ); break;

Execution of double-length instructions is more complex. This is the case with the LIL instruction. Only after we decode the opcode field to determine that it is an LIL instruction can we fetch the second halfword.

Other instructions are more complex because they involve additional decoding work. This is the case with the LOADS and STORES instructions. Here, we must decode the fact that the instruction is a memory reference instruction before we use the s1 field as a secondary opcode. Some of this complexity is described below:

**Executing the LIL and memory reference instructions**
	case 14: /* LIL / const16 = halfword_memory[ program_counter ]; program_counter = program_counter + 2; r[dst] = sxt( 23, (const16 << 8) \| const8 ); break; case 13: / memory reference instructions / switch (opcode1) { case 13: / LOADS / r[dst] = word_memory[ r[x] ]; break; case 15: / MOVE / r[dst] = r[x]; break; / 14 more cases */ } break;

Executing the LIL and memory reference instructions

case 14: /* LIL */
        const16 = halfword_memory[ program_counter ];
        program_counter = program_counter + 2;
        r[dst] = sxt( 23, (const16 << 8) | const8 );
        break;

case 13: /* memory reference instructions */
        switch (opcode1) {
                case 13: /* LOADS */
                        r[dst] = word_memory[ r[x] ];
                        break;
                case 15: /* MOVE */
                        r[dst] = r[x];
                        break;
                /* 14 more cases */
        }
        break;

It is crucial to remember that, within any CPU, whether implemented using silicon or vacuum tubes, the fetch-execute algorithm is directly executed by hardware. As this hardware executes the instruction, it performs a sequence of register transfers, to move data from one register to another. With each register transfer, the data may be routed through components that transform the data, for example, adding values from different registers.

When these register transfers are done in sequence, we can describe each instruction in the machine language as being executed in a series of microsteps; sometimes, the control unit of the processor directs these microsteps using what can be described as micro instructions, a phrase that was coined at around the same time as the phrase macro instruction. Some processors even have a small and very fast read-only memory that is used to store a microprogram that describes the instruction set of that processor.

If, instead of building hardware, we take an existing computer and run a program on that computer that executes the algorithm given above, we have constructed what is called an emululator for one processor's instruction set that runs on another. Emulators can be very complex because of the need to provide support for input and output devices, but the basic idea is fairly simple. Commercial emulators are widely used in order to allow applications written for one vendor's computers to be run on computers built by other vendors, for example, there were several emulators allowing IBM PC code using the Intel Pentium architecture to be run on Apple Macintosh computers in the days when Apple was using the Power PC architecture.

Emulators are also used to allow new computers to run code that was originally compiled for older machines, for example, to allow code from the original Apple Macintosh computers that used Motorola 68000 processors to run on newer Apple Macintosh computers that use the Motorola or IBM Power PC processor. Apple went through this again during the transition from the Power PC to the Intel Pentium architecture. Some of the very first emulators were written with this use in mind. For example, when IBM came out with their System 360 processor in 1965, one of their first products was an emulator for the IBM 1401 computer, so that they could immediately begin replacing existing 1401 computers with 360 computers.

Exercises

v) Give the code for the SUB case (you will have to look it up in the Hawk manual).

w) Give the code for the ORIS case.

x) Give the code for the ADDI case -- and note which switch statement it goes in!

y) Write code for the sxt() function. This may take some thought!

Instruction Register Formats of some Other Machines

For the sake of comparison, here are the instruction register formats of some other historically important machines. The simplest of the DEC PDP-8 family of computers had a main memory consisting of 4096 12-bit words, and the processor contained a 12-bit program counter and a single 12-bit accumulator. The instruction register on this machine has the following format:

`00`	`01`	`02`	`03`	`04`	`05`	`06`	`07`	`08`	`09`	`10`	`11`

opcode			i	z	address

With a 3-bit opcode field, the PDP-8 obviously has a very small instruction set. It shold be no surprise that computers with this architecture were among the least expensive computers on the market for many years in the 1960s and 1970s. Notice also that the bits of the word are numbered from the left instead of from the right. This is the bigendian system of numbering the bits.

The i and z bits specify the addressing mode. This determines how the 7-bit address field of the instruction is used. If the z bit is zero, the address refers to a location in the bottom 128 words of memory, while if it is one, the reference is to a location in the 128-word page selected by the top 5 bits of the program counter.

If the i bit is zero, the operand is in the memory word specified by the address and the z bit. This is called direct addressing. If the i bit is one, the addressed word holds the complete 12-bit address of the operand. This is called indirect addressing. The PDP-8 can directly address only 256 words of memory, page zero and the current page. Using indirect addressing, it can address the entire 4096 word address space.

The IBM System 360, introduced about the same time as the PDP-8 is the ancestor of IBM's current line of enterprise servers. This machine was at the other extreme of price and performance from the PDP-8. Like the Hawk, the 360 has 15 general prupose registers and a 32-bit word, where instructions occupy either one or two consecutive halfwords of memory. The instruction register on the 360 has the following format:

00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15

opcode r x

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

b displacement

Like the PDP-8, the 360 is a bigendian machine. For short instructions, the second halfword of the 360 instruction register is unused. The opcode is always 8 bits, allowing 256 different instructions. The r field is comparable in use to the dst field of a Hawk instruction. It usually designates the destination register except for store instructions.

The x field is used to designate the source register for register to register arithmetic instructions, and it designates the index register holding a memory address for short memory reference instructions. The 360 long memory reference instructions used the x and b fields to designate two registers, the index register and the base register. The contents of these are added to the 12-bit displacement to form the memory address.

Modern architectures frequently have a dismaying variety of instruction formats. The Intel 80x86 family that dominates today's marketplace is extremely messy, while the Power PC architecture is more modest, with 5 basic instruction formats, all of which are 32 bits long. With 32 registers, this machine uses 5 bits for each of the source and target register fields, and some instructions specify as many as 4 registers (typically three source registers and a target).

**The Power PC instruction register**
`00`	`01`	`02`	`03`	`04`	`05`	`06`	`07`	`08`	`09`	`10`	`11`	`12`	`13`	`14`	`15`	`16`	`17`	`18`	`19`	`20`	`21`	`22`	`23`	`24`	`25`	`26`	`27`	`28`	`29`	`30`	`31`

opcode						tgt/src					src/tgt					immediate
opcode						tgt/src					src/tgt					src					extended opcode
opcode						tgt/src					src/tgt					src					src					ext opcd					Rc
opcode						BO					BI					BD														AA	LK
opcode						LI																								AA	LK

The ARM (Advanced RISC Microprocessor), common in cellphones and the Raspberry Pi family of computers supports two different instruction sets. The machine has (for our purposes) 16 registers, so like the Hawk and the IBM 360, it has 4-bit fields used for register selection. The Thumb instruction set is vaguely similar to the Hawk and IBM 360, with instruction halfwords, while full 32-bit ARM instructions are comparble to the Power PC in some ways but with some interesting differences.

One big difference is that there is a condition field in the high 4 bits of every instruction that makes that instruction conditional on the result of a previous instruction. That is, it is as if every instruction is included in an if statement to see if it should do anything at all. The other big difference is that essentially all of the arithmetic operations can have one of their operands shifted arbitrarily before it is combined with the other operand.

For most instructions, 8 bits specify the opcode, although spare bits elsewhere in the instruction register (marked o in the following) are also used as opcode bits. A single instruction can, for example, add register Rs shifted left by count bits to register Rn, and store the result in Rd, but only if the previous instruction produced a positive result. Another instruction could subtract Rn from a 12-bit immediate constant and unconditionally put the result in Rd. The presentation here is vastly oversimplified:

**The ARM instruction register**
`31`	`30`	`29`	`28`	`27`	`26`	`25`	`24`	`23`	`22`	`21`	`20`	`19`	`18`	`17`	`16`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`	`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`

cond				opcode								Rn				Rd				count					shift		o	Rm
cond				opcode								Rn				Rd				Rs				o	shift		o	Rm
cond				opcode								Rn				Rd				rotate				immediate
cond				opcode								Rn				Rd				immediate
cond				opcode				branch offset

4. The Fetch Execute Cycle

Overview

The Hawk CPU

The Hawk Fetch-Execute Cycle

Basic Arithmetic Instructions

Exercises

More Hawk Instructions, Loading Constants

Exercises

Adding Constants

Exercises

Memory Reference Instructions

Exercises

The Register-to-Register Move Instruction

Exercises

Load Immediate Word and Macros

Exercises

Macros in C

The Fetch-Execute Cycle Revisited

Exercises

Instruction Register Formats of some Other Machines