5. Assembly Language Programming

Part of 22C:60, Computer Organization Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Assembly into Memory

To run a program, you must translate it from its source language to machine language and load it in memory. While there are assemblers that directly assemble into memory, most do not, and our SMAL assembler is typical. When the SMAL assembler processes a source file, say hello.a for the classic hello-world program, the assembled output is placed in an object file called hello.o, with the .o suffix used to indicate that it is an object file. Most assemblers also produce a listing file that shows the source and object code together. In the case of the SMAL assembler, this will be named hello.l, with the .l suffix indicating that it is a listing.

**Typical SMAL assembler inputs and outputs**

| inputs | program | outputs |
|:---:|:---:|:---:|
| `hello.a`, `hawk.macs`, `monitor.h` | `smal` | `hello.o`, `hello.l` |

If the source file is a stand-alone program, that is, if it does not call any functions defined somewhere else, and if it uses only absolute assembly and never assembles data into a relocatable memory address (a topic discussed briefly in Chapter 3), then the object file can be loaded directly into memory.

More frequently, however, the object file will need to be linked with other object files in order to create a loadable and executable file. In this example, the main program uses a package called the monitor, with an interface description and some documentation stored in the file monitor.h and object code in the file monitor.o. These file naming conventions are identical to those commonly used with C and C++. The Hawk monitor is a minimal package of input-output routines just sufficient to run simple demonstration programs. Before we run our hello-world program, we must link the object file hello.o with monitor.o to make an executable program. By default, the Hawk linker stores the executable program in link.o, and by default, the executable code is linked so that it will be loaded into read-only memory within the Hawk emulator.

**Typical SMAL Hawk linker inputs and outputs**

| inputs | program | outputs |
|:---:|:---:|:---:|
| `hello.o`, `monitor.o` | `link` | `link.o` |

Finally, a program called the loader is used to load the program into memory so that it can be executed. The loader is usually a built-in part of the operating system on a modern computer, and each time you run a machine-language program, the loader is called on to load that program before it is run.

Another way to run a loadable object file is to run it under a debugger. Debuggers typically offer a way to observe the internal state of the processor as it executes a program, and they typically also allow examination of the program as it sits in memory. Our Hawk debugger, for example, includes a disassembler that shows a version of the assembly language source that corresponds to the machine code it finds in memory. This disassembled code is not always the same as the original assembly source because there are many possible source programs that will produce the same machine code. On the other hand, it is usually easy to see the relationship between the disassembled code displayed by the debugger and the assembler's listing file.

**Typical Hawk debugger inputs and outputs**

| inputs | program | outputs |
|:---:|:---:|:---:|
| `link.o`, `keyboard` | `hawk` | `display` |

Assuming we start with a single source file called hello.a in the current directory of a Unix-like system, and assuming that the SMAL assembler and linker are smart enough to look elsewhere for the other input files required, as indeed they are, then the dialog between the programmer and the system needed to assemble, link and run the hello-world program would be as follows:

**Assembling, linking and running the example**
```
$ ls
hello.a
$ smal hello.a
no errors
$ ls
hello.a  hello.l  hello.o
$ link hello.o
no errors
$ ls
hello.a  hello.l  hello.o  link.o
$ hawk link.o
```

The user typed in the ls command before and after each of the constructive commands in the above example; this requests a listing of all the files in the current directory, and as a result, it is easy to see what files were created at each step between the source program and the executable result.

In the above example, the dollar sign is the prompt output by the system to solicit each command; what follows each prompt is the command typed by the user, and everything else is output from the system. It is likely that you will see a different prompt on your computer system, since the string used for the prompt can be easily customized.

The Skeleton of a Hawk Application

Since the publication of The C Programming Language by Kernighan and Ritchie in 1978, introductory tutorials for programming languages have usually begun with an example program that outputs some variation on the string "hello world". Producing such output from an assembly language program takes a bit more work than it does in the original version of the C language, where the following sufficed:

**The original C *hello-world* program**
```c
main()
{
    printf("hello, world\n");
}
```

In our assembly language, we must do many things that are done automatically by C and many other high-level languages. Where a C programmer writes main(){} as the skeleton of an empty main program, a Hawk programmer will have to write more:

**The skeleton of an empty Hawk program**
```
        S       START
        USE     "hawk.macs"
        USE     "monitor.h"
        EXT     UNUSED

START:                          ; begin execution here
        LIL     R2,UNUSED       ; set up the stack

        ; ---- application code goes here ----

        LIL     R1,EXIT
        JSRS    R1,R1           ; call monitor routine to stop!
        END
```

Parts of the above code should be familiar from previous chapters. For example, the USE directive, which inserts the material from the file hawk.macs, was used in the assembly language example in Chapter 4, and the END directive should be familiar from Chapter 3. Other material here, however, is new.

The assembler's S directive is used to tell the loader the starting address for the program, that is, the initial value of the program counter to be used to start executing the main program. The argument to the S directive could be any number or expression, but since we want the address of the first executable instruction of the program, we use a label. In this case, we use the identifier START, which is defined as a label on the first executable instruction and used as the operand of the S directive. Since this code does not specify an assembly origin, the linker determines where it will be placed in memory. By default, the linker puts all code in read-only memory so that programs may not modify themselves.

The S directive does not assemble any value into memory! All it does is communicate with the loader, telling the loader where to jump to in the loaded program in order to start it.

The header file monitor.h includes the interface definition for all of the routines in our Hawk monitor, a very minimal operating system for the Hawk computer. Our minimal example uses only one of these, the EXIT routine, which it calls just before the END directive.

The EXT UNUSED directive tells the assembler that the symbol UNUSED is defined externally, with a value to be provided by the linker. In this case, the linker defines UNUSED to be the lowest unused memory address in RAM, and we are using this as the address of the push-down stack. By convention, we will always use register 2 as the stack pointer, and we will use this stack for subprogram linkage. We use the word subprogram here in the most generic sense to include such language-specific concepts as methods, procedures and functions. All of these have the property that they can be called, and all have the property that, when they finish, they return to the caller. This stack is available throughout the program for the storage of any local variables that might be needed. Conceptually, on entry to a subprogram, local variables are pushed onto this stack, and on exit, they are popped from it.
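To make the push and pop discipline concrete, here is a minimal Python model of such a stack. It is only a sketch: it assumes, consistent with the later description of R2 as pointing to the first free word beyond the part of the stack already in use, that the stack grows toward higher addresses one 32-bit word at a time. The class and method names are hypothetical and are not part of the Hawk monitor, and the base address 1003C₁₆ is the value of UNUSED that the emulator shows later in this chapter.

```python
class Stack:
    """Toy model of a Hawk-style push-down stack in RAM.

    sp plays the role of R2: it always holds the address of the first
    free word just beyond the part of the stack that is in use.
    (Assumption: the stack grows toward higher addresses.)
    """

    WORD = 4  # bytes per 32-bit word

    def __init__(self, base):
        self.memory = {}  # address -> 32-bit word
        self.sp = base    # like LIL R2,UNUSED

    def push(self, value):
        # store into the first free word, then advance the pointer
        self.memory[self.sp] = value & 0xFFFFFFFF
        self.sp += self.WORD

    def pop(self):
        # step back to the most recently pushed word and read it
        self.sp -= self.WORD
        return self.memory[self.sp]


# A subprogram with two local variables pushes them on entry and
# pops them on exit, leaving the stack pointer where it started.
s = Stack(0x1003C)
s.push(7)
s.push(42)
print(hex(s.sp))         # 0x10044
print(s.pop(), s.pop())  # 42 7
print(hex(s.sp))         # 0x1003c
```

The important property is the last line: as long as every push is matched by a pop, the stack pointer returns to its original value when the subprogram exits.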

It should be noted that the LIL instruction only works for values that can be correctly represented in 24 bits. For Hawk machines with over 2²⁴ bytes of memory (about 16 megabytes), it would be possible for the value of UNUSED to be too big for the LIL instruction. If this were ever to happen, the linker would give an error when linking the program, and we would have to rewrite our code to use a more general approach to loading large externally defined constants. We cannot solve this using the LIW macro because that macro only works for absolute constants, that is, constants whose value is known at the time the source code is assembled. We will defer a solution to this problem until we have studied more machine instructions.
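The range check the linker has to make here can be sketched in Python. The text does not say whether LIL sign-extends or zero-extends its 24-bit constant, so the hypothetical function below takes that choice as a parameter rather than assuming one; the value of UNUSED in the example is the one the emulator displays later in this chapter.

```python
def fits_in_field(value, bits=24, signed=True):
    """Return True if value can be loaded correctly from a bits-wide field.

    signed=True assumes the loaded field is sign-extended to 32 bits;
    signed=False assumes it is zero-extended.
    """
    if signed:
        return -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    return 0 <= value < (1 << bits)


UNUSED = 0x1003C  # value the emulator shows for UNUSED later in this chapter

print(fits_in_field(UNUSED))                   # True
print(fits_in_field(0x900000))                 # False under sign extension
print(fits_in_field(0x900000, signed=False))   # True under zero extension
print(fits_in_field(0x1000000, signed=False))  # False: needs 25 bits
```

The two readings agree for every address below 2²³ and differ only for addresses between 2²³ and 2²⁴, which is where the assumption would matter.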

Calling a Monitor Routine

The final thing our empty program skeleton does is call the monitor's exit routine. The interface to the monitor, given in monitor.h, defines symbolic names (as external symbols) for the entry point of each monitor routine. To call a monitor routine, we must load the address of the desired routine into a register and then execute a jump-to-subroutine instruction to transfer control to the called routine. Here, we load the address of the EXIT routine into register 1.

Why did we use R1 and not one of the other 14 registers? In fact, we are free to use any register that is not currently in use for something else; this excludes R2, which we have loaded with the stack pointer, and it excludes any registers needed to hold parameters to the called routine. By convention, however, calls to monitor routines use R1 as a temporary pointer to the routine. The reason for this is that monitor routines always use R1 to hold their return address, so until the jump-to-subroutine instruction saves the return address, R1 is available.

The actual call to the monitor routine is done by the next (and final) instruction in our example, JSRS. All of the Hawk jump-to-subroutine instructions save the current value of the program counter in a register in order to allow for a later return. After they save the program counter, they change the program counter in order to effect the transfer of control to the desired destination. The Hawk has two jump-to-subroutine instructions, one long, JSR, and one short, JSRS. These are memory reference instructions, with the same formats as other memory reference instructions. The short form takes its destination address from a register, while the long form includes a 16-bit constant that is used to compute the destination. The JSRS instruction has the following form:

**The Hawk JSRS instruction**

| 07 06 05 04 | 03 02 01 00 | 15 14 13 12 | 11 10 09 08 |
|:---:|:---:|:---:|:---:|
| 1 1 1 1 | dst | 1 0 1 1 | x |

Formally, this does r[dst]=program_counter; program_counter=r[x]

These two assignments are done by the hardware in parallel, so JSRS R1,R1 exchanges the value of the program counter with the value stored in R1. Since we have just loaded the address of the first instruction of the Hawk monitor routine EXIT into R1, the JSRS instruction will transfer control to the EXIT routine while it leaves the address of whatever instruction follows the JSRS instruction in R1.
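The following Python sketch models both the bit layout of JSRS, read off the format diagram (the byte holding bits 7 to 0 comes first), and the parallel assignment just described. The function names are hypothetical; the example values come from the assembly listing and emulator displays later in this chapter, where JSRS R1,R1 assembles as F1 B1 and DSPINI starts at 160₁₆.

```python
def encode_jsrs(dst, x):
    """Encode JSRS dst,x as two bytes, bits 7..0 first.

    Per the format diagram, bits 7..0 are 1111 followed by dst and
    bits 15..8 are 1011 followed by x.
    """
    if not (0 <= dst < 16 and 0 <= x < 16):
        raise ValueError("register numbers must be 0..15")
    return bytes([0xF0 | dst, 0xB0 | x])


def execute_jsrs(regs, pc, dst, x):
    """Return (new_regs, new_pc) after executing JSRS dst,x.

    pc is the address of the instruction after the JSRS, since the
    program counter has already been advanced past it.  Python's tuple
    assignment evaluates both right-hand sides before assigning, which
    mimics the hardware doing both assignments in parallel.
    """
    regs = list(regs)  # work on a copy
    regs[dst], pc = pc, regs[x]
    return regs, pc


print(encode_jsrs(1, 1).hex(" "))  # f1 b1, as in the assembly listing

regs = [0] * 16
regs[1] = 0x160                              # entry point of DSPINI
regs, pc = execute_jsrs(regs, 0x100A, 1, 1)  # JSRS at 1008; next is 100A
print(hex(pc), hex(regs[1]))                 # 0x160 0x100a
```

Because both right-hand sides are read first, the case dst = x, as in JSRS R1,R1, works as an exchange rather than losing the destination address.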

Because the JSRS instruction copies the previous value of the program counter to a register, the called routine can use this value to return to the calling routine. We call that register the linkage register, and we also say that this register holds the return address for the call. Because of this use, we also say that the Hawk architecture uses register linkage for subprogram calls. Some other architectures include calling instructions that automatically push the return address on the stack. This approach was popular in the 1970s, when the Intel 8086 was designed, and the Intel Pentium, a descendant of the 8086, still does this.

In our skeleton main program, the program itself made absolutely no use of the stack. The only monitor routine it called was the exit routine, and that, as it turns out, never returns and makes no use of the stack either. Several of the Hawk monitor routines, on the other hand, make extensive use of the stack, and when you start writing code for your own subroutines, you will be storing local variables on the stack.

Exercises

c) Type in the skeleton of an empty Hawk program given above, then assemble it, link it, and load it under the Hawk emulator so you can inspect the machine code loaded into memory.
d) Hand translate the skeleton of an empty Hawk program given above into a sequence of halfwords, starting with the first LIL instruction and continuing until the JSRS instruction. You may have to put question marks for some or all of the contents of some of these halfwords because they depend on material not given here.

Some Useful Hawk Monitor Routines

The exit service offered by the Hawk monitor is trivial. In fact, it does nothing useful, taking no parameters and never returning. If we want to write an interesting hello-world program, we need something more: a way to put a message on the screen. The Hawk monitor provides the tools we need for this along with many other useful tools. The file monitor.h contains brief documentation of the entire monitor. Here is a list of just some of the routines in this monitor:

EXIT -- terminates the application

Parameters: None.
Return value: Does not return
Side effects: Does not return

DSPINI -- initialize the output display device

Parameters: None.
Return value: Screen dimensions, Width = R3, Height = R4
Side effects:
The next character output will appear at row 0, column 0; this is the upper left-hand corner of the display. Without this, none of the other output routines will work.

DSPAT -- set output display coordinates

Parameters: Column = R3, Row = R4
Return value: None.
Side effects: Uses R3 to R7
The next character output will appear at the indicated row and column on the display, where the upper left is row 0, column 0.

DSPCH -- display one character

Parameters: R3 -- the character to display
Return value: None.
Side effects: Uses R4 to R5
The character will be displayed at the current row and column number on the display, and then the column number will be incremented.

DSPST -- display one null-terminated string

Parameters: R3 -- address of the first byte of the string.
Return value: None.
Side effects: Uses R3 to R7
The characters of the string will be displayed in sequence.

DSPHX -- display one integer in hexadecimal

Parameters: R3 -- integer to print.
Return value: None.
Side effects: Uses R3 to R7
The hexadecimal number will be displayed.

DSPDEC -- display one two's complement integer in decimal
DSPDECU -- display one unsigned integer in decimal

Parameters: R3, R4 -- integer to print and field width.
Return value: None.
Side effects: Uses R3 to R6
The number will be displayed in decimal, in a field of the given width.

KBGETC -- get one character from the keyboard

Parameters: None.
Return value: R3 -- the character typed.
Side effects: Also uses R4
The program will wait until some character is typed.

On entry, all of these monitor routines expect the return address in R1, and they all expect that R2 will hold the stack pointer, pointing to the first free word beyond any part of the stack that is already in use.

The documentation for each monitor routine indicates which registers it uses, but in general, Hawk programmers do not need to remember the specifics of this! Rather, the rule is that if a subroutine takes a parameter, it is passed in R3, and if the subroutine returns a result, it is returned in R3. Additional parameters and results will use R4 and up, and every Hawk monitor routine has permission to destroy the contents of any registers in the range R3 to R7 that are not used for parameters or results.

The Hawk monitor illustrates the general pattern we will use for all Hawk subroutines throughout this text. All subroutines will be called with the return address in R1, all will use R2 as the stack pointer if they make use of a stack, all will use registers from R3 upward for parameters and results, all will be permitted to destroy the contents of registers R3 through R7, and all will be required to leave R8 through R15 as they were when the routine was called. These conventions are entirely arbitrary!
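One way to see what these conventions demand is a small Python harness that plays the role of a caller checking a subroutine. This is a hypothetical illustration, not a Hawk tool: the registers are a list of 16 integers, parameters and results travel in R3, and the check enforces only the rule that R8 through R15 must be left unchanged.

```python
PRESERVED = range(8, 16)  # R8..R15 must survive every call


def checked_call(routine, regs):
    """Call routine(regs) and enforce the register convention.

    regs is a list of 16 integers standing in for R0..R15.  By the
    convention, the first parameter is in R3 on entry and the result is
    in R3 on return; R3..R7 may be destroyed, R8..R15 may not.
    """
    before = list(regs)
    routine(regs)
    for r in PRESERVED:
        if regs[r] != before[r]:
            raise RuntimeError(f"R{r:X} was not preserved")
    return regs[3]


def double(regs):
    # follows the convention: parameter and result in R3, scratch in R4
    regs[4] = regs[3]
    regs[3] = regs[3] + regs[4]


def sloppy_double(regs):
    # breaks the convention by using R8 as scratch
    regs[8] = regs[3]
    regs[3] = regs[3] * 2


regs = [0] * 16
regs[3], regs[8] = 21, 99
print(checked_call(double, regs))  # 42
try:
    checked_call(sloppy_double, regs)
except RuntimeError as err:
    print(err)                     # R8 was not preserved
```

A caller that follows the same rules can keep long-lived values in R8 through R15 across any call, while treating R3 through R7 as scratch that any call may overwrite.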

A Working Hello-World Program

We now have almost enough information in hand to write a program that outputs the string "Hello world!" to the display and then halts. Here is the source code for this program:

**A working *hello-world* program**
```
        TITLE   A Hello-World Program
        S       START
        USE     "hawk.macs"
        USE     "monitor.h"
        EXT     UNUSED

; the program starts here!
START:  LIL     R2,UNUSED       ; set up the stack
; --- begin application code
        LIL     R1,DSPINI
        JSRS    R1,R1           ; initialize the display

        LIL     R3,HELLO
        LIL     R1,DSPST
        JSRS    R1,R1           ; output (HELLO)
; --- end application code
        LIL     R1,EXIT
        JSRS    R1,R1           ; stop!

; --- begin application constants
HELLO:  ASCII   "Hello world!",0
        END
```

This program is built around the skeleton presented previously, with a TITLE directive added at the very start of the file. This directive is little more than an expensive sort of comment, but in large programs, particularly when they are listed to a printer, it is useful to have things like a meaningful title and page number at the top of each page of output.

The application code added to this program calls two routines in our minimal Hawk monitor, DSPINI to initialize the display for output, and DSPST to display the null-terminated string "Hello world!". Note the use of the LIL instruction to load the address of the start of the string as a parameter to the DSPST routine. There are other ways to load the address of this string. We will discuss these later.
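The null-terminated string that DSPST expects can be modeled in Python as shown below. The function names are hypothetical, and display_string is only a sketch of the job DSPST must do, not the monitor's actual code. The example uses the capitalized text "Hello World!" so that the bytes match those in the assembly listing that follows, which places HELLO at relative address 1A₁₆.

```python
def ascii_directive(text, *extra):
    """Bytes produced by a directive like  ASCII "text",0 : the
    characters of the string followed by any extra byte values."""
    return text.encode("ascii") + bytes(extra)


def display_string(memory, address):
    """Sketch of the job DSPST does: collect bytes starting at address
    until a zero byte is found (the zero itself is not displayed)."""
    chars = []
    while memory[address] != 0:
        chars.append(chr(memory[address]))
        address += 1
    return "".join(chars)


HELLO = 0x1A  # relative address of HELLO in the assembly listing
memory = bytearray(HELLO) + ascii_directive("Hello World!", 0)
print(memory[HELLO:].hex(" "))        # 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 00
print(display_string(memory, HELLO))  # Hello World!
```

Only the starting address is passed in R3; the trailing zero byte is what tells the routine where the string ends.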

**Listing the *hello-world* program**
```
SMAL32 (rev 2/05)              A Hello-World Program        19:46:51  Page  1
                                                            Wed Jul 30 2008

                            1          TITLE   A Hello-World Program
                            2          S       START
                            3          USE     "hawk.macs"
                            4          USE     "monitor.h"
                            5          EXT     UNUSED
                            6
                            7  ; the program starts here!
 +000000: E2  +000000       8  START:  LIL     R2,UNUSED       ; set up the sta
                            9  ; --- begin app
 +000004: E1  +000000      10          LIL     R1,DSPINI
 +000008: F1  B1           11          JSRS    R1,R1           ; initialize the
                           12
 +00000A: E3  +00001A      13          LIL     R3,HELLO
 +00000E: E1  +000000      14          LIL     R1,DSPST
 +000012: F1  B1           15          JSRS    R1,R1           ; output (HELLO)
                           16  ; --- end appli
 +000014: E1  +000000      17          LIL     R1,EXIT
 +000018: F1  B1           18          JSRS    R1,R1           ; stop!
                           19
 +00001A: 48  65  6C  6C   20  HELLO:  ASCII   "Hello World!",0
          6F  20  57  6F
          72  6C  64  21
          00
                           21          END
```

This listing does not show the contents of the monitor.h file, but the definitions it gives for symbols such as DSPINI, DSPST and EXIT have all been processed. Had monitor.h been omitted, any use of these symbols would have caused assembly errors. The assembled values of these symbols are shown as +000000. The plus sign before such values indicates that the linker will add something to the zero that the assembler output. The values the linker will add here are the starting addresses of each monitor routine. The linker will also adjust the address of the string HELLO, defined on line 20 and used on line 13. Here, it will add the address of the first location of the program to the 1A₁₆ that the assembler output.

If we link this program and then use a Hawk emulator to load the linked result, the emulator will show the following display of its initial state. The emulator starts in the halted state so you can see the contents of memory after the program has been loaded but before it begins to run. If it came up running, it would be hard to distinguish between the data originally loaded and damage done by errors.

**Ready to run the *hello-world* program**
```
HAWK EMULATOR
   /------------------CPU------------------\   /----MEMORY----\
   PC:  00001000                R8: 00000000    000FFC: NOP
   PSW: 00000000  R1: 00000000  R9: 00000000    000FFE: NOP
   NZVC: 0 0 0 0  R2: 00000000  RA: 00000000  ->001000: LIL     R2,#01003C
                  R3: 00000000  RB: 00000000    001004: LIL     R1,#000160
                  R4: 00000000  RC: 00000000    001008: JSRS    R1,R1
                  R5: 00000000  RD: 00000000    00100A: LIL     R3,#00101A
                  R6: 00000000  RE: 00000000    00100E: LIL     R1,#0001B6
                  R7: 00000000  RF: 00000000    001012: JSRS    R1,R1

 **HALTED**  r(run) s(step) q(quit) ?(help)
```

This display shows an initial PC value of 1000₁₆, and in the column labeled MEMORY, the emulator shows the disassembled value of the instruction at address 1000₁₆ as an LIL instruction. This is the first instruction of the program, but the emulator has no information about the symbolic addresses used in the program, so it shows the operand as a hexadecimal constant instead of the symbolic name that was used in the assembly code. The disassembly process used by the emulator to display the code works directly from the binary values in memory and does not have access to the symbolic information that the assembler and linker discarded in the process of generating the object code that was loaded into memory. The sequence of instructions and the registers they reference, as displayed by the disassembler, should be fairly easy to recognize as matching the original source code.

Why did the assembly listing say that the first LIL instruction was at address +000000 while the emulator's display shows it at address 001000? The answer lies in a concept that was mentioned in Chapter 3, relocation. The assembler did not assign absolute memory addresses to the code it assembled, but only relative addresses, relying on the linker to set the absolute address. This is signified, in the assembly listing, by the leading plus sign on the address. The linker, in this case, has added 1000₁₆ to each address, loading the object code for our example into addresses 1000₁₆ through 105C₁₆.
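The linker's adjustments for this program can be sketched in Python. This is a deliberate simplification with hypothetical names: each LIL operand from the listing is treated either as a reference to this same file, in which case the linker adds the load address, or as a reference to an external symbol, in which case the linker adds that symbol's value to the assembler's zero. The load address 1000₁₆ and the values for UNUSED, DSPINI and DSPST are taken from the emulator displays; EXIT is left out because its address is not shown.

```python
BASE = 0x1000   # where the linker placed this object file
SYMBOLS = {     # values the linker supplies for external symbols
    "UNUSED": 0x1003C,
    "DSPINI": 0x160,
    "DSPST": 0x1B6,
}

# Relocatable LIL operands from the assembly listing:
# (offset in object file, destination register, assembled value, symbol)
# A symbol of None means a relocatable reference within this file.
FIELDS = [
    (0x00, 2, 0x000000, "UNUSED"),
    (0x04, 1, 0x000000, "DSPINI"),
    (0x0A, 3, 0x00001A, None),      # HELLO
    (0x0E, 1, 0x000000, "DSPST"),
]


def link(fields, base, symbols):
    """Return (absolute address, register, final operand) for each field."""
    linked = []
    for offset, reg, value, symbol in fields:
        # references within this file get the base added; external
        # references get the external symbol's value added instead
        addend = base if symbol is None else symbols[symbol]
        linked.append((base + offset, reg, value + addend))
    return linked


for address, reg, operand in link(FIELDS, BASE, SYMBOLS):
    print(f"{address:06X}: LIL R{reg},#{operand:06X}")
```

The four lines printed, 001000: LIL R2,#01003C through 00100E: LIL R1,#0001B6, match the disassembly in the emulator display above.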

If we hit the r key while the emulator display window shows the above text, the emulator will run, producing the output "Hello world!" in the bottom half of the screen, and then it will halt. At this stage, though, it is more instructive to watch the computer execute just one instruction at a time. In this context, the s or n keys direct the emulator to step forward just one instruction. (The difference between these is only apparent when the next instruction calls a subprogram; in that case, s will step into the body of the called routine, while n will allow the call to execute as if it were one instruction.) Here is the state of our system after executing the first two load instructions:

**The *hello-world* program after executing two instructions**
```
HAWK EMULATOR
   /------------------CPU------------------\   /----MEMORY----\
   PC:  00001008                R8: 00000000    000FFC: NOP
   PSW: 00000000  R1: 00000160  R9: 00000000    000FFE: NOP
   NZVC: 0 0 0 0  R2: 0001003C  RA: 00000000    001000: LIL     R2,#01003C
                  R3: 00000000  RB: 00000000    001004: LIL     R1,#000160
                  R4: 00000000  RC: 00000000  ->001008: JSRS    R1,R1
                  R5: 00000000  RD: 00000000    00100A: LIL     R3,#00101A
                  R6: 00000000  RE: 00000000    00100E: LIL     R1,#0001B6
                  R7: 00000000  RF: 00000000    001012: JSRS    R1,R1

 **HALTED**  r(run) s(step) q(quit) ?(help)
```

Three fields in this output have changed from the values shown in the previous illustration. The program counter has advanced over the two load instructions and we are now ready to execute the first call to a Hawk monitor function. Register 2 now points to the first word of the stack, and we can now see that the stack begins at memory location 1003C₁₆. Register 1 now points to 160₁₆, the actual entry point of the DSPINI monitor routine.

What about those two NOP instructions that the emulator found in locations FFC₁₆ and FFE₁₆? The answer to this question is simple! The emulator's disassembler made a mistake. These two memory locations were not intended to be interpreted as instructions. If you use the emulator's t command to toggle the memory display, you will see a display of memory as a table of 32-bit words, each shown in hexadecimal and also as 4-character text strings. When viewed as a hexadecimal number, the word at location FFC₁₆ holds the value zero. If the Hawk processor did try to execute the contents of this location, it would be interpreted as a NOP instruction, but this should never happen.

Exercises

e) Modify the hello-world program given above so that it uses the DSPAT monitor routine to output its message starting on line 5 column 35 of the screen, so the message is more or less centered.
f) Modify the hello-world program given above so that it outputs the message "Hi!" using 3 successive calls to the DSPCH monitor routine.
g) A very bright programmer decides to output the message "Hi" using this fragment of code, loading R1 just once and then using the value twice:
```
        LIL     R1,DSPCH
        LIS     R3,'H'
        JSRS    R1,R1
        LIS     R3,'i'
        JSRS    R1,R1
```
When the programmer tries this, it outputs the H, but then it starts to behave strangely. What happened? What was the mistake?

Load Effective Address and Load

In our first draft of the hello-world program, we used LIL to load the address of the string "Hello world!". This left significant work to the linker. The Hawk architecture includes several alternative ways of doing this. In general, load instructions load values into a register. As we have seen, the load immediate instructions load constants. The LOAD instruction is used to load the contents of a memory location into a register, while the LEA instruction loads the effective address of a memory location into a register. The LOAD and LEA instructions are twins, with identical syntax and almost identical meaning. The only difference is that LOAD gets the contents of a memory location while LEA gets the address of that location.

In general, the term effective address refers to the memory address of the operand of an instruction. All of the Hawk's long memory reference instructions begin by computing an effective address and then using it, for example, to load something from memory into a register or to store something from a register to memory. Consider the simplest form of the LOAD and LEA instructions:

**The Hawk LOAD instruction**

| 07 06 05 04 | 03 02 01 00 | 15 14 13 12 | 11 10 09 08 |
|:---:|:---:|:---:|:---:|
| 1 1 1 1 | dst | 0 1 0 1 | 0 0 0 0 |

| 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
|:---:|
| const16 |

**The Hawk LEA instruction**

| 07 06 05 04 | 03 02 01 00 | 15 14 13 12 | 11 10 09 08 |
|:---:|:---:|:---:|:---:|
| 1 1 1 1 | dst | 0 1 1 1 | 0 0 0 0 |

| 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
|:---:|
| const16 |

Formally, LOAD does R[dst]=M[program_counter+sxt(15,const16)], while LEA does R[dst]=program_counter+sxt(15,const16). This is called program-counter-relative addressing or PC-relative addressing because the 16-bit constant operand is interpreted as a displacement from the current value of the program counter. As a result, this form of the LOAD and LEA instructions can be used to load the contents or address of any location from 32764 bytes prior to the load instruction to 32771 bytes after it. Why not 32768 before to 32767 after? Because the program counter is always incremented by the size of the instruction before that instruction is executed.
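The formal definitions above translate almost word for word into Python. In this sketch the function names and the dictionary standing in for memory are hypothetical; what it shows is that LOAD and LEA share the same effective address computation, and that advancing the program counter past the 4-byte instruction first is what shifts the reach to 32764 bytes before and 32771 bytes after the instruction.

```python
INSTR_SIZE = 4  # long memory reference instructions occupy 32 bits


def sxt(bit, value):
    """Sign-extend value, treating the given bit as the sign bit
    (the sxt(15, const16) of the formal description)."""
    sign = 1 << bit
    value &= (sign << 1) - 1
    return value - (sign << 1) if value & sign else value


def effective_address(instr_addr, const16):
    # the program counter has already moved past this 4-byte instruction
    pc = instr_addr + INSTR_SIZE
    return (pc + sxt(15, const16)) & 0xFFFFFFFF


def lea(instr_addr, const16):
    return effective_address(instr_addr, const16)          # the address


def load(memory, instr_addr, const16):
    return memory[effective_address(instr_addr, const16)]  # its contents


# reach relative to the address of the instruction itself
lowest = effective_address(0x10000, 0x8000) - 0x10000   # most negative const16
highest = effective_address(0x10000, 0x7FFF) - 0x10000  # most positive const16
print(lowest, highest)                   # -32764 32771

memory = {0x1020: 1234}                  # one word of a toy memory
print(hex(lea(0x1000, 0x001C)))          # 0x1020
print(load(memory, 0x1000, 0x001C))      # 1234
```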

In the SMAL Hawk assembly language, as defined by a macro in hawk.macs, the instructions LOAD R1,X and LEA R1,X are assembled into the forms shown above, but with a subtle twist. Instead of assembling the constant X directly into the instruction, the assembler stores X-(.+4). As a result, if X is the label on some nearby word, this form of the LOAD and LEA instructions will load that word or the address of that word into the destination register. The following example illustrates three ways to load the address of the "hello world" string in order to illustrate the use of these instructions:
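Subtracting .+4 is exactly what makes the hardware's PC-relative addition land on X. A minimal Python sketch of that round trip follows; the function names are hypothetical. The example imagines the hello-world program's LIL R3,HELLO at address 100A₁₆ replaced by LEA R3,HELLO, which occupies the same four bytes, so HELLO stays at 101A₁₆ as in the emulator display.

```python
def pc_relative_field(target, here):
    """16-bit field the assembler stores for LOAD or LEA R,target when
    the instruction is assembled at address here: the displacement
    target - (here + 4), as a two's complement bit pattern."""
    disp = target - (here + 4)
    if not -0x8000 <= disp <= 0x7FFF:
        raise ValueError(f"{target:#x} is out of reach from {here:#x}")
    return disp & 0xFFFF


def hardware_ea(here, field):
    """Effective address the CPU computes: (here + 4) + sxt(15, field)."""
    disp = field - 0x10000 if field & 0x8000 else field
    return here + 4 + disp


# Hypothetical: LEA R3,HELLO assembled at 100A in place of LIL R3,HELLO
field = pc_relative_field(0x101A, 0x100A)
print(hex(field))                       # 0xc
print(hex(hardware_ea(0x100A, field)))  # 0x101a
```

Because the field depends only on the distance between the instruction and its target, it does not change when the linker moves the whole object file, which is why LEA leaves so little work for the linker.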

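**Three ways to load the address of the same string**
```
; 1) load immediate: the value must fit in 24 bits
        LIL     R3,HELLO
; 2) PC-relative: HELLO must be within reach of this instruction
        LEA     R3,HELLO
; 3) load a full 32-bit pointer stored in a nearby word
        LOAD    R3,PHELLO
        ...
        ALIGN   4
PHELLO: W       HELLO
```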

Each of the above approaches to loading the constant address HELLO into a register has advantages and disadvantages. Using the LIL instruction, as we have already noted, only works if the constant to be loaded can be expressed in 24 bits. If that constant is an address, this only works if the address is near the start of memory. Using the LEA instruction, the address loaded must be near the address of the instruction that loads it. The final example is the most general and also the least convenient. This uses the LOAD instruction to load a nearby word holding the desired 32-bit value. This can load any 32-bit value, but it is awkward, a bit slower, and it takes extra memory to hold the 32-bit constant.

In general, when loading the address of a location defined in the same source file, the LEA instruction should be used. We only used the LIL instruction in the hello-world example in order to limit the variety of different instructions used there. If you change that one LIL instruction to an LEA instruction in the hello-world example, and then assemble, link and run the program, it will work exactly as it originally did.

We cannot use LEA R1,DSPST instead of LIL R1,DSPST to call monitor routines. This is because LEA only works if the assembler knows the relative distance, in bytes, between the instruction and the address to be loaded. So long as these are both defined in the same source file, the assembler can use this information easily, but in this case, DSPST is defined in the monitor and depending on how the program is linked, it could easily end up more than 32K bytes away from the point of call.
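The tradeoffs in the last three paragraphs can be summarized as a decision procedure. The Python function below is a hypothetical sketch, not something the assembler or linker does for you: it prefers LEA for a target in the same source file that is within PC-relative reach, then LIL, assuming that LIL sign-extends its 24-bit constant, and falls back to LOAD from a nearby word. The address 40000000₁₆ in the last example is an arbitrary large value chosen for illustration.

```python
def choose_method(target, here, same_file):
    """Pick one of the three ways to get the address target into a register.

    target    -- final address to be loaded
    here      -- address of the instruction doing the loading
    same_file -- True if target is defined in the same source file, so
                 the assembler knows the distance from here to target
    """
    disp = target - (here + 4)
    if same_file and -0x8000 <= disp <= 0x7FFF:
        return "LEA"   # PC-relative; assembler knows the short distance
    if -0x800000 <= target <= 0x7FFFFF:
        return "LIL"   # assumes LIL sign-extends its 24-bit constant
    return "LOAD"      # general case: fetch a 32-bit word stored nearby


print(choose_method(0x101A, 0x100A, True))       # LEA  (HELLO)
print(choose_method(0x1B6, 0x100E, False))       # LIL  (DSPST, external)
print(choose_method(0x40000000, 0x1000, False))  # LOAD (a large address)
```

Note that DSPST is in fact within 32K bytes of the call in this particular link, yet the function still rejects LEA for it, because the assembler cannot know that distance when it assembles the call.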

Exercises

h) Write a program that outputs the string "Hello " followed by the string "world!". The output will be the same as that of the original hello-world program, but your program will call DSPST twice, using LEA instructions where possible.

Control Structures

A program that just outputs the text "Hello world!" is not very interesting. The instructions we have covered so far offer many different ways to load constants, to add and subtract integers, and to load from and store to any memory address. More arithmetic operations would be nice, but in theory, all arithmetic can be done with just addition and subtraction. The big thing we are missing is support for control structures.

At the machine language level, the fundamental primitive for control structuring is assignment to the program counter. In high-level languages such as Fortran, Pascal, C or C++, this is done using the goto statement, which forces a control transfer to a specific labeled statement. When introductory programming courses mention this statement at all, it is usually to warn students never to use it. At the machine language level, though, this is all we have.

All more complex control structures must therefore be translated into assignments to the program counter. So, for example, at the end of each loop body, there must be an assignment that forces a jump back to the start of the loop body, and any break statements within the loop must be translated to assignments that force jumps out of the loop. Similarly, if statements will translate to jumps that are conditional on the value of some expression.
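Because C still has a goto statement, the translation just described can be sketched directly in C. The function below is a hypothetical example, not taken from any real program. It takes a while loop containing a conditional break and rewrites it so that every control transfer is an explicit goto, as it would be in assembly language.

```c
/* Sums 1, 2, 3, ... until adding the next term would exceed limit.
   The structured version of this code would be:

       int sum = 0, i = 1;
       while (1) {
           if (sum + i > limit) break;
           sum = sum + i;
           i = i + 1;
       }

   Below, every control structure has been reduced to goto. */
int sum_up_to(int limit) {
    int sum = 0;
    int i = 1;
loop:                                   /* top of the loop body */
    if (sum + i > limit) goto endloop;  /* the conditional break */
    sum = sum + i;
    i = i + 1;
    goto loop;                          /* end of body: jump back to top */
endloop:
    return sum;
}
```

For example, sum_up_to(10) returns 10, because adding the next term, 5, would exceed the limit.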

The Hawk provides short, efficient ways to load small constants but requires long, cumbersome instructions to load large constants. In the same way, it provides a short, efficient way to jump short distances within a program, while control transfers that make larger changes to the program counter require longer, more cumbersome instructions. It is worth noting that the Intel Pentium and 80x86 family, as well as many other successful computer architectures, offer similar tradeoffs.

All of the common control transfer instructions on the Hawk use program-counter-relative addressing. The short, fast form is called the branch instruction, while the longer, slower form is called the jump instruction. The short form is given first:

**The Hawk BR instruction**

| 07 06 05 04 | 03 02 01 00 | 15 14 13 12 11 10 09 08 |
|:---:|:---:|:---:|
| 0 0 0 0 | 0 0 0 0 | const8 |

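We can use the shorthand notation program_counter=program_counter+(2×sxt(7,const8)) to describe this. Here program_counter already holds the address of the halfword following the branch instruction. The function sxt(7,const8) treats bit 7 of const8 as a sign bit and sign extends const8 by replicating that bit in all the places to its left. The result is multiplied by 2 because the constant counts halfwords, while the program counter holds a byte address.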
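The following C sketch models this formula. The helper names sxt and br_next_pc are hypothetical. Here pc stands for the program counter after the branch has been fetched, that is, the branch's own address plus 2. This matches the offsets in the assembly listing later in this section.

```c
#include <stdint.h>

/* sxt(b, v): treat bit b of v as a sign bit and copy it into every bit
   position to its left, giving a signed 32-bit result. */
int32_t sxt(int b, uint32_t v) {
    uint32_t sign = (uint32_t)1 << b;
    uint32_t low = v & ((sign << 1) - 1);    /* keep bits b..0 only */
    return (int32_t)((low ^ sign) - sign);   /* flip, then subtract, the sign bit */
}

/* Next program counter after a BR whose constant field is const8.
   pc is the value after fetching the branch (branch address + 2). */
uint32_t br_next_pc(uint32_t pc, uint32_t const8) {
    return pc + 2 * (uint32_t)sxt(7, const8);  /* offset counts halfwords */
}
```

With the program counter at 1004₁₆ after fetching a branch whose constant is 03₁₆, br_next_pc returns 100A₁₆. A constant of FB₁₆ sign extends to -5 and moves the program counter back 10 bytes, and a constant of 00₁₆ leaves it unchanged.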

Notice that if the constant in this instruction is zero, the program counter is unchanged. As a result, the instruction halfword 0000₁₆ is a no-op. As we noted in chapter 4, the instruction FFFF₁₆ (a register-to-register move instruction) is also a no-op. This is of some small value to developers of programs stored in read-only memory. When instructions in such a program turn out to need deleting, the program can be patched by forcing them to either 0000₁₆ or FFFF₁₆, whichever is possible in the ROM technology being used. Similarly, blocks of no-ops (either 0000₁₆ or FFFF₁₆, depending on which can later be changed to other instructions) can be left in programs as patch space to hold instructions added later.

The Hawk jump instruction is two halfwords long, and it may be used to compute the next address in a program in several ways. For practical purposes, though, the simplest form of this instruction does nearly the same thing as the branch instruction:

**The Hawk JUMP instruction**

| 07 06 05 04 | 03 02 01 00 | 15 14 13 12 | 11 10 09 08 |
|:---:|:---:|:---:|:---:|
| 1 1 1 1 | 0 0 0 0 | 0 0 1 1 | 0 0 0 0 |

| 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
|:---:|
| const16 |

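We can use the shorthand notation program_counter=program_counter+sxt(15,const16) to describe this. Here program_counter already holds the address of the instruction following the jump. Unlike the branch, the constant is not doubled, because it counts bytes rather than halfwords.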

The big practical difference between this form of the JUMP instruction and the BR instruction is that the jump instruction can change the program counter to any value in a 64K byte range, from about +32K to -32K, while the branch instruction is limited to a range from +127 to -128 halfwords. Thus, the branch instruction should be used when the destination address is nearby, while the jump instruction should be used when the destination address is far away.
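To make the tradeoff concrete, here is a hypothetical C sketch of the computation an assembler must perform to fill in the constant field of a BR or JUMP placed at address dot with destination dest. The sketch assumes both addresses are halfword aligned and follows the two formulas above. BR counts halfwords from the end of a 2-byte instruction, while JUMP counts bytes from the end of a 4-byte instruction.

```c
#include <stdint.h>

/* Hypothetical assembler helpers.  Each returns the value of the
   instruction's constant field, or -1 if dest is out of range.
   dot is the address of the branch or jump itself. */

/* BR is one halfword long, so the program counter is dot + 2 when the
   constant is used; the constant counts halfwords and has 8 bits. */
int br_const8(uint32_t dot, uint32_t dest) {
    int32_t halfwords = (int32_t)(dest - (dot + 2)) / 2;
    if (halfwords < -128 || halfwords > 127)
        return -1;
    return (uint8_t)halfwords;      /* low 8 bits of the two's complement */
}

/* JUMP is two halfwords long, so the program counter is dot + 4; the
   constant counts bytes and has 16 bits. */
int jump_const16(uint32_t dot, uint32_t dest) {
    int32_t bytes = (int32_t)(dest - (dot + 4));
    if (bytes < INT16_MIN || bytes > INT16_MAX)
        return -1;
    return (uint16_t)bytes;         /* low 16 bits of the two's complement */
}
```

For a branch at 1002₁₆ to 100A₁₆, br_const8 returns 03₁₆. For a jump at 1006₁₆ back to 1000₁₆, jump_const16 returns FFF6₁₆. A branch from 1000₁₆ to 2000₁₆ is out of range, while a jump covers that distance easily.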

As with other instructions that use program-counter-relative addressing, the SMAL assembler, or rather, the macro definitions in hawk.macs take care of the details of program-counter relative addressing, so that all a programmer needs to do is label the destination instruction and then use that label on the jump or branch that transfers control to it. The following rather foolish example illustrates this:

**Using the branch and jump instructions**

                             1          USE     "hawk.macs"
                             2  .       =       #1000
                             3          S       L1
        001000: 11 C1        4  L1:     ADDSI   R1,1
        001002: 00 03        5          BR      L3
        001004: 11 C2        6  L2:     ADDSI   R1,2
        001006: F0 30 FFF6   7          JUMP    L1
        00100A: 11 C4        8  L3:     ADDSI   R1,4
        00100C: 00 FB        9          BR      L2
                            10          END

This example program was written using absolute assembly, setting the assembly origin to 1000₁₆ so that it will appear in exactly the same memory locations as are shown on the assembly listing. Because of this and the fact that it does not use any Hawk monitor routines, the object file can be loaded directly into the Hawk emulator without first running it through the linker.

If you untangle the control structure of this example, you will find that it is an infinite loop. During each iteration, R1 is incremented by 1 by the instruction with the label L1, then by 4 by the instruction with the label L3, and finally by 2 by the instruction with the label L2.
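You can check the tangled control flow by transcribing the listing into C, using one goto for each branch or jump. This is a hypothetical sketch. The variable r1 stands for register R1. The iterations parameter and the test at L1 are additions that are not in the original program; they ensure the function eventually returns.

```c
#include <stdint.h>

/* The listing's control flow, one C statement per instruction. */
uint32_t trace_listing(uint32_t r1, int iterations) {
L1: if (iterations-- == 0) return r1;  /* added: stop after n trips */
    r1 += 1;                           /* L1: ADDSI R1,1 */
    goto L3;                           /*     BR    L3   */
L2: r1 += 2;                           /* L2: ADDSI R1,2 */
    goto L1;                           /*     JUMP  L1   */
L3: r1 += 4;                           /* L3: ADDSI R1,4 */
    goto L2;                           /*     BR    L2   */
}
```

trace_listing(0, 1) returns 7, confirming that each trip around the loop adds 1, then 4, then 2 to R1.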

The important thing to observe in this listing is that the BR and JUMP instructions look very similar in assembly language, but the machine code generated is different. The machine code for the forward branch on line 5 is easy to understand. In this case, the instruction skips over 3 halfwords (the ADDSI instruction labeled L2 and the two halfwords of the JUMP instruction) to get to the line with the label L3, and the constant in the instruction itself is 03₁₆.

The backward branch on line 9 contains the constant FB₁₆ or 11111011₂. This is the two's complement of 00000101₂ or 5, and 5 halfwords are skipped over between the backward branch instruction and the label if you count both the halfword containing the branch itself and the size of the labeled instruction. In the forward direction, both of these were excluded from the count.

The jump instruction on line 7 has the 16-bit constant FFF6₁₆ as an operand. Interpreted as a two's complement number, this is -10₁₀. The destination label is indeed 10 bytes before the end of the jump instruction, counting all 4 bytes of the jump instruction itself and the 2 bytes of the destination instruction.

Exercises

j) Give the machine code for the shortest infinite loop, expressed using a BR instruction that branches to itself. (Note, the SMAL assembler discourages this with an error message, but the Hawk manual does not make this illegal.)
k) Give the machine code for the second shortest infinite loop, expressed using a JUMP instruction that branches to itself.

An Example Program

Here is a little program, expressed in a C-like notation, that incorporates an infinite loop and calls some of the service routines in the Hawk monitor:

**An example**

        int x = 1;
        char ch = 'a';
        dspini();
        while (TRUE) {
            dspat( x, x );
            dspch( ch );
            x = x + 1;
            ch = ch + 1;
        }

Running this program, you would expect successive characters to be displayed, one per iteration, starting with the letter 'a'. Because of the call to dspat(), each character is displayed one line below and one character to the right of the previous one. The expected output is therefore a diagonal line of characters extending down and to the right until something causes the program to terminate. If we slow down the computer sufficiently, it will be easy to see this diagonal row of characters growing.
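You can model the expected display in C, using a grid of characters to stand in for the screen. This is only a sketch, and it rests on several assumptions. The functions below are hypothetical stand-ins for the monitor routines. dspat(x,y) is assumed to move the cursor to column x of row y; the order does not matter here, because both arguments are equal. dspch() is assumed to store one character and advance the cursor. Finally, x starts at 1, as in the assembly-language version of this program.

```c
#include <stdio.h>
#include <string.h>

#define ROWS 10
#define COLS 20

/* Hypothetical screen: a grid of characters plus a cursor position. */
static char screen[ROWS][COLS + 1];
static int cur_x, cur_y;

void dspini(void) {                  /* clear the screen, home the cursor */
    for (int r = 0; r < ROWS; r++) {
        memset(screen[r], ' ', COLS);
        screen[r][COLS] = '\0';
    }
    cur_x = cur_y = 0;
}

void dspat(int x, int y) {           /* move the cursor */
    cur_x = x;
    cur_y = y;
}

void dspch(char ch) {                /* store a character, advance cursor */
    if (cur_y >= 0 && cur_y < ROWS && cur_x >= 0 && cur_x < COLS)
        screen[cur_y][cur_x] = ch;
    cur_x = cur_x + 1;
}

/* The example program, cut off after n iterations of its loop. */
void run_example(int n) {
    int x = 1;
    char ch = 'a';
    dspini();
    for (int i = 0; i < n; i++) {    /* stands in for while (TRUE) */
        dspat(x, x);
        dspch(ch);
        x = x + 1;
        ch = ch + 1;
    }
}

void show(void) {                    /* print the simulated screen */
    for (int r = 0; r < ROWS; r++)
        puts(screen[r]);
}
```

After run_example(8), calling show() prints the letters a through h on a diagonal, each one row down and one column to the right of the one before.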

When we translate this program to Hawk machine code, we must find some place to put the variables. Here, we will use registers 8 and 9 because none of the Hawk monitor routines use these registers. Having done this, we can replace the body of our Hawk hello world program with the following:

**The example, translated to Hawk assembly language**

        START:  LIL     R2,UNUSED       ; set up the stack
                LIS     R8,1            ; x = 1; (using R8 for x)
                LIS     R9,'a'          ; ch = 'a'; (using R9 for ch)
                LIL     R1,DSPINI
                JSRS    R1,R1           ; dspini();
        LOOP:                           ; while (TRUE) {
                MOVE    R3,R8
                MOVE    R4,R8
                LIL     R1,DSPAT
                JSRS    R1,R1           ; dspat(x,x);
                MOVE    R3,R9
                LIL     R1,DSPCH
                JSRS    R1,R1           ; dspch(ch);
                ADDSI   R8,1            ; x = x + 1;
                ADDSI   R9,1            ; ch = ch + 1;
                BR      LOOP            ; }

If you substitute this text into the Hawk version of the hello world program, and then assemble and run it, it will go into an infinite loop that begins by printing this text on the screen:

**The output**

        a
         b
          c
           d
            e
             f
              g
               h

Computers are fast enough that it is not terribly interesting to watch this program produce this output at full speed. If you use the n emulator command to step through the code one instruction at a time for a few iterations, or the i monitor command to advance one full iteration of the loop at a time, you will see how the program works. Be careful not to use the i command until you are inside the loop.

Look back now at the original code for the example and compare it with the assembly language code. Note that we have used comments in the assembly language to relate this code to the original C-like code. Some statements in the original reduce to single machine instructions, while others reduce to sequences of instructions. We used the label LOOP for the infinite while loop of the original program. Curiously, the syntactically complex loop header in the original reduced to just a label, while the single closing brace marking the end of the loop reduced to a machine instruction, the branch back to the start of the loop.

Using names like LOOP for labels in assembly language is very common. A branch or jump instruction gives no hint about whether it is being used to skip the else clause in an if statement or to return to the top of a loop, but if we adopt a systematic way of assigning labels, for example, using the label LOOP to mark the top of a loop, we can make it clear that BR LOOP marks the end of a loop!

Exercises

l) Modify the example program so that it outputs the diagonal line of letters on a sharper diagonal, that is, so that the letter on line i is output in column 2i. You can add a number to itself in order to double it.
m) Modify the example program so that it outputs the diagonal line of letters backwards, starting with z and working back toward a.
n) The example program does not use registers 10 and up. Rewrite the example so that these registers hold the addresses of DSPAT and DSPCH, loaded once before the loop begins instead of over and over within the loop. How much does this shorten the machine code of the program, and how many instructions does it eliminate from each iteration?

Condition Codes and Conditional Branches

The branch and jump instructions are sufficient to allow infinite loops, but to write real programs, we need a way to build control structures that don't loop forever! This requires branch instructions that are conditional, branching or not branching depending on the results of previous computations.

In 1965, IBM introduced the System 360, and one of the innovations in this machine was the inclusion of 2 bits in the processor status word called the condition codes. Whenever the machine loaded data into a register or performed arithmetic, these bits were set to report a small amount of information about the result, and the conditional branch instructions in this machine tested the condition codes.

In 1970, Digital Equipment Corporation introduced the PDP-11; this computer introduced the model of condition codes used in the Hawk and many other computers, including the Intel 80x86/Pentium family. The PDP-11 and its successors have the following condition codes:

N - negative
Set to one when the result of the operation was negative; otherwise, it is set to zero.

Z - zero
Set to one when the result of the operation was zero; otherwise, it is set to zero.

V - overflow
Set to one if an arithmetic operation produces an incorrect two's complement result, for example, if adding two positive numbers produces a negative result; otherwise, it is set to zero.

C - carry
Set to one if an arithmetic operation produces a carry out of the most significant bit; otherwise, it is set to zero.
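The four definitions above can be stated precisely as a small C model of a 32-bit adder. This sketch models the definitions, not the Hawk's actual circuitry, and the names are hypothetical. Subtraction is modeled by adding the one's complement of the subtrahend with a carry of one into the rightmost bit.

```c
#include <stdint.h>

/* Hypothetical result of a 32-bit add, with the four condition codes. */
typedef struct {
    uint32_t result;
    int n, z, v, c;
} alu_out;

alu_out add_cc(uint32_t a, uint32_t b, uint32_t carry_in) {
    uint64_t wide = (uint64_t)a + b + carry_in;   /* up to 33 bits */
    alu_out out;
    out.result = (uint32_t)wide;
    out.n = (int)(out.result >> 31);              /* sign bit of the result */
    out.z = (out.result == 0);
    /* V: the operands have the same sign, but the result's sign differs */
    out.v = (int)((~(a ^ b) & (a ^ out.result)) >> 31);
    out.c = (int)(wide >> 32);                    /* carry out of bit 31 */
    return out;
}

/* a - b, computed by adding the one's complement of b with a carry of
   one into the rightmost bit. */
alu_out sub_cc(uint32_t a, uint32_t b) {
    return add_cc(a, ~b, 1);
}
```

Adding 1 to 7FFFFFFF₁₆ sets N and V, because two positive numbers produced a negative result, and it leaves C clear. Adding 1 to FFFFFFFF₁₆ sets Z and C but not V.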

The Hawk condition codes are stored in the least significant 4 bits of the processor status word. In the Hawk emulator display, they are shown below the processor status word so you can easily see them.

The Hawk ADD and SUB instructions set the condition codes, as do the ADDSI and ADDI instructions. The other instructions we have discussed do not change the condition codes, but there are variants of these that do. Thus, while LOADS, LOAD and MOVE do not change the condition codes, the Hawk architecture offers alternatives, LOADSCC, LOADCC and MOVECC, that do. These variants have exactly the same formats as the originals, except for a one-bit change in the binary representation.

To see if a number is zero or negative, a Hawk programmer can use a load or move instruction such as MOVECC to set the condition codes. To compare two numbers, a Hawk programmer would subtract them so that the condition codes report on the result. When the destination register field of a load, move or arithmetic instruction is zero, the Hawk still computes the desired result and sets the condition codes, but then it discards the result instead of saving it to a register.

Most of the Hawk instructions that discard their results after setting the condition codes have special names. The most important of these is the compare instruction, which is really a subtract instruction that stores its result in R0:

**The Hawk CMP instruction**

| 07 06 05 04 | 03 02 01 00 | 15 14 13 12 | 11 10 09 08 |
|:---:|:---:|:---:|:---:|
| 1 1 0 1 | 0 0 0 0 | s1 | s2 |

The instruction CMP R1,R2 subtracts R2 from R1, and we could write it as SUB R0,R1,R2 to mean exactly the same thing. After comparison or subtraction, the Z condition code will be set if the two operands were equal, and the N condition code will be set if the result was negative, a result that usually indicates that R1 was less than R2.

Similarly, the Hawk architecture provides a compare immediate instruction, CMPI that compares a register with a 16-bit immediate constant. This instruction is really just the add immediate instruction, ADDI with the destination field set to zero so that the result gets discarded. It is the assembler that negates the constant from the user's program, so CMPI R1,C is assembled as if it had been written ADDI R0,R1,-C. Programmers can ignore this because adding a negated number sets the condition codes identically to the way they would have been set by subtracting the original number.

The Hawk architecture offers the same 14 conditional branches that were originally offered by the DEC PDP-11. Eight of these are trivial to understand: BNS, BZS, BVS, and BCS each test one of the condition code bits and branch if that bit is set. These names are mnemonic; for example, BCS stands for branch if C set. For each of these, there is an inverse test, BNR, BZR, BVR, and BCR, that branches only if the corresponding condition code bit is reset.

After a comparison using the CMP instruction, use BEQ to branch if the operands were equal, and BNE to branch if they were unequal. These are really just BZS and BZR, renamed for better documentation.

After CMP R1,R2, where R1 and R2 hold two's complement integers, use BGT, BGE, BLE or BLT to branch if R1>R2, R1≥R2, R1≤R2, or R1<R2, respectively. These do not just test the N and Z condition codes, as you might imagine; they also check the V (overflow) condition code. If V is set, the sign of the result after subtraction was wrong.
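The signed tests can be expressed in terms of N, Z and V, as in the sketch below. It uses the standard PDP-11-style definitions, in which "less than" means that N differs from V. The function names are hypothetical, and CMP R1,R2 is modeled as a 32-bit subtraction with wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, v; } nzv;

/* N, Z and V after computing r1 - r2 with 32-bit wraparound. */
nzv cmp_nzv(int32_t r1, int32_t r2) {
    uint32_t a = (uint32_t)r1, b = (uint32_t)r2;
    uint32_t r = a - b;                      /* wraps like the hardware */
    nzv f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    /* overflow: operands of opposite sign, and the result's sign
       differs from the sign of r1 */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

bool blt(nzv f) { return f.n != f.v; }             /* R1 <  R2 */
bool bge(nzv f) { return f.n == f.v; }             /* R1 >= R2 */
bool bgt(nzv f) { return !f.z && f.n == f.v; }     /* R1 >  R2 */
bool ble(nzv f) { return f.z || f.n != f.v; }      /* R1 <= R2 */
```

Comparing the most negative 32-bit integer with 1 shows why V matters. The subtraction overflows, leaving N clear and V set, yet blt correctly reports that R1<R2.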

After CMP R1,R2, where R1 and R2 hold unsigned integers, use BGTU, BGEU, BLEU or BLTU to branch if R1>R2, R1≥R2, R1≤R2, or R1<R2, respectively. Two of these, BGTU and BLEU, are new machine instructions, while the other two, BGEU and BLTU, are synonyms for BCS and BCR; why this works is best left for later, when we discuss the arithmetic unit within the central processor.
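Without going into the arithmetic unit, we can at least check the unsigned tests with a hypothetical C sketch. It assumes that C is the carry out of R1 + ~R2 + 1, which is consistent with BGEU and BLTU being synonyms for BCS and BCR. The definitions of BGTU and BLEU in terms of C and Z are derived from that assumption.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool z, c; } zc_flags;

/* Z and C after r1 - r2, computed as r1 + ~r2 + 1; C is the carry
   out of bit 31 of that addition. */
zc_flags cmp_zc(uint32_t r1, uint32_t r2) {
    uint64_t wide = (uint64_t)r1 + (uint32_t)~r2 + 1u;
    zc_flags f;
    f.z = ((uint32_t)wide == 0);
    f.c = (wide >> 32) != 0;
    return f;
}

bool bgeu(zc_flags f) { return f.c; }            /* same test as BCS */
bool bltu(zc_flags f) { return !f.c; }           /* same test as BCR */
bool bgtu(zc_flags f) { return f.c && !f.z; }    /* R1 >  R2, unsigned */
bool bleu(zc_flags f) { return !f.c || f.z; }    /* R1 <= R2, unsigned */
```

Comparing FFFFFFFF₁₆ with 1 makes bgtu true, because the first is larger as an unsigned number. A signed comparison would treat FFFFFFFF₁₆ as -1, which is less than 1.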

Exercises

o) Give the Hawk machine code corresponding to CMPI R1,100; try doing this by hand, working from the information given above, before testing your solution using the assembler.
p) CMPI is the same as ADDI with the destination field set to zero. Why can't we convert ADDSI to some kind of CMPSI instruction by setting its destination field to zero?
q) Suppose you use the Hawk to compute the two's complement of one by subtracting it from zero with the SUB instruction. Recall that, in two's complement arithmetic, subtraction is done by adding the one's complement of the subtrahend, with a carry of one added to the rightmost bit. Give the values you would expect this to put in the Hawk condition codes.
r) Suppose you use the Hawk to compute the two's complement of zero by subtracting zero from zero with the SUB instruction. Recall that, in two's complement arithmetic, subtraction is done by adding the one's complement of the subtrahend, with a carry of one added to the rightmost bit. Give the values you would expect this to put in the Hawk condition codes.

Examples illustrating definite loops

We can now modify our example program to halt after a finite number of iterations. In a high-level language, we can do this with a construct such as for(x=1;x<9;x++), but this for-loop construct is quite complex, combining initialization, exit condition and increment in one place. This is convenient, but an assembly-language programmer must deal with each of these issues separately. Therefore, we must rewrite our program using simpler control structures such as do-while loops. This is shown here:

**The example, modified to contain a definite loop**

        START:  LIL     R2,UNUSED       ; set up the stack
                LIS     R8,1            ; x = 1; (using R8 for x)
                LIS     R9,'a'          ; ch = 'a'; (using R9 for ch)
                LIL     R1,DSPINI
                JSRS    R1,R1           ; dspini();
        LOOP:                           ; do {
                MOVE    R3,R8
                MOVE    R4,R8
                LOAD    R1,PDSPAT
                JSRS    R1,R1           ; dspat(x,x);
                MOVE    R3,R9
                LOAD    R1,PDSPCH
                JSRS    R1,R1           ; dspch(ch);
                ADDSI   R8,1            ; x = x + 1;
                ADDSI   R9,1            ; ch = ch + 1;
                CMPI    R8,9
                BLT     LOOP            ; } while (x < 9)
                LOAD    R1,PEXIT
                JSRS    R1,R1           ; exit();

The example above illustrates what is known as a post-test loop. One mildly inconvenient feature of this control structure is that the loop control variable x and the character in ch (in registers 8 and 9) are both incremented before the loop termination test is done. In some circumstances, a programmer may want to avoid changing the values of any variables from their values during the final iteration of the loop body. If this is desired, we must replace the post-test with a test for loop termination before the variables are incremented.
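The for loop and its post-test rewrite can be compared in C. In this hypothetical sketch, the calls to dspat() and dspch() are replaced by recording the character that would have been displayed (last_ch) and counting iterations. The do-while form matches the for loop only because the body must run at least once, which holds here since 1<9.

```c
/* Hypothetical record of a loop's effect: final x and ch, the last
   character that would have been displayed, and the iteration count. */
typedef struct {
    int x;
    char ch;
    char last_ch;
    int iterations;
} loop_state;

/* for (x = 1; x < 9; x++) { dspat(x,x); dspch(ch); ch = ch + 1; } */
loop_state for_version(void) {
    loop_state s = { 0, 'a', 0, 0 };
    for (s.x = 1; s.x < 9; s.x++) {
        s.last_ch = s.ch;            /* stands in for dspch(ch) */
        s.iterations++;
        s.ch = s.ch + 1;
    }
    return s;
}

/* x = 1; ch = 'a';
   do { dspat(x,x); dspch(ch); x = x + 1; ch = ch + 1; } while (x < 9); */
loop_state do_while_version(void) {
    loop_state s = { 1, 'a', 0, 0 };
    do {
        s.last_ch = s.ch;            /* stands in for dspch(ch) */
        s.iterations++;
        s.x = s.x + 1;
        s.ch = s.ch + 1;
    } while (s.x < 9);
    return s;
}
```

Both functions report 8 iterations, with 'h' as the last character displayed. Both also leave x equal to 9 and ch equal to 'i', one step past the values used in the final iteration.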

In languages descended from C, mid-loop exits are done with a conditional break statement in the loop body, testing the loop exit condition and breaking out of the loop at the appropriate place. If we test the loop control variable before it is incremented, we must rewrite our termination test. In the example above, we loop while(x<9), but with the termination test moved to before the increment, we must change this to if(x>=8)break.

We will also need to add a new label to mark the end of the loop. The name ENDLOOP is natural for a program containing only one loop. This is the destination address for the conditional branch that we use to implement the if-break construct. Assembly language programmers are strongly urged to adopt systematic naming conventions for labels so that the label names hint at their function in the control structures of a program. Long label names can detract from readability, so if there is more than one loop in a program, we might use the suffix LP to mean loop, allowing us to construct labels such as FIXLP and ENDFIXLP to mark the start and end of a loop that fixes something.

The above ideas are incorporated into a second version of our example program, presented below:

**The example, with the loop exit in mid loop**

        START:  LIL     R2,UNUSED       ; set up the stack
                LIS     R8,1            ; x = 1; (using R8 for x)
                LIS     R9,'a'          ; ch = 'a'; (using R9 for ch)
                LIL     R1,DSPINI
                JSRS    R1,R1           ; dspini();
        LOOP:                           ; while (TRUE) {
                MOVE    R3,R8
                MOVE    R4,R8
                LOAD    R1,PDSPAT
                JSRS    R1,R1           ; dspat(x,x);
                MOVE    R3,R9
                LOAD    R1,PDSPCH
                JSRS    R1,R1           ; dspch(ch);
                CMPI    R8,8
                BGE     ENDLOOP         ; if (x >= 8) break;
                ADDSI   R8,1            ; x = x + 1;
                ADDSI   R9,1            ; ch = ch + 1;
                BR      LOOP            ; }
        ENDLOOP:LOAD    R1,PEXIT
                JSRS    R1,R1           ; exit();
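The mid-loop exit can be sketched in C as follows. In this hypothetical rendering, the calls to dspat() and dspch() are replaced by recording the character that would have been displayed (last_ch) and counting iterations.

```c
/* Hypothetical record of the loop's effect. */
typedef struct {
    int x;
    char ch;
    char last_ch;
    int iterations;
} loop_state;

/* x = 1; ch = 'a';
   while (TRUE) { dspat(x,x); dspch(ch);
                  if (x >= 8) break; x = x + 1; ch = ch + 1; } */
loop_state mid_exit_version(void) {
    loop_state s = { 1, 'a', 0, 0 };
    while (1) {
        s.last_ch = s.ch;            /* stands in for dspch(ch) */
        s.iterations++;
        if (s.x >= 8)                /* CMPI R8,8 and BGE ENDLOOP */
            break;
        s.x = s.x + 1;               /* ADDSI R8,1 */
        s.ch = s.ch + 1;             /* ADDSI R9,1 */
    }
    return s;                        /* ENDLOOP: */
}
```

This version also runs 8 iterations, but it leaves x and ch at 8 and 'h', their values during the final iteration.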

A more fully developed convention for labels used to form control structures might add, in addition to the suffix LP for loops, the suffix IF for labels involved with if statements. Another convention to consider is using shorthand abbreviations of the assertions that are true at some point in the program to name the labels at that point, so the label XBIGGER becomes a natural label for the point in the program where the variable x is bigger than something else.

Exercises

s) Translate the following example program to Hawk assembly language:
int x = 0; char ch = 'a'; dspini(); while (x < 9) { dspat( x, x ); dspch( ch ); x = x + 1; ch = ch + 1; }
t) Translate the following example program to Hawk assembly language:
int x; dspini(); for (x = 1; x < 9; x++) { dspat( x, x ); dspch( '*' ); }
u) Write a Hawk program that produces the following output using a loop that iterates exactly 6 times, where each iteration outputs one letter 4 times, with appropriate calls to dspat(), as needed, to put the letters in the correct rows and columns.
         ABCDEF
        A      A
        B      B
        C      C
        D      D
        E      E
        F      F
         ABCDEF
        
v) Add a nested loop to the example program so that it produces the output:
        a
         bb
          ccc
           dddd
            eeeee
             ffffff
              ggggggg
               hhhhhhhh