5. Assembly Language Programming

Part of CS:2630, Computer Organization Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Assembly into Memory

To run a program, you must translate it from its source language to machine language and load it in memory. While there are assemblers and compilers that directly assemble into memory, most do not. Instead, they use a tool chain with many links. Our SMAL assembler is typical, and the classic C tool chain is even more complex.

When the SMAL assembler processes a source file, say hello.a for the classic hello-world program, the assembled output is placed in an object file called hello.o, with the .o suffix used to indicate that it is an object file. Most assemblers also produce a listing file that shows the source and a textual summary of the object code together. SMAL uses a .l suffix for listing files, so if you assemble hello.a, the listing will be in hello.l.

Typical SMAL assembler inputs and outputs

inputs:   hello.a, hawk.h, stdio.h, ascii.h
program:  smal
outputs:  hello.o, hello.l

 
If the source file was a stand-alone program, one that does not call functions defined elsewhere, and if the source file used only absolute assembly and never assembled data into a relocatable address (a topic mentioned briefly in chapter 3), the object file can be loaded directly into memory.

In the usual case, the object file will need to be linked with other object files in order to create a loadable and executable file. For example, most of our example programs use the services of an operating system called the Hawk monitor. The interface description for the Hawk monitor's input-output routines is in the file stdio.h and the monitor's object code is in monitor.o. The .h and .o suffixes are identical to those commonly used with C and C++ and have the same meanings.

The Hawk monitor is minimal, just sufficient to run simple demonstration programs. Before we run our hello-world program, we must link the object file hello.o with monitor.o to make an executable program. By default, the Hawk linker stores the executable program in link.o, and by default, the executable code is linked so that it will be loaded in read-only memory within the Hawk emulator.

Typical SMAL Hawk linker inputs and outputs

inputs:   hello.o, monitor.o
program:  hawklink
outputs:  link.o

 
Finally, a program called the loader is used to load the program into memory so that it can be executed. The loader is usually a built-in part of the operating system on a modern computer, and each time you run a machine-language program, the loader is called on to load that program before it is run.

Another way to run a loadable object file is to run it under a debugger. Debuggers typically offer a way to observe the internal state of the processor as it executes a program, and they typically also allow examination of the program as it sits in memory. Our Hawk debugger, for example, includes a disassembler that tries to reconstruct the assembly language source from the machine code it finds in memory. This disassembled code is not always the same as the original assembly source because there are many possible source programs that will produce the same machine code. On the other hand, the relationship between the disassembled code displayed by the debugger and the assembler's listing file is usually clear.
 

Typical Hawk debugger inputs and outputs

inputs:   link.o, keyboard
program:  hawk
outputs:  display

 
Assuming we start with a single source file called hello.a in the current directory of a Unix-like system, and assuming that the SMAL assembler and linker are smart enough to look elsewhere for the other input files required, as indeed they are, then the dialog between the programmer and the system to assemble, link and run the hello-world program would be as follows:
 

Assembling, linking and running the example
$ ls
hello.a
$ smal hello.a
  no errors
$ ls
hello.a hello.l hello.o
$ hawklink hello.o
  no errors
$ ls
hello.a hello.l hello.o link.o
$ hawk link.o

 
The user typed in the ls command before and after each of the constructive commands in the above example; this requests a listing of all the files in the current directory, and as a result, it is easy to see what files were created at each step between the source program and the executable result.

In the above example, each command typed by the user follows the prompt output by the system to solicit it, shown here as a dollar sign; the remaining lines are output from the system. It is likely that you will see a different prompt on your computer system, since the string used for the prompt can be easily customized.

The C Tool Chain

Users of C should be familiar with the cc command to compile and link a C program. In fact, cc hides the entire C toolchain. It runs the C preprocessor, then runs the C compiler, then runs the assembler and finally, it runs the linker. Command line options on the cc command can be used to stop it after any one of these steps. Unfortunately, the names of these options are far from obvious.

For example, cc -E file.c will run only the preprocessor, sending the preprocessed output to standard output. To stop after the compiler, producing assembly language output in file.s, use cc -S file.c. To compile and assemble, producing object code in file.o, use cc -c file.c. Note, of course, that the .s and .o files are in the assembly and object languages of your computer, not the SMAL language discussed here, so the cc -S command will not produce Hawk code; on most systems, it will produce Intel 80x86 assembly code, but on a Raspberry Pi, it will produce ARM code.

The Skeleton of a SMAL Hawk Application

The original C hello-world program
     
#include <stdio.h>
int main() {
        printf("hello, world\n");
        return 0;
}
     

In our assembly language, we must do many things that are done automatically in typical high-level languages such as C or C++. Where a C programmer writes main(){} as the skeleton of an empty main program, a Hawk programmer will have to write more:

The skeleton of an empty Hawk program
        USE     "hawk.h"
        USE     "stdio.h"
ARSIZE  =       4
        INT     MAIN
        S       MAIN
MAIN:                           ; executable code begins here
        STORES  R1,R2
        ADDI    R2,R2,ARSIZE

; ----  application code goes here ----

        ADDI    R2,R2,-ARSIZE
        LOADS   PC,R2           ; return to the monitor to stop!
        END

You do not need to understand this code yet. You just need to use it, verbatim, as a wrapper around your code. Even so, some of it should already make sense since there is only one new machine instruction here.

The most important thing to understand about this skeleton is that it does nothing useful. It is equivalent to the C main program main(){}, that is, it expects no parameters, it does nothing and it returns nothing. We cannot go beyond this minimum until we discuss the Hawk monitor, a very small operating system for the Hawk. Even so, we have written quite a bit of code here that is worth at least a bit of explaining.

The USE directives insert material from other files, just like the #include directives in C programs. The SMAL Hawk stdio.h use file defines the interface to the Hawk Monitor's input-output library, just as the C stdio.h include file defines the interface to the C input-output library. As noted in Chapter 4, the SMAL assembler doesn't know anything about the Hawk architecture unless it reads the hawk.h use file.

The definition of the symbol ARSIZE allows for the number of bytes used by the local variables of the main program. The use of this symbol will become clearer when we discuss local variables in the next chapter.

The label MAIN marks the start of the executable code. When you run a program, it will be loaded in memory along with the code of the Hawk monitor, a rudimentary operating system. The monitor runs first, and once it has initialized the computer, it calls your main program. By default, all symbols defined in an assembly file are local, so this code includes a special assembler directive, INT MAIN, to tell the assembler that the internal or local symbol MAIN should be globally visible. This allows the monitor to see it and call it. INT MAIN does not mean integer, it means internal!

When you start the Hawk computer, it will run whatever code is in memory, starting at location zero. Usually this is the Hawk monitor. The directive S MAIN tells the Hawk debugger to stop when it gets to MAIN. The stop directive does not assemble any value into memory; rather, it communicates through the linker and the loader to tell the Hawk debugger where to set the breakpoint before running. The S directive is optional. There is no need for it in a fully debugged program, but if you set the breakpoint, the computer will stop just before the labeled instruction so you can inspect things and optionally run the code one instruction at a time. In a partially debugged program, you might want to set the breakpoint right before the place where it seems to fail.

The skeleton code given here does not specify an assembly origin. This leaves the question of where the instructions will be loaded to be answered by the linker. By default, the linker will put all code in read-only memory so that programs may not modify themselves.

Exercises

a) Type in the skeleton of an empty Hawk program given above, then assemble it, link it, and load it under the Hawk emulator so you can inspect the machine code loaded into memory.

b) Hand translate the skeleton of an empty Hawk program given above into a sequence of halfwords starting with the first executable instruction and continuing until the LOADS instruction.

The Hawk Monitor

The Hawk monitor is a very minimal operating system for the Hawk computer. Your main program may allocate local variables, and many of the routines in the monitor allocate their own local variables. To allow this, before the monitor calls your main program, it sets up register 2 as a pointer to the lowest unused memory address in RAM. This block of unused memory is called the stack. Any function, procedure or method is free to allocate space on this stack, and in fact, as we will eventually see, the Hello World program uses one word of stack space.

The terms method, function and procedure all carry baggage. Methods in object-oriented languages apply to objects that are instances of classes, and we have not discussed classes yet. In mathematics, a function must compute a value from its arguments, and in some languages, procedures may not have return values. We use the term subroutine here as a generic term to cover all of these. A subroutine is simply a subsidiary part of a program, with no implication of object orientation or a return value. A subroutine can be called and it should eventually return.

Main programs under the Hawk monitor are pure procedures, with no return value or implication of object orientation. In some languages, they are more complex. In C and C++, for example, the main program is a function that returns an integer, although most programmers ignore this return value.

The Hawk monitor consists of a number of subroutines. Before calling a routine in the monitor, parameters to that routine must be loaded into the appropriate registers and the address of the desired routine must also be in some register. The actual call uses a jump-to-subroutine instruction to transfer control from the caller to the called routine. We will look at that instruction in much more detail in the next chapter.

How do we pass parameters and return results? The author of each subroutine determines the answer. By convention, on the Hawk and most modern computers, parameters and results are passed in registers. All Hawk monitor routines use registers 3 and up. If a routine has just one parameter, this will be passed in R3, while if it has two parameters, they will be passed in R3 and R4. Similarly, function return values, if any, are in R3.

All Hawk monitor routines may make unrestricted use of R3 to R7, while the caller may safely assume that R8 to R15 will not be altered by monitor calls. By convention, calls to monitor routines use R1 as a temporary pointer to the routine. The reasons for this will be clear when we discuss how subroutine calls and the jump-to-subroutine instruction actually work.
 

Input-Output and stdio.h

Many of the input/output routines in the Hawk monitor closely match routines in the C and C++ standard libraries. Full documentation for the Hawk monitor routines is found in the comments in the header file; the abridged documentation given here is based on that file. We begin with a list of some of the output routines in stdio.h:
 

PUTCHAR — display one character on the screen
Parameters: R3 — the character to display
Return value: None.
The character will be output at the current row and column on the display, and then the column will be incremented.

C equivalent: putchar( 'c' )
outputs a character to standard output, which defaults to the terminal window.

PUTS — display one null-terminated string with a newline.
Parameters: R3 — address of the first byte of the string.
Return value: None.
The characters of the string will be displayed in sequence, with an LF appended.

C equivalent: puts( "any string" )
outputs all of the characters of a string, in sequence, to standard output, with '\n' appended.

PUTSTR — display one null-terminated string
Parameters: R3 — address of the first byte of the string.
Return value: None.
The characters of the string will be displayed in sequence.

C equivalent: fputs( "any string", stdout )
outputs all of the characters of a string, in sequence, to standard output.

PUTHEX — display an unsigned integer in hexadecimal
Parameters: R3, R4 -- integer to output and field width.
Return value: None.
The integer will be displayed in hexadecimal, padded with leading zeros to the given field width.

C equivalent: printf( "%05X", n )
The number n will be written in hexadecimal to standard output. The field width 05 is a constant in the format string. The capital X in the format string specifies upper-case hexadecimal output.

PUTDEC — display a two's complement integer in decimal
PUTDECU — display an unsigned integer in decimal
Parameters: R3, R4 — integer to output and field width.
Return value: None.
The number will be displayed in decimal. If there are fewer digits than the field width, leading blanks will be added.

C equivalent: printf( "%5d", n ) and printf( "%5u", n )
The number n will be written in decimal to standard output. The field width 5 is a constant in the format string. The %d or %u in the format string indicates whether n is a signed or unsigned integer.

PUTAT — set output display coordinates
Parameters: Column/X = R3, Row/Y = R4
Return value: None.
The next character output will appear at the indicated row and column on the display, where the upper left is row 0, column 0.

C equivalent: none

Many output conversion routines in C are only accessible through directives in printf format strings. There must be hidden routines inside printf for these, but they are not offered to the public. The printf routine itself is very complex and has no provisions for output conversions for user-defined types.
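
PUTHEX and PUTDEC take the field width as a run-time parameter in R4, while the printf calls shown above fix the width in the format string. As a minimal sketch, the standard `*` width specifier lets C take the width from the argument list instead. The helper names format_hex and format_dec are hypothetical; they are not part of the C library or the Hawk monitor.

```c
#include <stdio.h>

/* Hypothetical helpers: format n into buf in the style described for
   PUTHEX and PUTDEC, with the field width passed at run time.
   The '*' in each format takes the width from the argument list. */
int format_hex(char *buf, size_t size, unsigned int n, int width) {
    /* leading zeros pad to the field width, upper-case hex digits */
    return snprintf(buf, size, "%0*X", width, n);
}

int format_dec(char *buf, size_t size, int n, int width) {
    /* leading blanks pad to the field width */
    return snprintf(buf, size, "%*d", width, n);
}
```

For example, format_hex(buf, sizeof buf, 0x2A, 5) leaves the string "0002A" in buf, matching what printf( "%05X", n ) would print for the same value.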

The Hawk monitor contains only rudimentary tools for keyboard input, but they correspond closely to routines in the C stdio.h header file.

GETCHAR — get one character from the keyboard
Parameters: None.
Return value: R3 — the character typed.
The program will wait until some character is typed.

C equivalent: ch = getchar()

GETS — get one line of text from the keyboard
Parameters: R3 — memory address of the string buffer.
Return value: none
The line of text typed on the keyboard will be echoed to the screen and stored in memory as a null-terminated string, starting at the indicated memory address.

C equivalent: gets( buffer )
There are strong warnings in the C library documentation to never use this routine. We will discuss the risks later.

Utility routines in stdlib.h

The Hawk monitor contains several routines that are noteworthy, with interface definitions in stdlib.h:

EXIT — terminates the application
Parameters: None.
Return value: Does not return.
Halts the computer, leaving all registers unchanged so that their values can be inspected. A call to EXIT under the Hawk monitor is equivalent to a return from the main program.

C equivalent: exit(status)
The exit status should be either EXIT_SUCCESS (0) or EXIT_FAILURE (1). Scripts may be conditional on how an application exits.

TIMES — multiply two signed integers
TIMESU — multiply two unsigned integers
Parameters: multiplicand = R3, multiplier = R4 — integers to multiply.
Return value: R3 — the product
Needed because the Hawk has no multiply instruction.

C equivalent: product = multiplier * multiplicand
In C, a signed multiply will be used unless both operands are declared to be unsigned.

DIVIDEU — divide two unsigned integers
Parameters: dividend = R3, divisor = R4 — integers to divide.
Return value: quotient = R3, remainder = R4 — the result.
This is needed because the Hawk has no divide instruction.

C equivalent: quotient = dividend / divisor
    remainder = dividend % divisor
The division algorithm gives both the remainder and quotient, but C and most other languages use two separate operators.
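
To see how multiplication and division can be done without multiply or divide instructions, here is a sketch of the textbook shift-and-add and shift-and-subtract algorithms in C. This is only an illustration of the general technique; it is not the code used inside the Hawk monitor, and the names timesu_sketch, divideu_sketch and struct divresult are made up for this example.

```c
#include <stdint.h>

/* Shift-and-add multiplication of unsigned 32-bit integers,
   using only shifts, adds and tests. */
uint32_t timesu_sketch(uint32_t multiplicand, uint32_t multiplier) {
    uint32_t product = 0;
    while (multiplier != 0) {
        if (multiplier & 1) product += multiplicand; /* add if low bit set */
        multiplicand <<= 1;   /* the next partial product is doubled */
        multiplier >>= 1;     /* move on to the next multiplier bit */
    }
    return product;           /* the low 32 bits of the product */
}

/* Both results of a division, since the algorithm produces both. */
struct divresult { uint32_t quotient, remainder; };

/* Shift-and-subtract division of unsigned 32-bit integers.
   The divisor must not be zero. */
struct divresult divideu_sketch(uint32_t dividend, uint32_t divisor) {
    struct divresult r = { 0, 0 };
    for (int i = 31; i >= 0; i--) {
        /* bring down the next dividend bit, most significant first */
        r.remainder = (r.remainder << 1) | ((dividend >> i) & 1);
        r.quotient <<= 1;
        if (r.remainder >= divisor) {   /* the divisor fits: subtract it */
            r.remainder -= divisor;
            r.quotient |= 1;
        }
    }
    return r;
}
```

Returning a two-field structure mirrors the way DIVIDEU hands back the quotient and remainder together, where C's / and % operators would each give only one of them.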

On entry, all Hawk monitor routines expect the return address in R1, and they expect that R2 will hold the stack pointer, pointing to the first free word beyond any part of the stack already in use. The main program must follow the same rules.

The Hawk monitor routines follow the pattern we will use for all Hawk subroutines throughout this text. All subroutines will be called with the return address in R1, all use R2 as the stack pointer if they need a stack, all use registers from R3 up for parameters and results, all may destroy the contents of registers R3 through R7, and all are required to leave R8 through R15 unchanged when the routine returns. These conventions are arbitrary and will be reviewed again in the next chapter.

Calling a Monitor Routine

Suppose we want to output the character X to the display screen. In C, we would write putchar('X'); the analogous assembly language code to call PUTCHAR in the Hawk monitor is:
 

Code equivalent to putchar('X')
     
        LIS     R3,'X'          ; put parameter in place
        LIL     R1,PUTCHAR      ; get address of PUTCHAR
        JSRS    R1,R1           ; call the subroutine
     

The comments in the above assembly language code describe what each instruction does. The first two instructions were discussed in the previous chapter. Both simply load constant values into registers. The character constant 'X' is only an 8-bit value, so the LIS instruction is used. The address of the subroutine PUTCHAR is unknown, but we know we can load it with the LIL instruction since this can load any 24-bit value, and the highest RAM address in our minimal Hawk computer is 1FFFF₁₆, only 17 bits.

The actual call to the monitor routine is done by the next (and final) instruction in our example, JSRS. All of the Hawk jump-to-subroutine instructions save the current value of the program counter in a register in order to allow for a later return. After they save the program counter, they change the program counter in order to actually transfer control to the desired destination. The Hawk has two jump-to-subroutine instructions, one long, JSR, and one short, JSRS. These are memory reference instructions, with the same formats as other memory reference instructions. The short form takes its destination address from a register, while the long form includes a 16-bit constant that is used to compute the destination. The JSRS instruction we used has the following form:
 

The Hawk JSRS instruction
 07 06 05 04 | 03 02 01 00 | 15 14 13 12 | 11 10 09 08
  1  1  1  1 |     dst     |  1  0  1  1 |      x

Formally, this does r[dst]=program_counter; program_counter=r[x]

These two assignments are done by the hardware in parallel, so JSRS R1,R1 exchanges the value of the program counter with the value stored in R1. Since we have just loaded the address of the first instruction of the Hawk monitor routine PUTCHAR into R1, the JSRS instruction will transfer control to PUTCHAR while it leaves the address of whatever instruction follows the JSRS instruction in R1.
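
The parallel assignment can be modeled in C, where statements run one after another, by reading r[x] before writing r[dst]. The following is a sketch only; struct hawk_cpu and the function names are invented for this illustration and are not part of any Hawk tool.

```c
#include <stdint.h>

/* A minimal model of the registers involved in a JSRS. */
struct hawk_cpu {
    uint32_t pc;      /* program counter */
    uint32_t r[16];   /* general registers, indexed by register number */
};

/* Assemble JSRS dst,x into its two bytes, following the format
   given above: 1111 dst in the first byte, 1011 x in the second. */
void jsrs_encode(unsigned dst, unsigned x, uint8_t bytes[2]) {
    bytes[0] = (uint8_t)(0xF0 | (dst & 0xF));
    bytes[1] = (uint8_t)(0xB0 | (x & 0xF));
}

/* Execute JSRS dst,x.  On entry, cpu->pc must already point to the
   instruction after the JSRS, as it does once the fetch is complete. */
void jsrs_execute(struct hawk_cpu *cpu, unsigned dst, unsigned x) {
    uint32_t destination = cpu->r[x];  /* read r[x] before writing r[dst] */
    cpu->r[dst] = cpu->pc;             /* save the return address */
    cpu->pc = destination;             /* transfer control */
}
```

With dst and x both equal to 1, jsrs_encode produces the bytes F1 and B1 in hexadecimal, and jsrs_execute swaps the program counter with R1. Without the temporary variable, the second assignment would copy the return address straight back into the program counter and no call would take place.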

Because the JSRS instruction copied the previous value of the program counter to a register, the called routine can use this value to return when it is done. We call that register the linkage register, and we also say that the linkage register holds the return address for the call. Because of this use, we also say that the Hawk architecture uses register linkage for subprogram calls. Some other architectures have call instructions that automatically push the calling address on the stack. This was popular in the 1970's when the Intel 8086 was designed, and therefore, computers in the Intel 80x86 family do this.

Our skeleton main program is itself a subroutine, so as we will discuss in the next chapter, it begins by saving the return address and ends by restoring the return address and using it to return. While the body of the main program is running, the return address is saved on the stack.
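
The entry and exit sequences of the skeleton can be modeled in C as follows. This is a sketch under stated assumptions: struct machine, STACK_BASE, STACK_WORDS and the function names are invented for this illustration, and the stack area is modeled as a small array of words rather than real Hawk memory.

```c
#include <stdint.h>

#define ARSIZE      4          /* bytes in the activation record, as in the skeleton */
#define STACK_BASE  0x10000u   /* assumed address of the modeled stack area */
#define STACK_WORDS 64         /* assumed size of the modeled stack area */

struct machine {
    uint32_t pc, r1, r2;         /* program counter, linkage, stack pointer */
    uint32_t ram[STACK_WORDS];   /* the stack area, one element per word */
};

/* Find the modeled word at a given byte address in the stack area. */
static uint32_t *word_at(struct machine *m, uint32_t address) {
    return &m->ram[(address - STACK_BASE) / 4];
}

/* STORES R1,R2 followed by ADDI R2,R2,ARSIZE */
void main_entry(struct machine *m) {
    *word_at(m, m->r2) = m->r1;   /* save the return address on the stack */
    m->r2 += ARSIZE;              /* claim the activation record */
}

/* ADDI R2,R2,-ARSIZE followed by LOADS PC,R2 */
void main_exit(struct machine *m) {
    m->r2 -= ARSIZE;              /* release the activation record */
    m->pc = *word_at(m, m->r2);   /* return to the saved address */
}
```

Between main_entry and main_exit, the body of the main program is free to reuse R1 for its own calls, because the return address is safe in memory; main_exit also leaves R2 exactly as it was on entry.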
 

A Working Hello-World Program

We now have enough information in hand to write a program that outputs the string "Hello world!" to the display and then halts. Here is the source code for this program:

A working hello-world program
        TITLE   "A Hello-World Program"
        USE     "hawk.h"
        USE     "stdio.h"
ARSIZE  =       4
        INT     MAIN
        S       MAIN
MAIN:                        ; entry point
        STORES  R1,R2
        ADDI    R2,R2,ARSIZE

;  --- begin application code ---
        LIL     R3,HELLO
        LIL     R1,PUTS
        JSRS    R1,R1        ; puts(HELLO)
;  --- end application code ---

        ADDI    R2,R2,-ARSIZE
        LOADS   PC,R2        ; return

; --- begin application constants ---
HELLO:  ASCII   "Hello world!",0
        END

The program uses the skeleton presented above, with a TITLE directive added at the start to tell the assembler what to put at the top of each page of the listing. The application code calls the Hawk monitor routine PUTS to display the null-terminated string "Hello world!", using the LIL instruction to load the address of this string parameter. We will discuss other ways to do this later. Here is the listing file produced by the assembler:

SMAL listing of the hello-world program
SMAL32 (rev  9/11)              A Hello-World Program        21:01:29  Page  1
                                                             Mon Jul 22 2019

                                 1          TITLE   "A Hello-World Program"
                                 2          USE     "hawk.h"
                                 3          USE     "stdio.h"
                                 4  ARSIZE  =       4
                                 5          INT     MAIN
                                 6          S       MAIN
                                 7  MAIN:                        ; entry point
+00000000: F1  A2                8          STORES  R1,R2
+00000002: F2  62  0004          9          ADDI    R2,R2,ARSIZE
                                10
                                11  ;  --- begin application code ---
+00000006: E3 +000016           12          LIL     R3,HELLO
+0000000A: E1 +000000           13          LIL     R1,PUTS
+0000000E: F1  B1               14          JSRS    R1,R1        ; puts(HELLO)
                                15  ;  --- end application code ---
                                16
+00000010: F2  62  FFFC         17          ADDI    R2,R2,-ARSIZE
+00000014: F0  D2               18          LOADS   PC,R2        ; return
                                19
                                20  ; --- begin application constants ---
+00000016: 48  65  6C  6C       21  HELLO:  ASCII   "Hello world!",0
+0000001A: 6F  20  77  6F
+0000001E: 72  6C  64  21
+00000022: 00
                                22          END

This listing does not show the contents of the stdio.h file, but if the definitions from that file had not been included, the symbol PUTS would have been undefined. The listing shows the value of PUTS as +000000. The plus sign indicates that the linker will add something to this to make the correct address. The linker will also adjust the address of the string, HELLO, defined on line 21 and used on line 12. Here, it will add the address of the first location of the program to the 16₁₆ the assembler outputs.

After linking this program and loading the linked result with the Hawk emulator, the emulator will display the initial system state. You can inspect this state if you want before you start running the program.

The Hawk emulator at startup
 HAWK EMULATOR
   /------------------CPU------------------\   /----MEMORY----\
   PC:  00000000                R8: 00000000 ->000000: LIL     R2,#01005C
   PSW: 00000000  R1: 00000000  R9: 00000000   000004: JSR     R1,#000220
   NZVC: 0 0 0 0  R2: 00000000  RA: 00000000   000008: LIL     R1,#001000
                  R3: 00000000  RB: 00000000   00000C: JSRS    R1,R1
                  R4: 00000000  RC: 00000000   00000E: BR      #000000
                  R5: 00000000  RD: 00000000   000010: CPUSET  R2,#3
                  R6: 00000000  RE: 00000000   000012: LIL     R2,#01001C
                  R7: 00000000  RF: 00000000   000016: STORES  R5,R2

 **HALTED**  r(run) s(step) q(quit) ?(help)

This display shows the initial PC value, 00000000₁₆, and in the column labeled MEMORY, it shows the disassembled value of the instruction at address 000000₁₆ as an LIL instruction. Pressing the r key at this point starts the emulator running the Hawk monitor until the program counter equals the breakpoint address that was set by the S MAIN directive in your main program. At that point, it stops again, with this display:

Ready to run the hello-world program
 HAWK EMULATOR
   /------------------CPU------------------\   /----MEMORY----\
   PC:  00001000                R8: 00000000   000FFC: NOP
   PSW: 0000FE01  R1: 0000000E  R9: 00000000   000FFE: NOP
   NZVC: 0 0 0 1  R2: 0001005C  RA: 00000000 -*001000: STORES  R1,R2
                  R3: FF0003D0  RB: 00000000   001002: LEACC   R2,R2,#0004
                  R4: 00010050  RC: 00000000   001006: LIL     R3,#001018
                  R5: 00000000  RD: 00000000   00100A: LIL     R1,#00043A
                  R6: 00000000  RE: 00000000   00100E: JSRS    R1,R1  
                  R7: FF000000  RF: 00000000   001010: LEACC   R2,R2,#FFFC

 **HALTED**  r(run) s(step) q(quit) ?(help)

This display shows that the PC now equals 00001000₁₆; this is where the linker puts the main program by default. To the right, the memory inspection column shows that the instruction at this address is STORES R1,R2, the first instruction of our program. The emulator does not use the assembly source file to display this column; instead, it decodes the actual contents of memory. This is why it shows the second instruction as LEACC instead of the ADDI in the source file, and why it does not show symbolic operand names. Look up ADDI in the Hawk manual and you will find that it is an alternate name for LEACC. Despite such changes, the sequence of instructions disassembled and displayed by the emulator should be fairly easy to match up with the assembly listing.

Why did the assembly listing say that the first STORES instruction was at address +00000000 while the emulator's display shows it at address 00001000? The answer lies in a concept that was mentioned in chapter 3, relocation. The assembler did not assign absolute memory addresses to the code it assembled, but only relative addresses, relying on the linker to set the absolute address. In the assembly listing, this is signified by the leading plus sign on the address. The linker, in this case, has added 1000₁₆ to each address, loading the object code for our example into addresses 00001000₁₆ through 0000105C₁₆.
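
The adjustment the linker makes can be sketched in a few lines of C. This is an illustration of the idea only, not the object file format or the linker's actual code; struct listed_value and link_value are names invented here. Values marked with a plus sign in the listing are relocatable, while values without one, such as the 0004 in the ADDI instruction, are absolute and left alone.

```c
#include <stdint.h>
#include <stdbool.h>

/* A value as it appears in the assembly listing. */
struct listed_value {
    uint32_t bits;        /* the number shown in the listing */
    bool relocatable;     /* true if the listing shows a leading plus sign */
};

/* Produce the value used at run time, given the relocation base
   chosen by the linker (1000 hexadecimal in this example). */
uint32_t link_value(struct listed_value v, uint32_t relocation_base) {
    return v.relocatable ? v.bits + relocation_base : v.bits;
}
```

With a relocation base of 0x1000, the JSRS listed at +0000000E ends up at 0000100E, which is where the emulator shows it.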

 
If we press the r (run) key when the emulator display window shows the above text, the emulator will run, producing the output "Hello world!" in the bottom half of the screen, and then it will halt back at location zero, ready to run the program again, if needed, and showing the registers as they were when the program ended.

There is an alternative to hitting the r key to run the program. We can watch the program execute one instruction at a time in order to observe the fetch-execute cycle in action. The emulator interprets the s (step) key as a request to run just one fetch-execute step, and it interprets the n (next) key as a request to set the breakpoint at the next consecutive instruction and then run until the program counter points to the breakpoint. In either case, the register display is updated to show the result of the execution step.

The s and n commands to the Hawk debugger do much the same thing except when the next instruction is a subroutine call or a branch. For subroutines, s will step into the body of the called routine, while n will execute the called routine as if it were one instruction. Here is the state of our system after executing the first three instructions of the main program using the s key:

The hello-world program after three instructions
  /------------------CPU------------------\   /----MEMORY----\
   PC:  0000100A                R8: 00000000   000FFC: NOP
   PSW: 00000100  R1: 0000000E  R9: 00000000   000FFE: NOP
   NZVC: 0 0 0 0  R2: 00010060  RA: 00000000  *001000: STORES  R1,R2
                  R3: 00001018  RB: 00000000   001002: LEACC   R2,R2,#0004
                  R4: 00010050  RC: 00000000   001006: LIL     R3,#001018
                  R5: 00000000  RD: 00000000 ->00100A: LIL     R1,#00043A
                  R6: 00000000  RE: 00000000   00100E: JSRS    R1,R1  
                  R7: FF000000  RF: 00000000   001010: LEACC   R2,R2,#FFFC

Several fields have changed from the values shown in the previous illustration. The program counter has advanced from 00001000₁₆ to 0000100A₁₆. The three instructions that were executed stored the contents of R1 in the memory location pointed to by R2, incremented R2 by 4, and loaded R3 with 00001018₁₆, the address of the first character in the string "Hello world!" The starred instruction in the memory display shows that the breakpoint has not been changed; it still marks the start of the main program. The memory display also includes an arrow (->) marking the instruction pointed to by the program counter.

What about those two NOP instructions that the emulator found in locations FFC₁₆ and FFE₁₆? The answer to this question is simple. The emulator's disassembler made a mistake. These two memory locations were not intended to be interpreted as instructions. If you use the emulator's t command to toggle the memory display, you will see a display of memory as a table of 32-bit words, each shown in hexadecimal and also as 4-character text strings. When viewed as a hexadecimal number, the word at location FFC₁₆ holds the value zero. If the Hawk processor did try to execute the contents of this location, it would be interpreted as a NOP (do nothing) instruction.

Exercises

c) Change the hello-world program to find out what the different control characters do. Try the characters from \b (BS) to \r (CR), and try them in both C and SMAL Hawk code.

d) Modify the hello-world program given above so that it outputs the message "Hi!" using 3 successive calls to the PUTCHAR monitor routine.

e) Someone tries to output the message "Hi" using this code fragment in place of the call to PUTS in the hello-world program, loading R1 just once but using it twice:

    LIL R1,PUTCHAR
    LIS R3,'H'
    JSRS R1,R1
    LIS R3,'i'
    JSRS R1,R1

When this runs, it outputs the 'H' but never outputs the 'i'. What went wrong and what was the mistake?

Load Effective Address and Load

In the hello-world program given above, we used LIL to load the address of the string "Hello world!". This left work to the linker, since the assembler does not know where in memory the code will be loaded. The Hawk architecture includes several alternatives to this. Generally, load instructions load values into registers. For example, load immediate instructions load constants. Here, we will look at the LOAD and LEA instructions.

LOAD and LEA have identical instruction formats and syntax. Both compute the address of a memory location, called the effective address. LEA, load effective address, loads that address into the destination register, while LOAD uses that address to load a word of data from memory. The simplest forms of the LOAD and LEA instructions are:

The Hawk LOAD instruction
07 06 05 04 | 03 02 01 00 | 15 14 13 12 | 11 10 09 08
 1  1  1  1 |     dst     |  0  1  0  1 |  0  0  0  0
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
const16

The Hawk LEA instruction
07 06 05 04 | 03 02 01 00 | 15 14 13 12 | 11 10 09 08
 1  1  1  1 |     dst     |  0  1  1  1 |  0  0  0  0
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
const16

Formally, both begin by computing the effective address ea=program_counter+sxt(15,const16).

Formally, LEA does R[dst]=ea.

Formally, LOAD does R[dst]=M[ea].

This is called program-counter-relative addressing or PC-relative addressing because the constant operand is used as a displacement from the program counter. This form of LOAD and LEA can be used to load the contents of or address of memory locations anywhere from 32764 bytes before the instruction to 32771 bytes after it. Why not 32768 before to 32767 after? Because the program counter is incremented before it is used.

A macro in hawk.h defines the instructions LOAD R1,X and LEA R1,X with a twist. The constant X is not assembled directly into the instruction. Instead, the assembler uses X-(.+4). As a result, if X is the label on a nearby word, these instructions will compute the address of that word as their effective address. If X is not defined as a nearby label, the result is an error message such as "misuse of relocation" or "value out of bounds."

Three ways to load the address of the same string
        LIL     R3,HELLO
        LEA     R3,HELLO
        LOAD    R3,PHELLO

        ...

        ALIGN   4
PHELLO: W       HELLO

Each of the three examples above loads the constant address HELLO into R3. Each has its own advantages and disadvantages. The LIL instruction, as already noted, only works if the constant to be loaded can be expressed in 24 bits. If that constant is an address, this only works when the address is somewhere in the first 8 megabytes of memory. With the LEA instruction, the address loaded must be near the address of the instruction that loads it, but may be anywhere in memory. The last example is the most general but the least convenient. Here, the LOAD instruction gets the contents of a nearby word holding the desired value. This can load any 32-bit value, but it is slower and it takes more memory.

In general, when loading the address of a location defined in the same source file, the LEA instruction should be used. We only used the LIL instruction in the hello-world example in order to limit the variety of different instructions used there. If you change that one LIL instruction to an LEA instruction in the hello-world example, and then assemble, link and run the program, it will work exactly as it originally did.

We cannot use LEA R1,DSPST instead of LIL R1,DSPST to call monitor routines. This is because LEA only works if the assembler knows the relative distance, in bytes, between the instruction and the address to be loaded. So long as both are defined in the same source file, the assembler can compute this distance easily. In this case, however, DSPST is defined in the monitor, which is a different source file, and depending on how the program is linked, it could easily end up more than 32K bytes away from the point of call.

Exercises

f) Modify the C hello-world program to call puts("hello") and then puts("world!").

g) Modify the SMAL Hawk hello-world program to call PUTS twice, once to output "Hello" and once to output "world!".

h) Use LEA instructions where appropriate in your solution to g).

Control Structures

A program that just outputs the text "Hello world!" is not very interesting. The instructions we have covered so far offered us many different ways to load constants, to add and subtract integers and to load and store from any memory address. More arithmetic operations would be nice, but in theory, all arithmetic can be done with just addition and subtraction. The big thing we are missing is support for control structures.

At the machine level, the fundamental primitive from which control structures are built is assignment to the program counter. In high-level languages such as C or C++, this is done using the goto statement to transfer control to a specific labeled statement. When goto is mentioned in introductory programming courses, it is usually in the context of a warning to never use it. At the machine language level, it is unavoidable.

All more complex control structures must therefore be translated into assignments to the program counter. For example, at the end of each loop body, there must be an assignment that forces a jump back to the start of the loop body, and any break statements within the loop must be translated to assignments that force jumps out of the loop. Similarly, if statements will translate to jumps that are conditional on the value of some expression.

Just as computers typically provide short and efficient ways to load small constants along with longer and slower instructions to load large constants, the Hawk also provides a short efficient way to jump short distances within the program, along with longer and more cumbersome support for control transfers over longer distances. The Intel Pentium/80x86 family offers similar tradeoffs, as do most other computers.

All of the common control transfer instructions on the Hawk use program-counter-relative addressing. The short fast instruction is called the branch instruction, while the long slow instruction is called the jump instruction.

The Hawk BR instruction
07 06 05 04 03 02 01 00 | 15 14 13 12 11 10 09 08
 0  0  0  0  0  0  0  0 |         const8

Formally, program_counter=program_counter+(2×sxt(7,const8))

That is, twice the sign-extended operand is added to the program counter. Multiplying the sign-extended constant by two makes it a displacement in halfwords, not bytes.

Notice that if the constant in this instruction is zero, the program counter is unchanged. As a result, the instruction halfword 0000₁₆ is a no-op. As we noted in Chapter 4, the instruction FFFF₁₆ (a register to register move instruction) is also a no-op. This has a small value to developers of programs in read-only memory. It allows patching such a program by forcing existing instructions to either 0000₁₆ or FFFF₁₆ (whichever is possible in the ROM technology being used) when it is discovered that they need to be deleted from the program. In addition, blocks of no-ops (either 0000₁₆ or FFFF₁₆, depending on which can be changed to other instructions) can be left in programs as patch space to hold instructions added later.

The Hawk jump instruction is two halfwords long, and it may be used to compute the next address in a program in several ways. The simplest form of jump instruction does close to the same thing as the branch instruction:

The Hawk JUMP instruction
07 06 05 04 | 03 02 01 00 | 15 14 13 12 | 11 10 09 08
 1  1  1  1 |  0  0  0  0 |  0  0  1  1 |  0  0  0  0
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
const16

Formally, program_counter=program_counter+sxt(15,const16) describes this.

The big practical difference between this form of the JUMP instruction and the BR instruction is that the jump instruction can change the program counter to any value in a 64K byte range, from about +32K to -32K, while the branch instruction is limited to a range from +127 to -128 halfwords. Thus, the branch instruction should be used when the destination address is nearby, while the jump instruction should be used for distant destinations.

As with other instructions that use program-counter-relative addressing, the SMAL assembler, or rather, the set of macro definitions in hawk.h, takes care of the details, so that all a programmer needs to do is label the destination instruction and then use that label on the jump or branch that transfers control to it. The following rather foolish example illustrates this:

Using the branch and jump instructions
                                 1          USE     "hawk.h"
                                 2  .       =       #1000
                                 3          S       L1
 00001000: 11  C1                4  L1:     ADDSI   R1,1
 00001002: 00  03                5          BR      L3
 00001004: 11  C2                6  L2:     ADDSI   R1,2
 00001006: F0  30  FFF6          7          JUMP    L1
 0000100A: 11  C4                8  L3:     ADDSI   R1,4
 0000100C: 00  FB                9          BR      L2
                                10          END

This example program was written using absolute assembly, setting the assembly origin to 1000₁₆ so that it will appear in exactly the same memory locations as are shown on the assembly listing. Because of this and the fact that it does not use any Hawk monitor routines, the object file can be loaded directly into the Hawk emulator without first running it through the linker.

If you untangle the control structure of this example, you will find that it is an infinite loop. During each iteration, R1 is incremented by 1 by the instruction with the label L1, then by 4 by the instruction with the label L3, and finally by 2 by the instruction with the label L2.

The important thing to observe in this listing is that the BR and JUMP instructions look very similar in assembly language, but that the machine code generated is different. The machine code for the forward branch on line 5 is easy to understand. In this case, the instruction skips over 3 halfwords to get to the line with the label L3, and the constant in the instruction itself is 03₁₆.

The backward branch on line 9 contains the constant FB₁₆ or 11111011₂. This is the two's complement of 00000101₂ or 5, and 5 halfwords are skipped over between the backward branch instruction and the label if you count both the halfword containing the branch itself and the size of the labeled instruction. In the forward direction, both of these were excluded from the count.

The jump instruction on line 7 has the 16-bit constant FFF6₁₆ as an operand. Interpreting this as a two's complement number, this is -10₁₀, and it turns out that the destination label is 10 bytes prior to the jump instruction, including both the bytes of the jump instruction itself and the destination instruction.

Exercises

i) Give the machine code for the shortest infinite loop, expressed using a BR instruction that branches to itself. (The assembler will complain when it assembles this instruction, but it does work.)

j) Give the machine code for the second shortest infinite loop, expressed using a JUMP instruction that branches to itself.

An Example Program

Here is a little C program fragment that is nothing but an infinite loop to output successive characters in a diagonal line down the screen.

An example in C
for (;;) {
    putchar( ch );
    putchar( '\v' );
    ch = ch + 1;
}

If you substitute this code for the body of the hello-world program, you will find that it works but that it runs so quickly that it is almost impossible to see the output. (Try the shell command a.out|more to slow it down.) The for(;;) construct is a common way to write an infinite loop in C. This is a for loop with no initialization, no increment and no terminating condition.

When we translate this to Hawk code, we need a place to put the variable ch. Here, we will use R8 because none of the Hawk monitor routines use it. This allows us to replace the application code part of the Hawk hello world program with this:

The example, translated to Hawk assembly language
; --- begin application code ---
        LIS     R8,'a'          ; ch = 'a'
LOOP:                           ; for (;;) {

        MOVE    R3,R8           ;   -- parameter ch
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ;   putchar( ch )

        LIS     R3,VT           ;   -- parameter VT = '\v'
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ;   putchar( '\v' )

        ADDSI   R8,1            ;   ch = ch + 1
        BR      LOOP            ; }
; --- end application code ---

If you substitute this text for the body of the hello world program, and then assemble and run it, it will go into an infinite loop that begins by printing this text on the screen, continuing downward to the right:

The output
        a
         b
          c
           d
            e
             f
              g
               h         

Computers are fast enough that it is not terribly interesting to watch this program produce this output at full speed. If you use the s debugger command to step through the code one instruction at a time or the i (iterate) command to advance one full iteration once you enter the loop, you can slow down the program.

Look back at the C code given in planning the example and the final assembly code. Note how we included the C code as comments in the assembly code to document which group of SMAL Hawk instructions implements each C statement. Additional comments have been added (set off by dashes) to explain individual instructions where their function might not be obvious. We used the label LOOP for the infinite for loop of the original. Note that the syntactic complexity of the C loop header has been reduced to just a label, while the single end brace marking the end of the loop in C is now a machine instruction to branch back to the loop top.

Using names like LOOP for labels in assembly language is very common. A branch or jump instruction gives no hint about whether it is being used to skip the else clause in an if statement or to return to the top of a loop, but if we adopt a systematic way of assigning labels, for example, using the label LOOP to mark the top of a loop, we can make it clear that BR LOOP marks the end of a loop and not something else.

Subroutine calls to the monitor routines are a bit complicated, but each call follows a strict stereotypical form. First, the parameters are placed into the registers the subroutine requires. Second, the subroutine address is put in R1, and finally, the JSRS instruction is used to actually call the subroutine. To make this clearer, blank lines have been added to the code to set off the blocks of code making up each subroutine call.

Exercises

k) Modify the example program (first in C, then SMAL Hawk code) so it outputs the diagonal line of letters on a sharper diagonal by outputting a space before the vertical tab.

l) Modify the example program so that it outputs the diagonal line of letters backwards, starting with z and working back toward a.

m) The stdio.h package in the Hawk monitor includes a routine called DSPAT for setting the coordinates at which the next character will be output. Use this to make the diagonal line instead of using vertical tabs. You will need to use a variable, perhaps R9, to count the number of lines of output. What happens when it goes off the end of the screen?

Condition Codes and Conditional Branches

The branch and jump instructions are sufficient to allow infinite loops, but to write real programs, we need a way to build control structures that don't loop forever. This requires branch instructions that are conditional, branching or not branching depending on the results of previous computations.

In 1964, when IBM announced the System/360, one of the innovations in that machine was the inclusion of 2 bits in the processor status word called the condition codes. Instructions that load data into registers or perform arithmetic use these bits to report whether the result was negative, zero or positive, with the 4th value used to report errors. Conditional branch instructions on the 360 test the condition codes, for example, branch if nonzero or branch if positive.

In 1970, Digital Equipment Corporation introduced the PDP-11; this computer introduced the model of condition codes used in the Hawk and many later computers, including the Intel 80x86/Pentium family. The PDP-11 and its successors have variations on the following condition codes:

N - negative
Set to one when the result of the operation was negative; otherwise zero.

Z - zero
Set to one when the result of the operation was zero; otherwise zero.

V - overflow
Set to one if an arithmetic operation produces an incorrect two's complement result, for example, if adding two positive numbers produces a negative result; otherwise zero.

C - carry
Set to one if an arithmetic operation produces a carry out of the most significant bit; otherwise zero.

The Hawk condition codes are stored in the least significant 4 bits of the processor status word. In the Hawk emulator display, they are also shown as single bit values below the processor status word for easier interpretation.

The Hawk ADD and SUB instructions set the condition codes, as do the ADDSI and ADDI instructions. The other instructions we have discussed do not change the condition codes, but there are variants of some instructions that do. Thus, while LOADS, LOAD and MOVE do not change the condition codes, the Hawk architecture offers alternatives, LOADSCC, LOADCC and MOVECC, that do. These variants have exactly the same formats as the originals, except for a one bit change in the binary representation.

To see if a number is zero or negative, a Hawk programmer can use a load or move instruction such as MOVECC to set the condition codes. To compare two numbers, a Hawk programmer would subtract them so that the condition codes report on the result. When the destination register field of a load, move or arithmetic instruction is zero, the Hawk still computes the desired result and sets the condition codes, but then it discards the result instead of saving it to a register.

Many Hawk instructions that discard their results after setting the condition codes are given alternative or more meaningful names. The most important of these is the compare instruction CMP which is really SUB (subtract) storing the result in R0:

The Hawk CMP instruction
07 06 05 04 | 03 02 01 00 | 15 14 13 12 | 11 10 09 08
 1  1  0  1 |  0  0  0  0 |     s1      |     s2

The instruction CMP R1,R2 subtracts R2 from R1, so we could also write it as SUB R0,R1,R2 to mean exactly the same thing. After comparison or subtraction, the Z condition code will be set if the two operands were equal, and the N condition code will be set for a negative result, for example, if R1 was less than R2 without an overflow. To interpret the C condition code, recall that subtraction is done by adding the two's complement of the subtrahend. As a result, after subtraction, C is set if there was no borrow out of the most significant bit.

The CMPI instruction compares a register with a 16-bit immediate constant. This is really just a macro that uses the add immediate instruction, ADDI, with the destination field set to zero so the result gets discarded. The assembler negates the constant so CMPI R1,C is assembled as if it had been written ADDI R0,R1,-C. Programmers can largely ignore this because adding a negated value sets the condition codes exactly the way they would have been set by subtracting the original value. Hawk emulators and debuggers, however, may show ADDI R0... where you expect CMPI....

The Hawk architecture offers the same 14 conditional branches that were originally offered by the DEC PDP-11. Eight of these are trivial to understand: BNS, BZS, BVS, and BCS each test one of the condition code bits and branch if that bit is set. These names are mnemonic. For example, BCS stands for branch if C set. For each of these, there is an inverse test, BNR, BZR, BVR, and BCR, that branches only if the corresponding condition code bit is reset.

After a comparison using the CMP instruction, use BEQ to branch if the operands were equal, and BNE to branch if they were unequal. These are really just BZS and BZR, renamed for better documentation.

After CMP R1,R2, where R1 and R2 hold two's complement integers, use BGT, BGE, BLE or BLT to branch if R1>R2, R1≥R2, R1≤R2, or R1<R2, respectively. These do not just test the N and Z condition codes, as you might imagine, but they also check the V (overflow) condition code because overflow means that the sign is wrong.

After CMP R1,R2, where R1 and R2 hold unsigned positive integers, use BGTU, BGEU, BLEU or BLTU to branch if R1>R2, R1≥R2, R1≤R2, or R1<R2, respectively. Two of these, BGTU and BLEU, are new machine instructions, while the other two, BGEU and BLTU, are synonyms for BCS and BCR; why this works is best left for later, when we discuss the arithmetic unit within the central processor.

Exercises

n) Give the Hawk machine code corresponding to CMPI R1,100; try doing this by hand, working from the information given above, before testing your solution using the assembler.

o) CMPI is the same as ADDI with the destination field set to zero. Why can't we convert ADDSI to some kind of CMPSI instruction by setting its destination field to zero?

p) Suppose you use the Hawk to compute the two's complement of one by subtracting it from zero with the SUB instruction. Recall that, in two's complement arithmetic, subtraction is done by adding the one's complement of the subtrahend, plus one in the rightmost bit. Give the values you would expect this to put in the Hawk condition codes.

q) Suppose you use the Hawk to compute the two's complement of zero by subtracting it from zero with the SUB instruction. Recall that, in two's complement arithmetic, subtraction is done by adding the one's complement of the subtrahend, plus one in the rightmost bit. Give the values you would expect this to put in the Hawk condition codes.

Examples illustrating definite loops

We can now modify our example program to halt after a finite number of iterations. In a high level language like C, we can do this with a construct such as for(x=1;x<9;x++). For loops in C are simpler than those in Java or Python because, in those languages, the loop header declares the loop control variable, while in C, the loop control variable must already exist.

Regardless of the language, for-loops are quite complex, including initialization, increment and exit conditions all bundled together. Assembly-language programmers must deal with each of these issues separately. Therefore, we must rewrite our program using simpler control structures such as do-while loops. This is shown here:

The example, modified to contain a definite loop
; --- begin application code ---
        LIS     R8,'a'          ; ch = 'a'
        LIS     R9,1            ; x = 1 -- initialization
LOOP:                           ; do {

        MOVE    R3,R8           ;   -- parameter ch
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ;   putchar( ch )

        LIS     R3,VT           ;   -- parameter VT = '\v'
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ;   putchar( '\v' )

        ADDSI   R8,1            ;   ch = ch + 1
        ADDSI   R9,1            ;   x = x + 1 -- increment

        CMPI    R9,9
        BLT     LOOP            ; } while (x < 9)
; --- end application code ---

In the above, we dispersed the initialization and increment steps of the for loop and marked them with comments. The do-while loop is a post-test loop because the test for termination comes at the end. Here, it is after the loop control variable x in R9 is incremented.

Sometimes, the loop must exit before changing the values of any variables. In languages descended from C, mid-loop exits are done with a conditional break in the loop body. For the example, if we test the loop control variable before it is incremented, we must rewrite our termination test. In the example, it was while(x<9) but with the test moved before the increment, it becomes if(x>=8)break.

We need to add a new label to mark the end of the loop. The name ENDLOOP is natural when a program contains just one loop. This is the destination address for the conditional branch that we use to implement the if-break construct. Assembly language programmers are strongly urged to adopt systematic naming conventions for labels so that the label names hint at their use in the program's control structure. Long names can detract from readability, so if there is more than one loop in a program, we might use the suffix LP to mean loop, allowing us to build labels such as FIXLP and ENDFIXLP to mark the start and end of a loop to fix something. The second version of our example program, given below, uses this idea:

The example, with the loop exit in mid loop
; --- begin application code ---
        LIS     R8,'a'          ; ch = 'a'
        LIS     R9,1            ; x = 1 -- initialization
LOOP:                           ; for (;;) {

        MOVE    R3,R8           ;   -- parameter ch
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ;   putchar( ch )

        LIS     R3,VT           ;   -- parameter VT = '\v'
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ;   putchar( '\v' )

        CMPI    R9,8
        BGE     ENDLOOP         ;   if (x >= 8) break;

        ADDSI   R8,1            ;   ch = ch + 1
        ADDSI   R9,1            ;   x = x + 1
        BR      LOOP
ENDLOOP:                        ; } -- end for
; --- end application code ---

A more fully developed convention for labels used to form control structures might add, in addition to the suffix LP for loops, the suffix IF for labels involved with if statements. Another convention to consider is labels that make assertions about variables in the program, so for example, the label XBIGGER might be natural at the point where the variable x is bigger than something else.

What about pre-test loops? In C, Java and Python, these are usually written as while loops. Any while loop can be rewritten as an infinite loop with a mid-loop exit at the start. Consider this example in C:

Rewriting while loops
while (x < 9) {
    putchar( ch );
    putchar( '\v' );
    ch = ch + 1;
    x = x + 1;
}
       
for (;;) {
    if (x >= 9) break;
    putchar( ch );
    putchar( '\v' );
    ch = ch + 1;
    x = x + 1;
}

Exercises

r) Translate the following C code to Hawk assembly language:

int x = 0;
char ch = 'a';
while (x < 9) {
    putchar( ch );
    putchar( '\v' );
    x = x + 1;
    ch = ch + 1;
}

s) Rewrite the final example given above to eliminate the integer variable x and terminate the loop when ch reaches or passes the letter h.

t) Translate the following example C code to Hawk assembly language:

int x;
for (x = 1; x < 9; x++) {
    putat( x, x );
    putchar( '*' );
} 

u) Write a Hawk program that produces the following output using a loop that iterates exactly 6 times, where each iteration outputs one letter 4 times, with appropriate calls to putat(), as needed, to put the letters in the correct rows and columns.

         ABCDEF
        A      A
        B      B
        C      C
        D      D
        E      E
        F      F
         ABCDEF
        

v) Add a nested loop to the example program so that it produces the output shown below. Do it in C first, and test your code before writing it in SMAL Hawk code:

        a
         bb
          ccc
           dddd
            eeeee
             ffffff
              ggggggg
               hhhhhhhh