5. Assembly Language Programming

Part of 22C:40, Computer Organization and Hardware Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Assembly into Memory

To run a program, you must translate it from its source language to machine language and load it in memory. While there are assemblers that directly assemble into memory, most do not, and our SMAL assembler is typical. When the SMAL assembler processes a source file, say hello.a for the classic hello-world program, the assembled output is placed in an object file called hello.o, with the .o suffix used to indicate that it is an object file. Most assemblers also produce a listing file that shows the source and object code together. In the case of the SMAL assembler, this will be named hello.l, with the .l suffix indicating that it is a listing.
Typical SMAL assembler inputs and outputs
inputs hello.a
hawk.macs
monitor.h
smal
outputs hello.o
hello.l

If the source file was a stand-alone program, that is, if it does not call any functions defined somewhere else, and if the source file used only absolute assembly and never assembled data into a relocatable memory address (a topic discussed briefly in chapter 3), then the object file can be loaded directly into memory.

More frequently, however, the object file will need to be linked with other object files in order to create a loadable and executable file. For example, the main program may need to be linked with separately assembled subprograms and with a library of standard input/output functions. The linker is the program that performs this linkage. In the case of our Hawk computer, the minimal standard library is known as the monitor.
Typical SMAL Hawk linker inputs and outputs
inputs hello.o
monitor.o
link
outputs link.o

Finally, a program called the loader is used to load the program into memory so that it can be executed. The loader is usually a built-in part of the operating system on a modern computer, and each time you run a machine-language program, the loader is called on to load that program before it is run.

Another way to run a loadable object file is to run it under a debugger. Debuggers typically offer a way to observe the internal state of the processor as it executes a program, and they typically also allow examination of the program as it sits in memory. Our Hawk debugger, for example, includes a disassembler that shows a version of the assembly language source that corresponds to the machine code it finds in memory. This disassembled code is not always the same as the original assembly source because there are many possible source programs that will produce the same machine code. On the other hand, it is usually easy to see the relationship between the disassembled code displayed by the debugger and the assembler's listing file.
Typical Hawk debugger inputs and outputs
inputs link.o
keyboard
hawk
outputs display

Assuming we start with a single source file called hello.a in the current directory of a Unix-like system, and assuming that the SMAL assembler and linker are smart enough to look elsewhere for the other input files required, as indeed they are, then the dialog between the programmer and the assembler to load and run the hello-world program would be as follows: using the dialog shown below.
Assembling, linking and running the example
$ ls
hello.a
$ smal hello.a
  no errors
$ ls
hello.a hello.l hello.o
$ link hello.o
  no errors
$ ls
hello.a hello.l hello.o link.o
$ hawk link.o

The user typed in the ls command before and after each of the constructive commands in the above example; this requests a listing of all the files in the current directory, and as a result, it is easy to see what files were created at each step between the source program and the executable result.

In the above example, input to the computer system has been shown in boldface, and the prompt output by the system to solicit each command is shown as a dollar sign. It is likely that you will see a different prompt on your computer system, since the string used for the prompt can be easily customized.

The Skeleton of a Hawk Application

Since the publication of The C Programming Language by Kernighan and Ritchie in 1978, introductory tutorials for programming languages have usually begun with an example program that produces some variation on the string "hello world" to the output. Producing such output from an assembly language program takes a bit more work than it does in the original version of the C language, where the following sufficed:
The original C hello-world program
main()
{
	printf("hello, world\n");
}

In our assembly language, we must do many things that are done automatically by C and many other high-level languages. Where a C programmer writes main(){} as the skeleton of an empty main program, a Hawk programmer will have to write more:
The skeleton of an empty Hawk program
        S       START
        USE     "hawk.macs"
        USE     "monitor.h"

        COMMON  STACK,#1000
PSTACK: W       STACK

START:  LOAD    R2,PSTACK       ; set up the stack

; ----  application code goes here ----	

        LOAD    R1,PEXIT
        JSRS    R1,R1           ; call monitor routine to stop!
        END

Some parts of the above code should be familiar from previous chapters. The USE directive to insert the macro definitions from the file hawk.macs, for example, was present in every assembly language example in Chapter 4, and the W and END directives should be familiar from Chapter 3. Other material here, however, is new.

The assembler's S directive is used to tell the loader the starting address for the program, that is, initial value of the program counter to be used to start executing the main program. The argument to the S directive could be any number or expression, but since we want the address of the first executable instruction of the program, we use a label. In this case, we use the identifer START which is defined as a label on the first executable instruction and used as an operand on the S directive.

The S directive does not assemble any value into memory! All it does is communicate with the loader, telling the loader where to jump to in the loaded program in order to start it.

The header file monitor.h includes the interface definition for all of the routines in our Hawk monitor, a very minimal operating system for the Hawk computer. Our minimal example uses only one of these, the EXIT function that it calls in the second to the last line.

The COMMON directive is a more interesting puzzle. The usual way to perform subprogram calls on most computers is to use a push-down stack. We use the word subprogram here in the most generic sense to include such language specific concepts as methods, procedures, functions and so on. All of these have the property that they can be called, and all have the property that, when they finish, they return to the caller. This stack is available, within each subprogram, for the storage of any local variables needed by that routine. Conceptually, on entry to a subprogram, local variables are pushed onto this stack, and on exit, they are popped.

The purpose of this COMMON directive is to allocate the space for the stack in read-write memory. In this example, it allocates 100016 words as a block of memory called STACK. More formally, we can think of the symbol STACK as the name of a variable inside the assembler that holds the address of this block of memory, but even this is misleading because it is the linker that actually allocates this block of memory!

In general, the directive COMMON n,s directive asks the linker to allocate a block of memory of size s bytes starting at a word address and attach its address to the symbol n. Programs may allocate any number of common blocks, so long as each block has a unique name. If two different parts of the program allocate common blocks with the same name, these will refer to the same memory region, even if the two parts of the program were separately assembled.

After the stack has been allocated, the minimal program included a one-word pointer to that stack with the label PSTACK. We did this so we could load this 32-bit pointer into a register with the first executable instruction of the program, LOAD PSTACK.

The Load Instruction

The Hawk LOAD instruction is a memory reference instruction that allows one word of memory to be loaded, using a variety of different formats for the memory address. In the form we are using it, the load instruction is assembled as follows:
The Hawk LOAD instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 1 1 dst 0 1 0 1 0 0 0 0
const16

Formally, this does r[dst]=M[program_counter+sxt(15,const16)].

This may take a moment to digest! In practice, what this means is that one word will be loaded, where the address of that word is from 32768 bytes prior to this instruction to 32767 bytes after it. This is called program-counter-relative addressing, and it is the usual way that one part of a particular assembly language source file references another part of the same source file. In some detail, because the program counter is incremented as the instruction is fetched, the 16-bit constant measures the distance of the operand from the next instruction in sequence. That is, a constant of zero causes the load instruction to load a copy of the word starting right after itself.

It would be inconvenient if, every time you wanted to use program-counter-relative addressing, you had to subtract the address of the next instruction from the intended address, so our macro for assembling this instruction does this job for you. Here is the macro, simplified to ignore those aspects of the load instruction we have been ignoring:
The LOAD macro (simplified)
MACRO LOAD =dst,=const
  H #F050 | (dst << 8)
  H const - (.+2)
ENDMAC

By convention, all of the Hawk monitor routines use register 2 as the stack pointer, so the net effect of the block of code that opened our skeleton of an empty program was to initialize register 2 with the address of the first byte of the 4096 byte stack:
Initializing the stack pointer
        COMMON  STACK,#1000
PSTACK: W       STACK

START:  LOAD    R2,PSTACK       ; set up the stack

In fact, we now have two ways to load a 32-bit constant into a register. We can either assemble that constant in memory, and then use the LOAD instruction to bring it into a register, or we can use the LIW macro, that is, a 2-instruction sequence, to load the constant directly from constant fields within the instruction stream. How does an assembly language programmer (or a compiler) decide which of these to use?

In this particular case, the answer is, we have no choice. The LIW macro only works with those constants with values known at assembly time, so called absolute constants. As we have already noted, the address of the common block holding the stack is not assigned by the assembler, but rather, this address left to be assigned by the linker or the loader. Such addresses, called relocatable addresses cannot be used as operands on the LIW macro, but they may be assembled into words for later use by the LOAD instruction.

More generally, the decision to use one or the other approach to loading a 32-bit constant will depend on space-time tradeoffs. The LIW macro uses 6 bytes of program memory each time it is used. Assembly of a word into memory takes 4 bytes for that word, and then, every reference to the word using a LOAD instruction takes 4 more bytes (2 halfwords). So, if some particular 32-bit constant is used frequently enough, it will take less memory if we assemble it into memory and the load it as needed. On the other hand, if a constant is only used once, we are better off loading it once, using the LIW macro.

What about execution speed? Unfortunately, the speed of a modern pipelined processor with cache memory is extremely hard to predict, so we must make some assumptions. If we assume that the processor is designed so that the cost of memory references dominates its performance, and if we assume that it fetches instructions one 32-bit word at a time, the LOAD instruction takes one memory reference to fetch from memory and one memory reference to fetch its operand. The LIW macro, on the other hand, takes one and a half memory references to fetch the instructions and nothing more. As a result, we can guess that the LIW macro will usually be a bit faster than the LOAD instruction.

Exercises

a) Here are some SMAL definitions of symbols

    A = 5
    B = 555
    C = .

If you want one of these values loaded in a register, you could use the LIS or LIL instructions, the LIW macro, or LOAD using a word holding the indicated value. Which of these 4 is the most appropriate way to load each of the above?

b) For each constant defined in the previous question, consider each of the possible ways of loading that constant, the LIS or LIL instructions, the LIW macro, or LOAD using a word holding the indicated value; which of these would not work for which constant, and why.

Calling a Monitor Routine

The final thing our empty program skeleton does is call the monitor's exit routine. The interface to the monitor, loaded by monitor.h included words that point to each of the routines in the monitor, so we set things up to call a monitor routines by loading the appropriate pointer into register 1 with a load instruction, in this case, LOAD R1,PEXIT, where we use the convention adopted in the monitor file that the label Pthing is a label on a word holding a pointer to thing.

The actual call to the monitor routine is done by the next (and final) instruction in our example, JSRS. The Hawk has two jump-to-subroutine instructions, one long and one short. These are memory reference instructions, with the same formats as the long and short versions of the load instruction, LOAD and LOADS. This instruction has the following form:
The Hawk JSRS instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 1 1 dst 1 0 1 1 x

Formally, this does r[dst]=program_counter; program_counter=r[x]

These two assignments are done by the hardware in parallel, so the instruction JSRS R1,R1 exchanges the value of the program counter with the value stored in the register. Since we have just loaded the address of the first instruction of one of the Hawk monitor routines in register 1, this instruction will transfer control to that routine.

Because the JSRS instruciton copied the previous value of the program counter to a register, the called routine can use this value to return to the calling routine. We call that register the linkage register, and we also say that this register holds the return address for the call. Because of this use, we also say that the Hawk architecture uses register linkage for subprogram calls. Some other architectures include calling instructions that automatically push the calling address on the stack. This was popular in the 1970's when the Intel 8086 was designed, and therefore, the Intel Pentium does this.

In our skeleton main program, the program itself made absolutely no use of the stack. The only monitor routine it called was the exit routine, and that, as it turns out, never returns and makes no use of the stack either. Several of the Hawk monitor routines, on the other hand, make extensive use of the stack, and when you start writing code for your own subroutines, you will be storing local variables on the stack.

Exercises

c) Type in the skeleton of an empty Hawk program given above, then compile it, link it, and load it under the Hawk emulator so you can inspect the machine code loaded into memory.

d) Hand translate the skeleton of an empty Hawk program given above into a sequence of halfwords starting with first LOAD instruction and continuing until the JSRS instruction. You may have to put queston marks for some or all of the contents of some of these halfwords because they depend on material not given here.

Some Useful Hawk Monitor Routines

The exit service offered by the Hawk monitor is trivial! In fact, it does nothing useful, taking no parameters and never returning! If we want to write an interesting hello-world program, we need more, a way to put a message on the screen! Here are some of the routines in the Hawk monitor:

EXIT -- terminates the application
Parameters: None.
Return value: Does not return
Side effects: Does not return

DSPINI -- initialize the output display device
Parameters: None.
Return value: None.
Side effects: Uses R3 to R4
The next character output will appear at row 0, column 0; this is the upper left-hand corner of the display.

DSPAT -- set output display coordinates
Parameters: Column = R3, Row = R4
Return value: None.
Side effects: Uses R3 to R7
The next character output will appear at the indicated row and column on the display, where the upper left is row 0, column 0.

DSPCH -- display one character
Parameters: R3 -- the character to display
Return value: None.
Side effects: Uses R4 to R5
The character will be displayed at the current row and column number on the display, and then the column number will be incremented.

DSPST -- display one null-terminated string
Parameters: R3 -- address of the first byte of the string.
Return value: None.
Side effects: Uses R3 to R7
The characters of the string will be displayed in sequence.

On entry, all of these monitor routines expect the return address in register 1, and they all expect that register 2 will hold the stack pointer, pointing to the first free word beyond any part of the stack that is already in use.

A Working Hello-World Program

We now have almost enough information in hand to write a program that outputs the string "Hello world!" to the display and then halts. Here is an assembly listing of that program:
A working hello-world program
        TITLE   A Hello-World Program

        S       START
        USE     "hawk.macs"
        USE     "monitor.h"

        COMMON  STACK,#1000
PSTACK: W       STACK

; the program starts here!
START:  LOAD    R2,PSTACK       ; set up the stack

        LOAD    R1,PDSPINI
        JSRS    R1,R1           ; initialize the display

        LOAD    R3,PHELLO
        LOAD    R1,PDSPST
        JSRS    R1,R1           ; output (HELLO)

        LOAD    R1,PEXIT
        JSRS    R1,R1           ; stop!

	ALIGN	4
PHELLO: W	HELLO
HELLO:  ASCII   "Hello world!",0
	END

This program calls two routines in our minimal Hawk monitor, DSPINI to initialize the display for output, and DSPST to display the null-terminated string Hello world!". Note the use of the LOAD instruction to load the contents of the word labeled PHELLO that holds the address of the start of the string. This should appear a bit awkward, but we had to use this awkward construct in order to avoid introducing yet another way to load an address into a register.

Another change we made was to add a title to the program, using the TITLE directive. This directive is little more than an expensive sort of comment, but in large programs, particularly when they are listed to a printer, it is useful to have a meaningful title listed on each page. You can see the title on the first line of the assembly listing file:
Listing the hello-world program, part 1
SMAL32 (rev  9/03)              A Hello-World Program        12:28:21  Page  1
                                                             Sun Sep  7 2003

                             1          TITLE   A Hello-World Program
                             2
                             3          S       START
                             4          USE     "hawk.macs"
+000000:+00000000            5          USE     "monitor.h"
        +00000000
        +00000000
        +00000000
        +00000000
        +00000000
        +00000000
        +00000000
        +00000000
        +00000000
        +00000000
                             6
                             7          COMMON  STACK,#1000
+00002C:+00000000            8  PSTACK: W       STACK

This listing file shows the impact of including the monitor.h file. Although the contents of this file are completely suppressed, you can see that the version used here loaded 11 words into memory. The plus sign before each of these words indicates that the value known to the assembler, zero in this case, will be modified by the linker or the loader For these particular words, the linker will fill in the actual addresses of the entry points of various monitor routines.

The pointer with the label PSTACK will be handled in the same way by the linker, filling in the actual address of the common block named STACK when that is known. The next part of the assembly listing holds the actual program:
Listing the hello-world program, part 2
                            10  ; the program starts here!
+000030: F250  FFF8         11  START:  LOAD    R2,PSTACK       ; set up the sta
                            12
+000034: F150  FFCC         13          LOAD    R1,PDSPINI
+000038: F1B1               14          JSRS    R1,R1           ; initialize the
                            15
+00003A: F350  000E         16          LOAD    R3,PHELLO
+00003E: F150  FFCE         17          LOAD    R1,PDSPST
+000042: F1B1               18          JSRS    R1,R1           ; output (HELLO)
                            19
+000044: F150  FFB8         20          LOAD    R1,PEXIT
+000048: F1B1               21          JSRS    R1,R1           ; stop!
                            22
                            23          ALIGN   4
+00004C:+00000050           24  PHELLO: W       HELLO
+000050: 48  65  6C  6C     25  HELLO:  ASCII   "Hello World!",0
         6F  20  57  6F
         72  6C  64  21
         00
                            26          END

If we now go through the exercise of linking this program with the link command and then we run a Hawk emulator so that we can run this program, we will get the following initial output from the emulator:
Ready to run the hello-world program
HAWK EMULATOR
   /------------------CPU------------------\   /----MEMORY----\
   PC:  00001030                R8: 00000000   00102C: BR      #0010A6
   PSW: 00000000  R1: 00000000  R9: 00000000   00102E: BR      #001032
   NZVC: 0 0 0 0  R2: 00000000  RA: 00000000 ->001030: LOAD    R2,#00102C
                  R3: 00000000  RB: 00000000   001034: LOAD    R1,#001004
                  R4: 00000000  RC: 00000000   001038: JSRS    R1,R1
                  R5: 00000000  RD: 00000000   00103A: LOAD    R3,#00104C
                  R6: 00000000  RE: 00000000   00103E: LOAD    R1,#001010
                  R7: 00000000  RF: 00000000   001042: JSRS    R1,R1

 **HALTED**  r(run) s(step) q(quit) ?(help)

This display shows an initial PC valuye of 103016, and in the output column labeled MEMORY, the contents of location 103016 is shown as a LOAD instruction. The operands, however, are shown in hexadecimal form, not in the original source form. This is because this display of the code was produced by disassembly of the binary code in memory, with no reference to the original source program. Nonetheless, the sequence of instructions is exactly the same as the sequence of instructions in the original assembly language program: LOAD, LOAD, JSRS, LOAD, LOAD, JSRS.

Why did the assembly listing say that the first LOAD instruction was at address +000030 while the emulator's display shows it at address 001030? The answer lies in a concept that was mentioned in chapter 3, relocation. The assembler did not assign absolute memory addresses to the code it assembled, but only relative addresses, relying on the linker to set the absolute address. This is signified, in the assembley listing, by a leading plus sign on the address. The linker, in this case, has added 100016 to each address, loading the object code for our example into addresses 100016 through 105C16.

If we hit the r key at the point where the emulator display window shows the above text, the emulator will run, producing the output "Hello world!" in the bottom half of the screen, and then it will halt. At this point, this is not as interesting as watching the computer execute just one instruction. In this context, the s or n keys direct the emulator to step forward just one instruction. (The difference between these is only apparent when the next instruction calls a subprogram; in that case, s will step into the body of the called routine, while n will allow the call to execute as if it were one instruction.) Here is the state of our system after executing the first two load instructions:
Starting to run the hello-world program
HAWK EMULATOR
   /------------------CPU------------------\   /----MEMORY----\
   PC:  00001038                R8: 00000000   00102C: BR      #0010A6
   PSW: 00000000  R1: 00000178  R9: 00000000   00102E: BR      #001032
   NZVC: 0 0 0 0  R2: 0001003C  RA: 00000000   001030: LOAD    R2,#00102C
                  R3: 00000000  RB: 00000000   001034: LOAD    R1,#001004
                  R4: 00000000  RC: 00000000 ->001038: JSRS    R1,R1
                  R5: 00000000  RD: 00000000   00103A: LOAD    R3,#00104C
                  R6: 00000000  RE: 00000000   00103E: LOAD    R1,#001010
                  R7: 00000000  RF: 00000000   001042: JSRS    R1,R1

 **HALTED**  r(run) s(step) q(quit) ?(help)

Three fields in this output have changed from the values shown in the previous illustration. The program counter has advanced over the two load instructions and we are now ready to execute the first call to a Hawk monitor function. Register 2 now points to the first word of the stack, and we can now see that the stack begins at memory location 1003C16. Register 1 now points to 17816, the actual entry point of the DSPINI monitor routine.

What about those two BR instructions that the emulator found in locations 102C16 and 102E16? The answer to this question is simple! The emulator's disassembler made a mistake. If you use the emulator's t command to toggle the memory display, you will see a display of memory as a table of 32-bit words, each shown in hexadecimal and also as 4-character text strings. When viewed as a hexadecimal number, location 102C16 contains 1003C16, exactly the value loaded into register 2 by the first LOAD instruction.

Exercises

e) Modify the hello-world program given above so that it uses the DSPAT monitor routine to output its message starting on line 5 column 35 of the screen, so the message is more or less centered.

f) Modify the hello-world program given above so that it outputs the message "Hi!" using 3 successive calls to the DSPCH monitor routine.

g) A very bright programmer decides to output the message "Hi" using this fragment of code:

    LOAD R1,PDSPCH
    LIS R3,'H'
    JSRS R1,R1
    LIS R3,'i'
    JSRS R1,R1

When the programmer tries this, it outputs the H, but then it starts to behave strangely. What happened? What was the mistake?

Load Address, Yet Another Way to Load a Register

In our first-draft of the hello-world program, we used PHELLO as the label on a word holding HELLO, the memory address of the first letter in the string "Hello world!. This worked, but it is awkward. The Hawk architecture includes an instruction that allows us to directly load the address of any nearby object in the code. This is the load-effective-address instruction.

In general, the term effective address refers to the address of the operand of an instruciton in memory. So, for example, the LOAD instruction computes an effective memory address by adding a 16-bit constant to any of several registers, and then it uses this address to get its operand from memory. The load-effective-address or LEA instruction stops short of going to memory and instead loads the address itself.
The Hawk LEA instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 1 1 dst 0 1 1 1 0 0 0 0
const16

Formally, this does R[dst]=program_counter+sxt(15,const16).

As with the LOAD instruction, this is using program-counter-relative addressing, and the assembler, or rather, the macro defined in hawk.macs takes a normal memory address as an operand and subtracts the program counter from it in order to build an instruction that will reconstruct that address.

This can only load the addresses of nearby memory locations into a register, but nearby is defined generously as within about 32K bytes, plus or minus, from the current value of the program counter. Thus, we can rewrite the call to write out the string "Hello world!" as follows:
Using LEA in the hello world program
LEA     R3,HELLO	; this line changed
LOAD    R1,PDSPST
JSRS    R1,R1           ; output (HELLO)

If you make this one line change to the hello world program and then assemble, link and run the program, it will work exactly as it originally did. The line with the label PHELLO is no longer needed, nor is the ALIGN directive preceeding it, so you can delete these also. Having made these chantges, your new program is at least one word smaller, and it will run imperceptably faster because it makes one less memory reference.

Why not use LEA R1,DSPST instead of LOAD R1,PDSPST? Because the former only works if the assembler knows the relative distance, in bytes, between the instruction and the address to be loaded. So long as these are both defined in the same source file, the assembler can use this information easily, but in this case, DSPST is defined in the monitor and could easily end up more than 32K bytes away from the point of call.

Exercises

h) Write a program that outputs the string "Hello " followed by the string "world!" (the output will be equivalent to the original hello-world program, but it will call DSPST twice, using LEA instructions where possible.

Control Structures

A program that just outputs the text "Hello world!" is not very interesting. It is worthwhile, at this point, to pause and ask, what else do we need the computer to be able to do before we would consider it useful. The instructions we have discussed offered us many different ways to load constants in memory, and they offered us the ability to add and subtract integers and to load and store from any memory address. With these, we can write programs to evaluate any simple arithmetic expression.

Of course, it would be nice to have more arithmetic operations available, but in theory, all arithmetic can be done with just addition and subtraction. The big thing we are missing is control structures. We would like to be able to write programs containing if statements and loops!

At the machine language level, the fundamental primitive for control structuring is assignment to the program counter. In high-level languages such as Fortran, Pascal, C or C++, this is done using the goto statement, which forces a control transfer to a specific labeled statement. When introductory programming courses ever mention this statement, this is usually in the context of a warning to never use it. At the machine language level, though, this is all we have.

All more complex control structures must therefore be translated into assignments to the program counter. So, for example, at the end of each loop body, there must be an assignment that forces a jump back to the start of the loop body, and any break statements within the loop must be translated to assignments that force jumps out of the loop.

Just as the Hawk provides short and efficient ways to load small constants, but it requires long and cumbersome instructions to load large constants, the Hawk also provides a short efficient way jump short distances within the program, while providing longer and more cumbersome support for control transfers that make larger changes to the program counter. It is worth noting that the Intel Pentium and 80x86 family, as well as many other successful computer architectures offer similar tradeoffs.

All of the common control transfer instrucitons on the Hawk use program-counter-relative addressing. The short fast instruction is called the branch instruction, while the long and slower form is called the jump instruction. The short form is given first:
The Hawk BR instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
0 0 0 0 0 0 0 0 const8

We can use the shorthand notation program_counter=program_counter+(2×sxt(7,const8)) to describe this.

Notice that if the constant in this instruction is zero, the program counter is unchanged. As a result, the instruction halfword 000016 is a no-op. As we already noted, in chapter 4, the instruction FFFF16 (a register to register move instruction) is also a no-op. This has a small value to developers of programs in read-only memory because it allows patching such a program by forcing existing instructions to either 000016 or FFFF16 (whichever is possible in the ROM technology being used) when it is discovered that they need to be deleted from the program, and because blocks of no-ops (either 000016 or FFFF16, depending on which can be changed to other instructions) can be left in programs as patch space to hold instructions added later.

The Hawk jump instruction is two halfwords long, and it may be used to compute the next address in a program in several ways, but for practical purposes, the simplest form of this instruction does close to the same thing as the branch instruction:
The Hawk JUMP instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0
const16

We can use the shorthand notation program_counter=program_counter+sxt(15,const16) to describe this.

The big practical difference between this form of the JUMP instruction and the BR instruction is that the jump instruction can change the program counter to any value in a 64K byte range, from about +32K to -32K, while the branch instruction is limited to a range from +127 to -128 halfwords. Thus, the branch instruction should be used when the destination address is nearby, while the jump instruction should be used when the destination address is far away.

As with other instructions that use program-counter-relative addressing, the SMAL assembler, or rather, the Hawk macro definitions in hawk.macs take care of the details of program-counter relative addressing, so that all an assembly language programmer needs to do is label the destination instruction and then use that label on the jump or branch instruction that transfers control to it. The following rather foolish example illustrates this:
Using the branch and jump instructions
                             1          USE     "hawk.macs"
                             2  .       =       #1000
                             3          S       L1
 001000: 11C1                4  L1:     ADDSI   R1,1
 001002: 0003                5          BR      L3
 001004: 11C2                6  L2:     ADDSI   R1,2
 001006: F030  FFF6          7          JUMP    L1
 00100A: 11C4                8  L3:     ADDSI   R1,4
 00100C: 00FB                9          BR      L2
                            10          END

This example program was written using absolute assembly, setting the assembly origin to 100016 so that, when it runs, it will appear in exactly the same memory locations as are shown on the assembly listing. Because of this, and the fact that it makes no use of the Hawk monitor routines, the object file can be loaded directly into the Hawk emulator, without any need to first run it through the linker.

If you untangle the control structure of this example, you will find that it is an infinite loop. During each iteration, register 1 is incremented by 1, by the instruction with the label L1, 4, by the instruction with the label L3, and 2, by the instruction with the label L2, in that sequence. It is an infinite loop because we have yet to discuss control structures that incorporate conditionals.

The important thing to observe in this listing is that the BR and JUMP instructions look very similar in assembly language, but that the machine code generated is different. The machine code for the forward branch on line 5 is easy to understand. In this case, the instruction skips over 3 halfwords to get to the line with the label L3, and the constant in the instruction itself is 0316.

The backward branch on line 9 contains the constant FB16 or 111110112. This is the two's complement of 000001012 or 5, and 5 halfwords are skipped over between the backward branch instruction and the label if you count both the halfword containing the branch itself and the size of the labeled instruction. In the forward direction, both of these were excluded from the count.

The jump instruction on line 7 has the 16-bit constant FFF616 as an operand. Interpreting this as a two's complement number, this is -1010, and it turns out that the destination label is 10 bytes prior to the jump instruction, including both the bytes of the jump instruction itself and the destination instruction.

Exercises

j) Give the machine code for the shortest infinite loop, expressed using a BR instruction that branches to itself.

k) Give the machine code for the second shortest infinite loop, expressed using a JUMP instruction that branches to itself.

An Example Program

Here is a little program, expressed in a C-like notation, that incorporates an infinite loop and calls some of the service routines in the hawk monitor:
An example
int x = 0;
char ch = 'a';
dspini();
while (TRUE) {
	dspat( x, x );
	dspch( ch );
	x = x + 1;
	ch = ch + 1;
}

Running this program, you would expect successive characters to be displayed, one per iteration, starting with the letter 'a'. Each character is displayed one line below and one character to the left of the previous one because of the call to dspat(), so the expected output is a diagonal line of characters extending down and to the right until something causes the program to terminate.

When we translate this program to Hawk machine code, we must find some place to put the variables. Here, we will use registers 8 and 9 because none of the Hawk monitor routines use these registers. Having done this, we can replace the body of our Hawk hello world program with the following:
The example, translated to Hawk assembly language
START:  LOAD    R2,PSTACK       ; setup the stack

        LIS     R8,1            ; x = 1;    (using R8 for x)
        LIS     R9,'a'          ; ch - 'a'; (using R9 for x)

        LOAD    R1,PDSPINI
        JSRS    R1,R1           ; dspini();
LOOP:				; while (TRUE) {
        MOVE    R3,R8
        MOVE    R4,R8
        LOAD    R1,PDSPAT
        JSRS    R1,R1           ;     dspat(x,x);

        MOVE    R3,R9
        LOAD    R1,PDSPCH
        JSRS    R1,R1           ;     dspch(ch);

        ADDSI   R8,1            ;     x = x + 1;
        ADDSI   R9,1            ;     ch = ch + 1;

        BR      LOOP		; }

If you substitute this text into the Hawk version of the hello world program, and then assemble and run it, it will go into an infinite loop that begins by printing this text on the screen:
The output
	a
	 b
	  c
	   d
	    e
	     f
	      g
	       h	 

Computers are fast enough that it is not terribly interesting to watch this program produce this output at full speed. Instead, consider using the n emulator command to step through the code one instruction at a time (counting each monitor call as one instruction) for a few iterations, and then consider using the i monitor command to allow it to advance one full iteration of the loop at a time. Be careful not to do the latter until you are inside the loop.

Look back, now at the original code for the example, and compare this with the assembly language code! Note that we have used comments in the assembly language to relate this code to the original C-like code. Some statements in the original reduce to single machine instructions, while others reduce to sequences of instructions. We used the label LOOP for the infinite while loop of the original program, and curiously, the syntactic complexity of the loop header in the original reduced to just a label, while the single end brace marking the end of the loop in the orignal reduced to a machine instruction, the branch back to the start of the loop.

Using names like LOOP for labels in assembly language is very common. A branch or jump instruction gives no hint about whether it is being used to skip the else clause in an if statement or to return to the top of a loop, but if we adopt a systematic way of assigning labels, for example, using the label LOOP to mark the top of a loop, we can make it clear that BR LOOP marks the end of a loop!

Exercises

l) Modify the example program so that it outputs the diagonal line of letters on a sharper diagonal, that is, so that the letter on line i is output in column 2i. You can add a number to itself in order to double it.

m) Modify the example program so that it outputs the diagonal line of letters backwards, starting with z and working back toward a.

n) The example program does not use registers ten and up. Rewrite the example so these hold the addresses of DSPAT and DSPCH and are loaded before the loop begins, instead of being loaded over and over within the loop. How much does this shorten the machine code of the problem, and how many instructions does it eliminate from each iteration?

Condition Codes and Conditional Branches

The branch and jump instructions are sufficient to allow infinite loops, but to write real programs, we need a way to build control structures that don't loop forever! This requires branch instructions that are conditional, branching or not branching depending on the results of previous computations.

In 1965, IBM introduced the System 360, and one of the innovations in this machine was the inclusion of 2 bits in the processor status word called the condition codes. Whenever the machine loaded data into a register or performed arithmetic, these bits were set to report a small amount of information about the result, and the conditional branch instructions in this machine tested the condition codes.

In 1970, Digital Equipment Corporation introduced the PDP-11; this computer introduced the model of condition codes used in the Hawk and many other computers, including the Intel 80x86/Pentium family. The PDP-11 and its successors have the following condition codes:

N - negative
Set to one when the result of the operation was negative; otherwise, it is set to zero.

Z - zero
Set to one when the result of the operation was zero; otherwise, it is set to zero.

V - overvlow
Set to one if an arithmetic operation produces an incorrect two's complement result, for example, if adding two positive numbers produces a negative result; otherwise, it is set to zero.

C - carry
Set to one if an arithmetic operation produces a carry out of the most significant bit; otherwise, it is set to zero.

The Hawk condition codes are stored in the least significant 4 bits of the processor status word, and in the Hawk emulator display, they are shown below the processor status word so you can easily watch how they change.

The Hawk ADD and SUB instructions set the condition codes after each arithmetic operation as do the ADDSI and ADDI instructions. The other instructions we have discussed do not change the condition codes, but there are variants of these that do. Thus, while LOADS, LOAD and MOVE do not change the condition codes, the Hawk architecture offers alternatives, LOADSCC, LOADCC and MOVECC that do. These variant instructions have identically the same formats as the originals on which they are based, and differ in only one bit in the binary representation.

Several Hawk instructions are used only to set the condition codes. The most important of these is the compare instruction:
The Hawk CMP instruction
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 1 0 1 0 0 0 0 s1 s2

The instruction CMP R1,R2 subtracts r[2] from r[1], and in fact, this instruction is the SUB instruction with the destination field set to zero. After comparison or subtraction, the Z condition code will be set if the two operands were equal, and the N conditon code will be set if the result was negative, a result that usually indicates that r[1] was less than r[2].

Similarly, the Hawk architecture provides a compare immediate instruction, CMPI that compares a register with a 16-bit immediate constant. In fact, this instruction is really just the add immediate instruction, ADDI with the destination field set to zero so that the result gets discarded. It is the assembler that negates the constant from the user's program; when the hardware adds the negated constant to complete the comparison, it sets the condition codes identically to the way they would have been set had the hardware done the subtraction.

The Hawk architecture offers the same 14 conditional branches that were offered by the DEC PDP-11. 8 of these are trivial to understand: BNS, BZS, BCS, and BCS each test one of the condition code bits and branch only if that bit is set. For each of these, there is an inverse test, BNR, BZR, BCR, and BCR that branches only if the corresponding condition code bit is reset.

After a comparison using the CMP instruction, you should use BEQ to branch if the operands were equal, and BNE to branch if they were unequal. These are synonyms for BZS and BZR.

After CMP R1,R2, where r[1] and r[2] hold two's complement integers, use BGT, BGE, BLE or BLT to branch if r[1]>r[2], r[1]>r[2], r[1]<r[2], or r[1]<r[2], respectively. These do not just test the N and Z conditon codes, as you might first imagine, but rather, they take into account the V (overflow) condition code, because if V is set, the sign of the result after subtraction was wrong.

After CMP R1,R2, where r[1] and r[2] hold unsigned integers, use BGTU, BGEU, BLEU or BLTU to branch if r[1]>r[2], r[1]>r[2], r[1]<r[2], or r[1]<r[2], respectively. Two of these, BGTU and BLEU are actually machine instructions, while the other two, BGEU and BLTU are synonyms for BCS and BCR; why this works is best left for a later discussion, when we delve farther into the works of the arithmetic unit within the central processor.

Exercises

o) Give the Hawk machine code corresponding to CMPI R1,100

p) CMPI is the same as ADDI with the destination field set to zero. Why can't we convert ADDSI to some kind of CMPSI instruction for comparison with small constants between +7 and -8 by setting its destination field to zero?

q) Suppose you use the Hawk to compute the two's complement of one by using the SUB instruction to subtract one from zero. Recall that, on a two's complement computer such as the Hawk, subtraction is done by adding the one's complement of the subtrahend, with a carry of one added in to the least significant digit. Give the values you would expect this to put in the Hawk condition codes.

r) Suppose you use the Hawk to compute the two's complement of zero by using the SUB instruction to subtract zero from zero. Recall that, on a two's complement computer such as the Hawk, subtraction is done by adding the one's complement of the subtrahend, with a carry of one added in to the least significant digit. Give the values you would expect this to put in the Hawk condition codes.

Examples illustrating definite loops

These tools allow us to modify our example program to halt after a finite number of iterations:
The example, modified to contain a definite loop
START:  LOAD    R2,PSTACK       ; setup the stack

        LIS     R8,1            ; x = 1;    (using R8 for x)
        LIS     R9,'a'          ; ch - 'a'; (using R9 for x)

        LOAD    R1,PDSPINI
        JSRS    R1,R1           ; dspini();
LOOP:				; do {
        MOVE    R3,R8
        MOVE    R4,R8
        LOAD    R1,PDSPAT
        JSRS    R1,R1           ;     dspat(x,x);

        MOVE    R3,R9
        LOAD    R1,PDSPCH
        JSRS    R1,R1           ;     dspch(ch);

        ADDSI   R8,1            ;     x = x + 1;
        ADDSI   R9,1            ;     ch = ch + 1;

        CMPI    R8,9
        BLT     LOOP		; } while (x < 9)

	LOAD	R1,PEXIT
	JSRS	R1,R1		; exit();

This example illustrates what is known as a post-test loop. We could, if we wish, rewrite the loop so that the terminating condition is tested elsewhere. Consider the following rewrite of the loop body from the above program, showing the most general case of a loop exit:
The example, with the loop exit in mid loop
START:  LOAD    R2,PSTACK       ; setup the stack

        LIS     R8,1            ; x = 1;    (using R8 for x)
        LIS     R9,'a'          ; ch - 'a'; (using R9 for x)

        LOAD    R1,PDSPINI
        JSRS    R1,R1           ; dspini();
LOOP:				; while (TRUE) {
        MOVE    R3,R8
        MOVE    R4,R8
        LOAD    R1,PDSPAT
        JSRS    R1,R1           ;     dspat(x,x);

        MOVE    R3,R9
        LOAD    R1,PDSPCH
        JSRS    R1,R1           ;     dspch(ch);

        CMPI    R8,8
        BGE     ENDLOOP		;     if (x >= 8) break;

        ADDSI   R8,1            ;     x = x + 1;
        ADDSI   R9,1            ;     ch = ch + 1;

        BR      LOOP		; }
ENDLOOP:

	LOAD	R1,PEXIT
	JSRS	R1,R1		; exit();

In this example, the loop exit is written using the break statement from C. This translates to a conditional branch to the end of the loop, which we have marked, here, with the label ENDLOOP. Assembly language programmers are strongly urged to adopt naming conventions for labels where the label name hints at its function and its position relative to that function.

The labels LOOP and ENDLOOP are naturals for a program that contains only one loop, but we can use LP to mean loop, allowing us to construct labels such as FIXLP and ENDFIXLP to mark the start and end of a loop that fixes something. Similarly, we might use ELSE and ENDIF for the labels marking the start of the else clause and the end of an if statement, but more generally, we will have to add something to these to distinguish one such label from another, because most programs contain more than one if statement.

Exercises

s) Translate the following example program to Hawk assembly language:

int x = 0;
char ch = 'a';
dspini();
while (x < 9) {
    dspat( x, x );
    dspch( ch );
    x = x + 1;
    ch = ch + 1;
}

t) Write a Hawk program that produces the following output using a loop that iterates exactly 6 times, where each iteration outputs one letter 4 times, with appropriate calls to dspat(), as needed, to put the letters in the correct rows and columns.

	 ABCDEF
	A      A
	B      B
	C      C
	D      D
	E      E
	F      F
	 ABCDEF
	

u) Add a nested loop to the example program so that it produces the output:

        a
         bb
          ccc
           dddd
            eeeee
             ffffff
              ggggggg
               hhhhhhhh