22C:18, Lecture 15, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

Separate Assembly

Most programming languages (Standard Pascal is a major exception) allow large programs to be broken into separate compilation units. The idea is that, when making changes, only those parts of the program that have been changed need to be recompiled with each change. In addition, pre-compiled libraries of code can be used.

The SMAL assembly language, like most assembly languages, allows large assembly language programs to be broken up into pieces that are assembled independently and then linked together later. Consider the following example program:

		S	main
	main:	; the main program
		JSR	R1,proc
		CLR	R1
		JUMPS	R1	; gracefully stop the program
	
	; ----------------------
	proc:	; this procedure does nothing!
		JUMPS	R1
Suppose we wanted to break this program into two separate pieces, perhaps because it has grown too large and unwieldy. In SMAL, we do this with the EXT and INT assembly directives, documented in the SMAL manual section on External and Internal Symbols. The obvious way to break up this code would be to put the main program in one file:
		TITLE	main.a, the main program

		EXT	proc
	Pproc:	W	proc

		S	main

	main:	; the main program
		LOAD	R1,Pproc
		JSRS	R1,R1	; call proc
		CLR	R1
		JUMPS	R1	; gracefully stop the program

		END
Here, the EXT directive tells the assembler that the symbol "proc" is not defined in this source file; it is defined externally. As a result, the assembler includes notations in the object file main.o telling a piece of software called the linker to look for a definition in some other object file. Our second file will hold the definition:
		TITLE	proc.a, the procedure
		INT	proc
	; ----------------------
	proc:	; this procedure does nothing!
		JUMPS	R1
Here, the INT directive tells the assembler that the symbol "proc" is defined internally in this file, for use by other object files. As a result, the assembler includes notations in the object file telling the linker the value of the symbol proc.

Note the change in the way proc is called! This is because our assembler cannot compute a PC-relative address for proc when the value of proc is unknown at assembly time. Instead, the program is written with a word, Pproc, that holds a pointer to proc. The linker fills in the value of this word when the two halves of the program are linked, and the simple JSR has been replaced with a LOAD followed by a JSRS to the address held in the loaded word.
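
The linker's part in this can be pictured with a short Python sketch. It is only a model under stated assumptions: the dictionary layout, the base addresses and the helper name resolve are all hypothetical, and none of this reflects the real SMAL object file format. Each object lists the symbols it exports (as INT does) and the words that must be filled in with the value of an external symbol (as the word at Pproc must).

```python
# Toy model of symbol resolution at link time (illustrative only).

def resolve(objects):
    """Return {address: value} for every word that refers to an external symbol."""
    # Collect the value of every exported (INT) symbol.
    symbols = {}
    for obj in objects:
        for name, offset in obj["int"].items():
            symbols[name] = obj["base"] + offset

    # Fill in each word that was left as a reference to an EXT symbol.
    patches = {}
    for obj in objects:
        for offset, name in obj["ext_refs"].items():
            if name not in symbols:
                raise KeyError("undefined external symbol: " + name)
            patches[obj["base"] + offset] = symbols[name]
    return patches

# Hypothetical placement: main.o at 0x1000, proc.o at 0x1010.
main_o = {"base": 0x1000, "int": {}, "ext_refs": {0: "proc"}}  # word 0 is Pproc
proc_o = {"base": 0x1010, "int": {"proc": 0}, "ext_refs": {}}

patches = resolve([main_o, proc_o])
```

In this model the word at Pproc ends up holding the address of proc, which is what the LOAD and JSRS pair relies on.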

Incidentally, as in C and C++, it is easy to create a header file that simplifies our formulation of the main program above. Consider the following header file:

		; proc.h, the header file for users of proc

		EXT	proc
	Pproc:	W	proc

		END
Given this header file, the main program can be written as follows:
		TITLE	main.a, the main program

		USE	"/group/22c018/hawk.system"
		USE	"proc.h"

		S	main

	main:	; the main program
		CALL	proc
		CLR	R1
		JUMPS	R1	; gracefully stop the program

		END
Here, we have used the CALL macro defined in the file /group/22c018/hawk.system along with the linkage definitions given in the header file. Together, these allow compact and notationally clean assembly code to call external procedures, assuming that the header file defines a pointer for each procedure with a P prefix on the procedure name.

Assembling proc.a and either version of main.a given above would produce two object files, main.o and proc.o; these, separately, are of no particular use! To use them, we must link them together into one object file. If you have used the C or C++ compilers on a UNIX system, you've probably found that you could list multiple source or object files on the command line for the compiler, for example:

	cc -o file file.c other.o -llib
The cc command under UNIX is actually a complicated shell script that uses the C compiler to compile any files with names ending in .c, then takes the output of the compiler (a file with a name ending in .o) and uses the linker to combine all the .o files, plus the indicated library, to make an output file.

We have a similar shell script, "link", that runs the SMAL linker to produce an executable object file. Thus, after assembling main.a and proc.a to make main.o and proc.o, we can link them using:

	link main.o proc.o
This shell script uses the SMAL linker to combine main.o and proc.o into a new object file, called link.o by default. Our link script also, by default, links a copy of the Hawk operating system into the output, so that your program may call any of a set of standard system routines for input/output and other operations.

Relocation

Where do the programs combined by the linker end up in memory? This is not a trivial question! It is possible to pin each of the programs down by explicitly setting the assembly origin in each, but this is not recommended! By default, if an assembly source file does not set the origin, the assembler leaves the origin unspecified in the object file, so that the linker can freely move the object file wherever it is needed in memory. This is called relocatable assembly.

The linker is then responsible for determining where in memory the program should be put. Our linker script appends all relocatable code together, starting at location #1000 in memory (this is in ROM). The linkers for essentially all other machines on the market today operate similarly, although the particular starting location of the linked output varies from machine to machine, and some linkers (including ours) can produce relocatable output, although we are not using this feature.
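
As a rough illustration of this placement step, the following Python sketch assigns a base address to each object file in turn, starting at hexadecimal 1000 (written #1000 in SMAL and 0x1000 in Python). The function name place and the object sizes are hypothetical, and the sketch assumes no alignment padding between files.

```python
def place(sizes, origin=0x1000):
    """Return the base address assigned to each object file, in link order."""
    bases = {}
    addr = origin
    for name, size in sizes:
        bases[name] = addr      # this file starts where the previous one ended
        addr += size
    return bases

# Hypothetical sizes, in bytes, for the two object files.
bases = place([("main.o", 16), ("proc.o", 2)])
```

With these made-up sizes, main.o is placed at the origin and proc.o immediately after it.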

The term relocation refers to moving the assembled code from one location in memory to another. In our case, we use the linker to determine the final placement of programs in memory, so we say that the linker relocates the assembled code to its final location.

Relocation poses a problem! Consider this snippet of code:

	label:	W	label
		W	label + 4
		W	label + 8
This stores 3 words in memory, each holding its own address. If relocation were not involved, a value could be assigned to "label" at assembly time, and the expressions "label + 4" and "label + 8" could be evaluated by the assembler. Because we allow for relocation, the evaluation of these expressions has to be deferred.
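
To make the deferred evaluation concrete, here is a minimal Python sketch. It assumes the assembler records each word as an offset together with a flag saying whether the linker must add the relocation base; that encoding and the name relocate are assumptions made for illustration, not the actual SMAL object format.

```python
def relocate(words, base):
    """Add the relocation base to every word flagged as relocatable."""
    return [value + base if is_reloc else value for is_reloc, value in words]

# The three words above, assembled with label at relative address 0.
assembled = [(True, 0), (True, 4), (True, 8)]

# The linker later decides that this code starts at 0x1000.
linked = relocate(assembled, 0x1000)
```

After relocation, each of the three words holds its own final address, which is what the source code asked for.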

A program may inhibit relocation by explicitly setting the assembly origin. Thus, if a program begins with:

	. = #1000
then the program is not relocatable, and the assembler will be able to compute values for all expressions at assembly time. Such a program is described as an absolute program.

The assembler divides the universe of values and identifiers into two classes, absolute and relocatable. Absolute values are known at assembly time, and therefore all operations are legal on them! Relocatable values are unknown at assembly time because they depend on work done by the linker. It is illegal to add two relocatable values, or to shift or perform logical operations such as "and" and "or" on relocatable values. Any of these operations will result in a "misuse of relocation" error message.
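
One way to picture these rules is to model each value as a known offset plus an optional unknown base. The Python class below is a sketch under that assumption; the class name, the base label "this file" and the use of ValueError are illustrative choices, not a description of how SMAL is implemented. Subtracting two values with the same base is allowed in the sketch because the unknown base cancels, leaving an absolute result.

```python
class Value:
    """An assembly-time value: a known offset plus an optional unknown base."""

    def __init__(self, offset, base=None):
        self.offset = offset
        self.base = base          # None marks an absolute value

    def __add__(self, other):
        if self.base is not None and other.base is not None:
            raise ValueError("misuse of relocation")
        base = self.base if self.base is not None else other.base
        return Value(self.offset + other.offset, base)

    def __sub__(self, other):
        if other.base is None:                 # relocatable or absolute, minus absolute
            return Value(self.offset - other.offset, self.base)
        if self.base == other.base:            # same base: the base cancels
            return Value(self.offset - other.offset)
        raise ValueError("misuse of relocation")

    def __and__(self, other):
        if self.base is not None or other.base is not None:
            raise ValueError("misuse of relocation")
        return Value(self.offset & other.offset)


label = Value(0, base="this file")
second = label + Value(4)              # still relocatable
gap = (label + Value(8)) - label       # absolute
```

Here label + label or label & Value(3) would raise the error, mirroring the assembler's complaint.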

It is legal to add absolute constants to relocatable values, and it is legal to take the difference of two relocatable values, so long as the difference is an absolute number that can be computed at assembly time. This is because all relocatable values are represented, inside the assembler, as the sum of some base value known only to the linker plus a constant offset known at assembly time. The base values are:

Commons

The assembler's COMMON directive has the following syntax:

	COMMON	name,size
This declares name as an external symbol (just as if it had been declared with an EXT directive). It also tells the linker to allocate a block of size bytes of memory, and to define name as the label on the start of that block.

If two different separately assembled parts of a program contain COMMON declarations with the same name, they will reference the same block of storage. The size of the block will be the size specified by the first program the linker finds that references that block. Our linker is actually quite flexible about where COMMON blocks should be put in memory, but we will ignore most of this flexibility and use it only to put all COMMON blocks in RAM, while all code goes in ROM.
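
The sharing rule can be sketched in a few lines of Python. This is a model only: the function name allocate_commons, the object names and the RAM starting address 0x10000 are hypothetical, since the actual RAM address used by our link script is not given here.

```python
def allocate_commons(objects, ram_origin):
    """Map each COMMON name to (address, size); the first declaration sets the size."""
    commons = {}
    addr = ram_origin
    for obj in objects:
        for name, size in obj["commons"]:
            if name not in commons:        # later declarations share the same block
                commons[name] = (addr, size)
                addr += size
    return commons

a_o = {"commons": [("STACK", 0x1000)]}
b_o = {"commons": [("STACK", 0x800), ("v", 12)]}

commons = allocate_commons([a_o, b_o], ram_origin=0x10000)
```

Both files end up referring to one STACK block, sized by the first file the linker sees, and v is placed after it.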

FORTRAN and C (and C++) all support exactly the same notion of COMMON storage. In FORTRAN, even the name is the same. In C, each external global variable corresponds to a SMAL COMMON. The linkers that support C, FORTRAN and SMAL all do the same thing with these variables.

In C and FORTRAN, these variables that are shared between compilation units can be initialized (in C, by having exactly one compilation unit specify an initial value; in old versions of FORTRAN, by using a BLOCK DATA statement). In SMAL, we can specify initial values for a COMMON as follows:

		COMMON	v,vsize
	savelc	=	.
	.	=	v
		W	v1	;\
		W	v2	; > some initial values
		W	v3	;/
	vsize	=	.-v
	.	=	savelc
Here, the variable v has been declared as being made of 3 consecutive words with the values v1, v2 and v3. The variable v will be stored in RAM because all COMMON blocks are stored in RAM. Instead of counting bytes manually, the above code lets the assembler count the bytes in these 3 words, computing this as the value of the assembly-time symbol vsize. The symbol savelc is used to save the value of the location counter while the common is being initialized, so that the remainder of the assembly will pick up where it left off before the common was declared and initialized.

The Hawk System Library

As an illustration of the utility of the SMAL linker, consider the monitor we have been using to catch illegal instruction and illegal memory reference traps. When this monitor intercepts a trap, it saves the registers and prints out an error message on the screen, including the values of the memory address that was referenced and the address of the instruction that caused the trap. To print this output, the monitor contains code to print strings on the screen and to print hexadecimal numbers on the screen. These are useful routines, and, using separate assembly, you can use them!

All monitor routines are linked through R1, and those that require space for auxiliary variables allocate it using R2 as a stack pointer.

DSPINI - display initialize
Initializes the display, using R3 and R4. This routine should be called before any other monitor routines that generate output.

DSPAT - move to X=R3, Y=R4
Moves the output cursor to row Y, column X, using R3 to R7. (0,0) is the upper left corner of the display, and coordinates are measured in units of one character.

DSPCH - output character in R3
Output the character, using R4 and R5. The character appears at the current position on the screen, and the current position is advanced one column. If the end of line is reached, the position is advanced to the next line.

DSPST - output string pointed to by R3
Output the null terminated string, using R3 to R7. Characters are output using DSPCH.

DSPHX - output R3 in hex
Converts R3 to hex and outputs all 8 hex digits, using R3 to R7. Characters are output using DSPCH.

TIMES - multiply R3 by R4, with result in R3.
Produces only the low 32 bits of the 64 bit product, using R4 to R6.
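
The arithmetic behind TIMES and DSPHX can be checked with a small Python sketch. The function names times and hex8 are hypothetical stand-ins, values are treated as unsigned 32-bit quantities, and the use of upper-case hex digits is an assumption; the monitor's actual digit case is not stated here.

```python
MASK32 = 0xFFFFFFFF

def times(r3, r4):
    """Low 32 bits of the 64-bit product, as TIMES leaves in R3."""
    return (r3 * r4) & MASK32

def hex8(r3):
    """The 8 hex digits DSPHX would output for a 32-bit value."""
    return format(r3 & MASK32, "08X")

overflowed = times(0x10000, 0x10000)   # the true product needs 33 bits
digits = hex8(255)
```

The first call shows why only the low 32 bits matter: the true product does not fit in one register, so the high bits are simply lost.
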
Typically, a program that uses these routines will begin by allocating a stack, and then it will call the output initialization routine. The most convenient place to allocate the stack is in a common, for example, as shown in the following program "Hello World":
		TITLE	---
		USE	"/group/22c018/hawk.macs"

		S	START	; this is a main program

	; linkage to other routines	
		EXT	DSPINI,DSPST
	PDSPINI:W	DSPINI
	PDSPST:	W	DSPST

	; the stack needed for procedure calling
		COMMON	STACK,#1000
	STACKP:	W	STACK

	; code!
	START:	LOAD	R2,STACKP	; stack is setup
		LOAD	R1,PDSPINI
		JSRS	R1,R1		; init display
		LEA	R3,HELLO
		LOAD	R1,PDSPST
		JSRS	R1,R1		; output string
		CLR	R1
		JUMPS	R1		; stop

	HELLO:	ASCII	"Hello World",0
When you use the link command to invoke the SMAL linker, it always links the monitor with your program, so, to assemble and link the above, assuming it is stored in a file named hello.a, just use the following commands, in sequence:
	smal hello.a
	link hello.o
This will leave executable object code in the linker's output file, link.o.