Most programming languages (Standard Pascal is a major exception) allow large programs to be broken into separate compilation units. The idea is that, after a change, only the parts of the program that were changed need to be recompiled. In addition, pre-compiled libraries of code can be used.
The SMAL assembly language, like most assembly languages, allows large assembly language programs to be broken up into pieces that are assembled independently and then linked together later. Consider the following example program:
    S = main
    main:                   ; the main program
            JSR     R1,proc
            JMP     0       ; stop the program
    ; ----------------------
    proc:                   ; this procedure does nothing!
            JMPS    R1

Suppose we wanted to break this program into two separate pieces, perhaps because, in a realistic setting, it had grown large and unwieldy. The way we do this in Hawk assembly language is with the EXT and INT assembly directives, documented in the SMAL manual section on External and Internal Symbols. The obvious way to break up this code would be to put the main program in one file:
            TITLE   main.a, the main program
            S = main
    main:                   ; the main program
            LOAD    R1,procp
            JSRS    R1,R1   ; call proc
            JMP     0       ; stop the program

            EXT     proc
            ALIGN   4
    procp:  W       proc

Here, the EXT directive tells the assembler that the symbol "proc" is not defined in this source file; it is defined externally. As a result, the assembler includes notations in the object file main.o telling a piece of software called the linker to look for a definition in some other object file. Our second file will hold the definition:
            TITLE   proc.a, the procedure
            INT     proc
    ; ----------------------
    proc:                   ; this procedure does nothing!
            JMPS    R1

Here, the INT directive tells the assembler that the symbol "proc" is defined internally in this file, for use by other object files. As a result, the assembler includes notations in the object file telling the linker the value of the symbol proc.
Note the change in the way proc was called! This is because our assembler cannot compute a PC-relative address for proc when the value of proc is unknown at assembly time. Instead, the program is written with a word, procp, that holds a pointer to proc. The linker will fill in the value of this word when the two halves of the program are linked, and the simple JSR has been replaced with a LOAD followed by a JSRS to the address held in the loaded word.
Assembling the two files above would produce two object files, main.o and proc.o; separately, these are of no particular use! To use them, we must link them together into one object file. If you have used the C or C++ compilers on a UNIX system, you've probably found that you could list multiple source or object files on the command line for the compiler, for example:
    cc -o file file.c other.o -Llib

The cc command under UNIX is actually a driver program: it uses the C compiler to compile any files with names ending in .c, then takes the output of the compiler (files with names ending in .o) and uses the linker to combine all the .o files, plus the standard library and any other libraries requested, to make an output file (here named file, because of the -o option). The -Llib option adds the directory lib to the list of directories the linker searches for libraries.
We have a similar shell script, "link", that runs the SMAL linker to produce an executable object file. Thus, after assembling main.a and proc.a to make main.o and proc.o, we can link them using:
    link main.o proc.o

This shell script uses the Hawk linker to combine main.o and proc.o into a new object file, called link.o by default.
Where do the programs combined by the linker end up in memory? This is not a trivial question! It is possible to pin each of the programs down by explicitly setting the assembly origin in each, but this is not recommended! By default, if an assembly source file does not set the origin, the assembler leaves the origin unspecified in the object file, so that the linker can freely move the object code wherever it is needed in memory. This is called relocatable assembly.
The linker is then responsible for determining where in memory the program should be put. Our linker script appends all relocatable code together, starting at location #1000 in memory (this is in ROM). The term relocation refers to the linker's job of determining the final placement of programs in memory.
Note that relocation poses a problem! Consider this snippet of code:
    label:  W       label
            W       label + 4
            W       label + 8

This stores 3 words in memory, each holding its own address. If relocation were not involved, a value could be assigned to "label" at assembly time, and the expressions "label + 4" and "label + 8" could be evaluated by the assembler. Because we allow for relocation, the evaluation of these expressions has to be deferred.
A program may inhibit relocation by explicitly setting the assembly origin. Thus, if a program begins with:
            . = #1000

the program is not relocatable, and the assembler will be able to compute values for all expressions at assembly time. Such a program is described as an absolute program.
The assembler divides the universe of values and identifiers into two classes, absolute and relocatable. Absolute values are known at assembly time, and therefore, all operations are legal on them! Relocatable values are unknown at assembly time because they depend on work done by the linker. It is illegal to add two relocatable values, or to shift relocatable values or perform logical operations such as "and" and "or" on them. Any of these operations will result in a "misuse of relocation" error message.
It is legal to add absolute constants to relocatable values, and it is legal to take the difference of two relocatable values, so long as the difference is an absolute number that can be computed at assembly time. This is because all relocatable values are represented, inside the assembler, as the sum of some base value known to the linker plus a constant offset known at assembly time. When two relocatable values share the same base, the base cancels out of their difference, leaving only the difference of the offsets. The base values are:
The assembler's COMMON directive has the following syntax:
            COMMON  name,size

This declares name as an external symbol (just as if it had been declared with an EXT directive). It also tells the linker to allocate a block of size bytes of memory, and to define name as the label on the start of that block.
If two different separately assembled parts of a program contain COMMON declarations with the same name, they will reference the same block of storage. The size of the block is the size given in the first COMMON declaration for it that the linker encounters. Our linker is actually quite flexible about where COMMON blocks are put in memory, but we will ignore this flexibility and simply put all COMMON blocks in RAM, while all code goes in ROM.
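FORTRAN and C (and C++) all support essentially the same notion of COMMON storage. In FORTRAN, even the name is the same. In C, each external global variable has traditionally corresponded to a SMAL COMMON block, and the linkers that support C, FORTRAN and SMAL all do the same thing with these variables. (Recent versions of some C compilers, GCC among them, no longer treat uninitialized globals this way by default.)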
In C and FORTRAN, these variables that are shared between compilation units can be initialized (in C, by having one and exactly one compilation unit specify an initial value, in old versions of FORTRAN, by using a BLOCK DATA statement). In SMAL, we can specify initial values for a COMMON as follows:
            COMMON  v,vsize
    savelc  =       .
    .       =       v
            W       v1      ;\
            W       v2      ; > some initial values
            W       v3      ;/
    vsize   =       .-v
    .       =       savelc

Here, the variable v has been declared as being made of 3 consecutive words with values v1, v2 and v3, stored in RAM (because all COMMON blocks are stored in RAM). Instead of counting bytes manually, the above code lets the assembler count the bytes in these 3 words, computing this as the value of the assembly-time symbol vsize. The symbol savelc is used to save the value of the location counter while the common is being initialized, so that the remainder of the assembly will pick up where it was before the common was declared and initialized.
As an illustration of the utility of the SMAL linker, consider the monitor we have been using to catch illegal instruction and illegal memory reference traps. When this monitor intercepts a trap, it saves the registers and prints an error message on the screen, including the memory address that was referenced and the address of the instruction that caused the trap. To print this output, the monitor contains code to print strings and hexadecimal numbers on the screen. These are useful routines, and, using separate assembly, your programs can call them too!
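All monitor routines are linked through R1, and those that require space for auxiliary variables allocate it using R2 as a stack pointer. The following program uses two of them: DSPINI, which initializes the display, and DSPST, which outputs the zero-terminated string whose address is in R3.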
            TITLE   ---
            USE     "/group/22c018/hawk.macs"
            S       START           ; this is a main program

    ; linkage to other routines
            EXT     DSPINI,DSPST
    PDSPINI:W       DSPINI
    PDSPST: W       DSPST

    ; the stack needed for procedure calling
            COMMON  STACK,#1000
    STACKP: W       STACK

    ; code!
    START:  LOAD    R2,STACKP       ; stack is set up
            LOAD    R1,PDSPINI
            JSRS    R1,R1           ; init display
            LEA     R3,HELLO
            LOAD    R1,PDSPST
            JSRS    R1,R1           ; output string
            CLR     R1
            JMPS    R1              ; stop
    HELLO:  ASCII   "Hello World",0

The link command always links the monitor with your program, so, assuming the above program is stored in a file named hello.a, you can assemble and link it with the following commands, in sequence:
    smal hello.a
    link hello.o

This will leave executable object code in the linker's output file, link.o.