Most programming languages (Standard Pascal is a major exception) allow large programs to be broken into separate compilation units. The idea is that, when a program is changed, only the parts that were changed need to be recompiled. In addition, pre-compiled libraries of code can be used.
The SMAL assembly language, like most assembly languages, allows large assembly language programs to be broken into pieces that are assembled independently and then linked together later. Consider the following example program:
            S       main
    main:               ; the main program
            JSR     R1,proc
            CLR     R1
            JUMPS   R1      ; gracefully stop the program
    ; ----------------------
    proc:               ; this procedure does nothing!
            JUMPS   R1

Suppose we wanted to break this program into two separate pieces, perhaps because it has grown large and unwieldy. In SMAL, we do this with the EXT and INT assembly directives, documented in the SMAL manual section on External and Internal Symbols. The obvious way to break up this code is to put the main program in one file:
            TITLE   main.a, the main program
            EXT     proc
    Pproc:  W       proc
            S       main
    main:               ; the main program
            LOAD    R1,Pproc
            JSRS    R1,R1   ; call proc
            CLR     R1
            JUMPS   R1      ; gracefully stop the program
            END

Here, the EXT directive tells the assembler that the symbol "proc" is not defined in this source file; it is defined externally. As a result, the assembler records a note in the object file main.o. That note tells a piece of software called the linker to look for the definition of proc in some other object file. Our second file holds that definition:
            TITLE   proc.a, the procedure
            INT     proc
    ; ----------------------
    proc:               ; this procedure does nothing!
            JUMPS   R1
            END

Here, the INT directive tells the assembler that the symbol "proc" is defined internally in this file, for use by other object files. As a result, the assembler records the value of proc in the object file proc.o, where the linker can find it.
Note the change in the way proc is called! Our assembler cannot compute a PC-relative address for proc, because the value of proc is unknown at assembly time. Instead, main.a contains a word, Pproc, that holds a pointer to proc. The linker fills in the value of this word when the two halves of the program are linked. The simple JSR has been replaced by a LOAD of that word, followed by a JSRS to the address it holds.
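The linker's side of this arrangement can be sketched in Python. This is a toy model, not the real SMAL object format or linker. The class ObjectFile and the function link are hypothetical. Each object file is assumed to list two things:

- its INT definitions, as symbol-to-offset pairs;
- the words that must hold EXT symbols, as offset-to-symbol pairs.

The files are placed one after another starting at #1000, the origin our link script uses for relocatable code. The monitor that the script also links in is ignored. The file sizes are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical toy object file; not the real SMAL object code format."""
    name: str
    size: int                                   # bytes of assembled code
    ints: dict = field(default_factory=dict)    # INT symbol -> offset in this file
    exts: dict = field(default_factory=dict)    # offset of a word -> EXT symbol it holds


def link(objects, origin=0x1000):
    """Hypothetical linker: place files one after another from origin,
    then patch every word that refers to an EXT symbol."""
    bases, symbols, addr = {}, {}, origin
    # Pass 1: give each file a base address and collect its INT definitions.
    for obj in objects:
        bases[obj.name] = addr
        for sym, offset in obj.ints.items():
            if sym in symbols:
                raise ValueError(f"{sym} is defined more than once")
            symbols[sym] = addr + offset
        addr += obj.size
    # Pass 2: each word that names an EXT symbol receives that symbol's address.
    patches = {}
    for obj in objects:
        for offset, sym in obj.exts.items():
            if sym not in symbols:
                raise ValueError(f"{sym} is undefined")
            patches[bases[obj.name] + offset] = symbols[sym]
    return symbols, patches


# Sizes are made up; Pproc is assumed to be the first word of main.o.
main_o = ObjectFile("main.o", size=16, exts={0: "proc"})
proc_o = ObjectFile("proc.o", size=2, ints={"proc": 0})
symbols, patches = link([main_o, proc_o])
print(hex(symbols["proc"]))                           # 0x1010
print({hex(a): hex(v) for a, v in patches.items()})   # {'0x1000': '0x1010'}
```

In this model, the word Pproc at #1000 ends up holding #1010, the address where proc.o was placed. The LOAD in main.a then fetches that address at run time.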
Incidentally, as in C and C++, it is easy to create a header file that simplifies our formulation of the main program above. Consider the following header file:
    ; proc.h, the header file for users of proc
            EXT     proc
    Pproc:  W       proc
            END

Given this header file, the main program can be written as follows:
            TITLE   main.a, the main program
            USE     "/group/22c018/hawk.system"
            USE     "proc.h"
            S       main
    main:               ; the main program
            CALL    proc
            CLR     R1
            JUMPS   R1      ; gracefully stop the program
            END

Here, we have used the CALL macro defined in the file /group/22c018/hawk.system along with the linkage definitions given in the header file. Together, these allow compact and notationally clean assembly code to call external procedures, assuming that the header file defines a pointer for each procedure with a P prefix on the procedure name.
Assembling proc.a and either version of main.a given above produces two object files, main.o and proc.o; separately, these are of no particular use! To use them, we must link them together into one object file. If you have used the C or C++ compilers on a UNIX system, you have probably found that you can list multiple source or object files on the command line for the compiler, for example:
    cc -o file file.c other.o -lm

Under UNIX, the cc command is actually a driver that runs the C compiler on any files with names ending in .c. It then takes the output of the compiler (files with names ending in .o) and uses the linker to combine all the .o files into an output file. The linker also adds any libraries requested with -l options, here -lm, the math library.
We have a similar shell script, "link", that runs the SMAL linker to produce an executable object file. Thus, after assembling main.a and proc.a to make main.o and proc.o, we can link them using:
    link main.o proc.o

This shell script uses the Hawk linker to combine main.o and proc.o into a new object file, called link.o by default. Our link script also, by default, links a copy of the Hawk operating system into the output, so that your program may call any of a set of standard system routines for input/output and other operations.
Where do the programs combined by the linker end up in memory? This is not a trivial question! It is possible to pin each program down by explicitly setting the assembly origin in each, but this is not recommended! By default, if an assembly source file does not set the origin, the assembler leaves the origin unspecified in the object file. The linker is then free to place the object code wherever it is needed in memory. This is called relocatable assembly.
The linker is then responsible for deciding where in memory the program should go. Our link script places all relocatable code one piece after another, starting at location #1000 in memory (this is in ROM). The linkers for essentially all other machines on the market today work similarly, although the starting location of the linked output varies from machine to machine. Some linkers, including ours, can also produce relocatable output, but we do not use this feature.
The term relocation refers to moving the assembled code from one location in memory to another. In our case, we use the linker to determine the final placement of programs in memory, so we say that the linker relocates the assembled code to its final location.
Relocation poses a problem! Consider this snippet of code:
    label:  W       label
            W       label + 4
            W       label + 8

This stores 3 words in memory, each holding its own address. If relocation were not involved, the assembler could assign a value to "label" at assembly time and evaluate the expressions "label + 4" and "label + 8" itself. Because we allow for relocation, the evaluation of these expressions must be deferred until link time, when the linker decides where label goes.
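The deferred evaluation can be sketched in Python under a simple assumed scheme; this is not SMAL's actual object format. The assembler emits each word as its offset from the start of the file, plus a flag marking it as relocatable. The linker adds the chosen base address to every flagged word. The functions assemble_label_example and relocate are hypothetical. The sketch also assumes that label is at the very start of the file and that each W directive emits 4 bytes, matching the label + 4 spacing above.

```python
WORD = 4  # bytes per W directive, matching the "label + 4" spacing above


def assemble_label_example():
    """Hypothetical assembler output for the three W directives.
    label is assumed to be at offset 0 of a relocatable file;
    each word is recorded as (offset, is_relocatable)."""
    label = 0
    return [(label, True), (label + WORD, True), (label + 2 * WORD, True)]


def relocate(words, base):
    """Hypothetical linker step: add the chosen base to every relocatable word."""
    return [value + base if relocatable else value for value, relocatable in words]


words = assemble_label_example()
print([hex(w) for w in relocate(words, 0x1000)])  # ['0x1000', '0x1004', '0x1008']
print([hex(w) for w in relocate(words, 0x2340)])  # ['0x2340', '0x2344', '0x2348']
```

Wherever the linker places the file, each word still ends up holding its own address. Only the linker knows the base, so only the linker can finish the computation.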
A program may inhibit relocation by explicitly setting the assembly origin. Thus, if a program begins with:
            . = #1000

then the program is not relocatable, and the assembler can compute values for all expressions at assembly time. Such a program is called an absolute program.
The assembler divides the universe of values and identifiers into two classes, absolute and relocatable. Absolute values are known at assembly time, so all operations are legal on them! Relocatable values are unknown at assembly time because they depend on work done by the linker. It is illegal to add two relocatable values, to shift relocatable values, or to perform logical operations such as and and or on them. Any of these operations results in a "misuse of relocation" error message.
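These rules can be sketched in Python by treating every value as a pair: an offset known at assembly time, and an optional base known only to the linker. An absolute value is one with no base. The classes Value and RelocationError, and the base name "this file", are hypothetical; this is a model of the rules, not the assembler's code.

The model also covers the legal cases. Adding an absolute value to a relocatable one is allowed. Subtracting two values with the same base is allowed, because the bases cancel and leave an absolute value. Only and is modeled among the logical operations.

```python
from dataclasses import dataclass
from typing import Optional


class RelocationError(Exception):
    """Hypothetical stand-in for the assembler's "misuse of relocation" message."""


@dataclass(frozen=True)
class Value:
    """Hypothetical model of an assembly-time value."""
    offset: int                  # the part known at assembly time
    base: Optional[str] = None   # base known only to the linker; None means absolute

    def __add__(self, other):
        # Legal unless both operands are relocatable.
        if self.base is not None and other.base is not None:
            raise RelocationError("misuse of relocation: sum of two relocatable values")
        base = self.base if self.base is not None else other.base
        return Value(self.offset + other.offset, base)

    def __sub__(self, other):
        if other.base is None:           # anything minus an absolute keeps its base
            return Value(self.offset - other.offset, self.base)
        if self.base == other.base:      # equal bases cancel, leaving an absolute value
            return Value(self.offset - other.offset)
        raise RelocationError("misuse of relocation: difference is not absolute")

    def __and__(self, other):
        # Logical operations (and shifts, not modeled here) need absolute operands.
        if self.base is not None or other.base is not None:
            raise RelocationError("misuse of relocation: logical operation")
        return Value(self.offset & other.offset)


label = Value(0, "this file")      # a label at the start of a relocatable file
print(label + Value(8))            # Value(offset=8, base='this file')
print((label + Value(8)) - label)  # Value(offset=8, base=None)
```

Trying label + label, or label & Value(3), raises RelocationError in this model. This mirrors the assembler rejecting those expressions.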
It is legal to add absolute constants to relocatable values, and it is legal to take the difference of two relocatable values, so long as the difference is an absolute number that can be computed at assembly time. This is because all relocatable values are represented, inside the assembler, as the sum of some base value known only to the linker plus a constant offset known at assembly time. The base values are:
The assembler's COMMON directive has the following syntax:
            COMMON  name,size

This declares name as an external symbol (just as if it had been declared with an EXT directive). It also tells the linker to allocate a block of size bytes of memory and to define name as the label on the start of that block.
If two separately assembled parts of a program contain COMMON declarations with the same name, they refer to the same block of storage. The size of the block is the size specified by the first object file the linker finds that references it. Our linker is actually quite flexible about where COMMON blocks go in memory. Our link script ignores this flexibility and simply puts all COMMON blocks in RAM, while all code goes in ROM.
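The COMMON allocation rule can be sketched in Python. The function allocate_commons is hypothetical. The RAM starting address 0x10000 is an assumption for illustration, not the Hawk's actual RAM layout, and any alignment the real linker might apply is ignored. The sketch shows the two behaviors described above: declarations with the same name share one block, and the first size the linker sees is the one used.

```python
def allocate_commons(declarations, ram_start):
    """Hypothetical COMMON allocator.
    declarations: (object file, name, size) tuples in the order the linker
    reads them. Returns name -> (address, size); the first size seen wins."""
    blocks, next_free = {}, ram_start
    for _obj, name, size in declarations:
        if name in blocks:
            continue                     # later declarations share the existing block
        blocks[name] = (next_free, size)
        next_free += size                # the next block follows this one in RAM
    return blocks


decls = [
    ("main.o", "STACK", 0x1000),
    ("main.o", "v", 12),
    ("proc.o", "STACK", 0x100),          # same name: same block, this size is ignored
]
blocks = allocate_commons(decls, ram_start=0x10000)  # 0x10000 is an assumed RAM start
for name, (addr, size) in blocks.items():
    print(name, hex(addr), size)
# STACK 0x10000 4096
# v 0x11000 12
```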
FORTRAN, C, and C++ all support essentially the same notion of COMMON storage. In FORTRAN, even the name is the same. In C, each uninitialized external global variable traditionally corresponds to a SMAL COMMON. The linkers that support C, FORTRAN, and SMAL all do the same thing with these variables.
In C and FORTRAN, these variables shared between compilation units can be initialized. In C, exactly one compilation unit specifies an initial value; in old versions of FORTRAN, a BLOCK DATA statement is used. In SMAL, we can specify initial values for a COMMON as follows:
            COMMON  v,vsize
    savelc  =       .
    .       =       v
            W       v1      ;\
            W       v2      ; > some initial values
            W       v3      ;/
    vsize   =       .-v
    .       =       savelc

Here, the variable v is declared as 3 consecutive words with the values v1, v2 and v3. The variable v will be stored in RAM because all COMMON blocks are stored in RAM. Instead of counting bytes manually, the code above lets the assembler count the bytes in these 3 words, giving the result as the value of the assembly-time symbol vsize. The symbol savelc saves the value of the location counter while the common is being initialized. This lets the rest of the assembly pick up where it left off before the common was declared and initialized.
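The location-counter bookkeeping above can be traced in Python. The class Assembler and the function init_common are hypothetical, and only the W directive is modeled, at 4 bytes per word. For illustration, v is given the concrete address 0x10000. In real SMAL, v is a relocatable external symbol whose address is unknown at assembly time. The expression .-v is still absolute, because both operands have the same base. The values 1, 2 and 3 stand in for v1, v2 and v3.

```python
class Assembler:
    """Hypothetical toy model of SMAL's location counter '.'; only W is supported."""

    WORD = 4  # bytes emitted by each W directive

    def __init__(self, origin):
        self.dot = origin   # the location counter
        self.memory = {}    # address -> word emitted there

    def w(self, value):
        """Emit one word at '.' and advance '.' past it."""
        self.memory[self.dot] = value
        self.dot += self.WORD


def init_common(asm, v, initial_values):
    """Hypothetical replay of the SMAL lines above, one step per line."""
    savelc = asm.dot              # savelc = .
    asm.dot = v                   # .      = v
    for value in initial_values:
        asm.w(value)              # W v1, W v2, W v3
    vsize = asm.dot - v           # vsize  = .-v
    asm.dot = savelc              # .      = savelc
    return vsize


asm = Assembler(origin=0x40)      # pretend 0x40 bytes of code came first
vsize = init_common(asm, v=0x10000, initial_values=[1, 2, 3])
print(vsize, hex(asm.dot))        # 12 0x40
```

The assembler ends up with vsize equal to 3 × 4 = 12 bytes, and the location counter is back where it was before the common was initialized.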
As an illustration of the utility of the SMAL linker, consider the monitor we have been using to catch illegal instruction and illegal memory reference traps. When this monitor intercepts a trap, it saves the registers and prints an error message on the screen. The message includes the memory address that was referenced and the address of the instruction that caused the trap. To print this output, the monitor contains routines to print strings and hexadecimal numbers on the screen. These routines are useful in their own right, and with separate assembly, your programs can call them too!
All monitor routines are linked through R1, and those that require space for auxiliary variables allocate it using R2 as a stack pointer.
            TITLE   ---
            USE     "/group/22c018/hawk.macs"
            S       START   ; this is a main program
    ; linkage to other routines
            EXT     DSPINI,DSPST
    PDSPINI:W       DSPINI
    PDSPST: W       DSPST
    ; the stack needed for procedure calling
            COMMON  STACK,#1000
    STACKP: W       STACK
    ; code!
    START:  LOAD    R2,STACKP       ; set up the stack pointer
            LOAD    R1,PDSPINI
            JSRS    R1,R1           ; initialize the display
            LEA     R3,HELLO
            LOAD    R1,PDSPST
            JSRS    R1,R1           ; output the string
            CLR     R1
            JUMPS   R1              ; stop
    HELLO:  ASCII   "Hello World",0
            END

When you use the link command to invoke the SMAL linker, it always links the monitor with your program. So, to assemble and link the program above, assuming it is stored in a file named hello.a, just use the following commands, in sequence:
    smal hello.a
    link hello.o

This leaves executable object code in the linker's output file, link.o.