Lecture 7, Object Codes, Loaders and Linkers

Final steps on the road to machine code

Part of the notes for 22C:112, Operating Systems
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

The Problem

So called load-and-go assemblers and compilers load their output directly into main memory for execution, but this is generally not desirable. The most elementary objecto to load-and-go assemblers and compilers is that they cannot be used to reassemble or recompile themselves! You do not make arbitrary changes to a program while it is running!

If a language processor requires m bytes of memory and the largest program to be run requires n bytes, the machine must have m+n bytes of main memory to support a load-and-go assembler or compiler. If the output of the language processor is saved to disk and then re-loaded for execution, we need only the larger of m and n bytes, not the sum.

This argument from necessity is not the only justification for storing the output of the language processor in secondary memory. Compiling or assembling a program may take time, while users expect no delay at all between their request for a program and its execution. Storing the output of the language processor eliminates the need to wait for compilation or assembly each time a program is run. In addition, if programmers distribute pre-compiled or pre-assembled code, it is difficult for users to tinker with applications, thus improving application security and reliability as well as improving the programmer's own job security.

When a program has been assembled or compiled into a file of some kind, the file must first be copied into main memory before the program can run. This introduces important new terms: Because The saved copy, the object code, cannot be run until it is moved to main memory as executable code or machine code. The purpose of an assembler or compiler is therefore to translate source code into object code.

The program which copies the object code into main memory is called a loader. Loaders can be viewed as translating object code into machine code. Figure 7.1 describes the resulting view of the translation process and the different views of the meaning of a program as it is being translated (this is a modification of Figure 2.5).

                -------------
               | Source Text |
                -------------
                   /      \      assembler's
    programmer's  /        \   view of meaning
      view of    /       -------------
      meaning   /       | Object Code |
               /         -------------
              /                    \   loader's view
             /                      \    of meaning
     ------------------           --------------
    | Abstract Meaning | ------- | Machine Code |
     ------------------           --------------
                        hardware's
                     view of meaning

Figure 7.1. Views of the meaning of a program.

Object Code Formats

The simplest object code format is a simple bit for bit copy of the machine language program itself, called a memory image of the machine language. Consider a machine with, for example, 64K bytes of main memory, of which 1K bytes are reserved for the loader, then each object file would be exactly 63K bytes long, and the loader would be trivial.

Use of this object file format would waste a large amount of secondary memory if the average program size is much smaller than the maximum, and it makes it difficult to contemplating expanding the system or moving object code files from one system to another unless the two have identical memory configurations. Furthermore, it is hard to extend this scheme to support the linking of subroutine libraries or relocation (topics to be discussed later in this section).

A second simple object code format consists of a sequence of address-value pairs. This may be produced by replacing each statement in the assembler which assigns a value to a location in memory, for example M[a]:=v, with a write statement placing the address and value in the object file. If a textual file format is used, the addresses and values would typically be written in octal or hexadecimal to simplify the conversion to binary required in the loader.

In fact, this second code format differs only slightly from the object code field of the assembly listing format used in many of the figures in Chapter 4 and 5, where the second two columns in each output listing gave the memory locations modified and the values placed there. Of course, an assembly listing is not a particularly efficient object code format, since it is designed for convenient human readability. With disk space costing as little as it does, the question of efficiency is not dominated by the cost of storing the object file, but rather, the complexity of the loader and the extra computations done in the loader during reading such an object file. Nobody likes systems where loading a program takes a long time!

The use of a textual object code format for a small assembly language fragment is illustrated in Figure 7.2.

Assembly Code
    . = 3
      B 6
      W 3
    . = 7
      W 1
Assembly Listing
     1             |    . = 3
     2 0003: 06    |      B 6
     3 0004: 0003  |      W 3
     4             |    . = 7
     5 0007: 0001  |      W 1
Object Code
0003 06
0004 03
0005 00
0007 01
0008 00

Figure 7.2. A simple object code format.

A loader for this object format is relatively easy to write. In standard C, for example, it might be written as is shown in Figure 7.3.

#include <stdio.h>
main()
{
    for (;;) {
        int loc,val;
        fscanf( stdin, "%x %x", &loc, &val );
        *((char *)loc) = val;
    }
}

Figure 7.3. A loader for the object code format in Figure 7.2

The loader shown in Figure 7.3 is not particularly desirable because it rests on top of the standard C libarary routine fscanf() from <stdio.h>. This routine is relatively large, containing all the code necessary to handle 14 different input formats ranging from character input to floating point!

The assignment *((char*)loc)=val in this code illustrates a fundamental aspect of all loaders! The loader must be able to read an integer, loc from the object file and use it as a memory address. The type-casting prefix operator (char*) converts an integer to a pointer to an object of type char. If the address from the object file cannot be loaded by the loader, for example, if it is an address in ROM or an empty memory socket, this assignment will be an error, but depending on the hardware, this error may or may not be detected. If the address can be loaded, the contents of the variable val will be stored in the indicated memory location.

Although functional, this simple object code is expensive. Assuming a binary representation for both addresses and data, each byte loaded in memory will carry with it two bytes of address. Furthermore, because this is a textual object code, each hexadecimal digit is stored as one byte, and we have used spaces and newlines to separate successive fields items in the code. So, our total cost for one byte of machine code is 8 bytes of object code! As a result, a program that fills 64K of RAM will have an object file that fills 512K of disk space! This is hardly desirable!

On some machines, these economic arguments work out quite differently. For example, on the CDC 6600, introduced in 1965 and usually considered to be the first supercomputer, the smallest addressable unit of memory was one 60 bit word, while addresses were limited to 18 bits. In this case, still using hexadecimal for the values and still using 8-bit characters, this object code would use 22 characters to encode each 60 bit word, so the object file would be about 3 times the size of the machine code instead of the factor of 8 for our 16 bit running example.

The usual path to more efficient object encodings makes use of the fact that most of the values to be loaded in memory will be placed in consecutive memory addresses, since explicit manipulation of the assembly location counter is generally quite rare! This suggests the use of addresses followed by sequences of values in the object code. Although this is a simple idea, it introduces a new problem: How can the address heading a new sequence of values be distinguished from a value continuing the previous sequence? There are two basic classes of solutions to this problem, blocking and tagging.

In a blocked object code, each sequence of values to be loaded in consecutive addresses is placed in a separate block, along with the address of the first item in the sequence. In a tagged object code, values and addresses are differentiated by a tag (typically one bit); if data in the tagged object code is divided into blocks, this division is only for the sake of convenient storage and does not determine which entries in the block represent addresses and which represent values.

Blocked object codes are frequently used in systems where the storage medium used for the object code has a natural record structure. Many classic storage media have this characteristic. With punched cards, for example, each card could represent a block of data to be loaded, with the address encoded in the first few columns of holes on the card, followed by the data. With classic half-inch wide 9-track magnetic tape, each block of data on the tape could begin with a memory address, with the remainder of the block containing data to be stored in memory.

A common blocked object code still in widespread use is called the Intel Hex Code. This code was developed by Intel for the first generation of 8-bit microprocessors and read-only memory chips, and it is still widely used as an object format for the kind of 8-bit and 16-bit processors used as embedded microcontrollers.

This code was originally intended for use with paper tape. In Intel's early catalog of Read-Only Memory (ROM) products, the standard way to specify the contents of a ROM was to send Intel the paper tape, in Intel Hex format, and let them manufacture the chips holding the data you specify. This was before field programmable ROM chips (known as FPROM) or Erasable/Programmable ROM chips, EPROM. Today, this usage continues. For example, it is possible to order pre-programmed microcontrollers from Microchip or Motorola, two major manufacturers of such chips, by E-mailing, in Intel Hex format, the code you want loaded into the ROM of the microcontroller.

Intel Hex code is a textual code, with each line holding one block of load data, where all of the information on the line is encoded in hexadecimal. Each line must begin with a colon, followed by a 2-digit count of the number of data bytes on the line, a 4-digit address where the data should be stored, and a 2-digit block-type indicator; type zero is loadable data, and type 1 is used for an end of file marker. Following the type marker come the data bytes, and then a 2-digit checksum is always the last item on the line. The checksum serves to make it possible to detect malformed object files; in the case of Intel Hex format, the checksum is the two's complement of the least significant 8 bits of the sum of all of the other bytes on the line, including the byte count, the two bytes of the 16 bit address, and type. This object code is demonstrated by the example in Figure 7.4.

Assembly Listing
1                    | . = #0100
2  0100: 05          |   B 5
3  0101: 0F 00       |   W 15
4  0103: FF 80       |   W #80FF
5                    | . = #0008
6  0008: FF          |   B #FF
7  0009: 07 00       |   W #0007
Intel Hex Code
:05010000050F00FF8067
:03000800FF0700EF
:00000001FF
Decoding of the first line of this code:
: 05 0100 00 05 0F 00 FF 80 67
   |   |   |  |  |  |  |  |  |_ checksum
   |   |   |  |  |  |  |  |
   |   |   |  |__|__|__|__|_ 5 data bytes
   |   |   |
   |   |   |____________ type
   |   |
   |   |_____________ starting load address
   |
   |______________ byte count

Figure 7.4. The Intel hex object code format.

This is not a very efficient code, since it includes redundant information and because it uses a textual representation. The redundancy, of course, is useful, since it allows detection of corrupt files, and the textual format is useful for inclusion in E-mail or similar contexts where the benefits of a textual format outweigh considerations of efficiency.
The use of a purely textual format has an additional historical reason: In early usage, it was common to use a teletype to punch the paper tape that would be sent to Intel to specify the contents of the ROM chip. The teletype mechanism insisted on attempting to print all material sent to it as it punched the paper tape, so Intel avoided using a non-textual object codes because it could cause the teletype to print wild nonsense. Honeywell, in the late 1960's, produced a line of 16-bit minicomputers, notably the DDP-516, that used a paper-tape object code that was very carefully matched to the model 33 Teletype so that all bytes included in a legal object code file would be non-printing and would not cause the teletype to move its print head. This was very clever, but very specific to one specific (and now very much obsolete) print mechanism!

Simple object codes such as have been described in this section are frequently used for one very important purpose: Bootstrapping or cold-starting a computer system.

The term bootstrapping is used because the computer is, in effect, pulling itself up by its own bootstraps (a classic English idiom). The term cold-start is used to indicate that the computer is literally cold when it is started. After it is running, it is warm, and if you restart the machine while it is running, it is called a warm-start. When you do a cold-start, the contents of memory is unknown, while for a warm start, because the power is not turned off and on again, some useful data may still be in memory.

When a computer is first turned on, the only executable code which can be run is that which resides in read-only memory. Frequently, this code is simply a loader, the bootstrap loader, which reads the operating system in; on modern systems, this is usually from disk or CD-ROM, but some machines bootstrap from the network, and historically, many machines were bootstrapped from magnetic tape, paper tape or punched cards. Whether ancient or modern, bootstrap loaders tend to relatively small and realtively simple, making simple data formats such as Intel Hex code very appealing. In many cases, a very simple loader is used to read in a more complex loader which actually loads the operating system.

Historical note: In computers of the 1960's, there was usually no ROM at all. On startup, the machine operator had to enter the bootstrap loader into RAM one bit at a time using toggle switches and pushbuttons on the computer's console. A classic scheme was to enter a program that would read one punched card from the card reader into RAM and then execute the bit pattern read from that card. A similar scheme was used on the MODCOMP-IV computer, where the bootstrap loader was just one machine instruction; when executed from memory location zero after a master-clear or power-up, this instruction would read the contents of sector zero, track zero of disk zero into memory starting at address zero. Once the read finished, the computer would begin executing the data from the block just read in.

Relocation

One problem faced by users of simple assembly and loading systems is where to put programs in memory. Up to this point in the discussion, it has been assumed that assembly programs would start at location zero unless the programmer explicitly set a different assembly origin by modifying the location counter. This is clearly not an acceptable solution on large systems where code must be written without any knowledge of where it will eventually be run. The solution is to introduce a relocation mechanism, that is, a mechanism that allows the decision about where to put a program in memory to be deferred until after the program has been assembled and compiled.

If we require the program to be reassembled in order to change the location where it will be loaded, we could refer to this as assembly-time relocation, and this view is useful, particularly if the source code was compiled and the assembly code itself is viewed as something like an object code for communication between the compiler and a load and go assembler.

At the other extreme, some machine languages allow a running program to be moved about or unloaded from memory and reloaded in a different location without harm. This is called run-time relocation. Run-time relocation is only possible if consistent use is made of base registers or relative addressing. If the machine code or the user data structures contain absolute memory addresses, moving the code and data to a different address would be very difficult, but if all branch addresses and pointers are relative, that is, expressed as displacements from the memory location containing the address to the memory location to which the address refers, we can move the entire block of data holding a program's code and data to a different memory address. Machines that allow programs to be written this way are said to support position independent code.

Virtual memory hardware (to be discussed later) can hide the complexity of position indepent programming from users by using special address translation hardware to perform run-time relocation, but usually, ther term position independent code is reserved for programs which are explicitly coded to be able to be run at any memory address without the need to edit any memory addresses in the code.

Another way of thinking of relocation is in terms of when the objects that make up a program are bound to actual memory locations. We can therefore speak about the binding time of each object in the program. Some objects may be explicitly bound by the programmer. Some objects may be bound to specific memory locations at compile time, compile-time binding, and the binding of some objects may be deferred, for example, until the time the program is loaded in memory or even later, in run-time binding.

If binding is done at any time beteen run-time, when it is appropriate to speak of position independent code, and assembly time or compile time, we must modify the object code to allow a distinction between the values that are already determined -- those representing constants and those representing addresses of objects that are already bound to specific memory addresses, and values that are not-yet bound.

The effect of relocating a program is illustrated in Figure 7.5.

Assembly code     Loaded at #0000   Loaded at #0100

LAB:B 10          0000: 0A          0100: 0A
    W LAB         0001: 00          0101: 00 \relocated
                  0002: 00          0102: 10 /
    W 0           0003: 00          0103: 00
                  0004: 00          0104: 00
    W #0100       0005: 00          0105: 00
                  0006: 10          0106: 10
    W LAB         0007: 00          0107: 00 \relocated
                  0008: 00          0108: 10 /

Figure 7.5. The effect of relocating a program.

In the simplest model of relocation, the loader performs the final binding of code to specific memory locations, and as it does so, it adjusts or relocates all of the pointers from one part of the loaded program to another. As should be clear from Figure 7.5, this adjustment is quite simple! Relocation involves adding a constant, the relocation base, to each address which refers to another location in the same program. The relocation base is usually the same as the address at which the first byte of the program is loaded. Thus, if the program is loaded at address 000016, the relocation base is 0 and the addresses within the program are not modified. If the program is loaded at address 010016, the relocation base is 10016 and this constant is added to each address within the program.

Note that the only words in a program that are relocated are those that are used as addresses. Thus, words which were used as data, op-codes, or other constants must not be modified! such non-relocatable words are referred to as words holding absolute values. The contents of a word, as loaded in memory, usually provide no hint as to whether or not the word is an address or data, but the assembly language source code makes this quite clear:

This is because the relocaton base will be, when the code is finally loaded in memory, the real initial value of the location counter; the initial value of the location counter at assembly time is a tentative zero, to which the actual relocation base will be added.

The location counter for each byte or word assembled into memory is typically derived from the initial value of the location counter through a series of increment operations (by one for bytes, by two for words, etc). Labels take their values from the location counter, so operands that refer to labels take their values, with one more level of removal, from the location counter.

Note that we speak of the initial value of the location counter and derivation from this initial value. If a new assembly origin is set, this new origin may be absolute if or relocatable depending on the value specified; the default origin is typically relocatable, as are all labels derived from it. If the origin is set, for example, to a simple number, this is absolute. It is occasionally useful to be able to assemble a program which has some non-relocatable parts that will be placed in fixed memory locations while other parts are relocatable. This is especially true when assembling parts of the operating system on machines where input/output device interface registers or interrupt handlers must be stored at fixed addresses. Assembly code which references these must use absolute addresses even when the rest of the program is written to be relocatable.

As a result of the above considerations, four different possible modes of loading can be identified:

In order to take these alternatives into account in an assembler, each symbol table entry must contain a flag indicating whether the associated symbol is relocatable or absolute, and the location counter must be similarly flagged. The flag used for this purpose typically also allows a third alternative, undefined, in order to account for programming errors and the initial values associated with identifiers that have not yet been defined.

When a label is defined, it takes on the current value and type of the location counter. The initial value of the location counter is typically the value "relocatable zero", and the value remains relocatable as it is incremented during assembly. When the assembly origin is changed, however, the new value may be either absolute or relocatable depending on the type of the expression used to set the new origin. (in some assemblers, this also depends on the particular assembly directive used to set the origin; typical directives are ABS to set an absolute assembly origin and REL to set a relocatable origin).

The determination of the type of an assembly time expression used as an operand can be quite complex. The value of an expression will be absolute or relocatable depending on the types of the component operands and depending on the operators. If all operands are absolute, of course, the expression has an absolute value, but if an expression has a relocatable component, the situation is somewhat more complex.

In general, the type of an expression containing relocatable component operands may be determined by substituting for each relocatable operand O a new operand (O' + R) where O' is the absolute value of the operand and R is a variable with an unknown value standing for the relocation base that will be specified when the program is loaded. If algebraic manipulation of the value of the expression can be made to remove all uses of R, the expression is absolute. If algebraic manipulation of the expression can reduce it to a form that is the sum of R and some other value, the result is relocatable. If the expression cannot be algebraically manipulated into one of these forms, it cannot be handled by simple relocation and will be considered to be illegal by all assemblers that use this scheme. This approach is illustrated in Figure 7.6.

A:         ; A = (A'+R)        = (A')+R       -- relocatable
B = .      ; B = (B'+R)        = (B')+R       -- relocatable
C = A - B  ; C = (A'+R)-(B'+R) = A'-B'        -- absolute
D = A - 5  ; D = (A'+R)-5      = (A'-5)+R     -- relocatable
E = 5 - A  ; E = 5-(A'+R)      = (5-A')-R     -- illegal
F = A + B  ; F = (A'+R)+(B'+R) = (A'+B')+2R   -- illegal
G = A + 5  ; G = (A'+R)+5      = (A'+5)+R     -- relocatable
H = 5 + A  ; H = 5+(A'+R)      = (5+A')+R     -- relocatable

Figure 7.6. Types of expressions involving relocation.

The simplest relocatable object code is based on the address-value pair code discussed previously. To make such a simple code relocatable, each address and value must be tagged as either absolute or relocatable. This is quite easy on a word-oriented machine, but on a machine allowing addresses to refer to bytes or other fractions of a word, we must face a new problem. Consider what happens if we attempt to relocate a program to address 10016 when a byte contains a relocatable value. Adding 10016 to the initial value in that will do nothing because the maximum value that can be stored in one byte is FF16! As a result, most assemblers forbid any attempt to specify a relocatable value to be stored in a byte or an fraction of a word smaller than the size of a machine address.

To account for these problems in a relocatable object code designed to support our example assembly language, we must tag each value field as either an absolute byte, an absolute word, or a relocatable word, for example, with the tags B, W and R. This is illustrated in Figure 7.7 using a relocatable object not unlike that from Figure 7.2.

Assembly Code
      B 5    ;Store absolute 5 in relocatable 0
      W A    ;Store relocatable 3 in relocatable 1 and 2
    A:W B    ;Store absolute #0103 in relocatable 3 and 4
    . = #0100
      B 5    ;Store absolute 5 in absolute #0100
      W A    ;Store relocatable 3 in absolute #0101 and #0102
    B:W B    ;Store absolute #0103 in absolute #0103 and #0104
Object Code
R0000 B05
R0001 R0003
R0003 W0103
A0100 B05
A0101 R0003
A0103 W0103
Figure 7.7. Relocatable assembly and A simple relocatable object code.

An loader for this simple tagged object code might be constructed as shown in Figure 7.8:

function load( base: integer ): integer;
var ta, tv: char  {the tags on the address and value};
    a, v: integer {the address and value};
    high: integer {the highest loaded relocatable location};
begin
    high := base;
    while not(eof(input)) do begin
        read( ta ); readhex( a ); read( tv ); readhex( v );
        readln;
        if ta = 'r' then a := a + base;
            if a >= high then high := a + 1;
        end;
        if tv = 'r' then v := v + base;
        poke( v mod 256, a );
        a = a + 1;
        if tv <> 'b' then begin
            poke( v div 256, a + 1 );
            a = a + 1;
        end;
        if (tv = 'r') and (a > high) then high := a;
    end;
    load := high;
end {load};

Figure 7.8. A loader for the tagged textual object code format.

The Pascal code given in Figure 7.8 assumes that readhex reads a hexadecimal value into the variable given as a parameter and that it consumes the following delimiter (a space or newline). The Pascal standard includes read; when called with a character variable as a parameter, it reads one character of input. We also assume that poke(v,a) stores the value v in the memory location a.

This loader expects the relocation base as a parameter; when it finishes, it returns the address immediately after the highest relocatable location it loaded. Thus, the caller learns the size of the memory region used by the relocatable part of the loaded program. This feature will be needed if multiple files are to be loaded consecutively in memory; we will need this to link together multiple object files to make one large program.

In practice, textual object code formats are not common because they make inefficient use of memory. The ideas in Figures 7.7 and 7.8 can be easily applied to a binary object code, using one byte to store the tags indicating address and value types, with additional bytes for the binary address and value. Assuming 16 bit values, each pair needs 40 bits, as illustrated in Figure 7.9.

bit:  0  1  2        8  --------  23 24  -------  39
      ----------------------------------------------
use: |ta| tv |      |    address    |     value     |
      ----------------------------------------------
address tag  ta = 0  --> absolute address
                  1  --> relocatable address

value tag    tv = 00 --> byte (use only bits 24 to 32 of value)
                  01 --> absolute word
                  11 --> relocatable word
                  10 --> unused, could mark end of file

Figure 7.9. A binary object code format for tagged address-value pairs.

As with simple absolute object codes, we can easily create a blocked relocatable object code format in order to create a more compact. to compact relocatable object code. In a blocked relocatable object code, relocation information is typically separated from the data to be loaded. This suggests the use of two basic kinds of block, one to hold data to be loaded, and one to specify which locations already loaded need to be modified by adding the relocation base. Since both load addresses and the specifications of addresses to be relocated may be either absolute or relocatable, we end up needing a total of 4 block types, as illustrated in Figure 7.10.

Block types:
DA - data to be loaded in an absolute address.
DR - data to be loaded in a relocatable address.
Each is followed by a load address, a count of data bytes, and the data.

RA - relocation block for data loaded in absolute addresses.
RR - relocation block for data loaded in relocatable addresses.
Each is followed by a count of addresses and then that many addresses.
Textual representation of the object code for Figure 7.7:
DR 0000 05 05 03 00 03 01
DA 0100 05 05 03 00 03 01
RR 01 0001
RA 01 0101

Figure 7.10. A blocked relocatable object code.

Relocatable values are fairly uncommon in object files for most modern architectures, but if a large fraction of the words loaded in memory are likely to be relocatable, the format suggested in Figure 7.10 will not be very efficient because of the long lists of addresses that will end up in relocation information blocks (types RR and RA). When this is the case, tagged codes are a better choice, or mixed codes, for example, using a bit vector at the end of each block to mark the addresses that need relocation. Figure 7.11 illustrates the use of such a code.

Block types:
A - data and relocation for absolute load addresses.
R - data and relocation for relocatable load addresses.
Each is followed by the address of the first data byte to be loaded, followed by the number of bytes to be loaded, followed by that number of bytes and finally a bit vector giving one bit per byte indicating if that byte needs relocation (the bit vector is, of course, is padded to an 8-bit boundary with extra zero bits.
Textual representation of the object code for Figure 7.7:
R 0000 05 05 03 00 03 01 40
A 0100 05 05 03 00 03 01 40
Decoding of the first line of this code:
R 0000 05 05 03 00 03 01 40
|   |   |  |  |  |  |  |  |_ bit vector = 01000xxx
|   |   | 0| 1| 0| 0| 0|
|   |   |  |__|__|__|__|_ load data
|   |   |
|   |   |______________ count of data bytes
|   |
|___|________________ relocatable starting address

Figure 7.11. A blocked object code using relocation bit-vectors.

Note that the bit vectors at the end of each block above hold the pattern 4016 which is 010000002 indicating that the second byte of each block is the first byte of a word that will need relocation. In fact, the final 3 bits of the bit vector are padding, since there are only 5 bytes of data.

A practical tagged relocatable object code might be constructed as follows. An object file is viewed as a continuous sequence of 8 bit bytes, with no structure such as blocks, lines or records imposed by the underlying hardware or the underlying file system. This is the usual view of binary files in modern operating systems. This stream is divided into logical records of variable length, where the first or tag byte of each record gives the record type (4 bits) and the length of the record (4 bits), as shown in Figure 7.12.

Record structure of input:
---------------------------------------------------
  ... | tag |     0 to 15 bytes of data    | ...
---------------------------------------------------
Tag structure:
   ttttnnnn
   tttt      - 4 bits of record type.
       nnnn  - 4 bits giving count of data bytes.
Record types:
   0000 - no operation (usually nnnn = 0).
   0001 - load n bytes of absolute data.
   0010 - load n/2 words of relocatable data.
   0011 - set absolute loader origin (usually n = 2).
   0100 - set relocatable loader origin (usually n = 2).
Textual representation of the object code for Figure 7.7:
42 00 00 11 05 22 03 00 12 03 01 32 00 01 11 05 22 03 00 12 03 01
==------ ==--- ==------ ==------ ==------ ==--- ==------ ==------
                underlining shows record structure

Figure 7.12. A practical tagged relocatable object code for a 16 bit machine.

This tag structure limits each record to 16 bytes, so for long streams of absolute object code, with no relocatable data, we need only start a new load record every 16 bytes, and we have only one byte of overhead per load record, so our overhead in this case will be about 7 percent. Note that extending this code to work with 32-bit addresses is trivial, and we can have ample room to expand this code with other block types, for example, those we might use to support external symbols.

There are many situations where it is hard to control the value of the first or last byte in a file, for example, on machines that insist on rounding each file out to an even number of disk sectors. In such cases, the system will usually pad the file with zeros, and our selection of the tag value 0 as meaning do nothing with a block that contains no data lets us deal with this quite easily. We can even use non-empty no-op blocks to hold comments, copyright notices, and similar "excess baggage" on our object files.

Note that our loader for this object format must itself maintain a location counter, just like an assembler! Absolute data records are used to load any combination of bytes and words when relocation is not required; each byte loaded advances the loader location counter by one. Relocatable data records are used to load words which need to be relocated; each word loaded advances the loader location counter by two. The loader origin records set the loader location counter, and such records typically only appear in the object code if the the input to the assembler set the origin.

A loader for the object code given in Figure 7.12 is given in Figure 7.13.

char * load( char * base )
{
    char * loc; /* the current location */
    int mode;   /* the assembly mode (REL or ABS) */
    enum modes  { REL, ABS };
    char * high;/* the highest loaded relocatable location */
    int typ;    /* 0 to 15, the type of the current block */
    int count;  /* 0 to 15, the count for the current block */
    int value;  /* a value read from the block */
    int temp;   /* temporary byte read from block */

    /* default initial values */
    loc = base;
    mode = REL;
    high = loc;

    /* read the object code */
    for (;;) {
        temp = getc(stdin);
        if (temp < 0) exit; /* exit loop on end of file */

        /* break first byte of block into type and count */
        typ = temp >> 4;
        count = temp & 0xF;

        switch (typ) {
        case 0: /* ignore */
            while (count > 0) {
                value = getc(stdin);
                count = count - 1;
            }
            break;

        case 1: /* load absolute */
            while (count > 0) {
                value = getc(stdin);
                count = count - 1;
                *loc = value;
                loc = loc + 1;
            }
            break;

        case 2: /* load relocatable */
            while (count > 0) {
                value = getc(stdin); /* first 8 bits */
                temp = getc(stdin);  /* second 8 bits */
                count = count - 2;

                value = value + (temp << 8);
                value = (int)(base + value); /* relocate! */

                *loc = value & 0xFF;
                loc = loc + 1;
                *loc = value >> 8;
                loc = loc + 1;
            }

        case 3: /* set absolute origin */
            {
                value = getc(stdin); /* first 8 bits */
                temp = getc(stdin);  /* second 8 bits */
                loc = (char *)(value + (temp << 8));
                mode = ABS;
            }

        case 4: /* set relocatable origin */
            {
                value = getc(stdin); /* first 8 bits */
                temp = getc(stdin);  /* second 8 bits */
                value = value + (temp << 8);
                loc = base + value;  /* relocate! */
                mode = REL;
            }

        } /* end select */
        if ((mode == REL) && (loc > high)) high = loc;
    }
}

Figure 7.13. A loader for the object code given in Figure 7.12.

Note that, as in Figure 7.8, this loader takes the form of a function which is given a pointer to the starting locaton in which the code is to be loaded and returns a pointer to the first location after the relocatable part of the loaded code. This form of the basic load routine will be important in the upcoming discussion of the linking process!

It should be noted that practical object codes include a mechanism for encoding the starting address of a program in the object file! This is needed because there is no requirement that the first loaded location be the first executable instruction of a program, since some programmers (certainly most C, C++ and Pascal programmers) place data structures and secondary routines before the main program. As a result, once a program is loaded, there must be some way to tell the computer system to jump to the appropriate location in the program to start it running. In most assembly languages, this is indicated by a special assembly directive, for example, START, that takes, as an operand, the starting address, and passes it on to the loader.

In the tagged relocatable code given in Figure 7.12, starting addresses might be conveyed by two new record types, one for relocatable and one for absolute starting addresses. These new record types defined in Figure 7.14.

Record types:
   0101 - set absolute starting address (usually n = 2).
   0110 - set relocatable starting address (usually n = 2).

Figure 7.14. Starting addresses added to record types from Figure 7.12.

Most modern computer systems have a relocating loader which is the primary tool used when a user requests that some program be run. Thus, when a user types a command like RUN X from the keyboard, or when a user double clicks on the icon for an executable object file X, this requests that the loader be run with the file X as input before control control is transferred to the starting address of the loaded program. There are many variations on this! When good hardware facilities for run-time relocation are available, a simple (non-relocating) loader may be used to run user programs. Other systems require that loading and running a program be separately requested, and there are a broad range of syntactic alternatives in the command language.

When the Unix command interpretation shell encounters a command that is not listed in its symbol table of built-in commands, it assumes that the command name is the name of an object file and it attempts to load that object file and run it. Actually, there are many different Unix shells, the Bourne shell (sh), the C shell (csh), the "Bourne-again shell" (bash) and others, but all share this characteristic. Most of the common Unix commands are not built into the shell, but rather, are simply the names of object files found in the directory /usr/bin or other directories in the search-path, the list fo directories that the shell checks when it tries to open a file.

Linking

If many different programs which have been translated to object code files are to be loaded in memory, a loader built along the lines shown in Figure 7.15 could be used.

procedure loader( base: integer );
var next: integer;
begin
    next := base;
    while more-files-to-load do begin
        reset( input, next-file );
        next := load(next);
    end;
end {loader};

Figure 7.15. A loader for multiple files.

This uses a load function such as the one defined in Figure 7.8 or Figure 7.13 to load the relocatable parts of each of the files into consecutive memory locations. If any of the files have absolute components, it is up to the user to be sure that they do not interfere. The shortcoming of this loader is that, although it loades the required object files in memory, it contains no provisions to tell the different files about each other. Programmers using such a system would typically reserve some dedicated (absolute) locations for communication between the routines included in the different files, and this would be very cumbersome.

Most high level languages include features allowing items to be declared in one source file and referenced in another. For example, in C or C++, all top-level declarations that are not prefixed by the keyword static are global and may therefore be referenced from any of the source files that are combined to make one program. In Ada, a declaration may be completed with the stub is separate to indicate that the completion of the declaration (a function body, for example) will be found in a different source file that may be compiled at a later time. Most assembly languages include similar tools.

On a Unix system, the command cc a.c does not just compile the file named a.c into the default object file a.out; rather, it compiles a.c into an object file called a.o and then runs ln, the Unix linker to link a.o with the object files for the standard C library routines called from within a.c. This linkage phase can be suppressed with the -c command-line option.

If we wish to combine the object files resulting from the assembly or compilation of different source files, our source programs, whether in assembly language or a high level language, must include directives to specify which objects declared in one file may be referenced from other files, and our object code must include machinery to communicate such external references. The symbolic names used to construct an external reference are called external names; these are the only symbolic names that appear in the object code. By contrast, all other identifier names used in a source program are called internal names because their definitions and uses are internal to the source file in which they occur. Internal names have their values bound at assembly time or at compile time, while external names have their values bound at linkage time or at load time. The linkage-time processing of external names is sometimes called external reference resolution.

A system which supports a linker must have an assembly language notation for external names as well as an object code notation. Typically, this is done in assembly languages by introducing new pseudo-operations which are used to declare and define external symbols. For example, consider adding the following directives to the example assembly language.

EXT <identifier>
declares that the definition of <identifier> is external to this source file. The identifier is an external symbol that may not be declared in this source file, but it must be defined in some other source file. In effect, we are importing a definition from the other file.

INT <identifier>
declares that the definition of <identifier> must occur somewhere internal to this source file. The identifier is an external symbol and may be referenced from any number of other source files. In effect, we are exporting the definition from this source file.
When we link together a set of assembly language object files, each symbol declared as an EXT symbol in any file must be declared as an INT symbol in exactly one file, and in that file, it must be used as a label or as the left operand in a definition. The example given in Figure 7.16 illustrates the use of these directives.


Separately assembled source files

;File A
    EXT LAB
    W   LAB

;File B
    INT LAB
LAB =   5


An equivalent single source file

;File AB
    W   LAB
LAB =   5

Figure 7.16. The EXT and INT directives.

The functions of the EXT and INT directives are almost universally supported by commercially available assembly languages, but a variety of other names have been used. For example, the MACRO-11 assembler uses .GLOBL for both, simply delcaring that the operand is an external symbol without indicating whether the value is to be imported or exported. The assembler for the IBM 360-370-390 series of machines requires that if the name T is to be referenced as an external symbol, it must either be written as V(T), where the V modifier signals that the identifier is external, or it must be declared as an operand of an EXTRN directive. The 360-370-390 assembler always exports the names used as labels on CSECT directives, and in addition, it allows symbols to be explicitly exported using the ENTRY directive.

Most assemblers impose strict limits on the use of externally defined symbols within expressions. Typically, use of a simple imported symbol as a full word operand is always legal, but the application of any arithmetic operations to imported values is forbidden; those assemblers that allow arithmetic on imported values usually limit this the sum of an absolute constant and an external symbol. The reason is that any values used to modify the value of an external symbol must be included in the object code.

In a blocked relocatable object code, it is common to place an internal symbol dictionary, that is, a list of the symbols exported by the object file and their values, in a special block of each object file. A second special block is typically used to list all of the imported symbols and, for each imported symbol, a list of all the memory locations within the block where the associated values must be stored. If we modify the previous sentence slightly, replacing the text "where the associated values must be stored" with "to which the associated values must be added", we can support simple expressions that have, as values, the sum of an imported symbol and a small constant.

The tagged object code format given in Figure 7.12 and Figure 7.14 can be extended to support external and internal symbol definitions as is shown in Figure 7.17.

Record structure of input
------------------------------------------------------------
    ... | tag | v (2 bytes) | name (0 to 13 bytes) | ...
------------------------------------------------------------
Record types:
   0111 - load byte with v plus imported value of name.
   1000 - load word with v plus imported value of name.
   1001 - export v as the absolute value of name.
   1010 - export v plus the relocation base as name.

Figure 7.17. External symbols added to record types given in Figure 7.12.

In Figure 7.17, we add 4 new record types, two to allow references to externally defined symbols (one for each format of reference allowed, in this case, byte and word), and two to allow definition of such symbols (one for absolute definitions, one for relocatable definitions). Note that the record types for external references provide a 16 bit constant in addition to the symbolic reference. This allows limited use of expressions involving external symbols at assembly time; specifically, the set of legal operands includes an external symbol plus or minus an absolute constant.

Recall that, in Figure 7.14, we suggested a format for specifiying the starting address of a program as part of this example tagged object code format. When we link together multiple object files, only one of them may specify a starting address. The standard Unix linker for C and C++ uses a different approach to finding the starting address of the program: It requires that the name main be exported by only one object file in the set of object files to be linked, and it uses the value of this external symbol as the starting address.

Implementing Linkers

The processing performed by linkage editors and linking loaders is essentially the same, except that a linkage editor produces object code as its output, while a linking loader produces output directly into main memory. In either case, most values from the input object code can be passed through into the output with either no modification or with simple relocation. Occasionally, however, the input to the loader will define a symbol, exporting it. When this occurs, the linker will enter the symbol into the linker's symbol table. When the value of a symbol is imported, the linker must look it up in its symbol table. The processing of external names by the linker is extermely similar to the processing of label definitions and references in assembly languages. Thus, there are issues of symbol table management which must be addressed, and the problem of forward reference resolution must be solved.

As with assemblers, the forward reference problem in linkers may be solved with both chaining and the two-pass solution. Chaining is a very common approach with linking loaders because it is rarely a great inconvenience to forbid expressions from incorporating the names of external symbols. Some linking loaders create their output as a memory image in a random-access output file; these may also use chaining. Many linking loaders simply make two passes through the object code, just as the typical assembler makes two passes through the source code. Object codes that allow for any kind of expressions referencing external symbols end up requiring the two-pass solution for the same reason that assemblers allowing expressions to reference forward references require two passes.

It is extremely common for object codes to impose severe limits on the number of characters allowed in an external name. The example in in Figure 7.17, where the limit was 14 characters, is very typical. In fact, it is generous! The ANSI C standard, for example, only guarantees that the linker will pay attention to the first 6 characters of external names. It would be perfectly reasonable to consider an object code format that limited external names to such a small size to be archaic, but the truth is, many modern linkers retain such limits! Linkers that impose such limits frequently use schemes such as Radix 40 packing to squeeze 6 character names into 32 bits, and they frequently rely on the assembler or compiler to create the Radix 40 packing, so that the packed form of the external name is all that appears in the object code.

It is worth noting that, no matter how symbols are represented, linkage-time symbol tables are frequently fairly small. Only a few of the symbols used in the typical source program are external, and as a result, many linkers use linear search to manage their symbol tables.

In order to facilitate debugging of large programs, many linkers produce an output called a load map. This is frequently a simple dump of the linker symbol table, sorted by the numeric value of the symbols in the table. Given a load map, the debugger (whether human or a software tool) can determine what PC values correspond to what external functions of the program, and which operand references refer to external variables. This use of the load map in debugging motivates many programmers to make all functions in their source program external, so that the debugger can give useful output when the program fails.

When a linkage editor is used, it is common to have it generate relocatable object code as output. As with assemblers which generate relocatable object code, this requires that the linkage editor's location counter be initialized to relocatable zero. The relocation base for the first object file being linked must also be set to relocatable zero, and the relocation base for each successive object file being linked must be set to the maximum relocatable value taken on by the linkage editor's location counter in linking the previous file.

The linkers for many early computers did not perform the complete linkage suggested by the description given above. The assumption was frequently made that the only use of external symbols desired was in the address fields of procedure call instructions. Thus, as long as the linker modified these fields in such a way that the desired procedure was called, the programmer would be satisfied. One common approach was for the linker to insert indirect references to the called procedures in the address field. Whenever a linker using this approach encounters an external reference, it replaces this with an indirect reference through a data structure called the transfer vector. The transfer vector is simply a list of pointers to the entry points of each externally callable procedure. Whenever the linker encounters the actual definition of one of these entry points, it fills in the appropriate entry in the transfer vector.

Object Code Libraries

Most linkers provide one important facility in addition to the linking function: They allow selective loading from object code libraries. Thus, instead of requiring the user to explicitly enumerate to the loader all of the object files to be loaded, the user is only required to specify a main program and one or more libraries. The loader then searches these libraries for object files containing definitions of the required external symbols and loads these along with the main program. Such an object library is a required adjunct to many higher level languages which include large sets of predefined functions or procedures; for example, on Unix, the standard C library, libc is always included when a C program is linked, in addition to any secondary libraries the user may specify.

On some systems, object libraries have a special representation which requires a new systems program, the object library manager. On classic Unix systems, for example, the ar library manager is used to combine multiple .o object files into one .a archive or object library file that the loader can search for the files it needs. Such library managers were particularly important in the era of magnetic tape, since it was obviously desirable to store the entire object library on a single tape, but today, this holdover from that era remains in common use even though today's file systems are extraordinarily good at storing huge numbers of small files.

In many ways, object libraries are not very different from the directories of a file system. Each object file is simply entered in the directory for the library under the name or names that are exported by that object file. There is absolutely nothing to prevent a standard file system's directory structure from being used for this purpose, but despite this, most systems in current use today introduce a new data format for libraries.

When a linker or loader is searching a library, it uses the list of currently undefined symbols in its symbol table as a "shopping list". When an object file is found which includes a definition of one of these symbols, that file is loaded. Frequently, this introduces more undefined external symbols, since the loaded routine may itself contain references to other routines. This suggests that the linker must have two modes of operation: A search mode and a load mode. Some older linkers actually search the entire contents of an object library, making a list of the files to be loaded before making a second pass to do the actual loading. In fact, this is not needed, since a two pass assembler can arrange to place all internal symbol definitions at the head of the object file. The linker can then examine these and either switch to load mode or skip the file depending on what names are found.

When object files are stored on random access devices, a more extreme approach is sometimes taken: The directory for the object library is extended to include, for each object file, how much relocatable storage it will require, what external symbols it references, and which internal symbols it defines. This allows a two pass loader to make its first pass over the directory, without ever examining the actual object files until the second pass, at which time, all external symbols will already be defined.

The order in which object files are included in an object library is sometimes critical. The reason for this is that external references from a file included towards the end of the library which refer to symbols defined in a file included earlier in the library cannot be properly resolved if each object file in the library is examined only once. Most object libraries can be ordered in such a way that all external references from any file in the library refer to symbols defined in later files, but finding such an order can be difficult if an object file editor does not do it automatically. An alternate solution is to have the linker make multiple passes over the object library until either there are no more undefined external symbols or no files are found which provide the needed definitions.

Overlay Linkers

On machines where the available address space is relatively small compared to the size of the programs which users desire to run, a special kind of linkage editor called an overlay linker is sometimes used. Overlay linkers were developed in the mid 1960's, and they were used extensively in the early 1970's when many computers were being sold with memory capacities as small as 12K bytes, and when many machines had maximum memory capacities of 48K bytes. An overlay linker both resolves the external references in the various object files being linked and segments the program into overlays. Today, in the era of 32 bit addresses and huge multi-megabyte virtual memories, overlay linkers are something of a historical curiosity.

An overlay is an object file containing part of the code for a program which is to be loaded sometime after the rest of he program is loaded. Programs made of multiple overlays can run in small main memories because many different overlays may be loaded in the same region of main memory, overlaying each other. Of course, before calling a routine that is part of some overlay, the calling code must make sure that that overlay is loaded in memory!

Typically, an overlay linker first builds a data structure which describes the set of object files to be loaded, the amount of memory required by each, the internal symbols defined by each, the external symbols referenced by each, and which object file holds the starting address. Given this, the linker tries to construct an overlay tree. The root of the overlay tree is the object file containing the main program; the main program must be linked conventionally with every object file containing names directly referenced from the main program by any form of reference other than a subroutine (procedure or function) call.

The children of the root consist of object files contianing subroutines called from any that was linked into the root of the overlay tree, and the grandchildren are the object files containing subroutines called by the children. The tree building phase of an overlay linker will not always build an interesting tree. Sometimes, it will conclude that all of the object files to be linked must be combined into the root!

If the tree is bigger than just a root node, then each call from within one node of the tree to an external name residing in a child of that node is replaced with a call to a special load-on-call stub. This stub first checks to see if the overlay containing the desired procedure is currently loaded. If so, it passes control to the subroutine; if not, it loads the overlay and then passes control to the subroutine. Calls from a routine defined in an overlay to a routine in the parent of that overlay can proceed directly because the actual memory allocation is arranged in such a way that the parent overlay is guaranteed to be in memory whenever any of its children are loaded. Calls from within one overlay to routines defined in one of its sibling overlays are not usually allowed.

Terminology

The reader should be familiar with the following general terms after reading this section.

load and go assembly            absolute values
object code                     absolute locations
object file                     starting addresses
object module                   program linkage
loaders                         linkage editor
blocked object codes            linking loader
tagged object codes             external names
relocation                      internal names
assembly-time relocation        linkage-time
run-time relocation             external reference resolution
relocatable values              internal symbol dictionary
relocatable locations           object libraries
transfer vectors                overlay linkers
bootstrap loaders               bootstrapping
Additionally, the reader should understand the purpose of the pseudo-operations START, INT, and EXT which were added to the example assembly language.

Exercises

  1. Most assemblers that generate object code do not use chaining, but it can be done. Consider the following two approaches to one pass assembly.

    a) All symbols which are undefined in the assembler at the time they are used are simply treated as external symbols, as if they were operands on an EXT directive. When a definition is encountered for a symbol which has previously been treated as an external symbol, that definition is exported, as if it had been the operand of an INT directive. Is this really chaining? Does this require any changes to the object code or linker? How does this change the scope rules of the assembly language?

    b) A new operation is included in the object code, 'follow chain'; this takes, as operands, the address of a link in the chain and the value which is to replace all links in the chain. Is this really chaining? What object code would the assembler generate when it sees a forward reference? How does this change the scope rules of the assembly language?

  2. If A, B and C are relocatable symbols, and X and Y are absolute symbols, which of the following are legal, and if legal, what is the type of each expression?

    a) (A + X) - Y

    b) (A + X) - B

    c) (A - B) + (A - C)

    d) (A - X) + (B - Y)

    e) (A - B) + (A - X)

  3. If A, B, and C are relocatable symbols, the following example could (in principle) be accepted by an assembler, but is normally not allowed:
    (5 - A)+(B + C)

    a) Using Figure 7.6 as a model, find the relocatable value of this expression.

    b) Explain why most assemblers will not allow this expression.

    c) Rewrite this expression in an equivalent form which would be accepted by most assemblers.

  4. a) Propose a way to incorporate internal symbol definitions, external symbol references, and program starting address specifications into the object code illustrated in Figure 7.7.

    b) Modify the code given in Figure 7.8 to support your object code extensions from part a. Assume that the data type string is available, that the procedure readstring(s) reads a string and consumes a trailing blank or end-of-line, and that the symbol table routines define(s,v) and lookup(s) take a string parameter. Don't bother supporting forward references, and feel free to translate to the language of your choice.

  5. Write a relocating loader in the style of the one given in Figure 7.8 which can process the object code illustrated in Figure 7.11. Assume exactly the textual representation shown, and assume that you have a readhex(i) routine that consumes and returns the value of one hexadecimal number each time it is called. Feel free to translate to the language of your choice.

  6. Write case statement entries for the loader shown in Figure 7.13 so that it can handle the extensions documented in Figure 7.12, Figure 7.14 and Figure 7.17. In doing this, ignore the forward reference problem, and assume that you can use the type string with a special routine readstring(s,l)" which reads a string into s consisting of exactly the next l bytes of input (s is passed by reference, l by value). Also assume that you have access to code for define(s,v) and lookup(s). Finally, if a starting address specification is found, it should be put in the global variable named start. Feel free to use any appropriate programming language

  7. How could you modify your solution to the above problem so that multiple starting address specifications are detected and reported as errors?

  8. Modify the pseudo-code in Figure 7.15 so that it performs a two pass linking load. Your modified code must call initsymtab to initialize the linkage symbol table, and assume that the load function has been modified to support all of the object formats required for linkage.

  9. Modify the drawing in Figure 7.1 to show the different interpretations of the meanings of the various intermediate languages used when a linkage editor is used in addition to a loader.

  10. What modifications would you make to the object code format defined in Figures 7.12, 7.14 and 7.17 if you wanted to use it on the Intel Pentium family of processors? These machines have a 32 bit word and 32 bit memory address; memory is byte addressable, with an 8 bit byte, and instructions are as small as one byte, with fields of 8 bits, 16 bits, or 32 bits added as needed. Instructions may directly load or store bytes, 16-bit words, 32-bit long words or 64-bit quadwords.

References

Chapter 5 of the text

Assemblers and Loaders, 3rd ed. by D. W. Barron, North Holland, 1978.
provides a concise description of the different kinds of linkers and loaders and how they are used, but gives little description of how they work.

Chapter 8 of the text

P. Calingaert's text, Assemblers, Compilers, and Program Translation by P. Calingaert, Computer Science Press, 1979.
provides decent illustrations of a blocked object code with a relocation bit vector, as well as a nice discussion of overlay generation as performed by a linkage editor.

Chapter 3 of the text

System Software by L. L. Beck,
covers linkers and loaders in some detail, with reasonable examples of relocation, linking, and the basic algorithms used for linking and loading. An extended discussion of overlay management is included, as are brief descriptions of the linkers for some real machines.

The classic text,

Systems Programming by J. J. Donovan, McGraw Hill, 1972.
provides an extensive discussion of linkers and loaders in Chapter 5. Much of this discussion is somewhat quaint, since it is assumed that object files are stored on punched cards; nonetheless, the basic discussion is still valuable, although it is quite specific to the IBM 360.

Linkers have been around since the early days of the Fortran language, and it is instructive to look back to some of the early writing on the subject. The paper by McCarthy, Corbato and Daggett in Chapter 5D of

Programming Systems and Languages edited by S. Rosen, McGraw Hill, 1967
describes much of this early work.
"Linkers and Loaders" by L. Presser and J. White, in ACM Computing Surveys 4, 3 (Sept. 1972).
is an excellent description of the IBM 360 linker, better than most more recent descriptions.

The papers "Assembly Language as Object Code" by D. W. Jones and "A Machine Independent Linker" by C. W. Fraser and D. R. Hanson, both in

Software-Practice and Experience 13, 8 (August 1983) and 12, 4 (April 1982).
provide interesting descriptions of object code formats which are quite different, but both demonstrate the strong parallels between what a linkage editor does and what an assembler does. The former paper is based on the example assembly language used in this text.