Operating Systems, Chapter 7

So called load-and-go assemblers and compilers load their output output directly into main memory for execution. This is excellent for small programs that can fit comfortably in memory alongside the assembler or compiler and that are just run once -- for example, student assignments -- but in most cases, we want to store the executable code for repeated use later. The most elementary objection to load-and-go assemblers and compilers is that they cannot be used to assemble or compile new versions of themselves.

If a language processor requires m bytes of memory to run, and the largest program to be run requires n bytes to run, the machine must have m+n bytes of main memory to support a load-and-go assembler or compiler. If the output of the language processor is saved to some external storage medium and then re-loaded for execution, we need only the larger of m and n bytes, not the sum.

This argument from necessity is not the only justification for storing the output of the language processor in secondary memory. Compiling or assembling a program may take time, while users expect no delay at all between their request for a program and its execution. Storing the output of the language processor eliminates the need to wait for compilation or assembly each time a program is run. In addition, if programmers distribute pre-compiled or pre-assembled code, it is difficult for users to tinker with applications, thus improving application security and reliability as well as improving the programmer's own job security.

When a program has been assembled or compiled into a file of some kind, the file must first be copied into main memory before the program can run. This introduces important new terms: Because The saved copy, the object code, cannot be run until it is moved to main memory as executable code or machine code. The purpose of an assembler or compiler is therefore to translate source code into object code.

The program which copies the object code into main memory is called a loader. Loaders can be viewed as translating object code into machine code. Figure 1 describes the resulting view of the translation process and the different views of the meaning of a program as it is being translated.

                -------------
               | Source Text |
                -------------
                   /      \      assembler's
    programmer's  /        \   view of meaning
      view of    /       -------------
      meaning   /       | Object Code |
               /         -------------
              /                    \   loader's view
             /                      \    of meaning
     ------------------           --------------
    | Abstract Meaning | ------- | Machine Code |
     ------------------           --------------
                        hardware's
                     view of meaning

Figure 1: Views of the meaning of a program.

Object Code Formats

The simplest object code format is a simple bit for bit copy of the machine language program itself, called a memory image of the machine language. Consider a machine with, for example, 64K bytes of main memory, of which 1K bytes are reserved for the loader, then each object file would be exactly 63K bytes long, and the loader would be trivial.

Use of this object file format would waste a large amount of secondary memory if the average program size is much smaller than the maximum, and it makes it difficult to contemplate expanding the system or moving object code files from one system to another unless the two have identical memory configurations. Furthermore, it is hard to extend this scheme to support the linking of subroutine libraries or relocation (topics to be discussed later in this section).

A second simple object code format consists of a sequence of address-value pairs. This may be produced by replacing each statement in the assembler which assigns a value to a location in memory, for example M[a]=v, with a write statement placing the address and value in the object file. If a textual file format is used, the addresses and values would typically be written in octal or hexadecimal to simplify the conversion to binary required in the loader. object format differs only slightly from the object code field of the assembly listing format used by a typical assembler.

The first column in a typical assembler hexadecimal listing (perhaps following the line number) gives the address where data is to be placed, while the second column gives the value to be placed there. The number of hexadecimal digits in each value adds complexity to this format, since values may be individual 8-bit bytes (2 digits), 16-bit halfwords (4 digits) or 32-bit words (8 digits), and possibly 64-bit double words (16 digits). In an object code based on this idea, we could require that all multi-byte values be passed in the object code as single bytes.

An assembly listing is not a particularly efficient object code format, since it is designed primarily for human readability. With disk space costing as little as it does, the question of efficiency is not dominated by the cost of storing the object file, but rather, the complexity of the loader and the extra computations done in the loader during reading such an object file. Nobody likes systems where loading a program takes much time!

The use of a textual object code format for a small assembly language fragment is illustrated in Figure 2. This fragment contains no machine instructions. All it does is initialize a byte, followed by a halfword, followed by a word of memory.

Assembly Code

    . = 3      ; alternatively .ORG   3
      B 6      ;               .BYTE  6
      H 3      ;               .HALF  3
    . = 7      ;               .ORG   7
      W 1      ;               .WORD  1

A Typical Assembly Listing

     1                 |    . = 3
     2 0003: 06        |      B 6
     3 0004: 0003      |      W 3
     4                 |    . = 7
     5 0007: 00000001  |      W 1

Textual Object Code

00000003 06
00000004 03
00000005 00
00000007 01
00000008 00
00000009 00
0000000A 00

Figure 2: A simple object code format.

A loader for this object format is relatively easy to write. In standard C, for example, it might be written as is shown in Figure 3.

#include <stdio.h>
main()
{
    for (;;) {
        int loc,val;
        fscanf( stdin, "%x %x", &loc, &val );
        *((char *)loc) = val;
    }
}

Figure 3: A loader for the object code format in Figure 2

The loader shown in Figure 3 is not particularly desirable because it rests on top of the standard C libarary routine fscanf() from <stdio.h>. This routine is relatively large, containing all the code necessary to handle 14 different input formats ranging from character input to floating point, plus an interpreter for format strings to scan for percent signs and figure out what input conversion to use!

The assignment *((char*)loc)=val in this code illustrates a fundamental aspect of all loaders. The loader must be able to read an integer, loc from the object file and then use it as a memory address. The type-casting prefix operator (char*) converts an integer into a pointer to (the address of) an object of type char. If the address from the object file cannot be loaded by the loader, for example, if it is an address in ROM or an empty socket for a memory module that was not installed, this assignment will be an error, but depending on the hardware, this error may or may not be detected. If the address can be loaded, the contents of the variable val will be stored in the indicated memory location.

Although functional, this simple object code is expensive. Assuming a binary representation for both addresses and data and a 32-bit address, each byte loaded in memory will carry with it 4 bytes of address. Furthermore, because this is a textual object code, each hexadecimal digit is stored as one byte, and we have used spaces and newlines to separate successive fields in the code. So, our total cost for one byte of machine code is 12 bytes of object code! As a result, a program that fills 64K of RAM will have an object file that fills 768K of disk space! This is hardly desirable although this kind of bloat may well be tolerable given the low cost of secondary storage. What is less tolerable is the computational cost of loading, an operation that we want to be fast.

On some machines, these economic arguments work out quite differently. For example, on the CDC 6600, introduced in 1965 and usually considered to be the first supercomputer, the smallest addressable unit of memory was one 60 bit word, while addresses were limited to 18 bits. In this case, still using hexadecimal for the values and still using 8-bit characters, this object code would use 22 characters to encode each 60 bit word, so the object file would be about 3 times the size of the machine code instead of the factor of 8 for our running example. (In fact, the most common text representation on the 6600 used 6-bit characters, and octal was the preferred number base for presenting binary data in textual form.)

The usual path to more efficient object encodings makes use of the fact that most of the values to be loaded in memory will be placed in consecutive memory addresses, since explicit manipulation of the assembly location counter is rare! This suggests the use of addresses followed by sequences of values in the object code. Although this is a simple idea, it introduces a new problem: How can the address heading a new sequence of values be distinguished from a value continuing the previous sequence? There are two basic classes of solutions to this problem, blocking and tagging.

In a blocked object code, each sequence of values to be loaded in consecutive addresses is placed in a separate block, along with the address of the first item in the sequence. In a tagged object code, values and addresses are differentiated by a tag (typically one bit); if data in the tagged object code is divided into blocks, this division is only for the sake of convenient storage and does not determine which entries in the block represent addresses and which represent values.

Blocked object codes are frequently used in systems where the storage medium used for the object code has a natural record structure. Many classic storage media have this characteristic. With punched cards, for example, each card could represent a block of data to be loaded, with the address encoded in the first few columns of holes on the card, followed by the data. With classic half-inch wide 9-track magnetic tape, each block of data on the tape could begin with a memory address, with the remainder of the block containing data to be stored in memory. Similarly, data on disk is naturally divided into sectors.

An old blocked object code still in widespread use is called the Intel Hex Code. This code was developed by Intel for the first generation of 8-bit microprocessors and read-only memory chips, and it is still widely used as an object format for some embedded microcontrollers.

This code was originally intended for use with paper tape. In Intel's early catalog of Read-Only Memory (ROM) products, the standard way to specify the contents of a ROM was to send Intel the paper tape, in Intel Hex format, and let them manufacture the chips holding the data you specify. This was before field programmable ROM chips (known as FPROM) or Erasable/Programmable ROM chips, EPROM. Today, this usage continues. For example, it is possible to order pre-programmed microcontrollers by E-mailing, in Intel Hex format, the code you want loaded into the ROM of the microcontroller. If you are building thousands of TV remote control units, for example, manufacturing is greatly simplified if the embedded microcontrollers are delivered to your factory with your firmaware already installed.

Intel Hex code is a textual code, with each line holding one block of load data, where all of the information on the line is encoded in hexadecimal. Each line must begin with a colon, followed by a 2-digit count of the number of data bytes on the line, a 4-digit address where the data should be stored, and a 2-digit block-type indicator; type zero is loadable data, and type 1 is used for an end of file marker. Later, additional block types were introduced to allow for 20 and 32-bit memory addresses. Following the type marker come the data bytes, and then a 2-digit checksum is always the last item on the line. The checksum serves to make it possible to detect malformed object files; in the case of Intel Hex format, the checksum is the two's complement of the least significant 8 bits of the sum of all of the other bytes on the line, including the byte count, the address and the block type. This object code is demonstrated by the example in Figure 4, assembled for a machine with a 16-bit word (since many microcontrollers are 8 or 16-bit architectures).

Assembly Listing

1                    | . = #0100
2  0100: 05          |   B 5
3  0101: 0F 00       |   W 15
4  0103: FF 80       |   W #80FF
5                    | . = #0008
6  0008: FF          |   B #FF
7  0009: 07 00       |   W #0007

Intel Hex Code

:05010000050F00FF8067
:03000800FF0700EF
:00000001FF

Decoding of the first line of this code:

: 05 0100 00 05 0F 00 FF 80 67
   |   |   |  |  |  |  |  |  |_ checksum
   |   |   |  |  |  |  |  |
   |   |   |  |__|__|__|__|_ 5 data bytes
   |   |   |
   |   |   |____________ type
   |   |
   |   |_____________ starting load address
   |
   |______________ byte count

Figure 4. The Intel hex object code format.

The use of a purely textual format has an additional historical reason: In early usage, it was common to use a teletype to punch the paper tape that would be sent to Intel to specify the contents of the ROM chip. The teletype mechanism insisted on attempting to print all material sent to it as it punched the paper tape, so Intel avoided using a non-textual object codes because it could cause the teletype to print wild nonsense. Honeywell, in the late 1960's, produced a line of 16-bit minicomputers, notably the DDP-516, that used a paper-tape object code that was very carefully matched to the model 33 Teletype so that all bytes included in a legal object code file would be non-printing and would not cause the teletype to move its print head. This was very clever, but very specific to one specific and now very obsolete print mechanism!

Simple object codes such as have been described in this section are frequently used for one very important purpose: Bootstrapping or cold-starting a computer system.

The term bootstrapping is used because the computer is, in effect, pulling itself up by its own bootstraps (a classic English idiom). The term cold-start is used to indicate that the computer is literally cold when it is started. After it is running, it is warm, and if you restart the machine while it is running, it is called a warm-start. When you do a cold-start, the contents of memory (outside of ROM) is unknown, while for a warm start, because the power is not turned off and on again, some useful data may still be relied on to be in memory.

When a computer is first turned on, the only executable code which can be run is that which resides in read-only memory. Frequently, this code is simply a loader, the bootstrap loader, which reads the operating system in; on modern systems, this is usually from disk or CD-ROM, but some machines bootstrap from the network, and historically, many machines were bootstrapped from magnetic tape, paper tape or punched cards. The bootstrap loader on the Raspberry Pi, introduced in 2012, is in the graphics coprocessor ROM. When the machine starts, the coprocessor forces the CPU to halt while it reads the boot sector from the machine's SD card, and then it lets the CPU begin running. Whether ancient or modern, bootstrap loaders tend to relatively small and realtively simple, using very simple data formats such as Intel Hex code or even simply reading a fixed disk sector into location zero of memory. In many cases, a very simple loader is used to read in a more complex loader which actually loads the operating system.

Historical note: In computers of the 1960's, there was usually no ROM at all. On startup, the machine operator had to enter the bootstrap loader into RAM one bit at a time using toggle switches and pushbuttons on the computer's console. A classic scheme was to enter a program that would read one punched card from the card reader into RAM and then execute the bit pattern read from that card. A similar scheme was used on the MODCOMP-IV computer, where the bootstrap loader was just one machine instruction; when executed from memory location zero after a master-clear or power-up, this instruction would read the contents of sector zero, track zero of disk zero into memory starting at address zero. Once the read finished, the computer would begin executing the data from the block just read.

Relocation

One problem faced by users of simple assembly and loading systems is where to put programs in memory. Up to this point in the discussion, it has been assumed that assembly programs would start at location zero unless the programmer explicitly set a different assembly origin by modifying the location counter. This is clearly not an acceptable solution on large systems where code must be written without any knowledge of where it will eventually be run.

Computer security is improved if an attacker cannot guess what code is where in main memory. One way to help prevent such knowledge is address-space randomization. Instead of allocating memory sequentially, each time a program is loaded, it is loaded in a randomly selected block of free memory.

If the memory address where a program will run is unknown, we need some way to load the code into an arbitrary memory location. This is easy for constant valued code such as data and most machine instructions, but it is hard when some parts of the code contain memory addresses. When the code is loaded in memory, the addresses in the code must be modified to reflect where in memory the code has been loaded. This is called relocation.

A relocatable object code is one where addresses in that code that require relocation are marked. A relocating loader is a loader that is able to locate all relocatable addresses in the code and adjust them as the program is loaded.

If we require the program to be reassembled (or even recompiled) in order to change the location where it will be loaded, we could refer to this as assembly-time relocation, and this view is useful, particularly if the source code was compiled and the assembly code itself is viewed as something like an object code for communication between the compiler and a load and go assembler.

At the other extreme, some machine languages allow a running program to be moved about or unloaded from memory and reloaded in a different location without harm. This is called run-time relocation. Run-time relocation is only possible if consistent use is made of base registers or relative addressing. If the machine code or the user data structures contain absolute memory addresses, moving the code and data to a different address would be very difficult, but if all branch addresses and pointers are relative, that is, expressed as displacements from the first address in the program, or from from the location containing the pointer, we can move the entire block of data holding a program's code and data to a different memory address. Machines that allow programs to be written this way are said to support position independent code.

Virtual memory hardware (to be discussed later) can hide the complexity of position indepent programming from users by using special address translation hardware to perform run-time relocation, but usually, ther term position independent code is reserved for programs which are explicitly coded to be able to be run at any memory address without the need to edit any memory addresses in the code.

Another way of thinking of relocation is in terms of when the objects that make up a program are bound to actual memory locations. We can therefore speak about the binding time of each object in the program. Some objects may be explicitly bound by the programmer. When you declare the global identifier pi to be the floating point constant 3.14159, the binding of pi is done when the program is written.

Some objects may be bound to specific memory locations at compile time, compile-time binding. A compiler may finalize the binding of global variables to particular memory addresses, but this is rare. Usually, the compiler merely determines the relative placement of variables within various records or memory segments. The types and relative locations of variables are bound at assembly time, but not their final placement in memory.

The compiler usually defers binding until link time or even run-time. linkage-time binding occurs for variables where the linker makes the final decision about the memory address. On many systems, it is the linker that makes the final decision about where the code of each subroutine goes in memory and where each global variable is placed.

Finally, with run-time binding, the final location of variables in memory is determined while the program is running. This applies to local variables, where the binding is done as the activation record for a subroutine is pushed on the stack, and it applies for dynamically allocated objects, where the binding is finalized as the object is instantiated.

If binding is done at any time beteen run-time, when it is appropriate to speak of position independent code, and assembly time or compile time, we must modify the object code to allow a distinction between the values that are already determined -- those representing constants and those representing addresses of objects that are already bound to specific memory addresses, and values that are not-yet bound.

The effect of relocating a program is illustrated in Figure 5, under the assumption that words are just 16 bits. Our program contains two labels, and two words in the program refer to those labels. The rest of the "code" is composed of byte and word constants.

Assembly code     Loaded at #0000   Loaded at #0100

L1: B 10          0000: 0A          0100: 0A
    W L2          0001: 05          0101: 05 \relocated
                  0002: 00          0102: 10 /
    W 0           0003: 00          0103: 00
                  0004: 00          0104: 00
L2: W #0100       0005: 00          0105: 00
                  0006: 10          0106: 10
    W L1          0007: 00          0107: 00 \relocated
                  0008: 00          0108: 10 /

Figure 5. The effect of relocating a program.

In the simplest model of relocation, the loader performs the final binding of code and global data to specific memory locations, and as it does so, it adjusts or relocates all of the pointers from one part of the loaded program to another. As should be clear from Figure 5, this adjustment is simple; all that is involved in relocation is adding a constant, the relocation base, to each address which refers to another location in the same block of data. The relocation base is usually the same as the address at which the first byte of the program is loaded. Thus, if the program is loaded at address 0000₁₆, the relocation base is 0 and the addresses within the program are not modified. If the program is loaded at address 0100₁₆, the relocation base is 100₁₆ and this constant is added to each address within the program.

Note that the only words in a program that are relocated are those that are used as addresses. Thus, words which were used as data, op-codes, or other constants must not be modified! such non-relocatable words are referred to as words holding absolute values. The contents of a word loaded in memory provide no hint as to whether or not the word is an address or data, but the assembly language source code makes this quite clear:

This is because the relocaton base will be, when the code is finally loaded in memory, the real initial value of the location counter; the initial value of the location counter at assembly time is a tentative zero, to which the actual relocation base will be added.

The location counter for each byte or word assembled into memory is typically derived from the initial value of the location counter through a series of increment operations (by one for bytes, by two for words, etc). Labels take their values from the location counter, so operands that refer to labels take their values, with one more level of removal, from the location counter.

Note that we speak of the initial value of the location counter and derivation from this initial value. If a new assembly origin is set, this new origin may be absolute if or relocatable depending on the value specified; the default origin is typically relocatable, as are all labels derived from it. If the origin is set, for example, to a simple number, this is absolute. It is occasionally useful to be able to assemble a program which has some non-relocatable parts that will be placed in fixed memory locations while other parts are relocatable. This is especially true when assembling parts of the operating system on machines where input/output device interface registers or interrupt handlers appear as if they were fixed locations in memory. Assembly code which references these must use absolute addresses even when the rest of the program is written to be relocatable.

As a result of the above considerations, four different possible modes of loading can be identified:

In order to take these alternatives into account in an assembler, each symbol table entry must contain a flag indicating whether the associated symbol is relocatable or absolute, and the location counter must be similarly flagged. The flag used for this purpose typically also allows a third alternative, undefined, in order to account for programming errors and the initial values associated with identifiers that have not yet been defined.

When a label is defined, it takes on the current value and type of the location counter. The initial value of the location counter is typically the value "relocatable zero", and the value remains relocatable as it is incremented during assembly. When the assembly origin is changed, however, the new value may be either absolute or relocatable depending on the type of the expression used to set the new origin. (in some assemblers, this also depends on the particular assembly directive used to set the origin; typical directives are ABS to set an absolute assembly origin and REL to set a relocatable origin).

The determination of the type of an assembly time expression used as an operand can be quite complex. The value of an expression will be absolute or relocatable depending on the types of the component operands and depending on the operators. If all operands are absolute, of course, the expression has an absolute value, but if an expression has a relocatable component, the situation is somewhat more complex.

In general, the type of an expression containing relocatable component operands may be determined by substituting for each relocatable operand O a new operand (O' + R) where O' is the absolute value of the operand and R is a variable with an unknown value standing for the relocation base that will be specified when the program is loaded. If algebraic manipulation of the value of the expression can be made to remove all uses of R, the expression is absolute. If algebraic manipulation of the expression can reduce it to a form that is the sum of R and some other value, the result is relocatable. If the expression cannot be algebraically manipulated into one of these forms, it cannot be handled by simple relocation and will be considered to be illegal by all assemblers that use this scheme. This approach is illustrated in Figure 6.

A:         ; A = (A'+R)        = (A')+R       -- relocatable
B = .      ; B = (B'+R)        = (B')+R       -- relocatable
C = A - B  ; C = (A'+R)-(B'+R) = A'-B'        -- absolute
D = A - 5  ; D = (A'+R)-5      = (A'-5)+R     -- relocatable
E = 5 - A  ; E = 5-(A'+R)      = (5-A')-R     -- illegal
F = A + B  ; F = (A'+R)+(B'+R) = (A'+B')+2R   -- illegal
G = A + 5  ; G = (A'+R)+5      = (A'+5)+R     -- relocatable
H = 5 + A  ; H = 5+(A'+R)      = (5+A')+R     -- relocatable

Figure 6. Types of expressions involving relocation.

The simplest relocatable object code is based on the address-value pair code discussed previously. To make such a simple code relocatable, each address and value must be tagged as either absolute or relocatable. This is quite easy on a word-oriented machine, but on a machine allowing addresses to refer to bytes or other fractions of a word, we must face a new problem. Consider what happens if we attempt to relocate a program to address 100₁₆ when a byte contains a relocatable value. Adding 100₁₆ to the initial value in that will do nothing because the maximum value that can be stored in one byte is FF₁₆! As a result, most assemblers forbid any attempt to specify a relocatable value to be stored in a byte or an fraction of a word smaller than the size of a machine address.

To account for these problems in a relocatable object code designed to support our example assembly language, we must tag each value field as either an absolute byte, an absolute word, or a relocatable word, for example, with the tags b, w and r. This is illustrated in Figure 7 using a relocatable object not unlike that from Figure 2 and assuming that the word-size is 16 bits.

Assembly Code

      B 5    ;Store absolute 5 in relocatable 0
      W A    ;Store relocatable 3 in relocatable 1 and 2
    A:W B    ;Store absolute #0103 in relocatable 3 and 4
    . = #0100
      B 5    ;Store absolute 5 in absolute #0100
      W A    ;Store relocatable 3 in absolute #0101 and #0102
    B:W B    ;Store absolute #0103 in absolute #0103 and #0104

Object Code

r0000 b05
r0001 r0003
r0003 w0103
a0100 b05
a0101 r0003
a0103 w0103

Figure 7. Relocatable assembly and a simple relocatable object code.

An loader for this simple tagged object code might be constructed as shown in Figure 8 (coded in Pascal, merely for the sake of variety):

function load( base: integer ): integer {returns the least unused address};
var ta, tv: char  {the tags on the address and value};
    a, v: integer {the address and value};
    high: integer {the highest loaded relocatable location};
begin
    high := base;
    while not(eof(input)) do begin
        read( ta ); readhex( a ); read( tv ); readhex( v );
        readln;
        if ta = 'r' then begin
            a := a + base;
            if a >= high then high := a + 1;
        end;
        if tv = 'r' then v := v + base;
        poke( v mod 256, a ); {plant first byte in memory}
        a := a + 1;
        if tv <> 'b' then begin
            poke( v div 256, a + 1 ); {plant second byte, if needed}
            a := a + 1;
        end;
        if (tv = 'r') and (a > high) then high := a;
    end;
    load := high; { the return value }
end {load};

Figure 8. A loader for the tagged textual object code format.

Pascal should be easy for C programmers to read; the biggest difference is the use of := for assignment and = for comparison and the use of the keywords begin and end instead of curly braces as block delimiters. The Pascal code given in Figure 8 assumes that readhex reads a hexadecimal value into the variable given as a parameter and that it consumes the following delimiter (a space or newline). The Pascal standard includes read; when called with a character variable as a parameter, it reads one character of input. We also assume that poke(v,a) stores the value v in the memory location a.

Historical note: The peek and poke operators seem to have originated in the 8K version of MITS Altair Basic released in 1975. This was written by Bill Gates, Paul Allen and Monte Davidoff and was the very first product produced by a new company named Micro-Soft (that is how it was spelled). A listing of the original 4K version of MITS Basic survives (it appears to have been lost by one of the developers, it was found behind a file cabinet at Harvard); comments in the listing indicate that Bill Gates wrote the run-time parts of the code, suggesting that Bill Gates probably invented the names PEEK and POKE.

This loader expects the relocation base as a parameter; when it finishes, it returns the address immediately after the highest relocatable location it loaded. Thus, the caller learns the size of the memory region used by the relocatable part of the loaded program. This feature will be needed if multiple files are to be loaded consecutively in memory; we will need this to link together multiple object files to make one large program.

In practice, textual object code formats are not common because they make inefficient use of memory and require load-time translation, slowing loading. The ideas in Figures 7 and 8 can be easily applied to a binary object code, using one byte to store the tags indicating address and value types, with additional bytes for the binary address and value. Assuming 16 bit values, each pair needs 40 bits, as illustrated in Figure 9.

bit:  0  1  2        8  --------  23 24  -------  39
      ----------------------------------------------
use: |ta| tv |      |    address    |     value     |
      ----------------------------------------------
address tag  ta = 0  --> absolute address
                  1  --> relocatable address

value tag    tv = 00 --> byte (use only bits 24 to 32 of value)
                  01 --> absolute word
                  11 --> relocatable word
                  10 --> unused, could mark end of file

Figure 9. A binary object code format for tagged address-value pairs.

As with simple absolute object codes, we can easily create a blocked relocatable object code format in order to create a more compact representation. In a blocked relocatable object code, relocation information is typically separated from the data to be loaded. This suggests the use of two basic kinds of block, one to hold data to be loaded, and one to specify which locations already loaded need to be modified by adding the relocation base. Since both load addresses and the specifications of addresses to be relocated may be either absolute or relocatable, we end up needing a total of 4 block types, as illustrated in Figure 10.

Block types:

da - data to be loaded in an absolute address.
dr - data to be loaded in a relocatable address.: Each is followed by a load address, a count of data bytes, and the data.
ra - relocation block for data loaded in absolute addresses.
rr - relocation block for data loaded in relocatable addresses.: Each is followed by a count of addresses and then that many addresses.

Textual representation of the object code for Figure 7:

dr 0000 05 05 03 00 03 01
da 0100 05 05 03 00 03 01
rr 01 0001
ra 01 0101

Figure 10. A blocked relocatable object code.

Relocatable values are fairly uncommon in object files for most modern architectures because most local control structures are handled with position independent code, specifically program-counter relative branches, and because most operands are local variables allocated on the stack or fields of objects dynamically allocated on the heap. If a large fraction of the words loaded in memory are likely to be relocatable, the format suggested in Figure 10 will not be very efficient because of the long lists of addresses that will end up in relocation information blocks (types rr and ra). When this is the case, tagged codes are a better choice, or mixed codes, for example, using a bit vector at the end of each block to mark the addresses that need relocation. Figure 11 illustrates the use of such a code.

Block types:

a - data and relocation for absolute load addresses.
r - data and relocation for relocatable load addresses.: Each is followed by the address of the first data byte to be loaded, followed by the number of bytes to be loaded, followed by that number of bytes and finally a bit vector giving one bit per byte indicating if that byte needs relocation (the bit vector is, of course, is padded to an 8-bit boundary with extra zero bits.

Textual representation of the object code for Figure 7:

R 0000 05 05 03 00 03 01 40
A 0100 05 05 03 00 03 01 40

Decoding of the first line of this code:

R 0000 05 05 03 00 03 01 40
|   |   |  |  |  |  |  |  |_ bit vector = 01000xxx
|   |   | 0| 1| 0| 0| 0|
|   |   |  |__|__|__|__|_ load data
|   |   |
|   |   |______________ count of data bytes
|   |
|___|________________ relocatable starting address

Figure 11. A blocked object code using relocation bit-vectors.

A practical tagged relocatable object code might be constructed as follows. An object file is viewed as a continuous sequence of 8-bit bytes, with no structure such as blocks, lines or records imposed by the underlying hardware or the underlying file system. This is the usual view of binary files in modern operating systems. This stream is divided into logical records of variable length, where the first or tag byte of each record gives the record type (4 bits) and the length of the record (4 bits), as shown in Figure 12.

Record structure of input:

---------------------------------------------------
  ... | tag |     0 to 15 bytes of data    | ...
---------------------------------------------------

Tag structure:

   ttttnnnn
   tttt      - 4 bits of record type.
       nnnn  - 4 bits giving count of data bytes.

Record types:

   0000 - no operation (usually nnnn = 0).
   0001 - load n bytes of absolute data.
   0010 - load n/2 words of relocatable data.
   0011 - set absolute loader origin (usually n = 2).
   0100 - set relocatable loader origin (usually n = 2).

Textual representation of the object code for Figure 7:

42 00 00 11 05 22 03 00 12 03 01 32 00 01 11 05 22 03 00 12 03 01
==------ ==--- ==------ ==------ ==------ ==--- ==------ ==------
                underlining shows record structure

Figure 12. A practical tagged relocatable object code for a 16 bit machine.

This tag structure limits each record to 16 bytes, so for long streams of absolute object code, with no relocatable data, we need only start a new load record every 16 bytes, and we have only one byte of overhead per load record, so our overhead in this case will be about 7 percent. Note that extending this code to work with 32-bit addresses is trivial, and we can have ample room to expand this code with other block types, for example, those we might use to support external symbols.

There are many situations where it is hard to control the value of the first or last byte in a file, for example, on machines that insist on rounding each file out to an even number of disk sectors. In such cases, the system will usually pad the file with zeros, and our selection of the tag value 0 as meaning do nothing with a block that contains no data lets us deal with this quite easily. We can even use non-empty no-op blocks to hold comments, copyright notices, and similar "excess baggage" on our object files.

Note that our loader for this object format must itself maintain a location counter, just like an assembler! Absolute data records are used to load any combination of bytes and words when relocation is not required; each byte loaded advances the loader location counter by one. Relocatable data records are used to load words which need to be relocated; each word loaded advances the loader location counter by two. The loader origin records set the loader location counter, and such records typically only appear in the object code if the the input to the assembler set the origin.

A loader for the object code given in Figure 12 is given in Figure 13, written in C.

char * load( char * base ) { /* returns lowest unused relocatable address */
    char * loc; /* the current location */
    int mode;   /* the assembly mode (REL or ABS) */
    enum modes  { REL, ABS };
    char * high;/* the highest loaded relocatable location */
    int typ;    /* 0 to 15, the type of the current block */
    int count;  /* 0 to 15, the count for the current block */
    int value;  /* a value read from the block */
    int temp;   /* temporary byte read from block */

    /* default initial values */
    loc = base;
    mode = REL;
    high = loc;

    /* read the object code */
    for (;;) {
        temp = getc(stdin);
        if (temp < 0) exit; /* exit loop on end of file */

        /* break first byte of block into type and count */
        typ = temp >> 4;
        count = temp & 0xF;

        switch (typ) {
        case 0: /* ignore */
            while (count > 0) {
                value = getc(stdin);
                count = count - 1;
            }
            break;

        case 1: /* load absolute */
            while (count > 0) {
                value = getc(stdin);
                count = count - 1;
                *loc = value;
                loc = loc + 1;
            }
            break;

        case 2: /* load relocatable */
            while (count > 0) {
                value = getc(stdin); /* first 8 bits */
                temp = getc(stdin);  /* second 8 bits */
                count = count - 2;

                value = value + (temp << 8);
                value = (int)(base + value); /* relocate! */

                *loc = value & 0xFF;
                loc = loc + 1;
                *loc = value >> 8;
                loc = loc + 1;
            }

        case 3: /* set absolute origin */
            {
                value = getc(stdin); /* first 8 bits */
                temp = getc(stdin);  /* second 8 bits */
                loc = (char *)(value + (temp << 8));
                mode = ABS;
            }

        case 4: /* set relocatable origin */
            {
                value = getc(stdin); /* first 8 bits */
                temp = getc(stdin);  /* second 8 bits */
                value = value + (temp << 8);
                loc = base + value;  /* relocate! */
                mode = REL;
            }

        } /* end select */
        if ((mode == REL) && (loc > high)) high = loc;
    }
}

Figure 13. A loader for the object code given in Figure 12.

Note that, as in Figure 8, this loader takes the form of a function which is given a pointer to the starting locaton in which the code is to be loaded and returns a pointer to the first location after the relocatable part of the loaded code. This form of the basic load routine will be important in the upcoming discussion of the linking process!

It should be noted that practical object codes include a mechanism for encoding the starting address of a program in the object file! This is needed because there is no requirement that the first loaded location be the first executable instruction of a program, since some programmers (certainly most C, C++ and Pascal programmers) place data structures and secondary routines before the main program. As a result, once a program is loaded, there must be some way to tell the computer system to jump to the appropriate location in the program to start it running. In most assembly languages, this is indicated by a special assembly directive, for example, START, that takes, as an operand, the starting address, and passes it on to the loader.

In the tagged relocatable code given in Figure 12, starting addresses might be conveyed by two new record types, one for relocatable and one for absolute starting addresses. These new record types defined in Figure 14.

Record types:

   0101 - set absolute starting address (usually n = 2).
   0110 - set relocatable starting address (usually n = 2).

Figure 14. Starting addresses added to record types from Figure 12.

Most modern computer systems have a relocating loader which is the primary tool used when a user requests that some program be run. Thus, when a user types a command like RUN X from the keyboard, or when a user double clicks on the icon for an executable object file X, this requests that the loader be run with the file X as input before control control is transferred to the starting address of the loaded program. There are many variations on this! When good hardware facilities for run-time relocation are available, a simple (non-relocating) loader may be used to run user programs. Other systems require that loading and running a program be separately requested, and there are a broad range of syntactic alternatives in the command language.

When a Unix command interprer (a shell) encounters a command that is not a built-in command, it assumes that the command name is the name of an object file and it attempts to load that object file and run it. There are many different Unix shells, the Bourne shell (sh), the C shell (csh), the "Bourne-again shell" (bash) and others, but all share this characteristic. Most of the common Unix commands are not built into the shell, but rather, are simply the names of object files found in the directory /usr/bin or other directories in the search-path, the list fo directories that the shell checks when it tries to open a file.

Linking

If many different programs which have been translated to object code files are to be loaded in memory, a loader built along the lines shown in Figure 15 could be used.

procedure loader( base: integer );
var next: integer;
begin
    next := base;
    while more-files-to-load do begin
        reset( input, next-file );
        next := load(next);
    end;
end {loader};

Figure 15. A loader for multiple files.

Most high level languages include features allowing items to be declared in one source file and referenced in another. For example, in C or C++, all top-level declarations that are not prefixed by the keyword static are global and may therefore be referenced from any of the source files that are combined to make one program. In Ada, a declaration may be completed with the stub is separate to indicate that the completion of the declaration (a function body, for example) will be found in a different source file that may be compiled at a later time. Most assembly languages include similar tools.

On a Unix system, the command cc a.c does not just compile the file named a.c into the default object file a.out; rather, it compiles a.c into an object file called a.o and then runs ln, the Unix linker to link a.o with the object files for the standard C library routines called from within a.c. This linkage phase can be suppressed with the -c command-line option.

If we wish to combine the object files resulting from the assembly or compilation of different source files, our source programs, whether in assembly language or a high level language, must include directives to specify which objects declared in one file may be referenced from other files, and our object code must include machinery to communicate such external references. The symbolic names used to construct an external reference are called external names; these are the only symbolic names that appear in the object code. By contrast, all other identifier names used in a source program are called internal names because their definitions and uses are internal to the source file in which they occur. Internal names have their values bound at assembly time or at compile time, while external names have their values bound at linkage time or at load time. The linkage-time processing of external names is sometimes called external reference resolution.

A system which supports a linker must have an assembly language notation for external names as well as an object code notation. Typically, this is done in assembly languages by introducing new pseudo-operations which are used to declare and define external symbols. For example, consider adding the following directives to the example assembly language.

Separately assembled source files

;File A
    EXT LAB
    W   LAB

;File B
    INT LAB
LAB =   5

An equivalent single source file

;File AB
    W   LAB
LAB =   5

Figure 16. The EXT and INT directives.

The functions of the EXT and INT directives are almost universally supported by commercially available assembly languages, but a variety of other names have been used. For example, the MACRO-11 assembler uses .GLOBL for both, simply delcaring that the operand is an external symbol without indicating whether the value is to be imported or exported. The assembler for the IBM 360-370-390 series of machines requires that if the name T is to be referenced as an external symbol, it must either be written as V(T), where the V modifier signals that the identifier is external, or it must be declared as an operand of an EXTRN directive. The 360-370-390 assembler always exports the names used as labels on CSECT directives, and in addition, it allows symbols to be explicitly exported using the ENTRY directive.

Most assemblers impose strict limits on the use of externally defined symbols within expressions. Typically, use of a simple imported symbol as a full word operand is always legal, but the application of any arithmetic operations to imported values is forbidden; those assemblers that allow arithmetic on imported values usually limit this the sum of an absolute constant and an external symbol. The reason is that any values used to modify the value of an external symbol must be included in the object code.

In a blocked relocatable object code, it is common to place an internal symbol dictionary, that is, a list of the symbols exported by the object file and their values, in a special block of each object file. A second special block is typically used to list all of the imported symbols and, for each imported symbol, a list of all the memory locations within the block where the associated values must be stored. If we modify the previous sentence slightly, replacing the text "where the associated values must be stored" with "to which the associated values must be added", we can support simple expressions that have, as values, the sum of an imported symbol and a small constant.

The tagged object code format given in Figure 12 and Figure 14 can be extended to support external and internal symbol definitions as is shown in Figure 17.

Record structure of input

------------------------------------------------------------
    ... | tag | v (2 bytes) | name (0 to 13 bytes) | ...
------------------------------------------------------------

Record types:

   0111 - load byte with v plus imported value of name.
   1000 - load word with v plus imported value of name.
   1001 - export v as the absolute value of name.
   1010 - export v plus the relocation base as name.

Figure 17. External symbols added to record types given in Figure 12.

In Figure 17, we add 4 new record types, two to allow references to externally defined symbols (one for each format of reference allowed, in this case, byte and word), and two to allow definition of such symbols (one for absolute definitions, one for relocatable definitions). Note that the record types for external references provide a 16 bit constant in addition to the symbolic reference. This allows limited use of expressions involving external symbols at assembly time; specifically, the set of legal operands includes an external symbol plus or minus an absolute constant.

Recall that, in Figure 14, we suggested a format for specifiying the starting address of a program as part of this example tagged object code format. When we link together multiple object files, only one of them may specify a starting address. The standard Unix linker for C and C++ uses a different approach to finding the starting address of the program: It requires that the name main be exported by only one object file in the set of object files to be linked, and it uses the value of this external symbol as the starting address.

Unix Object Formats

There is no requirement that all Unix-like systems use the same object format, and in fact, Unix has evolved. The original a.out object format was replaced with the System V Unix COFF or Common Object File Format, a format that remains in use on Microsoft Windows. On Unix-like systems, COOF has been largely supplanted by ELF or Executable and Linkable Format, a data format that also has its origins in System V Unix. All Unix object file formats begin with a header block that starts with a "magic number" so that the execve system call and other software can determine how to read what remains. To execve, the prefix #! on a shell script is just another magic number.

All of the Unix object formats have been block structured object codes, with, typically, one block for the code segment (holding the machine instructions and global constants), one block for the static segment (holding the initial values of all global variables), and additional blocks for other segments, if needed. The header on the object file specifies the starting location and size of each segment, relative to the file itself, and also the requirements for allocating each segment in memory. For each segment to be allocated in memory, the size must be specified, and optionally, the address of the segment. If the memory address is not specified, the loader is free to set the address at load-time. Object files may contain additional segments that are not actually loaded. Among these are relocation segments that specifiy which words in some other segment are to be relocated, and symbolic segments, connecting symbolic names to the segment and address within segment of the named objets. These extra segments are used by the linker, and they are used by debuggers.

Implementing Linkers

The processing performed by linkage editors and linking loaders is essentially the same, except that a linkage editor produces object code as its output, while a linking loader produces output directly into main memory. In either case, most values from the input object code can be passed through into the output with either no modification or with simple relocation. Occasionally, however, the input to the loader will define a symbol, exporting it. When this occurs, the linker will enter the symbol into the linker's symbol table. When the value of a symbol is imported, the linker must look it up in its symbol table. The processing of external names by the linker is extermely similar to the processing of label definitions and references in assembly languages. Thus, there are issues of symbol table management which must be addressed, and the problem of forward reference resolution must be solved.

As with assemblers, the forward reference problem in linkers may be solved with both chaining and the two-pass solution. A two-pass assembler or linker first processes the entire body of input without producing any output. The purpose of this pass is to assign values to every symbol. Then, on the second pass, the input is processed in order to produce the output. Chaining is a one-pass solution where symbol-table entries for symbols that have not yet been defined contain the memory address of the most recent location where that value needs to be put. If there are multiple locations that need the value of the same symbol, they are chained together with each location containing the address of the previous location needing the same value, until the final location in the chain is reached.

The two-pass solution is more general. Chaining cannot handle code that requires a location to hold the sum of an external symbol and a constant offset, and chaining cannot handle code that includes the value of an external symbol stored in a location smaller than a pointer. These features are rare enough that many linking loaders use chaining; on systems with such loaders, assembly languages typically forbid to forbid expressions from incorporating the names of external symbols and forbid storing single bytes in memory with externally specified values. Some linking loaders create their output as a memory image in a random-access output file; these may also use chaining.

It is extremely common for object codes to impose severe limits on the number of characters allowed in an external name. The example in in Figure 17, where the limit was 14 characters, is very typical. In fact, it is generous! The ANSI C standard, for example, only guarantees that the linker will pay attention to the first 6 characters of external names. It would be perfectly reasonable to consider an object code format that limited external names to such a small size to be archaic, but unfortunately, many modern linkers retain such limits!

These limits are a legacy of a data formats known as SQUOZE and Radix-50 These schemes date back to the era of 36-bit computers, punched cards and punched paper tape. In that era, main memory costs were high, on the order of a penny a bit, and any trick allowing more information to be packed into a smaller space was extremely valuable. These encodings allowed things like packing 3 characters into 16 bits (or 18, with 2 tag bits), and they allowed 6 characters to be packed into a 36-bit word. This packing was only possible because the alphabet was limited to upper-case letters, digits, and a few punctuation marks.

It is worth noting that, no matter how symbols are represented, linkage-time symbol tables are frequently fairly small. Only a few of the symbols used in the typical source program are external, and as a result, many linkers use linear search to manage their symbol tables.

In order to facilitate debugging of large programs, many linkers produce an output called a load map. This is frequently a simple dump of the linker symbol table, sorted by the numeric value of the symbols in the table. Given a load map, the debugger (whether human or a software tool) can determine what PC values correspond to what external functions of the program, and which operand references refer to external variables. This use of the load map in debugging motivates many programmers to make all functions in their source program external, so that the debugger can give useful output when the program fails. This is the default in C and C++, and the standard Unix object code formats all allow the load map to be preserved as part of the object file after linking.

When a linkage editor is used, it is common to have it generate relocatable object code as output. As with assemblers which generate relocatable object code, this requires that the linkage editor's location counter be initialized to relocatable zero. The relocation base for the first object file being linked must also be set to relocatable zero, and the relocation base for each successive object file being linked must be set to the maximum relocatable value taken on by the linkage editor's location counter in linking the previous file.

The linkers for many early computers did not perform the complete linkage suggested by the description given above. The assumption was frequently made that the only use of external symbols desired was in the address fields of procedure call instructions. Thus, as long as the linker modified these fields in such a way that the desired procedure was called, the programmer would be satisfied. One common approach was for the linker to insert indirect references to the called procedures in the address field. Whenever a linker using this approach encounters an external reference, it replaces this with an indirect reference through a data structure called the transfer vector. The transfer vector is simply a list of pointers to the entry points of each externally callable procedure. Whenever the linker encounters the actual definition of one of these entry points, it fills in the appropriate entry in the transfer vector.

Object Code Libraries

Most linkers provide one important facility in addition to the linking function: They allow selective loading from object code libraries. Thus, instead of requiring the user to explicitly enumerate to the loader all of the object files to be loaded, the user is only required to specify a main program and one or more libraries. The loader then searches these libraries for object files containing definitions of the required external symbols and loads these along with the main program. Such an object library is a required adjunct to many higher level languages which include large sets of predefined functions or procedures; for example, on Unix, the standard C library, libc is always included when a C program is linked, in addition to any secondary libraries the user may specify.

On some systems, object libraries have a special representation which requires a new systems program, the object library manager. On classic Unix systems, for example, the ar library manager is used to combine multiple .o object files into one .a archive or object library file that the loader can search for the files it needs. Such library managers were particularly important in the era of magnetic tape, since it was obviously desirable to store the entire object library on a single tape. Today, this holdover from that era remains in common use even though today's file systems are extraordinarily good at storing huge numbers of small files.

In many ways, object libraries are not very different from the directories of a file system. Each object file is simply entered in the directory for the library under the name or names that are exported by that object file. There is absolutely nothing to prevent a standard file system's directory structure from being used for this purpose, but despite this, most systems in current use today introduce a new data format for libraries.

When a linker or loader is searching a library, it uses the list of currently undefined symbols in its symbol table as a "shopping list". When an object file is found which includes a definition of one of these symbols, that file is loaded. Frequently, this introduces more undefined external symbols, since the loaded routine may itself contain references to other routines. This suggests that the linker must have two modes of operation: A search mode and a load mode. Some older linkers actually search the entire contents of an object library, making a list of the files to be loaded before making a second pass to do the actual loading. This is not needed, since a two pass assembler can arrange to place all internal symbol definitions at the head of the object file. The linker can then examine these and either switch to load mode or skip the file depending on what names are found.

When object files are stored on random access devices, a more extreme approach is sometimes taken: The directory for the object library is extended to include, for each object file, how much relocatable storage it will require, what external symbols it references, and which internal symbols it defines. This allows a two pass loader to make its first pass over the directory, without ever examining the actual object files until the second pass, at which time, all external symbols will already be defined.

The order in which object files are included in an object library is sometimes critical. The reason for this is that external references from a file included towards the end of the library which refer to symbols defined in a file included earlier in the library cannot be properly resolved if each object file in the library is examined only once. Most object libraries can be ordered in such a way that all external references from any file in the library refer to symbols defined in later files, but finding such an order can be difficult if an object file editor does not do it automatically. An alternate solution is to have the linker make multiple passes over the object library until either there are no more undefined external symbols or no files are found which provide the needed definitions.

Overlay Linkers

On machines where the available address space is relatively small compared to the size of the programs which users desire to run, a special kind of linkage editor called an overlay linker is sometimes used. Overlay linkers were developed in the mid 1960's, and they were used extensively in the early 1970's when many computers were being sold with memory capacities as small as 12K bytes, and when many machines had maximum memory capacities of 48K bytes. An overlay linker both resolves the external references in the various object files being linked and segments the program into overlays. Today, in the era of 32 bit addresses and huge multi-megabyte virtual memories, overlay linkers are something of a historical curiosity.

An overlay is an object file containing part of the code for a program which is to be loaded sometime after the rest of he program is loaded. Programs made of multiple overlays can run in small main memories because many different overlays may be loaded in the same region of main memory, overlaying each other. Of course, before calling a routine that is part of some overlay, the calling code must make sure that that overlay is loaded in memory!

Typically, an overlay linker first builds a data structure which describes the set of object files to be loaded, the amount of memory required by each, the internal symbols defined by each, the external symbols referenced by each, and which object file holds the starting address. Given this, the linker tries to construct an overlay tree. The root of the overlay tree is the object file containing the main program; the main program must be linked conventionally with every object file containing names directly referenced from the main program by any form of reference other than a subroutine (procedure or function) call.

The children of the root consist of object files contianing subroutines called from any that was linked into the root of the overlay tree, and the grandchildren are the object files containing subroutines called by the children. The tree building phase of an overlay linker will not always build an interesting tree. Sometimes, it will conclude that all of the object files to be linked must be combined into the root!

If the tree is bigger than just a root node, then each call from within one node of the tree to an external name residing in a child of that node is replaced with a call to a special load-on-call stub. This stub first checks to see if the overlay containing the desired procedure is currently loaded. If so, it passes control to the subroutine; if not, it loads the overlay and then passes control to the subroutine. Calls from a routine defined in an overlay to a routine in the parent of that overlay can proceed directly because the actual memory allocation is arranged in such a way that the parent overlay is guaranteed to be in memory whenever any of its children are loaded. Calls from within one overlay to routines defined in one of its sibling overlays are not usually allowed.

Terminology

The reader should be familiar with the following general terms after reading this section.

load and go assembly            absolute values
object code                     absolute locations
object file                     starting addresses
object module                   program linkage
loaders                         linkage editor
blocked object codes            linking loader
tagged object codes             external names
relocation                      internal names
assembly-time relocation        linkage-time
run-time relocation             external reference resolution
relocatable values              internal symbol dictionary
relocatable locations           object libraries
transfer vectors                overlay linkers
bootstrap loaders               bootstrapping

Exercises

References

Nobody seems to have thought much about object file formats for a very long time. When an existing system is ported to a new architecture, the general approach is to make the fewest possible modifications to the object format used by that system, and when users start development of entirely new systems, they tend to simply copy the object formats that were used on the system to which they had the most access. The result is a world where tradition dominates and where very little engineering thought has been applied for a very long time.

Assemblers and Loaders, 3rd ed. by D. W. Barron, North Holland, 1978.

P. Calingaert's text, Assemblers, Compilers, and Program Translation by P. Calingaert, Computer Science Press, 1979.

System Software by L. L. Beck,

Systems Programming by J. J. Donovan, McGraw Hill, 1972.

Linkers have been around since the early days of the Fortran language, and it is instructive to look back to some of the early writing on the subject. The paper by McCarthy, Corbato and Daggett in Chapter 5D of

Programming Systems and Languages edited by S. Rosen, McGraw Hill, 1967

"Linkers and Loaders" by L. Presser and J. White, in ACM Computing Surveys 4, 3 (Sept. 1972).

The papers "Assembly Language as Object Code" by D. W. Jones and "A Machine Independent Linker" by C. W. Fraser and D. R. Hanson, both in

Lecture 7, Object Codes, Loaders and Linkers

Final steps on the road to machine code

The Problem

Object Code Formats

Relocation

Linking

Unix Object Formats

Implementing Linkers

Object Code Libraries

Overlay Linkers

Terminology

Exercises

References