Lecture 38, External Linkage

Part of the notes for 22C:196:002 (CS:4908:0002)
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

External Subroutines

Falcon allows subroutine definitions such as:

procedure putchar( c:char, f:file ) external;
function  getchar( f:file ):char external;

What does external mean here? The definiton says that it means that the function definition is provided by code outside the current program. In the case of putchar and getchar this could be code in the C standard library.

To allow such a call, an ideal Falcon compiler would generate code entirely compatable with the standard C calling and linkage conventions when it encounters an external definition. In to see the required code, compile the following fragment of C code with the -S option and examine the resulting assembly language:

#include <stdio.h>
int main() {
    char ch;
    ch = getchar( stdin );
    putchar( ch, stdout );

This raises two questions: That of external global variables (stdin and stdout) and that of generating subroutine calls that are fully C compatable.

In the above discussion, note the phrase "an ideal Falcon compiler". There is an alternative: Instead of making the Falcon compiler generate calling sequences that are C compatable, make it generate standard Falcon calling sequences, just as it would to call a normal non-external subroutine. The only thing the external keyword would do in this case is mark the function identifier so that the linker can link it.

In this case, if the internal calling sequence used by Falcon is not compatable with the external calling sequence used by C, you will need to write an assembly language adapter that can be called from code generated by your Falcon compiler. The adapter can then call the corresponding C subroutine with a C-compatable calling sequence. This is the kind of thing you might have to do if you write a resolutely stack-based code generator that makes no real use of registers, passing parameters by pushing them on the stack when the C compiler expects to make sensible use of registers for parameter passing.

External Variables

The Falcon manual says:

Private variables and functions declared in the block making up a Falcon program are purely local to that program. Public variables and functions are visible to the linker. With care, this permits linkage with variables that are declared in external code with which Falcon programs are linked. Notably, this allows linkage with variables declared in support libraries written in languages like C.

This implies that all public variables in the outermost block of a Falcon program must be announced by the compiler to the linker, in just the same way that global variables in C (excepting those marked with the static keyword) are announced to the linker.

Again, as with external subroutines, you can experiment with the -S linker option to see how the C compiler deals with global variables.

Calling Sequences (Reprise)

The discussion of external subroutines motivates a brief review of some key terms:

Calling sequence
The sequence of instructions that is used to call a subroutine. In its narrowest sense, this involves the call instruction itself, plus any instructions involved in setting up and taking down the activation record of the called routine or saving and restoring the activation record of the called routine. In a broader sense, all instrucitons involved in passing parameters (but not those involved in computing their values) are also part of the calling sequence.

Receiving sequence
The sequence of instructions that is used on entry to a subroutine. This sequence receives control from the calling sequence and finishes the job of saving the activation record of the calling routine and setting up the activation record of the called subroutine.

Return sequence
The sequence of instructions that is used at the end of a subroutine to return to the caller. This sequence starts the job of deallocating the activation record of the called subroutine and restoring the activation record of the calling routine, then transfers control back to the calling sequence, where this job is completed.

It is extremely important to note that there are two essential jobs here, the call, which saves the calling context and creates the called context, and the return, which deallocates the called context and restores the calling context.

The complete job of the call may be divided arbitrarily between the head end of the calling sequence and the receiving sequence, and the complete job of the return may be similarly divided between the return sequence and the tail end of the calling sequence.

For internal procedures -- those where the compiler has full control of the calling sequence, the receiving sequence and the return sequence, the compiler writer is entirely free to decide how to divide up the work. For linkage to external procedures, the compiler must generate calling, receiving and return sequences that are compatible with the interfaces used elsewhere.

Fortran: A Really Bad Calling Sequence

Just so you can see how programming language features can back the compiler writer into some really awkward places, consider classical Fortran as it was specified in the 1950s and a typical implementation from the 1960s. The manual said that if you had a subroutine that expected a number of parameters, a call that passed fewer parameters was legal so long as there had previously been a call that passed all of the parameters. Thus, after calling s(1,2,3), it was legal to call s(4). The manual defined the latter as being equivalent to s(4,2,3). That is, the parameters that you omitted defaulted to their values from the previous call.

Fortran also passed all parameters by reference. That is, the formal parameter was a reference to the storage location of the actual parameter. Where the actual parameter was an expression, the reference was to a memory location holding the value of that expression.

The net result of the above was that all formal parameters were statically allocated (equivalent to static local variables in C and C++), thus making recursion problematic, at best. Here is the code generated by a typical Fortran compiler I used (with a bit of back-translation to ARM code):

     BAL   rl, F     @ the control transfer
     .word 1         @ parameter count
     .word A         @ address of parameter 1
     .word B         @ address of parameter 2

F:   BAL   r1, FPA$$ @ call parameter/argument resolver
     .WORD 2         @ expected number of arguments
     .word X         @ address of first argument
     .word Y         @ address of second

The subroutine FPA$$ (dollar signs were considered to be alphanumeric and served in the same way that underscores serve in modern C), was the "Fortran Parameter Argument" resolver, responsible for copying the addresses of the actual parameters into the addresses reserved to hold the formal parameters -- the Fortran term for formal parameters was argument. In effect, FPA$$ was an interpreter responsible for passing those parameters that were provided while leaving old values intact to serve as defaults.

Old Fortran hands knew that subroutine calls were expensive because of this interpreter, so they avoided what they saw as excessive use of calls. Programmers who learned languages like C and Pascal first, with their relatively nimble calling sequences, were sometimes derided for using so many calls in their Fortran programs that their code would better be described as having been written in "Calltran."

A General Recommendation about Register Use

Keep in mind the three rules of optimization:

  1. Don't optimize.
  2. If you must optimize, do it later.
  3. Only optimize the bottlenecks, the things that really matter to performance.

In the context of the initial stages of a compiler construction project, this suggests the following general approach to register usage.

Version 1: Be stupid
Set aside one register, R3, as the accumulator that holds the stack top. Keep all variables on the stack, use R4 as the second operand, transiently, and ignore all the other registers.

Version 1.5: Marginally less stupid.
Don't use a run-time stack. Instead, each push allocates an anonymous local variable, and each pop deallocates it. You can still use R3 to hold the top item on the stack, and R4 as a temporary register, but now, you never increment or decrement the stack pointer, and instead use indexed addressing based on the frame pointer for all memory references.

Version 2: Optimize register allocation in basic blocks.
At the start and end of each basic block, put everything back in RAM, but within each basic block, use registers as a cache (with LRU replacement) for the local variables (including the anonymous locals that were used in Version 1.5).

Version 3: Inter-block optimization.
This is harder, obviously.