Lecture 25, Dynamic Semantics

Part of the notes for 22C:196:002 (CS:4908:0002)
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Variables

If we are only concerned about bindings of constants and types, the only attributes of identifiers that matter are their assembly time binding. The picture is more complex for variables. Consider this Falcon program fragment:

type
    x = record
        const
            a = 5;
        type
            b = a .. a+3;
        var
            c: b;
        type
            d = record
                var
                    e: b;
                    f: b;
                code
                    e = c;
                    f = e;
                end;
        var
            g: d;
            h: d;
        end;
var
    y: x;
    z: x;

Looking at the above code, we see a record type d of which there are two instances, g and h. The initializer code for the type d includes references to local variables, that is, to the fields e and f, and it includes a reference to a variable, c that is declared in the enclosing record x.

What does this imply about the atteributes bound to each identifier at compile time? Clearly, within the block d, the identifiers e and f are not simply bound to static memory addresses. Rather, when the declaration g:d is elaborated, e refers to g.e, while when the declaration h:d is elaborated, e refers to h.e, a different memory location.

Therefore, at compile time, the first attribute of every variable binding is the displacement of that variable into an instance of the block. This means, therefore, that the identifier c is also bound to a displacement into instances of the record x in the above example.

Notice that the enclosing block above is not global. It is itself a record declaration, and there are multiple instances of this block, y and z. Therefore, when evaluating c in the context of the assignment statement e=c above, the variable to which c refers is not statically determined.

In the object y, we can speak of y.g.e, and during the initialization of y.g.e, the assignment e=c really means y.g.e = y.c. Later, during the initialization of during the initialization of z.g.e, the assignment e=c means z.g.e = z.c.

Up-Level Addressing

This is called the up-level addressing problem. Languages like C eliminate this problem by forcing "flat programming", where all variables are either global (with static addresses) or local to a single subroutine. In C, you can nest blocks within a subroutine, but these are anonymous blocks that are (in Falcon terms) extensions of their enclosing block, not first-class blocks.

In the extension of C to C++, there are three levels. At the outer level are static objects and variables, allocated at fixed memory addresses. These may be nested in class definitions, that introduce a middle level, and methods of a class are the inner level. Methods have local variables, classes have instance variables, and there are global variables. The rules for declaring items at each of these levels are strictly constrained. While the definition int x; is allowed at any of these levels, other declarations are strictly limited to prevent generalized nesting and therefore avoid creating the up-level addressing problem.

This is not true in languages like Algol 60, Pascal and Ada, where declarations of all kinds are generally legal at any point. In these languages, up-level addressing is routinely possible.

How is it done? The answer is surprisingly simple: By default, unless an optimizer determines that there is no need for it, every instance of every block contains a pointer to the block instance of the enclosing block. When a block is instantiated, the first and entirely implicit parameter to the block's initializer is the pointer to the enclosing block. This is not a pointer to the block that asked for instantiation, it is a pointer to the block instance where the instantiator was found.

In many object oriented languages, there is a special identifier, self, that is a pointer to the current object. Consider the initializer for the record d:

code
    e = c;
    f = e;
end;

In the above, we could rewrite the second assignment as self@.f = self@.e;. The second assignment requires up-level addressing. If we name the implicitly declared pointer found in each block up, we can rewrite the first assignment as: self@.e = up@.c; or even as self@.e = self@.up@.c; (but please don't imagine self@.self@.self).

The above rewrites are not intended to be real Falcon code, because Falcon does not let users directly play with the self and up pointers. Nonetheless, these pointers exist inside the computer. At run-time, the self pointer is invariably stored in an index register whenever fields of an object are referenced, and when up-level addressing is required, the up-pointer is also loaded in an index register, at least temporarily.