Lecture 25, Dynamic Semantics

Part of the notes for CS:4908:0002 (22C:196:004)
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Variables

If we are only concerned about bindings of constants and types, the only attributes of identifiers that matter are their assembly time binding. The picture is more complex for variables. Consider this Kestrel program fragment:

x: type record
    a: const 5;
    b: type a .. a+3;
    c: var b;
    d: type record
        e: var b;
        f: var b;
        e = c;
        f = e;
    end;
    g: var d;
    h: var d;
end;
y: var x;
z: var x;

Looking at the above code, we see a record type d of which there are two instances, g and h. The initializer code for the type d includes references to local variables, that is, to the fields e and f, and it includes a reference to a variable, c that is declared in the enclosing record x. This means that the variable c is itself not a fixed location. At the outer level, our code creates two variables, y and z, and any particular reference to c in this code will either refer to y.c or to z.c, depending on the run-time context.

What does this imply about the atteributes bound to each identifier at compile time? Clearly, within the block d, the identifiers e and f are not simply bound to static memory addresses. Rather, when the declaration g: var d is elaborated, e refers to g.e, while when the declaration h:d is elaborated, e refers to h.e, a different memory location.

Therefore, at compile time, the first attribute of every variable binding is the displacement of that variable into an instance of some block. In the case of the identifier c in the above example, C is a displacement into the record x.

Notice that the enclosing block above is not global. It is itself a record declaration, and there are multiple instances of this block, y and z. Therefore, when evaluating c in the context of the assignment statement e=c above, the variable to which c refers is not statically determined.

In the object y, we can speak of y.g.e, and during the initialization of y.g, the assignment e=c really means y.g.e = y.c. Later, during the initialization of z.g, the assignment e=c means z.g.e = z.c.

Up-Level Addressing

This is called the up-level addressing problem. Languages like C eliminate this problem by forcing "flat programming", where all variables are either global (with static addresses) or local to a single subroutine. In C, you can nest blocks within a subroutine, but these are anonymous blocks that are (in Kestrel terms) extensions of their enclosing block, not first-class blocks.

When C was extended to make C++, a third level was added. At the outer level are static objects and variables, allocated at fixed memory addresses. Class definitions introduce a middle level, the fields of an instance of the class, and the local variables of the methods of a class are the inner level. While this is a 3-level nesting, no general mechanism is involved. Instead, C++ compilers treat the three levels as distinct unrelated special cases. The rules for declaring items at each of these levels are different and strictly constrained. While the definition int x; is allowed at any of these levels, other declarations are strictly limited to prevent generalized nesting and therefore avoid creating the up-level addressing problem.

This is not true in languages like Algol 60, Pascal and Ada, where declarations of all kinds are generally legal in any context. In these languages, up-level addressing is routinely possible.

How is up-level addressing done? The answer is surprisingly simple: By default, unless an optimizer determines that there is no need for it, every instance of every block, whether it is a record variable or the activation record of a procedure or function, contains an implicitly defined pointer to the block instance of the enclosing block. When a block is instantiated, the first and entirely implicit parameter to the block's initializer is the pointer to the enclosing block. This is not a pointer to the block that asked for instantiation, it is a pointer to the block instance where the instantiator was found.

In many object oriented languages, there is a special identifier, self, that is a pointer to the current object. Consider the initializer for the record d in our introductory example:

d: type record
    e: var b;
    f: var b;
    e = c;
    f = e;
end;

In the above, we could rewrite the second assignment statement using explicit self-references as:

self@.f = self@.e;

The first assignment requires up-level addressing. If we name the implicitly declared pointer found in each block up, we can rewrite the first assignment as one or the other of the following:

self@.e = up@.c;

self@.e = self@.up@.c;

The above rewrites are not intended to be real Kestrel code, because Kestrel does not let users directly play with the self and up pointers. Nonetheless, these pointers exist inside the computer. At run-time, the self pointer is invariably stored in an index register whenever fields of an object are referenced, and when up-level addressing is required, the up-pointer is also loaded in an index register, at least temporarily.

The second version emphasizes that all variables in the activation records of a subroutine (including the initializer for a record variable) are referenced relative to the start of that activation record. In effect, the special variable self is the name for the register in the CPU that points to the activation record; in some coding models, this is the stack pointer, in others, the stack pointer and activation record pointers are distinct. Do not imagine using self@.self@.self.

Initializers

In the same spirit that led us to explicitly name the self and up pointers, we can also explicitly construct the initializer function that is an implicit attribute of every record. Consider, again, our introductory example:

x: type record
    a: const 5;
    b: type a .. a+3;
    c: var b;
    d: type record
        e: var b;
        f: var b;
        e = c;
        f = e;
    end;
    g: var d;
    h: var d;
end;
y: var x;
z: var x;

It contains several distinct things: First, it contains declarations for variables that are fields of the records. Second, it contains the code of the initializers for those records, and finally, it contains an implicit outer record type and associated code. We can separate these as follows:

global: type record
    up: @void;

    x: type record
        up: var @global;
        a: const 5;
        b: type a .. a+3;
        c: var b;
        d: type record
            up: var @x;
            e: var b;
            f: var b;
        end;
        g: var d;
        h: var d;
    
        initd: procedure( self: @d, up: @x )
            self@.up = up;
            self@.e = up@.c;
            self@.f = self@.e;
        end;
    end;

    initx: procedure( self: @x, up: @global )
        self@.up = up;
        x.initd( self@.g, self );
        x.initd( self@.h, self );
    end;

    y: var x;
    z: var x;
end;

main: procedure( self: @global, up: @void )
    self@.up = up;
    global.initx( self@.y, self );   
    global.initx( self@.z, self );   
end;

When we do this, we are forbidden to name any local variables in the initializers -- that is because the activation record of the initializer is actually the record pointed to by the self pointer. Self must be passed by the caller, since it is the caller that actually allocated the instance of the type d that needs initializing.

Note that we are being a bit informal here. The parameter passing modes being used for the initializers are a bit suspect, and the nesting of initializers within record definitions is meaningless because we have no implicit scope rules any more -- all variables are referenced relative to chains of pointers that start with self.