Lecture 18, Identifier Bindings

Part of the notes for CS:4980:1
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Context

In our discussion of blocks in Lecture 16, we suggested that a call to the compile_declaration function, or perhaps declaration.compile() method, would look something like this,

e = compile_declaration( e );

Here, e is an environment. The declaration takes an environment that gives it the bindings of identifiers to attributes, and then it returns an updated environment that contains all the bindings of the old environment augmented with the new bindings established by that declaration. For example,

color: type ( red, blue, green );

This declaration creates the following new bindings:

color, a new type, specifically an enumerated type, where color.max = color.min + 2. because the enumeration has 3 elements.
red, a new constant of type color with the value color.min.
blue, a new constant of type color with the value color.min + 1.
green, a new constant of type color with the value color.min + 2.

What is a binding?

The kestrel definition gives the following definition for <declaration>

<declaration> ::= <identifier> ":" [ "private" | "restricted" ] <declarator>
<declarator> ::= <constant declarator>
              |  <type declarator>
              |  <exception declarator>
              |  <variable declarator>
              |  <procedure declarator>
              |  <function declarator>

The accompanying text states that bindings that are not declared to be private or restricted are public. This immediately suggests that if b is a binding, there should be a field to declare the visibility of that binding. We might call it b.access since it determines who can access this binding. The values allowed in this field can be defined as an enumeration:

typedef enum {ISPUBLIC, ISPRIVATE, ISRESTRICTED} accessibility;

The IS prefix on each of the above enumeration values is because the identifiers public and private are keywords in C++ and mere use of upper case is not always enough to make the reader see them as distinct from these keywords. In Kestrel, definitions are public unless marked otherwise, so our set of keywords is different.

The next problem posed by the grammar for declarators given above is to distinguish between the different kinds of declarators. There are two sensible ways of doing this, one that makes sense in an object-oriented setting. The simple-minded approach is to define two new fields, We might call them b.kind and b.value. Of these, b.kind would take its value from an enumeration, for example:

typedef enum {ISCONST, ISTYPE, ISEXCEPT, ISVAR, ISPROC, ISFUND } declarator_kind;

The second field, b.value would be an untyped pointer to the appropriate object. If b.kind is isconst, then b.value would be a pointer to a constant object. If b.kind is ISTYPE, b.value would be a pointer to a type object, and so on. This is how the job would be done in compilers written without use of object-oriented programming languages.

In an object-oriented language like C++, we can use the class hierarchy, with the field b.kind holding the handle for an object of class Declarator. Class Declarator would be an abstract class, with concrete subclasses Const_decl, Type_decl, Except_decl, Var_decl, Proc_decl, and Func_decl. If we were using Java, we could use the instanceof operator to see if some declarator was a member of one of these subclasses.

One of the extensions to C++ has a mechanism called dynamic casting that can be used to do this, but it is just as easy to add a kind virtual method to class Declarator that returns a value of type Declarator_kind. The kind method of class Const_decl, for example, would return isconst. In summary, in C++, we'd use something like the following:

class Declarator {
public:
	virtual declarator_kind kind();
}

Type

The Kestrel definition says:

<type declarator> ::= "type" <type>
                   |  "type" "-"
<type> ::= <reference>
        |  <enumeration>
        |  <subrange>
        |  <pointer>
        |  <array>
        |  <set>
        |  <record>

Here we discover that, if we ignore the - (dash) option, a type declarator is just a type. The purpose of the dash option is to allow this type to remain undefined, so that an identifier can be defined as a type identifier without finishing the declaration until later. This is useful in the context of circular definitions where several types each depend on the other, for example, in the construction of complex data structures.

If we ignore the dash, then, the objects returned by parsing a reference, an enumeration, a subrange, a pointer, an array, a set, or a record are all merely subclasses of type declarator, just as type declarator is a subclass of declarator. We can handle the dash by simply leaving a null handle in the data structure. Reducing this to code, we get:

class Type_decl: public Declarator {
public:
	virtual type_kinds type_kind();
}
Type_decl::kind() { return ISTYPE; }

In the above, we gave a function body one virtual function while we declared a new virtual function. We still cannot create instances of this class, but only subclasses that provide more detail.

Let's look briefly at the different kinds of of types we need to deal with:

word: type int32
A type may be defined by reference to another type. In the example, the identifier word is just a synonym for the existing type int32. No new type object is created here.
color: type (red, green, blue)
A type may be defined by an enumeration. In the above example, A new base type, a subrange of itself from the value of red (probably represented by zero) to the value of red+2 is created, along with 3 new constants of that type. This means that there is no need for objects of class enumeration as representations of enumerated types. All we need are subranges.
ten: type 1 .. 10
A type may be defined by a subrange. In the above example, a new type is created which is a subrange of a larger integer type, with specific bounds, each of which is a constant of that larger type.
treelink: type @treenode
Pointer types refer to objects of other types. In the above example, the new type treelink is defined as representing pointers to objects of type treenode. Every variable of type treelink is either null or it points to a treenode.
a: array ten of treelink
Array types name indexed collections of other types. In the above example, every object of type a is an array of 10 objects of type treelink, that is, pointers to objects of type treenode, indexed from 1 to 10.
s: set of ten
Set types name collections of values of the indicated types. In the above example, every variable of type s is a set containing values in the range from 1 to 10.
r: record ar: a se: s end
Record types name objects that may have multiple fields. In the above example, every variable of type r is a record containing two fields, ar (a 10 element array) and se (a set).

Subrange Types

Subrange types are, perhaps, the simplest kind of type supported by Kestrel. These are used for integers and characters.

<subrange> ::= <expression> ".." <expression>

The expressions must have constant values, which is to say, the compiler must be able to evaluate the expressions at compile time.

When the compiler evaluates an expression, it must have some idea of the type of the expression. Integer expressions have integer types, for example. The two expressions must have values that are of the same underlying type, and they must be in order, with the left expression less than or equal to the right expression. These give the lower and upper bounds on the subrange.

This means that a subrange has a lower bound, an upper bound, and an underlying type as its value, leading to something like the following.

class Subrange: public Type_decl {
public:
	virtual type_kinds type_kind();
	Subrange * base_type;
	int32_t min;
	int32_t max;
}
Subrange::type_kind() { return ISSUBRANGE; }

In the above, we require that the base type of every subrange type be itself a subrange type, and we require that the value of any subrange type be representable as a signed 32-bit integer. In Kestrel, all finite scalar types, that is, enumerated and subrange types, must be representable as some kind of integer. The choice here, to use int32_t, has serious consequences. It prevents us from representing unsigned 32-bit values. Thus, if we use this definition, we can't support the Kestrel type uint32. To support that type, the compiler needs at least 33 bits of 2's complement precision internally. For now, we'll avoid this issue, but in general, if a compiler supports a language that permits both signed and unsigned integers represented in n bits, the compiler itself needs at least n+1 bits in its internal integer representation.

Now, let's examine the data structure that must be bound to the type int32 by in our compiler. We can create this type as follows:

Subrange * kestrel_int32 = new Subrange;
kestrel_int32->base_type = kestrel_int32;
kestrel_int32->min = 0x80000000;
kestrel_int32->max = 0x7FFFFFFF;

Here, we've created a new subrange object that has itself as its own base type. This way, it has the same base type as any subranges derived from it, so that comparison and other operators all remain legal between any integer types.

The expression analyzer should return the value kestrel_int32 as the type of any integer-valued expression, along with the value of that expression, either a constant (of type int32_t) or an indication of where that value is stored.

The Boolean type poses additonal problems. We might define this as follows:

Subrange * kestrel_bool = new Subrange;
kestrel_bool->base_type = kestrel_bool;
kestrel_bool->min = 0;
kestrel_bool->max = 1;
Constant * kestrel_false = new Constant;
kestrel_true->base_type = kestrel_bool;
kestrel_true->value = 0;
Constant * kestrel_true = new Constant;
kestrel_true->base_type = kestrel_bool;
kestrel_true->value = 1;

Here, we've created internal names for the types and their constants, but we haven't bound those names to identifiers in the environment. The initializer for the standard environment will contain numerous bits of code such as the above, plus the code to create the bindings and enter them into the environment.