Lecture 19, Expression Values

Part of the notes for CS:4980:1
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Context

We define subrange types as follows, where the expressions have constant values:

<subrange> ::= <expression> ".." <expression>

Therefore, the code within the compiler that parses type definitions must be able to evaluate expressions.

When the compiler evaluates an expression, however, it must be able to represent the type of the expression. Therefore, the code within the compiler that parses expressions must understand types.

This kind of circular relationship is easy to deal with when all that is involved is local code. The problem is more difficult here because we have essential relationships between the interface definitions of two large classes, the class representing types, and the class representing values of expressions.

Inter-file dependencies

Assuming that our compiler has grown large, we will be dealing with at least 4 files here:

Clearly, expression.cpp will include expression.h, and type.cpp will include type.h. This include relationship is the normal relationship between the implementation file and its interface specification.

Clearly, expression.cpp will include type.h, because every expression has a type, and the natural way to include the link from the value of an expression to its type is to include the interface specificaiton for types.

Similarly, type.cpp will include expression.h, because the code to compile a type definition must be able to evaluate the expressions that are part of that type definition. Including header files from within implementation files is not generally a problem.

Unfortunately, it is tempting to have type.h include expression.h, because types reference values, and it is tempting to have expression.h include type.h, because expressions have types.

In C and C++, if a pair of header files each include the other, this sends the preprocessor into an infinite loop.

Kestrel (and most similar languages) contain several similar cycles. In Kestrel, for example, blocks contain statements, and statements may include blocks.

Header files in an object-oriented setting

Consider the problem of compiling expressions. In an object oriented setting, compiling an expression should return an object with all of the attributes you expect for an expression. The interface specification for class expression might look like this:

// expression.h interface definition for expression compiler

// the usual author, date, etc

// the user must first include type.h and environment.h

#ifundef EXTERN
	#define EXTERN extern
#endif

// put any enumerations, etc that users will need here

class Expression {
public:
	static Expression * compile( Environment * e );
	// factory method, compiles an expression

	Type * type; // every expression has a type
}

#undef EXTERN

The implementation file must contain the definitions of all of the methods in the form of a class declaration. Following our established convention for the structure of a header file, the result will look something like this:

// expression.cpp implementation for expression compiler

// the usual author, date, etc

#include "type.h"
#include "environment.h"

#define EXTERN
#include "expression.h"

static Expression * Expression::compile( Environment * e ) {
	// factory method, compiles an expression
}

Note something interesting happened. Our established convention for header files does not put #include directives in header files, instead, we have a comment saying what the caller must have included before including this header file. If we follow this rule consistently, we eliminate the possibility of an infinite recursive cascade of include files.

What about subexpressions? Terms, factors and the like all have exactly the same semantic attributes as expressions. They can be subclasses of Expression, but since they have no new attributes, they can be the same class, just different factory methods.

We have not solved the problem!

In the above code, we wrote this:

#include "type.h"
#include "environment.h"

What if our definition of class Type depends in some way on class Environment and class Environment, in turn, depends on class Type?

There are several ways to deal with such problems, all of which involve dividing at least one of the above include files into two files. For example, we can divide environment.h:

#include "environ.h"
#include "type.h"
#include "environment.h"

Here, environ.h contains all the definitions needed by type.h, but the circular interdependency is created by environment.h (which also depends on environ.h.

So what goes n environ.h? There are several ways of dealing with this.

In any case, the include file environment.h should document its dependency on environ.h, and every file that includes environment.h should first include environ.h.

Generally, you don't want to split any header file carelessly in anticipation that a split will be needed. So, don't rush to split environ.h until you actually discover yourself creating a circular dependency that requires such a split.

Sometimes, you may be tempted to simply put a forward declaration into a context, for example, instead of including environ.h you could just put a forward declaration such as class Environment; before including type.h. This is a bad idea. Mixing code with include directives works, but it is not very readable. It is better to clearly document inter-include-file dependencies in the header comments of the include files and then put the forward declarations in the prerequisite include file.

Back to the original problem

In our code, every expression will have a type, but not every type will involve expressions. Subrange types are bounded by values taken from expressions, but array types, record types, set types, etc. If class SubrangeType is a subclass of class Type, then we can easily declare Environment in terms of class Type without ever needing to know that there is such a thing as class SubrangeType.

What we must do, however, is put the typekind enumeration in type.h in order to allow any user of types to inquire what kind of type they have.

So, the proper ordering of the include directives (ignoring the non-problematic kinds of types) will be:

#include "type.h"
#include "environment.h"
#include "subrangetype.h"

In conclusion, what we discover here is that it is natural to use the subclass approach to solving the problem instead forward declarations, at least in this case. It may well be that forward declarations make more sense elsewhere.