Lecture 21, Statements

Part of the notes for 22C:196:002 (CS:4908:0002)
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

A Bit of Humor

Some dark corners of C

The Grammar, as Given

The Kestrel grammar, as given, defines statements as follows:

<statement> ::= <if>
             |  <case>
             |  <loop>
             |  <handler>
             |  <raise>
             |  <procedure call>
             |  <assignment>

Deciding which branch to take is trivial for all but the last two cases, because each alternative begins with a keyword: If statements begin with if. Case statements begin with select. Loop statements begin with either while or do or for. Exception handler statements begin with try and the raise statement begins with raise. The problem we face is to distinguish betwee procedures and assignment stateents.

In point of fact, there is no syntactic distinction between these that can be exploited by a recursive descent parser. Here is the grammar, as given, for procedure calls and assignment statements:

<procedure call> ::= <reference>
<assignment> ::= <reference> "=" <expression>
<reference> ::= <identifier>
             |  <reference> "@"
             |  <reference> "." <identifier>
             |  <reference> <constructor>
<constructor> ::= "(" <expression> { [ "," ] <expression> } ")"
               |  "[" <expression> { [ "," ] <expression> } "]"
               |  "{" <expression> { [ "," ] <espression> } "}"

Note that the actual parameter list of a procedure call is subsumed under the syntactic category of constructors. These are also used for array subscripting and for function calls on the left-hand side of an assignment statement (for example, functions that return pointers to the object being assigned to).

References can be quite simple: i, a simple variable. Or they can be very complex: a(i)@.m(p), where a is an array subscripted by i, the array elements being pointers to objects where m is a method that takes parameter p. The reference in the last example is a procedure call because the method did not return a value.

Evaluating References

The complexity of references is comparable to the complexity of expressions, and in fact, a reference may be considered to be a kind of expression. The difference is that an expression has a value that is an element of some type, while a reference has a value that names an object. The value of a reference is like a pointer to the object, except that references cannot be assigned to pointers. (This is not the case in C and C++, and it is one of the reasons that C and C++ are not safe programming languages.) Let's look at the grammar for references in more detail:

<reference> ::= <identifier>

If the identifier is the name of an object declared in the current scope, a compiler for a simple non-optimized stack-based machine would push the address of that object on the stack when it encounters this.

<reference> ::= <reference> "@"

On seeing an @ following a reference, the compiler should check that the type of the object on the stack top is "reference to a pointer variable", and issue machine code to replace the reference on the stack top with the value of the pointer variable -- treating this as a reference to the object the pointer points to.

<reference> ::= <reference> "." <identifier>

On seeing a dot following a reference, the compiler should check that the type of the object referenced on the stack top is "reference to a block" (recall that Kestrel uses blocks for what you are used to calling objects). Having done this, it should interpret the identifier in the scope of that block, and if the identifier names an object defined in that block, the compiler should issue machine code to replace the reference on the stack top with a reference to the named object in that block. (This will typically involve adding a displacement to the reference on the stack top).

<reference> ::= <reference> <constructor>

There are two cases here. If the reference on the stack top names an array object, the constructor should be evaluated to give an array subscript, and the compiler should issue code to do array indexing -- perform bounds checking if the bounds on the array index fall outside the bounds of the array and then multiply the subscript by the size of the array element and add to the address of element zero of the array.

If the reference on the stack top names a procedure or function, the the constructor is used to construct the parameter list for that procedure or function, and then the compiler emits code to call it. In the case of a procedure, the result is that there is no longer any type of reference on the stack top, so this must be the end, and any further construction is invalidated, on semantic grounds -- this use is legal as a procedure call but not within an expression or on the left-hand side of an assignment. In the case of a function, if the return value is a pointer, this becomes a reference on the stack top and evaluation may continue. If the return value is not a pointer, this must be the end of the reference -- this use is legal within an expression but not on the left-hand side of an assignment.