Lecture 21, Statements

Part of the notes for 22C:196:002 (CS:4908:0002)
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

A Bit of Humor

Some dark corners of C

The Grammar, as Given

The Falcon grammar, as given, defines statements as follows:

<statement> ::= <if>
             |  <case>
             |  <loop>
             |  <handler>
             |  <raise>
             |  <procedure call>
             |  <assignment>

Deciding which branch to take is trivial for all but the last two cases, because each alternative begins with a keyword: If statements begin with if. Case statements begin with select. Loop statements begin with either while or do or for. Exception handler statements begin with try and the raise statement begins with raise. The problem we face is to distinguish betwee procedures and assignment stateents.

In point of fact, there is no syntactic distinction between these that can be exploited by a recursive descent parser. Here is the grammar, as given, for procedure calls and assignment statements:

<procedure call> ::= <reference> [ <actuals> ]
<assignment> ::= <reference> "=" <expression>
<reference> ::= <identifier>
             |  <reference> "@"
             |  <reference> "." <identifier>
             |  <reference> <constructor>
<actuals> ::= "(" <expression> { "," <expression> } ")"
           |  "[" <expression> { "," <expression> } "]"
           |  "{" <expression> { "," <expression> } "}"
<constructor> ::= "(" <expression> { [ "," ] <expression> } ")"
               |  "[" <expression> { [ "," ] <expression> } "]"
               |  "{" <expression> { [ "," ] <espression> } "}"

Note that there is no syntactic distinction between an actual parameter list and a constructor, suggesting that, from a syntactic point of view, they are the same. Furthermore, as references may include a constructor, the presence of an actual parameter list following the reference is not obviously distinct from the inclusion of the parameter list as part of the reference.

The biggest problem here, however, is that procedure calls and assignment statements both begin with references -- and references are of unbounded length. Therefore, from a syntactic point of view, what we really have is an entangled thing that might best be described as follows:

<call or assignment> ::= <reference> [ "=" <expression> ]
<reference> ::= <identifier>
             |  <reference> "@"
             |  <reference> "." <identifier>
             |  <reference> <constructor>

Evaluating References

The grammar above for references reselbles a grammar for expressions, but the values in question are of a different sort from the values of expressons. Specifically, references have values that are memory addresses.

<reference> ::= <identifier>

If the identifier is the name of an object declared in the current scope, a compiler for a simple non-optimized stack-based machine would push the address of that object on the stack when it encounters this.

<reference> ::= <reference> "@"

On seeing an @ following a reference, the compiler should check that the type of the object on the stack top is "reference to a pointer variable", and issue machine code to replace the reference on the stack top with the value of the pointer variable -- treating this as a reference to the object the pointer points to.

<reference> ::= <reference> "." <identifier>

On seeing a dot following a reference, the compiler should check that the type of the object referenced on the stack top is "reference to a block". Having done this, it should interpret the identifier in the scope of that block, and if the identifier names an object defined in that block, the compiler should issue machine code to replace the reference on the stack top with a reference to the named object in that block. (This will typically involve adding a displacement to the reference on the stack top).

<reference> ::= <reference> <constructor>

There are two cases here. If the reference on the stack top names an array object, the constructor should be evaluated to given an array subscript, and the compiler should issue code to do array indexing -- perform bounds checking if the bounds on the array index fall outside the bounds of the array and then multiply the subscript by the size of the array element and add to the address of element zero of the array.

If the reference on the stack top names a procedure or function, the the constructor is used to construct the parameter list for that procedure or function, and then the compiler emits code to call it. In the case of a procedure, the result is that there is no longer any type of reference on the stack top, so this must be the end, and any further construction is invalidated, on semantic grounds -- this use is legal as a procedure call but not within an expression or on the left-hand side of an assignment. In the case of a function, if the return value is a pointer, this becomes a reference on the stack top and evaluation may continue. If the return value is not a pointer, this must be the end of the reference -- this use is legal within an expression but not on the left-hand side of an assignment.