In modern high-level languages that support an object oriented programming model, programmers are discouraged from writing in terms of pointers to records and instead encouraged to declare classes, class instances and methods. Consider the following binary tree node, declared in Simula 67, the original object oriented programming language:
class TREE; begin ref (TREE) left; ref (TREE) right; int value; procedure traverse; begin if not(left = none) then left.traverse; print(value); if not(right = none) then right.traverse; end traverse; procedure setval(v) int v; begin value := v; end setval; comment - initialization code; left :- none; right :- none; v := 0; end TREE;Note that in Simula 67, class declarations resembles record declarations, with fields of the record becoming instance variables of the class, and with procedures (or functions) declared inside the record becoming methods of the class. Executable statements in the class declaration make up the code of the initializer for instances of that class. The operator := is assignment, while :- is the pointer assignment operation.
Given the above declarations, the following bit of code would be legal:
ref (TREE) a; comment - a is an uninitialized pointer to a tree; a :- new TREE; comment - allocate and initialize an instance; a.setval(5); a.traverse;Calls to methods of a class instance are written as if they were references to components of a record. Thus, a.setval(5) calls the setval method of instance a of class TREE.
The problem we face is how to translate this kind of code to machine code! For simple object oriented programming, the answer is straightforward. Each class is just a record, and the methods and initializers of that class are procedures and functions. As with other high level programming problems, the first step to reducing object oriented code to assembly language has nothing to do with assembly language, but instead, it involves rewriting the high level language program in that languagei, or at least, in a language at a similar level, to avoid the features that don't translate easily to assembly language.
So, the first step is to "de-objectify" the object oriented code, converting it to simple use of procedures and functions. It is convenient, but not strictly necessary, to adopt a systematic naming scheme for the resulting new types and routines. The basic rules we will follow are:
class TREErep; begin ref (TREErep) left; ref (TREErep) right; int value; end TREErep;This declares a structure that is exactly analogous to a Pascal record or C struct, and that conveys no more information about the use of this structure than the corresponding C or Pascal declarations would. The remaining rules above allow us to replace the methods of the class with the following procedures and functions:
procedure TREEtraverse(t) ref (TREErep) t; begin if not(t.left = none) then TREEtraverse(t.left); print(value); if not(t.right = none) then TREEtraverse(t.right); end TREEtraverse; procedure TREEsetval(t,v) ref (TREErep) t; int v; begin t.value := v; end TREEsetval; ref (TREErep) procedure TREEinit; begin ref (TREErep) n; n :- new TREErep; n.left :- none; n.right :- none; n.v := 0; TREEinit :- n; end TREEinit;Note that in Simula 67, as in Pascal, values are returned from functions by assignment to the function name. Thus, the final statement above, TREEinit:-n, would be written as return(n) in C or FORTRAN.
Given the above rewritten declarations, the example bits of code used to illustrate references to the original Simula 67 fragment would be rewritten as follows:
ref (TREErep) a; comment - a is an uninitialized pointer to a tree; a :- TREEinit; comment - allocate and initialize an instance; TREEsetval(a,5); TREEtraverse(a);
The rules for deobjectification given here are completely sufficient for converting object oriented programs into conventional programs so long as the different classes involved are unrelated. In such cases, there is nothing special about object orientation. In fact, some say that the object structures illustrated above are nothing but syntactic sugar, since there is nothing that can be written in a language incorporating simple classes and methods that cannot be rewritten in terms of procedures, functions, records and pointers to records.
More generally, the term syntactic sugar refers to any language feature that is there to help make programs easier to read but contributes nothing in particular to the expressive power of the programming language. Clearly, comments are syntactic sugar, but so are for loops in C and Pascal! After all, there is essentially nothing that can be programmed with a for loop that can't also be programmed with a while and a few auxiliary statements.
Class hierarchies are not mere syntactic sugar! Programs containing class hierarchies cannot be trivially converted to conventional procedural programs; For example, such programs cannot be converted trivially to Pascal without the need to introduce significant extra code! Such programs can, however, be converted to both C and assembly language without adding executable code.
To demonstrate this, we need an example. Consider the problem of representing parse trees of expressions. The parse tree of an expression is a tree with leaf nodes for each operand and internal nodes for each operator, where the tree structure is used in liu of parentheses to indicate which operators are applied in what order. This is illustrated by the following example:
An expression: (1 + 2 * 3) / 4 The parse tree: (/) / \ (+) 4 / \ 1 (*) / \ 2 3There are clearly two kinds of nodes in a parse tree, leaf nodes, each with a value, and internal nodes, each of which carries an operator, a left pointer, and a right pointer. Both external and internal nodes are kinds of parse trees, so we would like to define them as kinds of parse tree nodes. Thus, we say that parse trees nodes are an example of a polymorphic (many shaped) type.
Furthermore, we would like to be able to perform operations on parse trees; specifically, we would like to be able to print the expression corresponding to a tree, and we would like to be able to evaluate the expression corresponding to a tree. To print a leaf node, we output the value of the node. To print an internal node, we print a left paren, print the left subtree, print the operator, and print the right subtree. Thus, depending on the node type, the print operations will be very different.
To handle polymorphic types, Simula 67 and other object oriented programming languages allow classes to be derived from other classes. We start by declaring the base class:
class NODE; begin virtual: procedure print; virtual: integer procedure evaluate; end NODE;The keyword virtual before a procedure declaration for a method of a class says that no definition for that method will be given, but that such a method must be given if the class is ever to be used. Thus above code says that all nodes have methods allowing them to print themselves and a method allowing them to evaluate themselves. This declaration does not provide any code for these methods, but just says that all instances of nodes must provide such code. We must introduce new classes that expand on NODE to actually define these methods. The two we need are given below:
NODE class LEAF; begin int value; procedure print; begin write(value); end print; procedure evaluate; begin evaluate := value; end evaluate; end LEAF; NODE class INTERNAL begin ref (NODE) left; char op; ref (NODE) right; procedure print; begin write('('); left.print; write(op); right.print; write(')'); end print; procedure evaluate; begin if op = '+' then evaluate := left.evaluate + right.evaluate else if op = '-'; evaluate := left.evaluate - right.evaluate else if op = '*'; evaluate := left.evaluate * right.evaluate else if op = '/'; evaluate := left.evaluate * right.evaluate else error; end evaluate; end INTERNAL;Given such a set of declarations, how can they be translated to assembly language? Again, we will seek a high level language solution as an intermediate step, but first, we must seek a conceptual solution.
Consider the code fragment right.evaluate used in the expressions in the second version of evaluate above. The name right refers to a variable which is a pointer to a NODE, but there are two kinds of node, INTERNAL and EXTERNAL. This means that, in a de-objectified version of this code, right.evaluate must determine which of two procedures is to be called. In Pascal, perhaps the best that can be imagined is replacing the call to right.evaluate with something like this:
if right.kind = internal then INTERNALevaluate(right) else EXTERNALevaluate(right);This assumes that NODErep is a variant record (a record with multiple representations) with an added tag field, kind, used to determine which representation, is being used. The problem is, this makes something that is easy to write into something expensive to implement.
In C or assembly language, another option is open. We can replace each "virtual method" of a class with a pointer to the procedure being used. Thus, we would declare three structures:
type struct NODErep { void (*print)(NODErep *); int (*evaluate)(NODErep *); } NODErep; type struct LEAFrep { /* this is a subclass of NODErep */ void (*print)(NODErep *); int (*evaluate)(NODErep *); /* added fields */ int value; } LEAFrep; type struct INTERNALrep { /* this is a subclass of NODErep */ void (*print)(NODErep *); int (*evaluate)(NODErep *); /* added fields */ NODErep * left; char op; NODErep * right; } INTERNALrep; type union NODErep { LEAFrep; INTERNALrep; } NODErep;Note, here, that we have been very careful to replicate all fields of NODErep exactly in the structures used to represent each subclass! The only fields common to all subclasses are the pointers to the print and evaluate functions, each of which takes a single instance as a parameter.
Given these routines, we can apply the same de-objectification rules discussed earlier, giving the following procedures:
void LEAFprint(NODErep * self) { printf( "%d", ((LEAFrep *)self)->value ); } int LEAFevaluate(NODErep * self) { return ((LEAFrep *)self)->value; } NODErep * LEAFinit() { NODErep * self = (NODErep *)malloc(sizeof(LEAFrep)); self->print = LEAFprint; self->evaluate = LEAFevaluate; return self; } void INTERNALprint(NODErep * self) { printf( "(" ); (*((INTERNALrep *)self)->left->print) (((INTERNALrep *)self)->left)); printf( "%c", ((INTERNALrep *)self)->op ); (*((INTERNALrep *)self)->right->print) (((INTERNALrep *)self)->right)); printf( ")" ); } int INTERNALevaluate(NODErep * self) { int leftval = (*((INTERNALrep *)self)->left->evaluate) (((INTERNALrep *)self)->left)); int rightval = (*((INTERNALrep *)self)->right->evaluate) (((INTERNALrep *)self)->right)); return leftval + rightval; } NODErep * INTERNALinit() { NODErep * self = (NODErep *)malloc(sizeof(LEAFrep)); self->print = INTERNALprint; self->evaluate = INTERNALevaluate; return self; }The worst thing about translations of object oriented code to C is the extensive use of casting needed! Casting in C is indicated by the prefix of a parenthesized type name. For example, the (NODErep *) used as a prefix to the call to malloc in INTERNALinit forces the pointer returned to be treated as a pointer to a structure of type NODErep, conforming to the type of the variable self and to the type to be returned by the function.
Formally, such casting is needed everywhere that one of our pointers changes type, and the type changes every time we move from, for example, a parameter of type NODErep, as required by the procedures type declarations, to a value of type LEAFrep or INTERNALrep, as needed to reference the fields of some specific item.
We can make the code slightly easier to read by introducing an extra variable. For example, we can rewrite the evaluate procedure for internal node evaluation as follows:
int INTERNALevaluate(NODErep * self) { INTERNALrep * myself = (INTERNALrep *) self; int leftval = (*myself->left->evaluate)(myself->left); int rightval = (*myself->right->evaluate)(myself->right); return leftval + rightval; }In assembly language, all of the casts dissapear, and there is no need for an extra variable to improve readability. The only thing we need to worry about is how to initialize a field of a record to point to a procedure and how to follow that pointer to get to a particular procedure.
Consider a procedure with the label INTERNALprint. If R3 is a pointer to a record with a field called print that should point to INTERNALprint, we can make it do so as follows:
LEA R5,INTERNALprint STORE R5,R3,printGiven that R3 points to a record containing a field called print that points to a procedure, that procedure can be called as follows:
LOAD R1,R4,print JSR R1,R1If the print procedure is intended as a method of a class, and if the record pointed to by R3 is the representation of an object of that class, the pointer to the record must be passed as a parameter, so using R3 is an excellent choice!