33. Program Transformations

Part of CS:2820 Object Oriented Software Development Notes, Fall 2015
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

Introduction

No programmer needs objects. Programming languages prior to Simula 67 did not have object-oriented features, and many later languages were designed without them. Languages without built-in objects include Algol, BASIC, BCPL, C, COBOL, FORTRAN, and Pascal, although some of these have had object-oriented features grafted on as an afterthought.

The most famous of these afterthoughts is C++, which is basically C with object-oriented features grafted on. Bjarne Stroustrup, a Danish computer scientist, learned to program in Simula 67 before coming to the United States to work in the Unix group at Bell Labs in the late 1970s. The Unix group had developed the C programming language, and as a matter of policy, all of their work was in C. Stroustrup wanted the object-oriented features of Simula 67, so he wrote a preprocessor that added object-oriented features to C; the input language to this preprocessor became C++.

The fact that C++ can be translated to C is a strong hint. Anything you can do in an object-oriented programming language can also be done in a language that lacks classes, objects, and methods. This should not be too much of a surprise, since our computer hardware has none of these.

Eliminating non-static methods

In general, all non-static methods can be eliminated from an object-oriented program. Consider this little (nonsense) code fragment:

class C {
        private int f; // some field of the object
        public void m( int g ) { // method m
                f = g;
        }
}

// somewhere else in the program
        C o;    // o is an object of class C
        o.m(5); // apply the method C.m to o

We can always rewrite this code as:

class C {
        private int f; // some field of the object
        public static void m( C o, int g ) { // method m
                o.f = g;
        }
}

// somewhere else in the program
        C o;      // o is an object of class C
        C.m(o, 5); // apply the method C.m to o

This transformation always works. In fact, this is the central thing that the original preprocessor for C++ did when it converted object-oriented C++ code to non-object-oriented C code.
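A minimal runnable sketch of this transformation follows. The class names Before and After, the accessor get, and the demo class are hypothetical names added for illustration; they are not part of the original fragment.

```java
// Object-oriented form: m is an instance method with an implicit "this".
class Before {
    private int f; // some field of the object

    public void m(int g) {
        f = g;
    }

    public int get() { // hypothetical accessor, added so the result is visible
        return f;
    }
}

// Transformed form: the implicit "this" becomes the explicit parameter o.
class After {
    private int f; // some field of the object

    public static void m(After o, int g) {
        o.f = g;
    }

    public static int get(After o) { // same accessor, same transformation
        return o.f;
    }
}

class MethodEliminationDemo {
    public static void main(String[] args) {
        Before b = new Before();
        b.m(5);        // instance-method call

        After a = new After();
        After.m(a, 5); // equivalent static call

        System.out.println(b.get() == After.get(a)); // prints "true"
    }
}
```

Both objects end up holding the same value; only the syntax of the call has changed.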

Note that if all methods are static, you can no longer claim to be writing object-oriented code. Your classes, such as they are, are merely data organizers, with related subroutines (that do not return values) or functions (that do return values). Static methods with names like C.m() above are really just global subroutines that could have names like C_m() if Java allowed them. (Note that C++ does allow exactly this, if you want to ignore the object-oriented features of C++ and program in it as if it were plain C.)

No method needs more than one parameter

There are a number of other transformations that we can apply. For example, we never really need multiple parameters for any method. Consider this little (nonsense) code fragment:

class C {
        private int f; // some field of the object
        public void m( int g, int h ) { // method m
                f = g + h;
        }
}

// somewhere else in the program
        C o;       // o is an object of class C
        o.m(5, 6); // apply the method C.m to o

We can always rewrite this code as:

class C {
        private int f; // some field of the object
        public int g;  // a parameter to method m
        public void m( int h ) { // method m
                f = g + h;
        }
}

// somewhere else in the program
        C o;       // o is an object of class C
        o.g = 5;
        o.m(6);    // apply the method C.m to o

We have made no changes to how the program does its job; we have just changed how parameters are passed to the method m. We can even make the field g private if we add a new method to set g:

class C {
        private int f; // some field of the object
        private int g; // a parameter to method m
        public void passG( int newg ) { // parameter passing to method m
                g = newg;
        }
        public void m( int h ) { // method m
                f = g + h;
        }
}

// somewhere else in the program
        C o;       // o is an object of class C
        o.passG(5); // pass the first parameter to method m
        o.m(6);    // apply the method C.m to o

Note. For static methods, the same transformation applies, but the new fields you add for passing the extra parameters need to be static fields.
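The static case can be sketched the same way. This is only an illustration under assumed names: the class StaticParams, the accessor getF, and the demo class are hypothetical and do not appear in the fragments above.

```java
class StaticParams {
    private static int f; // result of the last call to m
    private static int g; // static field carrying the extra parameter

    public static void passG(int newg) { // parameter passing to method m
        g = newg;
    }

    public static void m(int h) { // method m, now with one parameter
        f = g + h;
    }

    public static int getF() { // hypothetical accessor for the demo
        return f;
    }
}

class StaticParamsDemo {
    public static void main(String[] args) {
        StaticParams.passG(5); // was the first argument of m(5, 6)
        StaticParams.m(6);
        System.out.println(StaticParams.getF()); // prints 11
    }
}
```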

Eliminating multiple parameters may seem stupid, but it has a real role. In Intro to Psych, when I took it years ago, one of the things I learned about is the magic number 7 plus or minus 2. Numerous experiments have shown that people's short-term memory has a capacity of about 7 items. Psychologists call them chunks because the size of an item is ill-defined. If you read people a sequence of random digits such as 3710583, and then hand them a pencil and ask them to write the sequence down, most people will be able to do that for sequences of about 7 digits. On the other hand, if you read an Iowa City resident a number such as 3193350740, 10 digits long, many will be able to remember it, particularly if you pause in the right places, reading it as 319 335 0740. That is because 319 is just one chunk, the area code of Iowa City, and 335 is another chunk, the standard prefix on all University of Iowa academic phone numbers.

How does this apply to programming? Consider this method call:

	a.b(c,d,e,f,g);

This call presents about 7 chunks to keep track of: the object, the method, and five parameters. For methods with fewer parameters, you can most likely juggle them in your head, but for methods with more parameters, you most likely need to write things down, keeping a manual page in one window while you struggle to get things right in another. In this context, it starts making good sense to transform the code to something like this:

	a.setup(c,d);
	a.b(e,f,g);

This makes the most sense when the setup code sets options and values that will remain the same through multiple calls, while the second call has parameters that tend to be different each time they are used. For example, a typical text display subsystem on a graphics computer screen has a logical operation something like the following:

	window.putchar( x, y, font, color, angle, char );

Calling this operation for each character would be extremely cumbersome, and in any case, we usually put out a number of characters all with the same attributes, so we rework the interface to look like this:

	window.settextattributes( font, color, angle );
	window.putchar( x, y, char );

And then, we make an additional change, having the window also retain the current screen coordinates and update them by the width of each character as it is displayed, so we end up with this:

	window.settextattributes( font, color, angle );
	window.setat( x, y );
	window.putchar( char );

Now, we can set the attributes for a whole block of text, and then set the starting location of a line, output all of the characters in that line, and then set the starting location of the next line. We made no change to the underlying algorithm for displaying a character; all we did was change the architecture of the interface to that algorithm.
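The reworked interface might be sketched in Java as follows. Everything here is an assumption made for illustration: the class TextWindow, the camel-case method names, the fixed character width, the string-valued font and color, and the drawn list that stands in for a real screen. The private draw method plays the role of the original six-parameter operation.

```java
import java.util.ArrayList;
import java.util.List;

class TextWindow {
    private static final int CHAR_WIDTH = 8; // assumed fixed-width font

    // retained state that would otherwise be passed on every call
    private String font = "default";
    private String color = "black";
    private int angle = 0;
    private int x = 0;
    private int y = 0;

    // stand-in for the screen: one entry per character displayed
    final List<String> drawn = new ArrayList<>();

    // the underlying six-parameter operation, unchanged by the new interface
    private void draw(int x, int y, String font, String color, int angle, char ch) {
        drawn.add(ch + "@" + x + "," + y + " " + font + " " + color + " " + angle);
    }

    public void setTextAttributes(String font, String color, int angle) {
        this.font = font;
        this.color = color;
        this.angle = angle;
    }

    public void setAt(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public void putChar(char ch) {
        draw(x, y, font, color, angle, ch);
        x += CHAR_WIDTH; // advance horizontally; this sketch ignores the angle here
    }
}

class TextWindowDemo {
    public static void main(String[] args) {
        TextWindow window = new TextWindow();
        window.setTextAttributes("serif", "blue", 0);
        window.setAt(10, 20);
        for (char ch : "Hi".toCharArray()) {
            window.putChar(ch); // one parameter per character
        }
        window.drawn.forEach(System.out::println);
    }
}
```

Each call to putChar still reaches draw with all six values; the window simply remembers five of them between calls.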

Initializers

We don't need initializers (called constructors in Java's own terminology), or rather, we don't need initializers other than the simplest of default initializers. Consider this class:

class C {
        int i = 5;
        float f = 7.5f;
        SomeClass sc;
        C() {
                sc = new SomeClass( null );
        }
        C( SomeClass sc ) {
                this.sc = sc;
        }
        ... other methods ...
}

First note that the above code mixes in-line initializations of the fields i and f with initializers that handle the field sc. We can move the in-line initialization code into each of the many initializers a class may define. For example:

class C {
        int i;
        float f;
        SomeClass sc;
        C() {
                i = 5;
                f = 7.5f;
                sc = new SomeClass( null );
        }
        C( SomeClass sc ) {
                i = 5;
                f = 7.5f;
                this.sc = sc;
        }
        ... other methods ...
}

Second, we can replace all initializers with factory methods, that is, methods that manufacture and return instances of the class:

class C {
        int i;
        float f;
        SomeClass sc;
        static C newC() {
                C nC = new C(); // uses only the default initializer
                nC.i = 5;
                nC.f = 7.5f;
                nC.sc = new SomeClass( null );
                return nC;
        }
        static C newC( SomeClass sc ) {
                C nC = new C();
                nC.i = 5;
                nC.f = 7.5f;
                nC.sc = sc;
                return nC;
        }
        ... other methods ...
}

Now, elsewhere in the code, anywhere we used to write new C() or new C(x), we now write C.newC() or C.newC(x). The factory methods must be static so that they can be called before any instance of C exists.
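Here is one way the call sites might look once the factory methods are in place. This sketch assumes the factory methods are declared static, so that they can be called without an existing instance of C. The body of SomeClass and the demo class are hypothetical, added only to make the example self-contained.

```java
class SomeClass { // hypothetical stand-in for the SomeClass used in these notes
    final Object arg;

    SomeClass(Object arg) {
        this.arg = arg;
    }
}

class C {
    int i;
    float f;
    SomeClass sc;

    // static, so that callers need no existing C in order to make one
    static C newC() {
        C nC = new C(); // the only initializer left is the default one
        nC.i = 5;
        nC.f = 7.5f;
        nC.sc = new SomeClass(null);
        return nC;
    }

    static C newC(SomeClass sc) {
        C nC = new C();
        nC.i = 5;
        nC.f = 7.5f;
        nC.sc = sc;
        return nC;
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        C a = C.newC();                   // was: new C()
        SomeClass x = new SomeClass("x");
        C b = C.newC(x);                  // was: new C(x)
        System.out.println(a.i + " " + a.f + " " + (b.sc == x)); // prints "5 7.5 true"
    }
}
```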

Eliminating private fields

The keywords private and protected are never needed in an object-oriented program. So long as there are no naming conflicts -- that is, other variables with the same name as a private field -- we can make all fields public. If there are naming conflicts, we need to change the names of the private or protected fields to be unique, for example, by adding a prefix, but this does not change the meaning of the program in any way.

All access rights restrictions such as private and protected serve just one purpose, to prevent careless programmers from making mistakes. These declarations have an impact on the development process, and they have an impact on the documentation of the program, but they have no impact on program execution. In effect, if a variable is declared as private, you could just as well declare it as public and add a comment documenting the access restriction you wish the compiler would enforce.
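As a small illustration, the two classes below behave identically at run time. The names Counter, OpenCounter, and AccessDemo are hypothetical, and the class-name prefix on the field is just one possible way to avoid naming conflicts.

```java
// Original: the compiler enforces the restriction.
class Counter {
    private int count;

    public void increment() {
        count++;
    }

    public int value() {
        return count;
    }
}

// Transformed: the restriction survives only as documentation.
class OpenCounter {
    public int OpenCounter_count; // PRIVATE by convention; prefix avoids name clashes

    public void increment() {
        OpenCounter_count++;
    }

    public int value() {
        return OpenCounter_count;
    }
}

class AccessDemo {
    public static void main(String[] args) {
        Counter a = new Counter();
        OpenCounter b = new OpenCounter();
        for (int i = 0; i < 3; i++) {
            a.increment();
            b.increment();
        }
        System.out.println(a.value() + " " + b.value()); // prints "3 3"
    }
}
```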

Eliminating final variables

The keyword final can always be deleted from a program. If you declare a variable to be final, and the program compiles correctly, then deleting the keyword leaves a variable that is still effectively final: no code assigns to it after its initial assignment.

What the final keyword does is prevent you from making new assignments to the variable. As with private and protected, therefore, it is there to help you document a program, and it allows the compiler to enforce restrictions you have documented.

It may well be valuable, in programs that don't use the final keyword, to add comments that document the fact that an assignment is indeed the final assignment to a variable.
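A short sketch of the idea, using hypothetical method names: the second method drops the keyword and documents the restriction in a comment instead.

```java
class FinalDemo {
    // Original: the compiler rejects any later assignment to area.
    static int areaWithFinal(int w, int h) {
        final int area = w * h;
        return area;
    }

    // Transformed: same behavior; the comment carries the documentation.
    static int areaWithoutFinal(int w, int h) {
        int area = w * h; // final assignment to area
        return area;
    }

    public static void main(String[] args) {
        System.out.println(areaWithFinal(3, 4) == areaWithoutFinal(3, 4)); // prints "true"
    }
}
```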

Software architecture

The architecture of a piece of software describes how a user interacts with it, just as the architecture of a building describes how a user interacts with the building.

From an architectural standpoint, you do not care very much about what algorithms are used, except to the extent that they have an impact on things users care about like price and performance.

When speaking of a package or a class, where the users are the programmers making use of that package or class in their programs, the architecture is primarily exposed in the public interface to that package or class. This architecture can be changed in many ways while making no changes to the underlying algorithms.

So far this semester, we have explored several different architectures for the class Simulator that change the way programmers write discrete event simulation programs. We used lambda expressions and explicit subclasses to achieve the same behavior.
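To make the contrast concrete, here is a deliberately tiny sketch. It is not the Simulator class developed in this course: MiniSimulator, Action, LogEvent, and SimulatorDemo are hypothetical names, and the design (an interface with one method, plus a priority queue ordered by time) is an assumption chosen to keep the example short. The same scheduling algorithm serves both styles of user code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical minimal simulator; the course's actual Simulator class may differ.
class MiniSimulator {
    interface Action {
        void trigger(double time);
    }

    private static final class Event {
        final double time;
        final Action act;

        Event(double time, Action act) {
            this.time = time;
            this.act = act;
        }
    }

    // pending events, ordered by the time at which they occur
    private final PriorityQueue<Event> pending =
        new PriorityQueue<>((a, b) -> Double.compare(a.time, b.time));

    void schedule(double time, Action act) {
        pending.add(new Event(time, act));
    }

    void run() {
        while (!pending.isEmpty()) {
            Event e = pending.remove();
            e.act.trigger(e.time);
        }
    }
}

// Style 1: the user writes an explicit class for each kind of event.
class LogEvent implements MiniSimulator.Action {
    private final List<String> log;
    private final String name;

    LogEvent(List<String> log, String name) {
        this.log = log;
        this.name = name;
    }

    public void trigger(double time) {
        log.add(name + " at " + time);
    }
}

class SimulatorDemo {
    static List<String> runDemo() {
        List<String> log = new ArrayList<>();
        MiniSimulator sim = new MiniSimulator();
        sim.schedule(2.0, new LogEvent(log, "subclass"));  // explicit class
        sim.schedule(1.0, t -> log.add("lambda at " + t)); // Style 2: lambda expression
        sim.run();
        return log;
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

The simulator cannot tell the two kinds of event apart; the difference lies entirely in what the programmer using it has to write.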

In designing small programs, architecture hardly matters. In big programs, poorly selected architecture can have a significant impact.