18. Inner Classes, Interfaces & Anonymity

Part of CS:2820 Object Oriented Software Development Notes, Fall 2015
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

Where Were We?

We had just suggested this code for receiving a parameter whose value is not computed until it is actually needed. The idea is that the caller must construct a new subclass of class ErrorMessage that has a method to do the desired computation, and then create an instance of this subclass and pass that instance as a parameter, so that the called code can call the method when it needs the result.

public abstract class ErrorMessage {
        public abstract String toString();
}

static void lineEnd( Scanner sc, ErrorMessage message ) {
        String skip = sc.nextLine();
        if (!"".equals( skip )) {
                // the deferred computation happens here, only when needed
                Errors.warning( message.toString() );
        }
}

This sounds pretty awful, but Java provides some shorthand to make it easy. We'll do the awful long-winded solution first before we look at the shorthand notation.

A Preliminary Approach

Where our original call to lineEnd said something like this:

ScanSupport.lineEnd( sc, "Intersection " + name );

We could now write this:

class MyMessage extends ScanSupport.ErrorMessage {
        private String msg1;
        private String msg2;
        MyMessage( String m1, String m2 ) {
                msg1 = m1;
                msg2 = m2;
        }
        public String toString() {
                return msg1 + msg2;
        }
}
MyMessage msg = new MyMessage( "Intersection ", name );
ScanSupport.lineEnd( sc, msg );

The above code is hardly convenient! We had to create a new class with its own fields and initializer as well as the method that encapsulates our delayed computation. Doing this over and over, once for each call to lineEnd(), promises to make the program totally unreadable. Fortunately, there is an alternative.
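For reference, the long-winded version can be assembled into a complete program. This is only a sketch under some assumptions: the counter evaluations and the field lastWarning are hypothetical additions that stand in for Errors.warning() and make the deferral visible, and the scanners read from strings instead of a file.

```java
import java.util.Scanner;

// Stand-in for the ErrorMessage class from these notes.
abstract class ErrorMessage {
        public abstract String toString();
}

// The long-winded subclass: it carries its data in fields.
class MyMessage extends ErrorMessage {
        static int evaluations = 0; // hypothetical counter, demonstration only
        private final String msg1;
        private final String msg2;
        MyMessage( String m1, String m2 ) {
                msg1 = m1;
                msg2 = m2;
        }
        public String toString() {
                evaluations = evaluations + 1;
                return msg1 + msg2; // the deferred concatenation
        }
}

class DeferredDemo {
        // Hypothetical stand-in for Errors.warning(): remember the last text.
        static String lastWarning = null;

        static void lineEnd( Scanner sc, ErrorMessage message ) {
                String skip = sc.nextLine();
                if (!"".equals( skip )) {
                        lastWarning = message.toString();
                        System.err.println( lastWarning );
                }
        }

        public static void main( String[] args ) {
                // clean line end: toString() is never called
                lineEnd( new Scanner( "\n" ), new MyMessage( "Intersection ", "A" ) );
                System.out.println( MyMessage.evaluations );
                // trailing junk: toString() is called exactly once
                lineEnd( new Scanner( "junk\n" ), new MyMessage( "Intersection ", "B" ) );
                System.out.println( MyMessage.evaluations );
        }
}
```

The concatenation inside toString() only runs for the second call, where there is something to complain about.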

Inner Classes

The long-winded code we just gave would work equally well no matter where we declared the new class MyMessage. The price of that generality is that we had to pass all the information to be carried in the msg object as parameters to the initializer. There is an alternative. So long as the class definition is local to the context where the variables being referenced lie, we may directly reference those variables from the code of a method in the class. Here is the code, in context:

class Intersection {
        String name;
        ...
        public Intersection( ... ) {
                ...

                class MyMessage extends ScanSupport.ErrorMessage {
                        public String toString() {
                                return "Intersection " + name;
                        }
                }
                MyMessage msg = new MyMessage();
                ScanSupport.lineEnd( sc, msg );
        }
}

In this code, the toString() method takes advantage of the fact that it is declared inside the Intersection() initializer. That means that it can access any variables declared locally to the initializer (Java requires such local variables to be final or effectively final). Similarly, the initializer is declared inside the enclosing class, so it can access any variables declared within that class.

It is easy to state this scope rule: Anything declared within an outer block may be referenced from within a block nested inside that block. In Java, blocks begin with an opening curly brace { and end with a closing curly brace }.

Implementing this rule is far more difficult. In effect, each nested pair of curly braces implies the creation of an object. Each class is, in effect, the declaration of an object that holds the static fields of that class and the real initializers for objects of that class. And, when you call a method of a class (including an explicit initializer), a temporary object is allocated to hold the local variables of that method. Access to this temporary object is usually lost when the method returns.

There is always a pointer to (a handle for) the current object inside the CPU, so within the code of any object, access to the fields of that object is easy. Java actually gives a name to this handle, this. The more interesting question is, how do you access the fields of enclosing objects?

The answer is, except at the outermost nesting level, each object has an implicit field that is never explicitly mentioned in your code. It is common to call this the enclosing scope pointer or the uplink, but we'll just call it up. Whenever a new object is created, the uplink in that new object is initialized to point to the object that encloses this object.

We can rewrite the above code with these explicit uplinks as follows:

class Intersection {
        String name;
        ...
        public Intersection( ... ) {
                ...

                class MyMessage extends ScanSupport.ErrorMessage {
                        public String toString() {
                                return "Intersection " + up.up.up.name;
                        }
                }
                MyMessage msg = new MyMessage();
                ScanSupport.lineEnd( sc, msg );
        }
}

This looks ugly, but this is a problem that has been understood since the 1960s. Compilers eliminate all the uplinks that they can, and they notice common subexpressions. If you see 2*a+x(2*a), you can easily imagine a compiler smart enough to evaluate the subexpression 2*a once and then use that value first to call x() and then to add it to the result. Similarly, a compiler can see up.up.a+up.up.up.b and evaluate up.up just once before grabbing fields a and up.b relative to that subexpression.
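The name up is our own notation, not Java syntax. The closest thing Java lets you write is a qualified this: inside an inner class, Outer.this names the enclosing object. The sketch below uses it to reach past a field that deliberately hides the outer one. The method message() and the shadowing field are hypothetical, added only for this illustration.

```java
class Intersection {
        String name;

        Intersection( String n ) {
                name = n;
        }

        Object message() {
                class MyMessage {
                        // deliberately hides the field of the enclosing class
                        String name = "shadow";
                        public String toString() {
                                // Intersection.this is the enclosing object,
                                // so this reaches the outer field, not the shadow
                                return "Intersection " + Intersection.this.name;
                        }
                }
                return new MyMessage();
        }
}

class UplinkDemo {
        public static void main( String[] args ) {
                System.out.println( new Intersection( "A" ).message() );
        }
}
```

Without the shadowing field, the plain identifier name would resolve to the outer field on its own, which is the usual case.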

The point is, compiler technology is good enough that you shouldn't worry about the computational cost of up-level addressing. Yes, uplinks slow things down a bit, but not so much that you should hesitate to use them. Use nested class declarations if they make your code easier to read. It even makes sense to nest things further:

class Intersection {
        String name;
        ...
        public Intersection( ... ) {
                ...

                {
                        class MyMessage extends ScanSupport.ErrorMessage {
                                public String toString() {
                                        return "Intersection " + name;
                                }
                        }
                        MyMessage msg = new MyMessage();
                        ScanSupport.lineEnd( sc, msg );
                }
        }
}

What we did above is add an extra pair of braces and an extra indenting level so that class MyMessage and the object msg are totally local to this call to lineEnd(). That way, you don't need to worry about inventing new names that don't collide with the names used for other subclasses of ErrorMessage.
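A small sketch of the naming benefit: two sibling blocks in the same method can each declare a local class called MyMessage without conflict, because each declaration is out of scope once its block closes. The method describe() and the second message text are hypothetical, chosen only to show two uses side by side.

```java
// Stand-in for ScanSupport.ErrorMessage from these notes.
abstract class ErrorMessage {
        public abstract String toString();
}

class BlockScopeDemo {
        static String describe( String name ) {
                String first;
                String second;
                {
                        class MyMessage extends ErrorMessage {
                                public String toString() {
                                        return "Intersection " + name;
                                }
                        }
                        first = new MyMessage().toString();
                }
                {
                        // same class name again; the earlier one is out of scope
                        class MyMessage extends ErrorMessage {
                                public String toString() {
                                        return "Road " + name;
                                }
                        }
                        second = new MyMessage().toString();
                }
                return first + "; " + second;
        }

        public static void main( String[] args ) {
                System.out.println( describe( "A" ) );
        }
}
```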

Interfaces

Before we finish the alternative, let's look back at the code for our abstract class ErrorMessage:

public abstract class ErrorMessage {
        public abstract String toString();
}

Notice that this class has no fields and no methods that are not abstract. All it does is define the interface to a class that may have many implementations. In Java, we can explicitly declare this as an interface as follows:

public interface ErrorMessage {
        public String toString();
}

The keyword interface does two things. First, it says that this is, effectively, an abstract class, and second, it makes all of the methods abstract unless they say otherwise (Java 8 added default and static methods, which carry bodies). There is no need for the abstract keyword on each method in this context.
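One practical difference is worth a sketch: a Java class may extend only one class, but it may implement an interface in addition to extending a class. The classes Named and IntersectionMessage below are hypothetical, invented for this illustration.

```java
interface ErrorMessage {
        String toString(); // implicitly public and abstract
}

// A hypothetical base class that has nothing to do with error messages.
class Named {
        final String name;
        Named( String n ) {
                name = n;
        }
}

// Extends one class and implements the interface as well. This would not
// compile if ErrorMessage were still an abstract class.
class IntersectionMessage extends Named implements ErrorMessage {
        IntersectionMessage( String n ) {
                super( n );
        }
        public String toString() {
                return "Intersection " + name;
        }
}

class InterfaceDemo {
        // Accepts anything that implements the interface.
        static String show( ErrorMessage m ) {
                return m.toString();
        }

        public static void main( String[] args ) {
                System.out.println( show( new IntersectionMessage( "A" ) ) );
        }
}
```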

Anonymous Objects and Classes

It is annoying to declare a named item in a block when that name is only used once and the name itself doesn't convey much information. Different language designers have taken vastly different positions about the need for anonymous items. Some languages prevent most anonymity, while others permit it in some cases and not in others.

There is one context where anonymous variables are almost universally permitted, and that is within expressions. When you write the statement:

i = j*k + l*m;

You are really writing something like this:

{
        int t1 = j*k;
        int t2 = l*m;
        i = t1 + t2;
}

Where t1 and t2 are anonymous local variables created by the compiler to hold the intermediate results during the evaluation of the expression. These variables come into existence at the point they are needed and go out of existence as soon as they are no longer needed.

Returning to the code we are working on, we can apply this idea immediately to eliminate the need for a named variable holding the object that holds our delayed parameters:

                {
                        class MyMessage implements ScanSupport.ErrorMessage {
                                public String toString() {
                                        return "Intersection " + name;
                                }
                        }
                        ScanSupport.lineEnd( sc, new MyMessage() );
                }

That leaves us with the desire to eliminate the need for a named subclass of ErrorMessage. The designers of Java could have forbidden anonymity here, since Java considers classes to be an entirely different kind of thing from objects. The designers relented, however, and provided a way to declare an anonymous class at the point where an instance of that class is created. The following code is exactly equivalent to the above, except that it uses an anonymous class:

                ScanSupport.lineEnd(
                        sc,
                        new ScanSupport.ErrorMessage() {
                                public String toString() {
                                        return "Intersection " + name;
                                }
                        }
                );

Here, we put the body of the class declaration right in the constructor call, using the name of the parent class or interface (here, ErrorMessage) in the new expression, followed by an empty argument list and then immediately by the body.

Admittedly, the text of this call to lineEnd() has gotten a bit long and difficult to format in a pleasing way, but we no longer have any extraneous identifiers for things that are never used again.
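The anonymous version can be tried out with the following sketch. As before, lastWarning is a hypothetical stand-in for Errors.warning(), and the helper messageFor() exists only so the object can be inspected. The calls isAnonymousClass() and getSimpleName() are standard methods of java.lang.Class, and they confirm that the class really has no name of its own.

```java
import java.util.Scanner;

// Stand-in for ScanSupport.ErrorMessage from these notes.
interface ErrorMessage {
        String toString();
}

class AnonymousDemo {
        // Hypothetical stand-in for Errors.warning(): remember the last text.
        static String lastWarning = null;

        static void lineEnd( Scanner sc, ErrorMessage message ) {
                String skip = sc.nextLine();
                if (!"".equals( skip )) {
                        lastWarning = message.toString();
                }
        }

        // Builds the message object with an anonymous class.
        static ErrorMessage messageFor( final String name ) {
                return new ErrorMessage() {
                        public String toString() {
                                return "Intersection " + name;
                        }
                };
        }

        public static void main( String[] args ) {
                ErrorMessage m = messageFor( "A" );
                // the class of m is anonymous and has an empty simple name
                System.out.println( m.getClass().isAnonymousClass() );
                System.out.println( m.getClass().getSimpleName().isEmpty() );
                lineEnd( new Scanner( "junk\n" ), m );
                System.out.println( lastWarning );
        }
}
```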

A Brief Historical Note

This is old stuff, dating back to the 1950s. Back when people first started designing high-level programming languages that allowed parameter passing, there were many arguments about how to do it. Several approaches were common in the early days. For example, in FORTRAN, the first commercially available high-level language, if you called F(X) (a call to a function named F passing the parameter X), the expression X was first evaluated, and then the value of that expression was assigned to a static variable local to the function F. Since the designers of FORTRAN didn't worry about recursion, that was adequate.

In the programming language Algol 60, the designers decided that if f(x) was defined as a×x²+b×x+c, then the call f(z+1) should mean the same thing as a×(z+1)²+b×(z+1)+c. That is, passing a parameter is equivalent to substituting the text of that parameter into every place where the parameter occurs within the called code. They had to stand on their heads a bit with this, saying that first, all identifiers in the called routine that conflicted with any identifiers in the parameter were renamed so that there were no naming conflicts. This was called passing the parameters by name, and its effect is to defer the evaluation of any expressions in the parameter until the value of the parameter is actually needed.

(Yes, in the official documentation for Algol 60, identifiers were usually presented in italics and the text was in lower case. This was despite the fact that most computers of the era were upper-case only and had no italics.)

In the implementation of Algol 60, each parameter was used to define an anonymous function, called a thunk, and when the value of the parameter was needed in a called routine, the called code would call the thunk. It is not wrong to refer to the objects we are passing in our Java code as thunks if the only reason for passing such an object is to serve as a handle on a method that can be called to evaluate a deferred parameter.
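To connect this back to Java, here is a rough sketch of the f(z+1) example with the parameter passed as a thunk. The interface IntThunk, the coefficient values a=1, b=2, c=3 and the counter evaluations are all assumptions made for this illustration. Because the thunk is called at each place the parameter occurs, the expression z+1 is evaluated three times, just as textual substitution would suggest.

```java
// A hypothetical thunk type: a handle on a deferred integer expression.
interface IntThunk {
        int value();
}

class ThunkDemo {
        static int z = 1;
        static int evaluations = 0; // hypothetical counter, demonstration only

        // f(x) = a*x*x + b*x + c, with assumed values a=1, b=2, c=3;
        // every occurrence of x calls the thunk again
        static int f( IntThunk x ) {
                return 1 * x.value() * x.value() + 2 * x.value() + 3;
        }

        // the call f(z+1), with z+1 wrapped in an anonymous thunk
        static int call() {
                return f( new IntThunk() {
                        public int value() {
                                evaluations = evaluations + 1;
                                return z + 1;
                        }
                } );
        }

        public static void main( String[] args ) {
                System.out.println( call() );      // 11 when z is 1
                System.out.println( evaluations ); // 3 after one call
        }
}
```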