13. The Road to Lambda

Part of CS:2820 Object Oriented Software Development Notes, Fall 2020
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

The Problem

Consider this method call:

        dst = sc.getNext( "???", "road " + src + " to missing destination" );

In the last class, we showed that Java implements this as:

        dst = sc.getNext(
            "???",
            new StringBuilder( "road " )
                    .append( src )
                        .append( " to missing destination" )
                            .toString()
        );

That means that for this call to getNext() we must construct a new StringBuilder, construct a new String (or, if Java is smart, convert the StringBuilder to a string), and pay the price of three method calls plus the array-copy operations required to move the text of each component string into place. As a result, the total cost of this call to the getNext() method may be dominated by the cost of computing its second parameter, a parameter that will be discarded in the normal case when there is no error.
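We can see this eager evaluation directly. The following is a minimal sketch, not the course's MyScanner: getNext() here is a hypothetical static stand-in that takes the Scanner as a parameter, and the helper describe() with its counter buildCount is also hypothetical. It is there only to count how often the message gets built.

```java
import java.util.Scanner;

// Sketch: Java evaluates every argument before making a call, so the
// error message is built even when getNext() never uses it.
class EagerDemo {
    static int buildCount = 0;   // how many messages have been built

    // Hypothetical helper: builds the message and counts each construction.
    static String describe(String src) {
        buildCount++;
        return "road " + src + " to missing destination";
    }

    // Hypothetical stand-in for MyScanner.getNext(): returns the next
    // token if there is one, otherwise warns and returns the default.
    static String getNext(Scanner sc, String def, String msg) {
        if (sc.hasNext()) return sc.next();
        System.err.println("warning: " + msg);
        return def;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner("B");   // input that does contain a destination
        String dst = getNext(sc, "???", describe("A"));
        // Prints "B (messages built: 1)": the message was built anyway.
        System.out.println(dst + " (messages built: " + buildCount + ")");
    }
}
```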

A First Solution, a Crummy One

The first solution that comes to mind is to change the getNext() methods so they take a number of strings as parameters. Thus, we replace

        dst = sc.getNext( "???", "road " + src + " to missing destination" );

with this call:

        dst = sc.getNext( "???", "road ", src, " to missing destination" );

This means that the getNext() method must always receive three string parameters for the message, in addition to the default value. In the normal case, getNext() ignores these parameters, but if it needs to assemble an error message, it concatenates them. Thus, no computation is wasted on string concatenation.

If you need an error message that involves fewer strings, just pass empty strings. It is a bit of a nuisance designing methods to use this solution because you need to anticipate the likely number of parameters that might be required. If you have a large program and then you discover some setting where more parameters are required, you either have to search the entire program for calls and add an extra null parameter everywhere, or just concatenate some of the extras in the rare case that you didn't plan on enough parameters.

Note also, there is a cost for parameter passing. With this solution, we always pass the maximum number of parameters we might need, wasting the cost of passing a bunch of nulls (or empty strings) when the call doesn't need that many parameters.

Moreover, not every component of an error message is a string. If a message includes a number or some other object, the caller must convert it to text with toString() before the call, whether or not the message is ever used, and such conversions can be computationally intensive. Converting a float or double to textual format is far from trivial.
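Here is a minimal sketch of this multi-string approach. The class PiecesScanner and its lastWarning field are hypothetical names for illustration, and printing to System.err stands in for whatever warning mechanism the real program uses.

```java
import java.util.Scanner;

// Sketch of the "crummy" fix: the caller passes the pieces of the message
// separately, and getNext() concatenates them only if an error occurs.
class PiecesScanner {
    private final Scanner self;   // the scanner this object wraps
    String lastWarning = null;    // demo only: the most recent warning

    PiecesScanner(Scanner s) { self = s; }

    // Always takes exactly three message pieces; callers needing fewer
    // must pad with empty strings.
    String getNext(String def, String m1, String m2, String m3) {
        if (self.hasNext()) return self.next();
        lastWarning = m1 + m2 + m3;   // concatenation happens only here
        System.err.println("warning: " + lastWarning);
        return def;
    }

    public static void main(String[] args) {
        PiecesScanner sc = new PiecesScanner(new Scanner(""));   // no input
        String src = "A";
        String dst = sc.getNext("???", "road ", src, " to missing destination");
        System.out.println(dst + " / " + sc.lastWarning);
        // A message with only one piece pads the rest with empty strings.
        sc.getNext("???", "road: from missing source", "", "");
        System.out.println(sc.lastWarning);
    }
}
```

Note that the pieces must already be strings; a numeric component would still have to be converted before the call.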

A General Solution

The most general solution involves replacing the data parameter with a parameter that conveys a computation. In Java, the way we do this is to construct an object and pass that object. If the called routine needs the value, it calls a method of that object, and that method does the work. Consider this new version of the MyScanner.getNext() method:

public abstract class ErrorMessage {
    public abstract String myString();
}

public String getNext( String def, ErrorMessage msg ) {
    if (self.hasNext()) return self.next();
    Error.warn( msg.myString() );
    return def;
}

Now, all we have to do to call our syntax-check method is first create a new subclass of ErrorMessage with the appropriate myString() method.

This sounds awful, but Java provides some shorthand to make it easy. We'll do the awful long-winded solution first before we look at the shorthand notation.

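Note, we might have wanted to use toString() as the name of the method above, but that is asking for trouble, because every class inherits a concrete toString() from class Object. Java does permit an abstract class to redeclare an inherited concrete method as abstract, but an interface cannot force this: if ErrorMessage is later turned into an interface, as we will do, the toString() inherited from Object satisfies the interface, so a class that forgot to provide its own version would compile and silently produce Object's default text. Giving the method a new name, myString(), avoids the problem.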

A Preliminary Approach

Where our original call to sc.getNext() said something like this:

sc.getNext( "???", "Intersection " + name + " missing destination" );

We could now write this new supporting class:

class MissDestMsg extends ErrorMessage {
    private String msg;
    MissDestMsg( String m ) {
        msg = m;
    }
    public String myString() {
        return "Intersection " + msg + " missing destination";
    }
}

and in the code where we want to call sc.getNext(), we do this:

ErrorMessage msg = new MissDestMsg( name );
sc.getNext( "???", msg );

Of course, we don't need to add a new variable, we can shorten this code to this:

sc.getNext( "???", new MissDestMsg( name ) );

The above code is hardly convenient! We had to create a new class with its own fields and constructor as well as the method that encapsulates our delayed computation, all to pass a simple 3-operand expression. Doing this over and over, once for each call to sc.getNext(), promises to make a totally unreadable program. Fortunately, Java offers alternatives.
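Putting the pieces of this section together, here is a minimal runnable sketch of deferred construction with a named subclass. It assumes a stripped-down MyScanner that wraps java.util.Scanner; printing to System.err stands in for the course's Error.warn(), and the built counter in MissDestMsg and the class DeferredDemo exist only for this demonstration.

```java
import java.util.Scanner;

// Deferred construction of error messages with a named subclass.
abstract class ErrorMessage {
    public abstract String myString();
}

// One subclass per kind of message: it saves the data it needs and
// builds the text only when myString() is called.
class MissDestMsg extends ErrorMessage {
    static int built = 0;          // demo only: counts message constructions
    private final String name;     // name of the intersection
    MissDestMsg(String n) { name = n; }
    public String myString() {
        built++;
        return "Intersection " + name + " missing destination";
    }
}

class MyScanner {
    private final Scanner self;    // the scanner this object wraps
    MyScanner(Scanner s) { self = s; }

    public String getNext(String def, ErrorMessage msg) {
        if (self.hasNext()) return self.next();
        // hypothetical stand-in for the course's Error.warn()
        System.err.println("warning: " + msg.myString());
        return def;
    }
}

class DeferredDemo {
    public static void main(String[] args) {
        MyScanner ok = new MyScanner(new Scanner("B"));
        System.out.println(ok.getNext("???", new MissDestMsg("A")));    // B
        MyScanner empty = new MyScanner(new Scanner(""));
        System.out.println(empty.getNext("???", new MissDestMsg("A"))); // ???
        System.out.println("messages built: " + MissDestMsg.built);     // 1
    }
}
```

When the scanner has a token, myString() is never called, so the concatenation never happens; only the error case pays for building the message.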

Inner Classes

The long-winded code we just gave would work equally well if we declare the class ErrorMessage at the outer level of the program, but putting lots of little classes at the outer level leads to a very messy program. Java provides an alternative: We can declare class ErrorMessage as an inner class inside MyScanner.

class MyScanner {
    Scanner self; // the scanner this object wraps

    /**
     * Parameter carrier class for deferred string construction
     * used only for error message parameters to getXXX() methods
     */
    public static abstract class ErrorMessage {
        abstract String myString();
    }

    ... deleted code ...

    public String getNext( String def, ErrorMessage msg ) {
        if (self.hasNext()) return self.next();
        Error.warn( msg.myString() );
        return def;
    }

Note that the inner class here is declared public, so that code outside class MyScanner can use it, and it is declared static so that instances of this class have no access to anything other than the static components of MyScanner. In fact, class ErrorMessage makes no use of any access it has to MyScanner, but Java does allow such uses in a limited way. Finally, class ErrorMessage is abstract, so you have to create a specific subclass for each kind of error message, and it commits those subclasses to providing a myString() method by declaring that method abstract.

Similarly, while we could declare each subclass of ErrorMessage at the outer level, each of those subclasses is likely to be needed in only one place, so it is better to declare them as inner classes right at the point of use. So, in Road, we can write code like this:

class Road {

    ... several lines deleted ...

    // the constructor
    public Road( MyScanner sc ) throws ConstructorFail {
        final String src;       // where does it come from
        final String dst;       // where does it go

        class MissingSource extends MyScanner.ErrorMessage {
            String myString() {
                return "road: from missing source";
            }
        }
        src = sc.getNext( "???", new MissingSource() );

        class MissingDestination extends MyScanner.ErrorMessage {
            final private String src;
            MissingDestination( String s ) {
                src = s;
            }
            String myString() {
                return "road " + src + ": to missing destination";
            }
        }
        dst = sc.getNext( "???", new MissingDestination( src ) );

In this code, classes MissingSource and MissingDestination are each used in just one place, the line immediately following the class declaration. Each of them extends MyScanner.ErrorMessage, the inner class ErrorMessage declared within class MyScanner. Of these two, MissingSource is trivial. Its myString() method just returns a constant string, making the entire mechanism just an expensive way to pass a string constant to getNext().

MissingDestination is more interesting. It has an instance variable src that is initialized from a parameter to the constructor. Here, the actual parameter passed to the constructor for MissingDestination is also called src, but in the constructor call, it refers to the constructor's local variable holding the name of the source intersection. The myString() method is no longer trivial; it concatenates two string constants, one before src and one after.
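A runnable sketch of the same idea follows. To keep it self-contained, it uses a hypothetical static method RoadDemo.readRoad() in place of the Road constructor (so there is no ConstructorFail), a stripped-down MyScanner with the nested ErrorMessage class, and a hypothetical lastWarning field that records the message instead of calling Error.warn().

```java
import java.util.Scanner;

class MyScanner {
    private final Scanner self;     // the scanner this object wraps
    String lastWarning = null;      // demo only: the most recent warning

    MyScanner(Scanner s) { self = s; }

    /** Parameter carrier for deferred error-message construction. */
    public static abstract class ErrorMessage {
        abstract String myString();
    }

    public String getNext(String def, ErrorMessage msg) {
        if (self.hasNext()) return self.next();
        lastWarning = msg.myString();   // the message is built only here
        System.err.println("warning: " + lastWarning);
        return def;
    }
}

class RoadDemo {
    // Stands in for the Road constructor: returns { src, dst }.
    static String[] readRoad(MyScanner sc) {
        // Local class, visible only inside this method.
        class MissingSource extends MyScanner.ErrorMessage {
            String myString() { return "road: from missing source"; }
        }
        final String src = sc.getNext("???", new MissingSource());

        // Local class whose constructor copies the value it needs.
        class MissingDestination extends MyScanner.ErrorMessage {
            private final String src;   // shadows the local variable src
            MissingDestination(String s) { src = s; }
            String myString() { return "road " + src + ": to missing destination"; }
        }
        final String dst = sc.getNext("???", new MissingDestination(src));
        return new String[] { src, dst };
    }

    public static void main(String[] args) {
        MyScanner sc = new MyScanner(new Scanner("A"));   // source only
        String[] road = readRoad(sc);
        System.out.println(road[0] + " -> " + road[1]);    // A -> ???
    }
}
```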

Because Java allows code in an inner block to reference items declared in outer blocks, we can simplify the above code, writing just this:

    // the constructor
    public Road( MyScanner sc ) throws ConstructorFail {
        final String src;       // where does it come from
        final String dst;       // where does it go

        src = ... some deleted code ...

        class MissingDestination extends MyScanner.ErrorMessage {
            String myString() {
                return "road " + src + ": to missing destination";
            }
        }
        dst = sc.getNext( "???", new MissingDestination() );

In the above code, the variable src used in MissingDestination.myString() appears to be a direct reference to the variable src that is a local variable of the constructor Road.

The truth is more complicated. Java imposes some very strict limits on uses of outer variables from within inner classes. Specifically, Java requires that such "up-level references" be confined to variables that are "final or effectively final." In our case, src was declared to be final, so we have trivially met this constraint.
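Here is the same hypothetical readRoad() sketch rewritten so that the local class reads the enclosing local variable directly, with no constructor or field of its own. As before, MyScanner is stripped down and lastWarning is a hypothetical field used only to check the message.

```java
import java.util.Scanner;

class MyScanner {
    private final Scanner self;     // the scanner this object wraps
    String lastWarning = null;      // demo only: the most recent warning

    MyScanner(Scanner s) { self = s; }

    /** Parameter carrier for deferred error-message construction. */
    public static abstract class ErrorMessage {
        abstract String myString();
    }

    public String getNext(String def, ErrorMessage msg) {
        if (self.hasNext()) return self.next();
        lastWarning = msg.myString();   // the message is built only here
        System.err.println("warning: " + lastWarning);
        return def;
    }
}

class RoadDemo {
    // Stands in for the Road constructor: returns { src, dst }.
    static String[] readRoad(MyScanner sc) {
        class MissingSource extends MyScanner.ErrorMessage {
            String myString() { return "road: from missing source"; }
        }
        final String src = sc.getNext("???", new MissingSource());

        // No constructor and no field: myString() uses the enclosing
        // local variable src directly. This is legal because src is final.
        class MissingDestination extends MyScanner.ErrorMessage {
            String myString() { return "road " + src + ": to missing destination"; }
        }
        final String dst = sc.getNext("???", new MissingDestination());
        return new String[] { src, dst };
    }

    public static void main(String[] args) {
        MyScanner sc = new MyScanner(new Scanner("A"));   // source only
        String[] road = readRoad(sc);
        System.out.println(road[0] + " -> " + road[1]);    // A -> ???
    }
}
```

If src were declared without final and then reassigned somewhere later in the method, it would no longer be effectively final, and javac would reject the reference to src inside myString().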

Why does Java have this restriction? The answer has to do with the history of Java and its antecedents, C++ and C. Inner classes are an afterthought in Java, and similar nesting relationships are an afterthought in C++. Classical C does not support any kind of nesting of one function definition within another.

So, how did the implementors of Java add inner classes with up-level variable references? The answer is, they cheated and made the compiler convert all inner classes to outer classes. Wherever an inner class contains a reference to a variable declared in the enclosing context, the compiler turns that into an implicit final instance variable of the inner class, adding an implicit parameter to the constructor to initialize that implicit variable. In short, the notation above without an explicit constructor for MissingDestination is merely shorthand, and the Java compiler actually generates the code described by the original version, where the constructor for MissingDestination had a parameter used to initialize an instance variable.

When you write code with an explicit constructor and explicit initialization of the instance variable, you can pass anything you want to the constructor. The designers of Java did not want to advertise what they were doing, so instead of explaining it, they simply made the compiler enforce the rule that the only outer variables an inner class can use are those that are final or effectively final. With this rule, they did not need to explain how the value was passed, because all the possible implementations would produce the same result.

The general solution to the up-level addressing problem was developed back in the 1960s for implementations of the Algol 60 programming language, first released in 1961. This solution was also used in Simula 67, the first object-oriented programming language and the direct ancestor of the object-oriented features of C++ and Java. Sadly, this general solution never made it to C++ and Java.

The most general implementation works as follows: Except at the outermost nesting level, each object has an implicit final field that is never explicitly mentioned in your code. This field is commonly called the static link, the enclosing scope pointer, or the uplink, but we'll just call it up. Whenever a new object is created, the uplink in that new object is set to point to the object that encloses it.

We can rewrite the above code with these explicit uplinks as follows:

    // the constructor
    public Road( MyScanner sc ) throws ConstructorFail {
        final String src;       // where does it come from
        final String dst;       // where does it go

        src = ... some deleted code ...

        class MissingDestination extends MyScanner.ErrorMessage {
            private final BlockReference up;
            MissingDestination( BlockReference u ) {
                up = u;
            }
            String myString() {
                return "road " + up.src + ": to missing destination";
            }
        }
        dst = sc.getNext( "???", new MissingDestination( this.Road ) );

In the above, the this.Road in the call to the constructor for MissingDestination() is not legal Java, and the type BlockReference is equally fictional. They are an attempt to suggest that the block of memory holding the local variables of the constructor Road is actually an object (sometimes called an activation record or a stack frame), and that the handle for that object is passed to MissingDestination(). All up-level addressing can be done this way.
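The this.Road notation cannot be written in real Java, but we can imitate the scheme legally by making the activation record an explicit object. In the sketch below, RoadFrame is a hypothetical class standing in for the block of memory holding the constructor's local variables, and MissingDestination reaches src only through its uplink up. As in the earlier sketches, MyScanner is stripped down, lastWarning is a hypothetical field, and a static method stands in for the Road constructor.

```java
import java.util.Scanner;

class MyScanner {
    private final Scanner self;     // the scanner this object wraps
    String lastWarning = null;      // demo only: the most recent warning

    MyScanner(Scanner s) { self = s; }

    public static abstract class ErrorMessage {
        abstract String myString();
    }

    public String getNext(String def, ErrorMessage msg) {
        if (self.hasNext()) return self.next();
        lastWarning = msg.myString();
        System.err.println("warning: " + lastWarning);
        return def;
    }
}

class RoadDemo {
    // Hypothetical explicit "activation record" for the Road constructor.
    static final class RoadFrame {
        String src;   // where does it come from
        String dst;   // where does it go
    }

    static final class MissingSource extends MyScanner.ErrorMessage {
        String myString() { return "road: from missing source"; }
    }

    // Reaches src only through its explicit uplink to the frame.
    static final class MissingDestination extends MyScanner.ErrorMessage {
        private final RoadFrame up;   // the uplink
        MissingDestination(RoadFrame u) { up = u; }
        String myString() { return "road " + up.src + ": to missing destination"; }
    }

    static RoadFrame readRoad(MyScanner sc) {
        RoadFrame frame = new RoadFrame();   // plays the role of "this.Road"
        frame.src = sc.getNext("???", new MissingSource());
        frame.dst = sc.getNext("???", new MissingDestination(frame));
        return frame;
    }

    public static void main(String[] args) {
        MyScanner sc = new MyScanner(new Scanner("A"));
        RoadFrame frame = readRoad(sc);
        System.out.println(frame.src + " -> " + frame.dst);   // A -> ???
        // The message reads src through its uplink when myString() runs,
        // so it sees the variable's current value, not a copy.
        MissingDestination msg = new MissingDestination(frame);
        frame.src = "B";
        System.out.println(msg.myString());  // road B: to missing destination
    }
}
```

Because the message object reads src through the uplink at the moment myString() is called, it sees the current value of the variable rather than a copy taken when the message was created. With Java's copying scheme, the message would still report A, which is why Java insists that captured variables be final or effectively final.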

Anonymous Inner Classes

If we only use a class name in one place, why not just put the class definition there instead of giving it a name? We do this with variables all the time. We could write the following, and in fact, this code is close to what actually gets executed inside the computer:

int t1 = a + b;
int t2 = t1 * 5;
methodCall( t2 );

Most programmers won't write that. Instead, they eliminate the variables t1 and t2 and simply put the expressions together into the place where the final value is needed:

methodCall( (a + b) * 5 );

Inside the computer, the temporary variables t1 and t2 still exist, but they are now anonymous. When a is added to b the result still has to be put somewhere before it is multiplied by 5, but the variable holding this intermediate value now has no name.

Java lets us write code with single-use inner classes abbreviated the same way. We can write this:

class Road {

    ... several lines deleted ...

    // the constructor
    public Road( MyScanner sc ) throws ConstructorFail {
        final String src;       // where does it come from
        final String dst;       // where does it go

        src = sc.getNext(
            "???",
            new MyScanner.ErrorMessage() {
                String myString() {
                    return "road: from missing source";
                }
            }
        );

        dst = sc.getNext(
            "???",
            new MyScanner.ErrorMessage() {
                String myString() {
                    return "road " + src + ": to missing destination";
                }
            }
        );

In the above, a construct like new MyScanner.ErrorMessage() { ... } means: call the constructor of an anonymous subclass of MyScanner.ErrorMessage whose body is the code in braces that follows. That is to say, the notation just introduced is just a shorthand for what we have already done with explicit non-anonymous inner classes, and that is just a shorthand for conventional outer classes that are only visible in one part of the program and may have implicit constructors and hidden fields to handle up-level addressing. It's all syntactic sugar, but the result is a reasonably compact notation.
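Here is the hypothetical readRoad() sketch once more, with both error messages written as anonymous subclasses. The same scaffolding applies: a stripped-down MyScanner, a lastWarning field in place of Error.warn(), and a static method in place of the Road constructor.

```java
import java.util.Scanner;

class MyScanner {
    private final Scanner self;     // the scanner this object wraps
    String lastWarning = null;      // demo only: the most recent warning

    MyScanner(Scanner s) { self = s; }

    public static abstract class ErrorMessage {
        abstract String myString();
    }

    public String getNext(String def, ErrorMessage msg) {
        if (self.hasNext()) return self.next();
        lastWarning = msg.myString();   // the message is built only here
        System.err.println("warning: " + lastWarning);
        return def;
    }
}

class RoadDemo {
    // Stands in for the Road constructor: returns { src, dst }.
    static String[] readRoad(MyScanner sc) {
        // Each "new MyScanner.ErrorMessage() { ... }" declares and
        // instantiates an anonymous subclass in a single expression.
        final String src = sc.getNext("???", new MyScanner.ErrorMessage() {
            String myString() { return "road: from missing source"; }
        });
        final String dst = sc.getNext("???", new MyScanner.ErrorMessage() {
            // Captures the final local variable src, as a local class would.
            String myString() { return "road " + src + ": to missing destination"; }
        });
        return new String[] { src, dst };
    }

    public static void main(String[] args) {
        MyScanner sc = new MyScanner(new Scanner("A"));   // source only
        String[] road = readRoad(sc);
        System.out.println(road[0] + " -> " + road[1]);    // A -> ???
    }
}
```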

Interfaces

Before we finish with this alternative, let's look back at the code for our abstract class ErrorMessage:

public abstract class ErrorMessage {
    public abstract String myString();
}

Notice that this class has no fields and no methods that are not abstract. All it does is define the interface to a class that may have many implementations. In Java, we can use the keyword interface instead of abstract class in this context. When we declare an interface instead of an abstract class, Java forbids declaring instance fields (any field declared in an interface is implicitly a public static final constant), and every method declared without a body is implicitly public and abstract. So we can replace the above with this:

public interface ErrorMessage {
    public String myString();
}

When you declare something as an interface, classes that build on that interface are said to implement that interface, so you use the keyword implements instead of extends when you use the interface as the basis of a class.
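A minimal sketch of the interface version follows, assuming the same stripped-down scanner and hypothetical lastWarning field as before. To keep the example short, ErrorMessage here is a top-level interface rather than one nested in MyScanner. A named class uses implements, and an anonymous class can implement an interface just as it can extend a class.

```java
import java.util.Scanner;

// With an interface, myString() is implicitly public and abstract.
interface ErrorMessage {
    String myString();
}

// A named class "implements" the interface instead of "extends" it.
// The method must be public, since interface methods are public.
class MissingSource implements ErrorMessage {
    public String myString() { return "road: from missing source"; }
}

class MyScanner {
    private final Scanner self;     // the scanner this object wraps
    String lastWarning = null;      // demo only: the most recent warning

    MyScanner(Scanner s) { self = s; }

    public String getNext(String def, ErrorMessage msg) {
        if (self.hasNext()) return self.next();
        lastWarning = msg.myString();   // the message is built only here
        System.err.println("warning: " + lastWarning);
        return def;
    }
}

class RoadDemo {
    static String[] readRoad(MyScanner sc) {
        final String src = sc.getNext("???", new MissingSource());
        // An anonymous class implementing the interface.
        final String dst = sc.getNext("???", new ErrorMessage() {
            public String myString() { return "road " + src + ": to missing destination"; }
        });
        return new String[] { src, dst };
    }

    public static void main(String[] args) {
        MyScanner sc = new MyScanner(new Scanner(""));     // empty input
        String[] road = readRoad(sc);
        System.out.println(road[0] + " -> " + road[1]);     // ??? -> ???
        System.out.println(sc.lastWarning);
    }
}
```

One detail to watch for: methods declared in an interface are implicitly public, so every implementation of myString() must be declared public. The package-private String myString() declarations used with the abstract class would not compile against the interface.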