14. Lambda

Part of CS:2820 Object Oriented Software Development Notes, Fall 2020
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

Where Are We

We have been trying to do "lazy parameter passing" where evaluation of the parameter expression is delayed for as long as possible. We are doing this using the following framework to receive parameters:

class MyScanner {
    Scanner self; // the scanner this object wraps

    /**
     * Parameter carrier class for deferred string construction
     * used only for error message parameters to getXXX() methods
     */
    public static abstract class ErrorMessage {
        public abstract String myString();
    }

    ... deleted code ...

    public String getNext( String def, ErrorMessage msg ) {
        if (self.hasNext()) return self.next();
        Error.warn( msg.myString() );
        return def;
    }

We have boiled down the call that passes a lazy parameter to the following form:

class Road {

    ... several lines deleted ...

    // the constructor
    public Road( MyScanner sc ) throws ConstructorFail {
        final String src;       // where does it come from
        final String dst;       // where does it go

        src = sc.getNext(
            "???",
            new MyScanner.ErrorMessage() {
                public String myString() {
                    return "road: from missing source";
                }
            }
        );

        dst = sc.getNext(
            "???",
            new MyScanner.ErrorMessage() {
                public String myString() {
                    return "road " + src + ": to missing destination";
                }
            }
        );

This moderately compact notation for lazy parameter passing is equivalent to more verbose notations involving inner classes that extend the class used to receive the parameter, or even more disruptive code where the parameter is an instance of a class declared at the top level.

We also note that an abstract class can be replaced with an interface if it has no non-static variables and no non-static methods that are not themselves abstract. As a result, we can rewrite the definition of MyScanner.ErrorMessage as follows:

class MyScanner {
    Scanner self; // the scanner this object wraps

    /**
     * Parameter carrier interface for deferred string construction
     * used only for error message parameters to getXXX() methods
     */
    public static interface ErrorMessage {
        String myString();
    }

Note that replacing the abstract class with an interface makes all the methods implicitly public abstract.

Lambda

Once you shift from an abstract class to an interface, Java lets you abbreviate the call that passes a lazy parameter to a remarkably compact form:

        dst = sc.getNext(
            "???",
            () -> "road " + src + ": to missing destination"
        );

This is called a lambda expression or λ-expression. The text before the -> is the formal parameter list of the method we are passing, and the code after the -> is the body of that method. When the body is a single expression, as it is here, its value is returned without writing return. Java only allows this notation when the expected type is a functional interface, that is, an interface with exactly one abstract method; the lambda's parameter list must match that method's parameter list.

All of Java's restrictions on uplevel addressing from within an inner class apply to references to variables from within the body of a λ expression, because Java treats each λ expression essentially as an abbreviation for an anonymous inner class. That is to say, Java's λ notation is little more than syntactic sugar.
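The examples above all pass lambdas with empty parameter lists. When the functional interface's method takes parameters, they go before the ->. A small sketch (the class name is invented here) using the standard library's functional interface java.util.function.IntUnaryOperator, shown both as a lambda and as the anonymous inner class it abbreviates:

```java
import java.util.function.IntUnaryOperator;

public class LambdaParams {
    // a lambda with one formal parameter; x -> x + 1 implements
    // IntUnaryOperator, whose single abstract method is applyAsInt(int)
    static final IntUnaryOperator inc = x -> x + 1;

    // the equivalent anonymous inner class, to show what the lambda abbreviates
    static final IntUnaryOperator incVerbose = new IntUnaryOperator() {
        public int applyAsInt( int x ) {
            return x + 1;
        }
    };

    public static void main( String[] args ) {
        System.out.println( inc.applyAsInt( 41 ) );        // prints 42
        System.out.println( incVerbose.applyAsInt( 41 ) ); // prints 42
    }
}
```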

Lambda?

The idea of lambda expressions is old, dating back to the 1930s, when it was first developed by Alonzo Church. Computers had not been invented yet, and Church was interested in the foundations of mathematics, not programming languages. He was trying to give a clear definition of the concept of a computable function. Alan Turing was also working on this problem, and came up with a completely different approach, using what is now called a Turing machine. Church and Turing learned of each other's work, and it was not long before the two approaches were proven equivalent: Church's definition of computable functions using his lambda calculus and Turing's definition using Turing machines describe the same set of functions. As a result, we now refer to their definition of computability as the Church-Turing thesis.

Why is it called Lambda notation (and Lambda calculus)? Because Church used the Greek letter lambda λ in his notation. Where conventional mathematical notation might define the function f as:

f(x) = x + 1

Church used a new notation:

f = λx.x + 1

The advantage of this notation is that it allows function names such as f to be treated as variables. The symbol λ can be treated as an operator that takes two arguments (separated by a period), the parameter list, in this case x, and the expression used to compute the value of the function, x+1 here.
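Applying such a function is done by substituting the argument for the parameter in the body. A one-step worked example (the substitution rule itself, called β-reduction, is not named in these notes):

```latex
f\,2 \;=\; (\lambda x.\,x + 1)\;2 \;\to\; 2 + 1 \;=\; 3
```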

Lambda notation was viewed as an arcane mathematical notation until 1960, when the developers of the LISP programming language at MIT decided to use lambda notation. The original LISP equivalent of the above example would be:

(SET F (LAMBDA (X) (ADD X 1)))

LISP stands for LISt Processing language, but critics have described its syntax as involving Lots of Infuriating Small Parentheses. LISP became the favored language for large areas of artificial intelligence research, and it lives on today in a variety of guises. The text editor EMACS, for example, is largely written in LISP, and uses LISP as its macro extension language. As with Church's lambda notation, LISP allows dynamic construction of new functions and it allows variables and parameters to have functions as their values.

Tools in other programming languages that allow the same basic flexibility are generally known as lambda notation even if they do not use the letter λ.

There are some programming languages where all parameter passing is lazy, that is, where no expression is evaluated until its value is actually needed. At the cost of a cumbersome implementation, Java's λ expressions can be used to approximate this, but the requirement that captured variables be final or effectively final limits this.

Generalized use of Lambda

Lambda expressions are not limited to use as parameters! Consider the problem of generalized lazy computation using integers. We can do this with the following interface:

interface LazyInt {
    int eval();
}

Given this interface, we can write code like this:

final LazyInt a = ()->0;
final LazyInt b = ()->(a.eval() + 5);
LazyInt c = ()->(b.eval() / a.eval());

In the above, the variable c does not hold the value (0+5)/0. Since this involves division by zero, it would throw an exception. Instead, c is the handle of an object with an eval method. When you call c.eval(), it will call b.eval(), which will call a.eval(), etc. As a result, it is the call to c.eval() that will throw an exception because of the attempt to divide by zero.
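This deferral can be checked directly. In the following minimal sketch (the class name LazyDivide and method demo are invented for illustration), constructing c raises no exception; the division by zero only happens when c.eval() is finally called, where it can be caught:

```java
public class LazyDivide {
    interface LazyInt {
        int eval();
    }

    // builds the chain from the notes, then forces evaluation inside a try
    static String demo() {
        final LazyInt a = () -> 0;
        final LazyInt b = () -> a.eval() + 5;
        final LazyInt c = () -> b.eval() / a.eval(); // no exception here!
        try {
            return "value = " + c.eval(); // the division by zero happens now
        } catch (ArithmeticException e) {
            return "caught: " + e.getMessage();
        }
    }

    public static void main( String[] args ) {
        System.out.println( demo() );
    }
}
```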

In the above, the variables a and b were declared to be final in order to emphasize that all variables referenced from within a λ expression must be final or effectively final. This can trip up a lazy programmer, but there is a workaround, albeit an ugly one. Consider the following first attempt at a bit of code:

LazyInt a = ()->1;
for (int i = 1; i<=10; i++) {
    a = ()->(a.eval() + a.eval());
}
System.out.println( "= " + a.eval() );

This code will not work because a is not effectively final. To make it work, we simply add a gratuitous final variable inside the loop:

LazyInt a = ()->1;
for (int i = 1; i<=10; i++) {
    final LazyInt b = a;
    a = ()->(b.eval() + b.eval());
}
System.out.println( "= " + a.eval() );

When this runs, the variable a ends up holding a remarkably complex data structure, so that when a.eval() is finally called at the end of this code, all of the additions are done at once; the loop doubles the value 10 times, so the code prints = 1024.

Of course, this example is silly. The cost of an integer add is less than the cost of the implicit call to an object constructor for an anonymous implementation of the interface LazyInt. This idea is, however, a powerful tool when the computations we want to do are more complex than simple primitive arithmetic operations.

Note that Java allows us to create generic classes and interfaces, so we can build a general mechanism that allows lazy evaluation of almost any type of expression:

interface Lazy<T> {
    T eval();
}

Java requires that the actual type parameter be a full-fledged class, not a built-in type such as int. So, we can rewrite the previous example as:

Lazy<Integer> a = ()->1;
for (int i = 1; i<=10; i++) {
    final Lazy<Integer> b = a;
    a = ()->(b.eval() + b.eval());
}
System.out.println( "= " + a.eval() );

This, of course, is even more awful than the previous code because of the amount of autoboxing and unboxing hidden in it. As a rule, simple integer computation is not where laziness is at its best.
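Where laziness does pay off is when building a value is expensive, as with the error messages that motivated this chapter. The following sketch (the class name and counter are invented for illustration) uses Lazy&lt;String&gt; and a counter to show that the message string is never built unless eval() is actually called:

```java
public class LazyString {
    interface Lazy<T> {
        T eval();
    }

    static int builds = 0; // counts how many times a message is actually built

    // returns a lazy error message; the string is not built until eval()
    static Lazy<String> message( final String road ) {
        return () -> {
            builds++; // only runs if someone actually needs the message
            return "road " + road + ": to missing destination";
        };
    }

    public static void main( String[] args ) {
        Lazy<String> msg = message( "A" );
        System.out.println( "built so far: " + builds ); // prints built so far: 0
        System.out.println( msg.eval() );
        System.out.println( "built so far: " + builds ); // prints built so far: 1
    }
}
```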

Aside: Memoization of Laziness

Pure lazy evaluation, as described above, can be very wasteful. The problem is illustrated by this line:

    a = ()->(b.eval() + b.eval());

Here, a call to a.eval() will call b.eval() twice, doing all the work over again the second time. As written, the object b has no memory of the value it returned the first time. Of course, in this case, we could have just written this:

    a = ()->(b.eval() * 2);

But, it would be nice to have a general solution, a way for an object that implements Lazy to remember its value once it has been evaluated. This is called memoization. Java poses limitations here. We cannot use λ expressions with anything but a functional interface, and we cannot add memory, in the form of an instance variable, to a functional interface. That can only be done with classes. We can, however, introduce a class that implements Lazy and can be wrapped around a Lazy object to memoize it:

class Memo<T> implements Lazy<T> {
    private Lazy<T> lazy;  // the lazy expression, or null after evaluation
    private T memo = null; // the memorandum of the previous value

    public Memo( Lazy<T> lazy ) {
        this.lazy = lazy;
    }

    public T eval() {
        if (lazy == null) return memo;
        memo = lazy.eval();
        lazy = null;
        return memo;
    }
}

Each Memo has two instance variables, one being the lazy object it wraps, and the other being a memorandum of the value obtained by evaluating the lazy object. The constructor wraps this Memo object around any Lazy object.

The eval method uses the Lazy object's handle to determine how to operate. If it is non-null, the result of normal lazy evaluation is returned, but a memorandum is made of the value that was returned, and the Lazy object's handle is set to null in order to signal that the memorandum should be returned on any future evaluations of this object.
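The claim that the wrapped object is evaluated only once can be checked with a counter. A small self-contained sketch (the class names MemoDemo and demo are invented; Memo is the class from above with its generic parameter spelled out):

```java
public class MemoDemo {
    interface Lazy<T> {
        T eval();
    }

    // the memoizing wrapper, exactly as in the notes
    static class Memo<T> implements Lazy<T> {
        private Lazy<T> lazy;  // the lazy expression, or null after evaluation
        private T memo = null; // the memorandum of the previous value

        public Memo( Lazy<T> lazy ) {
            this.lazy = lazy;
        }

        public T eval() {
            if (lazy == null) return memo;
            memo = lazy.eval();
            lazy = null;
            return memo;
        }
    }

    static int calls = 0; // counts evaluations of the wrapped λ expression

    // evaluates a memoized expression three times and reports how often
    // the underlying λ expression actually ran
    static int demo() {
        calls = 0;
        Lazy<Integer> m = new Memo<Integer>( () -> { calls++; return 42; } );
        m.eval();
        m.eval();
        m.eval();
        return calls;
    }

    public static void main( String[] args ) {
        System.out.println( "underlying evals: " + demo() ); // prints underlying evals: 1
    }
}
```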

We could add memoization to the example used above in two different ways. First, we could wrap a Memo around the λ expression that causes the explosion in computation:

Lazy<Integer> a = ()->1;
for (int i = 1; i<=10; i++) {
    final Lazy<Integer> b = a;
    a = new Memo<>( ()-> (b.eval() + b.eval()) );
}
System.out.println( "= " + a.eval() );

Second, we could wrap the memo around the final sub-expression that is repeatedly re-evaluated in the λ expression:

Lazy<Integer> a = ()->1;
for (int i = 1; i<=10; i++) {
    final Memo<Integer> b = new Memo<>( a );
    a = ()->(b.eval() + b.eval());
}
System.out.println( "= " + a.eval() );

Both solutions work quite well, and the result is a huge improvement in performance when the loop is run with more iterations. For example, on an Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz, using the OpenJDK Runtime Environment (build 1.8.0_265-b01) and OpenJDK 64-Bit Server VM (build 25.265-b01, mixed mode), and setting the number of iterations on the for loop to 30, the version without memoization runs in 4.3 seconds, and the versions with memoization run in close to 0.1 seconds, although the trial-to-trial variation is significant.
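The speedup can also be seen without a stopwatch by counting evaluations of the base expression. In the following sketch (the class and method names are invented), the unmemoized chain of n doublings evaluates the base λ expression 2 to the nth times, while the memoized variant (here, the second solution, wrapping a Memo around b) evaluates it just once:

```java
public class CountEvals {
    interface Lazy<T> {
        T eval();
    }

    // the memoizing wrapper, exactly as in the notes
    static class Memo<T> implements Lazy<T> {
        private Lazy<T> lazy;  // the lazy expression, or null after evaluation
        private T memo = null; // the memorandum of the previous value

        public Memo( Lazy<T> lazy ) {
            this.lazy = lazy;
        }

        public T eval() {
            if (lazy == null) return memo;
            memo = lazy.eval();
            lazy = null;
            return memo;
        }
    }

    static long base; // counts evaluations of the base λ expression

    // builds the doubling chain of n levels, evaluates it, and
    // returns how many times the base expression ran
    static long countEvals( int n, boolean memoize ) {
        base = 0;
        Lazy<Integer> a = () -> { base++; return 1; };
        for (int i = 1; i <= n; i++) {
            final Lazy<Integer> b = memoize ? new Memo<Integer>( a ) : a;
            a = () -> b.eval() + b.eval();
        }
        a.eval();
        return base;
    }

    public static void main( String[] args ) {
        System.out.println( countEvals( 10, false ) ); // prints 1024, i.e. 2 to the 10th
        System.out.println( countEvals( 10, true ) );  // prints 1
    }
}
```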