37. λ Expression Review

Part of CS:2820 Object Oriented Software Development Notes, Spring 2021
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

A little program

Here is a little Java program, something any first-semester programmer should be able to understand fairly early in the semester:

public class Demo {
    public static void main(String[] args) {
        int a = 1;
        for (int i = 0; i < 30; i++) {
            a = a + a;
        }
        System.out.println( "= " + a );
    }
}

Each iteration of the loop doubles a, starting from one, so the successive values are 1, 2, 4, 8, 16, 32 and so on. After the 30 iterations, the output is 2^30, a bit over a billion. Hardly exciting!

Adding laziness using λ expressions

Here is what might at first seem to be a trivial change to the above program, adding λ expressions.

interface Lazy<T> {
    T eval();
}

public class LazyDemo {
    public static void main(String[] args) {
        Lazy<Integer> a = () -> 1;
        for (int i = 0; i < 30; i++) {
            final Lazy<Integer> b = a;
            a = () -> b.eval() + b.eval();
        }
        System.out.println( "= " + a.eval() );
    }
}

At first glance, the interface Lazy seems to do nothing useful, and the main program seems to have the exact same logic as the original, except that a bunch of annoying λ expressions have been added, making things more verbose and apparently accomplishing nothing at all.

As we will see, this is not true! This program actually builds a complex data structure, and its execution is recursive.

One of the more annoying additions to the program is the variable b added inside the loop. It would have been nice to write the body of the loop like this:

            a = () -> a.eval() + a.eval();

Unfortunately, the above is illegal because the variable a is not final or effectively final — obviously, since it is assigned many times inside the loop. Therefore, we need to make a final copy of a:

            final Lazy<Integer> b = a;
            a = () -> b.eval() + b.eval();

Before going on, go back and look at LazyDemo and see if you can find the data structures it builds and the recursive code that traverses that data structure.

The fact that something puzzling is going on should be evident the moment you try to run the above code. It is extraordinarily slow!

Unpacking the Code — Eliminate λ

To see what is going on above, it is worth looking at how Java implements λ expressions. We can rewrite the example removing λ notation as follows:

interface Lazy<T> {
    T eval();
}

public class LazyDemo {
    public static void main(String[] args) {
        Lazy<Integer> a = new Lazy<Integer>() {         //
            public Integer eval() {                     //+
                return 1;                               //+
            }                                           //+
        };                                              //+
        for (int i = 0; i < 30; i++) {
            final Lazy<Integer> b = a;
            a = new Lazy<Integer>() {                   //
                public Integer eval() {                 //+
                    return b.eval() + b.eval();         //+
                }                                       //+
            };                                          //+
        }
        System.out.println( "= " + a.eval() );
    }
}

Lines that are added to the original are marked with //+, and lines with small changes from the original are marked with //.

Now, you should see that the program created a total of 31 objects implementing Lazy<Integer>. One was the initial value of a, and 30 more were created during the iterations of the loop body.
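
To see what those 31 objects cost us, here is a small counting sketch of my own; it is not part of the notes. It re-declares the Lazy interface so it stands alone, and the class name CountDemo and the static counter are my additions, used only to watch how many times the innermost eval() runs.

interface Lazy<T> {
    T eval();
}

public class CountDemo {
    static long calls = 0;   // how many times the innermost eval() runs

    public static void main(String[] args) {
        Lazy<Integer> a = () -> { calls++; return 1; };
        for (int i = 0; i < 30; i++) {
            final Lazy<Integer> b = a;
            a = () -> b.eval() + b.eval();
        }
        System.out.println( "= " + a.eval() );
        System.out.println( "innermost eval() called " + calls + " times" );
    }
}

Each of the 30 objects built in the loop calls eval() twice on the object below it, so the innermost eval() runs 2^30 times. Those billion-odd calls are why the program is so slow.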

Some quiz questions:

Unpacking the Code — Eliminate Anonymous Inner Classes

To see what is going on above, it is worth looking at how Java implements anonymous inner classes:

interface Lazy<T> {
    T eval();
}

public class LazyDemo {
    public static void main(String[] args) {
        class InitLazy implements Lazy<Integer> {       //+
            public Integer eval() {
                return 1;
            }
        }                         
        Lazy<Integer> a = new InitLazy();               //
        for (int i = 0; i < 30; i++) {
            final Lazy<Integer> b = a;
            class UpdateLazy implements Lazy<Integer> { //+
                public Integer eval() {
                    return b.eval() + b.eval();
                }
            }
            a = new UpdateLazy();                       //
        }
        System.out.println( "= " + a.eval() );
    }
}

Again, lines that are added to the original are marked with //+, and lines with small changes from the original are marked with //. Lines that have merely been moved without changes are unmarked.

Again, it should be obvious that there are 31 new objects created here. The same quiz questions remain appropriate:

Unpacking the Code — Eliminate Inner Classes

To see what is going on above, it is worth looking at how Java implements inner classes:

interface Lazy<T> {
    T eval();
}

class InitLazy implements Lazy<Integer> {
    public Integer eval() {
        return 1;
    }
}

class UpdateLazy implements Lazy<Integer> {
    final Lazy<Integer> b;                              //+
    UpdateLazy( Lazy<Integer> i ) {                     //+
        b = i;                                          //+
    }                                                   //+
    public Integer eval() {
        return b.eval() + b.eval();
    }
}                                                

public class LazyDemo {
    public static void main(String[] args) {
        Lazy<Integer> a = new InitLazy();
        for (int i = 0; i < 30; i++) {
            a = new UpdateLazy( a );                    //
        }
        System.out.println( "= " + a.eval() );
    }
}

Again, lines that are added to the original are marked with //+, and lines with small changes from the original are marked with //. Lines that have merely been moved without changes are unmarked.

We made one optimization! We eliminated the final variable b from inside the loop: since we were forced to put an equivalent variable inside each UpdateLazy object, the auxiliary final variable in the loop became redundant.

The same quiz questions remain relevant, but the answers should now be evident:

There are some new quiz questions that become appropriate at this point:

To change the topic completely, can you find all the autoboxing and unboxing in this program? The code involves many, many conversions between the primitive type int and the class Integer. These undoubtedly slow the program considerably, but they impose only a linear performance penalty, while the traversal of the list imposes an exponential one.
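
As a hedged illustration (my own rough desugaring, not the compiler's exact output), here is what UpdateLazy.eval() does once the boxing conversions are written out by hand:

public Integer eval() {
    int left  = b.eval().intValue();         // unbox the Integer returned by b.eval()
    int right = b.eval().intValue();         // unbox the second result
    return Integer.valueOf( left + right );  // box the int sum back into an Integer
}

Similarly, return 1 in InitLazy.eval() really means return Integer.valueOf( 1 ).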

Laziness and Memoization

What we've done above, originally using λ expressions and then unpacking them to explain how they work, is called lazy computation. The variable a inside our loop does not hold the result of the computation we want; it holds the instructions for doing that computation when we need the result.
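
As a minimal sketch of the distinction (expensive() here is a hypothetical method, not something defined in these notes):

int eager = expensive();                  // expensive() runs right now, needed or not
Lazy<Integer> lazy = () -> expensive();   // nothing runs yet; expensive() runs only when lazy.eval() is called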

This particular example doesn't make laziness look valuable, but there are times when laziness is extraordinarily valuable. Some problems, for example, are best described by infinite data structures. No particular run of a program will ever need the entire infinite data structure, so we can describe that structure using a lazy algorithm to construct it. Then, when the program runs, the eval methods are called only to generate just enough of the data structure to get the result we need.
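
To make that concrete, here is a minimal sketch of an infinite data structure built this way. It assumes the Lazy interface from above is in scope; the class name InfiniteList and its methods are my own invention, not part of these notes.

class InfiniteList {
    final int head;                 // the first element of the list
    final Lazy<InfiniteList> tail;  // a recipe for building the rest of the list

    InfiniteList( int head, Lazy<InfiniteList> tail ) {
        this.head = head;
        this.tail = tail;
    }

    // the conceptually infinite list n, n+1, n+2, ...
    static InfiniteList from( int n ) {
        return new InfiniteList( n, () -> from( n + 1 ) );
    }
}

Calling InfiniteList.from( 0 ) builds just one cell; writing from( 0 ).tail.eval().tail.eval().head builds two more cells and yields 2. However far you walk, only the cells you actually visit are ever constructed.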

The worst feature of the code above is that each call to eval requires two recursive calls to eval. Of course, we could fix this particular program by replacing this:

return b.eval() + b.eval();

with this:

return 2 * b.eval();

That is a one-time optimization requiring that you understand the algorithm. There is a trick called memoization (a term coined in 1968 by Donald Michie) that eliminates the exponential cost without requiring that we understand the algorithm being evaluated.

The idea of memoization is to have each object retain a memorandum of its value. The first time you call the eval method on an object, it actually does whatever work is required, but it also keeps a memo of the result it computed. On the second and later calls to eval on the same object, it returns the memo instead of repeating the work it already did.

Class Memo in the following version of our code does this. Here, we return to the use of compact λ notation for the sake of readability.

interface Lazy<T> {
    T eval();
}

class Memo<T> implements Lazy<T> {
    private Lazy<T> lazy;  // the lazy expression, or null after evaluation
    private T memo = null; // the memorandum of the previous value

    public Memo( Lazy<T> lazy ) {
        this.lazy = lazy;
    }

    public T eval() {
        if (lazy == null) return memo;
        memo = lazy.eval();
        lazy = null;
        return memo;
    }
}

public class MemoDemo {
    public static void main(String[] args) {
        Lazy<Integer> a = () -> 1;
        for (int i = 0; i < 30; i++) {
            final Memo<Integer> b = new Memo<Integer>( a );
            a = () -> b.eval() + b.eval();
        }
        System.out.println( "= " + a.eval() );
    }
}

Note that the main program has hardly changed, but because we have memoized the data structure, execution now takes time linear in the number of iterations of the loop. With the number of iterations set to 30, the non-memoized lazy version took 4.35 seconds, while the memoized version ran in about 0.1 seconds on one server.
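
If you want to reproduce that comparison, a rough way is to time the final call to eval(). This is a sketch of my own, not part of the notes; System.nanoTime() is standard Java, but the numbers you get will depend on your machine. It replaces the println at the end of either main method:

        long start = System.nanoTime();
        int result = a.eval();            // forces the whole lazy computation
        double seconds = (System.nanoTime() - start) / 1.0e9;
        System.out.println( "= " + result + " (" + seconds + " seconds)" );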

There are some little quiz questions appropriate for this code: