7. Iterators, Input Parsing

Part of CS:2820 Object Oriented Software Development Notes, Fall 2017
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

Iterators

In the last class, we used Java's for loop construct:

    /** Print out the road network from the data structure
     */
    static void printNetwork() {
        for (Intersection i:inters) {
            System.out.println( i.toString() );
        }
        for (Road r:roads) {
            System.out.println( r.toString() );
        }
    }

Each of these for loops is actually an abbreviation for a rather complex bit of code. First, the for loop creates an iterator over the list and then it use that iterator to get successive list elements. The following far more long-winded bit of code is exactly equivalent to the first loop above:

        for (
            Iterator <Intersection> it = inters.iterator();
            it.hasNext();
        ) {
            Intersection i = it.next();
            System.out.println( i.toString() );
        }

An iterator is class that can be derived from any collection that will deliver successive members of that collection each time its next() method is called.

The above long-winded code expresses exactly the same computation as the original for loop, and in fact, the Java compiler generates exactly the same code from the original and this long-winded version of the code. Any for loop can be rewritten as a while loop, so we could further deconstruct this code as follows:

        {
            Iterator <Intersection> it = inters.iterator();
            while ( it.hasNext() ) {
                Intersection i = it.next();
                System.out.println( i.toString() );
            }
        }

Why did we wrap the whole lump of code above in braces? The iterator it created by the for loop was local to the loop. When we converted the for loop to a while loop, we added the extra brackets to guarantee that the declaration of it would be local to the loop and not visible elsewhere in the program.

As a general rule, the for-loop construct in Java is always a shorthand. Every Java for loop can be rewritten as a while loop. Consider this elementary for loop that iterates over integers:

for(int i=0 ; i < 10 ; i++) {
    doSomethingWith( i );
}

This can be rewritten as follows, and in fact, the Java compiler would generate exactly the same code from the above as it generates from this long-winded rewrite:

{
    int i = 0;
    while (i < 10) {
        doSomethingWith( i );
        i++;
    }
}

Again, we wrapped the while loop in an extra set of braces in order to make the loop control variable i visible only inside the loop and not elsewhere in the program.

Where were we?

If we try to compile the code we had at the end of the previous class, we get these error messages:

[dwjones@serv15 ~/project]$ javac Road*java
RoadNetwork.java:47: error: cannot find symbol
            sourceName + " " +
            ^
  symbol:   variable sourceName
  location: class Road
RoadNetwork.java:48: error: cannot find symbol
            dstName + " " +
            ^
  symbol:   variable dstName
  location: class Road
2 errors
[dwjones@serv15 ~/project]$

Looking at the relevant code, we find:

class Road {
    float travelTime;               // measured in seconds
    Intersection destination;       // where road goes
    Intersection source;            // where road comes from
    // name of road is source-destination

    // initializer
    public Road( Scanner sc, LinkedList  inters ) {
        // code here must scan & process the road definition

        String sourceName = sc.next();
        String dstName = sc.next();
        // Bug:  Must look up sourceName to find source
        // Bug:  Must look up dstName to find destination

        // Bug:  What if no next float?
        travelTime = sc.nextFloat();
        String skip = sc.nextLine();
    }

    // other methods
    public String toString() {
        return (
            "Road " +
            sourceName + " " +
            dstName + " " +
            travelTime
        );
    }
}

The code for the toString() method we have is trying to access local variables of the Road() initializer when it ought to be relying on the information available from the instance variables of class Road(). Obviously, our design is incomplete.

Visibility

We could replace sourceName in the toString() method above with source.name. This assumes that we want the name() field of the object to be visible from outside, and this raises the question, what fields ought to be visible, and if visible, how can we protect them.

There is a rule of thumb in programming that comes from military security: The need to know rule. This states that no user of an object should be given more information about the internal state of that object than they need to know to get their job done.

Need to know security goes against the principle of transparency, that in an open society, secrets are bad. The primary reason that we keep secrets in object-oriented programming is to minimize the size of the public interface of each class, but keeping secrets also has genuine security consequences. In a large project, a rogue programmer working on one part of the program can harvest any information about the system that is public and export it. By only giving each programmer access to the bare minimum information needed to do the job, we limit the damage a rogue programmer could do and maximize the likelihood that rogues behavior will be detected.

In Java, each declaration can be marked as: public, private, or final. In general, marking declarations as private makes very good sense. It means that the variable is invisible to all code outside this class.

If a variable must be visible because outsiders need to know about it, you can minimize the damage they can do by marking it final. Final variables are not read-only, their values can be changed, but the binding of the variable name to the object is fixed for all time. Declare variables to be public only if outsiders have a natural need to not only see the variable but to make arbitrary changes to it.

Later, when we discuss clas hierarchies, we will discuss an additional attribute, protected. Protected variables are like private variables in programs that do not use class hierarchies. In a class hierarchy, one class may be a subclass of another. In programs that use class hierarchies, protected variables are visible not only in the class where they are declared, but also in all subclasses of that class. For the moment, we can ignore this marking.

In our running example road network code, the following declarations make sense:

class Road {
    private float travelTime;        // measured in seconds
    private Intersection destination;// where road goes
    private Intersection source;     // where road comes from

    ...
}

class Intersection {
    final String name;
    private LinkedList  outgoing = new LinkedList  ();
    private LinkedList  incoming = new LinkedList  ();

    ...

The markings above make all fields private except for the name of an intersection, which is final. Note that there must be exactly one assignment to a vinal variable in the constructor for a class. Java compilers enforce this rule conservatively, which means that there are cases where you, as a programmer, can assure that there is exactly one assignment to a variable but the Java compiler is not smart enough to figure this out. In such cases, the compiler will not permit that variable to be declared as final.

Lookup

The next problem we face is that of looking up each intersection. The need for this is announced by bug notices in our code:

class Road {

    ...

    // initializer
    public Road( Scanner sc, LinkedList  inters ) {
        // code here must scan & process the road definition

        String sourceName = sc.next();
        String dstName = sc.next();
        // Bug:  Must look up sourceName to find source
        // Bug:  Must look up dstName to find destination
        ...
    }
    ...
}

class Intersection {

    ...

    // initializer
    public Intersection( Scanner sc, LinkedList  inters ) {
        // code here must scan & process the intersection definition

        name = sc.next();
        // Bug: look up name!
      
        ...
    }
    ...
}

Obviously, we need a way, given the name of an intersection, to find that intersection in the list of all intersections. Since this list is a static field of class RoadNetwork, one way to do the lookup is to have RoadNetwork export a public static method to look things up.

public class RoadNetwork {
    static LinkedList <Intersection> inters
        = new LinkedList <Intersection> ();

    ...

    static Intersection findIntersection( String s ) {
        // return the intersection named s, or null if no such
        // Bug: flesh this out!
    }

    ...
}

The obvious way to code this is with a rather stupid linear search through the list of roads to find the right one. Java includes some far more sophisticated tools for this search, but we'll ignore them here.

The Linear Search Solution

Here is the stupid code to flesh out findIntersection

    static Intersection findIntersection( String s ) {
        // return the intersection named s, or null if no such
        for (Intersection i: RoadNetwork.inters) {
            if (i == s) {
                return i;
            );
        }
        return null;
    }

This code is almost inexcusably simple, but it works and it takes very little thought to write it. The fact is, linear searches are awful, and the Java standard library provides some far better tools. The Java Map interface is the gateway to improved performance here. Java maps, when given a key, return the associated value. What we've done above is implement a mapping using a linear search, close to the worst possible implementation.

Why did we opt to do this? There are two reasons:

First: There is a general rule in software development, in three parts:

  1. Don't optimize.
  2. If you must optimize, do it later, and
  3. Only optimize those parts of the program that actually matter to performance.

The point is, you should get the function right first, and only after you are sure that the program actually does something useful and worth optimizing should you worry about optimizing it. Time spent optimizing code that does something that turns out to be wrong is wasted time, and time spent optimizing code that turns out to have little impact on performance is almost as bad.

Second: The time it takes to figure out the complexities of the Map interface and figure out how to apply that interface to this problem is significantly longer than the time it takes to cobble together a quick and dirty solution such as the above. This strongly argues that the quick and dirty solution is the right way to get started.

String Comparison

If you try the above code, it won't work! In fact, it won't even come close to working. No matter how carefully we format the input file, all we get are error messages saying things like:

'intersection' not a road or intersection
'road' not a road or intersection

The problem is, we read a command such as the text road into the variable command in the main program, and then, when we tried the comparison command == "road" we faced a failure. Everywhere we try to compare strings, we have written things like the following:

            if ((command == "intersection")

What's wrong with this? The problem is, Java offers several tools for comparison. The == operator is the simplest of these. In detail, the == operator has two distinct uses:

In the case of class String, each string constant such as "road" is created as a new object by the Java compiler before the time the program containing that constant begins running. When a program calls an instance of class Scanner to read a string from some file, the scanner creates a new object of class String for each string it reads, without making any effort to see if a String object already exists that contains the same textual value. As a result, there may be a number of strings holding the same text that the == operator considers to be different strings.

There are several alternative ways of comparing two strings a and b:

Note that there are two useful ways to compare string variables with string constants:

We'd better rewrite readNetwork() to take this into account:

static void readNetwork( Scanner sc ) {
    while (sc.hasNext()) { 
        // until the input file is finished
        String command = sc.next(); 
        if ("intersection".equals( command )) {
            inters.add( new Intersection( sc, inters ) );
        } else if ("road".equals( command )) {
            roads.add( new Road( sc, inters ) );
        } else {
            // Bug: Should probably support comments
            Errors.warning(
                "'"
                + command
                + "' not a road or intersection"
            );
            // Bug: Should skip to the next line
        }
    }
}

Of course, we also need to rewrite findIntersection().

Using findIntersection

There were two places in our code with bug notices that we can now attack, in the initializers for the road and intersection classes. Consider class Intersection:

    public Intersection( Scanner sc, LinkedList  inters ) {
        // scan and process one intersection
        name = sc.next();
        // Bug: look up name!
        String skip = sc.nextLine();
        // Bug:  What if more junk at end of line (skip not empty)
    }

We need to look up the name of the intersection because our source file format only allows one definition of each named intersection. If an intersection has already been defined with this name, we ought to complain. Here is an initial suggestion for how to do this:

    public Intersection( Scanner sc, LinkedList  inters ) {
        // code here must scan & process the intersection definition

        name = sc.next();
        // Bug:  What if at end of file?  What does sc.next do?
        if (RoadNetwork.findIntersection( name ) != null) {
            Errors.warning( "Intersection "
                      + name
                      + " redefined."
            );
            // Bug:  Can we prevent the creation of this object?
        }
        String skip = sc.nextLine();
        // Bug:  What if more junk at end of line (skip not empty)
    }

We now have an initializer that complains with a useful error message when an attempt is made to declare a second (or third) intersection with the same name as a pre-existing intersection, but all may not be well. The new intersection is still created and added to the list of all intersections. Should we prevent this? Also, what if there is no next symbol for the scanner to scan? Does it return null? Does it return an empty string? Our code is not prepared for either possible outcome, so we have inserted more bug notices.

The second initializer we have to worry about is for roads. In this case, we want to guarantee that both intersections that the road connects are well defined. Here is the old code:

    public Road( Scanner sc, LinkedList  inters ) {
        // code here must scan & process the road definition

        String sourceName = sc.next();
        String dstName = sc.next();
        // Bug:  Must look up sourceName to find source
        // Bug:  Must look up dstName to find destination
        
        // Bug:  What if the next isn't a float?
        travelTime = sc.nextFloat();
        String skip = sc.nextLine();
        // Bug:  What if more junk at end of line (skip not empty)
    }

What we need to do is look up both the source and destination roads and complain if either are undefined. Reasonable error messages should identify the road that caused the problem and also explain the error, so our improved code gets a bit long-winded:

        String sourceName = sc.next();
        String dstName = sc.next();
        // Bug: What if at end of file?  What does sc.next do?

        source = RoadNetwork.findIntersection( sourceName );
        destination = RoadNetwork.findIntersection( dstName );
        if (source == null) {
            Errors.warning(
                "In road " + sourceName + " " + dstName +
                ", Intersection " + sourceName + " undefined"
            );
            // Bug:  Should we prevent creation of this object?
        }
        if (destination == null) {
            Errors.warning(
                "In road " + sourceName + " " + dstName +
                ", Intersection " + dstName + " undefined"
            );
            // Bug:  Should we prevent creation of this object?
        }

As with the previous example, we have not addressed what should be done if the object declaration contains an error, we have merely detected and reported the error. And again, we have not dealt with the possiblity that there might not be a source or destination intersection name.

The duplicated code above suggests that we might want to package the test for undefined names in a subroutine of some kind, perhaps a utility method for syntax checking that gets a name and checks to see if it is defined.

We've also inserted a new bug notice because we just noticed that there may be no sc.next() when we are trying to scan an intersection name. This also suggests we might want to package a more elaborate bit of code into a subroutine of some kind that not only scans an intersection name but checks to see if there is one.