11. Improved input processing

Part of CS:2820 Object Oriented Software Development Notes, Spring 2021
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

Where were we?

Look at this code:

class Road {
    float travelTime;               // measured in seconds
    Intersection destination;       // where road goes
    Intersection source;            // where road comes from
    // name of road is source-destination

    // constructor
    public Road( Scanner sc, LinkedList  inters ) {
        // code here must scan & process the road definition

        String srcName = sc.next();
        String dstName = sc.next();
        // Bug:  Must look up srcName to find source
        // Bug:  Must look up dstName to find destination

        // Bug:  What if no next float?
        travelTime = sc.nextFloat();
        String skip = sc.nextLine();
    }

    // other methods
    public String toString() {
        return (
            sourceName + " " +
            dstName + " " +
            travelTime
        );
    }
}

Visibility

Is source.name legal here? It is outside class Intersection. This raises the question, what fields ought to be visible, and if visible, how can we protect them.

There is a rule of thumb in programming that comes from military security: The need to know rule. This states that no user of an object should be given more information about the internal state of that object than they need to know to get their job done.

Need to know security goes against the principle of transparency, that in an open society, secrets are bad. The primary reason that we keep secrets in object-oriented programming is to minimize the size of the public interface of each class, but keeping secrets also has genuine security consequences. In a large project, a rogue programmer working on one part of the program can harvest any information about the system that is public and export it. By only giving each programmer access to the bare minimum information needed to do the job, we limit the damage a rogue programmer could do and maximize the likelihood that rogues behavior will be detected.

In Java, each declaration can be marked as: public, private, or final. In general, marking declarations as private makes very good sense. It means that the variable is invisible to all code outside this class.

If a variable must be visible because outsiders need to know about it, you can minimize the damage they can do by marking it final. Final variables are not read-only, their values can be changed, but the binding of the variable name to the object is fixed for all time. Declare variables to be public only if outsiders have a natural need to not only see the variable but to make arbitrary changes to it.

Later, when we discuss class hierarchies, we will discuss an additional attribute, protected. Protected variables are like private variables in programs that do not use class hierarchies. In a class hierarchy, one class may be a subclass of another. In programs that use class hierarchies, protected variables are visible not only in the class where they are declared, but also in all subclasses of that class. For the moment, we can ignore this marking.

In our running example road network code, the following declarations make sense:

class Road {
    private float travelTime;        // measured in seconds
    private Intersection destination;// where road goes
    private Intersection source;     // where road comes from

    ...
}

class Intersection {
    public final String name;
    private LinkedList  outgoing = new LinkedList  ();
    private LinkedList  incoming = new LinkedList  ();

    ...

The markings above make all fields private except for the name of an intersection, which is final and public. Note that there must be exactly one assignment to a final variable in the constructor for a class. Java compilers enforce this rule conservatively, which means that there are cases where you, as a programmer, can assure that there is exactly one assignment to a variable but the Java compiler is not smart enough to figure this out. In such cases, the compiler will not permit that variable to be declared as final.

Lookup

The next problem we face is that of looking up each intersection. The need for this is announced by bug notices in our code:

class Road {

    ...

    // constructor
    public Road( Scanner sc, LinkedList  inters ) {
        // code here must scan & process the road definition

        String sourceName = sc.next();
        String dstName = sc.next();
        // Bug:  Must look up sourceName to find source
        // Bug:  Must look up dstName to find destination
        ...
    }
    ...
}

class Intersection {

    ...

    // constructor
    public Intersection( Scanner sc, LinkedList  inters ) {
        // code here must scan & process the intersection definition

        name = sc.next();
        // Bug: look up name!
      
        ...
    }
    ...
}

Obviously, we need a way, given the name of an intersection, to find that intersection in the list of all intersections. Since this list is a static field of class RoadNetwork, one way to do the lookup is to have RoadNetwork export a public static method to look things up.

public class RoadNetwork {
    static LinkedList <Intersection> inters
        = new LinkedList <Intersection> ();

    ...

    static Intersection findIntersection( String s ) {
        // return the intersection named s, or null if no such
        // Bug: flesh this out!
    }

    ...
}

The obvious way to code this is with a rather stupid linear search through the list of roads to find the right one. Java includes some far more sophisticated tools for this search, but we'll ignore them here. One very sound rule of software engineering is:

Where were we?

In the versin of the Road Network simulation we've developed, we put the collection of members of each class inside that class, and we had the class keep that collection private so that nobody outside the class could see how it was organized. The class exported an iterator so that the outsiders could look through the collection, but the collection itself was hidden. For example:

class Intersection {
    ... lots of details omitted ...

    // the collection of all instances
    private static final LinkedList <Intersection> allIntersections =
	new LinkedList <Intersection> ();

    /** Allow outsiders to iterate over all roads
     * @return iterator over roads
     */
    public static Iterator <Intersection> iterator() {
	return allIntersections.iterator();
    }
}

This allowed the main program to output the collection for debugging with the following code:

    for (
	Iterator<Intersection> i = Intersection.iterator();
	i.hasNext();
    ){
        System.out.println( "Intersection " + i.next() );
    }

In the case of intersections, we needed to provide a way to look up an intersection by name, so we added this method to class Intersection:

    public static Intersection lookup( String n ) {
	for (Intersection i: allIntersections) {
	    if (i.name.equals( n )) return i;
	}
	return null;
    }

This is really stupid code, a simple linear search of allIntersections for an intersection with the right name. Java provides much better tools for this! There are hash sets, for example. The problem here is that looking up these tools and taking the time to explore the options delays development. Writing a for loop to do a linear search is trivial, we did it without thinking. That was the right choice for early in the development process. Later, if we discover that this linear search is a bottleneck, we can replace the linked list with something allowing a faster search. This follows the classical rules of optimization:

In the case of a simulation program, building the model is usually very inexpensive compared to running the model. Our lookup method is only intended for use during building the model, so it will most likely never matter that it uses a stupid algorithm.

String Comparison

A reminder: The following code will most likely never work!

            if ((command == "intersection")

What's wrong with this? The problem is, Java offers several tools for comparison. The == operator is the simplest of these. In detail, the == operator has two distinct uses:

In the case of class String, each string constant such as "road" is created as a new object by the Java compiler before the time the program containing that constant begins to run. When a program calls an instance of class Scanner to read a string from some file, the scanner creates a new object of class String for each string it reads, without making any effort to see if a String object already exists that contains the same textual value. As a result, there may be a number of strings holding the same text that the == operator considers to be different strings.

Java didn't have to work this way. Class String could have maintained a dictionary of all strings encountered, so that whenever a new string is constructed, if that string was already in the dictionary, the existing dictionary entry would be used instead of creating a new object. That would have made string processing even slower than it already is, so the designers of the String class opted to make string construction as fast as they could without eliminating duplicates.

There are several alternative ways of comparing two strings a and b:

Note that there are two useful ways to compare string variables with string constants:

Using Intersection.lookup()

There were two places in our code with bug notices that we can now attack, in the constructors for the road and intersection classes. Consider the constructor for class Intersection:

    public Intersection( Scanner sc ) {
	name = sc.next();   // pick off name of intersection;
	// BUG:  Must Detect duplicate definitions of intersections?
	// BUG:  Intersection type?  Other attributes?  Handle this later.

	allIntersections.add( this );
    }

We need to look up the name of the intersection because our source file format only allows one definition of each named intersection. If an intersection has already been defined with this name, we ought to complain. Here is an initial suggestion for how to do this:

    public Intersection( Scanner sc, LinkedList <Intersection> inters ) {
        // code here must scan & process the intersection definition

        name = sc.next();
        // Bug:  What if at end of file?  What does sc.next do?
        if (lookup( name ) != null) {
            Errors.warning( "Intersection " + name + " redefined." );
            // Bug:  Can we prevent the creation of this object?
        }
    }

We now have a constructor that complains with a useful error message when an attempt is made to declare a second (or third) intersection with the same name as a pre-existing intersection, but all may not be well. The new intersection is still created and added to the list of all intersections. Should we prevent this? Also, what if there is no next symbol for the scanner to scan? Our code is not prepared for this.

The second constructor we have to worry about is for roads. In this case, we want to guarantee that both intersections that the road connects are well defined. Here is the old code:

public Road( Scanner sc ) {
	// keyword Road was already scanned
	final String src;	// where does it come from
	final String dst;	// where does it go

	... code to scan src, dst omitted ...
	
	destination = Intersection.lookup( dst );
	source = Intersection.lookup( dst );
	// BUG: What if lookup returns null!

	allRoads.add( this ); // this is the only place items are added!
    }

What we need to do is look up both the source and destination roads and complain if either are undefined. Reasonable error messages should identify the road that caused the problem and also explain the error, so our improved code gets a bit long-winded:

	destination = Intersection.lookup( dst );
	source = Intersection.lookup( dst );

        if (source == null) {
            Errors.warning(
                "In road " + src + " " + dst +
                ", Intersection " + src + " undefined"
            );
            // Bug:  Can we prevent creation of this object?
        }
        if (destination == null) {
            Errors.warning(
                "In road " + src + " " + dst +
                ", Intersection " + dst + " undefined"
            );
            // Bug:  Can we prevent creation of this object?
        }

As with the previous example, we have not addressed what should be done if the object declaration contains an error, we have merely detected and reported the error. And again, we have not dealt with the possiblity that there might not be a source or destination intersection name.

The duplicated code above suggests that we might want to package the test for undefined names in a subroutine of some kind, perhaps a utility method for syntax checking that gets a name and checks to see if it is defined.

We've also inserted a new bug notice because we just noticed that there may be no sc.next() when we are trying to scan an intersection name. This also suggests we might want to package a more elaborate bit of code into a subroutine of some kind that not only scans an intersection name but checks to see if there is one.