34. Wrappers and Information Hiding

by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

Graphing CSV output

In our Epidemic simulator, we've been generating a CSV file, but there has been no mention of how to use it. Try this shell command:

java Epidemic testa > testa.csv

Where the output of the simulator would have been directed to the screen, it now goes into a file called testa.csv. Now:

  1. On your Linux desktop, navigate to the directory (folder) where you created testa.csv. Double-click on it to open the file. Because the file name ends in .csv, it will open as a spreadsheet.

  2. In the open spreadsheet, click whatever buttons you have to click to finish opening the file. Some spreadsheets will ask for confirmation when importing .csv files. Just say OK.

  3. In the spreadsheet, select all (usually done by clicking the upper left corner of the spreadsheet) and then deselect the first column (the column containing the time of day output; deselection is usually something like control-click).

  4. In the spreadsheet with the above selections, click insert chart (or click insert, which gives you several options, and select chart). You'll get a popup asking you what kind of chart.

  5. In the popup, select the chart type. For an epidemic simulator, the best choice seems to be a line chart without data points. Select this. You may have to click OK or done to bypass various other customization options.

You should see a rather effective graph of the progress of the epidemic as a function of time. Most spreadsheets know that the first row of the spreadsheet can be interpreted as a title row, so if you add the title row to the output of the simulator, you'll get a nicely captioned graph.

Modifying the simulator to add a title row is sufficiently trivial that it isn't an assignment. You're welcome to do it, and the next version of the simulator distributed to the class will produce a title row.
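For readers who want to try it, here is a minimal sketch of emitting a title row. The class name CsvTitleRow, the method titleRow and the column names are all hypothetical; the real simulator's columns may differ, so substitute whatever it actually prints.

```java
// Sketch only: the column names below are hypothetical placeholders,
// not the actual columns printed by the Epidemic simulator.
class CsvTitleRow {
    /** Join column names into one comma-separated title line. */
    static String titleRow( String... columns ) {
        return String.join( ",", columns );
    }

    public static void main( String[] args ) {
        // print the title row once, before any data rows are output
        System.out.println( titleRow( "time", "uninfected", "infected", "recovered" ) );
    }
}
```

The only requirement is that the title row be printed exactly once, before the first line of numeric output, and that it have one name per column.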

A Common Makefile Error

One mistake that is easy to make with a makefile is to accidentally document a dependency between one source file and another when, in fact, the dependency is between a source file and an object file. Thus, you might accidentally write a make rule like this:

Loquat.class: Loquat.java Orange.java
        javac Loquat.java

In fact, when the Java compiler processes Loquat.java, that is the only source file it is looking at. If a reader of Loquat.java needs to refer to Orange.java, then the Java compiler will refer to Orange.class, not Orange.java, so the makefile entry should have been written as follows (the change is the second dependency, which is now Orange.class):

Loquat.class: Loquat.java Orange.class
        javac Loquat.java

Mistakes in documenting dependencies don't always break a makefile. Generally, the error will only manifest itself if some particular file is edited and then make is used to rebuild the project. If make clean had been done first or if some other file had been edited as well, make will frequently work even when some of the dependencies are wrong.

Because of this, defective makefiles are unfortunately common even on long-established projects.

In Java, there is no need to document any dependency which is not explicit in the source file. If there is a reference to class B in file A.java, then A.java depends on B.class. If there is no mention of B in A.java, then there is no dependency. It can be quite effective to use grep to find dependencies. Just do this:

grep SomeClassName *.java > t

This will fill the (temporary) file t with all occurrences of SomeClassName in any Java source file in the current directory. Each line of t will hold one line of a source file, with the file name as a prefix. You can then edit t to delete the quoted lines that are comments and all the lines quoted from SomeClassName.java. What is left are lines from files that actually mention SomeClassName in their Java text. In a Makefile, it is perfectly appropriate to document each of these dependencies.
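The hand-editing step can be approximated in code. The following is a rough sketch, not a replacement for reading the grep output: the class DependencyScan and its method mentions are hypothetical names, and the filter only understands // comments. It knows nothing about block comments or string literals, so it can still report false matches.

```java
import java.util.regex.Pattern;

class DependencyScan {
    /** True if the line mentions className as a whole word outside a // comment. */
    static boolean mentions( String line, String className ) {
        // discard everything from the first // onward (a simplification)
        int comment = line.indexOf( "//" );
        String code = (comment >= 0) ? line.substring( 0, comment ) : line;

        // match the class name only as a complete identifier
        Pattern word = Pattern.compile( "\\b" + Pattern.quote( className ) + "\\b" );
        return word.matcher( code ).find();
    }

    public static void main( String[] args ) {
        System.out.println( mentions( "Orange o = new Orange();", "Orange" ) );
        System.out.println( mentions( "// see Orange for details", "Orange" ) );
        System.out.println( mentions( "int Oranges = 3;", "Orange" ) );
    }
}
```

Only the first of the three sample lines is a real mention: the second is a comment, and the third uses a different identifier that merely starts with the same letters.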

Circular Make Dependencies and Layering

Make tolerates circular dependencies. It detects them and outputs a warning for each dependency it finds that creates a cycle in the make graph. For Java programs, I recommend leaving these dependencies in the Makefile because, while make needs to detect and then ignore them, they serve as useful documentation.

When possible, organize the makefile so files are listed in dependency order, with the main program first, then things on which it depends, then things on which they depend.

Generally, circular dependencies tie together groups of files in the same logical layer of a layered application. If you can arrange the Makefile so all dependencies go down into later lines of the file, then there are no circular dependencies and each file can be thought of as a separate layer.

More commonly, groups of files will stand out, tied together by circular dependencies or by a common related purpose. At the middle level of a project, groupings are frequently tied by circular dependencies, while at the lower layers, groupings more frequently emerge because of shared purpose. For example, there might be a collection of service routines with no relationship to each other aside from the fact that they all provide support for what sits above them.

Once you settle on a layering of the entire project, it is a good idea to keep that order consistent throughout the Makefile. That is, the order in the file of the make rules for the different components should also be used in the lists of files in the dependency list for any particular make rule. Where a make rule depends on different layers in the project, it makes sense to break up the lists of components within the rule the same way the rules themselves are broken up to identify the layers.

Information Hiding

When we added the ability to cancel or reschedule events to our simulation framework, we created a problem. In the original framework, class Event was private and local to class Simulator. As a result, it was easy to enforce the rule that event times could never be changed when they were in the pending event set. The fact is that once an event was added to the pending event set, that set contained the only record of the event, so the time could not be changed.

Now, in order to allow events to be cancelled or rescheduled, we wrote this:

public static class Event {
    public double time;       // not final, so events can be rescheduled
    public final Action act;

    Event( double t, Action a ) {
        time = t;
        act = a;
    }
}

public static Event schedule( double t, Action a ) {
    Event e = new Event( t, a );
    eventSet.add( e );
    return e;                 // the caller may later cancel or reschedule e
}

This gives the caller the right to cancel or reschedule an event after it is scheduled, but it also gives the caller the right to make a dangerous mistake. Suppose the caller does this:

Simulator.Event e = Simulator.schedule( someTime, (double t)->thingToDo( t ) );

// perhaps some time later in the code
e.time = e.time + 10.0;

Once the event is added to the pending event set, its time field must remain constant! Whether the pending event set is implemented with a heap, with a balanced tree, with a calendar queue, or with something as simple as a sorted linear list, the algorithms used for getting the output in order by time from the collection generally fail if items are put into the collection and then their times are changed.
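To see the failure concretely, here is a minimal sketch, assuming the pending event set is a java.util.PriorityQueue (a binary heap). That is an assumption made for illustration; nothing here depends on which collection the simulator really uses. The class MutatedKeyDemo and its method pollTimes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class MutatedKeyDemo {
    /** Simplified event: just a mutable time, as in the unsafe version. */
    static class Event {
        double time;
        Event( double t ) { time = t; }
    }

    /** Schedule events at times 1, 2 and 3, optionally corrupt the first,
     *  then return the times in the order the queue delivers them. */
    static List<Double> pollTimes( boolean mutate ) {
        PriorityQueue<Event> eventSet =
            new PriorityQueue<>( (a, b) -> Double.compare( a.time, b.time ) );
        Event first = new Event( 1.0 );
        eventSet.add( first );
        eventSet.add( new Event( 2.0 ) );
        eventSet.add( new Event( 3.0 ) );

        // the dangerous mistake: the heap is never told that the key changed
        if (mutate) first.time = first.time + 10.0;

        List<Double> order = new ArrayList<>();
        while (!eventSet.isEmpty()) order.add( eventSet.poll().time );
        return order;
    }

    public static void main( String[] args ) {
        System.out.println( pollTimes( false ) ); // prints [1.0, 2.0, 3.0]
        System.out.println( pollTimes( true ) );  // prints [11.0, 2.0, 3.0]
    }
}
```

With the mutation, the event whose time is now 11.0 is still sitting at the head of the heap, so it is delivered ahead of the events at times 2.0 and 3.0. In a simulator, that would make simulated time run backward.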

Of course, we can add comments warning that the time field on an event must not be changed except by explicitly rescheduling the event. The trouble with such warnings is that people ignore comments and manuals. If the declaration of some data structure permits some field to be changed, someone will probably change it at some point, and the result will be much wasted time and difficult debugging.

How can we prevent this? One solution is to add a new class, a wrapper class that hides the details of what it wraps. Here, we will leave class Event as the name used by outsiders for scheduled events that can later be cancelled or rescheduled. Internally, we will use class RealEvent to actually hold the dangerous information. By making class RealEvent private while class Event is public, we can hide things from users of class Simulator:

private static class RealEvent {
    public double time;
    public final Action act;
    ... details omitted ...
}

public static class Event {
    public RealEvent e;       // the wrapped event

    public Event( RealEvent e ) {
        this.e = e;
    }
}

public static Event schedule( double t, Action a ) {
    RealEvent e = new RealEvent( t, a );
    eventSet.add( e );
    return new Event( e );    // hand back only the wrapper
}

Here, the public field and constructor of class Event are useless outside of class Simulator because, although they are public, they involve classes that are unknown in the outside world. As a result, under the access control rules of Java, they are completely protected from abuse.
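The fragments above can be assembled into something that compiles and runs. What follows is a sketch under stated assumptions, not the course's actual framework: the class name SimulatorSketch, the method name trigger in interface Action, the use of java.util.PriorityQueue as the event set, and the run() loop are all hypothetical. It also includes reschedule() so that the wrapper is exercised.

```java
import java.util.PriorityQueue;

class SimulatorSketch {
    /** Hypothetical stand-in for the framework's Action interface. */
    interface Action {
        void trigger( double time );
    }

    /** Holds the mutable time; the class itself is private. */
    private static class RealEvent {
        public double time;
        public final Action act;
        RealEvent( double t, Action a ) { time = t; act = a; }
    }

    /** Opaque handle returned to callers. */
    public static class Event {
        public RealEvent e;
        public Event( RealEvent e ) { this.e = e; }
    }

    // assumption: a PriorityQueue ordered by time serves as the event set
    private static final PriorityQueue<RealEvent> eventSet =
        new PriorityQueue<>( (a, b) -> Double.compare( a.time, b.time ) );

    public static Event schedule( double t, Action a ) {
        RealEvent e = new RealEvent( t, a );
        eventSet.add( e );
        return new Event( e );
    }

    public static void reschedule( Event e, double t ) {
        eventSet.remove( e.e );   // take it out before touching the time
        e.e.time = t;
        eventSet.add( e.e );
    }

    /** Hypothetical main loop: trigger pending events in time order. */
    public static void run() {
        while (!eventSet.isEmpty()) {
            RealEvent e = eventSet.poll();
            e.act.trigger( e.time );
        }
    }

    public static void main( String[] args ) {
        Event a = schedule( 1.0, t -> System.out.println( "a at " + t ) );
        schedule( 2.0, t -> System.out.println( "b at " + t ) );
        reschedule( a, 3.0 );     // a now runs after b
        run();
    }
}
```

Code in some other class that holds an Event named a can still pass it to reschedule(), but an attempt such as a.e.time = 5.0 should be rejected by the compiler because the field time belongs to a class that is not accessible there.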

In effect, we have used the wrapper class the same way envelopes are used in the postal system. The letter inside is private, and we wrap it up in an envelope in order to allow people to handle the letter without messing with it or reading it.

Using wrappers this way has a cost. In schedule(), we had to call two constructors, one to create the RealEvent that we scheduled, and one to wrap it up in an Event for return. Constructor calls aren't cheap. They involve a call to the storage manager, which must use its algorithms to search available memory for a chunk the right size to hold the object requested. The storage management algorithms in use today are quite fast, but quite fast is not the same as free. They do have a run-time cost, and it is a cost we would rather not pay.

There is another cost. Our original version of reschedule() looked something like this:

public static void reschedule( Event e, double t ) {
    eventSet.remove( e );
    e.time = t;
    eventSet.add( e );
}

Now, we must rewrite this as follows:

public static void reschedule( Event e, double t ) {
    eventSet.remove( e.e );
    e.e.time = t;
    eventSet.add( e.e );
}

It is easy to replace the expression e, a "naked" Event, with e.e, code to "unwrap" an Event to reveal the RealEvent hiding inside it, but this too is not free. The cost of this unwrapping is, however, very small, about the same as the cost of an add instruction on most machines, including the JVM (Java Virtual Machine).

We will explore this issue further in the next class, looking for lower cost alternatives.