8. Text file design

Part of CS:2820 Object Oriented Software Development Notes, Spring 2021
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Where are we?

We ended the last lecture with a suggested main program that opened an input file using a Scanner and then passed that scanner to another method, readNetwork(), giving it the job of picking apart the input file to build a model of a road network.

    /** Initialize this road network by scanning its description
     */
    static void readNetwork( Scanner sc ) {
        while (sc.hasNext()) {
            // until the input file is finished
            String command = sc.next();
            if (command.equals( "intersection" )) {
                inters.add( new Intersection( sc, inters ) );
            } else if (command.equals( "road" )) {
                roads.add( new Road( sc, inters ) );
            } else {
                // Bug: Complain about unknown command
            }
        }
    }

    /** Main program
     * @see readNetwork
     */
    public static void main(String[] args) {
        if (args.length < 1) {
            // Bug:  Complain about a missing argument
        } else try {
            readNetwork( new Scanner(new File(args[0])) );
            // BUG:  Actually simulate the roads and intersections?
        } catch (FileNotFoundException e) {
            // Bug:  Complain that the file doesn't exist
        }
    }
}

Text file design

The above code assumes that the constructors for classes Road and Intersection exist. In order to write those constructors, we need to start fleshing out some details of the source file describing the road network. The code above suggests a file format where each line begins with the keyword road or intersection followed by some kind of description of the indicated item. Furthermore, the code assumes that each constructor will itself read that description in order to build a road or intersection object. The form of the input file would be something like:

intersection [[something read by the intersection constructor]]
intersection [[something read by the intersection constructor]]
road [[something read by the road constructor]]
road [[something read by the road constructor]]
road [[something read by the road constructor]]

One obvious way to begin each intersection is with the name of that intersection. For example, we might write something like this:

intersection a [[other attributes for the intersection]]
intersection b [[other attributes for the intersection]]
intersection c [[other attributes for the intersection]]
road a b [[other attributes of the road]]
road b c [[other attributes of the road]]
road c a [[other attributes of the road]]

Here, we propose that intersections have simple names, while roads are named by the intersections that they connect. We could have named the roads with simple names and then named the intersections by the roads they connect. That approach is more common with city streets, where we speak of Clinton Street or Washington Street, and the intersection of Clinton and Washington.

What are the other attributes? For roads between intersections, the obvious attributes are the length of the road and its speed limit. If we assume that drivers obey the speed limit, we can fold these together and describe the road by its travel time -- how long it takes to get from one end to the other.

The attributes of intersections are more complex. Some intersections have stoplights, some have stop signs, and stoplights have directions associated with them. We will develop this incrementally, adding attributes as we introduce complexity to our model. For example, we'll discover that some intersections have stoplights that alternately allow east-west travel and north-south travel. When we get to that point, we'll have to extend our naming convention so that a road can connect, for example, outgoing north from intersection A and incoming east to intersection B. For now, we'll ignore this, but we'll do so with the knowledge that our initial design is inadequate. The initial design gives us something like this:

intersection a
intersection b
intersection c
road a b 30
road b a 30
road a c 12
road c a 12
road b c 22

In the above, note that all roads are one-way. If you want to allow traffic from intersection a to intersection b and back again, you have to describe this with two separate roads, one allowing traffic in each direction.

The numbers on the roads give the travel times in unnamed time units. Perhaps we should consider extending this with a concept of time unit, so 20s would mean 20 seconds, while 20m would mean 20 minutes. That is too complicated for an initial version of our code.

If you look at the details of class Scanner, one thing you will notice is that it will take work to divide the input using anything other than whitespace (blanks, tabs and newlines). We will look at this issue later, but for our initial design, it makes sense to just use whitespace, without any punctuation.
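As a minimal sketch of how this whitespace-delimited format could be scanned, consider the following. The names NetworkSketch and RoadSpec are hypothetical. Unlike the Road and Intersection constructors assumed earlier, this sketch simply records intersection names and road endpoints as strings, without checking that the endpoints of a road have been declared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Sketch only: scan the whitespace-delimited road network format.
 *  NetworkSketch and RoadSpec are hypothetical names for illustration.
 */
class NetworkSketch {
    /** A one-way road from src to dst taking travelTime time units */
    static final class RoadSpec {
        final String src;
        final String dst;
        final int travelTime;

        /** The constructor itself reads the rest of the road description */
        RoadSpec( Scanner sc ) {
            src = sc.next();
            dst = sc.next();
            travelTime = sc.nextInt();
        }
    }

    final List<String> inters = new ArrayList<>();
    final List<RoadSpec> roads = new ArrayList<>();

    NetworkSketch( Scanner sc ) {
        while (sc.hasNext()) {
            // until the input is finished
            String command = sc.next();
            if (command.equals( "intersection" )) {
                inters.add( sc.next() );
            } else if (command.equals( "road" )) {
                roads.add( new RoadSpec( sc ) );
            } else {
                System.err.println( "unknown command: " + command );
            }
        }
    }

    public static void main( String[] args ) {
        String text = "intersection a\nintersection b\nroad a b 30\nroad b a 30\n";
        NetworkSketch n = new NetworkSketch( new Scanner( text ) );
        System.out.println(
            n.inters.size() + " intersections, " + n.roads.size() + " roads"
        );
    }
}
```

Because the scanner treats all whitespace alike, this sketch does not actually require one declaration per line; line breaks in the input are only there for human readers.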

Later, we can consider adding these details:

intersection a;
intersection b;
intersection c;
road a-b 30s;
road b-a 0.5m;
road a-c 12;
road c-a 12;
road b-c 22;

In the above, we've made the default time unit the second, but allowed the user to specify travel times in minutes (and perhaps hours or even days). We've also added dashes between the names of the two ends of each road and semicolons at the end of each "declaration" of a road or intersection.
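If we do add time units, converting a token to seconds might look something like the following sketch. The class name TravelTime, the method toSeconds and the h suffix for hours are assumptions made for illustration, and the sketch assumes that the trailing semicolon has already been separated from the token.

```java
/** Sketch only: convert travel-time tokens to seconds.
 *  TravelTime and toSeconds are hypothetical names.
 */
class TravelTime {
    /** Convert a token such as "30s", "0.5m" or "12" to seconds.
     *  The suffixes s, m and h are assumed; no suffix means seconds.
     */
    static double toSeconds( String token ) {
        if (token.isEmpty()) {
            throw new NumberFormatException( "empty time" );
        }
        char last = token.charAt( token.length() - 1 );
        double scale = 1.0;     // default unit is the second
        String digits = token;
        if (Character.isLetter( last )) {
            // split the unit suffix from the number
            digits = token.substring( 0, token.length() - 1 );
            switch (last) {
                case 's': scale = 1.0; break;
                case 'm': scale = 60.0; break;
                case 'h': scale = 3600.0; break;
                default:
                    throw new NumberFormatException( "unknown unit: " + last );
            }
        }
        return Double.parseDouble( digits ) * scale;
    }

    public static void main( String[] args ) {
        System.out.println( toSeconds( "30s" ) );
        System.out.println( toSeconds( "0.5m" ) );
        System.out.println( toSeconds( "12" ) );
    }
}
```

With this convention, the two roads between a and b in the example above have the same travel time, since 0.5m and 30s both come to 30 seconds.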

A text file design for the epidemic model

The initialization logic for the epidemic model is going to be different, because the epidemic model does not rest on a pre-specified map of the community. Instead, in the epidemic model, the initialization code will read in a list of statistics and then generate a community.

Here is one proposal for an input file format:

pop 2500;       // population
house 4,3;      // household size average 4, plus or minus 3
jobs 0.25;      // 1/4 of the population has jobs
study 0.5;      // 1/2 of the population are students
school 250,100; // number of students per school, plus or minus
class 25;       // student-teacher ratio

This proposal was the starting point of the exercise last semester, but it has some severe weaknesses. Primary among them, it leads to special cases for each kind of workplace. A general mechanism would seem more desirable, where categories of workplaces would not be predefined but, instead, defined on the fly:

pop 2500;       // population
house 4,3;      // household size average 4, plus or minus 3
place office;
  employment 0.2  // fraction of the population who are office workers
    arrive 9        // office workers start at 9AM
    depart 17       // office workers depart at 5PM
  size 5,3        // offices have an average of 5 workers plus or minus 3
place school;
  employment 0.01 // fraction of the population who work in schools
    arrive 8        // school employee arrival time
    depart 16       // school employee departure time, 4PM
    size 20,10      // schools have 20 employees, plus or minus 10
  others 0.30    // fraction of the population who attend school
    arrive 8.5      // students arrive at 8:30AM
    depart 15       // students depart at 3:00PM
    size 300,150    // schools have 300 students, plus or minus 150

At this point, we are still vague about how things work on several fronts. Punctuation and indenting are used inconsistently above, because we really don't know what we want. The // comment format is borrowed from programming languages, and it looks useful. We really don't need comments, so we can abandon this idea if it leads to trouble.
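If we keep the comments, one simple way to handle them is to remove them before any other scanning takes place. The following is only a sketch under that assumption; the class name CommentSketch and the method stripComments are hypothetical, and the sketch assumes that // never appears as part of a meaningful token.

```java
import java.util.Scanner;

/** Sketch only: remove // comments before scanning.
 *  CommentSketch and stripComments are hypothetical names.
 */
class CommentSketch {
    /** Return text with everything from // to the end of each line removed */
    static String stripComments( String text ) {
        StringBuilder out = new StringBuilder();
        Scanner lines = new Scanner( text );
        while (lines.hasNextLine()) {
            String line = lines.nextLine();
            int start = line.indexOf( "//" );
            if (start >= 0) {
                // keep only the part of the line before the comment
                line = line.substring( 0, start );
            }
            out.append( line ).append( '\n' );
        }
        return out.toString();
    }

    public static void main( String[] args ) {
        String text = "pop 2500;       // population\nhouse 4,3;\n";
        System.out.print( stripComments( text ) );
    }
}
```

The stripped text can then be handed to a second Scanner that never sees the comments, at the cost of reading the whole input into memory first.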

The above notation is a bit confusing because it ties people to places, rather like serfs in medieval Europe. Each person's behavior seems to be entirely tied to a place, and this makes it hard to introduce places like restaurants and stores which are workplaces for some and sites of occasional visits by others.

This suggests that we really need three fundamental things: people, places and roles. Each person has a role, and that role provides the script for their life. Imagine describing a community like this:

population 2500;
place home 3.5, 2;      // there is one home per 3.5 people plus or minus 2
place office 25, 24;    // there is one office per 25 people plus or minus 24
place school 1000, 700; // there is one school per 1000 people plus or minus 700
role homemaker 0.3   // 0.3 of the population are homemakers
  home;                   // homemakers start at home
role officework 0.35 // 0.35 of the population are office workers
  home,                   // office workers start at home
  office(9 17 weekdays);  // they go to the office from 9AM to 5PM
role teacher 0.05    // 0.05 of the population are teachers
  home,                   // teachers start at home
  school(8 17 weekdays);  // they go to school from 8AM to 5PM
role student 0.3     // 0.3 of the population are students
  home,                   // students start at home
  school(9 15 weekdays);  // they go to school from 9AM to 3PM

At this point, we are still vague about how things work on several fronts. How do we control probabilistic things like being late for work or the fact that students make occasional trips to stores after school? For that matter, how do we control the distribution of people to homes, office workers to offices, and the like? We'll leave those vague for now!

Note also that we said that home sizes, school sizes, etc. follow some kind of probability distribution.

Perhaps we need to refine the specification so that we can write this:

place home 4,3 uniform;
place school 250,100 normal;

Or, perhaps we should make the type of probability distribution for each type of item fixed by the item type in order to avoid creating a general purpose mechanism to solve a problem where, in real life, the distribution depends only on the type of real-world data we are specifying.
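To make the cost of the general purpose mechanism concrete, here is a sketch of what drawing a size from a named distribution might involve. Everything here is an assumption: the names SizeSketch and sample are hypothetical, "plus or minus" is read as the half-width of the range for the uniform case and as the standard deviation for the normal case, and sizes are forced to be at least 1.

```java
import java.util.Random;

/** Sketch only: draw a size from a named probability distribution.
 *  SizeSketch and sample are hypothetical names.
 */
class SizeSketch {
    /** Draw one size given a mean, a spread and a distribution name */
    static int sample( double mean, double spread, String dist, Random rand ) {
        double x;
        if (dist.equals( "uniform" )) {
            // assumed meaning: uniform between mean-spread and mean+spread
            x = mean + (2.0 * rand.nextDouble() - 1.0) * spread;
        } else if (dist.equals( "normal" )) {
            // assumed meaning: spread is the standard deviation
            x = mean + rand.nextGaussian() * spread;
        } else {
            throw new IllegalArgumentException( "unknown distribution: " + dist );
        }
        // assumed rule: round to a whole number and never go below 1
        return (int) Math.max( 1, Math.round( x ) );
    }

    public static void main( String[] args ) {
        Random rand = new Random();
        System.out.println( "home size:   " + sample( 4, 3, "uniform", rand ) );
        System.out.println( "school size: " + sample( 250, 100, "normal", rand ) );
    }
}
```

Even this small sketch has to settle several questions that the file format leaves open, which is an argument for fixing the distribution by item type in a first prototype.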

Executive decision: Prototype software should stay simple! We will defer this, but we will remember that there may be a need to generalize later.

Scanners again

When in doubt about tools, experiment! All of the above suggestions for file formats will require some kind of input scanning. Class Scanner is a powerful tool, but difficult for beginners. The best way to get started with such a tool is to play with it. Here is a little program to test class Scanner against the kind of input formats we've suggested above for the highway network and epidemic models:

    import java.util.Scanner;

    public class ScanTest {
        static final Scanner in = new Scanner( System.in );

        public static void main(String[] args) {
            System.out.print( in.next() );      // get a string
            System.out.print( in.nextInt() );   // followed by an int
            System.out.print( in.next(";") );   // followed by semicolon
        }
    }

Playing with it shows that using scanners with the default definition of an integer requires that there be a space between the integer and any following punctuation. We can live with this, at least for the time being. It's also clear that any real application using scanners needs to catch exceptions so that the output isn't cluttered with stack traces every time someone makes a typo!
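Here is a sketch of the same experiment with the exceptions caught. The class name SafeScan and the method readDeclaration are hypothetical, and returning an error string is just a stand-in for whatever error reporting the real application ends up using. Note that InputMismatchException is a subclass of NoSuchElementException, so it must be caught first.

```java
import java.util.InputMismatchException;
import java.util.NoSuchElementException;
import java.util.Scanner;

/** Sketch only: scan "name number ;" without letting exceptions escape.
 *  SafeScan and readDeclaration are hypothetical names.
 */
class SafeScan {
    /** Read one declaration and return a description or an error message */
    static String readDeclaration( Scanner in ) {
        try {
            String name = in.next();    // get a string
            int value = in.nextInt();   // followed by an int
            in.next( ";" );             // followed by semicolon
            return name + " = " + value;
        } catch (InputMismatchException e) {
            // a token was present but not of the expected form
            return "error: unexpected token";
        } catch (NoSuchElementException e) {
            // the input ended before the declaration was complete
            return "error: input ended early";
        }
    }

    public static void main( String[] args ) {
        System.out.println( readDeclaration( new Scanner( "road 30 ;" ) ) );
        System.out.println( readDeclaration( new Scanner( "road 30;" ) ) );
        System.out.println( readDeclaration( new Scanner( "road" ) ) );
    }
}
```

The second test case in main is the one that shows why the space matters: with the default whitespace delimiters, 30; is a single token, and it is not a legal integer.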