9. Testing

Part of CS:2820 Object Oriented Software Development Notes, Fall 2020
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

An aside on Java enumerations

Consider the following declaration of an enumeration in Java:

enum Color { red, green, blue }

When the Java compiler sees this, it responds as if it had seen this:

final class Color {
    private final int value; // the only instance variable

    private Color( int v ) { value = v; }

    public static final Color red = new Color(0);
    public static final Color green = new Color(1);
    public static final Color blue = new Color(2);

    private static final String[] names = { "red", "green", "blue" };
    public String toString() { return names[ this.value ]; }
}

The above is oversimplified. There are several other methods provided for each enumeration, but even this version illustrates some useful code patterns:

Comparing values from an enumeration class is easy if all you want to do is compare for equality. The == and != operators work as a naive programmer expects and it is never necessary to use equals(). That is, if a and b are two variables of the same enumeration class, a==b and a.equals(b) will always return the same result.

Comparing for order is also allowed. a.compareTo(b) compares the hidden underlying integer values. These values are always assigned in the same order as the textual order of the enumeration values. In this example, red was declared first, so it is less than everything after it.
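
For example, this little demonstration program exercises both kinds of comparison (the class name EnumDemo is ours, chosen just for illustration):

class EnumDemo {
    enum Color { red, green, blue }

    public static void main( String arg[] ) {
        Color a = Color.red;
        Color b = Color.blue;
        System.out.println( a == b );           // false
        System.out.println( a.equals( b ) );    // false, always agrees with ==
        System.out.println( a.compareTo( b ) ); // negative, red precedes blue
        System.out.println( a );                // red, courtesy of toString()
    }
}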

Testing

How do you go about testing a program? There are numerous approaches to this problem ranging from using random input to carefully reading the code and then designing tests based on that reading.

Fuzz Testing

Random input is actually useful enough that it has a name: Fuzz testing. In an early study of the quality of various implementations of Unix and Linux, fuzz testing forced a remarkable number of standard utility programs into failure, and showed that reimplementing a utility from the specifications in its manuals generally tended to produce a more solid result than the original code that had evolved from some cruder starting point, sometimes decades ago. If a program reads a text file, a fuzz test will simply fill a text file with random text for testing purposes. Programs that respond to mouse clicks can be fuzz tested with random sequences of mouse clicks. A program is deemed to fail a fuzz test if it goes into an infinite loop or fails with an unhandled exception. A program passes if it responds with appropriate outputs and error messages.

Here is a little Java program that generates fuzz on standard output:

import java.util.Random;
import java.lang.NumberFormatException;

/** Fuzz test generator
 * output a random length string of gibberish to standard output
 * takes 1 command line argument, an integer,
 * this controls the expected output length.
 * @author Douglas W. Jones
 * @version Sept. 18, 2020
 */
class Fuzz {

    public static void main( String arg[] ) {
        Random rand = new Random();
        int n = 0; // controls length of output
        if (arg.length != 1) {
            System.err.println( "argument required -- length of output file" );
            System.exit( 1 );
        } else try {
            n = Integer.valueOf( arg[0] );
        } catch (NumberFormatException e) {
            System.err.println( "non numeric argument -- length of output" );
            System.exit( 1 );
        }
        while (rand.nextInt( n ) > 0) {
            System.out.print( (char)rand.nextInt( 128 ) );
        }
        System.out.print( '\n' );
    }
}

Fuzz testing is an example of black-box testing. Black-box testing works from the outside. The designers of black-box tests work from the external specifications for the behavior of the system, trying out everything the system is documented as being able to do, and trying variations in order to see that the system behaves appropriately when the inputs are outside the specifications. In the case of fuzz testing, the test designers don't even need to read the specifications very closely, since the goal is simply to throw junk at the program to see if it breaks. Black-box testing is unlikely to find backdoors or other undocumented features of software, and is therefore not a very strong tool for security assessment.

If you wanted to fuzz test the road network simulator, you might run the following shell commands repeatedly:

java Fuzz 10 > test
java RoadNetwork test

The trouble with this is that the fuzz is too fuzzy. You'd have to run the test thousands of times just to have a chance that the test file would begin with the word intersection, so the test wouldn't be very thorough. On the other hand, if your program does throw an exception on fuzz-test input, it is seriously defective. We can make better use of fuzz testing with files that begin sensibly and then end with fuzz. For example, consider this:

echo "intersection " > test
java Fuzz 10 >> test
java RoadNetwork test

Or this:

echo "intersection a" > test
echo "intersection a" >> test
java Fuzz 10 >> test
java RoadNetwork test

This allows us to construct a series of tests that add selective fuzz at various places in the input. Still, fuzz testing is too fuzzy. You really don't know how thorough the test was. Nonetheless, fuzz testing can be easily automated to run thousands of trials, ignoring normal output from the program and paying attention only to outputs that indicate unhandled exceptions or other fairly easy to detect failures.
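
Here is a sketch of what such an automated driver might look like in Java. The class name FuzzDriver is ours, and the sketch assumes that the RoadNetwork class files are on the class path and that an unhandled exception produces a traceback containing the word Exception:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.Random;

/** Sketch of a fuzz-test driver: run many trials, report only failures */
class FuzzDriver {
    public static void main( String arg[] ) throws Exception {
        Random rand = new Random();
        for (int trial = 0; trial < 1000; trial++) {
            // build a test file: a sensible prefix followed by fuzz
            PrintStream f = new PrintStream( "test" );
            f.print( "intersection " );
            while (rand.nextInt( 10 ) > 0) {
                f.print( (char)rand.nextInt( 128 ) );
            }
            f.println();
            f.close();

            // run the program under test with its two streams merged
            Process p = new ProcessBuilder( "java", "RoadNetwork", "test" )
                .redirectErrorStream( true )
                .start();
            BufferedReader in = new BufferedReader(
                new InputStreamReader( p.getInputStream() )
            );
            String line;
            while ((line = in.readLine()) != null) {
                // ignore normal output, report anything resembling a traceback
                if (line.contains( "Exception" )) {
                    System.out.println( "FAIL in trial " + trial + ": " + line );
                }
            }
            p.waitFor();
        }
    }
}

A real driver would also impose a time limit on each trial so that infinite loops show up as failures instead of hanging the test run.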

Path Testing

One common way to design a systematic test is called path testing. To design a path test, you read the code and try to provide test inputs that force the program to execute every possible alternative path through the code.

Path testing is an example of white-box testing (which really ought to be called transparent-box testing). White-box testing is the opposite of black-box testing. White-box testing uses access to the internal structure of the system in order to design tests. In effect, the test designer looks at the mechanism and then creates tests to assure that the mechanism actually does what it seems intended to do. White-box testing can find undocumented features of a system, but it is more labor intensive than black-box testing.

Looking at just the main program we have been working on, there are 4 paths through the code, so our path test will have at least 4 test cases: no file name argument, too many arguments, a file that cannot be opened, and a file that is successfully opened and processed.

Here is a transcript of a successful path-testing session that tests 3 of the above cases:

[HawkID@serv15 ~/project]$ java RoadNetwork
Missing filename argument
[HawkID@serv15 ~/project]$ java RoadNetwork a b
Unexpected extra arguments
[HawkID@serv15 ~/project]$ java RoadNetwork nothing
Can't open file 'nothing'

Two of the above test cases were completely tested, but we did a bad job on the final case because we only tested for nonexistent files, not unreadable files. We really ought to test for files that exist but cannot be read. We can do this as follows:

[HawkID@serv15 ~/project]$ echo "nonsense" > testfile
[HawkID@serv15 ~/project]$ chmod -r testfile
[HawkID@serv15 ~/project]$ java RoadNetwork testfile
Can't open file 'testfile'
[HawkID@serv15 ~/project]$ rm testfile

This series of Unix/Linux shell commands creates a file called testfile containing some nonsense, and then changes the access rights to that file so it is not readable before passing it to the program under test. Finally, after testing the program, the final shell command deletes the test file.

The final test we are missing above is one to test how the main program handles a file that contains a valid description of a road network. Reading the code for readNetwork(), the outer loop is a while loop that terminates when there is no next token in the input file. As a result, an empty input file is a valid (but trivial) road network, so we can finish the path testing of our main program hoping to pass a test something like this:

[HawkID@serv15 ~/project]$ echo "" > testfile
[HawkID@serv15 ~/project]$ java RoadNetwork testfile
[HawkID@serv15 ~/project]$ rm testfile

Test Scripts

Before we move onward, note that it is prudent to re-run all of the tests after any change to the source program. Doing this by hand gets tedious very quickly, so it is tempting to just test the things you think you changed, while hoping that everything else still works. Any experienced programmer knows that fixing one bug frequently creates another, so this is a foolish testing philosophy.

It is far better to create a test script that can be run again after each change to the program. Consider a test file something like this:

#!/bin/sh
# testroads -- test script for road network
java RoadNetwork
java RoadNetwork a b
java RoadNetwork nothing
echo "nonsense" > testfile
chmod -r testfile
java RoadNetwork testfile
rm testfile
echo "" > testfile
java RoadNetwork testfile
rm testfile

The first line of this test script, #!/bin/sh, tells the system what shell to use to run your file; you have a choice of shells, but sh is a safe default for simple scripts. If you make the file executable using the shell command chmod +x testroads, the first line lets you run it by just typing the ./testroads shell command.

The second line is a comment giving the name of the file the script is intended to be stored in. It would make sense to add comments taking credit for the file, noting the creation date, and other details. The remaining lines are the test. After you create this file, you can run it like this:

[HawkID@serv15 ~/project]$ sh < testroads
Missing filename argument
Unexpected extra arguments
Can't open file 'nothing'
Can't open file 'testfile'

That's quite a jumble of output, and for that matter, the test file itself is a jumble. We need to document the output we expect, and the output of the test script should, at the very least, document the tests being performed and describe how to recognize whether the test was passed or failed. Here's a better test script:

#!/bin/sh
# testroads -- test script for road network

echo "GROUP of path tests for main program"
echo "TEST missing file name argument"
java RoadNetwork

echo "TEST unexpected extra arguments"
java RoadNetwork a b

echo "TEST can't open nonexistant file"
java RoadNetwork nothing

echo "TEST can't open unreadable file"
echo "nonsense" > testfile
chmod -r testfile
java RoadNetwork testfile
rm testfile

echo "TEST reading from an empty file"
echo "" > testfile
java RoadNetwork testfile
rm testfile

The echo shell command simply outputs its command line arguments to standard output, so most of the echo commands above serve both as comments in the script itself and as comments in the output of the tests.

Note that several of the later tests in this test script create temporary test files by using echo to put text into a test file. Each of these tests ends by removing the test file. We could just as easily create a suite of permanent test files, and for larger tests, this would make good sense.

Running the test

Unfortunately, our road network program fails the first interesting test, the final test in our script, the one with an empty input file. We get output something like this:

[HawkID@serv15 ~/project]$ java RoadNetwork roads
Exception in thread "main" java.lang.NullPointerException
    at RoadNetwork.printNetwork(RoadNetwork.java:149)
    at RoadNetwork.main(RoadNetwork.java:172)

What went wrong here? The above error message says that line 149 in our program tried to use a null pointer, and that this was called from line 172 of the program. On opening up the program in an editor and looking at these lines in our source code, it turns out that line 172 is the call to printNetwork(), so we have definitely finished our path coverage of the main program.

Line 149 is this:

        for (Intersection i:inters) {

Here, we tried to pick an intersection out of a list of intersections, but there was no list! That is, the list inters was null, a condition quite distinct from being an empty list. In Java, "there is no list" and "the list is empty" are not equivalent statements.
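
The distinction is easy to demonstrate (the class name NullDemo is just for illustration):

import java.util.LinkedList;

class NullDemo {
    public static void main( String arg[] ) {
        LinkedList <String> empty = new LinkedList <String> ();
        LinkedList <String> none = null;

        for (String s: empty) System.out.println( s ); // executes zero times
        for (String s: none) System.out.println( s );  // NullPointerException
    }
}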

Why was inters null? The call to initializeNetwork() in the main program was supposed to build roads and inters, but since the input file was empty, it did nothing, leaving these lists as they were initially. The initial values of these lists were determined by their declarations:

    /* the sets of all roads and all intersections */
    static LinkedList <Road> roads;
    static LinkedList <Intersection> inters;

These declarations are, as it turns out, wrong. The default initial value of any variable of object type in Java (as opposed to built-in types like int) is null, and that is the source of our null-pointer exception. What we need to do is initialize these two lists to empty lists, not null pointers.

    /* the sets of all roads and all intersections */
    static LinkedList <Road> roads
        = new LinkedList <Road> ();
    static LinkedList <Intersection> inters
        = new LinkedList <Intersection> ();

A question of format: Why wrap both lines when only the second line was too long to fit in an 80 column display? We could have written this:

    /* the sets of all roads and all intersections */
    static LinkedList <Road> roads = new LinkedList <Road> ();
    static LinkedList <Intersection> inters
        = new LinkedList <Intersection> ();

The problem is, these two lines of code are parallel constructions. If we write them so that they wrap identically, the fact that they are parallel is easy to see. If, on the other hand, we wrap them differently, or worse, let the text editor wrap them randomly, readers have to actually read and understand the text to see the parallel. Attention to this kind of detail makes programs much easier to read.

Path Testing the Input Parser

With this fix, the code compiles and we get the expected result, no output because the input file was empty. So, we can begin testing. As with the main program, we begin with something very simple, a one-line data file that defines just one intersection. For now, let's try these files without worrying about scripts and automation:

intersection A

The program output is identical to the input, so if it works, we can build on this, adding more intersections and roads, working up to something like this:

intersection A
intersection B
road A B 10
road B A 20

This is not particularly interesting unless we uncover some bugs. The next step is to start making some errors. Consider this input file:

intersection A
intersection B
intersection A
road A B 10
road B A 20

Here, we've deliberately inserted a duplicate intersection definition. When we run the program over this input (stored in the file roads), we get this output:

[HawkID@serv15 ~/project]$ java RoadNetwork roads
Intersection A redefined.
Intersection A
Intersection B
Intersection A
Road A B 10
Road B A 20

This is correct, as far as it goes, but the output is not very readable. The problem is, the error message is not cleanly distinguished from the normal output. Our current version of the errors package is at fault, with code something like this:

class Errors {
    static void warning( String msg ) {
        System.err.println( msg );
    }
    static void fatal( String msg ) {
        warning( msg );
        System.exit( 1 );
    }
}

What we need is simple: a standard prefix on each error message that distinguishes it from the normal output of the program. Consider this:

class Errors {
    static void warning( String msg ) {
        System.err.println( "Error: " + msg );
    }
    static void fatal( String msg ) {
        System.err.print( "Fatal " );
        warning( msg );
        System.exit( 1 );
    }
}

Aside: Standard Error versus Standard Output

In our program, we have output error messages to System.err and normal data output to System.out.

By default, when running under the Unix/Linux shell (and under the DOS command line under Windows), output to System.err is mixed in with output to System.out, but they can be separated. Here is a Unix/Linux example to illustrate this:

[HawkID@serv15 ~/project]$ java RoadNetwork roads > t
Intersection A redefined.
[HawkID@serv15 ~/project]$ cat t
Intersection A
Intersection B
Intersection A
Road A B 10
Road B A 20
[HawkID@serv15 ~/project]$ rm t

The added > t at the end of the command running our program diverts System.out (or rather, the Linux/Unix standard output stream) to the file named t. So, when our program runs, the only thing we see on the screen is the error message. Then, we use the command cat t to dump the file t to the screen. We could just as easily have used any text editor to examine the file, and finally, although nothing required us to do so, we deleted the file with the rm command.

Under some Unix/Linux command shells, it is almost as easy to divert standard error (System.err) to a file. The designers of the original Unix shell assumed that users would always want to see error messages immediately, even when saving other output, so the shell tools for redirecting standard error were an afterthought, and the way you do it differs from one shell to the next.

The two most common families of Unix/Linux shells are sh (the Bourne shell) and its open-source replacement bash (the Bourne-again shell), on the one hand, and csh (the C shell) and its open-source replacement tcsh (the TENEX-inspired rewrite of csh). To find out what shell you are using, type echo $SHELL. This will output the file name from which your current shell is being executed.

In sh and bash, typing >f after a shell command redirects standard output to a file named f while leaving standard error directed to the terminal. In contrast, typing 2>f redirects standard error and leaves standard output unchanged. This strange use of the numeral 2 is based on the fact that, in Unix and Linux, all open files are numbered; by default, file 0 is standard input, file 1 is standard output, and file 2 is standard error. This is a really odd design, but it works. If you want to redirect standard output and standard error to different files, you can write >f 2>g.

In csh and tcsh, typing >f after a shell command works as it does in sh, while typing >&f after a shell command redirects both standard output and standard error to the same file. If you want to split the two into different files, you can run the command in a subshell: ( command >f ) >& g. This works because the redirection inside the parentheses takes only standard output, so all that is left for the outer >& is standard error.

In all of these shells, typing >f after a shell command will overwrite the contents of file f if that file already exists. In contrast, typing >>f after the command will append that command's output to the existing file.

Path Testing Continued

Another obvious error to explore occurs when a road is defined in terms of an undefined intersection. Consider this input file:

intersection A
intersection B
intersection A
road A B 10
road B A 20
road A C 2000

When we run this, we get the expected error messages, but when the program tries to print the road involving the undefined intersection C, we get a null pointer exception.

What is the problem? There are some bug notices in our code that are closely related to this. Specifically, in the initializer for Road, when we output the warning about an undefined intersection, we wrote this:

        if (destination == null) {
            Errors.warning(
                "In road " + sourceName + " " + dstName +
                ", Intersection " + dstName + " undefined"
            );
            // Bug:  Should we prevent creation of this object?
        }

We did not prevent creation of the object when the declaration of that object contained an undefined destination intersection name. Instead, we left the object with a null destination field. This caused no problem until later, when we tried to output the road description using the toString() method:

    public String toString() {
        return (
            "Road " +
            source.name + " " +
            destination.name + " " +
            travelTime
        );
    }

In this code, we blindly reached for the name fields of the source and destination intersections without checking to see if they exist. We need to add this check. Perhaps the ugliest but most compact way to do so is to use the embedded conditional operator from Java:

    public String toString() {
        return (
            "Road " +
            (source != null ? source.name : "---" ) +
            " " +
            (destination != null ? destination.name : "---" ) +
            " " +
            travelTime
        );
    }

This code works, substituting --- for any names that were undefined in the input file, but it is maddeningly difficult to format this code so that it is easy to read. C, C++ and Java all share the same basic syntax for the conditional operator (a?b:c), and some critics consider this operator to be so unreadable that they advise never using it. It might be better to add a private method that is easier to read and packages the safe return of either the name or dashes if there is no name. We'll worry about this later.
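
If we do, the helper and the rewritten toString() might look something like this sketch (the name safeName is ours):

    private static String safeName( Intersection i ) {
        // the intersection's name, or dashes if there is no intersection
        return (i != null) ? i.name : "---";
    }

    public String toString() {
        return (
            "Road " +
            safeName( source ) + " " +
            safeName( destination ) + " " +
            travelTime
        );
    }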

Test Frameworks

Testing a large program typically requires a large number of tests. During program development, it is a good idea, after each change to the program, to re-run all of the tests in order to make sure that the change did not cause any damage. Doing the tests by hand can be time consuming, so it is a good idea to create a test framework and automate things.

First, it makes sense to keep all of the test files in their own directory.

Second, as we've already suggested, it makes sense to write a shell script to automate the testing. Following the model used in our first script example, we might add tests that look something like this:

echo "GROUP of path tests for readNetwork"
echo "TEST reading from an empty file"
echo "" > testfile
java RoadNetwork testfile
rm testfile

echo "TEST reading from a non empty file"
echo "intersection A" > testfile
echo "intersection B" >> testfile
echo "road A B 10" >> testfile
echo "road B A 10" >> testfile
java RoadNetwork testfile
rm testfile

echo "TEST error when not a command"
cat > testfile << '--end--'
intersection A
road A A
error A
--end--
java RoadNetwork testfile
rm testfile

The above illustrates two ways of creating multiple-line test files. The first is to echo each line of the test into the test file separately; the second uses a here document, a special form of input redirection that takes all input after the command, up to the indicated termination string, as the command's standard input.

The problem with this is that test output gets long and it is up to the person running the test to scroll through all the output and see if it makes sense. It would be better to have the test script pause after each test and tell the user what output to expect in cases where the test title didn't explain it:

echo "GROUP of path tests for readNetwork"
echo "TEST reading from an empty file"
echo "" > testfile
java RoadNetwork testfile
rm testfile
echo "--- The above should produce no output"
read -p "--- Press enter to continue"

echo "TEST reading from a non empty file"
echo "intersection A" > testfile
echo "intersection B" >> testfile
echo "road A B 10" >> testfile
echo "road B A 10" >> testfile
java RoadNetwork testfile
echo "--- The above should output something equivalent to this:"
cat testfile
rm testfile
read -p "--- Press enter to continue"

echo "TEST error when not a command"
cat > testfile << '--end--'
intersection A
road A A
error A
--end--
java RoadNetwork testfile
echo "--- The above should complain: 'error' is not a road or intersection"
rm testfile
read -p "--- Press enter to continue"

The above test framework requires human evaluation of the output of each test. There are tools we can use to automate this. The most important of these tools is the diff shell command. This command compares two source files and outputs all of the differences between the files. If the files are identical, it exits with a success code, while if they are different, it exits with a failure code. This allows you to write a shell script that runs flat out except when tests fail, and only then halts to call attention to the failure.

For the next example, we suppose that the directory testfiles contains test data files and files of the expected output for each test. We could rewrite the first test in the above test script as follows:

echo "TEST reading from an empty file"
java RoadNetwork testfiles/emptyfile > output 2> errors
if ! diff output testfiles/emptyfile
        then read -p "FAILURE, wrong output, press enter."
fi
if ! diff errors testfiles/emptyfile
        then read -p "FAILURE, wrong errors, press enter."
fi
rm output errors

Note that the above script is written for sh or bash; the syntax of conditionals in csh and tcsh is different. The convention for indenting shell scripts is to use a single tab for each indenting level. It shouldn't be difficult to figure out the above. The if ! construct executes the command that follows it on the same line and sets things up so that the code following the next then will execute only if the tested command fails. You can read this as "if there are differences between the two files, then prompt with a message starting with FAILURE and await input."

The strange command fi ends the if-then-else block; fi is if spelled backwards. The designer of sh, Stephen Bourne, copied this idea from Algol 68, an innovative language that also ended do blocks with od and case blocks with esac. People joked that Algol 68 comments beginning with comment should have ended with tnemmoc, but the language designers relented and ended comments with the same symbol that began them.

At this point, it should be clear that developing test scripts is itself a programming job. This was recognized long ago: Fred Brooks, drawing on his experience managing software development in the 1960s, suggested in his 1975 book The Mythical Man-Month (a classic in the field of software engineering, still in print) that in a software development team, having someone specialize in building tools such as test frameworks makes very good sense, as does having someone specialize in testing and someone else specialize in documentation.

A suggestion: As each new test is developed, do the testing by hand first. Once you are satisfied that the code passes the test, redirect the output to capture the expected output into a file, and then add that test to the script.

A More Complex Model

The code we have focused on up to this point works, but it is vastly oversimplified. For example, a major detail in real road networks is that there are many types of intersection. We have at least the following variants:

Stop lights have several characteristics, but one of the most significant is that the simplest ones turn green in two directions while they are red in the other two directions. This means that, for example, the lights facing both north and south are green when the east and west lights are red, and vice versa. More complex stoplights have turn arrows, but for all varieties of stoplights, roads into or out of that intersection must have labels indicating the direction from which they enter or leave.

Similarly, in the neural net example, we have several kinds of synapses. There are excitatory synapses, where an action potential traveling down an axon to that synapse causes a positive change in the receiving neuron, pushing it closer to the threshold that would cause it to fire, and there are inhibitory synapses that cause a negative change in the receiving neuron, making it less likely to fire. There are also axosynaptic interfaces, where a secondary synapse transmits signals to a primary synapse, activating or inhibiting the primary synapse.

In a logic simulator, there are several kinds of gates. We typically speak of and, or, and not gates, but there are also nand, nor, and exclusive-or gates, as well as asymmetric gates that perform functions such as a and not b. This means that we must document each wire leading to a gate by indicating which input it connects to. In the general case, gates may have multiple outputs, so wires from a gate must also be tagged with which output they connect to.

In an epidemic simulator, we need to worry about classes of people. There are employees and students, for example. Employees have a workplace as well as a home. Students have a school as well as a home. This also leads us to think about multiple categories of place, homes, schools and workplaces, where a school is a type of workplace that also has both students and employees. We'll stop here, but we could add students with part-time jobs and homes as workplaces for domestic labor or high-tech startups run out of people's basements.
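
In each of these examples, the natural object-oriented response is a hierarchy of classes. Here is a minimal sketch of what the epidemic example above suggests; every name in it is hypothetical:

class Place { }
class Home extends Place { }
class Workplace extends Place { }
class School extends Workplace { } // a school is a kind of workplace

class Person { Place home; }       // everyone has a home
class Employee extends Person { Workplace job; }
class Student extends Person { School school; }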

Impact on the Road Network Description Language

This has an immediate impact on our road network description. Where we formerly just said:

intersection A
intersection B

We can now say things like:

intersection A stoplight
intersection B

We've made a decision above, a decision that has several consequences. That is, for specialized types of intersection, we explicitly name the intersection type, but there is also a default where there is no explicit name. We could have required intersection B above to be declared as a simple intersection or something like that. The primary problem with this design decision is that it complicates the problem of parsing the input file.
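
One way the parser might handle the optional type keyword, assuming a Scanner named sc and a StopLight subclass of Intersection (both hypothetical at this point), is to peek at the next token before deciding what to construct:

        String name = sc.next(); // the intersection's name
        if (sc.hasNext( "stoplight" )) {
            sc.next(); // discard the keyword
            inters.add( new StopLight( sc, name ) );
        } else {
            inters.add( new Intersection( sc, name ) );
        }

Scanner's hasNext(String) tests the next token against a pattern without consuming it, which is exactly what this kind of default rule needs.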

A second consequence is that for roads, we need to document how the road connects to the intersections it joins, for example, using a notation like this:

road A north B south

This means that there is a road leaving intersection A going north to intersection B where it enters from the south.
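
Inside the program, this suggests that each road must now record the directions involved. A minimal sketch of the added fields, with hypothetical names:

class Road {
    Intersection source;      // the road leaves this intersection ...
    String sourceDirection;   //   ... heading in this direction, e.g. "north"
    Intersection destination; // the road enters this intersection ...
    String dstDirection;      //   ... from this direction, e.g. "south"
    int travelTime;
}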

Impact on the Road Network Epidemic Model

In our epidemic model, we need to add new classes to our model, but we also need to extend the model description. A quick game of search-engine shows that both school sizes and household sizes are reasonably approximated by log-normal distributions, although Poisson distributions may be technically better. A log-normal distribution can be generated from a normal distribution, and such a distribution has just two parameters, the mean of the underlying normal distribution and its standard deviation. The Wikipedia page on the log-normal distribution shows how to derive these from the median size and variance.

So, we should use median and variance as our parameters on the description of our population:

family 3,4
school 200,100

This describes families with a median size of 3 and a variance of 4, and schools with a median size of 200 and a variance of 100. These figures are not necessarily right, but they are good enough. We could do workplaces similarly, but note that schools are a special case of workplace, characterized by student-teacher ratios.
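
Here is a sketch of how a simulator might draw sizes from such a distribution, given the median and variance; the class name is hypothetical. The derivation uses the standard log-normal identities: the median is e^mu and the variance is (e^(sigma^2) - 1)e^(2 mu + sigma^2), which can be solved for mu and sigma.

import java.util.Random;

class LogNormal {
    static Random rand = new Random();

    /** return a random value from a log-normal distribution
     *  @param median the median of the distribution
     *  @param variance the variance of the distribution
     */
    static double logNormal( double median, double variance ) {
        // median = e^mu, so
        double mu = Math.log( median );
        // variance = (s - 1) * median^2 * s where s = e^(sigma^2),
        // so s^2 - s - variance/median^2 = 0; solving the quadratic for s:
        double s = ( 1.0 + Math.sqrt(
            1.0 + 4.0 * variance / (median * median)
        ) ) / 2.0;
        double sigma = Math.sqrt( Math.log( s ) );
        // exponentiating a normal random value gives a log-normal one
        return Math.exp( mu + sigma * rand.nextGaussian() );
    }

    public static void main( String arg[] ) {
        // family 3,4 -- median household size 3, variance 4
        for (int i = 0; i < 5; i++) {
            System.out.println( Math.round( logNormal( 3.0, 4.0 ) ) );
        }
    }
}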