9. Errors, Organization, and Scaffolding

Part of CS:2820 Object Oriented Software Development Notes, Spring 2021
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

In the description of the input language for describing road networks and epidemics, we noted that explicit time units would be handy but that a default of specifying all times in seconds would be adequate. Similarly, we noted that there could be many types of intersections, and many different kinds of places in an epidemic model, but we deferred both of those issues.

Here is class Road from the road-network simulation problem with a preliminary version of a reasonable constructor that reads from the road-network description file:

import java.util.LinkedList;
import java.util.Scanner;

/** Roads are one-way connections between intersections
 *  @see Intersection
 */
class Road {
    float travelTime;         //measured in seconds
    Intersection destination; //where the road goes
    Intersection source;      //where the road comes from
    // name of road is source-destination

    // constructor
    public Road( Scanner sc, LinkedList <Intersection> inters ) {
        // code here must scan & process the road definition

        String sourceName = sc.next();
        String dstName = sc.next();
        // Bug:  Must look up sourceName to find source
        // Bug:  Must look up dstName to find destination

        // Bug:  What if the next item isn't a float?
        travelTime = sc.nextFloat();
        sc.nextLine(); // skip anything else on this line
    }
}

There are, of course, some bugs here, and they have been noted with comments. We need to do quite a bit of work to go from this code to bullet-proof code suitable for even limited release, and, of course, we need to write the corresponding code for class Intersection.
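To give a feel for the work ahead, here is a sketch of how two of the noted bugs might eventually be fixed. It assumes that class Intersection has a public name field -- hypothetical at this point -- and it raises new questions of its own, such as how errors should be reported to the user:

// sketch only:  look up the source intersection by name
for (Intersection i: inters) {
    if (i.name.equals( sourceName )) {
        source = i;
        break;
    }
}
// Bug:  still need to complain if no intersection has that name

// sketch only:  guard against a malformed travel time
if (sc.hasNextFloat()) {
    travelTime = sc.nextFloat();
} else {
    // Bug:  still need to report the bad travel time to the user
}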

What matters is that the preliminary constructor is just enough that it can be tested. This is incremental development: we deliberately ignore large parts of the problem and write something close to the smallest piece of code that is testable and that implements some small part of what we hope will be the final specification.

In testing this small part of the code, we are testing two separate issues: First, does the code work, and second, is the specification really right? It is quite common, when working on large problems, to find that the customer really doesn't know what they want until they can play with a working application that does at least part of what they thought they wanted. As a result, devising a complete specification and then writing code to that specification can be a significant mistake.

Incremental development can be done informally, or it can be formalized as a strict methodology. One attempt to make a strict methodology of it is called extreme programming, sometimes abbreviated XP. When you find software products with XP as a suffix on their names, this is sometimes an indication that the product was built using extreme programming, or built to encourage extreme programming, or merely trying to sound exotic because of the buzzword attached to its name.

Extreme Programming

Long before you have a useful program, you should be running it and testing it. Extreme programming, or XP as its advocates abbreviate it, recommends that any large programming project be developed incrementally. The classic XP development model organizes the development process in one-week steps. During each week:

Monday
Develop specifications for the end product for the week.
Tuesday
Develop tests that the product must meet by the week's end.
Wednesday
Develop code to meet the specifications.
Thursday
Run tests and debug the code. The code should pass all tests, including those from previous steps in the development cycle, except tests that have been made obsolete by changes in the specifications.
Friday
Assess progress for the week.

At the end of the week, either you have reached the week's goals, with a completely tested body of code, or you have failed. If you fail, the XP methodology demands that you completely discard the week's work and start over. This is, perhaps, the only extreme element of the XP methodology. The reason for this extreme recommendation is that debugging is hard. If you did not get it right within a week, starting over may well get you to working code faster than grinding away at the bugs that just won't go away.

Consider a piece of code that took only a few hours to write. If you have a bug in that code, is it better to spend days trying to find and fix that bug, or is it better to discard the code and spend a few hours writing a replacement? If you try to write code to the same specification as before, you might make the same mistake, so the methodology here suggests you go back and toss not only the code, but also the specification, and re-specify, perhaps making the total work for this step smaller, so that you only reach the final goal after a larger number of smaller steps.

The important thing is that each weeklong step in the development cycle starts with working code from the previous successful step and (if successful) produces working code on which the next step can be built. The only thing you carry over from a failed development step is what you learned from that failure.

Note that this is a vast oversimplification of the XP methodology. The full methodology speaks about team structure, coding standards, and many other details. In my opinion, rigid adherence to the full methodology is likely to be foolish, especially since the full XP methodology seems oriented toward development driven by the GUI structure or, more generally, by input-output specifications. For software products that aren't GUI based, the methodology must be modified, and there is no reason to believe that one week is the right step size.

Regardless of that criticism, incremental development is extremely powerful, and the basic framework above is very useful. The basic take-away from the XP model is: take little steps. Don't try to get a whole large program working in one big bang; start with working code (for example, Hello World) and then augment it, one step at a time, until it does everything you want.

The Waterfall Model

At the opposite end of the spectrum from extreme programming, you will find what has come to be described as the waterfall model of software development:

Requirements
First, work out the requirements for the entire project. Do not do any design work until the requirements are worked out in detail.
Design
Then, design the solution at a high level, working out all the internal components and their relationships. Do not do any implementation until a detailed design is completed.
Implementation
Write the code to implement the design. Complete writing the code before you do any testing.
Verification
Verify that the code meets the requirements. This involves extensive testing.
Maintenance
Invariably, the requirements will change with time, and customers will find bugs.

Many software purchase contracts have been written assuming the waterfall model, where the requirements are written into the contract, and where the contractor is required to submit a complete design before implementation begins. If you look at the way bridges are designed and built, the waterfall model comes very close to describing a working system. First, you work out what the bridge should do, then you hire an engineering firm to complete the design, then you let bids for a contractor to build the bridge. Frequently, another engineering firm is hired to inspect the finished bridge, then you open it for traffic and take responsibility for maintenance.

In the world of software, the waterfall model has led to huge cost overruns and project failures, yet it seems to be a natural application of a methodology with a long proven track record. What's wrong?

The problem with the waterfall model is that it assumes you can check your specification and have some confidence in it before you start coding. Unfortunately, most software projects are too complex for that. Until you have working code, you can't be sure that the specifications are right. Similarly, the model assumes that you can look at the code and verify that it meets the specifications without running it, and that your testing will confirm this. In the real world of software, end users will invariably try things that weren't covered in the tests, finding bugs, and many of those bugs will be due to flaws in the initial design.

A Hierarchy of Virtual Machines

One approach to designing a large system is to view the system as a set of layers, where each layer is, in effect, a programming language or a programmable machine. Consider this view:

Hardware
The bottom layer is a real machine of some kind.
Operating System
User applications run on a virtual machine built by layering an operating system on top of the hardware.
Programming Environment
Most applications don't use the bare operating system; they add to the system resources a set of resources created, for example, by the standard library of the programming language they are using. We, for example, are using the Java library on top of the Linux operating system.
Special-purpose components
Large applications typically include their own collections of purpose-built classes that sit on top of the system and language, creating an environment in which the final development steps are simplified.
Higher-level components
There may be several layers of custom components sitting under a very large application.
Top level code
If the lower layers are well thought out, the top level code to implement the application can end up being very simple.

It is important to note that, in a layered system, test frameworks can be built for each layer, where the test framework does not depend in any way on the behavior of the higher layers (the layers closer to the end user and application). Furthermore, lower layers in the system may be repurposed to serve other applications.
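As a small illustration, consider a hypothetical special-purpose component layered over the Java library: a wrapper around java.util.Scanner that adds the error handling our input language needs. The class name and its interface are invented here for the sake of the example:

import java.util.Scanner;

/** A special-purpose layer over java.util.Scanner that adds
 *  error recovery appropriate to our input language.
 */
class MyScanner {
    private final Scanner sc; // the layer below

    public MyScanner( Scanner sc ) {
        this.sc = sc;
    }

    /** get the next float, or a default if the input is malformed */
    public float getNextFloat( float def, String message ) {
        if (sc.hasNextFloat()) return sc.nextFloat();
        System.err.println( message ); // report the error and go on
        return def;
    }
}

Top-level code written in terms of MyScanner never repeats this error-handling boilerplate, and the wrapper can be tested on its own, with no reference to the layers above it.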

Note that developing a hierarchy of virtual machines from the bottom up can be very messy because until the final application is developed, there is little motivation or direction for the lower layers. Nonetheless, this is a workable incremental programming methodology.

For incremental development, lower layers of a system can be completely developed and tested before work on the layer above is started, but that approach tends to lead to building pieces at lower levels that end up not being useful, and it can lead to "garden path design," where lower-level designs are selected not because they lead to useful features at the upper level but because they are easy.

Aside: The English phrase "to be led down the garden path" means to be deceived, or to be tricked, or to be seduced into reaching a goal other than the one intended. Garden path design occurs during bottom-up development when features at lower levels suggest the development of features at higher levels that, in turn, suggest adding features at even higher levels that in the end don't serve to reach the desired goal or even impede making the entire system useful for its intended purpose.

The classic "waterfall" approach to avoiding garden-path design in a hierarchically structured system is to do top-down design first, and then bottom-up implementation and testing. Top down design starts with the top-level specification, from which lower levels are derived. As a result, no feature in the lower levels emerges that is not required to meet the top-level specification. After finishing specifying all the layers, incremental development can work from the bottom up, testing each layer before starting on the next. Of course, all this assumes that the top-level specification was complete and correct to start with, and that is rare in the real world.

The ideas of extreme programming suggest that it might be better to slice a system made of a hierarchy of layers diagonally, building a small part of the final application on top of a small part of each of the lower layers, growing each layer as needed to support that part of the top-level application that is next on the development list. The point here is, it can be constructive to view a system as made of multiple layers of virtual machines even if the layers are incompletely defined to begin with and even if the order of development is not the same as the order of the layers.

Transparency

In any system composed of layers of virtual machines, each layer can be transparent or opaque (or a mosaic of transparent and opaque parts).

Where a layer is transparent, the behavior of the underlying layers is completely exposed. If a programmer working on the lower layer could do something, a programmer working on the upper layer can do the same thing. A transparent virtual machine adds features or functions without preventing use of the lower layers, or it duplicates the features of the lower layers, making them available at the upper layer.

Where a layer is opaque, it prevents certain behaviors that a programmer at the lower level could have evoked. Opaque layers can prevent unsafe activity.

This notion was originally developed by David Parnas, who illustrated it with the following example: Consider a vehicle with four wheels (like a car) where the front wheels can be independently turned to steer the vehicle. This vehicle is both very maneuverable and very dangerous. If you steer the front wheels so that the lines of their axles intersect the line of the rear axle at a single point, you can turn the vehicle around the point of intersection. Because the front wheels can be turned arbitrarily, you can even turn the vehicle around the point midway between the rear wheels -- allowing the car to rotate in place and making it very easy to park.

On the other hand, if you are moving at any speed and you turn the front wheels so that the lines of their axles don't follow the basic rule above -- intersecting the line of the rear axle at two different points -- the front wheels will skid and you will lose control. At high speeds, this mistake can even flip the vehicle. This vehicle endangers the lives of the driver, the passengers, and bystanders.

In automobiles, we add an opaque virtual machine on top of the independently hinged front wheels. That layer consists of a (modified) parallelogram linkage that keeps the front wheels approximately parallel and, for wide-radius turns, comes close to the ideal of keeping the lines of their axles intersecting the line of the rear axle at a single point. This virtual machine is not transparent -- it completely prevents turns below a minimum radius, so you have to learn complex maneuvers for parallel parking -- but it also prevents you from putting the front wheels into strange positions that would be very dangerous in a moving car.

The access control mechanisms of languages like Java are there to allow you to control the transparency of your implementations. If you declare components of your code to be private, for example, you can prevent unsafe manipulation of those components. Note, however, that the access control mechanisms of Java, while quite powerful, are not a complete set.

For example, if you declare a class with two methods and one component variable, you cannot declare that variable to be read-only when seen from one method and read-write when seen from another. People have proposed programming languages where this kind of fine-grained control is possible, but the best you can do in Java is to add comments saying, for example, "here, variable x is never modified" where you might want the language to prevent modification.
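As a small example of using Java's access control to build an opaque layer, consider this sketch, which echoes the steering example above; the class and its limits are invented for illustration:

/** An opaque layer:  the wheel angle can be read by anyone,
 *  but it can only be changed through a method that enforces
 *  a safe range.
 */
class Steering {
    private float angle = 0.0f; // private, so hidden from upper layers

    /** transparent read access */
    public float getAngle() {
        return angle;
    }

    /** opaque write access -- unsafe angles are prevented */
    public void setAngle( float a ) {
        if (a > 30.0f) a = 30.0f;   // clamp to the safe range
        if (a < -30.0f) a = -30.0f;
        angle = a;
    }
}

Code outside class Steering simply cannot put the wheels into one of the dangerous positions, no matter what it tries.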

Incremental Development

Let's apply these ideas to our running examples. We cannot reasonably use the waterfall model for a project whose behavior is so ill-specified. We need to begin with something simple.

Consider these example goals:

For the Road Network

For a road network: Read a description of the road network and build a model, then, working just from that model, print the description back out. The printed description serves as proof that we successfully read and represented the network.

Aside: This initial function is useless. We could achieve the same result far more simply by just printing out the description file without bothering to build the model. The only purpose of the initial output proposed here is to prove that we built the model.

Once we can fully read and print the network, we have no need for the print mechanisms, and we can begin working on simulation mechanisms that actually use the model. Code that is built for testing purposes and later thrown away is called scaffolding.

Aside: Software scaffolding is named by analogy with the scaffolding used in building construction. Scaffolding is not part of the finished building, but it is constructed to aid in the building process, and then removed once the building is finished enough that it is no longer needed. Wood or bamboo scaffolding is frequently just discarded, while metal scaffolding is usually disassembled for reuse in other projects.

The first version of the model that we work with won't even be complete enough to allow a simulation of the road network. We can begin with just roads and generic intersections, with no subclasses at all, or perhaps only one subclass, uncontrolled intersections. Once we have this working, we can add subclasses of intersections, one at a time, debugging the processing and printing of each subclass before we add another. Only when we reach the point where we add vehicles will simulation be possible.
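As a sketch of what this print scaffolding might look like, each class in the model could be given a toString method, so that printing the whole model is just a matter of walking the data structure. The output format here is invented, and it again assumes a hypothetical name field in class Intersection:

    /** scaffolding:  reconstruct the textual description of this road */
    public String toString() {
        return "road " + source.name + " " + destination.name
             + " " + travelTime;
    }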

For the Epidemic Model

For the epidemic model: Read a description of the community and build a model of it, with no simulation, then print out the model. That is, print out the "names" of all the people and places, giving enough of a description of each that the correctness of the model can be checked. Names can be very crude, just unique nonsense textual identifiers. (Java has a built-in mechanism that automatically produces such a name for every object.) The description of each person would identify their family, job, school, and whatever other attributes they have. The description of each place would identify those who live there, who works there, who studies there, etc., and when.
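The built-in mechanism mentioned above is the default toString() method that every class inherits from class Object. Unless it is overridden, it combines the class name with a hexadecimal identity code, giving exactly the kind of crude but distinct nonsense identifier we need. Assuming a class Person that does not override toString():

Person p = new Person();
System.out.println( p ); // prints something like Person@1b6d3586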

Once we can fully read and print the community, we can discard the print-mechanism scaffolding and begin working on simulation mechanisms that actually use the model.

The first version of the code can be built with just minimal generic people and places, and no subclasses, except perhaps households. Once we get this working, we can start adding other kinds of places to the model, one at a time, debugging each subclass before we add another.
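A sketch of this starting point might look like the following, with one superclass for places and households as the only subclass; all of the names here are hypothetical:

/** superclass for all kinds of places in the model */
class Place {
    // attributes shared by all places go here
}

/** households, initially the only kind of place */
class Household extends Place {
    // attributes specific to households go here
}

Each later kind of place -- workplaces, schools, and so on -- would be added as another subclass of Place, one at a time.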