CS:2820 Notes, Lecture 6

In the description of the data structures, we already decided that travel times are given in seconds. For long roads, it would be useful to give times in other time units such as minutes or hours. For now, we'll assume that times are given in seconds, with the full knowledge that this is an inadequate design.

Here is class Road with a preliminary version of a reasonable initializer to read from the above file:

import java.util.LinkedList;
import java.util.Scanner;

/** Roads are one-way streets linking intersections
 *  @see Intersection
 */
class Road {
    float travelTime;         //measured in seconds
    Intersection destination; //where the road goes
    Intersection source;      //where the road comes from
    // name of road is source-destination

    // initializer
    public Road( Scanner sc, LinkedList <Intersection> inters ) {
        // code here must scan & process the road definition

        String sourceName = sc.next();
        String dstName = sc.next();
        // Bug:  Must look up sourceName to find source
        // Bug:  Must look up dstName to find destination

        // Bug:  What if the next token isn't a float?
        travelTime = sc.nextFloat();
        String skip = sc.nextLine();
    }
}

There are, of course, some bugs here! We need to do quite a bit of work, and, of course, we need to write the corresponding code for class Intersection.
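One way to attack the bugs flagged in the comments above is a linear search of the intersection list, plus a check that the next token really is a float before reading it. The following is only a minimal sketch under stated assumptions, not the finished design: the stub Intersection class with a name field, the class name RoadSketch, and the helper findIntersection are all hypothetical, and the sketch merely reports problems on System.err rather than deciding how errors should really be handled.

```java
import java.util.LinkedList;
import java.util.Scanner;

/** Stub standing in for the real Intersection class (assumed to have a name) */
class Intersection {
    final String name; // assumed field, used only for lookup in this sketch
    Intersection( String name ) { this.name = name; }
}

/** Hypothetical variant of Road with the lookup and float bugs addressed */
class RoadSketch {
    float travelTime = Float.NaN; // measured in seconds; NaN if not given
    Intersection destination;     // null if the name was not found
    Intersection source;          // null if the name was not found

    /** Hypothetical helper: linear search by name, returns null if absent */
    static Intersection findIntersection(
        String name, LinkedList<Intersection> inters
    ) {
        for (Intersection i : inters) {
            if (i.name.equals( name )) return i;
        }
        return null;
    }

    RoadSketch( Scanner sc, LinkedList<Intersection> inters ) {
        String sourceName = sc.next();
        String dstName = sc.next();
        source = findIntersection( sourceName, inters );
        destination = findIntersection( dstName, inters );
        if (source == null) {
            System.err.println( "road: no such source " + sourceName );
        }
        if (destination == null) {
            System.err.println( "road: no such destination " + dstName );
        }

        // only consume the next token if it really is a float
        if (sc.hasNextFloat()) {
            travelTime = sc.nextFloat();
        } else {
            System.err.println(
                "road " + sourceName + " " + dstName + ": no travel time"
            );
        }
        if (sc.hasNextLine()) sc.nextLine(); // skip the rest of the line
    }

    public static void main( String[] args ) {
        LinkedList<Intersection> inters = new LinkedList<>();
        inters.add( new Intersection( "A" ) );
        inters.add( new Intersection( "B" ) );
        RoadSketch r = new RoadSketch( new Scanner( "A B 30\n" ), inters );
        // prints A-B 30.0
        System.out.println(
            r.source.name + "-" + r.destination.name + " " + r.travelTime
        );
    }
}
```

A linear search is slow for large networks, but it is enough to get a first version running; a faster lookup structure could replace it later without changing the callers.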

Extreme Programming

Long before you have a useful program, you should be running it and testing it. Extreme programming, or XP as some advocates abbreviate it, recommends that any large programming project be developed incrementally. The classic XP development model organizes the development process in one-week steps. During each week:

Monday: Develop specifications for the end product for the week.
Tuesday: Develop tests that the product must meet by the week's end.
Wednesday: Develop code to meet the specifications.
Thursday: Run tests and debug the code. The code should pass all tests including those for previous steps in the development cycle, excepting those tests that have been made obsolete by changes in the specifications.
Friday: Assess progress for the week.

At the end of the week, either you have reached the week's goals, with a completely tested body of code, or you have failed. If you fail, the XP methodology demands that you completely discard the week's work and start over. The reason for this extreme recommendation is that debugging is hard. If you did not get it right within a week, starting over may well get you to working code faster than grinding away at the bugs that just won't go away.

Consider a piece of code that took only a few hours to write. If you have a bug in that code, is it better to spend days trying to find and fix that bug, or is it better to discard the code and spend a few hours writing a replacement? If you try to write code to the same specification as before, you might make the same mistake, so the methodology here suggests you go back and toss not only the code, but also the specification, and re-specify, perhaps making the total work for this step smaller, so that you only reach the final goal after a larger number of smaller steps.

The important thing is that each weeklong step in the development cycle starts with working code from the previous successful step and (if successful) produces working code on which the next step can be built. The only thing you carry over from a failed development step is what you learned from that failure.

Note that this is a vast oversimplification of the XP methodology. The full methodology speaks about team structure, coding standards, and many other details. In my opinion, rigid adherence to the full methodology is likely to be foolish, particularly since the full XP methodology seems particularly oriented toward development that is driven by the GUI structure, or more generally, by input-output specifications. For software products that aren't GUI based, the methodology must be modified, and there is no reason to believe that one week is the right step size.

Regardless of that criticism, incremental development is extremely powerful and the basic framework above is very useful. The basic take-away from the XP model is: Take little steps. Don't try to get a whole large program working in one bang, but start with working code (for example, Hello World) and then augment it, one step at a time, until it does everything you want.

The Waterfall Model

At the opposite end of the spectrum from extreme programming, you will find what has come to be described as the waterfall model of software development:

Requirements: First, work out the requirements for the entire project. Do not do any design work until the requirements are worked out in detail.
Design: Then, design the solution at a high level, working out all the internal components and their relationship. Do not do any implementation until a detailed design is completed.
Implementation: Write the code to implement the design. Complete writing the code before you do any testing.
Verification: Verify that the code meets the requirements. This involves extensive testing.
Maintenance: Invariably, the requirements will change with time, and customers will find bugs.

Many software purchase contracts have been written assuming the waterfall model, where the requirements are written into the contract, and where the contractor is required to submit a complete design before implementation begins. If you look at the way bridges are designed and built, the waterfall model comes very close to describing a working system. First, you work out what the bridge should do, then you hire an engineering firm to complete the design, then you let bids for a contractor to build the bridge. Frequently, another engineering firm is hired to inspect the finished bridge, then you open it for traffic and take responsibility for maintenance.

In the world of software, the waterfall model has led to huge cost overruns and project failures, yet it seems to be a natural application of a methodology with a long proven track record. What's wrong?

The problem with the waterfall model is that it assumes that you can check your specification and have some confidence in it before you start coding. Unfortunately, most software projects are too complex for that. Until you have working code, you can't be sure that the specifications are right. Similarly, it assumes that you can look at the code and verify that it meets the specifications without trying it, and it assumes that your testing will determine that the code meets the specifications. In the real world of software, the end user will invariably try things that weren't covered in the tests, finding bugs, and many of those bugs will be due to flaws in the initial design.

A Hierarchy of Virtual Machines

One approach to designing a large system is to view the system as a set of layers, where each layer is, in effect, a programming language or a programmable machine. Consider this view:

Hardware: The bottom layer is a real machine of some kind.
Operating System: User applications run on a virtual machine built by layering an operating system on top of the hardware.
Programming Environment: Most applications don't use the bare operating system; they add to the system resources a set of resources created, for example, by the standard library of the programming language they are using. So, for example, we are using the Java library on top of the Linux operating system.
Special-purpose components: Large applications typically include their own collections of purpose-built classes that sit on top of the system and language, creating an environment in which the final development steps are simplified.
Higher-level components: There may be several layers of custom components sitting under a very large application.
Top level code: If the lower layers are well thought out, the top level code to implement the application can end up being very simple.

It is important to note that, in a layered system, test frameworks can be built for each layer, where the test framework does not depend in any way on the behavior of the higher layers (the layers closer to the end user and application). Furthermore, lower layers in the system may be repurposed to serve other applications.

Note that developing a hierarchy of virtual machines from the bottom up can be very messy because until the final application is developed, there is little motivation or direction for the lower layers. Nonetheless, this is a workable incremental programming methodology.

The ideas of extreme programming suggest that it might be better to slice the system diagonally, building a small part of the final application on top of a small part of each of the lower layers, growing each layer as needed to support the part of the top-level application that is next on the development list. The point here is, it can be constructive to view a system as made of multiple layers of virtual machines even if the layers are incompletely defined to begin with and even if the order of development is not the same as the order of the layers.

Transparency

In any system composed of layers of virtual machines, each layer can be transparent or opaque (or a mosaic of transparent and opaque parts).

Where a layer is transparent, the behavior of the underlying layers is completely exposed. If a programmer working on the lower layer could do something, a programmer working on the upper layer can do the same thing. A transparent virtual machine adds features or functions without preventing use of the lower layers.

Where a layer is opaque, it prevents certain behaviors that a programmer at the lower level could have evoked. Opaque layers can prevent unsafe activity.

This notion was originally developed by David Parnas, who illustrated it with the following example: Consider a vehicle with 4 wheels (like a car) where the front wheels can be independently turned to steer the vehicle. This vehicle is both very maneuverable and very dangerous. If you steer the front wheels so that the lines of their axles intersect the line of the rear axle at a single point, you can turn the vehicle around the point of intersection. Because the front wheels can be turned arbitrarily, you can even turn the vehicle around the point midway between the rear wheels -- allowing the car to rotate in place and making it very easy to park your car.

On the other hand, if you are moving at any speed and you turn the front wheels so that the lines of their axles don't follow the basic rule above -- intersecting the line of the rear axles at two different points -- the front wheels will skid and you will lose control. At high speeds, this mistake can cause the vehicle to flip over, killing the driver.

In automobiles, we add an opaque virtual machine on top of the independently hinged front wheels. That layer consists of a (modified) parallelogram linkage that keeps the front wheels approximately parallel and, for wide radius turns, comes close to the ideal of keeping the lines of their axles intersecting the line of the rear axle at a single point. This virtual machine is not transparent -- it completely prevents turns below a minimum radius, so you have to learn complex maneuvers for parallel parking, but it also prevents you from putting the front wheels into strange positions that would be very dangerous in a moving car.

The access control mechanisms of languages like Java are there to allow you to control the transparency of your implementations. If you declare components of your code to be private, for example, you can prevent unsafe manipulation of those components. Note, however, that the access control mechanisms of Java, while quite powerful, are not a complete set.

For example, if you declare a class with two methods and one component variable, you cannot declare that variable to be read-only when seen from one method and read-write when seen from another. People have proposed programming languages where this kind of fine grained control is possible, but the best you can do in Java is to add comments saying, for example, "here, variable x is never modified" where you might want to have the language prevent modification.
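A small sketch may make this concrete. The class Counter, its field count, and its two methods are hypothetical names chosen only for illustration. Declaring count private makes the class opaque to outside code, but within the class, the promise that get never modifies count rests on nothing stronger than a comment.

```java
/** Sketch: a counter whose state is opaque to code outside the class */
class Counter {
    private int count = 0; // private: outside code cannot touch this directly

    /** This method needs read-write access to count */
    void increment() {
        count = count + 1;
    }

    /** This method needs only read access, but Java has no way to
     *  declare count read-only here and read-write in increment
     */
    int get() {
        // here, count is never modified (a promise, not enforced by Java)
        return count;
    }
}

class CounterDemo {
    public static void main( String[] args ) {
        Counter c = new Counter();
        c.increment();
        c.increment();
        System.out.println( c.get() ); // prints 2
        // c.count = -1; // would not compile: count is private in Counter
    }
}
```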

Incremental Development

Let's apply these goals to our running projects! We cannot reasonably use the waterfall model for a project that has such an ill-specified behavior. We need to begin with something simple.

For the Road Network

For a road network: Read a description of the road network and build a model, then, working just from that model, print out the description. The printed description serves to prove that we successfully read and represented the network.

Once we fully describe the network, we have no need for the print mechanisms and we can begin working on simulation mechanisms that actually use the model.

The first version of the model that we work with won't even be a complete simulation of the road network. We can begin with just roads and generic intersections, with no subclasses at all, or perhaps only one subclass, uncontrolled intersections. Once we have this working, we can add subclasses of intersections, one at a time, debugging the processing and printing of each subclass before we add another.
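The first step, reading a description, building a model, and printing the description back from the model alone, might look something like the sketch below. Everything specific in it is an assumption made for illustration: the input format (lines of the form "intersection NAME" and "road SOURCE DESTINATION SECONDS"), the class names Inter, Rd and RoadNetworkSketch, and the methods find and describe. It uses only generic intersections, with no subclasses.

```java
import java.util.LinkedList;
import java.util.Scanner;

/** Generic intersection, no subclasses yet (hypothetical stand-in) */
class Inter {
    final String name;
    Inter( String name ) { this.name = name; }
    public String toString() { return "intersection " + name; }
}

/** One-way road between two intersections (hypothetical stand-in) */
class Rd {
    final Inter source;
    final Inter destination;
    final float travelTime; // measured in seconds
    Rd( Inter source, Inter destination, float travelTime ) {
        this.source = source;
        this.destination = destination;
        this.travelTime = travelTime;
    }
    public String toString() {
        return "road " + source.name + " " + destination.name
            + " " + travelTime;
    }
}

class RoadNetworkSketch {
    final LinkedList<Inter> inters = new LinkedList<>();
    final LinkedList<Rd> roads = new LinkedList<>();

    /** Linear search for an intersection by name, null if absent */
    Inter find( String name ) {
        for (Inter i : inters) if (i.name.equals( name )) return i;
        return null;
    }

    /** Build the model from the assumed line-oriented input format */
    RoadNetworkSketch( Scanner sc ) {
        while (sc.hasNextLine()) {
            String[] f = sc.nextLine().trim().split( "\\s+" );
            if (f.length == 2 && f[0].equals( "intersection" )) {
                inters.add( new Inter( f[1] ) );
            } else if (f.length == 4 && f[0].equals( "road" )) {
                Inter src = find( f[1] );
                Inter dst = find( f[2] );
                if (src == null || dst == null) {
                    System.err.println( "road: unknown intersection" );
                    continue;
                }
                try {
                    roads.add( new Rd( src, dst, Float.parseFloat( f[3] ) ) );
                } catch (NumberFormatException e) {
                    System.err.println( "road: bad travel time " + f[3] );
                }
            } else if (!f[0].isEmpty()) {
                System.err.println( "unrecognized line: " + f[0] );
            }
        }
    }

    /** Print mechanism: describe the network working only from the model */
    String describe() {
        StringBuilder sb = new StringBuilder();
        for (Inter i : inters) sb.append( i ).append( "\n" );
        for (Rd r : roads) sb.append( r ).append( "\n" );
        return sb.toString();
    }

    public static void main( String[] args ) {
        Scanner sc = new Scanner(
            "intersection A\nintersection B\nroad A B 30\n"
        );
        System.out.print( new RoadNetworkSketch( sc ).describe() );
    }
}
```

If the printed description matches the input, up to details such as 30 being printed as 30.0, we have some evidence that the network was read and represented correctly.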

For the Epidemic Model

For the epidemic model: Read a description of the community and build a model of it, with no simulation, then print out the model. That is, print out the "names" of all the people and places, giving a description of each. Names can be very crude, just unique nonsense textual identifiers (Java actually creates such nonsense names for each object automatically). The description of each person would identify their family, job, school and so on. The description of each place would identify those who live there, work there or study there, business and school hours, and so on.
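The automatically created names mentioned above come from the default toString method that every Java class inherits from Object, which gives the class name, an at sign, and the object's hash code in hexadecimal. The short sketch below shows this; the empty class Person is a hypothetical placeholder. Note that hash codes are not guaranteed to be distinct, so these names are usually, but not provably, unique.

```java
/** Placeholder class with no toString of its own */
class Person {
}

class NameDemo {
    public static void main( String[] args ) {
        Person p = new Person();
        Person q = new Person();

        // each line has the form Person@ followed by hexadecimal digits;
        // the digits vary from run to run
        System.out.println( p );
        System.out.println( q );

        // the default name is built from the class name and the hash code
        String expected =
            p.getClass().getName() + "@" + Integer.toHexString( p.hashCode() );
        System.out.println( p.toString().equals( expected ) ); // prints true
    }
}
```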

Once we fully describe the community, we have no need for the print mechanisms and we can begin working on simulation mechanisms that actually use the model.

The first version of the code can be built with just minimal generic people and places, and no subclasses, except perhaps households. Once we get this working, we can start adding different kinds of places to the model, one at a time, debugging each subclass before we add another.

Scaffolding

Both of the above proposals include the idea of writing software for debugging that will be discarded at the end of the process. Such software is called scaffolding, by analogy with the scaffolding used in building construction. Scaffolding is not part of the finished building, but it is constructed to aid in the building process, and when it is no longer needed, it is either discarded (typical of wood or bamboo scaffolding) or disassembled for reuse in other projects (typical of modular metal scaffolding).

6. Errors, Organization, and Scaffolding
