CS:2820 Notes, Lecture 22

Discrete-event simulation was first developed in the 1960s, before anyone understood object-oriented programming. Much of the early work was done in the FORTRAN programming language. FORTRAN, at that time, had very limited control structures and a very limited data model. Variables could be integers, floating-point numbers, or double-precision floating point numbers. There was no character data type, but characters could be processed as integers. There was nothing resembling classes or what C calls structures, but there were arrays. This was very limiting, but more than sufficient to write any program, at least in theory.

If we were building our road network simulation in FORTRAN back then, we might begin by designing the event set. Each event might be described by a logical record containing:

Of course, since FORTRAN did not have record structures of any sort, we'd have to make do with arrays. For each integer value of I, the array entries ETIME(I), EVEHICLE(I), EROAD(I), EINTERS(I), and ENEXT(I), would describe one event. The time field is floating point, while the other fields are all integers. Logically, there is an array of vehicles indexed by vehicle number, an array of roads, indexed by road number, and an array of intersections indexed by intersection number. Events themselved are arranged in a logical array of events indexed by event number, but note, event numbers are arbitrary. The next field organizes events into a linked list, sorted in chronological order, with the head of the list being the head of the event set, the next event to simulate.

FORTRAN, first introduced in 1956, was historically written in all upper case because the older codes used for punched cards didn't support lower case. Fortran identifiers were limited to 8 significant characters (characters after the eighth were ignored) so the prefix E is used above to distinguish all of the variables that represent parts of the event set.

If you were developing a discrete-event simulator in FORTRAN, one of your first goals would be to hide the clutter of arrays described above from the rest of the program. You might do it as follows:

The problem with this approach is that the event set implementation is specific to vehicles, roads and intersections. Any change to the subject matter of the simulation requires a change to the user interface for the pending event set. This is just one of a huge number of deficiencies in first generation programming languages that drove the development of more recent languages.

Unfortunately, a large amount of the early work on discrete-event simulation was done using FORTRAN, so much of the published work on simulation methodology focuses on interfaces such as that given above.

A Better Framework

Languages like Pascal and C, both dating to the late 1960s, introduced the concept of records. In both of these languages, instead of writing ETIME(I) as in FORTRAN, you could write e[i].time where the event set e is a single array of events, indexed by event number, and each event is a record or structure with multiple fields, including a time field.

Both Pascal and C supported pointers, and in both languages, it would be unnatural to represent the event set as an array of events. Instead, the next field of each event would be a pointer to the next event in chronological order, and if i is a pointer to an event, i->time or (*i).time (both C notation) or i^.time (Pascal) would be the time field of the event pointed to by i. Pascal or C programmers might be tempted to use the folloing pending-event set interface:

This interface is both better and worse than the FORTRAN interface suggested above. It is better because, so long as all event notices have appropriate next and time fields, the pending-event-set management code can easily be preserved and repurposed from one simulatioin model to another.

This interface is not an improvement in one crucial way: The schedule operation forces the programmer to first create a new event and load up its fields. What was one line of code using the FORTRAN-inspired interface will now be many lines of code using this interface. It might look something like this in Pascal and C:

{ Pascal }              /* C */
new(e);                   e = (event)malloc( sizeof( event ) );
e^.time = t;              e->time = t;
e^.vehicle = v;           e->vehicle = v;
e^.road = r;              e->road = r;
e^.intersection = i;      e->intersection = i;
schedule( e );            schedule( e );

It's worth noting that the above code would most likely never be written after the event is created. Instead, the event notice created for some vechicle would most likely follow that vehicle around the model. As the vehicle leaves a road, the intersection field would be changed. As the vehicle enters an intersection, only the time field would change. On leaving an intersection, the road field would change, and on entry to a road, the time field would change.

Object-oriented programming models allow us to improve on this. It is worth noting that the first object-oriented programming language, Simula 67, was a contemporary of C and Pascal, but it was not used widely outside of Norway. Only when Bjarne Stroustrup, an experienced Simula 67 progrmmer, took a job at Bell Labs and developed C++ did object-oriented programming begin to make significant inroads in the rest of the world. The first C++ compilers outside Bell Labs began to appear around 1985. As a result, Pascalish and Cish thinking dominates a considerable amount of the simulation literature.

A Stupid Object-Oriented Framework

The underlying data structure of the pending-event set can be viewed as an object, so naturally, we can take the framework suggested above and trivially make it object oriented by explicitly recognizing this:

This interface offers no real advantage over the previous framework. It allows for there to be multiple pending-event sets, but the discrete-event simulation algorithm only requires one, so this is no advantage.

The move to object orientation has a huge benefit in one place: The nature of events. Events are naturally polymorphic. In our highway network simulation, for example, we have events representing things like:

The values that must be specified for each of these are quite different, and the actions to take when each of these events occur are different. Use of polymorphism allows us to deal with this by having class Event be the superclass of class ArrivalEvent, EntryEvent, and LightChangeEvent. What matters is that all subclasses of Event implement a trigger method that causes that event to happen.

Program transformations

No programmer needs objects. Programming languages prior to Simula 67 did not have object-oriented features. These languages include Algol, BASIC, BCPL, C, COBOL, FORTRAN, and Pascal, although some of these have had object-oriented features grafted on as an afterthought.

The most famous of these afterthoughts is C++, which is basically C with object oriented features grafted on. Bjarne Stroustrup, a danish computer scientist, learned to program in Simula 67 before coming to the United States to work in the Unix group at Bell Labs in the late 1970s. The Unix group had developed the C programming language, and as a matter of policy, all of their work was in C. Stroustrup wanted the object-oriented features of Simula 67, so he wrote a preprocessor that added object-oriented features to C; the input language to this preprocessor became C++.

The fact that C++ can be translated to C is a strong hint. Nothing you can do in an object-oriented programming language cannot be done in a language that lackes classes, objects and methods. This should not be too much of a surprise, since our computer hardware has none of these.

Eliminating non-static methods

In general, all non-static methods can be eliminated from an object-oriented program. Consider this little (nonsense) code fragment:

class C {
    private int f; // some field of the object
    public void m( int g ) { // method m
        f = g;
    }
}

// somewhere else in the program
    C o;    // o is an object of class C
    o.m(5); // apply the method C.m to o

class C {
    private int f; // some field of the object
    public static void m( C o, int g ) { // method m
        o.f = g;
    }
}

// somewhere else in the program
    C o;      // o is an object of class C
    C.m(o, 5) // apply the method C.m to o

This transformation always works. In fact, this is the central thing that the original preprocessor for C++ did when it converted object-oriented C++ code to non-object oriented C code.

Note that if all methods are static, you can no-longer claim to be writing object-oriented code. Your classes, such as they are, are merely data organizers, with related subroutines (that do not return values) or functions (that do return values) bundled together with them.

If you were writing assembly language or C code, you could remove all the static methods form their classes, replacing names like C.m() above with global subroutines that have names like C_m(). This is exactly what the original C++ preprocessor did as it translated C++ code to C.

No method needs more than one parameter

There are a number of other transformations that we can apply. For example, we never really need multiple parameters for any method. Consider this little (nonsense) code fragment:

class C {
    private int f; // some field of the object
    public void m( int g, int h ) { // method m
        f = g + h;
    }
}

// somewhere else in the program
    C o;       // o is an object of class C
    o.m(5, 6); // apply the method C.m to o

We can always rewrite this code, replacing parameters with components of the object:

class C {
    private int f; // some field of the object
    public int g;  // a parameter to method m
    public void m( int h ) { // method m
        f = g + h;
    }
}

// somewhere else in the program
    C o;       // o is an object of class C
    o.g = 5;
    o.m(6);    // apply the method C.m to o

We have made no changes to how the program does its job, we have just changed how parameters are passed to the method m. We can even make the field g private if we add a new method to set g:

class C {
    private int f; // some field of the object
    private int g; // a parameter to method m
    public void passG( int newg ) { // parameter passing to method m
        g = newg;
    }
    public void m( int h ) { // method m
        f = g + h;
    }
}

// somewhere else in the program
    C o;       // o is an object of class C
    o.passG(5);
    o.m(6);    // apply the method C.m to o

Note. For static methods, the same transformation applies, but the new fields you add for passing the extra parameters need to be static fields.

Eliminating multiple parameters may seem stupid, but it has a real role. In Intro to Psych, when I took it years ago, one of the things I learned about is the magic number 7 plus or minus 2. Numerous experiments have shown that people's short term memories have a capacity of about 7 items. Psychologists call them chunks because the size of an item is ill-defined. If you read people a sequence of random digits such as 3710583, and then hand them a pencil and ask them to write the sequence down, most people will be able to do that for sequences of about 7 digits. On the other hand, if you read an Iowa City resident a number such as 3193350740, 10 digits long, many will be able to remember it, particularly if you pause in the right places, reading it as 319 335 0740. That is because 319 is just one chunk, the area code of Iowa City, and 335 is another chunk, the standard prefix on all University of Iowa academic phone numbers.

    a.b(c,d,e,f,g);

When trying to deal with this call, there are about 7 chunks in it. For methods with fewer parameters, you can most likely juggle them in your head, but for methods with more parameters, you most likely need to write things down, keeping a manual page in one window while you struggle to get things right in another. In this context, it starts making good sense to transform the code to something like this:

    a.setup(c,d);
    a.b(e,f,g);

This makes the most sense when the setup code sets options and values that will remain the same through multiple calls, while the second call has parameters that tend to be different each time they are used. For example, a typical text display subsystem on a graphics computer screen has a logical operation something like the following:

    window.putchar( x, y, font, color, angle, char );

Calling this operation for each character would be extremely cumbersome, and in any case, we usually put out a number of characters all with the same attributes, so we rework the interface to look like this:

    window.settextattributes( font, color, angle );
    window.putchar( x, y, char );

And then, we make an additional change, having the window also retain the current screen coordinates and update them by the width of each character as it is displayed, so we end up with this:

    window.settextattributes( font, color, angle );
    window.setat( x, y );
    window.putchar( char );

Now, we can set the attributes for a whole block of text, and then set the starting location of a line, output all of the characters in that line, and then set the starting location of the next line. We made no change to the underlying algorithm for displaying a character, all we did is change the architecture of the interface to that algorithm.

Constructors

We don't need constructors, or rather, we don't need constructors other than the simplest of default ones. Consider this class:

class C {
    int i = 5;
    float f = 7.5f;
    SomeClass sc;
    C() {
        sc = new Someclass( null );
    }
    C( Someclass sc ) {
        this.sc = sc;
    }
    ... other methods ...
}

First note that the above code mixes in-line initializations of the fields i and f with constructors that handle the field sc. We can move the in-line initialization code into each of the many constructors a class may define. For example:

class C {
    int i;
    float f;
    SomeClass sc;
    C() {
        i = 5;
        f = 7.5f;
        sc = new Someclass( null );
    }
    C( Someclass sc ) {
        i = 5;
        f = 7.5f;
        this.sc = sc;
    }
    ... other methods ...
}

Second, we can replace all constructors with factory methods, that is, methods that manufacture and return instances of the class:

class C {
    int i;
    float f;
    SomeClass sc;
    C newC() {
        C nC = new C();
        nC.i = 5;
        nC.f = 7.5f;
        nC.sc = new Someclass( null );
        return nC;
    }
    C newC( Someclass sc ) {
        C nC = new C();
        nC.i = 5;
        nC.f = 7.5f;
        nC.sc = sc;
        return nC;
    }
    ... other methods ...
}

Now, elsewhere in the code, anywhere we used to write new C() or new C(x), we now write C.newC() or C.newC(x).

Eliminating private fields

The keywords private and protected are never needed in an object oriented program, So long as there are no naming conflicts -- that is, other variables with the same name as a private field -- we can make all fields public. If there are naming conflicts, we need to change the names of the private or protected fields to be unique, for example, by adding a prefix, but this does not change the meaning of the program in any way.

All access rights restrictions such as private and protected serve just one purpose, to prevent careless programmers from making mistakes. These declarations have an impact on the development process, and they have an impact on the documentation of the program, but they have no impact on program execution. In effect, if a variable is declared as private, you could just as well declare it as public and add a comment documenting the access restriction you wish the compiler would enforce.

Eliminating final variables

The keyword final can always be deleted from a program. If you declare a variable to be final, and the program compiles correctly, then deleting the keyword will leave the variable being merely effectively final.

What the final keyword does is prevent you from making new assignments to the variable. As with private and protected, therefore, it is there to help you document a program, and it allows the compiler to enforce restrictions you have documented.

It may well be valuable, in programs that don't use the final keyword, to add comments that document the fact that an assignment is indeed the final assignment to a variable.

Software architecture

The architecture of a piece of software describes how a user interacts with it, just as the architecture of a building describes how a user interacts with the building.

From an architectural standpoint, you do not care very much about what algorithms are used, except to the extent that they have an impact on things users care about like price and performance.

When speaking of a package or a class, where the users are the programmers making use of that package or class in their programs, the architecture is primarily exposed in the public interface to that package or class. This architecture can be changed in many ways while making no changes to the underlying algorithms.

In this semester, up to this point, we have explored several different architectures for the class Simulator that change the way programmers write discrete event simulation programs. We used lambda expressions and explicit subclasses to achieve the same behavior.

In designing small programs, architecture hardly matters. In big programs, poorly selected architecture can have a significant impact.

22. Alternative Simulation Frameworks

An Archaic Framework