3. Models and Objects

Part of CS:2820 Object Oriented Software Development Notes, Fall 2020
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

Models:

A model is an abstraction of some aspect of reality. So, for example, we can describe a flood control reservoir such as the Coralville reservoir by a set of variables and functions:

  • inflow(t), the inflow at time t in volume per unit time.
  • outflow(t), the outflow at time t in volume per unit time.
  • volume(t), the reservoir volume at time t.
  • level(t), the reservoir level at time t, an altitude.
  • gate-setting(t), the altitude of the edge of the sluice gate at time t.
  • f1(v) gives the level as a function of volume v.
  • f2(h) gives the outflow as a function of h, the height of the reservoir over the sluice gate.

The model connects these variables and functions to describe the behavior of the reservoir:

  • volume(t) = volume(t – 1) + inflow(t) – outflow(t)
  • level(t) = f1( volume(t) )
  • outflow(t) = f2( level(t) – gate-setting(t) )

This model is an oversimplification. It ignores groundwater, it ignores bank erosion, it ignores evaporation. All real models are oversimplifications.
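
    The three equations above can be stepped forward in time with very little code. What follows is only a sketch, not how the Corps of Engineers does it: the class name Reservoir and its methods are hypothetical, and the functions f1 and f2 must be supplied by the caller. Two assumptions are worth stating loudly. First, outflow(t) depends on level(t), which depends on volume(t), which depends on outflow(t); the sketch breaks this circle by using the outflow computed in the previous time step. Second, it assumes no water flows when the level is below the gate.

```java
import java.util.function.DoubleUnaryOperator;

/** A sketch of the reservoir model; f1 and f2 are supplied by the caller. */
class Reservoir {
    double volume;   // volume(t)
    double outflow;  // outflow computed in the most recent step
    final DoubleUnaryOperator f1; // level as a function of volume
    final DoubleUnaryOperator f2; // outflow as a function of height over gate

    Reservoir(double initialVolume, DoubleUnaryOperator f1,
              DoubleUnaryOperator f2) {
        this.volume = initialVolume;
        this.outflow = 0.0;
        this.f1 = f1;
        this.f2 = f2;
    }

    /** level(t) = f1( volume(t) ) */
    double level() {
        return f1.applyAsDouble(volume);
    }

    /** Advance one time step given inflow(t) and gate-setting(t).
     *  Assumption: the previous step's outflow stands in for outflow(t).
     */
    void step(double inflow, double gateSetting) {
        volume = volume + inflow - outflow;
        // Assumption: no flow when the level is below the gate edge
        double height = Math.max(0.0, level() - gateSetting);
        outflow = f2.applyAsDouble(height);
    }
}
```

    For example, new Reservoir(100.0, v -> v / 10.0, h -> 2.0 * h) uses two made-up linear functions; a real reservoir would need surveyed data for f1 and hydraulic formulas for f2.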

    The U.S. Army Corps of Engineers in Rock Island spends quite a bit of effort on such models, but they are not what we will focus on here. The above model is a model of a continuous system, the kind of system that is described by differential equations. Take a numerical analysis course if you want to learn more about that kind of modeling. Here, we will focus on discrete event models.

    For a second example, consider orbital dynamics. Consider a model of a satellite in near-Earth orbit. For short-term dynamics, we can ignore the effects of the Moon, the Sun and other planets, and we can ignore the rotation of the Earth, treating both the Earth and the satellite as point masses. The variables that interest us are:

  • t, the time
  • r(t), the center to center distance from earth to satellite, as a function of t
  • v(t), the velocity of the satellite as a function of time.
  • a(t), the acceleration of the satellite as a function of time.
  • f(t), the force on the satellite as a function of time.

    Some constants matter:

  • G, the gravitational constant
  • me, the mass of the earth
  • ms, the mass of the satellite
  • Δt, the time step
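
    The variables r, v, a and f are 3-dimensional vectors, with x, y and z components; t and the constants are scalars. The force is directed from the satellite toward the Earth, and the first rule below gives its magnitude. The basic rules are the rules of physics: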

  • f(t) = G me ms / |r(t)|²
  • a(t) = f(t) / ms
  • v(t + Δt) = v(t) + a(t)Δt
  • r(t + Δt) = r(t) + v(t)Δt

    Given an initial position and velocity, this model predicts the future behavior of the satellite. Smaller values of Δt and higher precision arithmetic lead to more precise predictions. Numerical analysis can be used to derive far more accurate models from this framework.
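
    The four rules above translate almost line for line into code. The following is a minimal sketch, not a serious orbital simulator: the class name Satellite is hypothetical, the numeric values of G and the mass of the Earth are the usual textbook values in SI units, and the direction of the force (toward the Earth at the origin) is written out explicitly. Because a = f / ms, the mass of the satellite cancels and never appears.

```java
/** A sketch of the satellite model using the update rules given above. */
class Satellite {
    static final double G = 6.674e-11;  // gravitational constant, SI units
    static final double ME = 5.972e24;  // mass of the Earth, kilograms

    double[] r;  // position relative to the Earth's center, meters
    double[] v;  // velocity, meters per second

    Satellite(double[] r, double[] v) {
        this.r = r.clone();
        this.v = v.clone();
    }

    /** Advance the state by one time step of dt seconds. */
    void step(double dt) {
        double dist = Math.sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
        // |a| = |f| / ms = G * ME / dist^2; the satellite's mass cancels
        double aMag = G * ME / (dist * dist);
        double[] newR = new double[3];
        double[] newV = new double[3];
        for (int i = 0; i < 3; i++) {
            double a = -aMag * r[i] / dist;  // directed toward the Earth
            newV[i] = v[i] + a * dt;         // v(t + dt) = v(t) + a(t) dt
            newR[i] = r[i] + v[i] * dt;      // r(t + dt) = r(t) + v(t) dt
        }
        r = newR;
        v = newV;
    }
}
```

    Calling step repeatedly with a small dt traces out the orbit. Note that the position update uses the old velocity v(t), exactly as in the rules above; this is the simplest possible scheme, and its errors accumulate over many steps.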

    Discrete Event Models:

    Our modern ideas of object-oriented programming originated in a Norwegian research group that was heavily involved in discrete event models and discrete event simulation. Here are some examples of systems that illustrate this kind of model:

    Example 1: A Highway Network

    Consider modelling a highway network. A model of a highway network will typically have:

    roads
    Roads connect intersections, and they have attributes such as length and travel time. In a simple model that ignores congestion, we can ignore the possibility that the road also serves as a parking lot during rush hour.

    intersections
    Intersections tie together roads. An intersection may only be occupied by one vehicle at a time, and it may have one of several different control algorithms: uncontrolled, traffic circle, stop signs, or traffic light. The control algorithm determines how vehicles pass through the intersection from road to road.

    vehicles
    Vehicles may either be running on a road or waiting at an intersection. Each vehicle has a plan that names a series of roads that it intends to follow to get from its current position to its destination. Vehicles may also carry cargo and passengers.

    In an object-oriented model, we can consider each of the above to be the name of a class, and each attribute of the objects in a class can be considered to be a field of that class. Some of these classes are more complex than others. Intersections have subclasses depending on the control algorithm. Vehicles may have subclasses that depend on the type (taxis and dump trucks have very different behavior).
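
    To make the idea of classes, fields and subclasses concrete, here is a small sketch in Java. Every name in it is hypothetical (the Sketch suffix is there to avoid confusion with the classes developed later in these notes), and the control method exists only to show that subclasses can differ in behavior.

```java
/** A road's attributes become fields of the class. */
class RoadSketch {
    double length;      // assumed unit: meters
    double travelTime;  // assumed unit: seconds
}

/** Intersections have subclasses depending on the control algorithm. */
abstract class IntersectionSketch {
    /** Name the control algorithm, for illustration only. */
    abstract String control();
}

/** An intersection with no control at all. */
class UncontrolledSketch extends IntersectionSketch {
    String control() {
        return "uncontrolled";
    }
}

/** An intersection controlled by a traffic light. */
class StopLightSketch extends IntersectionSketch {
    String control() {
        return "traffic light";
    }
}
```

    Code that holds an IntersectionSketch can call control() without knowing which subclass it has. That is the property that makes subclasses a natural fit for the different kinds of intersection.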

    The events that interest us constitute another class. The following events matter in this highway network:

    Vehicle enters road
    When a vehicle enters a road, it will arrive at the intersection at the other end at a time determined by the road and (in more complex models) by the number of other vehicles on that road.

    Vehicle arrives at intersection
    When a vehicle arrives at an intersection, it will enter one of the roads leaving that intersection at a later time determined by the rules for that intersection.

    The primary attribute of every event is the time at which it occurs. In a discrete event model, each event occurs at a specific instant of time. Any process that takes time is considered to occur as a sequence of events spanning that time, where each event in that sequence marks a point where the state of the model must be checked to determine when the next event in that sequence occurs.
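
    Since time is the primary attribute of an event, one plausible representation is a class with a time field, kept in a collection that hands events back in chronological order. The sketch below uses java.util.PriorityQueue for this. The names SimEvent and EventSchedule and the description field are assumptions made for illustration; this is not the simulation framework developed later in the course.

```java
import java.util.PriorityQueue;

/** A sketch of an event; the only required attribute is its time. */
class SimEvent implements Comparable<SimEvent> {
    final double time;         // when the event occurs, in seconds
    final String description;  // hypothetical, for illustration only

    SimEvent(double time, String description) {
        this.time = time;
        this.description = description;
    }

    /** Order events chronologically. */
    public int compareTo(SimEvent other) {
        return Double.compare(time, other.time);
    }
}

/** A sketch of a pending-event set that yields events in time order. */
class EventSchedule {
    private final PriorityQueue<SimEvent> pending = new PriorityQueue<>();

    /** Add an event to the set of pending events. */
    void schedule(SimEvent e) {
        pending.add(e);
    }

    /** Remove and return the earliest pending event, or null if none. */
    SimEvent next() {
        return pending.poll();
    }
}
```

    Events may be scheduled in any order; next() always returns the one with the smallest time. In the highway example, processing a "vehicle enters road" event would typically schedule a "vehicle arrives at intersection" event at a later time.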

    Of course, if your model is being used in a context where there is a graphic display of the progress of the model, each vehicle, road and intersection will need to have display coordinates associated with it.

    Example 2: Digital Logic

    As a second example, consider digital logic. Here, the major classes are:

    Logic gates
    A logic gate is a component that computes some logical operation such as and, nand, or, nor, or not. These should be familiar Boolean operators, although you may not be used to thinking of them as physical objects. Aside from the logical function computed by each gate, the attributes of a gate are the wires to which it connects, the current value of the output of that gate, and the time delay of that gate, from a change in one of its inputs to a change in its output.

    Wires
    Wires may be attached to the output of any logic gate to carry the value output by that gate to the inputs of zero or more other gates. The attributes of a wire are the logical value it is currently presenting to its output(s) and the time delay the wire introduces from its input to its output(s).
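
    As pure data, the attributes listed above for gates and wires might be sketched as follows. The names GateSketch and WireSketch are hypothetical, as is the choice to name the logical function with a string and to hold connections in lists; other representations would serve equally well.

```java
import java.util.ArrayList;
import java.util.List;

/** A sketch of a wire, as pure data. */
class WireSketch {
    boolean value;  // logical value currently presented to the outputs
    double delay;   // seconds from input change to output change
    // the gates whose inputs this wire drives (zero or more)
    final List<GateSketch> destinations = new ArrayList<>();
}

/** A sketch of a logic gate, as pure data. */
class GateSketch {
    String function;  // for example "and", "nand", "or", "nor" or "not"
    boolean output;   // current value of the output of this gate
    double delay;     // seconds from input change to output change
    // the wires attached to the output of this gate
    final List<WireSketch> outputs = new ArrayList<>();
}
```

    Connecting the output of gate g to an input of gate h through wire w then amounts to g.outputs.add(w) followed by w.destinations.add(h).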

    Of course, if your goal is to manipulate digital logic circuits on a display screen, each gate and wire must know where it is on the display.

    Events, in a digital logic circuit, represent changes in the value of the output of a gate or the output of a wire. As in the traffic example, these output state changes are considered to be instantaneous, and for boolean logic we use only two values, true and false. (Not all logic is boolean; in the 1960s, the Russians built a run of 50 ternary mainframe computers with the three values true, unknown and false.)

    It turns out that the model just presented is just a bit too simplistic: To realistically simulate digital logic circuits containing feedback loops, you need the time delay of gates to have a small random component. Without this, as it turns out, metastable states of the feedback loops do not behave realistically. While you have no need to know this about logic circuitry, the refinement of a model to incorporate such details is very typical of real-world modelling.

    Example 3: Neural Networks

    A third example is neural networks, which model the behavior of neurons in the brains of animals, including people. Note that the term neural network has another meaning, referring to classification algorithms inspired by biological neural networks. Here, we are talking about models of actual nervous systems, not artificial intelligence algorithms. The components that matter in such a model are:

    Neurons
    A neuron is a type of cell that fires when its membrane potential exceeds some threshold. In the absence of external stimuli, the membrane potential decreases exponentially with a fixed half-life. There are two types of external stimuli, excitatory and inhibitory. Excitatory stimuli add a fixed increment to the membrane potential. Inhibitory stimuli subtract a fixed increment (or alternatively, they increase the threshold, which reverts exponentially to its default value).

    Axons
    An axon is a part of a neuron that connects the cell body with remote connections to other neurons. The only characteristic we care about is the delay from when the cell body fires to when the synapses on that axon transmit their inhibitory or excitatory inputs to other neurons. Each synapse may have a different delay.

    Synapses
    A synapse may be either inhibitory or excitatory, and in either case, it has a strength, the amount by which it changes the membrane potential of the neuron to which it connects. The difference between inhibition and excitation can be modeled by the sign of the strength, positive for excitation, negative for inhibition.

    This is vastly oversimplified, but it is still a useful approximation for how neural networks work. This model does not cover learning, which is believed to involve modification to the synapses, and it does not cover exhaustion of the neurotransmitters when a synapse fires too frequently. Again, adding such detail is typical of how models are refined.
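
    The exponential decay described under Neurons has a simple closed form: if the potential was p at time t0 and nothing has happened since, the potential at time t is p × 0.5^((t – t0) / half-life). The sketch below applies that formula whenever a stimulus arrives. The class name NeuronSketch is hypothetical, and the sketch assumes that firing resets the potential to zero, which the description above does not specify.

```java
/** A sketch of the simplified neuron described above. */
class NeuronSketch {
    final double threshold;  // the neuron fires when potential exceeds this
    final double halfLife;   // seconds for the potential to decay by half
    double potential = 0.0;  // membrane potential as of lastTime
    double lastTime = 0.0;   // time of the most recent stimulus, seconds

    NeuronSketch(double threshold, double halfLife) {
        this.threshold = threshold;
        this.halfLife = halfLife;
    }

    /** Membrane potential at time t >= lastTime, absent new stimuli. */
    double potentialAt(double t) {
        return potential * Math.pow(0.5, (t - lastTime) / halfLife);
    }

    /** Apply a stimulus at time t; strength is positive for excitatory
     *  synapses and negative for inhibitory ones.
     *  @return true if the neuron fires as a result
     */
    boolean stimulate(double t, double strength) {
        potential = potentialAt(t) + strength;
        lastTime = t;
        if (potential > threshold) {
            potential = 0.0;  // Assumption: firing resets the potential
            return true;
        }
        return false;
    }
}
```

    Note that nothing needs to be computed between stimuli. The potential is only brought up to date when an event arrives, which is exactly the discrete-event view of a process that takes time.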

    Example 4: Epidemic dynamics

    A model of an epidemic ignores many details of human behavior, viewing people only as being subject to infection. The goal of the model is to study the spread of infection through the community, so the model needs to track people's contact patterns. The basic components that matter in such a model are:

    People
    People move from place to place. When multiple people share the same place, if one of them is infected, the others have a probability of infection that depends on the place and how long they are together there. People have the following states: uninfected (and vulnerable), latent, infectious, bedridden, recovered, and dead. When a person is infected, they become latent for a while, then they become infectious, while still going about their business, and then some fraction become bedridden, while others recover. Among the bedridden, some recover, and some die. When a person is not bedridden or dead, they move from place to place in patterns that depend on their class. Classes of people include students, workers, and homemakers.

    Families
    Families are groups of people that live together, that is, share some place (a home) as a base of operations.

    Places
    People move from their homes to schools, stores, and workplaces. Schools and stores are types of workplaces because some people go there for work. Schools are also frequented by students, and stores are also frequented by customers. Homemakers, workers and students have different patterns of visiting stores.
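
    The states listed under People, and the permitted moves between them, fit naturally into a Java enum. This is a sketch under assumptions: the names InfectionState, successors and isMobile are hypothetical, and the transitions are only those stated in the description above.

```java
import java.util.EnumSet;
import java.util.Set;

/** A sketch of the infection states of a person in the model. */
enum InfectionState {
    UNINFECTED, LATENT, INFECTIOUS, BEDRIDDEN, RECOVERED, DEAD;

    /** The states that can directly follow this one. */
    Set<InfectionState> successors() {
        switch (this) {
            case UNINFECTED: return EnumSet.of(LATENT);
            case LATENT:     return EnumSet.of(INFECTIOUS);
            case INFECTIOUS: return EnumSet.of(BEDRIDDEN, RECOVERED);
            case BEDRIDDEN:  return EnumSet.of(RECOVERED, DEAD);
            default:         return EnumSet.noneOf(InfectionState.class);
        }
    }

    /** People who are bedridden or dead do not move between places. */
    boolean isMobile() {
        return this != BEDRIDDEN && this != DEAD;
    }
}
```

    Each change of state is an event in the discrete-event sense: it happens at an instant, and the time of the next change is decided when the current one occurs.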

    This model can be made arbitrarily complex, giving it more and more detail, but the goal is to ask: if one person in the community becomes infected, how does the infection spread through the community, how many are bedridden at any time (a measure of hospital demand), and how many die as a function of time?

    Once the basic model works, you can do things like examine the impact of policy changes such as closing schools when the infection rate crosses some threshold. With a slightly more complex model, subdividing workplaces between essential and non-essential, you can examine the impact of closing some category of non-essential businesses.

    Everything is an instance of a class

    The textbook has a chapter titled Everything is an Object, and in the world of object-oriented programming, that is true. When you look at a large programming problem and think about how to create code, every noun you find in the problem description is a very good candidate for the name of either an object or a class of objects in the program that solves that problem.

    For example, when you look at the screen of a computer, you typically see windows, icons and a cursor on that screen. An object-oriented implementation of the window manager for that computer will almost certainly be built on classes with names like Window, Icon and Cursor. If the window manager only supports one screen, there will probably be an object called screen. If the window manager supports multiple screens, there may well be a class, Screen, with one object of this class per screen, and possibly an object named currentScreen that names the object that the user is currently focused on because the cursor is there. By convention, Java class names are almost always capitalized, while object names usually begin with a lower-case letter. This is only a convention! Nothing requires this. Other conventions are used in some settings, but we will try to conform to the Java convention here.

    In our road-network example, there will be classes like Road, Intersection and Vehicle. In our logic-circuit example, there will be classes like Gate and Wire. In our neural-network example, there will be classes like Neuron and Synapse.

    The important thing to note in all of these examples is that we can actually construct a huge amount of the framework of a program by analyzing the classes that make up the problem and their relationship to each other. Significant parts of this work can be done long before we know what algorithms are involved, before we know what output the program is supposed to produce, and before we know what input the program will take.

    Class Definition in Java

    We spoke in the abstract about classes of objects in our discussion of modelling a road network, a digital logic circuit and a neural network. Now, let's talk about implementing these classes in Java. Initially, we'll talk about these classes as pure data. We'll add behavior later.

    If we are modelling a road network, we might begin with the following classes:

    class Road {
        // indent to here between braces
    }
    
    class Intersection {
    }
    

    An aside on code format

    Note in the above that the closing brace for each block is aligned under the keyword that opened the block, while the opening brace is at the end of the line (except perhaps for a comment). This indenting style is preferred both by our textbook's author and by me.

    There is a matter of style here. Java is perfectly happy if we write this without newlines, without comments, and with a minimum of spaces like this:

    class Road{}class Intersection{}
    

    That is not very readable, and the only reason to write code this way is to prevent it from being read. Unreadable code will not be tolerated in this class. Languages such as Python force you to use newlines and indenting, but Algol 60, Simula 67, C, Pascal, C++ and Java, among many others, leave indenting and newlines entirely to the programmer.

    We could also write it as follows, keeping the opening and closing braces vertically aligned with everything between indented. This style tends to push code onto more lines, pushing code off the bottom of the editing window.

    class Road
        {
            // indent to here between braces
        }
    
    class Intersection
        {
        }
    

    The style I use (and that the book uses) makes balancing brackets easy enough without pushing text off the bottom of your editing window.

    Indenting

    Another question about code format is how much you should indent. Short indents, for example, 4 spaces, allow deeper nesting than deep indents without adding pressure for longer lines. Tab stops in plain text files such as are used to store programs usually default to every 8 characters, a default dating back to the early Unix system from around 1970. This is also the default supported by web browsers when displaying .txt files and when displaying text between <pre> and </pre> tags in HTML. While most text editors allow you to set the tabs to other spacings, this causes trouble if you change editors, e-mail the code to someone else, or print the file using the default printer settings. So the best practice is to leave the tab setting at its default. If you want to use shorter indents, use the space bar, not the tab key.

    The human mind can only digest a certain amount of complexity before it is overwhelmed. If your program really needs more than 4 or 5 levels of indenting, it may be too complex to understand and perhaps it should be broken up into digestible components. This suggests that indenting using one tab per indenting level is reasonable. This was the standard indenting convention in C for the first 15 years of use of that language. The Sun/Oracle formatting standards for Java suggest that 4 spaces is reasonable, while allowing 8 spaces. They emphasize uniformity over the exact value of the indenting step. Generally, it is very bad form to mix code that uses 4-space indenting with code that uses 8-space indenting.

    Line Length

    Similar arguments suggest that long lines are not a good idea. Yes, the 80 column default for terminal windows is directly descended from the fact that punched cards have 80 characters each (a standard IBM introduced in 1928). This is an archaic reason for the default length of a line, but it is not a bad length. The 80 column standard is based on the length of a line of text on a typical page of typing paper. That, in turn, is based on experience with easy readability.

    When pages get wider than on the order of 80 characters, it gets difficult for the reader to track from the end of one line to the start of the next. If you're reading this on the web with your web browser window maximized to take up the full screen, your reading speed will be significantly reduced compared to your reading speed with the window width set somewhere in the range from 50 to 100 characters.

    When faced with wide pages, people have long opted for multi-column text. This goes back to the days when hand-written ink on parchment was the standard, and it continues today in contexts such as large-format books and newspapers. Keep this in mind when you are tempted to simply widen your editing window and write really long lines of code. Oracle's standard for Java formatting requires lines to be no more than 80 characters. We will enforce this standard.

    Back to the road network

    Regardless of how you indent it, the code given above is only a framework, but we can store this in a file and start testing immediately. Consider using the file RoadNetwork.java to hold the code for a road-network simulation.

    Of course, we ought to document this file with appropriate commentary, so right up front, before starting to write any code, we'll add some notes:

    // RoadNetwork.java - Classes needed to describe a road network
    
    /** Roads are one-way connections between intersections
     *  @author Douglas Jones
     *  @version -1?
     *  @see Intersection
     */
    class Road {
        // Bug: Lots of details are missing
    }
    
    /** Intersections join roads
     *  @see Road
     */
    class Intersection {
        // Bug: Lots of details are missing
    }
    
    // Bug: Java demands that this file contain class RoadNetwork
    

    The above commentary uses the javadoc style of comments so that, later, when the program grows huge, we can use the javadoc tool to generate a documentation file from these comments. Note that Javadoc insists that the comment documenting any specific class, field or method be placed directly before the definition of that class, field or method.

    In short, the special comment marker /** opens a Javadoc comment, and the marker */ closes the comment. Between these two markers, you can put arbitrary text, but the @ symbol causes the following text to be processed specially. Look up Javadoc in Wikipedia; that's not a bad introduction.

    Test this! Save the above Java code and use the javac command to make sure you have not messed up, then come back and start thinking about the next step. Let's start fleshing out the first class:

    class Road {
        float travelTime;         //measured in seconds
        Intersection destination; //where the road goes
        // Bug: do we need to know where this road comes from?
    }
    

    One attribute of each road is its length, but (at least for this class) we aren't as worried about the physical length of the road as how long it takes to travel down the road. So, we'll measure length in terms of the travel time for a vehicle going at the speed limit. The decision to measure travel time in seconds is arbitrary.

    Once a vehicle enters a road, it must end up somewhere, so we also added a field that indicates what intersection we get to if we get on this road. This, in turn, implies that each road is a one-way connection from some source intersection to some destination intersection. If you want to model two-way roads, you do it with a pair of one-way roads, one for each direction. If you want to permit U turns at some point along a two-way road, you do it by adding an intersection.
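
    Here is a sketch of what modeling a two-way road as a pair of one-way roads might look like. To keep it self-contained, it uses stand-in classes with hypothetical names (IntersectionNode and OneWayRoad) in place of the Intersection and Road classes under development, and it adds constructors and a helper method, twoWay, that are not part of the code above.

```java
/** A stand-in for class Intersection, for illustration only. */
class IntersectionNode {
    final String name;  // hypothetical attribute

    IntersectionNode(String name) {
        this.name = name;
    }
}

/** A stand-in for class Road; each road is a one-way connection. */
class OneWayRoad {
    final float travelTime;              // measured in seconds
    final IntersectionNode destination;  // where the road goes

    OneWayRoad(float travelTime, IntersectionNode destination) {
        this.travelTime = travelTime;
        this.destination = destination;
    }

    /** Build both directions of a two-way road between a and b.
     *  @return an array holding the road to b, then the road to a
     */
    static OneWayRoad[] twoWay(float travelTime,
                               IntersectionNode a, IntersectionNode b) {
        return new OneWayRoad[] {
            new OneWayRoad(travelTime, b),
            new OneWayRoad(travelTime, a)
        };
    }
}
```

    Nothing in the model needs to know that the two roads are related. Each is an ordinary one-way road, which keeps class Road simple.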

    We also added a comment indicating a currently unanswered question: When looking at a road, do we ever need to know where that road came from? We don't need the answer immediately, but if we need this information, we left a comment, a bug notice, indicating where the information should be stored if we do need it. Later, after we start thinking about simulation algorithms, we'll find the answer.

    It is a really good idea to adopt a convention of writing comments in your code to document bugs and other things you don't understand. If you consistently use a word like Bug to mark such comments, you'll have a very easy time finding places you marked earlier as needing work. Do not put off writing comments until the end. Time spent thinking about how to comment your code is usually time well spent because it forces you to think about the code you have and recognize bugs early in the design process.

    As an incentive to think about comments early, if you need help with code and we see that it does not have comments, we'll ask you to fix that before we look at the code.

    An aside on capitalization

    The above code fragment illustrates two issues: The first has to do with multiple-word variable names. It might have been nice to call one variable travel time and the other intersection destination, but in Java, you cannot put spaces inside an identifier. Other languages differ. In FORTRAN (the oldest high-level programming language), spaces are allowed in identifiers. In fact, in FORTRAN, all spaces are ignored, so they can be added at random.

    An alternative to the style used in the code here (and in the textbook) is to use underscore as a space character in identifiers, for example, travel_time. This is a very popular style.

    The style used in the text, and here, has been called StudlyCaps as if there is something masculine about squeezing out the spaces and capitalizing the first letter of each word, and also BiCapitalization.

    The second issue surrounding the use of capital letters is a matter of convention: Here, the first letter of each class name is capitalized, while this is not done for variable names. When you define a new symbol, you can capitalize it any way you want, but conventions can improve readability.

    So why aren't the names of built-in classes like int and float capitalized? There are two explanations:

    First: We could claim that this is to emphasize the fact that int and float are not quite first-class classes. If a Java object is from a first-class class, it inherits a large number of attributes from the superclass of all classes. This has a high cost. Objects of built-in classes like int and float don't inherit these attributes. They have much more limited semantics in order to allow very efficient execution.

    There is a full-scale class, Integer, that does everything that class int does, but more slowly. Each Integer has a single field of type int. Similarly, there is a class Float. These classes are useful because they contain a number of attributes and methods supporting the built-in classes.
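
    A short sketch may help show the difference between int and Integer in practice. The class name BoxingSketch and the method demo are hypothetical; Integer.valueOf, Integer.parseInt and Integer.MAX_VALUE are standard parts of class Integer.

```java
import java.util.ArrayList;
import java.util.List;

/** A sketch contrasting the built-in int with the class Integer. */
class BoxingSketch {
    static int demo() {
        int i = 42;                          // built-in, no object overhead
        Integer boxed = Integer.valueOf(i);  // a full object holding an int

        // Collections can only hold objects, so they use Integer, not int
        List<Integer> list = new ArrayList<>();
        list.add(boxed);

        // Class Integer also holds methods that support int
        list.add(Integer.parseInt("8"));

        // Arithmetic on Integer objects converts them back to int
        return list.get(0) + list.get(1);
    }

    /** The largest int, a constant provided by class Integer. */
    static int largest() {
        return Integer.MAX_VALUE;
    }
}
```

    In most code, Java converts between int and Integer automatically, so the distinction only becomes visible in places like collections, where an object is required.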

    Second: We could give the actual explanation. The type names int and float come from C and C++. Java didn't change things that worked just fine in those older languages.

    Back to the road network

    We can continue fleshing out our road network by adding comments to the definition of an intersection. We have some problems to solve here: How does one include a set of outgoing roads in a class? How does one create a class that comes in several types: uncontrolled intersections, intersections where some road has a stop sign, intersections where all incoming roads have stop signs? Does the intersection even need to know the identities of its incoming roads?

    /** Intersections join roads
     *  @see Road
     */
    class Intersection {
        // Bug: multiple outgoing roads
        // Bug: multiple incoming roads
        // Bug: multiple types of intersections (uncontrolled, stoplight)
    }
    

    Class Vehicle has the potential to have attributes like cargo capacity and passenger capacity, but those depend on why we are building the model. Initially, our biggest question about vehicles is, does the vehicle need to know its current location? The answers to these questions depend on how we use the model, but we need to go quite some distance before that matters.

    /** Vehicles travel on roads through intersections
     *  @see Intersection
     *  @see Road
     */
    class Vehicle {
        // Bug: what are the relevant attributes of a vehicle?
        // Bug: do vehicles need to know their current location?
    }
    

    Finally, as mentioned in the previous lecture, we will eventually need to worry about events. We will put off that issue until we dive into discrete event simulation in considerably more detail.