3. Models and Objects
Part of
CS:2820 Object Oriented Software Development Notes, Fall 2020
A model is an abstraction of some aspect of reality. So, for example, we can describe a flood control reservoir such as the Coralville reservoir by a set of variables and functions:
The model connects these variables and functions to describe the behavior of the reservoir:
This model is an oversimplification. It ignores groundwater, it ignores bank erosion, it ignores evaporation. All real models are oversimplifications.
The U.S. Army Corps of Engineers in Rock Island spends quite a bit of effort on such models, but they are not what we will focus on here. The above model is a model of a continuous system, the kind of system that is described by differential equations. Take a numerical analysis course if you want to learn more about that kind of modeling. Here, we will focus on discrete event models.
For a second example, consider orbital dynamics. Consider a model of a satellite in near Earth orbit. For short-term dynamics, we can ignore the effects of the Moon, the Sun and other planets, and we can ignore the rotation of the Earth, treating both the Earth and the satellite as point masses. The variables that interest us are:
Some constants matter:
All of the above are three-dimensional vectors, with x, y and z components. The basic rules are the rules of physics:
Given an initial position and velocity, this model predicts the future behavior of the satellite. Smaller values of Δt and higher precision arithmetic lead to more precise predictions. Numerical analysis can be used to derive far more accurate models from this framework.
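A minimal sketch of how such a model can be stepped forward in time, assuming Newtonian gravity around a point-mass Earth and a simple Euler update. The class name Satellite, its fields and the method names are illustrative assumptions, not code from this course:

```java
/** Sketch of a satellite stepped forward with Euler's method.
 *  All names here are illustrative assumptions.
 */
class Satellite {
    static final double GM = 3.986004418e14; // Earth's G*M, in m^3/s^2

    double[] position; // meters from Earth's center, {x, y, z}
    double[] velocity; // meters per second, {x, y, z}

    Satellite(double[] position, double[] velocity) {
        this.position = position.clone();
        this.velocity = velocity.clone();
    }

    /** Distance from the center of the Earth, in meters. */
    double radius() {
        return Math.sqrt(position[0] * position[0]
                       + position[1] * position[1]
                       + position[2] * position[2]);
    }

    /** Advance the model by one time step of dt seconds. */
    void step(double dt) {
        double r = radius();
        double scale = -GM / (r * r * r); // acceleration = scale * position
        for (int i = 0; i < 3; i++) {
            double acceleration = scale * position[i];
            position[i] += velocity[i] * dt;  // uses the old velocity
            velocity[i] += acceleration * dt; // uses the old position
        }
    }
}
```

Calling step repeatedly with a small dt traces out the orbit. Shrinking dt reduces the error of each step at the cost of more steps.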
Our modern ideas of object-oriented programming originated in a Norwegian research group that was heavily involved in discrete event models and discrete event simulation. Here are some examples of systems that illustrate this kind of model:
Consider modelling a highway network. A model of a highway network will typically have:
In an object-oriented model, we can consider each of the above to be the name of a class, and each attribute of the objects in a class can be considered to be a field of that class. Some of these classes are more complex than others. Intersections have subclasses depending on the control algorithm. Vehicles may have subclasses that depend on the type (taxis and dump trucks have very different behavior).
The events that interest us constitute another class. The following events matter in this highway network:
The primary attribute of every event is the time at which it occurs. In a discrete event model, each event occurs at a specific instant of time. Any process that takes time is considered to occur as a sequence of events spanning that time, where each event in that sequence marks a point where the state of the model must be checked to determine when the next event in that sequence occurs.
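The idea that every event carries the time at which it occurs can be sketched in Java as follows. This is only an illustration: the class name Event, its fields, the example event descriptions and the use of java.util.PriorityQueue to keep pending events in chronological order are all assumptions.

```java
import java.util.PriorityQueue;

/** Sketch of an event; the class and field names are assumptions. */
class Event implements Comparable<Event> {
    final double time;        // the instant at which the event occurs
    final String description; // what happens at that instant

    Event(double time, String description) {
        this.time = time;
        this.description = description;
    }

    /** Order events chronologically. */
    public int compareTo(Event other) {
        return Double.compare(time, other.time);
    }
}

class EventDemo {
    public static void main(String[] args) {
        // the pending events, in whatever order they were scheduled
        PriorityQueue<Event> pending = new PriorityQueue<>();
        pending.add(new Event(12.5, "vehicle leaves road"));
        pending.add(new Event(3.0, "vehicle enters intersection"));
        pending.add(new Event(7.25, "stoplight changes"));

        // events come back out in order of increasing time
        while (!pending.isEmpty()) {
            Event e = pending.remove();
            System.out.println(e.time + ": " + e.description);
        }
    }
}
```

Running EventDemo prints the three events in chronological order, regardless of the order in which they were added.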
Of course, if your model is being used in a context where there is a graphic display of the progress of the model, each vehicle, road and intersection will need to have display coordinates associated with it.
As a second example, consider digital logic. Here, the major classes are:
Of course, if your goal is to manipulate digital logic circuits on a display screen, each gate and wire must know where it is on the display.
Events, in a digital logic circuit, represent changes in the value of the output of a gate or the output of a wire. As in the traffic example, these output state changes are considered to be instantaneous, and for boolean logic we use only two values, true and false. (Not all logic is boolean; in the 1960's, the Russians built a run of 50 ternary mainframe computers with the three values true, unknown and false.)
It turns out that the model just presented is just a bit too simplistic: To realistically simulate digital logic circuits containing feedback loops, you need the time delay of gates to have a small random component. Without this, as it turns out, metastable states of the feedback loops do not behave realistically. While you have no need to know this about logic circuitry, the refinement of a model to incorporate such details is very typical of real-world modelling.
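One way to give a gate's delay a small random component is sketched below. The field names, the method nextDelay and the choice of a uniform distribution are assumptions made for illustration; a real simulator might use a different distribution.

```java
import java.util.Random;

/** Sketch of a gate whose delay has a small random component.
 *  The names and the uniform distribution are assumptions.
 */
class Gate {
    final double delay;  // nominal delay, in seconds
    final double jitter; // largest random addition, in seconds

    Gate(double delay, double jitter) {
        this.delay = delay;
        this.jitter = jitter;
    }

    /** Delay for one output change: the nominal delay
     *  plus a random amount between 0 and jitter.
     */
    double nextDelay(Random rand) {
        return delay + jitter * rand.nextDouble();
    }
}
```

Each call to nextDelay gives a slightly different value, so two identical gates in a feedback loop will not stay in perfect lockstep.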
In a third example, neural network models describe the behavior of neurons in the brains of animals, including people. Note that the term neural network has another meaning, referring to classification algorithms inspired by biological neural networks. Here, we are talking about models of actual nervous systems, not artificial intelligence algorithms. The components that matter in such a model are:
This is vastly oversimplified, but it is still a useful approximation for how neural networks work. This model does not cover learning, which is believed to involve modification to the synapses, and it does not cover exhaustion of the neurotransmitters when a synapse fires too frequently. Again, adding such detail is typical of how models are refined.
A model of an epidemic ignores many details of human behavior, viewing people only as being subject to infection. The goal of the model is to study the spread of infection through the community, so the model needs to track people's contact patterns. The basic components that matter in such a model are:
This model can be made arbitrarily complex, giving it more and more detail, but the goal is to ask, if one person in the community becomes infected, how does the infection spread through the community, how many are bedridden at any time (a measure of hospital demand), and how many die as a function of time.
Once the basic model works, you can do things like examine the impact of policy changes such as closing schools when the infection rate crosses some threshold. With a slightly more complex model, subdividing workplaces between essential and non-essential, you can examine the impact of closing some category of non-essential businesses.
The textbook has a chapter titled Everything is an Object, and in the world of object-oriented programming, that is true. When you look at a large programming problem and think about how to create code, every noun you find in the problem description is a very good candidate for the name of either an object or a class of objects in the program that solves that problem.
For example, when you look at the screen of a computer, you typically see windows, icons and a cursor on that screen. An object-oriented implementation of the window manager for that computer will almost certainly be built on classes with names like Window, Icon and Cursor. If the window manager only supports one screen, there will probably be an object called screen. If the window manager supports multiple screens, there may well be a class, Screen, with one object of this class per screen, and possibly an object named currentScreen that names the screen that the user is currently focused on because the cursor is there. By convention, Java class names are almost always capitalized, while object names are usually in lower case. This is only a convention! Nothing requires this. Other conventions are used in some settings, but we will try to conform to the Java convention here.
In our road-network example, there will be classes like Road, Intersection and Vehicle. In our logic-circuit example, there will be classes like Gate and Wire. In our neural-network example, there will be classes like Neuron and Synapse.
The important thing to note in all of these examples is that we can actually construct a huge amount of the framework of a program by analyzing the classes that make up the problem and their relationship to each other. Significant parts of this work can be done long before we know what algorithms are involved, before we know what output the program is supposed to produce, and before we know what input the program will take.
We spoke in the abstract about classes of objects in our discussion of modelling a road network, a digital logic circuit and a neural network. Now, let's talk about implementing these classes in Java. Initially, we'll talk about these classes as pure data. We'll add behavior later.
If we are modelling a road network, we might begin with the following classes:
class Road {
    // indent to here between braces
}
class Intersection {
}
Note in the above that the closing brace for each block is aligned under the keyword that opened the block, while the opening brace is at the end of the line (except perhaps for a comment). This indenting style is preferred both by our textbook's author and by me.
There is a matter of style here. Java is perfectly happy if we write this without newlines, without comments, and with a minimum of spaces like this:
class Road{}class Intersection{}
That is not very readable, and the only reason to write code this way is to prevent it from being read. Unreadable code will not be tolerated in this class. Languages such as Python force you to use newlines and indenting, but Algol 60, Simula 67, C, Pascal, C++ and Java, among many others, leave indenting and newlines entirely to the programmer.
We could also write it as follows, keeping the opening and closing braces vertically aligned with everything between indented. This style tends to push code onto more lines, pushing code off the bottom of the editing window.
class Road
{
    // indent to here between braces
}
class Intersection
{
}
The style I use (and that the book uses) makes balancing brackets easy enough without pushing text off the bottom of your editing window.
Another question about code format is how much you should indent. Short indents, for example, 4 spaces, allow deeper nesting than deep indents without adding pressure for longer lines. Tab stops in plain text files such as are used to store programs usually default to every 8 characters, a default dating back to the early Unix system from around 1970. This is also the default supported by web browsers when displaying .txt files and when displaying text between <pre> and </pre> tags in HTML. While most text editors allow you to set the tabs to other spacings, this causes trouble if you change editors, e-mail the code to someone else, or print the file using the default printer settings. So the best practice is to leave the tab setting at its default. If you want to use shorter tabs, use the space bar, not the tab key.
The human mind can only digest a certain amount of complexity before it is overwhelmed. If your program really needs more than 4 or 5 levels of indenting, it may be too complex to understand and perhaps it should be broken up into digestible components. This suggests that indenting using one tab per indenting level is reasonable. This was the standard indenting convention in C for the first 15 years of use of that language. The Sun/Oracle formatting standards for Java suggest that 4 spaces is reasonable, while allowing 8 spaces. They emphasize uniformity over the exact value of the indenting step. Generally, it is very bad form to mix code that uses 4-space indenting with code that uses 8-space indenting.
Similar arguments suggest that long lines are not a good idea. Yes, the 80 column default for terminal windows is directly descended from the fact that punched cards have 80 characters each (a standard IBM introduced in 1928). This is an archaic reason for the default length of a line, but it is not a bad length. The 80 column standard is based on the length of a line of text on a typical page of typing paper. That, in turn, is based on experience with easy readability.
When pages get wider than about 80 characters, it gets difficult for the reader to track from the end of one line to the start of the next. If you're reading this on the web with your web browser window maximized to take up the full screen, your reading speed will be significantly reduced compared to your reading speed with the window width set somewhere in the range from 50 to 100 characters.
When faced with wide pages, people have long opted for multi-column text. This goes back to the days when hand-written ink on parchment was the standard, and it continues today in contexts such as large-format books and newspapers. Keep this in mind when you are tempted to simply widen your editing window and write really long lines of code. Oracle's standard for Java formatting requires lines to be no more than 80 characters. We will enforce this standard.
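The interaction between tab stops and the 80 column limit can be sketched in code. The following is a minimal illustration, assuming tab stops every 8 columns; the class name LineLength and its methods are hypothetical, not a tool used in this course.

```java
/** Sketch of an 80-column check; all names are assumptions. */
class LineLength {
    static final int TAB_STOP = 8;   // default tab spacing
    static final int MAX_WIDTH = 80; // longest permitted line

    /** Width of a line as displayed, with each tab
     *  advancing to the next multiple of TAB_STOP.
     */
    static int displayWidth(String line) {
        int column = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '\t') {
                column += TAB_STOP - (column % TAB_STOP);
            } else {
                column++;
            }
        }
        return column;
    }

    /** True if the line fits within the limit. */
    static boolean fits(String line) {
        return displayWidth(line) <= MAX_WIDTH;
    }
}
```

Note that a line holding only a few dozen characters can still be too wide if it begins with several tabs, which is one reason deep nesting and long lines tend to go together.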
Regardless of how you indent it, the code given above is only a framework, but we can store it in a file and start testing immediately. Consider using the file RoadNetwork.java to hold the code for a road-network simulation.
Of course, we ought to document this file with appropriate commentary, so right up front, before starting to write any code, we'll add some notes:
// RoadNetwork.java - Classes needed to describe a road network

/** Roads are one-way connections between intersections
 * @author Douglas Jones
 * @version -1?
 * @see Intersection
 */
class Road {
    // Bug: Lots of details are missing
}

/** Intersections join roads
 * @see Road
 */
class Intersection {
    // Bug: Lots of details are missing
}

// Bug: Java demands that this file contain class RoadNetwork
The above commentary uses the javadoc style of comments so that, later, when the program grows huge, we can use the javadoc tool to generate a documentation file from these comments. Note that Javadoc insists that the comment documenting any specific class, field or method be placed directly before the definition of that class, field or method.
In short, the special comment marker /** opens a Javadoc comment, and the marker */ closes the comment. Between these two markers, you can put arbitrary text, but the @ symbol causes the following text to be processed specially. Look up Javadoc in Wikipedia; that's not a bad introduction.
Test this! Save the above Java code and use the javac command to make sure you have not messed up, then come back and start thinking about the next step. Let's start fleshing out the first class:
class Road {
    float travelTime;         //measured in seconds
    Intersection destination; //where the road goes
    // Bug: do we need to know where this road comes from?
}
One attribute of each road is its length, but (at least for this class) we aren't as worried about the physical length of the road as how long it takes to travel down the road. So, we'll measure length in terms of the travel time for a vehicle going at the speed limit. The decision to measure travel time in seconds is arbitrary.
Once a vehicle enters a road, it must end up somewhere, so we also added a field that indicates what intersection we get to if we get on this road. This, in turn, implies that each road is a one-way connection from some source intersection to some destination intersection. If you want to model two-way roads, you do it with a pair of one-way roads, one for each direction. If you want to permit U turns at some point along a two-way road, you do it by adding an intersection.
We also added a comment indicating a currently unanswered question: When looking at a road, do we ever need to know where that road came from? We don't need the answer immediately, but if we need this information, we left a comment, a bug notice, indicating where the information should be stored if we do need it. Later, after we start thinking about simulation algorithms, we'll find the answer.
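To make the idea of a two-way road as a pair of one-way roads concrete, here is a minimal sketch. The fields of Road are those given above; the helper method oneWay, the variable names and the contents of class RoadNetwork are assumptions added only for illustration.

```java
/** Sketch only: the helper and variable names are assumptions. */
class Road {
    float travelTime;         // measured in seconds
    Intersection destination; // where the road goes
}

class Intersection {
}

class RoadNetwork {
    /** Make a one-way road leading to the given intersection. */
    static Road oneWay(Intersection to, float travelTime) {
        Road r = new Road();
        r.travelTime = travelTime;
        r.destination = to;
        return r;
    }

    public static void main(String[] args) {
        Intersection a = new Intersection();
        Intersection b = new Intersection();

        // a two-way road between a and b is a pair of one-way roads
        Road aToB = oneWay(b, 30.0f);
        Road bToA = oneWay(a, 30.0f);

        System.out.println(aToB.destination == b); // prints true
        System.out.println(bToA.destination == a); // prints true
    }
}
```

Nothing in this sketch records where a road comes from, which is exactly the open question flagged by the bug notice in class Road.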
It is a really good idea to adopt a convention of writing comments in your code to document bugs and other things you don't understand. If you consistently use a word like Bug to mark such comments, you'll have a very easy time finding places you marked earlier as needing work. Do not put off writing comments until the end. Time spent thinking about how to comment your code is usually time well spent because it forces you to think about the code you have and recognize bugs early in the design process.
As an incentive to think about comments early, if you need help with code and we see that it does not have comments, we'll ask you to fix that before we look at the code.
The above code fragment illustrates two issues: The first has to do with multiple-word variable names. It might have been nice to call one variable travel time and the other intersection destination, but in Java, you cannot put spaces inside an identifier. Other languages differ. In FORTRAN (the oldest high-level programming language), spaces are allowed in identifiers. In fact, in FORTRAN, all spaces are ignored, so they can be added at random.
An alternative to the style used in the code here (and in the textbook) is to use underscore as a space character in identifiers, for example, travel_time. This is a very popular style.
The style used in the text, and here, has been called StudlyCaps as if there is something masculine about squeezing out the spaces and capitalizing the first letter of each word, and also BiCapitalization.
The second issue surrounding the use of capital letters is a matter of convention: Here, the first letter of each class name is capitalized, while this is not done for variable names. When you define a new symbol, you can capitalize it any way you want, but conventions can improve readability.
So why aren't the names of built-in classes like int and float capitalized? There are two explanations:
First: We could claim that this is to emphasize the fact that int and float are not quite first-class classes. If a Java object is from a first-class class, it inherits a large number of attributes from the superclass of all classes. This has a high cost. Objects of built-in classes like int and float don't inherit these attributes. They have much more limited semantics in order to allow very efficient execution.
There is a full-scale class, Integer, that does everything that class int does, but more slowly. Each Integer has a single field of type int. Similarly, there is a class Float. These classes are useful because they contain a number of attributes and methods supporting the built-in classes.
Second: We could give the actual explanation. The type names int and float come from C and C++. Java didn't change things that worked just fine in those older languages.
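The relationship between int and Integer can be seen in a few lines of code. This is a small sketch; the class name IntegerDemo and the variable names are assumptions, while Integer.MAX_VALUE, Integer.parseInt and toString are standard parts of the Java library.

```java
/** Sketch contrasting int with Integer; names are assumptions. */
class IntegerDemo {
    public static void main(String[] args) {
        int i = 42;        // built-in type with limited semantics
        Integer boxed = i; // an object holding a single int
        int back = boxed;  // and the int comes back out

        // Integer inherits methods such as toString from Object
        System.out.println(boxed.toString());       // prints 42

        // Integer also holds constants and methods that support int
        System.out.println(Integer.MAX_VALUE);      // prints 2147483647
        System.out.println(Integer.parseInt("17")); // prints 17
        System.out.println(back == i);              // prints true
    }
}
```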
We can continue fleshing out our road network by adding comments to the definition of an intersection. We have some problems to solve here: How does one include a set of outgoing roads in a class? How does one create a class that comes in several types: uncontrolled intersections, intersections where some road has a stop sign, intersections where all incoming roads have stop signs? Does the intersection even need to know the identities of its incoming roads?
/** Intersections join roads
 * @see Road
 */
class Intersection {
    // Bug: multiple outgoing roads
    // Bug: multiple incoming roads
    // Bug: multiple types of intersections (uncontrolled, stoplight)
}
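One possible way to address two of these bug notices is sketched below, using a list for the outgoing roads and subclasses for the types of intersection. This is only an assumption about where the design might go, not necessarily the answer these notes arrive at later; the names outgoing, control, Uncontrolled and StopLight are hypothetical.

```java
import java.util.LinkedList;
import java.util.List;

class Road {
    float travelTime;         // measured in seconds
    Intersection destination; // where the road goes
}

/** Sketch: a list holds the outgoing roads,
 *  and subclasses give the types of intersection.
 */
abstract class Intersection {
    final List<Road> outgoing = new LinkedList<>();

    /** Name of the control algorithm, for illustration only. */
    abstract String control();
}

class Uncontrolled extends Intersection {
    String control() {
        return "uncontrolled";
    }
}

class StopLight extends Intersection {
    String control() {
        return "stoplight";
    }
}
```

With this arrangement, a variable of type Intersection can refer to an object of either subclass, and code that only needs the outgoing roads does not care which.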
Class Vehicle has the potential to have attributes like cargo capacity and passenger capacity, but those depend on why we are building the model. Initially, our biggest question about vehicles is, does the vehicle need to know its current location? The answers to these questions depend on how we use the model, but we need to go quite some distance before that matters.
/** Vehicles travel on roads through intersections
 * @see Intersection
 * @see Road
 */
class Vehicle {
    // Bug: what are the relevant attributes of a vehicle?
    // Bug: do vehicles need to know their current location?
}
Finally, as mentioned in the previous lecture, we will eventually need to worry about events. We will put off that issue until we dive into discrete event simulation in considerably more detail.