Machine Problem 7

Due Mar 22, on line

Part of the homework for CS:2820, Spring 2021
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Background

To model the progress of a disease through the population, we need to add some new attributes for people, and we need to add some new inputs to the model description file. The key new attribute for each person in the model is that person's infection state. The infection states are, in their conventional order: uninfected, latent, asymptomatic, symptomatic, bedridden, recovered and dead.

In order to generate useful simulation output, we need to track the number of people in each of these infection states, so for the entire population, we track the number of uninfected, latent, asymptomatic, symptomatic, bedridden, recovered and dead people.

The dead infection state may not be needed, since people can be removed from the model instead of marking them as dead. When it is determined that a person dies, the count of dead people in the population statistics must be increased, and then all records of the existence of that person can be erased.

The key characteristic of each of these states is how long a person stays in that state. This can be described by a probability distribution; the log-normal distribution is quite adequate for this job, with both the median and scatter times given in days. In addition, for each state from which a person can recover, there is a probability of recovery. When a person is bedridden and does not recover, they die. For our model, people never die without first becoming, at least briefly, bedridden.
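As a rough illustration of drawing a state duration, here is a minimal sketch. The class name LogNormalSketch and the method draw() are hypothetical, and the way the scatter is converted into the sigma of the underlying normal distribution is an assumption, since the text above does not define that mapping. What the sketch does rely on is a standard fact: the exponential of a zero-mean normal variable has median 1, so scaling by the median gives a log-normal draw with that median.

```java
import java.util.Random;

/** Hypothetical helper: draws durations from a log-normal distribution. */
class LogNormalSketch {
    /**
     * Draw one duration, in the same units as the median.
     * ASSUMPTION: the scatter is mapped to sigma of the underlying normal
     * distribution as log((median + scatter) / median); the assignment
     * text does not specify this mapping.
     */
    static double draw( Random rand, double median, double scatter ) {
        double sigma = Math.log( (median + scatter) / median );
        // exp of a zero-mean normal has median 1, so scale by the median
        return median * Math.exp( sigma * rand.nextGaussian() );
    }
}
```

Under this assumed mapping, a scatter of zero makes every draw equal to the median, and all draws are strictly positive.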

Added Features for the Model Description File

These lead us to the following new types of input in the model description file:

infected 5;

latent       2   1.5;
asymptomatic 3   2.5;
symptomatic  5.5 2.5  0.9;
bedridden    8   5    0.9;

end 20;

In this example we have said that 5 people in our population are initially infected. Those 5 must be created in disease state latent while all others are created in state uninfected. The total number of initially infected people should not be greater than the population.

The median latency time for the disease described above is 2 days, with a scatter of 1.5 days; the disease is then asymptomatic for a median of 3 days with a scatter of 2.5 days. The symptomatic and bedridden phases of the disease are described similarly, with an added field giving the probability of recovery. In the above example, both probabilities are set to 0.9. A victim dies only by failing to recover twice, first while symptomatic and then while bedridden, so we expect about 0.1 times 0.1, or 0.01, of the victims of this disease to die.

The use of whitespace above is merely for the sake of example. The disease description could have been all mushed together on one line. The final line specifies that the simulation is to be run for 20 days of simulated time. Failure to specify an end time is fatal, and the end time must be strictly positive.

Simulation Output

The output should be one line per day, reporting daily at midnight, where the very first midnight report is defined as time zero and each subsequent report is exactly 24 hours of simulated time after the previous one.

Each report should give the time in days, followed by the number of people in each infection state, all in CSV format, with the states in their conventional order (uninfected first, dead last). For example, if the model specifies a population of 100 with 10 people initially infected, the first line of output would be:

0.0,90,10,0,0,0,0,0

This indicates that at time 0.0, 90 people were uninfected, 10 people were infected (latent), and none were asymptomatic, symptomatic, bedridden, recovered or dead.

CSV format is well documented on Wikipedia. The general rule is to use commas as separators, newlines to separate consecutive output records, and no added whitespace. This conservative data format allows many spreadsheets and graphing tools to take the output of our program and display it, while leaving the output reasonably easy to read and debug without the need for other tools.
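The report format described above can be sketched as follows. The class name CsvReport and the method line() are hypothetical names chosen for this illustration, and the counts array is assumed to hold the seven statistics in their conventional order.

```java
/** Hypothetical helper that formats one daily report line. */
class CsvReport {
    /**
     * Format one report: the time in days, then the counts.
     * ASSUMPTION: counts holds the number of people in each infection
     * state in the conventional order, uninfected first, dead last.
     */
    static String line( double days, int[] counts ) {
        StringBuilder sb = new StringBuilder();
        sb.append( days );                 // a double prints as, e.g., 0.0
        for (int c : counts) {
            sb.append( ',' ).append( c );  // comma separator, no whitespace
        }
        return sb.toString();
    }
}
```

With a population of 100 and 10 initially infected, CsvReport.line( 0.0, new int[] { 90, 10, 0, 0, 0, 0, 0 } ) returns the first line of the example output given earlier.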

Note: Some CSV tools allow the first line to be used as the table's column headings. This is an optional part of CSV format, and you should not output a title line. If your program's output is used as input to a tool that wants such a title line, just use the cat shell command to prefix this title line (or a line with abbreviations to allow a more compact table):

time,uninfected,latent,asymptomatic,symptomatic,bedridden,recovered,dead

Your program should produce no other output!

Assignment

Extend any solution to MP5 so that:

Also in writing your code, think ahead. In future versions:

The usual rules and coding standards apply. Your code should conform reasonably to the Sun/Oracle rules for Java formatting: 4-space indents, appropriate use of comments, 80-character lines, etc. You should check your code for annoying problems with ~dwjones/format. This will identify overlength lines and flag other undesirable problems with your text file such as residual DOS artifacts that come from transferring files between Windows and Linux systems.

Submission

As usual, submit using the ~dwjones/submit utility. Your program should be in a single source file called epidemic.java.

Questions and Answers

A student asked: I am having trouble finding a roadmap through the problem to a solution!

Do the simplest thing first: The end directive from the input file, with its time in days, should suffice. When you see this, schedule the end of time event. Then see if the simulation works. The simulation should simply terminate (for debugging, you might add output saying end-of-time and saying when). If this much works, you've scheduled an event and simulated it using the simulation framework distributed in the lecture notes.

The second simplest thing is the daily report: At the beginning of time, schedule the first report at time zero. Each report event should schedule another report event 24 hours later and print out the report. The report can be very simple at first, just the time in days alone by itself, one time per line. Later, you'll have more stuff to report. Now, you can test it. If you ask for a 10 day run, you should get 10 numbers in ascending order on ten lines of output.
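The two steps above can be tried out in isolation with a sketch like the following. The course's Simulator class is not shown here, so MiniSim is a hypothetical stand-in written only for this example; its tie-breaking rule (events at equal times run in the order they were scheduled) and its stop() method, used in place of System.exit(), are assumptions of the stand-in. Simulated time is assumed to be in hours, and, following the Simulation Output section, the first report is taken to be at time zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for the course's Simulator class. */
class MiniSim {
    interface Action { void trigger( double time ); }

    private static final class Event {
        final double time; final long seq; final Action act;
        Event( double t, long s, Action a ) { time = t; seq = s; act = a; }
    }

    private static long count = 0;  // scheduling order, used to break ties
    private static final PriorityQueue<Event> queue = new PriorityQueue<>(
        (a, b) -> (a.time != b.time)
            ? Double.compare( a.time, b.time )
            : Long.compare( a.seq, b.seq )
    );

    static void schedule( double time, Action act ) {
        queue.add( new Event( time, count++, act ) );
    }
    static void stop() { queue.clear(); }  // end of time: drop all events
    static void run() {
        while (!queue.isEmpty()) {
            Event e = queue.remove();
            e.act.trigger( e.time );
        }
    }
}

class DailyReport {
    static final double DAY = 24.0;  // ASSUMPTION: simulated time in hours
    static final List<String> lines = new ArrayList<>();

    /** Each report schedules the next one, then records the time in days. */
    static void report( double time ) {
        MiniSim.schedule( time + DAY, (double t)-> report( t ) );
        lines.add( Double.toString( time / DAY ) );
    }

    /** Run for endDays days and return the report lines. */
    static List<String> simulate( double endDays ) {
        lines.clear();
        MiniSim.schedule( endDays * DAY, (double t)-> MiniSim.stop() );
        MiniSim.schedule( 0.0, (double t)-> report( t ) );
        MiniSim.run();
        return lines;
    }
}
```

Because the end-of-time event is scheduled before any report, this stand-in runs it ahead of the report that falls at the same instant, so DailyReport.simulate( 10.0 ) collects ten lines, for days 0.0 through 9.0. A real solution would print each line instead of collecting it in a list.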

The daily report code probably goes in the class where the population statistics will eventually go. That may take some thinking. Since only people are involved in these statistics, they could be part of the person class, but perhaps they go elsewhere?

Independently of the above, you can think about how to infect the initial number of people. You can't add or delete people when you decide that some are "born" infected (latent) while some are born uninfected, and it seems undesirable to cluster all the initial infections in one role. Making the initial infections random over the community is not hard and can be done in two completely different ways:

How do you infect someone? The obvious answer (this is object-oriented program design after all) is to have a method applicable to person objects that you call to infect them. If they were uninfected, this should change them to latent, while if they were already infected, this should have no effect.

The infect method can track the changes in statistics for the transition from uninfected to infected, but initially, you're just out to debug, so you don't worry about making the disease progress.
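A minimal sketch of such a method follows. The names PersonSketch, State and counts are hypothetical, and keeping the statistics in a static array inside the person class is only one of the placements discussed above, chosen here for brevity.

```java
/** Hypothetical person class showing only the infection bookkeeping. */
class PersonSketch {
    enum State {
        UNINFECTED, LATENT, ASYMPTOMATIC, SYMPTOMATIC,
        BEDRIDDEN, RECOVERED, DEAD
    }

    // ASSUMPTION: population statistics live here, indexed by ordinal()
    static final int[] counts = new int[ State.values().length ];

    private State state = State.UNINFECTED;

    /** Everyone is created uninfected and counted as such. */
    PersonSketch() { counts[ state.ordinal() ]++; }

    State getState() { return state; }

    /** Infect this person; no effect unless currently uninfected. */
    void infect() {
        if (state != State.UNINFECTED) return;
        counts[ state.ordinal() ]--;   // one fewer uninfected
        state = State.LATENT;
        counts[ state.ordinal() ]++;   // one more latent
        // a full solution would schedule the end of latency here
    }
}
```

Calling infect() twice on the same person changes the statistics only once, which is the behavior asked for above.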

Now, as a next step, it's time to worry about disease progression. When a person is infected, a number should be drawn from the random number distribution for the disease latency. That number determines when you schedule the transition from latent to infectious but asymptomatic. The code for that transition can track the population statistics for that change and, once you get it working, serve as a prototype for the code for the transition from asymptomatic to symptomatic, and so on.

Many of the above steps will involve adding code to process one or another new kind of keyword in the input file. The code for all of these will basically be quite similar, so if you can read the time keyword followed by a time, you can read any of the other keywords followed by their numbers. As usual, any input errors should lead to no simulation, and input errors should always produce appropriate error messages and never throw an exception. You know how to do this now, or at least you have code that does it pretty well. Use what that code offers!

A student asked: What do you mean, squeezing all the excess whitespace out of the input file?

I certainly don't mean squeezing all the whitespace out, but I mean squeezing out any whitespace not needed to unambiguously pick apart the input. The following illustrates this kind of minimal whitespace:

infected 5;latent 2 1.5;asymptomatic 3
2.5;symptomatic 5.5 2.5 0.9;bedridden
8 5 0.9; end 20;

The above example isn't very readable, but it can be picked apart with the MyScanner methods from the posted solution to MP5, and the same exact code would read nicely formatted input with ample whitespace and newlines for readability.
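To see why the two layouts are equivalent, here is a small sketch that does not use MyScanner at all (its methods are not shown in this handout). The hypothetical StatementSplitter below simply treats semicolons as statement terminators and any run of whitespace, including newlines, as a token separator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter; a stand-in for illustration, not MyScanner. */
class StatementSplitter {
    /** Split input into ';'-terminated statements, each a token list. */
    static List<List<String>> split( String input ) {
        List<List<String>> statements = new ArrayList<>();
        for (String chunk : input.split( ";" )) {
            String trimmed = chunk.trim();
            if (trimmed.isEmpty()) continue;  // whitespace after last ';'
            List<String> tokens = new ArrayList<>();
            for (String tok : trimmed.split( "\\s+" )) {
                tokens.add( tok );            // newlines count as whitespace
            }
            statements.add( tokens );
        }
        return statements;
    }
}
```

Applied to the squeezed example above and to the same description laid out one statement per line, split() returns identical results: six statements, the third being asymptomatic, 3, 2.5.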

A student asked: In the example provided it said that 5 were infected and they were placed in latent status. The median latency time for the disease described above is 2 days, with a scatter of 1.5 days. It doesn't provide a probability that they become asymptomatic. Do latent people ever become asymptomatic?

Latent people always become asymptomatic, 100% of the time. Asymptomatic people always become symptomatic, 100% of the time. Symptomatic people either recover with the probability given or become bedridden. Bedridden people always either recover with the probability given or become dead. So, starting the model with some sick people, each sick person should progress over a period of days through all the stages of the simulated disease, ending up either recovered or dead.
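The transition rules just described can be written down as a small decision function. This is only a sketch: the names Progression, State and next() are hypothetical, and the random draw u is passed in as a parameter (assumed uniform in the range 0 to 1) so that the logic can be checked without randomness.

```java
/** Hypothetical summary of the state-to-state transition rules. */
class Progression {
    enum State {
        UNINFECTED, LATENT, ASYMPTOMATIC, SYMPTOMATIC,
        BEDRIDDEN, RECOVERED, DEAD
    }

    /**
     * Decide the state that follows from.
     * pRecover is the recovery probability of the state being left;
     * u is ASSUMED to be a uniform random draw with 0 <= u < 1.
     */
    static State next( State from, double pRecover, double u ) {
        switch (from) {
            case LATENT:       return State.ASYMPTOMATIC;  // always
            case ASYMPTOMATIC: return State.SYMPTOMATIC;   // always
            case SYMPTOMATIC:
                return (u < pRecover) ? State.RECOVERED : State.BEDRIDDEN;
            case BEDRIDDEN:
                return (u < pRecover) ? State.RECOVERED : State.DEAD;
            default:           return from;  // no spontaneous change
        }
    }
}
```

In a simulation, u would come from a random number generator, for example the nextDouble() method of java.util.Random.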

A student asked: Secondly, if people do move to the symptomatic phase will each event occur at an arbitrary time in the day?

Yes. Each infected person changes infection state at an arbitrary time, selected by drawing a random number from the probability distribution. If I become asymptomatic but infectious at 10:35AM Tuesday, with the example statistics, I draw a random time from a log-normal distribution with a median of 3 days and a scatter of 2.5 days. Say the result is 4 days 3 hours and 2 minutes. That means I will become symptomatic at 1:37PM Saturday.

The point is, events are scheduled for the infected person when that person's disease state is scheduled to change.

A student asked: I am having difficulty understanding how the disease progresses.

Each change in disease state, for each person, is an event. So, if p is a person, you might want to call p.infect() to infect that person. The infect() method would be responsible for:

The logic for becoming asymptomatic would be similar, perhaps you could package that in a method called beInfectious(), so inside infect() there would be a call to schedule looking something like this:

Simulator.schedule( time + latencyTime, (double t)-> this.beInfectious( t ) );

Similarly, the logic for becoming symptomatic might be packaged in a method called feelSick() and so on all the way up to die() and recover().
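The chain of events can be tried out with a sketch like this one. SimStub is a hypothetical stand-in for the course's Simulator class, reduced to schedule() and run(). The Patient class, the time parameter on infect(), the log list standing in for the population statistics, and the fixed durations used in place of log-normal draws are all assumptions made so the example is small and repeatable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for Simulator: only schedule() and run(). */
class SimStub {
    interface Action { void trigger( double time ); }

    private static final class Event {
        final double time; final Action act;
        Event( double time, Action act ) { this.time = time; this.act = act; }
    }

    private static final PriorityQueue<Event> queue =
        new PriorityQueue<>( (a, b) -> Double.compare( a.time, b.time ) );

    static void schedule( double time, Action act ) {
        queue.add( new Event( time, act ) );
    }
    static void run() {
        while (!queue.isEmpty()) {
            Event e = queue.remove();
            e.act.trigger( e.time );
        }
    }
}

class Patient {
    final List<String> log = new ArrayList<>();  // stands in for statistics

    // ASSUMPTION: fixed durations in days; a real solution would draw
    // each of these from the appropriate log-normal distribution
    double latencyTime = 2.0;
    double asymptomaticTime = 3.0;

    void infect( double time ) {
        log.add( "latent at " + time );
        SimStub.schedule(
            time + latencyTime, (double t)-> this.beInfectious( t )
        );
    }
    void beInfectious( double time ) {
        log.add( "asymptomatic at " + time );
        SimStub.schedule(
            time + asymptomaticTime, (double t)-> this.feelSick( t )
        );
    }
    void feelSick( double time ) {
        log.add( "symptomatic at " + time );
        // recovery or the bedridden step would be scheduled here
    }
}
```

Each method records its own state change and schedules the next one, so after creating a Patient p, calling p.infect( 0.0 ) and then SimStub.run(), the log reads latent at 0.0, asymptomatic at 2.0, symptomatic at 5.0.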

A student asked: I need a really simple example of the use of class Simulator.

The following main class can be run in the context of class Simulator. It schedules a series of 11 step() events at times 0.0, 1.0, 2.0 and so on up to 10.0. Note that the last step() may or may not be run because the program schedules it at exactly the same time as the exit() event marking the end of time.

public class Demo {
    private static void step( double time ) {
        Simulator.schedule( time + 1.0, (double t)-> step( t ) );
        System.out.println( "step " + time );
    }
    public static void main( String arg[] ) {
        Simulator.schedule( 10.0, (double t)-> System.exit( 0 ) );
        Simulator.schedule( 0.0, (double t)-> step( t ) );
        Simulator.run();
    }
}

A student asked: How does the material from MP4 and MP5 (people, roles, places, schedules) relate to the material from MP7 (disease states)?

People have disease states, and the population from MP4 and MP5 is the population you are infecting.

That's a really weak relationship, but it lets us begin to simulate something on the framework we've got.

The next step will be for people to begin moving around, following the schedules on the places they are associated with. This will mean, for each person, scheduling their moves in the simulation. It's better to do this after you have a basic simulation framework in place and tested, something that this MP7 has you do.

The step after that will be to have people start infecting other people when they happen to be in the same place. At that point, we will have a working epidemic simulator.

A student asked: So all this means that if 10 people out of 100 are initially infected, the daily midnight reports will show the progress of the disease for those 10 while the 90 people who were initially uninfected stay uninfected, right?

Exactly. Until we introduce contagion into our model, the disease does not spread. This machine problem just focuses on having the disease progress in those individuals who were infected at the start.

A student asked: What do we do about the MP5 output of the population?

The only output should be the CSV text. The output from MP5 was just for debugging the model's data structures, but those are now debugged. In my solution, I simply commented out the call to Person.printAll(). All of the input to MP5 should still build the model. In my tests, I've been using this much just to make sure there's a role and a place, and then adding appropriate MP7 lines after this start:

population 10;
place world 10 0;
role human 1 world;

A student asked: What about sanity constraints on times?

The times input for the disease parameters and simulation duration are given in floating point days. Median times must be strictly positive, and scatters must be non-negative. Of course, if someone types in 1, that means 1.0. The getNextFloat() in the model MP5 solution does that just fine.

The times for schedules, as input to MP5, should still work, and those are in floating point hours.

In this class, we're not likely to add refinements such as allowing people to give times like "8 days" or "16 hours" as input. In my model solution, class Time would be an interesting place to add service code such as getNextTime() that could read such cute data formats and reduce them to values of type double. Similarly, the constructor for Schedule objects could be enhanced to read things like "(8:15-9:20AM TuTh)." Adding these cute features would be busy work, so we're in no hurry to do that.

A student asked: Could you give an example of correct input and output?

Here is a small example input:

population 10;
infected 5;
place world 1 0;
role human 1 world;
end 10;

latent       2.0 1.5;
asymptomatic 3   2.5;
symptomatic  5.5 2.5 0.9;
bedridden    8   5   0.9;

And here is the output from one run.

0.0,5,5,0,0,0,0,0
1.0,5,4,1,0,0,0,0
2.0,5,3,2,0,0,0,0
3.0,5,1,3,1,0,0,0
4.0,5,0,4,1,0,0,0
5.0,5,0,4,1,0,0,0
6.0,5,0,3,2,0,0,0
7.0,5,0,2,3,0,0,0
8.0,5,0,0,5,0,0,0
9.0,5,0,0,5,0,0,0

Because of the randomness in the model, you'll get different numbers on each run. Here's another run:

0.0,5,5,0,0,0,0,0
1.0,5,4,1,0,0,0,0
2.0,5,1,4,0,0,0,0
3.0,5,0,4,1,0,0,0
4.0,5,0,3,2,0,0,0
5.0,5,0,2,3,0,0,0
6.0,5,0,2,3,0,0,0
7.0,5,0,1,4,0,0,0
8.0,5,0,1,4,0,0,0
9.0,5,0,1,3,0,1,0

We didn't run it long enough here for anyone to get bedridden, but one sick person did recover from the symptomatic phase. With such a small population, the variation from run to run will be big.

A student asked: In lecture, you noted that the structure of the latent, asymptomatic, symptomatic and bedridden input lines was so similar that a common bit of code could be used to pick apart the pieces of these, building for each an object containing the parameters for a log-normal distribution and a recovery probability. What is the default recovery probability?

I mentioned this in passing both Mar. 19 and Mar. 22. For latent and asymptomatic, where no recovery probability is given, the meaning is that they always advance to the next state without recovery. This means that a missing recovery probability translates to a probability of zero! In my solution, I used this idea and allowed the recovery probability to be omitted anywhere or added anywhere, taking the missing value as zero.

I will not test this behavior in the submitted work because it is a generalization that occurred to me after I made the assignment, except, of course, that I will not give recovery probabilities for latent and asymptomatic, where there should be no spontaneous recovery, and I will give them on symptomatic and bedridden, where the result should be some recoveries.
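The common parsing code with a defaulted recovery probability might look roughly like this. It is a sketch only: StageParams and parse() are hypothetical names, the numbers are assumed to arrive as already-separated text tokens, and errors are reported on System.err (so that standard output stays pure CSV) rather than by throwing an exception.

```java
/** Hypothetical holder for the parameters of one disease stage. */
class StageParams {
    final double median;    // days, strictly positive
    final double scatter;   // days, non-negative
    final double recovery;  // probability; zero when omitted

    StageParams( double median, double scatter, double recovery ) {
        this.median = median;
        this.scatter = scatter;
        this.recovery = recovery;
    }

    /**
     * Build from the tokens that follow the keyword.
     * Expects "median scatter" or "median scatter recovery";
     * returns null after an error message if the input is bad.
     */
    static StageParams parse( String keyword, String[] numbers ) {
        if (numbers.length < 2 || numbers.length > 3) {
            System.err.println( keyword + ": expected 2 or 3 numbers" );
            return null;
        }
        try {
            double median = Double.parseDouble( numbers[0] );
            double scatter = Double.parseDouble( numbers[1] );
            double recovery = 0.0;  // missing probability means zero
            if (numbers.length == 3) {
                recovery = Double.parseDouble( numbers[2] );
            }
            if (median <= 0.0 || scatter < 0.0
            ||  recovery < 0.0 || recovery > 1.0) {
                System.err.println( keyword + ": value out of range" );
                return null;
            }
            return new StageParams( median, scatter, recovery );
        } catch (NumberFormatException e) {
            System.err.println( keyword + ": not a number" );
            return null;
        }
    }
}
```

With this sketch, the tokens 2 and 1.5 for latent give a recovery probability of 0.0, while 5.5, 2.5 and 0.9 for symptomatic give 0.9.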