CS:2820 Notes, Lecture 14

The Java library provides a class, Random, where each instance of that class is a stream of pseudo-random numbers. The key word here is pseudorandom. No algorithm can generate truly random numbers. Algorithms are, by their very nature, deterministic unless they rely on an external source of randomness.

Pseudo-random number generators have outputs that appear random, but they are really deterministic. A typical pseudo-random number generator is based on the following skeleton:

SeedType seed = initialValue;

int next() {
    seed = f(seed);
    return g(seed);
}

The key parts of this are the type of seed, the initial value of seed, the function used to update seed, and how values of seed are converted to the values returned to the user.

Some pseudo-random number generators keep the seed as an array of integers, but most use a integer seed. Java's class Random defines itself as using a 48-bit seed and a linear congruential function. All linear congruential pseudo-random number generators can be described as variations on this function:

int f(int i) {
    return (A*i + B) % C;
}

Of course, we need to use an appropriate number of bits for the type int, 48 bits in the case of Java's Random. The key thing is, with the right constants A, B and C, the sequence of values that go in seed appears to be entirely random, except that it recycles after some number of numbers are drawn from the generator. A generator with a 48 bit seed will have a period no longer than 2⁴⁸, and with the right A, B and C, it can have a period longer than 2⁴⁷. That is, the period is longer than 10¹⁴. This makes it unlikely that a program will ever see a repeat, so long as it has only one random number stream.

The initial value of seed does not add any randomness. All it does is change where in the cycle we start. By default, Java harvests some randomness from the environment, for example, using the time of day, how many processes have been started on this computer since it was turned on, the time interval between the two most recent Internet transactions, or things like that. This harvested randomness is used to decide where to start in the cycle in order to give greater apparent randomness.

Users can start the generator at any user-selected point if they want truly deterministic behavior. This is useful primarily during debugging, in order to make it possible to re-run a program after a change and see what that change did without the impact being hidden by randomness.

Note that the values of seed given by a linear congruential pseudo-random number generator are integers uniformly distributed over the range 0 to C–1. Java's class Random always returns the top bits of the number, since the least significant bits output by linear congruential generators tend to be more predictable, and when you ask for a long integer, it concatenates two consecutive 32-bit values.

Java creates floating-point random values between 0.0 and 1.0 by drawing an appropriate integer and then dividing by the range to scale the value into the range required.

The big problem with Java's class Random is that Java allows users to create multiple instances. Each instance has its own seed, but all of them run the same algorithm. As a result, the sequences of values drawn from two different instances of Random can easily end up overlapping. The more streams you create and the more numbers you draw from each stream, the more likely it is that there will be an overlap.

The standard advice for avoiding this problem is: Never ever use more than one instance of a pseudo-random number stream. Seed that stream well, and draw all the random numbers you need from the same stream.

Note, multiplicative congreuence pseudo-random number generators should never be used for cryptography or other secure applications. Good pseudo-random number generators are very hard to build, and ones that are good enough for secure applications are even more difficult. Java's class Random is good enough for simulation applications like ours, and good enough for gaming, but it has its limits.

Practical consequences

The first practical consequence of the above is that we should create a single global class that encapsulates the pseudo-random number stream we need. One way to do this is with a class that is never used to construct objects, all it does is group static variables and methods:

class MyRandom {
    private MyRandom() {} // constructor only to prevent instantiation

    public final Random rand = new Random();    // the only random stream
    // for repeatable debugging, use Random( 1 ) (or any other fixed constant)

    ...
}

This works. We've already used this approach to group our error handling methods in class Error. This has the disadvantage that we have to write long winded code like MyRandom.rand.nextInt() whenever we want a random number, and it also prevents us from passing this source of randomness as a parameter.

Class MyRandom extends Random {
    private MyRandom() {
	super(); // the Random() constructor
	// during debugging, we can pass a fixed seed here
    }
    private static final Myrandom stream = MyRandom();

    // the only way a user can get access to the only instance of MyRandom
    public static MyRandom stream() {
        return myStream;
    }

    // all methods of class Random are available to the user

    // new methods extending the built-in Random class

    /** exponential distribution
     *  @param mean -- the mean value of the distribution
     *  @return a positive exponentially distributed random value
     */
    public double nextExponential( double mean ) {
        return mean * -Math.log( this.nextDouble() );
    }
}

The way Java handles enumerations is closely related to the above, except that there are a fixed number of instances, one per enumeration constant. Werever we need a random value in our application that uses the above framework, we write code like this:

MyRandom rand = Myrandom.stream();

This one line of code looks like it might be calling a factory method to construct a new stream, but it is not. Instead, every single call to MyRandom.stream() will return the same object, whether the declaration is static or not. There is no point in making it anything but static because it will always be the same stream, but if someone makes this an instance variable, nothing will break, the only cost of this error is an increased cost of object construction.

Had we made stream public, we could have also allowed direct user access to the stream like this:

MyRandom rand = Myrandom.stream;

So long as the stream is final, this works, and for something to be a true singleton class, it is naturally going to be final.

The class declaration above includes one new method to extend Java's class Random and that is a method to return a random number with an exponential distribution. The exponential distribution is the expected distribution of the intervals between events that occur independently. So, the intervals between successive raindrops, the intervals between successive customers arriving at a bank, and the intervals between successive cars on a freeway all tend to obey this distribution until the density of events is so high that they influence each other. Once the raindrops start to collide with each other, once the customers start to jostle, and once the drivers of the cars start to have to adjust their speeds to avoid collisions, the distribution will change.

Log normal distributions are also probably good models of many things in a population model. People's decisions to shop at stores they do not frequently visit may well be best modeled using such a distribution, although the decision to go to a store and the day of the actual visit might not be directly coupled. People may put off many such trips until a weekend.

14. Randomness and Singleton Classes

An aside on Randomness

Practical consequences