Randomness

Part of 22C:169, Computer Security Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

One Time Pads

A classic encryption method involves what is called a one-time pad, sometimes called a Vernam Mauborgne cypher. The modern history of one-time pads begins in the 20th century when Joseph Mauborgne recognized that Gilbert Vernam's cryptographic machine produced cyphertext that was truly unbreakable if the key was truly random. This is actually a reinvention -- Steven Bellovin has recently discovered that a California banker named Frank Miller published the same basic idea in his Telegraphic Code to Insure Privacy and Security in the Transmission of Telegrams in 1882. Variants of this idea may be older, since book cyphers where you never reuse any page of the book achieve a similar effect.

In a book cypher, the key is some book that both the sender and recipient of the message happen to own. Various editions of the Bible are perennially popular here. To send a message, you use some page of the book, where the sender and receiver have some way of agreeing on the page to use. A typical scheme is to pick the page depending on the date -- for example, for messages encrypted on the 53rd day of the year, use page 53.

In addition to agreeing on a starting page, the sender and reciever need to agree on how to use that page. The oldest book cyphers were actually codes (in that words were transmitted, not letters) in which a search for the word to be sent was made by looking forward from the designated starting point.

So, if this sentence was being used, the message "sentence used twice" would be encoded as 4-7-12. If we have a code wheel, as used for classic Caesar cyphers, we could as easily use the consecutive letters on the page, starting from the first letter, as keys to set the code wheel. This is called a running key cypher.

ABCDEFGHIJKLMNOPQRSTUVWXYZ  Key = S
SRQPONMLKJIHGFEDCBAZYXWVUT            Plaintext = S
^                                    Cyphertext = A
ABCDEFGHIJKLMNOPQRSTUVWXYZ  Key = O
ONMLKJIHGFEDCBAZYXWVUTSRQP            Plaintext = E
^   ^                                Cyphertext = K
ABCDEFGHIJKLMNOPQRSTUVWXYZ  Key = I
IHGFEDCBAZYXWVUTSRQPONMLKJ            Plaintext = N
^            ^                       Cyphertext = V
ABCDEFGHIJKLMNOPQRSTUVWXYZ  Key = F
FEDCBAZYXWVUTSRQPONMLKJIHG            Plaintext = T
^                  ^                 Cyphertext = M

In modern running-key cyphers, we use an exclusive or on the bit patterns representing each character with the bit pattern of the key instead of using a code wheel. That is what Vernam's machine was the first to do.

A cryptanalyst who suspects that a book cypher is being used will, of course, begin by trying to figure out what book might be in use. This is why common bible editions are such a bad bet. The cryptanalyst will also need to work out how the pages of the book are being used.

Running key cyphers are subject to cryptanalytic attack if the key is not random. Knowing that the key is an English text, for example, and knowing that e is the most common letter, then English text encrypted with this key will encrypt e with e more frequently than any other encryption. As a result, we can use letter frequency distributions to attack such a code.

If the same page of the book is ever reused, whether for is a running key cypher or a classic word-based book code, frequency distributions of common plaintext letters or words will begin to reveal the structure of the text. So, book codes where you use one page of the book for every message sent in that day are weak. Furthermore, once you begin to discover properties of the encrypted text, you can also begin to discover properties of the text used as a key, and that helps you hunt up the right book in the library.

Running key cyphers where the key is never reused and is entirely random are unbreakable using cryptanalytic attacks. This leads naturally to the idea of printing random text in the book instead of fixed text, and destroying each page of the book as it is used. That is how a one-time pad works. Nowdays, one-time pads are distributed on media such as CD-RW disks instead of paper -- RW so that your crypto software can erase each segment of the key as it is used.

Randomness

Generating a one-time pad by hand is entirely possible. Just use a roulette wheel, dice or coin tosses to generate random numbers, and take down two identical copies of the sequence of numbers generated. Carbon paper can be used to avoid making clerical errors in the result.

Automatic generation of streams of random numbers is not easy! We usually build machines to have predictable behavior. Most genuinely effective hardware random number generators rely on measuring quantum phenomona, thermal noise or radioactive decay. On computers, we can also use arrival times of network packets and key presses as effectively random events, although we can only extract a few bits of randomness from each such event, since the average rate of events is extremely predictable -- the randomness is in the deviation of each event from the average rate.

It is very tempting to use algorithmic methods to generate random numbers. The problem with this, of course, is that the very fact that something is computed by an algorithm implies an entirely deterministic process. We call an algorithm that generates apparently random numbers a pseudo-random number generator, and we refer to the output as pseudo-random to emphasize that it is actually predictable.

Pseudo-Random Numbers

"The generation of random numbers is too important to be left to chance." This famous warning about the difficulty of designing pseudo-random number generators is attributed to Robert R. Coveyou of Oak Ridge National Laboratory. He's right.

Most pseudo-random number generators are not very good.

In general, a pseudo-random number generator with a finite internal state must generate a periodic sequence. For an n bit state, the period can be no longer than 2ⁿ. Unfortunately, for many algorithms, the period is much shorter. Thus, if the sequence of outputs is the vector V v_i = v_i+p, where p is the period of the generator.

We say that the seed for a pseudo-random number generator is the initial value v₀ from which the sequence is computed.

Linear congruential pseudorandom number generators are the most common family of generators, but even the best of these are a bad choice for cryptography.

v_i+1 = (a × v_i + b) mod m
For example:
v_i+1 = 65539 × v_i mod 2³¹ -- the infamous RANDU
v_i+1 = 16807 × v_i mod (2³²-1) -- Park and Miller's minimal standard

Bad selection of the constants a, b and m can lead to very bad generators. Such generators are unfortunately common (the worst offender is an ancient generator known as RANDU which, unfortunately, made it into the Unix/C world as rand. The good and bad linear congruential generators use essentially the same algorithm. All that differ are the "magic numbers" used. If you find such a generator being used, you can find essentially all of the research done on its performance by doing a web search on these numbers, since the combination of values is unlikely to be used anywhere else.

Nobody should ever use a linear congruential generator for cryptography. Cryptographically secure pseudo-random number generators exist, and they have significant applications. Note the following properties of an ideal pseudorandom number generator:

The input and output domains are identical, so the generation function is a mapping from that domain to itself.
Given the relation "maps to" under the pseudo-random number generator, and viewing this relation as a graph on the domain, there is exactly one cyclic path through this graph, and it visits every element of the domain.
Every contiguous snippet of this path passes all possible tests for randomness, until you see the entire path, at which point, it is obviously cyclic and therefore not at all random.

It is noteworthy that a good encryption function should have the exact same properties for any particular encryption key! If the encryption function does not map the entire domain to the entire domain, it is not useful. If the mapping creates multiple small cycles instead of one big cycle, then the job of the codebreaker is simplified. Therefore, good encryption functions ought to be closely related to good pseudorandom number generators!

The Most Obvious Application

If we take the output of a pseudorandom number generator and exclusive or it with our text to encrypt or decrypt it, then the seed used for pseudorandom number generation is our encryption key.

Note that if we use an encryption function as a pseudorandom number generator, we have now demonstrated a new way to use that function for encryption!

References

The Wikipedia is a good source:

For one-time pads, see: http://en.wikipedia.org/wiki/One-time_pad
For book cyphers, see: http://en.wikipedia.org/wiki/Book_cipher
For running-key cyphers, see: http://en.wikipedia.org/wiki/Running_key_cipher
For hardware random number generation, see: http://en.wikipedia.org/wiki/Hardware_random_number_generator
For pseudo-random number generation, see: http://en.wikipedia.org/wiki/Pseudorandom_number_generator
For cryptographically secure pseudo-random number generation, see: http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator