Introduction to Programming Machine Problem V from Spring 1993

Background

Arbitaray letter substitution cyphers can be automatically broken by taking advantage of the known letter frequencies of the language being encrypted. The frequencies for the English language, based on a sample of a few hundred thousand letters, are given in the file freq

Assignment

Your assignment is to write a program that uses this data to crack the code for arbitrary letter substitution cyphers. The input data you will be given will be encrypted so that each letter is replaced with some other letter. Distinctions between upper and lower case letters will be preserved, and punctuation and numerals will not be scrambled.

In fact, the encrypted data will be constructed using MP3 to perform the letter substitutions.

The following basic algorithm will work for this problem:

  1. First, read the cyphertext and measure the letter frequencies, as in MP4 (borrow from ~dwjones/public/mp4.p if you want).

  2. Second, sort the letters of the alphabet in the order of the letter frequencies you measured.

  3. Third, construct a key, in the style of the keys used for MP3, from your sorted data and the data in the file freq.

  4. Finally, use the algorithm from MP3, plus the key you have computed, to re-read and decrypt the input. (borrow code from the published solution, if you want, but this may make things unnecessarily difficult).

Your program should put a two line title on the output data to indicate the substitution it used. These two lines should be the same as the key file that could be used with MP3 to decrypt the data. Assuming that the original input was encrypted using MP2 (ROT13), the title might be:

RGNBVAFUEQYHZJPTSLCOXIWKMDrgnbvafueqyhzjptslcoxiwkmd
ETAOINSHRDLUMWCGFYPBKVJXZQetaoinshrdlumwcgfypbkvjxzq
Note that the second line of this title contains the letters in the frequency order shown above, while the first line shows the letters in the frequency order observed in the input data.

Note that, unlike MP4, your solution to MP5 is unlikely to completely decrypt the sample data, but it is likely to get enough right that you can figure out the rest by hand.

Grading Criteria

This problem is worth 5% of your grade. If you get your assignment to work perfectly, you will earn only half of credit. The other half will depend on the style of your code and commentary.