Homework 9, due Monday, 4/13, by midnight via ICON


Introduction.
In this homework, you will use Python's built-in dictionary data structure to perform a fairly simple task. The Think Python book has a good chapter on dictionaries (Chapter 11, pages 103-109). Reading at least sections 11.1-11.4 of that chapter before you get started will help you considerably.

Computing a word histogram. For this homework, I would like you to read a text file and count the number of occurrences of each word in it. You can model your solution on the example on page 105 of the Think Python book.
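The core of the counting step can be sketched as follows. This is a minimal sketch in the spirit of the book's example, not its exact code: it assumes the words have already been collected into a list, and the function name histogram and the use of the dictionary method get() are our choices here.

```python
def histogram(words):
    """Count how many times each item in the list `words` occurs.

    Returns a dictionary mapping each word to its count.
    """
    counts = {}
    for w in words:
        # get() returns 0 when w is not yet a key, so a new word starts at 1
        counts[w] = counts.get(w, 0) + 1
    return counts


# Example usage
h = histogram(["the", "cat", "saw", "the", "dog"])
print(h["the"])   # 2
print(h["cat"])   # 1
```

Using get() with a default of 0 avoids a separate "is this word already in the dictionary?" test; an explicit if/else on `w in counts` works just as well.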

Here are a few issues not directly related to counting word occurrences:

  1. What is a word? Let us define a word to be a contiguous sequence of upper or lower case letters. According to this simple definition, hyphenated words such as "cross-reference" will be two words, "cross" and "reference". Also, for the purposes of counting, upper case and lower case letters should be treated as being identical and so "the" and "The" should be counted as the same word.

  2. Reading words from the file. You can start with the following function, which reads words from a file. The string method split() splits a line into "words", using whitespace (blanks, tabs, and newlines) as delimiters. We will show you how to use other characters as delimiters. Given our definition of a word, any character that is not an upper-case or lower-case letter should be considered a delimiter.
    # Reads a sequence of words, separated by whitespace, from a file
    # object that is passed in as an argument to the function. The
    # sequence is stored in a list and returned.
    def getList(fileref):
        l = []

        # readlines() returns the contents of the file as a list of
        # strings, with each line stored in a single string
        lines = fileref.readlines()
        for s in lines:
            words = s.split()
            for w in words:
                l.append(w)

        return l
    
    To call the function above, open the file chosen with the pickAFile() function that we have seen before and pass the resulting file object to getList(). Here is an example:
    def example():
        fileref = open(pickAFile(), 'r')
        wordList = getList(fileref)
        fileref.close()
    
    At the end of the function example(), the list wordList is a list of all words in the file.
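Items 1 and 2 together call for a reader in which every non-letter, not just whitespace, acts as a delimiter, and in which case is ignored. Below is a minimal sketch of one way to do that, assuming standard Python 3 and treating only the ASCII letters a-z and A-Z as letters. The name getWords is hypothetical, and io.StringIO stands in for a file opened with pickAFile(). The idea is to turn each non-letter into a blank, lower-case each letter, and then let split() do the rest.

```python
import io
import string

def getWords(fileref):
    """Return a list of lower-case words read from an open file object.

    A word is a maximal run of ASCII letters; every other character
    (digits, punctuation, hyphens, whitespace) acts as a delimiter.
    """
    words = []
    for line in fileref.readlines():
        # Build a copy of the line in which every non-letter becomes a blank
        chars = []
        for ch in line:
            if ch in string.ascii_letters:
                chars.append(ch.lower())   # fold case so "The" == "the"
            else:
                chars.append(" ")
        # split() now separates the words on the blanks inserted above
        for w in "".join(chars).split():
            words.append(w)
    return words

# Example: an in-memory file standing in for open(pickAFile(), 'r')
sample = io.StringIO("Cross-reference the THE\nit's 42 times.")
print(getWords(sample))
# ['cross', 'reference', 'the', 'the', 'it', 's', 'times']
```

Note that under this definition of a word, "it's" yields the two words "it" and "s", and digits disappear entirely.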

Things you should do.
Your task is to write a program that reads a file, extracts the words from it, and counts the number of occurrences of each word. The output should be a table of the following form:

    word1 frequency
    word2 frequency
    ...   ...
    ...   ...
    ...   ...
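Putting the pieces together, one possible shape for such a program is sketched below. It is not the required solution, and several choices are assumptions: it targets Python 3, swaps in the re module's findall() to pull out runs of letters instead of a split()-based reader, sorts the table alphabetically (the assignment does not specify an order), and uses the hypothetical function names countWords, formatTable, and main. With pickAFile() available, you would call main(open(pickAFile(), 'r')) instead of passing an io.StringIO object.

```python
import io
import re

def countWords(fileref):
    """Return a dictionary mapping each lower-case word in the file to its count."""
    counts = {}
    for line in fileref.readlines():
        # After lower(), [a-z]+ matches maximal runs of ASCII letters;
        # every other character acts as a delimiter
        for w in re.findall("[a-z]+", line.lower()):
            counts[w] = counts.get(w, 0) + 1
    return counts

def formatTable(counts):
    """Return the table as a list of 'word frequency' lines, sorted by word."""
    rows = []
    for w in sorted(counts):
        rows.append("%s %d" % (w, counts[w]))
    return rows

def main(fileref):
    for row in formatTable(countWords(fileref)):
        print(row)

# Example with an in-memory file standing in for open(pickAFile(), 'r')
main(io.StringIO("The cat saw the dog.\nThe dog ran."))
# cat 1
# dog 2
# ran 1
# saw 1
# the 3
```

Keeping the counting and the formatting in separate functions makes each easy to test on its own before wiring them to a real file.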
    

What to submit.
A file called hw9.py containing your program, and a README file.