In this homework, we consider the following task. We consider a sequence
of events, where each event is either an IP address or a query. If the
event is a query, we should compute the most frequent IP address among the
previous 100,000 IP addresses seen. In our simulation of these events,
we simply take an IP address to mean an integer between 0 and 10,000 (but
please bear in mind that this is not what IP addresses actually are).
The program here implements two different algorithms for
storing the last 100,000 IP addresses and answering such queries. The first
algorithm uses a hash table and a queue for storing the last 100,000 IP
addresses seen. The second algorithm uses a queue as above, along with
a hash table and adaptable priority queue. (Indicentally, understanding this
second algorithm will tell you why location aware priority queues are useful.)
This homework is about understanding and comparing these two implementations.
You need to do the following:
- Run the program for different choices of the variable f in the
main method, and record the running time in milliseconds of the two algorithms.
The program prints the two running times. (Actually, the two algorithms
run on hopefully similar but not the same input, but never mind.) The variable
f controls the expected fraction of the query events. Right now,
f is set to 0.1, which means that in the simulation, we should
expect roughly 1 out of every 10 events to be a query event. Be sure to
run the program a few times for each choice of f, so that you
report typical times. Exactly which choices of f to try is left to
you, but keep in mind that the goal is to compare the relative performance
of the two algorithms.
- Do the same as above, but instead of varying f, fix it at
0.1 and vary the number of events, which is currently 1 million, within
a the range from 100,000 to 2 million. Record the running times of both
algorithms for each choice of the number of events.
- Explain the observed relative performance of the two algorithms over the
range of the parameters for which you made the measurements. For
this explanation, you can appeal to your understanding of how the data
structures like hash tables and priority queues work. You can also appeal
to worst case and typical case running times of various operations.
Submit your observations (this should include the record of the running times)
and your explanations as a text file into a dropbox in ICON called Homework10.
This assignment is due by 11:59 pm on Wednesday, Dec 7.