Part One

The task here is to add a method mst() to this undirected graph class that computes the minimum spanning tree of the graph and returns the edges in this spanning tree as a linked list of its edges. You should implement Kruskal's algorithm to implement this method. Kruskal's algorithm is described in Section 9.5.2 of the text.

Kruskal's algorithm first sorts the edges by increasing cost. For this, you should simply invoke a library sort method and not implement one yourself.

You also need a Union-Find data structure for maintaining the connected components. For this, you should implement the Union-by-Size scheme from Section 8.4. All of this was discussed extensively in class on April 26.

Feel free to add other required fields and methods to the Vertex, Edge, and Graph classes. Part One is worth 10 points.

Part Two

Here, you will apply the mst method to the document similarity setting that we looked at in homework 7. We create a graph which has a vertex for each of the ten documents. Between every pair of vertices, there is an edge whose cost is the dissimilarity between the two corresponding documents. The dissimilarity is defined to be 1 minus the similarity. The code for creating this graph is given here , along with a copy of the undirected graph class. Once you have finished Part One, you should incorporate the modifications into this undirected graph class.

Run the mst method on the graph constructed here. From the minimum spanning tree, remove the four costliest edges. This will yield four components. Output the names of the files in each component.

What we have described here is a well-known way of clustering using the minimum spanning tree. The choice of four made above is somewhat arbitrary.

Update 1: You can write your own sorting method if that is more convenient.

Update 2: If we remove four edges in the minimum spanning tree, we get 5 components, not four.

Update 3: After obtaining the min-spanning tree for the 10-node graph on the files, you can do the rest of the stuff manually, just to save time. So print the edges of the min-spanning tree, and just report the 5 components by manual inspection. You can submit a text file with these 5 components.

Update 4: Using a library sort on an array of objects belonging to a class we define.