This is an old revision of the document!
Entry Six
Chapter 4.7
Clustering
How do minimum spanning trees play a role in clustering???
Clustering is when you organize a collection of objects into coherent groups. How might we organize these groups? By how similar they are. We use a distance function on the objects with the assumption that those closer together are more similar to each other.
The Problem: Given a set of objects and a distance function on the objects, divide the objects into groups so that objects in the same group are close and ones in different groups are far apart.
Clusterings of Maximum Spacing- Given a set of objects each pair has a numeric distance and that distance is greater than 0 for distinct objects and the distances are symmetric. Given the number of groups k, we want to divide the set of objects into k groups. A k-clustering is a partition of the set into k non-empty sets. The spacing of a clustering is the minimum distance between nay pair of points in different clusters. Our goals is to maximize the minimum spacing. We want points in different clusters to be as far apart as possible from each other.
The Algorithm:
Grow a graph on the vertex set U edge by edge with connected components corresponding to clusters.
- draw an edge between the closest pair of points
- add edge between next closest pair of points
- continue in order of increasing distance
- if two points pi and pj are already in the same cluster, don't add an edge between them (b/c it does not change the set of components and so we never create a cycle)
The graph is a union of trees. The algorithm is just like Kruskal's but instead of continuing until we have one connected component, we stop at k connected components. We stop before it adds k-1 edges. We are creating a minimum spanning tree and deleting the longest edges.