**__Chapter 4.7: Clustering__**

Clustering arises whenever one has a collection of objects - say, a set of photographs, documents, or microorganisms - that one is trying to classify or organize into coherent groups. The clustering problem seeks to divide the objects into groups so that, intuitively, objects within the same group are "close" and objects in different groups are "far apart."

The //spacing// of a k-clustering is the minimum distance between any pair of points lying in different clusters. We seek the k-clustering with the maximum possible spacing.

**Designing the Algorithm**

Add edges between pairs of points in order of increasing distance, but keep track only of the connected components of the graph: an edge whose endpoints already lie in the same component is ignored. Each time we add an edge that spans two distinct components, we merge the two corresponding clusters. This procedure is called //single-link clustering//. It is exactly what Kruskal's algorithm does; the only difference is that we seek a k-clustering, so we stop the procedure once we obtain k connected components.

**Analyzing the Algorithm**

The components C1, C2, ..., Ck formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.

Interesting section, short, and to the point. 9/10.
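The single-link procedure described above can be sketched in Python. This is a minimal illustration, not the book's implementation: the function name is hypothetical, pairwise Euclidean distance is assumed, and the union-find is deliberately simple. It runs Kruskal's merging loop and stops when k components remain; the length of the next edge that would cross two clusters is the spacing.

```python
# Sketch of single-link clustering via Kruskal's algorithm.
# Assumptions: points are coordinate tuples, distance is Euclidean,
# and the helper name `single_link_clustering` is made up for this example.
import math
from itertools import combinations

def single_link_clustering(points, k):
    """Group `points` into k clusters of maximum spacing.

    Returns (clusters, spacing), where spacing is the minimum
    distance between points lying in different clusters.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, sorted so the closest pairs come first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    components = n
    spacing = float("inf")  # k = 1 leaves everything in one cluster
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # edge inside a component: ignore it
        if components == k:
            # This edge would merge two of the k clusters, so its
            # length is the spacing of the k-clustering.
            spacing = d
            break
        parent[ri] = rj  # merge the two clusters
        components -= 1

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values()), spacing
```

For example, on the four points (0,0), (0,1), (5,0), (5,1) with k = 2, the two unit-distance pairs are merged first, yielding clusters {0,1} and {2,3} with spacing 5.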