====== 4.6. Implementing Kruskal's Algorithm: The Union-Find Data Structure ====== \\ * The basic idea behind the algorithm is that the graph in question has a fixed population of nodes * And the graphs grows over time by inserting edges between certain pairs of nodes. * Goal: Maintain a set of connected components of the graph being processed throughout the whole operation * With this algorithm, we develop a data structure called the ** Union-Find ** structure * The ** Union-Find ** data structure stores a representation of the components in a way that supports fast searching and updating * This data structure is used to efficiently implement Kruskal's Algorithm\\ >>>>>>>>>>>>>>>>>>>>> As each edge e = (u,w) is considered\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Find the connected components containing //v// and //w//.\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If these components are different:\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>Then there is no path from //v// to //w//, so e should be added to the structure\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Elif they're the same:\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then there's a path from //v// to //w//, so e should be omitted.\\ \\ ==== The Problem ==== * Operations: * ** Find(u) **: returns name of set containing u * It can be used to find if two nodes //u// and //v// are in the same set by simply checking if Find(u) = Finf(v) * Goal implementation: O(log n) * ** Union(A, B) ** : merges sets A and B into one set * Goal implementation: O(log n) * ** MakeUnionFind(S) **: returns the Union-Find data structure on set S where all elements are in separate sets. * Example of such a set: A connected component with no edges * Goal Implementation: O(n), n = |S| * The sets obtained will be the connected components of the graph * So, for a given node u, Find(u) returns the name of the component containing u * To add an edge (u,v) we first test if Find(u) = Find(v) * If Find(u) ≠ Find(v), then Union(Find(u), Find(v)) merges the two components into one * The ** Union-Find ** data structure maintains components of a graph only when we add edges, not when we delete them * The "name of the set" returned by Union-Find will be arbitrary \\ ==== A simple Data Structure for Union-Find ==== >>>>>>>>>>>>>>>>> Maintain an Array ** Component ** of size n that contains the name of the set currently containing each element\\ >>>>>>>>>>>>>>>>> Let S = {1,...,n}\\ >>>>>>>>>>>>>>>>> Component[s] = the name of set containing s\\ >>>>>>>>>>>>>>>>> To implement ** MakeUnionFind(S) **:\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Set up an array and initialize it to Component[s] = s for all s in S\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> With this operation, Find(v) is done in O(1) time\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But Union(A,B) can take O(n) time with this operation(-->We have to update the values of Component[s] for all elements in A and B)\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To improve on this bound of Union(A,B):\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maintain the list of elements in each set\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Choose the name for the union to be the name of one of the sets, so we only update Component[s] for s in the other set\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If the other set is big, name the union to be the name of that set and update Component[s] for s in the smaller set\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Improvements:\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maintain an Array Size of length n\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Size[A] = the size of A\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So when a Union(A,B) operation is needed, we use the name of the larger set for the union and update Component for only the smaller set\\ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, we still have a O(n) time even after the improvements\\ >>>>>>>>>>>>>>>>> So, overall, Find(u) takes O(1) time,MakeUnionFind(A,B) takes O(n) time and any sequence of k Union operations takes at most O(klogk) time with this simple implementation of the algorithm.\\ \\ ==== A Better Implementation for Union-Find ==== \\ * Instead of using an array, we now use a pointer-based implementation * With this implementation:\\ >>>>>>>>>>>>>>>>> We maintain a set S of size n \\ >>>>>>>>>>>>>>>>> Unions keep the name of the largest set \\ >>>>>>>>>>>>>>>>> A Union operation takes O(1) time \\ >>>>>>>>>>>>>>>>> MakeUnionFind(S) takes O(n) time \\ >>>>>>>>>>>>>>>>> Find(u) takes O(logn) time \\ ==== Implementing Kruskal's Algorithm ==== >>>>>>>>>>>>>>>>> T = {} >>>>>>>>>>>>>>>>> foreach (u ∈ V) make a set containing singleton u >>>>>>>>>>>>>>>>> for i = 1 to m >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (u,v) = ei >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if (u and v are in different sets) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> T = T ∪ {ei} >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> merge the sets containing u and v >>>>>>>>>>>>>>>>> return T ( The above algorithm, is from the class discussion) \\ \\ Kruskal's algorithm implemented as above runs in O(mlogn) time. \\ \\ This section was also interesting, although it contained some long proofs. I give it an 8/10