Table of Contents
4.6. Implementing Kruskal's Algorithm: The Union-Find Data Structure
- The basic idea behind the algorithm is that the graph in question has a fixed population of nodes
- And the graphs grows over time by inserting edges between certain pairs of nodes.
- Goal: Maintain a set of connected components of the graph being processed throughout the whole operation
- With this algorithm, we develop a data structure called the Union-Find structure
- The Union-Find data structure stores a representation of the components in a way that supports fast searching and updating
- This data structure is used to efficiently implement Kruskal's Algorithm
As each edge e = (u,w) is considered
Find the connected components containing v and w.
If these components are different:
Then there is no path from v to w, so e should be added to the structure
Elif they're the same:
Then there's a path from v to w, so e should be omitted.
The Problem
- Operations:
- Find(u) : returns name of set containing u
- It can be used to find if two nodes u and v are in the same set by simply checking if Find(u) = Finf(v)
- Goal implementation: O(log n)
- Union(A, B) : merges sets A and B into one set
- Goal implementation: O(log n)
- MakeUnionFind(S) : returns the Union-Find data structure on set S where all elements are in separate sets.
- Example of such a set: A connected component with no edges
- Goal Implementation: O(n), n = |S|
- The sets obtained will be the connected components of the graph
- So, for a given node u, Find(u) returns the name of the component containing u
- To add an edge (u,v) we first test if Find(u) = Find(v)
- If Find(u) ≠ Find(v), then Union(Find(u), Find(v)) merges the two components into one
- The Union-Find data structure maintains components of a graph only when we add edges, not when we delete them
- The “name of the set” returned by Union-Find will be arbitrary
A simple Data Structure for Union-Find
»»»»»»»»> Maintain an Array Component of size n that contains the name of the set currently containing each element
Let S = {1,…,n}
Component[s] = the name of set containing s
To implement MakeUnionFind(S) :
Set up an array and initialize it to Component[s] = s for all s in S
With this operation, Find(v) is done in O(1) time
But Union(A,B) can take O(n) time with this operation(–>We have to update the values of Component[s] for all elements in A and B)
To improve on this bound of Union(A,B):
Maintain the list of elements in each set
Choose the name for the union to be the name of one of the sets, so we only update Component[s] for s in the other set
If the other set is big, name the union to be the name of that set and update Component[s] for s in the smaller set
Improvements:
Maintain an Array Size of length n
Size[A] = the size of A
So when a Union(A,B) operation is needed, we use the name of the larger set for the union and update Component for only the smaller set
However, we still have a O(n) time even after the improvements
So, overall, Find(u) takes O(1) time,MakeUnionFind(A,B) takes O(n) time and any sequence of k Union operations takes at most O(klogk) time with this simple implementation of the algorithm.
A Better Implementation for Union-Find
- Instead of using an array, we now use a pointer-based implementation
- With this implementation:
We maintain a set S of size n
Unions keep the name of the largest set
A Union operation takes O(1) time
MakeUnionFind(S) takes O(n) time
Find(u) takes O(logn) time
Implementing Kruskal's Algorithm
T = {}
foreach (u ∈ V) make a set containing singleton u
for i = 1 to m(u,v) = ei
if (u and v are in different sets)T = T ∪ {ei}
merge the sets containing u and vreturn T
( The above algorithm, is from the class discussion)
Kruskal's algorithm implemented as above runs in O(mlogn) time.
This section was also interesting, although it contained some long proofs. I give it an 8/10