4.6. Implementing Kruskal's Algorithm: The Union-Find Data Structure

The basic idea behind the algorithm is that the graph in question has a fixed population of nodes
And the graphs grows over time by inserting edges between certain pairs of nodes.
Goal: Maintain a set of connected components of the graph being processed throughout the whole operation
With this algorithm, we develop a data structure called the Union-Find structure
The Union-Find data structure stores a representation of the components in a way that supports fast searching and updating
This data structure is used to efficiently implement Kruskal's Algorithm

As each edge e = (u,w) is considered

Find the connected components containing v and w.

If these components are different:

Then there is no path from v to w, so e should be added to the structure

Elif they're the same:

Then there's a path from v to w, so e should be omitted.

The Problem

Operations:
- Find(u) : returns name of set containing u
  - It can be used to find if two nodes u and v are in the same set by simply checking if Find(u) = Finf(v)
  - Goal implementation: O(log n)
- Union(A, B) : merges sets A and B into one set
  - Goal implementation: O(log n)
- MakeUnionFind(S) : returns the Union-Find data structure on set S where all elements are in separate sets.
  - Example of such a set: A connected component with no edges
  - Goal Implementation: O(n), n = |S|
The sets obtained will be the connected components of the graph
So, for a given node u, Find(u) returns the name of the component containing u
To add an edge (u,v) we first test if Find(u) = Find(v)
If Find(u) ≠ Find(v), then Union(Find(u), Find(v)) merges the two components into one
The Union-Find data structure maintains components of a graph only when we add edges, not when we delete them
The “name of the set” returned by Union-Find will be arbitrary

A simple Data Structure for Union-Find

»»»»»»»»> Maintain an Array Component of size n that contains the name of the set currently containing each element

Let S = {1,…,n}

Component[s] = the name of set containing s

To implement MakeUnionFind(S) :

Set up an array and initialize it to Component[s] = s for all s in S

With this operation, Find(v) is done in O(1) time

But Union(A,B) can take O(n) time with this operation(–>We have to update the values of Component[s] for all elements in A and B)

To improve on this bound of Union(A,B):

Maintain the list of elements in each set

Choose the name for the union to be the name of one of the sets, so we only update Component[s] for s in the other set

If the other set is big, name the union to be the name of that set and update Component[s] for s in the smaller set

Improvements:

Maintain an Array Size of length n

Size[A] = the size of A

So when a Union(A,B) operation is needed, we use the name of the larger set for the union and update Component for only the smaller set

However, we still have a O(n) time even after the improvements

So, overall, Find(u) takes O(1) time,MakeUnionFind(A,B) takes O(n) time and any sequence of k Union operations takes at most O(klogk) time with this simple implementation of the algorithm.

A Better Implementation for Union-Find

Instead of using an array, we now use a pointer-based implementation
With this implementation:

We maintain a set S of size n

Unions keep the name of the largest set

A Union operation takes O(1) time

MakeUnionFind(S) takes O(n) time

Find(u) takes O(logn) time

Implementing Kruskal's Algorithm

T = {}
foreach (u ∈ V) make a set containing singleton u
for i = 1 to m

(u,v) = ei
if (u and v are in different sets)

T = T ∪ {ei}
merge the sets containing u and v

return T

( The above algorithm, is from the class discussion)

Kruskal's algorithm implemented as above runs in O(mlogn) time.

This section was also interesting, although it contained some long proofs. I give it an 8/10

Table of Contents

4.6. Implementing Kruskal's Algorithm: The Union-Find Data Structure

The Problem

A simple Data Structure for Union-Find

A Better Implementation for Union-Find

Implementing Kruskal's Algorithm