Introduction: Greedy Algorithms
An algorithm is said to be greedy if it builds up a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion. When a greedy algorithm succeeds in solving a nontrivial problem optimally, it typically implies something interesting and useful about the structure of the problem itself. There are two basic methods for proving that a greedy algorithm produces an optimal solution to a problem: greedy stays ahead and the exchange argument. Applications of greedy algorithms include shortest paths in a graph, the Minimum Spanning Tree Problem, the construction of Huffman codes for performing data compression, and the Minimum-Cost Arborescence Problem, a more complex application.
Section 4.1: Interval Scheduling: The Greedy Algorithm Stays Ahead
Set Up: Consider a set of requests {1, 2, …, n}, with the ith request corresponding to an interval of time starting at s(i) and finishing at f(i). A subset of the requests is compatible if no two of them overlap in time, and the goal is to accept as large a compatible subset as possible. Compatible sets of maximum size will be called optimal.
Designing a Greedy Algorithm
The basic idea in a greedy algorithm for interval scheduling is to use a simple rule to select a first request i<sub>1</sub>. Once a request i<sub>1</sub> is accepted, we reject all requests that are not compatible with i<sub>1</sub>. We then select the next request i<sub>2</sub> to be accepted, and again reject all requests that are not compatible with i<sub>2</sub>. We continue in this fashion until we run out of requests.
Here are the natural selection rules we might try, in search of one that yields an optimal solution:
1. Select the available request that starts earliest –> Not an optimal solution
2. Accept the request that requires the smallest interval of time, or the request for which f(i) - s(i) is as small as possible –> Not an optimal solution, but somewhat better than 1.
3. For each request, count the number of other requests that are not compatible with it, and accept the request that has the fewest incompatible requests –> Also not an optimal solution; counterexamples exist, though they are harder to find.
4. Accept first the request that finishes first, or the request i for which f(i) is as small as possible –> This leads to an optimal solution. The general algorithm is shown below:
Initially let R be the set of all requests, and let A be empty
While R is not yet empty
Choose a request i ∈ R that has the smallest finishing time
Add request i to A
Delete all requests from R that are not compatible with request i
EndWhile
Return the set A as the set of accepted requests
Analyzing the Algorithm
For the above algorithm we can prove optimality by proving |A| = |O| where A is a compatible set of requests and O is an optimal set of intervals. This shows that A contains the same number of intervals as O and hence is also an optimal solution.
Theorem 4.3 The greedy algorithm returns an optimal set A.
Implementation and Running Time
We can make our algorithm run in time O(n log n) as follows:
- Begin by sorting the n requests in order of finishing time and labeling them in this order; that is, we will assume that f(i) ≤ f(j) when i < j. This takes time O(n log n).
- In an additional O(n) time, we construct an array S[1… n] with the property that S[i] contains the value s(i). We now select requests by processing the intervals in order of increasing f(i). We always select the first interval; we then iterate through the intervals in order until reaching the first interval j for which s(j) ≥ f(1); we then select this one as well.
- So we implement the greedy algorithm analyzed above in one pass through the intervals, spending constant time per interval. Thus this part of the algorithm takes time O(n).
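The sorted one-pass implementation described above can be sketched in Python as follows; the function name `schedule_intervals` and the (s, f) pair representation are illustrative choices, not from the text:

```python
def schedule_intervals(requests):
    """Greedy interval scheduling: sort by finish time, then one pass.

    `requests` is a list of (s, f) pairs; returns the accepted subset.
    Runs in O(n log n) time overall, dominated by the sort.
    """
    accepted = []
    last_finish = float("-inf")
    # Process intervals in order of increasing finishing time f(i).
    for s, f in sorted(requests, key=lambda r: r[1]):
        # An interval is compatible if it starts no earlier than the
        # finish of the last accepted interval (shared endpoints allowed).
        if s >= last_finish:
            accepted.append((s, f))
            last_finish = f
    return accepted
```

For example, `schedule_intervals([(0, 6), (1, 4), (3, 5), (3, 8), (4, 7), (5, 9), (6, 10), (8, 11)])` accepts the three intervals `(1, 4)`, `(4, 7)`, and `(8, 11)`.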
Extensions
The Interval Scheduling Problem we considered here is a quite simple scheduling problem. There are further complications that could arise in practical settings such as:
- We assumed that all requests were known to the scheduling algorithm when it was choosing the compatible schedule, but online algorithms must make decisions as time proceeds, without knowledge of future input.
- Our goal was to maximize the number of satisfied requests, but there could be situations in which each request has a different value or weight to us (one thing is more important to have than another), and we would want to maximize total weight rather than the count.
A Related Problem: Scheduling All Intervals
The Problem
Designing the Algorithm
Analyzing the Algorithm
Sort the intervals by their start times, breaking ties arbitrarily
Let I<sub>1</sub>, I<sub>2</sub>, ..., I<sub>n</sub> denote the intervals in this order
For j = 1, 2, 3, ..., n
For each interval I<sub>i</sub> that precedes I<sub>j</sub> in sorted order and overlaps it
Exclude the label of I<sub>i</sub> from consideration for I<sub>j</sub>
Endfor
If there is any label from {1, 2, ..., d} that has not been excluded then
Assign a nonexcluded label to I<sub>j</sub>
Else
Leave I<sub>j</sub> unlabeled
Endif
Endfor
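A direct rendering of the labeling algorithm above might look like the Python sketch below; the function name `partition_intervals` and the (s, f) pair representation are assumptions for illustration. It recomputes the excluded set naively, so it runs in O(n²) rather than the O(n log n) achievable with a priority queue:

```python
def partition_intervals(intervals):
    """Greedy interval partitioning: assign each interval a label (resource).

    `intervals` is a list of (s, f) pairs.  Returns one label per interval,
    in order of increasing start time, using a number of distinct labels
    equal to the depth of the set of intervals.
    """
    # Sort interval indices by start time, breaking ties arbitrarily.
    order = sorted(range(len(intervals)), key=lambda j: intervals[j][0])
    labels = {}  # index -> assigned label
    for j in order:
        sj, fj = intervals[j]
        # Earlier intervals i have s(i) <= s(j), so i overlaps j
        # exactly when f(i) > s(j); exclude their labels for j.
        excluded = {labels[i] for i in labels if intervals[i][1] > sj}
        # Assign the smallest label not excluded.
        label = 1
        while label in excluded:
            label += 1
        labels[j] = label
    return [labels[j] for j in order]
```

On `[(0, 3), (2, 5), (4, 7), (1, 6)]`, whose depth is 3, this uses exactly the labels 1, 2, and 3, matching Theorem 4.6.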
Theorem 4.6 The greedy algorithm above schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed.
Section 4.2: Scheduling to Minimize Lateness: An Exchange Argument
The Problem
Designing the Algorithm
Theorem 4.7 There is an optimal schedule with no idle time.
Theorem 4.8 All schedules with no inversions and no idle time have the same maximum lateness.
Theorem 4.9 There is an optimal schedule that has no inversions and no idle time.
Theorem 4.10 The schedule A produced by the greedy algorithm has optimal maximum lateness L.
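Although the pseudocode for this section is not reproduced here, the greedy rule behind Theorems 4.7-4.10 is Earliest Deadline First: order the jobs by deadline and run them back to back with no idle time. A minimal sketch, where the function name and the (t, d) job representation (processing time t, deadline d) are illustrative:

```python
def min_max_lateness(jobs):
    """Earliest-Deadline-First scheduling to minimize maximum lateness.

    `jobs` is a list of (t, d) pairs.  Returns (schedule, max_lateness),
    where schedule lists (start, finish, deadline) triples in the order
    the jobs run, starting at time 0 with no idle time.
    """
    schedule = []
    finish = 0
    max_lateness = 0
    for t, d in sorted(jobs, key=lambda job: job[1]):  # sort by deadline d
        start, finish = finish, finish + t
        schedule.append((start, finish, d))
        # A job's lateness is max(0, finish - deadline).
        max_lateness = max(max_lateness, finish - d)
    return schedule, max_lateness
```

For instance, jobs `[(3, 4), (2, 2)]` are run in deadline order (2, 2) then (3, 4), finishing at times 2 and 5, for an optimal maximum lateness of 1.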
Extensions
Section 4.3: Optimal Caching: A More Complex Exchange Argument
Section 4.4: Shortest Paths in a Graph
The Problem
Designing the Algorithm
Dijkstra’s Algorithm (G, //l//)
Let S be the set of explored nodes
For each u ∈ S, we store a distance d(u)
Initially S = {s} and d(s) = 0
While S ≠ V
Select a node v ∉ S with at least one edge from S for which
d’(v) = min<sub>e = (u,v):u∈S</sub> d(u) + //l//<sub>e</sub> is as small as possible
Add v to S and define d(v) = d’(v)
EndWhile
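A standard way to realize the priority-queue operations of Theorem 4.15 is a binary heap. The Python sketch below (the function name and adjacency-list representation are illustrative) uses lazy deletion in place of ChangeKey, which keeps the same asymptotic bound:

```python
import heapq

def dijkstra(graph, s):
    """Dijkstra's Algorithm with a binary-heap priority queue.

    `graph` maps each node to a list of (neighbor, length) pairs with
    nonnegative lengths.  Returns d(u) for every node u reachable from s.
    """
    d = {}          # explored set S, with final distances d(u)
    pq = [(0, s)]   # heap of candidate distances d'(v)
    while pq:
        dist, v = heapq.heappop(pq)
        if v in d:          # stale entry: v was already added to S
            continue
        d[v] = dist         # add v to S and define d(v) = d'(v)
        for w, length in graph.get(v, []):
            if w not in d:
                heapq.heappush(pq, (dist + length, w))
    return d
```

On `{'s': [('a', 1), ('b', 4)], 'a': [('b', 2), ('c', 6)], 'b': [('c', 3)], 'c': []}`, starting from `'s'`, this yields distances 0, 1, 3, and 6 to s, a, b, and c respectively.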
Analyzing the Algorithm
Theorem 4.14 Consider the set S at any point in the algorithm’s execution. For each u ∈ S, the path P<sub>u</sub> is a shortest s-u path.
Implementation and Running Time
Theorem 4.15 Using a priority queue, Dijkstra’s Algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations.
