Chapter 4

Greedy Algorithm: builds up a solution in small steps, making a myopic decision at each step to optimize some underlying criterion.

4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead

  • Interval Scheduling Problem: set of requests, where the ith request corresponds to an interval of time starting at s(i) and finishing at f(i)
    • Subset of requests is compatible if no two of them overlap in time; we want the largest compatible subset → optimal

Designing a Greedy Algorithm

Potential Designs:

  1. Always Select the Available Request that Starts Earliest: not optimal, since a very long request that starts earliest can block many other requests
  2. Accept the Request that Requires the Smallest Interval of Time: accepting a short interval in the middle may prevent the acceptance of two other requests
  3. For Each Request, Count the Number of Requests That Are Not Compatible: accept the request that has the fewest noncompatible requests; more complicated, and it still fails on some instances
  4. Accept First the Request That Finishes First: maximize the time left to satisfy other requests; this is the rule that works (a sketch follows this list)
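
A minimal Python sketch of rule 4, assuming each request is a (start, finish) pair; the function and variable names are my own, not the book's:

```python
def interval_schedule(requests):
    """Greedily accept requests in order of increasing finish time."""
    accepted = []
    last_finish = float("-inf")
    for s, f in sorted(requests, key=lambda r: r[1]):  # sort by f(i)
        if s >= last_finish:         # compatible with everything accepted so far
            accepted.append((s, f))
            last_finish = f
    return accepted
```

The sort dominates the work, matching the O(n log n) bound derived below.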

Analyzing the Algorithm

R: the set of requests that we have neither accepted nor rejected yet
A: the set of accepted requests
O: an optimal set of intervals

A is a compatible set of requests

  • Must show that A contains the same number of intervals as O (not necessarily the same set)

Proofs:

  • We want to show that the greedy algorithm is doing better in a step-by-step fashion
  • i1,…,ik - set of requests in A in the order they were added to A
  • j1,…,jm - set of requests in O ordered in the natural left-to-right order of the corresponding intervals (start to finish points)

For all indices r ≤ k, we have f(ir) ≤ f(jr)

  • r = 1: the greedy algorithm selects the request i1 with minimum finish time, so f(i1) ≤ f(j1)
  • for r > 1: assume the induction hypothesis is true for r-1, prove it for r; by the induction hypothesis, we can assume that f(ir-1) ≤ f(jr-1). Since O is compatible, f(jr-1) ≤ s(jr), so jr is still available when the greedy algorithm makes its rth selection; the algorithm therefore always has the option of choosing jr rather than a later-finishing interval, giving f(ir) ≤ f(jr)
  • For each r, the rth interval it selects finishes at least as soon as the rth interval in O

The greedy algorithm returns an optimal set A

  • Proof by contradiction
  • If A is not optimal, O must have more requests: m > k, so there is a request jk+1 in O. This request starts after jk ends, and hence after ik ends (since f(ik) ≤ f(jk)). So after ik is accepted, R still contains jk+1, but the greedy algorithm stops only when R is empty: contradiction

Implementation and Running Time:

  • Algorithm Run Time: O(n log n)
    • sorting the n requests in order of finishing time: O(n log n)
    • constructing an array of start times: O(n)
    • selecting requests in order of increasing f(i): O(n)

Extensions

  • Requestors may leave if the scheduler waits too long
  • Online Algorithms: make decisions as time proceeds, without knowledge of future input
  • Each request carries a certain value/weight

A Related Problem: Scheduling All Intervals

Many identical resources available: goal is to schedule all the requests using as few resources as possible

  • Interval Partitioning Problem
    • e.g., classroom scheduling
  • A solution that uses k resources = a rearrangement of the requests into k rows of nonoverlapping intervals

Proof: the depth of a set of intervals is the maximum number of intervals that pass over any single point on the timeline

  • No solution could use a number of resources smaller than the depth

Designing the Algorithm

  • d = depth of the set of intervals

  • algorithm used is a simple one-pass greedy strategy, ordering by starting times

Analyzing the Algorithm

  • If you have d labels at your disposal, then as you sweep through the intervals from left to right, assigning an available label to each interval you encounter, you can never reach a point where all the labels are currently in use; a sketch follows
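
A hedged Python sketch of this left-to-right sweep; the min-heap of (finish time, label) pairs is an implementation convenience of mine for tracking which labels are free, not the book's presentation:

```python
import heapq

def partition_intervals(intervals):
    """Assign a label to each (start, finish) interval; overlapping ones differ."""
    free_at = []                      # min-heap of (finish_time, label)
    assignment = []                   # (start, finish, label) triples
    next_label = 0
    for s, f in sorted(intervals):    # sweep in order of starting times
        if free_at and free_at[0][0] <= s:
            _, lab = heapq.heappop(free_at)   # reuse a label that has freed up
        else:
            lab = next_label                  # all labels in use: open a new one
            next_label += 1
        assignment.append((s, f, lab))
        heapq.heappush(free_at, (f, lab))
    return assignment
```

A new label is opened only when every current label is in use at time s, so the number of labels never exceeds the depth d.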

Personal Thoughts

We reviewed many of these concepts in class before Feb Break, so the section made a lot of sense. However, because there are a lot of symbols and equations, it was harder to understand than when it was taught in class. Overall, the concepts were pretty clear though. Readability: 6.5 Interesting: 7

4.2 Scheduling to Minimize Lateness: An Exchange Argument

The Problem

  • Single resource, set of n requests to use the resource for an interval
  • Resource is available at start time s
  • Request i has a deadline di and requires a contiguous time interval of length ti
  • Different requests must be assigned nonoverlapping intervals
  • Goal: satisfy each request, but are allowed to let certain requests run late – want the smallest maximum lateness
  • Request i is late if it misses a deadline: f(i) > di
  • Lateness of request i: li = f(i) - di

Designing the Algorithm

  • Sort the jobs in increasing order of their deadlines di and schedule them in this order (Earliest Deadline First)
  • Jobs with earlier deadlines get completed first (a sketch follows this list)
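
A short Python sketch of this Earliest Deadline First rule, assuming each job is given as a (deadline, length) pair and the resource is free at time s (the names are mine):

```python
def min_max_lateness(jobs, s=0):
    """Schedule jobs (deadline, length) back to back in deadline order."""
    t, schedule, max_late = s, [], 0
    for d, length in sorted(jobs):       # increasing deadlines d_i
        f = t + length                   # the job occupies [t, f)
        schedule.append((t, f))
        max_late = max(max_late, f - d)  # lateness l_i = max(0, f(i) - d_i)
        t = f                            # no idle time
    return schedule, max_late
```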

Analyzing the Algorithm

  • There is an optimal schedule with no idle time
    • Exchange argument: gradually modify O, preserving optimality at each step, but transforming it into a schedule found by the greedy algorithm
  • All schedules with no inversions and no idle time have the same maximum lateness
    • Start with any optimal schedule having no idle time; convert it into a schedule w/ no inversions without increasing its maximum lateness: this is optimal
  • There is an optimal schedule that has no inversions and no idle time
    • If O has an inversion, then there is a pair of jobs i and j such that j is scheduled immediately after i, with dj < di
    • After swapping i and j we get a schedule with one less inversion
    • The new swapped schedule has a maximum lateness no larger than that of O
    • As j is swapped to an earlier position, its lateness does not increase.
    • For job i, its lateness may have been increased, but since di > dj, there is no increase to the maximum lateness
  • The schedule A produced by the greedy algorithm has optimal maximum lateness L

Extensions

  • Requests also with an earliest possible starting time ri
    • Called the release time

Personal Thoughts

Similar to the section above, we reviewed many of these concepts in class before Feb Break, so the section made a lot of sense. For some reason, I did find this Greedy algorithm easier to follow than the previous one. We have reviewed the example of the classroom scheduling multiple times in class, so that is very clear to me. Readability: 8.5 Interesting: 8

4.4 Shortest Paths in a Graph

The Problem

  • Graphs are used to model networks in which one travels from one point to another
  • A problem: determine the shortest path between nodes in a graph
  • s has a path to every node in G, each edge e has a positive length indicating the time (or distance, or cost, etc.) it takes to traverse e
  • For a path P, the length of P is the sum of all the lengths of all edges in P.

Designing the Algorithm

  • set S of vertices u for which we have determined a shortest-path distance d(u) from s: “explored” part of the graph

  • As each node v is added to the set S, we simply record the edge (u,v) on which it achieved the value d'(v) = min over edges e = (u,v) with u in S of (d(u) + ℓ(e))
  • if (u,v) is the edge we have stored for v, then Pv is just (recursively) the path Pu followed by the single edge (u,v).
  • Start at v, follow the edge we have stored for v in the reverse direction to u, then follow the edge we have stored for u in the reverse direction to its predecessor, and so on until we reach s

Analyzing the Algorithm

  • Greedy because it always forms the shortest new s-v path we can make from a path in S followed by a single edge
  • Each time it selects a path to a node v, that path is shorter than every other possible path to v
  • Consider the set S at any point in the algorithm's execution. For each u in the set S, the path Pu is a shortest s-u path
    • when the set S contains one node, we have S = {s} and d(s) = 0
    • when S grows to size k+1 by adding the node v, (u,v) is the final edge on our s-v path Pv
    • any other s-v path must leave S somewhere; if it first crosses from S to V-S along an edge (x,y), then by the time it reaches y it is already at least as long as Pv, so no path from s to v through y is shorter than Pv
  • Requires nonnegative edge lengths: with negative lengths, a path leaving S could get shorter later
  • Dijkstra's Algorithm is a “continuous” version of the standard breadth-first search algorithm for traversing a graph

Implementation and Running Time

  • n-1 iterations for a graph with n nodes
  • each iteration adds a new node v to S
  • for a graph with m edges, computing all these minima takes O(m) time per iteration → O(mn) overall
  • Data Structures to Improve Runtime
    • Using a priority queue, Dijkstra's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations
      • overall: O(m log n)
    • Maintain the values of the minima rather than recomputing each iteration
    • Keep the nodes V-S in a priority queue with d'(v) as their keys
      • ChangeKey to decrease the key of the node
        • Can occur at most once per edge, when the tail of the edge e' is added to S
      • ExtractMin to select the node v to add to the set S; a sketch combining these pieces follows
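
A hedged Python sketch combining these pieces. The adjacency-dict graph format (u → list of (v, length)) is my assumption, and since Python's heapq has no ChangeKey, this version pushes a fresh entry on each improvement and discards stale entries at extraction, which preserves the O(m log n) bound:

```python
import heapq

def dijkstra(graph, s):
    d = {s: 0}                  # shortest-path distances found so far
    pred = {}                   # stored edges, for reconstructing each path Pv
    pq = [(0, s)]               # heap keyed on tentative distances d'(v)
    while pq:
        dist, u = heapq.heappop(pq)            # ExtractMin
        if dist > d[u]:
            continue                           # stale entry; skip it
        for v, length in graph.get(u, ()):
            alt = dist + length
            if alt < d.get(v, float("inf")):
                d[v], pred[v] = alt, u         # stands in for ChangeKey
                heapq.heappush(pq, (alt, v))
    return d, pred
```

Following pred from any node v back to s reconstructs the path Pv in reverse, exactly as described above.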


Personal Thoughts

This section provided more clarity to the class discussion on Dijkstra's algorithm. However, I did still struggle to understand the whole algorithm in action. I think with more examples and practice, the algorithm will actually be very simple to understand. Further, running through the algorithm with an example will help solidify why we want to use a priority queue. Readability: 6.5 Interesting: 6

4.5 The Minimum Spanning Tree Problem

The Problem

  • Find a subset of the edges T in E, so that the graph (V,T) is connected and the total cost is as small as possible.
  • Let T be a minimum-cost solution to the network design problem defined above. Then (V,T) is a tree.
    • (V,T) must be connected.
    • Contains no cycles
    • If it contains a cycle, removing an edge e of the cycle leaves (V,T) connected, and the result is cheaper!
  • If some edge costs are 0, there can be a minimum-cost solution with extra edges that could be deleted without increasing the cost
  • Subset T from E is a spanning tree of G if (V,T) is a tree
  • GOAL: Find the cheapest spanning tree

Designing Algorithms

  • Three greedy algorithms, each of which correctly finds a minimum spanning tree
  1. Kruskal's Algorithm: start without any edges, and insert edges from E in order of increasing cost; only insert an edge if it does not create a cycle, otherwise discard it
  2. Prim's Algorithm: start with a root node s, try to grow greedily from s; each time we simply add the node that can be attached as cheaply as possible to the partial tree we already have, i.e., grow S by one node, adding the node v that minimizes the “attachment cost”
  3. Reverse-Delete Algorithm: start with the full graph and delete edges in order of decreasing cost, as long as deleting an edge does not disconnect the graph

Analyzing the Algorithms

  • Make the assumption that all edge costs are distinct from one another
  • Cut property: Assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v,w) be the minimum-cost edge with one end in S and the other in V-S. Then every minimum spanning tree contains the edge e.
    • Suppose T is a spanning tree that does not contain e
    • Identify an edge e' in T that is more expensive than e:
      • the path in T from v to w must cross from S to V-S at some edge e'
      • e is the cheapest edge with this property, so the cost of e' is greater than the cost of e
    • Exchange e' for e: T' = (T - {e'}) ∪ {e} is still a spanning tree
    • Cost of T' is less than that of T, so T is not a minimum spanning tree, as desired
  • Kruskal's Algorithm produces a minimum spanning tree of G
    • Consider any edge e added by the Algorithm; let S be the set of all nodes to which v has a path at the moment just before e is added; only one side of e is in S, since adding e does not create a cycle; No such edge from S to V-S has been encountered yet; So e is the cheapest and belongs in the minimum spanning tree.
  • Prim's Algorithm produces a minimum spanning tree of G
    • In each iteration a set S has been constructed, and a node v and edge e are added that minimize the attachment cost. e is then the cheapest edge with one end in S, so by the Cut Property it is in every minimum spanning tree.
  • Assume all edge costs are distinct. Let C be any cycle in G, and let edge e=(v,w) be the most expensive edge belonging to C. Then e does not belong to any minimum spanning tree of G.
    • T is a spanning tree that contains e; show that T does not have minimum cost by swapping e for a cheaper edge that still yields a spanning tree: delete e from T, and use a different edge of the cycle C to stitch the nodes back together
  • The Reverse-Delete Algorithm produces a minimum spanning tree of G
    • Consider any edge e = (v,w) removed by Reverse-Delete. When e is removed, it lies on a cycle; since edges are considered in order of decreasing cost, e is the most expensive edge on that cycle, so by the Cycle Property it belongs to no minimum spanning tree. As the algorithm never disconnects the graph, the output is connected, and the proof is done.
    • Combination of the Cut Property and the Cycle Property implies that any algorithm that builds a spanning tree by repeatedly including and removing edges when justified, will end up with a minimum spanning tree.
    • If edge costs are not distinct, perturb them by extremely small, distinct amounts. This creates tie-breakers to resolve comparisons without changing which spanning trees are cheapest.

Implementing Prim's Algorithm

  • Runtime: O(m log n)
  • Keep the nodes in a priority queue with the attachment costs a(v) as the keys
  • Select a node with ExtractMin; update the attachment costs using ChangeKey
  • n-1 iterations in which we perform ExtractMin
  • Perform ChangeKey at most once for each edge
  • Can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations; see the sketch below
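
A hedged Python sketch of this priority-queue implementation of Prim's Algorithm, using the same assumed adjacency-dict format as the Dijkstra sketch; duplicate heap entries, skipped when stale, again stand in for ChangeKey:

```python
import heapq

def prim(graph, s):
    """graph: dict u -> list of (v, cost). Returns MST edges grown from s."""
    in_tree = {s}
    tree = []
    pq = [(c, s, v) for v, c in graph[s]]      # entries (a(v), tree end u, v)
    heapq.heapify(pq)
    while pq:
        c, u, v = heapq.heappop(pq)            # ExtractMin on attachment cost
        if v in in_tree:
            continue                           # stale entry; skip it
        in_tree.add(v)
        tree.append((u, v, c))                 # edge e = (u, v) joins the tree
        for w, cw in graph[v]:
            if w not in in_tree:
                heapq.heappush(pq, (cw, v, w)) # stands in for ChangeKey
    return tree
```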

Extensions

  • Point-to-point distances: willing to reduce these even if we pay more for the set of edges
  • Congestion on the edges: one could seek a spanning tree in which no single edge carries more than a certain amount of traffic
  • Is a spanning tree even the right solution? Resilience as an explicit goal.

Personal Thoughts

This section was not overly hard to understand. A lot of the concepts were straightforward and we discussed them thoroughly in class. Not surprisingly, the part I struggled with was understanding the various proofs in this section. This section was more proof-heavy than some of the other sections, so that made the section a little more dense. Readability: 6 Interesting: 6

4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure

Consider a scenario in which a graph evolves through the addition of edges: Fixed population of nodes that grows over time by having edges appear between certain pairs of nodes.

Data structure called Union-Find to store a representation of the components in a way that supports rapid searching and expansion.

The Problem

  • Union-Find data structure allows us to maintain disjoint sets
    • Can use it to see if two nodes are in the same set
  • If we add edge (u,v) to the graph, first test if u and v are already in the same connected component. If not, Union(Find(u),Find(v)) to merge the two components
    • This is not designed to handle the effects of edge deletion
  • MakeUnionFind(S): where all elements are in separate sets
  • Find(u): return the name of set containing u
  • Union(A,B): change the data structure by merging the sets A and B into a single set → O(log n)

A Simple Data Structure for Union-Find

  • array Component: contains the name of the set currently containing each element; size n
  • S: set with n elements
  • initialize Component[s] = s for all s in S
  • Find(v): O(1)
  • Union(A,B): O(n)
  • Explicitly maintain the list of elements in each set, so we don't have to look through the whole array to find the elements that need updating.
  • Maintain an additional array size of length n, where size[A] is the size of set A, and when a Union(A,B) operation is performed, we use the name of the larger set for the union.
  • After these optimizations: worst case = O(n)
  • Consider the array implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set. The Find operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and any sequence of k Union operations takes at most O(k log k) time.
    • Consider a sequence of k Union operations
    • Updating the array Component is the only part that takes more than O(1) time
    • Component[v] gets updated at most log2(2k) times, since the set containing v at least doubles in size each time its name changes; at most 2k elements are involved in any of the Union operations, giving the O(k log k) bound (a sketch of this structure follows)
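
A minimal sketch of this array-based structure with both optimizations, explicit member lists and keeping the larger set's name; the class and field names are my own:

```python
class ArrayUnionFind:
    def __init__(self, S):                      # MakeUnionFind(S): O(n)
        self.component = {s: s for s in S}      # Component[s] = s initially
        self.members = {s: [s] for s in S}      # explicit element list per set

    def find(self, v):                          # Find(v): O(1)
        return self.component[v]

    def union(self, a, b):                      # relabel only the smaller set
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a                         # keep the larger set's name
        for v in self.members.pop(b):
            self.component[v] = a
            self.members[a].append(v)
```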

A Better Data Structure for Union-Find

  • Use pointers
  • each element v is contained in a record with an associated pointer to the name of the set that contains v
  • Two sets A and B
    • the name used for set A is a node v in A; the name used for set B is a node u in B
      • u or v will be the name of the combined set; assume it's v
    • Update u's pointer to point to v
  • Pointer-based data structure implements Union in O(1) time
  • Find operation: the number of steps needed is exactly the number of times the set containing node u had to change its name; without further care this can be as large as O(n)
    • To optimize, we keep the name of the larger set as the name of the union.
    • We will maintain an additional field with the nodes: the size of the corresponding set.

Further Improvements

  • Speeding up the Find operation
  • Compress the path we followed after every Find operation by resetting all pointers along the path to point to the current name of the set.
  • Union: O(1)
  • MakeUnionFind(S): O(n)
  • Find(v): at most O(log n)
    • the gain from compression is in making subsequent calls to Find fast; a sketch with both optimizations follows
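
A hedged Python sketch of the pointer-based structure with both improvements, union by size and path compression (my own rendering, not the book's pseudocode):

```python
class UnionFind:
    def __init__(self, S):                      # MakeUnionFind(S): O(n)
        self.parent = {s: s for s in S}         # pointer to the set's name
        self.size = {s: 1 for s in S}

    def find(self, u):                          # Find(u): at most O(log n)
        root = u
        while self.parent[root] != root:        # follow pointers to the name
            root = self.parent[root]
        while self.parent[u] != root:           # compress the path just walked
            self.parent[u], u = root, self.parent[u]
        return root

    def union(self, a, b):                      # merging two sets: O(1) repointing
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra                     # the larger set keeps its name
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
```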

Implementing Kruskal's Algorithm

  • Use Union-Find data structure
  • Sort edges by cost: O(m log m)
  • Since there is at most one edge between any pair of nodes, m ≤ n², so log m = O(log n) and the sort is O(m log n)
  • Union-Find structure is used to maintain the connected components as edges are added
  • For each edge considered, compute Find(u) and Find(v). Check equality to see if they belong to different components
  • Merge if the algorithm decides to include edge e in the tree T
  • At most 2m Find and n-1 Union operations
  • Either the array-based or the pointer-based implementation gives a total running time of O(m log n); a sketch follows
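
Putting it together, a sketch of Kruskal's Algorithm on top of the UnionFind class above; the (cost, u, v) edge format is my assumption:

```python
def kruskal(nodes, edges):
    uf = UnionFind(nodes)
    tree = []
    for c, u, v in sorted(edges):        # sort by cost: O(m log n)
        if uf.find(u) != uf.find(v):     # different components, so no cycle
            uf.union(u, v)               # merge the two components
            tree.append((u, v, c))
    return tree
```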

Personal Thoughts

Just like the last section, this section was pretty reasonable to understand. We did go through this material in class, so that helped. The data operations were pretty simple, though there was a lot of variability in implementation style, which was a little confusing to keep track of. Overall, not too bad though! Readability: 6.5 Interesting: 6.5

4.7 Clustering

The Problem

  • Clustering arises when a collection of objects must be categorized into coherent groups.
  • Define a distance function: objects at a larger distance from one another are less similar to each other
    • Group objects that are close / objects in different groups are far apart
  • Clusterings of Maximum Spacing: given a set U of n objects; for each pair there is a numerical distance; distances are positive and symmetric, d(pi,pj) = d(pj,pi); we want a k-clustering with k groups; the spacing of a k-clustering is the minimum distance between any pair of points lying in different clusters
    • Goal: k-clustering with the maximum spacing possible

Designing the Algorithm

  • Consider growing a graph on the vertex set U
  • Grow a graph H on U edge by edge
    • connected components = clusters
    • Don't connect nodes already in the same cluster → avoids cycles; this is Single-Link Clustering
    • H is a union of trees
  • Procedure: Kruskal's Algorithm
    • Stop once we obtain k connected components, i.e., just before Kruskal's Algorithm would add its last k-1 edges (a sketch follows this list)
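
A sketch of this procedure as truncated Kruskal, reusing the UnionFind class sketched in Section 4.6; the (distance, p, q) input format is my assumption:

```python
def max_spacing_clustering(points, distances, k):
    """distances: list of (d, p, q) over all pairs. Returns k clusters."""
    uf = UnionFind(points)
    components = len(points)
    for d, p, q in sorted(distances):
        if components == k:
            break                        # k clusters reached: stop early
        if uf.find(p) != uf.find(q):
            uf.union(p, q)               # single-link merge of two clusters
            components -= 1
    clusters = {}
    for p in points:
        clusters.setdefault(uf.find(p), []).append(p)
    return list(clusters.values())
```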

Analyzing the Algorithm

  • The components C1, C2, …, Ck formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing
    • Spacing of C: length d* of the (k-1)st most expensive edge
    • Let C' be some other k-clustering, partitioning U into nonempty sets; we show its spacing is at most d*
    • The two clusterings C and C' are not the same
      • so some cluster Cr of C is not a subset of any of the k sets in C', and hence there are points p, p' in Cr that lie in different clusters of C'
      • Kruskal's Algorithm added every edge on the path in Cr from p to p' before we stopped it, so each such edge has length at most d*
        • somewhere along this path, two consecutive points fall in different clusters of C', so the spacing of C' is at most d*

Personal Thoughts

This section was not bad, because it was very general information. The nitty gritty details that I'm sure are related to clustering were not really covered. The overall idea of clustering makes a lot of sense and the basic algorithm was not complicated to follow (especially since it built off of the last section's Kruskal's Algorithm). I am looking forward to seeing more examples/applications of clustering. Readability: 7 Interesting: 7

4.8 Huffman Codes and Data Compression

  • Greedy rule to shrink the size of the problem instance so that an equivalent smaller problem can then be solved by recursion
  • Data Compression: area that forms part of the foundations for digital communication.

The Problem

  • Encoding Symbols Using Bits
    • Use a fixed number of bits to represent each symbol in the alphabet
    • Concatenate bit strings to form text
    • Tremendous waste to translate all letters into the same number of bits
      • use a small number of bits for frequent letters and larger number of bits for less frequent ones
    • Need an optimal way to take advantage of the nonuniform frequencies of the letters.
  • Variable-Length Encoding Schemes
    • Morse Code: translates each letter into a sequence of dots and dashes (0s and 1s, for comparison) → maps more frequent letters to shorter bit strings
      • Dealing with Ambiguity: short pauses between letters
  • Prefix Codes
    • Ambiguity: bit string that encodes one letter is a prefix of the bit string that encodes another
    • Prefix Code: for set S of letters is a function γ that maps each letter x in S to some sequence of zeros and ones, in such a way that for distinct x,y in S, the sequence γ(x) is not a prefix of the sequence γ(y).
    • When scanning text, as soon as you've seen enough bits to match the encoding of some letter, output this as the first letter of the text.
  • Optimal Prefix Codes
    • For each letter x in S, there is a frequency fx, representing the fraction of letters in the text that are equal to x. Frequencies sum to 1.
      • Total length of the encoding = sum over all letters x in S of (the number of times x occurs) × (the length of the bit string γ(x) used to encode x)
        • Average number of bits required per letter: ABL(γ) = Σ over x in S of fx · |γ(x)|

Designing the Algorithm

  • Representing Prefix Codes Using Binary Trees
    • Rooted binary tree T: number of leaves is equal to the size of the alphabet S; each leaf has a distinct letter in S
    • For each letter x in S, we follow the path from the root to the leaf labeled x; each time the path goes from a node to its left child, we write down a 0, and each time the path goes from a node to its right child, we write down a 1.
    • The encoding of S constructed from T is a prefix code.
      • For the encoding of x to be a prefix of the encoding of y: path from root to x would have to be a prefix of the path from root to y. Same as saying that x would lie on the path from the root to y, which isn't possible if x is a leaf.
    • Given a prefix code γ, we can build a binary tree recursively
      • Start with root, all letters x in S whose encodings begin with a 0 = leaves in the left subtree of the root; opposite for encodings beginning with a 1; use recursion
    • A binary tree is full if each node that is not a leaf has two children: there are no nodes with exactly one child.
    • Binary tree corresponding to the optimal prefix code is full
      • Proof using an exchange argument
      • T = binary tree corresponding to the optimal prefix code
      • Suppose it contains a node u with exactly one child v
      • Convert T into T' by replacing node u with v.
        • if u = root, delete u and use v as root
        • otherwise, let w be the parent of u in T, delete u, and make v be a child of w in place of u
  • A First Attempt: The Top-Down Approach
    • Goal: produce a labeled binary tree in which the leaves are as close to the root as possible
    • Pack leaves as tightly as possible: want a “nearly balanced” split of the alphabet
      • Shannon-Fano Codes: no version of this top-down splitting strategy is guaranteed to always produce an optimal prefix code
  • What If We Knew The Tree Structure of the Optimal Prefix Code?
    • Assume we have the binary tree T* corresponding to an optimal prefix code - need to figure out the labeling
    • Suppose that u and v are leaves of T*, such that depth(u) < depth(v). Further suppose that in a labeling of T* corresponding to an optimal prefix code, leaf u is labeled with y in S and leaf v is labeled with z in S. Then fy ≥ fz.
      • Proof using an exchange argument
      • if fy < fz, then the effect of exchanging the labels y and z is as follows: the multiplier on fy increases [from depth(u) to depth(v)] and the multiplier on fz decreases by the same amount [from depth(v) to depth(u)]
        • the change to the overall sum is (depth(v) − depth(u))(fy − fz), a negative number, contradicting the supposed optimality of the prefix code
      • Having a lower-frequency letter at a strictly smaller depth than some other higher-frequency letter is ruled out for an optimal solution
      • Take all leaves of depth 1, label them with the highest-frequency letters, do the same with each level of increasing depth → leads to an optimal T*
    • Let v be a leaf of maximum depth in T* and let w be its sibling; then w is also a leaf of T*
      • If w were not a leaf, there would be some leaf w' in the subtree below it. But then w' would have a depth greater than that of v, contradicting our assumption that v is a leaf of maximum depth in T*.
        • Consequence: there is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*.
  • An Algorithm to Construct an Optimal Prefix Code
    • Suppose y* and z* are the two lowest-frequency letters in S.
      • can lock them together because we know they end up as sibling leaves below a common parent
      • Replace y* and z* with meta-letter, obtaining an alphabet that is one letter smaller. Recursively find a prefix code for the smaller alphabet, and then “open up” the meta-letter back into y* and z* to obtain a prefix code for S:

Huffman's Algorithm

  • bottom-up approach: focuses on the leaves representing the two lowest-frequency letters, and then continues by recursion; a sketch follows
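
A hedged Python sketch of this bottom-up construction (my own rendering, not the book's pseudocode). Instead of building the tree explicitly, each merge prepends one bit to the code of every letter inside the two merged meta-letters, which yields the same code; the counter is only a tie-breaker so the heap never has to compare letter tuples:

```python
import heapq, itertools

def huffman(freqs):
    """freqs: dict letter -> frequency fx. Returns dict letter -> bit string."""
    tie = itertools.count()
    heap = [(f, next(tie), (x,)) for x, f in freqs.items()]
    heapq.heapify(heap)
    code = {x: "" for x in freqs}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # the two lowest-frequency letters
        f2, _, right = heapq.heappop(heap)
        for x in left:
            code[x] = "0" + code[x]           # left subtree: leading 0
        for x in right:
            code[x] = "1" + code[x]           # right subtree: leading 1
        # replace the pair with a meta-letter whose frequency is the sum
        heapq.heappush(heap, (f1 + f2, next(tie), left + right))
    return code
```

The k-1 heap extractions and insertions cost O(k log k) in total, matching the analysis below.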

Analyzing the Algorithm

  • The Optimality of the Algorithm
    • Recursive algorithm: proof by induction
    • Base: optimal for all two letter alphabets (one bit per letter)
    • Induction hypothesis: optimal for all alphabets of size k-1
    • Consider: alphabet S of size k
    • The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code
  • Implementation and Running Time
    • Can be polynomial time in k, the number of letters in the alphabet
      • recursive calls: sequence of k-1 iterations
      • each iteration consists simply of identifying the two lowest-frequency letters → O(k)
      • combined: O(k^2)
    • Use a priority queue to maintain a set of k elements
      • allows for insertion and extraction
      • Each iteration: extract the minimum twice; insert a new letter whose key is the sum of these two minimum frequencies
      • Using priority queues via heaps: each insertion and extraction takes O(log k) → over k iterations, O(k log k)

Extensions

  • Data Compression: black-and-white images already use just one bit per pixel
    • to compress such images further, one needs to use a “fraction of a bit” for each pixel
  • Cannot adapt to changes in text: frequency is stable for half the text, but then changes radically
    • Halfway into the sequence, insert some kind of instruction to completely change the encoding
    • This will pay off in the long run: called adaptive compression schemes

Personal Thoughts

While arguably one of the more interesting topics of this textbook, this section was long and very dense. There was a lot of information that was hard to digest all at once. Our discussion in class was helpful, but not long enough to help make this section very clear. I will most likely need to review this chapter section again. Readability: 5 Interesting: 8
