courses:cs211:winter2011:journals:chen:chapter_4 (created 2011/02/16 05:07 by zhongc; current revision 2011/03/02 03:13 by zhongc)
===== 4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead =====
Section Summary:

Problem Motivation:
We have a set of requests {1, 2, ..., n}; the i-th request corresponds to an interval of time starting at s(i) and finishing at f(i).
We say that a subset of the requests is **compatible** if no two of them overlap in time, and our **goal** is to accept as large a compatible subset as possible. Compatible sets of maximum size are called **optimal**.
  * Most natural attempt: accept the request that starts earliest
    * Result: not optimal
    * Counter-example: one long request that starts earliest forces us to reject a lot of requests for shorter time intervals
  * Attempt 2: accept the smallest interval of time first
    * Result: not optimal either

  * Optimal solution: always choose the request that finishes the earliest (among those compatible with the requests accepted so far)
  * Proof of optimality: the greedy solution stays ahead at each step.
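The earliest-finish-time rule can be sketched as follows (a minimal sketch; the function name and the (start, finish) tuple representation are my own, not the text's):

```python
def interval_scheduling(requests):
    """Greedy earliest-finish-time: return a maximum compatible subset.

    requests: list of (start, finish) tuples.
    """
    # Sort by finish time, then repeatedly accept the request that
    # finishes earliest among those compatible with what we already have.
    accepted = []
    last_finish = float("-inf")
    for s, f in sorted(requests, key=lambda r: r[1]):
        if s >= last_finish:          # compatible with everything accepted so far
            accepted.append((s, f))
            last_finish = f
    return accepted

# The long interval (0, 10) would be picked by "earliest start",
# but the greedy keeps the three short, mutually compatible ones.
print(interval_scheduling([(0, 10), (0, 3), (4, 6), (7, 9)]))
# -> [(0, 3), (4, 6), (7, 9)]
```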
+ | |||
+ | **A Related Problem: Scheduling All Intervals** | ||
+ | Interval Partitioning Problem | ||
+ | |||
+ | algorithm on P149 | ||
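The partitioning algorithm referenced above can be sketched like this (my own sketch, not the book's pseudocode: intervals are (start, finish) tuples, resources are integer labels, and intervals that merely touch are treated as compatible):

```python
import heapq

def interval_partitioning(intervals):
    """Greedily label intervals in order of start time, reusing the
    lowest free label; the number of labels used equals the depth.

    Returns ({interval_index: label}, number_of_labels).
    """
    label = {}
    busy = []          # heap of (finish_time, label) for resources in use
    free = []          # heap of labels available for reuse
    next_label = 0
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    for i in order:
        s, f = intervals[i]
        # Release every resource whose interval ended by this start time.
        while busy and busy[0][0] <= s:
            heapq.heappush(free, heapq.heappop(busy)[1])
        if free:
            lab = heapq.heappop(free)
        else:
            lab = next_label
            next_label += 1
        label[i] = lab
        heapq.heappush(busy, (f, lab))
    return label, next_label

labels, depth = interval_partitioning([(0, 3), (1, 4), (2, 5), (4, 7)])
print(depth)  # -> 3, since three intervals overlap at time 2
```

The depth (maximum number of intervals overlapping any point) is a structural lower bound, and the greedy meets it, which is the structural argument mentioned below.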
+ | |||
+ | Proof Approaches: | ||
+ | Greedy algorithm stays ahead | ||
+ | • Does better than any other algorithm at each step | ||
+ | Exchange argument | ||
+ | • Transform any solution into a greedy solution | ||
+ | Structural Argument | ||
+ | • Figure out some structural bound that all solutions | ||
+ | must meet | ||
+ | |||
+ | |||
+ | Readable/ | ||
===== 4.2 Scheduling to Minimize Lateness: An Exchange Argument =====
The Problem:
We have a single resource and a set of job requests, each with a length and a deadline. We are trying to schedule the jobs to minimize the maximum lateness.

Optimal algorithm:
Earliest Deadline First: sort the jobs in increasing order of deadline and schedule them in that order with no idle time.
+ | |||
+ | The main step in showing the optimality of our algorithm is to establish | ||
+ | that there is an optimal schedule that has no inversions and no idle time. | ||
+ | |||
+ | analysis: | ||
+ | Our plan here is to gradually | ||
+ | modify O while preserving its optimality at each step, but eventually transforming | ||
+ | it into a schedule that is identical to the schedule A found by the greedy | ||
+ | algorithm. We refer to this type of analysis as an **exchange argument** | ||
+ | |||
+ | There is an optimal schedule with no idle time. | ||
+ | All schedules with no inversions and no idle time have the same | ||
+ | maximum lateness. | ||
+ | Def. An **inversion** in schedule S is a pair of jobs i and j such that: | ||
+ | di < dj but j scheduled before i. | ||
+ | |||
+ | Greedy Exchange Proofs on slide 4.2 p7. | ||
+ | |||
+ | Running time: nlongn --- need to sort the deadlines first. | ||
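Earliest Deadline First is short enough to sketch directly (my own sketch; jobs are (length, deadline) tuples and all jobs are assumed available at time 0):

```python
def min_max_lateness(jobs):
    """Earliest Deadline First: run jobs (length, deadline) back to back
    in order of deadline; return the maximum lateness over all jobs.
    """
    t = 0
    max_late = 0
    for length, deadline in sorted(jobs, key=lambda j: j[1]):
        t += length                          # this job finishes at time t
        max_late = max(max_late, t - deadline)
    return max_late

# A 3-unit job due at 2 and a 1-unit job due at 1: EDF runs the
# second job first, and the first job finishes at time 4, 2 units late.
print(min_max_lateness([(3, 2), (1, 1)]))  # -> 2
```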
+ | |||
+ | |||
+ | Readable/ | ||
+ | ===== 4.4 Shortest Paths in a Graph ===== | ||
+ | problem: | ||
+ | |||
+ | Given a directed weighted graph, what is the shortest path? | ||
+ | |||
+ | Dijkstra' | ||
+ | The algorithm maintains a set S of vertices u for which we have determined a shortest-path distance d(u) | ||
+ | from s; this is the " | ||
+ | |||
+ | Now, for each node v ~ V-S, we determine the shortest path that can be constructed by traveling along a path through the explored part S to some u ~ $, followed by the single edge (u, v). That is, we consider the quantity d’(v) = mine=(a, | ||
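A heap-based sketch of the algorithm (my own; the adjacency-list dict format `{u: [(v, length), ...]}` is an assumption, not from the text):

```python
import heapq

def dijkstra(graph, s):
    """Shortest-path distances from s, assuming non-negative lengths.

    graph: {u: [(v, length), ...]} adjacency lists.
    """
    dist = {s: 0}
    pq = [(0, s)]                     # candidate values d'(v), smallest first
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                  # stale entry; u was already finalized
        for v, w in graph.get(u, []):
            nd = d + w                # path through the explored part plus edge (u, v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": []}
print(dijkstra(g, "s"))  # -> {'s': 0, 'a': 1, 'b': 3}
```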
+ | |||
+ | |||
+ | We always form shortest new s-v path from a path in S followed by a single edge | ||
+ | • Proof of optimality: Stays ahead of all other | ||
+ | solutions | ||
+ | Each time selects a path to a node v, that path is | ||
+ | shorter than every other possible path to v (stay ahead proof) | ||
+ | |||
+ | proof of correctness: | ||
+ | **inductive proof** | ||
+ | Inductive hypothesis: Assume true for |S| = k, k ≥ 1 | ||
+ | Let v be the next node added to S by Greedy, and let uv be | ||
+ | the chosen edge. | ||
+ | The shortest s->u path plus u->v is an s->v path of length π(v) | ||
+ | Consider any s->v path P. It's no shorter than π(v). | ||
+ | Let x->y be the first edge in P that leaves S,and let P' be the subpath to x. | ||
+ | P is already too long as soon as it leaves S. | ||
+ | |||
+ | |||
+ | Question: Still cannot see why the order matter. If we do a BFS and visit all vertices and update all distance labels, wouldn' | ||
+ | |||
+ | |||
+ | |||
+ | Readable/ | ||
+ | |||
+ | ===== 4.5 The Minimum Spanning Tree Proble ===== | ||
Section Summary

**Motivation of the problem:**
How do we minimize the cost of laying cables?
How do we find a minimum spanning tree of a connected weighted graph?

**Definitions:**
Minimum spanning tree:
  * Spanning: spans all nodes in the graph
  * Tree: connected and contains no cycles
  * Minimum: the total cost of its edges is as small as possible

Claim: an MST contains no cycle.
Proof: By contradiction. If it contained a cycle, we could remove the most expensive edge on the cycle and all nodes would still be connected, so the tree was not minimum.
+ | |||
+ | **Two Properties: | ||
+ | When is it safe to invclude an edge in the MST? | ||
+ | Cut property. Let S be any subset of nodes, and let e | ||
+ | be the min cost edge with exactly one endpoint in S. | ||
+ | Then MST contains e. | ||
+ | proof on P170 | ||
+ | |||
+ | When is it safe to remove an edge from the MST? | ||
+ | Cycle property. Let C be any cycle, and let f be the | ||
+ | max cost edge belonging to C. Then MST does not | ||
+ | contain f. | ||
+ | proof on P170 | ||
+ | |||
+ | cycle-cut intersection proof | ||
+ | |||
+ | **Three algorithms** | ||
+ | |||
+ | Prim: | ||
+ | Start with an arbitrary node s and add it to the tree T. Grow T outward by choosing the cheapest edge e with on endpoint in T and the other out of T. | ||
+ | Proof on P172 | ||
+ | Cut property | ||
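Prim's rule can be sketched with a heap of candidate edges (my own sketch; the undirected adjacency-list format `{u: [(v, cost), ...]}` is an assumption):

```python
import heapq

def prim_mst(graph, s):
    """Prim: grow the tree from s, always adding the cheapest edge with
    exactly one endpoint in the tree (the cut property in action).

    graph: {u: [(v, cost), ...]} undirected adjacency lists.
    Returns the list of MST edges as (u, v, cost).
    """
    in_tree = {s}
    edges = []
    pq = [(c, s, v) for v, c in graph[s]]    # edges leaving the tree
    heapq.heapify(pq)
    while pq:
        c, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                  # both endpoints inside: would form a cycle
        in_tree.add(v)
        edges.append((u, v, c))
        for w, cw in graph[v]:
            if w not in in_tree:
                heapq.heappush(pq, (cw, v, w))
    return edges

g = {"a": [("b", 1), ("c", 3)],
     "b": [("a", 1), ("c", 2)],
     "c": [("a", 3), ("b", 2)]}
print(prim_mst(g, "a"))  # -> [('a', 'b', 1), ('b', 'c', 2)]
```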
+ | |||
+ | Kruskal: | ||
+ | Sort the edges in G in ascending orders. pick the smallest edge that does not creat a cycle. | ||
+ | |||
+ | Reverse-delete: | ||
+ | It is the rever version of Kruskal | ||
+ | |||
+ | **Implementations of the two algorithms** | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | Questions: | ||
+ | I had some trouble understanding the proof for Cayley' | ||
+ | |||
+ | |||
+ | ===== 4.6 Implementing Kruskal’s Algorithm: The Union-Find Data Structure ===== | ||
+ | |||
+ | Section Summary: | ||
+ | |||
+ | Problem Motivation: | ||
+ | Try to implement the Kruskal' | ||
+ | |||
+ | pointer based implementation | ||
+ | |||
+ | Find Operations | ||
+ | Log(n) | ||
+ | |||
+ | Union Operations | ||
+ | Log(n) | ||
+ | |||
+ | |||
+ | The Kruskal Algorithm in detail | ||
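A pointer-based union-find plus Kruskal's scan can be sketched as follows (my own sketch; class and function names are assumptions, and I add path compression on Find, a common refinement beyond the plain pointer structure):

```python
class UnionFind:
    """Pointer-based union-find with union by size and path compression."""
    def __init__(self, elements):
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:        # path compression pass
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        """Merge the sets of x and y; return False if already merged."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.size[rx] < self.size[ry]:    # attach smaller tree under larger
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return True

def kruskal(nodes, edges):
    """Kruskal: scan edges in ascending cost, keeping an edge iff its
    endpoints are in different components (i.e. it creates no cycle)."""
    uf = UnionFind(nodes)
    return [(u, v, c)
            for c, u, v in sorted((c, u, v) for u, v, c in edges)
            if uf.union(u, v)]

print(kruskal(["a", "b", "c"],
              [("a", "b", 1), ("b", "c", 2), ("a", "c", 3)]))
# -> [('a', 'b', 1), ('b', 'c', 2)]
```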
+ | |||
+ | |||
+ | Intereting/ | ||
+ | |||
+ | |||
+ | |||
===== 4.7 Clustering =====

Section Summary
Definition:
Divide objects into clusters so that points in different clusters are far apart.

Motivation/Applications:
Identify patterns in gene expression.
K-Clustering of maximum spacing: partition the objects into k non-empty clusters so that the spacing (the minimum distance between any pair of points lying in different clusters) is as large as possible.

It is basically Kruskal's algorithm, stopped just before it would merge down to fewer than k connected components.

Proof on P184.
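The Kruskal connection can be sketched directly (my own sketch; the function name, the `dist` callback, and the use of a small inline union-find are assumptions):

```python
def k_clusters(points, k, dist):
    """Single-linkage k-clustering: run Kruskal on all pairwise
    distances and stop when k components remain (maximizes spacing).

    points: list of hashable items; dist: symmetric distance function.
    """
    parent = {p: p for p in points}

    def find(x):                         # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = sorted(((dist(p, q), p, q)
                    for i, p in enumerate(points) for q in points[i + 1:]),
                   key=lambda t: t[0])
    components = len(points)
    for d, p, q in pairs:
        if components == k:
            break                        # stop before merging below k clusters
        rp, rq = find(p), find(q)
        if rp != rq:
            parent[rp] = rq
            components -= 1
    clusters = {}
    for p in points:
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values())

print(k_clusters([0, 1, 10, 11], 2, lambda a, b: abs(a - b)))
# -> [[0, 1], [10, 11]]
```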
+ | |||
+ | Interesting/ | ||
+ | |||
+ | |||
===== 4.8 Huffman Codes and Data Compression =====

Summary

Motivation:
How do we encode data so that transmission is more efficient?
Answer: use fewer bits for the data that carries little differentiation (low-entropy data?).

We use Huffman codes.

If all letters occur with the same frequency, there is nothing to gain from encoding or compressing, but when they do not, which is often the case, we can always represent the more frequently used letters with fewer bits.

Watch out for prefix ambiguity! In a **prefix code**, no codeword is a prefix of another, so decoding is unambiguous.

**Variable-length encoding**

Goal: minimize the Average number of Bits per Letter (ABL):
ABL = sum over letters x of f(x) * (length of x's codeword), i.e. the expected codeword length.
+ | |||
+ | Algorithm | ||
+ | |||
+ | implemenation | ||
+ | * Binary tree for the prefix codes | ||
+ | * Priority queue (with heap) choosing the node with lowest frequency | ||
+ | Cost (nlogn) Why? logn inserting and dequeuing. do it n times. | ||
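The heap-driven merging can be sketched like this (my own sketch; the function name and the representation of subtrees as symbol-to-bitstring dicts are assumptions, and a counter is used only to break frequency ties in the heap):

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Build a prefix code by repeatedly merging the two
    lowest-frequency subtrees (O(n log n) with a heap, as noted above).

    freq: {symbol: frequency}; returns {symbol: bitstring}.
    """
    tiebreak = count()                       # keeps heap entries comparable
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}   # left branch
        merged.update({s: "1" + b for s, b in c2.items()})  # right branch
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

code = huffman_code({"a": 0.5, "b": 0.3, "c": 0.2})
# The most frequent letter gets the shortest codeword.
print(sorted(code, key=lambda s: len(code[s]))[0])  # -> a
```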
Interesting/