====== Chapter 4: Greedy Algorithms ======

"Greed... is good. Greed is right. Greed works." - Wall Street

Greedy algorithm definition:
  * hard to define precisely
  * builds up a solution in small steps
  * need to prove the greedy algorithm is optimal

Two methods of proof:
  * the greedy algorithm stays ahead (4.1)
  * exchange argument

All this leads up to minimum spanning trees!

==== 4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead ====

Greedy rule that works:
  * accept the request that finishes first
  * algorithm on page 118

Show that this algorithm is optimal:
  * need to show that this algorithm produces a set of intervals of the same size as an optimal solution
  * to prove the algorithm is optimal we need a lemma
  * Lemma 1: for all indices r <= k we have f(i_r) <= f(j_r). Proof on p. 120
  * now we can prove the greedy algorithm is optimal. See page 121

Scheduling all Intervals:
  * this is similar to the multi-classroom resource problem we did in class
  * 4.4: in any instance of interval partitioning, the number of resources needed is at least the depth of the set of intervals
  * algorithm on p. 124
  * I wasn't sure how to code the "exclusion" idea in Python - a sketch of one way to do it follows after this list
  * 4.5: if we use the greedy algorithm above, every interval will be assigned a label, and no two overlapping intervals will receive the same label
  * 4.6: the greedy algorithm above schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed
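Here is a minimal Python sketch of one way to handle the label "exclusion" bookkeeping. It assumes half-open intervals (an interval may start exactly when another finishes), and the heap-of-busy-labels idea and the name assign_labels are my own choices rather than the book's pseudocode, which instead excludes the labels of all overlapping, already-labeled intervals; either way the number of labels used equals the depth.

<code python>
import heapq

def assign_labels(intervals):
    """Greedy interval partitioning sketch (idea from p. 124).

    intervals: list of (start, finish) pairs, treated as half-open [start, finish).
    Returns one label (0, 1, 2, ...) per interval, in the original input order.
    """
    # Process intervals in order of start time, remembering original positions.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])

    labels = [None] * len(intervals)
    busy = []          # min-heap of (finish_time, label) for labels currently in use
    next_label = 0     # next fresh label to hand out

    for i in order:
        start, finish = intervals[i]
        # The "exclusion" step: a label is excluded only while its current
        # interval is still running.  The earliest-finishing busy label tells
        # us whether any label has become free again.
        if busy and busy[0][0] <= start:
            _, label = heapq.heappop(busy)   # reuse a freed label
        else:
            label = next_label               # every existing label is excluded
            next_label += 1
        labels[i] = label
        heapq.heappush(busy, (finish, label))

    return labels

if __name__ == "__main__":
    demo = [(0, 3), (1, 4), (3, 6), (4, 7), (2, 5)]
    print(assign_labels(demo))   # this set has depth 3, so labels 0..2 suffice
</code>

Sorting dominates, so this runs in O(n log n), and no two overlapping intervals ever share a label, matching 4.5 and 4.6.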
==== 4.2 Scheduling to Minimize Lateness: An Exchange Argument ====

  * we did a lot with this in class - the problem is about minimizing maximum lateness
  * optimal greedy rule: earliest deadline first
  * algorithm on pgs. 127 - 128
  * 4.7: there is an optimal schedule with no idle time
  * 4.8: all schedules with no inversions and no idle time have the same maximum lateness
  * 4.9: there is an optimal schedule that has no inversions and no idle time
  * swapping (exchange) argument on pages 129 and 130
  * 4.10: the schedule A produced by the greedy algorithm has optimal maximum lateness L

==== 4.3 Optimal Caching: A More Complex Exchange Argument ====

  * a problem that focuses on a sequence of requests of a different form
  * I don't think we covered this in class and it's not listed on the schedule, so I'm not going to write about it. If you want me to, just let me know!

==== 4.4 Shortest Paths in a Graph ====

Problem: how to find the shortest paths, either from one point to another or from one point to all the others.
  * Dijkstra's algorithm
    * on page 138
  * 4.14: consider the set S at any point in the algorithm's execution. For each node u in S, the path P_u is a shortest s-u path
    * proof on page 139
  * I used Dijkstra's for a project in high school before
  * 4.15: using a priority queue, Dijkstra's algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for the ExtractMin and ChangeKey operations

==== 4.5 The Minimum Spanning Tree Problem ====

  * problem: look for the cheapest way to connect all the vertices
  * 4.16: let T be a minimum-cost solution to the network design problem defined above. Then (V, T) is a tree
  * three algorithms that work:
    * Kruskal's: orders the edges from low cost to high cost and keeps adding the minimum-cost edge that does not create a cycle
    * Prim's: pick a source node and add the node that can be attached most cheaply to the tree. Continue to do this
    * Reverse-Delete: delete the highest-cost edges as long as doing so does not disconnect the graph
  * 4.17: assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V - S. Then every minimum spanning tree contains the edge e
  * on pgs. 146 - 148 the book discusses why the three algorithms above are optimal
    * we went over these in class. The book also discusses the cycle and cut properties we talked about
  * pgs. 149 - 150 talk about the implementation of Prim's algorithm
    * important takeaway: for Prim's, use a priority queue

==== 4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure ====

  * union-find data structure:
    * given a node u, the operation Find(u) returns the name of the set containing u
    * can be used to test whether two nodes are in the same set
    * can merge two sets (Union)
  * use union-find to maintain the connected components of a graph G = (V, E)
    * more info on p. 152
  * 4.23: consider the array implementation of the union-find data structure for some set S of size n, where unions keep the name of the larger set. The Find operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and any sequence of k Union operations takes at most O(k log k) time
    * proof on page 153
  * another implementation of union-find is possible (pointer-based)
  * 4.24: consider the above pointer-based implementation of the union-find data structure for some set S of size n, where unions keep the name of the larger set. A Union operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and a Find operation takes O(log n) time
  * I definitely still have some questions about the union-find data structure, but I am going to go review the slides I missed
  * implementing Kruskal's algorithm:
    * sort the edges by cost, then use union-find to maintain the connected components of (V, T). More details on p. 157; a small sketch follows at the end of this section
  * 4.25: Kruskal's algorithm can be implemented on a graph with n nodes and m edges to run in O(m log n) time
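Since I still have questions about union-find, here is a small Python sketch I put together of how Kruskal's algorithm uses it. It follows the union-by-size idea from 4.24 (no path compression), and the names UnionFind and kruskal_mst are mine, not the book's.

<code python>
class UnionFind:
    """Pointer-based union-find sketch (in the spirit of 4.24): unions keep the
    name of the larger set, so Find follows at most O(log n) parent pointers."""

    def __init__(self, elements):
        # MakeUnionFind(S): every element starts in its own singleton set.
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, u):
        # Follow parent pointers up to the name of u's set.
        while self.parent[u] != u:
            u = self.parent[u]
        return u

    def union(self, a, b):
        # a and b are set names (results of find); point the smaller at the larger.
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]


def kruskal_mst(nodes, edges):
    """Kruskal's algorithm: sort edges by cost and add an edge whenever its
    endpoints lie in different components (checked with union-find).
    edges: list of (cost, u, v) tuples."""
    uf = UnionFind(nodes)
    tree = []
    for cost, u, v in sorted(edges):
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:              # adding (u, v) cannot create a cycle
            uf.union(ru, rv)
            tree.append((u, v, cost))
    return tree

if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    edges = [(1, "a", "b"), (4, "a", "c"), (3, "b", "c"), (2, "c", "d"), (5, "b", "d")]
    print(kruskal_mst(nodes, edges))   # [('a', 'b', 1), ('c', 'd', 2), ('b', 'c', 3)]
</code>

Stopping the loop as soon as only k components remain, instead of running it to completion, is exactly the single-link clustering idea in Section 4.7 below.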
==== 4.7 Clustering ====

The problem:
  * clustering arises when you have a collection of objects and want to stratify them into different groups
  * a common approach is to define a //distance function// on the objects, where the larger the distance the less "related" two objects are. In a physical setting this function could be literal distance
  * given a distance function, clustering seeks to divide the objects into groups so that objects in the same group are close together and objects in different groups are far apart

//Clustering of maximum spacing//:
  * further properties of the distance function: d(i, i) = 0, d(i, j) > 0 for i != j, and d(i, j) = d(j, i)
  * if we want k groups, we seek a //k-clustering//: a partition of the objects into k non-empty sets
  * we want the clustering of maximum possible spacing, meaning the minimum distance between objects in different clusters is as large as possible

Designing the algorithm:
  * we grow a graph on the set of objects as vertices
  * add edges in increasing order of distance; don't add an edge if the two endpoints are already in the same cluster
  * if we add an edge that spans two distinct components, we merge the two clusters (this is called single-link clustering)
  * this is an instance of hierarchical agglomerative clustering
  * we are effectively running Kruskal's minimum spanning tree algorithm, and thus can use the union-find data structure
  * we stop just before adding the last k - 1 edges, so exactly k clusters remain

Analyzing the algorithm:
  * 4.26: the components C_1, C_2, ..., C_k formed by deleting the k - 1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing
    * proof on page 150

==== 4.8 Huffman Codes and Data Compression ====

  * the basic problem of data compression: encoding symbols using bits
  * convert text into binary, aka long strings of bits
  * could assign a fixed number of bits to each letter; with k letters we need about log2(k) bits per letter
  * this might not be the most efficient
  * how do you lower the average number of bits per letter?

Variable-length encoding schemes:
  * ex. Morse code
  * in Morse code the encodings of some letters are prefixes of others; we don't want this (it forces pauses/spaces)

Prefix codes:
  * means that no codeword is a prefix of another, so we don't need spaces
  * ex. a = 00, b = 10, c = 11: the string 110010 is unambiguously "cab"
  * we want an optimal prefix code, i.e., fewer bits for more frequent letters
  * this lowers the average number of bits per letter (ABL)
  * see the definition of encoding length on page 165 (the sum over letters of frequency times codeword length)

Designing the algorithm:
  * represent prefix codes using binary trees
    * each leaf is a letter
    * going left is 0, going right is 1
    * see the example on page 168
  * a binary tree gives a prefix code if all the letters are leaves! (4.27)
  * 4.28: the binary tree corresponding to an optimal prefix code is full
  * top-down approach:
    * different from how we did it in class
    * this isn't optimal
    * see 4.29 and pages 170 - 171
  * now let's find the optimal prefix code algorithm:
    * 4.31: there is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*
    * called Huffman's algorithm: repeatedly merge the two lowest-frequency letters, bottom-up
    * see page 172 (and the small sketch at the end of these notes)
    * we also did a different algorithm in class

Analyzing Huffman's algorithm:
  * proof of optimality on p. 174 (very mathy, uses summation notation rather than words)
  * the second part of the proof uses proof by contradiction
    * see pages 174 - 175
  * running time = O(k log k)

Extensions:
  * can be used with images and for encoding messages
  * I can't remember where or why I went to this (was it at UDel?), but I remember going to a talk and listening to a woman describe how her job was to do this to encode messages in images.
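Here is a small Python sketch of Huffman's bottom-up merging, using a heap keyed by frequency. The function name huffman_code and the demo frequencies are my own choices, and ties may be broken differently than in a hand-worked example, but the resulting ABL is still optimal.

<code python>
import heapq

def huffman_code(freq):
    """Huffman's algorithm (sketch): repeatedly merge the two lowest-frequency
    items, building the prefix-code tree bottom-up.

    freq: dict mapping letters to frequencies.
    Returns a dict mapping letters to bit strings.
    """
    # Each heap entry is (frequency, tie_breaker, tree); a tree is either a
    # letter or a (left, right) pair.  The unique tie_breaker keeps tuple
    # comparison from ever reaching the trees themselves.
    heap = [(f, i, letter) for i, (letter, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)

    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # lowest frequency
        f2, _, t2 = heapq.heappop(heap)   # second lowest - these become siblings (4.31)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1

    code = {}
    def walk(tree, bits):
        if isinstance(tree, tuple):
            walk(tree[0], bits + "0")     # left edge = 0
            walk(tree[1], bits + "1")     # right edge = 1
        else:
            code[tree] = bits or "0"      # single-letter alphabet edge case
    walk(heap[0][2], "")
    return code

if __name__ == "__main__":
    freqs = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
    code = huffman_code(freqs)
    print(code)
    abl = sum(freqs[x] * len(code[x]) for x in freqs)
    print("ABL =", abl)
</code>

With a heap, each of the k - 1 merges costs O(log k), which is where the O(k log k) running time comes from.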