courses:cs211:winter2012:journals:carrie:ch4 [2012/03/02 16:51] (current) – hopkinsc
====== Chapter 4 ======
Greedy Algorithms
+ | |||
+ | " | ||
+ | |||
+ | Greed algorithm definition: | ||
+ | * Hard to define | ||
+ | * builds up a solution in small steps. | ||
+ | * need to prove the greedy algorithm is optimal. | ||
+ | |||
+ | Two methods of proof | ||
+ | * greedy algorithm stays ahead (4.1) | ||
+ | * Exchanage argument | ||
+ | |||
+ | All this leads up to minimum spanning tree! | ||
==== 4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead ====
Greedy rule that works: always pick the compatible interval with the earliest finish time.
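The stays-ahead rule can be sketched in a few lines. This is a minimal sketch, not the book's pseudocode; the interval data in the example is made up.

```python
def interval_schedule(intervals):
    """Return a maximum set of mutually compatible intervals.

    intervals: list of (start, finish) pairs.
    """
    selected = []
    last_finish = float("-inf")
    # Sort by finish time, then greedily take each interval that is
    # compatible with the last one selected.
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected

print(interval_schedule([(1, 4), (3, 5), (0, 6), (5, 7),
                         (3, 9), (5, 9), (6, 10), (8, 11)]))
# → [(1, 4), (5, 7), (8, 11)]
```

Sorting dominates, so the running time is O(n log n).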
  * 4.25 Kruskal's Algorithm produces a minimum spanning tree.

==== 4.7 Clustering ====
The problem
  * clustering arises when you have a collection of objects and want to stratify them into coherent groups.
  * a common approach is to define a //distance function// on the objects, where a larger distance means the objects are less "similar."
  * given a distance function, clustering seeks to divide the objects into groups so that objects in the same group are close together and objects in different groups are far apart.

//Clusterings of maximum spacing//
  * properties of the distance function: d(i,i) = 0, d(i,j) > 0 for i ≠ j, and d(i,j) = d(j,i).
  * if we want k groups, we need a //k-clustering//: a partition of the objects into k non-empty sets.
  * we also want the clustering with maximum possible spacing, i.e. we want the clusters to be as far apart from each other as possible.

Designing the algorithm:
  * we grow a graph on the set of objects, treating each object as a vertex
  * add edges in increasing order of distance; don't add an edge if its two endpoints are already in the same cluster
  * when we add an edge that spans two distinct components, we merge the two clusters (this is called //single-link clustering//)
  * this is an instance of hierarchical agglomerative clustering
  * we are effectively running Kruskal's algorithm
  * we stop just before adding the last k-1 edges, leaving k components

Analyzing the algorithm
  * 4.26: The components C1, C2, ..., Ck formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.
  * proof on page 150.

==== 4.8 Huffman Codes and Data Compression ====
  * the basic problem of data compression

Encoding symbols using bits
  * convert text into binary, i.e. long strings of bits
  * could assign a fixed number of bits to each letter: with k letters, each needs ceil(log2 k) bits
  * this might not be the most efficient approach
  * how do you lower the average number of bits per letter?

Variable-length encoding schemes
  * ex. Morse code
  * in Morse code some letters' encodings are prefixes of others, which makes decoding ambiguous; we don't want this

Prefix codes
  * no codeword is a prefix of another, so no spaces are needed between letters.
  * ex. a = 00, b = 10, c = 11: the string 110010 is clearly "cab."
  * we want an //optimal// prefix code, i.e. fewer bits for more frequent letters.
  * this lowers the ABL (average bits per letter).
  * see def. of encoding length on page 165 (sum over letters of frequency * length of encoding).

Designing the algorithm
  * representing prefix codes using binary trees
    * each leaf is a letter
    * going left is 0, going right is 1
    * see ex. on 168
    * gives a prefix code exactly when all the letters are leaves! (4.27)
    * 4.28: the binary tree for an optimal prefix code is full
  * top-down approach
    * different from how we did it in class
    * this isn't optimal
    * see 4.29 and pages 170-171
  * now let's find the optimal prefix code algorithm
    * 4.31: there is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*
    * called Huffman's Algorithm
    * see page 172
    * we also did a different algorithm in class

Analyzing Huffman's Algorithm
  * proof of optimality on p. 174 (very mathy; uses summation notation rather than words)
  * second part of the proof is a proof by contradiction
  * see pages 174-175
  * running time = O(k log k) for an alphabet of k letters

Extensions
  * can be used with images and for encoding messages.
  * I can't remember where or why I went to this (was it at UDEL?), but I remember going to a talk and listening to a woman describe how her job was to do this to encode messages in images.