====== Chapter 4 ======
Greedy Algorithms

"Greed... is good. Greed is right. Greed works." - Wall Street

Greedy algorithm definition:
  * hard to define precisely
  * builds up a solution in small steps
  * need to prove the greedy algorithm is optimal

Two methods of proof:
  * the greedy algorithm stays ahead (4.1)
  * exchange argument

All this leads up to the minimum spanning tree problem!
==== 4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead ====
Greedy rule that works
        * 4.25: Kruskal's algorithm can be implemented on a graph with n nodes and m edges to run in O(m log n) time (sketch below)
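
A minimal sketch of Kruskal's algorithm with a simple union-find (the node labels and toy graph are made up for illustration). Sorting the m edges dominates the work, which is where the O(m log n) bound comes from; the union-find here uses path compression only, which is enough for the sketch.

<code python>
def kruskal(n, edges):
    """n = number of nodes labeled 0..n-1, edges = list of (weight, u, v) tuples."""
    parent = list(range(n))

    def find(x):
        # walk up to the component's representative, compressing the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):          # sorting the edges dominates: O(m log n)
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge connects two different components
            parent[ru] = rv                # union the two components
            mst.append((u, v, w))
    return mst

# tiny made-up example graph
print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
</code>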
  

==== 4.7 Clustering ====
The problem
  * clustering arises when you have a collection of objects and want to stratify them into different groups.
  * a common approach is to define a //distance function// on the objects, where a larger distance means the objects are less "related" to each other. In a physical setting this function could be literal distance.
  * given a distance function, the clustering approach seeks to divide the objects into groups so that objects in the same group are close together and objects in different groups are far apart.

//clustering of maximum spacing//
  * further properties of the distance function: d(i,i) = 0, d(i,j) > 0 for distinct i and j, and d(i,j) = d(j,i)
  * if we want k groups, we seek a k-clustering: a partition of the objects into k non-empty groups.
  * we also want the clustering of maximum possible spacing, i.e. the minimum distance between points in different clusters should be as large as possible.

Designing the algorithm:
  * we grow a graph on the set of objects as vertices
    * add edges in increasing order of distance; don't add an edge if its two endpoints are already in the same cluster.
    * if an edge spans two distinct components, adding it merges the two clusters (call this single-link clustering).
    * this is an instance of hierarchical agglomerative clustering.
    * it is exactly Kruskal's minimum spanning tree algorithm, so we can use the union-find data structure.
        * we stop just before adding the last k-1 edges; a sketch follows after this list.
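
A minimal sketch of this single-link clustering, assuming the same Kruskal-style union-find as above; the point labels, distances, and function name are made up for illustration. It sorts edges by distance and stops merging once only k clusters remain, which is the same as stopping just before the last k-1 MST edges.

<code python>
def k_clustering(points_count, dist_edges, k):
    """points labeled 0..points_count-1, dist_edges = list of (distance, i, j), k = desired clusters."""
    parent = list(range(points_count))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    clusters = points_count
    for d, i, j in sorted(dist_edges):     # increasing order of distance
        if clusters == k:                  # stop just before the last k-1 MST edges
            break
        ri, rj = find(i), find(j)
        if ri != rj:                       # skip edges whose endpoints share a cluster
            parent[ri] = rj
            clusters -= 1

    groups = {}
    for p in range(points_count):
        groups.setdefault(find(p), []).append(p)
    return list(groups.values())

# made-up distances between 5 points, asking for 2 clusters
print(k_clustering(5, [(1, 0, 1), (2, 1, 2), (9, 2, 3), (1, 3, 4)], 2))
</code>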

Analyzing the algorithm
  * 4.26: The components C1, C2, ..., Ck formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.
  * proof on page 150.

==== 4.8 Huffman Codes and Data Compression ====
  * basic problems of data compression

Encoding symbols using bits
  * convert text into binary, i.e. long strings of bits
  * could assign a fixed number of bits to each letter: with k letters you need the ceiling of log2(k) bits per letter (see the quick check below).
  * this might not be the most efficient.
  * how do you lower the average number of bits per letter?
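
A quick check of the fixed-length point (the alphabet sizes below are arbitrary): with b bits per letter you can tell apart 2^b letters, so k letters need the ceiling of log2(k) bits each.

<code python>
import math

# Fixed-length encoding: k distinct letters need ceil(log2(k)) bits per letter.
for k in (2, 5, 26, 32):
    print(k, "letters ->", math.ceil(math.log2(k)), "bits per letter")
</code>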

Variable-length encoding schemes
  * ex. Morse code
  * in Morse code some letters' encodings are prefixes of other letters' encodings, which creates ambiguity; we don't want this.

Prefix codes
  * no letter's encoding is a prefix of another's, so no separators are needed.
  * ex. a = 00, b = 10, c = 11. 110010 is unambiguously "cab".
  * we want an optimal prefix code, i.e. fewer bits for more frequent letters.
      * this lowers the ABL (average bits per letter).
  * see the definition of encoding length on page 165: the sum over all letters of frequency times the length of the letter's encoding. A small worked example follows this list.
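
A small worked sketch using the a/b/c code above (the frequencies are made up): prefix-freeness lets us decode a bit string greedily from left to right, and the ABL is the frequency-weighted sum of codeword lengths.

<code python>
# The a/b/c prefix code from above; frequencies are made up for illustration.
code = {"a": "00", "b": "10", "c": "11"}
freq = {"a": 0.5, "b": 0.3, "c": 0.2}

decode_map = {bits: letter for letter, bits in code.items()}

def decode(bitstring):
    """Greedy left-to-right decoding; unambiguous because no codeword is a prefix of another."""
    out, current = [], ""
    for bit in bitstring:
        current += bit
        if current in decode_map:
            out.append(decode_map[current])
            current = ""
    return "".join(out)

print(decode("110010"))   # -> cab

# Encoding length / ABL: sum of each letter's frequency times its codeword length.
abl = sum(freq[letter] * len(bits) for letter, bits in code.items())
print(abl)                # 2.0 here, since every codeword has length 2
</code>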

Designing the algorithm
  * representing prefix codes using binary trees
    * each leaf is a letter
    * going left is 0, going right is 1
    * see ex. on page 168
    * a tree gives a prefix code exactly when all the letters are leaves! (4.27)
    * 4.28: the binary tree corresponding to an optimal prefix code is full
  * top-down approach
    * different from how we did it in class
    * this isn't optimal
    * see 4.29 and pages 170-171
  * now let's find the optimal prefix code algorithm
    * 4.31: there is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*
    * the resulting algorithm is Huffman's algorithm
    * see page 172
    * we also did a different algorithm in class; a sketch of Huffman's algorithm follows below
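
A minimal sketch of Huffman's algorithm using a binary heap (the letter frequencies below are made up): repeatedly merge the two lowest-frequency subtrees, which is exactly where 4.31 comes in, then read codewords off the tree with left = 0 and right = 1.

<code python>
import heapq

def huffman(freq):
    """freq: dict letter -> frequency; returns dict letter -> codeword."""
    # heap entries are (frequency, tie_breaker, subtree);
    # a subtree is either a letter or a (left, right) pair
    heap = [(f, i, letter) for i, (letter, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)

    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)           # the two lowest-frequency subtrees...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))   # ...become siblings (cf. 4.31)
        counter += 1

    codes = {}
    def walk(node, prefix):
        # left edge = 0, right edge = 1; leaves are letters
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"             # handle a one-letter alphabet
    walk(heap[0][2], "")
    return codes

# made-up frequencies
print(huffman({"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}))
</code>

Each of the k-1 merges does a constant number of heap operations at O(log k) apiece, which matches the O(k log k) running time noted below.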

Analyzing Huffman's algorithm
  * proof of optimality on p. 174 (very mathy, uses summation notation rather than words)
  * the second part of the proof is a proof by contradiction.
  * see pages 174-175
  * running time: O(k log k) for an alphabet of k letters

Extensions
  * can be used with images and for encoding messages.
  * I can't remember where or why I went to this (was it at UDel?), but I remember going to a talk and listening to a woman describe how her job was to do this to encode messages in images.
  
  
  
  