courses:cs211:winter2012:journals:carrie:ch4 [2012/03/02 16:51] (current) – hopkinsc
====== Chapter 4 ======
Greedy Algorithms
+ | |||
+ | " | ||
+ | |||
+ | Greed algorithm definition: | ||
+ | * Hard to define | ||
+ | * builds up a solution in small steps. | ||
+ | * need to prove the greedy algorithm is optimal. | ||
+ | |||
+ | Two methods of proof | ||
+ | * greedy algorithm stays ahead (4.1) | ||
+ | * Exchanage argument | ||
+ | |||
+ | All this leads up to minimum spanning tree! | ||
==== 4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead ====
Greedy rule that works: always pick the compatible interval with the earliest finish time.
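The stays-ahead rule can be sketched in a few lines. This is a minimal sketch, not the book's pseudocode; the interval data in the example is made up.

```python
def interval_schedule(intervals):
    """Return a maximum set of mutually compatible intervals.

    intervals: list of (start, finish) pairs.
    """
    selected = []
    last_finish = float("-inf")
    # Sort by finish time, then greedily take each interval that is
    # compatible with the last one selected.
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected

print(interval_schedule([(1, 4), (3, 5), (0, 6), (5, 7),
                         (3, 9), (5, 9), (6, 10), (8, 11)]))
# → [(1, 4), (5, 7), (8, 11)]
```

Sorting dominates, so the running time is O(n log n).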
  * 4.25 Kruskal's Algorithm produces a minimum spanning tree.

==== 4.7 Clustering ====
The problem
  * clustering arises when you have a collection of objects and want to stratify them into coherent groups.
  * a common approach is to define a //distance function// on the objects, where a larger distance means the objects are less "similar."
  * given a distance function, clustering seeks to divide the objects into groups so that objects in the same group are close together and objects in different groups are far apart.

//Clusterings of maximum spacing//
  * properties of the distance function: d(i,i) = 0, d(i,j) > 0 for i ≠ j, and d(i,j) = d(j,i).
  * if we want k groups, we need a //k-clustering//: a partition of the objects into k non-empty sets.
  * we also want the clustering with maximum possible spacing, i.e. we want the clusters to be as far apart from each other as possible.

Designing the algorithm:
  * we grow a graph on the set of objects, treating each object as a vertex
  * add edges in increasing order of distance; don't add an edge if its two endpoints are already in the same cluster
  * when we add an edge that spans two distinct components, we merge the two clusters (this is called //single-link clustering//)
  * this is an instance of hierarchical agglomerative clustering
  * we are effectively running Kruskal's algorithm
  * we stop just before adding the last k-1 edges, leaving k components

Analyzing the algorithm
  * 4.26: The components C1, C2, ..., Ck formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.
  * proof on page 150.

==== 4.8 Huffman Codes and Data Compression ====
  * the basic problem of data compression

Encoding symbols using bits
  * convert text into binary, i.e. long strings of bits
  * could assign a fixed number of bits to each letter: with k letters, each needs ceil(log2 k) bits
  * this might not be the most efficient approach
  * how do you lower the average number of bits per letter?

Variable-length encoding schemes
  * ex. Morse code
  * in Morse code some letters' encodings are prefixes of others, which makes decoding ambiguous; we don't want this

Prefix codes
  * no codeword is a prefix of another, so no spaces are needed between letters.
  * ex. a = 00, b = 10, c = 11: the string 110010 is clearly "cab."
  * we want an //optimal// prefix code, i.e. fewer bits for more frequent letters.
  * this lowers the ABL (average bits per letter).
  * see def. of encoding length on page 165 (sum over letters of frequency * length of encoding).

Designing the algorithm
  * representing prefix codes using binary trees
    * each leaf is a letter
    * going left is 0, going right is 1
    * see ex. on 168
    * gives a prefix code exactly when all the letters are leaves! (4.27)
    * 4.28: the binary tree for an optimal prefix code is full
  * top-down approach
    * different from how we did it in class
    * this isn't optimal
    * see 4.29 and pages 170-171
  * now let's find the optimal prefix code algorithm
    * 4.31: there is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*
    * called Huffman's Algorithm
    * see page 172
    * we also did a different algorithm in class

Analyzing Huffman's Algorithm
  * proof of optimality on p. 174 (very mathy; uses summation notation rather than words)
  * second part of the proof is a proof by contradiction
  * see pages 174-175
  * running time = O(k log k) for an alphabet of k letters

Extensions
  * can be used with images and for encoding messages.
  * I can't remember where or why I went to this (was it at UDEL?), but I remember going to a talk and listening to a woman describe how her job was to do this to encode messages in images.