--- courses:cs211:winter2011:journals:chen:chapter_4 [2011/02/28 18:08] – [4.6 Implementing Kruskal’s Algorithm: The Union-Find Data Structure] zhongc
+++ courses:cs211:winter2011:journals:chen:chapter_4 [2011/03/02 03:13] (current) – zhongc
@@ Line 172: / Line 172: @@
 Problem Motivation:
 Try to implement Kruskal's algorithm efficiently
+Pointer-based implementation
 Find Operations
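A minimal sketch of the pointer-based union-find described above, assuming union by size (the smaller set's root is pointed at the larger set's root). Path compression in `find` is an extra optimization added here, not something these notes state. The names `Node`, `find`, and `union` are hypothetical, chosen for illustration.

```python
class Node:
    """A set element; `parent` points toward the set's representative (root)."""

    def __init__(self, name):
        self.name = name
        self.parent = self  # a new node is the root of its own singleton set
        self.size = 1       # only meaningful while this node is a root


def find(node):
    """Return the root of node's set, compressing the path on the way."""
    root = node
    while root.parent is not root:
        root = root.parent
    # Path compression: point every node on the path directly at the root.
    while node.parent is not root:
        nxt = node.parent
        node.parent = root
        node = nxt
    return root


def union(a, b):
    """Merge the sets containing a and b; the smaller tree goes under the larger."""
    ra, rb = find(a), find(b)
    if ra is rb:
        return ra
    if ra.size < rb.size:
        ra, rb = rb, ra
    rb.parent = ra
    ra.size += rb.size
    return ra
```

Usage: create one `Node` per graph vertex; Kruskal's algorithm adds edge (u, v) only when `find(u) is not find(v)`, then calls `union(u, v)`.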
@@ Line 183: / Line 185: @@
+Interesting/readable: 5/5
+===== 4.7 Clustering =====
+Section Summary
+Definition: divide objects into clusters so that points in different clusters are far apart.
+Motivation/Application
+Identify patterns in gene expression
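+k-clustering of maximum spacing: the spacing of a k-clustering is the minimum distance between any pair of points lying in different clusters; the goal is to find a k-clustering whose spacing is as large as possible.
+The algorithm is essentially Kruskal's algorithm, stopped as soon as there are k connected components (equivalently, build a minimum spanning tree and delete its k-1 most expensive edges).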
+Proof: see p. 184.
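A rough sketch of the "Kruskal's, stopped at k components" idea, assuming the points form a complete graph whose edge weights come from a caller-supplied distance function. For brevity it uses a dictionary-based union-find with path halving instead of the pointer-based version; `k_clustering` and its parameters are hypothetical names.

```python
def k_clustering(points, k, dist):
    """Single-linkage k-clustering: Kruskal's algorithm stopped at k components.

    points: list of hashable items; dist(p, q) returns a number.
    Returns (clusters, spacing); spacing is the smallest distance between two
    points in different clusters (None if k == 1).
    """
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    # All pairwise edges, cheapest first, as in Kruskal's algorithm.
    edges = sorted(
        ((dist(p, q), p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        key=lambda e: e[0],
    )
    components = len(points)
    spacing = None
    for d, p, q in edges:
        rp, rq = find(p), find(q)
        if rp == rq:
            continue  # edge lies inside an existing cluster
        if components == k:
            spacing = d  # first edge we refuse to add is the spacing
            break
        parent[rq] = rp
        components -= 1

    clusters = {}
    for p in points:
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values()), spacing


clusters, spacing = k_clustering([1, 2, 3, 10, 11, 20], 3, lambda a, b: abs(a - b))
print(clusters, spacing)  # [[1, 2, 3], [10, 11], [20]] 7
```

The returned spacing is the first inter-cluster edge Kruskal's algorithm would have added next; the optimality proof shows no other k-clustering can do better.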
+Interesting/readable: 7/7
+===== 4.8 Huffman Codes and Data Compression =====
+Summary
+Motivation:
+How can we encode data so that transmission is more efficient?
+Answer: use fewer bits for frequent letters and more bits for rare ones; the more uneven the letter frequencies (the lower the entropy), the more we can compress.
+We use Huffman codes.
+If all letters occur with the same frequency, there is little to gain from compression; but when they do not, which is often the case,
+we can represent the more frequently used letters with fewer bits.
+Watch out for prefix ambiguity: use a prefix code, in which no codeword is a prefix of another, so that every bit string decodes in only one way.
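To illustrate why the prefix property matters, here is a small decoder sketch. It reads bits left to right and emits a letter as soon as the buffered bits match a codeword; with a prefix code that first match is the only possible one. The function `decode` and the five-letter code table are hypothetical examples.

```python
def decode(bits, code):
    """Decode a string of '0'/'1' using a prefix code given as {letter: codeword}."""
    inverse = {word: letter for letter, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        # Prefix property: once `current` is a codeword, no longer codeword
        # can start with it, so emitting the letter now is safe.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


code = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
print(decode("0010001110", code))  # cead
```

With a non-prefix code such as {a: 0, b: 01, c: 1}, the bits 01 could mean either "ac" or "b", and this greedy decoder would silently pick "ac".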
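+**Variable-length encoding**
+Goal: minimize the average number of bits per letter (ABL).
+ABL is the expected codeword length: ABL(γ) = Σ_x f_x · |γ(x)|, where f_x is the frequency of letter x and |γ(x)| is the length of its codeword.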
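A short worked example of the ABL formula, computed in code. The frequencies and the prefix code are hypothetical values chosen for illustration; `abl` is a helper name introduced here.

```python
def abl(freq, code):
    """Average bits per letter: sum over letters x of f_x * |code[x]|.

    freq: {letter: frequency}, with frequencies summing to 1.
    code: {letter: codeword as a string of '0'/'1'}.
    """
    return sum(f * len(code[x]) for x, f in freq.items())


# Hypothetical five-letter alphabet and prefix code, for illustration only.
freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
code = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
print(round(abl(freq, code), 2))  # 2.25
```

For comparison, a fixed-length code for five letters needs 3 bits per letter, so this variable-length code saves 0.75 bits per letter on average.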
+Algorithm
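+Implementation:
+  * A binary tree representing the prefix code (letters at the leaves; a left edge is 0, a right edge is 1)
+  * A priority queue (heap) keyed on frequency, from which the two lowest-frequency nodes are repeatedly extracted and merged
+Cost: O(n log n) for an alphabet of n letters. Why? Each heap insertion or extraction takes O(log n) time, and the n - 1 merges perform O(n) of them.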
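A minimal sketch of the heap-based construction listed above, using Python's `heapq` as the priority queue. Internal tree nodes are represented as `(left, right)` tuples, which assumes the letters themselves are strings rather than tuples. A running counter breaks frequency ties so the heap never has to compare trees. `huffman` is a hypothetical name.

```python
import heapq
from itertools import count


def huffman(freq):
    """Build a Huffman prefix code from {letter: frequency}; returns {letter: codeword}.

    Assumes letters are strings (tuples are used for internal tree nodes).
    """
    if not freq:
        return {}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), letter) for letter, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate alphabet: a single letter still needs a 1-bit codeword.
        return {heap[0][2]: "0"}
    # n - 1 merges, each doing two pops and one push: O(n log n) overall.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    code = {}

    def assign(tree, prefix):
        # Walk the tree: left edge appends '0', right edge appends '1'.
        if isinstance(tree, tuple):
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:
            code[tree] = prefix

    assign(heap[0][2], "")
    return code


print(huffman({"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}))
```

On these hypothetical frequencies the two rarest letters (d, e) get 3-bit codewords and the rest get 2 bits, for an ABL of 2.23.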
+Interesting/readable: 5/8