===== 4.6 – Implementing Kruskal's Algorithm: The Union-Find Data Structure =====
  
A Union-Find data structure is a special one created for Kruskal's Algorithm, which stores a representation of the components in a way that supports rapid searching and updating. The data structure can test whether two nodes are in the same set by comparing the results of the Find operation on both. It can also merge two sets into a single set with the Union operation. The goal is for Find and Union to both run in O(logn) time. We can use an array implementation so that Find takes O(1), MakeUnionFind takes O(n), and any sequence of k Union operations takes O(klogk) time. However, a better representation is a pointer-based one. This way, we get Union operations in O(1), MakeUnionFind in O(n), and Find operations in O(logn).
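
To make this concrete, here's a minimal Python sketch of the pointer-based representation (the class and method names are my own, not from the text; the book's Union takes two set names directly, so the Find calls inside union below are just a convenience):

<code>
class UnionFind:
    def __init__(self, elements):
        # MakeUnionFind(S): every element starts as its own set, O(n)
        self.parent = {v: v for v in elements}
        self.size = {v: 1 for v in elements}

    def find(self, v):
        # Follow parent pointers up to the set's name. This is O(logn)
        # because union always points the smaller set at the larger one.
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def union(self, a, b):
        # Merge the sets containing a and b; O(1) once the names are known
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
</code>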
  
Overall this section was pretty readable and interesting. 7/10 for both.

===== 4.7 – Clustering =====

Clustering happens when you're trying to classify a collection of objects into distinct groups. One way to divide them up is to use the distances between objects, keeping nearby objects together so that the spacing (the minimum distance between points in different clusters) is as large as possible. However, a set has exponentially many possible clusterings; the issue is finding the one with the maximum spacing. The procedure is the same as Kruskal's minimum spanning tree algorithm, except we stop the procedure when we obtain //k// connected components. This is the same as deleting the //k//-1 most expensive edges from the minimum spanning tree and returning the resulting components.
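
Here's a short sketch of that procedure in Python, reusing the UnionFind class from section 4.6 above (the (distance, u, v) edge-list format is my own assumption):

<code>
def k_clusters(nodes, edges, k):
    # Kruskal's algorithm, stopped as soon as only k components remain;
    # edges is assumed to be a list of (distance, u, v) tuples
    uf = UnionFind(nodes)
    components = len(nodes)
    for dist, u, v in sorted(edges):
        if components == k:
            break
        if uf.find(u) != uf.find(v):
            uf.union(u, v)
            components -= 1
    # Collect the resulting clusters, grouped by set name
    clusters = {}
    for v in nodes:
        clusters.setdefault(uf.find(v), []).append(v)
    return list(clusters.values())
</code>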

Overall this section was very interesting and very readable to me: 10/10.

===== 4.8 – Huffman Codes and Data Compression =====

The goal of Huffman Codes is lossless data compression. The original top-down approach created by Shannon and Fano isn't guaranteed to always produce an optimal code. So, Huffman came up with a bottom-up approach that did. The idea is to give the most frequently used characters the shortest codes in order to optimize the overall compression. Furthermore, you don't want one character's code to be a prefix of another's, or you will get multiple ways to interpret data. To do this, we must minimize the average number of bits per letter, ABL(y) = Σ_{x in S} f_x·|y(x)|, where f_x is the frequency of letter x and |y(x)| is the length of its codeword.
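
As a quick sanity check on that formula, here's a tiny Python example (the frequencies and the prefix code are made up purely for illustration):

<code>
freqs = {"a": 0.5, "b": 0.3, "c": 0.2}    # hypothetical letter frequencies
code  = {"a": "0", "b": "10", "c": "11"}  # a hypothetical prefix code
# ABL = the sum over x in S of f_x * |code(x)|
abl = sum(freqs[x] * len(code[x]) for x in freqs)
print(abl)  # 0.5*1 + 0.3*2 + 0.2*2 = 1.5 bits per letter
</code>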

Using a Priority Queue, we can get Huffman's algorithm to run in O(klogk):
<code>
if S has two letters:
  Encode one letter as 0 and the other letter as 1
else:
  Let y* and z* be the two lowest-frequency letters
  Form a new alphabet S' by deleting y* and z* and replacing
    them with a new letter w of frequency fy* + fz*
  Recursively construct a prefix code y' for S' with tree T'
  Define a prefix code for S as follows:
    Start with T'
    Take the leaf labeled w and add two children below it
      labeled y* and z*
</code>
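
And here's a runnable Python sketch of the same bottom-up idea, using the heapq module as the priority queue (the function name and the nested-tuple tree representation are my own choices):

<code>
import heapq
from itertools import count

def huffman_code(freqs):
    # freqs maps each letter to its frequency. Each heap entry is
    # (frequency, tie-breaker, tree), where a tree is either a letter
    # or a (left, right) pair of subtrees.
    tie = count()  # keeps the heap from ever comparing two trees
    heap = [(f, next(tie), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Extract the two lowest-frequency letters y* and z* ...
        fy, _, y = heapq.heappop(heap)
        fz, _, z = heapq.heappop(heap)
        # ... and replace them with a new letter w of frequency fy* + fz*
        heapq.heappush(heap, (fy + fz, next(tie), (y, z)))
    # Read the codes off the final tree: left edges are 0, right are 1
    codes = {}
    def walk(tree, bits):
        if isinstance(tree, tuple):
            walk(tree[0], bits + "0")
            walk(tree[1], bits + "1")
        else:
            codes[tree] = bits or "0"  # one-letter alphabet edge case
    walk(heap[0][2], "")
    return codes

# e.g. huffman_code({"a": 0.5, "b": 0.3, "c": 0.2})
#      returns {"a": "0", "c": "10", "b": "11"}
</code>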

I'd give this section a 10/10 on readability and interestingness. I was very interested in the topic, and it was explained well.