Differences

This shows you the differences between two versions of the page.

--- courses:cs211:winter2018:journals:donohuem:chapter4 [2018/03/05 23:49] – donohuem
+++ courses:cs211:winter2018:journals:donohuem:chapter4 [2018/03/11 21:59] (current) – [4.8 Huffman Codes] donohuem
@@ Line 2: / Line 2: @@
-===== 4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead =====
+===== 4.1 Interval Scheduling =====
 The motivation for interval scheduling is to schedule as many jobs as possible with no overlap between the intervals of time for each job. When designing a greedy algorithm for this problem, we base the algorithm on one simple rule; e.g. selecting jobs that start the earliest, or jobs with smallest interval times. In this problem, the simple rule that produces an optimal solution is selecting jobs that finish the earliest. The greedy algorithm looks like such:
@@ Line 18: / Line 18: @@
-===== 4.2 Scheduling to Minimize Lateness: An Exchange Argument =====
+===== 4.2 Scheduling to Minimize Lateness =====
 The motivation for this problem is to schedule jobs with corresponding deadlines in a way such that there is minimal lateness. Jobs are allowed to be late (scheduled so that they will end past their deadline), but this must happen as little as possible. Again, we must design a greedy algorithm based on a simple rule; e.g. schedule longest jobs first. The best rule in this problem is earliest job first. Algorithm:
@@ Line 47: / Line 47: @@
-===== 4.7 Implementing Kruskal's Algorithm =====
+===== 4.7 Clustering =====
+Motivation: We wish to organize groups of data into coherent groups. One such way to categorize data, in the context of geographical points, is differentiating clusters by distance. This gives way to the Clusterings of Maximum Spacing problem. Given a set of U objects, we divide U into k groups. The spacing of a k-clustering is defined as the minimum distance between any pair of points lying in different clusters (this makes absolutely no sense to me). The algorithm is similar to the Minimum Spanning Tree algorithms, specifically Kruskal's. Readability: 5/10.
+===== 4.8 Huffman Codes =====
+Our problem is to encode symbols (letters of the alphabet) into bits. However, there are two important stipulations we must meet. First, we want our encoding to be as concise as possible (minimize ABL). This means that frequently used letters like e, t, a, o, and i should be smaller in size than less frequently used letters. Second, there should be no ambiguity in our encoding. Encoded symbols should be unique and not confusable with other symbols. To meet these two requirements, we use //Optimal Prefix Codes//, where each prefix of an encoded letter is distinct and the size of each prefix corresponds to a letter's frequency. An algorithm that creates optimal prefix codes would utilize a binary tree. More specifically, we would take the two most frequent letters, make them leaf nodes in the tree, have their parent be their combined frequencies, then recursively repeat for the remaining letters. Huffman's Algorithm:
+     if S has two letters:
+        Encode one letter as 0 and the other letter as 1
+     else:
+        Let y* and z* be the two lowest-frequency letters
+        Form a new alphabet S’ by deleted y* and z* and replacing
+         them with a new letter w of freq fy* + fz*
+        Recursively construct a prefix code y’ for S’ with tree T’
+        Define a prefix code for S as follows:
+            Start with T’
+            Take the leaf labeled w and add two children below it
+             labeled y* and z*
+The run time of this algorithm can be O(klogk) when it uses a priority queue, where k is the number of letters in the alphabet. The readability of this section is 6/10.