Differences

This shows you the differences between two versions of the page.

--- courses:cs211:winter2012:journals:jeanpaul:chapterfour_sectionviii [2012/03/06 04:03] – [The Problem] mugabej
+++ courses:cs211:winter2012:journals:jeanpaul:chapterfour_sectionviii [2012/03/06 05:27] (current) – [Implementation and Running Time] mugabej
@@ Line 13: / Line 13: @@
 \\
 ** Variable-Length Encoding Schemes**:
   * Basically more frequent letters are represented differently from the less frequent ones.
+  * The goal is to represent symbols with as short bits as possible
+\\
+** Prefix codes**:
+  * It's the encoding of symbols in such a way that no encoding is a prefix of any other
+  * A //Prefix code// for a set S of letters is a function γ that maps each letter x∈ S to some sequence of zeros and ones such that:
+    * For distinct x,y ∈ S, the sequence γ(x) is not a prefix of the sequence γ(y).
+\\
+** Optimal Prefix Codes **:
+  * For each letter x in S, there is a frequency f<sub>x</sub>, that represents the fraction of letters in the text that are equal to x
+  * Assuming there are n letters, n*f<sub>x</sub> of these letters equal to x
+  * The frequencies sum to 1 -->∑<sub>x ∈ S</sub> f<sub>x</sub> = 1
+  * |γ(x)|: Length of encoding of x
+  * ABL = Σ<sub>x in S</sub> f<sub>x</sub>*|γ(x)|
+  * ABL is the Average number of Bits per Letter
+\\
+\\
+==== Designing the Algorithm ====
+  * The search space of the problem includes all of the possible ways of mapping letters to bit strings
+  * We need to develop a tree-based means of representing prefix codes that exposes their structure more clearly
+\\
+\\
+** Representing Prefix Codes using Binary Trees **
+  * We label each leaf of the binary tree with a distinct letter in the size of the alphabet S
+  * For each letter x in S, we follow the path from the root to the leaf labeled x:
+    * Each time the path goes from a node to its left child, we write down a 0
+    * Each time the path goes from a node to its right child, we write down a 1
+  * The Encoding of S constructed from T is a prefix code
+  * The search for an optimal prefix code reduces to the search of a binary Tree T, together with a labeling of the leaves of T, that minimizes the ABL.
+  * Thus the length of the encoding of a letter x in S is simply the length from the root to the leaf labeled x: This length is termed the ** depth** of the leaf, and the depth of a leaf v in T will be denoted by depth<sub>T</sub>(v).
+  * So now our goal is to find the labeled tree T that minimizes the weighted average of the depths of all leaves, where the average is weighted by the frequencies of the letters that label the leaves: ∑<sub>x ∈ S</sub> f<sub>x</sub>*depth<sub>T</sub>(x).--> ABL
+  * The Binary Tree corresponding to the optimal prefix code is full
+==== Algorithm to Construct an Optimal Prefix Code ====
+>>>>>>>>>>>>>>>>> To construct a prefix code for an alphabet S, with given frequencies:\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If S has 2 letters:\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Encode one letter using 0 and the other using 1
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Else:\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let y* and z* be the 2 lowest-frequency letters\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Form a new alphabet S' by deleting y* and z* and replacing them with\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a new letter ω of frequency f<sub>y*</sub>+ f<sub>z*</sub>\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Recursively Construct a prefix code y' for S', with tree T'\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Define a prefix code for S as follows:\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Start with T'\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Take the leaf labeled ω and add two children below it labeled y* and z*\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> End if\\
+\\
+  * The algorithm above is referred to as the //Huffman Algorithm// and the prefix code it produces is termed the //Huffman code//
+  * This algorithm produces an optimal prefix code
+  * This algorithm also follows a bottom-up approach as seen: It focuses on the leaves representing the two lowest-frequency letters, and then continues by recursion.
+\\
+\\
+==== Analyzing the Algorithm ====
+  * The algorithm is optimal
+  * ABL(T') = ABL(T) - f<sub>ω</sub>
+  * The Huffman code for a given alphabet achieves the minimum ABL of any prefix code
+==== Implementation and Running Time ====
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Create a leaf node for each symbol, labeled by its frequency, and add to a\\ queue(PQ).Takes O(nlogn)\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> While there is more than one node in the queue: Takes O(n)\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Remove the two nodes of lowest frequency. Takes O(logn)\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Create a new internal node with these two nodes as children and\\ with frequency equal to the sum of the two nodes' probabilities. Takes O(1) time. \\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Add the new node to the queue Takes O(logn)\\
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The remaining node is the tree’s root node.\\
+\\
+So overall takes O(nlogn).\\
+\\
+Very intriguing section although its proofs were long. I give it an 8/10.