Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
courses:cs211:winter2018:journals:patelk:chapter4 [2018/03/10 16:37] – [4.8 Huffman Codes and Data COmpression] patelkcourses:cs211:winter2018:journals:patelk:chapter4 [2018/03/10 20:29] (current) – [4.8 Huffman Codes and Data COmpression] patelk
Line 345: Line 345:
 Readability: 7 Interesting: 7 Readability: 7 Interesting: 7
  
-===== 4.8 Huffman Codes and Data COmpression =====+===== 4.8 Huffman Codes and Data Compression =====
  
   * Greedy rule to shrink the size of the problem instance so that an equivalent smaller problem can then be solved by recursion   * Greedy rule to shrink the size of the problem instance so that an equivalent smaller problem can then be solved by recursion
Line 386: Line 386:
         * if u = root, delete u and use v as root         * if u = root, delete u and use v as root
         * otherwise, let w be the parent of u in T, delete u, and make v be a child of w in place of u         * otherwise, let w be the parent of u in T, delete u, and make v be a child of w in place of u
 +  * __A First Attempt: The Top-Down Approach__
 +    * Goal: produce a labeled binary tree in which the leaves are as close to the root as possible
 +    * Pack leaves as tightly as possible: want a "nearly balanced" split of the alphabet
 +      * Shannon-Fano Codes: no version of this top-down splitting strategy is guaranteed to always produce an optimal prefix code
 +  * //What If We Knew The Tree Structure of the Optimal Prefix Code?//
 +    * Assume we have the binary tree T* corresponding to an optimal prefix code - need to figure out the labeling
 +    * Suppose that u and v are leaves of T*, such that depth(u) < depth(v). Further suppose that in a labeling of T* corresponding to an optimal prefix code, leaf u is labeled with y in S and leaf v is labeled with z in S. Then fy >= fx.
 +      * Proof using an exchange argument
 +      * if fy<fz, then effect of the exchange is as follows: the multiplier on fy increases [from depth(u) to depth(v)] and the multiple on fz decreases by the same amount [from depth(v) to depth(u)].
 +        * change to the overall sum is a negative number, contradicting the supposed optimality of the prefix code.
 +      * Having a lower-frequency letter at a strictly smaller depth than some other higher-frequency letter rules out for an optimal solution
 +      * Take all leaves of depth 1, label them with the highest-frequency letters, do the same with each level of increasing depth -> leads to an optimal T*
 +    * w is a leaf of T*
 +      * If w were not a leaf, there would be some leaf w' in the subtree below it. But then w' would have a depth greater than that of v, contradicting our assumption that v is a leaf of maximum depth in T*.
 +        * There is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are sibling in T*.
 +  * //An Algorithm to Construct an Optimal Prefix Code//
 +    * Suppose y* and z* are the two lowest-frequency letters in S.
 +      * can lock them together because we know they end up as sibling leaves below a common parent
 +      * Replace y* and z* with meta-letter, obtaining an alphabet that is one letter smaller. Recursively find a prefix code for the smaller alphabet, and then "open up" the meta-letter back into y* and z* to obtain a prefix code for S:
 +//Huffman's Algorithm//
 +
 +{{:courses:cs211:winter2018:journals:patelk:optimalprefixcodesiblings.png?nolink&400|}}
 +
 +{{:courses:cs211:winter2018:journals:patelk:optimalprefixcodesiblings2.png?nolink&400|}}
 +
 +  * bottom-up approach: focuses on the leaves representing the two lowest-frequency letters, and then continues by recursion.
 +
 +**Analyzing the Algorithm**
 +  * //The Optimality of the Algorithm//
 +    * Recursive algorithm: proof by induction
 +    * Base: optimal for all two letter alphabets (one bit per letter)
 +    * By induction: optimal for all alphabets of size k-1
 +    * Consider: alphabet S of size k
 +    * The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code
 +  * //Implementation and Running Time//
 +    * Can be polynomial time in k, the number of letters in the alphabet
 +      * recursive calls: sequence of k-1 iterations
 +      * each iteration consists simply of identifying the two lowest-frequency letters -> O(k)
 +      * combined: O(k^2)
 +    * Use a priority queue to maintain a set of k elements
 +      * allows for insertion and extraction 
 +      * Each iteration: extract the minimum twice; insert a new letter whose key is the sum of these two minimum frequencies
 +      * Using priority queues via heaps: each insertion and extraction: O(logk) -> over k iterations, O(klogk)
 +
 +**Extensions**
 +  * Data Compression: black and white images already use one bit per letter
 +    * need to be highly compressed: use a "fraction of a bit" for each white bixel
 +  * Cannot adapt to changes in text: frequency is stable for half the text, but then changes radically
 +    * Halfway into the sequence insert some kind of instruction to completely change the encoding
 +    * This will pay off in the long run: called adaptive compression schemes
 +
 +==== Personal Thoughts ==== 
  
-* __A First Attempt: The Top-Down Approach__ +While arguably one of the more interesting topics of this textbook, this section was long and very dense. There was lot of information that was hard to digest all at once. Our discussion in class was helpful, but not long enough to help make this section very clear. I will most likely need to review this chapter section again. 
-  * Goal: produce labeled binary tree in which the leaves are as close to the root as possible +Readability5 Interesting8
-  * back leaves as tightly as possiblewant a "nearly balanced" split of the alphabet +
-    * Shannon-Fano Codesno version of this top-down splitting strategy is guaranteed to always produce an optimal prefix code+
  
courses/cs211/winter2018/journals/patelk/chapter4.1520699830.txt.gz · Last modified: by patelk
CC Attribution-Noncommercial-Share Alike 4.0 International
Driven by DokuWiki Recent changes RSS feed Valid CSS Valid XHTML 1.0