Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revision | |||
courses:cs211:winter2011:journals:chen:chapter_4 [2011/03/02 03:03] – zhongc | courses:cs211:winter2011:journals:chen:chapter_4 [2011/03/02 03:13] (current) – zhongc | ||
---|---|---|---|
Line 208: | Line 208: | ||
Interesting/ | Interesting/ | ||
+ | |||
+ | |||
+ | ===== 4.8 Huffman Codes and Data Compression ===== | ||
+ | |||
+ | summary | ||
+ | |||
+ | Motivation: | ||
+ | How to encode data so that transmission could be more efficient? | ||
+ | Answer: use less bits on the data without much differentiation!(Low entropy data?) | ||
+ | |||
+ | We use Huffman codes. | ||
+ | |||
+ | If we use all letters in the same frequency, then there is nothing to encode or compress, but when we do not, which is often the case, | ||
+ | we can always represent the more frequently used words with less bits. | ||
+ | |||
+ | Watch out for prefix ambiguity! | ||
+ | |||
+ | **variable-length encoding** | ||
+ | |||
+ | Goal: minimize Average number of Bits per Letter (ABL): | ||
+ | calculate abl using the expected value. | ||
+ | |||
+ | Algorithm | ||
+ | |||
+ | implemenation | ||
+ | |||
+ | * Binary tree for the prefix codes | ||
+ | * Priority queue (with heap) choosing the node with lowest frequency | ||
+ | |||
+ | Cost (nlogn) Why? logn inserting and dequeuing. do it n times. | ||
+ | |||
+ | Interesting/ |