=== 4.8 Huffman Codes and Data Compression ===
The general problem of this section is how to encode symbols using bits. The simplest approach, though not the most efficient, is a fixed-length code, where every symbol gets the same number of bits. For example, if we had 32 symbols to encode, we would need 5 bits per symbol. However, in most alphabets some letters occur much more frequently than others, so it is more efficient to give the frequent symbols shorter encodings. This idea is called data compression. A code that assigns different numbers of bits to different symbols is a variable-length code (Morse code is one example). It is also important that the code be a prefix code, which means that no codeword is a prefix of another. This lets you read the encoded message from left to right: as soon as the bits you have read match a codeword, you can emit that letter without any ambiguity.
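To make the prefix property concrete, here is a minimal Python sketch of left-to-right decoding. The code table is invented for illustration (it is not from the book); the point is just that because no codeword is a prefix of another, the first full match is always the right one.

<code python>
# Invented prefix code, just for illustration -- no codeword is a
# prefix of another, which is what makes left-to-right decoding work.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
inverse = {word: letter for letter, word in code.items()}

def decode(bits):
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:        # first full codeword match is unambiguous
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

print(decode("0101100"))  # "abca": 0 | 10 | 110 | 0
</code>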
To evaluate a compression scheme, we look at the average number of bits per letter, which is the sum over all letters of each letter's frequency times the number of bits in its encoding. The goal is to find a prefix code that minimizes this average. The optimal such codes are called Huffman codes, and they are represented using a binary tree: an edge to a left child represents a 0 and an edge to a right child represents a 1, each leaf is a letter, and you get the bits for a letter by following the path down from the root to its leaf. An algorithm to build this tree, given an alphabet and its frequencies, is on page 172. You can prove Huffman codes are optimal using induction, and the running time of the algorithm is O(k log k) for an alphabet of k letters.
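Here is a short Python sketch of the greedy idea behind the construction: repeatedly merge the two lowest-frequency subtrees into one. This is my own heap-based version, not the book's exact pseudocode from page 172, and the frequencies at the bottom are invented just to exercise it. The last two lines compute the average bits per letter from the formula above.

<code python>
import heapq
import itertools

def huffman(freqs):
    """Build a Huffman code from a {letter: frequency} map by repeatedly
    merging the two lowest-frequency subtrees (the greedy step)."""
    counter = itertools.count()            # tie-breaker so heap tuples never compare nodes
    heap = [(f, next(counter), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)  # the two smallest frequencies...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(counter), (left, right)))  # ...become siblings
    _, _, tree = heap[0]

    code = {}
    def walk(node, bits):
        if isinstance(node, tuple):        # internal node: 0 = left edge, 1 = right edge
            walk(node[0], bits + "0")
            walk(node[1], bits + "1")
        else:                              # leaf: record this letter's codeword
            code[node] = bits or "0"
    walk(tree, "")
    return code

freqs = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.15}   # invented frequencies
codes = huffman(freqs)
abl = sum(freqs[s] * len(codes[s]) for s in freqs)      # average bits per letter
print(codes, abl)                                       # frequent "a" gets the shortest codeword
</code>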
This section was interesting to me because I took a class on data compression in Scotland last semester. We learned to think about and visualize Huffman codes without the tree, but we still used the tree to implement them. It was interesting to see a more in-depth analysis of algorithms I had worked with before.