courses:cs211:winter2018:journals:cohene:home:chapter4, revised 2018/03/12 by cohene (Added Section 4.8; previous revision 2018/03/06, [4.7: Clustering])
To find a clustering of maximum spacing, we want to cluster points together as rapidly as possible. We grow a graph edge by edge, with connected components corresponding to clusters. This process never creates a cycle, but instead builds a union of trees. This procedure is extremely similar to Kruskal's algorithm: in fact, it is Kruskal's algorithm stopped just before it would merge the final components down below k clusters.
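The grow-edge-by-edge procedure above can be sketched with a union-find structure tracking the connected components. This is a minimal sketch, assuming 1-D points and a target number of clusters k (both illustrative, not from the text):

```python
def max_spacing_clusters(points, k):
    """Kruskal's algorithm stopped early: merge closest pairs until k clusters remain,
    then return the spacing (the next-smallest inter-cluster distance)."""
    n = len(points)
    parent = list(range(n))

    def find(x):                       # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All edges (distance, i, j), sorted ascending as in Kruskal's algorithm.
    edges = sorted(
        (abs(points[i] - points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    clusters = n
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            if clusters == k:          # next merge would drop below k clusters,
                return d               # so d is the maximum spacing
            parent[ri] = rj            # merge the two components (clusters)
            clusters -= 1

points = [1, 2, 3, 10, 11, 25]         # illustrative 1-D points
print(max_spacing_clusters(points, 3)) # spacing between {1,2,3} and {10,11}
```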
| + | |||
| + | |||
| + | ===== 4.8: Huffman Codes and Data Compression ===== | ||
| + | |||
| + | This chapter discusses solving problems using data compression. Here, the problem is encoding symbols using bits. To convert various " | ||
| + | |||
| + | One example of how data has been compressed is Morse code. Morse code translates each symbol into dots or dashes. However, Morse code removes ambiguity between letters that are represented similarly by adding a space in between each letter. If we were to turn this into bits, the ambiguity would remain as certain encodings are prefixes to others. A method must be built to eliminate the prefix problem. The method could work as follows: | ||
| + | Start reading bits from left to right, once you come across enough bits to match an encoding this becomes a letter, continue with the next bit and repeat. | ||
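The left-to-right decoding rule can be sketched as follows; the codeword table here is a made-up example of a prefix code, not one from the text:

```python
# Hypothetical prefix code: no codeword is a prefix of another.
CODE = {"0": "a", "10": "b", "110": "c", "111": "d"}

def decode(bits: str) -> str:
    """Scan bits left to right; emit a letter as soon as the buffer matches a codeword."""
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in CODE:          # enough bits to match an encoding
            out.append(CODE[buf])
            buf = ""             # continue with the next bit and repeat
    if buf:
        raise ValueError("leftover bits do not form a codeword")
    return "".join(out)

print(decode("010110111"))  # -> abcd
```

Because the code is prefix-free, the first codeword match is always the right one, so no backtracking or separator symbols are needed.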
| + | |||
| + | Now, we must account for letter frequencies to make the most optimal encoding. We can use a greedy algorithm to find the optimal prefix code. First, we can build a tree to help us build each encoding. We will create a binary tree where each time we go from a node to its left child we add a 0 and every time we go from a node to its right child we add a 1. By using a tree, we can see that the length of each symbol in bits is its corresponding layer in the tree. We want to fill the binary tree to make it optimal. | ||
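To make the 0-left / 1-right correspondence concrete, here is a small sketch that reads codewords off such a tree; the toy tree shape is an assumption for illustration, not from the text:

```python
def codewords(node, prefix=""):
    """Walk the encoding tree: left child appends 0, right child appends 1.
    A leaf's codeword length equals its depth in the tree."""
    if isinstance(node, str):            # leaf: a symbol
        return {node: prefix or "0"}     # a lone root symbol still needs one bit
    left, right = node                   # internal node: (left subtree, right subtree)
    codes = {}
    codes.update(codewords(left, prefix + "0"))
    codes.update(codewords(right, prefix + "1"))
    return codes

# Toy tree: 'a' at depth 1, 'b' at depth 2, 'c' and 'd' at depth 3.
tree = ('a', ('b', ('c', 'd')))
print(codewords(tree))  # -> {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```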
| + | |||
| + | The first thought of building is tree is to attempt to build it top down. However, if we knew the optimal prefix code, this would help a great amount. We could then simply fill in the nodes of the tree. We would start with the highest frequency letters and move down through the tree, labeling nodes in order of decreasing frequency. This is essentially the basis of Huffman' | ||
| + | |||
| + | Huffman' | ||
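The bottom-up merging can be sketched with a min-heap; this is a minimal sketch of the greedy step, and the frequency table is an illustrative assumption:

```python
import heapq

def huffman_codes(freq):
    """Repeatedly merge the two lowest-frequency trees (the greedy step of
    Huffman's algorithm), prepending a bit to each side's codewords."""
    # Heap entries: (frequency, tiebreak, {symbol: codeword-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # a lone symbol still needs one bit
        return {s: "0" for s in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # two lowest-frequency trees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}        # left subtree: prepend 0
        merged.update({s: "1" + w for s, w in c2.items()})  # right subtree: prepend 1
        heapq.heappush(heap, (f1 + f2, tie, merged))        # new node: summed frequency
        tie += 1
    return heap[0][2]

freqs = {"a": 0.45, "b": 0.25, "c": 0.20, "d": 0.10}  # illustrative frequencies
print(huffman_codes(freqs))
```

Note that frequent letters end up near the root (short codewords) and rare letters deep in the tree, matching the depth-equals-length picture above.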
| + | |||
| + | |||
| + | |||
