Differences
This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| courses:cs211:winter2018:journals:holmesr:section_4.1 [2018/03/06 02:38] – holmesr | courses:cs211:winter2018:journals:holmesr:section_4.1 [2018/04/19 20:08] (current) – admin | ||
|---|---|---|---|
| Line 72: | Line 72: | ||
| I didn't find this chapter to be too dense and the diagrams filled in the gaps fairly well so I would give it a reasonably high readability. | I didn't find this chapter to be too dense and the diagrams filled in the gaps fairly well so I would give it a reasonably high readability. | ||
| + | |||
| + | ===== Section 4.7 Clustering ===== | ||
| + | |||
| + | Clustering problems revolves around classifying objects into groups based on their distance from one another. For some collections of items, this distance can be physical but for many others, the distance is a measure of similarity for which a distance function must be defined. The idea of a clustering is that items more like each other will be in the same cluster, while items less like each other will be in disparate clusters. The trouble is finding an efficient way to partition these objects into a specified number k clusters. | ||
| + | |||
| + | It turns out that deleting the k-1 costliest edges from Kruskal' | ||
| + | |||
| + | Since a clustering is yielded by an execution of Kruskal' | ||
| + | |||
| + | I found this chapter surprisingly intuitive and stimulating and thus it was easily readable. | ||
| + | |||
| + | ===== Section 4.8 Huffman Codes and Data Compression ===== | ||
| + | |||
| + | This section begins by discussing different ways which one may encode data. One example discussed is the scheme using 1's and 0's which allows one to encode 2< | ||
| + | |||
| + | The problem with such an encoding is that it is sometimes ambiguous. Since e is 0, t is 1, and 01 is a, a string 01 may either be et or a. The solution to this problem is adding a slight pause in between letters, but if the pause isn't long enough it may be missed and additionally the pause becomes its own type of bit. | ||
| + | |||
| + | Such an ambiguity can be overcome by a construction called a prefix code. A prefix code is one in which the encoding for one letter in the alphabet can not be a prefix for another letter. In the Morse code example, e's encoding - 0, was a prefix for the encoding of a - 01. An example of a prefix code for the alphabet {a, b, c} could be a = 11, b = 01, c = 00. Using variable length encoding, it is possible to make an optimal prefix code by setting the most used letters to represent the shortest characters. | ||
| + | |||
| + | A metric known as the ABL, or Average Bits per Letter, can be used to compare different variable length encodings and the lowest ABL is the optimal encoding. ABL is found by multiplying the length of each letter' | ||
| + | |||
| + | The Shannon-Fano codes split the set of letters into two subsets of equal frequency and then recall themselves recursively on the subsets until each letter gets its own encoding. This top-down approach was a good start but a precursor to the superior Huffman codes. The important difference between Huffman codes and Shannon-Fano codes is that Huffman codes build up from the bottom, assigning the two least frequent letters to be children of a meta-letter and re-adding the meta-letter as a component of the alphabet for the algorithm to recursively operate on. | ||
| + | |||
| + | This algorithm lends itself to the use of a priority queue because letters can be added into the priority queue with their frequencies being their keys. Now all that remains is insertion and deletion, both of which occur in O(log n) time. Since the algorithm is recursive and adds meta-letters back into the alphabet, it will carry out n loops which each use the aforementioned O(log n) time, thus giving O(n log n) total time. | ||
