In this problem we use greedy rules to construct an optimal way of encoding symbols as bit strings.

**The Problem:**

Encoding Symbols Using Bits
  * the simplest scheme uses a fixed number of bits to encode each letter of the alphabet
  * we can make this more efficient by using fewer bits for frequent letters and more bits for less frequently used letters
  * we always want to represent files as compactly as possible
  * compression algorithms reduce the space of input files through encoding schemes

Variable-Length Encoding Schemes
  * Morse code translates letters into sequences of dots and dashes
  * giving frequent letters such short strings makes the encoding ambiguous, since one letter's string can begin another's

Prefix Codes
  * ambiguity happens when we have a pair of letters where the bit string that encodes one letter is a prefix of the bit string that encodes the other
  * we have to map letters to bit strings so that no encoding is a prefix of any other
  * a //prefix code// is a function that maps each letter to a sequence of 0's and 1's so that for any two distinct letters, neither letter's sequence is a prefix of the other's
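This condition is easy to test mechanically; here is a small sketch (the helper name and the example codes are my own illustrations, not from the text):

```python
def is_prefix_code(code):
    """Return True if no codeword is a prefix of another codeword."""
    words = list(code.values())
    for i, w in enumerate(words):
        for j, v in enumerate(words):
            if i != j and v.startswith(w):
                return False
    return True

# Morse-style ambiguity: "0" is a prefix of "01", so this is not a prefix code
assert not is_prefix_code({"e": "0", "a": "01"})
# No codeword here begins any other, so this is a valid prefix code
assert is_prefix_code({"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"})
```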

Optimal Prefix Codes
  * we want to take advantage of some letters being more frequent than others when we encode
  * each letter x has a frequency f(x); for the n letters in the set, the frequencies of all n letters sum to 1
  * to find the total length of our encoding, we take the sum over all letters of the number of times a letter occurs times the length of the bit string used to encode it
  * a prefix code that minimizes the average number of bits per letter is an //optimal prefix code//
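That "frequency times codeword length" sum is the average number of bits per letter; a minimal sketch (the frequencies and codewords below are illustrative):

```python
def average_bits_per_letter(freq, code):
    """Sum over letters of f(x) times the length of x's codeword."""
    return sum(freq[x] * len(code[x]) for x in freq)

freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
code = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
# 0.32*2 + 0.25*2 + 0.20*3 + 0.18*2 + 0.05*3 = 2.25 bits per letter
assert abs(average_bits_per_letter(freq, code) - 2.25) < 1e-9
```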

**The Algorithm:**

We use a greedy method to construct an optimal prefix code by developing a tree representation of the codes.

Representing Prefix Codes Using Binary Trees
  * a binary tree is a rooted tree in which each node that is not a leaf has at most two children
  * the number of leaves is equal to the size of the alphabet
  * to read off a letter's encoding, follow the path from the root down to its leaf: if the path goes left record a 0, if the path goes right record a 1
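That left-is-0, right-is-1 rule can be sketched as a traversal (the nested-tuple tree representation is my own choice, not from the text):

```python
def codewords(tree, prefix=""):
    """Map each leaf letter to its root-to-leaf bit string: left = 0, right = 1."""
    if isinstance(tree, str):          # a leaf holds a single letter
        return {tree: prefix}
    left, right = tree
    out = codewords(left, prefix + "0")
    out.update(codewords(right, prefix + "1"))
    return out

# an illustrative full binary tree over the alphabet {a, b, c, d, e}
tree = (("e", "c"), ("d", ("b", "a")))
assert codewords(tree) == {"e": "00", "c": "01", "d": "10", "b": "110", "a": "111"}
```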

**(4.27)** The encoding of S constructed from T is a prefix code.

**Proof:**
The letters can only be leaves, so an encoding of a letter x cannot be a prefix of the encoding of a letter y, because x is not a node on the path from the root to y.

We can also build a tree from a prefix code recursively!! So the search for an optimal prefix code becomes a search over binary trees: we want the tree that minimizes the weighted average (weighted by the frequencies of the letters) of the depths of all leaves.

A binary tree is **full** if each node that is not a leaf has exactly two children.

**(4.28)** The binary tree corresponding to an optimal prefix code is full. (p. 168)

We prove this using an exchange argument.

The Top-Down Approach
  * we want to produce a tree with the smallest average leaf depth
  * try to split the alphabet into two sets whose total frequencies are as nearly balanced as possible
  * recursively construct prefix codes for the two sets independently
  * this does not guarantee an optimal prefix code

If we knew the binary tree corresponding to an optimal prefix code but didn't know which letters belong to which leaves, we could complete the tree using an easy fact:

**(4.31)** There is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*.

So we lock the two lowest-frequency letters together into a "metaletter" and recurse. This gives the following strategy:

**Huffman's Algorithm**

  To construct a prefix code for an alphabet S, with given frequencies:
    If S has two letters then
      Encode one letter using 0 and the other letter using 1
    Else
      Let y* and z* be the two lowest-frequency letters
      Form a new alphabet S' by deleting y* and z* and replacing them
        with a new letter w of frequency f(y*) + f(z*)
      Recursively construct a prefix code A' for S', with tree T'
      Define a prefix code for S as follows:
        Start with T'
        Take the leaf labeled w and add two children below it labeled y* and z*
    Endif

The prefix code this algorithm produces is called a Huffman code. This is a bottom-up approach: it starts with the two lowest-frequency letters at the leaves and works up toward the highest-frequency letters.
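A common way to implement this bottom-up merging is with a min-heap ordered by frequency; below is a sketch where each subtree is represented simply as the string of letters under it (my own simplification, not the book's data structure):

```python
import heapq

def huffman_code(freq):
    """Greedily merge the two lowest-frequency 'letters' until one tree remains."""
    # each heap entry: (frequency, tie-breaker, letters in this subtree)
    heap = [(f, i, letter) for i, (letter, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    code = {letter: "" for letter in freq}
    while len(heap) > 1:
        fy, _, y = heapq.heappop(heap)   # lowest-frequency subtree (y*)
        fz, _, z = heapq.heappop(heap)   # second-lowest (z*)
        for letter in y:                 # y*'s side gets a leading 0
            code[letter] = "0" + code[letter]
        for letter in z:                 # z*'s side gets a leading 1
            code[letter] = "1" + code[letter]
        heapq.heappush(heap, (fy + fz, tick, y + z))  # the metaletter w
        tick += 1
    return code
```

On the frequencies .32, .25, .20, .18, .05 this yields codeword lengths 2, 2, 2, 3, 3.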

**Analysis:**

We prove the optimality of Huffman's algorithm by induction on the size of the alphabet, using (4.31) and an exchange argument.

This section was realllllly long to read which made it more boring, but the presentation of it in class was much more interesting. The proof was definitely a part of the chapter that I will have to come back to in order to understand. Four pages of a lot of variables was intimidating. I would rate this readability:

====== Chapter 5.1 ======

**A First Recurrence: The Mergesort Algorithm**

In this section we are beginning divide-and-conquer algorithms!!

Mergesort
  * sorts a given list of numbers
  * divides the list into two equal parts
  * sorts each half recursively
  * combines the results, spending linear time for the initial division and the final recombining
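The steps above can be sketched as a standard mergesort (my own code, not the book's):

```python
def mergesort(nums):
    """Sort by splitting in half, recursing, then merging in linear time."""
    if len(nums) <= 1:
        return nums
    mid = len(nums) // 2                       # divide into two (nearly) equal parts
    left, right = mergesort(nums[:mid]), mergesort(nums[mid:])
    merged, i, j = [], 0, 0                    # merge the two sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```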

Let T(n) be the worst-case running time of the algorithm on a list of size n. Dividing the list into two even parts of size n/2 takes O(n) time, each half takes T(n/2) time to sort, and it then spends O(n) time to combine the sorted halves back together. SO....

**(5.1)** For some constant c, T(n) <= 2T(n/2) + cn when n > 2, and T(2) <= c.

T(n) is bounded by an inequality. There is a base case saying that T(n) is equal to a constant when n is a constant. We have to solve the recurrence so that T appears only on the left-hand side of the inequality.

How to solve recurrences:
  * "unroll" the recursion: account for the running time at the first few levels, identify a pattern, and sum the work over all levels
  * start with a guess and check that it works by substituting it into the recursion, then justify the guess by induction

**Unrolling the Mergesort Recurrence**
  * level 1: one problem of size n, which takes at most cn time
  * level 2: two problems of size n/2, each taking at most cn/2 time, for a total of at most cn
  * level 3: four problems of size n/4, each taking at most cn/4 time, for a total of at most cn
  * pattern: at level j of the recursion, the number of subproblems is 2<sup>j-1</sup>, each of size n/2<sup>j-1</sup> and taking at most cn/2<sup>j-1</sup> time, for a total of at most cn
  * sum over all levels: the number of times the input is halved is log n, so the sum of the cn work over the log n levels is O(n log n)
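As a numeric sanity check of the unrolling (assuming c = 1 and n a power of two, which are my own illustrative choices), the recurrence stays under n log2 n:

```python
def T(n, c=1):
    """Evaluate the bound from (5.1): T(n) <= 2*T(n/2) + c*n, with T(2) <= c."""
    if n <= 2:
        return c
    return 2 * T(n // 2, c) + c * n

# unrolling predicts about c*n*log2(n) total work across the log2(n) levels
assert T(4) == 6              # 2*T(2) + 4 = 2*1 + 4
assert T(16) == 56            # stays under 16*log2(16) = 64
assert T(1024) <= 1024 * 10   # n*log2(n), since log2(1024) = 10
```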

**Substituting a Solution into the Mergesort Recurrence**

If we have a guess for the solution, such as T(n) <= cn log n, we can check it by plugging it into the recurrence and justifying it by induction.

I would rate this chapter a readability of 8/10. It was fairly short and easy to follow, but if I hadn't read this chapter after it was presented in class I would have been a little lost.