In this problem we use greedy rules to construct an optimal way of encoding symbols as bit strings.

**The Problem:**

Encoding Symbols Using Bits
  * the simplest scheme uses a fixed number of bits to encode each letter of the alphabet
  * we can make this more efficient by using fewer bits for frequent letters and more bits for less frequently used letters
  * we always want to represent files as compactly as possible
  * compression algorithms reduce the space of input files through encoding schemes

Variable-Length Encoding Schemes
  * Morse code translates letters into sequences of dots and dashes
  * giving frequent letters such short strings makes the encoding ambiguous, since one letter's string can begin another's

Prefix Codes
  * ambiguity happens when we have a pair of letters where the bit string that encodes one letter is a prefix of the bit string that encodes the other
  * we have to map letters to bit strings so that no encoding is a prefix of any other
  * a //prefix code// is a function that maps each letter to a sequence of 0's and 1's so that for any two distinct letters, neither letter's sequence is a prefix of the other's
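This condition is easy to test mechanically; here is a small sketch (the helper name and the example codes are my own illustrations, not from the text):

```python
def is_prefix_code(code):
    """Return True if no codeword is a prefix of another codeword."""
    words = list(code.values())
    for i, w in enumerate(words):
        for j, v in enumerate(words):
            if i != j and v.startswith(w):
                return False
    return True

# Morse-style ambiguity: "0" is a prefix of "01", so this is not a prefix code
assert not is_prefix_code({"e": "0", "a": "01"})
# No codeword here begins any other, so this is a valid prefix code
assert is_prefix_code({"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"})
```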

Optimal Prefix Codes
  * we want to take advantage of some letters being more frequent than others when we encode
  * each letter x has a frequency f(x); for the n letters in the set, the frequencies of all n letters sum to 1
  * to find the total length of our encoding, we take the sum over all letters of the number of times a letter occurs times the length of the bit string used to encode it
  * a prefix code that minimizes the average number of bits per letter is an //optimal prefix code//
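That "frequency times codeword length" sum is the average number of bits per letter; a minimal sketch (the frequencies and codewords below are illustrative):

```python
def average_bits_per_letter(freq, code):
    """Sum over letters of f(x) times the length of x's codeword."""
    return sum(freq[x] * len(code[x]) for x in freq)

freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
code = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
# 0.32*2 + 0.25*2 + 0.20*3 + 0.18*2 + 0.05*3 = 2.25 bits per letter
assert abs(average_bits_per_letter(freq, code) - 2.25) < 1e-9
```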

**The Algorithm:**

We use a greedy method to construct an optimal prefix code by developing a tree representation of the codes.

Representing Prefix Codes Using Binary Trees
  * a binary tree is a rooted tree in which each node that is not a leaf has at most two children
  * the number of leaves is equal to the size of the alphabet
  * to read off a letter's encoding, follow the path from the root down to its leaf: if the path goes left record a 0, if the path goes right record a 1
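That left-is-0, right-is-1 rule can be sketched as a traversal (the nested-tuple tree representation is my own choice, not from the text):

```python
def codewords(tree, prefix=""):
    """Map each leaf letter to its root-to-leaf bit string: left = 0, right = 1."""
    if isinstance(tree, str):          # a leaf holds a single letter
        return {tree: prefix}
    left, right = tree
    out = codewords(left, prefix + "0")
    out.update(codewords(right, prefix + "1"))
    return out

# an illustrative full binary tree over the alphabet {a, b, c, d, e}
tree = (("e", "c"), ("d", ("b", "a")))
assert codewords(tree) == {"e": "00", "c": "01", "d": "10", "b": "110", "a": "111"}
```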

**(4.27)** The encoding of S constructed from T is a prefix code.

**Proof:**
The letters can only be leaves, so an encoding of a letter x cannot be a prefix of the encoding of a letter y, because x is not a node on the path from the root to y.

We can also build a tree from a prefix code recursively!! So the search for an optimal prefix code becomes a search over binary trees: we want the tree that minimizes the weighted average (weighted by the frequencies of the letters) of the depths of all leaves.

A binary tree is **full** if each node that is not a leaf has exactly two children.

**(4.28)** The binary tree corresponding to an optimal prefix code is full. (p. 168)

We prove this using an exchange argument.

The Top-Down Approach
  * we want to produce a tree with the smallest average leaf depth
  * try to split the alphabet into two sets whose total frequencies are as nearly balanced as possible
  * recursively construct prefix codes for the two sets independently
  * this does not guarantee an optimal prefix code

If we knew the binary tree corresponding to an optimal prefix code but didn't know which letters belong to which leaves, we could complete the tree using an easy fact:

**(4.31)** There is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*.

So we lock the two lowest-frequency letters together into a "metaletter" and recurse. This gives the following strategy:

**Huffman's Algorithm**

  To construct a prefix code for an alphabet S, with given frequencies:
    If S has two letters then
      Encode one letter using 0 and the other letter using 1
    Else
      Let y* and z* be the two lowest-frequency letters
      Form a new alphabet S' by deleting y* and z* and replacing them
        with a new letter w of frequency f(y*) + f(z*)
      Recursively construct a prefix code A' for S', with tree T'
      Define a prefix code for S as follows:
        Start with T'
        Take the leaf labeled w and add two children below it labeled y* and z*
    Endif

The prefix code this algorithm produces is called a Huffman code. This is a bottom-up approach: it starts with the two lowest-frequency letters at the leaves and works up toward the highest-frequency letters.
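A common way to implement this bottom-up merging is with a min-heap ordered by frequency; below is a sketch where each subtree is represented simply as the string of letters under it (my own simplification, not the book's data structure):

```python
import heapq

def huffman_code(freq):
    """Greedily merge the two lowest-frequency 'letters' until one tree remains."""
    # each heap entry: (frequency, tie-breaker, letters in this subtree)
    heap = [(f, i, letter) for i, (letter, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    code = {letter: "" for letter in freq}
    while len(heap) > 1:
        fy, _, y = heapq.heappop(heap)   # lowest-frequency subtree (y*)
        fz, _, z = heapq.heappop(heap)   # second-lowest (z*)
        for letter in y:                 # y*'s side gets a leading 0
            code[letter] = "0" + code[letter]
        for letter in z:                 # z*'s side gets a leading 1
            code[letter] = "1" + code[letter]
        heapq.heappush(heap, (fy + fz, tick, y + z))  # the metaletter w
        tick += 1
    return code
```

On the frequencies .32, .25, .20, .18, .05 this yields codeword lengths 2, 2, 2, 3, 3.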

**Analysis:**

We prove the optimality of Huffman's algorithm by induction on the size of the alphabet, using (4.31) and an exchange argument.

This section was realllllly long to read which made it more boring, but the presentation of it in class was much more interesting. The proof was definitely a part of the chapter that I will have to come back to in order to understand. Four pages of a lot of variables was intimidating. I would rate this readability:

====== Chapter 5.1 ======

**A First Recurrence: The Mergesort Algorithm**

In this section we are beginning divide-and-conquer algorithms!!

Mergesort
  * sorts a given list of numbers
  * divides the list into two equal parts
  * sorts each half recursively
  * combines the results, spending linear time for the initial division and the final recombining
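The steps above can be sketched as a standard mergesort (my own code, not the book's):

```python
def mergesort(nums):
    """Sort by splitting in half, recursing, then merging in linear time."""
    if len(nums) <= 1:
        return nums
    mid = len(nums) // 2                       # divide into two (nearly) equal parts
    left, right = mergesort(nums[:mid]), mergesort(nums[mid:])
    merged, i, j = [], 0, 0                    # merge the two sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```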

Let T(n) be the worst-case running time of the algorithm on a list of size n. Dividing the list into two even parts of size n/2 takes O(n) time, each half takes T(n/2) time to sort, and it then spends O(n) time to combine the sorted halves back together. SO....

**(5.1)** For some constant c, T(n) <= 2T(n/2) + cn when n > 2, and T(2) <= c.

T(n) is bounded by an inequality. There is a base case saying that T(n) is equal to a constant when n is a constant. We have to solve the recurrence so that T appears only on the left-hand side of the inequality.

How to solve recurrences:
  * "unroll" the recursion: account for the running time at the first few levels, identify a pattern, and sum the work over all levels
  * start with a guess and check that it works by substituting it into the recursion, then justify the guess by induction

**Unrolling the Mergesort Recurrence**
  * level 1: one problem of size n, which takes at most cn time
  * level 2: two problems of size n/2, each taking at most cn/2 time, for a total of at most cn
  * level 3: four problems of size n/4, each taking at most cn/4 time, for a total of at most cn
  * pattern: at level j of the recursion, the number of subproblems is 2<sup>j-1</sup>, each of size n/2<sup>j-1</sup> and taking at most cn/2<sup>j-1</sup> time, for a total of at most cn
  * sum over all levels: the number of times the input is halved is log n, so the sum of the cn work over the log n levels is O(n log n)
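As a numeric sanity check of the unrolling (assuming c = 1 and n a power of two, which are my own illustrative choices), the recurrence stays under n log2 n:

```python
def T(n, c=1):
    """Evaluate the bound from (5.1): T(n) <= 2*T(n/2) + c*n, with T(2) <= c."""
    if n <= 2:
        return c
    return 2 * T(n // 2, c) + c * n

# unrolling predicts about c*n*log2(n) total work across the log2(n) levels
assert T(4) == 6              # 2*T(2) + 4 = 2*1 + 4
assert T(16) == 56            # stays under 16*log2(16) = 64
assert T(1024) <= 1024 * 10   # n*log2(n), since log2(1024) = 10
```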

**Substituting a Solution into the Mergesort Recurrence**

If we have a guess for the solution, such as T(n) <= cn log n, we can check it by plugging it into the recurrence and justifying it by induction.

I would rate this chapter a readability of 8/10. It was fairly short and easy to follow, but if I hadn't read this chapter after it was presented in class I would have been a little lost.