There are three greedy algorithms which can be used to solve this problem: Kruskal's Algorithm, Prim's Algorithm, and the Reverse-Delete Algorithm. Each algorithm works by repeatedly inserting or deleting edges from a partial solution. There are two important properties which need to be defined in order to analyze these algorithms: the //Cut Property// and the //Cycle Property//. The //Cut Property// says the following, and allows us to know when it is safe to include an edge in the minimum spanning tree:
    //Assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V - S. Then every minimum spanning tree contains the edge e.//

We can use the Cut Property to prove that Kruskal's Algorithm and Prim's Algorithm both produce optimal solutions for the minimum spanning tree of //G//. The //Cycle Property// says the following, and allows us to know an edge is //not// in the minimum spanning tree:
    //Assume that all edge costs are distinct. Let C be any cycle in G, and let edge e = (v, w) be the most expensive edge belonging to C. Then e does not belong to any minimum spanning tree of G.//
  
The Cycle Property is used to prove the optimality of the Reverse-Delete Algorithm. If a priority queue is used, Prim's Algorithm can be implemented to run in O(//m//) time, plus the time for //n// ExtractMin and //m// ChangeKey operations. I understood this section alright. I would have liked a simpler explanation of how the Cut and Cycle Properties are used to prove these things, because the way the book tried to explain this was unclear. I would rate it a 6.
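To make the priority-queue version concrete, here is a sketch of Prim's Algorithm in Python (my own illustration, not from the book). It uses a binary heap of candidate edges in place of the book's ChangeKey-based queue, which gives the same tree; the function name and the adjacency-list format are assumptions:

```python
import heapq

def prim_mst(n, adj):
    """Prim's Algorithm: grow a spanning tree from node 0, always adding
    the cheapest edge leaving the current tree (justified by the Cut
    Property). adj[u] is a list of (cost, v) pairs; nodes are 0..n-1."""
    visited = [False] * n
    mst_edges = []
    # Heap entries: (edge cost, endpoint inside tree, endpoint outside tree).
    # (-1 is a sentinel "parent" for the start node.)
    heap = [(0, -1, 0)]
    while heap:
        cost, u, v = heapq.heappop(heap)
        if visited[v]:
            continue  # stale entry: v was reached by a cheaper edge already
        visited[v] = True
        if u != -1:
            mst_edges.append((u, v, cost))
        for c, w in adj[v]:
            if not visited[w]:
                heapq.heappush(heap, (c, v, w))
    return mst_edges
```

Because all edge costs are distinct (as the Cut Property assumes), skipping already-visited nodes is exactly the Cut Property applied to the set S of tree nodes at each step.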
  
===== 4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure =====
We would like to consider a graph evolving through the addition of edges. To maintain the set of connected components as we add edges, without having to recompute this information each time, we can use the **Union-Find** data structure. This structure stores a representation of the components in such a way as to support efficient searching and updating. The operation Find(//u//), where //u// is a given node, returns the name of the set containing //u//. It can also be used to check whether two nodes are in the same set, and it can be implemented in O(log //n//) time. We also have Union(//A//, //B//), which merges two sets into one and also runs in O(log //n//) time. MakeUnionFind(//S//) returns a Union-Find data structure on the set //S// and can be implemented in O(//n//) time.
  
The simplest way to implement a Union-Find data structure is with an array that records the name of the set currently containing each element. With this implementation, Find(//v//) takes O(1) time, but Union(//A//, //B//) can take as much as O(//n//) time. With a few optimizations, Find(//v//) takes O(1) time, MakeUnionFind(//S//) takes O(//n//) time, and any sequence of //k// Union operations takes at most O(//k// log //k//) time.

To improve upon this implementation, we can use pointers instead. With a pointer from each node to the name of the set containing it, Find(//v//) runs in O(log //n//) time, MakeUnionFind(//S//) takes O(//n//) time, and any Union operation takes O(1) time. We can use this Union-Find data structure to implement Kruskal's Algorithm in O(//m// log //n//) time. This section was fairly straightforward in terms of what the Find and Union operations are, but complicated in the analysis of the different data structure implementations. Therefore I rate this section a 7.
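As a sketch of the pointer-based structure described above (my own Python illustration; the class name, the union-by-size rule, and the edge-list format are assumptions, and path compression is deliberately omitted to match the O(log //n//) Find described in the book):

```python
class UnionFind:
    """Pointer-based Union-Find with union by size: Find follows parent
    pointers (O(log n) when the smaller set always joins the larger);
    Union relabels only one root, so it takes O(1) time."""
    def __init__(self, elements):          # MakeUnionFind(S): O(n)
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, u):                     # Find(u): name of u's set
        while self.parent[u] != u:
            u = self.parent[u]
        return u

    def union(self, a, b):                 # Union(A, B): smaller -> larger
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def kruskal_mst(nodes, edges):
    """Kruskal's Algorithm: scan edges in increasing cost order, adding an
    edge only when its endpoints lie in different components.
    edges is a list of (cost, u, v) tuples."""
    uf = UnionFind(nodes)
    mst = []
    for cost, u, v in sorted(edges):
        if uf.find(u) != uf.find(v):       # would e create a cycle?
            uf.union(u, v)
            mst.append((u, v, cost))
    return mst
```

Sorting the edges costs O(//m// log //m//) = O(//m// log //n//), and the 2//m// Find calls cost O(log //n//) each, which is where the overall O(//m// log //n//) bound comes from.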
 +
===== 4.7 Clustering =====
Minimum spanning trees play a role in //clustering//, which arises when classifying or organizing a collection of objects into coherent groups. One way to do this is to define a distance function on the objects, where objects that are closer together are more similar to each other. To find a //k-clustering// of maximum spacing, where we divide //n// objects into //k// groups, we can use minimum spanning trees: run Kruskal's Algorithm but stop before adding the last //k// - 1 edges. The following claim is made:
    //The components C1, C2, ..., Ck formed by deleting the k - 1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.//

This section was also a bit challenging to understand. We started to discuss this in class. I would have liked more visuals in the book to guide me through the maximum-spacing clustering problem. I would rate it a 6.
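Here is a small self-contained sketch of the stopped-early Kruskal idea (my own illustration; the function name, the `dist` callback, and returning a representative label per point are all assumptions, and a plain parent array stands in for the full Union-Find structure):

```python
def k_clustering(points, k, dist):
    """k-clustering of maximum spacing: run Kruskal's Algorithm on all
    pairwise distances and stop once k components remain, which is the
    same as deleting the k - 1 most expensive MST edges."""
    n = len(points)
    parent = list(range(n))  # minimal Union-Find: parent pointers only

    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u

    # All pairwise "edges", cheapest first, as in Kruskal's Algorithm.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    components = n
    for d, i, j in edges:
        if components == k:
            break  # stop before adding the last k - 1 MST edges
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            components -= 1
    return [find(i) for i in range(n)]  # cluster label for each point
```

The next edge that //would// have been added is exactly the spacing of the clustering, which is what the claim above says is maximized.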
 +
===== 4.8 Huffman Codes and Data Compression =====
For this problem (data compression), the goal is to produce an algorithm that shrinks the size of a problem instance and uses recursion to solve the smaller problem. If we want to encode symbols using bits, we can create a greedy algorithm that solves this problem efficiently and minimizes the average number of bits per letter. We would also like our encoding scheme to avoid ambiguity, so we need to make sure no symbol's encoding is a prefix of another symbol's encoding.

Another thing to consider when creating an optimal encoding is how frequently certain letters appear in everyday language. It is preferable to assign the most frequent letters a smaller number of bits to minimize our use of bits. To do this, we assign a frequency to every letter, with the frequencies of all letters summing to 1.

To construct a greedy algorithm for this problem, we will use binary trees. This ensures that the encoding constructed from a binary tree T will be a prefix code. In addition, the binary tree corresponding to the //optimal// prefix code is full. Here is the algorithm, //Huffman's Algorithm//, to construct an optimal prefix code, called a //Huffman code//:
    To construct a prefix code for alphabet S, with given frequencies:
        If S has two letters then
            Encode one letter using 0 and the other using 1
        Else
            Let y* and z* be the two lowest-frequency letters
            Form a new alphabet S' by deleting y* and z* and replacing them with a new letter w of frequency fy* + fz*
            Recursively construct a prefix code y' for S', with tree T'
            Define a prefix code for S as follows:
                Start with T'
                Take the leaf labeled w and add two children below it labeled y* and z*
        Endif
When implemented using a priority queue, Huffman's Algorithm runs in O(//k// log //k//) time, where //k// is the number of letters in the alphabet. This unit is very clear to me. It was explained well in class, and the examples we went over were straightforward. In addition, I find the logic behind this unit very intuitive. I rate this a 9.5 for my understanding.
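The recursive merging above can be written iteratively with a priority queue. Here is a sketch in Python (my own illustration; the function name, the frequency-dict input, and the nested-tuple tree representation are assumptions):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Huffman's Algorithm via a priority queue: repeatedly merge the two
    lowest-frequency "letters" y* and z* into a new letter of frequency
    fy* + fz*, then read codes off the resulting binary tree.
    freqs maps each symbol to its frequency."""
    tiebreak = count()  # keeps heap comparisons off the tree tuples
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate one-letter alphabet
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # y*: lowest frequency
        f2, _, right = heapq.heappop(heap)  # z*: second lowest
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    # Walk the tree: 0 for a left branch, 1 for a right branch. Leaves
    # are symbols, so no code is a prefix of another (a prefix code).
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix
    assign(heap[0][2], "")
    return codes
```

Each of the //k// - 1 merges does a constant number of heap operations at O(log //k//) each, which matches the O(//k// log //k//) bound above.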