courses:cs211:winter2018:journals:shermanc:chapter4 (last modified 2018/03/20 21:30 by shermanc)

Overall, this chapter was pretty drawn out, with large proofs and simple ideas that went on and on, only to be restated in more understandable terms in the following paragraph. I give it a 3/10.

===== 4.6: Implementing Kruskal's Algorithm: The Union-Find Data Structure =====

To implement this algorithm, we use a structure called the Union-Find structure. As we consider each edge, it lets us check which component each endpoint belongs to. If both endpoints are in the same component, adding the edge would create a cycle, so it is not added; if they are in different components, we add the edge and merge the two components with a Union operation. Adding edges is just a check with an if statement, but the same cannot be said for deletion, which is not supported. A simple implementation keeps an array recording each element's component name, so a Find operation takes O(1); by always relabeling the smaller of the two sets during a Union, any sequence of k Union operations takes O(k log k) time in total, as we said in the last section.

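The array-based structure above can be sketched as follows; this is a minimal illustration under my own naming, not the book's code:

```python
class ArrayUnionFind:
    """Array-based Union-Find: Find is O(1); relabeling the smaller set
    during each Union keeps any sequence of k Unions within O(k log k)."""

    def __init__(self, n):
        self.component = list(range(n))            # component[v] = name of v's set
        self.members = {v: [v] for v in range(n)}  # elements of each named set

    def find(self, v):
        return self.component[v]                   # constant-time array lookup

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return                                 # already in the same set
        # relabel the smaller set, so each element is relabeled O(log n) times
        if len(self.members[ra]) < len(self.members[rb]):
            ra, rb = rb, ra
        for v in self.members[rb]:
            self.component[v] = ra
        self.members[ra].extend(self.members.pop(rb))
```

Because each relabeling at least doubles the size of an element's set, no element is relabeled more than log n times, which is where the O(k log k) bound comes from.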
We can actually have a better data structure, though, that uses pointers. Each element starts out pointing at itself, and a Union just updates one set's pointer to point at the set it is now unioned with, which takes O(1) time. The trade-off is that the Find operation now has to follow pointers up to the set's name, which takes O(log n).

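A sketch of the pointer-based version, again with names of my own choosing; pointing the smaller tree at the larger is what keeps Find within O(log n):

```python
class PointerUnionFind:
    """Pointer-based Union-Find: Union is O(1) pointer work, but Find
    must walk up to the set's name, costing O(log n) with union by size."""

    def __init__(self, n):
        self.parent = list(range(n))   # each element starts pointing at itself
        self.size = [1] * n

    def find(self, v):
        while self.parent[v] != v:     # follow pointers up to the set's name
            v = self.parent[v]
        return v

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # point the smaller tree at the larger
            ra, rb = rb, ra
        self.parent[rb] = ra               # one pointer update: O(1)
        self.size[ra] += self.size[rb]
```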
Whichever implementation we use, Kruskal's Algorithm runs in O(m log n) time overall; the choice is just a matter of some slight improvements that aren't //too// noticeable in the long run.

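Putting the pieces together, Kruskal's Algorithm itself can be sketched like this (a self-contained illustration, with an inlined union-find rather than the classes above):

```python
def kruskal_mst(n, edges):
    """Kruskal's algorithm: sort the edges by cost, then keep an edge only
    when its endpoints lie in different components. Sorting dominates,
    giving O(m log n) overall. edges is a list of (cost, u, v) tuples
    over vertices 0..n-1."""
    parent = list(range(n))
    size = [1] * n

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    mst = []
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                 # different components: no cycle created
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            mst.append((cost, u, v))
    return mst
```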
This section was shorter but a duller read, as it was basically review. 5/10.

===== 4.7: Clustering =====

This problem had to do with examples such as a collection of photos, letters, etc. that we would try to classify into groups, or clusters. To start, we grow a graph by repeatedly connecting the pair of points with the smallest distance, as long as they are not already in the same cluster. The connection here is that this is just Kruskal's Algorithm, stopped when we obtain the specified number k of connected components. Equivalently, we do everything in Kruskal's and then delete the "k - 1 most expensive edges" of the resulting spanning tree. This gives us the k-clustering of maximum spacing, where each cluster holds the most closely related points.

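The "stop Kruskal's early" view can be sketched directly; the function name and interface here are my own for illustration:

```python
def k_clusters(n, edges, k):
    """Single-linkage clustering: run Kruskal's algorithm but stop as soon
    as only k connected components remain (equivalent to deleting the
    k-1 most expensive edges of the full spanning tree).
    Returns a component label for each of the n points."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    components = n
    for cost, u, v in sorted(edges):
        if components == k:            # desired number of clusters reached
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru
            components -= 1
    return [find(v) for v in range(n)]  # label each point by its component
```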
This section was actually pretty interesting and extremely short, so 8/10.

===== 4.8: Huffman Codes and Data Compression =====

The whole deal with Huffman codes was that they are a way to encode symbols using bits, giving shorter codewords to more frequent symbols. So, the book discussed how to go about coding them and then, in turn, decoding them.

The goal is to produce a "labeled binary tree in which the leaves are as close to the root as possible." This is because a leaf's depth is the length of its codeword, so keeping frequent letters near the root minimizes the average (frequency-weighted) leaf depth. What we end up doing is constructing a prefix code for a given alphabet. We pick out the two letters that occur least frequently and merge them: we make a new alphabet excluding these letters, but replacing them with a new letter whose frequency is the sum of the two least-occurring letters' frequencies. Then we recursively construct a prefix code on this new alphabet. To get a prefix code for the original alphabet, we take the tree for the newest alphabet, find the leaf for the replacement letter, and give it two children: the original least-occurring letters. This final tree gives us our desired prefix code, defined as a //Huffman code//.

This algorithm runs in O(k<sup>2</sup>) time when done naively, because each of the k-1 merge iterations does a scan over an alphabet of size at most k to find the two least frequent letters. If we use a priority queue, however, we can get it to run in O(klogk) time, as the priority queue allows insertions and extractions in O(logk).

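The priority-queue version can be sketched with Python's `heapq` module; the function name and the tiebreaker counter are my own additions (the counter just keeps the heap from ever comparing two subtrees directly):

```python
import heapq

def huffman_code(freq):
    """Huffman's algorithm with a priority queue: repeatedly merge the two
    lowest-frequency subtrees, O(k log k) overall for an alphabet of size k.
    freq maps each letter to its frequency; returns letter -> bit string."""
    # heap entries are (frequency, tiebreaker, subtree); a subtree is either
    # a letter or a (left, right) pair of subtrees
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        count += 1
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
    code = {}

    def assign(tree, prefix):
        if isinstance(tree, tuple):       # internal node: recurse on children
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:
            code[tree] = prefix or "0"    # single-letter alphabet edge case
    assign(heap[0][2], "")
    return code
```

Each of the k-1 merges does a constant number of O(log k) heap operations, which is exactly where the O(k log k) bound comes from.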
This was a very interesting section, talking about encoding and the uses for it. The reason why the algorithm works and how to use it was interesting too, so I give it a 9/10.