Cory's Journal
Preface Summary
Algorithmic ideas are useful within and far beyond computer science. Not only do they have many applications; algorithms are also a great way to see the overarching goals of the field of computer science in general. Dealing with algorithms, there are two main components: getting to the mathematical core of the problem, and then identifying the appropriate algorithm design techniques. At their most effective, algorithmic ideas don't just provide solutions to well-posed problems; they allow us to express the underlying questions effectively. This section was a very simple introduction to the ideas we will be learning throughout the course. I do not have any questions on the introduction, but I do believe that I am beginning to get a feel for how important this course is to understanding major concepts of computer science (and also for being able to effectively tackle technical interviews). We are also supposed to rank each section that we summarize on a scale of 1 to 10, 10 being most interesting and readable. This section was roughly a (3/10). In the future, I will just put this number at the end of each summary with the understanding that I am ranking the section on this type of scale.
1.1 Summary
The Stable Matching Problem was the result of two mathematical economists, Gale and Shapley, trying to find a college admissions or job recruiting process that was self-enforcing. In a situation where matches repeatedly change based on individuals' preferences, a lot of chaos may be generated, and both sides of the matches can end up unhappy with the process and the outcome. The Stable Matching Problem is a method to reduce this chaos. To understand the essence of the problem, we eliminate real-life complications and just look at a more bare-bones version: Consider a set M = {m1, …, mn} of n men and a set W = {w1, …, wn} of n women. M x W denotes the set of all possible ordered pairs of the form (m, w) where m is a man in the set M and w is a woman in the set W. A matching S is a set of ordered pairs from M x W where each member of M and each member of W appears in at most one pair in S. A perfect matching S' is a matching with the property that each member of M and W appears in exactly one pair in S'. Adding preferences means that each m in set M ranks all the women: m prefers w to w' if m ranks w higher than w' in his preference list. Likewise, all women rank all the men. Our goal is a perfect matching with no instabilities, where an instability is a pair (m, w) not in S such that m and w each prefer the other to their current partner in S. S is stable if it is perfect and there is no instability with respect to S. It is possible for an instance to have more than one stable matching. To go through the algorithm, initially all m in M and w in W are single, or free. While there is a man m who is free and hasn't proposed to every woman, choose such a free man m and have him propose to the women in the order they appear on his preference list. If the woman w he proposes to is free, (m, w) become engaged. Otherwise, w is currently engaged to some m'. If w prefers m' to m, m remains free and keeps searching. Otherwise, w prefers m to m', so (m, w) become engaged and m' becomes free again. This process is repeated while some man is free and has not yet proposed to every woman; when it ends, the result is a stable perfect matching. (7/10) My question for section 1.1: How useful would this solution be in a real-life situation with multiple real-life complications?
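To make sure I follow the proposal process, here is a rough Python sketch I put together (my own code and names, not the book's; it assumes complete preference lists and equally many men and women):

    def stable_matching(men_prefs, women_prefs):
        """men_prefs / women_prefs: dict mapping each person to a ranked list of the other side."""
        # rank[w][m] = position of m on w's list, so "w prefers m to m'" is a quick lookup
        rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
        free_men = list(men_prefs)                 # initially every man is free
        next_proposal = {m: 0 for m in men_prefs}  # index of the next woman m will propose to
        engaged_to = {}                            # engaged_to[w] = m

        while free_men:
            m = free_men.pop()                         # choose some free man
            w = men_prefs[m][next_proposal[m]]         # highest-ranked woman he hasn't proposed to
            next_proposal[m] += 1
            if w not in engaged_to:                    # w is free: (m, w) become engaged
                engaged_to[w] = m
            elif rank[w][m] < rank[w][engaged_to[w]]:  # w prefers m to her current partner m'
                free_men.append(engaged_to[w])         # m' becomes free again
                engaged_to[w] = m
            else:                                      # w rejects m; he stays free
                free_men.append(m)
        return {m: w for w, m in engaged_to.items()}

    # Example with 2 men and 2 women:
    # stable_matching({'m1': ['w1', 'w2'], 'm2': ['w1', 'w2']},
    #                 {'w1': ['m2', 'm1'], 'w2': ['m2', 'm1']})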
2.1 Summary
A major focus of the book is finding efficient algorithms for computational problems, which seems to encompass all of computer science. To approach this specifically, first we identify broad themes and design principles in the development of algorithms. A property shared by many of the problems we study is that they are discrete, meaning that they involve a search over many combinatorial possibilities with the goal of efficiently finding a solution that satisfies clearly delineated conditions. Algorithms must be efficient. Particularly, the space or memory used by an algorithm can be a big issue. The first proposed definition of efficiency is that an algorithm is efficient if, when implemented, it runs quickly on real input instances. This definition omits where and how well we implement an algorithm, and it also does not account for how an algorithm may scale as the problem size grows. Analyzing the worst case of an algorithm has often been found to do a reasonable job of capturing its efficiency in practice, and if the worst case can be shown to be efficient, the algorithm is efficient. The search space of the Stable Matching Problem, for example, is very large: a brute-force algorithm would plow through all n! possible perfect matchings, checking each for stability. This means that brute force is a very slow, inefficient solution to the problem. So, another proposed definition for efficiency is that “an algorithm is efficient if it achieves qualitatively better worst-case performance, at an analytical level, than brute-force search.” An algorithm has polynomial running time if there are constants c > 0 and d > 0 so that on every input instance of size N, its running time is bounded by cN^d primitive computational steps. The final proposed definition of efficiency is that an algorithm is efficient if it has a polynomial running time. This is also good because it becomes negatable: it is possible to express the notion that there is no efficient algorithm for a particular problem. (6/10) This leads to my question: Just how efficient is the Gale-Shapley solution to the Stable Matching Problem?
2.2 Summary
An algorithm's worst-case running time on inputs of size n grows at most at a rate proportional to some function f(n). Algorithms can be expressed in pseudo-code that resembles a high-level programming language. We want to express the growth rate of running times and other functions in a way that ignores constant factors and low-order terms; for example, with a running time of 1.62n^2 + 3.5n + 8, we'd say that it grows like n^2 up to constant factors.
Asymptotic Upper Bounds
Let T(n) be a function, say the worst-case running time of an algorithm on inputs of size n. Given another function f(n), T(n) is O(f(n)) if T(n) is bounded above by a constant multiple of f(n); we write T(n) = O(f(n)). In other words, T(n) = O(f(n)) if there are constants c > 0 and n_0 ≥ 0 so that for all n ≥ n_0, we have T(n) ≤ c*f(n). In this case, T is asymptotically upper-bounded by f. Functions can have many upper bounds.
Asymptotic Lower Bounds
To show that an upper bound is the best possible running time, we show that for arbitrarily large input sizes n, T(n) is at least a constant multiple of some specific function f(n). Thus, T(n) = Ω(f(n)) if there exist constants ε > 0 and n_0 ≥ 0 so that for all n ≥ n_0, we have T(n) ≥ ε*f(n). Here, T is asymptotically lower-bounded by f. ε must be fixed, independent of n.
Asymptotically Tight Bounds
If we can show that T(n) is both O(f(n)) and Ω(f(n)), then we've found the “right” bound: T(n) grows exactly like f(n) to within a constant factor. The notation to express this is as follows: if T(n) is O(f(n)) and Ω(f(n)), then T(n) is Θ(f(n)), and we say f(n) is an asymptotically tight bound for T(n). These are nice things to find for worst-case running times because they characterize the worst-case performance of an algorithm precisely up to constant factors. One can obtain such bounds by closing the gap between the upper and lower bounds.
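To check my understanding of these definitions, here is a small worked example using the running time mentioned in 2.2, with my own choices of c, ε, and n_0:

    T(n) = 1.62n^2 + 3.5n + 8
    \text{For } n \ge 1:\ T(n) \le 1.62n^2 + 3.5n^2 + 8n^2 = 13.12\,n^2, \text{ so } T(n) = O(n^2) \text{ with } c = 13.12,\ n_0 = 1.
    \text{For } n \ge 1:\ T(n) \ge 1.62\,n^2, \text{ so } T(n) = \Omega(n^2) \text{ with } \epsilon = 1.62.
    \text{Therefore } T(n) = \Theta(n^2), \text{ i.e., } n^2 \text{ is an asymptotically tight bound for } T(n).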
Some properties that these asymptotic growth rates have are transitivity (if f is O(g) and g is O(h), then f is O(h)) and a rule for sums of functions (for example, if f and g are both O(h), then f + g is also O(h)). Some functions that come up repeatedly in the analysis of algorithms are polynomials, logarithms, and exponentials.
Though I do not have a direct question for this section, I am still having difficulty fully grasping the concept of these bounds: how they are found, and what exactly we use them for. I will continue to study them in hopes of fully understanding them as soon as possible. (8/10)
2.3 Summary
For each algorithm, the choice of data structure is up to the algorithm's designer. An array has these properties: we can find the ith element in the array in O(1) (constant) time; looking for an element e in an unsorted array takes O(n) (linear) time; and if the elements in the array are sorted, we can find an element e in O(log n) time with binary search. Arrays are less well suited to maintaining a dynamic list of elements that are frequently added to or removed from. Linked lists are an alternative, where each element points to the next in the list. Each pointer must be maintained, and the list can be traversed in time proportional to its length. A doubly linked list can also be used, with elements pointing to both the previous and the next element. Linked lists are not as useful for finding the ith element, because this takes O(i) time. Arrays and linked lists can be used to implement the Stable Matching algorithm; with them, the G-S algorithm can be implemented in O(n^2) time. Question: Is there an even faster way? Interest: 8/10
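As a quick illustration of the O(log n) claim for sorted arrays, here is a minimal binary search sketch (my own code, not the book's):

    def binary_search(sorted_array, e):
        """Return an index of e in sorted_array, or -1 if absent, in O(log n) time."""
        lo, hi = 0, len(sorted_array) - 1
        while lo <= hi:
            mid = (lo + hi) // 2          # halve the search range each iteration
            if sorted_array[mid] == e:
                return mid
            elif sorted_array[mid] < e:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1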
2.4 Summary
Linear Time: Algorithms that run in O(n) time have running times at most a constant factor times the size of the input. This is generally achieved by processing the input in a single pass. O(n log n) is another common running time: it is the running time of any algorithm that splits its input into two equal-sized pieces, solves each piece recursively, and then combines the solutions in linear time. Another common running time is quadratic time, or O(n^2). This commonly arises from performing a search over all pairs of input items and spending constant time per pair, or from a pair of nested loops. Cubic (O(n^3)) time can be seen with more elaborate sets of nested loops, though it is unclear whether the algorithms that lead to it are practical on inputs of reasonable size. Finally, we have O(n^k) running time when, for a constant k, we search over all subsets of size k. Question: Are there other running times between these that just aren't as common? Interest: 9/10
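Here is a tiny sketch of the quadratic pattern described above, a search over all pairs with constant work per pair (my own example, not from the book):

    def closest_pair_1d(numbers):
        """A pair of nested loops: O(n^2) pairs, constant time per pair."""
        best = float('inf')
        for i in range(len(numbers)):
            for j in range(i + 1, len(numbers)):
                best = min(best, abs(numbers[i] - numbers[j]))
        return best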
2.5 Summary
Priority queues are useful when describing how to implement graph algorithms. A priority queue is designed for applications where elements have a priority value, or key, and each time an element is selected from the set, we want to take the one with the highest priority. It is a data structure that maintains a set of elements where each element has an associated key that denotes the element's priority. A heap is used to implement a priority queue. The heap data structure combines the benefits of a sorted array and a list. We think of a heap as a balanced binary tree; the tree has a root, and each node can have up to 2 children. The keys in the binary tree are in heap order if the key of any element is at least as large as the key of its parent. Heapify-up is used to “fix” the tree when this condition is not met because a key is too small for its position; Heapify-down is used when a key is too big, i.e., larger than the key of one of its children. Together, the Heapify-down and Heapify-up operations can efficiently implement a priority queue. Question: What are priority queues good for compared to arrays or linked lists? Interest: 8/10
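Here is a rough sketch of the two heap operations on a min-heap stored in a Python list; the index arithmetic is my own convention, since the book gives this in pseudo-code:

    def heapify_up(heap, i):
        """Move the element at index i up until its key is at least its parent's key."""
        while i > 0:
            parent = (i - 1) // 2
            if heap[i] < heap[parent]:                # heap-order condition violated
                heap[i], heap[parent] = heap[parent], heap[i]
                i = parent
            else:
                break

    def heapify_down(heap, i):
        """Move the element at index i down while it is larger than one of its children."""
        n = len(heap)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < n and heap[left] < heap[smallest]:
                smallest = left
            if right < n and heap[right] < heap[smallest]:
                smallest = right
            if smallest == i:
                break
            heap[i], heap[smallest] = heap[smallest], heap[i]
            i = smallest

    # Insert = append then heapify_up; ExtractMin = swap root with last element, pop, then heapify_down.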
3.1 Summary
A graph G is a way of encoding pairwise relationships among objects. In a graph we have a collection V of nodes and a collection E of edges, where each edge joins two nodes. If we want to represent asymmetric relationships between nodes (one node points to the other, but not vice versa), we use a directed graph. When we want to emphasize that a graph is not directed, we call it an undirected graph, but by default, “graph” refers to an undirected graph. Some useful examples of graphs are as follows: transportation networks (maps of routes between airports), communication networks (maps with computers as nodes and network connections as edges), information networks (the World Wide Web graph), social networks (nodes are people who interact, edges are their connections or interactions), and dependency networks (directed graphs that show interdependencies among a collection of objects, like a food web). A traversal of a sequence of nodes connected by edges is called a path. The path is simple if all its vertices are distinct. A cycle is a path in which all nodes but the last are distinct, and the last node is the same as the first (for paths with more than 2 nodes). An undirected graph is connected if there is a path to every node from every other node. The distance between two nodes is the minimum number of edges needed to get from one to the other. An undirected graph is a tree if it is connected and does not contain a cycle. In a tree rooted at a node r, a node x is the parent of node v if x directly precedes v on the path from r to v. Node w is the child of v if v is the parent of w, and w is a descendant of v if v lies on the path from the root r to w. Furthermore, a node y is a leaf if it has no descendants. Such rooted trees are fundamental in computer science because they lead to the notion of a hierarchy. Q: Are there other types of graphs that we will see? Interest: 6/10 (seen this several times before in other classes, but it's a nice review)
3.2 Summary
We want to find an efficient algorithm that determines s-t connectivity, or in other words, determines whether there is a path from node s to node t in a graph G. This could also be called the Maze-Solving Problem. The simplest algorithm for determining s-t connectivity is breadth-first search (BFS), which explores outward from s in all possible directions, adding nodes one layer at a time. This continues until no new nodes are found. Breadth-first search naturally produces a tree T rooted at s on the set of nodes reachable from s; a tree produced this way is called a breadth-first search tree. All nodes reachable from s are found by the BFS, and they are known as the connected component of G containing s. Another way to find the nodes reachable from s is the approach you might take if G were actually a maze: follow the first edge leading out of the current node and continue until you reach a node whose neighbors have all already been explored; then backtrack to a node with an unexplored neighbor and resume the search from there. This algorithm is called depth-first search (DFS) because it explores G by going as deeply as possible in one direction and only backtracking when necessary. A tree built this way is called a depth-first search tree of the connected component R. Both DFS and BFS build the connected component containing s and achieve qualitatively similar levels of efficiency. DFS visits the same set of nodes as BFS, but usually in a different order. For any two nodes s and t in a graph, their connected components are either identical or disjoint. Q: Is there another type of search not based on the connected components? Would that be much less efficient? Interest: 7/10
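A minimal BFS sketch as I understand it (my own code; it assumes the graph is given as a dictionary adj mapping every node to a list of its neighbors):

    from collections import deque

    def bfs(adj, s):
        """Breadth-first search from s. Returns the set of nodes reachable from s
        (the connected component containing s) and a parent map describing the BFS tree."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()            # nodes come off the queue one layer at a time
            for v in adj[u]:
                if v not in parent:        # v is discovered for the first time
                    parent[v] = u
                    queue.append(v)
        return set(parent), parent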
3.3 Summary
BFS and DFS differ essentially in that one uses a queue and one uses a stack. There are two main ways to represent graphs: by an adjacency matrix and by an adjacency list representation. Adjacency lists are used mainly in our book. An adjacency matrix is an n x n matrix A where A[u, v] is equal to 1 if the graph contains the edge (u, v) and 0 otherwise. The matrix A is symmetric if the graph is undirected. This method allows us to check whether a given edge is in the graph in constant time. However, it takes up Θ(n²) space, and many graph algorithms need to examine all edges incident to a given node, which with a matrix can take Θ(n) time per node even when the node has few incident edges. An adjacency list works better for sparse graphs with many fewer than n² edges. This representation has a record for each node, containing a list of the nodes to which it has edges. An adjacency list only requires O(m+n) space. A queue is a set from which we extract elements in first-in, first-out (FIFO) order. A stack is a set from which we extract elements in last-in, first-out (LIFO) order. An adjacency list is ideal for implementing a breadth-first search, and it is also useful for depth-first searches. Discovering a node is the first time it is seen in an algorithm, while exploring a node v means that all edges incident to v are scanned, resulting in the potential discovery of further nodes. There is a distinction between the two in both BFS and DFS. The overall running time of both with an adjacency list is just O(m + n). Q: When is an adjacency matrix most useful? Interest: 7/10
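To see the queue-versus-stack point concretely, here is a DFS sketch that mirrors the BFS above but pops from a stack instead of a queue (again my own code, assuming the same adjacency-list dictionary):

    def dfs(adj, s):
        """Iterative depth-first search from s; the only real change from the BFS sketch
        is that a stack (last-in, first-out) replaces the queue."""
        discovered = set()
        stack = [s]
        while stack:
            u = stack.pop()                # take the most recently added node
            if u not in discovered:
                discovered.add(u)          # "explore" u: scan all edges incident to it
                for v in adj[u]:
                    if v not in discovered:
                        stack.append(v)
        return discovered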
3.4 Summary
A bipartite graph is a graph where the set of nodes V can be partitioned into sets X and Y such that every edge has one end in X and the other in Y. If a graph is bipartite, it cannot contain an odd cycle. In fact, odd cycles are the only obstacle to a graph being bipartite. Say that each edge must connect one “red” and one “blue” node. Pick a node in the graph and color it red. Then all of its neighbors must be colored blue. It then follows that all the neighbors of THESE nodes must be red, their neighbors must be blue, and so on until the whole graph is colored. If this works, the graph is bipartite. If some edge ends up connecting two nodes of the same color, then there is nothing that can be done, and the graph is not bipartite. This coloring process is essentially identical to a BFS. Following this, with the graph G and the layers L1, L2, … produced by the BFS, exactly one of the following two things must hold: 1) There is no edge of G joining two nodes of the same layer. In this case G is a bipartite graph in which the nodes in even-numbered layers can be colored red and the nodes in odd-numbered layers can be colored blue. 2) There is an edge of G joining two nodes of the same layer. In this case, G contains an odd-length cycle, and so it cannot be bipartite.
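Here is a sketch of the coloring procedure described above for one connected component (my own code, assuming the same adjacency-list dictionary as before):

    from collections import deque

    def two_color(adj, s):
        """Try to 2-color the connected component of s by BFS.
        Returns (True, coloring) if it is bipartite, or (False, {}) if some edge
        joins two nodes of the same color."""
        color = {s: 'red'}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 'blue' if color[u] == 'red' else 'red'
                    queue.append(v)
                elif color[v] == color[u]:   # edge within one BFS layer: an odd cycle exists
                    return False, {}
        return True, color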
3.5 Summary
Many basic definitions and algorithms, like the adjacency list representation and the graph search algorithms (BFS, DFS), have versions for directed graphs, not just undirected ones. Directed graphs are represented with a version of the adjacency list used for undirected graphs, but now each node has two lists associated with it: one list of nodes to which it has edges, and another list of nodes from which it has edges. The graph search algorithms are almost the same in directed graphs as in undirected ones, and the running time is still O(m+n). A directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u. If there is a path from u to v and from v to u, the two nodes are mutually reachable. If u and v are mutually reachable, and v and w are mutually reachable, then u and w are mutually reachable. For any two nodes s and t in a directed graph, their strong components are either identical or disjoint.
3.6 Summary
If an undirected graph has no cycles, each of its connected components is a tree. A directed graph with no cycles is called a directed acyclic graph, or a DAG. DAGs can be used to encode precedence relations or dependencies in a natural way. If G has a topological ordering, then G is a DAG; likewise, if G is a DAG, then G has a topological ordering. In every DAG G, there is a node v with no incoming edges. Computing a topological ordering of a graph can be done in O(n^2) time by repeatedly finding and removing such a node (and the book then improves this to O(m+n)).
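A sketch of how I picture computing a topological ordering by repeatedly removing a node with no incoming edges (my own code; this version maintains incoming-edge counts, which is what makes it O(m+n)):

    def topological_order(nodes, edges):
        """nodes: iterable of nodes; edges: list of directed pairs (u, v).
        Returns a topological ordering if the graph is a DAG, else None."""
        incoming = {v: 0 for v in nodes}
        out = {v: [] for v in nodes}
        for u, v in edges:
            out[u].append(v)
            incoming[v] += 1
        ready = [v for v in nodes if incoming[v] == 0]   # nodes safe to place next
        order = []
        while ready:
            u = ready.pop()
            order.append(u)
            for v in out[u]:                             # "delete" u and its outgoing edges
                incoming[v] -= 1
                if incoming[v] == 0:
                    ready.append(v)
        return order if len(order) == len(incoming) else None   # None means a cycle exists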
Chapter 4 Intro Summary
A greedy algorithm is one that builds up a solution in small steps, and chooses a decision at each step to optimize some underlying criterion. Many different greedy algorithms can often be designed for the same problem, each locally, incrementally optimizing different parts on the way to a solution. Greedy algorithms can be invented for almost any problem, and are often close to optimal, even if not exactly optimal. One approach for establishing this is that the greedy algorithm stays ahead, meaning that if one measures the greedy algorithm's progress step-by-step, one can see that it does better than any other algorithm at each step. Another approach is the exchange argument, where one considers any possible solution to the problem and gradually transforms it into the solution found by the greedy algorithm without hurting its quality. Some well-known applications of greedy algorithms: shortest paths in a graph, the Minimum Spanning Tree Problem, and construction of Huffman Codes. What are Huffman Codes? Interest 8/10
4.1 Summary
Recalling the Interval Scheduling Problem, we will say that a subset of the requests is compatible if no two of them overlap in time, and compatible sets of maximum size will be called optimal. This problem can be solved with a greedy algorithm, where available requests are selected by some simple rule. The rule that actually works is to accept the request that finishes first (the one with the earliest finish time), so that the resource becomes free again as soon as possible; rules such as accepting the request with the fewest conflicts sound reasonable but turn out not to be optimal. This greedy algorithm can be analyzed to show that it does “stay ahead,” and it ends up producing an optimal solution. In the Interval Scheduling Problem, there is a single resource and many time-interval requests to pick from. Related to this is the problem of scheduling ALL the intervals using as few resources as possible (e.g., seeing how many classrooms are required to schedule x classes with varying time intervals). In any instance of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals. The greedy algorithm in the book schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals, which is the optimal number of resources needed. Could we work with more complex interval scheduling problems with this algorithm? Interest 7/10
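A quick sketch of the earliest-finish-time rule (my own code; each request is just a (start, finish) pair):

    def interval_scheduling(requests):
        """Greedy 'earliest finish time' rule: returns a compatible set of maximum size."""
        accepted = []
        last_finish = float('-inf')
        for start, finish in sorted(requests, key=lambda r: r[1]):  # sort by finish time
            if start >= last_finish:        # compatible with everything accepted so far
                accepted.append((start, finish))
                last_finish = finish
        return accepted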
4.2 Summary
We will now look at another, more complicated type of scheduling problem. We are given a single resource available starting at time s and a set of n requests. Each request is now flexible: request i has a deadline d_i and requires a contiguous time interval of length t_i, but it is willing to be scheduled any time before the deadline. Each request must be assigned an interval of time of length t_i, with nonoverlapping intervals for different requests. So, beginning at the start time s, we assign each request i an interval of time of length t_i, denoted by [s(i), f(i)], with f(i) = s(i) + t_i; the algorithm must actually determine a start time for each interval. A request is late if it misses the deadline, i.e., f(i) > d_i. The lateness of such a request is defined as l_i = f(i) - d_i, and l_i = 0 if i is not late. The goal is to schedule all requests, using nonoverlapping intervals, so as to minimize the maximum lateness, L = max_i l_i. This problem comes up when scheduling jobs that need to use a single machine, so we refer to our requests as jobs. An optimal solution can be reached by sorting the jobs in increasing order of their deadlines d_i and scheduling them in this order (called Earliest Deadline First). This approach is optimal despite ignoring half the input information (the lengths of the jobs). This can be seen partly because the algorithm leaves no gaps, or idle time, where there is work to be done but the machine is sitting idle. Is there another Greedy Algorithm that is just as efficient? Interest: 7/10
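A sketch of Earliest Deadline First as described above (my own code; jobs are (length, deadline) pairs and s is the common start time):

    def schedule_min_lateness(jobs, s):
        """Returns the schedule [(start, finish, deadline)] and the maximum lateness."""
        time = s
        schedule, max_lateness = [], 0
        for t, d in sorted(jobs, key=lambda job: job[1]):   # increasing order of deadline
            start, finish = time, time + t                  # no idle time between jobs
            schedule.append((start, finish, d))
            max_lateness = max(max_lateness, finish - d)    # lateness is 0 if finish <= d
            time = finish
        return schedule, max_lateness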
4.4 Summary
Here we apply a greedy algorithm to the problem of finding shortest paths in graphs. Given nodes u and v, what is the shortest u-v path? Or, given a start node s, what is the shortest path from s to each other node? This problem can be solved with Dijkstra's Algorithm, which determines the length of the shortest path from s to each other node in the graph; the paths themselves can then be produced from this information. The algorithm maintains a set S of vertices u for which we've determined a shortest-path distance d(u) from s, the “explored” part of the graph. Initially S = {s} and d(s) = 0. Then, for each node v outside S, we determine the shortest path that can be constructed by traveling along a path through the explored part S to some node u, followed by the single edge (u, v), and we add to S the node for which this quantity is smallest. Now, consider the set S at any point in the algorithm's execution: for each node u in S, the path P_u is a shortest s-u path. This immediately establishes the correctness of Dijkstra's Algorithm, since we can apply it when the algorithm terminates, at which point S includes all nodes. A naive implementation has an O(mn) running time. However, using a priority queue, Dijkstra's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations. Thus, the overall implementation runs in O(m log n) time. How would we use a priority queue with Dijkstra's Algorithm? Interest: 8/10
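To answer my own question a little, here is a sketch of Dijkstra's Algorithm using Python's heapq as the priority queue. It uses lazy deletion rather than the book's ChangeKey operation, so it is an approximation of the book's implementation, not a copy of it:

    import heapq

    def dijkstra(adj, s):
        """adj[u] is a list of (v, edge_length) pairs with nonnegative lengths, with an
        entry for every node. Returns the shortest-path distance d(u) for every node
        reachable from s."""
        dist = {}
        heap = [(0, s)]                       # (tentative distance, node)
        while heap:
            d_u, u = heapq.heappop(heap)      # ExtractMin
            if u in dist:                     # u was already added to the explored set S
                continue
            dist[u] = d_u
            for v, length in adj[u]:
                if v not in dist:
                    heapq.heappush(heap, (d_u + length, v))
        return dist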
4.5 Summary
Now we look again at the Minimum Spanning Tree Problem with greedy algorithms. Given a set of locations, we want to build a connected network so that there is a path between every pair of nodes, but as cheaply as possible. We may build a direct link between two nodes for a certain positive cost, so we can represent the set of possible links that may be built using a graph with a positive cost associated with each edge. We must find a subset T of the edges so that the graph (V, T) is connected and the total cost is as small as possible. Let T be a minimum-cost solution to this network design problem; then (V, T) is a tree. There are 3 greedy algorithms that correctly find a minimum spanning tree for this problem, defined on page 143 of the book: Kruskal's Algorithm, Prim's Algorithm, and the Reverse-Delete Algorithm. Now, when is it safe to include an edge in the minimum spanning tree? Assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V - S. Then every minimum spanning tree contains the edge e. Kruskal's Algorithm, Prim's Algorithm, and the Reverse-Delete Algorithm all produce a minimum spanning tree of G. So, when can we guarantee an edge is not in the minimum spanning tree? Assume that all edge costs are distinct. Let C be any cycle in G, and let edge e = (v, w) be the most expensive edge belonging to C. Then e does not belong to any minimum spanning tree of G. Using a priority queue, Prim's Algorithm can be implemented in O(m log n) running time. Are there any benefits to using one of these algorithms over the other two? Interest: 8/10
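A sketch of Prim's Algorithm with a priority queue (my own code; it assumes the graph is connected and given as an adjacency-list dictionary of (neighbor, cost) pairs):

    import heapq

    def prim_mst(adj, root):
        """Grow the tree from root, always adding the cheapest edge leaving it.
        Returns the set of tree edges as (u, v, cost) triples."""
        in_tree = {root}
        tree_edges = set()
        heap = [(cost, root, v) for v, cost in adj[root]]
        heapq.heapify(heap)
        while heap:
            cost, u, v = heapq.heappop(heap)   # cheapest edge leaving the current tree
            if v in in_tree:
                continue
            in_tree.add(v)
            tree_edges.add((u, v, cost))
            for w, c in adj[v]:
                if w not in in_tree:
                    heapq.heappush(heap, (c, v, w))
        return tree_edges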
4.6 Summary
We will develop a data structure called the Union-Find structure, which stores a representation of the connected components in a way that supports rapid searching and updating. This kind of data structure is needed to implement Kruskal's Algorithm efficiently. As each edge (v, w) is considered, we need to find the identities of the connected components containing v and w. If these components are different, there is no path from v to w yet, so the edge should be included; conversely, if the components are the same, adding the edge would create a cycle, so it should be excluded. If the edge e is included, the data structure should support the efficient merging of the components of the nodes it connects into a single new component. The Union-Find data structure supports three main operations: 1) MakeUnionFind(S), which for a set S returns a Union-Find data structure on S with all elements in separate sets, implemented in O(n) time; 2) Find(u), which returns the name of the set containing a particular node u, in O(log n) time; and 3) Union(A, B), which changes the data structure by merging two sets A and B into a single set, in O(log n) time. Now, consider the array implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set: the Find operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and any sequence of k Union operations takes at most O(k log k) time. Also, consider the pointer-based implementation (pages 154-155 of the book), again keeping the name of the larger set in a Union: a Union operation takes only O(1) time, MakeUnionFind(S) takes O(n) time, and a Find operation takes O(log n) time. So, Kruskal's Algorithm can be run on a graph with n nodes and m edges in O(m log n) time. Would using a pointer for Kruskal's be noticeably faster than the original way described? Interest: 8/10
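Here is a rough sketch of a pointer-based Union-Find with union-by-size, plus how Kruskal's Algorithm would use it (my own code; Find follows parent pointers without path compression):

    class UnionFind:
        def __init__(self, elements):                 # MakeUnionFind(S): O(n)
            self.parent = {x: x for x in elements}
            self.size = {x: 1 for x in elements}

        def find(self, u):                            # Find(u): O(log n) with union-by-size
            while self.parent[u] != u:
                u = self.parent[u]
            return u

        def union(self, a, b):                        # Union of two set names: O(1)
            if self.size[a] < self.size[b]:
                a, b = b, a                           # keep the name of the larger set
            self.parent[b] = a
            self.size[a] += self.size[b]

    def kruskal(nodes, edges):
        """Consider edges in increasing order of cost and add an edge only when its
        endpoints lie in different components. edges: list of (cost, v, w)."""
        uf = UnionFind(nodes)
        tree = []
        for cost, v, w in sorted(edges, key=lambda e: e[0]):
            a, b = uf.find(v), uf.find(w)
            if a != b:                                # no v-w path yet, so no cycle is created
                tree.append((v, w, cost))
                uf.union(a, b)
        return tree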
4.7 Summary
Clustering occurs whenever we have a collection of objects that we are trying to classify or organize into logical, coherent groups. A common approach for grouping these objects is defining a distance function on the objects, so that objects at a larger distance from one another are less similar to each other. Minimum spanning trees play an important role in this kind of grouping. If we are seeking to divide a set U of n objects into k groups, for a given parameter k, we say that a k-clustering of U is a partition of U into k nonempty sets C_1, C_2, …, C_k. The spacing of a k-clustering is the minimum distance between any pair of points lying in different clusters. There are exponentially many different k-clusterings of a set U. To find one of maximum spacing, we grow a graph on the vertex set U. The connected components are the clusters, and we try to bring nearby points together into the same cluster as quickly as possible so they don't end up as points in different clusters that are close together. Each time we add an edge that spans two distinct components, we have essentially merged the two corresponding clusters; this is called single-link clustering, a special case of hierarchical agglomerative clustering. (Agglomerative means that we combine clusters; single-link means we do so as soon as a single link joins them.) The components C_1, C_2, …, C_k formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing. What are some other uses for the minimum spanning tree algorithms? Interest: 8/10
4.8 Summary
Data compression is an area that forms part of the foundations of digital communication. We can take text written in richer alphabets (such as the alphabets used for human languages) and convert it into long strings of bits by using a fixed number of bits for each symbol in the alphabet and then concatenating the bit strings for the symbols to form the text. However, it is wasteful to translate every symbol into the same number of bits; we could use fewer bits for more frequent letters and more bits for less frequent ones, and perhaps use fewer than five bits per letter on average. Reducing the average number of bits per letter is a fundamental problem in the area of data compression, and it is what compression algorithms are designed to do. An earlier example of this kind of encoding is Morse code, where each letter is translated into a sequence of dots and dashes. The ambiguity problem in Morse code happens because there are pairs of letters where the string that encodes one letter is a prefix of the string that encodes another. We can eliminate this problem by mapping letters to bit strings in such a way that no encoding is a prefix of any other; such a mapping is called a prefix code. There is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*. The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code. Did the people who invented computer codes model their binary ideas after Morse code? Interest: 7/10
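A sketch of how I understand Huffman's algorithm, using a heap to repeatedly merge the two lowest-frequency letters (my own code and input format, not the book's):

    import heapq
    from itertools import count

    def huffman_code(frequencies):
        """frequencies: dict letter -> frequency. Returns a prefix code as a dict
        letter -> bit string."""
        tiebreak = count()   # keeps heap entries comparable when frequencies are equal
        heap = [(freq, next(tiebreak), letter) for letter, freq in frequencies.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # the two lowest-frequency subtrees...
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))  # ...become siblings
        code = {}
        def assign(node, prefix):
            if isinstance(node, tuple):          # internal node: recurse into both children
                assign(node[0], prefix + '0')
                assign(node[1], prefix + '1')
            else:
                code[node] = prefix or '0'       # edge case: single-letter alphabet
        assign(heap[0][2], '')
        return code

    # Example: huffman_code({'a': 0.45, 'b': 0.30, 'c': 0.15, 'd': 0.10})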
5.1 Summary
The mergesort algorithm is a very basic example of a divide-and-conquer algorithm. To analyze its running time, we abstract its behavior into a template that describes many common divide-and-conquer algorithms: divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending only linear time for the initial division and final recombining. The running time T(n) satisfies the following recurrence relation: for some constant c, T(n) ≤ 2T(n/2) + cn when n > 2, and T(2) ≤ c; this recurrence solves to O(n log n). There are two ways one can go about solving a recurrence: 1) “Unroll” the recursion, accounting for the running time across the first few levels, and identify a pattern that can be continued as the recursion expands; then sum the running times over all levels of the recursion, thereby arriving at a total running time. 2) Start with a guess for the solution, substitute it into the recurrence relation, and check that it works; this can be justified by induction on n. Is mergesort the most efficient sorting algorithm? Interest: 6/10
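A minimal mergesort sketch following the template above (my own code):

    def mergesort(a):
        """Split, recurse on two halves, then merge in linear time."""
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = mergesort(a[:mid]), mergesort(a[mid:])   # two subproblems of size n/2
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):                # linear-time combine step
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]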
5.2 Summary
A more general class of algorithms is obtained by considering divide-and-conquer algorithms that create recursive calls on q subproblems of size n/2 each and then combine the results in O(n) time. The recurrence is: 1) for some constant c, T(n) ≤ qT(n/2) + cn when n > 2, and T(2) ≤ c. Any function T(·) satisfying 1) with q > 2 is bounded by O(n^(log_2 q)). So the running time is more than linear, since log_2 q > 1, but still polynomial in n. Any function T(·) satisfying 1) with q = 1 is bounded by O(n). Furthermore, if for some constant c, T(n) ≤ 2T(n/2) + cn^2 when n > 2 and T(2) ≤ c, then T(n) is bounded by O(n^2). What are some algorithms that we may use regularly that use these recurrence relations? Interest: 4/10
5.3 Summary
Another problem that can be solved with a divide-and-conquer algorithm arises in the analysis of rankings. For example, many sites on the Web make use of a technique known as collaborative filtering, where they try to match people's preferences with those of other people on the Internet, and use these similarities in interest to recommend new things that the other people have liked. Another application is in meta-search tools on the Web. To compare two rankings, we count the number of inversions between them: pairs of items that appear in one order in the first ranking and in the opposite order in the second. This can be done in O(n log n) time. The most crucial routine in the process of counting inversions is the Merge-and-Count procedure. Because the Merge-and-Count procedure takes O(n) time, the Sort-and-Count algorithm correctly sorts the input list and counts the number of inversions, and it runs in O(n log n) time for a list with n elements. Question: Do Facebook, Twitter, and other modern search engines still use a similar algorithm in their recommendations? Interest: 6/10
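A sketch of Sort-and-Count with the Merge-and-Count step as I understand it (my own code):

    def sort_and_count(a):
        """Returns (sorted list, number of inversions) in O(n log n) time."""
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_and_count(a[:mid])
        right, inv_right = sort_and_count(a[mid:])
        merged, i, j, inv_split = [], 0, 0, 0        # Merge-and-Count step
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
                inv_split += len(left) - i           # right[j] is inverted with every remaining left element
        merged += left[i:] + right[j:]
        return merged, inv_left + inv_right + inv_split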
5.4 Summary
Finding the closest pair of points means that, given n points in the plane, we want to find the pair that is closest together. This problem is found in computational geometry, and is useful in graphics, computer vision, geographic information systems, molecular modeling, etc. Given points P = {p_1, …, p_n}, our goal is to find a pair of points p_i, p_j that minimizes the distance d(p_i, p_j). This is solved with another divide and conquer algorithm, like Mergesort: find the closest pair among the points in the “left half” of P and the closest pair among the points in the “right half” of P. Use this information to get the overall solution in linear time. Overall, this gives O(nlogn) running time, and correctly outputs a closest pair of points in P. Question: How exactly DO we recombine at the end in linear time? Interest: 3/10
5.5 Summary
Integer multiplication can be done with a different divide-and-conquer algorithm, in which the “default” quadratic algorithm is improved by means of a different recurrence. Here, we break the product of the two integers into partial products: writing each n-bit number in terms of its high-order and low-order halves reduces a single n-bit instance to four n/2-bit instances. Solving four subproblems, however, would still lead to quadratic time; the trick is to compute only three recursive n/2-bit products and recover the remaining term from them. This results in the Recursive-Multiply algorithm, whose running time on two n-bit factors is O(n^(log_2 3)) = O(n^1.59). Question: Is this algorithm used behind the scenes for each integer multiplication that we ask of our computers? Interest: 8/10
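A sketch of the three-multiplication idea on Python integers (my own code; the split point and base case are my own choices):

    def recursive_multiply(x, y):
        """Multiply nonnegative integers by splitting each into high and low halves."""
        if x < 16 or y < 16:                       # small instances: multiply directly
            return x * y
        half = max(x.bit_length(), y.bit_length()) // 2
        x1, x0 = x >> half, x & ((1 << half) - 1)  # x = x1 * 2^half + x0
        y1, y0 = y >> half, y & ((1 << half) - 1)
        p = recursive_multiply(x1 + x0, y1 + y0)   # only three recursive products...
        a = recursive_multiply(x1, y1)
        b = recursive_multiply(x0, y0)
        middle = p - a - b                         # ...recover x1*y0 + x0*y1 from them
        return (a << (2 * half)) + (middle << half) + b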
6.1 Summary
The Interval Scheduling Problem can be solved optimally with a particular greedy algorithm, but the Weighted Interval Scheduling Problem requires a more general approach: here each interval has a value, or weight, and we want to accept a set of maximum total value. With weighted intervals, no natural greedy algorithm is known; instead, we switch to dynamic programming. In particular, a recursive type of algorithm is used, where p(j) denotes the largest index i < j such that interval i is compatible with interval j, and request j belongs to an optimal solution on the first j requests if and only if v_j + OPT(p(j)) >= OPT(j-1), where OPT(x) is the value of an optimal solution on the first x requests. In the process of using this recursion to calculate the optimal solution, we save values that have already been computed, which is called memoization. The running time of the complete algorithm with memoization is O(n) (once the intervals are sorted by finish time). Question: How does this algorithm avoid delving through all possibilities, and thus being as inefficient as brute force? Interest: 4/10
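A sketch of the memoized recursion (my own code; intervals are (start, finish, value) triples, and I treat two intervals as compatible when one finishes no later than the other starts):

    import bisect

    def weighted_interval_scheduling(intervals):
        """Returns the value of an optimal solution."""
        intervals = sorted(intervals, key=lambda iv: iv[1])      # sort by finish time
        finishes = [f for _, f, _ in intervals]
        # p[j] = number of intervals (among the first j) that finish before interval j+1 starts
        p = [bisect.bisect_right(finishes, intervals[j][0]) for j in range(len(intervals))]
        memo = {0: 0}                                            # OPT(0) = 0

        def opt(j):                                              # OPT(j) for intervals 1..j
            if j not in memo:
                v_j = intervals[j - 1][2]
                memo[j] = max(v_j + opt(p[j - 1]), opt(j - 1))   # include interval j, or not
            return memo[j]

        return opt(len(intervals))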
6.2 Summary
The key to the efficient algorithm for the Weighted Interval Scheduling Problem is the array M. This array is built up using the values of optimal solutions to the subproblems on intervals 1 through j for each j, using the recurrence above to define each value M[j] in terms of values that come earlier in the array. Once we have the array M, the problem is solved, because it contains the value of the optimal solution on the full instance, and our Find-Solution algorithm can be used to backtrack through M and return the final, optimal solution. Dynamic programming algorithms can be developed using an iterative building up of subproblems; this approach is often a simpler way to express the algorithms. To develop an algorithm based on dynamic programming, one needs a collection of subproblems derived from the original problem that satisfy the following properties: 1) there are only a polynomial number of subproblems; 2) the solution to the original problem can be computed from solutions to the subproblems; 3) there is a natural ordering on subproblems from “smallest” to “largest,” together with an easy-to-compute recurrence that allows one to determine the solution to a subproblem from the solutions to some number of smaller subproblems. Question: Is dynamic programming only used for those problems that cannot be solved with more efficient algorithms, just as a better alternative to brute force? Interest: 4/10
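For contrast with the recursion in 6.1, here is a sketch of the iterative build-up of M together with the Find-Solution backtracking step (same hypothetical setup as the previous sketch: intervals sorted by finish time, with values and p(j) precomputed):

    def iterative_opt_and_solution(values, p):
        """values[j-1] = value of interval j; p[j-1] = p(j). Returns the optimal value
        and an optimal set of interval indices (1-based)."""
        n = len(values)
        M = [0] * (n + 1)
        for j in range(1, n + 1):                           # fill M in order of subproblem size
            M[j] = max(values[j - 1] + M[p[j - 1]], M[j - 1])
        solution, j = [], n
        while j > 0:                                        # Find-Solution: backtrack through M
            if values[j - 1] + M[p[j - 1]] >= M[j - 1]:     # interval j belongs to an optimal solution
                solution.append(j)
                j = p[j - 1]
            else:
                j -= 1
        return M[n], sorted(solution)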
6.3 Summary
In the Segmented Least Squares Problem, the recurrence we use involves what could be called “multi-way choices”: at each step, we have a polynomial number of possibilities to consider for the structure of the optimal solution. Essentially, we are trying to fit a set of points well using as few line segments as possible, balancing the number of segments against how closely the lines fit the points. For each segment, we compute the line minimizing the error with respect to the points in that segment. The penalty of a partition is defined to be the sum of 1) the number of segments into which we partition P, times a fixed, given multiplier C > 0, and 2) for each segment, the error value of the optimal line through that segment. We want to find a partition of least penalty. For the subproblem on the points p_1 through p_j, OPT(j) = min over i of (e_{i,j} + C + OPT(i-1)), and the segment p_i through p_j is used in an optimum solution for the subproblem if and only if the minimum is obtained using index i. In total, the algorithm for this problem has a running time of O(n^2) once all the e_{i,j} values have been determined. Question: What would we do if there was a graph with several very-stray points; would it just be impossible to compute a good best-fit line? Interest: 6/10
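A minimal sketch of the recurrence, assuming the error values e[i][j] (the error of the best line through points p_i..p_j) have already been computed and are indexed from 1 (my own code):

    def segmented_least_squares(e, C, n):
        """e[i][j] = error of the optimal line through points i..j; C = per-segment penalty.
        Returns OPT(n), the minimum total penalty."""
        OPT = [0.0] * (n + 1)
        for j in range(1, n + 1):
            OPT[j] = min(e[i][j] + C + OPT[i - 1] for i in range(1, j + 1))
        return OPT[n]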
6.4 Summary
In the Subset Sum and Knapsack problems, the “obvious” set of subproblems turns out not to be enough, and we end up creating a richer collection of subproblems. This is done by adding a new variable to the recurrence underlying the dynamic program. Here we have a single machine that can process jobs between the times of 0 and W, for some W. Our goal is to process jobs so as to keep the machine as busy as possible up to the W “cut-off.” Given n items that each have a nonnegative weight, and a bound W, we want to select a subset S of the items whose total weight does not exceed the cut-off W and is otherwise as large as possible. This is called the Subset Sum Problem, and it is a natural special case of the more general Knapsack Problem, in which each request i has both a value and a weight and we maximize total value subject to the weight bound. Dynamic programming can be used to solve this problem. The Subset-Sum(n, W) algorithm correctly computes the optimal value of the problem and runs in O(nW) time. Given a table M of the optimal values of the subproblems, the optimal set S can be found in O(n) time. The Knapsack Problem can also be solved in O(nW) time. Question: Is there another algorithm (not using dynamic programming) that can find an equally efficient solution/running time? Interest: 7/10
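A sketch of the Subset-Sum dynamic program (my own code; M[i][w] is the optimal value using the first i items with weight budget w):

    def subset_sum(weights, W):
        """Returns the largest total weight achievable without exceeding W, in O(nW) time."""
        n = len(weights)
        M = [[0] * (W + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            w_i = weights[i - 1]
            for w in range(W + 1):
                M[i][w] = M[i - 1][w]                      # option 1: skip item i
                if w_i <= w:                               # option 2: take item i if it fits
                    M[i][w] = max(M[i][w], w_i + M[i - 1][w - w_i])
        return M[n][W]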
7.1 Summary
Graphs can often be used to model transportation networks, whose edges carry “traffic” and whose nodes act as “switches” passing traffic between different edges. Network models have capacities on the edges, source nodes that generate traffic, sink (destination) nodes, and the traffic itself (or flow) that is transmitted across the edges. To find a maximum flow in a network, whose value is limited by the bottleneck capacity of the network, we must answer the Maximum-Flow Problem, which combines greedy and dynamic programming ideas. We can push forward on edges with leftover capacity and backward on edges that are already carrying flow, in order to divert it. A residual graph provides a systematic way to search for forward-backward operations like this. The algorithm built on these ideas is known as the Ford-Fulkerson Algorithm, which runs in O(mC) time. Question: Which parts of this are dynamic programming, and which are greedy? Interest: 6/10
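To get a feel for the residual-graph idea, here is a rough Ford-Fulkerson sketch (my own code; it finds augmenting paths by BFS just for convenience, and assumes integer capacities given as a dictionary keyed by directed edge):

    from collections import deque

    def ford_fulkerson(capacity, s, t):
        """Repeatedly find an s-t path in the residual graph and push flow along it.
        capacity: dict {(u, v): c}. Returns (flow value, flow per edge)."""
        flow = {e: 0 for e in capacity}
        nodes = {u for u, _ in capacity} | {v for _, v in capacity}

        def residual(u, v):
            # leftover capacity on the forward edge plus flow on (v, u) that can be undone
            return capacity.get((u, v), 0) - flow.get((u, v), 0) + flow.get((v, u), 0)

        while True:
            parent, queue = {s: None}, deque([s])
            while queue and t not in parent:                # BFS in the residual graph
                u = queue.popleft()
                for v in nodes:
                    if v not in parent and residual(u, v) > 0:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:                             # no augmenting path: flow is maximum
                value = sum(f for (u, _), f in flow.items() if u == s) - \
                        sum(f for (_, v), f in flow.items() if v == s)
                return value, flow
            path, v = [], t                                 # recover the augmenting path
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            bottleneck = min(residual(u, v) for u, v in path)
            for u, v in path:                               # push forward, undoing backward flow if needed
                forward_room = capacity.get((u, v), 0) - flow.get((u, v), 0)
                pushed = min(bottleneck, forward_room)
                if pushed:
                    flow[(u, v)] += pushed
                if bottleneck > pushed:
                    flow[(v, u)] -= bottleneck - pushed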
7.2 Summary
This section is about the analysis of the Ford-Fulkerson Algorithm and how it provides insight into the Maximum-Flow Problem. Here we see and prove that the flow returned by the Ford-Fulkerson Algorithm has the maximum possible value of any flow in G. By watching the amount of flow that any s-t flow f sends across a cut, we can bound the flow's value. Let f be any s-t flow and (A, B) any s-t cut; then v(f) ≤ c(A, B). If f is an s-t flow such that there is no s-t path in the residual graph G_f, then there is an s-t cut (A*, B*) in G for which v(f) = c(A*, B*). Consequently, f has the maximum value of any flow in G, and (A*, B*) has the minimum capacity of any s-t cut in G. Given a flow f of maximum value, we can compute an s-t cut of minimum capacity in O(m) time. In every flow network, the maximum value of an s-t flow is equal to the minimum capacity of an s-t cut. If all capacities in the flow network are integers, then there is a maximum flow f for which every flow value f(e) is an integer. Question: Because this algorithm could potentially run forever, is it really always efficient? Interest: 6/10
7.5 Summary
One of the main goals of the Maximum-Flow Problem is to be able to solve the Bipartite Matching Problem. A bipartite graph G = (V, E) is an undirected graph whose node set can be partitioned into sets X and Y so that every edge has one end in X and the other in Y. A matching M in G is a subset of the edges such that each node appears in at most one edge of M. The Bipartite Matching Problem involves finding a matching in G of largest possible size. We convert G into a flow network G' by directing all edges from X to Y, adding a source s with an edge to every node in X and a sink t with an edge from every node in Y, and giving every edge capacity 1. The size of the maximum matching in G is equal to the value of the maximum flow in G', and the edges in such a matching in G are the edges that carry flow from X to Y in G'. The Ford-Fulkerson Algorithm can be used to find a maximum matching in a bipartite graph in O(mn) time. Assume that the bipartite graph G = (V, E) has two sides X and Y such that |X| = |Y|. Then the graph G either has a perfect matching or there is a subset A ⊆ X such that |Γ(A)| < |A|, where Γ(A) denotes the set of all neighbors of nodes in A. A perfect matching or an appropriate subset A can be found in O(mn) time. Question: What real world applications do bipartite graphs have? Interest: 4/10
7.7 Summary
In the Maximum-Flow Problem, we had only a single source s and a single sink t. Now we can have a set S of sources and a set T of sinks. Instead of maximizing the flow value, we have sources with fixed supply values and sinks with fixed demand values, and we want to ship flow from the nodes with supply to those with demands. By attaching a super-source s* to the supply nodes and a super-sink t* to the demand nodes, we obtain a graph G' such that there is a feasible circulation with demands {d_v} in G if and only if the maximum s*-t* flow in G' has value D, the total demand. If all capacities and demands in G are integers, and there is a feasible circulation, then there is a feasible circulation that is integer-valued. Lower bounds on edges can be handled by a similar reduction to a graph G' without lower bounds: there is a feasible circulation in G if and only if there is a feasible circulation in G'. If all demands, capacities, and lower bounds in G are integers, and there is a feasible circulation, then there is a feasible circulation that is integer-valued. Question: Is this algorithm just a slightly tweaked version of the Ford-Fulkerson Algorithm? Interest: 6/10