Table of Contents
Maggie's Journal
Preface
Algorithms have an enormous reach: they are used not just in computer programming and on the internet, but also, for example, to express the similarities among genes and genomes in biology. Algorithms are a powerful lens through which to view the field of computer science in general. Algorithmic work involves two tasks: getting to the mathematically clean core of a problem, and then identifying the appropriate algorithm design techniques based on the structure of the problem. The goal of this book is to offer advice on how to identify clean algorithmic problems inside complex issues from different areas of computing and how to design efficient algorithms.
1.1 Stable Matching:
The Stable Matching Problem – Gale and Shapley wanted to come up with a self-enforcing process. Take two sets, say applicants and employers, and match each applicant to a company based on preference lists from both sides. Companies make offers, and an applicant can either accept or reject an offer; the process continues until everyone is in a stable matching – i.e., there is no applicant and company that would both be willing to leave their matched partners for each other. A free company gives an offer to the highest-ranked applicant it has not yet approached. If the applicant is not matched with anyone, they accept the offer; if the applicant is already matched to a company, they accept only if the new offer is better on their preference list and reject it otherwise. If rejected, the company places an offer to the next applicant on its preference list, and this continues until all the companies and applicants are matched. One problem with the matching of companies to applicants is that one company may be searching for multiple applicants while each applicant is searching for one employer. To generalize, it is best to look at n men and n women who are looking to be matched with one another into pairs. So our goal is to make a set of marriages with no instabilities. This algorithm for making stable pairs terminates after at most n^2 iterations of the while loop, because the worst-case scenario is that each of the n men must propose to each of the n women before he finds one who accepts. Here are some properties that arise during the running of the algorithm (a short sketch of the algorithm follows the list below).
- A woman remains engaged from the point at which she receives her first proposal
- The sequence of women to whom a man proposes gets worse and worse in terms of his preference list
- If a man is free at some point during the algorithm, then there is a woman to whom he has not yet proposed
- The set of pairs returned at termination is a perfect matching
- The set of pairs returned is a stable matching
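Here is a minimal sketch of the proposal loop in Python (the function name, the 0..n-1 numbering, and the list-based bookkeeping are my own choices; the idea is the men-propose Gale-Shapley process described above):

```python
# A minimal sketch of Gale-Shapley, assuming men and women are numbered
# 0..n-1 and preference lists are given highest-ranked first.
def stable_matching(men_prefs, women_prefs):
    n = len(men_prefs)
    # rank[w][m] = how woman w ranks man m (smaller = preferred)
    rank = [[0] * n for _ in range(n)]
    for w in range(n):
        for position, m in enumerate(women_prefs[w]):
            rank[w][m] = position
    next_proposal = [0] * n      # next woman on each man's list to propose to
    current = [None] * n         # current[w] = man woman w is engaged to
    free_men = list(range(n))
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]
        next_proposal[m] += 1
        if current[w] is None:
            current[w] = m                     # w was free: she accepts
        elif rank[w][m] < rank[w][current[w]]:
            free_men.append(current[w])        # w trades up; old partner is free again
            current[w] = m
        else:
            free_men.append(m)                 # w rejects m; he stays free
    return [(current[w], w) for w in range(n)]
```

The `rank` array here is the inverse preference list that section 2.3 below uses to let a woman compare two men in O(1) time.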
2.1 Computational Tractability
Efficiency: An algorithm is efficient if, when implemented, it runs quickly on real input instances. But what does it mean to run quickly, and what are real input instances? This varies from algorithm to algorithm and depends on how the algorithm is implemented, so we often look at the worst-case scenario as a guide for how efficient an algorithm is. A second definition: an algorithm is efficient if it achieves qualitatively better worst-case performance, at an analytical level, than brute-force search. Back in the stable matching problem, a brute-force search for a stable matching would have to look through all n! possible perfect matchings, as opposed to the algorithm, which has a worst-case runtime of n^2 – and n^2 is definitely better than n!. Common running times include constant (1), linear (n), n log n, polynomial (n^2, n^3, and so on), exponential (2^n), and factorial (n!). A third definition: an algorithm is efficient if it has a polynomial running time. (see table 2.1 on page 34)
2.2 Asymptotic Order of Growth:
Asymptotic Upper Bounds – Let T(n) be a function, say the worst-case running time of a certain algorithm on an input of size n. Given another function f(n), we say that T(n) is order f(n) if, for sufficiently large n, the function T(n) is bounded above by a constant multiple of f(n) – written T(n) = O(f(n)) if there exist constants c > 0 and n_0 >= 0 such that for all n >= n_0 we have T(n) <= c * f(n). Here we say that T is asymptotically upper-bounded by f. Since a single constant c must exist that works for all n, c cannot depend on n. O notation expresses only an upper bound, not the exact growth rate of the function; so if T(n) = pn^2 + qn + r, then T(n) = O(n^2)
Asymptotic Lower Bounds – we want to express the notion that for arbitrarily large input sizes n, the function T(n) is at least a constant multiple of some specific function f(n); we write T(n) = Omega(f(n)) and say that T is asymptotically lower-bounded by f. Asymptotic Tight Bounds – if we can show that a running time T(n) is both O(f(n)) and Omega(f(n)), then in a natural sense we’ve found the right bound: T(n) grows exactly like f(n) to within a constant factor. If T(n) is both O(f(n)) and Omega(f(n)), then we say that T(n) is Theta(f(n)); in this case, we say that f(n) is an asymptotically tight bound for T(n). (A worked example follows the list of properties below.)
Properties of Growth Rates
-Let f and g be two functions such that the limit as n approaches infinity of f(n)/g(n) exists and is equal to some number c > 0. Then f(n) = Theta(g(n))
-A first property is transitivity: if a function f is asymptotically upper-bounded by a function g, and if g in turn is asymptotically upper-bounded by a function h, then f is asymptotically upper-bounded by h; a similar property holds for lower bounds. If f = O(g) and g = O(h), then f = O(h)
-If f = Theta(g) and g = Theta(h), then f = Theta(h)
-Suppose that f and g are two functions such that for some other function h, we have f = O(h) and g = O(h), then f + g = O(h)
-Let k be a fixed constant, and let f_1, f_2, …, f_k and h be functions such that f_i = O(h) for all i. Then f_1 + f_2 + … + f_k = O(h)
-Suppose that f and g are two functions, such that g = O(f). Then f + g = Theta(f)
-Let f be a polynomial of degree d, in which the coefficient a_d is positive, then f = O(n^d)
-For every b > 1 and every x > 0, we have log_b(n) = O(n^x)
-For every r > 1 and every d > 0, we have n^d = O(r^n)
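As a worked example of these definitions (my own example, using the bounds above): take T(n) = pn^2 + qn + r with positive constants p, q, r. For all n >= 1 we have qn <= qn^2 and r <= rn^2, so T(n) <= (p + q + r)n^2, which shows T(n) = O(n^2) with c = p + q + r and n_0 = 1. We also have T(n) >= pn^2 for all n >= 1, so T(n) = Omega(n^2). Since T(n) is both O(n^2) and Omega(n^2), it is Theta(n^2).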
2.3 Implementing the Stable Matching Algorithm Using Lists and Arrays:
- An array is not as good for dynamically maintaining a list of elements that changes over time; it is generally cumbersome to frequently add or delete elements in a list that is maintained as an array
- In a linked list, the elements are sequenced together by having each element point to the next in the list
- A doubly linked list can be modified with deletion and insertion both of which run in constant time since it is a matter of changing pointers
- Unlike arrays, we cannot find the ith element of the list in O(1) time: to find the ith element, we have to follow the Next pointers starting from the beginning of the list, which takes a total of O(i) time
- It is easy to convert between the array and list representation in O(n) time. This allows us to freely choose the data structure that suits the algorithm better and not be constrained by the way the information is given as input.
Implementation for the Stable Matching
- Identify a free man –> we will maintain the set of free men as a linked list; when men become engaged or become free again, it is easy to add and remove them from the linked list, and each such update runs in constant time
- For any man m, be able to identify the highest-ranked woman to whom he has not yet proposed –> for each man m we keep an array containing his preference list of women, together with a pointer marking the position of the next woman m will propose to
- For a woman w, be able to identify whether w is engaged or not and, if engaged, identify her current partner –> we will use an array of length n in which the indices are the women and the value at each index is the man to whom that woman is engaged; if she is not engaged to anyone, a null symbol (-1) marks her entry
- For a woman w and two men m and m', be able to identify which man w prefers, m or m' –> we will make an inverse preference array from the women's preference list such that for a woman w, each man m is represented in the indices and then the value is how woman w ranked man m
*All of these run in constant time*
2.4 Survey of Common Running Times:
- Linear - O(n) - its running time is at most a constant factor times the size of the input; computing the maximum and merging two sorted lists have running times of O(n) (see the merging sketch after this list)
- O(n log n) - the running time of any algorithm that splits its input into two equal-sized pieces, solves each piece recursively, and then combines the two solutions in linear time; sorting is the most well-known example of a problem that can be solved with this runtime, specifically Mergesort
- Quadratic - O(n^2) - arises from a pair of nested loops: an algorithm consists of a loop with O(n) iterations, and each iteration of the loop launches an inner loop that takes O(n) time; multiplying these two factors of n together gives us O(n^2)
- Cubic - O(n^3) - set of 3 nested loops within each other each of size n giving us O(n^3)
- Polynomial - O(n^k) - the total number of k-element subsets in an n-element set gives us O(n^k)
- Exponential - O(2^n) - the total number of subsets of an n-element set is 2^n, and so an algorithm that tries all these subsets has an outer loop that runs for 2^n iterations
- Factorial - O(n!) - grows even more rapidly than 2^n, n! is the number of ways to match up n items with n other items
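As a small illustration of the linear-time bullet above, here is a sketch of merging two already-sorted lists in O(n) total time (the function name and list-based style are my own):

```python
# Merge two sorted lists into one sorted list in time linear in the total length.
def merge(a, b):
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i]); i += 1
        else:
            merged.append(b[j]); j += 1
    merged.extend(a[i:])   # at most one of these two tails is nonempty
    merged.extend(b[j:])
    return merged
```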
2.5 A More Complex Data Structure: Priority Queues:
- A priority queue is a data structure that maintains a set of elements S, where each element v in S has an associated value key(v) that denotes the priority of element v; smaller keys represent higher priorities. Priority queues support the addition and deletion of elements from the set, and also the selection of the element with smallest key.
- The heap data structure combines the benefits of a sorted array and list for purposes of this application. Think of a heap as a balanced binary tree, it will have a root and each node can have up to two children, a left and a right child. The keys in such a binary tree are said to be in heap order if the key of any element is at least as large as the key of the element at its parent node in the tree.
- The heap element with smallest key is at the root, so it takes O(1) time to identify the minimal element.
- To add an element we run Heapify-up and to delete an element we run Heapify-down
- The heap data structure with the Heapify-down and Heapify-up operations can efficiently implement a priority queue that is constrained to hold at most N elements at any point in time (a small sketch follows the list of operations below)
- StartHeap(N) - returns an empty heap H that is set up to store at most N elements; has O(N) runtime
- Insert(H, v) - inserts the item v into heap H; has O(log n) run time
- FindMin(H) - identifies the minimum element in the heap H; has O(1) run time
- Delete(H, i) - deletes the element in heap position i; has O(log n) run time
- ExtractMin(H) - identifies and deletes an element with minimum key value from a heap; has O(log n) run time
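Here is a minimal sketch of an array-based min-heap with Heapify-up and Heapify-down (class and method names are mine; the book's heap stores elements with separate keys, while this sketch treats the stored values themselves as the keys and does not enforce the size bound N):

```python
# A binary min-heap stored in a Python list; the parent of index i is (i-1)//2.
class MinHeap:
    def __init__(self):
        self.h = []

    def find_min(self):                  # O(1): smallest key sits at the root
        return self.h[0]

    def insert(self, key):               # O(log n)
        self.h.append(key)
        self._heapify_up(len(self.h) - 1)

    def extract_min(self):               # O(log n)
        root = self.h[0]
        last = self.h.pop()
        if self.h:
            self.h[0] = last
            self._heapify_down(0)
        return root

    def _heapify_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.h[i] < self.h[parent]:
                self.h[i], self.h[parent] = self.h[parent], self.h[i]
                i = parent
            else:
                break

    def _heapify_down(self, i):
        n = len(self.h)
        while True:
            left, right, smallest = 2 * i + 1, 2 * i + 2, i
            if left < n and self.h[left] < self.h[smallest]:
                smallest = left
            if right < n and self.h[right] < self.h[smallest]:
                smallest = right
            if smallest == i:
                break
            self.h[i], self.h[smallest] = self.h[smallest], self.h[i]
            i = smallest
```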
2.3, 2.4, 2.5
- After reading 2.4 I am beginning to understand run time more than the last chapter, I understand O notation but I am sometimes confused on the difference between Omega and O notation, I understand what they each represent but I don't understand how calculating Omega and O notation differ for some examples. Can we do some examples in class of calculating O, Omega, and then Theta?
- I understand Heaps and priority queues very well, we did them in 112, but we did max heaps which were a little different but I understand the main process.
3.1 Basic Definitions and Applications:
- a graph G consists of a collection V of nodes and a collection E of edges, each of which “joins” two of the nodes
- examples of graphs - transportation networks, communication networks, information networks(internet), social networks, and dependency networks
- we define a path in an undirected graph G = (V, E) to be a sequence P of nodes v1, v2, …, vk with the property that each consecutive pair vi, vi+1 is joined by an edge in G
- A path is called simple if all its vertices are distinct from one another
- A cycle is a path in which the sequence of nodes cycles back to where it began
- an undirected graph is connected if, for every pair of nodes u and v, there is a path from u to v
- A graph is strongly connected if for every two nodes u and v, there is a path from u to v and a path from v to u (directed graphs)
- the distance between two nodes u and v to be the minimum number of edges in a u-v path
- An undirected graph is a tree if it is connected and does not contain a cycle, deleting any edge from a tree will disconnect it
- Every n-node tree has exactly n-1 edges
3.2 Graph Connectivity and Graph Traversal:
Breadth-First Search - explore outward from node s in all possible directions, adding nodes one “layer” at a time: the first layer is all the nodes connected to s, then the second layer is all the nodes that are connected to the nodes in the first layer and have not already been placed in a layer, and so on. (3.3) pg 80: For each j greater than or equal to 1, layer Lj produced by BFS consists of all nodes at distance exactly j from s. There is a path from s to t if and only if t appears in some layer. A further property of breadth-first search is that it produces, in a very natural way, a tree T rooted at s on the set of nodes reachable from s; we call the tree T that is produced in this way a breadth-first search tree. (3.4) pg 81: Let T be a breadth-first search tree, let x and y be nodes in T belonging to layers Li and Lj respectively, and let (x,y) be an edge of G. Then i and j differ by at most 1.
- The set R of nodes discovered by the BFS algorithm is precisely the set of nodes reachable from the starting node s; this is the connected component of G containing s (3.5) pg 82
Depth-First Search - it explores a graph G by going as deeply as possible and only retreating when necessary; like BFS, DFS builds the connected component containing s
- (3.8) pg 86 For any two nodes s and t in a graph, their connected components are either identical or disjoint, clear in Figure 3.2, pg 79
3.3 Implementing Graph Traversal Using Queues and Stacks:
- two ways to represent graphs, by adjacency matrix and by adjacency list
- an adjacency matrix is an n by n matrix A where A[u, v] is equal to 1 if the graph contains the edge (u,v) and 0 otherwise; if the graph is undirected, the matrix A is symmetric, with A[u, v] = A[v, u] for all nodes u and v in V; this representation takes Theta(n^2) space
- in the adjacency list representation there is a record for each node v, containing a list of the nodes to which v has edges
- (3.10) pg 89 The adjacency matrix representation of a graph requires O(n^2) space while the adjacency list representation requires only O(n+m) space
Queues and Stacks
- a queue is a set from which we extract elements in first-in, first-out order
- a stack is a set from which we extract elements last-in, first-out
Implementing BFS
- the adjacency list data structure is ideal for implementing BFS; BFS algorithm pg 90, 91, with run time O(n+m) if the graph is given by the adjacency list representation
- implement the algorithm using a single list L that we maintain as a queue; the algorithm processes nodes in the order they were first discovered: each time a node is discovered, it is added to the end of the queue, and the algorithm always processes the edges out of the node that is currently first in the queue
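A minimal sketch of this queue-based BFS (names and the dict-of-lists adjacency format are my own; every node is assumed to appear as a key in adj):

```python
from collections import deque

# BFS from s; returns the distance (layer number) of each reachable node.
def bfs(adj, s):
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()          # node currently first in the queue
        for v in adj[u]:
            if v not in dist:        # discovered for the first time
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```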
Implementing DFS
- the difference between BFS and DFS is the way in which discovery and exploration are interleaved, for DFS we will do this in stack order
- (3.12) pg 93 The algorithm implements DFS, in the sense that it visits the nodes in exactly the same order as the recursive DFS procedure in the previous section
- the main step in the algorithm is to add and delete nodes to and from the stack S, which takes O(1) time
- the implementation of DFS algorithm runs in time O(n+m) if the graph is given by the adjacency list representation
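A minimal sketch of the stack-based DFS described here (names and adjacency format are mine; a node may be pushed more than once, which is why it is marked explored only when popped):

```python
# Iterative DFS from s; returns the nodes of s's component in visit order.
def dfs(adj, s):
    explored = set()
    order = []
    stack = [s]
    while stack:
        u = stack.pop()              # last-in, first-out
        if u not in explored:
            explored.add(u)
            order.append(u)
            for v in adj[u]:
                stack.append(v)      # a node can sit on the stack several times
    return order
```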
3.1, 3.2, 3.3
Are we going to use the recursive DFS that is given on pg 84? The book references it but the only one we went over in class was the non-recursive one. I just don't really get how that one works, so in case we need to know it maybe we can go over it in class.
So if you're running DFS on a graph and say you do it twice where you start at the same node, will you always get the same result? What if you choose one node after the first and a different one after the first. You won't get the same, but they are equivalent results right?
3.4 Testing Bipartiteness
- think of bipartiteness as coloring one node red, then all the nodes adjacent to it blue, then all the nodes adjacent to those red, and so on; if you ever run into a spot where you are trying to change a node from blue to red or red to blue, then the graph is not bipartite
- if a graph G contains an odd cycle, then the graph is not bipartite
- this is an application of BFS: you make the root node red, then all nodes in layer 1 blue, then the next layer red, and so on, and if you come to any problem where a node is already colored and you're trying to change the color, then the graph is not bipartite
- (3.15) Let G be a connected graph, and let L1, L2, … be the layers produced by BFS starting at node s. Then exactly one of the following must hold: (1) there is no edge of G joining two nodes of the same layer, in which case the graph is bipartite, or (2) there is an edge of G joining two nodes of the same layer, in which case the graph contains an odd cycle and so is not bipartite (pg 96)
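A minimal sketch of this two-coloring test for the component containing s (names and adjacency format are mine):

```python
from collections import deque

# BFS two-coloring; returns True if the component containing s is bipartite.
def is_bipartite_from(adj, s):
    color = {s: 0}                        # 0 = "red", 1 = "blue"
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in color:
                color[v] = 1 - color[u]   # the next layer gets the other color
                queue.append(v)
            elif color[v] == color[u]:    # edge inside one layer: odd cycle
                return False
    return True
```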
3.5 Connectivity in Directed Graphs
- in a directed graph, the edge (u,v) has a direction: it goes from u to v
- a graph is strongly connected if for every two nodes u and v, there is a path from u to v and a path from v to u
- if u and v are mutually reachable and v and w are mutually reachable then u and w are mutually reachable
- for any two nodes s and t in a directed graph, their strong components are either identical or disjoint
3.6 Directed Acyclic Graphs and Topological Ordering
- if an undirected graph has no cycles, then it has extremely simple structure: each of its connected components is a tree
- if a directed graph has no cycles, we call it a directed acyclic graph or DAG
- for a directed graph G, we say that a topological ordering of G is an ordering of its nodes as v1, v2, …, vn so that for every edge (vi, vj) we have i < j; in other words, all edges point “forward” in the ordering, and a topological ordering on tasks provides an order in which they can be safely performed
- (3.18) if G has a topological ordering, then G is a DAG (3.20) If G is a DAG then G has a topological ordering
- (3.19) in every DAG G, there is a node with no incoming edges
To compute a topological ordering of G:
  Find a node v with no incoming edges and order it first
  Delete v from G
  Recursively compute a topological ordering of G-{v} and append this order after v
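An iterative sketch of this idea in Python, tracking in-degrees instead of literally deleting nodes (names are mine; adj maps each node of the DAG to its list of successors, and every node is assumed to appear as a key):

```python
from collections import deque

# Topological ordering: repeatedly take a node with no remaining incoming edges.
def topological_order(adj):
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    ready = deque(u for u in adj if indegree[u] == 0)   # no incoming edges
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj[u]:          # "delete" u by decrementing its successors
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order
```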
4 preface
- an algorithm is greedy if it builds up a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion
- the greedy algorithm stays ahead - if one measures the greedy algorithms' progress in a step-by-step fashion, one sees that it does better than any other algorithm at each step; it then follows that it produces an optimal solution
- exchange argument - one considers any possible solution to the problem and gradually transforms it into the solution found by the greedy algorithm without hurting its quality
4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead
- accept first the request that finishes first, that is, the request i for which f(i) is as small as possible
Initially let R be the set of all requests, and let A be empty
While R is not yet empty
  Choose a request i in R that has the smallest finishing time
  Add request i to A
  Delete all requests from R that are not compatible with request i
EndWhile
Return the set A as the set of accepted requests
- A is a compatible set of requests
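A minimal sketch of this greedy rule: sorting once by finish time and scanning is equivalent to repeatedly choosing the request with the smallest finish time and deleting the incompatible ones (names are mine; requests are (start, finish) pairs, and a request starting exactly when the previous one finishes is treated as compatible):

```python
# Earliest-finish-time greedy for interval scheduling.
def interval_schedule(requests):
    accepted = []
    last_finish = float("-inf")
    for start, finish in sorted(requests, key=lambda r: r[1]):
        if start >= last_finish:      # compatible with everything accepted so far
            accepted.append((start, finish))
            last_finish = finish
    return accepted
```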
Scheduling All Intervals - there are many identical resources and many requests in the form of time intervals; we must schedule all the requests, assigning each one to a resource so that the intervals assigned to the same resource do not overlap, using as few resources as possible
- 4.4 In any instance of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals
Sort the intervals by their start times, breaking ties arbitrarily
Let I1, I2, …, In denote the intervals in this order
For j = 1, 2, 3, …, n
  For each interval Ii that precedes Ij in sorted order and overlaps it
    Exclude the label of Ii from consideration for Ij
  Endfor
  If there is any label from {1, 2, ..., d} that has not been excluded then
    Assign a nonexcluded label to Ij
  Else
    Leave Ij unlabeled
  Endif
Endfor
- If we use the greedy algorithm above, every interval will be assigned a label, and no two overlapping intervals will receive the same label
- (4.6) The greedy algorithm above schedules every interval on a resource using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed.
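This is not the label-exclusion pseudocode from page 124, but an equivalent greedy sketch that processes intervals in start-time order and hands each one either a freed-up resource or a brand-new label; it also ends up using a number of resources equal to the depth. Names are mine; intervals are (start, finish) pairs, and intervals that only touch at endpoints are treated as non-overlapping here.

```python
import heapq

# Interval partitioning: assign each interval a resource label.
def partition_intervals(intervals):
    assignment = []
    free_at = []                 # heap of (finish_time, label), one entry per resource
    next_label = 1
    for start, finish in sorted(intervals):              # process in start-time order
        if free_at and free_at[0][0] <= start:
            _, label = heapq.heappop(free_at)             # reuse a resource that is free
        else:
            label, next_label = next_label, next_label + 1  # open a new resource
        assignment.append(((start, finish), label))
        heapq.heappush(free_at, (finish, label))
    return assignment
```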
3.4 to 4.1 readings
- I was wondering if we could go over section 4.1 in class again, well really only the algorithm written on page 124; before I read this section I thought I understood the scheduling all intervals problem, but the algorithm given has just royally confused me.
- other than that, everything else seemed to read clearly
4.2 Scheduling to Minimize Lateness: An Exchange Argument
- the goal of the algorithm is to schedule all requests, using nonoverlapping intervals, so as to minimize the maximum lateness; this problem arises naturally when scheduling jobs that need to use a single machine
- for this we will sort the jobs in increasing order of their deadlines di and schedule them in this order; Earliest Deadline First produces optimal solutions (proof on pg 127, 128)
- this algorithm gives us a schedule with no idle time, and all schedules with no inversions and no idle time have the same maximum lateness (we say that a schedule A' has an inversion if a job i with deadline di is scheduled before another job j with earlier deadline dj < di; by definition the schedule A produced by our algorithm has no inversions)
- There is an optimal schedule that has no inversions and no idle time (formal proof on pg 129, 130)
- The schedule A produced by the greedy algorithm has optimal maximum lateness L
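A minimal sketch of Earliest Deadline First (names are mine; jobs are (length, deadline) pairs, all available at time 0 and scheduled back to back with no idle time):

```python
# Schedule jobs by increasing deadline and report the maximum lateness.
def min_max_lateness(jobs):
    schedule = []
    time = 0
    max_lateness = 0
    for length, deadline in sorted(jobs, key=lambda j: j[1]):  # by deadline
        start, time = time, time + length
        schedule.append((start, time, deadline))
        max_lateness = max(max_lateness, time - deadline)      # lateness of this job
    return schedule, max_lateness
```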
4.4 Shortest Paths in a Graph
Dijkstra's Algorithm
- the algorithm determines the length of the shortest path from s to each other node in the graph; the algorithm maintains a set S of vertices u for which we have determined a shortest-path distance d(u) from s; this is the “explored” part of the graph. Initially S = {s} and d(s) = 0. For each node v in V-S we determine the shortest path that can be constructed by traveling along a path through the explored part S to some u in S, followed by the single edge (u,v); we choose the node v in V-S for which this quantity d'(v) is minimized, add v to S, and define d(v) to be the value d'(v)
- Consider the set S at any point in the algorithm's execution for each u in S, the path Pu is a shortest s-u path
- two observations about the algorithm - the algorithm does not always find shortest paths if some of the edges can have negative lengths, the second is that the algorithm is even simpler than described above, it is a continuous version of the standard breadth-first search algorithm for traversing a graph
- Runtime - there are n-1 iterations of the While loop for a graph with n nodes, as each iteration adds a new node v to S; but selecting v requires considering each of the m edges, and computing all these minima can take O(m) time per iteration, giving us a runtime of O(mn)
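A sketch of Dijkstra's algorithm using a priority queue in place of the O(m) scan per iteration (names and the adjacency format are mine; adj maps each node to a list of (neighbor, length) pairs with nonnegative lengths):

```python
import heapq

# Shortest-path distances from s; d(u) is finalized when u is popped first.
def dijkstra(adj, s):
    d = {}                               # finalized shortest-path distances
    pq = [(0, s)]
    while pq:
        dist_u, u = heapq.heappop(pq)
        if u in d:                       # already explored via a shorter path
            continue
        d[u] = dist_u
        for v, length in adj[u]:
            if v not in d:
                heapq.heappush(pq, (dist_u + length, v))
    return d
```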
4.5 The Minimum Spanning Tree Problem
- we have a set of locations and we want to build a communication network on top of them, networks should be connected (ie there should be a path between every pair of nodes) but to do this as cheaply as possible
- Kruskal's algorithm - starts without any edges at all and builds a spanning tree by successively inserting edges from E in order of increasing cost, as we move through the edges in this order, we insert each edge e as long as it does not create a cycle when added to the edges we've already inserted, if e would result in a cycle, then we discard e and continue
- Prim's Algorithm - start with a root node s and try to greedily grow a tree from s outward; at each step, we simply add the node that can be attached as cheaply as possible to the partial tree we already have
- Reverse-Delete Algorithm - start with the full graph and begin deleting edges in order of decreasing cost; as we get to each edge e, we delete it as long as doing so would not actually disconnect the graph
- Kruskal's Algorithm, Prim's Algorithm, and the Reverse-Delete Algorithm each produce a minimum spanning tree of G
- Prim's Algorithm can be implemented much like Dijkstra's Algorithm, using a priority queue, in O(m log n) time for a graph with m edges
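A minimal sketch of Prim's algorithm grown from a root s with a priority queue of candidate attachment edges (names and adjacency format are mine; adj maps each node of a connected undirected graph to a list of (neighbor, cost) pairs):

```python
import heapq

# Grow a minimum spanning tree outward from s.
def prim(adj, s):
    in_tree = {s}
    tree_edges = []
    pq = [(cost, s, v) for v, cost in adj[s]]
    heapq.heapify(pq)
    while pq and len(in_tree) < len(adj):
        cost, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                      # attaching v again would create a cycle
        in_tree.add(v)
        tree_edges.append((u, v, cost))   # cheapest attachment to the partial tree
        for w, c in adj[v]:
            if w not in in_tree:
                heapq.heappush(pq, (c, v, w))
    return tree_edges
```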
4.6 Implementing Kruskal's Algorithm
- given a node u, the operation Find(u) will return the name of the set containing u; this operation can be used to test if two nodes u and v are in the same set by simply checking if Find(u) = Find(v); Union(A, B) takes sets A and B and merges them into a single set; these operations can be used to maintain connected components as edges are added
- Data Structure for Union-Find - each node v in S will be contained in a record with an associated pointer to the name of the set that contains v; in the MakeUnionFind(S) operation, we initialize a record for each element v in S with a pointer that points to itself, to indicate that v is in its own set
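Here is a minimal sketch of a union-find structure over nodes 0..n-1 and of how Kruskal's algorithm uses it to detect cycles (names are mine; the path-compression step inside find is a common refinement, not something the book's pointer description requires):

```python
# Union-find over nodes 0..n-1: parent pointers name the set containing each node.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))      # each node starts in its own set

    def find(self, u):                    # name (root) of the set containing u
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]   # path compression
            u = self.parent[u]
        return u

    def union(self, u, v):
        self.parent[self.find(u)] = self.find(v)

# Kruskal's algorithm: edges are (cost, u, v) tuples.
def kruskal(n, edges):
    uf = UnionFind(n)
    tree = []
    for cost, u, v in sorted(edges):          # consider edges in increasing cost
        if uf.find(u) != uf.find(v):          # adding (u, v) creates no cycle
            uf.union(u, v)
            tree.append((u, v, cost))
    return tree
```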
I am rather confused with the Union-Find Data Structure section, I don't really understand why it is used to implement Kruskal's algorithm, nor do I understand how it is implemented, I am thrown with the pointers. Can we go over this in class again?
4.7 Clustering
- clustering arises whenever one has a collection of objects that one is trying to classify or organize into coherent groups
- distance function on the objects - objects at larger distance from one another are less similar to each other
- given a distance function on objects, the clustering problem seeks to divide them into groups so that, intuitively, objects within the same group are “close” and objects in different groups are “far apart”
- we say that a k-clustering of U is a partition of U into k nonempty sets C1, C2, …, Ck
- we define the spacing of a k-clustering to be the minimum distance between any pair of points lying in different clusters
- given that we want points in different clusters to be far apart from one another, a natural goal is to seek the k-clustering with the maximum possible spacing
- The Algorithm - consider a graph on the vertex set U; the connected components will be the clusters; start by drawing an edge between the closest pair of points, then repeat, drawing an edge between the next closest pair of points, and continue until we obtain k connected components if we are seeking a k-clustering
- The components C1, C2, …, Ck formed by deleting the k-1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing
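A small sketch of this procedure, which is just Kruskal's algorithm stopped once k components remain (names are mine; points are indexed 0..n-1, dist(i, j) is the given distance function, and the inline parent-array union-find mirrors the sketch in 4.6):

```python
# k-clustering of maximum spacing via Kruskal stopped at k components.
def k_clustering(n, dist, k):
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    edges = sorted((dist(i, j), i, j) for i in range(n) for j in range(i + 1, n))
    components = n
    for d, i, j in edges:
        if components == k:
            break                         # stop before merging down to k-1 clusters
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```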
4.8 Huffman Codes and Data Compression
- Encoding Symbols Using Bits - since computers operate using 0s and 1s, one needs encoding schemes that take text written in richer alphabets and convert it into long strings of 0s and 1s.
- Best way to do this is with Optimal Prefix Codes - some letters are more frequent than others and we want to take advantage of the fact that more frequent letters should have shorter encodings
- Representing Prefix Codes Using Binary Trees - take a rooted tree T in which each node that is not a leaf has at most two children, and the number of leaves is equal to the size of the alphabet S; to label the binary tree T, for each x in S we follow the path from the root to the leaf labeled x: each time the path goes from a node to its left child we write down a 0, and each time the path goes from a node to its right child we write down a 1 (Figure on pg 168)
- There is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*
- Suppose that y* and z* are the two lowest-frequency letters in S; the above statement tells us that y* and z* are assigned to sibling leaves below a common parent, and in effect this common parent acts like a “meta-letter” whose frequency is the sum of the frequencies of y* and z*
Huffman's Algorithm which produces for a given alphabet a Huffman Code
To construct a prefix code for an alphabet S, with given frequencies:
If S has two letters then
  Encode one letter using 0 and the other letter using 1
Else
  Let y* and z* be the two lowest-frequency letters
  Form a new alphabet S' by deleting y* and z* and replacing them with a new letter w of frequency f_y* + f_z*
  Recursively construct a prefix code for S', with tree T'
  Define a prefix code for S as follows:
    Start with T'
    Take the leaf labeled w and add two children below it labeled y* and z*
Endif
- the Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code
- Implementation and Running Time: Using an implementation of priority queues via heaps we can make each insertion and extraction of the minimum run in time O(log k); hence each iteration – which performs just three of these operations takes time O(log k). Summing over all k iterations, we get a total running time of O(k*log k)
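A compact sketch of Huffman's algorithm using a heap-based priority queue, building the merged "meta-letters" bottom-up instead of literally recursing on a smaller alphabet (names are mine; freqs maps each letter to its frequency, and the returned dict maps each letter to its 0/1 codeword):

```python
import heapq
from itertools import count

# Huffman coding: repeatedly merge the two lowest-frequency "letters".
def huffman_code(freqs):
    tiebreak = count()             # keeps heap entries comparable on ties
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two lowest-frequency items
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    code = {}
    def assign(node, prefix):
        if isinstance(node, tuple):           # internal "meta-letter": recurse
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"        # single-letter alphabet edge case
    assign(heap[0][2], "")
    return code
```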
5.1 A First Recurrence: The Merge-sort Algorithm
- divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending only linear time for the initial division and final recombining
- consider an algorithm that fits the pattern above, and let T(n) denote its worst-case running time on input instances of size n
- for some constant c, T(n) <= 2T(n/2) + cn when n > 2, and T(2) <= c
- at the first level of recursion, we have a single problem of size n, which takes time at most cn plus the time spent in all subsequent recursive calls
- at the next level, we have two problems each of size n/2, each takes time at most cn/2 for a total of at most cn, again plus the time in subsequent
- at the third level, we have four problems each of size n/4, each taking time at most cn/4, for a total of at most cn
- Pattern: at level j of the recursion, the number of subproblems has doubled j times, ie 2^j, and each size is n/(2^j), and hence each takes time at most cn/(2^j), thus level j contributes a total of at most 2^j * (cn/(2^j)) = cn
- so summing the cn work over log n levels of recursion, we get a total running time of O(nlogn)
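A tiny numerical check of this recursion-tree argument (my own illustration): unroll T(n) = 2T(n/2) + cn with T(2) = c and compare it against c * n * log2(n) for powers of two.

```python
import math

# Unroll the mergesort recurrence (with c = 1) and compare with n * log2(n);
# the two columns grow together, matching the O(n log n) bound.
def T(n, c=1):
    if n <= 2:
        return c
    return 2 * T(n // 2, c) + c * n

for n in [4, 16, 256, 4096]:
    print(n, T(n), round(n * math.log2(n)))
```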
5.2 Further Recurrence Relations
- A more general class of algorithms is obtained by considering divide-and-conquer algorithms that create recursive calls on q subproblems of size n/2 each and then combine the results in O(n) time
For some constant c, T(n) <= qT(n/2) + cn when n > 2, and T(2) <= c
- At an arbitrary level j, we have q^j distinct instances, each of size n/(2^j); thus the total work performed at level j is q^j * (cn/(2^j)) = (q/2)^j * cn
- Summing over all levels of recursion: as before, there are log_2 n levels of recursion, and the total amount of work performed is the sum over all of them (see summation on page 216)
- For q > 2, this summation gives us a runtime of O(n^(log_2 q))
- Any function T with q = 1 is bounded by O(n)
- Divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending quadratic time for the initial division and final recombining:
For some constant c, T(n) <= 2T(n/2) + cn^2 when n > 2, and T(2) <= c
- At any arbitrary level j of the recursion, there are 2^j subproblems, each of size n/(2^j), and hence the total work at this level is bounded by cn^2/2^j
- Summing over all levels of recursion: see summation on 221 and gives us a runtime of O(n^2)
5.3 Counting Inversions
- collaborative filtering - in which they try to match your preferences with those of other people out on the internet
- based on a comparison of how you and they rate various things - it can recommend new things that these other people have liked
- the core issue in application is the problem of comparing two rankings
- compare two sets of rankings, take one as the original set and see how many pairs of the second set are out of order - ie different than that of the first
- a natural way to quantify this notion is by counting the number of inversions - we say that two indices i < j form an inversion if ai > aj, that is, if the two elements ai and aj are out of order
- counting the inversions by looking at every pair of numbers (ai, aj) and determining whether they constitute an inversion would take O(n^2) time; let's see if we can do this more quickly
- So let's divide the list into two pieces; we first count the number of inversions in each of these two halves separately, then we count the number of inversions where the two numbers belong to different halves; the trick is to do this second part in linear O(n) time. This step is closely related to the combining step of Mergesort; the difference here is that we want to do something extra: not only should a sorted list be produced, but we should also count the number of inverted pairs (see algorithm on page 224)
- (5.7) The Sort-and-count algorithm correctly sorts the input list and counts the number of inversions; it runs in O(nlogn) time for a list with n elements
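A minimal sketch of Sort-and-Count (names are mine): it returns the sorted list together with the inversion count, picking up the cross-inversions during the merge step.

```python
# Sort a list and count its inversions in O(n log n) time.
def sort_and_count(a):
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])
    right, inv_right = sort_and_count(a[mid:])
    merged, i, j, cross = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
            cross += len(left) - i     # right[j] is inverted with every remaining left element
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, inv_left + inv_right + cross
```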
5.4 Finding the Closest Pair of Points
- Given n points in the plane, find the pair that is closest together; it is immediately clear that there is an O(n^2) solution - compute the distance between each pair of points and take the minimum
- so let's first assume that no two points in the plane have the same x-coordinate or the same y-coordinate; we add a line L that splits the plane in half, which we do by sorting the points by their x-coordinates and finding the middle; then we divide and conquer as in Mergesort: we find the closest pair among the points in the left half of the plane and the closest pair among the points in the right half of the plane, and then we use this information to get the overall solution in linear time
- the last, combining phase of the algorithm is to find the distances between a point in the left half and a point in the right half of the plane, since such a distance may be less than any other distance on either side; how to find the minimum distance between a point on the left and a point on the right is described on pg 228, with the proof on pg 229
- the summary of the algorithm can be found on page 230 and the algorithm has a runtime of O(nlogn)
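A compact sketch of the divide-and-conquer (names are mine; it returns only the minimum distance, assumes at least two points with distinct x- and y-coordinates, and uses the standard trick of checking each strip point against only the next 7 points in y-order):

```python
import math

# Closest-pair distance; points is a list of at least two (x, y) tuples.
def closest_pair_distance(points):
    px = sorted(points)                          # sorted by x-coordinate
    py = sorted(points, key=lambda p: p[1])      # sorted by y-coordinate
    return _closest(px, py)

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def _closest(px, py):
    n = len(px)
    if n <= 3:                                   # brute force on tiny inputs
        return min(_dist(px[i], px[j]) for i in range(n) for j in range(i + 1, n))
    mid = n // 2
    midx = px[mid][0]                            # x-coordinate of the dividing line L
    left_set = set(px[:mid])
    left_py = [p for p in py if p in left_set]
    right_py = [p for p in py if p not in left_set]
    d = min(_closest(px[:mid], left_py), _closest(px[mid:], right_py))
    # points within d of L, in y order; any closer cross pair must live here
    strip = [p for p in py if abs(p[0] - midx) < d]
    for i in range(len(strip)):
        for j in range(i + 1, min(i + 8, len(strip))):
            d = min(d, _dist(strip[i], strip[j]))
    return d
```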
6.1 Weighted Interval Scheduling: A Recursive Procedure
- Let's suppose that the requests are sorted in order of nondecreasing finish time, we'll say a request i comes before a request j if i < j, we define p(j), for an interval j, to be the largest index i < j such that intervals i and j are disjoint, we define p(j) = 0 if no request i < j is disjoint from j
- For the optimal solution Oj on {1, 2, …, j}, our reasoning says either j is in Oj, in which case OPT(j) = vj + OPT(p(j)), or j is not in Oj, in which case OPT(j) = OPT(j-1); since these are the only two possible choices we can say that OPT(j) = max{vj + OPT(p(j)), OPT(j-1)}, and so j belongs to an optimal solution on the set {1, 2, …, j} if and only if vj + OPT(p(j)) >= OPT(j-1)
Compute-Opt(j):
If j = 0
  Return 0
Else
  Return max(vj + Compute-Opt(p(j)), Compute-Opt(j-1))
Endif
- Compute-Opt(j) correctly computes OPT(j) for each j = 1, 2, …, n
Memoizing the Recursion
- the code above runs in exponential time as written simply due to the spectacular redundancy in the number of times it issues each of these calls; to eliminate the redundancy, we could store the value of Compute-Opt in a globally accessible place the first time we compute it and then simply use this value in place of all future recursive calls; this technique is called memoization
- for this procedure we will make use of an array M[0…n]; M[j] will start with the value empty but will hold the value of Compute-Opt(j) as soon as it is first determined, to determine OPT(n), we call M-Compute-Opt(n)
M-Compute-Opt(j)
If j = 0
  Return 0
Else if M[j] is not empty
  Return M[j]
Else
  Define M[j] = max(vj + M-Compute-Opt(p(j)), M-Compute-Opt(j-1))
  Return M[j]
Endif
- The running time of M-Compute-Opt(n) is O(n) assuming the intervals are sorted by their finish times
- Computing a Solution in Addition to Its Value - we will trace back through the array M to find the set of intervals in an optimal solution
Find-Solution(j)
If j = 0
  Output nothing
Else if vj + M[p(j)] >= M[j-1]
  Output j together with the result of Find-Solution(p(j))
Else
  Output the result of Find-Solution(j-1)
Endif
- Given the array M of the optimal values of the sub-problems, Find-Solution returns an optimal solution in O(n) time
6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems
- we can directly compute the entries in M by an iterative algorithm, rather than using memoized recursion, we just start with M[0] = 0 and keep incrementing j
Iterative-Compute-Opt
M[0] = 0
For j = 1, 2, ..., n
  M[j] = max(vj + M[p(j)], M[j-1])
Endfor
- this algorithm writes OPT(j) in array entry M[j]; we can pass the filled-in array M to Find-Solution to get an optimal solution in addition to the value; finally, the running time of Iterative-Compute-Opt is clearly O(n), since it explicitly runs for n iterations and spends constant time on each
- this provides a second efficient algorithm to solve the Weighted Interval Scheduling Problem
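A sketch of the iterative solution plus the traceback (names are mine; jobs are (start, finish, value) tuples with start < finish, and p(j) is computed here by binary search over the sorted finish times, an implementation choice rather than something given in the book's pseudocode):

```python
from bisect import bisect_right

# Weighted interval scheduling: fill M iteratively, then trace back a solution.
def weighted_interval_scheduling(jobs):
    jobs = sorted(jobs, key=lambda job: job[1])        # sort by finish time
    finishes = [job[1] for job in jobs]
    n = len(jobs)
    # p[j] = number of jobs (1-indexed) that finish no later than job j starts
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        p[j] = bisect_right(finishes, jobs[j - 1][0])
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = max(jobs[j - 1][2] + M[p[j]], M[j - 1])
    # Find-Solution: trace back through M to recover an optimal set of jobs
    solution, j = [], n
    while j > 0:
        if jobs[j - 1][2] + M[p[j]] >= M[j - 1]:
            solution.append(jobs[j - 1])
            j = p[j]
        else:
            j -= 1
    return M[n], list(reversed(solution))
```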
A Basic Outline of Dynamic Programming
- iterative building up of subproblems: to set about developing an algorithm based on dynamic programming, one needs a collection of subproblems derived from the original problem that satisfies a few basic properties
- There are only a polynomial number of subproblems
- The solution to the original problem can be easily computed from the solutions to the subproblems
- there is a natural ordering on subproblems from smallest to largest, together with an easy-to-compute recurrence that allows one to determine the solution to a subproblem from the solutions to some number of smaller subproblems
6.3 Segmented Least Squares: Multi-way Choices
- multi-way choices - at each step, we have a polynomial number of possibilities to consider for the structure of the optimal solution
- with the data plotted on a two-dimensional set of axes, one tries to pass a line of best fit through the points: find the line y = ax + b with the minimum error, where a and b are defined on page 262
- given a set of points where each x value is unique, we must partition P into some number of segments where P is the set of points, each segment is a subset of P that represents a contiguous set of x-coordinates, for each segment S in our partition of P we compute the line minimizing the error with respect to the points in S
- the penalty of a partition is defined by the sum of the following terms
- * the number of segments into which we partition P, times a fixed given multiplier c > 0
- * for each segment, the error value of the optimal line through that segment
- our goal is to find a partition of minimum penalty
- let OPT(i) denote the optimum solution for the points p1,…,pi, and let eij denote the minimum error of any line with respect to pi, pi+1,…,pj
- If the last segment of the optimal partition is pi,…,pn, then the value of the optimal solution is OPT(n) = ein + C + OPT(i-1)
- For the subproblem on the points p1,…,pj, OPT(j) = min from i = 1 to j (eij + C + OPT(i-1)), and the segment pi,…,pj is used as an optimum solution for the subproblem if and only if the minimum is obtained using index i
Segmented-Least-Squares(n)
Array M[0...n]
Set M[0] = 0
For all pairs i <= j
  Compute the least squares error eij for the segment pi,...,pj
Endfor
For j = 1,...,n
  Use the recurrence OPT(j) = min from i = 1 to j (eij + C + OPT(i-1)) to compute M[j]
Endfor
Return M[n]
FindSegments(j)
If j = 0
  Output nothing
Else
  Find an i that minimizes eij + C + M[i-1]
  Output the segment {pi,...,pj} and the result of FindSegments(i-1)
Endif
- Analyzing the Runtime - first we need to compute the values of all the least-squares errors eij; there are O(n^2) pairs (i,j) for which this computation is needed, and for each pair we can use the formula given at the beginning of the section in O(n) time, giving O(n^3) for computing all the eij error values. Following this, the algorithm has n iterations, and for each j we need to determine the minimum in the recurrence, which takes O(n) time, giving a running time of O(n^2) once all the eij values have been determined
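A sketch of the whole pipeline, using the standard least-squares formulas for a and b that the section's page reference points to (names are mine; points are (x, y) pairs with distinct x values, C is the per-segment penalty, and only the optimal penalty M[n] is returned, with segment recovery following FindSegments as above):

```python
# Segmented least squares: precompute all segment errors, then fill M.
def segmented_least_squares(points, C):
    pts = sorted(points)
    n = len(pts)

    def segment_error(i, j):              # error of the best line through pts[i..j]
        seg = pts[i:j + 1]
        m = len(seg)
        sx = sum(x for x, _ in seg); sy = sum(y for _, y in seg)
        sxx = sum(x * x for x, _ in seg); sxy = sum(x * y for x, y in seg)
        denom = m * sxx - sx * sx
        if denom == 0:                     # a single point: a line fits it exactly
            return 0.0
        a = (m * sxy - sx * sy) / denom
        b = (sy - a * sx) / m
        return sum((y - a * x - b) ** 2 for x, y in seg)

    e = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            e[i][j] = segment_error(i, j)

    M = [0.0] * (n + 1)                    # M[j] = optimal penalty for the first j points
    for j in range(1, n + 1):
        M[j] = min(e[i][j - 1] + C + M[i] for i in range(j))
    return M[n]
```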
6.4 Subset Sums and Knapsacks: Adding a Variable
- in this scheduling problem, we have a single machine that can process jobs and a set of requests, we are only able to use this resource for the period between time 0 and time W, for some number W,
- for the knapsack problem, where each request i has both a value vi, and a weight wi, the goal in this problem is to select a subset of maximum total value, subject to the restriction that its total weight not exceed W
- We will use OPT(i, w) to denote the value of the optimal solution using a subset of the items {1,…,i} with maximum allowed weight w; that is, OPT(i, w) is the largest value of the sum of wj over j in S, taken over subsets S of {1, …, i} whose total weight is at most w
- If n is not in the optimum solution, then OPT(n, W) = OPT(n-1, W), since we can simply ignore item n
- If n is in the optimum solution, then OPT(n, W) = wn + OPT(n-1, W-wn), since we now seek to use the remaining capacity of W - wn in an optimal way across items 1, 2, …, n-1
- If w < wi then OPT(i, w) = OPT(i-1, w), otherwise OPT(i, w) = max(OPT(i-1, w), wi + OPT(i-1, w-wi))
Subset-Sum(n, W)
Array M[0...n, 0...W]
Initialize M[0, w] = 0 for each w = 0, 1, ..., W
For i = 1,...,n
  For w = 0,...,W
    Use the recurrence (if w < wi then OPT(i, w) = OPT(i-1, w); otherwise OPT(i, w) = max(OPT(i-1, w), wi + OPT(i-1, w-wi))) to compute M[i, w]
  Endfor
Endfor
Return M[n, W]
* for the algorithm we've just designed, we need a two-dimensional table, reflecting the two-dimensional array of subproblems that is being built up
* The Subset-Sum(n, W) algorithm correctly computes the optimal value of the problem, and runs in O(nW) time
* Given a table M of the optimal values of the subproblems, the optimal set S can be found in O(n) time
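A minimal sketch of the table-filling algorithm together with the O(n) traceback (names are mine; weights is the list w1,…,wn and W is the capacity):

```python
# Subset-Sum dynamic program: M[i][w] = OPT(i, w), the largest achievable
# weight <= w using the first i items.
def subset_sum(weights, W):
    n = len(weights)
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi = weights[i - 1]
        for w in range(W + 1):
            if w < wi:
                M[i][w] = M[i - 1][w]
            else:
                M[i][w] = max(M[i - 1][w], wi + M[i - 1][w - wi])
    # trace back through the table to recover a chosen set in O(n) time
    chosen, w = [], W
    for i in range(n, 0, -1):
        wi = weights[i - 1]
        if w >= wi and M[i][w] == wi + M[i - 1][w - wi]:
            chosen.append(i)            # item i (1-indexed) is in the solution
            w -= wi
    return M[n][W], chosen
```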
7.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm
- Consider a highway system in which the edges are highways and the nodes are interchanges, network models of this type have several ingredients: capacities on the edges indicating how much they can carry; source nodes in the graph which generate traffic; and sink nodes in the graph which absorb traffic as it arrives, and the traffic itself which is transmitted across the edges
- We'll refer to the traffic as flow - an abstract entity that is generated at source nodes, transmitted across edges, and absorbed at sink nodes
- a flow network is a directed graph G = (V, E) where each edge e has a capacity, which is a nonnegative number denoted ce; there is a single source s in V and a single sink t in V
- we will say that an s-t flow is a function f that maps each edge e to a nonnegative real number where the value f(e) represents the amount of flow carried by edge e
- For a flow f, each e in E satisfies 0 <= f(e) <= ce, and for each node v other than s and t, the sum of f(e) over all edges e into v equals the sum of f(e) over all edges e out of v
- Given a flow network, the goal is to arrange the traffic so as to make as efficient use as possible of the available capacity
7.2 Maximum Flows and Minimum Cuts in a Network
- we say that an s-t cut is a partition (A, B) of the vertex set V so that s is in A and t is in B
- the capacity of a cut (A, B), which we will denote c(A, B) is simply the sum of the capacities of all edges out of A
- cuts provide very natural upper bounds on the values of flows
- Let f be any s-t flow and (A, B) any s-t cut; then v(f) = f^out(A) - f^in(A). This statement says that by watching the amount of flow f sends across a cut, we can exactly measure the flow value: it is the total amount that leaves A, minus the amount that “swirls back” into A
- (7.8) Let f be any s-t flow, and (A, B) any s-t cut; then v(f) <= c(A, B)
- Let f bar denote the flow that is returned by the Ford-Fulkerson Algorithm, we want to show that f bar has the maximum possible value of any flow in G
- If f is an s-t flow s.t. there is no s-t path in the residual graph Gf, then there is an s-t cut (A*, B*) in G for which v(f) = c(A*, B*). Consequently f has the maximum value of any flow in G, and (A*, B*) has the minimum capacity of any s-t cut in G
- The flow f bar returned by the Ford-Fulkerson Algorithm is a maximum flow
- Given a flow f of maximum value, we can compute an s-t cut of minimum capacity in O(m) time
- The point is that f must be a maximum s-t flow, for if there were a flow f' of greater value, the value of f' would exceed the capacity of (A, B), and this would contradict (7.8). Similarly, it follows that (A, B) is a minimum cut - no other cut can have a smaller capacity, for if there were a cut (A', B') of smaller capacity, its capacity would be less than the value of f, and this again would contradict (7.8). This gives us the Max-Flow Min-Cut Theorem - in every flow network, the maximum value of an s-t flow is equal to the minimum capacity of an s-t cut
- If all capacities in the flow network are integers, then there is a maximum flow f for which every flow value f(e) is an integer
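A sketch of the Ford-Fulkerson idea: repeatedly find an s-t path with remaining capacity in the residual graph and augment along it (names are mine; here augmenting paths are found with BFS, which is one common choice, and capacity is a dict of dicts with integer capacities so the loop terminates):

```python
from collections import deque

# Maximum s-t flow value via augmenting paths in the residual graph.
def max_flow(capacity, s, t):
    # residual[u][v] holds remaining capacity, including reverse edges
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u in capacity:
        for v in capacity[u]:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow_value = 0
    while True:
        # BFS for an s-t path made of edges with positive residual capacity
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow_value            # no augmenting path left: flow is maximum
        # bottleneck capacity along the path that was found
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # augment: push bottleneck units forward, credit them to reverse edges
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow_value += bottleneck
```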
7.5 A First Application: The Bipartite Matching Problem
- Beginning with the graph G in an instance of Bipartite Matching, we construct a flow network G': first we direct all edges in G from X to Y, then we add a node s and an edge (s, x) from s to each node in X, we add a node t and an edge (y, t) from each node in Y to t, and finally we give each edge in G' a capacity of 1; we then compute a maximum s-t flow in this network G'
- The size of the maximum matching in G is equal to the value of the maximum flow in G'; and the edges in such a matching in G are the edges that carry flow from X to Y in G'
- The Ford-Fulkerson Algorithm can be used to find a maximum matching in a bipartite graph in O(mn) time, where m is the number of edges
- We can decide if the graph G has a perfect matching by checking if the maximum flow in the related graph G' has value at least n; by the Max-Flow Min-Cut Theorem, there will be an s-t cut of capacity less than n if the maximum-flow value in G' is less than n
- If there are nodes x1 and x2 in X that have only one incident edge each, and the other end of each edge is the same node y, then clearly the graph has no perfect matching
- Assume that the bipartite graph G = (V, E) has two sides X and Y such that the size of X is equal to the size of Y. Then the graph G either has a perfect matching or there is a subset A of X such that the size of Gamma(A) < the size of A, where Gamma(A), a subset of Y, denotes the set of all nodes that are adjacent to nodes in A. A perfect matching can be found in O(mn) time
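A sketch of bipartite matching by repeated augmenting-path searches, which is what Ford-Fulkerson amounts to on the unit-capacity network described above (names are mine; adj maps each node in X to its list of neighbors in Y):

```python
# Maximum bipartite matching via augmenting paths from each node of X.
def bipartite_matching(adj):
    matched_to = {}                      # matched_to[y] = x currently matched to y
    def try_augment(x, visited):
        for y in adj[x]:
            if y not in visited:
                visited.add(y)
                # y is free, or its current partner can be re-matched elsewhere
                if y not in matched_to or try_augment(matched_to[y], visited):
                    matched_to[y] = x
                    return True
        return False
    matching_size = 0
    for x in adj:
        if try_augment(x, set()):
            matching_size += 1
    return matching_size, [(x, y) for y, x in matched_to.items()]
```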
7.6 Disjoint Paths in Directed and Undirected Graphs
- We say that a set of paths is edge-disjoint if no two paths share an edge,
- Given a directed graph G = (V, E) with two distinguished nodes s, t in V, the Directed Edge-Disjoint Paths Problem is to find the maximum number of edge-disjoint s-t paths in G
- The Undirected Edge-Disjoint Paths Problem is to find the maximum number of edge-disjoint s-t paths in an undirected graph G
- Both directed and the undirected versions of the problem can be solved using flows, starting with the directed problem, given the graph G = (V, E) we define a flow network in which s and t are the source and sink, and with capacity of 1 on each edge, suppose there are k edge-disjoint s-t paths, we can make each of these carry one unit of flow. If there are k edge-disjoint paths in a directed graph G from s to t, then the value of the maximum s-t flow in G is at least k
- If f is a 0-1 valued flow of value v, then the set of edges with flow value f(e) = 1 contains a set of v edge-disjoint paths
- There are k edge-disjoint paths in a directed graph G from s to t if and only if the value of the maximum s-t flow in G is at least k
- The Ford-Fulkerson Algorithm can be used to find a maximum set of edge-disjoint s-t paths in a directed graph G in O(mn) time
- In every directed graph with nodes s and t, the maximum number of edge-disjoint s-t paths is equal to the minimum number of edges whose removal separates s from t
- There are k edge-disjoint paths in an undirected graph G from s to t if and only if the maximum value of an s-t flow in the directed version G' of G is at least k. The Ford-Fulkerson Algorithm can be used to find a maximum set of edge-disjoint s-t paths in an undirected graph G in O(mn) time
- In every undirected graph with nodes s and t, the maximum number of edge-disjoint s-t paths is equal to the minimum number of edges whose removal separates s from t.
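A tiny sketch tying this back to flow: with capacity 1 on every edge, the number of edge-disjoint s-t paths equals the max-flow value, so the max_flow sketch from sections 7.1/7.2 above can be reused directly (names are mine, and this assumes that sketch is in scope):

```python
# Count edge-disjoint s-t paths via a unit-capacity max flow; edges is a list
# of directed (u, v) pairs, and max_flow is the sketch from 7.1/7.2 above.
def num_edge_disjoint_paths(edges, s, t):
    capacity = {s: {}, t: {}}
    for u, v in edges:
        capacity.setdefault(u, {})[v] = 1     # capacity 1 on every edge
        capacity.setdefault(v, {})
    return max_flow(capacity, s, t)
```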