

====== Eric's Journal ======

===== Week 3's Reading Summary =====

==== 3.1 Basic Definitions and Applications ====

- Graph: a set V of nodes and a set E of edges capturing pairwise relations among a set of objects. Edges can be symmetric (undirected) or asymmetric (directed); a directed edge (u, v) has u as the tail and v as the head.
- Graphs are abstract and hard to pin down by definition alone, so here are some examples: transportation networks (airports, train stations, taxi stations), communication networks (between computers, internet providers, networks and routing), information networks (the WWW), social networks, and dependency networks (classes, bank tellers, food ordering).
- Paths (connectivity): moving from one node to another through connected edges. A simple path visits no node twice; a cycle is a path that starts and ends in the same place.
- Connected means every pair of nodes u and v has a path between them; the distance between two nodes is the minimum number of edges on a path connecting them (i.e., the shortest path).
- Tree: an undirected graph that is connected and does not contain a cycle. The root sits at the "top" of the rest of the nodes; a parent precedes its child at edge distance 1, with ancestors and descendants at distances greater than one; a leaf has no descendants. The concept of hierarchy is captured in rooted trees.
- Fact: every n-node tree has exactly n-1 edges.
- Statement: let G be an undirected graph on n nodes. Any two of the following implies the third: G is connected; G does not contain a cycle; G has n-1 edges.
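To make the representation concrete, here is a minimal Python sketch (my own illustration; the book stays language-agnostic) that stores a small rooted tree as an adjacency list and checks the n-1 edge fact:

<code python>
# A small rooted tree as an adjacency list (node -> list of neighbors).
# Root "a" has children "b" and "c"; "b" has children "d" and "e".
tree = {
    "a": ["b", "c"],
    "b": ["a", "d", "e"],
    "c": ["a"],
    "d": ["b"],
    "e": ["b"],
}

n = len(tree)                                   # number of nodes
# Each undirected edge is listed twice (once per endpoint), so halve the sum.
m = sum(len(nbrs) for nbrs in tree.values()) // 2
assert m == n - 1                               # the n-1 edge fact
print(n, "nodes,", m, "edges")                  # 5 nodes, 4 edges
</code>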

==== 3.2 Graph Connectivity and Graph Traversal ====

- The problem of connectivity: is there a path from node s to node t in graph G?
- Breadth-First Search - explore outward from s in all possible directions, adding nodes one layer at a time (first finding all nodes joined by an edge to s, and so on)

Layer L1 is the set of neighbors of s, and L0 consists of s alone. Given layers L1 through Lj, layer Lj+1 consists of all nodes that do not belong to an earlier layer and that have an edge to a node in Lj.
Fact: for each layer j >= 1, the layer Lj produced by BFS consists of exactly the nodes at distance j from s; there is a path from s to t iff t appears in some layer.
Fact: let T be a BFS tree, let x and y be nodes in T belonging to layers Li and Lj, and let x-y be an edge of G. Then i and j differ by at most 1.

Proof: suppose by way of contradiction that i and j differ by more than 1, say j > i + 1. When the nodes of layer Li are processed, every node discovered from them is placed in layer Li+1 at the latest; hence if y is a neighbor of x, it should have been discovered by that point at the latest and should belong to layer Li+1 or earlier, a contradiction.
- Exploring a Connected Component - the set of nodes the BFS alg finds is exactly the set reachable from s

Alg: R will consist of the nodes to which s has a path
  Initially R = {s}
  While there is an edge (u, v) where u is in R but v is not in R:
    Add v to R
Prop: the set R produced is precisely the connected component of G containing s. Proof: we know every node v in R has a path from s. Conversely, consider a node w in the component of s, and suppose by way of contradiction that there is an s-w path P in G but w is not in R. Since s is in R and w is not, there must be a first node v on P that is not in R, and v is not s; so there is a node u just before v on P, and the edge (u, v) exists with u in R but v not in R, which contradicts the stopping rule of the algorithm.
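A minimal Python sketch of this exploration loop (my own illustration; this literal version rescans edges each round, so it is slower than the O(m+n) BFS/DFS implementations in section 3.3):

<code python>
def component(graph, s):
    """Grow R, the set of nodes to which s has a path, per the alg above.
    graph maps each node to the list of its neighbors."""
    R = {s}                          # initially R = {s}
    grew = True
    while grew:                      # stop when no edge (u, v) leaves R
        grew = False
        for u in list(R):            # look for u in R with a neighbor not in R
            for v in graph[u]:
                if v not in R:
                    R.add(v)         # add v to R
                    grew = True
    return R

g = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
print(component(g, 1))               # {1, 2, 3}: the component containing 1
</code>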

- Depth-First Search - follow the first edge until you hit a dead end, then backtrack; the "maze algorithm"

Alg: DFS(u):
  Mark u "Explored" and add u to R
  For each edge (u, v) incident to u:
    If v is not marked "Explored":
      Recursively invoke DFS(v)

> Prop: for a given recursive call DFS(u), all nodes marked Explored between the invocation and the end of that recursive call are descendants of u in the DFS tree T.
> Prop: for a DFS tree T, nodes x and y in T, and x-y an edge of G that is not an edge of T, either x or y is an ancestor of the other. Proof: suppose x-y is an edge of G but not of T, and x is reached first by DFS. The edge x-y is not added to T, so when it is examined during DFS(x), y must already be marked Explored. Since y was not Explored before DFS(x) was invoked, it must have been discovered between the invocation and the end of the recursive call DFS(x), so it must be a descendant of x.
- The Set of All Connected Components - for any two nodes s and t, their connected components are either identical or disjoint. Proof: if there is an s-t path in G, we claim the connected components of s and t are the same set: any node v in the component of s has a path to s and thus, via the s-t path, to t, so v is also in the component of t (and vice versa). If there is no s-t path, no node v can lie in the connected component of each, because then we could walk from s to v and then on to t, but no such path exists.

==== 3.3 Implementing Graph Traversal Using Queues and Stacks ====

- Representing Graphs - the two basic ways are the adjacency matrix and the adjacency list
> The goal is a running time of O(m+n), where m is the number of edges and n the number of nodes; note that since m >= n-1 for connected graphs, we call O(m+n) linear
> Adjacency matrix: an n x n matrix A where A[u, v] = 1 if the graph contains edge (u, v) and 0 otherwise. It is symmetric for undirected graphs. Two disadvantages are that it takes Theta(n^2) space no matter what, and that it takes O(n) time to check all edges incident to a node, no matter the node's degree
> Adjacency list (good for sparse graphs, with far fewer than n^2 edges): an array where the index is a node and the value is a list of that node's adjacent nodes. It takes just O(m+n) space
> The adjacency matrix gives constant-time edge lookup while the adjacency list takes time proportional to the degree, but the list is the more natural representation for exploring graphs: physically it is like learning the neighbors of u when you arrive at u, and once there, reading off each neighbor is constant time even though getting there might take longer
> Proof (that the lists take O(m+n) space): an edge (v, w) contributes twice to the sum of degrees in a graph, once in nv and once in nw. Since the sum is the total of the contributions of each edge, it is 2m
- Queues and Stacks - used when the order of processing matters in keeping records
> A queue is FIFO and a stack is LIFO; both can be represented as a doubly linked list with explicit First and Last pointers
- Implementing BFS: set only Discovered[s] = true, let L[0] consist only of s, set the layer counter i = 0, and let the current BFS tree T be empty. While L[i] is not empty: initialize an empty list L[i+1]; for each node u in L[i] and each edge (u, v) incident to u, if Discovered[v] = false then set it to true, add the edge (u, v) to the tree T, and add v to the list L[i+1]. Then increment the layer counter i by one.
> Prop: the above implementation of BFS runs in linear time, O(m+n), if the graph is given by the adjacency list representation: each node enters at most one layer, so the loop over layer nodes runs at most n times, and each edge (u, v) is considered a constant number of times (once from each endpoint), with the additional array Discovered set up outside the loops
- Implementing DFS: start with a stack S containing the single element s. While S is not empty: take a node u from S, and if Explored[u] = false, set it to true and push each node v incident to u onto S
> Prop: the above algorithm implements DFS in the sense that it visits the nodes in exactly the same order as the recursive DFS procedure in the previous section (except that each adjacency list is processed in reverse order)
> Prop: the above implementation of DFS runs in O(m+n) time if the graph is given by the adjacency list representation
- Finding the Set of All Connected Components: start with some node s and use BFS (or DFS) to find its component; then find a node v not visited by the search from s and run BFS or DFS from v (the component of v is disjoint from that of s by the earlier proposition); repeat until every node is visited. The total running time is still linear, O(m+n), since the searches never revisit nodes. (A Python sketch of these implementations follows below.)
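A minimal Python sketch of the BFS and DFS implementations above and of the all-components loop (my own illustration; the book gives only pseudocode). The graph is assumed to be a dict-of-lists adjacency representation:

<code python>
def bfs_layers(graph, s):
    """BFS from s, building layers L[0], L[1], ... as in 3.3. O(m+n)."""
    discovered = {s}                       # Discovered[s] = true
    layers = [[s]]                         # L[0] consists only of s
    while layers[-1]:                      # while L[i] is not empty
        next_layer = []                    # initialize an empty L[i+1]
        for u in layers[-1]:
            for v in graph[u]:             # each edge (u, v) incident to u
                if v not in discovered:
                    discovered.add(v)
                    next_layer.append(v)   # add v to L[i+1]
        layers.append(next_layer)
    return layers[:-1]                     # drop the final empty layer

def dfs_order(graph, s):
    """Stack-based DFS from s; same visit order as recursive DFS,
    up to the order of each adjacency list. O(m+n)."""
    explored = set()
    order = []
    stack = [s]                            # stack S with one element s
    while stack:
        u = stack.pop()
        if u not in explored:              # if Explored[u] = false
            explored.add(u)
            order.append(u)
            stack.extend(graph[u])         # push each v incident to u
    return order

def all_components(graph):
    """Repeatedly search from an unvisited node to find every component."""
    seen, components = set(), []
    for s in graph:
        if s not in seen:
            comp = [v for layer in bfs_layers(graph, s) for v in layer]
            seen.update(comp)
            components.append(comp)
    return components

g = {1: [2, 3], 2: [1], 3: [1], 4: [5], 5: [4]}
print(bfs_layers(g, 1))      # [[1], [2, 3]]
print(dfs_order(g, 1))       # [1, 3, 2]
print(all_components(g))     # [[1, 2, 3], [4, 5]]
</code>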
===== Week 2's Reading Summary =====

==== 2.3 Implementing the Stable Matching Alg Using Lists and Arrays ====

- It's better to use data structures to estimate the running time of algs than to break down every tiny step. We'll use lists and arrays for the GS alg.
- We need to represent the rankings, who is matched with whom, and who is free.

=== Arrays and Lists ===

- Keep a list of n elements as an array of length n, where A[i] is the ith element of the list. If the array is sorted, search takes O(log n) time.
- Use a linked list for the list of free men, since a linked list is more dynamic and flexible than an array (it supports insertion and deletion directly); also keep a pointer to the position of the last operation.

=== Implementing the Stable Matching Alg ===

- Have one array of all men and one of all women, plus two arrays holding the preference lists of both genders. The goal is for each iteration of the GS alg to run in constant time, so that the whole algorithm runs in time proportional to its upper bound of n^2 iterations and no worse.
- What has to be constant time:
> Identifying a free man: keep the linked list of free men and select the first man on it.
> Who m should propose to next: use an extra array Next that indicates, for each m, the position of the next woman on his list to propose to.
> Is w engaged: use an array Current where w is the index and the value is her current man.
> Does w prefer m or m' (the trickiest of the four): create an n x n array at the start of the alg, where Ranking[w, m] contains the rank of man m in the sorted order of w's preferences. Creation takes linear time for each w, and to decide whether w prefers m to m', just compare Ranking[w, m] with Ranking[w, m'].

==== 2.4 A Survey of Common Running Times ====

- The "natural space" of a problem: the set of all possible solutions. The natural space and the running time are the two bounds to consider when encountering a new problem.
- Linear time (running time is at most a constant times the size of the input):
> Computing the maximum: a single linear pass, where constant work per element adds up to O(n).
> Merging two sorted lists A and B: maintain two pointers initialized to the front elements; while both lists are nonempty, append the smaller of the two front elements to the new list and advance the pointer of the list it came from; once one list is empty, append all the elements of the other. Here's why it's O(n) and not O(n^2), although the latter is technically a correct bound: count the cost of each iteration against the element selected, because once an element is first selected ("charged") it goes to the new list and is never seen by the alg again; that gives at most 2n iterations, i.e., O(n). (A sketch appears at the end of this subsection.)
- O(n log n) time: the running time of any alg that splits its input into two even pieces, recurses on each, and combines the results in linear time; e.g., mergesort, or algs whose most expensive step is sorting.
- Quadratic time: e.g., an alg for finding the closest pair of points using a nested loop (for each point (x, y), for each other point (x', y'), compute the distance formula and update the minimum accordingly). As the class progresses, though, we'll find a way to do this task in O(n log n) and then... O(n). Say wha??
- Cubic time: three nested loops.
- O(n^k) time: e.g., an alg for checking whether a graph contains an independent set of size k (for each k-node subset S, check whether S is an independent set, and if yes, we're done; if no k-node independent set was found, declare failure).
- Beyond polynomial time:
> 2^n is the size of the search space for many problems, so if we encounter it, don't worry: most likely there is a better alg out there to improve the operation. The "dichotomy" of brute-force algs and their alternatives.
> n! is the number of ways to match n items with n other items (think of this as permutations from CS121).
- Sublinear time: possible when the input can be "queried" rather than read completely. An example is binary search, where we probe entries while narrowing an "active region." This happens when an alg does constant work to throw away a constant fraction of the input.
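A minimal Python sketch of the merging step described above (my own illustration; the book gives only pseudocode). Each element is charged exactly once, when it moves to the output:

<code python>
def merge(a, b):
    """Merge two sorted lists in O(n): each element is 'charged' once,
    when it is appended to the output, and is never examined again."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):   # while both lists are nonempty
        if a[i] <= b[j]:               # append the smaller front element
            out.append(a[i])
            i += 1                     # advance that list's pointer
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])                  # one list is empty by now; append
    out.extend(b[j:])                  # the remainder of the other
    return out

print(merge([1, 4, 9], [2, 3, 10]))    # [1, 2, 3, 4, 9, 10]
</code>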
==== 2.5 A More Complex Data Structure: Priority Queues ====

More complex data structures take nontrivial processing when invoked, but afterwards they lead to highly efficient algorithms that would not be possible with simpler data structures.

=== The Problem (maintaining the changing set S of free men in the Stable Matching alg) ===

- PQs support addition and deletion, and each element has a key that denotes its priority. They are used in real life whenever a schedule of processes that take turns must be maintained.
- Why aim for O(log n) time per operation? Proof that "a sequence of O(n) priority queue operations can be used to sort n numbers": set up the PQ H by inserting each number with its value as its key, then simply extract the minimum one by one. Thus n numbers times O(log n) per operation gives O(n log n), which is the best we can hope for at this point.

=== A Data Structure for Implementing a PQ ===

- Heap: a balanced binary tree with a root, where each node has up to two children. Heap order means each child's key is at least as large as its parent's. We can use pointers or an array to represent the heap, depending on whether we know the total number of elements that will be in it at any one time. In the array representation, the left child of the node at index i is at 2i and the right child at 2i+1.
- Implementing heap operations: StartHeap(n) takes O(n), Insert O(log n), FindMin O(1), Delete O(log n), and ExtractMin O(log n).
- Heapify-up(H, i): if i > 1, let j = parent(i); if the child's key is smaller than the parent's, swap them up, then recursively call Heapify-up(H, j). Each swap is constant time. We claim that after each swap, the array H is either a heap or almost a heap with only the key of H[j] too small.
- The height of the tree bounds the maximum number of recursive calls at O(log n). To insert, add the new element at the end of the array and heapify up. To delete the minimum, replace the root with the last element and heapify down (deleting from other positions works the same way, heapifying up or down as needed).
- Heapify-down(H, i): let n = length(H). If 2i > n, we're done. Else if 2i < n, set j to the index of the smaller of i's left and right children; else (2i = n), set j = 2i. If the child's key H[j] is smaller than the parent's H[i], swap them, then call Heapify-down(H, j). (A sketch of both procedures follows below.)
- Implementing PQs with heaps: exactly the heap operations mentioned above.
- A second category of operations finds elements by name rather than by position (does this mean by value rather than by key?). To do this, keep an additional array Position that stores the current position of each element/node in the heap. Then deletion is Delete(H, Position[v]), and to change a key, find where v is via Position, change the key, and then heapify up or down as appropriate.
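A Python sketch of Heapify-up and Heapify-down, mirroring the 2i / 2i+1 child indexing above (my own illustration; H[0] is left as a dummy slot so the root sits at index 1):

<code python>
def heapify_up(H, i):
    """Move H[i] up until its parent is no larger. O(log n)."""
    if i > 1:
        j = i // 2                       # parent of i
        if H[i] < H[j]:                  # child smaller than parent: swap up
            H[i], H[j] = H[j], H[i]
            heapify_up(H, j)

def heapify_down(H, i):
    """Move H[i] down until no child is smaller. O(log n)."""
    n = len(H) - 1                       # H[0] is a dummy; n real elements
    if 2 * i > n:
        return                           # i is a leaf: done
    j = 2 * i                            # left child (the only one if 2i = n)
    if 2 * i < n and H[2 * i + 1] < H[j]:
        j = 2 * i + 1                    # right child is the smaller one
    if H[j] < H[i]:                      # child smaller than parent: swap down
        H[i], H[j] = H[j], H[i]
        heapify_down(H, j)

H = [None, 2, 5, 3, 9, 6]                # a heap in array form, root at H[1]
H.append(1)                              # Insert: put 1 at the end...
heapify_up(H, len(H) - 1)                # ...and heapify up
print(H[1])                              # 1 (FindMin is constant time)
</code>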
===== Week 1's Reading Summary =====

==== Preface ====

Algorithms apply not just to CS. They aren't just a matter of solving a problem, but of solving a problem in the most efficient way (which is what the authors are hoping this book will teach me).
- Once you find the "mathematically clean core of a problem," then design the appropriate structure to solve that problem. "[Algs don't] just provide solutions to well-posed problems; they form the language that lets you cleanly express the underlying questions." So, more than a tool for problem solving, they allow elegant design of algs and, in turn, programs.

==== 1.1 Stable Matching Problem ====

Interest: 8/10

Gale and Shapley, two mathematicians/economists, wanted to see whether any process involving an applicant and an acceptor-or-rejector could be automated based on the preference orderings of both parties. It's tricky, though: if self-interest allowed the parties to drop their current partners, chaos would ensue. This alg is in use today in hospital residency matching.
- Make it the "bare-bones version," where we can assume things go right (e.g., there are equal numbers of men and women); the goal is to have no instabilities, where neither an employer E and applicant A, nor a man M and woman W, have motivation to ditch their current matches for each other. Pseudocode alg (a Python sketch appears at the end of this entry):

  Initially all m in M and all w in W are free
  While there is a free man m who hasn't proposed to every woman:
    Choose such a man m, and the highest-ranked woman w he hasn't proposed to yet
    If w is free, (m, w) become engaged
    Else if w prefers m to her current man m':
      (m, w) become engaged and m' becomes free
    Else w rejects m, and m remains free
  Return the set S of engaged pairs

The runtime can be n^2 if we set each w up with two arrays that are inverses of each other over the men she prefers; otherwise it is n^3 (as shown in class). Pages 7-12 show that the solution is logically sound. Also see class notes for the proofs. (Professor, I'm not sure if saying things like the last two sentences is OK. I'm hoping it is for this one, but if not, then in the second journal entry I will type them out here.)

==== 2.1 Computational Tractability ====

Interest: 7/10

It's about being efficient, not just solving something. The way they're going to measure efficiency is by running time first, and then in terms of memory space. Efficiency: "an alg is eff if, when implemented, it runs quickly on real input instances"; but beyond that general definition, good algs need to be platform-independent, instance-independent, and of predictive value with respect to increasing input sizes.
- In general, we start by looking at worst-case running time. The problem with average-case analysis is that "random inputs" cannot capture real, "average" inputs. Second definition of efficiency: ...if it achieves qualitatively better worst-case performance, at an analytical level, than brute-force search.
- Polynomial running times are characteristic of efficient algs (definition 3 of efficiency). Pg 34 has a table from O(n), which is a great running time, to n!, which is onion awful. In general, from best to worst: n, n log n, n^2, n^3, 1.5^n, 2^n, n! (pg 34).

Questions I have: just the notion of how to determine the O of an alg, because, say we need to do an operation on the xth object of some collection, sometimes we count getting to x as part of the runtime, while other times we just count O to be whatever we do once we're already there. How do I know when to count what?

==== 2.2 Asymptotic Order of Growth ====

Interest: 6/10

Analyzing algs is done by analyzing how fast the running time grows with each additional unit of input. The function f(n) gives the worst-case running time of some alg on inputs of size n. "Granularity" matters in order to be able to generalize improvements/solutions to inefficient algs; pseudocode is enough, no need to microscope an alg. O(f(n)) represents an asymptotic upper bound, Ω(f(n)) a lower bound, and Θ(f(n)) a tight bound (which is found when the upper and lower bounds are the same or very close to the same). Asymptotic growth rates have transitivity and sum properties.
- Tips on determining asymptotic bounds. Polynomials: ignore everything but the highest-order term. Logs: for every b > 1 and x > 0, log_b n = O(n^x). Exponentials are terrible and, unlike the previous two, cannot be bounded by any fixed polynomial.
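To close out, here is a Python sketch of the bare-bones GS alg, wired up with the constant-time structures from section 2.3 (the free-man list, Next, Current, and Ranking). It is my own illustration, assuming 0-indexed preference lists with best choices first:

<code python>
def gale_shapley(men_prefs, women_prefs):
    """men_prefs[m] / women_prefs[w]: preference lists, best first.
    Returns Current, mapping each woman to her final man. O(n^2)."""
    n = len(men_prefs)
    # Ranking[w][m] = rank of man m in w's list, built in O(n) per woman,
    # so "does w prefer m to m'?" becomes a constant-time comparison.
    ranking = [[0] * n for _ in range(n)]
    for w in range(n):
        for rank, m in enumerate(women_prefs[w]):
            ranking[w][m] = rank
    next_proposal = [0] * n          # Next[m]: position of m's next proposal
    current = [None] * n             # Current[w]: w's man, or None if free
    free_men = list(range(n))        # the list of free men
    while free_men:
        m = free_men[0]              # select the first free man
        w = men_prefs[m][next_proposal[m]]
        next_proposal[m] += 1
        if current[w] is None:       # w is free: (m, w) become engaged
            current[w] = m
            free_men.pop(0)
        elif ranking[w][m] < ranking[w][current[w]]:
            free_men.pop(0)          # w prefers m: her old man becomes free
            free_men.append(current[w])
            current[w] = m           # otherwise w rejects m; m stays free
    return current

# Man 0 prefers woman 0 then 1; woman 0 prefers man 1 then 0; etc.
print(gale_shapley([[0, 1], [0, 1]], [[1, 0], [0, 1]]))  # [1, 0]
</code>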