A graph G is simply a way of encoding pairwise relationships among a set of objects: it consists of a collection V of nodes and a collection E of edges, each of which “joins” two of the nodes. We thus represent an edge e ∈ E as a two-element subset of V: e = {u, v} for some u, v ∈ V, where we call u and v the ends of e.
Edges in a graph indicate a symmetric relationship between their ends.
1. Transportation networks
2. Communication networks
3. Information networks
4. Social networks
5. Dependency networks
We say that an undirected graph is a tree if it is connected and does not contain a cycle. Trees are the simplest kind of connected graph: deleting any edge from a tree will disconnect it. Rooted trees are fundamental objects in computer science, because they encode the notion of a hierarchy. Many Web sites are organized according to a tree-like structure, to facilitate navigation.
Theorem 3.1 Every n-node tree has exactly n - 1 edges.
Theorem 3.2 Let G be an undirected graph on n nodes. Any two of the following statements imply the third.
(i) G is connected.
(ii) G does not contain a cycle.
(iii) G has n - 1 edges.
This section was pretty simple and short, more of an introduction than anything else. I would give it a 9/10 for readability for this reason. It will be a nice resource that I can reference for the rest of the chapter.
In this section, we discuss node-to-node connectivity. Given a graph G = (V, E) and two particular nodes s and t, we'd like to find an efficient algorithm that answers the question “Is there any path from s to t in G?” This is called determining s-t connectivity and there are two natural algorithms for this problem: breadth-first search (BFS) and depth-first search (DFS).
Perhaps the simplest algorithm for determining s-t connectivity is breadth-first search (BFS), in which we explore outward from s in all possible directions, adding nodes one “layer” at a time. We start with s and include all nodes that are joined by an edge to s, creating the first layer of the search. We define the layers L1, L2, L3, …, Lj as follows: layer L1 consists of all neighbors of s, and for each j ≥ 1, layer Lj+1 consists of all nodes that do not belong to an earlier layer and that have an edge to some node in layer Lj.
Theorem 3.3 For each j ≥ 1, layer Lj produced by BFS consists of all nodes at distance exactly j from s. There is a path from s to t if and only if t appears in some layer.
Another property of breadth-first search is that it produces a tree T rooted at s on the set of nodes reachable from s. The root s forms layer L0 by itself, and each node of layer Lj appears at depth j in the tree.
Theorem 3.4 Let T be a breadth-first search tree, let x and y be nodes in T belonging to layers Li and Lj respectively, and let (x, y) be an edge of G. Then i and j differ by at most 1.
The BFS algorithm discovers the nodes that are reachable from the starting node s. We will call this set of nodes R, defined as the connected component of G containing s. Once we know the connected component containing s, we can simply check whether t belongs to it in order to answer the question of s-t connectivity. The BFS algorithm is just one way to produce this component. The algorithm below shows a different way to find R, the connected component.
R will consist of nodes to which s has a path
Initially R = {s}
While there is an edge (u, v) where u ∈ R and v ∉ R
Add v to R
Endwhile
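The exploration loop above can be sketched directly in Python. This is a minimal sketch, assuming the graph is given as a dict mapping each node to a list of its neighbors (my own convention, not from the text); it grows R by repeatedly adding any node v outside R that has an edge into R.

```python
def connected_component(graph, s):
    """Return the connected component of s.
    graph: dict mapping each node to a list of its neighbors (assumed form).
    """
    R = {s}                      # initially R = {s}
    grew = True
    while grew:                  # while some edge (u, v) leaves R
        grew = False
        for u in list(R):
            for v in graph[u]:
                if v not in R:
                    R.add(v)     # add v to R
                    grew = True
    return R
```

Note this naive loop rescans R on every pass; BFS and DFS achieve the same result more efficiently, as the later sections show.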
The key property of this algorithm is:
Theorem 3.5 The set R produced at the end of the algorithm is precisely the connected component of G containing s.
Another algorithm to find all the nodes reachable from s is called the depth-first search. The DFS explores a graph G by going as deeply as possible and only retreating when necessary. We can invoke a DFS from any starting point and maintain global knowledge of which nodes have already been explored. To apply the following algorithm to s-t connectivity, we declare all nodes initially to be not explored, and invoke DFS(s).
DFS(u):
Mark u as "Explored" and add u to R
For each edge (u, v) incident to u
If v is not marked "Explored" then
Recursively invoke DFS(v)
Endif
Endfor
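The recursive procedure above translates almost line for line into Python. A minimal sketch, again assuming a dict-of-neighbor-lists representation (my own convention):

```python
def dfs(graph, u, explored=None):
    """Recursive DFS from u; returns the set R of explored nodes.
    graph: dict mapping each node to a list of its neighbors (assumed form).
    """
    if explored is None:
        explored = set()
    explored.add(u)                    # mark u as "Explored" and add u to R
    for v in graph[u]:                 # each edge (u, v) incident to u
        if v not in explored:
            dfs(graph, v, explored)    # recursively invoke DFS(v)
    return explored
```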
Theorem 3.6 For a given recursive call DFS(u), all nodes that are marked “Explored” between the invocation and end of this recursive call are descendants of u in T.
Theorem 3.7 Let T be a depth-first search tree, let x and y be nodes in T, and let (x, y) be an edge of G that is not an edge of T. Then one of x or y is an ancestor of the other.
There is a connected component associated with each node in the graph. The relationship between these components is highly structured and expressed in the following claim:
Theorem 3.8 For any two nodes s and t in a graph, their connected components are either identical or disjoint.
I thought this section was a lot easier to understand after last week's homework, problem set 3. I think that I learn best by doing as opposed to reading or listening so the problem sets are always helpful when trying to understand the reading. I thought this section had a lot of extraneous information that made it harder to read, so I would give it a 6/10 for readability.
In this section we discuss how to use lists and arrays to represent graphs, and we discuss the trade-offs between the different representations and how to actually implement BFS and DFS.
There are two ways to represent graphs: by an adjacency matrix and by an adjacency list representation. A graph G = (V, E) has two input parameters, the number of nodes |V|, and the number of edges |E|. We will use n = |V| and m = |E| to denote these. The run times will also be in terms of these input parameters.
1. Consider a graph G = (V, E) with n nodes, and assume the set of nodes is V = {1, …, n}. The simplest way to represent a graph is by an **adjacency matrix**, which is an n × n matrix A where A[u, v] is equal to 1 if the graph contains the edge (u, v) and 0 otherwise. If the graph is undirected, the matrix A is symmetric.
2. In the **adjacency list** representation there is a record for each node v, containing a list of the nodes to which v has edges. To be precise, we have an array Adj, where Adj[v] is a record containing a list of all nodes adjacent to node v. For an undirected graph G = (V, E), each edge e = (v, w) ∈ E occurs on two adjacency lists: node w appears on the list for node v, and node v appears on the list for node w.
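Both representations are easy to build. A minimal sketch, assuming nodes 1..n and edges given as pairs (the function names are my own):

```python
def adjacency_matrix(n, edges):
    """n x n matrix (1-indexed; row/column 0 unused) with A[u][v] = 1
    iff the undirected edge (u, v) is present."""
    A = [[0] * (n + 1) for _ in range(n + 1)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1   # undirected: the matrix is symmetric
    return A

def adjacency_list(n, edges):
    """dict mapping each node v to the list of nodes adjacent to v."""
    Adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        Adj[u].append(v)  # v appears on u's list ...
        Adj[v].append(u)  # ... and u appears on v's list
    return Adj
```

Since each edge contributes two list entries, the total length of all the lists is 2m, matching Theorem 3.9.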
Theorem 3.9 Σv∈V nv = 2m, where nv denotes the degree of node v. That is, the total length of all the adjacency lists is twice the number of edges.
Theorem 3.10 The adjacency matrix representation of a graph requires O(n^2) space, while the adjacency list representation requires only O(m + n) space.
The bound O(m + n) is never worse than O(n^2) because m ≤ n^2. Now, comparing the ease of accessing information stored in these two representations: in an adjacency matrix we can check in O(1) time whether a particular edge (u, v) is present in the graph, but in the adjacency list representation this can take time proportional to the degree, O(nv). In view of this, the adjacency list is a natural representation for exploring graphs.
In both DFS and BFS, the order in which elements are considered is crucial. Both queues and stacks can be easily implemented via a doubly linked list. In a queue a new element is added to the end of the list as the last element, while in a stack a new element is placed in the first position on the list. In the BFS we will want to use a queue and in the DFS we will want to use a stack.
The adjacency list data structure is ideal for implementing breadth-first search. In this implementation it does not matter whether we manage each list L[i] as a queue or a stack, since the algorithm is allowed to consider the nodes in a layer Li in any order. This implementation in terms of a single queue will produce the same result as the BFS implementation shown below.
BFS (s) :
Set Discovered[s] = true and Discovered[v] = false for all other v
Initialize L[0] to consist of the single element s
Set the layer counter i = 0
Set the current BFS tree T = ∅
While L[i] is not empty
Initialize an empty list L[i + 1]
For each node u ∈ L[i]
Consider each edge (u, v) incident to u
If Discovered[v] = false then
Set Discovered[v] = true
Add edge (u, v) to the tree T
Add v to the list L[i + 1]
Endif
Endfor
Increment the layer counter i by one
Endwhile
Theorem 3.11 The above implementation of the BFS algorithm runs in time O(m + n) (i.e., linear in the input size), if the graph is given by the adjacency list representation.
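The layered pseudocode above can be sketched in Python as follows, assuming the adjacency list is a dict of neighbor lists (my own convention). Each node and each edge is touched a constant number of times, which is where the O(m + n) bound comes from.

```python
def bfs_layers(Adj, s):
    """Layered BFS from s.
    Adj: dict mapping each node to its adjacency list (assumed form).
    Returns the list of layers [L0, L1, ...] and the BFS tree edges."""
    discovered = {s}             # Discovered[s] = true, all others false
    layers = [[s]]               # L[0] consists of the single element s
    tree = []                    # edges (u, v) added to the BFS tree T
    while layers[-1]:            # while L[i] is not empty
        next_layer = []          # initialize an empty list L[i + 1]
        for u in layers[-1]:
            for v in Adj[u]:     # each edge (u, v) incident to u
                if v not in discovered:
                    discovered.add(v)
                    tree.append((u, v))
                    next_layer.append(v)
        layers.append(next_layer)
    return layers[:-1], tree     # drop the final empty layer
```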
The DFS is naturally defined as a recursive procedure, but it can also be viewed as almost identical to BFS, with the difference that it maintains the nodes to be processed in a stack, rather than in a queue. Below is the full implementation of the DFS:
DFS (s) :
Initialize S to be a stack with one element s
While S is not empty
Take a node u from S
If Explored[u] = false then
Set Explored[u] = true
For each edge (u, v) incident to u
Add v to the stack S
Endfor
Endif
Endwhile
If we want the algorithm to also find the DFS tree, we need to have each node u on the stack S maintain the node that “caused” u to get added to the stack. This can be easily done by using an array parent and setting parent[v] = u when we add node v to the stack due to edge (u, v).
Theorem 3.12 The above algorithm implements DFS, in the sense that it visits the nodes in exactly the same order as the recursive DFS procedure in the previous section (except that each adjacency list is processed in reverse order).
Theorem 3.13 The above implementation of the DFS algorithm runs in time O(m + n) (i.e., linear in the input size), if the graph is given by the adjacency list representation.
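The stack-based version, including the parent array described above, can be sketched like this (same assumed dict-of-lists representation; names are my own):

```python
def dfs_iterative(Adj, s):
    """Stack-based DFS from s.
    Returns the explored set and the parent map encoding the DFS tree."""
    explored = set()
    parent = {}
    stack = [s]                  # initialize S to be a stack with element s
    while stack:                 # while S is not empty
        u = stack.pop()          # take a node u from S (last in, first out)
        if u not in explored:
            explored.add(u)      # set Explored[u] = true
            for v in Adj[u]:     # each edge (u, v) incident to u
                if v not in explored:
                    parent[v] = u    # u "caused" v to get added to S
                    stack.append(v)
    return explored, parent
```

Note that parent[v] may be overwritten by later pushes; the last writer is the one popped first, so the final map is consistent with the DFS tree.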
Both BFS and DFS spend work only on edges and nodes in the connected component containing the starting node, s, meaning they never see any of the other nodes or edges. Thus the algorithm used to find all connected components of a graph only spends a constant amount of work on a given edge or node in the iteration when the connected component it belongs to is under consideration. Thus the overall running time of this algorithm is still O(m + n).
Again, after analyzing the algorithms in the problem set from last week I was better able to understand this section. I would give it a 7/10 for readability.
Definition: A Bipartite graph is one where the node set V can be partitioned into sets X and Y in such a way that every edge has one end in X and the other end in Y.
An example of a nonbipartite graph is a triangle. Clearly a triangle is not bipartite: we can color one node red and another one blue, but then the third node is adjacent to both, so neither color works for it. This leads to the following observation:
Theorem 3.14 If a graph G is bipartite, then it cannot contain an odd cycle.
Now suppose we encounter a graph G with no color annotation provided for us, and we'd like to determine for ourselves whether it is bipartite; that is, whether there exists a partition into red and blue nodes, as required. We want to design an algorithm that will tell us whether a graph is bipartite or not.
The basic structure of the algorithm:
1. Assume the graph G is connected, since otherwise we can first compute its connected components and analyze each of them separately.
2. Pick any node s ∈ V and color it red; there is no loss in doing this, since s must receive some color.
3. It follows that all the neighbors of s must be colored blue, so we do this.
4. It then follows that all the neighbors of these nodes must be colored red, their neighbors must be colored blue, and so on, until the whole graph is colored.
At this point, either we have a valid red/blue coloring of G, in which every edge has ends of opposite colors, or there is some edge with ends of the same color. Notice that the coloring procedure we have just described is essentially identical to the description of BFS: we move outward from s, coloring nodes as soon as we first encounter them. Thus another way to describe the coloring algorithm is as follows: we perform BFS, coloring s red, all of layer L1 blue, all of layer L2 red, and so on, coloring odd-numbered layers blue and even-numbered layers red. The total running time for the coloring algorithm is O(m + n), just as it is for BFS.
Theorem 3.15 Let G be a connected graph, and let L1, L2, … be the layers produced by BFS starting at node s. Then exactly one of the following two things must hold:
(i) There is no edge of G joining two nodes of the same layer. In this case G is a bipartite graph in which the nodes in even-numbered layers can be colored red, and the nodes in odd-numbered layers can be colored blue.
(ii) There is an edge of G joining two nodes of the same layer. In this case, G contains an odd-length cycle, and so it cannot be bipartite.
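The BFS coloring procedure can be sketched as follows, assuming a connected graph given as a dict of neighbor lists (my own convention). It colors layer by layer and then checks every edge, returning None if some edge joins two same-colored nodes.

```python
def two_color(Adj, s):
    """BFS 2-coloring from s: s red, layer L1 blue, L2 red, and so on.
    Returns the coloring if the component of s is bipartite, else None."""
    color = {s: "red"}
    current = [s]
    next_color = "blue"
    while current:
        nxt = []
        for u in current:
            for v in Adj[u]:
                if v not in color:       # color v when first encountered
                    color[v] = next_color
                    nxt.append(v)
        current = nxt
        next_color = "red" if next_color == "blue" else "blue"
    # check that every edge has ends of opposite colors
    for u in Adj:
        for v in Adj[u]:
            if u in color and color[u] == color.get(v):
                return None              # same-layer edge: odd cycle exists
    return color
```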
I'm still a bit unsure of what being bipartite means and why it's important. I think testing a graph for bipartiteness as a homework exercise would help me to understand better how the algorithm works. I also am not sure what the applications of knowing whether a graph is bipartite or not are, or if the point of introducing it in this chapter is another way of showing how BFS can be used. I would give this section a 6/10 for readability because I think it was very technical.
In a directed graph, the edge (u, v) has a direction: it goes from u to v.
We use a version of the adjacency list representation that we employed for undirected graphs to represent a directed graph. Now, instead of each node having a single list of neighbors, each node has two lists associated with it: one list consists of nodes to which it has edges, and a second list consists of nodes from which it has edges.
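A minimal sketch of this two-list representation, assuming directed edges given as (u, v) pairs on nodes 1..n (the names are my own):

```python
def directed_adjacency(n, edges):
    """Build the two lists per node described above:
    out_adj[u] = nodes to which u has edges,
    in_adj[v]  = nodes from which v has edges."""
    out_adj = {v: [] for v in range(1, n + 1)}
    in_adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        out_adj[u].append(v)   # edge goes from u ...
        in_adj[v].append(u)    # ... to v
    return out_adj, in_adj
```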
Breadth-first search and depth-first search are almost the same in directed graphs as they are in undirected graphs. For the BFS, we discover nodes layer by layer as they are reached in this outward search from s, and the nodes in layer j are precisely those for which the shortest path from s has exactly j edges. This algorithm performs at most constant work for each node and edge, resulting in a running time of O(m + n). In directed graphs, it is possible for a node s to have a path to a node t even though t has no path to s; and what directed BFS is computing is the set of all nodes t with the property that s has a path to t. Depth-first search also runs in linear time and computes the same set of nodes. It is again a recursive procedure that tries to explore as deeply as possible, in this case only following edges according to their inherent direction. Thus, when DFS is at a node u, it recursively launches a depth-first search, in order, for each node to which u has an edge.
A directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u. Two nodes u and v in a directed graph are mutually reachable if there is a path from u to v and also a path from v to u.
Theorem 3.16 If u and v are mutually reachable, and v and w are mutually reachable, then u and w are mutually reachable.
We define the strong component containing a node s in a directed graph to be the set of all nodes v such that s and v are mutually reachable.
Theorem 3.17 For any two nodes s and t in a directed graph, their strong components are either identical or disjoint.
The idea of this section was relatively simple and I think the explanation was too long winded for the simplicity of the idea. I am also wondering how connected graphs can be used differently than non-connected graphs and what their purpose is in computer science. This section was short and to the point so I would give it an 8/10 for readability.
If a directed graph has no cycles, we call it a directed acyclic graph, or a DAG for short.
DAGs can be used to encode precedence relations or dependencies in a natural way. To represent a precedence relation, we can introduce a node for each task, and a directed edge (i, j) whenever i must be done before j. For a directed graph G, we say that a topological ordering of G is an ordering of its nodes as v1, v2, …, vn so that for every edge (vi, vj), we have i < j. A topological ordering on tasks provides an order in which they can be safely performed.
Theorem 3.18 If G has a topological ordering, then G is a DAG.
The question we want to answer is: does every DAG have a topological ordering, and if so, how do we find one efficiently?
Theorem 3.19 In every DAG G, there is a node v with no incoming edges.
The existence of such a node v is all we need to produce a topological ordering of G by induction.
Theorem 3.20 If G is a DAG, then G has a topological ordering.
The inductive proof contains the following algorithm to compute a topological ordering of G:
To compute a topological ordering of G:
Find a node v with no incoming edges and order it first
Delete v from G
Recursively compute a topological ordering of G-{v}
and append this order after v
To bound the running time of this algorithm, we note that identifying a node v with no incoming edges, and deleting it from G, can be done in O(n) time. Since the algorithm runs for n iterations, the total running time is O(n^2). This is not a bad running time; and if G is very dense, containing Θ(n^2) edges, then it is linear in the size of the input. We may want something better when the number of edges m is much less than n^2. In such a case, a running time of O(m + n) could be a significant improvement over Θ(n^2), and indeed this is possible.
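One standard way to reach O(m + n) is to maintain the in-degree of every node, so that nodes with no incoming edges are found in constant time rather than by rescanning the graph. A sketch of that idea, assuming a DAG on nodes 1..n given as a dict of out-neighbor lists (my own convention; the details are not from the text):

```python
def topological_order(Adj, n):
    """Return a topological ordering of a DAG.
    Adj: dict mapping each node in 1..n to the nodes it points to.
    Each node and edge is touched O(1) times, for O(m + n) total."""
    indegree = {v: 0 for v in range(1, n + 1)}
    for u in Adj:
        for v in Adj[u]:
            indegree[v] += 1
    active = [v for v in indegree if indegree[v] == 0]  # no incoming edges
    order = []
    while active:
        v = active.pop()
        order.append(v)              # order v next, then "delete" it
        for w in Adj[v]:             # removing v lowers its successors' in-degrees
            indegree[w] -= 1
            if indegree[w] == 0:
                active.append(w)
    return order
```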
I thought this section was interesting because of its applications in the real world and in programming. I would give this section a 9/10 for readability because it was short and to the point.