Graph Algorithms
Contents
0 Algorithm Cheat Sheet
| Algorithm | Applications | Intuition | Running Time | Space | Implementation |
|---|---|---|---|---|---|
| Breadth First Search | |||||
| Depth First Search | |||||
| Prim’s | |||||
| Kruskal | |||||
| Boruvka | |||||
| Dijkstra’s | |||||
| Topological Sort | |||||
| Connected Components | |||||
| Strongly Connected Components | |||||
| Vertex Colouring | |||||
| Edge Colouring |
1 Terminology
Minimum Number of Vertices to reach all nodes
In a DAG, this is equal to the set of nodes with indegree 0
2 Representations of Graphs
- We assume a graph contains n vertices and m edges
Adjacency List
- Using lists to store the neighbours of each vertex
- more space efficient for sparse graphs
- O(m + n) graph traversal***
- Suitable for most problems
Adjacency Matrix
- an matrix where element if is an edge of , and 0 if it is not
- O(1) for edge insertion, deletion, and querying (is ) in ?)
- Uses excessive space for sparse graphs (graphs with many vertices and few edges)
3 Breadth First Search
- The Key idea behind graph traversal is to mark each vertex when we first encounter it and keep track of what we have not yet completely explore.
- Each vertex will exist in one of three states
- Undiscovered — the vertex in its initial state
- Discovered — the vertex has been found, but we have not yet explored all its incident edges
- Finished — the vertex after we have visited all of its incident edges (also, processed.)
from collections import deque
def bfs(graph, start):
# our data structures for bfs:
q = deque()
visited = {} # keep track of state of nodes
# initialize the breadth-first-search
q.append(start)
visited[start] = 0 # 0 means its been discovered but not finished processing
while len(q) != 0:
node = q.popleft()
for neighbour in graph[node]:
if not neighbour in visited:
visited[neighbour] = 0
q.append(neighbour)
visited[node] = 1 # mark the node as finished after visiting its neighbours
Exploiting Traversal
- Depending on the use-case, we might process a vertex early (ie, when we first remove it from the queue) or we might process it late (after exploring its incident edges)
Finding Paths
- a BFS from starting node u will yield the shortest path to any node visited during the traversal (assuming unweighted graph)
BFS Tree
- Breadth First Search (or any graph traversal in general) produces a BFS Tree of nodes and edges discovered along the BFS
3.1 Applications of Breadth First Search
3.1.1 Connected Components
- connected component: for an undirected graph, it is a maximal set of vertices such that there is a path between every pair of vertices
- BFS can be used to find connected components. Any node visited during a bfs from start node u will be part of the same connected component.
- You can find all connected components of a graph by performing bfs from each unvisited node and recording the nodes visited in each traversal.
3.1.2 Two-Colouring Graphs
- the vertex colouring problem seeks to assign a label to each vertex of a graph so that no adjacent vertices share the same colour (label).
- a graph is bipartite if it can be coloured without conflicts using two colours.
- We can augment breadth-first search:
- so that whenever we discover a new vertex, we colour it the opposite of its parent.
- we check whether any non-tree edge (an edge that is not part of the BFS tree) links two vertices of the same color.
- this would mean the graph is not two-colourable
- if the process terminates without conflicts, we have constructed a proper two-colouring
def bfsBipartite(self, start, graph, visited):
q = deque()
q.append((start, 1))
visited[start] = 1
while q:
node, clr = q.popleft()
nxtClr = self.getClr(clr)
for nbr in graph[node]:
if nbr not in visited:
visited[nbr] = nxtClr
q.append((nbr, nxtClr))
elif visited[nbr] == clr: ## check non-tree edge is valid
return False
return True
def getClr(self, clr):
if clr == 1:
return 0
if clr == 0:
return 1
4 Depth First Search
-
Difference between BFS and DFS is in the order in which they explore the vertices.
-
DFS uses a stack to store discovered but not processed vertices, meaning newly discovered vertices will be processed before ones that have already been discovered
- This results in a traversal that goes to the deepest nodes in the tree before backtracking to finish processing the parent vertices
-
Depth-first search uses essentially the same idea as backtracking, which involve exhaustively searching all possibilities by advancing if it is possible, and backing up only when there is no remaining unexplored possibility for further advance. Both are most easily understood as recursive algorithms.
-
We can keep track of different timestamps during the DFS
- discover (entry) time: the time we first discover a node and add it to the dfs stack
- ** process time: the time we start to process a node, popping it from the dfs stack
- finish (exit) time: the time we finish processing a node, after completing the dfs on its children
-
we use the entry and exit times in several applications of DFS, ie-
- topological sorting
- biconnected/strongly connected components
DFS also partitions edges:
- in an undirected graph: tree edges and back edges
- in a directed graph: tree edges, back edges, forward edges and cross edges
DFS Tree
undefined- Tree edges: those which are part of the DFS Tree. Tree edges discover new vertices
- Back edges: those who’s other endpoint points to a node that has already been discovered (but is not yet finished). In other words, they point to an ancestor of the node which is currently being explored.
- cross edges:
- Forward Edges:
4.1 Applications of Depth First Search
- the correctness of a DFS-based algorithm depends on specifics of exactly when we process the edge and vertices
- we can process vertex v either before we have traversed any outgoing edges rom v (early)
- or we can process v after we have finished processing the outgoing edges (late)
- or both, depending on what we want to do
4.1.1 Cycle Detection
- Back edges are the key to finding a cycle in an undirected graph
4.1.2 Articulation Vertices
- articulation vertices: a vertex whose deletion disconnects a connected component of the graph
- connectivity of a graph: the smallest number of vertices whose deletion will disconnect the graph. The connectivity is 1 if the graph has an articulation vertex
- The root of the search tree is a special case, if it has only one child it functions as a leaf node, if the root has two or more children, its deletion disconnects them
- Finding articulation vertices requires keeping track of the extent to which back edges link parts of the DFS tree back to ancestor nodes.
- Root cut nodes
- Bridge cut nodes
- parent cut nodes
5 Depth First Search on Directed Graphs
-
The correct labelling of each edge can be determined from the state, discovery time, and parent of each vertex.
-
Tree edges: those which are part of the DFS Tree. Tree edges discover new vertices
-
Back edges: those who’s other endpoint points to a node that has already been discovered (but is not yet finished). In other words, they point to an ancestor of the node which is currently being explored.
-
cross edges:
-
Forward Edges:
5.1 Topological Sorting
- Topological sorting: orders vertices such that all directed edges go from left to right
- Each DAG has atleast one topological sort (there may be many topological sorting for a single DAG)
- Topological sorting gives us an ordering so we can process each vertex before any of its successor
- Labelling the vertices in the reverse order that they are marked finished defines a topological sort of a DAG
- we explore and finish all of a nodes successors prior to finishing the current node.
5.2 Strongly Connected Components
- a directed graph is strongly connected if there is a directed path between any two vertices The following 2 conditions must hold for any vertex V in G
- All vertices are reachable from v
- All vertices can reach v
- To test for (1), do a BFS or DFS traversal from v
- to test for (2), we do a BFS or DFS on the transpose of G
- this identifies all vertices that have paths to v
All directed graphs can be partitioned into strongly connected components using a double DFS approach
- Run a DFS on G and push vertices on the stack with ascending finish time like topological sort (we want the vertices which were discovered first aka the ones which were FINISHED last)
- Compute the transpose of G
- Run the second DFS on but explore nodes in the “topological sort” order computed in (1)
6 Minimum Spanning Trees
- a spanning tree of a connected graph G = (V,E) is a subset of edges from E forming a tree connecting all vertices of V
- the minimum spanning tree is a spanning tree whose sum of edge weights is as small as possible
6.1 Lazy Prim’s MST
- Maintain a min PQ that sorts edges based on min edge cost
- Start the algorithm on any node s
- Mark s as visited
- Iterate over all edges of s, adding it to the pq if its node has not already been visited
- While the PQ is not empty and a MST has not been formed, dequeue the next cheapest edge from the PQ
- if the dequeued edge has already been visited, continue (skip it and poll again)
- Else, mark the current node as visited and add the selected edge to the MST
- Repeat steps 2-5 on all unvisited nodes
- Break when the number of edges in the tree is one less than the number of nodes in the graph (this is the definition of a tree) O(m + nlogn)
## run prim
visited = set()
pq = []
start = 1
visited.add(start)
for nbr, cost in graph[start]:
heapq.heappush(pq, (cost, nbr))
treeSize = 0
treeCost = 0
while len(pq) != 0 and treeSize != n-1:
cost, node = heapq.heappop(pq)
if node in visited:
continue
treeCost += cost
treeSize += 1
visited.add(node)
for nbr, cost in graph[node]:
if nbr not in visited:
heapq.heappush(pq, (cost, nbr))
if treeSize == n-1:
return treeCost
else:
return -1
Eager Prim’s MST
- Keep track of the next best edge to add to the priority queue (to avoid polling stale edges)
- use and Indexed Priority Queue data structure
Kruskal’s Algorithm
-
more efficient on sparse graphs (graphs with more nodes than edges)
-
the algorithm
- consider the lightest edge
- and checks if its two endpoints lie in the same corrected componenet
- insert the edge in the tree only if the endpoints lie in different components (if they were in the same component it would create a cycle)
-
Time Complexity:::
- sorting m edges takes O(mlogm) time
- the while loop makes at most m iterations (iteration through at most all edges)
- simple implementation O(mn)
- faster implementation using union-find O(mlgm)
class Solution:
def minimumCost(self, n: int, connections: List[List[int]]) -> int:
return self.kruskal(n, connections)
def kruskal(self, n: int, connections: List[List[int]]) -> int:
### union-find
parent = {i:i for i in range(1,n+1)} # O(V)
def find(x):
if parent[x] == x:
return x
parent[x] = find(parent[x]) #path compression
return parent[x]
def union(x,y):
parent[find(y)] = find(x)
sortedEdges = sorted(connections, key=lambda x: x[2], reverse=True) # O(ElogE)
treesize = 0
treecost = 0
while sortedEdges and treesize != n-1: # runs at most E times (assuming E > V-1)
start, end, cost = sortedEdges.pop()
if find(start) != find(end): #logV
treecost += cost
treesize += 1
union(start,end) # log V
return treecost if treesize == n-1 else -1
# O (ElogE + ElogV), if E >= V then O(ElogE)
def prims(self, n: int, connections: List[List[int]]) -> int:
# build the graph O(E)
graph = defaultdict(list)
for a,b, cost in connections:
graph[a].append([b,cost])
graph[b].append([a,cost])
# initialize the bfs - pick any node to start
# then add all of its edges to the pq
start = 1
visited = set([start])
pq = []
for nbr,cost in graph[start]:
heapq.heappush(pq, (cost, nbr))
minCost = 0
treeSize = 0
# this loop runs N-1 (E) times in the worstcase
while pq and treeSize != n-1:
cost, node = heapq.heappop(pq) #logV
if node in visited:
continue
visited.add(node)
minCost += cost
treeSize += 1
#O(ElogV) in total because we skip if the nbr is already visited,
for nbr, nbrCost in graph[node]:
if nbr not in visited:
heapq.heappush(pq, (nbrCost, nbr))
return minCost if treeSize == n-1 else -1
# O( E + ElogV + ElogV )
# = O (ElogV + ElogV) = O(ElogV)
The Union Find Data Structure
- a set partition parcels out the elements of some universal set into a collection of disjoint subsets, where each element is in exactly on subset
- the connected components in a graph can be represented as a set partition
- We a need a data structure that efficiently supports the following operations:
- Same component(v1, v2).
- Merge components(C1, C2) — Merge the given pair of connected components into one component in response to the insertion of an edge between them
- the union-find data structure represents each subset as a “backwards” tree, with pointers from a node to its parent
- each node of the tree contains a set element and the name of the set is taken from the key at the root
- We also keep track of the number of elements in the sub tree rooted in each vertex v
- Find(i) — Find the root of the tree containing element i by walking up the parent pointers until there is nowhere to go. Return the label of the root
- Union(i,j) — Link the root of the tree containing i to the root of the tree containing j so find(i) == find(j)
- To minimize the time it takes to execute the worst possible sequence of union and finds, we must limit the height of our trees
- the most obvious way to control this is the choice of the two component roots that becomes the root of the merged component
- to minimize the tree height, its better to make the smaller tree the sub tree of the bigger one — the heights of the nodes in the root sub tree stay the same but the height of the nodes merged into this tree all increase by one.
- Time Complexity::: O(logn) for both union and finds. Why? bla bla bla
Variations on Minimum Spanning Trees
- Maximum Spanning Trees
- Minimum product spanning trees — recall , thus the minimum spanning tree on a graph whose edge weights are replaced with their logarithms gives the minimum product spanning tree on the original graph
- Minimum bottleneck spanning tree — same as a minimum spanning tree
7 Shortest Paths
Dijkstra’s Algorithm
- starting from a point s, it finds the shortest path from s to all other vertices in the graph, including you’re desired destination t
- the algorithm proceeds in a series of rounds, where each round establishes the shortest path from s to some new vertex
- x is the vertex that minimizes over all unfinished vertices
- this suggests a dynamic programming-like strategy
- Once we determine the shortest path to a node x, we check all the outgoing edges of x to see whether there is a shorter path from s through x to some unknown vertex.
- Its very similar to Prims
- The desirability is a function of the the new edge weight AND the distance from s to the tree vertex it is adjacent to
- for prim, desirability is only a function of the new edge weight
- The algorithm defines a shortest path spanning tree rooted in s
- Dijkstra works correctly only on graphs without negative-cost edges.
All-Pairs Shortest Path
- Computing the shortest path between all pairs of vertices in a given graph
- ex - find the center vertex in a graph — the one that minimizes the longest or average distance to all other nodes.
- We could solve all-pairs shortest path by calling Dijkstras algorithm from each of the n possible starting vertices
- Floyd’s algorithm provides a slick way to construct this distance matrix from the original weight matrix of the graph
- Floyd algorithm runs in time. While not asymptomatically better than n calls to Dijkstra’s algorithm, it runs better in practice.
Design Graphs, Not Algorithms
- Proper modeling is the key to making effective use of graph algorithms
Graph Problems - Polynomial Time
Connected Components
What if my graph is directed?
- a directed graph is strongly connected if there is a directed path between every pair of vertices
- a directed graph is weakly connected if it would be connected when ignoring the directed edges
- testing whether a directed graph is weakly connected can be done in linear time
- Simply turn all edges of G into undirected edges and use DFS/BFS connected components algorithm
- a directed graph’s strongly connected components can be computed in linear time using DFS-based algorithms
- Perform a DFS from an arbitrary vertex in G, labelling each vertex in order of its completion time
- Compute the transpose of G by reversing the direction of each edge in G
- Perform a DFS of G, in the descending order of completion time from step (1). If this search does not completely traverse G, start a new search continuing to respect the order.
- Each DFS tree created in step (3) defines a strongly connected component
What is the weakest point in my graph?
Is the graph a tree? How can I find a cycle if one exists?
- For undirected graphs, the analogous problem is tree identification
- If the graph is connected and has n-1 edges for n vertices, it is a tree
- DFS can be used to find cycles in both directed and undirected graphs – whenever we encounter a back edge in a DFS, we have found a cycle.
Topological Sorting
Input Description: a directed acyclic graph Problem Description: Find a linear ordering of the vertices of V such that for each edge , vertex is to the left of vertex , Discussion:
- Topological sort arises as a sub problem is most algorithms on DAGs
- It can be used to schedule jobs under precedence constraints
- Three important facts about topological sorting:
- Only DAGs can be topologically sorted
- Every DAG can be topologically sorted
- DAGs can usually be topologically sorted in many different ways, especially when there are few constraints.
Algorithms
- Perform a DFS and order the vertices in decreasing DFS finish time
Alternate Algorithm:
- Perform a DFS to identify the complete set of source vertices
- Place source vertices at the front of any schedule.
- Delete all outgoing edges of these source vertices to create a new set of source vertices
- repeat steps 2 and 3 until all vertices are accounted for
What if I need all the linear extensions, instead of just one of them?
- Beware, the number of linear extensions grows exponentially in the size of the DAG.
- The problem of counting the number of linear extensions is NP-hard
What if the graph is not acyclic?
- when a set of constrains contains inherent contradictions, the problem becomes removing the smallest set of i temps that eliminates all inconsistencies
- the set of offending jobs (vertices) or constraints (edges) whose d’élection leaves a DAG are known as a feedback vertex set
- Unfortunately both problems are NP-complete
- Because DFS based topological sorting algorithm gets stuck as soon as it identifies a vertex on a directed cycle, we can delete the offending edge or vertex and continue
- this heuristic will eventually leave a DAG but might delete more things than necessary
Minimum Spanning Tree
Input Description: A graph with weighted edges Problem Description: Find a subset of edges that define a tree of minimum weight on
Discussion:
- The MST of a graph defines the cheapest subset of edges that connects the graph in a single component
- MSTs are important for several reasons
- They can be computed quickly and easily, and create a sparse subgraph that reflects a lot about the original graph
- They provide a way to identify clusters in a set of points
- They can be used to give approximate solutions to hard problems
Algorithms
Kruskal‘s
Prim’s
Boruvka’s
Are all edges of the of graph identical weight?
- if the graph is unweighted then any spanning tree must be a minimum spanning tree.
- BFS or DFS can be used to find a spanning tree in minimum time
Should I use Prim’s or Kruskal’s algorithm?
- Prim with pairing heaps —
- Kruskal —
- Prim implementation using pairing heaps would be the fastest practical choice for both sparse and dense graphs
What if my input consists of points in the plane, instead of a graph?
- The complete distance graph can be constructed in time and then finding the MST of this complete graph
- For points in 2 dimensions, tt’s more efficient to solve the problem directly — construct the Delauney triangulation of the points which gives the graph with edges containing all the edges of the MST of the point set
- Running Kruskal on this sparse graph finishes the job in time
How can I find a spanning tree that avoid vertices of high degree?
- Another common goal of spanning tree problems is to minimize the max degree.
- This problem is identical to the Hamiltonian path problem
- Unfortunately this problem is NP complete
Shortest Path
Input Description: an edge-weighted graph G, with vertices s and t Problem Description: Find the shortest path from s to t in G
Is your graph weighted or unweighted?
- If the graph is unweighted then a simple BFS from the source vertex while find the shortest path to all other vertices in linear time
Does your graph have negative cost weights?
- Dijkstra’s algorithm does not work on graphs with negative cost weights, you will need to use the Bellman-Ford algorithm instead
Is your input a set of geometric obstacles?
- the most straightforward solution is to convert your problem into a graph of distances to feed to Dijkstra’s
Is your graph acyclic?
- Shortest paths in directed acyclic graphs can be found in linear time
- Perform a topological sort to order the vertices starting from the source s.
- The distance from s to itself is clearly 0, we now process the vertices from left to right
- Observe
- the minimal distance from s to j is equal to the minimum of (the distance from s to any node where i and j are incident + the weight of the edge
- we already know the shortest path for all vertices to the left of .
- Most dynamic programming problems can be modeled as shortest paths on specific DAGs.
- Thee same algorithm suffices to find the longest path by replacing min with max.
- longest paths are useful in applications like scheduling
Do you need the shortest path between all pairs of points?
- The naive approach is to run Dijkstra’s algorithm n times
- The Floyd-Warshall algorithm is a slick dynamic programming algorithm for all-pairs shortest path which is faster and easier to program than Dijkstra;s and works on negative ghost edges (but not cycles)
How do I find the shortest cycle in a graph?
- ???
Transitive Closure and Reduction
Input Description: a directed graph Problem Description: Transitive Closure:
- Construct a graph with edge iff there is a directed path from to in . Transitive Reduction:
- Construct a graph G’ with the smallest number of edges s.t. a directed path from to exists in iff there is a directed path from to in .
Discussion:
- Transitive closure can be thought of as establishing a datastructe to efficient solve reachability queries: Can I get to from ?
- After constructing a transitive closure matrix , such queries can be answered in cosntant time by reporting matrix entry
3 Basic Algorithms for computing Transitive Closure:
- perform a BFS or DFS from each vertex, and keep track of all vertices encountered,
- Doing n such traversals gives algorithm which degenerates to cubic time when the graph is dense
- Easy to implement and runs well on sparse graphs
- Warshall’s algorithm constructs the transitive closure in time
- the approach is identical to Floyd’s all-pairs shortest path algorithm
- Matrix multiplication can also be used to solve transitive closure
- All vertices in any strongly connected component must reach exactly the same subset of G
- We can reduce our problem to finding the transitive closure of a graph of strongly-connencted components
- SCC of G can be computed in linear time
Transitive Reduction:
- also known as minimal equivalent digraph, is the inverse problem of transitive closure
- goal: reduce the number of edges while maintaining identical reachability properties
- Quick and dirty linear time approach:
- Algorithm
- Identify the SCC of G
- Replace each SCC with a simple directed cycle
- Add edges to bridge the different components
- One catch:
- it might add edges to the transitive reduction that are not in .
- Algorithm
- A heuristic for edge preserving transitive reduction:
- Algorithm:
- Successively consider each edge
- Delete an edge iff its removal does not change the transitive closure
- Note:
- Implementing this efficiently means minimizing the time spent on reachability tests.
- Algorithm:
9 Graph Problems – NP-Hard
9.1 Travelling Salesman Problem
9.2 Hamiltonian Cycle
9.4 Vertex Colouring
Notes
- Vertex colouring arises in scheduling and clustering applications
- Register allocation in compiler optimization is the canonical application of colouring
- Each variable in a program fragment has a range of times during which it is live (its value must be kept intact) – after it’s initialized and before its final use
- Any two variables whose live ranges (life spans) overlap cannot be placed in the same register
- construct a graph where each vertex corresponds to a variable, with an edge between vertices whose live ranges overlap.
- If we colour the vertices of the graph, we ensure that no variables assigned the same colour will clash
- The smallest number of colours sufficient to vertex colour a graph is called its chromatic number.
- Computing the chromatic number of a graph is NP-complete
- Incremental methods are the heuristic of choice
- Vertices are coloured sequentially with the colours chosen in response to colours already assigned in the vertex’s neighborhood.
- The methods vary in how the next vertex is selected and how it is assigned a colour
- Experience suggests inserting vertices in non-increasing order of degree, because high-degree vertices have more colour constraints and are more likely to require an additional colour if inserted late
- Brelaz’ heuristic dynamically selects the uncoloured vertex of highest colour degree (adjacent to the most different colours), and colors it with the lowest-numbered unused colour
Several special cases of interest arise in practice:
Can I colour the graph using only 2 colours?
- An important special case is testing whether a graph is bipartite
- Fast simple algorithms exist for matching problems when restricted to bipartite graphs
- ...
Is the graph planar, or are all vertices of low degree?
Is this an edge colouring problem?
- Certain vertex color problems can be modeled as edge colouring – we seek to color the edges of a graph G such that edges get different colors if they share a vertex in common
- There is an efficient algorithm that always returns a near optimal edge-colouring.
Implementation
Greedy Algorithm - Take any available vertex and assign it the smallest available color DSATUR - Same as Greedy, but instead take the vertex with the highest degree of saturation (most colours assigned to neighbours), break ties with indegree
undefined9.5 Edge Colouring
Problem Description: What is the smallest set of colours needed to colour the edges of s.t. no two edges of the same colour share a common vertex?
- Edge colouring arises in scheduling applications
- typically associated with minimizing the number of non interfering rounds needed to complete a given set of tasks.
- We construct a graph whose vertices are people and whose edges represent the pairs of people who must meet
- An edge colouring of defines the schedule, with colour classes representing different time periods in the schedule, so that all meetings of the same colour happen simultaneously.
- the minimum number of colours required is called the edge-chromatic number.