bfs and dfs
BFS and DFS
• BFS and DFS in directed graphs
• BFS in undirected graphs
• An improved undirected BFS-algorithm
The Buffered Repository Tree (BRT)
• Stores key-value pairs (k,v)
• Supported operations:
  • INSERT(k,v) inserts a new pair (k,v) into T
  • EXTRACT(k) extracts all pairs with key k
• Complexity:
  • INSERT: O((1/B)log2(N/B)) I/Os amortized
  • EXTRACT: O(log2(N/B) + K/B) I/Os amortized (K = number of reported elements)
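The interface can be illustrated with a minimal in-memory sketch. It models only what INSERT and EXTRACT return, not the buffered (2,4)-tree or its I/O behaviour; the class name `RepositoryModel` and its method names are assumptions made for this illustration.

```python
from collections import defaultdict


class RepositoryModel:
    """In-memory stand-in for a BRT: same interface, no buffers, no I/O model."""

    def __init__(self):
        # key -> values inserted under that key and not yet extracted
        self._pairs = defaultdict(list)

    def insert(self, k, v):
        # INSERT(k,v): add a new pair (k,v) to T
        self._pairs[k].append(v)

    def extract(self, k):
        # EXTRACT(k): report all pairs with key k and remove them from T
        return [(k, v) for v in self._pairs.pop(k, [])]


T = RepositoryModel()
T.insert("a", 1)
T.insert("b", 2)
T.insert("a", 3)
first = T.extract("a")    # both pairs stored under key "a"
second = T.extract("a")   # nothing is left under key "a"
```

A second EXTRACT with the same key reports nothing, because extracted pairs leave the structure.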
The Buffered Repository Tree (BRT)
• (2,4)-tree
• Leaves store between B/4 and B elements
• Internal nodes have buffers of size B
• Root in main memory, rest on disk
INSERT(k,v)
• O(X/B) I/Os to empty buffer of size X ≥ B
• Amortized charge per element and level: O(1/B)
• Height of tree: O(log2(N/B))
• Insertion cost: O((1/B)log2(N/B)) amortized
EXTRACT(k)
• Number of traversed nodes: O(log2(N/B) + K/B)
• I/Os per node: O(1)
• Cost of operation: O(log2(N/B) + K/B)
• But careful with removal of extracted elements
Cost of Rebalancing
• O(N/B) leaf creations and deletions ⇒ O(N/B) node splits, fusions, merges
• Each such operation costs O(1) I/Os ⇒ O(N/B) I/Os for rebalancing
Theorem: The BRT supports INSERT and EXTRACT operations in O((1/B)log2(N/B)) and O(log2(N/B) + K/B) I/Os amortized, respectively.
Directed DFS
• Algorithm proceeds as internal memory algorithm:
  • Use stack to determine order in which vertices are visited
  • For current vertex v:
    • Find unvisited out-neighbor w
    • Push w on the stack
    • Continue search at w
    • If no unvisited out-neighbor exists:
      • Remove v from stack
      • Continue search at v’s parent
• Stack operations cost O(N/B) I/Os
• Problem: Finding an unvisited vertex
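The internal memory algorithm described above can be sketched as follows. This is an illustrative in-memory version; the function name `dfs_order` and the dictionary-of-lists graph format are assumptions. The membership test on `visited` is exactly the step that becomes a random access, and hence the problem, in external memory.

```python
def dfs_order(adj, root):
    """Return the vertices in the order DFS first visits them (explicit stack)."""
    visited = {root}
    order = [root]
    stack = [root]
    next_idx = {v: 0 for v in adj}   # position of the next unexplored out-edge of v
    while stack:
        v = stack[-1]
        # Find an unvisited out-neighbor w of the current vertex v.
        while next_idx[v] < len(adj[v]) and adj[v][next_idx[v]] in visited:
            next_idx[v] += 1
        if next_idx[v] < len(adj[v]):
            w = adj[v][next_idx[v]]
            visited.add(w)
            order.append(w)
            stack.append(w)          # continue search at w
        else:
            stack.pop()              # continue search at v's parent
    return order


adj = {0: [1, 2], 1: [3], 2: [3], 3: [0]}
order = dfs_order(adj, 0)
```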
Directed DFS
• Data structures:
  • BRT T
    • Stores directed edges (v,w) with key v
  • Priority queues P(v), one per vertex
    • Stores unexplored out-edges of v
• Invariant: every out-edge of v is in one of three states (color-coded in the figure):
  • Not in P(v)
  • In P(v) and in T
  • In P(v), but not in T
Directed DFS
• Finding next vertex after vertex v (cost per step; total over the whole search; K1 = number of extracted edges, K2 = number of in-edges of w):
  • EXTRACT(v): Retrieve red edges (those in P(v) and in T) from T
    O(log2(|E|/B) + K1/B); total O(|V| log2(|E|/B) + |E|/B)
  • Remove these edges from P(v) using DELETE
    O(sort(K1)); total O(|V| + sort(|E|))
  • Retrieve next edge (v,w) using DELETEMIN on P(v)
    O((1/B)logm(|E|/B)); total O(sort(|E|))
  • Insert in-edges of w into T
    O(1 + (K2/B)log2(|E|/B)); total O((|E|/B)log2(|E|/B))
  • Push w on the stack
    O(1/B) amortized; total O(|V|/B)
• Total: O((|V| + |E|/B)log2(|E|/B))
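The interplay of T and the queues P(v) can be traced with a small in-memory simulation. It is a sketch under stated assumptions: a dictionary stands in for the BRT, `heapq` heaps with lazy deletion stand in for the priority queues, the graph has no parallel edges, and the names `dfs_with_repository`, `out_adj`, and `in_adj` are hypothetical. No I/O costs are modelled.

```python
import heapq
from collections import defaultdict


def dfs_with_repository(out_adj, in_adj, root):
    T = defaultdict(list)                        # stand-in for BRT: key v -> edges (v, w)
    P = {v: list(ws) for v, ws in out_adj.items()}
    for heap in P.values():
        heapq.heapify(heap)                      # P(v): unexplored out-edges of v
    deleted = defaultdict(set)                   # lazy DELETE marks, one set per P(v)

    def visit(w):
        # Insert the in-edges (x, w) of w into T, each under key x.
        for x in in_adj[w]:
            T[x].append((x, w))

    order, stack = [root], [root]
    visit(root)
    while stack:
        v = stack[-1]
        # EXTRACT(v): edges (v, w) whose head w has been visited in the meantime.
        for _, w in T.pop(v, []):
            deleted[v].add(w)                    # DELETE the edge from P(v)
        # DELETEMIN on P(v), skipping entries that were deleted.
        w = None
        while P[v]:
            candidate = heapq.heappop(P[v])
            if candidate not in deleted[v]:
                w = candidate
                break
        if w is None:
            stack.pop()                          # no unvisited out-neighbor is left
        else:
            order.append(w)
            stack.append(w)                      # push w and continue the search there
            visit(w)
    return order


out_adj = {0: [1, 2], 1: [3], 2: [3], 3: [0]}
in_adj = {0: [3], 1: [0], 2: [0], 3: [1, 2]}
brt_order = dfs_with_repository(out_adj, in_adj, 0)
```

The simulation never asks whether a vertex is visited: every edge left in P(v) after the EXTRACT and DELETE steps leads to an unvisited vertex.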
Directed DFS + BFS
• BFS can be solved using same algorithm
• Only modification: Use queue (FIFO) instead of stack
Theorem: Depth-first search and breadth-first search in a directed graph G = (V,E) can be solved in O((|V|+|E|/B)log2(|E|/B)) I/Os.
Exercise: Convince yourself that the priority queues P(v) are not necessary in the case of BFS.
Undirected BFS
Partition graph into levels L(0), L(1), ... around source r.
Observation: For v ∈ L(i), all its neighbors are in L(i – 1) ∪ L(i) ∪ L(i + 1).
Build BFS-tree level by level:
• Initially, L(0) = {r}
• Given levels L(i – 1) and L(i):
  • Let X(i) = set of all neighbors of vertices in L(i)
  • Let L(i + 1) = X(i) \ (L(i – 1) ∪ L(i))
Undirected BFS
Constructing L(i + 1):
• Retrieve adjacency lists of vertices in L(i) ⇒ X(i)
• Sort X(i)
• Scan L(i – 1), L(i), and X(i) to
  • Remove duplicates from X(i)
  • Compute X(i) \ (L(i – 1) ∪ L(i))
Complexity: O(|L(i)| + sort(|L(i – 1)| + |X(i)|)) I/Os per level
⇒ O(|V| + sort(|E|)) I/Os in total
Theorem: Breadth-first search in an undirected graph G = (V,E) can be solved in O(|V| + sort(|E|)) I/Os.
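The level-by-level construction can be sketched in memory as follows. The sort and the single synchronized scan mirror the steps on the slide; the function names `next_level` and `bfs_levels` and the graph format are assumptions, and no I/O is modelled.

```python
def next_level(adj, prev_level, cur_level):
    """Compute L(i+1) as X(i) minus (L(i-1) union L(i)) by sorting and one scan."""
    # Retrieve the adjacency lists of the vertices in L(i) and sort X(i).
    x = sorted(w for v in cur_level for w in adj[v])
    # L(i-1) and L(i) are disjoint, so concatenating and sorting gives a sorted set.
    exclude = sorted(prev_level + cur_level)
    result, j = [], 0
    for w in x:
        if result and result[-1] == w:
            continue                  # duplicate in X(i)
        while j < len(exclude) and exclude[j] < w:
            j += 1
        if j < len(exclude) and exclude[j] == w:
            continue                  # w belongs to L(i-1) or L(i)
        result.append(w)
    return result


def bfs_levels(adj, r):
    levels, prev = [[r]], []
    while levels[-1]:
        nxt = next_level(adj, prev, levels[-1])
        prev = levels[-1]
        levels.append(nxt)
    return levels[:-1]                # drop the final empty level


adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
levels = bfs_levels(adj, 0)
```

The sketch relies on the graph being undirected: only then is it enough to subtract the two most recent levels.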
A Faster BFS-Algorithm
Problem with simple BFS-algorithm:
• Random accesses to retrieve adjacency lists
Idea for a faster algorithm:
• Load more than one adjacency list at a time
  • Reduces number of random accesses
  • Causes edges to be involved in more than one iteration of the algorithm
⇒ Trade-off
A Faster BFS-Algorithm (Randomized)
• Let 0 < μ < 1 be a parameter (specified later)
• Two phases:
  • Build μ|V| disjoint clusters of diameter O(1/μ)
  • Perform modified version of SIMPLEBFS
• Clusters C1,...,Cq formed using BFS from randomly chosen set V’ = {r1,...,rq} of masters
• Vertex is chosen as a master with probability μ (coin flip)
Observation: E[|V’|] = μ|V|. That is, the expected number of clusters is μ|V|.
Forming Clusters (Randomized)
• Apply SIMPLEBFS to form clusters
  • L(0) = V’
  • v ∈ Ci if v is descendant of ri
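A minimal in-memory sketch of the cluster formation follows. It flips one coin per vertex and then grows all clusters simultaneously by a BFS started from the whole master set. Two things are assumptions of this sketch rather than statements from the slides: the source vertex is always made a master (so that a connected graph is covered completely), and the names `form_clusters`, `mu`, and `seed` are hypothetical.

```python
import random
from collections import deque


def form_clusters(adj, mu, source, seed=0):
    rng = random.Random(seed)
    # Coin flip per vertex; the source is forced to be a master (assumption).
    masters = [v for v in adj if v == source or rng.random() < mu]
    cluster = {r: i for i, r in enumerate(masters)}   # vertex -> cluster index
    queue = deque(masters)                            # L(0) = V'
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in cluster:
                cluster[w] = cluster[v]               # w becomes a descendant of v's master
                queue.append(w)
    return masters, cluster


# Path graph on 20 vertices.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)}
masters, cluster = form_clusters(adj, 0.5, source=0)
```

With `mu` close to 1 almost every vertex becomes its own cluster; with `mu` close to 0 the clusters are few and large, which is the trade-off the parameter controls.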
Forming Clusters (Randomized)
Lemma: The expected diameter of a cluster is 2/μ.
[Figure: a path x, v1, v2, ..., vk in the graph]
• E[k] ≤ 1/μ
Corollary: The clusters are formed in expected O((1/μ)sort(|E|)) I/Os.
Forming Clusters (Randomized)
• Form files F1,...,Fq, one per cluster
  • Fi = concatenation of adjacency lists of vertices in Ci
• Augment every edge (v,w) ∈ Fi with the start position pj of file Fj s.t. w ∈ Cj:
  • Edge = triple (v,w,pj)
The BFS-Phase
• Maintain a sorted pool H of edges s.t. adjacency lists of vertices in L(i) are contained in H
• Scan L(i) and H to find vertices in L(i) whose adjacency lists are not in H
  O((|L(i)| + |H|)/B)
• Form list of start positions of files containing these adjacency lists and remove duplicates
  O(sort(|L(i)|))
• Retrieve the K files, sort them, and merge resulting list H’ with H
  O(K + sort(|H’|) + |H|/B)
• Scan L(i) and H to build X(i)
  O((|L(i)| + |H|)/B)
• Construct L(i + 1) from L(i – 1), L(i), and X(i) as before
  O(sort(|L(i)| + |L(i–1)| + |X(i)|))
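The role of the pool H can be seen in a small in-memory simulation. It is only a sketch: a dictionary replaces the sorted pool, a dictionary of per-cluster "files" replaces the files on disk, and the counter `loads` merely counts how many cluster files are fetched. The names `clustered_bfs`, `files`, and `loads` are assumptions of this illustration.

```python
def clustered_bfs(adj, cluster, r):
    # One "file" per cluster: the adjacency lists of all vertices in that cluster.
    files = {}
    for v, c in cluster.items():
        files.setdefault(c, {})[v] = adj[v]
    H = {}                 # pool: vertex -> adjacency list
    loads = 0              # number of cluster files retrieved over all levels
    levels, prev, cur = [], [], [r]
    while cur:
        levels.append(cur)
        # Find vertices of L(i) whose lists are not in H; fetch each needed file once.
        missing = {cluster[v] for v in cur if v not in H}
        for c in sorted(missing):
            H.update(files.pop(c))
            loads += 1
        # Build X(i) from H and remove the used adjacency lists from the pool.
        x = {w for v in cur for w in H.pop(v)}
        nxt = sorted(x - set(prev) - set(cur))
        prev, cur = cur, nxt
    return levels, loads


adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
cluster = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}
levels, loads = clustered_bfs(adj, cluster, 0)
```

Each cluster file is fetched at most once, so the number of random accesses is bounded by the number of clusters instead of the number of vertices; the price is that lists wait in H across several levels.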
The BFS-Phase
I/O-complexity of single step:
• O(K + |H|/B + sort(|H’| + |L(i – 1)| + |L(i)| + |X(i)|))
Expected I/O-complexity: O(μ|V| + |E|/(μB) + sort(|E|))
• Choose 1/μ = max{1, √(B|V|/|E|)}
Theorem: BFS in an undirected graph G = (V,E) can be solved in O((1 + √(B|V|/|E|)) · scan(|E|) + sort(|E|)) = O(√(|V||E|/B) + sort(|E|)) I/Os.
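The choice of the parameter can be checked numerically. The sketch below balances the two parameter-dependent terms of the expected bound, μ|V| and |E|/(μB), and caps μ at 1; the function name `choose_mu` and the sample values of |V|, |E|, and B are assumptions chosen only for illustration.

```python
import math


def choose_mu(num_vertices, num_edges, block_size):
    """mu = min(1, sqrt(|E| / (B |V|))), which balances mu|V| against |E|/(mu B)."""
    return min(1.0, math.sqrt(num_edges / (block_size * num_vertices)))


V, E, B = 10**6, 4 * 10**6, 10**4
mu = choose_mu(V, E, B)
term_clusters = mu * V        # mu |V|: random accesses to cluster files
term_pool = E / (mu * B)      # |E| / (mu B): repeated scans of the pool H
```

For these sample values both terms come out equal, and each equals √(|V||E|/B), the first term of the final bound.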
Single Source Shortest Paths
• The tournament tree
• SSSP in undirected graphs
• SSSP in planar graphs
Single Source Shortest Paths
Need:
• I/O-efficient priority queue
• I/O-efficient method to update only unvisited vertices
The Tournament Tree
= I/O-efficient priority queue
• Supports:
  • INSERT(x,p)
  • DELETE(x)
  • DELETEMIN
  • DECREASEKEY(x,p)
• All operations take O((1/B)log2(N/B)) I/Os amortized
Note: N = size of the universe ≠ # elements in the tree
The Tournament Tree
• Static binary tree over all elements in the universe
• Elements map to leaves, M elements per leaf
• Internal nodes store between M/2 and M elements
• Internal nodes have signal buffers of size M
• Root in main memory, rest on disk
The Tournament Tree
• Elements stored at each node are sorted by priority
• Elements at node v have smaller priority than elements at v’s descendants
• Convention: x ∈ T if and only if p(x) is finite
The Tournament Tree: Deletions
• Operation DELETE(x) ⇒ signal DELETE(x)
• DELETE(x) ≡ UPDATE(x,∞)
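The Tournament Tree: Insertions and Updates
• Operations INSERT(x,p) and DECREASEKEY(x,p) ⇒ signal UPDATE(x,p)
• Signal reaches node v (w = child of v on the path to x’s leaf)
• If x is stored at v with current priority p’:
  • If p < p’: Update
  • If p ≥ p’: Do nothing
• If x is not stored at v:
  • All elements at v have priority < p: Forward signal to w
  • At least one element at v has priority ≥ p: Insert x at v and send DELETE(x) to w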
The Tournament Tree: Handling Overflow
• Let y be element with highest priority py
• Send signal PUSH(y,py) to appropriate child of v
The Tournament Tree: Keeping the Nodes Filled
• O(M/B) I/Os to move M/2 elements one level up the tree
The Tournament Tree: Signal Propagation
• Scan v’s signal buffer, partition its signals X into sets Xu and Xw
• Load u into memory, apply signals in Xu to u, insert signals into u’s signal buffer
• Do the same for w
• O((|X| + M)/B) = O(|X|/B) I/Os
The Tournament Tree: Analysis
• Elements travel up the tree
  • Cost: O(1/B) I/Os amortized per element and level
  • O((K/B)log2(N/B)) I/Os for K operations
• Signals travel down the tree
  • Cost: O(1/B) I/Os amortized per signal and level
  • O(K) signals for K operations
  • O((K/B)log2(N/B)) I/Os
Theorem: The tournament tree supports INSERT, DELETE, DELETEMIN, and DECREASEKEY operations in O((1/B)log2(N/B)) I/Os amortized.
Single Source Shortest Paths
Modified Dijkstra:
• Retrieve next vertex v from priority queue Q using DELETEMIN
• Retrieve v’s adjacency list
• Update distances of all of v’s neighbors, except predecessor u on the path from s to v
• Repeat
• O(|V| + (|E|/B)log2(|V|/B)) I/Os using tournament tree
Single Source Shortest Paths
Problem: Spurious updates, i.e., updates of neighbors that have already been visited.
Observation: If v performs a spurious update of u, u has tried to update v before.
• Record this update attempt of u on v by inserting u into another priority queue Q’
• Priority: d(s,u) + w({u,v})
Single Source Shortest Paths
Second modification:
• Retrieve next vertex using two DELETEMIN’s, one on Q, one on Q’
• Let (x,px) be the element retrieved from Q, let (y,py) be the element retrieved from Q’
• If px ≤ py: re-insert (y,py) into Q’ and proceed as normal
• If px > py: re-insert (x,px) into Q and perform a DELETE(y) on Q
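The two-queue scheme can be traced with a small in-memory simulation. It is a sketch under stated assumptions: a dictionary stands in for the tournament tree Q, a `heapq` heap stands in for Q’, the minima are peeked rather than deleted and re-inserted, edge weights are positive, and all vertices have different distances from the source. The name `sssp_two_queues` and the graph format are hypothetical. Note that the code keeps no set of visited vertices.

```python
import heapq
from math import inf


def sssp_two_queues(adj, s):
    """adj[v] = list of (neighbor, weight); returns shortest-path distances from s."""
    Q = {s: 0}     # stand-in for the tournament tree: vertex -> tentative distance
    Qp = []        # Q': heap of (priority, vertex) recording update attempts
    dist = {}
    while Q:
        x = min(Q, key=Q.get)               # minimum of Q (peeked, not yet removed)
        px = Q[x]
        if Qp and Qp[0][0] < px:            # px > py: (x,px) stays in Q, DELETE(y) on Q
            _, y = heapq.heappop(Qp)
            Q.pop(y, None)
            continue
        # px <= py: (y,py) stays in Q', proceed as normal with x.
        del Q[x]
        dist[x] = px
        for u, w in adj[x]:
            if px + w < Q.get(u, inf):      # UPDATE(u, px + w), possibly spurious
                Q[u] = px + w
            heapq.heappush(Qp, (px + w, x))  # record the update attempt of x on u
    return dist


adj = {
    0: [(1, 1), (2, 4)],
    1: [(0, 1), (2, 2), (3, 7)],
    2: [(0, 4), (1, 2), (3, 3)],
    3: [(1, 7), (2, 3)],
}
dist = sssp_two_queues(adj, 0)
```

In this example vertex 1 re-inserts the source into Q with priority 2; the entry for the source in Q’ has priority 1 and removes it before it can be retrieved.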
Single Source Shortest Paths
Lemma: A spurious update is removed from Q before the targeted vertex can be retrieved using DELETEMIN.
• Event A: Spurious update happens (“time”: d(s,v))
• Event B: Vertex u is deleted by retrieval of u from Q’ (“time”: d(s,u) + w(e))
• Event C: Vertex u is retrieved from Q using DELETEMIN operation (“time”: d(s,v) + w(e))
Single Source Shortest Paths
• Assume that all vertices have different distance from source s ⇒ d(u) < d(v)
• d(v) ≤ d(u) + w(e) < d(v) + w(e)
• Sequence of events: A → B → C
Theorem: The single source shortest path problem on an undirected graph G = (V,E) can be solved in O(|V| + (|E|/B)log2(|V|/B)) I/Os.
Planar Graphs
• Shortest paths in planar graphs
• Planar separators
• Planar DFS
Shortest Paths in Planar Graphs
[Figure: graph G with source s and separator vertices v, and the graph GR]
Shortest Paths in Planar Graphs
Observation: For every separator vertex v, the distances from s to v in G and GR are the same.
The distances from s to all separator vertices can be computed in GR.
Shortest Paths in Planar Graphs
Observation: For every vertex v in Gi, dist(s,v) = min{dist(s,x) + dist(x,v) : x is a boundary vertex of Gi}.
Can compute dist(s,v) in the following graph:
[Figure: graph containing s and Gi]
Shortest Paths in Planar Graphs
Three main steps:
• Solve all-pairs shortest paths in subgraphs Gi
• Compute shortest paths from s to separator vertices in GR
• Compute shortest paths from s to all remaining vertices
Shortest Paths in Planar Graphs
Regular h-partition:
• O(N/h) subgraphs G1,...,Gr
• Each Gi has size at most h
• Each Gi has boundary size at most √h
• Total number of separator vertices: O(N/√h)
• Number of boundary sets is O(N/h)
• Assume the given partition is a regular B²-partition
  ⇒ Steps 1 and 3 take O(scan(N)) I/Os
  ⇒ Graph GR has O(N/B) vertices and O(N) edges
Shortest Paths in Planar Graphs
Data structures:
• List L storing tentative distances of all vertices
• Priority queue Q storing vertices with their tentative distances as priorities
One step:
• Retrieve next vertex v using DELETEMIN
• Get distances of v’s neighbors from L
• Update their distances in Q using DELETE and INSERT
⇒ O(N + sort(N)) I/Os
Shortest Paths in Planar Graphs
• One I/O per boundary set
• Each boundary set is touched O(B) times:
  • Once per vertex on the boundary of the region
• O(N/B²) boundary sets ⇒ O(N/B) I/Os
Planar Separator
Goal: Compute a separator S of size O(N/√h) whose removal partitions G into subgraphs of size at most h.
Basic idea:
• Compute hierarchy of log(DB) graphs of geometrically decreasing size using graph contraction
• Compute a separator of the smallest graph
• Undo the contractions and maintain the separator while doing this
Assumption: M = Ω(h log² B)
Planar Separator
Properties:
• All Gi are planar
• |Gi+1| ≤ |Gi|/2
• Every vertex in Gi+1 represents only a constant number of vertices in Gi
• Every vertex in Gi+1 represents at most 2^(i+2) vertices in G0
• r = log2(DB) graphs G0,…,Gr ⇒ |Gr| = O(N/(DB))
Planar Separator
• Compute separator Sr of Gr:
  • Sr partitions Gr into connected components of size at most h log²(DB)
  • Takes O(|Gr|) = O(N/B) I/Os [AD96]
Planar Separator
• Compute Si from Si+1:
  • Let S’i be the set of vertices in Gi represented by the vertices in Si+1
  • Connected components of Gi – S’i have size at most c·h log²(DB)
  • Partition every connected component of size more than h log²(DB) into components of size at most h log²(DB) ⇒ separator Si
• Takes O(sort(|Gi|)) I/Os:
  • Connected components ⇒ O(sort(|Gi|))
  • Partitioning happens in internal memory
• Total: O(sort(N)) I/Os
Planar Separator
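• Separator S0 partitions G0 into connected components of size at most h log²(DB)
• Size of S0 (using |Gi| ≤ N/2^i and r = log2(DB)):
  |S0| ≤ Σ_{i=0}^{r} O(2^i) · (number of separator vertices added in Gi)
       = Σ_{i=0}^{r} O(2^i) · O(|Gi|/(√h log(DB)))
       = Σ_{i=0}^{r} O(N/(√h log(DB)))
       = O(N/√h)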
Planar Separator
• Compute a superset S of S0 so that no connected component of G – S has size more than h:
  • Partition every connected component of G – S0 separately in internal memory
  • Total number of extra separator vertices is O(N/√h)
  • Extra cost: O(sort(N)) I/Os
Theorem: A separator S of size O(N/√h) whose removal partitions G into subgraphs of size at most h can be obtained in O(sort(N)) I/Os, provided that M = Ω(h log² B).
Building the Graph Hierarchy
Properties:
• All Gi are planar
• |Gi+1| ≤ |Gi|/2
• Every vertex in Gi+1 represents only a constant number of vertices in Gi
• Every vertex in Gi+1 represents at most 2^(i+2) vertices in G0
• Build Gi+1 from Gi by
  • Contracting edges
  • Merging vertices of degree 2 with the same neighbors
Building the Graph Hierarchy
Iterative approach:
• Extract set of edges that can be contracted
• Contract subset of these edges to reduce number of vertices by a factor of two
• Repeat until no contractible edges remain
Problem:
• Standard graph contraction procedure may contract too many vertices into a single vertex.
Building the Graph Hierarchy
Solution:
• Compute maximal matching of contractible subgraph
• Contract edges in the matching
New problem:
• We may not contract sufficient number of edges to reduce number of vertices by a constant factor
Two-stage contraction:
• Contract maximal matching
• Contract edges between matched and unmatched vertices
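The first stage can be illustrated with a greedy maximal matching computed in memory. This is only a sketch of the matching step, not of the I/O-efficient procedure or of the second (bipartite) stage; the function name `maximal_matching` and the edge-list format are assumptions.

```python
def maximal_matching(edges):
    """Greedy maximal matching: take an edge whenever both endpoints are still free."""
    matched = set()
    matching = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching, matched


# Path 0 - 1 - 2 - 3 - 4 - 5 - 6.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
matching, matched = maximal_matching(edges)
unmatched = {v for e in edges for v in e} - matched
```

Maximality means that every edge has at least one matched endpoint, so every unmatched vertex with an incident edge is adjacent to a matched vertex. That is what allows the second stage to contract edges between matched and unmatched vertices.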
Building the Graph Hierarchy
Why is this two-stage approach good?
• No unmatched vertex remains in contractible subgraph
• Every matched vertex represents at least two vertices before the contraction
⇒ Size of graph reduces by a factor of two
⇒ If a single iteration takes O(sort(|Gi|)) I/Os, the whole construction of Gi+1 from Gi takes O(sort(|Gi|)) I/Os
A Single Contraction Phase
• Maximal matching can be computed and contracted in O(sort(|H|)) I/Os, where H is the current contractible subgraph
• Bipartite contraction:
• Takes O(sort(|H|)) I/Os using buffer tree as priority queue
Building the Graph Hierarchy
Lemma: Graph Gi+1 can be constructed from Gi in O(sort(|Gi|)) I/Os.
Corollary: The whole graph hierarchy can be built in O(sort(|G0|)) = O(sort(N)) I/Os.
Planar DFS
[Figure: graph partitioned into layers (Level 0, Level 1, Level 2) around the source s]
Observation: Every cycle in the i-th layer is a boundary cycle of graph Gi.
Observation: Every bicomp of a layer is a cycle.
Planar DFS: DFS in a Layer
• DFS in a single layer Hi takes O(sort(|Hi|)) I/Os:
  • Compute the bicomps
  • Root the bicomp tree
  • Remove one of the edges incident to parent cutpoint in each cycle
Total I/O-complexity: O(sort(N))
Planar DFS: Building the Face-on-Vertex Graph
Lower Bounds and Open Problems
• Lower bounds
  • List ranking, BFS, DFS, and shortest paths
  • Connected and biconnected components
• Open problems
Lower Bounds: Split Proximate Neighbors
[Figure: example instance on the elements 1,...,8]
Lemma: Split proximate neighbors requires Ω(perm(N)) I/Os.
[Figure: reduction with steps costing I(N), I(N), and O(scan(N)) I/Os, where I(N) is the cost of solving split proximate neighbors]
Total: O(I(N) + scan(N)) = O(I(N)) ⇒ I(N) = Ω(perm(N))
Lower Bounds: List Ranking
• Consider general algorithms for weighted list ranking
• Algorithm is only allowed to use associativity of sum operator
Algorithm can be made to have the following property:
• For every vertex v, v and succ(v) are both in main memory at some point during the course of the algorithm
Note: The lower bound we show does not hold for unweighted list ranking or weighted list ranking over groups.
Lower Bounds: List Ranking
• When both copies of x are in main memory, move to buffer of size B
• When buffer full, flush to disk
• Split proximate neighbors could be solved in O(I(N) + scan(N)) I/Os ⇒ I(N) = Ω(perm(N))
Lower Bounds: List Ranking, BFS, DFS, and Shortest Paths
Theorem: List ranking requires Ω(perm(N)) I/Os.
• List ranking can be solved using BFS, DFS, or SSSP from the head of the list.
Theorem: BFS, DFS, and SSSP require Ω(perm(N)) I/Os.
Note: Again, lower bound holds only for algorithms that compute distances from source only by adding path lengths.
Lower Bounds: Segmented Duplicate Elimination
• Let P ≤ N ≤ P²
• Elements drawn from interval [2P+1,3P]
• Construct Boolean array C[2P+1..3P] s.t. C[i] = 1 iff i ∈ S
Proposition: Segmented duplicate elimination requires Ω(perm(N)) I/Os.
[Figure: example sequence S over the values 17,...,24, divided into segments S1,...,S4 of P/2 elements each]
Lower Bounds: Connected Components
• Graph construction ⇒ O(scan(N)) I/Os
• |V| = Θ(P), |E| = N
Lower Bounds: Connected and Biconnected Components
Theorem: Computing the connected components of a graph G = (V,E) requires Ω(perm(|E|)) I/Os.
Theorem: Computing the biconnected components of a graph G = (V,E) requires Ω(perm(|E|)) I/Os.
More Classes of Sparse Graphs
• Grid graphs
  • Separators: Size O(N/√h) in O(sort(N)) I/Os
  • BFS/SSSP: O(sort(N))
  • DFS: O(N/√B)
• Graphs of bounded treewidth
  • Separators: O(N/h) in O(sort(N)) I/Os
  • BFS/SSSP: O(sort(N))
  • DFS: ???
Open Problems
• Optimal separators for grid graphs
• DFS
  • Grid graphs
  • Graphs of bounded treewidth
• Semi-external shortest paths
• Optimal connectivity
• Optimal BFS, DFS, and shortest paths or lower bounds
• Directed graphs
  • Topological sorting
  • Strongly connected components