I/O and Space-Efficient Path Traversal in Planar Graphs
Craig Dillabaugh, Carleton University
Meng He, University of Waterloo
Anil Maheshwari, Carleton University
Norbert Zeh, Dalhousie University
Background: Succinct Data Structures
What are succinct data structures? (Jacobson 1989)
Represent data structures using space as close as possible to the information-theoretic minimum
Support efficient navigational operations
Why succinct data structures?
Large data sets in modern applications: textual, genomic, spatial or geometric
Background: External Memory Model
Parameters:
N: number of elements in the problem instance
M: size of the internal memory
B: size of a disk block
Cost: number of I/Os (block transfers) between internal memory and external memory
Aggarwal and Vitter 1988
[Figure: CPU, internal memory and external memory, with data moved between the two memories in blocks]
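To make the cost measure concrete, the sketch below counts block transfers for a sequence of element accesses. It assumes that element i is stored in block i // B and that internal memory caches M // B blocks with LRU replacement; both are simplifying assumptions of this illustration, not part of the model's definition, and the function name count_ios is hypothetical.

```python
from collections import OrderedDict


def count_ios(accesses, B, M):
    """Count block transfers (I/Os) for a sequence of element indices.

    Assumptions for this sketch: element i is stored in block i // B, and
    internal memory caches up to M // B blocks with LRU replacement.
    """
    capacity = max(1, M // B)
    cache = OrderedDict()  # block id -> None, ordered least to most recently used
    ios = 0
    for i in accesses:
        block = i // B
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            ios += 1  # miss: one block transfer from external memory
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return ios


# A sequential scan of 1000 elements with B = 100 costs 1000 / 100 = 10 I/Os,
# while touching one element per block, twice over, costs one I/O per access.
print(count_ios(range(1000), B=100, M=300))
print(count_ios([i * 100 for i in range(10)] * 2, B=100, M=300))
```

The first call reports 10 I/Os and the second 20. The second access pattern is what a naive path traversal degenerates to when consecutive vertices sit in different blocks, which is the cost the O(K / lg B) traversal bound improves on.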
Our Contributions
Our goal is to design data structures that are both succinct and efficient in the external memory setting
Our results:
A succinct representation of bounded-degree planar graphs that supports I/O-efficient path traversal
A succinct representation of triangulated terrains that supports various geometric queries
Notation
N: number of vertices of the given graph G
d: maximum degree of vertices
q: number of bits required to encode the key of each vertex
K: the length of the path
[Figure: example graph G with a numeric key at each vertex]
Two-Level Partition
A tool: graph separator (Frederickson 1987)
Size of each subgraph (region): r
Number of regions: Θ(N/r)
Number of boundary vertices: $O(N/\sqrt{r})$
Two-level partition:
Subdivide G into regions of fixed maximum size
Subdivide each region into sub-regions of smaller fixed maximum size
Types of vertices for each region / sub-region: interior vertices and boundary vertices
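A small sketch of the vertex classification in a two-level partition. It assumes, for illustration only, that a vertex is a boundary vertex of its region (or sub-region) exactly when it has a neighbour assigned to a different one; the toy partition of a path graph stands in for a real separator-based partition, and classify_vertices is a hypothetical helper.

```python
def classify_vertices(adj, part_of):
    """Split vertices into interior and boundary vertices of their part.

    Assumption for this sketch: a vertex is a boundary vertex when at least one
    neighbour is assigned to a different part, and interior otherwise.
    """
    interior, boundary = set(), set()
    for v, nbrs in adj.items():
        if any(part_of[w] != part_of[v] for w in nbrs):
            boundary.add(v)
        else:
            interior.add(v)
    return interior, boundary


# Toy example: a path on 30 vertices, regions of 15 vertices, sub-regions of 5.
adj = {v: [w for w in (v - 1, v + 1) if 0 <= w < 30] for v in range(30)}
region_of = {v: v // 15 for v in adj}
subregion_of = {v: v // 5 for v in adj}  # every sub-region lies inside one region

_, region_boundary = classify_vertices(adj, region_of)
_, subregion_boundary = classify_vertices(adj, subregion_of)
print(sorted(region_boundary))                       # [14, 15]
print(sorted(subregion_boundary - region_boundary))  # [4, 5, 9, 10, 19, 20, 24, 25]
```

Because sub-regions refine regions, every region boundary vertex is also a sub-region boundary vertex. The remaining sub-region boundary vertices are interior to their region, which is the case singled out in the α-neighbourhood definition.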
α-Neighbourhood
Definition: beginning with a given vertex v, we perform a breadth-first search in G and select the first α vertices encountered
The α-neighbourhood of v is the subgraph of G induced by these vertices
Internal and terminal vertices: a vertex of the α-neighbourhood is internal if all of its neighbours lie in the α-neighbourhood, and terminal otherwise
Property: the distance between v and any terminal vertex in its α-neighbourhood is at least $\log_d \alpha$
In our representation, we store the α-neighbourhood of each boundary vertex. If a sub-region boundary vertex is interior to a region, we add the constraint that its α-neighbourhood may not extend beyond the region
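The BFS-based definition can be sketched as follows, with adjacency given as a Python dict of neighbour lists (an assumption of this illustration). The optional allowed set is a hypothetical way to express the region constraint, and both function names are illustrative. Because BFS breaks ties by neighbour order, a different adjacency order may select a different, equally valid, set of α vertices.

```python
from collections import deque


def alpha_neighbourhood(adj, v, alpha, allowed=None):
    """Return the first `alpha` vertices reached by a BFS from v.

    If `allowed` is given (e.g. the vertex set of v's region), the search never
    leaves it, mirroring the extra constraint for sub-region boundary vertices
    that are interior to a region.
    """
    seen = {v}
    queue = deque([v])
    while queue and len(seen) < alpha:
        x = queue.popleft()
        for y in adj[x]:
            if y in seen or (allowed is not None and y not in allowed):
                continue
            seen.add(y)
            queue.append(y)
            if len(seen) == alpha:
                break
    return seen


def split_internal_terminal(adj, nbhd):
    """Internal vertices have all neighbours inside the neighbourhood; the rest are terminal."""
    internal = {u for u in nbhd if all(w in nbhd for w in adj[u])}
    return internal, nbhd - internal


adj = {v: [w for w in (v - 1, v + 1) if 0 <= w < 30] for v in range(30)}
nbhd = alpha_neighbourhood(adj, 9, 4)
print(sorted(nbhd))                                                 # [7, 8, 9, 10]
print([sorted(s) for s in split_internal_terminal(adj, nbhd)])      # [[8, 9], [7, 10]]
print(sorted(alpha_neighbourhood(adj, 12, 7)))                      # [9, 10, 11, 12, 13, 14, 15]
print(sorted(alpha_neighbourhood(adj, 12, 7, allowed=set(range(15)))))  # [8, 9, 10, 11, 12, 13, 14]
```

On the path graph the restricted search from vertex 12 cannot cross into vertex 15, so it compensates by growing further towards vertex 8.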
Overview of Labeling Scheme
Each vertex has labels at three levels:
Graph-label (unique)
Region-label (one or more)
Subregion-label (one or more)
Labels are assigned bottom up
Sub-Region Labels
Encode sub-region $R_{i,j}$ using any succinct representation of planar graphs
This induces a permutation of the vertices in $R_{i,j}$
Subregion-label: the kth vertex in this permutation has subregion-label k in $R_{i,j}$
Region-Labels and Graph-Labels
[Figure: region $R_1$ divided into sub-regions $R_{1,1}$, $R_{1,2}$ and $R_{1,3}$, whose vertices carry subregion-labels 1 to 6, 1 to 5 and 1 to 7; the region-labels in $R_1$ run 1 to 6, then 7, 8, ..., 15, …]
The assignment of graph-labels is similar
Succinct structures of o(N) bits are constructed to support conversion between labels at different levels in O(1) I/Os
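One simple way to picture conversion between two label levels is a prefix-sum table: the vertices of each sub-region receive consecutive region-labels after those of the preceding sub-regions. This is a hypothetical, simplified scheme for illustration only: it does not merge vertices shared by several sub-regions, and it stores the offsets explicitly rather than in o(N) bits, so it is not the succinct structure described above. The class and method names are invented for this sketch.

```python
from bisect import bisect_right
from itertools import accumulate


class LabelConverter:
    """Convert between subregion-labels and region-labels inside one region.

    Simplified, hypothetical scheme: sub-regions are numbered 0, 1, 2, ...,
    and the vertices of sub-region j get consecutive region-labels after those
    of sub-regions 0..j-1. Shared boundary vertices are NOT merged here.
    """

    def __init__(self, subregion_sizes):
        # offsets[j] = total size of sub-regions 0..j-1 (a prefix sum)
        self.offsets = [0] + list(accumulate(subregion_sizes))

    def to_region_label(self, j, k):
        """Region-label of the vertex with 1-based subregion-label k in sub-region j."""
        return self.offsets[j] + k

    def to_subregion_label(self, label):
        """Inverse mapping: return (sub-region index j, subregion-label k)."""
        j = bisect_right(self.offsets, label - 1) - 1
        return j, label - self.offsets[j]


conv = LabelConverter([6, 5, 7])    # three sub-regions with 6, 5 and 7 vertices
print(conv.to_region_label(1, 1))   # 7
print(conv.to_subregion_label(18))  # (2, 7)
```

The binary search makes the inverse mapping logarithmic in the number of sub-regions; the point of the o(N)-bit structures in the slide is to achieve such conversions in O(1) I/Os without an explicit table of this kind.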
Data Structures
Denote by A the maximum number of vertices that may be stored in a block; this is our maximum sub-region size
Choose $A \lg^3 N$ to be the maximum size of each region
We encode only sub-regions and α-neighbourhoods of boundary vertices as components
Encode the graph structure of each component in a succinct fashion
Information is encoded so that we can retrieve the graph-labels of the internal vertices in an α-neighbourhood without requiring additional I/Os
Space Analysis
We assume B = Ω(lg N)
A = (B lg N) / (c + q): a block holds B lg N bits, and each vertex needs c + q bits
c: number of bits per vertex required to encode the sub-graph structure and the boundary bit vector
Choose $\alpha = A^{1/3}$
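The parameter choices can be evaluated for concrete numbers. This sketch assumes a block of B words, each of ⌈lg N⌉ bits, rounds every quantity down, and uses hypothetical values for B, c and q; block_parameters is an illustrative name. It computes the integer cube root exactly, since a float expression such as 1000 ** (1/3) evaluates to slightly less than 10.

```python
def block_parameters(B, N, c, q):
    """Evaluate A = (B lg N)/(c + q), alpha = A^(1/3) and the region size A lg^3 N.

    Assumes a block holds B words of ceil(lg N) bits; all results are rounded down.
    """
    lg_n = (N - 1).bit_length()          # ceil(lg N) for N >= 2
    A = (B * lg_n) // (c + q)            # vertices per block = max sub-region size
    alpha = round(A ** (1 / 3))          # float estimate of the cube root ...
    while alpha ** 3 > A:                # ... corrected to the exact integer
        alpha -= 1                       #     cube root (floor) below
    while (alpha + 1) ** 3 <= A:
        alpha += 1
    region_size = A * lg_n ** 3          # maximum region size A lg^3 N
    return A, alpha, region_size


# Hypothetical values: N = 2^32 vertices, B = 1024 words, c = 8, q = 32.
print(block_parameters(B=1024, N=2**32, c=8, q=32))  # (819, 9, 26836992)
```

With these numbers a block holds 819 vertices, each stored α-neighbourhood has 9 vertices, and a region may contain up to 26,836,992 vertices.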
Intuitively, our structures are space-efficient because:
Region boundary vertices are few enough that information such as the graph-labels of the vertices in their α-neighbourhoods does not occupy too much space
Sub-region boundary vertices are more numerous, but information such as region-labels uses fewer bits: $\lg(A \lg^3 N)$ bits per label, rather than lg N bits for a graph-label
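A back-of-the-envelope count makes this intuition concrete. It ignores constants, applies the separator bound $O(N/\sqrt{r})$ with $r = A\lg^3 N$ for regions and $r = A$ for sub-regions, and assumes that each stored α-neighbourhood records one label per vertex (graph-labels for region boundary vertices, region-labels for sub-region boundary vertices). This is an illustrative sketch, not the full space analysis:

```latex
\begin{aligned}
\text{region boundary:}\quad
  O\!\left(\frac{N}{\sqrt{A\lg^3 N}}\right)\cdot A^{1/3}\cdot \lg N
  &= O\!\left(\frac{N}{A^{1/6}\lg^{1/2} N}\right) = o(N),\\
\text{sub-region boundary:}\quad
  O\!\left(\frac{N}{\sqrt{A}}\right)\cdot A^{1/3}\cdot \lg\!\left(A\lg^3 N\right)
  &= O\!\left(\frac{N\,\lg\!\left(A\lg^3 N\right)}{A^{1/6}}\right).
\end{aligned}
```

In the second line, storing lg N-bit graph-labels instead would replace $\lg(A\lg^3 N)$ by lg N; the shorter region-labels are what keep that term small.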
Total space: O(N) + Nq + o(Nq) bits
Traversal Algorithm
Load either a sub-region or the α-neighbourhood of a boundary vertex
Traverse this component until a boundary/terminal vertex is encountered
Load the next component from external memory and continue the traversal
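The traversal loop can be simulated by counting component loads instead of actual I/Os; each load is assumed to cost O(1) I/Os, as the observations that follow state. The sub-region assignment, the helper names, and the rule for picking the next component (the α-neighbourhood for a boundary vertex, the sub-region otherwise) are assumptions of this sketch. Label reporting and the region restriction on α-neighbourhoods are omitted.

```python
from collections import deque


def alpha_neighbourhood(adj, v, alpha):
    """First `alpha` vertices reached by a BFS from v (region restriction omitted)."""
    seen = {v}
    queue = deque([v])
    while queue and len(seen) < alpha:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
                if len(seen) == alpha:
                    break
    return seen


def count_component_loads(adj, subregion_of, alpha, path):
    """Walk `path` (consecutive vertices adjacent) and count component loads.

    Assumes alpha > maximum degree, so a freshly loaded component always
    contains every neighbour of the vertex it was loaded for.
    """
    def is_boundary(u):
        return any(subregion_of[w] != subregion_of[u] for w in adj[u])

    def load(u):
        # A boundary vertex is covered by its alpha-neighbourhood; any other
        # vertex by its sub-region.
        if is_boundary(u):
            return alpha_neighbourhood(adj, u, alpha)
        return {w for w in adj if subregion_of[w] == subregion_of[u]}

    component = load(path[0])
    loads = 1
    for u in path[:-1]:
        # Stepping from u stays inside the component only if all of u's
        # neighbours are in it (u is interior/internal); otherwise u is a
        # boundary/terminal vertex and the next component is loaded.
        if not all(w in component for w in adj[u]):
            component = load(u)
            loads += 1
    return loads


adj = {v: [w for w in (v - 1, v + 1) if 0 <= w < 30] for v in range(30)}
subregion_of = {v: v // 10 for v in adj}
print(count_component_loads(adj, subregion_of, 4, list(range(30))))  # 7
print(count_component_loads(adj, subregion_of, 8, list(range(30))))  # 5
```

On a 30-vertex path split into three sub-regions, raising α from 4 to 8 reduces the loads from 7 to 5, because a larger α-neighbourhood lets the walk move further past each boundary vertex before another load is needed.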
I/O Efficiency
Observations:
When a terminal/boundary vertex is encountered, the next component can be loaded in O(1) I/Os
Given a component, the graph-labels of all its interior/internal vertices can be reported without incurring any additional I/Os
By loading a constant number of components, we can visit Ω(lg B) vertices along the path
I/O complexity: O(K / lg B)
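The arithmetic behind this bound, sketched under the assumptions d = O(1), c constant and q = O(lg N). Together with A = (B lg N)/(c + q) and B = Ω(lg N), these give lg A = Θ(lg B). Each α-neighbourhood lets the walk advance at least $\log_d \alpha$ steps, and a constant number of loads, each costing O(1) I/Os, includes such an advance:

```latex
\log_d \alpha = \frac{\lg A^{1/3}}{\lg d} = \frac{\lg A}{3\lg d} = \Theta(\lg B)
\quad\Longrightarrow\quad
\#\text{I/Os} = O\!\left(\frac{K}{\log_d \alpha}\right) = O\!\left(\frac{K}{\lg B}\right).
```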
Main Result
A succinct representation of bounded-degree planar graphs:
Space: O(N) + Nq + o(Nq) bits
I/O complexity for path traversal: O(K / lg B)
Terrains Modeled as Triangulated Irregular Networks (TINs)
Notation:
N: number of points
Φ: number of bits required to store the coordinates of each point
Space: NΦ + O(N) + o(NΦ) bits
I/O complexity of reporting a path crossing K faces: O(K / lg B)
Queries on Triangulated Terrains
Point location: $O(\log_B N)$ I/Os
Terrain profile: O(K / lg B) I/Os
Trickle path: O(K / lg B) I/Os
Connected component: O(K / lg B) I/Os if the component is convex; this can be generalized to components that are not convex, though the result is more complex
Conclusions
We designed a succinct representation of bounded-degree planar graphs that supports I/O-efficient path traversal, and applied it to terrains modeled as TINs to support queries
This provides solutions for modern applications that process very large data sets
Future work: combining succinct data structures and external memory data structures for other problems
Thank you!