Efficiently Processing Queries on Interval-and-Value Tuples
in Relational Databases
Jost Enderle, Nicole Schneider, Thomas Seidl
RWTH Aachen University, Germany
VLDB 2005, Trondheim
Data Management and Exploration, Prof. Dr. Thomas Seidl
Enderle, Schneider, Seidl Queries on Interval-and-Value Tuples in RDBs VLDB 05 - 2
Outline
• Interval-and-Value (IaV) Data and Applications
• Relational Interval Tree (RI-tree)
• Managing Interval-and-Value Tuples Using RI-tree
• Experimental Results
Interval-and-Value Data: Example
Contracts table: storing period and budget of contracts
    CREATE TABLE contracts (
        c_no     VARCHAR(10),      -- key
        c_budget DECIMAL(10,2),    -- simple-valued attribute
        c_period ROW (             -- interval
            c_start DATE,
            c_end   DATE))
No.  Budget (k€)  Period start  Period end
C1           250  2005-03-01    2005-07-31
C2          5300  2002-02-17    2003-05-06
C3         10700  1999-05-27    2001-12-17
C4          1600  2001-02-28    2002-11-02
C5           870  2002-06-25    2002-08-12
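To make the example concrete, the contracts table can be loaded into SQLite through Python's standard `sqlite3` module. This is a minimal sketch, not the authors' setup: SQLite has no ROW type, so `c_period` is flattened into `c_start`/`c_end` (an assumption), dates are ISO strings that compare correctly as text, and `running_on` is a hypothetical helper. Only the primary key is indexed here, so SQLite has to scan the table to answer the interval predicate.

```python
import sqlite3

# In-memory copy of the example contracts table (budgets in k€).
# Assumption: c_period is flattened into c_start/c_end because SQLite has no ROW type;
# ISO-8601 date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts ("
    " c_no VARCHAR(10) PRIMARY KEY,"
    " c_budget DECIMAL(10,2),"
    " c_start DATE,"
    " c_end DATE)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?)",
    [
        ("C1", 250, "2005-03-01", "2005-07-31"),
        ("C2", 5300, "2002-02-17", "2003-05-06"),
        ("C3", 10700, "1999-05-27", "2001-12-17"),
        ("C4", 1600, "2001-02-28", "2002-11-02"),
        ("C5", 870, "2002-06-25", "2002-08-12"),
    ],
)


def running_on(day):
    """Stabbing query without an interval index: contracts whose period contains day."""
    rows = conn.execute(
        "SELECT c_no FROM contracts WHERE c_start <= ? AND c_end >= ? ORDER BY c_no",
        (day, day),
    )
    return [row[0] for row in rows]


print(running_on("2002-07-01"))  # → ['C2', 'C4', 'C5']
```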
Interval-and-Value Data: Query
• Sample query on contracts table
    -- Find all contracts
    SELECT c_no FROM contracts
    -- within certain budget range
    WHERE c_budget BETWEEN 500 AND 2000
    -- running during certain time interval
      AND c_period OVERLAPS (DATE '2003-03-01', DATE '2004-01-31')
• Special cases of this general Range-Interval query:
  – Value-Interval Query: the value range is a single point
  – Range-Stabbing Query: the query interval is a single point
  – Value-Stabbing Query: both restrictions hold
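The four query types differ only in which of the two restrictions degenerates to a point. The sketch below names the case and gives a brute-force reference predicate; the function names are hypothetical, and it assumes closed intervals, so results may differ from SQL's OVERLAPS at shared endpoints.

```python
from datetime import date


def query_type(value_lo, value_hi, q_start, q_end):
    """Name the special case of a general Range-Interval query."""
    value_part = "Value" if value_lo == value_hi else "Range"      # value range is a point?
    interval_part = "Stabbing" if q_start == q_end else "Interval"  # query interval is a point?
    return f"{value_part}-{interval_part}"


def matches(budget, start, end, value_lo, value_hi, q_start, q_end):
    """Reference semantics (assumption: closed intervals): value in [value_lo, value_hi]
    and the period [start, end] intersects the query interval [q_start, q_end]."""
    return value_lo <= budget <= value_hi and start <= q_end and end >= q_start


sample = (500, 2000, date(2003, 3, 1), date(2004, 1, 31))
print(query_type(*sample))  # → Range-Interval
# C4 (1600 k€, 2001-02-28 to 2002-11-02) is within budget but ended before the interval.
print(matches(1600, date(2001, 2, 28), date(2002, 11, 2), *sample))  # → False
```

Applied to the five example contracts, the sample query returns no rows: only C4 and C5 lie in the budget range, and both ended in 2002.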
Motivation of Relational Indexing
• Main Memory Structures
  – no persistence, no disk block structure
• Secondary Storage Structures
  + persistence, high block-oriented efficiency
  – integration into DBMS kernel typically not supported (GiST?)
• Relational Storage Structures
  + basic idea: don't extend, just use the RDBMS (virtual storage machine)
  + sound formal foundation, little implementation effort
  + immediate industrial strength (availability, robustness, ACID, …)
  + high efficiency by exploiting built-in indexing structures (B+-tree)
Relational Interval Tree
[Figure: virtual backbone with nodes 1 to 15 and root 8; the intervals C1 = [10, 15], C2 = [5, 13], C3 = [1, 7] and C4 = [3, 15] are registered at their fork nodes 12, 8, 4 and 8]
• Two relational indexes (B+-trees) store the interval bounds
  lowerIndex (node, start, id): (4, 1, C3), (8, 3, C4), (8, 5, C2), (12, 10, C1)
  upperIndex (node, end, id): (4, 7, C3), (8, 13, C2), (8, 15, C4), (12, 15, C1)
• Supported by any RDBMS: No modification of built-in B+-trees
• Optimal complexities for space, updates, and intersection queries
• root = 2^(h-1), where h is the height of the backbone (here h = 4, root = 8)
[Kriegel, Pötke, Seidl: VLDB 2000] based on [Edelsbrunner 1980]
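Each interval is stored at its fork node, the first backbone node met on the way down from the root that lies inside the interval. A minimal sketch of that descent, assuming integer bounds within 1 .. 2^h - 1 and ignoring how the backbone adapts to the actual data space; `fork_node` is a hypothetical name:

```python
def fork_node(lower, upper, h):
    """Fork node of the integer interval [lower, upper] in a virtual backbone of
    height h (nodes 1 .. 2**h - 1, root 2**(h - 1)): the first node met while
    descending from the root that lies inside the interval."""
    assert 1 <= lower <= upper <= (1 << h) - 1, "interval must lie in the backbone"
    node = 1 << (h - 1)   # root = 2^(h-1)
    step = node // 2
    while not (lower <= node <= upper):
        # The interval lies entirely left or right of node: move one level down.
        node = node - step if upper < node else node + step
        step //= 2
    return node


# Intervals C1 to C4 from the RI-tree slide (h = 4, root 8).
for name, lo, hi in [("C1", 10, 15), ("C2", 5, 13), ("C3", 1, 7), ("C4", 3, 15)]:
    print(name, fork_node(lo, hi, 4))  # prints C1 12, C2 8, C3 4, C4 8 (one per line)
```

The descent touches at most h nodes and needs no I/O, because the backbone is virtual: only the node number is written to lowerIndex and upperIndex.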
Single Interval Query Processing
Two steps to process an interval query
1. Transform the interval query into a set of range queries
   – the generated queries are collected in transient tables (no I/Os)
2. Perform a single SQL query
   – join the transient query tables with the relational indexes
Preprocessing: Generate Query Ranges
• Generate a set of range queries for lowerIndex and upperIndex
  – At nodes left of start: report entries i with i.end ≥ start (32, 48, 52)
  – At nodes right of end: report entries i with i.start ≤ end (56)
  – For nodes between start and end: report all entries (54 to 55)
[Figure: virtual backbone with nodes 1 to 63 and root 32; upperIndex is scanned at nodes 32, 48 and 52, lowerIndex at node 56 and over the node range 54 to 55]
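The preprocessing step can be sketched as two root-to-leaf descents, one towards start and one towards end, that collect the nodes left of start and right of end. This assumes integer bounds within 1 .. 2^h - 1 (the real-valued case needs extra care), and `query_nodes` is a hypothetical name; for the integer query [54, 55] on a backbone of height 6 it reproduces the node lists of the figure.

```python
def query_nodes(start, end, h):
    """Node lists for the intersection query [start, end] on a backbone of height h.

    leftQueries:  nodes n < start met on the descents (checked in upperIndex: end >= start)
    rightQueries: nodes n > end met on the descents (checked in lowerIndex: start <= end)
    Nodes inside [start, end] are covered by the inner BETWEEN range query.
    """
    root = 1 << (h - 1)
    left, right = set(), set()
    for target in (start, end):
        node, step = root, root // 2
        while True:
            if node < start:
                left.add(node)
            elif node > end:
                right.add(node)
            if node == target or step == 0:
                break
            node = node + step if target > node else node - step
            step //= 2
    return sorted(left), sorted(right)


print(query_nodes(54, 55, 6))  # → ([32, 48, 52], [56])
```

Both lists are computed in main memory and only then shipped to the transient tables, which is why this step causes no I/O.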
Processing by a Single SQL Query
• Join transient query tables with B+-tree indexes
    SELECT id
    FROM upperIndex AS i JOIN :leftQueries USING (node)
    WHERE i.end >= :start
    UNION ALL
    SELECT id
    FROM lowerIndex AS i JOIN :rightQueries USING (node)
    WHERE i.start <= :end
    UNION ALL
    SELECT id
    FROM lowerIndex    -- or upperIndex
    WHERE node BETWEEN :start AND :end
• No duplicates are produced → UNION ALL
• Blocked output of index range scans is guaranteed
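The single statement can be tried out in SQLite. This is a sketch under assumptions: the two relational indexes are modelled as tables with composite SQLite indexes, the transient tables are TEMP tables named `leftQueries`/`rightQueries` instead of bind variables, and the fork nodes and node lists are taken from the earlier slides (for the query [9, 11] on the backbone of height 4, the descent yields leftQueries = [8] and rightQueries = [12]). `intersection_query` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The two relational indexes of the RI-tree, modelled as tables with composite
# B-tree indexes. "end" is quoted because END is an SQL keyword.
conn.executescript("""
    CREATE TABLE lowerIndex (node INTEGER, start INTEGER, id TEXT);
    CREATE TABLE upperIndex (node INTEGER, "end" INTEGER, id TEXT);
    CREATE INDEX lower_nsi ON lowerIndex (node, start, id);
    CREATE INDEX upper_nei ON upperIndex (node, "end", id);
    CREATE TEMP TABLE leftQueries (node INTEGER);
    CREATE TEMP TABLE rightQueries (node INTEGER);
""")

# Intervals C1..C4 with the fork nodes shown on the RI-tree slide (h = 4).
for iid, lo, hi, node in [("C1", 10, 15, 12), ("C2", 5, 13, 8), ("C3", 1, 7, 4), ("C4", 3, 15, 8)]:
    conn.execute("INSERT INTO lowerIndex VALUES (?, ?, ?)", (node, lo, iid))
    conn.execute("INSERT INTO upperIndex VALUES (?, ?, ?)", (node, hi, iid))

INTERSECTION_SQL = """
    SELECT id FROM upperIndex AS i JOIN leftQueries USING (node) WHERE i."end" >= :start
    UNION ALL
    SELECT id FROM lowerIndex AS i JOIN rightQueries USING (node) WHERE i.start <= :end
    UNION ALL
    SELECT id FROM lowerIndex WHERE node BETWEEN :start AND :end
"""


def intersection_query(start, end, left_nodes, right_nodes):
    """Fill the transient query tables, then run the single SQL statement."""
    conn.execute("DELETE FROM leftQueries")
    conn.execute("DELETE FROM rightQueries")
    conn.executemany("INSERT INTO leftQueries VALUES (?)", [(n,) for n in left_nodes])
    conn.executemany("INSERT INTO rightQueries VALUES (?)", [(n,) for n in right_nodes])
    return sorted(row[0] for row in conn.execute(INTERSECTION_SQL, {"start": start, "end": end}))


# Query [9, 11]: leftQueries = [8], rightQueries = [12].
print(intersection_query(9, 11, [8], [12]))  # → ['C1', 'C2', 'C4']
```

Each node occurs in at most one of the three parts, so the UNION ALL result contains every intersecting interval exactly once.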
Extending the RI-tree for IaV Support (1)
• Add value predicate to RI-tree query
    SELECT id                      -- lower subquery
    FROM upperIndex AS i JOIN :leftQueries USING (node)
    WHERE i.end >= :start
      AND i.value BETWEEN :Value1 AND :Value2
    UNION ALL ...                  -- upper subquery
    UNION ALL
    SELECT id                      -- inner subquery
    FROM lowerIndex                -- or upperIndex
    WHERE node BETWEEN :start AND :end
      AND value BETWEEN :Value1 AND :Value2
• Integrate simple value attribute into lower-/upperIndex
  – old schema: (node, bound, id)
  – new schema: ? → depends on type of query to support
Extending the RI-tree for IaV Support (2)
• Viable schemas for new lower-/upperIndexes (estimate access cost for each query type):
  – (value, node, bound, id)
  – (node, value, bound, id)
  – (node, bound, value, id)
• Observations (see paper for details):
  – Value Queries best supported by (value, node, bound, id) index
    • simple attribute predicates = point queries
    • evaluation requires the same number of disk accesses as the original RI-tree query processing
  – Range Queries: choice of index not obvious
    • inner subquery of Range-Stabbing Queries best supported by (node, value, bound, id)
    • otherwise: depends on stored data and values of query variables
• Question: Can Range Queries be further enhanced?
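Why the attribute order matters can be illustrated with a toy model, not the paper's cost model: sort hypothetical (value, node, bound, id) entries by each candidate key, as a B+-tree keeps them, and measure how far apart the first and last match of a subquery lie. That distance is a rough proxy for the leaf range one index scan must cover. The entries are random and `scanned_span` is a hypothetical helper.

```python
import random


def scanned_span(entries, key, pred):
    """Sort entries by a composite key, as a B+-tree keeps them, and return
    (matching entries, entries from the first to the last match). The second
    number is a rough proxy for the leaf range a single index scan covers."""
    ordered = sorted(entries, key=key)
    hits = [pos for pos, e in enumerate(ordered) if pred(e)]
    return (len(hits), hits[-1] - hits[0] + 1) if hits else (0, 0)


rng = random.Random(2005)
# Hypothetical (value, node, bound, id) entries: 16 values, backbone nodes 1..31.
entries = [(rng.randint(0, 15), rng.randint(1, 31), rng.randint(1, 31), i) for i in range(5000)]

ORDERS = {
    "(value, node, bound, id)": lambda e: (e[0], e[1], e[2], e[3]),
    "(node, value, bound, id)": lambda e: (e[1], e[0], e[2], e[3]),
    "(node, bound, value, id)": lambda e: (e[1], e[2], e[0], e[3]),
}


# Inner subquery of a Value-Interval query: value = 7 AND node BETWEEN 10 AND 20.
def inner(e):
    return e[0] == 7 and 10 <= e[1] <= 20


for name, key in ORDERS.items():
    print(name, scanned_span(entries, key, inner))
```

With the value as leading attribute, the point predicate fixes a contiguous block and the node range is contiguous inside it, so span and match count coincide; with node first, the matches are spread over the whole node range.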
Improving Range Query Processing (1)
• Problem of composite indexes for multiple attributes
  – queries may contain range predicates on two or more of the indexed attributes
  – tuples satisfying the first predicate lie in a contiguous disk area
  – tuples satisfying both/all predicates are scattered within this area
• Common solution: space-filling curves
  – map multi-dimensional data to one-dimensional values
  – similar values of the original data are mapped to similar index values
  – ranges of the indexed attributes are therefore largely found in adjacent disk areas
• Application to the RI-tree scenario
  – combine some attributes of lower-/upperIndex
  – depends on type of query to support
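A common space-filling curve is the Z-order (Morton) curve, which interleaves the bits of the coordinates; the experiments later also use the Hilbert curve. The sketch below uses one common bit convention (an assumption, not necessarily the paper's). Because the code is monotone in each coordinate, every point of a box [x1, x2] × [y1, y2] has a code between z(x1, y1) and z(x2, y2), so a two-attribute range predicate becomes one key-range scan plus a filter for false positives.

```python
def z_order(x, y, bits=16):
    """Morton (Z-order) code of (x, y): bit i of x goes to position 2*i,
    bit i of y to position 2*i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Points of a 4 x 4 grid in curve order: each 2 x 2 quadrant is finished
# before the next one starts, so nearby points get nearby codes.
curve = sorted(((x, y) for y in range(4) for x in range(4)), key=lambda p: z_order(*p))
print(curve[:8])
# → [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0), (2, 1), (3, 1)]
```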
Improving Range Query Processing (2)
Identifying viable schemas for new lower-/upperIndexes
  – find subqueries containing several range predicates
    • for Range Queries: lower and upper subqueries (bound, value)
    • for Range-Interval Queries: inner subquery (node, value)
  – combine the respective attributes (x, y) within a space-filling curve {x, y}
  – useful combinations for lower-/upperIndex:
    • (node, {value, bound})
    • ({node, value}, bound)
Improving Range Query Processing (3)
• Observations:
  – lower and upper subqueries of Range Queries will profit from a (node, {value, bound}) index
  – inner subquery of Range-Interval Queries will profit from a ({node, value}, bound) index
  – Value Queries will not profit from “space-filling indexes”
• Intermediate result
  – space-filling indexes can reduce disk accesses in certain cases
  – there is no "universal" index supporting all queries to the same extent
  – different subqueries will profit from different indexes
Employing index mixes
• Identifying best indexes for each query type
  – Value Queries: best supported by (value, node, bound, id) index
  – Range Queries: depends on data and space-filling curve (if used)
• Different subqueries are best supported by different indexes
• Subqueries may be evaluated separately, each using its best index
• Drawback: higher cost for index updates and higher storage requirements
Query            Lower/Upper Subquery       Inner Subquery
Value-Stabbing   (value, node, bound)       (value, node, bound)
Value-Interval   (value, node, bound)       (value, node, bound)
Range-Stabbing   (node, {value, bound})     (node, value, bound)
Range-Interval   (node, {value, bound})     ({node, value}, bound)
Adapting the RI-tree Algorithms (1)
Example: Evaluate a contracts query using a “space-filling index”
Contracts table:
  – node and Z-order value calculated for each tuple
  – B-tree index on (node, Z(budget, start), no)

No.  Budget (k€)  Start  End  Node  Z(budget, start)
C1             2      1    5     4                 4
C2             5      2    9     8                50
C3            10      8   17    16               221
C4             6     14   19    16               149
C5             8     21   26    24               186
Adapting the RI-tree Algorithms (2)
Range-Interval Query: value range (1, 12); interval (3, 6)
[Figure: evaluation of the upper subquery with the Z-order index; axes start/end and budget; query value range (1, 12); valid region start ≤ end]
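The upper subquery on the (node, Z(budget, start), no) index can be sketched as follows. Assumptions: a sorted Python list stands in for the B-tree, `upper_subquery` is a hypothetical helper, and the generic Morton convention used here does not reproduce the Z(budget, start) column of the table, whose exact encoding is not stated on the slides. Budgets, periods and nodes are taken from the table; for the interval (3, 6) on the backbone with root 16, the nodes right of the interval on the descent are 8 and 16. For every such node, the box "budget in [1, 12], start ≤ 6" becomes one Z-range scan followed by a filter that removes false positives.

```python
from bisect import bisect_left


def z_order(x, y, bits=8):
    """Morton code: bit i of x at position 2*i, bit i of y at position 2*i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Example contracts from the slide: no -> (budget, start, end, node).
CONTRACTS = {
    "C1": (2, 1, 5, 4),
    "C2": (5, 2, 9, 8),
    "C3": (10, 8, 17, 16),
    "C4": (6, 14, 19, 16),
    "C5": (8, 21, 26, 24),
}
# Sorted list standing in for the B-tree on (node, Z(budget, start), no).
INDEX = sorted((node, z_order(b, s), no) for no, (b, s, _e, node) in CONTRACTS.items())


def upper_subquery(right_nodes, v_lo, v_hi, q_end, s_min=0):
    """For each node right of the query interval, scan the Z-range of the box
    budget in [v_lo, v_hi], start in [s_min, q_end], then drop false positives."""
    hits = []
    for node in right_nodes:
        lo = bisect_left(INDEX, (node, z_order(v_lo, s_min)))
        hi = bisect_left(INDEX, (node, z_order(v_hi, q_end) + 1))
        for _node, _z, no in INDEX[lo:hi]:
            budget, start, _end, _ = CONTRACTS[no]
            if v_lo <= budget <= v_hi and start <= q_end:
                hits.append(no)
    return sorted(hits)


# Value range (1, 12), interval (3, 6): the nodes right of the interval are 8 and 16.
print(upper_subquery([8, 16], 1, 12, 6))  # → ['C2']
```

With a larger interval end, e.g. `upper_subquery([16], 1, 12, 10)`, the Z-range of node 16 also contains C4, which the filter removes because its start 14 exceeds 10; only C3 is reported.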
Access Cost with Varying Table Sizes
[Figure, two panels: Value-Stabbing Queries; Value-Interval Queries. x-axis: table size [number of tuples], 10^5 to 10^7; y-axis: access cost [number of I/Os]; index variants: RI(VNB), RI(NVB), RI(NBV), RI({NV})h, RI(N{VB})h]
Access Cost with Varying Table Sizes
[Figure, two panels: Range-Stabbing Queries; Range-Interval Queries. x-axis: table size [number of tuples], 10^5 to 10^7; y-axis: access cost [number of I/Os]; index variants: RI(VNB), RI(NVB), RI(NBV), RI({NV})h, RI(N{VB})h, plus the index mixes RI(NVB) + RI(N{VB})h (Range-Stabbing) and RI({NV})h + RI(N{VB})h (Range-Interval)]
Access cost for varying length of ranges
[Figure, two panels: Stabbing Queries; Interval Queries. x-axis: length of query range [% of attr. domain], 0 to 50; y-axis: access cost [number of I/Os]; Stabbing Queries: (VNB), (NVB), ({NV}B), (NVB)+(N{VB}), (VNB)+(N{VB}), (VNB)+(NVB)+(N{VB}); Interval Queries: (VNB), ({NV}B), ({NV}B)+(N{VB}), (VNB)+({NV}B), (VNB)+({NV}B)+(N{VB})]
Access cost for varying length of ranges
[Figure: Range Queries. x-axis: length of query interval [% of int. domain], 0 to 50; y-axis: access cost [number of I/Os]; curves: ({NV}B), (N{VB}), (NVB)+(N{VB}), ({NV}B)+(N{VB}), (VNB)+({NV}B)+(N{VB})]
Comparison with competing techniques
[Figure: bar chart of access cost [number of I/Os], axis 0 to 6000, for the single indexes (VNB), (NVB), (NBV), ({NV}B) and (N{VB}) (the last two each with z-curve and Hilbert variants), index mixes combining two to four of them, and the competing techniques Spatial RI-tree, R-tree, RI-tree → B-tree, B-tree → RI-tree, B-tree ∩ RI-tree, (VLU) and (LUV)]
Conclusions
• Processing Interval-and-Value Tuples in SQL databases
• Extensions of the Relational Interval Tree
• Various types of queries
  – Range vs. Value Queries
  – Interval vs. Stabbing Queries
• Experiments demonstrate high performance
• Future work:
  – extend proposed techniques to more complex queries (joins)
  – cost models to predict benefits for evolving query workload