TRANSCRIPT
DB Seminar Schedule
=================================================================
Chui Chun Kit          30/11/07
Gong Jian Jim          7/12/07
Loo Kin Kong           14/12/07
Ngai Wang Kay Jackie   21/12/07
Siu Wing Yan Angela    4/1/08
Tam Ming Wai           11/1/08
Tsang Pui Kwan Smith   18/1/08
U Leong Hou Kamiru     25/1/08
Wong Wai Kit           1/2/08
Cui Yingjie Jason      15/2/08
LEE King For           22/2/08
Lin Zhifeng Arthur     7/3/08
Yuan Wenjun Clement    14/3/08
Zhang Shiming Simon    28/3/08
Zhang Yiwei Kelvin     11/4/08
LEE Yau Tat            18/4/08
Pan Guodong Delvin     25/4/08
Please send the abstract to [email protected] one week before your talk
Refreshing the Sky: The Compressed Skycube with Efficient Support for Frequent Updates
Authors: Tian Xia, Donghui Zhang
Northeastern University
Published in: SIGMOD 2006
Presenter: Chun-Kit Chui (Kit)
Presentation Outline
Introduction
  What is a skyline?
  Motivation for subspace skyline queries
  Skycube
Compressed skycube (CSC)
  How to use the compressed skycube to answer skyline queries
  How to handle object updates in the compressed skycube
Experimental evaluation
Conclusion
Introduction
What is a skyline?
Skyline query: Find the hotels with both low price and close to the beach.
Hotels in Hawaii:
      Price  Dist. to Beach
t1    3      2
t2    4      7
t3    9      5
t4    9      1
t5    2      3
t6    6      1
t7    1      4
Price: rank of the hotel's price (the smaller, the cheaper).
Dist. to Beach: rank of the hotel's distance to the beach (the smaller, the closer).
[Figure: plot of the hotels dataset, with the price rank on the x-axis and the rank of the distance to the beach on the y-axis.]
The hotels highlighted in red (t7, t5, t1 and t6) are those that are both cheap and close to the beach compared with the others. They are the skyline objects of this query space.
An object a dominates an object b in a set of dimensions U if a is smaller than or equal to b in every dimension of U, and a is strictly smaller than b in at least one dimension.
For example, hotel t6 dominates hotel t3 in terms of price and distance to the beach.
An object t that is not dominated by any other object in a set of dimensions U is called a skyline object of U, i.e. t ∈ sky(U).
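The dominance definition can be checked mechanically. The following is a minimal sketch, not code from the paper: the function names `dominates` and `skyline` are our own, and the data is the hotel table from the slides.

```python
# Hotel ranks from the slides: name -> (price, distance to beach); smaller is better.
HOTELS = {
    "t1": (3, 2), "t2": (4, 7), "t3": (9, 5), "t4": (9, 1),
    "t5": (2, 3), "t6": (6, 1), "t7": (1, 4),
}

def dominates(a, b):
    """a dominates b: a <= b in every dimension and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(objects):
    """Sorted names of the objects that no other object dominates."""
    return sorted(
        name for name, value in objects.items()
        if not any(dominates(other, value) for other in objects.values())
    )

print(dominates(HOTELS["t6"], HOTELS["t3"]))  # t6 = (6, 1) against t3 = (9, 5)
print(skyline(HOTELS))
```

This quadratic scan is only meant to mirror the definition; it makes no attempt to be efficient.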
Without loss of generality, we use the MIN operation (smaller values are preferred) to evaluate skylines in this talk.
Subspace Skyline Query
In many applications, users may issue skyline queries on arbitrary subsets of the dimensions, e.g. price, distance to the beach, distance to the shopping center, etc.
Results of subspace skylines can be very different!
Objects of 4 dimensions:
      u1  u2  u3  u4
t1    3   4   2   5
t2    4   6   7   2
t3    9   7   5   6
t4    4   3   6   1
t5    2   2   3   1
t6    6   1   1   3
t7    1   3   4   1
[Figure: two plots of the objects, one showing the skyline in (u1, u3) and one showing the skyline in (u3, u4).]
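To see how much the result depends on the chosen subspace, the two plotted queries can be recomputed from the 4-dimensional table. This is an illustrative sketch with our own helper names (`dominates`, `subspace_skyline`); dimensions are passed as 0-based indices, so (0, 2) stands for (u1, u3).

```python
# Objects of 4 dimensions from the slides: name -> (u1, u2, u3, u4).
DATA = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6), "t4": (4, 3, 6, 1),
    "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3), "t7": (1, 3, 4, 1),
}

def dominates(a, b, dims):
    """a dominates b when only the dimensions in dims are considered."""
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)

def subspace_skyline(objects, dims):
    """Sorted names of the objects not dominated by any other object in dims."""
    return sorted(
        name for name, value in objects.items()
        if not any(dominates(other, value, dims) for other in objects.values())
    )

print(subspace_skyline(DATA, (0, 2)))  # subspace (u1, u3)
print(subspace_skyline(DATA, (2, 3)))  # subspace (u3, u4)
```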
Subspace Skyline
A d-dimensional space contains 2^d - 1 non-empty subspaces, and the subspaces that different users are interested in are unpredictable.
On-the-fly computation (computing from scratch upon each query) does not achieve a fast response time for an online system.
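The count of 2^d - 1 subspaces is easy to confirm by enumeration. A small sketch (the helper name `subspaces` is ours):

```python
from itertools import combinations

def subspaces(dims):
    """All non-empty subsets of dims, smallest first."""
    return [c for k in range(1, len(dims) + 1) for c in combinations(dims, k)]

dims = ("u1", "u2", "u3", "u4")
all_subspaces = subspaces(dims)
print(len(all_subspaces), 2 ** len(dims) - 1)
```

The number doubles with every additional dimension, which is why neither computing every query from scratch nor materializing every result is cheap.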
Subspace Skyline
A skycube (proposed by Yuan et al., VLDB 2005) is the collection of all subspace skyline results. To answer a subspace query, simply retrieve the skyline of the corresponding cuboid.
Complete skycube:
Cuboid            Skyline
u1                t7
u2                t6
u3                t6
u4                t5, t7, t4
u1, u2            t5, t6, t7, t9
u1, u3            t1, t5, t6, t7, t9
u1, u4            t7
u2, u3            t6
u2, u4            t5, t6
u3, u4            t5, t6
u1, u2, u3        t1, t5, t6, t7, t9
u1, u2, u4        t5, t6, t7
u1, u3, u4        t1, t5, t6, t7
u2, u3, u4        t5, t6
u1, u2, u3, u4    t1, t5, t6, t7
The cuboid w.r.t. all dimensions is called the full-space cuboid.
Objects in the full-space cuboid are called the full-space skyline objects, sky(D).
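A complete skycube can be materialized by running one skyline computation per subspace. The sketch below is a brute-force illustration, not the construction algorithm of Yuan et al.; it assumes the six objects that occur in the skycube table, with the values listed in the "Raw dataset" table of the querying slides, and the helper names are ours.

```python
from itertools import combinations

# name -> (u1, u2, u3, u4), values taken from the "Raw dataset" slide.
DATA = {
    "t1": (3, 4, 2, 5), "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1),
    "t6": (6, 1, 1, 3), "t7": (1, 3, 4, 1), "t9": (2, 2, 3, 7),
}

def dominates(a, b, dims):
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)

def skyline(objects, dims):
    return sorted(
        name for name, value in objects.items()
        if not any(dominates(other, value, dims) for other in objects.values())
    )

def skycube(objects, d):
    """Map every non-empty subspace (tuple of 0-based dims) to its skyline."""
    return {
        dims: skyline(objects, dims)
        for k in range(1, d + 1)
        for dims in combinations(range(d), k)
    }

cube = skycube(DATA, 4)
for dims, sky in cube.items():
    print(", ".join(f"u{i + 1}" for i in dims), "->", ", ".join(sky))
```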
A skycube can be viewed as a lattice of cuboids.
Skycube of 4 dimensions (lattice levels, from the full-space cuboid down to the single dimensions):
(u1, u2, u3, u4)
(u1, u2, u3)  (u1, u2, u4)  (u1, u3, u4)  (u2, u3, u4)
(u1, u2)  (u1, u3)  (u1, u4)  (u2, u3)  (u2, u4)  (u3, u4)
(u1)  (u2)  (u3)  (u4)
Motivations
In many subspace skyline applications, the data change constantly. For example, in an online hotel-booking system, room prices change with availability.
On-the-fly computation: low update cost, but slow query response time.
Complete skycube: fast query response time, but high update cost.
The skycube contains a huge number of duplicates.
For example, object t6 appears in 12 cuboids. Whenever t6 is updated, at least 12 cuboids have to be updated. In addition, all affected cuboids have to be recomputed to reflect the correct result. The complete skycube therefore both wastes storage and is difficult to maintain.
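The amount of duplication can be measured directly. The sketch below rebuilds the complete skycube by brute force (our own helper names; object values as in the "Raw dataset" table of the querying slides) and counts how often t6 is stored and how many object entries the skycube holds in total.

```python
from itertools import combinations

# name -> (u1, u2, u3, u4), values taken from the "Raw dataset" slide.
DATA = {
    "t1": (3, 4, 2, 5), "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1),
    "t6": (6, 1, 1, 3), "t7": (1, 3, 4, 1), "t9": (2, 2, 3, 7),
}

def dominates(a, b, dims):
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)

def skyline(objects, dims):
    return [
        name for name, value in objects.items()
        if not any(dominates(other, value, dims) for other in objects.values())
    ]

cube = {
    dims: skyline(DATA, dims)
    for k in range(1, 5)
    for dims in combinations(range(4), k)
}

t6_copies = sum("t6" in sky for sky in cube.values())
total_entries = sum(len(sky) for sky in cube.values())
print(t6_copies, total_entries)
```

For this example the count for t6 agrees with the 12 cuboids quoted on the slide.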
The Compressed Skycube
In the paper, the authors propose:
  A new compressed model for the skycube, which greatly reduces the storage.
  A new object-aware update scheme, which avoids unnecessary disk accesses and cuboid computations.
By taking advantage of the compact structure and the update scheme, the compressed skycube achieves both fast query response and efficient updates.
Minimum Subspace
DEFINITION: Given an object t, the minimum subspaces of t, denoted mss(t), are the subspaces that satisfy the following two conditions:
1. For any subspace U in mss(t), t is in the skyline of U;
2. For any subspace V ⊂ U, t is not in the skyline of V.
Minimum Subspace
Consider object t6 again: it appears in the skylines of 12 cuboids. The minimum subspaces of t6 are the cuboids u2 and u3.
[Figure: skycube lattice and complete skycube table. The cuboids that contain t6 are highlighted in blue; based on the definition of minimum subspaces, mss(t6) is highlighted in red.]
[Figure: skycube lattice highlighting the cuboids that contain t5.]
Minimum Subspaces
Similarly for all other skyline objects, we store the minimum subspaces of every skyline object in a table.
Object   Minimum subspaces
t1       (u1, u3)
t5       (u4), (u1, u2), (u1, u3)
t6       (u2), (u3)
t7       (u1), (u4)
t4       (u4)
t9       (u1, u2), (u1, u3)
For easier processing in the later sections, the full-space skyline objects (t1, t5, t6, t7) are placed at the front of the table.
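The minimum-subspace table follows from the definition: a cuboid U belongs to mss(t) when t is in sky(U) but in no skyline of a proper non-empty subset of U. The sketch below derives the table by brute force from a complete skycube; it is an illustration with our own helper names, not the paper's construction, and it uses 0-based dimension indices, so (0, 2) stands for (u1, u3).

```python
from itertools import combinations

# name -> (u1, u2, u3, u4), values taken from the "Raw dataset" slide.
DATA = {
    "t1": (3, 4, 2, 5), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t4": (4, 3, 6, 1), "t9": (2, 2, 3, 7),
}

def dominates(a, b, dims):
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)

def skyline(objects, dims):
    return [
        name for name, value in objects.items()
        if not any(dominates(other, value, dims) for other in objects.values())
    ]

def minimum_subspaces(objects, d):
    """Map each skyline object to the list of its minimum subspaces."""
    cube = {
        dims: skyline(objects, dims)
        for k in range(1, d + 1)
        for dims in combinations(range(d), k)
    }
    mss = {}
    for dims, sky in cube.items():
        for name in sky:
            # All proper, non-empty subsets of this cuboid.
            smaller = (s for k in range(1, len(dims)) for s in combinations(dims, k))
            if not any(name in cube[s] for s in smaller):
                mss.setdefault(name, []).append(dims)
    return mss

mss = minimum_subspaces(DATA, 4)
for name, subspaces in mss.items():
    print(name, subspaces)
```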
Compressed Skycube
The compressed skycube (CSC) consists of the non-empty cuboids such that an object t is stored in a cuboid iff the cuboid is a minimum subspace of t.
Cuboid     Skyline
u1         t7
u2         t6
u3         t6
u4         t5, t7, t4
u1, u2     t5, t9
u1, u3     t1, t5, t9
Compared with the complete skycube, the CSC contains far fewer duplicates: in this example it stores 11 object entries in 6 cuboids, instead of 39 entries in 15 cuboids.
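Given the minimum-subspace table, the compressed skycube is simply its inverse: each object is filed under every one of its minimum subspaces. A minimal sketch, with the table copied from the slides and a helper name (`build_csc`) of our own:

```python
# Minimum-subspace table from the slides: object -> list of minimum subspaces.
MSS = {
    "t1": [("u1", "u3")],
    "t5": [("u4",), ("u1", "u2"), ("u1", "u3")],
    "t6": [("u2",), ("u3",)],
    "t7": [("u1",), ("u4",)],
    "t4": [("u4",)],
    "t9": [("u1", "u2"), ("u1", "u3")],
}

def build_csc(mss):
    """Invert object -> minimum subspaces into cuboid -> stored objects."""
    csc = {}
    for name, subspaces in mss.items():
        for dims in subspaces:
            csc.setdefault(dims, []).append(name)
    return csc

csc = build_csc(MSS)
stored = sum(len(objs) for objs in csc.values())
for dims, objs in csc.items():
    print(", ".join(dims), "->", ", ".join(objs))
print(len(csc), "cuboids,", stored, "stored entries")
```

Cuboids that are nobody's minimum subspace never appear in the dictionary, which matches the "non-empty cuboids" wording of the definition.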
Querying the Compressed Skycube
Querying CSC: overview
Example query space: Uq = (u2, u3, u4).
To find the skyline of Uq, i.e. sky(Uq), we only need to:
LEMMA 1. Search within the cuboids of the compressed skycube that are subsets of Uq.
Raw dataset:
      u1  u2  u3  u4
t1    3   4   2   5
t5    2   2   3   1
t6    6   1   1   3
t7    1   3   4   1
t4    4   3   6   1
t9    2   2   3   7
Compressed skycube:
Cuboid     Skyline
u1         t7
u2         t6
u3         t6
u4         t5, t7, t4
u1, u2     t5, t9
u1, u3     t1, t5, t9
If an object is in the skyline of (u2, u3, u4), it must appear in some cuboid of the CSC that is a subset of (u2, u3, u4).
[Figure: skycube lattices highlighting the cuboids that contain t5 and the cuboids that contain t6.]
LEMMA 2. If an object t in a cuboid V (V is a subset of Uq) is not dominated in Uq by the other objects in the same cuboid, then t is a skyline object of Uq.
For Uq = (u2, u3, u4), the cuboids of the CSC that are subsets of Uq are u2 = {t6}, u3 = {t6} and u4 = {t5, t7, t4}.
No comparison is needed for t6.
t5, t7 and t4 are only compared locally with each other. Why?
[Figure: plot of the objects t5, t7 and t4 in u4 (against u3). Since they have the smallest value in u4, they are the skyline objects of u4.]
There are NO objects with a smaller value in u4. Otherwise, t5, t7 and t4 would not be in the skyline of u4.
Therefore, no other object can dominate t5, t7 or t4 in a superset of u4. E.g. if t4 is not dominated by t5 and t7 in a superset of u4, no other object can dominate t4 there.
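Read together, the two lemmas suggest a simple query procedure: visit only the CSC cuboids that are subsets of Uq, and within each cuboid keep the objects that are not dominated in Uq by their cuboid mates. The sketch below is our reading of the slides rather than the paper's algorithm; `query_csc` is a hypothetical name, and dimensions are 0-based indices, so (1, 2, 3) stands for (u2, u3, u4).

```python
# name -> (u1, u2, u3, u4), from the "Raw dataset" slide.
DATA = {
    "t1": (3, 4, 2, 5), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t4": (4, 3, 6, 1), "t9": (2, 2, 3, 7),
}

# Compressed skycube from the slides: cuboid -> stored objects.
CSC = {
    (0,): ["t7"], (1,): ["t6"], (2,): ["t6"], (3,): ["t5", "t7", "t4"],
    (0, 1): ["t5", "t9"], (0, 2): ["t1", "t5", "t9"],
}

def dominates(a, b, dims):
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)

def query_csc(csc, data, uq):
    """Skyline of the query space uq, computed from the CSC only."""
    result = set()
    for dims, objs in csc.items():
        if not set(dims) <= set(uq):   # Lemma 1: skip cuboids outside uq
            continue
        for name in objs:              # Lemma 2: compare within the cuboid only
            mates = (o for o in objs if o != name)
            if not any(dominates(data[o], data[name], uq) for o in mates):
                result.add(name)
    return sorted(result)

print(query_csc(CSC, DATA, (1, 2, 3)))     # Uq = (u2, u3, u4)
print(query_csc(CSC, DATA, (0, 1, 2, 3)))  # full space
```

For Uq = (u2, u3, u4), t5 dominates both t7 and t4 inside cuboid u4, so only t5 and t6 survive, which agrees with the complete skycube entry for that cuboid.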
The query system based on CSC
[Figure: CSC-based query system. Skyline queries go to the query buffer, which returns the query results. On a query miss the buffer asks the compressed skycube. The compressed skycube receives the object updates, accesses the disk when necessary, and invalidates results in the buffer.]
The system contains a query buffer, which stores the most frequently requested query results.
If the requested query results are not in the buffer, the query buffer issues a query-miss request to the CSC.
The CSC monitors the updates of objects.
Depending on the kind of object update, the CSC decides whether it needs to access the disk to retrieve objects that are not in the CSC. Disk access should be minimized.
Finally, if some cuboids are updated, results in the buffer may no longer be accurate. The CSC then invalidates the affected query results in the buffer.
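The interaction between the buffer and the CSC can be sketched as a small cache. Everything here is an assumption made for illustration: the class name `QueryBuffer`, the unbounded dictionary (the slides describe a buffer that keeps only the most frequently requested results), and in particular the invalidation rule, which drops every cached result whose query space contains an updated cuboid.

```python
class QueryBuffer:
    """Caches skyline results per query space; `compute` stands in for a CSC lookup."""

    def __init__(self, compute):
        self.compute = compute
        self.cache = {}
        self.misses = 0

    def query(self, uq):
        key = frozenset(uq)
        if key not in self.cache:   # query miss: forward the request to the CSC
            self.misses += 1
            self.cache[key] = self.compute(key)
        return self.cache[key]

    def invalidate(self, updated_cuboids):
        """Drop cached results whose query space contains an updated cuboid."""
        updated = [frozenset(c) for c in updated_cuboids]
        stale = [key for key in self.cache if any(c <= key for c in updated)]
        for key in stale:
            del self.cache[key]

# Demo with a dummy compute function instead of a real CSC query.
buf = QueryBuffer(lambda uq: sorted(uq))
buf.query({"u2", "u3"})
buf.query({"u2", "u3"})        # served from the buffer
buf.query({"u1"})
buf.invalidate([("u2",)])      # cuboid u2 changed
print(buf.misses, sorted(map(sorted, buf.cache)))
```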
Updating the Compressed Skycube
Updating CSC
Intuitions:
  Not all object updates need to access the disk.
  Not all object updates need to re-compute the skyline of a cuboid.
These intuitions are supported by the theorems in the paper.
Notation: D is the full space; sky(D) is the full-space skyline; t is the object before the update; tnew is the object after the update.
Cases:
  t ∉ sky(D), with tnew ∉ sky(D) or tnew ∈ sky(D): no dataset (disk) access.
  t ∈ sky(D): may access the dataset (disk) to insert new skyline objects.
Considering the proportion of full-space skyline objects in the whole dataset, the t ∉ sky(D) cases cover most updates.
Updating CSC
Case t ∉ sky(D) and tnew ∉ sky(D). Key points:
  The existing objects in the CSC are NOT affected.
  There is no need to retrieve objects that are NOT in the CSC (no disk access).
[Figure: an example dataset in a 2D space (ua, ub) with the objects ta, tb and t. In this case, ta is the full-space skyline object.]
In the compressed skycube (CSC), tb is in the skyline of ua because tb and ta have the same value on dimension ua.
If t ∉ sky(D) and tnew ∉ sky(D), tnew falls within the area dominated by the full-space skyline, so it does NOT affect the existing skyline objects. As a result, the objects in the CSC are NOT affected.
Only when tnew has the same value as a skyline object in some subspace (e.g. ua or ub) does tnew become a skyline object of the corresponding cuboid. In this case, tnew is added into the CSC.
Two-step approach to update the CSC:
  1. Compare tnew with the existing full-space skyline objects, sky(D).
  2. Determine the minimum subspaces of tnew. They can be determined from ANY dominating object in sky(D). (Why?)
[Figure: another 2D example (ua, ub), with ta and tb as the full-space skyline objects.]
tnew is in the skyline of a subspace (e.g. ub) only when tnew has the same value as a full-space skyline object in that subspace (e.g. the same value as ta in ub). That is, tnew lies on the red lines.
If tnew is dominated by tb, the minimum subspaces of tnew must be among the minimum subspaces of tb. E.g. if tnew is in the skyline of ua, tb is also in the skyline of ua, since tb is a full-space skyline object and dominates tnew.
Similarly, if tnew is in the skyline of ub, ta is also in the skyline of ub, because ta is a full-space skyline object and dominates tnew.
Therefore, to determine the minimum subspaces of tnew, we only need to consider the minimum subspaces of any one full-space skyline object that dominates tnew.
Example: object t9 is updated.
Objects in the compressed skycube (t1, t5, t6 and t7 are the full-space skyline objects; t9 is the updated object):
      u1  u2  u3  u4
t1    3   4   2   5
t5    2   2   3   1
t6    6   1   1   3
t7    1   3   4   1
t4    4   3   6   1
t9    2   2   3   7
Object t9 is not a full-space skyline object.
We first compare t9 with the existing full-space skyline objects.
The full-space skyline object t1 does NOT dominate t9, so we continue with the next full-space skyline object.
The full-space skyline object t5 dominates t9, so we retrieve the minimum subspaces of t5. The retrieved subspaces (u4), (u1, u2) and (u1, u3) are the candidates for the minimum subspaces of t9.
The minimum subspaces of t9 are (u1, u2) and (u1, u3): t9 has the same values as t5 in these subspaces, whereas in u4 it is worse than t5 (7 against 1).
Finally, we update the compressed skycube and insert t9 into the corresponding cuboids (u1, u2) and (u1, u3).
We do not need to retrieve any object that is NOT in the compressed skycube during the whole update process, i.e. there is no disk access in this case.
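The two steps for a dominated tnew can be sketched as follows. The filtering rule in step 2 (keep a candidate subspace only if tnew has exactly the dominating object's values there) is our reading of the "lies on the red lines" argument, not a quotation of the paper's algorithm. The function name `mss_for_dominated` is hypothetical, dimensions are 0-based indices, and removing the old version of the object from the CSC is left out.

```python
# name -> (u1, u2, u3, u4) for the objects already in the CSC (t9 is the update).
DATA = {
    "t1": (3, 4, 2, 5), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t4": (4, 3, 6, 1),
}
SKY_D = ["t1", "t5", "t6", "t7"]   # full-space skyline objects, in table order

# Minimum subspaces of the full-space skyline objects (from the slides).
MSS = {
    "t1": [(0, 2)], "t5": [(3,), (0, 1), (0, 2)],
    "t6": [(1,), (2,)], "t7": [(0,), (3,)],
}

def dominates(a, b):
    """Full-space dominance."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def mss_for_dominated(t_new, sky_d, data, mss):
    """Minimum subspaces of t_new, or None if no object in sky(D) dominates it."""
    for name in sky_d:                      # step 1: look for any dominating object
        s = data[name]
        if dominates(s, t_new):
            # Step 2: candidates are mss(s); keep those where t_new ties with s.
            return [dims for dims in mss[name]
                    if all(t_new[i] == s[i] for i in dims)]
    return None

t9 = (2, 2, 3, 7)
print(mss_for_dominated(t9, SKY_D, DATA, MSS))
```

Only objects already held in the CSC are read, which is the "no disk access" property of this case.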
Updating CSC
Case t ∉ sky(D) and tnew ∈ sky(D). Key points:
  The objects that were dominated by skyline objects before the update are still dominated by skyline objects after the update.
  Existing skyline objects may be dominated by tnew.
[Figure: a 2D example (ua, ub) with ta and tb as the full-space skyline objects, the object t, and the objects outside the CSC shown in purple.]
Since tnew is a full-space skyline object, it must fall in the red area.
The purple objects, which are not in the compressed skycube, will not become skyline objects of any subspace after the update of t (they are still dominated by some skyline objects). In other words, there is no need to retrieve these objects from disk: no disk access in this case.
If tnew moves to a position where it dominates tb, tb is removed from the cuboid of full-space skyline objects, and the minimum subspaces of tb need to be updated.
Updating CSC t sky(D) and tnew sky(D) Key points
The objects that are previously dominated by skylines are still dominated by the skylines after update.
Existing skylines may be dominated by tnew.
u1 u2 u3 u4
t1 3 4 2 5
t5 2 2 3 1
t6 6 1 1 3
t7 1 3 4 1
t4 4 3 6 1
t9 2 2 3 7
Objects in the Compressed Skycube
t10 1 3 1 3
Object t10 is updated. It was not a full-space skyline.
Updating CSC t sky(D) and tnew sky(D) Key points
The objects that are previously dominated by skylines are still dominated by the skylines after update.
Existing skylines may be dominated by tnew.
u1 u2 u3 u4
t1 3 4 2 5
t5 2 2 3 1
t6 6 1 1 3
t7 1 3 4 1
t4 4 3 6 1
t9 2 2 3 7
Objects in the Compressed Skycube
t10 1 3 1 3
t1 u1, u3
t5 u4, u1, u2, u1, u3
t6 u2, u3
t7 u1, u4
t4 u4
t9 u1,u2 u1,u3
Minimum subspace We first compare t1 with t10.Since object t10 dominates t1 in the full-space, t1 is no longer the skyline of <u1,u3>. The minimum subspace <u1,u3> of t1 is removed.
Object t10 is updated. It was not a full-space skyline.
Updating CSC t sky(D) and tnew sky(D) Key points
The objects that are previously dominated by skylines are still dominated by the skylines after update.
Existing skylines may be dominated by tnew.
u1 u2 u3 u4
t1 3 4 2 5
t5 2 2 3 1
t6 6 1 1 3
t7 1 3 4 1
t4 4 3 6 1
t9 2 2 3 7
Objects in the Compressed Skycube
t10 1 3 1 3
We then compare t5 with t10.Since object t10 dominates t5 in <u1,u3> only, we need to update the minimum subspaces of t5.
Object t10 is updated. It was not a full-space skyline.
t1 u1, u3
t5 u4, u1, u2, u1, u3
t6 u2, u3
t7 u1, u4
t4 u4
t9 u1,u2 u1,u3
Minimum subspace
Updating CSC t sky(D) and tnew sky(D) Key points
The objects that are previously dominated by skylines are still dominated by the skylines after update.
Existing skylines may be dominated by tnew.
u1 u2 u3 u4
t1 3 4 2 5
t5 2 2 3 1
t6 6 1 1 3
t7 1 3 4 1
t4 4 3 6 1
t9 2 2 3 7
Objects in the Compressed Skycube
t10 1 3 1 3
We then compare t5 with t10.Since object t10 dominates t5 in <u1,u3> only, we need to update the minimum subspaces of t5.
Object t10 is updated. It was not a full-space skyline.
u1 u2 u3 u4
u1 u2 u3 u1 u2 u4 u1 u3 u4 u2 u3 u4
u1 u2 u3 u4
u1 u2 u1 u3 u1 u4 u2 u3 u2 u4 u3 u4
Cuboids that contain object t5
Remove <u1,u3> as t5 is no longer skyline of this subspace.
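Update the minimum subspaces of t5. Every cuboid above <u1,u3> that still contains t5 is already covered by <u4> or <u1,u2>, so no new minimum subspace is needed.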
Similarly, for the other objects, we update the skylines that are dominated by tnew. Then we find the minimum subspaces of tnew. The paper describes an algorithm to deduce the minimum subspaces of tnew from the previous skylines.
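The paper's incremental algorithm is not reproduced on these slides. As a cross-check of the walkthrough only, the minimum subspaces can be recomputed by brute force over all 15 subspaces, before and after t10 takes its new value. This sketch assumes smaller values are better and uses only the six objects stored in the CSC plus t10 (objects outside the CSC are in no subspace skyline, so leaving them out does not change any skyline). All function names are hypothetical.

```python
from itertools import combinations

DIMS = (0, 1, 2, 3)  # 0-based indices of u1, u2, u3, u4

# Objects stored in the CSC before the update (values from the slide).
BEFORE = {
    "t1": (3, 4, 2, 5),
    "t5": (2, 2, 3, 1),
    "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1),
    "t4": (4, 3, 6, 1),
    "t9": (2, 2, 3, 7),
}
AFTER = {**BEFORE, "t10": (1, 3, 1, 3)}  # t10 with its new value


def dominates(a, b, subspace):
    """a is no worse than b everywhere in the subspace, strictly better somewhere."""
    return (all(a[d] <= b[d] for d in subspace)
            and any(a[d] < b[d] for d in subspace))


def skyline(objects, subspace):
    """Names of the objects that no other object dominates in the subspace."""
    return {
        name for name, value in objects.items()
        if not any(dominates(other, value, subspace) for other in objects.values())
    }


def minimum_subspaces(objects):
    """Brute force: visit subspaces from small to large and record a subspace
    for an object only if no already-recorded subspace is contained in it."""
    subspaces = [c for k in range(1, len(DIMS) + 1) for c in combinations(DIMS, k)]
    mss = {name: [] for name in objects}
    for sub in subspaces:
        for name in skyline(objects, sub):
            if not any(set(m) <= set(sub) for m in mss[name]):
                mss[name].append(sub)
    return mss


mss_before = minimum_subspaces(BEFORE)
mss_after = minimum_subspaces(AFTER)
```

On the slide's data this gives <u1> and <u3> for t10, leaves t5 with <u4> and <u1,u2>, and leaves t1 with no minimum subspace, which agrees with the walkthrough. The cost is exponential in the number of dimensions, which is exactly what an incremental scheme is meant to avoid.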
Minimum subspaces of t10: <u1> and <u3>.
Compressed Skycube
Cuboid     Skyline
<u1>       t7, t10
<u2>       t6
<u3>       t6, t10
<u4>       t5, t7, t4
<u1,u2>    t5, t9
<u1,u3>    t1, t5, t9 (all dominated by t10 in <u1,u3>, so removed)
Finally, we update the compressed skycube: insert object t10 into the corresponding cuboids and remove the dominated objects.
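The final step can be sketched in Python as follows. This is a simplified illustration, not the paper's update algorithm: it only drops the objects that tnew dominates in their own cuboid and inserts tnew into its minimum subspaces (<u1> and <u3>, as given on the slide). It does not move a removed object to a larger cuboid, which this example does not need. The names `apply_insert` and `dominates` are hypothetical, and smaller values are assumed to be better.

```python
# Object values from the slide (0-based dimension indices for u1..u4).
VALUES = {
    "t1": (3, 4, 2, 5),
    "t5": (2, 2, 3, 1),
    "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1),
    "t4": (4, 3, 6, 1),
    "t9": (2, 2, 3, 7),
    "t10": (1, 3, 1, 3),  # tnew
}

# Compressed skycube before the update: cuboid -> objects stored there.
CSC_BEFORE = {
    (0,): {"t7"},
    (1,): {"t6"},
    (2,): {"t6"},
    (3,): {"t4", "t5", "t7"},
    (0, 1): {"t5", "t9"},
    (0, 2): {"t1", "t5", "t9"},
}


def dominates(a, b, subspace):
    """a is no worse than b everywhere in the subspace, strictly better somewhere."""
    return (all(a[d] <= b[d] for d in subspace)
            and any(a[d] < b[d] for d in subspace))


def apply_insert(csc, values, new_name, new_mss):
    """Drop objects dominated by the new object in their cuboid, discard
    cuboids that become empty, then add the new object to its minimum subspaces."""
    new_value = values[new_name]
    updated = {}
    for cuboid, members in csc.items():
        survivors = {m for m in members
                     if not dominates(new_value, values[m], cuboid)}
        if survivors:
            updated[cuboid] = survivors
    for cuboid in new_mss:
        updated.setdefault(cuboid, set()).add(new_name)
    return updated


csc_after = apply_insert(CSC_BEFORE, VALUES, "t10", [(0,), (2,)])
```

On this data, t10 joins <u1> and <u3>, and the cuboid <u1,u3> disappears because t1, t5 and t9 are all dominated by t10 there.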
Experimental Evaluation
Storage Comparison
Settings: Dimensionality (Full-space) – [4, 8]; default = 6. Cardinality – [100K, 500K]; default = 300K. Distribution: Independent, Corr, Anti-Corr.
Because the CSC structure contains fewer duplicates, CSC is less affected by cardinality than the Skycube.
A logarithmic scale is used to reflect the exponential effect of the dimensionality. CSC is better than the Skycube by up to an order of magnitude.
Query Performance
Queries on the complete skycube do not involve computation, so their time is not reported.
This set of experiments verifies that the query response of the CSC is indeed very fast.
Update Performance
General update
Updates are from random objects in the whole dataset.
Skycube is re-computed from scratch.
CSC outperforms the Skycube by several orders of magnitude. This is because the update scheme updates CSC incrementally and avoids many unnecessary computations when an object's update does not affect the CSC structure.
Full-space skyline update
Updates are from random full-space skyline objects.
For a fair comparison, the Skycube is re-computed from the existing skylines plus new candidates.
Conclusion
In the paper, the authors
addressed the update support of the skycube in a dynamic environment, and provided an efficient and scalable solution for an online skyline query system.
proposed a compact structure, the Compressed Skycube (CSC), which takes about 10% of the disk space of the complete Skycube and gives fast query response.
proposed an object-aware update scheme, such that different updates trigger different amounts of computation. The Compressed Skycube outperforms the Skycube in update by several orders of magnitude.
Thank you!
Tian Xia and Donghui Zhang. Refreshing the Sky: the Compressed Skycube with Efficient Support for Frequent Updates. SIGMOD 2006.
Our Motivations (2)
Objects
      u1  u2  u3  u4
t1     3   4   2   5
t2     4   6   7   2
t3     9   7   5   6
t4     4   3   6   1
t5     2   2   3   1
t6     6   1   1   3
t7     1   3   4   1
t8     6   5   3   8
t9     2   2   3   7
Full-space skyline objects: t1, t5, t6, t7
Other skyline objects (not in full-space): t4, t9
Corresponding Skycube
Cuboid             Skyline
<u1>               t7
<u2>               t6
<u3>               t6
<u4>               t5, t7, t4
<u1,u2>            t5, t6, t7, t9
<u1,u3>            t1, t5, t6, t7, t9
<u1,u4>            t7
<u2,u3>            t6
<u2,u4>            t5, t6
<u3,u4>            t5, t6
<u1,u2,u3>         t1, t5, t6, t7, t9
<u1,u2,u4>         t5, t6, t7
<u1,u3,u4>         t1, t5, t6, t7
<u2,u3,u4>         t5, t6
<u1,u2,u3,u4>      t1, t5, t6, t7
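For reference, the complete skycube of this nine-object example can be computed by brute force. The Python sketch below is only an illustration of the definition (one skyline per non-empty subspace), not an efficient skycube algorithm. It assumes smaller values are better, and the function names are hypothetical.

```python
from itertools import combinations

DIMS = (0, 1, 2, 3)  # 0-based indices of u1, u2, u3, u4

# The nine example objects.
DATA = {
    "t1": (3, 4, 2, 5),
    "t2": (4, 6, 7, 2),
    "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1),
    "t5": (2, 2, 3, 1),
    "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1),
    "t8": (6, 5, 3, 8),
    "t9": (2, 2, 3, 7),
}


def dominates(a, b, subspace):
    """a is no worse than b everywhere in the subspace, strictly better somewhere."""
    return (all(a[d] <= b[d] for d in subspace)
            and any(a[d] < b[d] for d in subspace))


def skyline(objects, subspace):
    """Names of the objects that no other object dominates in the subspace."""
    return {
        name for name, value in objects.items()
        if not any(dominates(other, value, subspace) for other in objects.values())
    }


def skycube(objects):
    """One skyline per non-empty subspace (15 cuboids for 4 dimensions)."""
    return {
        sub: skyline(objects, sub)
        for k in range(1, len(DIMS) + 1)
        for sub in combinations(DIMS, k)
    }


cube = skycube(DATA)
stored_entries = sum(len(members) for members in cube.values())
```

On this data the 15 cuboids hold 39 object entries in total, because objects such as t5 and t6 are repeated in many cuboids. This repetition is what the compressed structure removes.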
Querying CSC
LEMMA 1: Given a query space Uq and an object t, if Ui ⊈ Uq for every subspace Ui in mss(t), then t is not in the skyline of Uq.
Lemma 1 implies two important facts:
1) Only the existing cuboids that are subsets of Uq need to be searched.
2) No other cuboids need to be accessed or computed in the query process.
Example: Uq = <u2,u3,u4>. Neither minimum subspace of t9 (<u1,u2> and <u1,u3>) is a subset of Uq, so t9 can be safely pruned.
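Lemma 1 suggests a simple query procedure: collect the objects stored in the cuboids that are subsets of Uq, then compute the skyline of those candidates in Uq. The Python sketch below follows that reading on the example CSC. It is an illustration under the assumption that smaller values are better, and the function names `candidates` and `query` are hypothetical.

```python
# Values of the objects stored in the example CSC (0-based indices for u1..u4).
VALUES = {
    "t1": (3, 4, 2, 5),
    "t5": (2, 2, 3, 1),
    "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1),
    "t4": (4, 3, 6, 1),
    "t9": (2, 2, 3, 7),
}

# Example compressed skycube: cuboid -> objects stored there.
CSC = {
    (0,): {"t7"},
    (1,): {"t6"},
    (2,): {"t6"},
    (3,): {"t4", "t5", "t7"},
    (0, 1): {"t5", "t9"},
    (0, 2): {"t1", "t5", "t9"},
}


def dominates(a, b, subspace):
    """a is no worse than b everywhere in the subspace, strictly better somewhere."""
    return (all(a[d] <= b[d] for d in subspace)
            and any(a[d] < b[d] for d in subspace))


def candidates(csc, uq):
    """Union of the cuboids that are subsets of the query space (Lemma 1)."""
    found = set()
    for cuboid, members in csc.items():
        if set(cuboid) <= set(uq):
            found |= members
    return found


def query(csc, values, uq):
    """Skyline of the candidates, evaluated in the query space."""
    cand = candidates(csc, uq)
    return {
        c for c in cand
        if not any(dominates(values[o], values[c], uq) for o in cand)
    }


UQ = (1, 2, 3)  # <u2,u3,u4>
cand_uq = candidates(CSC, UQ)
result_uq = query(CSC, VALUES, UQ)
```

For Uq = <u2,u3,u4> only the cuboids <u2>, <u3> and <u4> are read, so t1 and t9 are never touched. The candidates are t4, t5, t6 and t7, and the skyline among them is t5 and t6.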
Objects in the Compressed Skycube
      u1  u2  u3  u4
t4     4   3   6   1
t9     2   2   3   7
t7     1   3   4   1
t1     3   4   2   5
t5     2   2   3   1
t6     6   1   1   3
Compressed Skycube
Cuboid     Skyline
<u1>       t7
<u2>       t6
<u3>       t6
<u4>       t5, t7, t4
<u1,u2>    t5, t9
<u1,u3>    t1, t5, t9
Some properties of CSC
The number of non-empty cuboids is solely decided by sky(D).
In other words, there does not exist a cuboid which only contains objects not in sky(D).
Each non-empty cuboid in CSC contains at least one object in sky(D).
Therefore, as long as the full-space skyline is unchanged, no new cuboid will be added to CSC.
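The second property can be checked directly on the example CSC. A small Python sketch, with the cuboid contents and the full-space skyline {t1, t5, t6, t7} hard-coded from the example; the variable names are illustrative only:

```python
# Example compressed skycube: cuboid (0-based indices of u1..u4) -> stored objects.
CSC = {
    (0,): {"t7"},
    (1,): {"t6"},
    (2,): {"t6"},
    (3,): {"t4", "t5", "t7"},
    (0, 1): {"t5", "t9"},
    (0, 2): {"t1", "t5", "t9"},
}

# Full-space skyline sky(D) of the example dataset.
FULL_SPACE_SKYLINE = {"t1", "t5", "t6", "t7"}

# Every non-empty cuboid should share at least one object with sky(D).
property_holds = all(members & FULL_SPACE_SKYLINE for members in CSC.values())

# Cuboids made up only of objects outside sky(D); expected to be empty.
violating_cuboids = [
    cuboid for cuboid, members in CSC.items()
    if not (members & FULL_SPACE_SKYLINE)
]
```

Here t4 and t9 are stored in the CSC without being full-space skylines, but each shares its cuboid with at least one full-space skyline object, so the check passes.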