pirs: query verification on data streams ke yi, hong kong university of science and technology ...
Post on 18-Jan-2016
PIRS: Query Verification on Data Streams
Ke Yi, Hong Kong University of Science and Technology
Feifei Li, Florida State University
Marios Hadjieleftheriou, AT&T Labs
George Kollios, Boston University
Divesh Srivastava, AT&T Labs
Work done while the first and second authors were working at AT&T Labs.
Publishing Data and Outsourcing Query Service
[Figure: an IP traffic stream (0 1 1 0 0 1 … 1 1 0 …) coming from the network is analyzed by Gigascope, an analysis tool, which produces statistics and results.]
Revisiting the CISCO – AT&T Example
[Figure: the IP traffic stream (0 1 1 0 0 1 … 1 1 0 …) from the network is analyzed by Gigascope, which produces statistics.]
Lawyers: sign the trust agreement.
Could we help? (computer scientists)
Concrete Example
Continuous Query:
SELECT SUM(packet_size) FROM IP_trace
GROUP BY srcIP, destIP
Answer:
Time | Group 1 | Group 2 | Group 3 | … | Group n
5    | 10KB    | 2KB     | 150KB   | … | 5KB
10   | 11KB    | 130KB   | 1MB     | … | 20KB
13   | …
IP Stream: p1, p2, p3, …, pm, where each p = (srcIP, destIP, packet_size)
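As a minimal sketch of what maintaining the exact answer involves, the Python below keeps one running sum per (srcIP, destIP) group. The function name `exact_group_sums` and the sample packets are illustrative assumptions, not part of the talk.

```python
from collections import defaultdict

def exact_group_sums(packets):
    """Return SUM(packet_size) grouped by (srcIP, destIP)."""
    sums = defaultdict(int)
    for src_ip, dest_ip, packet_size in packets:
        sums[(src_ip, dest_ip)] += packet_size
    return dict(sums)

# Hypothetical packets: (srcIP, destIP, packet_size in bytes).
sample_packets = [
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.3", "10.0.0.4", 400),
    ("10.0.0.1", "10.0.0.2", 600),
]
answer = exact_group_sums(sample_packets)
```

The state grows with the number of distinct groups, which is the cost a small client-side synopsis is meant to avoid.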
Continuous Query Verification (CQV) on Data Streams
1. Client registers a query.
2. Server reports the answer upon request.
Server maintains the exact answer (Group 1, Group 2, Group 3, …).
Client maintains a synopsis X.
Both client and server monitor the same stream from the source of streams.
Query: SELECT SUM(packet_size) FROM IP_Trace GROUP BY src_ip, dest_ip
The Model for the Stream
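Stream S: 9|1, 7|i, 1|1, … (each tuple is agg_attribute | group_id)
Vector $V^T = (v_1, v_2, v_3, \ldots, v_i, \ldots, v_n)$, with $\sum_{i=1}^{n} v_i \le m$
T=0: $V^0 = (0, 0, 0, \ldots, 0)$
T=1: $v_1 = 9$
T=2: $v_i = 7$
T=3: $v_1 = 10$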
Continuous Query Verification: CQV
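Stream S: 9|1, 7|i, 1|1, … (T=1, T=2, T=3)
Server: updates V on each tuple and maintains the exact answer $V^T$; after T=3, $v_1 = 10$ and $v_i = 7$.
Client: updates X on each tuple and maintains the synopsis $X^T$.
[Figure: a reported answer that differs from $V^T$ (for example, $v_i = 5$ instead of 7) raises an alarm; a reported answer equal to $V^T$ raises no alarm.]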
PIRS: Polynomial Identity Random Synopsis
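choose a prime p: $\max\{m/\delta, n\} < p \le 2\max\{m/\delta, n\}$
choose a random number: $a \in \mathbb{Z}_p$
$X(V^T) = (a-1)^{v_1} (a-2)^{v_2} \cdots (a-n)^{v_n} \bmod p$
A: is $X(V^T)$ equal to $X$ of the reported answer?
raise alarm if not equal
otherwise no alarm
Decomposability: $X(V_a + V_b) = X(V_a) \cdot X(V_b)$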
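The definition can be sketched in a few lines of Python, reading the synopsis as X(V) = (a-1)^{v_1} (a-2)^{v_2} ... (a-n)^{v_n} mod p. The helper names (`is_prime`, `choose_prime`, `pirs`), the integer parameter `inv_delta` standing for 1/δ, and the toy vectors are assumptions made for this illustration, not the authors' library.

```python
import random

def is_prime(x):
    """Trial division; adequate for the small parameters in this sketch."""
    if x < 2:
        return False
    d = 2
    while d * d <= x:
        if x % d == 0:
            return False
        d += 1
    return True

def choose_prime(m, inv_delta, n):
    """Smallest prime above max(m/delta, n), with delta = 1/inv_delta."""
    p = max(m * inv_delta, n) + 1
    while not is_prime(p):
        p += 1
    return p

def pirs(v, a, p):
    """X(V): product over groups i = 1..n of (a - i)^{v_i}, reduced mod p."""
    x = 1
    for i, v_i in enumerate(v, start=1):
        x = (x * pow(a - i, v_i, p)) % p
    return x

m, inv_delta, n = 100, 1000, 4          # toy parameters: delta = 1/1000
p = choose_prime(m, inv_delta, n)
a = random.randrange(p)                 # the client's secret random number
v = [10, 0, 0, 7]                       # true vector
w = [10, 0, 2, 5]                       # a wrong reported answer
alarm_on_correct = pirs(v, a, p) != pirs(v, a, p)   # never raises an alarm
alarm_on_wrong = pirs(w, a, p) != pirs(v, a, p)     # alarm unless a is an unlucky choice

# Decomposability: the synopsis of a sum is the product of the synopses.
va, vb = [9, 0, 0, 0], [1, 0, 0, 7]
decomposable = pirs(v, a, p) == (pirs(va, a, p) * pirs(vb, a, p)) % p
```

Only `p`, `a`, and the current value of X need to be stored on the client; the vectors above exist here only to drive the example.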
Incremental Update to PIRS
Stream S: 9|1, 7|i, 1|1, …
T=1, update to $v_1$: $X_1 = (a-1)^9$
T=2, update to $v_i$: $X_2 = X_1 \cdot (a-i)^7$
T=3, update to $v_1$: $X_3 = X_2 \cdot (a-1)^1$
An update to group i with value u can be done in O(log u) time (exponentiation by squaring): $X_T = X_{T-1} \cdot (a-i)^u$
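A hedged sketch of the incremental update: each tuple u|i multiplies the stored value by (a - i)^u mod p. The class name `PIRS`, the concrete values of `p` and `a`, and the choice i = 4 for the unnamed group are assumptions for illustration. Python's built-in three-argument `pow` performs the modular exponentiation.

```python
class PIRS:
    """Keeps only p, a and the running synopsis value x."""

    def __init__(self, a, p):
        self.a, self.p = a, p
        self.x = 1          # synopsis of the all-zero vector

    def update(self, group_id, u):
        """Process tuple u|group_id: x <- x * (a - group_id)^u mod p."""
        self.x = (self.x * pow(self.a - group_id, u, self.p)) % self.p

p, a = 1_000_003, 123_456               # illustrative prime and seed
synopsis = PIRS(a, p)
for u, group_id in [(9, 1), (7, 4), (1, 1)]:   # the stream 9|1, 7|i, 1|1 with i = 4
    synopsis.update(group_id, u)

# Same value computed directly from the final vector (v_1 = 10, v_4 = 7).
expected = (pow(a - 1, 10, p) * pow(a - 4, 7, p)) % p
```

The incremental value and the direct computation agree because exponents of the same base add up across updates.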
It Solves CQV problem!
Theorem: Given any $V^T \ne W^T$, PIRS raises an alarm with probability at least 1-δ.
1. If V = W, it obviously raises no alarm.
2. If V ≠ W, define
   $f_v(x) = (x-1)^{v_1} (x-2)^{v_2} \cdots (x-n)^{v_n}$, $f_w(x) = (x-1)^{w_1} (x-2)^{w_2} \cdots (x-n)^{w_n}$
   V = W iff $f_v(x) \equiv f_w(x)$: a polynomial with 1 as the leading coefficient is completely determined by its zeroes, due to the fundamental theorem of algebra.
   If V ≠ W, $f_v(x) = f_w(x)$ happens at no more than m values of x.
   Since we have p > m/δ choices for a, the probability that X(V) = X(W) is at most δ.
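The counting argument can be checked by brute force on a toy instance: for two different vectors, enumerate every a in Z_p and collect those where the two synopses collide. The prime `p = 101` and the vectors `v` and `w` are arbitrary choices for this sketch.

```python
def pirs(v, a, p):
    """X(V): product over groups i = 1..n of (a - i)^{v_i}, reduced mod p."""
    x = 1
    for i, v_i in enumerate(v, start=1):
        x = (x * pow(a - i, v_i, p)) % p
    return x

p = 101                      # small prime, larger than n = 4
v = [3, 0, 2, 1]             # entries sum to 6
w = [1, 2, 2, 1]             # entries sum to 6, differs from v
m = max(sum(v), sum(w))

# Values of a for which a wrong answer w would go undetected.
collisions = [a for a in range(p) if pirs(v, a, p) == pirs(w, a, p)]
undetected_fraction = len(collisions) / p
```

With these vectors the number of colliding values stays within the bound m, so a uniformly random a misses the error with probability at most m/p.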
Optimality of PIRS
Theorem: PIRS occupies O(log(m/δ) + log n) bits of space (at most 3 words: p, a, X(V)), and spends O(1) time to process a tuple for a count query, or O(log u) time to process a tuple for a sum query.
Theorem: Any synopsis for solving the CQV problem with error probability at most δ has to keep Ω(log(min{n, m}/δ)) bits.
Multiple Queries
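[Figure: instead of one synopsis per query (X1 for Q1, X2 for Q2), a single synopsis X covers both Q1 and Q2.]
Stream S: 9|1,8 … (one tuple causes an update to v1 and an update to v8)
The vectors V1..n1 and V1..n2 are combined into a single vector V1..(n1+n2).
Theorem: our synopses use constant space for multiple queries.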
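One way to picture a single synopsis serving two queries is to lay their group vectors end to end. In this sketch, groups of Q1 keep indices 1..n1 and groups of Q2 are shifted to n1+1..n1+n2; that offset convention, the helper `update`, and all concrete numbers are assumptions for illustration.

```python
def update(x, a, p, index, u):
    """Multiply the synopsis by (a - index)^u mod p."""
    return (x * pow(a - index, u, p)) % p

n1, n2 = 5, 10                   # assumed group counts of Q1 and Q2
p, a = 1_000_003, 424_242        # illustrative prime (larger than n1 + n2) and seed
x = 1                            # one synopsis shared by both queries

# A tuple with value 9 that falls in group 1 of Q1 and group 8 of Q2.
value, g1, g2 = 9, 1, 8
x = update(x, a, p, g1, value)           # Q1 occupies indices 1..n1
x = update(x, a, p, n1 + g2, value)      # Q2 occupies indices n1+1..n1+n2

expected = (pow(a - 1, 9, p) * pow(a - 13, 9, p)) % p
```

The stored state is still just `p`, `a`, and `x`, regardless of how many queries are concatenated.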
Handle the Load Shedding
Semantic load shedding: drop tuples from certain groups; a small number of groups have errors.
Random load shedding: all groups have a small amount of error.
CQV with Semantic Load Shedding
Stream S: 9|1, 7|i, 2|j, 1|1, 4|k, …, 5|1
Randomly drop certain tuples according to groups.
Server claims that at most γ groups have errors.
Goal: detect whether more than γ groups have errors.
We have designed synopses that use O(γ log(1/δ) log n) bits of space and achieve error probability at most δ.
PIRSγ: An Exact Solution
$k = c_1 \gamma^2$, for $c_1 \ge 4.819$
b: a pair-wise independent hash function that maps {1, …, n} uniformly to {1, …, k}, e.g., $b(i) = ((x i + y) \bmod p) \bmod k$
One layer: k buckets, each holding a PIRS. An update to v8 with b(8)=2 goes to the PIRS in bucket 2. The layer raises an alarm if at least γ buckets raise alarms.
log 1/δ layers: raise an alarm if at least one layer raises an alarm.
PIRSγ: An Exact Solution
Theorem: PIRSγ requires O(γ^2 log(1/δ) log n) bits, spends O(log(1/δ)) time to process a tuple, and solves CQV with semantic load shedding.
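The layered construction can be sketched as follows: every layer hashes each group into one of k buckets, every bucket keeps its own PIRS value, and an alarm is raised when some layer has at least γ mismatching buckets. The class `PIRSGamma`, the hash form ((x·i + y) mod q) mod k, and the values chosen for `k`, `layers`, `p`, and `q` are assumptions for this illustration, not the authors' implementation.

```python
import random

class PIRSGamma:
    """Layers of k buckets; each bucket holds an independent PIRS value."""

    def __init__(self, gamma, k, layers, p, q, rng):
        self.gamma, self.k, self.p, self.q = gamma, k, p, q
        # Per layer: hash coefficients (x, y) and one PIRS seed per bucket.
        self.hashes = [(rng.randrange(1, q), rng.randrange(q)) for _ in range(layers)]
        self.seeds = [[rng.randrange(p) for _ in range(k)] for _ in range(layers)]
        self.state = self._empty()

    def _empty(self):
        return [[1] * self.k for _ in self.seeds]

    def _bucket(self, layer, i):
        x, y = self.hashes[layer]
        return ((x * i + y) % self.q) % self.k      # 0-based bucket index

    def _apply(self, state, i, u):
        for layer, row in enumerate(state):
            b = self._bucket(layer, i)
            row[b] = row[b] * pow(self.seeds[layer][b] - i, u, self.p) % self.p

    def update(self, i, u):
        """Process stream tuple u|i."""
        self._apply(self.state, i, u)

    def alarm(self, reported):
        """reported maps group id -> claimed aggregate (missing ids mean 0)."""
        claimed = self._empty()
        for i, v_i in reported.items():
            self._apply(claimed, i, v_i)
        for mine, theirs in zip(self.state, claimed):
            mismatched = sum(1 for s, t in zip(mine, theirs) if s != t)
            if mismatched >= self.gamma:
                return True
        return False

rng = random.Random(7)
synopsis = PIRSGamma(gamma=3, k=44, layers=5, p=2_147_483_647, q=1_000_003, rng=rng)
truth = {1: 9, 4: 7, 8: 3, 20: 5}
for i, u in truth.items():
    synopsis.update(i, u)
exact_ok = synopsis.alarm(truth)                # every bucket matches
one_error = synopsis.alarm({**truth, 4: 6})     # one wrong group touches one bucket per layer
```

A single wrong group can disturb only one bucket in each layer, so answers with fewer than γ wrong groups never trigger the alarm in this sketch.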
Intuition on Approximation
[Figure: probability to raise alarm (y-axis) against number of errors (x-axis), comparing the ideal synopsis, with threshold γ, and the approximation, with thresholds γ- and γ+.]
PIRS±γ: An Approximate Solution
Theorem: PIRS±γ requires O(γ log(1/δ) log n) bits and spends O(γ log(1/δ)) time to process a tuple.
CQV with Random Load Shedding
Randomly drop tuples.
All groups have small errors.
Goal: detect whether any group has an error greater than a claimed threshold.
Theorem: Any synopsis that solves this problem with error probability at most δ requires Ω(n) bits (by reduction to the problem of estimating the infinite frequency moment, i.e., the number of occurrences of the most frequent item).
Sliding Window and Other Queries
It is easy to extend PIRS to work with the sliding window model since it is decomposable, i.e., X(v1+v2)=X(v1)*X(v2).
Other queries can be transformed into Group By aggregation queries.
Details in the paper.
Some Experiments
We use real streams:
  World Cup Data (WC)
  IP traces from the AT&T network (IP)
We perform the following queries:
  WC: aggregate on response size and group by client id/object id (50M groups)
  IP: aggregate on packet size and group by source IP/destination IP (7M groups)
Hardware for the client:
  2.8GHz Intel Pentium 4 CPU
  512 MB memory
  Linux machine
Detection Accuracy
n is determined by the number of potential groups, not the actual number of groups:
$n = 2^{64} \approx 2 \times 10^{19}$, $m = 10^{10}$; hence, $\delta = m/p \approx 0.5 \times 10^{-9}$
Over 100,000 random attacks, PIRS identifies all of them.
Memory Usage of Exact
PIRS uses only a constant 3 words (27 bytes) at all times.
Exact’s memory usage is linear and expensive.
Update Time (per tuple) of Exact
1. Exact is fast when memory usage is small.
2. It becomes extremely slow due to cache misses and memory swap operations.
Running Time Analysis
Average Update Time
        WC        IPs
Count   0.98 μs   0.98 μs
Sum     8.01 μs   6.69 μs
IPs exhibits a smaller update cost for the sum query because its average value of u is smaller than that of WC.
Multiple Queries: Exact Memory Usage
PIRS always uses only a constant 3 words (27 bytes).
Exact’s memory usage is linear w.r.t. the number of queries and increases over time.
Multiple Queries: Exact Update Time Per Tuple
Multiple Queries: PIRS Update Time Per Tuple
The Library
Download PIRS and other synopses at:
http://www.cs.fsu.edu/~lifeifei/pirs/
Conclusion
Space- and update-efficient synopsis for verifying continuous group-by aggregation queries on streaming data.
Could be generalized to handle selection queries and sliding-window semantics.
How about more complicated queries?
Thanks!
Questions
Problem and Goals
Assumption: Client and DSMS observe the same stream.
Problem: Client needs to verify the results.
Goals:
  Be memory and update efficient
  Tolerance for a limited number of errors
  Tolerance for small errors
  Support multiple queries
Related Techniques to PIRS
Incremental Cryptography: block operations (insert, delete); cannot support arithmetic operations.
Program Verification: the server may pass the program execution but simply return random outputs.
Fingerprinting Technique: PIRS is a fingerprinting technique.
CQV with Semantic Load Shedding
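$E(V, V') = |\{ i \mid v_i \ne v'_i \}|$
$V \ne_\gamma V'$ iff $E(V, V') \ge \gamma$
$V =_\gamma V'$ iff $E(V, V') < \gamma$
Design a synopsis s.t. it raises an alarm with probability at least 1-δ if $V \ne_\gamma V'$, and raises no alarm if $V =_\gamma V'$.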
PIRS±γ: An Approximate Solution
Theorem: PIRS±γ:
1. raises no alarm with probability at least 1-δ on any $V =_{\gamma^-} V'$, where $\gamma^- = (1 - \frac{c}{\ln \gamma})\gamma$
2. raises an alarm with probability at least 1-δ on any $V \ne_{\gamma^+} V'$, where $\gamma^+ = (1 + \frac{c}{\ln \gamma})\gamma$
For any c > -ln ln 2 ≈ 0.367.
Using the intuition of the coupon collector problem and the Chernoff bound.
PIRS±γ: An Approximate Solution
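choose k s.t. $k \ln k \approx \gamma$
$b_1, \ldots, b_n$: n-wise independent random numbers, uniformly distributed in {1, …, k}
One layer: k buckets, each holding a PIRS. An update to vi with bi=2 goes to the PIRS in bucket 2. The layer raises an alarm if all k buckets raise alarms.
log 1/δ layers: raise an alarm if the majority of layers raise alarms.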
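The decision rule on this slide can be sketched offline: within a layer each group is assigned to a random bucket, the layer votes for an alarm only when all k buckets mismatch, and the final alarm needs a majority of layers. This sketch compares two full vectors directly, which a real streaming client would not hold; the function names and parameters are assumptions for illustration.

```python
import random

def layer_synopsis(vector, buckets, seeds, p):
    """One PIRS value per bucket; group i (1-based) goes to bucket buckets[i-1]."""
    x = [1] * len(seeds)
    for i, v_i in enumerate(vector, start=1):
        b = buckets[i - 1]
        x[b] = x[b] * pow(seeds[b] - i, v_i, p) % p
    return x

def approx_alarm(true_vec, reported_vec, k, layers, p, rng):
    """Majority vote over layers; a layer votes only if all k buckets differ."""
    votes = 0
    for _ in range(layers):
        buckets = [rng.randrange(k) for _ in true_vec]   # b_1..b_n, 0-based here
        seeds = [rng.randrange(p) for _ in range(k)]     # one PIRS seed per bucket
        mine = layer_synopsis(true_vec, buckets, seeds, p)
        theirs = layer_synopsis(reported_vec, buckets, seeds, p)
        if all(s != t for s, t in zip(mine, theirs)):
            votes += 1
    return 2 * votes > layers

rng = random.Random(11)
p = 2_147_483_647                        # illustrative prime
truth = [9, 0, 7, 3, 5, 1]
two_errors = [9, 1, 7, 3, 4, 1]          # groups 2 and 5 are wrong
no_alarm_exact = approx_alarm(truth, truth, k=4, layers=7, p=p, rng=rng)
no_alarm_few = approx_alarm(truth, two_errors, k=4, layers=7, p=p, rng=rng)
```

With only two wrong groups at most two of the four buckets can mismatch, so no layer votes for an alarm; many more errors are needed before every bucket is likely to be hit.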
Information Disclosure on Multiple Attacks
R: space of random seeds used by PIRS
witness: $W(V, V') = \{ r \in R \mid \text{PIRS raises an alarm on } r \}$
non-witness: $\overline{W}(V, V') = R - W(V, V')$
$|\overline{W}(V, V')| \le \delta |R|$, if $V \ne V'$
[Figure: the client keeps PIRS X(V) on seed r; the server returns V'.]
If V' = V: the server learns nothing about r.
If V' ≠ V and an alarm is received: the server learns $r \in W(V, V')$.
Insight: the server could potentially get rid of a δ portion of the seeds after each notified failed attack!
Information Disclosure on Multiple Attacks
Theorem: For a total of k attacks made by Bob on PIRS, the probability that none of them succeeds is at least 1-kδ.
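One way to see where a bound of the form 1-kδ can come from is a telescoping product. This is a sketch under stated assumptions, not necessarily the authors' proof: it assumes each failed attack lets the server rule out at most δ|R| seeds, each attack succeeds on at most δ|R| seeds, and kδ < 1.

```latex
% Assumption: after i-1 failed attacks at least (1-(i-1)\delta)|R| seeds remain possible.
\[
\Pr[\text{attack } i \text{ succeeds} \mid \text{attacks } 1,\ldots,i-1 \text{ failed}]
  \le \frac{\delta |R|}{(1-(i-1)\delta)\,|R|}
  = \frac{\delta}{1-(i-1)\delta}
\]
\[
\Pr[\text{no attack succeeds}]
  \ge \prod_{i=1}^{k} \left( 1 - \frac{\delta}{1-(i-1)\delta} \right)
  = \prod_{i=1}^{k} \frac{1 - i\delta}{1-(i-1)\delta}
  = 1 - k\delta
\]
```

The product telescopes because each numerator cancels the next denominator, leaving (1-kδ)/1.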
Proof of the Optimality
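A synopsis X is a function $f: U \to M$ chosen from a family $F = \{f_1, f_2, \ldots\}$, where $f_i$ is chosen with probability $p(f_i)$; assuming $p(f_1) \ge p(f_2) \ge \cdots$
X needs $\log |F|$ bits to describe the function from F, and at least $\log |M|$ bits for the output.
For $V \ne V'$: $F_{V,V'} = \{ f \in F \mid f(V) = f(V') \}$
For X: $\sum_{f \in F_{V,V'}} p(f) \le \delta$
Consider $k = \lceil \delta |F| \rceil + 1$ and $f_1, \ldots, f_k$: $\sum_{i=1}^{k} p(f_i) > \delta$
There is a total of $|M|^k$ possible combinations for the outputs of these k functions; by pigeonhole, $|U| \le |M|^k$
$|U| \ge 2^n$
$\log |U| \le (\delta |F| + 1) \log |M|$, so $|F| \log |M| = \Omega(n/\delta)$
If $\log |F| \le (1-\epsilon) \log(n/\delta)$: $\log |M| = \Omega((n/\delta)^{\epsilon})$
else: $\log |F| \ge \log(n/\delta)$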