Page 1: LAHAR: Extracting Events from Probabilistic Streams

LAHAR: Extracting Events from Probabilistic Streams

Chris Re, Julie Letchner,

Magdalena Balazinska and Dan Suciu

University of Washington

Page 2: LAHAR: Extracting Events from Probabilistic Streams

Lahar -- SIGMOD 2008 -- Christopher Re

What is a Lahar?

This is a Lahar

May 18, 1980 ~ 8:27am … a few minutes later

It’s a massive, fast stream of dirt(y data)

Our system, Lahar, processes queries on massive, dirty streams of data

Page 3: LAHAR: Extracting Events from Probabilistic Streams

Event Queries

Motivating App: RFID

Event queries as in Cayuga, SASE and Snoop

Complex sequences using projections, predicates,…

Joe entered office 422 at t=8

Query: “Alert when Joe enters 422”

i.e., Joe is outside 422, then inside 422

Page 5: LAHAR: Extracting Events from Probabilistic Streams

6th Floor in CS building

Challenges: Tracking Joe’s Location

Blue ring is Joe’s Location

Antennas

Two Problems:
1. Missed Readings
2. Granularity Mismatch

Propose: infer location, keep probabilities, and query with Lahar

Model-Based View [Deshpande et al] of an HMM

Lahar retains probabilities, achieves higher quality (P/R) and is still efficient.

Page 6: LAHAR: Extracting Events from Probabilistic Streams

Outline
  RFID streams to probabilistic streams
  Lahar queries on probabilistic streams
  Query algorithms: Regular and Extended Regular
  Experiments

Page 7: LAHAR: Extracting Events from Probabilistic Streams

Tracking Joe’s Location

Blue ring is ground truth

Antennas

6th Floor in CS building

Page 8: LAHAR: Extracting Events from Probabilistic Streams

Probabilities via particle filter

Each orange particle is a guess of Joe’s location

Blue ring is ground truth

Antennas

Particles guess many locations per timestep, so data are uncertain

6th Floor in CS building

Page 9: LAHAR: Extracting Events from Probabilistic Streams

From particles to a probabilistic stream

At(tag,loc)

Tag  t  Loc    P
Joe  7  422    0.4
Joe  7  Hall3  0.4
Joe  7  Hall4  0.2
Joe  8  422    0.6
Joe  8  Hall3  0.2
Joe  8  Hall4  0.2
Sue  7  …      …

Query particle filter output via At, a model-based view

Page 10: LAHAR: Extracting Events from Probabilistic Streams

Semantics of the Model

At(tag,loc)

Tag  t  Loc    P
Joe  7  422    0.4
Joe  7  Hall3  0.4
Joe  7  Hall4  0.2
Joe  8  422    0.6
Joe  8  Hall3  0.2
Joe  8  Hall4  0.2
Sue  7  …      …

One possible stream (world):

Tag  t  Loc
Joe  7  Hall4
Joe  8  422
Sue  7  …

Prob = 0.2 * 0.6 * …

A query q returns the probability that q is true at each time t

“Joe enters 422” @ t=8:

Probability outside 422 at t=7 (in Hall3, Hall4) times probability inside 422 at t=8 = (0.4+0.2) * 0.6 = 0.36
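
The arithmetic on this slide can be checked with a small sketch. It assumes, as the slide's product does, that Joe's locations at t=7 and t=8 are independent; the dictionary `at` and both function names are illustrative, not part of Lahar. The second function enumerates possible worlds directly and should agree with the closed form.

```python
from itertools import product

# Marginals for Joe copied from the At(tag, loc) table on the slide.
at = {
    7: {"422": 0.4, "Hall3": 0.4, "Hall4": 0.2},
    8: {"422": 0.6, "Hall3": 0.2, "Hall4": 0.2},
}

def prob_enters(at, room, t):
    """P(outside `room` at t-1 and inside `room` at t), assuming independent timesteps."""
    p_outside_before = sum(p for loc, p in at[t - 1].items() if loc != room)
    p_inside_now = at[t].get(room, 0.0)
    return p_outside_before * p_inside_now

def prob_enters_by_worlds(at, room, t):
    """Same quantity, computed by summing over every possible world for (t-1, t)."""
    total = 0.0
    for (loc_before, p_before), (loc_now, p_now) in product(at[t - 1].items(), at[t].items()):
        if loc_before != room and loc_now == room:
            total += p_before * p_now
    return total

print(round(prob_enters(at, "422", 8), 2))            # 0.36
print(round(prob_enters_by_worlds(at, "422", 8), 2))  # 0.36
```

Enumerating worlds grows exponentially with the number of timesteps a query spans, which is why the closed form (and later the automaton) matters.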

Page 15: LAHAR: Extracting Events from Probabilistic Streams

Lahar Queries by Example

Inspired by Cayuga [Demers et al 2006, White et al 2007]

Alert when Joe is in hallway 4 and later in office 422

$\mathrm{At}(\text{'Joe'}, \text{'Hall4'});\ \mathrm{At}(\text{'Joe'}, \text{'422'})$

Alert when Joe is in hallway 4, and immediately in office 422

$\sigma_{l = \text{'422'}}(\mathrm{At}(\text{'Joe'}, \text{'Hall4'});\ \mathrm{At}(\text{'Joe'}, l))$

[Automaton figures, one per query: edges labeled "Joe in Hall4" and "Joe in 422"; further labels "Joe in 422" and "Joe"]

Challenge with probabilities: Naïve approach is exponential; unavoidable (#P)

Page 17: LAHAR: Extracting Events from Probabilistic Streams

A hierarchy of Lahar queries

Regular Queries (Efficient, streamable)
  Alert when Joe enters 422
  $\sigma_{l = \text{'422'}}(\mathrm{At}(\text{'Joe'}, \text{'Hall4'});\ \mathrm{At}(\text{'Joe'}, l))$

Extended Regular (Efficient, streamable)
  Alert when anyone enters 422
  $\sigma_{l = \text{'422'}}(\mathrm{At}(p, \text{'Hall4'});\ \mathrm{At}(p, l))$

Safe (Efficient, but not streamable)

Unsafe (Inefficient)

Page 19: LAHAR: Extracting Events from Probabilistic Streams

Review: A non-probabilistic example

Alert me when Joe enters 422

$q = \sigma_{l = \text{'422'}}(\mathrm{At}(\text{'Joe'}, \text{'Hall4'});\ \mathrm{At}(\text{'Joe'}, l))$

[Automaton figure: states 1 and 2 (Final); edges labeled "Joe in Hall4" and "Joe in 422"; one further label "Joe"]

First stream:

Tag  T  Loc
Joe  7  Hall4
Joe  8  422

State sets: {}, {1}, {2}. Accept at t = 8

Second stream:

Tag  T  Loc
Joe  7  Hall4
Joe  8  423

State sets: {}, {1}, {}

Page 20: LAHAR: Extracting Events from Probabilistic Streams

… now with probabilities

Alert me when Joe enters 422

$q = \sigma_{l = \text{'422'}}(\mathrm{At}(\text{'Joe'}, \text{'Hall4'});\ \mathrm{At}(\text{'Joe'}, l))$

[Automaton figure: states 1 and 2 (Final); edges labeled "Joe in Hall4" and "Joe in 422"; one further label "Joe"]

Tag  T  Loc    P
Joe  7  Hall4  0.5
Joe  8  423    0.3
Joe  8  422    0.6

Distribution on States

Initially:  {} 1.0
After t=7:  {} 0.5, {1} 0.5
After t=8:  {} 0.65, {1} 0.05, {2} 0.3

Accept t=8 with p = 0.3
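
The state distributions on this slide can be reproduced with a short sketch. The transition rule below is an assumption reverse-engineered from the slide's numbers, not taken from the paper: a Hall4 reading moves to {1}, a 422 reading from {1} accepts in {2}, a missing reading leaves {1} unchanged, and anything else resets to {}. All names are illustrative.

```python
from collections import defaultdict

EMPTY, SEEN_HALL4, ACCEPT = "{}", "{1}", "{2}"

def transition(state, loc):
    """Hypothetical transition rule, chosen so that it matches the slide's numbers."""
    if loc is None:                        # no reading for Joe at this timestep
        return SEEN_HALL4 if state == SEEN_HALL4 else EMPTY
    if loc == "Hall4":
        return SEEN_HALL4
    if loc == "422" and state == SEEN_HALL4:
        return ACCEPT
    return EMPTY

def step(dist, readings):
    """Push a distribution over states through one timestep of uncertain readings."""
    p_none = 1.0 - sum(readings.values())  # leftover mass: no reading at all
    outcomes = list(readings.items()) + [(None, p_none)]
    new_dist = defaultdict(float)
    for state, p_state in dist.items():
        for loc, p_loc in outcomes:
            new_dist[transition(state, loc)] += p_state * p_loc
    return dict(new_dist)

# The probabilistic stream from the slide's table.
stream = [
    (7, {"Hall4": 0.5}),
    (8, {"423": 0.3, "422": 0.6}),
]

dist = {EMPTY: 1.0}
for t, readings in stream:
    dist = step(dist, readings)
    print(t, {s: round(p, 2) for s, p in dist.items()})
```

Under these assumptions the final distribution is {} 0.65, {1} 0.05, {2} 0.3. The distribution never has more entries than the automaton has states, whatever the length of the stream.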

Page 21: LAHAR: Extracting Events from Probabilistic Streams

Lies in the preceding slides… (technical details)

Richer predication: “Alert when Joe enters any office”

Translate query and input into an alphabet

[Automaton figure: states 1 and 2 (Final); edges labeled "Joe in Hall4" and "Joe in 422"; one further label "Joe"]

Key Technical Detail: Alphabet is small in data, hence streamable

See paper for compilation

Page 23: LAHAR: Extracting Events from Probabilistic Streams

Extension to Extended regular

“Alert when anyone enters 422”

$q = \sigma_{l = \text{'422'}}(\mathrm{At}(p, \text{'Hall4'});\ \mathrm{At}(p, l))$

$q[p \to \text{'Joe'}] = \sigma_{l = \text{'422'}}(\mathrm{At}(\text{'Joe'}, \text{'Hall4'});\ \mathrm{At}(\text{'Joe'}, l))$

$q[p \to \text{'Tom'}] = \sigma_{l = \text{'422'}}(\mathrm{At}(\text{'Tom'}, \text{'Hall4'});\ \mathrm{At}(\text{'Tom'}, l))$

(Obs 1) Each query is regular

(Obs 2) The queries touch disjoint sets of events, hence they are probabilistically independent

Algorithm:
  (Obs 1) suggests running an automaton for each person
  (Obs 2) suggests multiplying to get the probability that any is true

Space = O(# persons), not # timesteps: can stream
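
A minimal sketch of the "multiply" step, assuming each per-person automaton has already produced an acceptance probability for the current timestep. Because the per-person events are independent, the probability that anyone enters is one minus the product of the complements. The function name and the example numbers are hypothetical.

```python
from math import prod

def prob_any(per_person_probs):
    """P(at least one per-person query is true), assuming the queries are independent."""
    return 1.0 - prod(1.0 - p for p in per_person_probs.values())

# Hypothetical acceptance probabilities at one timestep, one entry per person.
accept_at_t = {"Joe": 0.3, "Tom": 0.1}

print(round(prob_any(accept_at_t), 2))  # 0.37
```

Only one number (more precisely, one state distribution) per person has to be kept between timesteps, which matches the O(# persons) space bound on the slide.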

Page 24: LAHAR: Extracting Events from Probabilistic Streams

Summary of Contributions

Regular Queries (Efficient, streamable)
  Compiled to an automaton, streaming, O(1) space

Extended regular (Efficient, streamable)
  Streaming with O(m) space, where m is the # of persons

Safe (Efficient, but not streamable)

Unsafe (Inefficient, most #P-hard)

See paper for Markovian correlations, more sophisticated predication, complete compilation and static analysis algorithms

Page 27: LAHAR: Extracting Events from Probabilistic Streams

Experimental Setup

Quality: How is P/R affected by keeping probs?
  52 objects, 352 locations, 10k sq. ft.
  2x30min trace with 10 min break in between
  Participants marked down true locations

Query: “Alert when anyone enters a coffee room”

$\sigma_{\mathrm{Person}(p),\ \mathrm{Coffee}(l_2),\ \mathrm{Hallway}(l_1)}(\mathrm{At}(p, l_1);\ \mathrm{At}(p, l_2))$

Baseline: Most Likely Estimate (MLE)
  Each timestep / each person: most likely location

Page 28: LAHAR: Extracting Events from Probabilistic Streams

Quality: Realtime – Improve over MLE?

Declare an event “true” if its Pr > threshold; vary the threshold

[Figure: three plots, Precision, Recall and F1, each on an axis from 0 to 1]

10% improvement in F1
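
The thresholding procedure can be sketched as follows. The event probabilities and ground-truth set are made up for illustration and are not the experiment's data; the function name is also hypothetical.

```python
def precision_recall_f1(event_probs, truth, threshold):
    """Declare an event true if its probability exceeds `threshold`, then score against `truth`."""
    predicted = {event for event, p in event_probs.items() if p > threshold}
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical query output: (person, timestep) -> probability that the event occurred.
event_probs = {("Joe", 8): 0.36, ("Sue", 9): 0.8, ("Tom", 12): 0.15}
# Hypothetical ground truth, e.g. from locations the participants marked down.
truth = {("Joe", 8), ("Sue", 9)}

for threshold in (0.1, 0.2, 0.5):
    p, r, f1 = precision_recall_f1(event_probs, truth, threshold)
    print(threshold, round(p, 2), round(r, 2), round(f1, 2))
```

Lowering the threshold trades precision for recall. An MLE baseline has no such knob, since each event is simply present or absent in the most likely stream.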

Page 29: LAHAR: Extracting Events from Probabilistic Streams

Performance: Is the cost too high?

Synthetic Data – Same query

Page 30: LAHAR: Extracting Events from Probabilistic Streams

Related Work

Event Queries – Deterministic
  Cayuga, SASE, SnoopIB

Model-Based Views
  BBQ; recently, Kanagal et al ICDE 08

Probabilistic Databases
  Mystiq, Trio, MayBMS, Maryland, Purdue, MCDB

Particle Filters on HMMs
  Doucet, Godsill

Page 31: LAHAR: Extracting Events from Probabilistic Streams

Conclusion

Showed Lahar
  Processed output of several inference tasks (HMMs)
  Applies more generally than just RFID

Quality (F1) gains by keeping probability

Performance usable in real-time
  Lots of concurrent tags
  No indexing!

Page 33: LAHAR: Extracting Events from Probabilistic Streams

Overview of Regular Query Algorithm

1. Compile an event query q
   1. Automaton (A) over a language L
   2. Mapping (M) events to subsets of L
2. Runtime – Input is set of events E
   1. Map E into subsets of L via M
   2. Maintain set of possible states of A

Deterministic vs. Probabilistic, per step: 1.1 stays same; 1.2 stays same; 2.1 distribution; 2.2 distribution

Size of distribution depends only on the query, q.

NB: example to follow

For details, see paper

Page 34: LAHAR: Extracting Events from Probabilistic Streams

Why are ER queries hard?

Regular Queries ~ Regular Expressions
  Mapping is non-trivial
  Inspired by Cayuga [Demers et al. 06]

Queries have #P combined complexity
  Encode mDNF as regular expression
  Intuition: n-sized automaton leads to $\mathrm{time}(2^n)$

Extended regular ~ 1 NFA per person
  k persons implies O(k)-size automaton
  Exponential cost
  When ER, can avoid blowup

Page 35: LAHAR: Extracting Events from Probabilistic Streams

Regular and Extended Regular

Query is regular if no variable is shared between subgoals

$\sigma_{l = \text{'502'}}(\mathrm{At}(\text{'Joe'}, \text{'501'});\ \mathrm{At}(\text{'Joe'}, l))$

Query is extended regular if any variable shared by two subgoals is shared by all subgoals

$\sigma_{l = \text{'502'}}(\mathrm{At}(p, \text{'501'});\ \mathrm{At}(p, l))$

p is shared between subgoals
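
The two syntactic definitions can be sketched as a check over the variables of each subgoal. Representing a query as a list of variable-name sets is an assumption made for illustration; it is not how Lahar represents queries, and the label "neither" only means the query falls outside these two classes.

```python
def classify(subgoal_vars):
    """Classify a query given one set of variable names per subgoal."""
    all_vars = set().union(*subgoal_vars)
    # A variable is "shared" if it appears in at least two subgoals.
    shared = {v for v in all_vars if sum(v in s for s in subgoal_vars) >= 2}
    if not shared:
        return "regular"
    if all(v in s for v in shared for s in subgoal_vars):
        return "extended regular"
    return "neither"

# At('Joe','501'); At('Joe', l): only l is a variable, and it occurs in one subgoal.
print(classify([set(), {"l"}]))            # regular
# At(p,'501'); At(p, l): p occurs in every subgoal.
print(classify([{"p"}, {"p", "l"}]))       # extended regular
# Hypothetical three-subgoal query where p is shared by two subgoals but not the third.
print(classify([{"p"}, {"p"}, {"r"}]))     # neither
```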

Page 36: LAHAR: Extracting Events from Probabilistic Streams

Correlations

Page 37: LAHAR: Extracting Events from Probabilistic Streams

Sequencing by example

Sequencing is parameterized [Cayuga]

$\sigma_{l = \text{'502'}}(\mathrm{At}(p, \text{'501'});\ \mathrm{At}(p, l))$

Time: (Joe,501)  (Bob,502)  (Joe,502)  (Joe,503)

Semicolon means “the next event among those that match next goal”

Semicolon is not “after”

Page 38: LAHAR: Extracting Events from Probabilistic Streams

Compilation by example

Each goal “corresponds” to two letters:
  move (m) – the query should advance
  accept (a) – the next subgoal accepts

$q_1 = \sigma_{l = \text{'502'}}(\mathrm{At}(\text{Joe}, \text{'501'});\ \mathrm{At}(\text{Joe}, l))$

$L = \{m_1, a_1, m_2, a_2\}$

Mapping $M_q$:
  $(\text{Joe}, 501) \mapsto \{m_1, a_1, m_2\}$
  $(\text{Joe}, 502) \mapsto \{m_2, a_2\}$
  $(\text{Joe}, l_0) \mapsto \{m_2\}$
  Any other maps to empty set

[Automaton figure: edges labeled $a_1$ and $a_2$, an edge labeled $\{m_2\}$, and a Final state; legend: "Does not contain" / "Does contain"]

Page 39: LAHAR: Extracting Events from Probabilistic Streams

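Subtle example..

$q_1 = \sigma_{l = \text{'502'}}(\mathrm{At}(\text{Joe}, \text{'501'});\ \mathrm{At}(\text{Joe}, l))$

$L = \{m_1, a_1, m_2, a_2\}$

Mapping $M_1$:
  $(\text{Joe}, 501) \mapsto \{m_1, a_1, m_2\}$
  $(\text{Joe}, 502) \mapsto \{m_2, a_2\}$
  $(\text{Joe}, l_0) \mapsto \{m_2\}$
  Any other maps to empty set

[Automaton figure: edges labeled $a_1$ and $a_2$, an edge labeled $\{m_2\}$, and a Final state; legend: "Does not contain" / "Does contain"]

What about:

$q_2 = \mathrm{At}(\text{Joe}, \text{'501'});\ \mathrm{At}(\text{Joe}, \text{'502'})$

Mapping $M_2$: $(\text{Joe}, l_0)$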

Page 40: LAHAR: Extracting Events from Probabilistic Streams

CUT II

Page 41: LAHAR: Extracting Events from Probabilistic Streams

Motivating Apps

RFID apps
  Diary and Active Calendar Application: Alert if I go to a database meeting.
  Supply chain: Alert if Mach 3 razors are being stolen

Many independent HMMs
  Elder care [Intel/UW]: Alert if elder takes their medicine with water
  Activity Recognition

Financial applications on predictive HMM
  Alert if head-and-shoulders market

Page 42: LAHAR: Extracting Events from Probabilistic Streams

Compile Select and Filter

Intuition: goal maps to two letters:
  match (m): matches filter
  accept (a): accepted by select

$q_{\mathrm{filter}} = \mathrm{At}(\text{'Joe'}, \text{'501'});\ \mathrm{At}(\text{'Joe'}, \text{'502'})$

$q_{\mathrm{select}} = \sigma_{l = \text{'502'}}(\mathrm{At}(\text{'Joe'}, \text{'501'});\ \mathrm{At}(\text{'Joe'}, l))$

$L = \{m_1, a_1, m_2, a_2\}$

[Automaton figure: edges labeled $a_1$ and $a_2$, an edge labeled $\{m_2\}$, and a Final state; legend: "Does not contain" / "Does contain"]

Language and automaton are the same for both queries

Page 43: LAHAR: Extracting Events from Probabilistic Streams

Wrinkle in the language: Filter v. Selection

“Alert next time Joe is in 502 after he is in 501”

$q_{\mathrm{filter}} = \mathrm{At}(\text{'Joe'}, \text{'501'});\ \mathrm{At}(\text{'Joe'}, \text{'502'})$

“Alert if the next place Joe is in after 501 is 502”

$q_{\mathrm{select}} = \sigma_{l = \text{'502'}}(\mathrm{At}(\text{'Joe'}, \text{'501'});\ \mathrm{At}(\text{'Joe'}, l))$

Time: (Joe,501)  (Joe,502)  (Joe,503)

Yes

No
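
The difference between the two queries can be sketched on a deterministic stream. The sketch reads the semicolon as "the next event among those that match the next goal" and, as a simplification, only follows the first (Joe,501) event. The timeline with 503 between 501 and 502 is an assumption: it is an order under which the two queries give different answers. All function names are illustrative.

```python
def next_match(events, start, pred):
    """Index of the first event at or after `start` that satisfies `pred`, else None."""
    for i in range(start, len(events)):
        if pred(events[i]):
            return i
    return None

def q_filter(events):
    """At('Joe','501'); At('Joe','502'): the next goal only matches (Joe, 502)."""
    i = next_match(events, 0, lambda e: e == ("Joe", "501"))
    if i is None:
        return False
    return next_match(events, i + 1, lambda e: e == ("Joe", "502")) is not None

def q_select(events):
    """sigma_{l='502'}(At('Joe','501'); At('Joe', l)): the next goal matches any Joe event."""
    i = next_match(events, 0, lambda e: e == ("Joe", "501"))
    if i is None:
        return False
    j = next_match(events, i + 1, lambda e: e[0] == "Joe")
    return j is not None and events[j][1] == "502"

# Assumed timeline: Joe visits 503 between 501 and 502.
timeline = [("Joe", "501"), ("Joe", "503"), ("Joe", "502")]
print(q_filter(timeline))  # True
print(q_select(timeline))  # False

# Events for other tags do not match At('Joe', l), so they are skipped.
with_bob = [("Joe", "501"), ("Bob", "502"), ("Joe", "502"), ("Joe", "503")]
print(q_select(with_bob))  # True
```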

Page 44: LAHAR: Extracting Events from Probabilistic Streams

Recap of Algorithms

Regular Queries
  Compiled them to an NFA, then used image
  Data complexity O(1)

Extended regular
  Several regulars multiplied together
  Depends on number of distinct people in the data, not number of time steps.

Page 47: LAHAR: Extracting Events from Probabilistic Streams

Quality: Archived – Improve over Viterbi?

Smoothing v. Viterbi (MAP)
  Lahar tracks Markovian correlations
  Viterbi leverages correlations for MAP estimate

[Figure: three plots, Precision, Recall and F1, each on an axis from 0 to 1]

Approximately 30% gain in F1