![Page 1: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/1.jpg)
Efficient Filtering in Publish-Subscribe Systems using BDD
Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, Helmut Veith
Prepared by Nabeel Mohamed
4/16/08
1
![Page 2: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/2.jpg)
Outline
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
2
![Page 3: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/3.jpg)
Research Problem at Hand
Loosely coupled interactions in
publish-subscribe systems allow
very large-scale systems to be built
However, filtering techniques used are
a major bottleneck
Efficiency of the filtering technique
plays a major role in scalability
Whatever technique we use should be
provably correct
3
![Page 4: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/4.jpg)
Major Contributions
A precise semantics for matching
messages (events) to subscriptions
(subscription queries)
Modeling filtering as a satisfiability
check in BDD
4
![Page 5: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/5.jpg)
Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
5
![Page 6: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/6.jpg)
Publish-Subscribe Systems
[Figure: publishers send events (publish) into a layer of distributed content routers that performs distributed subscription management and routing; subscribers register with Subscribe() and Unsubscribe() and are called back through Notify().]
6
![Page 7: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/7.jpg)
Publish-Subscribe Systems
Publishers and Subscribers are
loosely coupled
◦ Space decoupled
◦ Time decoupled
◦ Synchronization decoupled
Content routers (brokers) form a
structured p2p system
Scalable Systems
7
![Page 8: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/8.jpg)
Message (Event) Filtering
Filtering
◦ Matching incoming messages (events) generated by publishers against subscription criteria
◦ A main task of content routers (brokers): the filtering engine
Content-based pub-sub systems route messages (events) based on the content itself
Example: filter quotes with symbol = Google and offer price < 400 in a financial ticker.
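As a minimal sketch of the example above, a content-based filter can be viewed as a predicate over the attributes of a message. The field names `symbol` and `offer_price` and the function `google_under_400` are hypothetical names chosen for illustration.

```python
def google_under_400(quote: dict) -> bool:
    """Hypothetical subscription: symbol = Google and offer price < 400."""
    if "symbol" not in quote or "offer_price" not in quote:
        return False  # this sketch simply rejects quotes with missing fields
    return quote["symbol"] == "Google" and quote["offer_price"] < 400


quotes = [
    {"symbol": "Google", "offer_price": 395.0},
    {"symbol": "Google", "offer_price": 410.0},
    {"symbol": "IBM", "offer_price": 120.0},
]

# Only the first quote satisfies both conditions.
matched = [q for q in quotes if google_under_400(q)]
```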
8
![Page 9: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/9.jpg)
Example Pub-Sub Systems
Stock market feeds
◦ For delivery of financial data such as
stock quotes, trade reports, news, etc. to
customers
◦ OPRA feed disseminates more than
100,000 quotes/sec
Sensor networks
Network traffic analysis
Transaction log analysis
9
![Page 10: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/10.jpg)
Desirable Functions of a Filtering Engine
Correctness:
◦ Correctly matching incoming messages with subscription criteria
Expressiveness:
◦ Rich subscription language
Efficiency:
◦ Real-time matching
Scalability:
◦ Handling a large number of subscriptions
Dynamic:
◦ Capability to add and remove subscriptions online
10
![Page 11: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/11.jpg)
Related Work
Most existing systems support only conjunctive subscriptions
◦ GRYPHON
◦ SIENA
◦ Le Subscribe
Example: The following subscription requires 27 GRYPHON-like subscriptions while BDD handles it naturally.
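The subscription referred to here is not reproduced in this text. As a hedged illustration of how such a count can arise, the sketch below takes a hypothetical query (three conditions, each a disjunction of three alternatives) and expands it into the purely conjunctive subscriptions that a conjunction-only system would have to register. All atoms shown are assumptions made up for the illustration.

```python
from itertools import product

# Hypothetical query: (one of 3 companies) AND (one of 3 speeds) AND (one of 3 price cases)
clauses = [
    ["company = IBM", "company = Dell", "company = Siemens"],
    ["mhz = 800", "mhz = 900", "mhz = 1000"],
    ["price < 500", "price = 500", "500 < price"],
]

# Distributing AND over OR yields one conjunction per choice of alternatives.
conjunctive_subscriptions = [" AND ".join(choice) for choice in product(*clauses)]
count = len(conjunctive_subscriptions)  # 3 * 3 * 3
```

The count grows as the product of the clause sizes, which is why a language restricted to conjunctions scales poorly for queries with many disjunctions.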
11
![Page 12: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/12.jpg)
Related Work
Some systems have higher expressive power at the expense of less efficient filtering.
◦ ELVIN
Can we come up with an efficient filtering technique while providing an expressive subscription language?
BDD based filtering may be employed in existing systems to improve matching efficiency
12
![Page 13: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/13.jpg)
Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
13
![Page 14: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/14.jpg)
Subscription Query Language
The language used to describe
subscription criteria or subscriptions
Three Subscription Languages of
increasing complexity
◦ SiSL – Simple Subscription Language
◦ StSL – Strict Subscription Language
◦ DeSL – Default Subscription Language
14
![Page 15: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/15.jpg)
Messages and Attributes
V = <v1, …, vn> = a finite sequence of
attributes
Each attribute vi has a type
Each attribute vi has a corresponding
domain
Event schema = the sequence of the types of v1, …, vn
15
![Page 16: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/16.jpg)
Messages and Attributes
A message = an assignment of values
to some (not necessarily all) of the
attributes
Formally, a message is a mapping m
such that for each attribute v, either
m(v) is a value in the domain of v, or
m(v) = * (m does not define v)
A message is total if it defines all
attributes in V.
16
![Page 17: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/17.jpg)
Messages and Attributes – Example 1
Let V = <company, product, price>
over the event schema <STR, STR, DBL>
Consider the following message:
<company> IBM </company>
<product>PC AT, 20 Mhz, 256 KB RAM</product>
<price>5000</price>
This describes a total message m1
where m1(company) = “IBM”, m1(product) = “PC AT, 20 Mhz, 256 KB RAM” and m1(price) = 5000.
17
![Page 18: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/18.jpg)
Messages and Attributes – Example 2
Consider the following message:
<company> IBM </company>
<product>PC AT, 20 Mhz, 256 KB RAM</product>
This describes a different message m2
which is not total (i.e. partial), since
m2(price) = *.
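A minimal sketch of the two example messages as mappings over V, assuming a sentinel object `UNDEFINED` to play the role of `*`. The names `UNDEFINED`, `make_message` and `is_total` are hypothetical helpers, not part of the presented work.

```python
UNDEFINED = object()  # stands for "*": the message does not define the attribute

V = ("company", "product", "price")


def make_message(**values):
    """Build a mapping over V; attributes that are not supplied map to UNDEFINED."""
    return {v: values.get(v, UNDEFINED) for v in V}


def is_total(m):
    """A message is total if it defines every attribute in V."""
    return all(m[v] is not UNDEFINED for v in V)


m1 = make_message(company="IBM", product="PC AT, 20 Mhz, 256 KB RAM", price=5000.0)
m2 = make_message(company="IBM", product="PC AT, 20 Mhz, 256 KB RAM")
```

A dedicated sentinel object is used instead of the string "*" so that a string attribute whose value happens to be "*" is not mistaken for an undefined one.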
18
![Page 19: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/19.jpg)
Three Subscription Languages
SiSL – Simple Subscription Language
◦ All messages are total
StSL – Strict Subscription Language
◦ Messages define all attributes that occur in
the query (subscription criteria)
◦ SiSL is a subset of StSL
DeSL – Default Subscription Language
◦ All attributes are initialized to default values (e.g. using NULL)
◦ Extends the functionality of SiSL to heterogeneous message formats
19
![Page 20: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/20.jpg)
Formalizing SiSL Queries (Subscriptions)
Atomic formulas
Let v be an attribute in V
If […] and […],
then the formulas v = c, v < c, c < v
are atomic formulas.
If […], atomic formulas are
defined similarly.
If […],
then the formulas […] are
atomic formulas. ([…] ≡ substring)
20
![Page 21: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/21.jpg)
Formalizing SiSL Queries (Subscriptions)
Atoms = the set of atomic formulas
A query is a Boolean combination
of atomic formulas
[…] = the set of attributes occurring
in the query
[…] = the set of atomic formulas
occurring in the query
21
![Page 22: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/22.jpg)
Formalizing SiSL Queries (Subscriptions)
Abbreviations
22
![Page 23: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/23.jpg)
Example: SiSL Query
The following SiSL query matches all
messages for 1000 Mhz PCs
manufactured by IBM, Dell or Siemens
which cost at most $1000.
23
![Page 24: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/24.jpg)
Formalizing SiSL Queries (Subscriptions)
[…] = the instantiation of a query by
a message m.
Definition:
The instantiation is the query obtained
from the original query by replacing every variable v
for which m(v) ≠ * by m(v).
Definition:
A SiSL query matches the total
message m if its instantiation by m evaluates to true.
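The matching definition above can be sketched as a small recursive evaluator. The nested-tuple encoding of queries, the operator names (`eq`, `lt`, `gt`, `contains`) and the sample query are all assumptions made for illustration; in particular, the sample is only in the spirit of the example query and is not the formula from the slide.

```python
def evaluate(query, m):
    """Evaluate a nested-tuple query on a total message m (a dict)."""
    op = query[0]
    if op == "and":
        return all(evaluate(q, m) for q in query[1:])
    if op == "or":
        return any(evaluate(q, m) for q in query[1:])
    if op == "not":
        return not evaluate(query[1], m)
    _, v, c = query  # atomic formula over attribute v and constant c
    if op == "eq":
        return m[v] == c
    if op == "lt":
        return m[v] < c
    if op == "gt":
        return c < m[v]
    if op == "contains":  # assumed reading: c is a substring of m(v)
        return c in m[v]
    raise ValueError(f"unknown operator: {op}")


# Hypothetical query: IBM or Dell, product mentions 1000 Mhz, price at most 1000.
query = (
    "and",
    ("or", ("eq", "company", "IBM"), ("eq", "company", "Dell")),
    ("contains", "product", "1000 Mhz"),
    ("or", ("lt", "price", 1000), ("eq", "price", 1000)),
)

cheap = {"company": "IBM", "product": "PC, 1000 Mhz", "price": 950.0}
pricey = {"company": "IBM", "product": "PC, 1000 Mhz", "price": 1200.0}
```

Replacing each attribute by its value and evaluating the result is exactly what `m[v]` does inside the atomic cases, so `evaluate(query, cheap)` holds while `evaluate(query, pricey)` does not.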
24
![Page 25: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/25.jpg)
Formalizing StSL Queries (Subscriptions)
StSL (Strict Subscription Language) is
a generalization of SiSL.
Definition: adequacy
A message m is adequate for a query
if m(v) ≠ * for every attribute v
occurring in the query.
Definition:
A query matches m iff m is
adequate for the query and the instantiation of the query by m evaluates to true.
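A hedged sketch of strict matching: a message matches only if it defines every attribute that occurs in the query and the query then evaluates to true. The tuple encoding of queries and the helper names (`attributes`, `holds`, `is_adequate`, `stsl_matches`) are hypothetical.

```python
UNDEFINED = object()  # stands for "*"


def attributes(query):
    """Set of attributes occurring in a nested-tuple query."""
    if query[0] in ("and", "or", "not"):
        return set().union(*(attributes(q) for q in query[1:]))
    return {query[1]}


def holds(query, m):
    """Truth value of a query on a message that defines all of its attributes."""
    op = query[0]
    if op == "and":
        return all(holds(q, m) for q in query[1:])
    if op == "or":
        return any(holds(q, m) for q in query[1:])
    if op == "not":
        return not holds(query[1], m)
    _, v, c = query
    if op == "eq":
        return m[v] == c
    if op == "lt":
        return m[v] < c
    if op == "gt":
        return c < m[v]
    raise ValueError(f"unknown operator: {op}")


def is_adequate(query, m):
    return all(m.get(v, UNDEFINED) is not UNDEFINED for v in attributes(query))


def stsl_matches(query, m):
    return is_adequate(query, m) and holds(query, m)


query = ("or", ("eq", "company", "IBM"), ("lt", "price", 1000))
total = {"company": "IBM", "price": 5000.0}
partial = {"company": "IBM"}  # price is not defined
```

Note that `partial` does not match even though its company is IBM: the query mentions `price`, so the message is not adequate for it.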
25
![Page 26: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/26.jpg)
Formalizing DeSL Queries (Subscriptions)
DeSL (Default Subscription Language)
is the most general of the three.
For each attribute vi, there is a default
value
Definition:
The default extension of m is the total message
that agrees with m on every attribute that m defines
and maps every other attribute vi to its default value.
26
![Page 27: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/27.jpg)
Formalizing DeSL Queries (Subscriptions)
Definition:
A query matches the message m
under default semantics if its instantiation by the
default extension of m evaluates to true.
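A minimal sketch of default semantics, assuming a table of default values per attribute. The `DEFAULTS` values, the sentinel `UNDEFINED`, and the function names are hypothetical; the query is written as a plain Python predicate to keep the example short.

```python
UNDEFINED = object()  # stands for "*"

# Hypothetical default value for each attribute of the schema.
DEFAULTS = {"company": "", "product": "", "price": 0.0}


def default_extension(m):
    """Total message: m's own values where defined, default values elsewhere."""
    return {
        v: d if m.get(v, UNDEFINED) is UNDEFINED else m[v]
        for v, d in DEFAULTS.items()
    }


def desl_matches(predicate, m):
    """Match under default semantics: evaluate on the default extension."""
    return predicate(default_extension(m))


def cheap_ibm(msg):
    return msg["company"] == "IBM" and msg["price"] < 1000


partial = {"company": "IBM"}  # product and price are not defined
extended = default_extension(partial)
```

With these assumed defaults the partial message matches, because its missing price is read as 0.0. The choice of default values therefore affects which subscriptions fire.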
27
![Page 28: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/28.jpg)
Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
28
![Page 29: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/29.jpg)
BDDs (Binary Decision Diagrams)
Notations
A = a set of propositional variables
[…] = a linear ordering (variable
ordering) on A
[…] = an ordered BDD over A, whose
non-terminal nodes are labeled by
variables in A and whose terminals are labeled 0 or 1.
[…] = the Boolean function
represented by node v in the BDD
29
![Page 30: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/30.jpg)
Properties of BDDs
Each non-terminal node v has two out-edges:
a low edge and a high edge
Let a non-terminal node v with label ai
have successors u and w at its low and high
edges respectively. Then, writing f_x for the
Boolean function represented by node x,
f_v ≡ (¬ai ∧ f_u) ∨ (ai ∧ f_w)
Size = # nodes in the BDD
30
![Page 31: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/31.jpg)
Example: BDD
The following BDD represents the
Boolean function x AND ( y OR z).
The variable ordering is
31
![Page 32: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/32.jpg)
Shared BDDs (SBDDs)
While OBDDs represent one Boolean function, SBDDs represent multiple Boolean functions.
An SBDD is a collection of component OBDDs that respect the same variable ordering.
An SBDD has a set of output nodes Vo = {o1, …, on}, where each oi corresponds to the Boolean function fi in <f1, …, fn>.
32
![Page 33: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/33.jpg)
SBDDs
Every root node of a component
OBDD is in Vo
Notation:
[…] denotes the BDD together with its
output nodes {o1, …, on}
[…] is polynomial-time
computable from any other shared
BDD over A for <f1, …, fn>
33
![Page 34: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/34.jpg)
Example: Shared BDD
Node 1 represents
Node 2 represents
Node 3 represents
34
![Page 35: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/35.jpg)
BDD Data Structure
A BDD with n nodes is represented as a graph whose vertices are the natural numbers 1,…, n.
The adjacency relationship is described by an array of size n.
ith element = (low[i], high[i], label[i], value[i])
◦ low[i] = low successor of i
◦ high[i] = high successor of i
◦ label[i] = label of i
◦ value[i] = used later to store the result of the BDD evaluation corresponding to i.
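The array layout can be sketched as follows for the function x AND (y OR z) with the ordering x, y, z. The node numbering, the use of nodes 1 and 2 as the terminals 0 and 1, and the names `Node` and `value_of` are assumptions made for illustration; the `value` field is left out because it is only needed during evaluation.

```python
from itertools import product
from typing import NamedTuple, Optional


class Node(NamedTuple):
    low: Optional[int]   # low successor (followed when the label is false)
    high: Optional[int]  # high successor (followed when the label is true)
    label: str           # variable name, or "0" / "1" for a terminal


# Index 0 is unused so that the vertices are the natural numbers 1..n.
bdd = [
    None,
    Node(None, None, "0"),  # 1: terminal 0
    Node(None, None, "1"),  # 2: terminal 1
    Node(1, 2, "z"),        # 3: z
    Node(3, 2, "y"),        # 4: y OR z
    Node(1, 4, "x"),        # 5: x AND (y OR z)
]


def value_of(i, assignment):
    """Follow low/high edges from node i until a terminal is reached."""
    node = bdd[i]
    while node.low is not None:
        node = bdd[node.high if assignment[node.label] else node.low]
    return node.label == "1"


# Check node 5 against the intended function on all eight assignments.
truth_table_ok = all(
    value_of(5, {"x": x, "y": y, "z": z}) == (x and (y or z))
    for x, y, z in product([False, True], repeat=3)
)
```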
35
![Page 36: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/36.jpg)
BDD Evaluation
The algorithm (EvalBDD) computes the
value of each node in the BDD under a
truth assignment, whose ith component is
the value given to the ith variable
36
![Page 37: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/37.jpg)
BDD Evaluation
Notice that we can compute the value
of Boolean functions associated with
each output node in one pass.
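The algorithm listing itself is not reproduced in this text. The following is a sketch of a one-pass bottom-up evaluation under two stated assumptions: nodes 1 and 2 are the terminals 0 and 1, and every successor has a smaller index than its parent. The function name `eval_bdd` and the sample shared BDD are hypothetical.

```python
def eval_bdd(low, high, label, assignment):
    """Return value[1..n] for every node after a single bottom-up pass."""
    n = len(label) - 1
    value = [None] * (n + 1)
    value[1], value[2] = False, True  # assumed terminal nodes 0 and 1
    for i in range(3, n + 1):
        # Successors have smaller indices, so their values are already known.
        value[i] = value[high[i]] if assignment[label[i]] else value[low[i]]
    return value


# Shared BDD with three output nodes (index 0 is unused):
#   node 3 = z, node 4 = y OR z, node 5 = x AND (y OR z)
low = [None, None, None, 1, 3, 1]
high = [None, None, None, 2, 2, 4]
label = [None, "0", "1", "z", "y", "x"]
outputs = {"f1": 5, "f2": 4, "f3": 3}

value = eval_bdd(low, high, label, {"x": False, "y": True, "z": False})
results = {name: value[node] for name, node in outputs.items()}
```

Because the loop visits each node once, the cost is linear in the number of nodes, and every output node is answered from the same `value` array.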
37
![Page 38: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/38.jpg)
BDD Restrictions
The idea is to restrict attention to the
truth assignments under which an
external constraint f (a Boolean function
over A) evaluates to true
Definition: f-restriction
38
![Page 39: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/39.jpg)
Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
39
![Page 40: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/40.jpg)
Query BDDs
Key Idea
◦ Represent many subscription queries by a
single shared BDD whose nodes
correspond to atomic sub-formulas of the
queries.
◦ Messages are matched against queries
by simply running EvalBDD on the shared
BDD.
40
![Page 41: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/41.jpg)
Query BDDs
[…], a sequence of queries
over the set of attributes V
A = […], the set of atomic
sub-formulas of the queries.
[…] is the set of propositional variables
such that each atomic sub-formula a
in A is assigned a propositional
variable
[…] = the Boolean query obtained by
substituting each a with its propositional variable
41
![Page 42: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/42.jpg)
Example: Query BDDs
Let […] and […] be two subscriptions received
Then, […] = […]
Three atomic sub-formulas => Three
propositional variables
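The two subscriptions of this example are not reproduced in this text. As a hedged sketch of the same idea, the code below takes two hypothetical queries that share one atomic sub-formula and assigns one propositional variable per distinct atom. The tuple encoding, the queries `q1` and `q2`, and the variable names `a1`, `a2`, `a3` are assumptions.

```python
def atoms(query):
    """Yield the atomic sub-formulas of a nested-tuple query, left to right."""
    if query[0] in ("and", "or", "not"):
        for sub in query[1:]:
            yield from atoms(sub)
    else:
        yield query


# Hypothetical subscriptions sharing the atom company = IBM.
q1 = ("and", ("eq", "company", "IBM"), ("lt", "price", 1000))
q2 = ("or", ("eq", "company", "IBM"), ("eq", "company", "Dell"))

variable_of = {}
for q in (q1, q2):
    for a in atoms(q):
        if a not in variable_of:
            variable_of[a] = f"a{len(variable_of) + 1}"
```

The two queries contain four atom occurrences but only three distinct atoms, so three propositional variables suffice; the shared atom is evaluated once per message.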
42
![Page 43: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/43.jpg)
Example: Query BDDs
Let the variable order be
SBDD corresponding
to the queries
43
![Page 44: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/44.jpg)
Query Matching: SiSL
Use EvalBDD algorithm for query
matching
A query Qi is considered matched if
the BDD node corresponding to Qi
evaluates to 1.
Bottom-up evaluation makes sure sub-
queries are evaluated only once.
44
![Page 45: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/45.jpg)
Query Matching: DeSL
Same as handling complete (total)
messages
When a message is received, it is
extended to a total message before
the matching is performed.
45
![Page 46: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/46.jpg)
Query Matching: StSL
Recall that a message m matches a
subscription Q iff m is adequate for Q
and m satisfies Q.
Can use a modified EvalBDD to
perform faster matching
Key Ideas
◦ An undefined atom renders all sub-
formulas in which it occurs undefined.
◦ Treat * as a new value, undefined
46
![Page 47: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/47.jpg)
Query Matching: StSL
MVEvalBDD for StSL is significantly
faster than EvalBDD for SiSL
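The MVEvalBDD listing is not reproduced in this text, so the following is only a sketch of a three-valued bottom-up evaluation in which undefinedness propagates upward. It is an assumption about how the key ideas could be realized, not the presented algorithm, and it is not tuned for speed. The names `UNDEF` and `mv_eval_bdd`, the terminal numbering, and the sample BDD are hypothetical.

```python
UNDEF = None  # third truth value, standing for "*"


def mv_eval_bdd(low, high, label, atom_value):
    """Three-valued pass: a node is UNDEF if its atom or a successor is UNDEF."""
    n = len(label) - 1
    value = [UNDEF] * (n + 1)
    value[1], value[2] = False, True  # assumed terminal nodes 0 and 1
    for i in range(3, n + 1):
        a = atom_value.get(label[i], UNDEF)
        lo, hi = value[low[i]], value[high[i]]
        if a is UNDEF or lo is UNDEF or hi is UNDEF:
            value[i] = UNDEF  # undefinedness propagates to every ancestor
        else:
            value[i] = hi if a else lo
    return value


# BDD for x OR y with ordering x, y (index 0 is unused):
#   node 3 = y, node 4 = x OR y
low = [None, None, None, 1, 3]
high = [None, None, None, 2, 2]
label = [None, "0", "1", "y", "x"]

undefined_y = mv_eval_bdd(low, high, label, {"x": True})
both_defined = mv_eval_bdd(low, high, label, {"x": True, "y": False})
```

With y undefined the output node is undefined even though x alone would make x OR y true, which mirrors the adequacy requirement. This sketch agrees with the StSL definition only as long as every atom of a query still labels some node reachable from that query's output node.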
47
![Page 48: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/48.jpg)
Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
48
![Page 49: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/49.jpg)
# Nodes in SBDD vs. # Subscriptions
The number of nodes scales almost linearly with the number of subscriptions
◦ High scalability
Restriction further reduces the node count,
lowering memory requirements
49
![Page 50: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/50.jpg)
Matching time for SiSL and StSL
Inputs: number of subscription queries and message density (how close the messages are to total)
Partial messages can be matched quickly.
Time for StSL queries
50
![Page 51: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/51.jpg)
Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
51
![Page 52: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/52.jpg)
Variable Ordering vs. BDD size
Variable ordering has a tremendous
influence on BDD size.
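A well-known textbook illustration of this effect, not taken from the slides, is the function (a1 AND b1) OR (a2 AND b2) OR (a3 AND b3). The sketch below counts the non-terminal nodes of its reduced ordered BDD under two orderings. The function `robdd_size` is a hypothetical helper that builds the diagram by brute-force Shannon expansion, which is only practical for a handful of variables.

```python
def robdd_size(f, order):
    """Count non-terminal nodes of the reduced ordered BDD of f under `order`."""
    unique = {}  # (var, low_id, high_id) -> node id

    def build(level, env):
        if level == len(order):
            return int(f(env))  # terminal ids are 0 and 1
        var = order[level]
        lo = build(level + 1, {**env, var: False})
        hi = build(level + 1, {**env, var: True})
        if lo == hi:  # the test on var is redundant: no node is needed
            return lo
        key = (var, lo, hi)
        if key not in unique:  # share isomorphic sub-graphs
            unique[key] = len(unique) + 2
        return unique[key]

    build(0, {})
    return len(unique)


def f(e):
    return (e["a1"] and e["b1"]) or (e["a2"] and e["b2"]) or (e["a3"] and e["b3"])


interleaved = robdd_size(f, ["a1", "b1", "a2", "b2", "a3", "b3"])
separated = robdd_size(f, ["a1", "a2", "a3", "b1", "b2", "b3"])
```

With the pairs kept together the diagram needs 6 non-terminal nodes, while separating all a-variables from all b-variables needs 14. The gap grows exponentially with the number of pairs.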
52
![Page 53: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/53.jpg)
Pros
Introduces a well-formed semantics to
describe the matching process in
publish-subscribe systems
Modeling matching as a satisfiability check on an
SBDD allows multiple subscriptions
to be checked incrementally
Scalable
StSL is more efficient than SiSL
53
![Page 54: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/54.jpg)
Cons/Improvements
Does not describe any heuristics for selecting the variable ordering (finding an optimal ordering is NP-hard)
◦ Can we order based on the significance of the attributes involved?
Does not explore the possibility of eliminating redundancies due to semantically related atomic sub-formulas, e.g. price = 100 and price > 80 (again NP-hard)
◦ Can we further reduce the node count by exploiting the semantics without causing side effects?
The efficiency of matching is not compared with that of existing systems
54
![Page 55: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/55.jpg)
Conclusion
Two major contributions
◦ A precise semantics for matching messages
to subscriptions
◦ Modeling filtering as a satisfiability check
in BDD
55
![Page 56: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/56.jpg)
Questions
56
![Page 57: Efficient Filtering in Pub-Sub Systems using BDD](https://reader034.vdocuments.site/reader034/viewer/2022052601/558fc1541a28abdc668b465d/html5/thumbnails/57.jpg)
Thank You
57