Scaling Construction of Low Fan-out Overlays for Topic-based Publish/Subscribe Systems


Page 1: Scaling Construction of Low Fan-out Overlays for Topic-based Publish/Subscribe Systems

MIDDLEWARE SYSTEMS RESEARCH GROUP

Scaling Construction of Low Fan-out Overlays for Topic-based Publish/Subscribe Systems

Chen Chen (1)

joint work with Roman Vitenberg (3), Hans-Arno Jacobsen (1,2)

(1) Department of Electrical and Computer Engineering, University of Toronto
(2) Department of Computer Science, University of Toronto
(3) Department of Informatics, University of Oslo

ICDCS 2011 1

Page 2

Example: pub/sub

[Figure: three subscribers with interests IBM, IBM, and Microsoft; the publications <Microsoft, price = 50> and <IBM, price = 100> are delivered to the subscribers interested in their topic.]

Page 3

Pub/Sub

• A communication paradigm
  – Subscribers express their interests
  – Publishers disseminate messages

• Many applications and industry standards
  – Application integration, financial data dissemination, RSS feed distribution, business process management
  – WS-Notification, WS-Eventing, OMG’s Data Distribution Service (DDS)

• Topic-based pub/sub
  – TIBCO RV
  – Google’s GooPS
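A minimal in-process sketch of the topic-based model from the IBM/Microsoft example: subscribers register interest in a topic, and each publication is delivered to every subscriber of its topic. `TopicBroker` and the subscriber names S1–S3 are hypothetical; a deployed system distributes this matching over an overlay network rather than a single dictionary.

```python
from collections import defaultdict


class TopicBroker:
    """Toy, single-process topic-based pub/sub broker (hypothetical, for illustration)."""

    def __init__(self):
        # topic -> subscribers that expressed interest in it
        self.subscribers = defaultdict(list)

    def subscribe(self, subscriber, topic):
        self.subscribers[topic].append(subscriber)

    def publish(self, topic, payload):
        # Deliver the payload to every subscriber of the topic and report who received it
        return [(subscriber, payload) for subscriber in self.subscribers.get(topic, [])]


broker = TopicBroker()
broker.subscribe("S1", "IBM")
broker.subscribe("S2", "IBM")
broker.subscribe("S3", "Microsoft")

print(broker.publish("IBM", {"price": 100}))       # delivered to S1 and S2
print(broker.publish("Microsoft", {"price": 50}))  # delivered to S3
```

Publishers never address subscribers directly; the topic alone decides who receives a message, which is what makes the overlay topology (rather than the publisher) responsible for efficient dissemination.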

Page 4

Two directions for pub/sub

Design of routing protocols
• Designing protocols so that publications and subscriptions are sent as efficiently as possible across the overlay network.
• G. Li et al., ICDCS’08
• M. Castro et al., JSAC’02

Construction of overlay
• Constructing the overlay topology so that network traffic is minimized.
• Chockler et al., PODC’07
• Onus et al., INFOCOM’09

Page 5

Desirable properties for overlays

• Low average node degree
• Low maximum node degree
• Low diameter
• Topic-connectivity
• Efficiency to construct
• Adaptability to churn
• Ease of distributed implementation

[Figure: example with five nodes and topics {a, b, c, d}: V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}.]

Page 6

Our contributions

Previous greedy algorithm
• High runtime cost: $O(|V|^4 |T|)$
• Full knowledge requirement
• Centralized operation (difficult to decentralize)

Our divide-and-conquer algorithm
• Low runtime cost
• Partial knowledge requirement
• Centralized operation (easy to decentralize)

Page 7

Topic-connected overlay (TCO)

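[Figure: an overlay G on the nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}. Its suboverlay Ga, induced by the nodes interested in topic a (V2, V4, V5), is topic-connected; its suboverlay Gb, induced by the nodes interested in topic b (V1, V3, V4), is NOT topic-connected.]

An overlay is topic-connected if, for every topic t, the suboverlay Gt induced by the nodes interested in t is connected.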
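Topic-connectivity can be checked mechanically: for each topic, run a graph search restricted to the nodes interested in that topic. This sketch assumes the topic sets V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c} as read from the figure; the edge list is hypothetical, chosen so that topic a is connected and topic b is not, as in the Ga/Gb panels.

```python
from collections import defaultdict


def is_topic_connected(interests, edges):
    """True if, for every topic t, the nodes interested in t induce a connected subgraph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for t in set().union(*interests.values()):
        members = {n for n, topics in interests.items() if t in topics}
        # Depth-first search that only walks through nodes interested in t
        start = next(iter(members))
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for m in adj[n]:
                if m in members and m not in seen:
                    seen.add(m)
                    stack.append(m)
        if seen != members:
            return False
    return True


interests = {
    "V1": {"b", "c", "d"}, "V2": {"a"}, "V3": {"b", "d"},
    "V4": {"a", "b"}, "V5": {"a", "c"},
}
# Hypothetical overlay: a is connected via V2-V5-V4, but V4 is cut off from V1 and V3 for b
edges = [("V2", "V5"), ("V5", "V4"), ("V1", "V5"), ("V1", "V3")]
print(is_topic_connected(interests, edges))                   # False
print(is_topic_connected(interests, edges + [("V3", "V4")]))  # True
```

Note that the search ignores paths through uninterested nodes: V4 reaches V1 via V5, but V5 does not subscribe to b, so that path does not count for Gb.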

Page 8

MinMax-TCO

[Figure: two topic-connected overlays on the same five nodes. In the first, the maximum node degree is 3 (V5 has 3 edges); in the second, it is 4 (V1 has 4 edges). MinMax-TCO prefers the first.]

Page 9

MinMax-TCO problem and GM-M algorithm [Onus, 2009]

• Minimum Maximum Degree Topic-Connected Overlay (MinMax-TCO) problem
  – Given a set of nodes V, a set of topics T, and an interest function $Int: V \times T \to \{true, false\}$, construct a topic-connected overlay G with minimum maximum degree.

• Theorem: MinMax-TCO is NP-complete

• GM-M algorithm (MinMax-ODA)
  – always greedily adds an edge that 1) has the largest edge contribution, i.e. merges separate components for the most topics, and 2) increases the maximum node degree minimally
  – logarithmic approximation ratio
  – time complexity $O(|V|^4 |T| \log(|V| |T|))$
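A simplified greedy in the spirit of GM-M, for illustration only. It takes the edge contribution to be the number of topics whose separate components an edge would merge, and ranks candidate edges first by the resulting maximum degree and then by contribution; that ordering, the tie-breaking by node name, and the naive scan over all node pairs per added edge are assumptions of this sketch, not the published algorithm. The topic sets are the five-node example as read from the figures.

```python
from itertools import combinations


class DSU:
    """Union-find over the nodes interested in a single topic."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def greedy_min_max_tco(interests):
    """Greedy MinMax-TCO heuristic in the spirit of GM-M (simplified, unoptimized sketch)."""
    nodes = sorted(interests)
    topics = sorted(set().union(*interests.values()))
    dsu = {t: DSU([n for n in nodes if t in interests[n]]) for t in topics}
    degree = {n: 0 for n in nodes}
    edges = []

    def contribution(u, v):
        # Number of topics whose separate components this edge would merge
        return sum(1 for t in interests[u] & interests[v]
                   if dsu[t].find(u) != dsu[t].find(v))

    while True:
        max_deg = max(degree.values())
        best, best_key = None, None
        for u, v in combinations(nodes, 2):
            c = contribution(u, v)
            if c == 0:
                continue
            # Prefer the smallest resulting maximum degree, then the largest contribution
            key = (max(max_deg, degree[u] + 1, degree[v] + 1), -c)
            if best_key is None or key < best_key:
                best, best_key = (u, v), key
        if best is None:  # no edge merges anything: every topic is connected
            return edges
        u, v = best
        edges.append(best)
        degree[u] += 1
        degree[v] += 1
        for t in interests[u] & interests[v]:
            dsu[t].union(u, v)


interests = {
    "V1": {"b", "c", "d"}, "V2": {"a"}, "V3": {"b", "d"},
    "V4": {"a", "b"}, "V5": {"a", "c"},
}
edges = greedy_min_max_tco(interests)
print(edges)
```

On this input the sketch returns five edges with maximum degree 3 (at V1), whereas the overlay V1–V3, V1–V5, V3–V4, V2–V4, V2–V5 is also topic-connected and has maximum degree 2: the greedy only approximates the optimum.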

Page 10

Why divide-and-conquer

• GM-M’s runtime cost is high
  – time complexity $O(|V|^4 |T|)$ (ignoring the logarithmic factor)
  – 487 minutes for |V| = 1000, |T| = 100, uniform distribution*

* uniform distribution: each node is equally likely to be interested in any topic

• The number of nodes is the dominant factor: the running time grows with |V|^4 but only linearly with |T|

To improve running time, reduce the size of the node set: divide-and-conquer based on the node set V
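A back-of-the-envelope count shows why shrinking the node set pays off. Assuming, for illustration only, that V is split into p equal parts, that GM-M's cost on each part scales as $|V|^4 |T|$, and that the cost of combining the parts is ignored:

```latex
\underbrace{p \cdot \left(\frac{|V|}{p}\right)^{4} |T|}_{\text{GM-M on } p \text{ parts of size } |V|/p}
  \;=\; \frac{|V|^{4}\,|T|}{p^{3}},
\qquad p = 10 \;\Rightarrow\; 10^{3} = 1000\times \text{ fewer operations (combine cost ignored)}
```

The combine step is where this idealized gain can be lost, which is why the following slides focus on keeping it cheap.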

Page 11

Divide-and-conquer (DC)

[Figure: an example overlay of fifteen nodes V0–V14 subscribing to subsets of the topics {a, b, c, d}.]

- Divide the overlay based on the node set V
- Conquer each sub-TCO with GM-M
- Combine the sub-TCOs via cross-TCO links

Page 12

Challenges for divide

Divide the MinMax-TCO problem into several sub-overlay construction problems.

Node clustering: nodes with similar interests are placed together
• High runtime cost
• Not trivial to decentralize
• Outputs with varying sizes

Random partitioning: each node flips a coin and is assigned to one of the partitions
• Fast
• Easy to tune
• Straightforward to decentralize

However,
• May lose correlation among nodes due to randomness
• The maximum node degree is very sensitive to random partitioning
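Random partitioning fits in a few lines. The function `random_partition` is hypothetical; each node draws its partition independently, which is what makes the scheme easy to decentralize (every node can perform the same draw locally), while partition sizes are only roughly equal.

```python
import random


def random_partition(nodes, p, seed=None):
    """Assign each node independently to one of p partitions ("each node flips a coin")."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(p)]
    for node in nodes:
        partitions[rng.randrange(p)].append(node)
    return partitions


nodes = [f"V{i}" for i in range(15)]
partitions = random_partition(nodes, p=3, seed=7)
print([len(part) for part in partitions])
```

Nothing in the draw looks at subscriptions, which is exactly why correlated nodes can end up separated.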

Page 13

Bad case for random partitioning

[Figure: a hierarchical workload over |T| = 8 topics t1, …, t8, drawn twice. Node vall subscribes to {t1, …, t8}; Va1 and Vb1 to {t1, t2, t3, t4} and {t5, t6, t7, t8}; Va2, Vb2, Vb3, Vb4 to {t1, t2}, {t3, t4}, {t5, t6}, {t7, t8}; and V1, …, V8 each to a single topic {t1}, …, {t8}.]

Random partitioning may increase the degrees of individual nodes by a factor of |T|.
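The bad case can be made concrete with the hierarchical workload. Suppose, hypothetically, that random partitioning puts vall in the same partition as only the eight single-topic nodes V1, …, V8. Each topic ti then has exactly two interested nodes in that partition, vall and Vi, so any sub-TCO must contain the edge (vall, Vi) for every i, whereas a tree following the hierarchy gives vall only its two links to Va1 and Vb1. The helper `forced_edges` is an illustrative name.

```python
# Topics t1..t8 of the hierarchical workload (|T| = 8)
T = [f"t{i}" for i in range(1, 9)]

# Hypothetical unlucky partition: vall lands together with the eight single-topic nodes V1..V8
partition = {"vall": set(T)}
partition.update({f"V{i}": {f"t{i}"} for i in range(1, 9)})


def forced_edges(interests):
    """Edges every TCO must contain: a topic with exactly two interested nodes needs a direct link."""
    forced = set()
    for t in set().union(*interests.values()):
        members = sorted(n for n, topics in interests.items() if t in topics)
        if len(members) == 2:
            forced.add(tuple(members))
    return forced


edges = forced_edges(partition)
degree_vall = sum("vall" in edge for edge in edges)
print(degree_vall)  # 8, i.e. |T|
```

The degree of vall grows from a constant to |T| purely because of where the coin flips placed its neighbors.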

Page 14

Poor performance of DC for MinMax-TCO

Page 15

Pub/sub workloads

• Number of nodes |V|: from 1000 to 8000

• Number of topics |T|: from 100 to 1000

• Average subscription size: from 50 to 150 topics

• Topic popularity
  – Uniform [Chockler, 2007]
  – Zipf: feed popularity distribution in RSS [Liu, 2005]
  – Exponential: stock popularity in NYSE [Tock, 2005]
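The slide does not say how subscriptions were generated, so the generator below is only a hypothetical sketch: each node draws a subscription size around the average and then samples distinct topics weighted by popularity (Efraimidis–Spirakis keys). The Zipf exponent of 1, the exponential decay rate of 0.95, and the size spread are arbitrary choices, not the paper's parameters.

```python
import random
from collections import Counter


def make_workload(num_nodes, num_topics, avg_size, popularity="zipf", seed=0):
    """Hypothetical pub/sub workload generator: node -> set of subscribed topic ids."""
    rng = random.Random(seed)
    topics = range(num_topics)
    if popularity == "uniform":
        weights = [1.0 for _ in topics]
    elif popularity == "zipf":
        weights = [1.0 / (rank + 1) for rank in topics]  # Zipf exponent 1 (assumed)
    elif popularity == "exponential":
        weights = [0.95 ** rank for rank in topics]      # decay rate 0.95 (assumed)
    else:
        raise ValueError(f"unknown popularity model: {popularity}")
    interests = {}
    for v in range(num_nodes):
        # Size ~ Normal(avg_size, avg_size / 4), clamped to [1, num_topics] (assumed)
        size = min(num_topics, max(1, round(rng.gauss(avg_size, avg_size / 4))))
        # Weighted sampling without replacement: keep the topics with the largest u ** (1 / w)
        ranked = sorted(topics, key=lambda t: rng.random() ** (1.0 / weights[t]), reverse=True)
        interests[v] = set(ranked[:size])
    return interests


workload = make_workload(num_nodes=1000, num_topics=100, avg_size=50, popularity="zipf")
counts = Counter(t for topics in workload.values() for t in topics)
print(counts[0], counts[99])  # topic 0 is far more popular than topic 99
```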

Page 16

Learn from workloads

Observations
• The maximum node degree increases when a node subscribes to a large number of topics
• “Pareto 80-20” rule:
  – most nodes subscribe to a relatively small number of topics
  – only a relatively small number of nodes might be interested in a large number of topics

Basic idea: give special treatment to the nodes interested in many topics

Page 17

Bulk nodes

Given (V, T, Int), the bulk node set is the subset $B \subseteq V$ such that

$B = \{v \in V : |T_v| > \eta\}$

where $T_v$ is the set of topics subscribed to by node v, and η is the bulk subscriber threshold.

The lightweight node set is $L = V \setminus B$.

The bulk subscriber threshold η can be determined from historical results.
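The definition translates directly into code. This sketch assumes a strict inequality (|T_v| > η) and reuses the five-node example read from the earlier figures; `split_bulk` is a hypothetical helper, and choosing η from historical data is left out.

```python
def split_bulk(interests, eta):
    """Split V into bulk nodes B = {v : |T_v| > eta} and lightweight nodes L = V - B."""
    bulk = {v for v, topics in interests.items() if len(topics) > eta}
    lightweight = set(interests) - bulk
    return bulk, lightweight


interests = {
    "V1": {"b", "c", "d"}, "V2": {"a"}, "V3": {"b", "d"},
    "V4": {"a", "b"}, "V5": {"a", "c"},
}
bulk, lightweight = split_bulk(interests, eta=2)
print(sorted(bulk), sorted(lightweight))  # ['V1'] ['V2', 'V3', 'V4', 'V5']
```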

Page 18

Challenges for combine

Combine multiple sub-TCOs into one by adding cross-TCO links as bridges

• Not all nodes need to participate
• How should node subsets be selected for cross-TCO links?
  – too small: node degrees increase
  – too large: time efficiency degrades

Page 19

Representative set

Given a TCO (V, T, Int, E), a representative set (rep set) is a subset of V that covers every topic of V at least λ times.

[Figure: a topic-connected overlay on V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}, with two of its rep sets highlighted.]

• {v3, v5} is a 1-rep set, which covers all topics {a, b, c, d}.
• {v1, v2, v3, v5} is a 2-rep set: each of {a, b, c, d} is covered twice.
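Both claims on this slide can be verified with a short check. The sketch reads “covers λ times” as “at least λ nodes of the set subscribe to each topic” and assumes the topic sets V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c} read from the figure; `is_rep_set` is a hypothetical name.

```python
from collections import Counter


def is_rep_set(interests, rep, lam):
    """True if every topic subscribed in V is subscribed to by at least `lam` nodes of `rep`."""
    all_topics = set().union(*interests.values())
    coverage = Counter(t for v in rep for t in interests[v])
    return all(coverage[t] >= lam for t in all_topics)


interests = {
    "V1": {"b", "c", "d"}, "V2": {"a"}, "V3": {"b", "d"},
    "V4": {"a", "b"}, "V5": {"a", "c"},
}
print(is_rep_set(interests, {"V3", "V5"}, 1))              # True
print(is_rep_set(interests, {"V1", "V2", "V3", "V5"}, 2))  # True
print(is_rep_set(interests, {"V3", "V5"}, 2))              # False
```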

Page 20

Representative nodes

• Representative nodes (rep-nodes)
  – represent the interests of all the nodes
  – can function as bridges to determine cross-TCO links
  – coverage factor λ: tunes the size of the rep set

• Observation: for typical pub/sub workloads and sufficiently large partitions, minimal rep sets tend to be several times smaller than the total number of nodes.

• How to find a minimal rep set Rλ for (V, T, Int)?
  – NP-complete: linearly reducible to the classic set cover problem (for λ = 1, finding a minimal rep set is exactly set cover)
  – Greedy algorithm: always add the node with the largest number of topics that are not yet λ-covered
    • logarithmic approximation ratio
    • efficient implementation
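The greedy rule can be sketched as follows. `greedy_rep_set` is hypothetical, and one detail the slide leaves open is filled in by assumption: a topic with fewer than λ subscribers counts as covered once all of its subscribers are in the set, since it can never be covered λ times.

```python
from collections import Counter


def greedy_rep_set(interests, lam):
    """Greedy lambda-cover: repeatedly add the node with the most topics not yet lambda-covered."""
    # Assumption: a topic with fewer than lam subscribers only needs all of them
    subscribers = Counter(t for topics in interests.values() for t in topics)
    need = {t: min(lam, count) for t, count in subscribers.items()}
    coverage, rep = Counter(), []

    def gain(v):
        return sum(1 for t in interests[v] if coverage[t] < need[t])

    while any(coverage[t] < need[t] for t in need):
        # max() returns the first node with the largest gain, so ties follow dict order
        best = max((v for v in interests if v not in rep), key=gain)
        rep.append(best)
        coverage.update(interests[best])
    return rep


interests = {
    "V1": {"b", "c", "d"}, "V2": {"a"}, "V3": {"b", "d"},
    "V4": {"a", "b"}, "V5": {"a", "c"},
}
print(greedy_rep_set(interests, 1))  # ['V1', 'V2']
print(greedy_rep_set(interests, 2))  # ['V1', 'V3', 'V5', 'V2']
```

On the five-node example the λ = 2 run selects exactly the 2-rep set {v1, v2, v3, v5}, while the λ = 1 run returns {V1, V2}, a different 1-rep set of the same size as {v3, v5}.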

Page 21

Divide-and-Conquer with Bulk and Lightweight Rep-nodes (DCBR-M)

[Figure: an example DCBR-M overlay of 21 nodes V0–V20 subscribing to subsets of the topics {a, b, c, d, e, f, g, h}.]

Page 22

Design of DCBR-M algorithm

• Parameters for tuning the algorithm:
  – Bulk subscriber threshold η (divide, combine): bulk nodes vs. lightweight nodes
  – Coverage factor λ (combine): time efficiency vs. the quality of the TCO
  – Number of lightweight partitions p (divide, conquer):
    p = |L| (one node per partition): combine only
    p = 1 (all nodes in one partition): conquer only

• How to decentralize DCBR-M
  – Nodes autonomously organize themselves into random partitions
  – Different partitions construct inner edges in parallel
  – Different partitions compute rep sets in parallel
  – Bulk nodes and rep-nodes communicate and compute outer edges
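To show how the three phases fit together, here is a centralized simulation of the pipeline under several assumptions: `greedy_tco` is a simplified GM-M-style greedy rather than the published GM-M, rep sets use the greedy λ-cover, and outer edges are computed by running the same greedy over the bulk nodes and rep-nodes while treating the inner edges as already present. All function names and the random workload are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def greedy_tco(interests, nodes, edges, degree):
    """Add edges among `nodes` (GM-M-style greedy) until no edge merges topic components.

    Components are tracked over (topic, node) pairs, seeded with the existing `edges`;
    `degree` is updated in place. Returns only the newly added edges."""
    parent = {}

    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def link(u, v):
        for t in interests[u] & interests[v]:
            parent[find((t, u))] = find((t, v))

    for u, v in edges:
        link(u, v)
    added = []
    while True:
        max_deg = max((degree[n] for n in nodes), default=0)
        best, best_key = None, None
        for u, v in combinations(sorted(nodes), 2):
            gain = sum(find((t, u)) != find((t, v)) for t in interests[u] & interests[v])
            if gain == 0:
                continue
            key = (max(max_deg, degree[u] + 1, degree[v] + 1), -gain)
            if best_key is None or key < best_key:
                best, best_key = (u, v), key
        if best is None:
            return added
        u, v = best
        added.append(best)
        degree[u] += 1
        degree[v] += 1
        link(u, v)


def greedy_rep_set(interests, nodes, lam):
    """Greedy lambda-cover of the topics subscribed within `nodes`."""
    need = {t: min(lam, c) for t, c in Counter(t for v in nodes for t in interests[v]).items()}
    cover, rep = Counter(), []
    while any(cover[t] < need[t] for t in need):
        best = max((v for v in sorted(nodes) if v not in rep),
                   key=lambda v: sum(cover[t] < need[t] for t in interests[v]))
        rep.append(best)
        cover.update(interests[best])
    return rep


def dcbr_m(interests, eta, lam, p, seed=0):
    """Centralized simulation of the DCBR-M pipeline (simplified sketch)."""
    rng = random.Random(seed)
    bulk = {v for v, topics in interests.items() if len(topics) > eta}
    partitions = [[] for _ in range(p)]
    for v in sorted(set(interests) - bulk):  # divide: each lightweight node picks a partition
        partitions[rng.randrange(p)].append(v)
    degree = {v: 0 for v in interests}
    edges, reps = [], set()
    for part in filter(None, partitions):    # conquer: inner edges and rep set per partition
        edges += greedy_tco(interests, part, [], degree)
        reps.update(greedy_rep_set(interests, part, lam))
    # combine: outer edges only among bulk nodes and rep-nodes, aware of the inner edges
    edges += greedy_tco(interests, bulk | reps, edges, degree)
    return edges


rng = random.Random(1)
topics = list("abcdefgh")
interests = {v: set(rng.sample(topics, rng.randint(1, 3))) for v in range(24)}
interests.update({v: set(rng.sample(topics, 7)) for v in range(24, 27)})  # three bulk nodes
edges = dcbr_m(interests, eta=4, lam=1, p=3)
print(len(edges), max(sum(v in e for e in edges) for v in interests))
```

Calling `greedy_tco` on all nodes with the final edge list returns no new edges exactly when the overlay is topic-connected, which makes it a convenient self-check. It succeeds here because every topic component left after the conquer phase contains a bulk node or a rep-node that can bridge it.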

Page 23

Theoretical analysis of DCBR-M

• DCBR-M generates a TCO whose maximum node degree is asymptotically the same as that of the TCO output by GM-M, under realistic assumptions about typical pub/sub workloads.

• The running time of DCBR-M is $O\left(\left(\frac{|L|^4}{p^3} + (|B| + |R|)^4\right)|T|\right)$

  Considerable speedup when |B| and |R| are small
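To get a feel for the bound, the sketch below plugs hypothetical sizes into both cost expressions. It assumes the running-time expression $O\left(\left(\frac{|L|^4}{p^3} + (|B| + |R|)^4\right)|T|\right)$ (reconstructed from a garbled slide) and counts operations only up to constant factors, so the output is an order-of-magnitude estimate, not a measured speedup.

```python
def gmm_cost(num_nodes, num_topics):
    """Operation-count proxy for GM-M: |V|^4 |T| (logarithmic factor dropped)."""
    return num_nodes ** 4 * num_topics


def dcbr_cost(num_light, num_bulk, num_reps, num_topics, p):
    """Proxy for DCBR-M under the reconstructed bound (|L|^4 / p^3 + (|B| + |R|)^4) |T|."""
    return (num_light ** 4 / p ** 3 + (num_bulk + num_reps) ** 4) * num_topics


# Hypothetical sizes: 1000 nodes of which 50 are bulk, 10 partitions, 100 rep-nodes in total
speedup = gmm_cost(1000, 100) / dcbr_cost(950, 50, 100, 100, 10)
print(f"estimated speedup: {speedup:.0f}x")
```

Raising |B| + |R| from 150 to 300 (keeping |L| = 950) multiplies the second term by 16 and drops the estimate to about 112x, which is the sense in which the speedup is considerable only when |B| and |R| are small.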

Page 24

Evaluation for DCBR-M (1)

Page 25

Evaluation for DCBR-M (2)

Page 26

Evaluation for DCBR-M (3)

Page 27

Conclusion

| Algorithm | Running time | Max degree | Avg degree | Required information | Potential to decentralize |
|---|---|---|---|---|---|
| RingPT | good | poor: 168 | poor: 92 | full knowledge | good |
| GM-M | poor: 487 min | good: 5 | good: 3.88 | full knowledge | poor |
| DCBR-M | good: 13.6 sec | good: 6 | good: 4.29 | partial knowledge | good |

Page 28

Backup

Page 29

Related work

• Construction of the overlay
  – MinAvg-TCO: Chockler et al., PODC’07
  – MinMax-TCO: Onus et al., INFOCOM’09
  – Low-TCO: Onus et al., ICDCS’10
  – DC for MinAvg-TCO: Chen et al., ICDCS’10

• Design of routing protocols
  – G. Li et al., ICDCS’08
  – M. Castro et al., JSAC’02

Page 30

Minimal Number of Links

• A typical pub/sub system combines a number of protocols, many of which maintain per-link state:
  – A node must constantly monitor the availability of each of its neighbors (heartbeats and keep-alive state)
  – If the links are maintained using TCP, each link carries the cost of connection state
  – The more links there are, the fewer topics are routed over each individual link, diminishing the benefits of cross-topic aggregation
  – If a sequential-diff-based compression scheme is used, there is an extra cost for maintaining a history table

Page 31

DCBR-M vs DC

• MinMax-TCO vs. MinAvg-TCO: fundamentally different problems
  – Average node degree is a “global” property; maximum node degree has both “global” and “local” properties.
  – DC for MinAvg-TCO does not directly apply to MinMax-TCO.
  – MinMax-TCO is more sensitive to the divide, conquer, and combine steps.
  – Different algorithm design, theoretical analysis, and experiments.