The Continuous Distributed Monitoring Model

Farzad Nozarian
Chalmers University of Technology
18/04/2016

Outline

Introduction

Countdown Problem

Monitoring Entropy

Geometric Approach

Sampling

What Is the Problem?

Several processing nodes receive streams of data items

The goal is to monitor a function over the union of all items with minimum communication cost

Examples of monitoring functions:
- Simple countdown
- Tracking the entropy
- Distinct elements
- Sampling
- Top-k items

Motivation and Applications

Monitoring the global health of the network in a large ISP

Tracking the usage of resources in the distributed data centers of social networks

Tracking global changes by collecting information from sensors

What Are the Challenges?

Continuous monitoring: real-time tracking, rather than a one-shot query

Streaming: data is received at a very high speed

Distributed processing: each node sees only part of the global stream, so communication cost is important

Trivial Solutions

Centralizing all the items
- High communication cost!

Periodic polling
- Summarizing information for complex functions is difficult
- The polling frequency needs parameter tuning:
  - Infrequent polling: delay in identifying events
  - Frequent polling: high communication


The Countdown Problem


A threshold monitoring problem with many applications

Goal: identify when the total number of observations across all sites reaches a given threshold $\tau$

Trivial solution: observers notify the coordinator by sending one bit each time an event is observed, which costs one message per event ($\tau$ messages in total)

But we can improve on this!
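The cost of the trivial solution can be made concrete with a small Python simulation. This is a sketch under assumptions of our own: events are given as a sequence of site ids (a hypothetical input format), and the coordinator simply counts the bits it receives.

```python
def trivial_countdown(events, threshold):
    """Trivial countdown: every observed event is sent to the coordinator.

    events: site ids, one per observed event (hypothetical input format).
    Returns (index of the event at which the threshold is detected, messages),
    or (None, messages) if the threshold is never reached.
    """
    total = 0       # coordinator's exact count
    messages = 0
    for t, _site in enumerate(events):
        messages += 1   # the observing site sends one bit
        total += 1
        if total >= threshold:
            return t, messages
    return None, messages

# Four sites observing events in round-robin order, threshold 12.
print(trivial_countdown([i % 4 for i in range(40)], 12))  # (11, 12)
```

Detection is exact, but the number of messages always equals the number of events counted.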

A First Approach

Idea: there are many events at each site before the threshold is reached

With $k$ sites and threshold $\tau$, at least one site must see $\tau/k$ items before the threshold is reached, so every site waits until it has seen at least $\tau/k$ items before reporting to the coordinator

After receiving a report from an observer, the coordinator updates the remaining slack and informs all nodes

The total communication is
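The first approach can be simulated in Python as below. The sketch follows the algorithm steps given in the support slide "A First Approach (long Ver.)", but the message accounting is our assumption: each synchronization counts one report, a poll of all $k$ sites (request and reply), and a broadcast of the new bound. The function name and input format are hypothetical.

```python
def first_approach(events, threshold, k):
    """Slack-based countdown sketch (hypothetical name and input format).

    events: site ids in 0..k-1, one per observed event, in arrival order.
    Returns (index of the event at which the threshold is detected, messages),
    or (None, messages) if the threshold is never reached.
    """
    reported = 0                     # exact count known to the coordinator
    local = [0] * k                  # items each site has seen since the last sync
    bound = max(1, threshold // k)   # per-site reporting bound, tau/k initially
    messages = 0
    for t, site in enumerate(events):
        local[site] += 1
        if local[site] >= bound:
            # The site reports; the coordinator polls all k sites
            # (one request and one reply each) to learn the exact total.
            messages += 1 + 2 * k
            reported += sum(local)
            local = [0] * k
            if reported >= threshold:
                return t, messages
            # New slack; broadcast the new per-site bound to all k sites.
            slack = threshold - reported
            bound = max(1, slack // k)
            messages += k
    return None, messages

# Four sites receiving events in round-robin order, threshold 1000.
print(first_approach([i % 4 for i in range(1200)], 1000, 4))  # (999, 48)
```

Between synchronizations every local count stays below $\lfloor \text{slack}/k \rfloor$, so the event that brings the total to the threshold always triggers a report and detection is exact. On this input it takes 48 messages instead of 1000.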

A Quadratic Improvement

Idea: wait for more updates before reporting to the coordinator

The protocol runs in rounds

In round $j$, all nodes wait to receive a round-specific number of items before reporting to the coordinator

The coordinator starts round $j+1$ after receiving a given number of messages

The total communication is
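A possible instantiation of the round-based protocol, as a hedged Python sketch. The slide's exact per-round parameters are not preserved, so these choices are assumptions: in each round, every site sends a one-bit message each time it sees $b = \lfloor s/(2k) \rfloor$ new items, where $s$ is the slack at the start of the round; the coordinator ends the round after $k$ messages, polls the exact counts, and recomputes $b$; once $b$ would be 0, every item is reported individually.

```python
def round_countdown(events, threshold, k):
    """Round-based countdown sketch with assumed parameters.

    Returns (index of the detecting event, messages, completed rounds),
    or (None, messages, rounds) if the threshold is never reached.
    """
    known = 0                 # exact global count learned at the last round boundary
    round_items = [0] * k     # items each site has seen in the current round
    since_msg = [0] * k       # items since the site's last one-bit message
    round_msgs = 0
    messages = 0
    rounds = 0
    b = (threshold - known) // (2 * k)   # items per message in this round
    for t, site in enumerate(events):
        if b == 0:
            # Final phase (slack < 2k): report every item individually.
            messages += 1
            known += 1
            if known >= threshold:
                return t, messages, rounds
            continue
        round_items[site] += 1
        since_msg[site] += 1
        if since_msg[site] == b:
            since_msg[site] = 0
            messages += 1
            round_msgs += 1
            if round_msgs == k:
                # End of round: poll exact counts (k requests + k replies),
                # then broadcast the next batch size (k messages).
                known += sum(round_items)
                messages += 3 * k
                round_items = [0] * k
                since_msg = [0] * k
                round_msgs = 0
                rounds += 1
                b = (threshold - known) // (2 * k)
    return None, messages, rounds

print(round_countdown([i % 4 for i in range(12000)], 10000, 4))  # (9999, 196, 12)
```

With these parameters fewer than $2kb \le s$ items can arrive before a round ends, so the threshold is never crossed unnoticed, while each round consumes roughly half of the slack; the number of rounds therefore grows roughly logarithmically in $\tau/k$ (12 rounds and 196 messages for $\tau = 10000$, $k = 4$ here).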


Monitoring Entropy


Goal: monitoring non-monotone functions

Let $f_i$ denote the number of occurrences of item $i$

Let $m = \sum_i f_i$ denote the total number of items

The union of the input streams implicitly defines a probability distribution given by $p_i = f_i / m$

The goal is to monitor the entropy of this distribution, $H = -\sum_i p_i \log p_i$
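For reference, the monitored quantity can be computed centrally as below. This Python sketch assumes base-2 logarithms (entropy in bits) and represents each site's stream as a list of hashable items; it computes the exact value that a distributed protocol would track approximately, not the protocol itself.

```python
import math
from collections import Counter

def global_entropy(site_streams):
    """Entropy in bits of p_i = f_i / m over the union of all site streams."""
    counts = Counter()
    for stream in site_streams:
        counts.update(stream)          # f_i: occurrences of item i across all sites
    m = sum(counts.values())           # total number of items
    if m == 0:
        return 0.0
    return -sum((f / m) * math.log2(f / m) for f in counts.values())

print(global_entropy([["a", "b"], ["a", "b"]]))  # 1.0
```

Because $H$ can both rise and fall as items arrive, it is not monotone, which is why a plain countdown does not suffice.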

Entropy Protocol

The protocol proceeds in multiple rounds

In the first round, the coordinator collects a constant number of items from the sites

In each subsequent round, the coordinator:
- Computes the parameter
- Runs the approximate countdown protocol with this parameter
- Collects the frequency distribution from all sites and computes the current entropy


The Geometric Approach

The Geometric Approach (1/2)

Goal: monitoring arbitrary non-linear threshold functions

A geometric fact:

Idea: break down the test of whether the monitored function is above or below its threshold into local conditions that each site can check

The Geometric Approach (2/2)

Each site checks whether its sphere is monochromatic, i.e., whether the monitored function is on the same side of the threshold at every point of the sphere

When all the constraints are upheld:
- The query result remains unchanged
- No communication is required

When a constraint is violated:
- New data is gathered from the streams
- New constraints are set on the streams
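The local check can be sketched in Python for one function where it has a closed form. Two assumptions here are ours: a site's sphere is taken to be the ball whose diameter runs from the estimate vector $e$ to the site's drift vector $u$ (the usual formulation of the geometric method), and the monitored function is the hypothetical choice $f(x) = \|x\|$, whose minimum and maximum over a ball with center $c$ and radius $r$ are $\max(0, \|c\| - r)$ and $\|c\| + r$.

```python
import math

def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in x))

def sphere_is_monochromatic(e, u, threshold):
    """Local check for the hypothetical monitoring function f(x) = ||x||.

    The sphere is the ball whose diameter runs from the estimate vector e
    to the site's drift vector u. It is monochromatic if f stays on one
    side of the threshold everywhere in that ball.
    """
    center = [(a + b) / 2 for a, b in zip(e, u)]
    radius = norm([b - a for a, b in zip(e, u)]) / 2
    c = norm(center)
    min_f = max(0.0, c - radius)   # smallest norm of any point in the ball
    max_f = c + radius             # largest norm of any point in the ball
    return max_f <= threshold or min_f > threshold

print(sphere_is_monochromatic([1.0, 0.0], [2.0, 0.0], 5.0))  # True: whole ball below 5
print(sphere_is_monochromatic([1.0, 0.0], [6.0, 0.0], 5.0))  # False: ball straddles 5
```

If every site's check returns True, no communication is needed; a False at any site is the constraint violation that triggers gathering new data. For general $f$, deciding monochromaticity can be much harder than this closed form.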


Sampling


Given inputs of total size $n$, draw a sample of size $s$ that is uniform over all subsets of size $s$

Sampling applications:
- Approximate query answering
- Query planning
- Number of distinct elements
- Heavy hitters

Sampling cases:
- Infinite windows
- Sliding windows
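One standard way to draw such a sample, and the idea behind the random weights in the infinite-window protocol, is to tag every item with an independent uniform random weight and keep the $s$ items with the smallest weights: all orderings of the weights are equally likely, so every subset of size $s$ is equally likely to be kept. A minimal centralized Python sketch (the function name is hypothetical):

```python
import heapq
import random

def bottom_s_sample(items, s, seed=0):
    """Uniform sample of s items without replacement (bottom-s sampling)."""
    rng = random.Random(seed)
    # Tag each item with an independent uniform weight; the index breaks
    # ties so that the items themselves never need to be comparable.
    weighted = [(rng.random(), idx, item) for idx, item in enumerate(items)]
    return [item for _, _, item in heapq.nsmallest(s, weighted)]

print(bottom_s_sample(range(100), 5))
```

If fewer than $s$ items exist, all of them are returned.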

Infinite Windows (1/2)

Each site associates a random weight with each observation

The coordinator maintains the following variables, where $s$ is the sample size:
- The sample: the set of items whose weight is no more than $u$
- The weight $u$: the $s$-th smallest weight seen so far in the system

Each site maintains only its own local copy of the $s$-th smallest weight

Infinite Windows (2/2)

Protocol outline:
- Each site sends to the coordinator any element whose weight is smaller than (its local copy of) $u$, the $s$-th smallest weight
- The coordinator updates the sample and $u$ if the weight of the received item is smaller than $u$
- The coordinator replies to the site with the current value of $u$
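The infinite-window protocol can be simulated as follows. This Python sketch makes assumptions of its own: weights are uniform on [0, 1), $u$ is 1.0 until the coordinator holds $s$ items, arrivals are interleaved round-robin across sites, $s \ge 1$, and every site-to-coordinator message and every reply counts as one message. Sites only ever hold stale copies of $u$ that are larger than or equal to the true value, so they never discard an item that belongs in the sample; the coordinator's sample is therefore exactly the $s$ smallest weights seen so far.

```python
import heapq
import random

def distributed_sample(site_streams, s, seed=0):
    """Sketch of distributed bottom-s sampling over an infinite window.

    site_streams[i] lists the items observed at site i; arrivals are
    interleaved round-robin across sites (a hypothetical arrival order).
    Returns (sample, messages).
    """
    rng = random.Random(seed)
    k = len(site_streams)
    local_u = [1.0] * k   # each site's possibly stale copy of u
    heap = []             # coordinator: max-heap of (-weight, item), s smallest weights
    messages = 0
    for i in range(max(len(st) for st in site_streams)):
        for site in range(k):
            if i >= len(site_streams[site]):
                continue
            item = site_streams[site][i]
            w = rng.random()              # random weight in [0, 1)
            if w >= local_u[site]:
                continue                  # cannot enter the sample: stay silent
            messages += 1                 # site -> coordinator
            if len(heap) < s:
                heapq.heappush(heap, (-w, item))
            elif w < -heap[0][0]:
                heapq.heapreplace(heap, (-w, item))
            # u is the s-th smallest weight so far (1.0 until s items are held).
            local_u[site] = -heap[0][0] if len(heap) == s else 1.0
            messages += 1                 # coordinator -> site: current u
    return [item for _, item in heap], messages

streams = [[(site, j) for j in range(1000)] for site in range(4)]
sample, msgs = distributed_sample(streams, 10, seed=42)
print(sorted(sample), msgs)
```

Most of the 4000 items are filtered locally, so only a small fraction of them ever reach the coordinator.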


Thank You :)


Support Slides

A First Approach (Long Version)

Idea: there are many events at each site before the threshold is reached; with $k$ sites and threshold $\tau$, at least one site must see $\tau/k$ items before the threshold is reached

Algorithm steps:
- Initially, each site reports to the coordinator whenever its number of observed items exceeds $\tau/k$
- The coordinator computes the current slack from the sum of all local counts: $\tau - n$, where $n$ is the current count
- Each site sets a new upper bound on its local count based on this slack

The total communication is

Approximate Countdown

Improve the cost by approximating the answer

Let $\varepsilon$ be the approximation parameter and $\tau$ the threshold:
- Report 0 if count $\le (1-\varepsilon)\tau$
- Report 1 if count $\ge \tau$

Similar to the previous approach, but now terminate when the bound on the unreported count reaches the approximation tolerance

The number of rounds is reduced to

The total communication is
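A hedged Python sketch of the approximate variant, reusing the round structure of the quadratic improvement with assumed parameters (one message per $\lfloor s/(2k) \rfloor$ items, rounds of $k$ messages). It reports 1 as soon as the exact count learned at a round boundary exceeds $(1-\varepsilon)\tau$. Because the count never reaches $\tau$ inside a round, this respects both rules: it never reports 1 while the count is at most $(1-\varepsilon)\tau$, and it reports 1 by the time the count reaches $\tau$.

```python
def approx_countdown(events, threshold, k, eps):
    """Approximate countdown sketch with assumed round parameters.

    Returns (index of the event at which 1 is reported, messages),
    or (None, messages) if the count never gets close enough.
    """
    known = 0                 # exact count learned at the last round boundary
    round_items = [0] * k
    since_msg = [0] * k
    round_msgs = 0
    messages = 0

    def close_enough():
        # True once known > (1 - eps) * threshold, so reporting 1 is allowed.
        return threshold - known < eps * threshold

    b = threshold // (2 * k)
    for t, site in enumerate(events):
        if b == 0:
            # Small slack: report every item individually.
            messages += 1
            known += 1
            if close_enough():
                return t, messages
            continue
        round_items[site] += 1
        since_msg[site] += 1
        if since_msg[site] == b:
            since_msg[site] = 0
            messages += 1
            round_msgs += 1
            if round_msgs == k:
                known += sum(round_items)
                messages += 2 * k          # poll exact counts
                if close_enough():
                    return t, messages
                messages += k              # broadcast the next batch size
                round_items = [0] * k
                since_msg = [0] * k
                round_msgs = 0
                b = (threshold - known) // (2 * k)
    return None, messages

print(approx_countdown([i % 4 for i in range(12000)], 10000, 4, 0.1))  # (9371, 60)
```

The protocol stops after four rounds, at a count of 9372, which lies between $(1-\varepsilon)\tau = 9000$ and $\tau = 10000$.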

Randomized Countdown Protocol (1/2)

If the number of sites grows very large, the cost will be high

Allow the algorithm to give a wrong answer with small probability

Randomization reduces the dependency on the number of sites, controlled by a failure-probability parameter

Randomized Countdown Protocol (2/2)

With a randomization parameter determined by the analysis:
- Each site collects its observations
- With a given probability, it sends a message; otherwise it remains silent
- The coordinator waits until it receives a given number of messages, then terminates

The total communication cost is
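A simplified Python sketch of the randomized idea. The slide's exact parameters are not preserved, so this version is an assumption: every event (rather than a batch of observations) independently triggers a one-bit message with a hypothetical probability $p$, and the coordinator stops after $\lceil p\tau \rceil$ messages, i.e. when its unbiased estimate (messages divided by $p$) reaches $\tau$.

```python
import math
import random

def randomized_countdown(events, threshold, p, seed=0):
    """Randomized countdown sketch with a hypothetical send probability p.

    Each observed event independently triggers a one-bit message with
    probability p; otherwise the site stays silent. The coordinator stops
    after ceil(p * threshold) messages.
    Returns (index of the terminating event, messages), or (None, messages).
    """
    rng = random.Random(seed)
    target = math.ceil(p * threshold)
    messages = 0
    for t, _site in enumerate(events):
        if rng.random() < p:
            messages += 1
            if messages >= target:
                return t, messages
    return None, messages

t, msgs = randomized_countdown([i % 4 for i in range(20000)], 10000, 0.125, seed=3)
print(t + 1, msgs)   # detection count near 10000, using 1250 messages
```

The expected number of messages is about $p\tau$ whatever the number of sites, which illustrates how randomization removes the dependency on $k$; the price is that the detection point is only close to $\tau$ with high probability.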

Geometric Computational Model (1/2)

Each site $i$ has a $d$-dimensional vector $v_i$, called its local statistics vector

Let $w_1, \dots, w_k$ be weights assigned to the streams

Define the global statistics vector as the weighted average of the $v_i$s: $v = \sum_i w_i v_i / \sum_i w_i$

Let $f$ be an arbitrary monitoring function

Goal: determine, at any given time and for a threshold $T$, whether $f(v) > T$
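The computational model translates directly into Python. In this sketch the weights, the vectors, and the monitoring function $f(x) = \|x\|^2$ are hypothetical examples; the code computes the centralized reference answer that the geometric method maintains without collecting every $v_i$.

```python
def global_vector(local_vectors, weights):
    """Global statistics vector v = sum_i w_i v_i / sum_i w_i."""
    total = sum(weights)
    d = len(local_vectors[0])
    return [sum(w * v[j] for v, w in zip(local_vectors, weights)) / total
            for j in range(d)]

def squared_norm(x):
    """Hypothetical non-linear monitoring function f(x) = ||x||^2."""
    return sum(c * c for c in x)

def above_threshold(local_vectors, weights, f, threshold):
    """Centralized reference answer to the monitoring question f(v) > T."""
    return f(global_vector(local_vectors, weights)) > threshold

sites = [[1.0, 0.0], [3.0, 2.0]]
print(global_vector(sites, [1.0, 1.0]))                       # [2.0, 1.0]
print(above_threshold(sites, [1.0, 1.0], squared_norm, 4.0))  # True, since f(v) = 5.0
```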

Geometric Computational Model (2/2)

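$v_i'$ is the last statistics vector collected from node $i$; the coordinator constructs the estimate vector $e = \sum_i w_i v_i' / \sum_i w_i$, the weighted average of the $v_i'$ (with $w_i$ the stream weights)

Each node also maintains the following parameters, where $v_i$ is its current local statistics vector:
- Delta vector: $\Delta v_i = v_i - v_i'$
- Drift vector: $u_i = e + \Delta v_i$

Decomposing the threshold test relies on the following fact: the global vector $v$ is the weighted average of the drift vectors, $v = \sum_i w_i u_i / \sum_i w_i$, and therefore lies in their convex hull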
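A small Python check of the decomposition, assuming the usual definitions $\Delta v_i = v_i - v_i'$ and $u_i = e + \Delta v_i$: the weighted average of the drift vectors equals the current global vector, since $\sum_i w_i u_i / \sum_i w_i = e + v - e = v$. The example vectors are arbitrary.

```python
def weighted_average(vectors, weights):
    """Weighted average sum_i w_i x_i / sum_i w_i of equal-length vectors."""
    total = sum(weights)
    return [sum(w * x[j] for x, w in zip(vectors, weights)) / total
            for j in range(len(vectors[0]))]

def drift_vectors(current, last_sent, weights):
    """Return the estimate e and each node's drift vector u_i = e + (v_i - v_i')."""
    e = weighted_average(last_sent, weights)
    drifts = []
    for v, v_old in zip(current, last_sent):
        delta = [a - b for a, b in zip(v, v_old)]        # delta vector
        drifts.append([x + d for x, d in zip(e, delta)])
    return e, drifts

current = [[2.0, 1.0], [0.0, 4.0]]     # v_i
last_sent = [[1.0, 0.0], [3.0, 2.0]]   # v_i'
weights = [1.0, 1.0]
e, u = drift_vectors(current, last_sent, weights)
print(e, u)                                # [2.0, 1.0] [[3.0, 2.0], [-1.0, 3.0]]
print(weighted_average(u, weights))        # [1.0, 2.5]
print(weighted_average(current, weights))  # [1.0, 2.5], the global vector v
```

Each node can compute its own $u_i$ from $e$ and its local data alone, which is what makes purely local constraints possible.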

Geometric Interpretation

Geometric interpretation: the convex hull of the drift vectors can be fully covered by spheres, where node $i$'s sphere has radius $\|\Delta v_i\|/2$ and is centered at $e + \Delta v_i/2$

[Figure: the estimate vector e and the drift vectors u1, u2, u3, u4, u5]