On Developing and Operating of Data Elasticity Management Process

Tien-Dung Nguyen, Hong-Linh Truong, Georgiana Copil, Duc-Hung Le, Daniel Moldovan, and Schahram Dustdar

Distributed Systems Group, TU Wien

[email protected]

http://dsg.tuwien.ac.at/staff/truong

ICSOC 2015, 17 Nov, Goa, India

Page 2: On Developing and Operating of Data Elasticity Management Process

Outline

Motivation

Approach

Elasticity Model for Data Asset

Generating and Operating Data Elasticity Management Process

Prototype, Evaluation and Lessons learned

Conclusions and Future Work


Page 3: On Developing and Operating of Data Elasticity Management Process

Motivation

Data-as-a-Service (DaaS) and Analytics-as-a-Service (AaaS) providers

run data analytics workflows that produce data assets for different consumers

Data assets can be “sold” based on

quality of data (e.g., accuracy)

performance of the execution of the data analytics workflow

cost of computation and data resources

DaaS/AaaS consumers typically want

data assets that meet their expectations of quality of data, performance and cost

Page 4: On Developing and Operating of Data Elasticity Management Process

Example: Ensuring Quality of GPS Data Asset

GPS data of motorbikes in Ho Chi Minh City

GPS Data-as-a-Service (DaaS) provider and several DaaS consumers (e.g., taxi company)

Expectation:

Quality of data:

Vehicle location accuracy >= 81%

Vehicle speed accuracy >= 81%

Performance:

deliveryTime <= 55s

Cost <= €0.09
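The expectation above can be read as a set of thresholds that a delivered data asset either meets or misses. A minimal sketch of such a check follows; the class and function names are hypothetical and are not part of the prototype, only the threshold values come from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Expectation:
    """Consumer expectation; the default values are the ones on the slide."""
    min_location_accuracy: float = 81.0  # percent
    min_speed_accuracy: float = 81.0     # percent
    max_delivery_time_s: float = 55.0    # seconds
    max_cost_eur: float = 0.09           # euro


def violations(exp: Expectation, location_accuracy: float, speed_accuracy: float,
               delivery_time_s: float, cost_eur: float) -> List[str]:
    """Return the names of the expectations that a delivered data asset misses."""
    missed = []
    if location_accuracy < exp.min_location_accuracy:
        missed.append("vehicle location accuracy")
    if speed_accuracy < exp.min_speed_accuracy:
        missed.append("vehicle speed accuracy")
    if delivery_time_s > exp.max_delivery_time_s:
        missed.append("deliveryTime")
    if cost_eur > exp.max_cost_eur:
        missed.append("cost")
    return missed
```

An empty list means the asset satisfies the expectation; a non-empty list names what an elasticity management process would have to adjust.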

Page 5: On Developing and Operating of Data Elasticity Management Process

GPS Data Asset Example

Timestamp                     DeviceID  Longitude  Latitude    Speed  Local Area  Estimated Speed in local area
Wed Sep 10 07:45:00 ICT 2014  51B00552  10.660332  106.779396  0      CMT8-VTS    10.25
Wed Sep 10 07:45:23 ICT 2014  51C29797  10.749635  106.67208   24     CMT8-DBP    30.5
Wed Sep 10 07:46:24 ICT 2014  51B01907  10.877548  106.64205   0      CMT8-AC     21.1

Image source: http://vnexpress.net/tin-tuc/thoi-su/hang-nghin-xe-ket-cung-ngay-dau-thu-phi-cau-binh-trieu-1-2858465.html

Page 6: On Developing and Operating of Data Elasticity Management Process

Motivation – Research Questions

How to support the Data Elasticity Management Process for different “business models”?

Improve quality of data: monitor/adjust quality of data

Guarantee performance: scale the services that monitor/adjust quality of data in/out to adapt to changes in the number of consumers

Cost: minimize computation cost while still gaining some benefit

Not in the current focus:

quality of service and cost in service composition

refining/replacing/extending analytic tasks in the data analytics workflow

Page 7: On Developing and Operating of Data Elasticity Management Process

Approach

Generating and executing the Data Elasticity Management Process from information in the data analytics workflow and the expected quality of results of the data asset.

Page 8: On Developing and Operating of Data Elasticity Management Process

Approach

Algorithm to generate data elasticity management process

Inputs:

Data Analytics Workflows

Quality of Results

Primitive Actions

Output:

Data Elasticity Management Process

Runtime Environment for Data Elasticity Management Process

Page 9: On Developing and Operating of Data Elasticity Management Process

Quality of Results

Specifies the expectation of quality of data, performance and cost of data asset

Quality of Results (QoR) (analytics quality)

Hong Linh Truong, Schahram Dustdar: Principles of Software-Defined Elastic Systems for Big Data Analytics. IC2E 2014: 562-567

Page 10: On Developing and Operating of Data Elasticity Management Process

Example: Quality of Results

!at.ac.tuwien.dsg.depic.common.entity.qor.QoRModel
dataAssetForm: XML
listOfMetrics:
- !at.ac.tuwien.dsg.depic.common.entity.qor.QoRMetric
  name: vehicleArc
  listOfRanges:
  - !at.ac.tuwien.dsg.depic.common.entity.qor.Range
    rangeID: vehicleArc_co1
    toValue: 80.0
  - !at.ac.tuwien.dsg.depic.common.entity.qor.Range
    fromValue: 81.0
    rangeID: vehicleArc_co2
    toValue: 100.0
  unit: '%'
listOfQElements:
- !at.ac.tuwien.dsg.depic.common.entity.qor.QElement
  listOfRanges:
  - speedArc_co1
  - vehicleArc_co2
  - deliveryTime_co2
  price: 0.01
  qElementID: qElement4
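A QoR metric partitions its value space into named ranges, and a QElement prices a combination of such ranges. The sketch below shows how a measured value could be mapped to a range ID. It mirrors only the two vehicleArc ranges from the example; the Python names are hypothetical, and the lower bound of vehicleArc_co1 is assumed to be 0.0 because the example omits fromValue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Range:
    range_id: str
    from_value: float
    to_value: float


# Ranges copied from the example; from_value of the first range is an assumption.
VEHICLE_ARC_RANGES = [
    Range("vehicleArc_co1", 0.0, 80.0),
    Range("vehicleArc_co2", 81.0, 100.0),
]


def find_range(value: float, ranges: List[Range]) -> Optional[str]:
    """Return the ID of the first range containing value, or None if no range matches."""
    for r in ranges:
        if r.from_value <= value <= r.to_value:
            return r.range_id
    return None
```

With these bounds, a value such as 80.5 matches no range, so a real model would need either contiguous bounds or a rounding rule for measured values.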

Page 11: On Developing and Operating of Data Elasticity Management Process

Primitive Action for Data Elasticity Management

Primitive actions for monitoring and adjustment

capture information about actions that adjust quality of data and performance

provided by data experts, profiling tools, benchmarking, etc.

Page 12: On Developing and Operating of Data Elasticity Management Process

Example: Primitive Action

- !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.AdjustmentAction
  actionID: VAA
  actionName: VAA
  artifact:
    name: VAA
    description: adjust vehicle Accuracy
    location: …./salsa/upload/files/jun/artifact_sh/VAA.sh
    restfulAPI: /VAA/rest/control
    type: sh
  associatedQoRMetric: vehicleArc
  listOfAdjustmentCases:
  - !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.AdjustmentCase
    estimatedResult:
      conditionID: vehicleArc_c1
      lowerBound: 81.0
      metricName: vehicleArc
      upperBound: 100.0
    listOfParameters:
    - !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
      parameterName: longtitudeIndex
      type: int
      value: 0
    - !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
      parameterName: latitudeIndex
      type: int
      value: 1
    - !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
      parameterName: speedIndex
      type: int
      value: 2
  listOfPrerequisiteActionIDs: []
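Each adjustment case states the result an action is estimated to achieve for its metric. One way this information could be used is to pick the actions whose estimated result falls inside a consumer's expected range. The sketch below illustrates that idea only; the classes are simplified, hypothetical stand-ins for the prototype's entities, and the selection rule is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdjustmentCase:
    metric_name: str
    lower_bound: float
    upper_bound: float


@dataclass
class AdjustmentAction:
    action_id: str
    associated_metric: str
    cases: List[AdjustmentCase]


def candidate_actions(actions: List[AdjustmentAction], metric: str,
                      expected_lower: float, expected_upper: float) -> List[str]:
    """IDs of actions with a case whose estimated result lies within the expected range."""
    result = []
    for action in actions:
        if action.associated_metric != metric:
            continue
        for case in action.cases:
            if (case.metric_name == metric
                    and case.lower_bound >= expected_lower
                    and case.upper_bound <= expected_upper):
                result.append(action.action_id)
                break  # one matching case is enough for this action
    return result


# The VAA action from the example, reduced to the fields used here.
VAA = AdjustmentAction("VAA", "vehicleArc", [AdjustmentCase("vehicleArc", 81.0, 100.0)])
```

For an expected vehicleArc range of 81 to 100, VAA qualifies; for a stricter range of 90 to 100 it does not, because its estimated lower bound is only 81.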

Page 13: On Developing and Operating of Data Elasticity Management Process

Generating Data Elasticity Management Process

Data Asset: associated with an expected state and a current state

Adjustment Process: a set of primitive actions that change the state of a data asset

Resource Control Plan: computational resources for executing adjustment processes
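The three notions above suggest a simple reading of process generation: compare the current state with the expected state, collect the actions for the metrics that are off, and order them by their prerequisites. The sketch below follows that reading and is not the algorithm from the paper. The action ID SAA and the prerequisite relation between SAA and VAA are hypothetical, and resource control plans are left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Tuple


def unmet_metrics(current: Dict[str, float],
                  expected: Dict[str, Tuple[float, float]]) -> List[str]:
    """Metrics whose current value is missing or outside the expected (lower, upper) range."""
    return [m for m, (lo, hi) in expected.items()
            if not lo <= current.get(m, float("-inf")) <= hi]


def adjustment_process(current: Dict[str, float],
                       expected: Dict[str, Tuple[float, float]],
                       action_for_metric: Dict[str, str],
                       prerequisites: Dict[str, List[str]]) -> List[str]:
    """Ordered action IDs that move the data asset toward its expected state."""
    needed = {action_for_metric[m] for m in unmet_metrics(current, expected)
              if m in action_for_metric}
    # Pull in prerequisite actions transitively.
    stack = list(needed)
    while stack:
        action = stack.pop()
        for pre in prerequisites.get(action, []):
            if pre not in needed:
                needed.add(pre)
                stack.append(pre)
    # Each action maps to its predecessors, so prerequisites are emitted first.
    graph = {a: set(prerequisites.get(a, [])) for a in needed}
    return list(TopologicalSorter(graph).static_order())
```

For example, with vehicleArc at 70 and speedArc at 60, both expected in the range 81 to 100, and SAA assumed to require VAA, the generated process is VAA followed by SAA.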

Page 14: On Developing and Operating of Data Elasticity Management Process

Using Data Elasticity Management Process to ensure QoR

Page 15: On Developing and Operating of Data Elasticity Management Process

Using Data Elasticity Management Process

Our current focus:

Enrichment of data assets at the end of the data analytics process

A general principle:

store data into a data buffer

perform actions on data in the buffer before delivering the data to customer

Data buffers can have different plugins interfacing to different types of databases
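The buffer principle above (store, act on the buffered data, then deliver) can be sketched with a small plugin interface. All names below are hypothetical. The in-memory plugin stands in for a database-backed one, and the filtering action is only an illustration of an enrichment step.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Action = Callable[[List[Record]], List[Record]]


class InMemoryPlugin:
    """Hypothetical storage plugin; a real one would wrap a database client."""

    def __init__(self) -> None:
        self._rows: List[Record] = []

    def save(self, rows: List[Record]) -> None:
        self._rows = list(rows)

    def load(self) -> List[Record]:
        return list(self._rows)


class DataBuffer:
    """Holds data between the analytics workflow and delivery to the consumer."""

    def __init__(self, plugin) -> None:
        self._plugin = plugin

    def store(self, rows: List[Record]) -> None:
        """Append new rows to whatever the plugin already holds."""
        self._plugin.save(self._plugin.load() + list(rows))

    def apply(self, action: Action) -> None:
        """Run a monitoring/adjustment action over the buffered rows."""
        self._plugin.save(action(self._plugin.load()))

    def deliver(self) -> List[Record]:
        """Hand the buffered rows to the consumer and empty the buffer."""
        rows = self._plugin.load()
        self._plugin.save([])
        return rows


def drop_negative_speed(rows: List[Record]) -> List[Record]:
    """Example action: discard records with an impossible speed value."""
    return [r for r in rows if r["speed"] >= 0]
```

Swapping InMemoryPlugin for a class with the same save and load methods is all that is needed to target a different store.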


Page 16: On Developing and Operating of Data Elasticity Management Process

Prototype Implementation

Prototype source: https://github.com/tuwiendsg/EPICS/tree/master/depic

Page 17: On Developing and Operating of Data Elasticity Management Process

Experiments: Setup

Scenario:

Near-real-time GPS data of vehicles in Ho Chi Minh City. Data size 1.17GB.

Data source emulated by sending historical GPS data to scalable message-oriented middleware (MOM - Apache ActiveMQ)

5 concurrent DaaS consumers

Infrastructure:

1 VM (7GB RAM, 4 vCPUs, 40GB disk) for Tooling, Orchestrator, Data Asset Loader and Data Analytics Workflow Management

4 VMs (1GB RAM, 1 vCPU, 40GB disk) for monitoring/adjustment services at the beginning

Unit costs:

speedAcr and vehicleAcr: EUR 0.0002

Machines in the data analytics phase: EUR 0.104

Machines in the enrichment phase: EUR 0.026

Page 18: On Developing and Operating of Data Elasticity Management Process

Elasticity Process Management Generation

Page 19: On Developing and Operating of Data Elasticity Management Process

Elasticity Process Management Generation

We rely on domain experts to provide knowledge about primitive actions for data assets (and services for monitoring and enrichment)

Missing information leads to incomplete generation

IT experts may modify the process before its deployment

Current generation algorithms do not deal with complex dependencies among QoR and other metadata about data analytics functions

New classes of algorithms are needed

Page 20: On Developing and Operating of Data Elasticity Management Process

Data quality, processing cost, and data asset costs

Figure 6: Data Analytics Workflow for provisioning GPS data

Provider:

• Vehicle location accuracy (%): [0 20], [21 40], [41 60], [61 80], [81 100]

• Vehicle speed accuracy (%): [0 20], [21 40], [41 60], [61 80], [81 100]

• Delivery time (s): [0 54], [55 120]

• Assumed cost function:

Consumers’ expectations of the data asset

• Case 1:

– Vehicle location accuracy > 81%

– Vehicle speed accuracy > 81%

– Delivery time < 55s

• Case 2:

– Vehicle location accuracy > 61%

– Vehicle speed accuracy > 61%

– Delivery time < 55s

Page 21: On Developing and Operating of Data Elasticity Management Process

Data quality, processing cost, and data asset costs

f5 (w_processingTime = 0.5, w_vehicleAcr = 0.25, w_speedAcr = 0.25)

Consumers share some common cost (the data analytics part)

We cannot determine the best cost model, but we report and act according to generated/pre-defined plans

The runtime of data analytics is crucial: huge problems in integration and performance/cost consequences

Resource planning for data quality enrichment is a very hard problem

Page 22: On Developing and Operating of Data Elasticity Management Process

Conclusions and Future Work

Conclusions:

Supporting DaaS/AaaS to develop and enforce appropriate cost elasticity models for data assets

Elasticity of quality, cost and resource usage focusing on data aspects

But current techniques are still at an early stage

Future work:

Domain knowledge integration

Tool for the development and integration of primitive actions

Realistic data asset cost models in DaaS

Data asset cost studies

System performance & data asset quality enrichment/evaluation

Different performance and cost models and dependencies

Algorithms: new classes of management processes

Thanks! and Questions?