on developing and operating of data elasticity management process
TRANSCRIPT
![Page 1: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/1.jpg)
On Developing and Operating of Data
Elasticity Management Process
Tien-Dung Nguyen, Hong-Linh Truong, Georgiana Copil, Duc-
Hung Le, Daniel Moldovan, and Schahram Dustdar
Distributed Systems Group, TU Wien
http://dsg.tuwien.ac.at/staff/truong
ICSOC 2015, 17 Nov, Goa, India 1
![Page 2: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/2.jpg)
Outline
Motivation
Approach
Elasticity Model for Data Asset
Generating and Operating Data Elasticity
Management Process
Prototype, Evaluation and Lessons learned
Conclusions and Future Work
2ICSOC 2015, 17 Nov, Goa, India
![Page 3: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/3.jpg)
Motivation
Data-as-a-Service (DaaS) and Analytics-as-a-Service (AaaS) Provider
data analytic workflows producing data assets for different consumers
Data assets can be “sold” based on
quality of data (e.g., accuracy)
performance of the execution of data analytic workflow
cost for computation and data resource.
DaaS/AaaS consumers typically want
data assets with expectation of quality of data, performance and cost
3ICSOC 2015, 17 Nov, Goa, India
![Page 4: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/4.jpg)
Example: Ensuring Quality of GPS Data
Asset
GPS data of motorbikes in Ho Chi Minh city
GPS Data-as-a-Service (DaaS) provider and
several DaaS consumers (e.g., Taxi company)
4ICSOC 2015, 17 Nov, Goa, India
Expectation:
Quality of data:
Vehicle location accuracy>=
81%
Vehicle speed accuracy>=
81%
Performance
deliveryTime <=55s
Cost <= €0.09
![Page 5: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/5.jpg)
GPS Data Asset Example
Timestamp DeviceID Longitude Latitude Speed Local
Area
Estimated
Speed in local
area
Wed Sep 10
07:45:00 ICT
2014
51B00552 10.660332 106.779396 0 CMT8-
VTS
10.25
Wed Sep 10
07:45:23 ICT
2014
51C29797 10.749635 106.67208 24 CMT8-
DBP
30.5
Wed Sep 10
07:46:24 ICT
2014
51B01907 10.877548 106.64205 0 CMT8-
AC
21.1
5ICSOC 2015, 17 Nov, Goa, India
Image source:
http://vnexpress.net/tin-
tuc/thoi-su/hang-nghin-
xe-ket-cung-ngay-dau-
thu-phi-cau-binh-trieu-1-
2858465.html
![Page 6: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/6.jpg)
Motivation – Research Questions
How to support Data Elasticity Management
Process for different “business models”?
Improve quality of data: monitor/adjust quality of data
Guarantee performance: scaling in/out services to
monitor/adjust quality of data to adapt with changes of
number of consumers
Cost: minimize computation cost and get some benefits
They are not in the current focus
quality of services and cost service composition
refining/replacing/extending analytic tasks in data
analytics workflow
6ICSOC 2015, 17 Nov, Goa, India
![Page 7: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/7.jpg)
Approach
7
Generating and execution Data Elasticity Management
Process from information in data analytics workflow
and expected quality of results of the data asset.
ICSOC 2015, 17 Nov, Goa, India
![Page 8: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/8.jpg)
Approach
Algorithm to generate data elasticity
management process
Inputs:
Data Analytics Workflows
Quality of Results
Primitive Actions
Output
Data Elasticity Management Process
Runtime Environment for Data Elasticity
Management Process
8ICSOC 2015, 17 Nov, Goa, India
![Page 9: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/9.jpg)
Quality of Results
Specifies the expectation of quality of data,
performance and cost of data asset
Quality of Results (QoR) (analytics quality)
9
Hong Linh Truong, Schahram Dustdar: Principles of Software-Defined Elastic Systems for Big Data
Analytics. IC2E 2014: 562-567
Hong Linh Truong, Schahram Dustdar: Principles of Software-Defined Elastic Systems for Big Data
Analytics. IC2E 2014: 562-567
ICSOC 2015, 17 Nov, Goa, India
![Page 10: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/10.jpg)
Example: Quality of Results!at.ac.tuwien.dsg.depic.common.entity.qor.QoRModel
dataAssetForm: XML
listOfMetrics:
-!at.ac.tuwien.dsg.depic.common.entity.qor.QoRMetric
-!at.ac.tuwien.dsg.depic.common.entity.qor.QoRMetric
name: vehicleArc
listOfRanges:
- !at.ac.tuwien.dsg.depic.common.entity.qor.Range
rangeID: vehicleArc_co1
toValue: 80.0
- !at.ac.tuwien.dsg.depic.common.entity.qor.Range
fromValue: 81.0
rangeID: vehicleArc_co2
toValue: 100.0
unit: '%'
10ICSOC 2015, 17 Nov, Goa, India
listOfQElements:
-!at.ac.tuwien.dsg.depic.common.entity.qor.QElement
listOfRanges:
- speedArc_co1
- vehicleArc_co2
- deliveryTime_co2
price: 0.01
qElementID: qElement4
listOfQElements:
-!at.ac.tuwien.dsg.depic.common.entity.qor.QElement
listOfRanges:
- speedArc_co1
- vehicleArc_co2
- deliveryTime_co2
price: 0.01
qElementID: qElement4
![Page 11: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/11.jpg)
Primitive Action for
Data Elasticity Management
11
Primitive actions for monitoring and adjustment
to capture information of action to adjust quality of data,
performance ?
filled by data experts, profiling tool, benchmarking, etc.
ICSOC 2015, 17 Nov, Goa, India
![Page 12: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/12.jpg)
-!at.ac.tuwien.dsg.depic.common.entity.primitiveaction.AdjustmentAction
actionID: VAA actionName: VAA
artifact:
name: VAA
description: adjust vehicle Accuracy
location: …./salsa/upload/files/jun/artifact_sh/VAA.sh
restfulAPI: /VAA/rest/control
type: sh
associatedQoRMetric: vehicleArc
Example: Primitive Action
listOfAdjustmentCases:
- !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.AdjustmentCase
estimatedResult:
conditionID: vehicleArc_c1
lowerBound: 81.0
metricName: vehicleArc
upperBound: 100.0
listOfParameters:
- !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
parameterName: longtitudeIndex
type: int
value: 0
- !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
parameterName: latitudeIndex
type: int
value: 1
- !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
parameterName: speedIndex
type: int
value: 2
listOfPrerequisiteActionIDs: []
listOfAdjustmentCases:
- !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.AdjustmentCase
estimatedResult:
conditionID: vehicleArc_c1
lowerBound: 81.0
metricName: vehicleArc
upperBound: 100.0
listOfParameters:
- !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
parameterName: longtitudeIndex
type: int
value: 0
- !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
parameterName: latitudeIndex
type: int
value: 1
- !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
parameterName: speedIndex
type: int
value: 2
listOfPrerequisiteActionIDs: []
12ICSOC 2015, 17 Nov, Goa, India
![Page 13: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/13.jpg)
Generating Data Elasticity
Management Process
ICSOC 2015, 17 Nov, Goa, India 13
Data Asset: associated with an expected state and current state
Adjustment Process: a set of primitive actions that change the state of an
data asset
Resource Control Plan: computational resources for executing adjustment
processes
![Page 14: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/14.jpg)
Using Data Elasticity
Management Process to ensure QoR
14ICSOC 2015, 17 Nov, Goa, India
![Page 15: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/15.jpg)
Using Data Elasticity
Management Process
Our current focus:
Enrichment of data assets at the end of the data
analytics process
A general principle:
store data into a data buffer
perform actions on data in the buffer before delivering
the data to customer
Data buffers can have different plugins
interfacing to different types of databases
15ICSOC 2015, 17 Nov, Goa, India
![Page 16: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/16.jpg)
Prototype Implementation
ICSOC 2015, 17 Nov, Goa, India 16
Prototype Source.
https://github.com/tuwiendsg/EPICS/tree/master/depic
![Page 17: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/17.jpg)
Experiments: Setup
Scenario Near-real time GPS data of vehicles in HoChiMinh City. Data size
1.17GB.
Emulating data source by sending historical GPS data to scalable message oriented middleware (MOM - Apache ActiveMQ)
5 concurrent DaaS consumers
Infrastructure 1 VM (7GB RAM, 4 vCPUs, 40GB Disk) for Tooling, Orchestrator,
Data Asset Loader and Data Analytics Workflow Management
4 VMs (1GB RAM, 1 vCPU, 40GB) for monitoring/adjustment services at the beginning
Unit costs For speedAcr and vehicleAcr as EUR 0.0002
machines in the data analytics phase as 0.104 EUR
Machines in enrichment phase as EUR0.026
17ICSOC 2015, 17 Nov, Goa, India
![Page 18: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/18.jpg)
Elasticity Process Management
Generation
ICSOC 2015, 17 Nov, Goa, India 18
![Page 19: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/19.jpg)
Elasticity Process Management
Generation
We rely on domain expert to provide knowledge about primitive
actions for data assets (and services to monitor and enrichment)
Missing information leads to incomplete generation
IT experts may modify the process before ist deployment
Current generation algorithms do not deal with complex
dependencies among QoR and other metadata about data analytics
functions
New classes of algorithms
ICSOC 2015, 17 Nov, Goa, India 19
![Page 20: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/20.jpg)
Data quality, processing cost, and
data asset costs
20
Figure 6: Data Analytics Workflow
for provisioning GPS data
Provider:
• Vehicle location accuracy (%):
[0 20], [21 40] [41 60], [61 80], [81 100]
• Vehicle speed accuracy (%):
[0 20], [21 40] [41 60], [61 80], [81 100]
• Delivery time (s): [0 54], [55 120]
• Assumed cost function:
Consumers’ expectations of the data asset
• Case 1:– Vehicle location accuracy > 81%
– Vehicle speed accuracy > 81 %
– Delivery Time < 55s
• Case 2:– Vehicle location accuracy > 61%
– Vehicle Speed accuracy > 61 %
– Delivery Time < 55s
ICSOC 2015, 17 Nov, Goa, India
![Page 21: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/21.jpg)
Data quality, processing cost, and
data asset costs
ICSOC 2015, 17 Nov, Goa, India 21
f5 (w_processingTime =0.5,
w_vechicleAcr =0.25,
w_speedAcr =0.25)
Consumers shared some
common cost (data analytic
part)
We cannot determine the best cost model but report and act according
to generated/pre-defined plans
The runtime of data analytics is crucial:
huge problems in integration and performance/cost
consequences
Resource plan for data quality enrichment is a very hard problem
![Page 22: On Developing and Operating of Data Elasticity Management Process](https://reader031.vdocuments.site/reader031/viewer/2022030307/58ec54df1a28ab4d118b45c9/html5/thumbnails/22.jpg)
Conclusions and Future Work
Conclusions Supporting DaaS/AaaS to develop and enforce appropriate cost
elasticity models for data assets
Elasticity of quality, cost and resource usage focusing on data aspects
But current techniques are still at an early stage
Future work Domain knowledge integration
Tool for the development and integration of primitive actions
Realistic data asset cost models in DaaS
Data asset cost studies System performance & data asset quality enrichment/evaluation
Different performance and cost models and dependencies
Algorithms: new classes of management processes
22ICSOC 2015, 17 Nov, Goa, India
Thanks! and Questions?Thanks! and Questions?