Posted on 18-Dec-2015


Page 1: Data Management in Cloud Workflow Systems Dong Yuan Faculty of Information and Communication Technology Swinburne University of Technology

Data Management in Cloud Workflow Systems

Dong Yuan

Faculty of Information and Communication Technology

Swinburne University of Technology

Page 2

Outline

> Cloud Computing & Cloud Workflow Systems

– Introduction to cloud workflow systems and a brief overview of grid workflow systems

> Data Management in Cloud Workflow Systems

– New features and research issues

> Cloud Computing Environment and SwinDeW-C

– Our simulation environment and cloud workflow system

Page 3

> Cloud Computing & Cloud Workflow Systems

Page 4

Cloud Computing

> Some new features of cloud computing

– Large data centres with cheap hardware

– Virtualisation

– Internet-based and service-oriented architecture (SOA)

• SaaS, PaaS, IaaS

– Market-driven, with explicit cost models

> Cloud computing research has emerged in many areas

– Data mining, databases, parallel computing and scientific applications, content delivery

Page 5

Cloud Workflow Systems

> Grid workflow systems

– Kepler, Pegasus, Taverna, MOTEUR, Triana, ASKALON

– Gridbus, GridFlow

> Build-time: focus on data modelling.

– Kepler: actor-oriented data modelling; Taverna: Scufl; ASKALON: AGWL

> Runtime: adopt Data Grid systems

– Grid DataFarm, GDMP, GridDB, SRB, RLS (P-RLS), GSB, DaltOn

Page 6

Cloud Workflow Systems

> Architecture

– Based on Internet

– Platform as a Service

– More distributed

[Figure: cloud workflow system architecture, with layers labelled Web Portal, Platform and Fabric (unified resources). Other labels: User, Workflow Application, Workflow Specification, Virtual Machine, Cloud Service, Cloud Service Provider, Local Data Centre, Global Cloud.]

Page 7

> Data Management in Cloud Workflow Systems

Page 8

Data Management in Cloud Workflow Systems

> New features and challenges

– Automatic and independent of users

– Cost driven

• computation cost, storage cost, data transfer cost

– Data dependency

• Task – data, data – data, derivation

> Some research issues

– Data partitioning, placement, replication, synchronisation, provenance, catalogue, metadata, consistency, reduction, storage, movement, etc.
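The cost-driven feature above can be illustrated with a toy cost model. The sketch below simply adds computation, storage and data transfer cost for each dataset; the `DatasetCost` fields, the monthly storage rate and the per-GB transfer rate are hypothetical assumptions, not the cost model used in this research or by any particular cloud provider.

```python
from dataclasses import dataclass


@dataclass
class DatasetCost:
    """Hypothetical cost figures for one dataset (one currency unit throughout)."""
    computation: float    # cost of the tasks that generate the dataset
    storage_rate: float   # storage cost per month
    months_stored: float  # how long the dataset is kept
    transfer_gb: float    # GB moved between data centres for this dataset
    transfer_rate: float  # transfer cost per GB


def total_cost(datasets: list[DatasetCost]) -> float:
    """Total system cost = computation + storage + data transfer, summed over datasets."""
    total = 0.0
    for d in datasets:
        total += d.computation
        total += d.storage_rate * d.months_stored
        total += d.transfer_gb * d.transfer_rate
    return total
```

Keeping the three cost components separate makes it easy to see which one a given data management decision (placement, replication, deletion) trades against the others.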

Page 9

Data Placement in Cloud Workflow Systems

> Data placement: deciding where to store application data among distributed data centres

> Aims:

– Reduce data movement

– Reduce task waiting time

> Strategy:

– Data dependency: dataset – dataset

– Build-time: place existing data; runtime: place generated data (including intermediate data)
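One way to act on dataset-dataset dependency is to co-locate datasets that are used by the same tasks, so fewer of them have to move at runtime. The following is a minimal greedy sketch, assuming dependency is counted as the number of tasks that read both datasets and that each data centre can hold a fixed number of datasets; the functions `dependency` and `place` are hypothetical and not necessarily the placement strategy used in SwinDeW-C.

```python
from itertools import combinations


def dependency(tasks: dict[str, set[str]]) -> dict[frozenset[str], int]:
    """Dataset-dataset dependency: how many tasks use both datasets of a pair."""
    dep: dict[frozenset[str], int] = {}
    for inputs in tasks.values():
        for a, b in combinations(sorted(inputs), 2):
            pair = frozenset((a, b))
            dep[pair] = dep.get(pair, 0) + 1
    return dep


def place(datasets: list[str], capacity: dict[str, int],
          tasks: dict[str, set[str]]) -> dict[str, str]:
    """Greedy placement: put each dataset in the data centre that already holds
    the datasets it depends on most, respecting a per-centre dataset limit."""
    dep = dependency(tasks)
    degree = {d: sum(n for pair, n in dep.items() if d in pair) for d in datasets}
    placement: dict[str, str] = {}
    load = {c: 0 for c in capacity}
    # Place the most strongly connected datasets first.
    for d in sorted(datasets, key=lambda x: degree[x], reverse=True):
        best, best_score = None, -1
        for centre, limit in capacity.items():
            if load[centre] >= limit:
                continue  # this data centre is full
            score = sum(dep.get(frozenset((d, other)), 0)
                        for other, where in placement.items() if where == centre)
            if score > best_score:
                best, best_score = centre, score
        if best is None:
            raise ValueError(f"no data centre has room for {d}")
        placement[d] = best
        load[best] += 1
    return placement
```

For example, with tasks `{"t1": {"a", "b"}, "t2": {"a", "b"}, "t3": {"c", "d"}}` and two data centres that each hold two datasets, `a` and `b` end up in one centre and `c` and `d` in the other. A greedy pass like this is cheap but depends on placement order, so it is only an illustration of the idea.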

Page 10

Data Replication in Cloud Workflow Systems

> Data replication: storing several copies of one dataset in different data centres

> Aims:

– Increase data safety (availability and durability)

– Faster data access

– Reduce data movement

> Strategy:

– Dynamic replication.
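Dynamic replication can be sketched as creating replicas in response to observed access patterns rather than fixing them in advance. The class below is a hypothetical illustration, assuming a dataset is copied to a data centre once that centre has read it remotely `threshold` times; a practical strategy would likely also weigh storage cost and remove replicas that stop being used.

```python
from collections import defaultdict


class ReplicaManager:
    """Hypothetical dynamic replication: copy a dataset to a data centre once
    that centre has read it remotely `threshold` times."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        # dataset -> set of data centres holding a copy
        self.replicas: dict[str, set[str]] = defaultdict(set)
        # (dataset, data centre) -> number of remote reads so far
        self.remote_reads: dict[tuple[str, str], int] = defaultdict(int)

    def add(self, dataset: str, centre: str) -> None:
        """Register an existing copy of `dataset` at `centre`."""
        self.replicas[dataset].add(centre)

    def read(self, dataset: str, centre: str) -> bool:
        """Record a read from `centre`; return True if it was served locally."""
        if centre in self.replicas[dataset]:
            return True
        key = (dataset, centre)
        self.remote_reads[key] += 1
        if self.remote_reads[key] >= self.threshold:
            self.replicas[dataset].add(centre)  # replicate for future reads
            del self.remote_reads[key]
        return False
```

With `threshold=2`, the first two reads of a dataset from a remote data centre are served remotely, and the third is served from the new local replica.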

Page 11

Intermediate Data Storage in Cloud Workflow Systems

> Intermediate data storage is especially important in scientific workflows

> Aim:

– Reduce system cost

> Strategy:

– Deleted intermediate data can be regenerated using data provenance information

– Selectively store some key intermediate datasets
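The trade-off between storing an intermediate dataset and regenerating it from provenance can be sketched for a simple linear chain of datasets. This is a greedy heuristic under stated assumptions (the original input is always stored, a deleted dataset is regenerated from its nearest stored predecessor, and all costs are per month); the names `Intermediate` and `choose_stored` are hypothetical, and this is not the storage strategy developed in this research.

```python
from dataclasses import dataclass


@dataclass
class Intermediate:
    """Hypothetical figures for one intermediate dataset in a linear chain."""
    name: str
    gen_cost: float        # computation cost to derive it from its predecessor
    storage_rate: float    # storage cost per month
    uses_per_month: float  # expected accesses per month


def choose_stored(chain: list[Intermediate]) -> list[str]:
    """Greedy heuristic over a chain d1 -> d2 -> ... (original input assumed stored).

    A deleted dataset must be regenerated from the nearest stored predecessor,
    so its regeneration cost accumulates along runs of deleted datasets.
    """
    stored: list[str] = []
    regen_prev = 0.0  # cost to regenerate the predecessor (0 if it is stored)
    for d in chain:
        regen = d.gen_cost + regen_prev
        # Store if a month of storage is cheaper than the expected regenerations.
        if d.storage_rate < d.uses_per_month * regen:
            stored.append(d.name)
            regen_prev = 0.0
        else:
            regen_prev = regen
    return stored
```

Because regeneration cost accumulates along runs of deleted datasets, a later dataset can be worth storing even when the ones before it are not.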

Page 12

> Cloud computing environment and SwinDeW-C

Page 13

Simulation Cloud

[Figure: simulation environment built on the Swinburne Cluster, in three layers: Physical Machines Layer, Virtual Machines Layer (VMware) and Applications Layer (SwinDeW-C), with data centres running Hadoop.]

Page 14

Web Portal

Page 15

Key system components of SwinDeW-C related to data management

[Figure: system components of SwinDeW-C. Modules: User Interface Module, Data Management Module, Flow Management Module, Task Management Module, Resource Management Module, …. Data Management Module components: Data Placement Component, Data Replication Component, Intermediate Data Storage Component, Data Catalogue. Other labelled components: Web Portal, Monitoring Component, Uploading Component, Meta-data Management Component, Provenance Data Collection, Process Repository, Scheduler.]

Page 16

End

> Questions?