Overview of big data in cloud computing

BigData in Cloud computing Viet-Trung Tran @Vietstack Sunday 1 February 15

Upload: viet-trung-tran

Post on 15-Jul-2015

Page 1: Overview of big data in cloud computing

BigData in Cloud computing

Viet-Trung Tran

@Vietstack

Sunday 1 February 15

Page 2: Overview of big data in cloud computing

Bio

Viet-Trung Tran

[email protected]

https://www.facebook.com/groups/BigDataStartUp/

SoICT, Trendiction S.A Luxembourg, Microsoft Research Cambridge, INRIA France, BKAV

Page 10: Overview of big data in cloud computing

Google Trends

Google MapReduce paper 2014

Page 11: Overview of big data in cloud computing

BigData in science

Page 13: Overview of big data in cloud computing

The Data Science: The 4th Paradigm for Scientific Discovery

Thousand years ago: Description of natural phenomena

Last few hundred years: Newton’s laws, Maxwell’s equations…

$$\left(\frac{\dot{a}}{a}\right)^{2} = \frac{4\pi G\rho}{3} - K\frac{c^{2}}{a^{2}}$$

Last few decades: Simulation of complex phenomena

Today and the Future

Credits: Dennis Gannon

Page 14: Overview of big data in cloud computing

What’s BigData

Data has always been big. What differs now, compared with the past, is the sheer scale and accessibility of data, a direct result of the speed at which data can now be processed. Big Data is therefore an all-encompassing term for any collection of data sets so large that they were once difficult to process.

Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.

Page 15: Overview of big data in cloud computing

Data mining -> BigData mining?

Page 16: Overview of big data in cloud computing

Simplified BigData stack

Data analytics & visualization

Data processing frameworks (Streaming, MapReduce, BSP model)

Data management systems: BlobSeer

Page 17: Overview of big data in cloud computing

BigData management

Page 18: Overview of big data in cloud computing

NoSQL

Page 19: Overview of big data in cloud computing

The last 25 years of commercial DBMS development can be summed up in a single phrase: "one size fits all". This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimized for business data processing) has been used to support many data-centric applications with widely varying characteristics and requirements. In this paper, we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end

Page 21: Overview of big data in cloud computing

Why NoSQL

“The whole point of seeking alternatives [to RDBMS systems] is that you need to solve a problem that relational databases are a bad fit for.” Eric Evans - Rackspace

ACID does not scale

Web applications have different needs

Scalability

Elasticity

Flexible schema / semi-structured data

Geographically distributed

Web applications do not always need

Transactions

Strong consistency

Complex queries
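
To make the "flexible schema" point concrete, here is a minimal sketch of a schema-less document store. The class `TinyDocumentStore` and its methods are hypothetical names invented for illustration; this is not the API of any real NoSQL system, and it deliberately offers no transactions and no complex queries.

```python
from typing import Any, Dict, List


class TinyDocumentStore:
    """Hypothetical in-memory document store: no fixed schema, no transactions."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        # Any dict is accepted; documents need not share the same fields.
        self._docs[key] = dict(doc)

    def get(self, key: str) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the stored document.
        return dict(self._docs[key])

    def find(self, field: str, value: Any) -> List[str]:
        # Simple equality lookup; documents lacking the field are skipped.
        return sorted(
            key
            for key, doc in self._docs.items()
            if field in doc and doc[field] == value
        )


store = TinyDocumentStore()
store.put("u1", {"name": "An", "city": "Hanoi"})
store.put("u2", {"name": "Binh", "city": "Hanoi", "followers": 120})
store.put("u3", {"name": "Chi", "tags": ["cloud", "hadoop"]})  # no "city" field

print(store.find("city", "Hanoi"))  # ['u1', 'u2']
```

Each record carries its own set of fields, which is the semi-structured case a fixed relational schema handles awkwardly.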

Page 24: Overview of big data in cloud computing

Big Data processing engines

MapReduce
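
As a rough illustration of the MapReduce model named on this slide, the sketch below runs a word count through map, shuffle, and reduce phases in a single process. The function names are assumptions chosen for clarity; a real engine such as Hadoop distributes these phases across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: collapse the values of one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data in the cloud", "big data processing"])
print(counts["big"], counts["cloud"])  # 2 1
```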

Page 26: Overview of big data in cloud computing

Stream processing
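
One common stream processing pattern is counting events per fixed time window. The sketch below is a single-process illustration under two stated assumptions: events arrive in timestamp order, and windows with no events are not emitted. The name `tumbling_window_counts` is hypothetical and not taken from any streaming framework.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, event key)


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts) each time a window closes."""
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The event belongs to a later window: flush the finished one.
            yield current_start, dict(counts)
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        # Flush the last, still-open window when the stream ends.
        yield current_start, dict(counts)


events = [(0, "click"), (3, "view"), (9, "click"), (12, "click"), (25, "view")]
results = list(tumbling_window_counts(events, window_seconds=10))
print(results)
```

Unlike a batch job, the generator produces each window's result as soon as the next window starts, without waiting for the whole input.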

Page 27: Overview of big data in cloud computing

Large scale graph processing
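
Large scale graph engines often follow the vertex-centric BSP model listed earlier in the stack: in each superstep, vertices read incoming messages, update their state, and send messages to neighbours. The sketch below finds connected components by propagating the smallest vertex label. It is a single-process illustration; the function name is hypothetical, and the graph is assumed to be undirected with every neighbour also present as a key.

```python
from typing import Dict, List


def connected_components(graph: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each vertex with the smallest vertex id in its component."""
    labels = {vertex: vertex for vertex in graph}

    # Superstep 0: every vertex sends its own label to its neighbours.
    inbox: Dict[str, List[str]] = {vertex: [] for vertex in graph}
    for vertex, neighbours in graph.items():
        for neighbour in neighbours:
            inbox[neighbour].append(labels[vertex])

    # Later supersteps: run until no vertex receives a message.
    while any(inbox.values()):
        outbox: Dict[str, List[str]] = {vertex: [] for vertex in graph}
        for vertex, messages in inbox.items():
            if not messages:
                continue
            smallest = min(messages)
            if smallest < labels[vertex]:
                # Adopt the smaller label and tell the neighbours about it.
                labels[vertex] = smallest
                for neighbour in graph[vertex]:
                    outbox[neighbour].append(smallest)
        inbox = outbox
    return labels


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"]}
components = connected_components(graph)
print(components)
```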

Page 28: Overview of big data in cloud computing

2012

Page 29: Overview of big data in cloud computing

2014

Page 30: Overview of big data in cloud computing

Vanilla Hadoop ecosystem

Page 31: Overview of big data in cloud computing

Hortonworks Data Platform

Page 33: Overview of big data in cloud computing

Hadoop ecosystem: Microsoft HDInsight

Page 34: Overview of big data in cloud computing

BigData & Cloud

A Match made in heaven?

Page 37: Overview of big data in cloud computing

Cloud features

Page 38: Overview of big data in cloud computing

Data in the Clouds

IDC estimates that by 2020 about 40% of data globally will be touched by cloud computing.

Cloud adoption is accelerating – the amount of data stored in Amazon Web Services (AWS) S3 cloud storage jumped from 262 billion objects in 2010 to over 1 trillion objects by mid-2012.

Page 39: Overview of big data in cloud computing

While enterprises often keep their most sensitive data in-house, huge volumes of data such as social media data may be located externally.

Data that is too big to process is also too big to transfer anywhere, so it is the analytical program that needs to be moved, not the data.
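
A back-of-the-envelope calculation shows why moving the data is impractical. The sketch below estimates transfer time on a dedicated link. The 1 Gbit/s link speed and the decimal definition of a petabyte are assumptions chosen for illustration, and protocol overhead is ignored.

```python
def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealised transfer time in days: no protocol overhead, link fully used."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


PETABYTE = 10 ** 15  # decimal petabyte, in bytes
TERABYTE = 10 ** 12  # decimal terabyte, in bytes
GIGABIT_PER_SECOND = 10 ** 9  # assumed link speed, in bits per second

print(round(transfer_days(PETABYTE, GIGABIT_PER_SECOND), 1))  # 92.6
print(round(transfer_days(TERABYTE, GIGABIT_PER_SECOND) * 24, 1))  # 2.2 (hours)
```

Under these assumptions one petabyte takes about three months to ship, whereas an analytics program is typically small enough to send in seconds.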

"You don't want to be shipping terabytes and petabytes around,". "Keep the data where it is, and then you move the analytics … to that data."

Page 40: Overview of big data in cloud computing

Cloud enables BigData

Some of the first adopters of big data in cloud computing are users that deployed Hadoop clusters in highly scalable and elastic clouds: IBM, Azure, AWS

Cloud computing democratizes big data – any enterprise can now work with unstructured data at a huge scale.

Analytics-as-a-service (AaaS) models for cloud-based big data analytics

Page 41: Overview of big data in cloud computing

Drivers for big data on cloud adoption

Cost reduction

Managing cloud-based big data is cost-effective, scalable, and fast to build.

Rapid provisioning/time to market

Faster provisioning is important for big data applications because the value of data declines quickly over time.

Flexibility/scalability

Big data analysis, especially in the life sciences industry, requires huge compute power for a brief amount of time. For this type of analysis, servers need to be provisioned in minutes.

Page 44: Overview of big data in cloud computing

BigData is not always Cloud-appropriate

Low latency realtime data

Virtualization overhead

Multi-tenancy overhead

Scalability

Lack of cloud computing features to support RDBMS

Availability

“Rain cloud” incorporates clouds

Data integrity/privacy

Data can only be accessed by authorized users

Currently, encryption is utilized by most researchers to ensure data privacy in the cloud
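
The integrity half of "data integrity/privacy" can be sketched with a keyed hash: the data owner stores a tag alongside each block and recomputes it on read to detect tampering. Note that this is HMAC-SHA256 for integrity only, not the encryption the slide mentions for privacy. The key and sample data are hypothetical, and key management is out of scope.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    # Compute an HMAC-SHA256 tag over the data with a secret key.
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, tag: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign(data, key), tag)


key = b"hypothetical-secret-key"  # assumption: kept by the data owner, not the cloud
block = b"sensor-id,reading\n42,7.3\n"
tag = sign(block, key)

print(verify(block, key, tag))  # True
print(verify(block + b"tampered", key, tag))  # False
```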

Page 45: Overview of big data in cloud computing

NoSQL vs SQL in the Cloud

Page 46: Overview of big data in cloud computing

Data security/performance trade-offs

Distributed nodes

Distributed data

Internode communication

RPC over TCP/IP?

Encrypted IO?

Security/performance trade-offs

Page 47: Overview of big data in cloud computing

Cloud Architecture for Big Data

Resource scheduling and SLA for Big Data on Cloud

Storage and computation management in Cloud for Big Data

Large-scale data intensive workflow in support of Big Data processing on Cloud

Multiple source data processing and integration on Cloud

Virtualisation and visualisation of Big Data on Cloud

Fault tolerance and reliability for Big Data processing on Cloud

MapReduce with Cloud for Big Data processing

Distributed file storage system with Cloud for Big Data

Inter-cloud technology for Big Data

Security, privacy and trust in Big Data processing on Cloud

Green, energy-efficient models and sustainability issues in Cloud for Big Data processing

Cloud infrastructure for social networking with Big Data

User friendly Cloud access for Big Data processing

Innovative Cloud data centre networking for Big Data

Wireless and mobility support in Cloud data centre for Big Data

Page 48: Overview of big data in cloud computing

BigData use cases

Page 49: Overview of big data in cloud computing

Security Analytics

Page 52: Overview of big data in cloud computing

Thank you for your attention

Page 56: Overview of big data in cloud computing

Classification of BigData

Page 57: Overview of big data in cloud computing

Relationship between Cloud and BigData

Page 60: Overview of big data in cloud computing

Open research issues

Data staging

Distributed storage systems: NoSQL, NewSQL

Data analysis

Data security

Page 61: Overview of big data in cloud computing

Unfortunately, it’s not all good news.

DB administrators don’t have an easy ride. The NoSQL databases that have appeared in the last few years, with their key-value pairs, document stores, and missing schemas,
