bigdata processing in the cloud – guest lecture - university of applied sciences rapperswil -...

Post on 02-Dec-2014

© 2013 IBM Corporation1

BigData processing in the cloud – Guest Lecture - University of Applied Sciences Rapperswil - 29.4.14

Romeo Kienzler

IBM Innovation Center

Source: http://res.sys-con.com/story/oct12/2398990/Cloud_BigData_468.jpg

What is BIG data?

Big Data

Hadoop

Business Intelligence

Data Warehouse

Map-Reduce → Hadoop → BigInsights

BigData UseCases

● Google Index
  ● 40 x 10^9 = 40,000,000,000 => 40 billion pages indexed
  ● Will break 100 PB barrier soon
  ● Derived from MapReduce
  ● Now “Caffeine”, based on “Percolator”
    ● Incremental vs. batch
    ● In-memory vs. disk

BigData UseCases

● CERN LHC
  ● 25 petabytes per year
● Facebook
  ● Hive data warehouse
  ● 300 PB, growing 600 TB/day
  ● > 100k servers
● Genomics
● Enterprises
  ● Data center analytics (log files, OS/NW monitors, ...)
  ● Predictive maintenance, cybersecurity
  ● Social media analytics
  ● DWH offload
  ● Call Detail Record (CDR) data preservation

http://www.balthasar-glaettli.ch/vorratsdaten/

BigData Analytics

BigData Analytics – Predictive Analytics

"sometimes it's not who has the best algorithm that wins; it's who has the most data."

(C) Google Inc.

The Unreasonable Effectiveness of Data¹

¹http://www.csee.wvu.edu/~gidoretto/courses/2011-fall-cp/reading/TheUnreasonable%20EffectivenessofData_IEEE_IS2009.pdf

No sampling => work with the full dataset => p-values/z-scores no longer needed

Data Parallelism

Aggregated Bandwidth between CPU, Main Memory and Hard Drive

1 TB (at 10 GByte/s)

- 1 Node - 100 sec

- 10 Nodes - 10 sec

- 100 Nodes - 1 sec

- 1000 Nodes - 100 msec
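
The numbers above follow from splitting the data evenly across nodes that each read at the quoted rate. A minimal sketch, assuming perfectly even partitioning and no coordination overhead (the function name `scan_time_seconds` is illustrative and not from the lecture):

```python
def scan_time_seconds(data_bytes: float, nodes: int, bytes_per_sec_per_node: float) -> float:
    """Time to scan a data set when every node reads an equal share in parallel."""
    if nodes < 1:
        raise ValueError("need at least one node")
    # Aggregated bandwidth grows linearly with the node count (idealised assumption).
    return data_bytes / (nodes * bytes_per_sec_per_node)


TB = 10**12
GB = 10**9

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"{n:>4} nodes: {scan_time_seconds(1 * TB, n, 10 * GB):g} s")
```

Real clusters fall short of this ideal because of skewed partitions, network transfers and scheduling overhead, so the values are best read as lower bounds.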

Fault Tolerance / Commodity Hardware

AMD Turion II Neo N40L (2x 1.5 GHz / 2 MB / 15 W), 8 GB RAM,

3 TB Seagate Barracuda 7200.14

< CHF 500

100 K => 200 x (2, 4, 3) => 400 cores, 1.6 TB RAM, 200 TB HD

MTBF ~ 365 d > 1.5 d

Source: http://www.cloudcomputingpatterns.org/Watchdog
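
A back-of-the-envelope check of the cluster figures on this slide. This is a sketch under stated assumptions, not the lecturer's own calculation: 200 nodes bought for a budget of 100 K at CHF 500 each, the 8 GB RAM and 3 TB disk quoted for one node, HDFS's default replication factor of 3 as one possible reading of the 200 TB figure, and independent node failures at a constant rate. Under that simple failure model the cluster sees a node failure roughly every 1.8 days, the same order of magnitude as the 1.5 d on the slide; the slide does not say how its value was derived.

```python
NODES = 200              # assumed: budget of 100 K divided by CHF 500 per node
CORES_PER_NODE = 2       # dual-core CPU quoted on the slide
RAM_GB_PER_NODE = 8
DISK_TB_PER_NODE = 3
REPLICATION = 3          # assumption: HDFS default replication factor
NODE_MTBF_DAYS = 365     # per-node mean time between failures from the slide

total_cores = NODES * CORES_PER_NODE
total_ram_tb = NODES * RAM_GB_PER_NODE / 1000
raw_disk_tb = NODES * DISK_TB_PER_NODE
usable_disk_tb = raw_disk_tb / REPLICATION

# With independent failures at a constant rate, failure rates add up,
# so the expected time between failures anywhere in the cluster shrinks by 1/NODES.
cluster_mtbf_days = NODE_MTBF_DAYS / NODES

if __name__ == "__main__":
    print(f"cores: {total_cores}, RAM: {total_ram_tb} TB")
    print(f"disk: {raw_disk_tb} TB raw, {usable_disk_tb:g} TB usable")
    print(f"expected time between node failures: {cluster_mtbf_days:.2f} days")
```

The point of the slide survives any reasonable choice of numbers: at this scale hardware failure is routine, so fault tolerance has to live in the software layer.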

HDFS – Hadoop Distributed File System

Map-Reduce

Source: http://www.cloudcomputingpatterns.org/Map_Reduce
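
The Map-Reduce slides in this part of the deck are diagrams whose content did not survive in the transcript. As a stand-in, here is a minimal single-process sketch of the pattern using the classic word-count example. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative; this is not Hadoop's API, and a real framework would run the map and reduce calls on many nodes in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data in the cloud", "big data analytics"]))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be distributed across nodes without shared state.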

What role is the cloud playing here?

“Elastic” Scale-Out

Source: http://www.cloudcomputingpatterns.org/Continuously_Changing_Workload

“Elastic” Scale-Out

of

CPU Cores, Storage, Memory

“Elastic” Scale-Out

linear

Source: http://www.cloudcomputingpatterns.org/Elastic_Platform
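
One way to read "linear" elastic scale-out is that the number of provisioned nodes tracks the workload proportionally. A minimal sketch under that assumption; `nodes_needed`, the requests-per-second unit and the sample workload are hypothetical and not taken from the lecture:

```python
import math


def nodes_needed(load_rps: float, capacity_rps_per_node: float, min_nodes: int = 1) -> int:
    """Smallest node count that can serve the load, never going below min_nodes."""
    if capacity_rps_per_node <= 0:
        raise ValueError("per-node capacity must be positive")
    return max(min_nodes, math.ceil(load_rps / capacity_rps_per_node))


if __name__ == "__main__":
    # Hypothetical workload over a day, in requests per second.
    workload = [50, 120, 950, 2400, 700, 80]
    for load in workload:
        print(f"load {load:>5} rps -> {nodes_needed(load, 100)} nodes")
```

In a cloud setting the result of such a calculation would drive provisioning and de-provisioning, so capacity is paid for only while the workload needs it.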

BigData Scale-Out

How do Databases Scale-Out?

Shared Disk Architectures

Shared Nothing Architectures
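
In a shared-nothing design each node owns a disjoint part of the data and no disk or memory is shared between nodes. A minimal sketch of one common way to assign ownership, hash partitioning by key. The `ShardedStore` class and `owner_node` function are illustrative names, and in-process dictionaries stand in for separate machines:

```python
import zlib
from typing import Any, Dict, List, Optional


def owner_node(key: str, num_nodes: int) -> int:
    """Deterministically map a key to the single node that owns it."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


class ShardedStore:
    """Key-value store whose data is split across independent node-local dicts."""

    def __init__(self, num_nodes: int) -> None:
        if num_nodes < 1:
            raise ValueError("need at least one node")
        self.nodes: List[Dict[str, Any]] = [{} for _ in range(num_nodes)]

    def put(self, key: str, value: Any) -> None:
        # Only the owning node is touched; the others are not involved.
        self.nodes[owner_node(key, len(self.nodes))][key] = value

    def get(self, key: str) -> Optional[Any]:
        return self.nodes[owner_node(key, len(self.nodes))].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_nodes=4)
    for user in ("alice", "bob", "carol", "dave"):
        store.put(user, {"name": user})
    print([len(node) for node in store.nodes])
```

Simple modulo hashing moves most keys when the node count changes; production systems typically use schemes such as consistent hashing to limit that reshuffling.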

Born on the cloud Databases

Source: http://www.constructioncloudcomputing.com/wp-content/uploads/2010/10/dreamstime_7360880-480x300.jpg

Source: http://www.cloudcomputingpatterns.org/Execution_Environment

Google App Engine

Google App Engine is a Platform as a Service (PaaS) offering that lets you build and run applications on Google’s infrastructure. App Engine applications are easy to build, easy to maintain, and easy to scale as your traffic and data storage needs change. With App Engine, there are no servers for you to maintain. You simply upload your application and it’s ready to go.

Source: http://www.cloudcomputingpatterns.org/Platform_as_a_Service_%28PaaS%29

Google App Engine Database Services

IBM BlueMix

BlueMix is a Platform as a Service (PaaS) cloud based on Cloud Foundry. It offers enterprise-grade services enriched with IBM software and is hosted at SoftLayer.

IBM BlueMix, a Cloud Foundry runtime

Diagram (labels only): Code, Runtime, Framework, Droplet; Linux VM, Container; Services: SQL, Push, SSO, ...

● Summary

● BigData is born on the cloud

● Cloud facilitates resource provisioning, configuration and deployment

● Highly innovative area

● Technology

● UseCases

● Links

● http://en.wikipedia.org/wiki/MapReduce

● http://www.se-radio.net/2013/12/episode-199-michael-stonebraker/

● Sign up for the free BlueMix beta

● http://bluemix.net

● Come to the BlueMix Days

● http://bit.ly/1lsIY8J

● Use our software

● BigInsights: http://www.ibm.com/software/data/infosphere/biginsights/quick-start/
