20150630 kca big-data-with-cloud_output

Building Big Data Infrastructure with Cloud Service KKBOX/MOPCON ericpi

Upload: ericpi-bi

Post on 06-Aug-2015

TRANSCRIPT

Page 1: 20150630 kca big-data-with-cloud_output

Building Big Data Infrastructure with Cloud Service

KKBOX/MOPCON ericpi

Page 2: 20150630 kca big-data-with-cloud_output

About ericpi ( 畢瑄易 )

• KKBOX
– RD Center Senior Engineer
– COO Manager / KH Side Leader

• Mobile/Open/Platform CONference
– MOPCON 2012/2013/2014/2015 Organizer

• Kaohsiung Software Developer Group (KSDG)
– Organizer

Page 3: 20150630 kca big-data-with-cloud_output

When it comes to Big Data, many people have probably heard of the 3Vs

Page 4: 20150630 kca big-data-with-cloud_output

When it comes to Big Data, many people have probably heard of the 4Vs

Page 5: 20150630 kca big-data-with-cloud_output

IBM says the 4Vs are:

http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Page 6: 20150630 kca big-data-with-cloud_output

Others say the 4Vs should be

• volume
• variety
• velocity
• variability

Page 7: 20150630 kca big-data-with-cloud_output

One Big Data, each with its own interpretation?

Page 8: 20150630 kca big-data-with-cloud_output

Fortunately, one organization responsibly stepped up… NIST: http://bigdatawg.nist.gov/V1_output_docs.php

Page 9: 20150630 kca big-data-with-cloud_output

How the NIST WG defines Big Data

• Big Data refers to the inability of traditional data architectures to efficiently handle the new datasets.

• characteristics (4Vs)
– volume
– variety
– velocity
– variability

Page 10: 20150630 kca big-data-with-cloud_output

The NIST WG also emphasizes…

• Big Data consists of extensive datasets - primarily in the characteristics of volume, variety, velocity, and/or variability - that require a scalable architecture for efficient storage, manipulation, and analysis.

Page 11: 20150630 kca big-data-with-cloud_output

The NIST WG also emphasizes…

• The Big Data paradigm consists of the distribution of data systems across horizontally coupled, independent resources to achieve the scalability needed for the efficient processing of extensive datasets.
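
As a rough illustration of "distribution across horizontally coupled, independent resources", the sketch below spreads records over a fixed number of buckets by hashing each key. It is not taken from the slides or from the NIST documents: the names `node_for_key` and `partition`, the four-node setup, and the choice of MD5 as a stable hash are all assumptions made for this example.

```python
import hashlib


def node_for_key(key, num_nodes):
    """Pick a node index from a stable hash of the key (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(records, num_nodes):
    """Spread (key, value) records across num_nodes independent buckets."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in records:
        nodes[node_for_key(key, num_nodes)].append((key, value))
    return nodes


# Illustrative data only: 1000 made-up user keys spread over 4 "nodes".
records = [("user-%d" % i, i) for i in range(1000)]
nodes = partition(records, 4)
sizes = [len(bucket) for bucket in nodes]
print(sizes)
```

Each bucket can then be stored and processed independently. Note that plain modulo hashing remaps most keys whenever the node count changes, which is one reason real systems often use consistent hashing or range partitioning instead.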

Page 12: 20150630 kca big-data-with-cloud_output

NIST does not dwell on how big the data is or what hardware to use

Page 13: 20150630 kca big-data-with-cloud_output

"efficient" and "scalability", on the other hand, come up again and again

Page 14: 20150630 kca big-data-with-cloud_output

Today's Hot Topic 2 - Cloud Computing

• Essential Characteristics as defined by NIST
– Broad network access
– Resource pooling
– Rapid elasticity
– Measured service

Page 15: 20150630 kca big-data-with-cloud_output

Today's Hot Topic 2 - Cloud Computing

• Service Models as defined by NIST
– SaaS (Software as a Service)
– PaaS (Platform as a Service)
– IaaS (Infrastructure as a Service)

Page 16: 20150630 kca big-data-with-cloud_output

Today's Hot Topic 2 - Cloud Computing

• Service Models as defined by NIST
– SaaS (Software as a Service)
– PaaS (Platform as a Service)
– IaaS (Infrastructure as a Service)

Page 17: 20150630 kca big-data-with-cloud_output

Big Data and Cloud Computing

• From the data perspective
– Data storage volume
– Cross-datacenter access
– Off-site synchronization

Page 18: 20150630 kca big-data-with-cloud_output

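Big Data and Cloud Computing

• From the perspective of computing scale and infrastructure
– Hardware investment
– Extremely high compute load
– Distributed computing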

Page 19: 20150630 kca big-data-with-cloud_output

The foundational industry behind every innovative industry

Page 20: 20150630 kca big-data-with-cloud_output

The foundational industry behind every innovative industry

Software

Page 21: 20150630 kca big-data-with-cloud_output

2011/08 WSJ: Why Software Is Eating The World

http://www.wsj.com/articles/SB10001424053111903480904576512250915629460

Page 22: 20150630 kca big-data-with-cloud_output

In the NIST documents, 99% of the Big Data / Cloud Computing material is about defining software architecture

Page 23: 20150630 kca big-data-with-cloud_output

The scarce resource of the future - software engineers

Page 24: 20150630 kca big-data-with-cloud_output

The scarce resource of the future - data scientists

Page 25: 20150630 kca big-data-with-cloud_output

The most valuable resource of the future is - talent

Page 26: 20150630 kca big-data-with-cloud_output

Architecture in Practice - Diversity of Data Sources

• Finding the right data sources
– Website / Mobile Service
– Open Data
– Open API

• Data collection tools
– Centralized Log Management
– Spider / Crawler
– 3rd-party Application

Page 27: 20150630 kca big-data-with-cloud_output

Architecture in Practice - Diversity of Data Sources

• Corresponding cloud services
– IaaS: EC2
– PaaS: Elastic Beanstalk, Lambda

• Data collection tools
– Centralized Log Management
– Spider / Crawler
– 3rd-party Application

Page 28: 20150630 kca big-data-with-cloud_output

Architecture in Practice - Diversity of Data Types

• RDBMS
• NoSQL

Page 29: 20150630 kca big-data-with-cloud_output

Architecture in Practice - Diversity of Data Types

• RDBMS
• NoSQL

• Heterogeneous data storage
• NoSQL data modeling
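
To make the contrast between relational storage and NoSQL data modeling a little more concrete, here is a minimal sketch that denormalizes two relational-style "tables" into one self-contained document, as a document store might hold it. Everything in it is hypothetical and chosen only for illustration: the `users` and `plays` tables, the field names, and the `to_document` helper do not come from the slides.

```python
import json

# Relational style: normalized rows in separate tables, joined at query time.
# (Illustrative data only.)
users = [{"user_id": 1, "name": "alice"}]
plays = [
    {"user_id": 1, "song": "song-a", "ts": "2015-06-30T10:00:00"},
    {"user_id": 1, "song": "song-b", "ts": "2015-06-30T10:05:00"},
]


def to_document(user, play_rows):
    """Denormalize: embed a user's plays inside one self-contained document."""
    return {
        "_id": "user:%d" % user["user_id"],
        "name": user["name"],
        "plays": [
            {"song": p["song"], "ts": p["ts"]}
            for p in play_rows
            if p["user_id"] == user["user_id"]
        ],
    }


doc = to_document(users[0], plays)
print(json.dumps(doc, indent=2))
```

The trade-off is the usual one: the document can be read with a single key lookup and no join, at the cost of duplicating data and making some cross-user queries harder.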

Page 30: 20150630 kca big-data-with-cloud_output

Architecture in Practice - Diversity of Data Types

• RDBMS
• NoSQL

• Cloud options
– RDS
– SimpleDB, S3, DynamoDB
– @EC2: Cassandra, HBase, etc.

Page 31: 20150630 kca big-data-with-cloud_output

Architecture in Practice - Data Processing Throughput

• The mainstream approach to processing massive data: MapReduce

http://www.pinaldave.com/bimg/mapreduce.jpg
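
Since the original slide only shows a diagram, the following single-process sketch may help show the map, shuffle, and reduce phases in code. It uses the classic word-count example; the function names and the sample input are assumptions made here, and a real MapReduce framework would run the map and reduce steps in parallel across many machines rather than in one Python process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only.
sample_lines = ["big data big cloud", "cloud data"]
counts = word_count(sample_lines)
print(counts)
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the calls can be distributed freely, which is what lets the model scale horizontally.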

Page 32: 20150630 kca big-data-with-cloud_output

Architecture in Practice - Data Processing Throughput

• The cloud-service option for MapReduce
– Elastic MapReduce

Page 33: 20150630 kca big-data-with-cloud_output

Architecture in Practice - Data Roles

• Not only heterogeneous, but possibly in different locations as well

Page 34: 20150630 kca big-data-with-cloud_output

Reference Architecture Discussion – NIST Ref. Arch

http://bigdatawg.nist.gov/V1_output_docs.php

Page 35: 20150630 kca big-data-with-cloud_output

Reference Architecture Discussion – AWS Large Scale Processing and Huge Data sets

http://aws.amazon.com/architecture/

Page 36: 20150630 kca big-data-with-cloud_output

THANKS

Official website: http://mopcon.org/

Fan page: http://fb.me/mopcon