20150630 kca big-data-with-cloud_output
Building Big Data Infrastructure with Cloud Service
KKBOX/MOPCON ericpi
About ericpi (畢瑄易)
• KKBOX
  – RD Center Senior Engineer
  – COO Manager / KH Side Leader
• Mobile/Open/Platform CONference
  – MOPCON 2012/2013/2014/2015 Organizer
• Kaohsiung Software Developer Group (KSDG)
  – Organizer
When it comes to Big Data, many people have heard of the 3Vs
When it comes to Big Data, many people have heard of the 4Vs
IBM says the 4Vs are: volume, variety, velocity, and veracity
http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Others say the 4Vs should be:
• volume
• variety
• velocity
• variability
One "Big Data", many interpretations?
Fortunately, one organization responsibly stepped up… NIST: http://bigdatawg.nist.gov/V1_output_docs.php
NIST WG Definition of Big Data
• Big Data refers to the inability of traditional data architectures to efficiently handle the new datasets.
• characteristics (4Vs)
  – volume
  – variety
  – velocity
  – variability
The NIST WG also emphasizes…
• Big Data consists of extensive datasets - primarily in the characteristics of volume, variety, velocity, and/or variability - that require a scalable architecture for efficient storage, manipulation, and analysis.
The NIST WG also emphasizes…
• The Big Data paradigm consists of the distribution of data systems across horizontally coupled, independent resources to achieve the scalability needed for the efficient processing of extensive datasets.
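Distributing data across "horizontally coupled, independent resources" can be illustrated with plain hash partitioning: each record key maps deterministically to one of several nodes, so adding nodes spreads both storage and work. This is a minimal sketch under stated assumptions: the node names are hypothetical, MD5 is used only because Python's built-in `hash()` is randomized per process for strings, and production systems such as Cassandra use more elaborate schemes (e.g. consistent hashing on a token ring) so that adding a node moves fewer keys.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index with a stable hash (MD5, by assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def distribute(keys: Iterable[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Group keys by the node that would store them."""
    placement: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        placement[nodes[partition_for(key, len(nodes))]].append(key)
    return dict(placement)


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
    keys = [f"user:{i}" for i in range(10)]
    for node, assigned in sorted(distribute(keys, nodes).items()):
        print(node, assigned)
```

Because the mapping is a pure function of the key, any client can locate a record without consulting a central directory, which is what lets the resources stay independent.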
NIST does not address how big, or on what hardware
but "efficient" and "scalability" come up again and again
Today's Hot Topic #2: Cloud Computing
• NIST-defined Essential Characteristics
  – On-demand self-service
  – Broad network access
  – Resource pooling
  – Rapid elasticity
  – Measured service
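"Rapid elasticity" is usually implemented as an autoscaling rule. A minimal sketch, assuming a proportional target-tracking formula, desired = ceil(current × observed / target), clamped to a min/max. This is the formula the Kubernetes Horizontal Pod Autoscaler documents; it is used here only as an illustration and is not claimed to be AWS Auto Scaling's exact algorithm. The default bounds are assumptions.

```python
import math


def desired_capacity(current: int, observed_util: float, target_util: float,
                     min_cap: int = 1, max_cap: int = 100) -> int:
    """Scale capacity so per-instance utilization moves toward the target.

    min_cap / max_cap are hypothetical safety bounds.
    """
    if current < 1 or target_util <= 0:
        raise ValueError("current must be >= 1 and target_util must be > 0")
    raw = math.ceil(current * observed_util / target_util)
    # Clamp so we never scale to zero or beyond the budget.
    return max(min_cap, min(max_cap, raw))
```

For example, 4 instances at 90% CPU with a 60% target scale out to 6; the same 4 instances at 30% scale in to 2. Paying only for what the rule provisions is where "Measured service" comes in.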
Today's Hot Topic #2: Cloud Computing
• NIST-defined Service Models
  – SaaS (Software as a Service)
  – PaaS (Platform as a Service)
  – IaaS (Infrastructure as a Service)
Big Data and Cloud Computing
• From the data perspective
  – Storage volume
  – Cross-datacenter access
  – Off-site synchronization
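Off-site synchronization needs a rule for when two sites update the same key. One simple, widely used strategy is last-write-wins (LWW): each replica tags a value with a timestamp, and merging keeps the newer one. A minimal sketch, assuming reasonably synchronized clocks and a site-id tiebreak (both simplifying assumptions; LWW can silently discard a concurrent update). The site ids and keys in the usage are hypothetical.

```python
from typing import Dict, Tuple

# (value, timestamp, site_id): the layout is an assumption for this sketch.
Entry = Tuple[str, float, str]


def lww_merge(a: Dict[str, Entry], b: Dict[str, Entry]) -> Dict[str, Entry]:
    """Merge two replicas, keeping the newest entry for every key."""
    merged = dict(a)
    for key, entry in b.items():
        current = merged.get(key)
        # Compare (timestamp, site_id) so ties resolve the same way on every site.
        if current is None or (entry[1], entry[2]) > (current[1], current[2]):
            merged[key] = entry
    return merged
```

Because the comparison is a total order, `lww_merge(a, b)` and `lww_merge(b, a)` produce the same result, so both datacenters converge regardless of which direction syncs first.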
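Big Data and Cloud Computing
• From the perspective of compute scale and infrastructure
  – Hardware investment
  – Very large compute loads
  – Distributed computing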
The foundational industry behind all innovative industries:
Software
2011/08 WSJ: Why Software Is Eating The World
http://www.wsj.com/articles/SB10001424053111903480904576512250915629460
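In the NIST documents, 99% of Big Data / Cloud Computing is about defining software architecture
A scarce resource of the future: software engineers
A scarce resource of the future: data scientists
The most precious resource of the future: talent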
Architecture in Practice: Diversity of Data Sources
• Find the right data sources
  – Website / Mobile Service
  – Open Data
  – Open API
• Data collection tools
  – Centralized Log Management
  – Spider / Crawler
  – 3rd-party applications
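Centralized log management typically starts by turning raw log lines from many servers into structured records before they are shipped to a central store. A minimal sketch, assuming Apache Common Log Format input; the regex, the field names, and the `status_counts` summary are illustrative choices, not part of any particular log tool.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Optional

# Common Log Format: host ident user [time] "METHOD path PROTO" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one access-log line into a dict, or return None if it doesn't match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def parse_logs(lines: Iterable[str]) -> Iterator[dict]:
    """Yield structured records, skipping malformed lines."""
    for line in lines:
        record = parse_line(line)
        if record is not None:
            yield record


def status_counts(lines: Iterable[str]) -> Counter:
    """Count HTTP status codes across all parseable lines."""
    return Counter(r["status"] for r in parse_logs(lines))
```

Skipping malformed lines instead of failing keeps one noisy server from stalling the whole collection pipeline.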
Architecture in Practice: Diversity of Data Sources
• Corresponding cloud services
  – IaaS: EC2
  – PaaS
    • Elastic Beanstalk
    • Lambda
• Data collection tools
  – Centralized Log Management
  – Spider / Crawler
  – 3rd-party applications
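On the PaaS side, a Lambda function can be triggered whenever a collector drops a new object into S3. A minimal sketch of a Python handler: the `lambda_handler(event, context)` entry point and the `Records[].s3.bucket.name` / `Records[].s3.object.key` fields follow AWS's S3 event notification format, and object keys arrive URL-encoded, hence `unquote_plus`. What the function does with the locations (here, just returning them) is a placeholder assumption, and the bucket name in the test is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the S3 locations of newly created objects from an S3 event."""
    locations = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event payload (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        locations.append(f"s3://{bucket}/{key}")
    # Placeholder: a real pipeline might enqueue these for downstream processing.
    return {"statusCode": 200, "body": json.dumps({"received": locations})}
```

Keeping the handler stateless means the platform can run as many copies in parallel as there are incoming objects, with no servers to manage.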
Architecture in Practice: Variety of Data Types
• RDBMS
• NoSQL
• Heterogeneous data storage
• NoSQL data modeling
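NoSQL data modeling is usually query-driven: instead of normalizing, you choose a partition key and a sort key so that the main access pattern becomes a single-partition range read. A minimal in-memory sketch, assuming a hypothetical "play history" access pattern (a user's plays within a time window). It mimics the partition/sort-key layout used by stores such as DynamoDB or Cassandra but is not their API.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class PlayHistoryTable:
    """Toy table: partition key = user_id, sort key = ISO-8601 timestamp."""

    def __init__(self) -> None:
        # Each partition keeps its (played_at, song_id) items sorted by sort key.
        self._partitions: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def put(self, user_id: str, played_at: str, song_id: str) -> None:
        bisect.insort(self._partitions[user_id], (played_at, song_id))

    def query_range(self, user_id: str, start: str, end: str) -> List[Tuple[str, str]]:
        """Return items in one partition with start <= played_at < end."""
        items = self._partitions.get(user_id, [])
        # (start,) sorts before any (start, song_id), so this finds the first match.
        lo = bisect.bisect_left(items, (start,))
        hi = bisect.bisect_left(items, (end,))
        return items[lo:hi]
```

ISO-8601 timestamps sort correctly as strings, which is why they make a convenient sort key; a query for "all plays on 2015-06-30" touches exactly one partition.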
Architecture in Practice: Variety of Data Types
• RDBMS
• NoSQL
• Cloud options
  – RDS
  – SimpleDB, S3, DynamoDB
  – On EC2: Cassandra, HBase, etc.
Architecture in Practice: Data Processing Throughput
• The dominant approach to big data processing: MapReduce
http://www.pinaldave.com/bimg/mapreduce.jpg
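The three stages of MapReduce (map, shuffle/group by key, reduce) can be shown with the classic word-count example. A minimal single-process sketch: in a real cluster each phase runs in parallel across machines and the shuffle moves data over the network, which this illustration does not attempt.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Because each `map_phase` call sees only its own document and each `reduce_phase` call sees only one key, both phases can be spread over as many machines as the data requires.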
Architecture in Practice: Data Processing Throughput
• MapReduce as a cloud service
  – Elastic MapReduce
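One way to run Python logic on Elastic MapReduce is Hadoop Streaming: the mapper and reducer read lines from stdin and write tab-separated `key<TAB>value` lines to stdout, and the framework sorts mapper output by key before the reducer sees it. A minimal sketch of that contract, written as functions over iterables so it can be tested locally; in a real job each function would sit in its own script reading `sys.stdin`, and Hadoop (not `sorted()`) would do the sorting.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word, in Hadoop Streaming's text format."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    # groupby only merges adjacent equal keys, which is why sorting is required.
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local simulation of: cat input | mapper | sort | reducer
# for out in reducer(sorted(mapper(open("input.txt")))): print(out)
```

The file name in the final comment is hypothetical. Relying on the framework's sort keeps the reducer streaming: it never holds more than one key's values in memory.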
Architecture in Practice: Data Roles
• Not only heterogeneous, but also stored in different locations
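Data that is both heterogeneous and stored in different places often has to be combined at query time. A minimal sketch, assuming a hypothetical relational `users` table (an in-memory SQLite database standing in for something like RDS) and a JSON event feed (standing in for a NoSQL store or an S3 object); the join key `user_id` is also an assumption.

```python
import json
import sqlite3
from typing import Dict, List


def load_users(conn: sqlite3.Connection) -> Dict[int, str]:
    """Read the relational side into a lookup table keyed by user_id."""
    rows = conn.execute("SELECT user_id, name FROM users").fetchall()
    return {user_id: name for user_id, name in rows}


def join_events(conn: sqlite3.Connection, events_json: str) -> List[dict]:
    """Inner-join JSON documents with relational rows on user_id."""
    users = load_users(conn)
    joined = []
    for event in json.loads(events_json):
        name = users.get(event["user_id"])
        if name is not None:  # drop events with no matching user
            joined.append({"name": name, **event})
    return joined
```

Loading the smaller side into a dictionary and streaming the larger side past it is a basic hash join; at scale the same idea is what distributed query engines do across sources.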
Reference Architecture Discussion: NIST Ref. Arch
http://bigdatawg.nist.gov/V1_output_docs.php
Reference Architecture Discussion: AWS Large Scale Processing and Huge Data Sets
http://aws.amazon.com/architecture/
THANKS
Official website: http://mopcon.org/
Fan group: http://fb.me/mopcon