Using Hadoop for Big Data

Hadoop for (Young) Data Scientist, by Komes Chandavimol and Team, Data Science Lab, Thailand

Upload: data-science-thailand

Post on 16-Apr-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Using hadoop for big data

Hadoop for (Young) Data Scientist

Komes Chandavimol and Team, Data Science Lab, Thailand

Page 2: Using hadoop for big data

Agenda

• Big Data, Analytics and Data Science

• Hadoop + Spark Workshops

• Sharing Experience: Hadoop (Real) Use Cases

• Hadoop + Spark Trends

Page 3: Using hadoop for big data

Big Data, Analytics and Data Science

Page 4: Using hadoop for big data

Big Data

http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427

https://www.domo.com/learn/data-never-sleeps-3-0

Page 5: Using hadoop for big data

Internet of Things

http://topmanagement.com.mx/innovacion-social-y-empresarial-objetivo-de-hitachi/

Page 6: Using hadoop for big data

http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427

https://www.domo.com/learn/data-never-sleeps-3-0

The Growth of Data

Page 7: Using hadoop for big data

http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427

https://www.domo.com/learn/data-never-sleeps-3-0

What is Big Data?

Page 8: Using hadoop for big data

http://blogs.forrester.com/category/hadoop

http://solutions.forrester.com/Global/FileLib/webinars/Big_Data_-_Gold_Rush_or_Illusion.pdf

The Big Data Tools

Page 9: Using hadoop for big data

http://thebigdatablog.weebly.com/blog/the-hadoop-ecosystem-overview

Page 10: Using hadoop for big data
Page 11: Using hadoop for big data

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/

Traditional Data Management Architecture

Page 12: Using hadoop for big data

http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/

New Data Management Architecture

Page 13: Using hadoop for big data

http://www.kdnuggets.com/2014/05/big-data-landscape-v30-analyzed.html

Page 14: Using hadoop for big data

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now

Data Lake

Page 15: Using hadoop for big data

How Does the Data Lake Work?

http://www.clearpeaks.com/blog/category/tableau

Traditional Enterprise Data warehouse

Page 16: Using hadoop for big data

What Do You Consume from the Data Lake?

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now

Page 17: Using hadoop for big data

Volume? Variety? Velocity?

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now

Page 18: Using hadoop for big data

Value

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now

Page 19: Using hadoop for big data

Big Data + Analytics = Value

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now

Page 20: Using hadoop for big data

Big Data Analytics

http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/

Page 21: Using hadoop for big data

Big Data Analytics

http://dataofthings.blogspot.com/2014/04/the-bbbt-sessions-hortonworks-big-data.html

Page 22: Using hadoop for big data

Big Data Analytics

http://www.gartner.com/it-glossary/predictive-analytics

Page 23: Using hadoop for big data

How to do Big Data Analytics?

https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now

Page 24: Using hadoop for big data
Page 25: Using hadoop for big data

Data Science Experience Sharing, Big Data Challenge #2, Bangkok, Thailand

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

What is Data Science?

Page 26: Using hadoop for big data

The Rise of the Data Scientist

http://flowingdata.com/2009/06/04/rise-of-the-data-scientist/

2009

https://hbr.org/

Page 27: Using hadoop for big data

http://hrb.org

http://www.anlytcs.com/2014/01/data-science-venn-diagram-v20.html

2014

The Rise of the Data Scientist

Page 28: Using hadoop for big data

Data Science Experience Sharing, Big Data Challenge #2, Bangkok, Thailand

http://www.anlytcs.com/2014/01/data-science-venn-diagram-v20.html

2014

The Data Science

Page 29: Using hadoop for big data

The Solution, Data Science Team

Page 30: Using hadoop for big data

Data Science Team

Doing Data Science by O'Neil et al (2013)

Page 31: Using hadoop for big data

Doing Data Science by O'Neil et al (2013)

Page 32: Using hadoop for big data

Doing Data Science by O'Neil et al (2013)

Data Science Team

Analyzing the Analyzers, Harris (2013)

Page 33: Using hadoop for big data

Data Science Team: Data Scientist & Data Engineer

http://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html

Page 34: Using hadoop for big data

Data Science Team: Data Scientist & Data Engineer

http://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html

https://www.facebook.com/DataScienceTh/posts/931828353527079:0

Page 35: Using hadoop for big data

Data Science Professionals

http://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html

Page 36: Using hadoop for big data

How to Build a DS Team?

∗ Build an in-house team

• Train existing employees

• Train existing employees and hire experts

• Hire experts

∗ Outsource requirements to private DS consultants

• Outsource for comprehensive DS strategy development

• Outsource for DS solutions to specific problems

∗ Leverage cloud-based platform solutions

Data Science for Dummies, Pierson (2015)

Page 37: Using hadoop for big data

Machine Learning

"Machine Learning deals with systems that can learn from data."

"The field of study that gives computers the ability to learn without being explicitly programmed." Arthur Samuel (1959)

"Improving performance in some task with experience." Tom Mitchell (1998)

Wikipedia; Data Visualization for Dummies (2014); Data Points: Visualization That Means Something (2013)

Page 38: Using hadoop for big data

Page 39: Using hadoop for big data

Machine Learning Discovery

• Class Discovery

• Correlation Discovery

• Novelty (Surprise) Discovery

• Association (or Link) Discovery

KirkBorne-workshop-ODSC2016.pdf

Page 40: Using hadoop for big data

The XYZ of Data Science

Smart X:

• Smart Cities

• Smart Highways

• Smart Supply Chain

Precision Y:

• Precision Medicine

• Precision Farming

• Precision Pricing

Personalized Z:

• Personalized Health

• Personalized Learning

• Personalized Shopping Experience

KirkBorne-Workshop-ODSC2016.pdf

Intelligence at the edge of the network… at the point of data collection

Page 41: Using hadoop for big data

DataInquest – Predictive Analytics and Data Science Bootcamp

Page 42: Using hadoop for big data

Data Science is a Team Sport

http://www.ibmbigdatahub.com/blog/why-data-science-team-sport

Page 43: Using hadoop for big data

How to Start?

Page 44: Using hadoop for big data

Hadoop + Spark Workshops

Page 45: Using hadoop for big data
Page 46: Using hadoop for big data
Page 47: Using hadoop for big data
Page 48: Using hadoop for big data

Workshop #1 Installing HDFS and YARN

Page 49: Using hadoop for big data
Page 50: Using hadoop for big data

Workshop #2 WordCount

Page 51: Using hadoop for big data
Page 52: Using hadoop for big data

Workshop #3 WordCount (Streaming)
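
The slide gives only the workshop title, so the following is a minimal sketch of how a streaming-style WordCount might look, not the workshop's actual material. It assumes the usual Hadoop Streaming convention of tab-separated `key<TAB>value` text records and a reducer that receives its input sorted by key. The function names (`map_lines`, `reduce_lines`, `run_locally`) are hypothetical, and the local runner only simulates the shuffle with an in-memory sort.

```python
import itertools


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reduce_lines(sorted_records):
    """Reducer: sum the counts of each word.

    The input must already be sorted by key, which is what the shuffle
    phase provides between the map and reduce steps.
    """
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in itertools.groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    return list(reduce_lines(sorted(map_lines(lines))))


result = run_locally(["the cat the", "Cat sat"])
```

On a cluster, the mapper and reducer would typically be two small scripts that read `sys.stdin` and print records, passed to the Hadoop Streaming jar; the exact command depends on the installation and is not shown in the slides.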

Page 53: Using hadoop for big data

Workshop #4 WordCount (Frequency Sort)
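
As an illustration only: one common way to sort WordCount output by frequency is a second pass that swaps key and value, so that the count becomes the sort key. The sketch below assumes input records of the form `word<TAB>count`; the helper names are hypothetical, and the in-memory `sorted` call stands in for the framework's sort.

```python
def swap_to_count_key(records):
    """Turn 'word<TAB>count' records into (count, word) pairs."""
    for record in records:
        word, count = record.rstrip("\n").split("\t", 1)
        yield int(count), word


def frequency_sort(records):
    """Order words by descending count, breaking ties alphabetically."""
    ordered = sorted(swap_to_count_key(records), key=lambda kv: (-kv[0], kv[1]))
    return [f"{count}\t{word}" for count, word in ordered]


ranked = frequency_sort(["cat\t2", "sat\t1", "the\t2"])
```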

Page 54: Using hadoop for big data
Page 55: Using hadoop for big data

Workshop #5 Setup Cloudera QuickStart

Page 56: Using hadoop for big data
Page 57: Using hadoop for big data

Workshop #6 Exploring HBase Data in Hue

Page 58: Using hadoop for big data

Workshop #7 Design a Schema for Quick Twitter Relationship Lookup
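
The slide does not show the schema itself. One design often used for this kind of lookup, sketched here under stated assumptions, is a composite row key of follower and followed user: because HBase keeps rows sorted by key, "does A follow B?" becomes a single-key lookup and "whom does A follow?" becomes a prefix scan. The `|` separator, the plain-text user IDs, and the `SortedRowStore` class are hypothetical stand-ins; the class only mimics sorted-key behaviour in memory and is not an HBase client.

```python
import bisect

SEP = "|"  # assumed delimiter; real designs often use fixed-width or hashed IDs


def follows_key(follower, followed):
    """Composite row key: follower first, so one user's rows sit together."""
    return f"{follower}{SEP}{followed}"


class SortedRowStore:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []

    def put(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def exists(self, key):
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def prefix_scan(self, prefix):
        """Return all keys starting with prefix, in sorted order."""
        i = bisect.bisect_left(self._keys, prefix)
        matches = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            matches.append(self._keys[i])
            i += 1
        return matches


store = SortedRowStore()
for follower, followed in [("bob", "alice"), ("alice", "carol"), ("alice", "bob")]:
    store.put(follows_key(follower, followed))
```

A second table keyed as followed-then-follower would answer "who follows A?" the same way, at the cost of writing each relationship twice.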

Page 59: Using hadoop for big data

Workshop #8 Design a Schema for IoT Log (Smart Meter)
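
The schema is not included in the transcript. A frequently used pattern for time-series logs, sketched here as an assumption rather than the workshop's answer, is a row key made of the meter ID followed by a reversed timestamp, so that the newest reading of each meter sorts first. The `#` separator, the 13-digit millisecond bound `MAX_TS`, and both function names are hypothetical.

```python
MAX_TS = 10**13 - 1  # assumed upper bound for 13-digit millisecond timestamps


def meter_row_key(meter_id, ts_millis):
    """Row key: meter ID, then zero-padded reversed timestamp (newest first)."""
    reversed_ts = MAX_TS - ts_millis
    return f"{meter_id}#{reversed_ts:013d}"


def parse_row_key(key):
    """Recover (meter_id, ts_millis) from a row key built by meter_row_key."""
    meter_id, reversed_ts = key.rsplit("#", 1)
    return meter_id, MAX_TS - int(reversed_ts)


keys = sorted(
    [
        meter_row_key("m002", 5000),
        meter_row_key("m001", 1000),
        meter_row_key("m001", 3000),
        meter_row_key("m001", 2000),
    ]
)
```

Zero-padding matters because row keys compare as byte strings; without it, the numeric and lexicographic orders would disagree.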

Page 60: Using hadoop for big data

Workshop #9 Create an HBase Table for Smart Meter Data

Page 61: Using hadoop for big data

Workshop #10 Bank Customer Snapshot

Page 62: Using hadoop for big data
Page 63: Using hadoop for big data
Page 64: Using hadoop for big data

Workshop #10.1 - 10.4

10.1 Create Hive Tables

10.2 Create External Hive Tables

10.3 Create External Hive Tables

10.4 Partition

Page 65: Using hadoop for big data
Page 66: Using hadoop for big data

Workshop #11 Sqoop

Page 67: Using hadoop for big data
Page 68: Using hadoop for big data
Page 69: Using hadoop for big data
Page 70: Using hadoop for big data
Page 71: Using hadoop for big data
Page 72: Using hadoop for big data

Workshop spk1 WordCount

Workshop spk2 WordCount

Workshop spk3 WordCount

Page 73: Using hadoop for big data
Page 74: Using hadoop for big data
Page 75: Using hadoop for big data

Workshop spk4 SparkSQL + ML

Page 76: Using hadoop for big data
Page 77: Using hadoop for big data
Page 78: Using hadoop for big data
Page 79: Using hadoop for big data
Page 80: Using hadoop for big data
Page 81: Using hadoop for big data
Page 82: Using hadoop for big data
Page 83: Using hadoop for big data

Sharing Experience: Hadoop (Real) Use Cases

Page 84: Using hadoop for big data

Source: Analytics: The New Path to Value, a joint MIT Sloan Management Review and IBM Institute for Business Value study. Copyright © Massachusetts Institute of Technology 2010.

Top Performers Use Analytics 5 Times More Than Lower Performers

Page 85: Using hadoop for big data

Revenue - Cost = Profit

Page 86: Using hadoop for big data

Monitoring and Maintenance

Data sources: IoT sensors in factory

Data products: predictive maintenance models

http://www.electrex.it/en/news/600-automated-energy-management-system-a-enms-for-cement-production-plants.html

Page 87: Using hadoop for big data

Customer Engagement + Location

Data sources: Mobile app, loyalty program, GIS

Data products: Buying behavior analysis, coupon-response model, location visualization

http://www.fastcompany.com/3020859/most-creative-people/how-chinas-one-child-policy-forced-starbucks-to-rethink-its-beijing-sto

Page 88: Using hadoop for big data

Fuel Saving

Data sources: Telematics (sensor), GPS

Data products: Prescriptive analytics – route optimization, predictive maintenance (parts/malfunction)

http://www.cnet.com/news/ups-turns-data-analysis-into-big-savings/

Page 89: Using hadoop for big data

Fraud Detection

Data sources: Historical pattern of transaction data

Data products: Predictive models – fraud/non-fraud

https://bluefishway.com/2013/09/13/panic-oh-no-not-again/

Page 90: Using hadoop for big data

HR Analytics – Google Hiring

Data sources: Historical hiring attributes

Data products: Predictive model – recruiting high performers

Behavioral Test

Situational Test

GPA

Brain Teaser

Good School

Page 91: Using hadoop for big data

Average ROI of Analytics/Data Science

Page 92: Using hadoop for big data

Hadoop + Spark Trends

Page 93: Using hadoop for big data