Hadoop for the Masses: General use and the Battle of Big Data

Presented by Amandeep Modgil (@amandeepmodgil) and David Hamilton (@analyticsanvil), 1 September 2016

Upload: hadoop-summit

Post on 21-Jan-2017


TRANSCRIPT

Page 1: Hadoop for the Masses

Presented by Amandeep Modgil (@amandeepmodgil) and David Hamilton (@analyticsanvil)

Date: 1 September 2016

Hadoop for the Masses: General use and the Battle of Big Data

Page 2: Hadoop for the Masses

Hadoop for the Masses: General use and the Battle of Big Data

Amandeep Modgil & David Hamilton – 1 September 2016

We’ll share our experience rolling out a Hadoop-based data lake to a self-service audience within a corporate environment.

Page 3: Hadoop for the Masses

Agenda

1. About us
2. Birth of a Data Lake
3. Security
4. Governance
5. Change management
6. Learnings for making Hadoop work in the enterprise

Page 4: Hadoop for the Masses

1. About us

Page 5: Hadoop for the Masses

About us: Our background

Page 6: Hadoop for the Masses

2. Birth of a Data Lake

Page 7: Hadoop for the Masses

Birth of a data lake: Background

› Large internal analytics community
› Changing industry
› Big(ish) data
› Past pain points:
  » Accessibility
  » Accuracy
  » Performance

Timeline
› Q4-2014: Feasibility
› Q1-2015: Kick off
› Q2-2015: Infra go live
› Q3-2015: Data ingestion
› Q2-2016: Go live

Page 8: Hadoop for the Masses

Birth of a data lake: Project initiation

Feasibility (Q4-2014)
› Technical and business requirements
› Architecture design and roadmap
› Decision to implement Hadoop
› POCs (functionality, integration)

Kick off (Q1-2015)

Page 9: Hadoop for the Masses

Birth of a data lake: Data Landscape (conceptual diagram)

Diagram labels: Source Systems; RDBMS Application; SAP Application; API; Database Replication*; Windows Azure storage; Data Lake* (Hortonworks HDP); Analytical Systems; EDW; ODS.

* New components

Page 10: Hadoop for the Masses

Birth of a data lake: Project initiation

Target landscape
› Hortonworks HDP in Azure cloud (dev, test, prod)
› Hive as initial use-case
› Aims:
  » Multiple legacy sources → Unified data lake
  » Batch bottlenecks → Parallel, scalable
  » ETL heavy landscape → Schema on read, unstructured data
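The "schema on read" aim can be illustrated with a minimal Python sketch. It is not taken from the talk: the sample records, the field names and the `read_with_schema` helper are all hypothetical. The idea it shows is that raw data is stored untouched and the reader decides which fields and types to project at query time. Turning unparseable values into `None` is a choice made for this sketch only.

```python
import json

# Hypothetical raw records, landed in the lake as-is with no upfront ETL.
RAW_LINES = [
    '{"customer_id": "A1", "amount": "12.5", "day": "2016-09-01"}',
    '{"customer_id": "B2", "amount": "7", "extra": "ignored"}',
    'not json at all',
]

# The schema is declared by the reader, not enforced when data is written.
SCHEMA = {"customer_id": str, "amount": float, "day": str}


def read_with_schema(lines, schema):
    """Project each raw line onto the schema; missing or bad fields become None."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = {}
        if not isinstance(record, dict):
            record = {}
        row = {}
        for name, cast in schema.items():
            value = record.get(name)
            try:
                row[name] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[name] = None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_LINES, SCHEMA)
for row in rows:
    print(row)
```

A different consumer could read the same raw lines with a different schema, which is the flexibility an ETL-heavy landscape tends to lack.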

Page 11: Hadoop for the Masses

Challenges in the enterprise…
› Security
› Governance
› Change Management

Taming the elephant

Page 12: Hadoop for the Masses

3. Security

Page 13: Hadoop for the Masses

Security: Challenges in the enterprise

Challenges
› Data security
› Secure infrastructure
› Provisioning access

Page 14: Hadoop for the Masses

Security: Our experience

› Filesystem security is essential
  » Difficult with some cloud storage
› Hive security via Ranger
› Private cloud environment in MS Azure
› Integrated authentication via Kerberos / AD
› Secured access points to the cluster

Page 15: Hadoop for the Masses

4. Governance

Page 16: Hadoop for the Masses

Governance: Challenges in the enterprise

Challenges
› Platform reliability
› Data quality
› Keeping the lake “clean”

Page 17: Hadoop for the Masses

Governance: Our experience

› Naming standards essential
› Metadata catalogue
› Cluster resource management
› Code management
› Data quality
› Monitoring
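One way to make a naming standard enforceable is an automated check, for example in a code review or deployment step. The Python sketch below is an illustration only: the `<zone>_<source>_<entity>` convention, the zone names and the `check_table_name` helper are assumptions, not the standard used by the presenters.

```python
import re

# Hypothetical convention: lowercase <zone>_<source>_<entity>,
# e.g. raw_sap_sales_orders. The zones below are assumed for illustration.
ALLOWED_ZONES = ("raw", "curated", "sandbox")

NAME_PATTERN = re.compile(
    r"^(?P<zone>[a-z]+)_(?P<source>[a-z0-9]+)_(?P<entity>[a-z0-9]+(?:_[a-z0-9]+)*)$"
)


def check_table_name(name):
    """Return a list of problems; an empty list means the name conforms."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return ["expected lowercase <zone>_<source>_<entity>"]
    if match.group("zone") not in ALLOWED_ZONES:
        return ["unknown zone: " + match.group("zone")]
    return []


for candidate in ("raw_sap_sales_orders", "RawSapOrders", "tmp_sap_orders"):
    print(candidate, check_table_name(candidate))
```

Returning a list of problems rather than a boolean lets the same check feed a readable report when the lake is audited.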

Page 18: Hadoop for the Masses

5. Change management

Page 19: Hadoop for the Masses

Change management: Challenges in the enterprise

Challenges
› Requirements gathering
› User education
› Expectation management

Page 20: Hadoop for the Masses

Change management: Our experience

› Explain platform choice to users
› Early rollout to key user groups
› UI is important
› Communicate differences with existing platforms
  » Performance
  » Functionality
› Anticipate different user groups

Page 21: Hadoop for the Masses

6. Learnings for making Hadoop work in the enterprise

Page 22: Hadoop for the Masses

Learnings for making Hadoop work in the enterprise: Understand the scale of the challenge

Chart axes: Complexity; Perceived difficulty / effort. Stages shown:
› Deploying a new tool
› Understanding parallel concepts
› Deploying for the enterprise
› Security integration
› Building and governing for general use

Page 23: Hadoop for the Masses

Learnings for making Hadoop work in the enterprise: Our experience

› Write guidelines, but use erasers
› Some hard things are easy, some easy things are hard
› Build reusable building blocks
› Integration worthwhile, smoothness not guaranteed with all tools
  » Other data platforms
  » ETL tools
  » Front-end tools

Page 24: Hadoop for the Masses

Learnings for making Hadoop work in the enterprise: Strengths and opportunities

› Bulky ELT / ETL flows
› Data archiving
› Unstructured data
› Streaming data
› New capability

Page 25: Hadoop for the Masses

Agenda

1. About us ✓
2. Birth of a Data Lake ✓
3. Security ✓
4. Governance ✓
5. Change management ✓
6. Learnings for making Hadoop work in the enterprise ✓

Page 26: Hadoop for the Masses

Questions?

Page 27: Hadoop for the Masses

Contact us

› https://au.linkedin.com/in/amandeep-modgil
› https://au.linkedin.com/in/davidhamiltonau

Page 28: Hadoop for the Masses

Image credits

› ‘img_9646’ by Leonid Mamchenkov, https://www.flickr.com/photos/mamchenkov/2955225736, used under a Creative Commons Attribution 2.0 licence. Full terms at http://creativecommons.org/licenses/by/2.0.

› ‘Bicycle Security’ by Sean MacEntee, https://www.flickr.com/photos/smemon/9565907428, used under a Creative Commons Attribution 2.0 licence. Full terms at http://creativecommons.org/licenses/by/2.0.

› ‘Traffic Cop’ by Eric Chan, https://www.flickr.com/photos/maveric2003/27022816, used under a Creative Commons Attribution 2.0 licence. Full terms at http://creativecommons.org/licenses/by/2.0.

› ‘restoration’ by zoetnet, https://www.flickr.com/photos/zoetnet/5944551574, used under a Creative Commons Attribution 2.0 licence. Full terms at http://creativecommons.org/licenses/by/2.0.