ieg 201402 intuit building big data analytics platform


Upload: information-excellence

Post on 23-Jan-2015


DESCRIPTION

Information Excellence Group 2014 Spring "Business Analytics Industry Summit", Building Big Data Analytics Platform, Neeta Pande, Data Architect, INTUIT

TRANSCRIPT

Page 1: IEG 201402 INTUIT Building Big Data Analytics Platform
Page 2: IEG 201402 INTUIT Building Big Data Analytics Platform

INTUIT: Building Big Data Analytics Platform at Intuit

Neeta Pande

Page 3: IEG 201402 INTUIT Building Big Data Analytics Platform

Building Big Data Analytics Platform at Intuit

8/Feb/2014

Neeta Pande

Page 4: IEG 201402 INTUIT Building Big Data Analytics Platform

Roadmap

• Setting Context and Introduction to the Analytics Platform at Intuit

• Key highlights that differentiate the platform

• Sharing Experiences building the platform

• Wish list of capabilities for the future of big data technologies

Page 5: IEG 201402 INTUIT Building Big Data Analytics Platform

Setting Context and Intro to the Analytics Platform

Page 6: IEG 201402 INTUIT Building Big Data Analytics Platform

Quick look into Intuit Offerings

Page 7: IEG 201402 INTUIT Building Big Data Analytics Platform

Introduction to the Analytical Platform

Enterprise-wide platform for cross-Intuit data analytics

• Central repository of analytical data from
– Intuit products
– Intuit Business Systems
– Intuit Master Systems
– External data sources

• Caters to
– Product Managers
– Product Developers
– Data Analysts
– Data Scientists
– Experience Designers

Page 8: IEG 201402 INTUIT Building Big Data Analytics Platform

Technologies used to build the platform

[Slide image of technology logos; only the label "HCAT" is legible in the transcript]

Page 9: IEG 201402 INTUIT Building Big Data Analytics Platform

Key highlights that differentiate the platform

Page 10: IEG 201402 INTUIT Building Big Data Analytics Platform

Capability View of the Platform

[Diagram labels]

• Data sources: Product User Entered Data, Product Usage Data, Business Data, Master Data, External Data

• Data Integration

• Central Analytics Platform: Batch, Near Realtime, Realtime

• Policy based Access Control

• Consumers: Management, PM, PD, Data Analyst, Data Scientist

Page 11: IEG 201402 INTUIT Building Big Data Analytics Platform

Key differentiators of the Platform

• Enterprise-wide data across all offerings and cross-offerings

• Co-hosts sensitive information on the same infrastructure

• Batch, near real time, and real time on the same infrastructure

• Mobile, web, and desktop offerings

• DWH semantic layers on Hadoop

Page 12: IEG 201402 INTUIT Building Big Data Analytics Platform

Data Pipeline and Challenges

Pipeline stages:

1. Data Acquisition
2. Data Cleansing
3. Data Standardization
4. Data Securitization
5. Incremental load
6. Entity Mastering
7. DWH load
8. Data Consumption

Challenges:

• Cleansing and standardization need third-party libraries

• They are part of the same flow and need Hadoop integration

• Encryption of sensitive information

• Tokenization for join optimization on sensitive fields

• Extract analytical information before encryption

• Loading data from transactional sources is challenging

• MDM solutions from major vendors do not provide mastering in Hadoop

• DWH patterns like SCD, surrogate keys, and fact updates are challenging

• Interactive exploration in MPP-RDBMS because of advanced SQL and query performance

• Sampling and extraction for building models in R
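To illustrate the securitization bullets above, here is a minimal Python sketch of deterministic tokenization. It is not Intuit's implementation: the keyed-hash (HMAC-SHA256) approach, the `TOKEN_KEY` value, and the helper names `tokenize`, `analytic_fields`, and `join_on_token` are all assumptions made for this example. The idea is that equal sensitive values map to equal tokens, so datasets can be joined on the token without decrypting the field, and that analytical attributes (here, the e-mail domain) are pulled out before the raw value is protected.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would fetch this from a key manager.
TOKEN_KEY = b"demo-only-secret"


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Return a deterministic keyed token for a sensitive value."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def analytic_fields(email: str) -> dict:
    """Extract non-sensitive analytical attributes before the value is protected."""
    return {"email_domain": email.strip().lower().split("@")[-1]}


def join_on_token(left, right, field):
    """Inner-join two lists of dicts on a shared token field."""
    index = {}
    for row in right:
        index.setdefault(row[field], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[field], [])]


# Two datasets that share a sensitive e-mail field, stored only as tokens.
usage = [{"email_token": tokenize("Ann@Example.com"), "logins": 12,
          **analytic_fields("Ann@Example.com")}]
billing = [{"email_token": tokenize("ann@example.com "), "plan": "pro"}]

joined = join_on_token(usage, billing, "email_token")
```

The join never sees the raw e-mail address; it matches rows on the token alone, while the extracted domain stays available for analysis.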

Page 13: IEG 201402 INTUIT Building Big Data Analytics Platform

Sharing Experiences building the platform

Page 14: IEG 201402 INTUIT Building Big Data Analytics Platform

• Batch Data Integration – Evaluated the Big Data Integration capabilities of Informatica and found them relevant for the platform

• Real time – Using Flume for real-time use cases. Found Kafka and Storm to be a good fit for several requirements.

Evaluated Informatica Data Quality and found it a good fit for data cleansing and standardization, integrated in the same flow as batch data integration

• Custom implementation of symmetric-key encryption/decryption.

• Hadoop does not provide an out-of-the-box solution

• Evaluated third-party solutions; not mature enough

• Key management using HSM (SafeNet)

• Decryption UDFs in MR, Pig, and Hive shield developers/users from the security implementation

Custom implementation of a mastering solution in Hadoop.

• Leading MDM solutions do not have Hadoop integration

• Some open source tools have MDM capabilities, but they are not mature or widely adopted.

• Traditional DWH patterns and incremental loads are challenging on Hadoop.

• Upserts and SCD are handled best in HBase, exposed via HCatalog for querying

• Ad hoc query capabilities are still not mature or widely adopted, hence MPP-RDBMS is still preferred.

• Large-scale machine learning infrastructure is still being adopted, hence widely used technology options are not in place
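As a rough illustration of why upserts and slowly changing dimensions (SCD) suit a key-value store, the Python sketch below keeps a version history per row key and closes the previous version when an attribute changes (SCD type 2). It is only a sketch under stated assumptions: the in-memory dict stands in for a keyed table, it does not use the HBase or HCatalog APIs, and the names `Version`, `upsert_scd2`, and `as_of` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    attrs: dict
    valid_from: int                  # load timestamp when this version became current
    valid_to: Optional[int] = None   # None means "still current"


# Stand-in for a key-value table: row key -> version history (oldest first).
Table = Dict[str, List[Version]]


def upsert_scd2(table: Table, key: str, attrs: dict, ts: int) -> bool:
    """Apply one incoming record; return True if a new version was written."""
    history = table.setdefault(key, [])
    if history and history[-1].attrs == attrs:
        return False                  # unchanged record: nothing to write
    if history:
        history[-1].valid_to = ts     # close the previously current version
    history.append(Version(attrs=dict(attrs), valid_from=ts))
    return True


def as_of(table: Table, key: str, ts: int) -> Optional[dict]:
    """Return the attributes that were current for `key` at time `ts`."""
    for version in table.get(key, []):
        if version.valid_from <= ts and (version.valid_to is None or ts < version.valid_to):
            return version.attrs
    return None


table: Table = {}
upsert_scd2(table, "cust-1", {"city": "Pune"}, ts=100)
upsert_scd2(table, "cust-1", {"city": "Pune"}, ts=150)        # no change, no new version
upsert_scd2(table, "cust-1", {"city": "Bangalore"}, ts=200)   # change, new version
```

Because every write is addressed by row key, an incremental load only touches the keys that changed, which is the property that append-only files on HDFS make hard to get.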

Page 15: IEG 201402 INTUIT Building Big Data Analytics Platform

Wish-list for future of Hadoop

Page 16: IEG 201402 INTUIT Building Big Data Analytics Platform

Data security support built into the platform

MDM solutions integrated and optimized for the platform

Interactive querying capabilities on the big data platforms (Impala, Tez)

Better support for traditional DWH capabilities

Integrated Real time, Near real time and Batch processing pipelines

Distributed machine learning technologies with comprehensive and advanced capabilities

Open source end-to-end data quality solutions integrated with the platform

Page 17: IEG 201402 INTUIT Building Big Data Analytics Platform

Q & A

Thank you

Page 18: IEG 201402 INTUIT Building Big Data Analytics Platform

Community Focused

Volunteer Driven

Knowledge Share

Accelerated Learning

Collective Excellence

Distilled Knowledge

Shared, Non Conflicting Goals

Validation / Brainstorm platform

Mentor, Guide, Coach

Satisfied, Empowered Professional

Richer Industry and Academia

About Information Excellence Group

Progress Information Excellence

Towards an Enriched Profession, Business and Society

Page 19: IEG 201402 INTUIT Building Big Data Analytics Platform

About Information Excellence Group

Reach us at:

blog: http://informationexcellence.wordpress.com/

presentations: http://www.slideshare.net/informationexcellence

LinkedIn: http://www.linkedin.com/groups/Information-Excellence-3893869

Facebook: http://www.facebook.com/pages/Information-excellence-group/171892096247159

Google+: https://plus.google.com/u/0/communities/102316155996060621595

twitter: #infoexcel

email: [email protected]

[email protected]

Have you enriched yourself by contributing to the community Knowledge Share?