bi with apache hadoop(en)

Business Integration with CDH 4 (including Apache Hadoop). Alexander Alten-Lorenz, Cloudera Inc., Munich, 22 February 2013

Upload: alexander-alten-lorenz

Post on 14-Jun-2015


Category:

Technology



DESCRIPTION

A simple, low-level presentation that eases the audience into Hadoop and shows them real use cases

TRANSCRIPT

Page 1: Bi with apache hadoop(en)

Business Integration with CDH 4

(including Apache Hadoop)

Alexander Alten-Lorenz, Cloudera Inc., Munich, 22 February 2013

Page 2: Bi with apache hadoop(en)

Challenges

Volume Velocity Variety

Page 3: Bi with apache hadoop(en)

Business Integration

• CRM

• Analytics

• Social Networks

• Marketing

• Document Store

• Search-Indices

• Invoicing

• Risk Management

• Universal Data Access

• Data Governance

• SAP / Salesforce

• Article and Storage Management

Page 4: Bi with apache hadoop(en)

Use Cases

Page 5: Bi with apache hadoop(en)

Risk Management

• Problem: Scoring of Customers and Projects

• Solution: Finance History, Communication and Pattern Detection

• Users: Finance, Insurance

Page 6: Bi with apache hadoop(en)

Recommendations

• Problem: Recommend suitable products to go with purchased products, matching the customer's interests

• Solution: Statistical analysis of interests and purchase history to detect matching swarm patterns

• Users: eCommerce, Advertising
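
The statistical analysis of purchase history mentioned above can be illustrated with a small co-purchase count. This is only a minimal sketch, not the method used in the talk: the function names `co_purchase_counts` and `recommend` and the sample baskets are hypothetical, and a real deployment would run such counting as a distributed job over far larger data.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # Deduplicate and sort so every pair is visited exactly once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, product, top_n=3):
    """Return the products most often bought together with `product`."""
    return [item for item, _ in counts[product].most_common(top_n)]


# Hypothetical purchase history (illustration only).
baskets = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["sleeping bag", "pillow"],
    ["tent", "sleeping bag", "pillow"],
]

counts = co_purchase_counts(baskets)
print(recommend(counts, "tent", top_n=2))
```

Counting pairs is the simplest possible signal; it ignores product popularity and customer interests, which a production recommender would have to weigh in.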

Page 7: Bi with apache hadoop(en)

Graph-Analytics

• Problem: Detect trends and curves in large distributed networks (Wired, Social, Mesh)

• Solution: Collect and mine all data, then apply self-learning patterns to detect trends and produce forecasts

• Users: Enterprises, Gov, NGO, Provider, Telco, Stock Exchange

Page 8: Bi with apache hadoop(en)

Detection of Dangerous Use

• Problem: Spam, Credit Card Abuse

• Solution: Pattern Detection, Prioritization, Heuristic Analytics

• Users: Retail, Finance, Reseller

Page 9: Bi with apache hadoop(en)

Text Analysis

• Problem: Detect the meaning of the written word (Sentiment Analysis)

• Solution: Keyword patterns, Coherence detection, Path detection

• Users: eCommerce, Social Media Service Provider, Attitude Research
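
As a rough illustration of the keyword-pattern approach named above, the following sketch scores a text by counting positive and negative keywords. The keyword sets and the function names `sentiment_score` and `classify` are assumptions made for this example; they do not come from the slides, and real sentiment analysis needs far richer models.

```python
import re

# Hypothetical keyword lists (illustration only).
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "poor", "hate", "broken", "slow"}


def sentiment_score(text):
    """Return (#positive keywords) minus (#negative keywords) in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def classify(text):
    """Map the keyword score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in [
    "Great product, I love it",
    "Delivery was slow and the box was broken",
    "It arrived on Tuesday",
]:
    print(classify(review), "<-", review)
```

A plain keyword count cannot handle negation or irony ("not good"), which is one reason the slide also lists coherence and path detection.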

Page 10: Bi with apache hadoop(en)

Amounts of real Data

• Ebay: 12 PB, Search Optimization

• Facebook: 50 PB, Logs, Reports

• Walmart: 4.5 PB, Customer Transactions

http://wiki.apache.org/hadoop/PoweredBy
http://en.wikipedia.org/wiki/Big_data

Page 11: Bi with apache hadoop(en)

Apache Hadoop

• Software Framework for large amounts of unstructured data

• Apache-License

• Two main cores

• HDFS: Distributed data storage

• MapReduce: Distributed data processing

Page 12: Bi with apache hadoop(en)

Hadoop Cluster

[Diagram: a cluster made up of many Data Nodes]

Data Node: 4-16 Cores, 4-16 Disks, 8-64 GB RAM, 1-10GB Network

Page 13: Bi with apache hadoop(en)

Hadoop Distributed File System

[Diagram: a File is split into Blocks, and the Blocks are spread across Data Nodes]
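
The picture on this slide (a file cut into blocks that are spread over data nodes) can be sketched in a few lines. The 128 MB block size, the replication factor of 3, the node names, and the round-robin placement are all assumptions chosen for illustration; HDFS itself uses configurable values and a rack-aware placement policy that this sketch does not model.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    # The last block only holds the remaining bytes and may be smaller.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_blocks(num_blocks, data_nodes, replication):
    """Assign every block to `replication` distinct nodes, round-robin."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + offset) % len(data_nodes)]
            for offset in range(replication)
        ]
    return placement


# Hypothetical file and cluster (illustration only).
blocks = split_into_blocks(300 * MB, 128 * MB)
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(len(blocks), nodes, replication=3)

for block_id, holders in placement.items():
    print(f"block {block_id}: {blocks[block_id] // MB} MB on {holders}")
```

Because every block lives on several nodes, losing one node does not lose data, which is the fault tolerance the deck later attributes to HDFS.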

Page 14: Bi with apache hadoop(en)

MapReduce

[Diagram: Data and Query, shown for an RDBMS and for Hadoop]
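
To make the MapReduce model more concrete, here is a single-process word count that walks through the map, shuffle, and reduce steps. It is a teaching sketch in plain Python, not Hadoop's Java API: the function names are hypothetical, and on a real cluster the framework runs the map and reduce functions in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


result = word_count(["Hadoop stores data", "Hadoop processes data"])
print(result)
```

Each map call only needs its own line and each reduce call only needs the values of one key, which is what allows the work to be distributed.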

Page 15: Bi with apache hadoop(en)

Features

                   HDFS   MapReduce
Distribution        ✔        ✔
Fault Tolerance     ✔        ✔
Scalability         ✔        ✔

Page 16: Bi with apache hadoop(en)

Hadoop Eco System

• Core: MapReduce, HDFS, Java API

• Data ingestion: Sqoop (RDBMS), Flume (Logs), Connectors, ...

• Data access: Pig (Scripts), Hive (SQL), HBase

• Further components: Oozie, Zookeeper, Mahout, Hue, Whirr, Avro

Page 17: Bi with apache hadoop(en)

Example of an Integration

Page 18: Bi with apache hadoop(en)

Scope

• Successful Audits per ISO 27001

• Analyze different Data Sources from different Databases and CRM Systems

• Realtime and Lifetime Statistics per Product

• Periodic Analytics and Statistics Jobs

• Weekly Re-Import into CRM

• Single Queries per User (Analyst) via a Secured GUI

Page 19: Bi with apache hadoop(en)

Solution Path

• Cluster Authentication and Authorization via Kerberos and encrypted data communication / Data Protection

• Sqoop Connector to CRM / DB

    • Teradata, Oracle, Postgres, MySQL, MS SQL

• Hive - HBase Integration

• Hive Analytics, run automatically via the Oozie Workload Orchestrator

• Hue Shell, Authentication via Kerberos SPNEGO

Page 20: Bi with apache hadoop(en)

[Architecture diagram. Components: CRM Park, CDH, Sqoop, Hive, HBase, Kerberos (AD, MITv5), Oozie, HUE, End user. Labels: Integration, Authentication, Automation, Real Time]

Page 21: Bi with apache hadoop(en)

How to Manage?

Page 22: Bi with apache hadoop(en)

Cloudera Manager

• Automated Deployment

• Monitoring

• Service Management

• Log Management

• Events and Alerts

• Reporting

• Support Integration

Page 23: Bi with apache hadoop(en)

Cloudera

• Founded 2009 in Palo Alto

• Cloudera's Distribution Including Hadoop

• CDH4 / Cloudera Manager 4

• > 320 employees worldwide

• Training, Consulting, Support, Development

• Enterprise Tools

Page 24: Bi with apache hadoop(en)

Thank You!

[email protected]

• Twitter: @mapredit

• Blog: mapredit.blogspot.com

• http://www.cloudera.com/

• http://hadoop.apache.org/