BI with Apache Hadoop (EN)

Post on 14-Jun-2015

Category:

Technology

DESCRIPTION

A simple, introductory presentation that eases the audience into Hadoop and shows them real use cases

TRANSCRIPT

Business Integration with CDH 4

(including Apache Hadoop)

Alexander Alten-Lorenz, Cloudera Inc.

Munich, 22 February 2013

Challenges

• Volume

• Velocity

• Variety

Business Integration

• CRM

• Analytics

• Social Networks

• Marketing

• Document Store

• Search-Indices

• Invoicing

• Risk Management

• Universal Data Access

• Data Governance

• SAP / Salesforce

• Article and Storage Management

Use Cases

Risk Management

• Problem: Scoring of Customers and Projects

• Solution: Finance History, Communication and Pattern Detection

• Users: Finance, Insurance

Recommendations

• Problem: Recommend products that complement earlier purchases and match the customer's interests

• Solution: Statistical analysis of interests and purchase history to detect matching swarm patterns

• Users: eCommerce, Advertising
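A minimal sketch of the "statistical analysis of purchase history" idea, assuming the simplest possible statistic: how often two products appear in the same basket. The function names, the basket format, and the sample products are hypothetical and only serve as illustration. On a real cluster this counting would run as a distributed job, not in a single process.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each ordered pair of products shares a basket."""
    counts = Counter()
    for basket in baskets:
        # De-duplicate so a product bought twice in one basket counts once.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(product, baskets, top_n=3):
    """Return the products most often bought together with `product`."""
    counts = co_purchase_counts(baskets)
    scored = [(other, n) for (item, other), n in counts.items() if item == product]
    # Highest count first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scored[:top_n]]


if __name__ == "__main__":
    sample = [
        ["camera", "sd card", "tripod"],
        ["camera", "sd card"],
        ["camera", "bag"],
    ]
    print(recommend("camera", sample, top_n=2))
```

Plain co-occurrence favors popular products, so a production recommender would normally normalize the counts; that step is left out here to keep the sketch short.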

Graph-Analytics

• Problem: Detect trends and curves in large distributed networks (Wired, Social, Mesh)

• Solution: Collect and mine all data, then apply self-learning patterns to detect trends and produce forecasts

• Users: Enterprises, Gov, NGO, Provider, Telco, Stock Exchange

Detection of Dangerous Use

• Problem: Spam, Credit Card Abuse

• Solution: Pattern Detection, Prioritizing, Heuristic Analytics

• Users: Retail, Finance, Reseller

Text Analysis

• Problem: Detect the meaning of the written word (Sentiment Analysis)

• Solution: Keyword patterns, Coherence detection, Path detection

• Users: eCommerce, Social Media Service Provider, Attitude Research
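As a rough illustration of the "keyword patterns" approach, the sketch below scores a text by counting positive and negative keywords. The two word lists and the function name are assumptions made up for this example; a real sentiment system would use far larger lexicons and handle negation and context.

```python
import re

# Hypothetical, deliberately tiny keyword lists.
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "slow", "broken", "hate"}


def sentiment_score(text):
    """Return (#positive keywords) - (#negative keywords) found in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((word in POSITIVE) - (word in NEGATIVE) for word in words)


if __name__ == "__main__":
    print(sentiment_score("Great phone, fast delivery, but the charger is broken"))
```

A score above zero suggests a positive text, below zero a negative one, and zero means the keywords cancel out or none were found.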

Real-World Data Volumes

• eBay: 12 PB, Search Optimization

• Facebook: 50 PB, Logs, Reports

• Walmart: 4.5 PB, Customer Transactions

http://wiki.apache.org/hadoop/PoweredBy

http://en.wikipedia.org/wiki/Big_data

Apache Hadoop

• Software Framework for large amounts of unstructured data

• Apache License

• Two core components:

• HDFS: Distributed data storage

• MapReduce: Distributed data handling
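To make "distributed data handling" more concrete, here is a minimal single-process sketch of the MapReduce programming model, using the classic word-count example. It is plain Python and not the Hadoop API: the function names are chosen for this illustration, and the shuffle step that Hadoop performs across the network is simulated with a dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

In a real cluster, each map call would run on the node that stores the corresponding part of the input in HDFS, and the reduce calls would run in parallel on several nodes.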

Hadoop Cluster

[Diagram: a cluster built from many identical Data Nodes]

• Data Node: 4-16 Cores, 4-16 Disks, 8-64 GB RAM, 1-10 Gbit Network

Hadoop Distributed File System

[Diagram: a File is split into Blocks, and the Blocks are spread across several Data Nodes]
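The file-to-blocks-to-nodes picture can be sketched in a few lines. The example below is a simplification under stated assumptions: the block size and replication factor are example values (both are configurable in HDFS), the node names are made up, and the round-robin placement stands in for the rack-aware policy that HDFS actually uses.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of `file_size` bytes is cut into."""
    sizes = []
    remaining = file_size
    while remaining > 0:
        sizes.append(min(block_size, remaining))
        remaining -= block_size
    return sizes


def place_blocks(num_blocks, data_nodes, replication=3):
    """Assign every block to `replication` distinct nodes, round-robin.

    Simplified stand-in for HDFS block placement, which is rack-aware.
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of data nodes")
    return {
        block: [data_nodes[(block + r) % len(data_nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks = split_into_blocks(300 * mb, 128 * mb)  # assumed 128 MB block size
    nodes = ["dn1", "dn2", "dn3", "dn4"]            # hypothetical node names
    for block, replicas in place_blocks(len(blocks), nodes).items():
        print(block, blocks[block] // mb, "MB", replicas)
```

Because every block lives on several nodes, losing one Data Node does not lose data, and a computation can be scheduled on whichever node already holds a copy.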

MapReduce

[Diagram: how Data and Query relate in an RDBMS compared with Hadoop]

Features

                   HDFS   MapReduce

Distribution        ✔        ✔

Fault Tolerance     ✔        ✔

Scalability         ✔        ✔

Hadoop Ecosystem

[Diagram of the components around HDFS and MapReduce]

• HDFS, MapReduce, Java API

• Sqoop (RDBMS)

• Flume (Logs)

• Connectors, ...

• Pig (Scripts)

• Hive (SQL)

• HBase

• Oozie

• Zookeeper

• Mahout

• Hue

• Whirr

• Avro

Example of an Integration

Scope

• Successful Audits per ISO 27001

• Analyze different Data Sources from different Data Bases and CRM Systems

• Real-Time and Lifetime Statistics per Product

• Periodic Analytics and Statistics Jobs

• Weekly Re-Import into CRM

• Single Queries per User (Analyst) over a Secured GUI

Solution Path

• Cluster Authentication and Authorization via Kerberos, plus encrypted data communication / Data Protection

• Sqoop Connector to CRM / DB

• Teradata, Oracle, PostgreSQL, MySQL, MS SQL

• Hive - HBase Integration

• Hive Analytics, run automatically under the control of the Oozie workflow orchestrator

• Hue Shell, Authentication via Kerberos SPNEGO

[Architecture diagram. Components: CRM Park, CDH, Sqoop, Hive, HBase, Kerberos (AD, MIT v5), Oozie, Hue, End User. Labels: Integration, Authentication, Automation, Real Time]

How to Manage?

Cloudera Manager

• Automated Deployment

• Monitoring

• Service Management

• Log Management

• Events and Alerts

• Reporting

• Support Integration

Cloudera

• Founded 2009 in Palo Alto

• Cloudera's Distribution Including Hadoop

• CDH4 / Cloudera Manager 4

• > 320 employees worldwide

• Training, Consulting, Support, Development

• Enterprise Tools

Thank You!

• alexander@cloudera.com

• Twitter: @mapredit

• Blog: mapredit.blogspot.com

• http://www.cloudera.com/

• http://hadoop.apache.org/
