Upload: ca-technologies

Post on 16-Apr-2017

Page 1: Automate Hadoop Jobs with Real World Business Impact

Automate Hadoop Jobs with Real World Business Impact

Beeshmanth (B) Kotamreddy

DevOps: Continuous Delivery

CA Technologies

Principal Product Manager

DO4X185S

@beeshmanth

#CAWorld

April Merritt

Major international retailer based in Ohio

Senior Analyst

Page 2: Automate Hadoop Jobs with Real World Business Impact

2 © 2015 CA. ALL RIGHTS RESERVED.@CAWORLD #CAWORLD

For Informational Purposes Only

Terms of this Presentation

© 2015 CA. All rights reserved. All trademarks referenced herein belong to their respective companies. The presentation provided at CA World 2015 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer references relate to customer's specific use and experience of CA products and solutions so actual results may vary.

Certain information in this presentation may outline CA’s general product direction. This presentation shall not serve to (i) affect the rights and/or obligations of CA or its licensees under any existing or future license agreement or services agreement relating to any CA software product; or (ii) amend any product documentation or specifications for any CA software product. This presentation is based on current information and resource allocations as of November 18, 2015, and is subject to change or withdrawal by CA at any time without notice. The development, release and timing of any features or functionality described in this presentation remain at CA’s sole discretion.

Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA product release referenced in this presentation, CA may make such release available to new licensees in the form of a regularly scheduled major product release. Such release may be made available to licensees of the product who are active subscribers to CA maintenance and support, on a when and if-available basis. The information in this presentation is not deemed to be incorporated into any contract.

Page 3: Automate Hadoop Jobs with Real World Business Impact

Abstract

Have you ever wondered how you might simplify and automate Hadoop batch processing for faster implementation and more accurate big data analytics?

With CA Workload Automation, you can simplify and automate Hadoop batch processing for faster implementation and more accurate big data analytics.

Beeshmanth (B) Kotamreddy

Principal Product Manager

CA Technologies

April Merritt

Senior Analyst

Major international retailer based in Ohio

Page 4: Automate Hadoop Jobs with Real World Business Impact

Agenda

1. BIG DATA AND CHANGING CUSTOMER NEEDS

2. HADOOP

3. BUSINESS CHALLENGES

4. CA WORKLOAD AUTOMATION ADVANCED INTEGRATION FOR HADOOP

5. REAL WORLD USE OF CA’S ADVANCED INTEGRATION FOR HADOOP

6. Q & A

Page 5: Automate Hadoop Jobs with Real World Business Impact

Maximize the value of Big Data with the power of Workload Automation

HDFS Operations, Pig, Hive, Sqoop, Oozie Workflows

Exciting, disruptive & evolving ecosystem

"80% of customer data will be wasted due to immature enterprise data 'value chains.'" ~IDC

Page 6: Automate Hadoop Jobs with Real World Business Impact

What is Big Data?

Datasets whose volume, velocity, variety and complexity exceed the ability of commonly used software tools to capture, process, store, manage, and analyze them.

Information Sources: Mobile, Transactional Data, Search, Texts, CRM/SCM/ERP, Images, Email, Social Media, IT Ops, Audio, Video

Dimensions of Big Data: Velocity, Volume, Variety, Complexity

Page 7: Automate Hadoop Jobs with Real World Business Impact

Enterprises across all industries use Big Data

Enterprises require new capabilities around processing large amounts of data in a variety of different formats

BANKS: Fraud Prevention; Trading Risks; Customer Risk Assessment

TELCO CARRIERS: Call Detail Records; Real-time bandwidth allocations; Lifetime value and promotions

RETAILERS: Customer Analytics; Brand Sentiment Analytics; Promotion Planning

HEALTH CARE PROVIDERS: Genomic Analysis; Medical trial Analysis; Hospital Diagnostics Analytics

UTILITY PROVIDERS: IoT/Smart Meter Analytics; Energy trading and pricing risk analytics

GOVERNMENT/PUBLIC SECTOR: Crime Intelligence and Prevention; Fraud Prevention

Page 8: Automate Hadoop Jobs with Real World Business Impact

What is Hadoop?

Hadoop is open-source software designed to be highly scalable, fault tolerant and highly distributed.

Key elements:

1. Distributed processing of Big Data (e.g. MapReduce)

2. Distributed storage (Hadoop Distributed File System or HDFS)

Hadoop 1.0 stack: HDFS (Distributed Reliable Storage); MapReduce (Resource Management & Data Processing)

Hadoop 2.0 stack: HDFS (Distributed Reliable Storage); YARN (Resource Management); MapReduce (Dist. Programming); Spark (In Memory); HBase (NoSQL store); Hive (Query); Pig (Scripting); Oozie (Workflow)

Page 9: Automate Hadoop Jobs with Real World Business Impact

Hadoop Distributed File System (HDFS)

Self-healing, high bandwidth clustered storage

• Name Node - One of the core Hadoop services. It maintains the namespace: it knows where data is and manages blocks on data nodes.

• Data Node - Servers that actually store the data on their local disks.

• Secondary Name Node - Performs periodic checkpoints of the primary name node to serve as a backup in case of failure.

HDFS breaks incoming files into blocks and stores them redundantly across the cluster.

[Diagram labels: Job-1 through Job-5; HDFS; Master Node with Name Node (primary) and Name Node (secondary), linked by a periodic checkpoint; Slave Nodes with Data Nodes and Task Trackers, each of the five data nodes holding three of the blocks numbered 1 to 5.]
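
The block splitting and redundant storage described on this slide can be sketched in a few lines of Python. This is an illustration only, not how HDFS itself is implemented: the function names, the node names and the round-robin placement are assumptions made for the example, and real HDFS placement is decided by the Name Node with rack awareness. The replication factor of 3 mirrors the diagram, where each block appears on three data nodes.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Hypothetical round-robin policy, chosen only to keep the sketch short.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def surviving_blocks(placement: dict[int, list[str]], failed_node: str) -> set[int]:
    """Return the blocks that still have at least one replica after a node fails."""
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


if __name__ == "__main__":
    data_nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]  # hypothetical node names
    blocks = split_into_blocks(b"x" * 18, block_size=4)
    placement = place_replicas(len(blocks), data_nodes)
    for block_id, holders in placement.items():
        print(f"block {block_id}: {holders}")
    print("readable after dn1 fails:", sorted(surviving_blocks(placement, "dn1")))
```

Because every block lives on three different nodes, losing any single data node leaves every block readable, which is the "self-healing" property the slide refers to.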

Page 10: Automate Hadoop Jobs with Real World Business Impact

MapReduce – Core Hadoop

A distributed computing framework

Hadoop’s MapReduce framework involves two phases:

1. Map Phase: Distributes the dataset among multiple servers and operates on the data locally.

2. Reduce Phase: Recombines the partial results.
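
The two phases can be illustrated with a word count run in a single Python process. This is a minimal sketch of the programming model, not Hadoop's API: the function names and the sample input are made up for the example, and the grouping step between map and reduce (usually called the shuffle) is written out explicitly even though the slide does not list it as a phase.

```python
from collections import defaultdict


def map_phase(partition: list[str]) -> list[tuple[str, int]]:
    """Runs on one partition of the data: emit (word, 1) for every word."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(mapped_outputs: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Group the values emitted by all mappers under their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Recombine the partial results: sum the counts for each word."""
    return {key: sum(values) for key, values in grouped.items()}


if __name__ == "__main__":
    # Each inner list stands in for the slice of data held by one server.
    partitions = [["pig hive sqoop", "hive oozie"], ["sqoop hive"]]
    mapped = [map_phase(p) for p in partitions]
    print(reduce_phase(shuffle(mapped)))
```

In a real cluster each call to map_phase would run on the server that already stores that partition, so only the small (word, count) pairs travel over the network.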

Page 11: Automate Hadoop Jobs with Real World Business Impact

SO, YOU HAVE DATA

And you want it to help you better understand your business, customers and marketplace.

THAT’S WHY YOU USE HADOOP

But, extracting data insights may require you to interface with systems outside of Hadoop.

And that isn’t always easy…

Page 12: Automate Hadoop Jobs with Real World Business Impact

Enterprises typically have multiple scheduling engines to manage end-to-end business processes

Companies typically interface with multiple systems such as ERP (SAP, Oracle, etc.), databases, reporting tools, point of sale systems and social media files, in addition to Hadoop.

As a result, enterprises use multiple tools to manage their workload automation needs.

Visualizing end-to-end business workflows and managing dependencies across Hadoop and non-Hadoop systems might not always be easy.

Page 13: Automate Hadoop Jobs with Real World Business Impact

Challenges

• Multiple schedulers needed to run traditional jobs and Hadoop jobs

• Hadoop jobs may not integrate into existing workflows

• Heterogeneous environment and tools

• Team productivity, experience, knowledge

• Placing workloads - "right place, right time"

• Slow responsiveness to the business

• No central location to monitor end-to-end workflows

Page 14: Automate Hadoop Jobs with Real World Business Impact

Drag and drop Hadoop jobs into existing workflows.

Monitor traditional and Hadoop jobs from a single console.

Detect problems early and resolve them quickly.

Set up automatic alerts for critical events.

Unified visibility into your heterogeneous and Hadoop environments

Improved performance and uptime through proactive monitoring and alerts

Lower costs by eliminating the complexity of disconnected monitoring tools

BIG DATA MADE EASY with CA Workload Automation Advanced Integration for Hadoop

Page 15: Automate Hadoop Jobs with Real World Business Impact

Automate Hadoop Jobs with Real World Business Impact

April Merritt

DevOps: Continuous Delivery

Major international Retailer based in Ohio

Senior Analyst

Page 16: Automate Hadoop Jobs with Real World Business Impact

CA Workload Automation extends scheduling for Big Data

Retail Customer Analytics in the Application Economy

Workflow:

• Extract and Move Input Files

• Transform and Process Input

• Run Specialized Jobs to extract data

• Batch Ingestion into Hadoop and Batch Analytics

• Load results into BI Tool for Interactive queries

Workloads (ETL JOB, ANALYTICS JOB):

• INTEGRATED JOBS - Jobs directly integrate with source system, and run in flow. Then extract Pricing, Inventory, Sales, etc. data when jobs complete.

• DATASTAGE - Parse Integration Files. Run ETL and NZ to merge input files into DW.

• SQOOP JOB - Run Sqoop jobs to copy data into Hadoop cluster.

• PIG JOB - Run Pig jobs for operational analytics.

• Interactive search job to run dynamic promotion

Use case:

• Extract POS, Inventory, Price Data

• Mine Customer Information and Inventory Information from Source Systems

• Load Data into NoSQL and render dynamic discounting on-demand

• Perform Batch aggregation and Machine learning for Promotion Analytics
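
The idea of chaining Hadoop and non-Hadoop jobs into one flow can be sketched as a dependency graph. The example below is only an illustration using Python's standard library; it is not the CA Workload Automation API. The job names are hypothetical labels loosely based on the stages on this slide, and the strictly linear ordering between them is an assumption.

```python
from graphlib import TopologicalSorter
from typing import Callable, Optional

# Each job maps to the set of jobs it must wait for (hypothetical names).
DEPENDENCIES: dict[str, set[str]] = {
    "extract_source_data": set(),
    "datastage_etl": {"extract_source_data"},
    "sqoop_import": {"datastage_etl"},
    "pig_analytics": {"sqoop_import"},
    "load_bi_tool": {"pig_analytics"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an execution order in which every job follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


def run_workflow(deps: dict[str, set[str]],
                 runner: Callable[[str], bool]) -> tuple[list[str], Optional[str]]:
    """Run jobs in dependency order and stop at the first failure.

    Returns the jobs that completed and the job that failed (or None).
    """
    completed: list[str] = []
    for job in run_order(deps):
        if not runner(job):
            return completed, job  # downstream jobs never start
        completed.append(job)
    return completed, None


if __name__ == "__main__":
    # Simulate a failure in the Sqoop step to show where the flow halts.
    done, failed = run_workflow(DEPENDENCIES, lambda job: job != "sqoop_import")
    print("completed:", done)
    print("failed:", failed)
```

Keeping all jobs in one graph is what lets a single scheduler hold back the Pig analytics step when an upstream ETL or Sqoop job fails, instead of relying on separate tools to notice.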

Page 17: Automate Hadoop Jobs with Real World Business Impact

Hadoop and CA Workload Automation DE

How our company’s IT department makes our Workload Automation a Priority

• All enterprise data systems already integrated into DE.

  • Majority of sources and destinations already using system. Hadoop integration does not require additional architecture or work.

• Processes already set up for handling failure, changes, and audit controls.

  • Operations callouts, restarts, expert schedulers who focus on streamlining integrated workflows and creating easily manageable sustainable architecture.

• Enterprise flow accessible in one place.

  • Full transparency. Visible issues are fixed issues.

• Oozie Workflows will not be used.

  • DE is more user friendly and easier to schedule. Less complicated workflows make troubleshooting and training easier.

Page 18: Automate Hadoop Jobs with Real World Business Impact

CA Workload Automation extends scheduling for Big Data

Retail Customer Analytics in the Application Economy

[Screenshot using CA Workload Automation DE. Stages shown: Landing Zone; EDW Transformation; Data Injection into Hadoop; HDFS Transformation and Analytics; EDW Aggregation; Analytics]

Page 19: Automate Hadoop Jobs with Real World Business Impact

Q & A

Page 20: Automate Hadoop Jobs with Real World Business Impact

For More Information

To learn more, please visit:

http://cainc.to/Nv2VOe

CA World ’15