Automate Hadoop Jobs with Real World Business Impact
Beeshmanth (B) Kotamreddy
DevOps: Continuous Delivery
CA Technologies
Principal Product Manager
DO4X185S
@beeshmanth
#CAWorld
April Merritt
Major international retailer based in Ohio
Senior Analyst
2 © 2015 CA. ALL RIGHTS RESERVED.@CAWORLD #CAWORLD
For Informational Purposes Only
Terms of this Presentation
© 2015 CA. All rights reserved. All trademarks referenced herein belong to their respective companies. The presentation provided at CA
World 2015 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer
references relate to customer's specific use and experience of CA products and solutions so actual results may vary.
Certain information in this presentation may outline CA’s general product direction. This presentation shall not serve to (i) affect the rights
and/or obligations of CA or its licensees under any existing or future license agreement or services agreement relating to any CA software
product; or (ii) amend any product documentation or specifications for any CA software product. This presentation is based on current
information and resource allocations as of November 18, 2015, and is subject to change or withdrawal by CA at any time without notice. The
development, release and timing of any features or functionality described in this presentation remain at CA’s sole discretion.
Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA product release referenced in
this presentation, CA may make such release available to new licensees in the form of a regularly scheduled major product release. Such
release may be made available to licensees of the product who are active subscribers to CA maintenance and support, on a when and if-
available basis. The information in this presentation is not deemed to be incorporated into any contract.
Abstract
Have you ever wondered how you might simplify and automate Hadoop batch processing for faster implementation and more accurate big data analytics?
With CA Workload Automation, you can simplify and automate Hadoop batch processing for faster implementation and more accurate big data analytics.
Beeshmanth (B) Kotamreddy
CA Technologies
Principal Product Manager
April Merritt
Senior Analyst
Major international retailer based in Ohio
Agenda
1. BIG DATA AND CHANGING CUSTOMER NEEDS
2. HADOOP
3. BUSINESS CHALLENGES
4. CA WORKLOAD AUTOMATION ADVANCED INTEGRATION FOR HADOOP
5. REAL WORLD USE OF CA’S ADVANCED INTEGRATION FOR HADOOP
6. Q & A
Maximize the value of Big Data with the power of Workload Automation
HDFS Operations, Pig, Hive, Sqoop, Oozie Workflows
Exciting, disruptive & evolving ecosystem
"80% of customer data will be wasted due to immature enterprise data 'value chains.'" ~IDC
What is Big Data?
Datasets whose volume, velocity, variety and complexity exceed the ability of commonly used software tools to capture, process, store, manage, and analyze them.
Information Sources
• Mobile
• Transactional Data
• Search
• Texts
• CRM, SCM, ERP
• Images
• Email
• Social Media
• IT Ops
• Audio
• Video
Dimensions of Big Data: Velocity, Volume, Variety, Complexity
Enterprises across all industries use Big Data
Enterprises require new capabilities around processing large amounts of data in a variety of different formats
BANKS
• Fraud Prevention
• Trading Risks
• Customer Risk Assessment
TELCO CARRIERS
• Call Detail Records
• Real-time bandwidth allocations
• Lifetime value and promotions
RETAILERS
• Customer Analytics
• Brand Sentiment Analytics
• Promotion Planning
HEALTH CARE PROVIDERS
• Genomic Analysis
• Medical Trial Analysis
• Hospital Diagnostics Analytics
UTILITY PROVIDERS
• IoT/Smart Meter Analytics
• Energy trading and pricing risk analytics
GOVERNMENT/PUBLIC SECTOR
• Crime Intelligence and Prevention
• Fraud Prevention
What is Hadoop?
Hadoop is open-source software designed to be highly scalable, fault tolerant, and highly distributed.
Key elements:
1. Distributed processing of Big Data (e.g. MapReduce)
2. Distributed storage (Hadoop Distributed File System or HDFS)
Hadoop 1.0
• MapReduce (Resource Management & Data Processing)
• HDFS (Distributed Reliable Storage)
Hadoop 2.0
• Spark (In Memory)
• HBase (NoSQL store)
• Hive (Query)
• Pig (Scripting)
• Oozie (Workflow)
• MapReduce (Dist. Programming)
• YARN (Resource Management)
• HDFS (Distributed Reliable Storage)
Hadoop Distributed File System (HDFS)
Self-healing, high-bandwidth clustered storage
• Name Node - One of the core Hadoop services. It maintains the namespace, knows where data is, and manages blocks on data nodes.
• Data Node - Servers that actually store the data on their local disks.
• Secondary Name Node - Performs periodic checkpoints of the primary name node's metadata, which help with recovery in case of failure.
HDFS breaks incoming files into blocks and stores them redundantly across the cluster.
[Diagram: a master node hosting the Name Node (primary) and Name Node (secondary, periodic checkpoint); Job-1 through Job-5 submitted to five slave nodes running Data Nodes and Task Trackers; blocks 1 through 5, each stored on three different slave nodes]
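The block splitting and redundant storage described on this slide can be illustrated with a small sketch. This is a toy model, not HDFS code: the block size, the node names, and the round-robin placement are assumptions made for illustration, and real HDFS uses far larger blocks and its own placement policy. The replication factor of 3 matches the slide's diagram, where each block appears on three nodes.

```python
from itertools import cycle


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is an illustrative assumption, not the HDFS policy.
    Node names are expected to be unique.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    ring = cycle(nodes)
    placement = {}
    for block_id in range(1, num_blocks + 1):
        # Consecutive picks from the ring are distinct because
        # replication <= len(nodes).
        placement[block_id] = [next(ring) for _ in range(replication)]
    return placement


# Hypothetical example: a 50-byte "file", 10-byte blocks, five nodes.
blocks = split_into_blocks(b"x" * 50, block_size=10)
nodes = ["node1", "node2", "node3", "node4", "node5"]
for block_id, holders in place_replicas(len(blocks), nodes).items():
    print(f"block {block_id} -> {holders}")
```

Because every block lives on three nodes, losing any single node in this model still leaves two copies of each block it held.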
MapReduce – Core Hadoop
A distributed computing framework
Hadoop’s MapReduce framework involves two phases:
1. Map Phase: Distributes the dataset among multiple servers and operates on the data locally.
2. Reduce Phase: Recombines the partial results.
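The two phases can be sketched with a word count, the customary introductory example. This is a single-process illustration in plain Python, not the Hadoop API: the function names are hypothetical, and each input chunk stands in for the share of the dataset one server would process locally. The grouping step between the phases is not named on the slide; it is included here because the reduce phase needs all partial results for a key gathered together.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group the partial results by key so each key can be reduced in one place."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: recombine the partial counts for one word into a total."""
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    # In a real cluster each chunk would be mapped on a different server.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "big jobs"]))
```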
SO, YOU HAVE DATA
And you want it to help you better understand your business, customers and marketplace.
THAT’S WHY YOU USE HADOOP
But, extracting data insights may require you to interface with systems outside of Hadoop.
And that isn’t always easy…
Enterprises typically have multiple scheduling engines to manage end-to-end business processes
Companies typically interface with multiple systems such as ERP (SAP, Oracle, etc.), databases, reporting tools, point of sale systems, social media files, etc., in addition to Hadoop.
As a result, enterprises use multiple tools to manage their workload automation needs.
Visualizing the end-to-end business workflows and managing dependencies across Hadoop and non-Hadoop systems might not always be easy.
Challenges
• Multiple schedulers needed to run traditional jobs and Hadoop jobs
• Hadoop jobs may not integrate into existing workflows
• Heterogeneous environment and tools
• Team productivity, experience, knowledge
• Placing workloads - “right place, right time”
• Slow responsiveness to the business
• No central location to monitor end-to-end workflows
BIG DATA MADE EASY with CA Workload Automation Advanced Integration for Hadoop
Drag and drop Hadoop jobs into existing workflows.
Monitor traditional and Hadoop jobs from a single console.
Detect problems early and resolve them quickly.
Set up automatic alerts for critical events.
Unified visibility into your heterogeneous and Hadoop environments
Improved performance and uptime through proactive monitoring and alerts
Lower costs by eliminating the complexity of disconnected monitoring tools
Automate Hadoop Jobs with Real World Business Impact
April Merritt
DevOps: Continuous Delivery
Major international Retailer based in Ohio
Senior Analyst
CA Workload Automation extends scheduling for Big Data
Retail Customer Analytics in the Application Economy
Workflow
• Extract and Move Input Files
• Transform and Process Input
• Run Specialized Jobs to extract data
• Batch Ingestion into Hadoop and Batch Analytics
• Load results into BI Tool for Interactive queries
Workloads
• INTEGRATED JOBS: Jobs directly integrate with the source system and run in flow. Then extract Pricing, Inventory, Sales, etc. data when jobs complete.
• DATASTAGE: Parse Integration Files. Run ETL and NZ to merge input files into DW.
• SQOOP JOB: Run Sqoop jobs to copy data into the Hadoop cluster.
• PIG JOB: Run Pig jobs for operational analytics.
• Interactive search job to run dynamic promotion.
• Additional job labels on the slide: ETL JOB, ANALYTICS JOB
Use case
• Extract POS, Inventory, Price Data
• Mine Customer Information and Inventory Information from Source Systems
• Load Data into NoSQL and render dynamic discounting on-demand
• Perform Batch aggregation and Machine learning for Promotion Analytics
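The idea of running Hadoop and non-Hadoop jobs in one dependency-ordered flow can be sketched as follows. This is not the CA Workload Automation API: the job names are hypothetical labels loosely based on the workloads on this slide, the linear chain of dependencies is an assumption, and each job is a stand-in callable that reports success or failure. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each key maps to the jobs it must wait for.
DEPENDENCIES = {
    "extract_source_data": set(),
    "datastage_etl": {"extract_source_data"},
    "sqoop_import": {"datastage_etl"},
    "pig_analytics": {"sqoop_import"},
    "load_bi_tool": {"pig_analytics"},
}


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def run_workflow(dependencies, actions) -> list[str]:
    """Run each job in order and return the jobs that completed.

    Stops at the first failure. A real scheduler would only hold back the
    failed job's downstream jobs; stopping everything keeps the sketch short.
    """
    completed = []
    for job in run_order(dependencies):
        if not actions[job]():
            break
        completed.append(job)
    return completed


# Stand-in actions: every job "succeeds" except the Sqoop import.
actions = {job: (lambda: True) for job in DEPENDENCIES}
actions["sqoop_import"] = lambda: False
print(run_workflow(DEPENDENCIES, actions))
```

Keeping the dependency map in one place is what gives a single view of the end-to-end flow: a failure in the Hadoop ingestion step visibly blocks the analytics and BI loading steps that depend on it.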
Hadoop and CA Workload Automation DE
How our company’s IT department makes our Workload Automation a priority
• All enterprise data systems already integrated into DE.
  • Majority of sources and destinations already using the system. Hadoop integration does not require additional architecture or work.
• Processes already set up for handling failure, changes, and audit controls.
  • Operations callouts, restarts, and expert schedulers who focus on streamlining integrated workflows and creating an easily manageable, sustainable architecture.
• Enterprise flow accessible in one place.
  • Full transparency. Visible issues are fixed issues.
• Oozie Workflows will not be used.
  • DE is more user friendly and easier to schedule. Less complicated workflows make troubleshooting and training easier.
CA Workload Automation extends scheduling for Big Data
Retail Customer Analytics in the Application Economy
• Landing Zone
• EDW Transformation
• Data Injection into Hadoop
• HDFS Transformation and Analytics
• EDW Aggregation
• Analytics
[Screenshot using CA Workload Automation DE]
Q & A
For More Information
To learn more, please visit:
http://cainc.to/Nv2VOe
CA World ’15