Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: Today's ETL Does it All!

Powering the Connected Data Platform With ETL Onboarding @Scott_Gnau CTO, Hortonworks @TenduYogurtcu Big Data GM, Syncsort

Upload: syncsort

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: Today's ETL Does it All!

Powering the Connected Data Platform With ETL Onboarding

@Scott_Gnau, CTO, Hortonworks

@TenduYogurtcu, Big Data GM, Syncsort

Page 2

Global Leader in Big Iron to Big Data Solutions

Syncsort Confidential and Proprietary - do not copy or distribute

• Provider of enterprise software and leader in Big Iron to Big Data solutions in more than 85 countries around the world

• Global presence in 87% of enterprise Fortune 500 companies

• High performance & scalable software harnessing valuable data assets to power business and operational analytics, while dramatically reducing the cost of mainframe and legacy systems

• Unique focus on customer value through cost-effective solutions and unparalleled support; trusted leader for nearly 50 years

WOODCLIFF LAKE, NJ

JAPAN

SINGAPORE


Global customer base of leaders and emerging businesses across all major industries

Strategic partnerships in Big Iron and Big Data ecosystems

Page 3

Meet Today’s Presenters


Scott Gnau, CTO, Hortonworks

Tendu Yogurtcu, PhD, GM, Big Data, Syncsort

Page 4

© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Open and Connected Data Platforms

DATA AT REST

DATA IN MOTION

ACTIONABLE INTELLIGENCE

The Future of the Enterprise is About All Data

Modern Data Applications

Page 5


Modern Data Applications

Modern Data Architecture

• ALL Data: Data-at-Rest & Data-in-Motion

• Cloud & Data Center

• Powered by Open Source

Big Data Analytics & IoT

Next Generation Data Use-Cases:

• Predictive Retail

• Factory Automation

• Connected Cars

• Predictive Analytics

• Artificial Intelligence

The Shift to the Modern Data Architecture

System-centric, User-centric

Relational Database

Mainframe Client/Server Web & SaaS

IDMS

Data at Rest

Data in Motion

ACTIONABLE INTELLIGENCE

Modern Data Applications

Page 6


Connected Data Platforms Enable Enterprise Transformations

[Diagram labels: Cloud, Data Center, Data in Motion, Data at Rest, Edge Data, Edge Analytics, Stream Analytics, Machine Learning, Deep Historical Analysis]

Page 7


Data is the new Raw Material for Commerce

Easy Onboarding of New Data from New Sources

Access to Data from Legacy Systems and Apps

Successful Modern Data Apps

New Business and Revenue models

All Data

Page 8

Data – Raw Material for Advanced Analytics


Page 9

Syncsort Makes ALL Data Accessible & Usable – Ready for Analytics


Page 10

Our Strategy: Simplify Big Data Integration

• Deploy on premise or in the cloud

• Choose among multiple execution frameworks – Hadoop, Spark, Linux, Unix, Windows

• Integrate streaming and batch data with a single data pipeline for innovative applications, like IoT

• Future-proof applications to avoid re-writing jobs in order to take advantage of innovations in new execution frameworks

• Access and integrate ALL enterprise data sources – including mainframe – for advanced analytics
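
The "single data pipeline" idea in the list above can be illustrated with a minimal Python sketch. It is not DMX-h code and does not reflect how Syncsort implements this; the record fields and the function names `clean_reading`, `pipeline`, and `sensor_stream` are hypothetical. The point is only that transformation logic written once against a generic iterable can run unchanged over a finite batch and over a stream.

```python
from typing import Iterable, Iterator


def clean_reading(record: dict) -> dict:
    """Hypothetical transformation step: normalize one sensor reading."""
    return {
        "device": record["device"].strip().upper(),
        "temp_c": round((record["temp_f"] - 32) * 5 / 9, 1),
    }


def pipeline(records: Iterable[dict]) -> Iterator[dict]:
    """Apply the same logic lazily to any iterable, bounded or unbounded."""
    for record in records:
        yield clean_reading(record)


# Batch input: a finite list, processed in one pass.
batch = [
    {"device": " pump-1 ", "temp_f": 212.0},
    {"device": "pump-2", "temp_f": 32.0},
]
batch_result = list(pipeline(batch))


def sensor_stream() -> Iterator[dict]:
    """Stand-in for a streaming source; a real one would never finish."""
    yield {"device": "pump-3", "temp_f": 98.6}


# Streaming input: the same pipeline, consumed one record at a time.
stream_result = next(pipeline(sensor_stream()))
```

Because `pipeline` only assumes an iterable, swapping the source (file, queue, socket) does not require rewriting the transformation, which is the sense of "future-proofing" sketched here.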


Page 11

Three Commitments Underpin Our Big Data Integration Strategy


1. Ongoing Contributions to the Open Source Community

• JIRA: MAPREDUCE-2454, MAPREDUCE-4807, MAPREDUCE-4049, MAPREDUCE-5455, HIVE-8347, SQOOP-1272, PARQUET-134, Spark-packages and more!

2. Leverage Syncsort Technology Innovations & Mainframe Heritage

• Light footprint

• Self-tuning engine

• Single install. No 3rd party dependencies

• World-class data processing, mainframe expertise

3. Strong Partnerships with Strategic Big Data & Hadoop Players

Page 12

ETL Onboarding with Syncsort


Page 13

Insurance: Easy Access to ALL Data for Better Analytics


• Challenge: Needed hard-to-access operational data for advanced analytics

• Solution:
  • Quickly load ~1000 database tables into HDP with the click of a button
  • Access & integrate complex Mainframe VSAM files, data from DB2/z, Oracle & SQL Server
  • Track changes & keep data up to date

• Benefits:
  • Insight: Better and faster analytics
  • Agility: Reclaim development time; single tool to ingest, detect changes and populate the data lake
  • Compliance: Build audit trails, keep EDW current
  • Productivity: No need for deep understanding of Hadoop
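
The "track changes" step can be pictured with a small snapshot-comparison sketch in Python. This is one common, generic approach to change detection, not a description of how DMX-h does it; the table contents, the key format, and the function name `detect_changes` are assumptions made for illustration.

```python
def detect_changes(previous: dict, current: dict) -> dict:
    """Compare two snapshots of a table keyed by primary key.

    Returns the rows inserted, the rows whose values changed, and the
    keys that disappeared between the two snapshots.
    """
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    deleted = sorted(previous.keys() - current.keys())
    updated = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical policy table: yesterday's load versus today's extract.
yesterday = {
    "P-100": {"holder": "A. Lee", "premium": 1200},
    "P-101": {"holder": "B. Cruz", "premium": 950},
    "P-102": {"holder": "C. Roy", "premium": 700},
}
today = {
    "P-100": {"holder": "A. Lee", "premium": 1250},  # premium changed
    "P-101": {"holder": "B. Cruz", "premium": 950},  # unchanged
    "P-103": {"holder": "D. Kim", "premium": 810},   # new policy
}

changes = detect_changes(yesterday, today)
```

Only the rows in `changes` would need to be applied to the target, and the same change sets can double as an audit trail of what moved between loads.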

Page 14

Leading Media Company: Accelerate New Business Initiatives


• Challenge: Build scalable platform to support new business initiatives & scale for double-digit data growth, while reducing escalating EDW & ELT Costs

• Solution:
  • Shift data storage & processing out of the EDW into Hadoop
  • Migrate 500+ SQL ELT workloads to DMX-h on HDP

• Benefits:
  • Agility: Scalable architecture to deploy new business initiatives – analyze more set top box data, blend website user activity data, etc.
  • Cost: Millions of dollars in savings from EDW, including SQL tuning & maintenance costs
  • Productivity: ETL developers can stop coding & tuning, and get up & running on Hadoop quickly

Page 15

Hotel Chain: Ease of Use, Timely & Up-to-Date Reporting


• Challenge: More timely collection & reporting on room availability, event bookings, inventory and other hotel data from 4,000+ properties globally

• Solution:
  • Near real-time reporting
  • DMX-h consumes property updates from Kafka every 10s
  • DMX-h processes data on HDP, loading to TD every 30 min
  • Deployed on Google Cloud Platform

• Benefits:
  • Time to Value: DMX-h ease of use drastically cut development time
  • Agility: Reports updated every 30 minutes vs every 24 hours
  • Productivity: Leveraging ETL team for Hadoop (Spark), visual understanding of data pipeline
  • Insight: Up-to-date data = better business decisions = happier customers
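
The cadence described in this use case (frequent small reads, less frequent loads) amounts to grouping updates into fixed time windows. The Python sketch below illustrates that idea only; it uses no Kafka client and is not DMX-h code. The tuple layout `(timestamp_seconds, property_id, rooms_available)`, the name `build_load_batches`, and the rule of keeping the latest value per property in each window are assumptions.

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # one load every 30 minutes


def build_load_batches(updates):
    """Group property updates into 30-minute load batches.

    Each update is (timestamp_seconds, property_id, rooms_available).
    Within a window, a later update for the same property overwrites
    an earlier one, so each batch holds the latest known value.
    """
    batches = defaultdict(dict)
    for ts, property_id, rooms in sorted(updates):
        window_start = ts - ts % WINDOW_SECONDS
        batches[window_start][property_id] = rooms
    return dict(batches)


# Hypothetical updates arriving roughly every 10 seconds.
updates = [
    (0, "H001", 12),
    (10, "H001", 11),    # supersedes the reading at t=0
    (20, "H002", 40),
    (1800, "H001", 9),   # falls into the next 30-minute window
]

batches = build_load_batches(updates)
```

Each key in `batches` is the start of a load window, so a report built from the warehouse is never more than one window behind the source, compared with a full day under a nightly batch.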

Page 16

Syncsort DMX-h: Benefits to Business


• Faster Time to Value:
  • Faster & better insights with readily-accessible data

• Compliance:
  • Secure data access, ability to build audit trails

• Increased Productivity:
  • Reclaim development time by automating, optimizing and future-proofing development
  • Across platforms, on premise and in the cloud

• Cost:
  • Lower archival costs
  • Reduced development time
  • Reduced Total Cost of Ownership, higher ROI

Page 17


See For Yourself!

Take a 30-day free trial at www.syncsort.com/try