Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycle with CSC and Hortonworks

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Upload: hortonworks

Post on 15-Jul-2015


Page 1

Page 2

Presenters

•  John Kreisa (@marked_man), VP Strategic Marketing, Hortonworks. Over 20 years in data management as a developer and a marketer.

•  Tim Gasper (@TimGasper), Global Offerings Manager, CSC. Led product for Infochimps for 4 years, now called the CSC Big Data PaaS; leads product/offering management for CSC Big Data & Analytics.

Page 3

Traditional systems under pressure

Challenges
•  Constrains data to app
•  Can’t manage new data
•  Costly to scale

[Figure: business value of traditional data (ERP, CRM, SCM) versus new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs). Data volume grows from 2.8 zettabytes in 2012 to 40 zettabytes in 2020. Laggards versus industry leaders.]
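To put the two data-volume figures on the slide in perspective, the short Python sketch below works out the overall multiple and the implied yearly growth rate. It uses only the 2.8 ZB (2012) and 40 ZB (2020) values quoted above; the function name and the assumption of a constant yearly rate are illustrative, not something the presenters state.

```python
# Figures from the slide: 2.8 ZB in 2012, 40 ZB in 2020.
START_YEAR, START_ZB = 2012, 2.8
END_YEAR, END_ZB = 2020, 40.0


def compound_annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` over `years`.

    Assumption: growth is smooth and compounding, which the slide does not claim.
    """
    return (end / start) ** (1.0 / years) - 1.0


growth_multiple = END_ZB / START_ZB
cagr = compound_annual_growth(START_ZB, END_ZB, END_YEAR - START_YEAR)

print(f"{growth_multiple:.1f}x overall, about {cagr:.1%} per year")
```

Under that assumption the projection amounts to roughly a 14-fold increase, or a little under 40% growth per year.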

Page 4

Hadoop emerged as foundation of new data architecture

Apache Hadoop is an open source data platform for managing large volumes of high-velocity, high-variety data

•  Built by Yahoo! to be the heartbeat of its ad & search business

•  Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises

•  Incredibly disruptive to current platform economics

Hadoop Advantages
✓  Manages new data paradigm
✓  Handles data at scale
✓  Cost effective
✓  Open source

[Figure labels: Traditional; Application; Storage (HDFS); Batch Processing (MapReduce).]

Page 5

Hadoop is deeply integrated in the data center

Deep Partnerships
Hortonworks engages in deep engineered relationships with the leaders in the data center, such as HP, Microsoft, Red Hat, SAP, SAS & Teradata

Broad Partnerships
Over 600 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users

[Figure: Hadoop in the data center. Sources: existing systems, clickstream, web & social, geolocation, sensor & machine, server logs, unstructured. Data system: RDBMS, EDW, MPP, and HDP 2.2 (governance & integration, security, operations, data access, data management, YARN). Applications. Surrounding layers: systems integrator, operational tools, dev & data tools, infrastructure.]

Page 6

CSC and the Modern Data Architecture

Modern Data Architecture

•  Enable applications to have access to all your enterprise data through an efficient centralized platform

•  Supported with a centralized approach to governance, security and operations

•  Versatile to handle any applications and datasets no matter the size or type

CSC Extends Hadoop’s Reach

•  Allows for multiple deployment options, including on-premise, managed or Big Data as a Service

•  CSC’s global consulting services can help you architect, develop and implement your big data strategy, analytics, integrations, and platforms

[Figure: modern data architecture. Sources: existing systems (ERP, CRM, SCM), clickstream, web & social, geolocation, sensor & machine, server logs, unstructured. Data system: HDFS (Hadoop Distributed File System) and YARN (data operating system) running batch, interactive, real-time and partner ISV workloads, alongside MPP and EDW. Analytics and applications: data marts, business analytics, visualization & dashboards. Surrounding layers: systems integrator, operational tools, dev & data tools, infrastructure.]

Page 7

Hadoop Driver: Cost optimization

HDP helps you reduce costs and optimize the value associated with your EDW

Archive Data off EDW
Move rarely used data to Hadoop as active archive, store more data longer

Offload costly ETL process
Free your EDW to perform high-value functions like analytics & operations, not ETL

Enrich the value of your EDW
Use Hadoop to refine new data sources, such as web and machine data for new analytical context

[Figure: sources (existing systems: ERP, CRM, SCM; clickstream, web & social, geolocation, sensor & machine, server logs, unstructured) feed HDP 2.2, which holds cold data, deeper archive and new sources and performs ELT. Hot data stays in the enterprise data warehouse, MPP and in-memory systems. Analytics: data marts, business analytics, visualization & dashboards.]

Page 8

Hadoop Driver: Advanced analytic applications

Use case categories
•  Single View: Improve acquisition and retention
•  Predictive Analytics: Identify your next best action
•  Data Discovery: Uncover new findings

Financial Services
New Account Risk Screens; Trading Risk; Insurance Underwriting
Improved Customer Service; Insurance Underwriting; Aggregate Banking Data as a Service
Cross-sell & Upsell of Financial Products; Risk Analysis for Usage-Based Car Insurance; Identify Claims Errors for Reimbursement

Telecom
Unified Household View of the Customer; Searchable Data for NPTB Recommendations; Protect Customer Data from Employee Misuse
Analyze Call Center Contacts Records; Network Infrastructure Capacity Planning; Call Detail Records (CDR) Analysis
Inferred Demographics for Improved Targeting; Proactive Maintenance on Transmission Equipment; Tiered Service for High-Value Customers

Retail
360° View of the Customer; Supply Chain Optimization; Website Optimization for Path to Purchase
Localized, Personalized Promotions; A/B Testing for Online Advertisements; Data-Driven Pricing, improved loyalty programs
Customer Segmentation; Personalized, Real-time Offers; In-Store Shopper Behavior

Manufacturing
Supply Chain and Logistics; Optimize Warehouse Inventory Levels; Product Insight from Electronic Usage Data
Assembly Line Quality Assurance; Proactive Equipment Maintenance; Crowdsource Quality Assurance
Single View of a Product Throughout Lifecycle; Connected Car Data for Ongoing Innovation; Improve Manufacturing Yields

Healthcare
Electronic Medical Records; Monitor Patient Vitals in Real-Time; Use Genomic Data in Medical Trials
Improving Lifelong Care for Epilepsy; Rapid Stroke Detection and Intervention; Monitor Medical Supply Chain to Reduce Waste
Reduce Patient Re-Admittance Rates; Video Analysis for Surgical Decision Support; Healthcare Analytics as a Service

Oil & Gas
Unify Exploration & Production Data; Monitor Rig Safety in Real-Time; Geographic exploration
DCA to Slow Well Decline Curves; Proactive Maintenance for Oil Field Equipment; Define Operational Set Points for Wells

Government
Single View of Entity; CBM & Autonomic Logistic Analysis; Sentiment Analysis on Program Effectiveness
Prevent Fraud, Waste and Abuse; Proactive Maintenance for Public Infrastructure; Meet Deadlines for Government Reporting

Page 9

Hadoop Driver: Enabling the data lake

Journey to the Data Lake with Hadoop

Data Lake Definition
•  Centralized Architecture: Multiple applications on a shared data set with consistent levels of service
•  Any App, Any Data: Multiple applications accessing all data affording new insights and opportunities
•  Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value

Drivers:
1.  Cost Optimization
2.  Advanced Analytic Apps

Goal:
•  Centralized Architecture
•  Data-driven Business

[Figure: journey to the data lake, plotted as scale against scope, leading to the data lake and systems of insight.]

Page 10

Case Study: 12 month Hadoop evolution at TrueCar

12 months execution plan (data platform capabilities over time)
•  June 2013: Begin Hadoop Execution
•  July 2013: Hortonworks Partnership
•  Aug 2013: Training & Dev Begins
•  Nov 2013: Production Cluster, 60 Nodes, 2 PB
•  Dec 2013: Three Production Apps (3 total)
•  Jan 2014: 40% Dev Staff Proficient
•  Feb 2014: Three More Production Apps (6 total)
•  May 2014: IPO

12 Month Results at TrueCar
•  Six Production Hadoop Applications
•  Sixty nodes/2PB data
•  Storage Costs/Compute Costs from $19/GB to $0.23/GB

“We addressed our data platform capabilities strategically as a pre-cursor to IPO.”
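As a quick sanity check on the scale of the TrueCar numbers, the Python sketch below turns the quoted per-GB costs into a reduction factor and a percentage saving, and estimates the average data per node. Only the $19/GB, $0.23/GB, 60-node and 2 PB figures come from the slide; the function names, the decimal unit conversion and the even spread of data across nodes are assumptions made for illustration.

```python
# Figures quoted on the slide.
COST_BEFORE_PER_GB = 19.00
COST_AFTER_PER_GB = 0.23
NODES = 60
CLUSTER_PB = 2


def reduction_factor(before, after):
    """How many times cheaper the new cost is than the old one."""
    return before / after


def percent_saved(before, after):
    """Share of the original cost that is no longer spent, in percent."""
    return (1.0 - after / before) * 100.0


factor = reduction_factor(COST_BEFORE_PER_GB, COST_AFTER_PER_GB)
saved = percent_saved(COST_BEFORE_PER_GB, COST_AFTER_PER_GB)

# Assumption: decimal units (1 PB = 1000 TB) and data spread evenly across nodes.
tb_per_node = CLUSTER_PB * 1000 / NODES

print(f"{factor:.1f}x cheaper per GB ({saved:.1f}% saved)")
print(f"about {tb_per_node:.1f} TB per node")
```

The slide does not say over what period or with what overheads the per-GB costs were measured, so these ratios describe the quoted figures only.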

Page 11

CSC Big Data & Analytics

•  Fastest Time to Value: Proven methodologies and customer success stories achieving insight in 30 days and production rollout in 90.

•  Industry Analytics Expertise: Experience combining horizontal analytics approaches and techniques with industry and vertical specialization.

•  Global Solutions Integrator: Worldwide delivery capabilities and experience with a broad set of both open and proprietary technologies and vendors.

•  End-to-End Consulting: Taking customers on a journey from strategy and roadmap, to business and technology transformation, to ongoing SLA management and as-a-Service.

Page 12

CSC Big Data Platform as a Service

[Figure: Big Data Platform as a Service, with the following groups.]

Hadoop: Hive w/ Tez; HBase; Accumulo; HDFS, YARN, MR, Spark, …
Queries: MongoDB; Elasticsearch; PostgreSQL; PostGIS; DataStax; TitanDB
Streams: Storm; Kafka
CSC Command and Control: Deployment Center; Operations Center; Support Center; Application Center; Knowledge Center

ETL; Data Transformation; Business Intelligence; Data Mining; Advanced Analytics; Geolocation

Flexible Deployment Options: Public Cloud; Virtual Private Cloud; Enterprise Private Cloud; Dedicated Cluster

Enterprise Grade Security: Access Control; Compliance Support; Perimeter Security; Activity Monitoring; Audit Logging; Encryption; Malware Protection; Hardened OS

Page 13

Across the Industries, Clients See the Possibilities

Financial Services
•  Fraud detection
•  Risk management
•  360° view of the customer

Utilities
•  Analysis of weather impact on power generation
•  Transmission monitoring
•  Smart grid management

Transportation
•  Real-time route optimization based on traffic and weather
•  Maintenance optimization and asset tracking

Health and Life Sciences
•  Epidemic early warning system
•  ICU monitoring
•  Remote healthcare monitoring

Retail
•  360° view of the customer
•  Click-stream analysis
•  Real-time promotions

Telecommunications
•  CDR processing
•  Churn prediction
•  Geomapping/marketing
•  Network monitoring

Law Enforcement
•  Real-time multimodal surveillance
•  Situational awareness
•  Cybersecurity detection

Manufacturing
•  Predictive maintenance
•  Real-time parts flow monitoring
•  Product configuration planning

Page 14

But They Struggle With Consistent Challenges

1. Setting up and operating a big data and analytics platform
•  Data complexity
•  Robust and scalable service
•  Speed of stand-up

2. Applying the right data science

3. Integrating insights into their business processes

4. Identifying and managing big data skills
•  Skills shortage
•  Skills retention

Page 15

Time to Value, Time to Next Iteration

Data Warehouse Project: 12-24 Months to Reach Production
Phases: Business Discovery; Info Discovery; Logical Data Model; Physical Data Model; System Staging; Data Ingestion, Transformation, ETL; Application Development; Analytics; Production Staging

Big Data Project: 3-6 Months to Reach Production
Phases: Business Discovery; Info Discovery; System Staging; Initial Data Ingest; Production Staging; plus six repeated cycles of Schema on Read, Analytics, and App Dev
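The repeated "Schema on Read" step is what lets each iteration start without remodeling the data first. The Python sketch below illustrates the idea in miniature: raw records are stored untouched, and a schema is applied only when a particular analysis reads them. The sample events, the `CLICK_SCHEMA` mapping and the `read_with_schema` helper are all hypothetical, not part of any CSC or Hortonworks product.

```python
import json

# Raw events are stored exactly as they arrive; nothing is validated at write time.
RAW_EVENTS = [
    '{"user": "a1", "page": "/home", "ms": "120"}',
    '{"user": "b2", "page": "/cart", "ms": "340", "referrer": "ad"}',
    'not valid json',
]

# A "schema" here is just field name -> converter (a hypothetical format).
CLICK_SCHEMA = {"user": str, "page": str, "ms": int}


def read_with_schema(lines, schema):
    """Parse raw lines at read time, keeping only the fields the schema names."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({name: cast(record[name]) for name, cast in schema.items()})
        except (ValueError, KeyError, TypeError):
            # Malformed or incomplete records are skipped by this reader,
            # rather than being rejected when the data was ingested.
            continue
    return rows


# Two analyses read the same raw data through different schemas.
clicks = read_with_schema(RAW_EVENTS, CLICK_SCHEMA)
pages_only = read_with_schema(RAW_EVENTS, {"page": str})

print(clicks)
print(pages_only)
```

A later iteration that needs the `referrer` field would only add it to its own schema; the stored data and the earlier analyses stay as they are.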

Page 16

Following the Big Data Maturity Lifecycle and…

•  Determining use cases
•  Art of the possible
•  Technology evaluation & understanding

•  Validate business value hypothesis with real data

•  Quick win, low hanging fruit, rapid initial phase

•  Implement one key transformation or insight into business process

•  Longer project timelines and robust ROI tracking

•  Expand to other key use cases for a big data enabled department or business function

•  Incorporate complementary tools and technology for a broader solution

•  Shift from a department or function focus to a cross-org focus

•  Introduce insights from across silos

•  Implement self-service capabilities for analytics and data integration

•  Provide marketplaces, catalogs, and collaboration zones

Page 17

… Leveraging an App Reference Design Framework

It’s all about the apps.

Page 18

Proof of Value: Food & Hospitality Retailer

This Food & Hospitality Retailer has a footprint of over 650 regional hotels, 2,800 coffee shops, and a number of restaurant chains. CSC provides the infrastructure, data platform, and analytics that uncover revenue opportunities in customer web interactions.

CHALLENGE

•  The client wanted to quickly evaluate the use of big data and the value that it brings as it relates to identifying new business opportunities

•  Ease of use was a key need in making insights and reporting more accessible to analysts… and increasing the speed with which they could analyze

•  Time to market was a key factor in the decision to implement a comprehensive big data platform. The client realized:
–  A bare platform would not be easy to manage
–  Their staff does not possess the skills to operate a bare platform
–  They needed to focus on the big data applications, rather than the platform

SOLUTION

•  CSC designed and configured the solution, built and deployed it in the cloud, and developed ETL flows to transport web activity data within 90 days:
–  Core platform (BDPaaS) leveraging Hortonworks Data Platform, including Hive with Tez
–  Aggregating lots of different data sources to create one massive web log data set
–  Adding data science algorithms to clean up data for better insights
–  Providing Pentaho Business Analytics as a comprehensive reporting and dashboard suite for insight presentation

•  CSC managed the infrastructure, platform components, and data flows, in addition to providing continued support/consultation services to the client

RESULTS

•  The client is generating insights on how customers interact with their website, and improving their services for happier customers and more streamlined business:
–  Faster path to ROI with both tech and services
–  Creating a real-time customer insights dashboard and set of reports
–  Ability to prove the value of big data internally through the mining of data and generation of insights and reports for various teams
–  Scalability to more data sources and use cases, including plans for mobile application analytics and operational metrics, as well as operational business analytics combining internal and external data sources

Page 19

Business Unit Strategy: Network Rail

Network Rail manages most of the rail infrastructure across Great Britain, responsible for control and maintenance of over 2,500 railway stations, 20,000 miles of track, and 40,000 bridges and tunnels. CSC provides a data and analytics hub for massive amounts of imagery and analog track monitoring data.

CHALLENGE

•  Network Rail needed a platform that could not only store, but also analyze petabytes of data over the long-term:
–  Track imagery and video data captured via drones and cameras
–  Vibration data captured via maintenance trains
–  Other forms of large file size analog data crossed with operational, structured data sets

•  Network Rail wanted to implement the solution quickly, and ramp up data volumes at a fast pace

•  Goal of leveraging combined services to assist with loading data, managing the underlying infrastructure, and working with and analyzing the data

SOLUTION

•  CSC designed and configured the solution, built and deployed it in the cloud, and developed ETL flows to import massive amounts of bulk data on an ongoing basis
–  Core platform (BDPaaS) leveraging Hortonworks Data Platform, including Hive with Tez

•  CSC’s platform integrated with ESRI ArcGIS for Big Data geolocation analysis features including geotagging and geo tiles

•  CSC managed the infrastructure, platform components, and data flows, in addition to providing continued support/consultation services to the client

RESULTS

•  Network Rail is generating insights on how to prioritize in near real-time the improvement and maintenance of the massive railway track and infrastructure footprint
–  Advanced analytics of analog data, including geolocation capabilities
–  Ability to handle the scale required by the massive amount of data under management and data growth
–  Complete transformation of a business unit’s analytics capability on track for success in less than 12 months

Page 20

Next Steps

Question & Answer session will be conducted electronically, using the panel to the right of your screen

Learn More

Get started with Hortonworks Sandbox: http://hortonworks.com/sandbox
Follow us: @hortonworks

CSC Big Data Maturity Survey: http://www.csc.com/big_data_index
CSC Big Data Home: http://www.csc.com/big_data
@CSCNews