Post on 03-Feb-2018
© 2017 IBM Corporation & Hortonworks
Powering the Future of Data with IBM & Hortonworks
Sean Roberts, Partner Engineering, Hortonworks, EMEA (@seano)
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hortonworks Company Profile
ONLY 100% open source Apache Hadoop data platform
Founded in 2011
~1,100 employees across 16 countries
1ST Hadoop provider to go public: IPO 4Q14 (NASDAQ: HDP)
Serving 57% of US F100 and 20% of Global F500
Original 24 Architects, Developers, Operators of Hadoop from Yahoo!
[Chart: Innovation (vertical axis) over Time (horizontal axis). Labels: Maximum Community Innovation, The Innovation Advantage, Proprietary Approach.]
Our Founding Belief:
Innovation happens best not in isolation but in collaboration
OPEN COMMUNITY
Customers Win With Open Source
Maximum Choice & Minimum Lock-in
Open community produces widely used, longer lived technology
Faster Pace of Innovation
Hundreds of developers across hundreds of companies, including users and vendors
Easier Adoption
Unfettered access to downloads and code
Focus on Partnering and Collaboration
Ability to influence enterprise-focused roadmap paired with code transparency enables co-engineering partnerships and deeply integrated solutions that deliver maximum business value
Aligned Incentives Drive Mutual Success
Customer success – not license sales – drives revenue
INTERNET OF ANYTHING
AGE OF DATA
Open source is the norm,
and Apache is the center of gravity
Founded: 1999
Hortonworks Influences the Apache Community
We Employ the Committers
One third of all committers to the Apache® Hadoop™ project, and a majority in Apache NiFi and other important projects
Our Committers Innovate
and improve Connected Data Platforms
We Influence the Hadoop Roadmap
by communicating important requirements to the community through our leaders
APACHE HADOOP COMMITTERS
IBM is investing in the Linux ecosystems & open innovation
1999: 5 IBMers contributing to Linux and Apache Projects
2016: 50k+ IBMers contributing to 150+ open organizations
270+ OpenPOWER-based innovations under way
Blockchain: Hyperledger
Open Source Databases
Source: https://developer.ibm.com/start/
Open Community example: Apache Atlas
Dec 2014: DGI* group kickoff
May 2015: Apache Atlas incubation
July 2015: HDP 2.3 GA release (kickoff to GA in 7 months)
Aug 2016: HDP 2.5 / Atlas 0.7 foundation release
Apr 2017: HDP 2.6 / Atlas 0.8 release
Committers: 35
Code contributors from IBM, Aetna, Merck, Target, JPMC
Global Financial Company
* DGI: Data Governance Initiative
A Connected Data Strategy Solves for All Data
Data Services
Hortonworks Solutions
Enterprise Data Warehouse Optimization
Cyber Security and Threat Management
Internet of Things and Streaming Analytics
Data Center: Hortonworks Data Suite (HDF, HDP)
Hortonworks Connection
Cloud: Hortonworks Data Cloud (AWS, HDInsight)
Hortonworks Connection
Enablement Subscription
SmartSense™
Premier Operational Support
Educational Services
Professional Services
Community Connection
Hortonworks Data Platform for Data at Rest
Powered by Open Enterprise Hadoop
Open
Interoperable
Ready
Central
IBM Spectrum Scale
Data lake with IBM Spectrum Scale: unleash new storage economies on a global scale.
Consolidate all your unstructured data storage on Spectrum Scale with unlimited and painless scaling of capacity and performance.
[Diagram: IBM Spectrum Scale global namespace]
Clients: client workstations, users and applications, compute farm, traditional applications, new-gen applications (4000+ clients)
File: POSIX, NFS, SMB
Block: iSCSI
Object: Swift, S3
Analytics: Transparent HDFS
OpenStack: Cinder, Glance, Manila
Powered by IBM Spectrum Scale: automated data placement and data migration
Storage: Flash, Disk, Tape, Shared Nothing Cluster, JBOD/JBOF, Spectrum Scale RAID, Transparent Cloud Tier
Features: Encryption, Compression, AFM-DR to a DR site
Worldwide Data Distribution (R/W): Site A, Site B, Site C
Announcement 19 Sep 2016: IBM and Hortonworks bring Open Source Distribution to Power Systems
Open-on-Open: 100% Open Hadoop & Spark on OpenPOWER fuels community innovation
Combined Market Leadership and Reach
Hortonworks' strong client success, rapid growth and leadership in the Hadoop community
Power's success, large global enterprise install base, and IBM's client focus
Guarantee: 3X price-performance advantage over x86*
Announcement 13 Jun 2017: IBM and Hortonworks extend partnership to bring Data Science to HDP
Great Data + Great Data Science = Great Decisions
IBM chooses Hortonworks Data Platform (HDP) as its Hadoop distribution
Hortonworks Data Platform (HDP®) combines with IBM DSX (Data Science Experience) & IBM Big SQL in new integrated solutions
Secure
Real-time
Streaming
Integrated
Hortonworks DataFlow for Data in Motion
Powered by Apache NiFi, Kafka, and Storm
HDF (Hortonworks Data Flow): Data-In-Motion Platform
BUILD DATAFLOW MANAGEMENT AND STREAMING ANALYTICS SOLUTIONS to COLLECT, CURATE, ANALYZE and ACT ON data in motion across the edge, data center and cloud
Smart Mobility
Digital Personalization
The Internet of Things
Cloud Computing
The Connected Era: A New Way of Business
Data Doubles Every Two Years
44ZB By 2020
The Old Way
System-Centric
Procedural
Hierarchical
Scheduled
Monolithic
The New Way
User-Centric
Agile
Dynamic
Real-Time
Contextual
A Connected Data Strategy Solves for All Data
DATA IN MOTION: Hortonworks Data Flow (HDF)
DATA AT REST: Hortonworks Data Platform (HDP)
A Connected Data Strategy Solves for All Data
Capture streaming data
Deliver perishable insights
Combine new & old data
Store data forever
Access a multi-tenant data lake
Model with artificial intelligence
DATA IN MOTION: Hortonworks Data Flow (HDF), Perishable Insights
DATA AT REST: Hortonworks Data Platform (HDP), Historical Insights
ACTIONABLE INTELLIGENCE
Applications
Real-Time Cyber Security protects systems with superior threat detection
Smart Manufacturing dramatically improves yields by managing more variables in greater detail
Connected, Autonomous Cars drive themselves and improve road safety
Future Farming optimizes soil, seeds and equipment to measured conditions on each square foot
Automatic Recommendation Engines match products to preferences in milliseconds
DATA IN MOTION: Hortonworks Data Flow (HDF)
DATA AT REST: Hortonworks Data Platform (HDP)
ACTIONABLE INTELLIGENCE
Actionable Intelligence Drives the New Automotive Industry
ERP Data
Warranty Data
Geo Tracking
Infotainment Metadata
SCADA Systems
Social Media Streams
ERP Systems
Defect Testing Data
Machine Data
Data Historians
Product Design Docs
Service Records
PREVENTATIVE MAINTENANCE
SUPPLY CHAIN OPTIMIZATION
MANUFACTURING YIELDS MAXIMIZATION
QUALITY CONTROL
NEW PRODUCT PLANNING
[Use-case matrix]
INNOVATE / RENOVATE
EXPLORE, OPTIMIZE, TRANSFORM
ACTIVE ARCHIVE, ETL ONBOARD, DATA ENRICHMENT, DATA DISCOVERY, SINGLE VIEW, PREDICTIVE ANALYTICS
Use cases: Payment Tracking, Due Diligence, Social Mapping, Product Design, M & A, Call Analysis, Machine Data, Defect Detecting, Factory Yields, Customer Support, Basket Analysis, Segments, Customer Retention, Sentiment Analysis, Optimize Inventories, Supply Chain, Cross-Sell, Vendor Scorecards, Ad Placement, Cyber Security, Disaster Mitigation, Investment Planning, Risk Modeling, Proactive Repair, Inventory Predictions, Next Product Recs, OPEX Reduction, Historical Records, Mainframe Offloads, Device Data Ingest, Rapid Reporting, Digital Protection, Data as a Service, Fraud Prevention, Public Data Capture
Prescient Traveler Is a Modern Data Application
“We know that when we define a high-threat area in a given area of the world, that it is underpinned by very specific data sources. It’s data-driven, and we can point to those sources—if ever asked—and say, ‘Here’s why.’”
Mike Bishop, Chief Systems Architect, Prescient
Hortonworks Customer Story – Prescient Traveler
Prescient Harnesses Machine Learning for Traveler Safety Warnings
SITUATION
Performs risk management
Could only produce one assessment every 3-4 days
Uses humans to identify false positives
Needed efficient way to store raw data for analytics
49,500 Data Sources ingested by HDF into HDP
700% Productivity Improvement for geospatial analysts
5 Petabytes of Data stored in HDP connected EMC
Hybrid Architecture: HDF connects data center to cloud
ETL OFFLOAD: Sensor Data Ingest
DATA DISCOVERY: Threat Assessments
SINGLE VIEW: Global Threat Map
PREDICTIVE ANALYTICS: Threat-Proximity Mobile Alerts
ACTIVE ARCHIVE: Streaming Threat Archive
DATA ENRICHMENT: Provenance Metadata
Improving Service at the UK’s Royal Mail, a Centuries-Old Business
SITUATION
Wanted to redefine data for business decisions
Supports more than 29 million addresses
Spent 90% of time moving data to/from warehouse
PREDICTIVE ANALYTICS: Investment Planning
SINGLE VIEW: Parcel Distribution
SINGLE VIEW: Customer Acquisition
PREDICTIVE ANALYTICS: New Data Products
SINGLE VIEW: Customer Support
ACTIVE ARCHIVE: OPEX Savings
ACTIVE ARCHIVE: EDW Offload
ETL OFFLOAD: Rapid Reporting
DATA ENRICHMENT: Public Data Capture
DATA ENRICHMENT: Data-as-a-Service
Time moving EDW data from 90 to 10%, freeing valuable analytic capacity
Customer churn reduced by gathering edge data with HDF
Analytic velocity improved, delivering insight in days, not months
Governance & compliance simplified and centralized
“We’re accelerating that whole process, we’re not having to spin up projects just to get data. We are able to accomplish a huge amount of work with single individuals. We see Hortonworks as our advanced analytics platform.”
Thomas Lee-Warren, Director of the Technology Data Group
Symantec: Data Science Speeds Time to Cyber Security Protection
SITUATION
3-4 hr processing latencies to analyze digital threats
Network has +57M attack sensors in 157 countries
Data streams from 75M users on 120M devices
Long open windows of exposure to cyber attacks
From 4 Hours to 2 Seconds: threat detection latency
5000x improved time-to-protection
10s of PBs of historical data for machine learning
Cloud Flexibility to meet peak demand for analysis
“On any given day, we’ll be processing 40 billion messages into our system…It used to be that queues would back up. We would see times to analysis on the order of 4 hours. On average, we’ve gotten that down to two or two and a half seconds.”
David Lin, Senior Director of Engineering, Symantec Cloud Platform
Progressive Rewards Safe Drivers and Improves Traffic Safety
SITUATION
Usage-Based “Snapshot” Insurance Program
In-Car Sensor Captures IoT Data
Existing Data Systems Did Not Scale Efficiently
~7 Days to Transform Only 25% of UBI Data
100% in 2-3 Days: driving detail captured from Snapshot, in HDF
+12 Billion miles driven stored
Web App-Enabled: customers see driving detail and improve safety
$2.6 Billion in 2014 Premiums
ETL OFFLOAD: Sensor Data Ingest
DATA DISCOVERY: Web Log Analysis
ACTIVE ARCHIVE: Individual Driving Histories
DATA DISCOVERY: Online Ad Placement
DATA DISCOVERY: Claim Notes Mining
PREDICTIVE ANALYTICS: Usage-Based Insurance (UBI)
“We’re looking at datasets that we never dreamed we could look at…It’s joining dots that in the past we didn’t even know we could join.”
-- Pawan Divakarla, Data & Analytics Business Leader
Cyber Security Solution
[Architecture diagram]
Telemetry Data Sources:
Security Endpoint Devices (FireEye, Palo Alto, BlueCoat, etc.)
Machine Generated Logs (AD, App / Web Server, firewall, VPN, etc.)
Network Data (PCAP, Netflow, Bro, etc.)
IDS (Suricata, Snort, etc.)
Threat Intelligence Feeds (Soltra, OpenTAXII, third-party feeds)
Telemetry Data Collectors: Performance Network Ingest Probes, Other
Real-time Enrich / Threat Intel Streams
Telemetry Ingest Buffer
Real-time Processing Cyber Security Engine
Cyber Security Stream Processing Pipeline: Telemetry Parsers, Enrichment, Threat Intel, Alert Triage, Indexers and Writers
Data Services and Integration Layer
Modules: Search and Dashboarding Portal, Security Data Vault, Community Analytical Models, Provisioning, Management and Monitoring
Powering the Future of Data with IBM & Hortonworks
Demo & Hands on: Talk to us outside the auditorium
Sean Roberts, Partner Engineering, Hortonworks, EMEA (@seano)
What Does This Reference Application Demonstrate?
Stream Processing (HDF - Stream Processing)
Aggregations over Windows via Tumbling and Sliding (HDF - Stream Processing)
Joining and Forking Streams (HDF - Stream Processing)
Pattern Matching (HDF - Stream Processing)
Collecting data from the Edge - First Mile Problem (HDF - Flow Management)
Data Acquisition, parsing, enrichment and intelligent routing (HDF - Flow Management)
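The two window types named above can be illustrated with a minimal plain-Python sketch. This is not the SAM or HDF API; the function names, the window sizes, and the sample (timestamp, value) sensor readings are all hypothetical and chosen only for illustration. A tumbling window assigns each event to exactly one fixed, non-overlapping interval, while a sliding window lets intervals overlap, so one event can contribute to several aggregates.

```python
from collections import defaultdict


def tumbling_window_avg(events, size):
    """Average value per non-overlapping window of `size` seconds."""
    buckets = defaultdict(list)
    for ts, value in events:
        # Each event belongs to exactly one window, keyed by its start time.
        start = (ts // size) * size
        buckets[start].append(value)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}


def sliding_window_avg(events, size, slide):
    """Average value per window of `size` seconds starting every `slide` seconds."""
    buckets = defaultdict(list)
    for ts, value in events:
        # An event falls into every window whose start lies in (ts - size, ts].
        start = (ts // slide) * slide
        while start > ts - size and start >= 0:
            buckets[start].append(value)
            start -= slide
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}


# Hypothetical (timestamp_seconds, reading) pairs from a single sensor.
events = [(1, 60), (4, 70), (11, 80), (14, 100), (21, 50)]

tumbling = tumbling_window_avg(events, size=10)
sliding = sliding_window_avg(events, size=10, slide=5)

print(tumbling)  # {0: 65.0, 10: 90.0, 20: 50.0}
print(sliding)   # {0: 65.0, 5: 90.0, 10: 90.0, 15: 50.0, 20: 50.0}
```

With a 10-second window sliding every 5 seconds, the readings at 11 s and 14 s are counted in both the window starting at 5 s and the one starting at 10 s, which is the overlap that distinguishes sliding from tumbling aggregation. A production stream processor would also handle late and out-of-order events, which this batch-style sketch ignores.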
Flow Management with NiFi & MiNiFi
Stream Processing with SAM (Streaming Analytics Manager)