"integration of hadoop in business landscape", michal alexa, it and innovation cloud...
# 1
Integration of Hadoop in Business Landscape
Michal Alexa
Service Line Manager, Data Innovation Lab
December 2016
# 2
What happens on the Internet in 60 seconds (2014):
• 3,472 images pinned
• 72 hours of new video content uploaded
• 204,000,000 emails sent
• 4,000,000 search queries
• 277,000 tweets
• 347,222 photos sent
• Users swipe 416,667 times
• 2,460,000 new items of content shared
• 216,000 photos shared
• $83,000 in online sales
• 48,000 apps downloaded from the iTunes Store
• 26,380 new reviews
# 5
Big-Data and the Business World
Big-Data
• Java, Python, Pig Latin
• Massive clusters for big data processing
• Structured & unstructured data
• Apache & open source
• Distributions (e.g. Cloudera)
• Engines (Spark, Impala)
• Fast-paced evolution since 2006
Business
• ABAP
• Client/server
• Classic RDBMS as the database
• Proprietary software with interfaces
• Engines: OLTP, OLAP
• World positioning: 76% of financial transactions, 78% of food production, 82% of medical devices
• Steady evolution since 1972
# 6
Story…
# 8
Biggest struggles in Data Management
Scalability
• It is no longer possible to size the platform for its whole lifetime during procurement
• Hardware requirements limit possible growth
• Scaling up often comes at great cost, and scaling down usually brings little value
Data-Pipelines
• Data transformations are I/O-intensive operations
• They take a lot of time and consume a lot of resources
Granularity and Velocity
• Limitations on data formats
• Limitations on data granularity: often only aggregated and cleaned data are stored
• Raw data are necessary for data science activities
Data-Silos
• Too many places where data are stored
• No interconnection between company units limits data analysis possibilities
Extensibility
• Data analysis requires many programming languages
• Limited application compatibility
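The granularity point can be made concrete with a minimal Python sketch. It uses made-up sales events, and the field layout and values are hypothetical. Once only daily totals are kept, a new question such as revenue per hour of day can no longer be answered. The raw events still support any grouping.

```python
from collections import defaultdict

# Hypothetical raw sales events: (day, hour of day, amount).
RAW_EVENTS = [
    ("2016-12-01", 9, 120.0),
    ("2016-12-01", 9, 80.0),
    ("2016-12-01", 14, 50.0),
    ("2016-12-02", 10, 200.0),
    ("2016-12-02", 16, 40.0),
]


def aggregate(events, key):
    """Sum the amount of each event, grouped by key(event)."""
    totals = defaultdict(float)
    for event in events:
        totals[key(event)] += event[2]
    return dict(totals)


# What a classic pipeline might store: daily totals only.
daily = aggregate(RAW_EVENTS, key=lambda e: e[0])

# A new question (revenue per hour of day) is answerable from the raw events...
hourly = aggregate(RAW_EVENTS, key=lambda e: e[1])

# ...but not from `daily`, because the hour was discarded during aggregation.
```

Both groupings add up to the same grand total. Only the raw data keeps enough detail to regroup it in a way nobody anticipated when the pipeline was built.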
# 9
What is Apache Hadoop?
A software framework for storing, processing and analyzing “big data”
• Scalable
• Distributed
• Fault-tolerant
• Open source
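Hadoop's original processing model is MapReduce. Mappers emit key/value pairs, the framework sorts and groups them by key, and reducers combine each group. The sketch below runs all three phases locally in a single Python process, purely to illustrate the model.

On a real cluster the phases run in parallel on many nodes over data stored in HDFS. With Hadoop Streaming, the mapper and reducer would be separate scripts that read stdin and write stdout. The function names here are illustrative, not Hadoop APIs.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word of every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle/sort: order pairs by key so each word's pairs are adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Reducer: sum the counts of each word."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Chain the three phases locally; a cluster would distribute them."""
    return dict(reduce_phase(shuffle_phase(map_phase(lines))))


if __name__ == "__main__":
    print(word_count(["big data on Hadoop", "Hadoop stores big data"]))
```

Because each mapper only needs its own slice of the input and each reducer only needs one key's group, the same logic scales out by adding nodes. That is the scalability and fault-tolerance argument of the slide.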
“Data-Lake” in Business Infrastructure
[Diagram labels: Data-Lake, BW, Source systems, logs, BW]
# 13
Emerging new technologies – Integration answers to Big-Data
Smart Data Access (SDA)
• Data federation feature available on SAP HANA
• Not fully read-write
• Sources: Sybase ASE, Sybase IQ, Teradata, Hadoop and some other databases
Dynamic Tiering (DT)
• Supports only write-optimized DSOs and PSA
• Some restrictions
• Sybase IQ only
• Limited disaster recovery
• Read & write, but only on HANA
Nearline Storage (NLS)
• Moves data from the online database to a “nearline” database
• Read only
• Uses DAPs (Data Archiving Processes)
• Wrong assumption of Sybase IQ as the “one and only” storage
SAP HANA VORA
• DB interface between HANA and Hadoop (Spark)
• Heavily Java-based: no ABAP Workbench integration, etc.
• No UI: engine only
• Allows for reporting within Hadoop based on Spark
Data Lifecycle Manager (DLM)
• HANA native only, no ERP
• Offloading to IQ or Spark
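Smart Data Access exposes tables of remote sources as virtual tables in SAP HANA. One SQL statement can then combine local and remote data without copying it first.

As a rough local analogy only, SQLite's ATTACH DATABASE lets a single connection join tables held in two separate databases. This is not SDA, HANA or Hadoop. The `lake` schema name and the sample tables and rows are hypothetical.

```python
import sqlite3

# The main database stands in for the business system.
conn = sqlite3.connect(":memory:")
# A second, separate database stands in for the hypothetical "lake" source.
conn.execute("ATTACH DATABASE ':memory:' AS lake")

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE lake.clicks (customer_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)", [(1, "ACME"), (2, "Globex")]
)
conn.executemany(
    "INSERT INTO lake.clicks VALUES (?, ?)",
    [(1, "/pricing"), (1, "/docs"), (2, "/pricing")],
)

# One query spans both databases; no data is copied between them beforehand.
rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS clicks
    FROM main.customers AS c
    JOIN lake.clicks AS k ON k.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```

In a real federation setup, the engine also decides which filters and aggregations to push down to the remote source. The analogy leaves that part out.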
# 14
Emerging new technologies – Integration answers to Big Data
Smart Data Access (SDA)
• Data federation feature available on SAP HANA
• Not fully read-write
• Sources: Sybase ASE, Sybase IQ, Teradata, Hadoop and some other databases
Dynamic Tiering (DT)
• Supports only write-optimized DSOs and PSA
• Some restrictions
• Sybase IQ only
• Limited disaster recovery
• Read & write, but only on HANA
Nearline Storage (NLS)
• Moves data from the online database to a “nearline” database
• Read only
• Uses DAPs (Data Archiving Processes)
• SAP positions Sybase IQ as the “one and only” storage
SAP HANA VORA
• DB interface between HANA and Hadoop (Spark)
• Heavily Java-based: no ABAP Workbench integration, etc.
• No UI: engine only
• Allows for reporting within Hadoop based on Spark
Data Lifecycle Manager (DLM)
• HANA native only, no ERP
• Offloading to IQ or Spark
Offloading | Integration
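Nearline storage and DLM both offload aged data from the online database to cheaper storage. There the data becomes read-only, while reporting can still read it.

Below is a minimal in-memory sketch of that idea, assuming a simple posting-date cutoff as the offloading rule. The `TieredStore` class and the `Record` fields are hypothetical and do not reflect SAP's NLS, DAP or DLM interfaces.

```python
from dataclasses import dataclass, replace
from datetime import date
from types import MappingProxyType


@dataclass(frozen=True)
class Record:
    """One business document; the fields are hypothetical."""
    doc_id: str
    posting_date: date
    amount: float


class TieredStore:
    """Hypothetical two-tier store: writable online data plus read-only nearline data."""

    def __init__(self, records):
        self._online = {r.doc_id: r for r in records}
        self._nearline = MappingProxyType({})

    def offload(self, cutoff):
        """Move records posted before `cutoff` to the nearline tier; return how many moved."""
        aged = {k: r for k, r in self._online.items() if r.posting_date < cutoff}
        for key in aged:
            del self._online[key]
        # Expose the nearline tier only through a read-only mapping view.
        self._nearline = MappingProxyType({**self._nearline, **aged})
        return len(aged)

    def update(self, doc_id, amount):
        """Change an online record; nearline records are read-only."""
        if doc_id in self._nearline:
            raise PermissionError(f"{doc_id} is in the read-only nearline tier")
        self._online[doc_id] = replace(self._online[doc_id], amount=amount)

    def total_amount(self):
        """Reporting reads both tiers, so offloading does not change query results."""
        online = sum(r.amount for r in self._online.values())
        nearline = sum(r.amount for r in self._nearline.values())
        return online + nearline
```

For example, offloading everything posted before 2016 moves an old document to the nearline tier. After that, writes to it are refused, but reporting totals stay the same.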
# 15
Business <> Hadoop struggle
Integrating Hadoop into business landscapes is difficult for several reasons:
• Technology readiness
• IT culture
• Data integration
• Operations
IT culture gap
• Development strategy
• Software logistics
• Rapid prototyping
• Data protection / personal data
• SOX compliance
Data integration gap
• ETL
• Loading of data
• Staging & enriching of data within Hadoop
• Data flows from SAP to Hadoop and back
Operational gap
• Running applications 24x7 between SAP and Hadoop
• Job scheduling
• Testing
• Patching & upgrades
We should aim to close these gaps.
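The data integration and operational gaps meet in job scheduling. A flow from SAP to Hadoop and back is a set of jobs with dependencies that must run in a valid order.

Below is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The job names and dependencies are hypothetical and stand in for whatever scheduler a real landscape uses.

```python
from graphlib import TopologicalSorter

# Hypothetical nightly flow from SAP to Hadoop and back.
# Each job maps to the set of jobs that must finish before it can start.
DATA_FLOW = {
    "extract_sap_orders": set(),
    "extract_web_logs": set(),
    "load_to_hadoop_raw": {"extract_sap_orders", "extract_web_logs"},
    "stage_and_enrich": {"load_to_hadoop_raw"},
    "aggregate_kpis": {"stage_and_enrich"},
    "load_kpis_back_to_sap_bw": {"aggregate_kpis"},
}


def run_order(flow):
    """Return one valid sequential order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(flow).static_order())


def parallel_waves(flow):
    """Group jobs into waves; jobs within a wave have no pending predecessors."""
    sorter = TopologicalSorter(flow)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real scheduler would mark jobs done as they finish; here all succeed at once.
        sorter.done(*ready)
    return waves
```

In this flow, the two extract jobs form the first wave and can run concurrently. Every later step waits for its predecessors, so the load back into SAP BW always runs last.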
# 16
Summary
• Hadoop is awesome! Let's make it truly available to all businesses.
• Start small: a small amount of data and fast turnover.
• Think about how to make the new technology accessible to others.
Details, technical slides and further knowledge can be shared during networking.