Systems of Intelligence - Wikibon/theCUBE
TRANSCRIPT
Systems of Intelligence: The Next 10-15 Years of Enterprise Applications
George Gilbert, Big Data Analyst
AGENDA: Enterprises Must Manage the Journey From Systems of Record (SoR) to Systems of Intelligence (SoI) By Balancing Skills And Tech Maturity
• SoI build on SoR but are the biggest change in enterprise apps in five decades
• Enterprises need a deep focus on sourcing, preparing, analyzing, and modeling data: SoI pivot on data quality and fail otherwise
• The speed of integrating increasingly sophisticated analytics with operational apps is ever more critical, but doesn't *necessarily* require streaming-only analytics
• SoI require a new stack: enterprises must choose their platform by balancing the need for optimized functionality vs. the need for simplicity
Systems of Record Automate Business Processes: Five Decades From Airline Reservations to ERP and Data Warehouses
Improve business process efficiency
• On time-shared mainframes, GUI client-server, or in the cloud via SaaS: SoR automate business processes
• Standardized processes and business transactions enable performance reporting and business intelligence
• Limitations of historical performance reporting: like steering a ship while looking backwards at its wake
Systems of Intelligence Build on Systems of Record
[Diagram: Consumer and touchpoints - Retail Sales Associate, Mobile, Retail, Call Center, TV Ads, E-Mail, Social Media, eCommerce]
Modern Systems of Intelligence optimize loyalty by anticipating the consumer "conversation"
• Omni channel: comprehensive, real-time integration of all touch points and channels via common data
• Intelligence: predictive and real-time, to influence consumer interaction
• Loyalty and profitability are higher
Build on SoR: SoR still run core processes, with varying degrees of "real-time" integration to SoI
• Modern SoR: are cloud, mobile, and social; most critically, they support fast data integration with other apps
• Data from SoR can be approximate: it can be modestly stale if apps can't support real-time queries from consumer-facing apps, and SoI can then run in the cloud more easily
• Omni channel: still needs access to pricing info, inventory, and the billing process
• Intelligence: still needs master customer data and transaction history
[Diagram: 2 New Elements Based Solely on Forward-Looking Analytics & Data - Machine Learning, Predictive Model; underlying Data Platform]
Systems of Intelligence Will Cross Functions And Industries
Transformation from SoR to SoI: SoI will progressively remake existing application categories via the use of machine learning
HR Talent Management example
• SoR: track recruit-to-retire processes, including source, attract, develop, motivate, retain…
• SoI: for retention, predict who is at risk, for proactive intervention
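The retention bullet can be made concrete with a minimal sketch. It trains a logistic regression by plain batch gradient descent and scores attrition risk. The two features (tenure and engagement, both scaled to 0..1), the toy labels, and the 0.5 cutoff are hypothetical assumptions chosen for illustration. This is not any HR vendor's model.

```python
import math

# Hypothetical training data: (tenure, engagement), both scaled to 0..1 -> 1 = left, 0 = stayed
TRAIN = [
    ((0.10, 0.20), 1), ((0.20, 0.10), 1), ((0.15, 0.30), 1), ((0.30, 0.20), 1),
    ((0.80, 0.90), 0), ((0.90, 0.70), 0), ((0.70, 0.80), 0), ((0.85, 0.95), 0),
]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(data[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def attrition_risk(w, b, x) -> float:
    """Probability that an employee with features x leaves."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


w, b = train_logistic(TRAIN)
for name, x in [("new, disengaged", (0.10, 0.15)), ("tenured, engaged", (0.90, 0.85))]:
    print(f"{name}: risk {attrition_risk(w, b, x):.2f}")
```

An SoI would feed the high-risk scores into the talent-management SoR as a proactive-intervention task. It would not just report them.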
Systems of Intelligence Are a Prototype for IoT: Example of Systems Management Becoming Autonomic Systems Management
Dimension | Traditional Management | Autonomic Service Management
Objects | Servers, storage, networks, databases, web servers | Physical infrastructure and services are like IoT "devices"
Analytics | Real-time dashboard | Predictive model of behavior from real-time streaming data
Alerts | Pre-set performance thresholds | Anomalous behavior
Action | Send alerts to administrators | Suggest or auto-remediate behavior
• Real-time dashboard = backward looking
• Analytics = "lights out" via real-time, predictive + prescriptive: auto pilot
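The Alerts row contrasts pre-set thresholds with anomalous behavior. The minimal sketch below shows the difference. The anomaly detector learns "normal" from a sliding window (a rolling mean and standard deviation, a simple stand-in for a predictive model) and flags large deviations. The latency series, window size, and k = 3 cutoff are hypothetical.

```python
import statistics
from collections import deque

# Hypothetical response-time samples (ms) for one service; index 12 is a sudden regression
LATENCY_MS = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 100, 101, 160, 99, 100]


def threshold_alerts(samples, limit):
    """Traditional management: alert only when a metric crosses a pre-set limit."""
    return [i for i, x in enumerate(samples) if x > limit]


def anomaly_alerts(samples, window=10, k=3.0):
    """Learn "normal" from a sliding window and flag points more than k std devs away."""
    history = deque(maxlen=window)
    flagged = []
    for i, x in enumerate(samples):
        if len(history) == window:
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            if std > 0 and abs(x - mean) > k * std:
                flagged.append(i)
                continue  # keep anomalies out of the learned baseline
        history.append(x)
    return flagged


print(threshold_alerts(LATENCY_MS, limit=200))  # a limit sized for peak load misses the jump
print(anomaly_alerts(LATENCY_MS))
```

An autonomic system would go one step further than flagging. It would attach a suggested or automatic remediation to each flagged index.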
The Journey To Systems Of Intelligence: Determined By a Combination of Enterprise Capabilities and Tech Maturity
[Chart: Applications plotted by Technology Maturity, Enterprise Capabilities over Time]
• Adjunct Data Warehouse: Data Lake with some production analytics offloaded from the Data Warehouse
• Customer 360: Enough internal and external customer data in a pipeline to start predictive modeling
• Real-time loyalty, omni-channel, multi-touchpoint: Predictive model learns from and anticipates the consumer in near real-time
• Smart Grid: Continuously updated prediction of energy supply and demand tunes end-point consumption
• Autonomic systems management: System learns the "normal" behavior of apps and infrastructure and flags or fixes anomalies
Collecting *Usable* Data About Customer Interactions Requires New Sourcing, Prep’ing, Analytic Techniques
SoR: Traditional Data Warehouse Challenge
• Time-to-analysis is bottlenecked by the need to decide the questions before designing and building the DW
• The design of the DW limits the available data, and then the ETL development cycle severely limits the ability to ask new questions
SoI Analytics: Data Lake = Training Wheels
• Time-to-analysis becomes short enough to be iterative, because the lake provides self-service access to all data before the analytic pipeline is built
• Analysis is open to interoperation with any data processing engine that writes to HDFS
• New production pipelines can stay on the production Hadoop cluster or go back to the DW
[Diagram labels: ETL + Database Design (Bottleneck), Available Data, Mostly Hardwired Questions; HDFS, Self-service iterative and incremental database design, Data provisioning, New Questions]
Journey to SoI Requires Skills, Technology to Start Iteratively Prep'ing Data and Building Predictive Models
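The data lake point is that raw data lands first and structure is applied only when someone asks a question ("schema-on-read"). The minimal sketch below shows the idea. An in-memory list of JSON lines stands in for raw files in HDFS, and a new question (revenue per channel) is answered without redesigning a schema or writing ETL. The event fields and values are hypothetical.

```python
import json
from collections import Counter

# Raw events land in the lake as-is; fields vary by source and no schema was designed up front
RAW_EVENTS = [
    '{"user": "u1", "type": "view", "sku": "A100", "channel": "mobile"}',
    '{"user": "u2", "type": "purchase", "sku": "A100", "amount": 25.0}',
    '{"user": "u1", "type": "purchase", "sku": "B200", "amount": 40.0, "channel": "web"}',
    '{"user": "u3", "type": "view", "sku": "B200"}',
]


def read_with_schema(raw_lines, fields):
    """Schema-on-read: project only the fields a question needs, defaulting missing ones to None."""
    for line in raw_lines:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}


# A new question, asked after the data landed: revenue per channel
revenue = Counter()
for row in read_with_schema(RAW_EVENTS, ["type", "amount", "channel"]):
    if row["type"] == "purchase":
        revenue[row["channel"] or "unknown"] += row["amount"]
print(dict(revenue))
```

If the question proves valuable, the same projection can be hardened into a production pipeline on the Hadoop cluster or pushed back into the DW.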
Systems of Intelligence Always Need More Sources of Customer Data – Including Externally Syndicated
Source: Oracle BlueKai
The internal customer master is no longer the last word about the customer
Raw data from one source: logs
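Even a single raw source like logs needs preparation before it is usable for analytics. The minimal sketch below parses access-log lines into structured records and sets unparseable lines aside for inspection. The log layout is an assumption, loosely modeled on a common web server access-log format, and the sample lines are made up.

```python
import re

# Assumed layout: ip, identity, user, [timestamp], "METHOD path protocol", status, bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_log(lines):
    """Turn raw log lines into structured records; keep rejects for a data scientist to inspect."""
    parsed, rejects = [], []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            rejects.append(line)
            continue
        rec = m.groupdict()
        rec["status"] = int(rec["status"])
        rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
        parsed.append(rec)
    return parsed, rejects


SAMPLE = [
    '203.0.113.7 - - [12/Oct/2015:10:01:02 +0000] "GET /cart HTTP/1.1" 200 512',
    '203.0.113.9 - - [12/Oct/2015:10:01:05 +0000] "POST /checkout HTTP/1.1" 500 -',
    'garbled line from a misconfigured agent',
]
records, bad = parse_log(SAMPLE)
print(records)
print(bad)
```

Multiplied across hundreds of sources with inconsistent formats, hand-written rules like this become the bottleneck that the next slide addresses.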
Preparing Hundreds of Raw Data Sources for Analytics Often Requires Techniques as Advanced as Machine Learning on the Data Sources Themselves
Prep'ing hundreds of sources requires SoI technology, such as machine learning, to inform data scientists' decisions
Source: Tamr
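The sketch below illustrates "machine-assisted, human-decided" data preparation for one step: mapping source columns onto a target schema. It is minimal and rests on clear assumptions. Python's `difflib` string similarity is a deliberately simple stand-in for the machine-learning models the slide refers to, and this is not Tamr's method. High-similarity matches are accepted automatically, and the rest are routed to a data scientist for review. Column names and the 0.8 cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    return name.lower().replace("_", "").replace("-", "").replace(" ", "")


def suggest_mappings(source_cols, target_cols, accept=0.8):
    """Propose a target column for each source column; low scores go to a human for review."""
    suggestions = {}
    for src in source_cols:
        best, score = max(
            ((tgt, SequenceMatcher(None, normalize(src), normalize(tgt)).ratio()) for tgt in target_cols),
            key=lambda pair: pair[1],
        )
        status = "auto" if score >= accept else "review"
        suggestions[src] = (best, round(score, 2), status)
    return suggestions


TARGET = ["customer_id", "email_address", "postal_code"]
SOURCE = ["CustomerID", "E-mail Address", "zip"]
for col, (tgt, score, status) in suggest_mappings(SOURCE, TARGET).items():
    print(f"{col!r} -> {tgt!r} ({score}, {status})")
```

A "zip" column scores low against every target and is flagged for review. That is exactly where human judgment, or a trained model with more context, is needed.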
Real-Time is a Matter of Degree: Choices Depend on Usage Scenario, Accessibility of Applications That Need to be Integrated – Including Legacy and Modern Systems of Record
Range of "Real-Time" Interactions
• True real-time: high-frequency algorithmic securities trading at one end of the spectrum
• Updates every couple of hours: inventory levels accessed by eCommerce and mobile apps at the other end of the spectrum
Modern SoR make it easier to get to the fastest part of the spectrum
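Because "real-time" is a matter of degree, each consumer of SoR data can declare how stale a value it will tolerate. The minimal sketch below shows one way to do that. A cache serves a local copy while it is within the caller's staleness budget and goes back to the system of record otherwise. The `fetch` callable stands in for a hypothetical SoR query, and the injectable clock exists only to make the example deterministic.

```python
import time
from typing import Callable, Dict, Tuple


class StalenessBoundedCache:
    """Serve SoR values from a local copy unless they are older than the caller's tolerance."""

    def __init__(self, fetch: Callable[[str], int], clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch  # hypothetical call into the system of record
        self._clock = clock
        self._entries: Dict[str, Tuple[int, float]] = {}  # key -> (value, time fetched)
        self.sor_reads = 0

    def get(self, key: str, max_staleness_s: float) -> int:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[1] <= max_staleness_s:
            return cached[0]
        value = self._fetch(key)
        self.sor_reads += 1
        self._entries[key] = (value, now)
        return value


inventory = {"sku-42": 17}  # stand-in for the SoR's inventory table
fake_now = [0.0]
cache = StalenessBoundedCache(lambda sku: inventory[sku], clock=lambda: fake_now[0])
cache.get("sku-42", max_staleness_s=2 * 3600)               # first read goes to the SoR
inventory["sku-42"] = 16
fake_now[0] = 1800.0                                        # 30 minutes later
stale_ok = cache.get("sku-42", max_staleness_s=2 * 3600)    # eCommerce tolerates hours
fresh = cache.get("sku-42", max_staleness_s=0.0)            # a trading-style caller tolerates none
print(stale_ok, fresh, cache.sor_reads)
```

This is why modestly stale SoR data can let an SoI run in the cloud. Only the callers at the fast end of the spectrum pay for round trips to the SoR.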
*Legacy* Systems of Record Need Completely New Analytic Data Pipelines Built for Speed
[Diagram labels: Operational Applications (Call Detail Records, Billing, CRM); Network Operations-Facing Data; Customer-Facing Data; Batch ETL; Data Warehouse. Key: Scale-Up RDBMS (Oracle, IBM, Microsoft)]
Legacy SoR Analytics: Historical reporting
Legacy SoR Analytic Data Pipeline Limitations
• Batch ETL: too slow to build closed-loop analytics
• Database scale + cost: limit the amount and use of data
*Modern* Systems of Record: Addition of Streaming Data More Easily Supports Real-Time Data for Predictive Models of Systems of Intelligence
[Diagram labels: Consumer (Mobile, Retail, eCommerce); Call Detail Records; ERP, CRM; Batch ETL of Customer- AND Network-Facing Data. Key: Modern SoR Built On Scale-Out Data Platform]
• Fast Data: Machine learning on the MOST RECENT call data for anticipating and influencing customer interaction
• Big Data: Machine learning on HISTORICAL data provides context for building customer profiles and a model of network utilization over time
• Real-Time Interactions: Loyalty offers based on historical and most recent data; connection prioritization
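The sketch below shows a real-time loyalty decision that uses both sides of the modern pipeline. A profile built offline from historical data supplies each customer's baseline, and the most recent window of call events is compared against it. It is minimal, and its assumptions are hypothetical: the profile fields, the 3x spike factor, and the lifetime-value cutoff. The rule is a simple stand-in for a trained predictive model.

```python
from dataclasses import dataclass


@dataclass
class CustomerProfile:
    """Built offline by batch jobs over historical call data (the Big Data side)."""
    customer_id: str
    avg_dropped_calls_per_day: float
    lifetime_value: float


def loyalty_offer(profile: CustomerProfile, dropped_calls_last_24h: int,
                  spike_factor: float = 3.0, min_value: float = 500.0) -> bool:
    """Fast Data side: compare the most recent window with the customer's own history."""
    # Floor the baseline so one dropped call doesn't look like a spike for a spotless history
    baseline = max(profile.avg_dropped_calls_per_day, 0.5)
    is_spike = dropped_calls_last_24h >= spike_factor * baseline
    return is_spike and profile.lifetime_value >= min_value


print(loyalty_offer(CustomerProfile("c1", 1.0, 1200.0), dropped_calls_last_24h=4))
print(loyalty_offer(CustomerProfile("c2", 2.0, 1200.0), dropped_calls_last_24h=4))
```

The same four dropped calls trigger an offer for one customer and not the other. Without the historical baseline, the recent data alone could not tell them apart.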
Streaming Is the Newest Religious War: Use It For *All* Analytic Workloads? Processing Lots of Data vs. Analyzing Each Event = Inherent Conflict
[Chart: Batch Processing vs. Streaming Data; data volume (GB, TB, PB) against streaming velocity (min, sec, ms, µs)]
• Big Data: maximum throughput of data; exploratory analysis of historical data
• Fast Data: fastest speed to make a decision on each event
"Streams can do it all" school: Big Data apps are just Fast Data apps scaled out
• If it can handle fast data, just scale it out to handle big data
• Big win: only one application needed
Wikibon recommendation (elaborated on next page): streaming and batch *will always* coexist
• Even batch programs on a streaming platform will still have different application logic…
• High-volume machine learning vs. incremental update
• Historical performance analysis vs. looking up a profile
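The point that batch and per-event programs have different application logic shows up even for a simple statistic. The minimal sketch below computes the mean and variance of some hypothetical transaction amounts in two ways. The batch version scans all history at once. The per-event version uses Welford's online algorithm, which updates a running summary as each event arrives. The answers match, but the code does not.

```python
import math
import statistics


def batch_stats(history):
    """Batch logic: scan all historical data at once."""
    return statistics.fmean(history), statistics.pvariance(history)


class IncrementalStats:
    """Per-event logic (Welford's algorithm): update the summary as each event arrives."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / self.n if self.n else 0.0  # population variance


amounts = [12.0, 7.5, 30.0, 18.25, 9.0]  # hypothetical transaction amounts
inc = IncrementalStats()
for a in amounts:
    inc.update(a)
mean_b, var_b = batch_stats(amounts)
print(math.isclose(inc.mean, mean_b), math.isclose(inc.variance, var_b))
```

For machine learning the gap widens. A full retrain over history and an incremental model update are different programs even on the same engine.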
Even When Streaming Engines Support More Sophisticated Analytic Workloads, the Applications Are Likely to Differ Between Event-at-a-Time vs. Batch
[Chart: Latency (higher is slower) against Analytic Sophistication; Batch-oriented vs. Per Event-oriented]
• Basic Streaming: What happened - counting
• SQL: What happened - exploration, OLAP, or dashboard
• Machine Learning: Anticipate or act automatically - prediction or prescription
• Batch-oriented: historical analysis; explore large, new data
• Per event-oriented: profile lookup; incremental model update
Stream processors: Spark, Flink, InfoStreams, Samza, DataTorrent, (DB): VoltDB / MemSQL
IMPLICATION: Converging on one application engine is not critical
Key Takeaway: Coexistence of Batch and Streaming Means One Application Engine Doesn't Have to Rule All - Spark and Hadoop Can Live Together
[Diagram: Hadoop stack - Streaming (Storm, Flink, Samza, DataTorrent); SQL (Impala, Drill, Hive, HAWQ…); Machine Learning (Mahout…); YARN cluster resource management; HDFS or operational database]
Pro: Mix-and-match pipeline composed of specialized processing *optimized* for each workload
Con: Batch-only - hand-off between processing engines via storage is slow. Each processing engine is standalone and can't leverage the others' functionality
Pro: Fast and simple - a pipeline composed of one in-memory engine with streaming, SQL, machine learning, and graph personalities (libraries)
Con: Still immature - performance is an issue and it hasn't fully delivered on integration - but the Tungsten performance boost and IBM projects could add huge new value
[Diagram: Spark stack - Streaming Ingest (Spark Streaming); Spark SQL (join, filter, aggregate); Machine Learning (Spark MLlib); all on Spark Core; YARN or Mesos or other workload manager; HDFS or operational database]
How Systems of Intelligence Get Smarter: Big Data vs. Streaming Data & Learning vs. Predicting
[Chart: quadrants of Big Data vs. Streaming Data against Machine Learning vs. Operational Prediction]
• Big Data + Machine Learning: Model with historical context, but the model drifts when put into operation
• Big Data + Operational Prediction: Predictions informed by historical context, but the model operates on old data
• Streaming Data + Machine Learning: Model with the most recent data; learns from recent or streaming data but lacks historical context
• Streaming Data + Operational Prediction: Predictions informed by the most recent data, but the model lacks historical context
• Future: Real Time + Historical Context; Learn + Predict
Netflix movie library example
• Big Data + machine learning: At first sign-in, the customer clicks through favorite genres and favorite movies; offline, that is compared with customers with similar tastes to generate individual recommendations (operational prediction)
• Fast Data + ML: As the user browses for the next movie, streaming data feeds machine learning, which updates the recommendations in real-time (operational prediction)
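The two Netflix bullets can be sketched as two steps over one profile vector. The offline step finds the members with the most similar genre tastes (cosine similarity) and recommends what they watched. The streaming step nudges the profile with each browse event, so the recommendations shift in real-time. This is a minimal sketch, and everything in it is a hypothetical assumption: the genres, member vectors, titles, and nudge rate. It is one simple nearest-neighbor approach, not Netflix's actual algorithm.

```python
import math
from typing import Dict, List, Set

GENRES = ["drama", "comedy", "scifi", "documentary"]

# Hypothetical genre-affinity vectors for existing members, built offline from viewing history
MEMBERS: Dict[str, List[float]] = {
    "ana": [0.9, 0.1, 0.0, 0.6],
    "ben": [0.1, 0.9, 0.2, 0.0],
    "cruz": [0.0, 0.2, 0.9, 0.1],
}
WATCHED: Dict[str, Set[str]] = {"ana": {"drama-1", "doc-7"}, "ben": {"comedy-3"}, "cruz": {"scifi-2"}}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(profile: List[float], k: int = 1) -> Set[str]:
    """Batch-style step: recommend titles watched by the k most similar members."""
    nearest = sorted(MEMBERS, key=lambda m: cosine(profile, MEMBERS[m]), reverse=True)[:k]
    return set().union(*(WATCHED[m] for m in nearest))


def on_browse(profile: List[float], genre: str, rate: float = 0.3) -> List[float]:
    """Streaming step: nudge the profile toward the genre being browsed right now."""
    i = GENRES.index(genre)
    return [(1 - rate) * v + (rate if j == i else 0.0) for j, v in enumerate(profile)]


signup = [1.0, 0.0, 0.0, 0.2]  # picked drama (and a little documentary) at first sign-in
print(recommend(signup))
profile = signup
for _ in range(3):  # three browse events in the sci-fi row
    profile = on_browse(profile, "scifi")
print(recommend(profile))
```

After three sci-fi browse events, the nearest neighbor changes and so do the recommendations, with no offline recomputation of the member vectors.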
Systems of Intelligence Require New Technology at Every Level of Stack Compared to Systems of Record
Layer | Systems of Record | Systems of Intelligence
Data | Business transactions | Big Data: user interactions, contextual observations, machine data measurements
Data preparation | Batch ETL | "All" raw data collected for data scientists to either build predictive models or prep for business analysts; results of both put into a continually evolving production analytic data pipeline
Analytic data pipeline | Historical reporting from data warehouse | Predictive models developed via machine learning from Big Data and Fast Data
Platforms | Oracle 12c, SQL Server, DB2, Teradata, Informatica | Hadoop, AWS, Azure, Google Cloud Platform, best-of-breed specialized databases, Oracle, Spark
Data platform components | OLTP SQL DBMS, MPP SQL DBMS | OLTP, MPP analytic, key-value, Bigtable-type, doc store, streaming, machine learning, graph processing (elaborated on next slide)
Systems of Intelligence Data Platform Components
Data platform component | Functionality | Role | Examples
Key value store | Cache or session store | Serve content like offers, profiles - fast | Aerospike, Redis, Couchbase
Document store | Manage JSON data | Serve Web, mobile UI | MongoDB
Graph processor | Manage extremely inter-related data | Understand relationships such as a user's product preferences | Neo4j, Titan, Giraph
Event log | Deliver data from any source(s) to any destination(s) | Ensure exactly-once delivery | Kafka, RabbitMQ
Stream processor | Analyze fast data | Analytics without the lag of first storing data | Spark Streaming, DataTorrent, Flink
Machine learning | Create predictive model | Intelligence for anticipating and influencing outcomes | Azure ML, Spark MLlib, Mahout
BigTable DB | Operational database (scalable, lite OLTP) | Manage millions of columns by trillions of rows | HBase, Cassandra
OLTP SQL DBMS | Operational database | Heavy-duty OLTP | Oracle, SQL Server, DB2
Analytic SQL DBMS | Business intelligence, sometimes machine learning | High-performance analysis on Big Data | Teradata, Vertica, Greenplum
Orchestration | Build, run, and manage an analytic data pipeline | Developer focuses on the end-to-end application rather than each service | Google Cloud Dataflow, Azure Data Factory
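The event log row lists exactly-once delivery as a role. The minimal sketch below shows one widely used way to get exactly-once *effects* on top of an at-least-once log. The consumer records which event IDs it has already applied and skips redeliveries after a crash. This is a generic pattern written in plain Python. It is not the Kafka or RabbitMQ client API, and the event payloads and the simulated crash are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Event = Tuple[str, int]  # (event_id, amount): hypothetical payload


@dataclass
class IdempotentConsumer:
    """At-least-once delivery plus de-duplication applies each event's effect exactly once."""
    balance: int = 0
    committed_offset: int = 0                  # next log position to read
    seen_ids: Set[str] = field(default_factory=set)

    def poll(self, event_log: List[Event], crash_before_commit: bool = False) -> None:
        for offset in range(self.committed_offset, len(event_log)):
            event_id, amount = event_log[offset]
            if event_id not in self.seen_ids:  # a redelivered event is skipped
                # In a real system, balance and seen_ids would be persisted atomically together
                self.balance += amount
                self.seen_ids.add(event_id)
            if crash_before_commit:
                return  # simulate a crash: effect applied but offset not committed
            self.committed_offset = offset + 1


event_log = [("e1", 10), ("e2", 5), ("e3", 7)]
consumer = IdempotentConsumer()
consumer.poll(event_log, crash_before_commit=True)  # applies e1, then "crashes"
consumer.poll(event_log)                            # restarts at offset 0, so e1 is redelivered
print(consumer.balance, consumer.committed_offset)
```

A consumer without the `seen_ids` check would apply e1 twice after the restart.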
Enterprises Must Choose Their Platform By Balancing Optimized-but-Complex Functionality vs. General-Purpose Simplicity That Evolves More Slowly
[Chart: Optimized + More Complex vs. General Purpose + Less Complex, against Faster vs. Slower Innovation]
• Many optimized data managers (Cassandra, Aerospike, MongoDB, Neo4j…)
• Hadoop ecosystem (Cloudera, Hortonworks, MapR)
• Single vendor data platform (Azure, AWS, Google Cloud Platform, Bluemix, Pivotal)
• Single multi-purpose engine (Oracle, Spark)
Many Optimized Data Managers: "Wild West" of the Ecosystem - Best for Internet-Centric Companies Needing Optimized Functionality, Fastest Innovation
Pro: Greatest innovation and choice of products with optimal functionality
Con: Complexity - customers have to build, integrate, test, and operate multi-vendor, mostly open source databases
(chart source: 451 Research)
Customer sweet spot
• Leading-edge Internet-centric companies
• Facebook, LinkedIn, Netflix, Uber, ad-tech, gaming, eCommerce
Hadoop Ecosystem Is Best for Those Who Need Fast Innovation Simplified By a Curated Ecosystem
Hadoop Is Moving Toward Becoming an Integral Platform, But the "Seams" Between Individual Components Are Still Visible
Pro: Widest and deepest ecosystem that is curated
Con: It's still more of an ecosystem than a product, and that means operational and development complexity
Customer sweet spot
• Internet-centric and sophisticated IT enterprises
• Ad-tech, gaming, eCommerce, telcos, banks, retailers
Single Vendor Data Platform-as-a-Service Delivers More Simplicity via an Integral Offering Balanced With Some Optimization
Cloud platforms: built, integrated, tested, delivered, and operated as a service
o Microsoft: HDInsight, SQL Azure, Azure ML, Streaming, Data Factory, Cortana Analytics
o AWS: Kinesis, S3, DynamoDB, EMR, Redshift
o Google: Cloud Dataflow, BigQuery, Bigtable
Pro: Single-vendor simplicity combined with optimized functionality
Con: Potential for lock-in; leading-edge innovation will likely exist outside the platform
Customer sweet spot
• Mainstream enterprises that need a mix of optimized functionality and the simplicity of a single platform
• Less effort on admin and development
Single Multi-Purpose Engine Can Have Wide Appeal if It Stays Close to the Innovation and Performance Frontier With Open Source Economics
Integrated analytic data processing engine: Oracle, Spark
o (OLTP - Oracle)
o SQL query
o Event processing
o Machine learning
o Graph processing
Pro: Simplicity
• Single interface for developers and admins
• Deep integration greatly reinforces the value of each component of functionality - e.g. high-volume event streams are queried to feed continual iteration of machine learning, which updates a predictive model, which drives transactions in real-time
Con: Really hard to evolve at the pace of ecosystem innovation
• Spark immaturity
• Web-scale issues; Oracle = EXPENSIVE
Customer sweet spot
• Oracle: mainstream enterprises that want to build on their existing data platform and leverage its low-latency analytics
• Spark: enterprises at the leading edge and ISVs that want deeply integrated processing capabilities
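The closed loop in the Pro bullet runs from event stream to continual learning to updated model to real-time transaction. The minimal sketch below collapses that loop into one process. An online logistic regression scores each incoming event to drive a decision and then updates itself on the observed outcome with one stochastic gradient step. The event features, the ground-truth rule, and the learning rate are hypothetical. This is plain Python, not Oracle or Spark code.

```python
import math
import random


class OnlineLogisticModel:
    """Predict-then-learn loop: score each event to drive a decision, then update on the outcome."""

    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

    def update(self, x, y):
        err = self.predict(x) - y  # gradient of the log loss w.r.t. the linear score
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def simulate(n_events=5000, seed=7):
    """Return the decision accuracy over the last 1,000 events of a simulated stream."""
    rng = random.Random(seed)
    model = OnlineLogisticModel(n_features=2)
    correct = 0
    for i in range(n_events):
        x = [rng.random(), rng.random()]       # hypothetical event features
        y = 1 if x[0] + x[1] > 1.0 else 0      # hypothetical outcome: customer accepts the offer
        decision = model.predict(x) >= 0.5     # drives the real-time transaction
        if i >= n_events - 1000:
            correct += decision == bool(y)
        model.update(x, y)                     # continual learning from the outcome
    return correct / 1000


print(f"accuracy on the last 1,000 events: {simulate():.3f}")
```

In a single integrated engine, the stream ingest, the query, the model update, and the scoring share one runtime. In a multi-engine pipeline, each hand-off would cross a system boundary.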
Recap: Pace and Place in the Enterprise Journey and Choice of Platform Require an Assessment of Skills, Use Cases
• Most mainstream enterprises are very early in the journey
• Critical new data and analytic skills are required: sourcing, preparing, analyzing, modeling
• Modernizing SoR can accelerate the journey: from after-the-fact analytics to predictive models that inform transactions and interactions in real-time
• Choice of new platform: depends on the need for simplicity vs. optimized functionality and the latest innovation