systems of intelligence - wikibon/thecube

Systems of Intelligence: The Next 10-15 Years of Enterprise Applications, George Gilbert, Big Data Analyst

Upload: george-gilbert

Post on 17-Aug-2015


TRANSCRIPT

Systems of Intelligence: The Next 10-15 Years of Enterprise Applications

George Gilbert, Big Data Analyst

Agenda: Enterprises Must Manage Journey From Systems of Record (SoR) to Systems of Intelligence (SoI) By Balancing Skills And Tech Maturity

• SoI build on SoR but are biggest change in enterprise apps in five decades

• Enterprises need deep focus on sourcing, preparing, analyzing, modeling data: SoI pivot on data quality, fail otherwise

• Speed of integrating increasingly sophisticated analytics with operational apps ever more critical, but doesn’t *necessarily* require streaming-only analytics

• SoI require new stack: enterprises must choose their platform by balancing need for optimized functionality vs. need for simplicity

Systems of Record Automate Business Processes: Five Decades From Airline Reservations to ERP and Data Warehouses

Improve business process efficiency
• On time-shared mainframes, GUI client-server, or in cloud via SaaS: SoRs automate business processes
• Standardized processes and business transactions enable performance reporting and business intelligence
• Limitations of historical performance reporting: like steering a ship while looking backwards at its wake

Systems of Intelligence Build on Systems of Record

[Diagram labels: retail sales associate; consumer; mobile; retail; call center; TV ads; e-mail; social media; eCommerce]

Systems of Intelligence optimize loyalty by anticipating consumer “conversation”
• Omni-channel: comprehensive, real-time integration of all touch points and channels via common data
• Intelligence: predictive and real-time to influence consumer interaction
• Loyalty and profitability are higher

Build on SoR: SoR still run core processes, with varying degrees of “real-time” integration to SoI
• Modern SoR: are cloud, mobile, social; most critically, they support fast data integration with other apps
• Data from SoR can be approximate: can be modestly stale if apps can’t support real-time (RT) query from consumer-facing apps; SoI can then run in cloud more easily
• Omni-channel: still needs access to pricing info, inventory, billing process
• Intelligence: still needs master customer data and transaction history

2 New Elements Based Solely on Forward-Looking Analytics & Data

[Diagram labels: machine learning; predictive model; data platform]

Systems of Intelligence Will Cross Functions And Industries

Transformation from SoR to SoI: SoI will progressively remake existing application categories via use of machine learning

HR Talent Management example
• SoR: track recruit-to-retire processes including source, attract, develop, motivate, retain…
• SoI: for retention, predict who is at risk for proactive intervention

Systems of Intelligence Are Prototype for IoT: Example of Systems Management Becoming Autonomic Systems Management

Traditional Management vs. Autonomic Service Management
• Objects. Traditional: servers, storage, networks, databases, web servers. Autonomic: physical infrastructure and services are like IoT “devices”
• Analytics. Traditional: real-time dashboard of … Autonomic: predictive model of behavior from real-time streaming data
• Alerts. Traditional: pre-set performance thresholds. Autonomic: anomalous behavior
• Action. Traditional: send alerts to administrators. Autonomic: suggest or auto-remediate behavior

Real-time Dashboard = Backward looking
Analytics = “Lights out” via real-time, predictive + prescriptive Auto Pilot
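The contrast between pre-set thresholds and learned "normal" behavior can be sketched in a few lines. This is a minimal illustration, not a description of any product: the function names, the window size, the z-score limit, and the CPU samples are all assumptions made up for the example.

```python
from collections import deque
from statistics import mean, stdev

def static_threshold_alerts(samples, limit):
    """Traditional approach: alert whenever a sample exceeds a pre-set limit."""
    return [i for i, x in enumerate(samples) if x > limit]

def learned_baseline_alerts(samples, window=5, z_limit=3.0):
    """Autonomic-style approach (illustrative): learn 'normal' from a sliding
    window of recent samples and flag values that deviate strongly from it."""
    history = deque(maxlen=window)
    alerts = []
    for i, x in enumerate(samples):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(x - mu) / sigma > z_limit:
                alerts.append(i)
                continue  # keep the anomaly out of the learned baseline
        history.append(x)
    return alerts

# Hypothetical CPU utilization samples (percent); index 6 is a sudden spike.
cpu = [40, 42, 41, 39, 40, 41, 75, 40, 42, 41]

threshold_hits = static_threshold_alerts(cpu, limit=90)
baseline_hits = learned_baseline_alerts(cpu)
print(threshold_hits, baseline_hits)
```

With these made-up numbers the spike stays under the 90 percent threshold, so the static rule stays silent, while the learned baseline flags it because it is far outside recent behavior.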

The Journey To Systems Of Intelligence: Determined By Combination of Enterprise Capabilities, Tech Maturity

[Chart: applications plotted against technology maturity and enterprise capabilities over time]
• Adjunct Data Warehouse: Data Lake with some production analytics offload from Data Warehouse
• Customer 360: enough internal and external customer data in a pipeline to start predictive modeling
• Real-time loyalty (omni-channel, multi-touchpoint): predictive model learns from and anticipates consumer in near real-time
• Smart Grid: continuously updated prediction of energy supply and demand tunes end-point consumption
• Autonomic systems management: system learns “normal” behavior of apps and infrastructure and flags or fixes anomalies


Collecting *Usable* Data About Customer Interactions Requires New Sourcing, Prep’ing, Analytic Techniques

Journey to SoI Requires Skills, Technology to Start Iteratively Prep’ing Data and Building Predictive Models

SoR: Traditional Data Warehouse Challenge
• Time-to-analysis bottlenecked by need to decide questions before designing and building the DW
• Design of DW limits available data, and the development cycle for ETL then severely limits ability to ask new questions

SoI Analytics: Data Lake = Training Wheels
• Time-to-analysis becomes short enough to be iterative by providing self-service access to all data before building the analytic pipeline
• Analysis open to interoperation with any data processing engine that writes to HDFS
• New production pipelines can stay on the production Hadoop cluster or go back to DW

[Diagram labels: available data; ETL + database design; bottleneck; mostly hardwired questions; HDFS; data provisioning; self-service iterative and incremental database design; new questions]
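The difference between deciding questions up front and asking them iteratively against raw data can be sketched as follows. This is a minimal illustration using in-memory JSON lines as a stand-in for files in a data lake; the event fields, the sample records, and the helper names are hypothetical.

```python
import json

# Hypothetical raw events, kept exactly as they arrived (one JSON document per line).
RAW_LOG = """\
{"customer": "c1", "channel": "mobile", "action": "view", "sku": "A"}
{"customer": "c2", "channel": "web", "action": "buy", "sku": "A", "coupon": "X10"}
{"customer": "c1", "channel": "web", "action": "buy", "sku": "B"}
"""

def read_events(raw):
    """Parse raw lines at query time; no table design is fixed in advance."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

def count_by(events, field, where=lambda e: True):
    """Answer an ad hoc question: count events by any field found in the raw data."""
    counts = {}
    for e in events:
        if where(e):
            key = e.get(field)  # events lacking the field are counted under None
            counts[key] = counts.get(key, 0) + 1
    return counts

events = read_events(RAW_LOG)

# First question, then a new one that no upfront schema anticipated.
purchases_by_channel = count_by(events, "channel", where=lambda e: e["action"] == "buy")
purchases_by_coupon = count_by(events, "coupon", where=lambda e: e["action"] == "buy")
print(purchases_by_channel, purchases_by_coupon)
```

The second question uses a field ("coupon") that only some events carry. Because the raw records were kept, asking it needs no ETL redesign, which is the iteration the data lake bullets describe.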

Systems of Intelligence Always Need More Sources of Customer Data – Including Externally Syndicated

The internal customer master is no longer the last word about the customer

Source: Oracle BlueKai

Preparing Hundreds of Raw Data Sources for Analytics Often Requires Techniques as Advanced as Machine Learning on the Data Sources Themselves

Raw data from one source: logs

Prep’ing hundreds of sources requires SoI technology such as machine learning to inform data scientists’ decisions

Source: Tamr


Real-Time is a Matter of Degree: Choices Depend on Usage Scenario, Accessibility of Applications That Need to be Integrated – Including Legacy and Modern Systems of Record

Range of “Real-Time” Interactions
• REAL RT: high-frequency algorithmic securities trading at one end of the spectrum
• Updates every couple of hours: inventory levels accessed by ecommerce and mobile apps at the other end of the spectrum

Modern SoR makes it easier to get to fastest part of spectrum

*Legacy* Systems of Record Need Completely New Analytic Data Pipelines Built for Speed

Legacy SoR Analytic Data Pipeline Limitations
• Batch ETL: too slow to build closed-loop analytics
• Database scale + cost: limit amount + use of data

[Diagram labels: network operations-facing data; customer-facing data; call detail records; billing; CRM; operational applications; batch ETL; data warehouse. Key: scale-up RDBMS (Oracle, IBM, Microsoft)]

*Modern* Systems of Record: Addition of Streaming Data More Easily Supports Real-Time Data for Predictive Models of Systems of Intelligence

• Legacy SoR analytics: historical reporting
• Fast Data: machine learning on MOST RECENT call data for anticipating and influencing customer interaction
• Big Data: machine learning on HISTORICAL data provides context for building customer profiles and model of network utilization over time
• Real-time interactions: loyalty offers based on historical and most recent data; connection prioritization

[Diagram labels: consumer; mobile; retail; eCommerce; call detail records; ERP; CRM; batch ETL; streaming data; customer- AND network-facing data. Key: modern SoR built on scale-out data platform]

Streaming is Newest Religious War: Use It For *All* Analytic Workloads? Processing Lots of Data vs. Analyzing Each Event = Inherent Conflict

[Chart: batch processing by data volume (GB, TB, PB) vs. streaming by velocity (min, sec, ms, µs)]
• Big Data: maximum throughput of data; exploratory analysis of historical data
• Fast Data: fastest speed to make a decision on each event

“Streams can do it all” school: Big Data Apps are Just Fast Data Apps Scaled-Out
• If it can handle fast data, just scale it out to handle big data
• Big win: only one application needed

Wikibon recommendation (elaborated on next page): Streaming and batch *will always* coexist
• Even batch programs on a streaming platform will still have different application logic…
• High-volume machine learning vs. incremental update
• Historical performance analysis vs. looking up a profile
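The point that batch and per-event programs carry different application logic, even when they compute the same quantity, can be illustrated with a very small case. The sketch below assumes nothing about any streaming engine; the class name, function name, and sample call durations are made up for the example.

```python
def batch_mean(values):
    """Batch logic: the whole data set is available, so scan it once."""
    return sum(values) / len(values)

class RunningMean:
    """Per-event logic: keep only a count and the current mean,
    and update both as each event arrives."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental update, no history kept
        return self.mean

# Hypothetical call durations in seconds.
call_durations = [120.0, 30.0, 45.0, 300.0, 60.0]

historical = batch_mean(call_durations)

running = RunningMean()
for d in call_durations:
    latest = running.update(d)

print(historical, latest)
```

Both paths converge on the same value, but the batch version needs the full history while the per-event version holds two numbers of state and has an answer after every event. That difference in structure is what the recommendation above expects to persist.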

Even When Streaming Engines Support More Sophisticated Analytic Workloads, The Applications Are Likely to Differ Between Event-at-a-Time vs. Batch

[Chart: latency (higher is slower) vs. analytic sophistication, for batch-oriented and per-event-oriented workloads]
• Basic streaming. What happened: counting
• SQL. What happened: exploration, OLAP or dashboard
• Machine learning. Anticipate or act automatically: prediction or prescription
• Batch-oriented: historical analysis; explore large, new data
• Per-event-oriented: profile lookup; incremental model update

Stream processors: Spark, Flink, InfoStreams, Samza, DataTorrent; (DB): VoltDB / MemSQL

IMPLICATION: Converging on one application engine not critical

Key Takeaway: Coexistence of Batch and Streaming Means One Application Engine Doesn’t Have to Rule All - Spark and Hadoop Can Live Together

Hadoop stack
[Diagram layers: streaming (Storm, Flink, Samza, Data Torrent); SQL (Impala, Drill, Hive, HAWQ…); machine learning (Mahout…); YARN, cluster resource management; HDFS or operational database]
• Pro: mix-and-match pipeline comprised of specialized processing *optimized* for each workload
• Con: batch-only; hand-off between processing engines via storage is slow. Each processing engine is standalone and can’t leverage the others’ functionality

Spark stack
[Diagram layers: streaming ingest (Spark Streaming); Spark SQL: join, filter, aggregate; machine learning (Spark MLlib); Spark Core; YARN or Mesos or other workload manager; HDFS or operational database]
• Pro: fast and simple; pipeline comprised of one in-memory engine with streaming, SQL, machine learning, graph personalities (libraries)
• Con: still immature; performance an issue; haven’t fully delivered integration. But the Tungsten performance boost and IBM projects could add huge new value

How Systems of Intelligence Get Smarter: Big Data vs. Streaming Data -&- Learning vs. Predicting

Operational prediction
• Big Data: predictions informed by historical context, but model operates on old data
• Streaming Data: predictions informed by most recent data, but model lacks historical context

Machine learning
• Big Data: model with historical context, but model drifts when put into operation
• Streaming Data: model with most recent data; learns from recent or streaming data but lacks historical context

Future: Real Time + Historical Context (Learn + Predict)

Netflix movie library example
• Big Data + machine learning: at first sign-in, customer clicks through favorite genres and favorite movies; offline, that’s compared with customers with similar tastes to generate individual recommendations (operational prediction)
• Fast Data + ML: as the user browses for next movie, streaming data feeds machine learning, which updates the recommendations in real-time (operational prediction)
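The two steps in the movie library example can be sketched as an offline profile built from history plus an online update on each browse event. This is an illustrative toy, not how Netflix or any recommender actually works; the catalog, the genres, the decay rate, and all function names are assumptions.

```python
def batch_genre_profile(history):
    """Offline step (Big Data): turn a viewing history into normalized genre weights."""
    counts = {}
    for genre in history:
        counts[genre] = counts.get(genre, 0) + 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def stream_update(profile, genre, rate=0.2):
    """Online step (Fast Data): decay the old weights and nudge the profile
    toward the genre the user just browsed."""
    genres = set(profile) | {genre}
    return {g: (1 - rate) * profile.get(g, 0.0) + (rate if g == genre else 0.0)
            for g in genres}

def recommend(profile, catalog, k=2):
    """Rank titles by the weight of their genre in the current profile."""
    ranked = sorted(catalog, key=lambda t: profile.get(catalog[t], 0.0), reverse=True)
    return ranked[:k]

# Hypothetical catalog (title -> genre) and viewing history.
catalog = {"Alien": "scifi", "Heat": "crime", "Up": "family", "Solaris": "scifi"}
profile = batch_genre_profile(["scifi", "scifi", "crime", "scifi"])
before = recommend(profile, catalog)

# The user browses family titles three times in the current session.
for _ in range(3):
    profile = stream_update(profile, "family")
after = recommend(profile, catalog)

print(before, after)
```

The historical profile supplies context (a science fiction bias), while the per-event updates let a short burst of recent behavior change the top recommendation. That combination is the "Real Time + Historical Context" target named above.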


Systems of Intelligence Require New Technology at Every Level of Stack Compared to Systems of Record

• Data. SoR: business transactions. SoI: Big Data (user interactions, contextual observations, machine data measurements)
• Data preparation. SoR: batch ETL. SoI: “all” raw data collected for data scientists to either build predictive models or to prep for business analysts; results of both put into continually evolving production analytic data pipeline
• Analytic data pipeline. SoR: historical reporting from data warehouse. SoI: predictive models developed via machine learning from Big Data and Fast Data
• Platforms. SoR: Oracle 12c, SQL Server, DB2, Teradata, Informatica. SoI: Hadoop, AWS, Azure, Google Cloud Platform, best-of-breed specialized databases, Oracle, Spark
• Data platform components (elaborated on next slide). SoR: OLTP SQL DBMS, MPP SQL DBMS. SoI: OLTP, MPP analytic, key-value, Bigtable-type, doc store, streaming, machine learning, graph processing

Systems of Intelligence Data Platform Components

• Key-value store. Functionality: cache or session store. Role: serve content like offers, profiles, fast. Examples: Aerospike, Redis, Couchbase
• Document store. Functionality: manage JSON data. Role: serve Web, mobile UI. Examples: MongoDB
• Graph processor. Functionality: manage extremely inter-related data. Role: understand relationships such as a user’s product preferences. Examples: Neo4j, Titan, Giraph
• Event log. Functionality: deliver data from any source(s) to any destination(s). Role: ensure exactly-once delivery. Examples: Kafka, RabbitMQ
• Stream processor. Functionality: analyze fast data. Role: analytics without lag of first storing data. Examples: Spark Streaming, Data Torrent, Flink
• Machine learning. Functionality: create predictive model. Role: intelligence for anticipating and influencing outcomes. Examples: Azure ML, Spark MLlib, Mahout
• BigTable DB. Functionality: operational database (scalable, lite OLTP). Role: manage millions of columns by trillions of rows. Examples: HBase, Cassandra
• OLTP SQL DBMS. Functionality: operational database. Role: heavy-duty OLTP. Examples: Oracle, SQL Server, DB2
• Analytic SQL DBMS. Functionality: business intelligence, sometimes machine learning. Role: high-performance analysis on Big Data. Examples: Teradata, Vertica, Greenplum
• Orchestration. Functionality: build, run, and manage an analytic data pipeline. Role: developer focuses on end-to-end application rather than each service. Examples: Google Cloud Dataflow, Azure Data Factory
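The event log's role of exactly-once delivery is often approximated in practice by letting the log redeliver and making the consumer idempotent. The sketch below shows that idea with plain Python objects standing in for a log and a downstream store. It does not use or describe the Kafka or RabbitMQ APIs; the class name, event layout, and sample deliveries are assumptions.

```python
class IdempotentConsumer:
    """Applies each event at most once by remembering the IDs it has already processed."""

    def __init__(self):
        self.seen_ids = set()
        self.totals = {}  # stand-in for a key-value store of per-customer spend

    def handle(self, event):
        event_id, customer, amount = event
        if event_id in self.seen_ids:
            return False  # duplicate delivery: skip it so totals are not double counted
        self.seen_ids.add(event_id)
        self.totals[customer] = self.totals.get(customer, 0) + amount
        return True

# Hypothetical deliveries as (event_id, customer, amount); event 1 arrives twice,
# as can happen when a log retries after a missed acknowledgement.
deliveries = [(1, "c1", 10), (2, "c2", 5), (1, "c1", 10), (3, "c1", 7)]

consumer = IdempotentConsumer()
applied = [consumer.handle(e) for e in deliveries]
print(applied, consumer.totals)
```

The duplicate is delivered but has no effect, so the downstream totals look as if each event arrived exactly once. A production system would also need to persist the seen IDs and the totals together, which this sketch leaves out.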

Enterprises Must Choose Their Platform By Balancing Ability to Handle Optimized but Complex vs. General Purpose Simplicity but Slower Evolving

[Chart axes: optimized + more complex vs. general purpose + less complex; faster vs. slower innovation]
• Many optimized data managers (Cassandra, Aerospike, MongoDB, Neo4j…)
• Hadoop ecosystem (Cloudera, Hortonworks, MapR)
• Single vendor data platform (Azure, AWS, Google Cloud Platform, Bluemix, Pivotal)
• Single multi-purpose engine (Oracle, Spark)

Many Optimized Data Managers: “Wild West” of the Ecosystem - Best for Internet-Centric Companies Needing Optimized Functionality, Fastest Innovation

• Pro: greatest innovation and choice of products with optimal functionality
• Con: complexity; customers have to build, integrate, test, operate multi-vendor, mostly open source databases

Customer sweet spot
• Leading-edge Internet-centric companies
• Facebook, LinkedIn, Netflix, Uber, ad-tech, gaming, ecommerce

(chart source: 451 Research)

Hadoop Ecosystem is Best for Those Who Need Fast Innovation Simplified By Curated Ecosystem

• Pro: widest and deepest ecosystem that is curated
• Con: it’s still more of an ecosystem than a product, and that means operational and development complexity

Customer sweet spot
• Internet-centric and sophisticated IT enterprises
• Ad-tech, gaming, ecommerce, telcos, banks, retailers

Hadoop is Moving Toward Becoming an Integral Platform But “Seams” Between Individual Components Still Visible

Single Vendor Data Platform-as-a-Service Delivers More Simplicity via an Integral Offering Balanced With Some Optimization

Cloud platforms: built, integrated, tested, delivered, and operated as a service
• Microsoft: HDInsight, SQL Azure, Azure ML, Streaming, Data Factory, Cortana Analytics
• AWS: Kinesis, S3, DynamoDB, EMR, Redshift
• Google: Cloud Dataflow, BigQuery, BigTable

• Pro: single-vendor simplicity combined with optimized functionality
• Con: potential for lock-in; leading-edge innovation will likely exist outside platform

Customer sweet spot
• Mainstream enterprises that need a mix of optimized functionality and the simplicity of a single platform
• Less effort on admin, development

Single Multi-Purpose Engine Can Have Wide Appeal if It Stays Close to Innovation, Performance Frontier With Open Source Economics

Integrated analytic data processing engine: Oracle, Spark
• (OLTP - Oracle)
• SQL query
• Event processing
• Machine learning
• Graph processing

Pro: Simplicity
• Single interface for developers, admins
• Deep integration greatly reinforces value of each component of functionality, e.g. high-volume event streams, queried to feed continual iteration of machine learning, which updates predictive model, which drives transaction in real-time

Con: Really hard to evolve at pace of ecosystem innovation
• Spark immaturity
• Web-scale issues, Oracle = EXPENSIVE

Customer sweet spot
• Oracle: mainstream enterprises that want to build on their existing data platform and leverage its low-latency analytics
• Spark: enterprises at leading edge and ISVs that want deeply integrated processing capabilities

Recap: Pace and Place in Enterprise Journey and Choice of Platform Requires Assessment of Skills, Use Cases

• Most mainstream enterprises are very early in the journey

• Critical new data and analytic skills are required: sourcing, preparing, analyzing, modeling

• Modernizing SoR can accelerate the journey: from after-the-fact analytics to predictive models that inform transactions and interactions in real-time

• Choice of new platform: depends on need for simplicity vs. optimized functionality and latest innovation