Relational Technologies Under Siege: Will Handsome Newcomers Displace the Stalwart Incumbents?

Upload: neil-raden

Post on 24-May-2015

Category: Data & Analytics


Hired Brains Research LLC
Published: October 16, 2014
Analyst: Neil Raden [email protected]

Copyright © 2014 Hired Brains Research LLC

After three decades of prominence, Relational Database Management Systems (RDBMS) are being challenged by a raft of new technologies. While RDBMS enjoy a position of incumbency, newer data management approaches are benefiting from a vibrancy powered by the effects of Moore's Law and Big Data. Hadoop and NoSQL offerings were designed for the cloud, but are finding a place in enterprise architecture. In fact, Hadoop has already made a dent in the burgeoning field of analytics, previously the realm of data warehouses and analytical (relational) platforms.

KEY FINDINGS

• RDBMS are overwhelmed by new forms of data (so-called "big data"), including text, documents, machine-generated streams, graphs and others, but are counter-attacking with new development and features as well as acquisitions and partnerships.
• Non-relational platform vendors assert that the relational model itself is too rigid and expensive for the explosion of information.
• A fundamental drawback in RDBMS technology is the tight coupling of the storage, metadata and parser/optimizer layers, which cannot take advantage of the separate storage and compute capabilities of Hadoop.
• Advances in technology are not the key differentiators between RDBMS tools and Hadoop/Big Data and NoSQL offerings. The continuing enterprise need for quality, integrated information and a "single version of the truth" argues for existing and enhanced relational data warehouses versus the "good enough" mentality of cloud-based and Hadoop efforts that were developed for large internet companies. These are the key identifying differences between the analytical approaches.
• The "new-new" is pretty exciting, but there is a rush to provide true SQL access to many of these platforms, an admission that the relational calculus will endure.
• Desirable features of RDBMS will migrate to the distributed processing of Hadoop, but only once Hadoop solves its shortcomings in security, workload management and operability. Born-in-the-cloud SaaS applications built on NoSQL databases (even some yet to emerge) will operate seamlessly on this platform, but not for 3-5 years.
• Surveys of "revenue intention" for new technology spending are misleading; only 15% of companies surveyed are using Hadoop, and many of those deployments are experiments.

ANALYSIS

Relational database technology was adopted by the enterprise for its ability to host transactional/operational applications. By the late 1980s, vendors posted benchmarks of transactions per second that exceeded those of the purely proprietary databases, with the added benefit of an abstracted language, SQL, that allowed different flavors of databases to be designed, queried and maintained without the effort of learning a new proprietary language for each one.

Later, as the need grew for more careful data management for reporting and analytics, RDBMS were pressed into service as data warehouses, a role for which they were not well suited in terms of scale and especially the speed of complex queries and large table joins. This need was met in a number of ways, to some degree, but it took time.

This is precisely where we see Hadoop today: a tool that was built primarily to support search and indexing of unruly data on the Internet. However, its advantages in terms of cost and scale are so compelling that it is quickly being pressed into service as an enterprise analytics platform, even though it sorely lacks some features that data warehouses and analytical platforms (like Vertica, Netezza, Teradata, etc.) already possess.

The trend among distributors of Hadoop is to claim that relational data warehouses are obsolete, or at best artifacts that have some enduring value. Curiously, given all of the deficiencies they attribute to RDBMS, they are mostly mute about how Hadoop will address the role RDBMS fills so well for transactional purposes, but that is likely to change.

Relational vendors are at work to put in place reference architectures (and products to support them) that are hybrid in nature. An emerging term is polyglot persistence: the ability of the first mover in an analytical query to parse and distribute pieces of the query to the logical location of the data and, preferably, the compute engine for that data, without having to bulk-load data and persist it to answer a question. The concept is similar to federating queries, but much more powerful, as a federation scheme usually involves designing a reference schema and assembling and transforming the data into a single place to satisfy the query. In a hybrid architecture, there are actually multiple storage locations (even in-memory) and compute resources working in a cooperative fashion. This arrangement preserves the RDBMS as the origin of analytical queries and provider of the answer set, and simplifies the maintenance and orchestration of downstream processes, especially analytics, visualization and data discovery.
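The hybrid arrangement described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two in-memory lists stand in for a relational warehouse and a Hadoop-side store, and the function names (`revenue_by_customer`, `clicks_by_customer`, `answer_query`) are hypothetical rather than any vendor's API. The point is only that each engine aggregates its own data where it lives, and the coordinator merges small partial results instead of bulk-loading raw data into one place.

```python
from typing import Dict, List

# Hypothetical stand-ins for two separate storage/compute engines.
WAREHOUSE_ORDERS = [
    {"customer": "acme", "amount": 120.0},
    {"customer": "acme", "amount": 80.0},
    {"customer": "globex", "amount": 50.0},
]
HADOOP_CLICKS = [
    {"customer": "acme", "page": "/pricing"},
    {"customer": "globex", "page": "/home"},
    {"customer": "globex", "page": "/docs"},
    {"customer": "globex", "page": "/pricing"},
]


def revenue_by_customer() -> Dict[str, float]:
    """Sub-query evaluated 'inside' the relational store; returns aggregates only."""
    totals: Dict[str, float] = {}
    for row in WAREHOUSE_ORDERS:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


def clicks_by_customer() -> Dict[str, int]:
    """Sub-query evaluated 'inside' the Hadoop-side store; returns aggregates only."""
    counts: Dict[str, int] = {}
    for row in HADOOP_CLICKS:
        counts[row["customer"]] = counts.get(row["customer"], 0) + 1
    return counts


def answer_query() -> List[dict]:
    """Coordinator: dispatch each piece to the engine that holds the data, then merge."""
    revenue = revenue_by_customer()
    clicks = clicks_by_customer()
    customers = sorted(set(revenue) | set(clicks))
    return [
        {
            "customer": c,
            "revenue": revenue.get(c, 0.0),
            "clicks": clicks.get(c, 0),
        }
        for c in customers
    ]


if __name__ == "__main__":
    for row in answer_query():
        print(row)
```

In this sketch the coordinator plays the role the text assigns to the RDBMS: it originates the query and returns the answer set, while the raw rows never leave their respective stores.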

RDBMS were mostly row-oriented, given their OLTP orientation, but some adopted a column orientation, the most visible early on being Sybase IQ. In the past few years, it became obvious that analytical applications would be better served by a columnar orientation, and products like Vertica emerged that combined it with a highly scalable MPP architecture. But today, there is an explosion of new databases of many types, such as (a sampling, not comprehensive):

• Column: Accumulo, Cassandra, HBase
• Wide Table: Google BigTable
• Document: MongoDB, Apache CouchDB, Couchbase
• Key Value: Dynamo, FoundationDB, MapR-DB
• Graph: Neo4j, InfiniteGraph and Virtuoso

Keep in mind that none of these database systems is "general purpose"; most require programming interfaces and lack the kind of management and administrative features that IT departments demand.
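The row versus column distinction mentioned earlier in this section can be made concrete with a small sketch. The table, its field names and the helper functions below are illustrative assumptions, not the storage format of any product named above. The sketch shows why an analytical aggregate favors a columnar layout: summing one field touches a single contiguous list instead of every full record.

```python
from typing import Dict, List

# Row-oriented layout: each record is stored together (suits OLTP-style access).
rows: List[dict] = [
    {"order_id": 1, "region": "east", "amount": 10.0},
    {"order_id": 2, "region": "west", "amount": 25.0},
    {"order_id": 3, "region": "east", "amount": 5.0},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into a column-oriented layout."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_amount_row_store(records: List[dict]) -> float:
    """Aggregate over rows: every full record is visited to read one field."""
    return sum(record["amount"] for record in records)


def total_amount_column_store(columns: Dict[str, list]) -> float:
    """Aggregate over columns: only the 'amount' column is read."""
    return sum(columns["amount"])


columns = to_columns(rows)

if __name__ == "__main__":
    print(total_amount_row_store(rows))
    print(total_amount_column_store(columns))
```

Both functions return the same total; the difference is how much data each layout has to read to get there, which is what matters at analytical scale.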

RECOMMENDATIONS

• Recognize that RDBMS, Hadoop and NoSQL databases have vastly different purposes, capabilities, features and maturity.
• When contemplating a move from an Enterprise Data Warehouse and/or on-premises ETL, take the long view of the effort, cost and disruption.
• Determine exactly what your RDBMS vendor is planning for supporting "hybrid" environments because, for the time being, it will have an effect on the downstream activities of analytics.
• There are many use cases for NoSQL/Big Data that are compelling and you should carefully consider them. In general, they go beyond your existing Data Warehouse/BI but are not necessarily a suitable replacement. In two years this will likely change.
• Go slow and do not throw away the baby with the bath water. The best approach is to experiment with a "skunk works" project or two to get a feel for whether the approach is right for your organization. Beyond that, design a careful Proof of Concept (PoC) that can actually "prove" your "concept." Vendors tend to insert requirements and features that favor their product, which can undermine the validity of the PoC.

IN CONCLUSION

The explosion in database technology was inevitable as the effects of Moore's Law caused a discontinuous jump in the flow and processing of information. Technology, however, is always a step ahead of business. The implementation of enterprise applications, information management and processing platforms is a carefully woven fabric that does not bear rapid disruption (unless, of course, that is the enterprise's strategy). "Big data" can provide enormous benefits to organizations, but not to all of them. Many will find it preferable to rely on third parties to prepare and even interpret big data for them. For those that see a clear requirement, it is wise to consider the whole playing field and how the insights gained will find purchase and value. As Peter Drucker said, "Information is data that has meaning and purpose."