A New Product: Hunk – Splunk Analytics for Hadoop · 2017-10-13 · New product from Splunk...

Copyright © 2013 Splunk Inc. A New Product: Hunk – Splunk Analytics for Hadoop (BETA). Clint Sharp, Director of Product Management. #splunkconf


Post on 25-Jul-2020


Page 1: A New Product: Hunk – Splunk Analytics for Hadoop · 2017-10-13 · New product from Splunk delivers interactive data exploration, analysis and visualizations for Hadoop Announcing Hunk Beta

Copyright © 2013 Splunk Inc.

A New Product: Hunk – Splunk Analytics for Hadoop (BETA)
Clint Sharp, Director of Product Management
#splunkconf

Page 2: Legal Notices

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners.

©2013 Splunk Inc. All rights reserved.

Page 3: Announcing Hunk Beta

Splunk Analytics for Hadoop

New product from Splunk delivers interactive data exploration, analysis and visualizations for Hadoop

Page 4: A Lot of Organizational Data Ends Up in Hadoop

Challenges Deploying and Leveraging Hadoop

• 20X services relative to software
• Inadequate skills for big data analytics
• 13+ Hadoop-related projects requiring integration
• Data is "too big to move"

[Diagram: Hadoop (MapReduce & HDFS) surrounded by 13+ Hadoop-related projects: YARN, Ambari, Avro, Cassandra, Chukwa, Hive, HBase, Mahout, Pig, ZooKeeper]

Page 5: October 2012: Splunk Hadoop Connect To Address Common Challenges Deploying and Running Hadoop

Splunk Hadoop Connect

• Reliable bi-directional integration to Hadoop
• >1000 downloads

[Diagram: ad hoc search, monitor and alert, report and analyze, and custom dashboards on top of HA indexes and storage running on commodity servers, linked to Hadoop (MapReduce & HDFS) through Import, Browse and Export]

Page 6: What About Extracting Value Directly from Hadoop?

"How can we leverage the full capabilities of Splunk natively on data in Hadoop?"

Data in Hadoop is too big to move

[Diagram: ad hoc search, monitor and alert, report and analyze, and custom dashboards on top of HA indexes and storage running on commodity servers, alongside Hadoop (MapReduce & HDFS)]

Page 7: Hunk: Splunk Analytics for Hadoop

• Full-featured, integrated product: Delivers interactive data exploration, analysis and visualization for Hadoop
• Insights for everyone: Empowers broader user groups to derive actionable insights from raw data in Hadoop
• Distribution agnostic: Works with leading distributions to maximize enterprise technology investments

[Diagram: Explore, Analyze, Visualize, Dashboards, Share on top of Hadoop (MapReduce & HDFS)]

Page 8: Derive Actionable Insights from Raw Data

1. Point Splunk at Hadoop cluster
2. Immediately start exploring, analyzing and visualizing raw data in Hadoop

[Diagram: Explore, Analyze, Visualize, Dashboards, Share on top of Hadoop storage]

Page 9: Explore, Analyze and Visualize Data, On-the-fly

Virtual index
• Enables seamless use of the entire Splunk technology stack on data wherever it rests
• Hadoop virtual index automatically handles MapReduce
• Technology is patent pending

Schema-on-the-fly
• Structure applied at search time
• No brittle schema to work around
• Automatically find transactions, patterns and trends

Flexibility and fast time to value
• Normalization as it's needed
• Faster implementation
• Easy search language
• Multiple views into the same data

Page 10: Interactive Data Exploration

Search and explore data from one place

• Powerful search processing language (SPL)
• Designed for data exploration across large datasets: preview data, iterate quickly
• No requirement to "understand" data upfront

[Screenshot callouts: Search interface, Search assistant, Interactive results window]

Page 11: Interactive Data Analysis

Rapidly analyze and interact with data

• Interactive analytics interface
• Deep analysis, pattern detection and finding anomalies with over 100 statistical commands
• Enrich results with information from external relational databases

[Screenshot callouts: Interactive analytics interface, Formatting options]

Page 12: Powerful Platform for Enterprise Developers

• Add new UI components
• Integrate into existing systems
• With known languages and frameworks

[Diagram: API surrounded by JavaScript, Java, Python, PHP, C#, Ruby]

Page 13: Technology Overview

Page 14: Hunk Server

64-bit Linux OS

splunkweb
• Web and application server
• Python, AJAX, CSS, XSLT, XML

splunkd
• Search head
• Virtual indexes
• C++, web services

Hadoop interface
• Hadoop client libraries
• Java

Interfaces: REST API, command line, ODBC (beta)

Explore, Analyze, Visualize, Dashboards, Share

Page 15: Connect to HDFS and MapReduce

Connect to Apache HDFS and MapReduce or your choice of Hadoop distribution

[Diagram: Hunk Server on a 64-bit Linux OS (splunkweb, splunkd, Hadoop interface; REST API, command line, ODBC (beta)) connected to Hadoop cluster 1]

Page 16: Hunk Scales with Your Hadoop Deployments

Connect Hunk to multiple Hadoop clusters

[Diagram: Hunk Server on a 64-bit Linux OS (splunkweb, splunkd, Hadoop interface; REST API, command line, ODBC (beta)) connected to Hadoop cluster 1, Hadoop cluster 2 and Hadoop cluster 3]

Page 17: Prerequisites

• Data in Hadoop to analyze
• Hadoop access rights
• Hadoop client libraries
• Java 1.6+
• HDFS scratch space
• DataNode local temp disk space

Page 18: MapReduce as The Orchestration Framework

1. Copy splunkd binary (.tgz) to HDFS
2. Copy (.tgz to TaskTracker 1, TaskTracker 2 and TaskTracker 3)
3. Expand in specified location on each TaskTracker
4. Receive binary in subsequent searches

[Diagram: Hunk search head, HDFS, and TaskTrackers 1 to 3, each holding the .tgz]

Page 19: Hunk Usage in HDFS

hdfs://<scratch_space_path>/
  bundles – Search head bundles: keeps last 5 bundles
  packages – Hunk .tgz packages: no automatic cleanup
  dispatch/<sid> – Search scratch space: cleanup when sid is invalid
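The "keeps last 5 bundles" retention rule can be illustrated with a small sketch. This is not Hunk's cleanup code: the function name, the bundle names, and the assumption that names sort chronologically are all hypothetical, chosen only to show the rule.

```python
def bundles_to_delete(bundle_names, keep=5):
    """Return the bundle names that fall outside the newest `keep` bundles.

    Assumption (hypothetical): names sort chronologically, for example
    because they end in a fixed-width numeric timestamp.
    """
    ordered = sorted(bundle_names)
    # Everything before the last `keep` entries is eligible for cleanup.
    return ordered[:max(len(ordered) - keep, 0)]


# Seven hypothetical bundles: the two oldest would be removed.
names = [f"bundle-{ts}" for ts in (1001, 1002, 1003, 1004, 1005, 1006, 1007)]
stale = bundles_to_delete(names)
print(stale)
```

Since the slide says packages have no automatic cleanup, a similar pass over the packages directory would have to be scheduled by the administrator.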

Page 20: Hunk Uses Virtual Indexes

• Enables seamless use of almost the entire Splunk stack on data in Hadoop
• Automatically handles MapReduce
• Technology is patent pending

Page 21: Examples of Virtual Indexes

[Diagram: Hunk search head connected to External system 1, External system 2 and External system 3]

index = syslog (/home/syslog/…)
index = apache_logs
index = sensor_data
index = twitter

Page 22: Define Virtual Indexes and Paths

[Diagram: an external resource (e.g. hadoop.prod) holding several virtual indexes (e.g. twitter, sensor data, Apache logs)]

Specify virtual index and data paths, and optionally:

• Filter files or directories using a whitelist or blacklist
• Extract metadata or time range from paths
• Use props/transforms.conf to specify search time processing
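The whitelist/blacklist option can be pictured as a filter over candidate paths. The sketch below is an illustration only: the function, the example paths and patterns, and the rule that a blacklist match overrides a whitelist match are assumptions, not Hunk's documented behavior.

```python
import re


def select_paths(paths, whitelist=None, blacklist=None):
    """Keep paths that match `whitelist` (if given) and do not match `blacklist`."""
    allow = re.compile(whitelist) if whitelist else None
    deny = re.compile(blacklist) if blacklist else None
    selected = []
    for path in paths:
        if allow and not allow.search(path):
            continue  # not whitelisted
        if deny and deny.search(path):
            continue  # assumption: a blacklist match always excludes the path
        selected.append(path)
    return selected


# Hypothetical directory listing for an Apache logs virtual index.
paths = [
    "/data/apache/2013-06-10/access.log",
    "/data/apache/2013-06-10/access.log.tmp",
    "/data/apache/2013-06-10/_SUCCESS",
]
selected = select_paths(paths, whitelist=r"\.log", blacklist=r"\.tmp$")
print(selected)
```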

Page 23: Search Data in Hadoop

[Diagram, steps numbered 1 to 5: the Hunk search head talks to an external resource (e.g. hadoop.prod), sending JSON configs and MapReduce jobs. The cluster consists of a NameNode, a JobTracker (MapReduce resource manager in YARN), and three DataNode / TaskTracker machines (Node in YARN) over HDFS. Tasks use a working directory and run a copy of splunkd to process the data.]

Page 24: Data Processing Pipeline

Raw data (HDFS) → Custom processing → stdin → Indexing pipeline → Search pipeline

• Custom processing (MapReduce/Java): you can plug in data preprocessors, e.g. Apache Avro or format readers
• Indexing pipeline (splunkd/C++): event breaking, timestamping
• Search pipeline (splunkd/C++): event typing, lookups, tagging, search processors
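To make the indexing-pipeline stages concrete, here is a toy version of event breaking and timestamping over a stream of lines. It is a sketch under assumptions: the ISO-style timestamp format, the rule that a timestamp at the start of a line opens a new event, and the `_time`/`_raw` keys are illustrative choices, not splunkd's actual logic.

```python
import re
from datetime import datetime

# Assumption: events begin with a timestamp such as 2013-06-10T01:00:00.
TS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def break_events(lines):
    """Group lines into events; a line starting with a timestamp opens a new event."""
    event = []
    for line in lines:
        if TS.match(line) and event:
            yield "\n".join(event)
            event = []
        event.append(line.rstrip("\n"))
    if event:
        yield "\n".join(event)


def timestamp(event):
    """Attach the parsed leading timestamp (or None) to an event."""
    m = TS.match(event)
    when = datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%S") if m else None
    return {"_time": when, "_raw": event}


# Three input lines; the second is a continuation of the first event.
raw = [
    "2013-06-10T01:00:00 ERROR failed\n",
    "  at frame one\n",
    "2013-06-10T01:00:05 INFO ok\n",
]
events = [timestamp(e) for e in break_events(raw)]
print(len(events), events[0]["_time"])
```

In the slide's pipeline the lines would arrive on stdin from the Java preprocessing stage; passing `sys.stdin` instead of `raw` would model that.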

Page 25: Hunk Applies Schema on The Fly

Hunk applies schema for all fields, including transactions, at search time

• Structure applied at search time
• No brittle schema to work around
• Automatically find patterns and trends
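Schema at search time means the raw text is stored untouched and fields are derived only when a search reads it. A minimal sketch of that idea follows; the key=value event format, the regular expression and the `search` helper are hypothetical and stand in for whatever extraction rules a real deployment would configure.

```python
import re

# Raw events are stored as-is; no fields are defined when the data is written.
RAW_EVENTS = [
    "2013-06-10T01:00:00 status=200 bytes=512 uri=/home",
    "2013-06-10T01:00:01 status=404 bytes=0 uri=/missing",
]

# Assumption: fields appear as key=value pairs separated by whitespace.
KV = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw):
    """Derive fields from the raw text at the moment the event is read."""
    return dict(KV.findall(raw))


def search(events, **criteria):
    """Yield the extracted fields of events whose fields match all criteria."""
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(k) == v for k, v in criteria.items()):
            yield fields


hits = list(search(RAW_EVENTS, status="404"))
print(hits)
```

Changing `KV` changes the schema for every later search without rewriting any stored data, which is the property the slide calls having no brittle schema to work around.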

Page 26: Search Optimization: Partition Pruning

• Most data types are stored in hierarchical directories
  – Such as /<base_path>/<date>/<hour>/<hostname>/somefile.log
• You can instruct Hunk to extract fields and time ranges from a path
• Searches ignore directories that cannot possibly contain search results
  – Such as time ranges outside of a defined range

Example time-based partition pruning
Search: index=hunk earliest_time="2013-06-10T01:00:00" latest_time="2013-06-10T02:00:00"
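The pruning idea can be sketched in a few lines: derive the time range a directory covers from its path, then skip directories whose range cannot overlap the search window. The layout below follows the slide's /<base_path>/<date>/<hour>/<hostname> pattern, but the concrete date format, the one-hour partition size, the exclusive upper bound and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def partition_range(path):
    """Return the [start, end) hour covered by /<base>/<date>/<hour>/<hostname>."""
    parts = path.strip("/").split("/")
    start = datetime.strptime(f"{parts[1]} {parts[2]}", "%Y-%m-%d %H")
    return start, start + timedelta(hours=1)


def prune(paths, earliest, latest):
    """Keep only directories whose hour overlaps [earliest, latest)."""
    kept = []
    for path in paths:
        start, end = partition_range(path)
        if start < latest and end > earliest:
            kept.append(path)
    return kept


# Hypothetical directories, one per hour.
paths = [
    "/logs/2013-06-10/00/web01",
    "/logs/2013-06-10/01/web01",
    "/logs/2013-06-10/02/web01",
]
# Same window as the example search on the slide.
earliest = datetime(2013, 6, 10, 1, 0, 0)
latest = datetime(2013, 6, 10, 2, 0, 0)
kept = prune(paths, earliest, latest)
print(kept)
```

Only the 01 directory survives, so two thirds of the input never has to be read by a MapReduce task.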

Page 27: Search Performance with MapReduce

MapReduce considerations

• Stats/chart/timechart/top/etc. commands work well in a distributed environment
  – They MapReduce well
• Time and order commands don't work well in a distributed environment
  – They don't MapReduce well

Summary indexing

• Useful for speeding up searches
• Summaries could have different retention policy
• In most cases resides on the search head
• Backfill is a manual (scripted) process
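One way to see why stats-style commands distribute well is that each node can compute a partial result over its own split and the partials merge in any order. The sketch below models a count-by-status aggregation with plain Python; the event dictionaries and function names are hypothetical and no Hadoop or Splunk API is involved.

```python
from collections import Counter
from functools import reduce


def map_count(split):
    """Map phase: count statuses within one input split."""
    return Counter(event["status"] for event in split)


def reduce_counts(partials):
    """Reduce phase: merge partial counts; the merge order does not matter."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Two hypothetical input splits, as if processed on different nodes.
splits = [
    [{"status": "200"}, {"status": "404"}],
    [{"status": "200"}, {"status": "200"}],
]
totals = reduce_counts(map_count(s) for s in splits)
print(totals["200"], totals["404"])
```

A command that depends on the global order of events has no equally cheap merge step, since a node cannot decide its output from its own split alone. That is a plausible reading of why the slide says such commands do not MapReduce well.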

Page 28: Mixed-mode Search

Streaming
• Transfers first several blocks from HDFS to the Hunk search head for immediate processing

Reporting
• Pushes computation to the DataNodes and TaskTrackers for the complete search

• Hunk starts the streaming and reporting modes concurrently
• Streaming results show until the reporting results come in
• Allows users to search interactively by pausing and refining queries
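The interplay of the two modes can be modeled as a background job plus a foreground preview. This is a simplified sketch, not Hunk's implementation: the block lists, the `first_n` cutoff and the `threading.Event` that stands in for MapReduce job latency are all assumptions made so the example runs deterministically.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def reporting_search(blocks, release):
    """Full search over every block; `release` stands in for job latency."""
    release.wait()
    return sum(len(block) for block in blocks)


def streaming_preview(blocks, first_n=2):
    """Process only the first few blocks locally for an immediate preview."""
    for block in blocks[:first_n]:
        yield len(block)


# Hypothetical HDFS blocks, each holding a few events.
blocks = [["e1", "e2"], ["e3"], ["e4", "e5", "e6"]]
release = threading.Event()
shown = []

with ThreadPoolExecutor(max_workers=1) as pool:
    # Both modes start together: the reporting job runs in the background...
    final = pool.submit(reporting_search, blocks, release)
    # ...while streaming results are shown as they arrive.
    running = 0
    for count in streaming_preview(blocks):
        running += count
        shown.append(("preview", running))
    # When the reporting results come in, they replace the preview.
    release.set()
    shown.append(("final", final.result()))

print(shown)
```

The preview counts are partial by construction, which matches the slide's point that streaming results are only shown until the reporting results arrive.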

Page 29: Flexible, Iterative Workflow for Business Users

[Diagram: Interactive Analytics cycle: Explore, Analyze, Model, Pivot, Visualize, Share]

• Preview results
• Normalization as it's needed
• Faster implementation and flexibility
• Easy search language + data models & pivot
• Multiple views into the same data

Page 30: Demo

Page 31: Next Steps

1. Download the .conf2013 Mobile App. If not iPhone, iPad or Android, use the Web App
2. Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags!
3. Go to "Technical Deep Dive: Hadoop Operations Management", Brera 6, Level 3, Today, 11:30-12:30pm

Page 32: Thank You