Hunk & Elastic MapReduce: Big Data Analytics on AWS


Upload: dangtram

Post on 13-Feb-2017


Page 1: Hunk & Elastic MapReduce: Big Data Analytics on AWS

Copyright © 2014 Splunk Inc.

Hunk & Elastic MapReduce: Big Data Analytics on AWS

Dritan Bitincka, BD Solutions Architecture

Page 2

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Page 3

About Me

• Member of BD Solution Architecture team
• Large-scale deployments
• Cloud and Big Data
• Fourth .conf

Page 4

Agenda

• Hunk
• Amazon EMR
• Understanding how Hunk and EMR can work together
• Demo
  – Analyzing HDFS/S3 data with Hunk on EMR

Page 5

Introduction to Hunk

Page 6

Splunk as a single pane of glass for your machine data

Page 7

[Diagram: RDBMS, Splunk>, NoSQL]

Page 8

[Diagram: RDBMS, Splunk>, NoSQL]

Page 9

Hunk for Hadoop and NoSQL Data Stores

[Diagram: Explore, Analyze, Visualize; RDBMS, Splunk>, NoSQL]


Page 11

Hadoop Components

• HDFS (STORAGE)
  – NameNode
  – DataNode
  – Distributed, replicated, massively scalable file system
• MapReduce (COMPUTE)
  – JobTracker
  – TaskTracker
  – Programming paradigm; two-phase processing of large datasets
    – We also use it, though a simplified version of it
  – Scalable, fault tolerant, etc.
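The two-phase paradigm can be illustrated with a single-process word count: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase folds each group into one result. This is only a sketch of the programming model in plain Python; it is not Hadoop's Java API and nothing here runs in parallel or on a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold the grouped values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(map_reduce(["ERROR disk full", "WARN disk slow", "ERROR timeout"]))
# {'error': 2, 'disk': 2, 'full': 1, 'warn': 1, 'slow': 1, 'timeout': 1}
```

On a real cluster the map calls run on the nodes holding each HDFS block and the shuffle moves data between nodes; the split into three functions mirrors that division of work.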

Page 14

Splunk and Hadoop Data

Splunk Hadoop Connect

• Export: write data out to Hadoop, search-based (push)
• Explore: read data from Hadoop (pull) and analyze on the search head (SH)

STORAGE ✓  COMPUTE ✗

Page 15

Splunk and Hadoop Data – Today

Explore, Analyze, Visualize, Dashboards, Share

STORAGE ✓  COMPUTE ✓

Page 16

Splunk Stack

• Explore, Analyze, Visualize, Dashboards, Share
• REST API, Command Line, ODBC
• splunkweb
  – Web and Application server
  – Python, AJAX, CSS, XSLT, XML
• splunkd
  – Search Head
  – Virtual Indexes
  – C++, Web Services
• 64-bit Linux OS

Page 17

Hunk Stack

• Explore, Analyze, Visualize, Dashboards, Share
• REST API, Command Line, ODBC
• splunkweb
  – Web and Application server
  – Python, AJAX, CSS, XSLT, XML
• splunkd
  – Search Head
  – Virtual Indexes
  – C++, Web Services
• Hadoop Interface
  – Hadoop Client Libraries
  – Java
• 64-bit Linux OS

Page 18

Scaling with Hadoop

Connect Hunk to multiple Hadoop clusters

[Diagram: Hunk stack (splunkweb, splunkd, Hadoop Interface on 64-bit Linux OS) connected to Hadoop Cluster 1, Hadoop Cluster 2, and Hadoop Cluster 3]

Page 19

What Makes It Stick?

[Diagram: Hunk; ERP Provider Family (Hadoop); providers ERP1 (prod) and ERP2 (test); virtual indexes VIX-1, VIX-2, VIX-3, VIX-4]

To access and process data in external data stores (HDFS is supported out of the box), Hunk uses External Resource Providers (ERPs), which implement the store-specific file system access and computational semantics.

A provider family is a logical grouping of data store frameworks that access the same "kind" of external system and share a global set of configurations.

A provider is a specific implementation of the Hunk ERP helper process within a provider family; each provider carries a cluster-specific configuration.

After you set up a provider, you configure virtual indexes (VIX) by giving Hunk information about the data location. Hunk then uses that information and its underlying implementation to distribute searches.
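To make the three levels concrete, here is a minimal Python sketch assuming a simple override order: family-wide settings first, then the provider's cluster-specific values, then the VIX's data location. The class names, setting keys, and hostnames are hypothetical illustrations of the concepts above, not Hunk's internal data model or configuration keys.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderFamily:
    """Settings shared by every provider that talks to the same kind of store."""
    name: str
    settings: dict = field(default_factory=dict)


@dataclass
class Provider:
    """One cluster (e.g. prod or test) inside a family, with its own overrides."""
    name: str
    family: ProviderFamily
    settings: dict = field(default_factory=dict)


@dataclass
class VirtualIndex:
    """A VIX records where the data lives and which provider serves it."""
    name: str
    provider: Provider
    input_path: str

    def effective_settings(self) -> dict:
        # Family defaults first, then cluster-specific values override them.
        merged = dict(self.provider.family.settings)
        merged.update(self.provider.settings)
        merged["input.path"] = self.input_path
        return merged


# Hypothetical layout mirroring the diagram: one family, a prod and a test cluster.
hadoop = ProviderFamily("hadoop", {"java.home": "/usr/lib/jvm/java", "mode": "mapreduce"})
erp1 = Provider("ERP1", hadoop, {"namenode": "hdfs://prod-master:8020"})
erp2 = Provider("ERP2", hadoop, {"namenode": "hdfs://test-master:8020"})
vix1 = VirtualIndex("VIX-1", erp1, "/logs/web/")
vix3 = VirtualIndex("VIX-3", erp2, "/logs/web/")
```

Both virtual indexes point at the same path but resolve to different clusters, while the family-level `java.home` is defined once and shared.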

Page 20

Explore, Analyze, Visualize Data in Hadoop

• No fixed schema to search unstructured data
• Preview results while MapReduce jobs start
• Easier app development than in raw Hadoop
• Unlock the business value of data in Hadoop
• Fast to learn, instead of relying on scarce skills
• Integrated – explore, analyze and visualize
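"No fixed schema" means structure is applied when a search runs rather than when data is loaded. A rough Python illustration of that idea, assuming fields are pulled out of raw text with a regular expression at query time; this is not Hunk's extraction engine, and the pattern, field names, and sample events are hypothetical.

```python
import re


def search(raw_events, pattern, **where):
    """Apply `pattern` at search time and keep events whose fields match `where`."""
    extractor = re.compile(pattern)
    results = []
    for event in raw_events:
        match = extractor.search(event)
        if match is None:
            continue  # event lacks these fields; it stays stored, just not returned
        fields = match.groupdict()
        if all(fields.get(name) == value for name, value in where.items()):
            results.append(fields)
    return results


raw = [
    "user=alice status=200",
    "user=bob status=500",
    "free-form line with no fields at all",
]
print(search(raw, r"user=(?P<user>\w+) status=(?P<status>\d+)", status="500"))
# [{'user': 'bob', 'status': '500'}]
```

Because the raw lines are never rewritten, a different pattern can extract different fields from the same data in the next search.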

Page 21

Integrated Analytics Platform for Hadoop Data

• Full-featured, Integrated Product
• Insights for Everyone
• Works with What You Have Today

[Diagram: Explore, Analyze, Visualize, Dashboards, Share over Hadoop (MapReduce & HDFS)]

Page 22

Introduction to EMR

Page 23

Amazon EMR

• Amazon EMR is a Hadoop framework in the cloud, offered as a managed service
• Used in a “variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics”

Page 24

Provisioning Hadoop on AWS

1. Log in to the AWS Console
2. Fill in a form
3. Click “Create Cluster”
4. Wait a few minutes for a fully operational Hadoop cluster
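The console form can also be submitted programmatically. A minimal sketch, assuming boto3's EMR client and its `run_job_flow` call; the cluster name, key pair, instance type, and release label are placeholders to replace with values valid for your account and region, and the default role names assume the roles created by `aws emr create-default-roles`.

```python
def build_emr_request(name, key_name, core_nodes=2,
                      instance_type="m5.xlarge", release_label="emr-5.36.0",
                      log_uri=None):
    """Assemble keyword arguments for EMR RunJobFlow (the console form's fields)."""
    request = {
        "Name": name,
        "ReleaseLabel": release_label,  # example only; pick a release your region offers
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": 1 + core_nodes,  # one master plus the core nodes
            "Ec2KeyName": key_name,
            "KeepJobFlowAliveWhenNoSteps": True,  # stay up so Hunk can connect
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }
    if log_uri is not None:
        request["LogUri"] = log_uri
    return request


def launch_cluster(request):
    """Submit the request; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # third-party; imported lazily so the builder works without it

    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]
```

Calling `launch_cluster(build_emr_request("hunk-demo", "my-keypair"))` would return the new cluster's ID; step 4 (waiting until the cluster is operational) still applies before pointing Hunk at the master node.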

Page 25

Why is EMR Compelling?

• No Hadoop/HDFS management
• Native support for AWS S3
  – Vast amounts of data in S3
• Cluster elasticity
• Spot vs. Reserved Instances
  – Long-running vs. transient
• Pay for what you use
• Thousands of customers

[Diagram: Master node, HDFS, S3]
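"Pay for what you use" and the long-running vs. transient distinction come down to simple arithmetic: instances × hours × hourly rate. A sketch under stated assumptions: the node count, busy hours, and the 0.50 per node-hour rate are hypothetical, and the single `rate_per_node_hour` parameter stands in for whatever Spot, Reserved, or On-Demand price actually applies (real pricing also adds EMR and storage charges not modeled here).

```python
def cluster_cost(nodes, hours, rate_per_node_hour):
    """Pay-for-what-you-use cost: instances x hours x hourly rate."""
    return nodes * hours * rate_per_node_hour


def long_running_vs_transient(nodes, busy_hours_per_day, days, rate_per_node_hour):
    """Compare a cluster kept up around the clock with one started only for the work."""
    always_on = cluster_cost(nodes, 24 * days, rate_per_node_hour)
    transient = cluster_cost(nodes, busy_hours_per_day * days, rate_per_node_hour)
    return always_on, transient


# Hypothetical numbers: 10 nodes, 2 busy hours a day, 30 days, 0.50 per node-hour.
print(long_running_vs_transient(10, 2, 30, 0.50))  # (3600.0, 300.0)
```

The gap shrinks as the busy hours approach 24 per day, which is roughly when a long-running cluster on discounted capacity becomes the better fit.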

Page 26

Integrating Hunk with EMR

• EMR: managed Hadoop framework in the cloud with access to vast amounts of data in HDFS and S3
• Hunk: explore, analyze and visualize data from a central place
• EMR + Hunk: full analytics solution for Big Data in the cloud

Page 27

Hunk on EMR: Option 1

• Classic Hunk + Hadoop
  – Provision an EMR cluster
  – Provision a Hunk EC2 instance using the AWS Marketplace Hunk AMI
  – Bring Your Own License (BYOL)
  – Configure Hunk with the EMR cluster
    – Edit Security Groups to allow access
    – Master IP addresses & ports
    – Create provider
    – Create Virtual Index
    – Search
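The "Create provider" and "Create Virtual Index" steps end up as stanzas in Hunk's `indexes.conf`. The sketch below renders one provider and one virtual index with Python's `configparser`. The `vix.*` setting names follow the Hadoop provider family convention as I recall it from the Hunk documentation and should be verified for your Hunk version; the provider name, master IP, ports, and paths are hypothetical placeholders. It also assumes an MR1-style job tracker setting; YARN-based EMR clusters need ResourceManager settings instead.

```python
import configparser
import io


def build_indexes_conf(provider, master_host, vix_name, input_path,
                       java_home="/usr/lib/jvm/java",
                       hadoop_home="/usr/lib/hadoop",
                       hdfs_port=8020, jobtracker_port=8021):
    """Render a provider stanza and one virtual index stanza in indexes.conf style."""
    conf = configparser.ConfigParser()
    conf.optionxform = str  # keep key case, e.g. vix.env.JAVA_HOME
    conf[f"provider:{provider}"] = {
        "vix.family": "hadoop",
        "vix.env.JAVA_HOME": java_home,
        "vix.env.HADOOP_HOME": hadoop_home,
        # Master IP address and ports; security groups must let Hunk reach them.
        "vix.fs.default.name": f"hdfs://{master_host}:{hdfs_port}",
        "vix.mapred.job.tracker": f"{master_host}:{jobtracker_port}",
        "vix.splunk.home.hdfs": "/user/splunk/hunk-workdir",
    }
    conf[vix_name] = {
        "vix.provider": provider,
        "vix.input.1.path": input_path,
    }
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()


print(build_indexes_conf("emr-cluster", "10.0.0.10", "elb_logs", "/data/elb/..."))
```

Generating the file keeps the master IP and ports in one place, which helps because a transient EMR cluster gets a new master address each time it is provisioned.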

Page 28

Hunk on EMR: Option 2

• Placeholder

Page 29

Demo

• Analyze ELB or S3 Access Logs
• Analyze CloudTrail Access Logs
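CloudTrail log files are JSON documents with a top-level `Records` array, where each record names the service (`eventSource`) and API call (`eventName`). In the demo this aggregation would be a Hunk search over a virtual index; the Python below only sketches the same counting on one log file, using made-up sample records. Files delivered to S3 are gzip-compressed, so real input would need decompressing first.

```python
import json
from collections import Counter


def top_cloudtrail_calls(log_text, n=5):
    """Count API calls in one CloudTrail log file by (eventSource, eventName)."""
    records = json.loads(log_text).get("Records", [])
    calls = Counter((r.get("eventSource"), r.get("eventName")) for r in records)
    return calls.most_common(n)


sample = json.dumps({"Records": [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "iam.amazonaws.com", "eventName": "CreateUser"},
]})
print(top_cloudtrail_calls(sample, n=1))  # [(('s3.amazonaws.com', 'GetObject'), 2)]
```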

Page 30

QUESTIONS?

You may also like:
• Hunk 6.1 Technical Deep Dive
• Hunk Report Acceleration Deep Dive
• Comprehensive Security Analytics for Modern Threats with Hunk

Page 31

THANK YOU

Feedback: [email protected]