Copyright © 2014 Splunk Inc.
Hunk & Elastic MapReduce: Big Data Analytics on AWS
Dritan Bitincka, BD Solutions Architecture
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
About Me
• Member of BD Solution Architecture team
• Large scale deployments
• Cloud and Big Data
• Fourth .conf
Agenda
• Hunk
• Amazon EMR
• Understanding how Hunk and EMR can work together
• Demo
  – Analyzing HDFS/S3 data with Hunk on EMR
Introduction to Hunk
Splunk as a single pane of glass for your machine data
[Figure: RDBMS | Splunk> | NoSQL]
Hunk for Hadoop and NoSQL Data Stores
[Figure: Explore, Analyze, Visualize across RDBMS, Splunk> and NoSQL]
Hadoop Components
• HDFS (STORAGE)
  – NameNode
  – DataNode
  – Distributed, replicated, massively scalable file system
• MapReduce (COMPUTE)
  – JobTracker
  – TaskTracker
  – Programming paradigm: two-phase processing of large datasets
    – We also use it, though a simplified version of it
  – Scalable, fault tolerant, etc.
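The two-phase model can be illustrated with a minimal in-memory sketch in Python. It is not Hadoop's Java API: `map_phase`, `shuffle` and `reduce_phase` are hypothetical names that mimic what the framework does across many TaskTrackers (map each record to key/value pairs, group by key, reduce each group), here counting tokens in a few log lines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (key, value) pair for every token in every input line."""
    for line in lines:
        for token in line.split():
            yield token.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    log = ["GET /index.html 200", "GET /login 401", "POST /login 200"]
    print(word_count(log))
```

In real Hadoop the map and reduce functions run in parallel on many nodes and the shuffle moves data over the network; only the shape of the computation is the same here.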
Splunk and Hadoop Data
Splunk Hadoop Connect
• Export: write data out to Hadoop, search based (PUSH)
• Explore: read data from Hadoop and analyze on the search head (PULL)
• Hadoop used for: STORAGE ✓  COMPUTE ✗
Splunk and Hadoop Data – Today
• Explore, Analyze, Visualize, Dashboards, Share
• Hadoop used for: STORAGE ✓  COMPUTE ✓
Splunk Stack
• 64-bit Linux OS
• splunkd: Search Head, Virtual Indexes; C++, Web Services
• splunkweb: Web and Application server; Python, AJAX, CSS, XSLT, XML
• Interfaces: REST API, Command Line, ODBC
• Explore, Analyze, Visualize, Dashboards, Share
Hunk Stack
• The Splunk Stack layers (64-bit Linux OS, splunkd, splunkweb, REST API, Command Line, ODBC), plus:
• Hadoop Interface: Hadoop Client Libraries; Java
Scaling with Hadoop
Connect Hunk to multiple Hadoop clusters
[Figure: Hunk connected to Hadoop Cluster 1, Hadoop Cluster 2 and Hadoop Cluster 3]
What Makes It Stick?
[Figure: Hadoop ERP provider family containing providers ERP1 (prod) and ERP2 (test), with virtual indexes VIX-1, VIX-2, VIX-3 and VIX-4]
To access and process data in external data stores, Hunk uses External Resource Providers (ERPs), which implement the store-specific file system access and computational semantics. HDFS is supported out of the box.
A provider family is a logical grouping of data store frameworks that access the same kind of external system and share a global set of configurations.
A provider is a specific Hunk ERP helper process implementation within a provider family; each provider shares a set of cluster-specific configurations.
After you set up a provider, you configure virtual indexes (VIXs) by giving Hunk information about the data location. Hunk then uses that information and the provider's underlying implementation to distribute searches.
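The provider and virtual-index split maps onto stanzas in Hunk's `indexes.conf`. The Python sketch below renders such stanzas with `configparser`, modeled on the slide's ERP1 (prod) and ERP2 (test) providers. The setting names (`vix.family`, `vix.env.JAVA_HOME`, `vix.env.HADOOP_HOME`, `vix.fs.default.name`, `vix.mapred.job.tracker`, `vix.splunk.home.hdfs`, `vix.provider`, `vix.input.1.path`) are assumed to follow Hunk's conventions and should be checked against the documentation for your Hunk version; every hostname, port and path is a hypothetical placeholder.

```python
import configparser
import io
from typing import Dict


def build_indexes_conf(providers: Dict[str, Dict[str, str]],
                       vixes: Dict[str, Dict[str, str]]) -> str:
    """Render provider and virtual-index stanzas in indexes.conf style."""
    conf = configparser.ConfigParser(interpolation=None)
    conf.optionxform = str  # keep key case, e.g. vix.env.JAVA_HOME
    for name, settings in providers.items():
        conf[f"provider:{name}"] = settings
    for name, settings in vixes.items():
        conf[name] = settings
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


# Settings shared by the whole Hadoop provider family (hypothetical paths).
FAMILY_DEFAULTS = {
    "vix.family": "hadoop",
    "vix.env.JAVA_HOME": "/usr/lib/jvm/java",
    "vix.env.HADOOP_HOME": "/opt/hadoop",
}

# One provider per cluster; hosts and ports are placeholders.
providers = {
    "erp1_prod": {**FAMILY_DEFAULTS,
                  "vix.fs.default.name": "hdfs://prod-master:9000",
                  "vix.mapred.job.tracker": "prod-master:9001",
                  "vix.splunk.home.hdfs": "/user/hunk/workdir"},
    "erp2_test": {**FAMILY_DEFAULTS,
                  "vix.fs.default.name": "hdfs://test-master:9000",
                  "vix.mapred.job.tracker": "test-master:9001",
                  "vix.splunk.home.hdfs": "/user/hunk/workdir"},
}

# Each virtual index points one provider at a data location.
vixes = {
    "vix_1": {"vix.provider": "erp1_prod", "vix.input.1.path": "/data/weblogs"},
    "vix_2": {"vix.provider": "erp2_test", "vix.input.1.path": "/data/applogs"},
}

if __name__ == "__main__":
    print(build_indexes_conf(providers, vixes))
```

Copying the family-wide values into each provider stanza is a simplification for the sketch; it keeps the cluster-specific settings (file system URI, job tracker) visibly separate from the shared ones.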
Hunk
Explore, Analyze, Visualize Data in Hadoop
• No fixed schema to search unstructured data
• Preview results while MapReduce jobs start
• Easier app development than in raw Hadoop
• Unlock business value of data in Hadoop
• Fast to learn, instead of relying on scarce skills
• Integrated: explore, analyze and visualize
Integrated Analytics Platform for Hadoop Data
• Full-featured, Integrated Product
• Insights for Everyone
• Works with What You Have Today
[Figure: Explore, Analyze, Visualize, Dashboards, Share on top of Hadoop (MapReduce & HDFS)]
Introduction to EMR
Amazon EMR
• Amazon EMR is the Hadoop framework in the cloud, offered as a managed service
• Used in a “variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics”
Amazon EMR: Provisioning Hadoop on AWS
1. Log in to the AWS Console
2. Fill in a form
3. Click “Create Cluster”
4. Wait a few minutes for a fully operational Hadoop cluster
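The console form in steps 2 and 3 collects roughly the same inputs as the EMR `RunJobFlow` API. As a hedged sketch, the Python function below builds keyword arguments for boto3's `run_job_flow`; the cluster name, key pair, instance type, instance count and release label are placeholders, and the role names assume the standard EMR default roles (`EMR_EC2_DefaultRole`, `EMR_DefaultRole`) already exist in the account.

```python
from typing import Any, Dict


def emr_cluster_request(name: str, release_label: str, key_name: str,
                        instance_type: str = "m5.xlarge",
                        instance_count: int = 3) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Mirrors the console form: cluster name, software release,
    instance sizing, SSH key pair and the default EMR IAM roles.
    """
    if instance_count < 1:
        raise ValueError("need at least a master node")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "Ec2KeyName": key_name,
            # Keep the cluster running so Hunk can keep submitting jobs to it.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


if __name__ == "__main__":
    # Placeholder values; check the current EMR release list for release_label.
    params = emr_cluster_request("hunk-demo", "emr-6.15.0", "my-key-pair")
    print(params["Instances"]["InstanceCount"])
    # With AWS credentials configured, the request could be sent with:
    #   import boto3
    #   cluster_id = boto3.client("emr").run_job_flow(**params)["JobFlowId"]
```

Building the request as plain data keeps it easy to review or store alongside the Hunk setup before anything is actually provisioned.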
Why is EMR Compelling?
• No Hadoop/HDFS management
• Native support for AWS S3
  – Vast amounts of data in S3
• Cluster elasticity
• Spot vs. Reserved Instances
  – Transient vs. long-running clusters
• Pay for what you use
• Thousands of customers
[Figure: EMR cluster with a Master node, HDFS and S3]
Integrating Hunk with EMR
• EMR: managed Hadoop framework on the cloud with access to vast amounts of data in HDFS and S3
• Hunk: explore, analyze and visualize data from a central place
• EMR + Hunk: full analytics solution for Big Data on the cloud
Hunk on EMR: Option 1
• Classic Hunk + Hadoop
  – Provision an EMR cluster
  – Provision a Hunk EC2 instance using the AWS Marketplace Hunk AMI
  – Bring Your Own License (BYOL)
  – Configure Hunk with the EMR cluster
    – Edit Security Groups to allow access
    – Master IP addresses & ports
    – Create provider
    – Create Virtual Index
    – Search
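After editing the security groups, it helps to confirm that the Hunk instance can actually reach the master node's Hadoop ports before creating the provider. The sketch below is a plain TCP connect check in Python using only the standard library; the host name and port numbers in the usage note are hypothetical and depend on your EMR release and Hadoop version.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int],
                timeout: float = 3.0) -> Dict[int, bool]:
    """Return {port: reachable} by attempting a TCP connect to each port."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection resolves the host and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and DNS failures.
            results[port] = False
    return results
```

For example, `check_ports("ip-10-0-0-12.ec2.internal", [9000, 9001])` (hypothetical master address and ports) reports `False` for any port the security group still blocks. A successful connect only shows the port is open; it does not validate the Hadoop configuration itself.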
Hunk on EMR: Option 2
• Placeholder
Demo
• Analyze ELB or S3 Access Logs
• Analyze CloudTrail Access Logs
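In Hunk, the fields of an ELB access log would be extracted at search time and summarized with a search such as `stats count by elb_status_code`. To show what that extraction involves, here is a minimal Python sketch, assuming the Classic Load Balancer access log layout (space-separated fields followed by a double-quoted request). It covers ELB logs only, not S3 or CloudTrail formats; the field names follow the AWS documentation's naming, and the sample entry in the code is made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Leading fields of a Classic Load Balancer access log entry, in order.
# Newer log versions append fields (user agent, SSL cipher/protocol)
# after the quoted request; this sketch ignores them.
ELB_FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request",
]

# Eleven space-separated tokens followed by the double-quoted request.
ELB_PATTERN = re.compile(r"\s+".join([r"(\S+)"] * 11) + r'\s+"([^"]*)"')


def parse_elb_line(line: str) -> Dict[str, str]:
    """Extract the named leading fields from one access log line."""
    match = ELB_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("line does not look like an ELB access log entry")
    return dict(zip(ELB_FIELDS, match.groups()))


def count_by_status(lines: Iterable[str]) -> Counter:
    """Count entries per ELB status code, skipping blank lines."""
    return Counter(
        parse_elb_line(line)["elb_status_code"] for line in lines if line.strip()
    )


if __name__ == "__main__":
    # Made-up sample entry for illustration only.
    sample = ('2014-10-07T18:00:00.000000Z my-elb 203.0.113.10:52314 '
              '10.0.1.20:80 0.000041 0.002117 0.000035 200 200 0 1024 '
              '"GET http://www.example.com:80/index.html HTTP/1.1"')
    print(parse_elb_line(sample)["request"])
    print(count_by_status([sample]))
```

Matching on a regular expression rather than splitting on spaces keeps the quoted request (which itself contains spaces) in a single field.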
QUESTIONS?
You may also like:
• Hunk 6.1 Technical Deep Dive
• Hunk Report Acceleration Deep Dive
• Comprehensive Security Analytics for Modern Threats with Hunk
THANK YOU feedback: [email protected]