A New Product: Hunk – Splunk Analytics for Hadoop · 2017-10-13 · New product from Splunk...
TRANSCRIPT
Copyright © 2013 Splunk Inc.
A New Product: Hunk – Splunk Analytics for Hadoop (BETA)
Clint Sharp, Director of Product Management
#splunkconf
Legal Notices
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners.
©2013 Splunk Inc. All rights reserved.
2
New product from Splunk delivers interactive data exploration, analysis and visualizations for Hadoop
Announcing Hunk Beta: Splunk Analytics for Hadoop
3
A Lot of Organizational Data Ends Up in Hadoop
• 20X services relative to software
• Inadequate skills for big data analytics
• 13+ Hadoop-related projects requiring integration
• Data is “too big to move”
Hadoop (MapReduce & HDFS)
13+ Hadoop-related projects: YARN, Ambari, Avro, Cassandra, Chukwa, Hive, HBase, Mahout, Pig, ZooKeeper
Challenges Deploying and Leveraging Hadoop
4
Splunk Hadoop Connect
Reliable bi-directional integration to Hadoop
>1000 downloads
October 2012: Splunk Hadoop Connect To Address Common Challenges Deploying and Running Hadoop
HA indexes and storage
Commodity servers
Hadoop (MapReduce & HDFS)
Import Browse Export
Report and analyze
Custom dashboards
Monitor and alert
Ad hoc search
5
What About Extracting Value Directly from Hadoop?
“How can we leverage the full capabilities of Splunk natively on data in Hadoop?”
Data in Hadoop is too big to move
HA indexes and storage
Commodity servers
Hadoop (MapReduce & HDFS)
Report and analyze
Custom dashboards
Monitor and alert
Ad hoc search
6
Hunk: Splunk Analytics for Hadoop
Hadoop (MapReduce & HDFS)
Full-featured, integrated product: Delivers interactive data exploration, analysis and visualization for Hadoop
Insights for everyone: Empowers broader user groups to derive actionable insights from raw data in Hadoop
Distribution agnostic: Works with leading distributions to maximize enterprise technology investments
Explore Analyze Visualize Dashboards Share
7
Derive Actionable Insights from Raw Data
Hadoop storage
1. Point Splunk at Hadoop cluster
2. Immediately start exploring, analyzing and visualizing raw data in Hadoop
Explore Analyze Visualize Dashboards Share
8
Explore, Analyze and Visualize Data, On-the-fly
Virtual index
• Enables seamless use of the entire Splunk technology stack on data wherever it rests
• Hadoop virtual index automatically handles MapReduce
• Technology is patent pending
Schema-on-the-fly
• Structure applied at search time
• No brittle schema to work around
• Automatically find transactions, patterns and trends
Flexibility and fast time to value
• Normalization as it’s needed
• Faster implementation
• Easy search language
• Multiple views into the same data
9
Interactive Data Exploration
Search interface
Search assistant
Interactive results window
• Powerful search processing language (SPL)
• Designed for data exploration across large datasets – preview data, iterate quickly
• No requirement to “understand” data upfront
Search and explore data from one place
10
Interactive Data Analysis
Rapidly analyze and interact with data
• Interactive analytics interface
• Deep analysis, pattern detection and finding anomalies with over 100 statistical commands
• Enrich results with information from external relational databases
Interactive analytics interface
Formatting options
11
Powerful Platform for Enterprise Developers
JavaScript
Java
Python
PHP
C#
Ruby
API
Add new UI components
Integrate into existing systems
With known languages and frameworks
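As a rough illustration of integrating through the API, the sketch below builds (but does not send) an HTTP request that would create a search job. It assumes the standard Splunk REST endpoint `/services/search/jobs` on the default management port 8089 is what the slide's "API" refers to. The host name, the `status` field and the helper `build_search_job_request` are hypothetical, and authentication is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_job_request(base_url, query, earliest=None, latest=None):
    """Build (without sending) a POST request that would create a search job."""
    # The REST search parameter is a full search string, so it starts with "search".
    params = {"search": f"search {query}", "output_mode": "json"}
    if earliest:
        params["earliest_time"] = earliest
    if latest:
        params["latest_time"] = latest
    body = urlencode(params).encode("utf-8")
    return Request(f"{base_url}/services/search/jobs", data=body, method="POST")


# Hypothetical host; apache_logs is one of the example virtual indexes in this deck.
req = build_search_job_request(
    "https://hunk.example.com:8089",
    "index=apache_logs | stats count by status",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

A real client would add credentials and then pass the request to `urllib.request.urlopen`, or use one of the language SDKs listed on the slide.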
12
Technology Overview
Hunk Server
64-bit Linux OS
splunkweb
• Web and application server
• Python, AJAX, CSS, XSLT, XML
splunkd
• Search head
• Virtual indexes
• C++, web services
Hadoop interface
• Hadoop client libraries
• Java
REST API, command line, ODBC (beta)
Explore Analyze Visualize Dashboards Share
14
Connect to HDFS and MapReduce
Connect to Apache HDFS and MapReduce or your choice of Hadoop distribution
Hadoop cluster 1
15
Hunk Scales with Your Hadoop Deployments
Connect Hunk to multiple Hadoop clusters
Hadoop cluster 3
Hadoop cluster 2
Hadoop cluster 1
16
Prerequisites
Hadoop access rights
Java 1.6+
Hadoop client libraries
HDFS scratch space
Data in Hadoop to analyze
DataNode local temp disk space
17
MapReduce as The Orchestration Framework
1. Copy splunkd binary (.tgz) to HDFS
2. Copy .tgz to each TaskTracker (TaskTracker 1, TaskTracker 2, TaskTracker 3)
3. Expand in specified location on each TaskTracker
4. Receive binary in subsequent searches
Hunk search head >
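The four steps on this slide can be imitated locally with temporary directories. The sketch below is only a simulation: the `hdfs` and `tasktracker` directories, the archive layout and the helper names are hypothetical, and step 4 is read here as "later searches reuse the binary that is already expanded", which is an assumption.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_package(dest: Path) -> Path:
    """Step 1 stand-in: write a tiny .tgz that plays the role of the splunkd package."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    payload = b"#!/bin/sh\necho splunkd stand-in\n"
    with tarfile.open(dest, "w:gz") as tar:
        info = tarfile.TarInfo("hunk/bin/splunkd")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return dest


def ensure_installed(package_in_hdfs: Path, node_dir: Path) -> bool:
    """Steps 2-3 on one node. Returns True if the package had to be copied and expanded."""
    binary = node_dir / "hunk" / "bin" / "splunkd"
    if binary.exists():
        return False  # step 4 (assumed meaning): reuse the existing binary
    node_dir.mkdir(parents=True, exist_ok=True)
    local_copy = node_dir / package_in_hdfs.name
    local_copy.write_bytes(package_in_hdfs.read_bytes())  # step 2: copy
    with tarfile.open(local_copy, "r:gz") as tar:
        tar.extractall(node_dir)  # step 3: expand in the node's directory
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    package = make_package(root / "hdfs" / "packages" / "hunk.tgz")
    nodes = [root / f"tasktracker{i}" for i in (1, 2, 3)]
    first_search = [ensure_installed(package, n) for n in nodes]
    second_search = [ensure_installed(package, n) for n in nodes]
    print(first_search, second_search)
```

The first search pays the copy-and-expand cost on every node; the second finds the binary in place and does no extra work.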
18
Hunk Usage in HDFS
hdfs://<scratch_space_path>/
bundles – Search head bundles: keeps last 5 bundles
packages – Hunk .tgz packages: no automatic cleanup
dispatch/<sid> – Search scratch space: cleanup when sid is invalid
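The cleanup rules listed for the scratch space can be expressed as two small selection functions. This is a sketch of the stated policy only, not Hunk's implementation; the function names and the `(name, mtime)` representation are hypothetical.

```python
def bundles_to_delete(bundles, keep=5):
    """Given (name, mtime) pairs, return the names of all but the newest `keep` bundles."""
    newest_first = sorted(bundles, key=lambda b: b[1], reverse=True)
    return [name for name, _ in newest_first[keep:]]


def stale_dispatch_dirs(dispatch_sids, valid_sids):
    """Return dispatch/<sid> entries whose search id is no longer valid."""
    return sorted(set(dispatch_sids) - set(valid_sids))


bundles = [(f"bundle-{i}", 1000 + i) for i in range(8)]
print(bundles_to_delete(bundles))
print(stale_dispatch_dirs(["sid1", "sid2", "sid3"], valid_sids=["sid2"]))
```

Since packages have no automatic cleanup, old .tgz files under `packages` would need to be removed by hand or by a scheduled script.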
19
Hunk Uses Virtual Indexes
• Enables seamless use of almost the entire Splunk stack on data in Hadoop
• Automatically handles MapReduce
• Technology is patent pending
20
Hunk search head >
Examples of Virtual Indexes
External system 1
External system 2
External system 3
index = syslog (/home/syslog/…)
index = apache_logs
index = sensor_data
index = twitter
21
Define Virtual Indexes and Paths
Virtual index (e.g. Twitter)
Virtual index (e.g. sensor data)
Virtual index (e.g. Apache logs)
External resource (e.g. hadoop.prod)
Specify virtual index and data paths, and optionally:
• Filter files or directories using a whitelist or blacklist
• Extract metadata or time range from paths
• Use props/transforms.conf to specify search time processing
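Whitelist and blacklist filtering of paths can be pictured as two regular-expression checks. The sketch below is illustrative only: the paths, the patterns and the rule that a blacklist match wins over a whitelist match are assumptions, not documented Hunk behavior.

```python
import re


def select_paths(paths, whitelist=None, blacklist=None):
    """Keep paths that match the whitelist (if given) and do not match the blacklist."""
    allow = re.compile(whitelist) if whitelist else None
    deny = re.compile(blacklist) if blacklist else None
    selected = []
    for path in paths:
        if deny and deny.search(path):
            continue  # assumed precedence: the blacklist always wins
        if allow and not allow.search(path):
            continue
        selected.append(path)
    return selected


# Hypothetical data paths under a virtual index.
paths = [
    "/data/apache/20130610/01/web01/access.log",
    "/data/apache/20130610/01/web01/access.log.tmp",
    "/data/apache/20130610/02/web02/error.log",
    "/data/apache/_staging/part-0000.log",
]
print(select_paths(paths, whitelist=r"\.log$", blacklist=r"/_staging/"))
```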
22
Search Data in Hadoop
External resource (e.g. hadoop.prod)
JSON configs
MapReduce jobs
Tasks
/ working directory
Run a copy of splunkd to process
Hunk search head >
(Diagram step markers 1–5)
NameNode
JobTracker (MapReduce resource manager in YARN)
DataNode / TaskTracker (Node in YARN)
DataNode / TaskTracker (Node in YARN)
DataNode / TaskTracker (Node in YARN)
HDFS
23
Data Processing Pipeline
Raw data (HDFS)
Custom processing (MapReduce/Java): you can plug in data preprocessors, e.g. Apache Avro or format readers
stdin
Indexing pipeline (splunkd/C++): event breaking, timestamping
Search pipeline (splunkd/C++): event typing, lookups, tagging, search processors
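The stages on this slide can be pictured as a chain of small functions, each feeding the next. The sketch below is a toy model in Python, not the Java/C++ pipeline itself; the event format, the function names and the simple substring search are assumptions made for illustration.

```python
import re
from datetime import datetime


def preprocess(raw_bytes):
    """Custom processing: stand-in for a format reader that yields text."""
    return raw_bytes.decode("utf-8")


def break_events(text):
    """Event breaking: one event per non-empty line (an assumed rule)."""
    for line in text.splitlines():
        if line.strip():
            yield line


TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def add_timestamps(events):
    """Timestamping: parse a leading ISO-style timestamp when one is present."""
    for raw in events:
        match = TIMESTAMP.match(raw)
        when = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S") if match else None
        yield {"_time": when, "_raw": raw}


def search(events, term):
    """Search pipeline stand-in: keep events whose raw text contains the term."""
    return [event for event in events if term in event["_raw"]]


raw = b"2013-06-10T01:15:00 GET /index.html 200\n\n2013-06-10T01:16:30 GET /missing 404\n"
results = search(add_timestamps(break_events(preprocess(raw))), "404")
print(results)
```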
24
Hunk Applies Schema on The Fly
Hunk applies schema for all fields – including transactions – at search time
• Structure applied at search time
• No brittle schema to work around
• Automatically find patterns and trends
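"Structure applied at search time" means the raw events are stored untouched and fields are pulled out only when a query runs. A minimal sketch of that idea follows, assuming `key=value` style events and a hypothetical `search_at_read_time` helper; real field extraction is configured in Splunk rather than hard-coded like this.

```python
import re

# Raw events are kept as plain text; no schema is declared up front.
RAW_EVENTS = [
    "2013-06-10T01:15:00 status=200 bytes=512 uri=/index.html",
    "2013-06-10T01:16:30 status=404 bytes=0 uri=/missing",
]


def extract_fields(raw, pattern=r"(\w+)=(\S+)"):
    """Apply the 'schema' (here a key=value regex) at read time."""
    return dict(re.findall(pattern, raw))


def search_at_read_time(events, **criteria):
    """Extract fields per event during the search and filter on them."""
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields


print(list(search_at_read_time(RAW_EVENTS, status="404")))
```

Changing the extraction pattern changes the fields seen by the next search without rewriting any stored data, which is the sense in which there is no brittle schema to work around.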
25
Search Optimization: Partition Pruning
• Most data types are stored in hierarchical directories
  – Such as /<base_path>/<date>/<hour>/<hostname>/somefile.log
• You can instruct Hunk to extract fields and time ranges from a path
• Searches ignore directories that cannot possibly contain search results
  – Such as time ranges outside of a defined range
Example time-based partition pruning
Search: index=hunk earliest_time="2013-06-10T01:00:00" latest_time="2013-06-10T02:00:00"
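Time-based partition pruning for the example search can be sketched as follows. The base path `/data/logs`, the `YYYYMMDD` and `HH` directory formats and the choice to treat the latest bound as exclusive are assumptions; the slide only gives the general `/<base_path>/<date>/<hour>/<hostname>/` layout.

```python
from datetime import datetime, timedelta


def partition_window(path, base="/data/logs"):
    """Parse /<base_path>/<date>/<hour>/... into the [start, end) hour it covers."""
    date_part, hour_part = path[len(base):].strip("/").split("/")[:2]
    start = datetime.strptime(f"{date_part} {hour_part}", "%Y%m%d %H")
    return start, start + timedelta(hours=1)


def prune(paths, earliest, latest, base="/data/logs"):
    """Keep only paths whose hour overlaps the search range [earliest, latest)."""
    kept = []
    for path in paths:
        start, end = partition_window(path, base)
        if start < latest and end > earliest:
            kept.append(path)
    return kept


paths = [
    "/data/logs/20130610/00/web01/somefile.log",
    "/data/logs/20130610/01/web01/somefile.log",
    "/data/logs/20130610/02/web01/somefile.log",
]
earliest = datetime(2013, 6, 10, 1, 0, 0)
latest = datetime(2013, 6, 10, 2, 0, 0)
print(prune(paths, earliest, latest))
```

Only the `01` directory survives, so the other two directories would never be read for this search.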
26
Search Performance with MapReduce
MapReduce considerations
• Stats/chart/timechart/top/etc. commands work well in a distributed environment
  – They MapReduce well
• Time and order commands don’t work well in a distributed environment
  – They don’t MapReduce well
Summary indexing
• Useful for speeding up searches
• Summaries could have different retention policy
• In most cases resides on the search head
• Backfill is a manual (scripted) process
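One way to see why stats-style commands "MapReduce well" is that each node can compute a partial result on its own split, and the partial results merge into the same answer as a single pass over all the data. The sketch below shows that property for a count-by-field aggregation; the `status` field and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_count_by_status(split):
    """Map phase: count events per status within one split, independently of other splits."""
    return Counter(event["status"] for event in split)


def reduce_counts(partials):
    """Reduce phase: merge the per-split counts into one total."""
    return reduce(lambda left, right: left + right, partials, Counter())


splits = [
    [{"status": "200"}, {"status": "404"}],
    [{"status": "200"}, {"status": "200"}],
    [{"status": "500"}],
]
partials = [map_count_by_status(split) for split in splits]
total = reduce_counts(partials)
print(dict(total))
```

Commands that depend on the global time order of events have no such independent partial result, which is consistent with the slide's remark that they do not MapReduce well.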
27
Mixed-mode Search
Streaming
• Transfers first several blocks from HDFS to the Hunk search head for immediate processing
Reporting
• Pushes computation to the DataNodes and TaskTrackers for the complete search
• Hunk starts the streaming and reporting modes concurrently
• Streaming results show until the reporting results come in
• Allows users to search interactively by pausing and refining queries
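The interplay of the two modes can be mimicked with two concurrent tasks: a quick preview over the first few blocks and a complete pass over everything. This is a conceptual sketch using Python threads, not how Hunk schedules work; the block contents, `first_n=2` and the function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical HDFS blocks, each holding a few events.
BLOCKS = [["e1", "e2"], ["e3"], ["e4", "e5"], ["e6"]]


def streaming_preview(blocks, first_n=2):
    """Streaming mode: process only the first few blocks on the search head."""
    return [event for block in blocks[:first_n] for event in block]


def reporting_full(blocks):
    """Reporting mode: stand-in for the complete, distributed search."""
    return [event for block in blocks for event in block]


with ThreadPoolExecutor(max_workers=2) as pool:
    # Both modes are started at the same time.
    preview_future = pool.submit(streaming_preview, BLOCKS)
    full_future = pool.submit(reporting_full, BLOCKS)
    shown = preview_future.result()  # preview is displayed first
    print("preview:", shown)
    shown = full_future.result()  # complete results replace the preview
    print("final:", shown)
```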
28
Flexible, Iterative Workflow for Business Users
Explore
Analyze
Model
Pivot
Visualize
Share
Interactive Analytics
• Preview results
• Normalization as it’s needed
• Faster implementation and flexibility
• Easy search language + data models & pivot
• Multiple views into the same data
29
Demo
Next Steps
1. Download the .conf2013 Mobile App. If not iPhone, iPad or Android, use the Web App
2. Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags!
3. Go to “Technical Deep Dive: Hadoop Operations Management”, Brera 6, Level 3, Today, 11:30-12:30pm
31
Thank You