A New Product: Hunk – Splunk Analytics for Hadoop · 2017-10-13 · New product from Splunk...

Copyright © 2013 Splunk Inc.

A New Product: Hunk – Splunk Analytics for Hadoop (BETA)

Clint Sharp, Director of Product Management

#splunkconf

Legal Notices

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners.

©2013 Splunk Inc. All rights reserved.


Announcing Hunk Beta: Splunk Analytics for Hadoop

New product from Splunk delivers interactive data exploration, analysis and visualizations for Hadoop

Challenges Deploying and Leveraging Hadoop

A Lot of Organizational Data Ends Up in Hadoop

•  20X services relative to software
•  Inadequate skills for big data analytics
•  13+ Hadoop-related projects requiring integration
•  Data is “too big to move”

[Diagram: Hadoop (MapReduce & HDFS) surrounded by 13+ Hadoop-related projects: YARN, Ambari, Avro, Cassandra, Chukwa, Hive, HBase, Mahout, Pig, ZooKeeper]

Splunk Hadoop Connect

Reliable bi-directional integration to Hadoop

>1000 downloads

October 2012: Splunk Hadoop Connect To Address Common Challenges Deploying and Running Hadoop

HA indexes and storage

Commodity servers

Hadoop (MapReduce & HDFS)

Import Browse Export

Report and analyze

Custom dashboards

Monitor and alert

Ad hoc search


What About Extracting Value Directly from Hadoop?

“How can we leverage the full capabilities of Splunk natively on data in Hadoop?”

Data in Hadoop is too big to move

HA indexes and storage

Commodity servers


Report and analyze

Custom dashboards

Monitor and alert

Ad hoc search


Hunk: Splunk Analytics for Hadoop

•  Full-featured, integrated product: delivers interactive data exploration, analysis and visualization for Hadoop
•  Insights for everyone: empowers broader user groups to derive actionable insights from raw data in Hadoop
•  Distribution agnostic: works with leading distributions to maximize enterprise technology investments

Explore Analyze Visualize Dashboards Share


Derive Actionable Insights from Raw Data

1. Point Splunk at Hadoop cluster

2. Immediately start exploring, analyzing and visualizing raw data in Hadoop

[Diagram label: Hadoop storage]

Explore, Analyze and Visualize Data, On-the-fly

Virtual index

•  Enables seamless use of the entire Splunk technology stack on data wherever it rests
•  Hadoop virtual index automatically handles MapReduce
•  Technology is patent pending

Schema-on-the-fly

•  Structure applied at search time
•  No brittle schema to work around
•  Automatically find transactions, patterns and trends

Flexibility and fast time to value

•  Normalization as it’s needed
•  Faster implementation
•  Easy search language
•  Multiple views into the same data

Interactive Data Exploration

Search and explore data from one place

•  Powerful search processing language (SPL)
•  Designed for data exploration across large datasets – preview data, iterate quickly
•  No requirement to “understand” data upfront

[Screenshot labels: Search interface, Search assistant, Interactive results window]

Interactive Data Analysis

Rapidly analyze and interact with data

•  Interactive analytics interface
•  Deep analysis, pattern detection and finding anomalies with over 100 statistical commands
•  Enrich results with information from external relational databases

[Screenshot labels: Interactive analytics interface, Formatting options]

Powerful Platform for Enterprise Developers

JavaScript

Java

Python

PHP

C#

Ruby

API

Add new UI components

Integrate into existing systems

With known languages and frameworks

Technology Overview

Hunk Server (64-bit Linux OS)

splunkweb

•  Web and application server
•  Python, AJAX, CSS, XSLT, XML

splunkd

•  Search head
•  Virtual indexes
•  C++, web services

Hadoop interface

•  Hadoop client libraries
•  Java

Interfaces: REST API, command line, ODBC (beta)

Connect to HDFS and MapReduce

Connect to Apache HDFS and MapReduce or your choice of Hadoop distribution

[Diagram: Hunk server (64-bit Linux OS, splunkd, ODBC (beta)) connected to Hadoop cluster 1]

Hunk Scales with Your Hadoop Deployments

Connect Hunk to multiple Hadoop clusters

[Diagram: Hunk server (64-bit Linux OS, splunkd, ODBC (beta)) connected to Hadoop clusters 1, 2 and 3]

Prerequisites

Hadoop access rights

Java 1.6+

Hadoop client libraries

HDFS scratch space

Data in Hadoop to analyze

DataNode local temp disk space


MapReduce as The Orchestration Framework

1. Copy splunkd binary

2. Copy

3. Expand in specified location on each TaskTracker

4. Receive binary in subsequent searches

[Diagram: Hunk search head, HDFS holding the .tgz, and TaskTrackers 1, 2 and 3 each holding the .tgz]

Hunk Usage in HDFS

bundles – Search head bundles: keeps last 5 bundles

packages – Hunk .tgz packages: no automatic cleanup

dispatch/<sid> – Search scratch space: cleanup when sid is invalid

hdfs://<scratch_space_path>/
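
The retention rule for bundles (keep the last 5) can be illustrated with a short sketch. This is not Hunk's implementation: the `Bundle` record, its `mtime` field and the `bundles_to_delete` helper are hypothetical names, and the sketch assumes that "last 5" means the five most recently written bundles.

```python
from dataclasses import dataclass

KEEP_LAST = 5  # the slide states that the last 5 bundles are kept


@dataclass(frozen=True)
class Bundle:
    """Hypothetical record for one search head bundle in the scratch space."""
    name: str
    mtime: int  # modification time in epoch seconds (assumed ordering key)


def bundles_to_delete(bundles, keep=KEEP_LAST):
    """Return the bundles that fall outside the newest `keep` entries."""
    newest_first = sorted(bundles, key=lambda b: b.mtime, reverse=True)
    return newest_first[keep:]


if __name__ == "__main__":
    existing = [Bundle(f"bundle-{i}", mtime=1000 + i) for i in range(8)]
    for stale in bundles_to_delete(existing):
        print("would delete", stale.name)
```

By contrast, the packages directory has no automatic cleanup, so an equivalent sweep there would have to be run by an administrator.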


Hunk Uses Virtual Indexes

•  Enables seamless use of almost the entire Splunk stack on data in Hadoop
•  Automatically handles MapReduce
•  Technology is patent pending

Hunk search head >

Examples of Virtual Indexes

External system 1

External system 2

External system 3

index = syslog (/home/syslog/…)

index = apache_logs

index = sensor_data

index = twitter

Define Virtual Indexes and Paths

Virtual index (e.g. twitter)

Virtual index (e.g. sensor data)

Virtual index (e.g. Apache logs)

External resource (e.g. hadoop.prod)

Specify virtual index and data paths, and optionally:

•  Filter files or directories using a whitelist or blacklist
•  Extract metadata or time range from paths
•  Use props/transforms.conf to specify search time processing
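
The whitelist and blacklist filtering listed above can be sketched in a few lines of Python. This is a conceptual illustration only, not Hunk configuration syntax: `select_paths` is a hypothetical helper, the patterns are treated as regular expressions, and letting the blacklist win over the whitelist is an assumption.

```python
import re


def select_paths(paths, whitelist=None, blacklist=None):
    """Keep paths that match the whitelist (if any) and not the blacklist.

    Both arguments are regular expressions. Letting the blacklist win over
    the whitelist is an assumption made for this sketch.
    """
    allow = re.compile(whitelist) if whitelist else None
    deny = re.compile(blacklist) if blacklist else None
    selected = []
    for path in paths:
        if allow is not None and not allow.search(path):
            continue  # not on the whitelist
        if deny is not None and deny.search(path):
            continue  # explicitly excluded
        selected.append(path)
    return selected


if __name__ == "__main__":
    candidates = [
        "/data/apache/2013-06-10/access.log",
        "/data/apache/2013-06-10/access.log.tmp",
        "/data/apache/2013-06-10/notes.txt",
    ]
    print(select_paths(candidates, whitelist=r"\.log", blacklist=r"\.tmp$"))
```

With no patterns supplied, every path under the virtual index is kept, which matches the slide's description of filtering as optional.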


Search Data in Hadoop

External resource (e.g. hadoop.prod)

[Diagram labels: Hunk search head; JSON configs; MapReduce jobs; Tasks; working directory; Run a copy of splunkd to process; NameNode; JobTracker (MapReduce resource manager in YARN); DataNode / TaskTracker (Node in YARN); HDFS; numbered steps 1 to 5]

Data Processing Pipeline

Raw data (HDFS) → Custom processing → Indexing pipeline → Search pipeline

•  Custom processing: you can plug in data preprocessors, e.g. Apache Avro or format readers
•  Indexing pipeline: event breaking, timestamping
•  Search pipeline: event typing, lookups, tagging, search processors

Execution: MapReduce/Java → stdin → splunkd/C++

Hunk Applies Schema on The Fly

Hunk applies schema for all fields – including transactions – at search time

•  Structure applied at search time
•  No brittle schema to work around
•  Automatically find patterns and trends
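
To make "structure applied at search time" concrete, here is a minimal Python sketch. The raw events stay untouched as strings, and fields exist only while a search runs. The sample events, the `ACCESS_LOG` pattern and the `search` helper are all hypothetical; Hunk itself uses SPL and props/transforms.conf rather than this code.

```python
import re

# Raw events are stored as-is; no schema is imposed when they are written.
RAW_EVENTS = [
    '10.0.0.1 - - [10/Jun/2013:01:02:03] "GET /cart HTTP/1.1" 200',
    '10.0.0.2 - - [10/Jun/2013:01:02:04] "POST /checkout HTTP/1.1" 500',
]

# Hypothetical extraction rule, supplied at search time rather than load time.
ACCESS_LOG = re.compile(
    r'(?P<clientip>\S+) .*"(?P<method>[A-Z]+) (?P<uri>\S+) [^"]*" (?P<status>\d{3})'
)


def search(raw_events, pattern, **criteria):
    """Extract fields from each raw event on the fly and filter on them."""
    for raw in raw_events:
        match = pattern.search(raw)
        if match is None:
            continue  # this rule does not apply to the event
        fields = match.groupdict()
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields


if __name__ == "__main__":
    for hit in search(RAW_EVENTS, ACCESS_LOG, status="500"):
        print(hit)
```

Because the pattern is an argument rather than part of the stored data, a different extraction rule can be applied to the same raw events without reloading them.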


Search Optimization: Partition Pruning

•  Most data types are stored in hierarchical directories
   –  Such as /<base_path>/<date>/<hour>/<hostname>/somefile.log
•  You can instruct Hunk to extract fields and time ranges from a path
•  Searches ignore directories that cannot possibly contain search results
   –  Such as time ranges outside of a defined range

Example time-based partition pruning

Search: index=hunk earliest_time="2013-06-10T01:00:00" latest_time="2013-06-10T02:00:00"
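
Time-based partition pruning can be sketched as follows. This is an illustration of the idea, not Hunk's code: it assumes the `<date>` component is formatted as `YYYY-MM-DD` and `<hour>` as `HH`, and the names `PARTITION`, `partition_window` and `prune` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: /<base_path>/<YYYY-MM-DD>/<HH>/<hostname>/<file>
PARTITION = re.compile(r"/(?P<date>\d{4}-\d{2}-\d{2})/(?P<hour>\d{2})/[^/]+/[^/]+$")


def partition_window(path):
    """Return the [start, end) hour covered by a path, or None if unknown."""
    match = PARTITION.search(path)
    if match is None:
        return None
    start = datetime.strptime(f"{match['date']} {match['hour']}", "%Y-%m-%d %H")
    return start, start + timedelta(hours=1)


def prune(paths, earliest, latest):
    """Keep only paths whose hour window overlaps [earliest, latest)."""
    kept = []
    for path in paths:
        window = partition_window(path)
        if window is None:
            kept.append(path)  # cannot rule it out, so it must be read
            continue
        start, end = window
        if start < latest and end > earliest:
            kept.append(path)
    return kept


if __name__ == "__main__":
    candidates = [
        "/data/2013-06-10/00/web01/somefile.log",
        "/data/2013-06-10/01/web01/somefile.log",
        "/data/2013-06-10/02/web01/somefile.log",
    ]
    print(prune(candidates, datetime(2013, 6, 10, 1), datetime(2013, 6, 10, 2)))
```

For the one-hour search window in the example, only the `01` directory survives, so the other directories are never read.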


Search Performance with MapReduce

MapReduce considerations

•  Stats/chart/timechart/top/etc. commands work well in a distributed environment
   –  They MapReduce well
•  Time and order commands don’t work well in a distributed environment
   –  They don’t MapReduce well
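
One way to see why stats-style commands distribute well: each input split can be counted independently and the partial results merged in any order. The sketch below shows that shape in plain Python. `map_split` and `reduce_counts` are hypothetical names, and this is not how Hunk generates its MapReduce jobs.

```python
from collections import Counter
from functools import reduce


def map_split(events):
    """Map phase: count events per status within one input split."""
    return Counter(event["status"] for event in events)


def reduce_counts(partials):
    """Reduce phase: merge partial counts; the merge order does not matter."""
    return reduce(lambda left, right: left + right, partials, Counter())


if __name__ == "__main__":
    splits = [
        [{"status": "200"}, {"status": "500"}],
        [{"status": "200"}, {"status": "200"}],
    ]
    partials = [map_split(split) for split in splits]
    print(reduce_counts(partials).most_common())
```

Commands that depend on global event order have no such order-independent merge step, which is consistent with the slide's note that they do not MapReduce well.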

Summary indexing

•  Useful for speeding up searches
•  Summaries could have different retention policy
•  In most cases resides on the search head
•  Backfill is a manual (scripted) process

Mixed-‐mode Search

Streaming

•  Transfers first several blocks from HDFS to the Hunk search head for immediate processing

Reporting

•  Pushes computation to the DataNodes and TaskTrackers for the complete search

How the modes combine

•  Hunk starts the streaming and reporting modes concurrently
•  Streaming results show until the reporting results come in
•  Allows users to search interactively by pausing and refining queries
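
The interplay of the two modes can be sketched as follows. This is a toy model under stated assumptions: two local threads stand in for the search head (streaming) and the MapReduce job (reporting), blocks are lists of log lines, and `count_errors` and `mixed_mode_search` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def count_errors(blocks):
    """Toy search: count occurrences of ERROR across the given blocks."""
    return sum(line.count("ERROR") for block in blocks for line in block)


def mixed_mode_search(blocks, preview_blocks=2):
    """Yield ("streaming", partial) first, then ("reporting", complete)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both modes are started at the same time.
        streaming = pool.submit(count_errors, blocks[:preview_blocks])
        reporting = pool.submit(count_errors, blocks)
        # The preview is shown until the complete result replaces it.
        yield "streaming", streaming.result()
        yield "reporting", reporting.result()


if __name__ == "__main__":
    data = [["ERROR a", "ok"], ["ok"], ["ERROR b", "ERROR c"]]
    for mode, errors in mixed_mode_search(data):
        print(mode, errors)
```

A caller that stops iterating after the streaming result models a user who pauses and refines the query before the complete search finishes.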


Flexible, Iterative Workflow for Business Users

Explore

Analyze

Model

Pivot

Visualize

Share

Interactive Analytics

•  Preview results
•  Normalization as it’s needed
•  Faster implementation and flexibility
•  Easy search language + data models & pivot
•  Multiple views into the same data

Next Steps

1. Download the .conf2013 Mobile App. If not iPhone, iPad or Android, use the Web App

2. Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags!

3. Go to “Technical Deep Dive: Hadoop Operations Management”, Brera 6, Level 3, today, 11:30-12:30pm

Thank You
