Building a Log Analysis Pipeline: A Brief Tour

Upload: david-severski

Post on 11-May-2015

DESCRIPTION

Quick internal presentation on work we've been doing to deploy an ELK stack for our security analysis needs.

TRANSCRIPT

Page 1: Building a Log Analysis Pipeline

Building a Log Analysis Pipeline: A Brief Tour

Page 2: Building a Log Analysis Pipeline

Problem

Limited visibility into the environment

SIEM solutions inadequate for risk management purposes

Requests for extracts difficult or impossible to provide

Unable to connect together different data sources

Page 3: Building a Log Analysis Pipeline

Requirements

Cheap
◦ Budget + Labor = 0
◦ Hobby project

Scalable
◦ SIEM data in the TB range
◦ Need to have historical data
◦ Decoupled from logging infrastructure

Performance
◦ Batch processing is okay
◦ …but batches can’t be too slow
◦ Need near real-time exploration options

Confidentiality
◦ This is security data. Let’s not create more problems than solutions.

Page 4: Building a Log Analysis Pipeline

Resources

SIEM does a good job with log aggregation
◦ Stores raw syslog events

Easy access to raw events on the SIEM

Data is relatively large, but not BIG

Page 5: Building a Log Analysis Pipeline

A Plan Is Born

“I have a cunning plan!” – S. Baldrick, Blackadder

Page 6: Building a Log Analysis Pipeline

Early Approaches

METHOD 1 – MONGODB
◦ Python regexp to create JSON
◦ Load to MongoDB
◦ Run Mongo MapReduce

Worked – but slow. Required AWS for sufficient memory to run MapReduce flows.
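The first step of Method 1 (a Python regexp that turns each raw line into JSON) might look roughly like the sketch below. The log layout, the field names, and the `line_to_json` helper are assumptions made for illustration; the slides do not show the actual SIEM format or script.

```python
import json
import re

# Hypothetical firewall syslog layout; the real SIEM format is not given in the slides.
LINE_RE = re.compile(
    r"(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<action>PASS|BLOCK) "
    r"src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"dst=(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"dport=(?P<dst_port>\d+)"
)


def line_to_json(line):
    """Return a JSON document for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["dst_port"] = int(record["dst_port"])  # store the port as a number
    return json.dumps(record, sort_keys=True)


sample = "May 11 09:15:02 fw01 BLOCK src=10.0.0.5 dst=192.0.2.10 dport=22"
print(line_to_json(sample))
```

Each resulting document could then be loaded into MongoDB; that step is left out here because it depends on the driver and deployment in use.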

METHOD 2 – PURE PYTHON
◦ Python regexp to create CSV
◦ Pull off to Analysis Workspace
◦ Python MapReduce in shell

Worked – but limited and rigid
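A map/reduce pass over a CSV extract in plain Python could be sketched as follows. The column names and sample rows are hypothetical, and the original scripts are not shown in the slides. Note that the grouping key (`src_ip`) is fixed in the code, which hints at why this approach felt rigid.

```python
import csv
import io
from collections import Counter
from functools import reduce

# Hypothetical extract columns: src_ip, dst_ip, action
CSV_TEXT = """src_ip,dst_ip,action
10.0.0.5,192.0.2.10,BLOCK
10.0.0.5,192.0.2.11,PASS
10.0.0.9,192.0.2.10,BLOCK
"""


def map_row(row):
    """Map step: emit a one-entry counter keyed by source IP."""
    return Counter({row["src_ip"]: 1})


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


rows = csv.DictReader(io.StringIO(CSV_TEXT))
events_per_source = reduce(reduce_counts, map(map_row, rows), Counter())
print(events_per_source.most_common())
```

Answering a different question, such as counts per destination port, would need both a new extract containing that column and a new map function.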

Page 7: Building a Log Analysis Pipeline

Premature Data Truncation Leads to Poor Results

Lose ability to query context

Additional queries not possible without custom redesign
◦ Blocks vs. Passes
◦ Port information

Querying peer node relations, etc. not practical

Page 8: Building a Log Analysis Pipeline

Unleash the ELK!

Page 9: Building a Log Analysis Pipeline

Elasticsearch

◦ Full text search engine based on Apache Lucene
◦ Incredibly fast and flexible query DSL
◦ Built for distributed search (horizontal scale) from the ground up
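To give a feel for the query DSL, here is a sketch of a search body built as a Python dictionary and serialized to JSON. The field names (`action`, `dst_port`, `src_ip`, `@timestamp`) are assumptions about how the events might be indexed. The `bool`/`filter` form shown is the one used by current Elasticsearch releases and may differ from the syntax of the version deployed at the time.

```python
import json

# Field names below are hypothetical; they depend on how events were parsed and indexed.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"action": "BLOCK"}},
                {"term": {"dst_port": 22}},
                {"range": {"@timestamp": {"gte": "now-24h"}}},
            ]
        }
    },
    # Bucket the matching events by source address, keeping the ten largest buckets.
    "aggs": {
        "top_sources": {"terms": {"field": "src_ip", "size": 10}}
    },
    # Return only the aggregation, not the individual hits.
    "size": 0,
}

body = json.dumps(query, indent=2)
print(body)
```

The JSON in `body` is what would be sent to an index's `_search` endpoint. Changing the question (say, passes instead of blocks) means editing the query, not reprocessing the data.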

Page 10: Building a Log Analysis Pipeline

Logstash

Open source log intake and processor

Easy-to-use pattern matching
◦ No more opaque regexes!

Terrific metadata enrichment

Scores of plugins
◦ Inputs, outputs, filters, codecs
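Logstash's grok filter lets you write patterns as named building blocks of the form `%{SYNTAX:SEMANTIC}` instead of raw regular expressions. The Python sketch below only imitates that idea to show why it reads more clearly. It is not Logstash's implementation, and the three pattern definitions are simplified stand-ins for the much larger library Logstash ships.

```python
import re

# Tiny, simplified pattern library (an assumption for illustration only).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "INT": r"\d+",
    "WORD": r"\w+",
}


def grok_to_regex(expression):
    """Expand %{NAME:field} tokens into named regex groups."""
    def expand(token):
        name, field = token.group(1), token.group(2)
        return "(?P<%s>%s)" % (field, PATTERNS[name])

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))


matcher = grok_to_regex(
    "%{WORD:action} src=%{IP:src_ip} dst=%{IP:dst_ip} dport=%{INT:dst_port}"
)
fields = matcher.search("BLOCK src=10.0.0.5 dst=192.0.2.10 dport=22").groupdict()
print(fields)
```

The pattern line names what each piece is (`IP`, `INT`) and where it goes (`src_ip`, `dst_port`), which is the readability gain the slide is pointing at.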

Page 11: Building a Log Analysis Pipeline

Kibana

◦ Lightweight HTML5 interface to Elasticsearch for logs
◦ Not a full SIEM replacement
◦ Targeting the Splunk market

Page 12: Building a Log Analysis Pipeline

Infrastructure

On SIEM
◦ Python for creating extracts
◦ Bash for tarring up raw logs

Transport
◦ SCP from SIEM to Windows file share
◦ USB from Windows file share
◦ Sneakernet to analysis workspace

On Analysis Workspace
◦ Vagrant
◦ Chef

Page 13: Building a Log Analysis Pipeline

Demo

Page 14: Building a Log Analysis Pipeline

Pieces Involved

Page 15: Building a Log Analysis Pipeline

Next Steps – Infrastructure

Complete provisioning scripts for Hadoop & AWS

Transfer raw GZ files to encrypted S3 bucket
◦ Allow AWS EMR extract jobs to run

Process via Logstash into Elasticsearch
◦ Elasticsearch for short-term exploration
◦ Archive structured data to S3

Set up Elasticsearch-Hadoop connector

Use AWS EMR to do ad hoc extracts off of structured S3 buckets

Page 16: Building a Log Analysis Pipeline

Next Steps – Data Products

Full MaxMind integration
◦ Accuracy & detail

Reputation
◦ REN-ISAC integration

Graph exploration
◦ Who else talked to whom
◦ Clustering

Future
◦ Proxy logs
◦ DNS logs
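The "who else talked to whom" question can be sketched with a simple adjacency map built from source/destination pairs. The sample flows and both helper functions are hypothetical; the slides do not say which graph tooling was planned.

```python
from collections import defaultdict

# Hypothetical (source, destination) pairs pulled from parsed events.
flows = [
    ("10.0.0.5", "192.0.2.10"),
    ("10.0.0.9", "192.0.2.10"),
    ("10.0.0.9", "198.51.100.7"),
    ("10.0.0.12", "198.51.100.7"),
]


def build_peers(pairs):
    """Undirected adjacency map: host -> set of hosts it exchanged traffic with."""
    peers = defaultdict(set)
    for src, dst in pairs:
        peers[src].add(dst)
        peers[dst].add(src)
    return peers


def who_else_talked_to(peers, host):
    """Hosts that share at least one peer with `host`, excluding `host` itself."""
    related = set()
    for peer in peers[host]:
        related |= peers[peer]
    related.discard(host)
    return related


peers = build_peers(flows)
print(sorted(who_else_talked_to(peers, "10.0.0.5")))
```

Starting from one suspicious internal host, this returns the other hosts that contacted the same destinations, which is a natural first step before any clustering.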

Page 17: Building a Log Analysis Pipeline

Thanks

Google Groups

IRC #logstash, #chef, #vagrant, #elasticsearch

Seattle Search and Machine Learning Meetup

Seattle Chef Meetup

Hortonworks Sandbox

The Phoenix Project

Data-Driven Security

AlienVault

…and more!