Lumber-Mill Documentation
Release 0.20

Johan Rask

Jul 14, 2017



Contents

1 Five minute intro

2 Sources

3 Enriching functions

4 Sinks

Lumber-Mill is designed for programmers/devops/SREs with a professional programming background and an emergent interest in devops, monitoring and log processing, who want total control of the event pipeline.

Benefits

• Deployed as AWS Lambdas

• An API instead of a runtime: a complete Java 8 API

• No need for plugins, just depend on any third-party library you might need

Observable call(Observable eventStream) {

    // Parse and de-normalize events
    eventStream.compose ( new CloudWatchLogsEventPreProcessor())
        .flatMap (grok.parse (
            field: 'message',
            pattern: '%{AWS_LAMBDA_REQUEST_REPORT}'))
        .flatMap ( addField('type','cloudwatchlogs'))
        .flatMap ( fingerprint.md5())
        .buffer (100)
        .flatMap (AWS.elasticsearch.client (
            url: '{es_url}',
            index_prefix: 'lumbermill-',
            type: '{type}',
            region: 'eu-west-1',
            document_id: '{fingerprint}'
        ))
}

Contents:


CHAPTER 1

Five minute intro

Short introduction of the background of Lumber-Mill, why it exists and who it is usable for.

Why, when and for who?

Why did we build it?

To make a long story short, it started when we wanted to collect our AWS ELB logs from multiple accounts and send them to Elasticsearch. This seemed like a quite simple task for Logstash, but we had pain with it. Beyond that, doing custom "stuff" is hard in an off-the-shelf solution. After some consideration we decided to implement our own solution for collecting and processing logs (and more), with a focus on AWS.

What is it?

So, what is Lumber-Mill? Lumber-Mill is a Reactive (RxJava) API that can be used from any JVM-compatible language. Currently Groovy is the best-supported language, but we are considering the best options to make it usable for all JVM languages. We started out implementing our own pipeline, similar to Logstash, with different queues, but decided to drop that approach and go all-in on Rx instead, which we have been really happy with.

Why Java/JVM? The JVM has the best third-party libraries and support for AWS Lambdas, and it is a good fit for our use case, where we often have high concurrency, I/O and CPU load.

Some of the features include

• AWS Lambda support (Kinesis, S3, Cloudwatch Logs, Cloudtrail)

• S3 Get, Put, Delete

• Kinesis Producer / Consumer

• Grok, Compression, Templating, Base64, Json, Conditionals, Timestamps etc.

• Elasticsearch Bulk API (including AWS Elasticsearch Service)

• Graphite

• InfluxDB


When should you use it?

If you run on AWS and want a simple way of collecting your logs. Perhaps you want to run it as AWS Lambdas and connect to S3, Kinesis and/or Cloudwatch Logs (incl. Cloudtrail and VPC Flow Logs).

Who is it for?

Lumber-Mill is not designed for the non-programmer, even if he or she is likely to get it to work. Lumber-Mill is designed for programmers/devops/SREs with a professional programming background and an emergent interest in devops, monitoring and, of course, log processing!

Samples please

Cloudwatch to Elasticsearch

This is a complete sample of an AWS Lambda that, when triggered by Cloudwatch Logs events, decodes and parses these events and sends them to AWS Elasticsearch Service.

public class DemoEventProcessor extends CloudWatchLogsLambda implements EventProcessor {

    public DemoEventProcessor() {
        super(this)
    }

    Observable call(Observable observable) {
        observable
            // Parse and de-normalize events
            .compose ( new CloudWatchLogsEventPreProcessor())
            .flatMap ( addField('type','cloudwatchlogs'))
            .flatMap ( fingerprint.md5())
            .buffer (100)
            .flatMap (
                AWS.elasticsearch.client (
                    url: 'https://endpoint',
                    index_prefix: 'indexname-',
                    type: '{type}',
                    region: 'eu-west-1',
                    document_id: '{fingerprint}'
                ))
    }
}

S3

This sample is an AWS Lambda that, when triggered by an S3 event, will not only download and parse the file but also gzip it and put it back on S3 before processing the contents.

Observable call(Observable observable) {

    // Download locally, remove original on completed
    observable.flatMap (
        s3.download (
            bucket: '{bucket_name}',
            key: '{key}',
            remove: true
        ))
    // Compress file since we want compressed files on S3
    .flatMap (
        gzip.compress (
            file: '{s3_download_path}'
        ))
    // Put compressed file to S3 under processed directory
    .flatMap (
        s3.put (
            bucket: '{bucket_name}',
            key : 'processed/{key}.gz',
            file : '{gzip_path_compressed}'
        ))
    // Read each line
    .flatMap ( file.lines(file: '{s3_download_path}'))
    // Parse lines with grok => json, tag with _grokparsefailure on miss
    .flatMap (
        grok.parse (
            field: 'message',
            pattern: '%{AWS_ELB_LOG}',
            tagOnFailure: true
        ))
    // Use correct timestamp
    .flatMap (
        rename (
            from: 'timestamp',
            to : '@timestamp'
        ))
    .flatMap (
        addField ('type', 'elb'))
    .flatMap (
        fingerprint.md5('{message}'))
    // Buffer to suitable bulk size
    .buffer(5000)
    .flatMap (
        // See Elasticsearch in previous sample or use other output
    )
}

Status

We use Lumber-Mill extensively to collect and process logs from different AWS accounts into our central system. Before release, or even before we put it on master, we usually run it in production for quite some time.


We are currently thinking about the API and what the best approach is to make it as simple to work with, and as usable from multiple JVM languages, as possible. Due to that, the APIs might feel a bit awkward (well, they can suck) to work with when not using Groovy.

Installation / Deployment

TODO


CHAPTER 2

Sources

AWS Lambda Source

Lumber-Mill supports getting events with AWS Lambda from three different sources.

• Kinesis

• S3

• Cloudwatch logs (incl. cloudtrail, AWS Flowlogs)

Kinesis

To receive events from Kinesis, you must extend lumbermill.aws.lambda.KinesisLambda. The event contents will be the raw contents of the Kinesis record.

import lumbermill.api.EventProcessor
import lumbermill.aws.lambda.KinesisLambda
import rx.Observable
import static lumbermill.Core.*

public class DemoLambda extends KinesisLambda {

    public DemoLambda() {
        super(new DemoLambdaEventProcessor());
    }

    public static class DemoLambdaEventProcessor implements EventProcessor {

        // Raw contents of kinesis event
        Observable call(Observable observable) {
            observable
                // Convert to json if expecting json
                .flatMap ( toJsonObject() )
                // Time behind latest is stored as metadata
                .doOnNext (console.stdout('Currently behind latest with {millisBehindLatest} ms'))
                .doOnNext( console.stdout())
        }
    }
}

S3

To receive events from S3, you must extend lumbermill.aws.lambda.S3Lambda. The event is a lumbermill.api.JsonEvent containing the metadata of the event as fields; the following fields exist:

• bucket_name

• key

• bucket_arn

• etag

• size

import lumbermill.api.EventProcessor
import lumbermill.aws.lambda.S3Lambda
import rx.Observable
import static lumbermill.Core.*
import static lumbermill.AWS.s3

public class DemoLambda extends S3Lambda {

    public DemoLambda() {
        super(new DemoLambdaEventProcessor());
    }

    public static class DemoLambdaEventProcessor implements EventProcessor {

        Observable call(Observable observable) {
            observable
                // Since the event is only a reference, the file must be downloaded
                .flatMap (
                    s3.download (
                        bucket: '{bucket_name}',
                        key: '{key}',
                        remove: true
                    ))
                // Then (if you want), read each line as a separate Event
                .flatMap (
                    file.lines(file: '{s3_download_path}'))
                .doOnNext( console.stdout())
        }
    }
}


Cloudwatch Logs

Receiving Cloudwatch Logs events is similar to both S3 and Kinesis. First subclass lumbermill.aws.lambda.CloudWatchLogsLambda.

The data received in the call() method is encoded and must be decoded, since it is both compressed and Base64 encoded.

Use lumbermill.aws.lambda.CloudWatchLogsEventPreProcessor, which will decode, decompress, parse and denormalize the data into a stream of JsonEvents.
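Conceptually, the first two steps the preprocessor performs are plain Base64 decoding followed by gunzipping. A minimal, self-contained Java sketch of that idea (class and method names here are illustrative, not the actual Lumber-Mill API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CloudWatchDecode {

    // The Lambda receives Base64 text wrapping a gzipped JSON payload.
    // Decoding reverses the two layers: Base64 decode first, then gunzip.
    public static String decode(String awslogsData) throws Exception {
        byte[] gzipped = Base64.getDecoder().decode(awslogsData);
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        return out.toString("UTF-8");
    }

    // Inverse direction, included only to make the round-trip demo self-contained
    public static String encode(String json) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(json.getBytes("UTF-8"));
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    public static void main(String[] args) throws Exception {
        String payload = "{\"logGroup\":\"demo\",\"logEvents\":[]}";
        System.out.println(decode(encode(payload))); // prints the payload unchanged
    }
}
```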

Each JsonEvent contains the fields

• message

• logGroup

• logStream

• @timestamp

import lumbermill.api.Codecs
import lumbermill.api.JsonEvent
import lumbermill.api.EventProcessor
import lumbermill.aws.lambda.CloudWatchLogsLambda
import lumbermill.aws.lambda.CloudWatchLogsEventPreProcessor
import rx.Observable

import static lumbermill.Core.*

public class DemoLambda extends CloudWatchLogsLambda {

    public DemoLambda() {
        super(new DemoLambdaEventProcessor());
    }

    private static class DemoLambdaEventProcessor implements EventProcessor {

        Observable call(Observable observable) {
            observable
                // Parse and de-normalize events (required as first transformer)
                // Will return JsonEvent
                .compose (
                    new CloudWatchLogsEventPreProcessor())
                .doOnNext(console.stdout())
        }
    }
}

VPC Flow Logs

VPC Flow Logs events are received from Cloudwatch Logs, and the raw JSON is stored in the 'message' field. What we need to do is extract this and convert it to a JsonEvent, which is done with lumbermill.aws.lambda.VPCFlowLogsEventPreProcessor.

Observable call(Observable observable) {
    observable
        .compose (
            new VPCFlowLogsEventPreProcessor())
        .doOnNext(console.stdout())
}


The JsonEvent has the following fields:

{
    "account_id" : "808736257386",
    "action" : "ACCEPT",
    "bytes" : 1990,
    "dstaddr" : "52.30.151.45",
    "dstport" : "443",
    "end" : 1480508691,
    "interface_id" : "eni-3a2b2575",
    "log_status" : "OK",
    "packets" : 11,
    "protocol" : "6",
    "srcaddr" : "172.31.21.142",
    "srcport" : "35052",
    "start" : 1480508631,
    "version" : "2"
}

Cloudtrail

Cloudtrail events are received from Cloudwatch Logs, and the raw JSON is stored in the 'message' field. What we need to do is extract this and convert it to a JsonEvent. This will be a separate EventProcessor in the next release of Lumber-Mill, in the same way as with VPC Flow Logs.

Observable call(Observable observable) {
    observable
        .compose (
            new CloudWatchLogsEventPreProcessor())
        // Decodes the 'message' field and merges the new and old event;
        // 'message' is removed since we do not need it anymore
        .flatMap ({ JsonEvent event ->
            return Codecs.JSON_OBJECT.from(event.valueAsString('message'))
                .merge(event)
                .remove('message')
                .toObservable()
        })
        .doOnNext(console.stdout())
}

Kinesis Consumer Library

Lumber-Mill can use the KCL (Kinesis Client Library) to process data. Each 'batch' is received as a stream and checkpointed after it successfully returns. Currently there is no support for delayed checkpointing.

Build

compile 'com.sonymobile:lumbermill-aws-kcl:$version'

This sample subscribes to a kinesis stream and simply prints the contents of each record and the total count.

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration
import lumbermill.api.BytesEvent
import lumbermill.aws.kcl.KCL

import static lumbermill.api.Sys.env
import static lumbermill.aws.kcl.KCL.workerId

// Uses minimal KCL configuration
KCL.create (
    new KinesisClientLibConfiguration (
        env ('appName', 'testApp').string(),
        env ('streamName', 'testStream').string(),
        new DefaultAWSCredentialsProviderChain(),
        workerId())
    .withRegionName(env ("region", "eu-west-1").string()))
.dry(env ("dry", "false").bool()) // Dry will not checkpoint
// Each record as an observable
.handleRecordBatch { record ->
    record
        .doOnNext{BytesEvent event -> println event.raw().utf8()}
        .count()
        // Prints the total number of records that were received
        .doOnNext{count -> println count}
}

HTTP/REST Server

Lumber-Mill's HTTP/REST support wraps (the superb library) Vert.x to provide a simple yet powerful way of ingesting data over HTTP.

We are running this module and ingesting large amounts of data, but it is currently limited in how much you can configure your REST endpoints. GET is not implemented, other than a 200 OK healthcheck.

POST with path param and query params

This sample shows how to set up an endpoint that receives an Observable for each request, containing path parameters, query parameters and body. Path parameters and query parameters are stored as metadata, not in the body.

import lumbermill.api.Event
import static lumbermill.Http.http
import static lumbermill.Core.*

http.server(port:8080)
    .post (
        path : '/person/:name',
        tags : ['person'])
    .onTag('person', { request ->
        request
            .doOnNext(console.stdout('Name is {name}'))
            .doOnNext(console.stdout('Lastname is {lastname}'))
            .doOnNext{Event event -> println event.raw().utf8()}
    })

Running the following curl command should produce the output below


curl -v -XPOST localhost:8080/person/johan?lastname=rask -d 'Hello world'

Name is johan
Lastname is rask
{"message":"Hello world","@timestamp":"2016-12-06T10:57:04.324+01:00","tags":["person"]}

If you want to handle all data in a single method and skip tags you can use on() instead of onTag().

http.server(port:8080)
    .post (
        path : '/person/:name')
    .on({ request ->
        request
            .doOnNext(console.stdout('Name is {name}'))
            .doOnNext{Event event -> println event.raw().utf8()}
    })

Content-Type and codecs

These are the default Codecs used when receiving data.

• application/json : Codecs.JSON_ANY

• text/plain : Codecs.TEXT_TO_JSON

• default : Codecs.TEXT_TO_JSON

To change how data is decoded, you can set the codec when creating an endpoint. The example below will simply print 'Hello world' (if using the curl command above).

http.server(port:8080)
    .post (
        path : '/person/:name',
        codec: Codecs.BYTES) // Will simply wrap the body as raw bytes
    .on({ request ->
        request
            .doOnNext{Event event -> println event.raw().utf8()}
    })

Twitter

Twitter feed that is designed for experimental purposes only.

The following fields are found in the TweetEvent json

• message (tweet)

• id

• created_time_ms

• lang

• in_reply_to_screen_name

• in_reply_to_status_id

• in_reply_to_user_id


• retweeted_count

• favourite_count

• user_id

• user_name

Only if the user specified a location:

• place_name

• place_full_name

• country

• country_code2

• longitude, double

• latitude, double

• location, [lon, lat] array

There is also full access to the underlying twitter4j api.

import static lumbermill.Core.*
import lumbermill.social.Twitter

Twitter.feed (
    'consumer_key': '{consumer_key}',
    'consumer_key_secret' : '{consumer_key_secret}',
    'access_token' : '{access_token}',
    'access_token_secret' : '{access_token_secret}'
)
.filter({ev -> ev.status().getPlace() != null})
.doOnNext(console.stdout('Tweet {message} from place {place_full_name}'))
.subscribe()

Local Filesystem

The fs support is designed mainly for one-time reads of files, which might be temporary files downloaded from S3, or for one-time jobs recursively iterating a filesystem.

It does not support tail; if you need that, there are better solutions.

Default codec is Codecs.TEXT_TO_JSON

Read a file once

This will create the source Observable and does not “hook” into an existing pipeline

import lumbermill.api.Codecs

import static lumbermill.Core.file

file.readFileAsLines (
    file: '/tmp/afile',
    codec : Codecs.TEXT_TO_JSON)
.filter( keepWhen( "'{message}'.contains('ERROR')" ))
.doOnNext( console.stdout('Errors: {message}') )
.subscribe()

Read each line in an existing pipeline

If you have, for example, downloaded a file from S3, or are iterating over a number of files, you can use file.lines() to read each line and return the lines as Observables. This also takes the codec parameter if required.

.flatMap (
    file.lines(file: '{s3_download_path}')
)


CHAPTER 3

Enriching functions

Functions used in the pipeline to mutate/enrich the event contents.

Unless otherwise specified, functions are part of the core module, which is used by depending on it and importing all methods on the lumbermill.Core class.

compile 'com.sonymobile:lumbermill-core:$version'

import static lumbermill.Core.*

Add / Remove / Rename

o.flatMap ( addField('name', 'string'))
o.flatMap ( addField('name', 10))
o.flatMap ( addField('name', true))
o.flatMap ( addField('name', 10.8))

o.flatMap( remove('field'))
o.flatMap( remove('field1', 'field2'))

o.flatMap ( rename (from: 'source', to: 'target'))

Base64

Base64 encodes and decodes the contents of an Event and returns a lumbermill.api.BytesEvent

o.flatMap ( base64.encode())

o.flatMap ( base64.decode())
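Under the hood, this maps to standard Base64. A self-contained Java sketch of the equivalent encode/decode round trip (the class and method names are illustrative, not Lumber-Mill's):

```java
import java.util.Base64;

public class Base64Demo {

    // Encode raw event bytes to a Base64 string
    public static String encode(byte[] payload) {
        return Base64.getEncoder().encodeToString(payload);
    }

    // Decode a Base64 string back to raw bytes
    public static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        String encoded = encode("hello".getBytes());
        System.out.println(encoded);                     // aGVsbG8=
        System.out.println(new String(decode(encoded))); // hello
    }
}
```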


Fingerprint / Checksum

Adds a fingerprint based on either the complete payload or on one or more fields (supports patterns).

It is up to the user to create the source string used for the fingerprint. Best practice is to separate each 'word' with a char, like a pipe (|), to prevent any unexpected behaviour. Read more at https://github.com/google/guava/wiki/HashingExplained.

o.flatMap( fingerprint.md5('{@timestamp}|{message}'))

// Raw payload
o.flatMap( fingerprint.md5())

// To access the fingerprint, use field 'fingerprint'
o.doOnNext( console.stdout('Fingerprint was {fingerprint}'))
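Conceptually, fingerprint.md5 boils down to hashing the rendered source string and storing the hex digest. A plain-Java sketch of that (the helper below is illustrative, not the Lumber-Mill implementation):

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Fingerprint {

    // MD5 hex digest of the concatenated source fields
    public static String md5(String source) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(source.getBytes());
        // Pad to 32 hex chars so leading zeros are preserved
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        // Separate the fields with a pipe to avoid ambiguous concatenations
        System.out.println(md5("2016-12-06T10:57:04Z|Hello world"));
    }
}
```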

Compression

Support for gzip and zlib.

Zlib support for file compression/decompression is not finished, only for event contents

An example of file compression/decompression: a reference to an S3 file that is compressed and must be decompressed before usage, or a local file reference that must be compressed before being put back on S3.

// Compress a file
o.flatMap ( gzip.compress (
    file: 'fileName', // Supports pattern
    output_field: 'gzip_path_compressed' // Optional, defaults to gzip_path_compressed
))

// Decompress a file
o.flatMap ( gzip.decompress (
    file: 'fileName', // Supports pattern
    output_field: 'gzip_path_decompressed' // Optional, defaults to gzip_path_decompressed
))

// Decompress a payload
o.flatMap ( gzip.decompress())
o.flatMap ( zlib.decompress())

// Compress a payload
o.flatMap ( gzip.compress())
o.flatMap ( zlib.compress())
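Payload compression corresponds to a standard gzip round trip over the event bytes. A self-contained Java sketch (names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipPayload {

    // Gzip the raw payload bytes
    public static byte[] compress(byte[] raw) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    // Gunzip back to the original bytes
    public static byte[] decompress(byte[] compressed) throws Exception {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gz.read(buf)) != -1) out.write(buf, 0, n);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "hello hello hello".getBytes();
        byte[] roundTrip = decompress(compress(original));
        System.out.println(new String(roundTrip)); // hello hello hello
    }
}
```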

Timestamps

Helps with converting different time formats into an ISO_8601 @timestamp.

// Add timestamp field now
o.flatMap( timestampNow())


// Timestamp from @timestamp that contains time in seconds into @timestamp
o.flatMap( timestampFromSecs())

// Timestamp from a field that contains time in seconds into @timestamp
o.flatMap( timestampFromSecs('fieldWithTime'))

// Timestamp from a field that contains time in seconds into another field
o.flatMap( timestampFromSecs('fieldWithTime', 'targetFieldWithTime'))

// Timestamp from @timestamp that contains time in millis into @timestamp
o.flatMap( timestampFromMs())

// Timestamp from a field that contains time in millis into @timestamp
o.flatMap( timestampFromMs('fieldWithTime'))

// Timestamp from a field that contains time in millis into another field
o.flatMap( timestampFromMs('fieldWithTime', 'targetFieldWithTime'))
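The conversions above amount to turning an epoch value into an ISO_8601 string. A plain-Java sketch of the idea using java.time (helper names are illustrative):

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

public class Timestamps {

    // ISO_8601 UTC instant format, e.g. 2016-11-30T12:24:51Z
    private static final DateTimeFormatter ISO_8601 = DateTimeFormatter.ISO_INSTANT;

    // Epoch seconds -> ISO_8601 string
    public static String fromSecs(long epochSeconds) {
        return ISO_8601.format(Instant.ofEpochSecond(epochSeconds));
    }

    // Epoch milliseconds -> ISO_8601 string
    public static String fromMs(long epochMillis) {
        return ISO_8601.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // 'end' value from the VPC Flow Logs sample above
        System.out.println(fromSecs(1480508691L)); // 2016-11-30T12:24:51Z
        System.out.println(fromMs(1480508691000L));
    }
}
```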

Conditionals

Currently, the support for conditionals is limited, but it is a work in progress. It is done by using one of the compute* methods.

The conditional functions can:

• return a function

• invoke a function

• invoke multiple functions

// Execute if a tag exists
computeIfTagExists ('tagName');

// Execute if a tag does not exist
computeIfTagIsAbsent ('tagName');

// Execute if a regex matches a field
computeIfMatch ('message', '<regex>');

// Execute if a regex does not match a field
computeIfNotMatch ('message', '<regex>');

// Execute if a field exists
computeIfExists('fieldName')

// Execute if a field does not exist
computeIfAbsent('fieldName')

// This will create a fingerprint unless the field 'fingerprint' already exists
o.flatMap ( computeIfAbsent('fingerprint') {
    fingerprint.md5()
})


Filters

RxJava provides the observable.filter() operation, which can be used to keep or skip data. Lumber-Mill provides two functions that can be used together with filter.

The expression uses JavaScript, so it must be valid JavaScript and must return a boolean value, but it can be ANY JavaScript expression.

Some simple examples

// String equals. Note the quotes!
o.filter( keepWhen("'{name}' == 'Johan'"))

// String contains (same as str.indexOf(string) != -1)
o.filter( keepWhen("'{message}'.contains('ERROR')"))

// Numbers
o.filter( skipWhen("{age} == 99"))

// Boolean
o.filter( skipWhen("{isHappy} == false"))

// Array
o.filter( keepWhen("{tags}.contains('Johan')"))

// Combination
o.filter( keepWhen("'{name}' == 'Johan' && {isHappy} == true"))

Grok

Grok is one of the most powerful functions in Lumber-Mill, and it works "almost" the same way as in Logstash. Lumber-Mill is bundled with the same grok patterns as Logstash, plus a few more AWS-related patterns.

This sample expects an AWS ELB file to be processed.

o.flatMap( grok.parse (
    field: 'message',
    pattern: '%{AWS_ELB_LOG}',
    tagOnFailure: true, // Optional, defaults to true
    tag: '_grokparsefailure' // Optional, defaults to _grokparsefailure
))
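Conceptually, a grok pattern compiles down to a regex with named captures whose groups become event fields, with a parse miss leading to a failure tag. A much-simplified Java sketch (the pattern below is a tiny illustrative stand-in, not the real %{AWS_ELB_LOG} definition):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrokSketch {

    // Tiny stand-in pattern: timestamp, elb name and client ip only
    private static final Pattern ELB = Pattern.compile(
        "(?<timestamp>\\S+) (?<elb>\\S+) (?<clientip>[\\d.]+):\\d+ .*");

    // Returns the named groups as fields, or null on a parse miss
    // (a real pipeline would tag the event _grokparsefailure instead)
    public static Map<String, String> parse(String message) {
        Matcher m = ELB.matcher(message);
        if (!m.matches()) return null;
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("timestamp", m.group("timestamp"));
        fields.put("elb", m.group("elb"));
        fields.put("clientip", m.group("clientip"));
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> f = parse(
            "2016-12-06T10:57:04.324Z my-elb 172.31.21.142:35052 GET /index.html");
        System.out.println(f);
    }
}
```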

GeoIP

This comes as a separate module, lumbermill-geospatial, and it also requires you to download the database it uses.

To prevent classpath issues, you must exclude jackson dependencies when depending on this module.

compile ('com.sonymobile:lumbermill-geospatial:$version') {
    exclude group: 'com.fasterxml.jackson.core'
    exclude group: 'com.fasterxml.jackson.databind'
    exclude group: 'com.fasterxml.jackson.annotations'
}


o.flatMap (geoip (
    'source' : 'client_ip', // Required - if the field does not exist it simply will not add any geo info
    'target' : 'geoip', // Optional - defaults to 'geoip'
    'path' : '/tmp/GeoLite2-City.mmdb', // Optional, but if not supplied GeoLite2-City.mmdb must be found on classpath
    'fields' : ['country_code2', 'location'] // Optional, defaults to all fields
))

Important: the GeoLite2-City.mmdb MUST be downloaded and imported by the project that depends on this module; the database is NOT included in the distribution.

wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
gunzip GeoLite2-City.mmdb.gz

The database file can be opened from classpath if you make it available there, and this is default behaviour.

mv GeoLite2-City.mmdb your_project/src/main/resources

Or it can be located somewhere on the filesystem

mv GeoLite2-City.mmdb /tmp

geoip (field: 'client_ip', path: '/tmp/GeoLite2-City.mmdb.gz')

Docker

Simply prepare the image with the maxmind database

WORKDIR /srv
RUN wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
RUN gunzip GeoLite2-City.mmdb.gz

And use it from code

geoip (
    'source' : 'client_ip',
    'path' : '/srv/GeoLite2-City.mmdb'
)


CHAPTER 4

Sinks

Elasticsearch

Stores events in Elasticsearch

Usage

build.gradle

compile 'com.sonymobile:lumbermill-elasticsearch-client:$version'

Groovy script

import lumbermill.api.Codecs
import static lumbermill.Core.*
import static lumbermill.Elasticsearch.elasticsearch

Observable.just(Codecs.TEXT_TO_JSON.from("hello"), Codecs.TEXT_TO_JSON.from("World"))
.flatMap (
    fingerprint.md5('{message}'))
.buffer (100) // Buffering is currently required. Pick a suitable amount.
.flatMap (
    elasticsearch.client (
        basic_auth: 'user:passwd', // Optional
        url: 'http(s)://host', // Required
        index_prefix: 'myindex-', // Required, supports pattern '{anIndex}-'
        type: 'a_type', // Required, supports pattern '{type}'
        document_id: '{fingerprint}', // Optional, but recommended
        timestamp_field: '@timestamp', // Optional, defaults to @timestamp
        retry: [ // Optional, defaults to fixed, 2000, 20
            policy: 'linear',
            attempts: 20,
            delayMs: 500
        ],
        dispatcher: [ // Optional
            max_concurrent_requests: 2, // Optional, defaults to 5
            threadpool: <ExecutorService>, // Optional
        ]
    )
).toBlocking().subscribe()

Arguments

Elasticsearch requires a List&lt;JsonEvent&gt; as input, so you MUST buffer before sending. It will convert the events into a single Bulk API request.
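A Bulk API request body is newline-delimited JSON: one action line plus one source line per event. A simplified Java sketch of what the buffered list is converted into (the class name and the exact action metadata below are illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class BulkBody {

    // Builds a Bulk API "index" body: one action line + one source line per doc
    public static String build(String index, String type, List<String> jsonDocs) {
        StringBuilder sb = new StringBuilder();
        for (String doc : jsonDocs) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type).append("\"}}\n");
            sb.append(doc).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("lumbermill-2016.12.06", "cloudwatchlogs",
            Arrays.asList("{\"message\":\"hello\"}", "{\"message\":\"world\"}")));
    }
}
```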

Returns

The Elasticsearch function returns Observable&lt;ElasticSearchResponseEvent&gt;, which extends JsonEvent and contains the actual raw response from Elasticsearch. If you want to continue working with the original Events that were sent as arguments to the Elasticsearch function, you can get those with the arguments() method.

o.flatMap (
    elasticsearch.client (...)
).flatMap(
    response.arguments()
)

Errors

The Elasticsearch client is built to handle partial errors, meaning that some entries were not properly stored. This could be due to anything from malformed content to shard failures. The client will retry any failures except unrecoverable ones (400 BAD_REQUEST); those are simply ignored and not retried.

Retries

Elasticsearch has a default retry policy of a fixed 2-second delay and 20 attempts, so it will retry failed records every 2 seconds, up to 20 times.

Once there are no more retries, a FatalIndexException is thrown to indicate that indexing failed and there is no use continuing.

Limitations

• Currently it only uses the index operation; it does not support create, update or delete.

• Only daily indices can be created.

Performance

It is a custom implementation based on OkHttp. We started out with Jest but could not get good enough throughput; OkHttp has proven to be amazing.

AWS Elasticsearch Service

Stores events in AWS Elasticsearch Service. The configuration is the same as for "normal" Elasticsearch, but you need to import the aws module instead, and also configure which region is used.

Usage

build.gradle


compile 'com.sonymobile:lumbermill-aws:$version'

Groovy script

import lumbermill.api.Codecs
import lumbermill.AWS
import static lumbermill.Core.*

Observable.just(Codecs.TEXT_TO_JSON.from("hello"), Codecs.TEXT_TO_JSON.from("World"))
.flatMap (
    fingerprint.md5('{message}'))
.buffer (100) // Buffering is currently required. Pick a suitable amount.
.flatMap (
    AWS.elasticsearch.client (
        // Same options as for "normal" Elasticsearch
        region : 'eu-west-1' // Optional, defaults to eu-west-1
    )
).toBlocking().subscribe()

Graphite

Stores metrics in graphite using carbon line protocol.

Usage

build.gradle

compile 'com.sonymobile:lumbermill-graphite:$version'

Groovy script

This simple sample will write two metrics from a single event to graphite.

import lumbermill.api.Codecs
import static lumbermill.Graphite.carbon

Codecs.TEXT_TO_JSON.from("hello")
    .put("@metric", "hits.count")
    .put("@value", 5)
    .toObservable()
    .flatMap (
        carbon (
            host: 'localhost',               // Optional (localhost)
            port: 2003,                      // Optional (2003)
            timestamp_field: '@timestamp',   // Optional (@timestamp)
            timestamp_precision: 'ISO_8601', // Optional (ISO_8601). Supports 'MILLIS' and 'SECONDS'
            metrics : [                      // At least one metric is required
                'stats.counters.{@metric}' : '{@value}', // If either @metric or @value is missing no metric is stored
                'stats.counters.duplicate.{@metric}' : '{@value}'
            ]
        )
    ).toBlocking().subscribe()
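Each entry in the metrics map ends up as one line in Carbon's plaintext protocol, which is simply "path value timestamp" (timestamp in epoch seconds) terminated by a newline. A minimal sketch of that formatting (the helper class is illustrative, not part of Lumber-Mill):

```java
public class CarbonLine {

    // Formats one metric in Carbon's plaintext protocol:
    // "<path> <value> <epoch-seconds>\n"
    static String format(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    public static void main(String[] args) {
        // The resolved pattern 'stats.counters.{@metric}' with @value 5:
        System.out.print(format("stats.counters.hits.count", 5, 1500000000L));
        // → stats.counters.hits.count 5.0 1500000000
    }
}
```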

Limitations

Since there is no way to support updates, at-least-once delivery is not supported: any event that arrives more than once will be stored again.

InfluxDB

Stores metrics in InfluxDB.

Usage

build.gradle

compile 'com.sonymobile:lumbermill-influxdb:$version'

Groovy script

The sample below assumes an event with the following shape:

{
    "@timestamp" : "ISO_DATE",
    "metric" : "cpu",
    "avg" : 75,
    "max" : 98,
    "device_name" : null
}

import lumbermill.api.Codecs
import java.util.concurrent.TimeUnit
import static lumbermill.Influxdb.influxdb

client = influxdb.client (
    url : 'http://influxdb:8086',     // Required
    user : 'root',                    // Required
    password : 'root',                // Required
    db : 'testDb',                    // Required, supports templating
    measurement : '{metric}',         // Required, supports templating
    fields : [
        // value should be the name of the field, template not supported to support correct type (WIP)
        'avg' : 'avg',
        'max' : 'max',
    ],
    excludeTags : ['@timestamp', 'message'],
    precision : TimeUnit.MILLISECONDS // Optional (default MS), precision of the time field
)

rx.Observable.just(json).buffer(2).flatMap (client).toBlocking().subscribe()
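For reference, InfluxDB ingests each buffered event as a point in its line protocol: measurement[,tag=value] field=value[,field2=value2] [timestamp]. A minimal, illustrative sketch of one such point (the helper class and the tag/field names are hypothetical, not part of Lumber-Mill):

```java
public class InfluxLine {

    // Builds one point in InfluxDB line protocol:
    // measurement,tag=value field=value timestamp
    static String point(String measurement, String tagKey, String tagValue,
                        String fieldKey, double fieldValue, long timestampMs) {
        return measurement + "," + tagKey + "=" + tagValue + " "
                + fieldKey + "=" + fieldValue + " " + timestampMs;
    }

    public static void main(String[] args) {
        // The 'cpu' event above, with a hypothetical 'host' tag:
        System.out.println(point("cpu", "host", "web-1", "avg", 75, 1500000000000L));
        // → cpu,host=web-1 avg=75.0 1500000000000
    }
}
```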


S3

All S3 functions support the config field roleArn, which makes it possible to assume a different role.

o.flatMap (
    s3.download (
        roleArn: 'the_role_arn_to_assume'
    )
)

S3 Download / Get

Download to filesystem

For example, an S3 event that you receive from an AWS Lambda will contain the bucket and key of the file, so downloading it locally is simple. The sample below downloads it as a temp file on local disk and adds the field ‘s3_download_path’ to the event.

observable.flatMap (
    s3.download (
        bucket: '{bucket_name}',
        key: '{key}',
        remove: true,
        output_field: 's3_download_path' // Optional, defaults to s3_download_path
    )
).doOnNext(console.stdout('File downloaded with filename {s3_download_path}'))

Download as BytesEvent

Similar to the example above, but instead a BytesEvent containing the complete contents of the S3 file is returned.

observable.flatMap (
    s3.get (
        bucket: '{bucket_name}',
        key: '{key}',
        codec: Codecs.BYTES // Optional, defaults to bytes
    )
).doOnNext{Event e -> println 'Size in bytes: ' + e.raw().size()}

S3 Put

S3 Put takes a reference to a file and puts it on S3. The example below takes the downloaded file and makes a copy with the suffix ‘.copy’ appended.

.flatMap (
    s3.put (
        bucket: '{bucket}',
        key : '{key}.copy',
        file : '{s3_download_path}'
    )
)


Kinesis

Stores events in Kinesis using the AWS SDK. It has built-in retry functionality.

Usage

build.gradle

compile 'com.sonymobile:lumbermill-aws:$version'

Groovy script

The sample will invoke putRecords() with two events. Even though the buffer size is 100, onCompleted is invoked after the two events have been processed, which causes the pipeline to flush.

import lumbermill.api.Codecs
import static lumbermill.Core.*
import static lumbermill.AWS.*

Observable.just(Codecs.TEXT_TO_JSON.from("hello"), Codecs.TEXT_TO_JSON.from("World"))
    .buffer (100)
    .flatMap (
        kinesis.bufferedProducer (
            region: 'eu-west-1',       // Optional, defaults to eu-west-1, overridden by endpoint
            endpoint: 'host',          // Optional, for custom hostname
            stream: 'stream_name',     // Required
            partition_key: '{afield}', // Optional (**Recommended**), supports patterns, defaults to randomized uuid
            max_connections: 10,       // Optional, defaults to 10
            request_timeout: 60000,    // Optional, defaults to 60000ms
            retry: [                   // Optional, defaults to fixed, 2000, 20
                policy: 'linear',
                attempts: 20,
                delayMs: 500
            ]
        )
    ).toBlocking().subscribe()
