elasticsearch first-steps

Elasticsearch: first steps with an Aggregate-oriented database Jug Roma 28/11/2013 Matteo Moci

Upload: matteo-moci

Post on 26-Jan-2015


Page 1: Elasticsearch first-steps

Elasticsearch: first steps with an Aggregate-oriented database

Jug Roma 28/11/2013

Matteo Moci

Page 2: Elasticsearch first-steps

Me

Matteo Moci

@matteomoci

http://mox.fm

Software Engineer

R&D, new product development

Page 3: Elasticsearch first-steps

Agenda

• 2 Use cases

• Elasticsearch Basics

• Data Design for scaling

Page 4: Elasticsearch first-steps

Social Media Analytics Platform for Marketing Agencies

Page 5: Elasticsearch first-steps

Scenario

• Using Elasticsearch as:
  • Analytics engine
  • Aggregate repository

Page 6: Elasticsearch first-steps

Use case 1

• Count how values are distributed over time

Page 7: Elasticsearch first-steps

Before

• ~10M documents
• Heaviest query: ~10 minutes
• Our staff had a problem

Page 8: Elasticsearch first-steps

After

• ~10M documents
• Heaviest query: ~1 second (even with a larger dataset)

Page 9: Elasticsearch first-steps

Use case 2

• Aggregate-oriented repository

• ...as in DDD

http://ptgmedia.pearsoncmg.com/images/chap10_9780321834577/elementLinks/10fig05.jpg

Page 10: Elasticsearch first-steps

Elasticsearch

Distributed RESTful search and analytics

real time data and analytics

distributed

high availability

multi tenancy

full-text search

schema free

RESTful, JSON API

Page 11: Elasticsearch first-steps

Elasticsearch basics

• Install
• API
• Types mapping
• Facets
• Relations

Page 12: Elasticsearch first-steps

Install

$ wget https://download.elasticsearch.org/...
$ tar -xf elasticsearch-0.90.7.tar.gz

Page 13: Elasticsearch first-steps

Run!

Page 14: Elasticsearch first-steps

Run!

$ ./elasticsearch-0.90.7/bin/elasticsearch -f

[Diagram: cluster "es" with one node, Hulk]

Page 15: Elasticsearch first-steps

Run!

$ ./elasticsearch-0.90.7/bin/elasticsearch -f
$ ./elasticsearch-0.90.7/bin/elasticsearch -f

[Diagram: cluster "es" with one node, Hulk, while a second instance is started]

Page 16: Elasticsearch first-steps

Run!

$ ./elasticsearch-0.90.7/bin/elasticsearch -f
$ ./elasticsearch-0.90.7/bin/elasticsearch -f

[Diagram: cluster "es" with two nodes, Hulk and Thor]

Page 17: Elasticsearch first-steps

Index a document

$ curl -X PUT localhost:9200/products/product/1 -d '{
  "name" : "Camera"
}'

Page 18: Elasticsearch first-steps

Search

$ curl -XGET 'localhost:9200/products/product/_search?q=Camera'
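The index and search calls are plain HTTP requests, so any HTTP client can issue them. A rough sketch in Python (standard library only), assuming a node on localhost:9200 as in the slides. The helper names index_request and search_request are made up for illustration, and the requests are only built here, not sent:

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local node listening on the default HTTP port.
BASE_URL = "http://localhost:9200"


def index_request(index, doc_type, doc_id, document):
    """Build the PUT request that stores one JSON document under /index/type/id."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


def search_request(index, doc_type, query_string):
    """Build the GET request for a URI search (the ?q=... form)."""
    params = urllib.parse.urlencode({"q": query_string})
    url = f"{BASE_URL}/{index}/{doc_type}/_search?{params}"
    return urllib.request.Request(url, method="GET")


put = index_request("products", "product", 1, {"name": "Camera"})
get = search_request("products", "product", "Camera")
```

Passing either object to urllib.request.urlopen would send it to a running node.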

Page 19: Elasticsearch first-steps

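Shards and Replicas

[Diagram: cluster "es" with one node, Hulk; index Products with two rows of shards labelled 1 and 2]

Page 20: Elasticsearch first-steps

Shards and Replicas

[Diagram: node Thor joins Hulk in cluster "es"; index Products with two rows of shards labelled 1 and 2]

Page 21: Elasticsearch first-steps

Shards and Replicas

[Diagram: Products now appears on both Thor and Hulk, with two rows of shards labelled 1 and 2]

Page 22: Elasticsearch first-steps

Shards and Replicas

[Diagram: Products on both Thor and Hulk, with two rows of shards labelled 1 and 2]

Page 23: Elasticsearch first-steps

Shards and Replicas

[Diagram: Products on both Thor and Hulk; the shard copies 1 and 2 are rearranged across the two nodes]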

Page 24: Elasticsearch first-steps

Integration

[Diagram: nodes Hulk and Thor, each listening on port 9300]

Page 25: Elasticsearch first-steps

Integration

[Diagram: a TransportClient connected to nodes Hulk and Thor, each on port 9300]

Page 26: Elasticsearch first-steps

Async Java API

this.client.prepareGet("documents", "document", id)
    // async, non-blocking API: use a listener to handle the result
    .execute(new ActionListener<GetResponse>() {
        @Override
        public void onResponse(GetResponse getFields) {
            // handle the retrieved document
        }

        @Override
        public void onFailure(Throwable e) {
            // handle the failure
        }
    });

Page 27: Elasticsearch first-steps

Mapping

Mappings define how primitive types are stored and analyzed

Page 28: Elasticsearch first-steps

Mapping

• JSON data is parsed on indexing
• A field is mapped the first time it is indexed
• Inferred if not configured (!)
• Types: float, long, boolean, date (+formatting), object, nested
• String type can have arbitrary analyzers
• A field can be split into multiple fields

Page 29: Elasticsearch first-steps

{
  "text": {
    "type": "multi_field",
    "fields": {
      "text": {
        "type": "string",
        "index": "analyzed",
        "index_analyzer": "whitespace",
        "analyzer": "whitespace"
      },
      "text_bigram": {
        "type": "string",
        "index": "analyzed",
        "index_analyzer": "bigram_analyzer",
        "search_analyzer": "bigram_analyzer"
      },
      "text_trigram": {
        "type": "string",
        "index": "analyzed",
        "index_analyzer": "trigram_analyzer",
        "search_analyzer": "trigram_analyzer"
      }
    }
  }
}

Page 30: Elasticsearch first-steps

Mapping - lessons

• schema can evolve (e.g. add fields)
• inferred if not specified (!)
• worst case: reindex
• use aliases to enable zero downtime
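The last bullet relies on the aliases API: clients query an alias, and after a reindex the alias is moved to the new index in a single request. A minimal sketch of the request body, assuming a hypothetical alias "products" that moves from index "products_v1" to "products_v2" (none of these names come from the talk):

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases: detach the alias from the old index, attach it to the new one."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, for illustration only.
body = alias_swap_body("products", "products_v1", "products_v2")
payload = json.dumps(body)
```

Because both actions travel in one request, clients that only know the alias name do not need to change when the new index takes over.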

Page 31: Elasticsearch first-steps

Search with Facets

final TermsFacetBuilder userFacet = FacetBuilders.termsFacet(MENTION_FACET_NAME)
    .field(USER_ID)
    .size(maxUsersAmount);

SearchResponse response;
response = client.prepareSearch(Indices.USERS)
    .setTypes(USER_TYPE)
    .setQuery(someQuery)
    .setSize(0)
    .setSearchType(SearchType.COUNT)
    .addFacet(userFacet)
    .execute()
    .actionGet();

final TermsFacet facets = (TermsFacet) response.getFacets().facetsAsMap()
    .get(MENTION_FACET_NAME);

Page 32: Elasticsearch first-steps

Query

Facets

Page 33: Elasticsearch first-steps

Date Histogram Facet

The histogram facet works on numeric data: it builds a histogram across fixed-size intervals of the field's values, placing each value in a “bucket”. The date histogram facet does the same for date fields.

Page 34: Elasticsearch first-steps

{
  "query": {
    "match_all": {}
  },
  "facets": {
    "histo1": {
      "histogram": {
        "field": "followers",
        "interval": 10
      }
    }
  }
}
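The bucketing behind this request can be sketched in a few lines: each value is rounded down to a multiple of the interval, and values sharing a key are counted together. This is a simplified model for non-negative integers, not Elasticsearch's implementation, and the follower counts are invented:

```python
from collections import Counter


def histogram(values, interval):
    """Count values per bucket, where the bucket key is the value rounded down to the interval."""
    counts = Counter((value // interval) * interval for value in values)
    return sorted(counts.items())


# Invented follower counts, bucketed with the interval of 10 used in the request.
followers = [3, 7, 12, 18, 19, 25, 40]
buckets = histogram(followers, 10)
# buckets == [(0, 2), (10, 3), (20, 1), (40, 1)]
```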

Page 35: Elasticsearch first-steps

Facets - lessons

Bug in 0.90.x:
• https://github.com/elasticsearch/elasticsearch/issues/1305 *

Solutions:
• use 1 shard
• ask for top 100 instead of 10

* will be solved in 1.0 with aggregation module
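The inaccuracy comes from how per-shard results are merged: each shard returns only its own top terms, so a term that ranks just below the cut on every shard can be missed. A simplified model of that merge (not Elasticsearch's actual code; the shard contents are invented) shows why both workarounds help:

```python
from collections import Counter


def terms_facet(shards, size):
    """Merge the top `size` terms of every shard, then keep the overall top `size`."""
    merged = Counter()
    for term_counts in shards:
        top_on_shard = Counter(term_counts).most_common(size)
        merged.update(dict(top_on_shard))
    return merged.most_common(size)


# Invented data: "y" is second on both shards but first overall (4 + 4 = 8).
shards = [
    {"x": 5, "y": 4},
    {"z": 5, "y": 4},
]

top1 = terms_facet(shards, 1)             # "y" never leaves its shards, so it is missing
top1_from_top2 = terms_facet(shards, 2)[:1]  # ask for more, truncate on the client
# top1_from_top2 == [("y", 8)]
```

Asking for more terms than needed narrows the gap but does not guarantee exact counts in general. With a single shard there is nothing to merge, so the counts are exact.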

Page 36: Elasticsearch first-steps

Analyzers

A Lucene analyzer consists of a tokenizer and any number of token filters (plus optional char filters)

Page 37: Elasticsearch first-steps

{
  "index": {
    "analysis": {
      "filter": {
        "bigram_shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 2,
          "min_shingle_size": 2,
          "output_unigrams": "false",
          "output_unigrams_if_no_shingles": "false"
        },
        "trigram_shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 3,
          "min_shingle_size": 3,
          "output_unigrams": "false",
          "output_unigrams_if_no_shingles": "false"
        }
      },
      "analyzer": {
        "bigram_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "standard", "bigram_shingle_filter" ]
        },
        "trigram_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "standard", "trigram_shingle_filter" ]
        }
      }
    }
  }
}
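What the bigram and trigram analyzers emit can be approximated in a few lines. A sketch assuming whitespace tokenization and a fixed shingle size with unigrams disabled, as in the settings above; the "standard" token filter is left out of this approximation, and the sample sentence is invented:

```python
def shingles(text, size):
    """Split on whitespace and return every run of `size` adjacent tokens, joined by a space."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


bigrams = shingles("the quick brown fox", 2)
# bigrams == ["the quick", "quick brown", "brown fox"]
trigrams = shingles("the quick brown fox", 3)
# trigrams == ["the quick brown", "quick brown fox"]
too_short = shingles("fox", 2)
# too_short == [] because unigrams are not emitted when no shingle fits
```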

Page 38: Elasticsearch first-steps

Relations between Documents

[Diagram: Book N : 1 Author]

• nested: faster reads, update needs reindex, cross object match
• parent/child: same shard, no reindex on update, difficult sorting

Page 39: Elasticsearch first-steps

Nested Documents

Declare the Book type as “nested” in the Author mapping

We can then find Authors with a query on the properties of their nested Books

“Authors who published at least one book with Penguin in the scifi genre”

Page 40: Elasticsearch first-steps

curl -XGET localhost:9200/authors/nested_author/_search -d '{
  "query": {
    "filtered": {
      "query": {"match_all": {}},
      "filter": {
        "nested": {
          "path": "books",
          "query": {
            "filtered": {
              "query": {"match_all": {}},
              "filter": {
                "and": [
                  {"term": {"books.publisher": "penguin"}},
                  {"term": {"books.genre": "scifi"}}
                ]
              }
            }
          }
        }
      }
    }
  }
}'
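A likely reason for declaring books as nested is to avoid matches that mix fields from different books. When an array of plain objects is flattened into per-field value lists, the link between one book's publisher and its genre is lost. The sketch below models both behaviours on an invented author; it illustrates the idea and is not how Elasticsearch evaluates the query:

```python
def flat_match(author, publisher, genre):
    """Plain object array: each field is matched against the values of all books together."""
    publishers = {book["publisher"] for book in author["books"]}
    genres = {book["genre"] for book in author["books"]}
    return publisher in publishers and genre in genres


def nested_match(author, publisher, genre):
    """Nested: both conditions must hold on the same book."""
    return any(
        book["publisher"] == publisher and book["genre"] == genre
        for book in author["books"]
    )


# Invented data: a Penguin crime novel and a scifi novel from another publisher.
author = {
    "name": "Example Author",
    "books": [
        {"publisher": "penguin", "genre": "crime"},
        {"publisher": "orbit", "genre": "scifi"},
    ],
}

flat = flat_match(author, "penguin", "scifi")      # True: a cross-object match
nested = nested_match(author, "penguin", "scifi")  # False: no single book has both
```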

Page 41: Elasticsearch first-steps

Parent and Child

Indexing happens separately

Specify _parent type in Child mapping (Book)

When indexing Books, specify id of Author

Page 42: Elasticsearch first-steps

curl -XPOST localhost:9200/authors/book/_mapping -d '{
  "book": {
    "_parent": {"type": "bare_author"}
  }
}'

curl -XPOST 'localhost:9200/authors/book/1?parent=2' -d '{
  "name": "Revelation Space",
  "genre": "scifi",
  "publisher": "penguin"
}'

Page 43: Elasticsearch first-steps

Parent and Child - query

curl -XPOST localhost:9200/authors/bare_author/_search -d '{
  "query": {
    "has_child": {
      "type": "book",
      "query": {
        "filtered": {
          "query": {"match_all": {}},
          "filter": {
            "and": [
              {"term": {"publisher": "penguin"}},
              {"term": {"genre": "scifi"}}
            ]
          }
        }
      }
    }
  }
}'

Page 44: Elasticsearch first-steps

Data Design

Index Configurations

• One index “per user”
• Single index
• SI + Routing: 1 index + custom doc routing to shards
• Time: 1 index per time window *

* we can search across indices

Page 45: Elasticsearch first-steps

One Index per user

[Diagram: nodes Hulk and Thor holding shards User1 s0, User1 s1 and User2 s0]

+ different sharding per user
- small users own (and cost) at least 1 shard

Page 46: Elasticsearch first-steps

Single Index

[Diagram: nodes Hulk and Thor holding shards Users s0, Users s2 and Users s3]

+ filter by user id, support growth
- search hits all shards

Page 47: Elasticsearch first-steps

Single Index + routing

[Diagram: nodes Hulk and Thor holding shards Users s0, Users s2 and Users s3]

+ a user’s data is all in one shard, allows large overallocation
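Custom routing works because the target shard is derived from a routing value: shard = hash(routing) % number_of_primary_shards. If the user id is used as the routing value, every document of that user lands on the same shard, and a search routed the same way only has to visit that shard. A minimal sketch, with zlib.crc32 standing in for Elasticsearch's own hash function (an assumption made for illustration) and invented ids:

```python
import zlib


def shard_for(routing_value, number_of_shards):
    """Pick a shard from the routing value; crc32 is a stand-in for the real hash."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


NUMBER_OF_SHARDS = 4  # assumption: a small, overallocated index

# Invented documents that all belong to the same user.
doc_ids = ["tweet-1", "tweet-2", "tweet-3"]

# Routing by user id: one shard, whatever the document id is.
shards_with_routing = {shard_for("user1", NUMBER_OF_SHARDS) for _ in doc_ids}

# Default behaviour routes by document id, so each document is placed independently.
shards_by_doc_id = [shard_for(doc_id, NUMBER_OF_SHARDS) for doc_id in doc_ids]
```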

Page 48: Elasticsearch first-steps

Index per time range

[Diagram: nodes Hulk and Thor holding shards 2013_01 s1, 2013_01 s2 and 2013_02 s1]

+ allows change in future indices
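With one index per time window, a query over a date range only needs the indices that overlap that range, and several index names can be listed in one search URL separated by commas. A sketch assuming monthly indices named like the 2013_01 in the diagram; the helper names and dates are made up for illustration:

```python
from datetime import date


def monthly_indices(start, end):
    """Names of the monthly indices (YYYY_MM) covering the inclusive range start..end."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def search_path(indices):
    """Search URL path addressing several indices at once."""
    return "/" + ",".join(indices) + "/_search"


indices = monthly_indices(date(2012, 12, 15), date(2013, 2, 3))
# indices == ["2012_12", "2013_01", "2013_02"]
path = search_path(indices)
# path == "/2012_12,2013_01,2013_02/_search"
```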

Page 49: Elasticsearch first-steps

Data Design - lessons

Test, test, test your use case!

Take a single node with one shard and throw load at it, checking the shard capacity

The shard is the scaling unit: overallocate to enable future scaling

#shards > #nodes
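Overallocation pays off because the number of primary shards is chosen when the index is created, while nodes can be added later. A toy round-robin placement (the real allocator also handles replicas and balancing, which this sketch ignores) shows that extra nodes only help when there are shards left to move:

```python
def allocate(number_of_shards, nodes):
    """Spread shard numbers over the nodes round-robin."""
    allocation = {node: [] for node in nodes}
    for shard in range(number_of_shards):
        allocation[nodes[shard % len(nodes)]].append(shard)
    return allocation


# Four shards on one node: everything fits, with room to grow.
one_node = allocate(4, ["Hulk"])
# one_node == {"Hulk": [0, 1, 2, 3]}

# Adding Thor lets half of the shards move over.
two_nodes = allocate(4, ["Hulk", "Thor"])
# two_nodes == {"Hulk": [0, 2], "Thor": [1, 3]}

# With a single shard the second node has nothing to take.
single_shard = allocate(1, ["Hulk", "Thor"])
# single_shard == {"Hulk": [0], "Thor": []}
```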

Page 50: Elasticsearch first-steps

...ES has lots of other features!

• Bulk operations
• Percolator (alerts, classification, …)
• Suggesters (“Did you mean …?”)
• Index templates (Automatic index configuration)
• Monitoring API (Amount of memory used, number of operations, …)
• Plugins
• ...

Page 51: Elasticsearch first-steps

Thanks!

@matteomoci

http://mox.fm