Distributed percolator in Elasticsearch
![Page 1: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/1.jpg)
Martijn van Groningen (@mvgroningen)
Percolator
Thursday, September 5, 13
![Page 2: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/2.jpg)
Topics
• What is the percolator?
• Redesigned percolator
• New percolator features
• How does the percolator work?
![Page 3: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/3.jpg)
Percolator?
Query:
coffee OR pots
Documents (the slide shows the same example document four times):
Title: Coffee percolator
Body: A coffee percolator is a type of pot used to brew coffee by continually cycling the boiling or nearly-boiling brew through ...
Hits:
1. Coffee percolator
2. Plain old telephone service (pots)
...
![Page 4: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/4.jpg)
Percolator?
Document:
Title: Coffee percolator
Body: A coffee percolator is a type of pot used to brew coffee by continually cycling the boiling or nearly-boiling brew through ...
Queries:
• coffee OR pots
• boiling AND brew
• other AND stuff
Matches:
1. coffee OR pots
2. boiling AND brew
...
![Page 5: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/5.jpg)
Percolator?
• Reversed search.
• The document becomes a query and a query becomes a document.
• Queries need to be stored.
• matches != hits, because hits have relevancy whereas matches do not.
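The reversed-search idea can be illustrated with a toy model. This is only a sketch in plain Python, not Elasticsearch code: the helpers `has_term`, `any_of`, `all_of` and `percolate` are hypothetical names, and stored queries are modelled as simple predicates over a document's body.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
StoredQuery = Callable[[Document], bool]


def has_term(term: str) -> StoredQuery:
    """Hypothetical helper: true when `term` is a whole word in the body."""
    def check(doc: Document) -> bool:
        return term in doc.get("body", "").lower().split()
    return check


def any_of(*parts: StoredQuery) -> StoredQuery:
    """OR of several stored queries."""
    return lambda doc: any(part(doc) for part in parts)


def all_of(*parts: StoredQuery) -> StoredQuery:
    """AND of several stored queries."""
    return lambda doc: all(part(doc) for part in parts)


# The queries are what gets stored; the document arrives at request time.
registered: Dict[str, StoredQuery] = {
    "coffee OR pots": any_of(has_term("coffee"), has_term("pots")),
    "boiling AND brew": all_of(has_term("boiling"), has_term("brew")),
    "other AND stuff": all_of(has_term("other"), has_term("stuff")),
}


def percolate(doc: Document, queries: Dict[str, StoredQuery]) -> List[str]:
    """Return the ids of all stored queries that match the document (no scores)."""
    return [query_id for query_id, query in queries.items() if query(doc)]


doc = {
    "title": "Coffee percolator",
    "body": "A coffee percolator is a type of pot used to brew coffee by "
            "continually cycling the boiling or nearly-boiling brew through",
}
matches = percolate(doc, registered)
print(matches)
```

The result is a plain list of query ids, which mirrors the point that matches, unlike hits, carry no relevancy.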
![Page 6: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/6.jpg)
Percolator, but how?
• Search request:
curl -XPOST 'localhost:9200/my-index/_search' -d '{
  "query" : { "match" : { "body" : "coffee" } }
}'
• Queries are defined in JSON.
• But so are documents!
![Page 7: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/7.jpg)
Percolator, but how?
• Indexing a query (<=0.90):
curl -XPUT 'localhost:9200/_percolator/my-index/my-id' -d '{
  "query" : { "match" : { "body" : "coffee" } },
  "click_id" : 12
}'
• Any query can be indexed as a document, plus any arbitrary data.
![Page 8: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/8.jpg)
Percolator, but how?
• Indexing a query:
curl -XPUT 'localhost:9200/_percolator/my-index/my-id' -d '{
  "query" : { "match" : { "body" : "coffee" } },
  "click_id" : 12
}'
• Path structure
  index: _percolator is a reserved index for queries.
  type: The index to register a query to.
  id: The unique identifier for a query.
![Page 9: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/9.jpg)
Percolator, but how?
• Percolate api (<=0.90):
curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{
  "doc" : {
    "title" : "Coffee percolator",
    "body" : "A coffee percolator is a type of ..."
  }
}'
• All queries registered to ‘my-index’ are consulted.
![Page 10: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/10.jpg)
Percolator, but how?
• Percolate api response (<=0.90):
{
  "ok" : true,
  "matches" : ["my-id",...]
}
• A simple list of query ids.
• Also, the percolate api works in realtime.
![Page 11: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/11.jpg)
Percolation in the wild
![Page 12: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/12.jpg)
Alerting use case
• Store and register queries that monitor data.
  End users can define their alerts via the application.
• Execute the percolate api right after indexing.
  No need to wait - the percolator works in realtime.
• Examples: price monitor, news alerts, stock alerts, weather alerts.
![Page 13: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/13.jpg)
Alerting use case
Triggered by a user adding a user alert:
curl -XPUT 'localhost:9200/_percolator/prices/user-1' -d '{
  "query" : {
    "bool" : {
      "must" : [
        { "range" : { "product.price" : { "lte" : 500 } } },
        { "match" : { "product.name" : "my led tv" } }
      ]
    }
  }
}'
![Page 14: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/14.jpg)
Percolator - alerting use case
Then when new TVs are added:
curl -XPOST 'localhost:9200/prices/price/_percolate' -d '{
  "doc" : {
    "product" : { "name" : "my led tv", "price" : 499 }
  }
}'
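The alerting flow can be sketched as a small evaluator. This is a toy model under stated assumptions, not how Elasticsearch evaluates queries: the functions `get_path`, `query_matches` and `alerts_for` are hypothetical, only `bool`/`must`, `range` with `lte`, and a simplified `match` (any term matches) are handled, and the alert is written with an explicit `must` clause.

```python
from typing import Any, Dict, List


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted field name such as 'product.price' in a nested dict."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def query_matches(query: Dict[str, Any], doc: Dict[str, Any]) -> bool:
    """Evaluate a tiny subset of the query DSL against one document."""
    kind, body = next(iter(query.items()))
    if kind == "bool":
        return all(query_matches(clause, doc) for clause in body.get("must", []))
    if kind == "range":
        field, bounds = next(iter(body.items()))
        value = get_path(doc, field)
        return value is not None and value <= bounds["lte"]
    if kind == "match":
        field, text = next(iter(body.items()))
        words = str(get_path(doc, field) or "").lower().split()
        return any(term in words for term in text.lower().split())
    raise ValueError(f"unsupported query type: {kind}")


# One stored alert per user, keyed by the query id used on the slide.
alerts: Dict[str, Dict[str, Any]] = {
    "user-1": {
        "bool": {
            "must": [
                {"range": {"product.price": {"lte": 500}}},
                {"match": {"product.name": "my led tv"}},
            ]
        }
    }
}


def alerts_for(doc: Dict[str, Any]) -> List[str]:
    """Return the ids of the alerts triggered by a newly added product."""
    return [alert_id for alert_id, query in alerts.items() if query_matches(query, doc)]


cheap_tv = {"product": {"name": "my led tv", "price": 499}}
pricey_tv = {"product": {"name": "my led tv", "price": 650}}
print(alerts_for(cheap_tv), alerts_for(pricey_tv))
```

In an application the returned ids would be used to look up which users to notify.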
![Page 15: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/15.jpg)
Pricing use case
• Store all users’ queries of a specific time frame.
  Last week’s, last month’s queries.
• Provide feedback to the advertisement owner.
  Execute the percolate api while editing the ad.
• Examples: real estate, car sales or any other marketplace.
![Page 16: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/16.jpg)
Contextual ads use case
• Store advertisements as queries.
• On page display, percolate the document against the stored advertisements.
• Examples: Gmail
![Page 17: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/17.jpg)
Classification use case
• Store queries that can identify patterns in your documents.
• Percolate a document before indexing it.
  Enrich the document with the queries it matches.
• Examples: automatically tag documents, geo tag documents and other ways to automatically categorize documents.
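As a rough illustration of the classification flow, the sketch below enriches a document with tags before it would be indexed. The tag names, the `tag_queries` table and the `classify` function are hypothetical, and each stored query is reduced to a list of terms of which any one may match.

```python
from typing import Any, Dict, List

# Hypothetical stored queries: tag name -> terms that identify the pattern.
tag_queries: Dict[str, List[str]] = {
    "coffee": ["coffee", "espresso"],
    "telephony": ["telephone", "pots"],
}


def classify(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document enriched with the tags of matching queries."""
    words = set(str(doc.get("body", "")).lower().split())
    tags = [tag for tag, terms in tag_queries.items() if words & set(terms)]
    return {**doc, "tags": tags}


enriched = classify({"body": "A coffee percolator is a type of pot"})
print(enriched["tags"])
```

The enriched copy, not the original, is what would then be sent to the index.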
![Page 18: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/18.jpg)
Distributed Percolator
![Page 19: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/19.jpg)
Percolator - redesign
• The _percolator index can only have one primary shard.
[Diagram: a client (C) sends a percolate request to a three-node cluster; Node 1, Node 2 and Node 3 each hold a copy of the single shard p1.]
![Page 20: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/20.jpg)
Percolator - redesign
• The redesigned percolator has no dedicated reserved _percolator index.
• Instead the redesigned percolator has a _percolator type / mapping.
• Any index can become a percolator index, without any restrictions on (sharding) settings.
![Page 21: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/21.jpg)
Percolator - redesign
• Because the _percolator index has been replaced by the _percolator type:
• Queries and your data coexist in the same index.
  The percolator shares the settings of the index it sits in.
• Or have a number of dedicated percolator indices.
![Page 22: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/22.jpg)
Percolator - redesign
• Redesigned percolator is fully distributed.
[Diagram: a client (C) sends a percolate request; index a has two shards, a1 and a2, each with a copy on Node 1 and on Node 2.]
![Page 23: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/23.jpg)
Percolator - redesign
• Indexing a query:
curl -XPUT 'localhost:9200/my-index/_percolator/my-id' -d '{
  "query" : { "match" : { "body" : "coffee" } },
  "click_id" : 12
}'
• Path structure
  index: The index to hold the query.
  type: The reserved _percolator type.
  id: The unique identifier for a query.
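The difference between the two path structures can be made explicit with a small sketch. The function names `legacy_registration_path` and `redesigned_registration_path` are hypothetical; only the path layouts and the request body come from the slides.

```python
import json


def legacy_registration_path(index: str, query_id: str) -> str:
    """<=0.90 layout: reserved _percolator index, target index used as the type."""
    return f"/_percolator/{index}/{query_id}"


def redesigned_registration_path(index: str, query_id: str) -> str:
    """Redesigned layout: any index, reserved _percolator type."""
    return f"/{index}/_percolator/{query_id}"


# The request body is the same in both layouts: a query plus arbitrary data.
body = json.dumps({"query": {"match": {"body": "coffee"}}, "click_id": 12})

print(legacy_registration_path("my-index", "my-id"))
print(redesigned_registration_path("my-index", "my-id"))
```

Only the position of the index name and of `_percolator` swaps; the query id stays last.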
![Page 24: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/24.jpg)
Percolator - redesign
• Percolate api remains similar, but:
Fully multi tenant:
curl -XGET 'localhost:9200/my-index1,my-index2/my-type/_percolate' -d '{
  "doc" : {
    "title" : "Coffee percolator",
    "body" : "A coffee percolator is a type of ..."
  }
}'
Full alias support:
curl -XGET 'localhost:9200/my-alias/my-type/_percolate' -d '{
  "doc" : {
    "title" : "Coffee percolator",
    "body" : "A coffee percolator is a type of ..."
  }
}'
And routing support.
![Page 25: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/25.jpg)
Percolator - redesign
• Percolate api response:
{
  "took" : 19,
  "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 },
  "count" : 4,
  "matches" : [
    { "_index" : "my-index1", "_id" : "my-id" },
    { "_index" : "my-index2", "_id" : "my-id" },
    ...
  ]
}
![Page 26: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/26.jpg)
Percolator - how does it work?
• Each shard holds a collection of parsed queries in memory.
• The queries are also stored on the shard (Lucene index).
• The collection of queries gets updated by every index, create, update or delete operation in realtime.
![Page 27: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/27.jpg)
Percolator - how does it work?
• During percolating, the document to be percolated gets indexed into an in-memory index.
• All shard queries are executed against this one-document in-memory index.
  Shard-level execution time is linear in the number of queries to evaluate.
• After all queries have been evaluated, the in-memory index gets cleaned up.
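The shard-level steps above can be sketched as follows. This is a simplified model, not the actual implementation: `ShardPercolator` is a hypothetical class, each registered query is reduced to a set of terms that must all be present, and the "in-memory index" is a plain dictionary from term to positions.

```python
from typing import Dict, List, Set


class ShardPercolator:
    """Toy stand-in for one shard's in-memory collection of parsed queries."""

    def __init__(self) -> None:
        self.queries: Dict[str, Set[str]] = {}

    def register(self, query_id: str, terms: List[str]) -> None:
        # Index/create/update operations refresh the in-memory collection.
        self.queries[query_id] = {term.lower() for term in terms}

    def unregister(self, query_id: str) -> None:
        # Delete operations remove the query from the collection.
        self.queries.pop(query_id, None)

    def percolate(self, body: str) -> List[str]:
        # 1. Index the single document into a temporary in-memory index.
        index: Dict[str, List[int]] = {}
        for position, token in enumerate(body.lower().split()):
            index.setdefault(token, []).append(position)
        # 2. Run every registered query against it: one pass, linear in the
        #    number of queries held by this shard.
        matches = [
            query_id
            for query_id, terms in self.queries.items()
            if all(term in index for term in terms)
        ]
        # 3. Throw the temporary index away.
        index.clear()
        return matches


shard = ShardPercolator()
shard.register("1", ["brown", "fox"])
shard.register("2", ["lazy", "cat"])
first = shard.percolate("The quick brown fox jumps over the lazy dog")
shard.unregister("1")
second = shard.percolate("The quick brown fox jumps over the lazy dog")
print(first, second)
```

Because every registered query is visited once per percolated document, reducing the number of queries per shard is the main lever for speed.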
![Page 28: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/28.jpg)
Distributed percolator
• Percolate api executes the request in parallel on all shards.
• Use routing and multi tenancy to reduce the number of queries to evaluate.
  - Routing will reduce the number of shards.
  - More indices (and therefore more shards) reduce the number of queries per shard.
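A minimal sketch of why routing reduces work, under loud assumptions: `zlib.crc32` is only a stand-in for whatever hash Elasticsearch really uses, `NUM_SHARDS`, `register` and `percolate` are hypothetical names, and each stored query is a single term. The shards are visited sequentially here, whereas the real request runs in parallel.

```python
import zlib
from typing import Dict, List, Optional, Tuple

NUM_SHARDS = 3  # assumed shard count for the toy index

# One dict per shard: query id -> the single term the query looks for.
shards: List[Dict[str, str]] = [dict() for _ in range(NUM_SHARDS)]


def shard_for(routing: str) -> int:
    """Map a routing value to a shard (stand-in hash, not the real one)."""
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


def register(query_id: str, term: str, routing: Optional[str] = None) -> None:
    """Store a query on the shard selected by its routing value (or its id)."""
    shards[shard_for(routing or query_id)][query_id] = term


def percolate(body: str, routing: Optional[str] = None) -> Tuple[List[str], int]:
    """Return matching query ids and the number of shards that were consulted."""
    words = set(body.lower().split())
    targets = [shard_for(routing)] if routing is not None else list(range(NUM_SHARDS))
    matches = sorted(
        query_id
        for shard_id in targets
        for query_id, term in shards[shard_id].items()
        if term in words
    )
    return matches, len(targets)


register("q1", "coffee", routing="XYZ")
register("q2", "coffee", routing="ABC")
register("q3", "tea", routing="XYZ")

all_matches, all_shards = percolate("coffee percolator")
routed_matches, routed_shards = percolate("coffee percolator", routing="XYZ")
print(all_matches, all_shards, routed_matches, routed_shards)
```

With routing, only the shard that holds the `XYZ` queries is consulted, so queries registered under other routing values are normally never evaluated.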
![Page 29: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/29.jpg)
Distributed percolator
• No routing / partitioning
[Diagram: a client (C) sends a percolate request; index a has three shards, a1, a2 and a3, each with a copy on Node 1 and on Node 2.]
![Page 30: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/30.jpg)
Distributed percolator
• Percolating with routing:
[Diagram: a client (C) sends "Percolate, but route with XYZ"; index a has three shards, a1, a2 and a3, each with a copy on Node 1 and on Node 2.]
![Page 31: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/31.jpg)
Distributed percolator
• Percolating based on location partitioning in different indices.
index a = EU queries
index b = NA queries
[Diagram: a client (C) and two nodes; Node 1 and Node 2 each hold copies of shards a1, a2, b1 and b2.]
![Page 32: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/32.jpg)
Percolator features
![Page 33: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/33.jpg)
Feature - percolate existing doc
• Percolating a newly indexed document is a very common pattern.
my-index1 is both percolate and source index:
curl -XGET 'localhost:9200/my-index1/my-type/1/_percolate'
my-index2 contains the queries to evaluate and my-index1 contains the document to percolate:
curl -XGET 'localhost:9200/my-index1/my-type/1/_percolate?percolate_index=my-index2'
![Page 34: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/34.jpg)
Feature - count api
Count api:
curl -XGET 'localhost:9200/my-index1/my-type/_percolate/count' -d '{
  "doc" : {
    "title" : "Coffee percolator",
    "body" : "A coffee percolator is a type of ..."
  }
}'
Response:
{
  "took" : 8,
  "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 },
  "count" : 5
}
Count existing doc api:
curl -XGET 'localhost:9200/my-index1/my-type/1/_percolate/count'
![Page 35: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/35.jpg)
Feature - filtering
Filtering by query:
curl -XGET 'localhost:9200/my-index1/my-type/_percolate/count' -d '{
  "doc" : {
    "title" : "Coffee percolator",
    "body" : "A coffee percolator is a type of ..."
  },
  "query" : {
    "term" : { "click_id" : "43" }
  }
}'
Filtering by filter:
curl -XGET 'localhost:9200/my-index1/my-type/_percolate/count' -d '{
  "doc" : {
    "title" : "Coffee percolator",
    "body" : "A coffee percolator is a type of ..."
  },
  "filter" : {
    "term" : { "click_id" : "43" }
  }
}'
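The effect of filtering can be modelled as narrowing the set of stored queries, by their own metadata, before any of them is evaluated against the document. This is a toy sketch: the `registered` table, its `term` and `click_id` fields and the `percolate_count` function are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical stored queries: each has one search term plus arbitrary metadata.
registered: Dict[str, Dict[str, Any]] = {
    "1": {"term": "coffee", "click_id": 43},
    "2": {"term": "coffee", "click_id": 12},
    "3": {"term": "tea", "click_id": 43},
}


def percolate_count(body: str, click_id: Optional[int] = None) -> int:
    """Count matching queries, optionally restricted to one click_id."""
    words = set(body.lower().split())
    # Step 1: the filter selects candidate queries by their metadata.
    candidates = [
        query
        for query in registered.values()
        if click_id is None or query["click_id"] == click_id
    ]
    # Step 2: only the candidates are evaluated against the document.
    return sum(1 for query in candidates if query["term"] in words)


unfiltered = percolate_count("A coffee percolator is a type of pot")
filtered = percolate_count("A coffee percolator is a type of pot", click_id=43)
print(unfiltered, filtered)
```

Fewer candidates means fewer query evaluations, so a selective filter also shortens the percolate call.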
![Page 36: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/36.jpg)
Feature - sorting / scoring
• Builds on top of the query support.
• Sorting based on percolator query fields.
  The document being percolated isn’t scored!
• Three new options:
  • size: The number of matches to return (required with sort).
  • sort: Whether to sort based on query.
  • score: Just include score, but don’t sort.
• Like the query / filter support, not realtime.
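How `size` and `sort` interact can be sketched with a toy function. The name `top_matches` and the idea of passing precomputed scores in a dictionary are assumptions for illustration; in the percolator the scores come from the query that runs over the stored queries.

```python
from typing import Dict, List, Tuple


def top_matches(scores: Dict[str, float], size: int, sort: bool) -> List[Tuple[str, float]]:
    """Return at most `size` (query id, score) pairs, best first when sorting."""
    items = list(scores.items())
    if sort:
        # Highest score first; without sort the scores are only attached.
        items.sort(key=lambda pair: pair[1], reverse=True)
    return items[:size]


# Scores of the matching stored queries (the document itself is not scored).
matching = {"1": 0.4002574, "2": 0.85559505}

ranked = top_matches(matching, size=10, sort=True)
scored_only = top_matches(matching, size=10, sort=False)
print(ranked, scored_only)
```

A small `size` keeps the response bounded even when many stored queries match.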
![Page 37: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/37.jpg)
Feature - sorting / scoring
• Sorting support works nicely with function score query.
curl -XGET 'localhost:9200/my-index1/my-type/_percolate' -d '{
  "doc" : { ... },
  "query" : {
    "function_score" : {
      "query" : { "match_all": {} },
      "functions" : [
        { "exp" : { "create_date" : { "reference" : "2013/08/14", "scale" : "1000d" } } }
      ]
    }
  },
  "sort" : true,
  "size" : 10
}'
Field in query: create_date
![Page 38: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/38.jpg)
Feature - sorting / scoring
• Response:
{
  "took": 2,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "total": 2,
  "matches": [
    { "_index": "my-index", "_id": "2", "_score": 0.85559505 },
    { "_index": "my-index", "_id": "1", "_score": 0.4002574 }
  ]
}
![Page 39: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/39.jpg)
Feature - highlighting
• Let's index two queries:
curl -XPUT 'localhost:9200/my-index/_percolator/1' -d '{
  "query": { "match" : { "body" : "brown fox" } }
}'
curl -XPUT 'localhost:9200/my-index/_percolator/2' -d '{
  "query": { "match" : { "body" : "lazy dog" } }
}'
![Page 40: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/40.jpg)
Feature - highlighting
• The size option is required.
• All highlight options are supported.
curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{
  "doc" : {
    "body" : "The quick brown fox jumps over the lazy dog"
  },
  "highlight" : {
    "fields" : { "body" : {} }
  },
  "size" : 5
}'
![Page 41: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/41.jpg)
Feature - highlighting
{
  ...
  "total": 2,
  "matches": [
    {
      "_index": "my-index",
      "_id": "1",
      "highlight": {
        "body": [ "The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" ]
      }
    },
    {
      "_index": "my-index",
      "_id": "2",
      "highlight": {
        "body": [ "The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" ]
      }
    }
  ]
}
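The shape of this response can be reproduced with a toy highlighter. This is only a sketch: `highlight` and `percolate_with_highlight` are hypothetical functions, each stored match query is reduced to its list of terms, and a query counts as matching when any of its terms occurs in the body.

```python
from typing import Any, Dict, List


def highlight(text: str, terms: List[str]) -> str:
    """Wrap every word of `text` that equals one of `terms` in <em> tags."""
    wanted = {term.lower() for term in terms}
    return " ".join(
        f"<em>{word}</em>" if word.lower() in wanted else word
        for word in text.split()
    )


# The two stored queries from the slides, reduced to their terms.
stored_queries: Dict[str, List[str]] = {
    "1": ["brown", "fox"],
    "2": ["lazy", "dog"],
}


def percolate_with_highlight(body: str, size: int) -> List[Dict[str, Any]]:
    """Return up to `size` matches, each with the body highlighted per query."""
    words = {word.lower() for word in body.split()}
    matches = [
        {"_id": query_id, "highlight": {"body": [highlight(body, terms)]}}
        for query_id, terms in stored_queries.items()
        if words & set(terms)
    ]
    return matches[:size]


result = percolate_with_highlight("The quick brown fox jumps over the lazy dog", size=5)
for match in result:
    print(match["_id"], match["highlight"]["body"][0])
```

Each match highlights the same document differently, because the highlighted terms come from the stored query rather than from the request.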
![Page 42: Distributed percolator in elasticsearch](https://reader033.vdocuments.site/reader033/viewer/2022051412/549351c3ac7959482e8b481f/html5/thumbnails/42.jpg)
Feature - multi percolate
• Combine multiple percolate requests into a single request.
Request:
curl -XGET 'localhost:9200/_mpercolate' --data-binary @requests.txt; echo
requests.txt:
{"percolate" : {"index" : "my-index", "type" : "my-tweet"}}
{"doc" : {"title" : "coffee percolator"}}
{"percolate" : {"index" : "my-index", "type" : "my-type", "id" : "1"}}
{}
{"count" : {"index" : "my-index", "type" : "my-type"}}
{"doc" : {"title" : "coffee percolator"}}
{"count" : {"index" : "my-index", "type" : "my-type", "id" : "1"}}
{}
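Writing the request file by hand is error-prone, since every header and every body has to be a complete JSON object on its own line. A small helper can generate it; `build_mpercolate_body` is a hypothetical function, and the header/body pairs are the ones listed for requests.txt.

```python
import json
from typing import Any, Dict, List, Tuple

RequestPair = Tuple[Dict[str, Any], Dict[str, Any]]


def build_mpercolate_body(items: List[RequestPair]) -> str:
    """Serialize (header, body) pairs as newline-delimited JSON, one object per line."""
    lines: List[str] = []
    for header, body in items:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # A trailing newline terminates the last line of the file.
    return "\n".join(lines) + "\n"


doc = {"doc": {"title": "coffee percolator"}}
items: List[RequestPair] = [
    ({"percolate": {"index": "my-index", "type": "my-tweet"}}, doc),
    ({"percolate": {"index": "my-index", "type": "my-type", "id": "1"}}, {}),
    ({"count": {"index": "my-index", "type": "my-type"}}, doc),
    ({"count": {"index": "my-index", "type": "my-type", "id": "1"}}, {}),
]

payload = build_mpercolate_body(items)
print(payload, end="")
```

The resulting string can be written to requests.txt and sent with the `--data-binary` option, which preserves the newlines.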