Rome – 8 February 2017
Presented by Alberto Paro, Seacom

Elasticsearch 5.x: New Tricks
Alberto Paro – MSc in Computer Engineering (POLIMI)
Author of 3 books on Elasticsearch, covering 1.x to 5.x, plus 6 tech reviews
I work mainly in Scala and on big data technologies (Akka, Spray.io, Play Framework, Apache Spark) and NoSQL datastores (Accumulo, Cassandra, Elasticsearch and MongoDB)
Evangelist for the Scala and Scala.js languages
Tip 1: Shrink - 1/5
Why?
- The number of shards chosen during the initial design sizing was wrong: sizing shards without knowing the real data/text distribution tends to oversize the number of shards.
- Reducing the number of shards reduces memory and resource usage.
- Reducing the number of shards speeds up searching.
Tip 1: Shrink - 2/5 - Where is your data?
We can retrieve it via the _nodes API:

curl -XGET 'http://localhost:9200/_nodes?pretty'

The result will contain a section similar to:

....
"nodes" : {
  "5Sei9ip8Qhee3J0o9dTV4g" : {
    "name" : "Gin Genie",
    "transport_address" : "127.0.0.1:9300",
    "host" : "127.0.0.1",
    "ip" : "127.0.0.1",
    "version" : "5.1.1",
....

The name of my node is Gin Genie.
Tip 1: Shrink - 3/5 - Relocate your data
We can change the index settings, forcing allocation to a single node for our index and disabling writes to it:

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "settings": {
    "index.routing.allocation.require._name": "Gin Genie",
    "index.blocks.write": true
  }
}'

We can check for the green status:

curl -XGET 'http://localhost:9200/_cluster/health?pretty'
Tip 1: Shrink - 4/5 – Shrink our shards
Writes to the index must be disabled before shrinking (the setting goes in the request body, as in the previous step):

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.blocks.write": true
}'
The shrink call that creates the reduced_index is:

curl -XPOST 'http://localhost:9200/myindex/_shrink/reduced_index' -d '{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  },
  "aliases": { "my_search_indices": {} }
}'
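One constraint to keep in mind: the target's number_of_shards must be a factor of the source index's shard count (shrinking to 1 always works). A tiny offline sketch, not part of Elasticsearch, to list the valid targets:

```python
def valid_shrink_targets(source_shards):
    """Shard counts a source index can be shrunk to: every factor
    of the original shard count (1 is always valid)."""
    return [n for n in range(1, source_shards + 1) if source_shards % n == 0]

valid_shrink_targets(8)  # -> [1, 2, 4, 8]
```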
Tip 1: Shrink - 5/5 – Post Shrinking
We can wait for a yellow status to know when the index is ready to work:

curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow'

Now we can re-enable writes by changing the index settings:

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.blocks.write": false
}'
Tip 2: Reindex - 1/2
Why?
- Changing an analyzer for a mapping
- Adding a new subfield to a mapping, when you need to reprocess all the records to search on the new subfield
- Removing an unused mapping
- Changing a record structure that requires a new mapping
Tip 2: Reindex - 2/2
curl -XPOST 'http://localhost:9200/_reindex?pretty=true' -d '{
  "source": {
    "index": "myindex",
    "type": "mytype",
    "query": { … }
  },
  "dest": {
    "index": "myindex2"
  },
  "script": { … }
}'

Note that the optional script lives at the top level of the _reindex body, next to source and dest.
Tip 3: Update By Query with Painless
Add a new field:
1. Create your mapping (i.e. modified: date)
2. Call an update by query:

curl -XPOST "http://$server/$index/$mapping/_update_by_query" -d '{
  "script": {
    "inline": "ctx._source.modified=\"2015-10-06T00:00:00.000+00:00\"",
    "lang": "painless"
  },
  "query": {
    "bool": { "must_not": [ { "exists": { "field": "modified" } } ] }
  }
}'
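Conceptually, the call above touches only the documents that are missing the field and stamps them. A hypothetical offline sketch of that behavior (plain Python, not the Elasticsearch API):

```python
# Mimic of the update-by-query above: the bool/must_not/exists query
# selects documents without "modified"; the script then sets it.
docs = [
    {"_id": "1", "_source": {"title": "a"}},
    {"_id": "2", "_source": {"title": "b",
                             "modified": "2014-01-01T00:00:00.000+00:00"}},
]

STAMP = "2015-10-06T00:00:00.000+00:00"

def update_by_query(docs, stamp):
    """Return how many documents were updated."""
    updated = 0
    for doc in docs:
        source = doc["_source"]
        if "modified" not in source:      # bool/must_not/exists filter
            source["modified"] = stamp    # ctx._source.modified = "..."
            updated += 1
    return updated

update_by_query(docs, STAMP)  # only doc "1" is touched
```

Running the same call twice is therefore safe: the second pass matches nothing.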
Tip 4: Use search_after
Step 1:

curl -XGET "http://$server/$index/$type/_search" -d '{
  "size": 100,
  "query": { "match_all": {} },
  "sort": [ { "_uid": "desc" } ]
}'

Step n, n>1:

curl -XGET "http://$server/$index/$type/_search" -d '{
  "size": 100,
  "query": { "match_all": {} },
  "search_after": ["$type#100"],
  "sort": [ { "_uid": "desc" } ]
}'
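The key mechanism: each page after the first passes the sort values of the last hit of the previous page as search_after. A minimal sketch of building the next-page body (hypothetical helper names, offline, not calling Elasticsearch):

```python
import json

def next_page_body(last_sort_values, size=100):
    """Body for pages after the first; the first page simply
    omits "search_after". last_sort_values is the "sort" array
    returned with the last hit of the previous page."""
    return {
        "size": size,
        "query": {"match_all": {}},
        "search_after": last_sort_values,
        "sort": [{"_uid": "desc"}],
    }

body = next_page_body(["mytype#100"])
print(json.dumps(body))
```

Unlike from/size pagination, this keeps the cost of deep pages constant, because the engine can seek directly past the last sort values.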
Tip 5: Reindex from a remote node – 1/2
Why?
- A backup is a raw Lucene index copy, so it depends on the Elasticsearch version used. If you are switching from a version of Elasticsearch prior to 5.x, it's not possible to restore old indices.
- It's not possible to restore backups of a newer Elasticsearch version in an older version: restore is only forward-compatible.
- It's not possible to restore partial data from a backup.
Tip 5: Reindex from a remote node – 2/2
In config/elasticsearch.yml add:

reindex.remote.whitelist: ["192.168.1.227:9200"]

Then:

curl -XPOST "http://$server/_reindex" -d '{
  "source": {
    "remote": { "host": "http://192.168.1.227:9200" },
    "index": "test-source"
  },
  "dest": { "index": "test-dest" }
}'
Tip 6: Ingest Pipeline – 1/2
Why?
- Adding/removing fields without changing your code
- Manipulating your records before ingesting
- Computed fields
- It also supports scripting
Tip 6: Ingest Pipeline – 2/2
curl -XPUT 'http://127.0.0.1:9200/_ingest/pipeline/add-user-john' -d '{
  "description": "Add user john field",
  "processors": [
    { "set": { "field": "user", "value": "john" } }
  ],
  "version": 1
}'

curl -XPUT "http://$server/$index/$type/$id?pipeline=add-user-john" -d '{}'
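Conceptually, a pipeline is just a chain of processors applied to each document before indexing; the set processor above adds a fixed field. An offline sketch of that behavior (plain Python, not the ingest API):

```python
# Mimic of the add-user-john pipeline: each processor transforms
# the document; "set" adds a field with a fixed value.
def set_processor(doc, field, value):
    doc[field] = value
    return doc

pipeline = [lambda doc: set_processor(doc, "user", "john")]

def run_pipeline(doc):
    for processor in pipeline:
        doc = processor(doc)
    return doc

run_pipeline({})  # -> {"user": "john"}
```

Because the pipeline runs server-side at ingest time, every client indexing with ?pipeline=add-user-john gets the field added without any code change.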
Thanks for your attention
Alberto Paro
Q&A