Rome – 8 February 2017
Presented by Alberto Paro, Seacom

Elasticsearch 5.x: New Tricks
Alberto Paro – MSc in Computer Engineering (POLIMI)
Author of 3 books on Elasticsearch, covering 1.x to 5.x, plus 6 tech reviews
I work mainly in Scala and on big data technologies (Akka, Spray.io, Play Framework, Apache Spark) and NoSQL datastores (Accumulo, Cassandra, Elasticsearch and MongoDB)
Evangelist for the Scala and Scala.js languages
Tip 1: Shrink - 1/5
Why?
- The number of shards chosen during the initial design sizing was wrong: sizing shards without knowing the real data/text distribution tends to oversize the number of shards.
- Reducing the number of shards reduces memory and resource usage.
- Reducing the number of shards speeds up searching.
Tip 1: Shrink - 2/5 - Where is your data?
We can retrieve it via the _nodes API:

curl -XGET 'http://localhost:9200/_nodes?pretty'

The result will contain a section similar to:

....
"nodes" : {
  "5Sei9ip8Qhee3J0o9dTV4g" : {
    "name" : "Gin Genie",
    "transport_address" : "127.0.0.1:9300",
    "host" : "127.0.0.1",
    "ip" : "127.0.0.1",
    "version" : "5.1.1",
....

The name of my node is Gin Genie.
Tip 1: Shrink - 3/5 - Relocate your data
We can change the index settings, forcing allocation to a single node for our index and disabling writes to it:

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "settings": {
    "index.routing.allocation.require._name": "Gin Genie",
    "index.blocks.write": true
  }
}'

We can check for the green status:

curl -XGET 'http://localhost:9200/_cluster/health?pretty'
Tip 1: Shrink - 4/5 – Shrink our shards
Writes to the index must be disabled before shrinking (the setting goes in the request body, as in the previous step):

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.blocks.write": true
}'
The shrink call that creates the reduced_index is:

curl -XPOST 'http://localhost:9200/myindex/_shrink/reduced_index' -d '{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  },
  "aliases": { "my_search_indices": {} }
}'
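One constraint to keep in mind: the target's number_of_shards must be a factor of the source index's shard count (shrinking to 1 always works). A tiny offline sketch, not part of Elasticsearch, to list the valid targets:

```python
def valid_shrink_targets(source_shards):
    """Shard counts a source index can be shrunk to: every factor
    of the original shard count (1 is always valid)."""
    return [n for n in range(1, source_shards + 1) if source_shards % n == 0]

valid_shrink_targets(8)  # -> [1, 2, 4, 8]
```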
Tip 1: Shrink - 5/5 – Post Shrinking
We can wait for a yellow status to know when the index is ready to work:

curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow'

Now we can re-enable writes by changing the index settings:

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.blocks.write": false
}'
Tip 2: Reindex - 1/2
Why?
- Changing an analyzer for a mapping
- Adding a new subfield to a mapping, when you need to reprocess all the records to search on the new subfield
- Removing an unused mapping
- Changing a record structure that requires a new mapping
Tip 2: Reindex - 2/2
curl -XPOST 'http://localhost:9200/_reindex?pretty=true' -d '{
  "source": {
    "index": "myindex",
    "type": "mytype",
    "query": { … }
  },
  "dest": {
    "index": "myindex2"
  },
  "script": { … }
}'

Note that the optional script lives at the top level of the _reindex body, next to source and dest.
Tip 3: Update By Query with Painless
Add a new field:
1. Create your mapping (i.e. modified: date)
2. Call an update by query:

curl -XPOST "http://$server/$index/$mapping/_update_by_query" -d '{
  "script": {
    "inline": "ctx._source.modified=\"2015-10-06T00:00:00.000+00:00\"",
    "lang": "painless"
  },
  "query": {
    "bool": { "must_not": [ { "exists": { "field": "modified" } } ] }
  }
}'
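Conceptually, the call above touches only the documents that are missing the field and stamps them. A hypothetical offline sketch of that behavior (plain Python, not the Elasticsearch API):

```python
# Mimic of the update-by-query above: the bool/must_not/exists query
# selects documents without "modified"; the script then sets it.
docs = [
    {"_id": "1", "_source": {"title": "a"}},
    {"_id": "2", "_source": {"title": "b",
                             "modified": "2014-01-01T00:00:00.000+00:00"}},
]

STAMP = "2015-10-06T00:00:00.000+00:00"

def update_by_query(docs, stamp):
    """Return how many documents were updated."""
    updated = 0
    for doc in docs:
        source = doc["_source"]
        if "modified" not in source:      # bool/must_not/exists filter
            source["modified"] = stamp    # ctx._source.modified = "..."
            updated += 1
    return updated

update_by_query(docs, STAMP)  # only doc "1" is touched
```

Running the same call twice is therefore safe: the second pass matches nothing.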
Tip 4: Use search_after
Step 1:

curl -XGET "http://$server/$index/$type/_search" -d '{
  "size": 100,
  "query": { "match_all": {} },
  "sort": [ { "_uid": "desc" } ]
}'

Step n, n>1:

curl -XGET "http://$server/$index/$type/_search" -d '{
  "size": 100,
  "query": { "match_all": {} },
  "search_after": ["$type#100"],
  "sort": [ { "_uid": "desc" } ]
}'
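The key mechanism: each page after the first passes the sort values of the last hit of the previous page as search_after. A minimal sketch of building the next-page body (hypothetical helper names, offline, not calling Elasticsearch):

```python
import json

def next_page_body(last_sort_values, size=100):
    """Body for pages after the first; the first page simply
    omits "search_after". last_sort_values is the "sort" array
    returned with the last hit of the previous page."""
    return {
        "size": size,
        "query": {"match_all": {}},
        "search_after": last_sort_values,
        "sort": [{"_uid": "desc"}],
    }

body = next_page_body(["mytype#100"])
print(json.dumps(body))
```

Unlike from/size pagination, this keeps the cost of deep pages constant, because the engine can seek directly past the last sort values.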
Tip 5: Reindex from a remote node – 1/2
Why?
- A backup is a raw Lucene index copy, so it depends on the Elasticsearch version used. If you are switching from a version of Elasticsearch prior to 5.x, it's not possible to restore old indices.
- It's not possible to restore backups of a newer Elasticsearch version in an older version: restore is only forward-compatible.
- It's not possible to restore partial data from a backup.
Tip 5: Reindex from a remote node – 2/2
In config/elasticsearch.yml add:

reindex.remote.whitelist: ["192.168.1.227:9200"]

Then:

curl -XPOST "http://$server/_reindex" -d '{
  "source": {
    "remote": { "host": "http://192.168.1.227:9200" },
    "index": "test-source"
  },
  "dest": { "index": "test-dest" }
}'
Tip 6: Ingest Pipeline – 1/2
Why?
- Adding/removing fields without changing your code
- Manipulating your records before ingesting
- Computed fields
- It also supports scripting
Tip 6: Ingest Pipeline – 2/2
curl -XPUT 'http://127.0.0.1:9200/_ingest/pipeline/add-user-john' -d '{
  "description": "Add user john field",
  "processors": [
    { "set": { "field": "user", "value": "john" } }
  ],
  "version": 1
}'

curl -XPUT "http://$server/$index/$type/$id?pipeline=add-user-john" -d '{}'
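Conceptually, a pipeline is just a chain of processors applied to each document before indexing; the set processor above adds a fixed field. An offline sketch of that behavior (plain Python, not the ingest API):

```python
# Mimic of the add-user-john pipeline: each processor transforms
# the document; "set" adds a field with a fixed value.
def set_processor(doc, field, value):
    doc[field] = value
    return doc

pipeline = [lambda doc: set_processor(doc, "user", "john")]

def run_pipeline(doc):
    for processor in pipeline:
        doc = processor(doc)
    return doc

run_pipeline({})  # -> {"user": "john"}
```

Because the pipeline runs server-side at ingest time, every client indexing with ?pipeline=add-user-john gets the field added without any code change.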
Thanks for your attention
Alberto Paro
Q&A