Elasticsearch Quick Introduction
Elasticsearch and MIT Sloan Data Analytics Hackathon, Cambridge, MA - May 10, 2014
Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited
About Me
• Igor Motov
• Developer at Elasticsearch Inc.
• Github: imotov
• Twitter: @imotov
About Elasticsearch Inc.
• Founded in 2012
  - By the people behind Elasticsearch and Apache Lucene
  - http://www.elasticsearch.com
  - Headquarters: Amsterdam and Los Altos, CA
• We provide
  - Training (public & onsite)
  - Development support
  - Production support subscription (SLA)
About Elasticsearch
• Real time search and analytics engine
  - JSON-oriented, Apache Lucene-based
• Automatic schema detection
  - The schema can still be controlled explicitly when needed
• Distributed
  - Scales up and out, highly available
• Multi-tenancy
  - Dynamically create/delete indices
• API centric
  - Most functionality is exposed through an API
Basic Concepts
• Cluster: a group of nodes sharing the same set of indices
• Node: a running Elasticsearch instance (typically a JVM process)
• Index: a set of documents, possibly of different types, stored in one or more shards
• Type: a set of documents in an index that share the same schema
• Shard: a Lucene index, allocated on one of the nodes
Basic Concepts - Document
• JSON Object

    {
      "rank": 21,
      "city": "Boston",
      "state": "Massachusetts",
      "population2010": 617594,
      "land_area": 48.277,
      "density": 12793,
      "ansi": 619463,
      "location": {
        "lat": 42.332,
        "lon": 71.0202
      },
      "abbreviation": "MA"
    }

• Identified by index/type/id
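The index/type/id triple maps directly onto the document's REST path. A minimal Python sketch of that mapping, assuming a node at http://localhost:9200 and the index and type names used in the indexing example later in this deck; the helper name document_url is hypothetical and not part of any Elasticsearch client:

```python
from urllib.parse import quote

def document_url(index, doc_type, doc_id, base="http://localhost:9200"):
    """Build the REST URL that identifies one document (hypothetical helper)."""
    # Percent-encode each path segment so ids containing spaces or slashes stay valid.
    parts = [quote(str(p), safe="") for p in (index, doc_type, doc_id)]
    return base + "/" + "/".join(parts)

print(document_url("test-data", "cities", 21))
```

The same URL is used for indexing (PUT) and for retrieval (GET); only the HTTP method changes.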
Downloading elasticsearch
• http://www.elasticsearch.org/download/
• Download options: Windows, everything else
What’s in a distribution?
    .
    ├── LICENSE.txt
    ├── NOTICE.txt
    ├── README.textile
    ├── bin                          executable scripts
    │   ├── elasticsearch
    │   ├── elasticsearch.in.sh
    │   └── plugin
    ├── config                       node config files
    │   ├── elasticsearch.yml
    │   └── logging.yml
    ├── data                         data storage
    │   └── elasticsearch
    ├── lib                          libs
    │   ├── elasticsearch-x.y.z.jar
    │   └── ...
    └── logs                         log files
        ├── elasticsearch.log
        └── elasticsearch_index_search_slowlog.log
Configuration (multicast)
• Configuration: config/elasticsearch.yml

    cluster.name: "elasticsearch-imotov"    # unique name
Configuration (stand-alone)
• Configuration: config/elasticsearch.yml

    # unique name
    cluster.name: "elasticsearch-imotov"
    # listen only on localhost
    network.host: "127.0.0.1"
    # disable multicast
    discovery.zen.ping.multicast.enabled: false
    # search for other nodes on localhost
    discovery.zen.ping.unicast.hosts: ["localhost:9300", "localhost:9301", "localhost:9302"]
Starting elasticsearch
• Foreground

    $ bin/elasticsearch

• Background

    $ bin/elasticsearch -d
Is it running?
    $ curl -XGET "http://localhost:9200/?pretty"
    {
      "status" : 200,
      "name" : "Kamal",
      "version" : {
        "number" : "1.1.1",
        "build_hash" : "f1585f096d3f3985e73456debdc1a0745f512bbc",
        "build_timestamp" : "2014-04-16T14:27:12Z",
        "build_snapshot" : false,
        "lucene_version" : "4.7"
      },
      "tagline" : "You Know, for Search"
    }
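The same check can be scripted. The sketch below uses only the Python standard library and assumes the node listens on http://localhost:9200/ as in the curl call; the function names node_info and is_running are hypothetical, and the sample string is a shortened copy of the response shown on this slide:

```python
import json
from urllib.request import urlopen

def node_info(text):
    """Extract (node name, version number) from the root endpoint's JSON reply."""
    body = json.loads(text)
    return body["name"], body["version"]["number"]

def is_running(url="http://localhost:9200/", timeout=2.0):
    """Return True if something answers at url with HTTP 200, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False

# Shortened copy of the reply shown on the slide.
sample = '{"status": 200, "name": "Kamal", "version": {"number": "1.1.1"}}'
print(node_info(sample))
```

Calling is_running() before the first request is a cheap way to fail early when the node has not been started yet.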
Communicating with Elasticsearch
• REST API: Curl, Ruby, Python, PHP, Perl, JavaScript (community supported)
• Binary Protocol: Java
Pick your client
• Java: included in distribution
• Ruby, PHP, Perl, Python: http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/
• Everything else: http://www.elasticsearch.org/guide/clients/
Indexing a document
    $ curl -XPUT "http://localhost:9200/test-data/cities/21" -d '{
      "rank": 21,
      "city": "Boston",
      "state": "Massachusetts",
      "population2010": 617594,
      "land_area": 48.277,
      "density": 12793,
      "ansi": 619463,
      "location": {
        "lat": 42.332,
        "lon": 71.0202
      },
      "abbreviation": "MA"
    }'
    {"ok":true,"_index":"test-data","_type":"cities","_id":"21","_version":1}
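For readers who prefer a script to curl, the request can be prepared with the Python standard library. This is only a sketch: index_request is a hypothetical helper, the document is a shortened copy of the Boston record, and the Content-Type header is an addition that the curl command on this slide does not send. The request is built but not sent, so the snippet runs without a node:

```python
import json
from urllib.request import Request

def index_request(index, doc_type, doc_id, document, base="http://localhost:9200"):
    """Prepare (but do not send) the PUT request that indexes one document."""
    url = "{}/{}/{}/{}".format(base, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    # Content-Type is an assumption added here; the slide's curl call omits it.
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

# Shortened copy of the document from the slide.
boston = {"rank": 21, "city": "Boston", "state": "Massachusetts", "abbreviation": "MA"}
req = index_request("test-data", "cities", 21, boston)
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req).read()
```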
Getting a document
    $ curl -XGET "http://localhost:9200/test-data/cities/21?pretty"
    {
      "_index" : "test-data",
      "_type" : "cities",
      "_id" : "21",
      "_version" : 1,
      "exists" : true,
      "_source" : {
        "rank": 21,
        "city": "Boston",
        "state": "Massachusetts",
        "population2010": 617594,
        "land_area": 48.277,
        "density": 12793,
        "ansi": 619463,
        "location": {
          "lat": 42.332,
          "lon": 71.0202
        },
        "abbreviation": "MA"
      }
    }
Updating a document
    $ curl -XPUT "http://localhost:9200/test-data/cities/21" -d '{
      "rank": 21,
      "city": "Boston",
      "state": "Massachusetts",
      "population2010": 617594,
      "population2012": 636479,
      "land_area": 48.277,
      "density": 12793,
      "ansi": 619463,
      "location": {
        "lat": 42.332,
        "lon": 71.0202
      },
      "abbreviation": "MA"
    }'
    {"ok":true,"_index":"test-data","_type":"cities","_id":"21","_version":2}
Searching

    $ curl -XGET 'http://localhost:9200/test-data/cities/_search?pretty' -d '{
      "query": {
        "match": {
          "city": "Boston"
        }
      }
    }'
Searching (response)

    {
      "took" : 5,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 6.1357985,
        "hits" : [ {
          "_index" : "test-data",
          "_type" : "cities",
          "_id" : "21",
          "_score" : 6.1357985,
          "_source" : {"rank":"21","city":"Boston",...}
        } ]
      }
    }
Range Queries
    $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{
      "query": {
        "range": {
          "population2012": {
            "from": 500000,
            "to": 1000000
          }
        }
      }
    }'
Boolean Queries
    $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{
      "query": {
        "bool": {
          "should": [
            { "match": { "state": "Texas" } },
            { "match": { "state": "California" } }
          ],
          "must": {
            "range": {
              "population2012": { "from": 500000, "to": 1000000 }
            }
          },
          "minimum_should_match": 1
        }
      }
    }'
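How the clauses combine: every "must" clause has to match, and "minimum_should_match": 1 requires at least one "should" clause to match as well. The Python sketch below builds the same request body as a dict and emulates that selection logic on three in-memory records. The helper names are hypothetical, the records are made-up placeholders rather than census data, plain string equality stands in for the analyzed match query, and the range bounds are treated as inclusive:

```python
def bool_query(states, low, high):
    """Build the bool query body from the slide for arbitrary states and bounds."""
    return {
        "query": {
            "bool": {
                "should": [{"match": {"state": s}} for s in states],
                "must": {"range": {"population2012": {"from": low, "to": high}}},
                "minimum_should_match": 1,
            }
        }
    }

def matches(doc, states, low, high):
    """Emulate the query locally: range must hold AND at least one state must match."""
    in_range = low <= doc["population2012"] <= high  # bounds treated as inclusive
    any_state = doc["state"] in states               # stand-in for the match clauses
    return in_range and any_state

# Placeholder records, not real data.
docs = [
    {"city": "A", "state": "Texas", "population2012": 800000},
    {"city": "B", "state": "California", "population2012": 3000000},
    {"city": "C", "state": "Ohio", "population2012": 600000},
]
selected = [d["city"] for d in docs if matches(d, ["Texas", "California"], 500000, 1000000)]
print(selected)  # ['A']
```

Record B fails the "must" range and record C matches no "should" clause, so only A is selected.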
MatchAll Query

    $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{
      "query": {
        "match_all": { }
      }
    }'
Sorting and Paging
    $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{
      "query": {
        "match_all": { }
      },
      "sort": [
        { "state": { "order": "asc" } },
        { "population2010": { "order": "desc" } }
      ],
      "from": 0,
      "size": 20
    }'
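Paging comes down to arithmetic on "from" and "size": "from" is the number of hits to skip and "size" the number to return. A small Python sketch, assuming zero-based page numbers; page_request is a hypothetical helper that reproduces the body above for any page:

```python
def page_request(page, page_size=20):
    """Build a sorted match_all body for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "query": {"match_all": {}},
        "sort": [
            {"state": {"order": "asc"}},
            {"population2010": {"order": "desc"}},
        ],
        "from": page * page_size,  # number of hits to skip
        "size": page_size,         # number of hits to return
    }

print(page_request(0)["from"], page_request(2)["from"])  # 0 40
```

Page 0 reproduces the request on the slide; page 2 skips the first 40 hits.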
Analysis
• By default, strings are
  - Divided into words (tokens)
  - Converted to lower-case (all tokens)
Analysis Example
• “Elasticsearch is a powerful open source search and analytics engine.”
 1. elasticsearch
 2. is
 3. a
 4. powerful
 5. open
 6. source
 7. search
 8. and
 9. analytics
10. engine
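The token list can be reproduced with a few lines of Python. This is a rough approximation of the default analysis (lower-casing plus splitting on non-word characters), not the tokenizer Lucene actually uses; the function name simple_analyze is hypothetical:

```python
import re

def simple_analyze(text):
    """Lower-case the text and split it into word tokens (approximation only)."""
    return re.findall(r"\w+", text.lower())

tokens = simple_analyze(
    "Elasticsearch is a powerful open source search and analytics engine."
)
# Print each token with its 1-based position, as on the slide.
for position, token in enumerate(tokens, start=1):
    print(position, token)
```

The trailing period is dropped and "Elasticsearch" becomes "elasticsearch", which is why a search for either spelling finds the sentence.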
Customizing the mapping
    $ curl -XPUT 'http://localhost:9200/my_index/' -d '{
      "settings": {
        "index": {
          "number_of_shards": 1,
          "number_of_replicas": 0
        }
      },
      "mappings": {
        "my_type": {
          "properties": {
            "description": { "type": "string" },
            "sku": { "type": "string", "index": "not_analyzed" },
            "count": { "type": "integer" },
            "price": { "type": "float" },
            "location": { "type": "geo_point" }
          }
        }
      }
    }'

• description: analyzed text
• sku: exact match
• location: geo location
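Why "not_analyzed" gives exact matching: an analyzed string is indexed as its individual lower-cased tokens, while a not_analyzed string is indexed as one unchanged term. The Python sketch below illustrates the difference with sets of terms. It approximates analysis with a simple regular expression, and the helper names and the SKU value are hypothetical:

```python
import re

def analyzed_terms(value):
    """Terms indexed for an analyzed string field (approximation: lower-cased words)."""
    return set(re.findall(r"\w+", value.lower()))

def not_analyzed_terms(value):
    """Terms indexed for a not_analyzed field: the whole value, unchanged."""
    return {value}

sku = "AB-1234"  # hypothetical SKU value
print("ab" in analyzed_terms(sku))           # True: a fragment matches the analyzed form
print("ab" in not_analyzed_terms(sku))       # False: fragments do not match
print("AB-1234" in not_analyzed_terms(sku))  # True: only the exact value matches
```

Identifiers such as SKUs are usually better left not_analyzed, so that a lookup for one SKU does not also hit others sharing a prefix.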
Elasticsearch Reference
• http://www.elasticsearch.org/guide/
Ideas for hackathon
• Explore data
  - Wikipedia
  - Twitter
  - Enron emails
• Play with Kibana
• Build Elasticsearch plugins
• Get prizes
Elasticsearch Meetup
http://www.meetup.com/Elasticsearch-Boston/
We are hiring
http://www.elasticsearch.com/about/jobs/