Scalability and Real-time Queries with Elasticsearch


Nov 21, 2015

Sofia

var title = “Scalability and Real-time Queries with Elasticsearch”;

var info = {
name: “Ivelin Andreev”,
otherOptional: “…”
};


About me

• Project Manager @

o 13 years professional experience

o .NET Web Development MCPD

o SQL Server 2012 (MCSA)

• External Expert Horizon 2020

• Business Interests

o Web Development, SOA, Integration

o Security & Performance Optimization

• Contact

o www.linkedin.com/in/ivelin

o www.slideshare.net/ivoandreev

agenda();


• What?

• Why?

• First steps

• Analyzers in depth

• From RDBMS to Elasticsearch

• Demo


What is ES

• Powerful real-time search and analytics engine

“…It has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features, all seamlessly expressed through JSON DSL…”

Shay Banon – Creator, Founder, CTO

• What else…
o Document-oriented
o Sophisticated RESTful API
o Entirely open source
o Based on Apache Lucene
o Requires Java


Popularity (Search Engines)


Who Uses ES


First Steps in Elasticsearch

“You don’t learn to walk by following rules. You learn by doing”

(Richard Branson)

Terms

Elasticsearch    RDBMS
Index            Database
Type             Table
Document         Row
Field            Column

Scaling: Cluster; Node; Shard (Primary/Replica)

RESTful APIs

• Document APIs
o Index, Get, Update, Delete
o Bulk API available

POST /[index]/[type] { “…”, “…” }
GET /[index]/[type]/[ID] { }
PUT /[index]/[type]/[ID] { “…”, “…” }
DELETE /[index]/[type]/[ID]

• Search APIs
o Send/Receive JSON
o Basic queries via query string

http://localhost:9200/{indexName}/{type}/_search?q=searchstr&size=100
http://localhost:9200/{index1,index2}/{type}/_search?q=createdby:ivo
http://localhost:9200/_search?q=tag:spam
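The query-string search URLs above follow one pattern: optional comma-separated indices, an optional type, then `_search` with a `q` parameter. A minimal Python sketch of that pattern, assuming a local node on the default port; the helper name `search_url` and the sample index and type names (`blog`, `post`, `doc`) are hypothetical and not part of the slides:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # local node on the default port, as on the slide


def search_url(query, indices=(), doc_type=None, size=None):
    """Build a query-string search URL of the form /{indices}/{type}/_search?q=..."""
    parts = [BASE_URL]
    if indices:
        parts.append(",".join(indices))  # several indices are comma-separated
    if doc_type:
        # Only meaningful together with indices; a type alone would be read as an index.
        parts.append(doc_type)
    parts.append("_search")
    params = {"q": query}
    if size is not None:
        params["size"] = size
    # safe=":" keeps field:value queries such as tag:spam readable in the URL
    return "/".join(parts) + "?" + urlencode(params, safe=":")


print(search_url("tag:spam"))
print(search_url("createdby:ivo", indices=["index1", "index2"], doc_type="doc"))
print(search_url("searchstr", indices=["blog"], doc_type="post", size=100))
```

The sketch only builds the URL; sending it is a plain HTTP GET with any client.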


Query DSL

• Entire JSON object is the Query DSL

• Query

o Full text queries

o Results ordered by relevance

o Every field is searchable

• Filter

o Binary – either a field matches or it does not

• Filters and queries can be nested

o Nesting passes relevance to parents


Query: for full-text search or for any condition that should affect the relevance score

Filter: for everything else

How To (Filters)

• ES provides 27 filters (Sep 2015)

• Term/Terms filter
{ "term": { "date": "2015-10-10" }}

• Range filter
{ "range": { "age": { "gte": 20, "lt": 30 }}}

• Exists/Missing filter
{ "exists": { "field": "title" }}

• Bool filter
{ "bool": {
"must": { "term": { "folder": "inbox" }},
"must_not": { "term": { "tag": "spam" }},
"should": [{ "term": { "starred": true }}, { "term": { "unread": true }}]
}}
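A filter clause like the bool filter on this slide is only part of a request body. A minimal Python sketch of how it might be wrapped for a search request, assuming the Elasticsearch 1.x `filtered` query (later versions moved filters into the `filter` clause of the bool query); the helper name `filtered_search_body` is hypothetical:

```python
import json

# The bool filter from the slide, written as a plain Python dict.
bool_filter = {
    "bool": {
        "must": {"term": {"folder": "inbox"}},
        "must_not": {"term": {"tag": "spam"}},
        "should": [
            {"term": {"starred": True}},
            {"term": {"unread": True}},
        ],
    }
}


def filtered_search_body(filter_clause, query_clause=None):
    """Wrap a filter (and optionally a query) in a `filtered` query.

    Assumption: Elasticsearch 1.x request format.
    """
    filtered = {"filter": filter_clause}
    if query_clause is not None:
        filtered["query"] = query_clause
    return {"query": {"filtered": filtered}}


body = filtered_search_body(bool_filter)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a `_search` request.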


How To (Queries)

• ES provides 38 queries (Sep 2015)

• match query
{ "match": { "tweet": "About Search" }}

• multi_match query
{ "multi_match": {
"query": "full text search",
"fields": [ "title", "body" ] }}

• bool query
{ "bool": {
"must": { "match": { "title": "how to make millions" }},
"must_not": { "match": { "tag": "spam" }},
"should": [
{ "match": { "tag": "starred" }},
{ "range": { "date": { "gte": "2014-01-01" }}}
]}}

• fuzzy query
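The slide names the fuzzy query without an example. Fuzzy matching accepts terms within a small edit distance of the search term. A rough Python sketch of that idea, assuming plain Levenshtein distance (Elasticsearch's own matching may also count a transposition as a single edit); the field name `tweet`, the misspelled term and both helper names are hypothetical:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def fuzzy_query(field, value, fuzziness=2):
    """Build a fuzzy query clause for one field."""
    return {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}


print(edit_distance("serch", "search"))
print(fuzzy_query("tweet", "serch", fuzziness=1))
```

With `fuzziness` set to 1, a misspelling one edit away from an indexed term, such as "serch" for "search", would be a candidate match.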


Queries vs. Filters

Filters
• Boolean (Y/N)
• Exact values
• No analysis
• Cached
• Faster

Queries
• Relevance
• Full text
• Analysis
• Not cached
• Slower

Any index search solution is way better than “LIKE”


How does SQL Full-text Index Work

• Column-level language
o Used by stemmers and tokenizers
o Different columns for different languages
o Language tags are respected (XML, binary)

• Stop words
ALTER FULLTEXT STOPLIST ProductSL
ADD 'blah' LANGUAGE 1033;

• Thesaurus files
o (e.g. “song” -> “tune”)

Inverted Index
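An inverted index maps each term to the documents that contain it, so a search looks up terms instead of scanning every document. A minimal Python sketch, assuming only lowercasing and whitespace tokenization; the function names and sample documents are hypothetical:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, *terms):
    """Return IDs of documents that contain every given term (AND semantics)."""
    postings = [set(index.get(t.lower(), ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


docs = {
    1: "The quick brown fox",
    2: "The lazy brown dog",
    3: "Quick foxes jump",
}
index = build_inverted_index(docs)
print(index["brown"])
print(search_all(index, "quick", "brown"))
```

Note that "foxes" and "fox" end up as different terms here, so a search for "fox" misses document 3. Closing that gap is the job of analysis.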


ES Analysis Process

• Character filters
o Simplify data (“&” -> “and”, “ü” -> “u”)

• Tokenizers
o Split data into words (terms, tokens)

• Token filters
o Lowercase
o Remove words w/o relevance impact (“a”, “the”)
o Synonyms added

• Stemming
o Reduce to root form (“dogs” -> “dog”)
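The stages above run in a fixed order: character filters, then one tokenizer, then token filters. A toy Python sketch of that order, assuming the replacement rules and stop words shown on the slide, a deliberately naive plural-stripping stemmer, and no synonym step; all names are hypothetical and real analyzers are far more thorough:

```python
import re

CHAR_MAP = {"&": " and ", "ü": "u"}  # character filter rules from the slide
STOP_WORDS = {"a", "the"}            # stop words from the slide


def char_filter(text):
    """Stage 1: simplify the raw character stream."""
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    return text


def tokenize(text):
    """Stage 2: split the text into word tokens."""
    return re.findall(r"\w+", text)


def naive_stem(token):
    """Toy stemmer: strips one trailing plural 's' from longer words."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def analyze(text):
    tokens = tokenize(char_filter(text))
    tokens = [t.lower() for t in tokens]                 # token filter: lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # token filter: stop words
    return [naive_stem(t) for t in tokens]               # stemming


print(analyze("The Dogs & a Müller"))
```

The same function has to be applied to documents at index time and to the search text at query time, otherwise the terms will not line up.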


Analyzers

• Full-text fields are analyzed into terms to build the inverted index
• Configured when the index is created

Example: "Set the shape to semi-transparent by calling set_trans(5)"

Analyzer Type     Example
Whitespace        Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
Standard (Def.)   set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple            set, the, shape, to, semi, transparent, by, calling, set, trans
Stop              set, shape, semi, transparent, calling, set, trans
Language (EN)     set, shape, semi, transpar, call, set_tran, 5
Pattern           "nonword": { "type": "pattern", "pattern": "[^\\w]+" }
Custom            Allows combination of Tokenizer[1:1] and TokenFilters[0:N]
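The Whitespace and Simple rows can be approximated in a few lines of Python. This is a sketch for ASCII text only, not the Lucene implementation; the function names are hypothetical:

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are left untouched."""
    return text.split()


def simple_analyzer(text):
    """Split on anything that is not a letter, then lowercase each token."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


print(whitespace_analyzer(TEXT))
print(simple_analyzer(TEXT))
```

The difference matters for search: with the whitespace approach a query for "transparent" would not match the token "semi-transparent", while the simple approach indexes "semi" and "transparent" separately.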


Security Remarks

• RAM is Important
o Data structures reside in-memory
o Performance and reliability depend on it

• Be Aware
o No authentication!
o Protect private data yourself
o Prevent expensive requests (DoS)
o Protect http://localhost:9200

Side by Side

                      Elasticsearch     SQL Full-text Search
Performance           RAM mainly        Disk I/O mainly
Licensing             Open Source       Commercial
Platform              Any (Java)        Windows Only
Wildcards             Yes               Partly
FTS Syntax            Rich              Basic
Extensibility         Plugins           CLR or custom code
Scale Out             Yes               No
Relational Integrity  No                Yes
Security              No                Yes
FT Search Setup       Manual            Wizard
Index Update          Manual            Auto

From SQL to Elasticsearch

• Rivers (deprecated)

• Logstash
o Open source log management tool

• Client libraries
o .NET: Elasticsearch.Net, NEST
o Also Java, JS, Perl, Python, Ruby, PHP
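Whichever route is chosen, relational rows end up as JSON documents, and the Bulk API accepts many of them in one request as newline-delimited JSON: one action line followed by one source line per document. A minimal Python sketch of that conversion, assuming an in-memory SQLite table as a stand-in for the real RDBMS; the table, column, index and type names and the helper `rows_to_bulk` are all hypothetical:

```python
import json
import sqlite3


def rows_to_bulk(rows, index, doc_type, id_column="id"):
    """Render rows as a Bulk API body: an action line plus a source line per row."""
    lines = []
    for row in rows:
        doc = dict(row)
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc.pop(id_column)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Stand-in for the source database (assumption: any DB-API driver works the same way).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, createdby TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title, createdby) VALUES (?, ?, ?)",
    [(1, "Hello search", "ivo"), (2, "Scaling out", "ivo")],
)
rows = conn.execute("SELECT id, title, createdby FROM posts ORDER BY id").fetchall()

bulk_body = rows_to_bulk(rows, index="blog", doc_type="post")
print(bulk_body, end="")
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint. Using the primary key as `_id` makes a repeated import overwrite documents instead of duplicating them.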


Summary

• Not a replacement of RDBMS

• Real-time search applications

• Built for scalability

• Easy to install

• RESTful API and JSON


Deployment (Windows)

Install Java
Download ES zip
Install                                 [ESHome]/bin> service install
Set ES service to start automatically   [ESHome]/bin> service manager
Open in browser                         http://localhost:9200/
Plugin install                          [ESHome]/bin> plugin -i elasticsearch/marvel/latest
Restart ES

Takeaways

• Tools
o Kopf: https://github.com/lmenezes/elasticsearch-kopf
o Marvel: https://www.elastic.co/products/marvel
o Curl: http://curl.haxx.se/download.html
o JDBC Driver: http://www.java2s.com/Code/Jar/s/Downloadsqljdbc430jar.htm

• Community
o https://discuss.elastic.co (yes “.co”, not “.com”)

• Getting Started
o http://joelabrahamsson.com/elasticsearch-101/
