Scalability and Real-time Queries with Elasticsearch
Nov 21, 2015
Sofia
var title = "Scalability and Real-time Queries with Elasticsearch";
var info = {
  name: "Ivelin Andreev",
  otherOptional: "…"
};
About me
• Project Manager @
o 13 years professional experience
o .NET Web Development MCPD
o SQL Server 2012 (MCSA)
• External Expert Horizon 2020
• Business Interests
o Web Development, SOA, Integration
o Security & Performance Optimization
• Contact
o [email protected]
o www.linkedin.com/in/ivelin
o www.slideshare.net/ivoandreev
agenda();
• What?
• Why?
• First steps
• Analyzers in depth
• From RDBMS to Elasticsearch
• Demo
What is ES
• Powerful real-time search and analytics engine
“…It has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features, all seamlessly expressed through JSON DSL…”
Shay Banon – Creator, Founder, CTO
• What else…
o Document-oriented
o Sophisticated RESTful API
o Entirely open source
o Based on Apache Lucene
o Requires Java
Popularity (All DB Engines)
[Figure: All DB Engines Ranking]
First Steps in Elasticsearch
“You don’t learn to walk by following rules. You learn by doing”
(Richard Branson)
Terms
Elasticsearch    RDBMS
Index            Database
Type             Table
Document         Row
Field            Column
Scaling: Cluster; Node; Shard (Primary / Replica)
RESTful APIs
• Document APIs
o Index, Get, Update, Delete
o Bulk API available
POST /[index]/[type] { "…", "…" }
GET /[index]/[type]/[ID] { }
PUT /[index]/[type]/[ID] { "…", "…" }
DELETE /[index]/[type]/[ID]
• Search APIs
o Send/Receive JSON
o Basic queries via query string
http://localhost:9200/{indexName}/{type}/_search?q=searchstr&size=100
http://localhost:9200/{index1,index2}/{type}/_search?q=createdby:ivo
http://localhost:9200/_search?q=tag:spam
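The query-string search URLs on this slide share one pattern: optional comma-separated index names, an optional type, then `_search` with `q` and `size` parameters. A minimal Python sketch of that pattern follows. The helper name `build_search_url` and its defaults are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_search_url(q, indices=(), doc_type=None, size=None,
                     base="http://localhost:9200"):
    """Build a query-string search URL (hypothetical helper, sends nothing)."""
    parts = [base]
    if indices:
        # Several indices are searched at once by joining them with commas.
        parts.append(",".join(indices))
    elif doc_type:
        # A type without an index needs the "_all" placeholder in the path.
        parts.append("_all")
    if doc_type:
        parts.append(doc_type)
    parts.append("_search")
    params = {"q": q}
    if size is not None:
        params["size"] = size
    # safe=":" keeps field:value pairs readable, as in the slide's URLs.
    return "/".join(parts) + "?" + urlencode(params, safe=":")


print(build_search_url("tag:spam"))
print(build_search_url("createdby:ivo", indices=("index1", "index2"),
                       doc_type="mytype"))
```

The resulting string can be pasted into a browser or passed to curl against a local node.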
Query DSL
• Entire JSON object is the Query DSL
• Query
o Full text queries
o Results ordered by relevance
o Every field is searchable
• Filter
o Binary – either a field matches or it does not
• Filters and queries can be nested
o Nesting passes relevance to parents
Query: for full-text search or for any condition that should affect the relevance score
Filter: for everything else
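As an illustration of this rule, a full-text condition goes into the query part of a request and an exact-value condition goes into the filter part. The sketch below builds such a request body as a Python dict. It assumes the `filtered` query form of Elasticsearch 1.x; version 2.0 and later express the same idea with a `filter` clause inside a `bool` query. The helper name, field names, and values are made up.

```python
import json


def filtered_search(query, filter_):
    """Wrap a scoring query and a non-scoring filter (ES 1.x 'filtered' form)."""
    return {"query": {"filtered": {"query": query, "filter": filter_}}}


body = filtered_search(
    # Affects relevance: full-text match on an assumed "title" field.
    {"match": {"title": "full text search"}},
    # Yes/no only: restrict by an assumed "date" field.
    {"range": {"date": {"gte": "2014-01-01"}}},
)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a `_search` request.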
How To (Filters)
• ES provides 27 filters (Sep 2015)
• Term/Terms filter
{ "term": { "date": "2015-10-10" }}
• Range filter
{ "range": { "age": { "gte": 20, "lt": 30 }}}
• Exists/Missing filter
{ "exists": { "field": "title" }}
• Bool filter
{ "bool": {
  "must": { "term": { "folder": "inbox" }},
  "must_not": { "term": { "tag": "spam" }},
  "should": [{ "term": { "starred": true }}, { "term": { "unread": true }}]
}}
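Because a filter only answers yes or no, its logic can be sketched locally. The toy evaluator below is not Elasticsearch code. It mimics the term, range, and exists filters, plus the `must` and `must_not` parts of a bool filter, against a plain Python dict. Analysis, caching, and `should` clauses are deliberately left out, and the function names are hypothetical.

```python
def _as_list(clauses):
    """Accept either a single clause or a list of clauses."""
    return clauses if isinstance(clauses, list) else [clauses]


def matches(flt, doc):
    """Return True if the toy filter `flt` accepts the dict `doc`."""
    (kind, spec), = flt.items()
    if kind == "term":
        (field, value), = spec.items()
        return doc.get(field) == value
    if kind == "exists":
        return doc.get(spec["field"]) is not None
    if kind == "range":
        (field, bounds), = spec.items()
        value = doc.get(field)
        if value is None:
            return False
        checks = {
            "gte": lambda v, b: v >= b,
            "gt": lambda v, b: v > b,
            "lte": lambda v, b: v <= b,
            "lt": lambda v, b: v < b,
        }
        return all(checks[op](value, bound) for op, bound in bounds.items())
    if kind == "bool":
        # Simplification: "should" clauses are ignored in this sketch.
        must = _as_list(spec.get("must", []))
        must_not = _as_list(spec.get("must_not", []))
        return (all(matches(f, doc) for f in must)
                and not any(matches(f, doc) for f in must_not))
    raise ValueError("unsupported filter: " + kind)


INBOX_NOT_SPAM = {"bool": {
    "must": {"term": {"folder": "inbox"}},
    "must_not": {"term": {"tag": "spam"}},
}}
mail = {"folder": "inbox", "tag": "news", "age": 25, "title": "Hello"}
print(matches(INBOX_NOT_SPAM, mail))
print(matches({"range": {"age": {"gte": 20, "lt": 30}}}, mail))
```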
How To (Queries)
• ES provides 38 queries (Sep 2015)
• match query
{ "match": { "tweet": "About Search" }}
• multi_match query
{ "multi_match": {
  "query": "full text search",
  "fields": [ "title", "body" ] }}
• bool query
{ "bool": {
  "must": { "match": { "title": "how to make millions" }},
  "must_not": { "match": { "tag": "spam" }},
  "should": [
    { "match": { "tag": "starred" }},
    { "range": { "date": { "gte": "2014-01-01" }}}
  ]
}}
• fuzzy query
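The fuzzy query is listed without an example. As a hedged illustration, a fuzzy query matches terms that lie within a small edit distance of the search term. The sketch below shows a request body in the `fuzzy` query form (the field name `title` and the misspelled value are made up) together with a plain Levenshtein distance function to show what "edit distance" means. Elasticsearch's own matching is done by Lucene and can also count a swap of two adjacent characters as a single edit, which this sketch does not.

```python
import json

# Request body sketch; "title" and the value are assumptions for illustration.
fuzzy_body = {"query": {"fuzzy": {"title": {"value": "elastcsearch",
                                            "fuzziness": 2}}}}


def levenshtein(a, b):
    """Number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


print(json.dumps(fuzzy_body))
print(levenshtein("elastcsearch", "elasticsearch"))
```

With a fuzziness of 2, an indexed term would be a candidate match when its distance to the search term is at most 2.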
Queries vs. Filters
Filters          Queries
Boolean (Y/N)    Relevance
Exact values     Full text
No analysis      Analysis
Cached           Not cached
Faster           Slower
How SQL Full-text Indexing Works
• Column-level language
o Used by stemmers and tokenizers
o Different columns for different languages
o Language tags are respected (XML, binary)
• Stop words
ALTER FULLTEXT STOPLIST ProductSL
ADD 'blah' LANGUAGE 1033;
• Thesaurus files
o (e.g. “song” -> “tune”)
ES Analysis Process
• Character filters
o Simplify data (“&” -> “and”, “ü” -> “u”)
• Tokenizers
o Split data into words (terms, tokens)
• Token filters
o Lowercase
o Remove words without relevance impact (“a”, “the”)
o Add synonyms
• Stemming
o Reduce to root form (“dogs” -> “dog”)
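The stages above can be sketched as a small pipeline. This is a toy model in Python, not how Elasticsearch or Lucene implement analysis. The character map, stop-word list, synonym table, and the naive "strip a trailing s" stemmer are all assumptions chosen to reproduce the slide's examples.

```python
import re

CHAR_MAP = {"&": " and ", "ü": "u"}   # character filter rules from the slide
STOP_WORDS = {"a", "the"}             # words without relevance impact
SYNONYMS = {"tune": ["song"]}         # made-up synonym table


def char_filter(text):
    """Simplify raw text before it is split into tokens."""
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    return text


def tokenize(text):
    """Split text into word tokens."""
    return re.findall(r"\w+", text)


def token_filter(tokens):
    """Lowercase, drop stop words, and add synonyms next to each token."""
    out = []
    for tok in tokens:
        tok = tok.lower()
        if tok in STOP_WORDS:
            continue
        out.append(tok)
        out.extend(SYNONYMS.get(tok, []))
    return out


def stem(token):
    """Naive stemmer: strip a trailing 's' (real stemmers are far more careful)."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def analyze(text):
    return [stem(t) for t in token_filter(tokenize(char_filter(text)))]


print(analyze("The Dogs & Müller"))
print(analyze("a tune"))
```

The order matters: character filters see the raw string, the tokenizer sees the cleaned string, and token filters and stemming work on individual tokens.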
Analyzers
• Full-text fields are analyzed into terms to create the inverted index
• Configured when the index is created
Example input: "Set the shape to semi-transparent by calling set_trans(5)"
Analyzer Type     Example
Whitespace        Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
Standard (Def.)   set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple            set, the, shape, to, semi, transparent, by, calling, set, trans
Stop              set, shape, semi, transparent, calling, set, trans
Language (EN)     set, shape, semi, transpar, call, set_tran, 5
Pattern           "nonword": { "type": "pattern", "pattern": "[^\\w]+" }
Custom            Allows a combination of Tokenizer [1:1] and TokenFilters [0:N]
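To make the differences between analyzers concrete, the sketch below imitates three of them in plain Python on the slide's example sentence. These are approximations written for illustration, not the Lucene implementations. The function names are hypothetical, and lowercasing in the pattern variant is an assumption about the analyzer's default behaviour.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyzer(text):
    """Keep runs of letters only and lowercase them."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def pattern_analyzer(text, pattern=r"[^\w]+"):
    """Split on the given regex (the slide's 'nonword' pattern), then lowercase."""
    return [t.lower() for t in re.split(pattern, text) if t]


for analyzer in (whitespace_analyzer, simple_analyzer, pattern_analyzer):
    print(analyzer.__name__, analyzer(TEXT))
```

Note how `set_trans(5)` survives intact, becomes `set` and `trans`, or becomes `set_trans` and `5`, depending on the analyzer. A term can only be found if the query is analyzed the same way as the indexed text.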
Security Remarks
• RAM is Important
o Data structures reside in-memory
o Performance and reliability depend on it
• Be Aware
o No authentication!
o Protect private data yourself
o Prevent expensive requests (DoS)
o Protect http://localhost:9200
Side by Side
                       Elasticsearch    SQL Full-text Search
Performance            RAM mainly       Disk I/O mainly
Licensing              Open Source      Commercial
Platform               Any (Java)       Windows Only
Wildcards              Yes              Partly
FTS Syntax             Rich             Basic
Extensibility          Plugins          CLR or custom code
Scale Out              Yes              No
Relational Integrity   No               Yes
Security               No               Yes
FT Search Setup        Manual           Wizard
Index Update           Manual           Auto
From SQL to Elasticsearch
• Rivers (deprecated)
• Logstash
o Open source log management tool
• Client libraries
o .NET: Elasticsearch.Net, NEST
o Also Java, JS, Perl, Python, Ruby, PHP
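Whichever tool moves the data, rows from a relational table usually end up as documents sent through the Bulk API. The sketch below shows only the payload format: one action line followed by one document line per row, newline-delimited, with a trailing newline. The function name, the sample rows, and the index and type names are hypothetical. Real rows would come from a database cursor, and the payload would be POSTed to `/_bulk`.

```python
import json


def rows_to_bulk(rows, index, doc_type, id_field="id"):
    """Turn dict-like rows into a newline-delimited Bulk API payload."""
    lines = []
    for row in rows:
        doc = dict(row)
        # The primary key becomes the document ID and is dropped from the body.
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": str(doc.pop(id_field))}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The Bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Made-up rows standing in for the result of a SQL query.
rows = [{"id": 1, "title": "First post"}, {"id": 2, "title": "Second post"}]
payload = rows_to_bulk(rows, index="blog", doc_type="post")
print(payload)
```

Reusing the table's primary key as the document ID makes a re-run overwrite existing documents instead of duplicating them.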
Summary
• Not a replacement of RDBMS
• Real-time search applications
• Built for scalability
• Easy to install
• RESTful API and JSON
Deployment (Windows)
Install Java
Download ES zip
Install                                 [ESHome]/bin> service install
Set ES service to start automatically   [ESHome]/bin> service manager
Open in browser                         http://localhost:9200/
Plugin Install                          [ESHome]/bin> plugin -i elasticsearch/marvel/latest
Restart ES
Takeaways
• Tools
o Kopf: https://github.com/lmenezes/elasticsearch-kopf
o Marvel: https://www.elastic.co/products/marvel
o Curl: http://curl.haxx.se/download.html
o JDBC Driver: http://www.java2s.com/Code/Jar/s/Downloadsqljdbc430jar.htm
• Community
o https://discuss.elastic.co (yes “.co”, not “.com”)
• Getting Started
o http://joelabrahamsson.com/elasticsearch-101/