elasticsearch for sql users
TRANSCRIPT
1
Shaunak Kashyap Developer at Elastic @shaunak
Elasticsearch for SQL users
The Elastic Stack
2
Store, Index & Analyze
Ingest
User Interface
Plugins
Hosted Service
3
Agenda
Search queries
Data modeling
Architecture
1
2
3
2
4
Agenda
Search queries
Data modeling
Architecture
1
3
5
Agenda
Search queries
Data modeling
1
2
3 Architecture
6
Search Queries
https://www.flickr.com/photos/samhames/4422128094
7
CREATE TABLE IF NOT EXISTS emails ( sender VARCHAR(255) NOT NULL, recipients TEXT, cc TEXT, bcc TEXT, subject VARCHAR(1024), body MEDIUMTEXT, datetime DATETIME );
CREATE INDEX emails_sender ON emails(sender); CREATE FULLTEXT INDEX emails_subject ON emails(subject); CREATE FULLTEXT INDEX emails_body ON emails(body);
curl -XPUT 'http://localhost:9200/enron' -d' { "mappings": { "email": { "properties": { "sender": { "type": "keyword" }, "recipients": { "type": "keyword" }, "cc": { "type": "keyword" }, "bcc": { "type": "keyword" }, "subject": { "type": "text", "analyzer": "english" }, "datetime": { "type": "date" } } } }
Schemas
8
Loading the data
9
[LIVE DEMO]
• Search for text in a single field
• Search for text in multiple fields
• Search for a phrase
https://github.com/ycombinator/es-enron
10
Other Search Features
Stemming Synonyms Did you mean?
• Jump, jumped, jumping • Queen, monarch • Monetery => Monetary
11
Data Modeling
https://www.flickr.com/photos/samhames/4422128094https://www.flickr.com/photos/ericparker/7854157310
12
To analyze (text) or not to analyze (keyword)?
PUT cities/city/1 { "city": "Raleigh", "population": 431746 }
PUT cities/city/2 { "city": "New Albany", "population": 8829 }
PUT cities/city/3 { "city": "New York", "population": 8406000 }
POST cities/_search { "query": { "match": { "city": "New Albany" } } }
QUERY
+ = ?
PUT cities/city/1 { "city": "Raleigh", "population": 431746 }
13
To analyze (text) or not to analyze (keyword)?
PUT cities/city/2 { "city": "New Albany", "population": 8829 }
PUT cities/city/3 { "city": "New York", "population": 8406000 }
Term Document IDs
albany 2
new 2,3
raleigh 1
york 3
14
To analyze (text) or not to analyze (keyword)?
PUT cities { "mappings": { "city": { "properties": { "city": { "type": "keyword" } } } } }
MAPPING
Term Document IDs
New Albany 2
New York 3
Raleigh 1
PUT blog/post/1 { "author_id": 1, "title": "...", "body": "..." }
PUT blog/post/2 { "author_id": 1, "title": "...", "body": "..." }
PUT blog/post/3 { "author_id": 1, "title": "...", "body": "..." }
15
Relationships: Application-side joinsPUT blog/author/1 { "name": "John Doe", "bio": "..." }
POST blog/author/_search { "query": { "match": { "name": "John" } } }
QUERY 1
POST blog/post/_search { "query": { "match": { "author_id": <each id from query 1 result> } } }
QUERY 2
PUT blog/post/1 { "author_name": "John Doe", "title": "...", "body": "..." }
PUT blog/post/2 { "author_name": "John Doe", "title": "...", "body": "..." }
16
Relationships: Data denormalization
POST blog/post/_search { "query": { "match": { "author_name": "John" } } }
QUERY
PUT blog/post/3 { "author_name": "John Doe", "title": "...", "body": "..." }
17
Relationships: Nested objects
PUT blog/author/1 { "name": "John Doe", "bio": "...", "blog_posts": [ { "title": "...", "body": "..." }, { "title": "...", "body": "..." }, { "title": "...", "body": "..." } ] }
POST blog/author/_search { "query": { "match": { "name": "John" } } }
QUERY
18
Relationships: Parent-child documents
PUT blog/author/1 { "name": "John Doe", "bio": "..." }
POST blog/post/_search { "query": { "has_parent": { "type": "author", "query": { "match": { "name": "John" } } }
QUERY
PUT blog { "mappings": { "author": {}, "post": { "_parent": { "type": "author" } } } } PUT blog/post/1?parent=1
{ "title": "...", "body": "..." }
PUT blog/post/2?parent=1 { "title": "...", "body": "..." }
PUT blog/post/3?parent=1 { "title": "...", "body": "..." }
19
Architecture
https://www.flickr.com/photos/samhames/4422128094https://www.flickr.com/photos/haribote/4871284379/
20
RDBMS Triggers
database by Creative Stall from the Noun Project
1 2
21
Async replication to Elasticsearch
12 3
ESSynchronizer
flow by Yamini Ahluwalia from the Noun Project
22
Async replication to Elasticsearch with Logstash
12 3
23
Forked writes from application
1
2
24
Forked writes from application (more robust)
1
2
queue by Huu Nguyen from the Noun Project
ESSynchronizer3 4
25
Forked writes from application (more robust with Logstash)
1
23
4
26
Questions?
@shaunak
https://www.flickr.com/photos/nicknormal/2245559230/