Realtime Analytics with Elasticsearch [New Media Inspiration 2013]

Post on 27-Jan-2015

DESCRIPTION

A presentation from the New Media Inspiration 2013 conference (http://www.tuesday.cz/akce/new-media-inspiration-2013/) about using Elasticsearch's faceting features for realtime analytics of big data.

TRANSCRIPT

Realtime analytics of big data with Elasticsearch

Karel Minařík

JSON

Facets

Analytics

http://www.youtube.com/watch?v=-GftBySG99Q

Realtime Analytics With ElasticSearch

http://karmi.cz

http://elasticsearch.com

Using a search engine for analytics?

wat?

HOW DOES SEARCH WORK?

A collection of documents

file_1.txt: The ruby is a pink to blood-red colored gemstone ...

file_2.txt: Ruby is a dynamic, reflective, general-purpose object-oriented programming language ...

file_3.txt: "Ruby" is a song by English rock band Kaiser Chiefs ...

How do you search documents?

File.read('file_1.txt').include?('ruby')
File.read('file_2.txt').include?('ruby')
...

The inverted index

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

TOKENS        POSTINGS
ruby          file_1.txt  file_2.txt  file_3.txt
pink          file_1.txt
gemstone      file_1.txt
dynamic       file_2.txt
reflective    file_2.txt
programming   file_2.txt
song          file_3.txt
english       file_3.txt
rock          file_3.txt
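To make the table concrete, here is a minimal sketch of building such an index in Ruby. The tokenizer and the `STOPWORDS` list are illustrative assumptions, not what Lucene or Elasticsearch actually do, and the document texts are the abbreviated snippets from the slides, so the result contains a few more tokens than the table shows.

```ruby
# Sample documents, abbreviated from the slides (the "..." tails are omitted).
DOCUMENTS = {
  'file_1.txt' => 'The ruby is a pink to blood-red colored gemstone',
  'file_2.txt' => 'Ruby is a dynamic, reflective, general-purpose object-oriented programming language',
  'file_3.txt' => '"Ruby" is a song by English rock band Kaiser Chiefs'
}.freeze

# Hypothetical stopword list; real analyzers are configurable.
STOPWORDS = %w[the is a to by].freeze

# Lowercase the text, split it on non-letters, and drop stopwords.
def tokenize(text)
  text.downcase.scan(/[a-z]+/) - STOPWORDS
end

# Map each token to the list of files that contain it, in document order.
def build_index(documents)
  index = Hash.new { |hash, token| hash[token] = [] }
  documents.each do |name, text|
    tokenize(text).uniq.each { |token| index[token] << name }
  end
  index
end

INDEX = build_index(DOCUMENTS)
puts INDEX['ruby'].inspect
puts INDEX['song'].inspect
```

The work of reading every file happens once, at indexing time; a search afterwards is a single hash lookup.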

search "ruby"

ruby          file_1.txt  file_2.txt  file_3.txt

search "song"

song          file_3.txt

search "ruby AND song"

ruby          file_1.txt  file_2.txt  file_3.txt
song          file_3.txt
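The three searches above can be sketched as lookups against the postings, with `AND` as a set intersection. This is a simplified illustration; the `POSTINGS` hash and the `search` and `search_all` helpers are hypothetical names, and a real engine also handles scoring, which is ignored here.

```ruby
# Postings copied from the slide's inverted index (a subset of the tokens).
POSTINGS = {
  'ruby' => %w[file_1.txt file_2.txt file_3.txt],
  'pink' => %w[file_1.txt],
  'song' => %w[file_3.txt]
}.freeze

# Look up one token; unknown tokens match nothing.
def search(token)
  POSTINGS.fetch(token.downcase, [])
end

# AND query: keep only the files present in every token's posting list.
def search_all(*tokens)
  tokens.map { |token| search(token) }.reduce(:&) || []
end

puts search('ruby').inspect
puts search('song').inspect
puts search_all('ruby', 'song').inspect
```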

TOKENS        POSTINGS
ruby          file_1.txt  file_2.txt  file_3.txt
pink          file_1.txt
gemstone      file_1.txt
dynamic       file_2.txt
reflective    file_2.txt
programming   file_2.txt
song          file_3.txt
english       file_3.txt
rock          file_3.txt

31

Statistics!
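One way to read this slide: the length of each posting list is already a statistic, namely the number of documents containing the token. A small sketch under that reading follows; `TOKEN_POSTINGS` and `document_counts` are illustrative names, not part of any library.

```ruby
# The inverted index from the slide, token => files containing it.
TOKEN_POSTINGS = {
  'ruby'        => %w[file_1.txt file_2.txt file_3.txt],
  'pink'        => %w[file_1.txt],
  'gemstone'    => %w[file_1.txt],
  'dynamic'     => %w[file_2.txt],
  'reflective'  => %w[file_2.txt],
  'programming' => %w[file_2.txt],
  'song'        => %w[file_3.txt],
  'english'     => %w[file_3.txt],
  'rock'        => %w[file_3.txt]
}.freeze

# Return [token, document count] pairs, most frequent first
# (ties are ordered alphabetically, an arbitrary choice).
def document_counts(postings)
  postings.map { |token, files| [token, files.size] }
          .sort_by { |token, count| [-count, token] }
end

document_counts(TOKEN_POSTINGS).each do |token, count|
  puts "#{token}: #{count}"
end
```

Counting terms this way needs no second pass over the documents, which is the property the facets in the following slides build on.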

http://elasticsearch.org

Realtime Analytics With ElasticSearch

ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene.

FACETS

Faceted Navigation

http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/

Query

Facets

Faceted Navigation with Elasticsearch

curl "http://localhost:9200/people/_search?pretty=true" -d '{
  "query" : {
    "match" : { "name" : "John" }
  },
  "filter" : {
    "terms" : { "employer" : ["IBM"] }
  },
  "facets" : {
    "employer" : {
      "terms" : {
        "field" : "employer",
        "size"  : 3
      }
    }
  }
}'

User query

“Checkboxes”

Facets
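In an application, the three parts of this request (user query, checkbox filter, facets) would typically be assembled from user input. The sketch below builds the same body as a Ruby hash; `people_search` and its keyword arguments are hypothetical names, and the structure simply mirrors the facets request shown in the slide.

```ruby
require 'json'

# Build the search body from the slide: a full-text query, a terms filter
# for the ticked "checkboxes", and a terms facet on the employer field.
def people_search(name:, employers:, facet_size: 3)
  {
    'query'  => { 'match' => { 'name' => name } },
    'filter' => { 'terms' => { 'employer' => employers } },
    'facets' => {
      'employer' => {
        'terms' => { 'field' => 'employer', 'size' => facet_size }
      }
    }
  }
end

# Print the JSON that would be sent as the request body.
puts JSON.pretty_generate(people_search(name: 'John', employers: ['IBM']))
```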

http://www.elasticsearch.org/guide/reference/api/search/facets/index.html

"facets" : {
  "employer" : {
    "missing" : 0,
    "total" : 10,
    "other" : 3,
    "terms" : [
      { "term" : "ibm", "count" : 3 },
      { "term" : "twitter", "count" : 2 },
      { "term" : "apple", "count" : 2 }
    ]
  }
}

Response
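The numbers in this response add up: 3 + 2 + 2 returned counts plus 3 "other" equals the total of 10. The sketch below recomputes a response of the same shape in plain Ruby, assuming that `total` counts all field values, `other` counts the values outside the returned top terms, and `missing` counts documents without the field. The employer list is invented sample data, and `terms_facet` is a hypothetical helper, not Elasticsearch code.

```ruby
# Compute a terms-facet-like summary over one field's values.
# A nil entry stands for a document that lacks the field.
def terms_facet(values, size)
  present = values.compact.map(&:downcase)
  counts = present.each_with_object(Hash.new(0)) { |term, acc| acc[term] += 1 }
  # Most frequent first; ties are broken alphabetically (an arbitrary choice).
  top = counts.sort_by { |term, count| [-count, term] }.first(size)
  {
    'missing' => values.count(&:nil?),
    'total'   => present.size,
    'other'   => present.size - top.sum { |_, count| count },
    'terms'   => top.map { |term, count| { 'term' => term, 'count' => count } }
  }
end

# Invented sample data: ten employer values across the matching documents.
EMPLOYERS = %w[IBM IBM IBM Twitter Twitter Apple Apple Google Facebook Amazon].freeze

puts terms_facet(EMPLOYERS, 3).inspect
```

With this data the totals match the slide, although the order of the two tied terms may differ from what Elasticsearch returns.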

Visualizing the Facets

http://mbostock.github.com/d3/tutorial/bar-1.html

"facets" : {
  "employer" : {
    "missing" : 0,
    "total" : 10,
    "other" : 3,
    "terms" : [
      { "term" : "ibm", "count" : 3 },
      { "term" : "twitter", "count" : 2 },
      { "term" : "apple", "count" : 2 }
    ]
  }
}

d3.js ~ A Bar Chart, Part 1

DEMO: http://bl.ocks.org/4571766
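The talk renders the facet with d3.js; as a text-only stand-in, the sketch below turns the same `terms` array into a horizontal bar chart. `FACET_TERMS` copies the counts from the response above, and `bar_chart` is a hypothetical helper written for this illustration.

```ruby
# The "terms" array from the facet response on the slide.
FACET_TERMS = [
  { 'term' => 'ibm',     'count' => 3 },
  { 'term' => 'twitter', 'count' => 2 },
  { 'term' => 'apple',   'count' => 2 }
].freeze

# Render one line per term: padded label, a bar of `bar` characters, the count.
def bar_chart(terms, bar: '#')
  width = terms.map { |entry| entry['term'].length }.max || 0
  terms.map do |entry|
    format("%-#{width}s %s %d", entry['term'], bar * entry['count'], entry['count'])
  end
end

puts bar_chart(FACET_TERMS)
```

The point is the same as in the d3.js demo: the facet response is already chart-ready data, with no further aggregation needed on the client.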

http://demo.kibana.org

Realtime Analytics With ElasticSearch

‣ No batch orientation
‣ No stats precomputation and caching
‣ No predefined metrics or schemas

Important Concepts

‣ Combination of free text search, structured search, and facets
‣ Scripting for performing ad-hoc analytics
‣ Extendable: write your own facet types

Scripting

curl -X DELETE localhost:9200/demo-articles
curl -X POST localhost:9200/demo-articles -d '{
  "mappings": { "a": { "properties": { "url": { "type": "string", "index": "not_analyzed" } } } }
}'

curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
curl -X POST localhost:9200/demo-articles/_refresh

curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
  "facets": {
    "popular-domains": {
      "terms": {
        "field"  : "url",
        "script" : "term.replace(new RegExp(\"https?://\"), \"\").split(\"/\")[0]",
        "lang"   : "javascript"
      }
    }
  }
}'

Extract and aggregate most popular domains from article URLs

"facets" : {
  "popular-domains" : {
    // ...
    "terms" : [
      { "term" : "some.blogger.com", "count" : 3 },
      { "term" : "github.com", "count" : 1 }
    ]
  }
}

Response
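To see what the facet script computes, here is the same transformation in plain Ruby: strip the scheme, keep everything before the first slash, and count the results. The helper names are hypothetical. Note that the indexing commands PUT both GitHub URLs under id 5, so the second replaces the first and only four documents remain, which would explain the count of 1 for github.com.

```ruby
# URLs of the documents left in the index after the slide's curl commands
# (the second PUT to id 5 overwrites the first GitHub URL).
ARTICLE_URLS = [
  'http://some.blogger.com/2012/09/01/index.html',
  'http://some.blogger.com/2012/09/11/index.html',
  'http://some.blogger.com/about.html',
  'http://github.com/user/B'
].freeze

# Ruby equivalent of the facet script: drop "http://" or "https://",
# then keep the part before the first "/".
def domain(url)
  url.sub(%r{https?://}, '').split('/').first
end

# Count documents per domain, most frequent first.
def popular_domains(urls)
  urls.each_with_object(Hash.new(0)) { |url, counts| counts[domain(url)] += 1 }
      .sort_by { |name, count| [-count, name] }
      .map { |name, count| { 'term' => name, 'count' => count } }
end

puts popular_domains(ARTICLE_URLS).inspect
```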

Demonstrations

Demo

Thanks!
