Otis Gospodnetić: Search Analytics, Lucene Eurocon 2011
DESCRIPTION
See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
TRANSCRIPT
![Page 1: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/1.jpg)
Search Analytics
Business Value &
NoSQL Backend
Otis Gospodnetić – Sematext International
@otisg ◦ @sematext ◦ sematext.com
sematext.com/search-analytics
![Page 2: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/2.jpg)
Copyright 2011 Sematext Int'l. All rights reserved.
About Otis Gospodnetić
• ASF Member: Lucene, Solr, Nutch, Mahout
• Author: Lucene in Action 1 & 2
• Entrepreneur: Sematext, Simpy
![Page 3: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/3.jpg)
Sematext Metrics
• 100% organic: no GMO, no VC
• 4 years old
• < 10 people
• 7 countries
• 3 timezones
• 2 continents
• > 100 customers
![Page 4: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/4.jpg)
About Sematext
Products & Services
Consulting, Development, Tech Support:
• Search (Lucene, Solr, ElasticSearch...)
• Big Data (Hadoop, HBase, Voldemort...)
• Web Crawling (Nutch, Droids)
• Machine Learning (Mahout)
![Page 5: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/5.jpg)
Agenda
• What is Search Analytics and why it matters
• Example reports and their value
• What we built, why, and how
![Page 6: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/6.jpg)
Communication
• twitter.com/sematext
• twitter.com/otisg
• hash tags: #stsa or #stanalytics
• http://sematext.com/search-analytics/index.html
• Raise your hand!
• [email protected]
![Page 7: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/7.jpg)
The Compass
Search logs are your Map
Search Analytics is your Compass
![Page 8: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/8.jpg)
High Level Why
search users
search providers
search experience
![Page 9: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/9.jpg)
High Level Why
search providers
search experience
This search sucks! It takes 17 tries to find anything here!
F!?@#$%^&?!?
search users
Cool, the latest search tweaks made our site really sticky!
Awesome!
![Page 10: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/10.jpg)
Don't Be Like This Dude
![Page 11: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/11.jpg)
Got Clue?
Search Analytics
Performance Monitoring
Quality Assurance
Tuning UI
![Page 12: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/12.jpg)
More Concrete Why
• Measure and monitor everything. Introspection.
• Supports (re)design, navigation choices
• Helps with content acquisition & enhancement
• Improve search experience
• Mula
![Page 13: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/13.jpg)
The Moment of Truth
Question for the audience #1
What do you use for Search Analytics?
a) Home grown stuff
b) Google Analytics
c) Omniture
d) Webtrends
e) Other
f) Nothing
![Page 14: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/14.jpg)
Search Analytics Outline
• Collect: queries & clicks & interactions & ...
• Analyze: actions / xactions / conversions
• Output: reports – over time
• Output++: feedback loop
• The means, not the goal
• Ongoing, not one-off
remember this
![Page 15: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/15.jpg)
Search vs. Web Analytics
• User intent and information needs vs. inferring
• Hand in hand
• Ideally you can relate data from both or even unify it
![Page 16: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/16.jpg)
Example Core Reports
• Rate & Volume, Latency (mean, avg, 90%)
• Click Through Rate, Mean Reciprocal Rank
• Top Queries by count, clicks, 0 hits...
• Query Trending
• Top Seen Docs, Top Clicked Docs (msft)
• Page & Click Depth
• Facet & Sort Usage
• ...
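Two of the reports listed above, Click Through Rate and Mean Reciprocal Rank, reduce to short formulas. The following is a minimal sketch, not the Sematext implementation: the `SearchEvent` record shape is hypothetical, and it assumes one common convention in which a search without a click contributes 0 to the reciprocal rank.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchEvent:
    """One search request; a hypothetical record shape for illustration."""
    query: str
    clicked_rank: Optional[int]  # 1-based rank of the first clicked hit, None if no click


def click_through_rate(events: List[SearchEvent]) -> float:
    """Fraction of searches that led to at least one click."""
    if not events:
        return 0.0
    clicked = sum(1 for e in events if e.clicked_rank is not None)
    return clicked / len(events)


def mean_reciprocal_rank(events: List[SearchEvent]) -> float:
    """Average of 1/rank of the first click; searches with no click count as 0."""
    if not events:
        return 0.0
    total = sum(1.0 / e.clicked_rank for e in events if e.clicked_rank is not None)
    return total / len(events)


events = [
    SearchEvent("lucene", 1),
    SearchEvent("solr facets", 2),
    SearchEvent("hbase scan", None),
    SearchEvent("flume sink", 4),
]
print(click_through_rate(events))    # 3 of 4 searches had a click
print(mean_reciprocal_rank(events))  # (1 + 1/2 + 0 + 1/4) / 4
```

A rising MRR over time suggests that clicked results are moving closer to the top of the result list.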
![Page 17: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/17.jpg)
More Reports in More Detail
See Search Analytics What? Why? How?
http://blog.sematext.com/tag/analytics/
![Page 18: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/18.jpg)
Part Dos
Switching gears... Juno digs NoSQL
![Page 19: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/19.jpg)
What We've Built
• Search Analytics SaaS
• Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
• Trending over time
• Comparisons of time periods
• Top N reports
• Filter, slice and dice
![Page 20: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/20.jpg)
Who Needs a Compass?
We need it: search-hadoop.com & search-lucene.com
Our customers need it!
You?
![Page 21: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/21.jpg)
Sematext Search Analytics
![Page 22: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/22.jpg)
Big Dreams
• SaaS
• Multitenant
• Large Scale – Massive Data
• Cloud
![Page 23: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/23.jpg)
Storage Choices
• RDBMS: MySQL, PostgreSQL
• HDFS
• Hive
• HBase
• Cassandra
![Page 24: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/24.jpg)
SaaS vs. In-House
Question for the audience #2
SaaS vs in-house Search Analytics?
a) SaaS
b) in-house
![Page 25: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/25.jpg)
Sematext Search Analytics
![Page 26: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/26.jpg)
Sematext Search Analytics
![Page 27: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/27.jpg)
Sematext Search Analytics
![Page 28: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/28.jpg)
Sematext Search Analytics
![Page 29: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/29.jpg)
Data Flow
See Search Analytics with Flume and HBase
http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
![Page 30: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/30.jpg)
Data Collection
See Search Analytics with Flume and HBase
http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
![Page 31: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/31.jpg)
Core Tech
• JavaScript Beacons
• Metric Capture Web App aka Receiver
• Flume Agents, Collectors, Sinks
• HBase
• MapReduce Aggregations
• Search Analytics Reporting Web App
![Page 32: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/32.jpg)
What is Flume
• Distributed data/log collection service
• Scalable, configurable, extensible
• Centrally manageable, open source
• Agents get data from app, Collectors save it
• Abstractions: Source → Decorator(s) → Sink
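The Source → Decorator(s) → Sink abstraction can be pictured as a chain of stream transformations. The sketch below is plain Python written for illustration; it does not use or imitate Flume's actual API or configuration language, and every name in it (`log_source`, `add_host`, `drop_empty`, `ListSink`, `run`) is hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]
Decorator = Callable[[Iterator[Event]], Iterator[Event]]


def log_source(lines: Iterable[str]) -> Iterator[Event]:
    """Source: turn raw log lines into events."""
    for line in lines:
        yield {"body": line.strip()}


def add_host(host: str) -> Decorator:
    """Decorator factory: annotate each event with the host it came from."""
    def decorate(events: Iterator[Event]) -> Iterator[Event]:
        for event in events:
            yield {**event, "host": host}
    return decorate


def drop_empty(events: Iterator[Event]) -> Iterator[Event]:
    """Decorator: filter out events with an empty body."""
    return (e for e in events if e["body"])


class ListSink:
    """Sink: collect events in memory (a stand-in for a file or database sink)."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def write(self, events: Iterator[Event]) -> None:
        self.events.extend(events)


def run(source: Iterator[Event], decorators: List[Decorator], sink: ListSink) -> None:
    """Wire the source through each decorator in order, then drain into the sink."""
    stream = source
    for decorate in decorators:
        stream = decorate(stream)
    sink.write(stream)


sink = ListSink()
run(log_source(["q=lucene\n", "\n", "q=hbase\n"]), [drop_empty, add_host("web-1")], sink)
print(sink.events)
```

Because each stage only consumes and produces a stream of events, stages can be added, removed, or reordered without touching the others.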
![Page 33: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/33.jpg)
What is HBase
• Scalable, reliable, distributed, column-oriented DB
• On top of HDFS
• MapReducable
![Page 34: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/34.jpg)
Data Flow, Detailed
![Page 35: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/35.jpg)
Why Flume
• Reliable delivery, e.g. queue msgs locally if destination unreachable
• Easy, centralized management via Web UI or console
• Good community, good progress, now @ASF
• But: more complex, more moving parts
• On Flume: slideshare.net/cloudera/inside-flume
• Alternatives: Kafka, Scribe...
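The reliable-delivery idea, queueing messages locally while the destination is unreachable, can be sketched in a few lines. This is an illustration of the general technique only, not Flume code: `BufferingSender` and `FakeDestination` are hypothetical names, and the queue here lives in memory, whereas a production agent would typically persist it to disk so that messages survive a restart.

```python
from collections import deque
from typing import Callable, Deque, List


class BufferingSender:
    """Queue messages locally and retry them in order once the destination is back."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # expected to raise ConnectionError when unreachable
        self._pending: Deque[str] = deque()

    def submit(self, message: str) -> None:
        self._pending.append(message)
        self.flush()

    def flush(self) -> None:
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return  # keep the message queued for a later retry
            self._pending.popleft()  # remove only after a successful send

    @property
    def pending(self) -> int:
        return len(self._pending)


class FakeDestination:
    """Hypothetical destination that can be switched up or down for the demo."""

    def __init__(self) -> None:
        self.up = False
        self.delivered: List[str] = []

    def send(self, message: str) -> None:
        if not self.up:
            raise ConnectionError("destination unreachable")
        self.delivered.append(message)


dest = FakeDestination()
sender = BufferingSender(dest.send)
sender.submit("event-1")
sender.submit("event-2")
queued_while_down = sender.pending  # both messages are still held locally

dest.up = True
sender.submit("event-3")  # flushes the backlog first, preserving order
print(queued_while_down, dest.delivered, sender.pending)
```

Removing a message from the queue only after the send succeeds gives at-least-once delivery in this sketch; a crash between send and removal could produce a duplicate.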
![Page 36: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/36.jpg)
Why HBase
• Scalable raw & aggregate data storage
• MapReduce data input
• Fast scans for time ranges, fast key lookups
• Easy storage and compute power expansion
• Good looking roadmap, community, progress
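Fast time-range scans follow from how rows are ordered: HBase keeps rows sorted by row key, so a key that ends in a timestamp turns a time range into one contiguous scan. The sketch below imitates that with an in-memory sorted list. The key layout (`site id` + `:` + zero-padded epoch seconds) is an assumption made for illustration, not the schema described in the talk, and `SortedTable` is a toy stand-in rather than the HBase client API.

```python
import bisect
from typing import List, Tuple


def row_key(site_id: str, epoch_seconds: int) -> str:
    """Hypothetical key layout: site id, then a zero-padded timestamp so that
    lexicographic order matches chronological order within a site."""
    return f"{site_id}:{epoch_seconds:010d}"


class SortedTable:
    """Tiny in-memory stand-in for a store that keeps rows sorted by key."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._values: List[str] = []

    def put(self, key: str, value: str) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing row
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def scan(self, start_key: str, stop_key: str) -> List[Tuple[str, str]]:
        """Return rows with start_key <= key < stop_key (stop is exclusive)."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


table = SortedTable()
table.put(row_key("siteA", 1305000000), "q=lucene")
table.put(row_key("siteB", 1305000050), "q=solr")
table.put(row_key("siteA", 1305000100), "q=hbase")
table.put(row_key("siteA", 1305000200), "q=flume")

# All siteA events in the half-open range [1305000000, 1305000200).
rows = table.scan(row_key("siteA", 1305000000), row_key("siteA", 1305000200))
print(rows)
```

One known trade-off of monotonically increasing keys is that new writes concentrate at the end of the key space; that is a separate concern this sketch does not address.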
![Page 37: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/37.jpg)
Open Sourcing
2 open-source projects:
• github.com/sematext/HBaseWD
• github.com/sematext/HBaseHUT
See sematext.com/open-source/index.html
Patches for Flume and HBase: blog.sematext.com/tag/flume/
![Page 38: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/38.jpg)
Challenges
• Data size. Solutions:
  • Compression (4-5x smaller with lzo)
  • Data pruning (variable levels)
• Query string distribution: very long-tail
• Lots of data to process, update, aggregate
• Young tools: Flume, HBase
• Poor IO on EC2
• Hadoop distributions
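The long-tail claim about query strings is easy to check against a query log. Here is a minimal sketch with made-up sample data: `long_tail_stats` is a hypothetical helper that reports how much of the volume the most popular queries account for and what share of distinct queries were seen only once.

```python
from collections import Counter
from typing import Iterable, Tuple


def long_tail_stats(queries: Iterable[str], head_size: int) -> Tuple[float, float]:
    """Return (share of total volume covered by the top `head_size` queries,
    share of distinct queries that were seen exactly once)."""
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    head = sum(c for _, c in counts.most_common(head_size))
    singletons = sum(1 for c in counts.values() if c == 1)
    return head / total, singletons / len(counts)


# Made-up sample log; real logs are far larger and usually far more skewed.
log = ["lucene", "lucene", "Lucene", "solr", "solr", "hbase scan",
       "flume sink", "lzo compression", "mapreduce join", "nutch crawl"]
head_share, singleton_share = long_tail_stats(log, head_size=2)
print(head_share, singleton_share)
```

A high singleton share means per-query aggregates cover many rows with tiny counts, which is one reason data pruning at variable levels helps with storage.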
![Page 39: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/39.jpg)
Output++
• AutoComplete - $MM improvement
• Better DYM Spellchecker
• Related Searches
• Recommendations
• Relevance Feedback
• ...
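As one example of feeding analytics output back into search, popular logged queries can drive AutoComplete suggestions. The sketch below is a simplified illustration, not the approach described in the talk: `build_suggester` is a hypothetical helper that ranks logged queries by frequency and does a linear prefix match, where a real system would more likely use a trie or a search index.

```python
from collections import Counter
from typing import Callable, Iterable, List


def build_suggester(queries: Iterable[str]) -> Callable[..., List[str]]:
    """Build a prefix suggester from logged queries, ranked by how often they occur."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())

    def suggest(prefix: str, limit: int = 3) -> List[str]:
        prefix = prefix.strip().lower()
        matches = [(q, c) for q, c in counts.items() if q.startswith(prefix)]
        # Most frequent first; ties broken alphabetically for stable output.
        matches.sort(key=lambda qc: (-qc[1], qc[0]))
        return [q for q, _ in matches[:limit]]

    return suggest


suggest = build_suggester(
    ["solr facets", "solr", "solr", "solr facets", "solr join", "lucene", "Solr"]
)
print(suggest("so"))
print(suggest("lu"))
```

Ranking by click count instead of raw query count would be a natural refinement, since it favors queries that actually led users to results.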
![Page 40: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/40.jpg)
Closing the Loop
search users
search providers
search experience
![Page 41: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/41.jpg)
Resource
http://rosenfeldmedia.com/books/searchanalytics/
Search Analytics for Your Site
Louis Rosenfeld
![Page 42: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/42.jpg)
We're Hiring
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig working with and in open-source?
We're hiring world-wide!
http://sematext.com/about/jobs.html
![Page 43: Otis gospodnetic Search Analytics Lucene Eurocon 2011](https://reader033.vdocuments.site/reader033/viewer/2022061223/54c672944a795913618b46b4/html5/thumbnails/43.jpg)
Contact
sematext.com ◦ blog.sematext.com ◦ @sematext ◦ @otisg ◦ [email protected]
Want SA? Grab me or go to: sematext.com/search-analytics
Hash tags: #stsa or #stanalytics