intro to search
DESCRIPTION
A one-hour intro to search, Apache Lucene and Solr, and LucidWorks Search. Contains a quick start with LucidWorks Search and a demo using financial data (see the GitHub project: http://bit.ly/lws-financial), as well as some basic vocabulary and search explanations.
TRANSCRIPT
© Copyright 2013
Intro to Search
Grant Ingersoll
CTO, LucidWorks
@gsingers
© 2013 LucidWorks
Search is dead, long live search
• Search is Everywhere!
• The Bar is Raised
- Keyword search is a commodity
• Holistic view of the data AND the users is critical
• Scalable Search, Discovery and Analytics are the key to unlocking this view of users and data
Documents
User Interaction
Access
Content Relationships
Search is good for…
• Traditional: Fast, fuzzy text matching across a large document collection
• De-normalized data
- “Light” relational
• Top N problems
- Key-value (top 1)
- Recommendations
- “Good enough” classification, clustering
• Faceting, slicing and dicing of enumerated data
• Spatial, spell checking, record linkage, highlighting
• NoSQL
Common Use Cases
• eCommerce
- Search + Recs + Analysis of users
• Knowledge Management
- Financial, transportation, pharma
• Fraud detection
• Social media
- Trend monitoring
• Information technology
- Log monitoring, analysis
• Healthcare
- DNA Analysis
Topics
• Intros
• First 5 Minutes with LucidWorks Search (Solr++)
• Search Concepts
• Demo Deep Dive
• Level Up
• Resources
LucidWorks Overview
› Founded in 2007 to be the go-to company for Lucene/Solr expertise
› 250+ customers (many Fortune 500)
› 100% year-over-year growth
› Over 40% of the active Apache Lucene/Solr committers
› Hosts the fast-growing Lucene/Solr Revolution User Conference (400+ attendees)
LucidWorks Product Suite
Apache Lucene/Solr
› Description: Massively adopted open source search technology
› Version: Version 4.3 released May 2013
› LucidWorks Offering: Annual support subscriptions, professional services, training, inside sales model
LucidWorks Search
› Description: Enterprise Search platform built on Lucene/Solr
› Version: Version 2.5 ships December 2012
› LucidWorks Offering: Free trial, on-prem or cloud, inside sales model
LucidWorks Big Data
› Description: Unified development platform for Big Data applications
› Version: GA Version 1.1 released Feb. 2013
› LucidWorks Offering: Free trial, on-prem or cloud, enterprise sales model
5 Minutes to Search
1. Install LWS
   1. Unpack, double-click to launch the installer
   2. Launch, wait for startup
2. http://localhost:8989/
3. Choose “Quick Start”
4. Choose a Data Source
   1. For me: /Users/grantingersoll/Desktop/reading
5. Quick Search
6. Search with Flare
   1. http://localhost:8989/flare/catalog/quickstart
7. Quick Changes:
   1. Add a Facet
   2. Change Display Results
Prepare Deep Dive Demo
1. https://github.com/LucidWorks/lws-financial-demo/blob/master/README.md
2. cd src/main/python
3. python setup.py -n setup -a TWITTER_ACCESS_TOKEN -c TWITTER_CONSUMER_KEY -s TWITTER_CONSUMER_SECRET -t TWITTER_ACCESS_TOKEN_SECRET -p ../../../data/sp500List-30.txt -A -l Finance --data_dir ../../../data
4. python python.py
• Java APIs for building search applications
• Fast, efficient, flexible
• Modules to add functionality:
- Language analysis
- Faceting
- Highlighting, spell checking
- Much more
• Lucene best practices
• HTTP-based service
- Many client bindings
• Faceting
• Distributed, fault-tolerant
• Many No-SQL features
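Because Solr is an HTTP-based service, a query is ultimately just a URL. The sketch below builds one in Python. The host, port, collection name (`collection1`) and the `industry` field are assumptions for illustration, not values from this deck. It only constructs the request URL and does not contact a server.

```python
from urllib.parse import urlencode

# Assumptions for illustration only: host, port, collection and field names.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def build_select_url(query, rows=10, facet_field=None):
    """Build a URL for Solr's /select request handler."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field:
        # Ask Solr to also return facet counts for one field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return SOLR_BASE + "/select?" + urlencode(params)


url = build_select_url("stock price", rows=5, facet_field="industry")
print(url)
```

Any HTTP client, or one of the many client bindings, can then fetch that URL and parse the JSON response.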
LucidWorks Search Goals
• IT Ready Open Source
- Installation, provisioning, monitoring, administration, integration
• Enterprise Grade
- A robust connector framework
» Including a wide assortment of prebuilt connectors to popular data sources
- Enterprise security framework
» Leverages SSL, LDAP, Active Directory
» Document-level access control
• Business Friendly
- Rich graphical administration console
» Speeds up search application development, deployment and management
- Expressive Business Logic
» Processing information through filters for better, more accurate results
- Relevancy Work Bench
• Full power of Apache Lucene and Solr
Reference Architecture
Diagram components:
• Access APIs
• Search View: Documents, Users, Logs
• Document Store
• Shards: 1, 2, 3, … N
• Analytic Services: view into numeric/historic data
• Personalization & Machine Learning Services: classification, recommendation, classification models
• In memory, replicated, multi-tenant
• Discovery & Enrichment: clustering, classification, NLP, topic identification, search log analysis, user behavior
• Content Acquisition: ETL, batch or near real-time
• Data: LucidWorks Search connectors, push
Basic Vocab
• Documents
- Fields
» Tokens
▪ Payloads
• Query
- Many different kinds: term, phrase, regex, spatial, function
• Facets & Filters
• Collection
- Index
» Shard
▪ Segment
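To make the vocabulary concrete, here is a toy sketch of how documents break down into fields and tokens, and how tokens end up in an index. The documents, the `analyze` function and the index layout are all hypothetical. Lucene's real analysis chain and segment format are far richer, and payloads are not modeled here.

```python
import re
from collections import defaultdict

# Hypothetical documents: each document is a set of named fields.
docs = {
    1: {"title": "Intro to Search", "body": "Search is everywhere"},
    2: {"title": "Scaling Solr", "body": "Shards split an index"},
}


def analyze(text):
    """Very simple analysis: lowercase the text and split it into tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Inverted index: (field, token) -> {doc_id: [token positions]}
index = defaultdict(dict)
for doc_id, fields in docs.items():
    for field, text in fields.items():
        for position, token in enumerate(analyze(text)):
            index[(field, token)].setdefault(doc_id, []).append(position)

# The same token is indexed separately per field.
print(index[("title", "search")])
print(index[("body", "search")])
```

A term query is a lookup of one `(field, token)` key. A phrase query would additionally compare the stored positions.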
Search Concepts: Indexing
Search Concepts: Ranking
• Search is optimized for solving top N problems
• Hand-waving algorithm:
- Parse query
- For each term
» Look up documents containing term
- Rank documents according to similarity
- Return top X
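The hand-waving algorithm above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Lucene scores documents: the toy collection is made up, the query parser is a plain whitespace split, and the similarity is a simple TF-IDF style sum chosen for brevity.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical toy collection.
docs = {
    "d1": "solr is a search server built on lucene",
    "d2": "lucene is a java search library",
    "d3": "stock prices and company news",
}

# Index: term -> {doc_id: term frequency}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf


def search(query, top_x=2):
    """Return up to top_x (doc_id, score) pairs, best first."""
    scores = defaultdict(float)
    for term in query.lower().split():        # parse query, then for each term
        postings = index.get(term, {})        # look up documents containing term
        if not postings:
            continue
        # Rarer terms get a higher weight (inverse document frequency).
        idf = math.log(1 + len(docs) / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf        # rank by a TF-IDF style similarity
    # Return top X without sorting every scored document.
    return heapq.nlargest(top_x, scores.items(), key=lambda item: item[1])


results = search("java search")
print(results)
```

Here `d2` ranks above `d1` because it matches both query terms, and the rarer term "java" carries more weight than "search". Using a heap for the final step reflects the top N framing: only the best X results need to be ordered.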
Search Concepts: Faceting
• Dynamically slice and dice query results in a variety of ways:
- Term
- Range (date and numeric)
- Pivot
- Function
- Multi-select
• Gather Stats
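As a rough illustration of term and range faceting, the sketch below counts values over an in-memory result set. The records, symbols and prices are invented for the example, and the helper functions are hypothetical. Solr computes facets inside the engine rather than in client code like this.

```python
from collections import Counter

# Hypothetical query results; field names echo the demo's company data.
results = [
    {"symbol": "AAA", "industry": "Energy", "state": "TX", "price": 42.0},
    {"symbol": "BBB", "industry": "Energy", "state": "CA", "price": 110.5},
    {"symbol": "CCC", "industry": "Software", "state": "CA", "price": 310.0},
    {"symbol": "DDD", "industry": "Software", "state": "CA", "price": 75.25},
]


def term_facet(docs, field):
    """Count how many results carry each value of `field`."""
    return Counter(doc[field] for doc in docs)


def range_facet(docs, field, start, end, gap):
    """Count results per [low, low + gap) bucket between start and end."""
    buckets = {low: 0 for low in range(start, end, gap)}
    for doc in docs:
        value = doc[field]
        if start <= value < end:
            low = start + int((value - start) // gap) * gap
            buckets[low] += 1
    return buckets


industry_counts = term_facet(results, "industry")
price_counts = range_facet(results, "price", 0, 400, 100)

# Filtering on a facet value narrows the results; facets are then recomputed.
california = [doc for doc in results if doc["state"] == "CA"]
ca_industry_counts = term_facet(california, "industry")

print(industry_counts, price_counts, ca_industry_counts)
```

The last step shows the usual facet-then-filter loop: a user picks a facet value, the result set shrinks, and the remaining counts update.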
Demo Deep Dive
• Application:
- Stock Insights
- Twitter Bootstrap + Python Flask + LWS
- http://localhost:5000
• Goals:
- Explore data sources, scheduling, other features
- Automate setup via script and LWS APIs
• Data:
- Company Info (Symbol, Company, Industry, City, State)
- Twitter, websites
- Historical Stock Prices from Y! Finance
• http://github.com/lucidworks/lws-financial-demo
- README covers setup
Level Up
• Explore our APIs:
- http://bit.ly/lws-apis
• Build your own UI or extend ours
• Write a custom connector
• Customize Solr!
• Scale with SolrCloud
• Explore Solr Marketplace:
- http://bit.ly/solr-market
Where to Next?
• http://www.lucidworks.com
• http://lucene.apache.org/solr
• Training: http://bit.ly/lws-training
• LWS more info: http://bit.ly/lws-more-info
• LWS Documentation: http://bit.ly/lws-docs
• Twitter: @gsingers, @LucidWorks
• Taming Text: http://www.manning.com/ingersoll