intro to search

© Copyright 2013 Intro to Search Grant Ingersoll CTO, LucidWorks @gsingers

Upload: grant-ingersoll

Post on 10-May-2015

Category: Technology


DESCRIPTION

A one-hour intro to search, Apache Lucene and Solr, and LucidWorks Search. Contains a quick start with LucidWorks Search and a demo using financial data (see the GitHub project: http://bit.ly/lws-financial), as well as some basic vocabulary and search explanations.

TRANSCRIPT

Page 1: Intro to Search

© Copyright 2013

Intro to Search

Grant Ingersoll

CTO, LucidWorks

@gsingers

Page 2: Intro to Search

© 2013 LucidWorks

• Search is Everywhere!

• The Bar is Raised

- Keyword search is a commodity

• Holistic view of the data AND the users is critical

• Scalable Search, Discovery and Analytics are the key to unlocking this view of users and data

Search is dead, long live search

Documents

User Interaction

Access

Content Relationships

Page 3: Intro to Search

Search is good for…

• Traditional: Fast, fuzzy text matching across a large document collection

• De-normalized data

- “Light” relational

• Top N problems

- Key-value (top 1)

- Recommendations

- “Good enough” classification, clustering

• Faceting, slicing and dicing of enumerated data

• Spatial, spell checking, record linkage, highlighting

• NoSQL

Page 4: Intro to Search

Common Use Cases

• eCommerce

- Search + Recs + Analysis of users

• Knowledge Management

- Financial, transportation, pharma

• Fraud detection

• Social media

- Trend monitoring

• Information technology

- Log monitoring, analysis

• Healthcare

- DNA Analysis

Page 5: Intro to Search

http://bit.ly/get-lws

Page 6: Intro to Search

Topics

• Intros

• First 5 Minutes with LucidWorks Search (Solr++)

• Search Concepts

• Demo Deep Dive

• Level Up

• Resources

Page 7: Intro to Search

› Founded in 2007 to be the go-to company for Lucene/Solr expertise

› 250+ customers (many Fortune 500)

› 100% y-y growth

› Over 40% of the active Apache Lucene/Solr Committers

› Host fast-growing Lucene/Solr Revolution User Conference (400+ attendees)

LucidWorks Overview

Page 8: Intro to Search

LucidWorks Product Suite

PRODUCT: (name not captured in transcript)

Description: Massively adopted open source search technology

Version: Version 4.3 released May 2013

LucidWorks Offering:

› Annual Support Subscriptions

› Professional Services

› Training

› Inside sales model

PRODUCT: LucidWorks Search

Description: Enterprise Search platform built on Lucene/Solr

Version: Version 2.5 ships December 2012

LucidWorks Offering:

› Free trial

› On-prem or cloud

› Inside sales model

PRODUCT: LucidWorks Big Data

Description: Unified development platform for Big Data applications

Version: GA Version 1.1 released Feb. 2013

LucidWorks Offering:

› Free trial

› On-prem or cloud

› Enterprise sales model

Page 9: Intro to Search

5 Minutes to Search

1. Install LWS

   1. Unpack, double click to launch Installer

   2. Launch, wait for startup

2. http://localhost:8989/

3. Choose “Quick Start”

4. Choose a Data Source

   1. For me: /Users/grantingersoll/Desktop/reading

5. Quick Search

6. Search with Flare

   1. http://localhost:8989/flare/catalog/quickstart

7. Quick Changes:

   1. Add a Facet

   2. Change Display Results

Page 10: Intro to Search

Prepare Deep Dive Demo

1. https://github.com/LucidWorks/lws-financial-demo/blob/master/README.md

2. cd src/main/python

3. python setup.py -n setup -a TWITTER_ACCESS_TOKEN -c TWITTER_CONSUMER_KEY -s TWITTER_CONSUMER_SECRET -t TWITTER_ACCESS_TOKEN_SECRET -p ../../../data/sp500List-30.txt -A -l Finance --data_dir ../../../data

4. python python.py

Page 11: Intro to Search

• Java APIs for building search applications

• Fast, efficient, flexible

• Modules to add functionality:

- Lang. Analysis

- Faceting

- Highlighting, spell checking

- Much more

• Lucene best practices

• HTTP-based service

- Many client bindings

• Faceting

• Distributed, fault-tolerant

• Many No-SQL features
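
Because the service is HTTP-based, a search request is just a URL with query parameters. The sketch below builds such a URL using Solr's standard `q`, `rows`, `wt`, `facet`, and `facet.field` parameters. The host, port, collection name, and the field names `industry` and `state` are assumptions made for illustration and are not taken from the slides; the code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and collection name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_select_url(query, facet_fields=(), rows=10, base_url=SOLR_SELECT_URL):
    """Build a select URL from standard Solr query parameters."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_fields:
        # Faceting is switched on once, then one facet.field per field.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return base_url + "?" + urlencode(params)


# Field names here are illustrative only.
url = build_select_url("industry:Finance", facet_fields=["state"], rows=5)
print(url)
```

Any HTTP client in any language can issue the resulting request, which is what makes many client bindings possible.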

Page 12: Intro to Search

• IT Ready Open Source

- Installation, provisioning, monitoring, administration, integration

• Enterprise Grade

- A robust connector framework

» Including a wide assortment of prebuilt connectors to popular data sources

- Enterprise security framework

» Leverages SSL, LDAP, Active Directory

» Document level access control

• Business Friendly

- Rich graphical administration console

» Speeds up search application development, deployment and management

- Expressive Business Logic

» Processing information through filters for better, more accurate results

- Relevancy Work Bench

• Full power of Apache Lucene and Solr

LucidWorks Search Goals

Page 13: Intro to Search

Reference Architecture

Diagram labels:

• Data: LucidWorks Search connectors, Push

• Content Acquisition: ETL, batch or near real-time

• Access APIs

• Search View: Documents, Users, Logs

• Document Store

• Shards: 1, 2, 3, N

• Analytic Services: View into numeric/historic data

• Classification, Recommendation

• Personalization & Machine Learning Services

• Classification Models

• In memory, Replicated, Multi-tenant

• Discovery & Enrichment: Clustering, classification, NLP, topic identification, search log analysis, user behavior

Page 14: Intro to Search

Basic Vocab

• Documents

- Fields

» Tokens

▪ Payloads

• Query

- Many diff. kinds: term, phrase, regex, spatial, function

• Facets & Filters

• Collection

- Index

» Shard

▪ Segment
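
To make the vocabulary concrete, here is a minimal sketch of how documents, fields, and tokens relate, and how tokens end up in an index. It is an illustration only and not how Lucene stores data: the `tokenize` function, the sample documents, and the in-memory dictionary index are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Documents are collections of named fields (illustrative data).
documents = {
    1: {"company": "Acme Bank", "industry": "Finance"},
    2: {"company": "Acme Airlines", "industry": "Transportation"},
}

# Inverted index: (field, token) -> ids of the documents containing that token.
index = defaultdict(set)
for doc_id, fields in documents.items():
    for field, value in fields.items():
        for token in tokenize(value):
            index[(field, token)].add(doc_id)

print(sorted(index[("company", "acme")]))      # [1, 2]
print(sorted(index[("industry", "finance")]))  # [1]
```

A term query then reduces to a dictionary lookup on a (field, token) pair, which is why lookups stay fast as the collection grows.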

Page 15: Intro to Search

Search Concepts: Indexing

Page 16: Intro to Search

Search Concepts: Ranking

• Search is optimized for solving top N problems

• Hand Waving Algo:

- Parse query

- For Each Term

» Look up documents containing term

- Rank documents according to similarity

- Return top X
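
The hand-waving algorithm above can be sketched in a few lines. This is a toy under stated assumptions: it uses a simple TF-IDF sum as the similarity measure, which stands in for, and is not, Lucene's actual scoring. The function names, the tokenizer, and the sample documents are hypothetical.

```python
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, num_docs, top_x=10):
    """Return the top_x (doc_id, score) pairs for the query."""
    scores = defaultdict(float)
    for term in tokenize(query):            # parse query, then for each term
        postings = index.get(term, {})      # look up documents containing term
        if not postings:
            continue
        # Rarer terms get a higher weight (inverse document frequency).
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf      # accumulate similarity per document
    # Return top X without sorting the full result set.
    return heapq.nlargest(top_x, scores.items(), key=lambda item: item[1])


docs = {
    "a": "search is everywhere",
    "b": "keyword search is a commodity",
    "c": "scalable search and analytics",
}
index = build_index(docs)
results = search("keyword search", index, len(docs), top_x=2)
print(results[0][0])  # b
```

Document "b" ranks first because it matches both query terms and "keyword" is the rarer of the two. Using a heap for the final step reflects the top N focus: only the best X results are kept, so the full candidate set never needs a complete sort.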

Page 17: Intro to Search

Search Concepts: Faceting

• Dynamically slice and dice query results in a variety of ways:

- Term

- Range (date and numeric)

- Pivot

- Function

- Multi-select

• Gather Stats
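
As a rough illustration of term and numeric range faceting, the sketch below counts query results per field value and per numeric bucket. It is plain Python over an in-memory list and not how Solr computes facets; the result records, field names, and the `term_facet` and `range_facet` helpers are hypothetical.

```python
from collections import Counter

# Illustrative query results; symbols and values are made up.
results = [
    {"symbol": "AAA", "industry": "Finance", "price": 12.5},
    {"symbol": "BBB", "industry": "Finance", "price": 48.0},
    {"symbol": "CCC", "industry": "Energy", "price": 51.0},
    {"symbol": "DDD", "industry": "Technology", "price": 99.9},
]


def term_facet(docs, field):
    """Count how many results carry each distinct value of the field."""
    return Counter(doc[field] for doc in docs if field in doc)


def range_facet(docs, field, start, end, gap):
    """Count results per [lower, lower + gap) bucket between start and end."""
    buckets = {lower: 0 for lower in range(start, end, gap)}
    for doc in docs:
        value = doc.get(field)
        if value is None or not (start <= value < end):
            continue  # skip results outside the faceted range
        lower = start + int((value - start) // gap) * gap
        buckets[lower] += 1
    return buckets


print(term_facet(results, "industry")["Finance"])  # 2
print(range_facet(results, "price", 0, 100, 50))   # {0: 2, 50: 2}
```

The counts are computed over the current result set, so they change with every query; that is what lets a UI offer filters that always reflect what is actually in the results.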

Page 18: Intro to Search

Demo Deep Dive

• Application:

- Stock Insights

- Twitter Bootstrap + Python Flask + LWS

- http://localhost:5000

• Goals:

- Explore data sources, scheduling, other features

- Automate setup via script and LWS APIs

• Data:

- Company Info (Symbol, Company, Industry, City, State)

- Twitter, websites

- Historical Stock Prices from Y! Finance

• http://github.com/lucidworks/lws-financial-demo

- README covers setup

Page 19: Intro to Search

Level Up

• Explore our APIs:

- http://bit.ly/lws-apis

• Build your own UI or extend ours

• Write a custom connector

• Customize Solr!

• Scale with SolrCloud

• Explore Solr Marketplace:

- http://bit.ly/solr-market

Page 20: Intro to Search

Where to Next?

• http://www.lucidworks.com

• http://lucene.apache.org/solr

• Training: http://bit.ly/lws-training

• LWS more info: http://bit.ly/lws-more-info

• LWS Documentation: http://bit.ly/lws-docs

• Twitter: @gsingers, @LucidWorks

• Taming Text: http://www.manning.com/ingersoll