Ferret: A Ruby Search Engine, by Brian Sam-Bodden

Upload: elliando-dias

Post on 02-Nov-2014


Page 1: Ferret A Ruby Search Engine

Ferret: A Ruby Search Engine

Brian Sam-Bodden

Agenda

• What is Ferret?

• Concepts

• Fields

• Indexing

• Installing Ferret

• The Recipe

• Documents

• Ferret::Index::Index

• FQL

• Ferret in your App

• Ferret in Rails

• Resources

What is Ferret?

• Information Retrieval (IR) Library

• Full-featured Text Search Engine

• Inspired by the Apache Lucene search engine

• Ported to Ruby by David Balmain

• Initially a 100% pure Ruby port

• Since 0.9 many core functions are implemented in C

• Fast! Now Faster than Lucene ;-)

Concepts

• Index : Sequence of documents

• Document : Sequence of fields

• Field : Named sequence of terms

• Term : A text string, keyed by field name
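
The containment hierarchy above can be sketched with plain Ruby data structures. This is not Ferret's API: the names `index` and `terms_for` are made up for illustration, and splitting on whitespace is an assumption standing in for a real analyzer.

```ruby
# Toy model of the concepts above in plain Ruby (not Ferret's API).

# An index is a sequence of documents; each document is a set of named fields.
index = [
  { title: "Ferret", body: "a ruby search engine" },
  { title: "Lucene", body: "a java search engine" }
]

# A field is a named sequence of terms. Splitting on whitespace is a
# simplifying assumption standing in for a real analyzer.
def terms_for(document, field)
  document[field].downcase.split.map { |text| [field, text] }
end

# A term is a text string keyed by its field name, so "ferret" in :title
# and "ferret" in :body would be two different terms.
index.each do |document|
  document.each_key do |field|
    puts terms_for(document, field).inspect
  end
end
```

Keeping the field name inside each term is what lets a query target a single field rather than the whole document.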

Fields of a Document in an Index

• Fields are individually searchable units that are:

• Stored: The original Terms of the field are stored

• Indexed: Inverted to rapidly find all Documents containing any of the Terms

• Tokenized: Individual Terms extracted are indexed

• Vectored: Frequency and location of Terms are stored
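
A rough way to picture these four properties is a toy inverted index in plain Ruby. This is only a sketch, not how Ferret stores data internally; the helper names `tokenize`, `invert`, and `term_vector` are hypothetical.

```ruby
# Toy illustration of the four field properties in plain Ruby (not Ferret's API).

# Tokenized: break the field text into individual terms.
def tokenize(text)
  text.downcase.scan(/\w+/)
end

# Indexed: invert the documents so each term maps to the ids of the
# documents that contain it.
def invert(docs)
  inverted = Hash.new { |hash, term| hash[term] = [] }
  docs.each_with_index do |text, doc_id|
    tokenize(text).uniq.each { |term| inverted[term] << doc_id }
  end
  inverted
end

# Vectored: for one document, record the positions of each term; the number
# of positions recorded for a term is its frequency.
def term_vector(text)
  vector = Hash.new { |hash, term| hash[term] = [] }
  tokenize(text).each_with_index { |term, position| vector[term] << position }
  vector
end

# Stored: the docs array keeps the original text so it can be shown with a hit.
docs = ["the quick brown fox", "the lazy dog", "quick quick dog"]

inverted = invert(docs)
puts inverted["quick"].inspect     # ids of the documents containing "quick"
puts term_vector(docs[2]).inspect  # positions of each term in document 2
```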

It’s all about Indexing

• Indexing is the processing of a source document into plain text tokens that Ferret can manipulate

• For any non-plaintext sources such as PDF, Word, Excel you need to:

• Extract

• Analyze
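
The two steps can be sketched in plain Ruby. This is an illustration under stated assumptions, not Ferret's analyzer classes: stripping HTML tags stands in for real extraction (PDF, Word, and Excel files need a dedicated tool), and the stop-word list is hypothetical.

```ruby
# Toy extract-and-analyze pipeline in plain Ruby (not Ferret's analyzers).

# Extract: reduce a source document to plain text. Stripping HTML tags is a
# stand-in here; binary formats need a dedicated extraction tool.
def extract(html)
  html.gsub(/<[^>]+>/, " ")
end

# Analyze: turn plain text into tokens by lowercasing, splitting on non-word
# characters, and dropping stop words. The default stop-word list is a
# hypothetical, deliberately short example.
def analyze(text, stop_words = %w[a an and is of the to])
  text.downcase.scan(/\w+/).reject { |token| stop_words.include?(token) }
end

source = "<h1>Ferret</h1><p>Ferret is a Ruby port of the Lucene engine.</p>"
tokens = analyze(extract(source))
puts tokens.inspect
```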

Installing Ferret

gem install ferret

Pick the latest version for your platform

The Recipe

1. Create some Documents

2. Create an Index

3. Add Documents to the Index

4. Perform some Queries

Example Documents

Create some Documents

"Any String is a Document"

["This", "is also", "a document"]

Ferret::Index::Index

Create an Index

• Indexes are encapsulated by the class

➡ Ferret::Index::Index

• Use the alias Ferret::I for convenience

• Index can be persistent

➡ index = Ferret::I.new(:path => '/somepath')

• Or, completely in memory

➡ index = Ferret::I.new()

Ferret::Index::Index

Adding Documents to the Index

• Index provides the add_document method

• It also provides the << alias

• Adding documents is then as easy as:

➡ index << "This is a document"

➡ index << {:first => "Bob", :last => "Smith"}

Ferret::Index::Index

Perform some Queries

• Index provides the search and search_each methods

• The search method takes a query and an optional set of parameters:

➡ search(query, options = {})

• The search_each method provides an iterator block

➡ search_each(query, options = {}) {|doc, score| ... }
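
To see the whole recipe run end to end without installing the gem, here is a toy stand-in written in plain Ruby. It is not Ferret: the `ToyIndex` class is hypothetical and only mimics the shape of the calls on the slides (`<<`, `search`, `search_each`). Filing bare strings under a `:content` field and scoring by raw term count are assumptions of this sketch.

```ruby
# Toy stand-in for the recipe in plain Ruby. This is NOT Ferret: it only
# mimics the shape of the calls shown on the slides.
class ToyIndex
  def initialize
    @docs = []
  end

  # Accept a String or a Hash of field => text, as the slides do.
  # Filing bare strings under :content is an assumption of this sketch.
  def add_document(doc)
    @docs << (doc.is_a?(Hash) ? doc : { content: doc })
    self
  end
  alias_method :<<, :add_document

  # Return [doc_id, score] pairs, best first. The score is simply how many
  # times the query term occurs across the document's fields.
  def search(query, options = {})
    limit = options.fetch(:limit, 10)
    term = query.downcase
    hits = @docs.each_with_index.map do |doc, doc_id|
      words = doc.values.join(" ").downcase.scan(/\w+/)
      [doc_id, words.count(term)]
    end
    hits.select { |_, score| score > 0 }
        .sort_by { |doc_id, score| [-score, doc_id] }
        .first(limit)
  end

  # Iterator form: yield each hit's document id and score to the block.
  def search_each(query, options = {})
    search(query, options).each { |doc_id, score| yield doc_id, score }
  end

  # Look up a stored document by its id.
  def [](doc_id)
    @docs[doc_id]
  end
end

index = ToyIndex.new
index << "This is a document"
index << { first: "Bob", last: "Smith" }
index << "A document about a document"

index.search_each("document") do |doc_id, score|
  puts "#{doc_id}: score #{score} #{index[doc_id].inspect}"
end
```

With the real library, the same four steps would go through Ferret::I.new and the methods listed above; real scoring is considerably more involved than a term count.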

Playing with Ferret in irb

Ferret Query Language

• Ferret’s own Query Language, FQL, is a powerful way to specify search queries

• FQL supports many query types, including:

• Term

• Phrase

• Field

• Boolean

• Range

• Wildcard

• Fuzzy
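
For orientation, here is one example query string per type, collected in a small Ruby hash. The field names (`title`, `content`, `date`) are hypothetical, and the syntax is written as best recalled from Ferret's query parser documentation, so treat it as an assumption and check it against your installed version.

```ruby
# One illustrative FQL string per query type. Field names are hypothetical
# and the syntax should be verified against your Ferret version.
def fql_examples
  {
    term:     'ruby',                      # a single word
    phrase:   '"search engine"',           # words in sequence
    field:    'title:ferret',              # restrict a term to one field
    boolean:  'ruby AND NOT java',         # combine clauses
    range:    'date:[20060101 20061231]',  # inclusive range
    wildcard: 'content:ferr*',             # pattern match
    fuzzy:    'content:feret~'             # approximate match
  }
end

fql_examples.each do |type, query|
  puts format("%-9s %s", type, query)
end
```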

Index.explain

• The explain method of Index describes how a document scores against a query

• Very useful for debugging

• and for learning how Ferret works

Ferret in your App

[Diagram: the application gathers data from the file system, a database, the web, and manual input, and indexes the documents with Ferret. It then gets the user’s query, searches the index, and presents the search results to the user.]

Ferret in Rails

• Acts As Ferret is an ActiveRecord extension

• Available as a plugin

• Provides a simplified interface to Ferret

• Maintained by Jens Krämer

• Adding an index to an ActiveRecord model is as simple as calling acts_as_ferret in the model class

• The simple model has two searchable fields, title and body

• After a quick rake db:migrate we now have some data to play with

• Fire up the Rails Console and let’s see what acts_as_ferret can do for our models

Want more?

• Ferret is improving constantly

• Acts As Ferret seems to catch up quickly

• Real-life usage seems to require some good engineering on your part

• Background indexing

• Hot swap of indexes?

• We only covered the simplest constructs in Ferret

• Ferret’s API provides enough flexibility for the most demanding searching needs

Online Resources

• http://ferret.davebalmain.com

• http://lucene.apache.org

• http://lucenebook.com

• http://projects.jkraemer.net/acts_as_ferret

Thanks!