Using Thinking Sphinx with Rails
DESCRIPTION
Thinking Sphinx presented at Ruby Fun Day (http://www.rubyonrails.in/events/3)
TRANSCRIPT
![Page 1: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/1.jpg)
Sphinx is a free, open-source SQL full-text search engine.
The name is an acronym for SQL Phrase Index.
It was developed by Andrew Aksyonoff.
![Page 2: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/2.jpg)
Database search
▪ Uses SQL directly: LIKE "%text%".
▪ Impractical for large text fields.
▪ No relevance ranking.
Full-text search
▪ Searches all words in every document against the query.
▪ Moves the processing load out of the DB.
▪ Relevance ranking.
▪ Other advanced features.
![Page 3: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/3.jpg)
Two-step process
Indexing
▪ Scan the text and build a list of search terms.
Searching
▪ Look up the index to get references to the data.
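The two steps can be illustrated with a toy inverted index in plain Ruby. This is only a teaching sketch: the method names `build_index` and `search` and the sample documents are made up for illustration, and Sphinx's real index format is far more sophisticated.

```ruby
# Toy inverted index: maps each term to the ids of the documents containing it.
# This is a teaching sketch, not how Sphinx stores its indexes.

# Step 1, indexing: scan the text and build a list of search terms.
def build_index(documents)
  index = Hash.new { |hash, term| hash[term] = [] }
  documents.each do |id, text|
    text.downcase.scan(/\w+/).uniq.each { |term| index[term] << id }
  end
  index
end

# Step 2, searching: look the term up in the index to get document ids.
def search(index, term)
  index.fetch(term.downcase, [])
end

# Hypothetical sample documents, keyed by document id.
documents = {
  1 => "Sphinx is a full-text search engine",
  2 => "Rails is a web framework",
  3 => "Thinking Sphinx connects Rails to Sphinx"
}

index = build_index(documents)
p search(index, "sphinx") # ids of the documents mentioning "sphinx"
p search(index, "rails")  # ids of the documents mentioning "rails"
```

The search step never touches the original text: it only reads the term list built during indexing, which is why indexing has to be rerun when the data changes.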
![Page 4: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/4.jpg)
High indexing speed
▪ Up to 10 MB/sec on modern CPUs.
High search speed
▪ The average query takes under 0.1 sec on 2-4 GB text collections.
High scalability
▪ Up to 100 GB of text, up to 100M documents on a single CPU.
Supports distributed searching
▪ Can be extended to multiple servers.
![Page 5: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/5.jpg)
Supports phrase proximity ranking, providing good relevance.
Supports stopwords: excludes common words such as a, an, the, with, in.
Supports different search modes: "match all", "match phrase" and "match any".
Supports relevance modification on the fly.
Key Sphinx features are its speed and phrase proximity ranking.
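To make the stopword and search-mode ideas concrete, here is a rough plain-Ruby sketch. The `tokenize` and `matches?` helpers are hypothetical names invented for this example; they mimic the behaviour described above and are not part of Sphinx or any plugin.

```ruby
# Stopword list taken from the examples on the slide.
STOPWORDS = %w[a an the with in].freeze

# Split text into lowercase words and drop stopwords.
def tokenize(text)
  text.downcase.scan(/\w+/) - STOPWORDS
end

# Toy versions of the three search modes named above.
def matches?(text, query, mode)
  words = tokenize(text)
  terms = tokenize(query)
  return false if terms.empty? # a query made only of stopwords matches nothing

  case mode
  when :all    then terms.all? { |term| words.include?(term) }
  when :any    then terms.any? { |term| words.include?(term) }
  when :phrase then words.each_cons(terms.size).any? { |run| run == terms }
  else raise ArgumentError, "unknown mode: #{mode}"
  end
end

text = "The quick search engine in Rails"

p matches?(text, "engine search", :all)    # both words present, any order
p matches?(text, "engine search", :phrase) # words must be adjacent and in order
p matches?(text, "search python", :any)    # one matching word is enough
```

Because stopwords are removed from both the text and the query, a phrase query such as "the search engine" is compared as just "search engine".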
![Page 6: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/6.jpg)
boardreader.com
▪ Indexes over 2 billion documents; the BoardReader forum search engine is the biggest Sphinx installation at present.
mininova.org
▪ Mininova, a popular BitTorrent search engine, serves 3-5 million searches daily.
thepiratebay.org
▪ The Pirate Bay and (forthcoming) SuprNova moved to Sphinx recently.
netlog.com
▪ NetLog, a large social network site with over 35 million registered users, uses Sphinx for pretty much every kind of search imaginable: people, photo, blog, event, music, and video searches. 12 million daily queries against 100+ GB indexes are handled by just 2 quad-core search boxes.
![Page 7: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/7.jpg)
Sphinx can be downloaded from http://www.sphinxsearch.com/
Its distribution contains the following programs:
indexer
▪ Utility to create full-text indices.
searchd
▪ Daemon to search through full-text indices.
search
▪ Test utility to query full-text indices from the command line.
sphinxapi
▪ Set of API libraries for Ruby, Python, Perl, Java.
![Page 8: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/8.jpg)
Configuration settings for indexer and searchd.
Indexes, fields, attributes
▪ Each index has a document id, some fields, and some attributes.
▪ The id has to be unique; generally it’s the primary key.
▪ The fields contain the text that is to be searched.
▪ The attributes contain the data used for sorting, filtering and grouping.
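The split between fields and attributes can be pictured with a small in-memory model. The `IndexedDocument` struct and the sample records below are hypothetical and exist only for illustration; they are not a Sphinx data structure.

```ruby
# Hypothetical in-memory model of an indexed document (not a Sphinx API).
# id:         unique document id, typically the primary key
# fields:     text that is searched
# attributes: values used for sorting, filtering and grouping
IndexedDocument = Struct.new(:id, :fields, :attributes)

docs = [
  IndexedDocument.new(1, { name: "Alice Smith" }, { age: 30 }),
  IndexedDocument.new(2, { name: "Bob Smith" },   { age: 12 }),
  IndexedDocument.new(3, { name: "Carol Jones" }, { age: 45 })
]

# Fields are searched as text.
smiths = docs.select { |doc| doc.fields[:name].downcase.include?("smith") }

# Attributes filter and sort the results.
adult_smiths = smiths.select { |doc| doc.attributes[:age] >= 18 }
oldest_first = docs.sort_by { |doc| -doc.attributes[:age] }

p adult_smiths.map(&:id)
p oldest_first.map(&:id)
```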
![Page 9: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/9.jpg)
thinking_sphinx
▪ Pat Allan (also developed Riddle, the underlying API for Sphinx)
▪ git://github.com/freelancing-god/thinking-sphinx.git
ultrasphinx
▪ Evan Weaver
▪ svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk
![Page 10: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/10.jpg)
Can be installed simply with: ruby script/plugin install <path_to_plugin>
No need to write the Sphinx configuration file; the plugins take care of this.
![Page 11: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/11.jpg)
Field aliasing
▪ indexes full_name, :as => :name
Field merging
▪ indexes [first_name, last_name], :as => :name
Field weighting
▪ set_property :field_weights => {:last_name => 2, :first_name => 1}
▪ User.search "aaa", :field_weights => {:first_name => 1, :last_name => 2}
Indexing a computed value
▪ indexes "age > 15", :as => :minor
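The effect of field weights can be approximated with a very simplified scoring function: a match in a heavier field contributes more to the score. The `weighted_score` helper and the sample users are assumptions made for this sketch; Sphinx's actual ranking also takes phrase proximity and other factors into account.

```ruby
# Hypothetical scoring: each field that contains the term adds its weight.
# Real Sphinx ranking is more involved; this only shows the idea of weights.
def weighted_score(record, term, field_weights)
  field_weights.sum do |field, weight|
    record.fetch(field, "").downcase.include?(term.downcase) ? weight : 0
  end
end

# Same weights as in the slide: last_name counts twice as much as first_name.
FIELD_WEIGHTS = { first_name: 1, last_name: 2 }.freeze

# Made-up sample records.
users = [
  { first_name: "Lee",  last_name: "Adams" },
  { first_name: "Anna", last_name: "Lee" },
  { first_name: "Lee",  last_name: "Lee" }
]

ranked = users.sort_by { |user| -weighted_score(user, "lee", FIELD_WEIGHTS) }
ranked.each { |user| puts "#{user[:first_name]} #{user[:last_name]}" }
```

With these weights a match on the last name outranks a match on the first name, and a record matching both fields comes first.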
![Page 12: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/12.jpg)
Sorting (using attributes and fields)
▪ :sortable => true (option on a field)
▪ has created_at
▪ User.search("user", :order => :first_name, :sort_mode => :desc)
▪ User.search("user", :order => "created_at DESC")
Filtering (using attributes and fields)
▪ User.search :conditions => {:name => "aaa"}
▪ User.search :with => {:age => 10}
▪ User.search :without => {:age => 10}
Adding custom SQL conditions to the index
▪ where "first_name = 'aaa'"
![Page 13: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/13.jpg)
Drop-in compatibility with will_paginate
▪ User.search "aaa", :page => (params[:page] || 1)
Geodistance
▪ has :latit
▪ has :longit
▪ set_property :latitude_attr => :latit, :longitude_attr => :longit
▪ Address.search "pizza hut", :geo => [1.234, 4.567], :order => "@geodist asc"
Delta index support
▪ set_property :delta => true
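As a rough picture of what ordering by @geodist means, the sketch below sorts a few places by great-circle distance from an origin using the haversine formula. It rests on several assumptions: that coordinates are handled in radians and distances in meters (which is how I understand Sphinx's geodistance support to work), that a mean Earth radius is accurate enough, and that the city coordinates are approximate. The helper names are hypothetical.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius in meters (assumed constant)

def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in meters between two [lat, lng] pairs given in radians.
def haversine_m(from, to)
  lat1, lng1 = from
  lat2, lng2 = to
  a = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lng2 - lng1) / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Hypothetical addresses, with approximate coordinates given in degrees.
addresses = {
  "Mumbai" => [19.0760, 72.8777],
  "Pune"   => [18.5204, 73.8567],
  "Delhi"  => [28.7041, 77.1025]
}

origin = addresses["Pune"].map { |deg| to_radians(deg) }

# Equivalent in spirit to :order => "@geodist asc".
nearest_first = addresses.sort_by do |_name, coords|
  haversine_m(origin, coords.map { |deg| to_radians(deg) })
end

p nearest_first.map(&:first)
```

If the stored coordinates are in degrees, they need the same conversion to radians before being passed as the :geo option.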
![Page 14: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/14.jpg)
Searching across multiple models
▪ indexes posts.name
▪ indexes posts.comments.name
Comprehensive rake tasks
▪ rake ts:conf
▪ rake ts:in
▪ rake ts:start (also ts:restart, ts:stop)
Multiple deployment environments
▪ rake ts:config RAILS_ENV=production
![Page 15: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/15.jpg)
One-to-one
▪ user has_one blog
▪ indexes blog.name
One-to-many
▪ blog has_many posts
▪ indexes posts.name
Many-to-many (through)
▪ posts has_many comments through records
▪ comments has_many posts through records
▪ indexes comments.name
![Page 16: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/16.jpg)
Deeply nested
▪ blog has_many posts
▪ posts has_many comments
▪ indexes posts.comments.name
STI
▪ User.search("user", :with => {:class_crc => Teacher.to_crc32})
Polymorphic
▪ user has_one phone
▪ company has_one phone
▪ indexes phone.name
▪ where "callable_type = 'User'"
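The class_crc filter in the STI example works because Sphinx attributes are numeric, so the class name has to be stored as an integer. The sketch below assumes that this integer is simply the CRC32 checksum of the class name; the `class_crc` helper and the `Student` class are made up for illustration, and the plugin's own to_crc32 may differ in detail.

```ruby
require "zlib"

# Assumption: the class_crc attribute is the CRC32 checksum of the class name.
class User; end
class Teacher < User; end
class Student < User; end

# Hypothetical helper: turn a class into a numeric value usable as an attribute.
def class_crc(klass)
  Zlib.crc32(klass.name)
end

[User, Teacher, Student].each do |klass|
  puts "#{klass.name}: #{class_crc(klass)}"
end
```

Filtering on that number restricts a search over the shared users table to rows of one subclass.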
![Page 17: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/17.jpg)
You can run the index task while Sphinx is running, and it’ll reload the indexes automatically.
As of version 0.9.9, your configuration will automatically be reloaded.
Keep in mind that if your column names clash with Ruby method names, such as id or name, you need to use the symbol version (for example, indexes :name).
Sphinx connects to the DB directly, so don’t expect model methods to be indexable.
![Page 18: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/18.jpg)
You can extract the commands for indexing and starting the search daemon into scripts for fast access.
▪ indexer --config config/development.sphinx.conf --all
▪ searchd --config config/development.sphinx.conf
You can ignore this warning:
▪ distributed index 'model_name' can not be directly indexed; skipping.
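One possible way to wrap those two commands in a script is sketched below. The `sphinx_commands` helper is a hypothetical name, and it assumes the generated configuration file lives at config/<environment>.sphinx.conf, following the pattern of the commands above.

```ruby
# Build the indexer and searchd command lines for a given Rails environment.
# Assumes the config file lives at config/<environment>.sphinx.conf, as in the
# commands above.
def sphinx_commands(environment = "development")
  config = File.join("config", "#{environment}.sphinx.conf")
  {
    index: ["indexer", "--config", config, "--all"],
    start: ["searchd", "--config", config]
  }
end

commands = sphinx_commands(ENV.fetch("RAILS_ENV", "development"))
commands.each { |name, argv| puts "#{name}: #{argv.join(' ')}" }

# To actually run one of them (requires Sphinx on the PATH):
#   system(*commands[:index])
```

Keeping each command as an argument array means it can be passed straight to `system` without shell quoting.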
![Page 19: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/19.jpg)
ultrasphinx has almost all of the thinking_sphinx features, plus some additional ones:
▪ excerpt highlighting
▪ spellcheck
▪ faceting on text, date, and numeric fields*
*Will be demonstrated in the next presentation.
![Page 20: Using Thinking Sphinx with rails](https://reader033.vdocuments.site/reader033/viewer/2022061218/54b6bdbf4a79593e4f8b4784/html5/thumbnails/20.jpg)
sphinx
▪ http://www.sphinxsearch.com/
ultrasphinx
▪ http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/files/README.html
thinking_sphinx
▪ http://ts.freelancing-gods.com/
▪ http://groups.google.com/group/thinking-sphinx/