Sphinx - High performance full-text search for MySQL
Nguyen Van Vuong - Framgia
Agenda
❖ Full-text search
❖ What’s Sphinx?
❖ Why Sphinx?
❖ Sphinx workflow
  ➢ Indexing
  ➢ Searching
  ➢ Query syntax
❖ How does it scale?
❖ More about Sphinx
❖ References
Full-text search
❖ Full-text search is a technique for searching stored documents or database records
  ➢ Examines all of the words in every stored document
  ➢ Tries to match them against the search query
❖ Example: searching an Articles table

    Articles
    id (integer) | title (varchar) | content (text) | tag (varchar)

    SELECT * FROM articles
    WHERE MATCH (title, content) AGAINST ('database' IN NATURAL LANGUAGE MODE);
Full-text search - Term search vs Full-text search
❖ Search keywords: "I ate pizza yesterday"
❖ Term search
  ➢ No analysis phase
  ➢ Operates on a single term
❖ Full-text search
  ➢ Tokenizer/analyzer
    ■ Breaks keywords down by whitespace and punctuation
    ■ Charset table
  ➢ Morphology preprocessors
    ■ Normalize both "dogs" and "dog" to "dog"
      ● eat, eating, eaten, ate
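
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, not how Sphinx is implemented: the `LEMMAS` table is a hand-written, hypothetical stand-in for a real stemmer or morphology dictionary, and the function names are invented for this example.

```python
import re

# Hypothetical lemma table for illustration only; a real engine uses a
# stemmer or a morphology dictionary instead of a hand-written mapping.
LEMMAS = {
    "dogs": "dog",
    "eating": "eat",
    "eaten": "eat",
    "ate": "eat",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace and punctuation."""
    return [tok for tok in re.split(r"[^\w]+", text.lower()) if tok]


def analyze(text):
    """Tokenize, then normalize each token to its base form when known."""
    return [LEMMAS.get(tok, tok) for tok in tokenize(text)]


def term_match(query, text):
    """Term search: no analysis, the raw query must equal one raw term."""
    return query in text.split()


def fulltext_match(query, text):
    """Full-text search: compare analyzed query terms with analyzed text."""
    doc_terms = set(analyze(text))
    return all(term in doc_terms for term in analyze(query))


document = "Dogs eat pizza."
print(analyze("I ate pizza yesterday"))
print(term_match("dog", document))
print(fulltext_match("dog eating", document))
```

Term search misses the document because "dog" is not literally one of its terms, while the analyzed query and the analyzed document share the base forms "dog" and "eat".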
What’s Sphinx?
❖ The Sphinx is a mythical creature with the head of a human and the body of a lion
❖ Full-text search engine
❖ Free and open source (GPL v2)
❖ Development began 10 years ago
❖ High performance
❖ Integrates well with SQL databases
❖ APIs exist for Perl, C#, Ruby, Java, PHP
❖ Available for Linux, Windows, Mac OS
Why Sphinx?
❖ Quick to learn
❖ Easy to use
❖ Simple to maintain
❖ Speed
  ➢ 50x-100x faster than MySQL full-text search
  ➢ Up to 1000x faster than MySQL in extreme cases (e.g. large result sets with GROUP BY)
❖ Feature-rich
  ➢ Relevancy ranking (BM25)
  ➢ Synonyms
  ➢ Stopwords
  ➢ Real-time indexes
  ➢ ...
❖ Scalable
  ➢ Aggregates search results from many sources
  ➢ Fully transparent to the calling application
  ➢ Built-in load balancing
❖ Easy to integrate
  ➢ SphinxAPI
  ➢ SphinxQL
Sphinx workflow
Components: Application, Database, Sphinx Daemon, Sphinx Indexer, Sphinx Index

1. Search query (Application to Sphinx Daemon)
2. Search results (IDs) (Sphinx Daemon to Application)
3. Fetch doc by ID (Application to Database)
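
The three numbered steps can be sketched as follows. Everything here is a stand-in chosen for illustration: an in-memory SQLite table plays the database, a plain dictionary plays the Sphinx index, and the function names are hypothetical rather than part of any Sphinx API.

```python
import sqlite3

# Stand-in for the database that holds the full documents.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
db.executemany(
    "INSERT INTO articles (id, title, content) VALUES (?, ?, ?)",
    [
        (1, "Intro to databases", "A database stores structured data."),
        (2, "Cooking pizza", "Bake the pizza in a hot oven."),
        (3, "Database indexing", "An index speeds up database lookups."),
    ],
)


def build_index(conn):
    """Indexer: read rows from the database and map each word to document IDs."""
    index = {}
    for doc_id, title, content in conn.execute("SELECT id, title, content FROM articles"):
        for word in f"{title} {content}".lower().replace(".", " ").split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, keyword):
    """Steps 1 and 2: answer a search query with matching document IDs only."""
    return sorted(index.get(keyword.lower(), set()))


def fetch_documents(conn, doc_ids):
    """Step 3: the application fetches the full rows from the database by ID."""
    if not doc_ids:
        return []
    placeholders = ", ".join("?" for _ in doc_ids)
    query = f"SELECT id, title FROM articles WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(query, doc_ids).fetchall()


index = build_index(db)
ids = search(index, "database")
print(ids)
print(fetch_documents(db, ids))
```

The point of the split is that the search side only ever returns IDs; the documents themselves stay in the database and are loaded in a second, cheap lookup by primary key.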
Sphinx workflow - Indexing
❖ Configuration
  ➢ sphinx.conf
❖ Data sources
❖ Character level
  ➢ charset_table
    ■ Use ranges: a..z, U+410..U+42F
  ➢ ngram_chars
    ■ Index ideographs as separate tokens
      ● Chinese, Japanese, …
      ● CJKV Unicode characters
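
A rough Python sketch of what these two character-level settings achieve. It assumes a folding table equivalent to `A..Z->a..z, a..z, 0..9` and treats only the main CJK Unified Ideographs block as n-gram characters; both choices, and the function names, are assumptions of this example and not Sphinx defaults.

```python
# Assumption: only the main CJK Unified Ideographs block is n-gram indexed.
NGRAM_RANGE = (0x4E00, 0x9FFF)


def fold(ch):
    """Return the folded character if it is indexable, else None (a separator)."""
    if "A" <= ch <= "Z":
        return ch.lower()
    if "a" <= ch <= "z" or "0" <= ch <= "9":
        return ch
    return None


def tokenize(text):
    """Split text into words, emitting every n-gram character as its own token."""
    tokens, word = [], []
    for ch in text:
        if NGRAM_RANGE[0] <= ord(ch) <= NGRAM_RANGE[1]:
            # An n-gram character ends the current word and is a token itself.
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(ch)
            continue
        folded = fold(ch)
        if folded is not None:
            word.append(folded)
        elif word:
            # Any character outside the table acts as a separator.
            tokens.append("".join(word))
            word = []
    if word:
        tokens.append("".join(word))
    return tokens


print(tokenize("Sphinx 検索 is FAST!"))
```

Languages written without spaces between words give a whitespace tokenizer nothing to split on, which is why each ideograph is indexed as a separate token.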
❖ Word level
  ➢ Stopwords
    ■ Avoid wasting index space
    ■ Example
      ● Words you don’t want to search for (like “I”, “Am”, “An”, etc.)
  ➢ Stemming
    ■ A single word can appear in many forms when used in different contexts
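
A minimal sketch of both word-level steps. The stopword list is illustrative and the suffix stripping is deliberately crude; real stemmers such as Porter's handle far more cases, and none of these names come from Sphinx itself.

```python
STOPWORDS = {"i", "am", "an", "a", "the"}  # illustrative list only


def stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Return the terms that would be stored in the index for this text."""
    words = text.lower().split()
    return [stem(w) for w in words if w not in STOPWORDS]


print(index_terms("I am searching the indexed documents"))
```

Three of the six input words never reach the index, and the remaining three are stored in a form that also matches "search", "index" and "document".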
❖ Building index
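
    # Start the search daemon
    $ sudo service sphinxsearch start
    # Build every index defined in the config file
    $ sudo indexer --config <file> --all
    # Reindex under a running searchd; --rotate makes it switch to the new files
    $ sudo indexer --config <file> --rotate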
Sphinx workflow - Searching
❖ Configuring search daemon

    searchd
    {
        listen                       = localhost:9312   # SphinxAPI
        listen                       = 9306:mysql41     # SphinxQL (MySQL protocol)
        log                          = /var/log/sphinxsearch/searchd.log
        query_log                    = /var/log/sphinxsearch/query.log
        read_timeout                 = 5
        client_timeout               = 300
        max_children                 = 30
        persistent_connections_limit = 30
        pid_file                     = /var/run/sphinxsearch/searchd.pid
        ...
    }
❖ SphinxAPI
  ➢ Perl, C#, Ruby, Java, PHP
  ➢ Example in PHP
❖ SphinxQL
  ➢ Connect via the MySQL client

    $ mysql -h<ip> -P<port_of_sphinx>

  ➢ Query like MySQL

    SELECT * FROM myindex
    WHERE MATCH('@(title,content) find me fast');
Sphinx workflow - Query syntax
❖ Boolean search (AND, OR, NOT)
    hello & world
    hello | world
    hello -world
❖ Per-field search
    @title hello @body world
❖ Field combination
    @(title, body) hello world
❖ Search within first N words
    @body[50] hello
❖ Phrase search
    "hello world"
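
What the Boolean and phrase operators mean can be shown with plain set operations over a toy index. This is a sketch of the semantics only, with made-up documents and a hypothetical `phrase_match` helper; it is not Sphinx's query parser.

```python
documents = {
    1: "hello there",
    2: "hello world",
    3: "world says hello",
    4: "brave new world",
}

# Toy postings: term -> set of IDs of the documents containing it.
postings = {}
for doc_id, text in documents.items():
    for term in text.split():
        postings.setdefault(term, set()).add(doc_id)

match_or = postings["hello"] | postings["world"]    # hello | world
match_and = postings["hello"] & postings["world"]   # hello & world
match_not = postings["hello"] - postings["world"]   # hello -world


def phrase_match(doc_ids, phrase):
    """Phrase search: the words must appear adjacent and in the given order."""
    words = phrase.split()
    hits = set()
    for doc_id in doc_ids:
        tokens = documents[doc_id].split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i : i + len(words)] == words:
                hits.add(doc_id)
                break
    return hits


print(sorted(match_or), sorted(match_and), sorted(match_not))
print(sorted(phrase_match(match_and, "hello world")))
```

Document 3 contains both words and so satisfies `hello & world`, but it fails the phrase query because the words are neither adjacent nor in order.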
❖ Per-field relevancy ranking weights
❖ Matching modes (SphinxAPI)
    SPH_MATCH_ALL
    SPH_MATCH_ANY
    SPH_MATCH_FULLSCAN
❖ Proximity search
    "people passion"~3
❖ GEO distance search (with syntax for mi/km/m)
    GEODIST(0.659298124, -2.136602399, latitude, longitude)
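
The coordinates in the GEODIST example are in radians, which is what the function expects by default, and the result is in meters. A haversine-based sketch of that kind of calculation follows; the formula and the Earth radius constant are assumptions of this example, and Sphinx's own implementation may differ in method and precision.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters; assumed for this sketch


def geodist(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


anchor = (0.659298124, -2.136602399)  # the point used in the example above

# Distance from the anchor to itself, then to a point 0.001 rad further north.
print(geodist(*anchor, *anchor))
print(round(geodist(*anchor, anchor[0] + 0.001, anchor[1])))
```

Coordinates stored in degrees would need `math.radians()` applied first; mixing the two units is a common source of wildly wrong distances.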
How does it scale?
❖ Distribution is done horizontally
  ➢ Search is performed across different nodes
❖ Set up an index on multiple servers
❖ Adding distributed index configuration
  ➢ First server (192.168.1.1)

    index master
    {
        type  = distributed
        # Local index to be searched
        local = items
        # Remote agent (index) to be searched
        agent = 192.168.1.2:9312:items-2
    }
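
Conceptually, a query against `master` fans out to the local index and to every agent, and the partial results are merged into a single list. The Python sketch below illustrates that scatter-and-gather idea with two in-memory dictionaries standing in for `items` and `items-2`; the data, weights and function names are hypothetical, and real Sphinx performs this merge inside searchd.

```python
import heapq


def search_shard(shard, keyword):
    """Return (weight, doc_id) matches from one shard, best match first."""
    hits = [(weight, doc_id) for doc_id, (text, weight) in shard.items() if keyword in text]
    return sorted(hits, reverse=True)


def search_distributed(shards, keyword, limit=3):
    """Query every shard, then merge the sorted partial results into one list."""
    partial_results = [search_shard(shard, keyword) for shard in shards]
    merged = heapq.merge(*partial_results, reverse=True)
    return [doc_id for _weight, doc_id in list(merged)[:limit]]


# Hypothetical data: doc_id -> (text, precomputed relevance weight).
local_items = {1: ("red bike", 0.9), 2: ("blue car", 0.4), 3: ("old bike", 0.2)}
remote_items_2 = {10: ("bike helmet", 0.7), 11: ("bike lock", 0.5)}

print(search_distributed([local_items, remote_items_2], "bike"))
```

The caller sees one ranked list and does not need to know how many servers contributed to it.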
More about Sphinx
❖ Biggest known Sphinx cluster
  ➢ Indexes 25+ billion documents
  ➢ Over 9TB of data
  ➢ 1+ million searches/day
❖ Busiest known Sphinx cluster
  ➢ 300+ million search queries/day
❖ Books
References
❖ Sphinx documentation (v2.2.1)
❖ Sphinx Search Beginner's Guide - Abbas Ali
❖ Meet the Sphinx - Andrew Aksyonoff
❖ Advanced fulltext search with Sphinx - Adrian Nuta
❖ Search Big Data with MySQL and Sphinx - Mindaugas Zukas
Thank you
Time for action
https://github.com/euclid1990/php-sphinx-search