PostgreSQL Full-text Search in Django

PostgreSQL Full-text Search in Django Paweł Kowalski

Upload: stx-next

Post on 18-Jan-2017

Category:

Engineering


TRANSCRIPT

Page 1: PostgreSQL Full-text Search in Django

PostgreSQL Full-text Search in Django

Paweł Kowalski

Page 2: PostgreSQL Full-text Search in Django

● What is full-text search

● How it works in PostgreSQL

○ search

○ ranking

● How to use it in Django

● Questions

Agenda

Page 3: PostgreSQL Full-text Search in Django

Full text search refers to techniques for searching a

single computer-stored document or a collection in a

full text database.

https://en.wikipedia.org/wiki/Full_text_search

WHAT IS FULL TEXT SEARCH

Page 5: PostgreSQL Full-text Search in Django

SELECT *

FROM table

WHERE Col1 LIKE '%query%';

WHAT IS FULL TEXT SEARCH

SLOW, EXPENSIVE, NO ORDERING BY RELEVANCE

● LIKE '%query%' can't use a regular B-tree index because of the leading wildcard

● Col1 can be very long (e.g. an entire book)

Page 6: PostgreSQL Full-text Search in Django

SELECT to_tsvector(

'english',

'Try not to become a man of success, but rather try to become a man of value'

);

to_tsvector

----------------------------------------------------------------------

'becom':4,13 'man':6,15 'rather':10 'success':8 'tri':1,11 'valu':17

(1 row)

HOW IT WORKS IN POSTGRESQL

PostgreSQL, please help!

TSVECTOR

Since PostgreSQL 8.3
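
The tsvector output above pairs each normalized word with the positions where it occurs. A minimal Python sketch of that idea follows. It is only an illustration, not PostgreSQL's parser: `toy_tsvector` and `STOP_WORDS` are hypothetical names, the stop-word list is a tiny assumed sample, and no stemming is done (so it yields 'become' where PostgreSQL yields 'becom').

```python
import re

# Assumption: a tiny illustrative stop-word list, not PostgreSQL's English dictionary.
STOP_WORDS = {"a", "an", "but", "not", "of", "the", "to"}


def toy_tsvector(text):
    """Map each kept word to the 1-based positions where it occurs.

    Stop words are left out of the result but still consume a position,
    which is why the positions on the slide have gaps. No stemming is done.
    """
    index = {}
    words = re.findall(r"[a-z]+", text.lower())
    for position, word in enumerate(words, start=1):
        if word not in STOP_WORDS:
            index.setdefault(word, []).append(position)
    return index


quote = "Try not to become a man of success, but rather try to become a man of value"
vector = toy_tsvector(quote)
print(vector["try"], vector["become"], vector["value"])  # [1, 11] [4, 13] [17]
```

The positions agree with the slide: 'tri':1,11, 'becom':4,13 and 'valu':17 are the stemmed forms of the same words.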

Page 7: PostgreSQL Full-text Search in Django

select to_tsvector('If you can dream it, you can do it') @@ 'dream';

?column?

----------

t

(1 row)

select to_tsvector('It''s kind of fun to do the impossible') @@ 'impossible';

?column?

----------

f

(1 row)

HOW IT WORKS IN POSTGRESQL

Search Operator: @@

Page 8: PostgreSQL Full-text Search in Django

SELECT 'dream'::tsquery, to_tsquery('dream');

tsquery | to_tsquery

--------------+------------

'dream' | 'dream'

(1 row)

SELECT 'impossible'::tsquery, to_tsquery('impossible');

tsquery | to_tsquery

--------------+------------

'impossible' | 'imposs'

(1 row)

HOW IT WORKS IN POSTGRESQL

TO_TSQUERY function

Page 9: PostgreSQL Full-text Search in Django

SELECT to_tsvector('It''s kind of fun to do the impossible') @@ to_tsquery('impossible');

?column?

----------

t

(1 row)

HOW IT WORKS IN POSTGRESQL

TO_TSQUERY function

Page 10: PostgreSQL Full-text Search in Django

SELECT to_tsvector('If the facts don''t fit the theory, change the facts') @@ to_tsquery('! fact');

SELECT to_tsvector('If the facts don''t fit the theory, change the facts') @@ to_tsquery('theory & !fact');

SELECT to_tsvector('If the facts don''t fit the theory, change the facts.') @@ to_tsquery('fiction | theory');

HOW IT WORKS IN POSTGRESQL

Query Operators: ! & |
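
When the search terms come from user input, the operator string passed to to_tsquery has to be assembled somewhere. The sketch below shows one way to do that in Python. `build_tsquery` and its parameters are hypothetical helpers, not part of PostgreSQL or Django, and rejecting anything that is not a single word is a deliberately conservative assumption.

```python
import re

# Accept single words only, so user input cannot inject tsquery operators.
_TERM = re.compile(r"^\w+$")


def build_tsquery(all_of=(), any_of=(), none_of=()):
    """Return a to_tsquery()-style string using the !, & and | operators."""
    for term in (*all_of, *any_of, *none_of):
        if not _TERM.match(term):
            raise ValueError(f"unsupported search term: {term!r}")
    parts = list(all_of)
    if any_of:
        # Alternatives are grouped so that | binds before the surrounding &.
        parts.append("(" + " | ".join(any_of) + ")")
    parts.extend("!" + term for term in none_of)
    if not parts:
        raise ValueError("at least one term is required")
    return " & ".join(parts)


print(build_tsquery(all_of=["theory"], none_of=["fact"]))  # theory & !fact
print(build_tsquery(any_of=["fiction", "theory"]))         # (fiction | theory)
```

The resulting string is meant to be sent as a bound query parameter to to_tsquery rather than pasted into the SQL text.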

Page 11: PostgreSQL Full-text Search in Django

SELECT COUNT(*) FROM ticketing_event WHERE name ILIKE '%madonna%rebel%heart%tour%';

Time: 78.083 ms

HOW IT WORKS IN POSTGRESQL

Some numbers

SELECT COUNT(*) FROM ticketing_event WHERE search_vector @@ 'madonna & rebel & heart & tour'::tsquery;

Time: 30.065 ms

SELECT COUNT(*) FROM ticketing_event;

count

-------

68889

Time: 11.440 ms

Page 12: PostgreSQL Full-text Search in Django

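-- Subquery: a SELECT alias cannot be used in WHERE. DESC: most relevant rows first.
SELECT id, vector1
FROM (
    SELECT post.id,
           setweight(to_tsvector(post.title), 'A') ||
           setweight(to_tsvector(post.content), 'B') AS vector1
    FROM post
) AS weighted_post
WHERE vector1 @@ to_tsquery('Michael & Jackson')
ORDER BY ts_rank(vector1, to_tsquery('Michael & Jackson')) DESC;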

HOW IT WORKS IN POSTGRESQL

Ranking:

SETWEIGHT, TS_RANK functions

Page 13: PostgreSQL Full-text Search in Django

SELECT ts_rank(to_tsvector('This is an example of document'),

to_tsquery('example')) as relevancy;

relevancy

-----------

0.0607927

(1 row)

SELECT ts_rank(to_tsvector('This is an example of document'),

to_tsquery('example | unknown')) as relevancy;

relevancy

-----------

0.0303964

(1 row)

HOW IT WORKS IN POSTGRESQL

Ranking:

SETWEIGHT, TS_RANK functions
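
The two relevancy values above differ by exactly a factor of two. A toy model that reproduces both numbers is sketched below. It is not PostgreSQL's implementation: `toy_rank` is a hypothetical function, and it assumes that each lexeme occurs once, that it carries the documented default weight of 0.1 (class D), and that the score is averaged over the number of query terms. The stemmed lexemes 'exampl' and 'document' are likewise assumed.

```python
import math

# Assumption: PostgreSQL's documented default weight for unlabeled (class D) lexemes.
DEFAULT_WEIGHT = 0.1
# sum(1 / i**2) for i = 1..infinity, used here as a normalizing constant.
NORMALIZER = math.pi ** 2 / 6


def toy_rank(document_lexemes, query_lexemes):
    """Toy OR-style rank: score the matched terms, average over all query terms."""
    matched = sum(1 for lexeme in query_lexemes if lexeme in document_lexemes)
    return matched * (DEFAULT_WEIGHT / NORMALIZER) / len(query_lexemes)


# Assumed lexemes for 'This is an example of document' after stop-word removal.
document = {"exampl", "document"}

print(round(toy_rank(document, ["exampl"]), 7))             # 0.0607927
print(round(toy_rank(document, ["exampl", "unknown"]), 7))  # 0.0303964
```

Under these assumptions, adding an OR term that the document does not contain halves the score, because the single match is averaged over two query terms.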

Page 16: PostgreSQL Full-text Search in Django

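from django.contrib.postgres.search import SearchVector

# Build the vector from two columns at query time, then match against it
Post.objects.annotate(
    search=SearchVector('title') + SearchVector('content'),
).filter(search='Michael Jackson')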

HOW TO USE IT IN DJANGO

SearchVector expression and the __search lookup

Post.objects.filter(title__search='Michael Jackson')

Page 17: PostgreSQL Full-text Search in Django

HOW TO USE IT IN DJANGO

SearchVectorField model field (stored)

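from django.contrib.postgres.search import SearchVector, SearchVectorField
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=100)
    content = models.TextField()
    # null=True so a row can be inserted before its vector is computed
    search_vector = SearchVectorField(null=True)

# Query the stored vector directly
Post.objects.filter(search_vector='Michael Jackson')

# Recompute the stored vector for an existing post
vector = SearchVector('title', weight='A') + SearchVector('content', weight='B')
post.search_vector = vector
post.save()

Update the search_vector field in a post_save signal handler (use QuerySet.update() there, since calling save() inside the handler would fire post_save again)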

Page 18: PostgreSQL Full-text Search in Django

HOW TO USE IT IN DJANGO

django.contrib.postgres.search.SearchRank

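from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db import models

# Annotate each post with its rank against the stored vector
queryset = Post.objects.annotate(
    rank=SearchRank(
        models.F('search_vector'),
        SearchQuery('Michael Jackson')
    ),
)

# Keep only posts above the rank threshold, best match first
queryset.filter(rank__gt=0.5).order_by('-rank')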

Page 19: PostgreSQL Full-text Search in Django

Question Time