PostgreSQL: Advanced features in practice

JÁN SUCHAL

22.11.2011

@RUBYSLAVA

Why PostgreSQL?

The world’s most advanced open source database.

Features!

Transactional DDL

Cost-based query optimizer + Graphical explain

Partial indexes

Function indexes

K-nearest search

Views

Recursive Queries

Window Functions

Transactional DDL

class CreatePostsMigration < ActiveRecord::Migration
  def change
    create_table :posts do |t|
      t.string :name, null: false
      t.text :body, null: false
      t.references :author, null: false
      t.timestamps null: false
    end
    add_index :posts, :title, unique: true
  end
end

Where is the problem?

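Column title does not exist! Without transactional DDL the table is created, but the index is not. Oops! With transactional DDL the whole migration rolls back and leaves no half-applied schema. Transactional DDL FTW!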

Cost-based query optimizer

What is the best plan to execute a given query?

Cost = I/O + CPU operations needed

Sequential vs. random seek

Join order

Join type (nested loop, hash join, merge join)
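The idea of picking a plan by estimated cost can be sketched with a toy model. The constants below are PostgreSQL's default planner settings. The index scan formula is a deliberate simplification (it assumes every matching row costs one random page fetch) and is not the planner's real formula. The table sizes are made up for illustration.

```ruby
# Toy cost model. Constants are PostgreSQL's default planner settings.
SEQ_PAGE_COST        = 1.0     # reading one page sequentially
RANDOM_PAGE_COST     = 4.0     # reading one page at a random position
CPU_TUPLE_COST       = 0.01    # processing one row
CPU_INDEX_TUPLE_COST = 0.005   # processing one index entry
CPU_OPERATOR_COST    = 0.0025  # evaluating one operator (the WHERE check)

# Sequential scan: read every page, check every row.
def seq_scan_cost(pages:, rows:)
  pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + CPU_OPERATOR_COST)
end

# Simplified index scan (assumption): one random page fetch per matching row.
def index_scan_cost(rows:, selectivity:)
  matched = rows * selectivity
  matched * (RANDOM_PAGE_COST + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)
end

# Return the name of the plan with the lowest estimated cost.
def cheapest_plan(pages:, rows:, selectivity:)
  plans = {
    seq_scan: seq_scan_cost(pages: pages, rows: rows),
    index_scan: index_scan_cost(rows: rows, selectivity: selectivity)
  }
  plans.min_by { |_, cost| cost }.first
end

puts cheapest_plan(pages: 10_000, rows: 1_000_000, selectivity: 0.001)
puts cheapest_plan(pages: 10_000, rows: 1_000_000, selectivity: 0.5)
```

In this model the index wins when few rows match, and the sequential scan wins once a large share of the table has to be read anyway.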

Graphical EXPLAIN

pgAdmin (www.pgadmin.org)

Partial indexes

Conditional indexes

Problem: Async job/queue table, find failed jobs

Create index on failed_at column

99% of index is never used

Solution:

CREATE INDEX idx_dj_only_failed ON delayed_jobs (failed_at)
  WHERE failed_at IS NOT NULL;

smaller index

faster updates

Function Indexes

Problem: Suffix search

SELECT … WHERE code LIKE '%123'

"Solution": Add reverse_code column, populate, add triggers for updates, create index on reverse_code column, reverse queries WHERE reverse_code LIKE '321%'

PostgreSQL solution:

CREATE INDEX idx_reversed ON projects
  (reverse((code)::text) text_pattern_ops);

SELECT … WHERE reverse(code) LIKE reverse('%123')
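Why reversing helps can be shown outside the database. An ordered index can answer prefix searches with a range scan, and reversing both the stored value and the pattern turns a suffix search into a prefix search. The sketch below uses a sorted Ruby array as a stand-in for the B-tree index. The codes and the suffix_search helper are hypothetical, invented for illustration.

```ruby
# Hypothetical sample data standing in for the projects.code column.
codes = %w[A-123 B-456 C-9123 D-321 E-1234]

# "Index" on reverse(code): sorted pairs of [reversed code, original code].
reversed_index = codes.map { |code| [code.reverse, code] }.sort

# Suffix search becomes a prefix search on the reversed keys:
# binary-search the first key >= prefix, then read while the prefix matches.
def suffix_search(reversed_index, suffix)
  prefix = suffix.reverse
  start = reversed_index.bsearch_index { |key, _| key >= prefix }
  return [] if start.nil?

  reversed_index.drop(start)
                .take_while { |key, _| key.start_with?(prefix) }
                .map(&:last)
end

p suffix_search(reversed_index, "123")
```

Only the entries inside the matching key range are visited, which is the same reason the function index avoids a full table scan.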

K-nearest search

Problem: Fuzzy string matching 900K rows

Solution: Ngram/Trigram search

johno = {"  j", " jo", "hno", "joh", "no ", "ohn"}

CREATE INDEX idx_trgm_name ON subjects USING gist (name gist_trgm_ops);

SELECT name, name <-> 'Michl Brla' AS dist
FROM subjects ORDER BY dist ASC LIMIT 10; (312ms)

"Michal Barla"; 0.588235
"Michal Bula"; 0.647059
"Michal Broz"; 0.647059
"Pavel Michl"; 0.647059
"Michal Brna"; 0.647059
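The distances in the result can be reproduced by hand. The sketch below approximates how pg_trgm extracts trigrams: lowercase the text, split it into words, pad each word with two spaces in front and one behind, and collect every three-character window. The distance is then one minus the share of trigrams the two strings have in common. The helper names trigrams and trigram_distance are made up for this illustration, and the word-splitting rule is a simplification.

```ruby
require 'set'

# Collect the set of trigrams for a string, word by word.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} " # two leading spaces, one trailing space
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Distance = 1 - (shared trigrams / all distinct trigrams).
def trigram_distance(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  1.0 - (ta & tb).size.to_f / (ta | tb).size
end

p trigrams("johno").to_a.sort
puts trigram_distance("Michl Brla", "Michal Barla").round(6)
puts trigram_distance("Michl Brla", "Michal Bula").round(6)
```

For "Michl Brla" and "Michal Barla" the two sets share 7 of 17 distinct trigrams, which gives the 0.588235 shown in the result above.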

Views

Query constraints (WHERE, ORDER BY, LIMIT) are pushed down into views

CREATE VIEW edges AS
  SELECT subject_id AS source_id,
         connected_subject_id AS target_id
  FROM raw_connections
  UNION ALL
  SELECT connected_subject_id AS source_id,
         subject_id AS target_id
  FROM raw_connections;

SELECT * FROM edges WHERE source_id = 123;

SELECT * FROM edges WHERE source_id < 500 ORDER BY source_id LIMIT 10;

No materialization, 2x indexed select + 1x append/merge

Recursive Queries

Problem: Find paths between two nodes in graph

WITH RECURSIVE search_graph(source, target, distance, path) AS (
    SELECT source_id, target_id, 1,
           ARRAY[source_id, target_id]
    FROM edges WHERE source_id = 552506
  UNION ALL
    SELECT sg.source, e.target_id, sg.distance + 1,
           path || ARRAY[e.target_id]
    FROM search_graph sg
    JOIN edges e ON sg.target = e.source_id
    WHERE NOT e.target_id = ANY(path) AND distance < 4
)
SELECT * FROM search_graph WHERE target = 530556 LIMIT 100;

Recursive queries

Graph with ~1M edges (61ms)

source; target; distance; path

530556; 552506; 2; {530556,185423,552506}

JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Ing. Ján Počiatek

530556; 552506; 2; {530556,183291,552506}

JUDr. Robert Kaliňák -> FoRest s.r.o. -> Ing. Ján Počiatek

530556; 552506; 4; {530556,183291,552522,185423,552506}

JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Lena Sisková -> FoRest s.r.o. -> Ing. Ján Počiatek
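What the recursive query computes can be mirrored in plain Ruby: start with the edges leaving the source, then repeatedly join the rows found so far with the edge list, skipping nodes already on the path and stopping once a path has four edges. This is only an in-memory sketch of the same logic. The tiny graph and the search_graph method are hypothetical and have nothing to do with the real data set.

```ruby
# Rows have the same shape as the CTE: [source, target, distance, path].
def search_graph(edges, source, max_distance: 4)
  adjacency = edges.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

  # Non-recursive term: all edges leaving the source node.
  frontier = adjacency.fetch(source, []).map { |target| [source, target, 1, [source, target]] }
  rows = []

  until frontier.empty?
    rows.concat(frontier)
    # Recursive term: extend each path by one edge, never revisiting a node.
    frontier = frontier.flat_map do |src, target, distance, path|
      next [] unless distance < max_distance

      adjacency.fetch(target, [])
               .reject { |node| path.include?(node) }
               .map { |node| [src, node, distance + 1, path + [node]] }
    end
  end
  rows
end

# Hypothetical undirected connections, expanded in both directions
# the same way the edges view does.
raw_connections = [[1, 2], [2, 3], [1, 4], [4, 3]]
edges = raw_connections.flat_map { |a, b| [[a, b], [b, a]] }

paths_to_3 = search_graph(edges, 1).select { |_, target, _, _| target == 3 }.map(&:last)
p paths_to_3
```

The path array doubles as the visited set, which is what keeps the recursion from looping in a cyclic graph.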

Window functions

"Aggregate functions without grouping"

avg, count, sum, rank, row_number, ntile…

Problem: Find closest nodes to a given node

Order by sum of path scores

Path score = 0.9^<distance> / log(1 + <number of paths>)

SELECT source, target FROM (
  SELECT source, target, path, distance,
         0.9 ^ distance / log(1 +
           COUNT(*) OVER (PARTITION BY distance, target)
         ) AS score
  FROM ( … ) AS paths
) AS scored_paths
GROUP BY source, target ORDER BY SUM(score) DESC
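The scoring can be traced step by step in Ruby. The window function attaches to every path the number of paths in its (distance, target) partition without collapsing the rows. Only the outer query groups them. The sketch below assumes log is the base-10 logarithm, as PostgreSQL's one-argument log() is. The sample paths and the rank_targets method are invented for illustration.

```ruby
# Each path is a hash with :source, :target and :distance.
def rank_targets(paths)
  # COUNT(*) OVER (PARTITION BY distance, target): partition sizes,
  # looked up per row so that no row is lost.
  partition_sizes = paths.group_by { |p| [p[:distance], p[:target]] }
                         .transform_values(&:size)

  # GROUP BY source, target with SUM(score).
  totals = Hash.new(0.0)
  paths.each do |p|
    n = partition_sizes[[p[:distance], p[:target]]]
    totals[[p[:source], p[:target]]] += 0.9**p[:distance] / Math.log10(1 + n)
  end

  # ORDER BY SUM(score) DESC
  totals.sort_by { |_, total| -total }
end

# Hypothetical paths: two of length 2 to node 3, one of length 1 to node 4.
sample_paths = [
  { source: 1, target: 3, distance: 2 },
  { source: 1, target: 3, distance: 2 },
  { source: 1, target: 4, distance: 1 }
]

rank_targets(sample_paths).each { |(source, target), total| puts "#{source} -> #{target}: #{total.round(4)}" }
```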

Window functions

Example: Closest to Róbert Kaliňák

"Bussines Park Bratislava a.s."

"JARABINY a.s."

"Ing. Robert Pintér"

"Ing. Ján Počiatek"

"Bratislava trade center a.s."

…

1M edges, 41ms

Additional resources

www.postgresql.org

Read the docs, seriously

www.explainextended.com

SQL guru blog

explain.depesz.com

First aid for slow queries

www.wikivs.com/wiki/MySQL_vs_PostgreSQL

MySQL vs. PostgreSQL comparison
