postgresql: advanced features in practice

JÁN SUCHAL 22.11.2011 @RUBYSLAVA PostgreSQL: Advanced features in practice

Upload: jano-suchal

Post on 10-Jun-2015


DESCRIPTION

Transactional DDL, partial & function indexes, fuzzy string matching with trigram indexes, views, recursive/with queries and window functions.

TRANSCRIPT

Page 1: PostgreSQL: Advanced features in practice

JÁN SUCHAL

22.11.2011

@RUBYSLAVA

PostgreSQL: Advanced features in practice

Page 2: PostgreSQL: Advanced features in practice

Why PostgreSQL?

The world’s most advanced open source database.

Features!

Transactional DDL

Cost-based query optimizer + Graphical explain

Partial indexes

Function indexes

K-nearest search

Views

Recursive Queries

Window Functions


Page 4: PostgreSQL: Advanced features in practice

Transactional DDL

class CreatePostsMigration < ActiveRecord::Migration
  def change
    create_table :posts do |t|
      t.string :name, null: false
      t.text :body, null: false
      t.references :author, null: false
      t.timestamps null: false
    end
    add_index :posts, :title, unique: true
  end
end

Where is the problem?

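Column title does not exist! Without transactional DDL the table is created but the index is not, and the migration is left half-applied. Oops! With transactional DDL the failed migration rolls back completely. Transactional DDL FTW!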
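The rollback behaviour can be tried without a PostgreSQL server. The following is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in, because SQLite also treats DDL as transactional. The trimmed posts table and the index name idx_posts_title are assumptions made for the illustration, not part of the original migration.

```python
import sqlite3


def run_migration(conn):
    """Create a table, then an index on a column that does not exist."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "CREATE TABLE posts ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, body TEXT NOT NULL)"
        )
        # Fails: the table has a 'name' column, not 'title'.
        conn.execute("CREATE UNIQUE INDEX idx_posts_title ON posts (title)")
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        # DDL is transactional, so the CREATE TABLE is undone as well.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        return False


# isolation_level=None leaves transaction control to the explicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
migrated = run_migration(conn)
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(migrated, tables)
```

After the failed run no posts table is left behind, so the corrected migration can simply be run again.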

Page 5: PostgreSQL: Advanced features in practice

Cost-based query optimizer

What is the best plan to execute a given query?

Cost = I/O + CPU operations needed

Sequential vs. random seek

Join order

Join type (nested loop, hash join, merge join)

Page 6: PostgreSQL: Advanced features in practice

Graphical EXPLAIN

pgAdmin (www.pgadmin.org)


Page 8: PostgreSQL: Advanced features in practice

Partial indexes

Conditional indexes

Problem: Async job/queue table, find failed jobs

Create index on failed_at column

99% of index is never used

Solution:

CREATE INDEX idx_dj_only_failed ON delayed_jobs (failed_at)
  WHERE failed_at IS NOT NULL;

smaller index

faster updates
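A small experiment along these lines, again sketched with SQLite (which accepts the same partial-index syntax) rather than PostgreSQL. The row counts and the timestamp value are made up for the illustration; the table is reduced to the two columns that matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delayed_jobs (id INTEGER PRIMARY KEY, failed_at TEXT)")

# 1000 jobs, only every 100th one has failed (hypothetical data).
rows = [("2011-11-22 18:00:00",) if i % 100 == 0 else (None,) for i in range(1000)]
conn.executemany("INSERT INTO delayed_jobs (failed_at) VALUES (?)", rows)

# Only rows matching the WHERE clause get an index entry.
conn.execute(
    "CREATE INDEX idx_dj_only_failed ON delayed_jobs (failed_at) "
    "WHERE failed_at IS NOT NULL"
)

query = "SELECT id FROM delayed_jobs WHERE failed_at IS NOT NULL ORDER BY failed_at"
failed_jobs = conn.execute(query).fetchall()
print(len(failed_jobs), "failed jobs")

# The plan shows whether the planner chose the partial index for this query.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

The query has to repeat the index condition (failed_at IS NOT NULL) so the planner can prove that the partial index covers every row it needs.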


Page 11: PostgreSQL: Advanced features in practice

Function Indexes

Problem: Suffix search

SELECT … WHERE code LIKE '%123'

“Solution”: Add reverse_code column, populate, add triggers for updates, create index on reverse_code column, reverse queries WHERE reverse_code LIKE '321%'

PostgreSQL solution:

CREATE INDEX idx_reversed ON projects
  (reverse((code)::text) text_pattern_ops);

SELECT … WHERE reverse(code) LIKE reverse('%123')
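Why reversing helps: a suffix match on code becomes a prefix match on reverse(code), and a prefix match is a range scan over a sorted index. A rough Python sketch of that idea follows, with a sorted list standing in for the B-tree; the sample codes are invented for the illustration.

```python
import bisect

# Hypothetical project codes.
codes = ["A-100123", "B-220123", "C-555999", "D-000123", "E-123000"]

# Stand-in for the index on reverse(code): entries sorted by the reversed value.
reversed_index = sorted((code[::-1], code) for code in codes)


def suffix_search(index, suffix):
    """Find codes ending in `suffix` via a prefix range scan on the reversed values."""
    prefix = suffix[::-1]
    # Jump to the first entry whose reversed value is >= the prefix.
    start = bisect.bisect_left(index, (prefix,))
    matches = []
    for reversed_code, code in index[start:]:
        if not reversed_code.startswith(prefix):
            break  # left the range of matching entries
        matches.append(code)
    return matches


matches = suffix_search(reversed_index, "123")
print(matches)
```

The function index gives the same effect without the extra column and triggers, because the database maintains the reversed values itself.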


Page 14: PostgreSQL: Advanced features in practice

K-nearest search

Problem: Fuzzy string matching 900K rows

Solution: Ngram/Trigram search

johno = {"  j"," jo","hno","joh","no ","ohn"}

CREATE INDEX idx_trgm_name ON subjects USING gist (name gist_trgm_ops);

SELECT name, name <-> 'Michl Brla' AS dist
FROM subjects ORDER BY dist ASC LIMIT 10; (312ms)

"Michal Barla" ; 0.588235
"Michal Bula" ; 0.647059
"Michal Broz" ; 0.647059
"Pavel Michl" ; 0.647059
"Michal Brna" ; 0.647059

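The distances on this slide can be recomputed by hand. The sketch below approximates what pg_trgm does: lowercase the text, split it into words, pad each word with two leading spaces and one trailing space, and take every 3-character window. Distance is 1 minus the share of trigrams the two strings have in common. The function names are made up here, and the real extension handles more edge cases.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction for a string."""
    grams = set()
    # Non-alphanumeric characters act as word separators.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_distance(a, b):
    """1 - |shared trigrams| / |all trigrams|, like the <-> operator."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0
    return 1.0 - len(ta & tb) / len(ta | tb)


print(sorted(trigrams("johno")))
for name in ["Michal Barla", "Michal Bula", "Pavel Michl"]:
    print(name, round(trigram_distance(name, "Michl Brla"), 6))
```

"Michl Brla" and "Michal Barla" share 7 of 17 distinct trigrams, which gives the 0.588235 shown above.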
Page 15: PostgreSQL: Advanced features in practice

Views

Query conditions are pushed down into the view's SELECTs

CREATE VIEW edges AS
  SELECT subject_id AS source_id,
         connected_subject_id AS target_id FROM raw_connections
  UNION ALL
  SELECT connected_subject_id AS source_id,
         subject_id AS target_id FROM raw_connections;

SELECT * FROM edges WHERE source_id = 123;

SELECT * FROM edges WHERE source_id < 500 ORDER BY source_id LIMIT 10;

No materialization, 2x indexed select + 1x append/merge


Page 17: PostgreSQL: Advanced features in practice

Recursive Queries

Problem: Find paths between two nodes in graph

WITH RECURSIVE search_graph(source, target, distance, path) AS (
  SELECT source_id, target_id, 1,
         ARRAY[source_id, target_id]
  FROM edges WHERE source_id = 552506
  UNION ALL
  SELECT sg.source, e.target_id, sg.distance + 1,
         path || ARRAY[e.target_id]
  FROM search_graph sg
  JOIN edges e ON sg.target = e.source_id
  WHERE NOT e.target_id = ANY(path) AND distance < 4
)
SELECT * FROM search_graph LIMIT 100;


Page 19: PostgreSQL: Advanced features in practice

Recursive Queries

Problem: Find paths between two nodes in graph

WITH RECURSIVE search_graph(source, target, distance, path) AS (
  SELECT source_id, target_id, 1,
         ARRAY[source_id, target_id]
  FROM edges WHERE source_id = 552506
  UNION ALL
  SELECT sg.source, e.target_id, sg.distance + 1,
         path || ARRAY[e.target_id]
  FROM search_graph sg
  JOIN edges e ON sg.target = e.source_id
  WHERE NOT e.target_id = ANY(path) AND distance < 4
)
SELECT * FROM search_graph WHERE target = 530556 LIMIT 100;
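What the recursive query computes can be mirrored step by step in plain Python. This is only a sketch of the logic, not how PostgreSQL executes it: the first list corresponds to the non-recursive SELECT, and each loop iteration to one round of the recursive SELECT, with the same cycle check and distance limit. The four-connection graph is invented for the example.

```python
from collections import defaultdict


def search_paths(edges, start, max_distance=4):
    """Return (source, target, distance, path) rows like the search_graph CTE."""
    adjacency = defaultdict(list)
    for source_id, target_id in edges:
        adjacency[source_id].append(target_id)

    # Non-recursive term: all edges leaving the start node.
    frontier = [(start, t, 1, [start, t]) for t in adjacency[start]]
    rows = []
    while frontier:
        rows.extend(frontier)
        next_frontier = []
        for source, target, distance, path in frontier:
            if distance >= max_distance:  # mirrors "distance < 4"
                continue
            for nxt in adjacency[target]:
                if nxt not in path:  # mirrors "NOT e.target_id = ANY(path)"
                    next_frontier.append((source, nxt, distance + 1, path + [nxt]))
        frontier = next_frontier
    return rows


# Hypothetical connections, made symmetric like the edges view.
raw_connections = [(1, 2), (2, 3), (1, 3), (3, 4)]
edges = raw_connections + [(t, s) for s, t in raw_connections]

paths_to_4 = [path for _, target, _, path in search_paths(edges, 1) if target == 4]
print(paths_to_4)
```

The path array does double duty: it is the result the user wants and the visited set that keeps the recursion from looping.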



Page 23: PostgreSQL: Advanced features in practice

Recursive queries

Graph with ~1M edges (61ms)

source; target; distance; path

530556; 552506; 2; {530556,185423,552506}

JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Ing. Ján Počiatek

530556; 552506; 2; {530556,183291,552506}

JUDr. Robert Kaliňák -> FoRest s.r.o. -> Ing. Ján Počiatek

530556; 552506; 4; {530556,183291,552522,185423,552506}

JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Lena Sisková -> FoRest s.r.o. -> Ing. Ján Počiatek

Page 24: PostgreSQL: Advanced features in practice

Window functions

“Aggregate functions without grouping”

avg, count, sum, rank, row_number, ntile…

Problem: Find closest nodes to a given node

Order by sum of path scores

Path score = 0.9^<distance> / log(1 + <number of paths>)

SELECT source, target FROM (
  SELECT source, target, path, distance,
         0.9 ^ distance / log(1 +
           COUNT(*) OVER (PARTITION BY distance, target)
         ) AS score
  FROM ( … ) AS paths
) AS scored_paths
GROUP BY source, target ORDER BY SUM(score) DESC
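The scoring can be checked on a handful of rows. The sketch below reproduces the two stages in Python: the Counter plays the role of COUNT(*) OVER (PARTITION BY distance, target), attaching a per-partition count to every row without collapsing the rows, and the second stage is the ordinary GROUP BY with SUM. PostgreSQL's log() is base 10, hence math.log10. The sample paths and the function name are assumptions for the illustration.

```python
import math
from collections import Counter, defaultdict


def rank_targets(paths):
    """paths: iterable of (source, target, distance); returns (source, target) by score."""
    paths = list(paths)
    # Window-function stage: number of paths sharing (distance, target).
    partition_counts = Counter((distance, target) for _, target, distance in paths)

    # GROUP BY stage: sum the per-path scores for each (source, target).
    totals = defaultdict(float)
    for source, target, distance in paths:
        n_paths = partition_counts[(distance, target)]
        totals[(source, target)] += 0.9 ** distance / math.log10(1 + n_paths)

    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical paths from node 1: one to node 2, two to node 3, one to node 4.
sample_paths = [(1, 2, 1), (1, 3, 2), (1, 3, 2), (1, 4, 3)]
ranking = rank_targets(sample_paths)
print(ranking)
```

Node 3 is farther away than node 2 but is reached by two paths, so its summed score comes out highest in this example.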


Page 29: PostgreSQL: Advanced features in practice

Window functions

Example: Closest to Róbert Kaliňák

"Bussines Park Bratislava a.s."

"JARABINY a.s."

"Ing. Robert Pintér"

"Ing. Ján Počiatek"

"Bratislava trade center a.s."

1M edges, 41ms

Page 30: PostgreSQL: Advanced features in practice

Additional resources

www.postgresql.org

Read the docs, seriously

www.explainextended.com

SQL guru blog

explain.depesz.com

First aid for slow queries

www.wikivs.com/wiki/MySQL_vs_PostgreSQL

MySQL vs. PostgreSQL comparison

Page 31: PostgreSQL: Advanced features in practice

Real World Explain

www.postgresql.org