Pro PostgreSQL, OSCON 2008
Robert Treat
omniti.com
brighterlamp.org
Who Am I? (Why Listen To Me)
O-0
PostgreSQL User Since 6.5.x
DBA of High Traffic / Large PostgreSQL Instances
Long Time Contributor to PostgreSQL Project
Contribute / Maintain Several Open Source Projects
Co-Author Beginning PHP & PostgreSQL 8 (Apress)
Outline
O-1
What you need to know about the project
Getting started
Upgrading
Configuring your server
Hardware
Availability
Scalability
Query tuning
Tablespaces
Partitioning
Stuff you should know about
K-0
Know Your Way Around The Project
Know Your Way Around The Project
K-1
www.postgresql.org
downloads
documentation
bug reports
security alerts
wiki
support companies
rss > news events - versions
Know Your Way Around The Project
K-2
www.pgfoundry.org
projects.postgresql.org
Modules
Programs
Resources
URI Type
CIText
SkyTools
Npgsql
Pl/Proxy
pg_bulkload
plpgsql-debugger
sample databases
Know Your Way Around The Project
K-3
www.planetpostgresql.org
Project News
Community News
Helpful Tips / Examples
Know Your Way Around The Project
K-4
archives.postgresql.org
mailing list archives back to 1997
full text search via postgresql 8.3
keyword search suggestions
lists for users, developers, regional, user groups
Know Your Way Around The Project
K-5
#postgresql
irc.freenode.net
real time help
rtfm_please - ??help
Know Your Way Around The Project
K-6
project management
core team
committers
-hackers
roadmap
web team
S-0
Get Off To A Good Start
S-1
Get Off To A Good Start
Use package management
Consistent
Standardized
Simple
Different across systems
Upgrades are an issue
Trust your packager?
S-2
Get Off To A Good Start
Use package management
Don't Be Afraid To Roll Your Own
S-4
Get Off To A Good Start
Configure Logging
$PGDATA/pg_log
/var/log/pgsql
when in doubt... (postgresql.conf)
separate disk
Logging is often overlooked, but is the first step toward troubleshooting!
S-5
Get Off To A Good Start
Configure Authentication
most systems have different defaults
firewalls / selinux (FATAL)
rtfm (pg_hba.conf, grant, revoke)
S-6
Get Off To A Good Start
Authentication Methods
TRUST
md5
IDENT
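Since trust skips password checks entirely, it can help to audit pg_hba.conf for it. The following is a minimal Python sketch, not a full parser: it assumes the record layouts documented for pg_hba.conf (local records have no address field, host records carry either a CIDR address or an address plus mask), and the sample file contents are invented for illustration.

```python
def find_trust_entries(hba_text):
    """Return (line_number, record) pairs for pg_hba.conf records using 'trust'."""
    hits = []
    for number, raw in enumerate(hba_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0] == "local":
            # local  DATABASE  USER  METHOD
            method_index = 3
        elif len(fields) > 3 and "/" in fields[3]:
            # host  DATABASE  USER  CIDR-ADDRESS  METHOD
            method_index = 4
        else:
            # host  DATABASE  USER  IP-ADDRESS  IP-MASK  METHOD
            method_index = 5
        if len(fields) > method_index and fields[method_index] == "trust":
            hits.append((number, line))
    return hits


# Hypothetical file contents, for illustration only.
SAMPLE_HBA = """\
# TYPE  DATABASE  USER      CIDR-ADDRESS    METHOD
local   all       postgres                  ident sameuser
host    all       all       127.0.0.1/32    trust
host    pagila    webapp    10.0.0.0/8      md5
"""

trust_lines = find_trust_entries(SAMPLE_HBA)
for number, record in trust_lines:
    print("line %d uses trust: %s" % (number, record))
```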
S-7
Get Off To A Good Start
/contrib
trust these more than your own code
package dependent
use different schemas (when able)
pgcrypto
pgstattuple, pg_buffercache, pg_freespacemap
S-8
Get Off To A Good Start
procedural languages
package dependent
some are non-core (plruby, plr, plphp)
varying functionality
varying levels of trust
don't be afraid, test!
U-0
Let's Talk About Upgrades
U-1
Let's Talk About Upgrades
Versioning
First Digit (7.4.16 -> 8.2.0)
Second Digit (8.2.4 -> 8.3.0)
Third Digit (8.3.0 -> 8.3.1)
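As a rough illustration of the three-part numbering above, here is a small Python sketch that reports which digit changes between two versions. The function names are made up for this example. The dump-or-migrate rule reflects how releases of this era generally behave (a change in the first or second digit changes the on-disk format, a third-digit change does not), so treat it as a rule of thumb and still read the release notes.

```python
def classify_upgrade(old, new):
    """Report which part of a three-part version number changes first."""
    old_parts = [int(part) for part in old.split(".")]
    new_parts = [int(part) for part in new.split(".")]
    if old_parts[0] != new_parts[0]:
        return "first digit"
    if old_parts[1] != new_parts[1]:
        return "second digit"
    if old_parts[2] != new_parts[2]:
        return "third digit"
    return "same version"


def needs_dump_or_migration(old, new):
    """Rule of thumb: first- and second-digit changes need a data migration."""
    return classify_upgrade(old, new) in ("first digit", "second digit")


for old, new in [("7.4.16", "8.2.0"), ("8.2.4", "8.3.0"), ("8.3.0", "8.3.1")]:
    print(old, "->", new, ":", classify_upgrade(old, new),
          "| migrate data:", needs_dump_or_migration(old, new))
```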
U-4
Let's Talk About Upgrades
Achtung!!
Make Backups!
Read the Release Notes!
U-5
Let's Talk About Upgrades
pg_dump/pg_restore
simple
-Fc is your friend
dump with new version of pg_dump
pitfalls (time, hdd)
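The dump and restore steps might be scripted along these lines. This Python sketch only builds the command lines and does not run them. The install path, ports, and file names are hypothetical. It assumes both servers are reachable on different ports and that the target database already exists on the new server.

```python
import os


def build_upgrade_commands(new_bindir, old_port, new_port, dbname, dump_path):
    """Build pg_dump / pg_restore argument lists using the NEW version's binaries."""
    pg_dump = os.path.join(new_bindir, "pg_dump")
    pg_restore = os.path.join(new_bindir, "pg_restore")
    # -Fc writes the custom archive format, which pg_restore can read.
    dump_cmd = [pg_dump, "-Fc", "-p", str(old_port), "-f", dump_path, dbname]
    # -d names the database to restore into on the new server.
    restore_cmd = [pg_restore, "-p", str(new_port), "-d", dbname, dump_path]
    return dump_cmd, restore_cmd


# Hypothetical values, for illustration only.
dump_cmd, restore_cmd = build_upgrade_commands(
    new_bindir="/usr/local/pgsql-8.3/bin",
    old_port=5432,
    new_port=5433,
    dbname="pagila",
    dump_path="pagila.dump",
)
print(" ".join(dump_cmd))
print(" ".join(restore_cmd))
```

Each list can be handed to `subprocess.check_call` once the paths are confirmed. Keeping the commands as data makes it easy to log them and to estimate the time and disk space pitfalls before committing to a window.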
U-6
Let's Talk About Upgrades
the slony method
not simple
create slave on new version
switchover (switch back?)
pitfalls (initial synch, compatibility)
U-7
Let's Talk About Upgrades
pg_migrator
in place upgrades
rewrites system catalog info
no way to go back (fs snapshots)
still new, in flux
8.1 -> 8.2 only (for now)
U-8
Let's Talk About Upgrades
upgrading older db
cold standby
8.2 -> warm standby
8.4 -> hot standby ?
A-6
Availability
slony
asynchronous, master-slave replication
controlled switchover, failover
low i/o, time constraints
other benefits (upgrades, scaling)
A-7
Availability
bucardo
asynchronous, multi-master replication
also does master-slave
low i/o, time constraints
other benefits (upgrades, scaling)
A-8
Availability
shared disk
one copy of PGDATA on shared storage
standby takes over akin to db crash
shared disk is point of failure (raid)
STONITH
A-9
Availability
filesystem replication
drbd
filesystem mirrored between servers
synchronized, ordered writes
single disk system?
A-10
Availability
pgpool
dual-master, statement based
little caveats (random(),now(),sequences)
bigger caveats (security, password, pg_hba)
pgpool becomes failure point
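The "little caveats" come from replaying the same statement text on each node. A toy Python sketch of the effect follows. It is not pgpool code: the Node class is invented, and each node simply evaluates random() on its own, which is the situation statement-based replication creates.

```python
import random


class Node:
    """Stand-in for one database server with its own random source."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.rows = []

    def execute(self, statement):
        # Only one statement shape is understood in this toy model.
        if statement == "INSERT INTO t VALUES (random())":
            self.rows.append(self.rng.random())


node_a, node_b = Node(seed=1), Node(seed=2)
for node in (node_a, node_b):
    # The pooler forwards the same statement text to both nodes.
    node.execute("INSERT INTO t VALUES (random())")

print("nodes hold identical data:", node_a.rows == node_b.rows)
```

The same reasoning applies to now() and sequences: any value computed on the server at execution time can differ between nodes unless the application supplies it as a constant.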
A-11
Availability
postgres-r
multi-master, synchronous
just open sourced this month!
small community
not proven
H-0
Scalability
H-1
Scalability
what is scaling?
How well a solution to some problem will work when the relative size of the problem increases
- Theo Schlossnagle
H-2
Scalability
bigger, better, faster, more!
postgresql scales up pretty well
more disks (tablespaces)
more cpus, more ram
connection pooling
1000+ connections, TB+ data
H-3
Scalability
pgpool
dual-master, statement based
little caveats (random(),now(),sequences)
bigger caveats (security, password, pg_hba)
pgpool becomes failure point
H-4
Scalability
pgbouncer
simple connection pooler
10/1 -> 40/1
caveats (prepared statements, temp tables)
skype, myyearbook.com
H-5
Scalability
slony
asynchronous, master-slave replication
multiple, cascading slaves
scales read operations
other benefits (upgrades, scaling)
solid user base
H-6
Scalability
bucardo
asynchronous, multi-master replication
also does master-slave
low i/o, time constraints
other benefits (upgrades, scaling)
H-7
Scalability
pgpool-II
single db over multiple machines
scales read operations
replication, load balance, parallel query
green technology
H-8
Scalability
pgcluster
synchronous multi-master replication
significant complexity
scales read operations
other uses (failover abilities)
green technology
H-9
Scalability
postgres-r
multi-master, synchronous
just open sourced this month!
small community
other uses (failover abilities)
not proven
H-10
Scalability
pitr read-only slaves
based on pitr, warm standby operation
core team officially supporting development
8.4 -> synchronous wal shipping
8.? -> read only slaves
J-0
Query Your Queries
J-1
Query Your Queries
finding slow queries: log_min_duration_statement
-1 (off), 0 (log every statement), n (log statements that run at least n ms)
superuser only
alter user
LOG: duration: 5005.273 ms statement: select pg_sleep(5);
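Log lines in this shape are easy to mine with a short script. A minimal Python sketch follows, assuming the message layout shown above. The second sample line is invented, and real logs usually carry a log_line_prefix that this pattern simply skips over.

```python
import re

# Matches the "duration: ... ms statement: ..." message shown above.
DURATION_RE = re.compile(r"LOG:\s+duration:\s+([\d.]+)\s+ms\s+statement:\s+(.*)")


def slow_statements(log_lines, threshold_ms):
    """Return (duration_ms, statement) pairs at or above the threshold, slowest first."""
    found = []
    for line in log_lines:
        match = DURATION_RE.search(line)
        if match and float(match.group(1)) >= threshold_ms:
            found.append((float(match.group(1)), match.group(2)))
    return sorted(found, reverse=True)


sample = [
    "LOG: duration: 5005.273 ms statement: select pg_sleep(5);",
    "LOG: duration: 12.004 ms statement: select 1;",  # invented line
]
slowest = slow_statements(sample, threshold_ms=1000)
for duration, statement in slowest:
    print("%.3f ms  %s" % (duration, statement))
```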
J-2
Query Your Queries
finding slow queries: pgfouine / pqa
log analyzers
command line, generate reports
i/o load
http://pgfouine.projects.postgresql.org/reports.html
http://pqa.projects.postgresql.org/example.html
J-3
Query Your Queries
finding slow queries: pg_stat_all_tables
pagila=# \d pg_stat_all_tables
View "pg_catalog.pg_stat_all_tables"
      Column      |    Type     |
------------------+-------------+
 relid            | oid         |
 schemaname       | name        |
 relname          | name        |
 seq_scan         | bigint      |
 seq_tup_read     | bigint      |
 idx_scan         | bigint      |
 idx_tup_fetch    | bigint      |
 n_tup_ins        | bigint      |
 n_tup_upd        | bigint      |
 n_tup_del        | bigint      |
 n_tup_hot_upd    | bigint      |
 n_live_tup       | bigint      |
 n_dead_tup       | bigint      |
 last_vacuum      | timestamptz |
 last_autovacuum  | timestamptz |
 last_analyze     | timestamptz |
 last_autoanalyze | timestamptz |
J-9
Query Your Queries
finding slow queries: pg_stat_all_indexes
pagila=# \d pg_stat_all_indexes
View "pg_catalog.pg_stat_all_indexes"
    Column     |  Type  |
---------------+--------+
 relid         | oid    |
 indexrelid    | oid    |
 schemaname    | name   |
 relname       | name   |
 indexrelname  | name   |
 idx_scan      | bigint |
 idx_tup_read  | bigint |
 idx_tup_fetch | bigint |
J-11
Query Your Queries
finding slow queries: pg_statio_all_tables
pagila=# \d pg_statio_all_tables
View "pg_catalog.pg_statio_all_tables"
     Column      |  Type  |
-----------------+--------+
 relid           | oid    |
 schemaname      | name   |
 relname         | name   |
 heap_blks_read  | bigint |
 heap_blks_hit   | bigint |
 idx_blks_read   | bigint |
 idx_blks_hit    | bigint |
 toast_blks_read | bigint |
 toast_blks_hit  | bigint |
 tidx_blks_read  | bigint |
 tidx_blks_hit   | bigint |
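One common way to read the *_blks_read and *_blks_hit pairs is as a buffer cache hit ratio. The slide does not prescribe this, so take the Python sketch below as one possible interpretation. The function name and the numbers are invented. Note that a "read" here means the block was not in PostgreSQL's shared buffers; it may still have come from the operating system cache.

```python
def heap_hit_ratio(heap_blks_read, heap_blks_hit):
    """Fraction of heap block requests served from shared buffers, or None if no activity."""
    total = heap_blks_read + heap_blks_hit
    if total == 0:
        return None
    return heap_blks_hit / total


# Invented counters for a single table.
ratio = heap_hit_ratio(heap_blks_read=25, heap_blks_hit=75)
print("heap hit ratio: %.2f" % ratio)
```

The same arithmetic applies to the idx_, toast_, and tidx_ column pairs.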
J-14
Query Your Queries
fixing slow queries: explain analyze
universal tool
good for specific queries
explain for large queries
could be its own talk
http://wiki.postgresql.org/Using_EXPLAIN
J-15
Query Your Queries
fixing slow queries: indexing (basic)
use explain to find large sequential reads
use pg_stat_* tables to find numerous reads
btree (gist/gin)
enable_indexscan, enable_bitmapscan
dual column vs. single column
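To make "use pg_stat_* tables to find numerous reads" concrete, here is a Python sketch that ranks tables by rows read through sequential scans. It assumes the rows were already fetched from pg_stat_all_tables into dictionaries. The threshold, function name, and sample numbers are illustrative assumptions, not recommendations from the talk.

```python
def index_candidates(table_stats, min_seq_tup_read=100000):
    """Return tables that are mostly read by sequential scan, heaviest readers first."""
    candidates = [
        row for row in table_stats
        if row["seq_tup_read"] >= min_seq_tup_read
        # idx_scan is NULL (None) for tables that have no indexes at all.
        and row["seq_scan"] > (row["idx_scan"] or 0)
    ]
    return sorted(candidates, key=lambda row: row["seq_tup_read"], reverse=True)


# Invented rows shaped like pg_stat_all_tables output.
stats = [
    {"relname": "rental", "seq_scan": 900, "seq_tup_read": 14000000, "idx_scan": 12},
    {"relname": "customer", "seq_scan": 3, "seq_tup_read": 1800, "idx_scan": 52000},
    {"relname": "payment", "seq_scan": 40, "seq_tup_read": 640000, "idx_scan": None},
]
ranked = [row["relname"] for row in index_candidates(stats)]
print(ranked)
```

Tables that surface here are a starting point for running explain on the queries that hit them, not proof that an index is missing.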
J-16
Query Your Queries
fixing slow queries: indexing (partial)
create index address_ba_part_idx on address (district) where district = 'Buenos Aires';
restrain index to rows that matter
can give significant speed improvements
where clause of index should match where clause of query
J-17
Query Your Queries
fixing slow queries: indexing (partial)
create index customer_active_part_idx on customer (customer_id) where activebool is true;
restrain index to rows that matter
can give significant speed improvements
where clause of index should match where clause of query
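The matching rule can be tried without a server. The Python sketch below uses SQLite (through the standard sqlite3 module) as a stand-in for PostgreSQL, so that it runs anywhere. SQLite also supports partial indexes, but its planner and its EXPLAIN QUERY PLAN output differ from PostgreSQL's, and the boolean column is modelled as an integer. The table is a cut-down, hypothetical version of customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, activebool INTEGER)")
# SQLite spelling of the partial index from the slide.
conn.execute(
    "CREATE INDEX customer_active_part_idx ON customer (customer_id) "
    "WHERE activebool = 1"
)


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)


# The query repeats the index's WHERE clause, so the index is usable.
matching = plan("SELECT * FROM customer WHERE customer_id = 5 AND activebool = 1")
# The query omits it, so the planner cannot assume the index covers every row.
non_matching = plan("SELECT * FROM customer WHERE customer_id = 5")

print(matching)
print(non_matching)
```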
J-18
Query Your Queries
fixing slow queries: indexing (functional)
some people prefer to call these expressional indexes
J-19
Query Your Queries
fixing slow queries: indexing (expressional)
create unique index one_true_email_xidx on customer (lower(email));
push expensive functions into your index
system sees just WHERE indexedcolumn = 'constant'
expression of index should match expression of queries
narrow scope, but nice gains
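A side effect of the unique index on lower(email) is case-insensitive uniqueness. The Python sketch below shows that effect using SQLite as a stand-in, since it also supports indexes on expressions (version 3.9 or later) and needs no server. The one-column table and the email addresses are invented, and PostgreSQL would report the violation with a different error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (email TEXT)")
# Same index definition as on the slide.
conn.execute("CREATE UNIQUE INDEX one_true_email_xidx ON customer (lower(email))")
conn.execute("INSERT INTO customer VALUES ('Robert@Example.com')")

try:
    # Differs only by case, so lower(email) collides with the first row.
    conn.execute("INSERT INTO customer VALUES ('robert@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print("duplicate rejected:", duplicate_rejected)
```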
J-20
Query Your Queries
fixing slow queries: indexing (expressional)
create index fullname_xidx on customer ((first_name||' '||last_name));
push expensive functions into your index
system sees just WHERE indexedcolumn = 'constant'
expression of index should match expression of queries
narrow scope, but nice gains
J-21
Query Your Queries
fixing slow queries: full text search
uses lexemes and word stemming to find common words
replacement for LIKE '%x%', ~* 'x';
supports multiple languages, custom dictionaries
special indexing options
J-22
Indexing Options
full text indexing: gist vs. gin
gist
old school
slower for queries
faster insert / update
mature
gin
new in 8.2
faster for queries
slower insert / update
stable
N-0
PostgreSQL Tablespaces
N-1
PostgreSQL Tablespaces
tablespaces?
define logical locations for object placement
point to locations on disk (uses symlinks)
size determined by disk size (not pre-ordained)
dedicate per db, split db across multiple tblspc
N-2
PostgreSQL Tablespaces
tablespaces!
split database over separate disks
use stat, statio tables to gauge disk access
create dedicated storage for workloads
disk for read / write
disk for read only
large, slow disk for archiving
disk for indexes
Q-0
PostgreSQL Partitioning
Q-1
PostgreSQL Partitioning
partitioning?
as table size grows, it becomes unmanageable
use inheritance, rules, constraints to split data
queries ignore non-relevant partitions
could be its own talk
Q-2
PostgreSQL Partitioning
partitioning!
as table size grows, it becomes unmanageable
use inheritance, rules, constraints to split data
queries ignore non-relevant partitions
could be its own talk
http://www.pgcon.org/2007/schedule/events/41.en.html
Q-3
PostgreSQL Partitioning
partitioning : key points
determine list vs. range
use triggers rather than rules
partition creation vs. data population
automate maintenance
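For "automate maintenance", partition creation is the easiest piece to script. The Python sketch below generates the DDL for one monthly range partition using inheritance and a CHECK constraint. The parent table, column name, and child naming scheme are assumptions chosen for illustration. The script only produces text, so the statement can be reviewed before it is run.

```python
from datetime import date


def monthly_partition_ddl(parent, column, year, month):
    """Return CREATE TABLE DDL for one month-sized child partition."""
    start = date(year, month, 1)
    # Upper bound is exclusive: the first day of the following month.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    child = "%s_%04d_%02d" % (parent, year, month)
    return (
        "CREATE TABLE %s (CHECK (%s >= DATE '%s' AND %s < DATE '%s')) "
        "INHERITS (%s);" % (child, column, start, column, end, parent)
    )


# Hypothetical parent table and partitioning column.
ddl = monthly_partition_ddl("payment", "payment_date", 2008, 7)
print(ddl)
```

Creating partitions ahead of time from a scheduled job keeps partition creation separate from data population, and the insert trigger only has to route rows to children that already exist.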
I-0
Other Stuff I Should Mention
I-1
Other Stuff I Should Mention
pgcrypto
cryptography type functions
/contrib (export issues)
md5, sha1, blowfish, many more
I-2
Other Stuff I Should Mention
dblink
pg -> pg connections
/contrib (still under development?)
can have performance issues on large queries
make it live in its own schema
I-3
Other Stuff I Should Mention
*-link
heterogeneous connections for postgresql
db specific and db independent options
any pl/u language can implement this
similar performance issues to dblink
dblink-tds, dbi-link, oralink, odbclink
http://www.pgfoundry.org/ (db link)
I-4
Other Stuff I Should Mention
autonomous logging tool
persistent logging for postgresql functions
built on top of dblink
make it live in its own schema
https://labs.omniti.com/trac/pgsoltools
I-5
Other Stuff I Should Mention
snapshot pitr clones
full read/write copy of pitr slave
static snapshot
need solaris (zfs zone mojo)
could re-implement on other systems
https://labs.omniti.com/trac/pgsoltools
I-6
Other Stuff I Should Mention
check_postgres
nagios based monitoring script
common items for warnings and alerts
can be adapted to other uses
http://bucardo.org/check_postgres
I-7
Other Stuff I Should Mention
reconnoiter
monitoring / graphing tool
postgres based
still pretty green
https://labs.omniti.com/trac/reconnoiter
I-8
Other Stuff I Should Mention
phpPgAdmin
web based gui for postgresql
remote administration of multiple servers
implements much of postgresql functionality
support back to 7.2?
http://phppgadmin.sourceforge.net/
I-9
Other Stuff I Should Mention ;-)
my book?
I-10
Other Stuff I Should Mention ;-)
we're hiring
Ops Ninjas
Perl Kung-Fu Artists
PHP Ninjas
Database Samurai
http://omniti.com/is/hiring
L-0
El Fin