Replication Solutions for PostgreSQL
Peter [email protected]
2
What's in a Term?
• Replication?
• Clustering?
• High availability?
• Failover?
• Standby?
• Putting data on more than one computer
3
Space of Possibilities
• Goals
  • What do you want to achieve?
• Techniques
  • How can this be implemented?
• Solutions
  • What software is available to do this?
4
Goals
• High availability
• Performance
  • Read
  • Write
• Wide-area networks
• Offline peers
5
Goal: High Availability
• No one wants “low availability”!
• Provisions for system failures
  • Software faults
  • Hardware faults
  • External interference
6
Goal: Read Performance
• Applications with:
  • Many readers (e.g., library information system)
  • Resource-intensive readers (e.g., data warehousing)
• Distribute readers on more hardware.
• Most often, one physical machine is enough.
7
Goal: Write Performance
• Applications with:
  • Many writers
• Distribute writers on more hardware?
  • Constraint checking, conflict resolution?!?
• Faster writing contradicts replication.
  • Partition, don't replicate!
• RAID 0/striping is not replication – it makes things “worse”.
• RAID 10 is a good idea, but not the topic here.
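A minimal sketch of the "partition, don't replicate" idea, assuming a simple hash-partitioning scheme. The node names and the `shard_for` and `write` helpers are hypothetical and not part of any PostgreSQL tool. Each key is written to exactly one node, so write load spreads out without any conflict resolution.

```python
import zlib

# Hypothetical node names; in practice these would be separate database servers.
SHARDS = ["node0", "node1", "node2"]

def shard_for(key: str) -> str:
    """Map a key to exactly one shard using a stable hash."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]

def write(storage: dict, key: str, value: str) -> str:
    """Store the row on its single owning shard and return that shard's name."""
    node = shard_for(key)
    storage.setdefault(node, {})[key] = value
    return node

storage = {}
for customer in ["alice", "bob", "carol", "dave"]:
    write(storage, customer, "order data")

# Every key lives on exactly one node, so no two nodes can disagree about it.
total_rows = sum(len(rows) for rows in storage.values())
```

The trade-off is that no node holds a second copy of a row, so partitioning alone does nothing for availability.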
8
Goal: Wide-Area Networks
• Faster access across WANs
• Reading?
  • Local copies
• Writing?
  • Synchronization?
9
Goal: Offline Peers
• Synchronize data with laptops, handhelds, ...
• “Road warriors”
• May be considered very-high-latency WANs
10
Techniques
• Replication
  • Master/Slave
    • Asynchronous
    • Synchronous
  • Multi-Master
    • Asynchronous
    • Synchronous
• Proxy
• Standby system
11
Technique: Replication Master/Slave Asynchronous
• High(er) availability(?)
• Read performance
  • Load spreading, load balancing
• Offline peers (unidirectional sync.)
[Diagram: M → S (async)]
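A toy sketch of the asynchronous master/slave idea, not modeled on any particular product. The `Master` and `Slave` classes and the `ship_changes` method are hypothetical names. The master commits locally and returns at once, and the slave catches up later, which is why the availability gain carries a question mark: changes still in the queue are lost if the master fails.

```python
from collections import deque

class Slave:
    def __init__(self):
        self.data = {}

    def apply(self, change):
        key, value = change
        self.data[key] = value

class Master:
    def __init__(self, slave):
        self.data = {}
        self.slave = slave
        self.pending = deque()   # changes committed locally but not yet shipped

    def write(self, key, value):
        # Commit locally and return at once; the slave is updated later.
        self.data[key] = value
        self.pending.append((key, value))

    def ship_changes(self):
        # Runs some time after the commit, e.g. from a background process.
        while self.pending:
            self.slave.apply(self.pending.popleft())

slave = Slave()
master = Master(slave)
master.write("x", 1)
lag_visible = "x" not in slave.data   # the slave is briefly stale
master.ship_changes()
caught_up = slave.data == master.data
```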
12
Technique: Replication Master/Slave Synchronous
• High availability
• Better read performance
• Worse write performance
[Diagram: M → S (sync)]
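A toy sketch of the synchronous variant, under the assumption that the master refuses to commit until the slave has acknowledged the change. All class names are hypothetical. It illustrates both points on the slide: the two copies never diverge, and every write pays for a round trip to the slave.

```python
class SlaveUnavailable(Exception):
    pass

class SyncSlave:
    def __init__(self):
        self.data = {}
        self.online = True

    def apply(self, key, value):
        if not self.online:
            raise SlaveUnavailable("slave did not acknowledge")
        self.data[key] = value
        return True   # acknowledgement

class SyncMaster:
    def __init__(self, slave):
        self.data = {}
        self.slave = slave

    def write(self, key, value):
        # The slave must acknowledge before the master commits,
        # so every write waits for the slave.
        self.slave.apply(key, value)
        self.data[key] = value

slave = SyncSlave()
master = SyncMaster(slave)
master.write("x", 1)
in_sync = slave.data == master.data

slave.online = False
try:
    master.write("y", 2)
    write_rejected = False
except SlaveUnavailable:
    write_rejected = True
```

In this sketch an unreachable slave blocks writes entirely; a real system has to decide whether to wait, fail, or continue without the slave.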
13
Technique: Replication Multi-Master Asynchronous
• Read performance
• Faster access across WANs
• Manage offline peers
• Requires conflict resolution mechanism
[Diagram: M ↔ M (async)]
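One possible conflict resolution mechanism, sketched under the assumption of a "last writer wins" policy. This is only one of several policies and is not attributed to any product named in this talk. Each value carries a hypothetical `(timestamp, node_id, payload)` tuple, and the `merge` function is an illustrative helper.

```python
def merge(local: dict, remote: dict) -> dict:
    """Merge two replicas; each value is (timestamp, node_id, payload)."""
    merged = dict(local)
    for key, remote_version in remote.items():
        local_version = merged.get(key)
        # Tuples compare by timestamp first, then node_id as a tie-breaker,
        # so both masters pick the same winner.
        if local_version is None or remote_version[:2] > local_version[:2]:
            merged[key] = remote_version
    return merged

master_a = {"email": (10, "a", "old@example.org"), "name": (5, "a", "Ann")}
master_b = {"email": (12, "b", "new@example.org")}

# Each master merges the other's changes; both end up with the same data.
after_sync_a = merge(master_a, master_b)
after_sync_b = merge(master_b, master_a)
```

Last writer wins silently discards the losing update, which is acceptable for some applications and unacceptable for others.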
14
Technique: Replication Multi-Master Synchronous
• “Holy grail of replication”
• High availability
• Read performance
• Difficult to get good write performance
[Diagram: M ↔ M (sync)]
15
Technique: Proxy
• High availability
• Read performance
• Proxy instance should be redundant
• Transparent to the application
[Diagram: clients (C) connecting through a proxy]
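A minimal sketch of how a proxy might improve read performance, assuming it sends writes to a single master and spreads reads over replicas in round-robin order. The `Proxy` class, the backend names, and the prefix-based statement classification are all hypothetical simplifications.

```python
import itertools

class Proxy:
    def __init__(self, master: str, replicas: list):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        """Return the name of the backend that should run this statement."""
        # Simplistic classification: only statements starting with SELECT count as reads.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.master

proxy = Proxy("master", ["replica1", "replica2"])
targets = [proxy.route(q) for q in [
    "SELECT * FROM books",
    "SELECT * FROM authors",
    "UPDATE books SET title = 'x' WHERE id = 1",
    "SELECT 1",
]]
```

The application only ever talks to the proxy, which is what makes the technique transparent, and also why the proxy itself should be redundant.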
16
Technique: Standby System
• High availability
[Diagram: M → S (sync)]
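A toy sketch of the failover decision behind a standby system, assuming a monitor that promotes the standby after a fixed number of consecutive missed health checks. The `FailoverMonitor` class and its `max_missed` threshold are hypothetical and not taken from any tool in this talk.

```python
class FailoverMonitor:
    def __init__(self, max_missed: int = 3):
        self.max_missed = max_missed
        self.missed = 0
        self.active = "master"

    def heartbeat(self, master_alive: bool) -> str:
        """Record one health check and return the node that should serve clients."""
        if self.active == "standby":
            return self.active          # no automatic fail-back in this sketch
        self.missed = 0 if master_alive else self.missed + 1
        if self.missed >= self.max_missed:
            self.active = "standby"     # promote the standby
        return self.active

monitor = FailoverMonitor(max_missed=3)
checks = [True, False, True, False, False, False, True]
history = [monitor.heartbeat(alive) for alive in checks]
```

A single missed check does not trigger promotion here, which avoids failing over on a brief network hiccup.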
17
Constraints
• Hardware
• Operating system
• Application
18
No Built-In Solution?
• FIXME
19
Solutions
• Slony-I, -II
• PGCluster
• DBMirror
• pgpool
• WAL replication
• Sequoia
• DRBD
• Shared storage
20
Solution: Slony-I
(Slony ← слоны ← elephants)
• Asynchronous master/slave replication
• Multiple slaves, cascading possible
• Particularly useful for:
  • Read performance (load balancing with pgpool)
  • Limited form of high availability
  • Offline slaves via file-based log shipping
http://www.slony.info/
21
Solution: Slony-II
• Synchronous master/master replication?
• See Gavin Sherry's session for details
22
Solution: PGCluster
• Synchronous master/master replication
• Replicates the query string
• Particularly useful for:
  • Load balancing
  • High availability
http://pgcluster.projects.postgresql.org/
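A small illustration of what replicating the query string means in general, not of how PGCluster itself is implemented. It uses in-memory SQLite databases as stand-ins for two PostgreSQL nodes, and the `make_node` and `replicate` helpers are hypothetical. The same SQL text is executed on every node, which works for deterministic statements but can diverge when a statement calls something like `random()`.

```python
import sqlite3

def make_node():
    """Create one stand-in database node with an empty table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
    return conn

def replicate(nodes, sql):
    """Send the identical statement text to every node."""
    for node in nodes:
        node.execute(sql)

nodes = [make_node(), make_node()]
replicate(nodes, "INSERT INTO t (id, v) VALUES (1, 42)")
replicate(nodes, "INSERT INTO t (id, v) VALUES (2, random())")

rows = [node.execute("SELECT id, v FROM t ORDER BY id").fetchall() for node in nodes]
deterministic_rows_match = rows[0][0] == rows[1][0]
# Each node evaluated random() on its own, so these rows almost certainly differ.
random_rows_match = rows[0][1] == rows[1][1]
```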
23
Solution: DBMirror
• Asynchronous master/slave replication
• Very simple (compared to Slony-I)
• Particularly useful for:
  • Read performance
  • Offline peers
contrib/dbmirror/ in PostgreSQL source tree
24
Solution: pgpool
• Connection pool daemon for PostgreSQL
• Supports simple proxying
• Useful as frontend for Slony-I
http://pgpool.projects.postgresql.org/
25
Solution: WAL Replication
• Use the “archived” WAL logs for “recovery” on a standby system
• Disadvantages:
  • Only full database cluster replication
  • Master and slave must be binary-compatible
  • Rather slow across network
• Useful for:
  • High availability
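A toy simulation of the log-shipping flow, assuming the master copies completed log segments into an archive directory and the standby replays them in order. The file naming and the JSON key/value "segments" are hypothetical; real WAL segments are binary and describe physical page changes, which is why master and slave must be binary-compatible.

```python
import json
import tempfile
from pathlib import Path

def archive_segment(archive_dir: Path, seq: int, changes: list) -> None:
    """Master side: copy a completed log segment into the archive."""
    (archive_dir / f"{seq:08d}.seg").write_text(json.dumps(changes))

def replay_archive(archive_dir: Path, state: dict, next_seq: int) -> int:
    """Standby side: apply archived segments in order, stop at the first gap."""
    while (segment := archive_dir / f"{next_seq:08d}.seg").exists():
        for key, value in json.loads(segment.read_text()):
            state[key] = value
        next_seq += 1
    return next_seq

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    archive_segment(archive, 1, [["a", 1], ["b", 2]])
    archive_segment(archive, 2, [["a", 3]])
    standby = {}
    # The standby remembers which segment it needs next and resumes from there.
    next_needed = replay_archive(archive, standby, 1)
```

The standby only advances when a whole segment has arrived, so it always lags the master by at least the segment currently being written.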
26
Solution: Sequoia
• Formerly C[lustered]-JDBC
• Proxy offering clustering, load balancing and failover services
• Particularly useful for:
  • High availability
  • Read performance
• Currently only for Java/JDBC applications
http://sequoia.continuent.org/
27
Solution: DRBD
• File system (block device) replication
• Linux kernel module
• Standby system
• Useful for:
  • High availability
• Secure any service, not just a database system
http://www.drbd.org/
28
Solution: Shared Storage
• NAS, iSCSI, Fibre Channel, ...
• Available from many vendors
• Standby system
• Useful for:
  • High availability
• Secure any service, not just a database system
• Single storage system is a possible point of failure
29
Summary
• Plenty of solutions for diverse applications
• Make a (project) plan.
30
Suggestions
• Minimum for any production installation:
  • Sensible disk clustering
  • RAID 10
  • Tablespace management
  • Separate disk(s) for WAL
• DRBD or shared storage
• Slony-I for load balancing or warehousing
• Java developers consider Sequoia
31
Outlook
• Slony-II
• WAL replication management
• XA support
• More packaging efforts
32
The End