postgresql replication and some test scenarios
POSTGRESQL REPLICATION
NUR AGUS SURYOKO
SIMPLISTIC WAY TO REPLICATE POSTGRES
Primary DB → Standby DB
1. Backup
2. Rsync
3. Restore
THIS METHOD IS…
It works, yes
• But:
• As the database grows, replication gets slower
• When the duration of backup + rsync + restore > the interval = disaster
• It relies heavily on scripts
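The "disaster" condition above can be checked mechanically. A minimal sketch, assuming the three step durations and the schedule interval are measured in seconds; the helper name `cycle_fits` and the sample numbers are hypothetical and not taken from the slides.

```shell
#!/bin/bash
# cycle_fits BACKUP RSYNC RESTORE INTERVAL   (hypothetical helper, seconds)
# Prints "ok" when one full backup + rsync + restore cycle finishes within
# the scheduling interval, "overrun" (and returns 1) otherwise.
cycle_fits() {
    local backup=$1 rsync=$2 restore=$3 interval=$4
    local total=$(( backup + rsync + restore ))
    if (( total > interval )); then
        echo "overrun"
        return 1
    fi
    echo "ok"
}

# Made-up numbers: 20 min backup, 10 min rsync, 25 min restore, hourly schedule.
cycle_fits 1200 600 1500 3600   # prints "ok" (3300 s <= 3600 s)
```

As the database grows, all three durations grow with it while the interval stays fixed, so the check eventually flips to "overrun".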
PROPOSAL
Postgres Replication: Asynchronous, Hot-Standby
IN ONE SENTENCE
The same as Oracle Data Guard and DB2 HADR
AT A GLANCE
Primary DB Standby DB
Archive-Log Archive-Log
1. SQL Connection Open
2. Restore Database + Copy Archive Log
3. Replication in Sync
REMEMBER DB2-HADR? JUST THE SAME
The asynchronous one
YEAH, BABY!
READ-ONLY QUERY ON STANDBY
The same capability as Oracle Active Data Guard and DB2 HADR RoS (Read on Standby)
LET’S GET TO WORK
2-node installation: download the EnterpriseDB PostgreSQL binary distribution; it saves a lot of work
Conventions:
Node1 hostname: userver
Node2 hostname: userver2
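Create sample table:
-- uuid_generate_v4() is provided by the uuid-ossp extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE test (
    id uuid DEFAULT uuid_generate_v4() NOT NULL,
    waktu timestamp with time zone DEFAULT now() NOT NULL,
    text character varying(255)
);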
CONTINUOUS
Create this script:
#!/bin/bash
while :
do
    psql -d test -c "insert into test(text) values('X')"
    sleep 1
done
Run this script on node-1: primary
You see what we are trying to do here…
NETWORKS
• Maintain /etc/hosts on each node so that the nodes can ping each other
• Configure an ssh key for user postgres on each node. Allow passwordless ssh
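A quick way to confirm the /etc/hosts part is to look for both node names in the file. A minimal sketch: the helper name `missing_hosts` is hypothetical, and the ssh commands in the trailing comment are the usual OpenSSH tools, shown only as one possible way to set up the key.

```shell
#!/bin/bash
# missing_hosts HOSTS_FILE NAME...   (hypothetical helper)
# Prints every NAME that has no entry in HOSTS_FILE (comments are ignored).
missing_hosts() {
    local file=$1 name
    shift
    for name in "$@"; do
        awk -v n="$name" '
            { sub(/#.*/, "") }    # strip trailing comments
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }
        ' "$file" || echo "$name"
    done
}

# Usage on each node (not run here):
#   missing_hosts /etc/hosts userver userver2
#
# Passwordless ssh for user postgres, one possible approach (not run here):
#   ssh-keygen -t rsa                  # leave the passphrase empty
#   ssh-copy-id postgres@userver2      # on node2, use postgres@userver
```

No output from `missing_hosts` means both names resolve through the hosts file.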
PRIMARY SETUP
psql# create user replicator replication login encrypted password 'Initial1';
Keep the name. We don’t want any trouble. Believe me, I tried
PRIMARY SETUP
edit postgresql.conf:
wal_level = hot_standby
fsync = on
synchronous_commit = local
wal_sync_method = fsync
wal_compression = on
archive_mode = on
archive_command = 'test ! -f /opt/postgresql/9.5/backup/archive/%f && cp %p /opt/postgresql/9.5/backup/archive/%f'
max_wal_senders = 8
wal_keep_segments = 24
wal_sender_timeout = 10s
hot_standby = on
Be careful with the values marked in red on the slide; adjust them to your environment
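The archive_command is the least obvious setting: PostgreSQL substitutes %p with the path of the WAL segment to archive and %f with its file name, and treats a non-zero exit status as a failed attempt. A minimal sketch of the same logic as a shell function; the name `archive_wal` and its argument order are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# archive_wal SRC_PATH ARCHIVE_DIR FILE_NAME   (hypothetical helper)
# Same logic as the archive_command: copy the segment only if no file with
# that name is already in the archive, so an archived segment is never
# overwritten. Returns non-zero when the file exists or the copy fails.
archive_wal() {
    local src=$1 dir=$2 name=$3
    test ! -f "$dir/$name" && cp "$src" "$dir/$name"
}

# Usage (not run here), with SRC_PATH standing in for %p and FILE_NAME for %f:
#   archive_wal "$PGDATA/pg_xlog/$segment" /opt/postgresql/9.5/backup/archive "$segment"
```

Refusing to overwrite is a safety measure: a second attempt to archive the same name fails loudly instead of silently replacing a segment the standby may still need.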
PRIMARY SETUP
as root: mkdir -p /opt/postgresql/9.5/backup/archive
as root: mkdir -p /opt/postgresql/9.5/backup
as root: chown -R postgres:postgres /opt/postgresql/9.5/backup
restart postgres primary
ensure no error in $PGDATA/pg_log
PRIMARY SETUP
edit pg_hba.conf:
host all replicator 192.168.56.105/32 md5
host replication replicator 192.168.56.105/32 md5
restart postgres primary
ensure no error in $PGDATA/pg_log
Why two entries? Because "replication" is not a database. It is a special keyword that matches replication connections only
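The pair of pg_hba.conf entries can be generated for any standby address. A minimal sketch, assuming md5 authentication and a single-host /32 mask as on the slide; the helper name `hba_lines` is hypothetical.

```shell
#!/bin/bash
# hba_lines USER STANDBY_IP   (hypothetical helper)
# Prints the two pg_hba.conf entries: one for ordinary SQL connections to
# any database, one for the "replication" keyword used by WAL streaming.
hba_lines() {
    local user=$1 ip=$2
    printf 'host all %s %s/32 md5\n' "$user" "$ip"
    printf 'host replication %s %s/32 md5\n' "$user" "$ip"
}

hba_lines replicator 192.168.56.105
```

The first line is what lets the psql connection test from the standby succeed; the second is what pg_basebackup and the streaming connection rely on.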
STANDBY SETUP
test connection to primary:
psql -h userver -d test -U replicator -W
STANDBY SETUP
edit postgresql.conf:
wal_level = hot_standby
fsync = on
synchronous_commit = local
wal_sync_method = fsync
wal_compression = on
archive_mode = on
archive_command = 'test ! -f /opt/postgresql/9.5/backup/archive/%f && cp %p /opt/postgresql/9.5/backup/archive/%f'
max_wal_senders = 8
wal_keep_segments = 24
wal_sender_timeout = 10s
hot_standby = on
STANDBY SETUP
destroy db on standby
root: rm -rf /opt/postgresql/9.5/data
root: mkdir -p /opt/postgresql/9.5/data
root: chown -R postgres:postgres /opt/postgresql/9.5/data
root: chmod -R 700 /opt/postgresql/9.5/data
Destroy db, not the engine
backup and direct restore from primary:
pg_basebackup -h userver -D /opt/postgresql/9.5/data -U replicator -v -P
STANDBY SETUP
prepare file /opt/postgresql/9.5/data/recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=userver port=5432 user=replicator password=Initial1'
trigger_file = '/home/postgres/trigger.file'
start standby
check test table. Data should be updated periodically
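STANDBY SETUP
check replication status:
test=# select * from pg_stat_replication;
sync_state must be 'sync' (PostgreSQL reports 'sync' only when synchronous_standby_names is set on the primary; otherwise an asynchronous standby shows state 'streaming' and sync_state 'async')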
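The status check is easier to repeat during the test scenarios when it is reduced to a few columns. A minimal sketch: `client_addr`, `state`, and `sync_state` are real columns of pg_stat_replication, while the helper name `replication_summary` and its output wording are hypothetical.

```shell
#!/bin/bash
# replication_summary   (hypothetical helper)
# Reads "client_addr|state|sync_state" rows on stdin, as produced by
# psql -At, and prints one line per connected standby. Returns non-zero
# when no standby is connected or when one is not in state "streaming".
replication_summary() {
    local count=0 bad=0 addr state sync
    while IFS='|' read -r addr state sync; do
        [ -z "$addr" ] && continue
        count=$(( count + 1 ))
        echo "standby $addr: state=$state sync_state=$sync"
        [ "$state" = "streaming" ] || bad=1
    done
    if (( count == 0 )); then
        echo "no standby connected"
        return 1
    fi
    return "$bad"
}

# Usage on the primary (not run here):
#   psql -At -d test \
#     -c "select client_addr, state, sync_state from pg_stat_replication" \
#     | replication_summary
```

An empty result from the view, as seen when the standby is shut down, shows up here as "no standby connected".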
SYNC
TEST1: NODE2 SHUTDOWN OS
node1: primary
node2: standby
shutdown node2
transactions on primary still going
replication status:
test=# select * from pg_stat_replication;
<empty>
Test Result: OK
TEST2: START STANDBY
node1: primary
node2: standby
startup node2
auto-start after OS boot, and sync still going
check replication status:
test=# select * from pg_stat_replication;
sync_state = 'sync'
Test Result: OK
TEST3: NORMAL FAILOVER
node1: primary
node2: standby
stop periodic insert
mark last inserted value (primary):
2668fadd-59dc-468a-ae85-65f9750bc336 | 2015-11-05 22:04:15.658038+07 | X
shutdown node1
TEST3: NORMAL FAILOVER
found disconnected log in standby, this is normal:
2015-11-05 22:14:18 WIB FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "userver" (192.168.56.104) and accepting
TCP/IP connections on port 5432?
TEST3: NORMAL FAILOVER
node2 as primary
touch /home/postgres/trigger.file
check log, this is a successful failover message:
2015-11-05 22:15:53 WIB LOG: trigger file found: /home/postgres/trigger.file
2015-11-05 22:15:53 WIB LOG: redo done at 0/10000028
2015-11-05 22:15:53 WIB LOG: last completed transaction was at log time 2015-11-05 22:04:15.658499+07
2015-11-05 22:15:53 WIB LOG: selected new timeline ID: 2
2015-11-05 22:15:54 WIB LOG: archive recovery complete
2015-11-05 22:15:54 WIB LOG: MultiXact member wraparound protections are now enabled
2015-11-05 22:15:54 WIB LOG: database system is ready to accept connections
2015-11-05 22:15:54 WIB LOG: autovacuum launcher started
TEST3: NORMAL FAILOVER
check latest inserted data, and compare to marked data:
2668fadd-59dc-468a-ae85-65f9750bc336 | 2015-11-05 22:04:15.658038+07 | X
data match!
ensure /home/postgres/trigger.file is deleted automatically by postgres
Test Result: OK
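The promotion step can be scripted so that it waits until the standby has acted on the trigger file. A minimal sketch, assuming the trigger_file path from recovery.conf; the helper name `wait_trigger_consumed` and the default timeout of 30 seconds are hypothetical.

```shell
#!/bin/bash
# wait_trigger_consumed FILE [TIMEOUT_SECONDS]   (hypothetical helper)
# Polls once per second until FILE no longer exists, which is how the
# standby shows that it has picked up the trigger file and promoted itself.
wait_trigger_consumed() {
    local file=$1 timeout=${2:-30} waited=0
    while [ -e "$file" ]; do
        if (( waited >= timeout )); then
            echo "timeout: $file still present"
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    echo "trigger consumed"
}

# Usage on the standby (not run here):
#   touch /home/postgres/trigger.file
#   wait_trigger_consumed /home/postgres/trigger.file 60
```

The server log remains the authoritative check; this only automates the "trigger file is deleted" observation.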
TEST4: NETWORK DISCONNECT
node1: primary
node2: standby
ensure the database is in sync
cut network connection
node1: primary: not in sync
node2: cannot connect to primary
re-connect network
node1: sync
node2: log report started streaming WAL
Test Result: OK
TEST5: NODE1 SHUTDOWN
node1: primary
node2: standby
shutdown node1: primary
startup node1: primary
after startup, automatically sync
check test database, updated periodically
Test Result: OK
THANK YOU