a postgres orchestra - pgconf 2017, india | postgresql...
TRANSCRIPT
![Page 2: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/2.jpg)
gitlab incident
![Page 3: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/3.jpg)
gitlab incident
![Page 4: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/4.jpg)
gitlab incident
![Page 5: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/5.jpg)
about
• points of failure in current practices and clustering systems
• design an abstraction on top of existing systems to ensure availability
• illustrate some scenarios
![Page 6: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/6.jpg)
primer
![Page 7: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/7.jpg)
db cluster / compute cluster
• a database cluster is a collection of databases that is managed by a single instance of a running database server
• a group of databases that work together to achieve higher performance and/or availability
![Page 8: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/8.jpg)
one master, multiple standbys
![Page 9: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/9.jpg)
totally available
• 0 single points of failure
• number of failures it can tolerate = infinite
![Page 10: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/10.jpg)
highly available
• tending towards 0 single points of failure
• number of failures it can tolerate = f
![Page 11: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/11.jpg)
increase the area of success
• write ahead logs
• standbys / replicas
![Page 12: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/12.jpg)
areas of success - WAL
![Page 13: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/13.jpg)
areas of success - WAL
![Page 14: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/14.jpg)
areas of success - replica
![Page 15: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/15.jpg)
failures
• filesystem gives up at 80% disk usage
• high replication lag
• too many connections / semaphores
• network partitions
• cyclone Vardah 🌀
![Page 16: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/16.jpg)
replication
• file-system replication
• logical backups
• log shipping / streaming
![Page 17: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/17.jpg)
replication - snapshots
$ zfs snapshot -r kitallis/home@now
$ zfs list -t snapshot
NAME               USED   AVAIL  REFER  MOUNTPOINT
rpool/zfs          78.3M  -      4.53G  -
kitallis/home@now  0      -      26K    -
![Page 18: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/18.jpg)
replication - logical
$ pg_dump db1 > db.sql
$ psql -d db2 -f db.sql
$ pg_dump -Fc db1 > db.dump
$ pg_restore -d db2 db.dump
$ pg_dumpall > db.out
$ psql -f db.out postgres
![Page 19: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/19.jpg)
replication - streaming
![Page 20: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/20.jpg)
tools
![Page 21: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/21.jpg)
clustering tools
![Page 22: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/22.jpg)
clustering + orchestration
• we require more than just a nice API on top of replication
• govern the cluster as well
![Page 23: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/23.jpg)
properties
• service is inherently a singleton
• high availability
• fail-overs are automatic
• consumers should be able to refresh connections
• zero / minimal application state
![Page 24: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/24.jpg)
suggestions
![Page 25: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/25.jpg)
network-oriented
• keepalived
• UCARP
![Page 26: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/26.jpg)
keepalived / UCARP
• keepalived implements VRRP; UCARP implements the similar CARP protocol
• provides health check for up/down service
• UCARP and keepalived are pretty similar
![Page 27: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/27.jpg)
keepalived
• health check (try INSERTs or SELECTs)
• notify_master (promote standby)
• notify_standby (follow new master)
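A minimal sketch of the health check idea, assuming keepalived runs an external script and treats a non-zero exit code as a failure. The `run_query` callable and the `healthcheck` table are hypothetical stand-ins for a real database driver and schema.

```python
from typing import Callable


def check_postgres(run_query: Callable[[str], object], writable: bool = False) -> int:
    """Return a process exit code: 0 if the probe query succeeds, 1 otherwise."""
    # A SELECT only proves the server answers queries. An INSERT also proves
    # it accepts writes, which a read-only standby would reject.
    probe = (
        "INSERT INTO healthcheck (checked_at) VALUES (now())"  # hypothetical table
        if writable
        else "SELECT 1"
    )
    try:
        run_query(probe)
    except Exception:
        return 1
    return 0
```

A wrapper script would pass a function that executes the query over a real connection and call `sys.exit(check_postgres(...))`.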
![Page 28: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/28.jpg)
keepalived
![Page 29: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/29.jpg)
keepalived
![Page 30: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/30.jpg)
keepalived (problem)
![Page 31: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/31.jpg)
keepalived / UCARP - nopreempt
nopreempt
state BACKUP
priority 101
[-P, --preempt]
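The effect of preemption can be shown with a toy election. This is a simplified model, not the VRRP or CARP state machine: the node names, priorities and health timeline are made up for illustration. With preemption the highest-priority healthy node always takes over; without it the current master keeps the role for as long as it stays healthy.

```python
def elect(priorities, health_timeline, preempt):
    """Return the master chosen at each step of health_timeline."""
    master = None
    history = []
    for healthy in health_timeline:
        candidates = [node for node in priorities if healthy.get(node, False)]
        best = max(candidates, key=priorities.get, default=None)
        # Without preemption, only replace the master when there is none
        # or when the current one has become unhealthy.
        if preempt or master is None or not healthy.get(master, False):
            master = best
        history.append(master)
    return history


def count_switches(history):
    """Count how many times the master role moved between steps."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)


# Hypothetical two-node cluster where node "a" keeps flapping.
PRIORITIES = {"a": 101, "b": 100}
TIMELINE = [
    {"a": True, "b": True},
    {"a": False, "b": True},
    {"a": True, "b": True},
    {"a": False, "b": True},
    {"a": True, "b": True},
]
```

In this model `elect(PRIORITIES, TIMELINE, preempt=True)` moves the role four times, while `preempt=False` moves it once and then leaves it on "b".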
![Page 32: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/32.jpg)
problems with keepalived
• switchover requires config reload
• does not try to stop or start any service
• IP flapping
• doesn’t really do a cluster-level consensus
![Page 33: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/33.jpg)
low-level cluster-oriented tool
SELECT pg_catalog.pg_last_xlog_receive_location();
 pg_last_xlog_receive_location
-------------------------------
 0/29004560
(1 row)
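The value returned above is a WAL position written as two hexadecimal numbers separated by a slash. A small sketch of turning it into an integer so that positions from different nodes can be compared; the helper names are illustrative. (From PostgreSQL 10 onwards the function is called `pg_last_wal_receive_lsn`.)

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual position such as '0/29004560' into a byte offset."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after it the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(ahead_lsn: str, behind_lsn: str) -> int:
    """Number of WAL bytes separating two positions."""
    return parse_lsn(ahead_lsn) - parse_lsn(behind_lsn)
```

For example, `lag_bytes("0/29004660", "0/29004560")` is 256 bytes.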
![Page 34: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/34.jpg)
property check
• treats the service as inherently singleton ❌
• high availability ✓
• fail-overs are automatic ~
• consumers should be able to refresh connections ✓
• zero / minimal application state ✓
![Page 35: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/35.jpg)
cluster-oriented
• repmgrd
• Heartbeat / Pacemaker
![Page 36: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/36.jpg)
repmgr
• maintains its own db and table with the node information and replication state
| id | type    | upstream_node_id | cluster | name  | conninfo  | slot_name | priority | active |
|----+---------+------------------+---------+-------+-----------+-----------+----------+--------|
| 1  | master  |                  | test    | node1 | localhost | rs1       | 100      | t      |
| 2  | standby | 1                | test    | node2 | localhost | rs2       | 99       | t      |
| 3  | standby | 1                | test    | node3 | localhost | rs3       | 98       | t      |
![Page 37: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/37.jpg)
repmgr
• master register
• standby register
• standby clone
• standby promote
• standby follow
• cluster show
$ repmgr -f /apps/repmgr.conf standby register
![Page 38: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/38.jpg)
repmgrd
![Page 39: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/39.jpg)
repmgrd
![Page 40: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/40.jpg)
repmgrd - follow master
![Page 41: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/41.jpg)
repmgrd - PgBouncer
![Page 42: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/42.jpg)
low-level node health
• among nodes that share the same priority, the healthiest one is picked
• https://github.com/nilenso/postgres-docker-cluster/blob/master/tests/sync-failover.sh
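One way to read the rule above, sketched under the assumption that "healthiest" means "has received the most WAL". The `Standby` type and its fields are hypothetical and this is not repmgr's actual selection code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standby:
    name: str
    priority: int
    received_lsn: int  # WAL position already converted to an integer


def pick_candidate(standbys):
    """Pick the highest-priority standby, breaking ties by received WAL position."""
    if not standbys:
        return None
    return max(standbys, key=lambda s: (s.priority, s.received_lsn))
```

With two standbys at priority 99, the one with the larger `received_lsn` wins, even if a lower-priority standby is further ahead.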
![Page 43: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/43.jpg)
scale
![Page 44: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/44.jpg)
points of failure
![Page 45: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/45.jpg)
scale (absurd growth)
![Page 46: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/46.jpg)
small anecdote
• we were running a system with strict SLAs (< 10 ms at the 99th percentile, 2000 RPS)
• we were not confident relying on the trigger mechanism alone: it was too prone to failures, and we needed high availability
• running an application connection pool for performance reasons
• so we added another line of defence
![Page 47: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/47.jpg)
more lines of defence
• custom master check
• loop through all nodes and see if we can do an INSERT
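A rough sketch of such a master check. The `try_insert` callable is a hypothetical hook that attempts a write against one node and raises on failure; treating anything other than exactly one writable node as "no known master" is an assumption of this sketch.

```python
def find_writable_nodes(nodes, try_insert):
    """Return the nodes on which a test INSERT succeeds."""
    writable = []
    for node in nodes:
        try:
            try_insert(node)
        except Exception:
            continue  # standbys are read-only, dead nodes fail to connect
        writable.append(node)
    return writable


def current_master(nodes, try_insert):
    """Return the single writable node, or None if there is no clear answer."""
    writable = find_writable_nodes(nodes, try_insert)
    if len(writable) == 1:
        return writable[0]
    # Zero writable nodes: no master. More than one: possible split-brain.
    return None
```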
![Page 48: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/48.jpg)
kinds of push mechanisms
• synchronous: make the promotion dependent on n apps returning a successful response (atomic to some degree)
• asynchronous: promote first, fire notifications to all your apps
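The two push styles can be sketched as follows. `promote` and `notify` are hypothetical callables standing in for the promotion command and the per-app notification call; the ordering inside the synchronous variant (collect acknowledgements, then promote) is one possible interpretation.

```python
def _notified(notify, app):
    """Return True if the app acknowledged the notification."""
    try:
        return bool(notify(app))
    except Exception:
        return False


def promote_sync(promote, notify, apps, required_acks):
    """Promote only if at least required_acks apps acknowledged the change."""
    acks = sum(_notified(notify, app) for app in apps)
    if acks < required_acks:
        return False
    promote()
    return True


def promote_async(promote, notify, apps):
    """Promote first, then notify; return the apps that missed the update."""
    promote()
    return [app for app in apps if not _notified(notify, app)]
```

The synchronous variant ties the promotion to the network being healthy, while the asynchronous one can leave some apps pointing at the old master.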
![Page 49: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/49.jpg)
![Page 50: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/50.jpg)
network is reliable
![Page 51: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/51.jpg)
separation of concerns
• the database promotion is dependent on a large set of network calls being successful
• your database promotion script is aware of all your apps and/or bouncers
• your apps probably have an API endpoint to receive such updates
![Page 52: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/52.jpg)
trigger mechanism
• fire and forget call at a critical decision point
• no retry logic built-in
• single point of failure (hopefully never make n*m calls)
![Page 53: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/53.jpg)
property check
• treats the service as inherently singleton ✓
• high availability ✓
• fail-overs are automatic ✓
• consumers should be able to refresh connections ~
• zero / minimal application state ~
![Page 54: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/54.jpg)
proposition
![Page 55: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/55.jpg)
increase the area of success
• writing to the database ✓
• failing over to a new database
• telling your clients about the master database
![Page 56: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/56.jpg)
repmgrd is good, could be better
• it’s battle-tested
• has worked reliably in the past
![Page 57: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/57.jpg)
poll and then push
• poll outside the promotion script, build better retries
• frees up the database promotion from any external services
• many nodes can publish the same information
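A sketch of the poll-then-push loop, kept outside the promotion script. `read_master` and `publish` are hypothetical callables (for example "ask repmgr who the master is" and "write it to the central datastore"); the exponential backoff is one possible retry policy, not something the talk prescribes.

```python
import time


def publish_with_retry(publish, value, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Try to publish value, backing off exponentially between failures."""
    for attempt in range(attempts):
        try:
            publish(value)
            return True
        except Exception:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return False


def poll_once(read_master, last_seen, publish, **retry_opts):
    """Publish the master if it changed; return the last value that was published."""
    current = read_master()
    if current is not None and current != last_seen:
        if publish_with_retry(publish, current, **retry_opts):
            return current
    return last_seen
```

Because the published value is just "who is the master", several nodes can run this loop independently and publish the same information.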
![Page 58: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/58.jpg)
polling allows heartbeats
• heartbeat / monitoring
• keep an eye on repmgrd
• zookeeper node monitoring
![Page 59: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/59.jpg)
push to a central datastore
• we don’t want to depend on the complex network graph and build retry mechanisms around apps
• we want to avoid application state and exposing API end-points
![Page 60: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/60.jpg)
central datastore
• battle-tested
• strongly consistent
• operations appear to execute atomically
• highly available
![Page 61: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/61.jpg)
zookeeper
• compare and swap / atomic operations
• ephemeral nodes
![Page 63: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/63.jpg)
Agrajag
![Page 64: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/64.jpg)
the orchestrator
![Page 65: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/65.jpg)
repmgr cluster show
$ repmgr -f /etc/repmgr.conf cluster show
  Role    | Name  | Upstream | Connection String
----------+-------|----------|----------------------------------------
* master  | node1 |          | host=db_node1 dbname=repmgr user=repmgr
  standby | node2 | node1    | host=db_node2 dbname=repmgr user=repmgr
  standby | node3 | node2    | host=db_node3 dbname=repmgr user=repmgr
![Page 66: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/66.jpg)
repmgr repl_events
repmgr=# SELECT * from repmgr_test.repl_events;
| node_id | event            | successful | event_timestamp | details |
|---------+------------------+------------+-----------------+---------|
| 1       | master_register  | t          | 2016-01-08      |         |
| 2       | standby_clone    | t          | 2016-01-08      |         |
| 2       | standby_register | t          | 2016-01-08      |         |
![Page 67: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/67.jpg)
zookeeper CAS
(defn compare-and-set
  [client path predicate? new-data & args]
  (try
    ;; read the current data together with its version
    (let [zk-data      (get-data path)
          deserialized (when-let [bytes (:data zk-data)]
                         (deserializer bytes))
          version      (-> zk-data :stat :version)]
      (when (predicate? deserialized)
        (log/debug "Publishing data to zk for" path)
        ;; the write is rejected if the version changed since the read
        (zk/set-data client path (serializer new-data) version)))
    (catch KeeperException$BadVersionException bve
      (log/warn "Trying to do stale write" bve))))
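The versioned write above can be illustrated without a ZooKeeper server. This is an in-memory sketch: `VersionedStore` is a hypothetical stand-in for the datastore, and treating a missing path as empty data at version 0 is a simplification.

```python
class BadVersion(Exception):
    """Raised when a write carries a version that is no longer current."""


class VersionedStore:
    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def get(self, path):
        # Simplification: a missing path reads as empty data at version 0.
        return self._nodes.get(path, (None, 0))

    def set(self, path, data, expected_version):
        _, version = self.get(path)
        if version != expected_version:
            raise BadVersion(path)
        self._nodes[path] = (data, version + 1)


def compare_and_set(store, path, predicate, new_data):
    """Write new_data only if predicate holds and nobody wrote in between."""
    data, version = store.get(path)
    if not predicate(data):
        return False
    try:
        store.set(path, new_data, version)
    except BadVersion:
        return False  # another writer got there first; treat as a stale write
    return True
```

Two orchestrators that read the same version and both try to write will see exactly one write succeed.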
![Page 68: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/68.jpg)
Agrajag
![Page 69: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/69.jpg)
the orchestrator
• if repmgrd is dead, do a manual master check
• second line of defence built into the orchestrator
![Page 70: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/70.jpg)
back-of-the-envelope concerns
• stateful reads from zookeeper
• dependency on orchestrator
• network partitions
• app knows about the upstream datastore
• directly update HAProxy
![Page 71: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/71.jpg)
reads from zookeeper
• the application / bouncer still needs to read from zookeeper
• is it really stateless?
![Page 72: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/72.jpg)
zookeeper watchers
• allow stateless reads from zookeeper
(defn get-data
  ([path]
   (get-data path false nil))
  ([path watch? watcher-fn]
   (-> (zk/data @client path :watch? watch? :watcher watcher-fn)
       :data
       deserializer)))
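A toy model of how a client might use a watcher so that it never has to poll or keep its own copy of the truth. It assumes one-shot watch semantics, where a watch fires once and has to be registered again on the next read. `WatchedValue` and `MasterFollower` are hypothetical names, not part of any client library.

```python
class WatchedValue:
    """In-memory stand-in for a znode with one-shot watches."""

    def __init__(self, value=None):
        self._value = value
        self._watchers = []

    def get(self, watcher=None):
        if watcher is not None:
            self._watchers.append(watcher)
        return self._value

    def set(self, value):
        self._value = value
        # Each watch fires once and is then discarded.
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify()


class MasterFollower:
    """Keeps `master` current by re-reading whenever the watch fires."""

    def __init__(self, znode):
        self._znode = znode
        self.master = None
        self._refresh()

    def _refresh(self):
        # Reading and re-registering the watch happen in the same call.
        self.master = self._znode.get(watcher=self._refresh)
```

An application or bouncer would reconnect to `follower.master` inside the refresh step instead of exposing an API endpoint for updates.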
![Page 73: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/73.jpg)
zookeeper ephemeral nodes
![Page 74: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/74.jpg)
zookeeper ephemeral nodes
• proceed only if master is present
• ephemeral nodes > 0 for the master znode
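The "ephemeral nodes > 0" rule can be sketched with an in-memory registry. Ephemeral entries disappear when the session that created them ends, so counting them tells us whether anything alive is vouching for the master. The registry class and the `/agrajag/master` path are hypothetical.

```python
class EphemeralRegistry:
    """In-memory stand-in for ephemeral children under a parent znode."""

    def __init__(self):
        self._owners = {}  # (parent, child) -> session id

    def create_ephemeral(self, parent, child, session_id):
        self._owners[(parent, child)] = session_id

    def expire_session(self, session_id):
        # Everything created by the session vanishes with it.
        self._owners = {
            key: owner for key, owner in self._owners.items() if owner != session_id
        }

    def children(self, parent):
        return sorted(child for (p, child) in self._owners if p == parent)


def master_is_present(registry, master_path="/agrajag/master"):
    """Proceed only if at least one live session is registered under the master."""
    return len(registry.children(master_path)) > 0
```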
![Page 75: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/75.jpg)
split-brain
• have a witness server set up with repmgr
• will implicitly fence the old master based on promotion times
![Page 76: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/76.jpg)
Agrajag
![Page 77: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/77.jpg)
split-brain
![Page 78: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/78.jpg)
![Page 79: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/79.jpg)
network split in zookeeper
![Page 80: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/80.jpg)
app still has information about an upstream dependency
• move it to conf.d
• write your own Agrajag client
![Page 81: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/81.jpg)
but now the watchers are prone to failures
• but they read from one central place we can trust
• multiple nodes get a chance to update that same place
• client libraries for zookeeper already exist, tried and tested
![Page 82: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/82.jpg)
“just directly update HAProxy”
• good idea
• does not have compare-and-swap
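Why compare-and-swap matters when several watchers may try to update the same place: a minimal sketch, using an in-memory stand-in rather than ZooKeeper or HAProxy. The `VersionedStore` class and the node names are hypothetical, made up for illustration only.

```python
import threading


class VersionedStore:
    """In-memory stand-in (hypothetical) for a store that supports compare-and-swap."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def get(self):
        # Return the current value together with the version it was written at.
        with self._lock:
            return self._value, self._version

    def compare_and_swap(self, expected_version, new_value):
        # Write only if nobody else has written since expected_version was read.
        with self._lock:
            if self._version != expected_version:
                return False
            self._value = new_value
            self._version += 1
            return True


store = VersionedStore("db-node-1")

# Two watchers read the same state before either of them writes.
_, seen_by_a = store.get()
_, seen_by_b = store.get()

# Both try to announce a different new master; only the first write lands.
a_won = store.compare_and_swap(seen_by_a, "db-node-2")
b_won = store.compare_and_swap(seen_by_b, "db-node-3")

print(a_won, b_won, store.get())
```

Without the version check, the second watcher would silently overwrite the first, and readers could see two different masters in quick succession.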
![Page 83: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/83.jpg)
speed
reconnect_attempts=6
reconnect_interval=10
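A rough estimate of what these two settings mean for speed, assuming every reconnect attempt is followed by a full interval and ignoring per-attempt connection timeouts. The helper name is hypothetical and the result is an approximation, not a measured failover time.

```python
def approx_detection_window_seconds(reconnect_attempts, reconnect_interval):
    """Approximate time spent retrying before the upstream node is declared failed."""
    return reconnect_attempts * reconnect_interval


# Values taken from the slide.
window = approx_detection_window_seconds(reconnect_attempts=6, reconnect_interval=10)
print(window)
```

With the values on the slide, this comes to about a minute of retrying before a failover can even begin.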
![Page 84: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/84.jpg)
Agrajag
![Page 85: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/85.jpg)
re-invent? re-write?
plug holes?
• slapping HA on top of HA
• where do you draw the line?
![Page 86: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/86.jpg)
network is reliable
![Page 87: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/87.jpg)
avoid consensus if possible
• consensus is done, plug holes
• only add new facts to the system
• avoid co-ordination
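One way to read "only add new facts": each node records immutable observations and never edits or deletes them, so two nodes can merge their views in any order without coordinating. The sketch below is an illustration under that assumption, not the design of Agrajag or repmgr; `Fact`, `merge`, and `latest_status` are hypothetical names.

```python
from typing import FrozenSet, NamedTuple, Optional


class Fact(NamedTuple):
    observer: str  # hypothetical name of the node reporting the observation
    subject: str   # the database node being observed
    status: str    # e.g. "up" or "down"
    seen_at: int   # logical timestamp of the observation


def merge(a: FrozenSet[Fact], b: FrozenSet[Fact]) -> FrozenSet[Fact]:
    # Set union is commutative and idempotent, so merge order does not matter.
    return a | b


def latest_status(facts: FrozenSet[Fact], subject: str) -> Optional[str]:
    # Derive the current view from the facts instead of mutating shared state.
    relevant = [f for f in facts if f.subject == subject]
    if not relevant:
        return None
    return max(relevant, key=lambda f: (f.seen_at, f.observer)).status


view_a = frozenset({Fact("watcher-a", "db-node-1", "up", 1)})
view_b = frozenset({Fact("watcher-b", "db-node-1", "down", 2)})

merged = merge(view_a, view_b)
print(latest_status(merged, "db-node-1"))
```

Because nothing is overwritten, a late or duplicated message can only add information; it cannot undo a newer observation.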
![Page 88: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/88.jpg)
the orchestrator
![Page 89: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/89.jpg)
recap
![Page 90: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/90.jpg)
future
• /failover API endpoint
• another line of defence
• push and then poll and then broadcast
![Page 91: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/91.jpg)
monitor all the things
![Page 92: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/92.jpg)
monitor all the things
![Page 93: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/93.jpg)
caveats and hedges
• repmgrd might not be the best at consensus
• Agrajag is very beta
• adding another layer of dependency
• not one-size-fits-all
![Page 94: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/94.jpg)
caveats and hedges - what about PgBouncer?
• the most appropriate use-case for this orchestration layer is when you have an application-level connection pool
• when PgBouncer is sitting on the application layer
![Page 95: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/95.jpg)
plz help
![Page 96: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/96.jpg)
takeaway
• apply these axioms / properties to your current system
• test thoroughly
• tell me I’m wrong 🙊
![Page 97: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/97.jpg)
open questions
• what about Patroni and Stolon?
• what about Heartbeat / Pacemaker?
• what about PgBouncer?
![Page 98: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/98.jpg)
![Page 99: a postgres orchestra - PGconf 2017, India | PostgreSQL ...2017.pgconf.in/wp-content/uploads/2016/05/A... · db cluster / compute cluster • a database cluster is a collection of](https://reader030.vdocuments.site/reader030/viewer/2022040620/5f33a2e4c673fb37db2e65f8/html5/thumbnails/99.jpg)
references
• https://github.com/aphyr/distsys-class
• https://aphyr.com/posts/291-call-me-maybe-zookeeper
• https://wiki.postgresql.org/images/1/1e/QuanHa_PGConf_US_2016_-_Doc.pdf
• http://www.formilux.org/archives/haproxy/1003/3259.html
• http://www.interdb.jp/pg/pgsql11.html
• https://github.com/staples-sparx/Agrajag/blob/dev/doc/tradeoffs.md