postgresql high availability in a containerized world

POSTGRESQL HIGH AVAILABILITY IN A CONTAINERIZED WORLD Jignesh Shah Chief Architect

Upload: jignesh-shah

Post on 13-Apr-2017


TRANSCRIPT

Page 1: PostgreSQL High Availability in a Containerized World

POSTGRESQL HIGH AVAILABILITY IN A CONTAINERIZED WORLD

Jignesh Shah

Chief Architect

Page 2: PostgreSQL High Availability in a Containerized World

About @jkshah

✓ appOrbit
  • Focus is on data management of applications running in containers
✓ VMware
  • Lead and manage Postgres and Data Management teams at VMware for various products embedding PostgreSQL running in virtualized embedded instances
✓ Sun Microsystems
  • Team member of first published SpecJAppServer 2004 benchmark with PostgreSQL
  • Performance of PostgreSQL on Solaris/Sun servers
✓ Working with PostgreSQL community since 2005
  • http://jkshah.blogspot.com/2005/04/profiling-postgresql-using-dtrace-on_22.html
✓ Working with container technologies (Solaris Zones) since 2004
  • http://jkshah.blogspot.com/2004/08/db2-working-under-solaris-10-zones_30.html

Page 3: PostgreSQL High Availability in a Containerized World

About

✓ OnDemand Dev/Test Environment Management
  • Multi-tiered, multi-serviced application deployment with initial data
  • Onprem & multi-cloud
  • Snap, Restore, Clone across all tiers
✓ Data Services for Applications
  • DBaaS for your applications
  • Oracle, PostgreSQL, MySQL
✓ Test Data Management
  • Live data templates, data pipeline workflows
  • Data subsets, data masking workflows

Page 4: PostgreSQL High Availability in a Containerized World

Enterprise Requirements

Page 5: PostgreSQL High Availability in a Containerized World

Typical Enterprise Application

[Diagram labels: two Web Servers, two App Servers, two Cache Servers, DB Server]

Page 6: PostgreSQL High Availability in a Containerized World

Typical Microservices Application

[Diagram labels: Web Server, Webapp Server, Gateway, App Server (MicroService 1, MicroService 2, ..., MicroService N), DB 2, DB 3]

Page 7: PostgreSQL High Availability in a Containerized World

Simplified Requirements of DB Service

✓ Easy, convenient database service
✓ High Availability
✓ High Durability

[Diagram labels: Data Center, DNS Name]

Page 8: PostgreSQL High Availability in a Containerized World

High Availability Requirements

✓ Requirements
  • Recovery Time Objective (RTO) - Availability
    • What does 99.99% availability really mean?
  • Recovery Point Objective (RPO) - Durability
    • Zero data loss?
    • HA vs. DR requirements

Availability %            Downtime / Year   Downtime / Month *   Downtime / Week
"Two Nines" - 99%         3.65 Days         7.2 Hours            1.68 Hours
"Three Nines" - 99.9%     8.76 Hours        43.2 Minutes         10.1 Minutes
"Four Nines" - 99.99%     52.56 Minutes     4.32 Minutes         1.01 Minutes
"Five Nines" - 99.999%    5.26 Minutes      25.9 Seconds         6.05 Seconds

* Using a 30 day month
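The downtime budgets in the table are plain arithmetic: the unavailable fraction (1 minus the availability) multiplied by the length of the period. A minimal sketch, assuming a 365-day year, a 30-day month, and a 7-day week; the function and constant names are illustrative only.

```python
# Allowed downtime for a given availability target.
# Period lengths are assumptions: 365-day year, 30-day month, 7-day week.
PERIOD_SECONDS = {
    "year": 365 * 24 * 3600,
    "month": 30 * 24 * 3600,
    "week": 7 * 24 * 3600,
}


def allowed_downtime_seconds(availability_percent: float, period: str) -> float:
    """Return the downtime budget, in seconds, for one period."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return PERIOD_SECONDS[period] * unavailable_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = {p: allowed_downtime_seconds(target, p) for p in PERIOD_SECONDS}
        print(f"{target}%: " + ", ".join(f"{p}={s:.1f}s" for p, s in budget.items()))
```

Seen this way, "four nines" leaves under an hour per year for all planned and unplanned outages combined, which is why failover time dominates the design discussion that follows.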

Page 9: PostgreSQL High Availability in a Containerized World

Causes of Downtime

✓ Blame it on Murphy's Law
  • Anything that can go wrong, will go wrong
✓ Planned Downtime
  • Software upgrade (OS patches, SQL Server cumulative updates)
  • Hardware/BIOS upgrade
✓ Unplanned Downtime
  • Datacenter failure (natural disasters, fire)
  • Server failure (failed CPU, bad network card)
  • I/O subsystem failure (disk failure, controller failure)
  • Software/Data corruption (application bugs, OS binary corruptions)
  • User Error (shut down a SQL service, dropped a table)

Page 10: PostgreSQL High Availability in a Containerized World

Distilled HA Requirements

✓ Design to handle failures
✓ Easy to deploy
✓ Convenient to access
✓ Self healing/handling of failures
✓ Graceful handling

Page 11: PostgreSQL High Availability in a Containerized World

Design

Page 12: PostgreSQL High Availability in a Containerized World

Shared Storage based HA

✓ Leverage hardware snapshots
✓ Automated failover - OS clustering
✓ DR (across region) using block level replication of snapshots
✓ Newer designs using distributed shared storage

DNS Name

Page 13: PostgreSQL High Availability in a Containerized World

Disadvantages of Shared Storage HA

✓ Expensive
✓ Mostly DR
✓ No offloading
  • Though possible using storage clones
✓ No load balancing
✓ No fast failover
  • PostgreSQL server start required

DNS Name

Page 14: PostgreSQL High Availability in a Containerized World

PostgreSQL Replication

✓ Single master, multi-slave
  • Cascading slave also possible
  • Mechanism based on WAL (Write-Ahead Logs)
✓ Multiple modes and multiple recovery ways
  • Warm standby
  • Asynchronous hot standby
  • Synchronous hot standby
✓ Fast failover (pg_ctl promote)
✓ Load balancing for read only operations
  • Good for read scale
✓ Node failover, reconnection possible

Page 15: PostgreSQL High Availability in a Containerized World

Synchronous PostgreSQL Replication

✓ No SPOF shared storage
✓ Faster failover
✓ High durability
✓ Best within single datacenter only

DNS Name

Page 16: PostgreSQL High Availability in a Containerized World

Asynchronous PostgreSQL Replication

✓ Better performance
✓ Across datacenters
✓ Cascading replication
✓ Tradeoff: Lose transactions

DNS Name

Page 17: PostgreSQL High Availability in a Containerized World

PostgreSQL Replication - Issues

✓ No automated failover
✓ Complex setup
  • Reusing failed master as slave (pg_rewind)
  • Provisioning new slave with full copy
✓ Manual DNS updates for failover
✓ Application reconnection required after a failover
✓ Separation logic required for Read-Write and Read-Only connections in application

Photo Credit: dundanim/ Shutterstock.com

Page 18: PostgreSQL High Availability in a Containerized World

Easy to deploy Using Linux Containers

Page 19: PostgreSQL High Availability in a Containerized World

What are Containers?

✓ OS level virtualization where the kernel allows for multiple isolated user-space instances

[Diagram: four stacks compared: an operating system on a bare metal server; an OS on a hypervisor on a bare metal server; containers (C) on an operating system on a bare metal server; containers (C) inside an OS on a hypervisor on a bare metal server]

Page 20: PostgreSQL High Availability in a Containerized World

Advantages of Containers

✓ Lower footprint
✓ Very quick startup and shutdown
✓ Density
✓ Nesting

Page 21: PostgreSQL High Availability in a Containerized World

Disadvantages of Containers

✓ Same kernel version
✓ Cannot run other OS natively
✓ Security (to be improved)
✓ Not a complete solution for enterprise needs

Page 22: PostgreSQL High Availability in a Containerized World

Docker

✓ Quick guide to using a Docker based container

# docker run --name mycontainer -e POSTGRES_PASSWORD=mysecretpassword -d postgres

# docker exec -ti mycontainer psql -U postgres

# docker stop mycontainer

# docker rm mycontainer

# docker rmi postgres

Page 23: PostgreSQL High Availability in a Containerized World

Container Volumes – Must for DB

✓ Persists beyond the life of a Docker container
  • VOLUME command in Dockerfile, or
  • Using -v with the docker run command
  • Automatically created if not already present during docker run
  • Not part of docker push/pull operations
  • Can select a non-local directory using --volume-driver
  • Third party components required to get multi-host support (NFS, etc.)
✓ Different options using -v
  • -v /hostsrc/data:/opt/data:ro # for read only volumes (default rw)
  • -v /hostsrc/data:/opt/data:Z # Z – private volume, z – shared volume
  • -v /etc/nginx.conf:/etc/nginx.conf # for mounting a single file only

Page 24: PostgreSQL High Availability in a Containerized World

PostgreSQL Container as a DB server

✓ Maybe you want a standalone database server
  • Not all database clients will be on the same host
  • Need to limit memory usage
  • Need a different layout of how files are distributed
✓ Use the -p option to make the port available even to non-container clients
✓ Use -m to limit memory usage by the DB server (by default it can see and use all)
  • Note this does not set shared buffers automatically with the library image

docker run --name mycontainer -m 4g -e POSTGRES_PASSWORD=mysecretpassword \
    -v /hostpath/pgdata:/var/lib/postgresql/data -p 5432:5432 -d postgres
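Because the -m limit does not change shared_buffers, the value has to be derived and set separately. A minimal sketch of one way to do that, assuming the common starting point of 25% of available memory; that fraction is a rule of thumb and not something the slides specify, and both function names are hypothetical.

```python
def parse_docker_memory(limit: str) -> int:
    """Convert a docker -m value such as '4g' or '512m' to bytes."""
    units = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
    text = limit.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)  # a plain number is taken as bytes


def suggested_shared_buffers(limit: str, fraction: float = 0.25) -> str:
    """Return a shared_buffers value in MB for the given container limit.

    The 25% default is an assumed rule of thumb; tune it for the workload.
    """
    megabytes = int(parse_docker_memory(limit) * fraction) // (1024**2)
    return f"{megabytes}MB"


if __name__ == "__main__":
    print(suggested_shared_buffers("4g"))
```

With the library image, the result can typically be passed as a server option after the image name, for example `postgres -c shared_buffers=1024MB`, or baked into a customized postgresql.conf.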

Page 25: PostgreSQL High Availability in a Containerized World

Best Practices for PostgreSQL image

✓ For production installs, customize the Docker image
  • Allocate proper memory limits - example 8GB
    • All pagecache usage shows up as docker container memory usage
  • Bump up shared buffers and other parameters as required
    • Hint: use PostgreSQL 9.3 or later, otherwise you have to run privileged containers
    • http://jkshah.blogspot.com/2015/09/is-it-privilege-to-run-container-in.html
  • Support multiple volumes in your image
    • PITR archives
    • Full backup directory
  • PostgreSQL extensions
  • Set up replication support
    • Out of box replication setup
  • Monitoring tool
    • Your favorite monitoring agent

Page 26: PostgreSQL High Availability in a Containerized World

Some Trends in Container World

✓ Binaries and data often separated
  • One lives in the container image and the other in volumes
✓ pg_xlog no longer deployed on separate volumes
  • Underlying storage technologies lead to inconsistent point in time restores, leaving the DB unusable
✓ No new tablespaces
  • Hard to get easy replication setups done on the fly
  • Could lead to lost data if new tablespaces are not on volumes
✓ Replication set up with automation rather than manually by admins

Page 27: PostgreSQL High Availability in a Containerized World

Some Trends in Container World

✓ Adoption of microservices
  • Leading to lots of smaller databases, one for each microservice
✓ Faster updates
  • Schema changes sometimes need to be backward compatible
✓ Repeatable deployments
  • Need to redeploy at a moment's notice

Page 28: PostgreSQL High Availability in a Containerized World

Kubernetes

✓ Production grade container orchestrator
✓ Horizontal scaling
  • Set up rules to scale read replicas
✓ ConfigMap
  • postgresql.conf
  • pg_hba.conf
✓ Secrets
  • Usernames, passwords
  • Certificates
✓ Persistent Storage features evolving
  • Plugins for storage drivers

Page 29: PostgreSQL High Availability in a Containerized World

Convenient to access

Page 30: PostgreSQL High Availability in a Containerized World

Kubernetes Services

✓ External Services (Virtual or Elastic IP)
  • Services are accessible from all nodes
  • Shared storage plugins make your stateful containers HA as well
  • Powerful combination along with PostgreSQL replication
    • Can spin up slaves fast for multi-TB databases
  • Load balanced over multiple pods
✓ For PostgreSQL, typically two services
  • master - one node only
  • readreplicas - all read replicas
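With two services, the application (or its connection layer) still has to decide which one a given connection should use. A minimal sketch of that separation logic, assuming the service names `master` and `readreplicas` from the slide, the conventional `<service>.<namespace>.svc.cluster.local` cluster DNS form, and a hypothetical `endpoint_for` helper.

```python
def endpoint_for(read_only: bool, replicas_available: bool = True,
                 namespace: str = "default", port: int = 5432) -> str:
    """Pick the Kubernetes service a connection should go to.

    Read-only work goes to the read replicas when there are any;
    everything else, including reads when no replica exists, goes to master.
    """
    service = "readreplicas" if read_only and replicas_available else "master"
    return f"{service}.{namespace}.svc.cluster.local:{port}"


if __name__ == "__main__":
    print(endpoint_for(read_only=False))
    print(endpoint_for(read_only=True))
```

The caller states intent explicitly instead of the helper guessing from the SQL text, since a SELECT can call functions that write.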

Page 31: PostgreSQL High Availability in a Containerized World

Consul

  • Service Discovery
  • Failure Detection
  • Multi Data Center
  • DNS Query Interface

{
  "service": {
    "name": "mypostgresql",
    "tags": ["master"],
    "address": "127.0.0.1",
    "port": 5432,
    "enableTagOverride": false
  }
}

nslookup master.mypostgresql.service.domain

nslookup mypostgresql.service.domain
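The service definition and the two lookups are related mechanically: the tag and service name become labels in the DNS name. A minimal sketch that generates both, assuming Consul's `[tag.]<service>.service.<domain>` naming with the default `consul` domain in place of the slide's `domain` placeholder; the helper names are illustrative.

```python
import json


def service_definition(name: str, tag: str, address: str, port: int) -> dict:
    """Build a service definition shaped like the one on the slide."""
    return {
        "service": {
            "name": name,
            "tags": [tag],
            "address": address,
            "port": port,
            "enableTagOverride": False,
        }
    }


def dns_names(name: str, tag: str, domain: str = "consul") -> list:
    """Return the tag-specific and the general DNS names for a service."""
    return [f"{tag}.{name}.service.{domain}", f"{name}.service.{domain}"]


if __name__ == "__main__":
    print(json.dumps(service_definition("mypostgresql", "master", "127.0.0.1", 5432), indent=2))
    for dns_name in dns_names("mypostgresql", "master"):
        print(dns_name)
```

Generating the file with a JSON serializer also avoids hand-editing mistakes such as trailing commas or typographic quotes.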

Page 32: PostgreSQL High Availability in a Containerized World

PostgreSQL Enhancement (Ports)

✓ SRV Record of NameServer
  • https://en.wikipedia.org/wiki/SRV_record
  • IP:Port
✓ PostgreSQL libpq client enhancement
  • Support service discovery using SRV records
  • servicename is passed
  • libpq looks up the SRV record from the nameserver
  • Connects to the port provided by the SRV record
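To make the proposed lookup concrete, here is a minimal client-side sketch of what choosing an endpoint from SRV records involves. It is an illustration only, not libpq code: the records are given as text in the standard `priority weight port target` form, the host names are made up, and the tie-break is a simplification of the weighted random selection that RFC 2782 describes.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(rdata: str) -> SrvRecord:
    """Parse SRV record data of the form 'priority weight port target'."""
    priority, weight, port, target = rdata.split()
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


def choose_endpoint(records):
    """Pick the record with the lowest priority value; ties go to the highest weight.

    RFC 2782 asks for weighted random selection within one priority;
    the deterministic tie-break here is a simplification.
    """
    best = min(records, key=lambda r: (r.priority, -r.weight))
    return best.target, best.port


if __name__ == "__main__":
    answers = ["10 5 5433 pg2.example.internal.", "1 10 5432 pg1.example.internal."]
    print(choose_endpoint([parse_srv(a) for a in answers]))
```

The point of the enhancement is that the port travels with the DNS answer, so containers mapped to non-default ports need no client-side configuration.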

Page 33: PostgreSQL High Availability in a Containerized World

Self healing

Page 34: PostgreSQL High Availability in a Containerized World

Modern HA Projects

✓ Patroni / Governor
  • https://github.com/zalando/patroni (Python)
  • Docker container
  • Etcd
  • HAProxy
✓ Stolon
  • https://github.com/sorintlab/stolon (Golang)
  • Docker
  • Etcd / Consul
  • Custom proxy

Page 35: PostgreSQL High Availability in a Containerized World

Governor

https://github.com/compose/governor/blob/master/postgres-ha.pdf

Page 36: PostgreSQL High Availability in a Containerized World

Stolon

[Architecture diagram]

https://github.com/sorintlab/stolon/blob/master/doc/architecture_small.png

Page 37: PostgreSQL High Availability in a Containerized World

Basic Architecture

✓ Distributed state management store
  • Etcd, Consul, Zookeeper
✓ Controller – Peer or Central
  • Detect failure
  • Shut down failed master
  • Elect / self-elect new master
  • Promote new master
  • Repurpose failed master as slave (pg_rewind)
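The controller steps above revolve around one idea: a leader key with a time-to-live in the state store. A minimal sketch of that loop, with an in-memory class standing in for Etcd, Consul, or Zookeeper. This is an assumption-laden illustration, not how Patroni, Governor, or Stolon are implemented; a real store makes the acquire step an atomic compare-and-set, and all names here are hypothetical.

```python
class LeaderLock:
    """In-memory stand-in for a leader key with a TTL in a state store."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def acquire_or_renew(self, node: str, now: float) -> bool:
        """Take the lock if it is free or expired, or renew it if we hold it."""
        if self.holder is None or now >= self.expires_at or self.holder == node:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


def controller_tick(node: str, is_master: bool, lock: LeaderLock, now: float) -> str:
    """One pass of a peer controller; returns the action this node should take."""
    if lock.acquire_or_renew(node, now):
        return "stay_master" if is_master else "promote"
    return "demote_and_rewind" if is_master else "follow"


if __name__ == "__main__":
    lock = LeaderLock(ttl_seconds=10)
    print(controller_tick("pg1", True, lock, now=0))    # pg1 holds the lock
    print(controller_tick("pg2", False, lock, now=1))   # pg2 follows
    print(controller_tick("pg2", False, lock, now=11))  # pg1 stopped renewing
    print(controller_tick("pg1", True, lock, now=12))   # old master returns
```

Failure detection falls out of the TTL: a master that stops renewing loses the key, and a returning old master finds the key held by someone else and has to be repurposed as a slave.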

Page 38: PostgreSQL High Availability in a Containerized World

Limitations

✓ Selection of wrong replicas (lag)
✓ Multi-data center topology
✓ Edge cases
✓ Not completely human free
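The first limitation, promoting a replica that lags too far behind, can be reduced by comparing WAL positions before electing. A minimal sketch, assuming each replica reports its replayed position as a PostgreSQL LSN string such as `0/3000060` (two hexadecimal halves of a 64-bit byte position); the function names and the lag threshold are hypothetical.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def pick_promotion_candidate(master_lsn: str, replica_lsns: dict, max_lag_bytes: int):
    """Return the replica with the least lag, or None if none is close enough."""
    master_pos = lsn_to_bytes(master_lsn)
    lag = {name: master_pos - lsn_to_bytes(lsn) for name, lsn in replica_lsns.items()}
    if not lag:
        return None
    best = min(lag, key=lag.get)
    return best if lag[best] <= max_lag_bytes else None


if __name__ == "__main__":
    replicas = {"pg2": "0/3000000", "pg3": "0/2000000"}
    print(pick_promotion_candidate("0/3000060", replicas, max_lag_bytes=1024))
```

Returning None instead of the least-bad replica is a deliberate choice in this sketch: it hands the decision to a human, which matches the "not completely human free" point above.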

Page 39: PostgreSQL High Availability in a Containerized World

Deployment of PostgreSQL “Cluster”

✓ Can be made self healing
✓ Integrate with pg_rewind to reuse master as slave
✓ Integrate with shared storage to leverage snapshots to create new slaves

[Diagram labels: Virtual IPs, Applications, Instance 1, Instance 2, Instance 3, Shared Storage]

Page 40: PostgreSQL High Availability in a Containerized World

Deploying Multiple Database Clusters

Applications

Page 41: PostgreSQL High Availability in a Containerized World

Production Grade Orchestrator

✓ Can even add rules to spin up new slaves as read load grows

Operations

Applications

Page 42: PostgreSQL High Availability in a Containerized World

Across Geography / Data centers

✓ Uniform DNS name for your database
✓ Cloud-agnostic naming
✓ Certificates created using DNS names you own
✓ No single point of failure
✓ Geography based deployments

Operations

Applications

Page 43: PostgreSQL High Availability in a Containerized World

Graceful

Page 44: PostgreSQL High Availability in a Containerized World

Handling Connections

✓ “Fatal: Database is coming up” - breaks port health checker
✓ Application does not reopen connections – retries not baked in
✓ Session variables lost during switch – rerun environment
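Since retries are usually not baked into applications, a thin wrapper around the connect call covers the window in which a newly promoted server is still coming up. A minimal sketch with exponential backoff, assuming the caller's `connect` function raises `ConnectionError` on failure (real drivers raise their own exception types, which would need to be mapped); the function name and defaults are illustrative.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_connect():
        # Simulates a server that is still starting for the first two tries.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("database is coming up")
        return "connected"

    print(connect_with_retry(flaky_connect, base_delay=0.01))
```

After a successful reconnect the application still has to replay any session-level settings, because those are lost with the old connection.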

Page 45: PostgreSQL High Availability in a Containerized World

Summary

Page 46: PostgreSQL High Availability in a Containerized World

Painpoints for Master-Slave PostgreSQL

✓ Architected to handle failures
✓ Easy to deploy - Single Node
✓ Easy to deploy - Setting up Replicated Node
✓ Convenient to access – Get to current master
✓ Convenient to access – Load Balance readonly Nodes
✓ Convenient to access – DNS single endpoints
✓ Self Managing – Automated Failover
✓ Self Managing – Automated provisioning/reprovisioning
✓ Graceful - Connection Pooling – handling disconnects

Page 47: PostgreSQL High Availability in a Containerized World

Containerized PostgreSQL HA Architecture

DNS - Consul

Load Balancers – Kubernetes Proxy

Connection pooling: HA Proxy/pgbouncer

Compute Scheduler – Kubernetes

Self/Automated Failover – Patroni/Stolon/Governor

Config /State Management – ETCD, Consul, Zookeeper

Database - PostgreSQL

Instance Format – Docker Containers

Storage – Distributed Storage/Shared Nothing Storage

Page 48: PostgreSQL High Availability in a Containerized World

Your Feedback is Important!

✓ We’d like to understand your use of Postgres for HA / DR.
✓ If interested,
  • Twitter: @jkshah
  • Email: [email protected]

Page 49: PostgreSQL High Availability in a Containerized World

Thanks. Questions?

Follow me on twitter: @jkshah

Blog: http://jkshah.blogspot.com

Full copies of your applications at the push of a button

We are HIRING !!!