Architecting Your PostgreSQL Application for the Cloud · 2009-10-21
Jim Mlodgenski, Chief Architect, EnterpriseDB
Agenda
• What is the Cloud?
• Amazon EC2
• PostgreSQL Applications in the Cloud
• Maintenance
• Security
• High Availability
• Scaling the Data Tier
What is the Cloud?
• Dynamically scalable cluster of resources
  – In reality, it is hosted virtualized instances
• Frequently referred to as Utility or Elastic Computing
• There is no standard across providers
• It will be a key part of IT in the future
Who are the players?
Amazon EC2
• Elastic Compute Cloud (EC2)
  – Uses Xen virtualization to run images
• Amazon Machine Images (AMI)
  – Hundreds of pre-configured images available to the public
  – Tools available to bundle customized images
Amazon EC2
• Instances
  – A range of virtual environments
    • Small: 1 core, 1.7 GB RAM, 160 GB storage
    • Xlarge: 4 cores, 15 GB RAM, 1.6 TB storage
    • Xlarge High CPU: 8 cores, 7 GB RAM, 1.6 TB storage
Amazon EC2 - Storage
• Instance Storage
  – Tied to the active instance
  – When the instance shuts down, everything on it is gone
• Elastic Block Store (EBS)
  – Persistent storage
  – Acts like a raw block device
  – Tied to a single availability zone
  – Software RAID across multiple volumes is possible
• Simple Storage Service (S3)
  – Reliable and redundant
  – Not a block device
  – Accessible only through a web service interface
Amazon EC2
• Storage Performance
  ● EBS storage is about the same speed as the internal instance storage
  ● The IO speed of the Xlarge instances is extremely good
[Chart: Bonnie++ Results, sequential writes and sequential reads (K/sec) for MacBook, Small, Small EBS, Xlarge, and Xlarge High CPU]
Amazon EC2
• Memory Performance
  ● Memory bandwidth, even on the largest instances, is low compared to physical machines
  ● This slows down queries that are answered entirely from cache
[Chart: Bandwidth Memory Results, read and write bandwidth (MB/sec) for MacBook, Small, Xlarge, and Xlarge High CPU]
PostgreSQL Applications in the Cloud
• Elastic Databases
  ● They do not exist for transactional systems
  ● It is possible to set up an AMI with PostgreSQL
  ● But the “data” directory is constantly changing
  ● The database must be scaled via traditional methods
    ● Horizontally
    ● Vertically
PostgreSQL Applications in the Cloud (cont.)
• Prepare for an Elastic Application Tier
  – The Application Tier is easily scaled
  – Connection Pooling
    • A connection pooling layer can be made elastic
      – PgBouncer
      – pgpool-II
  – Minimize Server-Side Code
    • Put business logic in the Application Tier
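PgBouncer and pgpool-II run as separate proxy processes, but the idea they implement fits in a few lines. The following is a minimal in-application sketch of a fixed-size pool, not how either tool works internally; `ConnectionPool` and its `factory` argument are hypothetical names, and in practice the factory might be something like `lambda: psycopg2.connect(dsn)`.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

C = TypeVar("C")


class ConnectionPool(Generic[C]):
    """Hypothetical fixed-size pool: connections are opened lazily, up to
    max_size, and returned to the pool for reuse instead of being closed."""

    def __init__(self, factory: Callable[[], C], max_size: int) -> None:
        self._factory = factory          # opens one new database connection
        self._max_size = max_size
        self._idle = queue.LifoQueue()   # most recently used connection first
        self._created = 0
        self._lock = threading.Lock()

    @property
    def created(self) -> int:
        return self._created

    def _acquire(self, timeout: float) -> C:
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max_size:
                conn = self._factory()
                self._created += 1
                return conn
        # Pool is exhausted: block until another caller releases one.
        # Raises queue.Empty if nothing is released within `timeout`.
        return self._idle.get(timeout=timeout)

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[C]:
        conn = self._acquire(timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back rather than closing it
```

With an in-process pool like this, the database sees at most `max_size` connections per application server; a shared pooler such as PgBouncer caps the total instead, which matters when the number of application servers grows and shrinks.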
Maintenance
• Backups
  – Online backups with PITR are always best
  – Nothing wrong with pg_dump
• Storing the backups
  – S3
    • Extremely reliable
    • Difficult to automate with scripts
      – Web service interface only
  – EBS Snapshots
    • Backed up to S3
    • Command line tools available
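A minimal sketch of a scripted pg_dump backup, assuming the dump is written to a directory on an EBS volume (the `/mnt/ebs/backups` path below is hypothetical) that a separate step then snapshots or copies to S3. `backup_command` and `run_backup` are illustrative names; the `pg_dump` options used, `-Fc` (custom-format archive) and `-f` (output file), are standard.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def backup_command(dbname: str, out_dir: str, now: datetime) -> list[str]:
    """Build a pg_dump invocation that writes a custom-format archive
    named after the database and a UTC timestamp."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(out_dir) / f"{dbname}-{stamp}.dump"
    # -Fc: custom format, restorable selectively with pg_restore
    # -f:  write to this file instead of stdout
    return ["pg_dump", "-Fc", "-f", str(target), dbname]


def run_backup(dbname: str, out_dir: str) -> None:
    cmd = backup_command(dbname, out_dir, datetime.now(timezone.utc))
    # check=True raises CalledProcessError if pg_dump exits nonzero.
    subprocess.run(cmd, check=True)
```

Passing `now` in explicitly keeps the file naming deterministic and testable, and the custom format lets `pg_restore` restore individual tables later.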
Maintenance (cont.)
• Performance Tuning
  – Understand the ratio of memory speed to IO speed
  – Remember to adjust postgresql.conf when scaling vertically
[Chart: Ratio of Memory vs. IO Speed, Memory/IO (Writes) and Memory/IO (Reads) for MacBook, Small, Small EBS, Xlarge, and Xlarge High CPU]
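When moving to a larger instance, the memory settings in postgresql.conf do not grow on their own. The sketch below applies common starting-point rules of thumb, about 25% of RAM for `shared_buffers` and about 75% for `effective_cache_size`; these ratios are assumptions to be tuned against the measured memory-vs-IO ratio, not values from this talk, and `memory_settings` is a hypothetical helper.

```python
def memory_settings(ram_mb: int) -> dict[str, str]:
    """Starting-point postgresql.conf memory settings for a dedicated server.

    Rule-of-thumb assumptions: shared_buffers ~ 1/4 of RAM,
    effective_cache_size ~ 3/4 of RAM (a planner hint, not an allocation).
    """
    return {
        "shared_buffers": f"{ram_mb // 4}MB",
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",
    }


def render_conf(settings: dict[str, str]) -> str:
    """Format settings as postgresql.conf lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


# Xlarge instance listed earlier: 15 GB RAM = 15360 MB.
print(render_conf(memory_settings(15360)))
# shared_buffers = 3840MB
# effective_cache_size = 11520MB
```

Changing `shared_buffers` takes effect only after a server restart, so it is worth applying new values as part of the resize itself.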
Security
• The Cloud is on the public internet
  – Use good security practices
    • Firewall the server
  – Lock down PostgreSQL
    • Change the default port
    • Use pg_hba.conf to restrict which hosts can connect
    • Require authentication
    • Use SSL for client communication
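On the client side, the same lockdown shows up in the connection parameters. The sketch below builds a libpq connection string for a server moved off the default port and requiring SSL with full certificate verification (`sslmode=verify-full`, available in libpq 8.4 and later). The host, port, and certificate path are placeholders; on the server this would be paired with `ssl = on` in postgresql.conf and `hostssl` entries in pg_hba.conf. `build_dsn` and `quote_value` are hypothetical helpers.

```python
def quote_value(value: str) -> str:
    """Quote a libpq conninfo value: wrap it in single quotes if it is empty
    or contains spaces, quotes, or backslashes, escaping the latter two."""
    if value and not any(c in value for c in " '\\"):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_dsn(host: str, port: int, dbname: str, user: str, sslrootcert: str) -> str:
    params = {
        "host": host,
        "port": str(port),           # non-default port (5432 is the default)
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",    # require SSL, verify cert and host name
        "sslrootcert": sslrootcert,  # CA certificate used for verification
    }
    return " ".join(f"{k}={quote_value(v)}" for k, v in params.items())
```

The password is deliberately left out of the string; libpq can read it from a `~/.pgpass` file instead.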
Security (cont.)
• Storage is semi-public
  – Maintain good security practices
    • Encrypt the data directory
  – S3 for backups
    • Use SSL when communicating with S3
High Availability
• Replication
  – Use the traditional replication methods
    • Slony
    • Log Shipping
    • Bucardo
    • pgpool-II
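Log shipping is driven by the server's `archive_command` (with `archive_mode = on` in 8.3 and later): for each completed WAL segment, PostgreSQL runs the command with `%p` replaced by the segment's path and `%f` by its file name, and treats a zero exit status as success. A minimal Python version of such a command is sketched below; the script name and the `/mnt/ebs/wal_archive` directory are hypothetical, and the standby would read segments from that archive through its own `restore_command`.

```python
import os
import shutil
import sys

# Hypothetical archive location, e.g. an EBS volume the standby can read.
ARCHIVE_DIR = "/mnt/ebs/wal_archive"


def archive_wal(src_path: str, file_name: str, archive_dir: str) -> int:
    """Copy one WAL segment into the archive; return a shell exit status."""
    dest = os.path.join(archive_dir, file_name)
    if os.path.exists(dest):
        # Never overwrite an archived segment; a nonzero status makes the
        # server keep the segment and retry later.
        return 1
    tmp = dest + ".tmp"
    try:
        shutil.copyfile(src_path, tmp)
        with open(tmp, "r+b") as f:
            os.fsync(f.fileno())  # make the copy durable before publishing
        os.rename(tmp, dest)      # atomic rename within one filesystem
    except OSError:
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # postgresql.conf: archive_command = 'python3 /path/to/archive_wal.py %p %f'
    sys.exit(archive_wal(sys.argv[1], sys.argv[2], ARCHIVE_DIR))
```

Refusing to overwrite an existing file follows the PostgreSQL documentation's advice for archive commands: a name collision usually means two servers are archiving to the same place.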
High Availability (cont.)
• Clustering
  – Traditional methods are not as simple to use
    • No shared disk
  – Active/Passive Clustering
    • iSCSI for the shared disk
Scaling the Data Tier
• Replication
• Massively Parallel Processing
  – GridSQL
  – pgpool-II
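GridSQL and pgpool-II's parallel query mode split one query across nodes and merge the partial results. The sketch below shows that scatter-gather merge for an average in generic form; it is not either product's implementation, and each node is represented by a hypothetical callable that would run something like `SELECT sum(x), count(x)` on its slice of the data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple

Partial = Tuple[float, int]  # (sum, count) from one node


def parallel_avg(nodes: Sequence[Callable[[], Partial]]) -> float:
    """Scatter the query to every node concurrently, then gather.

    Each node returns SUM and COUNT rather than its own average, because
    nodes hold different numbers of rows and averages of averages are wrong.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(lambda run: run(), nodes))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows on any node")
    return total / count
```

With two nodes returning (10.0, 2) and (30.0, 3), the result is 40 / 5 = 8.0, whereas averaging the per-node averages (5.0 and 10.0) would give 7.5.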
Scaling the Data Tier (cont.)
• Federation and Sharding
  – Horizontally partition the data across many smaller databases
    • PL/Proxy
    • Hibernate Shards
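Both PL/Proxy and Hibernate Shards need a rule that maps each row's key to one partition. PL/Proxy, for example, evaluates a hash expression such as `hashtext(key)` in its `RUN ON` clause and expects the partition count to be a power of two. The sketch below mirrors that shape with `zlib.crc32` standing in for `hashtext`, so it will not pick the same partitions PL/Proxy would; `shard_for` and the shard names are hypothetical.

```python
import zlib
from typing import Sequence


def shard_for(key: str, shards: Sequence[str]) -> str:
    """Pick the shard that owns `key` using a stable hash.

    The shard count must be a power of two so the low bits of the hash
    select the shard directly.
    """
    n = len(shards)
    if n == 0 or n & (n - 1):
        raise ValueError("shard count must be a power of two")
    # crc32 gives the same value in every process, unlike Python's hash().
    h = zlib.crc32(key.encode("utf-8"))
    return shards[h & (n - 1)]
```

Because the shard depends on the shard count, growing from four to eight shards moves roughly half the keys, so it is common to start with more partitions than servers and place several partitions on each database.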
Scaling the Data Tier (cont.)
• pgpool-II and Slony together can be a very scalable architecture
  – But make sure data is read consistently across nodes: Slony replicates asynchronously, so a replica can lag behind the master
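When reads are load-balanced across Slony replicas, a replica can lag behind the master. One application-side mitigation, sketched below under the assumption that lag stays under a known bound (Slony does not guarantee one), is to send a session's reads to the master for a short window after that session writes. `Router` and `max_lag_s` are hypothetical; this is not pgpool-II's own routing logic.

```python
import time
from typing import Callable, Sequence


class Router:
    """Send writes to the master; send reads to replicas round-robin,
    except shortly after the same session wrote, when reads also go to
    the master so the session sees its own changes."""

    def __init__(self, master: str, replicas: Sequence[str], max_lag_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.master = master
        self.replicas = list(replicas)
        self.max_lag_s = max_lag_s  # assumed upper bound on replication lag
        self.clock = clock
        self._last_write: dict[str, float] = {}
        self._rr = 0

    def route(self, session: str, is_write: bool) -> str:
        now = self.clock()
        if is_write:
            self._last_write[session] = now
            return self.master
        last = self._last_write.get(session)
        if last is not None and now - last < self.max_lag_s:
            return self.master  # replica may not have this session's write yet
        if not self.replicas:
            return self.master
        node = self.replicas[self._rr % len(self.replicas)]
        self._rr += 1
        return node
```

Injecting `clock` keeps the routing testable; if observed lag ever exceeds `max_lag_s`, a session can still read stale data, so the window is a trade-off rather than a guarantee.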
Summary
• PostgreSQL can excel in the Cloud
  – But it is not always nirvana
• The utility model is interesting
  – But watch your costs closely!
• Manage PostgreSQL much as you would in a traditional environment
Thank you. Questions?