managing terabytes

Post on 05-Dec-2014

Some Conference

PgConf.EU 2011 1

Managing Terabytes: Problems and solutions with operating large Postgres installations

Selena Deckelmann, Prime Radiant, @selenamarie

About me.

The Environment

• 1.6 TB, 1 cluster, Version 8.2

• 1.1 TB, 1 cluster, Version 8.3

• 8.4/9.0 Dev systems

• Working toward 9.0 into prod (May 2011)

• pgpool, Redis, RabbitMQ, NFS

Some stats

• daily peak: ~3000 commits per second

• average writes: 4 MBps

• average reads: 8 MBps

What’s good

• Most queries are fast!

• Benchmarks say we’re pushing the limits of the hardware

• Developers love working with Postgres

And lots more. But...

The Problems

1. System resource exhaustion

2. Everything is slow: Huge catalogs, Backups

3. Handling VACUUM problems: Bloat, Transaction wraparound

4. Upgrades: Minor, Major

System Resource Exhaustion

Running out of inodes

Problem: UFS on Solaris

“The only way to add more inodes to a UFS filesystem is:

1. destroy the filesystem and create a new filesystem with a higher inode density

2. enlarge the filesystem”

- growfs man page

Solution 0: Delete files.

Solution 1: Sharding/bigger filesystem

Solution 2: xfs

Running out of file descriptors

Problem: Too many open files by the database.

selena@lulu:~ #508 18:43 :) sudo lsof -p 19121 | wc

40 355 4151

Solution: You need a connection pooler.

Recommended:

• pgbouncer (lightweight, event-driven, online upgrade)

• pgpool-II (failover)
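As a rough illustration of why pooling helps, the worst-case number of open files grows with the number of backend processes. The sketch below assumes each backend may open up to max_files_per_process files (a real PostgreSQL setting whose default is 1000); the connection counts are hypothetical and only show the scaling.

```python
def worst_case_open_files(backends: int, max_files_per_process: int = 1000) -> int:
    """Upper bound on files held open across all backend processes."""
    return backends * max_files_per_process

# Hypothetical connection counts: direct client connections versus a
# small pool of server connections kept by a pooler.
direct = worst_case_open_files(backends=2000)
pooled = worst_case_open_files(backends=50)

print(f"2000 direct connections: up to {direct:,} open files")
print(f"50 pooled connections:   up to {pooled:,} open files")
```

With a pooler in front of the database, the number of backends is capped by the pool size rather than by the number of clients.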

Everything is slow.

Huge Catalogs

409,994 tables

Maintenance problem

Minor mistake in parent table definitions:

not null default nextval('important_sequence'::text)

vs

not null default nextval('important_sequence'::regclass)
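The difference matters because a ::text argument is looked up by name every time the default is evaluated, while a ::regclass argument is resolved to the sequence when the default is defined, so it survives a rename of the sequence. Correcting the default means one ALTER TABLE per affected table. A minimal sketch of generating those statements, assuming hypothetical table and column names (only important_sequence comes from the slide) and deliberately simple identifier quoting:

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def fix_default_sql(table: str, column: str, sequence: str) -> str:
    """Build an ALTER TABLE that rebinds a default to a regclass reference."""
    seq_literal = sequence.replace("'", "''")
    return (
        f"ALTER TABLE {quote_ident(table)} "
        f"ALTER COLUMN {quote_ident(column)} "
        f"SET DEFAULT nextval('{seq_literal}'::regclass);"
    )

# Hypothetical parent tables, used only for illustration.
for table in ["parent_a", "parent_b"]:
    print(fix_default_sql(table, "id", "important_sequence"))
```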

Huge Catalogs

Problem: Slow scans of catalog data

Solution: Upgrade to Postgres 8.4 or higher

But really: Avoid making a cluster with >400k tables.

Stats collection

9,019,868 total data points for table stats

4,550,770 total data points for index stats

Problem: This is slow to write (128 MB written every second or so).

Solution: Move the stats file to RAM with stats_temp_directory (8.4 or higher). There’s a trivial patch for earlier versions.

Problem: This is slow to read.

Solution: Supposedly, this is better in 8.4 and higher (fewer writes per minute). Still probably not fast.

Backups

pg_dump takes longer and longer...

   backup   |    duration
------------+-----------------
 2009-11-22 | 02:44:36.821475
 2009-11-23 | 02:46:20.003507
 2009-11-24 | 02:47:06.260705
 2009-12-06 | 07:13:04.174964
 2009-12-13 | 05:00:01.082676
 2009-12-20 | 06:24:49.433043
 2009-12-27 | 05:35:20.551477
 2010-01-03 | 07:36:49.651492
 2010-01-10 | 05:55:02.396163
 2010-01-17 | 07:32:33.277559
 2010-01-24 | 06:22:46.522319
 2010-01-31 | 10:48:13.060888
 2010-02-07 | 21:21:47.77618
 2010-02-14 | 14:32:04.638267
 2010-02-21 | 11:34:42.353244
 2010-02-28 | 11:13:02.102345
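To put a number on the growth, the durations can be compared against the first recorded run. A small sketch, using three rows copied from the table (first, slowest, and last); the helper names are illustrative:

```python
from datetime import timedelta

# Durations copied from the backup table: first, slowest, and last run.
DURATIONS = {
    "2009-11-22": "02:44:36.821475",
    "2010-02-07": "21:21:47.77618",
    "2010-02-28": "11:13:02.102345",
}

def parse_duration(text: str) -> timedelta:
    """Parse an 'HH:MM:SS.ffffff' interval as printed by psql."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))

def slowdown(baseline: str, later: str) -> float:
    """How many times longer the later backup took than the baseline."""
    return parse_duration(later) / parse_duration(baseline)

peak = slowdown(DURATIONS["2009-11-22"], DURATIONS["2010-02-07"])
final = slowdown(DURATIONS["2009-11-22"], DURATIONS["2010-02-28"])
print(f"slowest run: {peak:.1f}x the first run")
print(f"last run:    {final:.1f}x the first run")
```

Tracking this ratio over time is one simple way to notice that a backup window is about to be exceeded.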

Backups

Problem: pg_dump is too slow.

Solutions:

• patching pg_dump for SELECT ... LIMIT

• crank down shared_buffers

• Stop using pg_dump for backups

• 64-bit might help

How not to migrate to a 64-bit system

1. Install 32-bit Postgres and libraries on a 64-bit system.

2. Install 64-bit Postgres/libs of the same version.

3. Copy “hot backup” from 32-bit sys over to 64-bit sys.

4. Run pg_dump from 64-bit version on 32-bit Postgres.

A single warm standby is not a backup.

But lots of people use them that way!

Ship WAL from Solaris x86 -> Linux

It did work!

Handling VACUUM problems

Bloat

Problem: Lots of dead tuples in tables.

• Frequent UPDATEs to long tables of log data

• Frequent DELETEs without a VACUUM

• A terabyte of dead tuples

Fixing bloat

Solution: Write custom scripts to clean

• VACUUM for small things

• CLUSTER for everything else

• Considered TRUNCATE
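The core decision in such a cleanup script might look like the sketch below. It is not the script used in this installation: the 1 GB threshold, the table names, and the index names are hypothetical, and the CLUSTER ... USING syntax assumes Postgres 8.3 or later.

```python
def cleanup_command(table: str, size_bytes: int, index: str,
                    small_table_bytes: int = 1024**3) -> str:
    """Pick VACUUM for small tables and CLUSTER for larger ones."""
    if size_bytes < small_table_bytes:
        return f"VACUUM ANALYZE {table};"
    # CLUSTER rewrites the whole table in index order and holds an
    # exclusive lock while it runs, so it needs a maintenance window.
    return f"CLUSTER {table} USING {index};"

# Hypothetical tables: one of 10 MB and one of 50 GB.
print(cleanup_command("log_small", 10 * 1024**2, "log_small_pkey"))
print(cleanup_command("log_big", 50 * 1024**3, "log_big_pkey"))
```

Because CLUSTER writes a fresh copy of the table, it reclaims the space that a plain VACUUM only marks as reusable.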

Catalog Bloat

Application allowed users to initiate ALTER TABLE.

Regular VACUUM couldn’t fix it.

VACUUM FULL of the catalog takes 2+ hours.

Use of NOTIFY/LISTEN can also cause bloat.

Transaction wraparound avoidance

Problem: autovacuum (to prevent wraparound) is set off too frequently

Watch age(datfrozenxid)

Solution: Increase autovacuum_freeze_max_age (default is 200 million, we increase to one billion)
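Watching age(datfrozenxid) amounts to comparing each database's transaction age with autovacuum_freeze_max_age. A minimal sketch of that comparison: the catalog query is standard, the sample age is hypothetical, and the two limits are the default and the raised value mentioned above.

```python
# Standard catalog query; run it with any client to obtain the real ages.
AGE_QUERY = "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;"

def freeze_headroom(xid_age: int, freeze_max_age: int) -> float:
    """Fraction of autovacuum_freeze_max_age consumed (1.0 forces a vacuum)."""
    return xid_age / freeze_max_age

# Hypothetical age for one database, checked against both limits.
age = 150_000_000
for limit in (200_000_000, 1_000_000_000):
    print(f"limit {limit:>13,}: {freeze_headroom(age, limit):.0%} used")
```

Raising the limit buys time between forced vacuums, but the age still has to be monitored, since the freeze work is only postponed.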

Upgrades

Minor upgrades

Problem: Restarting Postgres causes bad application performance.

• Require a start/stop of database

• Unexpected CHECKPOINT

• Cold cache

Solutions:

• Plan for a CHECKPOINT before shutdown

• Warm the cache (Queries that exercise indexes, maybe table scans)
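One way to make that plan explicit is to write the restart down as an ordered list of commands. The sketch below only builds the list; the data directory and table names are hypothetical, and the warm-up step uses plain table scans as a stand-in for whatever queries best exercise the real workload.

```python
from typing import List

def restart_plan(data_dir: str, warm_tables: List[str]) -> List[str]:
    """Ordered shell commands for a planned restart during a minor upgrade."""
    plan = [
        # Checkpoint first so the shutdown checkpoint has little left to write.
        'psql -c "CHECKPOINT"',
        f"pg_ctl stop -D {data_dir} -m fast",
        # The new minor-version binaries are installed between stop and start.
        f"pg_ctl start -D {data_dir} -w",
    ]
    # Warm the cache: scanning each table pulls its pages back into memory.
    plan += [f'psql -c "SELECT count(*) FROM {table}"' for table in warm_tables]
    return plan

# Hypothetical data directory and tables.
for step in restart_plan("/var/lib/pgsql/data", ["orders", "events"]):
    print(step)
```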

Major Version upgrades

Problem: Major upgrades are a PITA.

• <8.3 - no pg_upgrade :(

Solutions: :(

• >=8.3 - pg_upgrade

• Time your restores.

• Document your SLAs.

• Write tools to migrate data

• Shard

• Trigger-based replication

The Problems

1. System resource exhaustion

2. Everything is slow: Huge catalogs, Backups

3. Handling VACUUM problems: Bloat, Transaction wraparound

4. Upgrades: Minor, Major

The Solutions

1. System resource exhaustion: Choose a better filesystem, Pooling

2. Everything is slow (Huge catalogs, Backups): Don’t do that, Monitor & Binary backups

3. Handling VACUUM problems (Bloat, Transaction wraparound): Developer education, Monitoring, Cleanup, *_freeze_max_age

4. Upgrades (Minor, Major): Plan, Plan, Plan (CHECKPOINT, warm cache, pg_upgrade)

Thanks!