PostgreSQL worst practices at FOSDEM PGDay Brussels 2017

Ilya Kosmodemiansky, ik@postgresql-consulting.com

Best practices are just boring

• Never follow them, try worst practices
• Only those practices can really help you screw things up most effectively
• PostgreSQL consultants are nice people, so try to make them happy

How does it work?

• I have a list of a little more than 100 worst practices
• I do not make this stuff up; all of them are real-life examples
• I reshuffle my list every time before presenting and extract some of the examples
• Well, there are some items I like more than others, so it is not a very honest shuffle

0. Do not use indexes (a test one!)

• Basically, there is no difference between a full table scan and an index scan
• You can check that: just insert 10 rows into a test table on your test server and compare (a sketch of that check follows below)
• Nobody deals with tables of more than 10 rows in production!
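
For the record, here is what that comparison looks like on a table that is actually big enough to matter (table and column names are invented for illustration):

  -- build a 1,000,000-row test table instead of a 10-row one
  CREATE TABLE scan_test AS
    SELECT g AS id, md5(g::text) AS payload
    FROM generate_series(1, 1000000) AS g;
  ANALYZE scan_test;

  -- sequential scan: reads the whole table
  EXPLAIN ANALYZE SELECT * FROM scan_test WHERE id = 42;

  -- add an index and compare the plan and the timing
  CREATE INDEX ON scan_test (id);
  EXPLAIN ANALYZE SELECT * FROM scan_test WHERE id = 42;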

1. Use ORM

• All databases share the same syntax
• You must write database-independent code
• Are there any benefits that are based on database-specific features?
• It is always good to learn a new complicated technology

2. Move joins to your application

• Just select * from a couple of tables into the application written in your favorite programming language
• Then join them at the application level
• Now you only need to implement nested loop join, hash join and merge join, as well as a query optimizer and a page cache (everything the single statement below gives you for free)
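
For comparison, a minimal sketch of the join the database would have done for you (the orders/customers tables are invented for illustration):

  -- PostgreSQL picks a nested loop, hash or merge join on its own,
  -- using its optimizer and page cache
  SELECT o.id, o.total, c.name
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
  WHERE o.created_at >= now() - interval '1 day';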

3. Be in trend, be schema-less

• You do not need to design a schema
• You need only one table with two columns: id bigserial and extra jsonb
• The JSONB datatype is pretty effective in PostgreSQL; you can search in it just like in a well-structured table (a sketch follows below)
• Even if you put 100 MB of JSON into it
• Even if you have 1000+ tps
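
A minimal sketch of the schema-less approach, assuming an invented table name; the GIN index and the containment operator are standard PostgreSQL features:

  CREATE TABLE stuff (
      id    bigserial PRIMARY KEY,
      extra jsonb
  );

  -- a GIN index makes containment searches indexable...
  CREATE INDEX ON stuff USING gin (extra jsonb_path_ops);

  -- ...but every "column" now lives inside one big document
  SELECT id
  FROM stuff
  WHERE extra @> '{"status": "active"}';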

4. Be agile, use EAV

• You need only 3 tables: entity, attribute, value (sketched below)
• At some point, add a 4th: attribute_type
• When it starts to work slowly, just call those four tables The Core and add 1000+ tables with denormalized data
• If that is not enough, you can always add value_version
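
A minimal sketch of the EAV core (all names are invented); note how reading back a single logical row already needs two joins:

  CREATE TABLE eav_entity    (id bigserial PRIMARY KEY, name text);
  CREATE TABLE eav_attribute (id bigserial PRIMARY KEY, name text);
  CREATE TABLE eav_value (
      entity_id    bigint REFERENCES eav_entity (id),
      attribute_id bigint REFERENCES eav_attribute (id),
      value        text
  );

  -- one "row" of business data, reconstructed attribute by attribute
  SELECT e.name, a.name AS attribute, v.value
  FROM eav_entity e
  JOIN eav_value v     ON v.entity_id = e.id
  JOIN eav_attribute a ON a.id = v.attribute_id
  WHERE e.id = 1;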

5. Try to create as many indexes as you can

• Indexes consume no disk space
• Indexes consume no shared_buffers
• There is no overhead on DML if each and every column in a table is covered with a bunch of indexes
• The optimizer will definitely choose your index once you have created it (the query below shows how to check that)
• Keep calm and create more indexes
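
One way to see how that works out in practice is the standard statistics view; this lists indexes that have never been used since the last stats reset, together with the space they occupy:

  SELECT relname, indexrelname,
         pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
  WHERE idx_scan = 0
  ORDER BY pg_relation_size(indexrelid) DESC;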

6. Always keep all your time series data

• Time series data like tables with logs or session history should never be deleted, aggregated or archived; you always need to keep it all
• You will always know where to check if you run out of disk space
• You can always call that Big Data
• Solve the problem using partitioning... one partition per hour or per minute (a sketch follows below)
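
A sketch of time-based partitioning, using the declarative syntax available from PostgreSQL 10 (table and partition names are invented); its main benefit here is that retiring old data becomes a cheap DROP instead of a massive DELETE:

  CREATE TABLE session_log (
      created_at timestamptz NOT NULL,
      payload    jsonb
  ) PARTITION BY RANGE (created_at);

  CREATE TABLE session_log_2017_02 PARTITION OF session_log
      FOR VALUES FROM ('2017-02-01') TO ('2017-03-01');

  -- a month of history disappears as a metadata operation
  DROP TABLE session_log_2017_02;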

7. Turn autovacuum off

• It is quite an auxiliary process; you can easily stop it (what piles up afterwards is shown below)
• There is no problem at all with having 100 GB of data in a database which is 1 TB in size
• 2-3 TB RAM servers are cheap, and I/O is the fastest thing in modern computing
• Besides that, everyone likes Big Data
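
What piles up with autovacuum off can be seen in the standard statistics view; dead tuples are the bloat that turns 100 GB of data into a 1 TB database:

  -- tables with the most dead tuples waiting for cleanup
  SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;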

8. Keep master and slave on different hardware

• That will maximize the probability of an unsuccessful failover
• To make things worse, change only the slave-related parameters on the slave, leaving defaults for shared_buffers etc. (and never run a comparison like the one below)
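
A minimal sanity check, run on both machines and compared by hand (the settings listed are just common examples); after a failover the slave has to carry exactly the load the master did:

  SHOW shared_buffers;
  SHOW work_mem;
  SHOW max_connections;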

9. Put a synchronous replica to remote DC

• Indeed! That will maximize availability!
• Especially if you put the replica on another continent (the settings involved are sketched below)
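
A sketch of the settings involved (the standby name is invented); with a synchronous standby on another continent, every commit waits for a full WAN round trip:

  ALTER SYSTEM SET synchronous_standby_names = 'standby_on_another_continent';
  ALTER SYSTEM SET synchronous_commit = 'on';
  SELECT pg_reload_conf();
  -- from now on COMMIT does not return until the remote standby confirms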

10. Reinvent Slony

• If you need some data replication to another database, try to implement it from scratch
• That lets you run into all the problems PostgreSQL has had since Slony was introduced

11. Use as many count(*) as you can

• The figure 301083021830123921 is very informative for the end user
• If it changes a second later to 30108302894839434020, it is still informative
• select count(*) from sometable is quite a lightweight query
• The tuple estimate from pg_catalog can never be precise enough for you (compare with the query below)
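
For comparison, the estimate the catalog already maintains; it never scans the table and is usually close enough for a number nobody can read anyway ('sometable' as in the slide):

  -- planner's row estimate, kept up to date by ANALYZE / autovacuum
  SELECT reltuples::bigint AS estimated_rows
  FROM pg_class
  WHERE relname = 'sometable';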

12. Never use graphical monitoring

• You do not need graphs
• Because it is an easy task to figure out what happened yesterday at 2 a.m. using only the command line and grep

13. Never use Foreign Keys (use locally produced ones instead!)

• Consistency control at the application level always works as expected
• You will never get inconsistent data without constraints
• Even if you already have a bulletproof framework to maintain consistency, could that be a good enough reason to use it? (it is one line of DDL, shown below)
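
The one line of DDL in question, with invented table and column names:

  ALTER TABLE orders
      ADD CONSTRAINT orders_customer_fk
      FOREIGN KEY (customer_id) REFERENCES customers (id);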

14. Always use text type for all columns

• It is always fun to reimplement date or IP validation in your code
• You will never mistakenly convert "12-31-2015 03:01AM" to "15:01 12 of undef 2015" using text fields (proper types reject bad input outright, as sketched below)
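
A minimal sketch of what proper types give you for free (table name invented); bad input is rejected at INSERT time instead of turning into garbage later:

  CREATE TABLE events (
      happened_at timestamptz,
      client_addr inet
  );

  -- rejected: December has no 32nd day
  INSERT INTO events (happened_at) VALUES ('2015-12-32 03:01');
  -- rejected: not a valid IP address
  INSERT INTO events (client_addr) VALUES ('10.0.0.999');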

15. Always use improved ”PostgreSQL”

• Postgres is not a perfect database, and you are smart
• All that annoying MVCC stuff, the 32-bit xid and the autovacuum nightmare look the way they do because the hackers are old-school and lazy
• Hack it the hard way; do not bother submitting your patch to the community, just put it into production
• It is easy to maintain such a production system and keep it compatible with upcoming versions of the "not perfect" PostgreSQL

16. Postgres likes long transactions

• Always call external services from stored procedures (like sending emails)
• Oh, that is arguable... It could be, if 100% of developers were familiar with the word timeout
• Anyway, you can just start a transaction and go away for the weekend (unless someone has set the timeouts sketched below)
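
The timeouts in question, as per-session examples (idle_in_transaction_session_timeout exists since PostgreSQL 9.6; the values are arbitrary):

  -- cancel any statement that runs longer than 30 seconds
  SET statement_timeout = '30s';
  -- terminate sessions that sit idle inside an open transaction
  SET idle_in_transaction_session_timeout = '5min';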

17. Load your data to PostgreSQL in a smart manner

• Write your own loader, 100 parallel threads minimum
• Never use COPY: it is specially designed for the task (shown below)
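
For reference, the thing you are told never to use; file path, table and column names are invented (in psql, \copy does the same from a client-side file):

  COPY big_table (id, payload)
  FROM '/var/lib/postgresql/import/big_table.csv'
  WITH (FORMAT csv, HEADER);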

18. Even if you want to backup your database...

• Use replication instead of backup
• Use pg_dump instead of backup
• Write your own backup script
• As complicated as possible, combine all external tools you know
• Never perform a test recovery

Do not forget

That was a WORST practices talk

Questions or ideas? Share your story!

ik@postgresql-consulting.com (I am preparing this talk to be open sourced)
