FOSSASIA 2015 – 10 Features Your Developers are Missing when Stuck with Proprietary Databases

10 Features Developers are missing when Stuck with Proprietary Databases!
Sameer Kumar (@sameerkasi200x), DB Solution Architect, Ashnik (@AshnikBiz)
15th March 2015



About Me!

- A random guy who started his career as an Oracle and DB2 DBA (and yeah, a bit of SQL Server too)

- Then moved to Ashnik and started working with Postgres

- We work on open source consulting and solutions

- And now I love Open Source!

- Twitter - @sameerkasi200x

- Apart from technology I love cycling and photography


Why I Love PostgreSQL?

- Claims to be “Most Advanced Open Source Database”

- A vibrant and active community

- Fully ACID compliant

- Multi Version Concurrency Control

- NoSQL capability

- Developer Friendly

- Built to be extended ‘easily’


Supported on a vast range of platforms


- Portable across a vast range of operating systems – Unix, Linux, Windows etc.

- Supported on various architectures – RISC, ARM, x86

10 Features you would love as a developer!

1. New JSONB datatype introduced in v9.4, plus JSON functions & operators

2. Vast set of datatypes supported – money, time, range, boolean, interval and many more

3. Rich support for Foreign Data Wrappers – build a logical data warehouse!

4. User Defined Operators – It’s really cool!

5. User Defined Extensions – you have out-of-the-box extensions, plus you can write your own!


6. Filter Based Indexes or Partial Indexes – Index only what you need to!

7. Granular control of parameters at User, Database, Connection or Transaction Level – sort memory, logging parameters, reliability parameters and many more

8. Use of indexes to get statistics on the fly

9. JDBC API for COPY command – do bulk loads right from your Java program

10. Full Text Search – There is a lot more than what you think


JSON Features

Store unstructured data – store rows with different attributes:

create table item_catalog (
    item_id          varchar(50) primary key,
    item_description varchar(250),
    attributes       jsonb
);

Callouts from the slide: "category" is an array; "features" is an array of sub-documents, and its set of members differs from row to row; new fields can be added to suit the details of a specific type of product.


Benefits to the Developers

- Allows you to store records which might have different attributes

- Store data in a JSONB field until your schema has matured and firmed up, and then move it to relational attributes (columns)

- Use JSON functions & operators to fetch and return data to the application via APIs

- This makes the application transparent to the underlying structure

- The binary storage format of JSONB allows efficient parsing

- You can index JSON fields for faster searches!
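A brief sketch of the points above, reusing the item_catalog table from the earlier slide (the sample document and index name are made up for illustration):

```sql
-- Insert a row whose attributes are a JSONB document with nested members
INSERT INTO item_catalog (item_id, item_description, attributes)
VALUES ('TV-101', 'LED TV',
        '{"category": ["electronics", "video"],
          "features": [{"name": "screen", "size_inch": 42}]}');

-- @> tests containment: find items whose category array includes a value
SELECT item_id
FROM   item_catalog
WHERE  attributes @> '{"category": ["electronics"]}';

-- A GIN index on the JSONB column speeds up containment queries like the one above
CREATE INDEX idx_item_attrs ON item_catalog USING gin (attributes);
```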


Datatype Support

Data Types Supported

Data Type   Usage
---------   -----
money       Store currency data
interval    Store a time interval, e.g. '2 days', '1 hour'
time        Store a time of day, e.g. 2:00PM, 6:00AM
range       Store ranges over integer, date or timestamp values
boolean     Store true or false values

And many more common data types, e.g. varchar and char for strings, numeric, float, integer, serial etc.

Create user defined datatypes to store data as per your convenience and define GiST indexes for your data-types
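A quick sketch of a few of these types together (table and column names are hypothetical):

```sql
CREATE TABLE room_booking (
    booking_id serial PRIMARY KEY,
    price      money,      -- currency data
    duration   interval,   -- e.g. '2 days'
    period     daterange,  -- a range of dates
    confirmed  boolean DEFAULT false
);

INSERT INTO room_booking (price, duration, period, confirmed)
VALUES ('250.00', '2 days', '[2015-03-15,2015-03-17)', true);

-- Range operators come for free: && tests overlap between two ranges
SELECT booking_id
FROM   room_booking
WHERE  period && daterange('2015-03-16', '2015-03-20');
```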


Benefits to the Developers

- Store the data from application or user input in more intuitive datatypes in the database

- Avoid conversion or translation of values retrieved from the database

- Define your own data types to match the structures or objects defined in programs

- Define your own operators and index access methods for user-defined operators


Foreign Data Wrapper

Access Remote Databases

- As the name suggests, it allows you to access tables in remote databases as foreign tables

- Allows you to read and write from these foreign tables


Benefits to the Developers

- Access data from legacy systems for run-time processing

- Avoid connecting to multiple databases in application

- Read/write from noSQL or filesystem based stores as if they are relational tables

- Postgres pushes operations, e.g. filter clauses, down to the foreign database for better execution

- Useful for migration or data integration

- Foreign Data Wrappers are available for a vast range of databases and data stores

- Hadoop, MongoDB, Oracle, MySQL, MariaDB, file system and many more
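As a sketch of the pattern, here is postgres_fdw (which ships with PostgreSQL) pointed at a hypothetical legacy Postgres server; the host, credentials and table names are made up, and other wrappers (mongo_fdw, oracle_fdw, file_fdw, ...) follow the same server/user-mapping/foreign-table steps:

```sql
CREATE EXTENSION postgres_fdw;

-- Describe the remote server
CREATE SERVER legacy_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'legacy-host', dbname 'legacydb');

-- Map the local user to remote credentials
CREATE USER MAPPING FOR CURRENT_USER
    SERVER legacy_srv
    OPTIONS (user 'app', password 'secret');

-- Expose a remote table locally
CREATE FOREIGN TABLE legacy_orders (
    order_id integer,
    amount   numeric
) SERVER legacy_srv OPTIONS (table_name 'orders');

-- Queried like a local table; the filter is pushed to the remote side
SELECT * FROM legacy_orders WHERE amount > 100;
```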


User Defined Operators

Define your own operators

- Postgres allows you to create your own operators

- You can override the existing ones for specific cases, or give a new meaning to an operator for special cases


Benefits to the Developers

- Define your own operators to define how user defined data types are handled

- Define your own operators to override the default handling of a data-type, e.g. perform a case-insensitive search on varchar columns

- Define new operators to handle specific tasks, e.g. use + for concatenation of strings

- Makes data processing easier for developers

- Makes the migration process easier, e.g. code migrated from SQL Server to PostgreSQL will benefit from a + operator for string concatenation
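A minimal sketch of the case-insensitive example above (the function name ci_equal and operator name ~== are invented for illustration):

```sql
-- A helper function implementing case-insensitive equality
CREATE FUNCTION ci_equal(varchar, varchar) RETURNS boolean AS $$
    SELECT lower($1) = lower($2)
$$ LANGUAGE sql IMMUTABLE;

-- Bind the function to a brand-new operator
CREATE OPERATOR ~== (
    LEFTARG   = varchar,
    RIGHTARG  = varchar,
    PROCEDURE = ci_equal
);

SELECT 'Ashnik'::varchar ~== 'ashnik'::varchar;  -- returns true
```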


User Defined Extensions

Extend PostgreSQL capabilities with Extension

- These are like add-on modules which you can compile and add to PostgreSQL

- Once added, the features offered by extensions work like native features

- Allows you to extend PostgreSQL capabilities

- There are out of box extensions available and you can write your own


Some Popular Extensions

- pg_prewarm – load your data into the buffer cache to avoid 'cold cache' issues after a restart

- pgcrypto – cryptographic functions to encrypt the data

- pg_shard – Create a Sharded Cluster with PostgreSQL

- PostGIS – add full spatial capabilities to PostgreSQL

- pg_buffercache, pgrowlocks, pgstattuple and pg_freespacemap – take a peek into buffers, locks and data pages

- hstore – use PostgreSQL as a key-value store

- fuzzystrmatch and pg_trgm – more enhanced and powerful search on textual data
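Two of the extensions above in action; installing an extension is a single statement, after which its functions and types behave like built-ins:

```sql
-- Trigram-based fuzzy matching on text
CREATE EXTENSION pg_trgm;

-- similarity() returns 0 (no match) .. 1 (identical)
SELECT similarity('postgresql', 'postgersql');

-- Key-value storage in a single column
CREATE EXTENSION hstore;

SELECT 'color => red, size => XL'::hstore -> 'size';  -- returns 'XL'
```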


Partial Indexes

When only a portion of data is relevant

- Often we have columns which have low cardinality, i.e. few distinct values

- An index on these columns is not very helpful

- Mostly we have queries which require only one of the available values

- e.g. soft delete

- The application always queries data where "deleted = false"

- e.g. a column named "closed" in the "ACCOUNTS" table of a bank


Benefits of creating a partial index

- You can index only the data which is relevant and queried:

  create index idx_active_acc_paymentdt
      on ACCOUNTS (acc_int_payment_dt) where closed = false;

- This keeps the index smaller, which performs faster

- You can create separate indexes to cover different sets of data, e.g.:

  create index idx_current_acc_paymentdt
      on ACCOUNTS (acc_int_payment_dt) where acc_type = 'current';

  create index idx_savings_acc_paymentdt
      on ACCOUNTS (acc_int_payment_dt) where acc_type = 'savings';


Granular Parameter Control

You can control parameters at several levels:

- Instance level – in the parameter file or the startup command

- Database level – alter database reporting_db set work_mem = 10240;

- User level – alter user batch_user set maintenance_work_mem = '1024MB';

- Transaction level – select set_config('work_mem', '20480', true);

- Connection/session level – set synchronous_commit = off; or select set_config('synchronous_commit', 'off', false);


Benefits to the Developer

- A developer can set the parameters as per the requirements in the program

- Set higher maintenance and sorting memory for batch jobs

- Set higher sorting memory for reporting user

- Set synchronous_commit off during batches to enhance performance for bulk loads

- Set different logging for specific users


PostgreSQL Planner can Get the Statistics on the fly

Benefits to the Developer

- Often, as a developer, you have to code batch jobs

- Bulk uploads and bulk deletions of data from tables

- After these operations you may be querying the same table

- Due to the huge change in data volume, chances are that the optimizer will pick a wrong plan


So shall you gather stats after each bulk load operation?

- Not really!

- PostgreSQL optimizer is smart enough to quickly gauge the statistics from the indexes on the fly

- Developers don’t need to make their code heavy with ANALYZE, especially if response time is an important factor


JDBC API for COPY

JDBC Copy

- COPY command in PostgreSQL allows you to do bulk loads

- The PostgreSQL JDBC driver also provides a COPY API

- Using JDBC Copy you can programmatically load data from STDIN or files

- Allows programmers to do faster bulk loads
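A minimal Java sketch of the driver's CopyManager API; the connection URL, credentials, table and CSV file are hypothetical, and running it needs a live PostgreSQL server plus the postgresql JDBC jar on the classpath:

```java
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/testdb", "app", "secret");

        // CopyManager exposes the server-side COPY command to Java
        CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();

        // Stream a CSV file into the accounts table in one bulk operation
        long rows = copy.copyIn(
                "COPY accounts FROM STDIN WITH (FORMAT csv)",
                new FileReader("accounts.csv"));

        System.out.println("Loaded " + rows + " rows");
        conn.close();
    }
}
```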


Full Text Search

Yes! You can do full text search on PostgreSQL

- You can store your data in PostgreSQL and use it for complex pattern matches and textual search

- With GIN indexes your text searches and pattern matches can be made faster

- With additional extensions you can also do trigram-based searches or phonetic (soundex) matches

- Makes the developers life easier while doing searches on textual data

- GIN and GiST indexes help get better performance
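A short sketch of the GIN-backed text search mentioned above (table and index names are made up):

```sql
CREATE TABLE articles (
    id   serial PRIMARY KEY,
    body text
);

-- A GIN index over the parsed document speeds up full text searches
CREATE INDEX idx_articles_fts
    ON articles USING gin (to_tsvector('english', body));

-- @@ matches a parsed document against a query, with stemming applied;
-- 'open & database' requires both words to appear
SELECT id
FROM   articles
WHERE  to_tsvector('english', body) @@ to_tsquery('english', 'open & database');
```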


For Further Reference

- www.postgresql.org

- www.planetpostgresql.org

- Various community user group discussions

- Various blogs – Josh Berkus, Magnus Hagander, Bruce Momjian, Simon Riggs, and many more

- Ashnik Blog Archives

- Ashnik YouTube Channel


Questions?