stored procedures are good enough
Nikolay Samokhvalov
Twitter: @postgresmen
History
Year of Birth: 1995
1995: Postgres95 – POSTQUEL query language replaced with SQL
1996: Postgres95 departed from academia, renamed to PostgreSQL
1998: PL/pgSQL added (PostgreSQL 6.4)
And a bit more history...
Object Management in POSTGRES Using Procedures, M. Stonebraker
http://www.dtic.mil/dtic/tr/fulltext/u2/a181411.pdf
What’s now?
- Postgres speaks a lot of PL languages:
  - “native”: PL/pgSQL
  - included: PL/Tcl, PL/Perl, PL/Python
  - additional-traditional: PL/Java, PL/R, PL/sh, PL/v8 (JavaScript)
  - not active: PL/Scheme, PL/PHP, PL/Ruby
  - special/exotic/new:
    - PL/Proxy (sharding, from Skype)
    - PL/Container (Python, R)
    - plgo (Go), etc.
    - PgOpenCL (GPU!)
- Functions can also be created in:
  - C (anything is possible!)
  - SQL (plain! standard! with [recursive] CTEs!)
What are Stored Procedures?
In Postgres:
Functions = UDFs (user-defined functions) = Stored Procedures
(in other DBMSes: you can include your function/UDF in a SELECT, while you can only PERFORM/EXEC/EXECUTE a stored procedure)
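To make the "function = stored procedure" point concrete, here is a minimal sketch. The function `initials` and the table `person` are hypothetical names invented for this illustration; they do not come from the talk.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE person (
    id         serial PRIMARY KEY,
    first_name text NOT NULL,
    last_name  text NOT NULL
);

-- A plain SQL function (hypothetical name).
CREATE FUNCTION initials(first_name text, last_name text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    SELECT upper(left(first_name, 1) || left(last_name, 1));
$$;

-- Used like any other expression inside a query:
SELECT id, initials(first_name, last_name) FROM person;

-- Called "procedure-style" from PL/pgSQL, discarding the result:
DO $$
BEGIN
    PERFORM initials('nikolay', 'samokhvalov');
END
$$;
```

The same object serves both roles: it can sit in a SELECT list or be invoked purely for its side effects.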
Functions & Triggers
Why?
Reason #1: Data Clearness & Integrity
Before: Data Checks (format, constraints, etc) in App (Ruby or Python or PHP or …)
After: App (Ruby or Python or PHP or …), with CHECKS in the database
Control your Data Quality
Data Validation, an example: validate email address
Source: https://www.postgresql.org/message-id/20050907175305.GA20501%40isis.sigpipe.cz
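One way to keep such a check in the database is a domain with a CHECK constraint. This is a minimal sketch and not necessarily the approach used in the linked message. The domain `email_address`, the table `subscriber`, and the deliberately loose regular expression are all assumptions made for illustration.

```sql
-- Hypothetical domain: a loose "something@something.tld" pattern,
-- matched case-insensitively. Real-world validation is usually stricter.
CREATE DOMAIN email_address AS text
    CHECK (VALUE ~* '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$');

-- Hypothetical table using the domain.
CREATE TABLE subscriber (
    id    bigserial PRIMARY KEY,
    email email_address NOT NULL UNIQUE
);

-- Accepted: matches the pattern.
INSERT INTO subscriber (email) VALUES ('alice@example.com');

-- Rejected with a check-constraint violation: there is no "@" part.
INSERT INTO subscriber (email) VALUES ('not-an-email');
```

Because the rule lives in the type, every table and function that uses `email_address` gets the same check, whichever app writes the data.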
Reason #2: Access Control
- SECURITY DEFINER allows a user to do what she/he cannot usually do (but under strict control)
- GRANT/REVOKE – a standard way to control permissions
- Good approach: forbid direct access to tables, provide functions and views with proper GRANTs
- Pay attention to:
  - objects (tables, views, functions)
  - columns (can REVOKE/GRANT individually!)
  - rows (check what Row-Level Security is)
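The "forbid tables, expose functions" approach could look roughly like this. It is a sketch under assumptions: the table `account`, the role `app_user`, and the function `account_balance` are hypothetical names.

```sql
-- Hypothetical schema: names are illustrative only.
CREATE TABLE account (
    id      bigserial PRIMARY KEY,
    email   text NOT NULL,
    balance numeric NOT NULL DEFAULT 0
);

CREATE ROLE app_user NOLOGIN;

-- Forbid direct access to the table.
REVOKE ALL ON account FROM PUBLIC, app_user;

-- SECURITY DEFINER: the body runs with the privileges of the function
-- owner, not the caller. Pinning search_path is a common precaution.
CREATE FUNCTION account_balance(p_id bigint) RETURNS numeric
LANGUAGE sql
STABLE
SECURITY DEFINER
SET search_path = public, pg_temp
AS $$
    SELECT balance FROM account WHERE id = p_id;
$$;

-- Functions are executable by PUBLIC by default, so narrow that down.
REVOKE ALL ON FUNCTION account_balance(bigint) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION account_balance(bigint) TO app_user;
```

With this in place `app_user` can read one balance by id through the function, but cannot SELECT from `account` directly.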
Reason #3: speed (first of all, IO/network-related)
DBMS (Postgres 9.6) – AWS RDS, USA
Client (psql) – somewhere in Germany
Getting all 10M rows is ~7x slower
Use your RDBMS for Data Manipulation. It is not just a Storage.
Reason #3: speed
There are a LOT of cases here.
- ORMs (ActiveRecord, Hibernate, etc) and how people work with them
- Analytics (doing R or Python calculations inside RAM, etc)
- Massive data updates (retrieve IDs and then DELETE rows? Doh.)
Just look around and you’ll find more.
Again: Work with Data Inside Database First.
Pay attention to:
- cardinality (how many rows you touch?)
- RTT (round trip time), reduce network calls
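The "retrieve IDs and then DELETE" case is a good illustration of both points. Below is a sketch with a hypothetical `session` table. The anti-pattern appears only as comments; the statement after it does the same work in one round trip.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE session (
    id         bigserial PRIMARY KEY,
    user_id    bigint NOT NULL,
    expires_at timestamptz NOT NULL
);

-- Anti-pattern (comments only): two round trips, and every id
-- crosses the network twice.
--   SELECT id FROM session WHERE expires_at < now();  -- ids go to the app
--   DELETE FROM session WHERE id IN (...);            -- ids come back

-- Set-based alternative: one statement, one round trip,
-- and only a single count is returned to the client.
WITH deleted AS (
    DELETE FROM session
    WHERE expires_at < now()
    RETURNING id
)
SELECT count(*) AS rows_deleted FROM deleted;
```

The rows touched are the same in both variants; what changes is how many of them travel over the wire and how many network calls are made.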
Reason #4: Data Integration
Before: Data Manipulation Logic in App (Ruby or Python or PHP or …), next to Something*
After: App (Ruby or Python or PHP or …) and Something*, with Data Manipulation in the database
* ElasticSearch, Sphinx, Analytics DBMS, etc
Use:
- functions, triggers,
- Foreign Data Wrappers (FDW),
- Logical Decoding (e.g. pglogical)
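As one possible combination of these tools, a trigger can push new rows into another system through a foreign table. The sketch below uses `postgres_fdw`. The server name, host, credentials, and all table names are hypothetical placeholders.

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Hypothetical remote analytics database.
CREATE SERVER analytics_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'analytics.example.com', dbname 'analytics', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER analytics_srv
    OPTIONS (user 'loader', password 'secret');

-- Local handle for a table that lives on the remote server.
CREATE FOREIGN TABLE analytics_order (
    order_id   bigint,
    amount     numeric,
    created_at timestamptz
)
SERVER analytics_srv
OPTIONS (schema_name 'public', table_name 'order_fact');

-- Hypothetical local table.
CREATE TABLE orders (
    id         bigserial PRIMARY KEY,
    amount     numeric NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Copy each new order to the remote table.
CREATE FUNCTION push_order_to_analytics() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO analytics_order (order_id, amount, created_at)
    VALUES (NEW.id, NEW.amount, NEW.created_at);
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$;

CREATE TRIGGER orders_to_analytics
AFTER INSERT ON orders
FOR EACH ROW EXECUTE PROCEDURE push_order_to_analytics();
```

The remote write here is synchronous, so every INSERT into `orders` waits for the other server. For heavier loads, an asynchronous route such as logical decoding is likely the better fit.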
#5: HTTP API w/o middleware, “declarative”
http://postgrest.com - PostgREST
Written in Haskell
MIT license
Actively developed
chat: https://gitter.im/begriffs/postgrest
CREATE VIEW v1.person
AS SELECT * FROM public.person; → /person
CREATE FUNCTION v1.myfunc(...) … → /rpc/myfunc
LANGUAGE ...;
(write functions in any language: SQL, plpgsql, plpython, plr, plv8, etc!)
Views (/person):
GET → SELECT
POST → INSERT
PATCH → UPDATE
DELETE → DELETE
Functions (/rpc/myfunc):
Only POST
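A complete version of this pattern might look as follows. It is a sketch: the columns of `public.person` and the function `v1.person_count` are hypothetical, and it assumes PostgREST is configured to expose the `v1` schema.

```sql
-- Hypothetical schema exposed through PostgREST; names are illustrative.
CREATE SCHEMA IF NOT EXISTS v1;

CREATE TABLE public.person (
    id   bigserial PRIMARY KEY,
    name text NOT NULL
);

-- Exposed as /person
CREATE VIEW v1.person AS
    SELECT id, name FROM public.person;

-- Exposed as /rpc/person_count
CREATE FUNCTION v1.person_count() RETURNS bigint
LANGUAGE sql STABLE AS $$
    SELECT count(*) FROM public.person;
$$;
```

With this in place, a request such as `GET /person?name=eq.Alice` should translate into a filtered SELECT on the view, and `POST /rpc/person_count` should call the function. The view is the API contract, so `public.person` can change without breaking clients as long as `v1.person` keeps its shape.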
#6: PL/Proxy: sharding
- All work via functions
- Special functions (in PL/Proxy “language”) are in the middle
- Developed in Skype, and still there
- Yandex.Mail migrated from Oracle to Postgres + PL/Proxy in 2014-2016 (300+ TB, 250k RPS)
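A "function in the middle" might look like the sketch below. The cluster name `usercluster`, the table `users`, and the function name are assumptions; the two statements run on different databases, as the comments say.

```sql
-- On the proxy database (needs the plproxy extension and a configured
-- cluster; 'usercluster' is a hypothetical cluster name).
-- The function body only says where to run the call.
CREATE FUNCTION get_user_email(i_username text)
RETURNS SETOF text
LANGUAGE plproxy AS $$
    CLUSTER 'usercluster';
    RUN ON hashtext(i_username);
$$;

-- On every shard: an ordinary function with the same name and
-- signature does the actual work against the local "users" table.
CREATE FUNCTION get_user_email(i_username text)
RETURNS SETOF text
LANGUAGE sql AS $$
    SELECT email FROM users WHERE username = i_username;
$$;
```

The application calls `get_user_email('...')` on the proxy and never needs to know which shard holds the row; the hash of the username picks the partition.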
#7: MADlib: Machine Learning inside your DBMS
- A lot of ML algorithms implemented (added in each release)
- PL/Python
- Very easy and quick start to do machine learning with your Postgres data
http://madlib.incubator.apache.org/
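To give a feel for the "quick start", here is a sketch of training a linear regression with MADlib's `linregr_train`, as I understand its documented interface. It assumes MADlib is installed into a schema named `madlib`. The table `houses` and its columns are hypothetical and must be filled with your own data before training.

```sql
-- Hypothetical training data; load real rows before training.
CREATE TABLE houses (
    id    serial PRIMARY KEY,
    tax   integer,
    bath  numeric,
    size  integer,
    price integer
);

-- Arguments: source table, output table, dependent variable,
-- independent variables (the leading 1 is the intercept term).
SELECT madlib.linregr_train(
    'houses',
    'houses_linregr',
    'price',
    'ARRAY[1, tax, bath, size]'
);

-- The model is stored as a regular table.
SELECT coef, r2 FROM houses_linregr;
```

Everything happens inside the database: no export, no separate runtime, and the resulting model can be queried with plain SQL.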
Cons
● Tooling can be considered weak (packaging, dependencies, editors, debugging, profiling, etc)
● Version control and schema migrations
● Testing
● Stored Procedures consume resources in the DBMS. Can be tricky to scale
  ○ Example: call external API via plpythonu function and save data -- consumes CPU on your server unpredictably!
Cons - fixes
● Tooling can be considered weak (packaging, dependencies, editors, debugging, profiling, etc) → vim+plpgsql highlighting; DataGrip, Debugger, Profiler (pgAdmin)
● Version control and schema migrations → Sqitch and others
● Testing → pgTAP
● Stored Procedures consume resources in the DBMS. Can be tricky to scale
  ○ Example: call external API via plpythonu function and save data -- consumes CPU on your server unpredictably!
  → Avoid I/O things inside your master if you need to scale
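For the testing point, a pgTAP test is itself just SQL. The sketch below assumes the `pgtap` extension is already installed. The function under test, `full_name`, is a hypothetical example created inside the test transaction.

```sql
-- Requires: CREATE EXTENSION pgtap;
BEGIN;
SELECT plan(3);

-- Hypothetical function under test.
CREATE FUNCTION full_name(first_name text, last_name text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    SELECT trim(concat_ws(' ', first_name, last_name));
$$;

-- The function exists with the expected signature.
SELECT has_function('full_name', ARRAY['text', 'text']);

-- Behaviour checks: actual value, expected value, description.
SELECT is(full_name('Ada', 'Lovelace'), 'Ada Lovelace', 'joins both parts');
SELECT is(full_name('Ada', NULL), 'Ada', 'skips a missing last name');

SELECT * FROM finish();
ROLLBACK;
```

Wrapping the test in BEGIN ... ROLLBACK leaves the database untouched, so the same file can be run repeatedly, for example from a CI job.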
Thank you!
Twitter: @postgresmen (new Postgres tweets daily!)
RuPostgres.org