8.4 upcoming features

Post on 05-Dec-2014

800 Views

Category:

Technology

1 Downloads

Preview:

Click to see full reader

DESCRIPTION

 

TRANSCRIPT

February 2009San Franciso

Josh Berkus, PostgreSQL Core Team

Version 8.4Holy Frijole,That's a lot of features!

8.4: A Few Patches

8.4: A Few Patches

5 CommitFests

8.4: A Few Patches

5 CommitFests

9 Months of Development

8.4: A Few Patches

5 CommitFests

9 Months of Development

Over 1600 GIT Updates

8.4: A Few Patches

5 CommitFests

9 Months of Development

Over 1600 GIT Updates

More Than 2 Dozen Major Features

Looks like some database projects still know how to put out new version.

Some-but-not-all 8.4 Features● Windowing Functions● Common Table Expressions● Parallel Restore● CIText● array_agg● auto_explain● SQL/MED connection manager● Per-Database Collations● \d commands improved● Multicolumn GIN indexes● Column-level permissions

● Unsigned Integers● Boyer-Moore String Searching● Improved Hash Indexes● More DTrace probes● Default & Variadic parameters● New PL/pgSQL statements● pg_stat_statements● pg_stat_functions● SSL refactor● pg_hba improvements● Performance improvements

SQL Features● Windowing Functions● Common Table Expressions● array_agg● Per-database Collations● New data types

– Unsigned Integers– CIText

● Improved \d commands● Add columns to existing VIEWs

Windowing Functions● Aggregate over part of the data

– SQL 2008 standard– Great for BI, OLAP

● Functions:– row_number()– rank()– lead()– lag()

● More from David Fetter later!

Windowing Functions

SELECT depname, empno, salary, rank() OVER

(PARTITION BY depname ORDER BY salary)

FROM empsalary;

SELECT y, m, SUM(SUM(people)) OVER (PARTITION BY y ORDER BY m), AVG(people)FROM( SELECT EXTRACT(YEAR FROM accident_date) AS y, EXTRACT(MONTH FROM accident_date) AS m, * FROM accident)sGROUP BY y, m;

Common Table Expressions● Ability to create "named subqueries" for your

query.● Best use: WITH RECURSIVE

– real recursive queries– "walk" trees with one query

● more from David Fetter later

Common Table ExpressionsWITH RECURSIVE subdepartment AS( -- SELECT * FROM department WHERE id = 'A'

UNION ALL

-- recursive term referring to "subdepartment" SELECT d.* FROM department AS d, subdepartment AS sd WHERE d.id = sd.parent_department)SELECT * FROM subdepartment;

array_agg● History:

– added Arrays in 7.4● array_accum() aggregate example code

– intarray contrib module in 8.0● only ints, but very fast

● array_agg() in 8.4: all arrays, fast C Code– from Robert Haas, new contributor!

SELECT status, array_agg(username) FROM logins GROUP BY status;

Per-Database Collations● Collations (ordering character sets) used to be

per installation● Now they are per database● Someday they will be per column● Google Summer of Code Project!CREATE DATABASE mydb

COLLATE 'sv_se.UTF-8'

CTYPE 'sv_se.UTF-8'

TEMPLATE template0

New Data Types● Make migrating from other DBMSes easier● CIText (in /contrib)

– Case Insensitive Text– Full CI indexing, comparisons

● Unsigned Integers (in pgFoundry)– migrate from MySQL, others

Better \d in psql● \d is now multi-version compatible

– \dt etc. won't error if you connect an 8.4 client to an 8.2 database

● \df for user functions only– \dfS for system functions

● \ef to edit a funcion

Add columns to VIEWs● In the bad old days:

– need to add another column to your VIEW?– have to drop it & recreate it– have to drop & recreate all dependancies– enter the World Of Pain

● In 8.4:– ALTER VIEW lets you add columns– Can't rename or modify though

Performance & Monitoring● Parallel Restore● Improved Hash Indexes● pg_stat_user_functions● pg_stat_statements● More Dtrace probes● auto_explain● Other Performance Improvements

Parallel Restore● In 8.3, we were single-threaded

pg_dump Restoredumpfile

8 Hours

Parallel Restore● In 8.4, Multi-core, Restore!

pg_dump

Restore

dumpfile

2 Hours

Restore

RestoreRestore

RestoreRestore

RestoreRestore

Improved Hash Indexes● Our old hash indexes were slow and useless● Improved hash indexes are fast!

– use them for ID columns● or other unique keys

– not completely recovery-safe yet though● don't switch over production DBs until 8.5

● Google Summer of Code project!

pg_stat_user_functions● For each of your functions, see

– # of times called– amount of time spent– amount of time spent excluding other functions

pg_stat_statements

pg_stat_statements

log pgFouinelogfile

pg_stat_statementspostgres=# SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 3;-[ RECORD 1 ]------------------------------------------------------------userid | 10dbid | 63781query | UPDATE branches SET bbalance = bbalance + $1 WHERE bid = $2;calls | 3000total_time | 20.716706rows | 3000-[ RECORD 2 ]------------------------------------------------------------userid | 10dbid | 63781query | UPDATE tellers SET tbalance = tbalance + $1 WHERE tid = $2;calls | 3000total_time | 17.1107649999999rows | 3000-[ RECORD 3 ]------------------------------------------------------------userid | 10dbid | 63781query | UPDATE accounts SET abalance = abalance + $1 WHERE aid = $2;calls | 3000total_time | 0.645601rows | 3000

More DTrace Probes* Probes to measure query timequery-parse-start (int, char *)query-parse-done (int, char *)query-plan-start ()query-plan-done ()query-execute-start ()query-execute-done ()query-statement-start (int, char *)query-statement-done (int, char *)

* Probes to measure dirty buffer writes by the backend because bgwriter is not effective

dirty-buffer-write-start (int, int, int, int)dirty-buffer-write-done (int, int, int, int)

* Probes to measure physical writes from the shared bufferbuffer-write-start (int, int, int, int)buffer-write-done (int, int, int, int, int)

* Probes to measure reads of a relation from a particular buffer blockbuffer-read-start (int, int, int, int, int)buffer-read-done (int, int, int, int, int, int)

* Probes to measure the effectiveness of buffer cachingbuffer-hit ()buffer-miss ()

* Probes to measure I/O time because wal_buffers is too smallwal-buffer-write-start ()wal-buffer-write-done ()

* Probes to measure checkpoint stats such as running time, buffers written, xlog files added, removed, recycled, etc

checkpoint-start (int)checkpoint-done (int, int, int, int, int)

* Probes to measure Idle in Transaction and client/network timeidle-transaction-start (int, int)idle-transaction-done ()

* Probes to measure sort timesort-start (int, int, int, int, int)sort-done (int, long)

* Probes to determine whether or not the deadlock detector has found a deadlock

deadlock-found ()deadlock-notfound (int)

* Probes to measure reads/writes by block numbers and relationssmgr-read-start (int, int, int, int)smgr-read-end (int, int, int, int, int, int)smgr-write-start (int, int, int, int)smgr-write-end (int, int, int, int, int, int)

auto_explain● misnamed; actually allows you to manually set specific

queries/sessions/functions to output explain plans to the log postgres=# LOAD 'auto_explain'; postgres=# SET auto_explain.log_min_duration = 0; postgres=# SELECT count(*) FROM pg_class, pg_index WHERE oid = indrelid AND indisunique;

This might produce log output such as:

LOG: duration: 0.986 ms plan: Aggregate (cost=14.90..14.91 rows=1 width=0) -> Hash Join (cost=3.91..14.70 rows=81 width=0) Hash Cond: (pg_class.oid = pg_index.indrelid) -> Seq Scan on pg_class (cost=0.00..8.27 rows=227 width=4) -> Hash (cost=2.90..2.90 rows=81 width=4) -> Seq Scan on pg_index (cost=0.00..2.90 rows=81 width=4)

More Performance Improvements● Free Space Map is dynamically sized (no more

max_fsm_pages!)● Visibility Map

– VACUUM only changed pages– Index-only Scans in 8.5

● Less writing to pgstat file– plus you can move it

Stored Procedures● Default Parameters● Variadic Parameters● New PL/pgSQL Statements● PL/pythonU OUT Parameters

DEFAULT parameters

CREATE OR REPLACE FUNCTI ON) adder a i nt 4 0de f aul t ,

b i nt 2de f aul t ( ' 'RETURNS i nt LANGUAGE sql

' $ + $ 'AS sel ect 1 2 ;

) (SELECT adder ; ) (SELECT adder 1 ; ) (SELECT adder 1, 2 ;

VARIADIC parametersCREATE OR REPLACE FUNCTION

adder(VARIADIC v int[]) RETURNS int AS $$

DECLARE s int; i int;BEGIN

s:=0;FOR i IN SELECT generate_subscripts(v,1) LOOP

s := s + i;END LOOP;

RETURN s;END;$$ LANGUAGE 'plpgsql';

SELECT adder(1);SELECT adder(1,2,3);SELECT adder(40,2);

New PL/PgSQL Statements● RETURNS TABLE

– SQL-compliant alias for "SETOF"● CASE statement

– real switching logicCASE

WHEN x BETWEEN 0 AND 10 THEN

msg := 'value is between zero and ten';

WHEN x BETWEEN 11 AND 20 THEN

msg := 'value is between eleven and twenty';

END CASE;

PL/pythonU OUT Parameters● You now can use IN, OUT and INOUT

parameters with PL/pythonU functions.● That's it!

Exotic Features● SQL/MED Connection Manager● Multi-column GIN Indexes● Boyer-Moore String Searching

SQL/MED● Foundation for connecting to external servers

– Future of PL/proxy and DBconnect– Future of DBI-Connect

CREATE FOREIGN DATA WRAPPER pgsql LIBRARY 'pgsql_fdw';

CREATE SERVER foo FOREIGN DATA WRAPPER pgsql OPTIONS (host 'remotehost', dbname 'remotedb');

CREATE USER MAPPING FOR PUBLIC SERVER foo OPTIONS (username 'bob', password 'secret');

Multi-Column GIN Indexes● Bad Old Days: to do a single Full Text Search

index over several columns, you had to concatenate them.

● New Goodness: you can now do a proper multicolumn index– and it's faster!

Boyer-Moore String Searching

Boyer-Moore String Searching

No, I don't know what it is either.

But we have it now.

Security● Refactored SSL● Improved pg_hba.conf● Column-level Permissions● SE-Postgres

Refactored SSL by Magnus

● Proper certificate verification– Choose level, full verification is default

● Control over all key and certificate files● SSL certificate authentication

– Trusted root certificate– Map «cn» value of certificate

pg_hba Improvements● "crypt" is gone (insecure)● «ident sameuser» => «ident»● New format for options

– name=value for all options● usermaps for all external methods

– with regexp support● Parsed on reload

Column PermissionsREVOKE SELECT (col1, col2), INSERT (col1, col2) ON tab1 FROM role2;

● Restrict access to sensitive columns from unprivileged ROLEs– more fine-grained security– no longer need to use VIEWs to do this

PostgreSQL Needs YOU

Many Patches == Lots of Testing● Bug Testing

– can you make 8.4 crash?● Specification Testing

– do the features do what the docs say they do?● Performance Testing

– is 8.4 really faster? How much?● Combinational Testing

– what happens when you put several new features together?

Many Patches == Lots of Testing

1. Take a copy of your production applications2. Port them to 8.43. Report breakage and issues4. Play with implementing new features

Do It Now!We're counting on you!

Contact Information● Josh Berkus

– josh@postgresql.org– http://it.toolbox.com/

blogs/database-soup

● Upcoming events– SCALE 7, Los

Angeles, Feb. 20– pgCon 2009, Ottawa,

May 20

This talk is copyright 2009 Josh Berkus, and is licensed under the Creative Commons Attribution License

top related