postgresql federation


Upload: jim-mlodgenski

Post on 16-Apr-2017


Federated PostgreSQL

Who Am I?

Jim Mlodgenski

[email protected]

@jim_mlodgenski

Co-organizer of NYC PUG (www.nycpug.org) and Philly PUG (www.phlpug.org)

CTO, OpenSCG (www.openscg.com)

http://nyc.pgconf.us

What is a federated database?

"A federated database system is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. ... There is no actual data integration in the constituent disparate databases as a result of data federation." (Wikipedia)

How does PostgreSQL do it?

Uses Foreign Data Wrappers (FDW)

Used with SQL/MED

New ANSI SQL 2003 extension

Management of External Data

Standard way of handling remote objects in SQL databases

Wrappers are used by SQL/MED to access remote data sources
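To see which SQL/MED objects a database already has, the system catalogs can be queried directly. A minimal sketch, assuming only the standard PostgreSQL catalogs pg_foreign_data_wrapper, pg_foreign_server and pg_foreign_table; the output depends on what has been installed.

```sql
-- Installed foreign data wrappers
SELECT fdwname
  FROM pg_foreign_data_wrapper;

-- Foreign servers, the wrapper each one uses, and its options
SELECT s.srvname, w.fdwname, s.srvoptions
  FROM pg_foreign_server s
  JOIN pg_foreign_data_wrapper w ON w.oid = s.srvfdw;

-- Foreign tables, the server each one points at, and its options
SELECT ft.ftrelid::regclass AS foreign_table, s.srvname, ft.ftoptions
  FROM pg_foreign_table ft
  JOIN pg_foreign_server s ON s.oid = ft.ftserver;
```

In psql, the \dew, \des and \det meta-commands give a similar listing.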

Types of Foreign Data Wrappers

SQL

NoSQL

File

Miscellaneous

PostgreSQL

SQL Wrappers

Oracle

MySQL

Informix

Firebird

SQLite

JDBC

ODBC

SQL Wrappers

CREATE SERVER oracle_server
  FOREIGN DATA WRAPPER oracle_fdw
  OPTIONS (dbserver 'ORACLE_DBNAME');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER oracle_server
  OPTIONS (user 'scott', password 'tiger');

CREATE FOREIGN TABLE fdw_test (
  userid   numeric,
  username text,
  email    text
)
SERVER oracle_server
OPTIONS (schema 'scott', table 'fdw_test');

postgres=# select * from fdw_test;
 userid | username |       email
--------+----------+-------------------
      1 | scott    | [email protected]
(1 row)

NoSQL Wrappers

MongoDB

CouchDB

MonetDB

Redis

Neo4j

Kyoto Tycoon

NoSQL Wrappers

CREATE SERVER mongo_server
  FOREIGN DATA WRAPPER mongo_fdw
  OPTIONS (address '192.168.122.47', port '27017');

CREATE FOREIGN TABLE databases (
  _id  NAME,
  name TEXT
)
SERVER mongo_server
OPTIONS (database 'mydb', collection 'pgData');

test=# select * from databases ;
           _id            |    name
--------------------------+------------
 52fd49bfba3ae4ea54afc459 | mongo
 52fd49bfba3ae4ea54afc45a | postgresql
 52fd49bfba3ae4ea54afc45b | oracle
 52fd49bfba3ae4ea54afc45c | mysql
 52fd49bfba3ae4ea54afc45d | redis
 52fd49bfba3ae4ea54afc45e | db2
(6 rows)

File Wrappers

Delimited files

Fixed length files

JSON files

File Wrappers

-- file_fdw ships as a contrib extension and must be installed first
CREATE EXTENSION IF NOT EXISTS file_fdw;

CREATE SERVER pg_load FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE leads (
  first_name   text,
  last_name    text,
  company_name text,
  address      text,
  city         text,
  county       text,
  state        text,
  zip          text,
  phone1       text,
  phone2       text,
  email        text,
  web          text
)
SERVER pg_load
OPTIONS (filename '/tmp/us-500.csv', format 'csv', header 'TRUE');

test=# select first_name || ' ' || last_name as full_name, email from leads limit 3;
     full_name     |             email
-------------------+-------------------------------
 James Butt        | [email protected]
 Josephine Darakjy | [email protected]
 Art Venere        | [email protected]
(3 rows)

Miscellaneous Wrappers

Hadoop

LDAP

S3

WWW

PG-Strom

Hadoop Wrapper

CREATE SERVER hive_server
  FOREIGN DATA WRAPPER hive_fdw
  OPTIONS (address '127.0.0.1', port '10000');

CREATE USER MAPPING FOR PUBLIC SERVER hive_server;

CREATE FOREIGN TABLE order_line (
  ol_w_id        integer,
  ol_d_id        integer,
  ol_o_id        integer,
  ol_number      integer,
  ol_i_id        integer,
  ol_delivery_d  timestamp,
  ol_amount      decimal(6,2),
  ol_supply_w_id integer,
  ol_quantity    decimal(2,0),
  ol_dist_info   varchar(24)
)
SERVER hive_server
OPTIONS (table 'order_line');

INSERT INTO item_sale_month
SELECT ol_i_id as i_id,
       EXTRACT(YEAR FROM ol_delivery_d) as year,
       EXTRACT(MONTH FROM ol_delivery_d) as month,
       sum(ol_amount) as amount
  FROM order_line
 GROUP BY 1, 2, 3;

Hadoop Wrapper

Hadoop foreign tables can also be writable

CREATE FOREIGN TABLE audit (
  audit_id bigint,
  event_d  timestamp,
  "table"  varchar,  -- reserved word, so it must be quoted
  action   varchar,
  "user"   varchar   -- reserved word, so it must be quoted
)
SERVER hive_server
OPTIONS (table 'audit', flume_port '44444');

INSERT INTO audit VALUES (nextval('audit_id_seq'), now(), 'users', 'SELECT', 'scott');

Hadoop Wrapper

It also works with HBase tables

CREATE FOREIGN TABLE hive_hbase_table (
  key   varchar,
  value varchar
)
SERVER localhive
OPTIONS (table 'hbase_table', hbase_address 'localhost', hbase_port '9090', hbase_mapping ':key,cf:val');

INSERT INTO hive_hbase_table VALUES ('key1', 'value1');
INSERT INTO hive_hbase_table VALUES ('key2', 'value2');
UPDATE hive_hbase_table SET value = 'update' WHERE key = 'key2';
DELETE FROM hive_hbase_table WHERE key='key1';
SELECT * from hive_hbase_table;

WWW Wrapper

CREATE SERVER www_fdw_server_google_search FOREIGN DATA WRAPPER www_fdw OPTIONS (uri 'https://ajax.googleapis.com/ajax/services/search/web?v=1.0');

CREATE USER MAPPING FOR current_user SERVER www_fdw_server_google_search;

CREATE FOREIGN TABLE www_fdw_google_search (
  q                  text,
  GsearchResultClass text,
  unescapedUrl       text,
  url                text,
  visibleUrl         text,
  cacheUrl           text,
  title              text,
  titleNoFormatting  text,
  content            text
)
SERVER www_fdw_server_google_search;

select url, substring(title,1,25)||'...', substring(content,1,25)||'...'
  from www_fdw_google_search
 where q='postgresql fdw';
                             url                             |           ?column?           |           ?column?
-------------------------------------------------------------+------------------------------+------------------------------
 http://wiki.postgresql.org/wiki/Foreign_data_wrappers       | Foreign data wrappers - [...]

PostgreSQL Wrapper

test=# explain verbose select airport, flight_date from bird_strikes where flight_date > '2011-01-01';
                            QUERY PLAN
------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..134.54 rows=427 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
(3 rows)

PostgreSQL Wrapper

Sends built-in immutable functions

test=# explain verbose select airport, flight_date from bird_strikes where flight_date > '2011-01-01' and length(airport) < 10;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..135.24 rows=142 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone)) AND ((length(airport) < 10))
(3 rows)

PostgreSQL Wrapper

Writable (INSERT, UPDATE, DELETE)

test=# explain verbose update bird_strikes set airport = 'Unknown' where record_id = 313339;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Update on public.bird_strikes  (cost=100.00..111.05 rows=1 width=964)
   Remote SQL: UPDATE public.bird_strikes SET airport = $2 WHERE ctid = $1
   ->  Foreign Scan on public.bird_strikes  (cost=100.00..111.05 rows=1 width=964)
         Output: aircraft_type, 'Unknown'::character varying, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots, ctid
         Remote SQL: SELECT aircraft_type, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots, ctid FROM public.bird_strikes WHERE ((record_id = 313339)) FOR UPDATE
(5 rows)

PostgreSQL Wrapper

Writes are transactional

test=# select airport from bird_strikes where record_id = 313339;
 airport
---------
 Unknown
(1 row)

test=# BEGIN;
BEGIN
test=# update bird_strikes set airport = 'UNKNOWN' where record_id = 313339;
UPDATE 1
test=# ROLLBACK;
ROLLBACK
test=# select airport from bird_strikes where record_id = 313339;
 airport
---------
 Unknown
(1 row)
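The PostgreSQL Wrapper slides query bird_strikes as a foreign table, but the transcript does not show how it was defined. A minimal sketch of a postgres_fdw setup, assuming a hypothetical server name (remote_pg), host, database and credentials, and with the column types guessed from the plans above; only a few of the table's columns are listed.

```sql
-- postgres_fdw ships as a contrib extension
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Hypothetical remote server; host, port and dbname are placeholders
CREATE SERVER remote_pg
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remote.example.com', port '5432', dbname 'test');

-- Credentials used on the remote side (placeholders)
CREATE USER MAPPING FOR CURRENT_USER
  SERVER remote_pg
  OPTIONS (user 'scott', password 'tiger');

-- Column types are assumptions inferred from the EXPLAIN output
CREATE FOREIGN TABLE bird_strikes (
  record_id   integer,
  airport     varchar,
  flight_date timestamp,
  flight_num  varchar,
  location    varchar
  -- remaining columns omitted
)
SERVER remote_pg
OPTIONS (schema_name 'public', table_name 'bird_strikes');
```

The schema_name and table_name options can be left out when the remote table has the same schema and name as the foreign table.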

Limitations

Aggregates are not pushed down

test=# explain verbose select count(*) from bird_strikes;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Aggregate  (cost=220.92..220.93 rows=1 width=0)
   Output: count(*)
   ->  Foreign Scan on public.bird_strikes  (cost=100.00..212.39 rows=3413 width=0)
         Output: aircraft_type, airport, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots
         Remote SQL: SELECT NULL FROM public.bird_strikes
(5 rows)

Limitations

ORDER BY, GROUP BY, LIMIT not pushed down

test=# explain verbose select flight_num from bird_strikes order by flight_date limit 5;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Limit  (cost=169.66..169.67 rows=5 width=40)
   Output: flight_num, flight_date
   ->  Sort  (cost=169.66..172.86 rows=1280 width=40)
         Output: flight_num, flight_date
         Sort Key: bird_strikes.flight_date
         ->  Foreign Scan on public.bird_strikes  (cost=100.00..148.40 rows=1280 width=40)
               Output: flight_num, flight_date
               Remote SQL: SELECT flight_num, flight_date FROM public.bird_strikes
(8 rows)

Limitations

Joins not pushed down

test=# explain verbose select s.name, b.flight_date
test-#   from bird_strikes b, state_code s
test-#  where b.location = s.abbreviation and flight_date > '2011-01-01';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Hash Join  (cost=239.88..349.95 rows=1986 width=40)
   Output: s.name, b.flight_date
   Hash Cond: ((s.abbreviation)::text = (b.location)::text)
   ->  Foreign Scan on public.state_code s  (cost=100.00..137.90 rows=930 width=64)
         Output: s.id, s.name, s.abbreviation, s.country, s.type, s.sort, s.status, s.occupied, s.notes, s.fips_state, s.assoc_press, s.standard_federal_region, s.census_region, s.census_region_name, s.census_division, s.census_devision_name, s.circuit_court
         Remote SQL: SELECT name, abbreviation FROM public.state_code
   ->  Hash  (cost=134.54..134.54 rows=427 width=40)
         Output: b.flight_date, b.location
         ->  Foreign Scan on public.bird_strikes b  (cost=100.00..134.54 rows=427 width=40)
               Output: b.flight_date, b.location
               Remote SQL: SELECT location, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
(11 rows)
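One way to work around the pushdown limitations shown on these slides is to do the aggregation or join inside a view on the remote server and expose that view as a foreign table, so only the reduced result crosses the network. A minimal sketch, assuming a postgres_fdw server that already exists under the hypothetical name remote_pg; the view name and the strikes column are also made up for illustration.

```sql
-- Run on the REMOTE server: aggregate where the data lives
CREATE VIEW bird_strikes_by_airport AS
SELECT airport, count(*) AS strikes
  FROM bird_strikes
 GROUP BY airport;

-- Run on the LOCAL server: map the remote view as a foreign table
-- (remote_pg is a hypothetical, pre-existing postgres_fdw server)
CREATE FOREIGN TABLE bird_strikes_by_airport (
  airport varchar,
  strikes bigint
)
SERVER remote_pg
OPTIONS (schema_name 'public', table_name 'bird_strikes_by_airport');

-- The local query now receives one row per airport instead of every strike
SELECT airport, strikes
  FROM bird_strikes_by_airport
 WHERE strikes > 100;
```

The trade-off is that each aggregate or join shape needs its own remote view, so this suits a handful of known reporting queries better than ad hoc analysis.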

Limitations (Gotcha)

Sometimes the foreign tables don't act like tables

test=# SELECT l.*, w.lat, w.lng
test-#   FROM leads l, www_fdw_geocoder_google w
test-#  WHERE w.address = l.address || ',' || l.city || ',' || l.state;
 first_name | last_name | company_name | address | city | county | state | zip | phone1 | phone2 | email | web | lat | lng
------------+-----------+--------------+---------+------+--------+-------+-----+--------+--------+-------+-----+-----+-----
(0 rows)

Limitations (Gotcha)

                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Merge Join  (cost=187.47..215.47 rows=1000 width=448)
   Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, w.lat, w.lng
   Merge Cond: ((((((l.address || ','::text) || l.city) || ','::text) || l.state)) = w.address)
   ->  Sort  (cost=37.64..38.14 rows=200 width=384)
         Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, (((((l.address || ','::text) || l.city) || ','::text) || l.state))
         Sort Key: (((((l.address || ','::text) || l.city) || ','::text) || l.state))
         ->  Foreign Scan on public.leads l  (cost=0.00..30.00 rows=200 width=384)
               Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, ((((l.address || ','::text) || l.city) || ','::text) || l.state)
               Foreign File: /tmp/us-500.csv
               Foreign File Size: 81485
   ->  Sort  (cost=149.83..152.33 rows=1000 width=96)
         Output: w.lat, w.lng, w.address
         Sort Key: w.address
         ->  Foreign Scan on public.www_fdw_geocoder_google w  (cost=0.00..100.00 rows=1000 width=96)
               Output: w.lat, w.lng, w.address
               WWW API: Request
(16 rows)

Limitations (Gotcha)

CREATE OR REPLACE FUNCTION google_geocode(
  OUT first_name text, OUT last_name text, OUT company_name text,
  OUT address text, OUT city text, OUT county text, OUT state text,
  OUT zip text, OUT phone1 text, OUT phone2 text, OUT email text,
  OUT web text, OUT lat text, OUT lng text)
RETURNS SETOF RECORD AS $$
DECLARE
  r     record;
  f_adr text;
  l_lat text;
  l_lng text;
BEGIN
  -- Walk the leads one row at a time instead of joining
  FOR r IN SELECT * FROM leads LOOP
    f_adr := r.address || ',' || r.city || ',' || r.state;

    -- One lookup per address, with the address passed as a parameter value
    EXECUTE 'SELECT lat, lng FROM www_fdw_geocoder_google WHERE address = $1'
      INTO l_lat, l_lng
      USING f_adr;

    -- Copy the lead plus its coordinates into the OUT parameters
    SELECT r.first_name, r.last_name, r.company_name, r.address, r.city,
           r.county, r.state, r.zip, r.phone1, r.phone2, r.email, r.web,
           l_lat, l_lng
      INTO first_name, last_name, company_name, address, city,
           county, state, zip, phone1, phone2, email, web,
           lat, lng;
    RETURN NEXT;
  END LOOP;
END
$$ LANGUAGE plpgsql;
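The function above stands in for the join that returned no rows. The plan for that join ends in a bare "WWW API: Request", which suggests the web service was called without any address to look up; looping in PL/pgSQL instead issues one request per lead with a concrete address. A short usage sketch, assuming the leads and www_fdw_geocoder_google foreign tables from the earlier slides exist:

```sql
-- Call the set-returning function like a table
SELECT first_name, last_name, city, state, lat, lng
  FROM google_geocode()
 WHERE state = 'NY';
```

Because a PL/pgSQL function that uses RETURN NEXT builds its whole result set before returning, the WHERE clause here filters only after every lead has been geocoded. To cut down the number of web requests, the filter would have to move into the loop's SELECT inside the function.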

Writing a new FDW

Might not need to write one if there is an HTTP interface

Use the Blackhole FDW as a template: https://bitbucket.org/adunstan/blackhole_fdw

Writing a new FDW

Datum
blackhole_fdw_handler(PG_FUNCTION_ARGS)
{
    ...

    /* these are required */
    fdwroutine->GetForeignRelSize = blackholeGetForeignRelSize;
    fdwroutine->GetForeignPaths = blackholeGetForeignPaths;
    fdwroutine->GetForeignPlan = blackholeGetForeignPlan;
    fdwroutine->BeginForeignScan = blackholeBeginForeignScan;
    fdwroutine->IterateForeignScan = blackholeIterateForeignScan;
    fdwroutine->ReScanForeignScan = blackholeReScanForeignScan;
    fdwroutine->EndForeignScan = blackholeEndForeignScan;

    /* remainder are optional - use NULL if not required */
    /* support for insert / update / delete */
    fdwroutine->AddForeignUpdateTargets = blackholeAddForeignUpdateTargets;
    fdwroutine->PlanForeignModify = blackholePlanForeignModify;
    fdwroutine->BeginForeignModify = blackholeBeginForeignModify;
    fdwroutine->ExecForeignInsert = blackholeExecForeignInsert;
    fdwroutine->ExecForeignUpdate = blackholeExecForeignUpdate;
    fdwroutine->ExecForeignDelete = blackholeExecForeignDelete;
    fdwroutine->EndForeignModify = blackholeEndForeignModify;

    /* support for EXPLAIN */
    fdwroutine->ExplainForeignScan = blackholeExplainForeignScan;
    fdwroutine->ExplainForeignModify = blackholeExplainForeignModify;

    /* support for ANALYSE */
    fdwroutine->AnalyzeForeignTable = blackholeAnalyzeForeignTable;

    PG_RETURN_POINTER(fdwroutine);
}
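The C handler only becomes usable once it is registered in SQL. A minimal sketch of that side, assuming the wrapper is packaged as an extension named blackhole_fdw; the server name, table name and columns are hypothetical, and the real project's install script may differ (for example by also declaring a validator function).

```sql
-- Inside the extension's install script: expose the C handler and
-- create the wrapper. MODULE_PATHNAME is substituted only in extension scripts.
CREATE FUNCTION blackhole_fdw_handler()
  RETURNS fdw_handler
  AS 'MODULE_PATHNAME'
  LANGUAGE C STRICT;

CREATE FOREIGN DATA WRAPPER blackhole_fdw
  HANDLER blackhole_fdw_handler;

-- In a database, after CREATE EXTENSION has run that script:
-- define a server and a foreign table that use the new wrapper
CREATE SERVER blackhole_server
  FOREIGN DATA WRAPPER blackhole_fdw;

CREATE FOREIGN TABLE void_table (
  id      integer,
  payload text
)
SERVER blackhole_server;
```

Every scan or modification of void_table is then routed through the callbacks assigned to fdwroutine in the handler.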

Future

Even more Wrappers

Check Constraints on Foreign Tables

Allows partitioning

Joins

Custom Scan API

Probably will not be the way to do this, but progress is being made

Questions?

[email protected]

@jim_mlodgenski