Straightforward Integration of Federated Data

Using standard RDBMS servers to link databases across the internet

1st Annual CASIMIR Symposium, Rome, Nov 27–28, 2007

Walter Pargent, GSF - National Research Center for Environment and Health

1st Ann. CASIMIR symposium, Rome, Nov 28, 2007

Session 3, Walter Pargent

Short Glossary

• RDBMS = Relational Database Management System
• SQL = Structured Query Language

Aims / Topics

• Show standard RDBMS features as options for dynamic data integration
• Discuss database linkage vs. replication
• Technical, not semantic, integration level
• How can technology help semantics?
• Where applicable: ask about others' experiences!

Janan Eppig @ Corfu

Data integration, ETL style

• ETL =
  – Extraction
  – Transformation (and Transfer)
  – Loading
• => REPLICATION

Rule of thumb for data design

• Redundancy abets inconsistency, so better avoid it …
• … wherever possible
  – Exceptions for
    • Performance issues
    • (Ease of implementation)

Linkage

[Diagram labels: Information-Proxy (RDBMS-Server); Web-Interfaces; DataMart (BioMart); SQL-Interface; Web-Services; Master-Database-Servers]

Linkage Security Aspects

• The paranoid option: virtual private networks (VPN) in demilitarized firewall zones

[Diagram labels: Information-Proxy (RDBMS-Server); Local-Transact-Work-DBs; Local Proxies]

Integration@YourDesk

MS-ACCESS

• … as an RDBMS client
• Technical basis
  – Data source names (DSNs) defined for individual table connections
  – Open Database Connectivity (ODBC)
  – Ability to join between heterologous RDBMS tables and (local) spreadsheets
  – Relationship manager (for predefined joins)

Integration@YourDesk

File -> External Data -> Link Tables

Integration@YourDesk

Create an ODBC link

Integration@YourDesk

Tools -> Relationships

Integration@YourDesk

3 in One

• EMMASTR
  – mySQL (Solaris)
• cryoDb (devel)
  – mySQL (Windows)
• MouseNet
  – Sybase ASE (HP-UX)

Integration@YourDesk

EMMA_2_MouseNet: Query Building

Integration@YourDesk

EMMA_2_MouseNet: Results

Integration@YourDesk

• Using a desktop database (MS-Access) is suitable for
  – Small scale and low complexity integration
    • Prototyping
  – Small scale ad hoc replication
  – Small scale data migrations / transformations
  – Ad hoc data editing
• … not suitable for
  – Handling of large datasets (no optimization)

BIG & CENTRAL

Large scale & central server

• Rather use a ‘real’ RDBMS …
• … on a central performant server
• e.g.:
  – mySQL FEDERATED engine
  – Sybase ASE CIS (Component Integration Services)

mySQL

• Table defined on the master server
  # create table xxx ( c1 int, c2 char, … )
      ENGINE=InnoDB;
• Table definition on proxy server
  # create table f_xxx ( c1 int, c2 char, … )
      ENGINE=FEDERATED
      CONNECTION='mysql://usr:PW@host:3306/dbnam/table';

mySQL (5.1)

• Alternative to link many tables (since version 5.1)
  # CREATE SERVER fedlink
      FOREIGN DATA WRAPPER mysql
      OPTIONS ( USER 'fed_user', HOST 'remote_host', PORT 9306, DATABASE 'federated' );
  # CREATE TABLE test_table (col1, col2, …)
      ENGINE=FEDERATED
      CONNECTION='fedlink/test_table';

mySQL (doc)

• Find mySQL documentation for that at:
  – “Storage Engines”
    • “The Federated Storage Engine”
  – http://dev.mysql.com/doc/refman/5.1/en/federated-storage-engine.html

Sybase ASE

ToDo on the Master DATABASE

• Add entry to interfaces file
  # GSFIEG_JAX
      master tcp ether xxx.gsf.de 5555
      query tcp ether xxx.gsf.de 5555
• As sa user on the MASTERDB
  # sp_addserver GSFIEG_JAX
  # sp_addremotelogin GSFIEG_JAX
  # sp_addremotelogin GSFIEG_JAX, pargent, <someOtherLocalUserName>

Sybase ASE

ToDo on the Proxy DATABASE

• Add entry to interfaces file
  # JAX_PUBLIC
      master tcp ether yyy.jax.org 4000
      query tcp ether yyy.jax.org 4000
• As sa user on the ProxyDB
  # sp_addserver JAX_PUBLIC
  # sp_addlogin pargent, <lognamAtJaxDb>, null, null, 'Remote user wapar at jax'

Sybase ASE

Test it on Proxy DB

• Remote Procedure Call (RPC)
  # exec JAX_PUBLIC.mgd..sp_tables
• Example for proxy table
  # create proxy_table jax_vocab
      at 'JAX_PUBLIC.mgd..VOC_vocab'
  # create existing table jax_proc ( c1 int, c2 varchar(10), … )
      external procedure at 'JAX_PUBLIC.mgd..some_proc'

Other RDBMS’s

• PostgreSQL: contrib/dblink()
• IBM DB2 UDB: table nicknames
• Oracle: database links
• MS SQL-Server: ???

Heterologous Links

• Connections between different RDBMS’s
• Implemented using
  – ODBC (universal)
  – protocols specific to individual RDBMS (esp. towards Oracle, DB2, SQL-Server)

Availability …

• … of features for heterologous database integration:
  – mySQL and PostgreSQL (NOT YET)
  – Sybase: ECDA options (Enterprise Connect Data Access)
  – Oracle: Transparent Gateway option
  – IBM DB2: Discovery Link
  – MS SQL-Server: ???

Linkage via Application vs. RDBMS

• Webservices and Webapps link on application level, which is appropriate for sequential / cascading 1:n linkage
  – Typical web-application (HTML tab with links)
  – Web-services
• Must have SQL (or similar interface) to do overall set operations

Why set operations

• Scientific inference needs flexibility
  – Cannot be confined to predefined user interfaces (static query forms)
  – “Find all mouse lines with traits linked to energy metabolism …”
• Need to use deliberate set oriented queries
  – Feasibility study / data quality checks in prototype phase
  – Pre phase consistency checking to investigate data curation needs
  – Prototype functionality and performance checks
  – Data transformation and masking (views)
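The energy-metabolism question above is a set operation: a join between mouse lines held in one database and traits held in another. A minimal sketch in Python, assuming SQLite's ATTACH DATABASE as a stand-in for an RDBMS link (it is not the mySQL FEDERATED or Sybase CIS mechanism from the talk). The table names mouse_line and trait, their columns, and the sample rows are hypothetical:

```python
import sqlite3

# One connection plays the role of the proxy server. The attached in-memory
# database stands in for a remote master database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS phenotypes")

# Hypothetical local table of mouse lines.
conn.execute("CREATE TABLE main.mouse_line (line_code TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO main.mouse_line VALUES (?, ?)",
    [("L001", "Line A"), ("L002", "Line B"), ("L003", "Line C")],
)

# Hypothetical "remote" table of traits recorded per mouse line.
conn.execute("CREATE TABLE phenotypes.trait (line_code TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO phenotypes.trait VALUES (?, ?)",
    [
        ("L001", "energy metabolism"),
        ("L002", "behaviour"),
        ("L003", "energy metabolism"),
        ("L003", "behaviour"),
    ],
)
conn.commit()


def lines_with_trait(category):
    """Return all mouse lines having at least one trait in the given category."""
    rows = conn.execute(
        """
        SELECT DISTINCT m.line_code, m.name
        FROM main.mouse_line AS m
        JOIN phenotypes.trait AS t ON t.line_code = m.line_code
        WHERE t.category = ?
        ORDER BY m.line_code
        """,
        (category,),
    )
    return rows.fetchall()


result = lines_with_trait("energy metabolism")
```

Because both tables are visible through one SQL interface, the question is answered by a single join rather than by following links record by record.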

Integration at the input (write) level

• To reduce subsequent need for data curation, we’ll want to use:
  – Central Controlled Vocabularies
  – Ontology servers … (SQL, Webservices or JavaEnterpriseBeans)
  – Central Source of Linkage objects
    • e.g. a proactive mouseLineDb producing worldwide unique identifiers (codes)
    • c.f. internal situation at GSF (ambiguous line codes)

Replication vs. Federation

• PRO:
  – Data are at hand for performance tuning and data curation
  – Best performance for (client-side) queries
• CON:
  – Storage needs
  – Risk of outdated data
  – Risk of inconsistencies (from duplicates)
  – Expense of regular uploads
    • Network load, script implementation, downtimes

Replication vs. Federation

• PRO:
  – Always up to date
  – No denormalization (no additional risk of inconsistencies)
  – No extra expense for implementation of auto upload
• CON:
  – Several points of failure for service availability
  – Inadequate queries may hamper the master server
    • However, this could be prevented by db usage configuration (available resources, max rowcount, etc … per user …)

Nix is fix (“nothing is fixed”)
(There is more than one way to Rome …)

• Decisions need not / must not be carved in stone …
  – Find the currently appropriate mixture
  – Revalidate decisions regularly (based on query statistics and user feedback)
  – Change easily, whenever …
    • Data volatility changes (prototype -> HTP project)
    • Service reliability changes (for better or worse)
    • For performance issues

Substitute linkage …

[Diagram labels: View: XXX; MasterDB; Proxy-table p_XXX]

… by replication

[Diagram labels: View: XXX; MasterDB; Proxy-table p_XXX; ReplicatedTable: XXX]
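The substitution shown in the two diagrams above can be sketched as follows. This is a minimal illustration in Python with SQLite, assuming an ordinary local table p_xxx plays the role of the proxy table (SQLite has no proxy tables). The columns c1 and c2 and the helper names are hypothetical. Clients query the name xxx throughout: first it is a view over p_xxx, afterwards a replicated table of the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# p_xxx stands in for the proxy table pointing at the master database.
conn.execute("CREATE TABLE p_xxx (c1 INTEGER, c2 TEXT)")
conn.executemany("INSERT INTO p_xxx VALUES (?, ?)", [(1, "a"), (2, "b")])

# Linkage: clients see the view xxx, which reads through the proxy table.
conn.execute("CREATE VIEW xxx AS SELECT c1, c2 FROM p_xxx")
conn.commit()


def client_query(conn):
    """The query a client runs; it never changes when the backing object does."""
    return conn.execute("SELECT c1, c2 FROM xxx ORDER BY c1").fetchall()


def object_type(conn, name):
    """Return 'view' or 'table' for a schema object, or None if it is absent."""
    row = conn.execute(
        "SELECT type FROM sqlite_master WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def substitute_by_replication(conn):
    """Replace the view xxx by a table xxx filled from the proxy table."""
    conn.execute("DROP VIEW xxx")
    conn.execute("CREATE TABLE xxx AS SELECT c1, c2 FROM p_xxx")
    conn.commit()


type_before = object_type(conn, "xxx")
rows_before = client_query(conn)
substitute_by_replication(conn)
type_after = object_type(conn, "xxx")
rows_after = client_query(conn)
```

The client query returns the same rows before and after the switch. Only the object behind the name changes, which is what keeps a later move between linkage and replication cheap.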

Linkage helps replication

• Use linkage of federated data for straightforward implementation of replication
  # insert into replica_table select * from proxy_table
• Use master_tab timestamps (ins_on, upd_on) as filters for incremental replication …
• Use diff_log table for master_tab
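Incremental replication through a linked table can be sketched as follows. This is a minimal illustration in Python with SQLite, assuming a local table named proxy_table stands in for the real proxy table and that ins_on and upd_on hold ISO-8601 text timestamps, so that string comparison matches time order. The columns id and val, the helper names, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both tables share one layout; proxy_table stands in for the linked master table.
for name in ("proxy_table", "replica_table"):
    conn.execute(
        f"CREATE TABLE {name} ("
        "id INTEGER PRIMARY KEY, val TEXT, ins_on TEXT, upd_on TEXT)"
    )


def replicate_since(conn, last_sync):
    """Copy rows inserted or updated after last_sync into the replica."""
    conn.execute(
        "INSERT OR REPLACE INTO replica_table (id, val, ins_on, upd_on) "
        "SELECT id, val, ins_on, upd_on FROM proxy_table "
        "WHERE ins_on > ? OR upd_on > ?",
        (last_sync, last_sync),
    )
    conn.commit()


def snapshot(conn, table):
    """Return all rows of a table in a stable order for comparison."""
    return conn.execute(
        f"SELECT id, val, ins_on, upd_on FROM {table} ORDER BY id"
    ).fetchall()


# Initial master content, then a full copy (filter date precedes all rows).
conn.executemany(
    "INSERT INTO proxy_table VALUES (?, ?, ?, ?)",
    [(1, "a", "2007-11-01", "2007-11-01"), (2, "b", "2007-11-10", "2007-11-10")],
)
replicate_since(conn, "1970-01-01")

# The master changes: one row is updated, one row is added.
conn.execute("UPDATE proxy_table SET val = 'a2', upd_on = '2007-11-20' WHERE id = 1")
conn.execute("INSERT INTO proxy_table VALUES (3, 'c', '2007-11-25', '2007-11-25')")

# Incremental run: only rows touched after the previous sync date are copied.
replicate_since(conn, "2007-11-10")
replica_rows = snapshot(conn, "replica_table")
```

Timestamp filters of this kind do not notice rows deleted on the master. A change log such as the diff_log table mentioned above is presumably one way to cover that case.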

Where to use what

• Replication
  – Where master server is not reliably available or has inadequate performance
  – The master server is not trusted for sustainability (long term persistence)
  – Query load to this part of data is extraordinarily high

Where to use what

• Federation / Linkage
  – Master is well maintained by a “buddy”
  – Query load is intermediate or low
  – High volatility of master data
  – … or just in any case where there is no important CON

Outlook

• Need to do heterologous tests including performance checks
• Ad hoc statistics and complex queries rapidly developed by pure SQL as views or stored procedures (by power-users or bio-informaticians …)
• Combine with BioMart?

Other tools

• RDBMS inherent scheduling and replication options
• 3rd party ETL tools (Extraction, Transformation, Loading)
• Usage of client GUI for SQL stored procedures

Thanks to

• Colleagues at the GSF
• Martin Hrabé de Angelis
• Janan Eppig, Richard Butler and Damian Smedley
• Paul Schofield
