federated databases for the geosciences csig july 21, 2005 douglas s. greer

32
Federated Databases for the Geosciences CSIG July 21, 2005 Douglas S. Greer

Upload: britney-blair

Post on 02-Jan-2016

214 views

Category:

Documents


0 download

TRANSCRIPT

Federated Databases for the Geosciences

CSIGJuly 21, 2005

Douglas S. Greer

Overview

• Database Federation Primer– Basic concepts and principles– DB2 Information Integrator

• The CHRONOS Federated Database– Integration of 7 independently developed

geoscience databases

Top-Level View of a Federated Database

Applications

FederatedDatabase

Data Source A Data Source DData Source CData Source B

Federated DB Data Sources

• Geographically Distributed Data Sources

• Heterogeneous Data Sources– Relational Databases – most common– Non-relational Sources– Web Pages / Web Services– Flat Files

Federated Databases

• May or may not actually contain data

• Federated database can create Global Views that define data in a uniform way across the data sources

• Applications can then access data through the global view using the standardized SQL schema

IBM DB2 Information Integrator

• Provides a framework for strategic information integration to help applications access, manipulate and integrate diverse and distributed data sources across multiple servers in real time.

• Can access structured and unstructured data types including relational databases such as Oracle, MySQL, PostgreSQL and MS SQL Server

Connecting to the Remote Database

• Step 1 – Create WRAPPER– Mechanism that the federated server uses to

communicate with a data source

– Identifies “Driver” code

• Step 2 – Identify SERVER– Identifies the connection to a data source

– Specifies which WRAPPER to use

– Directly or Indirectly specifies the server name, server type, version, database name and special parameters

Connecting to the Remote Database

• Step 3 – Specify USER MAPPING– Maps between a federated database user and an

authorized user (account and password) of a data source

• Step 4 – Define NICKNAMES– Pointer to a table or view in a data source

– Creates a binding between a local name and the data source name and hides the associated metadata details

A Simple Federated View

CREATE VIEW <Table_Name> AS

SELECT

(Database #1 SQL Command)

UNIONSELECT

(Database #2 SQL Command) UNION

SELECT(Database #3 SQL Command)

Identifying Data Sources

CREATE VIEW <Table_Name> ASSELECT

‘PALEOSTRAT’ AS db_namegenus_id AS genus…FROM PSTRAT.tbl_taxonomy…

UNIONSELECT

‘PALEOBIOLOGY’ AS db_namegenus_name AS genus…

Materialized Views• Federated databases normally do not store data

locally. Data from remote sites is fetched as needed.• Materialized Views create a local copy of a Global-

View.– Advantage: faster access – Disadvantages: Data may be stale. Refreshes required

• Several of the CHRONOS Global-Views have versions that use materialized views to increase performance

CHRONOS Project

• Create a dynamic, interactive and time-calibrated framework for Earth history

• Network of chronostratigraphy databases• Online stratigraphic record• Visualization and analytical tools• Develop a better understanding of fundamental

Earth processes through time

CHRONOS Federated Databases

• The following databases are all part of the CHRONOS Federated Database at SDSC based on IBM’s DB2 Information Integrator– Neptune– PaleoStrat– PaleoBiology– Janus– TimeScale– FAUNMAP– MIOMAP

Neptune Database

• Developed at ETH Zürich and currently hosted by Iowa State University

• Contains microfossil occurrences reported in DSDP and ODP samples

• PostgreSQL based• Contains four basic types of data: Fossil Records,

Taxonomy, Age models and Biogeography data• Schema contains approximately 20 tables with

hundreds of thousands of taxonomic occurrences

PaleoStrat Database

• Developed at Boise State University in collaboration with the CHRONOS

• Designed to support geoscience tools with broad applicability

• Contains sedimentary, paleontologic and stratigraphic data

• MS SQL Server based• Approximately 120 tables with thousands

taxonomic occurrences• Data from other databases currently being loaded

PaleoBiology Database

• Hosted by the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California at Santa Barbara

• Contains collection-based occurrence and taxonomic information about marine and terrestrial animals and plants

• MySQL based• 16 tables with hundreds of thousands of

taxonomic occurrences

Janus Database

• Database for the Integrated Ocean Drilling Program (IODP) hosted at Texas A&M University

• Contains numerous types of ocean drilling data collected by United States, Japanese and European ships

• Oracle based• Approximately 580 tables with millions of

taxonomic occurrences

TimeScale Database

• Contains data and information from the 2004 Global Time Scale of the International commission on Stratigraphy and 19 other time scales

• Supports web service conversions tools• PostgreSQL based• Approximately 25 tables with thousands of data

records

FAUNMAP Database

• Hosted by Illinois State Museum• Contains information about the historical

distribution of mammal species in the United States

• MySQL based• Approximately 30 tables with tens of thousands of

data records

MIOMAP Database

• Hosted by University of California, Berkeley• Contains comprehensive spatial and temporal

analysis of Miocene mammal taxa for the Western United States

• MySQL based• Thousands of records in a relatively small number

of tables

The Taxa Global-View

• Simple View to list taxa in all of the databases

• CHRONOS Taxa– Database Name– Table_Name– Taxon_ID– Genus– Species

Taxa Global View Example

Conop9 Application

• Developed by Peter M. Sadler, Dept. of Earth Sciences, Univ. of California Riverside

• Correlates stratigraphic sections by minimizing the number of inconsistencies in the order of first and last occurrences of fossils between sections

• Originally developed for flat files then adapted to CHRONOS DB2/II global-views

CONOP9 Data Correlation

Conop9 Global View

• Developed for the Conop9 Application• The Conop9 SDSC global-view provides a much

larger collection of data than that available in the older flat file system

• The CHRONOS global-view presents exactly the data needed by Conop9 but uses different SQL statements for each database – this involves joins across four tables in Neptune, seven tables in PaleoStrat and five tables in Janus

Conop9 Global-View Attributes

• CHRONOS Conop Global View Fields– Database Name– Genus– Species– Taxon_id – Used to create Conop9 input tables– Hole_id – Which stratigraphic section does this come from– LAD – Last Appearance Datum, newest observation of this

taxa for this hole– FAD – First Appearance Datum, oldest observation of this

taxa for this hole– LAD and FAD are the result of an SQL computation

Conop9 Global View Example

Age-Depth Plot

Age/Depth Plot Global-Views

• Uniform Global-View of hole location for ADP application

• Surprisingly there are significant differences between databases

• CHRONOS Hole_Summary– Database Name– Hole_ID– Latitude– Longitude

Age/Depth Plot Views

• Uniform Global-View for Hole/Taxa Description for ADP application

• CHRONOS Hole_Desc– Database Name– Hole_ID– Elevation– Meters_of_Section– Taxa_Count

Age/Depth Global View Example

Questions ?