printed by

www.postersession.com

STORING AND MANIPULATING GRIDDED DATA IN SPATIALLY-ENABLED DATABASES

Adit Santokhee, Jon Blower, Keith Haines

Reading e-Science Centre, Environmental Systems Science Centre

Modern computer simulations and satellite observations of the oceans and atmosphere produce large amounts of data on the terabyte scale. Data providers, such as the Met Office and the European Centre for Medium-Range Weather Forecasts, need a manageable system for storing these datasets, whilst enabling the many consumers of the data to access them in a convenient and secure manner. Typically, these datasets are stored in flat files (often compressed) and each institution tends to store data in its own format (e.g., NetCDF, HDF, GRIB) with the data discretized on a variety of grids.

End-users of the data (which include research institutions, government agencies and private industry) should not have to know the details of how the data are stored. They require a flexible means of accessing data and downloading them in the form they prefer. A typical query might involve the extraction of a subset of data from multiple source files, interpolation, aggregation and re-projection on a new grid.

There is increasing justification for using database management systems (DBMSs) to store and manipulate gridded data. The principal advantages of such databases are data integrity, consistency, flexibility and effective access to data by diverse users of multiple applications. Implementing an efficient DBMS for large quantities of gridded data is very challenging.

Barrodale Computing Services Ltd. (BCS) have recently developed a software module, the BCS Grid DataBlade, that plugs into the IBM/Informix Dynamic Server 9.x (IDS) DBMS to store gridded data and retrieve data products efficiently. The Reading e-Science Centre are evaluating this system on behalf of the environmental science community.

Processes queries on the database server, thereby minimizing the amount of network input/output and client-side CPU time required

Extracts data products up to 50-100 times faster than previous technology

Handles 1D, 2D, 3D and 4D grids

Stores grids using a tiling scheme in conjunction with Smart BLOBs, with user control over the tile size. This allows very efficient generation of data products that involve only a small portion of the data, because only the tiles overlapping the requested region need to be read

Stores the data in, and converts it between, more than 40 different planar mapping projections supported by the IBM/Informix Spatial DataBlade

Supports irregularly spaced grids in any or all of the grid dimensions

Handles multiple vector and/or scalar values at each grid point

Provides interpolation options using N-Linear, nearest-neighbour or user-supplied interpolation schemes

Extraction can be at any angle through the 4D volume

Native Import/Export format is NetCDF; conventions defined in Grid Import-Export format (GIEF)

Provides application programming interfaces for C, Java and SQL
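Counting the tiles a request overlaps shows why tiling helps when a product needs only a small part of the data. The following Python sketch assumes a simple regular tiling in which each tile holds a fixed block of indices along every dimension. The function names, grid shape and tile size are hypothetical and do not describe the DataBlade's actual storage layout.

```python
from itertools import product
from math import ceil

def tiles_touched(lo, hi, tile_size):
    """Tile coordinates overlapped by the inclusive index box [lo, hi].

    lo, hi and tile_size hold one entry per grid dimension. Assumes a
    hypothetical regular tiling in which tile k along a dimension covers
    indices k*size .. (k+1)*size - 1.
    """
    ranges = [range(first // size, last // size + 1)
              for first, last, size in zip(lo, hi, tile_size)]
    return list(product(*ranges))

def tile_count(shape, tile_size):
    """Total number of tiles needed to cover a grid of the given shape."""
    n = 1
    for extent, size in zip(shape, tile_size):
        n *= ceil(extent / size)
    return n

shape = (175, 20, 1440, 2880)  # hypothetical (time, depth, lat, lon) sizes
tile = (1, 1, 64, 64)          # hypothetical user-chosen tile size
# Single-point timeseries: all 175 time steps at one depth, latitude and longitude.
touched = tiles_touched((0, 0, 800, 1200), (174, 0, 800, 1200), tile)
print(len(touched), "of", tile_count(shape, tile), "tiles read")  # 175 of 3622500 tiles read
```

With these illustrative numbers, the timeseries touches 175 of about 3.6 million tiles. Larger tiles mean fewer tiles overall but more unneeded values read from each tile a query touches. User control over tile size lets this trade-off be tuned to the expected queries.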

Introduction

Features of the Grid DataBlade

Example Uses

Some Applications

Progress Made So Far

We have successfully used the Grid DataBlade to store about 12 GB of Forecasting Ocean Assimilation Model (FOAM) data (temperature and salinity) in an Informix database.

We then tested the functionality of the Grid DataBlade: extracting data, updating a grid, generating temperature timeseries from data extracted across multiple grids, and exporting data to files or for visualisation. These experiments were carried out with programs written in SQL and Java, using the native interfaces offered by the DataBlade and the Informix APIs, respectively.

Loading a GIEF file into a table

execute procedure grdfromgief('pathname', 'table name');

Extracting a subset of the grid

The following expression generates a timeseries of temperature at latitude 50, longitude -30 and the 5 m depth level, from 1 January 2004 to 30 June 2004, using grids stored in the database:

select GRDExtract (grid, '((translation -30.0 50.0 0 0) (dim_names time depth lat lon)(dim_sizes 175 1 1 1) (affine_transformation 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0) (nonuniform time 7305 …… 7480)(nonuniform depth 5)(interpolation (time linear)))'::grdspec) from foamvar where grid_id <= 6;

Future Work

Carrying out detailed experiments to compare the performance of the DataBlade with traditional file-based data access.

Supporting threshold queries directly on the database server, for instance finding all regions where the temperature is above or below a given value.

Creating virtual datasets: for example, computing density on the database server from the temperature and salinity data already stored in the database.

Adding functionality to answer queries such as: given a grid containing both salinity and temperature, which salinity values correspond to a particular temperature?
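The threshold, virtual-dataset and salinity-lookup items above are all cell-by-cell operations on co-located temperature and salinity grids. The following Python sketch shows what each would compute on a small in-memory grid. The function names, the matching tolerance and the linear equation of state are illustrative assumptions, not part of the Grid DataBlade. The equation-of-state coefficients are typical textbook values. A real application would use a full seawater equation of state.

```python
# Grids are nested lists indexed [lat][lon]; temperature in deg C, salinity in psu.

def cells_above(temp, threshold):
    """Threshold query: (lat_idx, lon_idx) of every cell warmer than threshold."""
    return [(j, i) for j, row in enumerate(temp)
            for i, t in enumerate(row) if t > threshold]

def salinity_at_temperature(temp, salt, target, tol=0.5):
    """Salinity values at cells whose temperature lies within tol of target."""
    return [s for trow, srow in zip(temp, salt)
            for t, s in zip(trow, srow) if abs(t - target) <= tol]

# Virtual dataset: density from a simplified linear equation of state,
#   rho = RHO0 * (1 - ALPHA * (T - T0) + BETA * (S - S0)),
# with typical textbook coefficients (illustrative only).
RHO0, T0, S0 = 1027.0, 10.0, 35.0   # kg/m^3, deg C, psu
ALPHA, BETA = 1.7e-4, 7.6e-4        # 1/K, 1/psu

def density_grid(temp, salt):
    """Density computed cell by cell from co-located temperature and salinity grids."""
    return [[RHO0 * (1 - ALPHA * (t - T0) + BETA * (s - S0))
             for t, s in zip(trow, srow)]
            for trow, srow in zip(temp, salt)]

temp = [[12.0, 15.5], [9.8, 15.2]]
salt = [[35.1, 35.6], [34.9, 35.4]]
print(cells_above(temp, 15.0))                    # [(0, 1), (1, 1)]
print(salinity_at_temperature(temp, salt, 15.3))  # [35.6, 35.4]
print(density_grid(temp, salt))
```

Computing density this way on the server, rather than storing it, is what makes the dataset "virtual". Only temperature and salinity occupy storage, and the derived field is produced on demand.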

[Figure: grid metadata and a diagram of the grid in X, Y and Depth]

The metadata above describes a grid storing temperature data from the FOAM eighth-degree model at various depth levels and times (denoted by nonunisample1 and nonunisample2, respectively). The grid starts at longitude -98.5, latitude 10. Each dimension has a basis vector that specifies which axis varies fastest and by how much; in this case longitude varies fastest, with a spacing of 0.125 degrees.
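The origin and basis vectors are enough to recover the position of any grid point, and the fastest-varying axis determines the storage order. The following Python sketch works on a 2D horizontal slice using the origin and longitude spacing given above. The 0.125-degree latitude spacing, the number of longitudes and the helper names are assumptions made for illustration.

```python
ORIGIN = (-98.5, 10.0)      # (lon, lat) of the first grid point
BASIS_LON = (0.125, 0.0)    # (lon, lat) step per longitude index
BASIS_LAT = (0.0, 0.125)    # (lon, lat) step per latitude index (assumed spacing)
N_LON = 8                   # hypothetical number of longitudes per row

def coords(lat_idx, lon_idx):
    """(lon, lat) = ORIGIN + lat_idx * BASIS_LAT + lon_idx * BASIS_LON."""
    lon = ORIGIN[0] + lat_idx * BASIS_LAT[0] + lon_idx * BASIS_LON[0]
    lat = ORIGIN[1] + lat_idx * BASIS_LAT[1] + lon_idx * BASIS_LON[1]
    return lon, lat

def flat_offset(lat_idx, lon_idx):
    """Position in storage order when longitude varies fastest."""
    return lat_idx * N_LON + lon_idx

def unflatten(offset):
    """Inverse of flat_offset: recover (lat_idx, lon_idx)."""
    return divmod(offset, N_LON)

print(coords(0, 0))       # (-98.5, 10.0)
print(coords(2, 3))       # (-98.125, 10.25)
print(flat_offset(2, 3))  # 19
print(unflatten(19))      # (2, 3)
```

Because longitude varies fastest, neighbouring longitudes are adjacent in storage. Stepping one latitude row skips N_LON values.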

Exporting a grid to a file

The following expression exports to a GIEF file a temperature grid that starts at latitude -89, longitude 0 and extends to latitude 89. It covers all 360 degrees of longitude at one-degree spacing (179 × 360 points) and is sampled at the 5 m depth level at time 6940 (1 January 2003):

select grdrowtogief('${curdir}/Tempvar.nc', 'foamvar', rowid, '((translation 0 -89 0 0) (dim_names time depth lat lon)(dim_sizes 1 1 179 360)(affine_transformation 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0)(nonuniform time 6940) (nonuniform depth 5))'::grdspec) from foamvar where grid_id = 13;
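The grdspec strings in these examples combine a translation, an affine transformation and explicitly listed (nonuniform) time and depth values. The following Python sketch shows one reading of the export example that is consistent with the text. The 16 affine values form a row-major 4 × 4 matrix that maps index order (time, depth, lat, lon) onto the translation order (lon, lat, depth, time). The sketch also assumes that time values count days from 1 January 1984, which matches both 6940 = 1 January 2003 and 7305 = 1 January 2004 as given in the text. Both points are our interpretation, not documented Grid DataBlade behaviour.

```python
from datetime import date, timedelta

# Values copied from the export example's grdspec.
TRANSLATION = (0.0, -89.0, 0.0, 0.0)   # assumed order: (lon, lat, depth, time)
AFFINE = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
DIM_SIZES = (1, 1, 179, 360)           # index order: (time, depth, lat, lon)
EPOCH = date(1984, 1, 1)               # assumed time origin, inferred from the text

def apply_affine(index):
    """Map an index tuple (time, depth, lat, lon) to (lon, lat, depth, time).

    Treats AFFINE as a row-major 4x4 matrix and adds TRANSLATION. Time and
    depth are listed explicitly (nonuniform), so only the first two outputs
    are meaningful coordinates here.
    """
    rows = [AFFINE[r * 4:(r + 1) * 4] for r in range(4)]
    return tuple(offset + sum(a * i for a, i in zip(row, index))
                 for offset, row in zip(TRANSLATION, rows))

def day_number_to_date(days):
    """Convert a time value, assumed to count days since EPOCH, to a date."""
    return EPOCH + timedelta(days=days)

first = apply_affine((0, 0, 0, 0))
last = apply_affine(tuple(n - 1 for n in DIM_SIZES))
print(first[:2], last[:2])        # (0.0, -89.0) (359.0, 89.0)
print(day_number_to_date(6940))   # 2003-01-01
print(day_number_to_date(7305))   # 2004-01-01
```

Under this reading, the export covers longitudes 0 to 359 and latitudes -89 to 89. The same decoding places the extraction example's single point at longitude -30, latitude 50.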

The U.S. National Library of Medicine granted BCS access to their Visible Human Project data, consisting of 1,871 parallel high-resolution coloured images of a male cadaver. BCS then subsampled the data to form a 1.6-gigabyte 3D gridded dataset. Users can query the Grid DataBlade on the BCS website to extract 2D slices showing cross-sections of the human body.
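Extracting a 2D slice from a 3D grid such as the Visible Human dataset amounts to sampling the grid on a plane and interpolating between grid points. This includes slices taken at any angle, one of the DataBlade features listed earlier. The following Python sketch uses trilinear interpolation (N-linear with N = 3) on a small synthetic grid. The plane parameterisation and function names are hypothetical, and this is not the DataBlade's implementation.

```python
from math import floor

def trilinear(grid, x, y, z):
    """Trilinear interpolation of grid[z][y][x] at fractional indices (x, y, z).

    Assumes every dimension has at least two points; coordinates outside
    the grid are clamped to its edges.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    x = min(max(x, 0.0), nx - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    z = min(max(z, 0.0), nz - 1.0)
    # Lower corner of the enclosing cell (kept one below the last index).
    x0, y0, z0 = min(floor(x), nx - 2), min(floor(y), ny - 2), min(floor(z), nz - 2)
    fx, fy, fz = x - x0, y - y0, z - z0
    value = 0.0
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                weight = ((fx if dx else 1 - fx) *
                          (fy if dy else 1 - fy) *
                          (fz if dz else 1 - fz))
                value += weight * grid[z0 + dz][y0 + dy][x0 + dx]
    return value

def slice_plane(grid, origin, u, v, nu, nv):
    """Sample an nv-by-nu slice on the plane origin + i*u + j*v (index space)."""
    return [[trilinear(grid, *(o + i * a + j * b for o, a, b in zip(origin, u, v)))
             for i in range(nu)]
            for j in range(nv)]

# Synthetic 4x4x4 grid holding a linear field, which trilinear interpolation reproduces exactly.
N = 4
grid = [[[x + 10 * y + 100 * z for x in range(N)] for y in range(N)] for z in range(N)]

# A plane tilted 45 degrees between the x and z axes.
s = slice_plane(grid, origin=(0.0, 0.0, 0.0), u=(0.5, 0.0, 0.5), v=(0.0, 1.0, 0.0), nu=3, nv=2)
print(s)  # [[0.0, 50.5, 101.0], [10.0, 60.5, 111.0]]
```

A linear test field is used so that the interpolated slice can be checked by hand. On real image data, trilinear weights blend the eight voxels surrounding each sample point.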

U.S. Navy pilots can use PC-based flight simulation software to train on real-life scenarios, including forecast weather patterns, visibility, and wind speed and direction. The Grid DataBlade extracts time-specific, location-specific weather data from a four-dimensional gridded dataset housed in IDS version 9.3 and passes it to trainees running the flight simulation on a PC.
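Picking out time-specific, location-specific values from a 4D grid whose axes may be irregularly spaced can be done with a nearest-neighbour search along each axis. Both irregular spacing and nearest-neighbour interpolation appear in the feature list. The following Python sketch uses binary search on sorted coordinate lists. The axes, the encoded test values and the function names are hypothetical and do not reflect how the DataBlade or the flight-simulation system is implemented.

```python
from bisect import bisect_left

def nearest_index(axis, value):
    """Index of the coordinate in a sorted, possibly irregular axis closest to value.

    Values outside the axis snap to the first or last point; ties go to the lower index.
    """
    k = bisect_left(axis, value)
    if k == 0:
        return 0
    if k == len(axis):
        return len(axis) - 1
    return k if axis[k] - value < value - axis[k - 1] else k - 1

def nearest_value(grid, axes, point):
    """Nearest-neighbour lookup in a 4D nested-list grid indexed [t][z][y][x]."""
    t, z, y, x = (nearest_index(axis, p) for axis, p in zip(axes, point))
    return grid[t][z][y][x]

# Hypothetical irregular axes: forecast hour, altitude (m), latitude, longitude.
times = [0.0, 3.0, 6.0, 12.0]
alts = [0.0, 500.0, 1500.0, 3000.0]
lats = [50.0, 50.5, 51.0]
lons = [-31.0, -30.5, -30.0]
# Encode each cell's indices in its value so the lookup is easy to check.
grid = [[[[1000 * t + 100 * z + 10 * y + x for x in range(len(lons))]
          for y in range(len(lats))]
         for z in range(len(alts))]
        for t in range(len(times))]

print(nearest_value(grid, (times, alts, lats, lons), (4.0, 1200.0, 50.6, -30.2)))  # 1212
```

Binary search keeps each axis lookup logarithmic in the axis length, so irregular spacing costs little compared with a regular grid, where the index could be computed directly.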

Barrodale Computing Services Ltd. www.barrodale.com

http://www.barrodale.com/flightpath/index.html

http://www.barrodale.com/grid_Demo/GridBladeApplet.html

Acknowledgements

We are grateful to Ian Barrodale and Cedric Zala from Barrodale Computing Services Ltd. for kindly providing us with an evaluation version of the Grid DataBlade and for assistance in using it. Special thanks also go to John Pickford from IBM for providing us with a copy of the Informix Dynamic Server and for support.
