
Post on 01-Jan-2016


Page 1: Bulk Metadata Structures in CERA Frank Toussaint, Michael Lautenschlager Max-Planck-Institut für Meteorologie World Data Center for Climate

Bulk Metadata Structures in CERA

Frank Toussaint, Michael Lautenschlager

Max-Planck-Institut für Meteorologie
World Data Center for Climate

Page 2:

Contents

• Present activities
• General vs. specific metadata
• Present structure at WDC-Climate
• Structural changes in CERA
• Data flows
• Pros and cons of this mode
• What do we gain?

Page 3:

New Catalogue Access

Catalogue access via WWW

• URL parsed by JSP

• integrated DB retrieval by JSP

• response in standard html

• efficient administration of detailed meta information

[Diagram labels: web browser; request: URL; Internet Application Server; Servlet / JSP; http: html; dynamic html pages]
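The request path on this slide (URL in, database lookup, standard html out) can be illustrated with a small sketch. The original system uses JSP on an application server; the Python below is only a stand-in for that flow. The table `entry`, its columns, the `entry_id` URL parameter, and the example host are all invented for illustration and are not taken from CERA.

```python
import html
import sqlite3
from urllib.parse import urlparse, parse_qs


def make_demo_db():
    """Create an in-memory catalogue holding one hypothetical entry."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, title TEXT, contact TEXT)"
    )
    db.execute("INSERT INTO entry VALUES (1, 'Example <experiment>', 'N.N.')")
    return db


def render_entry(db, url):
    """Parse the request URL, fetch the matching row, and return an html page."""
    query = parse_qs(urlparse(url).query)
    entry_id = int(query["entry_id"][0])  # hypothetical parameter name
    row = db.execute(
        "SELECT title, contact FROM entry WHERE entry_id = ?", (entry_id,)
    ).fetchone()
    if row is None:
        return "<html><body><p>No such entry</p></body></html>"
    # Escape database values before placing them in the page.
    title, contact = (html.escape(value) for value in row)
    return f"<html><body><h1>{title}</h1><p>Contact: {contact}</p></body></html>"


db = make_demo_db()
print(render_entry(db, "http://example.org/catalogue?entry_id=1"))
```

The point of the sketch is only that one component handles URL parsing, retrieval, and page generation, which is what the slide attributes to JSP.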

Page 4:

http Metadata Output

Metadata access via WWW:

• xsql query to DB

• xml output from DB by integrated servlet

• xsl mapping to any metadata format

see wini.wdc-climate.de

[Diagram labels: user applications; request: URL; Internet Application Server; xsql query; http: XML; raw xml; xsl mapping; xhtml; ISO xml; DC xml; ... various metadata formats]
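The last step on this slide, mapping raw XML from the database to a target metadata format, can be sketched as follows. The original uses XSQL and XSL; Python's standard library has no XSLT processor, so the mapping is imitated here with a plain lookup table rather than a stylesheet. The raw element names (`ENTRY`, `TITLE`, `CONTACT`, `CREATED`, `SIZE`), the `metadata` wrapper element, and the choice of Dublin Core terms are assumptions for illustration; only the Dublin Core namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from raw database elements to Dublin Core terms.
RAW_TO_DC = {"TITLE": "title", "CONTACT": "creator", "CREATED": "date"}


def raw_to_dublin_core(raw_xml):
    """Map one raw <ENTRY> record to an element holding Dublin Core fields."""
    ET.register_namespace("dc", DC_NS)
    source = ET.fromstring(raw_xml)
    target = ET.Element("metadata")
    for child in source:
        dc_name = RAW_TO_DC.get(child.tag)
        if dc_name is None:
            continue  # raw fields without a mapping are left out of this format
        ET.SubElement(target, f"{{{DC_NS}}}{dc_name}").text = child.text
    return target


raw = "<ENTRY><TITLE>Example run</TITLE><CONTACT>N.N.</CONTACT><SIZE>42</SIZE></ENTRY>"
dc = raw_to_dublin_core(raw)
print(ET.tostring(dc, encoding="unicode"))
```

Supporting another output format would mean another mapping table here, which mirrors the slide's idea of one raw XML source and one xsl mapping per metadata format.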

Page 5:

Types of Metadata

MD type      General/Catalogue MD                          Specific/Use MD
properties   canonical, general                            branch specific, detailed
use          browse, search, retrieval                     interpretation, processing
content      title, contacts, dates, space-time coverage…  grids, setups, platform/sensor…
complexity   low diversity, relatively high stability      high diversity, low stability
user         catalogue visitors (all scientists)           user of the data (branch specific)

Page 6:

Forms of Specific Metadata

• grib headers with/without code tables
• NetCDF(-CF) headers
• xml files – structure definitions in xsd

…in addition:

• handwritten notes
• programme inline comments
• phone calls

Page 7:

Present Structure at WDCC

[Diagram labels: raw data; postprocessed data; data products]

• e.g., homogeneous grids: stored in present CERA for general scientific use (CERA Module DATA_ORG)

• inhomogeneous grids: experts only

Page 8:

CERA Modules

5 Modules (3 in use):

• DATA_ACCESS: for automated data access

• DATA_ORG: organization of grid data

• CODE: model code numbers

1 submodule

Page 9:

Page 10:

Appendix for Bulk MD: XML and other

Appendix for bulk data

• type of appendix incl. version

• xml, xsd, xsl techniques for catalogue display

• txt files to view

• other formats for download

possible types:

• numerical grid description

• model/experiment description
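The three ways of presenting an appendix listed above (catalogue display for xml, viewing for txt, download for everything else) can be sketched as a small record plus a dispatch function. This is a hedged illustration only: the class `Appendix`, its field names, and the returned labels are hypothetical and do not describe the actual CERA tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appendix:
    """Hypothetical record for one bulk appendix."""
    appendix_type: str  # e.g. "numerical grid description"
    version: str        # version of the appendix type
    file_format: str    # e.g. "xml", "txt", "pdf"


def presentation_mode(appendix):
    """Choose how the catalogue offers an appendix, following the slide's three cases."""
    fmt = appendix.file_format.lower()
    if fmt == "xml":
        return "catalogue display"  # rendered with xsd/xsl techniques
    if fmt == "txt":
        return "view"
    return "download"


grid = Appendix("numerical grid description", "1.0", "xml")
print(presentation_mode(grid))
```

Keeping the type and version on the record is what would let a display layer pick the matching xsd and xsl for an xml appendix.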

Page 11:

Data flows: Input

[Diagram labels: xsl mapping; xml metadata format; xsd definitions; specific MD as bulk data; general MD as table content; control; tables; bulk xml]

Page 12:

Data flows: xml Output

[Diagram labels: xsl mapping; xml metadata format; xsd definitions; specific MD as bulk data; general MD as table content; not used]

Page 13:

Data flows: Catalogue Output

[Diagram labels: xsl mapping; xml metadata format; xsd definitions; specific MD as bulk data; general MD as table content; user display; download on request; not used]

Page 14:

Concept of Appendix for Bulk Data

Pros

• Data structure discussion decouples from data storage technique
• maximum flexibility
• easy catalogue integrated display for xml
• low effort
• access rights separate from main metadata
• stable xml structures later can be migrated to table structures

Page 15:

Concept of Appendix for Bulk Data

Cons

• search mechanisms on stored data are between crude and none

• …

Page 16:

Data Storage Problem and Numerical Model Description Problem

Which problems are solved by this concept?

Which problems are created?

Which problems persist?

diversity & time changes of specific data

…

we do not yet have a structural concept…

…responsibility of scientific specialists?

Page 17:

The Structures of Bulk Data

Minimum requirements

• every data bulk needs a name, format, size
• it may have contact persons, access constraints …

Bulk metadata as XML

• extraction & display of defined information
• undefined data is stored but not displayed

Other bulk metadata

• text files as name lists, source codes, run scripts, … for display
• docs as pdf
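The rule stated on this slide for XML bulk metadata (defined information is extracted and displayed, undefined data is stored but not displayed) can be sketched as below. This is a minimal illustration under assumptions: the element names `name`, `format`, `size`, `contact`, and `access_constraints`, the flat record layout, and the function name are hypothetical; the slide only names the kinds of information, not a schema.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("name", "format", "size")          # every data bulk needs these
OPTIONAL = ("contact", "access_constraints")   # it may have these


def split_bulk_metadata(bulk_xml):
    """Return (displayed, stored_only) dicts for one flat bulk metadata record."""
    root = ET.fromstring(bulk_xml)
    fields = {child.tag: (child.text or "").strip() for child in root}
    missing = [tag for tag in REQUIRED if not fields.get(tag)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    defined = REQUIRED + OPTIONAL
    # Defined information is extracted for display ...
    displayed = {tag: fields[tag] for tag in defined if tag in fields}
    # ... everything else is kept in storage but not shown.
    stored_only = {tag: text for tag, text in fields.items() if tag not in defined}
    return displayed, stored_only


record = (
    "<bulk><name>grid_a</name><format>xml</format><size>1024</size>"
    "<solver>spectral</solver></bulk>"
)
shown, hidden = split_bulk_metadata(record)
print(shown)
print(hidden)
```

In this sketch nothing is discarded: an element that is undefined today stays in the stored record and could be displayed later once a definition for it is added.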

Page 18: