INTERCHANGE TECHNOLOGY FOR APPLICATIONS TO FACILITATE GENERIC
ACCESS TO HETEROGENEOUS DATA FORMATS
IEEE International Geoscience and Remote Sensing Symposium, 2002
RAHUL RAMACHANDRAN
INFORMATION TECHNOLOGY AND SYSTEMS CENTER
UNIVERSITY OF ALABAMA IN HUNTSVILLE
Earth Science Data Characteristics
• Different formats, types and structures (18 and counting for Atmospheric Science alone!)
• Different states of processing (raw, calibrated, derived, modeled or interpreted)
• Enormous volumes
• Data usability problem
(Figure: examples of formats in use: HDF, HDF-EOS, netCDF, ASCII, Binary, GRIB)
Data Usability Problem
(Figure: an application reaches Data Formats 1, 2 and 3 through separate readers (Reader 1, Reader 2) or a format converter)
• Specialized code for every format
  – Difficult to assimilate new data types
• Enforce a standard data format
  – Not practical for legacy datasets
ESML Solution
(Figure: the application calls the ESML library, which reads Data Formats 1, 2 and 3, each described by its own ESML file)
• Data interoperability for applications
• “Define Once, Use Anywhere” philosophy
What is ESML?
• Earth Science Markup Language (ESML)
• Specialized markup language for Earth Science metadata, based on XML
• Machine-readable and -interpretable representation of the structure and content of any data file, regardless of data format
• External metadata files that can be generated by either the data producer or the data consumer (at the collection, data set and/or granule level)
• ESML will provide the benefits of a standard, self-describing data format (like HDF, HDF-EOS, netCDF, GeoTIFF, …) without the cost of data conversion
• Basis for core Interchange Technology
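To make "regardless of data format" concrete, here is a minimal Python sketch of a description-driven reader, assuming a much simplified stand-in for an ESML description (a format name plus an ordered list of fields). The names `FieldSpec`, `Description` and `read` are hypothetical and are not the ESML library's API. The point is only that the same reading code handles an ASCII file and a little-endian binary file, because the layout lives in the description rather than in the code.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """One field in a (hypothetical, simplified) format description."""
    name: str
    count: int              # number of values stored for this field
    binary_type: str = "i"  # struct type code, used only for binary files


@dataclass
class Description:
    """Stand-in for an external description file; not the real ESML schema."""
    fmt: str                # "ascii" or "binary"
    fields: list = field(default_factory=list)
    little_endian: bool = True


def read(desc: Description, raw: bytes) -> dict:
    """Generic reader: walks the fields in the order the description lists them."""
    out = {}
    if desc.fmt == "ascii":
        tokens = raw.decode("ascii").split()  # whitespace-separated values
        pos = 0
        for f in desc.fields:
            out[f.name] = [float(t) for t in tokens[pos:pos + f.count]]
            pos += f.count
    elif desc.fmt == "binary":
        prefix = "<" if desc.little_endian else ">"
        offset = 0
        for f in desc.fields:
            layout = f"{prefix}{f.count}{f.binary_type}"
            out[f.name] = [float(v) for v in struct.unpack_from(layout, raw, offset)]
            offset += struct.calcsize(layout)
    else:
        raise ValueError(f"unsupported format: {desc.fmt}")
    return out


# The same logical content stored two different ways, read by the same function.
fields = [FieldSpec("size", 2), FieldSpec("temp", 3)]
ascii_values = read(Description("ascii", fields), b"2 3\n270 271 272\n")
binary_values = read(Description("binary", fields), struct.pack("<5i", 2, 3, 270, 271, 272))
```

Supporting another layout here means writing another `Description`, not another reader, which is the trade-off the "Define Once, Use Anywhere" slide argues for.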
Components of the Interchange Technology
(Figure: Data Formats 1, 2, 3 and other formats, each described by an ESML file; the ESML files, ESML schema, ESML editor and ESML library make up the interchange technology, which serves the ESML Data Browser, the ADaM Data Mining System and other applications)
ESML consists of:
• Markups/descriptions (the ESML files)
• Rules for the markups (the ESML schema)
• Middleware for automation (the ESML library)
Scientists/Application Developers Role
(Figure: data producers or consumers describe Data Formats 1, 2, 3 and other formats in ESML files, using the ESML editor and the ESML schema; application developers build the ESML Data Browser, the ADaM Data Mining System and other applications on the ESML library)
ESML Schema
• Content metadata describe the contents of a file in human-readable and machine-readable terms. Examples are ECS and FGDC metadata.
• Syntactic metadata describe the structure of the file in machine-readable and -interpretable terms. For HDF-EOS files, HDF-EOS and HDF provide this mechanism.
• Semantic metadata describe the contents of a file in machine-readable terms so that an application can interpret the data intelligently. Semantic metadata are embedded in the syntactic metadata.
ESML Content Metadata
• Content metadata are mainly used for human understanding and web searching
• Content metadata in ESML are derived from FGDC and ECS metadata sets
• Tools are planned for converting existing ECS metadata into ESML
Syntactic Metadata in Earth Science Markup Language
• Free formats:
  – Descriptions of the structures of data files in two basic data formats, Binary and ASCII
• Self-describing formats:
  – Descriptions of the structures of self-describing data files through their internal metadata
  – HDF-EOS
• Other data formats
  – GRIB, McIDAS
• Formats to be added
  – HDF, CDF, netCDF, etc.
Semantic Metadata in ESML
• Goals
  – Get the right data
  – Navigate the data
    • Figure out within the data what is latitude, longitude, time and/or coordinate system
  – Read the data in actual scientific units, for example:
    • Data stored as packed values to save space
    • An equation is applied during processing
• Embedded in syntactic metadata
• Provides semantic descriptions for Time, Geo (Lat/Lon) and Data fields
• Provides capabilities to define equations for data conversion for Time, Geo and Data fields, e.g. Y = mX + b, where m is the scale and b is the offset
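The scale-and-offset equation Y = mX + b is simple enough to show directly. The sketch below, with a hypothetical `to_physical` helper, applies it to a list of stored values. It illustrates the arithmetic only and is not a claim about how the ESML library evaluates equations internally.

```python
def to_physical(stored, scale, offset=0.0):
    """Convert stored values X to scientific units with Y = scale * X + offset."""
    return [scale * x + offset for x in stored]


# Example: m = 0.15 and b = -3, the SST factors used in the TMI example of this talk.
sst_celsius = to_physical([0, 100, 250], scale=0.15, offset=-3.0)
```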
Need For Semantic Metadata: Example
TMI Data Product README File
The data values between 0 and 250 need to be scaled to obtain meaningful geophysical data. To scale the data, multiply by the scale factors listed below:
T: multiply by 6.0 to get minute of day (0 – 1440)
S: multiply by 0.15 AND subtract 3 to get SST (-3 - 34.5 C)
W: multiply by 0.15 to get 10 m winds (0 - 37.5 m/s)
V: multiply by 0.3 to get water vapor (0 - 75 mm)
L: multiply by 0.01 to get cloud liquid water (0 - 2.5 mm)
R: multiply by 0.1 to get rain rate (0 - 25 mm/hr)
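The README table maps directly onto a small lookup structure. This sketch transcribes its scale factors and offsets into a Python dictionary and decodes raw bytes with it. `TMI_SCALING` and `decode` are illustrative names, and returning `None` for byte values above 250 is an assumption: the excerpt only says that values between 0 and 250 are to be scaled.

```python
# Scale factors and offsets transcribed from the TMI README excerpt.
TMI_SCALING = {
    "T": (6.0, 0.0),    # minute of day
    "S": (0.15, -3.0),  # SST, degrees C
    "W": (0.15, 0.0),   # 10 m wind speed, m/s
    "V": (0.3, 0.0),    # water vapor, mm
    "L": (0.01, 0.0),   # cloud liquid water, mm
    "R": (0.1, 0.0),    # rain rate, mm/hr
}


def decode(code, raw):
    """Scale raw byte values for one TMI variable.

    Assumption: only 0..250 are data; any larger byte value is returned as None.
    """
    scale, offset = TMI_SCALING[code]
    return [scale * b + offset if b <= 250 else None for b in raw]


sst = decode("S", bytes([0, 100, 250, 255]))
minutes = decode("T", bytes([240]))
```

Written this way, the conversion rules sit in application code; ESML moves the same information into the description file, as the markup for the S field (equation="0.15 * X -3") shows.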
Semantic Metadata for TMI Products
<SyntacticMetaData>
  <Binary>
    <BinaryStructure geoInfo="ByProjection" instances="1">
      <Projection LowRight_X="360" LowRight_Y="-40" UpLeft_X="0" UpLeft_Y="40">
        <Geographic latOffset="-0.125" lonOffset="-0.125"/>
      </Projection>
      <Array occurs="320">
        <Array occurs="1440">
          <Field name="SST" type="UInt8" order="LittleEndian" size="8">
            <Data unit="C " equation="0.15 * X -3"/>
          </Field>
        </Array>
      </Array>
      ...
</SyntacticMetaData>
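The markup above carries enough for a program to work out the grid and the conversion. The Python sketch below runs the standard `xml.etree.ElementTree` module over the TMI description, trimmed to the elements shown and closed so that it parses (the elided `...` part is simply left out). It derives the grid shape from the nested `Array occurs` values, the cell size from the `Projection` corners, and the scale and offset from the `equation` attribute. The helper names, the restriction to linear `m * X ± b` equations, and reading `size` as a bit count are assumptions of this sketch, not documented ESML behavior; it does not guess how `latOffset`/`lonOffset` are applied.

```python
import re
import xml.etree.ElementTree as ET

# TMI description from above, trimmed to the elements shown and closed so it parses.
TMI_XML = """
<SyntacticMetaData>
  <Binary>
    <BinaryStructure geoInfo="ByProjection" instances="1">
      <Projection LowRight_X="360" LowRight_Y="-40" UpLeft_X="0" UpLeft_Y="40">
        <Geographic latOffset="-0.125" lonOffset="-0.125"/>
      </Projection>
      <Array occurs="320">
        <Array occurs="1440">
          <Field name="SST" type="UInt8" order="LittleEndian" size="8">
            <Data unit="C " equation="0.15 * X -3"/>
          </Field>
        </Array>
      </Array>
    </BinaryStructure>
  </Binary>
</SyntacticMetaData>
"""


def grid_shape(structure):
    """Collect nested Array/@occurs values, outermost first."""
    shape = []
    node = structure.find("Array")
    while node is not None:
        shape.append(int(node.get("occurs")))
        node = node.find("Array")
    return tuple(shape)


def linear_coefficients(equation):
    """Parse 'm * X', 'm * X + b' or 'm * X - b' into (m, b); other forms are rejected."""
    match = re.fullmatch(r"\s*([-+]?[\d.]+)\s*\*\s*X\s*(?:([-+])\s*([\d.]+))?\s*", equation)
    if match is None:
        raise ValueError(f"unsupported equation: {equation!r}")
    scale = float(match.group(1))
    offset = float(match.group(3)) if match.group(3) else 0.0
    if match.group(2) == "-":
        offset = -offset
    return scale, offset


structure = ET.fromstring(TMI_XML.strip()).find("Binary/BinaryStructure")
projection = structure.find("Projection")
rows, cols = grid_shape(structure)
lon_step = (float(projection.get("LowRight_X")) - float(projection.get("UpLeft_X"))) / cols
lat_step = (float(projection.get("UpLeft_Y")) - float(projection.get("LowRight_Y"))) / rows
sst_field = structure.find(".//Field")
scale, offset = linear_coefficients(sst_field.find("Data").get("equation"))
# Assumption: size="8" is in bits, so each value occupies one byte.
n_bytes = rows * cols * int(sst_field.get("size")) // 8
```

Here the description yields a 320 × 1440 grid with 0.25° spacing in both directions. A product version on a different grid would need an edited description rather than new code.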
Writing an ESML File (1)
ESML MARKUP FOR THE DATA FILE
<a:ESML>
  <SyntacticMetaData>
    <Ascii>
      <AsciiStructure geoInfo="NoGeoInfo" instances="1">
        <Field name="SizeX" format="%d">
          <Attribute/>
        </Field>
        <Field name="SizeY" format="%d">
          <Attribute/>
        </Field>
        <Array occurs="4">
          <Array occurs="5">
            <Field name="BrightnessTemp" format="%d">
              <Data unit="Degrees Kelvin"/>
            </Field>
          </Array>
        </Array>
      </AsciiStructure>
    </Ascii>
  </SyntacticMetaData>
</a:ESML>
SIMPLE ASCII DATA FILE
4 5
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
Writing an ESML File (2)
ONLY THE STRUCTURE
<a:ESML>
  <SyntacticMetaData>
Writing an ESML File (3)
DESCRIBE THE FORMAT
    <Ascii>
Writing an ESML File (4)
PUT THE ENTIRE FILE CONTENTS INTO 1 LOGICAL STRUCTURE
      <AsciiStructure geoInfo="NoGeoInfo" instances="1">
Writing an ESML File (5)
DEFINE THE FIRST FIELD IN THE FILE: HEADER INFORMATION
        <Field name="SizeX" format="%d">
          <Attribute/>
        </Field>
Writing an ESML File (6)
DEFINE THE SECOND FIELD IN THE FILE: HEADER INFORMATION
        <Field name="SizeY" format="%d">
          <Attribute/>
        </Field>
Writing an ESML File (7)
DEFINE THE DATA FIELD IN THE FILE: PROVIDE SIZE AND FORMAT INFORMATION
        <Array occurs="4">
          <Array occurs="5">
            <Field name="BrightnessTemp" format="%d">
              <Data unit="Degrees Kelvin"/>
            </Field>
          </Array>
        </Array>
Writing an ESML File (8)
CLOSE ALL THE TAGS: ESML FILE IS READY
      </AsciiStructure>
    </Ascii>
  </SyntacticMetaData>
</a:ESML>
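As a rough illustration of how a program can follow the finished description, the Python sketch below reads the simple ASCII data file with that markup using `xml.etree.ElementTree`. It is not the ESML library. The `read_items` helper is hypothetical. The `xmlns:a="urn:example:esml"` declaration is a placeholder, added only because the parser rejects an undeclared `a:` prefix. The `instances` attribute is ignored. Treating SizeX and SizeY as the row and column counts is an assumption that matches this example's `Array occurs` values.

```python
import xml.etree.ElementTree as ET

# Finished ESML file from the walkthrough; the xmlns:a URI is a placeholder
# declared only so that ElementTree accepts the "a:" prefix.
ESML = """<a:ESML xmlns:a="urn:example:esml"><SyntacticMetaData><Ascii>
<AsciiStructure geoInfo="NoGeoInfo" instances="1">
  <Field name="SizeX" format="%d"><Attribute/></Field>
  <Field name="SizeY" format="%d"><Attribute/></Field>
  <Array occurs="4"><Array occurs="5">
    <Field name="BrightnessTemp" format="%d"><Data unit="Degrees Kelvin"/></Field>
  </Array></Array>
</AsciiStructure></Ascii></SyntacticMetaData></a:ESML>"""

# The simple ASCII data file: header (4 5) followed by 20 values.
DATA = "4 5\n1 2 3 4 5\n6 7 8 9 10\n11 12 13 14 15\n16 17 18 19 20\n"

# Map the C-style format strings used in the markup to Python converters.
CONVERTERS = {"%d": int, "%f": float}


def read_items(node, tokens, result):
    """Walk Field/Array children in document order, consuming one token per Field."""
    for child in node:
        if child.tag == "Field":
            value = CONVERTERS[child.get("format")](next(tokens))
            result.setdefault(child.get("name"), []).append(value)
        elif child.tag == "Array":
            for _ in range(int(child.get("occurs"))):
                read_items(child, tokens, result)
    return result


structure = ET.fromstring(ESML).find("SyntacticMetaData/Ascii/AsciiStructure")
values = read_items(structure, iter(DATA.split()), {})

# Assumption for this example: SizeX rows of SizeY values each.
n_rows, n_cols = values["SizeX"][0], values["SizeY"][0]
temps = values["BrightnessTemp"]
grid = [temps[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
```

Because the reader only follows the markup, a file with a different number of rows needs a changed `occurs` value, not changed code.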
Advantages of using ESML
• Scientist (Data Producer/Consumer)
  – ESML will let them use virtually any data format in their applications
  – ESML files are external description files that can be easily created, modified and viewed with any text editor
  – ESML has a few simple concepts which can be used to describe numerous data sets
  – An ESML file can be seen as a set of instructions to the application on how to read and understand a data file
  – If the format of the data changes for whatever reason (e.g., a new version of the data set), no software changes are required, just a new ESML file
• Does that mean a scientist has to write an ESML file for every data file?
  – No. The beauty of ESML is that it allows a scientist to write ONE ESML file to describe MANY data files that are structurally and semantically similar
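One way to picture "one ESML file for many data files" is a lookup from file-name patterns to descriptions. The sketch below uses Python's standard `fnmatch` module. The pattern table, data file names and description file names are all hypothetical, and this is not a description of how the ESML library implements its wild card support.

```python
from fnmatch import fnmatch

# Hypothetical registry: each description is written once and paired with a
# file-name pattern covering every structurally similar data file.
DESCRIPTIONS = {
    "tmi_*.bin": "tmi_daily.esml.xml",
    "nldn_*.txt": "nldn.esml.xml",
}


def description_for(filename):
    """Return the description registered for the first matching pattern, or None."""
    for pattern, esml_file in DESCRIPTIONS.items():
        if fnmatch(filename, pattern):
            return esml_file
    return None


files = ["tmi_20020101.bin", "tmi_20020102.bin", "nldn_0601.txt", "notes.doc"]
assignments = {name: description_for(name) for name in files}
```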
Advantages of using ESML
• Data Archiving Centers (Data Producers)
  – ESML files can store not only the structural and semantic information about data sets but also content or search metadata
  – Since ESML files are separate, independent files, they can be generated on the fly from metadata databases as datasets are ordered
  – Centers can archive data in their native formats instead of storing them in one “selected” format
  – Centers can also “ESMLize” all their legacy datasets with minimal effort
  – Existing legacy datasets become a more valuable data resource for scientists, because they can be used more efficiently and effectively
Advantages of using ESML
• Application Developers
  – By using the ESML library, developers can build “ESML enabled” applications!
  – ONE single reader component can read all the various data formats, instead of a separate reader module for each format
  – URL access
    • Application users can access data stored online at different sites without having to download the data files to their machines
  – These features make applications much more powerful and flexible
  – Plus, the ESML library is intuitive and easy to use
Advantages of using ESML
• Other Advantages
  – In addition to allowing applications to read the data, ESML provides scientists nifty features such as:
  – Semantic tagging
    • Assigning meaning to the different data fields that are present
    • Identifying geolocation fields
    • Users can modify tags to access different data fields based on their needs and requirements
  – Slicing and dicing data
    • Selecting or subsetting part of the data by changing the ESML definition
  – Preprocessing data
    • Scientists can specify a transformation equation to be applied when the data are read
    • The application receives the preprocessed data
    • Example: data stored in a packed format that must be converted into a scientific quantity
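The three features above can be imitated on in-memory records to show what changing the definition, rather than the code, buys. In this Python sketch the `VIEW` dictionary plays the role of an edited ESML definition. Semantic tags say which fields hold latitude, longitude and the data, a slice selects part of the data, and (scale, offset) is the preprocessing equation Y = scale * X + offset. The records, field names and the `apply_view` helper are made up for illustration. Real ESML expresses these choices in the markup.

```python
# Made-up records standing in for a file already read through its description.
RECORDS = [
    {"f0": 34.7, "f1": -86.6, "f2": 200},
    {"f0": 35.1, "f1": -86.2, "f2": 120},
    {"f0": 36.0, "f1": -85.9, "f2": 250},
]

# Hypothetical "view": semantic tags, a row slice and a (scale, offset) equation.
VIEW = {
    "tags": {"lat": "f0", "lon": "f1", "data": "f2"},
    "rows": slice(0, 2),
    "equation": (0.15, -3.0),
}


def apply_view(records, view):
    """Return tagged, subset and preprocessed records according to the view."""
    tags = view["tags"]
    scale, offset = view["equation"]
    return [
        {
            "lat": rec[tags["lat"]],
            "lon": rec[tags["lon"]],
            "value": scale * rec[tags["data"]] + offset,
        }
        for rec in records[view["rows"]]
    ]


subset = apply_view(RECORDS, VIEW)
# Re-slicing or dropping the equation changes only the view, not apply_view.
everything_raw = apply_view(RECORDS, {**VIEW, "rows": slice(None), "equation": (1.0, 0.0)})
```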
Tools/Products Status
(Tool/Product: Features)
• ESML Data Browser: ability to browse data files using the ESML description files
• ESML Editor: ability to write and validate ESML descriptions
• ESML Library (C++/Java JNI): Windows and Linux; URL access; handles ASCII, Binary (McIDAS), HDF-EOS and GRIB files; handles preprocessing, wild cards and symbols
• ESML Schema: defines ASCII, Binary (McIDAS), HDF-EOS and GRIB formats; provides preprocessing, wild card, symbol and semantic tagging capabilities
ESML Enabled Applications (current and planned)
• ADaM Data Mining System
• ESML Pilot Study – building a MODIS/CERES collocation web service
• Web Map and Coverage Servers for Passive Microwave data sets
• General Purpose Subsetter
• Space Time Tool Kit
• DODS server
Summary
• ESML is NOT a new data format
• ESML enables independently developed applications and services to effectively utilize a wide variety of distributed, heterogeneous data products
• ESML is simple enough that end-users (scientists) can create their own ESML for on-hand datasets (new or legacy)
ESML Web Page
• URL: esml.itsc.uah.edu
• Posts latest products, news, presentations and papers
• Schema and related documents available to all
• ESML Library, ESML Editor and ESML Data Browser available
Demo
• ESML Editor
• Data Browser
  – Read multiple formats
    • National Lightning Detection Network (ASCII)
    • Advanced Microwave Sounding Unit-A (HDF-EOS)
  – Equation to preprocess the data
  – Semantic capabilities: slice and dice the data
All the ESML tools/products are available at: http://esml.itsc.uah.edu/