scientific data management for visualization
TRANSCRIPT
Scientific Data Management for Visualization
Implementation Experience
Mario Valle (presenting), Jean Favre
Swiss National Supercomputing Centre (CSCS)
Etienne Parkinson
VA TECH HYDRO
Alexandre Perrig, Mohamed Farhat
EPF Lausanne LMH
Data Management for Visualization – Mario Valle – CSCS – Magdeburg 04/03/2005
The Turbine Simulation Project
The Turbine Simulation project brings together academic and industrial partners to develop an advanced method for flow simulation in Pelton turbines.
The expected result of the project is a significant improvement of the turbine design process.
All images and data are courtesy of VaTech Hydro and EPFL
Validation of modeling assumptions
[Figure: real-world turbine and simulated turbine]
The double role of comparison
[Diagram: comparing experiment and simulation serves two roles: validation, and finding the unexpected]
Specific visualization tools
AVS/Express
Project data readers and specialized techniques
Implemented inside AVS/Express and ParaView
Expected outcomes
Better design methods
supported by
Specific visualization techniques
supported by
Scientific data management competencies
Various kinds of related data
[Diagram: experimental data and simulation results obtained under the same conditions, together with validated data, aggregated data and derived products]
Heterogeneous datasets
Documents: PDF, scanned blueprints, Word documents
Geometries: CAD files, STL
Images: JPEG
Experimental data: LMH custom format
Derived data: HDF5
Simulation results: Ansys CFX files
Capture project knowledge
There are data that we often forget to consider part of the project knowledge:
• Simulation parameters
• Relationships (this derived from that)
• Workflow definitions
• Version of the math libraries, etc.
• “Everyone knows this” syndrome
• Free-form annotation of experiences
This knowledge usually remains in the head of the researcher (and is often lost when the PhD or the project ends)
Collect the project needs
1. The project needs a uniform method to store knowledge and metadata about heterogeneous datasets
2. The project needs a uniform method to store relationships between heterogeneous datasets
3. Users want uniform access to the visualization tools
4. The solution should not disrupt the normal workflow
Data management for visualization
1. Visualization outcomes are as good as the data they are based upon
2. Diverse datasets are integrated to gain insight
3. Visualization could produce derived data that should be managed
Existing solutions
Project members try hard to record useful information about their data somewhere
When that information is defined, it is stored using very ad hoc methods, such as in a file name and path:
/Pelton/simulations/2004-09-23/head=300/Q=5/rpm=3000/pelton_005.res
• Pelton: Project name
• simulations: Data type = “simulation”
• 2004-09-23: Run date
• head=300: Param = “head”
• Q=5: Param = “flow rate”
• rpm=3000: Param = “rotational speed”
• pelton_005: Run number = “5”
• .res: Format = “CFX”
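To make the ad hoc convention concrete, here is a minimal Python sketch that recovers the metadata hidden in the example path. The function name, the dictionary keys and the assumption that every "name=value" directory is a simulation parameter are illustrative choices, not part of the project's tooling.

```python
from pathlib import PurePosixPath


def metadata_from_path(path: str) -> dict:
    """Recover the metadata that the example directory layout encodes."""
    parts = PurePosixPath(path).parts  # ('/', 'Pelton', 'simulations', ...)
    project, data_type, run_date = parts[1], parts[2], parts[3]
    # Assumption: every "name=value" directory is a simulation parameter.
    params = dict(p.split("=", 1) for p in parts[4:-1] if "=" in p)
    file_name = PurePosixPath(parts[-1])
    # Assumption: the run number is the digits after the last underscore.
    run_number = int(file_name.stem.rsplit("_", 1)[1])
    return {
        "project": project,
        "data_type": data_type,
        "run_date": run_date,
        "params": params,
        "run_number": run_number,
        "extension": file_name.suffix,
    }


meta = metadata_from_path(
    "/Pelton/simulations/2004-09-23/head=300/Q=5/rpm=3000/pelton_005.res"
)
print(meta)
```

The sketch also shows why the approach is fragile: renaming or moving a directory silently changes or destroys the metadata, and nothing in the path says that Q means flow rate.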
Metadata inside the data file
EXIF image metadata example
CFX metadata example
Standard scientific data formats
There are general purpose file formats that support metadata storage
But they force conversion of data to a different format
XML is a general storage format for metadata.
One XML-based scientific data format (XDMF) introduced the distinction between heavy data (the native data) and light data (the metadata)
Similar problem
Digital Libraries have similar problems
A library item could contain:
• Images of a manuscript
• Textual transcript
• Related information
They created the Metadata Encoding and Transmission Standard (METS)
It is an XML schema-based specification for encoding “hub” documents for materials whose content is digital.
Unfortunately, METS cannot be adopted as-is.
Scientific Data Bag
A lightweight method to add metadata to a
set of heterogeneous datasets, to record their
relationships, to store derived data and to
provide uniform access to data and metadata.
The Scientific Data Bag (SDB) is a glue
format, not a new scientific data format.
Scientific Data Bag
[Diagram: a Scientific Data Bag holding pointers to data, metadata and annotations, and the data internal structure]
The Scientific Data Bag metaphor recalls the fact that a bag usually contains disparate, but related things.
SDB implementation
SDB is a small XML file that goes together with the data files
But the file itself is not the most important thing SDB offers; more important are:
1. Logical data structure
2. Implementation layer (API)
3. Operational workflow
Bag and Things
[Diagram: a Bag containing Things; example Things: simulation, experiment data, filtered time series]
InfoBlocks
[Diagram: Bag containing Things, with InfoBlocks added; example InfoBlocks: Sticker, Timesteps]
Annotations
[Diagram: Bag, Things and InfoBlocks, with Annotations added; example Annotations: Processing Notes, Journal]
File
[Diagram: Bag, Things, InfoBlocks and Annotations, with File entries pointing to the native datasets]
Structural Map
[Diagram: Bag, Things, InfoBlocks, Annotations and Files, with a StructuralMap added]
Relational Map
[Diagram: Bag, Things, InfoBlocks, Annotations, Files and StructuralMap, with a RelationalMap added; example relation: Derived-from]
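As a rough illustration of the logical structure built up over the last slides, the following Python sketch assembles a small bag as XML. The element names follow the slide labels (Bag, Thing, InfoBlock, Annotation, File, RelationalMap), but the nesting, every attribute name, the Item element, the ids and the file names are assumptions made for the example; this is not the actual SDB schema. The StructuralMap is left out because the slides do not show its content.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element names follow the slide labels, but the
# nesting and every attribute name are assumptions, not the real SDB schema.
bag = ET.Element("Bag", name="Pelton")

sim = ET.SubElement(bag, "Thing", id="sim-005", type="simulation")
info = ET.SubElement(sim, "InfoBlock", name="Sticker")
ET.SubElement(info, "Item", key="rpm").text = "3000"
ET.SubElement(sim, "Annotation", kind="Journal").text = "Free-form note."
ET.SubElement(sim, "File", href="pelton_005.res", format="CFX")

# A second Thing holding data derived from the simulation (made-up file name).
derived = ET.SubElement(bag, "Thing", id="ts-005", type="filtered timeseries")
ET.SubElement(derived, "File", href="pelton_005_filtered.h5", format="HDF5")

# Relationships are stored beside the Things and refer to them by id.
rel_map = ET.SubElement(bag, "RelationalMap")
ET.SubElement(rel_map, "Relation", type="Derived-from",
              source="ts-005", target="sim-005")

ET.indent(bag)  # pretty-print; requires Python 3.9+
xml_text = ET.tostring(bag, encoding="unicode")
print(xml_text)
```

Note that the native files are only referenced by the File entries; nothing is converted or copied, which matches the idea of SDB as a glue format.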
External profile
• Support for evolutionary development
• Semantically rich description of the bag logical structure
• Not based on XSchema
Metadata harvesting
[Diagram: a harvesting script reads the native datasets and produces the SDB, which can then be edited]
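A harvesting step of this kind could look roughly like the Python sketch below. It walks a directory tree, creates one Thing per recognised file and turns "key=value" directory names into metadata items. The extension table, the XML element and attribute names and the output file name are all assumptions for illustration; the real harvesting scripts are not shown in the talk and would also read metadata from inside the native files.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed extension-to-format table, based on formats named in the talk.
FORMATS = {".res": "CFX", ".h5": "HDF5", ".jpg": "JPEG", ".stl": "STL"}


def harvest(root: Path) -> ET.Element:
    """Build a bag with one Thing per recognised file found under root."""
    bag = ET.Element("Bag", name=root.name)
    for path in sorted(root.rglob("*")):
        fmt = FORMATS.get(path.suffix.lower())
        if not path.is_file() or fmt is None:
            continue  # skip directories and unknown formats
        rel = path.relative_to(root)
        thing = ET.SubElement(bag, "Thing", id=path.stem)
        ET.SubElement(thing, "File", href=rel.as_posix(), format=fmt)
        # Directories named "key=value" become harvested parameters.
        info = ET.SubElement(thing, "InfoBlock", name="harvested")
        for part in rel.parts[:-1]:
            if "=" in part:
                key, value = part.split("=", 1)
                ET.SubElement(info, "Item", key=key).text = value
    return bag


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "Pelton"
    run = root / "simulations" / "head=300" / "rpm=3000"
    run.mkdir(parents=True)
    (run / "pelton_005.res").write_bytes(b"")
    (run / "notes.txt").write_text("ignored: unknown format")
    bag = harvest(root)
    # The bag is written next to the data it describes.
    ET.ElementTree(bag).write(str(root / "pelton.sdb.xml"), encoding="utf-8")
    harvested_xml = ET.tostring(bag, encoding="unicode")

print(harvested_xml)
```

The harvested file is a starting point only: knowledge that is not encoded anywhere in the data, such as free-form annotations, still has to be added by editing.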
SDB browsing
Browsing
Visualization
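One way to picture the step from browsing to visualization is a small selection function that reads a bag and returns the files a visualization tool should load. The Python sketch below assumes a hypothetical SDB layout (element names, attribute names and file names are invented for the example) and is not the API of the actual bag browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical SDB content; element and attribute names are assumptions.
SDB_XML = """
<Bag name="Pelton">
  <Thing id="sim-005" type="simulation">
    <File href="pelton_005.res" format="CFX"/>
  </Thing>
  <Thing id="img-012" type="image">
    <File href="jet_012.jpg" format="JPEG"/>
  </Thing>
</Bag>
"""


def select_files(bag: ET.Element, thing_type: str) -> list[tuple[str, str]]:
    """Return (href, format) pairs for every Thing of the requested type."""
    return [
        (f.get("href"), f.get("format"))
        for thing in bag.findall("Thing")
        if thing.get("type") == thing_type
        for f in thing.findall("File")
    ]


bag = ET.fromstring(SDB_XML)
selection = select_files(bag, "simulation")
print(selection)
```

Because the selection carries the format along with the file reference, the caller can pick the matching reader without parsing file names.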
Bag browser
Bag browser selection
Project status
• SDB is evolving right now
• For now, the descriptions of CFX simulations and image collections are stable
• The experimental data description is still evolving
• SDB files are slowly being produced and stored
Unplanned outcome
A side effect: the competencies acquired have been used by our users
• Tutorials on data management for CSCS users
• Consultancy for projects external to the CSCS domain, like the SPECTRA ESA satellite mission
• Awareness of the importance of data management for unrelated fields like molecular dynamics
Future
• Some important clients are still missing
• Better integration with visualization tools is needed
• In future larger-scale projects, SDB could serve as an intermediate step for loading a real database
Scientific Data Management for Visualization
Implementation Experience
Thanks for your attention!
Mario Valle et al.
[email protected]
http://www.cscs.ch/~mvalle/sdm/