Scientific Data Management for Visualization


Scientific Data Management for Visualization

Implementation Experience

Mario Valle (presenting), Jean Favre
Swiss National Supercomputing Centre (CSCS)

Etienne Parkinson
VA TECH HYDRO

Alexandre Perrig, Mohamed Farhat
EPF Lausanne LMH

Data Management for Visualization – Mario Valle – CSCS – Magdeburg 04/03/2005

The Turbine Simulation Project

The Turbine Simulation project brings together academic and industrial partners to develop an advanced method for flow simulation in Pelton turbines.

The expected result of the project is a significant improvement of the turbine design process.

All images and data are courtesy of VaTech Hydro and EPFL LMH

Validation of modeling assumptions

Figure: real-world turbine and simulated turbine

Comparison double role

Diagram: Experiment and Simulation are compared, both for Validation and to Find the unexpected

Specific visualization tools

AVS/Express

Project data readers and specialized techniques

Implemented inside AVS/Express and ParaView

Expected outcomes

Better design methods

Specific visualization techniques

Scientific data management competencies

supported by


Various kinds of related data

Diagram labels: experimental data and simulation results (same conditions), validated data, derived products, aggregated data

Heterogeneous datasets

Data type            Format
Documents            PDF, scanned blueprints, Word documents
Geometries           CAD files, STL
Images               JPEG
Experimental data    LMH custom format
Derived data         HDF5
Simulation results   Ansys CFX files

Capture project knowledge

There are data that we often forget to consider part of the project knowledge:

• Simulation parameters
• Relationships (this derived from that)
• Workflow definitions
• Version of the math libraries, etc.
• “Everyone knows this” syndrome
• Free-form annotation of experiences

This knowledge usually remains in the head of the researcher (and is often lost after the PhD or the project ends)

Collect the project needs

1. The project needs a uniform method to store knowledge and metadata about heterogeneous datasets

2. The project needs a uniform method to store relationships between heterogeneous datasets

3. Users want uniform access to the visualization tools

4. The solution should not disrupt the normal workflow

Data management for visualization

1. Visualization outcomes are as good as the data they are based upon

2. Diverse datasets are integrated to gain insight

3. Visualization could produce derived data that should be managed

Existing solutions

Project members try hard to record useful information about their data somewhere

When that information is defined, it is stored using very “ad hoc” methods, such as in a file name and path:

/Pelton/simulations/2004-09-23/head=300/Q=5/rpm=3000/pelton_005.res

Path component   Meaning
Pelton           Project name
simulations      Data type = “simulation”
2004-09-23       Run date
head=300         Param = “head”
Q=5              Param = “flow rate”
rpm=3000         Param = “rotational speed”
pelton_005       Run number = “5”
.res             Format = “CFX”
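To make the ad hoc convention concrete, here is a minimal sketch of how the metadata hidden in such a path could be recovered. The function name `parse_run_path` and the dictionary keys are assumptions made for illustration; only the path layout comes from the example above.

```python
from pathlib import PurePosixPath


def parse_run_path(path: str) -> dict:
    """Recover the metadata encoded in a path laid out like the example above."""
    parts = PurePosixPath(path).parts  # ('/', 'Pelton', 'simulations', ...)
    project, data_type, run_date = parts[1], parts[2], parts[3]
    # Directory components of the form name=value carry the simulation parameters.
    params = dict(p.split("=", 1) for p in parts[4:-1] if "=" in p)
    file_name = PurePosixPath(parts[-1])
    return {
        "project": project,
        "data_type": data_type,
        "run_date": run_date,
        "params": params,
        # The run number is the numeric suffix of the file stem (pelton_005 -> 5).
        "run_number": int(file_name.stem.rsplit("_", 1)[1]),
        "extension": file_name.suffix,
    }


meta = parse_run_path(
    "/Pelton/simulations/2004-09-23/head=300/Q=5/rpm=3000/pelton_005.res"
)
```

The sketch also shows why the approach is fragile: every rule (position of the date, meaning of `Q`, the numeric suffix) lives only in the parsing code and in the heads of the people who chose the layout.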

Metadata inside the data file

EXIF image metadata example

CFX metadata example

Standard scientific data formats

There are general-purpose file formats that support metadata storage

But they force conversion of the data to a different format

XML is a general storage format for metadata.

One XML-based scientific data format, XDMF, introduced the distinction between heavy data (the native data) and light data (the metadata)

Similar problem

Digital Libraries have similar problems

A library item could contain:

• Images of a manuscript
• Textual transcript
• Related information

They created the Metadata Encoding and Transmission Standard (METS)

It is an XML schema-based specification for encoding “hub” documents for materials whose content is digital.

Unfortunately, METS cannot be adopted as-is.

Scientific Data Bag

A lightweight method to add metadata to a set of heterogeneous datasets, to record their relationships, to store derived data and to provide uniform access to data and metadata.

The Scientific Data Bag (SDB) is a glue format, not a new scientific data format.

Scientific Data Bag

Diagram: a Scientific Data Bag holding pointers to data, metadata and annotations, and the data internal structure

The Scientific Data Bag metaphor recalls the fact that a bag usually contains disparate, but related things.

SDB implementation

SDB is a small XML file that goes together with the data files

But the XML file is not the most important part of SDB, which also offers:

1. Logical data structure

2. Implementation layer (API)

3. Operational workflow
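As a rough illustration of what an implementation layer over such an XML file might provide, the sketch below reads a small made-up bag and answers two questions through one uniform interface. The talk does not show the SDB schema or API, so every element name, attribute name and function here is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical SDB content; element and attribute names are illustrative only.
SDB_XML = """
<bag name="Pelton">
  <thing id="sim-005" kind="simulation">
    <infoblock name="parameters">
      <item key="head" value="300"/>
      <item key="rpm" value="3000"/>
    </infoblock>
    <file href="pelton_005.res" format="CFX"/>
  </thing>
  <thing id="img-001" kind="images">
    <file href="runner.jpg" format="JPEG"/>
  </thing>
</bag>
""".strip()


def list_files(sdb_text: str, kind: str) -> list:
    """Return the data files pointed to by all things of the given kind."""
    root = ET.fromstring(sdb_text)
    return [
        f.get("href")
        for thing in root.iter("thing")
        if thing.get("kind") == kind
        for f in thing.iter("file")
    ]


def read_info(sdb_text: str, thing_id: str, block_name: str) -> dict:
    """Return the key/value pairs of one named metadata block of one thing."""
    root = ET.fromstring(sdb_text)
    for thing in root.iter("thing"):
        if thing.get("id") == thing_id:
            for block in thing.iter("infoblock"):
                if block.get("name") == block_name:
                    return {i.get("key"): i.get("value") for i in block.iter("item")}
    return {}
```

A visualization tool built on such a layer would call `list_files` and `read_info` instead of parsing directory names or format-specific headers itself.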

Bag and Things

Diagram: a Bag containing Things (Thing, Thing, Thing, …). Labels shown: Simulation, Experiment data, Filtered timeseries

InfoBlocks

Diagram: Bag, Things (Thing, Thing, Thing, …) and InfoBlocks. Labels shown: Sticker, Timesteps

Annotations

Diagram: Bag, Things, InfoBlocks and Annotations. Labels shown: Processing Notes, Journal

File

Diagram: Bag, Things, InfoBlocks, Annotations and File. Label shown: Native datasets

Structural Map

Diagram: Bag, Things, InfoBlocks, Annotations, File and Structural Map

Relational Map

Diagram: Bag, Things, InfoBlocks, Annotations, File, Structural Map and Relational Map. Label shown: Derived-from
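The logical structure built up over the preceding slides could be modeled roughly as follows. This is a sketch under assumptions: the slides name the elements (Bag, Thing, InfoBlock, Annotation, File, Derived-from) but do not specify how they nest, so the containment chosen here, the field names and the sample values are all illustrative. The Structural Map is left out because the slides give no detail about it.

```python
from dataclasses import dataclass, field


@dataclass
class InfoBlock:
    """A named group of key/value metadata (layout is an assumption)."""
    name: str
    items: dict = field(default_factory=dict)


@dataclass
class Thing:
    """One member of the bag, e.g. a simulation or a filtered time series."""
    id: str
    files: list = field(default_factory=list)        # pointers to native datasets
    info: list = field(default_factory=list)         # InfoBlock instances
    annotations: list = field(default_factory=list)  # free-form notes


@dataclass
class Bag:
    name: str
    things: dict = field(default_factory=dict)        # Thing.id -> Thing
    derived_from: list = field(default_factory=list)  # (derived id, source id) pairs

    def add(self, thing: Thing) -> Thing:
        self.things[thing.id] = thing
        return thing

    def relate(self, derived_id: str, source_id: str) -> None:
        """Record that one Thing was derived from another."""
        for thing_id in (derived_id, source_id):
            if thing_id not in self.things:
                raise KeyError(f"unknown Thing: {thing_id}")
        self.derived_from.append((derived_id, source_id))

    def sources_of(self, thing_id: str) -> list:
        """Return the ids of the Things that the given Thing was derived from."""
        return [src for derived, src in self.derived_from if derived == thing_id]


# Illustrative content only.
bag = Bag("Pelton")
sim = bag.add(Thing("sim-005", files=["pelton_005.res"]))
sim.info.append(InfoBlock("parameters", {"head": "300", "rpm": "3000"}))
bag.add(Thing("filtered-ts", annotations=["low-pass filtered"]))
bag.relate("filtered-ts", "sim-005")
```

Keeping the Derived-from pairs on the bag rather than on each Thing mirrors the idea of a separate relational map: relationships can be added later without touching the description of the datasets themselves.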

External profile

• Support for evolutionary development
• Semantically rich description of the bag logical structure
• Not based on XSchema

Metadata harvesting

Diagram labels: Native datasets, Harvesting script, SDB, Editing
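A harvesting step of this kind could look roughly like the sketch below, which turns a list of native file names into a skeleton XML description ready for manual editing. It is not the project's harvesting script: the suffix table, the XML element names and the function `harvest` are assumptions. Only the data types and the `.res` CFX example come from the slides.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical suffix-to-type table, loosely following the formats listed earlier.
KIND_BY_SUFFIX = {
    ".res": "Simulation results",
    ".h5": "Derived data",
    ".jpg": "Images",
    ".stl": "Geometries",
    ".pdf": "Documents",
}


def harvest(file_names, bag_name: str) -> ET.Element:
    """Build a skeleton bag element with one thing per recognized file."""
    bag = ET.Element("bag", name=bag_name)
    for name in sorted(file_names):
        kind = KIND_BY_SUFFIX.get(Path(name).suffix.lower())
        if kind is None:
            continue  # unknown format: left for the manual editing step
        thing = ET.SubElement(bag, "thing", kind=kind)
        ET.SubElement(thing, "file", href=name)
    return bag


bag = harvest(["pelton_005.res", "runner.jpg", "notes.txt"], "Pelton")
sdb_text = ET.tostring(bag, encoding="unicode")
```

A real script would also open each file to extract format-specific metadata (for example the EXIF or CFX headers mentioned earlier); the sketch stops at file names so that it stays runnable without any data.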

SDB browsing

Diagram labels: Browsing, Visualization

Bag browser

Bag browser selection

Project status

• SDB is evolving right now
• For now, the descriptions of CFX simulations and image collections are stable
• The experimental data description is still evolving
• SDB files are slowly being produced and stored

Unplanned outcome

A side effect: the competencies acquired have been used by our users

• Tutorials on data management for CSCS users
• Consultancy for projects external to the CSCS domain, like the SPECTRA ESA satellite mission
• Awareness of the importance of data management for unrelated fields like molecular dynamics

Future

• Some important clients are still missing
• Better integration with visualization tools is needed
• In future larger-scale projects, SDB could serve as an intermediate step toward loading a real database

Scientific Data Management for Visualization

Implementation Experience

Thanks for your attention!

Mario Valle et al.

[email protected]
http://www.cscs.ch/~mvalle/sdm/