How will we manage large multidisciplinary scientific datasets?
Michel Hoepffner, Nathalie Fourès, Hassan Makhmara and Fernando Niño (Medias-France)

Preservation versus added-value

• The best long-term preservation for scientific data: storing them in technical centers managed by data experts

• The best added value for scientific data: timely updates of the datasets by the scientists in charge

We think it is possible to resolve this apparent dilemma.

The actors
• Scientists:
– Research teams, represented by their Principal Investigators
– Users (the scientific community)
– Multidisciplinary international scientific programs: IGBP 2, WCRP, IHDP, etc.
• Operators:
– National agencies providing data (space agencies, etc.)
– Technical operators like Medias-France

Medias-France in a few words:

A SERVICE STRUCTURE:
– DATABASE AND INFORMATION SYSTEM DESIGN AND MANAGEMENT
– PROJECT SUPPORT (OBSERVING SYSTEMS AND NETWORKS, etc.)
– TRAINING (FELLOWSHIPS, etc.)
– CONSULTANCY AND EXPERTISE (GMES, …)

http://medias.obs-mip.fr

Development and management of database and information systems

[Figure: Medias-France projects by domain (Sea/Atmosphere/Hydrology/Atmospheric chemistry, Earth Science, Environment, Tools) and by status (Ended, In progress, Planned), including HAPEX-SAHEL, JGOFS, FETCH, ELMASIFA, MOZAIC, ESCOMPTE, CATCH, GMES, AMMA and POSTEL]

Databases
• Disciplinary:
– Atmosphere: Emet
– Hydrology: Catch
– Palynology: APD
• Multidisciplinary:
– Hapex-Sahel
– Fetch
– Amma (African Monsoon Multidisciplinary Approach)

International coordination
• Member of the PAGES (Past Global Changes) Data Board, with the World Data Centers of Boulder (USA) and Bremen (Germany)
• Mirror site of the World Data Center of Boulder (USA)
• Coordinator of various European-funded projects (Elmasifa, APD, Format, Search, Ricamare, etc.)

TOOLS
• Management of Internet sites:
– web
– ftp
• Development of data visualization and extraction interfaces
• Database network, with scientists involved in data management

The scientific institutions
• We should distinguish between:
– Scientific databases (experiments, networks)
– Archives (Meteorological Services, BRGM, IGN, etc.)
– Collections (museums, …)
• We should question the role of the different actors in the scientific world:
– institutions whose sole goal is scientific research (CNRS-INSU, IRD, etc.)
– institutions whose scientific goal coexists with an operational or commercial goal (e.g. BRGM, IGN, etc.)

Organization of the community
• In scientific centers: networks developed between scientists of different projects in order to adopt:
– the same formats for common data
– common documentation (metadata) for each discipline
– quality control information
• In technical centers, providing:
– database development and maintenance
– data distribution
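To illustrate what adopting the same formats, common metadata and quality control information could mean in practice, here is a minimal Python sketch: a shared record layout in which the unit and a quality-control flag travel with each value, plus a simple range check. The field names, the flag values (`good`, `suspect`, `bad`) and the range limits are hypothetical choices for this example; the presentation does not specify a format.

```python
from dataclasses import dataclass, replace
from enum import Enum


class QCFlag(Enum):
    """Hypothetical quality-control flags shared by all projects of a discipline."""
    GOOD = "good"
    SUSPECT = "suspect"
    BAD = "bad"


@dataclass(frozen=True)
class Measurement:
    """One value in a hypothetical common format: unit and QC flag travel with it."""
    station: str
    parameter: str  # name taken from a vocabulary agreed on by the discipline
    unit: str
    value: float
    qc: QCFlag = QCFlag.GOOD


def range_check(m: Measurement, low: float, high: float) -> Measurement:
    """Return a copy flagged SUSPECT if a GOOD value falls outside [low, high]."""
    if m.qc is QCFlag.GOOD and not (low <= m.value <= high):
        return replace(m, qc=QCFlag.SUSPECT)
    return m


records = [
    Measurement("ST01", "air_temperature", "degC", 24.5),
    Measurement("ST01", "air_temperature", "degC", 71.0),
]
checked = [range_check(r, -60.0, 60.0) for r in records]
for r in checked:
    print(r.station, r.parameter, r.value, r.unit, r.qc.value)
```

Since every project fills in the same fields, a user can, for instance, keep only `QCFlag.GOOD` values without knowing which team produced the data.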

Some examples of database organization

Scientific discipline database: an example with pollen data

[Figure: scientific entity and technical entity]

[Figure: Simplified structure of the Pollen DB, with the entities Authors, Publent, Publication, Pb210, Age scale, Coring, C14, Geochron, Samples, Counts, Taxa, Lithology, Description and Location]
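The entity list of the figure can be read as a relational schema. The sketch below, using Python's built-in `sqlite3` module, creates a reduced version with five of the entities (Location, Coring, Samples, Taxa, Counts) and sums the counts per taxon for one core. The column names, the foreign keys (a location has cores, a core has samples, a sample has counts per taxon) and the example rows are assumptions made for illustration; the figure gives only the entity names.

```python
import sqlite3

# Reduced, hypothetical version of some Pollen DB entities; columns and keys are illustrative.
SCHEMA = """
CREATE TABLE location (id INTEGER PRIMARY KEY, site_name TEXT, lat REAL, lon REAL);
CREATE TABLE coring (id INTEGER PRIMARY KEY, location_id INTEGER REFERENCES location(id), core_code TEXT);
CREATE TABLE samples (id INTEGER PRIMARY KEY, coring_id INTEGER REFERENCES coring(id), depth_cm REAL);
CREATE TABLE taxa (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE counts (
    sample_id INTEGER REFERENCES samples(id),
    taxon_id INTEGER REFERENCES taxa(id),
    grains INTEGER,
    PRIMARY KEY (sample_id, taxon_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up rows, used only to exercise the schema.
conn.execute("INSERT INTO location VALUES (1, 'Example site', 10.0, 20.0)")
conn.execute("INSERT INTO coring VALUES (1, 1, 'CORE-A')")
conn.executemany("INSERT INTO samples VALUES (?, 1, ?)", [(1, 10.0), (2, 20.0)])
conn.executemany("INSERT INTO taxa VALUES (?, ?)", [(1, "Rhizophora"), (2, "Podocarpus")])
conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", [(1, 1, 12), (1, 2, 5), (2, 1, 20)])

# Total grains per taxon for one core: coring -> samples -> counts -> taxa.
rows = conn.execute("""
    SELECT t.name, SUM(c.grains)
    FROM counts c
    JOIN samples s ON s.id = c.sample_id
    JOIN taxa t ON t.id = c.taxon_id
    JOIN coring k ON k.id = s.coring_id
    WHERE k.core_code = 'CORE-A'
    GROUP BY t.name
    ORDER BY t.name
""").fetchall()
print(rows)  # [('Podocarpus', 5), ('Rhizophora', 32)]
```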

Examples of various data

Fossil wood

Identifying the taxa and observing the tree rings provides information on climate changes

The sedimentary facies

[Figure: Megarid 3D of gravity system; conglomerate. Guwaysah formation, Oman (F. Guillocheau, C. Robin)]

PALBOT
• Database of the paleobotany collection of the Université Paris 6
• About 15,000 plant fossils of very diverse kinds: macro- and microfossils; permineralized structures; impressions and compressions

Classification, Evolution and Biosystematics: Laboratoire de Paléobotanique et Paléoécologie (J. Broutin), Laboratoire Informatique et Systématique (R. Vignes Lebbe)

• Important for research; numerous holotypes
• Database in progress; accessible at http://albinoni.snv.jussieu.fr

tmsi-idt-isib, September 1999

Catalogues – Metadata
• Cruises: 5445 ROSCOP/CSR (Cruise Summary Report) summaries
• Databases/datasets of laboratories in the French community, or acquired through exchange: 306 EDMED (European Directory of Marine Environmental Datasets) descriptive records (including 82 at SISMER)
• « Real-time » observation stations: EDIOS, MAMA
• EUROSEISMIC

Pollen diagrams

[Figure: pollen counts for core CH22-KW31, Gulf of Guinea (03°31'1N, 05°34'1E): depth (cm), calendar ages and counts per taxon for the Oleaceae, Palmae, Pandanaceae, Podocarpaceae, Proteaceae, Rhamnaceae, Rhizophoraceae, Rubiaceae, Rutaceae, Salicaceae and Sapindaceae, for samples between 36 and 700 cm depth. July 2001, Jean-Pierre Cazet]

[Figure: pollen diagram of core METEOR M 16415-2 (09°34'N, 19°06'W), after Hooghiemstra, H. et al. (1988), with curves for Alnus, Betula, Corylus, Tilia, Quercus, Ephedra fragilis-type, Maerua, Artemisia, Gramineae, Chenopodiaceae/Amaranthaceae, Hyphaene, Rhizophora, Hymenocardia, Alchornea, Elaeis guineensis, Cyperaceae, Typha/Sparganium, Tubuliflorae, Liguliflorae, Plantago, fungal spores and the pollen total, and pollen groups I to VII]

The African Pollen Database
• Data:
– Tilia file
– Paradox table
• Free software:
– Tilia
– Calib3
– TGview

The numerical simulations

3D Visualization

Quantification

F. Guillocheau, C. Robin, D. Rouby

Data valorization

Oak migration in Europe during the last deglaciation

Taberlet and Cheddadi, Science (2002)

European Pollen Database

Climate reconstruction 6000 years BP

(Cheddadi et al., Climate Dynamics, 1997)

Objectives given to the databases

• Data archiving, which requires discussing what is at stake and how long the data should be kept.

• Data access, i.e. the services to be offered to users, which requires:
– a data policy, defined by the scientists
– from the technical operator, working closely with the scientists:
• a data visualization and extraction interface
• data valorization tools

Coordination to be achieved between the institutions

• Operating rules for the databases

• Data value and ownership

• Relationship with industry for open access to the data

• Setting up a data policy

Planned
• Setting up a data portal:
– this implies adopting international standards for metadata formats
– this allows a flexible organization of data management
• Starting up a community (inter-laboratory and inter-institutional) working on database interoperability

International standards for metadata

• To develop a portal giving common access to the different databases, the priority is to collect all the descriptive information needed to use the data.

• Numerous French institutions are working on this, particularly CNES (space), Medias-France, BRGM (geology), Ifremer (oceanography), etc.

• The international metadata standard they use is "ISO/DIS 19115 Geographic information -- Metadata"
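To make the idea of a standard metadata record more concrete, the following Python sketch describes a dataset with a few elements inspired by the ISO 19115 core metadata (title, abstract, responsible party, topic category, keywords, geographic bounding box, temporal extent). It is a simplified structure for illustration, not a conformant ISO 19115 record, whose XML encoding is defined separately in ISO/TS 19139. The class and field names are hypothetical, and the record content is made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Geographic extent in decimal degrees."""
    west: float
    east: float
    south: float
    north: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Overlap test in longitude and latitude; boxes crossing the
        # 180th meridian are not handled in this simplified version.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


@dataclass
class MetadataRecord:
    """Hypothetical simplified record inspired by ISO 19115 core elements."""
    title: str
    abstract: str
    responsible_party: str
    topic_category: str  # e.g. "biota", one of the ISO 19115 topic categories
    keywords: List[str] = field(default_factory=list)
    bbox: Optional[BoundingBox] = None
    start: Optional[date] = None
    end: Optional[date] = None


record = MetadataRecord(
    title="Example pollen dataset",
    abstract="Made-up record used only to illustrate the structure.",
    responsible_party="Principal Investigator (example)",
    topic_category="biota",
    keywords=["pollen", "palynology", "Africa"],
    bbox=BoundingBox(west=-20.0, east=20.0, south=0.0, north=20.0),
    start=date(1992, 1, 1),
    end=date(2002, 12, 31),
)
print(record.bbox.intersects(BoundingBox(-5.0, 5.0, 5.0, 10.0)))  # True
```

A catalog server could use the bounding box and the temporal extent of such records to answer geographic and temporal searches.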

A federated and adaptable data management

In this approach, which Medias-France shares for developing « multi-proxy » databases for scientific projects:

• The data expertise stays in the laboratory that produced the data

• The data center centralizes the different initiatives and manages the coordinating system, based on a metadata catalog

• The user queries a single server to access data coming from different database initiatives

Objectives

• To facilitate the archiving of scientific data

• To make it possible to describe and locate the data

• To harmonize the means of access and use

Data archiving

• Data are stored in centers specialized in data storage

• Data are coherent and easily updated

• Access policies are under the control of the scientists in the data centers

Facilities for searching and locating the data

• Data are described by metadata in a standard format

• The tools for describing the data are easy for data providers to use (web forms)

• Metadata management and the metadata access policy are provided by a single entity

Access facilities

• Data pre-visualization and extraction are standardized

• Data are not redundant

• Data access policy is managed by the data providers
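The last two points (no redundant copies, access policy kept by the providers) can be illustrated with a small Python sketch in which the portal never stores data itself and each provider applies its own rule before answering. The `Provider` class, the rule "project members see everything, other users only public datasets" and the dataset names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Policy = Callable[[str, str], bool]  # (user, dataset_id) -> access allowed?


def members_or_public(members: Set[str], public: Set[str]) -> Policy:
    """Hypothetical rule: project members see everything, others only public datasets."""
    return lambda user, dataset_id: user in members or dataset_id in public


@dataclass
class Provider:
    """A data provider that keeps the only copy of its data and its own access policy."""
    name: str
    datasets: Dict[str, List[float]]
    policy: Policy

    def extract(self, user: str, dataset_id: str) -> List[float]:
        # The access decision is taken here, by the provider, not by the portal.
        if not self.policy(user, dataset_id):
            raise PermissionError(f"{self.name}: {user} may not access {dataset_id}")
        return list(self.datasets[dataset_id])


lab = Provider(
    name="lab-A",
    datasets={"ds1": [1.0, 2.0], "ds2": [3.0]},
    policy=members_or_public(members={"pi_a"}, public={"ds1"}),
)
print(lab.extract("guest", "ds1"))  # [1.0, 2.0]
```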

Organization (federating unit)

• A center has to manage the metadata

• It gives data centers the tools they need to describe their data

• It provides the scientific community with tools for searching, describing and locating data

Organization (scientific data centers)

• The data centers manage the data with their own methodologies

• They implement (with the help of the federating unit) a standard interface for data exchange

• They describe their data with the standard tools provided by the federating unit.

Data exploration and extraction scenario

The metadata

• Problem: the user would like to access the data through their metadata

• Example: « I would like to know which pollen data M. X studied in Africa »

One solution: the catalog server

• Principle:
– the user sends a metadata selection to the catalog server (e.g. « Pollen, X, Africa? »)
– the catalog server looks it up in its metadata base
– it returns the list of data servers corresponding to the request
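The catalog-server principle can be sketched in a few lines of Python. Each hypothetical catalog entry holds a set of keywords, an investigator name, a region and the address of the data server holding the data; the search returns the servers that match all the criteria, as in the « Pollen, X, Africa? » request. The server addresses are placeholders, and a real catalog would index standardized metadata records rather than this toy structure.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CatalogEntry:
    """One metadata record in the catalog server (hypothetical fields)."""
    keywords: FrozenSet[str]
    investigator: str
    region: str
    server_url: str  # address of the data server that actually holds the data


def search(catalog: List[CatalogEntry], keyword: str, investigator: str, region: str) -> List[str]:
    """Return, without duplicates, the data servers whose metadata match every criterion."""
    servers: List[str] = []
    for entry in catalog:
        matches = (keyword in entry.keywords
                   and entry.investigator == investigator
                   and entry.region == region)
        if matches and entry.server_url not in servers:
            servers.append(entry.server_url)
    return servers


catalog = [
    CatalogEntry(frozenset({"pollen"}), "X", "Africa", "http://server-a.example.org"),
    CatalogEntry(frozenset({"pollen", "charcoal"}), "X", "Africa", "http://server-b.example.org"),
    CatalogEntry(frozenset({"pollen"}), "Y", "Africa", "http://server-c.example.org"),
    CatalogEntry(frozenset({"pollen"}), "X", "Europe", "http://server-d.example.org"),
]
print(search(catalog, "pollen", "X", "Africa"))
# ['http://server-a.example.org', 'http://server-b.example.org']
```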

Data catalog (Metadata)

[Figure: a gateway (portal) linked to a metadata base (data catalog) and, through an exchange protocol, to several data sources (data centers, laboratories, specialized centers)]

1. Search by criteria
2. Catalog query
3. Data list corresponding to the criteria
4. Data localization (pointer, address)
5. Query the appropriate data source
6. Data extraction
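The six steps of this slide can be sketched as a small Python gateway that stores only pointers (data source name, dataset identifier) and forwards the extraction to the data sources, which all implement the same exchange interface. The `Portal`, `DataSource` and `InMemorySource` names, the one-keyword catalog and the dataset identifiers are hypothetical; the in-memory sources stand in for remote data centers.

```python
from typing import Dict, List, Protocol, Tuple

Pointer = Tuple[str, str]  # (data source name, dataset identifier)


class DataSource(Protocol):
    """Exchange protocol that every data source agrees to implement (hypothetical)."""
    def extract(self, dataset_id: str) -> List[float]: ...


class InMemorySource:
    """Stand-in for a remote data center; a real one would answer over the network."""
    def __init__(self, datasets: Dict[str, List[float]]) -> None:
        self._datasets = datasets

    def extract(self, dataset_id: str) -> List[float]:
        return list(self._datasets[dataset_id])


class Portal:
    """Gateway that holds only the catalog (pointers), never the data themselves."""
    def __init__(self, catalog: Dict[str, List[Pointer]], sources: Dict[str, DataSource]) -> None:
        self.catalog = catalog
        self.sources = sources

    def search(self, criterion: str) -> List[Pointer]:
        # Steps 2 to 4: query the catalog, get the matching data and their locations.
        return self.catalog.get(criterion, [])

    def fetch(self, criterion: str) -> Dict[Pointer, List[float]]:
        # Step 1 is the user's call; steps 5 and 6 query each source and extract the data.
        return {ptr: self.sources[ptr[0]].extract(ptr[1]) for ptr in self.search(criterion)}


portal = Portal(
    catalog={"oak pollen": [("center-1", "ds-17"), ("center-2", "ds-4")]},
    sources={
        "center-1": InMemorySource({"ds-17": [12.0, 7.0]}),
        "center-2": InMemorySource({"ds-4": [3.0]}),
    },
)
print(portal.fetch("oak pollen"))
```

Keeping only pointers in the gateway is what lets the data stay in the laboratories while the user still addresses a single entry point.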

Medias-France proposal
• To create ISO 19115 profiles for various scientific disciplines, in close cooperation with:
– the PIs of disciplinary scientific programs (GPD, etc.)
– the leaders of international and interdisciplinary scientific programs (WCRP, IGBP, IHDP, etc.)
• To install a catalog server that will exploit these profiles

The data servers

• Are accessible via the catalog server

• Allow data visualization

• Example: the x-proxy server for paleoclimatology

The x-proxy server

• The need: to reach heterogeneous, remote data through a single interface

• The condition: data ownership and management remain with the scientists
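One way to reach heterogeneous data through a single interface, while each laboratory keeps its own format, is to write one adapter per source that converts the native format into a common record. The minimal Python sketch below assumes two hypothetical sources, one publishing CSV text and one publishing aligned columns, with made-up values; it shows the adapter idea only and is not the actual x-proxy implementation.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ProxyRecord:
    """Common representation returned by the single interface (hypothetical)."""
    site: str
    depth_cm: float
    proxy: str
    value: float


def from_csv(text: str) -> List[ProxyRecord]:
    """Adapter for a source publishing CSV with columns site,depth,taxon,count."""
    reader = csv.DictReader(io.StringIO(text))
    return [ProxyRecord(r["site"], float(r["depth"]), "pollen:" + r["taxon"], float(r["count"]))
            for r in reader]


def from_columns(site: str, columns: Dict[str, List[float]]) -> List[ProxyRecord]:
    """Adapter for a source publishing one list per variable, aligned on depth."""
    return [ProxyRecord(site, d, "d18O", v)
            for d, v in zip(columns["depth_cm"], columns["d18O"])]


def query(all_records: Iterable[ProxyRecord], site: str) -> List[ProxyRecord]:
    """Single interface: one question, whatever the native format of each source."""
    return sorted((r for r in all_records if r.site == site),
                  key=lambda r: (r.depth_cm, r.proxy))


csv_source = "site,depth,taxon,count\nCORE-A,10,Quercus,12\nCORE-A,20,Quercus,7\n"
column_source = {"depth_cm": [10.0, 20.0], "d18O": [1.2, 1.5]}
records = from_csv(csv_source) + from_columns("CORE-A", column_source)
for r in query(records, "CORE-A"):
    print(r)
```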

An example of an answer to this need

[Figure: a paleoclimatologist in Brest reaches, over the Internet, a federal center (Medias-France), the World Data Center (Boulder) and a European Data Center (Arles)]

Exchanges example

[Figure: exchanges between a paleoclimatologist (Brest), the federating center (Toulouse), the World Data Center on pollen (Boulder) and the European center on pollen (Arles)]

• Request from the paleoclimatologist: “I would like a map of the oak distribution in France for the last ten years, from 1992 to 2002”
• Request sent to the pollen data centers: “I would like the data on oak pollen in France from 1992 to 2002”
• Data returned: dataset concerning oak pollen in France, dated from 1992 to 2002
• Result delivered to the paleoclimatologist: oak distribution map in France

LAS-DODS architecture

[Figure: a LAS client connects to the LAS server (LAS interface, processing), which uses a DODS client to retrieve data from DODS servers]
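In this architecture the LAS server reaches the data through DODS (the protocol now known as OPeNDAP), in which a client selects a subset of a remote variable by appending a constraint expression to the dataset URL. The sketch below only builds such a URL with the DAP2 hyperslab syntax `var[start:stride:stop]` (stop inclusive). The server address and the variable name are hypothetical, and the response suffixes available (`.dds`, `.das`, `.dods`, or an ASCII variant) depend on the server.

```python
from typing import Sequence, Tuple

Slice = Tuple[int, int, int]  # (start, stride, stop), stop inclusive as in DAP2


def hyperslab(var: str, slices: Sequence[Slice]) -> str:
    """Format one projection such as sst[0:1:0][10:1:20]."""
    return var + "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)


def dods_url(dataset_url: str, suffix: str, projections: Sequence[str]) -> str:
    """Append a response suffix and a comma-separated list of projections."""
    return f"{dataset_url}.{suffix}?{','.join(projections)}"


url = dods_url(
    "http://dods.example.org/data/sst_monthly.nc",  # hypothetical dataset URL
    "dods",
    [hyperslab("sst", [(0, 1, 0), (10, 1, 20), (30, 2, 40)])],
)
print(url)
# http://dods.example.org/data/sst_monthly.nc.dods?sst[0:1:0][10:1:20][30:2:40]
```

Building the request on the client side keeps the transfer limited to the requested subset, which is what makes remote access to large gridded datasets practical.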

An example of a « multi-proxy » database

Proposal
• To archive the relevant and essential data, validated by the PI
• To manage and maintain the databases in technical agencies such as Medias-France and in research teams

Specific role of Medias-France

• To ensure that the data are stored after their validation by the scientific community

• To propose specific access and management tools for each type of data

• To develop the "multi-proxy" interface

References (and acknowledgments)

• Presentation by Anne-Marie Lezine at the prospective symposium of the INSU « Earth Sciences » division, 24 September 2002, Vulcania

• Waldteufel's report: « Les bases de données pour les Géosciences, éléments d'un schéma directeur » (Databases for the geosciences: elements of a master plan), published by INSU and CNES in 1999 (http://medias.obs-mip.fr/www/francais/documentation/)

Medias: at your service