18.09.08 / S. Kindermann / DKRZ GO-ESSP 2008 1
C3-Grid* Federation System for Climate Data Handling
Stephan Kindermann
German Climate Computing Center – DKRZ
* Collaborative Climate Community Grid Project (Part of D-Grid Initiative)
Overview
• C3Grid overview: architecture, partners, goals
• C3Grid federation system components: C3Grid ISO discovery metadata and metadata catalog
• A short interoperability study: C3Grid ISO metadata / GeoNetwork
• Data access and preprocessing
• C3Grid security
• C3Grid / IPCC?
C3Grid: Overview
[Architecture diagram. Recoverable labels:]
• C3Grid data providers: World Data Centers and research institutes (Climate, Mare, RSAT, PIK, GKSS, AWI, MPI-M, DWD, IFM-Geomar, DKRZ); universities (FU Berlin, Uni Köln)
• Data access interface and ISO discovery metadata at the providers
• ISO 19139 discovery catalog
• Collaborative grid workspace: workflow, data + metadata, result data products + metadata
• C3Grid data and job management middleware with grid data / job interface, on top of D-Grid (SRM, d-cache, ..)
• Portal, C3RC
• Markers (A) and (B) label the parts covered in the following slides
(A) Metadata for Data Discovery: Design and Implementation
[Diagram excerpt, part (A). Recoverable labels: C3Grid data providers, data access interface, ISO discovery metadata, ISO 19139 discovery catalog.]
(A) Metadata – harvesting and lookup components
[Component diagram. Recoverable labels:]
• Sources: data providers, file system
• Harvesters: OAI-PMH harvester, directory harvester
• Index builder steps: accept document as DOM tree, validate against schema, transform by XSL, apply XPath (one per field), serialize DOM to XML blob, add document to index
• Storage: Lucene indexes, combined into virtual indexes
• Search interface
• Fast range queries
• Java API + web service interface
• Made available on sourceforge.net; see also http://www.panfmp.org
• Technology:
  - ISO 19115/19139 metadata profile
  - OAI-PMH harvesting catalogue
  - Lucene-based catalogue search
  - GridSphere-based portal
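The harvesting and lookup chain on this slide can be sketched in a few lines of Python. This is an illustration only, not the panFMP API (which is Java and Lucene based): the record layout, the field names, and the `MiniIndex` class are hypothetical, and a sorted list stands in for the index.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical harvested records; real C3Grid records are ISO 19139 documents.
RECORDS = [
    "<record id='a'><title>Run A</title><startYear>1960</startYear></record>",
    "<record id='b'><title>Run B</title><startYear>1990</startYear></record>",
    "<record id='c'><title>Run C</title><startYear>2005</startYear></record>",
]

# Field name -> XPath (ElementTree supports a limited XPath subset).
FIELD_XPATHS = {"title": "./title", "startYear": "./startYear"}


class MiniIndex:
    """Toy stand-in for a search index with one numeric range field."""

    def __init__(self):
        self._keys = []  # sorted startYear values
        self._docs = []  # documents, kept in the same order as _keys

    def add_document(self, xml_text):
        root = ET.fromstring(xml_text)  # accept document as a tree
        if root.tag != "record":        # crude stand-in for schema validation
            raise ValueError("unexpected root element: " + root.tag)
        # Apply one XPath per field.
        fields = {name: root.findtext(path) for name, path in FIELD_XPATHS.items()}
        doc = {
            "id": root.get("id"),
            "fields": fields,
            "blob": ET.tostring(root, encoding="unicode"),  # serialized XML
        }
        year = int(fields["startYear"])
        pos = bisect.bisect_left(self._keys, year)
        self._keys.insert(pos, year)
        self._docs.insert(pos, doc)

    def range_query(self, low, high):
        """Return ids of documents with low <= startYear <= high."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return [d["id"] for d in self._docs[lo:hi]]


index = MiniIndex()
for rec in RECORDS:
    index.add_document(rec)
hits = index.range_query(1980, 2010)
print(hits)
```

Keeping the numeric field sorted is what makes the range lookup cheap here; a real index uses its own structures for the same purpose.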
(A) C3Grid ISO 19139 profile
Design criteria:
• No schema extensions; profiling by restriction
• Restriction using Schematron constraints
• "The granularity of the discovery metadata should reflect the logical organization of the data repository at a sufficiently coarse grained level" (1)
• CF based content description
• Link to resource metadata infrastructure (GT4-MDS based)
(1) Inspire: DT Metadata – Draft Implementing Rules for Metadata (version 2, 02/02/2007)
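Profiling by restriction means a profile document remains a valid document of the base schema; the profile only adds constraints on top. A rough sketch of such constraint checking follows, written in plain Python rather than Schematron. The two rules and the simplified, namespace-free element names are hypothetical and are not the actual C3Grid profile.

```python
import xml.etree.ElementTree as ET

# Hypothetical restriction rules: (XPath, minimum count, message).
RULES = [
    ("./contentInfo", 1, "at least one contentInfo section is required"),
    ("./extent/verticalExtent", 1, "at least one verticalExtent is required"),
]


def check_profile(xml_text):
    """Return the messages of all violated rules (empty list = conforms)."""
    root = ET.fromstring(xml_text)
    return [msg for path, minimum, msg in RULES
            if len(root.findall(path)) < minimum]


good = "<MD_Metadata><contentInfo/><extent><verticalExtent/></extent></MD_Metadata>"
bad = "<MD_Metadata><contentInfo/></MD_Metadata>"
good_errors = check_profile(good)
bad_errors = check_profile(bad)
print(good_errors, bad_errors)
```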
(A) C3Grid ISO Profile
• Description at aggregate level (e.g. experiment)
• Aggregate extent description with multiple verticalExtent sections
• Sub-selection in data request
(A) C3Grid ISO Profile: CF usage
<contentInfo>
  <MD_CoverageDescription>
    <attributeDescription><gco:RecordType>air_temperature</gco:RecordType></attributeDescription>
    <dimension xlink:href="#verticalCRS_hPa">
      <MD_RangeDimension>
        <descriptor><gco:CharacterString>K</gco:CharacterString></descriptor>
      </MD_RangeDimension>
    </dimension>
  </MD_CoverageDescription>
</contentInfo>
<contentInfo>
  <MD_CoverageDescription>
    <attributeDescription><gco:RecordType>sea_surface_temperature</gco:RecordType></attrib…>
    <dimension xlink:href="#verticalCRS_m">
      <MD_RangeDimension>
        <descriptor><gco:CharacterString>K</gco:CharacterString></descriptor>
      </MD_RangeDimension>
    </dimension>
  </MD_CoverageDescription>
</contentInfo>
Reference to vertical CRS
Content description based on (extended) CF names
Link to corresponding vertical CRS
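A short sketch of how a client might read the CF name, the unit, and the vertical CRS link from such contentInfo sections. It assumes the unprefixed elements on the slide live in the gmd namespace; the wrapping `MD_Metadata` element and the namespace declarations are added here only so the fragment parses, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

DOC = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco"
                 xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:contentInfo><gmd:MD_CoverageDescription>
    <gmd:attributeDescription><gco:RecordType>air_temperature</gco:RecordType></gmd:attributeDescription>
    <gmd:dimension xlink:href="#verticalCRS_hPa"><gmd:MD_RangeDimension>
      <gmd:descriptor><gco:CharacterString>K</gco:CharacterString></gmd:descriptor>
    </gmd:MD_RangeDimension></gmd:dimension>
  </gmd:MD_CoverageDescription></gmd:contentInfo>
  <gmd:contentInfo><gmd:MD_CoverageDescription>
    <gmd:attributeDescription><gco:RecordType>sea_surface_temperature</gco:RecordType></gmd:attributeDescription>
    <gmd:dimension xlink:href="#verticalCRS_m"><gmd:MD_RangeDimension>
      <gmd:descriptor><gco:CharacterString>K</gco:CharacterString></gmd:descriptor>
    </gmd:MD_RangeDimension></gmd:dimension>
  </gmd:MD_CoverageDescription></gmd:contentInfo>
</gmd:MD_Metadata>
"""


def content_descriptions(xml_text):
    """Return (CF name, unit, vertical CRS reference) per contentInfo section."""
    root = ET.fromstring(xml_text)
    result = []
    for cov in root.findall("gmd:contentInfo/gmd:MD_CoverageDescription", NS):
        name = cov.findtext("gmd:attributeDescription/gco:RecordType", namespaces=NS)
        dim = cov.find("gmd:dimension", NS)
        unit = dim.findtext(
            "gmd:MD_RangeDimension/gmd:descriptor/gco:CharacterString", namespaces=NS)
        result.append((name, unit, dim.get(XLINK_HREF)))
    return result


descriptions = content_descriptions(DOC)
for entry in descriptions:
    print(entry)
```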
(A) C3Grid ISO profile
Data Distributor Info:
• reference to C3Grid resource metadata catalog (MDS) (names service endpoints)
• (optional: service endpoints)
(A) C3Grid ISO profile
Data provenance description:
• For now (data staging output): simple sequence of ProcessStep descriptions
• Later (C3Grid processed data): combined Source/ProcessStep blocks + external data provenance store
C3Grid ISO Profile: A short GeoNetwork experiment
• Federation building: OAI-PMH, WebDAV, Z39.50, GeoNetwork
• Full ISO metadata support (ISO 19139/19119)
• OGC CSW 2.0 reference implementation
• RSS and GeoRSS newsfeeds
• SKOS-based thesauri
• Adaptable to new schemas
• Schematron constraint checking
On roadmap:
• Flexible ISO profile support
• Shibboleth integration
Building complex metadata federations …
Harvesting via:
• CSW
• OAI-PMH
• GeoNetwork
• WebDAV
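As an illustration of the OAI-PMH harvesting option, the following sketch builds a `ListRecords` request URL. The verb and argument names (`metadataPrefix`, `from`, `until`, `resumptionToken`) come from the OAI-PMH protocol; the base URL, the `iso19139` prefix value, and the function name are assumptions made for the example, since each provider defines its own endpoint and supported prefixes.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None, until_date=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: it replaces
        # metadataPrefix, from, and until on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
    return base_url + "?" + urlencode(params)


# Hypothetical provider endpoint and metadata prefix.
url = list_records_url("https://provider.example.org/oai", "iso19139",
                       from_date="2008-01-01")
next_url = list_records_url("https://provider.example.org/oai", "iso19139",
                            resumption_token="abc123")
print(url)
print(next_url)
```

Passing `from` on each run gives incremental harvesting: only records changed since the last harvest are returned.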
C3Grid ISO Profile: A short GeoNetwork experiment
Import / Edit / Search: ok
Missing:
• content (CF) search
• vertical search
• temporal BBox search
• data staging
Complete portal prototype to search and access (pre-process) data described by the C3Grid ISO profile, built in 3 weeks on the GeoNetwork open source solution.
(B) Data Access and Preprocessing
[Architecture diagram with markers (A) and (B). Recoverable labels:]
• C3Grid data providers: World Data Centers, research institutes, university partners
• Data access interface and ISO discovery metadata at the providers
• ISO discovery catalog
• Collaborative grid workspace: data analysis workflow on data + metadata, producing result data products + metadata
(B) Data Access and Pre-Processing: Implementation
[Component diagram. Recoverable labels:]
• Data staging request: data IDs; selection (lon, lat, alt; time; content: CF); output properties
• Data staging web service: offer (time / resource estimation), skeleton implementation, status, ..
• Provider side: provider staging jars, provider staging scripts, MD DB; data sources: DB, flat file, archive
• Processing jobs: JSDL-based description, WS GRAM, local resource manager
• Target: distributed C3Grid work space
• C3Grid Generation 1: secured plain web services (status)
• C3Grid Generation 2: WSRF service interfaces (scheduled November 08)
• Generation 2+: full PKI/SAML security stack
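The parts of a data staging request named on this slide (data IDs, a selection by lon/lat/alt, time and CF content, and output properties) could be modelled roughly as follows. This is a sketch under assumptions: the class and field names and the validation rules are hypothetical and do not reflect the actual C3Grid web service interface.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Selection:
    lon: Tuple[float, float]                    # (min, max) in degrees
    lat: Tuple[float, float]                    # (min, max) in degrees
    alt: Optional[Tuple[float, float]] = None   # vertical range, unit per CRS
    time: Optional[Tuple[str, str]] = None      # ISO 8601 dates (start, end)
    cf_names: Tuple[str, ...] = ()              # CF standard names

    def validate(self):
        errors = []
        if not (-90.0 <= self.lat[0] <= self.lat[1] <= 90.0):
            errors.append("latitude range must lie within [-90, 90]")
        if self.lon[0] > self.lon[1]:
            errors.append("longitude range must be ordered (min, max)")
        # ISO 8601 date strings of equal format compare correctly as text.
        if self.time is not None and self.time[0] > self.time[1]:
            errors.append("time range must be ordered (start, end)")
        if not self.cf_names:
            errors.append("at least one CF standard name is required")
        return errors


@dataclass
class StagingRequest:
    data_ids: Tuple[str, ...]
    selection: Selection
    output_properties: dict = field(default_factory=dict)

    def validate(self):
        errors = self.selection.validate()
        if not self.data_ids:
            errors.append("no data IDs given")
        return errors


ok_request = StagingRequest(
    data_ids=("experiment_x.tas",),  # hypothetical data ID
    selection=Selection(lon=(0.0, 30.0), lat=(40.0, 60.0),
                        time=("1990-01-01", "1999-12-31"),
                        cf_names=("air_temperature",)),
    output_properties={"format": "NetCDF"},
)
bad_request = StagingRequest(data_ids=(),
                             selection=Selection(lon=(0.0, 30.0), lat=(60.0, 40.0)))
ok_errors = ok_request.validate()
bad_errors = bad_request.validate()
print(ok_errors)
print(bad_errors)
```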
C3Grid Middleware Components
• Scheduler: Globus WSRF based; accepts WSL workflow descriptions (compute tasks + data staging tasks)
• Data management: Globus WSRF based; offer negotiation with scheduler; consistent view of distributed data (later: replica management, caching)
• Globus MDS resource metadata catalog: service registry, resource status
• Dependency on Globus software stack: no high-level implementation support tools, Globus 4.1.x migration of the implementation ??, problems with delegation implementation (insufficient documentation and guidance)
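A workflow made of compute tasks and data staging tasks is, at its core, a dependency graph. The sketch below orders such a graph into batches of tasks that can run in parallel. It is not the WSL format and not the C3Grid scheduler: the task names are hypothetical, and Python's standard `graphlib` stands in for the scheduling logic.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it depends on (hypothetical names).
WORKFLOW = {
    "stage_temperature": set(),                         # data staging task
    "stage_pressure": set(),                            # data staging task
    "compute_mean": {"stage_temperature", "stage_pressure"},  # compute task
    "publish_result": {"compute_mean"},
}


def execution_batches(workflow):
    """Group tasks into batches; tasks within one batch are independent."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError on cyclic dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches


batches = execution_batches(WORKFLOW)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Both staging tasks land in the first batch, which is where offer negotiation with the data management component would matter most.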
C3Grid Workflow Analysis
Workflow-related:
• Analysis and preparation of workflows
• Monitoring and management of workflow execution
• (Individual) scheduling strategy to optimize the management
Task-related:
• Handler acting as facade for single / specific tasks
• Interaction and monitoring via WS Notification standard
(C) Security Infrastructure
[Security architecture diagram. Recoverable labels:]
• Identity provider (home organisation), attribute provider (virtual organisation)
• SAML assertions carrying "home attributes + VO attributes"
• User side: browser with Shibboleth login, portal, Webstart app, workflow client (wflow client)
• Credentials: SLCS (CA), DFN, X509 grid proxy, MyProxy, delegation service, GridShib SAML tools
• Grid side: C3Grid middleware, grid services, GRAM / DataRAM, grid resource, GridShib for GT policy, personal / group account
(C) Security Infrastructure
Status:
• Shibboleth IdPs running at core C3Grid partners
• Online CA for short-lived credentials tested; set up and operated by DFN (the German NREN)
• Accreditation process for the online CA (DFN-SLCS) with EUGridPMA started
• SLCs contain campus attributes as SAML assertions
• Java Webstart app to bootstrap SLCS in development at DFN
• GridShib SAML Tools (v0.6.0) tested
• Prototype of shibbolized GridSphere portal tested
• Open issues with GT4 proxy-delegation implementation
Next:
• Integration of components
• Virtual home organization for C3 users without a Shibboleth IdP
• Integration of VO attributes (shibbolized VOMS)
C3Grid / IPCC Use Case
(0) IPCC Metadata harvested / mirrored in CERA DB (WDCC)
(1) Metadata visible in C3 Portal
(2) User issues IPCC data import from external repository
(3) User OpenID IdP / + IPCC_Access role external repos
(4) Download ?? C3 Repository
(5) C3Grid grants access to users with IPCC_Access role
Grant procedure (open question): contact the IdP / attribute service before each workflow execution, or use a more offline method?
[Diagram labels: C3RC / C3 Workspace, IPCC data import, analysis workflow, workflow result publication]
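One middle ground for the open grant-procedure question is to cache attribute lookups for a limited time, so the attribute service is not contacted before every workflow execution. The sketch below only illustrates that idea; the `CachedAttributeService` class, the in-memory `DIRECTORY`, and the one-hour lifetime are hypothetical and stand in for a real IdP / attribute service.

```python
import time


class CachedAttributeService:
    """Caches role lookups so the remote service is not asked on every run."""

    def __init__(self, lookup, ttl_seconds, clock=time.monotonic):
        self._lookup = lookup      # callable: user -> iterable of roles
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}           # user -> (expiry time, roles)
        self.remote_calls = 0

    def roles(self, user):
        now = self._clock()
        entry = self._cache.get(user)
        if entry is None or entry[0] <= now:
            # Cache miss or expired entry: ask the (remote) service again.
            self.remote_calls += 1
            entry = (now + self._ttl, frozenset(self._lookup(user)))
            self._cache[user] = entry
        return entry[1]


def may_run_workflow(service, user, required_role="IPCC_Access"):
    """Grant access only to users holding the required role."""
    return required_role in service.roles(user)


# Hypothetical stand-in for the IdP / attribute service.
DIRECTORY = {"alice": {"IPCC_Access"}, "bob": set()}
service = CachedAttributeService(lambda user: DIRECTORY.get(user, set()),
                                 ttl_seconds=3600)
alice_ok = may_run_workflow(service, "alice")
alice_again = may_run_workflow(service, "alice")  # served from the cache
bob_ok = may_run_workflow(service, "bob")
print(alice_ok, alice_again, bob_ok, service.remote_calls)
```

The trade-off is the usual one: a longer lifetime means fewer remote calls but a longer delay before a revoked role takes effect.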
Appendix
C3Grid Content Info (Version 2)
<contentInfo>
  <MD_CoverageDescription>
    <attributeDescription>
      <gco:RecordType>CF_name_with_attribute</gco:RecordType>
    </attributeDescription>
    <contentType>
      <MD_CoverageContentTypeCode
          codeList="http://wis.wmo.int/2006/catalogues/cf-standard-name-table.xml"
          codeListValue="air_temperature">
        air_temperature with a cell_methods attribute including time:mean (interval: 1 day)
      </MD_CoverageContentTypeCode>
    </contentType>
    <dimension xlink:href="#verticalCRS_hPa">
      <MD_RangeDimension>
        <descriptor><gco:CharacterString>K</gco:CharacterString></descriptor>
      </MD_RangeDimension>
    </dimension>
  </MD_CoverageDescription>
</contentInfo>
Security Aspect: C3Grid step 0 step 1
(C) Data Reuse of Analysis Results: Metadata Generation
[Diagram. Recoverable labels:]
• Components: C3Grid workspace, wflow, m_tool, WS interface, OAI-PMH server, OAI harvester, Lucene index, portal
• Data model entities: collection, p_data, parent, process step, source
• Attributes shown: time stamp, description, citation info
• Relations: is_part_of, has_parent, is_generated_by, has_input (multiplicities shown: *, +, 0..*, 0..1)
• "Quality check"
• API prototype (Python)
Context description of analysis data:
• Aggregation
• Processing history
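The relations named in the diagram (is_part_of, has_parent, is_generated_by, has_input) are enough to reconstruct a processing history. The following sketch is not the Python API prototype mentioned on the slide; the class names, fields, and example objects are hypothetical and only show how such a history can be walked.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProcessStep:
    description: str
    inputs: List["DataObject"] = field(default_factory=list)  # has_input


@dataclass
class DataObject:
    name: str
    collection: Optional[str] = None             # is_part_of
    parent: Optional["DataObject"] = None        # has_parent
    generated_by: Optional[ProcessStep] = None   # is_generated_by


def processing_history(obj):
    """Follow is_generated_by / has_input back to the sources.

    Returns step descriptions in execution order (oldest first).
    """
    history = []
    step = obj.generated_by
    if step is not None:
        for source in step.inputs:
            history.extend(processing_history(source))
        history.append(step.description)
    return history


raw = DataObject("raw_temperature", collection="experiment_x")
staged = DataObject(
    "staged_temperature",
    parent=raw,
    generated_by=ProcessStep("data staging: select region", [raw]),
)
mean = DataObject(
    "temperature_mean",
    generated_by=ProcessStep("compute time mean", [staged]),
)
history = processing_history(mean)
print(history)
```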