iMarine Exploitation Opportunities
iMarine e-Infrastructure and Applications Catalogue
Pasquale Pagano (CNR), iMarine Technical Director
Outline
The Infrastructure
• Heterogeneous resources as a service
• Data Bonanza
• Virtual Research Environment
• Software platform
iMarine Catalogue
• StatsCube
• GeosCube
• BiolCube
• ConnectCube
I-MARINE EXTENDED BOARD 2
THE INFRASTRUCTURE
Distinguishing capabilities of the iMarine e-infrastructure and its enabling software
Concepts
• The initiative (the visionary leadership)
• The e-infrastructure (the operational platform)
• The system (the enabling software system)
e-Infrastructure
• Geographically Distributed Computing Infrastructure
• Across administrative boundaries
• Across private and commercial providers
• Service Allocations, Deployment, Monitoring, and Operation
• Uniform resource and data access
Infrastructure: key characteristics
• Efficient and tailored storage technologies
• Computational environments dealing with the volume of the data
• Elastic management of the resources, monitoring, alerting, recovery
• Collaborative environment to support scientific communities
• Rich portfolio of applications to perform access, validation, enriching, processing, sharing, and mash-up of data
Infrastructure: Storage as Service
• Open source RDBMS
• Up to 1 TB data
• Secure
• Fault-tolerant
• Replication
Virtual Workspace
Relational Databases
45 TB Currently Used
• Scalability and high availability
• Across sites
• ISO 19115/19139 Metadata
• Catalogue
Large and Active data storage
Spatial Database
Infrastructure: Computing as Service
• Hadoop: MapReduce
• Statistical Manager: Analysis/clustering/modeling
• R clusters: Windows and Linux
330 Cores Currently Allocated
Infrastructure: Management as Service
Operation
• Machine-readable SLAs
• Machine-readable monitoring, auditing, billing, reporting, and notification
• Machine-readable resource/performance capabilities description
Trust
• Privacy, governance, and attribution
• Security, trusted network
Infrastructure: Collaborative Environment
The Social Portal offers users a familiar view of what is happening in their VREs
A single place to
• Get status and updates from the applications and other users they are interested in
• Get notifications about messages, job completions, newly generated products, etc.
• Manage all the portal extensions
• Manage data, store and preserve them
• Share data
Google Analytics iMarine portal
Data Bonanza
Diagram: iMarine at the centre of external data sources (OBIS, WoRMS, WoRDS, GBIF, Eurostat, Data.FAO, CoL, ITIS, IRMNG, NCBI, MyOcean, WOA, …) and the iMarine Registries, with Validation, Enriching, Processing, and Sharing shown across a Private Cloud and a Commercial Cloud.
Data Bonanza
Biodiversity: DarwinCore / ISO19139
• >35 M Observations (OBIS)
• ≈ 120 K Observed Species (OBIS)
• ≈ 500 K Taxa (WoRMS)
• >600 K Scientific Names (ITIS)
• >12 K Species Distribution Maps (AquaMaps)
• ≈ 600 Species Extent (FAO)
• … FishBase, SeaLifeBase
• … CoL, GBIF
Statistical: SDMX *
• FAO CodeLists
• IRD CodeLists
• FAO Global Aquaculture Production
• FAO Global Capture Production
• FAO Global Production
• Eurostat
• …
Geospatial: ISO19139 (OGC W*S)
• 10 years Chemical and Physical variables in 2D space
– Ice concentration and velocity, Chlorophyll, Oxygen, Nitrate, Phosphate, Phytoplankton as carbon, Salinity, Temperature, …
• On-demand Chemical and Physical variables in 3D space
– Apparent Oxygen Utilization, Dissolved Oxygen, Salinity, Temperature, …
• > 300 variables
Not Only Access
• Access
– Retrieval of geospatial data as space/time-varying phenomena
– Direct fine-grained access at the feature and feature-property level
• Validation
– User-defined quality and dissemination level
• Enriching
– Generation of metadata, exploitation of reference data, linking to environmental datasets
• Processing
– Analysis and mining exploiting e.g. the R, Weka, and RapidMiner statistical frameworks
• Sharing
– User-driven process to decide how other agents (human / machine) can access information
Features Clustering with StatsCube
• Presence Points (FishBase + OBIS)
• Density Based Clustering: DBSCAN (with outliers)
• Other methods are also available: K-Means, X-Means, …
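To make the density-based clustering step concrete, here is a minimal pure-Python sketch of DBSCAN on presence points. It is not StatsCube's implementation: the coordinates, `eps`, and `min_pts` are made-up values, and plain Euclidean distance on (longitude, latitude) degrees is assumed as a simplification.

```python
from math import hypot

NOISE = -1  # label used for outliers


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # All points within eps of point i (the point itself included).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical presence points as (longitude, latitude) pairs.
points = [(10.0, 40.0), (10.1, 40.1), (10.2, 39.9),
          (55.0, -20.0), (55.1, -20.1), (54.9, -19.9),
          (120.0, 5.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

With these inputs the two dense groups each form a cluster and the isolated last point is labelled as an outlier, which is the "with outliers" behaviour the slide refers to.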
Ecological Modeling with BiolCube
Maps Comparison with GeosCube
FAO Eleutheronema tetradactylum VS AquaMaps Eleutheronema tetradactylum
MEAN=0.81
VARIANCE=0.02
NUMBER_OF_ERRORS=6691
NUMBER_OF_COMPARISONS=259200
ACCURACY=97.42
MAXIMUM_ERROR=1.0
MAXIMUM_ERROR_POINT=3005:363:1
COHENS_KAPPA=0.218
COHENS_KAPPA_CLASSIFICATION_LANDIS_KOCH=Fair
COHENS_KAPPA_CLASSIFICATION_FLEISS=Marginal
TREND=EXPANSION
RESOLUTION=0.5
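Some of the reported figures can be cross-checked by hand. The sketch below assumes that ACCURACY is the percentage of comparisons that are not errors, and that the comparison runs over a regular global grid at the stated resolution. Neither is stated on the slide, but both reproduce the reported numbers. The slide does not say how GeosCube builds its agreement table, so the kappa function is the textbook definition applied to a made-up table, and the labels follow the published Landis and Koch bands.

```python
def accuracy_percent(errors, comparisons):
    """Share of comparisons that are not errors, as a percentage."""
    return 100.0 * (1.0 - errors / comparisons)


def global_grid_cells(resolution_deg):
    """Number of cells in a regular global lon/lat grid."""
    return round(360 / resolution_deg) * round(180 / resolution_deg)


def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (rows: map A, columns: map B)."""
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa):
    """Landis and Koch (1977) agreement label for a kappa value."""
    if kappa < 0:
        return "Poor"
    bands = [(0.20, "Slight"), (0.40, "Fair"),
             (0.60, "Moderate"), (0.80, "Substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost perfect"


print(round(accuracy_percent(6691, 259200), 2))  # slide reports ACCURACY=97.42
print(global_grid_cells(0.5))                    # slide reports 259200 comparisons
print(landis_koch(0.218))                        # slide reports "Fair"

# Made-up 2x2 agreement table, only to exercise the kappa formula.
example_table = [[20, 5], [10, 15]]
print(cohens_kappa(example_table))
```

The high accuracy next to a low kappa is expected when most cells agree trivially (for example, both maps report absence): kappa discounts the agreement that would occur by chance.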
Not Only Access, Validation, Enriching, Processing, Sharing
• It is always possible to save the discovered data in various standard formats
• It is always possible to collaborate with co-workers through a dedicated workspace
• Mash-up data across diversity
– Accessing statistical datasets in SDMX, geo-referencing them, describing them in ISO19139, and making them available via OGC W*S standard protocols
– Accessing species observation datasets in DwC, analysing their distribution trend via R, and projecting them in geographical space
– Accessing species taxonomies in DwCA and publishing them as reference data in SDMX
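As an illustration of the first mash-up above, the following sketch geo-references statistical observations by joining them to area geometries. Everything in it is hypothetical: the area codes, values, and bounding boxes are invented, and GeoJSON is used as a lightweight stand-in for the ISO19139 description and OGC W*S publication steps, which are not shown.

```python
import json

# Hypothetical SDMX-like observations: one value per (area code, year).
observations = [
    {"area": "A1", "year": 2010, "value": 120.0},
    {"area": "A2", "year": 2010, "value": 75.5},
    {"area": "ZZ", "year": 2010, "value": 3.0},  # no known geometry
]

# Hypothetical reference data: (west, south, east, north) per area code.
area_boxes = {
    "A1": (-10.0, 30.0, 0.0, 40.0),
    "A2": (0.0, 30.0, 10.0, 40.0),
}


def bbox_polygon(west, south, east, north):
    """GeoJSON polygon for a bounding box (closed, counter-clockwise ring)."""
    ring = [[west, south], [east, south], [east, north],
            [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


def georeference(observations, area_boxes):
    """Attach a geometry to each observation; return unmatched ones separately."""
    features, unmatched = [], []
    for obs in observations:
        box = area_boxes.get(obs["area"])
        if box is None:
            unmatched.append(obs)
            continue
        features.append({
            "type": "Feature",
            "geometry": bbox_polygon(*box),
            "properties": dict(obs),
        })
    return {"type": "FeatureCollection", "features": features}, unmatched


collection, unmatched = georeference(observations, area_boxes)
print(json.dumps(collection)[:80])
print("unmatched area codes:", [obs["area"] for obs in unmatched])
```

Returning the unmatched records instead of dropping them keeps the join auditable, which matters when code lists and geometries come from different providers.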
Data Bonanza: a common vision
Integrate and harmonize cross-disciplinary data and information across information systems and workflows to support evidence-based decision making.
iMarine is implementing this vision through the adoption of Standards, the identification of common Methods, and the implementation of Tools which enable integration and harmonization.
Is this enough?
• An ecosystem of participatory data e-Infrastructures
• Regulated by policies
• Enabled by standards
• Promoting not only access but mash-up of heterogeneous data
User centric
User-Centric View
User-centric view of an ecosystem of participatory data e-Infrastructures to
• Cope with the overwhelming amount of data and capacities
• Promote re-use of data
• Encourage sharing of resulting products
User-centric and workflow-oriented
Virtual Research Environment
iMarine is user-centric and workflow-oriented thanks to the gCube VRE technology
A Virtual Research Environment (VRE) is
• a distributed and dynamically created environment
• where a subset of data, services, computational, and storage resources
• regulated by tailored policies
• are assigned to a subset of users via interfaces
• for a limited timeframe
• at little or no cost for the providers of the participatory data e-infrastructures
L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12
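The definition above can be read as a small data structure: a named scope that binds a subset of resources to a subset of users, under policies, until an expiry time. The sketch below is only an illustration of that reading. The class, field names, and sample values are hypothetical and do not reflect the gCube API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VirtualResearchEnvironment:
    name: str
    resources: frozenset   # subset of data, services, compute, storage
    users: frozenset       # subset of users
    policies: dict         # resource -> set of permitted actions
    valid_until: datetime  # end of the limited timeframe

    def allows(self, user, resource, action, now):
        """True if the user may perform the action on the resource at time now."""
        if now >= self.valid_until:
            return False
        if user not in self.users or resource not in self.resources:
            return False
        return action in self.policies.get(resource, set())


# Hypothetical VRE with two resources and one member.
vre = VirtualResearchEnvironment(
    name="demo-vre",
    resources=frozenset({"species-observations", "r-cluster"}),
    users=frozenset({"alice"}),
    policies={"species-observations": {"read"}, "r-cluster": {"execute"}},
    valid_until=datetime(2014, 1, 1),
)

print(vre.allows("alice", "species-observations", "read", datetime(2013, 6, 1)))
print(vre.allows("alice", "species-observations", "read", datetime(2014, 6, 1)))
```

Because the VRE only references resources that already exist in the infrastructure, creating or expiring one does not require provisioning anything new, which is one way to read the "little or no cost for the providers" point.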
Software Platform
Flexible
Software platform to abstract over differences in location, protocols, and models by
• scaling no less than the interfaced resources,
• keeping failures partial and temporary,
• reacting to and recovering from a large number of potential issues.
Feature-rich
It turns resources and technologies into a utility by offering single registration, monitoring, and access facilities
• Storage, Discovery, Indexing, Search, Execution, …
iMarine Exploitation models
Service Data hosting Infrastructure
Unlimited users, Infrastructure support, helpdesk, back-up, security
Validation (records) Workspace Hardware
Default Processing (<1MB) Social Tool Community Management
Storage 1TB Cloud Resources
Validation (Datasets) Custom Data Resources
Custom Processing (> 1MB) Spatial Data integration
User Management Large and Active Storage
Unlimited VREs
Hour/Day Month Year
Concept map of the products
I-MARINE OFFER
Application Bundles
A BUNDLE is a set of services and technologies grouped according to a family of related tasks for achieving a common objective
• Management and interpretation of biological and ecological data in the environment
• Complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools
• Storage and interpretation of geospatial explicit information, including WPS processing
• Flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities
Discussion time
Thank you for your attention
www.i-marine.eu