Providing Statistical Algorithms as-a-Service
DESCRIPTION
In computational statistics, algorithms often have specialized implementations that address very specific problems, yet these algorithms are sometimes applicable to problems other than the original ones. Interest is growing in modular and pluggable solutions that let scientists repeat and validate experiments made by others and reuse those algorithms in other contexts. Such procedures are also expected to be remotely hosted and to "hide" the complexity of the calculations, which remote computational infrastructures manage behind the scenes. For these reasons, the usual solution of supplying modular software libraries containing implementations of algorithms is giving way to Web Services that host such implementations and are accessible through standard protocols. The protocols describing the computational capabilities of these Services are increasingly elaborate, so that modular workflows can rely on them.
TRANSCRIPT
![Page 1: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/1.jpg)
Providing Statistical Algorithms as-a-Service
Gianpaolo Coro, Pasquale Pagano,
Leonardo Candela
ISTI-CNR, Pisa, Italy
![Page 2: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/2.jpg)
Statistical Manager
Statistical Manager is a set of web services that aim to:
• Help scientists manage marine, biological or climatic statistical problems
• Supply ready-to-use state-of-the-art algorithms as-a-Service
• Perform calculations on Cloud computing resources in a way that is transparent to the users
• Share input, results, parameters and comments with colleagues by means of Virtual Research Environments in the D4Science e-Infrastructure
[Diagram labels: Statistical Manager; D4Science Computational Facilities; Sharing; Setup and execution]
![Page 3: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/3.jpg)
Architecture
![Page 4: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/4.jpg)
Internal Work
![Page 5: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/5.jpg)
Resources and Sharing
![Page 6: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/6.jpg)
Statistical Manager - Interface
![Page 7: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/7.jpg)
Experiment Execution
![Page 8: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/8.jpg)
Computations Check
Summary of the Input, Output
and Parameters of the experiment
![Page 9: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/9.jpg)
Data Space - Sharing and Import
![Page 10: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/10.jpg)
Hosted Algorithms
![Page 11: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/11.jpg)
Application Fields
o Ecology
o Environment
o Biodiversity
o Life
![Page 12: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/12.jpg)
Ecology
![Page 13: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/13.jpg)
Niche Modelling
• AquaMaps – Suitable Habitat
• AquaMaps – Native Habitat
• AquaMaps for 2050
• Artificial Neural Networks
• AquaMaps - ANN
Gadus morhua
AquaMaps - Suitable Habitat
![Page 14: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/14.jpg)
Outliers Detection
Presence points: Cetorhinus maximus
Density-based clustering and outliers detection: DBScan
Distance-based clustering: K-Means, X-Means
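The idea of flagging outliers among presence points with density-based clustering can be sketched as follows. This is a minimal pure-Python DBSCAN written for illustration, not the Statistical Manager implementation; the sample coordinates, `eps` and `min_pts` values are assumptions, and plain Euclidean distance on longitude/latitude is a simplification.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for an outlier."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional outlier; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former outlier reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

# Hypothetical presence points (longitude, latitude): five close together, one isolated.
presence = [(10.0, 50.0), (10.1, 50.0), (10.0, 50.1),
            (10.1, 50.1), (10.05, 50.05), (40.0, -20.0)]
labels = dbscan(presence, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, -1]
```

Points labelled -1 belong to no dense region and are the candidates for removal or review.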
![Page 15: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/15.jpg)
Climate Changes Effects on Species
Estimated impact of climate changes over 20 years on 11549 species.
Bioclimate HSpec: overall occupancy in time
Pseudanthias evansi: the occupancy by Pseudanthias evansi decreases in Area 71 but increases in Area 77
![Page 16: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/16.jpg)
Similarity between habitats
Habitat Representativeness Score:
1. Measures the similarity between the environmental features of two areas
2. Assesses the quality of models and environmental features
Latimeria chalumnae: HRS=10.5
![Page 17: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/17.jpg)
Environment
![Page 18: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/18.jpg)
Rasterization
A polygonal map is
transformed into a raster
map or into a point map
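A minimal sketch of this transformation, assuming a regular grid and a "cell centre inside the polygon" rule, is shown below. The function names, grid parameters and sample polygon are hypothetical; the hosted algorithm may use a different rule.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, x_min, y_min, cell, n_cols, n_rows):
    """Return (raster, points).

    raster[row][col] is 1 when the cell centre falls inside the polygon;
    points lists the centres of those cells (the point-map version).
    """
    raster, points = [], []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cx = x_min + (c + 0.5) * cell
            cy = y_min + (r + 0.5) * cell
            hit = point_in_polygon(cx, cy, polygon)
            row.append(1 if hit else 0)
            if hit:
                points.append((cx, cy))
        raster.append(row)
    return raster, points

# Hypothetical square polygon on a 4x4 grid of unit cells.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
raster, points = rasterize(square, x_min=0, y_min=0, cell=1, n_cols=4, n_rows=4)
```

Row 0 of the raster corresponds to the lowest y values; a finer `cell` size gives a closer approximation of the polygon outline.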
![Page 19: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/19.jpg)
Maps Comparison
Compares:
• Species Distribution maps
• Environmental layers
• SAR Images
![Page 20: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/20.jpg)
Periodicity and Seasonality
Extraction Tools
Fourier Analysis
Periodicity: 12 months
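One way Fourier analysis can reveal a 12-month periodicity is to pick the strongest component of the discrete Fourier transform. The sketch below uses a direct DFT in pure Python on a synthetic monthly series; the series and the function name are assumptions made for illustration only.

```python
import cmath
import math

def dominant_period(samples):
    """Return the period, in samples, of the strongest non-constant DFT component."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]  # remove the constant (zero-frequency) term
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        raise ValueError("the series is constant: no periodicity to report")
    return n / best_k

# Synthetic monthly signal: four years of values with a yearly cycle.
monthly = [15 + 5 * math.sin(2 * math.pi * t / 12) for t in range(48)]
print(dominant_period(monthly))  # 12.0
```

With one sample per month, the returned value reads directly as a period in months.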
![Page 21: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/21.jpg)
Environmental Signal Processing
Resampling
Spectrogram
![Page 22: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/22.jpg)
Biodiversity
![Page 23: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/23.jpg)
Occurrence Points
Sources: Occurrence Data from GBIF; Occurrence Data from OBIS
Operations:
• ∩ Intersection
• - Difference
• ∪ Union
• DD Duplicates Deletion
Records in datasets A and B carry: x,y; Event Date; Modif Date; Author; Species Scientific Name
The operations rely on Records Similarity
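The four operations on occurrence records can be sketched as set-like functions driven by a record-similarity test. The similarity rule below (same species name, same event date, coordinates within a tolerance) is an assumption chosen for illustration, as are the field names and sample records; the actual service may weigh the fields differently.

```python
import math

def similar(a, b, tol=0.01):
    """Hypothetical similarity rule: same species and date, nearby coordinates."""
    return (a["species"].strip().lower() == b["species"].strip().lower()
            and a["event_date"] == b["event_date"]
            and math.dist((a["x"], a["y"]), (b["x"], b["y"])) <= tol)

def delete_duplicates(records, tol=0.01):
    kept = []
    for r in records:
        if not any(similar(r, k, tol) for k in kept):
            kept.append(r)
    return kept

def intersection(a, b, tol=0.01):
    return [r for r in delete_duplicates(a, tol) if any(similar(r, s, tol) for s in b)]

def difference(a, b, tol=0.01):
    return [r for r in delete_duplicates(a, tol) if not any(similar(r, s, tol) for s in b)]

def union(a, b, tol=0.01):
    return delete_duplicates(a + b, tol)

def rec(x, y, event_date, species):
    return {"x": x, "y": y, "event_date": event_date, "species": species}

# Made-up records standing in for two occurrence datasets.
set_a = [rec(10.0, 50.0, "2010-05-01", "Gadus morhua"),
         rec(10.001, 50.0, "2010-05-01", "gadus morhua"),  # near-duplicate of the first
         rec(20.0, 60.0, "2011-01-01", "Gadus morhua")]
set_b = [rec(10.0, 50.001, "2010-05-01", "Gadus morhua"),
         rec(-5.0, 45.0, "2009-03-03", "Gadus morhua")]

print(len(delete_duplicates(set_a)), len(intersection(set_a, set_b)),
      len(difference(set_a, set_b)), len(union(set_a, set_b)))  # 2 1 1 3
```

Because matching is approximate rather than exact, the result of each operation depends on the tolerance, which is why it is exposed as a parameter.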
![Page 24: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/24.jpg)
BiOnym
A flexible workflow approach to taxon name matching
Accounts for:
• Variations in the spelling and interpretation of taxonomic names
• Combination of data from different sources
• Harmonization and reconciliation of Taxa names
Workflow: Raw Input String (e.g. Gadus morua Lineus 1758) → Preprocessing and Parsing → Taxon name Matcher 1, Taxon name Matcher 2, ..., Taxon name Matcher n → PostProcessing → Correct Transcriptions (e.g. Gadus morhua (Linnaeus, 1758))
Reference Sources: ASFIS; FISHBASE; WoRMS; Other in DwC-A
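A single matcher stage of such a workflow can be approximated with fuzzy string matching. The sketch below uses Python's standard `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` ratio; this is a stand-in technique, not BiOnym's own matchers. The tiny reference list, the "first two tokens are genus and species" parsing rule and the cutoff are all assumptions.

```python
import difflib

# Hypothetical reference list standing in for a source such as FISHBASE or WoRMS.
REFERENCE = ["Gadus morhua", "Gadus macrocephalus",
             "Cetorhinus maximus", "Latimeria chalumnae"]

def parse(raw):
    """Simplified parsing: first two tokens are the name, the rest is authorship."""
    tokens = raw.split()
    return " ".join(tokens[:2]), " ".join(tokens[2:])

def match_taxon(raw, reference, cutoff=0.8):
    """Return the closest reference name, or None if nothing is similar enough."""
    name, _authorship = parse(raw)
    lowered = {r.lower(): r for r in reference}  # compare case-insensitively
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None

print(match_taxon("Gadus morua Lineus 1758", REFERENCE))  # Gadus morhua
```

Chaining several matchers with different cutoffs or similarity measures, as the workflow suggests, would let stricter stages run first and looser ones handle the leftovers.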
![Page 25: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/25.jpg)
Trendylyzer
• Fill some knowledge gaps on marine species
• Account for sampling biases
• Define trends for common species
Plankton regime shift
Herring recovered after the fish ban
Can we recognize big changes in
species presence?
![Page 26: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/26.jpg)
Life
![Page 27: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/27.jpg)
Length-Weight Relationships
Calculate the a and b parameters for 14 230 species by means of Bayesian Methods
Approach:
• Collaborative development with the final user
• Integration of user's R Scripts
• Usage of Cloud computing for R Scripts
• Periodic runs
• Porting to the D4Science Statistical Manager allowed the scripts to run in a distributed fashion
• Run time dropped from 20 days to 11 hours (reported as a 95.4% reduction)
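For context, a and b are the parameters of the standard length-weight relationship W = a·L^b, which becomes linear after taking logarithms: log W = log a + b·log L. The sketch below estimates them with ordinary least squares on synthetic data; this is a simpler technique than the Bayesian estimation used in the work above, and the function name and sample values are assumptions.

```python
import math

def fit_length_weight(lengths, weights):
    """Least-squares fit of log10(W) = log10(a) + b * log10(L); returns (a, b)."""
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(weight) for weight in weights]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope of the log-log line
    a = 10 ** (mean_y - b * mean_x)    # intercept, converted back from log10
    return a, b

# Synthetic fish generated with a = 0.01 and b = 3 (length in cm, weight in g).
lengths = [10, 20, 30, 40, 50]
weights = [0.01 * length ** 3 for length in lengths]
a, b = fit_length_weight(lengths, weights)
```

On noise-free data the fit recovers the generating parameters; on real measurements a Bayesian approach additionally lets prior knowledge about a and b inform the estimate.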
![Page 28: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/28.jpg)
Functions Simulation - Spawning Stock Biomass vs Recruits
Estimate biological limits for 50 Northeast Atlantic fish stocks
• Use real measures
• Rely on previous expert knowledge
• Use Bayesian models to combine information
[Figure labels: Re-estimated SSB limit; Re-estimated HS; Rule-based HS; Re-estimated precautionary limit]
![Page 29: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/29.jpg)
Future Work
![Page 30: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/30.jpg)
Plan
• Make the Statistical Manager Algorithms accessible through the OGC WPS standard (currently available via SOAP and Java API)
• Invoke the algorithms from a Workflow Management System (e.g. Taverna)
• Expand the system with new algorithms
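To illustrate what access through OGC WPS could look like from a client, the sketch below builds a WPS 1.0.0 Execute request in key-value (GET) form. The endpoint URL, the process identifier and the input names are hypothetical placeholders, since the slides do not specify them; only the `service`, `version`, `request`, `identifier` and `datainputs` parameters come from the WPS 1.0.0 standard.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 key-value Execute URL.

    Simplification: input values are assumed not to contain ';' or '='.
    """
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "datainputs": data_inputs,
    })
    return f"{endpoint}?{query}"

# Hypothetical endpoint, process name and parameters.
url = wps_execute_url("https://example.org/wps", "DBSCAN",
                      {"epsilon": 0.5, "min_points": 3})
print(url)
```

A workflow system such as Taverna could then treat each algorithm as a remote step addressed by a URL of this kind, with the process description advertising its inputs and outputs.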
![Page 31: Providing Statistical Algorithms as-a-Service](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b4ee754a79590a688b4639/html5/thumbnails/31.jpg)
Thank you