Slide 1
ASPIRE STAKEHOLDER WORKSHOP
Brussels
Thursday 13 September 2012
www.terena.org/aspire
Rosette Vandenbroucke
HPC Coordinator
Middleware and Managing Data and Knowledge in a Data-rich World
Slide 2
ASPIRE Data Panel
› Gill Davies - Online music performances
› Antonella Fresa - DCH
› Jens Jensen - HEP
› Andrew Lyall - Biomed
› Roshene McCool - Astronomy
› Rosette Vandenbroucke
Slide 3
Work method
› Per discipline: List data creation/handling and associated requirements now and in the next 10 years
› Select aspects that are important for the represented disciplines
› Describe important future data and data handling expectations and common requirements
› Formulate recommendations
Slide 4
Aspects and type of data not covered
› Many more data aspects exist
› Not possible to handle them all
› Other scientific disciplines
› Twitter and blog data
› Social sites data
› Logs of mobile phone use
› ...
Slide 5
Data aspects considered
› Networking
bandwidth requirements, storage, mirrors, preservation, disaster recovery, costs
› Middleware
› Metadata
› AAI
› Data policies
availability, replication
› Data origin
authentication of source, integrity
Slide 6
Networking: Bandwidth (1)
› 3 models observed:
- SKA/HEP model: tier structure
- HG-DCH model: data transfer between large centers/depositories; very large number of “small” users
- Musical Performance model: small amount of data; network latency important
Slide 7
Networking: Bandwidth (2)
› Shared general concern: network links below required bandwidth
- too expensive
- network link not available where needed
- no permission to connect to the national research network
› Cost issues:
- bandwidth now available for free may incur tariffs in the future
- very high bandwidth and/or dedicated lightpath requirements can lead to high costs
- some regions/countries have more expensive connections
- last mile
Slide 8
Networking: Storage, mirrors, preservation, disaster recovery
› Not all data can be stored or preserved
› Preservation schemes in study
› Replication of data sometimes inherent in the data structure
› Disaster recovery: not often explicitly addressed
Slide 9
Middleware
› Middleware very much discipline specific
› Expectation for generic solutions
Slide 10
Metadata
› Very important
› Used by all
› Many standards exist!
› Definition and usage per discipline
› No consideration for cross-disciplinary use
Slide 11
AAI
› Everyone agrees about the need for a globally accepted AAI system
› No consensus on how to achieve it
› e-IRG has made recommendations for such an AAI system
› Federations of authentication and eduGAIN are an excellent move in that direction
Slide 12
Data Policies
› Availability of data
› Policies on data access are discipline specific
› General tendency to move to “open data”
› “Open data” is not always possible, due to
- the costs of generating the data
- the costs of storage and curation
- data confidentiality
Slide 13
Data origin
› Integrity and source authentication are important
› No general mechanism for data-source authentication
› Metadata can help
› In some disciplines data is only relevant to experts, so considered as quite safe
› Authentication by a unique digital signature at creation
› Source authentication can add costs
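The integrity and source-authentication points above can be illustrated with a minimal sketch. It is not a mechanism proposed by the panel: the function names and the shared `source_key` are hypothetical, and an HMAC (a keyed hash from the Python standard library) stands in for a true public-key digital signature, which would need an asymmetric scheme and key management.

```python
import hashlib
import hmac


def integrity_digest(data: bytes) -> str:
    """Return a SHA-256 digest that can be stored next to the data set."""
    return hashlib.sha256(data).hexdigest()


def tag_at_creation(data: bytes, source_key: bytes) -> str:
    """Compute a keyed tag (HMAC-SHA256) when the data is created.

    Assumption: the data source and the verifier share `source_key`.
    This is a stand-in for a digital signature, not a real one.
    """
    return hmac.new(source_key, data, hashlib.sha256).hexdigest()


def verify_origin(data: bytes, source_key: bytes, tag: str) -> bool:
    """Check that the data is unchanged and was tagged with `source_key`."""
    expected = tag_at_creation(data, source_key)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"hypothetical-instrument-key"
    payload = b"example observation record"
    tag = tag_at_creation(payload, key)
    print("digest:", integrity_digest(payload))
    print("unchanged data verifies:", verify_origin(payload, key, tag))
    print("altered data verifies:", verify_origin(payload + b"!", key, tag))
```

Storing the digest and tag in the metadata is one way metadata could help here; computing and checking them for every data set is also an example of the extra cost that source authentication adds.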
Slide 14
DATA
› GROWING in every discipline
putting higher requirements on all aspects we have looked at
Slide 15
Recommendation 1: Network related
- Collaboration between user communities and NRENs, GÉANT, ... to understand network requirements associated with the data deluge
- Adequate network services made available in a timely and economically viable way
- All important network parameters have to be studied (speed, throughput, privacy, persistence of connection, cost, ...)
Slide 16
Recommendation 2: Standardisation of datasets and metadata
› Define standardised data sets:
- To profit from economy of scale for cross-discipline middleware
› Define standardised data sets, metadata, middleware and applications
- For easier accessibility of data
› Adopt a common metadata standard that takes into account multi-disciplinary use of data
Slide 17
Recommendation 3: AAI
› Adopt a globally recognised AAI based on standards for the exchange of assertions and security tokens that can be used by all (user communities, e-infrastructure providers, ICT providers, ...)
Slide 18
Recommendation 4: Data origin
› Create common mechanisms and procedures for all disciplines to certify and authenticate data.
Slide 19
Recommendation 5: Preservation, curation
› Facilitate collaboration between disciplines to create common policies, procedures and tools to assist in the curation and preservation of data.