a data centre for science and industry roadmap. innovation networking data processing data...
Challenges
• Technological
  – High-speed network providing high bandwidth and low latency
  – High-performance computing
  – Massive data sources
• Organizational
  – Inclusiveness and capillarity of access to shared resources
• Economic
  – High investments needed to match the technological requirements
  – Resources may have to be shared among several actors, as research and academia will never have enough money to sustain the required infrastructure development by themselves
Data Processing: National Laboratory for HPC
Consortium of seven universities and REUNA. Nine more universities joining the consortium in 2014.
Mission
• To consolidate a national facility for HPC by offering top-quality services and advanced training to answer the national demand for scientific computing, developing links between research groups, industry and the public sector.
Vision
• Participants envision the NLHPC as a highly competitive centre offering a range of world-class research services in high-performance computing.
Networking: REUNA
• Over the past few years REUNA has aggressively:
  – Pursued a plan to put its infrastructure at the leading edge of technology
    • Leased bandwidth –> Lambdas –> Dark fiber
  – Extended its infrastructure to connect the main research centres in Chile with fibre
• Has made a joint effort with advanced computing initiatives to integrate them into the network
The Data Challenge
• Data are being produced at an exponentially growing rate
• Data production and collection is expensive and requires that:
  – Data must be accessible
  – Data must be preserved for a very long time (many decades) during operations…
  – … and often beyond the end of the specific project that provided the funding
• Historical data could be more important than current data
The Data Challenge for Science
• … The above remarks apply …
• Moreover
  – Long-term data maintenance costs are beyond the economic means of any single science
  – Nobody can foresee how relevant the data collected today will be to theories yet to come
• Many sciences have not yet started to consider the implications of data storage at the global level
  – Medicine
  – Pharmacy
  – …
Scientific Data Repository Requirements
• To be connected to a national research and education network infrastructure ensuring access to/from all relevant actors (science data producers, scientific community, academia, …)
• High capacity communication backbone to remove the need to have computing (processing) resources co-located with the storage facilities
• To be designed to last for many decades, even beyond the boundaries of the original funding of a scientific facility (i.e. economically sustainable in the long term)
• To have data backed up routinely as it is valuable and expensive to recapture
• To take into account physical media renewal
The Data Challenge for Private Enterprise
• Data archival is not part of the core business of most enterprises, although data analysis is still required to increase profit
• Data analysis requires access to a broad range of competences not easily available within each enterprise domain
• External competences are often not the right solution
• Small and medium-sized companies face an economic barrier, as they do not have funds to invest in digital infrastructure for R&D
Vision
A Data Centre for Science AND Industry will drive innovation by:
• enabling access to a massive, heterogeneous collection of data for both scientists and private entrepreneurs
• supporting the development of mathematical models, computing technologies and software solutions across disciplines
• giving a wider range of users cost-efficient access to modern data storage technology
Funding Model
• Capital Investment (CAPEX) shared at (say) 50% between scientific partners and private sector.
• Operational costs (OPEX) borne only by the private partner(s), with (say) 30% of the infrastructure reserved for scientific use.
• Therefore:
  – No operational cost for scientific institutions
  – Reduced investment costs for the commercial partner(s)
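The split described in this funding model can be worked through as a small calculation. The sketch below is illustrative only: the function and field names are hypothetical, the 50% and 30% defaults are the "(say)" figures from the slide, and the totals passed in at the bottom are placeholder values, not project estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FundingSplit:
    science_capex: float
    private_capex: float
    science_opex: float
    private_opex: float
    science_capacity: float
    commercial_capacity: float


def split_funding(capex, annual_opex, capacity,
                  science_capex_share=0.5, science_capacity_share=0.3):
    """Apply the slide's example percentages to caller-supplied totals.

    capex, annual_opex and capacity can be in any consistent units
    (for example M USD, M USD per year, and PB).
    """
    for share in (science_capex_share, science_capacity_share):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must lie between 0 and 1")

    science_capex = capex * science_capex_share
    science_capacity = capacity * science_capacity_share
    return FundingSplit(
        science_capex=science_capex,
        private_capex=capex - science_capex,
        science_opex=0.0,              # OPEX is borne only by the private side
        private_opex=annual_opex,
        science_capacity=science_capacity,
        commercial_capacity=capacity - science_capacity,
    )


if __name__ == "__main__":
    # Placeholder totals chosen only to make the proportions easy to read.
    print(split_funding(capex=100.0, annual_opex=10.0, capacity=1000.0))
```

Keeping the two shares as parameters reflects that the slide presents them as examples rather than agreed values.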
A Data Centre for Science and Industry
• Shall provide the capacity to store 1 EB (1000 PB) worth of data
• Shall connect to the global academic network through a dedicated backbone
• Shall connect in a transparent way to open internet exchange points for commercial access
• Shall have a reduced environmental impact
Phase 1 – Pilot
• Target storage capacity: 100 PB
• Target network capacity: 100 Gbps
• Target completion time: 2 to 3 years
• Physical space: ~500 m²
  – Location must take into account the need to minimize environmental impact
  – Use a modular approach to minimize expansion costs
• Power consumption: ~0.5 MW
  – Non-conventional renewable energy source
  – Backed up with conventional (renewable) energy source
• Estimated investment: ~USD 20 to 30 M (including space, connectivity, storage, power, etc.)
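To put the pilot targets in proportion, the figures above can be combined in a few lines of arithmetic. This is a rough sketch under stated assumptions that are not in the slides: decimal petabytes (1 PB = 10^15 bytes), a link used at its full nominal rate with no protocol overhead, and a constant power draw all year. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
HOURS_PER_YEAR = 8_760  # 365 days, ignoring leap years


def transfer_days(storage_pb, link_gbps):
    """Days needed to move storage_pb petabytes over a link_gbps link.

    Assumes decimal units and 100% link utilisation (a best case).
    """
    bits = storage_pb * 10**15 * 8            # petabytes -> bits
    seconds = bits / (link_gbps * 10**9)      # bits / (bits per second)
    return seconds / SECONDS_PER_DAY


def annual_energy_mwh(power_mw):
    """Energy used in one year at a constant draw of power_mw megawatts."""
    return power_mw * HOURS_PER_YEAR


def investment_per_pb(investment_musd, storage_pb):
    """Investment in M USD per petabyte of target capacity."""
    return investment_musd / storage_pb


if __name__ == "__main__":
    print(transfer_days(100, 100))        # full pilot store over one link
    print(annual_energy_mwh(0.5))         # pilot power budget over a year
    print(investment_per_pb(20, 100), investment_per_pb(30, 100))
```

Under these assumptions, moving the full 100 PB over a single 100 Gbps link takes roughly three months at best, a 0.5 MW draw corresponds to about 4,380 MWh per year, and the estimated investment works out to USD 0.2 to 0.3 M per petabyte.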
Phase 2 – Full scale DC
• Target storage capacity: 1000 PB
• Target network capacity: multiple 100 Gbps
• Target completion time: 5 to 7 years, according to demand and fund availability
Other DC Related Initiatives
• Focused on astronomical data
• Focused on creating initial competences
• Use existing facilities (universities)
• Use existing REUNA capacity
• Funded by academic research grants
• Development within one to two years
At present more than one initiative is being developed: coordinating with them will create synergy and increase effectiveness
WBS
• Project Management
• Legal framework
• Construction
  – Site Infrastructure
  – Network
  – Power
  – Storage
• Operational Model
• Local Community
• Outreach
Local Community
• Environmental impact
• Cultural aspects
• Byproduct benefits
  – work opportunities
  – visibility
  – local connectivity
  – …