Natalie Danezi <[email protected]>
Large scale computing & Big Data
SURFsara e-infrastructures
Swammerdam Institute for Life Sciences Workshop, 17 Oct 2016
SURF family
Shared Professional and Educational Services
Scientific Computing & Storage
Commercial ICT Products & Services
National Research & Education Network
eScience Collaboration and Tools
High Performance Computing (HPC) in research
Does my research fit in HPC?
• Faster results
• Task repetition
• Higher accuracy
• Larger computational domains
• Larger volume of data
http://www.advancedgwt.com/groundwater-software/data-management-and-visualization/groundwater-desktop.html
The HPC Infrastructure
More users, more resources
Animation: EGI - SURFsara MOOC 2014
http://web.grid.sara.nl/mooc/animations/single_user.html
Lisa cluster: batch processing
Animation: EGI - SURFsara MOOC 2014
http://web.grid.sara.nl/mooc/animations/cluster.html
• Calculation intensive applications
• Not data intensive applications
• Well supported software stack
• Relatively easy to start
Life Science Grid: cluster of clusters
Animation: EGI - SURFsara MOOC 2014
http://web.grid.sara.nl/mooc/animations/wms.html
• Resources meant for life science researchers
• 11 local clusters (AMC, LUMC, WUR, TUD, RUG, ..)
• Capacity: approx. 12,000 CPU cores, petabytes of storage
• Independent tasks: Parameter sweeps, Monte-Carlo, ..
• Linux experience required
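The "independent tasks" pattern in the list above can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not SURFsara tooling: the function names `estimate_pi` and `parameter_sweep` are hypothetical, and the tasks run one after another in a single process. On a grid, each (seed, sample size) combination could instead be submitted as its own job, because no task needs the output of another.

```python
import random
from itertools import product


def estimate_pi(seed: int, n_samples: int) -> float:
    """One independent task: a Monte Carlo estimate of pi from its own seeded RNG."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    return 4.0 * hits / n_samples


def parameter_sweep(seeds, sample_sizes):
    """Run every (seed, n_samples) combination; no task depends on another."""
    return {
        (seed, n): estimate_pi(seed, n)
        for seed, n in product(seeds, sample_sizes)
    }


if __name__ == "__main__":
    results = parameter_sweep(seeds=range(4), sample_sizes=[1_000, 100_000])
    for (seed, n), value in sorted(results.items()):
        print(f"seed={seed} n={n:>7} estimate={value:.4f}")
```

Because every task carries its own seed, a failed task can be rerun in isolation and will reproduce the same number.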
Life Science Grid examples
BBMRI.nl BIOS project
BBMRI.nl: a collaborative Dutch project focusing on BioBank enrichment
The BIOS project:
• 6 BioBanks
• 4000+ samples
• 3 measurement types
• 30 TB data
We-NMR
• e-Infrastructure for NMR and structural biology
• One of the largest bio* virtual organizations
• Provides access to grid resources through easy-to-use web pages
BIOS e-Infrastructure
Local storage ErasmusMC
Storage LSG Cluster ErasmusMC
Grid storage
Dual tape copies: Amsterdam & Almere
Data processing: Grid & Cloud
Lightpath connectivity
Biomedical projects @SURF
• CTMM Translational Research IT (TraIT): develop a long-lasting IT infrastructure for translational research
• BBMRI will form an interface between biological specimens and data (from patients and European populations) and top-level biological and medical research.
• ALS project MinE: analysing and sharing data of large cohort studies to discover genetic profiles
• National project Data4LifeSciences
• European Life Sciences Infrastructure For Biological Information (ELIXIR-NL: DTL)
Large Hadron Collider LOFAR GoNL
Turbulence modelling
Protein structure
eSALSA
Cartesius Supercomputer
• Large memory, fast interconnect, fast and large I/O
• European collaborations: PRACE, HPC-Europa
• Climate models, cell simulations
• Programming experience required
Hadoop
How to index the web
• Big Data
• Exploration/mining of data
• Map / Reduce
• Programming experience required
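The Map / Reduce idea behind "how to index the web" can be sketched in plain Python as a tiny inverted index. This is a single-process illustration under stated assumptions, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the toy pages are hypothetical. On a real cluster the framework performs the shuffle step itself and spreads the map and reduce calls over many machines.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per word in a document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce: collapse the values for one word into a sorted list of unique documents."""
    return word, sorted(set(doc_ids))


def build_index(documents):
    """Chain map, shuffle and reduce into a word -> documents index."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


if __name__ == "__main__":
    pages = {  # hypothetical toy "web"
        "page1": "big data on the grid",
        "page2": "the grid and the cloud",
    }
    for word, docs in sorted(build_index(pages).items()):
        print(word, docs)
```

The map and reduce functions only ever see one document or one word at a time, which is what lets the same logic scale out when the input no longer fits on one machine.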
HPC Cloud
• Flexible & controllable
• Microsoft Windows supported
• Ideal for 3rd party application providers
• Graphical user interface
• User is also the system admin
Data services
Mastering the data life cycle with e-infrastructure services
Data life cycle stages:
• Analyzing data
• Preserving data
• Creating data
• Processing data
• Giving access to data
• Reusing data
Services:
• Central Archive, Grid dCache Storage, B2SAFE/B2SHARE (EUDAT), Persistent Identifier services (EPIC), Research data NL
• Data Ingest Service, Lightpaths, Normal channels (sftp, nfs, http)
• Authentication, Authorization
• Collaboration tools: Beehub, FileSender
• Supercomputer, Lisa Cluster, GRID, HPC Cloud, Hadoop
• Visualisation services: Collaboratorium, GPU cluster, mobile setup
• Enter a new cycle, develop a workplan and apply ICT solutions
• NLeScience center integration support
• SURFmarket licences/brokering
My laptop is not enough: where to go?
SURFsara services
• Lisa national compute cluster
• Life Science Grid: interconnected clusters across the Netherlands
• Cartesius national Supercomputer
• Hathi Hadoop cluster
• Oort HPC Cloud cluster
• Central Archive, Beehub, SURFdrive for Data Services
• …, Visualisation, Networking, Consultancy, Innovation
Getting access
• For:
๏ Lisa National Cluster, Cartesius Supercomputer
• Apply via:
๏ IRIS (NWO grant)
• For:
๏ Grid, Hadoop, HPC Cloud, Data Services, Visualisation
๏ Or, not sure what suits you best?
• Apply via:
๏ https://e-infra.surfsara.nl/ (SURF grant)
Standard support
Bring your scientific problem
• We provide advice and support:
๏ Getting access
๏ Best practices
๏ Design & optimisation
๏ Integration to large scale
Trainings, online tutorials
020 800 1400
Questions?
https://www.surf.nl/en/about-surf/subsidiaries/surfsara/