
CSC – Finnish expertise in ICT for research, education, culture and public administration

Geocomputing using CSC resources

Atte Sillanpää, Seija Sirkiä, Elias Annila, Eduardo Gonzalez, Kylli Ek

2.10-3.10.2017, Espoo


Reasons for using CSC computing resources

• Computing something takes more than 2-4 hours

• Need for more memory

• Very big datasets

• Keep your desktop computer for normal usage, do computation elsewhere

• Need for a server computer

• Need for a lot of computers with the same set-up (courses)

• Convenient to use preinstalled and maintained software

• Free for Finnish university users / will be free for state research institutes


CSC main computing resources for GIS users

• Taito supercluster
  o Pre-installed software
  o Some GIS data available
• Before using:
  o Move your data and scripts
  o Small extra adjustments for running your scripts
• cPouta cloud
  o The user sets up the environment
  o More freedom, and more responsibilities
• Before using:
  o Setting up the computer
  o Installation of software
  o Move your data and scripts


Program, 2.10.2017

08:00–08:50 Linux intro
09:00–09:15 Introduction to the course
09:15–09:45 Getting access: user account, project and services, web-based access to CSC's services
09:45–10:00 Coffee break
10:00–11:00 How to connect: how to access CSC's computers, NX client, taito-shell
11:00–12:00 CSC's computing environment: different platforms, module system, licensing, storage and data transfer
12:00–13:00 Lunch break
13:00–14:30 Running your jobs, resource-management (a.k.a. batch job) systems
14:30–14:45 Coffee break
14:45–15:30 Running R code in Taito


Program, 3.10.2017

09:00–10:30 Running Python code in Taito
10:30–10:45 Coffee break
10:45–12:00 GIS software and data in Taito. Using virtual rasters.
12:00–13:00 Lunch break
13:00–14:30 cPouta, setting up a virtual machine
14:30–14:45 Coffee break
14:45–16:00 Running GIS software in cPouta


Using different GIS software in Taito

Software              Bash   R     Python   QGIS
GDAL                  x      x     x        x
GRASS                 x      x     (x)      x
LasTools              x      (x)   (x)      x
SagaGIS               x      x     (x)      x
Taudem                x      (x)   (x)      x
R spatial packages    -      x     -        -
Python geo packages   -      -     x        -


The keys to geocomputing: Change in working style & Linux

GUI (ArcGIS, QGIS, …) → Scripts (R, Python, shell scripts, Matlab, …)


CSC training

• 15.1.-17.1.2018 Automating GIS-processes (Python), Henrikki Tenkanen (HY)

• Feb 2018 R spatial, Marko Kallio (Aalto), Juha Aalto (FMI/HY)

• Spring 2018 Lidar data management

• Linux 1, 2, 3

• Python intro and Python multiprocessing

• R and data analysis

• cPouta

• + webinar recordings on YouTube


Practicalities

• Keep the name tag visible
• Lunch is served in the same building
  o Room locked during lunch (lobby open, use lockers)
• Toilets are in the lobby
• Network:
  o WIFI: eduroam, Haka authentication
  o Ethernet cables on the tables
  o CSC-Guest accounts
• Username and password for workstations: given on-site
• Bus stops
  o Other side of the street (102, 103) → Kamppi/Center
  o Same side, towards the bridge (194, 195/551) → Center/Pasila
  o Bus stops to arrive at CSC at the same positions, just on opposite sides
• If you came by car: parking is being monitored - ask for a temporary parking permit from the reception (tell which workshop you’re participating in)
• Visiting outside: doors by the reception desks are open


Participants

• Name
• Organization
• Interests in geocomputing
• Goals for the course


CSC


• Non-profit state enterprise with special tasks
• Turnover in year 2015: 36.8 M€
• Headquarters in Espoo, datacenter in Kajaani, Finland
• Circa 290 employees in year 2016
• Owned by state (70%) and all Finnish higher education institutions (30%)


Our Customers

• Research institutes and organizations
• Organizations providing education
• Memory organizations, state and public organizations


Support in All Phases of Research Process

• Plan: Customer Portal, Experts, Guides, Websites, Training, Service Desk
• Produce & Collect: Data, International resources, Modelling, Software, Supercomputers
• Analyse: Cloud Services, Data science, Computing, Software
• Store: B2SAFE, B2SHARE, HPC Archive, IDA, Databases, Research long-term preservation (LTP)
• Share & Publish: AVAA, B2DROP, B2SHARE, Databank, Etsin, Funet FileSender


Funet – National and International Networks and Services

360 000 end-users, ca. 80 Funet members

Services included in Funet membership
  o Funet Network Connections
  o Funet CERT Information Security Service
  o Vulnerability Scanner
  o Certificate Service
  o eduroam Roaming Access Service
  o Funet FileSender File Sharing Service

Services with additional costs
  o Funet Etuubi Video Management System
  o Funet Silta Video Conferencing MCU Service
  o Funet Tiimi Web Conferencing System
  o Funet Light Paths
  o Router Service
  o Streaming Service


Internationally competitive research environments and e-Infrastructures

Collaboration with majority of European computing centers

• International research network organizations:
  o NORDUnet, eduGAIN, GÉANT (GN3)
• European research infrastructures and supporting projects:
  o ELIXIR, CLARIN, ENVRI
• International HPC projects and GRID-organizations:
  o Nordic e-Infrastructure Collaboration (NeIC), PRACE, EGI-Inspire
• European centres of excellence:
  o NOMAD, E-CAM
• European e-Infrastructure policy initiatives:
  o e-Infrastructure Reflection Group (e-IRG), RDA


HPC-Europa3 - travel, learn, network

• Four calls every year until April 2021
  o Visit a group or invite a (coming) collaborator to your group
• Requirements:
  o Non-proprietary research
  o Affiliated in an EU country or associated country
  o Project can make use of HPC resources
• Provided:
  o Support for accommodation, travel costs for 3-13 weeks and likely a small daily allowance
  o Resources and support from local HPC-center

• http://www.hpc-europa.org/


CSC’s data center in Kajaani

• CSC’s modular data center in Kajaani. Modern and reliable infrastructure (national power grid, roads, airline connections, data networks)
• The Funet network ensures excellent networking capabilities around the world
• Place for CSC’s next supercomputers with other CSC customer systems
• Cost-efficient solution – sustainable and green energy supply


CSC’s Computing Capacity 1989–2016


Software and databases offered by CSC

• Large selection (over 200) of software and database packages for research: research.csc.fi/software
• Mainly for academic research in Finland
• Centralized national offering: software consortia, better licence prices, continuity, maintenance, training and support


CSC – Finnish expertise in ICT for research, education, culture and public administration

Using CSC Environment Efficiently

Atte Sillanpää, Kylli Ek

2.10.2017, Espoo


Getting access to CSC resources


How to get started?

• research.csc.fi

• research.csc.fi/csc-guide

• research.csc.fi/faq-knowledge-base

• www.csc.fi/web/training/materials →CSC-Environment

• Service Desk: [email protected]

$ sbatch job.sh
Submitted batch job 3660241


Getting an Account: The process and framework

• Register to get a User Account
  o You get a Personal Project
  o This is not for CPU time
• Ask PI to apply for an Academic Project (or to invite you to an existing one)
  o PI logs in to SUI and invites you by your email
  o Set as an accountable project
  o You can belong to many projects
• Ask PI to apply for a Service, e.g. Taito cluster access
  o Accept Terms of Use (link via email)
• Resources are managed at Academic Project level
  o Quota, Services, Members per project
• Keep your personal details up to date

[Diagram: Academic Projects, each with a PI user account, member user accounts, a quota and service access (Taito access, Sisu access, Cloud access); user accounts also have Personal Projects.]


Scientist's User Interface (SUI)


SUI – CSC Customer Portal

Web portal for all CSC users – sui.csc.fi

• Sign up as customer
• Reset your password
• Manage your account
• Apply for an Academic project
• Apply for computing services
• Access your data
• Download material
• Watch videos
• Submit jobs
• Monitor hosts and jobs
• Personalize your use
• Message board
• + more


Scientist’s User Interface (SUI)

My Account

• Maintain your account information
• Change password for CSC environment
• Define your personal settings


Scientist’s User Interface (SUI)

Batch Job Script Wizard

• Create job scripts with easy to use forms
• Save scripts locally or in CSC $HOME
• Instructions on how to submit and monitor jobs


Scientist’s User Interface (SUI)

Downloads

• Access material provided to you by CSC
• Software installation packages, manuals, videos etc.


Scientist’s User Interface (SUI)

Host Monitor

• View statuses and details of CSC’s computing servers and batch systems
• Visualize history of CPU usage and job count
• Monitor jobs in all hosts in single view
• Control your own jobs


Scientist’s User Interface (SUI)

My Files

• Access your data in CSC’s storage services in single view (computing servers, IDA and HPC Archive)
• Transfer files
• Search your data
• Submit jobs
• Typical folder and file operations are supported


Scientist’s User Interface (SUI)

My Projects

• View information and resource usage of your CSC projects
• Edit hosts for projects
• Apply for resources for your CSC customer project
• Resource usage currently not working due to system changes


Connecting to CSC


Learning targets

• Be aware of different ways of accessing CSC resources

• Log in to Taito with ssh and NoMachine


The (almost) Complete Picture

Access via any of:
• Ssh
• NoMachine
• Browser (SUI, cloud, Avaa, …)
• Tunneling
• HAKA
• iRODS


Direct ssh connection – Linux/Mac

• From UNIX/Linux/OSX command line
• Use -X (or -Y) to enable remote graphics*
• scp: copy file to remote machine

$ ssh -X [email protected]
$ scp file [email protected]:

login as: yourid
Last login: Tue Jul 09 13:14:15 2019 from cool.somewhere.fi
┌─ Welcome ───────────────────────────────────────────────────────────────────┐
│ CSC - Tieteen tietotekniikan keskus - IT Center for Science                 │
│ HP Cluster Platform SL230s Gen8 TAITO                                       │
├─ Contact ───────────────────────────────────────────────────────────────────┤
...

* In Windows you’d also need an X-windows emulator, but there is a better way


Access from Windows

• Putty for ssh connection
  o Can be installed without admin privileges
• NoMachine for GUI
  o Needs admin privileges for installation and update
  o Recommended method (also for Linux/Mac)
• FileZilla/WinSCP for moving data
  o Efficient GUI
• Find out about other access options and more information at: https://research.csc.fi/taito-connecting


Putty


NoMachine Remote Desktop

• Client connection between user and gateway
• Good performance even with slow network
• Ssh from gateway to server (fast if local)
• Persistent connection
• Suspendable
  o Continue later at another location
• Read the instructions…
  o ssh-key, keyboard layout, Mac-specific workarounds, …
• Choose an application or server to use (right click)


NoMachine


Ascii terminal

• Open a terminal on your workstation (right click on background or select from menu), then in terminal:

$ ssh [email protected]
(man in the middle?)
$ ls
$ hostname
$ gnuplot
$ plot sin(x) [fails!]

NoMachine

• Open NoMachine client
• Select nxkajaani.csc.fi
• Insert your username and password
• (accept help screens)
• Right click on the background, choose taito from menu
• Give your CSC password

$ ls
$ hostname
$ …


FileZilla


Summary: How to access resources at CSC

• Ssh terminal connection to CSC (Putty + X-term emulator for Windows)
• NoMachine Remote desktop
  o Client installed on your own computer, working with graphics at CSC
• Scientist’s User Interface (web-based): sui.csc.fi
  o Account management, file manager, software distribution, …
• Cloud services: pouta.csc.fi
  o Lots of freedom/flexibility and hence some administration and configuration work


CSC Computing Environment


Learning target

• Know how to choose the right server (resource)

• Know where to put your files

• Know how to set up and use preinstalled software

[Figure: terms covered in this section: hpc_archive, IDA, taito.csc.fi, pouta, $TMPDIR, taito-shell.csc.fi, module spider, research.csc.fi, iput]


On Laptops, Clusters and Supercomputers

• Shared Memory Parallel (SMP):
  o All processors access (more or less) the same memory
  o Working within node or like your laptop
• Distributed Memory:
  o Processes access their own memory
  o Interconnection network for exchange
  o Working with multiple nodes
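The shared-memory model can be illustrated with a small Python sketch. It is only an illustration of the memory model, not a CSC recipe: the function names `chunk_bounds` and `parallel_sum` are made up for this example, and in standard CPython threads do not speed up pure-Python arithmetic, so real CPU-bound work would more likely use multiprocessing or compiled libraries. Distributed-memory work across nodes needs explicit message passing and is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous (start, stop) pairs."""
    base, extra = divmod(n_items, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum(values, n_workers=4):
    """Sum `values` with worker threads that all read the same list in memory."""

    def sum_chunk(bounds):
        start, stop = bounds
        # Every worker indexes into the one shared `values` object;
        # nothing is sent over a network, unlike distributed memory.
        return sum(values[start:stop])

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(sum_chunk, chunk_bounds(len(values), n_workers)))


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

All workers live in one process on one machine, which is why this style cannot grow beyond the cores and memory of a single node.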


CSC HPC resources

[Diagram comparing a laptop (2-4 cores / 8 GB memory) with CSC resources: TAITO-SHELL (interactive usage, GUI), TAITO and SISU (jobs via SLURM) and cPouta (users build their own environment, incl. software and data). Size labels in the diagram: max 4 cores / 128 GB memory; max 48 cores / 240 GB memory; max 672 cores / 1536 GB memory; max 9600 cores. Other labels: Software, Data, GPU, "For well-scaling parallel jobs".]


Pouta Cloud service

Do I need…

A different operating system and software stack than CSC’s systems?

To run web services?

To extend my local computing resources?

→ http://research.csc.fi/cloud-computing


The module system

• Tool to set up your environment
  o Load libraries, adjust path, set environment variables
  o Needed on a server with hundreds of applications and several compilers etc.
• Slightly different on Taito vs. other systems
• Used both in interactive and batch jobs


Typical module commands

module avail                    shows available modules (compatible modules in taito)
module spider                   shows all available modules in taito
module list                     shows currently loaded modules
module load <name>              loads module <name> (default version)
module load <name/version>      loads module <name/version>
module switch <name1> <name2>   unloads module name1 and loads module name2
module purge                    unloads all loaded modules

Taito has "meta-modules" named e.g. r-env, which load all modules needed to run R.


Module example

• Show compatible modules on Taito
$ module avail
• Initialize R and RStudio statistics packages
$ module load r-env
$ module load rstudio
• Start RStudio using the command
$ rstudio
• It’s better to run the GUI (and calculations) on a compute node (jobs that have used 1 h of CPU on the login node will be killed automatically)
• For interactive work, use taito-shell.csc.fi

Simple plotting in R
> a=seq(0,10,by=0.1)
> plot(a,cos(a))
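Modules are loaded the same way inside batch jobs. As a rough sketch, the Python helper below writes out the text of a SLURM batch script that loads a module before running a command. The helper name `render_batch_script`, the default resource values and the command `Rscript my_script.R` are assumptions made for illustration; a real job on Taito may need further settings (for example a partition), so check CSC's own batch job instructions.

```python
def render_batch_script(job_name, command, modules=("r-env",),
                        time_limit="00:30:00", ntasks=1, mem_per_cpu_mb=2000):
    """Return the text of a SLURM batch script that loads modules, then runs command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time_limit}",           # wall-clock limit, hh:mm:ss
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --mem-per-cpu={mem_per_cpu_mb}",  # megabytes by default
        f"#SBATCH --output={job_name}_%j.out",    # %j expands to the job ID
        "",
    ]
    # Set up the software environment before the actual work starts.
    lines += [f"module load {name}" for name in modules]
    lines += ["", f"srun {command}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical usage: write the script, then submit it with `sbatch job.sh`.
    with open("job.sh", "w") as handle:
        handle.write(render_batch_script("rtest", "Rscript my_script.R"))
```

Generating the script from a function keeps the module list and resource requests in one place when many similar jobs are submitted.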


Directories at CSC Environment (1)

Directory or storage area  Intended use                                       Default quota/user  Storage time  Backup
$HOME (1)                  Initialization scripts, source codes, small data   50 GB               Permanent     Yes
                           files. Not for running programs or research data.
$USERAPPL (1)              Users' own application software.                   50 GB               Permanent     Yes
$WRKDIR (1)                Temporary data storage.                            5 TB                90 days       No
$WRKDIR/DONOTREMOVE        Temporary data storage.                            Incl. in above      Permanent     No
$TMPDIR (3)                Temporary users' files.                            -                   ~2 days       No
Project (1)                Common storage for project members. A project can  On request          Permanent     No
                           consist of one or more user accounts.
HPC Archive (2)            Long term storage.                                 2 TB                Permanent     Yes
IDA (2)                    Storage and sharing of stable data.                On request          Permanent     No, but multiple storage copies

(1) Lustre parallel file system in Kajaani  (2) iRODS storage system in Espoo  (3) local file system in Kajaani

https://research.csc.fi/data-environment


Directories at CSC Environment (2)

[Diagram: taito.csc.fi and sisu.csc.fi each have login nodes and compute nodes with their own $TMPDIR, plus $HOME, $WRKDIR and $USERAPPL (→ $HOME/xyz). hpc_archive/IDA in Espoo is reached through the iRODS interface (disk cache) with icp, iput, ils, irm. From your workstation: scp, WinSCP, FileZilla, …; My Files in the SUI web portal; Cyberduck (Win/Mac) or command-line icommands to access IDA.]


Storage: hard disks - 4 PB on DDN (Lustre), Sisu and Taito

• $USERAPPL: put your own applications here
  o /homeappl/home/username/app_taito
  o /homeappl/home/username/app_sisu
• /tmp (Taito, ~2 TB) to be used for e.g. compiling codes on the login node or taito-shell
• $TMPDIR on compute nodes: for scratch files (accessed with $TMPDIR in batch script)
• $HOME for configuration files and misc. smallish storage. If full, gives strange errors (X-graphics etc.)
• $WRKDIR for large data and during calculations. Avoid lots of small files. Files older than 90 days are deleted. No backup.
• $WRKDIR/DONOTREMOVE: old files are not deleted from here. Don’t copy files here; move them if you want to keep them (or use hpc_archive)
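Because old files in $WRKDIR are removed, it can help to list files that are getting close to the limit before they disappear. The sketch below is one possible way to do that in Python. It assumes that file age is judged by modification time, which may not match the rule CSC's cleaning actually applies, and the function name `files_older_than` is made up for this example.

```python
import os
import time
from pathlib import Path


def files_older_than(root, days=90, now=None):
    """Return files under root whose modification time is more than `days` days old."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime < cutoff:
                    old.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(old)


if __name__ == "__main__":
    # Falls back to the current directory when $WRKDIR is not set.
    wrkdir = os.environ.get("WRKDIR", ".")
    for path in files_older_than(wrkdir, days=80):
        print(path)
```

Using a threshold somewhat below 90 days, as in the usage lines, leaves time to move the listed files to $WRKDIR/DONOTREMOVE or hpc_archive.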


Storage: disks and tape

• IDA Storage Service
  o Common storage for project members
  o Storage for non-sensitive stable research data (e.g. provides persistent identifiers, automatic checksums)
  o Enables public sharing of data on the internet
  o Usage via SUI, command line or file transfer program
  o Quota available from universities, universities of applied sciences and Academy of Finland
  o Apply on the web http://openscience.fi/becoming-an-ida-user
• hpc_archive Service
  o Tape (+ disk cache)
  o Default long term storage
  o Access with i-commands from Sisu/Taito


hpc_archive/IDA interface at CSC

Some iRODS commands
  o iput file       copy file to hpc_archive/IDA
  o iget file       copy file from …/IDA
  o ils             list the current IDA directory
  o icd dir         change the IDA directory
  o irm file        remove file from IDA
  o imv file file   move/rename file inside IDA
  o imkdir foo      create a directory foo in IDA
  o iinit           initialize your IDA account
  o ipwd            show current directory in IDA

IDA uses some different commands. See http://openscience.fi/ida-commands

Moving files, best practices

• rsync, not scp (when lots of/big files), zip & tar first
  $ rsync -P [email protected]:/tmp/huge.tar.gz .

• Funet FileSender (max 50 GB [1GB as an attachment? No!])
  o https://filesender.funet.fi
  o Files can be downloaded also with wget

• iRODS, batch-like process, staging

• IDA: http://openscience.fi/ida

• CSC can help to tune e.g. TCP/IP parameters

• FUNET backbone 100 Gbit/s

• Webinar Recording on Data Transfer

https://research.csc.fi/csc-guide-moving-data-between-csc-and-local-environment

Data transfer speed

File size    50 Mb/s      25 Mb/s      5 Mb/s
1 kb         0.00002 s    0.00004 s    0.0002 s
1 Mb         0.02 s       0.04 s       0.2 s
1 Gb         20 s         40 s         3.3 min
1 Tb         5.5 h        11 h         2 d 7 h

• 50 Mb/s is often the realistic fast speed
• 25 Mb/s is a good normal speed
• 5 Mb/s is a realistic speed in a mobile network
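The transfer times above are simply file size divided by speed. A minimal sketch for estimating your own transfers, assuming 1 Gb = 1000 Mb and 1 Tb = 1 000 000 Mb as in the table; the function name transfer_time_seconds is made up for this illustration:

```python
def transfer_time_seconds(size_mb: float, speed_mb_per_s: float) -> float:
    """Ideal transfer time in seconds: size divided by sustained speed."""
    if speed_mb_per_s <= 0:
        raise ValueError("speed must be positive")
    return size_mb / speed_mb_per_s


# Sizes expressed in Mb, using decimal prefixes (an assumption of this sketch).
SIZES_MB = {"1 kb": 0.001, "1 Mb": 1, "1 Gb": 1_000, "1 Tb": 1_000_000}

for label, size_mb in SIZES_MB.items():
    for speed in (50, 25, 5):
        seconds = transfer_time_seconds(size_mb, speed)
        print(f"{label} at {speed} Mb/s: {seconds:g} s ({seconds / 3600:.2f} h)")
```

Real transfers are usually slower than this ideal figure because of protocol overhead and lots of small files, which is why packing files with zip or tar first helps.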

Learning targets achieved?

• How to choose right server (resource)?
  o Taito or Taito-shell?

• Where to put your files?
  o $TMPDIR, $WRKDIR, $USERAPPL, hpc_archive or $HOME?

• How to setup and use preinstalled software/libraries/compilers?
  o How do you find out which software is available?
  o How do you initialise it?

Running jobs at Taito

Batch jobs learning target

• Benefits of batch jobs for compute intensive jobs
  o Difference of login and compute node

• How to submit and monitor jobs

• Batch script contents i.e. resource requirements

• How to learn resource requirements of own jobs

• What is saldo [billing units]

• Be aware of batch script wizard in SUI

• Submit first job(s)

• Learn to read the manual

What is a batch system?

• Optimizes resource usage by filling the server with jobs

• Cores, memory, disk, length, …

• Jobs to run are chosen based on their priority

• Priority increases with queuing time

• Priority decreases with recently used resources

• Short jobs with little memory and cores queue the least

• CSC uses SLURM (Simple Linux Utility for Resource Management)

Batch cont’d

[Figure: the batch job scheduler places individual batch jobs on compute nodes 1 to 3; axes: time and number of CPUs]

Compute nodes are used via queuing system

$ sbatch job_script.sh

$ ./my_prog &

Batch job overview

Steps for running a batch job

1. Write a batch job script
   • Script details depend on server, check CSC Guides or software page!
   • You can use the Batch Job Script Wizard in Scientist’s User Interface:
     o https://sui.csc.fi/group/sui/batch-job-script-wizard

2. Make sure all the necessary files are in $WRKDIR
   • $HOME has limited space
   • Login node $TMPDIR is not available on compute nodes

3. Submit your job
   o $ sbatch myscript

Batch Job Script wizard in Scientist’s User Interface

Batch jobs: what and why

• User has to specify necessary resources
  o Can be added to the batch job script or given as command line options for sbatch (or a combination of script and command line options)

• Resources need to be adequate for the job
  o Too small memory reservation will cause the job to fail
  o When the time reservation ends, the job will be terminated whether finished or not

• But: Requested resources can affect the time the job spends in the queue
  o Especially memory reservation (and perhaps requested time)
  o Using more cores does not always make the job run faster - check!
  o Don’t request extra "just in case" (time is less critical than memory in this respect)

• So: Realistic resource requests give best results
  o Not always easy to know beforehand
  o Usually best to try with smaller tasks first and check the used resources
  o You can check what was actually used with the seff command

Saldo and billing units

• All jobs consume saldo

• https://research.csc.fi/saldo

• One core hour of computing equals 2 billing units [bu] times a multiplier

• Jobs requesting less than 4GB of memory per core have a multiplier of 1

• Jobs requesting 4GB or more per core have a multiplier X/4, where X is the requested memory per core:
  o 5GB/core = 5/4x = 1.25x
  o 12GB/core = 12/4x = 3x
  o …

• Requested but not used computing time is not billed

• If saldo runs out, no new jobs are possible

• New saldo can be requested from SUI

• Serial job (1 core), 0.5 GB/core of memory, requested 24 hours, used 5 hours → billed: 1*5*2*1 = 10 bu
• (Failed) parallel job: requested 24 cores, 2 GB of memory per core, actually used 6 cores (18 cores idle), total run time 10 hours → billed: 24*10*2*1 = 480 bu
• Parallel job, 3 cores, 5 GB/core, 10 hours → billed: 3*10*2*5/4 = 75 bu

[Figure: BU multiplier (y axis, 0 to 3) as a function of the memory request in GB / core (x axis, 0 to 12)]
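The billing rule on this slide can be written out as a small calculation. This is only a sketch of the rule as stated here (2 bu per core hour, multiplier 1 below 4 GB/core, otherwise X/4); the function name billing_units is invented for the example and is not a CSC tool:

```python
def billing_units(cores: int, hours_used: float, mem_gb_per_core: float) -> float:
    """Billing units for one job, following the rule on the slide.

    cores            number of reserved cores (idle reserved cores are billed too)
    hours_used       actual run time in hours (unused requested time is not billed)
    mem_gb_per_core  requested memory per core in GB
    """
    # Multiplier is 1 below 4 GB/core and X/4 from 4 GB/core upwards.
    multiplier = max(1.0, mem_gb_per_core / 4.0)
    return cores * hours_used * 2 * multiplier


print(billing_units(1, 5, 0.5))    # serial job example
print(billing_units(24, 10, 2))    # parallel job with idle cores
print(billing_units(3, 10, 5))     # parallel job with 5 GB/core
```

The three calls reproduce the three worked examples of the slide, so the same function can be used to compare the cost of different memory requests before submitting.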

SLURM batch script contents

Example serial batch job script on Taito

#!/bin/bash -l
#SBATCH -J myjob
#SBATCH -e myjob_err_%j
#SBATCH -o myjob_output_%j
#SBATCH --mail-type=END
#SBATCH --mail-user=[email protected]
#SBATCH --mem-per-cpu=4000
#SBATCH -t 02:00:00
#SBATCH -n 1
#SBATCH -p serial
#SBATCH --constraint=snb

module load myprog
srun myprog -option1 -option2

#!/bin/bash -l

• Tells the computer this is a script that should be run using bash shell

• Everything starting with "#SBATCH" is passed on to the batch job system (Slurm)

• Everything (else) starting with "#" is considered a comment

• Everything else is executed as a command
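The four rules above can be illustrated with a tiny line classifier. This is a simplified sketch for learning purposes only, not how Slurm or bash parse a script internally; the function name classify_line and the category labels are made up here:

```python
def classify_line(line: str) -> str:
    """Classify one line of a batch job script according to the four rules."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if stripped.startswith("#!"):
        return "interpreter"       # e.g. #!/bin/bash -l
    if stripped.startswith("#SBATCH"):
        return "slurm option"      # passed on to the batch job system
    if stripped.startswith("#"):
        return "comment"
    return "command"               # executed by the shell


example_script = [
    "#!/bin/bash -l",
    "#SBATCH -J myjob",
    "# load the software first",
    "module load myprog",
    "srun myprog -option1 -option2",
]
for script_line in example_script:
    print(f"{classify_line(script_line):13s} {script_line}")
```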

#SBATCH -J myjob

• Sets the name of the job

• When listing jobs e.g. with squeue, only the first 8 characters of the job name are displayed.

#SBATCH -e myjob_err_%j
#SBATCH -o myjob_output_%j

• Option -e sets the name of the file where possible error messages (stderr) are written

• Option -o sets the name of the file where the standard output (stdout) is written

• When running the program interactively these would be written to the command prompt

• What gets written to stdout and stderr depends on the program. If you are unfamiliar with the program, it’s always safest to capture both

• %j is replaced with the job id number in the actual file name

#SBATCH --mail-type=END
#SBATCH --mail-user=[email protected]

• Option --mail-type=END = send email when the job finishes

• Option --mail-user = your email address.

• If these are selected you get an email message when the job is done. This message also has a resource usage summary that can help in setting batch script parameters in the future.

• To see actually used resources try also: sacct -l -j <jobid> (more on this later)

#SBATCH --mem-per-cpu=4000

• The amount of memory reserved for the job in MB
  o 1000 MB = 1 GB

• Memory is reserved on a per-core basis even for shared memory (OpenMP) jobs
  o For those jobs it is better to ask memory per job:
  o --mem=1000

• Keep in mind the specifications for the nodes. Jobs with impossible requests are rejected (try squeue after submit)

• If you reserve too little memory the job will be killed (you will see a corresponding error in the output)

• If you reserve too much memory your job will spend much longer in queue and potentially waste resources (idle cores)

#SBATCH -t 02:00:00

TIP: If you’re unsure of the syntax, use the Batch job wizard in SUI

• Time reserved for the job in hh:mm:ss

• When the time runs out the job will be terminated!

• With longer reservations the job may queue longer

• Limit for normal serial jobs is 3d (72 h)
  o If you reserve longer time, choose "longrun" queue (limit 14d)
  o In the longrun queue you run at your own risk. If a batch job in that queue stops prematurely no compensation is given for lost cpu time
  o In longrun you likely queue for a longer time: shorter jobs and restarts are better (safer, more efficient)

• Default job length is 5 minutes → you need to set the time yourself.
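The time limits above (3 days in serial, 14 days in longrun) can be turned into a small helper that picks a queue and formats the reservation. A sketch only: the helper names are invented here, and the days-hh:mm:ss form is the one shown in the TIMELIMIT column of sinfo:

```python
from datetime import timedelta

SERIAL_LIMIT = timedelta(days=3)     # limit for normal serial jobs (72 h)
LONGRUN_LIMIT = timedelta(days=14)   # limit of the longrun queue


def choose_partition(runtime: timedelta) -> str:
    """Pick the queue (partition) for a serial job of the given length."""
    if runtime <= timedelta(0):
        raise ValueError("runtime must be positive")
    if runtime <= SERIAL_LIMIT:
        return "serial"
    if runtime <= LONGRUN_LIMIT:
        return "longrun"
    raise ValueError("longer than 14 d: split the work into shorter jobs")


def slurm_time(runtime: timedelta) -> str:
    """Format a reservation as hh:mm:ss, or d-hh:mm:ss when it exceeds a day."""
    total = int(runtime.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    hms = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f"{days}-{hms}" if days else hms


wanted = timedelta(hours=2)
print(f"#SBATCH -t {slurm_time(wanted)}")
print(f"#SBATCH -p {choose_partition(wanted)}")
```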

#SBATCH -n 1

• Number of cores to use. More than one means parallel.

• It’s also possible to control on how many nodes your job is distributed. Normally, this is not needed. By default use all cores in allocated nodes:
  o --ntasks-per-node=16 #(Sandy Bridge)
  o --ntasks-per-node=24 #(Haswell)

• Check documentation: http://research.csc.fi/software
  o There’s a lot of software that can only be run in serial

• OpenMP applications can only use cores in one node
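With 16 cores per Sandy Bridge node and 24 per Haswell node, the number of full nodes a parallel job needs follows from a ceiling division. A minimal sketch; the function name nodes_needed is hypothetical:

```python
import math

# Cores per node on Taito as given on the slide.
CORES_PER_NODE = {"snb": 16, "hsw": 24}


def nodes_needed(ntasks: int, arch: str) -> int:
    """Smallest number of nodes that can hold ntasks tasks, one task per core."""
    if ntasks < 1:
        raise ValueError("ntasks must be at least 1")
    return math.ceil(ntasks / CORES_PER_NODE[arch])


for arch in ("snb", "hsw"):
    print(f"48 tasks on {arch}: {nodes_needed(48, arch)} node(s)")
```

An OpenMP application is limited to one node, so for such a program the useful core count is at most the value in CORES_PER_NODE.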

#SBATCH -p serial

• The queue the job should be submitted to
• Queues are called "partitions" in SLURM
• You can check the available queues with command sinfo -l

[asillanp@taito-login4 ~]$ sinfo -l
Wed Jan 28 15:45:39 2015
PARTITION AVAIL TIMELIMIT   JOB_SIZE ROOT SHARE GROUPS NODES STATE     NODELIST
serial*   up    3-00:00:00  1        no   NO    all    1     draining  c623
serial*   up    3-00:00:00  1        no   NO    all    101   mixed     c[25,76-77,…
serial*   up    3-00:00:00  1        no   NO    all    593   allocated c[3-24,26-75,…
serial*   up    3-00:00:00  1        no   NO    all    226   idle      c[211-213,…
parallel  up    3-00:00:00  1-28     no   NO    all    1     draining  c623
parallel  up    3-00:00:00  1-28     no   NO    all    101   mixed     c[25,76-77,…
parallel  up    3-00:00:00  1-28     no   NO    all    593   allocated c[3-24,26-75,…
parallel  up    3-00:00:00  1-28     no   NO    all    226   idle      c[211-213,…
longrun   up    14-00:00:0  1        no   NO    all    1     draining  c623
longrun   up    14-00:00:0  1        no   NO    all    101   mixed     c[25,76-77,…
longrun   up    14-00:00:0  1        no   NO    all    587   allocated c[3-24,26-75,…
longrun   up    14-00:00:0  1        no   NO    all    226   idle      c[211-213,…
test      up    30:00       1-2      no   NO    all    4     idle      c[1-2,984-985]
hugemem   up    7-00:00:00  1        no   NO    all    2     mixed     c[577-578]

#SBATCH --constraint=snb

• The job is run only in Sandy Bridge (snb) nodes

• The other option is Haswell node (hsw)
  o #SBATCH --constraint=hsw

• Or either one, whichever is free: "snb|hsw"
  o #SBATCH --constraint="snb|hsw"

• Currently the default is to use either architecture in serial and longrun partitions
  o Sandy Bridge in test and parallel

• A single job cannot use CPUs from both architectures, but SLURM will take care of this

module load myprog
srun myprog -option1 -option2

• Your commands
  o These define the actual job to be performed: these commands are run on the compute node.
  o See application documentation for correct syntax
  o Some examples also from batch script wizard in SUI

• Remember to load modules if necessary

• By default the working directory is the directory where you submitted the job
  o If you include a cd command, make sure it points to the correct directory

• Remember that input and output files should be in $WRKDIR (or in some cases $TMPDIR)

• $TMPDIR contents are deleted after the job

• srun tells your program which cores to use. There are also exceptions…

Most commonly used sbatch options

Slurm option                    Description
--begin=time                    defer job until HH:MM MM/DD/YY
-c, --cpus-per-task=ncpus       number of cpus required per task
-d, --dependency=type:jobid     defer job until condition on jobid is satisfied
-e, --error=err                 file for batch script's standard error
--ntasks-per-node=n             number of tasks per node
-J, --job-name=jobname          name of job
--mail-type=type                notify on state change: BEGIN, END, FAIL or ALL
--mail-user=user                who to send email notification for job state changes
-n, --ntasks=ntasks             number of tasks to run
-N, --nodes=N                   number of nodes on which to run
-o, --output=out                file for batch script's standard output
-t, --time=minutes              time limit in format hh:mm:ss
--mem-per-cpu=<number in MB>    maximum amount of real memory per allocated cpu (core) required by the job in megabytes
--mem=<number in MB>            maximum memory per node
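If you create many similar batch scripts, the options in the table can be filled in from a script. The sketch below only builds the text of a batch job script; render_batch_script is a hypothetical helper written for this example (it is not part of Slurm or SUI), and the resulting file would still be submitted with sbatch:

```python
def render_batch_script(options: dict, commands: list) -> str:
    """Build batch job script text from sbatch options and shell commands.

    Long options (starting with --) are written as --name=value,
    short options (such as -J) as -J value.
    """
    lines = ["#!/bin/bash -l"]
    for flag, value in options.items():
        if flag.startswith("--"):
            lines.append(f"#SBATCH {flag}={value}")
        else:
            lines.append(f"#SBATCH {flag} {value}")
    lines.append("")            # blank line between options and commands
    lines.extend(commands)
    return "\n".join(lines) + "\n"


script = render_batch_script(
    {"-J": "myjob", "-o": "myjob_output_%j", "--mem-per-cpu": 4000,
     "-t": "02:00:00", "-n": 1, "-p": "serial"},
    ["module load myprog", "srun myprog -option1 -option2"],
)
print(script)
```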

SLURM: Managing batch jobs in Taito

[Figure: how tasks, inputs and outputs are arranged in a simple job, an array job, and two kinds of parallel job (Parallel job 1 and Parallel job 2)]

Simple job
• Pros: Easy, just check paths and packages.
• Cons: Slowest.

Array job
• Pros: Relatively easy, just check paths and packages, and add the use of command line arguments. SLURM takes care of the job management.
• Cons: Applicable only in certain cases.

Parallel job
• Pros: All kinds of use cases can be handled.
• Cons: Requires modifying code. More complicated code, because the job management is done in the code.

Multi-core processing support in GIS-software

Most GIS software won't support parallel processing out of the box, but a few do:

• Taudem, works in Taito

• GRASS

• R

• LasTools (Windows)

• ArcGIS Pro (Windows)

• Some others: https://research.csc.fi/geocomputing

Only some functions support multiprocessing, not tested in Taito / cPouta yet.

Array jobs for GIS

• Same analysis for:
  o different input files / map sheets
  o different scenarios
  o different variables
  o different time periods

• Suits well if the jobs are independent of each other, for example raster calculator.

• In many cases if using map sheets the borders need special care.

Parallel jobs for GIS

• You have to divide the job into tasks:
  o For vector data it might mean handling the data in chunks.
    o For example, if calculating zonal statistics for 10 000 polygons, dividing them into chunks of 1000 polygons might work well.
  o For raster data some kind of areal division is often most logical.
  o For some repeated analysis it might be easy just to divide the queries to different tasks.
    o For example, if routing or geocoding 1 000 000 addresses, dividing them into 10 000 queries per task might work well.

• You normally have to test what is a good size of data for one task. One task should take at least ca 10 min.
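Dividing the data into chunks usually comes down to computing index ranges. A minimal sketch, assuming the polygons or addresses can be addressed by a running index; the function name chunk_ranges is made up for this example:

```python
def chunk_ranges(n_items: int, chunk_size: int) -> list:
    """Split n_items into consecutive (start, end) index ranges, end exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n_items))
            for start in range(0, n_items, chunk_size)]


# 10 000 polygons in chunks of 1000 polygons: one range per task.
polygon_chunks = chunk_ranges(10_000, 1_000)
print(len(polygon_chunks), polygon_chunks[0], polygon_chunks[-1])

# 1 000 000 addresses with 10 000 queries per task.
address_chunks = chunk_ranges(1_000_000, 10_000)
print(len(address_chunks))
```

Each task then processes only the items between its start and end index. The last chunk may be shorter than the others when the division is not even.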

Be careful with simultaneous opening of files

• If you are using the same input or output files in array or parallel jobs, depending on software it might cause trouble.

• Solutions:
  o Make copies of input files for each process you use.
  o Write output to smaller files first, join them later
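The second solution, writing one small output file per task and joining them afterwards, could look roughly like this. The file naming scheme (result_part_0001.csv and so on) and both function names are assumptions made for the illustration:

```python
import tempfile
from pathlib import Path


def task_output_path(out_dir: Path, task_id: int) -> Path:
    """Each task writes to its own file, so no two tasks share an output file."""
    return out_dir / f"result_part_{task_id:04d}.csv"


def merge_parts(out_dir: Path, merged_name: str = "result_all.csv") -> Path:
    """Join the per-task files in task order once all tasks have finished."""
    merged = out_dir / merged_name
    parts = sorted(out_dir.glob("result_part_*.csv"))
    with merged.open("w") as out:
        for part in parts:
            out.write(part.read_text())
    return merged


# Small demonstration in a temporary directory: tasks finish in any order.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    task_output_path(out_dir, 2).write_text("polygon_2,0.7\n")
    task_output_path(out_dir, 1).write_text("polygon_1,0.4\n")
    merged_text = merge_parts(out_dir).read_text()

print(merged_text)
```

On Taito the output directory would typically be under $WRKDIR, and the merge step would be run as a separate short job after the array or parallel job has completed.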

Submitting and cancelling jobs

• The script file is submitted with command

$ sbatch batch_job.file

• Job can be deleted with command

$ scancel <jobid>

Queues

• The job can be followed with command squeue:

$ squeue                   (shows all jobs in all queues)
$ squeue -p <partition>    (shows all jobs in single queue (partition))
$ squeue -u <username>     (shows all jobs for a single user)
$ squeue -j <jobid> -l     (status of a single job in long format)

• To estimate the start time of a job in queue

$ scontrol show job <jobid>

row "StartTime=..." gives an estimate on the job start-up time, e.g.
StartTime=2014-02-11T19:46:44 EndTime=Unknown

• scontrol will also show where your job is running

• If you add this to the end of your batch script, you’ll get additional info to stdout about resource usage

seff $SLURM_JOBID

Examples of seff outputs

[erkki@taito]$ seff 52000797
Job ID: 52000797
Cluster: csc
User/Group: erkki/csc
State: COMPLETED (exit code 0)
Nodes: 4
Cores per node: 16
CPU Utilized: 00:37:22
CPU Efficiency: 87.58% of 00:42:40 core-walltime
Memory Utilized: 7.53 GB (estimated maximum)
Memory Efficiency: 3.21% of 234.38 GB (58.59 GB/node)

Comments: only small part of memory used, could request less (now used the default 0.5GB/core), but for a parallel job like this, it’s better to request full nodes anyway.

[erkki@taito]$ seff 52000798_6
Job ID: 52000798
Array Job ID: 52000798_6
Cluster: csc
User/Group: erkki/csc
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:24:09
CPU Efficiency: 98.17% of 00:24:36 core-walltime
Memory Utilized: 23.50 MB
Memory Efficiency: 1.15% of 2.00 GB

Comments: only small part of memory used, could request less (now used the default 0.5GB/core). Theoretically the job used only 23.5/4=6MB/core, but asking for e.g. 100MB/core (for safety) would likely make the job queue less.
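The CPU efficiency that seff reports is the CPU time actually used divided by the reserved core-walltime. A small sketch that recomputes the percentages from the two example outputs; it handles only the hh:mm:ss form shown here, and the function names are made up for this example:

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string, as printed by seff, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_utilized: str, core_walltime: str) -> float:
    """CPU efficiency in percent: used CPU time / reserved core-walltime."""
    return 100.0 * hms_to_seconds(cpu_utilized) / hms_to_seconds(core_walltime)


# Values copied from the two seff outputs.
print(f"{cpu_efficiency('00:37:22', '00:42:40'):.2f} %")
print(f"{cpu_efficiency('00:24:09', '00:24:36'):.2f} %")
```

A low percentage usually means that some of the reserved cores were idle, which is a hint to request fewer cores next time.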

Job logs

• Command sacct can be used to study past jobs
  o Useful when deciding proper resource requests

$ sacct                   short format listing of jobs starting from midnight today
$ sacct -l                long format output
$ sacct -j <jobid>        information on single job
$ sacct -S YYYY-MM-DD     listing start date
$ sacct -u <username>     list only jobs submitted by username
$ sacct -o                list only named data fields, e.g.
$ sacct -o jobid,jobname,maxrss,reqmem,elapsed -j <jobid>

TIP: Check MaxRSS to see how much memory you need and avoid overbooking. See also command seff JOBID
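MaxRSS can be turned into a memory request for the next run. The sketch below assumes MaxRSS is printed with a K, M or G suffix (for example 24064K); the safety factor of 1.5 and the 100 MB floor are illustrative choices of this example, not CSC recommendations:

```python
import math

# Conversion of the MaxRSS unit suffix to megabytes.
UNIT_TO_MB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def maxrss_to_mb(value: str) -> float:
    """Convert a MaxRSS string such as '24064K' to megabytes."""
    value = value.strip()
    if not value or value[-1].upper() not in UNIT_TO_MB:
        raise ValueError(f"unsupported MaxRSS value: {value!r}")
    return float(value[:-1]) * UNIT_TO_MB[value[-1].upper()]


def suggested_mem_request_mb(maxrss: str, safety_factor: float = 1.5,
                             minimum_mb: int = 100) -> int:
    """Memory to request in MB: observed peak plus a safety margin."""
    return max(minimum_mb, math.ceil(maxrss_to_mb(maxrss) * safety_factor))


print(maxrss_to_mb("24064K"))
print(suggested_mem_request_mb("24064K"))
print(suggested_mem_request_mb("2G"))
```

MaxRSS describes the job that was actually run, so test with realistic input data before trusting the suggested value for larger jobs.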

Available nodes/queues and limits

• You can check available resources per node in each queue:

$ sjstat -c
-------------------------------------------------------------
Pool       Memory     Cpus  Total  Usable  Free  Other Traits
-------------------------------------------------------------
serial*    258000Mb   24    10     10      5     hsw,haswell
serial*    64300Mb    16    502    502     9     snb,sandybridge
serial*    258000Mb   16    14     14      0     bigmem,snb,sandybridge
serial*    128600Mb   24    395    395     6     hsw,haswell
parallel   258000Mb   24    10     10      5     hsw,haswell
parallel   64300Mb    16    502    502     9     snb,sandybridge
parallel   258000Mb   16    14     14      0     bigmem,snb,sandybridge
parallel   128600Mb   24    395    395     6     hsw,haswell
longrun    258000Mb   16    8      8       0     bigmem,snb,sandybridge
longrun    258000Mb   24    10     10      5     hsw,haswell
longrun    64300Mb    16    502    502     9     snb,sandybridge
longrun    128600Mb   24    395    395     6     hsw,haswell
test       64300Mb    16    2      2       2     snb,sandybridge
test       128600Mb   24    2      2       2     hsw,haswell
hugemem    1551000Mb  32    2      2       0     bigmem,snb,sandybridge
hugemem    1551000Mb  40    4      4       1     bigmem,hsw,haswell,ssd

Most frequently used SLURM commands

Command    Description
srun       Run a parallel job.
salloc     Allocate resources for interactive use.
sbatch     Submit a job script to a queue.
scancel    Cancel jobs or job steps.
sinfo      View information about SLURM nodes and partitions.
squeue     View information about jobs located in the SLURM scheduling queue
smap       Graphically view information about SLURM jobs, partitions, and set configurations parameters
sjstat     Display statistics of jobs under control of SLURM (combines data from sinfo, squeue and scontrol)
scontrol   View SLURM configuration and state.
sacct      Displays accounting data for batch jobs.

Array jobs

• #SBATCH --array=1-100

• Defines to run 100 jobs, where a variable $SLURM_ARRAY_TASK_ID gets each number (1,2,…100) in turn as its value. This is then used to launch the actual job, e.g.
  $ srun myprog input_$SLURM_ARRAY_TASK_ID > output_$SLURM_ARRAY_TASK_ID

• Thus this would run 100 jobs:
  srun myprog input_1 > output_1
  srun myprog input_2 > output_2
  …
  srun myprog input_100 > output_100

• For more information: research.csc.fi/taito-array-jobs
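Inside the program itself, the same variable can be read from the environment to pick the input for this task, for example one map sheet per array task. A sketch assuming the array was defined with indices starting from 1 (--array=1-N); the map sheet file names are invented for the example:

```python
import os


def input_for_task(inputs: list, environ=os.environ) -> str:
    """Return the input belonging to this array task.

    SLURM_ARRAY_TASK_ID is set by Slurm for each task of an array job.
    With --array=1-N the IDs are 1-based, so subtract one for list indexing.
    """
    task_id = int(environ["SLURM_ARRAY_TASK_ID"])
    if not 1 <= task_id <= len(inputs):
        raise IndexError(f"task id {task_id} has no matching input")
    return inputs[task_id - 1]


# Hypothetical map sheet files, one per array task.
map_sheets = ["sheet_A.tif", "sheet_B.tif", "sheet_C.tif"]

# Outside a batch job the variable is not set, so simulate task number 2 here.
print(input_for_task(map_sheets, {"SLURM_ARRAY_TASK_ID": "2"}))
```

Passing the environment as a parameter keeps the function easy to try out on a laptop before running it as a real array job.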

Array jobs, output files

• %j = job number

#SBATCH -o array_job_out_%j.txt

#SBATCH -e array_job_err_%j.txt

• %A = ID of the array job and %a = $SLURM_ARRAY_TASK_ID

#SBATCH -o array_job_out_%A_%a.txt

#SBATCH -e array_job_err_%A_%a.txt

CSC – Finnish expertise in ICT for research, education, culture and public administration

R for spatial analysis in Taito

Seija Sirkiä, Kylli Ek

2.10.2017, Espoo

R spatial

R strong sides
– Statistical analysis
– Raster analysis
– Time-series analysis
– Faceted maps
– Repeating workflows
– Usage of external functions
– RStudio

R weak sides
– Routing
– 3D vector models
– Point clouds
– Interactive map usage
– Data editing
– Steep learning curve

R in Taito, rspatial-env 1

This is an convinience module for loading easily spatial analysisrelated R tools. It loads the following software:

• module load rspatial-env

• R (3.4.1) with spatial packages:
o geoR, geoRglm, geosphere, ggmap, grid, gstat, GWmodel, mapproj, maptools, ncdf4, RandomFields, raster, rgdal, rgeos, rgrass7, RSAGA, sf, sp, spacetime, spatial, spatial.tools, spatstat, spdep, strucchange.


R in Taito, rspatial-env 2

• rgdal - GDAL/OGR (2.2.1), Proj4 (4.9.3)

• rgeos - GEOS (3.6.1)

• RSAGA* - Saga GIS (2.1.4)

• rgrass7 - GRASS GIS (7.2.0, without GUI)

* RSAGA gives the error message "Error: select a tool", but this seems to be a bug in SAGA. According to our tests, RSAGA commands work despite the message.

* The RSAGA io_gdal library does not work; use gdal_translate from gdalUtils instead.


rspatial-env and Rstudio

• If you open RStudio in NoMachine from the menu, it will not have rspatial-env loaded.

• To open RStudio with rspatial-env loaded:

module load rspatial-env rstudio

rstudio


Installing R libraries

• Everybody can add R libraries for personal use
o These override the system ones

• Normally similar to local installations:
o install.packages("pkgname")
o or using RStudio menus

• If you think the package would also be useful for others, or it requires some external software to be installed, ask [email protected]


Running R in parallel — principles and practice


"My R code is slow... what can be done?"

• You should try to understand which part of your code is the time-consuming one, and why and how
o the function system.time() can help
o understanding the details of the method you are using helps
o knowing how to run a smaller version of the problem helps a lot
o understanding even the basics of time complexity helps

• Always be suspicious of for-loops!

• Having more memory almost never helps

• Going parallel may help


Mere presence of more cores does nothing

• Note on vocabulary: processor, CPU, core, chip and node may or may not mean same or different things

• With respect to workload, cores are not like locomotives but more like trucks: several cores cannot work on the exact same task

• Simply taking your ordinary R and R code to a laptop with 8 instead of 1 otherwise similar cores gives no speed boost at all.

• Somebody (possibly you) has to do something to divide the workload

Image credits: By Douglas W. Jones - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=7756596

CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=228396


Parallel scaling

• Unfortunately, increasing the number of cores will never decrease the time needed in the same proportion

• Not all parts of the program will benefit from parallel computing (Amdahl's law)

• Parallel computing introduces extra work that wouldn't need to be done when computing serially (overhead)

• Due to these facts, increasing the number of cores may help only up to a point but no further, or even not at all, and it may depend on many things such as data size
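Amdahl's law can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that 90 % of the run time can be parallelised; the function name `amdahl_speedup` is made up for the example, and overhead is ignored, so real speedups would be lower still.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the run time can be parallelised."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed example: 90 % of the work is parallelisable.
for cores in (1, 2, 4, 8, 16, 64):
    print(cores, "cores ->", round(amdahl_speedup(0.9, cores), 2), "x faster")
```

Under this assumption the speedup can never reach 10, no matter how many cores are reserved, because the serial 10 % always has to be run on one core.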


Level zero: not even parallel, just simultaneous

• Can you imagine going around in a computer classroom, firing up R on every computer and letting each one work on part of your problem?
o such as each fitting the same model to 20 different data sets or with 20 different combinations of parameters

• Do you also have access to a cluster with R installed on it? Good! Then you can do the same without even getting up from your chair.

• Upside: really easy. Downside: limited usability

Array jobs in Taito
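One way to hand each array task its own combination of parameters is to build the full list of combinations and index it with the task ID. The sketch below is a minimal illustration: the parameter names and values are hypothetical, and it assumes the job was submitted with --array=1-20 so that SLURM_ARRAY_TASK_ID runs from 1 to 20.

```python
import itertools
import os

# Hypothetical parameter grid: 5 x 4 = 20 combinations, one per array task.
alphas = [0.1, 0.5, 1.0, 2.0, 5.0]
depths = [1, 2, 3, 4]
combinations = list(itertools.product(alphas, depths))


def parameters_for_task(task_id):
    """Return the parameter combination for a 1-based array task ID."""
    return combinations[task_id - 1]


# Slurm sets SLURM_ARRAY_TASK_ID inside an array job; fall back to 1 elsewhere.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
alpha, depth = parameters_for_task(task_id)
print("task", task_id, "-> alpha =", alpha, ", depth =", depth)
```

Every task runs the same script, so the only thing that differs between them is the value of the task ID.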


Your own laptop vs. a remote cluster

• The cores on your own laptop or desktop computer are (presumably) yours to use whenever and however you please

• A big cluster or supercomputer (such as CSC's Taito and Sisu) may have thousands of users and at any given time its resources are at least partially in use by someone else

• For these cases, resource allocation and batch job systems exist, which add an extra layer of things to know about

SLURM in Taito


Array job with R, giving the mapsheet path as argument

#!/bin/bash

#SBATCH -J array_job

#SBATCH -o array_job_out_%j.txt

#SBATCH -e array_job_err_%j.txt

#SBATCH -t 00:02:00

#SBATCH --mem-per-cpu=4000

#SBATCH --array=1-3

#SBATCH -n 1

#SBATCH -p serial

module load rspatial-env

# move to the directory where the data files are located

cd ~/git/geocomputing/R/contours/array

# set input file to be processed

name=$(sed -n "$SLURM_ARRAY_TASK_ID"p ../mapsheets.txt)

# run the analysis command

srun Rscript Calc_contours.R $name


Array jobs with R, reading the argument in R script

args = commandArgs(trailingOnly=TRUE)

if (length(args)==0) {
  stop("Please give the map sheet number", call.=FALSE)
} else if (length(args)==1) {
  mapsheet <- args[1]
}


Running R code in parallel

• There are several R packages for parallel computing:
o parallel (snow)
o foreach and doMPI
o Rmpi


snow (parallel)

• This R package offers support for simple parallel computing in R, following the master - workers paradigm

master

worker worker worker worker


parallel (snow), batch job file

The execution command that needs to be added to the end of the batch job file given above is like this:

srun RMPISNOW --no-save -f myrscript.R


parallel (snow), R script

The script should contain a getMPIcluster call which will produce the reference to the cluster that can be given to various other functions, like in this example:

cl <- getMPIcluster()
funtorun <- function(k) {
  system.time(sort(runif(1e7)))
}
system.time(a <- clusterApply(cl, 1:7, funtorun))
a
stopCluster(cl)

Only the master process continues to run the given script.


foreach and doParallel

• The foreach package implements a for-loop that uses iterators, and also allows for parallel execution using its %dopar% operator.
o now it is definitely you who needs to decide how the work gets divided, or at least one aspect of that

• It comes with several "parallel backends", of which the doMPI package should be used on Taito.

• From the foreach vignette: "Running many tiny tasks in parallel will usually take more time to execute than running them sequentially"
o so now you need to start paying attention to details


foreach and doMPI, batch job file

The execution command that needs to be added to the end of the batch job file given above is like this:

srun Rscript --no-save --slave myrscript.R

Note that this starts a number of R sessions equal to the number of reserved cores, all of which start to execute the given script. The --slave option prevents all of them from printing the R welcome message etc. to the common output file.


foreach and doMPI, R script

The script should include a call to startMPIcluster very close to the beginning, as all the processes will execute everything before that call, while only the master continues after that call.

library(doMPI, quietly = TRUE)
cl <- startMPIcluster()
registerDoMPI(cl)
system.time(a <- foreach(i=1:7) %dopar% system.time(sort(runif(1e7))))
a
closeCluster(cl)


Further information

• CSC R documentation: https://research.csc.fi/-/r
• rspatial-env module: https://research.csc.fi/-/rspatial-env

Support: [email protected]


Python for GIS in Taito

Elias Annila

3.10.2017


Python and module system

• System Python includes only a limited set of packages
• Wider selection of Python versions and packages via the module system
o Taito/Taito-shell: module load python-env/x.x.x
• Additionally, GIS related packages in the geopython module
o Part of the geo-env module
o Loads also dependencies (gdal, proj...)
o module load geo-env


Available packages

• NumPy, SciPy, Pandas etc. in python-env
• GIS specific packages in geo-env (geopython)
o Geopandas
o Fiona
o Shapely
o Rasterio
o Rasterstats
o GDAL/OGR
o Networkx
o Skimage
o Pyproj
o Pysal
o Rtree
o Descartes


Installing own Python packages

• Python packages not provided by CSC can be installed by the user
• Easy installation from the Python Package Index (PyPI)
o pip install --user package_name
o Package is installed under $HOME/.local and is available without further actions
o Pip also tries to take care of dependencies
• Installing from source with distutils
o python setup.py install --user
• Alternate installation directories
o Both pip and distutils have a --prefix option for specifying the installation directory
o pip install --prefix=$HOME/my_modules package_name
o Directory has to be added to PYTHONPATH:
o export PYTHONPATH=$PYTHONPATH:$HOME/my_modules/lib/python2.7/site-packages
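To check where user installations end up and whether an alternate directory is really visible to Python, the standard site and sys modules can be queried. This is a minimal sketch: the helper `on_search_path` and the `~/my_modules` location are illustrative only, and the version part of the path is taken from the running interpreter rather than fixed to python2.7.

```python
import os
import site
import sys

# Where "pip install --user" places packages for the running interpreter.
print("user base:         ", site.getuserbase())
print("user site-packages:", site.getusersitepackages())


def on_search_path(directory):
    """Return True if the directory is on sys.path, i.e. Python imports from it."""
    target = os.path.realpath(directory)
    return any(os.path.realpath(p) == target for p in sys.path if p)


# Hypothetical --prefix location, following the pattern on the slide.
custom = os.path.expanduser(
    "~/my_modules/lib/python{}.{}/site-packages".format(*sys.version_info[:2]))
print("custom dir on path:", on_search_path(custom))
```

If the last line prints False after a --prefix installation, PYTHONPATH has most likely not been exported in the current shell or batch job.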


Running Python code in Taito and Taito Shell

• Same as most other software
• In Taito Shell: python your_script.py args
• To submit batch jobs: sbatch batch_job.sh

#!/bin/bash -l
#SBATCH -J python_focal
#SBATCH -o out.txt
#SBATCH -e err.txt
#SBATCH -t 00:00:20
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2
#SBATCH -p serial

# load needed modules
module load geo-env
srun python your_script.py args


Array jobs and python

• Read arguments from file row by row and pass to python script

Batch job file:

#!/bin/bash -l
#SBATCH -J python_focal
#SBATCH -o array_job_out_%A_%a.txt
#SBATCH -e array_job_err_%A_%a.txt
#SBATCH -t 00:00:20
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2
#SBATCH -p serial
#SBATCH --array=1-3

# load needed modules
module load geo-env
name=$(sed -n "$SLURM_ARRAY_TASK_ID"p file_names.txt)
srun python your_script.py $name

your_script.py:

import sys
filename = sys.argv[1]
# ...do something with the file...
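As an alternative to picking the row with sed in the batch job file, the Python script could read its own row based on the task ID. The sketch below is one possible way to do that, not the method used in the course material; the helper `nth_line` is made up for the example and file_names.txt is assumed to sit in the working directory.

```python
import os


def nth_line(path, n):
    """Return line n (1-based) of a text file without the trailing newline."""
    with open(path) as f:
        for number, line in enumerate(f, start=1):
            if number == n:
                return line.rstrip("\n")
    raise IndexError("{} has fewer than {} lines".format(path, n))


if __name__ == "__main__":
    # Slurm sets SLURM_ARRAY_TASK_ID inside an array job; fall back to 1 elsewhere.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
    list_file = "file_names.txt"  # assumed to exist next to the script
    if os.path.exists(list_file):
        print("processing", nth_line(list_file, task_id))
```

With this variant the batch job file would simply call `srun python your_script.py` without the `$name` argument.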


Parallel jobs with Multiprocessing Python module

Parallel code:

import multiprocessing as mp
from time import sleep

def worker(x):
    sleep(0.5)
    return x

if __name__ == '__main__':
    pool = mp.Pool(processes=4)
    print(pool.map(worker, range(4)))

Serial code:

from time import sleep

def worker(x):
    sleep(0.5)
    return x

if __name__ == '__main__':
    print([worker(i) for i in range(4)])
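The difference between the two versions becomes visible when both are timed in the same script. The following is a sketch for Python 3 (it uses time.perf_counter and the pool as a context manager); the helper `timed` is made up for the example.

```python
import multiprocessing as mp
import time


def worker(x):
    time.sleep(0.5)  # stand-in for real work
    return x


def timed(label, func):
    """Run func(), print its result and the elapsed wall-clock time."""
    start = time.perf_counter()
    result = func()
    print(label, result, "in {:.2f} s".format(time.perf_counter() - start))
    return result


if __name__ == "__main__":
    timed("serial:  ", lambda: [worker(i) for i in range(4)])
    with mp.Pool(processes=4) as pool:
        timed("parallel:", lambda: pool.map(worker, range(4)))
```

The serial version should take about four times 0.5 seconds, while the parallel one should finish in roughly the time of a single call plus the overhead of passing work to the pool.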


cProfile

• Tool to help you find out which part of your program takes the most execution time
• Let's say function A takes 99% of the execution time
o There's little point in trying to optimize the performance of anything else, even if it is very inefficient.
• Usage:
o python -m cProfile my_python_script.py
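cProfile can also be used from inside a script, which makes it possible to profile only a selected part and to sort the report. The sketch below uses the standard cProfile and pstats modules; the functions `slow_part`, `fast_part` and `main` are invented stand-ins for real code.

```python
import cProfile
import io
import pstats


def slow_part():
    return sum(i * i for i in range(200000))


def fast_part():
    return sum(range(1000))


def main():
    slow_part()
    fast_part()


# Profile only the call to main().
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and show the ten most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

On the command line a similar ordering is available with `python -m cProfile -s cumulative my_python_script.py`.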


Further information

• CSC Python documentation: research.csc.fi/-/python
• Geopython module: https://research.csc.fi/-/geopython
• Geo-env module: https://research.csc.fi/-/geo-env
• Multiprocessing: https://docs.python.org/3/library/multiprocessing.html
• cProfile: https://docs.python.org/2/library/profile.html

Support: [email protected]


GIS software and data in Taito

Elias Annila, Kylli Ek

3.10.2017


GIS Software available in Taito

• geo-env module
o QGIS
o GRASS (without GUI)
o GDAL
o Proj4
o TauDEM
o LAStools (open source tools only)
o PDAL
o Python

• To load SAGA or GRASS GUI on top of geo-env:
o module load geo-env wxwidgets saga
o module load geo-env grass


GIS Software available in Taito

• rspatial-env module
o R
o GRASS (no GUI)
o SAGA
o GDAL
o Proj4

• module load rspatial-env


Using different GIS-software in Taito

                      Bash   R     Python   QGIS
GDAL                  x      x     x        x
GRASS                 x      x     (x)      x
LasTools              x      (x)   (x)      x
SagaGIS               x      x     (x)      x
Taudem                x      (x)   (x)      ?
R spatial packages    -      x     -        -
Python geo packages   -      -     x        -


GIS Software not available in Taito

Windows software:
• ArcGIS
• MapInfo
• LasTools Windows tools

Server software:
• GeoServer, MapServer
• PostGIS

Web map libraries:
• OpenLayers, Leaflet


Spatial data in Taito


Shared data area in Taito

• Hosts large commonly used datasets
• Reduces the need to transfer data to Taito
• Located at /proj/ogiir-csc/
• If you think some other dataset should be included here, ask [email protected]

MML:
o Lidar point cloud data
o DEM 2m (see virtual rasters section below)
o DEM 10m (see virtual rasters section below)

FMI:
o 10km avg relative humidity
o 10km avg sea level pressure
o 10km daily max temperature
o 10km daily mean temperature
o 10km daily min temperature
o 10km daily precipitation
o 10km daily radiation
o 10km daily snow
o 10km monthly mean temperature
o 10km monthly precipitation

LUKE:
o Multi-source national forest inventory


Virtual rasters

• Allows working with a dataset of multiple files as if they were a single file.
• XML pointing to actual raster files
o The virtual file doesn't need to be rectangular, it can have holes and the source files can even have different resolutions
• Taito has ready-made virtual rasters for elevation models and a Python tool to create your own for a specific area.
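To give an idea of what "XML pointing to actual raster files" means, the sketch below assembles a stripped-down VRT document with Python's standard XML library. It is an illustration only: the tile names and sizes are hypothetical, georeferencing elements (SRS, GeoTransform) are left out, and in practice such files are produced with tools like gdalbuildvrt or the Taito script rather than written by hand.

```python
import xml.etree.ElementTree as ET


def build_vrt(width, height, tiles):
    """Build a minimal single-band VRT.

    tiles: list of (filename, x_off, y_off, x_size, y_size), where the offsets
    give the position of each source file in the mosaic in pixels.
    """
    root = ET.Element("VRTDataset", rasterXSize=str(width), rasterYSize=str(height))
    band = ET.SubElement(root, "VRTRasterBand", dataType="Float32", band="1")
    for filename, x_off, y_off, x_size, y_size in tiles:
        src = ET.SubElement(band, "SimpleSource")
        ET.SubElement(src, "SourceFilename", relativeToVRT="1").text = filename
        ET.SubElement(src, "SourceBand").text = "1"
        # Which part of the source file to read ...
        ET.SubElement(src, "SrcRect", xOff="0", yOff="0",
                      xSize=str(x_size), ySize=str(y_size))
        # ... and where it goes in the virtual mosaic.
        ET.SubElement(src, "DstRect", xOff=str(x_off), yOff=str(y_off),
                      xSize=str(x_size), ySize=str(y_size))
    return ET.tostring(root, encoding="unicode")


# Two hypothetical 3000 x 3000 pixel tiles placed side by side.
tiles = [("tile_a.tif", 0, 0, 3000, 3000), ("tile_b.tif", 3000, 0, 3000, 3000)]
vrt_xml = build_vrt(6000, 3000, tiles)
print(vrt_xml)
```

Because the file only lists sources and their placement, leaving a tile out simply produces a hole in the mosaic, which is why a virtual raster does not need to be rectangular.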


Virtual Rasters in Taito

• Ready-made virtual rasters for 2m and 10m DEMs
• Corresponding to folder structure
• External overviews and XML headers
• A Python script to create your own for a specific area
• Syntax:
python /proj/ogiir-csc/mml/karttalehtijako/vrt_creator.py [-h] [-i] [-o] [-p prefix] dataset <dem2m/dem10m> polygon output_dir
o -h print help
o -i create individual vrt file for each polygon in shp
o -o create overviews
o -p <prefix> adds a prefix to output files


Virtual Rasters with different software

• Can be read by anything that uses GDAL drivers:
o QGIS, GRASS, SAGA, TauDEM, Python and R packages, GDAL itself
• Virtual rasters can be of different sizes; working with big virtual rasters does not suit all software.
• QGIS is the best at visualizing virtual rasters (if overviews are available).
• Reading a smaller area of a big virtual raster for a certain analysis works well in Python rasterio, R raster packages and GRASS.
• But there are limitations...
o GRASS requires linking with r.external, but still there is no need to create copies of the actual data
o SAGA uses GDAL only for importing data, so you have to create a .sgrd copy of the data and cannot use the vrt directly.
o A lot of tools output a raster of similar dimensions as the input, and currently nothing will create a tiled virtual raster as output.


Further information

• Geocomputing general: https://research.csc.fi/geocomputing

• GIS data in Taito: https://research.csc.fi/gis_data_in_taito
• Virtual rasters: https://research.csc.fi/virtual_rasters

Support: [email protected]


Introduction to cPouta

Eduardo Gonzalez

3.10.2017, Espoo


cPouta, setting up a virtual machine

• Part 1: Cloud brainwashing
o Cloud concepts
o cPouta web interface

• Part 2: setting up a virtual machine
o Pouta concepts
o Virtual machine’s settings


What cloud?

• Terminology overload, used to mean e.g.:
o Storage services (Dropbox)
o Virtual server hosting (Amazon Web Services)
o Software platforms (Google App Engine)
o Pretty much any web service (gmail, MS Office 365, ArcGIS Online)
o The Internet as a whole

• Self-service and automation are the common features


*aaS?

[Figure: service models. Labels: cPouta; e.g. Gmail, Dropbox; IT admin (Setup IaaS, OS, network…); IT skilled user (Install software, manages data…); Managed by CSC]


Division of work

IaaS cloud expert

• Provides
o Resources (compute, storage)
o Interfaces to access the system

• Supports usage of the cloud, but does not necessarily manage Virtual Machines (VM)
o Does not know what is running on the VMs

VM admin

• Can connect the existing compute / storage resources through the private network solution

• Manages Virtual Machines
o root permission for VMs
o Installs and maintains Operating System and other software for VMs
o Pays the software licenses

Roles: IT admin (Setup IaaS, OS, network…), IT skilled user (Install software, manages data…), VM user (Runs computations, uses software…)


Traditional HPC (Taito) vs. IaaS (cPouta)

Operating system
o Traditional HPC environment: Same for all: CSC’s cluster OS
o Cloud environment virtual machine: Chosen by the user

Software installation
o Traditional HPC environment: Done by cluster administrators. Customers can only install software to their own directories, no administrative rights
o Cloud environment virtual machine: Installed by the user. The user has admin rights

User accounts
o Traditional HPC environment: Managed by CSC’s user administrator
o Cloud environment virtual machine: Managed by the user

Security, e.g. software patches
o Traditional HPC environment: CSC administrators manage the common software and the OS
o Cloud environment virtual machine: User has more responsibility: e.g. patching of running machines

Running jobs
o Traditional HPC environment: Jobs need to be sent via the cluster’s Batch Scheduling System (BSS = SLURM in Taito)
o Cloud environment virtual machine: The user is free to use or not use a BSS

Environment changes
o Traditional HPC environment: Changes to SW (libraries, compilers) happen.
o Cloud environment virtual machine: The user can decide on versions.

Snapshot of the environment
o Traditional HPC environment: Not possible
o Cloud environment virtual machine: Can save as a Virtual Machine image

Performance
o Traditional HPC environment: Performs well for a variety of tasks
o Cloud environment virtual machine: Very small virtualization overhead for most tasks, heavily I/O bound and MPI tasks affected more


cPouta use cases

• Running scientific applications
o Computational clusters
o Pre/post process Sisu/Taito bound data

• Running other types of software stacks
o Containerized or non-containerized
o Web/file servers, load balancers, databases etc.
o Augments DevOps, CI/CD, Agile workflows
o From persistent to rapidly deployed

• Virtual computer class

• Ad hoc research data/information sharing


cPouta examples – Ubuntu server


cPouta examples – OSGeoLive Virtual Machine


cPouta examples - Training with notebooks


cPouta examples - Rstudio on a web accessible container


cPouta examples – Pebbles PaaS, or how a complete classroom was migrated to cPouta


cPouta examples - Web Application


GIS related services

• Web map services
o GeoServer
o MapServer

• Databases
o PostgreSQL/PostGIS

• ArcPy

• Windows* software


Container orchestration

• Running Docker containers with Kubernetes
o Using Magnum for creating resources
o Used for dividing work between different cloud machines
o https://research.csc.fi/pouta-container-orchestration

• Use cases of cloud geocomputing with a workload management framework:
o JRC Earth Observation Data and Processing Platform
o Über
o EOX-Sentinel2 mosaic
o Mapillary


Pouta Clouds in general

• Service offered by CSC (hardware in Finland)

• True self-service cloud IaaS powered by OpenStack
o Deploy your own virtual machines, storage and networks as your requirements evolve
o No proprietary software to limit scalability

• Simple to create and modify virtual resources
o Choose from Web UI, CLI or RESTful APIs

• Designed to serve scientific as well as other use cases
o General purpose
o High Performance Computing
o Data Intensive Computing
o Sensitive data


Pouta Clouds

General description
o cPouta Cloud: Public type service. In production since 2013
o ePouta Cloud: “Virtual Private Cloud”. Direct connections to customer networks

Use cases
o cPouta Cloud: From general purpose use to specialized scientific computing
o ePouta Cloud: Designed specifically for handling sensitive data

ISO/IEC 27001 compliant
o cPouta Cloud: Yes
o ePouta Cloud: Yes

VAHTI 2/2010 raised information security requirements
o cPouta Cloud: Yes
o ePouta Cloud: Yes

Min. Edu. funded for open Finnish research and education
o cPouta Cloud: Yes
o ePouta Cloud: Yes

VM access
o cPouta Cloud: Internet
o ePouta Cloud: OPN/MPLS

VM installation, Firewall configuration, Load Balancing, VM auto-recovery, Backups
o cPouta Cloud: Self-service
o ePouta Cloud: Self-service

Supported Operating Systems
o cPouta Cloud: No limits. Commercial OS & per vendor license model
o ePouta Cloud: No limits. Commercial OS & per vendor license model

For more information: https://research.csc.fi/cloud-computing

Interfaces

• Web
  o Works from any modern browser
  o Launch, list, terminate servers
  o Server console in the browser
  o Manage storage and networks

• Command line
  o Can do all the same things as the web interface and more
  o See instructions at https://research.csc.fi/pouta-install-client and https://research.csc.fi/pouta-client-usage

• API
  o Management through a programmable interface
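As an illustration of the programmable interface, here is a minimal Python sketch that lists the servers of a project. It assumes the openstacksdk package, which the slides do not mention, and a clouds.yaml entry with the hypothetical name "cpouta"; treat it as one possible approach rather than the recommended CSC workflow.

```python
def summarize_servers(conn):
    """Return (name, status) pairs for the servers visible to this connection."""
    return [(server.name, server.status) for server in conn.compute.servers()]


def main():
    # Imported here so the helper above stays usable without openstacksdk installed.
    import openstack

    # "cpouta" is a hypothetical entry name in a local clouds.yaml file.
    conn = openstack.connect(cloud="cpouta")
    for name, status in summarize_servers(conn):
        print(f"{name}: {status}")
```

Calling main() requires valid credentials; summarize_servers() accepts any connection-like object, which makes it easy to try out with a stand-in.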

cPouta web interface – Project Overview

cPouta web interface – Instances

cPouta web interface – Volumes

cPouta web interface – Images

cPouta web interface – Access & Security

cPouta web interface – Access & Security

cPouta web interface – Instances

cPouta, setting up a virtual machine

• Part 1: Cloud brainwashing
  o Cloud concepts
  o cPouta web interface

• Part 2: Setting up a virtual machine
  o Pouta concepts
  o Virtual machine’s settings

Options for installing software

• From an image or Docker container with pre-installed software packages
  o e.g. OSGeoLive

• Installing software manually
  o e.g. using apt-get on the command line

• Scripting tools
  o e.g. Ansible

Virtual Machine Images

• Images are templates for launching (creating) VMs
  o You can also launch VMs from volume snapshots (an image created from a VM or a bootable volume)

• CSC provides ready-made images
  o Based on different operating systems
  o Updated regularly

• You can upload your own images for your project
  o E.g. images created with VirtualBox or VMware
  o You can also use installation media to create a VM from scratch and then create an image out of it

Virtual Machines

• Virtual Machine = VM = Instance
  o A virtual server managed by an OpenStack user
  o Users normally log in with an SSH key pair
  o Console access via the OpenStack Web UI or an SSH connection
  o The user gets root permissions and can thus create user accounts independent of the cloud middleware user accounts
  o VMs are launched using an image

Virtual Machines - Flavors in cPouta

For more information: https://research.csc.fi/pouta-flavours

Flavor          Cores             Memory/core  Memory      Disk (root)  Disk (ephem.)
standard.small  2                 1 000 MB     2 000 MB    80 GB        0 GB
hpc-gen1.4core  4                 3 750 MB     15 000 MB   80 GB        0 GB
hpc-gen2.8core  8                 5 000 MB     40 000 MB   80 GB        0 GB
io.70GB         2                 5 000 MB     10 000 MB   20 GB        70 GB
gpu.1.1gpu      1 GPU (14 cores)  8 928 MB     125 000 MB  80 GB        0 GB

• Flavors specify the resources of a VM
  o CPU cores, RAM, root disk, ephemeral disk
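To make the flavor table concrete, the following Python sketch picks the smallest listed flavor that satisfies a set of requirements. The figures are copied from the table on this slide; the selection rule (fewest cores, then least memory) and all names in the code are illustrative assumptions and do not reflect CSC billing or the full flavor list.

```python
from typing import NamedTuple


class Flavor(NamedTuple):
    name: str
    cores: int
    memory_mb: int
    root_gb: int
    ephemeral_gb: int

    @property
    def memory_per_core_mb(self):
        # Integer division reproduces the rounded-down "Memory/core" column.
        return self.memory_mb // self.cores


# Subset of cPouta flavors, values taken from the table on this slide.
FLAVORS = [
    Flavor("standard.small", 2, 2_000, 80, 0),
    Flavor("hpc-gen1.4core", 4, 15_000, 80, 0),
    Flavor("hpc-gen2.8core", 8, 40_000, 80, 0),
    Flavor("io.70GB", 2, 10_000, 20, 70),
    Flavor("gpu.1.1gpu", 14, 125_000, 80, 0),
]


def smallest_fitting_flavor(min_cores, min_memory_mb, min_ephemeral_gb=0):
    """Return the smallest flavor meeting all minimums, or None if none fits."""
    candidates = [
        f for f in FLAVORS
        if f.cores >= min_cores
        and f.memory_mb >= min_memory_mb
        and f.ephemeral_gb >= min_ephemeral_gb
    ]
    # Assumed ordering: fewer cores first, then less memory.
    return min(candidates, key=lambda f: (f.cores, f.memory_mb), default=None)
```

For example, smallest_fitting_flavor(4, 10_000) selects hpc-gen1.4core, while a request for 50 GB of ephemeral disk can only be met by io.70GB among the flavors listed here.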

Virtual Machines - Flavors in cPouta

VM type     Description                               Use

standard.*  Oversubscribed “traditional” cloud        All non-CPU, non-IO intensive workloads
            virtual machines

hpc-gen1.*  Non-oversubscribed, non-HT Sandy Bridge   CPU intensive HPC/HTC workloads
            nodes (Taito)

hpc-gen2.*  Non-oversubscribed, HT Haswell large      Memory and CPU intensive HPC/HTC workloads
            memory nodes

io.*        SSD-backed high IOPS nodes                IOPS intensive workloads

gpu.*       SSD-backed NVIDIA Tesla P100 GPGPU        HPC applications leveraging GPUs
                                                      Machine and deep learning
                                                      Rendering

For more information: https://research.csc.fi/pouta-flavours

Volumes (storage types)

• Root disk

• Ephemeral storage

• Persistent volumes

• Volume snapshots
  o (Volumes can also be stored as Images)

Access & Security - Virtual Machines and Internet

• New VMs are not connected to the Internet by default

• You need to give the VM a public IP address to connect it

• Once connected to the Internet, anyone could access or use it

• You are responsible for the security of your VMs
  o Operating system, applications, user accounts, firewalls...
  o SSH key pairs
  o Security groups

Access & Security - Floating IPs (= Public IPs)

• Access to VMs from the Internet is provided by floating IPs

• You need to assign IPs to specific VMs

• You can disassociate an IP from a VM, and it goes back to your project's pool

• You can allocate IPs for your project (pool of IPs)
  o Allocated IPs are ready to be used by VMs
  o These IPs stay the same until you release them
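The difference between allocating, assigning, disassociating and releasing a floating IP can be sketched with a toy model in Python. This is not the OpenStack API: the class and method names are hypothetical, the addresses come from a documentation-only range, and the model assumes that "the pool" a disassociated IP returns to is the project's own set of allocated IPs.

```python
class FloatingIpPool:
    """Toy model of a project's floating IPs (illustration only)."""

    def __init__(self, available):
        self.available = list(available)  # addresses not yet allocated to the project
        self.allocated = {}               # project's IPs: address -> VM name or None

    def allocate(self):
        """Reserve an address for the project; it is not attached to any VM yet."""
        ip = self.available.pop(0)
        self.allocated[ip] = None
        return ip

    def associate(self, ip, vm_name):
        """Attach one of the project's addresses to a VM."""
        if ip not in self.allocated:
            raise KeyError(f"{ip} is not allocated to this project")
        self.allocated[ip] = vm_name

    def disassociate(self, ip):
        """Detach the address from its VM; the project still keeps it."""
        if ip not in self.allocated:
            raise KeyError(f"{ip} is not allocated to this project")
        self.allocated[ip] = None

    def release(self, ip):
        """Give the address up; the project may not get the same one back."""
        del self.allocated[ip]
        self.available.append(ip)


pool = FloatingIpPool(["192.0.2.10", "192.0.2.11"])
ip = pool.allocate()
pool.associate(ip, "geoserver-vm")   # hypothetical VM name
pool.disassociate(ip)                # still allocated, ready for another VM
```

In this model the address stays the same across associate and disassociate calls and only changes hands after release().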

Access & Security - SSH Key Pairs

• SSH key pairs are used to secure login to the VMs
  o You only need to create a key pair once for each user
  o New VMs created from CSC-provided images can only be accessed with an SSH key (you can change this later if necessary)

For more information: https://research.csc.fi/pouta-getting-started

Access & Security - Security Groups

• Security Groups are sets of firewall rules which limit access to your VMs
  o By default, all the ports are closed
  o Avoid adding rules to the “default” rule set
  o An instance can have several security groups
  o You can edit/create security groups at any time
  o You can add/remove security groups to/from a VM at any time
  o Keep them as closed as possible, i.e. limit the IP range if possible

For more information: https://research.csc.fi/pouta-getting-started
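The effect of limiting the IP range in a security group rule can be illustrated with a small Python sketch. The Rule type and is_allowed function are hypothetical simplifications (real security group rules also carry protocol, direction and port ranges), and the addresses are from documentation-only ranges.

```python
import ipaddress
from typing import NamedTuple


class Rule(NamedTuple):
    port: int   # single TCP port opened by the rule
    cidr: str   # source address range allowed to connect


def is_allowed(rules, source_ip, port):
    """Return True if any rule opens the port for the given source address."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        rule.port == port and addr in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )


no_rules = []                                 # default: every port is closed
ssh_from_campus = [Rule(22, "192.0.2.0/24")]  # SSH only from one address range
ssh_from_anywhere = [Rule(22, "0.0.0.0/0")]   # SSH open to the whole Internet
```

With ssh_from_campus, a connection from 192.0.2.15 to port 22 is accepted while one from 198.51.100.7 is rejected; with ssh_from_anywhere both are accepted, which is the situation the last bullet advises against.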