Post on 24-May-2020

DeepSense Computing Platform

Agenda

• System Overview

• File Systems

• Data Transfer

• IBM Spectrum LSF

• Conductor with Spark

• Technical Support

System Overview

• Compute Nodes

o 20 large memory nodes

- 20 cores, 512 GB memory

o 4 huge memory nodes

- 20 cores, 1 TB memory

o 10 GPU nodes

- 2 x P100, 20 cores, 512 GB memory

• Operating System

o Red Hat Enterprise Linux 7.5

Heterogeneous Computing with GPU

NVIDIA P100 GPU

Specification             Value
Cores                     3584
Memory                    16 GB HBM2
Memory bandwidth          720 GB/s
FLOPS (single precision)  9.3 TFLOPS
FLOPS (double precision)  4.7 TFLOPS
FLOPS (half precision)    18.7 TFLOPS
Power consumption         250 W
CUDA compute capability   6.0

IBM Power 8 with NVIDIA P100

CPU-GPU Systems Connected via PCI-e

NVLink Enables Fast Unified Memory Access between CPU and GPU memory

Applications

• Domain

- Ocean data products, shipbuilding, fisheries and aquaculture, seaports and logistics, security and defense, marine risk, …

• Data Source

- Sensor logs, text, image, video, web traffic, geospatial, AIS, …

• Analytics

- Image processing, time series, predictive analytics, machine learning, deep learning, distributed computing, …

File System

File System  Directory                Purpose      Quota                         Backed up?  Purged?
Home         /dshome/subdir/username  development  1 TB, 500k files per user     yes         no
Data         /data/projectname        development  2 TB, 500k files per project  yes         no
Scratch      /scratch/username        computation  2 TB, 1M files per user       no          yes
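
The file-count limits in the table can be checked before a large run. The following is a minimal sketch, assuming the quota counts regular files (the slides do not say how files are counted). The function name usage_report is hypothetical, and the limit is passed in by hand rather than read from the file system.

```shell
#!/usr/bin/env bash
# Sketch: report disk usage and file count for a directory and compare the
# file count against a limit taken from the quota table.
# Assumption: the quota counts regular files only.

# usage_report DIR MAX_FILES
usage_report() {
    local dir="$1" max_files="$2" files
    # Count regular files below DIR (this can be slow on large trees).
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "directory: $dir"
    echo "disk usage: $(du -sh "$dir" | cut -f1)"
    echo "files: $files of $max_files allowed"
    if [ "$files" -ge "$max_files" ]; then
        echo "WARNING: file-count limit reached"
        return 1
    fi
}

# Example call (not executed here): scratch allows 1M files per user.
# usage_report "/scratch/$USER" 1000000
```

Since scratch is purged and not backed up, a check like this is most useful on /dshome and /data, where results are meant to persist.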

Data Transfer

• Two protocol nodes

o protocol1.deepsense.ca

o protocol2.deepsense.ca

• Connect using Samba (SMB):

- smb://protocol1.deepsense.ca

- use your DeepSense account

Data Backups

• DeepSense is a platform for data analytics. It is not meant for long-term storage.

• Users should ensure their original data is backed up at their own site.

• We do have daily backups

o /dshome, /data, and /software are backed up every evening

o The backup keeps 7 versions of each file

o Once a file is deleted, its backup is kept for 30 days. After that, it can no longer be restored

• If you need to restore a file, please let us know

IBM Spectrum LSF

• Workload management platform

• Maximizes utilization for distributed high-performance computing

• GPU Support

• Execute batch/interactive jobs

• Containerized workloads

LSF Access and Login

• User account: your DeepSense account

• Login nodes:

o login1.deepsense.ca

o login2.deepsense.ca

• Example connection:

o ssh <username>@login1.deepsense.ca

o On Mac or Linux clients, use a terminal

o On Windows clients, use PuTTY or MobaXterm

• If you are off campus, you need a Dalhousie VPN connection

Submitting Job to LSF

• Development/test jobs

o For testing and development, use the login nodes

o Shared with all users

• Batch jobs

o Command: bsub

o Use bsub options to specify input/output files, GPU options, CPU/memory limits, etc.

• Interactive jobs

o Command: bsub -I
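
The bsub options listed above can also be placed in the job script as #BSUB lines. The following is a sketch only: the job name, core count, limits, and file names are placeholders rather than DeepSense defaults, no queue is named because the slides do not list one, and the -gpu option assumes an LSF release that supports it.

```shell
#!/usr/bin/env bash
# Write a small LSF batch script. Every value below (job name, core count,
# limits, file names) is a placeholder, not a DeepSense default.
cat > example_job.sh <<'EOF'
#!/bin/bash
# Job name
#BSUB -J example_job
# Number of job slots (cores)
#BSUB -n 4
# Run-time limit, hh:mm
#BSUB -W 01:00
# Memory limit; the unit is set by LSF_UNIT_FOR_LIMITS (KB unless the site changed it)
#BSUB -M 8000000
# Request one GPU (assumes an LSF release that supports -gpu)
#BSUB -gpu "num=1"
# Standard output and error files; %J expands to the job ID
#BSUB -o example_job.%J.out
#BSUB -e example_job.%J.err

echo "Job ${LSB_JOBID:-unknown} running on $(uname -n)"
# List the GPUs visible to the job, if the NVIDIA tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
fi
EOF

# Submit only where LSF is available; bsub reads the script from stdin.
if command -v bsub >/dev/null 2>&1; then
    bsub < example_job.sh
else
    echo "bsub not found: wrote example_job.sh without submitting"
fi
```

Keeping the options in the script rather than on the command line records the requested resources alongside the job itself, so a run can be repeated with the same settings.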

LSF Monitoring/Cancelling jobs

• Check running jobs

o bjobs -l

o bjobs -l <jobid> // for job details

• Control job execution

o Job suspend: bstop <jobid>

o Job resume: bresume <jobid>

o Job kill: bkill <jobid>

• Check available hosts

o bhosts
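
The monitoring commands above can be combined into a simple wait loop. The following is a sketch, assuming the installed bjobs supports the -o and -noheader options for customized output. The function names job_state and wait_for_job are hypothetical, and the 30-second default interval is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: poll LSF until a job leaves the queue.
# Assumption: bjobs accepts -noheader and -o "stat" (customized output).

# job_state JOBID: print the job state (for example PEND, RUN, DONE, EXIT).
job_state() {
    bjobs -noheader -o "stat" "$1" 2>/dev/null | tr -d '[:space:]'
}

# wait_for_job JOBID [INTERVAL_SECONDS]
# Returns 0 when the job is DONE, 1 when it EXITs, 2 when it is not found.
wait_for_job() {
    local jobid="$1" interval="${2:-30}" state
    while :; do
        state=$(job_state "$jobid")
        case "$state" in
            DONE) echo "job $jobid finished"; return 0 ;;
            EXIT) echo "job $jobid failed or was killed"; return 1 ;;
            "")   echo "job $jobid not found"; return 2 ;;
            *)    sleep "$interval" ;;   # still pending, running or suspended
        esac
    done
}

# Example call (not executed here):
# wait_for_job 12345 60
```

A job suspended with bstop stays in the loop until it is resumed with bresume or removed with bkill, so the loop only ends on a final state.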

IBM Spectrum Conductor with Spark (CWS)

• Spark integration and lifecycle management platform

• Support for multiple Spark versions

• Integrated application platform

• Notebooks, Deep Learning packages

• Simplified administration

Accessing CWS

• Management Console

o Go to URL: https://ds-mgm-02.deepsense.cs.dal.ca:8443

o Log in using your DS account

• Command Line Option

o From a login node, ssh to:

ssh ds-cmhm-02.deepsense.cs.dal.ca

o Source the environment

o Log in to CWS using your DS account

CWS - Spark Instance Group

• From the dashboard, go to:

o Workload -> Spark -> Spark Instance Group

o Specify name, directory and user

• Choose Spark version

o Spark 2.3.1, Spark 2.2.0, Spark 2.1.1, Spark 1.6.1

• Optional: choose Notebook

Technical support

• Documentation

o DeepSense computing platform wiki page

https://docs.deepsense.ca

o IBM Knowledge Center

https://www.ibm.com/support/knowledgecenter/

• Troubleshooting/technical questions

o Send email to support@deepsense.ca

Questions?
