Helping Advantage Research Computing in Canada Lixin Liu Simon Fraser University October 15, 2019


Page 1: Compute Canada - Helping Advantage Research …lustrefs.cn/wp-content/uploads/2019/10/CLUG2019_05...Compute Canada and Canadian ARC Funding Compute Canada •Non-profit organization


Outline

• Compute Canada and Canadian ARC Funding

• Cedar System and Hardware

• Lustre File System

• Cedar Provisioning & Management

• Software Application Delivery

• Resource Allocation and Scheduler

• Compute Canada Support Model


Compute Canada and Canadian ARC Funding

Compute Canada

• Non-profit organization supporting research at Canadian public institutions

• 4 Regional consortia: ACENET, Calcul Quebec, Compute Ontario, WestGrid

• 37 member institutions, all public universities

• More than 200 analysts and systems administrators

• Operates 5 large national ARC sites: Arbutus, Cedar, Graham, Niagara, Beluga

• Participates in many national and international collaborations, e.g., ATLAS

ARC Funding in Canada

• Federal government provides the majority of ARC funding through CFI and ISED

– CFI: Canada Foundation for Innovation, National Platform Fund

– ISED: Innovation, Science and Economic Development Canada, Cyber Infrastructure Fund

• Provincial matching funds

• Vendor in-kind matching funds


Compute Canada Clusters

• CC issued an RFP call to host national ARC systems

• 5 hosting institutions were selected after review

– Arbutus (GP1): Victoria, OpenStack cloud

– Cedar (GP2): SFU, general purpose cluster with GPUs

– Graham (GP3): Waterloo, general purpose cluster with GPUs

– Niagara (LP): Toronto, large parallel jobs only

– Beluga (GP4): McGill, general purpose cluster with GPUs

• CC & Hosting institutions issued RFPs to purchase

– Systems at 4 Stage-1 sites (GP1-3 and LP) and 1 Stage-2 site (GP4)

– National Data Cyberinfrastructure (long term storage at all stage-1 sites)

– WAN network equipment (100GE to National RE network at stage-1 sites)

– Scheduler, parallel filesystem for all clusters


SFU Data Centre

• Location – Water Tower Building

– SFU campus on Burnaby Mountain, 15km from downtown Vancouver

– Built in 1969 by BC Hydro as its Control Centre, covering 90% of the BC population

– 9,000+ sq ft of ground-floor space on a concrete slab

– Divided into a High Availability zone (17KW/rack) and a High Density zone (35KW/rack)

– 24/7 NOC with operators on site

• Power

– 3.5MW available, can upgrade to 10MW

– 1MW 1+1 UPS power with diesel backup (HA zone)

– Output 3-phase 240/415V

• Cooling Towers and Chillers with estimated PUE 1.07

– Stage 1: radiative cooling during colder days

– Stage 2: evaporative cooling when wet-bulb temperature is under 20C

– Stage 3: chillers as needed (should only happen a few days each year)


Cedar Cluster Information – Stage 1

• Timeline

– GP2 Stage-1 RFP was issued by SFU in early June 2016, closed on July 26, 2016

– Proposals were reviewed by SFU and Compute Canada members

– Awarded to Scalar Decisions in September 2016

– Cluster was installed in Q1 2017 and passed acceptance tests on April 15, 2017

– SFU officially announced the cluster on April 20, 2017, and it entered production on July 25, 2017

• Cluster nodes (902 nodes, 27696 cores, 584 P100, 186TB RAM)

– 576 base, Dell C6320, 2 E5-2683v4, 128GB RAM, 2 480GB SSDs

– 128 large, Dell C6320, 2 E5-2683v4, 256GB RAM, 2 480GB SSDs

– 24 bigmem512, Dell C6320, 2 E5-2683v4, 512GB RAM, 2 480GB SSDs

– 24 bigmem1500, Dell R630, 2 E5-2683v4, 1.5TB RAM, 2 480GB SSDs

– 4 bigmem3000, Dell R930, 4 E7-4809v4, 3TB RAM, 2 480GB SSDs

– 114 base GPU, Dell C4130, 2 E5-2650v4, 4 P100/12GB, 128GB, 800GB SSD

– 32 large GPU, Dell C4130, 2 E5-2650v4, 4 P100/16GB, 256GB, 800GB SSD

– Others: 8 head nodes, 5 DTN nodes, 8 management nodes, TDS partition
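The headline totals for Stage 1 can be cross-checked against the per-type node list. The sketch below is only an arithmetic check, not part of the original slides. The cores-per-CPU values are an assumption taken from vendor specifications (they are not on the slide), and 1.5TB and 3TB are read as 1536GB and 3072GB.

```python
from collections import namedtuple

NodeType = namedtuple("NodeType", "name count sockets cpu ram_gb gpus")

# Assumption: cores per CPU come from vendor specifications, not from the slide.
CORES_PER_CPU = {"E5-2683v4": 16, "E5-2650v4": 12, "E7-4809v4": 8}

# Node types exactly as listed on the slide (RAM in GB, P100 GPUs per node).
STAGE1 = [
    NodeType("base", 576, 2, "E5-2683v4", 128, 0),
    NodeType("large", 128, 2, "E5-2683v4", 256, 0),
    NodeType("bigmem512", 24, 2, "E5-2683v4", 512, 0),
    NodeType("bigmem1500", 24, 2, "E5-2683v4", 1536, 0),
    NodeType("bigmem3000", 4, 4, "E7-4809v4", 3072, 0),
    NodeType("base GPU", 114, 2, "E5-2650v4", 128, 4),
    NodeType("large GPU", 32, 2, "E5-2650v4", 256, 4),
]


def summarize(node_types):
    """Return (nodes, cores, gpus, ram_gb) summed over all node types."""
    nodes = sum(t.count for t in node_types)
    cores = sum(t.count * t.sockets * CORES_PER_CPU[t.cpu] for t in node_types)
    gpus = sum(t.count * t.gpus for t in node_types)
    ram_gb = sum(t.count * t.ram_gb for t in node_types)
    return nodes, cores, gpus, ram_gb


nodes, cores, gpus, ram_gb = summarize(STAGE1)
print(nodes, cores, gpus, round(ram_gb / 1024))  # 902 27696 584 186
```

Under these assumptions the per-type list reproduces the quoted 902 nodes, 27696 cores, 584 P100 GPUs and roughly 186TB of RAM.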


Cedar Cluster Information – Stage 1

• Interconnect

– Intel Omni-Path, 16 spine switches, 30 leaf switches

– Each leaf switch connects to a 32-node island with 16 uplinks, 1 to each spine

– 29+2 compute islands, 1+1 for storage/service island, 2:1 blocking factor

• Global Storage & File Systems

– DDN SFA 14KX with 640 8TB drives, 4PB, Lustre filesystem

– 4 Embedded OSS servers running on SFA controllers, 64 OSTs

– 2 MDS servers with an EF4024 disk shelf, 6TB RAID 10, 2 MDTs

– 35GB/s read and 32GB/s write

• Rack Power and Cooling

– Each compute rack has 1 3P 60A PDU and UHD RDHX, 35KW power/cooling

– Storage/service racks have 2 3P 30A PDUs and HD RDHX, 17.5KW

– Balance CPU and GPU islands to optimize space, power & cooling
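The 2:1 blocking factor and the 4PB figure above both follow from simple ratios. The sketch below works them through. The 8+2 RAID6 layout is an assumption here: it is stated on the later Home & Scratch slide, not on this one.

```python
def blocking_factor(downlinks, uplinks):
    """Ratio of node-facing ports to spine-facing ports on a leaf switch."""
    return downlinks / uplinks


# One island: 32 nodes per leaf switch, 16 uplinks (1 to each of 16 spines).
island_blocking = blocking_factor(downlinks=32, uplinks=16)

# Storage: 640 drives of 8TB each behind 64 OSTs.
DRIVES, DRIVE_TB, OSTS = 640, 8, 64
drives_per_ost = DRIVES // OSTS
raw_tb = DRIVES * DRIVE_TB

# Assumption: 8 data + 2 parity drives per OST (RAID6), as on the later slide.
DATA, PARITY = 8, 2
usable_tb = raw_tb * DATA // (DATA + PARITY)

print(island_blocking)   # 2.0, i.e. 2:1 blocking
print(drives_per_ost)    # 10
print(raw_tb, usable_tb) # 5120 4096
```

With 10 drives per OST, an 8+2 layout leaves 4096TB of data capacity out of 5120TB raw, which matches the quoted 4PB.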


Cedar Cluster Information – Stage 2

• Stage 2 expansion in Spring 2018

• Nodes

– 640 Skylake nodes, total 30,720 cores

– 122TB total memory

• Interconnect

– Intel Omni-Path, 16 spine switches, 20 leaf switches

– Added 4 core switches to connect Stage 1 and Stage 2, 1:8 blocking

• Storage

– Expanded SFA 14K by 200 8TB disks, all for scratch


Cedar Cluster Information – Stage 3

• Stage 3 expansion is funded by the ISED CI fund

• Installation planned in late 2019

• Nodes

– 768 Cascade Lake nodes, total 36,864 CPU cores

– 192 GPU nodes, 768 V100, 6144 CPU cores

– 184TB total memory

• Interconnect

– Intel Omni-Path, 16 spine switches, 30 leaf switches

– Adding 4 more core switches to connect to Stage 1 and 2, 1:4 blocking
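The Stage 2 and Stage 3 slides give only totals. Dividing them by the node counts yields per-node figures. These are derived values, not numbers from the slides. The memory averages assume 1TB = 1024GB and, for Stage 3, assume the 184TB covers both CPU and GPU nodes.

```python
def per_node(total, nodes):
    """Average amount of a resource per node."""
    return total / nodes


# Stage 2: 640 Skylake nodes, 30,720 cores, 122TB memory.
stage2_cores = per_node(30_720, 640)
stage2_mem_gb = per_node(122 * 1024, 640)

# Stage 3: 768 Cascade Lake nodes with 36,864 cores,
# plus 192 GPU nodes with 768 V100 GPUs and 6,144 CPU cores.
stage3_cpu_cores = per_node(36_864, 768)
stage3_gpu_node_cores = per_node(6_144, 192)
stage3_v100_per_node = per_node(768, 192)
# Assumption: the 184TB total is shared across all 768 + 192 nodes.
stage3_mem_gb = per_node(184 * 1024, 768 + 192)

print(stage2_cores, stage3_cpu_cores)               # 48.0 48.0
print(stage3_gpu_node_cores, stage3_v100_per_node)  # 32.0 4.0
print(round(stage2_mem_gb), round(stage3_mem_gb))
```

Both CPU expansions work out to 48 cores per node, and each Stage 3 GPU node to 32 cores and 4 V100 GPUs.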


Cedar Cluster Interconnect


Cedar Cluster Benchmark

• Stage 1 HPL benchmark was performed on GPU nodes only

– TOP500 ranking in June 2017: No.86, 1,337TF

– GREEN500 ranking in June 2017: No.13, 8 GFlops/W


• Stage 2 HPL benchmark was performed on CPU nodes only

– TOP500 ranking in November 2018: No.190, 1,633TF

– Using hybrid HPL code from Intel to balance Skylake and Broadwell nodes

• Stage 3 HPL benchmark planned in Spring 2020


Cedar Cluster Benchmark


Persistent Storage

• Long-term storage is funded by the National Data Cyberinfrastructure Fund

• Storage types include

– Lustre File system (project filesystem), 20PB

– dCache storage, 10PB

– Openstack Ceph, 3.5PB

– Offline/Nearline tape storage, 60PB

• All of this storage is directly accessible from Cedar

• Globus is the preferred option to move data between sites


Lustre Filesystems – Home & Scratch

• Current /home and /scratch

– SFA 14K with 840 8TB disks, EXA 4.2

– ldiskfs backend, 8+2 RAID6

– 4 Embedded OSS servers, 4 OSTs for /home and 80 for /scratch

– Major performance issues appeared after the Stage 2 expansion

• Planning new /home

– Move /home to a DDN SS9012 with 2 Dell R640 as OSS servers

– Use ZFS backend OSTs: 12+2 RAIDZ2 with one SSD cache, total 6 OSTs

– Move MDT from SAS to SSD-based hardware RAID 10 storage, ldiskfs

• Planning /scratch changes

– Replace the embedded SFA 14K controllers with block-based controllers

– Add 4 Dell R640 as OSS servers

– Keep original MDS/MDT and OSTs
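The OST counts above can be checked against the disk counts with a short calculation. This sketch assumes one RAID group per OST, and it reads "one SSD cache" as one SSD per OST. Both are interpretations, not statements from the slide.

```python
def devices(osts, data, parity, cache_per_ost=0):
    """Number of drives needed for `osts` OSTs, one RAID group per OST."""
    return osts * (data + parity + cache_per_ost)


# Current layout: 4 OSTs for /home and 80 for /scratch, 8+2 RAID6.
current_disks = devices(osts=4 + 80, data=8, parity=2)

# Planned /home: 6 ZFS OSTs, 12+2 RAIDZ2.
planned_home_hdds = devices(osts=6, data=12, parity=2)
# Assumption: one SSD cache device per OST.
planned_home_total = devices(osts=6, data=12, parity=2, cache_per_ost=1)

print(current_disks)       # 840
print(planned_home_hdds)   # 84
print(planned_home_total)  # 90
```

The current layout accounts for all 840 disks in the SFA 14K, and the planned /home would need 84 data and parity drives plus 6 SSDs under the per-OST cache reading.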


Lustre Filesystems – Project

Project filesystem is based on CC storage building block, option 4

• Community version 2.10.7

• Each pair of OSS servers connects to 4 disk enclosures in JBOD mode

• Servers: Dell R630/640, 4 OSTs per server

• Enclosures: Seagate SP2584, SP3106 and SS9012

• ZFS backend, RAIDZ2 with SSD L2ARC cache

• OST-level failover only; multipathing is not used

• 2 MDS servers using Dell R630, ldiskfs for MDT

• Directory structure is organized by projects and group quota is used

• Waiting to migrate to project quota

• Plan for 2.12 migration in 2020


Lustre Filesystems – Project

Performance Issues

• The initial MDT used 24 SAS drives in RAID 10

• Frequent high load on MDS and OSS servers, mostly caused by bioinformatics jobs such as BLAST

Resolution

• Replaced all SAS disks with SSDs in March 2019

• Significant improvement observed on both MDS and OSS loads


Lustre HSM Filesystem – Nearline

HSM Design

• The Lustre HSM solution, using Robinhood and a TSM copytool, was developed by CC members, mostly by Simon GuilBault

• IBM Spectrum Protect (aka TSM) is used for backup at all CC sites

• Cross-site replication is planned

• Robinhood is used as the policy engine and stores the changelog in MySQL

• lhsmtool_cmd calls a script to 'archive' files to the Spectrum Protect server

• Keep 2 copies of data on tape

Cedar implementation

• 1 DSS7000 with 8 OSTs (ldiskfs 9+2 RAID6) as OSS server

• Share the same MDS server with /project

• Directory structure is similar to /project; the project GID is used as the project ID for project quota


Lustre Filesystems – Nearline


Auto Provisioning and Management (ADAM)

• Compute nodes are running CentOS 7.5

• OS is installed in memory (ramdisk)

• Local disks are used as local scratch only

• Using iPXE to boot an unconfigured node with temporary IP address

• Register node information and assign a permanent IP address

• Perform node firmware updates automatically during boot

• 2-stage installation process to boot 1600+ nodes within 60 minutes

• System provisioning using Puppet

• User authentication by Compute Canada LDAP services

• Global syslog collection and centralized monitoring
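The registration step above (temporary address at iPXE boot, then a permanent address once the node is known) can be illustrated with a small in-memory registry. This is a hypothetical sketch, not the ADAM implementation: the class name `NodeRegistry`, the keying by MAC address and the `10.0.0.0/21` range are all assumptions made for illustration.

```python
import ipaddress


class NodeRegistry:
    """Hypothetical registry mapping a node's MAC address to a permanent IP."""

    def __init__(self, cidr):
        # Iterator over the usable host addresses of the management network.
        self._free = iter(ipaddress.ip_network(cidr).hosts())
        self._by_mac = {}

    def register(self, mac):
        """Return the node's permanent IP, allocating one on first contact."""
        key = mac.lower()
        if key not in self._by_mac:
            self._by_mac[key] = next(self._free)
        return self._by_mac[key]


# Assumed address range; a /21 holds 2046 hosts, enough for 1600+ nodes.
registry = NodeRegistry("10.0.0.0/21")
first = registry.register("AA:BB:CC:00:00:01")
again = registry.register("aa:bb:cc:00:00:01")  # same node rebooting
second = registry.register("AA:BB:CC:00:00:02")
print(first, again, second)
```

Keying on a stable hardware identifier means a reboot returns the same address, so a node booted from ramdisk can be reinstalled without changing its identity.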


Software Application Delivery

• CVMFS is used to provide nationwide software distribution

• Software in CVMFS is maintained by CC analysts from all regions

• Nix/EasyBuild is used to install software on the CVMFS Stratum 0 server

• CVMFS Stratum 1 servers are available in the East and West

• Local sites have dedicated Squid Servers

• CC provided training for analysts to use Nix, EasyBuild and Lmod to build and install software in CVMFS

• Local (Cedar only) software is installed on an NFS server

• Singularity Container support

• Module environment to run applications

• Job scheduler: Slurm


Resource Allocation & Scheduler

• All CC resources are free to Canadian researchers in public institutions

• Projects have a default allocation (coreyear CPU, TB storage)

• CC issues a resource allocation call every year, which includes

– Resources for Research Groups (RRG), award for 1 year

– Research Platforms and Portals (RPP), award for 3 years

– Rapid Access Service (RAS), short term, no application necessary

• Allocations: CPU (coreyear), GPU (gpuyear) & Storage (TB)

• Applications are reviewed by science and technical committees

• Allocation data is integrated into LDAP and pulled into Slurm DB

• Job walltime: 12 hours to 28 days

• Multiple partitions: by-core, by-node, by-gpu, by-gpu_node

• Cedar is suitable for serial and small to medium-sized parallel jobs
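Allocations are expressed in core-years, while jobs are sized in cores and walltime. The sketch below converts between the two, assuming a 365-day year (8760 hours). The slides do not give the exact accounting convention, so treat the constant as an assumption.

```python
HOURS_PER_CORE_YEAR = 365 * 24  # assumption: one core for a 365-day year


def core_hours(cores, walltime_hours):
    """Core-hours consumed by a job that holds `cores` for `walltime_hours`."""
    return cores * walltime_hours


def core_years(hours):
    """Convert core-hours into core-years."""
    return hours / HOURS_PER_CORE_YEAR


# Example: a 32-core job running for the maximum 28-day walltime.
job = core_hours(cores=32, walltime_hours=28 * 24)
print(job)                        # 21504
print(round(core_years(job), 2))  # 2.45
```

One long whole-node job can therefore consume a few core-years, which is a useful reference point when sizing an RRG request.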


Compute Canada Support Model

• Cedar System team maintains hardware, OS, interconnect, storage

• CC RSNT maintains software in CVMFS and interfaces with users

• CC support staff will help any user to use these resources

• OTRS support ticketing system

• National help desk

• Other national teams provide various support and consulting

• With consent, CC analysts can use “ccsudo” to access users’ home directories and debug users’ problems. “ccsudo” writes audit logs


SFU Data Centre


SFU Data Centre


Cooling Towers


Mechanical Room


Rack Power and Cooling


Cedar Compute racks


Acknowledgement


B.C. Knowledge Development Fund


Questions?
