Compute Canada - Helping Advantage Research Computing in Canada
TRANSCRIPT
[Slides: CLUG 2019, lustrefs.cn/wp-content/uploads/2019/10/CLUG2019_05...]
Helping Advantage Research Computing in Canada
Lixin Liu, Simon Fraser University
October 15, 2019
Outline
• Compute Canada and Canadian ARC Funding
• Cedar System and Hardware
• Lustre File System
• Cedar Provisioning & Management
• Software Application Delivery
• Resource Allocation and Scheduler
• Compute Canada Support Model
Compute Canada and Canadian ARC Funding
Compute Canada
• Non-profit organization supporting research at Canadian public institutions
• 4 Regional consortia: ACENET, Calcul Quebec, Compute Ontario, WestGrid
• 37 member institutions, all public universities
• More than 200 analysts and systems administrators
• Operates 5 large national ARC sites: Arbutus, Cedar, Graham, Niagara, Beluga
• Participates in many national and international collaborations, e.g., ATLAS
ARC Funding in Canada
• Federal government provides the majority of ARC funding through CFI/ISED
– CFI: Canada Foundation for Innovation, National Platform Fund
– ISED: Innovation, Science and Economic Development Canada, Cyber Infrastructure Fund
• Provincial matching funds
• Vendor in-kind matching funds
Compute Canada Clusters
• CC issued an RFP call to host national ARC systems
• 5 hosting institutions were selected after review
– Arbutus (GP1): Victoria, OpenStack cloud
– Cedar (GP2): SFU, general purpose cluster with GPUs
– Graham (GP3): Waterloo, general purpose cluster with GPUs
– Niagara (LP): Toronto, large parallel jobs only
– Beluga (GP4): McGill, general purpose cluster with GPUs
• CC & Hosting institutions issued RFPs to purchase
– Systems at 4 Stage-1 sites (GP1-3 and LP) and 1 Stage-2 site (GP4)
– National Data Cyberinfrastructure (long term storage at all stage-1 sites)
– WAN network equipment (100GE to National RE network at stage-1 sites)
– Scheduler, parallel filesystem for all clusters
SFU Data Centre
• Location – Water Tower Building
– SFU campus on Burnaby Mountain, 15 km from downtown Vancouver
– Built in 1969 by BC Hydro as its control centre, serving 90% of the BC population
– 9,000+ sq ft of ground-floor space on a concrete slab
– Divided into a High Availability zone (17 kW/rack) and a High Density zone (35 kW/rack)
– 24/7 NOC with operators on site
• Power – 3.5 MW available, upgradable to 10 MW
– 1 MW 1+1 UPS power with diesel backup (HA zone)
– Output 3-phase 240/415 V
• Cooling – towers and chillers with estimated PUE 1.07
– Stage 1: radiative cooling during colder days
– Stage 2: evaporative cooling when wet-bulb temperature is under 20°C
– Stage 3: chillers as needed (should only happen a few days each year)
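As a back-of-the-envelope check, a PUE of 1.07 means total facility power is only 7% above the IT load. A minimal sketch with hypothetical round numbers (not measured SFU figures):

```python
# PUE = total facility power / IT equipment power.
it_load_kw = 1000.0   # hypothetical IT load
overhead_kw = 70.0    # hypothetical cooling + distribution overhead
pue = (it_load_kw + overhead_kw) / it_load_kw
print(pue)  # 1.07
```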
Cedar Cluster Information – Stage 1
• Timeline
– GP2 Stage-1 RFP was issued by SFU in early June 2016, closed on July 26, 2016
– Proposals were reviewed by SFU and Compute Canada members
– Awarded to Scalar Decisions in September 2016
– Cluster was installed in Q1 2017 and passed acceptance tests on April 15, 2017
– SFU officially announced the cluster on April 20, 2017; in production on July 25, 2017
• Cluster nodes (902 nodes, 27,696 cores, 584 P100 GPUs, 186 TB RAM)
– 576 base, Dell C6320, 2× E5-2683v4, 128 GB RAM, 2× 480 GB SSDs
– 128 large, Dell C6320, 2× E5-2683v4, 256 GB RAM, 2× 480 GB SSDs
– 24 bigmem512, Dell C6320, 2× E5-2683v4, 512 GB RAM, 2× 480 GB SSDs
– 24 bigmem1500, Dell R630, 2× E5-2683v4, 1.5 TB RAM, 2× 480 GB SSDs
– 4 bigmem3000, Dell R930, 4× E7-4809v4, 3 TB RAM, 2× 480 GB SSDs
– 114 base GPU, Dell C4130, 2× E5-2650v4, 4× P100/12GB, 128 GB RAM, 800 GB SSD
– 32 large GPU, Dell C4130, 2× E5-2650v4, 4× P100/16GB, 256 GB RAM, 800 GB SSD
– Others: 8 head nodes, 5 DTN nodes, 8 management nodes, TDS partition
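The per-flavour list above can be cross-checked against the quoted totals (902 nodes, 27,696 cores, 584 P100s, ~186 TB RAM); cores per node follow from the CPU models (16-core E5-2683v4, 8-core E7-4809v4, 12-core E5-2650v4):

```python
# (flavour, node count, cores/node, RAM GB/node, GPUs/node)
flavours = [
    ("base",       576, 32,  128, 0),   # 2x 16-core E5-2683v4
    ("large",      128, 32,  256, 0),
    ("bigmem512",   24, 32,  512, 0),
    ("bigmem1500",  24, 32, 1536, 0),
    ("bigmem3000",   4, 32, 3072, 0),   # 4x 8-core E7-4809v4
    ("base GPU",   114, 24,  128, 4),   # 2x 12-core E5-2650v4
    ("large GPU",   32, 24,  256, 4),
]
nodes = sum(f[1] for f in flavours)
cores = sum(f[1] * f[2] for f in flavours)
ram_tb = sum(f[1] * f[3] for f in flavours) / 1024
gpus = sum(f[1] * f[4] for f in flavours)
print(nodes, cores, gpus, round(ram_tb))  # 902 27696 584 186
```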
Cedar Cluster Information – Stage 1
• Interconnect
– Intel Omni-Path, 16 spine switches, 30 leaf switches
– Each leaf switch connects to a 32-node island with 16 uplinks, 1 to each spine
– 29+2 compute islands, 1+1 for storage/service island, 2:1 blocking factor
• Global Storage & File Systems
– DDN SFA14KX with 640 8 TB drives, 4 PB, Lustre filesystem
– 4 embedded OSS servers running on SFA controllers, 64 OSTs
– 2 MDS servers with an EF4024 disk shelf, 6 TB RAID 10, 2 MDTs
– 35 GB/s read and 32 GB/s write
• Rack Power and Cooling
– Each compute rack has 1 3-phase 60 A PDU and UHD RDHX, 35 kW power/cooling
– Storage/service racks have 2 3-phase 30 A PDUs and HD RDHX, 17.5 kW
– CPU and GPU islands are balanced to optimize space, power & cooling
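The 2:1 blocking factor follows directly from the leaf-switch port split described above (32 node-facing ports vs. 16 uplinks); a small helper, assuming equal-speed links:

```python
def blocking_factor(downlinks: int, uplinks: int) -> float:
    """Oversubscription ratio of a leaf switch, assuming equal link speeds."""
    return downlinks / uplinks

# Cedar Stage-1 leaf: 32-node island, 16 uplinks (1 per spine)
print(blocking_factor(32, 16))  # 2.0
```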
Cedar Cluster Information – Stage 2
• Stage 2 expansion in Spring 2018
• Nodes
– 640 Skylake nodes, total 30,720 cores
– 122 TB total memory
• Interconnect
– Intel Omni-Path, 16 spine switches, 20 leaf switches
– Added 4 core switches to connect Stage 1 and Stage 2, 1:8 blocking
• Storage
– Expanded the SFA14K by 200 8 TB disks, all for /scratch
Cedar Cluster Information – Stage 3
• Stage 3 expansion is funded by the ISED CI fund
• Installation planned in late 2019
• Nodes
– 768 Cascade Lake nodes, total 36,864 CPU cores
– 192 GPU nodes, 768 V100 GPUs, 6,144 CPU cores
– 184 TB total memory
• Interconnect
– Intel Omni-Path, 16 spine switches, 30 leaf switches
– Adding 4 additional core switches to connect Stages 1 and 2, 1:4 blocking
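Assuming inter-stage bandwidth scales linearly with the number of core switches, doubling the 4 Stage-2 core switches to 8 halves the quoted oversubscription from 1:8 to 1:4; a sketch of that simple model:

```python
def inter_stage_blocking(core_switches, baseline=(4, 8)):
    """Oversubscription ratio, assuming bandwidth scales with core switches.

    baseline: (core switch count, blocking ratio) at a known design point."""
    base_switches, base_ratio = baseline
    return base_ratio * base_switches / core_switches

print(inter_stage_blocking(4))  # 8.0 (Stage 2)
print(inter_stage_blocking(8))  # 4.0 (after Stage 3 adds 4 more)
```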
Cedar Cluster Interconnect
Cedar Cluster Benchmark
• Stage 1 HPL benchmark was performed on GPU nodes only
– TOP500 ranking in June 2017: No. 86, 1,337 TF
– GREEN500 ranking in June 2017: No. 13, 8 GFlops/Watt
• Stage 2 HPL benchmark was performed on CPU nodes only
– TOP500 ranking in November 2018: No. 190, 1,633 TF
– Used hybrid HPL code from Intel to balance Skylake and Broadwell nodes
• Stage 3 HPL benchmark planned for Spring 2020
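Together, the two June 2017 figures imply the power drawn during the HPL run: 1,337 TF at 8 GFlops/Watt is roughly 167 kW:

```python
rmax_gflops = 1_337_000   # 1,337 TF (TOP500, June 2017)
gflops_per_watt = 8       # GREEN500 efficiency quoted above
power_kw = rmax_gflops / gflops_per_watt / 1000
print(round(power_kw))  # 167
```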
Cedar Cluster Benchmark
Persistent Storage
• Long-term storage is funded by the National Data Cyberinfrastructure Fund
• Storage types include
– Lustre filesystem (/project), 20 PB
– dCache storage, 10 PB
– OpenStack Ceph, 3.5 PB
– Offline/nearline tape storage, 60 PB
• Direct access to this storage is allowed from Cedar
• Globus is the preferred option for moving data between sites
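Summing the tiers listed above gives the total persistent capacity funded at the site:

```python
tiers_pb = {"Lustre /project": 20, "dCache": 10, "OpenStack Ceph": 3.5, "tape": 60}
total_pb = sum(tiers_pb.values())
print(total_pb)  # 93.5
```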
Lustre Filesystems – Home & Scratch
• Current /home and /scratch
– SFA14K with 840 8 TB disks, EXA 4.2
– ldiskfs backend, 8+2 RAID6
– 4 embedded OSS servers, 4 OSTs for /home and 80 for /scratch
– Major performance issues appeared after the Stage 2 expansion
• Planned new /home
– Move /home to a DDN SS9012 with 2 Dell R640 as OSS servers
– Use ZFS-backed OSTs: 12+2 RAIDZ2 with one SSD cache, total 6 OSTs
– Move the MDT from SAS to SSD-based hardware RAID 10 storage, ldiskfs
• Planned /scratch changes
– Replace embedded SFA14K controllers with block-based controllers
– Add 4 Dell R640 as OSS servers
– Keep the original MDS/MDT and OSTs
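One consequence of the wider 12+2 RAIDZ2 stripe versus the 8+2 RAID6 used today is higher usable capacity; ignoring filesystem metadata and spare overhead, the data fraction is:

```python
def usable_fraction(data_disks: int, parity_disks: int) -> float:
    """Share of raw capacity left after parity (metadata/spares ignored)."""
    return data_disks / (data_disks + parity_disks)

print(usable_fraction(8, 2))             # 0.8   (ldiskfs 8+2 RAID6)
print(round(usable_fraction(12, 2), 3))  # 0.857 (ZFS 12+2 RAIDZ2)
```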
Lustre Filesystems – Project
Project filesystem is based on the CC storage building block, option 4
• Lustre community version 2.10.7
• Each pair of OSS servers connects to 4 disk enclosures in JBOD mode
• Servers: Dell R630/R640, 4 OSTs per server
• Enclosures: Seagate SP2584, SP3106 and SS9012
• ZFS backend, RAIDZ2 with SSD L2ARC cache
• OST-level failover only, no multipathing
• 2 MDS servers using Dell R630, ldiskfs for the MDT
• Directory structure is organized by project, and group quotas are used
• Waiting to migrate to project quotas
• Migration to Lustre 2.12 planned for 2020
Lustre Filesystems – Project
Performance Issues
• The initial MDT used 24 SAS drives in RAID 10
• Frequent high load on MDS and OSS servers, mostly caused by bioinformatics jobs such as BLAST
Resolution
• Replaced all SAS disks with SSDs in March 2019
• Significant improvement observed in both MDS and OSS loads
Lustre HSM Filesystem – Nearline
HSM Design
• A Lustre HSM filesystem using a Robinhood and TSM copytool solution, developed by CC members, mostly by Simon Guilbault
• IBM Spectrum Protect (aka TSM) is used for backup at all CC sites
• Cross-site replication is planned
• Robinhood is used as a policy engine, storing changelogs in MySQL
• lhsmtool_cmd calls a script to 'archive' files to the Spectrum Protect server
• 2 copies of data are kept on tape
Cedar implementation
• 1 DSS7000 with 8 OSTs (ldiskfs 9+2 RAID6) as the OSS server
• Shares the same MDS server with /project
• Directory structure is similar to /project; the project GID is used as the project ID for project quotas
Lustre Filesystems – Nearline
Auto Provisioning and Management (ADAM)
• Compute nodes are running CentOS 7.5
• OS is installed in memory (ramdisk)
• Local disks are used as local scratch only
• Using iPXE to boot an unconfigured node with temporary IP address
• Register node information and assign a permanent IP address
• Perform node firmware updates automatically during boot
• 2-stage installation process to boot 1600+ nodes within 60 minutes
• System provisioning using Puppet
• User authentication by Compute Canada LDAP services
• Global syslog collection and centralized monitoring
Software Application Delivery
• CVMFS is used to provide nationwide software distribution
• Software in CVMFS is maintained by CC analysts from all regions
• Nix/EasyBuild is used to install software on the CVMFS Stratum 0 server
• CVMFS Stratum 1 servers are available in the East and West
• Local sites have dedicated Squid servers
• CC provided training for analysts to use Nix, EasyBuild and Lmod to build and install software in CVMFS
• Local (Cedar-only) software is installed on an NFS server
• Singularity container support
• Module environment to run applications
• Job scheduler: Slurm
Resource Allocation & Scheduler
• All CC resources are free to Canadian researchers in public institutions
• Projects have a default allocation (core-years CPU, TB storage)
• CC issues a resource allocation call every year, including
– Resources for Research Groups (RRG), awarded for 1 year
– Resource Platforms and Portals (RPP), awarded for 3 years
– Rapid Access Services (RAS), short term, no application necessary
• Allocations: CPU (core-years), GPU (GPU-years) & storage (TB)
• Applications are reviewed by science and technical committees
• Allocation data is integrated into LDAP and pulled into the Slurm DB
• Job walltime: 12 hours to 28 days
• Multiple partitions: by-core, by-node, by-gpu, by-gpu_node
• Cedar is suitable for serial and small-to-medium-size parallel jobs
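A job on a by-core partition is submitted through a standard Slurm batch script; a minimal hypothetical fragment (the account name, modules and binary are placeholders, not from the slides):

```shell
#!/bin/bash
#SBATCH --account=def-someuser   # hypothetical allocation account
#SBATCH --ntasks=4               # by-core request
#SBATCH --mem-per-cpu=2G
#SBATCH --time=12:00:00          # walltime, within the 12 h - 28 d range
module load gcc openmpi          # software comes from the CVMFS stack
srun ./my_app                    # hypothetical MPI binary
```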
Compute Canada Support Model
• The Cedar system team maintains hardware, OS, interconnect and storage
• CC RSNT maintains the CVMFS software stack and interfaces with users
• CC support staff will help any user to use these resources
• OTRS support ticketing system
• National help desk
• Other national teams provide various support and consulting
• With consent, CC analysts can use "ccsudo" to access users' home directories and debug users' problems; "ccsudo" writes audit logs
SFU Data Centre
SFU Data Centre
Cooling Towers
Mechanical Room
Rack Power and Cooling
Cedar Compute racks
Acknowledgement
B.C. Knowledge Development Fund
Questions?