Themis Athanassiadou, HPC Project Manager, ClusterVision
ClusterVision Engineer Innovate Integrate
About ClusterVision
• For 12 years, Europe's dedicated specialist in High-Performance Computing
• End-to-end hardware/software/services solution provider
• HPC engineering and innovation are at the heart of what we do
• Active in Europe, Middle-East & Africa, Asia-Pacific
• Amsterdam-based: well connected to major European business locations
• Quality - ISO9001:2008 & ISO14001 certified
• More than 400 projects and 250 customers
HPC in Industry
Systems share, Top 500:
• Industry 51%
• Research 24%
• Academic 18%
• Government 4%
• Vendor 3%
“Today, to Out-Compute is to Out-Compete”
HPC:
• Enables development of new products and services
• Reduces time to market
• Reduces R&D costs
• Increases quality
• Reduces personnel costs
and empowers a broad spectrum of other businesses
Adopting HPC can be challenging for many businesses
• Lack of infrastructure
• Cost of equipment
• Cost of operation
• Lack of expertise
• Lack of experience
CV Innovations that ease HPC adoption
OpenStack Cloud Compute
▪ Convergence of big data, cloud and HPC
▪ Merge with the general IT infrastructure
▪ Open industry standard, open source
▪ Securely host CPU cycles and storage
HPC mineral oil based cooling solution
▪ Saving ~20% power: no air-cooling or fans
▪ Less current leakage, higher HPC performance
▪ Skinless servers
▪ Re-use of racks and same oil for > 15 years
Remote System Administration
▪ Outsourced infrastructure management
▪ Power of scaling: lower your cost
▪ Especially suitable for OpenStack
▪ You can focus on workflow instead of hardware
Trinity – HPC enabled cloud environment
▪ OpenStack base
▪ Support for containers
▪ Enhanced for High Performance Compute
▪ Full HPC Ecosystem
Why cloud? Freedom of choice and flexibility
Main characteristics:
• On demand self-service
• Broad network access
• Resource pooling
• Rapid elasticity
• Measured service
Service models:
• Software as a Service
• Platform as a Service
• Infrastructure as a Service
Deployment models:
• Private Cloud
• Public Cloud
• Hybrid Cloud
• Community Cloud
Ideally…
Engineering (HPC nodes)
IT department
Finance
Physical Hardware
(Servers, Database, Storage, GPU Pool)
To get there, you need some form of virtualization
Engineering
IT department
Finance
Physical Hardware
(Servers, Database, Storage, GPU Pool)
VM VM VM
Virtual Machine Monitor
Virtualization technologies
HYPERVISORS
(e.g. VMware, Xen, KVM)
• Full virtualization
• Great workload isolation
• Slower
VS
CONTAINERS
(e.g. LXC, Docker)
• Lightweight virtualization
• Good workload isolation
• Faster
Image credit: CISCO
Which is best for HPC?
HPC applications:
• usually tuned to specific hardware
• strive to maximize performance (compute, I/O)
• some require a very fast network (InfiniBand)
• Clouds only guarantee "minimal" level of performance
IBM research report: RC25482 (AUS1407-001) July 21, 2014
CPU test: Linpack performance on 2 sockets (16 cores)
Is performance using container virtualization good enough?
According to the IBM study:
• Docker equals or exceeds KVM performance in every case tested (CPU, memory, I/O)
• For I/O-intensive workloads, both forms of virtualization should be used carefully.
• Network, which is very important in many applications, needs to be tested.
CPU test: Linpack performance on 2 sockets (16 cores)
Container based virtualization is a great starting point for HPC in the Cloud.
Building a suitable HPC Cloud
An HPC Cloud should strive to find the balance between flexibility, convenience and acceptable performance.
Choosing a suitable Cloud for HPC
• Depends on the need (Public? Private? Level of abstraction? Specialized hardware? Performance?)
• A number of companies, including Penguin, R-HPC, Amazon, Univa, SGI, Sabalcore, UberCloud and Gompute, offer specialized HPC clouds. Evaluate and choose.
• For full freedom of choice, customization and security, build your cloud from an open source project (OpenStack, OpenNebula, Eucalyptus) and add HPC functionality using containers.
OpenStack: Cloud building toolkit of choice
• Open source set of software tools for building and managing cloud computing platforms for public and private clouds.
• Currently managed by the OpenStack Foundation.
• More than 200 companies have joined, including Dell, Intel, Red Hat and Oracle
• Rapidly becoming industry standard
• It is primarily deployed as an infrastructure as a service (IaaS) solution.
Challenges addressed by OpenStack
Problem: Manager: Resources wasted when Virtual Desktop Infrastructure (VDI) is idle at night
Solution: With OpenStack you can easily switch from the Virtual Desktop Infrastructure (VDI) to HPC
Problem: Admin: Inability to collaborate with external parties due to lack of a security infrastructure for hosting CPUs/disks
Solution: With OpenStack, you can securely host CPU/DISK for paying customers through the use of virtual instances / environments
Problem: Admin: Inflexibility in building and maintaining similar environments on multiple physical platforms
Solution: With OpenStack, you can easily share images which contain predefined application binaries and/or environments
Problem: User: Finance department needs to run the payroll for tomorrow, and doesn’t have the resources to do so in time!
Solution: With OpenStack this user would more easily access a larger or even external infrastructure with the required CPU / Storage environment
Problem: Manager: Virtual server and HPC environments running independently.
Solution: With OpenStack, you merge these into one single efficient environment
Enhancing OpenStack for HPC
• Full HPC Stack
- Monitoring
- Checking
- Module and Library Environment
- Scheduler (SLURM, PBS, SGE etc)
• Performance Optimisations (containers)
• Typical HPC services and integration
- InfiniBand
- Parallel filesystems
- GPUs/Accelerators
Trinity: Linking Cloud and HPC
Trinity is a set of software tools for building and managing virtual HPC or OpenStack environments in a platform as a service (PaaS) solution, customized for HPC performance.
• Adds ease of management (Trinity dashboard) to HPC
• Scalable to tens of thousands of nodes
• Full hardware support (IPMI, InfiniBand, PXE)
• Provides full HPC stacks (schedulers, MPI, libraries)
• No performance loss (virtualization based on Docker)
• Allows customers to host their own private or public IaaS cloud (for general IT)
• Load balancing (HPC) partitions
• Environment customization
Features Bright CM IBM Platform Trinity
Node provisioning
Health check and monitoring
GUI and command line interfaces
SLURM, SGE & PBS support
Parallel shell
Modules environment
Compilers, debuggers & profilers
MPI + Scientific libraries
All of the standard HPC cluster manager requirements included
Containerized HPC building blocks
Cloud Computing Ready
A traditional cluster
Login node(s) Worker node(s) Storage node(s)
A Trinity managed cluster
Login node(s) Worker node(s) Storage node(s)
Virtual Cluster A: runs the HPC stack for department A
Virtual Cluster B: runs the HPC stack for department B
Virtual Cluster C: runs VDI
Virtual Cluster D: runs general IT infrastructure using OpenStack
Trinity Dashboard
(Single management interface)
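The partitioning idea on this slide can be illustrated with a minimal Python sketch. It is not Trinity code: the `VirtualCluster` class, the `partition` helper, the node names and the node counts are all hypothetical, chosen only to show one pool of worker nodes being split into the four virtual clusters named above, with each node belonging to exactly one of them.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualCluster:
    """A named slice of the physical worker nodes (hypothetical model)."""
    name: str
    purpose: str
    nodes: list = field(default_factory=list)


def partition(worker_nodes, requests):
    """Split worker_nodes into disjoint virtual clusters.

    requests is a list of (name, purpose, node_count) tuples.
    Nodes are handed out in order, so no node is assigned twice.
    """
    needed = sum(count for _, _, count in requests)
    if needed > len(worker_nodes):
        raise ValueError(f"requested {needed} nodes, only {len(worker_nodes)} available")
    clusters = []
    cursor = 0
    for name, purpose, count in requests:
        clusters.append(VirtualCluster(name, purpose, worker_nodes[cursor:cursor + count]))
        cursor += count
    return clusters


# Hypothetical 16-node machine, split as on the slide (counts are made up).
workers = [f"node{i:03d}" for i in range(1, 17)]
clusters = partition(workers, [
    ("A", "HPC stack for department A", 6),
    ("B", "HPC stack for department B", 4),
    ("C", "VDI", 2),
    ("D", "General IT infrastructure using OpenStack", 4),
])

for vc in clusters:
    print(f"Virtual Cluster {vc.name}: {len(vc.nodes)} nodes, {vc.purpose}")
```

A single management interface would then operate on the list of clusters rather than on individual nodes.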
In summary:
• An HPC Cloud is a powerful tool in the arsenal of any industry, small or big.
• It gives both power and flexibility at many stages of product design and testing
• It can reduce cost by consolidating resources used for different purposes
• New software technologies are alleviating performance concerns
• Many vendor choices to suit every need
• Trinity is a great choice for a private cloud, providing a full cluster manager, HPC stack and cloud management for HPC, IT, Data.
HPC, Cloud and BigData are coming together
Hardware Resources (nodes, network, disks etc)
Resource Management
Compute Virtualization
HPC, CRM, Database, VDI, Email
Monitoring
Authentication
Deployment
Object storage
Dashboard
HPC, Cloud and BigData have a lot of overlap:
• Centralised resources
• Same complex management, same complex environment
• Similar high performance storage and powerful server requirements
• Almost same physical networking
• Same controller <-> worker-node relationship
Neat tricks in Virtual Machines
Reliable checkpoint restart of jobs
• Towards a 100% scheduling efficiency
• Fast track high priority users
• Move jobs within the cluster & outside the cluster
The price: losing performance (5% -> 1%)
[Figure: job schedule on Nodes A to D along a time axis t, with the current time marked]
Why HPC in the Cloud?
What is OpenStack?
A set of software tools for building and managing cloud computing platforms for public and private clouds.
OpenStack is primarily deployed as an infrastructure as a service (IaaS) solution.
OpenStack began in 2010 as a joint project of Rackspace Hosting and NASA.
Currently, it is managed by the OpenStack Foundation. More than 200 companies have joined this project, including Dell, Intel and Oracle.
Challenges addressed with OpenStack
Problem: Manager: Resources wasted when a Virtual Desktop Infrastructure (VDI) is idle at night
Solution: With OpenStack you can easily switch from the Virtual Desktop Infrastructure (VDI) to HPC
Problem: Admin: Inability to collaborate with external parties due to lack of a security infrastructure for hosting CPUs/disks
Solution: With OpenStack, you can securely host CPU/DISK for paying customers through the use of virtual instances / environments
Problem: User Group: Inflexibility in building and maintaining similar environments on multiple physical platforms
Solution: With OpenStack, you can easily share images which contain predefined application binaries and/or environments
Problem: User: has a deadline for a paper/conference tomorrow, requires vast amounts of immediate CPU cycles
Solution: With OpenStack this user would more easily access a larger or even external infrastructure with the required CPU / Storage environment
Problem: Manager: VMware and HPC environment running independently from each other
Solution: With OpenStack, you merge these into one single efficient environment
Using the cloud to address growing challenges in HPC
IT Manager
▪ Rising infrastructure & personnel costs
▪ Growing complexity / fragmented infrastructure
▪ Increasingly complicated personnel needs
System Administrator
▪ Growing complexity of hardware & software environment
▪ Managing tenants with different needs/workflows
▪ Managing secure access to resources
▪ Dealing with hardware changes
User
▪ Cluster environment different from workstation
▪ Software stack needed for workflow not pre-installed
▪ Non-availability of resource when needed most
Why HPC in the Cloud?
What can the Cloud bring to HPC?
HPC and Cloud also have significant differences
Cloud:
● Split bigger computing units into smaller
● Timesharing execution model
● Elastic
● Provides commodity (virtual) hardware
● Increases utilisation to 85%
HPC:
● Merge smaller computing units into a single whole
● Batch execution model
● Backfilling
● Needs specialised hardware (GPU, IB)
● Utilisation already above 85%
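Backfilling is one reason batch-scheduled HPC systems reach high utilisation: while the job at the head of the queue waits for enough nodes, smaller jobs may start as long as they do not delay it. The Python sketch below models the common "EASY" variant in a deliberately simplified form. It is an illustration only, not the algorithm of any particular scheduler; the `Job` class, the `easy_backfill` function and the example numbers are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int    # number of nodes requested
    runtime: int  # requested wall time, in hours


def easy_backfill(total_nodes, running, queue, now=0):
    """Return the names of queued jobs that may start at time `now`.

    running: list of (Job, end_time) pairs for jobs already executing.
    queue:   waiting jobs in submission (FIFO) order.
    """
    running = list(running)
    pending = list(queue)
    free = total_nodes - sum(job.nodes for job, _ in running)
    started = []

    # 1. Plain FIFO: start jobs from the head of the queue while they fit.
    while pending and pending[0].nodes <= free:
        job = pending.pop(0)
        free -= job.nodes
        running.append((job, now + job.runtime))
        started.append(job.name)
    if not pending:
        return started

    # 2. Reserve a start time ("shadow time") for the blocked head job.
    head = pending[0]
    if head.nodes > total_nodes:
        raise ValueError(f"{head.name} needs more nodes than the cluster has")
    available = free
    shadow = now
    for job, end in sorted(running, key=lambda pair: pair[1]):
        available += job.nodes
        if available >= head.nodes:
            shadow = end
            break
    extra = available - head.nodes  # nodes still spare once the head job starts

    # 3. Backfill: a later job may start now if it fits in the free nodes and
    #    either ends before the shadow time or only uses spare nodes.
    for job in pending[1:]:
        if job.nodes > free:
            continue
        if now + job.runtime <= shadow:
            free -= job.nodes
            started.append(job.name)
        elif job.nodes <= extra:
            free -= job.nodes
            extra -= job.nodes
            started.append(job.name)
    return started


# Example: 8 nodes, 6 busy until t=4. "big" must wait; "small" fits in the gap.
running_jobs = [(Job("R1", 6, 4), 4)]
waiting = [Job("big", 8, 10), Job("long", 2, 8), Job("small", 2, 3)]
print(easy_backfill(8, running_jobs, waiting))
```

In the example, "long" is skipped because it would still hold two nodes at t=4 and so delay "big", while "small" finishes at t=3 and is allowed to run in the otherwise idle nodes.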
ClusterVision Recent References
Dolphin Geophysical
▪ Lasting Partnership
▪ Onshore / offshore
▪ HPC solutions & extended consulting services
▪ Bright, BeeGFS, Dell / Asus hardware
Volvo IT
▪ HPC services partner for Dell
▪ Main installs
▪ Lyon (France)
▪ Gothenburg (Sweden)
Ecole Polytechnique Fédérale de Lausanne
▪ Framework contract 512+ compute nodes
▪ Intel Ivy Bridge / Haswell (Truescale & Servers)
▪ Close collaboration on application fine tuning
▪ XCat2
National Supercomputer Center (Sweden)
▪ 640 compute nodes
▪ Asus 4 nodes in 2U systems
▪ First Haswell reference: 2640V3 (8 cores / 2.6GHz)
What is RSA (Remote System Administration) and why does it fit so nicely?
What?
ClusterVision end-to-end HPC management
Why?
- Avoid single point of failure
- Empower your highly qualified admins
- Use the power of scaling: lower your cost
- Reliability of service delivery
How?
- Central monitoring system
- Know instantly when something is wrong, and react to it
- Software updates
- Remote and onsite repair
- Notification and explanation for actions
- Management reporting
OpenStack Trinity
Based on open source components with custom OpenStack plugins
● xCAT: Deployment
● Docker: Virtualization and image management
● SLURM: Scheduling
● OpenMPI: Communication
● RSA/Nagios: Monitoring and Health-checks
Cookbook style recipes and easy customizations
Trinity
Why xCAT?
1. xCAT is stable, full featured, well tested and open source
2. xCAT is also usable without OpenStack
3. Supports node discovery, image management, stateless nodes, IPMI abstraction, PXE boot
4. OpenStack Ironic is still in beta
5. Bright is expensive
6. The main xCAT weakness (lousy UI) will be mitigated by standardization and adding a custom OpenStack dashboard to xCAT
Trinity
Why Docker?
Docker is an operating system level virtualization framework.
1. Used extensively by Google and other industry giants
2. Lightweight
3. Much faster than full machine virtualization
4. Good support for image management and versioning
5. Pluggable virtualization drivers (LXC, OpenVZ)
6. Pluggable storage virtualization (AUFS, devicemapper, VFS)
Current status of Trinity
Phase 1 of the project is completed
We support the following core features
1. Cluster partitioning (virtual clusters)
2. Multiple OS and application versions between clusters
3. Isolated (sandboxed) applications between clusters
4. Setup OpenStack inside Trinity
Supporting this “in the field” will require manual customizations and improvisation. Overall dashboard is absent.
POC Q3 2014: Bristol & Cambridge University.
A new approach: Container-based virtualization
Linux containers
LXC is an operating system level virtualization method for running multiple isolated Linux systems (containers) on a single control host.
- Lightweight
- Fast provisioning
- Workload isolation
- Near bare metal performance
Docker is an open-source project that automates the deployment of applications inside containers, by providing an additional layer of abstraction and automation.
Adopting cloud computing is easier
• Lack of infrastructure
• Cost of equipment
• Cost of operation
• Lack of expertise
• Lack of experience
Current status of HARP
Phase 2 (Q3 2014):
1. Allow remote management (RSA inside OpenStack dashboard)
2. Allow self-service partitioning by customers (use case 5)
Phase 3 (Q4 2014):
1. Allow automated elasticity (meta-scheduling, repartition resources based on a calendar; use case 2)
2. Allow sharing of resources between HPC clusters
3. Allow self-service of jobs by customers of our customers (SaaS model)
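Calendar-based repartitioning, as listed under Phase 3, could look roughly like the Python sketch below. It is a hypothetical illustration, not the planned implementation: the `CALENDAR` table, the office-hours window, the percentage shares and the `plan_for` function are all assumptions. The idea shown is simply that nodes serving VDI during the day are handed to HPC at night.

```python
from datetime import time

# Hypothetical calendar: (window start, window end, {partition: percent of nodes}).
# The second window wraps past midnight.
CALENDAR = [
    (time(8, 0), time(18, 0), {"vdi": 50, "hpc": 50}),
    (time(18, 0), time(8, 0), {"vdi": 0, "hpc": 100}),
]


def in_window(now, start, end):
    """True if `now` falls in [start, end), allowing windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def plan_for(now, total_nodes, calendar=CALENDAR):
    """Return {partition: node_count} for the calendar entry covering `now`."""
    for start, end, shares in calendar:
        if in_window(now, start, end):
            plan = {name: total_nodes * pct // 100 for name, pct in shares.items()}
            # Integer division may leave a few nodes over; give them to the
            # largest partition so that every node is assigned.
            leftover = total_nodes - sum(plan.values())
            plan[max(plan, key=plan.get)] += leftover
            return plan
    raise LookupError("no calendar entry covers this time")


print(plan_for(time(10, 0), 16))  # office hours
print(plan_for(time(23, 0), 16))  # night
```

A meta-scheduler would call something like `plan_for` periodically and drain or reassign nodes until the running partitions match the plan.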