Cycle Computing Record-breaking Petascale HPC Run

Record-breaking Petascale CycleCloud HPC Production Run: 156,000-core Cluster (1.21 PetaFLOPS) Accelerates Schrödinger Materials Science and Green Energy
November 2013, Cycle Computing

Upload: insidehpc

Posted on 08-May-2015

Category: Technology



DESCRIPTION

In this slidecast, Jason Stowe from Cycle Computing describes the company's recent record-breaking Petascale CycleCloud HPC production run. "For this big workload, a 156,314-core CycleCloud behemoth spanning 8 AWS regions and totaling 1.21 petaFLOPS (Rpeak, not Rmax) of aggregate compute power simulated 205,000 materials, crunching 264 compute-years of work in only 18 hours. Thanks to Cycle's software and Amazon's Spot Instances, a supercomputing environment worth $68M if purchased outright ran 2.3 million hours of materials science simulation, approximately 264 compute-years, in only 18 hours, at a cost of just $33,000, or $0.16 per molecule." Learn more: http://blog.cyclecomputing.com/2013/11/back-to-the-future-121-petaflopsrpeak-156000-core-cyclecloud-hpc-runs-264-years-of-materials-science.html Watch the video presentation: http://wp.me/p3RLHQ-aO9

TRANSCRIPT

Page 1: Cycle Computing Record-breaking Petascale HPC Run

Record-breaking Petascale CycleCloud HPC Production Run: 156,000-core Cluster (1.21 PetaFLOPS) Accelerates Schrödinger Materials Science and Green Energy

November 2013 Cycle Computing

Page 2: Cycle Computing Record-breaking Petascale HPC Run

Cycle Computing believes utility high performance computing accelerates invention.

Page 3: Cycle Computing Record-breaking Petascale HPC Run

Records broken, Science done

On November 3rd, we ran a “MegaRun” cluster that had:
• 156,314 cores and 1.21 PetaFLOPS of theoretical peak compute power
• Ran 2.3 million compute-hours, totaling 264 years of computing, in 18 hours
• Executed world-wide, across all 8 public AWS Regions (5 continents)
• Would have cost $68 million to purchase; run on CycleCloud with Spot Instances for just $33K

THE SCIENCE
• Finding organic photovoltaic compounds that are more efficient and easier to manufacture, to help reduce the US’s reliance on fossil fuels
• Designing, synthesizing, and experimenting with a new material can take a year of a scientist’s time and require hundreds of thousands of dollars in equipment, chemicals, etc. With Schrödinger Materials Science’s tools, on Cycle and AWS Spot Instances, it cost $0.16 per molecule
• The run analyzed 205,000 compounds in total
• This is exactly the kind of science outlined in the White House’s Materials Genome Initiative
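
The headline figures above are internally consistent; here is a quick back-of-the-envelope check (a minimal Python sketch using only the numbers quoted on this slide, with 8,766 hours per year):

```python
# Back-of-the-envelope check of the figures quoted on this slide.
core_hours = 2_312_959    # total compute delivered
cores      = 156_314      # peak cluster size
molecules  = 205_000      # compounds screened
cost_usd   = 33_000       # total AWS Spot spend

print(core_hours / 8_766)       # ~263.9 -> "264 years of computing"
print(core_hours / cores)       # ~14.8 ideal hours; actual wall clock was ~18 h
print(cost_usd / molecules)     # ~$0.161 -> "$0.16 per molecule"
print(core_hours / molecules)   # ~11.3 core-hours per compound
```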

Page 4: Cycle Computing Record-breaking Petascale HPC Run

Challenge of Materials Science

Traditional Materials Design
• Design, synthesis, and analysis are challenging for an arbitrary material
• Low hit rate for viable materials
• Total molecule cost:
  • Time: a year for a grad student
  • $100,000s in equipment, chemicals, etc.

With Schrödinger Computational Chemistry & Cycle
• Schrödinger Materials Science tools simulate accurate properties in hours
• Simulation guides the researcher’s intuition
• Focus physical analysis on promising materials
• Total cost:
  • Time to enumerate molecules: minutes to hours
  • $0.16 per molecule in infrastructure using AWS Spot Instances

Page 5: Cycle Computing Record-breaking Petascale HPC Run

Designing Solar Materials

The challenge is efficiency
• Need to efficiently turn photons from the sun into electricity

The number of possible materials is limitless
• Need to separate the right compounds from the useless ones
• If the 20th century was the century of silicon, the 21st will be all organic

How do we find the right material without spending the entire 21st century looking for it?

Page 6: Cycle Computing Record-breaking Petascale HPC Run

The Challenge for the Scientist

Dr. Mark Thompson, Professor of Chemistry, USC:
“Solar energy has the potential to replace some of our dependence on fossil fuels, but only if the solar panels can be made very inexpensively and have reasonable to high efficiencies. Organic solar cells have this potential.”

Challenge: run a virtual screen of 205,000 molecules in a continuing analysis of possible materials for organic solar cells

Page 7: Cycle Computing Record-breaking Petascale HPC Run

The right needle in the right haystack

Before: trade-off between compute time and sampling
Now: better analysis of more materials → better results

[Diagram: coarse screens of small samples give way to higher-quality analysis of more materials]

Page 8: Cycle Computing Record-breaking Petascale HPC Run

Solution: Utility HPC

On-demand compute power is transformative for users, but hard to put into production.

Big opportunity to help manufacturing, life science, energy, and financial companies:
• Rise of Big Data, compute, and Monte Carlo problems that power modern business and science
• Applications, like Schrödinger Materials Science tools, offer a compelling alternative to physically testing products
• Amazon Web Services makes infrastructure easily accessible
• AWS Spot Instances decrease the cost of compute (see the sketch after this list)
• Science and engineering face faster time-to-market and increased agility requirements
• Capital efficiency (OpEx replacing CapEx) is an organizational goal
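
As an illustration of the Spot Instances point above, here is a minimal sketch of requesting Spot capacity with the boto3 SDK; the actual run was provisioned by CycleCloud rather than by direct calls like this, and the AMI ID, instance type, count, and bid price are placeholders:

```python
import boto3

# Minimal Spot-request sketch; assumes AWS credentials are already configured.
# The AMI ID, instance type, count, and bid price are illustrative placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.request_spot_instances(
    SpotPrice="0.50",              # maximum price per instance-hour we will pay
    InstanceCount=100,             # ask for a block of capacity
    Type="one-time",               # release the capacity when the work is done
    LaunchSpecification={
        "ImageId": "ami-00000000",       # placeholder image with the application installed
        "InstanceType": "c3.8xlarge",    # placeholder compute-optimized instance type
    },
)

for req in response["SpotInstanceRequests"]:
    print(req["SpotInstanceRequestId"], req["State"])
```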

Page 9: Cycle Computing Record-breaking Petascale HPC Run

Why isn’t everyone doing this?

Because it is really complicated, and really hard to orchestrate technical applications, securely, at scale.

We’re the first and only ones doing this, including the well-publicized 2,000-, 4,000-, 10,000-, 30,000-, and 50,000-core clusters run in 2010-2013.

Clients include: Johnson & Johnson, Schrödinger, Pfizer, Novartis, Genentech, HGST, Pacific Life Insurance, Hartford Insurance Group …

Page 10: Cycle Computing Record-breaking Petascale HPC Run

Cycle Computing Makes Utility HPC a Reality

Easily orchestrates complex workloads and data access to local and cloud HPC:
• Scales from 100 to 1,000,000 cores
• Handles errors and reliability
• Schedules data movement
• Secures, encrypts, and audits
• Provides reporting and chargeback
• Automates Spot bidding (illustrated in the sketch below)
• Supports enterprise operations
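
The "automates Spot bidding" capability can be pictured with a small helper like the one below. This is an illustrative sketch only, not CycleCloud's actual bidding logic; it assumes boto3 with configured credentials and simply bids a margin above recent Spot prices:

```python
import boto3
from datetime import datetime, timedelta, timezone

# Illustrative spot-bidding helper -- not CycleCloud's actual algorithm.
# Looks at recent Spot price history and bids a safety margin above the
# highest price seen in the lookback window.
def suggest_bid(instance_type="c3.8xlarge", region="us-east-1",
                lookback_hours=6, margin=1.2):
    ec2 = boto3.client("ec2", region_name=region)
    history = ec2.describe_spot_price_history(
        InstanceTypes=[instance_type],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc) - timedelta(hours=lookback_hours),
    )
    prices = [float(p["SpotPrice"]) for p in history["SpotPriceHistory"]]
    return max(prices) * margin if prices else None

print(suggest_bid())   # e.g. 0.32 -> bid $0.32/hour for this instance type
```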

Page 11: Cycle Computing Record-breaking Petascale HPC Run

Challenge: 205,000 compounds, totaling 2,312,959 core-hours, or 264 core-years

Page 12: Cycle Computing Record-breaking Petascale HPC Run

Solution: “MegaRun” Cluster

• Schrödinger Materials Science tools: set of automated workflows that enable organic semiconductor materials to be simulated accurately
• CycleCloud: HPC clusters at small to massive scale; application deployment, job/data-aware routing, error handling
• Jupiter: Cycle’s massively scalable, resilient cloud scheduler
• Chef: automated configuration at scale
• Multi-Region AWS Spot Instances: massive server resource capacity across all public regions of AWS

New record: MegaRun is the largest dedicated cloud HPC cluster to date on a public cloud.

Page 13: Cycle Computing Record-breaking Petascale HPC Run

16,788 Spot Instances, 156,314 cores!

205,000 molecules, 264 years of computing

Page 14: Cycle Computing Record-breaking Petascale HPC Run

156,314 cores = 1.21 PetaFLOPS (Rpeak)

Equivalent to #29 on the June 2013 Top500 list

205,000 molecules, 264 years of computing
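
For context, the implied theoretical peak per core follows directly from the two numbers quoted on this slide (the slide does not list the instance mix, so only the division is shown):

```python
# Implied theoretical peak per core from the aggregate Rpeak quoted above.
rpeak_flops = 1.21e15              # 1.21 PetaFLOPS aggregate (Rpeak)
cores = 156_314
print(rpeak_flops / cores / 1e9)   # ~7.7 GFLOPS of theoretical peak per core
```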

Page 15: Cycle Computing Record-breaking Petascale HPC Run

Done in 18 hours. Access to a $68M system for $33K.

205,000 molecules, 264 years of computing

Page 16: Cycle Computing Record-breaking Petascale HPC Run

8-Region Deployment

Regions: US-East, US-West-1, US-West-2, EU, Brazil, Singapore, Tokyo, Australia

Page 17: Cycle Computing Record-breaking Petascale HPC Run

Jupiter Scheduler
• Makes large cloud regions work together
• Spans many regions/datacenters to resiliently route work with minimal scheduling overhead
  • Batch/MPI schedulers get 10K cores doing 100K jobs
  • Jupiter aims to get millions of cores doing tens of millions of tasks
  • Currently hundreds of thousands of cores doing 1M tasks on large runs
• Can survive machine, availability zone, and region failure while still executing the full workload (see the toy sketch below)
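
The resiliency idea can be pictured with a toy work queue. This is an illustrative sketch only and is not Jupiter's implementation: tasks are leased to a region, and any lease that expires (a region failing or going silent) is put back at the front of the queue for another region to pick up.

```python
import time
from collections import deque

# Toy illustration of region-resilient task scheduling -- not Jupiter's code.
# Tasks are leased to a region; if the lease expires (region failure or
# silence), the task is requeued so another region can pick it up.
class ResilientQueue:
    def __init__(self, tasks, lease_seconds=60):
        self.pending = deque(tasks)
        self.in_flight = {}            # task -> (region, lease expiry time)
        self.lease_seconds = lease_seconds

    def lease(self, region):
        self._requeue_expired()
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_flight[task] = (region, time.time() + self.lease_seconds)
        return task

    def complete(self, task):
        self.in_flight.pop(task, None)

    def _requeue_expired(self):
        now = time.time()
        for task, (region, expiry) in list(self.in_flight.items()):
            if now > expiry:                    # region failed or went silent
                del self.in_flight[task]
                self.pending.appendleft(task)   # retry failed work first

# Usage: molecules are the tasks; regional worker pools call lease()/complete().
q = ResilientQueue([f"molecule-{i}" for i in range(5)], lease_seconds=1)
print(q.lease("us-east-1"))      # molecule-0 leased to us-east-1
time.sleep(1.5)                  # us-east-1 "fails"; its lease expires
print(q.lease("eu-west-1"))      # molecule-0 is re-leased to eu-west-1
```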

Page 18: Cycle Computing Record-breaking Petascale HPC Run

Resilient Workload Scheduling

Page 19: Cycle Computing Record-breaking Petascale HPC Run

MegaRun – Facts and Figures

• Compute hours of work: 2,312,959 hours
• Compute days of work: 96,373 days
• Compute years of work: 264 years
• Molecule count: 205,000 materials
• Run time: < 18 hours
• Max scale (cores): 156,314 cores across 8 regions
• Max scale (instances): 16,788 instances

Page 20: Cycle Computing Record-breaking Petascale HPC Run

Accelerated Time to Result

Cluster Scale | Cost | Run-time
156,000-core CycleCloud | $33,000 | ~18 hours
300-core internal cluster (stopping all other work) | $132,000 | ~10.5 months
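
The internal-cluster row is consistent with the workload size from slide 19 (a quick check, assuming the 300-core cluster runs at full utilization):

```python
# Rough check of the 300-core internal-cluster row against the total workload.
core_hours = 2_312_959                    # total work, from the facts-and-figures slide
internal_cores = 300
wall_hours = core_hours / internal_cores  # ~7,710 hours at 100% utilization
print(wall_hours / 24 / 30.4)             # ~10.6 months, consistent with "~10.5 months"
```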

Page 21: Cycle Computing Record-breaking Petascale HPC Run

CycleCloud – 156,000 cores

Page 22: Cycle Computing Record-breaking Petascale HPC Run

CycleCloud – 16,788 instances

Page 23: Cycle Computing Record-breaking Petascale HPC Run

8 Public Regions across AWS

Page 24: Cycle Computing Record-breaking Petascale HPC Run

Ramping up to full capacity

Page 25: Cycle Computing Record-breaking Petascale HPC Run

Solution: 205,000 compounds, 264 core-years, on a 156K-core utility HPC cluster, in 18 hours, for $0.16/molecule, using Schrödinger Materials Science tools, Cycle & AWS Spot Instances