Utility HPC: Right Systems, Right Scale, Right Science
TRANSCRIPT
Jason Stowe, CEO (@jasonastowe, @cyclecomputing)
I’m here to recruit you, for a cause
We believe utility access to compute power
makes impossible science, possible.
Dynamic, utility access to compute power
is as important as uptime
(that’s why coded infrastructure is critical)
Skeptical? [Image credit: Flickr, "Tourist on Earth"]
In prior years (and today?), researchers and engineers waited for computing:
for the horsepower,
for a place to put it,
for it to be configured.
[Image credit: Flickr, "vaxomatic"]
Yesterday, high-performance engineering and science clusters were…
too small when you need them most,
too large every other time.
The Innovation Bottleneck: researchers, scientists, and engineers are forced to size their questions to the infrastructure they have. Multi-tenant systems create float capacity that is critical to innovation.
From centralized to decentralized, collaborative to independent, and right back again!

The 60s:     Mainframes         ~100% sharing   ~0 Mbit
The 70s:     VAX                 ~60% sharing   ~1 Mbit
The 80s:     The PC               0% sharing    ~10 Mbit
The 90s:     Beowulf Clusters    ~40% sharing   ~1,000 Mbit
The 00s-10s: Central Clouds     ??? % sharing   ~10,000 Mbit

Bigger and better, but further and further away from the scientist's lab.
The Scientific Method: Ask a Question → Hypothesize → Predict → Experiment/Test → Analyze → Final Results

The Test and Analyze stages require the most time, compute, and data.

Any improvement to this cycle yields multiplicative benefits.
A Challenge Across Industries:
- 3 of the Top 5 Insurance companies
- 6 of the Top 8 Pharmaceutical companies
- 2 of the Top 3 Banks
- 2 of the Top 3 Genomics Sequencing companies
- 1 of the Top 2 FPGA companies
Utility HPC in the News: WSJ, NYTimes, Wired, Bio-IT World, BusinessWeek
To accelerate science, we need automation
[Architecture diagram] Utility HPC Cluster:
- Management software driving CC1/CCG instances, EBS, S3, and a shared filesystem
- Scales to 50,000+ cores
- Data scheduling and workload portability
- Data- and application-aware movement
- Traditional scheduler, scaled massively based upon workload
- Secure HPC cluster, with HPC reporting & audit for the user
ChefConf 2012: 50,000-core CycleCloud using Chef and AWS
ChefConf 2013: 10,600-instance cluster against a cancer target
- Created in 2 hours
- Configured with Search and Data bags
- One Chef 11 server
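The pattern the slide describes is one Chef server, node discovery via Search, and cluster-wide settings in Data bags. A minimal, hypothetical recipe fragment in that style; the `clusters` data bag, the `scheduler_master` role, and the HTCondor config path are illustrative assumptions, not Cycle's actual cookbooks (this runs inside a Chef client run, not as standalone Ruby):

```ruby
# Hypothetical Chef recipe fragment.
# Pull cluster-wide settings from a data bag on the Chef 11 server.
cluster = data_bag_item('clusters', 'hpc_cluster')   # illustrative data bag

# Discover the scheduler master with Chef search instead of hardcoding IPs,
# so new nodes can join as the cluster scales up and down.
master = search(:node, 'role:scheduler_master').first

template '/etc/condor/condor_config.local' do
  source 'condor_config.erb'
  variables(
    master_ip: master && master['ipaddress'],
    slots:     cluster['slots_per_node']
  )
  notifies :restart, 'service[condor]'
end

service 'condor' do
  action [:enable, :start]
end
```

Because node data lives on the one Chef server, the same recipe converges correctly whether the cluster has 10 nodes or 10,600.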
We make software tools to easily orchestrate complex workloads and data access across Utility HPC
Today is a survey of use cases…
- Life Science: 10,600-instance molecular modeling
- Manufacturing: 600-core nuclear power plant safety simulation
- Genomics: RNA analysis for stem cells
Dynamic, utility access to compute power
is as important as uptime
Why?
#1: "Better" Science = answering the question we want to ask, not one constrained to what fits on local compute power

#2: "Faster" Science = running this "better" science, which would have taken months or years, in hours or days
Survey of Use Cases: Drug Design, CAD/CAM, Genomics, …
Life Sciences & Compute?
[Chart ("fake charts, with fake data"): compute vs. data/bandwidth for Genomics, Molecular Modeling, CAD/CAM, All Sample Analysis, Proteomics, Biomarker/Image Analysis, and Sensor Data Import]
Why is this important?
(W.H.O./Globocan 2008)
~2 million Type 2 diabetics, ~200k Type 1
Every day is crucial and costly
Before: trading off compute time vs. accuracy.
Now: accurate analysis, fewer false negatives, faster.

Process for Drug Design: Initial Coarse Screen → Higher Quality Analysis → Best Quality
Big 10 Pharma: built a 10,600-instance cluster ($44M) in 2 hours, then ran 40 years of science in 11 hours for $4,372.
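A back-of-envelope check of those numbers, assuming "40 years of science" means roughly 40 core-years of serial compute (that reading is my assumption, not stated in the deck):

```ruby
# Sanity-check the slide's claim: 40 "years of science" in 11 wall-clock
# hours for $4,372, assuming 40 core-years of serial work (an assumption).
core_hours = 40 * 365 * 24            # ~350,400 core-hours of work
wall_hours = 11
cost_usd   = 4372.0

avg_cores_busy     = core_hours / wall_hours   # cores kept busy, on average
cost_per_core_hour = cost_usd / core_hours

puts avg_cores_busy                                       # ~31,854 cores
puts format('~$%.4f per core-hour', cost_per_core_hour)   # ~$0.0125 per core-hour
```

Averaged over 10,600 instances that is about 3 busy cores per instance, which is plausible for mixed instance types, though the deck does not give the exact core count.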
[Screenshots: server count for the most recent utility supercomputer, shown in the AWS Console view and in Cycle's view of this cluster, all run by one Chef 11 server]
Earlier Drug Design: Novartis, discussed at Bio-IT World 2012
- Needed: a push-button utility supercomputer for molecular modeling
- Created: a 30,000-core run across US/EU cloud regions (AWS)
- 10 years of compute in 8 hours for $10,000
- Found 3 compounds now in the wet lab as a result
- Capacity is no longer an issue
- Hardware = software
- Testing matters (error handling, unit testing, etc.); e.g., Cycle spent ~$1M on AWS over 5 years
- The only way to do this is to automate

Lessons learned:
Servers are not house plants. Servers are wheat.
Survey of Use Cases: Drug Design, CAD/CAM, Genomics, …
Nuclear Power Plant simulation
We don't know what they're running, but it has "Safety" in it.
600-core CAD/CAM: a wait of 3 quarters of a year became 3 weeks.
[Diagram: an engineer behind the corporate firewall schedules site data (TBs of shared filesystem) onto a secure ~600-CPU HPC cluster in the external cloud: 3 weeks instead of 3 quarters]
Survey of Use Cases: Drug Design, CAD/CAM, Genomics, …
Gene Expression Analysis: Morgridge Institute for Research
- Ran a holistic comparison of all 78 terabytes of stem cell RNA samples to build a unique gene expression database
- Goal: make it easier to replicate disease in petri dishes with induced stem cells
- 1 million compute hours: 115 years of computing in 1 week for $19,555
- Cluster details: 5,000 to 10,000 cores for a week
- Very long individual analyses were checkpointed, which made spot instance usage possible
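Checkpointing is what makes spot instances usable here: a spot node can be terminated at any time, so a long analysis must persist its progress and resume where it left off. A generic sketch of that loop; the file name, step count, and checkpoint interval are illustrative, not Morgridge's actual pipeline:

```ruby
require 'json'

CHECKPOINT = 'analysis.ckpt' # illustrative checkpoint file name

# Resume from the last checkpoint if a prior (possibly spot-terminated)
# run left one behind; otherwise start fresh.
state =
  if File.exist?(CHECKPOINT)
    JSON.parse(File.read(CHECKPOINT))
  else
    { 'step' => 0, 'sum' => 0 }
  end

TOTAL_STEPS = 1_000
while state['step'] < TOTAL_STEPS
  state['sum']  += state['step'] # stand-in for one unit of real analysis work
  state['step'] += 1

  # Persist progress atomically every 100 steps, so a spot termination
  # loses at most 100 steps of work instead of the whole run.
  next unless (state['step'] % 100).zero?
  File.write("#{CHECKPOINT}.tmp", JSON.generate(state))
  File.rename("#{CHECKPOINT}.tmp", CHECKPOINT)
end

puts state['sum'] # sum of 0..999
```

The write-to-temp-then-rename step matters: a node killed mid-write leaves the previous checkpoint intact rather than a truncated file.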
Survey of Use Cases: Drug Design, CAD/CAM, Genomics, …
Code can accelerate Science.
The Scientific Method on Utility HPC: Ask a Question → Hypothesize → Predict → Experiment/Test → Analyze → Final Results
Yielding "better", "faster" research for less money.
Dynamic, utility access to compute power
is as important as uptime
I’m here to recruit you, for a cause
Contribute to Chef. Make the community better.
And you will help Cycle make impossible science,
possible.
2013 BigScience Challenge
$10,000 of free computing for science benefiting humanity
2012 winner: the 115-year genomic analysis
Enter at: http://cyclecomputing.com/big-science-challenge/enter
Thank You! Questions?