NASA Goddard: Head in the Clouds


TRANSCRIPT

Page 1: NASA Goddard: Head in the Clouds

AWS Government, Education, and Nonprofit Symposium, Washington, DC | June 25-26, 2015

NASA Goddard: Head in the Clouds

Dan Duffy, NASA

Steve Orrin, Intel

Tim Carroll, Cycle Computing

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Page 2: NASA Goddard: Head in the Clouds


Fastest growing workloads

• Fraud Detection, Risk Modeling
• Drug Design, Genomics
• Modeling and Simulation
• Unstructured Data Analysis, Data Lakes

Page 3: NASA Goddard: Head in the Clouds


Most resource intensive

1 core → 8 cores → 8 servers → 10–10,000 servers

Page 4: NASA Goddard: Head in the Clouds


Great, so…what’s the problem?

Page 5: NASA Goddard: Head in the Clouds


The challenge of fixed capacity

[Chart: Capability vs. Time, comparing the system organization's fixed Internal Capacity with fluctuating User Demand]

Page 6: NASA Goddard: Head in the Clouds


Transform / life sciences

The problem in 2013:
• Cancer research needed 50,000 cores, not available in-house

The options they didn't choose:
• Buy infrastructure: spend $2M, wait 6 months
• Write software for 9–12 months for this one app

Solution:
• Created a 10,600-server cluster
• 39.5 years of computing in 8 hours
• Found 3 potential drug candidates!
• Total infrastructure bill: $4,372

Page 7: NASA Goddard: Head in the Clouds


Cycle powers cloud BigData and BigCompute

[Diagram: Data Workflow — Cloud Orchestration, Analytics, and Modeling spanning Internal Compute and Compute Burst]

Software required to drive analytics and simulation at scale:

• Easy access

• Highly automated

• On-demand

• Ask the right questions

Page 8: NASA Goddard: Head in the Clouds


Best way to try it… try it

[email protected]

Page 9: NASA Goddard: Head in the Clouds


Measure Woody Biomass on South Side of the Sahara at the 40–50 cm Scale Using AWS

Overview of the NASA Head in the Clouds Project presented at the Amazon Web Services Public Summit 2015

Daniel Duffy, [email protected], Twitter @dqduffy
High Performance Computing Lead at the NASA Center for Climate Simulation (NCCS) – http://www.nccs.nasa.gov and @NASA_NCCS
Goddard Space Flight Center (GSFC) – http://www.nasa.gov/centers/goddard/home/

Page 10: NASA Goddard: Head in the Clouds


ESD Project Won Intel Head in the Clouds Challenge Award to Estimate Biomass in the South Sahara

Project Goal
• Use NGA data to estimate tree and bush biomass over the entire arid and semi-arid zone on the south side of the Sahara

Project Summary
• Estimate carbon stored in trees and bushes in the arid and semi-arid south Sahara
• Establish a carbon baseline for later research on expected CO2 uptake on the south side of the Sahara

Principal Investigators
• Dr. Compton J. Tucker, NASA Goddard Space Flight Center
• Dr. Paul Morin, University of Minnesota

[Image: NGA 40 cm imagery showing automated recognition of tree crowns and their shadows]

Page 11: NASA Goddard: Head in the Clouds


Partners and Resources

Intel
• Professional services and funding for AWS resources

Amazon Web Services (AWS)
• Compute and storage
• Support to set up the environment

Cycle Computing
• Cloud resource management software
• Services to install and configure the software

Climate Model Data Services (CDS – GSFC Code 600)
• NGA data support

NASA Center for Climate Simulation (NCCS – GSFC Code 606.2)
• System administration, application support, and data movement

NASA CIO
• General cloud consulting and coordination support

Page 12: NASA Goddard: Head in the Clouds


Existing Sub-Saharan Arid and Semi-Arid Sub-Meter Commercial Imagery

9,600 Strips (~80 TB) to Be Delivered to GSFC

~1,600 strips (~20 TB) at GSFC

Area Of Interest (AOI) for Sub-Saharan Arid and Semi-Arid Africa

Page 13: NASA Goddard: Head in the Clouds


The DigitalGlobe Constellation

The entire archive is licensed to the USG:
• GeoEye
• QuickBird
• IKONOS
• WorldView-1
• WorldView-2
• WorldView-3 (available Q1 2015)

Page 14: NASA Goddard: Head in the Clouds


Panchromatic and multispectral mapping at the 40- and 50-cm scale

Page 15: NASA Goddard: Head in the Clouds


Use Niger as the test case

NGA data over Niger
• Currently have about 16,000 total scenes covering Niger (the data is already orthorectified)
• For this test case, approximately 3,120 scenes need to be processed to generate the vegetation index
• Each scene is approximately 30,000 x 30,000 data points (pixels)
• Each scene will be broken up into 100 tiles (3,000 x 3,000 pixels each; see the sketch below)

Where is the data?
• Data currently resides within the NCCS and in AWS

Additional data
• If we are successful and have additional time and resources, other African areas can be studied.
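To make the tiling step concrete, here is a minimal Python sketch of how one scene decomposes into its 100 independent work units. It is illustrative only; the real pipeline operates on georeferenced NGA imagery rather than bare pixel arrays.

```python
# Cut a 30,000 x 30,000-pixel scene into a 10 x 10 grid of 3,000 x 3,000 tiles.
SCENE, TILE = 30_000, 3_000

tile_origins = [(row, col)
                for row in range(0, SCENE, TILE)
                for col in range(0, SCENE, TILE)]
assert len(tile_origins) == 100  # 100 tiles per scene, as on the slide

# Each origin defines an independent work unit, e.g.
# tile = scene_pixels[row:row + TILE, col:col + TILE]
```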

Page 16: NASA Goddard: Head in the Clouds


Processing requirements

Based on tests run in the NCCS private cloud, the following processing requirements were estimated:
• The tests were run on a single-core (Intel E5-2670, 2.5 GHz) virtual machine with 2 GB of memory
• Each of the 3,120 scenes is broken up into 100 tiles
• Each tile took 24 minutes
• Hence, one scene takes 24 × 100 = 2,400 minutes of total processor time (about 40 wall-clock hours)
• Tiles and scenes can be run in parallel
• Total tiles to process = 312,000
• Total compute hours = 124,800

Target completion time
• Completing in 1 month will take between 175 and 200 virtual machines running non-stop (see the sketch below)

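The sizing above is straightforward arithmetic; a short sketch that reproduces the slide's numbers:

```python
# Reproduce the sizing arithmetic on this slide.
SCENES = 3_120
TILES_PER_SCENE = 100
MINUTES_PER_TILE = 24  # measured on one E5-2670 core with 2 GB of memory

total_tiles = SCENES * TILES_PER_SCENE             # 312,000 tiles
total_hours = total_tiles * MINUTES_PER_TILE / 60  # 124,800 compute hours

# Single-core VMs needed to finish in one 30-day month of wall-clock time
vms_needed = total_hours / (30 * 24)               # ~173, hence the 175-200 target
print(total_tiles, int(total_hours), round(vms_needed))
```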

Page 17: NASA Goddard: Head in the Clouds


Input and output data

Input data
• Total input of about 8 TB for the 3,120 scenes
• Average of about 2.63 GB of data per scene
• Average of about 26.3 MB of data per tile

Intermediate data products
• Unsure how much intermediate data will be produced; this will impact the amount of temporary space required for each run

Output data products
• Total output data is estimated to be 25% of the input data
• Estimated total output is about 2 to 3 TB
• Output data will be transferred back to the NCCS

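For completeness, the per-scene and per-tile averages fall out of the totals above. The slide's figures are reproduced if the TB-to-GB step is binary (1 TB = 1,024 GB) while the GB-to-MB step is decimal, so treat them as approximations:

```python
# Data-volume arithmetic behind this slide's averages.
INPUT_TB = 8
SCENES = 3_120
TILES_PER_SCENE = 100

gb_per_scene = INPUT_TB * 1024 / SCENES              # ~2.63 GB per scene
mb_per_tile = gb_per_scene * 1000 / TILES_PER_SCENE  # ~26.3 MB per tile
output_tb = INPUT_TB * 0.25                          # ~2 TB returned to the NCCS

print(round(gb_per_scene, 2), round(mb_per_tile, 1), output_tb)
```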

Page 18: NASA Goddard: Head in the Clouds


Cluster configuration requirements

• Number of cores (how many cores are required on a single node for the application?): 1 per tile
• Amount of memory (how much RAM is required per node or per core?): 2 GB per tile
• Operating system (what OS does the application need?): Linux (CentOS or Debian)
• Libraries/tools/software (what additional libraries, tools, compilers, or commercial software must be installed?): None
• Parallelization (can the application run in parallel, and how: threaded, MPI, or multiple instances?): Inherently parallel processing of each scene and/or tile
• Cluster size (how many nodes are required?): 175–200 to complete in 1 month; more can be used
• Storage (how much space per run for input, intermediate, and output files?): Total input 8 TB (approx. 2.6 GB per scene); intermediate to be determined; total output back to NCCS 2 TB (approx. 25% of total input)
• Shared storage (must storage be shared across all nodes?): No
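For illustration only: the profile above (1 core and 2 GB per tile, no shared storage) maps naturally onto small single-vCPU instances. A hedged boto3 sketch of requesting such a fleet directly; the project itself provisioned through Cycle Computing's orchestration software rather than hand-rolled API calls, and the AMI ID here is a placeholder.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request the slide's target fleet of single-core, 2 GB workers.
response = ec2.run_instances(
    ImageId="ami-00000000",   # placeholder: a CentOS/Debian image with the app installed
    InstanceType="t2.small",  # 1 vCPU, 2 GB RAM -- matches the per-tile requirement
    MinCount=175,             # minimum for a one-month completion
    MaxCount=200,
)
print(len(response["Instances"]), "instances launching")
```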

Page 19: NASA Goddard: Head in the Clouds


Workflow

[Diagram: NGA data from sources at NASA and external to NASA (PGC, DigitalGlobe) flows into the NCCS Science Cloud (internal cloud) shared file system and, via Cycle Computing data transfer software, into Amazon S3 and AWS worker VMs, each with local data]

• NGA data is copied into the NCCS science cloud NGA data repository.
• Virtual machines in the internal cloud can read the data directly from the shared disk in the NASA internal cloud; no additional data movement is required.
• The Cycle Computing DataMan software will be used to transfer the data into Amazon S3.
• Data to be processed is staged into Amazon S3 and then moved to the local storage of the VMs for processing. Products could be stored in S3 for transfer to the NCCS at a later time (see the sketch below).
• A resource manager (batch queue) will be running in AWS. Scientists will interact and launch jobs through the Cycle Computing system directly in AWS.
• Virtual machines will be launched in AWS. After a job is completed, the results will be copied back to the NCCS.
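To make the S3 staging step concrete, a minimal boto3 sketch is shown below. The bucket name and key layout are hypothetical, and in the actual workflow the bulk transfer is handled by Cycle Computing's DataMan rather than per-file calls like these.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "nga-niger-tiles"  # hypothetical bucket name

# Stage an input tile into Amazon S3 (DataMan does this in bulk in the real workflow).
s3.upload_file("scene_0001_tile_042.tif", BUCKET, "input/scene_0001/tile_042.tif")

# On a worker VM: pull the tile to local storage, process it, push the product back.
s3.download_file(BUCKET, "input/scene_0001/tile_042.tif", "/tmp/tile_042.tif")
# ... vegetation-index processing on /tmp/tile_042.tif ...
s3.upload_file("/tmp/tile_042_ndvi.tif", BUCKET, "output/scene_0001/tile_042_ndvi.tif")
```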

Page 20: NASA Goddard: Head in the Clouds


Timeline


[Gantt chart, December through September, with rows for:]
• Bi-Weekly Tag-Ups
• Requirements/Scope
• Setup/Configuration
• Test Runs
• Transfer Data to S3
• Configure S3 Buckets
• Production Runs
• Analysis
• Final Report

Page 21: NASA Goddard: Head in the Clouds


Why use Cycle Computing and AWS?
• The bigger goal is to analyze the entire arid and semi-arid zone on the south side of the Sahara
  – About 80 TB
  – 10x the data that the initial project will analyze
• On 200 virtual machines, this will take 10 months!
  – How can we accelerate this?
• The number of virtual machines can easily be scaled up using the Cycle Computing software and AWS resources
  – Once the data is in AWS, 80 TB of data can be analyzed in approximately the same amount of time as 8 TB of data (see the sketch below)
  – Scientists really love this part!
• It might take longer given that data transfers take time – data transfers and computation can be overlapped

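The scale-up argument in numbers, under the stated assumption that throughput grows linearly with VM count for this embarrassingly parallel workload (and ignoring data-transfer time):

```python
def months_to_finish(data_tb, vms, base_tb=8.0, base_vms=200, base_months=1.0):
    """Wall-clock months, assuming linear scaling with VM count."""
    return base_months * (data_tb / base_tb) * (base_vms / vms)

print(months_to_finish(80, 200))   # 10.0 -> the "10 months" on this slide
print(months_to_finish(80, 2000))  # 1.0  -> same wall-clock time with 10x the VMs
```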

Page 22: NASA Goddard: Head in the Clouds


Thanks go to the following…

NASA
• Dr. Compton Tucker (Co-PI)
• Katherine Melocik (GSFC)
• Jennifer Small (GSFC)
• Dr. Tsengdar Lee (HQ)
• Daniel Duffy (GSFC)
• Mark McInerney (GSFC)
• Hoot Thompson (GSFC)
• Garrison Vaughn (GSFC)
• Brittany Wills (GSFC)
• Scott Sinno (GSFC)
• Ray Obrien (ARC)
• Richard Schroeder (ARC)
• Milton Checchi (ARC)

University Partners
• Paul Morin (Co-PI, Univ. Minnesota)
• Claire Porter (Univ. Minnesota)
• Jamon Van Den Hoek (Oak Ridge)

Cycle Computing
• Tim Carroll
• Michael Requa
• Carl Chesal
• Bob Nordlund
• Glen Otero
• Rob Futrick

AWS
• Jamie Baker
• Jeff Layton

There are others… my apologies to anyone I missed. These are typically the people on our conference calls!

Page 23: NASA Goddard: Head in the Clouds


Thank you. This presentation will be uploaded to SlideShare the week following the Symposium.

http://www.slideshare.net/AmazonWebServices
