incite proposal writing webinar presentation · incite proposal writing webinar april 25, 2013 ......

49
INCITE Proposal Writing Webinar April 25, 2013 James Osborn ALCF Catalyst Team Argonne National Laboratory Matt Norman OLCF Scientific Computing Group Oak Ridge National Laboratory and Julia C. White, INCITE Manager

Upload: lehanh

Post on 03-Apr-2018

220 views

Category:

Documents


5 download

TRANSCRIPT

INCITE Proposal Writing Webinar

April 25, 2013

James Osborn ALCF Catalyst Team

Argonne National Laboratory

Matt Norman OLCF Scientific Computing Group

Oak Ridge National Laboratory

and Julia C. White, INCITE Manager

2

Overview

• Allocation programs [3] • INCITE mission and recent stats [4 – 12] • Titan and Mira [13 – 17] • Tips for applicants [18 – 36]

– Common oversights – Requesting a startup account – Benchmarking data

• Q&A [37, open discussion] • Conclusions [38 – 45]

– Submittal, review, and awards decisions – Contact links

3

Three primary ways for access to LCF Distribution of allocable hours

60% INCITE 4.7 billion core-hours in

CY2013 Up to 30% ASCR Leadership Computing

Challenge

10% Director’s Discretionary

Leadership-class computing

DOE/SC capability computing

4

What is INCITE?

INCITE promotes transformational advances in science and technology through large allocations of computer time, supporting resources, and data storage at the Argonne and Oak Ridge Leadership Computing Facilities (LCFs) for computationally intensive, large-scale research projects.

Innovative and Novel Computational Impact on Theory and Experiment

5

INCITE criteria Access on a competitive, merit-reviewed basis*

1 Merit criterion Research campaign with the potential for significant domain and/or community impact

3 Eligibility criterion • Grant allocations regardless of funding source*

• Non-US-based researchers are welcome to apply

2 Computational leadership criterion Computationally intensive runs that cannot be done

anywhere else: capability, architectural needs

*DOE High-End Computing Revitalization Act of 2004: Public Law 108-423

6

Twofold review process

New proposal assessment Renewal assessment Peer review:

INCITE panels • Scientific and/or

technical merit • Appropriateness

of proposal method, milestones given

• Team qualifications • Reasonableness

of requested resources

• Change in scope • Met milestones • On track to meet

future milestones • Scientific and/or

technical merit

Computational readiness

review: LCF centers

• Technical readiness • Appropriateness for

requested resources

• Met technical/ computational milestones

• On track to meet future milestones

Award Decisions

• INCITE Awards Committee comprised of LCF directors, INCITE program manager, LCF directors of science, sr. management

7

2013 INCITE statistics

Contact information Julia C. White, INCITE Manager

[email protected]

• Request for Information helped attract new projects

• Call closed June 27th, 2012

• Total requests ~15 billion core-hours, 3x more than the 5 billion core-hours requested last year

• Number of proposals submitted increased nearly 20%

• Awards of ~5 billion core-hours for CY 2013

• 61 projects awarded of which 20 are renewals

Acceptance rates 33% of nonrenewal submittals and 100% of renewals

8

2013 award statistics, by system

Jaguar Titan Mira Intrepid

2012 INCITE 2013 INCITE 2013 INCITE 2012 INCITE 2013 INCITE

Number projects* 35 32 27 31 27

Average Project 27M 58M 78M 24M 27M

Median Project 23M 49.5M 45M 20M 25M

Titan Mira Intrepid

Total Awards (Hrs in CY2013) 1.84B 2.11B 0.721B

* Totals of 32 projects at the OLCF, 37 projects at the ALCF (many of the ALCF projects received time on both Mira and Intrepid)

9

New PI’s in INCITE

• A “new” PI has never previously led an INCITE submittal • 32% of the nonrenewal projects are led by new PI’s

– 41 new projects awarded, 13 led by new PI’s

INCITE actively engages with new research teams through outreach such as workshops, email distributions, and individual networking.

10

2013 INCITE panel peer reviewers

83 science experts participated in the 2013 INCITE panel review.

• > 50% (e.g. more than 40) of the reviewers are: — Society fellows (AAAS, APS, SIAM, IEEE, etc), — Agency awardees (ex. NSF Early Career), — Laboratory fellows, — National Academy members, — National Society presidents

• 41% participated in the 2012 INCITE review

11

INCITE seeks high-impact research campaigns

Examples of previous successful INCITE applications that advance the state of the art across a broad range of topics and different mission priorities

• Glimpse into dark matter • Supernovae ignition • Protein structure • Creation of biofuels • Replicating enzyme functions • Global climate • Accelerator design • Carbon sequestration • Turbulent flow • Propulsor systems

• Membrane channels • Protein folding • Chemical catalyst design • Plasma physics • Algorithm development • Nano-devices • Batteries • Solar cells • Reactor design • Nuclear structure

12

INCITE breakthroughs since inception A few of the many science and engineering advances

Hours allocated 4.9M 6.5M 18.2M 95M 268M 889M 1.6B 1.7B 1.7B 5B Projects 3 3 15 45 55 66 69 57 60 61

Unprecedented simulation of magnitude-8 earthquake over 125-square miles, Proceedings, SC10

World’s first continuous simulation of 21,000 years of Earth’s climate history. Science (2009)

Largest-ever LES of a full-sized commercial combustion chamber used in an existing

helicopter turbine

Largest simulation of a galaxy’s worth of dark matter, showed for the first time the fractal-like appearance of dark

matter substructures. Nature (2008), Science (2009)

OMEN breaks the petascale barrier using more than 220,000 cores, Proceedings SC10

NIST proposes new standard reference materials from LCF concrete simulations

New method to rapidly determine protein structure, with limited experimental data Science (2010), Nature (2011)

Researchers solved the 2D Hubbard model and presented evidence that it predicts HTSC behavior

Phys. Rev. Lett 2005

Hours requested vs. allocated: ~2X per year ~3X per year

2007 2008 2009 2010 2011 2013 2012 2004 2005 2006

Modeling of molecular basis of Parkinson’s disease named #1 computational accomplishment

Breakthroughs 2008 Calculation of the number of bound nuclei in nature, Nature (2012)

13

INCITE resources: Mira at ALCF • Mira - Blue Gene/Q System

– Node: 64 bit PowerPC A2, 1.6 GHz • 16 cores, 4 HW threads/core • 16 GB memory • 205 GFLOPS peak

– 48 racks / 48k nodes / 768k cores – 768 TB of memory – Peak flop rate: 10 PF – 35 PB of disk + large tape storage system

• New Visualization Systems – Initial system available now – Advanced visualization system in 2015

14

Overview of Blue Gene/Q

Design Parameters BG/P BG/Q Improvement Cores / Node 4 16 4x HW threads / Core 1 4 4x Clock Speed (GHz) 0.85 1.6 1.9x Flop / Clock / Core 4 8 2x Nodes / Rack 1,024 1,024 -- RAM / core (GB) 0.5 1 2x Flops / Node (GF) 13.6 204.8 15x Mem. BW/Node (GB/sec) 13.6 42.6 3x Network Interconnect 3D torus 5D torus Smaller diameter Concurrency / Rack 4,096 65,536 16x

15

Scaling workshop at ALCF

Don’t miss the ALCF's scaling workshop that has produced Gordon Bell finalists for four years running. ALL ARE WELCOME!

• Hit your highest performance numbers in preparation for INCITE 2014 • Try your code out on full-system Mira reservations • Work one-on-one with ALCF experts to optimize your code’s performance Register today! Go to www.alcf.anl.gov for details, or www.alcf.anl.gov/workshops/mira-performance-boot-camp-2013

16

Titan at OLCF • Cray Linux Environment

operating system • Gemini interconnect

• 3-D Torus • Globally addressable memory • Advanced synchronization features

• AMD Opteron 6200 processor (Interlagos) • New accelerated node design using NVIDIA

multi-core accelerators • NVIDIA “Kepler” (K20) GPUs

• 27 PFlops peak performance • 584 TB DDR3 + 110 TB GDDR5 memory

Titan Specs Compute Nodes 18,688 Login & I/O Nodes 512 Memory per node 32 GB + 6 GB NVIDIA “Kepler” (2012) 1.31 TFlops Opteron 2.2 GHz Opteron performance 141 GFlops Total Opteron Flops 2.6 PFlops Disk Bandwidth ~ 1 TB/s

17

Cray XK7 compute node XK7 Compute Node

Characteristics AMD Opteron 6200 “Interlagos” 16 core processor @ 2.2GHz

Tesla K20 “Kepler” @ 1.31 TF with 6GB GDDR5 memory

Memory Host: 32GB @ 1600 MHz DDR3 GPU: 6GB @ 5200 MHz GDDR5 Gemini High Speed Interconnect

Four compute nodes per XK7 blade. 24 blades per rack

18

Key questions to ask yourself • Is both the scale of the runs and the time demands

of the problem of LCF scale? – Yes, I can’t get the amount of time I need anywhere else. – Yes, I my simulations are too large to run on other systems.

• Do you need specific LCF hardware? – Yes, the memory and I/O available here are necessary for my work.

Do answer these questions in the proposal. This is especially helpful for the computational readiness reviewers.

TIPS

19

• Do you have the people ready to do this work? – No, I’m waiting to hire a postdoc. – Yes, I have commitments from the major participants.

• Do you have a workflow? • Do you have a post-processing strategy? • Do you use ensemble runs and need LCF resources?

– My ensembles can run under the direction of a large job, with I/O scaling on a parallel file system. -> possible yes

– My ensemble expects to run millions of serial jobs on nodes with local disk available. -> probably no

Key questions to ask yourself (cont.)

Some of these characteristics are negotiable, so make sure to discuss atypical requirements with the centers

20

Some limitations on what can be done

• Laws regulate what can be done on these systems – LCF systems have cyber security plans that bound the types of

data that can be used and stored on them

• Some kinds of information we cannot have – Personally Identifiable Information (PII) – Classified Information or National Security Information – Unclassified Controlled Nuclear Information (UCNI) – Naval Nuclear Propulsion Information (NNPI) – Information about development of nuclear, biological or chemical

weapons, or weapons of mass destruction

Inquire if you are unsure or have questions

21

Proposal form: Outline 1 Principal investigator and co-principal investigators 2 Project title (80 characters) 3 Research category 4 Project summary (50 words) 5 Computational resources requested 6 Funding sources 7 Other high-performance computing support for this project 8 Project narrative, other materials

(A) Executive summary (1 page) (B) Project narrative including impact of the work, objectives, benchmarking (15 pages) (C) Personnel justification & management plan (D) Milestone table (E) Publications resulting from INCITE Awards (*new*) (F) Request for Information – Data Management Plan (*new*)

9 Application packages 10 Proprietary and sensitive information 11 Export control 12 Monitor information

22

Getting started: Know your audience

• Remember, INCITE is very broad in scope – Computational-science-savvy senior scientists/engineers drawn

around the world from national labs, universities, and industry – They will be assessing potential impact of this work versus other

proposals submitted

Don’t assume that your audience is familiar with your work through other review programs (ex. funding agencies). INCITE is very broad in scope and you may be competing against a diverse set of proposals.

TIPS

23

Narrative: Impact of the work

• This is the principal determinant of a successful submittal – What is the scientific challenge, and its significance – Impact of a successful computational campaign — the big picture – Reasons this work needs to be done now, on the resources requested

Do give a compelling picture of the impact of this work, both in the context of your field and, where appropriate, beyond. Do explain why this work cannot be done elsewhere. Reviewers scrutinize whether another allocation program may be a better fit.

TIPS

24

Narrative: Objectives and milestones

• Successful submittals must also very clearly – Describe approach to solving the problem, its challenging aspects,

preliminary results – Tie to the resources requested your key objectives, key simulations,

and project milestones in your milestone table

Do clearly articulate your project’s milestones for each year. Reviewers have downgraded proposals that don’t show that the PI has a well thought out plan for using the allocation. Do bear in mind that the average INCITE award of time for a single project is equivalent to several million dollars. Spend your time on the proposal accordingly.

TIPS

25

Narrative Computational approach Provide the basic foundation

• Describe the underlying formulation – Don’t assume reviewers know all the codes – Do show that the code you plan to employ is the correct tool for your research plan – Do explain the differences if you plan to use a private version of a well-known code

• List programming languages, libraries and tools used – Check that what you need is available on the system

26

Narrative: Computational campaign The details are important!

• Describe the kind of runs you plan with your allocation – L exploratory runs using M nodes for N hours – X big runs using Y nodes for Z hours – P analysis runs using Q nodes for R hours

• Big runs often have big output and/or big I/O – Show you can deal with it and understand the bottlenecks – Understand the size of results, where you will analyze them,

and how you will get the data there

Do clearly emphasize the relationship between the proposed runs and the major milestones. This helps the Awards Committee maximize your milestones, if they can’t grant the full award requested.

TIPS

27

Narrative: Personnel & Management plan

• Experience and credibility – List the scientific and technical members and their experience as

related to the proposed scientific or technical goals – Successful proposal teams demonstrate a clear understanding of

petascale computing and can optimally use these resources to accomplish the stated scientific/technical goals

• Transparent use of time – Projects involving multiple teams or different thrust areas should

clearly state how the allocation will be distributed and managed

Do include in “Personnel Justification” a brief description of the role of each team member. Although not a requirement, proposals with application developers or clear connections to development teams are favorably viewed by readiness reviewers.

TIPS

28

Narrative: New for 2014 Call for Proposals

• Publications resulting from INCITE awards – To show impact of the INCITE program, we ask authors to list the

publications from previous INCITE awards to this project team for work related to the proposal under consideration

– Include only publications with INCITE acknowledgements

• Request for Information – Data Management Plan (DMP) – We plan to implement in future solicitations a requirement for a formal

DMP as part of the proposal. Submit a short document, not to exceed one page, which describes your anticipated future data management strategies and needs. [Note: this is for INCITE management and will not be included in the materials sent to reviewers.]

29

Are you ready to apply now?

• Port your code before submitting the proposal – Check to see if someone else has already ported it – Request a startup account if needed (see next slide)

• Provide compelling benchmark data – Prove application scalability in your proposal – Run example cases at full scale – If you cannot show proof of runs at full scale,

then provide a very tight story about how you will succeed

Do make the benchmark examples as similar to your production runs as possible, or, make it clear why another benchmark example is valid for your proposed work.

TIPS

30

Request a start up account now • Director’s Discretionary Proposals considered year-round • Award up to millions of hours • Allocated by LCF center directors

• Director’s Discretionary (DD) requests can be submitted anytime

• DD may be used for porting, tuning, scaling in preparation for an INCITE submittal

• Submit applications at least 2 months before INCITE Call for Proposals closes

Argonne DD Program: http://www.alcf.anl.gov/getting-started/apply-for-dd Oak Ridge DD Program: www.olcf.ornl.gov/support/getting-started/olcf-director-discretion-project-application/

31

Computational approach Use of next-generation systems

Do provide a development plan articulating a strategy for maximizing node-level parallelism.

TIPS

• Use as much of the resources on a node as possible • Strategies to consider

– Hybridization utilizing OpenMP or Pthreads to expose thread-level or SMP-like parallelism (for multiple cores/HW threads)

– Make use of available accelerators (GPUs) using, e.g. CUDA, OpenCL, compiler directives, etc.

– Algorithmic improvements or design to maximize data locality and memory hierarchy usage

32

Code performance overview

Do provide performance data in the requested format. Do provide performance of the scaling baseline, not just scaling efficiency

TIPS

• Performance data should support the required scale – Use similar problems to what

you will be running – Show that you can get to the range

of processors required – Best to run on the same machine,

but similar size runs on other machines can be useful

– Be clear about the number of nodes, MPI ranks, threads and GPUs (if applicable) being used in runs

– Include production style I/O in benchmarks (checkpoint/restart, analysis)

– Describe how you will address any scaling deficiencies

33

Parallel performance: Direct evidence

Pick the approach(es) relevant to your work and show results

WEAK SCALING DATA STRONG SCALING DATA Increase problem size as resources are increased

Increase resources (nodes) while doing the same computation

10.0

10.2

10.4

10.6

10.8

11.0

Tim

e to

solu

tion

(m)

Number of processors

Actual Ideal

2400 4800 9600 19200 38400

Weak Scaling Example

76800 0

1,000

2,000

3,000

4,000

5,000

6,000

Tim

e to

solu

tion

(s)

Number of processors

Actual Ideal

2400 4800 9600 19200 38400

Strong Scaling Example

76800

34

More about our ensemble policy or, “Can I meet the computationally intensive criterion by loosely coupling my jobs?”

• Possibly “yes”, – If you require large numbers of discrete or loosely coupled simulations

where time-to-solution is an untenable pacing issue, and – If a software workflow solution (e.g., pre- and post-processing scripts

that automate run management and analysis) is provided to facilitate this volume of work.

• Probably “no”, – If by decoupling the simulations the work could be effectively carried

out on a smaller resource within a reasonable time-to-solution.

Do examine the Frequently Asked Questions for these and other topics at http://hpc.science.doe.gov/allocations/incite/faq.do

TIPS

35

Proposal form: Final check 1 Principal investigator and co-principal investigators 2 Project title (80 characters) 3 Research category 4 Project summary (50 words) 5 Computational resources requested 6 Funding sources 7 Other high-performance computing support for this project 8 Project narrative, other materials

(A) Executive summary (1 page) (B) Project narrative including impact of the work, objectives, benchmarking (15 pages) (C) Personnel justification & management plan (D) Milestone table (E) Publications resulting from INCITE Awards (*new*) (F) Request for Information – Data Management Plan (*new*)

9 Application packages 10 Proprietary and sensitive information 11 Export control 12 Monitor information

36

Renewal form: Outline 1 Principal investigator and co-principal investigators 2 Research category 3 Project status summary (1 page) 4 Renewal computational resources requested 5 Project achievements and plans

(A) Project achievements (1 page) (B) Project plans (15 pages)

• Project achievements – Accomplishments, publications, allocation use, parallel performance,

anticipated data storage needs

• Project plans for next year – What you expect to accomplish, anticipated production-to-

development job time, benchmarking data if new codes to be used

37

Q&A

Open discussion on what authors should include in their proposal

38

Submitting your proposal or renewal

• You may save your proposal at any time without having the entire form complete

• Your Co-PIs may also log in and edit your proposal • Required fields must be completed for the form to be

successfully submitted – An incomplete form may be saved for later revisions

• After submitting your proposal, you will not be able to edit it

Submit

39

INCITE awards committee decisions

• The INCITE Awards Committee is comprised of the LCF center directors, INCITE program manager, LCF directors of science and senior management.

• The committee identifies the top-ranked proposals by a) peer-review panel ratings, rankings, and reports; and b) additional considerations, such as the desire to promote use of HPC resources by underrepresented communities.

• Computational readiness review is used to identify whether the top-ranked proposals are “ready” for the requested system.

40

INCITE awards committee decisions (cont.)

• A balance is struck to ensure – each awarded project has sufficient allocation to enable

all or part of the proposed scientific or technical achievements – a robust support model for each INCITE project

• When the centers are oversubscribed, each potential project is assessed to determine the amount of time that may be awarded to allow the researchers to accomplish significant scientific goals.

• Requests for appeals can be submitted to the INCITE manager or LCF center directors. If an error has occurred in the decision-making process (e.g. procedural, clerical), consideration is given by the INCITE management and an award may be granted.

41

2014 INCITE award announcements

• Awards will be announced by INCITE Manager, Julia White, in November 2013

• Welcome and startup information from centers – Agreements to sign: Start this process as soon as possible! – Getting started materials: Work closely with the center

• Centers provide expert-to-expert assistance to help you get the most from your allocation – Scientific “Liaisons” and “Catalysts” (OLCF / ALCF)

42

PI responsibilities

• Provide quarterly status updates (on supplied template) – Milestone reports – Publications, awards, journal covers, presentations, etc., related to the work

• Provide highlights on significant science/engineering accomplishments as they occur

• Submit annual renewal request • Complete annual surveys • Encourage your team to be good citizens on the computers • Use the resources for the proposed work

Let us know your achievements and challenges

43

It is a small world…

• Let the science agency that funds your work know how significant the INCITE program and the Leadership Computing Facilities will be to your work

• Be sure to include the appropriate acknowledgements • Contact us if you have questions: we want to hear from you

44

Relevant links

INCITE General Information www.doeleadershipcomputing.org/

INCITE Proposal Site proposals.doeleadershipcomputing.org/

Argonne Discretionary Program www.alcf.anl.gov/getting-started/apply-for-dd Oak Ridge Discretionary Program www.olcf.ornl.gov/support/getting-started/olcf-director-discretion-project-application

Contact the center if you’d like to request Discretionary time for benchmarking

45

Contacts

For details about the INCITE program:

www.doeleadershipcomputing.org – General information proposals.doeleadershipcomputing.org – Proposal site [email protected]

For details about the centers:

www.olcf.ornl.gov [email protected], 865-241-6536

www.alcf.anl.gov [email protected], 866-508-9181

46

Supplementary material

47

Innovative and Novel Computational Impact on Theory and Experiment

Call for Proposals

Contact information Julia C. White, INCITE Manager

[email protected]

The INCITE program seeks proposals for high-impact

science and technology research challenges that require the power of the

leadership-class systems. Allocations will be for calendar year 2014.

April 15 – June 28, 2013

INCITE is an annual, peer-review allocation program that provides unprecedented computational and data science resources

• 5 billion core-hours to be awarded for 2014 on the 27-petaflops Cray XK7 “Titan” and the 10-petaflops IBM BG/Q “Mira”

• Average award: 50+ million core-hours

• Individual awards will be up to several hundred million core-hours

• INCITE is open to any science domain

• INCITE seeks computationally intensive, large-scale research campaigns

48

Allocation Programs at the LCFs

INCITE ALCC Director’s Discretionary

Mission High-risk, high-payoff science that requires LCF-scale resources*

High-risk, high-payoff science aligned with DOE mission

Strategic LCF goals

Call 1x/year – (Closes June) 1x/year – (Closes February) Rolling

Duration 1-3 years, yearly renewal 1 year 3m,6m,1 year

Typical Size 50 – 70 projects

50M – 100’s M core-hours/yr.

10 – 20 projects

1M – 75M core-hours/yr.

100s of projects

10K – 1M core-hours

Review Process

Scientific Peer-Review

Computational Readiness

Scientific Peer-Review

Computational Readiness

Strategic impact and feasibility

Managed By INCITE management committee (ALCF & OLCF) DOE Office of Science LCF management

Availability Open to all scientific researchers and organizations Capability >20% of cores

60% 30% 10%

49

A sample of codes with local expertise available at Argonne and Oak Ridge Application Field ALCF OLCF FLASH Astrophysics ✓ ✓ MILC,CPS LQCD ✓ ✓ Nek5000 Nuclear energy ✓ Rosetta Protein structure ✓ DCA++ Materials science ✓ ANGFMC Nuclear structure ✓ NUCCOR Nuclear structure ✓ Qbox Chemistry ✓ ✓ LAMMPS Molecular dynamics ✓ ✓ NWChem Chemistry ✓ ✓ GAMESS Chemistry ✓ ✓ MADNESS Chemistry ✓ ✓ CHARMM Molecular dynamics ✓ ✓ NAMD Molecular dynamics ✓ ✓

Application Field ALCF OLCF AVBP Combustion ✓ GTC Fusion ✓ ✓ Allstar Life science ✓ CPMD, CP2K Molecular dynamics ✓ ✓ CESM Climate ✓ ✓ CAM-SE Climate ✓ ✓ WRF Climate ✓ ✓ Amber Molecular dynamics ✓ ✓ enzo Astrophysics ✓ ✓ Falkon Computer science/HTC ✓ ✓ s3d Combustion ✓ DENOVO Nuclear energy ✓ LSMS Materials science ✓ GPAW Materials science ✓