CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008


CLADE Review 2003-2008

Nancy Wilkins-Diehr
wilkinsn@sdsc.edu

CLADE 2008, June 23, 2008


The Origin of CLADE

• “The CLADE workshop began with a discussion at HPDC-11, July 24-26, 2002, at Edinburgh International Conference Center in Scotland.

• Salim Hariri, C.S. Raghavendra, and I and likely a couple of others got to talking about the state of Grid applications.

• At that time quite a lot of progress had been made with tools and technologies for distributed applications, but we were not seeing many applications papers at HPDC, or in other forums either.

• So Salim suggested that we put together a workshop to focus attention on applications, and he asked me to help organize it.”

Ray Bair


Keys to the Success of CLADE

• Complements the HPDC program
  – Focus on real applications that demonstrate the use of Grid approaches on a significant scale
  – CLADE's association with HPDC still distinguishes it from other conferences
    • Bringing together cutting-edge computer science and applications
• Support of the HPDC Steering Committee
• Strong Program Committee chairs
• Good advice from CLADE's Steering Committee
• Engaged Program Committee members
• Peer-review system has been important in selecting good papers that are timely and interesting
• Distribution of the CLADE proceedings at the workshop increases the value and usefulness of the papers to the participants

2008 CLADE Organization

• STEERING COMMITTEE
  – Raymond Bair, ANL
  – Ioana Banicescu, Mississippi State Univ.
  – Francine Berman, Univ. of Calif., San Diego
  – Jack Dongarra, Univ. of Tenn., Knoxville
  – Salim Hariri, University of Arizona
  – Manish Parashar, Rutgers University
  – Viktor Prasanna, Univ. of Southern Calif.
  – Joel Saltz, Ohio State University
  – Edward Seidel, Louisiana State University
  – Alan Sussman, University of Maryland
• PROGRAM COMMITTEE
  – Henrique Andrade, IBM Research
  – David Bernholdt, ORNL
  – Jiannong Cao, HK PolyU
  – Umit Catalyurek, Ohio State U.
  – Kenneth Chiu, U. Binghamton
  – Jose Cunha, U. Nova de Lisboa
  – Ewa Deelman, ISI
  – Frederic Desprez, ENS Lyon
  – Hai Jin, HUST
  – Tevfik Kosar, Louisiana State U.
  – Tahsin Kurc, Ohio State U.
  – Jysoo Lee, Calit2
  – Sang Boem Lim, KonKuk U.
  – David Lowenthal, U. Georgia
  – Malika Mahoui, IUPUI
  – James Myers, NCSA
  – Gregory Newby, Arctic Region Supercomputing Center
  – Jun Ni, U. Iowa
  – Yoonho Park, IBM Research
  – Marlon Pierce, Indiana U.
  – Ilkyun Ra, U. Colorado Denver
  – Thomas Rauber, U. Bayreuth
  – Gudula Rünger, TU Chemnitz
  – Edward Walker, TACC
  – Shaowen Wang, UIUC

Today’s Talk

• Overview of CLADE keynotes 2003-2007
  – 2003 “Dynamic Data Driven Application Systems”, Frederica Darema
  – 2004 “A Grid based Diagnostics and Prognosis System for Rolls Royce Aero Engines: The DAME Project”, Jim Austin
  – 2005 “Enabling Science and Engineering Applications on the Grid”, Ed Seidel
  – 2006 “Gridcast - a Next Generation Broadcasting Infrastructure?”, Terry Harmer
  – 2007 “The Cancer Biomedical Informatics Grid: Connecting the Cancer Research Community”, Scott Oster
• TeraGrid Science Gateways

CLADE 2003, Seattle

• Keynote Presentation
  – Frederica Darema, Senior Science and Technology Advisor and Director of the Next Generation Software Program, National Science Foundation
  – Dynamic Data Driven Application Systems
• Highlighted the relationship between theory, simulation and experiment or field data
  – Dynamic feedback and control loop between simulation and experimental data
• “DDDAS has potential for significant impact to science, engineering, and commercial world, akin to the transformation effected since the ‘50s by the advent of computers”
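The dynamic feedback and control loop between simulation and experimental data can be illustrated with a minimal Python sketch. It is not taken from the keynote: the trivial persistence model, the `gain` and `tolerance` parameters, and the name `run_dddas` are assumptions chosen only for illustration. Each cycle compares the simulation's prediction with a measurement, feeds the difference back into the simulation state, and flags when the mismatch is large enough that the simulation would ask the instrument for more data.

```python
def run_dddas(measurements, initial_estimate, gain=0.5, tolerance=3.0):
    """Toy DDDAS loop: data corrects the simulation, the simulation requests data.

    All names and parameter values here are illustrative assumptions.
    """
    estimate = initial_estimate
    estimates = []
    more_data_requested = []
    for measured in measurements:
        predicted = estimate                     # simulation step (state simply persists)
        mismatch = measured - predicted          # compare simulation with field data
        estimate = predicted + gain * mismatch   # feed the measurement back into the model
        estimates.append(estimate)
        # Control direction: a large mismatch asks the instrument to sample more densely.
        more_data_requested.append(abs(mismatch) > tolerance)
    return estimates, more_data_requested


if __name__ == "__main__":
    estimates, requests = run_dddas([10.0, 10.0, 10.0], initial_estimate=0.0)
    print(estimates)   # [5.0, 7.5, 8.75]
    print(requests)    # [True, True, False]
```

In this toy run the estimate converges toward the measured value, and the request for extra data switches off once the simulation and the measurements agree to within the tolerance.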

Example DDDAS Applications

• Generalized methodology for state estimation and prediction
  – Predictor-Corrector methods
  – Advanced Driving Assistance Systems for automobiles
  – Tracking algorithms for Air Traffic Control
  – Enhancing oil exploration methods and capabilities
• Enhanced manufacturing supply chains through sensor information
• Virtual operations re-planning and control
  – Event-driven simulations for systems subject to unplanned outages
• Earthquake tolerant buildings and bridges
• Fire propagation prediction and management
• Integrated Image-Guided Interventions
  – Real-time, three-dimensional (3D) imaging needs of surgeons
• Biodiversity and bio-complexity
  – Dramatic changes due to habitat transformation, invasions of exotic species, chemical contamination, diseases and epidemics, climate change, and floods and drought
• Hydro-complexity
  – Weather, Water and Pollution
• Design and configuration methodologies for sensor networks
• The oceanographic community at large has interests in DDDAS in order to help optimize observing systems for important scientific studies.

Source: Frederica Darema

CLADE 2004, Honolulu

• Keynote Presentation
  – Jim Austin, University of York
  – A Grid based Diagnostics and Prognosis System for Rolls Royce Aero Engines: The DAME Project
• Very practical engineering application
  – Applying a distributed, data-intensive Grid application to the diagnosis and prognosis of Rolls-Royce aero engines

Distributed Aircraft Maintenance Environment (DAME)

• UK e-Science pilot project
  – QUOTE
    • Neural network–based techniques for real-time monitoring
    • Compare stored vibration data with instantaneous snapshots
    • Each flight produces 1 GB of data, TBs per year of distributed data for a fleet
  – AURA
    • Advanced Uncertain Reasoning Architecture for Pattern Matching
    • Pattern matching among terascale datasets, distribute for speed
  – CBR
    • Case Based Reasoning systems for intelligent decision support
    • Correlates engine anomalies with root cause
• Combine into scalable system using grid middleware
• Utilising large amounts of vibration and performance data available from modern aero-engines for fleet based diagnostics

Source: Jim Austin

• Fault diagnosis and prognosis integrated with predictive maintenance
  – Detect that engine has deviated from normal (QUOTE)
  – Diagnose why (AURA)
  – Form a prognosis (CBR)
  – Plan remedial actions
• Common components of all fault diagnosis and prognosis systems

Source: Jim Austin
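The detect, diagnose, prognose, and plan stages listed above can be sketched as a small pipeline. This is an illustration only and does not reproduce the QUOTE, AURA, or CBR algorithms: the threshold test, the nearest-pattern match, the lookup table, and every name and number below are hypothetical stand-ins.

```python
# Hypothetical fault signatures (vibration patterns); not DAME data.
KNOWN_PATTERNS = {
    "bearing wear": [0.9, 1.4, 0.9],
    "blade imbalance": [1.5, 0.2, 1.5],
}

# Hypothetical case base: diagnosis -> (prognosis, remedial action).
CASE_BASE = {
    "bearing wear": ("degrades over coming flights", "schedule bearing inspection"),
    "blade imbalance": ("may worsen quickly", "ground aircraft for borescope check"),
}


def detect(snapshot, baseline, threshold=0.5):
    """Stage 1 stand-in: has any reading left the normal band around the baseline?"""
    return any(abs(s - b) > threshold for s, b in zip(snapshot, baseline))


def diagnose(snapshot):
    """Stage 2 stand-in: pick the stored pattern closest in squared distance."""
    def distance(pattern):
        return sum((s - p) ** 2 for s, p in zip(snapshot, pattern))
    return min(KNOWN_PATTERNS, key=lambda name: distance(KNOWN_PATTERNS[name]))


def assess(snapshot, baseline):
    """Run the stages in order; return None when the engine looks normal."""
    if not detect(snapshot, baseline):
        return None
    fault = diagnose(snapshot)
    prognosis, action = CASE_BASE[fault]   # stages 3 and 4 stand-in: case lookup
    return fault, prognosis, action


if __name__ == "__main__":
    baseline = [0.2, 0.2, 0.2]
    print(assess([0.2, 0.3, 0.1], baseline))
    print(assess([1.4, 0.3, 1.6], baseline))
```

The point of the sketch is the staging: the cheap detection step runs on every snapshot, and the more expensive matching and case lookup run only when a deviation is found.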

• Quality of Service and Security are the two most important project concerns
  – QoS critical for commercial deployment, SLAs will likely be a necessity
  – Workgroup formed to focus on security
• Future directions
  – Base services can be used with many other apps
  – Put core services into a portal
  – More flexible workflow configurations
  – Current project considered a demonstration project
  – Commercial implementation will need high availability, reliability, data integrity, confidentiality

Source: Jim Austin

CLADE 2005, Research Triangle Park, NC

• Keynote Presentation
  – Ed Seidel, Louisiana State University
  – Enabling Science and Engineering Applications on the Grid
• Ed Seidel, recently named Office of Cyberinfrastructure director at NSF, reporting to Dr. Bement
  – Many years of experience with distributed applications and high performance computing

Optical Networks 1000x faster than regional
What are people doing with this?

• Collaboration
  – Distributed communities (NEES, GEON), shared CI
  – data, code, tools, resources, simulations
• Standard things
  – Task farming, resource brokering, remote steering
• New scenarios
  – Apps abstracted, dynamic apps find their own services, resources, people; distributed apps – spawned, monitored
• Grids bring it all together, but worries in the US about DOE, NSF CI funding

Source: Ed Seidel

Distributed computation – the old way

• Why?
  – Capacity: computers can’t keep up with needs
  – Throughput
• Issues
  – Bandwidth (increasing faster than computation)
  – Latency
  – Communication needs, Topology
  – Communication/computation
• Techniques to be developed
  – Overlapping communication/computation
  – Extra ghost zones to reduce latency
  – Compression
  – Algorithms to do this for the scientist
• Gridlab.org, cactuscode.org

Source: Ed Seidel
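One of the listed techniques, extra ghost zones, can be shown with a short Python sketch. It is a toy illustration and not code from Cactus or GridLab: the 1D diffusion stencil, the two-subdomain split, and the names `step` and `run_split` are assumptions. With a ghost layer `ghost` cells wide, each subdomain can take `ghost` time steps before its owned cells would be contaminated by stale boundary data, so the subdomains exchange data once per `ghost` steps instead of every step. Fewer exchanges means the network latency is paid less often, at the price of some redundant computation in the ghost cells.

```python
def step(u, alpha=0.25):
    """One explicit diffusion step; the two end cells are held fixed."""
    interior = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
                for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def run_split(u, steps, ghost):
    """Advance u on two subdomains, exchanging ghost cells once per `ghost` steps.

    Returns the final field and the number of exchanges performed.
    """
    mid = len(u) // 2
    if not 1 <= ghost <= mid:
        raise ValueError("ghost width must be between 1 and half the domain")
    exchanges = 0
    done = 0
    while done < steps:
        # "Exchange": each side copies `ghost` cells from its neighbour.
        left, right = u[:mid + ghost], u[mid - ghost:]
        exchanges += 1
        burst = min(ghost, steps - done)
        for _ in range(burst):
            # Each local step invalidates one more ghost cell on the shared side.
            left, right = step(left), step(right)
        # Owned cells are still exact after at most `ghost` local steps.
        u = left[:mid] + right[ghost:]
        done += burst
    return u, exchanges


if __name__ == "__main__":
    u0 = [0.0] * 8 + [1.0] * 8
    reference = u0
    for _ in range(6):
        reference = step(reference)
    narrow, n_narrow = run_split(u0, 6, ghost=1)
    wide, n_wide = run_split(u0, 6, ghost=3)
    print(narrow == reference, wide == reference)
    print(n_narrow, n_wide)
```

In this example both ghost widths reproduce the single-domain result, but the wider ghost layer needs 2 exchanges where the narrow one needs 6.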

Distributed computation – the new way

• Intelligent parameter surveys, Monte Carlos
  – May control other simulations
  – Dynamic staging: move to faster/cheaper/bigger machine (“Grid Worm”)
  – Need more memory? Need less?
• Multiple universe: clone to investigate steered parameter (“Grid Virus”)
  – Automatic component loading
  – Needs of process change, discover/load/execute new component somewhere
  – Automatic “look ahead”, convergence testing
    • spawn off and run coarser resolution to predict likely future, study convergence
  – Routine profiling
    • Best machine/queue, choose resolution parameters based on queue
    • Dynamic load balancing: inhomogeneous loads, multiple grids
    • DDDAS: injecting data into the above, feed back to experiment

Source: Ed Seidel

GridLab 5M EU Project

• Code/User/Infrastructure should be aware of environment
  – Discover resources available NOW, and their current state
    • What is my allocation?
    • What is the bandwidth/latency between sites?
• Code/User/Infrastructure should be able to make decisions
  – A slow part of my simulation can run asynchronously…spawn it off!
  – New, more powerful resources just became available…migrate there!
  – Machine went down…reconfigure and recover!
  – Need more memory (or less!)…get it by adding (dropping) machines!
• Code/User/Infrastructure should be able to publish to central server for tracking, monitoring, steering…
  – Unexpected event…notify users!
  – Collaborators from around the world all connect, examine simulation.
  – Rethink algorithms: Task farming, vectors, pipelines, etc. all apply on Grids… The Grid IS your Computer!

Source: Ed Seidel
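The decisions listed above can be read as a small set of rules applied to the application's view of its environment. The Python sketch below is only an illustration of that reading, not a GridLab or GAT interface: the status keys, the action strings, the function name `decide`, and the rule that machines are dropped when less than half the available memory is needed are all assumptions.

```python
def decide(status):
    """Map an environment snapshot to adaptive actions (illustrative rules only)."""
    actions = []
    if status.get("machine_down"):
        # Unexpected event: recover and tell the users.
        actions.append("reconfigure and recover")
        actions.append("notify users")
    if status.get("faster_resource_available"):
        actions.append("migrate")
    if status.get("slow_independent_task"):
        actions.append("spawn asynchronously")
    needed = status.get("memory_needed_gb", 0)
    available = status.get("memory_available_gb", 0)
    if needed > available:
        actions.append("add machines")
    elif needed < available / 2:   # assumed threshold for releasing resources
        actions.append("drop machines")
    return actions


if __name__ == "__main__":
    print(decide({"machine_down": True}))
    print(decide({"faster_resource_available": True,
                  "memory_needed_gb": 64, "memory_available_gb": 32}))
```

A real application would obtain the status from resource discovery and monitoring services rather than from a hand-written dictionary.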

Ed’s Conclusions

• Optical Networks, grids promise new ways of computing
  – Networks need application toolkits, reasonable cost model
  – Standards developing
  – 15 years ago: parallel computing drove interconnects, HPF, MPI
• Now: 2 levels...OGSA grid services, SAGA for apps
  – GridLab: www.gridlab.org
  – Grid Application Toolkit: www.gridlab.org/GAT
    • Documentation, publications, software download
  – Cactus Computational Toolkit: www.cactuscode.org
  – GGF “Simple API for Grid Applications” (SAGA)
  – Today, SAGA continues as an active research group in the Open Grid Forum (OGF)
  – Paper presentation on GAT/SAGA at TeraGrid 08 last week

Source: Ed Seidel

CLADE 2006, Paris

• Keynote Presentation
  – Terry Harmer, Technical Director of the Belfast e-Science Centre (BeSC)
  – Gridcast - a Next Generation Broadcasting Infrastructure?
• Media broadcasting
  – BBC has offices in most world capitals
  – Large scale, distributed, dynamic, highly reactive management of broadcast content
  – Prototype broadcasting grid has been deployed since 2004
  – UK e-Science project
    • 50% of funding for UK e-Science centers must come from industry

Broadcasting is distributed
Undergoing rapid technical change

• Grid can potentially address technical challenges
  – Secure, wide area distribution of high volume content
  – Secure remote access to high value technical resources
    • Advanced editing suites
  – Integration of devices, equipment, applications
  – Economic challenges to deliver cost-effective, resilient, extensible infrastructure in rapidly changing environment
• BBC wanted to move to commodity infrastructure
  – 280 gig per hour in data movement
• Grid as integration framework
  – Tie together various platforms
  – Deploy software
  – Not really for computing at this stage
• 13 May, 2008
  – BeSC awarded over £900,000 to continue its role in developing the successor to the world wide web
  – Use of grid via Gridcast provides greater programming autonomy among BBC sites

Source: Terry Harmer

CLADE 2007, Monterey, CA

• Keynote Presentation
  – Scott Oster, Ohio State University
  – The Cancer Biomedical Informatics Grid: Connecting the Cancer Research Community
• Goal: Relieve suffering due to cancer by 2015
• 61 cancer labs supported by the National Cancer Institute (NCI)
  – More than 50 of these, 30 organizations, 800 people involved in caBIG
  – Create scalable, actively managed organization that will connect members of the NCI-supported cancer enterprise by building a biomedical informatics network

caBIG Motivation

• This year there will be approximately 1,400,000 Americans diagnosed with cancer
• More than 500,000 Americans are expected to die from cancer this year
• In 2005, the NIH estimated costs for cancer at $209.9 billion, with direct medical costs of $74 billion

Source: Scott Oster

What is caBIG?

• Common, widely distributed infrastructure that permits the cancer research community to focus on innovation
• Shared, harmonized set of terminology, data elements, and data models that facilitate information exchange
• Collection of interoperable applications developed to common standards
• Cancer research data available for mining and integration

Source: Scott Oster

Driving Needs

• A multitude of “legacy” information systems, most of which cannot be readily shared between institutions
• Difficulty in identifying and accessing available resources
  – Approach: standards-based grid, WSRF web services, Introduce
    • But standards in Web/Grid service domain are turbulent at best
  – Competing interests of “big business” and multiple standards bodies
• An absence of tools to connect different databases
• An absence of common data formats
  – Approach: Adopt XML as data exchange format
  – Cancer Data Standards Repository (caDSR) captures logical model with annotations; facilitates reuse and formal definition
• A huge and growing volume of data must be collected, analyzed, and made accessible
  – GridFTP, move services to data
• Few common vocabularies, making it difficult, if not impossible, to interlink diverse research and clinical results

Source: Scott Oster

• An absence of information infrastructure to share data within an institution, or among different institutions
  – If cancer is cured, and caBIG resources play a role, there will be much interest in knowing who contributed what (and who funded them)
  – Technical Approach
    • Single sign on, Grid Authentication and Authorization with Reliably Distributed Services (GAARDS)
    • Federated Identity Management (Dorian)
    • Authorization solutions
      – GridGrouper for group-based
      – CSM for local policy
      – Globus PDPs for complex rules
  – Institutional Review Boards (IRB) involved for any protected health information (PHI); even for de-identified data
    • Grid is multi-institutional which means IRBs must reach agreements (read: separately employed lawyers working together)
  – Socio-Cultural Approach
    • Whole workspace in caBIG dedicated to it (DSIC)
    • NCI in a good position to “encourage” it
      – Large percentage of institutions’ cancer research funding comes from NCI
      – Hope is motivation will be value-based once initially primed

Source: Scott Oster

Scott’s Summary

• The bad news:
  – Large-scale, distributed knowledge sharing is hard
• The good news:
  – The potential rewards are large
• The good news (for computer scientists):
  – There are lots of unsolved problems (and interest in getting them solved)

Source: Scott Oster


TeraGrid Science Gateways


Phenomenal Impact of the Internet on Worldwide Communication and Information Retrieval

• Implications on the conduct of science are still evolving
  – 1980’s, Early gateways, National Center for Biotechnology Information BLAST server, search results sent by email, still a working portal today
  – 1992 Mosaic web browser developed
  – 1995 “International Protein Data Bank Enhanced by Computer Browser”
  – 2004 TeraGrid project director Rick Stevens recognized growth in scientific portal development and proposed the Science Gateway Program
• Simultaneous explosion of digital information
  – Analysis needs in a variety of scientific areas
  – Sensors, telescopes, satellites, digital images and video
  – #1 machine on Top500 today is more powerful than all combined entries on the first list in 1993

Only 16 years since the release of Mosaic!

1998 Workshop Highlights Early Impact of Internet on Science

• Shared access to geographically dispersed resources
• Assembling the best minds to tackle the toughest problems regardless of location
• Tackling the same problems differently, but also tackling different problems
• Not only the scope, but the process of scientific investigation is changed
  – “As the chemical applications and capabilities provided by collaboratories become more familiar, researchers will move significantly beyond current practice to exciting new paradigms for scientific work”

Requirements for future success include:
- Development of interdisciplinary partnerships of chemists and computer scientists
- Flexible and extensible frameworks for collaboratories
- Means to deploy, support, and evaluate collaboratories in the field

Rapid Advances in Web Usability

• First generation
  – Static Web pages
• Second generation
  – Dynamic, database interfaces, cgi
  – Lacked the ease of use of desktop applications
• Third generation
  – True networked and internetworked applications that enable dynamic two-way, even multi-way, communication and collaboration on the Web.
  – Remarkable new uses of the Web in the organizational workplace and on the Internet

Source: Screen Porch White Paper, The University of Western Ontario (1996)

The Internet as a Resource for News and Information about Science: Summary of Findings at a Glance

40 million Americans rely on the internet as their primary source for news and information about science.

For home broadband users, the internet and television are equally popular as sources for science news – and the internet leads the way for young broadband users.

The internet is the source to which people would turn first if they need information on a specific scientific topic.

The internet is a research tool for 87% of online users. That translates to 128 million adults.

Consumers of online science information are fact-checkers of scientific claims. Sometimes they use the internet for this, other times they use offline sources.

Convenience plays a large role in drawing people to the internet for science information.

Happenstance also plays a role in users’ experience with online science resources. Two-thirds of internet users say they have come upon news and information about science when they went online for another reason.

Those who seek out science news or information on the internet are more likely than others to believe that scientific pursuits have a positive impact on society.

Internet users who have sought science information online are more likely to report that they have higher levels of understanding of science.

Between 40% and 50% of internet users say they get information about a specific topic using the internet or through email.

Search engines are far and away the most popular source for beginning science research among users who say they would turn first to the internet to get more information about a specific topic.

Half of all internet users have been to a website which specializes in scientific content.

Fully 59% of Americans have been to a science museum in the past year.

Science websites and science museums may serve effectively as portals to one another.

The convenience of getting scientific material on the web opens doors to better attitudes and understanding of science.

Source: John B. Horrigan, Associate Director, November 20, 2006
http://www.pewinternet.org/pdfs/PIP_Exploratorium_Science.pdf

NSF (my sponsor) has long recognized the importance of science and technology interactions

• Interdisciplinary programs did much to facilitate application-technology integration and develop standard tools
  – 1997 PACI Program
    • “Shotgun marriages” of technologists and application scientists
      – A few groups served as path finders and benefited tremendously
      – NPACI neuroscience thrust in 1997 leads to Telescience portal and BIRN in 2001
  – Information Technology Research (ITR)
  – NSF Middleware Initiative (NMI)
    • Plug and play tools so more groups can benefit

NSF Continues Its Leadership Today
What Will Lead to Transformative Science?

• “Virtual environments have the potential to enhance collaboration, education, and experimentation in ways that we are just beginning to explore.”
• “In every discipline, we need new techniques that can help scientists and engineers uncover fresh knowledge from vast amounts of data generated by sensors, telescopes, satellites, or even the media and the Internet.”

Gateways are a terrific example of interfaces that can support transformative science

Evolution of the Gateway Program

• 2004 “TeraGrid Science Gateway” term originates
  – We will help them build gateway portals that leverage TeraGrid capabilities and provide web-based interfaces to community tools
• 2005 Gateway requirements analysis team
  – Areas of identified commonality include:
    • Web services, auditing, community accounts, flexible allocations, scheduling, outreach
  – Needs of command-line supercomputing users fairly well defined
    • Ssh to tg-login
    • Data transfer to and from supercomputer
    • Software
      – MPI, math libraries, domain software
    • Compilers
    • Batch queue submission
    • Help desk
  – Need to address Gateway developer needs just as efficiently

Tremendous Opportunities Using the Largest Shared Resources - Challenges too!

• What’s different when the resource doesn’t belong just to me?
  – Resource discovery
  – Accounting
  – Security
  – Proposal-based requests for resources (peer-reviewed access)
    • Code scaling and performance numbers
    • Justification of resources
    • Gateway citations
• Tremendous benefits at the high end, but even more work for the developers
• Potential impact on science is huge
  – Small number of developers can impact thousands of scientists
  – But need a way to train and fund those developers and provide them with appropriate tools

Ongoing Work to Meet Common Needs

• Web Services
  – GT4 deployment, identification of remaining capabilities
  – Information services, MDS
  – Registry of Gateway services
  – TG-specific “where can I run soonest” with QBETS
• Auditing
  – GRAM audit to retrieve usage information for individual compute jobs
  – GridShib
    • Counting gateway users, individualized accounting, increased security
• Community Accounts
  – Policy finalized, security approaches being tested by RPs
  – GridShib development, testing with gateways
• Resource requests
  – Collaboration with reviewers to develop guidelines for Gateway PIs
  – Adapt to usage uncertainties, ability to assess impact, Gateway management structure
• Scheduling
  – Metascheduling
  – On-demand via SPRUCE framework
• Outreach
  – Pathways project
    • Gateway use by educators
    • Training MSI students to build Gateways
• Documentation
  – Extensive wiki information transformed into navigable documentation
• Gateway Hosting
  – Available at IU through peer review
• Staff Support
  – Targeted support, general capabilities, production coordinator
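The auditing and community-account items are easier to picture with a small example. When many gateway users run jobs under one shared community account, the resource provider sees a single account, so the gateway has to keep its own record of which user was behind each job. The Python sketch below shows that bookkeeping idea only. It does not use or imitate the GRAM audit or GridShib interfaces, and the class name, method names, account name, and job data are all hypothetical.

```python
from collections import defaultdict


class CommunityAccountLedger:
    """Gateway-side record of who ran what under a shared community account."""

    def __init__(self, community_account):
        self.community_account = community_account
        self._jobs = []   # list of (job_id, gateway_user, cpu_hours)

    def record_job(self, job_id, gateway_user, cpu_hours):
        """Attach the individual gateway user to a job run as the community account."""
        self._jobs.append((job_id, gateway_user, cpu_hours))

    def usage_by_user(self):
        """Individualized accounting: total CPU hours per gateway user."""
        totals = defaultdict(float)
        for _, user, hours in self._jobs:
            totals[user] += hours
        return dict(totals)

    def distinct_users(self):
        """Count gateway users, which the shared account alone cannot reveal."""
        return len({user for _, user, _ in self._jobs})


if __name__ == "__main__":
    ledger = CommunityAccountLedger("example_community")   # hypothetical account name
    ledger.record_job("job-1", "alice", 12.0)
    ledger.record_job("job-2", "bob", 3.5)
    ledger.record_job("job-3", "alice", 4.0)
    print(ledger.usage_by_user())
    print(ledger.distinct_users())
```

With a record like this, a gateway can report user counts and per-user usage in a resource request even though every job ran under the same account.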

Variety of Gateways Available Today

Title | Discipline
Open Science Grid (OSG) | Advanced Scientific Computing
Special PRiority and Urgent Computing Environment (SPRUCE) | Advanced Scientific Computing
Massive Pulsar Surveys using the Arecibo L-band Feed Array (ALFA) | Astronomical Sciences
National Virtual Observatory (NVO) | Astronomical Sciences
Linked Environments for Atmospheric Discovery (LEAD) | Atmospheric Sciences
Computational Chemistry Grid (GridChem) | Chemistry
Computational Science and Engineering Online (CSE-Online) | Chemistry
Network for Earthquake Engineering Simulation (NEES) | Earthquake Hazard Mitigation
GEOsciences Network (GEON) | Earth Sciences
Network for Computational Nanotechnology and nanoHUB | Emerging Technologies Initiation
TeraGrid Geographic Information Science Gateway (GISolve) | Geography and Regional Science
CIG Science Gateway for the Geodynamics Community | Geophysics
QuakeSim | Geophysics
The Earth System Grid (ESG) | Global Atmospheric Research
National Biomedical Computation Resource (NBCR) | Integrative Biology and Neuroscience
Developing Social Informatics Data Grid (SIDGrid) | Language, Cognition, and Social Behavior
Neutron Science TeraGrid Gateway (NSTG) | Materials Research
Biology and Biomedicine Science Gateway | Molecular Biosciences
Open Life Sciences Gateway (OLSG) | Molecular Biosciences
The Telescience Project | Neuroscience Biology
Grid Analysis Environment (GAE) | Physics
SCEC Earthworks Project | Seismology
TeraGrid Visualization Gateway | Visualization, Graphics, and Image Processing

Easy Gateway True and False Test
Answers Provided

• TeraGrid selects all gateways (F)

• TeraGrid designs all gateways (F)

• TeraGrid limits the number of gateways (F)

• All gateways need TeraGrid funding to exist (F)

• Any PI can request an allocation and use it to develop a gateway (T)

• Gateway design is community-developed and that is the core strength of the program (T)

• TeraGrid staff are alerted to gateway work when a proposal is reviewed or when a community account is requested (T)

• Limited TeraGrid support can be provided for targeted assistance to integrate an existing gateway with TeraGrid (T)


Gateway Idea Resonates with Scientists

• Capabilities provided by the Web are easy to envision because we use them in everyday life
• Researchers can imagine scientific capabilities provided through a familiar interface
• Groups resonate with the fact that gateways are designed by communities and provide interfaces understood by those communities
  – But also provide access to greater capabilities on the back end without the user needing to understand the details of those capabilities
  – Scientists know they can undertake more complex analyses and that’s all they want to focus on
• But this seamless access doesn’t come for free. It all hinges on very capable developers.

Gateways Greatly Expand Access

• Almost anyone can investigate scientific questions using high end resources
  – Not just those in the research groups of those who request allocations
• Fosters new ideas, cross-disciplinary approaches
• Encourages students to experiment
• But used in production too
  – Increasing number of papers resulting from the use of gateways
  – Scientists can focus on challenging science problems rather than challenging infrastructure problems

Highlights: nanoHUB Explosive User Growth

• In past 12 months
  – 68,975 users
    • 43% from U.S.
  – 25,187 course downloads
  – 8,287 podcast downloads
  – 371 online meetings
• Full featured gateway
  – Simulation tools, curricula, multimedia, user contributions, collaborations

Page 44: CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008

Highlights: LEAD Inspires StudentsAdvanced capabilities regardless of location

• A student gets excited about what he was able to do with LEAD

• “Dr. Sikora:Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February 2007. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy!”

• Eric (email, March 2007)


Page 45: CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008

Highlights: GridChem Employs a Client-Server Approach…


Page 46: CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008

…for Production Science

• Chemical Reactivity of the Biradicaloid (HO...ONO) Singlet States of Peroxynitrous Acid. The Oxidation of Hydrocarbons, Sulfides, and Selenides. Bach, R. D. et al. J. Am. Chem. Soc. 2005, 127, 3140-3155.

• The "Somersault" Mechanism for the P-450 Hydroxylation of Hydrocarbons. The Intervention of Transient Inverted Metastable Hydroperoxides. Bach, R. D.; Dmitrenko, O. J. Am. Chem. Soc. 2006, 128(5), 1474-1488.

• The Effect of Carbonyl Substitution on the Strain Energy of Small Ring Compounds and their Six-member Ring Reference Compounds. Bach, R. D.; Dmitrenko, O. J. Am. Chem. Soc. 2006, 128(14), 4598.

• Azide Reactions for Controlling Clean Silicon Surface Chemistry: Benzylazide on Si(100)-2 × 1. Semyon Bocharov et al. J. Am. Chem. Soc., 128 (29), 9300-9301, 2006.

• Chemistry of Diffusion Barrier Film Formation: Adsorption and Dissociation of Tetrakis(dimethylamino)titanium on Si(100)-2 × 1. Rodriguez-Reyes, J. C. F.; Teplyakov, A. V. J. Phys. Chem. C; 2007; 111(12); 4800-4808.

• Computational Studies of [2+2] and [4+2] Pericyclic Reactions between Phosphinoboranes and Alkenes. Steric and Electronic Effects in Identifying a Reactive Phosphinoborane that Should Avoid Dimerization. Thomas M. Gilbert and Steven M. Bachrach. Organometallics, 26 (10), 2672-2678, 2007.


Page 47: CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008

cancer Biomedical Informatics Grid

Addressing today’s challenges in cancer research and treatment

• The mission of caBIG™ is to develop a truly collaborative information network that accelerates the discovery of new approaches for the detection, diagnosis, treatment, and prevention of cancer, ultimately improving patient outcomes.

• The goals of caBIG™ are to:

• Connect scientists and practitioners through a shareable and interoperable infrastructure

• Develop standard rules and a common language to more easily share information

• Build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.

Source: cabig.cancer.gov

Page 48: CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008

caBIG and TeraGrid

•caBIG conducted study of all Gateways

– Pleased to discover that community accounts and web services will exactly meet their requirements

•TeraGrid resources incorporated into geWorkbench

– an open source platform for integrated genomics used to

•Load data from local or remote data sources.

•Visualize gene expression and sequence data in a variety of ways.

•Provide access to client- and server-side computational analysis tools such as t-test analysis, hierarchical clustering, self organizing maps, regulatory networks reconstruction, BLAST searches, pattern/motif discovery, etc.

– Clustering is used to build groups of genes with related expression patterns which may contain functionally related proteins, such as enzymes for a specific pathway

•Validate computational hypothesis through the integration of gene and pathway annotation information from curated sources as well as through Gene Ontology enrichment analysis.
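
To make the clustering bullet concrete, here is a minimal sketch of hierarchical (agglomerative) clustering that groups genes by the similarity of their expression profiles. It is an illustration only, not geWorkbench's implementation: the gene names, the expression values, the choice of average linkage, and the use of 1 minus Pearson correlation as the distance are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(a, b):
    """Return 1 - Pearson correlation; co-varying profiles score near 0.

    Assumes both profiles have non-zero variance (a flat profile would
    divide by zero).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / sqrt(var_a * var_b)


def cluster_genes(profiles, n_clusters):
    """Average-linkage agglomerative clustering.

    profiles: dict mapping gene name -> list of expression values.
    Returns a sorted list of sorted gene-name lists, one per cluster.
    """
    clusters = [[gene] for gene in profiles]

    def linkage(pair):
        # Average pairwise distance between the members of two clusters.
        left, right = pair
        dists = [pearson_distance(profiles[g], profiles[h])
                 for g in left for h in right]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters, then repeat.
        left, right = min(combinations(clusters, 2), key=linkage)
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(left + right)
    return sorted(sorted(c) for c in clusters)


# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.0],
}
groups = cluster_genes(profiles, n_clusters=2)
print(groups)
```

In this toy data geneA and geneB rise together while geneC and geneD fall together, so the two pairs end up in separate groups. Real expression matrices are far larger, which is why a gateway to high end resources is attractive for this step.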


Page 49: CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008

geWorkbench Integrates TeraGrid Resources


“Although the new service is TeraGrid-aware, the perspective from geWorkbench does not change. As far as geWorkbench is concerned, it is still connecting to a Hierarchical Clustering caGrid service. The difference is now the caGrid service is a gateway service that submits a TeraGrid job on behalf of geWorkbench. geWorkbench, however, does not notice this difference.”

Source: http://wiki.c2b2.columbia.edu/informatics/index.php/GeWorkbench_Example
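
The pattern in the quote above can be sketched in a few lines: the client codes against one service interface, and a gateway implementation of that interface quietly forwards the work to a remote resource under a shared community account. Everything named here (`ClusteringService`, `FakeTeraGridQueue`, `GatewayClusteringService`, the `hcluster` command, the `gateway_community` account) is hypothetical and stands in for the real caGrid and TeraGrid machinery, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ClusteringService(Protocol):
    """What the client sees: one call, no hint of where the work runs."""

    def submit(self, dataset_name: str) -> str: ...


class LocalClusteringService:
    """Original service: does the work itself."""

    def submit(self, dataset_name: str) -> str:
        return f"clustered {dataset_name}"


@dataclass
class FakeTeraGridQueue:
    """Stand-in for a remote batch system; records jobs instead of running them."""

    jobs: list = field(default_factory=list)

    def run(self, account: str, command: str) -> str:
        self.jobs.append((account, command))
        # Pretend the job ran and return a result for the named dataset.
        return f"clustered {command.split()[-1]}"


@dataclass
class GatewayClusteringService:
    """Same interface, but submits a remote job on the client's behalf."""

    queue: FakeTeraGridQueue
    community_account: str = "gateway_community"  # hypothetical shared account

    def submit(self, dataset_name: str) -> str:
        return self.queue.run(self.community_account, f"hcluster {dataset_name}")


def client_analysis(service: ClusteringService, dataset_name: str) -> str:
    """Client code: identical whichever service it is handed."""
    return service.submit(dataset_name)


queue = FakeTeraGridQueue()
local_result = client_analysis(LocalClusteringService(), "expr.tsv")
gateway_result = client_analysis(GatewayClusteringService(queue), "expr.tsv")
print(local_result == gateway_result, queue.jobs)
```

The design point is that `client_analysis` never changes: swapping in the gateway service is invisible to the caller, while the queue's job record shows that the work was submitted under the community account rather than an individual user's allocation.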

Page 50: CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008

Hide the “C” in CLADE with a Gateway

When is a gateway appropriate?

• Researchers using defined sets of tools in different ways

– Same executables, different input

•GridChem, CHARMM

– Creating multi-scale or complex workflows

– Datasets

• Common data formats

– National Virtual Observatory

– Earth System Grid

– Some groups have invested significant efforts here

•caBIG, extensive discussions to develop common terminology and formats

•BIRN, extensive data sharing agreements

• Difficult to access data/advanced workflows

– Sensor/radar input

•LEAD, GEON

Page 51: CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008

Tremendous Potential for Gateways

• In only 16 years, the Web has fundamentally changed human communication

• Science Gateways can leverage this amazingly powerful tool to:

– Transform the way scientists collaborate

– Streamline conduct of science

– Influence the public’s perception of science

• Reliability, trust, continuity are fundamental to truly change the conduct of science through the use of gateways

– High end resources can have a profound impact

• The future is very exciting!


Page 52: CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008

Thank you for your attention

[email protected]

www.teragrid.org
