Perspectives on Cyberinfrastructure
Daniel E. [email protected]
Professor, University of Michigan
School of Information & Dept. of EECS
October 2002
Input to Panel
• 62 presentations at invitational public testimony sessions
• 700 responses to a community-wide survey
• Review of dozens of prior relevant reports; scores of unsolicited emails and phone calls
• 250 pages of written critique from 60 reviewers of an early draft of this report
• Hundreds of hours of deliberation and discussion between Panel members
• The members of the Panel have backgrounds in areas widely relevant to creating, managing, and using advanced cyberinfrastructure.
Report Flow
(Cyber) infrastructure
• The term infrastructure has been used since the 1920s to refer collectively to the roads, bridges, rail lines, and similar public works that are required for an industrial economy to function.
• The recent term cyberinfrastructure refers to an infrastructure based upon computer, information and communication technology (increasingly) required for discovery, dissemination, and preservation of knowledge.
• Traditional infrastructure is required for an industrial economy. Cyberinfrastructure is required for an information economy.
Cyberinfrastructure: the Middle Layer
Base-technology: computation, storage, communication
Cyberinfrastructure: hardware, software, personnel, services, institutions
Applications in science and engineering research and education
Enabling and Motivating a CI Initiative
ASC PACI’s
Pittsburgh TSC
Distributed Terascale Facility
Some ITR Projects
Digital Library Initiatives
Networking Initiatives
Middleware Initiatives
Other CISE Research
Cyberinfrastructure Initiative
Initiatives in non-CISE Directorates
NSB Research Infrastructure Review
Initiatives in DOE, NIH, DOD, NASA, …
International Initiatives: UK e-science, Earth Simulator, EU Grid & 6th Framework
Scientific Data Collection/Curation
Collaboratories
Trends & Issues
• Components
  • Circuit speed flattening in about 6 years, then most increase from improving chip density and massive parallelism. New technology curves?
  • Disk capacity increase 60-100% per year.
  • Networking: 1.6 Terabits/sec running in labs on a single fiber (40 channels at 40 gigabits/sec.).
  • Ubiquitous wireless.
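The component figures above can be checked with a little arithmetic. The Python sketch below multiplies out the per-fiber rate and compounds the quoted disk growth rates. The function names and the five-year horizon are illustrative assumptions, not figures from the Panel.

```python
import math


def fiber_capacity_tbps(channels: int, gbps_per_channel: float) -> float:
    """Aggregate rate of one fiber in terabits/sec (1 Tb = 1000 Gb)."""
    return channels * gbps_per_channel / 1000.0


def growth_factor(annual_rate: float, years: int) -> float:
    """Capacity multiplier after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for capacity to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # 40 channels at 40 gigabits/sec each, as quoted on the slide.
    print("fiber capacity (Tb/s):", fiber_capacity_tbps(40, 40))

    # Disk capacity growing 60% to 100% per year; the 5-year horizon is assumed.
    for rate in (0.60, 1.00):
        print(
            f"growth {rate:.0%}/yr:",
            f"x{growth_factor(rate, 5):.1f} in 5 years,",
            f"doubles every {doubling_time_years(rate):.2f} years",
        )
```

At the low end of the quoted range, disk capacity grows about tenfold in five years. At the high end it doubles every year.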
Computational Diversity
• Capability not just capacity: technology, policy, tools.
• Still need some center-based, leading-edge supercomputers.
• On-demand supercomputing, not just batch.
Content
• Digital everything; exponential growth; conversion and born-digital.
• S&E literature is digital. Microfilm -> digital for preservation. Digital libraries are real and getting better.
• Distributed (global scale), multi-media, multi-disciplinary observation. Huge volume.
• Need for large-scale, enduring, professionally managed/curated data repositories.
• New modes of scholarly communication emerging.
• IP, openness, ownership, privacy, security issues
Converging Streams of Activity
GRIDS (broadly defined)
E-science
CI-enabled Science & Engineering Research & Education
Science-driven pilots (not using above labels)
ITFRU Scholarly communication in the digital age
Futures: The Computing Continuum
National Petascale Systems
Ubiquitous Sensor/actuator Networks
Laboratory Terascale Systems
Ubiquitous Infosphere
Collaboratories
Responsive Environments
Terabit Networks
Contextual Awareness
Smart Objects
Building Out
Building Up
Science, Policy and Education
Petabyte Archives
Components of CI-enabled science & engineering
Collaboration Services
Knowledge management institutions for collection building and curation of data, information, literature, digital objects
High-performance computing for modeling, simulation, data processing/mining
Individual & Group Interfaces & Visualization
Physical World
Humans
Facilities for activation, manipulation and construction
Instruments for observation and characterization.
Global Connectivity
A broad, systemic, strategic conceptualization
Community Planning Guidance Examples from Geosciences
Consultation with environmental community leaders
NSF - Nov. 19, 2001
LIGO
ATLAS and CMS
NVO and ALMA
The number of nation-scale projects is growing rapidly!
Climate Change
Cyberinfrastructure Enabled Science
Instruments
Picture of digital sky
Knowledge from Data
Sensors
Picture of earthquake and bridge
Wireless networks
Personalized Medicine
More Diversity, New Devices, New Applications
Four LHC Experiments: The Petabyte to Exabyte Challenge
ATLAS, CMS, ALICE, LHCb
Higgs + New particles; Quark-Gluon Plasma; CP Violation
Data stored: ~40 Petabytes/Year and UP; CPU: 0.30 Petaflops and UP
0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (2007) (~2012 ?) for the LHC Experiments
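To put the petabyte and exabyte figures on a common scale, here is a minimal Python sketch of the unit arithmetic. It uses decimal units (1 PB = 10^15 bytes, 1 EB = 10^18 bytes, matching the slide's definition of an exabyte). The constant-rate case is only a floor, since the slide says "and UP". The 50% annual growth used in the second case is a purely hypothetical value chosen for illustration, and the function names are likewise assumptions.

```python
PB = 10**15  # bytes in a petabyte (decimal units)
EB = 10**18  # bytes in an exabyte, as defined on the slide


def years_at_constant_rate(target_bytes: int, bytes_per_year: int) -> float:
    """Years to accumulate target_bytes if the yearly volume never grows."""
    return target_bytes / bytes_per_year


def whole_years_with_growth(
    target_bytes: int, first_year_bytes: int, annual_growth: float
) -> int:
    """Whole years until the running total reaches target_bytes when each
    year's volume is (1 + annual_growth) times the previous year's."""
    stored = 0.0
    rate = float(first_year_bytes)
    years = 0
    while stored < target_bytes:
        stored += rate
        rate *= 1.0 + annual_growth
        years += 1
    return years


if __name__ == "__main__":
    yearly = 40 * PB  # ~40 Petabytes/Year, from the slide
    for target_eb in (0.1, 1.0):
        target = int(target_eb * EB)
        print(
            f"{target_eb} EB:",
            f"{years_at_constant_rate(target, yearly):.1f} years at a flat rate,",
            # 0.5 is a hypothetical growth rate, not a figure from the talk
            f"{whole_years_with_growth(target, yearly, 0.5)} years at +50%/yr",
        )
```

At a flat 40 PB/year, 0.1 EB takes 2.5 years and 1 EB takes 25 years. Any sustained growth in the yearly volume shortens the second figure considerably.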
Crab Nebula in 4 spectral regions: X-ray, optical, infrared, radio
Cyberinfrastructure is a First-Class Tool for Science
Network for Earthquake Engineering Simulation
Field Equipment
Laboratory Equipment
Remote Users
Remote Users
High-Performance Network(s)
Instrumented Structures and Sites
Leading Edge Computation
Curated Data Repository
Laboratory Equipment
Global Connections
Need highly coordinated, persistent, major investment in…
• Research and development (CI as object of R&D)
  • Base technology (CISE)
  • CI components & systems (CISE & SEB)
  • Science-driven pilots (CISE, SEB, all others)
• Operational services
  • Distributed but connected (Grid)
  • Exploit commonality, interoperability
  • Advanced, leading-edge but…
  • Robust, predictable, responsive, persistent
• Domain science communities (CI in service of R&D)
  • Specific application of CI to revolutionizing research (pilot -> operational)
  • Required not optional.
  • New things, new ways.
  • Empowerment, training, retraining.
  • X-informatics.
• Education and broader engagement
  • Multi-use: education, public science literacy
  • Equity of access
  • Pilots of broader application: ITFRU, industry, workforce & economic development
Shared Opportunity and Responsibility
• All NSF communities
• Multi-agency
• Industry
• International
From Prime Minister Tony Blair’s Speech to the Royal Society (23 May 2002)
• What is particularly impressive is the way that scientists are now undaunted by important complex phenomena. Pulling together the massive power available from modern computers, the engineering capability to design and build enormously complex automated instruments to collect new data, with the weight of scientific understanding developed over the centuries, the frontiers of science have moved into a detailed understanding of complex phenomena ranging from the genome to our global climate. Predictive climate modelling covers the period to the end of this century and beyond, with our own Hadley Centre playing the leading role internationally.
• The emerging field of e-science should transform this kind of work. It's significant that the UK is the first country to develop a national e-science Grid, which intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.
• One of the pilot e-science projects is to develop a digital mammographic archive, together with an intelligent medical decision support system for breast cancer diagnosis and treatment. An individual hospital will not have supercomputing facilities, but through the Grid it could buy the time it needs. So the surgeon in the operating room will be able to pull up a high-resolution mammogram to identify exactly where the tumour can be found.
Bottom-line
• NSF has a unique responsibility to provide leadership for the Nation in an initiative to revolutionize science and engineering research capitalizing on cyberinfrastructure opportunities.
  • A nascent revolution has begun. Demand is here and growing. The time is now (opportunities & opportunity costs).
  • Many prior investments (projects, initiatives, centers) are a key resource to build upon.
  • Now need sanction, leadership and empowerment through significant new funding and effective coordination.
  • Need very broad (synergistic) participation by many communities with complementary needs and expertise.
  • Need appropriate leadership and management structure.
  • Need incremental funding of $1B/year (continuing).
Incremental budget estimates
• Our estimates are based on
  • current and previous NSF activities
  • testimonies
  • other agencies’ programs in related areas
  • activities in other countries
  • explicit input from community on Draft 1.0
Budget Overview (Incremental in $ Millions)
• Fundamental research to advance CI $60
• Application of CI to advance S&E research $200
• Provision of operational CI $660
• Information and data support $200
• TOTAL $1020
The INITIATIVE = ???
• 1. Advanced Cyberinfrastructure Initiative (ACI)
• 2. Advanced Application and Cyberinfrastructure Initiative (AACI)
• 3. Advanced Cyberinfrastructure and Application Initiative (ACAI)
• 4. Advanced Digital Science and Engineering (ADSE)
• 5. eScience Initiative (eSI)
• 6. Digital Science for the Future (DSF)
• 7. Digital Science and Engineering for the Future (DSEF)
• 8. New Science and Engineering Research (NSER)
• 9. Revolutions in Digital Exploration (RIDE)
• 10. Digital Science and Engineering Exploration (D-SEE)
END
Need Appropriate Organizational Structure
• An INITIATIVE OFFICE with a highly placed, credible leader empowered to
  • Initiate competitive, discipline-driven path-breaking applications within NSF of cyberinfrastructure which contribute to the shared goals of the INITIATIVE.
  • Coordinate policy and allocations across fields and projects. Participants across NSF directorates, Federal agencies, and international e-science.
  • Develop high quality middleware and other software that is essential and special to scientific research.
  • Manage individual computational, storage, and networking resources at least 100x larger than individual projects or universities can provide.