1

Data Preservation Imperatives: The Role of the US National Science Foundation

Lucy Nowell, Ph.D.

Office of Cyberinfrastructure

Conference on Permanent Access to the Records of Science

Brussels, Belgium

15 November 2007

2

Outline

• NSF Office of Cyberinfrastructure

• Motivation for Data Preservation

• Role of Universities and Academic Libraries

• Characteristics of the Digital Age

• NSF OCI Data Strategic Vision and Goals

3

4

NSF Act of 1950

• “To promote the progress of science…”

• Encourage & develop a national policy for the promotion of basic research and education in the math, physical, medical, biological, engineering and other sciences

• Initiate & support basic scientific research in the sciences

5

U.S. President

• Science Advisor, Office of Science and Technology Policy

• Office of Management and Budget

• Other boards, councils, etc.

Major Departments

• Agriculture

• Commerce

• Defense

• Energy

• Health and Human Services

• Homeland Security

• Interior

Independent Agencies

• National Aeronautics and Space Administration

• Environmental Protection Agency

• Smithsonian Institution

• Nuclear Regulatory Commission

• Other agencies

6

Research Directorates

• Biological Sciences

• Computer & Info. Science & Eng.

• Education & Human Resources

• Engineering

• Geosciences

• Mathematical & Physical Sciences

• Social, Behavioral & Econ. Sciences

Offices

• CyberInfrastructure

• Integrative Activities

• Polar Programs

• International Science and Engineering

National Science Foundation Director

Deputy Director

National Science Board

7

New Modes of Investigation

The conduct of science and engineering is changing and evolving. This is due, in large part, to the expansion of networked cyberinfrastructure …

NSF Strategic Plan 2006-2011

8

Office of CyberInfrastructure (OCI)

Dan Atkins, Office Director

José Muñoz, Dep. Office Dir.

Terry Langendoen

Lucy Nowell

Diana Rhoten

Kevin Thompson

Judy Hayden

Mary Daley, Irene Lombardo, Deborah White

Steve Meacham, Abani Patra

Data

Learning & Workforce

Virtual Organizations

Software/Middleware

High Performance Computing

9

Cyberinfrastructure …

… is the organized aggregate of technologies that enable us to access and integrate today’s information technology resources—data and storage, computation, communication, visualization, networking, scientific instruments, expertise—to facilitate science and engineering goals.

- Fran Berman, Director, SDSC

10

CI Vision: 4 Interrelated Perspectives

• Data, Data Analysis & Visualization

• High Performance Computing

• Collaboratories, Observatories & Virtual Organizations

• Learning & Workforce Development

11

The Fragility of Memory in a Digital Age

Report of the Task Force on Archiving of Digital Information

Commission on Preservation and Access and the Research Libraries Group

“In 1964, the first electronic mail message was sent from either MIT, the Carnegie Institute, or Cambridge University. The message does not survive, however, and so there is no documentary record to determine which group sent the pathbreaking message.”

12

NASA plans new search for missing moon tapes

Aug. 15, 2006, 5:13PM

Seth Borenstein, Associated Press

WASHINGTON —NASA said today it was launching an official search for more than 13,000 original tapes of the historic Apollo moon missions.

Study                        Resource type                            Resource half-life
Koehler (1999 and 2002)      Random Web pages                         2.0 years
Nelson and Allen (2002)      Digital Library Object                   24.5 years
Harter and Kim (1996)        Scholarly Article Citations              1.5 years
Rumsey (2002)                Legal Citations                          1.4 years
Markwell and Brooks (2002)   Biological Science Education Resources   4.6 years
Spinellis (2003)             Computer Science Citations               4.0 years

Source: Koehler W. (2004) Information Research, 9 (2), 174
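As a rough illustration of what the half-lives listed above imply, the sketch below assumes simple exponential decay. That is a simplification introduced here, not a claim from Koehler's study. It estimates the share of resources still reachable after a given number of years. The function name `surviving_fraction` is hypothetical; the half-life values are copied from the figures above.

```python
# Half-lives in years, copied from the figures above (Koehler, 2004).
HALF_LIFE_YEARS = {
    "Random Web pages": 2.0,
    "Digital Library Object": 24.5,
    "Scholarly Article Citations": 1.5,
    "Legal Citations": 1.4,
    "Biological Science Education Resources": 4.6,
    "Computer Science Citations": 4.0,
}


def surviving_fraction(years: float, half_life_years: float) -> float:
    """Fraction of resources still available after `years`.

    Assumes exponential decay, which is an illustrative simplification.
    """
    if years < 0 or half_life_years <= 0:
        raise ValueError("years must be >= 0 and half_life_years must be > 0")
    return 0.5 ** (years / half_life_years)


if __name__ == "__main__":
    # Compare how much of each resource type would remain after a decade.
    for resource, half_life in HALF_LIFE_YEARS.items():
        share = surviving_fraction(10.0, half_life)
        print(f"{resource}: {share:.1%} expected to remain after 10 years")
```

Under this assumption, a collection with a 2.0-year half-life would keep about one quarter of its items after 4 years, which is the kind of loss that motivates deliberate preservation.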

14

Replication of Results: A Cornerstone of Science

“…the results of one scientist's experiment are not considered reliable until another scientist has replicated them. The reproducibility of results plays several different, crucial roles in science…[but] in many circumstances, considerations of time and money often make reproducibility impractical.”

The Key Role of Replication in Science, Nancy S. Hall, The Chronicle of Higher Education, 10 November 2000

15

Replication of Results

• First and foremost, scientists attempt to reproduce someone else's experiment if they doubt that the results are accurate, or if the results contradict a view that is widely accepted in the field.

• An experiment is so reproducible that replicating it becomes a test of the student; if the student cannot replicate the experiment, it is the student who is at fault.

• As a training exercise, a new person [in a group] might be asked to repeat experiments that others have already performed, both to familiarize the newcomer with the work of the group and to give the older members a sense of the newcomer's expertise.

The Key Role of Replication in Science, Nancy S. Hall, The Chronicle of Higher Education, 10 November 2000

16

Replication of Data Collection Not Always Feasible

• Medical experiments carried out over years or decades, involving hundreds or even thousands of human subjects.

• Events that are singular and beyond the experimenter's control, like comets, earthquakes, and volcanic eruptions.

The Key Role of Replication in Science, Nancy S. Hall, The Chronicle of Higher Education, 10 November 2000

17

A Global Response

“Ensuring research data are easily accessible, so that they can be used as often and as widely as possible, is a matter of sound stewardship of public resources.”

Organization for Economic Cooperation and Development (OECD), “Promoting Access to Public Research Data for Scientific, Economic, and Social Development”

18

A Challenge for Society

“If we are effectively to preserve for future generations the … corpus of information in digital form that represents our cultural record, we need … to commit ourselves technically, legally, economically, and organizationally to the full dimensions of the task.”

Report of the Task Force on Archiving of Digital Information, 1996

Commission on Preservation and Access and the Research Libraries Group

19

The Universities

“Ever since their inception, universities have been occupied with the fundamental elements of what we now call 'knowledge management', i.e. the creation, collection, preservation and dissemination of knowledge.”

Andre Oesterlinck, Knowledge Management in Post-Secondary Education: Universities

20

The distinctive mission of the University is to serve society as a center of higher learning, providing long-term societal benefits through transmitting advanced knowledge, discovering new knowledge, and functioning as an active working repository of organized knowledge.

Mission Statement of the University of California

21

The Academic Libraries

“It is to the research library community that others will look for the preservation of … digital assets, as they have looked to us in the past for reliable, long-term access to the ‘traditional’ resources and products of research and scholarship.”

Association of Research Libraries (ARL) Strategic Plan 2005-2009

22

Information is the currency of the digital age and information integration is the means for mobilizing that currency for discovery, innovation, learning, and progress.

23

24

25

26

27

Before the Digital Age: A World Constrained to 4 Dimensions

[Figure: axes x, y, z and time t]

28

[Figure: axes x, y, z and time t, with CI added as the 5th dimension]

29

Opening a 5th dimension through cyberinfrastructure is the revolutionary force of the digital age …

30

Characteristics of a 5D World (in priority order):

1. Time and place are no longer barriers to participation and interaction

2. Access is open to specialists and non-specialists alike

3. Information is the primary driver for progress

4. The realm of the possible is expanded through new capabilities, resources, and mechanisms

31

Individuals, groups, organizations, and nations that don’t embrace the 5th dimension will fall behind in the digital age

32

The World Is Flat

- Thomas Friedman

• More room for innovation

• New spaces for learning and discovery

• Expanded opportunities for collaboration and interaction

• Greater capabilities for research and education

The flat world is expanding

- Anonymous OCI program director

33

NSF Draft Strategic Plan for Data, Data Analysis, and Visualization

Chapter 3

http://www.nsf.gov/pubs/2007/nsf0728/index.jsp

34

Vision

• “Science and engineering digital data are routinely deposited in a well-documented form, are regularly and easily consulted and analyzed by specialists and non-specialists alike, are openly accessible while suitably protected, and are reliably preserved.”

• NSF Cyberinfrastructure Vision for 21st Century Discovery, Chapter 3

35

Goals

• To catalyze the development of a system of science and engineering data collections that is open, extensible and evolvable.

• To support development of a new generation of tools and services facilitating data acquisition, mining, integration, analysis, and visualization.

36

Principles

• Data generated with NSF funding will be accessible and reliably preserved

• Research/education opportunities determine investment priorities

• Broad community engagement is necessary in reviewing and prioritizing data activities

37

Principles (cont’d)

• Data is only useful if it can be found, understood, and analyzed

• Legitimate privacy, confidentiality, and intellectual property rights must be protected

• International, interagency, and public-private partnerships are essential

38

Digital Data Preservation and Access Framework

Federal

State

Local

International

Non-profit

College

University

USER

Commercial

Multi-Sector

Nimble

Sustainable

Reliable

User-centric

39

DataNet

• A robust and resilient national and global digital data framework for preservation and access to the resources and products of the digital age

• Provide reliable digital preservation, access, integration and analysis capabilities for science and/or engineering over a decades-long timeline: sustainability

• Continuously anticipate and adapt to changes in technologies & user needs and expectations

• Engage at the frontiers of science & engineering research & education, with research & development to drive the leading edge forward

• Serve as component elements of an interoperable data preservation and access network, spanning national and international boundaries: shared governance and standards

• Creation of new types of organizations that fully integrate all of these capabilities

40

DataNet Partners

• Combine expertise in library and archival sciences; computer, computational and information sciences; cyberinfrastructure; and domain sciences and engineering

• Develop models for economic and technological sustainability over multiple decades

• Engage at the frontiers of science and engineering research and education

• Work cooperatively and in coordination to create a functional data network with revolutionary new capabilities for information access, use, and integration without regard to conventional barriers such as data type and format, discipline or subject area, and time and place/institution.

41

DataNet Partner Responsibilities

• Provide for full data management life cycle

  • Data deposition/acquisition/ingest

  • Data curation & metadata management

  • Data protection, including privacy

  • Data discovery, access, use, & dissemination

  • Data interoperability, standards, & integration

  • Data evaluation, analysis, & visualization

• Engage in research central to DataNet responsibilities

• Education & training

• Community & user input assessment

• International engagement – collaborate & coordinate closely with preservation & access organizations to catalyze formation of a global data network

  • Foreign collaborators are expected to secure support from their own national sources.

42

Summary Strategic Plan

• Promote a change in culture

• Catalyze development of a national digital data framework

• Support new generations of tools, services, and capabilities

43

NSFNet Traffic, September 1991

44

The World Wide DataNet @ T=T0

= Data point-of-presence

45

The World Wide DataNet @ T=TN

46

The Whole Is Greater Than the Sum of Its Parts

• Climate Change

• Pandemic

• Drought and Starvation

• Sustainable Energy

• Aging Populations

• Human Behavior under Stress

• Etc.

47

Thank you!