june2015 asist rhode_island_2004

Post on 08-Aug-2015


Supporting further and higher education

E-science and Digital Preservation

Neil Beagrie

BL/JISC Partnership Manager

ASIST Annual Conference, Nov 2004

E-science Panel

2

Overview

• Apologies for absence (and thanks to Gail for presenting!)

• Trends and implications

– Data growth

– e-Research and collection-based science

– The Grid

– New publishing roles for datasets

– Digital preservation

• Digital Curation Centre

– What the funders are looking for

3

Growth of Scientific Data and Data Curation

• In the next 5 years, e-Science will produce more data than has been collected in the whole of human history

• Data growth – Protein Data Bank (1972-2003)

4

Implications

• Core Funding for institutions will not grow in line with information growth

• Need for more automation and tools

• Need for new shared services

– lower the curation cost for disciplines

– accelerate knowledge transfer

• Significant need for R&D and investment now to prepare for this

5

Collection-based Science (1)

• National Science Foundation Advisory Panel on Cyberinfrastructure

– “The importance of data in science and engineering continues on a path of exponential growth; some even assert that the major science driver of high end computing will soon be data…Collecting, organizing, storing, and providing access to vast quantities of data and other information (such as scholarly publications) is becoming as important as simulation has been and will likely grow faster over the next decade.”

6

Collection-based Science (2)

• NSF Advisory Panel on Cyberinfrastructure

• “To succeed NSF must… ensure that the exponentially growing amounts of data are collected, curated, managed, and stored for broad long-term access by scientists everywhere.”

• “Data Repositories...Providing access to observational and other data entails far more than attaching a lot of disks to a server that is on the Internet.”

• “R&D centers could be established for addressing common issues…there may be advantages to grouping applied research, development, and operations within a common organization and geographic location.”

7

The Grid

• ‘The Grid is a software infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources’ (Foster, Kesselman and Tuecke)

• Includes computational systems, data storage resources, digital libraries and specialized facilities

8

e-Government and the Grid

‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’

Tony Blair, 2002

Implications for digital preservation: Grids could enable better replication, preservation and access

9

Data Publishing

In some subjects, databases are wholly or partly replacing journal publications as a medium of communication

– These databases are built and maintained with a great deal of human effort

– Scale of effort and supporting infrastructure varies

– may have discipline-wide scope and dedicated “curators”

– They may not contain primary data. Sometimes just value-added annotation/metadata

– They borrow/exchange extensively, and refer to other databases and journal articles

– May have evolved from supporting/internal facing role to publishing to external audiences

10

Ordnance Survey

• Publication in paper editions at different scales since 1791.

• Computerisation first designed to assist in workflow of paper publication.

• OS National Topographic Database (NTD)

• For large-scale mapping, paper editions now discontinued. NTD is the map - continuously updated and printed remotely on demand.

11

Digital Preservation

“Digital documents last forever - or five years, whichever comes first” (Jeff Rothenberg 1997)

BBC Domesday System

12

Organisational and technical challenges

“….I have data files from projects from years ago which are on disks I no longer have a drive for on computers I no longer have access to or are no longer made or the software/operating system changes would make it extremely difficult to access any more…. the nature of research work means a lot of short-term researchers over the years …Also as PIs move around and collaborate with many people in other organisations it is pretty difficult to go back more than a few years with confidence that data will be adequately archived.”

(Interview quote from UK-based Professor cited in JISC Audit of e-Science Curation report)

13

Digital Curation Centre

• Joint funding from JISC and the e-Science Core Programme

• Three year initial funding - $6m

• Awarded to Consortium of Edinburgh, Glasgow, CCLRC, UKOLN

• Not a data centre – will provide generic support services and research

• DCC officially launched 5th November 2004

14

What the DCC funders are looking for

• Research into data curation and preservation issues

• Advisory services in best practice and a repository for tools, software and documentation

• DCC is not being funded to set up its own data repository

• DCC will need to work with key data centres, repositories and libraries to engage the relevant communities

15

Further information

• Digital Curation Centre www.dcc.ac.uk

• The Continuing Access and Digital Preservation Strategy for the UK Joint Information Systems Committee (JISC)

http://www.dlib.org/dlib/july04/beagrie/07beagrie.html
