Page 1: ECURE 2005, Phoenix, AZ Faculty Research Data: Informatics and Archiving Sarah M. Pritchard University Librarian University of California, Santa Barbara

ECURE 2005, Phoenix, AZ

Faculty Research Data: Informatics and Archiving

Sarah M. Pritchard

University Librarian

University of California, Santa Barbara

March 1, 2005, ECURE 2005, Phoenix, AZ

Informatics: A Definition

The study of the structure and behavior of natural and artificial systems designed to process data

Development of tools to ingest and interpret large stores of data in heterogeneous and distributed systems

Integration of data (numeric, textual, image, spatial) with tools for modeling, trend analysis, mapping, image processing, etc.

Business applications not studied in this context

Informatics at UCSB

Emergence of informatics as a specialty in several academic departments, notably environmental sciences

Highly interdisciplinary faculty

Development of unique stand-alone systems for managing collaborative research data

No ongoing mechanisms for communication and technical coordination

Campus and consortial projects emerging for digital publications and for instructional support but not yet for research data

Faculty Research Data

Large numeric data sets from physical sciences and laboratory research

Imaging – geosciences, neurosciences

Fieldwork – environmental, archaeological

Customized interpretive and manipulation tools

Drafts, correspondence, notes

UCSB Computing Environment

One of the original nodes of the Internet

No centralized academic computing organization

Offices for networking, and for instructional support

Individual colleges and departments have developed own servers and support for research data and teaching tools

High-level campus policy board for IT issues brings some coordination

UCSB Library Context

Alexandria Digital Library (www.alexandria.ucsb.edu)
• Extension into new disciplinary applications
• Heterogeneous metadata ingest
• Extensive backup and archiving architecture
• Long record of faculty collaboration
• NDIIPP

California Digital Library (www.cdlib.org)
• Digital preservation initiatives for published documents and for (under development) government information web sites
• eScholarship program to support publication of online journals, preprint archives
• Online Archive of California – special collections support

Other faculty support
• Electronic reserves including streaming audio reserves
• Digital document delivery to the desktop

What questions emerge from this?

Why are faculty building informatics systems?

Is valuable research time and funding being spent on tangential work?

Are there commonalities across informatics applications and disciplines?

Is there redundancy in tool development?

Can data be openly accessed or shared?

Are digital library concerns (metadata, IP rights, archiving) incorporated?

Informatics Project Goals

Create stronger linkages among relevant faculty research projects

Identify components and needs in informatics and the management of research data

Assess the degree of commonality in informatics tools and functionality

Determine whether more support is needed for data archiving, metadata, interfaces, IP

Develop a planning agenda for informatics in a distributed environment

Inform the design of facilities and services

Project Components

Background research in current informatics work in academic disciplines

Structured interviews and site visits with selected faculty

Matrix of system characteristics and issues

Informal roundtables for faculty working in these areas

Collaboration with related IT units

White paper for campus discussion of futures

UCSB Informatics: Participants

Faculty chosen on the basis of:
• Innovative science
• Data intensive work
• Interdisciplinary research
• Recommended by the Office of Research, colleagues, department heads, IT offices and librarians.

Control Group: Non-science faculty
• Select group of technologically innovative faculty in other disciplines were used as a control to determine whether trends were specific to sciences

About 40 people interviewed

Sample Questions for Faculty

How do you store research information?

Do you do any cataloging, indexing, or metadata?

How are your data maintained on an on-going basis?

Is there something special about the way that you manage your data compared to colleagues within the field?

Do you write or borrow scripts/tools? For what purpose?

Are you having difficulty managing your data collection? Are there services that you wish others would provide?

How is IP and sharing of datasets/information handled in your field?

When you collaborate with others through the web what kinds of tools, if any, do you use?

What are your plans for this research in the next five years? Are there service requirements that you will need then?

Findings: Growth of Systems

The sophistication of informatics arrangements is determined by the amount of data collected and how labor-intensive it is to collect.

Change happens when the following converge:
• Data size increases exponentially
• Research questions encompass broad range of specialties
• Funding agencies require change for funding

Guiding principles seem to be:
• “What is the smallest group of people that I can have do the work, and still do the [work]”
• “What is the least amount of indirect work [e.g., informatics] related to the research that I can do, and still do the [work]”

Findings: Data Preservation

Perceived Long-term Preservation Need of Faculty and Staff Researchers:
• Impact Unknown: 31%
• Some Need: 50%
• Future Need: 3%
• Critical Need: 16%

Findings: Data Preservation

Some science fields have national and international data centers where data deposit is required for grant funding.

Where data centers do not exist, backup depends on:
• Length of a grant
• Length of time primary researcher on campus
• Perception that data has maximum value for 12-18 months after publication, and negligible value after 5-10 years.

Departments lack personnel and support for long-term preservation of data.

Faculty store data on the “removable media of the day” and forget about it, until it becomes difficult or impossible to access

More complex systems, same number of people to manage them, leads to less time to devote to “meta-issues”

Critical impact: research collaboration and long term historical data analysis suffer

Data Preservation Practices

Findings: Data Organization

Most common organizing mechanism – directory structure, spreadsheets, and word processing software

Databases (with or without metadata) are uncommon. Viewed as time/labor-intensive, unnecessary drain on research time.

Portals built by tech specialists within a field are well utilized.

Storage space is adequate for now. Over half the people contacted were in the process of upgrading.

Most departments did not have strictly enforced limits on email, data storage, and personal storage

Though much on their servers is “garbage,” memory is thrown at the problem; little support in most departments for data management

“Not a solved problem.” While actual memory might be cheap, tape, labor, and other equipment to ensure that data are maintained is NOT.

Findings: Metadata issues

Metadata is discipline specific; commonalities exist, but key requirements of a discipline vary.

Metadata structures and subject taxonomies reflect the way faculty in a discipline think

While organizational structure is an important issue in metadata use, other considerations are:
• Services available in one’s discipline
• Acceptance and standardization in the discipline
• Usage in key portals, data centers, and repositories

One worldwide metadata format is not likely at this time

Interdisciplinary metadata issues and crosswalks

Metadata Usage

Used in select projects: 11%

On campus usage only: 19%

Assisted in development of metadata: 5%

Rarely used: 38%

Consistent use at data centers/portals: 27%

Findings: Intellectual Property

Intellectual property protocols that faculty follow after creating software, portals or databases are highly correlated to the discipline.
• In disciplines where things move quickly, the ideal method is to open source one’s tool to obtain an audience, then later align oneself with a company, or start one;
• In disciplines where there is a lot of money there is pressure to ensure patents are filed.

Databases, portals and data centers on campus typically all have legal waiver forms, allowing release of the data sets to other researchers as part of the process to ingest the data.

Disciplines vary in the extent to which they support an ethic of data sharing.

Digital Rights Management Practices

Prefer to create open source products to avoid intellectual property issues: 22%

Practices and procedures in industry are well tested and accepted - no major issues: 16%

Occasional minor issues with an individual collaborator or publisher: 24%

Intellectual property issues affect my research significantly: 30%

Have not yet encountered issues: 8%

Findings: Data Support Needs

Some needs and services were mentioned across disciplines regardless of current arrangements:
• Informatics “point person” or clearinghouse for information on tools, expertise, and research knowledge on campus and nationally
• Long term archiving of research data especially during the gap in coverage between publication and obsolescence
• Tiered support services for database development, cataloging, conversion, emulation, migration, web development, metadata, pre-planning for technology grants

Trends Shaping Future Demand

Growth in complex data objects

Improved data mining

Policies of funding agencies
• National repositories
• New cyberinfrastructure initiatives

Prevalence of campus repositories for text

Tech-intensive academic programs

Need for rapid and global data exchange

Steady or decreasing staffing

Key System Characteristics

Flexibility to customize control, interfaces and security

Secure access worldwide

Metadata-agnostic design

Interoperability with scholarly communication, archiving and rights management systems

Clearinghouse functions

Advanced services for migration, emulation, long-term digital archiving

Topics for Campus Discussion

Where are the gaps in current offerings?

How do technology services on campus interact, and are new organizational models needed?

What are faculty priorities for various services?

What kinds of research data should be high priority for preservation, and how much is at risk?

What are incentives for faculty participation?

What is the impact of tenure and promotion structures in encouraging “data maintenance work?”

Possible outcomes

Everything stays as is

More peer-to-peer sharing of resources and expertise

Policies are established
• Intellectual property rights at several levels
• Use of metadata and digital object standards
• Ensure data sustainability

Organizational approaches are considered
• IT offices, the library, consortial systems support, disciplinary groups, or a combination

New services are offered
• Database design
• Metadata creation
• Consulting
• Clearinghouse functions
• Full digital archiving and migration

Further Information

UCSB Informatics Project web site: http://www.library.ucsb.edu/informatics/

ECAR Research Bulletin, vol. 2005, Issue 2: “Informatics and Knowledge Management for Faculty Research Data,” Jan. 18, 2005

Contact:
• Sarah M. Pritchard, University Librarian, [email protected]
• Larry Carver, Director of Library Technologies and Digital Initiatives, [email protected]

• Special thanks to Smiti Anand, Project Analyst