
THE FEDERATED PILOT and PROJECT ARC

Walter Stewart – RDC Co-ordinator

Kathleen Shearer – RDC and CARL

Chuck Humphrey – U of Alberta Library

Uploaded by casrai on 16-Jul-2015



The Canadian Association of Research Libraries

Project ARC

Kathleen Shearer, RDC and CARL

THE TREND TOWARDS SHARED SERVICES

• Other countries are developing shared services and infrastructure to support research data management services.

• Why? To address cost redundancies, pool knowledge, and break down silos across disciplines.

• Common shared services are: discovery, data registries, support and expertise, training, shared repositories and preservation.

• General trend from domain-based services towards generic RDM infrastructure. There are common infrastructure and service requirements across domains!

PROJECT ARC

• Initiated and supported by CARL

• December 2013 initial stakeholders meeting

• March 2014 working group launched, for 1 year

• Working group members represent

– CARL

– all four regional academic library associations: CAUL, COPPUL, OCUL and Quebec

– CRKN

• Includes some of Canada’s top research data management experts

PROJECT ARC

Builds on previous efforts by CARL to improve capacity at Canadian universities in the area of research data management:

• 2009 Research Data Management Toolkit: Unseen Opportunities

• 2010 Library Roles in Management Research Data

• 2011-2012 Proposal: “Canadian National Collaborative Data Infrastructure Project”

• 2013 RDM Course: Introduction to Research Data Management Services

PROJECT ARC AIM AND VISION

• A future in which Canada capitalizes on the trend towards data-intensive research and is a world leader in research and innovation

• This future is achievable, with comprehensive support for research data management at a national scale.

• Project ARC's aim is to improve our national capacity for the management, preservation, and re-use of research data.

PROJECT ARC – SCOPE

• Bring together existing library-based initiatives to better coordinate activities and build capacity across the country

• Lay the foundation for a library-based research data management network

• Work closely with other stakeholders (e.g. CANARIE, Compute Canada, Research Data Canada) to ensure integration with and support for other infrastructures and initiatives in Canada

PROJECT ARC - PRINCIPLES

• Data are a public good

• Intelligent access: openness, with respect for privacy

• Collaborative approaches: cost savings and sharing expertise

• Inclusiveness: aim to serve all researchers and create a more level playing field

• Commitment to standards and interoperability

• International relationships: liaise internationally and ensure our work is in keeping with international practices

• Respect for differences: flexibility to meet the needs of different regions, institutions, and disciplines

• Open source: Tools will be contributed back to the community

• Stewardship: a sense of responsibility for managing research data over the long term

OBJECTIVES OF PROJECT ARC

Liaising closely with all relevant stakeholders in this arena, Project ARC will:

1. Provide support for institutions to deliver data management plans (DMPs)

2. Develop a plan for the implementation of a centre of expertise for the curation of research data in Canada

3. Undertake a pilot that will act as an exemplar for a national preservation service for research data

4. Develop an organizational framework and operational plan for a library-based research data management network in Canada

PORTAGE

At the Project ARC mid-point (September 2014), a network name was proposed and concepts were refined…

The Portage network will have two major components:

• A distributed centre of expertise for research data management, and

• A national preservation system for research data that will evolve and expand over time

NETWORK CENTRE OF EXPERTISE

1) Comprehensive set of resources to support data management planning

– How-to guides, case studies, training materials

– Cooperation with UK Digital Curation Centre (DCC)

– Currently being collected on Project ARC website

2) National DMP automated tool to assist Canadian researchers in developing management plans

– DMPonline (originally developed by the DCC) selected

3) Consulting services

– Draw on expertise of librarians and others from across the country

– Support data curation, training, DMPs, discovery, preservation, privacy-security-ethics

– Build human capacity across the country

NATIONAL PRESERVATION SYSTEM

(more from Chuck and Walter)

Advice and support for researchers depend on viable technical solutions!

• Continue a pilot in close collaboration with Compute Canada and RDC, including some of the domain data centres

• Domain data centres currently involved are Canadian Astronomy Data Centre and C-Brain, whose creation was supported by the CANARIE research software program

• Goal is to enable all interested academic libraries to participate, whether or not they have their own local infrastructure

• Complements high-performance computing infrastructure and domain repositories, and contributes an integration layer

STILL TO COME

• Governance

• Service Models

• Funding

The Federated Pilot

Walter Stewart, RDC

A COLLABORATIVE EFFORT AMONG RDC, CARL Project ARC, Compute Canada, CANARIE, Scholars Portal, SFU Libraries, CANFAR, C-Brain, CPDN

Born at the DI Summit 2014

THE CONTEXT:

• Data are both a product and a resource for 21st century discovery.

• The TC3+ are preparing to require Data Management Plans as part of the funding application process.

• The federal government has extended its commitment to open government and open data to cover federally funded research:


The Government of Canada will maximize access to federally funded scientific research to encourage greater collaboration and engagement with the scientific community, the private sector, and the public.

Among the commitments for 2014 to 2016:

Launch of open access to publications and data resulting from federally funded scientific activities

Canada's Action Plan on Open Government 2014-16

http://open.canada.ca/

THE PROBLEM:

• Data that cannot be discovered cannot be open!

• Data that are only on someone’s hard drive or memory stick cannot be open!

• Data that are not curated cannot be open for long!


• Currently in Canada, most researchers lack access to the services and the infrastructure that would permit them to be good stewards of their research data and to make those data accessible.

• Outside of some data-intensive disciplines, little is in place to provide for the long-term curation and preservation of data.

THE OPPORTUNITY:

• Many of the elements for a national system of data stewardship are in place: the required networks, with CANARIE and the ORANs; storage systems at Compute Canada; data expertise in research libraries and CARL's ARC Project; and significant experience in developing repositories with CANFAR, C-Brain, and CPDN, among others.

THE CHALLENGE:

• Can we integrate those elements at a pilot level?

• Can we work with a small set of researchers to ingest their data into a storage and curation environment easily and seamlessly, in a manner that provides for easy retrieval?

• Can we create this opportunity first at a local level and then demonstrate integration among a few local sites into a prototypical national network that provides appropriate replication and the basis for long-term preservation?

THE ANTICIPATED RESULT:

• If we meet the challenge successfully, we hope to arrive at a set of conclusions that will allow us to recommend what would be required to grow such a prototypical system into a truly national network. That network would serve those parts of the research community that are currently unserved and would provide further support and backup for existing repositories, some of which have concerns about their long-term viability.

PROGRESS TO DATE:

• We have identified a model, which Chuck Humphrey will describe in a moment.

• Building on a local-scale project at SFU, library staff are moving researcher data into Compute Canada storage resources.

• We have a plan for a similar activity at the University of Toronto, to get underway in 2015.

NEXT STEPS:

• We will shortly start having researchers do their own ingest directly into the repository and archive environment at SFU

• We will look at establishing a duplicate installation at another university

• We will look to test replication services

• We will look to have researchers use the system remotely

• We will document the processes in detail

• We will begin to discuss what it would take to scale up

The Pilot Environment

Chuck Humphrey, University of Alberta Libraries

The Pilot

Working with existing digital technology and expertise, the Pilot aims to assemble a research data management infrastructure that demonstrates interoperability among data repositories and the archiving of research data.

LEVELS OF DATA STEWARDSHIP

• Research data management infrastructure supports data stewardship that occurs at different levels across the research lifecycle:

– The researcher at the project level

– The data repository level

– The interoperability level at the regional and national level

EXCHANGES AMONG LEVELS

• The exchange of research data and metadata among these three levels can encounter barriers or gaps.

• A Data Management Plan is helpful in identifying a pathway for data and metadata across levels, bridging gaps and overcoming barriers.

KEY OBJECTIVES

Because no national RDM infrastructure exists today, the pilot is building pathways across levels and assembling an operational system to demonstrate a community response to providing RDM infrastructure.

[Diagram: exchange of data among the project (or research program) level, the data repository level, and the network level]

[Diagram: pilot architecture, with components labelled "Archivematica Instance(s) managed by CARL/RDC", "Writes Data Package to Compute Canada HPC Facilities", "Compute Canada Storage Systems", "Interoperability Layer", and "Custom search app or domain-specific search apps"]

Questions and Discussion