a datacite case study from the uk data archive


A DATACITE CASE STUDY FROM THE UK DATA ARCHIVE

TOM ENSOM

UK DATA SERVICE

UK DATA ARCHIVE

UNIVERSITY OF ESSEX

C4D WORKSHOP, JULY 2013, LONDON


WHO WE ARE

• Established in 1968 - 46 years of selecting, curating, preserving and providing access to social science data

• 6,000 datasets in the collection

• Over 25,000 registered users

• Data and data support services for higher and further education for research, teaching and learning

• Have been registered to ISO 27001 (information security standard) since June 2010


OUR SERVICES

• The UK Data Archive is itself a department of the University of Essex

• A distributed service, the Economic and Social Data Service (ESDS), was established on 1 January 2003

• New five-year UK Data Service from 2012


WHAT WE DO

• Research & development, innovation

• Promoting best practice in data curation

• Raise standards in data security and awareness of ethical/legal issues

• Raise standards in data management

• Data management hub

• We provide guidance to ESRC researchers and anyone else who asks


WE SUPPORT RESEARCHERS

• Popular training materials

• Managing and Sharing Guide

• Training Resources

• Website: http://data-archive.ac.uk/create-manage

• Bespoke training events

• Large and small scale workshops


ENGAGEMENT WITH RDM COMMUNITY

• Recently completed JISC Managing Research Data project with University of Essex

• Cross support service, departmental engagement

• Piloted an RDM infrastructure

• http://www.data-archive.ac.uk/create-manage/projects/rd-essex

• Outputs of value to RDM community:

• Metadata profile for institutional data repositories

• Research data plugin for EPrints


WHY CITE DATA?

It’s a vital part of a rigorous research process:

• Acknowledges researcher’s sources

• Gives data creators, authors and data curators proper credit when their work is reused

• Facilitates data resource discovery and access

• Helps track the use and impact of data collections


OUR APPROACH TO CITATION

• Citation has been required by our user agreement (End User Licence) for many years


OUR APPROACH TO CITATION

• Should include enough information to ensure the exact version can be located

“University of Essex. Institute for Social and Economic Research and National Centre for Social Research, Understanding Society: Wave 1, 2009-2010 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], November 2011. SN: 6614.”

• No widely agreed standard citation format yet!

• Version information crucial
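The example citation above can be assembled from a handful of fields. The following is a minimal sketch only: the `DataCitation` class and its field names are hypothetical, chosen to mirror the parts of the quoted citation, and are not the Archive's own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the parts of the example citation."""
    creators: str
    title: str
    edition: str
    place: str
    distributor: str
    date: str
    study_number: int

    def render(self) -> str:
        # Mirror the punctuation of the example citation on the slide.
        return (
            f"{self.creators}, {self.title} [computer file]. {self.edition}. "
            f"{self.place}: {self.distributor} [distributor], {self.date}. "
            f"SN: {self.study_number}."
        )


citation = DataCitation(
    creators=("University of Essex. Institute for Social and Economic "
              "Research and National Centre for Social Research"),
    title="Understanding Society: Wave 1, 2009-2010",
    edition="2nd Edition",
    place="Colchester, Essex",
    distributor="UK Data Archive",
    date="November 2011",
    study_number=6614,
)
print(citation.render())
```

Keeping the edition as its own field reflects the point that version information is crucial: two editions of the same study produce two different citations.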


PERSISTENT IDENTIFIERS

• Persistent Identifiers (PIDs)

• A string identifying a clearly defined digital object

• Persistence must mean enduring

• Identifiers must be unique

• PIDs have been attached to scientific publications for some time

• Next logical step: data

• Also being applied to other entities, e.g. people via ORCID system


CHANGES TO DATA

• Our ‘data collections’ are not discrete digital objects

• Approx. 15% of UKDA data collections are altered within the first year after publication

• Versioning - we need to distinguish between major and minor changes to a data collection

• Integrate processes with:

• Digital preservation activities

• Current ingest infrastructure / workflows


MINOR CHANGES – LOW IMPACT

• Publication reference added

• Correction of spelling in variable labels

• Small changes in variable labels

• Removal of (erroneously supplied) admin variables

• Correction of spelling in metadata

• Minor changes in documentation

• New index (keyword) terms

• Additional documentation added (non-fundamental)

• Change in access conditions


MAJOR CHANGES – HIGH IMPACT

• Adding new ‘waves’ in a data series

• New variable added

• New labels/value codes added

• Weighting variables reconstructed

• Wrong data supplied (e.g., March not April)

• Mis-coded data (e.g., Don’t know/Refused mix-up)

• Change in format (file migration)

• Significant changes in documentation

• Change in access conditions
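The two lists of change types can be read as a lookup table. The sketch below is illustrative only: the rule that a batch of changes is high impact if any single change is high impact is an assumption, not something the slides state. "Change in access conditions" appears on both lists, so the sketch refuses to guess and asks for a curator's decision.

```python
from enum import Enum


class Impact(Enum):
    LOW = "low"
    HIGH = "high"


# Change types copied from the minor and major change slides.
LOW_IMPACT = frozenset({
    "Publication reference added",
    "Correction of spelling in variable labels",
    "Small changes in variable labels",
    "Removal of (erroneously supplied) admin variables",
    "Correction of spelling in metadata",
    "Minor changes in documentation",
    "New index (keyword) terms",
    "Additional documentation added (non-fundamental)",
})
HIGH_IMPACT = frozenset({
    "Adding new 'waves' in a data series",
    "New variable added",
    "New labels/value codes added",
    "Weighting variables reconstructed",
    "Wrong data supplied",
    "Mis-coded data",
    "Change in format (file migration)",
    "Significant changes in documentation",
})
# Listed on both slides, so it cannot be classified automatically.
NEEDS_DECISION = frozenset({"Change in access conditions"})


def overall_impact(changes):
    """Return HIGH if any change is high impact, otherwise LOW (assumed rule)."""
    result = Impact.LOW
    for change in changes:
        if change in HIGH_IMPACT:
            result = Impact.HIGH
        elif change not in LOW_IMPACT:
            # Unknown or ambiguous change types are not guessed at.
            raise ValueError(f"needs a curator's decision: {change!r}")
    return result
```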


DATACITE DOIs

• 2011: we started working with the British Library and DataCite to develop a permanent, reliable method of citing our data collections

• DataCite

• Founded by organisations from six countries

• Established a citation format for research data, including a DOI

• Works with data publishers, e.g. established data centres and institutional repositories


WHY DATACITE?

Not the only choice, but right for us:

• DOI framework an international and persistent standard for identifying digital objects

• Familiar within the research data domain

• Centralised resolution service

• Metadata registry (and thus de facto standard)

• Discovery link up

• API – allowing for automation of minting process (but also manual if you prefer!)


DOI FORMAT

10.5255/UKDA-SN-1-1

• 10.5255: unique archive identifier

• UKDA: readable archive identifier

• SN: resource identifier type

• 1 (first number): resource identifier

• 1 (second number): resource version
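The DOI structure lends itself to simple parsing. The following sketch splits a DOI written in the compact form `10.5255/UKDA-SN-1-1` into the five labelled parts. The `ArchiveDoi` type, its field names and the regular expression are assumptions made for illustration; in particular, the pattern assumes upper-case letters for the archive and type components and plain integers for the identifier and version.

```python
import re
from typing import NamedTuple


class ArchiveDoi(NamedTuple):
    """Hypothetical view of the five labelled DOI components."""
    prefix: str       # unique archive identifier, e.g. "10.5255"
    archive: str      # readable archive identifier, e.g. "UKDA"
    id_type: str      # resource identifier type, e.g. "SN"
    resource_id: int  # resource identifier
    version: int      # resource version

    def __str__(self) -> str:
        return (f"{self.prefix}/{self.archive}-{self.id_type}-"
                f"{self.resource_id}-{self.version}")


# Assumed shape: prefix / ARCHIVE - TYPE - number - number
_DOI_PATTERN = re.compile(r"(10\.\d+)/([A-Z]+)-([A-Z]+)-(\d+)-(\d+)")


def parse_doi(text: str) -> ArchiveDoi:
    """Split a structured DOI into its parts, or raise ValueError."""
    match = _DOI_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a structured DOI: {text!r}")
    prefix, archive, id_type, resource_id, version = match.groups()
    return ArchiveDoi(prefix, archive, id_type, int(resource_id), int(version))


doi = parse_doi("10.5255/UKDA-SN-1-1")
print(doi.archive, doi.id_type, doi.resource_id, doi.version)
```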


DOI VERSIONING

High impact change: increments major version, new DOI

10.5255/UKDA-SN-1-1 → 10.5255/UKDA-SN-1-2

Low impact change: increments minor version, internal only

10.5255/UKDA-SN-1-1
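The versioning rule can be sketched as a small immutable record. This is an illustration under stated assumptions: the `CollectionVersion` class is hypothetical, the internal minor counter is assumed to start at 0, and resetting it on a major change is a guess rather than something the slide specifies.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CollectionVersion:
    doi_stem: str   # DOI without the version suffix, e.g. "10.5255/UKDA-SN-1"
    major: int = 1  # published as the last component of the DOI
    minor: int = 0  # tracked internally only (assumed to start at 0)

    @property
    def doi(self) -> str:
        return f"{self.doi_stem}-{self.major}"

    def apply_change(self, high_impact: bool) -> "CollectionVersion":
        if high_impact:
            # High impact: increment the major version, which yields a new DOI.
            # Resetting the minor counter here is an assumption.
            return replace(self, major=self.major + 1, minor=0)
        # Low impact: only the internal minor version moves; the DOI stays put.
        return replace(self, minor=self.minor + 1)


v1 = CollectionVersion("10.5255/UKDA-SN-1")
after_low = v1.apply_change(high_impact=False)
after_high = v1.apply_change(high_impact=True)
print(v1.doi, after_low.doi, after_high.doi)
```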


CREATING A NEW DOI

• New data collection ‘ingested’

• Structured DOI ‘created’

• Minimal DataCite metadata inc. requested DOI pushed to DataCite metadata store via API

• DataCite API sends back an approval

• Flagged behind the scenes

• New change log

• New citation file
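As a rough illustration of what "minimal DataCite metadata" might look like, the sketch below builds a small XML record containing the requested DOI. It makes several assumptions: the exact fields the Archive sends are not given in the slides, the five elements used here are the ones understood to be mandatory in the DataCite metadata kernel of the time, and the kernel-3 namespace should be checked against whichever schema version is actually in use. The function name and the creator and title values are placeholders. Sending the record to the metadata store is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Assumption: DataCite kernel-3 namespace; verify against the schema in use.
DATACITE_NS = "http://datacite.org/schema/kernel-3"


def minimal_datacite_xml(doi, creators, title, publisher, year):
    """Build a minimal metadata record (hypothetical helper, not UKDA code)."""
    ET.register_namespace("", DATACITE_NS)

    def tag(name):
        # ElementTree spells namespaced tags as "{namespace}name".
        return f"{{{DATACITE_NS}}}{name}"

    resource = ET.Element(tag("resource"))
    identifier = ET.SubElement(resource, tag("identifier"),
                               {"identifierType": "DOI"})
    identifier.text = doi

    creators_el = ET.SubElement(resource, tag("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, tag("creator"))
        ET.SubElement(creator, tag("creatorName")).text = name

    titles = ET.SubElement(resource, tag("titles"))
    ET.SubElement(titles, tag("title")).text = title
    ET.SubElement(resource, tag("publisher")).text = publisher
    ET.SubElement(resource, tag("publicationYear")).text = str(year)
    return ET.tostring(resource, encoding="unicode")


# Placeholder values for illustration only.
record = minimal_datacite_xml(
    doi="10.5255/UKDA-SN-1-1",
    creators=["Example Creator"],
    title="Example Data Collection",
    publisher="UK Data Archive",
    year=2013,
)
print(record)
```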


UPDATING A DOI – HIGH IMPACT

• High impact change to data collection

• Incremental DOI version ‘created’

• Minimal DataCite metadata inc. requested DOI pushed to DataCite metadata store via API

• DataCite API sends back an approval

• Flagged behind the scenes

• Update change log

• New citation file


UPDATING A DOI – LOW IMPACT

• Low impact change to data collection

• Minimal DataCite metadata pushed to DataCite metadata store via API

• DataCite API sends back an approval

• Flagged behind the scenes

• Update change log


THE END RESULT…

Jump page (= change log)

• DOI: SN-####-1, SN#### Survey Waves 1-13: instance-specific data and metadata

• DOI: SN-####-2, SN#### Survey Waves 1-14: instance-specific data and metadata

• DOI: SN-####-3, SN#### Survey Waves 1-15: instance-specific data and metadata (current)
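One way to picture the jump page is as a change log that lists every DOI-bearing instance and marks the latest one as current. The sketch below is a simplification under assumptions: the `ChangeLog` class is hypothetical, it records only high impact changes (those that receive a new DOI), and it treats the most recent entry as the current instance. The `####` study number is the slide's own placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Hypothetical change log with one entry per DOI-bearing instance."""
    study_number: str
    entries: list = field(default_factory=list)  # (version, description)

    def record(self, description: str) -> int:
        # Each recorded instance receives the next version number.
        version = len(self.entries) + 1
        self.entries.append((version, description))
        return version

    def jump_page(self) -> list:
        lines = []
        for version, description in self.entries:
            # Assumption: the most recent instance is the current one.
            marker = " (current)" if version == len(self.entries) else ""
            lines.append(
                f"SN-{self.study_number}-{version}: {description}{marker}")
        return lines


log = ChangeLog("####")
log.record("Survey Waves 1-13")
log.record("Survey Waves 1-14")
log.record("Survey Waves 1-15")
for line in log.jump_page():
    print(line)
```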


OUR DOI METADATA


CHALLENGES FOR THE FUTURE

• Citing parts (fragments) of data collections: single files, subsets of quantitative data files, extracts of textual data

• Still uncertainty over where exactly research data should go: institutional repository (IR), subject-specific repository or data journal?

• Who should be minting DOIs?

• Avoid assigning multiple identifiers to an object


ESRC’s CITATION AWARENESS GUIDE


ACKNOWLEDGEMENTS

Thanks to the following UKDA/UKDS staff for their assistance in putting this together:

• Matthew Woollard

• Louise Corti

• John Payne

• Matthew Brumpton

• Sharon Bolton


CONTACT

TOM ENSOM

UK DATA ARCHIVE

UNIVERSITY OF ESSEX

WIVENHOE PARK

COLCHESTER

ESSEX CO4 3SQ

T +44 (0)1206 872974

E [email protected]