Metadata and RIF-CS in plain English

SEQld Data Intensive - 30 January 2015
Kathryn Unsworth

http://www.datamation.com/cnews/article.php/3878261/Tech-Comics-Whats-Metadata.htm

Comic strip picture of a simple explanation of metadata – an 8 letter word. Not included due to copyright constraints.

What is metadata?

Metadata is a means of collecting or structuring data about the content of other data

Example: catalogue record

Describing publications is easy, but data…?

Research data varies widely between and within disciplines:

▪ Methods of data collection

▪ Number, type and size of data files

▪ Electronic or physical resources, or combinations of both

▪ Software/hardware dependencies

▪ Contextual information

▪ Legal restrictions

▪ Ethical restrictions

▪ Access restrictions

This has led to the proliferation of metadata schemas for data.

Metadata standards – why we need them

▪ Provide us with a common way of describing information resources

▪ Facilitate discoverability of resources

▪ Facilitate exchange of data between systems – interoperability

So… the requirements for describing datasets efficiently differ somewhat from those for describing publications, but the same principles apply to both.

Keep the (re)user in mind when describing datasets

1. Enable a (re)user to find a dataset of which either
(A) the data creator
(B) the title of the dataset
(C) the subject
is known

2. To show what is available
(D) by a given data creator
(E) on a given subject
(F) in a given format/data type

3. To assist in the choice of a dataset
(G) as to temporal and/or spatial coverage (when and where)
(H) as to its authority and access (associated publications, access rights)

Based on a customised version of Cutter’s cataloguing objects and the FISO user tasks model: Find, Identify, Select, Obtain

Recent changes to RIF-CS now include access descriptions: open, conditional, restricted

Descriptive metadata (intellectual content)

Administrative metadata (rights, technical, preservation)

Structural metadata (relationship between the parts)

Discovery/access points

Traditional publication (homogeneous) vs data publication (heterogeneous)

MARC21 | RIF-CS | Example
Item-level description | Collection-level description |
Fields and tags | Elements and attributes | Standards for consistency
655$a | <collection type> | “Dataset”
245$a | <namePart> | “Historical coastlines (community perspectives) : manuscript and images archive”
700$a | <party type=“person”><namePart> | “Steve Mullins”
710$a,b | <party type=“group”><namePart> | “Faculty of Arts, Humanities and Education CQU”
520$a | <description type=“full”> | “Photographic images are useful tools for environmental historians and have their place alongside the official documents generated by environmental regulators and managers, and the unofficial written observations and…”
654$a | <subjects type=“anz-for”> | 2103 – Historical Studies
542$f | <rightsAccess> | Rights statement: This work is copyright. Permission is given for non-profit electronic viewing, via the Internet. Apart from this, and any use as permitted under the Australian Copyright Act 1968, no part may be reproduced or copied by any process without written permission.
700$0 | <identifier type> | ORCID - 0000-0001-5207-5061

RIF-CS elements, a quick look

ISO 2146 and RIF-CS interchange schema

▪ ISO 2146 standard: Registry services for libraries and related organizations

▪ The ANDS implementation describes not just collections but also the researchers and research activities that surround and are linked to a research data collection, and the systems that support data collections – the ‘mesh’

▪ RIF-CS is an XML schema for sharing metadata between a source repository and the ANDS Collections Registry

▪ It is reviewed annually by the RIF-CS Advisory Board

The structure of RIF-CS (based on the ISO 2146 standard)

ISO 2146 object | Description
Collection | An aggregation of physical or digital objects
Party | A person or group
Activity | Something occurring over time that generates one or more outputs
Service | A physical or electronic interface that provides its users with benefits such as work done by a party or access to a collection or activity
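As a rough illustration of how the four object classes sit inside a RIF-CS record, here is a minimal Python sketch. The namespace URI and the wrapper element names (registryObjects, registryObject, key, originatingSource, name, namePart) are given from memory of the published schema and should be checked against the current RIF-CS documentation; the key, group and source values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed RIF-CS namespace; verify against the current schema before use.
RIFCS_NS = "http://ands.org.au/standards/rif-cs/registryObjects"

# The four ISO 2146 object classes listed in the table above.
OBJECT_CLASSES = ("collection", "party", "activity", "service")


def make_registry_object(object_class, object_type, key, group, source, name):
    """Build a minimal registryObject wrapping one of the four object classes."""
    if object_class not in OBJECT_CLASSES:
        raise ValueError(f"unknown object class: {object_class}")
    root = ET.Element(f"{{{RIFCS_NS}}}registryObjects")
    reg = ET.SubElement(root, f"{{{RIFCS_NS}}}registryObject", {"group": group})
    ET.SubElement(reg, f"{{{RIFCS_NS}}}key").text = key
    ET.SubElement(reg, f"{{{RIFCS_NS}}}originatingSource").text = source
    obj = ET.SubElement(reg, f"{{{RIFCS_NS}}}{object_class}", {"type": object_type})
    name_el = ET.SubElement(obj, f"{{{RIFCS_NS}}}name", {"type": "primary"})
    ET.SubElement(name_el, f"{{{RIFCS_NS}}}namePart").text = name
    return root


if __name__ == "__main__":
    # Hypothetical key, group and source values, for illustration only.
    record = make_registry_object(
        "collection",
        "dataset",
        "ari.org.au/collection:848086",
        "Australian Research Institution",
        "http://repository.ari.org.au",
        "Daily rainfall observations over the northern Australian tropics",
    )
    print(ET.tostring(record, encoding="unicode"))
```

A party, activity or service record would use the same wrapper with a different object class and type.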

Metadata pathways to RDA

• Create records manually in the ANDS Registry

• Create an automatic RIF-CS feed for harvesting by ANDS into RDA

• Configure your harvest for a schema other than RIF-CS, e.g. CKAN or ISO 19115

Do you know what method your institution will use?

Name

A primary, abbreviated and/or alternative name for the data collection

Rainfall in northern Australia 1980-1989

Better…

Daily rainfall observations over the northern Australian tropics, November to February, 1980-1989

Description

A full or brief description of the data collection

Rainfall observations were taken over a 10-year period across northern Australia.

Better…

The dataset consists of rainfall observations taken during the wet season in northern Australia over a 10-year period, 1950-1959. It is part of an ongoing longitudinal study of weather in the region. Observations are made daily at 157 geographic locations across the area. Data is sent to a central point in Darwin. Measurements are recorded in millimetres, in the bands 0.2 to 9; 10 to 24; 25 to 49; 50 to 99; 100 to 149; 150+. Data is recorded in spreadsheets and calculated hourly, daily, weekly and monthly. Statistical analysis of the data was made using Excel.

Subject

The subject represents the primary topic or topics covered by the collection.

Rainfall

Better…

040102 (text value = Weather Research & Forecast Model (WRF)) (type = anzsrc)
Rainfall frequencies (type = lcsh)
Rainfall in northern Australia (type = local)
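The typed subjects above can be held as simple value/type pairs and sanity-checked before they go into a record. The sketch below assumes that ANZSRC field of research codes are 2, 4 or 6 digits long (division, group, field); the function and variable names are illustrative and not part of RIF-CS.

```python
import re


def is_anzsrc_code(value):
    """True for 2-, 4- or 6-digit numeric codes (assumed ANZSRC code shape)."""
    return re.fullmatch(r"\d{2}(\d{2}){0,2}", value) is not None


# The three subjects from the slide, each with its vocabulary type.
SUBJECTS = [
    {"type": "anzsrc", "value": "040102"},
    {"type": "lcsh", "value": "Rainfall frequencies"},
    {"type": "local", "value": "Rainfall in northern Australia"},
]


def invalid_subjects(subjects):
    """Return the anzsrc-typed subjects whose value is not a plausible code."""
    return [
        s for s in subjects
        if s["type"] == "anzsrc" and not is_anzsrc_code(s["value"])
    ]


if __name__ == "__main__":
    print(invalid_subjects(SUBJECTS))
```

A check like this only confirms the shape of the code, not that the code exists in the vocabulary or matches the text label.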

Coverage (spatial)

Spatial coverage refers to the geographical area where data was collected or a place which is the subject of a collection.

Australia

Better…

121.394531,-17.662169 147.146484,-17.662169 142.312500,-9.901036 130.271484,-11.713920 121.394531,-17.662169 (type = kmlPolyCoords)
Northern Australia (type = text)
AU-NSW (type = iso31662)
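A kmlPolyCoords value like the one above is a space-separated list of longitude,latitude pairs. The following sketch, with illustrative function names, parses such a string, checks that the polygon is closed (first point repeated as the last) and derives a bounding box. It assumes the KML ordering of longitude first, then latitude.

```python
# The polygon from the slide, as a kmlPolyCoords string.
EXAMPLE_COORDS = (
    "121.394531,-17.662169 147.146484,-17.662169 142.312500,-9.901036 "
    "130.271484,-11.713920 121.394531,-17.662169"
)


def parse_kml_poly_coords(text):
    """Split 'lon,lat lon,lat ...' into a list of (lon, lat) float pairs."""
    points = []
    for pair in text.split():
        lon, lat = pair.split(",")
        points.append((float(lon), float(lat)))
    return points


def is_closed_ring(points):
    """A polygon ring needs at least four points and must end where it starts."""
    return len(points) >= 4 and points[0] == points[-1]


def bounding_box(points):
    """Return (min_lon, min_lat, max_lon, max_lat) for the points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


if __name__ == "__main__":
    pts = parse_kml_poly_coords(EXAMPLE_COORDS)
    print(is_closed_ring(pts), bounding_box(pts))
```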

Coverage (temporal)

Temporal coverage refers to a time period during which data was collected or observations made, or a time period that the collection is linked to intellectually or thematically.

1950-1959

Better…

November-February, 1950-1959 (type = text)

Rights

Rights held in and over the data collection.

copyright

Better…

Copyright 2011. Use of the data is subject to legal, ethical and commercial restrictions.
Licensed under Attribution 3.0 Australia (CC BY 3.0)

Access rights

Information about access rights to the data collection. Recent changes to RIF-CS now allow for selection of the following access statements: open, conditional or restricted.

Restricted. Access to this dataset is restricted

Better…

Conditional. Access to this data collection is by negotiation with Professor Rayne Fall.

Better…

Open. Access to this data collection is open.
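One way to keep access statements consistent is to accept only the three values named above. The sketch below builds a rights element with a typed accessRights child. The element and attribute names are an assumption based on the RIF-CS schema and are written without namespaces for brevity, so check them against the schema documentation before use.

```python
import xml.etree.ElementTree as ET

# The three access statements named on the slide.
ACCESS_TYPES = ("open", "conditional", "restricted")


def access_rights_element(access_type, statement):
    """Return <rights><accessRights type=...>statement</accessRights></rights>."""
    if access_type not in ACCESS_TYPES:
        raise ValueError(f"access type must be one of {ACCESS_TYPES}")
    rights = ET.Element("rights")
    access = ET.SubElement(rights, "accessRights", {"type": access_type})
    access.text = statement
    return rights


if __name__ == "__main__":
    element = access_rights_element(
        "conditional",
        "Access to this data collection is by negotiation with Professor Rayne Fall.",
    )
    print(ET.tostring(element, encoding="unicode"))
```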

Identifier

Identifiers uniquely identify the collection within the domain of a specified authority; persistent identifiers are preferred.

ID:848086

Better…

http://repository.ari.org.au/collectionId=848086 (type = uri)

Other examples

http://hdl.handle.net/1259.13/847351 (type = uri)
1259.13/847351 (type = hdl)
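The last two examples are the same handle written two ways: as a resolvable URI and as a bare prefix/suffix handle. A small sketch of converting between the two forms, using the resolver address shown on the slide (the function names are illustrative):

```python
# Resolver address as it appears in the slide's example.
HANDLE_RESOLVER = "http://hdl.handle.net/"


def handle_to_uri(handle):
    """Return the resolver URI for a bare handle such as '1259.13/847351'."""
    prefix, sep, suffix = handle.partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return HANDLE_RESOLVER + handle


def uri_to_handle(uri):
    """Strip the resolver address from a handle URI, leaving the bare handle."""
    if not uri.startswith(HANDLE_RESOLVER):
        raise ValueError(f"not a handle URI: {uri!r}")
    return uri[len(HANDLE_RESOLVER):]


if __name__ == "__main__":
    print(handle_to_uri("1259.13/847351"))
```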

Location/address

The address of the collection (electronic or physical), or another address which enables access to the collection.

Australian Research Institution Western Australia

or

http://repository.ari.org.au/collectionId=848086 (type = uri)
[email protected] (type = email)

Related object (party)

A party (person or group) related to the data collection.

Not used; if used, relation is incorrect

Better…

Key: ari.org.au/researcher:3852740
Relation: hasCollector (Professor Rayne Fall - person)

Key: www.ari.org.au
Relation: isManagedBy (Australian Research Institution - group)

Other examples

http://nla.gov.au/nla.party-471322 (NLA party record)
http://orcid.org/0000-0003-0635-1998 (ORCID party record)
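Each key and relation pair above becomes one related object entry on the collection record. The sketch below shows the general shape, assuming the RIF-CS element names relatedObject, key and relation, and omitting namespaces for brevity; confirm the names and the relation vocabulary against the schema documentation.

```python
import xml.etree.ElementTree as ET


def related_object(key, relation_type):
    """Return a relatedObject element pointing at another record's key."""
    element = ET.Element("relatedObject")
    ET.SubElement(element, "key").text = key
    ET.SubElement(element, "relation", {"type": relation_type})
    return element


# The two party links from the slide's example.
PARTY_LINKS = [
    ("ari.org.au/researcher:3852740", "hasCollector"),
    ("www.ari.org.au", "isManagedBy"),
]

if __name__ == "__main__":
    for key, relation in PARTY_LINKS:
        print(ET.tostring(related_object(key, relation), encoding="unicode"))
```

The same helper would cover the activity link on the next slide, with the activity's key and the isOutputOf relation.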

Related object (activity)

An activity related to the collection

Not used or is made up where no real activity exists; if used, relation is incorrect

Better…

Key: ari.org.au/activity:2011-380473
Relation: isOutputOf (Rainfall patterns in the northern Australian tropics during the wet period: a longitudinal study from 1950 onwards)

Other example

http://purl.org/au-research/grants/nhmrc/100009

Related information

Related information that provides contextual information about the data collection.

Title: Title not included
Identifier: ISSN to the journal (type = publication)

Better…

Title: Rainfall in the northern Australian tropics: a statistical analysis of rainfall over a 50 year period, 1950-1999
Identifier: http://dx.doi.org/10.1109/ISSTA.2002.1048560 (type = publication; the identifier is the journal article’s URL)

Citation

Citation is the preferred form for citing a dataset or collection in a publication or other bibliographic environment.

Citation given is to the research publication based on the data, not the collection

Better…

Fall, R (2011): Daily rainfall observations over the northern Australian tropics, November to February, 1950-1959. [place of publication, publisher]. doi:10.1109/2002.1048397 http://dx.doi.org/10.1109/2002.1048397, 2011. (type = fullCitation)

http://researchdata.ands.org.au

Any questions?

This work is licensed under a Creative Commons Attribution 3.0 Australia License

ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS).