metadata and rif-cs in plain english seqld data intensive - 30 january 2015 kathryn unsworth

Metadata and RIF-CS in plain English

SEQld Data Intensive - 30 January 2015

Kathryn Unsworth

2

http://www.datamation.com/cnews/article.php/3878261/Tech-Comics-Whats-Metadata.htm

Comic strip picture of a simple explanation of metadata – an 8 letter word. Not included due to copyright constraints.

3

What is metadata?

Metadata is a means of collecting or structuring data about the content of other data

Example: catalogue record

6

Describing publications is easy, but data….?

Research data varies widely between and within disciplines:

▪ Methods of data collection

▪ Number, type and size of data files

▪ Electronic/physical resources or combinations of

▪ Software/hardware dependencies

▪ Contextual information

▪ Legal restrictions

▪ Ethical restrictions

▪ Access restrictions

This has led to the proliferation of metadata schemas for data.

7

Metadata standards – why we need them

▪ Provide us with a common way of describing information resources

▪ Facilitate discoverability of resources

▪ Facilitate exchange of data between systems – interoperability

8

So… Requirements for describing datasets efficiently differ somewhat from those for describing publications, but the same principles apply to both.

9

1. Enable a (re)user to find a dataset of which either
(A) the data creator
(B) the title of the dataset
(C) the subject
is known

2. To show what is available
(D) by a given data creator
(E) on a given subject
(F) in a given format/data type

3. To assist in the choice of a dataset
(G) as to temporal and/or spatial coverage (when and where)
(H) as to its authority and access (associated publications, access rights)

Keep the (re)user in mind when describing datasets

Based on a customised version of Cutter’s cataloguing objects and the FISO user tasks model

Find, Identify, Select, Obtain

Recent changes to RIF-CS now include access descriptions – open, conditional, restricted

10

Descriptive metadata (intellectual content)

Administrative metadata (rights, technical, preservation)

Structural metadata (relationship between the parts)

Discovery/access points

11

Traditional publication (homogeneous) vs data publication (heterogeneous)

MARC21 | RIF-CS | Example
Item-level description | Collection-level description |
Fields and tags | Elements and attributes | Standards for consistency
655$a | <collection type> | “Dataset”
245$a | <namePart> | “Historical coastlines (community perspectives) : manuscript and images archive”
700$a | <party type="person"><namePart> | “Steve Mullins”
710$a,b | <party type="group"><namePart> | “Faculty of Arts, Humanities and Education CQU”
520$a | <description type="full"> | “Photographic images are useful tools for environmental historians and have their place alongside the official documents generated by environmental regulators and managers, and the unofficial written observations and…”
654$a | <subjects type="anz-for"> | 2103 – Historical Studies
542$f | <rightsAccess> | Rights statement: This work is copyright. Permission is given for non-profit electronic viewing, via the Internet. Apart from this, and any use as permitted under the Australian Copyright Act 1968, no part may be reproduced or copied by any process, without written permission.
700$0 | <identifier type> | ORCID - 0000-0001-5207-5061
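To make the "elements and attributes" idea concrete, here is a minimal Python sketch that assembles a collection description as XML using the example values above. It is an illustration only: the helper name `build_collection` is made up, the `name type="primary"` wrapper and the `subject type="anzsrc-for"` spelling are assumptions about the schema rather than something shown on the slide, and a real RIF-CS record also needs a namespace, a key, a group and an originating source.

```python
import xml.etree.ElementTree as ET


def build_collection(title, description, subject_code):
    """Build a simplified, RIF-CS-like collection element (illustrative only)."""
    collection = ET.Element("collection", {"type": "dataset"})

    # The title lives in a namePart nested inside a name element (assumed wrapper).
    name = ET.SubElement(collection, "name", {"type": "primary"})
    ET.SubElement(name, "namePart").text = title

    # The type attribute says what kind of description or subject this is.
    ET.SubElement(collection, "description", {"type": "full"}).text = description
    ET.SubElement(collection, "subject", {"type": "anzsrc-for"}).text = subject_code
    return collection


record = build_collection(
    "Historical coastlines (community perspectives) : manuscript and images archive",
    "Photographic images are useful tools for environmental historians...",
    "2103",
)
print(ET.tostring(record, encoding="unicode"))
```

The point of the sketch is the contrast with MARC21: instead of numbered fields and subfield codes, each piece of information sits in a named element, and attributes such as `type` qualify it.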

13

RIF-CS elements, a quick look

▪ ISO 2146 standard: Registry services for libraries and related organizations

▪ The ANDS implementation describes not just collections but also the researchers and research activities that surround and are linked to a research data collection, and the systems that support data collections – the ‘mesh’

▪ RIF-CS is an XML schema for sharing metadata between a source repository and the ANDS Collections Registry

▪ Is reviewed annually by the RIF-CS Advisory Board

ISO 2146 and RIF-CS interchange schema

14

15

The structure of RIF-CS (based on ISO 2146 standard)

ISO 2146 object | Description
Collection | An aggregation of physical or digital objects
Party | A person or group
Activity | Something occurring over time that generates one or more outputs
Service | A physical or electronic interface that provides its users with benefits such as work done by a party or access to a collection or activity
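The four object classes can be pictured as a small data model. The sketch below is one possible Python rendering, not part of RIF-CS itself: the class names `ObjectClass` and `RegistryObject` and the helper `by_class` are invented for illustration, and the collection and service keys are hypothetical (the party and activity keys are borrowed from the examples later in this deck).

```python
from dataclasses import dataclass
from enum import Enum


class ObjectClass(Enum):
    """The four ISO 2146 object classes used by RIF-CS."""
    COLLECTION = "collection"
    PARTY = "party"
    ACTIVITY = "activity"
    SERVICE = "service"


@dataclass(frozen=True)
class RegistryObject:
    key: str                   # unique key of the record
    object_class: ObjectClass  # which of the four classes it belongs to
    title: str


def by_class(objects, object_class):
    """Return only the registry objects of the given class."""
    return [obj for obj in objects if obj.object_class is object_class]


objects = [
    RegistryObject("ari.org.au/collection:848086", ObjectClass.COLLECTION,
                   "Daily rainfall observations"),  # hypothetical key
    RegistryObject("ari.org.au/researcher:3852740", ObjectClass.PARTY,
                   "Professor Rayne Fall"),
    RegistryObject("ari.org.au/activity:2011-380473", ObjectClass.ACTIVITY,
                   "Rainfall patterns longitudinal study"),
    RegistryObject("ari.org.au/service:harvest", ObjectClass.SERVICE,
                   "Repository harvest feed"),  # hypothetical key
]

parties = by_class(objects, ObjectClass.PARTY)
```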

16

Metadata pathways to RDA

• Create records manually in the ANDS Registry

• Create automatic RIF-CS feed for harvesting by ANDS into RDA

• Configure your harvest for a schema other than RIF-CS, e.g. CKAN, ISO 19115

Do you know what method your institution will use?

Name

17

A primary, abbreviated and/or alternative name for the data collection

Rainfall in northern Australia 1980-1989

Better…

Daily rainfall observations over the northern Australian tropics, November to February, 1980-1989

Description

18

A full or brief description of the data collection

Rainfall observations were taken over a 10-year period across northern Australia.

Better…

The dataset consists of rainfall observations taken during the wet season in northern Australia, over a 10-year period, 1950-1959. It is part of an ongoing longitudinal study of weather in the region. Observations are made daily at 157 geographic locations across the area. Data is sent to a central point in Darwin. Measurements are recorded in millimetres: 0.2 to 9; 10 to 24; 25 to 49; 50 to 99; 100 to 149; 150+. Data is recorded in spreadsheets and calculated hourly, daily, weekly and monthly. Statistical analysis of the data was made using Excel.

Subject

19

The subject represents the primary topic or topics covered by the collection.

Rainfall

Better…

040102 (text value = Weather Research & Forecast Model (WRF)) (type = anzsrc)

Rainfall frequencies (type = lcsh)

Rainfall in northern Australia (type = local)

Coverage (spatial)

20

Spatial coverage refers to the geographical area where data was collected or a place which is the subject of a collection.

Australia

Better…

121.394531,-17.662169 147.146484,-17.662169 142.312500,-9.901036 130.271484,-11.713920 121.394531,-17.662169 (type = kmlPolyCoords)

Northern Australia (type = text)

AU-NSW (type = iso31662)
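A kmlPolyCoords value is just a space-separated list of coordinate pairs. As a rough illustration, the Python sketch below splits the example string into points and checks that the polygon closes on itself. The function names are made up for this example, and it assumes the KML ordering of longitude first, then latitude.

```python
def parse_kml_poly_coords(text):
    """Split 'lon,lat lon,lat ...' into a list of (lon, lat) float tuples."""
    points = []
    for pair in text.split():
        lon_str, lat_str = pair.split(",")
        points.append((float(lon_str), float(lat_str)))
    return points


def is_closed_ring(points):
    """A polygon ring needs at least 4 points and must end where it starts."""
    return len(points) >= 4 and points[0] == points[-1]


coords = (
    "121.394531,-17.662169 147.146484,-17.662169 142.312500,-9.901036 "
    "130.271484,-11.713920 121.394531,-17.662169"
)
ring = parse_kml_poly_coords(coords)
print(len(ring), is_closed_ring(ring))
```

A check like this can catch a common data-entry slip, where the last pair does not repeat the first and the shape is left open.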

Coverage (temporal)

21

Temporal coverage refers to a time period during which data was collected or observations made, or a time period that the collection is linked to intellectually or thematically.

1950-1959

Better…

November-February, 1950-1959 (type = text)

Rights

22

Rights held in and over the data collection.

copyright

Better…

Copyright 2011. Use of the data is subject to legal, ethical and commercial restrictions.

Licensed under Attribution 3.0 Australia (CC BY 3.0)

Access rights

23

Information about access rights to the data collection. Recent changes to RIF-CS now allow for selection of the following access statements: open, conditional or restricted.

Restricted. Access to this dataset is restricted

Better…

Conditional. Access to this data collection is by negotiation with Professor Rayne Fall.

Better…

Open. Access to this data collection is open.
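Because the access statement is now chosen from three fixed values, it is easy to check before a record is submitted. The following Python sketch shows one way to do that. The function name `access_rights` and the dictionary layout are assumptions for illustration, not part of RIF-CS.

```python
# The three access statements named on the slide.
ACCESS_TYPES = ("open", "conditional", "restricted")


def access_rights(access_type, statement):
    """Return a normalised access-rights entry, rejecting unknown types."""
    normalised = access_type.strip().lower()
    if normalised not in ACCESS_TYPES:
        raise ValueError(
            f"access type must be one of {ACCESS_TYPES}, got {access_type!r}"
        )
    return {"type": normalised, "statement": statement.strip()}


entry = access_rights(
    "Conditional",
    "Access to this data collection is by negotiation with Professor Rayne Fall.",
)
```

Pairing the controlled value with a free-text statement mirrors the examples above: the value supports filtering, and the statement tells the (re)user what to do next.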

Identifier

24

Identifiers uniquely identify the collection within the domain of a specified authority; persistent identifiers are preferred.

ID:848086

Better…

http://repository.ari.org.au/collectionId=848086 (type = uri)

Other examples

http://hdl.handle.net/1259.13/847351 (type = uri)

1259.13/847351 (type = hdl)
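The two handle examples carry the same identifier in different forms: one as a resolver URL, one as the bare handle. A small Python sketch of how the bare handle can be recovered from the URL form follows. The function name `handle_from_url` is hypothetical, and it assumes the resolver host is hdl.handle.net as in the example.

```python
from urllib.parse import urlparse


def handle_from_url(url):
    """Return the bare handle from an hdl.handle.net URL, or None otherwise."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "hdl.handle.net":
        return None
    # The handle is everything after the leading slash of the path.
    return parsed.path.lstrip("/") or None


print(handle_from_url("http://hdl.handle.net/1259.13/847351"))
print(handle_from_url("http://repository.ari.org.au/collectionId=848086"))
```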

Location/address

25

The address of the collection (electronic or physical), or another address which enables access to the collection.

Australian Research Institution Western Australia

or

http://repository.ari.org.au/collectionId=848086 (type = uri)

rayne.fall@ari.org.au (type = email)

Related object (party)

26

A party (person or group) related to the data collection.

Not used; if used, relation is incorrect

Better…

Key: ari.org.au/researcher:3852740
Relation: hasCollector (Professor Rayne Fall - person)

Key: www.ari.org.au
Relation: isManagedBy (Australian Research Institution - group)

Other examples

http://nla.gov.au/nla.party-471322 (NLA party record)

http://orcid.org/0000-0003-0635-1998 (ORCID party record)
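A related object is essentially a pair: the key of the other record plus the type of relationship. The Python sketch below shows roughly how that pair could be written as XML. The helper name `related_object` is invented, and the `relatedObject`, `key` and `relation` element names are given here as an assumption about the schema, so check them against the current RIF-CS documentation before relying on them.

```python
import xml.etree.ElementTree as ET


def related_object(key, relation_type):
    """Build a relatedObject element linking to another record by its key."""
    element = ET.Element("relatedObject")
    ET.SubElement(element, "key").text = key
    # The relationship is expressed as an attribute, not as element text.
    ET.SubElement(element, "relation", {"type": relation_type})
    return element


collector = related_object("ari.org.au/researcher:3852740", "hasCollector")
manager = related_object("www.ari.org.au", "isManagedBy")
print(ET.tostring(collector, encoding="unicode"))
```

The link only works if the key matches the key of a party record that actually exists, which is why a made-up or mistyped key produces a broken relationship.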

Related object (activity)

27

An activity related to the collection

Not used or is made up where no real activity exists; if used, relation is incorrect

Better…

Key: ari.org.au/activity:2011-380473
Relation: isOutputOf (Rainfall patterns in the northern Australian tropics during the wet period: a longitudinal study from 1950 onwards)

Other example

http://purl.org/au-research/grants/nhmrc/100009

Related information

28

Related information that provides contextual information about the data collection.

Title: Title not included
Identifier: ISSN to the journal (type = publication)

Better…

Title: Rainfall in the northern Australian tropics: a statistical analysis of rainfall over a 50 year period, 1950-1999
Identifier: http://dx.doi.org/10.1109/ISSTA.2002.1048560 (type = publication; the identifier is the journal article’s URL)

Citation

29

Citation is the preferred form for citing a dataset or collection in a publication or other bibliographic environment.

Citation given is to the research publication based on the data, not the collection

Better…

Fall, R (2011): Daily rainfall observations over the northern Australian tropics, November to February, 1950-1959. [place of publication, publisher]. doi:10.1109/2002.1048397 http://dx.doi.org/10.1109/2002.1048397, 2011. (type = fullCitation)

30

http://researchdata.ands.org.au

31

Any questions?

32

This work is licensed under a Creative Commons Attribution 3.0 Australia License

ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS).
