DOI workshop: metadata issues

Metadata issues and DOI

Post on 18-Aug-2020


Metadata issues and DOI

Overview of presentation

• Background
• Three <indecs> conclusions
• The metadata landscape: which schemes matter most to DOI?
• DOI metadata - practical implications
• DOI applications: sets of metadata for a use
• DOI Kernel
• Handle and metadata
• Conclusion

Definitions of metadata

Popular: "Metadata is data about data." (Everyone)

Logical: "An item of metadata is a relationship that someone claims exists between two entities*." (<indecs> framework)

Functional: "Metadata is the life-blood of e-commerce." (John Erickson, HP)

*entity = something which has identity
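
The logical definition can be pictured as a small data structure: one item of metadata is a claim, made by someone, that a relationship holds between two entities. The sketch below is only an illustration; the class names (`Entity`, `Claim`) and the example identifiers are assumptions made here, not part of the <indecs> framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Something which has identity (hypothetical minimal form)."""
    identifier: str


@dataclass(frozen=True)
class Claim:
    """One item of metadata: a relationship that a claimant asserts
    exists between two entities."""
    claimant: Entity
    subject: Entity
    relation: str
    obj: Entity


def describe(claim: Claim) -> str:
    """Render the claim as a readable sentence."""
    return (f"{claim.claimant.identifier} claims: "
            f"{claim.subject.identifier} {claim.relation} {claim.obj.identifier}")


# Invented example identifiers, for illustration only.
publisher = Entity("party:example-publisher")
author = Entity("party:example-author")
book = Entity("creation:example-book")
claim = Claim(publisher, author, "is author of", book)
print(describe(claim))
```

Keeping the claimant explicit is the point of the definition: two parties can make different claims about the same pair of entities.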

#1: All metadata is just a view

e.g. Views of a "person": some (generic) ways in which you might be identified in metadata schemes...

Son, Legal person, Agent, Alien, Scholar, Library user, Composer, Credit card holder, Shoe purchaser, Author, Lottery entrant

Hospital patient, Citizen, Car driver, Rights owner, Marathon runner, Software licensee, Parent, Tax payer, Club member, e-consumer, Bank account holder

Husband, Charity giver, Hotel guest, Speeding ticket recipient, DisneyWorld visitor, Frequent Flyer, Concert-goer, Passenger, Employee, Voter, Dog owner

In each of these roles "you" will have different IDs and attributes.
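
A rough sketch of the same idea in code: one person, several role-specific views, each with its own identifier and attributes. Every ID format and attribute name below is invented for illustration.

```python
# One underlying person, seen through several role-specific views.
# All IDs and attributes are hypothetical examples.
views = {
    "Library user": {"id": "LIB-000123", "attributes": {"branch": "Central"}},
    "Passenger": {"id": "PNR-ABC123", "attributes": {"seat": "14C"}},
    "Voter": {"id": "ROLL-4567", "attributes": {"ward": "North"}},
}


def ids_for_person(views):
    """Map each role to the identifier used in that view."""
    return {role: view["id"] for role, view in views.items()}


def shared_attributes(views):
    """Attribute names common to every view (often none at all)."""
    keys = [set(view["attributes"]) for view in views.values()]
    return set.intersection(*keys) if keys else set()


print(ids_for_person(views))
print(shared_attributes(views))
```

In this example the views share no attribute names at all, which is why matching them up requires deliberate interoperability work rather than simple comparison.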

Three <indecs> conclusions

#1: All metadata is just a view

Creations are the same. An identifier for a published article may refer to...

• A manuscript
• The abstract work
• A draft
• A (class of) physical copy in a publication
• A (class of) digital copy (not in a publication)
• A (class of) digital copy in a publication
• A (class of) digital format
• A specific digital copy
• A (class of) paper copy
• A specific paper copy
• An edition
• A reprint
• A translation
• etc., and many combinations of the above

Similar views apply to other types of creations.

#1: All metadata is just a view

Views must not be confused for digital content and rights management. Mistaken identity can be catastrophic.

Increasingly, views need to be interoperable (e.g. production workflow, rights, marketing within one business; supply chain transfer; etc.).

The need for automated, interoperable views in d-commerce will be enormous.

#2: (Almost) all terms need identifiers

Each of the values of a view must be defined and identified if other views are to recognize them (what do you mean by an abstract work? an edition? a format? a scholar? a name?)

So views need comprehensive controlled vocabularies (NB our reliance on ISO language, territory, currency and time codes).

Automation needs unambiguous terms.

Terms of rights must be unambiguous. Anything may be a term of an agreement.

Emergence of the value of structured ontologies for commerce (like the indecs model).
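
As a minimal sketch of why controlled vocabularies matter for automation, the check below rejects any value that is not an identified term. The language and currency codes are real ISO 639-1 and ISO 4217 values; the `creation_type` terms and the function name are hypothetical.

```python
# Tiny excerpts of controlled vocabularies.
# "language" and "currency" use real ISO codes; "creation_type" is an
# invented vocabulary used only for this illustration.
VOCABULARIES = {
    "language": {"en", "fr", "de"},
    "currency": {"USD", "EUR", "GBP"},
    "creation_type": {"abstract work", "edition", "translation"},
}


def invalid_terms(record):
    """Return the fields whose value is not in the controlled vocabulary."""
    return sorted(
        field
        for field, value in record.items()
        if field in VOCABULARIES and value not in VOCABULARIES[field]
    )


print(invalid_terms({"language": "en", "currency": "USD"}))
print(invalid_terms({"language": "English", "currency": "USD"}))
```

A free-text value such as "English" is perfectly readable to a person but is flagged here, because another view has no reliable way to recognise it.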

#3: Events are the key to interoperability

Most metadata is "thing" or "people" based.

• static views, e.g. "a creation"

In the net future, metadata interoperability will be achieved by describing "events", relating things and people.

• dynamic views, e.g. "A created B"

Event descriptions will also be the key to rights metadata (transactions are events)
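
The contrast between static and dynamic views can be sketched as follows. The `Event` fields and the helper `static_view` are assumptions made for this illustration, not an <indecs> schema; the point is only that a "thing"-based description can be derived from event descriptions, while the reverse is not generally possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A dynamic view: an agent does something to or with a thing."""
    agent: str
    action: str
    thing: str
    time: str = ""
    place: str = ""


def static_view(events, thing):
    """Derive a 'thing'-based description from event descriptions."""
    return {e.action: e.agent for e in events if e.thing == thing}


events = [
    Event("A", "created", "B", time="1999"),
    Event("C", "licensed", "B", time="2000"),  # a transaction is also an event
]
print(static_view(events, "B"))
```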

The metadata landscape

These conclusions are being reached increasingly often elsewhere.

There is an explosion of metadata activity:

• Models, Identifiers, Vocabularies, Dictionaries, Ontologies.

• XML/RDF schemas.

• Registries/Repositories/"crosswalks".

• Technical standards.

The metadata landscape for "creations"

[Diagram: sectors in the landscape: Libraries, Education, Archives, Museums, Technology, Newspapers, Magazines, Standards, Journals, Books, Texts, Audiovisual, Music, Copyright, Audio]

The metadata landscape for "creations": 1980s

[Diagram: sector map (Libraries, Education, Archives, Museums, Technology, Newspapers, Magazines, Books, Audio, Audiovisual, Copyright, Journals, Standards, Texts, Music) annotated with the schemes shown for this period: MARC, EAN, UPC, ISSN, ISO codes, ISBN, CAE]

The metadata landscape for "creations": mid-1990s

[Diagram: sector map (now including Multimedia) annotated with the schemes shown for this period: IMS, IIM, MARC, FRBR, Dublin Core, EAN, UPC, DOI, url, ISSN, Handle, urn, ISO codes, ISBN, ISWC, ISAN, CIS, ISMN, CAE, ISRC]

The metadata landscape for "creations": today

[Diagram: sector map annotated with the schemes shown for this period: IMS, CIDOC, IIM, MARC, FRBR, NITF, LOM, Dublin Core, RDF, XML schema, abc, ISO 11179, EAN, UPC, ISRC, CAE, ISBN, ISSN, ISAN, ISMN, CIS, UMID, ISTC, SMPTE, DMCS, EPICS, ONIX, <indecs>, MPEG7, MPEG21, DOI, IPDA, PRISM, eBooks, url, uri, Handle, CROSSREF, urn, ISO codes, SICI, P/META, BICI, XrML, ISWC]

Convergence

All serious schemes are becoming...

• Granular (parts and versions)
• Modular (creations within creations)
• Multimedia
• Multinational
• Multilingual
• Multipurpose

Schemes:

• EPICS/ONIX (text)
• SMPTE (audiovisual)
• SDMI/DCMS (audio/music)
• eBooks
• DOI genres
• CIDOC (museums/archives)
• FRBR (libraries)
• Dublin Core
• CIS (copyright societies)
• PRISM (magazines)
• NITF (newspapers)
• MPEG21 (multimedia)

Result: major "sector" schemes are now trying to define metadata with broadly the same scope, only different emphases.

Which initiatives matter most to DOI?

• MPEG21
• SMPTE data dictionary
• ONIX
• XrML

Criteria...

• Strong underlying data model
• Multi-purpose
• Extensive, structured vocabulary
• Commercial critical mass
• Outward-looking

MPEG21

Began 2000 (ISO Moving Picture Experts Group).

Possible umbrella for digital multimedia standards. Place to bring technology and content standards together.

MPEG track record of disciplined standards development.

Most major players getting involved.

Not many lawyers (yet).

Short-term perception problem: "MPEG is audiovisual".

Is the challenge too great?

SMPTE Data Dictionary/UMID

Began 1998 (Society of Motion Picture and Television Engineers).

Well-structured, technically oriented multimedia data dictionary.

Based on the ISO 11179 metadata registry standard, with good governance and update procedure.

SMPTE track record of disciplined standards development.

UMID (Unique Material Identifier) for digital material - complementary to "editorial" identifiers like DOI.

Guaranteed implementation in "home" sector.

Start point for MPEG7 metadata work.

EPICS & ONIX International

EDItEUR (EPICS) and AAP (ONIX) convergence (May 2000).

Substantial and extensible EPICS metadata dictionary, <indecs>-model based, from which "ONIX" XML-tagged subset(s) are taken.

Commerce-driven (Amazon etc) with transatlantic industry support and International Steering Group.

Likely to be used by eBooks, ISTC.

ONIX for video (Amazon initiative)? ONIX for audio?

Best chance of e-commerce multimedia vocabulary and schema (and maybe d-commerce?).

XrML and Rights metadata

DRM (Digital Rights Management) systems at present are for "unitary" rights: they do not deal with modularity.

Holdup 1: Rights vocabularies need descriptive vocabularies - not yet ready.

Holdup 2: Events model needed to integrate descriptions and rights - event-based tools not yet developed.

XrML likely focal point for next stage.

2001+ before more mature interoperable developments start to emerge.

DOI-R? Interested partners in a prototype?

Standard controlled vocabularies

Existing:

• Territories, Language, Currency, Date/Time (ISO)
• Measures (Unified Code for Units of Measure)

Needed:

• Creation types
• Derivation types (adaptation, sample, compilation...)
• Contributor roles (author, translator, cameraman...)
• Title types (abbreviated, inverted, formal... etc)
• Media types (formats)
• Name types
• Identifier types
• Encoding types
• Tools/instruments
• User roles etc

...and many identifiers need establishing or creating (Parties, Agreements, ISWC, ISTC, ISAN, UMID etc)

Metadata issues and DOI

• DOI metadata - practical implications
• DOI "Application Profiles" and "User Communities" (was "Genres")
• DOI Kernel
• Handle and metadata
• Conclusion

DOI Application Profile

A DOI Application Profile is a DOI view: a mechanism for "unity in diversity".

Based on any interest group's view of a type of creation (a DOI User Community). Functional granularity: create a genre when you need it.

DOI-APs can overlap: creations can be in multiple DOI-APs.

A DOI-AP has a metadata kernel, a Registration Agency and a Governance/Development Group.

Base Set for new, unplaced DOIs.

Zero Set = "initial implementation" DOIs (just a single URL redirection; zero additional metadata).

[Diagram. Labels shown: W3C, WIPO, NISO, ISO, UDDI, etc; Metadata; Single redirection (persistent identifier); Multiple resolution; Activity tracking; Full implementation; Initial implementation]

[Diagram. Labels shown: W3C, WIPO, NISO, ISO, UDDI, etc; Metadata; Single redirection (persistent identifier); Multiple resolution; Defined App Profiles; Zero App Profile]

DOI Kernel

Each DOI-AP starts from Base kernel (8 elements) and may add whatever else it needs: defined by the DOI User Community.

A kernel extension model is being developed.

DOI metadata vocabulary to be developed - in tandem with EPICS/ONIX?

Can/should coincide with or provide sector requirements (e.g. ISBN, ISRC, ISWC etc).

Different DOI-APs' metadata will interoperate if vocabularies are developed within indecs-based model.

DOI Kernel

Contains critical minimum metadata for basic recognition (but not complete disambiguation).

Standard base vocabulary (e.g. manifestation, version) means all DOI applications can expect base genre metadata.

A DOI-AP entity (e.g. "book") must be analysable in terms of other attributes (e.g. media, mode, content, subject).

Example kernel record:

DOI: 10.1000/ISBN0141255559
DOI Genre: Book
Identifier: ISBN 0141255559
Title: Two for the dough
Type: Manifestation
Mode: Visual
Primary Agent: Janet Evanovich
Agent Role: Author
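
The example record can be checked mechanically against the eight kernel elements. This is a minimal sketch: the element names are taken from the example above, while the dictionary representation and the function name `missing_kernel_elements` are assumptions made here, not a DOI specification.

```python
# The eight kernel elements, named as in the example record above.
KERNEL_ELEMENTS = (
    "DOI", "DOI Genre", "Identifier", "Title",
    "Type", "Mode", "Primary Agent", "Agent Role",
)


def missing_kernel_elements(record):
    """Return the kernel elements that are absent or empty in a record."""
    return [name for name in KERNEL_ELEMENTS if not record.get(name)]


record = {
    "DOI": "10.1000/ISBN0141255559",
    "DOI Genre": "Book",
    "Identifier": "ISBN 0141255559",
    "Title": "Two for the dough",
    "Type": "Manifestation",
    "Mode": "Visual",
    "Primary Agent": "Janet Evanovich",
    "Agent Role": "Author",
}
print(missing_kernel_elements(record))
```

A record that passes this check is recognisable in the basic sense described on the slide; it is not thereby fully disambiguated.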

DOI Kernel Extensions

IDF to develop an extended "catalogue" for all extended metadata requirements from indecs-based models and vocabulary, along these lines...

• DOI UASet
• Identifier(s)
• Title(s) + Types, Languages
• Primary type
• Mode
• Media
• Encoding
• Form(s)
• Subject(s)
• Content Language + Use Type
• Measures + Units of Measure

• Content Creations
• Content Link Sequence, Measure

• Related Creations + Link Type

• Creation Event + Type
• Primary Agent + Agent Role + Tool
• Source Creation
• Date(s)
• Location(s)

• Availability Event + Type
• Agent + Agent Role
• Date(s)
• Location(s)
• Price + Type

DOI Kernel as the basis of each app. profile

Each Profile can be thought of as built from the kernel + extensions:

[Diagram: a DOI AP, comprising metadata for the application plus the compulsory kernel for any DOI]

DOI Kernel as the basis of each application set

Each DOI-AP can be thought of as built from the kernel + extensions...

...But the kernel is actually what several APs have in common (compare the different views of a person):

Son, Legal person, Agent, Alien, Scholar, Library user, Composer, Credit card holder, Shoe purchaser, Author, Lottery entrant

Hospital patient, Citizen, Car driver, Rights owner, Marathon runner, Software licensee, Parent, Tax payer, Club member, e-consumer, Bank account holder

Husband, Charity giver, Hotel guest, Speeding ticket recipient, DisneyWorld visitor, Frequent Flyer, Concert-goer, Passenger, Employee, Voter, Dog owner

DOI Kernel as the basis of each Application

This kernel cannot be logically defined from first principles.

In the absence of existing Application Profiles to define this overlap (the kernel), we have made a reasonable estimate from the logical analysis of <indecs>.
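
If Application Profiles did exist, the kernel could in principle be read off as their overlap. A rough sketch, assuming three invented profiles whose element sets are made up for illustration (only the eight shared names follow the kernel as presented in this deck):

```python
# Hypothetical element sets for three invented Application Profiles.
# The profile names and the profile-specific elements are illustrative only.
KERNEL = {
    "DOI", "DOI-AP", "Identifier", "Title",
    "Type", "Mode", "Primary Agent", "Agent Role",
}
profiles = {
    "books": KERNEL | {"Edition", "Page count"},
    "journal articles": KERNEL | {"Volume", "Issue", "Page count"},
    "sound recordings": KERNEL | {"Duration", "Encoding"},
}


def estimated_kernel(profiles):
    """The kernel as the overlap: elements that every profile declares."""
    return set.intersection(*profiles.values())


print(sorted(estimated_kernel(profiles)))
```

"Page count" is shared by two of the three invented profiles but is still excluded: the overlap only includes what every profile has in common.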

DOI-APs: all metadata in well-formed structure

[Diagram: DOI AP 1, DOI AP 2 and DOI AP 3, each with its own metadata for the AP, sharing the kernel for any DOI]

<indecs> analysis and DOI

[Diagram: <indecs> analysis of a creation identified by DOI. Labels shown: Attributes; Relations; labels; quantities; qualities; situations; measures; names (titles); identifiers; types; language; continuity; infixion; number; DOI-AP; format; genre; audience; origination; mode; Content creations; Related Creations; events; Creating events (Primary agent, Agent role, agent, time, place, tool); Using events (agent, time, place, price); IP Rights statement; Source; IP entity; currency; IP type. A legend marks the elements that form the kernel.]

Metadata declarations

WHAT:

• Base kernel metadata must be declared.

• DOI-AP-specific metadata is a matter for the DOI User Community (Governance Group/Registration Agency) to decide.

HOW:

• Either local webpage or central repository or both (as decided by User Community rules).

• Automated access to metadata declaration via Handle data types?

• XML schemas.

Roles of declared metadata

= Functional specification of the DOI kernel

(a) to assign a unique DOI to the creation [DOI]

(b) to link the DOI to the principal local identifier of a creation (if any) to enable the integration of DOI-related applications and metadata with others [Identifier]

(c) to enable a searcher or application to identify the creation by its most common name and the party(ies) responsible for its creation or publication [Title, Primary Agent, Agent Role]

Roles of declared metadata (continued)

(d) to enable a searcher or application to distinguish the fundamental type of creation (abstract, physical, digital or spatio-temporal), and thereby also to distinguish between creations of different types with the same names and creators. [Type]

(e) to enable a searcher or application to distinguish the mode of the creation (visual, audio, etc.) [Mode]

(f) to enable a searcher or application to determine to which DOI user/application set the creation belongs [DOI-AP].

Handle and metadata

Handle data types could create a way of processing metadata as a "distributed database" of services: e.g.

[email protected]/[email protected]/[email protected]/[email protected]/[email protected]/[email protected]/[email protected]/123456etc.

Data types (and results) must be consistent, so the Handle data type vocabulary must be developed with great care within an indecs-based model. Some data types could be application specific.
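
The idea of typed values behind one identifier can be sketched with an in-memory stand-in for a handle record. This does not call any real Handle System software or API: the record layout, the `EXAMPLE_METADATA` type name, the DOI `10.1000/123456` and the example.org addresses are all assumptions made for illustration.

```python
# Hypothetical stand-in for handle records: each handle carries several
# typed values. Only the general "typed values" idea comes from the text;
# the type names and data here are invented.
HANDLE_RECORDS = {
    "10.1000/123456": [
        {"type": "URL", "data": "http://example.org/item/123456"},
        {"type": "EXAMPLE_METADATA", "data": "http://example.org/metadata/123456.xml"},
    ],
}


def resolve(handle, data_type):
    """Return all values of the requested data type for a handle."""
    return [
        value["data"]
        for value in HANDLE_RECORDS.get(handle, [])
        if value["type"] == data_type
    ]


print(resolve("10.1000/123456", "URL"))
print(resolve("10.1000/123456", "EXAMPLE_METADATA"))
```

A request for an unknown data type simply returns nothing, which is one reason the type vocabulary needs to be consistent across applications.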

Metadata tasks for DOI

• Mapping ONIX to <indecs>
  - reconcile any differences

• <indecs> data dictionary
  - elements and iids tested in depth; for mappings

• maintaining iid registry
  - database
  - available to anyone building application schema, but need not be public

• applications based on iid registry

• technology tools to ease application set building

The DOI model: future extension

1. Developing rights management aspects of dictionary.

[Diagram: doi> linking Identifier, Description and Action]

The DOI model: future extension

Developing rights management aspects of dictionary:

[Diagram: doi> linking Identifier, Description, Action and Rights]

DOI for parties and events in future?

Conclusion: DOI as the Integrator

"DOI is the most ambitious identifier in the history of the world". (G. Rust 1998)

But now several things are becoming established...

...it has a persistent, granular, flexible, unique identifier which can be a "wrapper" for other IDs. Not competitive - enhances legacy identifiers' functionality in d-commerce. DOI as the integrating digital identifier?

...a strong, established metadata model and vocabulary.

...a controlled but flexible development structure.

...it does not confuse names with addresses.

...allows multiple, standardised automated actions.

Nothing else comes close...
