Metadata issues and DOI
Overview of presentation
• Background
• Three <indecs> conclusions
• The metadata landscape: which schemes matter most to DOI?
• DOI metadata - practical implications
• DOI Genres
• DOI Kernel
• Handle and metadata
• Conclusion
Definitions of metadata
• Popular: “Metadata is data about data.” (Everyone)
• Logical: “An item of metadata is a relationship that someone claims exists between two entities*.” (<indecs> framework)
• Functional: “Metadata is the life-blood of e-commerce.” (John Erickson, HP)
*entity = something which has identity
#1: All metadata is just a view
e.g. Views of a “person”: some (generic) ways in which you might be identified in metadata schemes...
Son, Legal person, Agent, Alien, Scholar, Library user, Composer, Credit card holder, Shoe purchaser, Author, Lottery entrant,
Hospital patient, Citizen, Car driver, Rights owner, Marathon runner, Software licensee, Parent, Tax payer, Club member, e-consumer, Bank account holder,
Husband, Charity giver, Hotel guest, Speeding ticket recipient, DisneyWorld visitor, Frequent Flyer, Concert-goer, Passenger, Employee, Voter, Dog owner
In each of these roles “you” will have different IDs and attributes.
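A minimal sketch of this point, assuming hypothetical scheme names, IDs and attributes (none of them come from the presentation): one entity is only ever seen through views, and each view carries its own identifier.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """One scheme's view of an entity: a role, an ID and scheme-specific attributes."""
    scheme: str       # hypothetical scheme name
    role: str         # e.g. "Author", "Library user"
    identifier: str   # the ID this scheme assigns (hypothetical values below)
    attributes: dict = field(default_factory=dict)


@dataclass
class Entity:
    """Something which has identity; metadata only ever shows it through views."""
    views: list = field(default_factory=list)

    def identifiers(self):
        # Map each role to the ID used for the entity in that role.
        return {v.role: v.identifier for v in self.views}


person = Entity(views=[
    View("library", "Library user", "LIB-0042", {"loans": 3}),
    View("tax", "Tax payer", "TX-99-123", {"region": "UK"}),
    View("publisher", "Author", "AU-7", {"pen_name": "J. Doe"}),
])

print(person.identifiers())
```

The same “you” appears three times, with three unrelated IDs; nothing in any single view says they refer to the same entity.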
Three <indecs> conclusions
#1: All metadata is just a view
Creations are the same. An identifier for a published article may refer to...
• A manuscript
• The abstract work
• A draft
• A (class of) physical copy in a publication
• A (class of) digital copy (not in a publication)
• A (class of) digital copy in a publication
• A (class of) digital format
• A specific digital copy
• A (class of) paper copy
• A specific paper copy
• An edition
• A reprint
• A translation
• etc… and many combinations of the above
Similar views apply to other types of creations.
#1: All metadata is just a view
Views must not be confused for digital content and rights management. Mistaken identity can be catastrophic.
Increasingly, views need to be interoperable (e.g. production workflow, rights, marketing within one business; supply chain transfer; etc.).
The need for automated, interoperable views in d-commerce will be enormous.
#2: (Almost) all terms need identifiers
Each of the values of a view must be defined and identified if other views are to recognize them (what do you mean by an abstract work? an edition? a format? a scholar? a name?)
So views need comprehensive controlled vocabularies (nb our reliance on ISO language, territory, currency, time codes).
Automation needs disambiguation.
Terms of rights must be unambiguous. Anything may be a term of an agreement.
Emergence of the value of structured ontologies for commerce (like the indecs model).
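As a rough illustration of why automation needs controlled vocabularies, the sketch below checks a record against tiny excerpts of the ISO language, territory and currency code lists. The field names and the `invalid_terms` helper are assumptions made for this example only.

```python
# Tiny excerpts of controlled vocabularies; the real lists are far longer.
LANGUAGES = {"en", "fr", "de"}        # ISO 639-1 language codes
TERRITORIES = {"GB", "US", "FR"}      # ISO 3166-1 alpha-2 territory codes
CURRENCIES = {"GBP", "USD", "EUR"}    # ISO 4217 currency codes

# Hypothetical field names mapped to the vocabulary that controls them.
VOCABULARIES = {
    "language": LANGUAGES,
    "territory": TERRITORIES,
    "currency": CURRENCIES,
}


def invalid_terms(record):
    """Return the fields whose values are not in their controlled vocabulary."""
    return {
        name: value
        for name, value in record.items()
        if name in VOCABULARIES and value not in VOCABULARIES[name]
    }


ok = {"language": "en", "territory": "GB", "currency": "GBP"}
bad = {"language": "English", "territory": "GB", "currency": "pounds"}

print(invalid_terms(ok))
print(invalid_terms(bad))
```

A human reads “English” and “pounds” without trouble; a program comparing two views can only match values that both sides have agreed on and identified.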
#3: Events are the key to interoperability
Most metadata is “thing” or “people” based.
• static views e.g. “a creation”
In the net future, metadata interoperability will be achieved by describing “events”; relating things and people
• dynamic views e.g. “A created B”
Event descriptions will also be the key to rights metadata (transactions are events)
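The contrast between static and dynamic views can be sketched as follows. This is an illustration only, not the <indecs> model itself: the `Event` fields, the verbs and the dates are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A dynamic view: an agent does something that involves a thing."""
    agent: str   # who
    verb: str    # what happened, e.g. "created", "licensed"
    thing: str   # what it happened to
    when: str    # date as an ISO 8601 string (hypothetical values below)


events = [
    Event("A", "created", "B", "1999-05-01"),
    Event("A", "licensed", "B", "2000-02-14"),
    Event("C", "translated", "B", "2000-09-30"),
]


def static_view(thing, events):
    """Derive a 'thing'-based record from the events that mention the thing."""
    record = {"thing": thing, "creators": [], "other_events": []}
    for e in events:
        if e.thing != thing:
            continue
        if e.verb == "created":
            record["creators"].append(e.agent)
        else:
            record["other_events"].append((e.verb, e.agent, e.when))
    return record


print(static_view("B", events))
```

The static record can be derived from the events, but not the other way round, and a rights transaction (“A licensed B”) fits the same structure as a creation event.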
These conclusions are being reached increasingly often elsewhere.
There is an explosion of metadata activity:
• Models, Identifiers, Vocabularies, Dictionaries, Ontologies.
• XML/RDF schemas.
• Registries/Repositories/“crosswalks”.
• Technical standards.
The metadata landscape
The metadata landscape for “creations”
[Diagram: sectors] Books, Audio, Audiovisual, Libraries, Copyright, Journals, Magazines, Newspapers, Standards, Education, Music, Texts, Technology, Archives, Museums
The metadata landscape for “creations”: 1980s
[Diagram: sectors] Books, Audio, Audiovisual, Libraries, Copyright, Journals, Magazines, Newspapers, Standards, Education, Music, Texts, Technology, Archives, Museums
[Diagram: schemes and identifiers] MARC, CAE, ISBN, ISSN, EAN, UPC, ISO codes
The metadata landscape for “creations”: mid 90’s
[Diagram: sectors] Books, Audio, Audiovisual, Multimedia, Libraries, Copyright, Journals, Magazines, Newspapers, Standards, Education, Music, Texts, Technology, Archives, Museums
[Diagram: schemes and identifiers] MARC, ISRC, CAE, ISBN, ISSN, ISAN, ISMN, CIS, Dublin Core, EAN, DOI, IIM, ISWC, FRBR, UPC, URL, URN, Handle, ISO codes, IMS
The metadata landscape for “creations”: today
[Diagram: sectors] Books, Audio, Audiovisual, Multimedia, Libraries, Copyright, Journals, Magazines, Newspapers, Standards, Education, Music, Texts, Technology, eBooks, Archives, Museums
[Diagram: schemes and identifiers] MARC, ISRC, CAE, ISBN, ISSN, ISAN, ISMN, CIS, UMID, ISTC, Dublin Core, SMPTE, DMCS, EPICS, ONIX, EAN, IMS, LOM, abc, <indecs>, MPEG7, MPEG21, ISO11179, RDF, XML schema, DOI, IPDA, PRISM, IIM, NITF, CIDOC, CROSSREF, ISWC, P/META, XrML, FRBR, UPC, URL, URI, URN, Handle, BICI, SICI, ISO codes
Convergence
All serious schemes are becoming...
• Granular (parts and versions)
• Modular (creations within creations)
• Multimedia
• Multinational
• Multilingual
• Multipurpose
EPICS/ONIX (text)
SMPTE (audiovisual)
SDMI/DCMS (audio/music)
eBooks
DOI genres
CIDOC (museums/archives)
FRBR (libraries)
Dublin Core
CIS (copyright societies)
PRISM (magazines)
NITF (newspapers)
MPEG7 (multimedia)
Result: major “sector” schemes are now trying to define metadata with broadly the same scope, only different emphases.
Which initiatives matter most to DOI?
MPEG21
SMPTE data dictionary
EPICS/ONIX
XrML
Criteria...
Strong underlying data model
Multi-purpose
Extensive, structured vocabulary
Commercial critical mass
Outward-looking
MPEG21
Began 2000 (ISO Moving Picture Experts Group).
Possible umbrella for digital multimedia standards. Place to bring technology and content standards together.
MPEG track record of disciplined standards development.
Most major players getting involved.
Not many lawyers (yet).
Short-term perception problem: “MPEG is audiovisual”.
Is the challenge too great?
SMPTE Data Dictionary/UMID
Began 1998 (Society of Motion Picture and Television Engineers).
Well-structured multimedia technically-oriented data dictionary.
ISO 11179 metadata registry based, good governance and update procedure.
SMPTE track record of disciplined standards development.
UMID (Unique Material Identifier) for digital material - complementary to “editorial” identifiers like DOI.
Guaranteed implementation in “home” sector.
Start point for MPEG7 metadata work.
EPICS & ONIX International
EDItEUR (EPICS) and AAP (ONIX) convergence (May 2000).
Substantial and extensible EPICS metadata dictionary, <indecs>-model based, from which “ONIX” XML-tagged subset(s) are taken.
Commerce-driven (Amazon etc) with transatlantic industry support and International Steering Group.
Likely to be used by eBooks, ISTC.
ONIX for video (Amazon initiative)? ONIX for audio?
Best chance of e-commerce multimedia vocabulary and schema (and maybe d-commerce?).
XrML and Rights metadata
DRM (Digital Rights Management) systems at present are for “unitary” rights: they do not deal with modularity.
Holdup 1: Rights vocabularies need descriptive vocabularies - not yet ready.
Holdup 2: Events model needed to integrate descriptions and rights - event-based tools not yet developed.
XrML likely focal point for next stage.
2001+ before more mature interoperable developments start to emerge.
DOI-R? Interested partners in a prototype?
Standard controlled vocabularies
Existing…
• Territories, Language, Currency, Date/Time (ISO)
• Measures (U.C.U.M)
Needed…
• Creation types
• Derivation types (adaptation, sample, compilation…)
• Contributor roles (author, translator, cameraman…)
• Title types (abbreviated, inverted, formal... etc)
• Media types (formats)
• Name types
• Identifier types
• Encoding types
• Tools/instruments
• User roles
• etc...
and many identifiers need establishing or creating (Parties, Agreements, ISWC, ISTC, ISAN, UMID etc)
DOI metadata - practical implications
• DOI Genres
• DOI Kernel
• Handle and metadata
• Conclusion
DOI Genres
A genre is a DOI view: mechanism for “unity in diversity”.
Genre based on any interest group’s view of a type of creation. Functional granularity: create a genre when you need it.
Genres can overlap: creations can be in multiple genres.
Genre has metadata kernel, Registration Agency, Genre Development / Steering Group?
Base Genre for new, unplaced DOIs.
Zero Genre = “initial implementation” DOIs (just a single URL redirection; zero additional metadata).
[Diagram: DOI implementation stages. Initial implementation (zero genre): single redirection (persistent identifier). Full implementation (defined genres): metadata, multiple resolution. Activity tracking: W3C, WIPO, NISO, ISO, UDDI, etc.]
Each Genre starts from Base Genre kernel (8 elements) and may add whatever else it needs.
A kernel extension model is being developed
DOI Genre vocabulary to be developed - in tandem with EPICS/ONIX?
Can/should coincide with or provide sector requirements (eg ISBN, ISRC, ISWC etc).
Different Genres’ metadata will interoperate if vocabularies are developed within indecs-based model.
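A minimal sketch of “kernel plus whatever else the genre needs”, assuming the eight kernel element names shown on the DOI Kernel slide. The extension names (“Edition”, “Page Count”) and both helper functions are hypothetical, since the kernel extension model was still being developed.

```python
# The eight Base Genre kernel elements named in the presentation.
KERNEL = ("DOI", "DOI Genre", "Identifier", "Title", "Type",
          "Origination", "Primary Agent", "Agent Role")


def make_genre(name, extensions):
    """Build a genre schema: the kernel elements plus genre-specific ones."""
    clash = set(KERNEL) & set(extensions)
    if clash:
        # A genre may add elements but should not redefine the kernel.
        raise ValueError(f"extensions redefine kernel elements: {sorted(clash)}")
    return {"name": name, "elements": KERNEL + tuple(extensions)}


def missing_elements(genre, record):
    """List the genre's elements that a metadata record fails to declare."""
    return [e for e in genre["elements"] if e not in record]


# Hypothetical extensions for a Book genre.
book = make_genre("Book", ["Edition", "Page Count"])

partial = {"DOI": "10.1000/ISBN0141255559", "DOI Genre": "Book"}
print(missing_elements(book, partial))
```

Because every genre schema starts with the same kernel tuple, any application can rely on those eight elements without knowing the genre in advance.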
DOI Kernel
DOI 10.1000/ISBN0141255559
DOI Genre Book
Identifier ISBN 0141255559
Title Two for the dough
Type Manifestation
Origination Original
Primary Agent Janet Evanovich
Agent Role Author
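The kernel record above can be written as a small data structure. This is a sketch only: the `DOIKernel` class and its field names are chosen for illustration, while the values are those on the slide.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DOIKernel:
    """The eight kernel elements; class and field names are illustrative."""
    doi: str
    doi_genre: str
    identifier: str
    title: str
    type: str
    origination: str
    primary_agent: str
    agent_role: str


record = DOIKernel(
    doi="10.1000/ISBN0141255559",
    doi_genre="Book",
    identifier="ISBN 0141255559",
    title="Two for the dough",
    type="Manifestation",
    origination="Original",
    primary_agent="Janet Evanovich",
    agent_role="Author",
)

# Every kernel element must be declared, so flag any that are empty.
missing = [name for name, value in asdict(record).items() if not value]
print(len(asdict(record)), missing)
```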
DOI Kernel
Contains critical minimum metadata for basic recognition (but not complete disambiguation).
Standard base vocabulary (eg manifestation, version) means all DOI applications can expect base genre metadata.
DOI Genre type (e.g. “book”) must be analysable in terms of other attributes (e.g. media, mode, content, subject).
DOI Kernel Extensions
IDF to develop an extended “catalogue” for all extended metadata requirements from indecs-based models and vocabulary, along these lines...
• DOI
• DOI Genre
• Identifier(s)
• Title(s) + Types, Languages
• Primary type
• Origination
• Media
• Encoding
• Genre(s)
• Form(s)
• Subject(s)
• Content Language + Use Type
• Measures + Units of Measure
• Content Creations: Content Link, Sequence, Measure
• Related Creations + Link Type
• Creation Event + Type: Primary Agent + Agent Role + Tool, Source Creation, Date(s), Location(s)
• Availability Event + Type: Agent + Agent Role, Date(s), Location(s), Price + Type
DOI Kernel as the basis of each genre
Each genre can be thought of as built from the kernel + extensions:
[Diagram: a DOI Genre = compulsory kernel for any DOI + metadata for Genre]
Each genre can be thought of as built from the kernel + extensions…
...But the kernel is actually what several genres have in common (compare the different views of a person):
Son, Legal person, Agent, Alien, Scholar, Library user, Composer, Credit card holder, Shoe purchaser, Author, Lottery entrant,
Hospital patient, Citizen, Car driver, Rights owner, Marathon runner, Software licensee, Parent, Tax payer, Club member, e-consumer, Bank account holder,
Husband, Charity giver, Hotel guest, Speeding ticket recipient, DisneyWorld visitor, Frequent Flyer, Concert-goer, Passenger, Employee, Voter, Dog owner
DOI Kernel as the basis of each genre
In the absence of existing genres to define this overlap (= kernel), we have made a reasonable estimate from the logical analysis of <indecs>.
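The idea of the kernel as an overlap can be sketched as a set intersection. The three genres and their extra elements below are invented for illustration; only the eight shared element names come from the presentation.

```python
# The eight kernel elements named in the presentation.
COMMON = {"DOI", "DOI Genre", "Identifier", "Title", "Type",
          "Origination", "Primary Agent", "Agent Role"}

# Hypothetical genres: each declares the common elements plus its own extras.
genre_elements = {
    "Book": COMMON | {"Edition", "Page Count"},
    "Journal article": COMMON | {"Volume", "Issue"},
    "Sound recording": COMMON | {"Duration", "Encoding"},
}

# With real genres in hand, the kernel would be discovered as their overlap.
kernel = set.intersection(*genre_elements.values())

# Everything outside the overlap is genre-specific metadata.
extras = {name: sorted(elems - kernel) for name, elems in genre_elements.items()}

print(sorted(kernel))
print(extras)
```

Here the overlap is built in by construction, which mirrors the situation described: with no existing genres to intersect, the kernel has to be estimated first and the genres defined around it.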
[Diagram: DOI Genre 1, DOI Genre 2 and DOI Genre 3 overlap. The overlap is the compulsory kernel for any DOI; the rest of each is metadata for that Genre.]
Genres: all metadata in well-formed structure
Metadata declarations
WHAT:
• Base kernel metadata must be declared.
• Genre-specific metadata is a matter for the Genre (Development Group/Registration Agency) to decide.
HOW:
• Either local webpage or central repository or both (as decided by Genre).
• Automated access to metadata declaration via Handle data types?
• XML schemas.
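One way a kernel declaration might be expressed in XML is sketched below with Python's standard `xml.etree.ElementTree`. The root element name `doiKernel` and the child element names are assumptions; the presentation does not define an XML schema.

```python
import xml.etree.ElementTree as ET

# Kernel values taken from the DOI Kernel example in the presentation.
kernel = {
    "DOI": "10.1000/ISBN0141255559",
    "DOI Genre": "Book",
    "Identifier": "ISBN 0141255559",
    "Title": "Two for the dough",
    "Type": "Manifestation",
    "Origination": "Original",
    "Primary Agent": "Janet Evanovich",
    "Agent Role": "Author",
}


def to_xml(kernel):
    """Serialise a kernel declaration; element names here are hypothetical."""
    root = ET.Element("doiKernel")
    for name, value in kernel.items():
        # XML element names cannot contain spaces, so "DOI Genre" -> "DOIGenre".
        child = ET.SubElement(root, name.replace(" ", ""))
        child.text = value
    return ET.tostring(root, encoding="unicode")


declaration = to_xml(kernel)
print(declaration)

# Round trip: an application can read the declaration back.
parsed = ET.fromstring(declaration)
print(parsed.find("Title").text)
```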
Roles of declared metadata
= Functional specification of the DOI kernel
(a) to assign a unique DOI to the creation [DOI]
(b) to link the DOI to the principal local identifier of a creation (if any) to enable the integration of DOI-related applications and metadata with others [Identifier]
(c) to enable a searcher or application to identify the creation by its most common name and the party(ies) responsible for its creation or publication [Title, Primary Agent, Agent Role]
Roles of declared metadata (continued)
(d) to enable a searcher or application to distinguish the fundamental type of creation (abstract, physical, digital or spatio-temporal), and thereby also to distinguish between creations of different types with the same names and creators. [Type]
(e) to enable a searcher or application to distinguish the origination of the creation (original, derivation) [Origination]
(f) to enable a searcher or application to determine to which DOI Genre the creation belongs [DOI Genre].
Handle and metadata
Handle data types could create a way of processing metadata as a “distributed database” of services: e.g.
Data types (and results) must be consistent, so the Handle data type vocabulary must be developed with great care within indecs-based model. Some data types could be genre specific.
[email protected]/[email protected]/[email protected]/[email protected]/[email protected]/[email protected]/[email protected]/123456etc.
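A Handle record holds a set of typed values, so a request can ask for one data type rather than everything. The sketch below imitates that with an in-memory table; it does not use a real Handle client, and the handle, the `METADATA` and `RIGHTS` type names and the example.org addresses are all assumptions.

```python
# In-memory stand-in for Handle records: handle -> list of (data type, value).
# Only "URL" is a conventional Handle type; the others are hypothetical.
HANDLE_RECORDS = {
    "10.1000/123456": [
        ("URL", "http://example.org/content/123456"),
        ("METADATA", "http://example.org/metadata/123456.xml"),
        ("RIGHTS", "http://example.org/rights/123456"),
    ],
}


def resolve(handle, data_type=None):
    """Return all values for a handle, or only those of one data type."""
    values = HANDLE_RECORDS.get(handle, [])
    if data_type is None:
        return values
    return [value for type_name, value in values if type_name == data_type]


# Zero genre behaviour: a single redirection to one URL.
print(resolve("10.1000/123456", "URL"))

# Multiple resolution: ask the same identifier for a different service.
print(resolve("10.1000/123456", "METADATA"))
```

This only works across genres if everyone agrees on what each type name means, which is why the data type vocabulary needs the same care as the metadata vocabulary.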
Metadata tasks for DOI
• Mapping ONIX to <indecs>: reconcile any differences
• <indecs> data dictionary: elements and iids tested in depth, for mappings
• Maintaining iid registry: a database available to anyone building a genre schema, but it need not be public
• Applications based on iid registry: technology tools to ease genre building
The DOI model: future extension
[Diagram: the DOI model: Identifier, Description, Action]
1. Developing rights management aspects of dictionary.
[Diagram: the extended DOI model: Identifier, Description, Action, Rights]
DOI for parties and events in future?
Conclusion: DOI as the Integrator
“DOI is the most ambitious identifier in the history of the world”. (G. Rust 1998)
But now several things are becoming established...
…it has a persistent, granular, flexible, unique identifier which can be a “wrapper” for other IDs. Not competitive - enhances legacy identifiers’ functionality in d-commerce. DOI as the integrating digital identifier?
...a strong, established metadata model and vocabulary.
…a controlled but flexible development structure.
…it does not confuse names with addresses.
…allows multiple, standardised automated actions.
Nothing else comes close...