Automatic Metadata Generation, Charles Duncan

Findings of the Automatic Metadata Generation Use Cases project, Charles Duncan

Upload: jisc-cetis

Post on 18-Dec-2014






Slides by Charles Duncan summarising the findings of the automatic metadata generation use cases project, see


Page 1: Automatic Metadata Generation Charles Duncan

Findings of the Automatic Metadata Generation Use Cases project, Charles Duncan

Page 2: Automatic Metadata Generation Charles Duncan

Find and Seek

• Hide and Seek?

Page 3: Automatic Metadata Generation Charles Duncan

What is metadata?

• Anything that aids the discovery and discrimination of resources

[Figure labels: known unknowns; unknown unknowns; too many results]

Page 4: Automatic Metadata Generation Charles Duncan

Purposes of metadata

• Discovery (known unknowns)

• Discrimination (too many results)

• Recommendation (unknown unknowns)

Page 5: Automatic Metadata Generation Charles Duncan

Simplistic View

[Diagram labels: Information Archive; Deposit; Discovery/Curation; Metadata Generation; Metadata Use]

Page 6: Automatic Metadata Generation Charles Duncan

Closer to reality

[Diagram labels: Information Archive; Other Information Archives (repeated); Deposit; Discovery; Recommendation; Metadata Generation (repeated); Metadata Use]



Page 7: Automatic Metadata Generation Charles Duncan

In-case v. In-time

In-case
• For: Efficient if metadata is created once and used many times
• Against: May create and store metadata which might never be used

In-time
• For: Allows great flexibility for new applications
• Against: Could require unreasonable processing times for a real-time service
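The trade-off can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `generate_metadata` is a hypothetical stand-in for any costly metadata service, and the two archive classes are not part of the project. The in-case archive pays the generation cost once at deposit; the in-time archive pays it on every request.

```python
from typing import Dict


def generate_metadata(text: str) -> Dict[str, int]:
    """Hypothetical generator standing in for a costly metadata service."""
    return {"word_count": len(text.split())}


class InCaseArchive:
    """Generates metadata once, at deposit, and stores it for later use."""

    def __init__(self) -> None:
        self.store: Dict[str, Dict[str, int]] = {}

    def deposit(self, item_id: str, text: str) -> None:
        # Cost is paid up front, even if the metadata is never requested.
        self.store[item_id] = generate_metadata(text)

    def metadata(self, item_id: str) -> Dict[str, int]:
        return self.store[item_id]  # cheap lookup


class InTimeArchive:
    """Stores only the resource and generates metadata when requested."""

    def __init__(self) -> None:
        self.texts: Dict[str, str] = {}

    def deposit(self, item_id: str, text: str) -> None:
        self.texts[item_id] = text  # no metadata cost at deposit

    def metadata(self, item_id: str) -> Dict[str, int]:
        # Cost is paid on every request, which may be slow in real time,
        # but a new kind of metadata needs no re-processing of the archive.
        return generate_metadata(self.texts[item_id])
```

Both classes return the same metadata for the same deposit; they differ only in when the work is done.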

Page 8: Automatic Metadata Generation Charles Duncan

Use case - student

• A student on a history course gets a reading list from the VLE, selects an article and is offered: additional information about geographic locations and historical characters mentioned in the article; a list of other articles by the same author that share the same highly ranked keywords; other articles that commonly appear on reading lists with this article; and books borrowed by students with a similar profile and matching keywords.
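One part of this use case, "other articles that commonly appear on reading lists with this article", can be sketched as a simple co-occurrence count. This is a minimal illustration, not the project's method; the function name `co_listed` and the shape of the reading-list data are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def co_listed(article: str,
              reading_lists: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Count how often other articles share a reading list with `article`.

    `reading_lists` maps a (hypothetical) course code to its article ids.
    Results are ordered by descending count, then by article id.
    """
    counts: Counter = Counter()
    for items in reading_lists.values():
        if article in items:
            counts.update(set(items) - {article})
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative data only.
lists = {
    "HIST101": ["A", "B", "C"],
    "HIST202": ["A", "B"],
    "GEOG1": ["C", "D"],
}
print(co_listed("A", lists))  # [('B', 2), ('C', 1)]
```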

Page 9: Automatic Metadata Generation Charles Duncan

Use case - depositor

• A researcher submits a paper for deposit in a repository. The PDF is analysed, and keywords and a classification are suggested. File type and size are detected. Author and journal names are detected, checked and disambiguated against an authoritative source. Page numbers and date of publication are extracted. All these metadata fields are completed automatically. The depositor puts the paper into two “collections”. All references are identified and related to this paper.
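The simplest step in this use case, detecting file type and size, can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions: the function name `basic_facts` and the field names are hypothetical, and the type is guessed from the file name extension only, not from the file contents.

```python
import mimetypes
import os
from typing import Dict, Optional, Union


def basic_facts(path: str) -> Dict[str, Union[str, int, None]]:
    """Return basic factual metadata for a deposited file."""
    # guess_type looks only at the extension; it returns (type, encoding),
    # and the type is None when the extension is not recognised.
    mime_type: Optional[str] = mimetypes.guess_type(path)[0]
    return {
        "file_name": os.path.basename(path),
        "file_type": mime_type,
        "file_size": os.path.getsize(path),  # size in bytes
    }
```

A production service would more likely inspect the file contents to confirm the type; the extension-based guess is used here only to keep the sketch self-contained.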

Page 10: Automatic Metadata Generation Charles Duncan

Types of automatic metadata gen

• Automatic recognition and extraction services
  – Key word extraction
  – Automatic classification
  – Basic facts (date, depositor, file type, file size, etc.)

• Authoritative source services
  – Name authority
  – Journal title authority

• Translation services
  – Conversion between metadata schemas
  – Conversion between languages

• Metadata quality validation services
  – Harvesting and validating metadata

• Activity aggregation services
  – Usage (reading lists, library borrowing, search terms)

• Relationship services
  – User-created collections
  – Automatic term relation mapping (with “strength”)
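As an illustration of the last item, term relation mapping with a "strength", here is a minimal Python sketch. The slides do not define how strength is measured; this sketch assumes the Jaccard index over the sets of documents in which each term appears. The function name `term_relations` and the input shape are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def term_relations(doc_terms: Dict[str, Set[str]]) -> Dict[FrozenSet[str], float]:
    """Map each co-occurring pair of terms to a strength between 0 and 1.

    Strength (an assumption, not the project's definition) is
    |documents with both terms| / |documents with either term|.
    """
    # Invert the input: term -> set of documents containing it.
    docs_by_term: Dict[str, Set[str]] = {}
    for doc_id, terms in doc_terms.items():
        for term in terms:
            docs_by_term.setdefault(term, set()).add(doc_id)

    relations: Dict[FrozenSet[str], float] = {}
    for a, b in combinations(sorted(docs_by_term), 2):
        shared = docs_by_term[a] & docs_by_term[b]
        if shared:  # record only terms that co-occur at least once
            either = docs_by_term[a] | docs_by_term[b]
            relations[frozenset((a, b))] = len(shared) / len(either)
    return relations
```

Pairs are keyed by `frozenset` so that the relation is symmetric: looking up ("war", "treaty") and ("treaty", "war") gives the same entry.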

Page 11: Automatic Metadata Generation Charles Duncan

Types of automatic metadata gen



• Automatic recognition and extraction services
• Authoritative source services
• Translation services
• Metadata quality validation
• Activity aggregation services
• Relationship services

Page 12: Automatic Metadata Generation Charles Duncan


• Synthesis Report (Use Cases)
• Guidance Report (Tools)
• Recommendations Report (JISC only)
• Specialist reports
  – Subject metadata
  – Name metadata
  – Geospatial metadata
  – Factual metadata
  – Bibliographic metadata
  – Usage metadata
  – File format metadata
  – Integrating automatic metadata services