automatic metadata generation charles duncan
DESCRIPTION
Slides by Charles Duncan summarising the findings of the Automatic Metadata Generation Use Cases project, see http://www.intrallect.com/wiki/index.php/AMG-UC
TRANSCRIPT
Findings of the Automatic Metadata Generation Use Cases project
Charles Duncan
Find and Seek
• Hide and Seek?
What is metadata?
• Anything that aids the discovery and discrimination of resources
[Figure: a scatter of question marks with the labels "known unknowns", "unknown unknowns" and "too many results"]
Purposes of metadata
• Discovery (known unknowns)
• Discrimination (too many results)
• Recommendation (unknown unknowns)
Simplistic View
[Diagram labels: Information Archive; Deposit; Discovery/Curation; Metadata Generation; Metadata Use]
Closer to reality
[Diagram labels: Information Archive; Other Information Archives (three times); Deposit; Discovery; Discrimination; Recommendation; Metadata Generation (three times); Metadata Use]
Just-in-case
Just-in-time
In-case v. In-time
Just-in-case
• For: Efficient if metadata is created once and used many times
• Against: May create and store metadata which might never be used
Just-in-time
• For: Allows great flexibility for new applications
• Against: Could require unreasonable processing times for a real-time service
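The trade-off between the two approaches can be sketched in a few lines of Python. This is only an illustration: `generate_metadata`, `JustInCaseStore` and `JustInTimeStore` are hypothetical names, and the "metadata" is deliberately trivial. The just-in-case store pays the generation cost once at deposit, while the just-in-time store pays it on every lookup.

```python
def generate_metadata(text):
    """Hypothetical generator: derives trivial metadata from a resource's text."""
    words = text.split()
    return {"word_count": len(words), "first_word": words[0] if words else None}


class JustInCaseStore:
    """Generates metadata at deposit time and stores it, whether or not it is ever used."""

    def __init__(self):
        self._metadata = {}
        self.generation_calls = 0

    def deposit(self, resource_id, text):
        self.generation_calls += 1
        self._metadata[resource_id] = generate_metadata(text)

    def lookup(self, resource_id):
        # No processing at lookup time: the stored record is returned as-is.
        return self._metadata[resource_id]


class JustInTimeStore:
    """Stores only the resource; metadata is generated each time it is requested."""

    def __init__(self):
        self._texts = {}
        self.generation_calls = 0

    def deposit(self, resource_id, text):
        self._texts[resource_id] = text

    def lookup(self, resource_id):
        # Processing happens here, so a slow generator delays the user.
        self.generation_calls += 1
        return generate_metadata(self._texts[resource_id])
```

In this sketch, a resource that is deposited but never looked up costs one generation call in the just-in-case store and none in the just-in-time store; a resource looked up many times reverses the comparison.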
Use case - student
• A student on a history course gets a reading list from the VLE. The student selects an article and is offered: additional information about geographic locations and historical characters mentioned in the article; a list of other articles by the same author that have the same highly ranked keywords; other articles that commonly appear on reading lists with this article; and books borrowed by students of a similar profile with matched keywords.
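One part of this use case, finding articles that commonly appear on reading lists alongside the selected one, can be sketched as a simple co-occurrence count. The function name `co_listed` and the shape of the reading-list data are assumptions for illustration, not part of the project's reports.

```python
from collections import Counter


def co_listed(article, reading_lists, top_n=3):
    """Rank other items by how many reading lists they share with `article`.

    `reading_lists` is assumed to be a list of lists of article identifiers.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for items in reading_lists:
        if article in items:
            # Count each co-listed item once per reading list.
            counts.update(set(items) - {article})
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


# Hypothetical reading lists using placeholder identifiers.
lists = [["A", "B", "C"], ["A", "B"], ["B", "D"], ["A", "C", "D"]]
print(co_listed("A", lists))
```

A real service would draw the reading lists from the VLE or library system and would probably weight recent or same-course lists more heavily; the count here is the simplest possible measure.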
Use case - depositor
• A researcher submits a paper for deposit in a repository. The PDF is analysed, and keywords and a classification are suggested. File type and size are detected. Author and journal names are detected, then checked and disambiguated against an authoritative source. Page numbers and date of publication are extracted. All these metadata fields are completed automatically. The depositor puts the paper into two “collections”. All references are identified and related to this paper.
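The step of checking a detected author name against an authoritative source could look roughly like the following. This is a sketch under stated assumptions: the authority list is a hypothetical in-memory list of canonical name forms, and fuzzy string matching from Python's standard `difflib` module stands in for whatever a real name authority service would do.

```python
import difflib

# Hypothetical authority file of canonical name forms.
AUTHORITY = ["Duncan, Charles", "Smith, Jane", "Smith, John"]


def suggest_authority_forms(raw_name, authority=AUTHORITY, cutoff=0.6):
    """Return up to three canonical names that closely match `raw_name`.

    An empty list means no confident match; more than one entry means the
    name is ambiguous and the depositor should be asked to choose.
    """
    return difflib.get_close_matches(raw_name, authority, n=3, cutoff=cutoff)


print(suggest_authority_forms("Duncan, Charls"))  # a misspelt name
print(suggest_authority_forms("Smith, J"))        # an ambiguous name
```

String similarity alone cannot tell two people with the same name apart, so a production service would also need affiliation, subject area or identifier data to disambiguate.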
Types of automatic metadata gen
• Automatic recognition and extraction services
– Key word extraction
– Automatic classification
– Basic facts (date, depositor, file type, file size, etc)
• Authoritative source services
– Name authority
– Journal title authority
• Translation services
– Conversion between metadata schemas
– Conversion between languages
• Metadata quality validation services
– Harvesting and validating metadata
• Activity aggregation services
– Usage (reading lists, library borrowing, search terms)
• Relationship services
– User-created collections
– Automatic term relation mapping (with “strength”)
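As an illustration of the first category, key word extraction can be approximated by counting term frequency after removing common words. This is a minimal sketch, not a method described in the slides: the stop-word list, the function name `suggest_keywords` and the minimum word length are all assumptions, and real services typically use more sophisticated weighting.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real service would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with", "that", "by"}


def suggest_keywords(text, top_n=5):
    """Suggest keywords by raw term frequency, most frequent first.

    Ties are broken alphabetically so the output is deterministic.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:top_n]]


sample = ("Metadata generation for repositories: automatic metadata "
          "generation reduces the cost of metadata.")
print(suggest_keywords(sample, top_n=2))
```

The input is assumed to be plain text already extracted from the deposited file; extracting text from a PDF is a separate step not shown here.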
Types of automatic metadata gen
[Diagram labels: Just-in-case; Just-in-time; Automatic recognition and extraction services; Authoritative source services; Translation services; Metadata quality validation services; Activity aggregation services; Relationship services]
Reports
• Synthesis Report (Use Cases)
• Guidance Report (Tools)
• Recommendations Report (JISC only)
• Specialist reports
– Subject metadata
– Name metadata
– Geospatial metadata
– Factual metadata
– Bibliographic metadata
– Usage metadata
– File format metadata
– Integrating automatic metadata services
http://www.intrallect.com/index.php/intrallect/knowledge_base/research_projects/automatic_metadata_generation_use_cases