Post on 20-Dec-2015
Module 3b: Metadata
IMT530: Organization of Information Resources
Winter 2007
Michael Crandall
IMT530A- Organization of Information Resources 2
Recap
• Information systems have two inputs
– User needs
– Information objects
• Representing those inputs effectively inside the system enables the output of objects (or pointers to objects) matching user needs
• Representation is accomplished through developing a model that describes the needs and the objects
• This is expressed through metadata and controlled vocabularies, which are applied to the user needs and information objects
• So what is metadata anyway, and how is it created and used?
Ways to Express Meaning: for people & machines
[Figure: diagram of ways to express meaning, credited to Michael Uschold | The Boeing Company. Groupings shown: Glossaries / Controlled Vocabularies; Informal Taxonomies and Thesauri; Data and Document Metamodels; Formal Knowledge Bases & Inference. Items shown: Terms; 'ordinary' Glossaries; structured Glossaries; Data Dictionaries (EDI); ad hoc Hierarchies (Yahoo!); Thesauri; Principled, informal taxonomies; XML DTDs; DB Schema; XML Schema; Data Models (UML, STEP); formal Taxonomies; Frames (OKBC); Restricted Logics (OWL, Flogic); General Logic]
Module 3b Outline
• Metadata defined
• Origins of metadata theory
• Types of metadata
• Metadata schemas
• Objectives of metadata
• Using metadata
• Encoding metadata
• Creating metadata
• Metadata issues
What is Metadata?
• Data about data
• Definitional data that provides information about or documentation of other data managed within an application or environment… metadata may include descriptive information about the context, quality and condition, or characteristics of the data (FOLDOC)
• Levels of complexity
– Simple (embedded in object; e.g., a hyperlink)
– Structured (Dublin Core, content management)
– Rich (library MARC records, Encoded Archival Description)
Origins
• Library science
– Focus is on entities as containers for information
– Emphasis is on resource discovery
– Tight focus resulted in widespread standards
• Data management
– Focus is on the information itself
– Much more complex information spaces (e.g., NASA satellite data)
– Much more varied types of information and use
– Emphasis is on data use (authenticity, authority)
– Standards tend to be associated with data types
Types of Metadata
• Administrative
– Object management
– Rights and access management
– Maintenance and preservation
– Meta-metadata for managing metadata
• Structural or technical
– Describes relationships between parts
– Enables recognition and use of objects by systems
• Descriptive
– Describes characteristics of object
– Physical and aboutness (subject)
Metadata Schemas
• Sets of metadata elements designed to meet the needs of a community
• The elements are the fields that hold values authorized for use in the schema
• Many different needs, so many different schemas are available
• Three primary components
– Structure: the model used to derive the schema (e.g., RDF)
– Semantics: the meaning of the elements
  • Values are specified through rules or vocabularies (“encoding schemes” or authority control)
– Syntax: the method for encoding the schema (e.g., XML, XHTML)
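The three components can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the flat element/value model stands in for the structure, the `Element` class and the two-element `SCHEMA` carry the semantics (including a made-up encoding scheme for `language`), and `to_xml` shows one possible syntax. It is not drawn from any published schema.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Element:
    """One schema element: its name, its meaning, and (optionally) its encoding scheme."""
    name: str
    definition: str                                   # semantics: what the element means
    allowed_values: Optional[FrozenSet[str]] = None   # encoding scheme; None means free text


# Hypothetical two-element schema, for illustration only.
SCHEMA: Dict[str, Element] = {
    "title": Element("title", "A name given to the resource"),
    "language": Element("language", "Language of the resource",
                        frozenset({"eng", "fre", "spa"})),
}


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for name, value in record.items():
        element = SCHEMA.get(name)
        if element is None:
            problems.append(f"unknown element: {name}")
        elif element.allowed_values is not None and value not in element.allowed_values:
            problems.append(f"value {value!r} not allowed for {name}")
    return problems


def to_xml(record: Dict[str, str]) -> str:
    """Syntax: encode the record as a flat XML document."""
    root = ET.Element("record")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


record = {"title": "Policing the new world disorder", "language": "eng"}
print(validate(record))
print(to_xml(record))
```

Swapping `to_xml` for another serializer would change only the syntax; the elements and their permitted values would stay the same.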
Schema Characteristics
• Interoperability
– Structural (same model)
– Semantic (same meaning for elements)
– Syntactic (same encoding format)
• Flexibility
– Ability to use parts or all of elements and values
• Extensibility
– Allows addition or qualification of elements to meet local needs
– Tradeoff with interoperability
Objectives of Metadata
• Find
– Through search engines, catalogs, etc.
• Identify
– Distinguishing between items for purposes of use
• Select
– By attributes such as language, format, genre, etc.
• Obtain
– Either directly or through location/ordering metadata
• Navigate
– For example, categories on web sites
• Manage
– Content management systems
– Document repositories
Using Metadata
• Application profiles
– Collection of elements from multiple schemas used to meet local needs
– May extend or refine if allowed by rules in schema namespace
– Can’t add new elements or you’re creating a new schema
• Registries
– Machine-accessible repositories of schemas
– Allow reuse and sometimes interoperability
• Crosswalks
– Manual equivalence tables across schemas
– Often used to provide partial interoperability across systems
– Difficult to achieve 1:1 correspondence, however
• Roll your own
– Most common approach in business applications
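A crosswalk can be sketched as a lookup table applied to a source record. The Python below is a minimal illustration, not an official mapping: the `CROSSWALK` table and its target element names (`creator`, `contributor`, `title`, `subject`) are assumptions chosen for the example, and the field values are taken from the sample MARC record in this module.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, partial equivalence table: source field tag -> target element.
CROSSWALK: Dict[str, str] = {
    "100": "creator",
    "245": "title",
    "650": "subject",
    "700": "contributor",
    "710": "contributor",   # many-to-one: personal and corporate names collapse together
}


def apply_crosswalk(fields: List[Tuple[str, str]]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map (tag, value) pairs to target elements; return the result and the unmapped tags."""
    target = defaultdict(list)
    unmapped = []
    for tag, value in fields:
        element = CROSSWALK.get(tag)
        if element is None:
            unmapped.append(tag)          # no equivalent in the target schema
        else:
            target[element].append(value)
    return dict(target), unmapped


source = [
    ("100", "Oakley, Robert B., 1931-"),
    ("245", "Policing the new world disorder"),
    ("300", "4 p. ; 28 cm."),
    ("700", "Dziedzic, Michael J."),
    ("710", "National Defense University. Institute for National Strategic Studies."),
]
mapped, unmapped = apply_crosswalk(source)
print(mapped)
print(unmapped)
```

Two things show why 1:1 correspondence is hard: tag 300 has no target in this table and is dropped, and tags 700 and 710 land in the same element, so the distinction between them cannot be recovered by mapping back.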
Encoding Metadata (Syntax)
• Translating a metadata schema to syntax is essential
– For machines to be able to access the metadata record
– For display of metadata elements/values
– For record transmission
• The standard in the library world is MARC (Machine-Readable Cataloging)
– Current version is MARC 21
• Current standard in most other worlds is XML and its various flavors created for specific applications
– Controlled by DTD (Document Type Definition) or XML schemas
– Advantage of schema is that it is also expressed in XML, so can be referred to easily by other XML applications
Example of MARC Record
01187nam 2200337 a 4500001001200000003000600012005001700018008004100035035001500076040001800091049000900109074001400118074002300132086001500155099001900170100003000189245007800219260010600297300001900403440003000422500001900452500003400471500002000505530009200525610003400617650002600651700002500677710007600702856006000778949001100838tmp96303807OCoLC19970728102440.0971114s1996 dcu f000 0 eng d a1258-02760 dGPOdDLCdMvI aVPII a0378-H-12 a0378-H-12 (online)0 aD 5.417:84 aDocs D5.417:841 aOakley, Robert B.,d1931-10aPolicing the new world disorder /cby Robert Oakley and Michael Dziedzic. a[Washington, D.C.?] :bNational Defense University, Institute for National Strategic Studies,c[1996] a4 p. ;c28 cm. 0aStrategic forum ;vno. 84 aCaption title. aShipping list no.: 97-0045-P. a"October 1996." aAlso available via Internet from the Institute for National Strategic Studies web site.20aUnited NationsxArmed Forces. 0aInternational police.1 aDziedzic, Michael J.2 aNational Defense University.bInstitute for National Strategic Studies.7 uhttp://www.ndu.edu/ndu/inss/strforum/forum84.html2http
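The opening of the record above can be read mechanically: a 24-character leader, followed by a directory of 12-character entries (3-digit tag, 4-digit field length, 5-digit starting offset). The Python sketch below decodes both. It rests on one assumption: the transcript appears to have collapsed runs of spaces, so the leader string is reconstructed here with two blanks after `nam` to restore the full 24 characters. The directory digits are copied from the record as shown.

```python
from typing import Dict, List, Tuple

# Assumed reconstruction of the 24-character leader (two blanks after "nam").
LEADER = "01187nam  2200337 a 4500"

# Directory digits copied from the record above, four 12-character entries per line.
DIRECTORY = (
    "001001200000" "003000600012" "005001700018" "008004100035"
    "035001500076" "040001800091" "049000900109" "074001400118"
    "074002300132" "086001500155" "099001900170" "100003000189"
    "245007800219" "260010600297" "300001900403" "440003000422"
    "500001900452" "500003400471" "500002000505" "530009200525"
    "610003400617" "650002600651" "700002500677" "710007600702"
    "856006000778" "949001100838"
)


def parse_leader(leader: str) -> Dict[str, object]:
    """Pick out a few fixed-position values from the leader."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters")
    return {
        "record_length": int(leader[0:5]),
        "record_status": leader[5],
        "type_of_record": leader[6],
        "bibliographic_level": leader[7],
        "base_address_of_data": int(leader[12:17]),
    }


def parse_directory(directory: str) -> List[Tuple[str, int, int]]:
    """Split the directory into (tag, field length, starting offset) entries."""
    entries = []
    for i in range(0, len(directory), 12):
        chunk = directory[i:i + 12]
        entries.append((chunk[0:3], int(chunk[3:7]), int(chunk[7:12])))
    return entries


leader = parse_leader(LEADER)
entries = parse_directory(DIRECTORY)
print(leader)
for tag, length, start in entries[:4]:
    print(tag, length, start)
```

The numbers are self-consistent: 24 leader characters plus 26 directory entries of 12 characters plus one terminator gives the base address of 337, and each entry's offset plus its length equals the next entry's offset.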
XML Version of MARC Record
<?xml version="1.0" encoding="UTF-8" ?>
<sequence xmlns="http://www.dlib.vt.edu/projects/OAi/marcxml/container" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/oai_marc http://www.openarchives.org/OAI/oai_marc.xsd http://www.dlib.vt.edu/projects/OAi/marcxml/container http://www.dlib.vt.edu/projects/OAi/marcxml/container.xsd">
  <oai_marc xmlns="http://www.openarchives.org/OIA/oai_marc" status="n" type="a" level="m" catForm="a">
    <fixfield id="1">"tmp96303807"</fixfield>
    <fixfield id="3">"OCoLC"</fixfield>
    <fixfield id="5">"19970728102440.0"</fixfield>
    <fixfield id="8">"971114s1996 dcu f000 0 eng d"</fixfield>
    <varfield id="35" i1="" i2="">
      <subfield label="a">1258-02760</subfield>
    </varfield>
    <varfield id="40" i1="" i2="">
      <subfield label="d">GPO</subfield>
      <subfield label="d">DLC</subfield>
      <subfield label="d">MvI</subfield>
    </varfield>
    <varfield id="49" i1="" i2="">
      <subfield label="a">VPII</subfield>
    </varfield>
    <varfield id="74" i1="" i2="">
      <subfield label="a">0378-H-12</subfield>
    </varfield>
    <varfield id="74" i1="" i2="">
      <subfield label="a">0378-H-12 (online)</subfield>
    </varfield>
    <varfield id="86" i1="0" i2="">
      <subfield label="a">D 5.417:84</subfield>
    </varfield>
XML Version of MARC Record
    <varfield id="99" i1="" i2="">
      <subfield label="a">Docs D5.417:84</subfield>
    </varfield>
    <varfield id="100" i1="1" i2="">
      <subfield label="a">Oakley, Robert B.,</subfield>
      <subfield label="d">1931-</subfield>
    </varfield>
    <varfield id="245" i1="1" i2="0">
      <subfield label="a">Policing the new world disorder /</subfield>
      <subfield label="c">by Robert Oakley and Michael Dziedzic.</subfield>
    </varfield>
    <varfield id="260" i1="" i2="">
      <subfield label="a">[Washington, D.C.?] :</subfield>
      <subfield label="b">National Defense University, Institute for National Strategic Studies,</subfield>
      <subfield label="c">[1996]</subfield>
    </varfield>
    <varfield id="300" i1="" i2="">
      <subfield label="a">4 p. ;</subfield>
      <subfield label="c">28 cm.</subfield>
    </varfield>
    <varfield id="440" i1="" i2="0">
      <subfield label="a">Strategic forum ;</subfield>
      <subfield label="v">no. 84</subfield>
    </varfield>
    <varfield id="500" i1="" i2="">
      <subfield label="a">Caption title.</subfield>
    </varfield>
    <varfield id="500" i1="" i2="">
      <subfield label="a">Shipping list no.: 97-0045-P.</subfield>
    </varfield>
XML Version of MARC Record
    <varfield id="500" i1="" i2="">
      <subfield label="a">"October 1996."</subfield>
    </varfield>
    <varfield id="530" i1="" i2="">
      <subfield label="a">Also available via Internet from the Institute for National Strategic Studies web site.</subfield>
    </varfield>
    <varfield id="610" i1="2" i2="0">
      <subfield label="a">United Nations</subfield>
      <subfield label="x">Armed Forces.</subfield>
    </varfield>
    <varfield id="650" i1="" i2="0">
      <subfield label="a">International police.</subfield>
    </varfield>
    <varfield id="700" i1="1" i2="">
      <subfield label="a">Dziedzic, Michael J.</subfield>
    </varfield>
    <varfield id="710" i1="2" i2="">
      <subfield label="a">National Defense University.</subfield>
      <subfield label="b">Institute for National Strategic Studies.</subfield>
    </varfield>
    <varfield id="856" i1="7" i2="">
      <subfield label="u">http://www.ndu.edu/ndu/inss/strforum/forum84.html</subfield>
      <subfield label="2">http</subfield>
    </varfield>
    <varfield id="949" i1="" i2="">
      <subfield label="a">000103</subfield>
    </varfield>
  </oai_marc>
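Because the XML encoding labels every field and subfield, a program can turn it into a readable display with very little code. The Python sketch below is one way this might be done with the standard library. It makes several assumptions: the `SAMPLE` fragment copies two fields from the record above but omits the namespace declarations to keep the lookup simple, and the `LABELS` table of display names is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Two fields copied from the record above; namespaces omitted for simplicity (assumption).
SAMPLE = """<oai_marc status="n" type="a" level="m" catForm="a">
  <varfield id="245" i1="1" i2="0">
    <subfield label="a">Policing the new world disorder /</subfield>
    <subfield label="c">by Robert Oakley and Michael Dziedzic.</subfield>
  </varfield>
  <varfield id="300" i1="" i2="">
    <subfield label="a">4 p. ;</subfield>
    <subfield label="c">28 cm.</subfield>
  </varfield>
</oai_marc>"""

# Hypothetical mapping from field id to a display label.
LABELS = {"245": "Titles", "300": "Description"}


def display_lines(xml_text: str) -> List[str]:
    """Build one 'Label: value' line per recognised varfield."""
    root = ET.fromstring(xml_text)
    lines = []
    for varfield in root.findall("varfield"):
        label = LABELS.get(varfield.get("id"))
        if label is None:
            continue  # fields without a display label are skipped
        text = " ".join(sub.text or "" for sub in varfield.findall("subfield"))
        lines.append(f"{label}: {text}")
    return lines


for line in display_lines(SAMPLE):
    print(line)
```

With the namespaced record as shown on the slides, the element lookups would need namespace-qualified names; that detail is left out here.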
MARC Record in Text
Call Number: Docs D5.417:84
Authors: Oakley, Robert B., 1931-
  Dziedzic, Michael J.
  National Defense University. Institute for National Strategic Studies.
Titles: Policing the new world disorder / / by Robert Oakley and Michael Dziedzic.
  Strategic forum ; / no. 84
Imprint: [Washington, D.C.?] : / National Defense University, Institute for National Strategic Studies, / [1996]
Description: 4 p. ; 28 cm.
Notes: Caption title.
  Shipping list no.: 97-0045-P.
  "October 1996."
Access: URL: http://www.ndu.edu/ndu/inss/strforum/forum84.html |2|: http
Subjects: United Nations — Armed Forces. International police.
Creating Metadata
• We’ve focused on building metadata structures, but someone has to actually create the metadata values used in a system
• Structural and administrative metadata values are often applied when information is created, or generated automatically by authoring tools
• Descriptive metadata is harder to create
– In libraries, traditionally has been done by trained professionals
– Some automated tools have shown limited success in narrow domains
– End users have not generally been a good source unless forced to as part of document creation
– Quality often suffers when trained indexers are not used
Metadata Issues
• Make sure you can measure results
• Don’t assume one size fits all
• Choose user access points wisely
• Provide user tools and education for effective use of your metadata
• Make sure you’re adding value
• Balance theory with practical needs
• Trust and provenance
Questions?
• If not, take a break!!!
Exercise 3
• Find your groups
• Spend the next 30 minutes exploring the examples in Exercise 3
• Ask questions and talk!!!
• Be sure to hand in completed work at the end of class for credit!!!
Next Week
• We’ll look at application profiles and selection of metadata elements for description and access (Part 1 of your assignment)
• Remember to read assignments BEFORE class
• Have a great weekend!!