lis512 lecture 7
metadata, XML and MARCXML
metadata
• The term metadata is usually defined as “data about data”.
• As such, it is controversial what is metadata and what is data.
• As far as we are concerned, metadata are records that are attached to documents, meaning records about something of interest.
• We can say that a MARC record is metadata, although it may not.
metadata example: email
• If you send and receive email, you will sometimes see what is known as email headers.
• This collection of fields is of the form attribute: value.
• Example on next slide
From [email protected] Sun Jul 12 14:55:16 2009
Date: Sun, 12 Jul 2009 14:55:16 +0700
From: Thomas Krichel <[email protected]>
To: [email protected]: <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Envelope-to: Thomas Krichel <[email protected]>
Return-Path: Thomas Krichel <[email protected]>
User-Agent: Mutt/1.5.18 (2008-05-17)
Status: RO
Content-Length: 5
Lines: 1
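The attribute: value form of email headers can be read mechanically. The following is a minimal sketch using Python's standard email package. The message text is made up for illustration, and the example.org addresses are placeholders, not the addresses from the slide.

```python
from email.parser import HeaderParser

# Hypothetical message: header names follow the slide, addresses are placeholders.
raw = (
    "Date: Sun, 12 Jul 2009 14:55:16 +0700\n"
    "From: Thomas Krichel <sender@example.org>\n"
    "To: recipient@example.org\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "test\n"
)

# HeaderParser reads only the header block; everything after the blank line is the body.
msg = HeaderParser().parsestr(raw)

# Each header is one (attribute, value) pair, kept in the order it was written.
headers = dict(msg.items())
for name, value in msg.items():
    print(f"{name}: {value}")
```

The headers are metadata about the message. The body after the blank line is the data.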
metadata example: http headers
• HTTP/1.1 200 OK
• Date: Wed, 24 Feb 2010 17:34:33 GMT
• Server: Apache/2.2.14 (Debian)
• Last-Modified: Sun, 13 Dec 2009 08:03:42 GMT
• ETag: "5f8271-f76-47a9798613380"
• Accept-Ranges: bytes
• Content-Length: 3958
• Connection: close
• Content-Type: text/html
example id3v1
• A fixed 128 byte format.
– header: 3 bytes, "TAG"
– title: 30 bytes of the title
– artist: 30 bytes of the artist name
– album: 30 bytes of the album name
– year: 4 byte year
– comment: 30 bytes, or 28 bytes if a track number is stored
– zero-byte: 1 byte. If a track number is stored, this byte contains a binary 0.
– track: 1 byte. The number of the track on the album, or 0.
– genre: 1 byte. Index in a list of genres, or 255
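Because the format is fixed-width, such a tag can be unpacked without any markup at all. The sketch below uses Python's struct module. It assumes the usual convention that, when a track number is stored, the zero-byte and the track byte take the last two bytes of the 30 byte comment field, so that the whole tag stays at 128 bytes. The function names and the sample values are made up for illustration.

```python
import struct

# "TAG", title, artist, album, year, comment (30 bytes), genre: 128 bytes in total.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")
assert ID3V1.size == 128


def text(raw: bytes) -> str:
    """Decode a fixed-width field, dropping everything from the first NUL byte on."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def parse_id3v1(block: bytes) -> dict:
    """Parse a 128 byte ID3v1 block into a dictionary of fields."""
    if len(block) != 128:
        raise ValueError("an ID3v1 tag is exactly 128 bytes")
    header, title, artist, album, year, comment, genre = ID3V1.unpack(block)
    if header != b"TAG":
        raise ValueError("missing TAG header")
    track = None
    # Assumed convention: a zero byte followed by a non-zero byte at the end
    # of the comment field means that a track number is stored.
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]
    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "track": track,
        "genre": genre,
    }


def pad(value: str, width: int) -> bytes:
    """Encode a string and pad it with NUL bytes to a fixed width."""
    return value.encode("latin-1").ljust(width, b"\x00")


# Made-up sample tag with track number 7 and genre index 255.
sample = (
    b"TAG" + pad("Some Title", 30) + pad("Some Artist", 30) + pad("Some Album", 30)
    + b"2009" + pad("demo", 28) + b"\x00" + bytes([7]) + bytes([255])
)
tag = parse_id3v1(sample)
print(tag)
```

In an audio file the tag sits in the last 128 bytes, so a reader would seek to 128 bytes before the end of the file and pass that block to parse_id3v1.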
markup
• Markup is the information contained in a document that is not its contents.
• Markup mainly comes with two types of information
– information related to the structure
– information related to the appearance
• In good documents, structure and appearance are related.
XML
• XML is a syntax to encode information as documents.
• XML is not really a language since it has no vocabulary.
• You can use any vocabulary you like.
• XML is used to format records and to provide markup in a document.
XML nodes
• XML is written in the form of nodes. I will only discuss three types of nodes here
– character data
– XML elements
– attributes to elements
• Character data is just that: characters.
XML elements
• If you write an element, write something of the form <name>contents</name>. Here name is the name of the element and contents is the contents of the element.
• The contents can be character data and/or other elements.
XML tags
• <name> is the start tag of an element that is called name.
• </name> is the end tag of an element that is called name.
• XML tags are a syntactic feature of XML. They are not nodes.
empty elements
• If an element has no contents whatsoever, it can be written as <foo></foo> or <foo/>.
• Both forms denote an empty element. The latter uses a single empty-element tag.
element examples
<name>Thomas Krichel</name>
<foo><bar>hello world</bar></foo>
<name>Mr. <first>Thomas</first> <last>Krichel</last></name>
<thomaskrichel/>
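To see how a parser reads such elements, here is a small sketch using Python's standard xml.etree.ElementTree module. It takes the mixed-content example from the slide and the two spellings of an empty element. The variable names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# An element whose contents mix character data and other elements.
name = ET.fromstring("<name>Mr. <first>Thomas</first> <last>Krichel</last></name>")

print(name.tag)   # the name of the element
print(name.text)  # the character data before the first contained element

# The elements in the contents, in document order.
children = [(child.tag, child.text) for child in name]
print(children)

# Two spellings of an element without contents.
empty_a = ET.fromstring("<foo></foo>")
empty_b = ET.fromstring("<foo/>")

# The parser sees no difference: no character data, no contained elements.
print(empty_a.text, len(empty_a))
print(empty_b.text, len(empty_b))
```

Note that the tags themselves do not show up in the parsed result. Only the elements and their character data do.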
child elements
• If an element is in the contents of another element, it is called a child element.
• When you write an XML document, all elements must be children of one single element. That single element is called the root element. The root element is the only element without a parent element.
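The single root rule is something a parser enforces. The sketch below, again with Python's xml.etree.ElementTree, checks two made-up documents, one with a single root and one with two top-level elements. It also builds a child-to-parent table to show that the root is the only element without a parent. The helper name is_well_formed is my own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


one_root = "<foo><bar>hello world</bar></foo>"
two_roots = "<foo>hello</foo><bar>world</bar>"

print(is_well_formed(one_root))
print(is_well_formed(two_roots))

# ElementTree keeps no parent pointers, so build a child -> parent table.
root = ET.fromstring(one_root)
parent_of = {child: parent for parent in root.iter() for child in parent}

# Every element except the root appears as a child of some element.
print(root in parent_of)
print(parent_of[root.find("bar")].tag)
```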
attributes
• Attributes attach name=value pairs to an element.
• Values are enclosed in single or double quotes. Double quotes are more common.
• These name=value pairs are written in the start tag. They are not written in the end tag.
attribute examples
<name type="full">Thomas Krichel</name>
<name string="Thomas Krichel"/>
<name type="reverse">Krichel, Thomas</name>
more on attributes
• Attribute names and values are strings.
• Attribute names are separated from values by the = sign.
• Attribute names and values may be surrounded by whitespace, that is, whitespace is allowed on either side of the = sign.
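These points about attributes can be tried out with a parser. The following sketch uses Python's xml.etree.ElementTree on the attribute examples, plus two variants I made up: one with single quotes and whitespace around the = sign, and one that wrongly repeats the attribute in the end tag.

```python
import xml.etree.ElementTree as ET

full = ET.fromstring('<name type="full">Thomas Krichel</name>')
empty = ET.fromstring('<name string="Thomas Krichel"/>')

# Single quotes and whitespace around the = sign give the same result.
spaced = ET.fromstring("<name type = 'full'>Thomas Krichel</name>")

print(full.attrib)           # attributes as a name -> value mapping
print(spaced.attrib)
print(empty.get("string"))   # the value of one attribute
print(empty.text)            # an empty element has no character data

# An attribute in the end tag is not allowed, so the parser rejects it.
try:
    ET.fromstring('<name>Thomas Krichel</name type="full">')
    end_tag_attribute_ok = True
except ET.ParseError:
    end_tag_attribute_ok = False
print(end_tag_attribute_ok)
```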
XML application examples
• HTML is the language used to encode a specific type of document known as a web page.
• It has a vocabulary of element names and attribute names.
• HTML is written in XML syntax or a syntax that is close to it.
example HTML element <a>
• The <a> element creates an anchor. This is a part of the document that leads to another.
• Where it leads to is given by an attribute called href.
• Example
<a href="http://openlib.org/home/krichel"> Thomas Krichel</a>
example HTML element <img/>
• The HTML element <img/> requests an image to be included in the web page.
<img src="http://openlib.org/home/krichel/ToK.gif" alt="picture of Thomas Krichel"/>
• Note that this element is empty.
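Since HTML has a fixed vocabulary, software can look for specific element and attribute names. As a rough sketch, the class below uses Python's standard html.parser module to collect the href of every <a> element and the src and alt of every <img/> element. The class name LinkAndImageCollector is made up, and the page text simply joins the two examples from the slides.

```python
from html.parser import HTMLParser


class LinkAndImageCollector(HTMLParser):
    """Collect where anchors lead to and which images are requested."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        attributes = dict(attrs)
        if tag == "a":
            self.links.append(attributes.get("href"))
        elif tag == "img":
            self.images.append((attributes.get("src"), attributes.get("alt")))


page = (
    '<a href="http://openlib.org/home/krichel"> Thomas Krichel</a> '
    '<img src="http://openlib.org/home/krichel/ToK.gif" '
    'alt="picture of Thomas Krichel"/>'
)

collector = LinkAndImageCollector()
collector.feed(page)
collector.close()

print(collector.links)
print(collector.images)
```

The empty <img/> element is reported through the same start tag handler, because the parser's default handling of an empty-element tag calls the start tag handler and then the end tag handler.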
MARC XML
• To increase the interoperability of MARC, a mapping of the MARC format into the XML syntax has been defined.
• Not everybody thinks it is a good idea. http://serials.infomotions.com/ngc4lib/archive/2009/200909/1450.html
• A shamelessly copied example is at http://wotan.liu.edu/home/krichel/courses/lis512/external_doc/sandburg.xml
start of the example
<collection>
<record>
<leader>01142cam 2200301 a 4500</leader>
<controlfield tag="001"> 92005291 </controlfield>
<controlfield tag="003">DLC</controlfield>
<controlfield tag="005">19930521155141.9</controlfield>
<controlfield tag="008">920219s1993 caua j 000 0 eng </controlfield>
<datafield tag="010" ind1=" " ind2=" "><subfield code="a"> 92005291 </subfield></datafield>
end of the example
<datafield tag="650" ind1=" " ind2="1"><subfield code="a">Visual perception.</subfield></datafield>
<datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rand, Ted,</subfield><subfield code="e">ill.</subfield></datafield>
</record>
</collection>
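Once MARC is in XML syntax, a generic XML parser can read it. The sketch below joins a few lines from the start and the end of the example into one small document and reads it with Python's xml.etree.ElementTree. It assumes, as in the slide, that the elements carry no XML namespace. MARCXML files from other sources often declare the namespace http://www.loc.gov/MARC21/slim, in which case the element names in the lookups would need that namespace.

```python
import xml.etree.ElementTree as ET

# A shortened version of the slide example; the middle of the record is left out.
marcxml = """<collection><record>
<leader>01142cam 2200301 a 4500</leader>
<controlfield tag="001"> 92005291 </controlfield>
<controlfield tag="003">DLC</controlfield>
<datafield tag="010" ind1=" " ind2=" "><subfield code="a"> 92005291 </subfield></datafield>
<datafield tag="650" ind1=" " ind2="1"><subfield code="a">Visual perception.</subfield></datafield>
<datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rand, Ted,</subfield><subfield code="e">ill.</subfield></datafield>
</record></collection>"""

root = ET.fromstring(marcxml)
records = root.findall("record")

control = {}
fields = []
for record in records:
    print("leader:", record.findtext("leader"))
    # Control fields only have a tag and a value.
    for cf in record.findall("controlfield"):
        control[cf.get("tag")] = cf.text
    # Data fields have a tag, two indicators and coded subfields.
    for df in record.findall("datafield"):
        subfields = [(sf.get("code"), sf.text) for sf in df.findall("subfield")]
        fields.append((df.get("tag"), df.get("ind1"), df.get("ind2"), subfields))

print(control)
for field in fields:
    print(field)
```

The MARC features are all still there: the leader, the field tags, the two indicators and the subfield codes. Tags, indicators and codes have become attributes, and the field values have become character data.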
comments on example
• In an XML document, there must be one element that all other elements are children of.
• In this case this is the <collection> element.
• The <collection> can contain many <record> elements. In the example, there is just one.
• Find the features of MARC as set out in the description of MARC.