Post on 25-Dec-2015
Introduction to XML
Data Management Issues
Types of data
Structured
Semi-structured
Structured Data
Data is organized in entities (tables)
Entities have attributes
Current Database World
– Structure: Relational Database Management System (RDBMS): everything is a table
– Software: MS Access, Oracle, ...
Example of a table (patients)
Example of a group of tables
MS Access Table Links
World of Web Data
– Easy document exchange
– Unstructured (or poorly structured) data: everything is a document
– No standard for query languages
World of Web Data: Example
– An organization A publishes financial data on its web pages (HTML), generated from a DBMS.
– A second organization B wants some financial analyses; it can access only the web data.
[Figure: A (RDBMS), HTML, B]
Semi-structured Data
Data can be of any type
Not necessarily following any format
Does not follow any rules
Examples include:
– text
– video
– sound
– images
Characteristics of Semi-Structured Data
Structure is irregular: missing or additional attributes
Parts of data lack structure, e.g., images
Some may yield little structure, e.g., plain text
Semi-structured Data Definition
Data that is inherently self-describing and does not conform to an explicit and fixed rule is known as semi-structured data
Data structure is contained within the data itself
Example of Semi-Structured Data
name: Peter Wood
email: ptw@dcs.bbk.ac.uk, p.wood@bbk.ac.uk
----------------------------------------
name:
• first name: Mark
• last name: Levene
email: mark@dcs.bbk.ac.uk
----------------------------------------
name: Alex Smith
affiliation: StFX
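The irregularity in these three records can be sketched in Python. This is only an illustration: the field names come from the slide, while the helper functions `display_name` and `fields_used` are hypothetical names introduced here.

```python
# Three records mirroring the slide; each one has a different shape.
people = [
    {"name": "Peter Wood",
     "email": ["ptw@dcs.bbk.ac.uk", "p.wood@bbk.ac.uk"]},
    {"name": {"first name": "Mark", "last name": "Levene"},
     "email": "mark@dcs.bbk.ac.uk"},
    {"name": "Alex Smith", "affiliation": "StFX"},
]


def display_name(record):
    """Return a printable name whether 'name' is a string or a nested record."""
    name = record["name"]
    if isinstance(name, dict):
        return f"{name['first name']} {name['last name']}"
    return name


def fields_used(records):
    """Collect every top-level field that appears in at least one record."""
    return sorted({key for record in records for key in record})


names = [display_name(p) for p in people]
all_fields = fields_used(people)
print(names)
print(all_fields)
```

No single table schema fits all three records without empty or repeated columns, which is what makes the data semi-structured.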
IMDB – A Motivating Example
The Internet Movie Database is a classic example of a collection of semi-structured data
Although the information pertaining to different movies may be essentially similar, their structure may be different!
Let us consider an example movie database
An Example Movie Database
IMDB – Irregularity in Structure
• Different layout for movies and TV series
• Movie entries show Director, Writers and Stars
• TV entries show just Creators & Stars
Captain Phillips (Movie)
Lost (TV Series)
XML – An Embodiment of Semi-structured Data
XML can be used to represent semi-structured data.
What is XML?
XML stands for EXtensible Markup Language
XML is a markup language much like HTML (tags)
XML was designed to describe data
XML tags are not predefined. You must define your own tags
The main difference between XML and HTML
XML and HTML were designed with different goals:
XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
It is important to understand that XML is not a replacement for HTML.
XML does not DO anything
Maybe it is a little hard to understand, but XML DOES NOT DO ANYTHING. XML was created to structure, store and send information.
The note below has a header and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags. Someone must write a piece of software to send, receive or display it.
<note>
<to>John</to>
<from>Mary</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
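As a rough illustration of "someone must write a piece of software", here is a minimal Python sketch that reads the note with the standard library's `xml.etree.ElementTree` and turns it into a line of text. The function name `render_note` and the output layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
<to>John</to>
<from>Mary</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def render_note(xml_text):
    """Turn the note document into a one-line, human-readable message."""
    note = ET.fromstring(xml_text)
    return (f"{note.findtext('heading')}: {note.findtext('from')} -> "
            f"{note.findtext('to')}: {note.findtext('body')}")


message = render_note(NOTE_XML)
print(message)
```

The XML itself only carries the data; the display logic lives entirely in the program.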
XML is free and extensible
XML tags are not predefined. You must "invent" your own tags.
The tags used to mark up HTML documents and the structure of HTML documents are predefined (like <b>, <i>, <h1>, etc.).
XML allows authors to define their own tags and their own document structure.
The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.
XML is used to Exchange Data
With XML, data can be exchanged between incompatible systems.
In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet.
Since XML data is stored in plain text format, XML provides a software- and hardware-independent way of sharing data.
XML can be used to Create new Languages
XML is the mother of WML (the Wireless Markup Language), the markup language used with WAP (the Wireless Application Protocol).
WML is used to mark up Internet applications for handheld devices like mobile phones.
XML and Microsoft Office
Starting with Office 2007, Microsoft changed the format of all Office documents.
They are all saved in XML format.
So a Word file is a ZIP archive holding a number of files, including the text in XML format.
Advantages:
– Small file size
– Compatibility with other software
Older Word files have the extension DOC, new ones use DOCX
XML Syntax
The syntax rules of XML are very simple and very strict. The rules are very easy to learn, and very easy to use.
Because of this, creating software that can read and manipulate XML is very easy to do.
All XML elements must have a closing tag
Elements or tags are the basic building blocks of any XML document
With XML, it is illegal to omit the closing tag.
In HTML some elements do not have to have a closing tag. The following code is legal in HTML:
<p>This is a paragraph
In XML all elements must have a closing tag, like this:
<par>This is a paragraph</par>
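A small sketch of how a parser enforces this rule, using Python's standard `xml.etree.ElementTree`. The helper name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


closed = is_well_formed("<par>This is a paragraph</par>")
unclosed = is_well_formed("<par>This is a paragraph")
print(closed, unclosed)
```

The element without a closing tag is rejected outright; the parser makes no attempt to guess where it should have ended.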
XML tags are case sensitive
Unlike HTML, XML tags are case sensitive.
With XML, the tag <Letter> is different from the tag <letter>.
Opening and closing tags must therefore be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
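The two lines above can be checked with a short Python sketch based on the standard `xml.etree.ElementTree` parser. The helper `parse_error` is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return the parser's error message, or None if the text parses cleanly."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


mismatched = parse_error("<Message>This is incorrect</message>")
matched = parse_error("<message>This is correct</message>")

# <Letter> and <letter> are kept as two distinct element names.
letter_tags = {ET.fromstring("<Letter/>").tag, ET.fromstring("<letter/>").tag}

print(mismatched)
print(matched)
print(sorted(letter_tags))
```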
All XML elements must be properly nested
Improper nesting of tags makes no sense to XML.
In HTML some elements can be improperly nested within each other like this:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other like this:
<bold><italic>This text is bold and italic</italic></bold>
All XML documents must have a root element (tag)
All XML documents must contain a single tag pair to define a root element.
All other elements must be within this root element.
All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
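The parent/child structure can be walked programmatically. The following Python sketch (standard library only; the generator name `outline` is an assumption for this example) lists each element with its nesting depth and also shows that a second top-level element is rejected.

```python
import xml.etree.ElementTree as ET

DOC = "<root><child><subchild>.....</subchild></child></root>"


def outline(element, depth=0):
    """Yield (depth, tag) pairs for an element and all of its descendants."""
    yield depth, element.tag
    for child in element:
        yield from outline(child, depth + 1)


tree_outline = list(outline(ET.fromstring(DOC)))
for depth, tag in tree_outline:
    print("  " * depth + tag)

# A second top-level element violates the single-root rule.
try:
    ET.fromstring("<root></root><other></other>")
    two_roots_accepted = True
except ET.ParseError:
    two_roots_accepted = False
print(two_roots_accepted)
```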
With XML, white space is preserved
With XML, the white space in your document is not truncated.
This is unlike HTML. With HTML, a sentence like this:
Hello           my name is John,
will be displayed like this:
Hello my name is John,
because HTML collapses consecutive white space into a single space.
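A minimal Python sketch of the difference. The element name `greeting` is invented for this example, and the "HTML-like" collapsing is only imitated with string operations, not performed by a browser.

```python
import xml.etree.ElementTree as ET

GAP = " " * 11  # a run of spaces placed inside the sentence
element = ET.fromstring(f"<greeting>Hello{GAP}my name is John,</greeting>")

preserved = element.text                    # the XML parser keeps every space
collapsed = " ".join(element.text.split())  # roughly what an HTML page shows
print(repr(preserved))
print(repr(collapsed))
```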
Element Naming
XML elements must follow these naming rules:
Names can contain letters, numbers, and other characters
Names must not start with a number or punctuation character
Names must not start with the letters xml (or XML or Xml ..)
Names cannot contain spaces
Any name can be used, no words are reserved, but the idea is to make names descriptive
XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.
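The naming rules can be approximated with a short Python check. This is a sketch under assumptions: the function name `is_acceptable_name` and the sample names are invented here, the "xml" prefix is tested explicitly because a parser may still accept such names, and the remaining rules are delegated to the standard `xml.etree.ElementTree` parser.

```python
import xml.etree.ElementTree as ET


def is_acceptable_name(name):
    """Check a candidate element name against the naming rules on the slide."""
    # Names starting with "xml" in any case are reserved; parsers may
    # still accept them, so this rule is checked explicitly.
    if name.lower().startswith("xml"):
        return False
    # Let the parser judge the rest: spaces, leading digits, punctuation.
    try:
        return ET.fromstring(f"<{name}/>").tag == name
    except ET.ParseError:
        return False


candidates = ["patient", "first_name", "2nd", "first name", "XmlData", "-id"]
results = {name: is_acceptable_name(name) for name in candidates}
for name, ok in results.items():
    print(f"{name!r}: {ok}")
```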
Comments in XML
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
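A brief Python sketch showing that a comment is not part of the data: with the default settings of the standard `xml.etree.ElementTree` parser, the comment does not appear among the parsed elements. The sample document is made up for this example.

```python
import xml.etree.ElementTree as ET

DOC = """<note>
<!-- This is a comment -->
<to>John</to>
</note>"""

note = ET.fromstring(DOC)
child_tags = [child.tag for child in note]  # the comment is skipped
print(child_tags)
```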
Errors in XML will stop the XML program
The World Wide Web Consortium (W3C) XML specification states that a program should not continue to process an XML document if it finds a well-formedness error. The reason is that XML software should be easy to write, and that all XML documents should be compatible.
With HTML it was possible to create documents with lots of errors (like when you forget an end tag). One of the main reasons that HTML browsers are so big and incompatible is that they have their own ways to figure out what a document should look like when they encounter an HTML error.
With XML this should not be possible.
XML and Web Browsers
Internet Explorer 5.0+, Google Chrome & Firefox support XML
Viewing XML Files
If you open an XML document in IE (or other browsers), it will display the document with color-coded root and child elements. A plus (+) or minus sign (-) to the left of the elements can be clicked to expand or collapse the element structure.
If you want to view the raw XML source, you must select "View Source" from the browser menu.
If an erroneous XML file is opened, the browser will report the error.
Other Examples
Viewing some XML documents will help you get the XML feeling.
An XML CD catalog: this is a CD collection, stored as XML data.
An XML plant catalog: this is a plant catalog from a plant shop, stored as XML data.
A Simple Food Menu: this is a breakfast food menu from a restaurant, stored as XML data.
Why does XML display like this?
XML documents do not carry information about how to display the data.
Since XML tags are "invented" by the author of the XML document, browsers do not know if a tag like <table> describes an HTML table or a dining table.
Without any information about how to display the data, most browsers will just display the XML document as it is.
The XML Rules (Summary)
1. Single, unique root element
2. Matching open/close tags
3. Consistent capitalisation
4. Correctly nested elements
5. Tag naming
<?xml version="1.0"?>
<company id="4859">
<name>3Months.com</name>
<type>Web Development</type>
<address>
<street>Wakefield st</street>
<city>Wellington</city>
<country>New Zealand</country>
</address>
</company>
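A short Python sketch of reading this document with the standard `xml.etree.ElementTree` module, assuming the attribute values are written with straight quotes. It reads the `id` attribute from the root element and follows a path into the nested `address` element.

```python
import xml.etree.ElementTree as ET

COMPANY_XML = """<?xml version="1.0"?>
<company id="4859">
<name>3Months.com</name>
<type>Web Development</type>
<address>
<street>Wakefield st</street>
<city>Wellington</city>
<country>New Zealand</country>
</address>
</company>"""

company = ET.fromstring(COMPANY_XML)
company_id = company.get("id")            # attribute on the root element
city = company.findtext("address/city")   # path through a nested element
address = {part.tag: part.text for part in company.find("address")}

print(company_id, city)
print(address)
```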
Authoring XML Documents
A basic XML document is an XML element that can, but might not, include nested XML elements.
Example:
<books>
  <book>
    <title> Second Chance </title>
    <author> Matthew Dunn </author>
    <ISBN> 123456789 </ISBN>
  </book>
</books>
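Besides typing the tags by hand, a document like this can be authored from a program. The sketch below uses Python's standard `xml.etree.ElementTree` to build the same structure; it writes the values without the padding spaces shown on the slide.

```python
import xml.etree.ElementTree as ET

books = ET.Element("books")
book = ET.SubElement(books, "book")
for tag, value in [("title", "Second Chance"),
                   ("author", "Matthew Dunn"),
                   ("ISBN", "123456789")]:
    ET.SubElement(book, tag).text = value

# Serialize to a string; the library guarantees matching, nested tags.
xml_text = ET.tostring(books, encoding="unicode")
print(xml_text)
```

Building the tree through an API means closing tags and nesting can never be forgotten.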
Use of XML and HTML together
This is pure data in an XML file
This is a pure format file to display the same data
View the result with Google Chrome or IE 6+
Converting Relational Database to XML
Example: Exporting the following data into XML
Relational Database:
Store (sid, name, phone)
Book (bid, title, authors)
BookStore (sid, bid, price, stock)
[Figure: diagram of Store (sid, name, phone) and Book (bid, title, authors) linked by BookStore (price, stock)]
Converting Relational Database to XML (Cont'd)
XML:
<store>
  <sid> 123 </sid>
  <name> Chapter </name>
  <phone> 429-8976 </phone>
  <book>
    <title> The Da Vinci Code </title>
    <authors> Dan Brown </authors>
    <bid> 987 </bid>
  </book>
  <book> … </book>
  …
</store>
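One possible way to perform such an export, sketched in Python with the standard `sqlite3` and `xml.etree.ElementTree` modules. This is an assumption-laden illustration, not the method used in the slides: the in-memory database, the function `store_to_xml`, and the price and stock values are invented here, while the store and book values come from the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store (sid INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE Book (bid INTEGER PRIMARY KEY, title TEXT, authors TEXT);
    CREATE TABLE BookStore (sid INTEGER, bid INTEGER, price REAL, stock INTEGER);
    INSERT INTO Store VALUES (123, 'Chapter', '429-8976');
    INSERT INTO Book VALUES (987, 'The Da Vinci Code', 'Dan Brown');
    -- price and stock are made-up placeholder values
    INSERT INTO BookStore VALUES (123, 987, 9.99, 5);
""")


def store_to_xml(conn, sid):
    """Build a <store> element with one nested <book> per BookStore row."""
    name, phone = conn.execute(
        "SELECT name, phone FROM Store WHERE sid = ?", (sid,)).fetchone()
    store = ET.Element("store")
    ET.SubElement(store, "sid").text = str(sid)
    ET.SubElement(store, "name").text = name
    ET.SubElement(store, "phone").text = phone
    rows = conn.execute(
        "SELECT b.title, b.authors, b.bid FROM Book b "
        "JOIN BookStore bs ON bs.bid = b.bid WHERE bs.sid = ? ORDER BY b.bid",
        (sid,))
    for title, authors, bid in rows:
        book = ET.SubElement(store, "book")
        ET.SubElement(book, "title").text = title
        ET.SubElement(book, "authors").text = authors
        ET.SubElement(book, "bid").text = str(bid)
    return store


store_xml = ET.tostring(store_to_xml(conn, 123), encoding="unicode")
print(store_xml)
```

The join through BookStore becomes nesting in the XML: each book sold by a store appears as a child of that store's element.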
Examples
Example of database
Example of database converted to XML
XML representation of a sample Movie Database
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<IMDb>
  <Movies>
    <Movie>
      <Title> The Notebook </Title>
      <Actor> Ryan Gosling </Actor>
      <Actor> Rachel McAdams </Actor>
      <Director> Nick Cassavetes </Director>
    </Movie>
    <Movie>
      <Title> 300 </Title>
      <Actor> Gerard Butler </Actor>
      <Actor> Lena Headey </Actor>
      <Director> Zack Snyder </Director>
    </Movie>
  </Movies>
</IMDb>
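Once the movie data is in XML, a program can query it. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `cast_by_title` is an assumption, and the XML declaration is left out of the embedded string for brevity.

```python
import xml.etree.ElementTree as ET

IMDB_XML = """<IMDb>
  <Movies>
    <Movie>
      <Title> The Notebook </Title>
      <Actor> Ryan Gosling </Actor>
      <Actor> Rachel McAdams </Actor>
      <Director> Nick Cassavetes </Director>
    </Movie>
    <Movie>
      <Title> 300 </Title>
      <Actor> Gerard Butler </Actor>
      <Actor> Lena Headey </Actor>
      <Director> Zack Snyder </Director>
    </Movie>
  </Movies>
</IMDb>"""


def cast_by_title(xml_text):
    """Map each movie title to its list of actors, trimming padding spaces."""
    root = ET.fromstring(xml_text)
    return {
        movie.findtext("Title").strip():
            [actor.text.strip() for actor in movie.findall("Actor")]
        for movie in root.iter("Movie")
    }


cast = cast_by_title(IMDB_XML)
for title, actors in cast.items():
    print(title, "->", ", ".join(actors))
```

Because a movie may contain any number of Actor elements, the repeated tag handles the irregularity that a fixed table column could not.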
XML Joke
Question: When should I use XML?
Answer: When you need a buzzword in your resume.