the extensible markup language: an introduction to...
TRANSCRIPT
11
1
Spring 2000
Christophides Vassilis
The eXtensible Markup Language:An Introduction to XML
Documents & Databases
2
Spring 2000
Christophides Vassilis
Preliminary Issues
22
3
Spring 2000
Christophides Vassilis
What is a document?
● Content: the components(words, images etc). which makeup a document
● Structure: the organization andinter-relationship of thecomponents
● Presentation: how a documentlooks and what processes areapplied to it
4
Spring 2000
Christophides Vassilis
Separating these things means...
● The content can be re-used◆for printing◆for querying◆for exchanging
● The structure can be formallyvalidated
● The presentation can becustomized for
◆different media◆different audiences
● … in short, the informationcan be uncoupled from itsprocessing
33
5
Spring 2000
Christophides Vassilis
Documents vs Databases
Document worldDocument world
● plenty of small documents◆ usually static
● implicit structure◆ section, paragraph, toc,
● tagging◆ human friendly
● content◆ form/layout, annotation
● paradigms◆ “Save as”, WYSIWYG
● metadata◆ author name, date, subject
Database worldDatabase world
● a few large databases◆ usually dynamic
● explicit structure◆ types
● records◆ machine friendly
● content◆ data, methods
● paradigms◆ Data Independence, Transaction
Management, Query Languages
● metadata◆ schema description
6
Spring 2000
Christophides Vassilis
DBMS ANSI/SPARC Architecture
PHYSICALSCHEMA
LOGICALSCHEMA
,17(51$//(9(/
&21&(378$//(9(/
(;7(51$//(9(/
VIEW 1 VIEW 2 VIEW 3
INTEGRATION
44
7
Spring 2000
Christophides Vassilis
What to do with them
DocumentsDocuments
● editing
● spell-checking
● counting words
● retrieving (IR)
● printing
DatabaseDatabase
● updating
● cleaning
● querying
● composing/transforming
8
Spring 2000
Christophides Vassilis
Query Languages
Document RetrievalDocument Retrieval
Claude Monet DQG San DiegoMuseum of Art
Database QueryingDatabase Querying
VHOHFW pIURP Artists a, a.artwork pZKHUH a.first = “Claude”���������DQG a.last = “Monet”���������DQG p.located = “San Diego Museum of Art”
55
9
Spring 2000
Christophides Vassilis
The Long Road of Document Standards
Rick Jelliffe 1999
10
Spring 2000
Christophides Vassilis
�%!021(7��&ODXGH�%!�%5!+D\VWDFNV�DW�&KDLOO\�DW�6XQULVH�%5!�����%5!2LO�RQ�FDQYDV�%5!���[����FP���������[��������LQ���%5!6DQ�'LHJR�0XVHXP�RI�$UW��%5!�3!�,0*�65& ³KWWS�����������������DUWFKLYH�P�PRQHW�KD\ULFNV�MSJ´!
What’s Wrong with HTML
● If written properly, normal HTML may reflect document presentation,but it cannot adequately represent the semantics & structure of data
$UWLVW�1DPH
'DWH
$UWLIDFW�7LWOH
'LPHQVLRQV
0DWHULDO
0XVHXP
,PDJH�5HIHUHQFH
66
11
Spring 2000
Christophides Vassilis
HTML Document Presentation vs. …
12
Spring 2000
Christophides Vassilis
… XML Data Representation
● A possible XML markup of the same information will retain thestructure (and the semantics) of the various data objects
<ARTIST> <NAME><FIRST>Claude</FIRST><LAST>Monet</LAST></NAME> <ARTWORK> <ARTIFACT> <TITLE>Haystacks at Chailly at Sunrise</TITLE> <DATE>1865</DATE> <MATERIAL>Oil on canvas</MATERIAL> <DIM Metric=‘cm’> <HEIGHT>30</HEIGHT><WIDTH>60</WIDTH></DIM> <DIM Metric=‘in’> <HEIGHT>11 7/8</HEIGHT><WIDTH>23 3/4</WIDTH></DIM> <LOCATION>San Diego Museum of Art</LOCATION> <IMAGE File=‘http://192.41.13.240/artchive/m/monet/hayricks.jpg’/> </ARTIFACT> </ARTWORK></ARTIST>
77
13
Spring 2000
Christophides Vassilis
XML can be Published as normal Web Data
14
Spring 2000
Christophides Vassilis
What is XML?
● Markup Meta-Language for domain or application specific structureddocumentation
◆ Mathematical, chemical, musical, publishing, etc.
● Developed by the SGML Editorial Board formed under the auspices ofthe World Wide Web Consortium (W3C)
◆ Founded in 1996 by Jon Bosac (Sun) and various Web/SGML vendors:Textuality, Netscape, Microsoft, INSO, HP, Highland, NCSA, ArbortText,GRIF, SoftQuand
● Subset of SGML optimized for use in the Inter/Intranet◆ SGML is proving difficult to implement for Web/Intranet applications
◆ SGML has been hard to cost-justify to management
● Opens the way for a new generation of Web applications◆ Improve precision during searching and retrieval◆ Enable multiple usage of the same data◆ Facilitate distributed processing with more versatile ways to manipulate data
88
15
Spring 2000
Christophides Vassilis
Why XML?
● XML provides key features for a new generation of Web applications:◆Structuring: unlike HTML it preserves the structure of the data◆Extensibility: not a fixed format like HTML but user-oriented tagging◆Validation: provides the means to consuming applications to check
data for structural validity on importation◆Presentation Late Binding: describes data, not visual presentation◆Human Readable: similar to HTML◆Interchange: good for transmission of data from server to browser,
and from application to application, or machine to machine◆Open standard: non proprietary format
● XML becomes an integral part of the Web infrastructure◆Microsoft Explorer (V5.0) already offers XML browsing◆Ongoing XML implementation by Netscape◆Various XML middleware and manipulation tools
16
Spring 2000
Christophides Vassilis
The XML Language Family
● XML (Extensible Markup Language)◆ A subset of SGML (ISO 8879)
designed for easy implementation
● XLink (Extensible Linking Language)◆ A set of standard hypertext
mechanisms based on HyTime(ISO/IEC 10744) and the TextEncoding Initiative (TEI)
● XSL (Extensible Stylesheet Language)◆ A standard stylesheet language for
structured information derived fromDSSSL (ISO/IEC 10179) and key CSSconcepts
N0
50
100
150
200
250
300
350
HTML 0.0 TEI Lite 1.6 IBM ID Doc4.0
Tags
Attribute
∆τ$
99
17
Spring 2000
Christophides Vassilis
Interrelationships Among the Various W3C Efforts
18
Spring 2000
Christophides Vassilis
XML Syntax and Semantics
1010
19
Spring 2000
Christophides Vassilis
An Example of XML Markup
<ARTIST> <NAME> <FIRST>Claude</FIRST> <LAST>Monet</LAST> </NAME> <ARTWORK> <ARTIFACT> <TITLE>Haystacks at Chailly at Sunrise</TITLE> <DATE>1865</DATE> <MATERIAL>Oil on canvas</MATERIAL> <DIM Metric=‘cm’> <HEIGHT>30</HEIGHT><WIDTH>60</WIDTH></DIM> <DIM Metric=‘in’> <HEIGHT>11 7/8</HEIGHT><WIDTH>23 3/4</WIDTH></DIM> <LOCATION>San Diego Museum of Art</LOCATION> <IMAGE File=‘http://192.41.13.240/artchive/m/monet/hayricks.jpg’/> </ARTIFACT> </ARTWORK></ARTIST>
(OHPHQW�1DPH (OHPHQW�&RQWHQW
(PSW\�(OHPHQW
$WWULEXWH�9DOXH$WWULEXWH�1DPH
20
Spring 2000
Christophides Vassilis
The Logical Tree Structure of XML
ARTIST
NAME ARTWORK
FIRST LAST ARTIFACT
TITLE DATE
MATERIAL
DIM IMAGEDIM
LOCATION
���KD\ULFNV�MSJ
&ODXGH 021(7
+D\VWDFNV ����
2LO�RQ�FDQYDV
6DQ'LHJR0XV�
�� �� �����
�����
H W H W
1111
21
Spring 2000
Christophides Vassilis
XML Document Type Definitions
��'2&7<3(�DUWLVW�>��(/(0(17�DUWLVW��QDPH��ERUQ��GHDWK��DUWZRUN��QDWLRQDOLW\"��LQIOXHQFHV�!��$77/,67�DUWLVW�RLG�,'��5(48,5('�[PO�ODQJ�1072.(1��,03/,('!��(/(0(17�QDPH��ILUVW��ODVW�!��(/(0(17�ILUVW���3&'$7$�!��(/(0(17�ODVW���3&'$7$�!������(/(0(17�DUWZRUN��DUWLIDFW��!��(/(0(17�DUWIDFW��WLWOH��GDWH��PDWHULDO��GLP ��ORFDWLRQ��LPDJH�!��(/(0(17�WLWOH���3&'$7$�!������(/(0(17�GLP��KHLJKW��ZLGWK�!��$77/,67�GLP�PHWULF��FP_�LQ��µFP¶!��(/(0(17�ORFDWLRQ���3&'$7$�!��(/(0(17�LPDJH�(037<!��$77/,67�LPDJH�ILOH�(17,7<��5(48,5('!��(/(0(17�LQIOXHQFHV��3&'$7$�_�DUHI� !��(/(0(17�UHI�(037<!��$77/,67�DUHI�[PO�OLQN�&'$7$��),;('�µVLPSOH¶�KUHI�&'$7$��5(48,5('!��127$7,21�MSHJ�38%/,&�³���ORFDO��127$7,21�*SHJ�,PDJHV��(1´!��(17,7<�ILJ��6<67(0�µ«��PRQHW�KD\ULFNV�MSJ¶�1'$7$�MSHJ!@!
22
Spring 2000
Christophides Vassilis
XML Core Markup Features
● Elements: Components of the tree logical structure defined by a DTD◆identified in a document instance by descriptive markup, usually a
start-tag and end-tag● Attributes: Characteristics associated to the elements (other than
their content and type)◆ may be applied to one specific instance of a given element
● Entities: Named fragments of information that can be storedseparately from a document (or a DTD)
◆can be included in the document (or the DTD) one or more timesby reference to their names
1212
23
Spring 2000
Christophides Vassilis
Definition of Element’s Content
EMPTY��(/(0(17�LPDJH�(037<!
end-tag is omittedimplicit content or generated
automatically by an application
ANY��(/(0(17�REMHFW�$1<!
element content can consist of a freemixture of parsed character data and
any of the elements in the DTDMIXED TEXT
��(/(0(17�WLWOH���3&'$7$�!element content can contain
characters, entity references andtags allowed by the element model
GROUP��(/(0(17�PHGLD��LPDJH_YLGHR�!��(/(0(17�QDPH��ILUVW�ODVW�!
element content with connectors (‘,’ ‘|’ )and occurrence indicators (‘+’ ‘?’ ‘*’)
● Mixed models must be optional repeatable OR-groups, with #PCDATAfirst
24
Spring 2000
Christophides Vassilis
What XML can express?
● Sequence « , »
��(/(0(17�QDPH��ILUVW�ODVW�!
● Choice « | »
��(/(0(17�PHGLD��LPDJH�_�YLGHR�!
● Option ( 1 or 0 ) « ? »
��(/(0(17�DUWLVW��«��QDWLRQDOLW\"��«�
● Repetition (1 or more ) « + »
��(/(0(17�DUWZRUN��DUWLIDFW��!
● Option and Repetition ( 0, 1 or more ) « * »��(/(0(17�DUWIDFW�������GLP ������!
1313
25
Spring 2000
Christophides Vassilis
XML Content Models and Regular Expressions
● Each element content model is defined by a regular expression
◆ Example: QDPH��DGGU ��HPDLO
● Each regular expression determines a corresponding finite state
automaton
●This suggests a simple parsing program
● Content Models should be defined by unambiguous regular expressions
QDPH
DGGU
HPDLO
26
Spring 2000
Christophides Vassilis
XML Regular Expressions: Another Example
● Adding in the optional greet further complicates things◆Example: name,address*,(tel | fax)*,email*
QDPH
DGGUHVV
WHOWHO
ID[
ID[
HPDLO
HPDLO
HPDLO
1414
27
Spring 2000
Christophides Vassilis
Definition of Attribut’s Content
(value1 | value2 | .. |valuen)value enumeration
��$77/,67�GLP�PHWULF��FP�_�LQ�!�GLP�PHWULF� �µLQ¶�!
CDATAstring of valid XML character data
��$77/,67�UHI�KUHI�&'$7$�!�UHI�KUHI �µKWWS�����������������
DUWFKLYH�P�PRQHW�[PO¶!ID
symbolic identifier of an element��$77/,67�DUWLVW�RLG�,'!�DUWLVW�RLG� �0RQHW!
IDREF(S)reference (or list of) to element’s
identifiers
��$77/,67�DUWLIDFW�FUHDWRU�,'5()!�DUWLIDFW�FUHDWRU� �0RQHW!
ENTITY(IES)reference (or list of) to entity names
��$77/,67�LPDJH�ILOH�(17,7<!�LPDJH�ILOH� �ILJ�!
NMTOKEN(S)valid XML string preceded by
numbers, hyphen, period
��$77/,67�DUWLVW�[PO�ODQJ�1072.(1!�DUWLVW�[PO�ODQJ� �³BHQJOLVK´!
● More types (e.g., DATE) may soon be part of the standard
28
Spring 2000
Christophides Vassilis
Attribute Default Values
● Value µYL�¶◆a given value from an enumeration of values
● �),;(' value◆the value is the only possible instance for the attribute
● �5(48,5('◆the value must be supplied
● �,03/,('◆the value can be optionally supplied
1515
29
Spring 2000
Christophides Vassilis
XML Entities
● Entities allow the definition of short strings to stand for more complexinformation, which can reside inside or outside the document or its DTD
● Used for substitutions of data or markup:
◆DTD level e.g., markup declaration (Parameter entity)
◆Document level e.g., data and markup instances (General entity)
● Used for references to external data or markup sources:
◆the content of the entity can be found using an XML system-specificstorage location (Specific entity)
◆the content of the entity can be found by mapping a public identifierto a system-specific storage location (Public entity)
30
Spring 2000
Christophides Vassilis
XML Parameter Entities
● Parameter entities are used for extensible declarations (e.g., macros)of complex content models or attributes in a DTD
��(17,7<���VW\OH�³LPSUHVVLRQLVP�_�FXELVP�_�VXUUHDOLVP´!● Parameter entities can be nested
��(17,7<���ELEHOHP��³�ELEHOHP��_�H[SUHVVLRQLVP�_�GDGD´!
but we must avoid infinite loops
��(17,7<���ELEHOHP�³�ELEHOHP��_�H[SUHVVLRQLVP�_�GDGD´!● Replacement entity text can be found outside the DTD
��(17,7<���,62ODW��38%/,&�³,62������������(17,7,(6$GGHG�/DWLQ����(1´!
1616
31
Spring 2000
Christophides Vassilis
XML General Entities
● General entities are used for substitution of textual or not textualobjects (e.g., constants) occurring many times or are volatiles in thedocument instances
��(17,7<�[PO�³([WHQVLEOH�0DUNXS�/DQJXDJH´!● Replacement text of general entities can contain tags, character
references or other entities
��(17,7<�ZZZ�³:�&�5HFRPPHQGDWLRQ����)HEUXDU\�����´!��(17,7<�[PO�³�7,7/(!([WHQVLEOH�0DUNXS�/DQJXDJH�ZZZ���7,7/(!´!
but also we must avoid infinite loops● The content of a general entity can be found outside the DTD and it
may have a particular format
32
Spring 2000
Christophides Vassilis
XML Specific Entities
● Specific entities can be viewed as “abstract storage objects” (e.g., datastream) that are mapped onto real ones by using a system-specificstorage location
● Sub-documents encoded in XML with a different DTD
��(17,7<�ELRJUDSK\�6<67(0�³«��PRQHW�[PO´!
● Textual data encoded with a particular format
��(17,7<�ELEOLRJUDSK\�6<67(0�³«��PRQHW�ELE´!
● Non-SGML data
��(17,7<�ILJ��6<67(0�³«��PRQHW�KD\ULFNV�MSJ´�1'$7$�MSHJ!
1717
33
Spring 2000
Christophides Vassilis
The Main XML Components
34
Spring 2000
Christophides Vassilis
Well-Formed XML
● A textual object is said to be a well-formed XML document if it meetsall the well-formedness constraints (WFCs) of the XML syntax:
◆tags (etc.) are syntactically correct◆every tag has an end-tag◆tags are properly nested◆there exists a root
● By definition if a document is not well-formed, it is not XML◆This means that there is no an XML document which is not well-formed, and XML processors are not required to do anything withsuch documents
1818
35
Spring 2000
Christophides Vassilis
Valid XML
● A well-formed document is valid only if it contains a proper DTD and ifthe document obeys the constraints of that DTD and therefore theXML Validity Constraints (VCs)
◆only declared tags are used◆all tag occurrences conform to specified content models
● Examples:◆The following XML Document is well-formed but not valid
�DUWLVW!�021(7��&ODXGH���DUWLVW!◆The following XML Document is not even well-formed
�ILUVW!&ODXGH��ILUVW!�ODVW!021(7��ODVW!
36
Spring 2000
Christophides Vassilis
When do we need a DTD?
● At document preparation time (definitely)◆validation, checking, consistency
● At document processing time (probably)◆simplifies generic/specific processing◆may clarify intended semantics
● At document delivery time (possibly)◆strictly unnecessary for well-formed docs◆but reduces processing effort
Creation Composition
Validation
Usage
1919
37
Spring 2000
Christophides Vassilis
Where is the behaviour of XML defined?
● In a stylesheet
◆using XSL or CSS
● Possibly embedded in a programapplet, or script, or JAVA bean
◆defined for that particular DTD, setof tags, or tag
● By reference to pre-existing mutualagreement amongst user communities
◆aka “namespaces”
● By reference to a Document ObjectModel
;0/ ;6/
38
Spring 2000
Christophides Vassilis
type-checkingconstantsmacrosvoid*void*header file#ifdefstandard librarynamespace
validationentity referenceentity parameterANYIDREFDTDconditional sectionkey entitiesnamespace
Comparing XML and Programming Languages
●�But no type inference, polymorphism, modules, etc.
2020
39
Spring 2000
Christophides Vassilis
XML DTDs vs. Database Schemas
● By database standards, DTDs are rather weak specifications◆Only one base type i.e., PCDATA◆Only two element constructors i.e., sequence and alternative◆No useful “abstractions” e.g., bulk types, inheritance◆IDREFs are untyped
• You point to something, but you don’t know what!
◆No integrity constraints e.g., FKLOG is inverse of SDUHQW◆No methods◆Tag definitions are global
● Recent XML extensions impose something like a schema or type onan XML data (XML Schema)
40
Spring 2000
Christophides Vassilis
XML vs. ODMG ODL: Example
FODVV�0RYLH����H[WHQW�0RYLHV��NH\�WLWOH����^�DWWULEXWH�VWULQJ�WLWOH��DWWULEXWH�VWULQJ�GLUHFWRU��UHODWLRQVKLS�VHW�$FWRU!�FDVWV����������LQYHUVH�$FWRU��DFWHGB,Q��DWWULEXWH�LQW�EXGJHW�`��
FODVV�$FWRU����H[WHQW�$FWRUV��NH\�QDPH����^��DWWULEXWH�VWULQJ�QDPH���UHODWLRQVKLS�VHW�0RYLH!�DFWHGB,Q����������������������LQYHUVH�0RYLH��FDVWV���DWWULEXWH�LQW�DJH���DWWULEXWH�VHW�VWULQJ!�GLUHFWHG�`��
2121
41
Spring 2000
Christophides Vassilis
XML vs. ODMG ODL: Example
�GE!
����PRYLH�LG ³P�´!
�������WLWOH!:DNLQJ�1HG�'LYLQH��WLWOH!
�������GLUHFWRU!.LUN�-RQHV�,,,��GLUHFWRU!
�������FDVW�LGUHIV ³D��D�´��!
�������EXGJHW!���������EXGJHW!������
�����PRYLH!
����PRYLH�LG ³P�´!
�������WLWOH!'UDJRQKHDUW��WLWOH!
�������GLUHFWRU!5RE�&RKHQ��GLUHFWRU!
�������FDVW�LGUHIV ³D��D��D��´�!
�������EXGJHW!���������EXGJHW!������
�����PRYLH!
����PRYLH�LG ³P�´!
�������WLWOH!0RRQGDQFH��WLWOH!
�������GLUHFWRU!'DJPDU�+LUW]��GLUHFWRU!
�������FDVW�LGUHIV ³D��D�´�!
�������EXGJHW!��������EXGJHW!������
�����PRYLH!
����DFWRU�LG ³D�´!�������QDPH!'DYLG�.HOO\��QDPH!�������DFWHGB,Q�LGUHIV ³P��P��P��´�!�����DFWRU!����DFWRU�LG ³D�´!��������QDPH!6HDQ�&RQQHU\��QDPH!��������DFWHGB,Q�LGUHIV ³P��P��P��´�!��������DJH!����DJH!�����DFWRU!����DFWRU�LG ³D�´!���������QDPH!,DQ�%DQQHQ��QDPH!���������DFWHGB,Q�LGUHIV ³P��P��´�!�����DFWRU!�������GE!
42
Spring 2000
Christophides Vassilis
XML vs. ODMG ODL: Example
��'2&7<3(�GE�>�����(/(0(17� GE������� �PRYLH���DFWRU��!�����(/(0(17� PRYLH�� �WLWOH��GLUHFWRU��FDVW��EXGJHW�!�����$77/,67�� PRYLH���LG���,'����5(48,5('!�����(/(0(17� WLWOH��������3&'$7$�!�����(/(0(17� GLUHFWRU���3&'$7$�!�����(/(0(17� FDVW� (037<!�����$77/,67� FDVW� LGUHIV���,'5()6���5(48,5('!�����(/(0(17� EXGJHW� ��3&'$7$�!�����(/(0(17� DFWRU� �QDPH��DFWHGB,Q��DJH"��GLUHFWHG �!�����$77/,67� DFWRU�� LG�,'��5(48,5('!�����(/(0(17� QDPH� ��3&'$7$�!�����(/(0(17� DFWHGB,Q�(037<!�����$77/,67� DFWHGB,Q�LGUHIV�,'5()6��5(48,5('!�����(/(0(17� DJH���������3&'$7$�!�����(/(0(17� GLUHFWHG���3&'$7$�!�@!
2222
43
Spring 2000
Christophides Vassilis
Mapping Between XML and Objects
44
Spring 2000
Christophides Vassilis
XML vs Relational DBMS
SURMHFWV�WLWOH��������EXGJHW�������PDQDJHG%\
HPSOR\HHV�QDPH�����VVQ�����DJH
��'2&7<3(�GE�>��(/(0(17�GE� ���������SURMHFWV��HPSOR\HHV�!��(/(0(17�SURMHFWV������SURMHFW �!��(/(0(17�HPSOR\HHV��HPSOR\HH �!��(/(0(17�SURMHFW�������WLWOH��EXGJHW��PDQDJHG%\�!��(/(0(17�HPSOR\HH���QDPH��VVQ��DJH�!���
@!
��'2&7<3(�GE�>��(/(0(17�GE������������SURMHFW�_�HPSOR\HH� !��(/(0(17�SURMHFW������WLWOH��EXGJHW��PDQDJHG%\�!��(/(0(17�HPSOR\HH��QDPH��VVQ��DJH�!���
@!
2323
45
Spring 2000
Christophides Vassilis
Recursive DTDs
�'2&7<3(�JHQHDORJ\�>��(/(0(17�JHQHDORJ\��SHUVRQ �!��(/(0(17�SHUVRQ���QDPH�
���������������������������������GDWH2I%LUWK�������������������������������SHUVRQ��������������������PRWKHU������������������������������SHUVRQ��������������������IDWKHU��!���
@!
● What is the problem with this?
46
Spring 2000
Christophides Vassilis
Recursive DTDs cont’d.
�'2&7<3(�JHQHDORJ\�>��(/(0(17�JHQHDORJ\��SHUVRQ �!��(/(0(17�SHUVRQ���QDPH��������������������������������GDWH2I%LUWK��������������������������������SHUVRQ"���������������������PRWKHU�������������������������������SHUVRQ"���!����������������IDWKHU���
@!
● What is now the problem with this?
2424
47
Spring 2000
Christophides Vassilis
Some Things are Hard to Specify
● Each employee element is to contain name, age and ssn elements insome order
��(/(0(17�HPSOR\HH�����QDPH��DJH��VVQ��_��DJH��VVQ��QDPH��_��VVQ��QDPH��DJH��_�«�!
● Suppose there were many more fields !
48
Spring 2000
Christophides Vassilis
Specifying ,' and ,'5()�Attributes
��'2&7<3(�IDPLO\�>��(/(0(17�IDPLO\����SHUVRQ� !��(/(0(17�SHUVRQ���QDPH�!��(/(0(17�QDPH������3&'$7$�!��$77/,67�SHUVRQ�LG�����������,'���������5(48,5('���������������������������PRWKHU���,'5()����,03/,('���������������������������IDWKHU�����,'5()����,03/,('������������������������FKLOGUHQ��,'5()6���,03/,('!
@!
2525
49
Spring 2000
Christophides Vassilis
Some Conforming XML data
�IDPLO\!�� �SHUVRQ��LG �MDQH���PRWKHU �PDU\��IDWKHU �MRKQ�!����� �QDPH!�-DQH�'RH���QDPH!�� ��SHUVRQ!�� �SHUVRQ�LG �MRKQ��FKLOGUHQ �MDQH�MDFN�!���� �QDPH!�-RKQ�'RH���QDPH!�� ��SHUVRQ!�� �SHUVRQ�LG �PDU\��FKLOGUHQ �MDQH��MDFN�!���� �QDPH!�0DU\�'RH���QDPH!�� ��SHUVRQ!�� �SHUVRQ��LG �MDFN���PRWKHU ´PDU\��IDWKHU �MRKQ�!����� �QDPH!�-DFN�'RH���QDPH!��� ��SHUVRQ!��IDPLO\!
50
Spring 2000
Christophides Vassilis
An Alternative XML DTD Specification
��'2&7<3(�IDPLO\�>��(/(0(17�IDPLO\����SHUVRQ� !��(/(0(17�SHUVRQ���PRWKHU"��IDWKHU"��FKLOGUHQ��QDPH�!��$77/,67�SHUVRQ�LG�,'��5(48,5('!��(/(0(17�QDPH������3&'$7$�!��(/(0(17�PRWKHU�(037<!��$77/,67�PRWKHU�LGUHI�,'5()��5(48,5('!��(/(0(17�IDWKHU�(037<!��$77/,67�IDWKHU�LGUHI�,'5()��5(48,5('!��(/(0(17�FKLOGUHQ�(037<!��$77/,67�FKLOGUHQ�LGUHIV�,'5()6��5(48,5('!@!
2626
51
Spring 2000
Christophides Vassilis
The Revised XML Data
�IDPLO\!�� �SHUVRQ��LG� ��MDQH´!
�QDPH!�-DQH�'RH���QDPH!������� �PRWKHU�LGUHI� ��PDU\´!��PRWKHU!������� �IDWKHU�LGUHI� ��MRKQ�!��IDWKHU!
��SHUVRQ!�� �SHUVRQ�LG� ��MRKQ´!
�QDPH!�-RKQ�'RH���QDPH!�FKLOGUHQ�LGUHIV� ��MDQH�MDFN�!���FKLOGUHQ!
������ ��SHUVRQ!���
��IDPLO\!
52
Spring 2000
Christophides Vassilis
Mapping between XML and Tables
2727
53
Spring 2000
Christophides Vassilis
Bluring the Frontiers between Data & Documents
54
Spring 2000
Christophides Vassilis
Towards XML-enabled DBMS
● Xml-enabled database system◆ Store XML data/documents into the
database server◆ Query and search valid and well-
formed XML◆ Generate XML data from the
database server◆ Add XML capabilities in supporting
database facilities● XML has the potential to impact four
important markets◆ Web integration◆ Web publishing◆ Application integration◆ Electronic commerce
DBMS
Store XML
GenerateXML
Integrate with other facilities
2828
55
Spring 2000
Christophides Vassilis
Storing XML Data
● Enhance XML storage facilities inthe database:◆ Utilities to load XML data into the
database◆ Provide more efficient database
storage (componentized storage,compression, indexing,…)
◆ XML export tools from the server◆ Allow server-to-server replication
of XML data
Database
HTMLHTML
XML
Database
56
Spring 2000
Christophides Vassilis
Querying and Searching XML Data
● Fine-grained access to XMLdocuments
● Search XML data efficiently◆Special SQL queries over
valid + well-formed XML◆Content-based indexing (e.g.
Text indexes) for searchingXML data efficiently
◆Support for XML querylanguages (e.g. XQL) on XMLdata
Database
HTMLHTML
XML
Web
2929
57
Spring 2000
Christophides Vassilis
Generating and Manipulating XML
● Generate XML from the database server
◆ Map ODMG, SQL92, SQL3 and PL/SQLdatatypes to XML
◆ Provide mappings between java, SQLand XML types
● Script XML content from the database
◆ Allow SQL queries to return XML results◆ Provide embedded XML in stored
procedures◆ Java scripting: support embedded XML
in java◆ Common APIs to access any XML
content in databases
DatabaseHTML
XML
Web
58
Spring 2000
Christophides Vassilis
DatabaseX
DatabaseY
XML XML
XMLSortedXMLTotal
3030
59
Spring 2000
Christophides Vassilis
● 1960’s: Data Centric
● 1970’s: Process Centric● 1980’s: Object Oriented● 1990’s: Component Based● 2000’s: XML?
Epilogue
60
Spring 2000
Christophides Vassilis
��¶V'DWD
�5HFRUG�/D\RXWV�3ULQWHU�/D\RXWV�6\VWHP�)ORZ�&KDUWV�'HFLVLRQ�7DEOHV
%DWFK�-REV�ZHUH�D�6HULHVRI�VPDOO�3URJUDPV
Data was our First Focus
3131
61
Spring 2000
Christophides Vassilis
��¶V'DWD
��¶V/RJLF
�*272�/HVV�3URJUDPPLQJ�6WUXFWXUHG�3URJUDPPLQJ�7RS�'RZQ�'HVLJQ
3URJUDPV�%HFDPH9HU\�/DUJH
Then we Focused on Logic
62
Spring 2000
Christophides Vassilis
��¶V'DWD
��¶V22
��¶V/RJLF
�&RPPRQ�7HUPV�IRU��$QDO\VLV�DQG�'HVLJQ�7LJKWO\�&RXSOHG�&RGH
&RGH�5HXVH�ZDV�WKH�+RO\�*UDLO�5DUHO\�$FKLHYHG
Object Oriented ProgrammingFocused on Runtime Behavior
3232
63
Spring 2000
Christophides Vassilis
��¶V'DWD
��¶V&RPS
��¶V/RJLF
��¶V226HULDOL]DWLRQ�7LHG�WR�&RGH
�&RGH�5HXVH�,'(�%DVHG�&RPSRVLWLRQ�/LPLWHG�$FFHSWDQFH
Component ProgrammingShifted the Focus to Interfaces
64
Spring 2000
Christophides Vassilis
��¶V;0/
��¶V/RJLF
��¶V22
��¶V&RPS
�;0/�:UDSSHUV�IRU�,QFRPSDWLEOH�6\VWHPV�,QGXVWU\�6SHFLILF�0DUNXS�/DQJXDJHV�;0/�IRU�3HUVLVWHQW�'DWD�DQG�&RPSRVLWLRQ
;0/�(QDEOHV�0LGGOHZDUHIRU�$SSOLFDWLRQ�6SHFLILF�'DWD
XML Returns the Focus to Data
3333
65
Spring 2000
Christophides Vassilis
BIBLIOGRAPHY
● Charles F. Goldfarb, Paul Prescod, Paper Michael, Leventhal, et al. “The XMLHandbook”. Printice Hall, 1998.
● David Megginson. “Structuring XML Documents”. Printice Hall, 1998.● Simon St. Laurent. “XML : Extensible Markup Language”. IDG Books, 1998.● Rick Jelliffe. “The XML and SGML Cookbook : Recipes for Structured
Information”. Printice Hall, 1998.● Simon St. Laurent. “Xml : A Primer”. IDG Books, 1998.● Steven Holzner. “XML Complete”. McGraw-Hill, 1997.● Richard Light, Tim Bray. “Presenting Xml”. Macmillan Publishing, 1997.● Bryan Pfaffengerger. “Web Publishing With XML in Six Easy Steps”, 1997.● Steven J. DeRose. “The SGML FAQ Book : Understanding the Foundation of
HTML and XML”. Kluwer Academic Publishers, 1997.● Sean McGrath. “XML by Example: Building E-Commerce Applications”. Printice
Hall, 1998● Charles F. Goldfarb, Steve Pepper, Chet Ensign. “SGML Buyer’s Guide : A
Unique Guide to Determining Your Requirements and Choosing the Right SGMLand XML Products and Services”. Printice Hall, 1998.