the extensible markup language: an introduction to...

33
1 1 1 Spring 2000 Christophides Vassilis The eXtensible Markup Language: An Introduction to XML Documents & Databases 2 Spring 2000 Christophides Vassilis Preliminary Issues

Upload: vohuong

Post on 03-Sep-2018

223 views

Category:

Documents


0 download

TRANSCRIPT

11

1

Spring 2000

Christophides Vassilis

The eXtensible Markup Language:An Introduction to XML

Documents & Databases

2

Spring 2000

Christophides Vassilis

Preliminary Issues

22

3

Spring 2000

Christophides Vassilis

What is a document?

● Content: the components(words, images etc). which makeup a document

● Structure: the organization andinter-relationship of thecomponents

● Presentation: how a documentlooks and what processes areapplied to it

4

Spring 2000

Christophides Vassilis

Separating these things means...

● The content can be re-used◆for printing◆for querying◆for exchanging

● The structure can be formallyvalidated

● The presentation can becustomized for

◆different media◆different audiences

● … in short, the informationcan be uncoupled from itsprocessing

33

5

Spring 2000

Christophides Vassilis

Documents vs Databases

Document worldDocument world

● plenty of small documents◆ usually static

● implicit structure◆ section, paragraph, toc,

● tagging◆ human friendly

● content◆ form/layout, annotation

● paradigms◆ “Save as”, WYSIWYG

● metadata◆ author name, date, subject

Database worldDatabase world

● a few large databases◆ usually dynamic

● explicit structure◆ types

● records◆ machine friendly

● content◆ data, methods

● paradigms◆ Data Independence, Transaction

Management, Query Languages

● metadata◆ schema description

6

Spring 2000

Christophides Vassilis

DBMS ANSI/SPARC Architecture

PHYSICALSCHEMA

LOGICALSCHEMA

,17(51$//(9(/

&21&(378$//(9(/

(;7(51$//(9(/

VIEW 1 VIEW 2 VIEW 3

INTEGRATION

44

7

Spring 2000

Christophides Vassilis

What to do with them

DocumentsDocuments

● editing

● spell-checking

● counting words

● retrieving (IR)

● printing

DatabaseDatabase

● updating

● cleaning

● querying

● composing/transforming

8

Spring 2000

Christophides Vassilis

Query Languages

Document RetrievalDocument Retrieval

Claude Monet DQG San DiegoMuseum of Art

Database QueryingDatabase Querying

VHOHFW pIURP Artists a, a.artwork pZKHUH a.first = “Claude”���������DQG a.last = “Monet”���������DQG p.located = “San Diego Museum of Art”

55

9

Spring 2000

Christophides Vassilis

The Long Road of Document Standards

Rick Jelliffe 1999

10

Spring 2000

Christophides Vassilis

�%!021(7��&ODXGH�%!�%5!+D\VWDFNV�DW�&KDLOO\�DW�6XQULVH�%5!�����%5!2LO�RQ�FDQYDV�%5!���[����FP���������[��������LQ���%5!6DQ�'LHJR�0XVHXP�RI�$UW��%5!�3!�,0*�65& ³KWWS�����������������DUWFKLYH�P�PRQHW�KD\ULFNV�MSJ´!

What’s Wrong with HTML

● If written properly, normal HTML may reflect document presentation,but it cannot adequately represent the semantics & structure of data

$UWLVW�1DPH

'DWH

$UWLIDFW�7LWOH

'LPHQVLRQV

0DWHULDO

0XVHXP

,PDJH�5HIHUHQFH

66

11

Spring 2000

Christophides Vassilis

HTML Document Presentation vs. …

12

Spring 2000

Christophides Vassilis

… XML Data Representation

● A possible XML markup of the same information will retain thestructure (and the semantics) of the various data objects

<ARTIST> <NAME><FIRST>Claude</FIRST><LAST>Monet</LAST></NAME> <ARTWORK> <ARTIFACT> <TITLE>Haystacks at Chailly at Sunrise</TITLE> <DATE>1865</DATE> <MATERIAL>Oil on canvas</MATERIAL> <DIM Metric=‘cm’> <HEIGHT>30</HEIGHT><WIDTH>60</WIDTH></DIM> <DIM Metric=‘in’> <HEIGHT>11 7/8</HEIGHT><WIDTH>23 3/4</WIDTH></DIM> <LOCATION>San Diego Museum of Art</LOCATION> <IMAGE File=‘http://192.41.13.240/artchive/m/monet/hayricks.jpg’/> </ARTIFACT> </ARTWORK></ARTIST>

77

13

Spring 2000

Christophides Vassilis

XML can be Published as normal Web Data

14

Spring 2000

Christophides Vassilis

What is XML?

● Markup Meta-Language for domain or application specific structureddocumentation

◆ Mathematical, chemical, musical, publishing, etc.

● Developed by the SGML Editorial Board formed under the auspices ofthe World Wide Web Consortium (W3C)

◆ Founded in 1996 by Jon Bosac (Sun) and various Web/SGML vendors:Textuality, Netscape, Microsoft, INSO, HP, Highland, NCSA, ArbortText,GRIF, SoftQuand

● Subset of SGML optimized for use in the Inter/Intranet◆ SGML is proving difficult to implement for Web/Intranet applications

◆ SGML has been hard to cost-justify to management

● Opens the way for a new generation of Web applications◆ Improve precision during searching and retrieval◆ Enable multiple usage of the same data◆ Facilitate distributed processing with more versatile ways to manipulate data

88

15

Spring 2000

Christophides Vassilis

Why XML?

● XML provides key features for a new generation of Web applications:◆Structuring: unlike HTML it preserves the structure of the data◆Extensibility: not a fixed format like HTML but user-oriented tagging◆Validation: provides the means to consuming applications to check

data for structural validity on importation◆Presentation Late Binding: describes data, not visual presentation◆Human Readable: similar to HTML◆Interchange: good for transmission of data from server to browser,

and from application to application, or machine to machine◆Open standard: non proprietary format

● XML becomes an integral part of the Web infrastructure◆Microsoft Explorer (V5.0) already offers XML browsing◆Ongoing XML implementation by Netscape◆Various XML middleware and manipulation tools

16

Spring 2000

Christophides Vassilis

The XML Language Family

● XML (Extensible Markup Language)◆ A subset of SGML (ISO 8879)

designed for easy implementation

● XLink (Extensible Linking Language)◆ A set of standard hypertext

mechanisms based on HyTime(ISO/IEC 10744) and the TextEncoding Initiative (TEI)

● XSL (Extensible Stylesheet Language)◆ A standard stylesheet language for

structured information derived fromDSSSL (ISO/IEC 10179) and key CSSconcepts

N0

50

100

150

200

250

300

350

HTML 0.0 TEI Lite 1.6 IBM ID Doc4.0

Tags

Attribute

∆τ$

99

17

Spring 2000

Christophides Vassilis

Interrelationships Among the Various W3C Efforts

18

Spring 2000

Christophides Vassilis

XML Syntax and Semantics

1010

19

Spring 2000

Christophides Vassilis

An Example of XML Markup

<ARTIST> <NAME> <FIRST>Claude</FIRST> <LAST>Monet</LAST> </NAME> <ARTWORK> <ARTIFACT> <TITLE>Haystacks at Chailly at Sunrise</TITLE> <DATE>1865</DATE> <MATERIAL>Oil on canvas</MATERIAL> <DIM Metric=‘cm’> <HEIGHT>30</HEIGHT><WIDTH>60</WIDTH></DIM> <DIM Metric=‘in’> <HEIGHT>11 7/8</HEIGHT><WIDTH>23 3/4</WIDTH></DIM> <LOCATION>San Diego Museum of Art</LOCATION> <IMAGE File=‘http://192.41.13.240/artchive/m/monet/hayricks.jpg’/> </ARTIFACT> </ARTWORK></ARTIST>

(OHPHQW�1DPH (OHPHQW�&RQWHQW

(PSW\�(OHPHQW

$WWULEXWH�9DOXH$WWULEXWH�1DPH

20

Spring 2000

Christophides Vassilis

The Logical Tree Structure of XML

ARTIST

NAME ARTWORK

FIRST LAST ARTIFACT

TITLE DATE

MATERIAL

DIM IMAGEDIM

LOCATION

���KD\ULFNV�MSJ

&ODXGH 021(7

+D\VWDFNV ����

2LO�RQ�FDQYDV

6DQ'LHJR0XV�

�� �� �����

�����

H W H W

1111

21

Spring 2000

Christophides Vassilis

XML Document Type Definitions

��'2&7<3(�DUWLVW�>��(/(0(17�DUWLVW��QDPH��ERUQ��GHDWK��DUWZRUN��QDWLRQDOLW\"��LQIOXHQFHV�!��$77/,67�DUWLVW�RLG�,'��5(48,5('�[PO�ODQJ�1072.(1��,03/,('!��(/(0(17�QDPH��ILUVW��ODVW�!��(/(0(17�ILUVW���3&'$7$�!��(/(0(17�ODVW���3&'$7$�!������(/(0(17�DUWZRUN��DUWLIDFW��!��(/(0(17�DUWIDFW��WLWOH��GDWH��PDWHULDO��GLP ��ORFDWLRQ��LPDJH�!��(/(0(17�WLWOH���3&'$7$�!������(/(0(17�GLP��KHLJKW��ZLGWK�!��$77/,67�GLP�PHWULF��FP_�LQ��µFP¶!��(/(0(17�ORFDWLRQ���3&'$7$�!��(/(0(17�LPDJH�(037<!��$77/,67�LPDJH�ILOH�(17,7<��5(48,5('!��(/(0(17�LQIOXHQFHV��3&'$7$�_�DUHI� !��(/(0(17�UHI�(037<!��$77/,67�DUHI�[PO�OLQN�&'$7$��),;('�µVLPSOH¶�KUHI�&'$7$��5(48,5('!��127$7,21�MSHJ�38%/,&�³���ORFDO��127$7,21�*SHJ�,PDJHV��(1´!��(17,7<�ILJ��6<67(0�µ«��PRQHW�KD\ULFNV�MSJ¶�1'$7$�MSHJ!@!

22

Spring 2000

Christophides Vassilis

XML Core Markup Features

● Elements: Components of the tree logical structure defined by a DTD◆identified in a document instance by descriptive markup, usually a

start-tag and end-tag● Attributes: Characteristics associated to the elements (other than

their content and type)◆ may be applied to one specific instance of a given element

● Entities: Named fragments of information that can be storedseparately from a document (or a DTD)

◆can be included in the document (or the DTD) one or more timesby reference to their names

1212

23

Spring 2000

Christophides Vassilis

Definition of Element’s Content

EMPTY��(/(0(17�LPDJH�(037<!

end-tag is omittedimplicit content or generated

automatically by an application

ANY��(/(0(17�REMHFW�$1<!

element content can consist of a freemixture of parsed character data and

any of the elements in the DTDMIXED TEXT

��(/(0(17�WLWOH���3&'$7$�!element content can contain

characters, entity references andtags allowed by the element model

GROUP��(/(0(17�PHGLD��LPDJH_YLGHR�!��(/(0(17�QDPH��ILUVW�ODVW�!

element content with connectors (‘,’ ‘|’ )and occurrence indicators (‘+’ ‘?’ ‘*’)

● Mixed models must be optional repeatable OR-groups, with #PCDATAfirst

24

Spring 2000

Christophides Vassilis

What XML can express?

● Sequence « , »

��(/(0(17�QDPH��ILUVW�ODVW�!

● Choice « | »

��(/(0(17�PHGLD��LPDJH�_�YLGHR�!

● Option ( 1 or 0 ) « ? »

��(/(0(17�DUWLVW��«��QDWLRQDOLW\"��«�

● Repetition (1 or more ) « + »

��(/(0(17�DUWZRUN��DUWLIDFW��!

● Option and Repetition ( 0, 1 or more ) « * »��(/(0(17�DUWIDFW�������GLP ������!

1313

25

Spring 2000

Christophides Vassilis

XML Content Models and Regular Expressions

● Each element content model is defined by a regular expression

◆ Example: QDPH��DGGU ��HPDLO

● Each regular expression determines a corresponding finite state

automaton

●This suggests a simple parsing program

● Content Models should be defined by unambiguous regular expressions

QDPH

DGGU

HPDLO

26

Spring 2000

Christophides Vassilis

XML Regular Expressions: Another Example

● Adding in the optional greet further complicates things◆Example: name,address*,(tel | fax)*,email*

QDPH

DGGUHVV

WHOWHO

ID[

ID[

HPDLO

HPDLO

HPDLO

1414

27

Spring 2000

Christophides Vassilis

Definition of Attribut’s Content

(value1 | value2 | .. |valuen)value enumeration

��$77/,67�GLP�PHWULF��FP�_�LQ�!�GLP�PHWULF� �µLQ¶�!

CDATAstring of valid XML character data

��$77/,67�UHI�KUHI�&'$7$�!�UHI�KUHI �µKWWS�����������������

DUWFKLYH�P�PRQHW�[PO¶!ID

symbolic identifier of an element��$77/,67�DUWLVW�RLG�,'!�DUWLVW�RLG� �0RQHW!

IDREF(S)reference (or list of) to element’s

identifiers

��$77/,67�DUWLIDFW�FUHDWRU�,'5()!�DUWLIDFW�FUHDWRU� �0RQHW!

ENTITY(IES)reference (or list of) to entity names

��$77/,67�LPDJH�ILOH�(17,7<!�LPDJH�ILOH� �ILJ�!

NMTOKEN(S)valid XML string preceded by

numbers, hyphen, period

��$77/,67�DUWLVW�[PO�ODQJ�1072.(1!�DUWLVW�[PO�ODQJ� �³BHQJOLVK´!

● More types (e.g., DATE) may soon be part of the standard

28

Spring 2000

Christophides Vassilis

Attribute Default Values

● Value µYL�¶◆a given value from an enumeration of values

● �),;(' value◆the value is the only possible instance for the attribute

● �5(48,5('◆the value must be supplied

● �,03/,('◆the value can be optionally supplied

1515

29

Spring 2000

Christophides Vassilis

XML Entities

● Entities allow the definition of short strings to stand for more complexinformation, which can reside inside or outside the document or its DTD

● Used for substitutions of data or markup:

◆DTD level e.g., markup declaration (Parameter entity)

◆Document level e.g., data and markup instances (General entity)

● Used for references to external data or markup sources:

◆the content of the entity can be found using an XML system-specificstorage location (Specific entity)

◆the content of the entity can be found by mapping a public identifierto a system-specific storage location (Public entity)

30

Spring 2000

Christophides Vassilis

XML Parameter Entities

● Parameter entities are used for extensible declarations (e.g., macros)of complex content models or attributes in a DTD

��(17,7<���VW\OH�³LPSUHVVLRQLVP�_�FXELVP�_�VXUUHDOLVP´!● Parameter entities can be nested

��(17,7<���ELEHOHP��³�ELEHOHP��_�H[SUHVVLRQLVP�_�GDGD´!

but we must avoid infinite loops

��(17,7<���ELEHOHP�³�ELEHOHP��_�H[SUHVVLRQLVP�_�GDGD´!● Replacement entity text can be found outside the DTD

��(17,7<���,62ODW��38%/,&�³,62������������(17,7,(6$GGHG�/DWLQ����(1´!

1616

31

Spring 2000

Christophides Vassilis

XML General Entities

● General entities are used for substitution of textual or not textualobjects (e.g., constants) occurring many times or are volatiles in thedocument instances

��(17,7<�[PO�³([WHQVLEOH�0DUNXS�/DQJXDJH´!● Replacement text of general entities can contain tags, character

references or other entities

��(17,7<�ZZZ�³:�&�5HFRPPHQGDWLRQ����)HEUXDU\�����´!��(17,7<�[PO�³�7,7/(!([WHQVLEOH�0DUNXS�/DQJXDJH�ZZZ���7,7/(!´!

but also we must avoid infinite loops● The content of a general entity can be found outside the DTD and it

may have a particular format

32

Spring 2000

Christophides Vassilis

XML Specific Entities

● Specific entities can be viewed as “abstract storage objects” (e.g., datastream) that are mapped onto real ones by using a system-specificstorage location

● Sub-documents encoded in XML with a different DTD

��(17,7<�ELRJUDSK\�6<67(0�³«��PRQHW�[PO´!

● Textual data encoded with a particular format

��(17,7<�ELEOLRJUDSK\�6<67(0�³«��PRQHW�ELE´!

● Non-SGML data

��(17,7<�ILJ��6<67(0�³«��PRQHW�KD\ULFNV�MSJ´�1'$7$�MSHJ!

1717

33

Spring 2000

Christophides Vassilis

The Main XML Components

34

Spring 2000

Christophides Vassilis

Well-Formed XML

● A textual object is said to be a well-formed XML document if it meetsall the well-formedness constraints (WFCs) of the XML syntax:

◆tags (etc.) are syntactically correct◆every tag has an end-tag◆tags are properly nested◆there exists a root

● By definition if a document is not well-formed, it is not XML◆This means that there is no an XML document which is not well-formed, and XML processors are not required to do anything withsuch documents

1818

35

Spring 2000

Christophides Vassilis

Valid XML

● A well-formed document is valid only if it contains a proper DTD and ifthe document obeys the constraints of that DTD and therefore theXML Validity Constraints (VCs)

◆only declared tags are used◆all tag occurrences conform to specified content models

● Examples:◆The following XML Document is well-formed but not valid

�DUWLVW!�021(7��&ODXGH���DUWLVW!◆The following XML Document is not even well-formed

�ILUVW!&ODXGH��ILUVW!�ODVW!021(7��ODVW!

36

Spring 2000

Christophides Vassilis

When do we need a DTD?

● At document preparation time (definitely)◆validation, checking, consistency

● At document processing time (probably)◆simplifies generic/specific processing◆may clarify intended semantics

● At document delivery time (possibly)◆strictly unnecessary for well-formed docs◆but reduces processing effort

Creation Composition

Validation

Usage

1919

37

Spring 2000

Christophides Vassilis

Where is the behaviour of XML defined?

● In a stylesheet

◆using XSL or CSS

● Possibly embedded in a programapplet, or script, or JAVA bean

◆defined for that particular DTD, setof tags, or tag

● By reference to pre-existing mutualagreement amongst user communities

◆aka “namespaces”

● By reference to a Document ObjectModel

;0/ ;6/

38

Spring 2000

Christophides Vassilis

type-checkingconstantsmacrosvoid*void*header file#ifdefstandard librarynamespace

validationentity referenceentity parameterANYIDREFDTDconditional sectionkey entitiesnamespace

Comparing XML and Programming Languages

●�But no type inference, polymorphism, modules, etc.

2020

39

Spring 2000

Christophides Vassilis

XML DTDs vs. Database Schemas

● By database standards, DTDs are rather weak specifications◆Only one base type i.e., PCDATA◆Only two element constructors i.e., sequence and alternative◆No useful “abstractions” e.g., bulk types, inheritance◆IDREFs are untyped

• You point to something, but you don’t know what!

◆No integrity constraints e.g., FKLOG is inverse of SDUHQW◆No methods◆Tag definitions are global

● Recent XML extensions impose something like a schema or type onan XML data (XML Schema)

40

Spring 2000

Christophides Vassilis

XML vs. ODMG ODL: Example

FODVV�0RYLH����H[WHQW�0RYLHV��NH\�WLWOH����^�DWWULEXWH�VWULQJ�WLWOH��DWWULEXWH�VWULQJ�GLUHFWRU��UHODWLRQVKLS�VHW�$FWRU!�FDVWV����������LQYHUVH�$FWRU��DFWHGB,Q��DWWULEXWH�LQW�EXGJHW�`��

FODVV�$FWRU����H[WHQW�$FWRUV��NH\�QDPH����^��DWWULEXWH�VWULQJ�QDPH���UHODWLRQVKLS�VHW�0RYLH!�DFWHGB,Q����������������������LQYHUVH�0RYLH��FDVWV���DWWULEXWH�LQW�DJH���DWWULEXWH�VHW�VWULQJ!�GLUHFWHG�`��

2121

41

Spring 2000

Christophides Vassilis

XML vs. ODMG ODL: Example

�GE!

����PRYLH�LG ³P�´!

�������WLWOH!:DNLQJ�1HG�'LYLQH��WLWOH!

�������GLUHFWRU!.LUN�-RQHV�,,,��GLUHFWRU!

�������FDVW�LGUHIV ³D��D�´��!

�������EXGJHW!���������EXGJHW!������

�����PRYLH!

����PRYLH�LG ³P�´!

�������WLWOH!'UDJRQKHDUW��WLWOH!

�������GLUHFWRU!5RE�&RKHQ��GLUHFWRU!

�������FDVW�LGUHIV ³D��D��D��´�!

�������EXGJHW!���������EXGJHW!������

�����PRYLH!

����PRYLH�LG ³P�´!

�������WLWOH!0RRQGDQFH��WLWOH!

�������GLUHFWRU!'DJPDU�+LUW]��GLUHFWRU!

�������FDVW�LGUHIV ³D��D�´�!

�������EXGJHW!��������EXGJHW!������

�����PRYLH!

����DFWRU�LG ³D�´!�������QDPH!'DYLG�.HOO\��QDPH!�������DFWHGB,Q�LGUHIV ³P��P��P��´�!�����DFWRU!����DFWRU�LG ³D�´!��������QDPH!6HDQ�&RQQHU\��QDPH!��������DFWHGB,Q�LGUHIV ³P��P��P��´�!��������DJH!����DJH!�����DFWRU!����DFWRU�LG ³D�´!���������QDPH!,DQ�%DQQHQ��QDPH!���������DFWHGB,Q�LGUHIV ³P��P��´�!�����DFWRU!�������GE!

42

Spring 2000

Christophides Vassilis

XML vs. ODMG ODL: Example

��'2&7<3(�GE�>�����(/(0(17� GE������� �PRYLH���DFWRU��!�����(/(0(17� PRYLH�� �WLWOH��GLUHFWRU��FDVW��EXGJHW�!�����$77/,67�� PRYLH���LG���,'����5(48,5('!�����(/(0(17� WLWOH��������3&'$7$�!�����(/(0(17� GLUHFWRU���3&'$7$�!�����(/(0(17� FDVW� (037<!�����$77/,67� FDVW� LGUHIV���,'5()6���5(48,5('!�����(/(0(17� EXGJHW� ��3&'$7$�!�����(/(0(17� DFWRU� �QDPH��DFWHGB,Q��DJH"��GLUHFWHG �!�����$77/,67� DFWRU�� LG�,'��5(48,5('!�����(/(0(17� QDPH� ��3&'$7$�!�����(/(0(17� DFWHGB,Q�(037<!�����$77/,67� DFWHGB,Q�LGUHIV�,'5()6��5(48,5('!�����(/(0(17� DJH���������3&'$7$�!�����(/(0(17� GLUHFWHG���3&'$7$�!�@!

2222

43

Spring 2000

Christophides Vassilis

Mapping Between XML and Objects

44

Spring 2000

Christophides Vassilis

XML vs Relational DBMS

SURMHFWV�WLWOH��������EXGJHW�������PDQDJHG%\

HPSOR\HHV�QDPH�����VVQ�����DJH

��'2&7<3(�GE�>��(/(0(17�GE� ���������SURMHFWV��HPSOR\HHV�!��(/(0(17�SURMHFWV������SURMHFW �!��(/(0(17�HPSOR\HHV��HPSOR\HH �!��(/(0(17�SURMHFW�������WLWOH��EXGJHW��PDQDJHG%\�!��(/(0(17�HPSOR\HH���QDPH��VVQ��DJH�!���

@!

��'2&7<3(�GE�>��(/(0(17�GE������������SURMHFW�_�HPSOR\HH� !��(/(0(17�SURMHFW������WLWOH��EXGJHW��PDQDJHG%\�!��(/(0(17�HPSOR\HH��QDPH��VVQ��DJH�!���

@!

2323

45

Spring 2000

Christophides Vassilis

Recursive DTDs

�'2&7<3(�JHQHDORJ\�>��(/(0(17�JHQHDORJ\��SHUVRQ �!��(/(0(17�SHUVRQ���QDPH�

���������������������������������GDWH2I%LUWK�������������������������������SHUVRQ��������������������PRWKHU������������������������������SHUVRQ��������������������IDWKHU��!���

@!

● What is the problem with this?

46

Spring 2000

Christophides Vassilis

Recursive DTDs cont’d.

�'2&7<3(�JHQHDORJ\�>��(/(0(17�JHQHDORJ\��SHUVRQ �!��(/(0(17�SHUVRQ���QDPH��������������������������������GDWH2I%LUWK��������������������������������SHUVRQ"���������������������PRWKHU�������������������������������SHUVRQ"���!����������������IDWKHU���

@!

● What is now the problem with this?

2424

47

Spring 2000

Christophides Vassilis

Some Things are Hard to Specify

● Each employee element is to contain name, age and ssn elements insome order

��(/(0(17�HPSOR\HH�����QDPH��DJH��VVQ��_��DJH��VVQ��QDPH��_��VVQ��QDPH��DJH��_�«�!

● Suppose there were many more fields !

48

Spring 2000

Christophides Vassilis

Specifying ,' and ,'5()�Attributes

��'2&7<3(�IDPLO\�>��(/(0(17�IDPLO\����SHUVRQ� !��(/(0(17�SHUVRQ���QDPH�!��(/(0(17�QDPH������3&'$7$�!��$77/,67�SHUVRQ�LG�����������,'���������5(48,5('���������������������������PRWKHU���,'5()����,03/,('���������������������������IDWKHU�����,'5()����,03/,('������������������������FKLOGUHQ��,'5()6���,03/,('!

@!

2525

49

Spring 2000

Christophides Vassilis

Some Conforming XML data

�IDPLO\!�� �SHUVRQ��LG �MDQH���PRWKHU �PDU\��IDWKHU �MRKQ�!����� �QDPH!�-DQH�'RH���QDPH!�� ��SHUVRQ!�� �SHUVRQ�LG �MRKQ��FKLOGUHQ �MDQH�MDFN�!���� �QDPH!�-RKQ�'RH���QDPH!�� ��SHUVRQ!�� �SHUVRQ�LG �PDU\��FKLOGUHQ �MDQH��MDFN�!���� �QDPH!�0DU\�'RH���QDPH!�� ��SHUVRQ!�� �SHUVRQ��LG �MDFN���PRWKHU ´PDU\��IDWKHU �MRKQ�!����� �QDPH!�-DFN�'RH���QDPH!��� ��SHUVRQ!��IDPLO\!

50

Spring 2000

Christophides Vassilis

An Alternative XML DTD Specification

��'2&7<3(�IDPLO\�>��(/(0(17�IDPLO\����SHUVRQ� !��(/(0(17�SHUVRQ���PRWKHU"��IDWKHU"��FKLOGUHQ��QDPH�!��$77/,67�SHUVRQ�LG�,'��5(48,5('!��(/(0(17�QDPH������3&'$7$�!��(/(0(17�PRWKHU�(037<!��$77/,67�PRWKHU�LGUHI�,'5()��5(48,5('!��(/(0(17�IDWKHU�(037<!��$77/,67�IDWKHU�LGUHI�,'5()��5(48,5('!��(/(0(17�FKLOGUHQ�(037<!��$77/,67�FKLOGUHQ�LGUHIV�,'5()6��5(48,5('!@!

2626

51

Spring 2000

Christophides Vassilis

The Revised XML Data

�IDPLO\!�� �SHUVRQ��LG� ��MDQH´!

�QDPH!�-DQH�'RH���QDPH!������� �PRWKHU�LGUHI� ��PDU\´!��PRWKHU!������� �IDWKHU�LGUHI� ��MRKQ�!��IDWKHU!

��SHUVRQ!�� �SHUVRQ�LG� ��MRKQ´!

�QDPH!�-RKQ�'RH���QDPH!�FKLOGUHQ�LGUHIV� ��MDQH�MDFN�!���FKLOGUHQ!

������ ��SHUVRQ!���

��IDPLO\!

52

Spring 2000

Christophides Vassilis

Mapping between XML and Tables

2727

53

Spring 2000

Christophides Vassilis

Bluring the Frontiers between Data & Documents

54

Spring 2000

Christophides Vassilis

Towards XML-enabled DBMS

● Xml-enabled database system◆ Store XML data/documents into the

database server◆ Query and search valid and well-

formed XML◆ Generate XML data from the

database server◆ Add XML capabilities in supporting

database facilities● XML has the potential to impact four

important markets◆ Web integration◆ Web publishing◆ Application integration◆ Electronic commerce

DBMS

Store XML

GenerateXML

Integrate with other facilities

2828

55

Spring 2000

Christophides Vassilis

Storing XML Data

● Enhance XML storage facilities inthe database:◆ Utilities to load XML data into the

database◆ Provide more efficient database

storage (componentized storage,compression, indexing,…)

◆ XML export tools from the server◆ Allow server-to-server replication

of XML data

Database

HTMLHTML

XML

Database

56

Spring 2000

Christophides Vassilis

Querying and Searching XML Data

● Fine-grained access to XMLdocuments

● Search XML data efficiently◆Special SQL queries over

valid + well-formed XML◆Content-based indexing (e.g.

Text indexes) for searchingXML data efficiently

◆Support for XML querylanguages (e.g. XQL) on XMLdata

Database

HTMLHTML

XML

Web

2929

57

Spring 2000

Christophides Vassilis

Generating and Manipulating XML

● Generate XML from the database server

◆ Map ODMG, SQL92, SQL3 and PL/SQLdatatypes to XML

◆ Provide mappings between java, SQLand XML types

● Script XML content from the database

◆ Allow SQL queries to return XML results◆ Provide embedded XML in stored

procedures◆ Java scripting: support embedded XML

in java◆ Common APIs to access any XML

content in databases

DatabaseHTML

XML

Web

58

Spring 2000

Christophides Vassilis

DatabaseX

DatabaseY

XML XML

XMLSortedXMLTotal

3030

59

Spring 2000

Christophides Vassilis

● 1960’s: Data Centric

● 1970’s: Process Centric● 1980’s: Object Oriented● 1990’s: Component Based● 2000’s: XML?

Epilogue

60

Spring 2000

Christophides Vassilis

��¶V'DWD

�5HFRUG�/D\RXWV�3ULQWHU�/D\RXWV�6\VWHP�)ORZ�&KDUWV�'HFLVLRQ�7DEOHV

%DWFK�-REV�ZHUH�D�6HULHVRI�VPDOO�3URJUDPV

Data was our First Focus

3131

61

Spring 2000

Christophides Vassilis

��¶V'DWD

��¶V/RJLF

�*272�/HVV�3URJUDPPLQJ�6WUXFWXUHG�3URJUDPPLQJ�7RS�'RZQ�'HVLJQ

3URJUDPV�%HFDPH9HU\�/DUJH

Then we Focused on Logic

62

Spring 2000

Christophides Vassilis

��¶V'DWD

��¶V22

��¶V/RJLF

�&RPPRQ�7HUPV�IRU��$QDO\VLV�DQG�'HVLJQ�7LJKWO\�&RXSOHG�&RGH

&RGH�5HXVH�ZDV�WKH�+RO\�*UDLO�5DUHO\�$FKLHYHG

Object Oriented ProgrammingFocused on Runtime Behavior

3232

63

Spring 2000

Christophides Vassilis

��¶V'DWD

��¶V&RPS

��¶V/RJLF

��¶V226HULDOL]DWLRQ�7LHG�WR�&RGH

�&RGH�5HXVH�,'(�%DVHG�&RPSRVLWLRQ�/LPLWHG�$FFHSWDQFH

Component ProgrammingShifted the Focus to Interfaces

64

Spring 2000

Christophides Vassilis

��¶V;0/

��¶V/RJLF

��¶V22

��¶V&RPS

�;0/�:UDSSHUV�IRU�,QFRPSDWLEOH�6\VWHPV�,QGXVWU\�6SHFLILF�0DUNXS�/DQJXDJHV�;0/�IRU�3HUVLVWHQW�'DWD�DQG�&RPSRVLWLRQ

;0/�(QDEOHV�0LGGOHZDUHIRU�$SSOLFDWLRQ�6SHFLILF�'DWD

XML Returns the Focus to Data

3333

65

Spring 2000

Christophides Vassilis

BIBLIOGRAPHY

● Charles F. Goldfarb, Paul Prescod, Paper Michael, Leventhal, et al. “The XMLHandbook”. Printice Hall, 1998.

● David Megginson. “Structuring XML Documents”. Printice Hall, 1998.● Simon St. Laurent. “XML : Extensible Markup Language”. IDG Books, 1998.● Rick Jelliffe. “The XML and SGML Cookbook : Recipes for Structured

Information”. Printice Hall, 1998.● Simon St. Laurent. “Xml : A Primer”. IDG Books, 1998.● Steven Holzner. “XML Complete”. McGraw-Hill, 1997.● Richard Light, Tim Bray. “Presenting Xml”. Macmillan Publishing, 1997.● Bryan Pfaffengerger. “Web Publishing With XML in Six Easy Steps”, 1997.● Steven J. DeRose. “The SGML FAQ Book : Understanding the Foundation of

HTML and XML”. Kluwer Academic Publishers, 1997.● Sean McGrath. “XML by Example: Building E-Commerce Applications”. Printice

Hall, 1998● Charles F. Goldfarb, Steve Pepper, Chet Ensign. “SGML Buyer’s Guide : A

Unique Guide to Determining Your Requirements and Choosing the Right SGMLand XML Products and Services”. Printice Hall, 1998.