What Is XML?
• eXtensible Markup Language for data
– Standard for publishing and interchange
– “Cleaner” SGML for the Internet
• Applications:
– Data exchange over intranets, between companies
– E-business
– Native file formats (Word, SVG)
– Publishing of data
– Storage format for irregular data
– …
How Does it Look?
– Emerging format for data exchange on the web and between applications.
<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>
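As a rough illustration of how an application might read such a document, the sketch below parses the two book elements with Python's standard `xml.etree.ElementTree` module. The names `DOC` and `books` are illustrative assumptions, and the document is abridged to its book elements.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the bibliography document (book elements only).
DOC = """
<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
</db>
"""

def books(xml_text):
    """Return a list of (title, [authors]) pairs, one per <book> element."""
    root = ET.fromstring(xml_text)
    result = []
    for book in root.findall("book"):
        title = book.findtext("title")
        authors = [a.text for a in book.findall("author")]
        result.append((title, authors))
    return result

for title, authors in books(DOC):
    print(title, "by", ", ".join(authors))
```

Note that the second book has two author elements while the first has one; nothing has to be declared up front for the parser to accept that irregularity.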
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document: single root element
• well-formed XML document: one whose tags match and nest properly
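Well-formedness can be checked mechanically by any XML parser. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` is hypothetical:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<book><title>DB2</title></book>"))  # matching, nested tags
print(is_well_formed("<book><title>DB2</book></title>"))  # tags closed in the wrong order
print(is_well_formed("<red/>"))                           # abbreviated empty element
print(is_well_formed("<a/><b/>"))                         # more than one root element
```

The check says nothing about whether the document follows any particular schema; it only confirms the syntax rules listed above.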
Attributes and References
<db>
  <book ID="b1" pub="mkp" year="1992">
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book ID="b2" pub="mkp" year="1997">
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher ID="mkp">
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>
XML distinguishes attributes from sub-elements. IDs and IDREFs are used to reference objects.
Oids and references in XML are just syntax.
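Because references are "just syntax", it is up to the application to follow them. The sketch below is one possible way to do so in Python: it indexes elements by their ID attribute and looks up each book's pub value. The function name `resolve_publishers` is an assumption, the document is abridged, and without a DTD the parser treats ID and pub as ordinary string attributes.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the document: only the parts needed to follow references.
DOC = """
<db>
  <book ID="b1" pub="mkp" year="1992">
    <title>Complete Guide to DB2</title>
  </book>
  <book ID="b2" pub="mkp" year="1997">
    <title>Transaction Processing</title>
  </book>
  <publisher ID="mkp">
    <state>CA</state>
  </publisher>
</db>
"""

def resolve_publishers(xml_text):
    """Map each book ID to the state of the publisher its pub attribute names."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an ID attribute.
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID") is not None}
    # Follow each book's pub reference to the element it points at.
    return {book.get("ID"): by_id[book.get("pub")].findtext("state")
            for book in root.findall("book")}

print(resolve_publishers(DOC))
```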
What’s Special about XML?
• Supported by almost everyone
• Easy to parse (even with no info about the doc)
• Can encode data with little or much structure
• Supports data references inside & outside document
• Presentation layer for publishing (XSL)
• Human readable. No need for proprietary formats anymore.
• Many, many tools
Origin of XML
• Comes from SGML (very nasty language).
• Principle: separate the data from the graphical presentation.
<UL>
  <li> <b> Complete Guide to DB2 </b> By <i> Chamberlin </i>.
  <li> <b> Transaction Processing </b> By <i> Bernstein and Newcomer </i>
  <li> <b> The guide to the good life through database research. </b> By <i> Alon Levy </i>
</UL>
XML, After the roots
• A format for sharing data.
• Applications:
– EDI: electronic data interchange:
  • Transactions between banks
  • Producers and suppliers sharing product data (auctions)
  • Extranets: building relationships between companies
  • Scientists sharing data about experiments.
– Sharing data between different components of an application.
– Format for storing all data in Office 2000.
• Basis for data sharing and integration.
Why are we DB’ers interested?
• It’s data, stupid. That’s us.
• Proof by Altavista:
– database+XML -- 40,000 pages.
• Database issues:
– How are we going to model XML? (graphs)
– How are we going to query XML? (XML-QL)
– How are we going to store XML? (in a relational database? object-oriented?)
– How are we going to process XML efficiently? (uh… well..., um..., ah..., get some good grad students!)
Document Type Definitions (DTDs)
<!ELEMENT Book (title, author*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (name, address, age?)>
<!ATTLIST Book id ID #REQUIRED>
<!ATTLIST Book pub IDREF #IMPLIED>
Sort of like a schema but not really.
Inherited from the SGML DTD standard
BNF grammar establishing constraints on element structure and content
Definitions of entities
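A DTD content model is essentially a regular expression over the sequence of child element names. The sketch below illustrates that idea only; it is not how a real validating parser works. The two content models from the DTD above are hand-translated into Python regular expressions, and `CONTENT_MODELS` and `matches_content_model` are hypothetical names. Only one level of nesting is checked per call.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD, hand-translated into regular expressions
# over a sequence of child tag names, each followed by a single space.
CONTENT_MODELS = {
    "Book": re.compile(r"(title )(author )*"),         # (title, author*)
    "author": re.compile(r"(name )(address )(age )?"),  # (name, address, age?)
}

def matches_content_model(element):
    """Check an element's direct children against its content model, if known."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True  # no declaration in this sketch: accept
    children = "".join(child.tag + " " for child in element)
    return model.fullmatch(children) is not None

good = ET.fromstring("<Book><title>T</title><author/><author/></Book>")
bad = ET.fromstring("<Book><author/><title>T</title></Book>")
print(matches_content_model(good), matches_content_model(bad))
```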
Shortcomings of DTDs
Useful for documents, but not so good for data:
• No support for structural re-use
– Object-oriented-like structures aren’t supported
• No support for data types
– Can’t do data validation
• Can have a single key item (ID), but:
– No support for multi-attribute keys
– No support for foreign keys (references to other keys)
– No constraints on IDREFs (reference only a Section)
XML Schema
• In XML format
• Includes primitive data types (integers, strings, dates, etc.)
• Supports value-based constraints (integers > 100)
• User-definable structured types
• Inheritance (extension or restriction)
• Foreign keys
• Element-type reference constraints
Sample XML Schema
<schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  <element name="author" type="string" />
  <element name="date" type="date" />
  <element name="abstract">
    <type> … </type>
  </element>
  <element name="paper">
    <type>
      <attribute name="keywords" type="string" />
      <element ref="author" minOccurs="0" maxOccurs="*" />
      <element ref="date" />
      <element ref="abstract" minOccurs="0" maxOccurs="1" />
      <element ref="body" />
    </type>
  </element>
</schema>
Subtyping in XML Schema
<schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  <type name="person">
    <attribute name="ssn" />
    <element name="title" minOccurs="0" maxOccurs="1" />
    <element name="surname" />
    <element name="forename" minOccurs="0" maxOccurs="*" />
  </type>
  <type name="extended" source="person" derivedBy="extension">
    <element name="generation" minOccurs="0" />
  </type>
  <type name="notitle" source="person" derivedBy="restriction">
    <element name="title" maxOccurs="0" />
  </type>
  <key name="personKey">
    <selector>.//person[@ssn]</selector>
    <field>@ssn</field>
  </key>
</schema>
Important XML Standards
• XSL/XSLT*: presentation and transformation standards
• RDF: resource description framework (meta-info such as ratings, categorizations, etc.)
• XPath/XPointer/XLink*: standards for linking to documents and elements within
• Namespaces: for resolving name clashes
• DOM: Document Object Model for manipulating XML documents
• SAX: Simple API for XML parsing
• This weekend, somewhere in Germany, a W3C committee is meeting to discuss a standard query language.
XML Data Model (Graph)
[Figure: the bibliography document drawn as an edge-labeled graph. The db root has book edges to nodes b1 and b2 and a publisher edge to node mkp; title, author, name, and state edges lead to pcdata leaves ("Complete...", "Principles...", "Chamberlin", "Bernstein", "Newcomer", "Morgan...", "CA"); pub edges link the books to mkp. Nodes are also numbered #0 to #7.]
Issues:
• Distinguish between attributes and sub-elements?
• Should we conserve order?
Think of the labels as names of binary relations.
Comparison with Relational Data
• No strict typing
• Arbitrary nesting
• Data can be irregular
• Schema is part of the data
name   phone
John   3634
Sue    6343
Dick   6363
[Figure: the same relation drawn as a tree with three row nodes, each with a name child and a phone child: (“John”, 3634), (“Sue”, 6343), (“Dick”, 6363).]
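The relation-to-tree encoding in the figure can be sketched in a few lines of Python. The root tag `table` and the function name `relation_to_xml` are assumptions made for this example; the figure does not name the root.

```python
import xml.etree.ElementTree as ET

# The three tuples of the name/phone relation.
ROWS = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]

def relation_to_xml(rows, columns=("name", "phone")):
    """Encode each tuple as a <row> element with one child per column."""
    root = ET.Element("table")  # root tag is an assumption
    for values in rows:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = value
    return root

print(ET.tostring(relation_to_xml(ROWS), encoding="unicode"))
```

The column names are repeated inside every row, which is what "schema is part of the data" means in practice.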
Querying XML
• Requirements:
– Query a graph, not a relation.
– The result should be a graph (representing an XML document), not a relation.
– No schema.
– We may not know much about the data, so we need to navigate the XML.
Query Languages
• First, there was XQL (from Microsoft).
• Very quickly realized that it was very limited.
• Then, a bunch of database researchers looked at XML and invented XML-QL.
– XML-QL comes from the nicer StruQL language.
– Many people got excited. Formed a committee.
• Last week: Quilt, a new language combining the best of XML-QL and XQL. Stay tuned.
Extracting Data by Query
• Matching data using element patterns.
WHERE <book>
        <publisher><name>Addison-Wesley</></>
        <title> $t </>
        <author> $a </>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT $a
Constructing XML Data
WHERE <book>
        <publisher><name>Addison-Wesley</></>
        <title> $t </>
        <author> $a </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a </>
            <title> $t </>
          </>
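For readers without an XML-QL engine, the query above can be approximated in Python. This is only a sketch of the matching and construction steps: the sample data in `BIB`, the `results` wrapper element, and the function name are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample data standing in for www.a.b.c/bib.xml.
BIB = """
<bib>
  <book>
    <publisher><name>Addison-Wesley</name></publisher>
    <title>Book One</title>
    <author>Author A</author>
    <author>Author B</author>
  </book>
  <book>
    <publisher><name>Other Press</name></publisher>
    <title>Book Two</title>
    <author>Author C</author>
  </book>
</bib>
"""

def addison_wesley_results(xml_text):
    """Emit one <result> per (title, author) binding of matching books."""
    root = ET.fromstring(xml_text)
    out = ET.Element("results")  # wrapper element is an assumption
    for book in root.iter("book"):
        # WHERE: the book has a publisher whose name is Addison-Wesley.
        names = [n.text for n in book.findall("publisher/name")]
        if "Addison-Wesley" not in names:
            continue
        # Each combination of a title and an author is one binding of ($t, $a).
        for title in book.findall("title"):
            for author in book.findall("author"):
                result = ET.SubElement(out, "result")
                ET.SubElement(result, "author").text = author.text
                ET.SubElement(result, "title").text = title.text
    return out

print(ET.tostring(addison_wesley_results(BIB), encoding="unicode"))
```

A book with two authors yields two result elements, one per binding, which is why grouping needs the nested-query form.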
Grouping with Nested Queries
WHERE <book>
        <title> $t </>,
        <publisher><name>Addison-Wesley</></>
      </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <titre> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <auteur> $a </>
          </>
Joining Elements by Value
WHERE <article>
        <author>
          <firstname> $f </>
          <lastname> $l </>
        </>
      </> ELEMENT_AS $e IN "www.a.b.c/bib.xml",
      <book year=$y>
        <author>
          <firstname> $f </>
          <lastname> $l </>
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT $e
Find all articles whose writers also published a book after 1995.
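The join works because $f and $l appear in both patterns, so both must bind to the same values. A rough Python equivalent is sketched below; the sample data and function names are invented, and unlike XML-QL, which produces one output per binding, this version returns each matching article once.

```python
import xml.etree.ElementTree as ET

# Invented sample data standing in for www.a.b.c/bib.xml.
BIB = """
<bib>
  <article>
    <title>Article One</title>
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
  </article>
  <article>
    <title>Article Two</title>
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
  </article>
  <book year="1998">
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
  </book>
  <book year="1990">
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
  </book>
</bib>
"""

def author_names(element):
    """Return the set of (firstname, lastname) pairs directly under an element."""
    return {(a.findtext("firstname"), a.findtext("lastname"))
            for a in element.findall("author")}

def articles_with_recent_book_authors(xml_text, after=1995):
    root = ET.fromstring(xml_text)
    # Names of everyone who authored a book with year > after.
    recent = set()
    for book in root.iter("book"):
        if int(book.get("year", "0")) > after:
            recent |= author_names(book)
    # Keep each article that shares at least one author name with that set.
    return [art for art in root.iter("article") if author_names(art) & recent]

print([a.findtext("title") for a in articles_with_recent_book_authors(BIB)])
```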
Tag Variables
WHERE <article>
        <author>
          <firstname> $f </>
          <lastname> $l </>
        </>
      </> ELEMENT_AS $e IN "www.a.b.c/bib.xml",
      <$t year=$y>
        <author>
          <firstname> $f </>
          <lastname> $l </>
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT $e
Find all articles whose writers have done something after 1995.
Regular Path Expressions
WHERE <part*>
        <name>$r</>
        <brand>Ford</>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>$r</>
Find all parts whose brand is Ford, no matter what level they are in the hierarchy.
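The effect of the `part*` path can be approximated with a recursive walk that only descends through nested part elements. This is a rough emulation in Python, not XML-QL semantics in full; the sample hierarchy and the function name `ford_part_names` are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample hierarchy of nested parts.
PARTS = """
<part>
  <name>engine</name>
  <brand>Ford</brand>
  <part>
    <name>piston</name>
    <brand>Acme</brand>
    <part>
      <name>ring</name>
      <brand>Ford</brand>
    </part>
  </part>
</part>
"""

def ford_part_names(element):
    """Follow chains of nested <part> elements and collect names of Ford parts."""
    found = []
    if element.tag == "part":
        if element.findtext("brand") == "Ford":
            found.append(element.findtext("name"))
        # Recurse only into direct <part> children, mirroring the part* path.
        for child in element.findall("part"):
            found.extend(ford_part_names(child))
    return found

print(ford_part_names(ET.fromstring(PARTS)))
```

The Ford part nested two levels down is found even though its parent is a different brand.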
Regular Path Expressions
WHERE
<part+.(subpart|component.piece)>$r</>
IN "www.a.b.c/parts.xml"
CONSTRUCT
<result> $r </>
XML Data Integration
WHERE <person>
        <name></> ELEMENT_AS $n
        <ssn> $ssn </>
      </> IN "www.a.b.c/data.xml",
      <taxpayer>
        <ssn> $ssn </>
        <income></> ELEMENT_AS $I
      </> IN "www.irs.gov/taxpayers.xml"
CONSTRUCT <result> $n $I </>
Query can access more than one XML document.
Skolem Functions in XML-QL
where <book language=$l>
        <author> $a </>
      </> in "www.a.b.c/bib.xml"
construct <result>
            <author id=F($a)> $a </>
            <lang> $l </>
          </>
<result>
  <author>Smith</author>
  <lang>English</lang>
  <lang>Mandarin</lang>
</result>
<result>
  <author>Doe</author>
  <lang>English</lang>
</result>
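A Skolem function returns the same object identifier whenever it is called with the same arguments, so all bindings for one author end up in one output element. The Python sketch below emulates the grouped output shown above with a dictionary playing the role of F; the `BINDINGS` list and the function name are assumptions, not part of XML-QL.

```python
import xml.etree.ElementTree as ET

# Invented (language, author) bindings such as the where clause might produce.
BINDINGS = [("English", "Smith"), ("Mandarin", "Smith"), ("English", "Doe")]

def construct_with_skolem(bindings):
    """Create one <result> per distinct author, appending a <lang> per binding."""
    by_key = {}   # plays the role of F($a): author value -> its <result> element
    results = []
    for lang, author in bindings:
        result = by_key.get(author)
        if result is None:
            # First time this author is seen: create its output element.
            result = ET.Element("result")
            ET.SubElement(result, "author").text = author
            by_key[author] = result
            results.append(result)
        ET.SubElement(result, "lang").text = lang
    return results

for r in construct_with_skolem(BINDINGS):
    print(ET.tostring(r, encoding="unicode"))
```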
Query Processing For XML
• Approach 1: store XML in a relational database. Translate an XML-QL query into a set of SQL queries.
– Leverage 20 years of research & development.
• Approach 2: store XML in an object-oriented database system.
– OO model is closest to XML, but systems do not perform well and are not well accepted.
• Approach 3: build an entire DBMS tailored to XML.
– Still in the research phase.
Store XML in Ternary Relation
[Florescu, Kossmann 1999]
[Figure: tree rooted at &o1 with a paper edge to &o2; &o2 has title, author, author, and year edges to the leaves &o3 (“The Calculus”), &o4 (“…”), &o5 (“…”), and &o6 (“1986”).]
Ref
Source  Label   Dest
&o1     paper   &o2
&o2     title   &o3
&o2     author  &o4
&o2     author  &o5
&o2     year    &o6
Val
Node  Value
&o3   The Calculus
&o4   …
&o5   …
&o6   1986
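The Ref and Val tables can be populated by a simple tree walk. The sketch below uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules; it is one possible loader under stated assumptions, not the scheme from the cited paper. The function name `shred`, the oid numbering, and the author values "A" and "B" (the slide elides them) are all placeholders.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Load an XML document into Ref(source, label, dest) and Val(node, value)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Ref (source TEXT, label TEXT, dest TEXT)")
    conn.execute("CREATE TABLE Val (node TEXT, value TEXT)")
    counter = 1  # &o1 is reserved for the document root node

    def visit(element, parent_oid):
        nonlocal counter
        counter += 1
        oid = f"&o{counter}"
        # One Ref row per edge: parent --tag--> this element.
        conn.execute("INSERT INTO Ref VALUES (?, ?, ?)",
                     (parent_oid, element.tag, oid))
        if len(element) == 0:
            # Leaf element: record its text as a value.
            conn.execute("INSERT INTO Val VALUES (?, ?)",
                         (oid, element.text or ""))
        for child in element:
            visit(child, oid)

    visit(ET.fromstring(xml_text), "&o1")
    return conn

conn = shred("<paper><title>The Calculus</title><author>A</author>"
             "<author>B</author><year>1986</year></paper>")
# A path query becomes a join: find the value reached by a title edge.
print(conn.execute("SELECT v.value FROM Ref r JOIN Val v ON r.dest = v.node "
                   "WHERE r.label = 'title'").fetchall())
```

Each additional step in a path expression adds another self-join on Ref, which hints at why efficient processing is an open question.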
Use DTD to derive Schema
• DTD:
<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>
• ODMG classes:
class Employee public type tuple (name:string, address:Address, project:List(Project))
class Address public type tuple (street:string, …)
• [Christophides et al. 1994, Shanmugasundaram et al. 1999]
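The same mapping can be written with Python dataclasses, shown here as an informal analogue of the ODMG classes rather than the cited authors' algorithm. Treating every leaf element as a string is an assumption, as is the `Project` placeholder, since the DTD fragment does not declare project.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Address:
    # <!ELEMENT address (street, city, state, zip)>
    street: str
    city: str
    state: str
    zip: str

@dataclass
class Project:
    # Not declared in the DTD fragment; this placeholder is an assumption.
    name: str = ""

@dataclass
class Employee:
    # <!ELEMENT employee (name, address, project*)>
    name: str
    address: Address
    project: List[Project] = field(default_factory=list)  # project* becomes a list

e = Employee("X", Address("1 Main St", "Seattle", "WA", "98195"))
print(e)
```

A sequence in the content model becomes a tuple of fields, and a starred element becomes a list-valued field.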
The Future
• Many research problems remain:
– Efficient storage of XML
– How to leverage relational DBMS
– Update formalisms
– Processing streaming data
– Transactions
– Everything else we think about in databases.