Schema-Free XQuery

Based on the work of Yanyao Li, Cong Yu and H.V. Jagadish from the University of Michigan. Presented by Gil Barash in the course SDBI 05'.

Contents
– What is XQuery
– The problem of schema-based queries
– MLCAS
– Integrating MLCAS with XQuery
– Conclusion

XQuery

XQuery is an XML query language, sometimes referred to as the SQL of XML files. It is built on XPath expressions. It is supported by all major database engines. It will soon become a W3C standard.

XPath

XPath is used to navigate through XML documents.

In order to write an XQuery query, we should first get familiar with XPath…

Bibliography XML (version 1)

<bibliography>
  <bib>
    <year>1999</year>
    <book>
      <title>SQL</title> <author>Bob</author>
    </book>
    <article>
      <title>XML</title> <author>Mary</author>
    </article>
  </bib>
  …
</bibliography>

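[Figure: the same document as a tree. bibliography has two bib children. The first has year 1999, a book (title SQL, author Bob) and an article (title XML, author Mary). The second has year 2000, a book (title D.B., author David) and an article (title .NET, author Bill).]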

XPath - example

<bibliography>
  <bib>
    <year>1999</year>
    <book>
      <title>SQL</title> <author>Bob</author>
    </book>
    <article>
      <title>XML</title> <author>Mary</author>
    </article>
  </bib>
  <bib>
    …
  </bib>
</bibliography>

The expression /bibliography/bib/* will return the nodes <year>, <book> and <article>. Reading it piece by piece:
– / : look from the root of the document;
– bibliography/bib : under the path "bibliography/bib";
– /* : for all child nodes.
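Python's standard xml.etree.ElementTree module understands a limited subset of XPath, which is enough to try this expression. The sketch below assumes that the elided second bib holds the 2000 entries from the tree above. ElementTree evaluates paths relative to the root element, so /bibliography/bib/* is written as bib/*.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on "Bibliography XML (version 1)".
DOC = """<bibliography>
  <bib><year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
  <bib><year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)  # root is the <bibliography> element itself

# Equivalent of /bibliography/bib/* : all children of every <bib>.
children = [e.tag for e in root.findall("bib/*")]
print(children)
# ['year', 'book', 'article', 'year', 'book', 'article']
```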

XPath - example

The expression /bibliography//title will return both the titles "SQL" and "XML". Reading it piece by piece:
– /bibliography : for all child nodes of the root which are named "bibliography";
– // : look for any descendant (not only direct children);
– title : for the nodes named "title".

<bibliography>
  <bib>
    <year>1999</year>
    <book>
      <title>SQL</title> <author>Bob</author>
    </book>
    <article>
      <title>XML</title> <author>Mary</author>
    </article>
  </bib>
  <bib>
    …
  </bib>
</bibliography>
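The same module shows the difference between / and //. This sketch keeps only the first bib, as on the slide:

```python
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <bib><year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)

# /bibliography//title : ".//" searches all descendants, at any depth.
titles = [t.text for t in root.findall(".//title")]
print(titles)  # ['SQL', 'XML']

# A single "/" only looks at direct children, and no <title> is a
# direct child of <bib>, so this finds nothing.
direct = root.findall("bib/title")
print(direct)  # []
```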

XPath - example

<bibliography>
  <bib>
    <year>1999</year>
    <book>
      <title>SQL</title> <author>Bob</author>
    </book>
    <article>
      <title>XML</title> <author>Mary</author>
    </article>
  </bib>
  <bib>
    …
  </bib>
</bibliography>

The expression //bib[1] will return the subtree rooted at the first 'bib'. Reading it piece by piece:
– // : look anywhere in the document;
– bib[1] : for the first bib node.
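Strictly speaking, //bib[1] selects every bib that is the first bib child of its parent. All bib nodes here share the parent bibliography, so that is just the first one. ElementTree counts its [position] predicate the same way, among siblings with the same tag, so the expression can be checked directly in this sketch:

```python
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <bib><year>1999</year><book><title>SQL</title></book></bib>
  <bib><year>2000</year><book><title>D.B.</title></book></bib>
</bibliography>"""

root = ET.fromstring(DOC)

# //bib[1] : every <bib> that is the first <bib> child of its parent.
# Here all <bib> elements share one parent, so exactly one is selected.
first = root.findall(".//bib[1]")
print(len(first), first[0].find("year").text)  # 1 1999
```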

XQuery queries

Suppose we want to find the title of the book of which Mary is an author.

Our query will be:

for $x in doc("doc.xml")/bibliography/bib/book
where $x/author/text() = "Mary"
return $x/title

XQuery - example

for $x in doc("doc.xml")/bibliography/bib/book

For all subtrees (marked as $x) in the document "doc.xml" under the XPath /bibliography/bib/book.

where $x/author/text() = "Mary"

If in the subtree $x there is a path /author, and the text of the node at the end of the path is "Mary".

XQuery - example

return $x/title

Return the node which is under the path /title in the subtree $x.

Bibliography XML (version 1)

[Figure: the document tree, now with Mary as the author of both the 1999 book (SQL) and the 1999 article (XML). The 2000 bib holds a book (D.B., David) and an article (.NET, Bill).]

for $x in doc("doc.xml")/bibliography/bib/book
where $x/author/text() = "Mary"
return $x/title
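The FLWOR clauses can be mimicked step by step in Python to make their meaning concrete. This is an illustrative emulation over ElementTree, not an XQuery engine, and the data follows the tree above. As in XQuery, the = comparison in the where clause is existential: it holds if any author of $x is Mary.

```python
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <bib><year>1999</year>
    <book><title>SQL</title><author>Mary</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
  <bib><year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
</bibliography>"""

def titles_of_books_by(root, name):
    result = []
    # for $x in doc("doc.xml")/bibliography/bib/book
    for x in root.findall("bib/book"):
        # where $x/author/text() = name  (true if ANY author matches)
        if any(a.text == name for a in x.findall("author")):
            # return $x/title
            result.extend(t.text for t in x.findall("title"))
    return result

root = ET.fromstring(DOC)
print(titles_of_books_by(root, "Mary"))  # ['SQL']
```

The XML article is not returned even though Mary wrote it, because the path /bibliography/bib/book only reaches book elements.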

XQuery - example

Suppose we want to find the authors that wrote a book with Mary.

[Figure: the document tree. The 1999 bib holds a book (SQL, Mary) and an article (XML, Mary). The 2000 bib holds a single book (D.B.) with two authors, David and Bill.]

XQuery - example

Suppose we want to find the authors that wrote a book with Mary.

for $b in doc("doc.xml")/bibliography/bib/book, $a in $b/author
where $b/author/text() = "Mary" and $a/text() != "Mary"
return $a

XQuery - example

for $b in doc("doc.xml")/bibliography/bib/book, $a in $b/author

For all subtrees (marked as $b) in the document "doc.xml" under the XPath /bibliography/bib/book, and all subtrees (marked as $a) in the tree $b under the XPath /author.

Ahhh… $b is a book and $a is an author of the book.

XQuery - example

where $b/author/text() = "Mary" and $a/text() != "Mary"

If $b contains a path /author ending with "Mary", and $a isn't "Mary".

return $a

Return the subtree $a.
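The co-author query can be emulated in the same way. In the slides' data Mary's book has no other author, so this sketch uses hypothetical data in which Bob co-authors the SQL book:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: Bob is added as a co-author of Mary's SQL book.
DOC = """<bibliography>
  <bib><year>1999</year>
    <book><title>SQL</title><author>Mary</author><author>Bob</author></book>
  </bib>
  <bib><year>2000</year>
    <book><title>D.B.</title><author>David</author><author>Bill</author></book>
  </bib>
</bibliography>"""

def coauthors(root, name):
    result = []
    for b in root.findall("bib/book"):   # for $b in .../bib/book
        for a in b.findall("author"):    # $a in $b/author
            # where $b/author/text() = name and $a/text() != name
            if any(x.text == name for x in b.findall("author")) and a.text != name:
                result.append(a.text)    # return $a
    return result

print(coauthors(ET.fromstring(DOC), "Mary"))  # ['Bob']
```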

The Schema-Based problem

Remember the first query?

We wanted to find the title of a book of which Mary is an author.

We never said that the book would be under the path /bibliography/bib/book, yet the query depends on it:

for $x in doc("doc.xml")/bibliography/bib/book
where $x/author/text() = "Mary"
return $x/title

The Schema-Based problem

Furthermore…

Suppose we want to get the year of the book that Mary wrote…

<bibliography>
  <bib>
    <year>1999</year>
    <book>
      <title>SQL</title> <author>Mary</author>
    </book>
    <article> …

Notice that the year of the book IS NOT a descendant of the book node, but of the bib node.

The Schema-Based problem

Before (getting the title):

for $x in doc("doc.xml")/bibliography/bib/book
where $x/author/text() = "Mary"
return $x/title

After (getting the year):

for $x in doc("doc.xml")/bibliography/bib
where $x/book/author/text() = "Mary"
return $x/year

$x is now the bib node. If there exists a book written by Mary under that bib, then the year of that bib is returned.

The Schema-Based problem

We could never have written that query without knowledge about the structure of the XML file.

The query we wrote will not work on other files, even if they represent the same data under a different structure.

Bibliography XML (version 2)

<bibliography>
  <bib>
    <book>
      <year>1999</year>
      <title>SQL</title> <author>Bob</author>
    </book>
    <book>
      <year>2000</year>
      <title>D.B.</title> <author>David</author>
    </book>
  </bib>
  …
</bibliography>

[Figure: "Before" and "After" trees. In version 2, each book holds its own year (1999, 2000) next to its title and author (SQL, Bob; D.B., David).]

The Schema-Based problem

Our query (getting the year) from before:

for $x in doc("doc.xml")/bibliography/bib
where $x/book/author/text() = "Mary"
return $x/year

[Figure: the version 2 tree, where year (1999, 2000) sits inside each book rather than directly under bib.]

Run against version 2, the query returns nothing: $x is a 'bib' node, and it has no child named year.
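Running one emulated query against both layouts reproduces the failure. The sketch assumes that both versions hold the same hypothetical record (Mary as author of the 1999 SQL book), so that only the structure differs:

```python
import xml.etree.ElementTree as ET

# Same hypothetical data, two structures.
V1 = """<bibliography><bib><year>1999</year>
  <book><title>SQL</title><author>Mary</author></book>
</bib></bibliography>"""
V2 = """<bibliography><bib>
  <book><year>1999</year><title>SQL</title><author>Mary</author></book>
</bib></bibliography>"""

def year_of_book_by(root, name):
    result = []
    for x in root.findall("bib"):  # for $x in /bibliography/bib
        # where $x/book/author/text() = name
        if any(a.text == name for a in x.findall("book/author")):
            result.extend(y.text for y in x.findall("year"))  # return $x/year
    return result

print(year_of_book_by(ET.fromstring(V1), "Mary"))  # ['1999']
print(year_of_book_by(ET.fromstring(V2), "Mary"))  # []: bib has no year child
```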

3 kinds of people…

If the user has FULL knowledge of the structure, she can simply use XQuery.

If the user has NO knowledge of the structure, she can use keyword-based queries (like XKeyword).

If the user has PARTIAL knowledge of the structure, she can use schema-free queries and make good use of her knowledge.

Partial knowledge

Suppose you want to search for all the books about Albert Einstein… With a keyword-based search, you will enter the keyword "Albert Einstein".

Now, what if you want all the books written by Albert Einstein?

Your query will not change, even though you know what you are really looking for.

XQuery with partial knowledge

Suppose we want to find the title and year of the publications of which Mary is an author:

for $a in doc("doc.xml")//author,
    $b in doc("doc.xml")//title,
    $c in doc("doc.xml")//year
where $a/text() = "Mary"
return ($b, $c)

All we know are the names of the nodes which we are looking for.

XQuery with partial knowledge

[Figure: the document tree. The 1999 bib holds a book (SQL, Mary) and an article (XML, Mary). The 2000 bib holds a book (D.B., David) and an article (.NET, Bill).]

for $a in doc("doc.xml")//author,
    $b in doc("doc.xml")//title,
    $c in doc("doc.xml")//year
where $a/text() = "Mary"
return ($b, $c)
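The three for clauses range independently over every author, title and year. As a result, the query pairs each of Mary's author nodes with every title and every year in the document. An emulation over the data in the tree above shows how many rows that produces:

```python
import xml.etree.ElementTree as ET
from itertools import product

DOC = """<bibliography>
  <bib><year>1999</year>
    <book><title>SQL</title><author>Mary</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
  <bib><year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)

# for $a in //author, $b in //title, $c in //year : a full cross product.
rows = [
    (b.text, c.text)
    for a, b, c in product(root.iter("author"), root.iter("title"), root.iter("year"))
    if a.text == "Mary"  # where $a/text() = "Mary"
]
print(len(rows))                 # 16 = 2 Mary nodes x 4 titles x 2 years
print((".NET", "2000") in rows)  # True, although Mary did not write .NET
```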

Contents
– What is XQuery
– The problem of schema-based queries
– MLCAS
   – LCA
   – MLCA
   – MLCAS
– Integrating MLCAS with XQuery
– Conclusion

LCA

We would like to guess which part of the XML document is relevant for our search.

By reducing the XML tree, we would get more precise answers and avoid wrong ones.

[Figure: the document tree. The 1999 bib holds a book (SQL, Mary) and an article (XML, Mary). The 2000 bib holds a book (D.B., David) and an article (.NET, Bill).]

LCA: Lowest Common Ancestor

[Figure: the 1999 bib subtree: year 1999, a book (title SQL, author Bob) and an article (title XML, author Mary).]

What is the LCA of "title" and "author"?

LCA: Lowest Common Ancestor

[Figure: the 1999 bib subtree: year 1999, a book (title SQL, author Bob) and an article (title XML, author Mary).]

When "author" and "title" are taken from the same book, their LCA is "book", which is the root of the tree we should look within.

LCA: Lowest Common Ancestor

[Figure: the 1999 bib subtree: year 1999, a book (title SQL, author Bob) and an article (title XML, author Mary).]

When "author" and "title" come from different publications, their LCA is "bib", which doesn't help us refine our search.
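To find the LCA of two nodes, collect the ancestors of one node, then walk up from the other until you reach one of them. ElementTree elements do not know their parents, so this sketch first builds a child-to-parent map. The helper names parent_map, ancestors and lca are illustrative:

```python
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <bib><year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
</bibliography>"""

def parent_map(root):
    # Maps every element (except the root) to its parent.
    return {child: parent for parent in root.iter() for child in parent}

def ancestors(node, parents):
    # Returns [node, parent, grandparent, ..., root].
    chain = [node]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain

def lca(a, b, parents):
    # Walk up from b until reaching a node that is also an ancestor of a.
    a_side = set(ancestors(a, parents))
    for node in ancestors(b, parents):
        if node in a_side:
            return node
    return None  # a and b are not in the same tree

root = ET.fromstring(DOC)
parents = parent_map(root)
sql, bob = root.find("bib/book/title"), root.find("bib/book/author")
mary = root.find("bib/article/author")
print(lca(sql, bob, parents).tag)   # book
print(lca(sql, mary, parents).tag)  # bib
```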

MLCA

Blindly computing the LCA might bring undesired results. What we are looking for is:

MLCA: Meaningful Lowest Common Ancestor

Entity Type

The type of a node is its tag name.

[Figure: the 1999 bib subtree, with the nodes of the "title" type (SQL and XML) marked.]

Meaningfully Related

Consider two nodes "A" and "B", of types "T1" and "T2" respectively.

If one of the nodes is an ancestor of the other, we say that A and B are meaningfully related.

Otherwise, if A and B are both descendants of a node C, we say that A and B are related through C.

So far, this is much like LCA…

[Figure: in the first case, A and B lie on one path; in the second, A and B hang under their common ancestor C.]

Meaningfully Related

There is an exception to the second case. Suppose that node B* is of the same type as B:

[Figure: D is a common ancestor of A and B, while A and B* have a lower common ancestor C inside D. Example: an author and a title share a book node (C), while another title sits elsewhere under bib (D).]

In this case, nodes "A" and "B" are NOT meaningfully related.

MLCA

So we say that a node "D" is the MLCA of nodes "A" and "B" if:
– "D" is a common ancestor of nodes "A" and "B", and
– there is no node "C" that is the LCA of nodes of types "T1" and "T2" and is a descendant of node "D".

[Figure: the A, B, B*, C, D tree from the previous slide. D fails the second condition, because C lies below it.]
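The pairwise rule can be expressed as a check on top of the LCA. A and B are rejected when some node of B's type (a B*) has a common ancestor with A strictly below LCA(A, B), or when some node of A's type has one with B. This is a sketch of the rule as these slides state it, not the authors' implementation. It assumes that A and B have different tag names; otherwise A itself would count as a B*.

```python
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <bib><year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
</bibliography>"""

def parent_map(root):
    return {child: parent for parent in root.iter() for child in parent}

def ancestors(node, parents):
    chain = [node]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain

def lca(a, b, parents):
    a_side = set(ancestors(a, parents))
    return next(n for n in ancestors(b, parents) if n in a_side)

def strictly_below(node, anc, parents):
    return node is not anc and anc in ancestors(node, parents)

def mlca(a, b, root, parents):
    """Returns the MLCA of a and b, or None if they are not meaningfully related."""
    d = lca(a, b, parents)
    # Another node of b's type (B*) whose LCA with a lies below d...
    for b_star in root.iter(b.tag):
        if strictly_below(lca(a, b_star, parents), d, parents):
            return None
    # ...or another node of a's type whose LCA with b lies below d.
    for a_star in root.iter(a.tag):
        if strictly_below(lca(a_star, b, parents), d, parents):
            return None
    return d

root = ET.fromstring(DOC)
parents = parent_map(root)
sql, year = root.find("bib/book/title"), root.find("bib/year")
bob, mary = root.find("bib/book/author"), root.find("bib/article/author")
print(mlca(bob, sql, root, parents).tag)   # book
print(mlca(mary, sql, root, parents))      # None: Mary and XML share <article>
print(mlca(year, sql, root, parents).tag)  # bib
```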

MLCA

For multiple nodes, we require that all the subsets have an MLCA, and that the MLCA of the whole set is an ancestor of the MLCAs of the subsets.

[Figure: a bib with year 2000 and two books: D.B. by David and .NET by Bill.]

For example, if we are looking at the types year, title and author:
– bib is the MLCA of the types year, title and author;
– book is the MLCA of the types title and author.

MLCA

Let's try the query again…

for $a in doc("doc.xml")//author,
    $b in doc("doc.xml")//title,
    $c in doc("doc.xml")//year
where $a/text() = "Mary"
return ($b, $c)

Bibliography XML

[Figure: the two-bib tree. The 1999 bib holds a book (SQL, Bob) and an article (XML, Mary). The 2000 bib holds a book (D.B., David) and an article (.NET, Bill).]

"bib" is the MLCA of "author", "title" and "year". With "author" = Mary, the results are:
– year 1999, title SQL
– year 1999, title XML

for $a in doc("doc.xml")//author,
    $b in doc("doc.xml")//title,
    $c in doc("doc.xml")//year
where $a/text() = "Mary"
return ($b, $c)

MLCAS

The result of the query was almost right. The problem was that "bib" is the MLCA of several groups of nodes which satisfy the query.

To solve this, we use MLCAS: Meaningful Lowest Common Ancestor Structure.

[Figure: the 1999 bib subtree (book: SQL, Bob; article: XML, Mary) next to the two results (1999, SQL) and (1999, XML). Nodes requested: title, author, year.]

MLCAS

Given a set of types \(\{t_1, \dots, t_m\}\) from the query, an MLCAS is a set of nodes \(\{r, a_1, \dots, a_m\}\), where \(a_1, \dots, a_m\) are nodes matching the types \(t_1, \dots, t_m\), and \(r\) is the MLCA of \(\{a_1, \dots, a_m\}\).

MLCAS example

We are looking for the types author, title and year. For each set of nodes matching those types, we look for the MLCA of the set:

[Figure: the two-bib tree: bib[1] with year 1999 (SQL by Bob, XML by Mary) and bib[2] with year 2000 (D.B. by David, .NET by Bill).]

– {David, SQL, 1999}: there is none. The bib nodes are the MLCA of the types author, title and year, but bibliography is the LCA of the nodes David, SQL and 1999. So this set isn't good for us.
– {Mary, SQL, 1999}: there is none. book is the MLCA of the types title and author, but bib is the LCA of the nodes Mary and SQL. So this set isn't good for us either.
– {Bob, SQL, 1999}: bib[1]. So this set is good for us.
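Extending the pairwise check to sets gives a brute-force way to enumerate MLCAS structures. The procedure:
– try every combination of one node per requested type;
– keep a combination only if every pair in it is meaningfully related;
– take the LCA of the whole combination as its root r.

The LCA of the whole set is automatically an ancestor of each pairwise MLCA. This is a sketch with three stated assumptions: it checks pairs only, it assumes distinct tag names, and it is exponential in the number of types. It is meant for reading, not for large documents.

```python
import xml.etree.ElementTree as ET
from itertools import combinations, product

DOC = """<bibliography>
  <bib><year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
  <bib><year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
</bibliography>"""

def parent_map(root):
    return {child: parent for parent in root.iter() for child in parent}

def ancestors(node, parents):
    chain = [node]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain

def lca(a, b, parents):
    a_side = set(ancestors(a, parents))
    return next(n for n in ancestors(b, parents) if n in a_side)

def related(a, b, root, parents):
    # Pairwise rule from the MLCA slides (assumes a.tag != b.tag).
    d = lca(a, b, parents)
    lower = [lca(a, s, parents) for s in root.iter(b.tag)]   # a with every B*
    lower += [lca(s, b, parents) for s in root.iter(a.tag)]  # every A* with b
    return not any(n is not d and d in ancestors(n, parents) for n in lower)

def mlcas(root, *tags):
    """Returns (r, nodes) for every combination of one node per tag whose
    pairs are all meaningfully related; r is the LCA of the whole combination."""
    parents = parent_map(root)
    found = []
    for nodes in product(*(list(root.iter(t)) for t in tags)):
        if all(related(a, b, root, parents) for a, b in combinations(nodes, 2)):
            r = nodes[0]
            for n in nodes[1:]:
                r = lca(r, n, parents)
            found.append((r, nodes))
    return found

root = ET.fromstring(DOC)
for r, nodes in mlcas(root, "author", "title", "year"):
    print(r.tag, [n.text for n in nodes])
# bib ['Bob', 'SQL', '1999']
# bib ['Mary', 'XML', '1999']
# bib ['David', 'D.B.', '2000']
# bib ['Bill', '.NET', '2000']
```

The sets the slide rejects, {David, SQL, 1999} and {Mary, SQL, 1999}, do not appear, while {Bob, SQL, 1999} comes out with root bib[1].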

MLCAS query example

for $a in doc("doc.xml")//year,
    $b in doc("doc.xml")//title,
    $c in doc("doc.xml")//author
where $c/text() = "Mary"
return ($a, $b)

[Figure: the 1999 bib subtree split into its two MLCAS structures, each rooted at bib: {1999, SQL, Bob} and {1999, XML, Mary}.]

Under MLCAS, only the second structure has "author" = Mary, so the query returns year 1999 with title XML.

Other work on creating meaningful results

"Integrating Keyword Search into XML Query Processing (XML-QL)" - Daniela Florescu and Ioana Manolescu from INRIA Rocquencourt, France, and Donald Kossmann from the University of Passau, Germany.
– Use of hierarchical location in the XML (at what level the keyword should be).
– Use of semantic location in the XML (tag name, CDATA, attribute …).
– Use of the user's knowledge of the structure of the XML file (e.g., if she knows that books are under the bib tag, she can ask for those elements only).

Other work on creating meaningful results

"XSEarch: A Semantic Search Engine for XML" - Sara Cohen, Jonathan Mamou, Yaron Kanza and Yehoshua Sagiv from the Hebrew University.
– Enables the user to specify a tag name under which the keyword should be found.
– Uses the fact that if the shortest path between two elements goes through the same tag name more than once, they are probably not meaningfully related.
– Gives a ranking to the results.

[Figure: a bib with two books, D.B. by David and .NET by Bill; the path between the two books' nodes passes through the tag book twice.]

Contents
– What is XQuery
– The problem of schema-based queries
– MLCAS
– Integrating MLCAS with XQuery
   – mlcas
   – expand
– Conclusion

Integrating MLCAS with XQuery

In order to integrate MLCAS into XQuery, we will introduce a new function into XQuery: mlcas (surprising, isn't it?).

Whenever we want to make sure that the nodes exist in an MLCAS, we will add the condition exists mlcas($a, $b, $c). Note that exists is already part of XQuery.

Query example number 1

Find the title and year of the publications of which Mary is an author.

for $a in doc("doc.xml")//author,
    $b in doc("doc.xml")//title,
    $c in doc("doc.xml")//year
where $a/text() = "Mary" and exists mlcas($a, $b, $c)
return ($b, $c)

This will make sure that the "author", "title" and "year" that we get really belong to the same publication.
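The exists mlcas(...) condition acts as a filter on the cross product of the for clauses. In this sketch, exists_mlcas is a hypothetical Python stand-in that applies the pairwise rule from the MLCA slides, assuming distinct tag names. It is not part of any XQuery implementation.

```python
import xml.etree.ElementTree as ET
from itertools import combinations, product

DOC = """<bibliography>
  <bib><year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
  <bib><year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
PARENTS = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    chain = [node]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain

def lca(a, b):
    a_side = set(ancestors(a))
    return next(n for n in ancestors(b) if n in a_side)

def related(a, b):
    # Pairwise rule (assumes a.tag != b.tag): reject if a same-type
    # substitute yields a common ancestor strictly below lca(a, b).
    d = lca(a, b)
    lower = [lca(a, s) for s in root.iter(b.tag)] + [lca(s, b) for s in root.iter(a.tag)]
    return not any(n is not d and d in ancestors(n) for n in lower)

def exists_mlcas(*nodes):
    # Hypothetical stand-in for "exists mlcas($a, $b, ...)": every pair must be related.
    return all(related(a, b) for a, b in combinations(nodes, 2))

# for $a in //author, $b in //title, $c in //year
# where $a/text() = "Mary" and exists mlcas($a, $b, $c)
# return ($b, $c)
rows = [
    (b.text, c.text)
    for a, b, c in product(root.iter("author"), root.iter("title"), root.iter("year"))
    if a.text == "Mary" and exists_mlcas(a, b, c)
]
print(rows)  # [('XML', '1999')]
```

Without the mlcas condition, this data gives 8 rows for author = Mary (1 Mary node × 4 titles × 2 years). Only the row built from Mary's own publication survives the filter.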

Query example number 2

Find additional authors of the publications of which Mary is an author.

for $a in doc("doc.xml")//author,
    $b in doc("doc.xml")//author
where $a/text() = "Mary" and $a != $b and exists mlcas($a, $b)
return $b

This will make sure that both authors really belong to the same publication.

Query example number 3

Find the year and author of the publications with titles similar to the title of a publication of which Mary is an author.

for $a in doc("doc.xml")//author,
    $t in doc("doc.xml")//title,
    $y in doc("doc.xml")//year,
    $t2 in (
      for $aM in doc("doc.xml")//author,
          $tM in doc("doc.xml")//title
      where $aM/text() = "Mary" and exists mlcas($aM, $tM)
      return $tM
    )
where $t ≈ $t2 and exists mlcas($y, $a, $t)
return ($y, $a)

Not integrated enough?

A user who wants to use the MLCAS feature will have to add the line and exists mlcas($a, $b, …) to the where clause.

This might not be simple enough, especially when changing an already existing query.

The mlcas keyword

The keyword mlcas will be used to ask the system to use MLCAS when choosing nodes:

for $a in mlcas doc("doc.xml")//author,
    $b in mlcas doc("doc.xml")//title
where $a/text() = "Mary"
return $b

This has the same effect as adding and exists mlcas($a, $b) to the where clause.

Some we know

Suppose you do know that you are interested only in the first 'bib' node. You can make use of your knowledge…

[Figure: the two-bib tree. The 1999 bib holds a book (SQL, Bob) and an article (XML, Mary). The 2000 bib holds a book (D.B., David) and an article (.NET, Bill).]

Some we know

for $b in doc("doc.xml")//bib[1],
    $a in mlcas $b//author,
    $t in mlcas $b//title
return ($a, $t)

[Figure: the same two-bib tree; only the first bib (1999) is searched.]



Content

What is XQuery
The problem of Schema-Based queries
MLCAS
Integrating MLCAS with XQuery
  – mlcas
  – expand
Conclusion


Many ways to say…

There are different tag names that represent the same thing:
  – Author: Author / Writer / Au
  – Title: Title / Name / Headline

Fewer than 20% of people choose the same term for a single well-known object.

Even with MLCAS, our partial knowledge of the XML file still has to be accurate about how it tags the information we want.


The expand keyword

To solve this issue, we include yet another keyword: expand. Whenever we are not sure of the exact tag name, we can use the expand keyword to find it for us:

    for $a in mlcas doc("doc.xml")//expand(author),
        $b in mlcas doc("doc.xml")//title
    where $a/text() = "Mary"
    return $b
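One way expand(author) might be realised is sketched below in Python, assuming a small hand-written synonym table built from the Author/Title examples above. The table and the helper names `SYNONYMS`, `expand` and `find_expanded` are hypothetical; the sketch only widens the tag test of a `//name` step (case-insensitively) and does not apply MLCAS.

```python
import xml.etree.ElementTree as ET

# Hypothetical thesaurus using the synonyms listed above (lower-case).
SYNONYMS = {
    "author": {"author", "writer", "au"},
    "title": {"title", "name", "headline"},
}

def expand(tag):
    """Tag names that expand(tag) should accept; unknown tags match only themselves."""
    return SYNONYMS.get(tag.lower(), {tag.lower()})

def find_expanded(root, tag):
    """Rough analogue of //expand(tag): every element whose tag is a synonym of tag."""
    names = expand(tag)
    return [e for e in root.iter() if e.tag.lower() in names]

doc = ET.fromstring(
    "<bib>"
    "<book><Headline>SQL</Headline><Writer>Bob</Writer></book>"
    "<article><title>XML</title><Au>Mary</Au></article>"
    "</bib>"
)
print([e.text for e in find_expanded(doc, "author")])  # ['Bob', 'Mary']
print([e.text for e in find_expanded(doc, "title")])   # ['SQL', 'XML']
```

A real system would replace the fixed table with a domain thesaurus lookup; the matching step stays the same.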


The expand keyword

The synonyms of a word can be found using a domain-specific thesaurus (developed by domain experts) or a general one such as WordNet.

Another application is an ontology-driven hierarchical thesaurus. For example, use the word “publication” to get both “book” and “article” tags.

Think of other applications where this can be useful (Google?).
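A hierarchical thesaurus can be sketched as a map from each term to its narrower terms, with expansion collecting everything below a term. The ontology fragment here is hypothetical: only publication → book, article comes from the slide, and "textbook" is an invented extra level added to show the recursion. The helper name `expand_hierarchy` is also an assumption, and the sketch assumes the hierarchy contains no cycles.

```python
# Hypothetical ontology fragment: term -> narrower terms.
# "publication" -> book, article is from the slide; "textbook" is invented.
NARROWER = {
    "publication": ["book", "article"],
    "book": ["textbook"],
}

def expand_hierarchy(term, narrower):
    """Return term followed by all terms below it (depth-first); assumes no cycles."""
    result = [term]
    for child in narrower.get(term, []):
        result.extend(expand_hierarchy(child, narrower))
    return result

print(expand_hierarchy("publication", NARROWER))
# ['publication', 'book', 'textbook', 'article']
```

The resulting list of tag names could then serve as the set of names an expand(publication) step matches.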


Ontology-based Query Processing

"An Ontology for Domain-oriented Semantic Similarity Search on XML Data" by Anja Theobald, University of the Saarland, Germany:
  – Uses tag-name and keyword similarity.
  – Uses WordNet and Google to rank how similar objects are:
      WordNet is used to get synonyms or broader terms.
      Google is used to rank how close two terms are.
  – Gives a ranking to the results.


Ontology-based Query Processing

Taken from “The Index Based XXL Search Engine for Querying XML Data with Relevance Ranking” by Anja Theobald and Gerhard Weikum, University of the Saarland, Germany.


Ontology-based Query Processing (taken from a presentation by Anja Theobald, 26.02.03)

XXL query:

    ... WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star"

[Figure: the XXL query representation (nodes ~universe, ~appearance and ~"star", connected by wildcard paths) is matched against an XML data graph containing the nodes galaxy, object, sun, description, appearance, location and history, and the text "…light and heat…". The scores shown include sim(universe, galaxy) = 0.94, sim(app, app) = 1.0 and sim(star, sun) * tfidf(sun) = 0.43.]


Content

What is XQuery
The problem of Schema-Based queries
MLCAS
Integrating MLCAS with XQuery
Conclusion


Conclusion

We wanted to find a way to get accurate results from an XML file whose structure we don't know.

We used the MLCAS concept to get meaningful results.

We integrated this ability into an already existing query language.


Thank you

Questions?


Computing MLCAS

One could implement MLCAS computation directly from the definition of MLCAS:
  – "D" is an MLCA for nodes "A" and "B" of types "T1" and "T2" respectively if:
      "D" is a common ancestor of nodes "A" and "B".
      There is no node "C" that is the LCA of some pair of nodes of types "T1" and "T2" and is a descendant of node "D".

Take each pair {n1, n2} where "n1" and "n2" are of types "T1" and "T2" respectively. Find their LCA by going up from both nodes until you find a common ancestor, and produce a tree rooted at the LCA with n1 and n2 as its leaves.

For each pair of trees that you found (TA and TB), if the root of TA is a descendant of the root of TB, remove TB.
  – Because TB contradicts the second rule: there is no node "C" that is the LCA of types "T1" and "T2" which is a descendant of node "D".
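The two steps above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a naive, illustrative implementation (not the authors'), assuming the document fits in memory: it builds a child-to-parent map, computes the LCA of every (T1, T2) pair by walking upward, and then discards every candidate tree whose root has another candidate's root strictly below it. The helper names are hypothetical, and the sample document is an assumed reconstruction of the bibliography example, with each book holding the first title/author pair and each article the second.

```python
from itertools import product
import xml.etree.ElementTree as ET

def path_to_root(node, parents):
    """Return [node, parent, grandparent, ..., document root]."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    return path

def lca(a, b, parents):
    """Go up from b until we hit a node on a's path to the root."""
    on_a_path = set(path_to_root(a, parents))
    node = b
    while node not in on_a_path:
        node = parents[node]
    return node

def is_proper_descendant(node, ancestor, parents):
    """True if ancestor lies strictly above node."""
    return any(n is ancestor for n in path_to_root(node, parents)[1:])

def mlcas(root, t1, t2):
    """Return (lca, n1, n2) triples, one small tree per meaningful pair."""
    parents = {child: parent for parent in root.iter() for child in parent}
    # Step 1: one tree per (T1, T2) pair, rooted at the pair's LCA.
    trees = [(lca(n1, n2, parents), n1, n2)
             for n1, n2 in product(root.iter(t1), root.iter(t2))]
    roots = [d for d, _, _ in trees]
    # Step 2: drop tree TB if some tree TA's root lies strictly below TB's root.
    return [(d, n1, n2) for d, n1, n2 in trees
            if not any(is_proper_descendant(r, d, parents) for r in roots)]

# Assumed reconstruction of the bibliography example.
DOC = """<bibliography>
  <bib><year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
  <bib><year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
for d, a, t in mlcas(root, "author", "title"):
    print(d.tag, a.text, t.text)
# book Bob SQL
# article Mary XML
# book David D.B.
# article Bill .NET
```

Because it enumerates every (T1, T2) pair and compares candidate roots pairwise, this sketch mirrors the definition rather than aiming for efficiency.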