1 a semantic approach to rewriting queries for integrated xml data xia yang 1, mong li lee 1, tok...

41
1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1 , Mong Li Lee 1 , Tok Wang Ling 1 , Gillian Dobbie 2 1 School of Computing, National University of Singapore 2 Department of Computer Science, The University of Auckland, New Zealand

Upload: miranda-todd

Post on 18-Jan-2018

218 views

Category:

Documents


0 download

DESCRIPTION

3 Outline Background Preliminary Definitions Query Rewriting Comparison with Related Work Conclusion

TRANSCRIPT

Page 1: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

1

A Semantic Approach to Rewriting Queries for Integrated XML Data

Xia Yang1, Mong Li Lee1, Tok Wang Ling1, Gillian Dobbie2

1School of Computing, National University of Singapore2Department of Computer Science, The University of Auckland, New Zealand

Page 2: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

2

The Problem

S1 S2 S12

Local schema S1, S2, Integrated schema S12

.

b o o k

is b n a u th o r t it le+

b o o k

is b n a u th o r y e a r+

b o o k

is b n a u th o r y e a r+

t it le

Return the title of the books in the integrated databasewhere the authors name is “Tom” and the publication year is “2000”

Page 3: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

3

Outline

Background Preliminary Definitions Query Rewriting Comparison with Related Work Conclusion

Page 4: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

4

Background In data integration, many systems construct a global or

mediated schema from numerous heterogeneous data sources.

The local sources are XML repositories. When XML repositories are involved in data integration, query

rewriting algorithms will need to take into consideration the hierarchical structures of XML schemas.

XML schema languages such as DTD and XML Schema lack the necessary semantic information.

Our approach utilizes the ORA-SS model which provides the semantic information needed for the query rewriting process.

Page 5: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

5

Preliminary Definitions

XML Schema Model ORA-SS schema diagrams

XML Query Language XQuery

Integrating ORA-SS Schemas Mapping table Query allocation table

Page 6: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

6

ORA-SS model The ORA-SS model (Object-Relationship-Attribute model for SemiStructured

data)[1] is a semantically rich data model that has been designed for semistructured data.

The ORA-SS model distinguishes between object classes, relationship types and attributes.

The main contributions are that: relationship type is expressed explicitly the degree of the relationship type expresses the number of object classes involved in the

relationship type the attributes are classified as attributes of object class and attributes of relationship type.

The semantically rich ORA-SS model makes it possible to get the correct result in query rewriting for integrated XML data.

1. Tok Wang Ling, Mong Li Lee, and Gillian Dobbie. Semistructured Database Design. Springer, 2005

Page 7: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

7

ORA-SS model example

An example of ORA-SS schema diagram

p ar t

s u p p lie rp n o

s n o q u a n t ity

jp s ,3,1:n ,1:n

jp s

p r o jec t

jn ojp ,2,1:n ,1:n

f u n d s

u n o

p r o jec t m an ag er

n a memn o

jp m,2,1:n ,1:n

jf,2,1:n ,1:n

Page 8: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

8

Integrating ORA-SS Schemas Integrating ORA-SS Schemas

The algorithm to correctly integrate ORA-SS schemas was presented in a previous ER conference [2].

S1 S2 S12

Local schema S1, S2, Integrated schema S12

2. X. Yang, M.L. Lee, T.W. Ling. Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach. ER 2003.

b o o k

is b n a u th o r t it le+

b o o k

is b n a u th o r y e a r+

b o o k

is b n a u th o r y e a r+

t it le

Page 9: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

9

Mapping table

A mapping table is created during the process of integrating the various local schemas S1, S2, …, Sn into an integrated schema S.

The mapping table contains the mappings between the object classes and attributes in the local source schemas and the object classes and attributes in the integrated schema.

Page 10: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

10

Mapping table example

S5 Integrated Schema S12345

p ain tin g

p n a me a rt is t

m u s eu m

ar t is tmn a me

a n a me

f u n d s

fn o

ma ,2,1:n ,1:nmf,2,1:n ,1:n

m u s eu m

p ain tin g

mn a me

p n a me ar tis t

a n a me

s c u lp tu r e

s n a mear t is t

a n a me

ar tis t

a n a me

s p o n s o r

s n a me f u n d s

fn o

mp ,2,1:n ,1:1 ms ,2,1:n ,1:1 ma ,2,1:n ,1:n

mo ,2,1:n ,1:n

p a ,2,1:n ,1:n s a ,2,1:n ,1:n o f,2,1:n ,1:n

ar tis t

p a in tin ga n a me

p n a me

p a ,2,1:n ,1:n

m u s eu m

mn a me s c u lp tu r e

s n a me ar t is t

a n a me

s p o n s o r

s n a me f u n d s

fn o

ms ,2,1:n ,1:1 mo ,2,1:n ,1:n

s a ,2,1:n ,1:n o f,2,1:n ,1:n

m u s eu m

p a in tin gmn a me

p n a me

mp ,2,1:n ,1:1

S1 S2 S3 S4

Page 11: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

11

Mapping table exampleIntegrated schema Local schemaS12345/museum S1/museum, S3/museum, S5/museum

S12345/museum/mname S1/museum/mname, S3/museum/mname, S5/museum/mname

S12345/museum/painting S1/museum/painting, S2/painting, S4/artist/painting

S12345/museum/painting/pname S1/museum/painting/pname, S2/painting/pname, S4/artist/painting/pname

S12345/museum/painting/artist S2/painting/artist, S4/artist

S12345/museum/painting/artist/aname

S2/painting/artist, S4/artist/aname

S12345/museum/sponsor/funds S3/museum/funds, S5/museum/sponsor/funds

… …

Page 12: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

12

Query allocation table The QAT is derived from the mapping table. A QAT is created for each query. The query allocation table (QAT) stores the

selection condition paths, the return result paths, and the local schemas where the data for these paths can be found.

The QAT has two parts. The first part contains the selection conditions, and the second part contains the return result paths.

The QAT is automatically generated by our algorithm.

Page 13: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

13

Query allocation table example

S1 S2 S3 S4 S5 S12345

S12345 is the integrated schema of local schemas S1, S2, S3, S4, S5.

Query Q3: for $b in /book where $b/author=”Tom” and $b/year=”2000” return <result> {$b/year/text()} {$b/title/text()} </result>

b o o k

is b n a u th o r y e a r t it le

p u b lis h er

n a me lo c a t io n

+b o o k

is b n a u th o r t it le+

b o o k

is b n a u th o r+

b o o k

is b n a u th o r y e a r+

b o o k

is b n y e a r

p u b lis h e r

n a me a d d re s s

b o o k

is b n

p o s ta l c o d e

Page 14: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

14

Query allocation table example

Query Allocation Table for Query Q3Selection Condition Table:

/book/author S1, S2, S3

/book/year S3, S4

Return Result Table:

/book/yearS3, S4

/book/title S1

Page 15: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

15

Query Rewriting

Step1: Build the query allocation table. Step 2: Group local schemas to form join

groups that answer the user’s query. Step 3: Decompose the user query to

subqueries on the local sources. Step 4: Compose the subqueries from local

schemas in a join group.

Page 16: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

16

Step1: Build the query allocation table.

A query allocation table (QAT) consists of a selection condition table and a return result table.

The path of each selection condition and the return result is inserted into the selection condition table and the return result table respectively. The associated schemas identified from the mapping table are inserted into the corresponding rows.

Page 17: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

17

Step 2: Group local schemas to form join groups that answer the user’s query.

The algorithm finds which local schemas must be combined to get the expected results.

This is done by scanning the QAT to find the join groups.

The local schemas in each join group must contain all the paths required for the selection condition but may be missing some of the paths for the return result. This is possible because the result of a query can be incomplete.

Page 18: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

18

Step 2: Example demonstrating finding the sources from QAT that answer user’s query

S1 S2 S3 S4 S5 S12345

S12345 is the integrated schema of local schemas S1, S2, S3, S4, S5.

Query Q3: for $b in /book where $b/author=”Tom” and $b/year=”2000” return <result> {$b/year/text()} {$b/title/text()} </result>

b o o k

is b n a u th o r y e a r t it le

p u b lis h er

n a me lo c a t io n

+b o o k

is b n a u th o r t it le+

b o o k

is b n a u th o r+

b o o k

is b n a u th o r y e a r+

b o o k

is b n y e a r

p u b lis h e r

n a me a d d re s s

b o o k

is b n

p o s ta l c o d e

Page 19: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

19

Step 2: Example demonstrating finding the sources from QAT that answer user’s query

Output join group: {S1, S3} {S1, S4} {S2, S4} {S3}Note: {S2, S4} {S3} are join groups, even though they

cannot offer full results of the query. We allow the return results with missing information.

Query Allocation Table for Query Q3Selection Condition Table:

/book/author S1, S2, S3

/book/year S3, S4

Return Result Table:

/book/year S3, S4

/book/title S1

Page 20: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

20

Step 3: Decompose the user query to subqueries on the local sources

This step decomposes the user query into queries on the local schema based on the join groups.

Since subqueries are later composed to compute the answers in the same join group, the subquery must return the data asked for in the user query and also the data necessary to join the parts of the answers from different local schemas together. We call the classes necessary for joining the parts of answers, join object classes.

Page 21: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

21

Step 3: Decompose the user query to subqueries on the local sources

The key of the join object class is used for testing the equivalence when joining the subqueries.

Finding join object classes Case 1: For a join group, if there are n paths in the

QAT from different local schemas with a common ancestor in the user query, then the least common ancestor in the user query is a join object class.

There are two other cases that we will not consider in this talk.

Page 22: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

22

Step 3: Example demonstrating decomposing the user query to subqueries on the local sources (case1)

S1 S2 S3 S4 S5 S12345

S12345 is the integrated schema of local schemas S1, S2, S3, S4, S5.

Query Q3: for $b in /book where $b/author=”Tom” and $b/year=”2000” return <result> {$b/year/text()} {$b/title/text()} </result>

b o o k

is b n a u th o r y e a r t it le

p u b lis h er

n a me lo c a t io n

+b o o k

is b n a u th o r t it le+

b o o k

is b n a u th o r+

b o o k

is b n a u th o r y e a r+

b o o k

is b n y e a r

p u b lis h e r

n a me a d d re s s

b o o k

is b n

p o s ta l c o d e

Page 23: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

23

Step 3: Example demonstrating decomposing the user query to subqueries on the local sources (case1)

Recall the join group {S1, S3}, S1 provides “/book/title”, “/book/author” and S3 provides “/book/year”, “/book/author”.

To answer the query Q3, the subqueries from S1 and S3 need to be composed using the key of their least common ancestor i.e. the key “isbn” of the join object class “book”.

Page 24: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

24

Step 4: Compose the subqueries from local schemas in a join group

When joining subqueries on local schemas in the same join group, the identifier of the join object classes must be tested for equivalence.

We start by considering the basic case where the same object attributes are from different local schemas. To compose subqueries from these local schemas in join groups, the for, where, and return clause are combined together with the join condition equivalence test inserted in the where clause.

We allow the return results to have missing information. The parent object will not be removed from the return result, if it has a missing child. For each return object or attribute, the join equivalence condition test related to this return object or attribute is nested in the appropriate part of the query.

Page 25: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

25

Step 4: Example of composing the subqueries from local schemas in a join group

S1 S2 S3 S4 S5 S12345

S12345 is the integrated schema of local schemas S1, S2, S3, S4, S5.

Query Q9: for $b in /book where $b/author=”Tom” and $b/year=”2000” return<result>{$b/year/text()} {$b/title/text()}{

for $p in $b/publisher where contains ($b/publisher/location/text(),”Singapore”) return<publisher> {$b/publisher/name} </publisher> }

</result>

b o o k

is b n a u th o r y e a r t it le

p u b lis h er

n a me lo c a t io n

+b o o k

is b n a u th o r t it le+

b o o k

is b n a u th o r+

b o o k

is b n a u th o r y e a r+

b o o k

is b n y e a r

p u b lis h e r

n a me a d d re s s

b o o k

is b n

p o s ta l c o d e

Page 26: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

26

Step 4: Example of composing the subqueries from local schemas in a join group

The join groups are {S1, S3, S5}, {S1, S4, S5}, {S2, S4, S5} and {S3, S5}.

We show the query example for the join group {S1, S3, S5}. The user query is decomposed into subqueries on the local schemas S1, S3, and S5. The join object class is “book” for these local schemas.

Page 27: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

27

Step 4: Example of composing the subqueries from local schemas in a join group

The subqueries on S1, S3 and S5 are shown below: Query Q9_S1:

for $b in /book where $b/author=”Tom” return <result> {$b/isbn/text()} {$b/title/text()} </result>

Query Q9_S3: for $b in /book

where $b/author=”Tom” and $b/year=”2000” return <result> {$b/isbn/text()} {$b/year/text()} </result>

Query Q9_S5: for $b in /book where contains ($b/publisher/address/text(),”Singapore”)

return<result>{$b/isbn/text()} <publisher>{$b/publisher/name} </publisher></result>

Page 28: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

28

Step 4: Example of composing the subqueries from local schemas in a join group

The composition of the subqueries for local schemas S1, S3 and S5 are as follows:

for $b1 in doc(“S1.xml”)/book, $b3 in doc(“S3.xml”)/bookwhere $b1/author=”Tom” and $b3/author=”Tom” and $b3/year=”2000” and $b1/isbn=$b3/isbnreturn <result>{$b3/year/text()} {$b1/title/text()}

{for $b5 in doc(“S5.xml”)/book where contains ($b5/publisher/address/text(),”Singapore”) and $b5/isbn=$b1/isbnreturn<publisher>

{$b5/publisher/name}<publisher>}</result>

Note that although the join object class for S1, S3 and S5 is book, the equivalence tests are on separate lines in the rewritten query. This is because the parent information should be returned even when a child object class is missing.

Page 29: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

29

Comparison with Related Work Amman et al. [3] propose a mediator architecture for

querying and integrating XML data sources. Lakshmanan and Sadri [4] propose an infrastructure for

interoperability among XML data sources. Yu and Popa [5] introduce an algorithm for answering

queries via a target schema.

3. B. Amann, C. Beeri, I. Fundulaki, M. Scholl. Querying XML sources using an Ontology-based Mediator. CoopIS, 2002.

4. L.V.S. Lakshmanan, F. Sadri. Interoperability on XML Data. ICSW, 2003.5. C. Yu, L. Popa. Constraint-Based XML Query Rewriting for Data Integration. SIGMOD 2004.

Page 30: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

30

Comparison with Related Work

The models that these authors use cannot represent whether a relationship type is binary or n-ary and do not distinguish between attributes of object classes and attributes of relationship types from the local XML sources. The lack of such semantic information can lead to the retrieval of wrong results.

Page 31: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

31

Comparison with Related Work

S1 S2 S12S12 is the integrated schema of S1, S2

Query Q1: for $j in /project return <project> {$j/jno}

{for $p in $j/part return <part>{$p/pno} {for $s in $p/supplier return {$s}} </part>} </project>

p ro jec t

p ar tjn o

p n o s u p p lier

s n o

jp ,2,1:n ,1:n

jp s ,3,1:n ,1:n

q u a n t ity

jp ss u p p lie r 2

s n o

p s ,2,1:n ,1:n

p r o jec t

p ar tjn o

p n o s u p p lie r

s n o

jp ,2,1:n ,1:n

jp s ,3,1:n ,1:n

q u a n t it y

jp s

p r o jec t

s u p p lierjn o

s n o p ar t

p n o

js ,2,1:n ,1:n

jp s ,3,1:n ,1:n

q u a n t ityjp s

Page 32: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

32

Comparison with Related Work (example)X1: (data source for S1) X2: (data source for S2)

<project jno=”j01”> <part pno=”p01”> <supplier sno=”s01”> <quantity> 100 </quantity> </supplier> </part></project>

<project jno=”j02”> <supplier sno=”s02”> <part pno=”p01”> <quantity> 200 </quantity> </part> </supplier></project>

Page 33: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

33

Comparison with Related Work (example)Results obtained by our proposed algorithm Result obtained in related work

<result> <project jno=”j01”> <part pno=”p01”> <supplier sno=”s01”> <quantity> 100 </quantity> </supplier> </part> </project> <project jno=”j02”> <part pno=”p01”> <supplier sno=”s02”> <quantity> 200 </quantity> </supplier> </part> </project></result>

<result> <project jno=”j01”> <part pno=”p01”> <supplier sno=”s01”> <quantity> 100 </quantity> </supplier> <supplier sno=”s02”> <quantity> 200 </quantity> </supplier> </part> </project> <project jno=”j02”> <part pno=”p01”> <supplier sno=”s01”> <quantity> 100 </quantity> </supplier> <supplier sno=”s02”> <quantity> 200 </quantity> </supplier> </part> </project></result>

Page 34: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

34

Comparison with Related Work

The results returned by the query rewriting method in related work contain the project with jno “j01” has part “p01”, which is supplied by suppliers with sno “s01” and “s02”. This violates the local data sources X1 and X2, where the project with jno “j01” has part “p01” is only supplied by suppliers with sno “s01”.

Page 35: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

35

Comparison with Related Work This is because the methods in the related work

treat the relationship type between part and supplier as a binary relationship type, instead of the intended ternary relationship type involving project, part, and supplier. They treat the quantity as the attribute of part in S2, so when they find the part with pno “p01” has quantity “100” in X1, and has quantity “200” in X2, they will combine them to make the final result. This leads to the wrong answer returned.

In contrast, our algorithm takes the XML hierarchy structure into consideration and retrieves the expected answer.

Page 36: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

36

Conclusion We develop an algorithm for rewriting queries, using

the ORA-SS schema diagram that takes the semantic relationship between the source schemas and the integrated schema into account.

This allows us to distinguish between binary and n-ary relationship types, attributes of object classes and attributes of relationship types, and in turn treat these cases differently in our rewriting algorithm.

Page 37: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

37

Questions?

Page 38: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

38

Step 2: Algorithm that finds the sources from QAT that answer user’s query Algorithm GenerateJoinGroups Input: Query allocation table qat; Output: join groups1. create an empty list lt; 2. for i=1 to num_of_row of qat3. for j=1 to num_of_schema_id of row i4. if schemaij is present in the rowi and not in list lt5. add schemaij to list lt;6. n=the number of local sources in qat;7. create an empty stack st;8. for i=1 to n from lt9. { 10. if schemai is not in the top row in qat11. break;12. push schemai on the stack st;13. if schemai is present in all rows of qat14. {15. Output {schemai};16. st=null;17. continue;18. }

19. for j=i+1 to n if schemaj occurs in the rows, which the other schemas in st do not occur in, and schemaj does not occur in all the rows that the top element of st occurs in20. {21. push schemaj on the stack st;22. if (the local schemas in st has included all the path information in qat)23. {24. output all the schemas in the stack st split by”,” in a “{}”;25. pop the top schema off the stack st;26. }27. }28. if (j= =n and st has included all the path in-formation of the selection condition table and at least one result in return result table)29. output all the schemas in the stack st split by”,” in a “{}”;30. st=null;31.}

Page 39: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

39

Step 3: Decompose the user query to subqueries on the local sources

Finding join object classesCase 2: For a join group, if the paths in the

QAT are from different local schemas, and there is an object class that is the end of one path and the start of the other path, then this common object class is a join object class.

Case 3: For a join group, if two attributes of the same relationship type in a user query are from different local schemas, then all the object classes involved in this relationship type are join object classes.

Page 40: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

40

Step 3: Decompose the user query to subqueries on the local sources

We describe how to rewrite a user query for a local schema where the subquery on the local schema returns only one object class or attribute.

Then we describe how to rewrite a user query for a local schema where the subquery on the local schema returns many object classes or attributes.

(For details, please refer to the paper.)

Page 41: 1 A Semantic Approach to Rewriting Queries for Integrated XML Data Xia Yang 1, Mong Li Lee 1, Tok Wang Ling 1, Gillian Dobbie 2 1 School of Computing,

41

References1. B. Amann, C. Beeri, I. Fundulaki, M. Scholl. Querying XML sources using an Ontology-based Mediator. CoopIS, 2002.2. P. Buneman, S. Davidson, W. Fan, C. Hara, W.C. Tan. Keys for XML. WWW Conference, 2001.3. Y. Chen, T.W. Ling, M.L. Lee. Automatic Generation of XQuery View Definitions from ORA-SS Views. ER, 2003.4. S. Cluet, P. Veltri, D. Vodislav. Views in a Large Scale XML Repository. VLDB, 2001.5. O.M. Duschka, M.R. Genesereth. Answering Recursive Queries Using Views. ACM PODS, 1997.6. L. Haas, D. Kossmann, E. Wimmers, J. Yang. Optimizing queries across diverse data sources. VLDB, 1997.7. A. Halevy. Theory of Answering Queries Using Views. ACM SIGMOD Record 29(4), 2000.8. L.V.S. Lakshmanan, F. Sadri. Interoperability on XML Data. ICSW, 2003.9. M.L. Lee, T.W. Ling, W.L. Low. Designing Functional Dependencies for XML. EDBT, 2002.10. A. Levy. Logic-Based Techniques in Data Integration. Logic based artificial intelligence, 1999.11. T.W. Ling, M.L. Lee, G. Dobbie. Semistructured Database Design, ISBN: 0-387-23567-1, Springer, 2005.12. I. Manolescu, D. Florescu, D. Kossman. Answering XML queries over heterogeneous data sources. VLDB, 2001.13. H. Garcia-Molina, Y. Papakonstantinou, D. Quass, et al. The TSIMMIS project: Integration of heterogeneous information sources. Journal of

Intelligent Information Systems, 1997.14. K. Passi, E. Chaudhry. A Global-to-Local Rewriting Querying Mechanism using Semantic Mapping for XML Schema Integration. ODBASE

2003.15. K. Passi, L. Lane, S.Madria, Bipin C. Sakamuri, M. Mohania, S. Bhowmick. A Model for XML Schema Integration. EC-Web, 2002.16. R. Pottinger, A. Levy. A Scalable Algorithm for Answering Queries Using Views. VLDB, 2000.17. M. Stonebraker. Implementation of integrity constraints and views by query modification. ACM SIGMOD, 1975.18. Xyleme. A dynamic warehouse for XML Data of the Web. IEEE Data Engineering Bulletin, 2001.19. H.Z. Yang, P.A. Larson. Query Transformation for PSJ-queries. VLDB, 1987.20. X. Yang. Global Schema Generation and Query Rewriting In XML Integration. MSc thesis, National University of Singapore, 2005.21. X. Yang, M.L. Lee, T.W. Ling. Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach. ER 2003.22. C. Yu, L. Popa. Constraint-Based XML Query Rewriting for Data Integration. SIGMOD 2004.23. P. Buneman, A Deutsch, W.C. Tan. A deterministic model for semistructured data. Workshop on Query Processing for Semistructured Data

and Non-Standard Data Formats, 1998.24. Tok Wang Ling, Mong Li Lee, and Gillian Dobbie. Semistructured Database Design. Springer, 2005