Matching and Reuse of XML Schemas


Upload: daktari

Post on 16-Feb-2016


Page 1: Matching and Reuse of XML Schemas


Page 2

Sample XML Schema

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="car">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="make" type="xs:string"/>
        <xs:element name="model" type="xs:string"/>
        <xs:element name="year" type="xs:string"/>
        <xs:element name="color" type="xs:string"/>
        <xs:element name="driver">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="first" type="xs:string"/>
              <xs:element name="last" type="xs:string"/>
              <xs:element name="license" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Page 3

What is XML Schema matching?

Matching: identifying the relations among the corresponding elements of two schemas, e.g.
- customer/firstName <==> client/name/first
- customer/name <==> concatenate(client/name/first, client/name/last)

Calculating the distance between two schemas, e.g. the distance between customer.xsd and client.xsd is 0.67.
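The two example correspondences can be read as executable transformations. The following minimal Python sketch applies them to a client instance document; the instance data, the space used as the concatenation separator, and the helper name `to_customer` are assumptions for illustration, since the slides do not say how mappings are executed.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of client.xsd; element names follow the example paths.
CLIENT_XML = """
<client>
  <name>
    <first>Ada</first>
    <last>Lovelace</last>
  </name>
</client>
"""


def to_customer(client_xml: str) -> ET.Element:
    """Build a customer element from a client document via the two example mappings."""
    client = ET.fromstring(client_xml)
    first = client.findtext("name/first", default="")
    last = client.findtext("name/last", default="")

    customer = ET.Element("customer")
    # customer/firstName <==> client/name/first (one-to-one mapping)
    ET.SubElement(customer, "firstName").text = first
    # customer/name <==> concatenate(client/name/first, client/name/last);
    # the space separator is an assumption
    ET.SubElement(customer, "name").text = " ".join(p for p in (first, last) if p)
    return customer


print(ET.tostring(to_customer(CLIENT_XML), encoding="unicode"))
# <customer><firstName>Ada</firstName><name>Ada Lovelace</name></customer>
```

The first mapping copies a value; the second is a many-to-one mapping that needs a transformation function, which is why it is written with concatenate.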

Page 4

Why XML Schema matching

- From the data integration point of view:
  - Purpose: automatically identifying corresponding elements between two schemas
  - Relevant works:
    - Database schema matching/mapping, e.g. A. Doan et al. Reconciling schemas of disparate data sources: A machine-learning approach. SIGMOD, 2001.
    - Generic schema mapping, e.g. J. Madhavan, P. A. Bernstein, E. Rahm. Generic schema matching with Cupid. VLDB, 2001.
    - XML Schema matching, e.g. H. Do, E. Rahm. COMA: A system for flexible combination of schema matching approaches. VLDB, 2002.
- From the web service composition point of view: e.g. matching the output type of one service with the input type of another in sequential composition
- From the software reuse point of view:
  - Purpose: building XML Schema categories and search engines
  - Relevant works:
    - Software component search: A. Mili, R. Mili, R. T. Mittermeir. A survey of software reuse libraries. Annals of Software Engineering, 1998.
    - Agent and service matching: Katia Sycara, Jianguo Lu, Matthias Klusch. Interoperability among Heterogeneous Software Agents on the Internet. Technical Report CMU-RI-TR-98-22, CMU.

Page 5

What are the problems?

- Modelling: as a graph or as a tree
- Matching:
  - Node similarity: name, type, cardinality
  - Structure similarity: tree edit distance

K. Zhang, D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 1989.
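For reference, tree edit distance counts the minimum number of node insertions, deletions and relabellings that turn one ordered labelled tree into another. The Python sketch below implements the plain memoised recursion on forests that Zhang and Shasha's algorithm evaluates efficiently; the unit costs and the tuple encoding of trees are assumptions, and this is not the optimised keyroot-based algorithm from the paper.

```python
from functools import lru_cache

# A tree is (label, children) and a forest is a tuple of trees (hashable for caching).


def node(label, *children):
    return (label, tuple(children))


def size(forest) -> int:
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f1, f2) -> int:
    """Unit-cost edit distance between two ordered forests.

    Compare the rightmost roots v of f1 and w of f2 and take the cheapest of:
    deleting v (its children take its place), inserting w, or matching v
    with w (relabel cost 0 if the labels agree, else 1).
    """
    if not f1:
        return size(f2)          # insert every remaining node of f2
    if not f2:
        return size(f1)          # delete every remaining node of f1
    (lv, cv), (lw, cw) = f1[-1], f2[-1]
    delete_v = forest_dist(f1[:-1] + cv, f2) + 1
    insert_w = forest_dist(f1, f2[:-1] + cw) + 1
    match_vw = (forest_dist(cv, cw) + forest_dist(f1[:-1], f2[:-1])
                + (0 if lv == lw else 1))
    return min(delete_v, insert_w, match_vw)


def tree_dist(t1, t2) -> int:
    return forest_dist((t1,), (t2,))


car1 = node("car", node("make"), node("model"), node("year"), node("color"),
            node("driver", node("firstName"), node("lastName"), node("license")))
car2 = node("car", node("make"), node("model"), node("year"), node("color"),
            node("driver", node("first"), node("last"), node("license")))
print(tree_dist(car1, car2))  # 2: relabel firstName -> first, lastName -> last
```

Note that a purely label-based distance like this one counts firstName/first as a full mismatch; combining it with name similarity is what the node similarity step is for.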

Page 6

Overview of our system

[Figure: system overview with the components XML Schema (two inputs), Modelling, Name Similarity, Node Similarity, Structural Similarity, Name Relations, Node Relations, Structural Relations and Results retrieval]

Page 7

Three similarities

- Name similarity: computed on node names with WordNet, string matching and the Hungarian method
- Node similarity: adds built-in and user-defined data types (via compatibility tables) and cardinality
- Structural similarity: computed on the hierarchical structure with a tree matching algorithm

Page 8

Modelling

<xs:element name="driver" type="driverType"/>

<xs:attribute name="license" type="xs:string"/>

Model schemas as trees
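As a minimal sketch of this modelling step, the Python code below turns element and attribute declarations into a tree of nodes carrying name, type and cardinality, the information that node similarity later compares. Assumptions: user-defined types are referenced by unprefixed names (as in `type="driverType"`), the compositors sequence, all and choice are flattened away, `ref=` references, type extension/restriction and imported or included schemas are not resolved, and a recursive type is expanded only once along each path. The names `Node` and `schema_to_trees` are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional

XS = "{http://www.w3.org/2001/XMLSchema}"
COMPOSITORS = {XS + "complexType", XS + "sequence", XS + "all", XS + "choice"}


@dataclass
class Node:
    name: str
    type: Optional[str]            # built-in or user-defined type name, if given
    min_occurs: str = "1"
    max_occurs: str = "1"
    children: List["Node"] = field(default_factory=list)


def _content(decl: ET.Element, named: Dict[str, ET.Element],
             path: FrozenSet[str]) -> List[Node]:
    """Child nodes of a declaration; compositors are flattened, not modelled."""
    out: List[Node] = []
    for child in decl:
        if child.tag in (XS + "element", XS + "attribute"):
            out.append(_node(child, named, path))
        elif child.tag in COMPOSITORS:
            out.extend(_content(child, named, path))
    return out


def _node(decl: ET.Element, named: Dict[str, ET.Element],
          path: FrozenSet[str]) -> Node:
    type_name = decl.get("type")
    if decl.tag == XS + "attribute":
        # attributes occur at most once and are optional unless use="required"
        lo, hi = ("1" if decl.get("use") == "required" else "0"), "1"
    else:
        lo, hi = decl.get("minOccurs", "1"), decl.get("maxOccurs", "1")
    n = Node(decl.get("name"), type_name, lo, hi)
    if type_name in named and type_name not in path:
        # user-defined complex type; `path` stops endless expansion of recursion
        n.children = _content(named[type_name], named, path | {type_name})
    elif type_name is None:
        n.children = _content(decl, named, path)   # inline anonymous type
    return n


def schema_to_trees(xsd_text: str) -> List[Node]:
    root = ET.fromstring(xsd_text)
    named = {ct.get("name"): ct for ct in root.findall(XS + "complexType")}
    return [_node(e, named, frozenset()) for e in root.findall(XS + "element")]


SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="car">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="make" type="xs:string"/>
        <xs:element name="driver" type="driverType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="driverType">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="license" type="xs:string"/>
  </xs:complexType>
</xs:schema>"""


def show(n: Node, depth: int = 0) -> None:
    print("  " * depth + f"{n.name} : {n.type} [{n.min_occurs}..{n.max_occurs}]")
    for c in n.children:
        show(c, depth + 1)


for tree in schema_to_trees(SAMPLE):
    show(tree)
```

Both the element `driver` with a named type and the attribute `license` end up as ordinary tree nodes, so later steps can compare them uniformly.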

Page 9

Modelling

Issues illustrated: Reference, Importing and Inclusion, Recursion.

[Figure: example schema trees. customerOrder schemas (shipping and billing, each with date and ship2Add/bill2Add) referring to an address (street, province, postcode); a reference/paper schema (nodes reference, paper, author, title, contents, refNo, with paper appearing twice); and two address schemas, Address_ca.xsd (street, province, postcode) and Address_us.xsd (street, state, zip)]

Page 10

Information excluded in modelling

- Related to elements or attributes: default value, value range, unique, nullable…
- Related to structure: the compositors sequence, all and choice

[Figure: a name element with children first, last beside a name element with children last, first]

Page 11

Computing node similarity

- Computing name similarity with the help of:
  - WordNet and its API
  - String matching
  - The Hungarian method
- Adding the similarity of other information:
  - Data type
  - Minimum cardinality
  - Maximum cardinality

Node similarity
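One simple way to combine these ingredients is a weighted sum. The Python sketch below assumes the name similarity has already been computed; the weights, the type compatibility table and the 0.5 penalty for differing cardinalities are illustrative values, not figures from the slides, and user-defined types only match when their names are identical.

```python
from typing import Optional

# Hypothetical compatibility table for pairs of different built-in types.
TYPE_COMPAT = {
    frozenset({"xs:int", "xs:integer"}): 0.9,
    frozenset({"xs:integer", "xs:decimal"}): 0.8,
    frozenset({"xs:string", "xs:token"}): 0.9,
    frozenset({"xs:date", "xs:dateTime"}): 0.8,
}


def type_sim(t1: Optional[str], t2: Optional[str]) -> float:
    """1.0 for identical type names, a table value for compatible pairs, else 0.0."""
    if t1 == t2:
        return 1.0
    if t1 is None or t2 is None:
        return 0.0
    return TYPE_COMPAT.get(frozenset({t1, t2}), 0.0)


def occurs_sim(a: str, b: str) -> float:
    """1.0 if two cardinality bounds agree, else a hypothetical 0.5."""
    return 1.0 if a == b else 0.5


def node_sim(name_sim: float, type1: Optional[str], type2: Optional[str],
             min1: str = "1", min2: str = "1", max1: str = "1", max2: str = "1",
             weights=(0.6, 0.2, 0.1, 0.1)) -> float:
    """Weighted sum of name, type, minOccurs and maxOccurs similarity."""
    w_name, w_type, w_min, w_max = weights
    return (w_name * name_sim
            + w_type * type_sim(type1, type2)
            + w_min * occurs_sim(min1, min2)
            + w_max * occurs_sim(max1, max2))


# e.g. names judged 0.5 similar, xs:int vs xs:integer, minOccurs 0 vs 1
print(round(node_sim(0.5, "xs:int", "xs:integer", min1="0"), 2))  # 0.63
```

Because the weights sum to 1, the result stays in [0, 1] whenever the name similarity does.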

Page 12

Name similarity from token lists

- Tokenize names, e.g. clientName -> client name; submittedReports -> submit report
- Similarity between two token lists: the Hungarian method for weighted bipartite graph matching (WBGM)

[Figure: bipartite graph for customerDeliveryAddress vs. clientRequiredShippingAddress, with tokens customer, delivery, address on one side and client, require, shipping, address on the other; edges carry weights sim(i,j), e.g. sim(0,0) between customer and client]
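The token-list idea can be sketched in Python as follows. Assumptions: a two-entry synonym table (`SYNONYMS`) stands in for WordNet, `difflib.SequenceMatcher` provides the string-matching fallback, no stemming is done (so `submittedReports` yields `submitted`, `reports`), and the score is normalised by the average number of tokens. For brevity the maximum-weight matching is found by exhaustive search over assignments; this gives the same optimum as the Hungarian method but only scales to short names.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations
from typing import List

# Hypothetical stand-in for WordNet: a tiny table of synonym pairs.
SYNONYMS = {frozenset({"customer", "client"}), frozenset({"delivery", "shipping"})}


def tokenize(name: str) -> List[str]:
    """Split camelCase, PascalCase, acronyms and snake_case into lower-case tokens."""
    return [t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)]


def token_sim(a: str, b: str) -> float:
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()   # string-matching fallback


def max_weight_matching(w: List[List[float]]) -> float:
    """Total weight of an optimal one-to-one token matching (exhaustive search)."""
    if len(w) > len(w[0]):
        w = [list(col) for col in zip(*w)]        # ensure rows <= columns
    return max(sum(w[i][j] for i, j in enumerate(p))
               for p in permutations(range(len(w[0])), len(w)))


def name_sim(n1: str, n2: str) -> float:
    t1, t2 = tokenize(n1), tokenize(n2)
    if not t1 or not t2:
        return 0.0
    weights = [[token_sim(a, b) for b in t2] for a in t1]
    # normalise by the average length so unmatched tokens lower the score
    return 2 * max_weight_matching(weights) / (len(t1) + len(t2))


print(tokenize("customerDeliveryAddress"))   # ['customer', 'delivery', 'address']
print(round(name_sim("customerDeliveryAddress", "clientRequiredShippingAddress"), 3))  # 0.857
```

Here customer/client and delivery/shipping are synonyms in the table and address matches exactly, so the matched weight is 3 and the score is 2 * 3 / (3 + 4); the unmatched token required lowers it below 1.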

Page 13

Determine the structural relation

[Figure: Tree 1 and Tree 2]

Structure similarity

Page 14

Common substructure

[Figure: two car schema trees. Tree 1: car with make, model, year, color and driver (firstName, lastName, license); Tree 2: nodes car, make, model, year, color, driver, first, last, license]

Page 15

Approximate Common Structure

[Figure: the same two car schema trees as on Page 14, illustrating their approximate common structure]

Page 16

Mappings in an ACS

[Figure: approximate common structures ACS1 and ACS2 of the two car trees; nodes shown: car, make, model, year, color, driver, first (firstName), last (lastName), license]

mACS1 = {(s1.car, s2.car), (s1.make, s2.make), (s1.year, s2.year), (s1.color, s2.color)}

mACS2 = {(s1.driver, s2.driver), (s1.first, s2.firstName), (s1.last, s2.lastName), (s1.license, s2.license)}

Page 17

Evaluation Criteria

- Matching outcomes: mappings and schema similarity
- Execution time

Four groups of schemas were collected:
1. Purchase orders used in COMA (5 schemas)
2. Large schemas from XML.org (86)
3. Schemas from the hospitality domain (95)
4. Schemas extracted from WSDL files (419)

Evaluation

Page 18

Comparison with the edit distance algorithm: element mapping on data group 1

[Figure: bar chart of precision and recall (scale 0.0 to 1.0) by Method 1 (our algorithm) and Method 2 (edit distance)]

Page 19

Comparison with edit distance: schema similarity on data groups 3 and 4

[Figure: top-3 and top-5 precision (scale 0.0 to 1.0) of Method 1 (our algorithm) and Method 2 (edit distance) on schema groups 3 and 4]
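Top-k precision is taken here to mean the fraction of the k schemas ranked most similar to a query schema that are actually relevant to it. A minimal Python sketch, with hypothetical schema names, similarity scores and relevance judgements:

```python
from typing import Dict, List, Set


def rank_by_similarity(scores: Dict[str, float]) -> List[str]:
    """Candidate schemas ordered from most to least similar to the query schema."""
    return sorted(scores, key=scores.get, reverse=True)


def top_k_precision(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the k highest-ranked schemas that are relevant."""
    return sum(1 for s in ranked[:k] if s in relevant) / k


# Hypothetical similarity scores against one query schema and hypothetical judgements
scores = {"a.xsd": 0.72, "b.xsd": 0.67, "c.xsd": 0.55, "d.xsd": 0.31, "e.xsd": 0.12}
relevant = {"a.xsd", "b.xsd", "d.xsd"}
ranked = rank_by_similarity(scores)
print(top_k_precision(ranked, relevant, 3))  # 2 of the top 3 are relevant
print(top_k_precision(ranked, relevant, 5))  # 0.6
```

Averaging this value over many query schemas gives a single top-k precision figure per method and data group.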

Page 20

Comparison with edit distance: performance on data group 2

[Figure: average matching time in seconds (0 to 250) against input size M*N for Method 1 (our algorithm) and Method 2 (edit distance)]

Page 21

Comparison with COMA (Mapping)

              COMA 'All'    COMA 'All+SchemaM'    Our algorithm
Precision     about 0.95    about 0.93            0.88
Recall        about 0.78    about 0.89            0.87
Overall       0.73          0.82                  0.75

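Overall combines precision and recall into one number that estimates the post-match effort of removing incorrect mappings and adding missing ones. Unlike precision and recall, it becomes negative when more than half of the proposed mappings are wrong.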
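The measures can be computed from a set of proposed element mappings and a manually determined set of real mappings. In the Python sketch below, `overall` applies Recall x (2 - 1/Precision) directly, which reproduces the Overall values in the table from their precision and recall, and `evaluate` uses the equivalent count form, which stays defined when precision is 0. The mapping sets in the usage lines are hypothetical.

```python
from typing import Set, Tuple

Mapping = Set[Tuple[str, str]]   # pairs (source element, target element)


def overall(precision: float, recall: float) -> float:
    """Overall = Recall * (2 - 1/Precision); requires precision > 0."""
    return recall * (2 - 1 / precision)


def evaluate(found: Mapping, real: Mapping) -> Tuple[float, float, float]:
    """Precision, recall and Overall of proposed mappings against real ones."""
    correct = len(found & real)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(real) if real else 0.0
    # Count form of Overall: (correct matches - false matches) / real matches.
    # It equals overall(precision, recall) and is also defined when precision is 0.
    ov = (correct - (len(found) - correct)) / len(real) if real else 0.0
    return precision, recall, ov


print(round(overall(0.88, 0.87), 2))   # 0.75  (our algorithm)
print(round(overall(0.93, 0.89), 2))   # 0.82  (COMA 'All+SchemaM')

# Hypothetical mappings: two correct proposals and one wrong one
real = {("first", "firstName"), ("last", "lastName"), ("license", "license")}
found = {("first", "firstName"), ("last", "lastName"), ("license", "color")}
print(evaluate(found, real))
```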


Page 22

Conclusion

- Scalable schema matching (cf. Wang Lian, David W. Cheung, Nikos Mamoulis, and Siu-Ming Yiu. An Efficient and Scalable Algorithm for Clustering XML Documents by Structure. TKDE, 2005.)
- Subtyping
- Apply to web service matching

Page 23

Web service synthesis

Page 24

Web Service Composition

Composite web service: “service implemented by combining the functionality provided by other web services” (G. Alonso et al.)

Web service composition: the process of developing a composite web service.

Approaches to web service composition:
- Conventional programming languages, such as Java and C#
- Web service composition languages, such as BPEL
- Workflow, pi-calculus, Petri nets, automata…
- Web service synthesis

composition

Page 25

Web Service Synthesis

- BPEL and similar languages are still programming languages: they describe exactly how to compose the web services.
- Web service synthesis:
  - We describe what the service is, but not how to implement it;
  - We don’t even know which component services are involved;
  - The relevant services are discovered and invoked dynamically;
  - The implementation is synthesized automatically from the web service specification.
- Program synthesis has a long history.

Page 26

Web Service Synthesis

[Figure: web services WS1 and WS2, each described by a syntactic specification (WSDL), a semantic specification (Datalog) and a service implementation; a new service WS with a service specification (WSDL/Datalog) and a service implementation (BPEL)]

Page 27

Synthesis Example

Chapters
  Syntactic specification: …
  Semantic specification: chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).

amazon
  Syntactic specification: WSDL file
  Semantic specification: amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).

Service implementation: Java code, database

MetaSearchService
  Syntactic specification: interface definition in WSDL
  Semantic specification: Q(ISBN, PRICE, TITLE, RATE) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
  MetaSearchService implementation: ??

Page 28

Generate the abstract implementation by query rewriting

(Chapters, amazon and MetaSearchService specifications as on Page 27.)

MetaSearchService abstract implementation:
Q(ISBN, PRICE, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).

Page 29

Generate the Concrete Implementation

(Chapters and amazon specifications as on Page 27; the MetaSearchService semantic specification now reads Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- ….)

MetaSearchService abstract implementation:
Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).

MetaSearchService concrete implementation:
Invoke amazon;
Invoke chapters;
Combine the output.
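The three steps of the concrete implementation can be sketched in Python. The two `invoke_*` functions are hypothetical stand-ins that return fixed rows instead of calling real web services. Since ISBN is the only variable shared by the amazon and chapters subgoals of the rewriting (TITLE' and AUTHOR' are distinct from TITLE and AUTHOR), combining the outputs amounts to a join on ISBN followed by a projection onto the head of Q.

```python
from typing import Dict, List, Tuple


# Hypothetical stand-ins for the component services; real ones would be web
# service calls. Rows follow the heads amazon(ISBN, PRICE, RATE, TITLE, AUTHOR)
# and chapters(ISBN, PRICE, TITLE, AUTHOR).
def invoke_amazon() -> List[Tuple]:
    return [("isbn-1", 55.0, 4.5, "Book A", "Author A"),
            ("isbn-2", 48.0, 4.0, "Book B", "Author B")]


def invoke_chapters() -> List[Tuple]:
    return [("isbn-1", 71.0, "Book A", "Author A")]


def meta_search() -> List[Tuple]:
    """Q(ISBN, PRICE, PRICE0, TITLE, RATE) <-
         amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).
    """
    amazon_rows = invoke_amazon()        # Invoke amazon
    chapters_rows = invoke_chapters()    # Invoke chapters
    # Combine the output: hash join on the shared variable ISBN ...
    by_isbn: Dict[str, List[Tuple[float, str]]] = {}
    for isbn, price0, title, _author in chapters_rows:
        by_isbn.setdefault(isbn, []).append((price0, title))
    # ... then project onto the head variables of Q
    return [(isbn, price, price0, title, rate)
            for isbn, price, rate, _title, _author in amazon_rows
            for price0, title in by_isbn.get(isbn, [])]


print(meta_search())  # [('isbn-1', 55.0, 71.0, 'Book A', 4.5)]
```

Returning both PRICE (from amazon) and PRICE0 (from chapters) lets a client of the meta-search service compare the two offers for the same ISBN.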

Page 30

It is a lightweight approach…

- Web services are restricted to database queries, or to functions that can be described by database queries or Datalog;
- The semantic specification is written in Datalog instead of a more powerful, ontology-based specification mechanism;
- Compositions are restricted to data composition instead of full-blown process specifications such as BPEL.

All these choices are made in order to build a practical web service synthesis system…


Page 31

Mapping between Datalog and Web Services

Database vendors also provide wrappers for web services:
- Behind a web service there is an SQL query that corresponds to it; the SQL query defines the semantics of the web service.
- Major database vendors support the mapping between SQL and web services; we experimented with DB2WS.

Malaika, S. et al. DB2 and Web Services. IBM Systems Journal, 41(4), pp. 666-685, 2002.
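To illustrate how a Datalog semantic definition corresponds to the SQL query behind a service, the Python sketch below uses an in-memory SQLite database rather than DB2WS. The view name `chapters_service` and the sample rows are hypothetical (SQLite identifiers are case-insensitive, so the view cannot also be called chapters); a web service wrapper would expose the view's result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Chapters (isbn TEXT, price REAL);
    CREATE TABLE Book1 (title TEXT, isbn TEXT, author TEXT);

    -- chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
    -- The shared variable ISBN becomes the join condition;
    -- the head variables become the SELECT list.
    CREATE VIEW chapters_service AS
        SELECT c.isbn, c.price, b.title, b.author
        FROM Chapters AS c JOIN Book1 AS b ON b.isbn = c.isbn;
""")
conn.executemany("INSERT INTO Chapters VALUES (?, ?)", [("isbn-1", 71.0), ("isbn-2", 30.0)])
conn.executemany("INSERT INTO Book1 VALUES (?, ?, ?)", [("Book A", "isbn-1", "Author A")])

# What a web service wrapper over this view would return for a call
rows = conn.execute("SELECT * FROM chapters_service").fetchall()
print(rows)  # [('isbn-1', 71.0, 'Book A', 'Author A')]
```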


Page 32

Generate the Abstract Implementation by Query rewriting

Definition: Given a query Q and a set of views V, a rewriting of Q using V is a query Q' such that Q ≡ Q' (Q and Q' are equivalent) and Q' refers to one or more views in V.

Query:
Q <- T1, T2, T3.

Views:
V1 <- T1, T2.
V2 <- T2, T3.

Rewriting 1: Q <- V1, T3.
Rewriting 2: Q <- V1, V2.
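The example can be reproduced with a deliberately simplified Python sketch that treats subgoals as propositions without variables. Under that assumption a view is usable when its body is contained in the query body, and any choice of usable views plus the leftover base relations expands back to the query. Real rewriting of conjunctive queries must also find variable mappings (as in the bucket or MiniCon algorithms); this sketch does not, and it is not the rewriting system described on these slides.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def rewritings(query: FrozenSet[str],
               views: Dict[str, FrozenSet[str]]) -> List[Tuple[str, ...]]:
    """Rewritings of a variable-free conjunctive query that use at least one view.

    A view is usable when its body is contained in the query body; the chosen
    views plus the base relations they leave uncovered expand back to exactly
    the query body, so each such combination is an equivalent rewriting.
    """
    usable = [v for v, body in views.items() if body <= query]
    result = []
    for r in range(1, len(usable) + 1):
        for chosen in combinations(usable, r):
            covered = frozenset().union(*(views[v] for v in chosen))
            result.append(chosen + tuple(sorted(query - covered)))
    return result


Q = frozenset({"T1", "T2", "T3"})
V = {"V1": frozenset({"T1", "T2"}), "V2": frozenset({"T2", "T3"})}
for body in rewritings(Q, V):
    print("Q <- " + ", ".join(body) + ".")
# Q <- V1, T3.
# Q <- V2, T1.
# Q <- V1, V2.
```

Besides the two rewritings on the slide, the enumeration also finds Q <- V2, T1, which is equally valid for this query.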

Page 33

Our query rewriting system


Page 34

Limitations of our approach

- Focus on database web services; Datalog is not expressive enough.
  - Query rewriting in Description Logic, or OWL.
- Assumes the existence of a global database schema: service providers need to provide the semantic definition of web services in terms of a global database schema, and a new service specification is also defined using the common schema.
  - Schema matching.

Page 35

Other threads

- Web service collection and clustering: from UDDI, crawlers, and search engines such as Google (Master's thesis to be finished this summer)
- Web service metrics
- Schema subtyping based on regular tree grammars (Master's thesis to be finished this summer)
- Bottom-up web service composition
- Semantic web services

Page 36

Service Oriented Architecture

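[Figure: service-oriented architecture with a provider, a requester and a discovery agency; the provider publishes to the discovery agency, the requester finds services through it, and requester and provider interact]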

Page 37

Web service discovery

- Keyword search: based on IR techniques such as the vector space model; fast, but not accurate.
- Signature matching: decides subtype relations between the inputs and outputs of web services; used in service composition to find composable web services.
- Relaxed matching: approximate matching, allowing small deviations in both structure and words/tags.
- Semantic matching: matching the functional requirements of web services; used in adaptive, autonomous systems.
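As an illustration of keyword search in the vector space model, the Python sketch below represents each service description and the query as term-frequency vectors and ranks services by cosine similarity. The service names and descriptions are hypothetical, and the sketch omits IDF weighting, stop-word removal and stemming, which a practical engine would add.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def terms(text: str) -> Counter:
    """Term-frequency vector of lower-case word tokens (camelCase is split)."""
    return Counter(t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyword_search(query: str, services: Dict[str, str], k: int = 3) -> List[Tuple[str, float]]:
    """Top-k services ranked by cosine similarity between query and description."""
    q = terms(query)
    scored = [(name, cosine(q, terms(desc))) for name, desc in services.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


services = {  # hypothetical service descriptions
    "BookPriceService": "getBookPrice returns the price of a book given its ISBN",
    "WeatherService": "getForecast returns the weather forecast for a city",
}
print(keyword_search("book price", services))
```

The ranking only looks at shared words, which is why this kind of search is fast but cannot tell whether the matched service actually has compatible inputs and outputs.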