Clustering WSDL Documents to Bootstrap the Discovery of Web Services

Web Services Discovery

Outline

Introduction

Related Work

Our Approach

Experiments

Conclusion

Introduction

Major providers have chosen to publish their Web services (WS) on their own websites rather than in public registries

[Chart: UDDI Business Registry 47%, search engine 92%]

Introduction

Problems with search engines

If the search query does not exactly contain part of the service name, the service may not be retrieved

Users may also miss services that use synonyms or variations of the keywords (e.g. car -> vehicle)

Related work

Richi Nayak (2008): uses the Jaccard coefficient to calculate the similarity between Web services, and provides the user with related search terms based on other users' experiences with similar queries.

Xin Dong (2004): Woogle, a Web services search engine capable of Web services similarity search. It does not adequately consider data types.

Wei Liu (2009): applies text mining techniques to extract features such as service content, context, host name, and name from Web service description files in order to cluster Web services. The service context and service host name features offer little help in the clustering process.
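As a reminder of what the Jaccard coefficient measures, here is a minimal Python sketch. The token sets are hypothetical and only illustrate the formula |A ∩ B| / |A ∪ B|; they are not taken from the cited work.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two token collections."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical token sets describing two services.
weather_a = {"weather", "zipcode", "forecast"}
weather_b = {"weather", "forecast", "city", "temperature"}

# 2 shared tokens out of 5 distinct tokens overall.
print(jaccard(weather_a, weather_b))
```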

Our Approach

Big picture

Feature Extraction

Mine the WSDL documents to extract features that describe the semantics and behavior of the Web service:

WSDL content

WSDL types

WSDL messages

WSDL ports

Web service name

Features Extraction Process

Feature 1: WSDL Content

Parsing WSDL: Ti = {types, message, weather, zipcode, web, forecast, forecasting, is, ...}

Tag removal: Ti = {weather, zipcode, web, forecast, forecasting, is, ...}

Word stemming: Ti = {weather, zipcode, web, forecast, is, ...}

Function word removal: Ti = {weather, zipcode, web, forecast, ...}

Content word recognition: Ti = {weather, zipcode, forecast, ...}

Function word removal

Function words: is, a, do, ...

Content words: weather, zipcode, ...
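The pre-processing steps up to function word removal can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the `WSDL_TAGS` and `FUNCTION_WORDS` lists are made up for the example, `naive_stem` is a toy stand-in for a real stemmer, and the sample input is hypothetical.

```python
import re

# Assumed vocabularies for illustration only; the slides do not list them.
WSDL_TAGS = {"types", "message", "name", "port", "binding", "definitions"}
FUNCTION_WORDS = {"is", "a", "do", "the", "of", "to"}


def split_identifier(name):
    """Split a camel-case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
    return [p.lower() for p in parts]


def naive_stem(word):
    """Toy stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(wsdl_text):
    # Parsing: collect alphabetic runs and split camel-case identifiers.
    tokens = []
    for raw in re.findall(r"[A-Za-z]+", wsdl_text):
        tokens.extend(split_identifier(raw))
    # Tag removal: drop tokens that are WSDL markup vocabulary.
    tokens = [t for t in tokens if t not in WSDL_TAGS]
    # Word stemming: reduce word variants to a common stem.
    tokens = [naive_stem(t) for t in tokens]
    # Function word removal: drop words that carry no content.
    tokens = [t for t in tokens if t not in FUNCTION_WORDS]
    return set(tokens)


sample = '<types/><message name="WeatherForecasting"/> zipcode web forecast is'
print(sorted(preprocess(sample)))
```

In practice a proper stemmer and a principled way of identifying function words would replace the hard-coded lists used here.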

Content word recognition

Apply the k-means clustering algorithm with k=2 on Ti

Use the Normalized Google Distance (NGD) as a featureless distance measure between words

[Figure: example clustering output. Web-service-specific cluster: {weather, zip, zipcode, forecast, place}. The token sets {response, bind, data, post, port, target} and {runtime, bind, web, service, module, data, post} appear with the labels "Predefined cluster" and "Non-Web-service-specific cluster".]
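For reference, NGD between two terms x and y is usually defined (Cilibrasi and Vitányi) from search hit counts as (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The sketch below computes it in Python; the hit counts and the index size `n` are hypothetical values chosen for illustration, and `n` is assumed to be larger than any single hit count.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      assumed total number of indexed pages
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms that never co-occur are maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical counts: two terms that co-occur often vs. rarely.
close_pair = ngd(1000, 1000, 500, 10**9)
far_pair = ngd(1000, 1000, 10, 10**9)
print(close_pair < far_pair)
```

Because NGD only yields pairwise distances and no coordinates, a clustering step built on it has to work from the distance values alone, for example by using representative tokens as cluster centers.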

Features 2, 3, 4: WSDL types, messages, ports

Feature 2: WSDL Types (complexType)

The type attribute is a good candidate for describing the functionality of a service.

Feature 3: WSDL Messages

Feature 4: WSDL Ports

Feature 5: Web Service Name

We consider the Web service name used in the URI of the WSDL document

http://www.webservicex.net/WeatherForecast.Asmx?WSDL

The name of the Web service is "Weather Forecast"
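One possible way to derive the name from the URI is to take the last path segment, drop the extension, and split the camel-case identifier. The helper name `service_name_from_uri` is hypothetical and the splitting rule is an assumption about how such names are typically written.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse


def service_name_from_uri(uri):
    """Derive a readable service name from the URI of a WSDL document."""
    path = urlparse(uri).path              # "/WeatherForecast.Asmx"
    stem = PurePosixPath(path).stem        # "WeatherForecast"
    # Split on camel-case boundaries and keep digit runs as separate words.
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", stem)
    return " ".join(words)


print(service_name_from_uri("http://www.webservicex.net/WeatherForecast.Asmx?WSDL"))
```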




Feature Integration and Clustering

We use the Quality Threshold (QT) clustering algorithm to cluster similar Web services based on the five similarity features presented above.

Similarity factor between Web services si and sj
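The following Python sketch shows how QT clustering could operate on a combined similarity. It rests on assumptions: the five per-feature similarities are combined with equal weights (the slides do not state the weights), distance is taken as 1 minus similarity, and the service names and similarity values are hypothetical.

```python
def combined_similarity(feature_sims):
    """Equal-weight average of the per-feature similarities (an assumption)."""
    return sum(feature_sims) / len(feature_sims)


def qt_cluster(items, distance, max_diameter):
    """Quality Threshold clustering over items with a pairwise distance."""
    remaining = list(items)
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            # Grow a candidate cluster around each seed.
            candidate = [seed]
            others = [x for x in remaining if x != seed]
            while others:
                # Choose the item whose farthest member is nearest.
                nxt = min(others, key=lambda x: max(distance(x, m) for m in candidate))
                if max(distance(nxt, m) for m in candidate) > max_diameter:
                    break  # adding anything else would exceed the threshold
                candidate.append(nxt)
                others.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        # Keep the largest candidate and repeat on what is left.
        clusters.append(best)
        remaining = [x for x in remaining if x not in best]
    return clusters


# Hypothetical per-feature similarities (content, types, messages, ports, name).
sims = {
    frozenset({"WeatherA", "WeatherB"}): [0.9, 0.8, 0.7, 0.8, 0.9],
    frozenset({"WeatherA", "CurrencyX"}): [0.1, 0.2, 0.1, 0.0, 0.1],
    frozenset({"WeatherB", "CurrencyX"}): [0.2, 0.1, 0.1, 0.1, 0.0],
}


def distance(a, b):
    return 1.0 - combined_similarity(sims[frozenset({a, b})])


result = qt_cluster(["WeatherA", "WeatherB", "CurrencyX"], distance, max_diameter=0.5)
print(result)
```

Unlike k-means, QT does not need the number of clusters in advance; the diameter threshold controls how similar the members of a cluster must be.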

Experiments

Two criteria

Precision: exactness

Recall: completeness
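A minimal sketch of how the two criteria could be computed for one cluster against a manually classified reference group. The service identifiers are hypothetical, and the set-based definitions used here are the standard ones, which may differ in detail from the authors' evaluation.

```python
def precision_recall(cluster, reference_class):
    """Precision and recall of a cluster against a manually labeled class."""
    cluster, reference_class = set(cluster), set(reference_class)
    correct = len(cluster & reference_class)
    # Precision: share of the cluster that really belongs to the class.
    precision = correct / len(cluster) if cluster else 0.0
    # Recall: share of the class that the cluster managed to capture.
    recall = correct / len(reference_class) if reference_class else 0.0
    return precision, recall


# Hypothetical example: one misplaced service (s4), two missed (s5, s6).
cluster = {"s1", "s2", "s3", "s4"}
manual = {"s1", "s2", "s3", "s5", "s6"}
p, r = precision_recall(cluster, manual)
print(p, r)
```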

Experiments

400 online Web services

Manual classification serves as a comparison point for the clustering algorithms

Categories: "Currency exchange", "Weather", "Address validation", "E-mail verification", and "Credit card services"

Results

High Precision and Recall

Conclusion

We propose an approach to improve service discovery of non-semantic Web services by clustering similar services through mining WSDL documents

Future work: improve feature integration by choosing optimized weights for each feature using a linear programming approach

Thanks
