Clustering WSDL Documents to Bootstrap the Discovery of Web Services
Reading ICWS 2010: "Clustering WSDL Documents to Bootstrap the Discovery of Web Services"
Web Services Discovery
Outline
Introduction, Related Work, Our Approach, Experiments, Conclusion
Introduction
Major providers have chosen to publish their Web services (WS) through their own websites rather than through public registries.
[Figure: UDDI Business Registry vs. search engine; values shown: 47%, 92%]
Introduction
Problems with search engines:
If the search query does not contain part of the service name exactly, the service may not be retrieved.
Users may also miss services that use synonyms or variations of the keywords (for example, "car" versus "vehicle").
Related work
Richi Nayak (2008): uses the Jaccard coefficient to calculate the similarity between Web services, and provides the user with related search terms based on other users' experiences with similar queries.
Woogle (Xin Dong 2004): a Web services search engine capable of providing Web services similarity search. It does not adequately consider data types.
Wei Liu (2009): applies text mining techniques to extract features such as service content, context, host name, and name from Web service description files in order to cluster Web services. The service context and service host name features offer little help in the clustering process.
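As a reminder of what the Jaccard coefficient measures, here is a minimal sketch over two sets of terms. The term sets and the function name `jaccard` are illustrative assumptions; the slides do not say which sets the cited work compares.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(a & b) / len(a | b)


# Hypothetical term sets describing two services.
service_a = {"weather", "zipcode", "forecast"}
service_b = {"weather", "forecast", "city"}
similarity = jaccard(service_a, service_b)  # 2 shared terms out of 4 distinct terms
```

A value of 1 means the two sets are identical, and 0 means they share nothing.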
Our Approach
Big picture
Features Extraction
Mine the WSDL documents to extract features that describe the semantics and behavior of the Web service: WSDL content, WSDL types, WSDL messages, WSDL ports, and Web service name.
Features Extraction Process
Feature 1: WSDL Content
Parsing WSDL: Ti = {types, message, weather, zipcode, web, forecast, forecasting, is, ...}
Tag removal: Ti = {weather, zipcode, web, forecast, forecasting, is, ...}
Word stemming: Ti = {weather, zipcode, web, forecast, is, ...}
Function word removal: Ti = {weather, zipcode, web, forecast, ...}
Content word recognition: Ti = {weather, zipcode, forecast, ...}
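The first four steps above can be sketched as a small pipeline. This is a rough illustration only: the tag list, the function-word list, and the suffix-stripping `naive_stem` are hypothetical stand-ins for whatever tag vocabulary, function-word detection, and stemmer the authors actually use.

```python
# Illustrative subset of WSDL tag names (assumption; a real list would be longer).
WSDL_TAGS = {"definitions", "types", "message", "part", "porttype",
             "operation", "binding", "service", "port"}

# Tiny function-word list (assumption), seeded with the examples from the slides.
FUNCTION_WORDS = {"is", "a", "do"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def extract_terms(tokens):
    """Tag removal, stemming, then function-word removal, keeping first-seen order."""
    terms = [t for t in tokens if t not in WSDL_TAGS]
    terms = [naive_stem(t) for t in terms]
    terms = list(dict.fromkeys(terms))  # drop duplicates created by stemming
    return [t for t in terms if t not in FUNCTION_WORDS]


parsed = ["types", "message", "weather", "zipcode", "web",
          "forecast", "forecasting", "is"]
terms = extract_terms(parsed)
```

Content word recognition, the last step, is left out here because it relies on clustering rather than on a fixed list.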
Function word removal
Function words: is, a, do, ...
Content words: weather, zipcode, ...
Content word recognition
Apply the k-means clustering algorithm with k=2 on Ti.
Use the Normalized Google Distance (NGD) as a featureless distance measure between words.
Example clusters (from the figure):
{weather, zip, zipcode, forecast, place}
{response, bind, data, post, port, target}
{runtime, bind, web, service, module, data, post}
Cluster labels (from the figure): Web-service-specific cluster, predefined cluster, non-Web-service-specific cluster
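For reference, the commonly cited definition of NGD can be written as a small function. This is a sketch under assumptions: the hit counts and index size below are made-up numbers, and the slides do not say how the counts are obtained.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from page counts.

    fx, fy: number of pages containing each word
    fxy:    number of pages containing both words
    n:      total number of indexed pages
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # the words never occur, or never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical counts for two words.
distance = ngd(fx=1000, fy=100, fxy=10, n=10**6)
```

Words that always appear together get a distance of 0, and the distance grows as co-occurrence becomes rarer.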
Features 2, 3, 4: WSDL types, messages, ports
Feature 2: WSDL Types (complexType)
The type attribute is a good candidate for describing the functionality of a service.
Feature 3: WSDL Messages
Feature 4: WSDL Ports
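To make features 2 to 4 concrete, here is a minimal sketch that pulls type attributes, message names, and port type names out of a WSDL 1.1 document with the standard library. The sample document is invented, and reading "ports" as `portType` elements is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"
XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Invented, heavily trimmed WSDL document for illustration only.
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <types>
    <xsd:schema>
      <xsd:complexType name="Forecast">
        <xsd:sequence>
          <xsd:element name="ZipCode" type="xsd:string"/>
          <xsd:element name="Temperature" type="xsd:int"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
  </types>
  <message name="GetForecastRequest"/>
  <message name="GetForecastResponse"/>
  <portType name="WeatherForecastSoap"/>
</definitions>
"""


def extract_structural_features(wsdl_text):
    """Collect complexType element types, message names, and portType names."""
    root = ET.fromstring(wsdl_text)
    types = [
        element.get("type")
        for complex_type in root.iter(XSD_NS + "complexType")
        for element in complex_type.iter(XSD_NS + "element")
        if element.get("type")
    ]
    messages = [m.get("name") for m in root.iter(WSDL_NS + "message")]
    ports = [p.get("name") for p in root.iter(WSDL_NS + "portType")]
    return {"types": types, "messages": messages, "ports": ports}


features = extract_structural_features(SAMPLE_WSDL)
```

Each of the three lists can then be compared across services with whatever similarity measure is chosen for that feature.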
Feature 5: Web Service Name
We consider the Web service name used in the URI of the WSDL document
http://www.webservicex.net/WeatherForecast.Asmx?WSDL
Here the name of the Web service is "Weather Forecast".
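One possible way to derive the name from the URI is sketched below. The camel-case splitting rule and the function name `service_name_from_uri` are assumptions made for illustration; the slides only give the input and the resulting name.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse


def service_name_from_uri(uri):
    """Take the last path segment, drop its extension, and split camel case."""
    path = urlparse(uri).path           # "/WeatherForecast.Asmx"
    stem = PurePosixPath(path).stem     # "WeatherForecast"
    # Acronyms, capitalized or lowercase words, and digit runs become separate words.
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", stem)
    return " ".join(words)


name = service_name_from_uri("http://www.webservicex.net/WeatherForecast.Asmx?WSDL")
```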
Features Integration and clustering
We use the Quality Threshold (QT) clustering algorithm to cluster similar Web services based on the five similarity features presented above.
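The general idea of QT clustering can be sketched as follows: grow a candidate cluster around every remaining item while the cluster diameter stays within a threshold, keep the largest candidate, remove its members, and repeat. This is a generic sketch, not the authors' implementation; the one-dimensional points and the threshold are invented, and for Web services the distance could, for example, be taken as 1 minus the similarity factor.

```python
def qt_cluster(n_items, distance, max_diameter):
    """Quality Threshold clustering over item indices 0..n_items-1.

    distance(i, j) returns the distance between items i and j.
    """
    remaining = set(range(n_items))
    clusters = []
    while remaining:
        best = []
        for seed in sorted(remaining):
            candidate = [seed]
            others = sorted(remaining - {seed})
            while others:
                # Distance from x to its farthest member bounds the new diameter.
                def diameter_with(x):
                    return max(distance(x, member) for member in candidate)

                nearest = min(others, key=diameter_with)
                if diameter_with(nearest) > max_diameter:
                    break
                candidate.append(nearest)
                others.remove(nearest)
            if len(candidate) > len(best):
                best = candidate
        clusters.append(best)
        remaining -= set(best)
    return clusters


# Invented one-dimensional data to show the behavior.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
clusters = qt_cluster(len(points), lambda i, j: abs(points[i] - points[j]), 0.5)
```

Unlike k-means, the number of clusters is not fixed in advance; it follows from the threshold.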
Similarity factor between Web services si and sj
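The formula itself is not preserved in this transcript. Since the conclusion mentions a weight for each feature, one plausible reading is a weighted sum of the five per-feature similarities. The sketch below assumes that form and uses equal weights purely as a placeholder; the per-feature scores are invented.

```python
import math

FEATURES = ("content", "types", "messages", "ports", "name")


def combined_similarity(sims, weights=None):
    """Weighted average of per-feature similarities (assumed form, not the slide's formula)."""
    if weights is None:
        # Equal weights are an assumption of this sketch.
        weights = {feature: 1.0 / len(FEATURES) for feature in FEATURES}
    total = sum(weights[feature] for feature in FEATURES)
    return sum(weights[feature] * sims[feature] for feature in FEATURES) / total


# Invented per-feature similarity scores for a pair of services.
pair_sims = {"content": 0.8, "types": 0.5, "messages": 0.6, "ports": 0.6, "name": 1.0}
score = combined_similarity(pair_sims)
```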
Experiments
Two criteria:
Precision: exactness
Recall: completeness
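In set terms, the two criteria can be sketched for a single cluster compared against a manually labeled category. This uses the standard set-based definitions; the slides do not spell out the exact formulas, and the service identifiers below are invented.

```python
def precision_recall(cluster, reference):
    """Precision and recall of one cluster against one reference category."""
    cluster, reference = set(cluster), set(reference)
    correct = len(cluster & reference)
    precision = correct / len(cluster) if cluster else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Invented example: 3 of the 4 clustered services belong to the category,
# and the category has 5 services in total.
found = {"s1", "s2", "s3", "s4"}
expected = {"s1", "s2", "s3", "s5", "s6"}
precision, recall = precision_recall(found, expected)
```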
Experiments
400 online Web services.
Manual classification serves as a comparison point for the clustering algorithms: "Currency exchange", "Weather", "Address validation", "E-mail verification", and "Credit card services".
Results
High Precision and Recall
Conclusion
We propose an approach to improve service discovery of non-semantic Web services by clustering similar services through mining WSDL documents
Future work: improve feature integration by choosing optimized weights for each feature using a linear programming approach.
Thanks