using semantic annotation of web services for analyzing
Presented at ICWS 2012
Shahab Mokarizadeh , Royal Institute of Technology (KTH) , Sweden
Peep Küngas, University of Tartu (UT) , Estonia
Mihhail Matskin , Royal Institute of Technology (KTH) , Sweden
Marco Crasso, Marcelo Campo, Alejandro Zunino , UNICEN University,
Argentina
Contact: [email protected]
1
Information Diffusion in Web Services Networks
Outline
2
Background of Information Flow Analysis
Roadmap and Computational Model
Web service Annotation
Web service Categorization
Experimental Results
Discussion & Conclusion
Background – Information Diffusion
3
Information Diffusion: the communication of knowledge over
time among members of a social system
It reveals intrinsic properties of real-world phenomena.
Already studied in the context of the biosphere, microblogs,
publication citations, … wherever a network structure is present.
Information Diffusion
among Web service Domains
4
Observation: Services published in the Web form a conceptual
ecology of knowledge where information is shared and flows
along input and output parameters of service operations.
Case study: How have Web services in different commodities
been designed from an information exchange perspective?
Introducing value-added Web services
Web service adoption spots
Roadmap
5
1 • Semantically annotate Web services
2 • Assign Web services to respective categories
3 • Construct Web service network
4 • Compute information flow matrix
5 • Matrix Analysis
1-Web service Annotation
6
Image from: Web Services and
Security, 1/17/2006, Marco Cova
- Only the basic elements of the input and output parameters of Web service operations are semantically annotated
- SAWSDL annotation model
- We use our semi-automated ontology learning method, which relies on lexico-syntactic patterns: “Ontology Learning for Cost-Effective Large-Scale Semantic Annotation of XML Schemas and Web Service Interfaces”, EKAW 2010, pp. 401-410
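As an illustration of what a SAWSDL annotation on a basic element looks like, the following sketch reads `sawsdl:modelReference` attributes from a small XML Schema fragment. The schema fragment, the element names and the ontology URIs are invented for the example; only the SAWSDL namespace and the `modelReference` attribute come from the W3C recommendation.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"

# Hypothetical schema fragment: element names and ontology URIs are invented.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <xs:element name="companyName" type="xs:string"
              sawsdl:modelReference="http://example.org/onto#CompanyName"/>
  <xs:element name="regCode" type="xs:string"
              sawsdl:modelReference="http://example.org/onto#RegistrationCode"/>
  <xs:element name="comment" type="xs:string"/>
</xs:schema>
"""

def model_references(schema_xml):
    """Map each annotated element name to its list of concept URIs."""
    root = ET.fromstring(schema_xml.strip())
    refs = {}
    for elem in root.iter(f"{{{XSD_NS}}}element"):
        value = elem.get(f"{{{SAWSDL_NS}}}modelReference")
        if value:
            # modelReference may hold several whitespace-separated URIs.
            refs[elem.get("name")] = value.split()
    return refs

annotations = model_references(SCHEMA)
```

Elements without a `modelReference` (here `comment`) are simply left out of the result, which mirrors the idea that only some basic elements end up annotated.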
Tax and Customs Board service
7
Output message content fragment
Business Registry service
8
Input message content fragment
A Business Registry service
9
Output message content fragment
Registry of Economic Activities Service
10
Output message content fragment
2-Web service Categorization
11
A category (a.k.a. commodity) describes a general kind of a service
that is provided, for example “B2B” , “Health”, “E-Commerce”, etc.
Each Web service can belong to multiple categories.
Standard software taxonomy, e.g. UNSPSC: http://www.unspsc.org/
We use the AWSC classifier: “AWSC: An approach to Web Service classification
based on machine learning techniques”, Inteligencia Artificial, ISSN 1137-3601, vol.
12, no. 37, pp. 25-36, Asociación Española para la Inteligencia Artificial, Valencia, España,
2008.
UNSPSC
Instant messaging
Calendar and scheduling
Adventure games
Mobile operator specific
Internet directory services
Medical software
Music or sound editing
Video conferencing software
3-Web service Network Construction
12
1- Represent annotated Web services as a bipartite (2-mode) graph
2- Create the semantic network (1-mode graph)
3- Create the weighted category network from the semantic network
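The first two steps can be sketched as follows. The slides do not spell out the projection rule, so this sketch assumes that a directed edge (u, v) is added to the semantic network whenever some operation takes concept u as input and produces concept v as output. The operation and concept names are hypothetical.

```python
from collections import defaultdict

# Step 1: bipartite (2-mode) graph. Each hypothetical operation links a set of
# input concepts to a set of output concepts.
operations = {
    "getCompany":  {"inputs": {"RegistrationCode"}, "outputs": {"CompanyName", "Address"}},
    "getTaxDebt":  {"inputs": {"RegistrationCode"}, "outputs": {"TaxDebt"}},
    "findAddress": {"inputs": {"CompanyName"},      "outputs": {"Address"}},
}

def semantic_network(ops):
    """Step 2: project onto concepts (1-mode graph).

    Assumed rule: edge (u, v) exists if an operation maps input u to output v.
    The edge value counts how many operations support it.
    """
    edges = defaultdict(int)
    for op in ops.values():
        for u in op["inputs"]:
            for v in op["outputs"]:
                edges[(u, v)] += 1
    return dict(edges)

edges = semantic_network(operations)
```

The resulting concept-to-concept edges are the input to the third step, where each edge is spread over the categories of its two end nodes.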
Bipartite Web Service Network
13
Bipartite Web Service Network
(categorized)
14
Propagate the categories to the semantic nodes.
C_u: semantic node
q_k: weight of the node in category k
Network Transformation
15
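Semantic Network → Category Network
Category weight vector of semantic node $C_u$:
$$Q_u = (q_1, \ldots, q_k, \ldots, q_n)$$
Weight of $C_u$ in category $D_s$:
$$q_{u,s} = \frac{\text{frequency of } C_u \text{ in } D_s}{\sum_{i=1}^{n} \text{frequency of } C_u \text{ in } D_i}$$
$D_s$, $D_t$: category nodes
Label each category edge with weights:
Partial weight of semantic edge $(u,v)$ for category edge $(D_s, D_t)$: $q_{u,s} \cdot q_{v,t}$
$$W(D_s, D_t) = \sum_{edge(u,v)} q_{u,s} \cdot q_{v,t}$$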
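A minimal sketch of this transformation, assuming the reading q_{u,s} = freq(C_u in D_s) / Σ_i freq(C_u in D_i) and W(D_s, D_t) = Σ over semantic edges (u, v) of q_{u,s} · q_{v,t}. The frequency counts, concept names and category names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical counts: how often concept C_u occurs in services of category D_s.
frequency = {
    "RegistrationCode": {"Procurement": 3, "Inventory": 1},
    "Address":          {"Procurement": 1, "Map creation": 1},
}
# Hypothetical semantic-network edges (u, v): u is an input, v an output.
semantic_edges = [("RegistrationCode", "Address")]

def category_vector(counts):
    """Q_u: normalise a concept's per-category frequencies so they sum to 1."""
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def category_network(edges, freq):
    """W(D_s, D_t): sum of q_{u,s} * q_{v,t} over all semantic edges (u, v)."""
    q = {u: category_vector(c) for u, c in freq.items()}
    weights = defaultdict(float)
    for u, v in edges:
        for s, q_us in q[u].items():
            for t, q_vt in q[v].items():
                weights[(s, t)] += q_us * q_vt
    return dict(weights)

W = category_network(semantic_edges, frequency)
```

Because every Q_u sums to 1, each semantic edge contributes a total weight of 1 to the category network, split across the category pairs of its end nodes.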
4-Normalizing Weights (Z-score)
16
Edge category weight $W(D_i, D_j)$: $W_{i,j}$
Sum of all weights of all links from category i: $W_{i*} = \sum_{j} W(D_i, D_j)$
Sum of all weights of all links to category j: $W_{*j} = \sum_{i} W(D_i, D_j)$
Sum of weights of all categories: $W = \sum_{i,j} W(D_i, D_j)$
Expected weight from category i to category j: $\frac{W_{i*} W_{*j}}{W}$
Normalize category weights (Z-score):
$$\Phi_{i,j} = \frac{W_{i,j} - \frac{W_{i*} W_{*j}}{W}}{\frac{W_{i*} W_{*j}}{W}}$$
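The normalisation can be sketched in a few lines. The sketch assumes the reading Φ_{i,j} = (W_{i,j} − E_{i,j}) / E_{i,j} with expected weight E_{i,j} = W_{i*} · W_{*j} / W. The example matrix is invented, and leaving Φ at 0 where the expected weight is 0 is a choice made here, not something stated on the slide.

```python
def normalise(W):
    """Return Phi, where Phi[i][j] compares W[i][j] with its expected value.

    Assumed formula: Phi = (W_ij - E_ij) / E_ij, E_ij = W_i* * W_*j / W.
    """
    n = len(W)
    row = [sum(W[i]) for i in range(n)]                       # W_{i*}
    col = [sum(W[i][j] for i in range(n)) for j in range(n)]  # W_{*j}
    total = sum(row)                                          # W
    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            expected = row[i] * col[j] / total if total else 0.0
            if expected > 0:
                phi[i][j] = (W[i][j] - expected) / expected
    return phi

# Hypothetical 2x2 category weight matrix.
W = [[8.0, 2.0],
     [2.0, 8.0]]
phi = normalise(W)
```

A positive entry means more flow between the two categories than the row and column totals alone would suggest; a negative entry means less.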
Matrix of Information flow
17
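Matrix of information flow between pairs of categories:
$$\Phi = \begin{pmatrix}
\Phi_{1,1} & \cdots & \Phi_{1,j} & \cdots & \Phi_{1,n} \\
\vdots & & \vdots & & \vdots \\
\Phi_{i,1} & \cdots & \Phi_{i,j} & \cdots & \Phi_{i,n} \\
\vdots & & \vdots & & \vdots \\
\Phi_{n,1} & \cdots & \Phi_{n,j} & \cdots & \Phi_{n,n}
\end{pmatrix}$$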
A high proximity (Φ_{i,j}) between categories i and j reveals a strong
tendency for semantic concepts associated with category j to result
from invoking services that take semantic concepts associated with
category i as input.
5-Experimental Settings
18
27000 public Web services (WSDLs), collected 2005-2011
Semantic Annotation
- Lexico-syntactic based ontology learning
- Annotation accuracy: Precision = 31%, Recall = 19%
Categorization
- AWSC classifier
- Training dataset: 1500 WSDLs
- Categorization accuracy: 91%
Category
1-Communications server
2-Instant messaging
3-Adventure games
4-Internet directory services
5-Music or sound editing
6-Calendar and scheduling
7-Mobile operator specific
8-Medical software
9-Video conferencing
10-Map creation software
11-Network operation system
12-Database management system
13-Analytical or scientific
14-Portal server
15-Foreign language software
16-Procurement software
17-Inventory management software
18-Dictionary software
19-Fax software
20-Object oriented database management
19
Excerpt of Identified Service Categories
20
Visualization of Matrix of Information Flow
Information Exchange Patterns - 1:
21
Self-Referential Pattern: A category mainly provides inputs
for its own services and mostly consumes the information
it provides itself (i.e. it is self-contained).
Appears on the diagonal of the matrix
Categories: Financial Analysis Software, Web Platform Development
Software, Map Creation Software, Video Conferencing Software and
Accounting Software
The APIs exposed by these Web services frequently use
domain-specific concepts as input and output elements
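One way to flag self-referential categories in the matrix is sketched below, under the assumption that a category qualifies when its diagonal entry is positive and is the largest value in both its row and its column. This criterion, the matrix values and the choice of category names are illustrative and not taken from the study.

```python
def self_referential(phi, names):
    """Return categories whose diagonal entry dominates their row and column.

    Assumed criterion (not from the slides): phi[i][i] is positive and is the
    maximum of row i and of column i.
    """
    n = len(phi)
    found = []
    for i in range(n):
        row_max = max(phi[i])
        col_max = max(phi[k][i] for k in range(n))
        if phi[i][i] > 0 and phi[i][i] >= row_max and phi[i][i] >= col_max:
            found.append(names[i])
    return found

# Hypothetical normalised flow matrix for three categories.
names = ["Map creation", "Foreign language", "Presentation"]
phi = [
    [ 2.0, -0.5, -0.4],
    [-0.6,  0.1,  1.2],
    [-0.3,  0.9,  0.2],
]
result = self_referential(phi, names)
```

In this toy matrix only the first category is flagged; the other two exchange more with each other than with themselves, which corresponds to an off-diagonal pattern.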
Information Exchange Patterns - 2:
22
Outside the main diagonal:
- Foreign Language category, Presentation category
- Financial Analysis category, Enterprise Resource Planning category
Least volume of information flow:
- Video Conferencing software and Financial Analysis software
Threats to Validity
23
The presented model relies heavily on the accuracy of the underlying semantic annotation and matching scheme.
The examined Web services account for only a small proportion of those existing on the Web.
The collection of Web service interface descriptions may also suffer from an unintentional bias toward some specific categories.
In the absence of a timing factor, our analysis is a rather static analysis of information flow.
Conclusion and Future Work
24
The presented approach can discover information exchange
patterns.
In general, our approach is applicable to any other kind of
machine-understandable API, not just WSDLs.
Future work:
To examine how the presence of service compositions or mashups
influences the information exchange patterns
To recommend value-added Web services based on the identified
information exchange patterns and Web service network
properties
Thanks!
Questions Please!
25
26
Partial category weight for edge $(D_s, D_t)$: $q_{u,s} \cdot q_{v,t}$
Augmented category weight for edge $(D_s, D_t)$:
$$W(D_s, D_t) = \sum_{edge(u,v)} q_{u,s} \cdot q_{v,t}$$
Ontology Learning for
Web service Annotation [1]
27
Reference Ontology
Adding Relations
Ontology Organization
Term Extraction
Syntactic Refinement
Information Elicitation
Pattern-based Semantic Analysis
Term Disambiguation
Class and Relation Determination
Ontology Discovery
Ontology Learning Input:
- Message part names of input/output parameters
- XML Schema leaf element names of complex types
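Names of this kind are usually compound identifiers, so a term extraction step plausibly starts by splitting them into words. The sketch below shows one such tokenisation for camel-case and underscore-separated names; the splitting rule and the sample names are assumptions made for illustration, not the method described in [1].

```python
import re

def extract_terms(name):
    """Split an identifier such as 'getCompanyVATNumber' into lower-case terms."""
    terms = []
    # Break on underscores, hyphens and whitespace first.
    for part in re.split(r"[_\-\s]+", name):
        # Then split camel case, keeping acronyms such as 'VAT' together.
        terms += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part)
    return [t.lower() for t in terms]

camel = extract_terms("getCompanyVATNumber")
snake = extract_terms("business_registry_code")
```

The resulting term lists could then be passed to later stages such as syntactic refinement and pattern-based analysis.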
[1] “Ontology Learning for Cost-Effective Large-Scale Semantic
Annotation of XML Schemas and Web Service Interfaces”, in Proc.
EKAW 2010, LNAI 6317, pp. 401-410, 2010