Semantic Network Analysis 11.07.05
The National Centres of Competence in Research are managed by the Swiss National Science Foundation on behalf of the Federal Authorities
Semantic Network Analysis 11.07.05
Analyzing Semantic Interoperability in Bioinformatic Database Networks
Philippe Cudré-Mauroux, EPFL
Joint work with: Julien Gaugaz, Adriana Budura and Karl Aberer
Overview
1. Peer Data Management Systems (PDMS)
2. Semantic Interoperability in the Large
• Generatingfunctionologic framework
3. The Sequence Retrieval System
• Degree distribution
• Analysis of giant component
• Weighted analysis
4. Conclusions
Beyond Keyword Search
Searching semantically richer objects in large-scale heterogeneous networks
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
date?
<es:DofCreation> 05/08/2004 </es:DofCreation>
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
Decentralized Data Integration
• Large Scale Information Systems (e.g., WWW)
– Number of sources > 100
– Unreliable data
• Autonomy
– Semi-structured data
• E.g., XML/RDF
– No integrity constraints
– No transactions
– Simple SP queries
• E.g., triple patterns, ranking
– Schemata created by end users
– Network churn
VS
• Distributed Databases
– Number of sources < 100
– Consistent data
• Coordination
– Structured data
• E.g., Relational data model
– Integrity constraints
– Transactions
– Powerful queries
• E.g., SQL, aggregation
– Schemas created by administrators
– Relatively fixed topology
Data Integration: LAV/GAV
• Traditional database techniques (e.g., LAV/GAV) rely on centralized schemas to integrate data sources
• Not applicable to our context
– Scale (upper ontologies?)
– Churn
– Autonomy
• How can we foster semantic interoperability in decentralized settings?
[Figure: a central schema attribute Date mapped to the local attributes myDate and yourDate, with m(Date) = myDate and m(Date) = yourDate]
Semantic Interoperability
Q1 = <GUID>$p/GUID</GUID> FOR $p IN /Photoshop_Image WHERE $p/Creator LIKE "%Robi%"
Photoshop (own schema):
<Photoshop_Image>
  <GUID>178A8CD8865</GUID>
  <Creator>Robinson</Creator>
  <Subject>
    <Bag>
      <Item> Tunbridge Wells </Item>
      <Item>Royal Council</Item>
    </Bag>
  </Subject>
  …
</Photoshop_Image>
WinFS (known schema):
<WinFSImage>
  <GUID>178A8CD8866</GUID>
  <Author>
    <DisplayName> Henry Peach Robinson </DisplayName>
    <Role>Photographer</Role>
  </Author>
  <Keyword> Tunbridge </Keyword>
  <Keyword>Council</Keyword>
  …
</WinFSImage>
T12 = <Photoshop_Image> <GUID>$fs/GUID</GUID> <Creator> $fs/Author/DisplayName </Creator> </Photoshop_Image> FOR $fs IN /WinFSImage
Q2 = <GUID>$p/GUID</GUID> FOR $p IN T12 WHERE $p/Creator LIKE "%Robi%"
Extending semantic interoperability techniques to decentralized settings
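The rewriting of Q1 into Q2 through the mapping T12 can be mimicked in a few lines of Python. This is only a sketch: the slide uses an XQuery-like notation, whereas the function names `t12` and `q` and the use of `xml.etree.ElementTree` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# WinFS instance from the slide (closing tags repaired).
WINFS_DOC = """
<WinFSImage>
  <GUID>178A8CD8866</GUID>
  <Author>
    <DisplayName>Henry Peach Robinson</DisplayName>
    <Role>Photographer</Role>
  </Author>
  <Keyword>Tunbridge</Keyword>
  <Keyword>Council</Keyword>
</WinFSImage>
"""


def t12(winfs_image):
    """Hypothetical counterpart of T12: view a WinFSImage as a Photoshop_Image."""
    view = ET.Element("Photoshop_Image")
    ET.SubElement(view, "GUID").text = winfs_image.findtext("GUID")
    ET.SubElement(view, "Creator").text = winfs_image.findtext("Author/DisplayName")
    return view


def q(photoshop_images, needle="Robi"):
    """Counterpart of Q1/Q2: GUIDs of images whose Creator contains `needle`."""
    return [
        image.findtext("GUID")
        for image in photoshop_images
        if needle in (image.findtext("Creator") or "")
    ]


winfs_image = ET.fromstring(WINFS_DOC)
# Q2 is Q1 evaluated over the output of the mapping instead of local data.
result = q([t12(winfs_image)])
print(result)
```

The query itself is unchanged; only its input is replaced by the mapped view, which is the point of expressing the mapping as a view over the foreign schema.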
1. Peer Data Management Systems
• Pairwise mappings
– Peer Data Management Systems (PDMS)
• Local mappings overcome global heterogeneity
– Iterative query rewriting
[Figure: a "date?" query rewritten across peers holding the attributes below; mapping links are labelled es:cDate, xap:CreateDate, myRDF:Date and xap:ModifyDate; other labels in the figure: article, weather]
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<es:cDate> 05/08/2004 </es:cDate>
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
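Iterative query rewriting over pairwise mappings can be sketched as a breadth-first propagation of an attribute name from schema to schema. The attribute names come from the slide, but the mapping topology in `MAPPINGS` and the function name `propagate` are hypothetical.

```python
from collections import deque

# Hypothetical pairwise mappings: source schema -> target schema -> attribute renaming.
MAPPINGS = {
    "es": {"xap": {"cDate": "CreateDate"}},
    "xap": {"myRDF": {"CreateDate": "Date"}},
    "myRDF": {},
}


def propagate(schema, attribute):
    """Rewrite `attribute` hop by hop; return the name it takes in each reachable schema."""
    rewritten = {schema: attribute}
    queue = deque([schema])
    while queue:
        source = queue.popleft()
        for target, attribute_map in MAPPINGS.get(source, {}).items():
            if target in rewritten:
                continue  # already reached through another path
            translated = attribute_map.get(rewritten[source])
            if translated is None:
                continue  # this mapping does not cover the attribute
            rewritten[target] = translated
            queue.append(target)
    return rewritten


reached = propagate("es", "cDate")
print(reached)
```

No peer needs a global schema here: the `es` peer only knows its mapping to `xap`, yet the query still reaches `myRDF` through a second local mapping.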
Semantic Mediation Layer
[Figure: three layers, the "Physical" layer, the Overlay Layer and the Semantic Mediation Layer; the relation between adjacent layers is labelled Correlated / Uncorrelated]
Schema-to-Schema Graph
Inter-organization of the different schemas used by the peers
- Logical model
- Directed
- Weighted
- Redundant
The Semantic Connectivity Graph
• Definition (Semantic Interoperability): Two peers are said to be semantically interoperable if they can forward queries to each other in the Schema-to-Schema graph, potentially through a series of semantic translation links
• Idea
– As for physical network analyses, create a connectivity layer to account for semantic interoperability
• The Semantic Connectivity Graph S
– Unweighted, irreflexive and non-redundant version of the Schema-to-Schema graph
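Deriving S from a Schema-to-Schema graph amounts to dropping weights, self-loops and parallel mappings. A minimal sketch, assuming the Schema-to-Schema graph is given as a list of `(source, target, weight)` triples (a representation chosen here for illustration):

```python
def semantic_connectivity_graph(mapping_edges):
    """Return S as a set of directed (source, target) pairs.

    Weights are dropped (unweighted), self-mappings are skipped (irreflexive),
    and the set collapses parallel mappings (non-redundant).
    """
    return {
        (source, target)
        for source, target, _weight in mapping_edges
        if source != target
    }


schema_to_schema = [
    ("A", "B", 0.9),
    ("A", "B", 0.4),  # redundant mapping between the same schemas
    ("B", "B", 1.0),  # self-mapping
    ("B", "A", 0.7),
]
S = semantic_connectivity_graph(schema_to_schema)
print(sorted(S))
```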
Observations
• Theorem: Peers in a set $P_s$ are semantically interoperable iff $S_s$ is strongly connected, with $S_s = \{ s \mid \exists p \in P_s,\ p \leftrightarrow s \}$
• Observation 1: A set of peers $P_s$ cannot be semantically interoperable if $|E_s| < |V_s|$
• Observation 2: A set of peers $P_s$ is semantically interoperable if $|E_s| > |V_s|(|V_s|-1) - (|V_s|-1)$
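The two observations give a cheap edge-count test that can settle strong connectivity before any traversal is run. The sketch below assumes the graph is already irreflexive and non-redundant, with every edge endpoint listed in `vertices`. It treats fewer than two schemas as trivially interoperable, which is an added convention. The function names are illustrative.

```python
from collections import defaultdict


def reachable(adjacency, start):
    """Vertices reachable from `start` by depth-first search."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def is_strongly_connected(vertices, edges):
    """Exact test: a root reaches everyone, and everyone reaches the root."""
    vertices = set(vertices)
    if not vertices:
        return True
    forward, backward = defaultdict(set), defaultdict(set)
    for source, target in edges:
        forward[source].add(target)
        backward[target].add(source)
    root = next(iter(vertices))
    return reachable(forward, root) == vertices and reachable(backward, root) == vertices


def quick_verdict(vertices, edges):
    """Edge-count shortcut: False, True, or None when the counts are inconclusive."""
    n, m = len(set(vertices)), len(set(edges))
    if n < 2:
        return True  # added convention, not from the slide
    if m < n:
        return False  # Observation 1
    if m > n * (n - 1) - (n - 1):
        return True  # Observation 2
    return None


V = ["A", "B", "C"]
chain = [("A", "B"), ("B", "C")]
ring = [("A", "B"), ("B", "C"), ("C", "A")]
dense = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A"), ("B", "C")]
for edges in (chain, ring, dense):
    print(quick_verdict(V, edges), is_strongly_connected(V, edges))
```

The ring shows why the exact test is still needed: with three edges on three vertices neither observation applies, yet the graph is strongly connected.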
2. Semantic Interoperability in the Large
• Question
– How can we analyze semantic interoperability in large-scale PDMS?
• Idea: use percolation theory to detect the emergence of a strongly connected component in S
– Necessary condition for vertex-strong connectivity
– Necessary condition for semantic interoperability
The Model
• Adaptation of a recent graph-theoretic framework
– Newman, Strogatz, Watts 2001
• Large-scale semantic graphs as random graphs with arbitrary degree distribution
– Exponentially distributed, small-world, scale-free… graphs
• Specificities of our model
– Strong clustering (clustering coefficient cc)
– Bidirectionality (bidirectionality coefficient bc) (for directed networks)
• Based on generatingfunctionology
–
• Percolation: ci > 0
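For comparison with the condition ci > 0, the classical Molloy-Reed criterion for undirected random graphs with a given degree distribution predicts a giant component when $\sum_k k(k-2)\,p_k > 0$. The sketch below computes that classical quantity only. It is an assumption that it resembles the talk's indicator, which additionally involves the clustering and bidirectionality coefficients and is not reproduced here.

```python
from collections import Counter


def degree_distribution(undirected_edges):
    """Empirical p_k over the vertices that appear in at least one edge."""
    degree = Counter()
    for a, b in undirected_edges:
        degree[a] += 1
        degree[b] += 1
    vertex_count = len(degree)
    histogram = Counter(degree.values())
    return {k: count / vertex_count for k, count in histogram.items()}


def molloy_reed_indicator(pk):
    """Classical indicator sum_k k(k-2) p_k (no clustering, no direction)."""
    return sum(k * (k - 2) * p for k, p in pk.items())


# Toy graph: one hub linked to four leaves.
star = [("hub", leaf) for leaf in "abcd"]
pk = degree_distribution(star)
indicator = molloy_reed_indicator(pk)
print(pk, indicator)
```

The criterion is asymptotic, so on a five-vertex toy graph the value only illustrates the arithmetic: vertices of degree 1 pull the indicator down, high-degree vertices pull it up.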
Size of the giant component
With u the smallest non-negative solution of
And G1 the distribution of edges from first to second-order neighbors:
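As a rough illustration, the sketch below uses the standard Newman, Strogatz and Watts relations for undirected random graphs without clustering: giant component fraction $S = 1 - G_0(u)$, with $u$ the smallest non-negative solution of $u = G_1(u)$, $G_0(x) = \sum_k p_k x^k$ and $G_1(x) = G_0'(x) / G_0'(1)$. Using this unadapted form is an assumption: the model in this talk also accounts for clustering and bidirectionality. The function name and the fixed-point iteration are illustrative choices.

```python
def giant_component_fraction(pk, tolerance=1e-12, max_iterations=100_000):
    """Fraction of vertices in the giant component for degree distribution pk.

    pk maps a degree k to its probability p_k; the mean degree must be positive.
    """
    mean_degree = sum(k * p for k, p in pk.items())  # G0'(1)

    def g0(x):
        return sum(p * x**k for k, p in pk.items())

    def g1(x):
        return sum(k * p * x ** (k - 1) for k, p in pk.items() if k > 0) / mean_degree

    # G1 is non-decreasing on [0, 1], so iterating from 0 climbs to the
    # smallest non-negative fixed point.
    u = 0.0
    for _ in range(max_iterations):
        next_u = g1(u)
        if abs(next_u - u) < tolerance:
            u = next_u
            break
        u = next_u
    return 1.0 - g0(u)


mixed = giant_component_fraction({1: 0.5, 3: 0.5})
print(mixed)
```

For the half degree-1, half degree-3 distribution, $G_1(x) = 0.25 + 0.75x^2$, whose smallest fixed point is $u = 1/3$, giving $S = 1 - (1/6 + 1/54) = 22/27$.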
3. The Sequence Retrieval System (SRS)
• Commercial information indexing and retrieval system
• Bioinformatic libraries
– EMBL
– SwissProt
– Prosite
– Etc.
• Schemas described in a custom language (Icarus)
• Mappings (links) from one database to others
Why is SRS interesting?
• Applying our heuristics on a real large-scale corpus of interconnected databases
– More than 380 databanks
– More than 500 (undirected) links
– Data used by professionals on a daily basis
Crawling the SRS schema-to-schema graph
• Custom crawler
• As of May 2005 (EBI repository)
– 388 nodes
– 518 edges
– Giant connected component: 187 nodes
– Power-law distribution of node degrees
– Clustering coefficient = 0.32
– Diameter = 9
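Statistics of this kind can be computed from a crawled edge list roughly as follows. The toy edge list is made up, and the average local (Watts-Strogatz) clustering coefficient is assumed because the slide does not say which definition was used. Only the node counts 187 and 388 are taken from the slide.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(undirected_edges):
    """Adjacency sets for an undirected graph; self-loops are ignored."""
    adjacency = defaultdict(set)
    for a, b in undirected_edges:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def largest_component_size(adjacency):
    """Number of vertices in the largest connected component."""
    seen, best = set(), 0
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def average_clustering(adjacency):
    """Mean over vertices of (links among neighbours) / (possible links)."""
    if not adjacency:
        return 0.0
    total = 0.0
    for neighbours in adjacency.values():
        k = len(neighbours)
        if k < 2:
            continue  # contributes 0 by convention
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)]  # made-up example
adjacency = build_adjacency(toy_edges)
giant = largest_component_size(adjacency)
clustering = average_clustering(adjacency)
observed_fraction = 187 / 388  # giant component nodes / all nodes, from the slide
print(giant, clustering, round(observed_fraction, 2))
```

Vertices without any link never appear in the edge list, so a real crawl would need the full node list to count isolated databanks.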
Results
• Connectivity indicator ci = 25.4
– Super-critical state
• Size of the giant component
– 0.47 (derived)
– 0.48 (observed)
Graphs with the same power-law degree distribution
• Varying number of edges
10x Bigger Graph
Analyzing weighted networks
• Do we have a sufficient number of good mappings?
• Introducing quality measures from the mappings
– Weights
– Attribute / schema level
– Cf. Chatty Web (WWW03)
• Semantic query forwarding
– Per-hop forwarding behaviors
– Only forward if $w_i \geq \tau$
• $\tau = 0$: flooding
• $\tau = 1$: exact answers
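A per-hop forwarding rule of this kind can be sketched as a breadth-first traversal that follows only mappings whose weight reaches a threshold. The symbol `tau` for the threshold, the edge weights and the function name are assumptions made for illustration.

```python
from collections import deque


def forward_query(weighted_edges, origin, tau):
    """Schemas reached when each hop forwards only over mappings with weight >= tau."""
    outgoing = {}
    for source, target, weight in weighted_edges:
        outgoing.setdefault(source, []).append((target, weight))

    reached = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for target, weight in outgoing.get(node, []):
            if weight >= tau and target not in reached:
                reached.add(target)
                queue.append(target)
    return reached


# Made-up mapping qualities along a chain of four schemas.
mappings = [("A", "B", 1.0), ("B", "C", 0.6), ("C", "D", 0.2)]
for tau in (0.0, 0.5, 1.0):
    print(tau, sorted(forward_query(mappings, "A", tau)))
```

With `tau = 0` every mapping is followed (flooding); with `tau = 1` only perfect mappings are followed, so fewer schemas are reached but no answer is degraded by a lossy translation.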
Weighted Results
• Same degree distribution (388 nodes)
• Uniformly distributed weights between 0 and 1
4. Conclusions
• Analyzing a real network of bioinformatic databases
– Accurate results (even for relatively small networks)
– Weighted / unweighted
• Current work
– Compositions of weights along a path
– Semantic random walkers
– Public domain simulator
• Future work
– Analyzing other forwarding behaviors
– Implementation in a real PDMS (self-organizing mappings)
• GridVine
References
A Necessary Condition for Semantic Interoperability in the Large. Philippe Cudré-Mauroux and Karl Aberer. ODBASE 2004.
GridVine: Building Internet-Scale Semantic Overlay Networks. Karl Aberer, Philippe Cudré-Mauroux and Tim van Pelt. ISWC 2004.
Semantic Overlay Networks (Tutorial). Karl Aberer and Philippe Cudré-Mauroux. VLDB 2005.
… complete reference list at http://lsirpeople.epfl.ch/pcudre/
Thank you for your attention
Questions?