designing clustering methods for ontology building: the mo’k workbench authors: gilles bisson,...
Post on 15-Jan-2016
Designing clustering methods for ontology building:
The Mo’K workbench
Authors: Gilles Bisson, Claire Nédellec and Dolores Cañamero
Presenter: Ovidiu Fortu
INTRODUCTION
Paper objectives:
- Present a workbench for developing and evaluating methods that learn ontologies
- Report experimental results that illustrate how well the model characterizes methods for learning semantic classes
INTRODUCTION
General strategy for ontology building:
- Define a distance metric (as good an approximation of the semantic distance as possible)
- Devise or reuse a classification algorithm that uses this distance to build the ontology
Harris’ hypothesis
Formulation: Study of syntactic regularities leads to identification of syntactic schemata made out of combinations of word classes reflecting specific domain knowledge
Consequence: similarity can be measured using co-occurrence in syntactic patterns
Conceptual clustering
Ontologies are organized as acyclic graphs:
- Nodes represent concepts
- Links represent inclusion (generality relation)
The methods considered in this paper rely upon bottom-up construction of the graph
The Mo’K model
Representation of examples:
- Binary syntactic patterns of the form <head – grammatical relation – modifier head>, where <modifier head> is the object and the rest of the pattern is the attribute
- Example: “This causes a decrease in […]” gives <cause Dobj decrease>
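A minimal sketch of how such binary patterns could be stored and counted into an object-by-attribute contingency table. The `Pattern` class, the `contingency` helper and all toy patterns except <cause Dobj decrease> are assumptions made for illustration, not part of Mo’K.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Pattern(NamedTuple):
    """One binary syntactic pattern <head - relation - modifier head>."""
    head: str
    relation: str
    modifier_head: str

    @property
    def obj(self) -> str:
        # The modifier head is the object being described.
        return self.modifier_head

    @property
    def attribute(self) -> tuple:
        # The rest of the pattern (head plus relation) is the attribute.
        return (self.head, self.relation)


def contingency(patterns):
    """Count how often each object occurs with each attribute."""
    table = defaultdict(Counter)
    for p in patterns:
        table[p.obj][p.attribute] += 1
    return table


# Toy data: only the first pattern comes from the slide's example.
patterns = [
    Pattern("cause", "Dobj", "decrease"),
    Pattern("cause", "Dobj", "increase"),
    Pattern("cause", "Dobj", "decrease"),
    Pattern("observe", "Dobj", "decrease"),
]
table = contingency(patterns)
```

Each row of `table` is the attribute distribution of one object, which is the kind of input the distance computation works on.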
Clustering
Bottom-up clustering by joining classes that are near:
- Join classes of objects (nouns or actions, i.e. tuples <verb, relation>) that are frequently determined by the same attributes
- Join attribute classes that frequently determine the same objects
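The object side of this bottom-up joining can be sketched as a simple agglomerative loop. The Jaccard overlap used as the similarity, the `threshold` stopping rule and the function names are assumptions for illustration; Mo’K lets the distance be parameterized and is not tied to this choice.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two attribute sets (a stand-in similarity, assumed here)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bottom_up(objects: dict, threshold: float):
    """Repeatedly merge the two most similar classes.

    `objects` maps each object to the set of attributes that determine it.
    Merging stops when no pair of classes reaches `threshold`.
    """
    # Each class starts as a single object with its own attributes.
    classes = [({name}, set(attrs)) for name, attrs in objects.items()]
    while len(classes) > 1:
        (i, j), best = max(
            (((i, j), jaccard(classes[i][1], classes[j][1]))
             for i, j in combinations(range(len(classes)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        members = classes[i][0] | classes[j][0]
        attrs = classes[i][1] | classes[j][1]
        # Remove the higher index first so the lower one stays valid.
        del classes[j], classes[i]
        classes.append((members, attrs))
    return [members for members, _ in classes]


# Hypothetical toy objects and attributes.
objects = {
    "decrease": {("cause", "Dobj"), ("observe", "Dobj")},
    "increase": {("cause", "Dobj"), ("observe", "Dobj"), ("report", "Dobj")},
    "oven": {("preheat", "Dobj")},
}
result = bottom_up(objects, threshold=0.5)
```

Clustering attribute classes works the same way with the roles of objects and attributes swapped.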
Corpora
Specialized corpora are used for domain-specific ontologies
Corpora are pruned (rare examples are eliminated). The workbench allows the specification of:
- Minimum number of occurrences for a pattern to be considered
- Minimum number of occurrences for an attribute/object to be considered
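A rough sketch of this kind of pruning, assuming the corpus is a list of (object, attribute) pairs with one entry per occurrence. The `prune` function, its default thresholds and the order in which the filters are applied are assumptions, not the workbench's actual settings.

```python
from collections import Counter


def prune(patterns, min_pattern=2, min_object=2, min_attribute=2):
    """Drop rare patterns, then patterns whose object or attribute is rare.

    `patterns` is a list of (object, attribute) pairs, one per occurrence.
    The three thresholds mirror the minimum occurrence counts on the slide;
    their default values are arbitrary.
    """
    pattern_counts = Counter(patterns)
    kept = [p for p in patterns if pattern_counts[p] >= min_pattern]
    object_counts = Counter(obj for obj, _ in kept)
    attribute_counts = Counter(attr for _, attr in kept)
    return [
        (obj, attr)
        for obj, attr in kept
        if object_counts[obj] >= min_object
        and attribute_counts[attr] >= min_attribute
    ]


# Hypothetical toy corpus.
corpus = [
    ("decrease", "cause"), ("decrease", "cause"),
    ("increase", "cause"),
    ("oven", "preheat"), ("oven", "preheat"),
]
pruned = prune(corpus)
```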
Distance modeling
Consider only distances that:
- Take syntactic analysis as input
- Do not use other ontologies (like WordNet)
- Are based on distributions of the attributes of an object
Identify general steps in the computation of these distances to formulate a general model
Distance computation
Step 1: weighting phase
Modify the frequencies of elements in the contingency matrix using the general algorithm:
- Initialization of the weight of each example E: W(E)
- Initialization of the weight of each attribute A: W(A)
- For each example E
  - For each attribute A of the example
    - Calculate W(A) in the context of E
    - Update global W(E)
  - For each attribute A of the example
    - Normalization of the W(A) by W(E)
Step 2: similarity computation phase
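The weighting phase (Step 1) can be sketched as follows. The dictionary layout of the contingency matrix, the `weight_matrix` name and the logarithmic `local_weight` default are assumptions; each concrete distance would plug in its own weighting function at that point.

```python
import math


def weight_matrix(counts, local_weight=lambda f: math.log(1 + f)):
    """Weighting phase over a contingency matrix.

    `counts[example][attribute]` is a raw frequency. `local_weight` is a
    placeholder for whatever measure a given distance plugs in; the log
    used by default is an assumption, not a Mo'K setting.
    """
    weighted = {}
    for example, row in counts.items():
        w_example = 0.0                      # W(E), initialised per example
        w_attr = {}                          # W(A) in the context of E
        for attribute, freq in row.items():
            w_attr[attribute] = local_weight(freq)
            w_example += w_attr[attribute]   # update global W(E)
        # Normalise each W(A) by W(E); guard against empty or all-zero rows.
        weighted[example] = {
            a: (w / w_example if w_example else 0.0) for a, w in w_attr.items()
        }
    return weighted


# Hypothetical toy counts; the identity weight keeps plain relative frequencies.
counts = {"decrease": {"cause": 3, "observe": 1}}
relative = weight_matrix(counts, local_weight=lambda f: f)
```

After this step every row sums to 1 (unless it was empty), so the similarity computation in Step 2 compares distributions rather than raw counts.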
Distance evaluation
The workbench provides support for evaluating metrics
The procedure is:
- Divide the corpus into training and test sets
- Perform clustering on the training set
- Use similarities computed on the training set to classify examples in the test set, then compute precision and recall
- Produce negative examples by randomly combining objects and attributes
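The last two steps of this procedure can be sketched as below. Both helpers and the `accepts` predicate are hypothetical: `accepts` stands in for whatever decision rule the learned classes induce, and the seeded random generator is only there to make the sketch reproducible.

```python
import random


def make_negatives(positives, n, seed=0):
    """Randomly pair objects and attributes that never co-occur in `positives`."""
    rng = random.Random(seed)
    objects = sorted({o for o, _ in positives})
    attributes = sorted({a for _, a in positives})
    seen = set(positives)
    # Cap n by the number of unseen pairs so the loop always terminates.
    n = min(n, len(objects) * len(attributes) - len(seen))
    negatives = set()
    while len(negatives) < n:
        pair = (rng.choice(objects), rng.choice(attributes))
        if pair not in seen:
            negatives.add(pair)
    return sorted(negatives)


def precision_recall(accepts, positives, negatives):
    """Score a classifier `accepts(pair) -> bool` on held-out pairs."""
    tp = sum(1 for p in positives if accepts(p))
    fp = sum(1 for p in negatives if accepts(p))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(positives) if positives else 0.0
    return precision, recall


# Hypothetical held-out pairs and a toy decision rule.
test_positives = [("decrease", "cause"), ("oven", "preheat")]
test_negatives = make_negatives(test_positives, n=5)
scores = precision_recall(lambda pair: pair[1] == "cause",
                          test_positives, test_negatives)
```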
Experiments
Purpose: evaluate Mo’K’s parameterization possibilities and the impact of the parameters on results
Corpora: two French corpora
- One with cooking recipes from the Web (nearly 50,000 examples)
- One with agricultural data (Agrovoc) (168,287 examples)
Results (Asium’s distance, 20% test)
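| Corpus | Learning object | % Induced learned triplets | Recall (test set) | Precision |
|---|---|---|---|---|
| Agrovoc | Action | 40% | 4.7% | 45% |
| Agrovoc | Noun | 38% | 5.3% | 45% |
| Cooking | Action | 34% | 12% | 32% |
| Cooking | Noun | 38% | 9.1% | 52% |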
Recall rate
X-axis: the number of disjoint classes on which recall is evaluated
Class efficiency
Class efficiency: ratio between triplets learned and triplets effectively used in the evaluation of recall
Conclusions
Comments? Questions?
Ontology Learning and Its Application to Automated Terminology Translation
Authors: Roberto Navigli, Paola Velardi and Aldo Gangemi
Presenter: Ovidiu Fortu
Introduction
Paper objectives:
- Present OntoLearn, a system for automated construction of ontologies by extraction of relevant domain terms from text corpora
- Present the use of OntoLearn in the task of translating multiword terms from English to Italian
The OntoLearn architecture
Complex system, uses external resources like WordNet and the Ariosto language processor
The OntoLearn
New important feature:
- Semantic interpretation of terms (word sense disambiguation)
Three main phases:
- Terminology extraction
- Semantic interpretation
- Creation of a specialized view of WordNet
Terminology extraction
Terms selected with shallow stochastic methods
Better quality if syntactic features are used
High frequency in a corpus is not necessarily sufficient:
- “credit card” is a term
- “last week” is not a term
Terminology extraction, continued
The comparison of frequencies in texts from different domains eliminates such constructs as “last week” – domain relevance score
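Relevance of term $t$ in domain $D_k$:
$$DR_{t,k} = \frac{P(t \mid D_k)}{\max_{1 \le j \le n} P(t \mid D_j)}$$
where $P(t \mid D_k)$ is estimated by
$$E(P(t \mid D_k)) = \frac{f_{t,k}}{\sum_{t' \in D_k} f_{t',k}}$$
and $f_{t,k}$ is the frequency of term $t$ in domain $D_k$.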
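As a rough illustration, domain relevance can be computed as the ratio between the estimated probability of a term in the target domain and its largest estimated probability in any domain, with each probability estimated as the term's share of all term occurrences in that domain. The `domain_relevance` function, the nested-dictionary layout and the toy frequencies are assumptions.

```python
def domain_relevance(term, domain, freq):
    """DR of `term` in `domain`.

    `freq[domain][term]` is the raw frequency of a term in a domain corpus.
    P(t|D) is estimated as the term's share of all term occurrences in D.
    """
    def p(t, d):
        total = sum(freq[d].values())
        return freq[d].get(t, 0) / total if total else 0.0

    best = max(p(term, d) for d in freq)
    return p(term, domain) / best if best else 0.0


# Hypothetical frequencies in two toy domains.
freq = {
    "tourism": {"room service": 6, "last week": 4},
    "news": {"room service": 1, "last week": 9},
}
dr_term = domain_relevance("room service", "tourism", freq)
dr_noise = domain_relevance("last week", "tourism", freq)
```

In this toy setting the domain-specific term gets the maximal score, while the generic phrase is penalized because it is relatively more frequent elsewhere.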
Terminology extraction, continued
Domain consensus of a term t in class Dk exploits the frequency of t across documents
$$DC_{t,k} = \sum_{d \in D_k} \left( P_t(d) \log \frac{1}{P_t(d)} \right)$$
where $P_t(d)$ is the probability that document $d$ includes $t$.
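Domain consensus is an entropy over the documents of a domain: it sums P log(1/P) over the documents, so a term spread evenly across documents scores higher than one concentrated in a single document. The sketch below is one possible reading; the way the per-document probability is estimated (as the share of the term's occurrences found in each document) and the `domain_consensus` name are assumptions.

```python
import math


def domain_consensus(term, documents):
    """Entropy of a term's distribution over the documents of one domain.

    `documents` is a list of {term: frequency} dicts. P_t(d) is estimated
    here as the share of the term's occurrences found in document d, which
    is an assumption about how the probability is obtained.
    """
    counts = [doc.get(term, 0) for doc in documents]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return sum(p * math.log(1 / p) for p in probs)


# Hypothetical documents: one term spread evenly, one concentrated.
spread = [{"room service": 2}] * 4
concentrated = [{"room service": 8}, {}]
dc_spread = domain_consensus("room service", spread)
dc_concentrated = domain_consensus("room service", concentrated)
```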
Terminology extraction, continued
A combination of the two scores is used to detect relevant terms
$$DW_{t,k} = \alpha \, DR_{t,k} + (1 - \alpha) \, DC^{norm}_{t,k}$$
where $DC^{norm}_{t,k}$ is a normalized entropy and $\alpha \in (0, 1)$.
Only the terms with DW larger than a threshold are retained
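A small sketch of the combination and the threshold filter, assuming the combined weight is a linear mix of domain relevance and normalized domain consensus controlled by a coefficient between 0 and 1. The function names, the default coefficient and the toy scores are placeholders, not values from the paper.

```python
def domain_weight(dr, dc_norm, alpha=0.5):
    """Linear combination of relevance and normalised consensus."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * dr + (1 - alpha) * dc_norm


def select_terms(scores, threshold, alpha=0.5):
    """Keep terms whose DW exceeds `threshold`.

    `scores` maps each term to a (DR, normalised DC) pair.
    """
    return {
        term for term, (dr, dc_norm) in scores.items()
        if domain_weight(dr, dc_norm, alpha) > threshold
    }


# Hypothetical (DR, normalised DC) pairs.
scores = {"room service": (1.0, 0.8), "last week": (0.4, 0.2)}
retained = select_terms(scores, threshold=0.5)
```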
Semantic interpretation
Step 1: create semantic nets for every word $w_k \in t$ and every synset of $w_k$ by following all WordNet links, but limiting the path length to 3 (after disambiguation of words)
Step 2: intersect the networks and compute a score based on the number and type of semantic patterns connecting the networks
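The two steps can be sketched on a toy graph. Everything here is an assumption made for illustration: the graph is a hand-made dictionary rather than WordNet, the sense labels are invented, and the score simply counts shared nodes, whereas the slide's score also depends on the type of the connecting patterns.

```python
from collections import deque


def semantic_net(graph, start, max_depth=3):
    """Nodes reachable from `start` within `max_depth` links (breadth-first)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return set(depth)


def best_sense_pair(graph, senses_a, senses_b):
    """Pick the sense pair whose nets share the most nodes.

    Counting shared nodes is a simplification: the slide's score also
    depends on the type of the connecting patterns.
    """
    return max(
        ((a, b) for a in senses_a for b in senses_b),
        key=lambda pair: len(semantic_net(graph, pair[0])
                             & semantic_net(graph, pair[1])),
    )


# Hypothetical link structure, not taken from WordNet.
graph = {
    "site#1": ["location", "excavation"],
    "site#2": ["computer", "web"],
    "archeological#1": ["archeology"],
    "archeology": ["excavation", "antiquity"],
    "excavation": ["location"],
}
chosen = best_sense_pair(graph, ["archeological#1"], ["site#1", "site#2"])
```

Limiting the search depth keeps each net small, so the intersection stays cheap even when a word has many senses.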
Semantic interpretation, continued
Semantic patterns are instances of 13 predefined metapatterns
Example: Topic, as in “archeological site”: $S_1 \xrightarrow{topic} S_2$
Compute the score ($S_i^k$ is sense $k$ of word $i$ in the term) for all possible pairs:
$$I = SN(S_i^k) \cap SN(S_{i+1}^j)$$
Semantic interpretation, continued
Use the common paths in the semantic networks to detect semantic relations (taxonomic knowledge) between concepts:
- Select a set of domain-specific semantic relations
- Use inductive learning to learn semantic relations given ontological knowledge
- Apply the model to detect semantic relations
Errors from the disambiguation phase can be corrected here
Creation of a specialized view of WordNet
In the last phase of the process:
- Construct the ontology by eliminating from the semantic networks the WordNet nodes that are not domain terms
A domain core ontology can also be used as backbone
Translating multiword terms
Classic approach: use of parallel corpora
- Advantage: easy to implement
- Disadvantage: few such corpora exist, especially in specific domains
OntoLearn-based solution:
- Use EuroWordNet and build ontologies in both languages, associating them to synsets
Translation – the experiment
Experiment on 405 complex terms in a tourism corpus
Problem: poor encoding of Italian words in EuroWordNet (fewer terms than in the English version), which reduced the set to 113 examples
Use semantic relations given by OntoLearn to translate: room service → servizio in camera
| Quality of translation | Good | Acceptable | Poor |
|---|---|---|---|
| Manually corrected input | 74% | 14% | 12% |
| OntoLearn input | 70% | 14% | 16% |
Conclusions
Questions? Comments?