Oana Adriana Şoica
Building and Ordering a SenDiS Lexicon Network
Page 2
SenDiS
o SenDiS operates on a specific lexicon network (LexNet):
   - “sense tagged glosses” relations
   - lexicon networks obtained from other semantic / lexical relations
o obtaining a SenDiS LexNet:
   - build a “sense tagged glosses” LexNet (manually annotate the lexicon with a specific tool)
   - import a “sense tagged glosses” LexNet (WordNet tagged glosses, as of 2008)
o preprocessing (ordering) the SenDiS LexNet (before WSD):
   - truncation of the LexNet
   - leveling of the LexNet
Outline
Page 3
o hypernyms
o hyponyms
o similar to
o has part
o synonyms
o antonyms
o holonyms
o meronyms
o coordinate terms
o troponyms
o entailment
Semantic/Lexical Relations
Page 4
[Figure: an excerpt of the WordNet semantic network*]
* Navigli, R. 2009. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2, Article 10 (2009)
Semantic/Lexical relations: WordNet
Page 5
Semantic/Lexical relations: GRAALAN

| Tail of relation | Head of relation | Relation type |
| --- | --- | --- |
| {synonym} | {synonym} | Bidirectional, symmetric |
| {antonym} | {antonym} | Bidirectional, symmetric |
| {paronym} | {paronym} | Bidirectional, symmetric |
| {hypernym} | {hyponym} | Bidirectional, asymmetric |
| {connotation} | - | Unidirectional |
| {holonym} | {meronym} | Bidirectional, asymmetric |
| {homonym} | {homonym} | Bidirectional, symmetric |
| {heteronym} | {heteronym} | Bidirectional, symmetric |
| {homophone} | {homophone} | Bidirectional, symmetric |
| {diminutive of} | {diminutive by} | Bidirectional, asymmetric |
| {augmentative of} | {augmentative by} | Bidirectional, asymmetric |
| {extension from} | {extension into} | Bidirectional, asymmetric |
| {reduction from} | {reduction into} | Bidirectional, asymmetric |
| {generalization from} | {generalization into} | Bidirectional, asymmetric |
| {specialization from} | {specialization into} | Bidirectional, asymmetric |
| {figurative of} | {literal for} | Bidirectional, asymmetric |
| {reference to} | - | Unidirectional |
| {derived from} | {derived into} | Bidirectional, asymmetric |
| {back formatted form} | {back formats} | Bidirectional, asymmetric |
| {abstract for} | {concretized from} | Bidirectional, asymmetric |
| {with variant} | {variant for} | Bidirectional, asymmetric |
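The three relation types in the GRAALAN table follow from the tail and head labels alone: a relation with no head label is unidirectional, one whose two labels coincide is symmetric, and any other is asymmetric. The sketch below illustrates that reading in Python. `RelationType`, `RELATIONS` and `inverse_label` are hypothetical names chosen for this example and are not part of GRAALAN or SenDiS; only a few rows of the table are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelationType:
    """One row of the relation table (hypothetical representation)."""
    tail: str
    head: Optional[str]  # None when the table shows "-" (unidirectional)

    @property
    def kind(self) -> str:
        if self.head is None:
            return "Unidirectional"
        if self.head == self.tail:
            return "Bidirectional, symmetric"
        return "Bidirectional, asymmetric"


# A handful of rows copied from the table, for illustration only.
RELATIONS = [
    RelationType("synonym", "synonym"),
    RelationType("hypernym", "hyponym"),
    RelationType("holonym", "meronym"),
    RelationType("connotation", None),
    RelationType("derived from", "derived into"),
]


def inverse_label(label: str) -> Optional[str]:
    """Return the label at the other end of a bidirectional relation, if any."""
    for rel in RELATIONS:
        if rel.head is None:
            continue
        if label == rel.tail:
            return rel.head
        if label == rel.head:
            return rel.tail
    return None


if __name__ == "__main__":
    for rel in RELATIONS:
        print(rel.tail, "->", rel.head, ":", rel.kind)
    print(inverse_label("hyponym"))
```

Storing only the label pair and deriving the type keeps the table and the classification from drifting apart; whether the original system stores the type explicitly is not stated in the slides.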
Page 6
o manually annotating the glosses of a lexicon (using a specific tool that eases the process)
o importing an existing “gloss tagged” lexicon net (itself obtained manually or semi-automatically); this usually creates a dependency on a specific list of meanings/glosses
Obtaining a SenDiS LexNet
Page 7
o implied a significant effort, usually measured in months, involving several trained linguists
o using a specialized collaborative tool (BuildLNTool, the Build Lexicon Network Tool)
o enriching the “gloss tagged” relation with three relative degrees of importance (in the gloss context): weak, medium, strong; or ignoring the gloss word
o SenDiS objective, two LexNets:
   - “gloss tagged” LexNet for the Romanian language
   - “gloss tagged” LexNet for the English language
Creating the SenDiS LexNet
Page 8
o BuildLNTool (Build Lexicon Network Tool) provides:
   - a visual and effective mechanism to manually annotate the lexicon glosses
   - a synchronized overview of the already created relations
   - a browsing mechanism for inspecting the already tagged glosses and relations
BuildLNTool
Page 9
“Lemmas & MWEs” “Lemma \ MWE Info” “Competence & Definition Trees”
“Root & Leaf Meanings” Messages and progress
BuildLNTool - Sections
Page 10
o “Lemmas & MWEs”: list of lexicon entries
o “Root & Leaf Meanings”: list of roots and leaves of the lexicon network
o “Lemma/MWE Info”: current lexicon entry being analyzed
o “Competence & Definition Trees”: spanning trees for a given meaning over the current lexicon net
o section for messages and progress
BuildLNTool – Sections II
Page 11
selection of lexicon entry type
selection of unfinished lexicon entries filter
selection of viewing interval
text filter
lexicon entry text
lexicon entry status
BuildLNTool – Lemmas & MWEs
Page 12
double click
BuildLNTool – Selection of a current lexicon entry
Page 13
lexicon entry text morphologic interpretation
list of meanings filters
meaning/gloss fully tagged
meaning/gloss partially tagged
meaning/gloss not tagged
BuildLNTool – Browsing the meanings of the current lexicon entry
Page 14
double click
BuildLNTool – Selection of a current meaning for tagging
Page 15
unrecognized gloss constituent
‘Enter’
BuildLNTool – Gloss constituent without interpretations
Page 16
Default setting: Medium
BuildLNTool – Degrees of relevance (in gloss context)
Page 17
‘Strong’ tokens
‘Medium’ tokens
‘Weak’ tokens
Ignored (X) tokens
BuildLNTool – Degrees of relevance II
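As an illustration of the annotation model described on these slides, the sketch below represents a gloss constituent with an optional sense and one of the four relevance settings (strong, medium, weak, ignored), with Medium as the default. All names (`Relevance`, `GlossToken`, `tagging_status`, the `sense_id` strings) are hypothetical. The rule used to decide between "fully", "partially" and "not" tagged is an assumption made for this example, not the documented behaviour of BuildLNTool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Relevance(Enum):
    STRONG = "strong"
    MEDIUM = "medium"
    WEAK = "weak"
    IGNORED = "ignored"  # the "X" tokens on the slide


@dataclass
class GlossToken:
    text: str
    sense_id: Optional[str] = None            # hypothetical sense identifier
    relevance: Relevance = Relevance.MEDIUM   # slides: default setting is Medium


def tagging_status(tokens: List[GlossToken]) -> str:
    """Classify a gloss, ASSUMING ignored tokens do not need a sense."""
    relevant = [t for t in tokens if t.relevance is not Relevance.IGNORED]
    tagged = [t for t in relevant if t.sense_id is not None]
    if not tagged:
        return "not tagged"
    if len(tagged) == len(relevant):
        return "fully tagged"
    return "partially tagged"


if __name__ == "__main__":
    gloss = [
        GlossToken("a", relevance=Relevance.IGNORED),
        GlossToken("domestic"),
        GlossToken("animal", sense_id="animal#1", relevance=Relevance.STRONG),
    ]
    print(tagging_status(gloss))
    gloss[1].sense_id = "domestic#2"
    print(tagging_status(gloss))
```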
Page 18
Unsaved annotations
Saved annotations
BuildLNTool – Gloss tagging
Page 19
view of meaning tagging tree
selection of constituent / group of gloss constituents
set / modify relevance degree
edit text of gloss constituent
select / modify the sense for the gloss constituent
further annotate meaning / save annotations
choose the next meaning
further on
save annotations
current gloss constituent without sense interpretations
BuildLNTool – Gloss tagging protocol
Page 20
| LexNets | AllTokens | OperatedTokens | OpTokensValid | OpTokensRelated | OpTokens V & R |
| --- | --- | --- | --- | --- | --- |
| LL_Romanian - 99% | 1,528,819 | 1,191,942 | 691,010 | 720,420 | 686,210 |
| LL_English - 2% | 36,828 | 30,350 | 18,523 | 17,641 | 17,505 |

| LexNets | Glosses | Tagged Glosses | Targeted Glosses | Tags Density |
| --- | --- | --- | --- | --- |
| LL_Romanian - 99% | 130,087 | 118,536 | 58,976 | 0.5757 |
| LL_English - 2% | 259,651 | 3,496 | 7,551 | 0.5767 |
Built LexNets for Romanian and English
Page 21
o WordNet (3.0) is organized in synsets:
   - 117,659 synsets
   - 155,287 words (lexicon entries)
   - 206,941 word-sense pairs (gloss + usage examples)
o the synsets were split and transformed into a classical lexicon format
o the lexicon network imported:
| LexNets | Glosses | Tagged Glosses | Targeted Glosses | Tags Density |
| --- | --- | --- | --- | --- |
| WordNet | 206,941 | 206,938 | 59,251 | 0.3486 |
| WordNet_extendedGlosses | 206,941 | 206,941 | 83,174 | 0.3006 |

| LexNets | AllTokens | OperatedTokens | OpTokensValid | OpTokensRelated | OpTokens V & R |
| --- | --- | --- | --- | --- | --- |
| WordNet | 2,394,190 | 2,394,190 | 2,394,189 | 834,803 | 834,803 |
| WordNet_extendedGlosses | 3,114,968 | 3,114,968 | 3,114,967 | 936,397 | 936,397 |
Imported WordNet tagged glosses
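The slides do not define "Tags Density". As a worked check, the reported values for all four LexNets coincide with OpTokens V & R divided by OperatedTokens, truncated (not rounded) to four decimals. The Python sketch below reproduces that arithmetic; treating this ratio as the definition is an assumption inferred from the numbers, and `truncated_ratio` is a hypothetical helper name.

```python
# (OperatedTokens, OpTokens V & R, Tags Density as reported on the slides)
LEXNETS = {
    "LL_Romanian - 99%": (1_191_942, 686_210, "0.5757"),
    "LL_English - 2%": (30_350, 17_505, "0.5767"),
    "WordNet": (2_394_190, 834_803, "0.3486"),
    "WordNet_extendedGlosses": (3_114_968, 936_397, "0.3006"),
}


def truncated_ratio(numerator: int, denominator: int, digits: int = 4) -> str:
    """Return numerator/denominator truncated to `digits` decimals, as text.

    Integer arithmetic is used so the result does not depend on float rounding.
    """
    scale = 10 ** digits
    scaled = numerator * scale // denominator
    return f"{scaled // scale}.{scaled % scale:0{digits}d}"


if __name__ == "__main__":
    for name, (operated, valid_and_related, reported) in LEXNETS.items():
        computed = truncated_ratio(valid_and_related, operated)
        print(f"{name}: computed {computed}, reported {reported}")
```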
Page 22
o “gloss tagged” lexicon nets are large and dense graphs:
   - between 100,000 and 200,000 vertices
   - over 1,000,000 edges / arcs
o to ease operating on such graphs, “gloss tagged” lexicon nets can be preprocessed and optimized:
   - truncation of a lexicon net
   - leveling of a lexicon net
o aims when optimizing a lexicon net:
   - elimination of loops or strongly connected components
   - a minimum number of removed edges
   - leveling on a minimum number of levels
   - minimization/maximization of root/leaf vertices
Ordering a SenDiS LexNet
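The two preprocessing steps named here, breaking loops and then leveling, can be sketched as follows. This is a minimal illustration and not the Patentv1 or NT_eades algorithms from the results: it removes the back arcs found by a depth-first search (which breaks every cycle but does not minimize the number of removed arcs), then assigns each vertex the length of the longest path reaching it from a root. The function names and the toy graph are assumptions made for this example, and every arc endpoint is assumed to be listed in `vertices`.

```python
from collections import defaultdict, deque


def remove_back_edges(vertices, edges):
    """Return (kept, removed): drop each arc that closes a cycle during a DFS."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = {v: WHITE for v in vertices}
    kept, removed = [], []
    for start in vertices:
        if color[start] != WHITE:
            continue
        color[start] = GREY
        stack = [(start, iter(succ[start]))]
        while stack:
            u, successors = stack[-1]
            for v in successors:
                if color[v] == GREY:
                    removed.append((u, v))  # v is on the current path: back arc
                    continue
                kept.append((u, v))
                if color[v] == WHITE:
                    color[v] = GREY
                    stack.append((v, iter(succ[v])))
                    break
            else:
                color[u] = BLACK  # all successors handled
                stack.pop()
    return kept, removed


def assign_levels(vertices, dag_edges):
    """Longest-path leveling of an acyclic graph; roots get level 0."""
    succ = defaultdict(list)
    indegree = {v: 0 for v in vertices}
    for u, v in dag_edges:
        succ[u].append(v)
        indegree[v] += 1
    level = {v: 0 for v in vertices}
    queue = deque(v for v in vertices if indegree[v] == 0)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return level


if __name__ == "__main__":
    vertices = ["a", "b", "c", "d"]
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
    kept, removed = remove_back_edges(vertices, edges)
    print("removed:", removed)
    print("levels:", assign_levels(vertices, kept))
```

Longest-path leveling tends to produce many levels, and a DFS removes more arcs than necessary on dense graphs, so the stated aims (few removed edges, few levels) would call for stronger heuristics than this sketch.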
Page 23
[Figure: lexicon net diagram with elements labeled e1 to e9]
A minimal lexicon net in the original form
Unordered LexNet
Page 24
[Figure: leveled lexicon net diagram with labels 1 to 11, e1 to e11, V and B]
The same minimal lexicon net leveled
Ordered (leveled) LexNet
Page 25
| LNs | Vertices | Edges In | OLN Algorithm | Edges Out | Edges Removed | Levels | Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| wn | 202,361 | 834,803 | Patentv1 | 821,048 | 13,755 | 192 | 4.5 |
| wn_ex | 205,188 | 936,397 | Patentv1 | 936,397 | 74,526 | 382 | 5.7 |
| ro_48% | 72,067 | 318,741 | Patentv1 | 308,592 | 10,149 | 195 | 1.6 |
| ro_78% | 100,175 | 523,192 | Patentv1 | 504,210 | 18,982 | 244 | 2.3 |
| ro_99% | 120,472 | 686,784 | Patentv1 | 659,030 | 27,754 | 291 | 2.8 |
| ro_48% | 130,407 | 318,741 | NT_eades | 308,334 | 10,407 | 58 | 60 |
| ro_99% | 130,099 | 686,784 | NT_eades | 654,025 | 32,759 | 70 | 330 |
| wn_ex | 206,941 | 936,397 | NT_eades | 904,992 | 31,405 | 46 | 1,315 |
Results on leveling experimental LexNets
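A quick arithmetic reading of these results, assuming that Edges Removed equals Edges In minus Edges Out (the slide does not state this relation; it is an assumption that fits most rows). The Python sketch below checks each row and computes the share of arcs removed. With the figures as they appear here, the check holds for every row except wn_ex with Patentv1, where Edges Out equals Edges In. `inconsistent_rows` and `removed_share` are hypothetical helper names.

```python
# (LexNet, algorithm, edges_in, edges_out, edges_removed), copied from the table
ROWS = [
    ("wn", "Patentv1", 834_803, 821_048, 13_755),
    ("wn_ex", "Patentv1", 936_397, 936_397, 74_526),
    ("ro_48%", "Patentv1", 318_741, 308_592, 10_149),
    ("ro_78%", "Patentv1", 523_192, 504_210, 18_982),
    ("ro_99%", "Patentv1", 686_784, 659_030, 27_754),
    ("ro_48%", "NT_eades", 318_741, 308_334, 10_407),
    ("ro_99%", "NT_eades", 686_784, 654_025, 32_759),
    ("wn_ex", "NT_eades", 936_397, 904_992, 31_405),
]


def inconsistent_rows(rows):
    """Rows where edges_in - edges_out differs from edges_removed."""
    return [
        (name, algorithm)
        for name, algorithm, e_in, e_out, e_removed in rows
        if e_in - e_out != e_removed
    ]


def removed_share(e_in: int, e_removed: int) -> float:
    """Fraction of the input arcs that the ordering step removed."""
    return e_removed / e_in


if __name__ == "__main__":
    print("rows failing the check:", inconsistent_rows(ROWS))
    for name, algorithm, e_in, _e_out, e_removed in ROWS:
        share = removed_share(e_in, e_removed)
        print(f"{name} / {algorithm}: {share:.2%} of arcs removed")
```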