Identifying Appropriate Concepts for Unknown Words with Formal Concept Analysis
Akihiro Yamamoto (joint work with Madori Ikeda)
Graduate School of Informatics, Kyoto University
28 Nov. 2014

• This talk is based on: Ikeda, M., Yamamoto, A.: Classification by Selecting Plausible Formal Concepts in a Concept Lattice. In Proc. of Workshop on Formal Concept Analysis meets Information Retrieval (FCAIR 2013), CEUR Workshop Proceedings, vol. 977, pp. 22-35, Moscow, Russia, 2013.

Outline
Goal
• Extending many thesauri in a uniform manner
– finding (extrapolating) appropriate meanings of unknown terms
Result
• A multi-class multi-label classification method that
– uses formal concept analysis and does not use parametric machine learning
– does not need explicit feature selection, but selects features implicitly
– adapts to updates of the test data
– classifies test data better than k-NN (the k-nearest neighbor algorithm)

Why Formal Concept Analysis today
• FCA is very popular in the Japanese Discovery Science community,
– mainly because of the efficient algorithm by Uno at NII,
– but it is not well known that this algorithm is for "FCA"
• The syntax and semantics of logic can be interpreted as an instance of FCA
– We would like to present an application of FCA to NLP
– Our work might be interpreted as a type of abduction

Thesaurus Extension
• Relating unregistered terms to the senses of semantically similar registered terms
– Classifying unregistered terms into senses
– Multi-class multi-label classification
[Figure: a thesaurus linking the registered terms "dog", "fox", "wolf", "rock", "stone", "flint" to sense1 and sense2, shown before and after the unregistered term "jackal" is added.]
• A popular approach [Agirre et al.'00, Uramoto'96]
1. Calculating semantic similarities between unregistered and registered terms
• Semantic similarities are estimated by using characteristics of terms acquired from corpora
2. Relating unregistered terms to senses based on the semantic similarities

(Text) Corpus
• Corpora consist of texts in a natural language
– Many kinds of characteristics are also contained
• POS (part of speech), N-gram, frequencies, …
A text: "The quick brown fox jumps over the lazy dog"
3-gram: The quick brown, quick brown fox, brown fox jumps, fox jumps over, jumps over the, over the lazy, the lazy dog
POS (part of speech): The/determiner, quick/adjective, brown/adjective, fox/singular noun, jumps/verb present, over/preposition, the/determiner, lazy/adjective, dog/singular noun
• Semantic similarities are estimated by using characteristics in corpora
• Many kinds of corpora are available for many languages
– Google N-gram (many languages), Penn (English), L’Arboratorie (French), NEGRA (German), Dependency Treebank for Russian, ART Dependency corpus (Japanese), …
• Generally, a corpus has a large amount of data
– Data are automatically gathered by using parsers and syntactic analyzers
– A corpus is a set of high-dimensional data
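As a small illustration of one corpus characteristic, the 3-grams of the example sentence can be produced with a sliding window. This is only a sketch: the helper name `ngrams` and the whitespace tokenization are assumptions made here, not something the talk specifies.

```python
def ngrams(tokens, n):
    """Return all contiguous windows of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Example sentence from the slide, tokenized on whitespace (an assumption).
text = "The quick brown fox jumps over the lazy dog"
trigrams = ngrams(text.split(), 3)

for gram in trigrams:
    print(" ".join(gram))
```

Nine tokens give seven 3-grams, which matches the list on the slide.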

Thesaurus Extension
• Classifying unregistered terms into senses based on semantic similarities
1. Some characteristics are selected as features
2. Finding neighbors
3. Classifying the unregistered term
[Figure: a corpus table of terms (dog, flint, fox, jackal, rock, stone, wolf) and their characteristics is mapped into a feature space; the neighbors of the unregistered term "jackal" in that space decide whether it is classified into sense1 or sense2 of the thesaurus.]
Accuracy of classification depends on features
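The three steps above can be sketched as a plain nearest-neighbor baseline. Everything concrete below is an assumption made for illustration: the feature sets, the sense assignments, the use of the Jaccard index as the similarity between feature sets, and the function names. The talk does not give these details for the popular approach.

```python
def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_labels(unknown_features, features, labels, k):
    """Rank registered terms by similarity and merge the labels of the top k."""
    ranked = sorted(
        features,
        key=lambda term: jaccard(unknown_features, features[term]),
        reverse=True,
    )
    neighbors = ranked[:k]
    predicted = set()
    for term in neighbors:
        predicted |= labels[term]
    return neighbors, predicted


# Step 1: hypothetical features selected from corpus characteristics.
features = {
    "dog": {"bark", "run", "tail"},
    "fox": {"run", "tail", "hunt"},
    "wolf": {"howl", "run", "hunt", "tail"},
    "rock": {"hard", "heavy"},
    "stone": {"hard", "heavy", "small"},
    "flint": {"hard", "spark"},
}
# Hypothetical sense assignment of the registered terms.
labels = {
    "dog": {"sense1"}, "fox": {"sense1"}, "wolf": {"sense1"},
    "rock": {"sense2"}, "stone": {"sense2"}, "flint": {"sense2"},
}

# Steps 2 and 3: find neighbors of "jackal" and classify it.
jackal = {"howl", "hunt", "run", "tail"}
neighbors, predicted = knn_labels(jackal, features, labels, k=2)
print(neighbors, predicted)
```

Changing the hypothetical feature sets changes the ranking, which is the dependence on features that the slide points out.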

Extending Many Thesauri
• For each thesaurus, an appropriate set of features is required
[Figure: three term/feature tables, one each for Thesaurus1, Thesaurus2 and Thesaurus3; each induces a different feature space over the terms t1–t8, so the unknown term u has different neighbors in each.]
• Extending many thesauri needs various features corresponding to each thesaurus
– Selecting features for multi-label multi-class classification is difficult and time-consuming
• For a thesaurus, its extension is executed repeatedly
– Corpora are frequently updated (new unregistered terms appear, new characteristics are adopted)

Our Idea for Extending Many Thesauri
1. Preparing various types of sets of terms as sets of candidates for neighbors, without using any thesaurus
2. Selecting some of the prepared sets as sets of neighbors of unregistered terms for each thesaurus
The prepared sets are represented in a concept lattice
[Figure: candidate sets over the terms t1–t8 and the unknown term u; for each of Thesaurus1, Thesaurus2 and Thesaurus3 a different prepared set is selected as the neighbors of u.]

Problems and Solution
Problems in extending many thesauri
• Extending many thesauri needs various features corresponding to each thesaurus
– Selecting features for multi-label multi-class classification is difficult and time-consuming
• For a thesaurus, its extension is executed repeatedly
– Corpora are frequently updated (new unregistered terms appear, new characteristics are adopted)
Our solution
1. Preparing various types of sets of candidates for neighbors in a concept lattice generated from a corpus
– A concept lattice can be updated incrementally by previously presented algorithms [Nishimura et al.’14, Choi et al.’06, Soldano et al.’10, Valtchev et al.’01]
2. Selecting some of the sets of candidates as sets of neighbors of unregistered terms for each thesaurus

Contents
1. Thesauri and thesaurus extension by using corpora
2. A classification method using Formal Concept Analysis
– Formal definition
– Preparation of candidates for neighbors
– Selection of candidates for neighbors
3. Experiments
4. Conclusion

Formal Concept Analysis (1/2)
• A (formal) context K=(G, M, I)
– G: a set of objects
– M: a set of attributes
– I: a subset of G×M
• Galois connection
– A^I := {m∈M | ∀g∈A. (g, m)∈I} for A⊆G
– B^I := {g∈G | ∀m∈B. (g, m)∈I} for B⊆M
• A formal concept c=(A, B) of K
– A^I=B and A=B^I for A⊆G, B⊆M
– Ex(c) := A, the extent of c
– In(c) := B, the intent of c
K0=(G0, M0, I0)
     m1  m2  m3  m4  m5  m6  m7
g1   ×   ×
g2   ×   ×       ×
g3   ×   ×       ×
g4       ×       ×       ×
g5       ×           ×   ×
g6       ×           ×   ×
g7           ×       ×       ×
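The two derivation operators and the formal-concept condition can be checked mechanically. The sketch below assumes that the crosses of K0 sit where the intents in the lattice B(K0) on the Principal Filter slide place them (for example g4 has m2, m4, m6); the function names are chosen here for illustration only.

```python
# K0 as a mapping from each object to its attribute set.
# Cross positions are inferred from the intents shown in the lattice B(K0).
K0 = {
    "g1": {"m1", "m2"},
    "g2": {"m1", "m2", "m4"},
    "g3": {"m1", "m2", "m4"},
    "g4": {"m2", "m4", "m6"},
    "g5": {"m2", "m5", "m6"},
    "g6": {"m2", "m5", "m6"},
    "g7": {"m3", "m5", "m7"},
}
G = set(K0)
M = set().union(*K0.values())


def intent_of(objects):
    """A^I: the attributes shared by every object in `objects` (all of M if empty)."""
    shared = set(M)
    for g in objects:
        shared &= K0[g]
    return shared


def extent_of(attributes):
    """B^I: the objects that have every attribute in `attributes`."""
    return {g for g in G if attributes <= K0[g]}


def is_formal_concept(A, B):
    """(A, B) is a formal concept iff A^I = B and B^I = A."""
    return intent_of(A) == B and extent_of(B) == A


print(intent_of({"g2", "g3"}))   # attributes common to g2 and g3
print(extent_of({"m2", "m4"}))   # objects having both m2 and m4
print(is_formal_concept({"g2", "g3", "g4"}, {"m2", "m4"}))
```

The pair ({g2}, {m1, m2, m4}) fails the check because g3 also has all three attributes, so the extent must be {g2, g3}.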

Formal Concept Analysis (2/2)
• Originally proposed by Rudolf Wille for classifying data
– represents relationships between objects and attributes
• It is said that he referred to the Galois connection in algebraic geometry
– objects: extended fields, manifolds
– attributes: bases, polynomials
• From the viewpoint of graph theory, a closed item set corresponds to a maximal clique in the bipartite graph
– An efficient algorithm for enumerating all such cliques was developed by Uno et al.

Instances of FCA (1/3)
Frequent Pattern Mining from Transaction Databases
– G: the set of transaction IDs
– M: the set of item IDs
– I: transaction data
[Table: a cross table of objects g1, g2, … against attributes m1, m2, ….]

Instances of FCA (2/3)
Logic
– G: the set of formulae
– M: the class of interpretations
– I: m ⊨ g (the interpretation m satisfies the formula g)
Formal Languages
– G: the set of words
– M: a class of grammars
– I: S ⇒ g in m (the grammar m derives the word g from the start symbol S)
[Tables: each instance is shown with a cross table of objects g1, g2, … against attributes m1, m2, ….]

Instances of FCA (3/3)
• In thesaurus extension,
– G: the set of terms
– M: the set of characteristics
– I: corpus
[Table: a cross table of objects g1, g2, … against attributes m1, m2, ….]

Principal Filter
• The concept lattice B(K) of a context K=(G, M, I)
– γg: the object concept ({g}^II, {g}^I) of an object g∈G
– ↑c: the principal filter {c’∈B(K) | c≤c’} of c∈B(K)
• c≤c’ for formal concepts c, c’∈B(K) iff Ex(c)⊆Ex(c’)
B(K0), the concept lattice of K0=(G0, M0, I0), listed as concept: extent; intent
c1: g1 g2 g3 g4 g5 g6 g7; (no attributes)
c2: g1 g2 g3 g4 g5 g6; m2
c3: g5 g6 g7; m5
c4: g1 g2 g3; m1 m2
c5: g2 g3 g4; m2 m4
c6: g4 g5 g6; m2 m6
c7: g2 g3; m1 m2 m4
c8: g4; m2 m4 m6
c9: g5 g6; m2 m5 m6
c10: g7; m3 m5 m7
c11: (no objects); m1 m2 m3 m4 m5 m6 m7
Highlighted in the figure: γg4 = c8 and ↑γg4 = {c8, c5, c6, c2, c1}
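For a context as small as K0, the lattice and a principal filter can be enumerated by brute force. This is a sketch under two assumptions: the crosses of K0 are placed according to the intents shown in B(K0), and every subset of objects is closed in turn, which is only practical for tiny contexts and is not the efficient enumeration algorithm mentioned in the talk. All function names are illustrative.

```python
from itertools import combinations

# K0 with crosses placed according to the intents shown in B(K0).
K0 = {
    "g1": {"m1", "m2"},
    "g2": {"m1", "m2", "m4"},
    "g3": {"m1", "m2", "m4"},
    "g4": {"m2", "m4", "m6"},
    "g5": {"m2", "m5", "m6"},
    "g6": {"m2", "m5", "m6"},
    "g7": {"m3", "m5", "m7"},
}
G = sorted(K0)
M = set().union(*K0.values())


def intent_of(objects):
    """Attributes shared by all given objects (all of M for no objects)."""
    shared = set(M)
    for g in objects:
        shared &= K0[g]
    return shared


def extent_of(attributes):
    """Objects having all given attributes."""
    return {g for g in G if attributes <= K0[g]}


def all_concepts():
    """Close every subset of objects; the distinct closures form B(K0)."""
    found = set()
    for size in range(len(G) + 1):
        for subset in combinations(G, size):
            intent = intent_of(subset)
            found.add((frozenset(extent_of(intent)), frozenset(intent)))
    return found


def object_concept(g):
    """The object concept ({g}^II, {g}^I)."""
    intent = intent_of({g})
    return frozenset(extent_of(intent)), frozenset(intent)


def principal_filter(c, concepts):
    """All concepts whose extent contains the extent of c."""
    return {d for d in concepts if c[0] <= d[0]}


concepts = all_concepts()
up = principal_filter(object_concept("g4"), concepts)
for extent, intent in sorted(up, key=lambda d: len(d[0])):
    print(sorted(extent), sorted(intent))
```

The run lists five concepts above the object concept of g4, from the extent {g4} up to the full object set.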

Training Set
• A training set τ=(T, L)
– T: a set of known objects, T⊆G
• g∈T: a known object
• u∈G\T: an unknown object
– L: a set of labels
– L: T→2^L (a mapping that gives each known object its set of labels)
[Figure: τ0=(T0, L0) with the known objects g1, g2, g3, g5, g6, g7 in T0 and the labels l1–l8 in L0; edges link each object to its labels.]
• In thesaurus extension,
– A thesaurus is a training set
• Registered terms are known objects
• Unregistered terms are unknown objects
• Senses are labels
[Figure: a thesaurus τ=(T, L) with the registered terms "dog", "fox", "wolf", "rock", "stone", "flint" as T and the senses sense1, sense2 as L.]

Preparation of Candidates
• Preparation is constructing a concept lattice from a context
• A set of candidates for neighbors of an unknown object is the extent of a formal concept
– A context K=(G, M, I), a training set τ=(T, L)
• u: an unknown object, u∈G\T
– All prepared sets of candidates for neighbors of u are represented by the concepts c∈↑γu, each through its extent Ex(c)
[Figure: the principal filter ↑γu with extents c1: g1 g2 g3 g4 g5 g6 u, c2: g1 g2 g3 g4 g5 u, c5: g2 g3 u, c6: g4 g5 u, c8: u.]

Selection of Candidates
• Neighbors for an unknown object u are extracted from extents of selected formal concepts
– The selected formal concepts are called plausible formal concepts
• Policies of selecting plausible formal concepts
1. Precision oriented:
– Neighbors of u should be similar to each other
2. Recall oriented:
– Objects near neighbors of u should also be neighbors of u, but only if Policy 1 is guaranteed
• Score σ(c, τ) of concept c under a training set τ=(T, L)
– Similarities among objects in Ex(c)
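Score σ(c, τ) of concept c is calculated as
\[
\sigma(c, \tau) =
\begin{cases}
0 & \text{if } |Ex(c) \cap T| = 0, \\
1 & \text{if } |Ex(c) \cap T| = 1, \text{ and} \\
\dfrac{\sum_{i=1}^{|Ex(c) \cap T|} \sum_{j=i+1}^{|Ex(c) \cap T|} \mathrm{sim}(g_i, g_j)}{\binom{|Ex(c) \cap T|}{2}} & \text{otherwise,}
\end{cases}
\]
where
\[
\mathrm{sim}(g_i, g_j) = \frac{|L(g_i) \cap L(g_j)|}{|L(g_i) \cup L(g_j)|}
\]
is the pairwise similarity of objects (Jaccard index), so the third case is the average of pairwise similarities.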
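The score can be computed directly from its definition. In the sketch below the label sets are hypothetical: they were chosen here only so that the pairwise Jaccard values agree with the ones in the σ(c2, τ0) example of the talk (0.25, 0, 0.25, 0, 0.2, 0, 0, 0.2, 0.25, 0.666…). The function names are also assumptions.

```python
from itertools import combinations


def sim(labels_i, labels_j):
    """Jaccard index of two label sets (0.0 when both are empty)."""
    union = labels_i | labels_j
    return len(labels_i & labels_j) / len(union) if union else 0.0


def score(extent, labels):
    """sigma(c, tau): average pairwise similarity of the known objects in Ex(c)."""
    known = sorted(g for g in extent if g in labels)
    if len(known) == 0:
        return 0.0
    if len(known) == 1:
        return 1.0
    pairs = list(combinations(known, 2))
    return sum(sim(labels[a], labels[b]) for a, b in pairs) / len(pairs)


# Hypothetical labeling of the known objects g1..g5; "u" is unknown.
labels = {
    "g1": {"l1", "l2"},
    "g2": {"l2", "l3", "l4"},
    "g3": {"l4", "l5", "l6"},
    "g4": {"l1", "l6", "l7"},
    "g5": {"l6", "l7"},
}

print(score({"g1", "g2", "g3", "g4", "g5", "u"}, labels))  # extent of c2
print(score({"g2", "g3", "u"}, labels))                    # extent of c5
print(score({"g4", "g5", "u"}, labels))                    # extent of c6
print(score({"u"}, labels))                                # extent of c8
```

Dividing by the number of pairs rather than by the number of objects is what makes the value for c5, whose extent holds two known objects, equal to sim(g2, g3).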

Plausible Formal Concepts
• Policies of selecting plausible formal concepts
1. Precision oriented:
– Neighbors of u should be similar to each other
2. Recall oriented:
– Objects near neighbors of u should also be neighbors of u, but only if Policy 1 is guaranteed
• For an unknown object u, a formal concept c∈↑γu is plausible if both of the following regulations are satisfied
1. Precision oriented regulation:
• σ(c, τ)≥σ(c’, τ) for any other formal concept c’∈↑γu
2. Recall oriented regulation:
• |Ex(c)|≥|Ex(c’)| for any other formal concept c’∈↑γu s.t. σ(c, τ)=σ(c’, τ)
• Selecting plausible concepts can be regarded as implicit feature selection

Plausible Formal Concepts
• σ(c2, τ0)
= (sim(g1, g2)+sim(g1, g3)+sim(g1, g4)+sim(g1, g5)+sim(g2, g3)+sim(g2, g4)+sim(g2, g5)+sim(g3, g4)+sim(g3, g5)+sim(g4, g5))/10
= (0.25+0+0.25+0+0.2+0+0+0.2+0.25+0.666…)/10
= 0.181…
[Figure: the training set τ0=(T0, L0) with known objects g1–g6 and labels l1–l8, next to ↑γu annotated with scores: c1 (g1 g2 g3 g4 g5 g6 u) 0.187…, c2 (g1 g2 g3 g4 g5 u) 0.181…, c5 (g2 g3 u) 0.2, c6 (g4 g5 u) 0.666…, c8 (u) 0.]

Plausible Formal Concepts
• The formal concept c6 is the only plausible formal concept of the unknown object u
• Finally, the unknown object u is related to the labels l1, l6, and l7, the labels to which the neighbors g4 and g5 in the plausible formal concept are related
• L(u)=∪_{g∈N(u)} L(g), N(u)=∪_{c∈P(u)} Ex(c)
– P(u): the set of plausible formal concepts of u
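The two regulations and the final label assignment can be sketched end to end. Several things here are assumptions: the label sets are hypothetical (chosen so that the scores of c2, c5, c6 and c8 agree with the talk), c1 is left out because the labels of g6 cannot be read from the transcript, exact fractions are used so that the tie test in the recall oriented regulation is not disturbed by rounding, and N(u) is restricted to known objects since only they carry labels.

```python
from fractions import Fraction
from itertools import combinations


def sim(a, b):
    """Jaccard index of two label sets as an exact fraction."""
    union = a | b
    return Fraction(len(a & b), len(union)) if union else Fraction(0)


def score(extent, labels):
    """Average pairwise similarity of the known objects in the extent."""
    known = sorted(g for g in extent if g in labels)
    if len(known) < 2:
        return Fraction(len(known))  # 0 for no known object, 1 for exactly one
    pairs = list(combinations(known, 2))
    return sum(sim(labels[a], labels[b]) for a, b in pairs) / len(pairs)


def plausible_concepts(candidates, labels):
    """Keep the top-scored concepts, then the largest extents among them."""
    scores = {name: score(ext, labels) for name, ext in candidates.items()}
    best = max(scores.values())
    top = [name for name, s in scores.items() if s == best]
    largest = max(len(candidates[name]) for name in top)
    return [name for name in top if len(candidates[name]) == largest]


def infer_labels(candidates, labels):
    """L(u): union of the labels of the known neighbors in plausible concepts."""
    chosen = plausible_concepts(candidates, labels)
    neighbors = set().union(*(candidates[n] for n in chosen)) & set(labels)
    return chosen, set().union(*(labels[g] for g in neighbors))


# Hypothetical labels for g1..g5; extents of the concepts in the filter of u.
labels = {
    "g1": {"l1", "l2"},
    "g2": {"l2", "l3", "l4"},
    "g3": {"l4", "l5", "l6"},
    "g4": {"l1", "l6", "l7"},
    "g5": {"l6", "l7"},
}
candidates = {
    "c2": {"g1", "g2", "g3", "g4", "g5", "u"},
    "c5": {"g2", "g3", "u"},
    "c6": {"g4", "g5", "u"},
    "c8": {"u"},
}

chosen, predicted = infer_labels(candidates, labels)
print(chosen, sorted(predicted))
```

With these inputs the only plausible concept is c6 and u receives l1, l6 and l7, as on the slide.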

Time Complexity
• Classifying an unknown object u
– O(|↑γu|·M^2)
• M = mean size of Ex(c)\T, c∈↑γu
– More unknown objects reduce the time cost
[Figure: the lattice B(K0) with concepts c1–c11 and their extents, marked "unknown" or "scored".]

Thesauri and Corpora for Experiments
• Two thesauri are used as two training sets
– Japanese WordNet 1.0 (JWN)
– Bunruigoihyo (BGH)
• Two corpora are used as two contexts
– Kyoto University’s case frame data (KCF) as K1=(G1, M1, I1)
– Japanese Web 4-gram (J4g) as K2=(G1, M2, I2)
– and the third context (combined) is K3=(G1, M1∪M2, I1∪I2)
• Experiments
– 10-fold cross validation for each pair of a training set (thesaurus) and a context (corpus)
– Measure precision and recall
– Compare with k-NN for k∈{1, 5, 10}

Experimental Results
• Our method is better than k-NN in all accuracies

                             JWN                          BGH
context   method             precision  recall  simple    precision  recall  simple
KCF       with T             0.039      0.274   0.347     0.164      0.553   0.562
          without T          0.066      0.094   0.141     0.238      0.329   0.363
          (#classified)      (443.2 / 764)                (524.3 / 764)
          1-NN               0.026      0.024   0.037     0.103      0.103   0.114
          5-NN               0.007      0.036   0.056     0.031      0.150   0.167
          10-NN              0.004      0.038   0.061     0.016      0.169   0.193
J4g       with T             0.007      0.079   0.114     0.028      0.248   0.269
          without T          0.007      0.052   0.084     0.029      0.237   0.259
          (#classified)      (715.1 / 764)                (753.5 / 764)
          1-NN               0.007      0.007   0.010     0.027      0.027   0.031
          5-NN               0.002      0.013   0.020     0.014      0.070   0.075
          10-NN              0.002      0.018   0.028     0.010      0.100   0.119
Combined  with T             0.030      0.072   0.108     0.132      0.250   0.285
          without T          0.030      0.063   0.098     0.134      0.255   0.283
          (#classified)      (745.5 / 764)                (716.3 / 764)
          1-NN               0.009      0.009   0.014     0.039      0.039   0.043
          5-NN               0.004      0.018   0.031     0.017      0.085   0.094
          10-NN              0.002      0.024   0.039     0.011      0.116   0.011

Statistics for Contexts and Training Sets

                                      KCF      J4g      Combined
# objects                             7,636    7,636    7,636
# attributes                          19,313   7,135    26,448
mean # attributes of objects          3.85     4.70     8.55
# formal concepts                     11,960   20,066   30,540
mean size of extents                  2.55     6.04     4.89
mean # scored concepts for an object  2.99     14.87    18.58

                                      JWN      BGH
# known objects                       6,872    6,872
# labels                              9,560    595
mean # labels of objects              2.19     2.89

• In the experiments, our method works faster
– Classifying an unknown object u
• O(|↑γu|·M^2)
– M = mean size of Ex(c)\T, c∈↑γu
– Other methods
• O(|T|)

Conclusion
• We proposed a multi-class multi-label classification method that
– provides a uniform way of extending many thesauri
– does not need parameters
– selects features implicitly
• Preparing candidates for neighbors as formal concepts
• Selecting formal concepts as neighbors of unknown objects (plausible formal concepts)
• By experiments, we confirmed that our method works faster and classifies more accurately than k-NN
• Future work
– improving the classification accuracy of our method

References
1. [Agirre 00] Agirre, E., Ansa, O., Hovy, E., Martinez, D.: Enriching Very Large Ontologies Using the WWW. Proc. of ECAI’00 Workshop “Ontology Learning” (2000)
2. [Choi 06] Choi, V., Huang, Y.: Faster Algorithms for Constructing a Galois Lattice, Enumerating All Maximal Bipartite Cliques and Closed Frequent Sets. SIAM Conference on Discrete Mathematics (2006)
3. [Davey 02] Davey, B. A., Priestley, H. A.: Introduction to Lattices and Order. Cambridge University Press (2002)
4. [Deng 12] Deng, H., Runger, G.: Feature Selection via Regularized Trees. Proc. of IJCNN’12, pp. 1–8 (2012)
5. [Ganter 99] Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer-Verlag (1999)
6. [GSK 09] Gengo Shigen Kyokai (Language Resources Association), http://www.gsk.or.jp (2009)
7. [Kudo 04] Kudo, T., Kazawa, H.: Web Japanese N-gram Version 1, Gengo Shigen Kyokai (2004)
8. [NINJAL 04] National Institute for Japanese Language and Linguistics, http://www.ninjal.ac.jp/archives/goihyo (2004)
9. [Lopez 06] Lopez, F. G., Torres, M. G., Melian, B., Perez, J. A. M., Moreno-Vega, J. M.: Solving Feature Subset Selection Problem by a Parallel Scatter Search. European Journal of Operational Research, vol. 169, no. 2, pp. 477–489 (2006)
10. [Mok 12] Mok, S. W. H., Gao, H. E., Bond, F.: Using Wordnet to Predict Numeral Classifiers in Chinese and Japanese. Proc. of GWC’12 (2012)
11. [Soldano 10] Soldano, H., Ventos, V., Champesme, M., Forge, D.: Incremental Construction of Alpha Lattices and Association Rules. Proc. of KES’10, pp. 351–360. Springer (2010)
12. [Tan 05] Tan, P., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison Wesley (2005)
13. [Uramoto 96] Uramoto, N.: Positioning Unknown Words in a Thesaurus by Using Information Extracted from a Corpus. Proc. of COLING’96, vol. 2, pp. 956–961. Association for Computational Linguistics (1996)
14. [Valtchev 01] Valtchev, P., Missaoui, R.: Building Concept (Galois) Lattices from Parts: Generalizing the Incremental Methods. Proc. of ICCS’01, pp. 290–303. Springer (2001)
15. [Wang 06] Wang, J., Ge, N.: Automatic Feature Thesaurus Enrichment: Extracting Generic Terms From Digital Gazetteer. Proc. of JCDL’06, pp. 326–333. IEEE (2006)
Thank you!

Contexts for Experiments
• Two contexts are constructed from two Japanese corpora
– Kyoto University’s case frame data
• A case frame is a relation between a noun and a predicate
“inu ga otoko ni hoete iru” (in English, “a dog is barking at a man”)
[Figure: inu (dog) is linked by ga (is/do), and otoko (man) by ni (to), to the predicate hoeru (bark).]
              hoeru (bark), ga (is/do)   hoeru (bark), ni (to)
inu (dog)     ×
otoko (man)                              ×
– Japanese Web 4-gram
[Table: the rows inu and otoko against the attributes ga, otoko, ni, hoete, iru; each row has three crosses.]
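The case-frame context above can be built from (noun, case, predicate) triples, with each attribute being a (predicate, case) pair. This is only a sketch: the triple format and the function name `build_context` are assumptions made here, and the real case frame data certainly has a richer format.

```python
def build_context(triples):
    """Map each noun (object) to its set of (predicate, case) attributes."""
    context = {}
    for noun, case, predicate in triples:
        context.setdefault(noun, set()).add((predicate, case))
    return context


# Triples read off the example sentence "inu ga otoko ni hoete iru".
triples = [
    ("inu", "ga", "hoeru"),    # the dog (subject marker ga) barks
    ("otoko", "ni", "hoeru"),  # barks at the man (marker ni)
]

context = build_context(triples)
for noun, attributes in context.items():
    print(noun, sorted(attributes))
```

Each noun becomes an object of the context and each (predicate, case) pair an attribute, which reproduces the two crosses of the small table.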