Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph
using corpus and dictionary
Presenter: Chun-Ping Wu
Authors: Yeohoon Yoon, Choong-Nyoung Seon, Songwook Lee, Jungyun Seo
IPM 2007
National Yunlin University of Science and Technology
Outline
  Motivation
  Objective
  Methodology
  Experiments
  Conclusion
  Comments
Motivation
Word sense disambiguation (WSD) is a common problem in natural language processing.
Traditional approaches consider only co-occurrence probability.
Sample: I deposit some money in the bank.
Options: bank = financial institution? bank = embankment; shore? bank = a row; a set?
Objective
To construct a WSD system that is easy to implement, learns all polysemous words at once, and covers every polysemous word listed in the machine-readable dictionary (MRD).
To consider the relation between each sense of the context words and the sense of the target word.
Sample: I deposit some money in the bank.
Answer: bank = financial institution
Methodology
Learning step
  Similarity matrix
  Word vector
  Vector representations of sense definitions in the MRD
Disambiguation step
  The definition of the acyclic weighted digraph
  Selecting context words
  Constructing the acyclic weighted digraph
  Searching for the optimal path on the acyclic weighted digraph
Methodology
Learning step
  Similarity matrix
  Word vector
  Vector representations of sense definitions in the MRD
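The three learning-step items can be illustrated with a minimal sketch. The slides do not preserve the formulas, so everything here is an assumption: word vectors are sentence-level co-occurrence counts, the similarity matrix uses cosine similarity, and a sense definition is represented as the sum of the vectors of its definition words. The toy corpus, the sense labels, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences):
    """Word vector (assumed form): counts of words sharing a sentence with each word."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for w in tokens:
            for c in tokens:
                if c != w:
                    vectors[w][c] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def definition_vector(definition_tokens, vectors):
    """Sense-definition vector (assumed form): sum of the vectors of the definition words."""
    total = Counter()
    for w in definition_tokens:
        total.update(vectors.get(w, Counter()))
    return total

# Toy corpus standing in for the real training corpus (hypothetical data).
corpus = [
    ["deposit", "money", "bank"],
    ["bank", "loan", "money"],
    ["river", "bank", "water"],
    ["river", "water", "flood"],
]
vectors = cooccurrence_vectors(corpus)

# Similarity matrix over the vocabulary, stored as a dict keyed by word pairs.
vocab = sorted(vectors)
similarity = {(a, b): cosine(vectors[a], vectors[b]) for a in vocab for b in vocab}

# Toy dictionary definitions for two senses of "bank" (hypothetical entries).
senses = {
    "bank/finance": ["money", "loan"],
    "bank/shore": ["river", "water"],
}
sense_vectors = {s: definition_vector(d, vectors) for s, d in senses.items()}
```

In this toy setting the vector for "deposit" is closer to the "bank/finance" definition vector than to "bank/shore", which is the kind of signal the disambiguation step relies on.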
Methodology
Disambiguation step
  The definition of the acyclic weighted digraph
  Selecting context words
  Constructing the acyclic weighted digraph
  Searching for the optimal path on the acyclic weighted digraph
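The path search can be sketched as a Viterbi-style dynamic program. The slides do not preserve the graph definition, so this sketch assumes a layered graph: one layer per selected word in sentence order, one node per candidate sense, edges only between adjacent layers, and a path score equal to the sum of edge weights. The function name, sense labels, and weights are hypothetical.

```python
def best_sense_path(layers, weight):
    """Find the highest-scoring path through a layered acyclic digraph.

    layers: list of lists of sense labels, one list per word, in sentence order.
    weight: function (sense_a, sense_b) -> float for an edge between adjacent layers.
    Returns (best_score, best_path).
    """
    # best[s] = (score of the best path ending at sense s, that path)
    best = {s: (0.0, [s]) for s in layers[0]}
    for layer in layers[1:]:
        new_best = {}
        for s in layer:
            # Keep only the best predecessor for each sense in the current layer.
            score, path = max(
                ((prev_score + weight(p, s), prev_path)
                 for p, (prev_score, prev_path) in best.items()),
                key=lambda item: item[0],
            )
            new_best[s] = (score, path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])

# Toy edge weights for "I deposit some money in the bank" (hypothetical values).
toy_weights = {
    ("deposit/payment", "money/currency"): 0.9,
    ("deposit/sediment", "money/currency"): 0.1,
    ("money/currency", "bank/finance"): 0.8,
    ("money/currency", "bank/shore"): 0.2,
}

def toy_weight(a, b):
    return toy_weights.get((a, b), 0.0)

layers = [
    ["deposit/payment", "deposit/sediment"],
    ["money/currency"],
    ["bank/finance", "bank/shore"],
]
score, path = best_sense_path(layers, toy_weight)
```

With these toy weights the selected path ends in "bank/finance", matching the answer given for the sample sentence.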
Experiments
System results
Experiment on English
  The accuracy of the system is 30.7% on average.
  This accuracy is very low, for the following reasons:
    The selected context words are not always appropriate, although they are crucial because they decide which sense of the target word is best.
    Mapping English senses to Korean through an English-Korean dictionary loses some information.
    Stemming errors prevented the system from finding the correct verb root in the MRD.
Conclusion
The method considers the relationship between each sense of the context words and the sense of the target word.
The Viterbi algorithm is used to reduce computational complexity.
The system performed poorly on English (30.7% accuracy) but performed acceptably on semantically ambiguous Korean words (76.4% accuracy).
Future work: apply the method to other languages by studying their characteristics.
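The complexity reduction from the Viterbi algorithm can be illustrated by counting work on a layered graph. This is a sketch under the assumption that each word contributes one layer of candidate senses and that edges connect adjacent layers only; the layer sizes are made up for illustration. Exhaustive search scores every complete path, while a Viterbi-style search evaluates each edge once.

```python
from math import prod

def count_full_paths(layer_sizes):
    """Number of complete sense sequences an exhaustive search would score."""
    return prod(layer_sizes)

def count_viterbi_edge_evaluations(layer_sizes):
    """Number of edges a Viterbi-style search evaluates (adjacent layers only)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical example: four words with 3, 4, 2 and 5 candidate senses.
sizes = [3, 4, 2, 5]
full_paths = count_full_paths(sizes)                    # 3 * 4 * 2 * 5
edge_evaluations = count_viterbi_edge_evaluations(sizes)  # 3*4 + 4*2 + 2*5
```

The number of complete paths grows multiplicatively with each added word, while the number of edge evaluations grows additively, which is why the gap widens for longer contexts.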
Comments
Advantage
  Considers the relationship between each sense of the context words and the sense of the target word.
  Uses the Viterbi algorithm to reduce computational complexity.
Drawback
  The system performs much better on Korean than on English (76.4% vs. 30.7% accuracy).
Application
  Word sense disambiguation