Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory
![Page 1: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/1.jpg)
Yuru Jiang, Rou Song
Beijing University of Technology
![Page 2: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/2.jpg)
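Punctuation Clause (PClause)
Example: 斑鳐 (a skate)
斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 。吻 中长 ,尖 突 。尾 细长 ,
(Gloss: "The 斑鳐 is a species of the genus Raja, family Rajidae, order Rajiformes. Snout medium-long, pointed and protruding. Tail slender, ...")
PClause Sequence
c1: 斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 。
c2: 吻 中长 ,
c3: 尖 突 。
c4: 尾 细长 ,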
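The segmentation shown on this slide can be sketched as follows. This is a minimal illustration, assuming whitespace-tokenized text and assuming that a PClause ends at one of a small set of punctuation marks; the function name `split_pclauses` and the `PCLAUSE_END` set are hypothetical and not taken from the slides.

```python
# Punctuation marks assumed to close a PClause (illustrative, not exhaustive).
PCLAUSE_END = {",", ",", "。", ";", ";", "!", "?"}

def split_pclauses(tokens):
    """Group tokens into PClauses, closing a clause at each end mark."""
    clauses, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in PCLAUSE_END:
            clauses.append(current)
            current = []
    if current:  # trailing tokens without closing punctuation
        clauses.append(current)
    return clauses

text = "斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 。 吻 中长 , 尖 突 。 尾 细长 ,"
pclauses = split_pclauses(text.split())
for i, clause in enumerate(pclauses, start=1):
    print(f"c{i}: {' '.join(clause)}")
```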
![Page 3: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/3.jpg)
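PClauses:
c1: 斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 。
c2: 吻 中长 ,
c3: 尖 突 。
c4: 尾 细长 ,
Topic clauses (each PClause with its missing topic restored):
t1: 斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 。
t2: 斑鳐 吻 中长 , (the skate's snout is medium-long)
t3: 斑鳐 吻 尖 突 。 (the skate's snout is pointed and protruding)
t4: 斑鳐 尾 细长 , (the skate's tail is slender)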
What we have done
![Page 4: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/4.jpg)
Identification Process
Identification Algorithm
CTCs Scoring Function
![Page 5: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/5.jpg)
Example 2: 斑鳐 (from the Encyclopedia of China, 《中国大百科全书》)
c1: 斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 。
c2: 吻 中长 ,
c3: 尖 突 。
c4: 尾 细长 ,
t1 = c1
t2 = ?
![Page 6: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/6.jpg)
If:
t1: 斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 。
c2: 吻 中长 ,
then: t2 = ?
CTCs of c2:
1. 吻 中长 ,
2. 斑鳐 吻 中长 ,
3. 斑鳐 是 吻 中长 ,
4. 斑鳐 是 鳐形目 吻 中长 ,
5. 斑鳐 是 鳐形目 鳐科 吻 中长 ,
6. 斑鳐 是 鳐形目 鳐科 鳐属 吻 中长 ,
7. 斑鳐 是 鳐形目 鳐科 鳐属 的 吻 中长 ,
8. 斑鳐 是 鳐形目 鳐科 鳐属 的 1 吻 中长 ,
9. 斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 吻 中长 ,
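The CTCs of c2 appear to follow a regular pattern: each is a word prefix of t1 (from zero words up to all eight) followed by c2. A minimal sketch of that enumeration, assuming whitespace-tokenized clauses; the helper name `candidate_topic_clauses` and the `END_MARKS` set are illustrative, and the authors' actual generation rules may be more restrictive.

```python
# Closing punctuation assumed to end a clause (illustrative).
END_MARKS = {",", ",", "。", ";", ";"}

def candidate_topic_clauses(prev_topic_clause, pclause):
    """Prepend every word prefix of the previous topic clause
    (the empty prefix included) to the current PClause."""
    body = list(prev_topic_clause)
    if body and body[-1] in END_MARKS:
        body = body[:-1]  # drop the closing punctuation before taking prefixes
    return [body[:k] + list(pclause) for k in range(len(body) + 1)]

t1 = "斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 。".split()
c2 = "吻 中长 ,".split()
ctcs_c2 = candidate_topic_clauses(t1, c2)
for i, ctc in enumerate(ctcs_c2, start=1):
    print(i, " ".join(ctc))
```

Applying the same helper to one CTC of c2 and the PClause c3 yields one group of CTCs of c3, which is how the candidates grow into a tree.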
![Page 7: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/7.jpg)
[Figure: CTC tree. Root: t1; next level: CTCs of c2; below: topic clause of c3]
![Page 8: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/8.jpg)
If:
CTCs of c2:
1. 吻 中长 ,
2. 斑鳐 吻 中长 ,
3. 斑鳐 是 吻 中长 ,
4. 斑鳐 是 鳐形目 吻 中长 ,
5. 斑鳐 是 鳐形目 鳐科 吻 中长 ,
6. 斑鳐 是 鳐形目 鳐科 鳐属 吻 中长 ,
7. 斑鳐 是 鳐形目 鳐科 鳐属 的 吻 中长 ,
8. 斑鳐 是 鳐形目 鳐科 鳐属 的 1 吻 中长 ,
9. 斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 吻 中长 ,
c3: 尖 突 ,
then: t3 = ?
![Page 9: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/9.jpg)
If:
one CTC of c2: 斑鳐 是 鳐形目 吻 中长 ,
c3: 尖 突 ,
then one group of CTCs of c3 is:
1. 尖 突 ,
2. 斑鳐 尖 突 ,
3. 斑鳐 是 尖 突 ,
4. 斑鳐 是 鳐形目 尖 突 ,
5. 斑鳐 是 鳐形目 吻 尖 突 ,
6. 斑鳐 是 鳐形目 吻 中长 尖 突 ,
![Page 10: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/10.jpg)
![Page 11: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/11.jpg)
[Figure: CTC tree. Levels: t1, CTCs of c2, CTCs of c3]
![Page 12: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/12.jpg)
How to choose the best path?
![Page 13: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/13.jpg)
Question 1: How to calculate the value of each node in the CTC tree?
◦ CTCs Scoring Function
Question 2: How to calculate the path value from each leaf node to the root node?
◦ Sum of the node values
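The two answers above can be combined into a small search over the CTC tree. The sketch below is only an illustration under stated assumptions: it enumerates every root-to-leaf path exhaustively (which grows quickly and would need pruning on real texts), it takes the scoring function as a parameter, and `toy_score` is a made-up stand-in whose ranking carries no linguistic meaning. The names `expand`, `best_path`, and `toy_score` are hypothetical.

```python
END_MARKS = {",", ",", "。"}

def expand(parent, pclause):
    """Children of a tree node: every word prefix of the parent + the PClause."""
    body = parent[:-1] if parent and parent[-1] in END_MARKS else parent
    return [body[:k] + pclause for k in range(len(body) + 1)]

def best_path(first_clause, pclauses, score):
    """Return (path value, path) maximizing the sum of node scores.
    The root (t1 = c1) is given no score of its own in this sketch."""
    paths = [(0.0, [first_clause])]
    for c in pclauses:
        paths = [(total + score(child), path + [child])
                 for total, path in paths
                 for child in expand(path[-1], c)]
    return max(paths, key=lambda p: p[0])

def toy_score(ctc):
    # Made-up stand-in for the CTCs scoring function: favors short candidates.
    return 1.0 / len(ctc)

t1 = "斑鳐 是 鳐形目 鳐科 鳐属 的 1 种 。".split()
c2 = "吻 中长 ,".split()
c3 = "尖 突 ,".split()
total, path = best_path(t1, [c2, c3], toy_score)
for node in path:
    print(" ".join(node))
print("path value:", total)
```

With a meaningful scoring function in place of `toy_score`, the nodes on the returned path would be read off as t1, t2, t3.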
![Page 14: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/14.jpg)
Given a CTC d of PClause c, the topic clause most similar to d is retrieved from the topic clause corpus T_corpus, and that similarity is denoted sim_CT(d). Writing sim(x, y) for the similarity of any two strings x and y, sim_CT(d) is defined as

$$\mathrm{sim\_CT}(d) = \max_{t \in T_{corpus}} \mathrm{sim}(d, t)$$
![Page 15: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/15.jpg)
Let CTset(c) be the set of CTCs of c. The topic clause of c is then

$$\arg\max_{d \in CTset(c)} \bigl(\mathrm{sim\_CT}(d)\bigr)$$

Accuracy rate is 0.6499.
Reference: Yuru Jiang, Rou Song: Topic Clause Identification Based on Generalized Topic Theory. Journal of Chinese Information Processing 26(5) (2012)
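The scoring and selection rule on this slide can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the slides do not say how sim(x, y) is computed, so the sketch substitutes the ratio from Python's `difflib.SequenceMatcher` over token lists, and the one-entry corpus simply reuses the topic clause t4 from the earlier example.

```python
from difflib import SequenceMatcher

def sim(x, y):
    """Stand-in similarity in [0, 1] between two token lists."""
    return SequenceMatcher(None, x, y).ratio()

def sim_ct(d, corpus):
    """sim_CT(d): similarity of d to its closest topic clause in the corpus."""
    return max((sim(d, t) for t in corpus), default=0.0)

def topic_clause(ctset, corpus):
    """Pick the CTC in CTset(c) with the largest sim_CT value."""
    return max(ctset, key=lambda d: sim_ct(d, corpus))

# Toy corpus for illustration only; the real corpus is the annotated one.
corpus = ["斑鳐 尾 细长 ,".split()]
ctset = [
    "吻 中长 ,".split(),
    "斑鳐 吻 中长 ,".split(),
    "斑鳐 是 吻 中长 ,".split(),
]
best = topic_clause(ctset, corpus)
print(" ".join(best), sim_ct(best, corpus))
```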
![Page 16: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/16.jpg)
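$$\mathrm{ctxSim\_CT}(d) = \max_{t \in T_{corpus}} \bigl(\lambda_1\,\mathrm{sim}(d, t) + \lambda_2\,\mathrm{sim}(d\_c, t\_c) + \lambda_3\,\mathrm{sim}(d\_tc_{pre}, t\_tc_{pre})\bigr)$$

The three weights are written here as $\lambda_1, \lambda_2, \lambda_3$.

Accuracy rate is 0.7625 > 0.6499 > baseline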
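A context-aware score of this kind can be sketched as a weighted sum of three similarities: candidate against corpus topic clause, PClause against PClause, and preceding topic clause against preceding topic clause. Everything specific below is an assumption: the weight values are hypothetical (the slides do not give them), `difflib.SequenceMatcher` stands in for sim, and the corpus is a single triple taken from Example 3.

```python
from difflib import SequenceMatcher

def sim(x, y):
    """Stand-in similarity between two whitespace-tokenized strings."""
    return SequenceMatcher(None, x.split(), y.split()).ratio()

# Hypothetical weights; the slides do not state their values.
WEIGHTS = (0.5, 0.25, 0.25)

def ctx_sim_ct(d, d_c, d_tcpre, corpus, weights=WEIGHTS):
    """corpus: iterable of (t, t_c, t_tcpre) triples."""
    w1, w2, w3 = weights
    return max(
        (w1 * sim(d, t) + w2 * sim(d_c, t_c) + w3 * sim(d_tcpre, t_tcpre)
         for t, t_c, t_tcpre in corpus),
        default=0.0,
    )

# Strings from Example 3 (A, B, C, H are the placeholders used on the slide).
d_tcpre = "A 一般 均 具 H 或 H C ,"
d_c = "用以 引诱 食饵 。"
candidates = [
    "A 一般 均 具 H 用以 引诱 食饵 。",
    "A 一般 均 具 H 或 H C 用以 引诱 食饵 。",
]
corpus = [("A 有些 B C 具 C 以 引诱 食饵 ,", "以 引诱 食饵 ,", "A 有些 B C 具 C ,")]
scores = [ctx_sim_ct(d, d_c, d_tcpre, corpus) for d in candidates]
for d, s in zip(candidates, scores):
    print(round(s, 4), d)
```

Because the second and third terms depend only on d_c and d_tcpre, they are the same for every candidate of one PClause; they matter when scores are compared or summed across different PClauses along a path.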
![Page 17: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/17.jpg)
Example 3:
d_tcpre: A 一般 均 具 H 或 H C ,
d_c: 用以 引诱 食饵 。 ("used to lure prey.")
t1: A 一般 均 具 H 用以 引诱 食饵 。
st1: A C 一般 具 H ,
t2: A 一般 均 具 H 或 H C 用以 引诱 食饵 。

t_tcpre: A 有些 B C 具 C ,
t_c: 以 引诱 食饵 , ("to lure prey,")
t: A 有些 B C 具 C 以 引诱 食饵 ,
![Page 18: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/18.jpg)
Corpus
Evaluation Criteria
Experiment Result Analysis
![Page 19: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/19.jpg)
202 texts about fish from the Biology volume of the Encyclopedia of China
15 texts are used for testing in the experiment
K-1 testing is used
![Page 20: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/20.jpg)
For N PClauses, if the number of PClauses whose topic clauses are correctly identified is hitN, then the identification accuracy rate is hitN/N.
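The criterion above amounts to exact-match accuracy over PClauses. A minimal sketch, assuming that a topic clause counts as correct only when it is identical to the annotated one; the function name `accuracy` and the deliberately wrong prediction are illustrative.

```python
def accuracy(predicted, gold):
    """hitN / N: share of PClauses whose topic clause matches the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one PClause is required")
    hit_n = sum(p == g for p, g in zip(predicted, gold))
    return hit_n / len(gold)

gold = ["斑鳐 吻 中长 ,", "斑鳐 吻 尖 突 。", "斑鳐 尾 细长 ,"]
# Illustrative system output with one error (the second clause lost "吻").
predicted = ["斑鳐 吻 中长 ,", "斑鳐 尖 突 。", "斑鳐 尾 细长 ,"]
acc = accuracy(predicted, gold)
print(f"accuracy = {acc:.4f}")
```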
![Page 21: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/21.jpg)
Fig. 2. PClause count and accuracy rate of topic clause identification for the 15 test texts
![Page 22: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/22.jpg)
![Page 23: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/23.jpg)
![Page 24: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/24.jpg)
![Page 25: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/25.jpg)
CTCs Scoring Function
CTC Tree
Extend to other text
![Page 26: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory](https://reader036.vdocuments.site/reader036/viewer/2022062315/5681552d550346895dc30739/html5/thumbnails/26.jpg)