


Topic Embedding of Sentences for Story Segmentation

Jia Yu∗, Xiong Xiao†, Lei Xie∗‡, Eng Siong Chng†
∗ Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi’an, China
† School of Computer Engineering, Nanyang Technological University, Singapore

E-mail: {jiayu,lxie}@nwpu-aslp.org, {xiaoxiong,ASESChng}@ntu.edu.sg

Abstract— In this paper, we propose to embed sentences into fixed-dimensional vectors that carry the topic information for story segmentation. As a sentence consists of a sequence of words and may have varying length, we use a long short-term memory recurrent neural network (LSTM-RNN) to summarize the information of the whole sentence and only predict the topic class at the last word of the sentence. The output of the network at the last word can be used as an embedding of the sentence in the topic space. We used the obtained sentence embeddings in the HMM-based story segmentation framework and obtained promising results. On the TDT2 corpus, the F1 measure is improved to 0.789 from the 0.765 obtained by a competitive system using a DNN and bag-of-words features.

I. INTRODUCTION

Serving as a necessary precursor for downstream applications [1], [2], [3], [4], story segmentation [1], [5], [6], [7] aims to partition a stream of video, audio or text into a sequence of topically coherent segments, each addressing a central topic. Manual segmentation is accurate but costly in both time and effort; therefore, automatic story segmentation is in high demand. As speech recognition technology has recently achieved great success, lexical-cohesion based approaches on speech transcripts have gained greater emphasis. The representation of sentences in a text is a fundamental step not only for the story segmentation task at hand but also for many natural language processing tasks.

By lexical cohesion, we mean that terms in a coherent story are likely to be semantically related, and different stories tend to use different sets of terms. By measuring the lexical cohesiveness between consecutive sentences in a text, a story boundary can be determined. Thus the representation of words or sentences is the first important step for story segmentation. Traditional lexical-cohesion based approaches, e.g., TextTiling [7], [8], only consider simple term statistics in a sentence, e.g., using a bag-of-words (BOW) representation or term frequency-inverse document frequency (tf-idf) scores. In contrast, topic modeling techniques have been studied to provide semantic-level cohesiveness measurement. Probabilistic latent semantic analysis (PLSA) [9], latent Dirichlet allocation (LDA) [10] and Laplacian PLSA (LapPLSA) [11] have recently become commonly used topic modeling techniques for story segmentation and related tasks [12], [13].

‡Corresponding author

Note that topic modeling can be viewed as a kind of embedding technique, in which the term statistics of a text block (e.g., a sentence) are embedded into a vector over the distribution of latent topics. Recently, deep learning has achieved great success in various areas due to its strong ability of feature learning and modeling. Thus various deep learning architectures have been explored to learn embeddings of words or sentences for different tasks [14], [15], [16], [17], [18]. For example, word vectors derived from a neural network language model (NNLM) can capture syntactic and semantic information [14]. Based on word vectors, document vectors can be derived by adding an additional document weight matrix [15]. Since the semantic meaning learned by a neural network is highly task dependent, in this study we explore a sentence embedding tailored to the story segmentation task through a neural network.

Given the above lexical cohesiveness measures, a variety of story segmentation approaches have been proposed. Popular approaches include dynamic programming (DP) [19], [20], [21], NCuts [13], BayesSeg [22] and dd-CRP [23]. In particular, the hidden Markov model (HMM) [24] has been naturally introduced to story segmentation [25], [26], [27]. A story is treated as an instance of an underlying topic (an HMM hidden state) and words are generated from the distribution of the topic. The emission probability of a state is inferred by a topic-dependent language model (LM), which is calculated by word counting. A Viterbi decoder finds the hidden state sequence, with a change of topic indicating a story boundary. Recently, we proposed a DNN-HMM approach [28] in which, instead of using topic-dependent LMs, we directly map the word observation into topic posterior probabilities using a fully-connected feed-forward deep neural network. We have shown that a deep neural network (DNN) is able to learn meaningful continuous features to represent the topics of a sentence and hence outperforms the traditional n-gram LM HMM approach significantly.

In this paper, we further improve the HMM approach by topic sentence embedding. Specifically, we use an LSTM to generate fixed-length sentence vectors by predicting the topic posterior probability at the last word in the sentence, and the output at the last word is used as the representation of the whole sentence.



Fig. 1. The architecture of the LSTM for topic embedding of sentences. The output of the last word in the sentence is used as the topic embedding of the sentence.

Compared with the previous DNN-based approach [28], in which a sentence is represented by averaging the topic posterior vectors of overlapping word sequences in the sentence, the LSTM accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the output of the network naturally provides a topic embedding of the whole sentence. There are two obvious advantages of the LSTM approach. First, the LSTM has a strong ability to capture long-term memory, while the DNN only uses a limited fixed-length window to model context information. Hence the LSTM is more suitable for modeling the dependence between words and has a stronger capability to exploit topic information in the sentence. Second, the LSTM constructs the sentence vector by summarizing the information of the whole sentence rather than simply averaging the posteriors of the words within the sentence. Meanwhile, our proposed approach is also motivated by deep sentence embedding methods, which have recently achieved great success in document retrieval [29] and neural machine translation [30], [18].

II. TOPIC EMBEDDING OF SENTENCES

A. Sentence embedding through encoding

The proposed framework of topic embedding of sentences is illustrated in Fig. 1. An LSTM network is used to embed a variable-length sentence into a fixed-dimensional vector. The input of the network is the word sequence in the form of 1-hot vectors: s = [w1, ..., wT], where T is the length of the current sentence and varies from sentence to sentence. These 1-hot vectors are projected into word vectors by a linear projection layer. The word vectors are then fed into the LSTM layer one by one. The output of the LSTM layer is projected to the posterior probabilities of the topic classes, denoted as z = [z1, ..., zT], through an affine transform and a softmax layer. Each of these posterior vectors has N dimensions, where N is the number of topic classes. Although T posterior vectors are generated, the cross-entropy cost function is only computed on the posterior vector of the last word, zT, during training. This is to encourage the network to compress the information of the whole sentence into just one single vector. The vector zT is thus used as the sentence embedding.

(a) Topic embedding of sentences using DNN

(b) Topic embedding of sentences using LSTM

Fig. 2. t-SNE visualization of different embeddings. (a) and (b) are visualizations of the sentence vectors derived from the DNN and the LSTM, respectively. Different colors represent the topic labels of the corresponding points. Fewer noise points within a cluster and larger distances between clusters indicate stronger discriminative capability of the feature in the topic space. The data used to generate the plot are the stories from the TDT2 corpus.

Training the LSTM-RNN is achieved by the error back-propagation through time (BPTT) algorithm [31], [32], which propagates the gradient of the cost function from the last word all the way back to the first word.
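To make the encoding step concrete, the following is a minimal PyTorch sketch of such a network under the settings described later (200-dimensional word embedding, a vocabulary of 57,817 words); the class name SentenceTopicLSTM, the 512-cell/150-topic configuration and the toy inputs are illustrative assumptions rather than the authors' released code.

import torch
import torch.nn as nn

class SentenceTopicLSTM(nn.Module):
    """Embed a variable-length sentence into a topic-posterior vector (illustrative sketch)."""
    def __init__(self, vocab_size, embed_dim=200, hidden_dim=512, num_topics=150):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # linear projection of 1-hot words
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_topics)      # affine transform before softmax

    def forward(self, word_ids):
        vectors = self.embed(word_ids)        # (batch, T, embed_dim)
        hidden, _ = self.lstm(vectors)        # (batch, T, hidden_dim)
        return self.out(hidden)               # (batch, T, num_topics) unnormalized scores

model = SentenceTopicLSTM(vocab_size=57817)
criterion = nn.CrossEntropyLoss()

sentence_word_ids = torch.tensor([[12, 57, 9, 330]])  # toy sentence of 4 word indices
topic_label = torch.tensor([3])                       # topic cluster label of the story

# The loss is computed only on the last word, forcing the network to compress
# the whole sentence into that final output; gradients flow back through time (BPTT).
logits = model(sentence_word_ids)
loss = criterion(logits[:, -1, :], topic_label)
loss.backward()
# At test time, softmax(logits[:, -1, :]) is the topic embedding z_T of the sentence.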

The proposed sentence embedding framework has a similar structure to the encoder sub-network of the encoder-decoder based machine translation framework [30]. However, we train the embedding network directly on the topic classification task, which is closely related to the story segmentation task of interest. As the sentence embedding vectors are the posterior vectors of the topics, they form a representation of the topic space. We visualize the embedding vectors to gain insight into their discriminative capability in the topic space and compare them with the DNN-derived sentence vectors [28], which are constructed by averaging word vectors within the sentence.



We run the t-SNE algorithm [33] to project the embedding vectors to 2-dimensional vectors for visualization. Fig. 2 (a) and (b) are the visualizations derived from the DNN and LSTM based embedding vectors, respectively. Compared with (a), the clusters in (b) are more compact and farther apart, which indicates better discriminative power in the topic space.
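As a sketch of how such a plot can be produced (scikit-learn's TSNE and the placeholder arrays below are assumptions for illustration; the paper only states that t-SNE [33] is used):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

embeddings = np.random.rand(500, 150)            # stand-in for sentence topic posterior vectors
labels = np.random.randint(0, 10, size=500)      # stand-in for topic cluster labels (colors only)

points = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=labels, s=5, cmap='tab10')
plt.show()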

B. Sentence embedding through averaging

We also studied a simpler sentence embedding method for comparison purposes. There are two major differences between this simpler method and the method described in the previous section. First, the cross-entropy cost function is computed on all the words of a sentence. Second, the input to the network during training is a chunk of text comprising words from multiple sentences. Hence, the simpler method accepts running text and predicts the topic class of each word. We call this method word-level LSTM (WLSTM) and the previous method described in Section II-A sentence-level LSTM (SLSTM). For the WLSTM method, the final sentence embedding is obtained by simply averaging the word-level embedding vectors.

The WLSTM method is similar to the DNN-based method proposed in [28], in that both methods predict topic posteriors for each word. The major difference is that in the DNN-based method, context information is obtained through a continuous bag-of-words (CBOW) representation, which spans up to 50 words in [28], whereas in the WLSTM method we rely on the LSTM network to model long-term temporal information.
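The contrast between the two sentence representations can be summarized in a few lines (a sketch; the array shapes and toy data are assumptions):

import numpy as np

def wlstm_sentence_embedding(word_posteriors):
    """word_posteriors: (T, num_topics) topic posteriors predicted for each word.
    WLSTM-style embedding: the mean posterior over all words of the sentence."""
    return np.mean(word_posteriors, axis=0)

def slstm_sentence_embedding(word_posteriors):
    """SLSTM-style embedding: the posterior of the last word only."""
    return word_posteriors[-1]

posteriors = np.random.rand(12, 150)                 # toy: 12 words, 150 topic classes
posteriors /= posteriors.sum(axis=1, keepdims=True)  # make each row a proper distribution
print(wlstm_sentence_embedding(posteriors).shape, slstm_sentence_embedding(posteriors).shape)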

C. Using sentence embedding in HMM-based story segmentation

The sentence embeddings discussed above are all derived from the topic posterior probabilities. Instead of using them as features, we can also convert them into likelihood scores and use them in HMM-based story segmentation [28].

In HMM-based story segmentation, each HMM state represents a topic class, the state transition probabilities represent the possible transitions between topics, and the emission probability of a state represents the distribution of words given a topic. The transition probability matrix can be trained from the topic transition statistics in the training data. For the emission probabilities, a topic-dependent n-gram LM can be used [25]. Alternatively, we can convert the topic posterior probabilities to word likelihoods given the topic classes, as proposed in [28]. The posterior-to-likelihood conversion is achieved by using the Bayes rule:

p(w_t | z_t = i) = p(z_t = i | w_t) p(w_t) / p(z_t = i),

where p(w_t) does not depend on the topic class and thus can be ignored in decoding, and p(z_t = i) is the prior probability of topic class i. Note that this way of converting class posteriors to observation likelihoods via the Bayes rule has been widely used in hybrid DNN-HMM approaches for speech recognition [34], [35]. For details on using DNN-derived posteriors in HMM-based story segmentation, readers are referred to [28].
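A minimal sketch of this conversion and of Viterbi decoding over topic states is given below; the function names, the uniform initial distribution and the toy transition matrix are illustrative assumptions, not the exact setup of [28].

import numpy as np

def posteriors_to_log_likelihoods(posteriors, priors, eps=1e-10):
    """Convert topic posteriors p(z_t=i|w_t) into scaled log-likelihoods log p(w_t|z_t=i),
    dropping the constant p(w_t) as in the Bayes-rule conversion above."""
    return np.log(posteriors + eps) - np.log(priors + eps)

def viterbi(log_lik, log_trans):
    """log_lik: (T, N) scaled log-likelihoods; log_trans: (N, N) log transition probabilities.
    Returns the most likely topic-state sequence; a state change marks a story boundary."""
    T, N = log_lik.shape
    delta = np.full((T, N), -np.inf)
    back = np.zeros((T, N), dtype=int)
    delta[0] = log_lik[0]                            # assume a uniform initial distribution
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # scores[i, j]: best path ending in i -> j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + log_lik[t]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy usage: 200 words, 3 topics, transitions biased towards staying in the same topic.
post = np.random.dirichlet(np.ones(3), size=200)
trans = np.full((3, 3), 0.01)
np.fill_diagonal(trans, 0.98)
states = viterbi(posteriors_to_log_likelihoods(post, post.mean(axis=0)), np.log(trans))
boundaries = [t for t in range(1, len(states)) if states[t] != states[t - 1]]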

TABLE I
F1-MEASURE OF TEXTTILING AND DP APPROACHES ON DIFFERENT EMBEDDING FEATURES

Feature                      TextTiling   DP
Topic posteriors by DNN      0.663        0.726
Topic posteriors by WLSTM    0.682        0.732
Topic posteriors by SLSTM    0.697        0.739

D. Generating Topic Labels Using Clustering

The topic label of each word is usually not readily available, and manual topic labeling is not practical in real applications. In order to obtain topic labels of words for neural network training, we cluster the segmented stories in the training set into a predefined number of clusters using the CLUTO tool [36]. The clustering objective is to minimize the inter-cluster similarity and maximize the intra-cluster similarity. After clustering, each story is assigned a topic label and all words in the story share that topic label. The topic-labeled words are then used for neural network training.
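The paper uses the CLUTO toolkit for this step; purely as an illustrative stand-in, the same idea can be sketched with scikit-learn (tf-idf features and k-means on L2-normalized vectors approximate cosine-based k-way clustering; the tiny story list and cluster count below are toy values):

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

stories = [
    "markets stocks shares trading economy",
    "election vote senate campaign debate",
    "storm rain flooding weather forecast",
    "stocks bonds trading markets growth",
]
num_clusters = 2   # the paper sweeps 50-200 topic clusters; kept tiny here for the toy data

X = normalize(TfidfVectorizer().fit_transform(stories))   # L2-normalize for cosine-like k-means
story_labels = KMeans(n_clusters=num_clusters, n_init=10, random_state=0).fit_predict(X)
# Every word in a story inherits the story's cluster label as its topic label for training.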

III. EXPERIMENTS

A. Experimental Setup

We carried out experiments on the Topic Detection and Tracking (TDT2) corpus [2], which includes 2,280 English broadcast news programs. We separated them into a training set with 1,800 programs, and a development set and a testing set each with 240 programs. All texts were stemmed by a Porter stemmer and stop words were removed. We used the F1-measure, i.e., the harmonic mean of recall and precision, to evaluate the story segmentation performance with a tolerance window of 50 words according to the TDT2 standard [2].
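As a sketch of the scoring logic (the paper follows the TDT2 evaluation standard; the boundary-matching function below with a 50-word tolerance is an illustrative approximation, not the official scoring tool):

def f1_with_tolerance(hyp_boundaries, ref_boundaries, tolerance=50):
    """Boundaries are word positions. A hypothesized boundary is correct if it falls
    within `tolerance` words of a reference boundary that has not been matched yet."""
    matched = set()
    hits = 0
    for h in hyp_boundaries:
        for i, r in enumerate(ref_boundaries):
            if i not in matched and abs(h - r) <= tolerance:
                matched.add(i)
                hits += 1
                break
    precision = hits / len(hyp_boundaries) if hyp_boundaries else 0.0
    recall = hits / len(ref_boundaries) if ref_boundaries else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f1_with_tolerance([120, 480], [100, 500]))  # 1.0 for this toy example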

For all experiments, the number of nodes in the word embedding layer in Fig. 1 is set to 200. The size of the vocabulary is 57,817. For the neural networks, we initialized the learning rate to 0.03 and decreased it with a decay rate of 0.999. The momentum was set to 0.9. We used an L2 regularizer with a weight of 3 × 10−6. A GPU was used to accelerate the training process. In the unsupervised clustering process, a k-way clustering solution was used and the distance metric in the CLUTO toolkit [36] was cosine.
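The training framework is not named in the paper; as one possible way to wire up the stated hyperparameters, a PyTorch sketch might look as follows (the placeholder model and the exponential decay schedule are assumptions):

import torch

model = torch.nn.LSTM(200, 512)   # placeholder; in practice the network from Section II-A
optimizer = torch.optim.SGD(model.parameters(), lr=0.03, momentum=0.9, weight_decay=3e-6)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)  # decay rate 0.999
# After each update (or epoch, depending on the schedule assumed): optimizer.step(); scheduler.step()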

B. Analysis of Topic Embedding of Sentences

The topic sentence embeddings can be used both as features in methods like TextTiling and dynamic programming, and as probabilities in the Viterbi decoding of the HMM-based method. In this section, we first examine the use of topic embeddings as features. We compare three neural network based topic embeddings of sentences in the TextTiling [7], [8] and DP [19], [20], [21] methods: WLSTM, SLSTM, and DNN-based embeddings. Cosine distance is used to compute the similarity score between sentences in the two approaches. Table I shows the results of TextTiling and DP on the testing set with different features. We can clearly see that the LSTM-derived topic embeddings show superior performance compared with the DNN embedding. The best performance is achieved by the SLSTM model.
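As a sketch of how the embeddings are consumed as features (cosine similarity between adjacent sentence vectors; the simple thresholding rule below is an illustrative simplification of TextTiling's depth-score criterion, not the exact algorithm of [7]):

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))

def adjacent_similarities(sent_embeddings):
    """Cosine similarity between each pair of adjacent sentence embeddings; a low value
    between sentences i and i+1 suggests a story boundary after sentence i."""
    return [cosine(sent_embeddings[i], sent_embeddings[i + 1])
            for i in range(len(sent_embeddings) - 1)]

sims = adjacent_similarities(np.random.rand(10, 150))            # toy: 10 sentences, 150 topics
candidates = [i for i, s in enumerate(sims) if s < np.mean(sims) - np.std(sims)]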



Fig. 3. Sentence similarity matrix dotplots for an episode of a broadcast news program from the TDT2 testing set, in which the similarities are calculated based on (a) DNN, (b) WLSTM and (c) SLSTM posteriors, respectively. The x-axis and y-axis are the indices of sentences. High similarity values are represented by dark pixels. The vertical lines indicate the real story boundaries.

TABLE II
F1-MEASURE WITH DIFFERENT NUMBERS OF TOPICS AND DIFFERENT NUMBERS OF NODES IN THE LSTM LAYER USING THE WLSTM-HMM APPROACH

Topics# \ Nodes#     256     512     768     1024
50                   0.727   0.732   0.738   0.740
100                  0.739   0.747   0.758   0.761
150                  0.752   0.765   0.774   0.770
200                  0.737   0.748   0.756   0.751

TABLE III
F1-MEASURE WITH DIFFERENT NUMBERS OF TOPICS AND DIFFERENT NUMBERS OF NODES IN THE LSTM LAYER USING THE SLSTM-HMM APPROACH

Topics# \ Nodes#     256     512     768     1024
50                   0.738   0.752   0.747   0.739
100                  0.749   0.771   0.763   0.761
150                  0.772   0.789   0.781   0.775
200                  0.761   0.773   0.769   0.758

Fig. 3 illustrates the sentence similarity matrix dotplots for an episode of a broadcast news program from the testing set, which gives a visual sense of how good the story segmentation is. The similarities in the three sub-figures are calculated based on the DNN, WLSTM and SLSTM embeddings, respectively. The vertical lines indicate the ground-truth story boundaries. We can see that all dotplots contain dark square regions along the diagonal delimited by story boundaries. These regions indicate cohesive topic segments with high sentence similarities. Moreover, the blocks are more salient for the WLSTM and SLSTM embeddings (b, c), both generated by an LSTM. We also notice that dotplot (c) has the most salient dark squares and the fewest noise points off the diagonal, which indicates more promising segmentation results.
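Such a dotplot can be reproduced in a few lines (a sketch; the random array merely stands in for the actual sentence topic embeddings of the episode):

import numpy as np
import matplotlib.pyplot as plt

embeddings = np.random.rand(130, 150)                       # stand-in for ~130 sentence embeddings
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sim_matrix = unit @ unit.T                                  # pairwise cosine similarities

plt.imshow(1.0 - sim_matrix, cmap='gray')                   # dark pixels = high similarity, as in Fig. 3
plt.xlabel('Sentence Index')
plt.ylabel('Sentence Index')
plt.show()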

C. Results of the WLSTM-HMM approach

We investigate the WLSTM-HMM model to see whether explicit sequential modeling based on the LSTM benefits the story segmentation performance.

TABLE IV
F1-MEASURE OF THE PROPOSED LSTM-HMM APPROACHES AND THE DNN-HMM APPROACH IN [28]

Approach         F1-measure
DNN-HMM [28]     0.765
WLSTM-HMM        0.774
SLSTM-HMM        0.789

Different numbers of memory cells and topic classes are examined. As shown in Table II, the topic number ranges from 50 to 200 and the number of memory cells in the LSTM layer ranges from 256 to 1024. The F1-measure improves as the number of memory cells increases from 256 to 768 and begins to decrease at 1024. Meanwhile, with a fixed number of memory cells, the F1-measure first increases as the cluster number grows from 50 to 150 and then decreases when the cluster number is larger than 150. When the cluster number is 150 and the LSTM cell number is 768, we obtain the highest F1-measure of 0.774, which is higher than the 0.765 of the DNN-HMM approach [28]. Note that both the DNN embedding [28] and the WLSTM embedding predict topic posteriors for each word and summarize the whole-sentence information through averaging. The result confirms that explicit sequential modeling using the LSTM is beneficial.

D. Results of the SLSTM-HMM approach

We evaluate the performance of the SLSTM embedding in this section with the HMM-based method. As shown in Table III, when the topic number is 150 and the LSTM cell number is 512, we obtain the best F1-measure of 0.789, which is higher than the 0.774 of the WLSTM-HMM model. The results show that it is more effective to compress the topic information of the whole sentence into one vector than to average the word-level topic embeddings as in WLSTM.

We compared the proposed SLSTM-HMM approach with the WLSTM-HMM approach and the DNN-HMM approach [28].



TABLE V
F1-MEASURE COMPARISON WITH STATE-OF-THE-ART METHODS

Approach                    F1-measure
TextTiling [7]              0.553
HMM [25]                    0.637
PLSA-DP-CE [13]             0.682
BayesSeg [22]               0.710
DD-CRP [23]                 0.730
DNN-HMM [28]                0.765
WLSTM-HMM (this study)      0.774
SLSTM-HMM (this study)      0.789

As shown in Table IV, the proposed SLSTM-HMM approach outperforms all the other approaches. The F1-measure improved from 0.765 for the DNN-HMM approach (with DNN-derived topic posteriors) to 0.789 for the proposed SLSTM-HMM approach (the difference is significant at p < 0.01 [37]).

E. Comparison with state-of-the-art methods

We also compared the proposed approach with several state-of-the-art methods on the same corpus. The results are summarized in Table V. We can clearly see that the proposed SLSTM-HMM approach outperforms all the methods in the comparison. The superior performance demonstrates that the LSTM has strong topic modeling ability at the sentence level and provides a promising way of detecting story boundaries.

IV. CONCLUSIONS

In this paper, we have proposed a topic sentence embedding approach to story segmentation. Specifically, we use a long short-term memory recurrent neural network (LSTM-RNN) to map word observations to topic posterior probabilities, and the whole sentence is represented by the topic posterior at the last word of the sentence. We call this sentence-level LSTM embedding (SLSTM). An HMM is used to model the transitions between topics. The Viterbi algorithm is used to decode the word sequence into a topic sequence, from which story boundaries are identified where the topic changes. Experiments on the TDT2 corpus show that the proposed SLSTM-HMM approach outperforms the previous DNN-HMM approach significantly and achieves state-of-the-art performance in story segmentation. Since the topic embedding of sentences has powerful discriminative capability in the topic space (as shown in Fig. 2), in the future we plan to use it in other tasks, such as topic classification and tracking [38].

V. ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (Grant No. 61571363).

REFERENCES

[1] A. James, “Introduction to topic detection and tracking,” Topic Detection and Tracking, pp. 1–16, 2002.
[2] J. Fiscus, G. Doddington, J. Garofolo, and A. Martin, “NIST’s 1998 topic detection and tracking evaluation (TDT2),” in Proceedings of the 1999 DARPA Broadcast News Workshop, pp. 19–24, 1999.
[3] S. Soderland, “Learning information extraction rules for semi-structured and free text,” Machine Learning, vol. 34, no. 1-3, pp. 233–272, 1999.
[4] L.-s. Lee and B. Chen, “Spoken document understanding and organization,” IEEE Signal Processing Magazine, vol. 22, no. 5, pp. 42–60, 2005.
[5] D. Beeferman, A. Berger, and J. Lafferty, “Statistical models for text segmentation,” Machine Learning, vol. 34, no. 1-3, pp. 177–210, 1999.
[6] J. C. Reynar, “An automatic method of finding topic boundaries,” in Proc. ACL, 1994, pp. 331–333.
[7] M. A. Hearst, “TextTiling: Segmenting text into multi-paragraph subtopic passages,” Computational Linguistics, vol. 23, no. 1, pp. 33–64, 1997.
[8] L. Xie, Y.-L. Yang, and Z.-Q. Liu, “On the effectiveness of subwords for lexical cohesion based story segmentation of Chinese broadcast news,” Information Sciences, vol. 181, no. 13, pp. 2873–2891, 2011.
[9] T. Hofmann, “Probabilistic latent semantic indexing,” in Proc. SIGIR, 1999, pp. 50–57.
[10] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[11] H. Chen, L. Xie, C.-C. Leung, X. Lu, B. Ma, and H. Li, “Modeling latent topics and temporal distance for story segmentation of broadcast news,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 1, pp. 112–123, 2017.
[12] M. Lu, C.-C. Leung, L. Xie, B. Ma, and H. Li, “Probabilistic latent semantic analysis for broadcast news story segmentation,” in Proc. INTERSPEECH, 2011, pp. 1109–1112.
[13] M. Lu, L. Zheng, C.-C. Leung, L. Xie, B. Ma, and H. Li, “Broadcast news story segmentation using probabilistic latent semantic analysis and Laplacian eigenmaps,” in Proc. APSIPA ASC, 2011, pp. 356–360.
[14] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[15] Q. V. Le and T. Mikolov, “Distributed representations of sentences and documents,” in Proc. ICML, 2014, pp. 1188–1196.
[16] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proc. ICML, 2008, pp. 160–167.
[17] W. Y. Zou, R. Socher, D. M. Cer, and C. D. Manning, “Bilingual word embeddings for phrase-based machine translation,” in Proc. EMNLP, 2013, pp. 1393–1398.
[18] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.
[19] P. Fragkou, V. Petridis, and A. Kehagias, “A dynamic programming algorithm for linear text segmentation,” Journal of Intelligent Information Systems, vol. 23, no. 2, pp. 179–197, 2004.
[20] O. Heinonen, “Optimal multi-paragraph text segmentation by dynamic programming,” in Proc. ACL, 1998, pp. 1484–1486.
[21] L. Xie, L. Zheng, Z. Liu, and Y. Zhang, “Laplacian eigenmaps for automatic story segmentation of broadcast news,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 276–289, 2012.
[22] J. Eisenstein and R. Barzilay, “Bayesian unsupervised topic segmentation,” in Proc. EMNLP, 2008, pp. 334–343.
[23] C. Yang, L. Xie, and X. Zhou, “Unsupervised broadcast news story segmentation using distance dependent Chinese restaurant processes,” in Proc. ICASSP, 2014, pp. 4062–4066.
[24] L. R. Rabiner and B.-H. Juang, “An introduction to hidden Markov models,” IEEE ASSP Magazine, vol. 3, no. 1, pp. 4–16, 1986.
[25] M. Sherman and Y. Liu, “Using hidden Markov models for topic segmentation of meeting transcripts,” in Proc. SLT, 2008, pp. 185–188.
[26] P. van Mulbregt, I. Carp, L. Gillick, S. Lowe, and J. Yamron, “Text segmentation and topic tracking on broadcast news via a hidden Markov model approach,” in Proc. ICSLP, 1998.
[27] J. P. Yamron, I. Carp, L. Gillick, S. Lowe, and P. van Mulbregt, “A hidden Markov model approach to text segmentation and event tracking,” in Proc. ICASSP, 1998, pp. 333–336.



[28] J. Yu, X. Xiao, L. Xie, E. S. Chng, and H. Li, “A DNN-HMM approach to story segmentation,” in Proc. INTERSPEECH, 2016, pp. 1527–1531.
[29] H. Palangi, L. Deng, Y. Shen, J. Gao, X. He, J. Chen, X. Song, and R. Ward, “Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 4, pp. 694–707, 2016.
[30] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
[31] P. J. Werbos, “Generalization of backpropagation with application to a recurrent gas market model,” Neural Networks, vol. 1, no. 4, pp. 339–356, 1988.
[32] M. P. Cuéllar, M. Delgado, and M. C. Pegalajar, “An application of non-linear programming to train recurrent neural networks in time series prediction problems,” in Proc. ICEIS, 2005, pp. 35–42.
[33] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
[34] D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach. Springer, 2015.
[35] H. A. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach. Springer Science & Business Media, 2012, vol. 247.
[36] G. Karypis, “CLUTO: A clustering toolkit,” DTIC Document, Tech. Rep., 2002.
[37] P. Koehn, “Statistical significance tests for machine translation evaluation,” in Proc. EMNLP, 2004, pp. 388–395.
[38] J. Allan, J. G. Carbonell, G. Doddington, J. Yamron, and Y. Yang, “Topic detection and tracking pilot study final report,” 1998.
