
Summary

• The task of extractive speech summarization is to select a set of salient sentences from an original spoken document and concatenate them to form a summary, so as to help users better browse through and understand the content of the document

• We propose a novel margin-based discriminative training (MBDT) algorithm that aims to penalize non-summary sentences in inverse proportion to their summarization evaluation scores

• The summarization model can thereby be trained with an objective function that is closely coupled with the ultimate evaluation metric of extractive speech summarization

A Margin-based Discriminative Modeling Approach for Extractive Speech Summarization

Shih-Hung Liu†*, Kuan-Yu Chen*, Berlin Chen#, Ea-Ee Jan+, Hsin-Min Wang*, Hsu-Chun Yen†, Wen-Lian Hsu*

*Institute of Information Science, Academia Sinica, Taiwan
#National Taiwan Normal University, Taiwan
†National Taiwan University, Taiwan
+IBM Thomas J. Watson Research Center, USA

Various Discriminative Models

SVM

• A binary summarizer that outputs a decision score g(S_i) for each sentence S_i

• The posterior probability of a sentence S_i being included in the summary class S can be approximated by the following sigmoid operation (the weights α and β are optimized on the training set):

$P(S \mid S_i) = \frac{1}{1 + \exp(\alpha \, g(S_i) + \beta)}$
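A minimal sketch of this sigmoid mapping, assuming decision scores from any binary SVM; the function name and the example weights are illustrative, not from the paper:

```python
import numpy as np

def sigmoid_posterior(g_scores, alpha, beta):
    """Map raw SVM decision scores g(S_i) to posterior probabilities
    P(S | S_i) = 1 / (1 + exp(alpha * g(S_i) + beta))."""
    g_scores = np.asarray(g_scores, dtype=float)
    return 1.0 / (1.0 + np.exp(alpha * g_scores + beta))

# Example: with alpha < 0 (the usual Platt-scaling convention),
# larger decision scores yield higher posteriors.
probs = sigmoid_posterior([1.2, -0.3, 0.5], alpha=-1.0, beta=0.0)
```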

Perceptron

• The decision score of sentence S_i is f(S_i) = α·X_i, where α·X_i is the inner product of the feature vector X_i of sentence S_i and the model parameter vector α

• The model parameters can be estimated by maximizing the accumulated squared score distances over all training spoken documents:

$F_{\mathrm{Perceptron}}(\boldsymbol{\alpha}) = \frac{1}{2} \sum_{n=1}^{N} \sum_{S_R \in \mathrm{Summ}_n} \bigl( f(S_R) - f(S_n^*) \bigr)^2$

• N is the total number of training documents, Summ_n is the reference summary of the n-th training document D_n, S_R denotes a summary sentence in Summ_n, and S_n^* is the non-summary sentence of D_n with the highest decision score
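A sketch of one gradient-ascent step on this objective, assuming f(S) = α·X_S and treating S_n^* as fixed within a step; the per-document data layout and the learning rate are assumptions for illustration:

```python
import numpy as np

def perceptron_step(alpha, docs, lr=0.01):
    """One gradient-ascent step on
    F(alpha) = 1/2 sum_n sum_{S_R in Summ_n} (f(S_R) - f(S_n*))^2,
    where f(S) = alpha . X_S and S_n* is the highest-scoring
    non-summary sentence of document n.

    Each doc is a dict with 'summary' and 'non_summary' feature
    matrices (rows are sentence feature vectors X_S)."""
    grad = np.zeros_like(alpha)
    for doc in docs:
        X_sum, X_non = doc["summary"], doc["non_summary"]
        # Highest-scoring non-summary sentence S_n* (held fixed this step)
        star = X_non[np.argmax(X_non @ alpha)]
        for x_r in X_sum:                      # each summary sentence S_R
            diff = alpha @ x_r - alpha @ star  # f(S_R) - f(S_n*)
            grad += diff * (x_r - star)        # d/dalpha of (1/2) diff^2
    return alpha + lr * grad                   # ascent: maximize F
```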

Global Conditional Log-linear Model (GCLM)

• The GCLM method aims at maximizing the posterior scores of the summary sentences of each given training spoken document:

$F_{\mathrm{GCLM}}(\boldsymbol{\alpha}) = \sum_{n=1}^{N} \sum_{S_R \in \mathrm{Summ}_n} \log \frac{\exp(\boldsymbol{\alpha} \cdot X_{S_R})}{\sum_{S_j \in D_n} \exp(\boldsymbol{\alpha} \cdot X_{S_j})}$
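A sketch of one gradient-ascent step on the GCLM objective under the same assumed data layout; the gradient takes the standard log-softmax form (summary-sentence features minus the posterior-weighted average of all sentence features):

```python
import numpy as np

def gclm_step(alpha, docs, lr=0.01):
    """One gradient-ascent step on
    F(alpha) = sum_n sum_{S_R in Summ_n}
               log( exp(alpha.X_{S_R}) / sum_{S_j in D_n} exp(alpha.X_{S_j}) ).

    Each doc is a dict with 'sentences' (all X_{S_j} of D_n, row-wise)
    and 'summary_idx' (indices of the reference-summary sentences)."""
    grad = np.zeros_like(alpha)
    for doc in docs:
        X = doc["sentences"]
        scores = X @ alpha
        scores -= scores.max()                     # numerical stability
        p = np.exp(scores) / np.exp(scores).sum()  # posterior over sentences
        expected = p @ X                           # E_p[X_{S_j}]
        for r in doc["summary_idx"]:
            grad += X[r] - expected                # d log-softmax / d alpha
    return alpha + lr * grad
```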

Experimental Results

Dataset:

• The summarization dataset employed in this study is a publicly available broadcast news (MATBN) corpus

• 100 documents for training and 20 documents for testing

• 16 indicative features in total are used to represent each sentence

Results of comparison with other discriminative methods and unsupervised approaches (first table below):

• TD: manual transcripts

• SD: speech recognition transcripts

Results of each kind of feature used in isolation and in combination with the other kinds of features (second table below):

Margin-based Discriminative Training (MBDT)

The MBDT algorithm proceeds in two stages.

1) In the first stage, MBDT conducts a training data selection procedure that, for each summary sentence S_R in a training spoken document D_n, selects a subset of the most confusing non-summary sentences S_j (i.e., forms a support set Sup_{S_R}):

$\mathrm{Sup}_{S_R} = \{\, S_j \mid \rho_{S_R}(S_j) \le \varepsilon \,\}, \qquad \rho_{S_R}(S_j) = f(S_R) - f(S_j)$

• ε is a tunable threshold
• ρ_{S_R}(S_j) is the separation margin between S_R and S_j
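A minimal sketch of this selection step, assuming f(S) = α·X_S with per-document feature matrices; the threshold value is illustrative:

```python
import numpy as np

def support_set(alpha, X_summary, X_non_summary, eps=0.5):
    """Stage 1 of MBDT: for each summary sentence S_R, collect the most
    confusing non-summary sentences S_j, i.e. those whose separation
    margin rho(S_j) = f(S_R) - f(S_j) is at most the threshold eps."""
    f_sum = X_summary @ alpha        # f(S_R) for each summary sentence
    f_non = X_non_summary @ alpha    # f(S_j) for each non-summary sentence
    # sup[r] holds the indices j with f(S_R) - f(S_j) <= eps
    return [np.where(f_r - f_non <= eps)[0] for f_r in f_sum]
```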

2) In the second stage, MBDT defines a training objective function that is closely coupled with the ultimate evaluation metric of speech summarization:

$F_{\mathrm{MBDT}}(\boldsymbol{\alpha}) = \frac{1}{2} \sum_{n=1}^{N} \sum_{S_R \in \mathrm{Summ}_n} \sum_{S_j \in \mathrm{Sup}_{S_R}} w(S_j) \bigl( f(S_R) - f(S_j) \bigr)^2, \qquad w(S_j) = 1 - \mathrm{Eval}(S_j, \mathrm{Summ}_n)$

• w(S_j) is the weight of a (non-summary) sentence S_j
• Eval(S_j, Summ_n) is a function that estimates the summarization performance of a sentence S_j of D_n by comparing S_j to the reference summary Summ_n of D_n with a desired evaluation metric
• The score returned by Eval(S_j, Summ_n) lies between 0 and 1; the higher the value, the better the performance
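A sketch of one gradient-ascent step on F_MBDT under the same assumed data layout, treating each support set as fixed within a step; the Eval scores (e.g., a ROUGE score of each non-summary sentence against the reference summary) are assumed to be precomputed:

```python
import numpy as np

def mbdt_step(alpha, docs, eps=0.5, lr=0.01):
    """One gradient-ascent step on
    F(alpha) = 1/2 sum_n sum_{S_R} sum_{S_j in Sup_{S_R}}
               w(S_j) * (f(S_R) - f(S_j))^2,
    with w(S_j) = 1 - Eval(S_j, Summ_n).

    Each doc is a dict with 'summary' / 'non_summary' feature matrices
    and 'eval' -- Eval(S_j, Summ_n) in [0, 1] for every non-summary
    sentence (precomputed, e.g., with ROUGE)."""
    grad = np.zeros_like(alpha)
    for doc in docs:
        X_sum, X_non = doc["summary"], doc["non_summary"]
        w = 1.0 - np.asarray(doc["eval"])  # penalize worse sentences more
        f_non = X_non @ alpha
        for x_r in X_sum:                  # each summary sentence S_R
            f_r = x_r @ alpha
            sup = np.where(f_r - f_non <= eps)[0]  # support set of S_R
            for j in sup:
                diff = f_r - f_non[j]              # f(S_R) - f(S_j)
                grad += w[j] * diff * (x_r - X_non[j])
    return alpha + lr * grad
```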

Prosodic Features
1. Pitch value (max, min, mean, diff)
2. Energy value (max, min, mean, diff)

Lexical Features

1. Number of named entities
2. Number of stop words
3. Bigram language model scores
4. Normalized bigram scores

Relevance Features

1. VSM
2. DLM
3. RM
4. SMM
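A minimal sketch of assembling the 16-dimensional sentence representation from these three feature groups; the argument names are illustrative, and "diff" is read here as max minus min:

```python
import numpy as np

def sentence_features(pitch, energy, n_named_entities, n_stop_words,
                      bigram_score, norm_bigram_score,
                      vsm, dlm, rm, smm):
    """Assemble the 16-dimensional feature vector X_S of a sentence:
    8 prosodic (max/min/mean/diff of pitch and energy values),
    4 lexical, and 4 relevance features."""
    prosodic = [max(pitch), min(pitch), float(np.mean(pitch)),
                max(pitch) - min(pitch),
                max(energy), min(energy), float(np.mean(energy)),
                max(energy) - min(energy)]
    lexical = [n_named_entities, n_stop_words,
               bigram_score, norm_bigram_score]
    relevance = [vsm, dlm, rm, smm]
    return np.array(prosodic + lexical + relevance, dtype=float)
```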


Method           TD                           SD
                 ROUGE-1  ROUGE-2  ROUGE-L    ROUGE-1  ROUGE-2  ROUGE-L
SVM              0.470    0.364    0.426      0.383    0.245    0.342
Ranking SVM      0.490    0.391    0.447      0.388    0.254    0.344
Perceptron       0.487    0.394    0.439      0.393    0.259    0.352
GCLM             0.482    0.386    0.433      0.380    0.250    0.342
MBDT             0.515    0.422    0.462      0.393    0.264    0.353
ILP              0.442    0.337    0.401      0.348    0.209    0.306
Submodularity    0.414    0.286    0.363      0.332    0.204    0.303

 

Features         TD                           SD
                 ROUGE-1  ROUGE-2  ROUGE-L    ROUGE-1  ROUGE-2  ROUGE-L
MBDT+Pro         0.374    0.256    0.337      0.325    0.189    0.292
MBDT+Lex         0.255    0.159    0.228      0.189    0.082    0.170
MBDT+Rel         0.411    0.287    0.360      0.360    0.202    0.298
MBDT+Pro+Lex     0.370    0.269    0.345      0.342    0.205    0.310
MBDT+Lex+Rel     0.428    0.314    0.382      0.355    0.214    0.313
MBDT+Pro+Rel     0.422    0.315    0.370      0.341    0.197    0.288
MBDT+All         0.515    0.422    0.462      0.393    0.264    0.353