
Improved Spoken Term Detection with Graph-Based Re-Ranking in Feature Space

Yun-Nung Chen, Chia-Ping Chen, Hung-Yi Lee, Chun-an Chan, and Lin-shan Lee
National Taiwan University

Summary

Using graph-based re-ranking with acoustic similarity to improve spoken term detection. With the MLLR acoustic model, MAP improves from 55.54% to 67.38%; the relative improvement is 22.04%.

Main Idea

• Node: first-pass retrieved utterance
• Edge: weighted by the similarity between two utterances
• Nodes connected to more nodes with higher scores are given higher scores.
• Considers global similarity among the first-pass retrieved utterances.

[Figure: MAP improvement (%) versus interpolation weight α (0.0 to 1.0) for the SI, MLLR, and SD acoustic models.]

Experiments

Methods        SI (MAP / Impr.)    MLLR (MAP / Impr.)    SD (MAP / Impr.)
First-Pass     45.47 / -           55.54 / -             73.52 / -
PRF            52.63 / 7.16        64.07 / 8.53          76.30 / 2.78
Graph          54.37 / 8.90        66.82 / 11.28         78.44 / 4.92
PRF + Graph    57.75 / 12.28       67.38 / 11.84         77.47 / 3.95

• Corpus: 33 hours of course lectures (single instructor)
• Language: primarily Mandarin Chinese
• Acoustic Models: SI, MLLR, SD

[Figure: example graph over the first-pass retrieved utterances x1–x6; the highlighted node (score v(4)) has out-neighbors Out(vi) = {x3, x5, x6}.]

“Hit Region”: the segment of the MFCC sequence that most likely corresponds to the query, as located by the lattice.
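As a rough illustration, here is a minimal sketch of how a hit region might be extracted. The arc representation, the single-word query, and the highest-posterior selection rule are assumptions for the example, not the poster's exact procedure.

```python
from dataclasses import dataclass

@dataclass
class Arc:
    word: str         # word hypothesis on the lattice arc
    start: int        # start frame index in the utterance
    end: int          # end frame index (exclusive)
    posterior: float  # arc posterior probability

def hit_region(mfcc, lattice_arcs, query):
    """Return the MFCC frames of the lattice arc that best matches the query.

    mfcc: (T, D) sequence of MFCC frames for the utterance.
    lattice_arcs: list of Arc hypotheses from the recognizer.
    query: the query term (a single word here, for simplicity).
    """
    matches = [a for a in lattice_arcs if a.word == query]
    if not matches:
        return None
    # assumed rule: pick the highest-posterior matching arc
    best = max(matches, key=lambda a: a.posterior)
    return mfcc[best.start:best.end]
```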

Modified Random Walk

[System diagram: the corpus goes through MFCC Extraction and ASR to produce lattices; the query Q goes through Retrieval to give the first-pass results XQ with original relevance scores SQ(x); acoustic distances d(xi, xj) are computed for each utterance pair xi, xj and used for Graph-Based Re-Computation, whose Re-Ranking outputs v(i). White blocks: original spoken term detection.]

[Illustration: acoustic feature space containing relevant and irrelevant first-pass results; the relevant utterances lie close to one another.]

• Relevant utterances are similar to each other in feature space.
• Utterances similar to more utterances with higher scores should be given higher relevance scores.

Graph-Based Re-Ranking with Acoustic Features

[Illustration: for each first-pass utterance x, the lattice (arcs labeled with the query Q and other words w1, w2, each carrying a posterior score such as Q: 0.7 or w1: 0.2) locates the hit region within the MFCC sequence of x; DTW between the hit regions of xi and xj gives the acoustic distance d(xi, xj).]

• Compute the acoustic distance d(xi, xj) for each utterance pair xi, xj in the first-pass results, as sketched below.
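A minimal DTW sketch for this step, assuming the hit regions are NumPy arrays of MFCC frames and using Euclidean frame distance with length normalization (both choices are assumptions; the poster does not specify them):

```python
import numpy as np

def dtw_distance(hit_i, hit_j):
    """DTW distance between two hit regions of shapes (Ti, D) and (Tj, D)."""
    Ti, Tj = len(hit_i), len(hit_j)
    cost = np.full((Ti + 1, Tj + 1), np.inf)
    cost[0, 0] = 0.0
    for a in range(1, Ti + 1):
        for b in range(1, Tj + 1):
            # Euclidean frame-level distance (assumed local distance)
            local = np.linalg.norm(hit_i[a - 1] - hit_j[b - 1])
            cost[a, b] = local + min(cost[a - 1, b],       # insertion
                                     cost[a, b - 1],       # deletion
                                     cost[a - 1, b - 1])   # match
    return cost[Ti, Tj] / (Ti + Tj)  # length-normalized alignment cost
```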

• Node: first-pass retrieved utterance
• Edge: weighted by the similarity between the two utterances, evaluated in feature space (see the graph-construction sketch below)
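A minimal sketch of turning the pairwise acoustic distances into edge weights. Mapping distance to similarity via exp(-d) is an assumption; the poster only states that similarities are computed from the distances and normalized.

```python
import numpy as np

def edge_weights(distances):
    """Build normalized edge weights from an (N, N) acoustic distance matrix.

    distances[i, j] = d(x_i, x_j) over the N first-pass retrieved utterances.
    Returns P with P[i, j] = normalized similarity from node i to node j.
    """
    sim = np.exp(-distances)                  # assumed distance-to-similarity mapping
    np.fill_diagonal(sim, 0.0)                # no self-loops
    row_sums = sim.sum(axis=1, keepdims=True)
    return sim / np.maximum(row_sums, 1e-12)  # each row sums to 1
```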

• Normalized original relevance score

• Scores propagated from neighbors of node i

Re-Ranking

v(i) is higher when:
1) the original relevance score is higher;
2) the utterance is similar to more utterances with higher scores.
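The equations are images in the original poster, so the following LaTeX reconstruction is only a hedged sketch of a standard random-walk-with-restart formulation that matches the surrounding labels; the symbols s(i) and P, and the exact form of P', are assumptions.

```latex
% normalized original relevance score of utterance x_i
s(i) = \frac{S_Q(x_i)}{\sum_{j} S_Q(x_j)}

% interpolation of the original score with the scores propagated from the
% neighbors of node i through the normalized similarities P(j, i)
v(i) = (1 - \alpha)\, s(i) + \alpha \sum_{j} P(j, i)\, v(j)

% matrix form (assuming the scores v are kept normalized so that \sum_i v(i) = 1)
\mathbf{v} = \big( (1 - \alpha)\, \mathbf{s}\, \mathbf{1}^{\top} + \alpha\, P^{\top} \big)\, \mathbf{v} = P'\, \mathbf{v}
```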

• Solution: dominant eigenvector of P’
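Under the formulation assumed above, P' is column-stochastic, so its dominant eigenvector can be found by power iteration rather than an explicit eigendecomposition; a minimal sketch (function and parameter names are illustrative):

```python
import numpy as np

def rerank_scores(s, P, alpha=0.9, iters=100, tol=1e-8):
    """Dominant eigenvector of P' = (1 - alpha) * s 1^T + alpha * P^T via power iteration.

    s: (N,) normalized original relevance scores (sums to 1).
    P: (N, N) row-normalized similarity (edge-weight) matrix.
    Returns v: (N,) re-ranked relevance scores (formulation assumed above).
    """
    v = np.full(len(s), 1.0 / len(s))         # uniform starting distribution
    for _ in range(iters):
        v_new = (1 - alpha) * s + alpha * P.T @ v
        v_new /= v_new.sum()                  # keep the scores normalized
        if np.abs(v_new - v).sum() < tol:     # L1 convergence check
            break
        v = v_new
    return v
```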

• PRF: pseudo-relevance feedback in feature space (InterSpeech 2010)

• Our approach performs better, especially for the relatively poorer acoustic models (SI and MLLR).

• The propagated scores are integrated with the original relevance score (interpolation weight α).

• The performance is optimized at α = 0.9.
• Better retrieval relies primarily on global similarity.
• The graph structure provides significant information for ranking.

• Normalized similarity: similarities, with values between 0 and 1, are computed from the acoustic distances and then normalized (also expressed in matrix form).
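As an assumption (the poster leaves the mapping unspecified), one common choice consistent with the sketch given earlier is:

```latex
% assumed distance-to-similarity mapping and row normalization
\mathrm{sim}(x_i, x_j) = \exp\!\big(-d(x_i, x_j)\big), \qquad
P(i, j) = \frac{\mathrm{sim}(x_i, x_j)}{\sum_{k \neq i} \mathrm{sim}(x_i, x_k)}
```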