7-Speech Quality Assessment
Quality Levels
Subjective Tests
Objective Tests
Intelligibility
Naturalness
Quality Levels
Synthetic Quality (below 4.8 kbps)
Communication Quality (4.8 to 13 kbps)
Toll Quality (13 to 64 kbps)
Broadcast Quality (above 64 kbps)
Test Types
           | Intelligibility           | Naturalness
Subjective | DRT, MRT                  | MOS, DAM
Objective  | None. Future ASR systems  | AI, Global SNR, Seg. SNR, FW-Seg. SNR, Itakura Measure, WSSM
First Class: Subjective Intelligibility Tests
Diagnostic Rhyme Test (DRT)
– The listener chooses between two CVC (consonant-vowel-consonant) words that differ only in the first consonant
– The first consonants are chosen to have specific properties
– Ex. hop / fop and than / dan
Modified Rhyme Test (MRT)
– The listener chooses among several CVC words that differ only in the first consonant
– Ex. cat, bat, rat, mat, fat, sat
First Class (Cont'd): Subjective Intelligibility Tests
DRT is widely used and considered reliable
In this test the listener hears each utterance only once
\[ \mathrm{DRT} = \frac{N_{Correct} - N_{Incorrect}}{N_{Tests}} \times 100\% \]
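The DRT score above can be sketched in a few lines. This is only an illustration: the function name `drt_score` is hypothetical, and it assumes every test item is answered, so that the number of tests equals correct plus incorrect responses.

```python
def drt_score(n_correct: int, n_incorrect: int) -> float:
    """Return the DRT score in percent.

    Assumption (not stated on the slide): every item is answered,
    so N_Tests = N_Correct + N_Incorrect.
    """
    n_tests = n_correct + n_incorrect
    if n_tests == 0:
        raise ValueError("at least one test item is required")
    # Subtracting the incorrect answers corrects for guessing:
    # a listener who guesses at random on word pairs scores about 0%.
    return 100.0 * (n_correct - n_incorrect) / n_tests


print(drt_score(90, 10))  # 80.0
```

With 90 correct and 10 incorrect answers the score is 80%, not 90%, because each wrong answer cancels one right answer.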
Second Class: Subjective Naturalness Tests
Mean Opinion Score (MOS)
– MOS is widely used and considered reliable
– In this test the listener may hear the speech many times
Diagnostic Acceptability Measure (DAM)
– This test is very complex
Mean Opinion Score (MOS)
MOS scores are assigned as follows:
Score | Speech Quality
1     | Not Acceptable
2     | Weak
3     | Medium
4     | Good
5     | Excellent
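As a minimal sketch of how the scale is used, the MOS of a system is the arithmetic mean of the individual listener ratings. The names `MOS_LABELS` and `mean_opinion_score` are hypothetical, and the ratings in the usage line are made-up illustration values, not measured data.

```python
# Rating scale taken from the table above.
MOS_LABELS = {
    1: "Not Acceptable",
    2: "Weak",
    3: "Medium",
    4: "Good",
    5: "Excellent",
}


def mean_opinion_score(ratings):
    """Average a list of integer listener ratings on the 1 to 5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in MOS_LABELS:
            raise ValueError(f"rating {r!r} is outside the 1 to 5 scale")
    return sum(ratings) / len(ratings)


# Illustrative ratings from four hypothetical listeners.
print(mean_opinion_score([4, 5, 3, 4]))  # 4.0
```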
Diagnostic Acceptability Measure (DAM)
This test is very complex
It scores the speech on 19 different parameters, which fall into 3 main groups:
– Signal Quality
– Background Quality
– Total Quality
Objective Tests
These tests cannot be used for intelligibility, because an automatic system cannot judge whether speech is intelligible
Objective tests can only be used for speech naturalness
Objective Tests (Cont'd)
Articulation Index (AI)
Signal to Noise Ratio (SNR)
– Global (Classic) SNR
– Segmental SNR
– Frequency Weighted Segmental SNR
Articulation Index (AI)
AI assumes that the distortions in different frequency bands are independent, and measures signal quality in each band separately.
In each band it determines the percentage of the signal that is perceptible to the listener
[Figure: 20 frequency bands spanning 200 Hz to 6100 Hz]
Articulation Index (Cont'd)
The signal is perceptible to the listener when it is:
– 1- Above the human hearing threshold
– 2- Below the human pain threshold
– 3- Above the masking noise level
– In each case, one of conditions 1 or 3 prevails
Articulation Index (Cont'd)
In AI, the SNR is measured separately in each band
\[ AI = \frac{1}{20} \sum_{j=1}^{20} \frac{\min(SNR_j, 30)}{30} \]
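The AI formula can be sketched as follows, assuming the per-band SNRs are already available in dB. The function name `articulation_index` is hypothetical. Clipping negative SNRs to 0 dB is an added assumption that keeps the index between 0 and 1; the slide formula itself only applies the 30 dB upper cap.

```python
def articulation_index(band_snr_db, cap_db=30.0):
    """Average the capped, normalised per-band SNRs.

    band_snr_db: per-band SNR values in dB (20 bands on the slide).
    cap_db: upper cap from the slide formula (30 dB).
    """
    if len(band_snr_db) == 0:
        raise ValueError("at least one band is required")
    total = 0.0
    for snr in band_snr_db:
        # Upper cap: min(SNR_j, 30), as in the formula.
        # Lower clip at 0 dB: an assumption added for this sketch.
        clipped = min(max(snr, 0.0), cap_db)
        total += clipped / cap_db
    return total / len(band_snr_db)


print(articulation_index([15.0] * 20))  # 0.5
```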
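Signal to Noise Ratio (SNR)
\[ \varepsilon(n) = s(n) - \hat{s}(n) \]
\[ E_{\varepsilon} = \sum_{n} \varepsilon^{2}(n) = \sum_{n} [s(n) - \hat{s}(n)]^{2} \]
\[ E_{s} = \sum_{n} s^{2}(n) \]
\[ SNR_{global} = 10 \log_{10} \frac{E_{s}}{E_{\varepsilon}} = 10 \log_{10} \frac{\sum_{n} s^{2}(n)}{\sum_{n} [s(n) - \hat{s}(n)]^{2}} \]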
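A minimal sketch of the global SNR, assuming the reference signal s(n) and the processed signal ŝ(n) are time-aligned sequences of equal length. The function name `global_snr_db` is hypothetical.

```python
import math


def global_snr_db(s, s_hat):
    """Global SNR in dB between a reference s and a processed signal s_hat."""
    if len(s) != len(s_hat):
        raise ValueError("signals must have the same length")
    e_s = sum(x * x for x in s)                          # signal energy E_s
    e_eps = sum((x - y) ** 2 for x, y in zip(s, s_hat))  # error energy E_eps
    if e_s == 0.0:
        raise ValueError("reference signal has zero energy")
    if e_eps == 0.0:
        return math.inf  # identical signals: no error energy
    return 10.0 * math.log10(e_s / e_eps)


# An error of 10% of the amplitude on every sample gives about 20 dB.
print(global_snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9]))
```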
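Segmental SNR
\[ SNR_{seg} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{n=m_j-N+1}^{m_j} s^{2}(n)}{\sum_{n=m_j-N+1}^{m_j} [s(n) - \hat{s}(n)]^{2}} \right] \]
The summand is the SNR of the j'th frame
M : Number of frames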
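The segmental SNR can be sketched as below. The sketch assumes non-overlapping frames of `frame_len` samples (playing the role of N in the formula, with m_j the last sample of frame j) and drops any incomplete final frame. The function name and the small energy floor, which avoids taking the logarithm of zero in silent or error-free frames, are assumptions of this sketch.

```python
import math


def segmental_snr_db(s, s_hat, frame_len, floor=1e-12):
    """Mean of the per-frame SNRs in dB over non-overlapping frames."""
    if len(s) != len(s_hat):
        raise ValueError("signals must have the same length")
    n_frames = len(s) // frame_len  # M; a trailing partial frame is dropped
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    total = 0.0
    for j in range(n_frames):
        start, stop = j * frame_len, (j + 1) * frame_len
        e_s = sum(x * x for x in s[start:stop])
        e_eps = sum((x - y) ** 2
                    for x, y in zip(s[start:stop], s_hat[start:stop]))
        # Floor both energies (assumption) so the log is always defined.
        total += 10.0 * math.log10(max(e_s, floor) / max(e_eps, floor))
    return total / n_frames


s = [1.0, 1.0, 1.0, 1.0]
s_hat = [0.9, 0.9, 0.99, 0.99]
# Frame SNRs are about 20 dB and 40 dB, so the segmental SNR is about 30 dB.
print(segmental_snr_db(s, s_hat, frame_len=2))
```

For the same two signals the global SNR is about 23 dB, since the frame with the larger error dominates the total error energy. Averaging in the log domain gives each frame equal weight.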
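Frequency Weighted Segmental SNR
\[ SNR_{fw\text{-}seg} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log \left[ \frac{\sum_{k=1}^{K} W_{j,k} \, [E_{s,k}(m_j) / E_{\varepsilon,k}(m_j)]}{\sum_{k=1}^{K} W_{j,k}} \right] \]
K : Number of frequency bands
M : Number of frames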
Deller Formula
\[ SNR_{fw\text{-}seg} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{k=1}^{K} w_{j,k} \, 10 \log_{10} [E_{s,k}(m_j) / E_{\varepsilon,k}(m_j)]}{\sum_{k=1}^{K} w_{j,k}} \right] \]
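Other Formulas:
\[ SNR_{fw\text{-}seg} = \frac{1}{M} \sum_{j=0}^{M-1} \frac{1}{\sum_{k=1}^{K} w_{j,k}} \sum_{k=1}^{K} w_{j,k} \, 10 \log_{10} \left( \frac{E_{s,k}(m_j)}{E_{\varepsilon,k}(m_j)} \right) \]
\[ SNR_{fw\text{-}seg} = \frac{1}{M} \sum_{j=0}^{M-1} \left[ \frac{\sum_{k=1}^{K} w_{j,k} \, 10 \log_{10} [E_{s,k}(m_j) / E_{\varepsilon,k}(m_j)]}{\sum_{k=1}^{K} w_{j,k}} \right] \]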
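A minimal sketch of the last form (a weighted average of the per-band SNRs in dB, then a mean over frames). It assumes the band energies of the signal and of the error have already been computed for each frame, for example with a filter bank, and are passed as nested lists indexed `[j][k]`. The function name and the weights in the usage line are hypothetical.

```python
import math


def fw_segmental_snr_db(e_s, e_eps, w):
    """Frequency-weighted segmental SNR in dB.

    e_s[j][k]:   signal energy in band k of frame j
    e_eps[j][k]: error energy in band k of frame j
    w[j][k]:     weight of band k in frame j
    """
    n_frames = len(e_s)  # M
    if n_frames == 0:
        raise ValueError("at least one frame is required")
    total = 0.0
    for j in range(n_frames):
        # Weighted sum of the per-band SNRs (in dB) of frame j.
        num = sum(w[j][k] * 10.0 * math.log10(e_s[j][k] / e_eps[j][k])
                  for k in range(len(w[j])))
        total += num / sum(w[j])
    return total / n_frames


# One frame, two bands with SNRs of 20 dB and 10 dB, weights 1 and 3:
# (1 * 20 + 3 * 10) / 4 = 12.5 dB.
print(fw_segmental_snr_db([[100.0, 10.0]], [[1.0, 1.0]], [[1.0, 3.0]]))
```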
Itakura Measure
[Figure: signal spectrum S(ω) and its envelope H(ω)]
H(ω) is the envelope spectrum
\[ S(\omega) = F\{R(\tau)\} = |X(\omega)|^{2} \]
Uses the all-pole (AR) model
Itakura Measure (Cont'd)
\[ H(\omega) = \frac{1}{1 - \sum_{i=1}^{p} a_i e^{-j\omega i}} \]
The measure is based on the spectral difference between the original signal and the signal under assessment
a_i : Autoregressive Coefficients
K_i : Reflection Coefficients
R_i : Autocorrelation Coefficients
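The three coefficient sets are linked: given the autocorrelation coefficients R_i, the Levinson-Durbin recursion yields both the reflection coefficients K_i and the autoregressive coefficients a_i. The sketch below illustrates that standard recursion and evaluates the envelope H(ω). It is not taken from the slides; the function names are hypothetical, and the sign convention of the reflection coefficients varies between textbooks.

```python
import cmath


def levinson_durbin(r, order):
    """Solve the AR normal equations from autocorrelations r[0..order].

    Returns (a, k, err): a[i-1] is a_i in H = 1 / (1 - sum a_i e^{-jwi}),
    k holds the reflection coefficients, err is the final prediction error.
    """
    if len(r) <= order or r[0] <= 0.0:
        raise ValueError("need r[0] > 0 and order + 1 autocorrelation values")
    a, k_all, err = [], [], r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("prediction error vanished; lower the order")
        # Correlation not yet explained by the order (i - 1) predictor.
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        # Update the lower-order coefficients, then append a_i = k.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        k_all.append(k)
        err *= 1.0 - k * k
    return a, k_all, err


def ar_envelope(a, omega):
    """Evaluate H(omega) for AR coefficients a (a[i-1] is a_i)."""
    denom = 1.0 - sum(a_i * cmath.exp(-1j * omega * (i + 1))
                      for i, a_i in enumerate(a))
    return 1.0 / denom


# Autocorrelation of a first-order process with a_1 = 0.5.
a, k, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, k, err)                  # [0.5, 0.0] [0.5, 0.0] 0.75
print(abs(ar_envelope(a, 0.0)))   # 2.0
```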
Itakura Measure (Cont'd)
\[ d(g_{s}(m), g_{\hat{s}}(m)) = \frac{1}{M} \sum_{l=1}^{M} [g_{s}(l,m) - g_{\hat{s}}(l,m)]^{2} \]
m : Index of frame
l : Index of coefficients
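For a single frame m, the distance above is the mean squared difference between the two coefficient vectors. A minimal sketch follows, in which `parameter_distance` is a hypothetical name and the input lists hold the M coefficients g(1, m) ... g(M, m) of the original and of the assessed signal.

```python
def parameter_distance(g_s, g_s_hat):
    """Mean squared difference between two coefficient vectors of one frame."""
    if len(g_s) != len(g_s_hat):
        raise ValueError("coefficient vectors must have the same length")
    if len(g_s) == 0:
        raise ValueError("at least one coefficient is required")
    return sum((x - y) ** 2 for x, y in zip(g_s, g_s_hat)) / len(g_s)


# Only the third coefficient differs (by 2), so d = 2**2 / 3.
print(parameter_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```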
Itakura Measure (Cont'd)
\[ \tilde{d}_{lp}(\xi_{s}(m), \xi_{\hat{s}}(m')) = \frac{\sum_{l=1}^{M} W_{l,m,m'} \, [\xi_{s}(l,m) - \xi_{\hat{s}}(l,m')]}{\sum_{l=1}^{M} W_{l,m,m'}} \]
\xi_{s}(l,m) is the l'th parameter of the frame ending at the m'th sample
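Weighted Spectral Slope Measure (WSSM)
\[ \Delta|s(k,m)| = |s(k+1,m)| - |s(k,m)| \qquad \Delta|\hat{s}(k,m)| = |\hat{s}(k+1,m)| - |\hat{s}(k,m)| \]
\[ d_{WSSM}(|s(\omega,m)|, |\hat{s}(\omega,m)|) = K + \sum_{k=1}^{36} W_{k,m} \, [\Delta|s(k,m)| - \Delta|\hat{s}(k,m)|]^{2} \]
s(k,m) is the STFT of the k'th band of the frame ending at the m'th sample
|s(k,m)| and |s(k+1,m)| are in dB.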
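The WSSM distance for one frame can be sketched as below, assuming the band magnitudes are already given in dB. The function names are hypothetical. The slides do not specify the weights or the constant K, so the defaults here (unit weights, `k_offset=0.0`) are placeholders. A list of n band magnitudes yields n - 1 slopes, so the 36 slopes of the formula need 37 band values.

```python
def spectral_slopes(band_db):
    """First differences between adjacent band magnitudes (in dB)."""
    return [band_db[k + 1] - band_db[k] for k in range(len(band_db) - 1)]


def wssm_distance(s_db, s_hat_db, weights=None, k_offset=0.0):
    """Weighted squared difference of spectral slopes for one frame.

    weights and k_offset stand in for W_{k,m} and K, whose values the
    slides do not give; the defaults are placeholders.
    """
    if len(s_db) != len(s_hat_db):
        raise ValueError("band vectors must have the same length")
    if len(s_db) < 2:
        raise ValueError("at least two bands are needed to form a slope")
    slopes_s = spectral_slopes(s_db)
    slopes_hat = spectral_slopes(s_hat_db)
    if weights is None:
        weights = [1.0] * len(slopes_s)
    if len(weights) != len(slopes_s):
        raise ValueError("one weight per slope is required")
    return k_offset + sum(w * (a - b) ** 2
                          for w, a, b in zip(weights, slopes_s, slopes_hat))


# Slopes are [3, 3] and [1, 5]; the squared differences sum to 8.
print(wssm_distance([0.0, 3.0, 6.0], [0.0, 1.0, 6.0]))  # 8.0
# A constant level shift leaves every slope unchanged.
print(wssm_distance([0.0, 3.0, 6.0], [5.0, 8.0, 11.0]))  # 0.0
```

Because the measure compares slopes and not levels, a uniform gain change in dB contributes nothing to the sum. Any level difference would have to enter through the constant K.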