Y(J) Stein VoP4 1: Other features
Y(J) Stein VoP4 2: Echo cancellation
Y(J) Stein VoP4 5
V
O
P
YJS
Subjective reaction to echo
Round-trip delay (ms)   Required suppression (dB)
0                       1.4
20                      11.1
40                      17.7
60                      22.7
80                      27.2
100                     30.9
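The table lends itself to a simple lookup. The sketch below interpolates linearly between the tabulated points; the interpolation and the name `required_suppression_db` are assumptions made for illustration and do not come from the slide.

```python
# Round-trip delay (ms) -> required echo suppression (dB), copied from the table.
TABLE = [(0, 1.4), (20, 11.1), (40, 17.7), (60, 22.7), (80, 27.2), (100, 30.9)]

def required_suppression_db(delay_ms):
    """Interpolate between table rows (linear interpolation is an assumption)."""
    if delay_ms < TABLE[0][0] or delay_ms > TABLE[-1][0]:
        raise ValueError("delay outside the tabulated 0-100 ms range")
    for (d0, s0), (d1, s1) in zip(TABLE, TABLE[1:]):
        if d0 <= delay_ms <= d1:
            return s0 + (s1 - s0) * (delay_ms - d0) / (d1 - d0)

print(required_suppression_db(40))   # a tabulated point
print(required_suppression_db(50))   # halfway between the 40 ms and 60 ms rows
```

Delays beyond 100 ms are rejected rather than extrapolated, since the table gives no data there.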
Ecan
Y(J) Stein VoP4 6
Subjective reaction to echo delay
[Figure: mean required loss (dB), axis 0 to 35, versus round-trip delay (ms), axis 0 to 100]
Ecan
Y(J) Stein VoP4 7
Subjective effect of 15 dB echo return loss
Round-trip delay (ms)   Decrease in MOS   Percent difficulty
0                       0                 0
300                     1.3               30
600                     2.0               60
1200                    2.0               60
Ecan
Y(J) Stein VoP4 8
Echo suppressor
[Diagram: four-wire (4w) circuit with a comparator (comp), an inverter (inv) and two switches]
In practice more is needed: VOX, over-ride, reset, etc.
Ecan
Y(J) Stein VoP4 9
Why not echo suppression?
Echo suppression makes conversation half duplex
– Waste of full-duplex infrastructure
– Conversation unnatural
– Hard to break in
– Dead sounding line
It would be better to cancel the echo:
subtract the echo signal, allowing the desired signal through,
but that requires DSP.
[Diagram: near end and far end joined through a subtraction (-) node]
Ecan
Y(J) Stein VoP4 10
Echo cancellation?
Unfortunately, it’s not so easy
Outgoing signal is delayed, attenuated, distorted
Two echo canceller architectures:
– MODEM TYPE
– LINE ECHO CANCELLER (LEC)
[Diagrams, one per architecture: each shows near end, far end, the echo path, a subtraction (-) node and the resulting clean signal]
Ecan
Y(J) Stein VoP4 11
LEC architecture
[Block diagram. Labels: near end, far end, hybrid, A/D, D/A, filter H, adapt, doubletalk detector, subtraction (-), NLP, signals X and Y]
Ecan
Y(J) Stein VoP4 12
Adaptive Algorithms
How do we find the echo cancelling filter, and keep it correct even if the echo path parameters change?
Need an algorithm that continually changes the filter parameters
All adaptive algorithms are based on the same idea
(lack of correlation between desired signal and interference)
Let’s start with a simpler case - adaptive noise cancellation
Ecan
Y(J) Stein VoP4 14
Noise cancellation - cont.
Assume that noise is distorted only by an unknown gain h
We correct by transmitting e n so that the audience hears
$y = x + h n - e n = x + (h-e) n$
the energy of this signal is
$E_y = \langle y^2 \rangle = \langle x^2 \rangle + (h-e)^2 \langle n^2 \rangle + 2 (h-e) \langle x n \rangle$
Assume that $C_{xn} = \langle x n \rangle = 0$ (signal and noise are uncorrelated)
We need only set e to minimize $E_y$! (turn knob until minimal)
Even if the distortion is a complete filter h
we set the ANC filter e to minimize $E_y$
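The "turn knob until minimal" idea can be tried numerically. The sketch below is only an illustration under assumed conditions: a sine wave stands in for x, Gaussian noise for n, and the gain h = 0.7 is an arbitrary choice. It scans candidate values of e and keeps the one that gives the least output energy.

```python
import math
import random

random.seed(0)
N = 5000
h = 0.7  # the "unknown" gain, chosen here only for the demonstration

x = [math.sin(2 * math.pi * 0.01 * t) for t in range(N)]   # desired signal
n = [random.gauss(0.0, 1.0) for _ in range(N)]             # noise reference

def energy(e):
    """Mean energy of y = x + (h - e) n for a trial knob setting e."""
    return sum((xi + (h - e) * ni) ** 2 for xi, ni in zip(x, n)) / N

# Turn the knob: try e = 0.00, 0.01, ..., 1.50 and keep the quietest setting.
knob_settings = [k / 100 for k in range(151)]
best_e = min(knob_settings, key=energy)
print(best_e)
```

Because x and n are nearly uncorrelated, the minimum lands close to e = h without h ever being measured directly.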
Ecan
Y(J) Stein VoP4 15
The LMS algorithm
Gradient descent on energy
correction to H is proportional to error times input X
$H \leftarrow H + \lambda \, \epsilon \, X$
where $\lambda$ is a small step size and $\epsilon$ is the error signal
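A minimal LMS sketch follows, assuming a synthetic far-end signal and a made-up four-tap echo path. The function name `lms_echo_canceller`, the tap count and the step size are illustrative choices, not values from the slides.

```python
import random

def lms_echo_canceller(far, mic, taps=8, step=0.01):
    """Adapt H so that H applied to the far-end samples X predicts the echo in mic."""
    H = [0.0] * taps
    X = [0.0] * taps                      # recent far-end samples, newest first
    residual = []
    for x, y in zip(far, mic):
        X = [x] + X[:-1]
        estimate = sum(hk * xk for hk, xk in zip(H, X))
        err = y - estimate                # what is left after cancellation
        # LMS update: correction proportional to error times input
        H = [hk + step * err * xk for hk, xk in zip(H, X)]
        residual.append(err)
    return H, residual

random.seed(1)
far = [random.uniform(-1.0, 1.0) for _ in range(20000)]
echo_path = [0.0, 0.5, -0.3, 0.1]         # hypothetical echo path
mic = [
    sum(echo_path[k] * far[t - k] for k in range(len(echo_path)) if t - k >= 0)
    for t in range(len(far))
]

H, residual = lms_echo_canceller(far, mic)
print([round(hk, 3) for hk in H])
```

With no near-end speech in `mic`, the adapted H should approach the assumed echo path and the residual should shrink toward zero.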
Ecan
Y(J) Stein VoP4 16
Nonlinear processing
Because of finite numeric precision
the LEC (linear) filtering cannot completely remove echo
Standard LEC adds center clipping to remove residual echo
Clipping threshold needs to be properly set by adaptation
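Center clipping can be sketched in a few lines. The variant below zeroes samples whose magnitude falls under the threshold and passes the rest unchanged; both that choice and the fixed threshold are assumptions, since the slide only says the threshold must be set by adaptation.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is below the threshold."""
    return [0.0 if abs(s) < threshold else s for s in samples]

# Small residual echo is removed; louder samples pass through untouched.
print(center_clip([0.01, -0.02, 0.5, -0.8], 0.05))
```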
Ecan
Y(J) Stein VoP4 17
Doubletalk detection
Adaptation of H should take place only when far end speaks
So we freeze adaptation when there is no far end or double-talk,
that is, whenever near end speaks
Geigel algorithm compares the absolute value of near-end speech
to half the maximum absolute value in the X buffer
If near-end exceeds far-end, we can assume only near-end is speaking
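The Geigel comparison described above can be sketched as follows. The buffer length and the function names are hypothetical, and the sketch covers only the near-end test, not the separate check for an absent far end.

```python
from collections import deque

def geigel_near_end_active(near_sample, far_buffer):
    """True when |near| exceeds half the largest |far| sample in the buffer."""
    return abs(near_sample) > 0.5 * max(abs(s) for s in far_buffer)

def adaptation_allowed(far, near, buffer_len=4):
    """Per-sample flags: False wherever adaptation of H should be frozen."""
    X = deque([0.0] * buffer_len, maxlen=buffer_len)   # recent far-end samples
    flags = []
    for x, y in zip(far, near):
        X.append(x)
        flags.append(not geigel_near_end_active(y, X))
    return flags

far = [1.0, 1.0, 1.0, 1.0]
near = [0.2, 0.3, 0.9, 0.1]      # the third sample is too loud to be echo alone
print(adaptation_allowed(far, near))
```

The factor of one half reflects the assumption that the hybrid attenuates the echo by at least 6 dB, so anything louder is taken to be near-end speech.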
Ecan
Y(J) Stein VoP4 19
The need for relays
Voice is a relatively forgiving signal (or rather, the ear is)
Compression techniques are designed to pass voice
but may hopelessly distort other signals
Even simple tones (or DTMF) may not be passed by coders
We could go back to 64 Kbps G.711 for non-voice signals
But isn’t that silly?
Using 64 Kbps for 64 bps or even 9.6 Kbps data?
The solution is to use a relay
Relays
Y(J) Stein VoP4 20
Open Channel
Reasons to use 64 Kbps G.711 (open channel)
(32 Kbps ADPCM may work as well)
– Inexpensive
– Simple design
– Robust
Even open channel is not trivial!
– Need dynamic BW mechanism
– Need to detect the event (fax/modem tone, DTMF, MF, CPT, etc.)
– Need to return to compressed voice (end of session, time-out)
Y(J) Stein VoP4 21
Tone / Fax / Modem Relay
[Diagram: on each side of the PSN, a fax machine, an analog line, A/D and D/A conversion, a 64 Kbps link and a demodulate/remodulate stage]
Problems:
• need highly accurate detectors
• need low false alarm rate
• need appropriate protocol
• need accurate timing
• need expensive DSP processing
• delay may be too large
• may need “spoofing”
• can sides operate with different parameters?
Relays
Y(J) Stein VoP4 22
VoP DSP Architecture
[Block diagram. Labels: PCM Interface, Multi Channel Codec, LEC, Tone Detector, Tone Generator, VAD, CNG, DISC., Speech Coders, Playout Unit, Packet Voice Protocol, Control, Real Time Operating System, Serial Port, Voice Packet Module, PSN]
Relays
Y(J) Stein VoP4 23
VoP System Implementation
[Block diagram. Labels: PSTN, Telephony Signaling Module, Voice Packet Module (DSP), Microprocessor, Packet Protocol Module, Network Management Module, Voice, Signaling, NM info, Voice & Signaling Packets, ATM / FR / IP Network]
Relays
Y(J) Stein VoP4 24
Quality of Service
Y(J) Stein VoP4 25
The meaning of QoS
For general purpose data:
Every little bit counts
– only lossless compression
– best effort delivery
Real-time not essential
– dynamic routing and packet reordering allowed
For speech:
Only subjective quality counts
– Can use lossy compression
– Can drop segments with little effect
Real-time essential
– predetermined route preferable (traffic engineering)
QoS
Y(J) Stein VoP4 26
PSTN QoS
Virtually all calls (>95%) completed
Once connected virtually no disconnects or faults
Toll quality voice
Low delay (except satellite calls)
Full switching, optimized routing
Call Management
Fax/Modem functions
Wireline and wireless services
QoS
Y(J) Stein VoP4 27
Paying for QoS
Law of Photonics
– Price of transmitting a bit drops by half every 9 months
Free Internet telephony
– Several firms offering free long distance service over Internet
– Strong compression, significant delay and jitter
We no longer need to pay for service
… but we are willing to pay for quality of service
QoS
Y(J) Stein VoP4 28
Paying for QoS
[Figure: price versus MOS (axis 1 to 5), with wire service, mobile service and toll marked]
QoS
Y(J) Stein VoP4 30
Why does it sound the way it sounds?
PSTN
– BW = 0.2-3.8 kHz, SNR > 30 dB
– PCM, ADPCM (BER $10^{-3}$)
– five nines reliability
– line echo cancellation
Voice over packet network
– speech compression
– delay, delay variation, jitter
– packet loss/corruption/priority
– echo cancellation
SQM
Y(J) Stein VoP4 31
Subjective Voice Quality
Old Measures
– 5/9
– DRT
– DAM
The modern scale
– MOS
– DMOS
meet neat seat feet Pete beat heat
SQM
Y(J) Stein VoP4 32
MOS according to ITU
P.800 Subjective Determination of Transmission Quality
Annex B: Absolute Category Rating (ACR)
Score   Listening Quality   Listening Effort
5       excellent           relaxed
4       good                attention needed
3       fair                moderate effort
2       poor                considerable effort
1       bad                 no meaning with feasible effort
SQM
Y(J) Stein VoP4 33
MOS according to ITU (cont)
Annex D Degradation Category Rating (DCR)
Annex E Comparison Category Rating (CCR)
ACR not good at high quality speech
Score   DCR                 CCR
5       inaudible
4       not annoying
3       slightly annoying   much better
2       annoying            better
1       very annoying       slightly better
0                           the same
-1                          slightly worse
-2                          worse
-3                          much worse
SQM
Y(J) Stein VoP4 34
Some MOS numbers
Effect of Speech Compression:
(from ITU-T Study Group 15)
Quiet room, 48 kHz 16 bit linear sampling   5.0
PCM (A-law/μ-law) 64 Kb/s                   4.1
G.723.1 @ 6.3 Kb/s                          3.9
G.729 @ 8 Kb/s                              3.9
ADPCM G.726 32 Kb/s                         3.8 (toll quality)
GSM @ 13 Kb/s                               3.6
VSELP IS54 @ 8 Kb/s                         3.4
SQM
Y(J) Stein VoP4 35
The Problem(s) with MOS
Accurate MOS tests are the only reliable benchmark
BUT
MOS tests are off-line
MOS tests are slow
MOS tests are expensive
Different labs give consistently different results
Most MOS tests only check one aspect of system
SQM
Y(J) Stein VoP4 36
The Problem(s) with SNR
Naive question: Isn’t CCR the same as SNR?
SNR does not correlate well with subjective criteria
Squared difference is not an accurate comparator
– Gain
– Delay
– Phase
– Nonlinear processing
SQM
Y(J) Stein VoP4 37
Speech distance measures
Many objective measures have been proposed:
– Segmental SNR
– Itakura Saito distance
– Euclidean distance in Cepstrum space
– Bark spectral distortion
– Coherence Function
None correlate well with MOS
ITU target - find a quality-measure that does correlate well
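To make one of these measures concrete, here is a sketch of segmental SNR under a common textbook definition: the mean of per-frame SNRs in dB. The 160-sample frame length and the rule of skipping frames with zero signal or zero error are assumptions for illustration.

```python
import math

def segmental_snr_db(reference, degraded, frame_len=160):
    """Average of per-frame SNRs in dB; unusable frames are skipped."""
    usable = min(len(reference), len(degraded))
    frame_snrs = []
    for start in range(0, usable - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, deg))
        if signal > 0.0 and noise > 0.0:
            frame_snrs.append(10.0 * math.log10(signal / noise))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)

# A pure 10% gain change, which a listener would barely notice,
# is still penalized because the measure uses the squared difference.
reference = [1.0] * 320
degraded = [0.9] * 320
print(segmental_snr_db(reference, degraded))
```

The gain example shows why such measures can disagree with listeners: the waveform differs at every sample even though the speech would sound essentially the same.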
SQM
Y(J) Stein VoP4 38
Return to Biology
Standard speech model (LPC)
(used by most speech processing/compression/recognition systems)
is a model of speech production
Unfortunately, speech production and perception systems are not matched
Speech quality measurement idea: use a model of the human auditory system (perception)
ITU-T P.861 Perceptual Speech Quality Measurement (PSQM)
ITU-T P.862 Perceptual Evaluation of Speech Quality (PESQ)
ITU-R BS.1387 Objective Measurements of Perceived Audio Quality
SQM
Y(J) Stein VoP4 39
Some objective methods
Perceptual Speech Quality Measurement (PSQM), ITU-T P.861
Perceptual Analysis Measurement System (PAMS), BT proprietary technique
Perceptual Evaluation of Speech Quality (PESQ), ITU-T P.862
Objective Measurement of Perceived Audio Quality (PEAQ), ITU-R BS.1387
E-model, ITU-T G.107, G.108, ETSI ETR-250
SQM
Y(J) Stein VoP4 40
Objective Quality Strategy
[Diagram. Labels: speech, channel, QM (twice), QM to MOS, MOS estimate]
SQM
Y(J) Stein VoP4 41
PSQM philosophy (from P.861)
[Diagram: two perceptual models each produce an internal representation; the audible difference between the two feeds a cognitive model]
SQM
Y(J) Stein VoP4 42
PSQM philosophy (cont)
Perceptual Modelling (Internal representation)
– Short time Fourier transform
– Frequency warping (telephone-band filtering, Hoth noise)
– Intensity warping
Cognitive Modelling
– Loudness scaling
– Internal cognitive noise
– Asymmetry
– Silent interval processing
PSQM Values
– 0 (no degradation) to 6.5 (maximum degradation)
Conversion to MOS
– PSQM to MOS calibration using known references
– Equivalent Q values
SQM
Y(J) Stein VoP4 43
Problems with PSQM
Designed for telephony grade speech codecs
Doesn’t take network effects into account:
– filtering
– variable time delay
– localized distortions
Draft standard P.862 adds:
– transfer function equalization
– time alignment, delay skipping
– distortion averaging
SQM
Y(J) Stein VoP4 44
PESQ philosophy (from P.862)
[Diagram: as for PSQM, with a time alignment stage added. Labels: perceptual model (twice), internal representation (twice), audible difference, cognitive model, time alignment]
SQM
Y(J) Stein VoP4 45
E-model
R factor: mouth to ear transmission quality model
$R = R_0 - I_s - I_d - I_e + A$
where
$R_0$   effect of SNR
$I_s$   effect of simultaneous impairments
$I_d$   effect of delayed impairments
$I_e$   effect of equipment distortion
$A$     advantage of method (e.g. mobility of cellphone)
Defined in ITU-T G.107, G.108 and ETSI ETR-250
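The R factor sum is easy to evaluate once the impairment terms are known. In the sketch below the input numbers are hypothetical placeholders. The R-to-MOS conversion is not on the slide; it is the mapping published in ITU-T G.107 and is included only to relate R to the MOS scale used earlier.

```python
def r_factor(R0, Is, Id, Ie, A=0.0):
    """R = R0 - Is - Id - Ie + A, as on the slide."""
    return R0 - Is - Id - Ie + A

def r_to_mos(R):
    """R-to-MOS mapping from ITU-T G.107 (an addition, not from the slide)."""
    if R <= 0.0:
        return 1.0
    if R >= 100.0:
        return 4.5
    return 1.0 + 0.035 * R + R * (R - 60.0) * (100.0 - R) * 7.0e-6

# Hypothetical impairment values, chosen only to exercise the formula.
R = r_factor(R0=94.0, Is=1.0, Id=2.0, Ie=11.0, A=0.0)
print(R, round(r_to_mos(R), 2))
```

Because the terms simply add, an equipment impairment of 11 costs the same 11 points of R whatever the delay impairment happens to be.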
SQM
Y(J) Stein VoP4 46
VQMon
PSQM and PESQ are intrusive techniques
PSQM and PESQ require on-line DSP processing
Given the speech encoder,
shouldn’t there be a connection
between network parameters (e.g. packet loss, jitter)
and speech quality?
A nonintrusive technique has been developed based on the E-model
Invented by AD Clark (Telchemy), accepted by ETSI TIPHON
SQM