Y(J) Stein VoP4 1

V

O

P

YJS

Other Features

Y(J) Stein VoP4 2

Echo Cancellation

Y(J) Stein VoP4 3

Acoustic Echo

Ecan

Y(J) Stein VoP4 4

Line echo

[Figure: Telephone 1 and Telephone 2, each connected through a hybrid]

Y(J) Stein VoP4 5

Subjective reaction to echo

Round-Trip Delay (ms)    Required suppression (dB)
0                        1.4
20                       11.1
40                       17.7
60                       22.7
80                       27.2
100                      30.9
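For delays that fall between the tabulated points, one option is simple linear interpolation. The sketch below is only an illustration: the function name `required_suppression_db` is hypothetical, and linear interpolation (with clamping outside 0 to 100 ms) is an assumption, not something the table itself prescribes.

```python
from bisect import bisect_right

# (round-trip delay in ms, required echo suppression in dB), copied from the table above
SUPPRESSION_TABLE = [(0, 1.4), (20, 11.1), (40, 17.7), (60, 22.7), (80, 27.2), (100, 30.9)]


def required_suppression_db(delay_ms: float) -> float:
    """Linearly interpolate the table; clamp outside the 0-100 ms range (an assumption)."""
    delays = [d for d, _ in SUPPRESSION_TABLE]
    if delay_ms <= delays[0]:
        return SUPPRESSION_TABLE[0][1]
    if delay_ms >= delays[-1]:
        return SUPPRESSION_TABLE[-1][1]
    i = bisect_right(delays, delay_ms)          # index of the first table delay > delay_ms
    (d0, s0), (d1, s1) = SUPPRESSION_TABLE[i - 1], SUPPRESSION_TABLE[i]
    return s0 + (s1 - s0) * (delay_ms - d0) / (d1 - d0)
```

For example, a 30 ms round trip lies halfway between the 20 ms and 40 ms rows, so the interpolated requirement is halfway between 11.1 dB and 17.7 dB.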

Y(J) Stein VoP4 6

Subjective reaction to echo delay

[Figure: mean required loss (dB, axis 0 to 35) versus round-trip delay (ms, axis 0 to 100)]

Y(J) Stein VoP4 7

Subjective effect of 15 dB echo return loss

Round-trip Delay (ms)    Decrease in MOS    Percent Difficulty
0                        0                  0
300                      1.3                30
600                      2.0                60
1200                     2.0                60

Y(J) Stein VoP4 8

Echo suppressor

[Figure: echo suppressor block diagram with labels comp, switch (twice), inv, and 4w (twice)]

In practice more is needed: VOX, override, reset, etc.
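The basic compare-and-switch idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the circuit on the slide: the frame-based energy comparison, the function names, and the `activity_threshold` value are all hypothetical, and none of the practical extras (VOX, override, reset) are modelled.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def suppress_echo(far_frame, near_frame, activity_threshold=1e-4):
    """Return the frame to send toward the far end.

    If the far end is active and louder than the near end, whatever is on the
    near-end line is assumed to be echo, so the send path is switched off.
    """
    far = frame_energy(far_frame)
    near = frame_energy(near_frame)
    if far > activity_threshold and far > near:
        return [0.0] * len(near_frame)   # switch open: nothing is sent back
    return list(near_frame)              # switch closed: near-end speech passes
```

Because the send path is either fully open or fully closed, only one direction gets through at a time.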

Y(J) Stein VoP4 9

Why not echo suppression?

Echo suppression makes conversation half duplex
– Waste of full-duplex infrastructure
– Conversation unnatural
– Hard to break in
– Dead sounding line

It would be better to cancel the echo: subtract the echo signal, allowing the desired signal through.

But that requires DSP.

[Figure: near end, subtraction point (-), far end]

Y(J) Stein VoP4 10

Echo cancellation?

Unfortunately, it’s not so easy

Outgoing signal is delayed, attenuated, distorted

Two echo canceller architectures:

• MODEM TYPE
• LINE ECHO CANCELLER (LEC)

[Figure: the two architectures, each drawn with near end, far end, echo path, subtraction point (-), and clean output]

Y(J) Stein VoP4 11

LEC architecture

[Figure: LEC block diagram with labels near end, far end, hybrid, A/D, D/A, filter H, adapt, doubletalk detector, subtraction point (-), NLP, X, Y]

Y(J) Stein VoP4 12

Adaptive Algorithms

How do we find the echo cancelling filter, and keep it correct even if the echo path parameters change?

Need an algorithm that continually changes the filter parameters

All adaptive algorithms are based on the same idea: lack of correlation between the desired signal and the interference

Let’s start with a simpler case - adaptive noise cancellation

Y(J) Stein VoP4 13

Noise cancellation

[Figure: noise cancellation setup with desired signal x, noise n, distortion h, cancelling term e n subtracted, and output y]

Y(J) Stein VoP4 14

Noise cancellation - cont.

Assume that the noise is distorted only by an unknown gain h.
We correct by transmitting e n, so that the audience hears

$$y = x + h n - e n = x + (h - e) n$$

The energy of this signal is

$$E_y = \langle y^2 \rangle = \langle x^2 \rangle + (h - e)^2 \langle n^2 \rangle + 2 (h - e) \langle x n \rangle$$

Assume that $C_{xn} = \langle x n \rangle = 0$. We need only set e to minimize $E_y$! (turn the knob until minimal)

Even if the distortion is a complete filter h, we set the ANC filter e to minimize $E_y$.
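The "turn the knob" idea can be tried numerically. The sketch below is an illustration only: the gain value, the Gaussian test signals, and the grid of knob settings are all hypothetical choices, with x and n drawn independently so that they are (approximately) uncorrelated.

```python
import random

random.seed(0)
N = 5000
h = 0.7                                          # unknown gain (hypothetical value)
x = [random.gauss(0.0, 1.0) for _ in range(N)]   # desired signal
n = [random.gauss(0.0, 1.0) for _ in range(N)]   # noise, drawn independently of x


def output_energy(e):
    """Mean energy of y = x + (h - e) n for a given knob setting e."""
    return sum((xi + (h - e) * ni) ** 2 for xi, ni in zip(x, n)) / N


# "Turn the knob": try settings from 0.00 to 1.50 and keep the one with least energy.
knob_settings = [k / 100 for k in range(151)]
best_e = min(knob_settings, key=output_energy)
```

With a finite number of samples the cross term is small but not exactly zero, so `best_e` should land close to h rather than exactly on it.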

Y(J) Stein VoP4 15

The LMS algorithm

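Gradient descent on energy:
the correction to H is proportional to the error times the input X

$$H \leftarrow H + \lambda \, \epsilon \, X$$

where $\epsilon$ is the error and $\lambda$ is a small step size.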
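A minimal sketch of this update (correction to H proportional to error times input X) applied to a simulated echo path follows. Everything specific here is an assumption made for illustration: the function name, the tap count, the step size, the white-noise far-end signal, and the echo path coefficients in `true_path`. There is no near-end speech in the simulation, so no doubletalk handling is needed.

```python
import random


def lms_echo_canceller(far, near, taps=8, step=0.01):
    """Return (error signal, final filter H) using H <- H + step * err * X."""
    H = [0.0] * taps
    X = [0.0] * taps                      # most recent far-end samples, newest first
    errors = []
    for x_new, y in zip(far, near):
        X = [x_new] + X[:-1]              # shift the new far-end sample into the buffer
        echo_estimate = sum(hk * xk for hk, xk in zip(H, X))
        err = y - echo_estimate           # what remains after subtracting the estimate
        H = [hk + step * err * xk for hk, xk in zip(H, X)]
        errors.append(err)
    return errors, H


random.seed(1)
true_path = [0.0, 0.5, -0.3, 0.1]         # hypothetical echo path
far = [random.gauss(0.0, 1.0) for _ in range(5000)]
near = [sum(true_path[k] * far[t - k] for k in range(len(true_path)) if t - k >= 0)
        for t in range(len(far))]
errors, H = lms_echo_canceller(far, near)
```

In this noise-free setting the first four taps of `H` should approach `true_path` and the remaining taps should stay near zero.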

Y(J) Stein VoP4 16

Nonlinear processing

Because of finite numeric precision, the (linear) LEC filtering cannot completely remove the echo

Standard LEC adds center clipping to remove residual echo

Clipping threshold needs to be properly set by adaptation
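Center clipping itself is a one-line operation. The sketch below uses a fixed threshold passed in by the caller, which is a simplification; how the threshold is adapted is not shown, and the function name is hypothetical.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is below the threshold; pass the rest unchanged."""
    return [0.0 if abs(s) < threshold else s for s in samples]
```

A threshold set too high would also remove quiet near-end speech, which is why its setting matters.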

Y(J) Stein VoP4 17

Doubletalk detection

Adaptation of H should take place only when the far end speaks

So we freeze adaptation when there is no far-end speech or there is double-talk, that is, whenever the near end speaks

The Geigel algorithm compares the absolute value of the near-end speech to half the maximum absolute value in the X buffer

If the near-end value exceeds this far-end threshold, we can assume that the near end is speaking
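The Geigel comparison can be written directly from this description. In the sketch below the function name is hypothetical and the far-end buffer is assumed to be non-empty; the factor 0.5 is the "half the maximum" from the text.

```python
def geigel_near_end_active(near_sample, far_buffer):
    """True when |near| exceeds half the largest |far| sample in the buffer."""
    return abs(near_sample) > 0.5 * max(abs(x) for x in far_buffer)
```

An adaptive canceller would skip its filter update for any sample where this returns True.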

Y(J) Stein VoP4 18

Data Relays

Y(J) Stein VoP4 19

The need for relays

Voice is a relatively forgiving signal (or rather, the ear is)

Compression techniques are designed to pass voice but may hopelessly distort other signals

Even simple tones (or DTMF) may not be passed by coders

We could go back to 64 Kbps G.711 for non-voice signals

But isn't that silly?

Using 64 Kbps for 64 bps or even 9.6 Kbps data?

The solution is to use a relay

Relays

Y(J) Stein VoP4 20

Open Channel

Reasons to use 64 Kbps G.711 (open channel) (32 Kbps ADPCM may work as well):
• Inexpensive
• Simple design
• Robust

Even open channel is not trivial!
• Need dynamic BW mechanism
• Need to detect the event (fax/modem tone, DTMF, MF, CPT, etc.)
• Need to return to compressed voice (end of session, time-out)

Y(J) Stein VoP4 21

Tone / Fax / Modem Relay

[Figure: relay path between two fax machines: Fax, Analog, A/D and D/A, 64 Kbps, Demodulate/Remodulate, PSN, Demodulate/Remodulate, 64 Kbps, A/D and D/A, Analog, Fax]

Problems:
• need highly accurate detectors
• need low false alarm rate
• need appropriate protocol
• need accurate timing
• need expensive DSP processing
• delay may be too large
• may need "spoofing"
• can the two sides operate with different parameters?

Y(J) Stein VoP4 22

VoP DSP Architecture

[Figure: VoP DSP block diagram with blocks Multi-Channel Codec, PCM Interface, LEC, Tone Detector, Tone Generator, VAD / CNG / DISC., Speech Coders, Voice Packet Module, Packet Voice Protocol, Playout Unit, Control, Real Time Operating System, Serial Port, PSN]

Y(J) Stein VoP4 23

VoP System Implementation

[Figure: system block diagram with labels PSTN, Voice, Signaling, Telephony Signaling Module, Voice Packet Module, DSP, Microprocessor (twice), Packet Protocol Module, Network Management Module, NM info, Voice & Signaling Packets, ATM / FR / IP Network]

Y(J) Stein VoP4 24

Quality of Service

Y(J) Stein VoP4 25

The meaning of QoS

For general purpose data:
Every little bit counts
– only lossless compression
– best effort delivery
Real-time not essential
– dynamic routing and packet reordering allowed

For speech:
Only subjective quality counts
– can use lossy compression
– can drop segments with little effect
Real-time essential
– predetermined route preferable (traffic engineering)

QoS

Y(J) Stein VoP4 26

PSTN QoS

• Virtually all calls (>95%) completed
• Once connected, virtually no disconnects or faults
• Toll quality voice
• Low delay (except satellite calls)
• Full switching, optimized routing
• Call Management
• Fax/Modem functions
• Wireline and wireless services

Y(J) Stein VoP4 27

Paying for QoS

Law of Photonics
• Price of transmitting a bit drops by half every 9 months

Free Internet telephony
• Several firms offering free long distance service over the Internet
• Strong compression, significant delay and jitter

We no longer need to pay for service

… but we are willing to pay for quality of service

Y(J) Stein VoP4 28

Paying for QoS

[Figure: price versus MOS (axis 1 to 5), with wire service, mobile service, and toll marked]

Y(J) Stein VoP4 29

Speech Quality Measurement

Y(J) Stein VoP4 30

Why does it sound the way it sounds?

PSTN
• BW = 0.2-3.8 KHz, SNR > 30 dB
• PCM, ADPCM (BER 10^-3)
• five nines reliability
• line echo cancellation

Voice over packet network
• speech compression
• delay, delay variation, jitter
• packet loss/corruption/priority
• echo cancellation

SQM

Y(J) Stein VoP4 31

Subjective Voice Quality

Old measures
• 5/9
• DRT
• DAM

The modern scale
• MOS
• DMOS

meet neat seat feet Pete beat heat

Y(J) Stein VoP4 32

MOS according to ITU

P.800 Subjective Determination of Transmission Quality

Annex B: Absolute Category Rating (ACR)

Score    Listening Quality    Listening Effort
5        excellent            relaxed
4        good                 attention needed
3        fair                 moderate effort
2        poor                 considerable effort
1        bad                  no meaning with feasible effort
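A MOS is the mean of the individual ratings collected on this five-point scale. The sketch below shows only that arithmetic; the function name and the listener votes are hypothetical, and none of the test procedure defined in P.800 is modelled.

```python
from statistics import mean

ACR_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}


def mean_opinion_score(ratings):
    """Average a list of 1-5 ACR ratings into a MOS."""
    if not all(r in ACR_LABELS for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    return mean(ratings)


ratings = [4, 4, 3, 5, 4, 3, 4, 4]   # hypothetical listener votes
mos = mean_opinion_score(ratings)
```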

Y(J) Stein VoP4 33

MOS according to ITU (cont)

Annex D Degradation Category Rating (DCR)

Annex E Comparison Category Rating (CCR)

ACR not good at high quality speech

Score    DCR                  CCR
5        inaudible
4        not annoying
3        slightly annoying    much better
2        annoying             better
1        very annoying        slightly better
0                             the same
-1                            slightly worse
-2                            worse
-3                            much worse

Y(J) Stein VoP4 34

Some MOS numbers

Effect of Speech Compression:

(from ITU-T Study Group 15)

Condition                                    MOS
Quiet room 48 KHz 16 bit linear sampling     5.0
PCM (A-law/μ-law) 64 Kb/s                    4.1
G.723.1 @ 6.3 Kb/s                           3.9
G.729 @ 8 Kb/s                               3.9
ADPCM G.726 32 Kb/s                          3.8    toll quality
GSM @ 13 Kb/s                                3.6
VSELP IS54 @ 8 Kb/s                          3.4

Y(J) Stein VoP4 35

The Problem(s) with MOS

Accurate MOS tests are the only reliable benchmark

BUT

• MOS tests are off-line
• MOS tests are slow
• MOS tests are expensive
• Different labs give consistently different results
• Most MOS tests only check one aspect of system

Y(J) Stein VoP4 36

The Problem(s) with SNR

Naive question: Isn’t CCR the same as SNR?

SNR does not correlate well with subjective criteria

Squared difference is not an accurate comparator

• Gain
• Delay
• Phase
• Nonlinear processing
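A small experiment shows how a squared-difference measure reacts to a change the ear would hardly notice. The sketch is illustrative only: the 1 kHz test tone, the 8 kHz sampling rate, the 4-sample delay, and the function name `snr_db` are all hypothetical choices.

```python
import math


def snr_db(reference, degraded):
    """SNR in dB, treating the sample-by-sample difference as noise."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs)]   # 1 kHz, 1 second

delayed = tone[-4:] + tone[:-4]          # circular shift by 4 samples (0.5 ms)
quieter = [0.5 * s for s in tone]        # gain reduced by half

snr_delay = snr_db(tone, delayed)
snr_gain = snr_db(tone, quieter)
```

Here the half-millisecond delay inverts the tone, so the computed SNR comes out at about -6 dB, and the gain change alone gives about +6 dB, even though neither version would be described as badly degraded by a listener.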

Y(J) Stein VoP4 37

Speech distance measures

Many objective measures have been proposed:

• Segmental SNR
• Itakura Saito distance
• Euclidean distance in Cepstrum space
• Bark spectral distortion
• Coherence Function

None correlate well with MOS

ITU target - find a quality-measure that does correlate well

Y(J) Stein VoP4 38

Return to Biology

Standard speech model (LPC) (used by most speech processing/compression/recognition systems)

is a model of speech production

Unfortunately, speech production and perception systems are not matched

Speech quality measurement idea: use a model of the human auditory system (perception)

• ITU-T P.861 Perceptual Speech Quality Measurement (PSQM)
• ITU-T P.862 Perceptual Evaluation of Speech Quality (PESQ)
• ITU-R BS.1387 Objective Measurements of Perceived Audio Quality

Y(J) Stein VoP4 39

Some objective methods

• Perceptual Speech Quality Measurement (PSQM): ITU-T P.861
• Perceptual Analysis Measurement System (PAMS): BT proprietary technique
• Perceptual Evaluation of Speech Quality (PESQ): ITU-T P.862
• Objective Measurement of Perceived Audio Quality (PEAQ): ITU-R BS.1387
• E-model: ITU-T G.107, G.108; ETSI ETR-250

Y(J) Stein VoP4 40

Objective Quality Strategy

[Figure: block diagram with labels speech, channel, QM (twice), QM to MOS, MOS estimate]

Y(J) Stein VoP4 41

PSQM philosophy (from P.861)

[Figure: block diagram with Perceptual model (twice), Internal Representation (twice), Audible Difference, Cognitive Model]

Y(J) Stein VoP4 42

PSQM philosophy (cont)

Perceptual Modelling (Internal representation)
• Short time Fourier transform
• Frequency warping (telephone-band filtering, Hoth noise)
• Intensity warping

Cognitive Modelling
• Loudness scaling
• Internal cognitive noise
• Asymmetry
• Silent interval processing

PSQM Values
• 0 (no degradation) to 6.5 (maximum degradation)

Conversion to MOS
• PSQM to MOS calibration using known references
• Equivalent Q values

Y(J) Stein VoP4 43

Problems with PSQM

Designed for telephony grade speech codecs

Doesn’t take network effects into account:
• filtering
• variable time delay
• localized distortions

Draft standard P.862 adds:
• transfer function equalization
• time alignment, delay skipping
• distortion averaging

Y(J) Stein VoP4 44

PESQ philosophy (from P.862)

[Figure: block diagram with Perceptual model (twice), Internal Representation (twice), Time Alignment, Audible Difference, Cognitive Model]

Y(J) Stein VoP4 45

E-model

R factor: mouth-to-ear transmission quality model

$$R = R_0 - I_s - I_d - I_e + A$$

where

$R_0$    effect of SNR
$I_s$    effect of simultaneous impairments
$I_d$    effect of delayed impairments
$I_e$    effect of equipment distortion
$A$      advantage of method (e.g. mobility of cellphone)

Defined in ITU-T G.107, G.108 and ETSI ETR-250
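The R factor formula is plain arithmetic, and G.107 also gives a mapping from R to an estimated MOS. The sketch below combines the two; the function names and the example impairment values are hypothetical, chosen only to exercise the formula, and are not taken from any table in the standard.

```python
def r_factor(r0, i_s, i_d, i_e, a=0.0):
    """R = R0 - Is - Id - Ie + A."""
    return r0 - i_s - i_d - i_e + a


def r_to_mos(r):
    """Estimated MOS from R, using the mapping given in ITU-T G.107."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Hypothetical impairment values, for illustration only
r = r_factor(r0=94.0, i_s=1.5, i_d=5.0, i_e=11.0)
estimated_mos = r_to_mos(r)
```

Because the impairments simply subtract, the contribution of each one to the final score can be read off separately.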

Y(J) Stein VoP4 46

VQMon

PSQM and PESQ are intrusive techniques
PSQM and PESQ require on-line DSP processing

Given the speech encoder, shouldn’t there be a connection between network parameters (e.g. packet loss, jitter) and speech quality?

A nonintrusive technique has been developed based on the E-model

Invented by A. D. Clark (Telchemy); accepted by ETSI TIPHON
