microsoft powerpoint - ccnc10_voip


Page 1: Microsoft PowerPoint - ccnc10_voip

CCNC 2010 Tutorial: Towards Glitch Free VoIP and Video Conferencing 1/12/2010

Jin Li, Microsoft Research 1

TOWARDS GLITCH-FREE VOIP AND VIDEO CONFERENCING

JIN LI

MICROSOFT RESEARCH

Outline

• Introduction
• Anatomy of VoIP and Video Conferencing Systems
• Audio/Video Components
• Network Components
• Summary


Introduction

Booming of IP Based Communication

• Advanced voice over IP (VoIP)
• Web-, audio-, video-conferencing
• Tele-presence
• Instant messaging
• Calendar and other PIM functions
• Email, fax and voice mail


Worldwide VoIP subscribers

• Worldwide VoIP service revenue was $24.1B in 2007, up 52% over 2006.
• Worldwide VoIP service revenue is expected to more than double over the next 4 years, to $61.3B in 2011, an annual growth rate of 26%.

Source: 2008 Infonetics Research Inc.

US Broadband Telephony Forecast, 2007-2013

The VoIP subscriber base is predicted to double from 2007 to 2013.

Source: Jupiter Research, US Broadband Telephony Forecast, 2008 to 2013


VoIP Trend

• IP networks are the next-generation networks for all forms of communication.
• Broadband penetration is a key driver of VoIP expansion.
  • Worldwide DSL subscriptions were at 205.9M at the end of 2007, up 23% from 2006, and are predicted to increase to 363.6M in 2011.
  • Cable subscriptions were up 15% annually to 68M at the end of 2007, climbing to 97.3M in 2011.
  • Passive Optical Network (PON) subscribers were at 10.9M in 2007.
  • Ethernet FTTH subscribers were at 1.7M in 2007.
• 2004/2005 were breakthrough years for VoIP adoption.

High End Systems – Tele-Presence

• Cisco Telepresence: $299K
• Tandberg Experia: $225K
• HP Halo: $425K + $18K/mo
• Polycom RPX210M: $269K + $18.5K/mo


Worldwide Tele-presence Forecast (2006-2012)

[Figure: number of end points and revenue forecast. Source: 2008 IDC Research]

Desktop Video Conferencing

• Multiple solutions, often acting as an add-on to VoIP
• Benefits
  • See faces of people you may not have met before
  • See facial expressions & gestures
  • Easier to follow a conversation
  • More interactive than phone
  • Get the general mood or ambience
  • See and show documents/objects
• Drawbacks
  • Difficult to set up and plan
  • Network reliability
    • Without (or with poor) video, people talk; without (or with poor) audio, people walk.
  • Interpersonal factors


Anatomy of VoIP and Video Conferencing Systems

Infrastructure vs. P2P

• Infrastructure based
  • Microsoft Unified Communication
  • Cisco
  • Gtalk
• P2P based
  • Skype


Infrastructure Based VoIP: Microsoft Unified Communication

Unified Communication: Architecture


Unified Communication: P2P Call

Key Steps

• Alice calls Bob
• Find Bob’s registered SIP endpoints


Unified Communication: To VoiceMail

Key Steps

• Alice calls Bob
• Find Bob’s registered SIP endpoints
• Bob doesn’t answer after a certain period, so the call re-routes
• The voicemail system plays a greeting, records Alice’s message, sends the message to Bob’s email, and uses a speech server to transcribe the message


Unified Communication: PSTN → UC

Key Steps

• PSTN user Alice calls Bob
• IP-PSTN gateway terminates the call
• MS/Gateway routes the call to the mediation server, which performs transcoding, ICE, etc.
• Through the director, the proper UC client is found


P2P VoIP: Skype

• Information
  • Debut: 08/2003, by N. Zennström and J. Friis, who founded KaZaA
  • A P2P overlay network for VoIP and other applications
  • Free intra-network VoIP and fee-based SkypeOut/SkypeIn


Skype Usage (Apr. 2008)

• 11 million concurrent Skype users online at peak time (180,000+ simultaneous calls)
• 309 million registered users worldwide, the largest registered user base within the eBay portfolio (33 million users added in Q1FY08)
• $126M revenue in Q1FY08 (61% YOY growth, 5.6 billion SkypeOut minutes in FY2007)
• 100 billion cumulative Skype-to-Skype minutes

Skype Share of International VoIP Traffic


Skype Gadget

• Netgear Skype Wi-Fi Phone
• Motorola CN620 WiFi Cellphone
• IPEVO Free-1 USB Skype Phone
• USB Mouse with Phone
• IPDRUM mobile Skype Cable

50 hardware partners, 150+ Skype-certified devices.

Skype vs. VoIP

• Public VoIP standards
  • H.323, SIP
• Skype is a proprietary VoIP solution
  • Relies on a P2P network for the user directory
  • Scalable without costly infrastructure
  • Routes calls through supernodes in Skype
  • Universal firewall/NAT traversal
  • Encrypted traffic (but you have to trust eBay/Skype)


Skype Ingredient (1)

• Skype server: authentication
  • The user retrieves its ID from a Skype server

Skype Network

• Supernode overlay
  • Any computer with sufficient CPU, memory & network bandwidth that is not behind a firewall
  • For the distributed directory service
  • Relays traffic for computers behind NAT/firewall


NAT Traversal (Skype)

• NAT/firewall detection
  • Try UDP connection
  • Try TCP connection (arbitrary port, 80 (HTTP), 443 (HTTPS))
• Traversal
  • Direct connection if a) both clients have no NAT, or b) one client has no NAT and the other is behind a cone NAT
  • Relay by supernode otherwise
• Since Skype doesn’t need to pay for relay cost
  • High bitrate wideband voice codec (>24 kbps)

Skype: Call Routing Through Supernode

[Figure: Skype server (authentication) and supernode overlay]

• Route call through supernodes
• High bitrate wideband voice codec (>24 kbps)


Skype Encryption

• 256-bit AES over 128-bit data blocks
• 1536/2048-bit RSA for key negotiation (2048/2048 for paid service)

[Figure: encrypted session between Peer 1 and Peer 2]

Skype: Complete Black Box (Security by Obfuscation)

• Almost everything is obfuscated
• Many protections, anti-debugging tricks, ciphered code
  • Avoid static disassembly: XOR the binary with a hard-coded key, erase the beginning of the code, own packer
  • Code integrity check: use checksums to detect breakpoints
  • Anti-debugging technique: anti-SoftICE, integrity check
  • Code obfuscation
  • Network obfuscation


Audio/Video Component

• Audio Codec
• Video Codec
• Acoustic Echo Cancellation


Audio Codec

G.711 (PCM)

• Still widely used today: PSTN interface
• If uniform quantization
  • 12 bits × 8 k samples/s = 96 kbps
• Non-uniform quantization
  • 64 kbps DS0 rate (8 bits × 8 k samples/s)
  • North America: µ-law (µ = 255)
  • Other countries: A-law (A = 87.6)
  • MOS of about 4.3
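The non-uniform quantization above can be illustrated with the continuous µ-law curve. This is a minimal sketch only: G.711 itself specifies a segmented, piecewise-linear approximation of this curve, and the function names and the signed 8-bit code layout below are assumptions made for illustration.

```python
import math

MU = 255.0  # mu-law parameter quoted on the slide

def mulaw_compress(x: float) -> float:
    """Compand a sample in [-1, 1]; small amplitudes get a larger share of the range."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def to_8bit_code(x: float) -> int:
    """Uniformly quantize the companded value to a signed code (illustrative layout)."""
    return round(mulaw_compress(x) * 127)

UNIFORM_KBPS = 12 * 8    # 12-bit uniform PCM at 8 k samples/s
COMPANDED_KBPS = 8 * 8   # 8-bit companded PCM at 8 k samples/s
```

A sample at 1% of full scale already maps to code 29 out of 127, which is why 8 companded bits can stand in for roughly 12 uniform bits on speech.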


G.722.1: Siren

• Audio bandwidth: 14 kHz
• Sample rate: 32 kHz
• Bit rate: 24, 32, and 48 kbit/s
• Algorithm: Transform coding (Siren14™)
• Frame size: 20 ms
• Algorithmic delay: 40 ms
• Complexity: <11 WMOPS (enc/dec)
• Available on royalty-free licensing terms (from Polycom)

Siren Encoder


Siren Decoder

Siren Codec

• Audio sampled at 32 kHz
• Operates on frames of 20 ms, corresponding to 640 samples
• Based on transform coding, using a Modulated Lapped Transform (MLT)
• A look-ahead of 20 ms due to 50% overlap between frames
• Total algorithmic delay of 40 ms


MLT - Modulated Lapped Transforms

[Figure panels: Spatial Response; Frequency Domain]

Categorization & SQVH

[Figure panels: Expected # of Bits For Each Category; Quantization Used by SQVH; Vector Property Used in SQVH]


AMR-WB Basics

• “Wideband coding of speech at around 16 kbit/s using adaptive multi-rate wideband (AMR-WB)”
• Adopted as ITU-T G.722.2, and also as 3GPP spec TS 26.190.
• “Foreseen applications are: VoIP and internet applications, Mobile Com., PSTN app, ISDN wideband telephony, ISDN videophone and videoconf.”
• Sampling rate: 16 kHz
• Bitrate: 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, and 23.85 kbit/s
• 20 ms frame
• ACELP (algebraic code excited LPC)

Pre-processing

• Sampling rate conversion: 16 to 12.8 kHz (now a 20 ms frame has 256 samples)
• HP filter (cut-off at 50 Hz)
• Pre-emphasis filter: $1 - 0.68\,z^{-1}$


LP analysis and Quant.

• One 30 ms asymmetric window
  • 5 ms look-ahead
• Obtain LPC coefficients:
  • Compute the correlation;
  • Multiply by a lag window (adds 60 Hz bandwidth expansion);
  • R(0) = 1.0001 R(0) (adds a noise floor 40 dB below the signal);
  • Levinson-Durbin to compute the LP coefficients.
• LP to ISP
• Quantize in the ISP q-domain.
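The LPC steps listed above (correlation, lag windowing, the 1.0001 correction on R(0), Levinson-Durbin) can be sketched as follows. This is an illustrative outline rather than the reference AMR-WB code: the asymmetric analysis window is omitted, the Gaussian lag-window shape is an assumption, and all function names are made up for this example.

```python
import math

def autocorrelation(frame, order):
    """R(0..order) of an already windowed frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]

def lag_window(r, bw_hz=60.0, fs_hz=12800.0):
    """Apply the white-noise correction to R(0) and a Gaussian lag window
    (assumed shape) that widens formant bandwidths by about bw_hz."""
    out = [r[0] * 1.0001]
    for k in range(1, len(r)):
        out.append(r[k] * math.exp(-0.5 * (2 * math.pi * bw_hz * k / fs_hz) ** 2))
    return out

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... ; returns (a, prediction_error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

For example, the autocorrelation sequence 1, 0.5, 0.25 of a first-order process yields a1 = -0.5 and a2 = 0, with a residual energy of 0.75.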

LP analysis and Quant. (2)

• Quantization bottom line:
  • 46 bits/frame on most modes;
  • 36 bits/frame on the 6.60 kbit/s mode;
  • M.A. prediction with 1/3 gain;
  • Quantizer: S-MSVQ (split multistage VQ)
• Both quantized and unquantized coefficients are used in the algorithm.


Subframes

• Each 20 ms frame (256 samples) is divided into 4 sub-frames (64 samples each).
• Interpolated LPC coefficients are obtained for each sub-frame
• Interpolation is done in the ISP q-domain

Perceptual weighting

• Weighting filter is: $W(z) = A(z/\gamma_1)\,H_{\text{de-emph}}(z)$
• This helps solve the tilt problem, which is worse in wideband speech.


Excitation

• Searched for each 5 ms sub-frame.
• Two components:
  • Adaptive codebook (past excitation)
  • Algebraic codebook
• The “target” signal is obtained by filtering the LPC residual (for the sub-frame) through the synthesis LPC filter and the weighting filter.

Adaptive codebook

• Start with “open loop” pitch estimation
  • Based on cross correlation;
  • Low-value bias;
  • ‘Last value’ bias (actually 5-frame median), if voiced.
• Re-compute with “closed loop”, around the initial value ±7, and up to ¼ sample precision.
  • “Analysis by synthesis” based;
  • Restricted to values allowed by the encoding step.



Algebraic codebook

• Remove the contribution of the (unquantized) prediction from the adaptive codebook from the “target signal” to obtain a new target.
• Divide the sub-frame into 4 alternating tracks.

Algebraic codebook (2)

• Select best pulses, for a total of 24 (6), 18 (5-4), 16 (4), 12 (3), 10 (3-2), 8 (2), 4 (1), 2 (.5), depending on bitrate.
• Pulses + two filters:
  • Periodicity enhancement: $1/(1 - 0.85\,z^{-T})$;
  • Tilt: $1/(1 - \beta_1 z^{-1})$
• Tricks to save bits in encoding pulse positions;
• Tricks to save computation on pulse search.


Wrap up

• High pass, de-emphasis;
• Upsample back to 16 kHz;
• Add high frequency components.

High Freq. Components

• Random noise is used as excitation
• The LP filter is extended to 8 kHz.
• Energy of the excitation is based on the energy of the base-band residual and voicing info, except in the highest bitrate mode.
• Extension of the LPC filter is equivalent to mapping 5.1 to 5.6 kHz onto 6.4 to 7.0 kHz;
• Band-pass filtered to 6-7 kHz, and added to the output signal.


Video Codec

H.264/AVC Encoder


H.264/AVC Decoder

Reference Picture Management

• Reference pictures are stored in the decoded picture buffer (DPB)
• Short/long term reference pictures: a decoded frame may be marked as
  • unused for reference
  • short term picture
  • long term picture
• “Sliding window” memory management
  • Keep #(long_term_pic + short_term_pic)
  • Remove a short term picture if space is lacking
• Adaptive memory control
  • Issued by the encoder
  • Changes the type of the reference frame
• IDR (Instantaneous Decoder Refresh)
  • Clears the reference buffer
  • I frame


Slice Group

• Formerly called “FMO” (Flexible Macroblock Ordering)
• A subset of the macroblocks, which may contain one or more slices
• Error resilience

Inter Prediction

• Variable block size
• ¼ pixel motion compensation
• Interpolation


Motion Vector (MV) Prediction

• Efficiently encode correlated MVs
• Other than 16×8 and 8×16: MVp = median(MVA, MVB, MVC)
• 16×8: MVp of the upper = MVB; MVp of the lower = MVA
• 8×16: MVp of the left = MVA; MVp of the right = MVC
• For skipped macroblocks, do as in 16×16 Inter mode
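The prediction rules above can be sketched in a few lines. In H.264 the general-case predictor is the component-wise median of the three neighboring vectors. The neighbor naming (A = left, B = above, C = above-right), the `partition`/`index` arguments and the tuple representation are assumptions of this sketch, and neighbor-availability handling is left out.

```python
def median3(a, b, c):
    """Middle value of three numbers."""
    return sorted((a, b, c))[1]

def predict_mv(mv_a, mv_b, mv_c, partition="16x16", index=0):
    """Return the predicted motion vector MVp as an (x, y) tuple.

    index 0 is the upper (16x8) or left (8x16) partition, index 1 the other one.
    """
    if partition == "16x8":
        return mv_b if index == 0 else mv_a
    if partition == "8x16":
        return mv_a if index == 0 else mv_c
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))

def mv_difference(mv, mvp):
    """Only this difference needs to be entropy coded."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])
```

With neighbors (4, 0), (2, 2) and (8, -2) the predictor is (4, 0), so a true vector of (5, 0) costs only the small difference (1, 0).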

Intra Prediction

• For luma samples
  • 4×4 block: 9 prediction modes
  • 16×16 block: 4 modes
  • I_PCM: transmit the samples directly, without prediction & transform


Prediction Modes

[Figure panels: 4x4 Luma; Intra 16x16]

8x8 Chroma is similar to 16x16 luma intra

Signaling of Intra Prediction Modes

• Mode choices need to be signaled to the decoder, but compactly
• The prediction mode for luma coded in Intra-16×16 mode or chroma coded in Intra mode is signaled in the macroblock header
• Intra modes for neighboring 4×4 blocks are often correlated
  • If A and B are available, C = min(A, B)
  • else if neither A nor B is available, C = 2 (DC)
  • else C = available(A, B)
• Use the prev_intra4x4_pred_mode flag & the rem_intra4x4_pred_mode field to indicate the mode selected.

[Figure: current block C with neighboring blocks A and B]
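The most-probable-mode rule and the two syntax elements above can be sketched as follows. The three cases follow the slide as written; the use of `None` for an unavailable neighbor and the function names are assumptions for illustration.

```python
DC_MODE = 2  # mode number of DC prediction for 4x4 luma blocks

def most_probable_mode(mode_a, mode_b):
    """Predicted mode of the current block C from neighbors A and B (None = unavailable)."""
    if mode_a is not None and mode_b is not None:
        return min(mode_a, mode_b)
    if mode_a is None and mode_b is None:
        return DC_MODE
    return mode_a if mode_a is not None else mode_b

def signal_mode(selected, mode_a, mode_b):
    """Return (prev_intra4x4_pred_mode, rem_intra4x4_pred_mode).

    The flag is 1 when the selected mode equals the prediction; otherwise the
    remaining 8 modes are indexed 0..7, so 3 bits are enough.
    """
    mpm = most_probable_mode(mode_a, mode_b)
    if selected == mpm:
        return 1, None
    rem = selected if selected < mpm else selected - 1
    return 0, rem
```

When the neighbors predict the mode correctly, a single flag bit is sent; otherwise the cost is the flag plus a 3-bit remainder.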


Deblocking filter

• Filter 4 vertical/horizontal boundaries of luma
• Filter 2 vertical/horizontal boundaries of chroma
• Affects up to 3 samples on either side.
• The filter is stronger at places where significant blocking distortion is likely, e.g., at the boundary of an intra coded macroblock or a boundary between blocks that contain coded coefficients.

Transform and Quantisation

• 3 transforms
  • DCT-based transform for all 4×4 residual blocks
  • Hadamard transform for the 4×4 luma DC coefficients (in 16×16 intra)
  • Hadamard transform for the 2×2 chroma DC coefficients

$a = 1/2,\quad b = \sqrt{2/5},\quad d = 1/2$


Combine Quantization into Scaling of Transform

4x4 DC Intra Luma

• $|Z_D(i,j)| = \left(|Y_D(i,j)| \cdot MF(0,0) + 2f\right) \gg (qbits + 1)$
• $\mathrm{sign}(Z_D(i,j)) = \mathrm{sign}(Y_D(i,j))$
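The integer-only quantization above can be sketched as follows. The slide does not give MF, qbits or f; the values used here (MF(0,0) per QP mod 6, qbits = 15 + floor(QP/6), f = 2^qbits / 3 for intra blocks) are the commonly tabulated H.264 values and should be treated as assumptions to check against the standard. The function name is illustrative.

```python
# MF(0,0) multiplication factors indexed by QP % 6 (assumed from the usual H.264 tables)
MF_00 = [13107, 11916, 10082, 9362, 8192, 7282]

def quantize_luma_dc(y_d: int, qp: int) -> int:
    """Quantize one Hadamard-transformed luma DC coefficient Y_D using shifts only."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // 3          # rounding offset for intra blocks
    mf = MF_00[qp % 6]
    z = (abs(y_d) * mf + 2 * f) >> (qbits + 1)
    return -z if y_d < 0 else z    # the sign is carried over unchanged
```

The point of the formulation is that the division by the quantizer step is folded into one multiply and one right shift per coefficient.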

CAVLC: Context-Based Adaptive Variable Length Coding

• Characteristics:
  • Run-level coding to compact strings of zeros
  • Trailing ones (+1, -1 after 0)
  • The number of nonzero coefficients in neighboring blocks is correlated
  • The choice of VLC lookup table for the level parameter adapts to the level magnitude


CAVLC Encoding

• 1. Encode the number of coefficients and trailing ones (coeff_token)
  • TotalCoeffs: 0 ~ 16
  • TrailingOnes: 0 ~ 3
    • If there are more than 3 TrailingOnes, only the last three are treated as ‘special cases’
  • Four lookup tables
    • Three variable-length, one fixed-length
    • Choice depends on neighboring blocks
• 2. Encode the sign of each TrailingOne, in reverse order
• 3. Encode the levels of the remaining nonzero coefficients
  • level_prefix, level_suffix
• 4. Encode the total number of zeros before the last coefficient
  • Zero-runs at the start of the array need not be encoded
• 5. Encode each run of zeros
  • If there are fewer than 3 TrailingOnes, the first nonzero coefficient is adjusted
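The five steps above operate on a handful of symbols derived from the zig-zag ordered block. The sketch below only extracts those symbols and does not emit any VLC bits (the code tables are not reproduced); the function name and the dictionary keys are illustrative.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for one block of zig-zag ordered coefficients."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    positions = [i for i, _ in nonzero]
    reversed_vals = [c for _, c in reversed(nonzero)]   # highest frequency first

    # Step 1/2: up to three trailing +/-1 values, scanned in reverse order.
    trailing = []
    for c in reversed_vals:
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break

    # Step 3: remaining levels, still in reverse order.
    levels = reversed_vals[len(trailing):]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = (positions[-1] + 1 - len(positions)) if positions else 0

    # Step 5: run of zeros preceding each coefficient, in reverse order
    # (the run before the lowest-frequency coefficient is implied).
    runs = [positions[k] - positions[k - 1] - 1
            for k in range(len(positions) - 1, 0, -1)]

    return {
        "total_coeffs": len(nonzero),
        "trailing_ones": len(trailing),
        "trailing_signs": trailing,
        "levels": levels,
        "total_zeros": total_zeros,
        "run_before": runs,
    }
```

For the block 0, 3, 0, 1, -1, -1, 0, 1, 0, … this gives 5 coefficients, 3 trailing ones with signs +, -, -, remaining levels 1 and 3, 3 total zeros, and runs 1, 0, 0, 1.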

Acoustic Echo Cancellation


Acoustic Echo Cancellation

[Figure: acoustic echo cancellation block between "From Audio Decoder" and "To Audio Encoder"]

Acoustic Echo Cancellation Module


Adaptive Transversal Filter

• FIR filter: inherently stable
• The length of the filter affects performance: convergence, goodness of fit, and complexity.
• The filter introduces errors since it is trying to model an IIR response.
• Short filters
  • 128-256 coefficients (taps)
  • Faster convergence, but the final solution has more residual error
  • Less complex: O(N)
• Long filters
  • 512-1024 taps
  • Slower convergence, but the final solution has less error
  • More complex, as the algorithm can be O(N²)
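An adaptive FIR (transversal) echo canceller can be sketched with the normalized LMS update. The slides do not name the adaptation algorithm, so NLMS is an assumption chosen here because it is the simplest O(N)-per-sample option; the step size, tap count and the synthetic echo path are illustrative.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive FIR estimate of the echo from the microphone signal.

    Returns (residual, weights). Cost is O(taps) per sample.
    """
    w = [0.0] * taps            # adaptive filter coefficients
    x = [0.0] * taps            # recent far-end samples, newest first
    residual = []
    for s, d in zip(far_end, mic):
        x = [s] + x[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = d - echo_estimate                       # what is sent to the encoder
        gain = mu * e / (eps + sum(xi * xi for xi in x))
        w = [wi + gain * xi for wi, xi in zip(w, x)]
        residual.append(e)
    return residual, w

# Synthetic check: the "room" returns the far-end signal at half amplitude, 2 samples late.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.5 * s for s in far[:-2]]
residual, weights = nlms_echo_canceller(far, mic)
```

After convergence the third tap approaches 0.5 and the residual echo vanishes. Real rooms need hundreds of taps, and near-end speech or noise disturbs the adaptation.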

Challenges

• Dynamic range of the human ear = 120 dB.
  • Even quiet echoes can be heard.
• Longer delays from satellite (300-500 ms), VoIP
  • The ear is more sensitive to longer delays.
  • More difficult to find the beginning of the echo.
  • Long filters (~1000 taps) are needed (complexity & convergence)
• Near-end noise corrupts the echo, decreasing the canceller’s ability to converge.
• Acoustic echo paths can change rapidly
  • More difficult for the AEC to remain converged.
• Nonlinear echo components
  • Speakers driven beyond their linear region.


Network Component

IP-based VoIP / Video Conference


Internet Primer

Internet: Grand View


Impact on ISPs

[Figure: ISP graph with sibling, peering and transit links, and peering/sibling entity boundaries]

• Economics of ISP relationships
  • Sibling relationship
    • Several ISPs belong to the same organization
  • Peering relationship
    • Mutually beneficial free agreement (to a certain extent)
  • Transit relationship
    • One ISP pays another

Inside ISP


ISP POP (Point of Presence)

Home Networking


Network Characteristics

Under-provisioned Links

[Figure labels: Branch, Branch]


Growth Trends

Packet Loss vs. Jitter (vs. Delay?)


The Usual Suspects

Packet Bursts


What kind of Enterprise User?

How QoS can help


QoS helps inside and between branches!

Observation

• IP-based communication in the enterprise is growing
• Empirical results show poor calls for wireless and VPN users
• QoS (DiffServ) is both used and useful!


Available Bandwidth Estimation

What is Available Bandwidth (ABW)?

• ABW is the left-over capacity along an Internet path


Why Is It Useful?

• Maximizing QoE (Quality of Experience) in A/V conferencing
  • Audio prefers minimum delay (high priority)
  • Video prefers maximum rate (low priority)
• One solution: measure ABW, then encode and send video at the ABW rate

One Way Delay (OWD) = propagation delay (constant) + queuing delay (variable)

Typical Targeting Scenario

• First hop is the bottleneck
  • Cable modem, DSL, high-speed link…
• Timescale for the ABW estimation: 2-4 seconds


Why Is Measuring ABW Hard?

• Available bandwidth changes over time → ABW measurements must be quick
• Audio packets (along the same path) should experience minimum delay → measurement must be non-intrusive

Two Models

• Probe Rate Model (PRM) based solutions
  • Pathload, TOPP, Pathchirp, Bfind, PTR …
• Probe Gap Model (PGM) based solutions
  • Spruce, Delphi, IGI, Moseab …


Pathload (PRM) [Jain & Dovrolis]

• Send probe trains at various rates
• ABW is the probe rate at the transition where OWD starts increasing (queuing delay is observed)
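The probe-rate idea above amounts to a search for the rate at which one-way delays start to trend upward. The sketch below is a simplified illustration, not the Pathload implementation: the trend test, its 0.66 threshold, the binary search and the `send_train` callback (which would send one probe train and return the measured OWDs) are all assumptions.

```python
def owd_is_increasing(owds, threshold=0.66):
    """Pairwise comparison test: fraction of consecutive OWD samples that go up."""
    ups = sum(1 for a, b in zip(owds, owds[1:]) if b > a)
    return ups / (len(owds) - 1) > threshold

def search_abw(send_train, lo_bps, hi_bps, resolution_bps):
    """Binary-search the probe rate at which OWDs begin to increase."""
    while hi_bps - lo_bps > resolution_bps:
        rate = (lo_bps + hi_bps) / 2
        if owd_is_increasing(send_train(rate)):
            hi_bps = rate      # probing above the available bandwidth
        else:
            lo_bps = rate      # path absorbed the train without queuing
    return (lo_bps + hi_bps) / 2

def fake_train(rate_bps, true_abw_bps=3e6):
    """Stand-in for a real measurement: OWD ramps up only above the true ABW."""
    ramp = 0.001 if rate_bps > true_abw_bps else 0.0
    return [0.010 + i * ramp for i in range(20)]

estimate = search_abw(fake_train, 0.0, 10e6, 1e5)
```

Each iteration costs one probe train, which is the "slow estimation: iterative probes" drawback of PRM methods.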

Spruce (PGM) [Strauss et al.]

• Send probe pairs/trains at rate Ri (Ri > A), and measure the sending gaps and receiving gaps
• Compute A directly
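The direct computation can be sketched with the gap formula usually quoted for Spruce, A = C · (1 − (Δout − Δin) / Δin), where C is the bottleneck capacity (assumed known) and Δin is the time the bottleneck needs to transmit one probe packet. The formula is not spelled out on the slide, so treat it and the helper names as assumptions.

```python
def spruce_sample(capacity_bps, gap_in_s, gap_out_s):
    """One available-bandwidth sample from a single probe pair."""
    return capacity_bps * (1.0 - (gap_out_s - gap_in_s) / gap_in_s)

def spruce_estimate(capacity_bps, gap_pairs):
    """Average several (gap_in, gap_out) samples and clamp to [0, capacity]."""
    samples = [spruce_sample(capacity_bps, gi, go) for gi, go in gap_pairs]
    avg = sum(samples) / len(samples)
    return max(0.0, min(capacity_bps, avg))

# 1500-byte probes on a 1 Mbit/s bottleneck are sent 12 ms apart.
abw = spruce_estimate(1e6, [(0.012, 0.015), (0.012, 0.015)])
```

A receiving gap stretched from 12 ms to 15 ms means cross traffic used a quarter of the link, leaving about 750 kbit/s available. The estimate relies on the single-bottleneck and known-capacity assumptions, which are hard to verify in practice.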


Advantages/Disadvantages of the Approaches

Approach | Advantages | Disadvantages
PGM based approaches | Fast estimation: estimation can be done in a single probe. | Assumptions are not easy to verify in practice
PRM based approaches | No assumption | Slow estimation: iterative probes

Forward Error Correction


Block Based Erasure Resilient Coding

[Figure: original data of k messages (1, 2, 3, …, k) is encoded by ERC into n blocks (1, 2, 3, …, k, k+1, …, n); at a certain instance some blocks, marked X, are lost]

Some of the blocks may be lost in delivery. However, as long as at least k blocks are delivered, the original data can be reconstructed.

ERC in VoIP and Video Conferencing

• VoIP
  • Mainly packet replication, due to small VoIP packet size & low delay requirement
• Video Conferencing
  • Packet loss protection (for I frames or P frames in HD)
  • Each frame is separated into k messages and protected by n-k messages. As long as there are no more than n-k losses, the transmission succeeds


ERC Terms

• Number of original blocks: k
• Number of coded blocks: n
• Rate of ERC: k/n
• MDS: Maximum Distance Separable
  • Any k of the n coded blocks can recover the original
  • The theoretical optimal performance

Erasure Encoding: Mathematics

Original data: x1, x2, …, xk

Coded data: y1, y2, …, yn

xi, yj: vectors on a Galois field.


Example: ERC of 10MB

Original data (10MB): x1, x2, …, xk

Coded data (n=30): y1, y2, …, yn

k=10, GF(2^8), each vector is 1MB.

[Figure: 30 × 10 generator matrix applied to vectors of length 1M]

Erasure Decoding: Mathematics

Original data: x1, x2, …, xk

Coded data: y1, y2, …, yn

[Figure labels: Code select; Available]


Erasure Decoding: Mathematics

Original data: x1, x2, …, xk

Coded data: y1, y2, …, yn

Original data can be recovered if the sub-generator matrix has full rank k.
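The full-rank condition can be made concrete with a small MDS code. This sketch swaps the slides' GF(2^8) for the prime field GF(257) so that ordinary modular arithmetic suffices, and uses a Vandermonde generator matrix, whose every k-row sub-matrix is invertible. The function names, the field choice and the one-symbol-per-block layout are assumptions for illustration only.

```python
P = 257  # prime modulus; stands in for the GF(2^8) arithmetic used on the slides

def encode(data, n):
    """Encode k symbols into n coded symbols; row i of the generator is [i^0, i^1, ...]."""
    return [sum(x * pow(i, j, P) for j, x in enumerate(data)) % P
            for i in range(1, n + 1)]

def decode(received, k):
    """Recover the k original symbols from any k received ones.

    received maps the 1-based block index to its coded symbol.
    """
    idx = sorted(received)[:k]
    # Sub-generator matrix of the surviving rows, augmented with the received symbols.
    rows = [[pow(i, j, P) for j in range(k)] + [received[i]] for i in idx]
    for col in range(k):                       # Gauss-Jordan elimination mod P
        piv = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][col], P - 2, P)    # modular inverse via Fermat
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[r][k] for r in range(k)]

data = [10, 200, 33]                 # k = 3 original symbols
coded = encode(data, 6)              # n = 6, rate 1/2
survivors = {2: coded[1], 5: coded[4], 6: coded[5]}   # blocks 1, 3 and 4 were lost
recovered = decode(survivors, 3)
```

Any three of the six blocks recover the data, which is the MDS property. A production code would work in GF(2^8) so that every symbol fits in a byte, and would apply the same matrix to long vectors of symbols.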

Systematic vs Non-Systematic ERC

• Systematic ERC
  • Slightly lower encoding & decoding complexity
  • Even if the data can’t be recovered, some of the original messages can still be used

[Figure: original data of k messages (1, 2, 3, …, k); non-systematic ERC output (1, 2, 3, …, k, k+1, …, n); systematic ERC output (1, 2, 3, …, k, k+1, …, n)]


Reed-Solomon

• Has been around for decades
• Has a systematic form
• Cauchy Reed-Solomon Code

Reed-Solomon Decoding

[Figure labels: Receive; Inverse]


Dejitter Buffer

Variable Delay & Dejitter Buffer

• Queuing delay
• Dejitter buffers
• Variable packet sizes

[Figure: network path with queuing delay at three hops, followed by the dejitter buffer]


Fixed Dejitter Buffer – Budget For Worst Case

• Total End-to-End Delay
  • Codec delay: 40 ms
  • Propagation delay: 8 ms
  • Dejitter buffer: 50 ms
    • To accommodate queuing delay: 0-50 ms
  • Total delay: 98 ms

[Figure: Site A to Site B over 128 kbps bandwidth; coder delay 40 ms, queuing delay 4-50 ms, propagation delay 8 ms, dejitter buffer 50 ms]

Dejitter Buffer Size & Late Loss

[Figure: playout under jitter; labels: delay, packet loss, late loss, buffering delay]

Fixed playout deadline and jitter absorption:

• The playout rate is constant
• The tradeoff is between dejitter buffer size and late loss
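The tradeoff between buffer size and late loss can be made concrete with a few lines. The delay components follow the worst-case budget quoted for the fixed dejitter buffer (40 ms codec, 8 ms propagation, 50 ms buffer); the function names and the sample queuing delays are illustrative assumptions.

```python
def total_delay_ms(codec_ms, propagation_ms, buffer_ms):
    """Mouth-to-ear delay with a fixed dejitter buffer."""
    return codec_ms + propagation_ms + buffer_ms

def late_loss_rate(queuing_delays_ms, buffer_ms):
    """Fraction of packets whose queuing delay exceeds the buffer, so they miss playout."""
    late = sum(1 for q in queuing_delays_ms if q > buffer_ms)
    return late / len(queuing_delays_ms)

observed = [0, 10, 20, 60, 50]   # made-up per-packet queuing delays in ms
for size in (20, 50, 80):
    print(size, total_delay_ms(40, 8, size), late_loss_rate(observed, size))
```

Growing the buffer from 20 ms to 80 ms removes all late loss for these samples but raises the end-to-end delay from 68 ms to 128 ms.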


Adaptive Playout and Dejitter Buffer Adaptation

Adaptive playout and jitter adaptation

• Scaling of voice/video packets in a highly dynamic way
• Playout schedule is set according to the past delays recorded
  • Usually the dejitter buffer size expands quickly on late packet arrival, and shrinks slowly when jitter reduces
• Improved tradeoff between buffering delay and late loss
• Playout rate is not constant

[Figure: playout under jitter; labels: delay, packet loss, buffering delay]
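The "expand quickly, shrink slowly" behaviour can be sketched as an asymmetric tracker of the observed delay. The update rule, the safety margin and the decay factor below are illustrative assumptions, not the algorithm used in any particular product.

```python
def adapt_buffer(delays_ms, initial_ms=20.0, shrink=0.02, margin_ms=5.0):
    """Return the dejitter buffer target after each packet.

    A packet that would arrive late raises the target at once; otherwise the
    target decays by a small fraction toward the current delay plus margin.
    """
    target = initial_ms
    history = []
    for d in delays_ms:
        wanted = d + margin_ms
        if wanted > target:
            target = wanted                         # expand immediately
        else:
            target -= shrink * (target - wanted)    # shrink slowly
        history.append(target)
    return history

history = adapt_buffer([10, 80, 10])
```

A single 80 ms spike lifts the target to 85 ms in one step, while the following on-time packet lowers it only to 83.6 ms. The playout module then time-scales the audio to follow the moving target.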

Adaptive Play Out

• Packets are pushed into the Adaptive Playout module
• The renderer requests a new waveform segment for playout
• The Playout module passes the packet to the audio decoder

[Figure: Audio Adaptive Playout]


Packet Loss Concealment

Audio Packet Loss Concealment

[Figure: packets i-2, i-1, i+1, i+2 on a time axis with packet i lost; labels: L, ΔL, 2 L, 1.3 L; alignment found by correlation]

• Depends on voiced & unvoiced segments


Voiced segments

Unvoiced segments


Concealment as (bi-directional) stretching

Video Packet Loss Concealment

• Spatial Concealment
  • Uses spatial correlation
  • E.g., bilinear interpolation
  • Projection onto convex sets
• Temporal Concealment
  • Uses the correlation that exists between consecutive frames
  • Temporal replacement
  • Boundary matching
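Spatial concealment by bilinear interpolation can be sketched as follows: each pixel of a lost block is rebuilt from the four correctly received border pixels in its row and column, weighted by distance. The frame layout (a list of rows), the function name and the equal weighting of the horizontal and vertical estimates are assumptions of this sketch.

```python
def conceal_block(frame, top, left, size):
    """Fill a lost size x size block in place from the pixels just outside its borders."""
    for r in range(size):
        for c in range(size):
            y, x = top + r, left + c
            up, down = frame[top - 1][x], frame[top + size][x]
            lf, rt = frame[y][left - 1], frame[y][left + size]
            d_up, d_down = r + 1, size - r          # distances to the border pixels
            d_lf, d_rt = c + 1, size - c
            vert = (up * d_down + down * d_up) / (d_up + d_down)
            horz = (lf * d_rt + rt * d_lf) / (d_lf + d_rt)
            frame[y][x] = (vert + horz) / 2

# Synthetic check: a horizontal ramp with a 4x4 hole is restored exactly.
frame = [[float(x) for x in range(8)] for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        frame[y][x] = 0.0
conceal_block(frame, top=2, left=2, size=4)
```

Smooth regions are restored well, while edges and texture inside the lost block are blurred, which is why temporal methods are usually preferred when a good reference frame is available.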


Spatial-Temporal Concealment

Summary


Summary

• VoIP/Video Conference Systems
  • Infrastructure based
  • P2P based
• Audio/Video Components
  • Audio codec
  • Video codec
  • Acoustic echo cancellation
• Network components
  • Primer of the Internet
  • Network characteristics
  • Available bandwidth estimation
  • Forward error correction (FEC)
  • Dejitter buffer
  • Packet loss concealment