microsoft powerpoint - ccnc10_voip
TRANSCRIPT
CCNC 2010 Tutorial: Towards Glitch Free VoIP and Video Conferencing 1/12/2010
Jin Li, Microsoft Research 1
TOWARDS GLITCH-FREE VOIP AND VIDEO CONFERENCING
JIN LI
MICROSOFT RESEARCH
Outline
• Introduction
• Anatomy of VoIP and Video Conferencing Systems
• Audio/Video Components
• Network Components
• Summary
Introduction
Booming of IP Based Communication
• Advanced voice over IP (VoIP)
• Web-, audio-, video-conferencing
• Tele-presence
• Instant messaging
• Calendar and other PIM functions
• Email, fax and voice mail
Worldwide VoIP subscribers
• Worldwide VoIP service revenue was $24.1B in 2007, up 52% over 2006.
• Worldwide VoIP service revenue is expected to more than double over the next 4 years, to $61.3B in 2011, an annual growth rate of 26%.
Source: Infonetics Research Inc., 2008
US Broadband Telephony Forecast, 2007-2013
The VoIP subscriber base is predicted to double from 2007 to 2013.
Source: Jupiter Research, US Broadband Telephony Forecast, 2008 to 2013
VoIP Trend
• IP networks are the next-generation networks for all forms of communication.
• Broadband penetration is a key driver of VoIP expansion
  • Worldwide DSL subscriptions stood at 205.9M at the end of 2007, up 23% from 2006, and are predicted to increase to 363.6M in 2011.
  • Cable subscriptions were up 15% annually to 68M at the end of 2007, and are predicted to climb to 97.3M in 2011.
  • Passive Optical Network (PON) subscribers were at 10.9M in 2007
  • Ethernet FTTH subscribers were at 1.7M in 2007
• 2004/2005 were breakthrough years for VoIP adoption
High End Systems – Tele-Presence
• Cisco Telepresence: $299K
• Tandberg Experia: $225K
• HP Halo: $425K + $18K/mo
• Polycom RPX210M: $269K + $18.5K/mo
Worldwide Tele-presence Forecast (2006-2012)
[Figure: number of end points and revenue forecast]
Source: 2008 IDC Research
Desktop Video Conferencing
• Multiple solutions, often acting as add-ons to VoIP
• Benefits
  • See faces of people you may not have met before
  • See facial expressions & gestures
  • Easier to follow a conversation
  • More interactive than phone
  • Get the general mood or ambience
  • See and show documents/objects
• Drawbacks
  • Difficult to set up and plan
  • Network reliability
    • Without (or with poor) video, people talk; without (or with poor) audio, people walk.
  • Interpersonal factors
Anatomy of VoIP and Video Conferencing Systems
Infrastructure vs. P2P
• Infrastructure based
  • Microsoft Unified Communication
  • Cisco
  • Gtalk
• P2P based
  • Skype
Infrastructure Based VoIP: Microsoft Unified Communication
Unified Communication: Architecture
Unified Communication: P2P Call
Key Steps
• Alice calls Bob
• Find Bob’s registered SIP endpoints
Unified Communication: To VoiceMail
Key Steps
• Alice calls Bob
• Find Bob’s registered SIP endpoints
• Bob doesn’t answer within a certain period, so the call is re-routed to voicemail
• The voicemail system plays a greeting, records Alice’s message, sends the message to Bob’s email, and uses a speech server to transcribe it
Unified Communication: PSTN to UC
Key Steps
• PSTN user Alice calls Bob
• The IP-PSTN gateway terminates the call
• The MS/Gateway routes the call to the mediation server, which performs transcoding, ICE, etc.
• Through the director, the proper UC client is found
P2P VoIP: Skype
• Information
  • Debut: 08/2003, by N. Zennstrom and J. Friis, who founded KaZaA
  • A P2P overlay network for VoIP and other applications
  • Free intra-network VoIP and fee-based SkypeOut/SkypeIn
Skype Usage (Apr. 2008)
• 11 million concurrent Skype users online at peak time (180,000+ simultaneous calls)
• 309 million registered users worldwide, the largest registered user base within the eBay portfolio (33 million users added in Q1FY08)
• $126M revenue in Q1FY08 (61% YOY growth, 5.6 billion SkypeOut minutes in FY2007)
• 100 billion cumulative Skype-to-Skype minutes
Skype Share of International VoIP Traffic
Skype Gadget
• Netgear Skype Wi-Fi Phone
• Motorola CN620 WiFi Cellphone
• IPEVO Free-1 USB Skype Phone
• USB Mouse with Phone
• IPDRUM mobile Skype Cable
50 hardware partners, 150+ Skype-certified devices.
Skype vs. VoIP
• Public VoIP standards
  • H.323, SIP
• Skype is a proprietary VoIP solution
  • Relies on a P2P network for the user directory
  • Scalable without costly infrastructure
  • Routes calls through supernodes in Skype
  • Universal firewall/NAT traversal
  • Encrypted traffic (but you have to trust eBay/Skype)
Skype Ingredient (1)
Skype Server: authentication
• User retrieves ID from a Skype server
Skype Network
Supernode Overlay:
• Any computer with sufficient CPU, memory and network bandwidth that is not behind a firewall
• Used for the distributed directory service
• Relays traffic for computers behind NAT/firewall
NAT Traversal (Skype)
• NAT/Firewall detection
  • Try UDP connection
  • Try TCP connection (arbitrary port, 80 (HTTP), 443 (HTTPS))
• Traversal
  • Direct connection if a) both clients have no NAT, or b) one client has no NAT and the other is behind a cone NAT
  • Relay by supernode otherwise
• Since Skype doesn’t need to pay for relay cost
  • High bitrate wideband voice codec (>24 kbps)
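The traversal rule on this slide can be written as a small decision function. This is only a sketch of the rule as stated here, not Skype's actual (proprietary) logic; the `NatType` enum and the `choose_path` name are hypothetical.

```python
from enum import Enum


class NatType(Enum):
    NONE = "none"    # publicly reachable, no NAT
    CONE = "cone"    # behind a cone NAT
    OTHER = "other"  # anything else (assumed: symmetric NAT, strict firewall)


def choose_path(a: NatType, b: NatType) -> str:
    """Return 'direct' or 'relay' following the rule on the slide."""
    # Case a) both clients have no NAT.
    if a is NatType.NONE and b is NatType.NONE:
        return "direct"
    # Case b) one client has no NAT and the other is behind a cone NAT.
    if {a, b} == {NatType.NONE, NatType.CONE}:
        return "direct"
    # Everything else is relayed by a supernode.
    return "relay"
```

For example, `choose_path(NatType.CONE, NatType.CONE)` falls through to the supernode relay.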
Skype: Call Routing Through Supernode
[Figure: Skype server (authentication) and supernode overlay]
• Route call through supernodes
• High bitrate wideband voice codec (>24 kbps)
Skype Encryption
• 256-bit AES over 128-bit data blocks
• 1536/2048-bit RSA for key negotiation (2048/2048 for paid service)
[Figure: Peer 1 and Peer 2]
Skype: Complete Black Box (Security by Obfuscation)
• Almost everything is obfuscated
• Many protections, anti-debugging tricks, ciphered code
  • Avoid static disassembly: XOR the binary with a hard-coded key, erase the beginning of the code, own packer
  • Code integrity check: use checksums to detect breakpoints
  • Anti-debugging techniques: anti-SoftICE, integrity checks
  • Code obfuscation
  • Network obfuscation
Audio/Video Component
• Audio Codec
• Video Codec
• Acoustic Echo Cancellation
Audio Codec
G.711 (PCM)
• Still widely used today: PSTN interface
• If uniform quantization
  • 12 bits * 8 k/sec = 96 kbps
• Non-uniform quantization
  • 8 bits * 8 k/sec = 64 kbps (DS0 rate)
  • North America: µ-law (µ = 255)
  • Other countries: A-law (A = 87.6)
  • MOS of about 4.3
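As a rough illustration of non-uniform quantization, the sketch below implements the continuous µ-law companding curve with µ = 255. It is an assumption-laden simplification: G.711 itself uses a piecewise-linear (segmented) approximation of this curve, and the function names here are hypothetical.

```python
import math

MU = 255.0  # µ-law parameter from the slide


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] to the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def bit_rate(bits_per_sample: int, sample_rate_hz: int = 8000) -> int:
    """Bit rate in bit/s for a given sample width at 8 kHz sampling."""
    return bits_per_sample * sample_rate_hz
```

Small amplitudes get a larger share of the output range, which is why 8 companded bits can stand in for roughly 12 uniform bits.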
G.722.1: Siren
• Audio bandwidth: 14 kHz
• Sample rate: 32 kHz
• Bit rate: 24, 32, and 48 kbit/s
• Algorithm: Transform coding (Siren14™)
• Frame size: 20 ms
• Algorithmic delay: 40 ms
• Complexity: <11 WMOPS (enc/dec)
• Available on royalty-free licensing terms (from Polycom)
Siren Encoder
Siren Decoder
Siren Codec
• Audio sampled at 32 kHz
• Operates on frames of 20 ms, corresponding to 640 samples
• Based on transform coding, using a Modulated Lapped Transform (MLT)
• A look-ahead of 20 ms due to 50% overlap between frames
• Total algorithmic delay of 40 ms
MLT - Modulated Lapped Transforms
[Figure: spatial response and frequency-domain response]
Categorization & SQVH
Expected # of Bits For Each Category
Quantization Used by SQVH
Vector Property Used in SQVH
AMR-WB Basics
• “Wideband coding of speech at around 16 kbit/s using adaptive multi-rate wideband (AMR-WB)”
• Adopted as ITU-T G.722.2, and also as 3GPP spec TS 26.190.
• “Foreseen applications are: VoIP and internet applications, Mobile Com., PSTN app, ISDN wideband telephony, ISDN videophone and videoconf.”
• Sampling rate: 16 kHz
• Bitrate: 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, and 23.85 kbit/s
• 20 ms frame
• ACELP (algebraic code excited LPC)
Pre-processing
• Sampling rate conversion: 16 kHz to 12.8 kHz (a 20 ms frame now has 256 samples)
• High-pass filter (cut-off at 50 Hz)
• Pre-emphasis filter: $1 - 0.68\,z^{-1}$
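The pre-emphasis step is a one-tap FIR filter, y[n] = x[n] - 0.68 x[n-1]. A minimal sketch follows, assuming floating-point samples and a hypothetical `prev` argument to carry filter state across frames (a real codec works in fixed point).

```python
def samples_per_frame(sample_rate_hz: int = 12800, frame_ms: int = 20) -> int:
    """Number of samples in one frame at the internal sampling rate."""
    return sample_rate_hz * frame_ms // 1000


def pre_emphasis(x, mu: float = 0.68, prev: float = 0.0):
    """Apply y[n] = x[n] - mu * x[n-1]; `prev` is the last sample of the previous frame."""
    out = []
    for sample in x:
        out.append(sample - mu * prev)
        prev = sample
    return out
```

A constant input is attenuated to 32% of its level after the first sample, which shows the high-pass character of the filter.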
LP analysis and Quant.
• One 30 ms asymmetric window
  • 5 ms look-ahead
• Obtain LPC coefficients:
  • Compute autocorrelation;
  • Multiply by lag window (adds 60 Hz bandwidth expansion);
  • R(0) = 1.0001 R(0) (adds a noise floor at -40 dB);
  • Levinson-Durbin recursion to compute LP coefficients.
• Convert LP coefficients to ISPs (immittance spectral pairs)
• Quantize in the ISP q-domain.
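The LP analysis steps above (autocorrelation, lag windowing, white-noise correction, Levinson-Durbin) can be sketched in floating point as follows. The Gaussian lag window shape and the convention A(z) = 1 + a1 z^-1 + ... are assumptions of this sketch; the analysis window, fixed-point details, and the ISP conversion are omitted.

```python
import math


def autocorrelation(x, order):
    """r[lag] = sum_n x[n] * x[n - lag] for lag = 0..order."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(order + 1)]


def condition(r, f0=60.0, fs=12800.0):
    """60 Hz bandwidth expansion (assumed Gaussian lag window) plus R(0) *= 1.0001."""
    out = [r[0] * 1.0001]
    for i in range(1, len(r)):
        out.append(r[i] * math.exp(-0.5 * (2 * math.pi * f0 * i / fs) ** 2))
    return out


def levinson_durbin(r, order):
    """Return (a, err) with a[0] = 1 so that A(z) = sum_j a[j] z^-j."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a, err = new_a, err * (1.0 - k * k)
    return a, err
```

For an autocorrelation sequence of the form 0.5^lag, the recursion recovers the first-order predictor A(z) = 1 - 0.5 z^-1 with all higher coefficients zero.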
LP analysis and Quant. (2)
• Quantization bottom line:
  • 46 bits/frame on most modes;
  • 36 bits/frame on the 6.60 kbit/s mode;
  • Moving-average (MA) prediction with 1/3 gain;
  • Quantizer: S-MSVQ (split multistage VQ)
• Both quantized and unquantized coefficients are used in the algorithm.
Subframes
• Each 20 ms frame (256 samples) is divided into 4 sub-frames (64 samples each).
• Interpolated LPC coefficients are obtained for each sub-frame
• Interpolation is done in the ISP q-domain
Perceptual weighting
• Weighting filter is:
  $W(z) = A(z/\gamma_1)\,H_{\text{de-emph}}(z)$
• This helps solve the tilt problem, which is worse in wideband speech.
Excitation
• Searched for each 5 ms sub-frame.
• Two components:
  • Adaptive codebook (past excitation)
  • Algebraic codebook
• The “target” signal is obtained by filtering the LPC residual (for the sub-frame) through the synthesis LPC filter and the weighting filter.
Adaptive codebook
• Start with “open loop” pitch estimation
  • Based on cross correlation;
  • Low-value bias;
  • “Last value” bias (actually a 5-frame median), if voiced.
• Re-compute with “closed loop”, around the initial value ±7, and up to ¼ sample precision.
  • “Analysis by synthesis” based;
  • Restricted to values allowed by the encoding step.
Algebraic codebook
• Remove the contribution of the (unquantized) adaptive codebook prediction from the “target signal” to obtain the new target.
• Divide the sub-frame into 4 alternating tracks.
Algebraic codebook (2)
• Select best pulses, for a total of 24 (6), 18 (5-4), 16 (4), 12 (3), 10 (3-2), 8 (2), 4 (1), 2 (0.5), depending on bitrate.
• Pulses + two filters:
  • Periodicity enhancement: $1/(1 - 0.85\,z^{-T})$;
  • Tilt: $1/(1 - \beta_1 z^{-1})$
• Tricks to save bits in encoding pulse positions;
• Tricks to save computation in the pulse search.
Wrap up
• High pass, de-emphasis;
• Upsample back to 16 kHz;
• Add high frequency components.
High Freq. Components
• Random noise is used as excitation
• The LP filter is extended to 8 kHz.
• Energy of the excitation is based on the energy of the base-band residual and on voicing information, except in the highest bitrate mode.
• Extension of the LPC filter is equivalent to mapping 5.1-5.6 kHz onto 6.4-7.0 kHz;
• Band-pass filtered to 6-7 kHz, and added to the output signal.
Video Codec
H.264/AVC Encoder
H.264/AVC Decoder
Reference Picture Management
• Reference pictures are stored in the decoded picture buffer (DPB)
• Short/long term reference pictures: a decoded frame may be marked as
  • unused for reference
  • short term picture
  • long term picture
• “Sliding window” memory management
  • Keep #(long_term_pic + short_term_pic)
  • Remove a short term picture if space runs out
• Adaptive memory control
  • issued by encoder
  • change the type of the ref frame
• IDR (Instantaneous Decoder Refresh)
  • clear ref buffer
  • I frame
Slice Group
• Formerly called “FMO” (Flexible Macroblock Ordering)
• A slice group is a subset of the macroblocks and may contain one or more slices
• Error resilience
Inter Prediction
• Variable block size
• ¼ pixel motion compensation
• Interpolation
Motion Vector (MV) Prediction
• Efficiently encode correlated MVs
• Other than 16×8 and 8×16: MVp = median(MVA, MVB, MVC)
• 16×8: MVp of the upper partition = MVB; MVp of the lower partition = MVA
• 8×16: MVp of the left partition = MVA; MVp of the right partition = MVC
• For skipped macroblocks, predict as in 16×16 Inter mode
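A small sketch of these prediction rules, assuming the three neighbouring vectors MVA, MVB and MVC are all available and use the same reference picture. H.264 takes the median component by component in the general case; the standard's availability and reference-index checks are left out, and the `predict_mv` signature is hypothetical.

```python
def median3(a, b, c):
    """Median of three scalars."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c, partition="other", index=0):
    """Predict a motion vector (x, y) from neighbours A (left), B (above), C (above-right).

    partition: "16x8", "8x16", or anything else for the median rule.
    index: 0 for the upper/left partition, 1 for the lower/right partition.
    """
    if partition == "16x8":
        return mv_b if index == 0 else mv_a
    if partition == "8x16":
        return mv_a if index == 0 else mv_c
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))
```

Only the difference between the actual vector and this prediction is then entropy coded.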
Intra Prediction
• For luma samples
  • 4×4 block: 9 prediction modes
  • 16×16 block: 4 modes
• I_PCM: transmit the sample values directly, without prediction or transform
Prediction Modes
4x4 Luma
Intra 16x16
8x8 Chroma is similar to 16x16 luma intra
Signaling of Intra Prediction Modes
� Mode choices need to be signaled to the decoder, but compactly
� The prediction mode for luma coded in Intra-16×16 mode or chroma coded in Intra mode is signaled in the macroblock header
� Intra modes for neighboring 4 × 4 blocks are often correlated
• If A and B are available, C = min(A, B)
• else if (neither A nor B is available) C = 2 (DC)
• else C = available(A, B)
• Use the prev_intra4x4_pred_mode flag and rem_intra4x4_pred_mode to indicate the selected mode.
[Figure: current block C and its neighbouring blocks A and B]
Deblocking filter
• Filter 4 vertical/horizontal boundaries of luma
• Filter 2 vertical/horizontal boundaries of chroma
• Affects up to 3 samples on either side of a boundary.
• The filter is stronger where significant blocking distortion is likely, e.g. at the boundary of an intra coded macroblock or at a boundary between blocks that contain coded coefficients.
Transform and Quantisation
• 3 transforms
  • DCT-based transform for all 4×4 residual blocks
  • Hadamard transform for the 4×4 array of luma DC coefficients (in 16×16 intra mode)
  • Hadamard transform for the 2×2 array of chroma DC coefficients
$a = 1/2$, $b = \sqrt{2/5}$, $d = 1/2$
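For illustration, the integer core of the 4×4 DCT-based transform can be sketched as W = Cf · X · Cf^T, with the scaling factors built from a and b left to the quantization stage. The matrix below is the commonly documented H.264 forward core matrix; the helper names are hypothetical and the code favours clarity over speed.

```python
# Forward core transform matrix of the H.264 4x4 integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Return W = CF * block * CF^T for a 4x4 residual block (no scaling applied)."""
    return matmul(matmul(CF, block), transpose(CF))
```

A flat block of ones produces a single DC coefficient of 16 and zeros elsewhere, using only additions, subtractions and shifts in a real implementation.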
Combine Quantization into Scaling of Transform
4x4 DC Intra Luma
• $|Z_D(i,j)| = \left(|Y_D(i,j)| \cdot MF(0,0) + 2f\right) \gg (qbits + 1)$
• $\operatorname{sign}(Z_D(i,j)) = \operatorname{sign}(Y_D(i,j))$
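The DC quantization rule can be tried out with the sketch below. The slide does not give the constants, so the following are assumptions taken from common descriptions of the H.264 reference encoder: qbits = 15 + floor(QP/6), the MF(0,0) values per QP mod 6, and a rounding offset f of 2^qbits/3 for intra blocks (2^qbits/6 for inter).

```python
# Assumed MF(0,0) multiplication factors, indexed by QP % 6.
MF_00 = [13107, 11916, 10082, 9362, 8192, 7282]


def quantize_luma_dc(y: int, qp: int, intra: bool = True) -> int:
    """Quantize one Hadamard-transformed luma DC coefficient Y_D(i, j)."""
    qbits = 15 + qp // 6
    mf = MF_00[qp % 6]
    f = (1 << qbits) // (3 if intra else 6)   # assumed rounding offset
    z = (abs(y) * mf + 2 * f) >> (qbits + 1)  # |Z_D| from the slide's formula
    return -z if y < 0 else z                 # sign(Z_D) = sign(Y_D)
```

Raising QP by 6 increases qbits by one, which halves the quantized magnitude, so the step size doubles every 6 QP values.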
CAVLC: Context-Based Adaptive Variable Length Coding
• Characteristics:
  • Run-level coding to compact strings of zeros
  • Trailing ones: the highest-frequency nonzero coefficients are often +1 or -1
  • The number of nonzero coefficients in neighboring blocks is correlated
  • The choice of VLC look-up table for the level parameter adapts to recent level magnitudes
CAVLC Encoding
• 1. Encode the number of coefficients and trailing ones (coeff_token)
  • TotalCoeffs: 0 ~ 16
  • TrailingOnes: 0 ~ 3
    • If there are more than 3 trailing ±1s, only the last three are treated as “special cases”
  • Four look-up tables
    • Three variable-length, one fixed-length
    • Choice depends on neighboring blocks
• 2. Encode the sign of each TrailingOne, in reverse order
• 3. Encode the levels of the remaining nonzero coefficients
  • level_prefix, level_suffix
  • If there are fewer than 3 TrailingOnes, the first remaining nonzero coefficient is adjusted
• 4. Encode the total number of zeros before the last coefficient
• 5. Encode each run of zeros
  • Zero-runs at the start of the array need not be encoded
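The quantities signalled in steps 1, 2 and 4 can be derived from a zigzag-ordered coefficient array as sketched below. The function name and return layout are hypothetical, and the actual VLC table look-ups are not shown.

```python
def cavlc_stats(coeffs):
    """Return (TotalCoeffs, TrailingOnes, trailing_signs, total_zeros).

    coeffs: quantized coefficients of one 4x4 block in zigzag order.
    trailing_signs lists the signs of the trailing ones in reverse
    (high-to-low frequency) order, as they are encoded.
    """
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    trailing_signs = []
    for i in reversed(nonzero):
        # At most three trailing +/-1 values are treated as special cases.
        if abs(coeffs[i]) == 1 and len(trailing_signs) < 3:
            trailing_signs.append("+" if coeffs[i] > 0 else "-")
        else:
            break

    # Zeros located before the last nonzero coefficient in scan order.
    total_zeros = (nonzero[-1] + 1 - total_coeffs) if nonzero else 0
    return total_coeffs, len(trailing_signs), trailing_signs, total_zeros
```

For the array 0, 3, 0, 1, -1, -1, 0, 1, 0, ... this gives 5 coefficients, 3 trailing ones with signs +, -, - in reverse order, and 3 zeros before the last coefficient.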
Acoustic Echo Cancellation
[Figure: acoustic echo cancellation block placed between the signal from the audio decoder and the signal to the audio encoder]
Acoustic Echo Cancellation Module
Adaptive Transversal Filter
• FIR filter: inherently stable
• Length of the filter affects performance: convergence, goodness of fit, and complexity.
• The filter introduces errors since it is trying to model an IIR response.
• Short filters
  • 128-256 coefficients (taps)
  • Faster convergence, but the final solution has more residual error
  • Less complex: O(N).
• Long filters
  • 512-1024 coefficients (taps)
  • Slower convergence, but the final solution has less error.
  • More complex, as the algorithm can be O(N²)
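One common way to adapt such an FIR filter is normalized LMS (NLMS), which costs O(N) per sample. The slides do not name the adaptation algorithm, so NLMS is an assumption here, as are the parameter values and the `simulate_echo` helper used to produce test signals.

```python
import random


def simulate_echo(echo_path, length, seed=0):
    """Generate a random far-end signal and its echo through a short FIR echo path."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    mic = [sum(h * far_end[n - j] for j, h in enumerate(echo_path) if n - j >= 0)
           for n in range(length)]
    return far_end, mic


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Return (residual, weights) after adapting an FIR echo model with NLMS."""
    w = [0.0] * taps        # adaptive filter coefficients
    buf = [0.0] * taps      # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        echo_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_estimate                   # signal sent on to the encoder
        step = mu * e / (eps + sum(b * b for b in buf))
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w
```

With no near-end speech or noise, the weights converge to the simulated echo path; real rooms need far longer filters and double-talk handling, which this sketch leaves out.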
Challenges
• Dynamic range of the human ear = 120 dB.
  • Even quiet echoes can be heard.
• Longer delays from satellite links (300-500 ms) and VoIP
  • The ear is more sensitive to echoes with longer delays.
  • More difficult to find the beginning of the echo.
  • Long filters (~1000 taps) are needed (complexity & convergence)
• Near-end noise corrupts the echo, decreasing the canceller's ability to converge.
• Acoustic echo paths can change rapidly
  • More difficult for the AEC to remain converged.
• Nonlinear echo components
  • Speakers driven beyond their linear region.
Network Component
IP-based VoIP / Video Conference
Internet Primer
Internet: Grand View
Impact on ISPs
[Figure: ISP entities connected by sibling, peering, and transit links, with peering and sibling entity boundaries]
• Economics of ISP relationships
  • sibling relationship
    • several ISPs belong to the same organization
  • peering relationship
    • mutually beneficial free agreement (to a certain extent)
  • transit relationship
    • one ISP pays another
Inside ISP
ISP POP (Point of Presence)
Home Networking
Network Characteristics
Under-provisioned Links
Growth Trends
Packet Loss vs. Jitter (vs. Delay?)
The Usual Suspects
Packet Bursts
What kind of Enterprise User?
How QoS can help
QoS helps inside and between branches!
Observation
• IP-based communication in the enterprise is growing
• Empirical results show poor call quality for wireless and VPN users
• QoS (DiffServ) is both used and useful!
Available Bandwidth Estimation
What is Available Bandwidth (ABW)?
• ABW is the left-over capacity along an Internet path
Why Is It Useful?
• Maximizing QoE (Quality of Experience) in A/V conferencing
  • Audio prefers minimum delay (high priority)
  • Video prefers maximum rate (low priority)
• One solution: measure ABW, then encode and send video at the ABW rate
One Way Delay (OWD) = propagation delay (constant) + queuing delay (variable)
Typical Targeting Scenario
• First hop is the bottleneck
  • Cable modem, DSL, high-speed link…
• Timescale for the ABW estimation: 2-4 seconds
Why Is Measuring ABW Hard?
• Available bandwidth changes over time, so ABW measurements must be quick
• Audio packets (along the same path) should experience minimum delay, so measurement must be non-intrusive
Two Models
• Probe Rate Model (PRM) based solutions
  • Pathload, TOPP, Pathchirp, Bfind, PTR …
• Probe Gap Model (PGM) based solutions
  • Spruce, Delphi, IGI, Moseab …
Pathload (PRM) [Jain & Dovrolis]
• Send probe trains at various rates
• ABW is the probe rate at the transition point where OWD starts increasing (queuing delay is observed)
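The probe rate model can be sketched as a binary search over the sending rate, using the trend of one-way delays within each train as the test. This is a simplified illustration, not Pathload's actual algorithm: the `send_train` callback is hypothetical, and the 0.66 threshold on the fraction of increasing delay pairs is an assumed value.

```python
def increasing_fraction(owds):
    """Fraction of consecutive packet pairs whose one-way delay increased."""
    pairs = list(zip(owds, owds[1:]))
    return sum(1 for a, b in pairs if b > a) / len(pairs)


def estimate_abw(send_train, low, high, resolution, threshold=0.66):
    """Binary-search the probe rate at which one-way delays start to grow.

    send_train(rate) must send one probe train at `rate` and return the
    list of measured one-way delays (a hypothetical measurement hook).
    """
    while high - low > resolution:
        rate = (low + high) / 2.0
        if increasing_fraction(send_train(rate)) > threshold:
            high = rate   # queue is building up: rate is above the ABW
        else:
            low = rate    # no delay trend: rate is at or below the ABW
    return (low + high) / 2.0
```

Each iteration needs a full probe train, which is why this family of methods is slower than a single gap measurement.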
Spruce (PGM) [Strauss et al.]
• Send probe pairs/trains at rate Ri (Ri > A), and measure the sending gaps and receiving gaps
• Compute A directly
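The direct computation in the probe gap model follows from how much cross traffic squeezes in between two probe packets at the bottleneck. The sketch below uses the usual Spruce estimate A = C · (1 - (Δout - Δin)/Δin), which assumes a single bottleneck of known capacity C; the function names and the averaging over pairs are illustrative.

```python
def spruce_estimate(capacity_bps, gap_in, gap_out):
    """Available bandwidth from one probe pair.

    gap_in:  spacing of the pair at the sender (seconds)
    gap_out: spacing of the pair at the receiver (seconds)
    """
    return capacity_bps * (1.0 - (gap_out - gap_in) / gap_in)


def spruce_average(capacity_bps, gaps):
    """Average the per-pair estimates over a list of (gap_in, gap_out) tuples."""
    estimates = [spruce_estimate(capacity_bps, gi, go) for gi, go in gaps]
    return sum(estimates) / len(estimates)
```

If a 12 ms sending gap grows to 18 ms on a 1 Mbit/s bottleneck, cross traffic used half the link and the estimate is about 500 kbit/s.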
Advantages/Disadvantages of the Approaches
Approach             | Advantages                                                 | Disadvantages
PGM based approaches | Fast estimation: estimation can be done in a single probe. | Assumptions are not easy to verify in practice
PRM based approaches | No assumptions                                             | Slow estimation: iterative probes
Forward Error Correction
Block Based Erasure Resilient Coding
[Figure: original data of k messages (1 … k) is expanded by the ERC into n coded blocks (1 … k, k+1 … n); at a given instant some coded blocks, marked X, are lost]
Some of the blocks may be lost in delivery. However, as long as at least k blocks are delivered, the original data can be reconstructed.
ERC in VoIP and Video Conferencing
• VoIP
  • Mainly packet replication, due to the small VoIP packet size & low delay requirement
• Video Conferencing
  • Packet loss protection (for I frames, or P frames in HD)
  • Each frame is separated into k messages and protected by n-k additional messages. As long as no more than n-k messages are lost, the transmission succeeds
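To get a feel for how much protection n-k extra messages buy, the sketch below computes the probability that a frame survives, assuming each message is lost independently with probability p. Independent loss is an assumption of this example (real losses are often bursty), and the function name is hypothetical.

```python
from math import comb


def fec_success_probability(n: int, k: int, p: float) -> float:
    """Probability that at most n - k of n messages are lost (independent loss rate p)."""
    return sum(comb(n, lost) * p ** lost * (1.0 - p) ** (n - lost)
               for lost in range(n - k + 1))
```

With a 10% loss rate, sending 2 messages unprotected succeeds with probability 0.81, while adding one protection message (n = 3, k = 2) raises that to 0.972.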
ERC Terms
• Number of original blocks: k
• Number of coded blocks: n
• Rate of ERC: k/n
• MDS: Maximum Distance Separable
  • Any k of the n coded blocks can recover the original
  • The theoretically optimal performance
Erasure Encoding: Mathematics
[Figure: coded data (y1, y2, …, yn) = generator matrix × original data (x1, x2, …, xk); xi and yi are vectors over a Galois field]
Example: ERC of 10MB
[Figure: original data (10 MB) as x1 … xk; coded data as y1 … yn with n = 30; the generator matrix is 30 × 10]
k = 10, GF(2^8), each vector is 1 MB.
Erasure Decoding: Mathematics
[Figure: the available coded blocks among y1 … yn select the corresponding rows of the generator matrix (the sub-generator matrix), which relates them to the original data x1 … xk]
Original data can be recovered if the sub-generator matrix
has a full rank k.
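A compact way to see encoding and decoding at work is a Vandermonde-style code over GF(2^8). The sketch below is illustrative only: the field polynomial 0x11D, the choice of evaluation points 0 … n-1, and all function names are assumptions of this example, and the code is non-systematic in general and far from optimized.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    return gf_pow(a, 254)  # a^254 = a^-1 for nonzero a in GF(2^8)


def generator_matrix(n: int, k: int):
    """n x k Vandermonde matrix; any k rows are invertible when n <= 256."""
    return [[gf_pow(i, j) for j in range(k)] for i in range(n)]


def encode(blocks, n: int):
    """Encode k equal-length byte blocks into n coded blocks."""
    coded = []
    for row in generator_matrix(n, len(blocks)):
        out = [0] * len(blocks[0])
        for coef, block in zip(row, blocks):
            for pos, byte in enumerate(block):
                out[pos] ^= gf_mul(coef, byte)
        coded.append(bytes(out))
    return coded


def decode(received, n: int, k: int):
    """Recover the k original blocks from any k entries of {index: coded block}."""
    idx = sorted(received)[:k]
    full = generator_matrix(n, k)
    a = [full[i][:] for i in idx]            # sub-generator matrix
    y = [list(received[i]) for i in idx]
    for col in range(k):                     # Gauss-Jordan elimination over GF(2^8)
        piv = next(r for r in range(col, k) if a[r][col])
        a[col], a[piv] = a[piv], a[col]
        y[col], y[piv] = y[piv], y[col]
        inv = gf_inv(a[col][col])
        a[col] = [gf_mul(inv, v) for v in a[col]]
        y[col] = [gf_mul(inv, v) for v in y[col]]
        for r in range(k):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [u ^ gf_mul(f, v) for u, v in zip(a[r], a[col])]
                y[r] = [u ^ gf_mul(f, v) for u, v in zip(y[r], y[col])]
    return [bytes(row) for row in y]
```

Decoding succeeds from any k surviving blocks because every k × k sub-matrix of a Vandermonde matrix with distinct points has full rank.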
Systematic vs Non-Systematic ERC
• Systematic ERC
  • Slightly lower encoding & decoding complexity
  • Even if decoding fails, the original messages that were received can still be used
[Figure: original data of k messages (1 … k); a non-systematic ERC produces n coded blocks; a systematic ERC keeps the k original messages as coded blocks 1 … k and adds blocks k+1 … n]
Reed-Solomon
• Has been around for decades
• Has systematic form
• Cauchy Reed-Solomon Code
Reed-Solomon Decoding
[Figure: the received blocks are multiplied by the inverse of the sub-generator matrix]
Dejitter Buffer
Variable Delay & Dejitter Buffer
• Queuing delay
• Dejitter buffers
• Variable packet sizes
[Figure: queuing delay at each hop along the path, followed by a dejitter buffer at the receiver]
Fixed Dejitter Buffer – Budget For Worst Case
• Total end-to-end delay
  • Codec delay: 40 ms
  • Propagation delay: 8 ms
  • Dejitter buffer: 50 ms
    • To accommodate queuing delay: 0-50 ms
  • Total delay: 98 ms
[Figure: Site A to Site B over a 128 kbps link; coder delay 40 ms, queuing delay 4-50 ms, propagation delay 8 ms, dejitter buffer 50 ms]
Dejitter Buffer Size & Late Loss
[Figure: playout delay versus packet loss, showing jitter, buffering delay, and late loss]
Fixed playout deadline and jitter absorption:
• The playout rate is constant
• The tradeoff is between dejitter buffer size and late loss
Adaptive Playout and Dejitter Buffer Adaptation
Adaptive playout and jitter adaptation
• Scaling of voice/video packets in a highly dynamic way
• Playout schedule set according to past recorded delays
  • Usually the dejitter buffer expands quickly on late packet arrivals, and shrinks slowly when jitter reduces
• Improved tradeoff between buffering delay and late loss
• Playout rate is not constant
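The "expand quickly, shrink slowly" behaviour can be sketched with a target buffer depth that jumps up on a late packet and decays gradually otherwise. The class name, the parameter values, and the specific update rule are all assumptions of this example rather than the method used in any particular product.

```python
class AdaptiveDejitterBuffer:
    """Tracks a target buffering delay (ms) from observed per-packet queuing delays."""

    def __init__(self, initial_ms=20.0, min_ms=10.0, max_ms=200.0, shrink=0.02):
        self.target_ms = initial_ms
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.shrink = shrink  # fraction of the slack removed per on-time packet

    def on_packet(self, queuing_delay_ms: float) -> bool:
        """Update the target; return True if the packet arrived in time for playout."""
        on_time = queuing_delay_ms <= self.target_ms
        if not on_time:
            # Late packet: expand immediately to cover the observed delay.
            self.target_ms = min(self.max_ms, queuing_delay_ms)
        else:
            # On-time packet: shrink slowly toward the observed delay.
            slack = self.target_ms - queuing_delay_ms
            self.target_ms = max(self.min_ms, self.target_ms - self.shrink * slack)
        return on_time
```

Changing the target depth is what requires stretching or compressing the played-out audio, which is why the playout rate is no longer constant.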
[Figure: playout delay versus packet loss, showing jitter and buffering delay]
Adaptive Play Out
• Packets are pushed into the Adaptive Playout module
• The renderer requests a new waveform segment for playout
• The Playout module passes packets to the audio decoder
[Figure: Audio Adaptive Playout]
Packet Loss Concealment
Audio Packet Loss Concealment
[Figure: packets i-2 … i+2 on a time axis with packet i lost; the neighbouring packets i-1 and i+1 are stretched to cover the gap (lengths L, ∆L, 2 L and 1.3 L marked), with alignment found by correlation]
• Depends on voiced & unvoiced segments
Voiced segments
Unvoiced segments
Concealment as (bi-directional) stretching
Video Packet Loss Concealment
• Spatial Concealment
  • Use spatial correlation
  • E.g., bilinear interpolation
  • Projection onto convex sets
• Temporal Concealment
  • Use the correlation that exists between consecutive frames
  • Temporal replacement
  • Boundary matching
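Two of the simplest options above can be sketched on a frame stored as a list of rows of luma values. Both functions are illustrative assumptions: temporal replacement copies the co-located block from the previous frame (no motion compensation), and the spatial variant interpolates each lost pixel from the four boundary pixels around the block, which requires the block not to touch the frame edge.

```python
def temporal_replacement(prev_frame, cur_frame, top, left, size):
    """Fill a lost size x size block with the co-located block of the previous frame."""
    out = [row[:] for row in cur_frame]
    for r in range(top, top + size):
        out[r][left:left + size] = prev_frame[r][left:left + size]
    return out


def spatial_interpolation(frame, top, left, size):
    """Fill a lost block by distance-weighted interpolation from its boundary pixels."""
    out = [row[:] for row in frame]
    bottom, right = top + size, left + size
    for r in range(top, bottom):
        for c in range(left, right):
            d_up, d_down = r - (top - 1), bottom - r
            d_left, d_right = c - (left - 1), right - c
            vertical = (frame[top - 1][c] * d_down + frame[bottom][c] * d_up) / (d_up + d_down)
            horizontal = (frame[r][left - 1] * d_right + frame[r][right] * d_left) / (d_left + d_right)
            out[r][c] = (vertical + horizontal) / 2.0
    return out
```

Spatial interpolation reproduces smooth gradients exactly but blurs edges and texture, which is one reason temporal methods are usually preferred when the previous frame is intact.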
Spatial-Temporal Concealment
Summary
• VoIP/Video Conference Systems
  • Infrastructure based
  • P2P based
• Audio/Video Components
  • Audio codec
  • Video codec
  • Acoustic echo cancellation
• Network components
  • Primer of the Internet
  • Network characteristics
  • Available bandwidth estimation
  • Forward error correction (FEC)
  • Dejitter buffer
  • Packet loss concealment