Live Music Performances over High-Speed IP Networks
Stefan Karapetkov
Director, Emerging Technologies
TERENA Networking Conference
Bruges, Belgium, May 20, 2008
Agenda
Manhattan School of Music
Audio-Video Networks
Audio Technology
  Voice-specific Codec Functions
  Adjustments for Live Music Mode
Video Technology
Transmission Technology
Live Music Mode Demo
Audio-Video Networks Today
Video Endpoints
Conference Servers
Call Control, Management & Scheduling
Video Recording, Streaming & Content Management
Security & NAT/FW Traversal
IM/Presence and IP-PBX Integration
User Database
H.323 Architecture
[Diagram labels: Gatekeeper, Terminal A, Terminal B, IP Network, RTP/RTCP Stream]
Call flow as numbered in the diagram:
1) H.225 SETUP
2) H.225 SETUP
3) H.225 CONNECT
4) H.225 CONNECT
5) H.245 CAPS, MS
6) H.245 CAPS, MS
7) H.245 CAPS, MS
8) H.245 CAPS, MS
9) H.245 OLC
10) H.245 OLC
SIP Architecture
[Diagram labels: SIP Redirect Server, User Database, SIP Proxy/Registrar, SIP User Agent A, SIP User Agent B, IP Network, RTP/RTCP Stream]
Call flow as numbered in the diagram:
1) INVITE userB@home.com
2) 302 Moved Temporarily
3) INVITE [email protected]
4) INVITE [email protected]
5) 200 OK
6) 200 OK
7) ACK
8) ACK
Advanced Audio Compression Technology
[Chart: Audio Fidelity (Narrowband, Wideband, Super Wide) versus Data Bit Rate (axis marks at 4 kbps, 64 kbps, 128 kbps)]
Codecs plotted: G.711, G.728, G.729A, AMR-NB, G.722, G.722.1, G.722.2, G.722.1C, Siren 14 stereo, Siren 22 stereo
Siren™22 Stereo Codec Highlights

                  Siren™22                            MP3
Latency           Optimized for low latency, 40 ms    High latency, 54-81 ms
Frequency band    22 kHz                              18 kHz
Channels          Stereo                              Stereo
Complexity        Low, 15 MIPS                        High, 100 MIPS
Bit rate          Low, max. 128 kbps                  Optimized for storage, bit rates > 128 kbps
Siren™22 on the Road to Standardization
ITU-T G.719 full-band codec approved in May 2008
  Based on Polycom Siren™22 and Ericsson’s advanced audio
  G.719 number for higher visibility
ITU-T cited the strong and increasing demand for audio coding providing the full human auditory bandwidth
  Conferencing systems are increasingly used for more elaborate presentations, often including music and sound effects
  In today’s multimedia presentations, playback of audio and video from DVDs and PCs is becoming a common practice
  New Telepresence systems provide High Definition video and audio quality to the user, and require high-quality media delivery to create the immersive experience
Extending the quality of remote meetings helps reduce travel which in turn reduces greenhouse gas emission and limits climate change.
Automatic Gain Control (AGC)
[Diagram: Signal strength by distance from the microphone, with AGC adding 0 dB, 3 dB, or 6 dB]
Max. 12 feet from microphone
Nominal is 2 feet from microphone
Automatic Gain Control (AGC)
Activated by speech and music
Ignores white noise, e.g. if a fan is working close, AGC will not ramp up the gain based on fan noise
AGC destroys the natural dynamic range
  If the music is loud, AGC decreases the volume
  If the music is quiet, AGC increases the volume
Therefore, AGC must be completely disabled in a codec
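
The dynamic-range point can be illustrated with a minimal sketch of a gain controller that steers every passage toward one target level. The target level, the cut limit, the example levels, and the function name are illustrative assumptions, not the codec's actual AGC; only the 6 dB maximum boost echoes the slide.

```python
def agc_gain_db(input_level_db, target_db=-20.0, max_boost_db=6.0, max_cut_db=6.0):
    """Gain in dB that a simple AGC applies to pull the input toward target_db."""
    wanted = target_db - input_level_db
    # Clamp the correction to the allowed boost/cut range.
    return max(-max_cut_db, min(max_boost_db, wanted))

quiet_passage_db = -26.0   # hypothetical pianissimo level
loud_passage_db = -14.0    # hypothetical fortissimo level

quiet_out = quiet_passage_db + agc_gain_db(quiet_passage_db)
loud_out = loud_passage_db + agc_gain_db(loud_passage_db)

range_in = loud_passage_db - quiet_passage_db   # dynamic range performed
range_out = loud_out - quiet_out                # dynamic range transmitted
print(range_in, range_out)
```

With these example levels, the 12 dB difference between the quiet and loud passages collapses to 0 dB after the AGC, which is the effect the slide warns about.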
Automatic Noise Suppression (ANS) & Noise Fill
[Diagram: Signal + White noise → ANS → Signal → Noise Fill → Signal + Comfort Noise]
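
A rough sketch of the two stages named in the slide title follows. It is a crude time-domain stand-in: real noise suppressors typically work per frequency band, and the threshold, comfort-noise level, and function names here are assumptions made for illustration.

```python
import random

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def suppress_noise(frames, threshold):
    """Crude ANS stand-in: silence frames whose RMS is below the threshold."""
    return [f if rms(f) >= threshold else [0.0] * len(f) for f in frames]

def noise_fill(frames, comfort_level, seed=0):
    """Replace fully silent frames with low-level comfort noise."""
    rng = random.Random(seed)
    out = []
    for f in frames:
        if any(f):
            out.append(f)
        else:
            out.append([rng.uniform(-comfort_level, comfort_level) for _ in f])
    return out

speech = [0.5, -0.4, 0.6, -0.5]        # hypothetical frame containing signal
hiss = [0.01, -0.02, 0.015, -0.01]     # hypothetical frame containing only noise

cleaned = suppress_noise([speech, hiss], threshold=0.05)
filled = noise_fill(cleaned, comfort_level=0.005)
```

The signal frame passes through both stages unchanged, while the noise-only frame is first silenced and then replaced by quieter comfort noise so the line does not sound dead.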
Acoustic Echo Cancellation (AEC)
[Diagram labels: Acoustic Coupling, Hears echo, AEC]
Stereo Acoustic Echo Cancellation (AEC)
50-22,000 Hz operating range
Adaptive filter length of 260ms
  This number is the max delay of the echo that we can compensate
  This is the room response; it includes many audio wave reflections
No learning sequence needed
  Algorithm trains quickly on speech
  No need to send out white noise to train it
Stereo echo canceller identifies multiple paths of the stereo loudspeakers
Quickly adapts to microphones that are moved, within two words of speech
  Moving the microphone changes the echo path and the adaptive filter has to learn the new path
  Echo comes back for a short time (1-2 words); then the canceller adjusts
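
To give a feel for what a 260 ms adaptive filter implies, the sketch below first converts the filter length into a tap count and then runs a tiny single-channel NLMS (normalized least mean squares) canceller. The 48 kHz sample rate, the NLMS algorithm, the step size, and the toy echo path are all assumptions; the slides do not say which adaptation algorithm the product uses. Random noise serves as the far-end signal only because it is a convenient test input.

```python
import random

SAMPLE_RATE_HZ = 48_000      # assumed; the slide states only the 50-22,000 Hz range
FILTER_LENGTH_MS = 260       # from the slide
taps_at_full_length = SAMPLE_RATE_HZ * FILTER_LENGTH_MS // 1000

def nlms_echo_canceller(far_end, mic, num_taps, mu=0.5, eps=1e-8):
    """Return the residual (mic minus estimated echo), sample by sample."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps           # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        norm = sum(h * h for h in history) + eps
        # Normalized update: step scaled by the far-end signal energy.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual

rng = random.Random(1)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
room = [0.0, 0.6, 0.0, -0.3]             # toy echo path (hypothetical)
mic = [
    sum(room[k] * far_end[n - k] for k in range(len(room)) if n - k >= 0)
    for n in range(len(far_end))
]

residual = nlms_echo_canceller(far_end, mic, num_taps=8)
early_energy = sum(e * e for e in residual[:100])
late_energy = sum(e * e for e in residual[-100:])
print(taps_at_full_length, early_energy, late_energy)
```

Under the 48 kHz assumption, 260 ms corresponds to 12,480 taps per echo path. In the toy run the residual echo energy shrinks sharply once the filter has converged, and it would rise again briefly if the echo path changed, which matches the "echo comes back for 1-2 words" behaviour described above.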
Stereo AEC in Live Music Mode (LMM)
Standard AEC leads to audio artifacts; low notes can be cut
Main complaint from MSM is that a sustained note (e.g. with the sustain pedal pressed on a piano) cannot be heard all the way, even if it is just 1dB over the noise floor
AEC settings in LMM prevent very quiet musical sounds from being cut out
LMM assumes a quiet environment without background noise
We changed the thresholds for signal detection to be more aggressive (lower)
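
The effect of a lower detection threshold can be sketched as a simple level comparison. The noise floor and both margin values are hypothetical; only the "1 dB over the noise floor" case comes from the slide.

```python
def is_signal(level_db, noise_floor_db, margin_db):
    """Treat a frame as signal only if it exceeds the noise floor by margin_db."""
    return level_db - noise_floor_db > margin_db

NOISE_FLOOR_DB = -60.0       # hypothetical room noise floor
STANDARD_MARGIN_DB = 6.0     # hypothetical threshold for speech-oriented mode
LMM_MARGIN_DB = 0.5          # hypothetical lowered threshold for Live Music Mode

sustained_note_db = NOISE_FLOOR_DB + 1.0   # a decaying note 1 dB over the floor

kept_in_standard_mode = is_signal(sustained_note_db, NOISE_FLOOR_DB, STANDARD_MARGIN_DB)
kept_in_lmm = is_signal(sustained_note_db, NOISE_FLOOR_DB, LMM_MARGIN_DB)
print(kept_in_standard_mode, kept_in_lmm)
```

With these example numbers the decaying note is classified as noise in the standard mode and kept in LMM. The trade-off is that real background noise just above the floor would also be kept, which is why a quiet room is assumed.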
Installed Audio
Definition: rack-mounted systems that process all the audio in a conference room or large meeting room
[Diagram: SoundStructure connected to Microphones, Speakers, Video System, DVD, and Telephony]
Interworking: Installed Audio & Video Endpoints
SoundStructure adds 8/12/16 additional inputs/outputs
Digital connectivity with Polycom Video Endpoints
  Fully digital audio for better quality
  Bi-directional stereo between SoundStructure and HDX
  Full 22kHz stereo AEC compatible with Siren 22 audio codec
  Shared mute and volume control
  Auto-discovery between the devices, with automatic configuration
[Diagram: SoundStructure connected to HDX]
Advanced Video Technology: High Definition
[Chart: Quality versus Bandwidth (384 kbps, 512 kbps, 1 Mbps, 6 Mbps); resolutions shown: 352x288, 704x480 (480p), 1280x720 (720p)]
Advanced Video Technology: Camera Control
Resolution 1280x720p
50/60 fps
Aspect ratio 16:9
Pan +/- 100°
Tilt +20° to -30°
12x optical zoom
Advanced Video Technology: Far End Camera Control (FECC)
In H.323, FECC uses H.281 (binary data) over H.224 (frames)
RFC 4573, MIME Type Registration for RTP Payload Format for H.224
Advanced Video Technology: Multiple Streams
‘Live’ Stream
‘Presentation’ Stream
ITU-T Recommendation H.239
RFC 4796, SDP Content Attribute
RFC 4574, The SDP Label Attribute
RFC 3388, Grouping of Media Lines in SDP
RFC 4582, Binary Flow Control Protocol (BFCP)
RFC 4583, SDP Format for BFCP Streams
draft-even-xcon-pnc-01, Role Mgmt & Multiple Streams
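
On the SIP side, the two streams can be told apart in SDP with the `content` attribute (RFC 4796) and the `label` attribute (RFC 4574). The sketch below builds a fragment with two video media sections. Mapping the 'Live' stream to `main` and the 'Presentation' stream to `slides` is one plausible reading; the ports, payload types, and label values are hypothetical, and the `rtpmap` lines a real offer would need are omitted.

```python
def video_media_section(port, payload_type, content, label):
    """Build the SDP lines for one video stream with content and label attributes."""
    return [
        f"m=video {port} RTP/AVP {payload_type}",
        f"a=content:{content}",
        f"a=label:{label}",
    ]

sdp_lines = (
    video_media_section(49170, 96, "main", "1")       # 'Live' stream
    + video_media_section(49172, 97, "slides", "2")   # 'Presentation' stream
)
sdp_fragment = "\r\n".join(sdp_lines) + "\r\n"
print(sdp_fragment)
```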
Audio Precedence in Codec Negotiation
[Diagram labels: Audio (High priority), Video]
Bandwidth (kbps)    Standard Setting       LMM Setting
> 1024              Siren22 Stereo 128     Siren22 Stereo 128
768 - 1024          Siren22 Stereo 96      Siren22 Stereo 128
512 - 768           Siren22 Stereo 96      Siren22 Stereo 128
384 - 512           Siren22 Stereo 96      Siren22 Stereo 128
256 - 384           Siren14 Stereo 48      Siren14 Stereo 48
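
The table can be read as a lookup from call bandwidth to audio codec and audio bit rate. The sketch below encodes it directly. The function name, the assumption that all figures are in kbps, and the choice of which row owns a boundary value such as 384 are illustrative, since the table leaves them open.

```python
def select_audio(call_rate_kbps, live_music_mode=False):
    """Return (codec, audio_kbps) for a call rate, or None below the table's range."""
    if call_rate_kbps > 1024:
        return ("Siren22 Stereo", 128)
    if call_rate_kbps >= 384:
        # The only band where the two settings differ: LMM keeps 128 kbps audio.
        return ("Siren22 Stereo", 128 if live_music_mode else 96)
    if call_rate_kbps >= 256:
        return ("Siren14 Stereo", 48)
    return None

call_rate = 512
standard = select_audio(call_rate)
lmm = select_audio(call_rate, live_music_mode=True)

# Whatever audio takes first is no longer available to video.
video_standard_kbps = call_rate - standard[1]
video_lmm_kbps = call_rate - lmm[1]
print(standard, lmm, video_standard_kbps, video_lmm_kbps)
```

At a 512 kbps call, for example, LMM spends 32 kbps more on audio than the standard setting, and video gets correspondingly less.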
Keeping Quality Up in Transmission
Video Error Concealment (PVEC)
Lost Packet Recovery (LPR)
[Diagram: Audio and Video streams crossing the IP Network]
LPR Definitions
LPR is a new method of error concealment for packet-based networks that is based upon Forward Error Correction (FEC)
LPR constantly adjusts the video bit rate to reduce the amount of loss in a packet-based network
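
The general idea behind forward error correction can be shown with the simplest possible scheme: one XOR parity packet per group, which lets the receiver rebuild any single lost packet without a retransmission. This is a generic illustration, not LPR's actual recovery code, which the slides do not specify. The group size, the packet contents, and the equal-length requirement are assumptions.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    """Sender side: XOR all packets of a group into one recovery packet."""
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = xor_bytes(parity, p)
    return parity

def recover(received, parity):
    """Receiver side: rebuild the single missing packet (marked as None)."""
    missing = received.index(None)
    rebuilt = parity
    for i, p in enumerate(received):
        if i != missing:
            rebuilt = xor_bytes(rebuilt, p)
    return missing, rebuilt

group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]   # hypothetical equal-length payloads
parity = make_parity(group)

damaged = [group[0], None, group[2], group[3]]  # packet 1 lost in the network
index, rebuilt = recover(damaged, parity)
print(index, rebuilt)
```

The cost is extra bandwidth: one parity packet per four media packets adds 25% overhead, which is why a scheme like this goes hand in hand with adjusting the video bit rate.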
Lost Packet Recovery (LPR)
[Block diagram. Sender side: Video Encoder, LPR Packetizer, LPR Recovery Packet Generator, Encryption, RTP Sender, LPR DBA Mode Decision, RTCP. Receiver side: RTCP, RTP Reordering Buffer, Decryption, LPR Recovery, LPR Regeneration, Video Decoder.]
LPR DBA Example
[Chart: video bit rate as a percentage of full bandwidth over successive intervals (X ms each, plus one Y ms interval), with phases labelled Down Speeding and Up Speeding. Percentages shown: 100%, 74%, 58%, 72%, 72%, 64%, 58%, 70%, 77%.]
Events labelled on the chart:
  Full bandwidth
  Packet loss 25%, FEC on: bit rate drop 26%
  Packet loss 15%: bit rate drop 16%
  No packet loss, FEC off: bit rate increase e.g. 10%
  Packet loss 4%, FEC on: bit rate drop 5%
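
One reading of the figures in this example is that each down-speeding step lowers the rate by the measured loss plus one percentage point of full bandwidth, and each up-speeding step raises the current rate by 10%. The sketch below encodes that rule. The rule itself, the ordering of the loss reports, and all names are assumptions made for illustration, not the documented LPR algorithm.

```python
def next_rate(current_pct, loss_pct, step_up=1.10, margin_pts=1.0):
    """Next video bit rate, as a percentage of full bandwidth."""
    if loss_pct > 0:
        # Down speeding: back off by the loss plus a small safety margin.
        return max(0.0, current_pct - (loss_pct + margin_pts))
    # Up speeding: probe for more bandwidth while the path is clean.
    return min(100.0, current_pct * step_up)

rate = 100.0
trace = [round(rate)]
for loss in [25, 15, 0, 0, 0, 4]:    # hypothetical sequence of loss reports
    rate = next_rate(rate, loss)
    trace.append(round(rate))
print(trace)
```

Under this rule the rate falls quickly when loss appears and climbs back in smaller steps once the loss clears, so the sender keeps probing for bandwidth without overshooting for long.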
Technology Summary
Flexible Networking – H.323 and SIP
Advanced Audio Technologies
  Audio Compression
  Automatic Gain Control (AGC)
  Automatic Noise Suppression (ANS) and Noise Fill
  Stereo Acoustic Echo Cancellation (AEC)
Advanced Video Technology
  High Definition
  Camera Control
  Multiple Streams
Advanced Transmission Technology
  Lost Packet Recovery (LPR)