multimedia conferencing raphael coeffic ([email protected]) based partly on slides of ofer hadar, jon...

Multimedia conferencing

Raphael Coeffic ([email protected])

Based partly on slides of Ofer Hadar, Jon Crowcroft

Which Applications?

• Conferencing: audio/video communication and application sharing. First multicast session at the IETF in 1992. Many-to-many scenarios.

• Media broadcast: Internet TV and radio. One-to-many scenario.

• Gaming: many-to-many scenario.

What is needed?

• Efficient transport: enable real-time transmission. Avoid sending the same content more than once. The best transport depends on the available bandwidth and technology.

• Audio processing: how to ensure audio/video quality? How to mix the streams?

• Conference setup: who is allowed to start a conference? How fast can a conference be initiated?

• Security and privacy: how to prevent unwanted people from joining? How to secure the exchanged content?

• Floor control: how to maintain an orderly sequence of speakers?

How to Realize? Centralized

• All register at a central point

• All send to central point

• Central point forwards to others

• Simple to implement

• Single point of failure

• High bandwidth consumption at the central point: it must receive N flows

• High processing overhead at the central point: it must decode N flows, mix them, and encode N flows. Without mixing, the central point would have to send N x (N-1) flows

• Appropriate for small to medium-sized conferences

• Simple to manage and administer: allows access control and secure communication, allows usage monitoring, and supports floor control

• Most widely used scenario

• No need to change end systems

• Tightly coupled: Some instances know all information about all participants at all times

How to Realize? Full Mesh

• All establish a connection to each other

• All can send directly to the others

• Each host needs to maintain a connection to every other participant (N-1 connections for N participants)

• Outgoing bandwidth: each host sends one copy of each packet per remote participant; a simple voice session at 64 kb/s translates to 64 x (N-1) kb/s

• Incoming bandwidth: if silence suppression is used, only active speakers send data

• In the case of video, a lot of bandwidth might be consumed unless only active speakers send video

• Floor control is only possible with cooperating users

• Security: simple! Do not send data to members you do not trust

• End systems need to mix the traffic, which makes them more complex

How to Realize? End point based

• All establish a connection to the chosen mixer, which is one of the end systems

• Outgoing bandwidth at the mixing end point: it sends one copy of each packet per remote participant; a simple voice session at 64 kb/s translates to 64 x (N-1) kb/s

• Incoming bandwidth: if silence suppression is used, only active speakers send data

• In the case of video, a lot of bandwidth might be consumed unless only active speakers send video

• One of the end systems needs to mix the traffic, which makes that end system more complex

• The most widely used solution for three-way conferencing

How to Realize? Peer-to-Peer

• Mixing is done at the end systems

• Increases processing over-head at the end systems

• Increases overall delay: a stream may be mixed multiple times

• If central points leave a conference the conference is dissolved

• Security: must trust all members. Any member could send all data to non-trusted users

• Access control: must trust all members. Any member can invite new members

• Floor control: requires cooperating users
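The flow and bandwidth figures quoted for the centralized and full-mesh designs can be compared with a few lines of arithmetic. This is only a sketch: it assumes N counts all participants (so every host has N-1 peers), and the function names are made up for illustration.

```python
def centralized_flows(n, mixing=True):
    """Return (received, sent) flow counts at the central point for n participants."""
    received = n
    # With mixing, one mixed flow goes back to each participant.
    # Without mixing, each participant gets the n-1 flows of all the others.
    sent = n if mixing else n * (n - 1)
    return received, sent


def full_mesh_uplink_kbps(n, stream_kbps=64):
    """Uplink needed by one host that sends its own stream to its n-1 peers."""
    return stream_kbps * (n - 1)


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, centralized_flows(n), centralized_flows(n, mixing=False),
              full_mesh_uplink_kbps(n))
```

The uplink of a full-mesh host grows linearly with the conference size, while an unmixed central point grows quadratically; this is the reason mixing matters as soon as the conference is larger than a handful of users.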

Transport considerations

• Transport layer: most group communication systems run on top of unicast sessions. Multicast was very popular in the past.

• Application layer: RTP over UDP. Why not TCP? TCP offers better NAT traversal capabilities (Skype uses it as a last resort), but it is not really suitable for real-time feedback (why?).

• Control protocol: for interactive conferencing, SIP, H.323, Skype, etc.; for webcasts, RTSP, RealAudio and other flavours.

• Session description: SDP (Session Description Protocol).
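To make "RTP over UDP" more concrete, here is a minimal sketch of the 12-byte fixed RTP header (version 2, no padding, no extension, no CSRC entries). The helper names are illustrative; payload type 0 is the static assignment for G.711 µ-law, and the SSRC value in the usage lines is arbitrary.

```python
import struct

RTP_VERSION = 2


def build_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Pack the fixed RTP header; the result is prepended to the media payload."""
    byte0 = RTP_VERSION << 6  # padding, extension and CSRC count are all zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Unpack the first 12 bytes of an RTP packet into a dictionary."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    header = build_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x12345678)
    print(parse_rtp_header(header))
```

The sequence number lets the receiver detect loss and reordering, and the timestamp drives the playout schedule; UDP itself provides neither.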

IP Multicast

• Why? Most group communication applications are built on top of unicast sessions. With unicast, every packet has exactly one recipient, so a sender must transmit one copy per receiver.

• How? Enhance the network with support for group communication. Optimal distribution is delegated to the network routers instead of the end systems. Receivers inform the network of their wish to receive the data of a communication session. Senders send a single copy, which is distributed to all receivers.

Multicast vs. Unicast

[Figure: network with hosts A, B, C, D and E]

• File transfer from C to A, B, D and E

• Unicast: multiple copies

• Multicast: single copy

IP Multicast

• True N-way communication: any participant can send at any time and everyone receives the message.

• Unreliable delivery: based on UDP. Why? It avoids hard problems (e.g., ACK explosion).

• Efficient delivery: packets traverse each network link only once (i.e., tree delivery).

• Location-independent addressing: one IP address per multicast group.

• Receiver-oriented service model: receivers can join or leave at any time, and senders do not know who is listening.
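The receiver-oriented model shows up directly in the socket API: a receiver asks its local stack to join a group, while a sender just sends a datagram to the group address. The sketch below uses the standard `IP_ADD_MEMBERSHIP` and `IP_MULTICAST_TTL` socket options; the group address 239.1.2.3 and port 5004 are arbitrary example values, and whether packets are actually delivered depends on multicast support in the local network.

```python
import socket
import struct


def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure: group address followed by the local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group, port):
    """Join a multicast group; the host then signals its interest to the network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send_once(group, port, payload, ttl=1):
    """Send a single copy to the group; the sender does not know who is listening."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
    sock.sendto(payload, (group, port))
    sock.close()


if __name__ == "__main__":
    receiver = open_receiver("239.1.2.3", 5004)  # example group and port
    send_once("239.1.2.3", 5004, b"hello group")
    print(receiver.recvfrom(1500))
```

Note that the sender never joins the group and opens no per-receiver connection, which matches the service model described above.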

IP Multicast addresses

• Reserved IP addresses: special IP addresses (class D), 224.0.0.0 through 239.255.255.255.

Class D: the prefix 1110 followed by 28 bits, i.e. about 268 million groups (plus scoping for additional reuse).

224.0.0.x: local network only. 224.0.0.1: all hosts. Static addresses for popular services (e.g., SAP, the Session Announcement Protocol).
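The address arithmetic can be checked with Python's standard `ipaddress` module. This is a small sketch; the helper names are illustrative.

```python
import ipaddress

# Class D: the four leading bits are 1110, i.e. the prefix 224.0.0.0/4.
CLASS_D = ipaddress.ip_network("224.0.0.0/4")
# 224.0.0.x: groups that stay on the local network.
LOCAL_NETWORK = ipaddress.ip_network("224.0.0.0/24")


def is_multicast(addr):
    """True if addr lies in 224.0.0.0 through 239.255.255.255."""
    return ipaddress.ip_address(addr) in CLASS_D


def is_local_network_group(addr):
    """True if addr is one of the 224.0.0.x local-network groups."""
    return ipaddress.ip_address(addr) in LOCAL_NETWORK


if __name__ == "__main__":
    print(CLASS_D.num_addresses)           # 2**28 group addresses
    print(is_multicast("239.255.255.255"))
    print(is_local_network_group("224.0.0.1"))
```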

Alternatives to Multicast

• Use application-level multicast: multicast routing is done by the end hosts.

Hosts build multicast routing tables and act as multicast routers (but at the application level).

Users request content using unicast, and the content is distributed over unicast to the final users.

Application level Multicast vs. unicast

[Figure: two panels, each showing a content source and its receivers: "Traditional" and "Application-level multicast"]

Conference mixer architecture

• Main components of a centralized conference mixer: coder/decoder (plus quality-ensuring components), synchronization, mixer.

• Processing pipeline:

Audio Mixing

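[Figure: audio mixing pipeline. Incoming G.711, G.729 and GSM streams A, B and C pass through decoders (D). Driven by a periodic timer, the mixer computes X = A + B + C and sends X - A = B + C, X - B = A + C and X - C = B + A back to the respective participants through encoders (E).]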
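The mixing step X = A + B + C, with each participant receiving X minus its own signal, can be sketched as follows. The sketch assumes already decoded 16-bit linear PCM frames of equal length; the function names and the hard clipping are illustrative choices, not taken from the slides.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clip(sample):
    """Keep a mixed sample inside the 16-bit PCM range."""
    return max(INT16_MIN, min(INT16_MAX, sample))


def mix_frames(frames):
    """frames maps participant -> list of decoded PCM samples (equal lengths).

    Returns participant -> mixed frame without that participant's own audio.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # X: the sum of all streams, computed once per timer tick.
    total = [sum(frame[i] for frame in frames.values()) for i in range(length)]
    # X - own: one subtraction per participant instead of a full re-mix.
    return {name: [clip(total[i] - frame[i]) for i in range(length)]
            for name, frame in frames.items()}


if __name__ == "__main__":
    decoded = {"A": [1000, -2000], "B": [10, 20], "C": [30000, 30000]}
    print(mix_frames(decoded))
```

Computing the total once and subtracting per participant keeps the cost linear in the number of streams; each output frame would then go to that participant's encoder.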

Audio Quality

• Mostly based on "best effort" networks: there are no guarantees at all. Packets get lost and/or delayed depending on the congestion status of the network.

• Depending on the codec, different quality levels can be reached: this mostly reduces to a "needed bandwidth vs. quality" trade-off. Desirable properties: loss resistance, low complexity (easy to implement in embedded hardware).

• Audio data has to be played out at the same rate at which it was sampled: different buffering techniques have to be considered, depending on the application. Pure streaming (radio/TV) is not interactive and thus not affected by delay; quality is everything. Interactive conferencing needs short delays to guarantee the real-time property; users experience delay as "very annoying" in such applications.

Codec quality measurements

• Codecs: Mean Opinion Score (MOS) measurements:

Codecs: loss resistance

Codecs: complexity

Audio quality: packet loss

• Packet loss: the impact on voice quality depends on many factors:

Average rate: loss rates under 2 to 5% (depending on the codec) are almost inaudible. Above 15% (highly dependent on the burstiness), most calls are experienced as unintelligible.

Burstiness: depending on the loss distribution, the impairment can vary from small artifacts due to packet loss concealment to really annoying quality loss.

Modern codecs like iLBC, which focus exclusively on VoIP, are much more loss-resistant and should thus be preferred to PSTN-based low-bitrate codecs.

For media servers, and conferencing bridges in particular, we should concentrate on receiver-based methods, as every other method would be incompatible with the customers' phones.

Solutions: support appropriate codecs, ensure a minimal link quality and implement a reasonable packet loss concealment (PLC) algorithm.
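As an illustration of a receiver-based method, the sketch below conceals lost packets by repeating the last good frame with decreasing gain. This is one of the simplest possible PLC schemes and is only an assumption about what "reasonable" could mean here; production codecs use more elaborate, codec-specific concealment. The frame size and attenuation factor are arbitrary.

```python
def conceal(frames, frame_size, attenuation=0.5):
    """Replace lost frames (None) by a faded repetition of the last good frame.

    frames: list of PCM frames (lists of ints); None marks a lost packet.
    """
    output = []
    last_good = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last_good, gain = frame, 1.0
            output.append(list(frame))
        elif last_good is not None:
            # Each consecutive loss fades further, so long bursts decay to silence.
            gain *= attenuation
            output.append([int(sample * gain) for sample in last_good])
        else:
            # Nothing received yet: play silence.
            output.append([0] * frame_size)
    return output


if __name__ == "__main__":
    received = [[100, -100], None, None, [8, 8]]
    print(conceal(received, frame_size=2))
```

The fading matters for bursty loss: repeating a frame at full volume over a long burst produces an audible buzzing tone, whereas a decaying repetition degrades into a short dropout.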

Audio quality: jitter

• Delay variation (jitter). Why?

Varying buffering times at the routers along the packets' path. Inherent to the transmission medium (e.g. WiFi).

Depending on the buffering algorithm, quality impairments are mostly caused by an excessive mouth-to-ear delay or by late loss.

Mouth-to-ear delay: while delays under 100 ms are not noticeable, values over 400 ms make a natural conversation very difficult.

Late loss: if the buffering delay is smaller than the actual delay, some packets arrive after their scheduled playout time. This effect is called "late loss".

Delivering good voice quality means, apart from packet loss concealment, minimizing delay and late loss.
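The trade-off between delay and late loss can be made concrete with a small calculation. The sketch assumes that each packet's delay from sending to arrival is known and that a packet is played out a fixed time after it was sent; the delay values and function names are invented for illustration.

```python
def late_loss_rate(delays_ms, playout_delay_ms):
    """Fraction of packets that arrive after their scheduled playout time."""
    if not delays_ms:
        return 0.0
    late = sum(1 for delay in delays_ms if delay > playout_delay_ms)
    return late / len(delays_ms)


def smallest_playout_delay(delays_ms, max_late_rate):
    """Smallest observed delay value that keeps late loss within max_late_rate."""
    for candidate in sorted(set(delays_ms)):
        if late_loss_rate(delays_ms, candidate) <= max_late_rate:
            return candidate
    return None


if __name__ == "__main__":
    delays = [40, 55, 120, 60, 45, 210, 50, 48]  # made-up per-packet delays in ms
    print(late_loss_rate(delays, 100))
    print(smallest_playout_delay(delays, 0.125))
```

A single delay spike forces a choice: either accept the late packet as lost, or raise the playout delay for every packet in the call.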

Jitter: example

Adaptive playout

• Static buffer: playout is delayed by a fixed value. The buffer size is computed once for the rest of the call. Some clients implement a panic mode, increasing the buffer size dramatically (x 2) if the late loss rate is too high.

Advantages: very low complexity.

Drawbacks: high delay. Performs poorly if the jitter is too high. Does not solve the clock skew problem.
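A static buffer with panic mode might look like the sketch below. The window length and the late-loss threshold that triggers the doubling are hypothetical parameters; the slides only state that the buffer size is doubled when the late loss rate is too high.

```python
from collections import deque


class StaticPlayoutBuffer:
    """Fixed playout delay with a panic mode that doubles it on heavy late loss."""

    def __init__(self, delay_ms, window=50, panic_late_rate=0.2):
        self.delay_ms = delay_ms
        self.window = window                    # packets per observation window (assumed)
        self.panic_late_rate = panic_late_rate  # threshold for panic mode (assumed)
        self.history = deque(maxlen=window)

    def on_packet(self, network_delay_ms):
        """Return True if the packet arrives in time to be played out."""
        late = network_delay_ms > self.delay_ms
        self.history.append(late)
        window_full = len(self.history) == self.window
        if window_full and sum(self.history) / self.window > self.panic_late_rate:
            self.delay_ms *= 2    # panic mode: double the buffer
            self.history.clear()  # start a fresh observation window
        return not late


if __name__ == "__main__":
    buffer = StaticPlayoutBuffer(60, window=4, panic_late_rate=0.5)
    print([buffer.on_packet(d) for d in (50, 80, 90, 100)], buffer.delay_ms)
```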

Adaptive playout (2)

• Dynamic buffer: talk spurt based. Within a phone call, a speaker is rarely active all the time, so it is possible to distinguish between voiced and unvoiced segments. Adjusting the buffering delay within unvoiced segments has no negative impact on the voice quality. Using a delay prediction algorithm on the previous packets, we then try to calculate the appropriate buffering delay for the next voiced segment.

Advantages: low complexity. Solves the clock skew problem.

Drawbacks: needs Voice Activity Detection (VAD), either at the sender or at the receiver. High delay. Performs poorly if the jitter varies fast (within a voiced segment).
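The slides do not name a specific delay prediction algorithm. As one plausible choice, the sketch below uses exponentially weighted averages of the delay and of its deviation, and applies the new playout delay only at the start of a talk spurt. The smoothing factor, the safety factor of 4 and the initial delay are assumptions made for illustration.

```python
class TalkspurtDelayEstimator:
    """Running estimate of the network delay and of its variation."""

    def __init__(self, alpha=0.9, safety=4.0):
        self.alpha = alpha    # smoothing factor (assumed value)
        self.safety = safety  # deviation multiples kept as margin (assumed value)
        self.mean = None
        self.deviation = 0.0

    def observe(self, delay_ms):
        if self.mean is None:
            self.mean = float(delay_ms)
            return
        self.mean = self.alpha * self.mean + (1 - self.alpha) * delay_ms
        self.deviation = (self.alpha * self.deviation
                          + (1 - self.alpha) * abs(delay_ms - self.mean))

    def playout_delay(self):
        return self.mean + self.safety * self.deviation


def playout_delays(packets, initial_delay_ms=100.0):
    """packets: list of (delay_ms, starts_talkspurt). Returns the delay applied to each."""
    estimator = TalkspurtDelayEstimator()
    current = initial_delay_ms
    applied = []
    for delay_ms, starts_talkspurt in packets:
        # The delay is only changed at a talk spurt boundary, never inside speech.
        if starts_talkspurt and estimator.mean is not None:
            current = estimator.playout_delay()
        applied.append(current)
        estimator.observe(delay_ms)
    return applied


if __name__ == "__main__":
    trace = [(50, True), (70, False), (40, False), (65, True), (55, False)]
    print(playout_delays(trace))
```

Because the estimate is frozen for the duration of a talk spurt, a jitter spike inside a voiced segment still causes late loss, which is the drawback listed above.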

Adaptive playout (3)

• Dynamic buffer: packet based.

Based on Waveform Similarity Overlap-Add (WSOLA) time-scale modification.

Enables packet scaling without pitch distortion. Very good voice quality: scaling factors from 0.5 to 2.0 are mostly inaudible if done locally. But: high processing complexity.

WSOLA: how does it work?
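In outline, WSOLA builds the output from overlapping windowed segments of the input. Segments are read from the input at a stride that differs from the output stride, and each read position is shifted slightly so that the chosen segment best matches the natural continuation of the previous one, which avoids phase jumps and keeps the pitch. The following is a simplified sketch, not a production implementation: it assumes a Hann window with 50% overlap and a squared-difference similarity measure, and the parameter `speed` (output duration is roughly input duration divided by `speed`) as well as the frame and tolerance sizes are illustrative choices.

```python
import math


def wsola(x, speed, frame=256, tolerance=64):
    """Time-scale the samples x; speed > 1 shortens, speed < 1 lengthens."""
    hop = frame // 2
    # Periodic Hann window: with 50% overlap, overlapping windows sum to one.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_frames = max(1, int((len(x) / speed - frame) // hop) + 1)
    y = [0.0] * ((n_frames - 1) * hop + frame)
    # Zero padding so that segments read near the end keep their full length.
    padded = list(x) + [0.0] * (frame + hop + tolerance)

    def segment(start):
        return padded[start:start + frame]

    previous = 0
    for k in range(n_frames):
        target = int(round(k * hop * speed))  # nominal read position in the input
        best = target
        if k > 0:
            # The segment that would naturally follow the previous one.
            natural = segment(previous + hop)
            best_error = float("inf")
            low = max(0, target - tolerance)
            high = min(len(x) - 1, target + tolerance)
            for candidate in range(low, high + 1):
                error = sum((a - b) ** 2 for a, b in zip(natural, segment(candidate)))
                if error < best_error:
                    best, best_error = candidate, error
        chosen = segment(best)
        for n in range(frame):
            y[k * hop + n] += window[n] * chosen[n]  # overlap-add into the output
        previous = best
    return y


if __name__ == "__main__":
    tone = [(1 + 0.001 * n) * math.sin(0.05 * n) for n in range(2000)]
    print(len(wsola(tone, 2.0)), len(wsola(tone, 1.0)), len(wsola(tone, 0.5)))
```

The similarity search over every candidate position is what makes the method expensive, in line with the complexity remark on the previous slide.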
