WebRTC infrastructures in the large (with experiences on real cloud deployments)

WebRTC infrastructures in the large (with experiences from real deployments) Luis Lopez [email protected] IIT RTC Conference & Expo October 2015

Upload: luis-lopez

Post on 13-Jan-2017

TRANSCRIPT

Page 1: WebRTC infrastructures in the large (with experiences on real cloud deployments)

WebRTC infrastructures in the large (with experiences from real deployments)

Luis Lopez [email protected]

IIT RTC Conference & Expo, October 2015

Page 2: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Speaker

• Coordinator of Kurento.org
– FOSS project
– WebRTC Media Server
– WebRTC Media APIs
– WebRTC Cloud Infrastructure

• Software developer
• Software trainer
• Software learner
• FOSS enthusiast

http://www.kurento.org

http://twitter/@kurentoms
https://www.youtube.com/channel/UCFtGhWYqahVlzMgGNtEmKug

Page 3: WebRTC infrastructures in the large (with experiences on real cloud deployments)

WebRTC infrastructures

[Figure labels: Peer-to-Peer WebRTC Application (without media infrastructure); WebRTC video stream; WebRTC Application based on media infrastructure; media infrastructure.]

Page 4: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Function of WebRTC infrastructures

[Figure labels: Processing; VP8, H.264; Group Communications; Archiving.]

Page 5: WebRTC infrastructures in the large (with experiences on real cloud deployments)

WebRTC infrastructures in the large

From the hundreds to the millions: the scalability problem

WebRTC Cloud

Page 6: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Page 7: WebRTC infrastructures in the large (with experiences on real cloud deployments)

WebRTC cloud models

[Figure: cloud service models IaaS, PaaS, APIaaS and SaaS, built on Computing Resources. One end of the scale is labelled "High flexibility", "Complex development" and "Low hourly costs"; the other end is labelled "Low flexibility", "Simple development" and "High hourly costs". Annotation: "No WebRTC-specific players here".]

Page 8: WebRTC infrastructures in the large (with experiences on real cloud deployments)

WebRTC cloud architectures

[Figure: layered stack, each layer paired with a service model: Virtual infrastructure (IaaS), WebRTC Platform (PaaS), WebRTC API (APIaaS), WebRTC Application (SaaS). Annotations: "No new science here" and "The science for the scalability problem is here".]

Page 9: WebRTC infrastructures in the large (with experiences on real cloud deployments)

WebRTC vs. traditional WWW platforms: the three tiers

[Figure labels: Application Server Container; Service Layer; Application 1 … Application N; WebRTC Media Server; Database (DB) Server; Signaling.]

Page 10: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Vertical scalability on monolithic WebRTC platforms

[Figure labels: Application Server Instance; Application 1 … Application N; Media Server Instance. Annotation: "The bottleneck is here".]

[Plot: "Typical scalability curve for SFU media servers". Axes: Quality of service, Number of WebRTC legs. Annotation: "~500 to 1000 in commodity hardware".]

Page 11: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Horizontal scalability of WebRTC Media Servers

[Figure labels: Load Balancer; three Application Servers; Media Resource Broker (RFC 6917); four Media Servers.]

Page 12: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Media Resource Broker (MRB)

• Functions
– Media server (MS) registration
  • MS instances register on the MRB
– MS brokering
  • Query model
    – Application server (AS) instances query the MRB to locate an MS instance
    – The MRB is explicit for the AS
  • In-line model
    – The MRB routes signaling (control requests)
    – The MRB is transparent for the AS

• The MRB does not hold state about MS instances
– MS instances are independent
– MS instances are equivalent
– We say it's stateless
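The query model above can be illustrated with a minimal sketch. This is not Kurento's broker: the class name `StatelessMRB`, its methods and the `ms-1` style addresses are hypothetical, and picking a server at random is an assumption that only makes sense because the slide treats MS instances as independent and equivalent.

```python
import random


class StatelessMRB:
    """Toy Media Resource Broker (query model), hypothetical API.

    It only keeps the list of registered media servers; it tracks no
    sessions and no relationships between servers.
    """

    def __init__(self, rng=None):
        self._servers = []                    # registered MS addresses
        self._rng = rng or random.Random()    # injectable for repeatable tests

    def register(self, ms_address):
        """MS registration: a media server announces itself."""
        if ms_address not in self._servers:
            self._servers.append(ms_address)

    def unregister(self, ms_address):
        """Forget a media server, e.g. after it is shut down."""
        if ms_address in self._servers:
            self._servers.remove(ms_address)

    def locate(self):
        """MS brokering: an AS asks for any media server instance."""
        if not self._servers:
            raise LookupError("no media server registered")
        # Instances are assumed equivalent, so any of them will do.
        return self._rng.choice(self._servers)
```

In the in-line model the AS would not call `locate()` at all; the broker would sit on the signaling path and forward control requests itself.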

Page 13: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Stateless MRB use cases

• Independent MS
– B2B calls
– WebRTC GW
– Room servers
– Media recording
– Etc.

[Figure labels: Application Server Instance; stateless MRB; four Media Server Instances; two calls.]

Page 14: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Deploying in public and private clouds

• Amazon Web Services EC2
– Most popular public cloud

• OpenStack
– Used by popular public clouds (e.g. Rackspace)
– Popular for private clouds

• Deployment
– Cloud deployment templates
  • CloudFormation (Amazon)
  • Heat (OpenStack)

Page 15: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Templates

– Declarative language for
  • Declaration of resources and relationships
    – Images, Computing Nodes, Networks, Volumes, Load Balancers, Autoscaling groups, etc.
  • Deployment
    – Instantiation of resources
  • Runtime
    – Provisioning
    – Autoscaling
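To make "declaration of resources and relationships" concrete, here is a small sketch that builds a CloudFormation-style template as a Python dictionary. It is not the template used in the talk. The resource type names and properties are standard CloudFormation ones, but the logical names (`MediaServerLaunchConfig`, `MediaServerGroup`), the default instance type and the image id passed in are placeholders chosen for illustration.

```python
import json


def build_template(ami_id, instance_type="t2.medium", min_size=1, max_size=4):
    """Return a minimal CloudFormation-style template as a dict.

    ami_id and instance_type are placeholders supplied by the caller.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Launch configuration: which image and machine size to boot.
            "MediaServerLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": instance_type,
                },
            },
            # Autoscaling group: the relationship to the launch
            # configuration is declared with a Ref.
            "MediaServerGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "LaunchConfigurationName": {"Ref": "MediaServerLaunchConfig"},
                    "MinSize": str(min_size),
                    "MaxSize": str(max_size),
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template("ami-PLACEHOLDER"), indent=2))
```

A Heat template expresses the same idea with its own resource types; the point is only that resources and the links between them are declared, and the cloud takes care of instantiating them.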

Page 16: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Deploying in public clouds

[Figure labels, grouped:
– Build: Source code; Chef + Packer
– Images (AWS AMI / OpenStack Glance): Media Server Image; Application Server Image; Broker Image
– Stack definition template (CloudFormation / Heat): Autoscaling Rules; Launch configurations
– Runtime (AWS EC2 / OpenStack Nova): Elastic Load Balancer; two Autoscaling Groups; Application Server Instances; Broker Instance; Media Server Instances]

Page 17: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Page 18: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Experiences deploying large WebRTC infrastructures in public clouds

• Lessons learnt: fault-resilience is hard
– AS & MRB layers
  • Are stateless => use distributed cache systems
– MS layer
  • Is stateful => lots of problems

[Figure labels: two Application Servers; Media Resource Broker; four Media Servers.]

Page 19: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Lessons learnt: avoid single points of failure

[Figure, two panels.
– "The wrong way (single point of failure)": a single MRB and several MS instances, each on a Computing Node.
– "The right way (fault-tolerant MRB)": an Elastic Load Balancer, several MRB instances with a distributed cache, and MS instances on Computing Nodes.]

Page 20: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Lessons learnt: fault-recovery at the MS layer

• Fault-tolerance on the MS layer
– Stateful problem
  • MS instances hold specific resources that cannot be "serialized" to a distributed cache:
    – Specific sockets
  • Machine failure => session failure
– Our proposed solution
  • Reconstruct the session
    – Detect failure
    – Notify failure
    – Reconnect

[Figure labels: Application Server Instance; MRB; four Media Server Instances; two calls. Annotations: Failure detection; Failure notification; Session reconnection.]
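The detect, notify, reconnect sequence can be sketched as follows. The slides do not say how failures are detected, so the heartbeat timeout is an assumption, and every name here (`FailureDetector`, `recover`, the `notify` callback) is hypothetical. The sketch only reassigns sessions; in a real deployment the media session would have to be rebuilt from scratch on the new server, since sockets cannot be moved.

```python
class FailureDetector:
    """Detects failed media servers from missing heartbeats (assumed mechanism)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}          # MS name -> time of last heartbeat

    def heartbeat(self, ms, now):
        self.last_seen[ms] = now

    def failed(self, now):
        """Return the servers whose last heartbeat is older than the timeout."""
        return sorted(ms for ms, t in self.last_seen.items()
                      if now - t > self.timeout_s)


def recover(sessions, failed_ms, healthy_ms, notify):
    """Reassign sessions that lived on failed servers.

    sessions: dict session_id -> MS name, updated in place.
    notify:   callback(session_id, old_ms, new_ms), so the application
              can ask the clients to reconnect.
    Returns the list of session ids that were moved.
    """
    if not healthy_ms:
        raise RuntimeError("no healthy media server left")
    moved = []
    for session_id, ms in sorted(sessions.items()):
        if ms in failed_ms:
            # Spread the orphaned sessions over the healthy servers.
            new_ms = healthy_ms[len(moved) % len(healthy_ms)]
            notify(session_id, ms, new_ms)
            sessions[session_id] = new_ms
            moved.append(session_id)
    return moved
```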

Page 21: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Autoscaling

Page 22: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Lessons learnt: lack of optimal scale-out events and metrics

• Lessons learnt: firing scale-out events
– Which metric?
– Bottleneck depends on applications: network, CPU, memory, etc.
– Our recommendation: define a synthetic metric (i.e. scaling points) and be conservative

[Plot: "Typical scalability curve for SFU media servers". Axes: Quality of service, Number of WebRTC legs. Annotations: "CPU load 50%", "Memory 40%".]
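One possible reading of "scaling points" is sketched below: each kind of WebRTC leg costs a fixed number of points, each media server has a point budget, and a scale-out event fires well before the budget is exhausted. The leg types, the point values, the capacity and the 50% threshold are all made-up numbers for illustration, not figures from the talk.

```python
# Hypothetical cost of each kind of leg, in scaling points.
POINTS_PER_LEG = {"sfu": 1, "recording": 2, "transcoding": 10}


def used_points(legs):
    """legs: dict mapping leg type -> number of active legs on one server."""
    return sum(POINTS_PER_LEG[kind] * count for kind, count in legs.items())


def should_scale_out(legs_per_server, capacity_points, threshold=0.5):
    """Fire a scale-out event when the fleet uses more than `threshold`
    of its total point budget. A low threshold is the conservative choice."""
    total_used = sum(used_points(legs) for legs in legs_per_server)
    total_capacity = capacity_points * len(legs_per_server)
    return total_used > threshold * total_capacity
```

For example, with two servers of 100 points each, `[{"sfu": 40}, {"sfu": 30, "transcoding": 4}]` uses 110 points, which exceeds half of the 200-point budget, so a scale-out would be triggered.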

Page 23: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Lessons learnt: scaling-in is harder than scaling-out

• The options (none good)
– Expose # sessions as a metric
  • Depends on cloud capabilities
  • AS needs to be made cloud aware
– Session migration
  • AS needs to be made cloud aware
  • Renegotiations
– Retain period
  • Sub-optimal utilization
  • The simplest

[Figure labels: Application Server Instance; MRB; MS1, MS2, MS3, MS4. Caption: "Which one would you remove?"]
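The slide does not define "retain period", so the sketch below rests on one interpretation: a media server chosen for removal stops receiving new sessions and is only terminated once it is empty or a fixed retain period has elapsed. Choosing the server with the fewest sessions is likewise an assumption, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaServer:
    name: str
    sessions: int = 0
    draining_since: Optional[float] = None   # None means "in service"


def assignable(servers):
    """Servers the broker may still hand out for new sessions."""
    return [s for s in servers if s.draining_since is None]


def start_scale_in(servers, now):
    """Mark the least loaded in-service server as draining.

    Returns the chosen server, or None if only one server is in service.
    """
    candidates = assignable(servers)
    if len(candidates) <= 1:
        return None
    victim = min(candidates, key=lambda s: s.sessions)
    victim.draining_since = now
    return victim


def can_terminate(server, now, retain_period_s):
    """A draining server may go once it is empty or the retain period is over."""
    if server.draining_since is None:
        return False
    return (server.sessions == 0
            or now - server.draining_since >= retain_period_s)
```

The draining server keeps running while it empties, which is where the sub-optimal utilization mentioned on the slide comes from.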

Page 24: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Limits of the (stateless) MRB

[Figure labels: Media stream; One to MANY.]

Page 25: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Stateful MRB

[Figure labels: Application Server Instance; Stateful MRB; five Media Server Instances.]

Page 26: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Why?

Page 27: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Stateful because …

• MRB
– Must be aware of media topology
  • Stateful information about MS relationships
– Request routing depends on topology
  • Where to place a new viewer?
– Request routing depends on internal state
  • CPU load
  • QoS
  • Memory
  • Etc.
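The "where to place a new viewer?" question can be illustrated with a toy stateful broker that remembers the tree of media servers and how many viewers each one carries. The placement rule used here (shallowest server with spare capacity, then least loaded) is an assumption for the sake of the example; the slides do not state the rule actually used, and the class and method names are hypothetical.

```python
class StatefulMRB:
    """Toy broker that tracks the media distribution tree (hypothetical API)."""

    def __init__(self, root, max_viewers_per_ms):
        self.parent = {root: None}     # MS name -> parent MS (None for the root)
        self.viewers = {root: 0}       # MS name -> viewers attached to it
        self.max_viewers = max_viewers_per_ms

    def add_media_server(self, ms, parent):
        """Attach a new media server below an existing one."""
        if parent not in self.parent:
            raise KeyError(parent)
        self.parent[ms] = parent
        self.viewers[ms] = 0

    def _depth(self, ms):
        depth = 0
        while self.parent[ms] is not None:
            ms = self.parent[ms]
            depth += 1
        return depth

    def place_viewer(self):
        """Pick the shallowest server with spare capacity, then the least loaded."""
        candidates = [ms for ms, n in self.viewers.items() if n < self.max_viewers]
        if not candidates:
            raise RuntimeError("no capacity left: add a media server first")
        best = min(candidates,
                   key=lambda ms: (self._depth(ms), self.viewers[ms], ms))
        self.viewers[best] += 1
        return best
```

Unlike the stateless broker, the answer to each request depends on what was decided for earlier requests, which is why this state has to survive broker failures.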

Page 28: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Experiences with stateful MRB in AWS EC2 & OpenStack

• Lessons learned: beware of WebRTC internals
– Differentiated quality
  • SVC is the solution
    – But it's not ready
  • Plain SFU forwarding models are not an option
    – RTCP feedback from viewers with bad connectivity destroys QoE
  • Simulcast may be an option
    – Suppress feedback from viewers with really bad connectivity
  • Layered transcoding works nicely
    – But it's expensive
– Churn and the generation of key-frames
  • Periodic key-frame generation is an option
    – In VP8, expect a significant increase in BW consumption
  • Layered transcoding works nicely
    – But it's again expensive

Page 29: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Experiences with stateful MRB in AWS EC2 & OpenStack

• Lessons learned: the cloud is evil
– Placement of incoming WebRTC legs
  • New science required here
    – Ideas?
  • Our solutions
    – Count number of WebRTC legs (points mechanisms)
    – Ad-hoc, hard and error prone
– Fault-resilience
  • New science required here
    – Ideas?
  • Our solution
    – Re-construct internal parts of the tree, but never leaves
    – Requires client renegotiation
    – Ad-hoc, hard and error prone
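A rough sketch of "re-construct internal parts of the tree, but never leaves" follows. It assumes the tree is stored as a child-to-parent map and that the children of a failed internal media server are re-linked to that server's own parent; the slides do not describe the actual repair strategy. The function name and the handling of a failed root are hypothetical.

```python
def repair_tree(parent, failed):
    """Remove a failed media server from a distribution tree.

    parent: dict mapping MS name -> parent MS name (None for the root),
            updated in place.
    Returns the children that were re-linked. A failed leaf returns an
    empty list: nothing is rebuilt for it.
    """
    if failed not in parent:
        raise KeyError(failed)
    grandparent = parent[failed]
    children = sorted(ms for ms, p in parent.items() if p == failed)
    if children and grandparent is None:
        # Assumption: if the root goes, the media source must reconnect.
        raise ValueError("root failed: cannot re-link its children")
    del parent[failed]
    for child in children:
        parent[child] = grandparent    # internal part of the tree is rebuilt
    return children
```

Viewers that were attached directly to the failed server still lose their legs and have to renegotiate, in line with the "requires client renegotiation" point above.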

Page 30: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Page 31: WebRTC infrastructures in the large (with experiences on real cloud deployments)

Thanks

Luis Lopez [email protected]