Harvard IT Summit 2016 - Opencast in the Cloud at Harvard DCE: Live and On-Demand from the DCE...
Posted on 12-Apr-2017
Extending Harvard to part-time learners with the academic ability, curiosity, and drive to succeed at Harvard.
Opencast in the Cloud at Harvard DCE
Video and Course Content Capture, Processing, Management, and Distribution
jody_fanto@harvard.edu
Software Architect and Director of Software Development
Our Reliability Requirements
● I am on-call 24/7/365
Our Current Architecture
● Capture Agents (CAs) are the only machines on campus
● Everything else is in AWS
● Live content: CAs → Akamai
● Video On Demand (VOD) content: CAs → ingest to admin → processed on workers → producers polish content → output files pushed to S3 → CloudFront
Goals:
● Unusually high-quality recording and playback
○ Sleek, modern player
● Live streaming and Video On Demand (VOD)
● Global audience: consistent experience regardless of timezone
● Exceptionally reliable capture, transcoding, and distribution
● Fast processing time; the volume of content increases most semesters
● Robust: exception cases handled quickly and, ideally, automatically
● Very responsive support
Goals
● Tools for production staff, teaching staff, technical staff, and operations:
○ Archive, republishing
○ Production team interface, including trimming and uploading content from other sources (e.g., Premiere)
○ Workflow Browser, Student Viewing Analytics, Capture Agent Status Board
● Automated deployments:
○ Consistency between Production, Stage, and Development clusters
○ Means to run large-scale experiments:
■ Confirm that new optimizations will work as expected in Prod
■ Tune performance: storage, compute, and networking configuration
■ Track down bugs that only happen under heavy load
At its root: an extensible workflow engine
● “Workflows” are a series of “Workflow Operations”
● Workflow Operation = Java class:
○ Transcode, automatically detect slide changes, etc.
○ Can do anything Java can do
● Workflow = XML file:
○ A series of operations listed in order, with some transitions, e.g., wait until a producer sets start and end points
○ Different workflows for different use cases, e.g., live capture, or injecting content from Premiere
● Current production workflows are ~20 steps long
● It’s fairly easy to handle new processing use cases:
○ Write a workflow
○ Write any new Operations as needed
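The workflow idea above can be sketched in a few lines. This is an illustrative Python sketch, not Opencast code (real Operations are Java classes wired together by XML workflow definitions); all names here are invented:

```python
# Sketch: a "workflow" is an ordered list of named operations applied to a
# media package. Opencast's real engine is Java driven by XML definitions.

from typing import Callable, Dict, List

# Registry of operation name -> handler (the analogue of Operation classes)
OPERATIONS: Dict[str, Callable[[dict], dict]] = {}

def operation(name: str):
    """Register a workflow operation under a name."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        OPERATIONS[name] = fn
        return fn
    return register

@operation("inspect")
def inspect(media: dict) -> dict:
    media.setdefault("log", []).append("inspected")
    return media

@operation("transcode")
def transcode(media: dict) -> dict:
    media.setdefault("log", []).append(f"transcoded to {media['profile']}")
    return media

@operation("publish")
def publish(media: dict) -> dict:
    media.setdefault("log", []).append("published")
    return media

def run_workflow(steps: List[str], media: dict) -> dict:
    """Run each named operation in order (real workflows are ~20 steps)."""
    for step in steps:
        media = OPERATIONS[step](media)
    return media

vod_workflow = ["inspect", "transcode", "publish"]
result = run_workflow(vod_workflow, {"profile": "720p"})
print(result["log"])  # ['inspected', 'transcoded to 720p', 'published']
```

Handling a new use case then means listing a new sequence of steps, and writing a new operation only if no existing one fits.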
Architecture - Goal
[Diagram: classrooms, each with a camera and a laptop, connected by “A/V miracles/magic” to students]
Architecture - Capture Agents (CAs)
[Diagram: in each classroom, the camera and laptop feed a CA; “A/V miracles/magic” then carries the streams to students]
The Search for Capture Agents (CAs)
● Most 3rd-party systems require that you buy their hardware and software:
○ Tight integration that you don’t have to worry about
○ Limited by their box’s quality, resolutions, bitrates, number of streams, types of streams, etc.
○ They set the price, and declare the lifetime of your box (e.g., 3 years)
○ Is this good? Bad? Depends on your use cases and funding
● Opencast’s CA API is fully open:
○ Any capture hardware that has an API
○ Any capture software compatible with the CA API
The Search for Capture Agents (CAs)
○ Capture hardware options:
■ Vendor CAs (including their capture software): Extron, NCast, DataPath, Teltek, etc.
■ Build it yourself, or have an OEM build it for you
■ General A/V boxes: KiPros, Epiphan Pearls, etc.
○ Vendors’ Opencast-specific CAs, and open-source Capture Agent (CA) software:
■ Galicaster
■ PyCA
■ Harvard DCE’s mhpearl
○ The hard parts:
■ Identifying the use cases you need to solve
■ Finding a CA that works well with your A/V setups in your classrooms, with your networking, etc.
Epiphan Pearl
● Commodity A/V recorder
○ Maxwell Dworkin 119: a single Epiphan used by two different capture systems
● Delightfully reliable
● You can mix and match up to 4 sources into a single channel: side by side, picture-in-picture, etc.
● Multiple channels can be recorded and streamed at the same time
● (Not used) Can do live switching between sources if an operator is available to run it during an event
Add Opencast VOD; add multiple bitrate streams
[Diagram: CAs (camera + laptop per classroom) feed the Opencast cluster (Workflow Engine, Engage API); students’ browsers play back via the Engage API, alongside the live “A/V magic” path]
Playback Resolutions and Bit Rates
Video On Demand (VOD):
● 720p @ 5 Mbps
● 540p @ 2 Mbps
● 360p @ 400 Kbps
● 180p @ 150 Kbps
Live Streaming:
Doublewide: presenter and presentation in a single stream
● 1080p @ 5 Mbps
● 810p @ 2 Mbps
● 540p @ 400 Kbps
● 270p @ 150 Kbps
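The point of these ladders is that playback can pick a rendition that fits each viewer’s connection. A minimal sketch of that idea (this is not DCE’s player code; the “highest rendition that fits” rule is a generic heuristic):

```python
# The VOD encoding ladder from the slide above, highest quality first:
# (height in pixels, bitrate in Kbps)
VOD_LADDER = [
    (720, 5000),
    (540, 2000),
    (360, 400),
    (180, 150),
]

def pick_rendition(available_kbps: float):
    """Return the highest-quality rendition whose bitrate fits the link."""
    for height, kbps in VOD_LADDER:
        if kbps <= available_kbps:
            return height, kbps
    return VOD_LADDER[-1]  # fall back to the lowest rung

print(pick_rendition(3000))  # a 3 Mbps link gets 540p @ 2 Mbps
```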
Isn’t that a lot of data to push?
● Mezzanine (“mezz”) files are 20-30 GB per hour of lecture
● Use existing infrastructure:
○ Harvard’s Internet2 connection is ≥ 100 Gbps
○ Amazon’s Internet2 connection is ≥ 40 Gbps
● Slowest hop:
○ From classrooms to Harvard’s outgoing switches
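A quick back-of-envelope check of those numbers. The 1 Gbps figure for the classroom hop is an assumption for illustration; the slide only says it is the slowest hop:

```python
# Even a 30 GB mezzanine file moves quickly at these link speeds.

def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Seconds to move size_gb gigabytes over a link_gbps link (8 bits/byte)."""
    return size_gb * 8 / link_gbps

print(round(transfer_seconds(30, 40)))  # ~6 s on a 40 Gbps Internet2 hop
print(round(transfer_seconds(30, 1)))   # ~240 s on an assumed 1 Gbps classroom hop
```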
Cluster Node Types
1. Engage
a. Metadata search API
i. Queried by the playback viewer running in students’ browsers
ii. What courses am I in? For all courses, what presentations can I watch?
b. Playback for producers pre-publish
2. Admin
a. Job coordinator; ingests
3. Workers
4. Utility node
5. Tools nodes: Workflow Browser, Analytics
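The Engage metadata queries (“What courses am I in? For all courses, what presentations can I watch?”) can be sketched as follows. The data model and function names are hypothetical; the real Engage node exposes a search API that the player queries:

```python
# Hypothetical enrollment/presentation data, invented for illustration.
COURSES = {
    "cs50": {"students": {"alice", "bob"}, "presentations": ["lec01", "lec02"]},
    "e222": {"students": {"alice"}, "presentations": ["lec01"]},
}

def courses_for(student: str):
    """Courses the student is enrolled in ('What courses am I in?')."""
    return sorted(c for c, info in COURSES.items() if student in info["students"])

def watchable(student: str):
    """Per course, the presentations the student may watch."""
    return {c: COURSES[c]["presentations"] for c in courses_for(student)}

print(watchable("alice"))  # both courses' lectures
```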
Add Live Streaming
[Diagram: as before, with a Wowza streaming server added between the live “A/V magic” path and students’ browsers, alongside the Opencast Engage API]
Add CDNs
[Diagram: CDNs added in front of delivery: Akamai for live streams and AWS CloudFront for VOD, between the Opencast Engage API / “A/V magic” paths and students’ browsers]
Insert S3
[Diagram: AWS S3 added behind AWS CloudFront as the origin for VOD files; Akamai still carries live streams to students’ browsers]
Add redundancy for live streams
[Diagram: each classroom now has a Primary CA and a Secondary CA; live streams go to Akamai primary and secondary data centers, fronted by an Akamai load balancer]
Architecture - Compute
● Scale to arbitrary* load:
○ How powerful does each worker need to be?
■ Bigger instances process faster, and cost more
○ Automated horizontal scaling:
■ How many workers do you need?
■ When do you need them?
■ Currently time-based scaling
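Time-based scaling can be sketched as a simple schedule lookup. The hours and worker counts below are invented for illustration; the slide only says the scaling is currently time-based:

```python
from datetime import datetime

# Hypothetical schedule: (start_hour, end_hour, desired worker count).
# Ranges where start > end wrap past midnight.
SCHEDULE = [
    (8, 18, 10),   # business hours: heavy transcoding load
    (18, 23, 4),   # evening: producers finishing up
    (23, 8, 1),    # overnight: keep one worker warm
]

def desired_workers(now: datetime) -> int:
    """Return the worker count for the current hour from the schedule."""
    hour = now.hour
    for start, end, count in SCHEDULE:
        if start < end and start <= hour < end:
            return count
        if start > end and (hour >= start or hour < end):  # wraps midnight
            return count
    return 1

print(desired_workers(datetime(2016, 4, 12, 10)))  # 10 during the day
```

A scaler like this would run periodically and add or terminate worker instances until the actual count matches the desired one.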
The ways we use storage
● “Hot” content: lecture content captured recently and awaiting processing and publication by the production team (Zadara)
● “Warm” content: S3 and S3 Infrequent Access
● Archive: republishing
● CloudFront:
○ Engage as source of truth
○ Engage to S3; S3 as source of truth
Student Experience: Player
● Put the student in control:
○ Different layouts, changeable throughout the lecture
○ Different resolutions and bitrates
○ Control display of captions and transcripts
○ Slide deck download (where applicable)
● Code is from the open-source Paella project
● Skinning is inspired by CS50
DCE OpsWorks for Opencast
● Automated deployment
● Automated config management
● One command and you have a new cluster
● Consistency between dev/stage/prod
● Hard wall between clusters
Zadara is great
● High-performance NFS storage in AWS
● It solved our storage problems
● EFS, SoftNAS
AWS is awesome
● It solves your problems:
○ EC2 solved our compute problems
○ S3 met our “warm” and “cold” storage needs
■ S3 Infrequent Access cuts the cost by ⅔
○ RDS solved our DB problems
○ Instances never* go down. The network is always strong*.
● AWS has solutions for most of the things you don’t want to maintain:
○ Message queues, sending email, DBs, map-reduce, etc.
● It allows you to do things that would normally be crazy hard and expensive:
○ Multi-AZ support; multi-region support
AWS is awesome
● It enables you to develop faster
But...
● Cost: even pennies add up
● AWS is complicated:
○ VPC setup
● Some AWS behavior is unexpected:
○ Spin up an instance for 3 minutes of work: pay for a full hour
○ Even instances with guaranteed 10 Gbps throughput only get 1.6 Gbps inbound from the internet
Thoughts on software development
● Customers never know what they want
● If your project is going well, you will have a huge backlog
● People are the hardest part of software development
jody_fanto@harvard.edu
Software Architect and Director of Software Development