Orchestrating the execution of workflows for media streaming service and even more
Shuen-Huei (Drake) Guan
sr. principal engineer, KKBOX
vice chairperson, PyCon APAC 2015
Who am I?
• administrator, Ptt BBS
• technical director / R&D manager, Digimax
• team player, KKBOX
• contributor, PyCon Taiwan
Let's work on a video-on-demand service
• Adaptive streaming.
• DRM protection.
• Video processing on cloud.
Issue 1. Workflow
multiple distinct interconnected steps that need to be executed in a particular order in a distributed environment...
-- someone
flickr:siddhu2020 http://bit.ly/1FAukT2
def run(source, secret_key, cipher):
    # verify if the source is ok.
    if not verify(source):
        return False

    # convert audio with different bitrates
    _ = [convert(source, i) for i in range(4)]

    # update id3 tag for all converted audios
    _ = update_id3_tag(_)

    # encrypt all audios
    _ = encrypt(_, secret_key, cipher)

    # deploy to backend DB
    deploy(_)

    return True
Sample client code to submit a workflow [1]
$workflow = new Gearman_Workflow('KKBOX_Convert_Audio', array('source' => $source, 'args' => $args));
$workflow->attachCallback(function () {});
$client->run($workflow);
[1] Warning: it's PHP.
Sample worker (server) code to do things [1]
class KKBOX_Convert_Audio extends Gearman_Worker {
    public function run($arg) {
        // check the source
        if (!verify()) return;

        // convert audio with different bitrates
        for ($i=0; $i<4; $i++) {
            convert($i);
        }

        // update id3 tag for all audios
        update_id3_tag();

        // encrypt audios
        encrypt();

        // sequentially deploy to backend DB
        for ($i=0; $i<4; $i++) {
            deploy($i);
        }
    }
}
[1] Warning: it's PHP.
Sample worker (server) code to do things [1]
class KKBOX_Encode_Video extends Gearman_Worker {
    public function run($arg) {
        transcode();
        encrypt();
    }
}

class KKBOX_Convert_Video extends Gearman_Worker {
    public function run($arg) {
        if (!verify()) return;

        // create asynchronous sub-workflows
        $result = create_sub_workflow(KKBOX_Encode_Video);
        // wait for all sub-workflows to finish
        joint($result);

        create_sub_workflow(KKBOX_Package_DASH, $result->encrypted);
        create_sub_workflow(KKBOX_Package_HLS, $result->plain);
        joint();

        deploy();
    }
}
[1] Warning: it's PHP.
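The fan-out and join pattern in the video worker above can be sketched in Python with the standard library. This is only an illustration: `encode_video`, `package_dash`, `package_hls` and the values they return are hypothetical stand-ins, not the actual worker code.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_video(source):
    # Hypothetical stand-in for transcode() + encrypt().
    return {"plain": source + ".plain", "encrypted": source + ".enc"}


def package_dash(encrypted):
    # Hypothetical stand-in for the DASH packaging sub-workflow.
    return "dash(" + encrypted + ")"


def package_hls(plain):
    # Hypothetical stand-in for the HLS packaging sub-workflow.
    return "hls(" + plain + ")"


def convert_video(source):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # First sub-workflow; result() blocks until it finishes (the join).
        encoded = pool.submit(encode_video, source).result()
        # Fan out the two packaging sub-workflows, then join on both.
        dash = pool.submit(package_dash, encoded["encrypted"])
        hls = pool.submit(package_hls, encoded["plain"])
        return [dash.result(), hls.result()]


outputs = convert_video("source.mp4")
```

In a distributed setup the thread pool would be replaced by whatever queue or service hands tasks to remote workers; the ordering constraints stay the same.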
Issue 3. Workflows would evolve...
• Let's save file size and IO.
• Let's make it faster.
• Let's add some more profiles.
• Let's fix some encoding.
Everything fails all the time.
-- Werner Vogels, CTO of Amazon
flickr:Bill Abbott http://bit.ly/1GnrSGr
Factors we want to pay close attention to
• Encoding workflow
• Distributing tasks across machines in the cloud.
• Server maintenance.
We hope ...
1. no need to maintain this system;
2. easier to distribute workflow/tasks, even to local machine;
3. with high-level workflow.
As long as you can draw your processes on paper, you can map them to a workflow!
What Google suggests to us...
• Apache Kafka, Mesos, ...
• Gearman (sorry, but we've tried.)
• Luigi by Spotify
• Celery
• Potentially all message brokers with some additional work.
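import boto.swf.layer2 as swf

# DOMAIN, VERSION and TASKLIST are constants defined elsewhere.
class HelloWorker(swf.ActivityWorker):

    domain = DOMAIN
    version = VERSION
    task_list = TASKLIST

    def run(self):
        # Long-poll SWF for an activity task.
        activity_task = self.poll()
        if 'activityId' in activity_task:
            print('Hello, World!')
            self.complete()
        return True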
import boto.swf.layer2 as swf

# DOMAIN, VERSION and TASKLIST are constants defined elsewhere.
class HelloDecider(swf.Decider):

    domain = DOMAIN
    task_list = TASKLIST
    version = VERSION

    def run(self):
        # Long-poll SWF for a decision task and its event history.
        history = self.poll()
        if 'events' in history:
            # Find workflow events not related to decision scheduling.
            workflow_events = [e for e in history['events']
                               if not e['eventType'].startswith('Decision')]
            last_event = workflow_events[-1]

            decisions = swf.Layer1Decisions()
            if last_event['eventType'] == 'WorkflowExecutionStarted':
                decisions.schedule_activity_task(...)
            elif last_event['eventType'] == 'ActivityTaskCompleted':
                decisions.complete_workflow_execution()
            self.complete(decisions=decisions)
        return True
SWF
• Decider defines the workflow.
• We still need to write workflow logic in decider.
• Workers do the action.
• Every time we change a workflow or an action, we need to re-deploy deciders and workers.
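The decider's role described above boils down to mapping the event history to the next decision. A minimal sketch of that idea without any SWF dependency follows; the `decide` function and the strings it returns are hypothetical, and only the event type names are taken from the decider code shown earlier.

```python
def decide(events):
    """Return the next decision for a list of SWF-style history events."""
    # Ignore events that only record decision scheduling.
    workflow_events = [e for e in events
                       if not e["eventType"].startswith("Decision")]
    if not workflow_events:
        return None
    last = workflow_events[-1]["eventType"]
    if last == "WorkflowExecutionStarted":
        return "schedule_activity_task"
    if last == "ActivityTaskCompleted":
        return "complete_workflow_execution"
    return None


history = [
    {"eventType": "WorkflowExecutionStarted"},
    {"eventType": "DecisionTaskScheduled"},
    {"eventType": "DecisionTaskStarted"},
]
next_step = decide(history)
```

Because the workflow logic lives in code like this, changing the workflow means changing and re-deploying the decider.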
Job script for a workflow
Job {KKBOX Convert Video} -subtasks {
    Task {Source Inspection} -cmds {
        Cmd { emilia verify -i s3://bucket/source.mp4 }
    }

    Task {Transcode} --parallel -subtasks {
        Iterate i -from 0 -to 4 -by 1 -template {
            Task {Transcode Audio} -cmds {
                Cmd { ffmpeg -i s3://bucket/source.mp4 -o /tmp/converted_$i.mp4 }
            }
        }
        Iterate i -from 0 -to 8 -by 1 -template {
            Task {Transcode Video} -cmds {
                Cmd { ffmpeg -i s3://bucket/source.mp4 -o /tmp/converted_$i.mp4 }
            }
        }
    }

    Task {Adaptive} -subtasks {
        Task {DASH} -subtasks { }
        Task {HLS} -subtasks { }
        Task {MSS} -subtasks { }
    }
}
Make it pythonic if that makes developers happier
source = 's3://bucket/source.mp4'
with Job():
    with Task('Source Inspection'):
        Cmd('emilia verify -i %s' % source)

    with Task('Transcode', parallel=True):
        for i in range(4):
            with Task():
                Cmd('ffmpeg -i %s ... -o /tmp/a_%d.mp4' % (source, i))
        for i in range(9):
            with Task():
                Cmd('ffmpeg -i %s ... -o /tmp/v_%d.mp4' % (source, i))

    with Task('Adaptive'):
        with Task('DASH'):
            pass
        with Task('HLS'):
            pass
        with Task('MSS'):
            pass
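One way such a `with`-based job script could be backed is with context managers that build a task tree as the blocks are entered. The sketch below is an assumption about how `Job`, `Task` and `Cmd` might work, not the actual implementation; the `Node` base class and its `_stack`, `children` and `cmds` attributes are hypothetical.

```python
class Node(object):
    """A named node that attaches itself to the enclosing open node."""

    _stack = []  # nodes currently open in a `with` block

    def __init__(self, name=None, parallel=False):
        self.name = name
        self.parallel = parallel
        self.children = []
        self.cmds = []
        if Node._stack:
            Node._stack[-1].children.append(self)

    def __enter__(self):
        Node._stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        Node._stack.pop()
        return False  # never swallow exceptions


class Job(Node):
    pass


class Task(Node):
    pass


def Cmd(command):
    # Record the command on the innermost open node.
    Node._stack[-1].cmds.append(command)


source = 's3://bucket/source.mp4'

with Job() as job:
    with Task('Source Inspection'):
        Cmd('emilia verify -i %s' % source)
    with Task('Transcode', parallel=True):
        for i in range(4):
            with Task():
                Cmd('ffmpeg -i %s ... -o /tmp/a_%d.mp4' % (source, i))
```

After the outer block exits, `job` holds the whole tree, which a scheduler could then walk to dispatch tasks serially or in parallel.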
Status
• 1,500,000 minutes of video encoded.
• 3,000 videos per day (max).
• 800 workers on 100 c3.8xlarge instances (max).
• Spent lots of $.
• Everyone is really happy with that performance.
Technical status
• Fault tolerance by retry. [decider]
• Workflow/task has priorities. [SWF]
• try..except..finally mechanism. [-whendone, -whenerror, -precmds, -postcmds, ...]
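The retry and try..except..finally behaviour listed above could look roughly like this in Python. It is a sketch under assumptions: `run_task`, `max_retries` and the `when_done` / `when_error` callbacks are hypothetical names loosely modeled on the `-whendone` and `-whenerror` options, not the real system.

```python
def run_task(action, max_retries=3, when_done=None, when_error=None):
    """Run `action`, retrying on failure (max_retries must be at least 1)."""
    last_error = None
    for _ in range(max_retries):
        try:
            result = action()
        except Exception as error:  # a real system would be more selective
            last_error = error
            continue
        if when_done:
            when_done(result)  # success hook, like -whendone
        return result
    if when_error:
        when_error(last_error)  # failure hook, like -whenerror
    raise last_error


calls = []


def flaky():
    # Hypothetical task that fails twice, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "encoded"


log = []
outcome = run_task(flaky, when_done=log.append)
```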
To do:
• Use JSON or YAML for job script.
• A viewer to see the progress of workflows!
• Replace SWF with Apache Mesos or Mistral.
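As one possible shape for the JSON job script mentioned in the to-do list, the sketch below describes a job as nested dictionaries and serializes it with the standard `json` module. The key names (`name`, `parallel`, `cmds`, `subtasks`) are assumptions for illustration, not an actual schema.

```python
import json

source = "s3://bucket/source.mp4"

# Hypothetical structure mirroring the Job / Task / Cmd nesting.
job = {
    "name": "KKBOX Convert Video",
    "subtasks": [
        {"name": "Source Inspection",
         "cmds": ["emilia verify -i %s" % source]},
        {"name": "Transcode",
         "parallel": True,
         "subtasks": [{"name": "Transcode Audio %d" % i} for i in range(4)]},
        {"name": "Adaptive",
         "subtasks": [{"name": n} for n in ("DASH", "HLS", "MSS")]},
    ],
}

# Serialize to a job script and read it back.
job_script = json.dumps(job, indent=2, sort_keys=True)
restored = json.loads(job_script)
```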
We are hiring
• Video Engineer
• Full Stack Developer
• Python Developer