encoding, fast and slow - stanford university talks/sadjad...encoding, fast and slow: low-latency...

56
Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthik Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, Keith Winstein https://github.com/excamera

Upload: others

Post on 07-Jun-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17)

Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthik Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, Keith Winstein

https://github.com/excamera

Page 2: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Outline

• Vision & Goals

• Background on Video Processing

• Fine-grained Parallel Video Encoding

• μ: Supercomputing as a Service

• Evaluation

• Conclusion & Future Work

2

Page 3: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,
Page 4: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

for Video

* not really a Google product (yet)

Page 5: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

"Make this movie black and white."

Page 6: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

"Apply some awesome filter to my video."

Page 7: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

"Pixelate this face everywhere in this video."

Page 8: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

"Remake Star Wars Episode I without Jar Jar."

Page 9: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Can we achieve interactive speeds with very granular video

processing in a distributed system?

Page 10: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

The dilemmas

• Low-latency video processing requires fine-grained parallelism, but the finer-grain the parallelism, the worse the compression efficiency.

• Even if we have a way to avoid this penalty, we still need thousands of threads, running in parallel, with instant startup.

10

Page 11: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

ExCamera

• In this project, we tried to directly address these two dilemmas.

• We made two contributions:

• A video encoder intended for massive fine-grained parallelism (the first dilemma).

• A framework that orchestrates thousands of threads running in parallel on AWS Lambda (the second dilemma).

• We call the whole system ExCamera.

11

Page 12: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Fine-grained Cloud Computing

• Parallel make

• "Laptop Extension"

12

Page 13: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Outline

• Vision & Goals

• Background on Video Processing

• Fine-grained Parallel Video Encoding

• μ: Supercomputing as a Service

• Evaluation

• Conclusion & Future Work

13

Page 14: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

What's a video?

• A series of still images, displayed in order, at a specific rate.

• Movies are usually shown at 24 frames per second.

14

Page 15: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

4K Video 4096×2160 pixels/frame

~14 MB/frame

24 fps 2.5 Gbps

15

Page 16: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Video Codec

• A piece software or hardware that compresses and decompresses digital video.

16

1011000101101010001000111111101100111001100111011100110010010000...001001101001001101101101101011111010011001010000010011011011011010

Encoder Decoder

Page 17: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

How to compress?

• Compress each frame individually.

• JPEG Compression of a 4K frame: ~1 MB/frame → 192 Mbps

17

Page 18: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

How to compress?

• Exploit the temporal redundancy in adjacent frames.

• Store the first frame on its entirety: a key frame.

• For other frames, we could just store the “diff” with the previous frame: interframes.

18

Page 19: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Encoder & Decoder

encode(img[1..n]) → [kf] + if[2..n]

decode([kf] + if[2..n]) → img’[1..n]

19

compressed video

Page 20: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

4K Video

20

Key Frame SizeInter Frame Size

Encoded with VP8

~1 MB~25 KB

Page 21: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Traditional Parallel Video Encoding

encode(i[1..200]) → [kf1] + if[2..200]

encode(i[1..10]) → [kf1], if[2..10] encode(i[11..20]) → [kf11], if[12..20] encode(i[21..30]) → [kf21], if[22..30] ⠇ encode(i[191..200]) → [kf191], if[192..200]

21

Finer-grained parallelism → more frequent key frames

Page 22: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Outline

• Vision & Goals

• Background on Video Processing

• Fine-grained Parallel Video Encoding

• μ: Supercomputing as a Service

• Evaluation

• Conclusion & Future Work

22

Page 23: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

14.8-minute 4K Video

Encoding with vpxenc (VP8)

Single-Threaded ~7.5 hours

Multi-Threaded ~2.5 hours

23

Page 24: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Decoder State

• Decoder needs to maintain some information that evolves with each decoded frame.

• Traditional video codecs do not expose this information.

encode(img[1..n]) → [kf] + if[2..n]decode([kf] + if[2..n]) → img’[1..n]

• We call this the state and we made it explicit.

24

Page 25: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Decoder State

25

sourcestate

Page 26: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Decoder State

26

sourcestate

targetstate

frame

Page 27: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Decoder State

27

sourcestate

targetstate

frame

output

Page 28: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Decoder State

28

interframeinterframekey frame interframe

Page 29: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

What we built: a video codec in explicit state-passing style

• VP8 decoder with no inner state:

decode(state, frame) → (state′, image)

• VP8 encoder: resume from specified state

encode(state, image) → (state′, interframe)

29

Page 30: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

ExCamera: Encoding, Fast and Slow

• Divide the work into tiny tasks (6 frames each).

• [Parallel, the slow work] Make tiny independent chunks.

• [Serial, the fast work] Stitch the chunks together and remove keyframes.

30

Page 31: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

ExCamera: Encoding, Fast and Slow

• Divide the video into 4-second batches.

• Each batch is divided further to 16 chunks, ¼ second each.

31

Page 32: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

1. [Parallel] Download 6-frame chunk of raw video

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

32

Page 33: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

2. [Parallel] vpxenc → keyframe, interframe[5]

33

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

Google's VP8 encoderencode(img[1..n]) → [kf] + if[2..n]

Page 34: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

3. [Parallel] decode → state ↝ next thread

34

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

Our explicit-state style decoderdecode(state, frame) → (state′, image)

Page 35: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

4. [Parallel] last thread’s state ↝ encode

35

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

Our explicit-state style encoderencode(state, image) → (state′, interframe)

Page 36: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

4. [Parallel] last thread’s state ↝ encode

36

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

Adapt a frame to a different source state rebase(state, image, interframe) → interframe′

Page 37: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

5. [Serial] last thread’s state ↝ rebase → state ↝ next thread

37

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

Adapt a frame to a different source state rebase(state, image, interframe) → interframe′

Page 38: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

5. [Serial] last thread’s state ↝ rebase → state ↝ next thread

38

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

Adapt a frame to a different source state rebase(state, image, interframe) → interframe′

Page 39: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

6. [Parallel] Upload finished video

39

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

Page 40: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Thousands of tiny threads1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

1 61115

thread 1

7 1211111

thread 2

13 1811117

thread 3

19 2411123

thread 4

Page 41: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Outline

• Vision & Goals

• Background on Video Processing

• Fine-grained Parallel Video Encoding

• μ: Supercomputing as a Service

• Evaluation

• Conclusion & Future Work

41

Page 42: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Execution Environment

• We need an execution environment, where we can run thousands of threads in parallel, each for a few minutes.

• Cloud functions!

42

Page 43: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

AWS Lambda is an underutilized supercomputer

• Intended for event handlers and Web microservices

• Supports JavaScript (node.js), Java, Python

• But in practice, you can upload a zip file containing a binary executable and simply exec it from the Python handler

43

Page 44: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Lambda can launch 3,600 threads in 3 seconds

44

Page 45: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Hardware

• 1.5 GiB RAM, 2.8 GHz CPU, very little disk space

• 5-minute execution limit

45

Page 46: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Costs

• Price: 2.5 milli-cents per second (billed in 100 ms increments)

• 3,600 threads for one minute → $5.40

• Unique features:

• Sub-second billing,

• Thousands of threads,

• Fast startup.46

Page 47: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

μ, supercomputing as a service

• We built mu, a library for designing and deploying massively parallel computations on AWS Lambda.

• ExCamera is just an example, possibilities are endless.

47

Page 48: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Outline

• Vision & Goals

• Background on Video Processing

• Fine-grained Parallel Video Encoding

• μ: Supercomputing as a Service

• Evaluation

• Conclusion & Future Work

48

Page 49: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Quality vs. Bitrate

49

16

17

18

19

20

21

22

5 10 20 30 40 50 60 70

SSIM

(dB)

average bitrate (Mbit/s)

vpx (single-th

readed)

vpx (

multithrea

ded)

Page 50: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Quality vs. Bitrate

50

16

17

18

19

20

21

22

5 10 20 30 40 50 60 70

SSIM

(dB)

average bitrate (Mbit/s)

vpx (single-th

readed)

vpx (

multithrea

ded)

vpx (6-fra

me chunks)

Page 51: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Quality vs. Bitrate

51

16

17

18

19

20

21

22

5 10 20 30 40 50 60 70

SSIM

(dB)

average bitrate (Mbit/s)

vpx (single-th

readed)

vpx (

multithrea

ded)

vpx (6-fra

me chunks)

ExCamera

Page 52: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

14.8-minute 4K Video

Encoding with vpxenc (VP8)

Single-Threaded ~7.5 hours

Multi-Threaded ~2.5 hours

52

ExCamera 2.1 minutes

Page 53: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Outline

• Vision & Goals

• Background on Video Processing

• Fine-grained Parallel Video Encoding

• μ: Supercomputing as a Service

• Evaluation

• Conclusion & Future Work

53

Page 54: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Vision: Real-time Collaborative Video Processing

• Editing and sharing: Google Docs for Video.

• "Make this movie black and white."

• "Apply some awesome filter to my video."

• "Pixelate this face everywhere in this video."

• "Remake Star Wars Episode I without Jar Jar."

54

Page 55: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

Takeaways

• Low-latency video processing

• Two major contributions:

• A video encoder intended for massive fine-grained parallelism.

• A framework that orchestrates thousands of threads running in parallel on AWS Lambda.

• 64× faster than existing encoder, for less than $10.

• The future is granular & massively parallel • Parallel make • "Laptop Extension"

[email protected]

Page 56: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,

How time breaks down

56

download vpxenc decode encode-given-state waitwait rebase upload

start 25 s 50 s 75 s 100 s

thre

ads

1

4

7

10

13

16