elastic ephemeral storage for serverless analytics · 14 pocket o an elastic, distributed data...

1

Elastic Ephemeral Storage for Serverless Analytics

Ana Klimovic*, Yawen Wang*, Patrick Stuedi+, Animesh Trivedi+, Jonas Pfefferle+, Christos Kozyrakis*

*Stanford University, +IBM Research

OSDI 2018

2

Serverless Computing

o Serverless computing enables users to launch short-lived tasks with high elasticity and fine-grain resource billing

3



o Serverless computing is increasingly used for interactive analytics

PyWren(SoCC’17)

gg: The Stanford Builder

ExCamera(NSDI’17) serverless

Amazon Aurora Serverless

4



o Serverless computing is increasingly used for interactive analytics• Exploit massive parallelism with large number of serverless tasks

λ λλ

λ

λλ

λ λResult

User query&

input dataλλ

λλλ

λ

5

The Challenge: Data Sharingo Analytics jobs involve multiple stages of execution

o Serverless tasks need an efficient way to communicate intermediate data between different stages of execution

ResultUser query

& input data

λ λλ

λ

λλ

λ λ

λλ

λλλ

λ

ephemeral data

6

In traditional analytics…

o Ephemeral data is exchanged directly between tasks

mapper1

mapper2

mapper3

mapper0

reducer0

reducer1

7

In traditional analytics…

o Ephemeral data is exchanged directly between tasks

reducer0

reducer1

mapper1

mapper2

mapper3

mapper0

8

In serverless analytics…

o Direct communication between serverless tasks is difficult: • Tasks are short-lived and stateless

reducer0

reducer1

mapper1

mapper2

mapper3

mapper0

?

9


o The natural approach for sharing ephemeral data is through a common data store

reducer0

reducer1

mapper1

mapper2

mapper3

mapper0

10


o The natural approach for sharing ephemeral data is through a common data store

reducer0

reducer1

mapper1

mapper2

mapper3

mapper0

11

Requirements for Ephemeral Storage

1. High performance for a wide range of object sizes2. Cost efficiency, i.e., fine-grain, pay-what-you-use resource billing

Understanding Ephemeral Storage for Serverless Analytics. Ana Klimovic, Yawen Wang, Christos Kozyrakis, Patrick Stuedi, Jonas Pfefferle, Animesh Trivedi. ATC’18, 2018.

https://www.usenix.org/system/files/conference/atc18/atc18-klimovic-serverless.pdf

12


1. High performance for a wide range of object sizes2. Cost efficiency, i.e., fine-grain, pay-what-you-use resource billing

• Example of performance-cost tradeoff for a serverless video analytics job with different ephemeral data store configurations

Finding the Pareto optimal resource allocation is non-

trivial…and gets harder with multiple jobs.

13


1. High performance for a wide range of object sizes2. Cost efficiency, i.e., fine-grain, pay-what-you-use resource billing3. Fault-tolerance

Existing cloud storage systems do not meet the elasticity, performance and cost demands of serverless analytics jobs.

14

Pocket

o An elastic, distributed data store for ephemeral data sharing in serverless analytics

o Pocket achieves high performance and cost efficiency by:• Leveraging multiple storage technologies• Rightsizing resource allocations for applications• Autoscaling storage resources in the cluster based on usage

o Pocket achieves similar performance to Redis, an in-memory key value store, while saving ~60% in cost for various serverless analytics jobs

15

Metadata server(s)request routing

Pocket Design

Storage serverCPUNet

HDD


Flash


DRAM


DRAM

Controllerapp-driven resource allocation & scaling


16


Using PocketJob A

λ λ λ λ λ λ λλ λ λ λ λ λ λ

Job Bλ λ λ λ λ λ λ λ λ

Job C

i. Register job

Controllerapp-driven resource allocation & scaling ii. Allocate & assign

resources for job



HDD


Flash


DRAM


DRAM

17


Using PocketJob A

λ λ λ λ λ λ λλ λ λ λ λ λ λ

Job Bλ λ λ λ λ λ λ λ λ

Job Cλ λ λ λ λ λ λ λ λ λ

λ λ λ λ λ λ λ λ λ λ λ

iii. Deregister job

PUT ‘x’

GET/PUT API accepts hints about job attributes and

data lifetime




HDD


Flash


DRAM


DRAM

18

Assigning Resources to Jobs


i. Register job

Optional hints about job:§ Latency sensitivity§ Maximum # of concurrent tasks§ Total ephemeral data capacity§ Peak aggregate bandwidth required

Job Aλ λ λ λ λ λ λλ λ λ λ λ λ λ



1. Throughput allocation2. Capacity allocation3. Choice of storage tier(s)


HDD


Flash


DRAM


DRAM

19

Assigning Resources to Jobs

Controllerapp-driven resource allocation & scaling ii. Allocate & assign

resources for job

Job Aλ λ λ λ λ λ λλ λ λ λ λ λ λ

1. Throughput allocation2. Capacity allocation3. Choice of storage tier(s)

Job A: Server C àServer D à

0.4

0.6

Job B: Server A àServer B àServer C à

0.2

0.3

0.5

online bin-packing algorithm

Job Weight Map



i. Register job

Storage server ACPUNet

HDD

Storage server BCPUNet

Flash

Storage server CCPUNet

DRAM

Storage server DCPUNet

DRAM

20

Autoscaling the Pocket Cluster

o Goal: scale cluster resources dynamically based on resource usage

o Mechanisms:• Monitor CPU, network bandwidth, and storage capacity utilization• Add/remove storage & metadata nodes to keep utilization within range• Steer data for incoming jobs to active nodes • Drain inactive nodes as jobs terminate

o Avoid migrating data

21

Implementation

o Pocket’s metadata and storage server implementation is based on the Apache Crail distributed storage system [1]

o We use ReFlex for the Flash storage tier [2]

o Pocket runs the storage and metadata servers in containers, orchestrated using Kubernetes [3]

[1] Apache Crail (incubating). http://crail.apache.org/[2] ReFlex: Remote Flash == Local Flash. Ana Klimovic, Heiner Litz, Christos Kozyrakis. ASPLOS’17, 2017.[3] Kubernetes. https://kubernetes.io/

http://crail.apache.org/

https://web.stanford.edu/~anakli/pdf/reflex.pdf

https://kubernetes.io/

22

Pocket Evaluation

o We deploy Pocket on Amazon EC2

o We use AWS Lambda as our serverless platformo Applications: MapReduce sort, video analytics, distributed compilation

Controller m5.xlarge

Metadata server m5.xlarge

DRAM server r4.2xlarge

NVMe Flash server i3.2xlarge

SATA/SAS SSD server i2.2xlarge

HDD server h1.2xlarge

23

Application Performance with Pocketo Compare Pocket to S3 and Redis, which are commonly used today

S3 does not provide sufficient

throughput

MapReduce sort job hintsEphemeral capacity

100 GB

Latency sensitive FalseAggregatepeak throughput

100 Gb/s

24

Application Performance with Pocketo Compare Pocket to S3 and Redis, which are commonly used today

Pocket achieves similar performance to Redisbut uses NVMe Flash

MapReduce sort job hintsEphemeral capacity

100 GB

Latency sensitive FalseAggregatepeak throughput

100 Gb/s

25

Application Storage Cost with Pocketo Pocket leverages job attribute hints for cost-effective resource allocation and

amortizes VM costs across multiple jobs, offering a pay-what-you-use model

(with throughput & capacity hints)

Pocket reduces cost by ~60% compared

to Redis for all 3 jobs

26

Autoscaling the Pocket Cluster

Job hints Job1: Sort Job2: Video analytics Job3: Sort

Latency sensitive False False False

Ephemeral data capacity 10 GB 6 GB 10 GB

Aggregate throughput 3 GB/s 2.5 GB/s 3 GB/s

27

Autoscaling the Pocket ClusterThe controller elastically

scales resources to meet the requirements

of multiple jobs

Job hints Job1: Sort Job2: Video analytics Job3: Sort

Latency sensitive False False False

Ephemeral data capacity 10 GB 6 GB 10 GB

Aggregate throughput 3 GB/s 2.5 GB/s 3 GB/s

28

Conclusiono Pocket is a distributed ephemeral storage system that:

• Leverages multiple storage technologies• Rightsizes resource allocations for applications• Autoscales storage cluster resources based on usage

o We designed Pocket for ephemeral data sharing in serverless analytics. More generally, Pocket is an elastic, distributed /tmp.

www.github.com/stanford-mast/pocket

http://www.github.com/stanford-mast/pocket

elastic ephemeral storage for serverless analytics · 14 pocket o an elastic, distributed data...

Documents