1
Elastic Ephemeral Storage for Serverless Analytics
Ana Klimovic*, Yawen Wang*, Patrick Stuedi+, Animesh Trivedi+, Jonas Pfefferle+, Christos Kozyrakis*
*Stanford University, +IBM Research
OSDI 2018
2
Serverless Computing
o Serverless computing enables users to launch short-lived tasks with high elasticity and fine-grain resource billing
3
Serverless Computing
o Serverless computing enables users to launch short-lived tasks with high elasticity and fine-grain resource billing
o Serverless computing is increasingly used for interactive analytics
PyWren (SoCC’17)
gg: The Stanford Builder
ExCamera (NSDI’17) serverless
Amazon Aurora Serverless
4
Serverless Computing
o Serverless computing enables users to launch short-lived tasks with high elasticity and fine-grain resource billing
o Serverless computing is increasingly used for interactive analytics
• Exploit massive parallelism with large number of serverless tasks
[Figure: a user query & input data fan out to many λ tasks, which produce the result]
5
The Challenge: Data Sharing
o Analytics jobs involve multiple stages of execution
o Serverless tasks need an efficient way to communicate intermediate data between different stages of execution
[Figure: a user query & input data pass through several stages of λ tasks to the result; the stages exchange ephemeral data]
6
In traditional analytics…
o Ephemeral data is exchanged directly between tasks
[Figure: mapper0 to mapper3 exchange data directly with reducer0 and reducer1]
8
In serverless analytics…
o Direct communication between serverless tasks is difficult:
• Tasks are short-lived and stateless
[Figure: mapper0 to mapper3 and reducer0, reducer1, with a "?" in place of the direct data exchange]
9
In serverless analytics…
o The natural approach for sharing ephemeral data is through a common data store
[Figure: mapper0 to mapper3 and reducer0, reducer1 share ephemeral data through a common data store]
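The pattern above can be sketched in a few lines. This is only an illustration, not Pocket's API: an in-process dictionary stands in for the common data store, and the key layout `shuffle/m<i>/r<j>`, the helper names, and the word-count workload are all assumptions made for the example.

```python
from collections import Counter

# Stand-in for a shared ephemeral data store (assumption for illustration;
# a real deployment would use a remote store that every task can reach).
store = {}

def mapper(mapper_id, text, num_reducers):
    """Count words, partition the counts by reducer, and PUT each partition."""
    partitions = [Counter() for _ in range(num_reducers)]
    for word in text.split():
        # Deterministic partitioning: a given word always goes to the same reducer.
        partitions[sum(word.encode()) % num_reducers][word] += 1
    for reducer_id, counts in enumerate(partitions):
        store[f"shuffle/m{mapper_id}/r{reducer_id}"] = dict(counts)

def reducer(reducer_id, num_mappers):
    """GET this reducer's partition from every mapper and merge the counts."""
    total = Counter()
    for mapper_id in range(num_mappers):
        total.update(store[f"shuffle/m{mapper_id}/r{reducer_id}"])
    return dict(total)

inputs = ["a b a", "b c", "a c c", "d"]
for i, text in enumerate(inputs):
    mapper(i, text, num_reducers=2)

results = {}
for r in range(2):
    results.update(reducer(r, num_mappers=len(inputs)))
```

Mappers and reducers never talk to each other here; they agree only on key names, which is what lets short-lived, stateless tasks cooperate.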
11
Requirements for Ephemeral Storage
1. High performance for a wide range of object sizes
2. Cost efficiency, i.e., fine-grain, pay-what-you-use resource billing
Understanding Ephemeral Storage for Serverless Analytics. Ana Klimovic, Yawen Wang, Christos Kozyrakis, Patrick Stuedi, Jonas Pfefferle, Animesh Trivedi. ATC’18, 2018.
12
Requirements for Ephemeral Storage
1. High performance for a wide range of object sizes
2. Cost efficiency, i.e., fine-grain, pay-what-you-use resource billing
• Example of performance-cost tradeoff for a serverless video analytics job with different ephemeral data store configurations
Finding the Pareto optimal resource allocation is non-trivial… and gets harder with multiple jobs.
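To make "Pareto optimal" concrete, here is a minimal sketch that filters a set of store configurations down to those not dominated in both cost and runtime. The configuration names and all numbers are made up for illustration; they are not measurements from the talk.

```python
def pareto_front(configs):
    """Return the names of configs that no other config dominates.

    One config dominates another if it is no worse in both cost and runtime
    and strictly better in at least one of them.
    """
    front = []
    for name, cost, runtime in configs:
        dominated = any(
            (c <= cost and r <= runtime) and (c < cost or r < runtime)
            for _, c, r in configs
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (name, cost in $/hour, job runtime in seconds) points.
configs = [
    ("dram-large", 8.0, 40.0),
    ("dram-small", 4.0, 70.0),
    ("flash-large", 3.0, 45.0),
    ("flash-small", 1.5, 90.0),
    ("hdd-large", 2.0, 120.0),
]
front = pareto_front(configs)
```

Even in this toy case the search space is the full set of tier, node-count, and size combinations, and it has to be re-evaluated per job, which is why the slide calls the problem non-trivial.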
13
Requirements for Ephemeral Storage
1. High performance for a wide range of object sizes
2. Cost efficiency, i.e., fine-grain, pay-what-you-use resource billing
3. Fault-tolerance
Existing cloud storage systems do not meet the elasticity, performance and cost demands of serverless analytics jobs.
14
Pocket
o An elastic, distributed data store for ephemeral data sharing in serverless analytics
o Pocket achieves high performance and cost efficiency by:
• Leveraging multiple storage technologies
• Rightsizing resource allocations for applications
• Autoscaling storage resources in the cluster based on usage
o Pocket achieves similar performance to Redis, an in-memory key-value store, while saving ~60% in cost for various serverless analytics jobs
15
Pocket Design
• Controller: app-driven resource allocation & scaling
• Metadata server(s): request routing
• Storage servers (CPU, Net), each backed by one storage tier: DRAM, Flash, or HDD
16
Using Pocket
i. Register job
ii. Allocate & assign resources for job
[Figure: Jobs A, B, and C (groups of λ tasks) interact with the controller (app-driven resource allocation & scaling), the metadata server(s) (request routing), and storage servers (CPU, Net) backed by HDD, Flash, or DRAM]
17
Using Pocket
iii. Deregister job
GET/PUT API accepts hints about job attributes and data lifetime
[Figure: λ tasks of Jobs A, B, and C; one task issues PUT ‘x’, which is routed by the metadata server(s) to the storage servers (CPU, Net) backed by HDD, Flash, or DRAM; the controller handles app-driven resource allocation & scaling]
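The register, PUT/GET, and deregister flow above can be sketched with a toy in-memory stand-in. Every class, method, and parameter name below (`ToyPocket`, `register_job`, `delete_after_read`, and so on) is hypothetical and chosen for illustration; none of it is Pocket's actual client API.

```python
import itertools

class ToyPocket:
    """In-memory stand-in for the register / PUT / GET / deregister flow.

    All method and parameter names are hypothetical, not Pocket's client API.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}  # job_id -> {"hints": dict, "data": dict}

    def register_job(self, name, **hints):
        # i. Register job, passing optional hints about its attributes.
        job_id = f"{name}-{next(self._ids)}"
        self._jobs[job_id] = {"hints": hints, "data": {}}
        return job_id

    def put(self, job_id, key, value):
        self._jobs[job_id]["data"][key] = value

    def get(self, job_id, key, delete_after_read=False):
        data = self._jobs[job_id]["data"]
        # Lifetime hint: data that is read exactly once can be freed right away.
        return data.pop(key) if delete_after_read else data[key]

    def object_count(self, job_id):
        return len(self._jobs[job_id]["data"])

    def deregister_job(self, job_id):
        # iii. Deregister job: ephemeral data does not outlive the job.
        del self._jobs[job_id]

pocket = ToyPocket()
job = pocket.register_job("jobC", latency_sensitive=False, capacity_gb=10)
pocket.put(job, "x", b"intermediate bytes")
value = pocket.get(job, "x", delete_after_read=True)
remaining = pocket.object_count(job)
pocket.deregister_job(job)
```

The point of the lifetime hint is that the store can reclaim capacity as soon as data has been consumed, rather than waiting for the whole job to finish.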
18
Assigning Resources to Jobs
i. Register job
Optional hints about job:
• Latency sensitivity
• Maximum # of concurrent tasks
• Total ephemeral data capacity
• Peak aggregate bandwidth required
Controller (app-driven resource allocation & scaling):
1. Throughput allocation
2. Capacity allocation
3. Choice of storage tier(s)
[Figure: Job A (λ tasks) registers with the controller; metadata server(s) handle request routing; storage servers (CPU, Net) are backed by HDD, Flash, or DRAM]
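One way to picture how hints could turn into the three decisions above is the toy policy below. It is an assumption-laden sketch, not the controller's actual logic: the `JobHints` fields, the per-server tier numbers, the defaults for missing hints, and the rule "use DRAM unless the job says it is not latency sensitive" are all invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class JobHints:
    # Every hint is optional, mirroring the slide; field names are hypothetical.
    latency_sensitive: Optional[bool] = None
    max_concurrent_tasks: Optional[int] = None
    capacity_gb: Optional[float] = None
    peak_throughput_gbps: Optional[float] = None  # Gb/s

# Assumed per-server resources for each tier (illustrative numbers only).
TIERS = {
    "DRAM": {"capacity_gb": 50.0, "throughput_gbps": 8.0},
    "Flash": {"capacity_gb": 1500.0, "throughput_gbps": 8.0},
}
# Assumed conservative defaults used when a hint is missing.
DEFAULT_CAPACITY_GB = 50.0
DEFAULT_THROUGHPUT_GBPS = 8.0

def allocate(hints):
    """Return (tier, number of servers) for a job under this toy policy."""
    # 3. Choice of storage tier: stay on DRAM unless told latency does not matter.
    tier = "Flash" if hints.latency_sensitive is False else "DRAM"
    per_server = TIERS[tier]
    throughput = hints.peak_throughput_gbps or DEFAULT_THROUGHPUT_GBPS
    capacity = hints.capacity_gb or DEFAULT_CAPACITY_GB
    # 1. Throughput allocation and 2. capacity allocation: take the larger need.
    for_throughput = math.ceil(throughput / per_server["throughput_gbps"])
    for_capacity = math.ceil(capacity / per_server["capacity_gb"])
    return tier, max(1, for_throughput, for_capacity)

sort_alloc = allocate(JobHints(latency_sensitive=False, capacity_gb=100,
                               peak_throughput_gbps=100))
interactive_alloc = allocate(JobHints(latency_sensitive=True, capacity_gb=100,
                                      peak_throughput_gbps=8))
no_hints_alloc = allocate(JobHints())
```

The first call uses the MapReduce sort hints that appear later in the deck (100 GB, not latency sensitive, 100 Gb/s); under the assumed per-server numbers it is throughput-bound, while the second job is capacity-bound.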
19
Assigning Resources to Jobs
i. Register job
ii. Allocate & assign resources for job
Controller (app-driven resource allocation & scaling):
1. Throughput allocation
2. Capacity allocation
3. Choice of storage tier(s)
Job Weight Map (produced by an online bin-packing algorithm):
Job A: Server C → 0.4, Server D → 0.6
Job B: Server A → 0.2, Server B → 0.3, Server C → 0.5
[Figure: Job A (λ tasks), the controller, metadata server(s) (request routing), and storage servers A (HDD), B (Flash), C (DRAM), D (DRAM), each with CPU and Net resources]
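The slide does not spell out the bin-packing algorithm, so the sketch below substitutes a simple first-fit heuristic to show how a job weight map could be built as jobs arrive and then used to route requests. The function names, the integer throughput units, and the server capacities are assumptions for illustration.

```python
import random

def assign(job_demand, free, order):
    """First-fit: take throughput from servers in `order` until demand is met.

    Mutates `free` (server -> unallocated throughput) and returns the job's
    weight map (server -> fraction of the job's data to place there).
    """
    if job_demand > sum(free.values()):
        raise RuntimeError("not enough free throughput; the cluster must scale up")
    remaining = job_demand
    taken = {}
    for server in order:
        if remaining <= 0:
            break
        share = min(free[server], remaining)
        if share > 0:
            taken[server] = share
            free[server] -= share
            remaining -= share
    return {server: share / job_demand for server, share in taken.items()}

def pick_server(weights, rng):
    """Choose the target server for one request in proportion to the weights."""
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]

# Hypothetical cluster: four servers with 8 units of free throughput each.
free = {"A": 8, "B": 8, "C": 8, "D": 8}
order = ["A", "B", "C", "D"]
weight_map = {
    "job1": assign(10, free, order),  # jobs are placed one at a time, on arrival
    "job2": assign(12, free, order),
}
target = pick_server(weight_map["job2"], random.Random(0))
```

Packing jobs onto already-used servers before touching empty ones keeps the number of active nodes low, which is what lets idle nodes be drained later.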
20
Autoscaling the Pocket Cluster
o Goal: scale cluster resources dynamically based on resource usage
o Mechanisms:
• Monitor CPU, network bandwidth, and storage capacity utilization
• Add/remove storage & metadata nodes to keep utilization within range
• Steer data for incoming jobs to active nodes
• Drain inactive nodes as jobs terminate
o Avoid migrating data
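A minimal sketch of the mechanisms above, under stated assumptions: the 0.6 and 0.8 utilization bounds, the use of cluster-wide averages, and the node record layout are all invented for illustration and are not taken from the talk.

```python
from statistics import mean

RESOURCES = ("cpu", "net", "capacity")

def scaling_decision(nodes, low=0.6, high=0.8):
    """Return "add", "remove", or "hold" from average utilization per resource.

    The `low` and `high` bounds are assumed values for the target range.
    """
    avg = {r: mean(node[r] for node in nodes) for r in RESOURCES}
    if any(value > high for value in avg.values()):
        return "add"
    if len(nodes) > 1 and all(value < low for value in avg.values()):
        return "remove"
    return "hold"

def removable(nodes):
    """Nodes that can be shut down without migrating data: already marked as
    draining (no new jobs are steered to them) and holding no live job's data."""
    return [node["name"] for node in nodes if node["draining"] and not node["jobs"]]

busy = [
    {"name": "s0", "cpu": 0.9, "net": 0.85, "capacity": 0.5, "draining": False, "jobs": {"A"}},
    {"name": "s1", "cpu": 0.7, "net": 0.9, "capacity": 0.4, "draining": False, "jobs": {"A"}},
]
idle = [
    {"name": "s0", "cpu": 0.2, "net": 0.1, "capacity": 0.3, "draining": True, "jobs": set()},
    {"name": "s1", "cpu": 0.3, "net": 0.2, "capacity": 0.1, "draining": True, "jobs": {"B"}},
]
steady = [
    {"name": "s0", "cpu": 0.7, "net": 0.7, "capacity": 0.7, "draining": False, "jobs": {"C"}},
]

decisions = [scaling_decision(cluster) for cluster in (busy, idle, steady)]
to_remove = removable(idle)
```

Because ephemeral data disappears when its job ends, waiting for a draining node to empty out is usually cheaper than moving its data elsewhere, which matches the "avoid migrating data" goal.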
21
Implementation
o Pocket’s metadata and storage server implementation is based on the Apache Crail distributed storage system [1]
o We use ReFlex for the Flash storage tier [2]
o Pocket runs the storage and metadata servers in containers, orchestrated using Kubernetes [3]
[1] Apache Crail (incubating). http://crail.apache.org/
[2] ReFlex: Remote Flash == Local Flash. Ana Klimovic, Heiner Litz, Christos Kozyrakis. ASPLOS’17, 2017.
[3] Kubernetes. https://kubernetes.io/
22
Pocket Evaluation
o We deploy Pocket on Amazon EC2
o We use AWS Lambda as our serverless platform
o Applications: MapReduce sort, video analytics, distributed compilation
Component: EC2 instance type
Controller: m5.xlarge
Metadata server: m5.xlarge
DRAM server: r4.2xlarge
NVMe Flash server: i3.2xlarge
SATA/SAS SSD server: i2.2xlarge
HDD server: h1.2xlarge
23
Application Performance with Pocket
o Compare Pocket to S3 and Redis, which are commonly used today
S3 does not provide sufficient throughput
MapReduce sort job hints:
• Ephemeral capacity: 100 GB
• Latency sensitive: False
• Aggregate peak throughput: 100 Gb/s
24
Application Performance with Pocket
o Compare Pocket to S3 and Redis, which are commonly used today
Pocket achieves similar performance to Redis but uses NVMe Flash
MapReduce sort job hints:
• Ephemeral capacity: 100 GB
• Latency sensitive: False
• Aggregate peak throughput: 100 Gb/s
25
Application Storage Cost with Pocket
o Pocket leverages job attribute hints for cost-effective resource allocation and amortizes VM costs across multiple jobs, offering a pay-what-you-use model
(with throughput & capacity hints)
Pocket reduces cost by ~60% compared to Redis for all 3 jobs
26
Autoscaling the Pocket Cluster
Job hints                 Job1: Sort   Job2: Video analytics   Job3: Sort
Latency sensitive         False        False                   False
Ephemeral data capacity   10 GB        6 GB                    10 GB
Aggregate throughput      3 GB/s       2.5 GB/s                3 GB/s
27
Autoscaling the Pocket Cluster
The controller elastically scales resources to meet the requirements of multiple jobs
Job hints                 Job1: Sort   Job2: Video analytics   Job3: Sort
Latency sensitive         False        False                   False
Ephemeral data capacity   10 GB        6 GB                    10 GB
Aggregate throughput      3 GB/s       2.5 GB/s                3 GB/s
28
Conclusion
o Pocket is a distributed ephemeral storage system that:
• Leverages multiple storage technologies
• Rightsizes resource allocations for applications
• Autoscales storage cluster resources based on usage
o We designed Pocket for ephemeral data sharing in serverless analytics. More generally, Pocket is an elastic, distributed /tmp.
www.github.com/stanford-mast/pocket