1
Joshua Reich
Princeton University, Department of Computer Science
P2P File-Systems for Scalable Content Use
2
Goal: Scalable Content Distribution
• In crowd (WAN – gateways) or cloud (LAN – data-center servers)
• For use
• Not all parts of content are used at the same time!
  – Multimedia content
  – Executables
  – Virtual Appliances
3
Domain: Cloud
• Data-center w/
  – physical machines
  – network storage
• VM – software implementation of a hardware machine [the machine as an executable]
• VMM – software layer virtualizing hardware [an OS for VMs]
4
Motivation
• VM optimized for a specific purpose
  – Virtual Appliances
  – Virtual Servers
  – Virtual Desktops (VDI)
• Zero config, isolated, easy to replicate
• Shared infrastructure is cheaper
• Less IT headache
• 15,772 unique images on EC2 alone!*
• Hosted VDI market alone est. $65B in 2013**

*http://thecloudmarket.com/stats#/totals, checked 15 July 2011
**http://www.cio.com/article/487109/Hosted_Virtual_Desktop_Market_to_Cross_65_Billion_in_2013, 26 March 2009
5
High-level problem
• VM images stored on network
• Contention for networked storage results in I/O bottlenecks
• I/O bottlenecks significantly delay VM execution
6
Example: Hosted VDI Boot Storm
• VM image stored on SAN or NAS
• Accessed by servers hosting VDI instances
• Everyone comes to work in the morning and starts up their desktop
• SAN overloaded by simultaneous access
• Virtual Desktops stall
7
Specific Challenges
1. Large image size + high demand = contention-induced network bottleneck
2. VMM expects complete image
   – either download the image completely
   – or use continual remote access
3. Complex VM image access patterns
   – non-linear
   – differ from run to run
8
Analogy to Streaming Video
• Assume (2) & (3) aren’t problems
• Then this begins to look like video streaming
• Known approach: P2P Video-on-Demand
  – need to stream a series of ordered pieces
  – while maintaining swarm efficiency
  – use a mix of earliest-first & rarest-first piece selection (sketched below)
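A hedged sketch of that mixed policy (the window size, tie-breaking, and all names here are illustrative assumptions, not VMTorrent's code):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Pick the next piece to request: pieces inside a small in-order
// window near the playback point are taken earliest-first; outside
// the window, fall back to rarest-first to keep the swarm healthy.
// Returns the piece index, or -1 if no piece is missing.
int next_piece(const std::vector<bool>& have,
               const std::vector<int>& swarm_copies,  // copies of each piece in swarm
               std::size_t playback_pos,
               std::size_t window = 8) {
    // Earliest-first inside the urgency window.
    for (std::size_t i = playback_pos;
         i < have.size() && i < playback_pos + window; ++i) {
        if (!have[i]) return static_cast<int>(i);
    }
    // Rarest-first everywhere else.
    int best = -1;
    int fewest = std::numeric_limits<int>::max();
    for (std::size_t i = 0; i < have.size(); ++i) {
        if (!have[i] && swarm_copies[i] < fewest) {
            fewest = swarm_copies[i];
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

The window keeps playback (or, later, VM execution) from stalling, while rarest-first keeps piece diversity high enough for peers to trade.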
9
Novel VMTorrent Architecture
1. Large image / high demand -> P2P
2. Complete VM image required -> Quick-Start
3. Non-linear access -> Profile Prefetch
10
Related Work Matrix

| Work | Approach | Problem Addressed | Notes |
|------|----------|-------------------|-------|
| Mietzner:2008, Shi:2008 | Sequential distribution of VM images | VM deployment | Slow, doesn’t scale |
| O’Donnell:2008, Chen:2009 | Naive P2P distribution of VM images | VM deployment | Slow, scales |
| Industry | Hardware overprovisioning | VM deployment | Fast, expensive |
| Chandra:2005, Moka5 | Content prefetching + on-demand streaming | Virtual desktop delivery | Fast, highly structured |
| Vlavianos:2006, Zhou:2007 | Mix earliest-first / rarest-first prefetch | Video streaming | Fast, scales well |
| VMTorrent | Quick start + P2P + profile prefetch | VM deployment | Fast, scales well |
11
VMTorrent Architecture
[Diagram: a VMTorrent instance – an unmodified VM & VMM running over the custom FS, which is backed by a P2P manager that follows a profile and exchanges pieces with the swarm]
13
Traditional VM Execution
• VM runs on some host
• Virtual Machine: software implementation of a computer
• Implementation stored in an image
• Image stored on the host’s local file system
14
Traditional VM Execution
• Virtual Machine Monitor (VMM) virtualizes hardware
• Conducts I/O to the image through the FS
15
VM Execution Over Network
• Network backend used
  – either to download the image
  – or to access it via a remote FS
16
VM Execution Over Network
• Download – one big up-front performance hit
• Remote access – smaller hits, but also writes and re-reads over the network
17
Quick Start with Custom FS
• Introduce a custom file system
• Divide the image into pieces
• But provide the appearance of a complete image to the VMM (see the sketch below)
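A minimal FUSE 2.x-flavored sketch of that illusion; the image/piece sizes and the backend hooks (piece_present, fetch_piece_blocking, copy_piece) are hypothetical stand-ins for the prototype's internals, which are written in C over FUSE:

```cpp
#define FUSE_USE_VERSION 26
#include <cstring>
#include <fuse.h>
#include <sys/stat.h>

static const off_t  IMAGE_SIZE = 8LL << 30;   // assumed 8 GiB image
static const size_t PIECE_SIZE = 256 * 1024;  // assumed piece size

// Hypothetical hooks into the network backend / piece store.
bool piece_present(size_t idx);
void fetch_piece_blocking(size_t idx);
void copy_piece(size_t idx, char* buf, size_t n, off_t off);

// getattr always reports the full image size, so the VMM sees a
// complete file even though only some pieces are local.
static int vmt_getattr(const char* path, struct stat* st) {
    std::memset(st, 0, sizeof(*st));
    if (std::strcmp(path, "/") == 0) { st->st_mode = S_IFDIR | 0755; return 0; }
    st->st_mode = S_IFREG | 0644;
    st->st_size = IMAGE_SIZE;  // full size, regardless of what is present
    return 0;
}

// read stalls only when the touched piece is missing (read spanning
// multiple pieces is elided here for brevity).
static int vmt_read(const char* path, char* buf, size_t size, off_t off,
                    struct fuse_file_info* fi) {
    size_t piece = static_cast<size_t>(off) / PIECE_SIZE;
    if (!piece_present(piece))
        fetch_piece_blocking(piece);  // VM stalls here until delivery
    copy_piece(piece, buf, size, off);
    return static_cast<int>(size);
}
```

These handlers would be registered in a fuse_operations table and passed to fuse_main.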
18
Quick Start w/ Custom FS
• VMM attempts to read piece 1
• Piece 1 is present; the read completes
19
Quick Start w/ Custom FS
• VMM attempts to read piece 0
• Piece 0 isn’t local; the read stalls
• VMM waits for the I/O to complete; the VM stalls
20
Quick Start w/ Custom FS
• FS requests the piece from the backend
• Backend requests it from the network
21
Quick Start w/ Custom FS
• Later, the network delivers piece 0
• Custom FS receives it and updates the piece
• The read completes; the VMM resumes the VM’s execution (the stall/resume handoff is sketched below)
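One plausible way to implement that stall/resume handoff between the FS read path and the network backend is a condition variable over the piece table; a sketch under that assumption (not the prototype's actual synchronization):

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

class PieceTable {
public:
    explicit PieceTable(std::size_t n) : present_(n, false) {}

    // Called from the FS read path: block until the piece is local.
    void wait_for(std::size_t idx) {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [&] { return static_cast<bool>(present_[idx]); });  // VM stalls here
    }

    // Called by the network backend when a piece arrives:
    // mark it present and wake any stalled readers.
    void deliver(std::size_t idx) {
        { std::lock_guard<std::mutex> lk(mu_); present_[idx] = true; }
        cv_.notify_all();  // the read completes; the VMM resumes the VM
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<bool> present_;
};
```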
22
Improved Performance w/ Custom FS
• No waiting for the image download to complete
• No more writes or re-reads over the network, as with a remote FS
24
Scaling w/ P2P Backend
• Alleviate the bottleneck to network storage
• Exchange pieces w/ the swarm
• The P2P copy remains pristine (see the sketch below)
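One way to keep the swarmed copy pristine while still absorbing guest writes is a copy-on-write overlay owned by the FS; the slides don't spell out the mechanism, so treat this sketch as an assumption:

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Two views of the image: the P2P manager's pieces stay pristine so
// they can keep being shared with (and hash-verified by) the swarm,
// while guest writes land in a private overlay owned by the custom FS.
class ImageStore {
public:
    // Guest write: whole-piece copy-on-write into the overlay
    // (sub-piece writes elided); the pristine piece is never touched,
    // so swarm hashes still match.
    void write(std::size_t piece, std::vector<char> data) {
        overlay_[piece] = std::move(data);
    }

    // Guest read: prefer the overlay (latest guest state),
    // fall back to the pristine swarmed copy.
    const std::vector<char>& read(std::size_t piece) const {
        auto it = overlay_.find(piece);
        return it != overlay_.end() ? it->second : pristine_.at(piece);
    }

    // Swarm uploads always serve the pristine copy.
    const std::vector<char>& serve_to_swarm(std::size_t piece) const {
        return pristine_.at(piece);
    }

private:
    std::unordered_map<std::size_t, std::vector<char>> pristine_;  // filled by P2P
    std::unordered_map<std::size_t, std::vector<char>> overlay_;   // guest writes
};
```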
25
Minimizing Stall Time
• VMM accesses to non-local pieces trigger high-priority swarm requests (see the sketch below)
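A hedged sketch of such demand prioritization against libtorrent's 1.x-style interface; piece_priority and set_piece_deadline are real libtorrent calls, but the wiring shown here is assumed, not taken from VMTorrent:

```cpp
#include <libtorrent/torrent_handle.hpp>

// When the custom FS faults on a non-local piece, escalate that piece
// in the swarm so the stalled VM resumes as soon as possible.
void demand_fetch(libtorrent::torrent_handle& th, int piece) {
    // Highest normal priority, so the piece picker favors it...
    th.piece_priority(piece, 7);
    // ...and a near-immediate deadline, so it bypasses rarest-first
    // ordering and is requested right away.
    th.set_piece_deadline(piece, /*deadline ms=*/0,
                          libtorrent::torrent_handle::alert_when_available);
}
```

The design point is the split: demand misses jump the queue, while background prefetch keeps filling in the rest of the image.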
26
Custom FS + P2P Manager
[Diagram: the custom FS and the P2P manager each hold the image’s pieces; FS demand misses drive the P2P manager’s piece requests]
27
P2P Challenge: Request Fulfillment Latency
• Delays
  – network RTT
  – at the image source (peer or server)
• Impact
  – if even occasionally it takes 0.5 s to obtain a piece,
  – then over the course of thousands of requests,
  – tens of seconds may be lost (e.g., 0.5 s on just 2% of 4,000 requests wastes 40 s)
28
P2P Challenge: Network Capacity
[Plot: cumulative demand vs. time (s) on a 100Mb network for three ideal access rates – mem-cached (ideal rate for the given physical machine), demand FS (ideal rate with read-once, never-write access), and prefetching (ideal rate with perfect prefetching) – each giving a lower bound on delay, even assuming no latency]
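For intuition, a back-of-the-envelope bound (the 1 GB image size is an illustrative assumption, not from the deck): even with perfect swarming and zero latency, a node that must fetch the whole image over a 100 Mb/s link needs at least

t >= S / B = (8 × 10^9 bits) / (10^8 bits/s) = 80 s,

so the only ways below that line are to fetch less (demand-only access) or to overlap fetching with execution (prefetching).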
31
Solution: Generate Profile Using Statistical Ordering
• Collect access patterns for each VM/workload
• Determine expected accesses
  – divide accesses into blocks
  – sort by average access time
  – remove blocks accessed in only a small fraction of runs
• Encode the new order in a profile (a sketch follows)
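A sketch of that statistical ordering; the Access record, the first-access-per-run choice, and the 0.5 appearance threshold are illustrative assumptions:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

struct Access { std::size_t block; double time_s; };  // one block access in one run

// Build a prefetch profile from several recorded runs: take each block's
// first access time per run, average across the runs that touched it,
// drop blocks seen in only a small fraction of runs, and emit the rest
// sorted by mean access time.
std::vector<std::size_t>
build_profile(const std::vector<std::vector<Access>>& runs,
              double min_fraction = 0.5) {
    std::map<std::size_t, std::pair<double, std::size_t>> agg;  // block -> (sum, runs seen)
    for (const auto& run : runs) {
        std::map<std::size_t, double> first;  // earliest access per block, this run
        for (const auto& a : run) {
            auto it = first.find(a.block);
            if (it == first.end() || a.time_s < it->second) first[a.block] = a.time_s;
        }
        for (const auto& [block, t] : first) {
            agg[block].first += t;
            agg[block].second += 1;
        }
    }
    std::vector<std::pair<double, std::size_t>> ordered;  // (mean time, block)
    for (const auto& [block, st] : agg)
        if (st.second >= min_fraction * runs.size())
            ordered.push_back({st.first / st.second, block});
    std::sort(ordered.begin(), ordered.end());
    std::vector<std::size_t> profile;
    profile.reserve(ordered.size());
    for (const auto& [t, block] : ordered) profile.push_back(block);
    return profile;
}
```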
32
P2P Challenge: In-order Profile Prefetch Is Inefficient
E.g., during a boot storm:
⇒ all peers actively fetch the same small set of pieces
⇒ low piece diversity
⇒ little opportunity for peers to share
⇒ low swarming efficiency
33
Solution: Randomization and Throttling
• Randomize prefetch order
• Rate limiting (based on priority)
• Deadline-based throttling
(the randomization and rate-limit budget are sketched below)
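A sketch combining the first two ideas: shuffle within a sliding window of the profile so peers that start together still fetch different pieces, and cap each round with a budget so demand traffic keeps priority. The window size and budgeting policy are assumptions:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Walk the profile in windows, shuffling each window before issuing
// requests: piece diversity is restored while the profile's order is
// roughly preserved.
class Prefetcher {
public:
    Prefetcher(std::vector<std::size_t> profile, std::size_t window)
        : profile_(std::move(profile)), window_(window),
          rng_(std::random_device{}()) {}

    // Return the next batch to request, at most `budget` pieces.
    // The budget is where rate limiting shows up: demand fetches get
    // the bandwidth first, prefetch gets whatever budget remains.
    std::vector<std::size_t> next_batch(std::size_t budget) {
        std::size_t end = std::min(pos_ + window_, profile_.size());
        std::shuffle(profile_.begin() + pos_, profile_.begin() + end, rng_);
        std::size_t take = std::min(budget, end - pos_);
        std::vector<std::size_t> batch(profile_.begin() + pos_,
                                       profile_.begin() + pos_ + take);
        pos_ += take;
        return batch;
    }

private:
    std::vector<std::size_t> profile_;
    std::size_t window_;
    std::size_t pos_ = 0;
    std::mt19937 rng_;
};
```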
34
VMTorrent Architecture
[Diagram: the full design – the custom FS backed by the P2P manager, which prefetches per the profile and exchanges pieces with the swarm]
35
VMTorrent Architecture
[Same diagram, delimiting the VMTorrent instance: everything beneath the unmodified VM & VMM]
37
VMTorrent Prototype
[Diagram: prototype instance – the custom FS is custom C using FUSE; the P2P manager is custom C++ & libtorrent, swarming over BitTorrent]
38
Emulab Testbed*
• Up to 101 modern hardware nodes
• One VMTorrent instance per node
• 100Mb LAN
*[White:2002]
41
Data Presentation
• We use normalized runtime (boot through shutdown)
• Normalized against memory-cached execution: normalized runtime = actual runtime / memory-cached runtime
• Allows easy cross-comparison across different VM/workload combinations
44
Hypothesis
• Larger swarm -> Longer time until full swarm efficiency
• Demand prioritization -> Relative loss of prefetch piece diversity -> Lower swarm efficiency