Efficient P2P backup through buffering at the edge
S. Defrance, A.-M. Kermarrec (INRIA), E. Le Merrer, N. Le Scouarnec, G. Straub, A. van Kempen
Peer to Peer backup system
2 04/20/23
Exploit users' resources: each user provides storage space
"Pure" P2P backup systems are severely limited by:
• Low availability
• Asymmetric bandwidth (low uplink speed)
• Asynchrony
Time To Backup (TTB) and Time To Restore (TTR) may be very high
Practical deployment is limited
[Figure: timelines for Peer 1 and Peer 2 over a 24-hour period (0 h, 12 h, 24 h)]
CDN-assisted architecture
Architecture proposed in P2P 2010:
• Server = reliable component
• Performance approaches that of client-server systems (in terms of Time To Backup and Time To Restore)
However:
• A centralized part remains
• Not fully convenient for users
What we propose
• Take into account the low-level structure of the network (i.e., the presence of gateways in home networks)
• Use gateways to distribute the centralized part of the hybrid scheme
• Gateways are turned into stable buffering layers
• Mask the asynchrony between peers
[Figure: several home networks (LANs)]
Why are gateways good candidates?
• Already present in users' homes
• Storage capable (for buffering)
• Highly available
• At the frontier between a fast LAN and a slow WAN
[Figure: home network]
Gateways are highly available
We periodically pinged a random set of static IP addresses of a French ISP*:
• 25,000 gateways
• For 7.5 months
*The trace is available at: http://www.thlab.net/~lemerrere/trace_gateways
[Figure: number of gateways up (y-axis, 10,000 to 25,000) over time (x-axis, Jul 1 to Feb 11), with school holidays in France annotated]
• Average gateway availability: 86%
• A large part is very stable
• A few have power-off habits (on a daily or holiday basis)
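As a rough illustration of how an average availability figure could be derived from a ping trace, here is a minimal Python sketch. The in-memory format (one list of booleans per gateway, one entry per ping round) and the gateway names are assumptions made for the example; the actual trace format is not described on the slides.

```python
from statistics import mean

def availability(pings):
    """Fraction of ping rounds in which the gateway answered."""
    return sum(pings) / len(pings)

def average_availability(trace):
    """Mean of the per-gateway availabilities over the whole trace."""
    return mean(availability(pings) for pings in trace.values())

# Hypothetical toy trace: gateway id -> one boolean per ping round.
trace = {
    "gw-a": [True, True, True, True],    # always on
    "gw-b": [True, False, True, True],   # occasional power-off
    "gw-c": [True, True, False, False],  # off half of the time
}
avg = average_availability(trace)
```

Averaging per gateway first (rather than over all pings) keeps gateways with fewer samples from being under-weighted; this is one possible choice, not necessarily the one used for the 86% figure.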
How do we evaluate?
Trace-based simulation using public traces
• To model peers' behavior:
  - Skype: 28 days, 1269 peers, mean availability = 0.5
  - Jabber: 28 days, 465 peers, mean availability = 0.27
• To model gateways' behavior: our gateway trace
• To model uplink bandwidth: trace from a study of residential broadband networks, mean uplink = 66 kB/s
We randomly assign one gateway and one uplink speed to each peer of each trace
Scenario:
• Size of archive: 1 GB
• Data creation: Poisson process (3 backups/month/user on average)
• Erasure code
• 50 simulations/curve
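The scenario setup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 30-day month, a fixed random seed, and hypothetical peer names, gateway names, and uplink values; it is not the simulator used for the results.

```python
import random

HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month

def backup_times(rate_per_month, horizon_hours, rng):
    """Draw backup request times (in hours) from a Poisson process."""
    rate_per_hour = rate_per_month / HOURS_PER_MONTH
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_hour)  # exponential inter-arrival gap
        if t >= horizon_hours:
            return times
        times.append(t)

def assign(peers, gateways, uplinks, rng):
    """Give each peer one random gateway and one random uplink speed."""
    return {p: (rng.choice(gateways), rng.choice(uplinks)) for p in peers}

rng = random.Random(42)                 # fixed seed for reproducibility
events = backup_times(3, 28 * 24, rng)  # 3 backups/month over a 28-day trace
gateways = ["gw1", "gw2", "gw3"]        # hypothetical gateway ids
uplinks = [33, 66, 132]                 # hypothetical uplink speeds in kB/s
plan = assign(["p1", "p2"], gateways, uplinks, rng)
```

Drawing exponential gaps between requests is the standard way to sample a Poisson process; the number of events per peer then varies from run to run, which is one reason to repeat the simulation many times per curve.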
What do we evaluate?
We compare:
• CDN-Assisted (CDNA)
• Pure P2P (P2P)
• Gateway-Assisted (GWA)
We evaluate:
• Time To Backup (hours)
• Time To Restore (hours)
• Mean and max data buffered (MB)
TTB: time between the backup request and the moment the last block has been completely uploaded
TTR: time between the restore request and the moment enough data has been downloaded to reconstruct the file
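To make the TTB definition concrete, the sketch below computes when an upload completes if data can only be sent during given transfer windows. It is a simplification: the windows, the single destination, and the decimal units (1 GB = 10^9 bytes, 66 kB/s = 66,000 bytes/s) are assumptions for the example, and redundancy added by the erasure code is ignored.

```python
def completion_time(size, rate, windows, request):
    """Time (s) at which `size` bytes are fully sent, or None if never.

    size    -- bytes to transfer
    rate    -- bytes per second while a window is open
    windows -- sorted, non-overlapping (start, end) times in seconds
    request -- time of the backup request in seconds
    """
    remaining = size
    for start, end in windows:
        start = max(start, request)  # nothing can be sent before the request
        if end <= start:
            continue                 # window closed before the request
        capacity = (end - start) * rate
        if capacity >= remaining:
            return start + remaining / rate
        remaining -= capacity
    return None

# 1 GB at 66 kB/s needs about 4.2 hours of pure transfer time;
# gaps between windows push the completion time further out.
windows = [(0, 3600), (7200, 30000)]  # hypothetical windows, in seconds
request = 0
done = completion_time(1_000_000_000, 66_000, windows, request)
ttb_hours = (done - request) / 3600
```

With these numbers the one-hour gap between the two windows is added directly to the TTB, which is the effect that buffering at an always-on component is meant to reduce.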
TTB & TTR (Skype trace)
• Time To Backup (data stored safely at a remote place)
• Time To Restore (archive retrieved locally)
[Figure: CDFs of TTB and TTR versus time in hours (log scale, 0.1 to 100). Legend entries: GWA, CDNA, P2P, and a combined "CDNA & P2P" curve]
90th percentile of completed backups:
  GWA: 30 h, CDNA: 60 h, P2P: 140 h
90th percentile of completed restores:
  GWA: 3 h, CDNA: 40 h, P2P: 40 h
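For readers unfamiliar with reading a percentile off a CDF, here is a small Python sketch using the nearest-rank definition. The sample values are made up for illustration and are not taken from the simulation results; the slides do not state which percentile definition was used.

```python
def percentile(samples, q):
    """Nearest-rank percentile: smallest sample such that at least
    q percent of the samples are less than or equal to it."""
    ordered = sorted(samples)
    # Integer ceiling of q * n / 100, avoiding floating-point rounding.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical completion times of ten backups, in hours.
ttb_hours = [2, 5, 8, 12, 20, 25, 28, 30, 31, 90]
p90 = percentile(ttb_hours, 90)
```

Reporting the 90th percentile rather than the maximum keeps a single very slow transfer (90 hours in this toy data) from dominating the comparison.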
Scaling (Skype trace)
[Figure: TTR (hours, 0 to 120) versus archive size (GB, 0 to 10); curves: GWA, CDNA, P2P]
Better scaling with archive size:
This enables users to back up larger amounts of data
• Low storage needs: 2.5 GB needed for 1 GB archives (99%), which is realistic for current gateways
Dimensioning (Skype trace)
[Figure: average storage on gateways (MB), i.e. buffer consumed at each peer (average, 0 to 1.2 MB) versus time (hours, 0 to 700); curve: Total; annotation: stopping backups]
[Figure: CDF of the provisioned buffer (max, in GB, 0 to 4); curve: Total]
• Average usage remains low: less than 1 MB here
• Data is really offloaded to peers
• Gateways are effectively used as buffers
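The difference between peak and average buffer usage can be illustrated with a toy model in Python. It is a deliberately simplified sketch: arrivals and the hourly drain rate are hypothetical, the drain is assumed constant (66 kB/s for one hour is 237.6 MB), and gateway or peer unavailability and erasure-code redundancy are ignored.

```python
def buffer_levels(arrivals_mb, drain_mb_per_step):
    """Buffer occupancy after each step: add arrivals, then drain to peers."""
    level, levels = 0.0, []
    for arrived in arrivals_mb:
        level = max(0.0, level + arrived - drain_mb_per_step)
        levels.append(level)
    return levels

# Hypothetical workload: a 1000 MB archive arrives at hour 0 and at hour 5.
arrivals = [1000, 0, 0, 0, 0, 1000, 0, 0, 0, 0]
levels = buffer_levels(arrivals, drain_mb_per_step=237.6)
peak = max(levels)                    # what the buffer must be provisioned for
average = sum(levels) / len(levels)   # what is occupied on average
```

Provisioning follows the peak, while the average stays much lower because the buffer empties between backups; with backups as rare as a few per month, the long idle periods pull the average down further.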
Conclusion
• Realistic architecture for P2P backup systems
• Evaluation using trace-based simulation
• TTB and TTR are greatly reduced (the network connection can be used more efficiently)
• More convenient for users: backup tasks can be offloaded quickly (at LAN speed) from the user's machine to the gateway
• Fully decentralized
• Trace of gateway availability