Post on 21-Dec-2015
![Page 1: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/1.jpg)
Peer-to-Peer Backup
Presented by Yingwu Zhu
![Page 2: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/2.jpg)
Overview
A short introduction to backup
Peer-to-peer backup
![Page 3: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/3.jpg)
Why Do We Need Backup?
User errors, e.g., accidental deletion or overwriting
Hardware failures, e.g., disk failures
Software errors, e.g., file-system corruption
Natural disasters, e.g., earthquakes
Backups are used to recover system or user data
![Page 4: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/4.jpg)
What Is a Good Backup?
Two metrics
– Performance (speed): back up as quickly as possible
– Correctness: data integrity and recoverability
   – Data in the backup is guaranteed not to have been modified
   – Data can be recovered from the backup
![Page 5: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/5.jpg)
Recovery
The counterpart of backup
Why do we need recovery?
– Disaster recovery, e.g., recover the whole file system
– Stupidity recovery, e.g., recover a small set of files
![Page 6: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/6.jpg)
Primary Backup/Restore Approaches
Logical backup, e.g., dump
– Interprets file-system metadata
– Identifies which files need backup
Physical backup
– Ignores file structure
– Block-based
![Page 7: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/7.jpg)
Logical Backup
Benefits, due to the use of the underlying FS
– Flexible: back up or restore a subset of files
– Fast recovery of individual files
Drawback, also due to the use of the underlying FS
– Slow
   – Must traverse the file/directory hierarchy
   – Writes each file contiguously to the backup media
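As a rough illustration of the logical approach, the sketch below walks a directory tree and lists the files modified since the last backup. It assumes modification time is the selection criterion; the function name `files_needing_backup` and the `since` parameter are illustrative and are not taken from dump.

```python
import os

def files_needing_backup(root, since):
    """Return paths (relative to root) of files modified after `since`.

    `since` is a POSIX timestamp of the previous backup (assumed criterion).
    The walk itself is the cost the slide refers to: every directory is visited.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime > since:
                selected.append(os.path.relpath(path, root))
    return sorted(selected)
```

A physical backup would skip this traversal entirely and copy disk blocks in order, which is why it is faster but cannot easily pick out a subset of files.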
![Page 8: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/8.jpg)
Physical Backup
Benefits
– Quick: avoids costly seek operations
– Simple: ignores file structure
Drawbacks
– Non-portable: depends on the disk layout
– Difficult to recover a subset of the FS: requires a full restore
![Page 9: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/9.jpg)
On-line Backup
Unlike many backup schemes, does not require the FS to remain quiescent during backup
Allows users to continue accessing files during backup
– Synchronous mirroring (remote)
   – Expensive in performance and network bandwidth
   – Strong consistency
– Asynchronous mirroring (remote)
   – Uses the copy-on-write technique
   – Periodically transfers self-consistent snapshots
   – Some data loss, but better performance and lower bandwidth consumption
![Page 10: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/10.jpg)
Pastiche: Making Backup Cheap and Easy
OSDI’02 paper: P2P backup
Introduction
System Design
Evaluation
Summary
![Page 11: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/11.jpg)
Introduction
Traditional backup
– Highly reliable storage devices
– Administrative effort
– Cost of the storage media, as well as of managing the media and transferring it off-site
Internet backup
– Very costly: services charge a high fee (e.g., $15 per month for 4 GB of data, covering neither applications nor the OS)
![Page 12: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/12.jpg)
Introduction
Several facts
– Large, cheap disks: cost approaching that of tape, but with better access and restore times
– Low write traffic: newly written data is a small portion of the total
– Excess storage capacity, e.g., 53% full on 5,000 machines
Take advantage of low write traffic
Take advantage of excess storage capacity
![Page 13: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/13.jpg)
Introduction
Peer-to-peer backup
– Exploits slack resources at participating machines
– Administration-free
– Backs up data on multiple machines, most of them nearby (for performance), but at least one far away to guard against disaster
– Consolidates similar files for efficient storage, e.g., machines running the same OS, such as Windows 2000/98
– Untrusted machines require data privacy and integrity
![Page 14: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/14.jpg)
Introduction
Enabling technologies for P2P backup
– Pastry (P2P location and routing infrastructure)
– Content-based indexing
– Convergent encryption
![Page 15: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/15.jpg)
Introduction
Pastry
– Peer-to-peer routing
– Locality-aware (e.g., uses a network proximity metric)
Content-based indexing
– Finds similarity across versions of a file and across different files
– Anchors found with Rabin fingerprints divide files into chunks
– Editing a file changes only the chunks it touches
– Each chunk is named by the SHA-1 hash of its content
– Identical chunks are coalesced across files
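The chunking idea above can be sketched as follows. This is a minimal illustration, not Pastiche's code: a simple polynomial rolling hash stands in for a true Rabin fingerprint, and the constants `WINDOW`, `MASK`, `BASE`, `MOD` and the function names are assumptions chosen for the example.

```python
import hashlib

WINDOW = 16            # bytes covered by the rolling hash (assumed value)
MASK = (1 << 6) - 1    # anchor when the low 6 bits are zero, ~64-byte chunks (assumed)
BASE = 257             # polynomial base for the stand-in rolling hash
MOD = (1 << 31) - 1    # modulus for the stand-in rolling hash

def split_chunks(data: bytes):
    """Yield chunks whose boundaries (anchors) are chosen by local content."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    h = 0
    start = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # add the newest byte
        # Anchor: hash matches the pattern and the chunk is at least WINDOW long.
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def chunk_ids(data: bytes):
    """Name each chunk by the SHA-1 hash of its content."""
    return [hashlib.sha1(chunk).hexdigest() for chunk in split_chunks(data)]
```

Because an anchor depends only on the bytes inside the window, inserting a byte near the start of a file changes the first chunk while later chunks typically keep their IDs, so they can be coalesced with the copies already stored.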
![Page 16: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/16.jpg)
Introduction
Convergent encryption
– Originally proposed by Farsite
– Each file is encrypted with a key derived from the file’s contents by hashing
– Data privacy and integrity
– Data sharing
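A toy sketch of convergent encryption, to make the key derivation concrete. It assumes SHA-256 for the content hash and uses a hash-based XOR keystream as a stand-in for a real cipher so that the example needs only the standard library; the function names are illustrative, and this construction is not suitable for production use.

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    """Stand-in keystream: SHA-256 of key + counter (illustration only)."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])

def convergent_encrypt(plaintext: bytes):
    """Derive the key from the content itself, then encrypt with it."""
    key = hashlib.sha256(plaintext).digest()
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Decrypt, then check integrity by re-deriving the key from the result."""
    stream = _keystream(key, len(ciphertext))
    plaintext = bytes(c ^ s for c, s in zip(ciphertext, stream))
    if hashlib.sha256(plaintext).digest() != key:
        raise ValueError("ciphertext was modified or the key is wrong")
    return plaintext
```

Two nodes that hold the same plaintext derive the same key and therefore produce the same ciphertext, which is what lets an untrusted host store a single shared copy without being able to read it.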
![Page 17: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/17.jpg)
System Design
Data is stored as chunks by content-based indexing
– Chunks carry owner lists (a set of nodes)
– Naming and storing chunks, see Figure 1
– Chunks are immutable
– Writing a chunk increments its reference count (+1)
Meta-data chunk for a file
– A list of handles for its chunks, e.g., <handle, chunkId>
– Ownership, permissions, creation/modification times, etc.
– Encrypted
– Mutable, to avoid cascading writes from the file to the root
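One way to picture immutable, reference-counted chunks with owner lists is the in-memory sketch below. The class `ChunkStore`, its method names, and the choice to keep one count per owner are assumptions made for the illustration; the slides only state that chunks carry owner lists and that a write increments a reference count.

```python
import hashlib

class ChunkStore:
    """Toy chunk store: immutable chunks, named by SHA-1, counted per owner."""

    def __init__(self):
        self._data = {}   # chunk_id -> bytes (never modified once stored)
        self._refs = {}   # chunk_id -> {owner: reference count}

    def put(self, owner, payload: bytes) -> str:
        chunk_id = hashlib.sha1(payload).hexdigest()
        self._data.setdefault(chunk_id, payload)      # store only the first copy
        owners = self._refs.setdefault(chunk_id, {})
        owners[owner] = owners.get(owner, 0) + 1      # write => reference count +1
        return chunk_id

    def release(self, owner, chunk_id: str) -> None:
        owners = self._refs[chunk_id]
        owners[owner] -= 1
        if owners[owner] == 0:
            del owners[owner]
        if not owners:                                # no owner left: reclaim space
            del self._refs[chunk_id]
            del self._data[chunk_id]

    def owners(self, chunk_id: str) -> set:
        return set(self._refs.get(chunk_id, {}))

    def __len__(self) -> int:
        return len(self._data)
```

Storing the same payload for two owners keeps one copy with both owners listed; the chunk disappears only after every owner has released it.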
![Page 18: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/18.jpg)
System Design
![Page 19: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/19.jpg)
System Design
Using abstracts to find data redundancy
– Signature: the list of chunkIds describing a node’s FS
– Fact: a node’s signature doesn’t change much over time, since only a small amount of data is updated
– The initial backup of a node is expensive: all data must be backed up to a backup site/node
– Find an ideal backup buddy: one that holds a superset of the data of the node that needs backup
![Page 20: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/20.jpg)
System Design
Using abstracts to find data redundancy
– How to find a good buddy: the ideal case, or at least one with substantial overlap in signatures?
Naive: compare two signatures directly, which is impractical
– Signatures are large
– A node’s buddy set can change over time
Abstract: a random subset of a signature
– Tens of chunkIds per abstract can work well
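The abstract idea can be sketched in a few lines. The names `make_abstract` and `estimate_coverage` are hypothetical, and signatures are modelled as plain Python sets of chunkId strings, which is an assumption of this example.

```python
import random

def make_abstract(signature, size, rng):
    """Pick a random subset of a node's signature (its set of chunkIds)."""
    ids = sorted(signature)                    # sort for reproducible sampling
    return rng.sample(ids, min(size, len(ids)))

def estimate_coverage(abstract, candidate_signature):
    """Fraction of sampled chunkIds that the candidate buddy already stores."""
    if not abstract:
        return 0.0
    hits = sum(1 for chunk_id in abstract if chunk_id in candidate_signature)
    return hits / len(abstract)
```

A candidate that holds a superset of the node's data scores 1.0 and an unrelated machine scores close to 0.0, so only the small abstract has to travel over the network rather than the full signature.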
![Page 21: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/21.jpg)
System Design
How to find a set of buddies?
– Requirements for a set of buddies
   – Substantial overlap in signatures, to reduce storage overhead
   – Most buddies should be nearby, to reduce network load and improve restore performance
   – At least one buddy should be far away, to provide geographic diversity
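The three requirements can be combined into a simple selection rule, sketched below. Everything here is an assumption made for illustration: the `Candidate` record, the use of round-trip time as the distance measure, and the thresholds `min_coverage` and `far_rtt_ms` are not taken from the slides.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    node_id: str
    coverage: float   # estimated signature overlap, 0.0 to 1.0
    rtt_ms: float     # round-trip time, used here as a proxy for distance

def pick_buddies(candidates, k, min_coverage=0.5, far_rtt_ms=100.0):
    """Choose up to k buddies (k >= 1): one far away, the rest nearby."""
    eligible = [c for c in candidates if c.coverage >= min_coverage]
    near = sorted((c for c in eligible if c.rtt_ms < far_rtt_ms),
                  key=lambda c: (c.rtt_ms, -c.coverage))
    far = sorted((c for c in eligible if c.rtt_ms >= far_rtt_ms),
                 key=lambda c: -c.coverage)
    chosen = far[:1]                          # geographic diversity first
    chosen += near[:k - len(chosen)]          # then the closest good matches
    chosen += far[1:1 + k - len(chosen)]      # fill remaining slots if needed
    return chosen
```

With `k=3`, two nearby high-overlap machines and one distant one would be picked, while a nearby machine with little overlap is skipped.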
![Page 22: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/22.jpg)
System Design
Using two Pastry overlays to facilitate buddy discovery
– Standard Pastry overlay with a network proximity metric
– Second overlay with an FS overlap metric
– Lighthouse sweep: each discovery request contains an abstract
   – Subsequent probes are generated by varying the first digit of the original nodeId
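Generating the probe targets for a lighthouse sweep might look like the sketch below. It assumes nodeIds are written as hexadecimal strings (Pastry digits in base 16); the function name is illustrative.

```python
HEX_DIGITS = "0123456789abcdef"

def lighthouse_probe_ids(node_id: str):
    """Return the nodeIds obtained by varying the first digit of node_id."""
    node_id = node_id.lower()
    if not node_id or any(ch not in HEX_DIGITS for ch in node_id):
        raise ValueError("node_id must be a non-empty hexadecimal string")
    rest = node_id[1:]
    # One probe per alternative first digit; the original id is excluded.
    return [digit + rest for digit in HEX_DIGITS if digit != node_id[0]]
```

Each probe is routed toward a different region of the identifier space, so the sweep samples candidates spread across the overlay rather than only the node's own neighbourhood.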
![Page 23: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/23.jpg)
Backup
A backup node has full control over what, when, and how often to back up
Each discrete backup is a single snapshot
The skeleton for a snapshot is stored as a collection of persistent, per-file logs, see Figure 2
The skeleton and retained snapshots are stored at the local node and its backup nodes
![Page 24: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/24.jpg)
Backup
![Page 25: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/25.jpg)
Backup
Backup procedure (asynchronous, using copy-on-write)
– The chunks to be added to the backup buddies
   – First check, then fetch if needed (by buddies)
– The list of chunks to be removed
   – Delete chunks that don’t belong to any snapshot
   – Public key to ensure correctness of requests
   – Deferred to the end of the snapshot process
– The meta-data chunks in the skeleton that change as a result
   – Overwrite old meta-data chunks
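The add and remove lists can be derived with plain set arithmetic, as in the sketch below. Snapshots are modelled as sets of chunkIds and the function name `plan_backup` is hypothetical; request signing and meta-data updates are left out.

```python
def plan_backup(retained, new_snapshot, dropped):
    """Compute which chunks to offer to buddies and which to delete.

    retained:     list of chunkId sets for snapshots that are kept
    new_snapshot: chunkId set for the snapshot being taken
    dropped:      list of chunkId sets for snapshots being discarded
    """
    previously_sent = set().union(*retained, *dropped)
    still_needed = set().union(*retained, new_snapshot)
    # Offer only chunks no earlier snapshot contained; a buddy may still
    # hold one already for another owner, hence "check, then fetch".
    to_add = set(new_snapshot) - previously_sent
    # Delete only chunks that no surviving snapshot references; in the
    # procedure above this step runs at the end of the snapshot process.
    to_remove = set().union(*dropped) - still_needed
    return to_add, to_remove
```

For example, with one retained snapshot `{"a", "b"}`, a dropped snapshot `{"b", "c"}` and a new snapshot `{"a", "d"}`, only `d` is sent and only `c` is deleted.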
![Page 26: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/26.jpg)
Restoration
Partial restore is straightforward, because a node retains its archive skeleton
Try to restore from the nearest buddy
How to recover the entire machine?
– A node keeps a copy of its root meta-data object on each member of its leaf set
– The root block contains the set of buddies that back up its FS
![Page 27: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/27.jpg)
Detecting Failure and Malice
Untrusted buddies
– Come and go at will
– May claim to store chunks without actually doing so
A probabilistic mechanism to deal with this
– Probe a buddy for a random subset of the chunks it should store
– If it passes the check, carry on
– Otherwise, replace it with another buddy candidate
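A sketch of the probing step, under simplifying assumptions: the buddy is modelled as a callable `buddy_fetch(chunk_id)` that returns the chunk bytes or `None`, and the check simply re-hashes what comes back. The names and the detection estimate (which treats each probed chunk as an independent draw) are illustrative, not the paper's protocol.

```python
import hashlib
import random

def probe_buddy(buddy_fetch, expected_ids, sample_size, rng):
    """Ask the buddy for a random subset of chunks and verify each one."""
    ids = sorted(expected_ids)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    for chunk_id in sample:
        payload = buddy_fetch(chunk_id)
        # A chunk is named by its SHA-1 hash, so the name doubles as a checksum.
        if payload is None or hashlib.sha1(payload).hexdigest() != chunk_id:
            return False
    return True

def detection_probability(missing_fraction, sample_size):
    """Chance that at least one probed chunk is missing (independent draws)."""
    return 1 - (1 - missing_fraction) ** sample_size
```

Under this estimate, a buddy that silently dropped 10% of the chunks is caught by a 20-chunk probe with probability of roughly 0.88, and repeated probes push that toward certainty.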
![Page 28: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/28.jpg)
Preventing Greed
Problem: a greedy node consumes too much storage
Proposed solutions
– Contribution = consumption
– Solve cryptographic puzzles in proportion to storage consumed
– Electronic currency
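The slides do not say which puzzle is meant. The sketch below assumes a hashcash-style puzzle (find a nonce whose SHA-256 digest starts with a given number of zero bits) and a hypothetical rule of one puzzle per unit of storage consumed; all names and parameters are illustrative.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()

def solve_puzzle(challenge: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce; expected work doubles with each extra bit."""
    for nonce in count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce

def verify_puzzle(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs a single hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits

def puzzles_owed(bytes_consumed: int, bytes_per_puzzle: int) -> int:
    """Assumed policy: one puzzle per started unit of consumed storage."""
    return -(-bytes_consumed // bytes_per_puzzle)   # ceiling division
```

The asymmetry is the point: solving is expensive for the consumer, verifying is cheap for the host, so hoarding storage costs computation in proportion to the space used.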
![Page 29: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/29.jpg)
An Alternative Design
Distribute chunks to a peer-to-peer storage system
Advantages
– K backup copies of each chunk exist somewhere in the system
– Pastry takes care of failed nodes
Disadvantages
– Does not consider network proximity, which increases network load and restore latency
– Difficult to deal with malicious nodes
![Page 30: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/30.jpg)
Evaluation
Prototype: the chunkstore FS + a backup daemon
Evaluate
– Performance of the chunkstore FS
– Performance of backup/restore
– How large must an abstract be? Is the lighthouse sweep able to find buddies?
![Page 31: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/31.jpg)
Performance of the Chunkstore FS
Benchmark: MAB (Modified Andrew Benchmark)
Baseline: ext2fs
Slight overhead due to Rabin fingerprints
![Page 32: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/32.jpg)
Performance of Backup/Restore
Workload: 13.4MB tree of 1641 files and 109 directories, total 4004 chunks
![Page 33: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/33.jpg)
Buddy Discovery
Abstract size vs. signature overlap
– A Windows 98 machine with Office 2000 Professional, 90,000 chunks
– A Linux machine running the Debian unstable release, 270,000 chunks
– Result:
   – Estimates are independent of sample size
   – Small abstracts are effective if good buddies exist
![Page 34: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/34.jpg)
Abstract Size
![Page 35: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/35.jpg)
Buddy Discovery
How effectively do lighthouse sweeps find good buddies?
– 50,000 nodes, drawn from a distribution of 11 types
– 30% (type 1), 20% each (types 2 and 3), 10% each (types 4 and 5), 5% (type 6), 1% each (types 7-11)
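For reference, the simulated population can be reconstructed as below. The sketch reads the percentages as applying to each listed type individually (20% for each of types 2 and 3, and so on), an interpretation supported by the fact that the shares then sum to 100%; the names are illustrative.

```python
# Share of each node type, in percent (per-type reading of the slide).
NODE_TYPE_PERCENT = {1: 30, 2: 20, 3: 20, 4: 10, 5: 10, 6: 5,
                     7: 1, 8: 1, 9: 1, 10: 1, 11: 1}

def build_population(total_nodes=50_000):
    """Return a list with one type label per simulated node."""
    if sum(NODE_TYPE_PERCENT.values()) != 100:
        raise ValueError("type shares must sum to 100%")
    population = []
    for node_type, percent in NODE_TYPE_PERCENT.items():
        population.extend([node_type] * (total_nodes * percent // 100))
    return population
```

With 50,000 nodes this gives 15,000 machines of the most common type but only 500 of each rare type, which is the hard case for buddy discovery.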
![Page 36: Peer-to-Peer Backup Presented by Yingwu Zhu. Overview A short introduction to backup Peer-to-Peer backup](https://reader037.vdocuments.site/reader037/viewer/2022110207/56649d565503460f94a34578/html5/thumbnails/36.jpg)
Summary
Automatic backup with no administrative costs
Exploits slack resources and coalesces duplicate copies of chunks across files
Backs up to a set of buddies for good backup/restore performance
Handles security issues arising from untrusted buddies