dedupe nmamit
Deduplication in Storage Systems
Joseph Fernandes, Ewen Pinto, Srinivas Billava
Who we are
Joseph Fernandes (Senior Engineer, Red Hat Storage)
Ewen Pinto (VI Sem MCA, NMAMIT, Nitte)
Srinivas Billava (VI Sem MCA, NMAMIT, Nitte)
Agenda
What is Dedupe
Why Dedupe
Types of Dedupe
What is Deduped
Where it's Deduped
When it's Deduped
Challenges in Dedupe
Current work
What is Deduplication?
An intelligent way of storing data: remove redundant copies and store only one instance.
What is Deduplication?
Data units are identified by a hash index
Redundant data units are replaced by pointers
Hash algorithm with minimal collisions
Search should be precise and fast
Should have rich metadata filters: modification frequency, I/O sizes, etc.
Should deal with the distributed nature of data
Should do load balancing
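A minimal sketch of the hash index and pointers described above, assuming SHA-256 as the hash and an in-memory dict as the index. `DedupeStore`, `put` and `get` are hypothetical names used only for illustration, not YADL's API. A digest match is confirmed byte-for-byte so the search stays precise even if a collision ever occurs.

```python
import hashlib

class DedupeStore:
    """Hypothetical in-memory store keeping one instance per unique data unit."""

    def __init__(self):
        # Hash index: digest -> the single stored instance of that data unit
        self.index = {}

    def put(self, data: bytes) -> str:
        """Store data once and return its digest, which acts as the pointer."""
        digest = hashlib.sha256(data).hexdigest()
        stored = self.index.get(digest)
        if stored is None:
            self.index[digest] = data  # first copy: keep it
        elif stored != data:
            # Precise search: a digest match is verified byte-for-byte
            raise ValueError("hash collision detected")
        return digest  # for a duplicate, the caller keeps only this pointer

    def get(self, digest: str) -> bytes:
        """Follow a pointer back to the stored instance."""
        return self.index[digest]

store = DedupeStore()
p1 = store.put(b"hello world")
p2 = store.put(b"hello world")
print(p1 == p2, len(store.index))  # True 1
```

The second `put` costs only a dict lookup and returns the same pointer, which is why the slide stresses a fast, collision-resistant hash.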
Why dedupe?
Reduces Total Cost of Ownership (TCO)
Storage
Network
Used in Backup/Archive
Disaster Recovery
Replication local/remote
What is deduped?
File Level (Single instancing)
File 1 and File 2 have identical content: both hash to #HASH 1, so File 2 is replaced by a pointer to File 1
If File 2 differs even slightly, it hashes to #HASH 2 and is stored as a full separate copy
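A sketch of file-level single instancing, assuming SHA-256 over the whole file content; `single_instance` is a hypothetical helper. It reproduces both cases on the slide: identical files share #HASH 1, while a one-character change produces a second hash and a second stored copy.

```python
import hashlib

def file_hash(content: bytes) -> str:
    # Whole-file digest: any one-byte change yields a different hash
    return hashlib.sha256(content).hexdigest()

def single_instance(files: dict) -> tuple:
    """Hypothetical helper: files maps name -> content.
    Returns (stored, pointers): unique contents keyed by hash, and name -> hash."""
    stored, pointers = {}, {}
    for name, content in files.items():
        h = file_hash(content)
        stored.setdefault(h, content)  # keep only the first instance
        pointers[name] = h             # every file becomes a pointer
    return stored, pointers

same = {"file1": b"report v1", "file2": b"report v1"}
changed = {"file1": b"report v1", "file2": b"report v2"}
print(len(single_instance(same)[0]))     # 1: File 2 is just a pointer
print(len(single_instance(changed)[0]))  # 2: File 2 gets its own hash and copy
```

The weakness is visible in the second case: a tiny edit forces the whole file to be stored again, which is what block-level dedupe addresses.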
What is deduped?
Block Level
File 1: blocks B1 B2 B3 B4 B5 B6, each indexed by its own hash (#HASH 1 to #HASH 6)
File 2: blocks B1 B1 B3 B4 B4 B6, all already stored, so File 2 is kept as pointers to existing blocks
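A sketch of block-level dedupe for the example above, assuming SHA-256 per block and a deliberately tiny hypothetical `BLOCK_SIZE` of 4 bytes so the data stays readable. Each file is reduced to a "recipe" (its ordered list of block hashes); `store_file` and `read_file` are illustrative names only.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical tiny block size, chosen for readability

def store_file(content: bytes, block_store: dict) -> list:
    """Split content into blocks, store unseen blocks, and return the
    file's recipe: the ordered list of block hashes (pointers)."""
    recipe = []
    for i in range(0, len(content), BLOCK_SIZE):
        block = content[i:i + BLOCK_SIZE]
        h = hashlib.sha256(block).hexdigest()
        block_store.setdefault(h, block)  # only new blocks consume space
        recipe.append(h)
    return recipe

def read_file(recipe: list, block_store: dict) -> bytes:
    """Rebuild a file by following its pointers."""
    return b"".join(block_store[h] for h in recipe)

blocks = {}
b = [bytes([65 + k]) * BLOCK_SIZE for k in range(6)]  # B1..B6 = AAAA, BBBB, ...
file1 = b[0] + b[1] + b[2] + b[3] + b[4] + b[5]       # B1 B2 B3 B4 B5 B6
file2 = b[0] + b[0] + b[2] + b[3] + b[3] + b[5]       # B1 B1 B3 B4 B4 B6
r1 = store_file(file1, blocks)
r2 = store_file(file2, blocks)
print(len(blocks))                     # 6: File 2 added no new blocks
print(read_file(r2, blocks) == file2)  # True
```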
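Fixed Block Chunking
The file is divided into equal-length blocks
Pros: Faster! (boundaries are simple offsets, no content scanning)
Cons: Not space efficient! (inserting a single byte shifts every later block boundary, so unchanged data no longer matches existing blocks)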
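To make the "not space efficient" point concrete, a small sketch assuming SHA-256 per block and a hypothetical 4-byte block size: fixed chunking is just slicing at offsets (hence fast), but a single byte inserted at the front of the file leaves no block shared with the original.

```python
import hashlib

def fixed_chunks(data: bytes, size: int) -> list:
    # Boundaries fall every `size` bytes regardless of content
    return [data[i:i + size] for i in range(0, len(data), size)]

def digests(chunks: list) -> set:
    return {hashlib.sha256(c).hexdigest() for c in chunks}

original = b"ABCDEFGHIJKLMNOP"  # 16 bytes -> 4 blocks of 4
edited = b"X" + original        # one byte inserted at the front
old = digests(fixed_chunks(original, 4))
new = digests(fixed_chunks(edited, 4))
print(len(old & new))  # 0: every boundary shifted, so no block is shared
```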
[Figure: Fixed Block Chunking of a file]
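Variable Block Chunking
The file is chunked into variable-length blocks
Block size is determined by content
Rolling hash algorithm: Rabin-Karp, computed over a sliding window of the last n bytes a[0..n-1] with base p
$$RHash = p^{n-1} a[0] + p^{n-2} a[1] + p^{n-3} a[2] + \cdots + p\, a[n-2] + a[n-1]$$
If (RHash & fingerprint) == 0 { Chunk! }, i.e. a chunk boundary is declared wherever the masked window hash is zero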
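A sketch of content-defined chunking with the Rabin-Karp rolling hash above. The window size, base, modulus and mask values are assumptions chosen for illustration (`MASK` plays the role of `fingerprint`), and `cdc_chunks` is a hypothetical name. Because a boundary depends only on the bytes inside the window, inserting one byte at the front of the file disturbs at most the first chunk; every later chunk is found again unchanged.

```python
import hashlib
import random

WINDOW = 16          # n: bytes covered by the rolling window (assumed value)
BASE = 257           # p in the RHash formula (assumed value)
MOD = (1 << 61) - 1  # modulus keeping RHash bounded (assumed; a Mersenne prime)
MASK = 0x3F          # stands in for `fingerprint`: about 64-byte average chunks

def cdc_chunks(data: bytes) -> list:
    """Content-defined chunking driven by a Rabin-Karp rolling hash."""
    top = pow(BASE, WINDOW - 1, MOD)  # p^(n-1): weight of the oldest byte
    chunks, start, rhash = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Roll the window: remove the outgoing byte's term ...
            rhash = (rhash - data[i - WINDOW] * top) % MOD
        # ... shift the remaining terms by one power of p and add the new byte
        rhash = (rhash * BASE + byte) % MOD
        if i >= WINDOW - 1 and (rhash & MASK) == 0:
            chunks.append(data[start:i + 1])  # Chunk!
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # tail after the last boundary
    return chunks

def digests(chunks: list) -> set:
    return {hashlib.sha256(c).hexdigest() for c in chunks}

rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"X" + original  # a one-byte insertion at the front
old, new = digests(cdc_chunks(original)), digests(cdc_chunks(edited))
# Boundaries depend only on window content, so every chunk after the
# first one reappears unchanged in the edited file.
print(len(old), len(old & new))
```

The per-byte hash update in the loop is also why variable chunking is slower than slicing at fixed offsets.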
[Figure: Variable Block Chunking of a file]
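Variable Block Chunking
Pros: Space efficient! (boundaries follow content, so an insertion only changes the chunks around it)
Cons: Slower! (the rolling hash must be updated at every byte)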
Where is it deduped?
Client Side
Pros: Less network traffic (only data the server does not already have is sent)
Cons: Heavier clients: CPU/memory for hashing, plus metadata storage
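A sketch of one possible client-side exchange, assuming the client hashes fixed 4-byte blocks with SHA-256, sends only the hashes, and then uploads just the blocks the server reports as missing. `Server`, `missing`, `upload` and `client_backup` are hypothetical names, not a real protocol or YADL's API; the byte count covers block payload only, not the hashes themselves.

```python
import hashlib

class Server:
    """Hypothetical server holding a hash-indexed block store."""

    def __init__(self):
        self.blocks = {}

    def missing(self, hashes: list) -> list:
        # Server side: reply with the hashes it does not already hold
        return [h for h in hashes if h not in self.blocks]

    def upload(self, blocks: dict) -> None:
        # Server side: accept only the blocks that were missing
        self.blocks.update(blocks)

def client_backup(data: bytes, server: Server, size: int = 4) -> int:
    """Client-side dedupe: hash locally, send only new blocks.
    Returns the number of block payload bytes sent over the network."""
    chunks = {hashlib.sha256(data[i:i + size]).hexdigest(): data[i:i + size]
              for i in range(0, len(data), size)}  # client CPU spent here
    need = server.missing(list(chunks))  # only hashes cross the network
    payload = {h: chunks[h] for h in need}
    server.upload(payload)
    return sum(len(b) for b in payload.values())

srv = Server()
print(client_backup(b"AAAABBBBCCCC", srv))  # 12: first backup sends everything
print(client_backup(b"AAAABBBBCCCC", srv))  # 0: repeat backup sends no block data
```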
Where is it deduped?
Server Side
Pros: Lighter clients
Cons: More network traffic (all data is sent, duplicates included)
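When is it deduped?
Inline: data is deduplicated on the write path, before it is stored
Offline (post-process): data is stored first and deduplicated later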
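A sketch contrasting the two timings, assuming SHA-256 per block; all function names are hypothetical. Inline dedupe hashes on every write, so duplicates never reach storage but each write pays the hashing cost. Offline dedupe writes raw data first (fast writes) and needs extra space until a later background pass collapses the duplicates.

```python
import hashlib

def h(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def inline_write(store: dict, blocks: list) -> list:
    """Inline: dedupe on the write path; duplicates never reach the store."""
    refs = []
    for b in blocks:
        store.setdefault(h(b), b)
        refs.append(h(b))
    return refs

def offline_write(log: list, blocks: list) -> None:
    """Offline: write everything first (fast path, more space used)."""
    log.extend(blocks)

def offline_dedupe(log: list) -> dict:
    """Later background pass: collapse the raw log into unique blocks."""
    store = {}
    for b in log:
        store.setdefault(h(b), b)
    return store

data = [b"AAAA", b"BBBB", b"AAAA", b"AAAA"]
inline_store = {}
inline_write(inline_store, data)
log = []
offline_write(log, data)
print(len(inline_store), len(log))  # 2 4: offline temporarily holds duplicates
print(len(offline_dedupe(log)))     # 2 after the background pass
```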
Challenges in Dedupe
Single point of failure: the one stored copy is the last line of defense! Lose it, and every file pointing to it falls off the cliff!
Performance
Distributed Dedupe
Current Work: YADL
Yet Another Dedupe Library
Stream-based, user-space dedupe library
File, object, or block
The future: YADL-E
Current Work: YADL
https://github.com/YADL/yadl
Contributors:
Ewen Pinto
Srinivas B
Karthik US
Sukumar Poojary
THANK YOU