Upload: glusterorg

Post on 08-Jan-2017

Deduplication in Storage Systems

Joseph Fernandes, Ewen Pinto, Srinivas Billava

Who are we?

Joseph Fernandes (Senior Engineer, Red Hat Storage)

Ewen Pinto (VI Sem MCA, NMAMIT, Nitte)

Srinivas Billava (VI Sem MCA, NMAMIT, Nitte)

Agenda

What is Dedupe

Why Dedupe

Types of Dedupe

What is Deduped

Where it is Deduped

When it is Deduped

Challenges in Dedupe

Current work

What is Deduplication?

An intelligent way of storing data: redundant copies are removed and only one instance is stored.

What is Deduplication?

Data units are identified by hash index

Redundant data units replaced by pointers

Hash algorithm with minimal collisions

Search should be precise and fast

Should have a rich metadata filter: modification frequency, IO sizes, etc.

Should deal with the distributed nature of data

Should do load balancing
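
The hash index and pointer replacement listed above can be sketched in a few lines. This is only an illustration, not YADL's implementation: the class name `DedupeStore`, its methods, the reference counting, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class DedupeStore:
    """Toy in-memory store (hypothetical): each unique data unit is kept once."""

    def __init__(self):
        self.index = {}      # hash -> the single stored instance of a data unit
        self.refcount = {}   # hash -> number of pointers handed out for it

    def put(self, unit: bytes) -> str:
        """Store a data unit and return a pointer (its hash) to it."""
        key = hashlib.sha256(unit).hexdigest()
        if key not in self.index:
            self.index[key] = unit          # first copy: store the data
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key                          # redundant copies only get a pointer

    def get(self, pointer: str) -> bytes:
        """Follow a pointer back to the stored data unit."""
        return self.index[pointer]

    def delete(self, pointer: str) -> None:
        """Drop one pointer; free the data when no pointer is left."""
        self.refcount[pointer] -= 1
        if self.refcount[pointer] == 0:
            del self.index[pointer]
            del self.refcount[pointer]


store = DedupeStore()
pointers = [store.put(u) for u in (b"alpha", b"beta", b"alpha", b"alpha")]
print(len(pointers), "units written,", len(store.index), "units stored")
# prints: 4 units written, 2 units stored
```

A real system would also need a persistent index and a policy for hash collisions; both are left out of this sketch.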

Why dedupe?

Reduces Total Cost of Ownership (TCO)

Storage

Network

Used in Backup/Archive

Disaster Recovery

Replication local/remote

What is deduped?

File Level (Single instancing)

[Diagram: File 1 and File 2 both produce # HASH 1]

What is deduped?

File Level (Single instancing)

[Diagram: File 1 is stored under # HASH 1; File 2 is replaced by a pointer]

What is deduped?

File Level (Single instancing)

[Diagram: File 1 produces # HASH 1 and File 2 produces # HASH 2]
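
File-level single instancing, as shown on the slides above, can be sketched as follows. The function names `file_hash` and `single_instance`, the use of SHA-256, and the temporary demo files are assumptions for illustration only.

```python
import hashlib
import os
import tempfile


def file_hash(path, bufsize=64 * 1024):
    """Hash a whole file, reading it in pieces so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def single_instance(paths):
    """Map every path to the first path seen with the same content hash."""
    first_seen = {}   # hash -> path that holds the stored instance
    pointers = {}     # path -> path of the stored instance it points to
    for path in paths:
        pointers[path] = first_seen.setdefault(file_hash(path), path)
    return pointers


def demo():
    contents = {
        "file1": b"same bytes",
        "file2": b"same bytes",      # duplicate of file1
        "file3": b"other bytes",     # different content, different hash
    }
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for name, data in contents.items():
            path = os.path.join(d, name)
            with open(path, "wb") as f:
                f.write(data)
            paths.append(path)
        pointers = single_instance(paths)
        return {os.path.basename(k): os.path.basename(v)
                for k, v in pointers.items()}


result = demo()
print(result)
# prints: {'file1': 'file1', 'file2': 'file1', 'file3': 'file3'}
```

Changing a single byte of a file changes its hash, so file-level dedupe finds only exact duplicates.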

What is deduped?

Block Level

File 1: B1 B2 B3 B4 B5 B6 (# HASH 1 to # HASH 6, one hash per block)

File 2: B1 B1 B3 B4 B4 B6

Fixed Block Chunking

File is divided into equal-length blocks

Pros: Faster!

Cons: Not space efficient!

Fixed Block Chunking

[Diagram: a file divided into fixed-size blocks]
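
A small sketch of fixed block chunking shows why it is fast but not always space efficient. The 4-byte block size and the helper names are assumptions chosen to keep the example readable. Appending data keeps the old blocks intact, but inserting a single byte at the front shifts every block boundary, so no block matches any more.

```python
import hashlib


def fixed_chunks(data: bytes, size: int = 4):
    """Cut data into equal-length blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def unique_hashes(chunks):
    """Set of block hashes, i.e. what a dedupe index would hold."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


original = b"AAAABBBBCCCCDDDD"
appended = original + b"EEEE"   # new data added at the end
shifted = b"x" + original       # one byte inserted at the front

base = unique_hashes(fixed_chunks(original))
shared_appended = len(base & unique_hashes(fixed_chunks(appended)))
shared_shifted = len(base & unique_hashes(fixed_chunks(shifted)))

print(shared_appended, shared_shifted)
# prints: 4 0
```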

Variable Block Chunking

File is chunked into variable-length blocks

Block size is determined by content

Rolling hash algorithm: Rabin-Karp

RHash = a[0] * p^(n-1) + a[1] * p^(n-2) + a[2] * p^(n-3) + ... + a[n-2] * p + a[n-1]

If (RHash & fingerprint) == 0 { Chunk! }

Variable Block Chunking

[Diagram: a file divided into variable-length blocks]
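
The rolling-hash rule from the earlier slide can be sketched as below. The window size, the base `p = 31`, the 32-bit modulus, and the 8-bit mask are illustrative assumptions, not values from the talk. A boundary is declared wherever the low bits of the rolling hash are zero, so boundaries depend only on the bytes inside the window and survive an insertion earlier in the file.

```python
import random

WINDOW = 16                        # bytes covered by the rolling hash (n)
BASE = 31                          # p in the polynomial
MOD = 1 << 32                      # keep the hash in 32 bits
MASK = 0xFF                        # low 8 bits zero -> boundary (~256-byte chunks)
TOP = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window


def variable_chunks(data: bytes):
    """Content-defined chunking with a Rabin-Karp style rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # shift in the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:      # window full and hash matches
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                  # trailing partial chunk
    return chunks


rng = random.Random(42)
data = bytes(rng.getrandbits(8) for _ in range(16 * 1024))
chunks_a = variable_chunks(data)
chunks_b = variable_chunks(b"x" + data)   # one byte inserted at the front

shared = len(set(chunks_a) & set(chunks_b))
print(len(chunks_a), "chunks,", shared, "still shared after the insertion")
```

Only the first chunk is affected by the inserted byte; every later chunk of the original data reappears unchanged. Production chunkers usually also enforce minimum and maximum chunk sizes, which this sketch omits.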

Variable Block Chunking

Pros: Space efficiency!

Cons: Slower!

Where is it Deduped?

Client Side

Pros: Less network traffic

Cons: Heavier clients

CPU/Memory

Metadata storage

Where is it Deduped?

Server Side

Pros: Lighter Clients

Cons: more network traffic
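
The network trade-off between the two placements can be sketched with a toy model. The `Server` class and both upload functions are hypothetical, and the model counts only chunk payload bytes; it ignores the smaller cost of sending hashes in the client-side case.

```python
import hashlib


class Server:
    """Toy chunk server (hypothetical): stores each unique chunk once."""

    def __init__(self):
        self.chunks = {}   # hash -> chunk

    def missing(self, hashes):
        """Tell the client which hashes the server has not stored yet."""
        return [h for h in hashes if h not in self.chunks]

    def store(self, chunk: bytes) -> None:
        self.chunks.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)


def upload_server_side(server, chunks):
    """Client ships every chunk; the server dedupes. Returns payload bytes sent."""
    sent = 0
    for chunk in chunks:
        server.store(chunk)
        sent += len(chunk)
    return sent


def upload_client_side(server, chunks):
    """Client hashes locally (CPU/memory cost) and ships only missing chunks."""
    by_hash = {hashlib.sha256(c).hexdigest(): c for c in chunks}
    sent = 0
    for h in server.missing(list(by_hash)):
        server.store(by_hash[h])
        sent += len(by_hash[h])
    return sent


chunks = [b"A" * 1024, b"B" * 1024, b"A" * 1024, b"A" * 1024]
server_side_bytes = upload_server_side(Server(), chunks)
client_side_bytes = upload_client_side(Server(), chunks)
print(server_side_bytes, client_side_bytes)
# prints: 4096 2048
```

Both servers end up storing the same two unique chunks; the difference is where the hashing work happens and how much data crosses the network.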

When is it Deduped?

Inline dedupe (as data is written)

Offline dedupe (after data is written)

Challenges in Dedupe

Single point of failure: the last line of defense, or fall off the cliff!

Performance

Distributed Dedupe

Current Work: YADL

Yet Another Dedupe Library

Stream based user space dedupe library

File or Object or Block

The Future : YADL-E

Current Work: YADL

https://github.com/YADL/yadl

Contributors:

Ewen Pinto ([email protected])

Srinivas B ([email protected])

Karthik US ([email protected])

Sukumar Poojary ([email protected])

THANK YOU
