deduplication now - etouches · • single domain backup data passes through an individual system...


Post on 24-Apr-2020


DEDUPLICATION NOW …

AND WHERE IT’S HEADING

Lauren Whitehouse

Senior Analyst, Enterprise Strategy Group


Need Dedupe?


Before/After Dedupe


[Diagram labels: Production Data; Deduplication In Backup Process; Deduplication Backup Disk]


Dedupe Evolution

• File-level deduplication OR single-instance storage: Eliminate redundancy across files
• Block-level deduplication technology: Eliminate redundancy within and between files
• WAN optimization: Optimizes network bandwidth; aids with data transport between sites
• Deduplication appliances: Changes the economics of disk-to-disk backup
• Multi-node/Grid Solutions: Multi-node configurations introduce ability to deliver HA, load balancing, performance increase and global deduplication
• VTL with deduplication: Tape-centric disk-to-disk now optimized
• Backup with Deduplication: Deduplication becomes a more pervasive feature in backup software
• Symantec OST Interface: Symantec solves catalog tracking of deduped copies
• Dedupe on tape: Ability to create tapes for long-term retention that contain deduplicated data
• ?: What’s next?


Data Growth Out of Control?


Managing the Data Deluge

At approximately what rate do you believe your total volume of data is growing annually? (Percent of respondents)

Annual growth rate         100 or fewer servers (N=247)    More than 100 servers (N=246)
1% to 10% annually         20%                             9%
11% to 20% annually        42%                             28%
21% to 30% annually        23%                             24%
31% to 40% annually        9%                              9%
More than 40% annually     6%                              30%

• 62% with <100 servers have <20% growth/year
• 63% with >100 servers have >20% growth/year


Storage Spending Priorities

In which data storage areas will your organization make the most significant investments over the next 12-18 months? (Percent of respondents, five responses accepted, N=289)

• Increase use of flash-based SSDs: 8%
• Unified storage systems: 9%
• Converged data and storage networking: 9%
• Storage encryption solution: 12%
• Advanced file storage / file system technology…: 14%
• Purchase new NAS storage systems: 15%
• Tape replacement: 15%
• Use cloud storage services as way to source…: 17%
• Tiered storage: 17%
• Purchase more power-efficient storage hardware: 18%
• Data reduction technologies: 18%
• Storage virtualization: 21%
• Improved storage management software tools: 21%
• Purchase new SAN storage systems: 23%
• Data replication solution for off-site disaster…: 24%
• Backup and recovery solutions: 36%


Why Do We Need Dedupe?

Data Growth


Deduplication Creates Efficiencies in D2D Backup

• Financial benefits
  – Reduce disk costs; delay capital expenditures
  – Lower bandwidth costs
  – Reduce power & cooling costs
  – Tape replacement savings
• Operational benefits
  – Reduce operational overhead in backup
  – Reduces time and resource needs for recovery
• Business benefits
  – Increase retention periods
  – Improve recovery objectives
  – Improve backup consolidation from ROBOs
  – Improve DR


Best Dedupe Fit?

• “Traditional” file-level backup

• ROBO use cases

• Virtualized environments


… and Worst Fit?

• Pre-compressed or encrypted data

• File types that don’t have versions (multimedia)


What Impacts Reduction Ratios?

• Backup strategy (full vs. incremental or differential)

• Change rate between backups

• Retention

• Whether data is encrypted or compressed


Typical Dedupe Ratios

On average, what degree of capacity reduction has your organization experienced by using data deduplication technology? (Percent of respondents, N=140)

• Less than 10x reduction: 29%
• 10x to 20x reduction: 56%
• More than 20x reduction: 11%
• Don’t know: 5%


Capacity Savings

• Weekly full backup over 8 weeks
• 6 week retention
• 20:1 deduplication ratio

[Chart: Protected Capacity (TB) vs. Stored Capacity (TB) by Retention Period (weeks); vertical axis ticks from 5 to 40]

Retention Period (weeks)    1      2      3      4      5      6      7      8
Stored Capacity (TB)        1.25   1.67   1.88   1.67   1.79   1.76   1.84   2.00
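The stored-capacity figures on this slide can be turned into an effective reduction ratio and a percentage saved. The sketch below is illustrative only: the slide does not list the protected-capacity values, so the 5 TB per weekly full is an assumption inferred from the 20:1 ratio and the 2.00 TB stored at week 8. The function names are hypothetical.

```python
# Stored capacity (TB) per week, as listed on the slide.
stored_tb = [1.25, 1.67, 1.88, 1.67, 1.79, 1.76, 1.84, 2.00]

# ASSUMPTION (not stated on the slide): each weekly full protects 5 TB,
# so cumulative protected capacity is 5, 10, ..., 40 TB.
FULL_BACKUP_TB = 5.0


def effective_ratio(protected, stored):
    """Reduction ratio, e.g. 20.0 means 20:1."""
    return protected / stored


def percent_saved(ratio):
    """Share of capacity avoided for a given reduction ratio."""
    return (1 - 1 / ratio) * 100


for week, stored in enumerate(stored_tb, start=1):
    protected = FULL_BACKUP_TB * week
    ratio = effective_ratio(protected, stored)
    print(f"week {week}: protected {protected:.0f} TB, stored {stored:.2f} TB, "
          f"ratio {ratio:.1f}:1, saved {percent_saved(ratio):.1f}%")
```

Under that assumption, week 8 works out to 40 TB protected against 2.00 TB stored, which is the 20:1 ratio quoted on the slide (95% less capacity).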


Which Dedupe Approach Is Best?

Backup Software

VTL Gateway Appliance

NAS Dedupe Device


Identifying Duplicates

• Hash algorithms
  – More popular approach
    • Fixed block size
    • Variable block size
    • Sliding window block size
  – “Hash collisions” (false positives) a remote risk
  – Central index of IDs
• Delta differences
  – Faster
  – No “false positives”
  – Global deduplication across different backup streams is a limitation
• Hybrid approach
  – Combines delta differencing & hash calculation
  – Less CPU- and memory-intensive
  – Index is smaller
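A minimal sketch of the hash-based approach with a fixed block size and a central index of IDs. It is not any vendor's implementation: the `DedupeStore` class, the 4096-byte block size, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, in bytes


class DedupeStore:
    """Toy store: a central index maps block ID (hash) -> block bytes."""

    def __init__(self):
        self.index = {}

    def write(self, data):
        """Split data into fixed-size blocks, store unseen blocks once,
        and return the list of block IDs needed to rebuild the data."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            block_id = hashlib.sha256(block).hexdigest()
            # Only blocks whose ID is not yet in the index consume space.
            self.index.setdefault(block_id, block)
            recipe.append(block_id)
        return recipe

    def read(self, recipe):
        """Reassemble ("rehydrate") the original data from its block IDs."""
        return b"".join(self.index[block_id] for block_id in recipe)


store = DedupeStore()
backup1 = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
backup2 = b"A" * BLOCK_SIZE + b"X" * BLOCK_SIZE + b"C" * BLOCK_SIZE
recipe1 = store.write(backup1)
recipe2 = store.write(backup2)
print(len(recipe1) + len(recipe2), "blocks written,", len(store.index), "blocks stored")
```

Six blocks are written across the two backups but only four distinct blocks are stored, because the second backup repeats two blocks from the first. A hash collision would make two different blocks share an ID, which is the remote "false positive" risk noted above.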


Data Deduplication – Where?

[Diagram: three locations labeled Backup Source, Backup Initiator and Backup Target. Sources shown include a Remote or Branch Office (OS, Apps) connected over the WAN and an ESX Server hosting VMs (OS, Apps).]


Data Deduplication – When?

• Inline deduplication: before data is written to disk
• Post-process deduplication: after data is written to disk

[Diagram: three locations labeled Backup Source, Backup Initiator and Backup Target. Sources shown include a Remote or Branch Office (OS, Apps) connected over the WAN and an ESX Server hosting VMs (OS, Apps).]


Inline vs. Post-Process

Inline
• Requires less I/O
• Replication can begin immediately
• Re-assembly of data for recovery could impact performance
• Examples
  – EMC Data Domain
  – IBM ProtecTIER
  – NEC Hydrastor
  – Symantec NBU 5000 Series
  – Typically all software approaches

Post-Process
• Requires more I/O
• Requires disk landing zone (staging area)
• Dedupe & replication processes overlap
• Most recent full kept in native format
• Examples:
  – Exagrid
  – FalconStor
  – GreenBytes
  – HP VLS
  – Quantum Dxi
  – Sepaton DeltaStor
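A toy model of why inline deduplication requires less I/O than post-process deduplication. It only counts block-level disk reads and writes under simplifying assumptions (whole-block hashing, an in-memory index, a list standing in for the landing zone). The function names are hypothetical and real products differ in many details.

```python
import hashlib


def block_id(block):
    return hashlib.sha256(block).hexdigest()


def inline_dedupe(blocks):
    """Hash each block as it arrives; only unseen blocks are written to disk."""
    index = {}
    io = {"disk_writes": 0, "disk_reads": 0}
    for block in blocks:
        bid = block_id(block)
        if bid not in index:
            index[bid] = block
            io["disk_writes"] += 1
    return index, io


def post_process_dedupe(blocks):
    """Land every block on disk first, then read it back and deduplicate."""
    io = {"disk_writes": 0, "disk_reads": 0}
    landing_zone = []
    for block in blocks:  # phase 1: backup lands in native format
        landing_zone.append(block)
        io["disk_writes"] += 1
    index = {}
    for block in landing_zone:  # phase 2: dedupe pass over the landing zone
        io["disk_reads"] += 1
        bid = block_id(block)
        if bid not in index:
            index[bid] = block
            io["disk_writes"] += 1
    return index, io


stream = [b"os-image", b"app-data", b"os-image", b"os-image", b"app-data", b"logs"]
inline_index, inline_io = inline_dedupe(stream)
post_index, post_io = post_process_dedupe(stream)
print("inline:", inline_io)
print("post-process:", post_io)
```

Both approaches end up with the same three unique blocks, but in this model the post-process path performs nine writes and six reads against three writes for the inline path, and it needs the landing zone to hold the full backup first.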


Single- vs. Multi-Node Solutions

Single-Node Dedupe
• Performance & capacity is limited to upper threshold
  – Forklift upgrade
  – Add more islands of dedupe
  – Over-purchase to accommodate future growth
• Examples
  – EMC Data Domain
  – Fujitsu CS
  – GreenBytes
  – Quantum

Multi-Node Dedupe
• Manages multiple deduplication systems as one
• More linear throughput & capacity scaling
• Load balancing
• Examples
  – IBM ProtecTIER
  – EMC Avamar
  – Exagrid EX Series
  – FalconStor FDS
  – HP VLS
  – NEC HydraStor
  – Sepaton DeltaStor
  – Symantec NetBackup 5000 Series


Local vs. Global Dedupe

Local
• Single domain: backup data passes through an individual system and is compared with data passing through the same system
• Examples:
  – EMC Data Domain
  – Fujitsu
  – GreenBytes
  – Quantum

Global
• Deduplication across domains means backup data is compared with data within its system as well as other systems in the domain
• Can result in higher dedupe ratios
• Examples:
  – Exagrid
  – FalconStor
  – HP VLS
  – IBM ProtecTIER
  – NEC
  – Sepaton
  – Symantec NBU 5000 Series
  – Typically most backup software solutions
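A small sketch of why global deduplication can result in higher ratios than local deduplication: with a shared index, a block already seen by one system is not stored again by another. The system names and block contents are made up for illustration, and whole-block SHA-256 hashing is an assumption.

```python
import hashlib


def unique_ids(blocks):
    """Set of block IDs (hashes) for a stream of blocks."""
    return {hashlib.sha256(block).hexdigest() for block in blocks}


def local_stored(streams_by_system):
    """Local dedupe: each system compares data only against its own index."""
    return sum(len(unique_ids(blocks)) for blocks in streams_by_system.values())


def global_stored(streams_by_system):
    """Global dedupe: all systems compare data against one shared index."""
    shared_index = set()
    for blocks in streams_by_system.values():
        shared_index |= unique_ids(blocks)
    return len(shared_index)


# Hypothetical backup streams landing on two dedupe systems.
streams = {
    "system-1": [b"os", b"mail", b"os"],
    "system-2": [b"os", b"db", b"mail"],
}
total_blocks = sum(len(blocks) for blocks in streams.values())
print("blocks written:", total_blocks)
print("stored with local dedupe:", local_stored(streams))
print("stored with global dedupe:", global_stored(streams))
```

Here six blocks are written; local dedupe stores five (two on the first system, three on the second) while global dedupe stores three.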


Dedupe Approaches

Software-Based
• Content-aware; dedupe can be policy-based
• Can be more cost-effective
• Flexibility in disk selection
• End-to-end bandwidth efficiency; remote site backup
• Global dedupe
• Simplified management – single console, policy engine
• Can extend to tape
• Examples:
  – Arkeia
  – Asigra
  – Atempo
  – CA
  – Cofio
  – CommVault
  – Druva
  – EMC Avamar
  – I365
  – IBM
  – PHD Virtual
  – Quest
  – Symantec NBU & BE
  – Veeam

Hardware-Based
• Multiple backup vendor environments
• No impact on application performance
• Optimized replication
• Scalability of some solutions may cause disruptive upgrades or dedupe “islands”
• Examples:
  – EMC
  – Exagrid
  – FalconStor
  – Fujitsu
  – GreenBytes
  – HP
  – IBM
  – NEC
  – Quantum
  – Sepaton
  – Symantec


High-Value Feature

• Target system integration with backup catalogs and lifecycle policies

– Symantec OpenStorage (OST)

– EMC Networker


What’s New in Dedupe?

• New dedupe techniques

– Example: Arkeia Progressive

• Dedupe on tape

– Example: CommVault

• Target solutions moving processes “upstream”

– Example: Data Domain Boost

• Modular dedupe

– Example: HP StoreOnce

• Dedupe in hardware/software from same vendor

– Example: Symantec

• Ongoing improvements in capacity and performance


Disruptive Trends


Purchase Considerations

Which of the following considerations would you say are most important in your organization’s evaluation and selection of data deduplication technology? (Percent of respondents, N=145, five responses accepted)

• When deduplication occurs: 9%
• Experience of vendor in backup implementation: 10%
• Deduplication ratio: 12%
• Granularity of deduplication: 14%
• Where deduplication occurs: 17%
• Existing relationship with vendor: 17%
• Ability to replicate deduplicated data off-site: 21%
• Ability to deduplicate across systems/data sets as…: 23%
• Vendor service and support: 24%
• Scalability of solution: 31%
• Integration with existing backup processes: 33%
• Impact on backup/recovery performance: 35%
• Ease of implementation/use: 46%
• Cost of solution: 64%


Before Seeking Out Solutions …

• Understand your needs
  – Capacity and throughput requirements/planning
    • Full backup size; incremental backup size
    • Number of full/incremental backups per week
    • Change rate of data
    • Projected growth rate
    • Retention policies
    • Full backup window
    • Offsite copy window
  – Performance requirements
  – Requirements for offsite copies
  – Budget
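A rough sizing sketch that combines several of the planning inputs listed above. The formulas are simplifying assumptions rather than a vendor sizing method: the change rate is folded into the incremental backup size, growth is ignored, and the deduplication ratio is something you would have to estimate. All function names and sample numbers are hypothetical.

```python
def protected_capacity_tb(full_tb, incr_tb, fulls_per_week, incrs_per_week,
                          retention_weeks):
    """Logical (pre-dedupe) capacity of all backups kept within retention."""
    weekly_tb = full_tb * fulls_per_week + incr_tb * incrs_per_week
    return weekly_tb * retention_weeks


def stored_capacity_tb(protected_tb, dedupe_ratio):
    """Physical capacity needed for an ASSUMED reduction ratio (e.g. 10 for 10:1)."""
    return protected_tb / dedupe_ratio


def required_throughput_tb_per_hour(full_tb, window_hours):
    """Ingest rate needed to finish a full backup inside its window."""
    return full_tb / window_hours


# Hypothetical environment: 5 TB fulls weekly, 0.5 TB incrementals six days
# a week, 8 weeks of retention, a 10-hour full backup window.
protected = protected_capacity_tb(5.0, 0.5, 1, 6, 8)
stored = stored_capacity_tb(protected, 10.0)
throughput = required_throughput_tb_per_hour(5.0, 10.0)
print(f"protected: {protected:.1f} TB, stored at 10:1: {stored:.1f} TB, "
      f"throughput needed: {throughput:.2f} TB/hour")
```

With these sample inputs the environment protects 64 TB over the retention period, needs about 6.4 TB of physical capacity at an assumed 10:1, and must ingest 0.5 TB per hour to meet the full backup window.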


How is Dedupe Evolving?

• Mix of hardware & software approaches

• Scale requirements

– Performance

– Capacity

• Focus on recovery considerations

– Speed of “rehydration” and restore

– Reliability

– Criticality of the index … how is it protected?

• New architectures

• New packaging

• New dedupe techniques


THANKS!

[email protected]

Twitter: lauwhitehouse

Blog: www.dataprotectionperspectives.com


APPENDIX


Fixed- vs. Variable-Length Blocks

Fixed-Length Blocks
• Initial examination: Block A, Block B, Block C, Block D, Block E
• Subsequent examination (change in file): Block A, Block B, Block F, Block G, Block H
• Downstream blocks F, G & H change = no duplication detected after the change

Variable-Length Blocks
• Initial examination: Block A, Block B, Block C, Block D
• Subsequent examination (change in file): Block A, Block E, Block C, Block D
• Downstream blocks C & D unchanged = duplication detected
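A small sketch of the effect shown on this slide. Fixed-length blocks are cut by byte offset, so an insertion shifts every downstream block. Variable-length blocks are cut where the content says so, so boundaries after the change realign. To keep the example easy to follow, the content-defined boundary here is simply "cut after a newline byte", which is a stand-in assumption; production systems typically derive boundaries from a rolling hash over a sliding window instead. All names are hypothetical.

```python
import hashlib


def fixed_chunks(data, size):
    """Cut data into blocks purely by byte offset."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data, delimiter=b"\n"):
    """Cut data after each delimiter byte, so boundaries follow the content."""
    chunks, start = [], 0
    for i in range(len(data)):
        if data[i:i + 1] == delimiter:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes without a delimiter
    return chunks


def duplicates_detected(old_chunks, new_chunks):
    """Count new chunks whose hash already exists among the old chunks."""
    known = {hashlib.sha256(chunk).digest() for chunk in old_chunks}
    return sum(1 for chunk in new_chunks
               if hashlib.sha256(chunk).digest() in known)


original = b"alpha\nbravo\ngamma\ndelta\nomega\n"
modified = b"alpha\nbravo\ngam!ma\ndelta\nomega\n"  # one byte inserted mid-file

fixed_hits = duplicates_detected(fixed_chunks(original, 6),
                                 fixed_chunks(modified, 6))
variable_hits = duplicates_detected(variable_chunks(original),
                                    variable_chunks(modified))
print("fixed-length duplicates detected:", fixed_hits)
print("variable-length duplicates detected:", variable_hits)
```

With fixed 6-byte blocks only the two blocks before the change still match. With content-defined boundaries four of the five blocks match, because the blocks downstream of the change are unchanged.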


[Diagram: two timelines, one for Inline dedupe and one for Post-process dedupe, each showing Backup Job and Replication along a Time axis, with Time to DR marked]