TRANSCRIPT
Tuesday, April 18, 2023
Data Minimisation
Managing Data Growth While Containing Cost and Carbon Footprint
Ken Hall, Dimension Data
Agenda
Introductions
Today’s data management challenges
Energy efficiency in the data centre
What is Data Minimisation?
Online Active Archiving
Backup Data De-Duplication
Data Minimisation effects
Developing the business case
Questions & Answers
Dimension Data - ‘Data Centre & Storage Solutions’
Network Integration
Microsoft Solutions Infrastructure
Microsoft Solutions Application Integration
Security
Managed Services
Customer Interactive Solutions
Data Centre & Storage Solutions – Availability, Compliance & Optimisation
• Storage Solutions – SAN, NAS, CAS
• Virtualisation Solutions – DR, Server & Desktop Consolidation
• Backup, Recovery & Archiving Solutions
• Data Centre Environmentals – Power, Cooling & Rack Solutions
Key Technology Partners
• APC, Cisco, EMC, HDS, HP, IBM, Microsoft, NetApp, Quantum, Symantec, Sun
The Digital Universe is Rapidly Expanding
Source: IDC White Paper, "The Diverse and Exploding Digital Universe," March 2008
Chart: Amount of Digital Information Created and Replicated Each Year (exabytes, 2006 to 2011). Ten-fold growth in five years, from 173 exabytes to 1,773 exabytes.
Typical DD Customer – Exponential Data Growth
• Annual Compound Data Growth of 65%
• Daily Incremental and Weekly Full
• 2 Week Retention on Disk (3 Fulls, 10 Incrementals)
• 4 Week Retention on Tape
• 12 Monthlies on Tape, kept indefinitely
• Having to squeeze more into Backup Window
• B2D Requirement Growing Rapidly
• Backup Media Server/s Under Pressure
• Network Bandwidth Constraints
• Tape Infrastructure & Handling Costs Increasing
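The growth and retention figures in this list lend themselves to a quick projection. The sketch below is illustrative only: the 65% compound growth rate and the retention of 3 fulls plus 10 incrementals come from the bullets, while the 10 TB starting size and the 5% daily change rate are assumptions chosen for the example.

```python
def project_capacity(initial_tb, annual_growth, years):
    """Return the capacity at the end of each year, year 0 included."""
    return [initial_tb * (1 + annual_growth) ** year for year in range(years + 1)]


def disk_retention_tb(primary_tb, daily_change, fulls=3, incrementals=10):
    """Backup-to-disk space for the 2-week retention: fulls plus incrementals."""
    return fulls * primary_tb + incrementals * primary_tb * daily_change


# Assumed starting point: 10 TB of primary data (hypothetical).
for year, tb in enumerate(project_capacity(10.0, 0.65, 5)):
    b2d = disk_retention_tb(tb, daily_change=0.05)  # 5% daily change is assumed
    print(f"year {year}: primary {tb:7.1f} TB, B2D retention {b2d:7.1f} TB")
```

At 65% compound growth the data set grows roughly twelve-fold in five years (1.65 to the power 5 is about 12.2), and the backup-to-disk requirement grows with it.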
Coping with Information Growth in Today’s Economy
*“Global purchases of IT goods and services… will equal $1.66 trillion in 2009, declining by 3 percent after an 8 percent rise in 2008.”
Global IT Market Outlook: 2009, Forrester Research, January 12, 2009
In 2009, IT budgets are flat or declining*
Escalating costs for primary storage
Difficulty meeting backup and recovery windows
Ensuring high availability of information
Providing timely access to historical information
Data Center Energy Use is Doubling
IT energy use has doubled since 2000 and will likely double again by 2011
Energy operating costs will soon exceed the cost of purchase for servers
Existing conservation technologies can reduce consumption to 2002 levels
Comparison of Projected Electricity Use, 2007 to 2011
Source: EPA report to Congress, 2007
Chart: Annual Electricity Use (billion kWh/year), 2000 to 2011. Series: Historical energy use; State of the art scenario.
Available Capabilities for Energy Efficiency
Improve Efficiency – Reduce Energy Consumption
INCREASE UTILIZATION
REDUCE CAPACITY
Storage tiering
Virtual LUNs
File and e-mail tiering
Storage virtualisation
Large-capacity drives
Replication across storage tiers
Snaps
Clones
Compression
De-duplication
Archiving
Server virtualisation
Data migration
Storage consolidation
Virtual Provisioning
Flash drives
Optimisation algorithms
Automated discovery
Document management
How can we...
Implement a Data Minimisation Strategy
Manage exponential data growth, while...
Improving access to organisational data
Containing data management and infrastructure costs
Reducing the data centre’s carbon footprint...
Online archiving of e-mail and file systems
Backup with data de-duplication
Data Minimisation Elements
Retention and compliance
Data reduction
Universal access
Simplify management
Tier backup infrastructure
Optimise media: B2D, VTL, de-dupe and tape
Address security issues
Simplify management
Identify candidates for archiving
Classify and move
Establish SLAs based on information class
New Technologies and Services are Enablers
Primary Storage
Backup
Archive
Data Minimisation – How it works
1. Archive the inactive data before you perform the backup process
• Identify inactive data based on policies
• Automate the movement of the data to a lower cost storage tier or dedicated archive platform, leaving stubs behind
• Items are retrieved from the online archive on user demand
• Back up the archive infrequently or never
2. Back up the remaining data using resource-efficient data de-duplication
• Rapid ‘Full Backups’: only the ‘sub-file’ changes are sent and stored on disk
• Minimal Bandwidth: only a fraction of the typical 200% is sent over the wire
• Minimal Storage Consumption: only unique ‘sub-file’ blocks are stored
• Protect more, with less, for longer
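Step 1 can be sketched as a small in-memory model. This is not any vendor's implementation: `FileEntry`, `archive_inactive`, `read_file` and `backup_size` are hypothetical names, the policy is assumed to be "idle longer than a threshold", and sizes are in arbitrary units.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileEntry:
    path: str
    size: int                 # arbitrary units, e.g. TB
    last_accessed: datetime
    is_stub: bool = False


def archive_inactive(primary, archive, now, max_idle):
    """Move entries idle longer than max_idle to the archive, leaving stubs."""
    moved = 0
    for path, entry in list(primary.items()):
        if entry.is_stub or now - entry.last_accessed <= max_idle:
            continue
        archive[path] = entry
        # The stub keeps the path visible on primary storage but holds no data.
        primary[path] = FileEntry(path, 0, entry.last_accessed, is_stub=True)
        moved += entry.size
    return moved


def read_file(primary, archive, path):
    """Follow a stub to the online archive on user demand."""
    entry = primary[path]
    return archive[path] if entry.is_stub else entry


def backup_size(primary):
    """Only data still on primary storage is sent to backup."""
    return sum(entry.size for entry in primary.values())


today = datetime(2009, 1, 1)
files = {
    "reports/2006.doc": FileEntry("reports/2006.doc", 6, today - timedelta(days=400)),
    "reports/current.doc": FileEntry("reports/current.doc", 4, today - timedelta(days=3)),
}
online_archive = {}
moved_units = archive_inactive(files, online_archive, today, timedelta(days=180))
print(f"archived {moved_units} units, backup set is now {backup_size(files)} units")
```

Running the policy a second time moves nothing, because stubs are skipped.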
Today: Energy-Efficient Storage Design
1 TB Data on Different Capacity/Performance Drives
CONSUME LESS ENERGY BY CAPACITY
15K 73 GB: 6,096 kWh/yr
15K 146 GB: 3,048 kWh/yr (50% less)
10K 300 GB: 1,434 kWh/yr (73% less)
7.2K 500 GB: 787 kWh/yr (87% less)
7.2K 1 TB: 393 kWh/yr (94% less)
73 GB Flash drive: 3,790 kWh/yr (38% less energy, 30x IOPS)
File System Archiving
Extract inactive, final-form data to an archive
Enhance performance of production applications
Reduce size of backup datasets
Free up expensive Tier 1 disk
Store archived data on high density low cost energy efficient storage
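Diagram: Before extraction, production primary storage holds 10 TB and the full backup is 10 TB. After extraction, 4 TB of active data remains on primary storage and only that active data is backed up (4 TB). The 6 TB of inactive data moves to an always-available active archive on secondary storage, and the freed primary capacity is reclaimed storage.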
E-Mail Archiving
Space saved on e-mail server is typically 60–80%
Mail archival automatically creates shortcuts to archived messages / attachments… and deletes the original attachments from the e-mail server
Diagram: a Message Server (User’s Inbox, holding shortcuts) and an E-mail Archive Server (E-mail Archive, holding the archived items). Example messages:
• Message 1, Jan. 1, 2008, To: Rick, Subject: Question, with attachment
• Message 2, Jan. 1, 2008, To: Ron, Subject: Update, with attachment
• Message 3, Feb. 1, 2008, To: Bill, Subject: Training
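The shortcut mechanism can be illustrated with a toy mailbox. Everything here is an assumption made for the example: the `Message` type, the 200-byte shortcut size and the message sizes are hypothetical, and real e-mail archiving products apply their own policies.

```python
from dataclasses import dataclass
from typing import Optional

SHORTCUT_BYTES = 200  # assumed size of a shortcut left on the mail server


@dataclass
class Message:
    subject: str
    body_bytes: int
    attachment_bytes: int = 0
    shortcut: Optional[str] = None  # archive key once the attachment is archived


def mailbox_bytes(inbox):
    """Space used on the mail server, counting shortcuts."""
    total = 0
    for message in inbox:
        total += message.body_bytes + message.attachment_bytes
        if message.shortcut is not None:
            total += SHORTCUT_BYTES
    return total


def archive_attachments(inbox, archive):
    """Copy attachments to the archive, leave shortcuts; return fraction saved."""
    before = mailbox_bytes(inbox)
    if before == 0:
        return 0.0
    for message in inbox:
        if message.attachment_bytes and message.shortcut is None:
            key = f"item-{len(archive) + 1}"
            archive[key] = message.attachment_bytes
            message.attachment_bytes = 0  # original attachment is deleted
            message.shortcut = key
    return 1 - mailbox_bytes(inbox) / before


inbox = [
    Message("Question", 300_000, 700_000),
    Message("Update", 300_000, 900_000),
    Message("Training", 400_000),
]
mail_archive = {}
saved = archive_attachments(inbox, mail_archive)
print(f"space saved on the mail server: {saved:.0%}")
```

The saving depends entirely on how much of the mailbox is attachment data, so the sizes above were chosen only to land near the range quoted on the slide.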
Definition of De-duplication
“The process of detecting and identifying the unique data segments within a given set of information, enabling the elimination of redundancy when stored or moved.”
Before: total segments = 39
After: unique segments = 6
Diagram: Data Set 1, Data Set 2 and Data Set 3 pass through de-duplication, leaving only the unique segments.
Data De-duplication: How it Works
Diagram: three backups of the same data.
• First Instance (May 2007): segments A, B, C, D. Only unique data segments are backed up. Unique data is stored on disk, available for immediate recovery.
• Duplicate Instance (May 2007): segments A, B, C, D. Data already backed up, so only a unique ID pointer is stored (20 bytes).
• Modified Instance (June 2008): new data segment E is identified and backed up.
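A minimal sketch of segment-level de-duplication follows. It assumes fixed-size segments and uses a SHA-1 digest as the unique ID (a SHA-1 digest happens to be 20 bytes long); the slides do not say which segmentation or hashing scheme any product uses, and `DedupStore` is a hypothetical name.

```python
import hashlib


class DedupStore:
    """Stores each unique segment once; a backup is a list of segment IDs."""

    def __init__(self, segment_size=4):
        self.segment_size = segment_size
        self.segments = {}  # segment ID (digest) -> segment bytes

    def backup(self, data):
        """Split data into segments, store the new ones, return the ID list."""
        recipe = []
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            segment_id = hashlib.sha1(segment).digest()  # 20-byte ID
            if segment_id not in self.segments:
                self.segments[segment_id] = segment  # new data: store it
            recipe.append(segment_id)  # known data: pointer only
        return recipe

    def restore(self, recipe):
        """Rebuild the original data from its list of segment IDs."""
        return b"".join(self.segments[segment_id] for segment_id in recipe)


store = DedupStore(segment_size=4)
first = store.backup(b"AAAABBBBCCCCDDDD")      # four new segments stored
duplicate = store.backup(b"AAAABBBBCCCCDDDD")  # nothing new, pointers only
modified = store.backup(b"BBBBCCCCDDDDEEEE")   # only segment E is new
total = len(first) + len(duplicate) + len(modified)
print(f"total segments: {total}, unique segments stored: {len(store.segments)}")
```

Production systems generally use more elaborate segmentation than fixed offsets, but the store-once, point-many idea is the same.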
Key Point – Data Minimisation requires a platform that doesn’t need to be backed up!
Active Archiving (WORM disk): WORM delivers unique features for online archives
• Location independence
• Self-healing and management
• Guaranteed authenticity
• Single-instancing
Online Archiving (Tier 3 disk): Tier 3 Disk with SATA and NAS with ATA
Offline Archiving (Tape): Tape is best suited for offline archives
Chart: Customer Archival Requirements. Axes: Archiving Functionality and Management Efficiency.
Data Minimisation Strategy - How it all fits together
Tier 1: Primary Storage
Tier 2: Secondary Storage
Tier 3: Archive, long-term retention on disk (80% of data)
Tier 4: Backup to disk (de-dupe), quick recovery (optional 20%)
Tier 5: Legacy long-term retention on tape (optional 20%)
Diagram labels: daily data backups; de-duped data; static data growth; Tier 3 data growth, no management required.
Quantified Results – Reduce Tier 1/2 with Archiving
Major reduction in expensive Tier 1/2 Storage
Tier 3 Archive storage minimised due to single instancing & compression
73% reduction in power and cooling requirements for archived data
Quantified Results – The Data Minimisation Leverage
Good Tier 4 Savings with Archiving or De-Duplication
Excellent results by combining Archiving & Backup Data De-Duplication
6x reduction in power and cooling requirements for B2D storage
Quantified Results – Less Tape Infrastructure
Associated reduction in Tape Library Slots, Drives, Management & Handling
Power of combining Archiving & De-Duplication – 560 fewer LTO4 tapes in Year 3
Tape could be removed altogether – Offsite Replication & Disk Spin-Down
Data management cost comparison – Data Minimisation
Significant Reduction of Backup Infrastructure and Tape Management
• Tape Drive, Tape Licences, Slots, Library, Backup Server, Tape Media, Offsite Storage & Recall Costs, Admin Costs
© Copyright Dimension Data 2000 - 2006
Data Minimisation Assessment – Business Case
• Current backup minimisation methods give you more efficient backups
• However, they don't fix the cause of the problem, which is data growth
• A combination of data archival, backup de-duplication and compression is the most effective way to contain data within your environment
• Helps quantify business case for archiving (or other appropriate solution)
• Workshop to identify costs/issues
Data minimisation strategy achieved by...
Archiving over 70% of data to a protected environment, which removed the need for that data to be backed up
Minimised the impact of data backup via de-duplication and compression (reduction in data volume and backup data by 80%)
Minimised the impact of VMware on the environment through de-duplication
Contained Tier 1 disk growth and spend
Provided the most storage efficient backup method possible today
Estimated savings of over 5 million dollars in 5 years.
‘My initial sync took 12 hours; now I back up in 50 mins’ – Dimension Data Customer
Chart: Estimated Infrastructure Run Rate, 2006 to 2016. Series: Equipment (Units), Power (kW), Cooling (Tons), Footprint (sq. ft.).
Cost comparison (K$)
                 Year 1   Year 2   Year 3   Total
Cost BAU         $708     $1,410   $2,107   $4,226
Cost Optimized   $278     $560     $840     $1,678
Savings          $430     $850     $1,267   $2,548
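The savings row can be re-derived from the two cost rows. The sketch below uses only the yearly figures from the table (in K$); the function names are hypothetical.

```python
bau = {"Year 1": 708, "Year 2": 1410, "Year 3": 2107}      # Cost BAU, K$
optimized = {"Year 1": 278, "Year 2": 560, "Year 3": 840}  # Cost Optimized, K$


def savings_by_year(bau_costs, optimized_costs):
    """Saving per year: business-as-usual cost minus optimized cost."""
    return {year: bau_costs[year] - optimized_costs[year] for year in bau_costs}


def saving_fraction(bau_costs, optimized_costs):
    """Share of the business-as-usual spend avoided over the whole period."""
    return 1 - sum(optimized_costs.values()) / sum(bau_costs.values())


for year, value in savings_by_year(bau, optimized).items():
    print(f"{year}: ${value}K saved")
print(f"Three-year saving: {saving_fraction(bau, optimized):.0%}")
```

The yearly savings match the table, and the three-year saving works out to roughly 60% of the business-as-usual cost. The yearly BAU figures sum to 4,225 rather than the 4,226 shown in the Total column, presumably because the slide rounds each figure separately.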