Significant Savings Are Within Your Reach When You Understand the True Cost of Storage

Bruce Yellin
Advisory Technology Consultant
EMC Corporation
[email protected]

EMC Proven Professional Knowledge Sharing 2009


Table of Contents

The Predicament
Where Do You Begin?
   Data Classification
   Drive Basics
   Cache = Incredible Performance
   How Much Data Can a Disk Process?
   RAID Protection
   True Usable Capacity
   Platform Basics and Scalability - Which Type Of Platform Is Right For You?
   Connectivity and Storage Protocol Basics
   Backup/Recovery, Archiving and Deduplication
      Archiving
      Deduplication
      Backup and Restore
   Risk Avoidance and Disaster Recoverability
   Consolidate, Virtualize and Thin Provision to Save Money
   What Are The Operating Costs Of Your Storage Frame?
   How Long Do You Need To Save The Data?
   Cutting Costs and Spending Wisely
Ready For A New Storage Frame?
   Leasing Basics
   A Basic Approach to Budgeting
   Performing Your Own Maintenance and Save Money on Service Costs
   Can You Really Afford to Run 24x7?
   The Future – Reality or Fiction?
Conclusion
Appendix A – Some of the U.S. data governance regulations
Biography

Disclaimer: The views, processes or methodologies published in this compilation are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.


THE PREDICAMENT

You were just handed your new, cost-conscious annual budget and can’t imagine how you are going to get through the next twelve months with the allocated funding. Data volumes are growing and the new budget barely reflects last year’s growth. You wonder, “What’s causing the dramatic increase in storage requirements?” Personally, I have seen my customers’ storage requirements jump almost 60% a year, and they ask the same question.

File sharing and document creation are two areas where data growth has dramatically increased. Documents are fatter than ever, with fancy embedded HTML, pictures, charts and other features. Then we e-mail the document, adding to both primary and backup storage.

Here is a diagram based on an IDC report [1] showing how easily a 1MB document, when e-mailed to four friends, can quickly turn into 51.5MB of storage. You have probably seen a dramatic increase in your own e-mail storage volumes.

Many studies indicate that the bulk of your office data has not been accessed in over 6 months, yet it resides on your most expensive primary storage and is backed up regularly. Our excuse? It is really difficult to figure out what we need to keep and what we can delete. Simply put, we often do not know:

o what application created the data
o which application owns it
o which application needs the data
o how long we should keep it or whether it can be safely deleted

So rather than deal with it, we store it and hope we can find it again.

[1] http://tsr.blogs.com/telecom/files/diverse-exploding-digital-universe.pdf, page 8


It comes down to basic cause and effect. You cannot stop data growth and simply storing it is

inefficient. Further, the consequences of data sprawl reach beyond the impact to your budget. It

places a strain on your network, servers, storage platforms and limited staff. Your backups take

longer, you may have to add more data center space and witness your business continuity

plans growing more complex. In the end, your best strategy is to apply the latest techniques and

try to save money along the way.

Case in point – look at the ramifications of

data growth in your company in terms of

electrical needs. What was the cost and

difficulty of getting more power into your

data center? Does this chart reflect your

company’s energy demands?

There is hope. Vendors realize how serious this problem has become and have developed

numerous approaches designed to help you process more data at a lower cost. These include

deduplication, archiving, disk backup, higher density, etc. My goal is to help you understand the

storage side of the problem so you can make wise choices when dealing with data sprawl.

WHERE DO YOU BEGIN?

Where do you begin with the catch-22 of incredible growth, a limited budget and the lack of tools to deal with the problem? Frankly, if you are running out of capacity, you have to make a short-term decision to buy more. With a little more time, you can construct a plan for an evolving, affordable storage infrastructure. This first section, data classification, deals with identifying what you have and helping you to understand what you need. Later sections guide you through available options and help you to make a buying decision that you and your budget can live with.


Data Classification

From a storage perspective, one goal of data classification is to assign the correct storage profile to meet a service level agreement (SLA). You must determine the right mix of technology, cost and staff to maintain it. If cost were not an issue, your data would be on the fastest disks, housed in the most reliable platform, and replicated for full business continuity. Choosing the wrong classification inevitably results in higher cost or poorer performance.

FOOD FOR THOUGHT – Putting frequently accessed, vital data on slow storage results in sub-optimal performance. Likewise, putting infrequently accessed, less essential data on fast storage means you are spending too much money. If the data lacks the proper backup/recovery attention, you impact your RPO [2] and RTO [3]. Overprotect your data, on the other hand, and you are again spending too much money.

I help customers plan new systems by starting with a dialog not unlike your family physician

asking “How are you feeling?” I ask them to create a list of applications they use (like a medical

history) so I can learn about their data governance plans and help them pick the right type of

platform. Here are some questions you can use and their possible design impact:

Question and possible impact:

Does the data require any security access?
   • Firewalls, encryption, remote support requirements

Does it need to be retained/archived for a required period of time?
   • WORM [4], eDiscovery [5], legal hold

What is the application that creates or owns the data? Do other systems use it?
   • Application name/Operating Environment (e.g. Accounts Payable/Oracle)
   • Data rules and processes

Does it have any performance or other special requirement? Does it need to go faster? Are there any issues, goals or areas of improvement?
   • RAID [6] level, tiering, drive classification, storage frame tunables, network speeds
   • High end or “regular” database, OLTP, e-mail, imaging, file/print, test and development, web, etc.
   • Server make/model and SAN [7] connectivity requirements, such as boot from SAN

Backup frequency? If there is an issue, what are the RTO and RPO for it?
   • Tape backup, backup to disk, local snap [8] or replication needs

[2] RPO - Recovery Point Objective is the point in time you want to recover the business to after an outage.
[3] RTO - Recovery Time Objective is the amount of time within which a business needs to be restored after an outage.
[4] WORM - Write Once, Read Many storage is for archival. It is often constructed with low-cost SATA drives in a NAS file system.
[5] eDiscovery - Electronic discovery is legal discovery in litigation.
[6] RAID - Redundant Array of Independent Disks allows for the use of two or more disk drives for greater performance, reliability or larger logical sizes.
[7] SAN - Storage Area Network: switching hardware connecting servers to storage using fiber optic cables. Associated with block-level storage (in contrast to file-level storage).
[8] Snap or Snapshot is a point in time copy of data that typically saves space compared to a full volume copy.


Does it require local and/or remote protection?
   • Clusters? Fault tolerance? Part of your business continuity plan?

Who is the staffer responsible for it?
   • IT does not typically own data – emergency, backup/recovery

Sizing and growth plans or peak business cycles?
   • Usable capacity GB₂ (not RAW capacity, covered in a later section)
   • Capacity planning, 1, 2 and 3 year horizons
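The 1, 2 and 3 year capacity planning horizons in the last question can be roughed out with simple compound growth. The sketch below is illustrative only: the 10 TB starting capacity is a hypothetical figure, and the 60% annual growth rate is the one mentioned in The Predicament, not a universal constant. Substitute your own measured growth.

```python
def project_capacity(current_tb, annual_growth, years):
    """Return projected usable capacity (TB) at the end of each year,
    assuming growth compounds once per year."""
    return [current_tb * (1 + annual_growth) ** year
            for year in range(1, years + 1)]


# Hypothetical starting point: 10 TB usable today, growing 60% a year.
for year, tb in enumerate(project_capacity(10.0, 0.60, 3), start=1):
    print(f"Year {year}: {tb:.1f} TB usable")
```

Even a rough projection like this makes it easier to decide whether a purchase sized for today will survive a three-year lease.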

The following application classification chart can help you to build a list of storage frame

requirements that ensures you will purchase storage providing appropriate protection and

performance. Make a list with an application in one column and a category for each of these

topics. There are no rules for what type of application should go on what tier of drives:

Tier 0/1 - Enterprise Critical: highest value to the company, fastest performance, best affordable protection; the company cannot do business without it.
   Typical applications: "Hot" database logs and temp files; high write requirements; file systems with critical SLAs; clustered file systems sync data
   Suggested RAID protection: RAID 1 or 5, solid-state disks, 15,000 RPM (15K), Fibre Channel (FC) [9] or SAS [10] drives

Tier 2 - Business Critical: secondary activities that need to be available as soon as practical, high performance, acceptable protection.
   Typical applications: OLTP; eCommerce; ERP; databases; web servers; e-mail
   Suggested RAID protection: RAID 5, 10,000 RPM (10K), FC or SAS drives

Tier 3 - Department Critical: day to day flow, good performance.
   Typical applications: Databases; some online backup; multimedia editing; imaging; document management; data warehouse; data mining; NAS file systems
   Suggested RAID protection: RAID 5 or 6, 7,200 RPM (7.2K), SAS or SATA [11] drives

Tier 4 - Offline, Seasonal, Retention: no immediate need, planned access.
   Typical applications: Archived data; backup to disk
   Suggested RAID protection: RAID 6, 7,200 or 5,400 RPM, SATA drives, spin-down

[9] FC or Fibre Channel - Uses a disk command set from the SCSI interface to provide data for block-mode access at speeds ranging from 1,000 Mb/s to 8,000 Mb/s. Typically uses fiber optic cables.
[10] SAS - Serial Attached SCSI uses a smaller pin-count cable than SCSI. Theoretical transfer speed is 300 MB/s, with next-generation SAS reaching 600 MB/s. SAS is considered to be the follow-on technology to SCSI.
[11] SATA - Serial Advanced Technology Attachment is an ATA drive that sends data serially over a 7-wire cable. Controller intelligence relies on the host CPU, with a theoretical transfer speed of 300 MB/s. Future specifications call for 600 MB/s.


As you would imagine, Tier 0 drives are much faster than Tier 1, and so on. The tiering concept is simple: place data on the set of drives whose performance matches its needs. For example, you might not want to take your family of five on vacation in a subcompact car, nor, if you had a choice, would you want to drive a truck to get a gallon of milk. You could, but neither choice would be efficient. Use faster, more expensive drives for the portions of an application that really demand fast performance, and slower, denser drives wherever they make sense.

Faster, smaller drives can cost a lot more per GB₁₀ than slower, larger drives, as we will see in later sections. A fast 300GB₁₀ 15K FC drive costs $501 as compared to $147 for a slow 1.5TB₁₀ 7.2K SATA II drive [12]. You probably can’t use larger drives for all of your workload because your system would be too slow. Ideally, performance-oriented storage software using data metrics can move data to the correct hardware tier for a properly balanced system. Lastly, your requirements can change over time, so perform periodic data balancing tune-ups.
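To make the price gap concrete, the two quoted drive prices can be turned into a cost per decimal gigabyte. This is a minimal sketch using only the two prices given above; the function name is illustrative, and the result ignores everything beyond the raw drive (RAID overhead, power, floor space).

```python
def cost_per_gb(price_usd, capacity_gb):
    """Raw purchase cost per decimal gigabyte for a single drive."""
    return price_usd / capacity_gb


fc_15k = cost_per_gb(501, 300)      # 300 GB 15K FC drive
sata_7k = cost_per_gb(147, 1500)    # 1.5 TB 7.2K SATA II drive

print(f"15K FC:    ${fc_15k:.2f} per GB")
print(f"7.2K SATA: ${sata_7k:.3f} per GB")
print(f"The FC drive costs about {fc_15k / sata_7k:.0f}x more per GB")
```

On these prices the fast drive works out to roughly seventeen times the cost per gigabyte, which is why it pays to reserve it for data that needs the speed.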

Drive Basics

Storage frames offer SSDs (Solid State Disks), FC, SAS, SATA and SCSI. Let’s take a look at

where your data should reside because each drive type has different characteristics, different

application profiles, and a major impact on your budget.

The SSDs have no moving parts, are the fastest drives, and cost more than FC or SAS drives. The FC and SAS drives have the best mechanical performance, offering the same rotation speed and access time specifications, yet enterprise class frames still use FC drives rather than SAS drives (this is expected to change over the next few years). SATA drives are associated with low-access or archive applications. Drive support varies from one manufacturer to another, so check with your vendor for which types they support on their frames.

[12] Prices are from www.cdw.com as of 1/21/09


FC and SAS drives have speeds between 7,200 and 15,000 RPM, while SATA speeds are 5,400-7,200 RPM. The common abbreviations you will see are 15K for 15,000 RPM, etc. The higher the number, the faster the drive. The specifications shown in this chart give the relative performance of each drive type. The FC drive has the fastest internal data rate and ties SAS for the lowest latency; more on these metrics in a later section.

Due to cable length issues, it is far less common to see external arrays with SCSI drives.

Cache = Incredible Performance

For many years, cache memory has been used as an intermediary to speed the transfer of data between the server and the relatively slow, mechanical disk drive. For example, when a host writes a data record to the disk, the fast cache memory in the storage frame receives the record and quickly tells the host that the data was received, thereby allowing the next record to be processed with minimal delay. Without cache, the record would be sent directly to the relatively slower disk drive before the next record could be processed. In some systems, host-cache combinations yield response times of 1ms, whereas without cache that time could be 4ms to 15ms or more. In both instances, the record gets where it is going. With cache, the delay to the host is minimal since it is shielded from the milliseconds it takes for the drive(s) to perform mechanical movements. When cache starts to fill up or when there is idle time, cached data is written to the correct disk; this is called destaging. If your applications need high performance, spending money on more cache can in fact save money since it allows you to use slower performing drives in the storage frame. Cache is usually supported by a battery system to prevent data loss if power is lost before the data has been destaged.

When a server gets a record from disk, some storage frames use algorithms and cache to

predict which record is needed next in advance of the actual request. It is like a parking valet at

a high-end restaurant who knows you completed your meal and brings your car to the front door

before you hand him the claim ticket.


Applications with a sequential read profile derive the most benefit from an intelligent cached system, while random read profiles are less likely to benefit from cache and instead rely on the physical disk speed for performance. Cache works wonders on both sequential and random writes. A well-tuned cache system can easily achieve a success ratio of 90% or more. Midrange frames typically have less cache memory (usually 1-16GB or more) than enterprise frames (usually 16GB-512GB or more). Cache effectiveness is tied to the sophistication of the algorithms used to predict requests and manage the destaging of records to the disks.
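One way to see what a 90% success ratio buys is a simple weighted average of cache and disk response times. The model below is an assumption introduced here for illustration, not a formula from a vendor: it treats every request as either a pure cache hit (about 1ms) or a pure disk access (4ms to 15ms), and ignores queuing and destaging effects.

```python
def effective_response_ms(hit_ratio, cache_ms, disk_ms):
    """Average response time when a fraction `hit_ratio` of requests is
    served from cache and the remainder goes to disk (simplified model)."""
    return hit_ratio * cache_ms + (1 - hit_ratio) * disk_ms


# 1 ms cache response, 4 ms and 15 ms disk responses, 90% hit ratio.
for disk_ms in (4.0, 15.0):
    avg = effective_response_ms(0.90, 1.0, disk_ms)
    print(f"disk {disk_ms:>4.1f} ms -> average {avg:.1f} ms")
```

Under these assumptions, even the slow 15ms drive averages only a few milliseconds, which is the intuition behind using cache to get away with slower drives.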

How Much Data Can a Disk Process?

The disk drive is the building block of the storage frame and also its slowest part. To gauge performance, the industry uses the terms IOPS (Input/Output operations Per Second) and SDTR (Sustained Data Transfer Rate) [13]. Note: These metrics become theoretical when systems have intelligent caching systems and other methods to boost performance.

IOPS is the key metric for applications performing a lot of small block, random transactions,

such as e-mail or OLTP (e.g. retrieve your name from a database). SDTR is the key metric for

applications that perform very large transfers, such as archival, imaging and backup.

When a small number of records are read from or written to disk, the mechanical movement of the disk head (seek) and the rotational speed of the disk platter determine how many transactions the drive achieves. The more IOPS a drive can handle, the more transactions it can process. Faster platter rotation speed and less head movement are goals for optimal performance.

[13] Sometimes called the STR (Sequential Transfer Rate)


Here is a formula that can be used to calculate IOPS:

IOPS FORMULA:

\[ \text{IOPS} = \frac{1000}{\text{avg. seek} + \text{avg. rotational latency}} \]

EXAMPLE: A Seagate Cheetah ST3146356FC [14] is a 146 GB₁₀ 15,000 RPM drive. It has an average READ seek time of 3.4 ms and an average rotational latency [15] of 2.0 ms. Divide that sum into 1000 (the conversion of ms to seconds) and the drive has 185 READ IOPS.

\[ \frac{1000}{3.4 + 2.0} = 185 \ \text{READ IOPS} \]

If we assume that 75% of the time the drive performs a READ, the formula becomes a weighted average of the READ seek time (3.4 ms) and the WRITE seek time (3.9 ms):

\[ \frac{1000}{(3.4 \times 75\% + 3.9 \times 25\%) + 2.0} = 181 \ \text{IOPS} \]
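The same arithmetic can be sketched in a few lines of code. The function names are illustrative. The 3.4 ms READ seek and 2.0 ms latency come from the example above; the 3.9 ms WRITE seek is an assumption, chosen because it reproduces the 181 IOPS result. The rotational latency helper follows the half-turn definition in the footnote.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a platter turn."""
    return 0.5 * 60_000 / rpm


def iops(avg_seek_ms, avg_latency_ms):
    """Theoretical IOPS of a single drive (no cache, no queuing)."""
    return 1000 / (avg_seek_ms + avg_latency_ms)


def weighted_iops(read_seek_ms, write_seek_ms, read_fraction, avg_latency_ms):
    """IOPS using a seek time weighted by the READ/WRITE mix."""
    seek = read_seek_ms * read_fraction + write_seek_ms * (1 - read_fraction)
    return 1000 / (seek + avg_latency_ms)


latency = rotational_latency_ms(15_000)   # 15K RPM drive
print(f"Rotational latency: {latency:.1f} ms")
print(f"READ IOPS:          {iops(3.4, latency):.0f}")
# WRITE seek of 3.9 ms is an assumed value (see lead-in).
print(f"75/25 mix IOPS:     {weighted_iops(3.4, 3.9, 0.75, latency):.0f}")
```

Changing `read_fraction` shows how a write-heavy workload lowers the theoretical ceiling of the same drive.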

In the next section, we will look at RAID protection and how it affects IOPS performance.

[14] “Cheetah 15K.6” http://www.seagate.com/docs/pdf/datasheet/disc/ds_cheetah_15k_6.pdf
[15] The average time it takes for the disk platter to rotate ½ turn.


Where IOPS is important for small blocks of data, SDTR focuses on how much data can be

retrieved per second. In terms of wall clock time, IOPS are almost inconsequential when

transferring large blocks of data. Once the disk head positions itself over the data, it is not how

much time it takes to get the first record, but how fast it retrieves hundreds of megabytes of data.

With large blocks of data (not typical for Microsoft Windows), SDTR takes into account the

transfer of the data, head movement overhead, rotational delay and disk controller overhead.

Be careful when examining the SDTR because some vendors do not disclose all the necessary information. Some vendors quote the SDTR from the denser outer edge/zone of the rotating disk platter, which yields the highest rates, and others provide information on the less-dense inner edge of the platter [16]. By averaging the two extremes, you have the average SDTR. In the end, if the data is on the outer edge, the rate is higher than from the inner edge.

In this example, the light pink (FC or SAS 15K RPM) drives can read a 1GB₁₀ file in 7 seconds, with the light yellow (FC or SAS 10K RPM) and light blue (SAS or SATA 7.2K RPM) drives taking almost twice as long at 12-13 seconds. To calculate the time, divide 1GB₁₀ by the Average SDTR. The IOPS calculations are included for applications performing small block, random operations: FC/SAS 15K RPM drives have the highest IOPS, FC/SAS 10K RPM drives have about 1/3 fewer IOPS, and SAS/SATA 7.2K RPM drives have almost 60% fewer IOPS than the 15K drives.
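The transfer-time calculation is simply size divided by average SDTR. The sketch below works it through for a 1 GB (1,000 MB) transfer, the size at which a 15K drive's roughly 7-second figure works out. The outer and inner zone rates of 171 MB/s and 112 MB/s are assumed, illustrative values for a 15K drive; they are not taken from the chart, so check your vendor's datasheet for real numbers.

```python
def average_sdtr(outer_mb_s, inner_mb_s):
    """Average of the outer-zone and inner-zone sustained transfer rates."""
    return (outer_mb_s + inner_mb_s) / 2


def transfer_seconds(size_mb, sdtr_mb_s):
    """Time to stream `size_mb` megabytes at a sustained rate, ignoring
    the initial seek, cache and controller overhead."""
    return size_mb / sdtr_mb_s


# Assumed zone rates for a 15K RPM drive (illustrative only).
avg = average_sdtr(171, 112)
print(f"Average SDTR: {avg:.1f} MB/s")
print(f"1 GB transfer: {transfer_seconds(1000, avg):.1f} seconds")
```

Plugging in the lower rates of a 10K or 7.2K drive shows why large sequential jobs such as backup are sensitive to the drive tier.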

[16] Wondering how the outer edge of a platter can be faster than the inner edge? The platter is made up of concentric circles. The circles at the outer edge are longer in circumference than the ones on the inner edge. Hence, there is more data on the outer edge than the inner edge. Given the platter spins at the same rate, the disk head reads more data from the outer edge than from the inner edge. The diagram is from http://www.north.ecasd.k12.wi.us/departments/tcs/Web%20Pages/Powerpoints/328,16,Drive Operation


As mentioned earlier, these calculations are for individual components of a larger storage frame. In the real world, they are merely indicative of the frame’s ability rather than hard and fast calculated results. For example, your frame might have hundreds of drives, dozens of applications, various block sizes, random and sequential access, and a mix of reads and writes at any given moment. Layer on the effects of storage cache, buffers, controller overhead, etc., and you get the idea. The best way to gauge performance is to run all of your applications at the same time on the best layout you can afford and measure the “wall clock” time of a given application.

Frames using the best components are typically the fastest. But to save money, you need to know the level of performance that your application needs. One hundred 146GB₁₀ 15K RPM FC drives cost much more than fifteen 1TB₁₀ 7.2K RPM SATA drives of roughly the same total capacity. Keep in mind that running a fast database application on 1TB₁₀ SATA drives will likely result in poor performance. And as we will see in later sections, there is more to cost than just the drive: one hundred drives take more power, cooling and floor space than fifteen drives. Picking the right tier of drives for each application makes economic sense.

RULE OF THUMB: To achieve consistent high performance, spread your workload over as many drives as you can and do not focus an entire application on just a couple of drives. By spreading the workload “wide” and sharing parts of the disk with other applications, you benefit from a greater IOPS and SDTR capability.
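A back-of-the-envelope comparison shows why spreading “wide” matters. The sketch below assumes IOPS scale linearly with drive count, which is an idealization that ignores cache, RAID overhead and controller limits. It uses the 181 IOPS figure from the 15K example and takes the 7.2K drive at about 60% fewer IOPS, as described earlier.

```python
def aggregate_iops(drive_count, iops_per_drive):
    """Idealized total IOPS when a workload is spread evenly across drives."""
    return drive_count * iops_per_drive


IOPS_15K = 181                 # from the weighted 15K RPM example
IOPS_7K2 = IOPS_15K * 0.40     # roughly 60% fewer IOPS than a 15K drive

wide = aggregate_iops(100, IOPS_15K)    # one hundred 146 GB 15K drives
dense = aggregate_iops(15, IOPS_7K2)    # fifteen 1 TB 7.2K drives

print(f"100 x 15K FC:   {wide:,.0f} IOPS")
print(f"15 x 7.2K SATA: {dense:,.0f} IOPS")
```

The two layouts offer similar capacity, yet under these assumptions the wide layout delivers more than sixteen times the random I/O capability.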

RAID Protection

RAID protection is mandatory for storage architectures because it guards against catastrophic data loss. We need redundant drives because hard drives fail. First documented in 1978, RAID caught on in the mid-1980s as a way to reduce the impact of drive failure using redundant drives. Seagate publishes MTBF (Mean Time Between Failure) specifications of 1,420,000 hours (about 162 years) for their Cheetah drives. In contrast, a Google [17] analysis involving over 100,000 hard drives found failure rates ranging from 2.5% in the first 3 months to 8% in the first 2 years [18]. Who is right? It is not a matter of right or wrong, but how you deal with the situation.
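To compare an MTBF figure with observed failure percentages, it helps to convert MTBF into an approximate annualized failure rate (AFR). The sketch below assumes a constant failure rate (an exponential model), which is a common simplification and an assumption introduced here, not something the drive vendor or the Google study states.

```python
import math

HOURS_PER_YEAR = 8760


def mtbf_years(mtbf_hours):
    """Express an MTBF in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def annualized_failure_rate(mtbf_hours):
    """Probability that a drive fails within one year of 24x7 operation,
    assuming a constant failure rate (exponential model)."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


mtbf = 1_420_000
afr = annualized_failure_rate(mtbf)
print(f"MTBF: {mtbf_years(mtbf):.0f} years")
print(f"Implied AFR: {afr:.2%}")
print(f"Expected failures per year in 100,000 drives: {afr * 100_000:.0f}")
```

Under this model the published MTBF implies well under 1% of drives failing per year, noticeably lower than the field rates quoted above, which is the gap RAID is there to cover.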

[17] “Failure Trends in a Large Disk Drive Population”, http://research.google.com/archive/disk_failures.pdf


Fortunately, many midrange and enterprise storage frames can often predict a drive failure before it happens. They employ counters for “seek errors”, “CRC errors”, and “bad spot remapping” disk events. If a counter reaches a threshold, the system issues a service alert and automatically copies data from the suspect drive to a spare before the drive completely fails. Note: the copying can take hours to complete for small 146GB₁₀ drives and as much as days for larger 1TB₁₀ drives. With higher end systems, the automatic notification “Batman Signal” (via a modem, e-mail home or other method) notifies your service vendor that a spare drive is being used and the frame requests maintenance. [I still enjoy hearing from a customer that a technician came to their office with a replacement drive before they had a chance to open a service call.]

When the failing/failed drive is synchronized with the spare, either you or the technician can

replace it. In some systems, the replacement drive takes over the exact same function as the

failed drive, resulting in a second copying session. Other systems insert the replacement in the

spare drive pool. There are pros and cons to each approach, so review this with your vendor.

There are different levels of RAID protection, each with a profile designed for certain levels of risk and performance. RAID-1 uses an extra drive for protection: your primary storage is on the “left” drive (data A in the picture) and its exact copy is on the “right” drive. It is used by applications needing the best performance and highest level of protection. It is also the most costly scheme. Most vendors implement RAID-1 so that data can be retrieved from either drive, based on which disk’s head is closer to the correct data, resulting in twice as many READs as RAID-5. WRITEs are also much faster than with RAID-5 because there is no need to calculate parity [19] across the drive pair.

STORAGE TIP: RAID-1 has the highest performance and greatest protection, but comes at a higher cost (e.g. for 100 source disks, you need 100 additional protection disks). Recommended for portions of demanding databases.

[18] Equally important, they noted that “Temperature is often quoted as the most important environmental factor affecting disk drive reliability.” “...after their first scan error, drives are 39 times more likely to fail within 60 days than drives with no such errors. First errors in reallocations, offline reallocations, and probational counts are also strongly correlated to higher failure probabilities.”
[19] Parity is used to check for data encoding errors and to correct them.


RAID-5 yields more usable storage than RAID-1 and is known for good performance and good protection, and it can be used by most applications at a reasonable cost. The protection comes from spreading parity across alternate data drives. If a single drive in the group has a problem, the parity data and surviving disks are used to reconstruct the data on a spare drive. I use RAID-5 (4+1) to represent 4 data drives and 1 parity drive. RAID-5 is not advised for larger drives such as 1TB₁₀ since it can take more than a day to rebuild a bad drive onto a spare and you don’t want to risk a second failure in the same group (a data loss scenario).

STORAGE TIP: RAID-5 provides good protection against data loss, but it requires another disk to protect each group of original disks. RAID-5 (4+1) protection for 100 source disks requires 25 additional disks. Protection schemes run from a 3:1 (3+1) to a 15:1 (15+1) grouping. Writing to (3+1) is faster than (15+1). Recommended for most situations.

As noted above, RAID-5 with 1TB10 and larger drives could expose your business to an

unacceptable level of risk due to long rebuild times on failed drives. To prevent a catastrophic

second failure during a prolonged rebuild time, use RAID-620 and its second parity drive to

guard against two simultaneous drive failures in a single group. With RAID-6, you might have 6

data drives and 2 parity drives. Unfortunately, the additional parity results in a small

performance penalty – it is slightly slower than RAID-5 when it comes to WRITE performance.

Ask your vendor about this reduction because “a RAID controller can suffer a 20 percent drop in

overall performance in RAID 6 compared to a RAID 5....”21 There are pros and cons to every

choice. If you believe as I do that capacity will continue to grow and that drives are a commodity,

then RAID-6 is a requirement because no one can risk losing data.

STORAGE TIP: RAID-6 provides superior protection against double drive failures in a RAID group. Use with large drive sizes due to very long rebuild times of a failed drive. RAID-6 (12+2) has 12 data drives worth of capacity and 2 parity drives for an 85% usability ratio. I have seen RAID-6 range from (4+2) to (14+2).
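The ratios quoted in the three storage tips can be checked with a few lines of Python. This is only a sketch: the `raid_overhead` helper is hypothetical, it treats a RAID-1 mirror as a (1+1) group, and it rounds up to whole RAID groups when counting the extra disks.

```python
import math

def raid_overhead(data_drives, protection_drives, source_disks=100):
    """Return (usable ratio, extra protection disks needed for `source_disks`)."""
    ratio = data_drives / (data_drives + protection_drives)
    groups = math.ceil(source_disks / data_drives)  # whole RAID groups only
    return ratio, groups * protection_drives

layouts = [("RAID-1", 1, 1), ("RAID-5 (4+1)", 4, 1),
           ("RAID-6 (6+2)", 6, 2), ("RAID-6 (12+2)", 12, 2)]

for name, data, protection in layouts:
    ratio, extra = raid_overhead(data, protection)
    print(f"{name:14s} usable {ratio:5.1%}   extra disks per 100: {extra}")
```

For RAID-1 and RAID-5 (4+1) this gives the 100 and 25 additional disks mentioned in the tips above.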

20 Network Appliance has a proprietary variant called RAID-DP for double parity. 21 http://www.enterprisenetworksandservers.com/monthly/art.php?1754


RULE OF THUMB: The bigger the RAID grouping, such as RAID-5 (8+1) versus (4+1), the longer the rebuild time for a failed drive. That increases risk. RAID-1 has the fastest rebuild time.

When designing for ultimate performance (versus capacity) to meet demands of intense

databases, etc., consider the benefits of RAID-1 over RAID-5/6 (and the use of either SSDs or

15K RPM drives). RAID-1 typically allows an application to retrieve data from either drive of a

mirrored pair; it doubles the READ performance compared to RAID-5/6. Most applications read

more than write, so RAID-1 could be important to your design. The tradeoff is cost. For example,

you get 4 usable drives by purchasing 8 drives for use with RAID-1, but you get the same usable capacity by purchasing 5 drives and using RAID-5 (4+1). We will examine the cost equation a little later.

In the last section, we discussed IOPS at the individual drive level. With RAID protection, the

impact of IOPS is further magnified. When a drive performs a WRITE, it first places the data on

the drive and then reads it back to make sure it was written correctly (2 IOPS). When you take

into account that the platter must rotate to position itself under the drive head, a WRITE I/O

becomes much slower than a READ I/O. With RAID-5, the parity must be re-calculated and

written to the disk as well. This means you have the original write (2 IOPS) plus another 2 IOPS

for the parity update – 4 IOPS to do a single write (called write penalty) versus 2 IOPS for RAID-

1. RAID-6 has even more overhead because of the double parity calculation. That is why, for

example, the Microsoft Exchange 2007 design calls for RAID-1 over RAID-5/6. You would need

to purchase more drives with RAID-5/6 for the equivalent performance of fewer RAID-1 drives.
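A rough way to see how the write penalty turns into drive counts is sketched below. The penalties of 2 (RAID-1) and 4 (RAID-5) come from the paragraph above. The RAID-6 value of 6 is a commonly quoted figure and an assumption here, as is the 5,000 IOPS, 70% read workload. The 180 IOPS per 15K RPM drive is the approximate figure used elsewhere in this paper. The sketch ignores cache and the RAID-1 read advantage, so treat the results as relative rather than absolute.

```python
import math

# IOPS consumed on the drives for each host WRITE.
WRITE_PENALTY = {"RAID-1": 2, "RAID-5": 4, "RAID-6": 6}  # RAID-6 value is an assumption

def drives_needed(host_iops, read_fraction, raid, drive_iops=180):
    """Drives required to serve a host workload once the write penalty is applied."""
    reads = host_iops * read_fraction
    writes = host_iops * (1 - read_fraction)
    backend_iops = reads + writes * WRITE_PENALTY[raid]
    return math.ceil(backend_iops / drive_iops)

# Hypothetical workload: 5,000 host IOPS, 70% READs.
for raid in WRITE_PENALTY:
    print(raid, drives_needed(5000, 0.70, raid))
```

The more WRITE-heavy the workload, the wider the gap between RAID-1 and RAID-5/6 becomes.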

True Usable Capacity

Hard drive capacity continues to increase and the cost per GB continues to decrease. On

January 27, 2009, Western Digital announced the first 2TB10 drive22 with 3TB10 drives expected

in 2010 and 12TB10 in 201423. Please don’t jump to the

conclusion that you could meet your company’s 200TB10

storage requirement by buying just 100 of the new 2TB10

drives. This plan, after formatting, accounting for spare

22 http://www.westerndigital.com/en/company/releases/PressRelease.asp?release={01D0EF49-E149-410A-A173-F872D0E6C335} 23 HDD Performance Trends http://www.gizmag.com/go/6176/


drives, and RAID-6 (using groups from 4+2 to 14+2), would leave you with between 115TB10 and 151TB10. It would also run poorly for applications needing a lot of performance. Let’s

take a look at some examples of usable capacity.

Chuck Hollis24 said in his blog that the EMC CLARiiON® can yield a 70% usable/raw storage

ratio. He said that every CLARiiON had hot spares, snapshots, RAID Parity and overhead

before you get to usable capacity. The following chart formed the basis of his argument.

Let’s take a look at the CLARiiON in more detail (other

platforms yield different calculations). Snaps take a

percentage of usable space to make pointer-based copies.

Let’s assume 10% in this case. EMC recommends 1 spare

for every 30 usable drives. This example uses RAID 5

(8+1) for 88% efficiency. The disk format reserve includes

categories such as the number conversion from “base 10" gigabytes to “base 2” gigabytes. The

vault reserve is for storage frame housekeeping. The CLARiiON also has a 1.5% overhead

because it uses 520 byte records (8 bytes of administration for every 512 byte record). This

customer would purchase 171x146GB10 drives and have 120 drives of usable storage.

As mentioned above, converting from GB10 to GB2 accounts for almost a 7% drop in usable capacity because the number base is different. For example, a Seagate 300 GB10 drive = 300x1,000^3 bytes = 300,000,000,000 bytes while your application, based on the binary system, expects 300x1,024^3 bytes = 322,122,547,200 bytes, ergo a loss of 7%.
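The base-10 to base-2 conversion is easy to script. The helper name below is my own, and the two drive sizes are the ones discussed in this section.

```python
def gb10_to_gb2(gb10):
    """Convert decimal gigabytes (10^9 bytes) to binary gigabytes (2^30 bytes)."""
    return gb10 * 1000**3 / 1024**3

for size in (146, 300):
    binary = gb10_to_gb2(size)
    print(f"{size} GB10 = {binary:.1f} GB2 ({1 - binary / size:.1%} smaller)")
```

The percentage is the same for every drive size because it depends only on the ratio of 1,000^3 to 1,024^3.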

Setting aside the marketing message, I used Hollis’s approach to calculate usable versus raw

storage for various configurations. The red-pink area of the following spreadsheet compares the

use of RAID 5 (4+1) (5 drives, 4 data, 1 parity) against (7+1).

24 Chuck Hollis is a Vice President and Global Marketing CTO at EMC Corporation. http://chucksblog.typepad.com/chucks_blog/2008/08/your-storage-mi.html


The Excel formulas for column “B” are to the left. In the yellow area, the calculation uses 248x146GB10 drives with RAID 5

protection (either 5 or 8 drives). RAID 5 (7+1) has a usable to raw ratio of 77.1% without snap

space and 69.4% with a 10% snap area. Your application could use 27,901GB2 of space for

every 248 drives purchased (36,208GB10 raw). While 7+1 is more efficient than 4+1, it may not

be the right choice with a lot of WRITE activity.

In green, you can see that RAID 5 (14+1) is more efficient than (9+1). In blue, RAID 6, (14+2) is

more efficient than (6+2). To achieve the highest storage efficiency and lowest cost without

regard to WRITE performance, use the RAID level with the lowest protection ratio – i.e., RAID 5

(14+1) has less overhead

than RAID 6 (14+2). [To

display formulas in Excel,

press “CTRL + `”.]

How much of a price

premium are you willing to

pay for the added protection

of RAID-6? As mentioned

earlier, the extra protection

RAID-6 provides costs more

and offers less usable

capacity. From a power

profile, RAID 6 (6+2) and

RAID 5 (7+1) are the same because they both use 8 drives. However, with 248 drives using

RAID-6 (6+2), you would have a TRUE USABLE CAPACITY of 44,226 GB2 while the same


number of drives using RAID 5 (7+1) yields 51,597 GB2, or 16.6% more space per dollar.

Another way to look at this is you would buy fewer drives using RAID 5 than with RAID 6. To

reach the 44,226 GB2 capacity, you need to buy 216 drives (yields 44,932 GB2) and 2 fewer disk

enclosures25 for a savings of $21,932 ($174,398 - $152,466). If you are budget conscious, this

can be significant. Spend the extra 16% if RAID 6 protection is important.
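The two totals in this comparison can be reproduced with a short Python sketch. It assumes the $501 300GB drive and the $2,950 enclosure from the footnoted prices, plus 15 drives per enclosure. The text does not say which inputs the spreadsheet used, so these are my inference, although they do reproduce both totals. The function name is hypothetical.

```python
import math

DRIVE_PRICE = 501          # 300GB 15K FC drive (assumed to be the drive used here)
ENCLOSURE_PRICE = 2950     # CLARiiON disk array enclosure
DRIVES_PER_ENCLOSURE = 15  # assumption: 15 drives per enclosure

def config_cost(drives, drive_price=DRIVE_PRICE):
    """Cost of the drives plus however many enclosures are needed to hold them."""
    enclosures = math.ceil(drives / DRIVES_PER_ENCLOSURE)
    return drives * drive_price + enclosures * ENCLOSURE_PRICE

raid6_cost = config_cost(248)  # RAID 6 (6+2)
raid5_cost = config_cost(216)  # RAID 5 (7+1) with similar usable capacity
print(raid6_cost, raid5_cost, raid6_cost - raid5_cost)
```

Changing the drive count or prices shows quickly how sensitive the savings are to enclosure boundaries.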

Unfortunately, none of these calculations consider what I call a high water mark (also called

“white space”). Never fill an active file system in the real world (higher utilization is possible with

static or archived data). Most experts agree on a high water mark of 85%-90% of capacity to

allow processes such as defragmentation to work efficiently.

So from the chart, 146GB10 drives in RAID 5 (4+1) groups of 5 would yield 22,958GB2.

With the reduction of another 15% for a high

water mark, the true usable capacity becomes

22,958x.85 = 19,514GB2, so your

248x146GB10 drives would yield only 79GB2

per drive or 54%!
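The high water mark arithmetic can be sketched in Python as follows. The function name is my own, and, like the text, it compares GB2 per drive against the drive's GB10 label.

```python
def true_usable(usable_gb2, drives, drive_gb10, high_water=0.85):
    """Apply a high water mark; return total, per-drive capacity, and share of label size."""
    capacity = usable_gb2 * high_water
    per_drive = capacity / drives
    return capacity, per_drive, per_drive / drive_gb10

capacity, per_drive, share = true_usable(22_958, drives=248, drive_gb10=146)
print(f"{capacity:,.0f} GB2 total, {per_drive:.0f} GB2 per drive, {share:.0%} of label size")
```

Try `high_water=0.90` to see the effect of a less conservative high water mark.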

With saving money in mind, let’s look at

possible cost savings by using a denser drive.

Based on publicly available prices26, a

146GB10 15K FC drive costs $294 versus $501 for a 300GB10 15K FC drive. Because the chart shows we need almost twice as many 146GB10 drives as 300GB10 drives, there is a direct drive savings and a corresponding reduction in the number of 15-drive disk array enclosures. The total

savings is $34,388 ($123,062 - $88,674), or 28%. That does not take into account

power/cooling savings, or the reduced footprint and maintenance costs. Using a larger number

of members per RAID group increases the savings.

25 On 1/21/09, SanDirect.com CLARiiON Disk Array Enclosure=$2950 26 On 1/21/09, CDW.com 146GB 15K FC=$294, 300GB 15K FC=$501


From our earlier discussion, we know a 15K RPM drive has about 180 IOPS. Let’s contrast the

IOPS of 240x146GB10 15K with 120x300GB10 15K drives (ignore the hot spares). With twice the

number of drives, the 146GB10 drives yield twice the IOPS (240x180 = 43,200 IOPS versus the

300GB10 drives of 120x180 = 21,600 IOPS). If performance is more important than cost, select

the 240 drives over the 120 because they will do twice as many I/O operations. You might save

even more money using faster SSDs. While you want to save money, don’t buy a brand new

“slow” system; buy the right level of performance. I suggest consulting with your vendor’s sales

staff for help with this RAID/capacity/pricing process.
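To weigh performance against price, the comparison above can be put into a small Python sketch. It uses the 180 IOPS per drive and the drive prices quoted in this section. The `profile` helper is hypothetical, and it counts drive cost only, ignoring enclosures, hot spares, and cache.

```python
DRIVE_IOPS = 180  # approximate figure for a 15K RPM drive

def profile(drives, price_per_drive):
    """Return (aggregate IOPS, total drive cost in dollars)."""
    return drives * DRIVE_IOPS, drives * price_per_drive

options = {"240 x 146GB": (240, 294), "120 x 300GB": (120, 501)}

for name, (drives, price) in options.items():
    iops, cost = profile(drives, price)
    print(f"{name}: {iops:,} IOPS, ${cost:,} in drives, ${cost / iops:.2f} per IOPS")
```

Looked at per IOPS rather than per GB, the smaller drives are the cheaper choice in this example.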

Platform Basics and Scalability - Which Type Of Platform Is Right For You?

Storage is housed in

enterprise, midrange or

JBOD27 frames. This chart

shows the profiles of each

type. Picking the right frame

can save money in the long-

term, so let’s look at the

characteristics of each:

o Enterprise class systems are known for their ability to handle multiple server types and

operating systems including mainframe, AS/40028, and open systems platforms such as

Microsoft Windows (from Dell, HP, IBM, etc.), Sun Solaris, IBM AIX, HP HP/UX, Linux

systems, and others. Midrange systems, also called “departmental systems”, support

open systems hosts. JBOD systems are tied to a single, particular open systems host.

o Enterprise systems cost more than midrange and can usually sustain the failure of a key

component without impacting performance. Loss of certain critical components in a

midrange frame could cause a dramatic loss of storage processing power.

o If you need dozens of front-end host connections to support hundreds of servers, then

you would favor enterprise systems over midrange systems. Enterprise systems also

have more system cache and hold more drives than midrange systems.

27 DAS - A disk drive internally in a server or externally in an enclosure with multiple disks using point-to-point connectivity. Typically low-end storage without value added intelligent controllers. Also called J.B.O.D. (Just a Bunch Of Disks) 28 AS/400 is an IBM server that started out as a minicomputer in the late 1980’s and has evolved into a higher performance platform. Name changes along the way have included iSeries and recently System i.


o Enterprise frames typically support SSD, FC and SATA drives while midrange use SSD,

FC, SAS and SATA. Over time, SAS drives may replace FC drives leaving us with SSD,

SAS and SATA (see the section “The Future – Reality or Fiction?”).

Connectivity and Storage Protocol Basics

Servers connect to the storage frame in a variety of ways and at a variety of costs29.

o With JBOD frames, you typically need a SATA, SAS or SCSI host controller. These

controllers start at $100 USD, $200 and $100 respectively.

o IP30 is often used with midrange and some enterprise frames and is associated with NAS

(Network Attached Storage). IP carries the iSCSI31 protocol (block mode storage) for applications

that rely on SCSI drivers, or CIFS32 and NFS33 (both file mode storage allowing shared

file access) for file systems found in local area networks. NAS uses “blue” Ethernet cable

to connect servers to storage using IP. It uses the IP port built into the server or, for higher performance, an iSCSI/TOE34 card. These TOE adapters start around $550.

CIFS EXAMPLE. If you are using a file through a drive letter that is not local to your computer, such as G:\myfile.doc or N:\myfolder\myfile.doc, it is probably on a remotely attached file server or NAS storage frame.

o FC is typically used on midrange and enterprise frames, and sometimes on expensive

JBOD frames. FC typically uses fiber optic “orange” cable with speeds up to 8Gb/s (800

MB/s). It is best suited to high performance, high consolidation

and high server count environments. FC either uses a point-to-point DAS (Direct-Attached Storage) connection model or is networked through SAN switches and directors, such as those from Brocade or Cisco.

FC supports the block mode storage protocol used by applications relying on SCSI

drivers such as a database. It is common to see FC storage frames using switches

29 www.cdw.com 30 Internet Protocol. A related protocol, iFCP (Internet Fibre Channel Protocol), carries fibre channel packets within IP packets. 31 iSCSI (Internet Small Computer System Interface) is a block-mode storage protocol carried over IP networks. 32 Common Internet File System. Typically found with Microsoft Windows to allow files to be shared between servers and storage. Can be thought of as a drive letter beyond local physical drives, such as G: or H:. 33 Network File System. Typically found with UNIX or Linux operating systems to allow files to be shared between servers and storage. Can be thought of as a mount point in a tree structure. 34 TOE (TCP Offload Engine) offloads the significant server CPU TCP overhead to a network card similar to an HBA.


supporting hundreds of servers. A host connects with a pair of high throughput FC HBAs

(host bus adapters), such as those from Emulex and QLogic. HBAs start around $350.

o ATA35 is not typically used to connect servers to external drives.

o FICON36 and ESCON37 connect mainframes to a storage frame.

Backup/Recovery, Archiving and Deduplication

There probably isn’t much you can do to decrease the amount of data your company creates or

processes, but you can reduce its impact and spend less money.

With data volumes growing so quickly, have your backup times increased dramatically? And

since you can’t add time to the backup windows, did you add more, expensive tape drives to

gain additional backup parallelism? Something has to give. So here is a three pronged backup

containment strategy that will save money.

This graph, from EMC’s Forum 2008 event,

illustrates a process based on archiving,

deduplication and disk-based backup/restore.

The black line shows your data growth.

Archiving old data that has not been used in a

long time to a platform that does not have to

be backed up ever again is shown in dark blue. Archiving shrinks your primary storage and

gives you less to back up every day or week. Then deduplication, in green, reduces the amount of data to back up, either at the file or sub-file level. This can shrink the data stored on your

primary drives and further reduces the amount you back up. Lastly, use a more efficient disk-

based backup approach as shown in light blue. Let’s take a look at each layer in more detail.

35 ATA - Advanced Technology Attachment is a method to connect the drive to a controller over a 40 wire ribbon cable where all signals are sent in parallel. Also called IDE (Integrated Drive Electronics) or EIDE (Enhanced IDE). Transfer speed is 133 MB/s. 36 FICON (FIbre CONnection) is a newer fibre-channel connectivity for mainframes. 37 ESCON (Enterprise Systems CONnection) is an older fiber-optic connectivity for mainframes.


Archiving

Archiving is the first step to contain data growth. From an earlier diagram, 75% of your data may

not have been accessed in the past 180 days, yet it is included in every full backup, week after

week. With that data archived, backups take less time, need fewer tapes and tape drives, use less network bandwidth, and cost less to transport off-site.

Archiving is accomplished through software that inserts a pointer in place of the unused or

under-used data. Should the data be needed again, it can be automatically, non-disruptively

returned to its original location. The data is placed in an archiving appliance (or low-cost WORM

system) to hold the data the pointers refer to. Data stored in the appliance is also deduped and

compressed to save space. Some archiving appliances support geographic replication so you

never have to back it up again.

E-mail archiving is a great place to start – think about how often you access 6 month old e-mail.

Vendors like Symantec suggest they can reduce “...the online message store by 50-75%...38”.

Archiving has other values for your corporation such as in data retention and compliance.

Please see the section titled “How Long Do You Need To Save The Data” for more information.

Deduplication

Data deduplication is the second step in your management strategy. Deduplication of data is a

simple concept - eliminate redundant data at the lowest level possible such that multiple copies

do not occupy any significant amount of storage, and have the data non-disruptively

available when needed. In many ways, it is like a can of Campbell’s condensed

soup – a majority of the water was removed at the factory so it costs less to ship

and takes less space in the grocery store, etc. Adding water and heat reconstitutes it to its original form. With deduplication, when the data is needed in its original state, it is “rehydrated.”

38 http://www.symantec.com/business/theme.jsp?themeid=globalsem_enterprisevault&header=0&footer=1&depthpath=0&tab=1&om_sem_eid=Business.com&om_sem_cid=biz_sem_Enterprise_Vault_US_English&om_sem_adid=Category_-_Email_Archiving&om_sem_kw=email+archiving


Deduplication allows your budget to survive data sprawl. With less data to store, your primary

storage requirements remain stable or may even shrink. It takes less time and requires less

magnetic tape (or its equivalent) to back up deduplicated data. There are also environmental

savings from fewer drives, meaning less power and cooling. Network circuits may be reduced

because they have less data to transmit.

The first time target deduplication is used, the amount of data will not be reduced as much as it

will the second time and thereafter. For example, a 200TB10 backup with a 20x deduplication

ratio will likely yield a first backup of 150-180TB10 and not 10TB10. The next backup of 200TB10

could be saved in much less space - perhaps 10TB10 or even 1TB10 given most of the backup

data is already on the target device. In contrast, data compression only substitutes repetitive

patterns with smaller unique patterns and has no idea what was backed up yesterday. The

magazine eWeek observed39 that data compression typically achieves a 2-to-1 reduction while

“...file deduplication might yield a 3-to-1 or 4-to-1 reduction.” Data deduplication “...at the level of

individual disk blocks...” can yield “...a 20-to-1 reduction or better....”
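Why the second backup is so much smaller than the first can be illustrated with a toy block-level store. This is a sketch under my own assumptions, not any vendor's method: it uses fixed 4KB blocks and SHA-256 fingerprints, and the data is random, so the first backup shows no reduction at all. Real products differ in block size, fingerprinting, and variable-length chunking.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # illustrative block size

class DedupStore:
    """Toy target-side store that keeps one copy of each unique block."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block contents

    def backup(self, data):
        """Store `data`; return the number of new bytes actually written."""
        new_bytes = 0
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_bytes += len(block)
        return new_bytes

store = DedupStore()
monday = os.urandom(100 * BLOCK_SIZE)                    # 100 unique blocks
tuesday = monday[:-BLOCK_SIZE] + os.urandom(BLOCK_SIZE)  # only the last block changed

first = store.backup(monday)    # nearly everything is new to the store
second = store.backup(tuesday)  # only the changed block needs to be written
print(first, second)
```

Compression, by contrast, would treat Tuesday's data as if Monday's backup had never happened.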

Deduplication occurs on many levels, the first being file duplicates

(such as when e-mailing PDF copies of this white paper to your

friends – clearly, it is non-changing, yet we save a copy which is then

backed up over and over again) as shown in the diagram40. It could

also recognize and shortcut duplicate words and phrases within a

document, such as the number of times I write “storage” or “disk” in

this article. There are many variations on this technology.

It is difficult to predict deduplication savings since they are data dependent. Text files are a

great example of where deduplication shines; some studies point to 500:1, while 10:1 or 20:1

seems to be typical. Run a vendor sponsored study for a better estimate.

Deduplication is not for everything – if the processing overhead of source de-duping is too high,

it may not be worth it. Also, some processes such as encrypted file systems or making a tape

copy of a deduplicated file system (it would need to be rehydrated first) may not support it.

39 http://www.eweek.com/c/a/Database/What-Is-the-Difference-Between-Data-Deduplication-File-Deduplication-and-Data-Compression/ 40 http://www.dell.com/downloads/global/power/ps3q08-20080379-DataStorage.pdf


Where deduplication should be used in an architecture is a subject of debate. Some vendors

advocate deduping production data, some from the server side when used for backup, and

others on the data in the virtual tape library (VTL). Then there are some products that dedupe as

data enters into the backup device (in-band) and others after the backup completes (out of

band). It is difficult to say which one or combination of approaches is best for you, so do due

diligence, ask questions, and request demonstrations.

Lastly, deduplication saves bandwidth when the data is replicated offsite. During transmission,

only a pointer to the reduced data is sent and not the original, larger block.

Backup and Restore

With stale data archived and the rest deduplicated, there should be a lot less to back up every

day. It may allow you to keep your tape backup status-quo or even to modernize it. There have

been many advances in magnetic tape

technology, and while some storage vendors sound the death knell for tape drives, the tape

industry survives. The promised capacities and

speeds of LTO-5/6 and DLT-S5/6/7 are

amazing and seemingly just around the corner.

Another option uses backup-to-disk (B2D) or VTLs (with deduplication and remote replication).

Using disk instead of tape accomplishes faster backups and dramatically faster restores than

with a conventional tape library. B2D can utilize your frame’s low cost SATA 1-1.5TB10 drives.

With your primary backup on disk, you can make a 2nd tape copy or remotely copy the disk

backup to another frame. Restoration is from the local or remote disk. Unlike tape, disk backup

cannot be starved for data (causing a streaming magnetic tape to stop and start). An example

of deduplication and backup was cited by Computerworld41 in 2007 “...John Thomas, IT

manager at Atlanta-based law...” used deduplication to “...cut his backup window from 11 hours

to 50 minutes.” That is a significant savings on so many levels!

41 https://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=SAN&articleId=9011622&taxonomyId=147&intsrc=kc_li_story


Risk Avoidance and Disaster Recoverability

If a key system fails, will it cost your company $10,000/hr or more? If your staff can’t get to the

office because of an ice/snow storm (or even Hurricane Katrina, which shut down New Orleans

for weeks), would you be out of business? You can take steps to avoid or reduce that risk.

Use your storage frame to make a full clone copy or a “copy on first write” snapshot of the data

to protect your primary data locally. You can restore from a clone/snapshot or use it as input to

your backup media server. Let’s assume you made a clone of your production data at midnight

and then used the clone on a backup media server to make a backup tape. With a significant

data loss at 8AM, it is faster to restore back to midnight using the clone rather than a backup

tape. The clone/snap can also be used as input data for another application, such as a data

warehouse or a reporting program. Please remember that clones and snaps take disk space,

and you would need to use RAID to protect them.

Your company could also be impacted by accidental or intentional corruption. Your best

approach is to roll back your operations on the affected system(s) to a known, healthy point in

time such as midnight in the previous example, then roll forward with as much good data from

transaction logs as you can safely identify in a timely manner. You may want extra servers or disk

space to evaluate the current state against the restored state if the corruption was extensive. As

you repair the corruption, your production systems may still be running. You may instead want

to follow your disaster recovery plan if the repair is going to take too long.

The storage frame architecture is another consideration if high availability is paramount. Some

frames operate in a grid-like fashion where a single failure will not likely slow down any

applications. Other frames use an active-active arrangement where a critical failure can cause

all of the work done by one side to be taken over by the surviving side. For example, a storage

controller failure on an active-active frame would shift the work to the remaining controller which

already has a load – i.e., storage requests will likely slow down. Lastly, given that most storage

frames permit non-disruptive repair, either design can be repaired without downtime.


To protect against a site failure, you want a disaster recovery design that places a second

storage frame (with a replicated copy of your data) and servers at a geographically remote

location. Each vendor has their own remote replication strategies, some using synchronous

replication (for EMC’s SRDF42, the distance is less than 200 kilometers or 124 miles), and some

using asynchronous replication over unlimited distances. Marketing names include HDS’s

TrueCopy and IBM’s PPRC/Global Mirror. Enterprise class storage frames typically support

more concurrent replication than midrange frames. There are many ways to replicate your data

and numerous nuances, so please consult with your storage vendor for the options and prices.

Use a specialty company such as SunGard, IBM, HP, AT&T and others to deploy geographic

duplication of facilities while saving money on a second site. They can provide professionally

staffed sites, duplicate equipment, space for your staff, and experts who can tailor a solution to

your needs. This protection is useful for natural and manmade disasters, and should take into

account that your staff may not be physically able or willing to get there. Don’t forget the plan to

return from the disaster when your company is ready to resume normal business operations.

Also, it is unlikely that you will need to replicate all of your data, only the data essential to

your operations – this should come out of your classification work.

An interesting disaster recovery approach uses virtualization. At “the touch of a button,”

applications move their active state to an alternate data center to resume processing, and then

return when your primary site is back to normal.

Consolidate, Virtualize and Thin Provision to Save Money

You designed your IT environment, but over time, has it grown into an unstructured tangle of

patches that needs a strategic redesign? Do you have underutilized servers or storage or

other components that if consolidated, could save money by delaying the next purchase of

similar equipment, or save on staff, power, cooling, and floor space? Can two databases on

different servers run on a single server? Can multiple virtualized applications run together on a

single physical server?

42 Symmetrix Remote Data Facility. Data is physically mirrored between two or more geographic sites. Sadly, one of the main impetuses for EMC to create SRDF was the 1993 World Trade Center bombing.


Virtualization and products such as VMware’s ThinApp can save storage by using a single

image of an operating system or an application and doing away with individual images for each

virtual desktop, thereby reducing storage footprints.

You may also have high performance databases using storage layouts designed to maximize

performance. In reality, they may not need dedicated high performance components because of

the cache efficiency of the storage frame or lower than expected transaction rates. In this case,

you may be able to share 15K RPM drives with other databases that need log devices, etc.

Lastly, just a few SSDs can easily offset the need for dozens of mechanical 15K RPM drives.

Virtualization is not a “free lunch.” It may require servers with more memory, faster CPUs, and

virtualization software, but think of the impact of consolidating ten servers with 10% utilization rates

into a single server! When you consolidate servers, you will have lower LAN costs, fewer

storage connections, and a simpler backup. You may also find ways to consolidate storage. You

can then use the unused servers for your seasonal peak processing requirements.

To start consolidating, use a server inventory coupled with some type of time mapping – i.e.,

consolidation candidates could be a SQL database that is heavily used on weekends and

different applications that are only used on weekdays. By combining them, you would gain from

having a single server, less licensed software, less storage, less software patch time, etc.

The last part of the consolidation story involves thin provisioning. Some database administrators

don’t want to repeatedly ask for more database storage, so they may ask for more than they

need. Conversely, storage administrators use tools to determine that databases are using a

fraction of the storage allocated. That’s where thin provisioning

comes in. Thin provisioning lets an application believe it has as

much storage as it needs while it really has only a fraction of

that amount. A pool of unallocated storage offers automatic,

non-disruptive storage to applications. This helps you to avoid

buying more storage than you really require. The diagram43

summarizes how it works. Expect each vendor to have implementation nuances.
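As a rough illustration of the idea, the toy model below tracks the capacity each volume is told it has against the capacity it has actually consumed from a shared pool. The class and method names are hypothetical and do not correspond to any vendor's implementation.

```python
class ThinPool:
    """Toy model of a thin-provisioned pool (all sizes in GB)."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.volumes = {}  # name -> [advertised_gb, consumed_gb]

    def create_volume(self, name, advertised_gb):
        # Advertising capacity consumes no physical storage yet.
        self.volumes[name] = [advertised_gb, 0]

    def write(self, name, gb):
        advertised, consumed = self.volumes[name]
        if consumed + gb > advertised:
            raise ValueError("write exceeds the volume's advertised size")
        if self.used_gb() + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add physical storage")
        self.volumes[name][1] = consumed + gb

    def used_gb(self):
        return sum(consumed for _, consumed in self.volumes.values())

    def advertised_gb(self):
        return sum(adv for adv, _ in self.volumes.values())

# Two databases each believe they own 800 GB of a 1,000 GB pool.
pool = ThinPool(physical_gb=1000)
pool.create_volume("db1", advertised_gb=800)
pool.create_volume("db2", advertised_gb=800)
pool.write("db1", 150)
pool.write("db2", 100)
```

In this sketch the pool advertises 1,600 GB while only 250 GB is consumed, which is why the reserve must be monitored as usage grows.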

43 http://www.compellent.com/Products/Software/Thin-Provisioning.aspx


What Are The Operating Costs Of Your Storage Frame?

Vendors like EMC44 and HP45 offer power usage calculators so you can determine electrical and

heat consumption to calculate annual operation costs. Use these to examine the TCO for each

solution – e.g., a frame with a higher price tag may have lower environmental needs and cost

less in the long run than a lower priced unit. Here is a configuration comparison I created using

data from the EMC and HP calculators:

(The yellow cell B4 was not provided – it is calculated using the formula kW = BTU/h ÷ 3414)

Operating the EMC equipment in West Virginia would cost 5.66¢/kWh x 1.8435 kW x 8,760 h/y =

$914 a year while the HP equipment would cost 5.66¢/kWh x 1.8574 kW x 8,760 h/y = $920 a year.

(In Hawaii, it would cost 5x more!) The table shows that the units have almost the same annual

cost. To find the cost of electricity, I use this site: http://www.neo.ne.gov/statshtml/115.htm
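The same arithmetic is easy to script if you want to test other electricity rates. The sketch below uses the loads and the 5.66¢ rate quoted above; the function names are hypothetical, and the 3414 divisor is the one used in the text (a more precise figure is about 3,412 BTU/h per kW).

```python
HOURS_PER_YEAR = 8760
BTU_PER_HOUR_PER_KW = 3414  # divisor used in the text; about 3,412 is more precise

def kw_from_btu_per_hour(btu_per_hour):
    """Convert a heat load in BTU/h to kilowatts."""
    return btu_per_hour / BTU_PER_HOUR_PER_KW

def annual_energy_cost(load_kw, cents_per_kwh):
    """Dollars per year for a constant load running around the clock."""
    return load_kw * HOURS_PER_YEAR * cents_per_kwh / 100

emc_cost = annual_energy_cost(1.8435, 5.66)
hp_cost = annual_energy_cost(1.8574, 5.66)
```

Swapping in your own rate in cents per kWh shows how sensitive the annual bill is to location.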

44 http://powercalculator.emc.com

45 http://h30099.www3.hp.com/configurator/calc/eva4400.xls The full catalog of power calculators can be found at http://h30099.www3.hp.com/configurator/calc/Power%20Calculator%20Catalog.xls


While you can use vendor calculators, you can also build your own simple spreadsheet based

on vendor specifications. Here is an example where I determined the power and cooling

requirements of a CLARiiON® CX3-80 with 400x300GB10 15K FC drives. Enter your information

in the yellow-blue cells. I have added comments in column C for your benefit.

With minor spreadsheet changes, you can compare the operational cost between a solution with

300GB10 ($20,651) versus 450GB10 drives ($14,632). The 450GB10 solution costs $6,019 less to

operate per year or $18,057 over 3 years!

These calculations are estimates. For simplicity, I entered the useful capacity of a 300GB10 FC

drive as 300GB10 and not my “True Usable Capacity” as stated earlier.

The energy efficiency ratio (EER) measures how well a cooling system works when the outdoor

temperature is 95° F. A higher EER means the system is more efficient. Here is a simple chart

using the formula EER = BTU per hour / Watt.
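A minimal sketch of the EER relationship, assuming cooling output is expressed in BTU per hour and electrical input in watts. The function names and the sample figures are hypothetical.

```python
def eer(cooling_btu_per_hour, input_watts):
    """Energy efficiency ratio: cooling delivered per watt consumed."""
    return cooling_btu_per_hour / input_watts

def cooling_watts(heat_btu_per_hour, unit_eer):
    """Electrical power a cooling unit with the given EER needs to remove a heat load."""
    return heat_btu_per_hour / unit_eer

# Hypothetical unit: 12,000 BTU/h of cooling for 1,200 W of input.
example_eer = eer(12000, 1200)
watts_at_eer_12 = cooling_watts(12000, 12)
```

The second call shows why a higher EER matters: the same 12,000 BTU/h heat load needs 1,000 W at an EER of 12 instead of 1,200 W at an EER of 10.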


How Long Do You Need To Save The Data?

It depends! There are probably thousands of international and domestic laws, regulations (and

penalties) associated with what, how, when and why records must be created, stored, accessed,

and retained. Consult your legal department for an interpretation of governance regulations

since non-compliance could put your company in jeopardy.

Do you need to retain all the e-mail created by senior managers or purge it when they leave the

company? If you are a multi-national company, are you required to observe data regulations

mandated by another country? The rules and policies needed on a per-person or application

basis, coupled with international travel, make the whole problem very complex. Other dizzying

aspects include the mandate to delete or shred archived data once you legally are not required

to keep it anymore – just how do you purge individual e-mail from 7-year-old backup tapes? I

included a list of some of the well known U.S. regulations46 and ramifications in Appendix A.

Cutting Costs and Spending Wisely

Operational costs encompass storage management, power, cooling,

floor space and other items. With the cost per terabyte dropping (blue

line) and the volume of data increasing (pink line) along with higher

operational costs (red line), the expense of operating the storage can

easily exceed the cost of buying it!

The trend almost mandates we rein in operational expenses – for example, how much data do you want

to back up? A 2001 IDC report said the typical administrator handled 750GB10 of storage in 1997 and

1.3TB10 in 2001⁴⁷. They predicted that by 2004, the administrator would manage 5.3TB10 of disk.

Well, we have come a long way since that analysis!

46 http://www.hds.com/assets/pdf/how-application-optimized-storage-solutions-from-hitachi-data-systems-help-companies-achieve-regulatory-compliance.pdf

47 http://searchstorage.techtarget.com/tip/0,289483,sid5_gci769891,00.html


Consider these four easy rules to cut your existing costs:

1. Consolidation allows you to cut back on storage frames. A single data instance is more

economical than redundant data. Fewer frames save on licensing and environmentals.

Sharing resources such as a large tape library costs less than each unit having its own

tape backup system. Administrators are more efficient with centralized resources.

2. Resource management reporting starts with timely information on how storage is used.

Reports are key to consolidation – with higher storage utilization, you postpone new

purchases. Have you purged the MP3s your users keep on your storage system

(because they know it is backed up frequently)? Do you store duplicate information?

Good reporting tools will let you manage the storage you have.

3. Multi-vendor solutions sound great, but when selecting different storage vendors to solve

similar problems, commonality suffers. For example, when you standardize on a single

architecture, your staff has fewer permutations to manage, scripting becomes simpler,

replication techniques are clear-cut, and choosing larger frames to support hundreds of

applications at the same time reduces licenses and maintenance costs. There is

sometimes a lack of interoperability between each vendor’s storage technologies.

4. New technology can easily do the job of older equipment. For example, Brocade has an

80-port switch that does the work of a 4-year-old SAN director. iSCSI can connect

servers to storage without a SAN switch. A VTL is an efficient way of doing backup and

restore without tape drives or sending tapes offsite. Storage on demand with thin

provisioning can save staff hours and money. Virtualization allows for fewer physical

servers. Evaluate the technology and determine how it will save money.


So with an eye towards controlling future costs, think about:

1. Certification and Training – When you bring in new equipment and solutions to reduce

capital costs, your staff needs a working knowledge of the new tools and techniques

to manage storage growth and reduce operational costs. Education involves cost, but

not educating your staff has a higher price. Reduce education costs by accessing

eLearning, and encouraging cross-training to reduce downtime.

2. Does your staff spend too much time managing your environment? Do daily tasks take

too long? There have been major improvements in provisioning software, software

“wizards”, “self maintaining” systems, and automated tasks such as thin provisioned

storage expansion. Evaluate the vendor’s tools for monitoring, security, troubleshooting,

local/remote replication and recovery, HBA/path management, storage expansion, cache

tuning, LUN creation, SAN zoning, clone/snap usage, tunables, etc. Clearly, training can

reduce the management burden.

3. How much effort is required to return your business to a steady state if there is a data

loss or severe outage? Keep a run book that lists the steps required in the event of a

problem, keep it current, and test it. A key staff member on vacation on a remote island

or on a plane won’t be able to help you, so cross-training is essential. Practice

recovering from a disk or tape to know the steps and how long they will take. Your

storage vendor may have great technical support in major cities, but if your data center

is in a remote part of the country, will they get there in a timely manner?

4. Are you alerted to security breaches; how do you pinpoint the intrusion? Are there

auditable paths showing who is working on your equipment locally or servicing it

remotely? Conversely, do you have too much security? That can be burdensome.

5. Do daily/weekly automated reports on usable capacity, charge-backs, alerts, etc. help

manage your storage? Are reports easy to get? Are they easy to read and customize?


6. Are upgrades and repairs done online? If a component fails, can you repair it safely

during production hours (remember, people make errors) or must you wait for a

downtime window because the repair is disruptive? There are risks with either approach.

7. You calculated floor space, power and cooling requirements during the evaluation phase,

but things change. Run out of space and you will hear “We need more room!” Build your

data center between non-load bearing walls that can be dismantled for expansion space

rather than next to windows and outside facing walls. It is easier to move cubicles than

to expand onto another floor. Avoid under-floor cabling – it is faster and less costly to run

cables above a frame. Also, factor in power and cooling inflation costs.

8. Buying a new system can put a burden on your operational costs as your staff becomes

proficient – select equipment with longer useful life to minimize upgrade cycles. It may

also help to buy equipment that is upgradeable or can be upgraded with data in place.

9. Reduce your monthly network costs through intelligent network compression. There are

solutions that allow you to use an OC3 rather than a more expensive OC12 link.

10. Do you need an FC Storage Area Network? SANs can reduce management costs by

allowing a single person to handle more storage than with JBOD or stove pipe storage.

SAN TCO is great with resources such as a tape library, or when “change” is a daily

event. However, without training, a SAN can be complex.

11. Insist on equipment with low power/cooling requirements to avoid higher monthly bills or

equipping a facility with extra power and cooling (if you can get it).

12. Incorporate archiving and deduplication in your backup strategy. Backing up less data

reduces backup time and saves staff hours. With backup replication, you remove the

cost of tape drives, media, transportation off-site, and off-site tape storage.

13. Thin provisioning can help if you tend to underutilize storage assets. Just monitor your

reserve pool. Applications needing more storage are serviced non-disruptively.


14. Professional services can be a wise investment when the work requires special skills or

you have minimal staff. They are highly trained and do not add to your headcount.

To achieve other savings, we are often faced with the “do it now versus later” tradeoff. David

Merrill wrote a white paper “Storage Economics”48 that said “investment or re-tooling costs [is]

necessary to impact the longer-term OPEX [Operational Expenses].” He uses three bands in this

diagram to show the impact of certain changes on operational expenses:

o inner (easy to accomplish changes with proven savings)

o middle (takes 12-18 months and yields soft savings)

o outer (goal or end state which may require outsourcing to accomplish)

READY FOR A NEW STORAGE FRAME?

You have squeezed every little bit (sorry for the pun) out of your old storage frame and it is time

to buy a new one. You have a solid understanding of your environment and the job that the new

frame must do. You are considering a midrange or enterprise frame with 3 tiers of drives and

local clone protection. You create a short list of vendors, solicit product presentations,

demonstrations, and evaluate their hardware, software and service capabilities. It’s time to

make a decision, so let’s examine the next steps.

Some customers consider multiple storage vendors because they believe it gives them leverage

during purchase negotiations – i.e., vendors competing for your business. This can work,

especially in very large IT shops with a large team of skilled professionals. Competition can

reduce your purchase price and prevent a single vendor from limiting your choices.

48 http://www.infoweek.ch/dossiers/files/StorageEconomics_Hitachi.pdf, page 8


However, you may experience the opposite effect in a small-to-medium sized shop because

management complexity and staff training needs can yield a higher TCO. There can also be

delays if the disparate systems need to communicate with each other, or downtime when two

vendors must collaborate on your problem.

Multi-vendor bidding can have intended and unintended consequences. When vendors are

forced into overly aggressive bids, their discounts can increase dramatically, especially at the

end of a business quarter or year-end. However, “best practices” (costing more) can be ignored

resulting in lower performing or under-featured solutions. For example, instead of 150x146GB10

15K drives with great IOP performance, aggressive pricing results in a significantly cheaper,

lower performing set of 73x300GB10 10K drives of equal overall capacity. If a vendor believes

they have a slim chance of winning your business, they may offer a “low ball” proposal, thereby

forcing other vendors to do the same thing.
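To see why the cheaper bid in that example performs worse, compare raw capacity and aggregate IOPS side by side. The per-drive IOPS values below (180 for a 15K RPM drive, 130 for a 10K RPM drive) are rule-of-thumb assumptions for illustration, not figures taken from either vendor, and the function name is hypothetical.

```python
def array_profile(drive_count, drive_gb, iops_per_drive):
    """Raw capacity and aggregate back-end IOPS for a set of identical drives."""
    return {
        "raw_gb": drive_count * drive_gb,
        "iops": drive_count * iops_per_drive,
    }

fast = array_profile(150, 146, iops_per_drive=180)  # 15K RPM, assumed IOPS per drive
cheap = array_profile(73, 300, iops_per_drive=130)  # 10K RPM, assumed IOPS per drive
```

Under these assumptions both configurations hold 21,900 GB raw, but the 73-drive set delivers roughly a third of the IOPS of the 150-drive set.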

Lastly, be wary of low bids. Ask yourself how two offers can be so different. Are you getting last

year’s model, less gear, software or service? Will upgrades cost a lot of money? Can the vendor

stay in business if they often win deals with very low prices? Deni Connor keeps a list of storage

vendors who went out of business49 - a harsh reminder to those customers who need service

from a new business. It is up to you to decide on the lowest purchase price or the lowest TCO,

but I suggest you consider more than the price per terabyte.

Some of My Best Deal Tips

1. Understand the full cost of the equipment. This includes shipping, installation, wiring,

union labor requirements, taxes, training, insurance, on-site hardware and software support,

after-hours support, warranty extension, tiered costs (based on raw capacity), etc.

2. Can you make your purchase at the end of a calendar quarter or a year? Many vendors

have financial deadlines that can be leveraged to negotiate higher discounts.

3. Volume purchases – bigger discounts may be available with a single larger order than

multiple small orders. Software discounts may be much higher than hardware discounts

because the cost of goods is lower.

49 http://www.networkworld.com/newsletters/stor/2009/010509stor2.html


4. Special offers can save a lot of money. Ask about:

a. Free upgrades, such as faster drives or software titles.

b. Are there any promotions? Example - buy 3 SAN switches, get 1 free?

c. Trade-in value on old equipment or “sweep-the-floor” opportunities.

d. How about free maintenance? Annual hardware maintenance costs can run as

high as 12% to 18% or more of the equipment list price50.

5. Are there discounts for pre-paying hardware and software maintenance contracts for

your ownership period?

6. How about third party maintenance? Savings of 40%51 are possible, particularly for less

critical storage frames. Be sure to check references and the quality of the parts. If a

problem involves software and hardware, you need to know who to contact.

7. Have you considered a four year lease? Some equipment has a longer useful life. A

longer lease can be an advantage.

8. Ask about upgrade pricing. Upgrades are likely with equipment leased for 36+ months.

Hardware prices should drop, so ask for a discount off list price instead of a price/GB10.

Negotiate upgrade prices, especially with a coterminous lease. Please see the upcoming

section on leasing.

9. Ask for a bundled price from vendors who resell other vendors’ equipment (such as servers and

communication gear). Are better deals possible by buying a Brocade switch from EMC rather than

Sun? For example, Dell sells Brocade, Cisco and some EMC equipment. Here is a partial list of

vendors that also sell other vendors’ equipment.

50 http://erp.ittoolbox.com/groups/select/erp-select/typical-annual-maintenance-cost-207950

51 http://hosteddocs.ittoolbox.com/WDC080505.pdf


10. Buy enough storage or equipment for at least 12 months. Constant upgrades are a

nuisance, cost more, and may lead to a sub-optimal design52.

11. Is 9x5 maintenance acceptable instead of 24x7? It costs less; ask about service options.

12. Thin provisioning can help if your end-users ask for more storage than they need. You

allocate the storage that is really required from a common storage pool and purchase

storage to keep the pool stocked. This keeps utilization very high while minimizing waste.

13. Education is an expense. Will the vendor offer training credits for no additional charge?

14. You may reduce storage costs through tiering - please see the data classification section.

15. When purchasing a used storage frame through a third party or a broker, remember to

check with the support vendor about any “re-certification” fees.

16. Can you reduce maintenance costs by having trained staff do simple hardware repairs?

You may be able to stock common vendor-supplied parts, such as disk drives. Major

repairs still require vendor customer service experts.

Leasing Basics

Saving money on the financial part of your decision can be as significant as choosing the right

equipment, so let’s start with the world of leasing. First, involve your finance department.

Second, the more you know the better your position.

If your company purchases the equipment (and software and maintenance), they will pay cash

(or get a loan) and will own the gear along with a warranty of 36-48 months after which

maintenance fees will be due. The equipment will also have to be disposed of/recycled when it

reaches the end of its useful life (donation, scrap metal, etc.).

52 For example, the fastest storage frame will not be of use if you only buy 2 upgrade drives at a time and place an entire application on them. You will create “hot spots”. You want to spread the workload over many drives to take advantage of a greater number of IOPS.


Leasing, on the other hand, allows you to make fractional payments over time and perhaps

obtain tax or other benefits. With leasing, your company never owns the equipment – ownership

belongs to the lessor53. With the last payment, the equipment is returned to the lessor,

purchased at a fair market price, or you may have the lessor extend the lease.

MIGRATION CHALLENGES: Remember, whether you own it or lease it, your data is on the storage and it presents a migration challenge when you are done with it. You need to account for this expense, such as:

-- Will you need loaner transition gear?

-- Do you need to perform parallel testing with the old and new equipment?

-- How much time will it take and what will it cost?

-- Can you do the work during the week or just on weekends?

-- Do you need to maintain your 24x7 operations along with a migration effort?

-- Will you need a vendor’s “Professional Services” assistance?

-- What is the risk to your business if something fails?

-- Do you want to erase your data on the storage before returning it?

-- Does your staff need training on the new equipment?

-- If you have a problem, do you need a back-out plan?

At a high level, there are capital leases and operating leases. A capital lease54 is, in many ways,

the same as purchasing the equipment because the equipment’s value is part of your balance

sheet and is treated like both a long-term asset and a long-term liability. Most customers select

a capital lease because the depreciation expense is included.

A capital lease meets one or more of the following criteria55:

o The lessor transfers ownership of the asset to the lessee.

o A bargain purchase option is given to the lessee.

o The life of the lease is greater than 75% of the economic life of the asset.

o The present value of the minimum lease payments is ≥ 90% of the fair market value.

Based on this definition, if the lease is not a capital lease (fails all these tests), it is an operating

lease. With the operating lease, you use the equipment for an “easy monthly payment” but it

does not appear on your company’s books as either a liability or an asset and cannot be

53 The “lessor” is the person who does the leasing. The “lessee” is the person to whom something is leased.

54 The definition comes from how it is treated on a balance sheet – it is an obligation that has to be capitalized.

55 http://en.wikipedia.org/wiki/Accounting_for_leases_in_the_United_States


depreciated. Some customers like the fact it avoids “hitting the books” and it can result in lower

payments (because the residual value of the equipment has been factored in).
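The four tests listed for a capital lease reduce to a simple "any of these is true" check. The sketch below encodes them as stated in this section; the function and parameter names are hypothetical, and your finance department's reading of the accounting rules takes precedence over this simplification.

```python
def classify_lease(transfers_ownership, bargain_purchase_option,
                   lease_years, economic_life_years,
                   pv_min_payments, fair_market_value):
    """Return 'capital' if any of the four tests is met, else 'operating'."""
    tests = [
        transfers_ownership,
        bargain_purchase_option,
        lease_years > 0.75 * economic_life_years,
        pv_min_payments >= 0.90 * fair_market_value,
    ]
    return "capital" if any(tests) else "operating"

# Hypothetical deals on equipment with a 5-year economic life.
three_year = classify_lease(False, False, 3, 5, 800_000, 1_000_000)
four_year = classify_lease(False, False, 4, 5, 800_000, 1_000_000)
```

In this example the 3-year lease fails all four tests and is an operating lease, while the 4-year lease exceeds 75% of the economic life and is treated as a capital lease.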

A coterminous lease (capital or operating) uses the existing end date of the current equipment

as the end date for new equipment added to the lease. A rolling lease gets a new end date for

just the new equipment.

Example – Pretend it is May, 2009 and your storage frame lease started May, 2007 and ends

May, 2010 (you are 24 months into a 36 month lease). You buy some new disks for $10,000

and want to finance coterminously with your May, 2007 lease. The new drives go on the

current 36 month lease starting on May, 2009 with a lease end date of May, 2010. Your outlay is

$10,000 (plus finance charges) for drives that have a useful life to you of only 12 months – i.e.,

$10,000 / 12 months. The new drives and the storage frame are returned to the lessor in May,

2010. With a 36 month rolling lease, you would still have the drive upgrade through May, 2012

or 2 years after the original unit was returned – does that have any value? My recommendation

- discuss substantial upgrades with your vendor for a better deal, or have the vendor structure a

lease buyout as part of a deal to buy a newer system with the added capacity.
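The difference between the two lease types in this example comes down to how many months the $10,000 is spread over. A minimal sketch, ignoring finance charges (an assumption made only to keep the comparison simple; the function name is hypothetical):

```python
def monthly_cost(amount, months_of_use):
    """Straight-line cost per month of use, ignoring finance charges."""
    return amount / months_of_use

coterminous = monthly_cost(10_000, 12)  # drives go back with the frame in May 2010
rolling = monthly_cost(10_000, 36)      # drives stay on lease through May 2012
```

That works out to about $833 a month on the coterminous lease versus about $278 a month on the rolling lease.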

LEASING TIP: Pay attention to system upgrades when the base equipment is leased. Your company’s finance department can help you decide whether to use a coterminous lease or a rolling lease.

BE CAREFUL when adding equipment to a coterminous lease because the full equipment cost is spread out over the remaining term.

A lease has three main components – the equipment cost, the lease factor, and the (easy)

monthly payment. These equate to the following math terms:

LEASING TERMS:

\[ \text{CostOfEquipment} = \frac{\text{MonthlyPayment}}{\text{RateFactor}} \] E.g. $100,000 = $3,000 / .03

\[ \text{MonthlyPayment} = \text{CostOfEquipment} \times \text{RateFactor} \] E.g. $3,000 = $100,000 x .03

\[ \text{RateFactor} = \frac{\text{MonthlyPayment}}{\text{CostOfEquipment}} \] E.g. .03 = $3,000 / $100,000


Let’s try a simple leasing example. The equipment you want to lease costs $1,000,000, the interest

rate is 7%, you want a 48 month lease, and the residual value is 10% (or $100,000). You could put

the following into Excel and learn your easy monthly payment is $21,427.
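The spreadsheet itself is not reproduced in this transcript. One reading that matches the $21,427 figure is to finance the cost less the residual ($900,000) at 7% a year, with payments due at the start of each month. That reading is an assumption, and a lessor's own calculation (for example, one that also charges interest on the residual) would give a somewhat different payment. The function name below is hypothetical.

```python
def payment_in_advance(principal, annual_rate, months):
    """Level monthly payment, due at the start of each month."""
    r = annual_rate / 12
    annuity = (1 - (1 + r) ** -months) / r   # factor for payments in arrears
    return principal / (annuity * (1 + r))   # shift to payments in advance

cost = 1_000_000
residual = 100_000
payment = payment_in_advance(cost - residual, 0.07, 48)

# Implied rate factor, as defined in the leasing terms: payment / equipment cost.
rate_factor = payment / cost
```

Under this reading the implied rate factor is a little over .0214, so the same payment can be found by multiplying the $1,000,000 equipment cost by the rate factor.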

At the end of a lease, you will likely have these options56:

o Upgrade – Payoff remaining obligation and write a new lease.

o Purchase FMV - Pay asset’s Fair Market Value and own the asset.

o Return – Satisfy the lease payments and return the asset to the lessor.

o Renew - Continue to lease the asset.

What is the useful life of storage equipment? A storage unit has a practical life of 3-4 years.

While it could run for 10+ years, a Symmetrix 3430 from 1999 had, at most, 96x18GB10 SCSI

drives and less than 2TB10 of raw capacity – the power profile alone would justify its end of life.

Other equipment could be useful in less critical roles beyond the 4th year. For example, a SAN

switch with a production life of 48 months could be useful to your development organization for

another 24-36 months. You need to weigh support costs against new equipment costs, or pay

for time and materials if it breaks. You can use 9x5 instead of 24x7 service plans or fix it

yourself using spare parts. Older gear tends to break down more often, could cost more to repair,

and likely requires more energy than newer equipment.

A Basic Approach to Budgeting

We all know that a budget is based on planning to estimate expenses and income. The goal is

to estimate as accurately as possible. You have completed your acquisition homework; you

know your payments, energy profile, staffing requirements and other variables. Make some

basic assumptions for the (likely) increase in software, maintenance, power and labor costs, and

decrease in hardware costs. Don’t let your estimate fall short or you may have to make do with

less or even delay projects due to insufficient funding.

56 http://www.alanyc.org/docs/455,41,End of Lease Options


IT budgets have two main components, capital expense (CAPEX) and operating expense

(OPEX). CAPEX covers assets, like hardware and software, that you own for a long period of time,

e.g. 3-4 years. OPEX includes maintenance and wages.

Here are items in each category57:

Break large tasks into smaller phases or milestones to arrive at accurate estimates. PERT58

charts estimate each activity’s duration as pessimistic, optimistic, and most likely. Ranges, such as 20-25

hours, are critical to estimation because unforeseen things happen such as:

• snow storms

• your home town wins the Super Bowl!

• power outages

• strikes

• software bugs

• broken equipment

• unexpected impact of feeder projects

• the flu

• lack of dedicated resources

• seasonality/productivity around holidays
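The three PERT values are usually combined into a single weighted estimate, (optimistic + 4 x most likely + pessimistic) / 6, with (pessimistic - optimistic) / 6 as a rough measure of spread. A minimal sketch, using hypothetical hours for a single task:

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """Classic PERT three-point estimate: weighted mean and standard deviation."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev

# Hypothetical task: 20 hours if all goes well, 22 most likely, 30 if things go wrong.
expected, std_dev = pert_estimate(20, 22, 30)
```

For this task the expected duration is 23 hours, a little above the most likely value because the pessimistic case is further away than the optimistic one.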

Using milestones (in no particular order), account for:

• equipment purchasing

• power/cooling installation

• hardware/software install/configuration

• licenses obtained

• training staff

• scripts written and tested

• readiness of servers and other equipment

• migration completed

• disaster recovery drills

• staff schedules, vacations, staff coverage

The smaller the unit of estimate, the more accurate the budget will likely be. No matter how well

you estimate, it will not be 100% perfect. If this is a repeat of a previous task or your staff is

skilled, the budget should be aligned with actual expenditures.

57 http://www.yesser.gov.sa/english/documents%5CBest_Practices_of_IT_Budgeting_en.pdf

58 Program Evaluation and Review Technique – a project management method to analyze tasks


Understand the critical path of tasks and dependencies. Require written estimates if you need to

bring in tradespeople (such as electricians) or a subject matter expert. If you need test equipment,

such as rental servers, don’t overlook their power or staffing profile on your budget. If you need

custom documentation, does that add risk to the project deadline or have any budget impact?

How much testing will be needed for new equipment or process? Do you need a budget reserve

for unplanned activities or risks?

Are tasks being outsourced? Is the work done in your shop or remotely (how will you track

budget expenses)? When employing consultants, is it on a fixed fee (simpler budget process) or

“time and materials” scheme (higher budget risk)? If a deadline is missed or the quality is not

acceptable, it can impact other projects and jeopardize your budget. Minimize the risk by

supervising the work or leverage an outsourcing company that is bidding on future work.

Perform Your Own Maintenance and Save Money on Service Costs

You do not need to be an expert to deal with the majority of break-fix problems with storage

hardware. The most common incident is a drive failure and vendors have made strides in

allowing you to make these repairs yourself. Clearly, you want to know the limitations of your

staff, but with a little training, they can handle the bulk of the work and save you money.

Many vendors offer 2-3 levels of support. The best plan is typically 24x7 with a 2-4 hour response

time, followed by a middle plan with perhaps 5 days a week coverage/next business day support, and

a low-cost plan that may include remote support and parts for your staff to use. You can pick one plan

for production equipment and another plan for development equipment.

For example, your staff will first open a support ticket to replace a drive in an EMC AX4 storage

unit. Once the technician confirms that the disk needs replacing, you are sent a new drive. The

step-by-step repair instructions are online [59], with diagrams

describing the activity. No tools are needed. With a little work,

you could save 10%-30% in maintenance costs.
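
To put the 10%-30% range above into dollar terms, the following is a minimal sketch. The function name and the $50,000 contract value are hypothetical and are used only for illustration; substitute your own maintenance quote.

```python
def maintenance_savings(annual_cost, low_pct=10, high_pct=30):
    """Return the (low, high) yearly savings for a given maintenance bill.

    The 10 and 30 percent defaults come from the range quoted in the text.
    """
    low = annual_cost * low_pct / 100
    high = annual_cost * high_pct / 100
    return low, high


# Hypothetical example: a $50,000 annual maintenance contract.
low, high = maintenance_savings(50_000)
print(f"Estimated savings: ${low:,.0f} to ${high:,.0f} per year")
```

Remember to subtract what you spend on staff training and any spare parts you keep on hand before you count the savings.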

59 http://www.emc.com/microsites/clariion-support/ax45-service.htm

Can You Really Afford to Run 24x7?

Many IT environments must run 24x7 operations, such as municipal power generation, military

operations, police and fire departments, etc., but we often use the phrase without regard to the

cost implications. Running 24x7 requires a budget that can support full redundancy for your

network, servers, storage, power grid, etc., plus 3 shifts of workers 7 days a week (people

generally only work 5 days a week). Some systems are not engineered for that rigorous a

workload, so does every component need to run 24x7? For example, Microsoft Windows

desktops tend to need rebooting after applying a patch.
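
The staffing arithmetic behind "3 shifts of workers 7 days a week" can be sketched as follows. This is a rough estimate that assumes each person works one shift a day for 5 days a week, and it ignores vacation, sick time and training. The function name and the notion of a "seat" (one position that must always be filled) are illustrative assumptions.

```python
import math

SHIFTS_PER_DAY = 3        # three shifts cover 24 hours
DAYS_PER_WEEK = 7         # 24x7 means every day of the week
WORKDAYS_PER_PERSON = 5   # people generally work 5 days a week


def people_needed(seats):
    """Whole people required to keep `seats` positions filled around the clock."""
    shifts_to_cover = seats * SHIFTS_PER_DAY * DAYS_PER_WEEK
    return math.ceil(shifts_to_cover / WORKDAYS_PER_PERSON)


for seats in (1, 2, 5):
    print(f"{seats} seat(s) around the clock -> {people_needed(seats)} people")
```

A single always-staffed position works out to 21 shifts a week, or 4.2 people, so you need to budget for at least five.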

Some of the requirements for a storage system that runs 24x7 could include:

o hot repairs of physical components and non-disruptive software upgrades

o maintaining file systems below the magic 80%-85% of capacity

o keeping extra drives in the frame, having “storage on demand” and thin provisioning

o on-site support engineers (very expensive) or, at a minimum, engineers on call

o on-site depot with common parts, even for systems with redundant components

o midrange frame replication - critical failures can severely impact processing power

o automated alerting mechanism for software/hardware failures (to aid remote diagnosis)

o dual power feeds for your system from different power circuits

o RAID protection and hot spares

o a disaster recovery plan
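
As one illustration of the capacity item in the list above, the sketch below flags a file system that has reached a usage threshold. It relies on Python's standard `shutil.disk_usage` call. The 80% default and the function names are assumptions chosen for this example, not a vendor recommendation.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Percentage of a file system's capacity that is in use."""
    return 100 * used_bytes / total_bytes


def over_threshold(path, threshold_pct=80):
    """True when the file system holding `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return percent_used(usage.total, usage.used) >= threshold_pct


if __name__ == "__main__":
    # "." is a placeholder; point this at the mount points you care about.
    if over_threshold(".", threshold_pct=80):
        print("Warning: file system is at or above 80% of capacity")
    else:
        print("File system is below 80% of capacity")
```

A check like this could feed the automated alerting mechanism mentioned in the same list.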

I once heard a story from an Air Force Colonel in charge of mid-air refueling aircraft. He

explained that if an airborne tanker was not in the right location, other planes that needed

refueling could crash. This critical 24x7 coordination required a data center that replicated to

another data center. One day their screens went blank. They later found that an 18-wheel truck

had knocked down a single key telephone pole that carried their bi-directional looped network

communications. Moral of the story – plan for disasters.

The Future – Reality or Fiction?

Deciding when to purchase new technology is always difficult. With an immediate need, you

have no choice but to buy a storage system that will accept future upgrades. If you do wait,

there will always be the next thing, so when do you make the purchase?

Seagate’s product plans [60] predict that FC drives will be replaced by SAS drives over

the next few years. Seagate also expects the industry to shift to a 2.5” form factor as these

devices become enterprise ready. We can reasonably expect prices to decrease, component

density/TB to increase, and the power/heat profile to drop below that of the current 3.5” drive.

When SSDs reach 500-1,000 GB [10] of capacity, possibly in late 2009 or early 2010, and carry lower

price tags, you will likely see designs that incorporate just tier 0 SSDs and tier 3 SATA.

Archival will become even more popular as cloud

computing conveniently stores the bulk of the

60%/year storage growth. For example, since 2006

Google has been rumored to be building a “G:” or

Google Web Drive that permits placing data in the cloud

by merely copying it to or from the G: drive. To the right

is a screen shot of their Picasa application allowing a

user to upload their pictures to the “Google Web Drive.” There was even talk of being able to

boot your computer from the G: drive. Google would likely employ sophisticated deduplication

storage techniques to provide such a global service.
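
To see what the 60%/year storage growth mentioned above means for planning, here is a minimal compound-growth sketch. The 100 TB starting point and the function names are hypothetical, and the growth rate is assumed to stay constant from year to year.

```python
import math


def projected_capacity(current_tb, years, annual_growth=0.60):
    """Capacity after `years` of compound growth at `annual_growth` per year."""
    return current_tb * (1 + annual_growth) ** years


def years_to_double(annual_growth=0.60):
    """Years needed for capacity to double at a constant growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Hypothetical example: a shop that stores 100 TB today.
for year in range(6):
    print(f"Year {year}: {projected_capacity(100, year):,.0f} TB")
print(f"Doubling time: {years_to_double():.1f} years")
```

At this rate capacity doubles roughly every year and a half, which is why moving older data to an archive tier becomes attractive.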

60 http://www.pbprojects.co.uk/PDF/ESandtheChannel_Marc_Jourlait.pdf

CONCLUSION

These economic times make it more important to focus on storage costs. You must not only support

existing initiatives but also address your company’s data growth. It all begins with knowing your

storage requirements and deciding if you can make better use of what you already own, or if

you need to purchase new capacity.

When the time comes to acquire more storage, there are many strategies to ensure you are

getting the most flexible and affordable technology. Deciding on the right equipment is a lot

more involved than just picking disk drives. Understanding the true cost of storage involves

studying application requirements, protecting the data, planning for future growth and making

the right financial and business choices.

The total cost of ownership is what will make or break your storage system. You are not only

solving today’s issues but also laying a foundation that will help your company thrive when

economic conditions inevitably improve. Learn as much as you can about your own environment

and, of course, don’t be afraid to ask the tough questions.

APPENDIX A – SOME OF THE U.S. DATA GOVERNANCE REGULATIONS

BIOGRAPHY

Bruce has worked for EMC Corporation for almost 10 years performing many functions as a

pre-sales engineer. He supports over 10 Enterprise accounts in the greater New Jersey-New

York area.

Prior to EMC, he worked for MetLife, NCR, SCO, Novell, AT&T/UNIX System Labs, Data

General, Prime Computer and Equitable Life Insurance in roles that ranged from operations

research to product manager to data warehouse architect to applications programmer. He has

a Bachelor of Science degree in Computer Science from Florida International University, a

Master of Science degree in Computer Science from the New York Institute of Technology and

has taught computer science at Hunter College in NYC.

He started working with computers in 1969 and has published numerous articles and presented

at professional forums. He is most proud of his work with the United States Tennis Association

and the development of their automated tennis player ranking system.

Bruce holds an EMC Proven Professional Networked Storage-SAN Specialist certification

(EMCTA).