High Availability in Storage Systems
Introduction

In today’s connected world, information and communication have become vital and fundamental aspects of every sphere of life. Be it for an individual or a business, data has become the lifeblood of daily existence. The large-scale panic caused by Twitter blackouts is proof of this fact. For businesses, even brief downtimes can result in substantial losses. Long-term downtimes resulting from various human and natural disasters can cripple a business and bring it to its knees. According to Dun & Bradstreet, 59% of Fortune 500 companies experience a minimum of 1.6 hours of downtime per week, which translates into $46 million per year. According to Network Computing, the Meta Group and Contingency Planning Research, the typical hourly cost of downtime varies from roughly $90,000 for media firms to about $6.5 million for brokerage services. Thus, it becomes clear that depending on the nature and size of the business, the financial impact of downtime can vary from one end of the spectrum to the other. Often, the impact of a downtime cannot be predicted accurately. While there are some obvious impacts in terms of lost revenue and productivity, there can also be several intangible impacts, such as damage to brand image, that could have not-so-obvious and far-reaching effects on the business.
Disaster Recovery is not High Availability

Disaster Recovery (DR) has become a buzzword in every enterprise today. In today’s volatile and uncertain world, it is extremely important to plan for contingencies that protect against possible disasters. Disasters can be software related or hardware related. Software disasters can result from viruses and other security threats such as hacking, or from deletion of data, whether accidental or malicious. Hardware disasters can result from the failure of components such as drives, motherboards and power supplies, or from natural and man-made site disasters such as fire and flooding. Different disasters need different recovery strategies: software failures can be protected against using techniques such as Snapshots and Continuous Data Protection (CDP), while hardware failures can be recovered from by building component redundancies within the system (such as RAID and RPS), by backing up the data to alternate media using D2D and D2D2T backup methodologies, and by employing synchronous and asynchronous replication strategies.
Thus, disaster recovery is one aspect of the Business Continuity (BC) strategy that a company must employ to minimize the impact of downtime. Disaster recovery is often measured in terms of two Service Level Agreement (SLA) objectives: Recovery Point Objective (RPO) and Recovery Time Objective (RTO). RPO represents the acceptable amount of data loss in a disaster, measured in time. RTO represents the amount of time within which the business must be restored after a disaster.
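Expressed as a check, the two objectives can be sketched as follows; the function name and all sample figures here are invented purely for illustration:

```python
# Hypothetical RPO/RTO check: RPO bounds how old the most recent
# recovery point may be (the data loss window), while RTO bounds how
# long the restore may take. All figures are invented for illustration.
def meets_slas(last_backup_age_min, restore_time_min, rpo_min, rto_min):
    meets_rpo = last_backup_age_min <= rpo_min  # acceptable data loss
    meets_rto = restore_time_min <= rto_min     # acceptable restore time
    return meets_rpo, meets_rto

# Newest backup is 20 min old and the restore takes 45 min, against
# an RPO of 30 min and an RTO of 60 min:
print(meets_slas(20, 45, rpo_min=30, rto_min=60))  # (True, True)
```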
While a DR strategy focuses on the effectiveness of the recovery process after a disaster, it does not focus on keeping the data available without any downtime. Availability is measured as the ratio of the mean time between failures (MTBF) to the sum of the MTBF and the mean time to repair (MTTR): Availability = MTBF / (MTBF + MTTR). Thus, availability indicates the percentage of time the system is available throughout its useful life. As mentioned earlier, one of the primary goals of disaster recovery strategies is to minimize the RTO (downtime). Since MTTR is a measure of the downtime and must meet the RTO objective, a comprehensive disaster recovery strategy must also encompass strategies to increase availability. Thus, while DR strategies are strictly not availability strategies, they do meet availability requirements to an extent.
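The availability ratio can be computed directly; a minimal sketch (the function name and sample figures are ours):

```python
def availability(mtbf_hours, mttr_hours):
    # Availability = MTBF / (MTBF + MTTR), as defined above.
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A system that fails on average every 5,000 hours and takes
# 4 hours to repair is about 99.92% available:
print(round(availability(5000, 4) * 100, 2))  # 99.92
```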
Figure 1: System Availability
Classes of Availability

Availability is often expressed as a percentage of system uptime. Often, an availability of about 90-95% is sufficient for most applications. However, for extremely critical business data, such levels of availability are simply not enough. As mentioned before, for brokerage services and businesses offering online services, a downtime of more than a few minutes a year can have significant operational impacts. For example, 99.9% availability typically means about 9 hours of downtime per year. The financial and other impacts of such downtime could spell trouble for the business. Truly highly available solutions often have an availability of 99.999% (“five nines”) or 99.9999% (“six nines”). Such solutions have a downtime of the order of a few seconds to a couple of minutes per year.

There are different classes of data protection mechanisms based on availability. Figure 2 shows a pyramid of various data protection strategies. As one goes up the hierarchy, the downtime decreases and hence the availability increases. The top two levels of the pyramid constitute strategies that represent true high availability (“five nines” and “six nines”).
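The downtime figures quoted for the different availability classes follow directly from the percentages; a quick sketch, assuming a 365-day year:

```python
def downtime_minutes_per_year(availability_pct):
    # Fraction of a 365-day year during which the system is down.
    return (1 - availability_pct / 100) * 365 * 24 * 60

# 99.9%              -> ~526 min/yr (~8.8 hours, the "about 9 hours" above)
# 99.999%  ("five nines") -> ~5.3 min/yr
# 99.9999% ("six nines")  -> ~0.5 min/yr (about half a minute)
for pct in (99.9, 99.999, 99.9999):
    print(pct, round(downtime_minutes_per_year(pct), 2), "min/yr")
```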
Figure 2: Classes of Data Protection
Active/Active Dual Controllers: SBB

The fundamental way to make a storage system highly available is to make each and every component of the system redundant. This includes the processors, memory modules, network and other host connectivity ports, power supplies, fans and other components. Apart from these, the drives are configured in RAID to ensure tolerance of drive failures. However, the disk array controller (RAID controller) and the motherboard of the system still constitute single points of failure.

Storage Bridge Bay (SBB) is a specification created by a non-profit working group that defines a mechanical/electrical interface between a passive backplane drive array and the electronics packages that give the array its “personality”, thereby standardizing storage controller slots. One chassis can host multiple controllers that can be hot-swapped. This ability to have multiple controllers means that the system is protected against controller failures as well, giving it true high availability.
However, such a configuration is not without challenges. One of the primary challenges in such a system is that it hosts two intelligent controllers within the same unit that share a common mid-plane and drive array. Since the drive array is shared, the two controllers must exercise a mutual exclusion policy on the drives to ensure that they do not modify the same data simultaneously, causing data corruption and inconsistencies. Thus, the RAID module on the controllers must be cluster-aware to avoid such collisions and to handle any resulting conflicts. Further, the two controllers each have their own cache of the metadata and data stored on the drives. These two caches need to be kept synchronized to ensure that one controller can resume the activities of the other upon its failure.
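The mutual-exclusion requirement can be sketched as a per-stripe lock that either controller must hold before writing. In a real cluster-aware RAID layer the lock grant would travel over the inter-controller link rather than live in shared memory, so this is purely an illustration (all names are ours):

```python
import threading

class StripeLockManager:
    """Per-stripe mutual exclusion between two controllers (sketch)."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, stripe):
        with self._guard:
            return self._locks.setdefault(stripe, threading.Lock())

    def write(self, controller, stripe, do_write):
        # Only one controller at a time may modify a given stripe.
        with self._lock_for(stripe):
            do_write(controller, stripe)

mgr = StripeLockManager()
log = []
mgr.write("ctrl-A", 7, lambda c, s: log.append((c, s)))
print(log)  # [('ctrl-A', 7)]
```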
Figure 3: Storage Bridge Bay
Figure 4: Dual Redundant Controller unit
To perform this clustered RAID communication and maintain cache coherency, the two controllers need a set of (preferably) dedicated communication channels. A combination of more than one communication channel, such as a SAS fabric and Ethernet connections, can be employed here to ensure minimal performance impact and redundancy in this communication layer as well. As with all dual redundant “intelligent” clusters, the loss of inter-node communication can cause the two controllers to lose cache coherency. Further, once communication is lost, each controller may try to take over the operation of its peer, resulting in a split brain scenario. In order to handle this split
brain scenario, the two controllers also need to maintain a quorum, using dedicated areas of the shared drive array, to avoid conflicts and data corruption.
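The quorum idea can be sketched as an atomic claim on a reserved area of the shared array: when the inter-controller link is lost, each controller tries to claim it, and only the winner takes over. Here a `threading.Lock` stands in for an atomic on-disk reservation; the class and names are illustrative only:

```python
import threading

class QuorumBlock:
    """Stand-in for a reserved quorum area on the shared drive array."""
    def __init__(self):
        self._claim = threading.Lock()
        self.owner = None

    def try_claim(self, controller_id):
        # Non-blocking: exactly one controller can win the claim.
        if self._claim.acquire(blocking=False):
            self.owner = controller_id
            return True   # winner: take over the peer's workload
        return False      # loser: stand down, avoiding split brain

q = QuorumBlock()
print(q.try_claim("controller-A"))  # True
print(q.try_claim("controller-B"))  # False
```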
The key advantage of such a dual controller setup is that it is almost fully redundant, with hot-swappable components. However, despite the controllers being redundant, the mid-plane connecting the controllers to the drive back-plane is still shared, making it a single point of failure.
High Availability Cluster

The highest class of availability in the availability pyramid is achieved using High Availability Clusters. As mentioned before, while dual controllers are extremely robust and fault tolerant, they are still susceptible to mid-plane failures. Moreover, since the drive array is shared between the two controllers, RAID is the only form of protection available against drive failures. High Availability Clusters are clusters of redundant storage nodes that ensure continuity of data availability despite component failures and even the failure of an entire storage node. This represents the highest form of availability possible (“six nines”). In comparison to SBB-based dual controller nodes, HA Clusters do not suffer from any single point of failure. In addition, since the drive arrays are not shared by the two systems, the individual systems have their own RAID configurations, making the HA Cluster resilient to more drive failures than an SBB setup. Unstable data center environments, such as rack disturbances, are common causes of premature drive failures, and dual controller nodes are more prone to system failures than HA Clusters in such environments. Finally, HA Clusters are also resilient to site failures, making them the best-in-class availability solution. However, SBB-based dual controller units have a lower disk count, making them more power efficient and a greener solution with a smaller data center footprint.
HA Clusters also encounter the split brain syndrome associated with dual controller nodes. However, unlike dual controller nodes, this problem cannot be addressed using a quorum disk, as the two units do not share a drive array. One way to address this problem is to have a client-side device specific module (DSM) that performs the quorum action on a split brain. The DSM sits in the path of the IO and decides which path to send the IO down. In addition, it keeps track of the status of the HA Cluster, i.e. whether the two nodes are synchronized, and permits a failover action from one system to another only when the two nodes are completely synchronized.

Figure 5: HA Cluster
Figure 6: DSM based HA Cluster
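The DSM behavior described above can be sketched as a small routing policy; the class, states and node names here are hypothetical:

```python
class DeviceSpecificModule:
    """Client-side path selector for a two-node HA Cluster (sketch)."""
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary
        self.nodes_synchronized = True  # tracked cluster state

    def route_io(self, primary_alive):
        if primary_alive:
            return self.primary
        if self.nodes_synchronized:
            return self.secondary  # failover permitted: nodes in sync
        raise RuntimeError("failover refused: nodes not synchronized")

dsm = DeviceSpecificModule("node-A", "node-B")
print(dsm.route_io(primary_alive=True))   # node-A
print(dsm.route_io(primary_alive=False))  # node-B (nodes in sync)
```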
The drawback of having a client-side DSM is that the HA Cluster becomes dependent on the client. Also, if the clients themselves are clustered, then each of the clients needs a distributed DSM that communicates with the others. A client-agnostic HA Cluster can be created if we understand the causes of a split brain scenario in an HA Cluster. Typically, a split brain scenario that causes data corruption occurs when the network path between the storage nodes has failed, severing the communication between the two nodes, while the client can still access both nodes. In this scenario, both storage nodes will try to take over cluster ownership, and unless the client has some way of knowing the rightful owner of the cluster, IOs could be sent to the wrong storage node, causing data corruption. Thus, an HA Cluster where the storage nodes have lost contact with each other while the connections from the client to both storage nodes remain alive is the cause of the split brain scenario. As can be seen in Figure 6, such a setup is not a true high availability solution, as it does not provide path failover capability. It follows that true HA setups are not prone to split brain syndrome. Figure 7 shows one such network configuration that supports client-agnostic HA Cluster configurations.
True High Availability

A truly highly available data center is one in which every component in the system – not just the storage – is highly available. This includes the client machines, application servers such as databases and mail servers, network switches, the network paths from clients to the application servers, and the network paths from application servers to the storage systems. Figure 8 shows a true HA setup in which every component has redundancy built into it. Thus, a failure in any single component – say, a switch or a network path – does not make the system unavailable.
Figure 7: Client agnostic HA Configuration
Figure 8: True High Availability
Summary

Thus, true storage high availability can only be ensured when there is redundancy built into every component of the storage sub-system. The dual redundant controller setup and the HA Cluster setup are two such setups that deliver best-in-class availability. Each has its own advantages and drawbacks, but a combination of these approaches, in addition to application server and network path redundancies, delivers a truly highly available data center setup.
For more information about High Availability in Storage Systems, visit http://www.amiindia.co.in