High Availability in Storage Systems
Introduction

In today’s connected world, information and communication have become vital and fundamental aspects of every sphere of life. Be it for an individual or a business, data has become the lifeblood of daily existence. The large-scale panic caused by Twitter blackouts is proof of this fact. For businesses, even brief downtimes can result in substantial losses. Long-term downtimes resulting from various human and natural disasters can cripple a business and bring it to its knees. According to Dun & Bradstreet, 59% of Fortune 500 companies experience a minimum of 1.6 hours of downtime per week, which translates into $46 million per year. According to Network Computing, the Meta Group and Contingency Planning Research, the typical hourly cost of downtime varies from roughly $90,000 for media firms to about $6.5 million for brokerage services. Thus, it becomes clear that depending on the nature and size of the business, the financial impact of downtime can vary from one end of the spectrum to the other. Often, the impact of a downtime cannot be predicted accurately. While there are some obvious impacts in terms of lost revenue and productivity, there can also be several intangible impacts, such as damage to brand image, that could have not-so-obvious and far-reaching effects on the business.
Disaster Recovery is not High Availability

Disaster Recovery (DR) has become a buzzword in every enterprise today. In today’s volatile and uncertain world, it is extremely important to plan for contingencies that protect against possible disasters. Disasters can be software related or hardware related. Software disasters can result from viruses and other security threats such as hacking, or from deletion of data, whether accidental or malicious. Hardware disasters can result from the failure of components such as drives, motherboards and power supplies, or from natural and man-made site disasters such as fire and flooding. Different disasters need different recovery strategies: software failures can be protected against using techniques such as Snapshots and Continuous Data Protection (CDP), while hardware failures can be recovered from by building component redundancies within the system (such as RAID and RPS), by backing up the data to alternate media using D2D and D2D2T backup methodologies, and by employing synchronous and asynchronous replication strategies.
Thus, disaster recovery is one aspect of the Business Continuity (BC) strategy that a company must employ to minimize the impact of downtime. Disaster recovery is often measured in terms of two Service Level Agreement (SLA) objectives: Recovery Point Objective (RPO) and Recovery Time Objective (RTO). RPO represents the acceptable amount of data loss in a disaster, measured in time. RTO represents the amount of time within which the business must be restored after a disaster.
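Expressed as a check, the two objectives can be sketched as follows; the function name and all sample figures here are invented purely for illustration:

```python
# Hypothetical RPO/RTO check: RPO bounds how old the most recent
# recovery point may be (the data loss window), while RTO bounds how
# long the restore may take. All figures are invented for illustration.
def meets_slas(last_backup_age_min, restore_time_min, rpo_min, rto_min):
    meets_rpo = last_backup_age_min <= rpo_min  # acceptable data loss
    meets_rto = restore_time_min <= rto_min     # acceptable restore time
    return meets_rpo, meets_rto

# Newest backup is 20 min old and the restore takes 45 min, against
# an RPO of 30 min and an RTO of 60 min:
print(meets_slas(20, 45, rpo_min=30, rto_min=60))  # (True, True)
```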
While a DR strategy focuses on the effectiveness of the recovery process after a disaster, it does not focus on keeping the data available without any downtime. Availability is measured as the ratio of the mean time between failures (MTBF) to the sum of the MTBF and the mean time to repair (MTTR): Availability = MTBF / (MTBF + MTTR). Thus, availability indicates the percentage of time the system is available throughout its useful life. As mentioned earlier, one of the primary goals of disaster recovery strategies is to minimize the RTO (downtime). Since MTTR is a measure of the downtime and must meet the RTO objective, a comprehensive disaster recovery strategy must also encompass strategies to increase availability. Thus, while DR strategies are strictly not availability strategies, they do meet availability requirements to an extent.
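The availability ratio can be computed directly; a minimal sketch (the function name and sample figures are ours):

```python
def availability(mtbf_hours, mttr_hours):
    # Availability = MTBF / (MTBF + MTTR), as defined above.
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A system that fails on average every 5,000 hours and takes
# 4 hours to repair is about 99.92% available:
print(round(availability(5000, 4) * 100, 2))  # 99.92
```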
Figure 1: System Availability
Classes of Availability

Availability is often expressed as a percentage of system uptime. Often, an availability of about 90-95% is sufficient for most applications. However, for extremely critical business data, such levels of availability are simply not enough. As mentioned before, for brokerage services and businesses offering online services, a downtime of more than a few minutes a year can have significant operational impacts. For example, 99.9% availability typically means about 9 hours of downtime per year. The financial and other impacts of such downtime could spell trouble for the business. Truly highly available solutions often have an availability of 99.999% (“five nines”) or 99.9999% (“six nines”). Such solutions have a downtime of the order of a few seconds to a couple of minutes per year.

There are different classes of data protection mechanisms based on availability. Figure 2 shows a pyramid of various data protection strategies. As one goes up the hierarchy, the downtime decreases and hence the availability increases. The top two levels of the pyramid constitute strategies that represent true high availability (“five nines” and “six nines”).
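The downtime figures quoted for the different availability classes follow directly from the percentages; a quick sketch, assuming a 365-day year:

```python
def downtime_minutes_per_year(availability_pct):
    # Fraction of a 365-day year during which the system is down.
    return (1 - availability_pct / 100) * 365 * 24 * 60

# 99.9%              -> ~526 min/yr (~8.8 hours, the "about 9 hours" above)
# 99.999%  ("five nines") -> ~5.3 min/yr
# 99.9999% ("six nines")  -> ~0.5 min/yr (about half a minute)
for pct in (99.9, 99.999, 99.9999):
    print(pct, round(downtime_minutes_per_year(pct), 2), "min/yr")
```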
Figure 2: Classes of Data Protection
Active/Active Dual Controllers: SBB

The fundamental way to make a storage system highly available is to make each and every component of the system redundant. This includes the processors, memory modules, network and other host connectivity ports, power supplies, fans and other components. Apart from these, the drives are configured in RAID to ensure tolerance of drive failures. However, the disk array controller (RAID controller) and the motherboard of the system still constitute single points of failure.

Storage Bridge Bay (SBB) is a specification created by a non-profit working group that defines a mechanical/electrical interface between a passive backplane drive array and the electronics packages that give the array its “personality”, thereby standardizing storage controller slots. One chassis can host multiple controllers that can be hot-swapped. This ability to have multiple controllers means that the system is protected against controller failures as well, giving it true high availability.
However, such a configuration is not without challenges. One of the primary challenges in such a system is that it hosts two intelligent controllers within the same unit that share a common mid-plane and drive array. Since the drive array is shared, the two controllers must exercise a mutual exclusion policy on the drives to ensure that they do not modify the same data simultaneously, causing data corruption and inconsistencies. Thus, the RAID module on the controllers must be cluster-aware to avoid such collisions and to handle any resulting conflicts. Further, the two controllers each have their own cache of the metadata and data stored on the drives. These two caches need to be kept synchronized to ensure that one controller can resume the activities of the other upon its failure.
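The mutual-exclusion requirement can be sketched as a per-stripe lock that either controller must hold before writing. In a real cluster-aware RAID layer the lock grant would travel over the inter-controller link rather than live in shared memory, so this is purely an illustration (all names are ours):

```python
import threading

class StripeLockManager:
    """Per-stripe mutual exclusion between two controllers (sketch)."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, stripe):
        with self._guard:
            return self._locks.setdefault(stripe, threading.Lock())

    def write(self, controller, stripe, do_write):
        # Only one controller at a time may modify a given stripe.
        with self._lock_for(stripe):
            do_write(controller, stripe)

mgr = StripeLockManager()
log = []
mgr.write("ctrl-A", 7, lambda c, s: log.append((c, s)))
print(log)  # [('ctrl-A', 7)]
```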
Figure 3: Storage Bridge Bay
Figure 4: Dual Redundant Controller unit
To perform this clustered RAID communication and maintain cache coherency, the two controllers need a set of (preferably) dedicated communication channels. A combination of more than one communication channel, such as a SAS fabric and Ethernet connections, can be employed here to ensure minimal performance impact and redundancy in this communication layer as well. As with all dual redundant “intelligent” clusters, the loss of inter-node communication can cause the two controllers to lose cache coherency. Further, once communication is lost, each controller may try to take over the operation of its peer, resulting in a split brain scenario. In order to handle this split
brain scenario, the two controllers also need to maintain a quorum, using dedicated areas of the shared drive array, to avoid conflicts and data corruption.
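The quorum idea can be sketched as an atomic claim on a reserved area of the shared array: when the inter-controller link is lost, each controller tries to claim it, and only the winner takes over. Here a `threading.Lock` stands in for an atomic on-disk reservation; the class and names are illustrative only:

```python
import threading

class QuorumBlock:
    """Stand-in for a reserved quorum area on the shared drive array."""
    def __init__(self):
        self._claim = threading.Lock()
        self.owner = None

    def try_claim(self, controller_id):
        # Non-blocking: exactly one controller can win the claim.
        if self._claim.acquire(blocking=False):
            self.owner = controller_id
            return True   # winner: take over the peer's workload
        return False      # loser: stand down, avoiding split brain

q = QuorumBlock()
print(q.try_claim("controller-A"))  # True
print(q.try_claim("controller-B"))  # False
```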
The key advantage of such a dual controller setup is that it is almost fully redundant, with hot-swappable components. However, despite the controllers being redundant, the mid-plane connecting the controllers to the drive back-plane is still shared, making it a single point of failure.
High Availability Cluster

The highest class of availability in the availability pyramid is achieved using High Availability Clusters. As mentioned before, while dual controllers are extremely robust and fault tolerant, they are still susceptible to mid-plane failures. Moreover, since the drive array is shared between the two controllers, RAID is the only form of protection available against drive failures. High Availability Clusters are clusters of redundant storage nodes that ensure continuity of data availability despite component failures and even the failure of an entire storage node. This represents the highest form of availability possible (“six nines”). In comparison to SBB-based dual controller nodes, HA Clusters do not suffer from any single point of failure. In addition, since the drive arrays are not shared by the two systems, the individual systems have their own RAID configurations, making the HA Cluster resilient to more drive failures than an SBB setup. Unstable data center environments, such as rack disturbances, are common causes of premature drive failures, and dual controller nodes are more prone to system failures than HA Clusters in such environments. Finally, HA Clusters are also resilient to site failures, making them the best-in-class availability solution. However, SBB-based dual controller units have a lower disk count, making them more power efficient and a greener solution with a smaller data center footprint.
HA Clusters also encounter the split brain syndrome associated with dual controller nodes. However, unlike dual controller nodes, this problem cannot be addressed using a quorum disk, as the two units do not share a drive array. One way to address this problem is to have a client-side device specific module (DSM) that performs the quorum action on a split brain. The DSM sits in the path of the IO and decides which path to send the IO down. In addition, it keeps track of the status of the HA Cluster, i.e. whether the two nodes are synchronized, and permits a failover action from one system to another only when the two nodes are completely synchronized.

Figure 5: HA Cluster
Figure 6: DSM based HA Cluster
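The DSM behavior described above can be sketched as a small routing policy; the class, states and node names here are hypothetical:

```python
class DeviceSpecificModule:
    """Client-side path selector for a two-node HA Cluster (sketch)."""
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary
        self.nodes_synchronized = True  # tracked cluster state

    def route_io(self, primary_alive):
        if primary_alive:
            return self.primary
        if self.nodes_synchronized:
            return self.secondary  # failover permitted: nodes in sync
        raise RuntimeError("failover refused: nodes not synchronized")

dsm = DeviceSpecificModule("node-A", "node-B")
print(dsm.route_io(primary_alive=True))   # node-A
print(dsm.route_io(primary_alive=False))  # node-B (nodes in sync)
```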
The drawback of having a client-side DSM is that the HA Cluster becomes dependent on the client. Also, if the clients themselves are clustered, then each of the clients needs a distributed DSM that communicates with the others. A client-agnostic HA Cluster can be created if we understand the causes of a split brain scenario in an HA Cluster. Typically, a split brain scenario that causes data corruption occurs when the network path between the storage nodes has failed, severing the communication between the two nodes, while the client can still access both nodes. In this scenario, both storage nodes will try to take over cluster ownership, and unless the client has some way of knowing the rightful owner of the cluster, IOs could be sent to the wrong storage node, causing data corruption. Thus, an HA Cluster where the storage nodes have lost contact with each other while the connections from the client to both storage nodes remain alive is the cause of the split brain scenario. As can be seen in Figure 6, such a setup is not a true high availability solution, as it does not provide path failover capability. It follows that true HA setups are not prone to split brain syndrome. Figure 7 shows one such network configuration that supports client-agnostic HA Cluster configurations.
True High Availability

A truly highly available data center is one in which every component in the system – not just the storage – is highly available. This includes the client machines, application servers such as databases and mail servers, network switches, the network paths from clients to the application servers, and the network paths from application servers to the storage systems. Figure 8 shows a true HA setup in which every component has redundancy built into it. Thus, a failure in any single component – say, a switch or a network path – does not make the system unavailable.
Figure 7: Client agnostic HA Configuration
Figure 8: True High Availability
Summary

Thus, true storage high availability can only be ensured when there is redundancy built into every component of the storage sub-system. The dual redundant controller setup and the HA Cluster setup are two such setups that deliver best-in-class availability. Each has its own advantages and drawbacks, but a combination of these approaches, in addition to application server and network path redundancies, delivers a truly highly available data center setup.
For more information about High Availability in Storage Systems, visit http://www.amiindia.co.in