disaster recovery in the cloud - whitepaper

Upload: karolina-dryja

Post on 13-Jul-2015

Page 1: Disaster recovery in the Cloud - whitepaper

Whitepaper Disaster Recovery in the Cloud

A realistic guide to selecting the appropriate Disaster Recovery strategy and provider

A transforming market

Business Continuity Planning (BCP) and Disaster Recovery (DR) have changed significantly over recent years due to the disruptive impact of the cloud computing market. The emergence of new cloud-based technologies, the associated progress in virtualisation products and the reduction in connectivity costs have significantly influenced DR and BCP solutions, with important benefits across the board, transforming the relevant offerings in the marketplace.

Disaster Recovery used to be a stagnant necessity, often associated with doubling the cost of ownership of an Information System. Now, Disaster Recovery solutions can be extensions of flexible “production” platforms and components of an overall Cloud migration strategy.

Commercial virtualisation technologies and standards have revolutionised the way Disaster Recovery services are now provided and consumed. Whilst traditionally a recovery/failover platform would mean re-designing and building a physical replica of the primary Information System, modern Disaster Recovery services can be consumed on-demand and at a fraction of the cost.

Contact

■ Telephone: 020 7680 6330 ■ Support: 020 7680 6347

■ Email: [email protected]

■ Web: www.techgateplc.com

Address:

Moorfoot House 221 Marsh Wall

Canary Wharf London

E14 9FJ

The changing economics of Disaster Recovery solutions

Decreasing costs have been a major driver, not only for DR solutions, but for the wider adoption of Cloud-based services in general. Falling prices for server and storage infrastructure, networking equipment and connectivity have allowed for the creation of low-cost “recovery platforms” that can support part or all of a production Information System. Leased lines (point-to-point connections) and Wide Area Networks have fallen sharply in cost, while their speed and capacity have multiplied tenfold over the last 10 years.

Indeed, modern Disaster Recovery as a Service (DRaaS) platforms provide not only an amortised, operating-expenditure (OPEX) consumption model, but also flexible resource planning and customisation for the recovery of applications and data. In addition, server and storage virtualisation technologies using smart provisioning of resources can result in more cost-efficient implementations.

Compared with a traditional Disaster Recovery solution, the overall cost savings can be up to 70% in certain scenarios.

On-demand flexible recovery solutions

DRaaS platforms come in different flavours and commercial models, but the main difference compared to traditional legacy systems (as with all the “as a service” cloud computing offerings) is the consumption model. This means the service is based on the workloads/applications the customer wants to protect, without having to commit to building another monolithic Information System from scratch. Consequently, the underlying infrastructure and often the topology of the source system become irrelevant, with the customer having to worry only about real business-aligned factors such as the desired Recovery Time and Recovery Point Objectives (RTO and RPO).

DRaaS solutions can range from hypervisor-based replication services, such as VMware’s vSphere Replication and Site Recovery Manager (SRM) or Microsoft’s Hyper-V Replica, through Virtual Machine/Operating System-based replication using third-party software such as PlateSpin or Double-Take, to Cloud Backup and Recovery implementations, depending on the business requirements. Solution design is also dependent on the status quo of the customer’s Information System, as the business-critical applications that need to be protected may be on a combination of physical and virtual servers, or have additional requirements, such as hypervisor or Storage Area Network-based replication dependencies.

In fact, quite often, hybrid implementations provide the most comprehensive protection and offer the best cost-efficiency and alignment with an organisation’s overall Business Continuity strategy.

Things to consider

The importance of RTO and RPO

Different applications have different values for an organisation and therefore should have different priorities with regard to their Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO): for instance, can you tolerate losing one hour's or one day's worth of data? A trading platform may, for example, have far lower RPO and RTO than a management reporting tool. This means that in one scenario Storage Area Network (SAN) replication might be required to achieve an almost instant failover with near-zero loss of data, whereas in another scenario a managed cloud backup and recovery solution might be sufficient and far more cost-effective. The RTO and RPO are in most cases a good starting point when designing a Disaster Recovery solution.
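As a rough illustration of using RTO and RPO as a starting point, the following Python sketch sorts workloads into the two approaches contrasted above. The `Workload` class, the `suggest_approach` function and the 60-minute threshold are hypothetical and chosen purely for illustration. Real thresholds depend on the business, the applications and the provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """A protected application with its recovery objectives, in minutes."""
    name: str
    rto_minutes: int  # Recovery Time Objective: maximum tolerable downtime
    rpo_minutes: int  # Recovery Point Objective: maximum tolerable data loss


def suggest_approach(workload: Workload, tight_threshold_minutes: int = 60) -> str:
    """Suggest a broad DR approach from the tighter of the two objectives.

    The threshold is an illustrative assumption, not an industry rule.
    """
    tightest = min(workload.rto_minutes, workload.rpo_minutes)
    if tightest <= tight_threshold_minutes:
        return "replication"
    return "cloud backup and recovery"


# Hypothetical workloads mirroring the examples in the text.
trading = Workload("trading platform", rto_minutes=5, rpo_minutes=1)
reporting = Workload("management reporting", rto_minutes=1440, rpo_minutes=1440)

trading_choice = suggest_approach(trading)
reporting_choice = suggest_approach(reporting)
```

In practice the outcome of such a classification would feed into a costed design rather than a single label, and a hybrid of both approaches is often the result.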

Regulatory requirements

Regulations governing Information Systems and applications can also determine the approach to a DRaaS solution. Regulation can be internal (relating to IT and general business governance), as well as external, coming from a governmental regulatory authority or body. Regulation can dictate RTOs and RPOs when protecting applications, that is, the maximum amount of data that can be lost and the maximum time taken to recover services within specific industry verticals. Regulation also drives data domicile and security requirements, topics that are more relevant than ever. The recent increase in awareness and discussion about US and EU legislative enforcement has resulted in organisations and regulatory authorities

demanding more clarity on data domicile. Therefore, the specific geographic location of the DR and backup platforms is becoming ever more important, with many customers demanding that Data Centres, and thus all their data, are not only located in the UK, but also owned and managed by a UK-headquartered service provider, with no significant business or operations outside of the UK.

Make “like-for-like” comparisons

One of the common mistakes during the selection of a Disaster Recovery solution is failing to compare similar functionality and implementations, which are often presented to a customer as “competitive” quotes. The requirements and specifications of a Business Continuity strategy that extends to a DR solution are business-related concepts, and while an open “consulting-type” discussion should take place as part of the engagement with a provider, customers should understand the design and foundations of an offering and avoid “shoehorning” their requirements for commercial reasons.

The importance of recovery: The purpose of a DR solution is to ensure an organisation can recover data and applications in the event of a disaster which renders the source (primary) platform inaccessible. Whereas in the past a DR strategy was often purely focussed on backing up and recovering data, most DRaaS offerings nowadays incorporate the element of functionality when recovering: a functional environment that office or remote users can connect to, accessing the protected data and applications, in order to continue their work as usual. However, there are still occasions where DR offerings do not include functional recovery platforms, or there is no comprehensive design to cater for the realistic business requirements during an invocation event. In addition, clear and in-depth technical and commercial documentation is essential to avoid not only hidden costs, but the catastrophic results of the solution not working when really needed.

Backup is always relevant and it’s not replication: A typical misconception is that one has to compare, or choose between, a replication-based and a backup-based solution. Comprehensive protection of a system, achieving the best RTOs and RPOs, in most cases comprises both elements; one cannot replace the other. For example, replicating a Private Cloud to an off-site DRaaS platform with VMware’s SRM can offer excellent RTOs and RPOs, but offers very limited ‘point in time’ backup functionality if it is not combined with an enterprise-class backup and archiving solution. On the other hand, a Cloud Backup and Recovery implementation will offer a very cost-efficient alternative to tape backup when combined with cloud recovery functionality, but will never achieve the very low RTOs and RPOs required for business-critical applications.
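The distinction between replication and ‘point in time’ backup can be sketched with a toy model in Python. The `Replica` and `BackupStore` classes below are hypothetical and do not represent any vendor's product. They only illustrate that a replica mirrors the latest state of the source, including any corruption, whereas a backup history allows a return to an earlier point.

```python
class Replica:
    """Toy replication target: holds only the most recent copy of the source."""

    def __init__(self):
        self.current = None

    def sync(self, data):
        # Each sync overwrites whatever was there before.
        self.current = data


class BackupStore:
    """Toy backup target: keeps every point-in-time copy it is given."""

    def __init__(self):
        self.snapshots = []  # (timestamp, data) pairs, appended in time order

    def take_snapshot(self, timestamp, data):
        self.snapshots.append((timestamp, data))

    def restore(self, at_or_before):
        # Return the newest snapshot taken at or before the requested time.
        candidates = [data for ts, data in self.snapshots if ts <= at_or_before]
        if not candidates:
            raise LookupError("no snapshot at or before the requested time")
        return candidates[-1]


replica = Replica()
backups = BackupStore()

# Hour 1: healthy data is both replicated and backed up.
replica.sync("ledger v1")
backups.take_snapshot(1, "ledger v1")

# Hour 2: the source is corrupted and the corruption is copied to both targets.
replica.sync("ledger CORRUPTED")
backups.take_snapshot(2, "ledger CORRUPTED")

latest_replica = replica.current            # only the corrupted copy remains
restored = backups.restore(at_or_before=1)  # the healthy copy is still available
```

The replica gives the freshest copy, which is what drives low RPOs, while the backup history gives the ability to go back in time. This is why the two are complementary rather than interchangeable.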

Not all recovery platforms are the same: Although the technical and commercial benefits of a DRaaS solution include the independence of the offering from the underlying hardware, the quality, design and specification of the recovery platform are of the utmost importance. Indeed, a high-end, production-ready recovery platform that can provide equal performance and infrastructure-level resilience can be very different to a non-fault-tolerant, low-end virtual or physical server in terms of achieving the goals of a Business Continuity strategy. Again, detailed design and specification of the solution, as well as prioritisation of the critical systems and components, are required.

Security

One of the most prominent and inhibiting factors with regard to the adoption of anything “cloud-related” is security, which has traditionally been a bone of contention across all organisations. The 2013 global surveillance disclosures, and the associated hype, have further fuelled discussions around the security of data stored on cloud platforms and the readiness of cloud providers to defend them, leaving a rather confused IT consumer audience. As the cloud market matures and both national and industry regulators begin to refine their guidance and regulations regarding data security (as well as data domicile), the security question may be answered through initiatives such as accreditations to defined standards and the encryption or anonymisation of personally identifiable data held outside of the organisation.

However, there are some basic elements to consider, in order to ensure security when selecting a modern Disaster Recovery solution:

Accreditations: Accreditations are very strong statements about the business you are partnering with (and are not limited to security). When considering outsourcing and cloud providers, industry-standard corporate accreditations provided by independent third-party auditing firms should be fundamental determining factors during the evaluation process. Independently audited accreditations that are accepted across the IT industry, such as ISO27001 (Information Security Management) and BS25999/ISO22301 (Business Continuity), should be used to “filter” potential candidates. These two provide assurance that the cloud provider concerned has the necessary processes in place to handle customer information securely and to maintain business operations during major incidents. Other popular accreditations include ISAE3402/SAS-70 (assurance reporting on service organisations), PCI DSS (card payment security) and ISO20000/ITIL (Service Management).

As achieving and maintaining ongoing compliance with all of these standards requires significant time and investment on the part of the provider, accreditations are excellent proof of an organisation’s commitment to providing a reliable, and of course secure, service.

Infrastructure: As with everything IT-related, security starts from the very foundations of the infrastructure. This refers to the overall “setup” of the cloud provider (network, Data Centres, security management) as well as the specific solution infrastructure. If, for example, a DRaaS platform that makes use of a shared storage solution is based on a commodity, low-end SAN with weak security, then a customer making use of that hardware might, without necessarily knowing it, be exposed to other “tenants” on the same SAN. Equally, somewhere within the provider’s infrastructure, there might be a firewall or another networking device that is not being managed well, leaving the whole network vulnerable to attack. In general, ownership, efficient management and effective monitoring of its network and nodes are prerequisites for the provider with whom you choose to partner.

Security in solution design: The design of the DR solution itself plays the most critical role in terms of security. Although Disaster Recovery is considered to be a “secondary” element of an organisation’s Information System, the level of security and associated design principles should be consistent and encompass the DR solution.

Encryption over the Internet with a Virtual Private Network (VPN), or a dedicated point-to-point connection, terminating on a private/dedicated firewall at the provider’s Data Centre, are basic security measures. Additional encryption can also be native to the technology used, for example replication software or an integrated backup solution that stores data encrypted at rest on a secure storage platform.

As security threats to cloud providers’ Data Centres are becoming more frequent, additional security services such as penetration testing, application security assessment, defence or even source code review can enhance the level of security when sensitive data or business-critical applications are sent outside the organisation’s perimeter. To address these security requirements, the cloud provider should be able to work with its customers and partners (consultants, security specialists etc.), or even better, offer those services itself.

Profile of the cloud provider

More difficult to define, but one of the most important factors when selecting a cloud provider for DR (as with any other type of IT partner), is its overall profile. For example, the size of the organisation often determines the flexibility that the provider is able to offer in order to accommodate the customer’s requirements and desired commercial terms.

Flexibility becomes even more relevant with complex implementations, as opposed to off-the-shelf commodity services, where customised design and technical consulting are a vital part of the engagement. For instance, a hybrid implementation often requires multiple service elements, such as the co-location of physical hardware alongside the cloud recovery platform, as well as dedicated connectivity services and/or load balancing technologies.

Although large monolithic providers normally tend to be less flexible than smaller, specialist providers, limited commercial size, infrastructure and connectivity can be risks when dealing with players at the other end of the spectrum.

Depending on the size of the customer, and therefore the relative value of their contract, a smaller, industry-specific or specialist cloud provider is more likely to provide more personalised and cost-efficient support to its customers than a global generalist IT vendor. However, partnering with a smaller provider can be a commercial concern for the customer, especially when placing a long-term contract, so, in all cases, proper financial due diligence must be carried out on the prospective provider during the selection phase.

Service Level Agreements (SLAs) and terms

For the last 4-5 years, cloud SLAs have been a very hot topic, discussed thoroughly among potential cloud consumers. As the cloud market is still maturing, customers are pressuring suppliers to create more coherent and well-structured SLAs which are both easy to understand and easy to measure. However, there have been numerous incidents where the larger cloud providers have been found to have used a “small print” approach to caveat their responsibility in case they fail to meet their SLA and terms (e.g. regional availability outages, bandwidth traffic charges etc.). The only proven method of avoiding disappointment, and potentially disastrous results, is to challenge the SLA of a provider with specific, in-depth scenarios/questions and, if not satisfied, demand that anything missing is incorporated into the SLA and/or the contractual terms and conditions.

Exit strategy

As with “all things outsourced”, implementing a cloud-based DR solution means outsourcing, often for a long time period, your applications and data to a third-party organisation outside of your in-house controls and management. Although cloud-based DR solutions are far more flexible and can be more “portable” (meaning that the solution can be migrated to another third-party provider relatively easily) than cloud-based production systems, vendor lock-in is still a very relevant and typically inhibiting factor amongst potential cloud adopters. In order to address this concern, a clear and concise agreement between the provider and the customer is needed, in which the exit strategy process is described, including the migration to a new provider.

Fortunately, modern virtualisation technologies can offer significant portability between cloud providers, effectively mitigating the risk of being locked in when changing providers. However, this is definitely not a panacea and any future exit strategy must be clearly defined.

Connectivity and user access

When we refer to off-site DR solutions, we usually mean a third-party recovery platform that the customer’s users can, in the event of a failure of the primary systems, connect to and continue to access their applications and data. A very common mistake when selecting a DR solution is ignoring, or not prioritising, the way the users will be connecting to a restored/failed-over application and the design elements or connectivity required to accomplish this.

For example, while it might be straightforward to establish a connection for a backup and recovery solution over a very limited Internet line, what happens when users need to connect remotely to their applications when an event renders their on-premises IT systems inaccessible? Or, equally, does the design of the DR solution provide access to all the different groups of users that want to connect using different means and from different locations? The value of a DR solution really lies in those hours, or even those few minutes, during which an organisation will rely on the functionality of its DR platform to deliver when needed.

Proof of concept, testing and documentation

In order to mitigate the risks associated with the adoption of any DR solution, there is no better way than to actually try

it. Depending, of course, on the type of solution, a Proof of Concept or demonstration of a similar solution is invaluable. In most cases, cloud DR providers will be able to demonstrate some, if not all, elements of the proposed solution, allowing the customer both to test the solution’s functionality and to ensure it meets their requirements.

After the work is done and the solution is implemented, regular testing is the best way to make sure that the agreed solution is delivering at the expected levels. As the customer’s requirements, specifications and ultimately the solution itself might change, testing should be an inherent, and contracted, feature of the offering. Monthly, quarterly or, more commonly, annual tests are also a way to monitor, prove and amend the essential documentation that forms a key element of any Business Continuity Plan.

Testing of the solution must also cover user connectivity to the DR platform, which is often overlooked. In some cases, defining an RTO is something that can be achieved only after testing and documenting the solution. In general, a period of pre- and post-sales technical engagement is crucial, and usually involves work from both parties to achieve the optimum results and implementation.

Design Options

As mentioned above, cloud-based DR solutions offer a range of options in terms of the technology and design that can be offered:

Shared versus dedicated compute

Falling into the broad category of Public Cloud services, shared/multi-tenanted DRaaS platforms are DR systems that are used by multiple customers. The main advantage of shared compute platforms is that they are usually more appealing commercially, as the cost of the compute infrastructure is distributed across multiple customers. In theory, dedicated platforms are inherently more secure and offer better performance, as only one customer has access to them. Solutions based on shared platforms are more restrictive in terms of flexibility, and are often based on more “commodity-type” hardware, which may not deliver the performance required when you invoke your DR system.

Dedicated platforms, on the other hand, can be built to the exact design and system requirements specified by the customer, offering complete management and administration control. In these scenarios, the cost of the hardware required for the solution is usually amortised across the length of the contract with the provider.

Software licensing can also be a consideration when deciding between shared and dedicated platforms, as certain software vendors, such as Microsoft, are very restrictive about if and when customer-owned licences can be used on shared platforms.

Shared versus dedicated storage

Choosing between a shared and a dedicated storage hardware option will make a significant difference in the cost of the service. For more complex, large-scale solutions with specific performance or security requirements, selecting a dedicated storage option can still be cost-effective and offer the performance and security features required, as long as the customer is prepared to enter into a longer-term contract.

Something else to be taken into consideration when selecting shared or dedicated storage is the option to use different types of storage, i.e. faster or slower performance disks, within the same DR solution. This use of multiple storage types is often referred to as Storage Tiers. Common storage options, in terms of performance, are Solid State Drive (SSD), SSD-enabled Serial Attached SCSI (SAS), SAS and Serial ATA (SATA). The ability to migrate data from one Storage Tier to another, and even from shared to dedicated storage, or to use a combination of different storage pools in a customised implementation, is also something very important to consider.

Managed/non-managed and automated/automatic failover

Most generic and DR specialist cloud providers have evolved from traditional Managed Service Providers (MSPs) and Outsourcing/Data Centre vendors, which means that, in most cases, a DRaaS solution is expected to work automatically for the customer in case of an incident. Although outsourcing your DR to a cloud provider offers peace of mind, there are different levels of management and automation, with distinct and important differences, that fit different requirements.

Depending on the type of solution and the technology required, cloud providers can offer a range of service choices. For example, using a fully managed cloud backup and recovery solution means that, in an emergency, the provider would invoke the DR environment for the customer and make the DR platform available for users to connect to. On the other hand, with a customer-managed service, the customer has management and administration of a platform that they can invoke themselves (and therefore test) whenever they want, including during a real invocation incident.

A common source of confusion is the difference between automatic and automated failover of the DR platform. A solution that offers automatic failover will automatically switch functionality and redirect its users to the DRaaS platform to access their applications and data, whereas with automated failover, the failover would need to be instigated by the customer’s IT systems manager.
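The difference between the two failover modes comes down to who makes the decision to fail over. The following minimal Python sketch captures that decision logic only. The `FailoverMode` enum and the `should_fail_over` function are hypothetical names used for illustration and do not correspond to any product's API.

```python
from enum import Enum


class FailoverMode(Enum):
    AUTOMATIC = "automatic"  # the platform decides and switches by itself
    AUTOMATED = "automated"  # the steps are scripted, but a person starts them


def should_fail_over(mode: FailoverMode, primary_healthy: bool,
                     operator_approved: bool = False) -> bool:
    """Return True when users should be redirected to the DR platform."""
    if primary_healthy:
        # No incident: stay on the primary platform in either mode.
        return False
    if mode is FailoverMode.AUTOMATIC:
        # Automatic: an unhealthy primary is enough to trigger the switch.
        return True
    # Automated: wait until someone (e.g. the IT systems manager) instigates it.
    return operator_approved


automatic_result = should_fail_over(FailoverMode.AUTOMATIC, primary_healthy=False)
waiting_result = should_fail_over(FailoverMode.AUTOMATED, primary_healthy=False)
approved_result = should_fail_over(FailoverMode.AUTOMATED, primary_healthy=False,
                                   operator_approved=True)
```

An automated design trades a slightly longer recovery time for human control over when an invocation happens, which can matter where a false alarm would be costly.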

Resilience and performance

As discussed previously, the required performance and resilience levels of the DR platform can have a significant impact on the design and commercials of a DRaaS implementation. Disaster Recovery platforms are complete Information Systems in their own right, to which these concepts still apply, despite often being seen only as additional “safety nets” for primary production systems. The required fault-tolerance of the DRaaS platform should be taken into consideration when designing the overall solution. Redundancy within the platform’s server farm, as well as in networking devices such as firewalls and switches, and load balancing or alternative user connectivity options, should be addressed.

Equally, in terms of performance, a production-ready DR platform must be designed to meet the specifications of the primary production platform that it is replacing. This means that the compute, storage and networking capabilities of the platform should always be kept up-to-date as part of an integrated and holistic Information System, as well as mimicking the application and security awareness, hardware dependencies, performance and resilience of the primary platform.

About Techgate plc

Techgate has been providing Business Continuity consultancy and Disaster Recovery solutions for more than a decade. The company’s heritage has always been in providing high-availability infrastructure systems to support our customers with their day-to-day business. Over this time, Techgate has evolved into a fully-fledged Managed Services Provider, designing and supplying Cloud solutions that offer enterprise-class levels of performance, availability, security and compliance.

Our extensive and proven expertise in virtualisation, cloud services and business continuity, combined with our innovative approach, enables us to deliver highly resilient cloud infrastructures for production systems and state-of-the-art disaster recovery solutions. Our key markets are Finance, Legal and Independent Software Vendors (ISVs) where reliability, high availability and security are key considerations.

In order to provide and maintain the highest levels of service delivery and management, we have invested heavily in industry-leading infrastructure and network systems and have developed business processes and operations that are fully accredited to ISO27001 (Information Security Management) and BS25999 (Business Continuity).

Our collaboration with technology partners such as VMware, IBM, Intel and others over the past 3 years has enabled us to bring to market innovative Disaster Recovery as a Service products based on industry-leading technology. Techgate was one

of the first providers in the world to integrate VMware’s premium in-house Disaster Recovery component, Site Recovery Manager (SRM), into an award-winning cloud-based offsite Disaster Recovery as a Service (DRaaS) solution.

Our approach, both in general and towards DR solutions, is that there is no “one size fits all” IT solution. That is why, although we offer a wide range of technologies and products such as cloud backup, cloud replication, bespoke recovery platforms and managed recovery services, we treat every solution as different and therefore unique.

Throughout the years, our primary aim has always been to offer the best possible technical and commercial solutions, designed specifically to meet our customers’ unique and business-critical IT infrastructure and systems requirements, which is why many of our original customers are still with us today, and we hope they stay with us for another decade.

How Techgate is different

Flexible business model enabling us to provide innovative, bespoke solutions with minimum time to market.

Wholly owned UK-based Tier 3 Data Centres and network.

Guaranteed, true enterprise-class levels of performance, security, resilience and compliance.

Extensive expertise and background in Business Continuity and Disaster Recovery.

Directly accessible highly skilled dedicated engineering team and consultancy resources.

About the authors

David Reeve

David is the Director of Professional Services at Techgate plc and has been a Vice President at JPMorgan Chase and The Bank of New York Mellon. He has held a number of strategic technology roles including CTO and Head of Computer Services for several UK banks.

Kostas Roungeris

Kostas is a Cloud Solutions Specialist at Techgate plc, also focusing on the overall cloud market strategy of the company. He has management/business IT consulting experience, as well as an active academic research interest in cloud computing.
