SQL Server Business Continuity in a Hosted Environment Reference Architecture

Published: June 2013


Overview

Business Continuity Solutions for HSPs

Clustering Virtual Machine Hosts

SQL Server AlwaysOn Failover Cluster Instances

SQL Server AlwaysOn Availability Groups

Hyper-V Replicas

SQL Server Transaction Log Shipping

SQL Server Transactional and Peer-to-Peer Replication

Business Continuity Infrastructure

Networking Prerequisites

Storage Prerequisites

Solution Deployment

Active Directory Prerequisites

Client Connection Prerequisites

Cluster Services Configuration

SQL Server Installation and Configuration

SQL Server AlwaysOn Availability Group Configuration

SQL Server Clustered Instance and Availability Group Client Sessions

Hyper-V Replica

SQL Server Transaction Log Shipping

SQL Server Transactional and Peer-to-Peer Replication

Recovery

Windows Failover Cluster Instances

Availability Groups

Hyper-V Replica

SQL Server Transaction Log Shipping

SQL Server Transactional and Peer-to-Peer Replication

Summary


Overview

This is a hardware-agnostic guide to building the infrastructure necessary for business continuity of Microsoft® SQL Server® databases from the point of view of a hosted service provider (HSP) and its tenants. In this context, a reference architecture is a set of templates that SQL Server administrators can use when building a business continuity solution; it also provides a common language for discussing implementations.

Business Continuity Solutions for HSPs

Business continuity is the ability to withstand the loss of hardware, an application, a service, or an entire data center with little or no data loss. Redundancy at the server level allows a system to tolerate the loss of memory, processors, power supplies, network adapters, and other critical components. Redundancy at the storage level allows it to tolerate the loss of the disks where SQL Server program files, database data files, or database log files reside.

Most of these business continuity solutions are available either as cloud solutions that reside entirely in the HSP environment or as hybrid-IT solutions, in which some components reside on-premises in the tenant’s organization and others in the cloud.

Business continuity solutions are classified as either high availability or disaster recovery. High availability means meeting the level of operational uptime specified in the service-level agreement (SLA) between the HSP and the tenant. That level is typically expressed as a number of nines (for example, 99.99%), where availability = (Actual Uptime) / (Expected Uptime) x 100%.
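The availability formula can also be read the other way, as a downtime budget for a given number of nines. The following is a minimal Python sketch, assuming a 365-day year and no excluded maintenance windows (both assumptions, not taken from any particular SLA); the function names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year, no excluded maintenance windows


def availability_percent(actual_uptime_hours: float, expected_uptime_hours: float) -> float:
    """(Actual Uptime) / (Expected Uptime) x 100%."""
    if expected_uptime_hours <= 0:
        raise ValueError("expected uptime must be positive")
    return actual_uptime_hours / expected_uptime_hours * 100.0


def nines_to_percent(nines: int) -> float:
    """Availability target for a number of nines: 3 -> 99.9, 4 -> 99.99."""
    return 100.0 - 100.0 / 10**nines


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum yearly downtime that still meets the given number of nines."""
    return HOURS_PER_YEAR * 60 / 10**nines


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{nines_to_percent(n):.3f}% -> {downtime_minutes_per_year(n):.1f} min/year")
```

Under these assumptions, four nines (99.99%) leaves about 52.6 minutes of downtime per year, and each additional nine divides that budget by ten.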

Examples of high availability in the cloud are Windows® Failover Clusters, where the clustered resource is a set of virtual machines; AlwaysOn Failover Cluster Instances, where a SQL Server instance is the clustered resource; and Microsoft SQL Server 2012 AlwaysOn Availability Groups with synchronous writes. High availability requires either shared storage or a connection between storage systems fast enough that writes can be synchronous without noticeable delays. For this reason, a high availability solution is generally contained within a single data center.

Cluster services group a set of independent servers (nodes), physically connected through a local or wide area network, into a single management entity. The nodes either share storage or use high-speed connections and storage replication between storage systems that are each visible only to a subset of nodes (asymmetric storage), possibly in different data centers.

Cluster services are designed to detect the failure of a hardware component or of a service such as SQL Server, and to automatically transfer ownership of the clustered resource to another node. With automatic failover, clients can reconnect to the network name of a virtual machine or SQL Server instance, now hosted on a different node, with minimal downtime.

Disaster recovery in a hosted environment is protection against the failure of an entire data center. Customers with hybrid-IT solutions host their own SQL Server databases or virtual machines on-premises, with failover to a hosted service provider. Hosted service providers fail over resources from a primary data center to a remote data center.

Examples of disaster recovery solutions are SQL Server AlwaysOn availability groups with asynchronous writes, Hyper-V® Replica, SQL Server log shipping, and SQL Server replication. Each of these has a latency period between a write on the primary node and the corresponding write on a secondary node. With asynchronous availability group replicas, log shipping, and replication, this latency is the time between a transaction committing on the primary node and the same transaction committing on a secondary node. Depending on the solution in use and on factors such as network traffic, the latency can be milliseconds, seconds, or minutes.


Clustering Virtual Machine Hosts

Figure 1: Hyper-V host servers in a Windows Server Failover Cluster with shared storage; the cluster can fail over up to 4,000 virtual machines, each running an instance of SQL Server.

Many hosted service providers find that hosting SQL Server on virtual machines gives them and their customers a lower total cost of ownership, because virtual machines are easy to manage with Windows Server 2012 and Microsoft System Center Virtual Machine Manager. Although virtualization adds overhead that would not be present with a SQL Server instance running on a physical server, many HSPs consider the overhead worthwhile because it simplifies administration and gives customers a greater degree of uptime.

Windows Server Failover Clusters provide high availability for virtual machine hosts. Windows Server 2012 physical hosts can fail over anywhere from a few to 4,000 virtual machines, each running an instance of SQL Server, to another node.

Guest virtual machine clustering is also supported, and it is discussed in more detail in the section on AlwaysOn Availability Groups. This section is limited to physical hosts only.

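Failover clustering is included in both the Standard and Datacenter editions of Windows Server® 2012, with the same clustering features in each. The two-node limit associated with Standard Edition applies to SQL Server 2012 failover cluster instances: SQL Server 2012 Standard Edition supports two-node failover cluster instances, and larger SQL Server failover cluster instances require SQL Server Enterprise Edition.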

For a physical cluster hosting virtual machines, storage is either shared between nodes or asymmetric, with independent storage visible only to the nodes in a specific data center. Asymmetric storage in different data centers requires very high-speed connections between storage systems, both for high availability and so that a failover takes little more time than it would with shared storage.

Cluster Shared Volumes (CSV) are supported for hosting virtual machines on Windows Server 2012 clusters. With Cluster Shared Volumes, multiple nodes can read from and write to the same physical storage device (LUN) simultaneously.

Live migration moves a running Hyper-V virtual machine from one physical server to another without any interruption in service. Live migrations can use network bandwidth of up to 10 gigabits per second. Hyper-V virtual machines on a Cluster Shared Volume can move from node to node by live migration independently of each other.

Windows Server 2012 introduces a new feature: live migration queuing. If multiple Hyper-V virtual machines are migrated at the same time, the cluster service migrates as many as possible in parallel and queues the remaining migrations. Queued migrations proceed when sufficient resources become available. An administrator can view live migration status in Failover Cluster Manager.
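The queuing behavior can be illustrated with a small simulation. This is a toy model, not the cluster service's actual scheduler: `max_parallel` and the per-VM migration durations are hypothetical inputs, and queued migrations start in first-in, first-out order whenever a running migration finishes.

```python
import heapq
from collections import deque


def schedule_migrations(durations: dict[str, float], max_parallel: int) -> dict[str, tuple[float, float]]:
    """Return {vm_name: (start_time, finish_time)} for a FIFO migration queue."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    queue = deque(durations)               # VMs waiting to migrate, in request order
    running: list[tuple[float, str]] = []  # min-heap of (finish_time, vm_name)
    schedule: dict[str, tuple[float, float]] = {}
    now = 0.0
    while queue or running:
        # Start queued migrations while a parallel slot is free.
        while queue and len(running) < max_parallel:
            vm = queue.popleft()
            finish = now + durations[vm]
            schedule[vm] = (now, finish)
            heapq.heappush(running, (finish, vm))
        # Advance time to the next migration that completes, freeing one slot.
        now, _ = heapq.heappop(running)
    return schedule
```

For example, `schedule_migrations({"vm1": 10, "vm2": 20, "vm3": 5}, max_parallel=2)` starts vm1 and vm2 immediately and queues vm3, which starts at time 10 when vm1 finishes and completes at time 15.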

Other live migration features in Windows Server 2012 include the ability to migrate Hyper-V virtual machines between nodes in the same cluster or between nodes in different clusters, and to migrate virtual machine storage to a different storage device.

In Windows Server 2012, priorities can be assigned to clustered Hyper-V virtual machines. These priorities determine the order in which virtual machines fail over and restart after a failover. Priorities also ensure that higher priority Hyper-V virtual machines have adequate resources: if there are not enough host resources for all virtual machines to restart at the same time, lower priority virtual machines release resources and start only after higher priority virtual machines.

Priorities are High, Medium, Low, or “No Auto Start.” A clustered virtual machine has a priority of “Medium” by default. Lower priority virtual machines release resources if those resources are needed by higher priority virtual machines. The “No Auto Start” priority means a virtual machine does not automatically restart after a failover.
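The restart ordering can be sketched as follows. This is a simplified Python model under stated assumptions, not the cluster service's algorithm: memory is treated as the only constrained resource, each VM is a hypothetical `(name, priority, memory_gb)` tuple, and once one VM cannot start, every VM after it in priority order waits.

```python
# Hypothetical ranking of the priorities described above; "No Auto Start" VMs
# are never restarted automatically, so they do not appear in this table.
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def restart_plan(vms: list[tuple[str, str, int]], host_memory_gb: int) -> tuple[list[str], list[str]]:
    """Return (started, deferred) VM names for (name, priority, memory_gb) tuples."""
    candidates = [vm for vm in vms if vm[1] != "No Auto Start"]
    candidates.sort(key=lambda vm: PRIORITY_ORDER[vm[1]])  # stable: ties keep input order
    started: list[str] = []
    deferred: list[str] = []
    free_gb = host_memory_gb
    for name, _priority, memory_gb in candidates:
        # Start a VM only if nothing ahead of it is waiting and it fits.
        if not deferred and memory_gb <= free_gb:
            started.append(name)
            free_gb -= memory_gb
        else:
            deferred.append(name)  # waits until resources are released
    return started, deferred
```

With 40 GB available, a 32 GB High-priority VM starts first; a 16 GB Medium-priority VM then no longer fits, so it and every Low-priority VM wait, even a small one that would fit. That mirrors the rule that lower priority virtual machines start only after higher priority ones.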

SQL Server AlwaysOn Failover Cluster Instances

Figure 2: High availability with Failover Cluster Instances across distances requires SAN replication and a fast network connection between storage systems.

There may be times when a customer has a database of sufficient size and resource requirements that the hosted service provider must furnish a dedicated physical server, such as for a data warehouse or a high-volume transaction processing application. As part of the SLA, a high availability solution with shared storage to guarantee necessary uptime may be required.

The customer might also find a hybrid-IT high availability and disaster recovery solution attractive where that SQL Server fails over to another physical server at the hosted service provider; in this case, both servers are nodes in a Windows Failover Cluster.

A SQL Server AlwaysOn Failover Cluster Instance is installed across the nodes of a Windows Server Failover Cluster. SQL Server database files, including those of the system databases, are on shared storage. The one exception is tempdb, which in SQL Server 2012 can be placed on local storage.

A network name and one or more IP addresses are assigned to the SQL Server clustered instance. This name and these addresses let clients connect seamlessly to SQL Server on the active primary node, which owns the database files. The other nodes in the cluster are passive. SQL Server 2012 does not support Cluster Shared Volumes.


If a cluster node fails or the SQL Server service on a node fails, automatic or manual failover can transfer ownership of database files on clustered storage to a different node.

The cluster service starts the SQL Server services on the new primary node, and SQL Server then recovers each database. Recovery rolls forward (redoes) any changes recorded in a database's transaction log that have not yet been applied to the data files. It then rolls back (undoes) transactions that were in progress and had not committed at the time of the failure, returning the affected data pages to their state before those transactions began.
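The roll-forward and roll-back phases can be illustrated with a toy model. This Python sketch only shows the idea; SQL Server's real recovery works with log sequence numbers, checkpoints, and page-level state, and the log record shapes below are hypothetical.

```python
def recover(data_pages: dict[str, int], log: list[tuple]) -> dict[str, int]:
    """Toy redo/undo recovery over a list of log records.

    Hypothetical record shapes:
      ("UPDATE", tx_id, key, old_value, new_value)
      ("COMMIT", tx_id)
    """
    state = dict(data_pages)  # what the data files held when the server failed
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

    # Roll forward: reapply every logged update, committed or not, in log order.
    for rec in log:
        if rec[0] == "UPDATE":
            _, _tx_id, key, _old, new = rec
            state[key] = new

    # Roll back: undo updates of transactions with no COMMIT record, newest first.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _tx_id, key, old, _new = rec
            state[key] = old
    return state
```

If transaction T1 committed but its change to `a` never reached the data files, while uncommitted T2's change to `b` did, recovery redoes T1's change and undoes T2's.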

When recovery completes and the SQL Server clustered instance network name and addresses point to the new node, clients can start using the database again. This failover and recovery can take anywhere from a few seconds to a minute or more, depending on how much work needs to be done during the recovery and how many databases there are to recover.

SQL Server 2012 supports the multi-subnet cluster configuration where each node can potentially be on a different subnet. This configuration is often used for disaster recovery plans. For example, the hosted service provider might have nodes in different data centers. This configuration is also used for hybrid-IT where there are cluster nodes in both the tenant’s on-premises organization and the hosted service provider’s data center.


SQL Server AlwaysOn Availability Groups

Figure 3: Availability groups are sets of databases that provide high availability (HA) when writes are synchronous over a fast connection, and disaster recovery (DR) when replicas are in different data centers.

SQL Server AlwaysOn Availability Groups are both a high availability and a disaster recovery solution. Availability groups fail over a set of databases, unlike a failover cluster instance, which fails over the entire SQL Server instance. Because virtual machines are easy to manage, the SQL Server instances hosting an availability group may run on virtual machines in a hosted environment.

Each SQL Server instance in an availability group is installed on a node of the same cluster, but there is no shared storage between the replicas; each node has independent storage. A SQL Server Failover Cluster Instance (where the entire instance is the clustered resource) can host an availability replica, but availability groups do not support automatic failover to or from a replica hosted on a failover cluster instance.

One or more writable databases are added to an availability group on one SQL Server instance, which hosts the primary replica. SQL Server instances on other nodes in the same cluster host copies of those databases (secondary replicas).

Once the secondary replicas are synchronized with the primary replica, every write to the availability group databases on the primary replica is also sent to the secondary replicas. In SQL Server 2012, up to two secondary replicas can use synchronous commit: the transaction does not commit, and success is not reported to the client, until the write has succeeded on both the primary replica and each synchronous secondary replica. Transactions are sent asynchronously to all other secondary replicas.

Connections between the primary replica and any secondary replica in synchronous-commit mode should be very fast. The longer a transaction is in progress, for example while the primary replica waits for a synchronous secondary replica to commit, the longer users wait and the longer locks are held on the primary replica. Slow transactions therefore increase the likelihood of blocking and deadlocks. When a deadlock occurs, one of the transactions is rolled back, and the client must retry it. Blocking also increases the CPU resources spent managing locks and multiuser concurrency.

Figure 4: Transactions commit on the primary replica without waiting for secondary replicas set to asynchronous commit, so there is no added latency on the primary replica. Only manual failover is permitted to an asynchronous secondary replica, and data loss is possible.

Writes to secondary replicas in asynchronous-commit mode are asynchronous: the transaction commits on the primary replica, and the change is then sent to those secondary replicas. There is some latency between the commit on the primary replica and the commit on an asynchronous secondary replica. It is usually measured in milliseconds, but it depends heavily on network throughput. If the primary node fails and failover starts before a transaction already committed on the primary replica has committed on the secondary replica, that transaction can be lost.
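The difference between the two commit modes at failover time can be shown with a toy model. In this Python sketch the commit times, the fixed replication lag `lag_s`, and the failure time are all hypothetical inputs; real lag varies with load and network throughput.

```python
def lost_transactions(commit_times: list[float], mode: str, lag_s: float, failure_time: float) -> list[int]:
    """Indexes of transactions committed on the primary but missing on the secondary at failure_time."""
    committed = [i for i, t in enumerate(commit_times) if t <= failure_time]
    if mode == "synchronous":
        # The primary reports success only after the secondary has the write,
        # so nothing committed on the primary is missing from the secondary.
        return []
    if mode == "asynchronous":
        # The primary commits first; the secondary is assumed to catch up lag_s later.
        return [i for i in committed if commit_times[i] + lag_s > failure_time]
    raise ValueError(f"unknown availability mode: {mode!r}")
```

With commits at 0.0, 1.0, and 1.95 seconds, a 0.1-second lag, and a failure at 2.0 seconds, the asynchronous secondary is missing only the last transaction, while the synchronous secondary is missing none. The price of the synchronous guarantee is the extra commit latency on the primary described above.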

Each availability group can have one availability group listener: a network name and one or more IP addresses that point to the primary replica in the availability group. Client sessions can also log on to the primary node directly, but a listener lets a client connect seamlessly to whichever node hosts the primary replica, without needing to know the individual nodes or implement its own retry logic across them.

Availability groups and the clusters that host them can span networks. A hosted service provider may have sets of IP addresses for different data centers on different subnets. The multi-subnet feature of Windows Server Failover Clusters allows cluster nodes to be on these different subnets, and the availability group listener can have an address on each subnet.

With an availability group, the cluster services keep track of the health of all the nodes. If the primary node fails, the cluster service changes the status of the availability group with SQL Server. A failover can occur automatically or manually. The failover consists of changing the node to which the availability group listener points and the SQL Server on the secondary replica recovering the availability group databases and bringing them online for writing.

Read-only client sessions can connect to secondary replicas in an availability group for activities like reporting. This helps reduce multi-user concurrency issues in databases with a high volatility of read and write activity.


Hyper-V Replicas

Figure 5: Hyper-V Replica asynchronously copies virtual machine files from a host server to a replica server for disaster recovery.

Hyper-V Replica is a new feature in Windows Server 2012, included with the Hyper-V role. It asynchronously replicates a running primary virtual machine across the network to a Windows Server 2012 Hyper-V Replica server. The replica server can accept replication of virtual machines from multiple primary hosts.

An example deployment of Hyper-V Replica as a disaster recovery solution is a Windows Server 2012 cluster hosting Hyper-V virtual machines that replicate to another Hyper-V cluster in a distant data center. The secondary cluster remains inactive until it is brought online after a failure of the primary data center.

Hyper-V replicas do not have to be in the same domain. With certificate authentication, replicas can be in untrusted domains or workgroups. All that is required is network connectivity between Windows Server 2012 instances with the Hyper-V role, properly configured firewalls and routers, and sufficient storage capacity.

The primary and secondary hosts also do not need similar storage configurations. The replica virtual machine files can be stored on SMB shares, Cluster Shared Volumes, SANs, or directly attached storage devices.

The difference between Hyper-V Replica and live migration is that Hyper-V Replica keeps up-to-date copies of virtual machines ready for failover if a primary site fails, whereas live migration is a planned administrative activity used when a virtual machine needs to move from one host to another.

An initial synchronization takes place between the primary host and the secondary host. Once the secondary virtual machine is initialized, replication sends incremental updates across the network.

For a planned failover, replication finishes copying any unreplicated data in the pipeline to the secondary virtual machine before it is brought online, resulting in no loss of data. For disaster recovery after unplanned outages, the secondary virtual machine is manually brought online, and some data loss is possible from any unreplicated changes that were not sent before the outage.
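The planned and unplanned cases can be contrasted in a short Python sketch. It assumes, for illustration only, that changes are shipped in replication cycles every `interval` seconds and that each completed cycle delivers every change made up to its start; both the interval and the change times are hypothetical inputs.

```python
import math


def last_completed_cycle(outage_time: float, interval: float) -> float:
    """Time of the most recent replication cycle at or before the outage."""
    return math.floor(outage_time / interval) * interval


def lost_changes(change_times: list[float], outage_time: float, interval: float, planned: bool) -> list[float]:
    """Changes on the primary virtual machine that never reach the replica."""
    if planned:
        # Planned failover: pending changes are sent before the replica is started.
        return []
    cutoff = last_completed_cycle(outage_time, interval)
    # Unplanned failover: changes made after the last completed cycle are lost.
    return [t for t in change_times if cutoff < t <= outage_time]
```

With a 300-second interval and an outage at 1,000 seconds, the last completed cycle is at 900 seconds, so changes at 950 and 990 seconds are lost in an unplanned failover but not in a planned one.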

SQL Server Transaction Log Shipping

Transaction log shipping is one of the oldest disaster recovery solutions for SQL Server. Scheduled jobs periodically back up the transaction log of the primary database and copy the backup files to the secondary server. Other scheduled jobs periodically restore those log backups to a secondary copy of the database.

Secondary databases can be available for reading between restores, but users must be disconnected for the next restore to take place. Between restores, the secondary database stays in a standby state rather than being recovered, so that it can continue to accept log restores and so that transactions left incomplete by one restore can be completed by the next.

Failover is manual. The secondary database is recovered after the most recent log backup that could be restored. If the partition holding the database's transaction log files is still available, even though the data files may have been lost, the transaction log can be backed up one last time (a tail-log backup) and restored to the secondary database. This final backup and restore brings the secondary database up to date with all transactions committed before the failure. Otherwise, any transactions committed after the last restored log backup are lost.
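The effect of the tail-log backup on data loss can be expressed as a small function. This Python sketch is illustrative only and does not call any SQL Server API; the times are hypothetical inputs (for example, minutes since midnight), and `last_backup_time` means the most recent log backup that can actually be restored on the secondary.

```python
def log_shipping_loss(commit_times: list[float], last_backup_time: float,
                      failure_time: float, tail_log_available: bool) -> list[float]:
    """Commit times of transactions lost when failing over to the secondary."""
    committed = [t for t in commit_times if t <= failure_time]
    if tail_log_available:
        # The log can be backed up one last time and restored, so every
        # transaction committed before the failure reaches the secondary.
        return []
    # Otherwise, everything committed after the last restored log backup is lost.
    return [t for t in committed if t > last_backup_time]
```

For example, with commits at minutes 10, 14, 16, and 19, a last restorable log backup at minute 15, and a failure at minute 20, the transactions at minutes 16 and 19 are lost unless a tail-log backup can be taken.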

Log shipping affects only the user databases being backed up and restored. SQL Server logins are not transferred to the secondary server, so there should be a method in place to periodically transfer changes to Windows and SQL Server authentication logins, and changes to server roles, to the secondary servers.

SQL Server Transactional and Peer-to-Peer Replication

SQL Server Transactional Replication and Peer-to-Peer Replication can both be used as disaster recovery solutions. SQL Server replication uses a publisher-subscriber model. The primary database is the publisher. Selected tables or filtered data are the articles in the publication. A secondary database is a subscriber.

After an initial synchronization, both types of replication use a job, the Log Reader Agent, that reads the transaction log of the primary database and writes the transactions to a third database, the distribution database. Another job, the Distribution Agent, reads the transactions from the distribution database and applies them to the subscribers.

Transactional replication assumes that changes to a row of data take place only on that row’s publisher. If multiple servers each keep a copy of the database up to date with replication, and a table is changed in all of the copies, the table should include additional data that identifies each row’s publisher, for example a ServerId column. A trigger or stored procedure can then restrict, through permissions, who can make changes, and restrict the rows being modified to those owned by the current server.
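The row-ownership check can be sketched in Python to show its logic. In SQL Server it would live in a trigger or stored procedure; the `OrderRow` type, its `server_id` field standing in for the ServerId column, and `LOCAL_SERVER_ID` are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace

LOCAL_SERVER_ID = 1  # hypothetical ID of the server running this code


@dataclass(frozen=True)
class OrderRow:
    order_id: int
    server_id: int  # hypothetical ServerId column: the row's publisher
    amount: int


class NotRowOwnerError(PermissionError):
    """Raised when a server tries to change a row published by another server."""


def update_amount(row: OrderRow, new_amount: int, local_server_id: int = LOCAL_SERVER_ID) -> OrderRow:
    # Only the row's own publisher may change it; other servers receive the
    # change through replication instead of editing the row locally.
    if row.server_id != local_server_id:
        raise NotRowOwnerError(f"order {row.order_id} is owned by server {row.server_id}")
    return replace(row, amount=new_amount)
```

Rejecting the change on non-owning servers keeps every row's changes flowing from a single publisher, which is the assumption transactional replication depends on.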

Transactional replication with updating subscribers has been deprecated and will be removed from a future version of SQL Server. If changes need to be made to subscribing servers, peer-to-peer replication is recommended.

Peer-to-peer replication is a type of transactional replication. Changes are replicated from the publisher to subscribers in near-real time. It is often used to scale out reads. Peer-to-peer replication is not supported for databases that are part of an availability group.


Copies of the data are kept up to date on all the servers in the peer-to-peer topology, and changes can be made on any server. Every server is both a publisher and a subscriber. Changes to an article flow from the server where the change is made to all the other servers.

In a peer-to-peer topology, any server can change a row. A conflict is possible if one server changes a row and, before that change is propagated, a second server changes the same row. No locks mark which server owns a row for the purpose of changing it; the only locks are the ordinary insert, update, and delete locks. You can set a property on the publication that detects conflicts when the Distribution Agent applies a change to a row at a subscriber. When a conflict is detected, the Distribution Agent fails until the conflict is resolved. If conflict detection is turned off, a conflicting change could be lost.
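The behavior with and without conflict detection can be illustrated with a toy model. In this Python sketch each incoming change carries the row version it was based on, and a mismatch means another peer changed the row first. The version scheme is an assumption for illustration, not SQL Server's internal conflict detection mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowState:
    value: str
    version: int  # hypothetical per-row change counter
    origin: str   # peer that made the latest change


class ConflictDetected(Exception):
    """Stands in for the Distribution Agent stopping on a conflict."""


def apply_change(current: RowState, base_version: int, new_value: str,
                 origin: str, detect_conflicts: bool) -> RowState:
    """Apply a change made on `origin` against row version `base_version`."""
    if current.version != base_version:
        # Another peer changed this row after the incoming change was made.
        if detect_conflicts:
            raise ConflictDetected(f"row was changed by {current.origin} after version {base_version}")
        # Without detection, the incoming change overwrites the row and the
        # other peer's update is lost on this node.
    return RowState(new_value, current.version + 1, origin)
```

If server B has already moved a row from version 1 to version 2, a change from server A based on version 1 either stops the agent (detection on) or silently replaces B's value (detection off).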

Changes with either type of replication flow asynchronously to subscribers after the transaction commits on the publisher. Although replication is near-real time, there is still some latency, and near-real-time delivery depends on the Log Reader Agent and Distribution Agent jobs running continuously. Administrators sometimes schedule the Distribution Agent job so that transactions accumulate in the distribution database until a known replication window, reducing conflicts between reads and writes on a subscriber.

Replication only affects the published user databases. SQL Server logins are not transferred to the secondary server. There should be a method in place to periodically transfer changes to Windows authenticated logins and SQL Server authentication logins and changes to server roles to secondary servers.


Business Continuity Infrastructure

The following are the infrastructure components and hardware best practices for the high availability and disaster recovery solutions that require cluster services. Server computers should be certified for Windows Server Failover Clusters.

Clustered servers that will be Hyper-V hosts must meet the hardware requirements for the Hyper-V role, including:

A 64-bit processor.

Hardware-assisted virtualization.

Hardware-enforced Data Execution Prevention (DEP), which must be available and enabled.

Networking Prerequisites

The reliability of the cluster is dependent to a large degree on the underlying network infrastructure, whether the cluster is built on physical hosts or virtual machines. There should be multiple networks between cluster nodes and storage systems, each with a minimum of one physical network adapter. In some cases, multiple adapters per function are necessary to eliminate single points of failure and to meet performance objectives. Private networks have their own dedicated subnets.

Use NIC Teaming to combine physical network adapters to work together for greater aggregate throughput and for failover in case one adapter fails. Windows Server supports up to 32 NICs in a team.

Do not use NIC Teaming for storage network traffic such as iSCSI or Fibre Channel over Ethernet. Storage networking should use multipath IO instead.

For other best practices, see the Windows Server 2012 Hyper-V best practices.

For network adapters on the physical host that are dedicated to virtual machine external networks, NICs that support Single Root I/O Virtualization (SR-IOV) virtualize the NIC at the hardware level and expose the virtual NIC directly to the VM without host operating system overhead. This can reduce the CPU overhead of network traffic.

Windows Server 2012 includes bandwidth management for cases where more than one of the following types of traffic shares a single network adapter. Minimum and maximum bandwidth limits can be specified.

The following networks should be considered for Hyper-V hosts in clusters. It is a good idea to name each network adapter to reflect its purpose.

A cluster health and management private network with one dedicated network adapter per host. This network is used for cluster heartbeat traffic and to transfer updates to the cluster database and Cluster Shared Volumes. It may also be used to access the file share used for the Node and File Share Majority quorum configuration.

One dedicated private network for Hyper-V server management. Cluster health traffic may also use this network.

Two network adapters (minimum) dedicated to multipath IO between the host and iSCSI or Fibre Channel over Ethernet storage. Any network dedicated to storage traffic should be restricted so that the cluster does not use it for cluster health or management traffic.

One dedicated private network for Live Migration and/or Hyper-V replica traffic. This network can also be used for cluster health and management traffic.

Two network adapters (minimum) per host that are used exclusively for virtual machine external networks. These adapters will not have IP addresses on the physical hosts.

Additional physical network adapters that are used by virtual machines as multipath IO access to storage systems. These adapters will not have IP addresses on the physical hosts.

Before configuring clusters between data centers, configure how the networks will be bridged. VPNs are a common way of internetworking data centers to ease port management. Clusters and SQL Server availability groups both require multiple ports to be opened across data centers.

Server nodes, the cluster access point, the SQL Server clustered instance, virtual machines running SQL Server, and the availability group listener should all have static IP addresses. The cluster access point and the availability group listener will need addresses on each subnet.
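The cluster access point and the availability group listener need an address on each subnet, so one useful check is that every node subnet is covered by at least one of those addresses. This Python sketch uses the standard `ipaddress` module. The addresses and the function name are hypothetical. Node addresses are given with their prefix length so that each site can use a different subnet size.

```python
import ipaddress
from typing import Iterable, List


def uncovered_subnets(node_interfaces: Iterable[str], virtual_ips: Iterable[str]) -> List[str]:
    """Return node subnets with no address for a virtual name
    (cluster access point or availability group listener).

    node_interfaces: node addresses with prefix, e.g. "10.1.0.11/24".
    virtual_ips: addresses assigned to the virtual name, e.g. "10.1.0.50".
    """
    node_nets = {ipaddress.ip_interface(i).network for i in node_interfaces}
    vips = [ipaddress.ip_address(ip) for ip in virtual_ips]
    covered = {net for net in node_nets if any(ip in net for ip in vips)}
    return sorted(str(net) for net in node_nets - covered)
```

For example, nodes at 10.1.0.11/24 and 10.2.0.21/24 with a listener that has only the address 10.1.0.50 report 10.2.0.0/24 as uncovered.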

If IPv6 addresses will not be used, disable IPv6 on all network adapters.

Storage Prerequisites

SQL Server availability groups do not share storage on a cluster. Each node can have independent storage when the cluster will host availability groups.

Clustered shared storage, including drivers and firmware, must meet the requirements of Windows Server 2012 clusters. Confirm the compatibility of storage for clusters that will fail over virtual machines or SQL Server instances before attempting to form the clusters.

Storage devices on different networks should be isolated to cluster nodes on that network. Servers from different clusters and cluster nodes on different networks should not be able to access the same storage.

Use multipath IO (MPIO) for redundancy between hosts and storage devices. The MPIO architecture supports iSCSI, Fibre Channel, and Serial Attached SCSI (SAS) SAN connectivity by establishing multiple sessions or connections to the storage array.

When using Serial Attached SCSI or Fibre Channel, elements of the storage stack should be identical in all clustered servers. The multipath IO configuration should be identical.

When using iSCSI, each clustered server should have multiple network adapters or host bus adapters dedicated to cluster storage. Storage networks should be isolated from all other networks.

Clustered Serial Attached SCSI (SAS) storage can be configured using Storage Spaces in Windows Server 2012. A clustered storage pool must be made up of at least three SAS-connected physical disks, each with a capacity of at least 4 gigabytes (GB). The physical disks must be dedicated to the storage pool. The pool must use fixed provisioning and either simple or mirror storage spaces. Parity storage spaces are not supported.
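The clustered pool requirements above can be turned into a pre-flight check. This Python sketch validates a hypothetical description of candidate disks and pool settings. The `PhysicalDisk` fields and the function name are illustrative, and 4 GB is interpreted here as 4 GiB. In practice the real values would come from the Storage Spaces management tools.

```python
from dataclasses import dataclass
from typing import List

MIN_DISKS = 3
MIN_DISK_BYTES = 4 * 1024**3             # 4 GB, interpreted as 4 GiB in this sketch
SUPPORTED_LAYOUTS = {"simple", "mirror"}  # parity is not supported for clustered pools


@dataclass
class PhysicalDisk:
    bus_type: str            # e.g. "SAS"
    size_bytes: int
    dedicated_to_pool: bool


def clustered_pool_problems(disks: List[PhysicalDisk], provisioning: str, layout: str) -> List[str]:
    """Return the reasons a proposed clustered storage pool would not qualify."""
    problems = []
    if len(disks) < MIN_DISKS:
        problems.append(f"pool needs at least {MIN_DISKS} physical disks")
    for i, d in enumerate(disks):
        if d.bus_type.upper() != "SAS":
            problems.append(f"disk {i}: not SAS-connected")
        if d.size_bytes < MIN_DISK_BYTES:
            problems.append(f"disk {i}: smaller than 4 GB")
        if not d.dedicated_to_pool:
            problems.append(f"disk {i}: not dedicated to the pool")
    if provisioning.lower() != "fixed":
        problems.append("clustered pools require fixed provisioning")
    if layout.lower() not in SUPPORTED_LAYOUTS:
        problems.append(f"layout '{layout}' is not supported; use simple or mirror")
    return problems
```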

A cluster with nodes in separate data centers uses asymmetric storage. Very high speed storage replication is required to keep the secondary nodes in the remote data center up to date in near-real time.

Windows Server 2012 supports hosting Hyper-V virtual machines on SMB 3.0 file shares in a cluster. SQL Server 2012 database data and log files can also be on SMB 3.0 file shares.

A hosted service provider should work closely with customers to determine the SLA requirements for storage IOPS. Physical storage should be configured for the best combination of performance and reliability. This applies whether the storage holds SQL Server database files directly, when virtual machines use pass-through disks, or holds Hyper-V VHDX files that in turn contain the SQL Server data and log files. Storage optimized for random access will deliver close to the same performance in either case.

A physical host with fewer physical disks available might use a mixture of RAID schemes. In that case its storage can be tiered by intended use: the host operating system, guest operating systems, guest SQL Server data files, guest SQL Server log files, guest SQL Server Tempdb files, or backup files. However, as storage reliability and redundancy have become more sophisticated, many storage administrators are opting for large disk arrays with everything in one place.

SQL Server database data files have traditionally been placed on separate spindles from log files. If the data partition is lost, the database becomes unusable. If the log partition is still available, however, the log can still be backed up, and all committed transactions can then be restored up to the time of the failure.

RAID 1+0 can be used for both data and log files. One configuration uses many more spindles per LUN for data, for example 4x4 sets, which gives a better combination of read and write performance and greater reliability. An array intended for the best write performance for log files might contain one or more 2x2 sets. For smaller arrays, RAID 1 might be used for the operating system partitions and log files while RAID 1+0 is used for data.
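A common rule-of-thumb calculation helps compare these RAID choices. It estimates usable capacity and the host IOPS an array can sustain for a given read/write mix. The calculation uses a write penalty of 2 for RAID 1 and RAID 1+0 (each write goes to two disks) and 4 for RAID 5 (read data, read parity, write data, write parity). This Python sketch assumes identical disks and a per-disk IOPS figure that you supply. It ignores controller caching, so treat the results as rough planning numbers rather than measurements.

```python
WRITE_PENALTY = {"raid1": 2, "raid10": 2, "raid5": 4}


def usable_capacity_gb(level: str, disks: int, disk_gb: float) -> float:
    """Usable capacity for identical disks at the given RAID level."""
    if level in ("raid1", "raid10"):
        if disks < 2 or disks % 2:
            raise ValueError("mirrored levels need an even number of disks")
        return disks * disk_gb / 2          # half the spindles hold mirror copies
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) * disk_gb        # one disk's worth of capacity holds parity
    raise ValueError(f"unknown RAID level: {level}")


def host_iops(level: str, disks: int, disk_iops: float, write_fraction: float) -> float:
    """Front-end IOPS the array can sustain for a write fraction between 0 and 1."""
    penalty = WRITE_PENALTY[level]
    read_fraction = 1.0 - write_fraction
    return disks * disk_iops / (read_fraction + write_fraction * penalty)
```

For example, eight disks rated at 180 IOPS with 30 percent writes give about 1,108 host IOPS as RAID 1+0 and about 758 as RAID 5. This difference is one reason RAID 1+0 is usually preferred for write-heavy SQL Server files.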

Many administrators are confident enough in their storage redundancy and reliability that they partition a single RAID 1+0 array for use by all database files, both data and log. Some SQL Server appliances use this approach and present only a single drive (C:) to the virtual machine. With sufficient confidence in the infrastructure's reliability and performance, it is no longer necessary for data files and log files to be on separate spindles.

If the cluster will span multiple data centers and storage replication will be used, see the Requirements and Recommendations for a Multi-Site Failover Cluster.

Make sure all drive letters and paths are identical on all servers.

It is recommended that all storage for SQL Server database data and log files write through to the physical media. There should not be any write-behind caching with SQL Server data.

When configuring storage, the Tempdb database should have its database data and log files on the partitions reserved for data and log files. SQL Server performance can suffer when Tempdb files are on storage that performs less than optimally for SQL Server reads and writes.

SQL Server supports the use of SSD storage. While cost prohibitive in many cases, using SSD for Tempdb database files can improve performance greatly for applications that extensively use temporary tables.

Storage devices may support Offloaded Data Transfer (ODX), which is new in Windows Server 2012. On such devices, copying large amounts of data can be substantially faster because the copy is offloaded to the storage hardware. Large copies are typically the slowest part of a database migration or of a backup and restore. ODX is built into the Windows copy engine.

Keeping SQL Server backup files on storage separate from database data and log files is a good practice. A RAID 1 or RAID 5 partition will provide redundancy where top performance is not required.

Solution Deployment

This section covers deployment considerations for four solutions. The first is failover clusters in which virtual machines or SQL Server instances are the cluster resources. The others are availability groups, log shipping, and replication. The sections on clusters apply only to Windows Server failover clusters where virtual machines, SQL Server instances, or SQL Server availability groups are the cluster resources.

Clusters hosting virtual machines or SQL Server instances are assumed to be on physical machines. Clusters hosting availability groups are assumed to be virtual machines.

Log shipping and replication do not have any clustering requirements. Scheduled jobs that carry out log shipping and replication activities may or may not be logging on to clustered SQL Server instances.

Active Directory Prerequisites

All nodes in a cluster, regardless of type, must be part of the same Active Directory® domain. There are no Active Directory requirements for log shipping or replication.

Where required, there should be a minimum of one Active Directory domain controller and DNS server in each data center.

Configure IP addresses for the private networks between cluster nodes. Add the local domain DNS server addresses to every network adapter that will use the domain controller.

Add all cluster node network adapters for private networks to the Active Directory domain. All servers in the cluster must be in the same domain.

Client Connection Prerequisites

Before configuring the cluster, also configure how clients will connect. Servers and clients must be in the same Active Directory domain or trusted domains if clients will be logging on to SQL Server using Windows authentication.

SQL Server authenticated logins can also be used. With SQL Server authentication, the login and the password are stored in SQL Server and are independent of Active Directory authentication.

Hosted service provider network administrators and DBAs will need network access both to the physical hosts, which may host clustered SQL Server instances, and to the virtual machines running SQL Server. There may be separate networks for tenants and, for multi-tenant SQL Server instances, perhaps even one network per tenant. Hybrid IT organizations will need a way of bridging their on-premises network to the HSP network, perhaps by placing servers in both data centers in the same domain or in trusted domains. Port 1433 is typically used for SQL Server client sessions, and port 5022 for the database mirroring endpoint that availability groups use.
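Before you rely on these ports across data centers, it can help to confirm that TCP connections actually get through firewalls and VPNs. This Python sketch uses only the standard `socket` module. The host names in the commented usage line are hypothetical placeholders. A successful connection shows only that the port is reachable. It does not confirm that authentication or the endpoint configuration is correct.

```python
import socket
from typing import Dict, Iterable, Tuple

# Default ports mentioned in this section.
SQL_CLIENT_PORT = 1433
AG_ENDPOINT_PORT = 5022


def check_ports(targets: Iterable[Tuple[str, int]], timeout: float = 3.0) -> Dict[Tuple[str, int], bool]:
    """Try a TCP connection to each (host, port); True means the connection succeeded."""
    results = {}
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:  # refused, timed out, or name resolution failed
            results[(host, port)] = False
    return results


# Hypothetical usage from an administration workstation:
# check_ports([("sqlnode1.example.local", SQL_CLIENT_PORT),
#              ("sqlnode2.example.local", AG_ENDPOINT_PORT)])
```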

Consider whether Windows Firewalls should be disabled on cluster hosts and on servers hosting SQL Server. If there are other ways of shielding a server from undesired traffic, doing this may make administration of these servers a lot easier. There are many ports required by cluster services and Active Directory.

Cluster Services Configuration

These steps apply to clusters hosting virtual machines or SQL Server instances on physical hosts or availability groups on virtual machines. Storage considerations only apply to clusters with shared storage. There is no clustered shared storage for clusters hosting SQL Server availability groups.

Add the Failover Clustering feature to each cluster node. Choose a node that will be the primary node in the cluster. If SQL Server is already installed and running, choose the node where the databases that will fail over with the cluster or the availability group are writable. Run Failover Cluster Manager or use Windows PowerShell® scripts to create the cluster on this primary node.

There is no option in Failover Cluster Manager for asymmetric storage. If asymmetric storage is being used for nodes in different data centers, add storage to the appropriate nodes. If necessary, use LUN masking or zoning to prevent distant nodes from using the storage.

For a shared storage cluster, the cluster validation tests should cover all components. If the cluster uses asymmetric storage, you may have to run the storage validation tests from a node attached to each separate storage device. The storage validation tests can be skipped for a cluster that hosts SQL Server availability groups.

Correct any errors in the report. Finish creating the cluster, specifying a network name and one or more IP addresses (one for each subnet) for the cluster access point. Once the cluster is formed, verify the cluster access point and all the nodes are online.

A cluster resource IP address may be offline. This is to be expected if the node that is the cluster group owner does not have an address for that subnet.

IP addresses that are cluster resources for a failover cluster instance will be set to an OR dependency automatically in Windows Server 2012. Verify that every node in the cluster is on a subnet that allows it to own at least one of the addresses specified.

The subnet for which the cluster group owner does have an address should be online. If the wrong node is the owner, it is easy to change the node that is the owner.
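The OR dependency can be pictured with a small model. In it, the network name stays online as long as at least one of its IP address resources is online, and an address can come online only when the owning node has an adapter on that subnet. This Python sketch uses the standard `ipaddress` module to model that behavior. It is a simplification for reasoning about node ownership, not a description of cluster service internals, and the addresses and function names are hypothetical.

```python
import ipaddress
from typing import Dict, Iterable


def ip_resource_states(owner_interfaces: Iterable[str], resource_ips: Iterable[str]) -> Dict[str, bool]:
    """Map each cluster IP address resource to True (can be online) when the
    owner node has an adapter on the same subnet."""
    owner_nets = [ipaddress.ip_interface(i).network for i in owner_interfaces]
    return {ip: any(ipaddress.ip_address(ip) in net for net in owner_nets)
            for ip in resource_ips}


def network_name_online(states: Dict[str, bool]) -> bool:
    """OR dependency: a single online address is enough for the network name."""
    return any(states.values())
```

In this model, an owner at 10.1.0.11/24 brings 10.1.0.50 online while 10.2.0.50 stays offline, which is the expected state described above. If the owner node matches none of the subnets, the network name cannot come online, and ownership should move to a different node.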

After the cluster is formed, you can choose a quorum configuration. The quorum defines how nodes and other resources vote on the health of the cluster. If the cluster loses quorum, that is, if fewer than a majority (more than half) of the votes are available, the cluster may stop or be unable to start. SQL Server can still be used if loss of quorum stops the cluster from working, but doing so involves additional administrative steps.

Each cluster node has a quorum vote by default. In general, it is desirable for the quorum to contain an odd number of votes. If the cluster has an even number of nodes, it is recommended to configure a disk or file share witness.

If the cluster has shared storage and an even number of nodes, consider the Node and Disk Majority quorum configuration. The witness disk requires a dedicated LUN, which stores a copy of the cluster database and provides a vote.

For clusters that host availability groups, have no shared storage, and have an even number of nodes, Node and File Share Majority can be configured. A file share witness requires that the cluster computer object have read and write permissions on both the share and the file system.
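The vote arithmetic behind these recommendations is simple. Quorum requires a majority of the configured votes, floor(n/2) + 1 for n votes, so adding a witness to an even number of nodes lets the cluster survive one more failure. This Python sketch captures that arithmetic and the witness choice described above. The function names are hypothetical. The sketch assumes every node keeps its default vote and does not model the dynamic quorum features of Windows Server 2012.

```python
def votes_needed(total_votes: int) -> int:
    """Majority: more than half of the configured votes."""
    return total_votes // 2 + 1


def tolerated_failures(total_votes: int) -> int:
    """How many votes can be lost while the cluster keeps quorum."""
    return total_votes - votes_needed(total_votes)


def suggest_quorum(nodes: int, shared_storage: bool) -> str:
    """Quorum model suggested by the guidance above, one vote per node."""
    if nodes % 2 == 1:
        return "Node Majority"  # already an odd number of votes
    return "Node and Disk Majority" if shared_storage else "Node and File Share Majority"
```

For example, a four-node cluster needs three votes and survives one failure. Adding a witness raises the total to five votes. Three votes are still needed, so the cluster now survives two failures.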

SQL Server Installation and Configuration

AlwaysOn Failover Cluster Instances are shared storage clusters that fail over the entire SQL Server instance. This includes the system databases (except Tempdb, if it is on local storage) and all user databases. At failover, a new primary node takes ownership of the SQL Server resources on shared storage or on remote asymmetric storage.

To create a failover cluster instance that fails over the entire SQL Server instance, run SQL Server Setup and choose New SQL Server Failover Cluster Installation.

Clusters hosting SQL Server AlwaysOn Availability Groups do not have shared storage. SQL Server is installed as a stand-alone installation from Setup. The cluster is used for monitoring the health of the nodes hosting SQL Server in the availability group.

A hosted service provider might find that hosting SQL Server availability groups on virtual machines is a very administratively efficient way to provide high availability and disaster recovery while keeping the ease of managing virtual machines. Microsoft System Center Virtual Machine Manager can configure availability groups.

To begin configuring cluster nodes for either SQL Server failover cluster instances or availability groups, create a single domain account. This will be used by the SQL Server service on all cluster nodes to log on to the network.

Two user rights that are not granted by default can help improve performance. Granting the Lock Pages in Memory user right to the SQL Server service account on each node prevents the operating system from paging SQL Server memory out to disk. When the virtual machine operating system comes under memory pressure, SQL Server releases memory itself rather than having it paged out. This avoids the disk access that paging would cause.

Granting the Perform Volume Maintenance Tasks user right to the SQL Server service account on each node substantially improves the performance of data file growth events, but it carries a slight security risk. By default, SQL Server zeros out new space when it is allocated to database files. With this right, space for data files is allocated without being zeroed out. Transaction log files are always zeroed. The security risk is that well-known SQL Server administrative commands let an administrator view the raw contents of an 8K page. These contents may include data from deleted operating system files.

Install SQL Server 2012 Enterprise on each node. For SQL Server failover cluster instances, select “New SQL Server Failover Cluster installation.” For a SQL Server that will be hosting an availability group, select a stand-alone installation.

Choose the features the SQL Server will require. Cluster-aware features that can be part of a failover cluster instance with shared storage are the MSSQLSERVER service (the database engine), the SQL Server Agent service, and SQL Server Analysis Services. These selections are not applicable to a cluster hosting a SQL Server availability group. The SQL Browser service should remain in a disabled state for both types of clusters as long as a default instance of SQL Server is being installed.

Choose the Client Tools Connectivity, Client Tools Backward Compatibility, and Management Tools – Complete as desired. Verify client computers can connect to the SQL Server on every node in the cluster.

Be sure to choose at least one user or Windows group to be an administrator. If the user performing the installation will be an administrator on the SQL Server, you can choose Add Current User to the collection of administrators.

When specifying the default locations of data and log files for user databases, note that this is only the default and can be changed for individual databases. Make sure that all databases will be stored in the same paths including drive letters, directories, and file names, on all storage systems for all nodes. System database directories including Tempdb should also be the same for all SQL Servers on all nodes in the cluster.

Before production deployment, test failover of each failover cluster instance. Verify that, when the SQL Server instance fails over, the new primary node successfully owns the instance and clients can write to databases.

SQL Server AlwaysOn Availability Group Configuration

After the cluster is created and SQL Server is installed on all the cluster nodes, enable AlwaysOn Availability Groups for the SQL Server service on each node. The setting is on the AlwaysOn High Availability tab of the SQL Server service properties in SQL Server Configuration Manager. The SQL Server service must be restarted for the setting to take effect.

An availability group contains a single primary replica, to which clients connect to write to the databases in the availability group. Choose which node will host the primary replica. The databases on that node may already exist. There is no need to change the cluster configuration, because SQL Server automatically changes the cluster group owner as necessary. From this point forward, you should not have to change any cluster properties for a normal configuration. SQL Server handles these changes.

If one or more databases that will be part of the availability group do not already exist, create at least one database on the primary node that will be part of the availability group. This is required even if it is a temporary database used only for this purpose.

Create a file share to store SQL Server backups accessible from all nodes in the availability group. This will be used to store backups to sync availability group replicas. The SQL Server service account for the replicas will need network and file system permissions to read and write to the file share.

Before creating the availability group, any database that is a member of the availability group must be backed up at least once. Create a full database backup if a database is new. If the backup share used to sync the availability group replicas has not been used before, this will be an opportunity to test it to verify the SQL Server service account has the necessary permissions.

Choose which nodes will be secondary replicas. One secondary replica may be configured for synchronous commits and automatic failover for high availability. Other secondary replicas are typically configured with asynchronous commits and manual failover for disaster recovery.
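The replica roles can be checked before you run the wizard. In this Python sketch, the `Replica` fields are hypothetical stand-ins for the choices the wizard presents. The rules follow this section and the SQL Server 2012 limits: exactly one primary, no more than four secondaries, at most one secondary configured for automatic failover, and automatic failover only with synchronous commit.

```python
from dataclasses import dataclass
from typing import List

MAX_SECONDARIES_SQL2012 = 4


@dataclass
class Replica:
    node: str
    role: str                 # "primary" or "secondary"
    synchronous: bool         # synchronous commit mode
    automatic_failover: bool


def replica_plan_problems(replicas: List[Replica]) -> List[str]:
    """Return the rule violations in a proposed availability group layout."""
    problems = []
    primaries = [r for r in replicas if r.role == "primary"]
    secondaries = [r for r in replicas if r.role == "secondary"]
    if len(primaries) != 1:
        problems.append("an availability group has exactly one primary replica")
    if len(secondaries) > MAX_SECONDARIES_SQL2012:
        problems.append("SQL Server 2012 supports at most four secondary replicas")
    auto = [r for r in secondaries if r.automatic_failover]
    if len(auto) > 1:
        problems.append("only one secondary can be the automatic failover target")
    for r in auto:
        if not r.synchronous:
            problems.append(f"{r.node}: automatic failover requires synchronous commit")
    return problems
```

A typical hosted layout has one primary, one synchronous secondary with automatic failover in the same data center, and an asynchronous secondary in a remote data center. That layout produces no problems.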

Synchronous commit means that a transaction does not commit on the primary replica, and success is not reported to the client, until the write has succeeded on the synchronous secondary replica. The fastest possible connection between synchronous replicas is important so that synchronous writes do not cause noticeable delays for users. Longer transaction times also mean that locks are held longer, blocking and deadlocks become more likely, and CPU usage rises to manage these resources.

An asynchronous write means there is some latency between the commits on the primary and secondary replicas. If a transaction commits on the primary replica but failure of the primary replica occurs before the transaction is written to the secondary replica, data can be lost.

Create the availability group. The following steps appear in the same order in the Create Availability Group Wizard as when they are configured with Transact-SQL scripts or Windows PowerShell commands. If you use the wizard, it can generate a script of your specific choices.

Select one or more databases that will be members of the availability group. Specify three things: the node that will be the primary replica, the secondary replica in synchronous commit mode with automatic failover (if desired) for high availability, and the secondary replicas in asynchronous commit mode for disaster recovery. If you use the Create Availability Group Wizard, it confirms that all the databases qualify for membership. Each database must use the Full recovery model and must have been backed up at least once.
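The wizard's eligibility check can be repeated in a pre-flight script. This Python sketch works on a list of dictionaries. In practice the values might come from the `recovery_model_desc` column of `sys.databases` and from the backup history in `msdb.dbo.backupset`. The function name and dictionary keys are hypothetical.

```python
from typing import Dict, List


def ag_eligibility_problems(databases: List[Dict]) -> List[str]:
    """Check each candidate database for availability group membership.

    Each dict has: name, recovery_model (e.g. "FULL"), full_backup_count.
    """
    problems = []
    for db in databases:
        if db["recovery_model"].upper() != "FULL":
            problems.append(f"{db['name']}: recovery model must be FULL")
        if db["full_backup_count"] < 1:
            problems.append(f"{db['name']}: needs at least one full backup")
    return problems
```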

During initial configuration, you can choose which replicas will accept read-only connections. This does not configure read-only routing, which must be configured through Transact-SQL or Windows PowerShell commands.

There are limitations to backing up SQL Server databases from secondary replicas. Be sure to know the limitations before configuring backup preferences.

Create an Availability Group Listener. Specify the network name that clients will use to connect to the availability group along with one or more IP addresses. SQL Server will register the listener name and IP addresses with the domain DNS server. IP addresses are required for each subnet on which a cluster node resides.

The listener will route incoming client connections to the appropriate replica. Read-write client connections always connect to the single primary replica. Read-only connections can connect to a secondary replica if read-only routing is configured and the client supplies the correct properties in the connection string (see Read-Only Routing, below).

Specify a synchronization method. If all the nodes have the same drive letters, directory paths, and file names for the database files, you can choose the full synchronization method in the wizard. This method takes a full database backup and a transaction log backup of each database in the availability group on the primary replica and restores them on all the secondary replicas. If the database data or log files have a different path on any node, you must use the Join Only synchronization preference and restore the backups manually on each node.
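The choice between full synchronization and Join Only depends on whether the file paths match on every node. This Python sketch compares per-node file layouts, each a mapping of logical file name to physical path, and returns the method the wizard allows. Paths are compared case-insensitively, as Windows treats them. The node names and paths are hypothetical. Full synchronization also needs the shared backup location described earlier, which this sketch does not check.

```python
from typing import Dict

# node name -> {logical file name -> physical path}
Layout = Dict[str, Dict[str, str]]


def choose_sync_method(layouts: Layout, primary: str) -> str:
    """Return "Full" when every node matches the primary's file paths, else "Join Only"."""
    def normalized(files: Dict[str, str]) -> Dict[str, str]:
        return {name: path.lower() for name, path in files.items()}

    reference = normalized(layouts[primary])
    for files in layouts.values():
        if normalized(files) != reference:
            return "Join Only"
    return "Full"
```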

The availability group validation may generate a warning message if the quorum file share is on the same subnet as nodes. Microsoft best practice is to have the file share in a separate data center on a separate network from any nodes.

Synchronize the replicas to start the automatic mirroring between the primary and secondary replicas. Port 5022 is used by default for mirroring replica databases.

Configure read-only routing for clients that use the appropriate properties in a .NET connection string to establish read-only connections to a secondary replica. Read-only routing must be configured using Transact-SQL or Windows PowerShell; there is no SQL Server Management Studio UI in SQL Server 2012 for configuring read-only routing.

To access a read-only secondary replica, a client must log on to the availability group listener with a .NET connection string that includes the database property of an availability group database and the property ApplicationIntent=ReadOnly.
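The listener's routing decision can be summarized in a simplified Python model. The model routes a session to a secondary only when the session declares `ApplicationIntent=ReadOnly`, names a database, and a routing list is configured. Every other session goes to the primary replica. The model picks the first online replica in routing-list order. When no listed replica is available, it returns `None` rather than assume any fallback behavior. The replica names and function name are hypothetical, and per-replica readable-secondary settings are not modeled.

```python
from typing import Dict, List, Optional


def route_connection(application_intent: str, database: Optional[str],
                     primary: str, routing_list: List[str],
                     online: Dict[str, bool]) -> Optional[str]:
    """Return the replica a listener connection reaches in this simplified model."""
    read_only = application_intent.lower() == "readonly"
    if not read_only or not database or not routing_list:
        return primary            # read-write or unroutable sessions go to the primary
    for replica in routing_list:  # first available replica in list order
        if online.get(replica, False):
            return replica
    return None                   # no fallback is assumed in this sketch
```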

Backups from secondary replicas are allowed with limitations. Full database backups from secondary replicas are copy-only backups. Copy-only backups do not affect the log chain or reset differential backups.

A copy-only full backup cannot serve as the base for differential backups. A full database backup that is intended to begin a backup set that includes differential backups must therefore be executed on the primary replica.

Transaction log backups can be done from primary replicas or secondary replicas as long as the secondary replica is in a synchronized (synchronous commits) or synchronizing (asynchronous commits) state. Differential backups cannot be performed from secondary replicas.
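The rules for backups on secondary replicas can be expressed as a small decision function. This Python sketch encodes only the rules stated in this section: full backups on a secondary must be copy-only, log backups need a synchronized or synchronizing replica, and differential backups are not allowed on a secondary. The function name is hypothetical. The state strings are assumed to mirror the `synchronization_state_desc` values that SQL Server reports.

```python
from typing import Tuple


def secondary_backup_allowed(backup_type: str, is_primary: bool,
                             sync_state: str, copy_only: bool) -> Tuple[bool, str]:
    """Return (allowed, reason) for a backup request on an availability replica."""
    backup_type = backup_type.lower()
    if is_primary:
        return True, "any backup type can run on the primary replica"
    if backup_type == "differential":
        return False, "differential backups cannot run on a secondary replica"
    if backup_type == "full":
        if copy_only:
            return True, "copy-only full backup"
        return False, "full backups on a secondary must be copy-only"
    if backup_type == "log":
        if sync_state.upper() in ("SYNCHRONIZED", "SYNCHRONIZING"):
            return True, "log backup"
        return False, "replica must be synchronized or synchronizing"
    return False, f"unknown backup type: {backup_type}"
```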

To configure backups from secondary replicas, each replica in the availability group is assigned a backup priority. These settings only record backup preferences and do not run any backups. Backup jobs must be configured on each server, and each job runs as scheduled. If the replica on which the job runs is not the preferred replica, the job ends without running the backup. The backup is taken only on the preferred replica.
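The preference check that each job performs can be modeled as follows. In SQL Server itself, a backup job step typically calls `sys.fn_hadr_backup_is_preferred_replica` for the database and skips the backup when it returns 0. This Python sketch is a simplified stand-in for that decision, not the exact server algorithm. It assumes the four automated backup preferences (`PRIMARY`, `SECONDARY_ONLY`, `SECONDARY`, `NONE`) and a per-replica priority where 0 excludes the replica. Ties are broken by replica name only to keep the sketch deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackupReplica:
    name: str
    is_primary: bool
    online: bool
    priority: int  # 1-100 in this model; 0 means "never back up on this replica"


def preferred_backup_replica(replicas: List[BackupReplica], preference: str) -> Optional[str]:
    """Pick the replica that should take the backup under a simplified model."""
    preference = preference.lower()
    eligible = [r for r in replicas if r.online and r.priority > 0]
    primaries = [r for r in eligible if r.is_primary]
    secondaries = [r for r in eligible if not r.is_primary]
    if preference == "primary":
        pool = primaries
    elif preference == "secondary_only":
        pool = secondaries
    elif preference == "secondary":      # prefer secondaries, fall back to the primary
        pool = secondaries or primaries
    elif preference == "none":           # any replica, by priority
        pool = eligible
    else:
        raise ValueError(f"unknown backup preference: {preference}")
    if not pool:
        return None
    # Highest priority wins; ties are broken by name only for determinism.
    return sorted(pool, key=lambda r: (-r.priority, r.name))[0].name


def should_run_backup(this_replica: str, replicas: List[BackupReplica], preference: str) -> bool:
    """The check each scheduled job makes before doing any work."""
    return preferred_backup_replica(replicas, preference) == this_replica
```

The same job can be deployed unchanged to every replica. Only the job on the replica that `should_run_backup` selects proceeds to the backup step.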

For additional information, see AlwaysOn Architecture Guide: Building a High Availability and Disaster Recovery Solution using AlwaysOn Availability Groups.

SQL Server Clustered Instance and Availability Group Client Sessions

Client sessions should use the following .NET connection string properties to connect to SQL Server:

Always specify the Database=databaseName or Initial Catalog=databaseName property in the connection string, where databaseName is the name of the SQL Server database which the client session will use. These properties are equivalent.

Specify the MultiSubnetFailover=True connection string property when connecting to a SQL Server clustered instance or an availability group listener with IP addresses on multiple subnets. The client then tries all the IP addresses in parallel, which shortens connection and failover times.

Connect to an availability group listener using both the database name and the ApplicationIntent=ReadOnly property to have the client session automatically routed to a read-only secondary replica when the availability group has been configured with read-only routing.
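A client can assemble these properties programmatically. This Python sketch only builds the connection string text that a .NET SqlClient application would use. The listener name `aglisten.example.local`, the database name, and the use of Windows authentication (`Integrated Security=SSPI`) are illustrative assumptions. The `tcp:` prefix is included because MultiSubnetFailover works only over TCP.

```python
from typing import Optional


def build_connection_string(server: str, database: str, read_only: bool = False,
                            multi_subnet: bool = True, port: Optional[int] = None) -> str:
    """Build a SqlClient-style connection string with the properties listed above."""
    if not database:
        raise ValueError("always specify the database (Database= or Initial Catalog=)")
    target = f"tcp:{server},{port}" if port else f"tcp:{server}"
    parts = [
        f"Server={target}",
        f"Database={database}",
        "Integrated Security=SSPI",  # Windows authentication; SQL logins use User ID/Password
    ]
    if multi_subnet:
        parts.append("MultiSubnetFailover=True")
    if read_only:
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"
```

For example, `build_connection_string("aglisten.example.local", "Sales", read_only=True)` produces a string that a read-only routing configuration can send to a secondary replica.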

Hyper-V Replica

Hyper-V Replica provides asynchronous replication of Hyper-V virtual machines between a primary host and a secondary replica host. The replica host could be in a remote data center for disaster recovery.

Hyper-V Replica does not require any particular kind of server or storage configuration. All that is required are two Windows Server 2012 instances with the Hyper-V role installed and sufficient network bandwidth between the primary and the replica servers. The replica server can receive virtual machines from multiple Hyper-V host servers. The primary host can also host virtual machine replicas, and the replica host can also host primary virtual machines.

Domain membership is also not required. The replica server can be in a workgroup.

Hyper-V replica uses the Volume Shadow Copy service. This allows administrators to create “point in time” copies of a replicated virtual machine.

Hyper-V Replica can be used on a failover cluster or on a stand-alone server. If it is used on a cluster, the Hyper-V Replica Broker role must be configured on the cluster through Failover Cluster Manager.

Both the primary and replica servers must be configured for Hyper-V replica.

On the replica server, enable the server as a replica server in the Hyper-V settings in Hyper-V Manager. Then configure the authentication type and port it will use to receive virtual machine data and, if certificate-based authentication is used, the certificate.

Authentication using certificates can be used with the HTTPS protocol. Certificates are required on both the primary and replica servers.

On the primary server, enable replication for one or more virtual machines in Hyper-V Manager. Specify the replica server, and choose connection parameters that match those specified on the replica server, including the certificate if one was specified there.

One or more individual virtual disk files (.VHD, .VHDX) may be chosen for the primary virtual machine. Specify at least the disk(s) containing the partitions for the operating system and SQL Server database data and log files.

Choose one or more recovery points for the primary virtual machine. If something goes wrong recovering to the most recent recovery point, you can choose an earlier one.

Before incremental changes start replicating, choose an initial synchronization method. An initial copy of the virtual machine can be transmitted across the network, or you can use external media to physically move it to the replica server. Once the configuration is complete, if the initial synchronization occurs across the network, you can start synchronization immediately or schedule when it should start.
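The choice between sending the initial copy over the network and using external media is largely a bandwidth calculation. This Python sketch estimates the transfer time for the initial copy. It also checks whether the link can keep up with the changes produced in one replication interval. The efficiency factor, the change volume, and the interval are assumptions to replace with measured values. Windows Server 2012 Hyper-V Replica sends changes roughly every five minutes.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours to send size_gb (decimal GB) over a link_mbps link, given the usable fraction."""
    bits = size_gb * 8 * 1000**3
    usable_bps = link_mbps * 1000**2 * efficiency
    return bits / usable_bps / 3600


def link_keeps_up(changed_mb_per_interval: float, interval_minutes: float,
                  link_mbps: float, efficiency: float = 0.7) -> bool:
    """True if one interval's changes can be sent before the next interval starts."""
    seconds_needed = changed_mb_per_interval * 8 / (link_mbps * efficiency)
    return seconds_needed <= interval_minutes * 60
```

For example, a 500 GB virtual machine takes about 11 hours over a fully used 100 Mbps link, or about 16 hours at 70 percent efficiency. External media may then be the better option.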

Before incremental updates start, you may have to configure firewall rules for either the HTTP or HTTPS traffic.

Once the initial synchronization is complete, incremental changes are sent from the primary host servers to the replica servers periodically as changes occur.

For more information and an illustrated step-by-step deployment example, see A Practical Guide to Microsoft Hyper-V Replica, Part I.

SQL Server Transaction Log Shipping

A database that will use log shipping as a disaster recovery solution must use the full or bulk-logged recovery model. Typically, full database backups and transaction log backups are already occurring on a scheduled basis.

Choose how often transaction log backups are taken and copied to the secondary copy of the database. This decision, in effect, determines how much data you can afford to lose. If log backups are taken and copied every 15 minutes, up to 15 minutes of data can be lost.

To configure log shipping in SQL Server Management Studio, open the database's Properties from Object Explorer and use the Transaction Log Shipping page. There you configure how often the primary copy of the database is backed up and the secondary SQL Server instances to which it will be restored. These selections create a backup job on the primary SQL Server instance and copy and restore jobs on the secondary instances. The configuration can be scripted.

Make sure that these jobs don’t conflict with backup jobs already running on the primary server. Choose one method or the other, not both.

When configuring log shipping, you can optionally specify a log shipping monitor server. Backup job history and results are stored on the primary SQL Server, and copy and restore job history and results are stored on the secondary SQL Server. A monitor server collects this information in one place, and a single monitor server can receive job history and results from many log shipping configurations on many servers.

Alternatively, you can create your own jobs that run Transact-SQL scripts or Windows PowerShell cmdlets. The job steps back up the transaction log of the primary database, copy the backup file(s) from the primary server to the secondary server so that the restore reads a local file, and restore the transaction log on the secondary SQL Server. These steps can be combined into a single job or run as separate jobs on the schedule defined in the SLA between the hosted service provider and the customer.
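The three job steps can be sketched as follows. This minimal illustration only builds the Transact-SQL text and the copy source and destination. The database name, share, and folder are hypothetical. A real job would run the statements through SQL Server Agent or a client library and handle errors.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def sql_name(name: str) -> str:
    """Quote an identifier with brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def sql_literal(text: str) -> str:
    """Quote a value as a Transact-SQL Unicode string literal."""
    return "N'" + text.replace("'", "''") + "'"


def log_shipping_steps(database: str, primary_share: str, secondary_folder: str,
                       taken_at: datetime) -> list[tuple[str, str]]:
    file_name = f"{database}_{taken_at:%Y%m%d%H%M%S}.trn"
    backup_path = str(PureWindowsPath(primary_share, file_name))
    local_path = str(PureWindowsPath(secondary_folder, file_name))
    return [
        # Step 1, on the primary: back up the transaction log to the share.
        ("backup", f"BACKUP LOG {sql_name(database)} TO DISK = {sql_literal(backup_path)};"),
        # Step 2: copy the file so that the secondary restores from a local disk.
        ("copy", f"{backup_path} -> {local_path}"),
        # Step 3, on the secondary: restore but remain in the restoring state
        # (NORECOVERY) so that later log backups can still be applied.
        ("restore", f"RESTORE LOG {sql_name(database)} FROM DISK = "
                    f"{sql_literal(local_path)} WITH NORECOVERY;"),
    ]


for kind, text in log_shipping_steps("SalesDb", r"\\primary\logship", r"D:\logship",
                                     datetime(2013, 6, 1, 12, 0)):
    print(kind, text)
```

If the secondary database should be readable between restores, WITH STANDBY can be used in place of WITH NORECOVERY.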

The restore scripts must restore all transaction log backups in the correct order. The first and last log sequence numbers in each backup (the first_lsn and last_lsn columns) can be retrieved from the msdb.dbo.backupset table on the primary SQL Server. These columns establish the restore order more reliably than, for example, the timestamp on the backup file.
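The ordering rule can be checked mechanically. Within an unbroken log chain, each log backup's first_lsn equals the previous backup's last_lsn. The next backup to restore is the one whose LSN range contains the LSN the secondary has reached. The sketch below is a simplified check. It assumes the rows have already been read from msdb (the query in the comment is one way to do that) and that LSNs are held as Python integers. The class and function names are hypothetical. A gap usually means a log backup was taken outside the log shipping jobs.

```python
from dataclasses import dataclass

# One way to fetch the rows on the primary (run with any SQL Server client):
#   SELECT bs.first_lsn, bs.last_lsn, bmf.physical_device_name
#   FROM msdb.dbo.backupset AS bs
#   JOIN msdb.dbo.backupmediafamily AS bmf ON bmf.media_set_id = bs.media_set_id
#   WHERE bs.database_name = N'SalesDb' AND bs.type = 'L'
#   ORDER BY bs.first_lsn;


@dataclass(frozen=True)
class LogBackup:
    first_lsn: int
    last_lsn: int
    path: str


def restore_order(backups: list[LogBackup], current_lsn: int) -> list[LogBackup]:
    """Return the log backups still to restore, in LSN order.

    `current_lsn` is the LSN the secondary database has been restored to
    (for example, the last_lsn of the most recent backup applied).
    Raises ValueError if the chain has a gap.
    """
    chain = []
    expected = current_lsn
    for backup in sorted(backups, key=lambda b: b.first_lsn):
        if backup.last_lsn <= expected:
            continue  # already covered by what has been restored
        if backup.first_lsn > expected:
            raise ValueError(
                f"log chain broken: no backup covers LSN {expected}; "
                f"next backup starts at {backup.first_lsn}")
        chain.append(backup)
        expected = backup.last_lsn
    return chain


files = [LogBackup(300, 400, "c.trn"), LogBackup(100, 200, "a.trn"), LogBackup(200, 300, "b.trn")]
print([b.path for b in restore_order(files, current_lsn=200)])  # ['b.trn', 'c.trn']
```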

SQL Server Transactional and Peer-to-Peer Replication

Transactional and peer-to-peer replication can both be configured through the SQL Server Management Studio client tool. You can also configure replication by using a set of Transact-SQL stored procedures.

Enable the database for transactional publishing. This is required before creating a publication. Partially contained databases, such as those that might be used for database as a service, cannot use replication and therefore cannot be published.

Select which database objects you would like to publish. Tables are the most commonly published objects, but indexed views, stored procedures, and other types of database objects can also be articles in the publication.

Determine which SQL Server should host the distribution database. If there are multiple publishers, each should have its own distribution database to avoid a single point of failure.

Plan whether the Distribution Agent will run as a job on the distribution server (a push subscription) or as a job on each subscribing server (a pull subscription). Push subscriptions are the default.

Create the distribution database, if necessary. Create a directory and file share (the snapshot folder) to hold the files used for the initial synchronization of the subscribing servers. When a snapshot is used, the Snapshot Agent writes the schema and data of the published articles to this folder, and they are then applied to each subscriber.

You will need to specify the accounts under which the agents run and connect to the network. These are the Snapshot Agent, which generates the initial snapshot, and the Distribution Agent, which applies the snapshot and then replicates changes from the distribution database to each subscriber. These accounts need the appropriate read and write permissions on the snapshot file share and in the distribution server file system. The service account that the SQL Server service uses to log on to the network also requires these permissions.

For peer-to-peer replication, set conflict detection as desired; it is enabled by default. By default, a conflict detected at a peer is treated as a critical error that causes the Distribution Agent to fail.
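The default behavior can be illustrated with a deliberately simplified model. It is not SQL Server's implementation. The model assumes three things. First, each row carries the ID of the peer that last changed it and a version number. Second, an incoming change conflicts when the row has changed since the version the change was based on. Third, when the option to continue on conflict (p2p_continue_onconflict) is enabled, the change from the peer with the higher originator ID wins. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowVersion:
    value: str
    originator_id: int  # peer that made this change
    version: int        # simplified per-row version counter


class PeerConflictError(RuntimeError):
    """Stands in for the Distribution Agent failing on a detected conflict."""


def apply_peer_change(table: dict, key, incoming: RowVersion, based_on_version: int,
                      continue_on_conflict: bool = False) -> RowVersion:
    current = table.get(key)
    if current is None or current.version == based_on_version:
        table[key] = incoming  # no concurrent change on this peer: apply normally
        return incoming
    # The row changed on this peer after the incoming change was made elsewhere.
    if not continue_on_conflict:
        raise PeerConflictError(f"conflict on row {key!r}")
    # Simplified resolution: the higher originator ID wins.
    winner = incoming if incoming.originator_id > current.originator_id else current
    table[key] = winner
    return winner


rows = {1: RowVersion("v1", originator_id=1, version=1)}
apply_peer_change(rows, 1, RowVersion("from peer 2", 2, 2), based_on_version=1)
try:
    apply_peer_change(rows, 1, RowVersion("from peer 3", 3, 2), based_on_version=1)
except PeerConflictError as err:
    print(err)  # conflict on row 1
```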

SQL Server Management Studio contains a sophisticated user interface for creating and managing replication components and for monitoring replication jobs. From the settings you specify for replication properties, publications, and articles, SQL Server Management Studio can generate a script that you can modify and reuse for other publishers and subscribers. Behind the user interface, all replication properties are set through replication stored procedures.

Create the publication and articles. Specify the initial synchronization method and the location where backup files should be stored, if any.
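Because the wizards ultimately call replication stored procedures, a reusable script can also be generated programmatically. The sketch below only assembles Transact-SQL text for a push subscription initialized from a backup. The database, publication, table, subscriber, and backup path names are hypothetical. The sketch assumes the distributor is already configured. It also omits sp_addpushsubscription_agent, which creates the Distribution Agent job for a push subscription, and the remaining peer-to-peer topology steps.

```python
def lit(text: str) -> str:
    """Transact-SQL Unicode string literal."""
    return "N'" + text.replace("'", "''") + "'"


def publication_script(db: str, publication: str, tables: list[tuple[str, str]],
                       subscriber: str, subscriber_db: str, backup_file: str,
                       peer_to_peer: bool = False) -> list[str]:
    pub = lit(publication)
    pub_options = [f"@publication = {pub}", "@status = N'active'",
                   "@allow_initialize_from_backup = N'true'"]
    if peer_to_peer:
        pub_options.append("@enabled_for_p2p = N'true'")
    script = [
        # 1. Enable the database for transactional publishing.
        f"EXEC sp_replicationdboption @dbname = {lit(db)}, @optname = N'publish', @value = N'true';",
        # Publication procedures run in the publication database.
        "USE [" + db.replace("]", "]]") + "];",
        # 2. Create the publication.
        "EXEC sp_addpublication " + ", ".join(pub_options) + ";",
    ]
    # 3. Add each table as an article.
    for schema, table in tables:
        script.append(
            f"EXEC sp_addarticle @publication = {pub}, @article = {lit(table)}, "
            f"@source_owner = {lit(schema)}, @source_object = {lit(table)};")
    # 4. Add a push subscription initialized from a backup already restored on the subscriber.
    script.append(
        f"EXEC sp_addsubscription @publication = {pub}, @subscriber = {lit(subscriber)}, "
        f"@destination_db = {lit(subscriber_db)}, @subscription_type = N'push', "
        f"@sync_type = N'initialize with backup', @backupdevicetype = N'disk', "
        f"@backupdevicename = {lit(backup_file)};")
    return script


for statement in publication_script("SalesDb", "SalesPub", [("dbo", "Orders")],
                                    "SUBSCRIBER1", "SalesDb", r"\\backup\share\SalesDb.bak"):
    print(statement)
```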

If a snapshot is used to synchronize the publisher and subscribers, the Snapshot Agent writes the snapshot to the designated file share, and the Distribution Agent applies it to each subscriber. If a backup is used instead, a full backup of the published database is restored on each subscriber before the subscription is created. Once the initial synchronization is complete, the Log Reader Agent moves each transaction that changes an article from the transaction log of the published database to the distribution database. The Distribution Agent reads the commands in the distribution database and executes them on the subscribing database.
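The flow of changes, and what is still in flight at any moment, can be pictured with a toy model. It is only a sketch. Real agents move commands in batches, and the distribution database retains commands until cleanup. Even so, the model shows why transactions that have committed at the publisher but have not yet been applied at the subscriber are the ones at risk if the publisher fails. The class and method names are hypothetical.

```python
from collections import deque


class ReplicationPipeline:
    """Toy model: publisher log -> Log Reader Agent -> distribution DB -> Distribution Agent -> subscriber."""

    def __init__(self) -> None:
        self.publisher_log: list[str] = []        # committed transactions, in commit order
        self.log_reader_position = 0              # log entries already read by the Log Reader Agent
        self.distribution: deque[str] = deque()   # commands waiting in the distribution database
        self.subscriber: list[str] = []           # transactions executed on the subscriber

    def commit(self, txn: str) -> None:
        self.publisher_log.append(txn)

    def run_log_reader(self) -> None:
        # Move newly committed transactions from the publisher log to the distribution database.
        while self.log_reader_position < len(self.publisher_log):
            self.distribution.append(self.publisher_log[self.log_reader_position])
            self.log_reader_position += 1

    def run_distribution_agent(self) -> None:
        # Execute queued commands on the subscriber in commit order.
        while self.distribution:
            self.subscriber.append(self.distribution.popleft())

    def not_yet_at_subscriber(self) -> list[str]:
        # Delivery is first in, first out, so the subscriber always holds a prefix of the log.
        return self.publisher_log[len(self.subscriber):]


p = ReplicationPipeline()
p.commit("T1")
p.commit("T2")
p.run_log_reader()
p.commit("T3")  # committed, but not yet read by the Log Reader Agent
p.run_distribution_agent()
print(p.subscriber)               # ['T1', 'T2']
print(p.not_yet_at_subscriber())  # ['T3']
```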

Recovery

The following are considerations for recovering SQL Server databases after a failure.

Windows Failover Cluster Instances

If the primary node of a cluster hosting virtual machines or SQL Server instances fails, a failover to another node occurs automatically. The cluster quorum determines the health of the cluster and when to fail over. Quorum is a vote of the nodes and, possibly, of a file share witness or a disk witness (a disk that holds a copy of the cluster database).

Use Failover Cluster Manager to determine the cause of the failure. Verify the quorum is still intact. If the cluster is still operational, allow the cluster to continue on the new primary node or fail it over manually to a different node that will sustain the cluster resources until the original primary node can be brought back online.

If a SQL Server failover cluster instance fails, use Failover Cluster Manager to evict the node that failed while the cause is determined and repaired.

If an entire data center is lost, the cluster may lose quorum, and the entire cluster may go offline. As a last resort, you can force the cluster service to start on a surviving node without quorum (for example, with net start clussvc /forcequorum). It is always better to keep the cluster intact if possible. Once the cluster service has started with the force quorum switch, verify that SQL Server has started or that the Hyper-V virtual machines are running. After recovery, these resources should once again be usable.
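The majority rule behind quorum can be shown with simple vote counting. This sketch assumes a node-majority style calculation, in which the cluster keeps quorum only while more than half of all configured votes (nodes plus any witness) are online. It ignores the dynamic quorum adjustments that Windows Server 2012 can make. The five-vote, two-site layout is hypothetical.

```python
def has_quorum(total_votes: int, votes_online: int) -> bool:
    """True if strictly more than half of the configured votes are online."""
    if not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and total_votes")
    return 2 * votes_online > total_votes


# Hypothetical cluster: 3 voting nodes in the primary data center, 2 in the DR data center.
primary_site, dr_site = 3, 2
total = primary_site + dr_site
print(has_quorum(total, primary_site))  # DR data center lost: True
print(has_quorum(total, dr_site))       # primary data center lost: False, so force quorum is needed
```

This is why losing the data center that holds the majority of votes leaves the surviving site unable to run the cluster until quorum is forced.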

Availability Groups

If the primary replica of an availability group fails and a secondary replica is configured for synchronous commit with automatic failover, open the Availability Group Dashboard for that replica to get a big-picture view of the availability group. You can run SQL Server Management Studio from any client on the network and connect to the node in Object Explorer. You should see that automatic failover completes within a minute or so, and that the former secondary replica is now the primary replica and is accepting client connections.

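If no secondary replica is configured for automatic failover, manually fail over the availability group to a secondary replica. You can use the Availability Group Failover Wizard, which can be started from the Availability Group Dashboard in SQL Server Management Studio, or you can use Transact-SQL or Windows PowerShell commands. This is possible as long as the cluster still has quorum and is still up. A failover without data loss requires a synchronous-commit secondary replica that is synchronized. A failover to any other secondary replica is a forced failover, which can lose data.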

If the cluster has failed but cluster services are running on a remaining target node, attempt a failover of the availability group to that node by running the Availability Group Failover Wizard or the corresponding Transact-SQL or Windows PowerShell commands.

If the failover started by the Availability Group Failover Wizard fails because of a cluster error, start the cluster service on that node with the force quorum switch.

After the cluster service starts, fail over the availability group to this node by using the Availability Group Failover Wizard. This is typically a forced failover: any transactions that committed on the primary replica but were not yet written to this secondary replica could be lost.
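The choice between a planned failover and a forced failover can be summarized in a small decision function. The sketch only builds the Transact-SQL text to run on the target secondary replica. The state strings mirror the availability_mode_desc and synchronization_state_desc values that SQL Server reports for availability replicas and their databases. The function and availability group names are hypothetical. The Windows PowerShell equivalent is Switch-SqlAvailabilityGroup, with -AllowDataLoss for a forced failover.

```python
def failover_statement(ag_name: str, availability_mode: str, synchronization_state: str,
                       cluster_has_quorum: bool) -> str:
    """Return the statement to run on the target secondary replica.

    Simplification: a real check would require every database in the group to be
    SYNCHRONIZED on the target, not a single state value.
    """
    if not cluster_has_quorum:
        raise RuntimeError("start the cluster service with the force quorum switch first")
    name = "[" + ag_name.replace("]", "]]") + "]"
    if availability_mode == "SYNCHRONOUS_COMMIT" and synchronization_state == "SYNCHRONIZED":
        # Planned manual failover: no data loss.
        return f"ALTER AVAILABILITY GROUP {name} FAILOVER;"
    # Any other target requires a forced failover, which can lose data.
    return f"ALTER AVAILABILITY GROUP {name} FORCE_FAILOVER_ALLOW_DATA_LOSS;"


print(failover_statement("HostedAG", "SYNCHRONOUS_COMMIT", "SYNCHRONIZED", True))
print(failover_statement("HostedAG", "ASYNCHRONOUS_COMMIT", "SYNCHRONIZING", True))
```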

An availability group listener that is properly configured with an IP address for each subnet that hosts a replica should now direct client sessions to the new primary replica automatically. If client sessions are not connecting, determine whether clients can connect by specifying the computer name of the cluster node directly.
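Clients normally reach the new primary replica through the listener name. Testing a direct connection to a node name helps isolate listener or DNS problems. The sketch below only builds ODBC connection strings. It assumes the SQL Server Native Client 11.0 ODBC driver, Windows authentication, and the default port 1433. The listener, node, and database names are hypothetical. MultiSubnetFailover=Yes lets the driver try all of a multi-subnet listener's IP addresses in parallel, which shortens reconnection after a failover.

```python
def odbc_connection_string(server: str, database: str, multi_subnet_failover: bool = True,
                           driver: str = "SQL Server Native Client 11.0", port: int = 1433) -> str:
    parts = [
        f"Driver={{{driver}}}",
        f"Server=tcp:{server},{port}",
        f"Database={database}",
        "Trusted_Connection=yes",  # Windows authentication
    ]
    if multi_subnet_failover:
        parts.append("MultiSubnetFailover=Yes")
    return ";".join(parts) + ";"


listener = odbc_connection_string("hosted-ag-listener", "SalesDb")           # normal client path
direct = odbc_connection_string("SQLNODE2", "SalesDb", multi_subnet_failover=False)  # troubleshooting
print(listener)
print(direct)
```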

When the repairs to the nodes are complete and the availability group is fully back online, you can manually fail back to the original primary node.

Hyper-V Replica

If the primary server is lost, you can manually fail over to the replica virtual machine and start it by using Hyper-V Manager on the replica server. You can also fail over and start the virtual machine by using Windows PowerShell cmdlets.
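An unplanned failover on the replica server follows a fixed sequence. You fail over to a recovery point, start the virtual machine, and, once it is verified, complete the failover. The sketch below only builds the Windows PowerShell command lines (Start-VMFailover, Start-VM, and Complete-VMFailover from the Hyper-V module) for a hypothetical virtual machine name. Run them on the replica host, and check the cmdlet help for options such as choosing an earlier recovery point.

```python
def unplanned_failover_commands(vm_name: str) -> list[str]:
    """Build the PowerShell commands for an unplanned Hyper-V Replica failover."""
    quoted = "'" + vm_name.replace("'", "''") + "'"  # PowerShell single-quoted string
    return [
        f"Start-VMFailover -VMName {quoted}",     # fail over (by default, to the latest recovery point)
        f"Start-VM -Name {quoted}",               # boot the replica virtual machine
        # Run only after verifying the virtual machine and its SQL Server databases:
        f"Complete-VMFailover -VMName {quoted}",  # commit; other recovery points are removed
    ]


for command in unplanned_failover_commands("SQLVM01"):
    print(command)
```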

SQL Server Transaction Log Shipping

To bring a secondary log shipping database online for writes, first determine whether the primary server is still available. If it is, determine whether the volume holding the database's transaction log file is still available, even if the volume holding the data files has been lost and the database is no longer usable by clients.

If the transaction log is still available, take one final backup of the transaction log (a tail-log backup, using the NO_TRUNCATE option if the data files are damaged or missing) to capture all committed transactions up to the time of the failure.

Verify that all available backups prior to the final backup have been restored to the secondary database, and then restore the final backup WITH RECOVERY. The database should now be writable and contain all committed transactions up to the time of the failure.
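The recovery sequence can be written out as Transact-SQL. This sketch only generates the statements. The database name and file paths are hypothetical. It assumes the pending log backups and the tail-log backup have already been copied to the secondary server. NO_TRUNCATE allows the tail of the log to be backed up even when the data files are damaged or missing.

```python
def sql_name(name: str) -> str:
    """Quote an identifier with brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def sql_literal(text: str) -> str:
    """Quote a value as a Transact-SQL Unicode string literal."""
    return "N'" + text.replace("'", "''") + "'"


def tail_log_backup(database: str, path: str) -> str:
    # Run on the primary server; NO_TRUNCATE works even if the data files are lost.
    return f"BACKUP LOG {sql_name(database)} TO DISK = {sql_literal(path)} WITH NO_TRUNCATE;"


def bring_secondary_online(database: str, pending: list[str], tail: str) -> list[str]:
    # Run on the secondary server, oldest pending backup first.
    steps = [f"RESTORE LOG {sql_name(database)} FROM DISK = {sql_literal(p)} WITH NORECOVERY;"
             for p in pending]
    # Restore the tail-log backup last, WITH RECOVERY, which makes the database writable.
    steps.append(f"RESTORE LOG {sql_name(database)} FROM DISK = {sql_literal(tail)} WITH RECOVERY;")
    return steps


print(tail_log_backup("SalesDb", r"\\drserver\logship\SalesDb_tail.trn"))
for statement in bring_secondary_online("SalesDb", [r"D:\logship\a.trn"], r"D:\logship\SalesDb_tail.trn"):
    print(statement)
```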

After the restore WITH RECOVERY is complete, clients need to be redirected to connect to the new primary server.

SQL Server Transactional and Peer-to-Peer Replication

If the Log Reader Agent job runs continuously, committed transactions should reach the distribution database shortly after they commit. If the distribution database is still available, the Distribution Agent job should deliver to the secondary every transaction that reached the distribution database before the failure. Because replication is asynchronous, some data loss is possible, although it should be minimal.

There is no database recovery to perform, because the subscription database is already writable. Clients can be redirected to the new server and can start writing to the new primary database immediately.

Summary

Hosted service providers are giving customers more choices. Hosted service providers have a wide variety of business continuity options to deploy across their own data centers or together with the customer's hybrid IT organization. This reference architecture document details the steps to plan, deploy, and recover with many of the business continuity features available to hosted service providers hosting SQL Server 2012. High availability features allow near-continuous operation with no loss of data. Disaster recovery options allow the customer or hosted service provider to recover from the loss of an entire data center as quickly as possible and with minimal data loss. Together, these flexible, easy-to-configure features allow the hosted service provider and the customer to meet the most demanding business continuity needs.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. Microsoft makes no warranties, express or implied, in this document.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2013 Microsoft Corporation. All rights reserved.

The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.

Microsoft, Active Directory, Hyper-V, SQL Server, Windows, Windows PowerShell, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
