Maximizing SharePoint Availability Whitepaper v1.1

4/2018

This technical whitepaper describes how to configure High Availability and Disaster Recovery for SharePoint Server at the database level, along with best practices for implementing an AlwaysOn Availability Group in SQL Server.

Fadi Abdulwahab

CSSLP, MCC, MCITP

Author

Author of a SharePoint 2013 book, with a focus on building secure applications.

Delivered many projects with Microsoft technologies since 2006 for banks, universities, and ministries.

Experienced in SharePoint Administration, Infrastructure, Development, Governance, and Disaster Recovery.

Specialties: SharePoint Server, SharePoint Search, ASP.NET/C#, OWASP Top 10, SQL Server Administration and High Availability Solutions.

Recognized as Microsoft Community Contributor in July 2013

(ISC)² CSSLP® (Certified Secure Software Lifecycle Professional) in July 2015

AWS Solutions Architect – Associate certification in April 2017

AWS Solutions Architect – Professional certification in April 2018

Maximizing SharePoint Security v2.0 whitepaper

https://gallery.technet.microsoft.com/Maximizing-SharePoint-cf7f7efc

My Blog:

https://fabdulwahab.com

My Twitter Account:

https://twitter.com/fadi_abdulwahab

My LinkedIn account:

https://www.linkedin.com/in/fadiabdulwahab

My MSDN Profile:

https://social.msdn.microsoft.com/profile/fadi%20abdulwahab/

My SharePoint Book (Advanced Topics in SharePoint 2013 in Arabic language):

http://www.neelwafurat.com/itempage.aspx?id=lbb229815-208246&search=books


Disclaimer

This document is provided "as is"; test any changes before going live.

Product or company names mentioned in this document may be the trademarks of their respective owners.

You can use this whitepaper for your environments and other needs.

Fadi Abdulwahab © 2018, all rights reserved.

Your feedback is very important to me. If you have comments or new points, please send them to me at [email protected]

Table of Contents

Author
Disclaimer
Why Maximizing SharePoint Availability
Introduction
Terminologies
    High Availability
    Disaster Recovery
    Recovery Time Objective (RTO)
    Recovery Point Objective (RPO)
    Replica
Solutions in SQL Server
    Backup and Restore
    Log Shipping
    Replication
    Mirroring
    Failover Cluster
    AlwaysOn Availability Group
SharePoint Topology
Prerequisites
    SQL Client Alias
    SQL Server instances
AlwaysOn Configuration
    Configuring Windows Server Failover Clustering
    Enabling AlwaysOn High Availability
    Creating AlwaysOn High Availability
    Testing AlwaysOn Availability Group
SQL Server Versions
Log and Backup Management
Ports
SQL Authentication
Cluster Quorum Models
Disaster Recovery Plan
    Recovering from a Disaster
    Revert back to Primary data center
General SharePoint Considerations
References

Why Maximizing SharePoint Availability

Availability is about doing your best to create reliable application components and infrastructure, accepting the reality that your application will probably have at least some failures, and designing in quick recovery technology to minimize (or even eliminate) downtime.

"Maximizing" because availability is a matter of degree: it is measured by the number of nines (for example, 99.999%).

The idea for this document came from the slides about SQL Server AlwaysOn that I published five years ago on SlideShare: http://blogs.msdn.com/b/fabdulwahab/archive/2013/04/04/alwayson-in-sql-server-2012.aspx

In this version I try to cover the most fundamental steps and configurations needed to implement an AlwaysOn Availability Group as a High Availability and Disaster Recovery solution for SharePoint database servers.

Finally, treat High Availability as a continuous process; it is not a matter of "set and forget".


Introduction

Maximizing SharePoint availability is a big task that spans many layers of software and hardware. In this document we focus on maximizing the availability of the SQL Server databases using the latest Microsoft technologies.

Microsoft provides many solutions for enabling High Availability (HA) and Disaster Recovery (DR) for a SharePoint farm. Each solution has pros and cons that depend on your business requirements, Recovery Time Objective (RTO), Recovery Point Objective (RPO), and budget; in short, on your Service Level Agreement (SLA).

Terminologies

Before we delve into this document, it helps to understand the basic meaning of the terms below.

High Availability

High Availability (HA) refers to a system or component that is continuously operational for a desirably long period of time, in order to minimize or mitigate the impact of downtime.

Disaster Recovery

Disaster Recovery (DR) is a process that you can use to help recover information systems and data if a disaster occurs.

Recovery Time Objective (RTO)

This is the target duration of an outage, that is, how long the system may stay down. The initial goal is to get the system back online in at least a read-only capacity to facilitate investigation of the failure.

Recovery Point Objective (RPO)

This is often referred to as a measure of acceptable data loss. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure.

Replica

Two types of replicas exist: a single primary replica, which hosts the primary databases, and secondary replicas, each of which hosts a set of secondary databases and serves as a potential failover target for the availability group.

Solutions in SQL Server

Microsoft provides many solutions to enable High Availability (HA) or Disaster Recovery (DR) for SQL Server environments. At a high level, these options are:

Backup and Restore

Log Shipping

Replication

Mirroring

Failover Cluster

AlwaysOn Availability Group

There are also solutions outside the SQL Server product, such as SAN replication and Hyper-V replication, but Microsoft does not support these types of replication for SharePoint because they may cause consistency issues, especially for the search index and timer jobs.

The only exception for virtual machine replication is Azure Site Recovery, which supports replicating virtual machines into Azure for Disaster Recovery purposes. You can find more information at these links:

https://docs.microsoft.com/en-us/azure/site-recovery/site-recovery-sharepoint

https://docs.microsoft.com/en-us/office365/enterprise/sharepoint-server-2013-disaster-recovery-in-microsoft-azure

Backup and Restore

The purpose of creating SQL Server backups is to enable you to recover a damaged database. We can summarize the pros and cons in the following points:

Data loss is possible
Inexpensive solution for DR
It doesn't provide HA
Protection at the database level

Log Shipping

SQL Server log shipping allows you to automatically send transaction log backups from a primary database on a primary server instance to one or more secondary databases on separate secondary server instances. We can summarize the pros and cons in the following points:

Can be HA or DR
No automatic failover
Inexpensive solution
Data loss is possible

Replication

Replication is a set of technologies for copying and distributing data and database objects from one database to another and then synchronizing between the databases to maintain consistency. We can summarize the pros and cons in the following points:

Inexpensive solution
Can be used for load balancing
Can be HA or DR

Mirroring

Database mirroring is primarily a software solution for increasing database availability by replicating the data between a primary and a secondary server. We can summarize the pros and cons in the following points:

Can be HA or DR
Limited to two servers
It supports automatic failover, but it needs a witness server for this purpose
Replaced with AlwaysOn in SQL Server 2012 and higher releases

Note:

Not every database can be configured with a failover database server from Central Administration. For those databases (such as the configuration database), you can set this property from PowerShell using the commands below:

# Find the SharePoint database by name ("SomeSharePointDB" is a placeholder)
$database = Get-SPDatabase | Where-Object { $_.Name -eq "SomeSharePointDB" }
# Register the mirror server ("SQLSecond") as the failover instance
$database.AddFailoverServiceInstance("SQLSecond")
# Persist the change
$database.Update()

Failover Cluster

A Windows Server Failover Clustering (WSFC) cluster is a group of independent servers that work together to increase the availability of applications and services. SQL Server takes advantage of WSFC services to provide local high availability through redundancy at the server-instance level. We can summarize the pros and cons in the following points:

HA
Expensive solution
Protection at the instance level
It supports automatic failover

AlwaysOn Availability Group

This is the latest solution from Microsoft, available in SQL Server 2012 and higher releases. It combines the features of Database Mirroring and Failover Cluster with many enhancements and new features. We can summarize the pros and cons in the following points:

Can have up to 4 replicas or more, depending on the SQL Server version (Mirroring allows only one primary and one secondary server)
Can be HA and DR (HA = sync mode, DR = async mode)
No need for SAN storage (a Failover Cluster requires SAN storage or other network disks)
Can be deployed across geographic locations (as with Failover Cluster, but with more enhancements)
No need for a witness server
You can use an Availability Group Listener (a virtual IP and name, as in Failover Cluster)
Replica servers can be accessed for backup, reporting, and similar operations
Supports automatic failover (in sync mode)
Needs SQL Server Enterprise edition

So you can use one solution to provide both HA and DR instead of combining several solutions as in previous versions, for example Database Mirroring (or log shipping) for remote DR and a Failover Cluster for HA.

Terms

AlwaysOn Failover Cluster Instance (FCI) = SQL Server Failover Cluster Instance
AlwaysOn Availability Group = like Database Mirroring in older versions, but with many enhancements

The following table, from a Microsoft whitepaper, shows the differences between these solutions (based on SQL Server 2012):

High Availability and Disaster Recovery SQL Server Solutions | Potential Data Loss (RPO) | Potential Recovery Time (RTO) | Automatic Failover | Readable Secondaries
AlwaysOn Availability Group - synchronous-commit | Zero | Seconds | Yes | 0 - 2
AlwaysOn Availability Group - asynchronous-commit | Seconds | Minutes | No | 0 - 4
AlwaysOn Failover Cluster Instance | NA | Seconds-to-minutes | Yes | NA
Database Mirroring - High-safety (sync + witness) | Zero | Seconds | Yes | NA
Database Mirroring - High-performance (async) | Seconds | Minutes | No | NA
Log Shipping | Minutes | Minutes-to-hours | No | Not during a restore
Backup, Copy, Restore | Hours | Hours-to-days | No | Not during a restore

SharePoint Topology

Let us assume that we have the following topology design for the SharePoint farm and need to enable high availability for the SharePoint databases.

In this diagram we have redundant servers for both SharePoint and SQL Server to provide high availability: if one of these servers goes down, the second can continue to serve user requests.

SQL Server 01 is the primary server, which the SharePoint servers connect to directly or through the Availability Group listener. Any transaction that runs on SQL Server 01 (primary server) must also be committed on SQL Server 02 (secondary server) before the success message is returned to the user (this is called sync mode).

Note

In async commit mode, a transaction is committed on SQL Server 01 (primary server) and the success message is returned to the user; the changes are written to SQL Server 02 (secondary server) afterwards. Data can therefore be lost if the connection drops after the changes are committed on SQL Server 01. This mode is nevertheless better suited to DR because it avoids the latency and performance cost of waiting for a remote secondary.
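As a rough illustration of the two commit modes, the T-SQL below switches a replica from synchronous to asynchronous commit and then lists the mode of every replica. It is only a sketch: the availability group name [SPAG] is an assumption made for this example, and SQL2 is the secondary server from the lab.

```sql
-- Sketch only. [SPAG] is a hypothetical availability group name.
-- Run on the primary replica.

-- An asynchronous-commit replica cannot use automatic failover,
-- so switch its failover mode to manual first.
ALTER AVAILABILITY GROUP [SPAG]
    MODIFY REPLICA ON N'SQL2' WITH (FAILOVER_MODE = MANUAL);

-- Now change the commit mode (typical for a remote DR replica).
ALTER AVAILABILITY GROUP [SPAG]
    MODIFY REPLICA ON N'SQL2' WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);

-- Review the commit and failover mode of each replica.
SELECT replica_server_name,
       availability_mode_desc,
       failover_mode_desc
FROM sys.availability_replicas;
```

Keeping the local secondary in synchronous commit and only the remote one in asynchronous commit is the usual way to get HA and DR from the same availability group.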

Prerequisites

For this lab, I used Windows Azure to build the SharePoint farm, which has the following virtual machines:

SP1 = SharePoint Server 2016 beta 2 Server
DC1 = Active Directory Domain Server
SQL1 = SQL Server 2014 SP1 Enterprise (Standalone) Server
o The SharePoint farm connects directly to this instance
SQL2 = SQL Server 2014 SP1 Enterprise (Standalone) Server

SQL Client Alias

It's highly recommended to use a SQL client alias so that future changes are easy if you rename the SQL Server instance or replace the SQL Server itself.

Steps

Open the Run command and type cliconfg
Go to the Alias tab
Fill in the fields (SQLDB is a fake name)

Note

Also create the same alias with C:\Windows\SysWOW64\cliconfg.exe. On a 64-bit server this is the 32-bit version of the tool, so the alias must be defined in both places for 64-bit and 32-bit clients to resolve it.

During the SharePoint configuration, enter the SQL alias name instead of the real SQL Server name

As the image below shows, the SharePoint farm is configured with the SQL alias name

Notes

Don't forget to configure these settings in order to install SharePoint successfully:

1. Grant the SharePoint farm service account these SQL Server roles (dbcreator, public, securityadmin)
2. Set Max Degree of Parallelism = 1 in the instance-level settings

Again, don't forget to repeat the same steps on the secondary server.
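The two settings above can also be scripted, which makes it easier to apply them identically on both SQL Server nodes. The following is a minimal sketch; the account name CONTOSO\sp_farm is a made-up placeholder for your farm service account.

```sql
-- Sketch only. CONTOSO\sp_farm is a hypothetical farm service account.
-- Run on every SQL Server node (SQL1 and SQL2 in this lab).

CREATE LOGIN [CONTOSO\sp_farm] FROM WINDOWS;

-- Every login is a member of the public role automatically.
ALTER SERVER ROLE [dbcreator]     ADD MEMBER [CONTOSO\sp_farm];
ALTER SERVER ROLE [securityadmin] ADD MEMBER [CONTOSO\sp_farm];

-- 'max degree of parallelism' is an advanced option,
-- so advanced options must be visible before it can be set.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max degree of parallelism', 1;
RECONFIGURE;
```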

SQL Server instances

In our lab, we have two SQL Server instances installed as standalone servers (for SharePoint, you only need the SQL Server service).

These are the databases created for the SharePoint farm

Note

The secondary server has no databases.

If you check the AlwaysOn setting, you will get the following message:

So it's time to configure an AlwaysOn Availability Group in SQL Server for High Availability.

AlwaysOn Configuration

To configure an AlwaysOn Availability Group, we will go through these steps:

1. Configuring Windows Server Failover Clustering
2. Enabling AlwaysOn High Availability
3. Creating AlwaysOn High Availability
4. Testing AlwaysOn Availability Group

Configuring Windows Server Failover Clustering

Start by opening Server Manager and click Add Roles and Features
Click Next
Keep the default option and click Next
Select the first server (SQL Server 01) and click Next
Don't select anything here and click Next
Select the Failover Clustering feature and click Next
Click Add Features
Click Install
Repeat the same steps on the secondary server (SQL Server 02).
On the primary server, create the new cluster by opening Failover Cluster Manager
Select Create Cluster…

Note

Run it with a domain administrator account because it creates a computer object in Active Directory and an (A) record in the DNS server.

Click Next
Select the SQL Server nodes to be added to this cluster
Select the first option to run the validation tests (they check prerequisites such as network configuration and storage)
Click Next
Select the second option to customize the tests so that they cover the AlwaysOn requirements only
Don't select the Storage option (because AlwaysOn does not use SAN storage)
Click Next
Review the results, fix any errors or warnings, and then click Finish
After closing the validation process, select the second option and click Next
Enter the cluster name and IP address (this creates an (A) record in the DNS server)
Click Next
Click Finish
The two nodes have been added to the Windows cluster
The next step is to configure the quorum: click the cluster name and select the below option
Click Next
Select the last option and click Next
Select the two nodes and click Next
Select the second option and click Next
In this case, I will use a file share witness on the Active Directory server
Click Next
Click Finish

Note

The Windows cluster service account must have write permission on the file share witness.

Enabling AlwaysOn High Availability

Open SQL Server Configuration Manager, right-click the SQL Server service, and select Properties
Enable AlwaysOn Availability Groups and click OK
Restart the SQL Server service
Repeat the same steps on the secondary server.
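After the service restart, a quick query can confirm that the feature is really enabled on each node. This is a small verification sketch rather than part of the original procedure.

```sql
-- Run on each node after restarting the SQL Server service.
SELECT SERVERPROPERTY('IsHadrEnabled')  AS IsHadrEnabled,   -- 1 = AlwaysOn Availability Groups enabled
       SERVERPROPERTY('Edition')        AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion;
```

If IsHadrEnabled is still 0, the setting was not applied or the service has not been restarted yet.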

Creating AlwaysOn High Availability

Open SQL Server Management Studio. Before creating AlwaysOn High Availability, make sure that all databases that need to be replicated use the Full recovery model (right-click the database, choose Properties, and then go to the Options tab)

Also make sure that these databases have a full backup

Choose Full for backup type
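The same two prerequisites can be met with T-SQL, which is convenient when a farm has many databases. The sketch below assumes a database called WSS_Content and a backup share \\DC1\Backup; both names are placeholders, not values from the lab.

```sql
-- Sketch only. WSS_Content and \\DC1\Backup are hypothetical names.
-- Run on the primary server (SQL1) for each database to be replicated.

ALTER DATABASE [WSS_Content] SET RECOVERY FULL;

BACKUP DATABASE [WSS_Content]
    TO DISK = N'\\DC1\Backup\WSS_Content_full.bak'
    WITH INIT, CHECKSUM;

-- List the recovery model of all user databases to spot any that are not FULL.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE database_id > 4;   -- skip master, tempdb, model, msdb
```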

Then go to the AlwaysOn High Availability folder and create new …
Click Next
Enter a valid name (just a label) and click Next
Select all databases

Note

If you have not taken a full backup, or the databases are not in the Full recovery model, you will get warnings that prevent you from continuing until you take the full backup and switch these databases to the Full recovery model.

Click Add Replica…
Select Automatic Failover and Synchronous Commit for both nodes
Also choose "Yes" to make each secondary server readable while it is acting in the secondary role
Create a shared folder used to move the backup files to the secondary servers
Validate the configuration
Ignore the listener configuration for now and click Next
Click Finish
The Availability Group has been configured.
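For readers who prefer scripts, the wizard choices above correspond roughly to the T-SQL below. Treat it as a sketch under assumptions: the group name [SPAG], the database WSS_Content, the endpoint name, and the contoso.local domain are all invented for the example, and the permission grants that the wizard performs on the endpoints are left out.

```sql
-- Sketch only. [SPAG], WSS_Content, Hadr_endpoint and contoso.local are hypothetical.

-- 1) On BOTH nodes: create the endpoint the replicas use to talk to each other.
--    (The SQL Server service accounts also need CONNECT permission on it.)
CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES);
GO

-- 2) On the primary (SQL1): create the group with two synchronous,
--    automatic-failover, readable replicas, as selected in the wizard.
CREATE AVAILABILITY GROUP [SPAG]
WITH (AUTOMATED_BACKUP_PREFERENCE = SECONDARY)
FOR DATABASE [WSS_Content]
REPLICA ON
    N'SQL1' WITH (
        ENDPOINT_URL      = N'TCP://SQL1.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)),
    N'SQL2' WITH (
        ENDPOINT_URL      = N'TCP://SQL2.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL));
GO

-- 3) On the secondary (SQL2): join the group, then join each database
--    after restoring its full and log backups WITH NORECOVERY.
ALTER AVAILABILITY GROUP [SPAG] JOIN;
GO
ALTER DATABASE [WSS_Content] SET HADR AVAILABILITY GROUP = [SPAG];
GO
```

The shared folder in the wizard exists for step 3: the wizard backs up each database to the share and restores it on the secondary before joining it to the group.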

To add a listener, go to Availability Group Listeners and click Add Listener…
Enter a valid name, IP address, and port (this creates an (A) record in the DNS server)
Go to the SharePoint servers and change the SQL alias to use the listener name
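The listener can be added with T-SQL as well. In this sketch the group name [SPAG], the listener name SPListener, and the IP address and subnet mask are assumptions; use the values reserved for your own network.

```sql
-- Sketch only. [SPAG], SPListener and the IP values are hypothetical.
-- Run on the primary replica.
ALTER AVAILABILITY GROUP [SPAG]
    ADD LISTENER N'SPListener' (
        WITH IP ((N'10.0.0.50', N'255.255.255.0')),
        PORT = 1433);

-- Confirm the listener name and port.
SELECT dns_name, port
FROM sys.availability_group_listeners;
```

Because SharePoint reaches SQL Server through the client alias, pointing the alias at the listener name is the only change needed on the SharePoint servers.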


Testing AlwaysOn Availability Group

To check and monitor the health status of the Availability Group, click Show Dashboard
From here you can review the state and any issues
To simulate the failover scenario, click Failover…
Click Next
Click Next
Connect to the secondary server and click Next
Click Finish
Now the second server takes the role of primary server

Note

To test automatic failover, shut down the primary node or stop its SQL Server service.
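The manual failover performed by the wizard can also be issued directly in T-SQL, and a DMV query shows which server currently holds the primary role. The group name [SPAG] is again an assumption of this sketch.

```sql
-- Sketch only. [SPAG] is a hypothetical availability group name.

-- Run on the synchronized secondary (SQL2) that should become primary.
-- This is a planned manual failover without data loss.
ALTER AVAILABILITY GROUP [SPAG] FAILOVER;

-- Check the role and health of each replica the local instance knows about.
SELECT ar.replica_server_name,
       ars.role_desc,
       ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = ars.replica_id;
```

Run the query on the primary to see all replicas; a secondary reports only its own state.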

SQL Server Versions

The tables below show the high availability features supported by the different versions.

For SQL Server 2012

For SQL Server 2014

For SQL Server 2016 and higher versions

AlwaysOn is supported in Standard edition, but with some limitations:

Limited to two nodes only (a primary and a secondary server)
Like mirroring, you can't read from the secondary, nor take backups of it
It is not appropriate for a SharePoint farm as it only supports a single database within the Basic Availability Group

Log and Backup Management

The log file stores records of all transactions performed on the database so that the database can be recovered to a consistent state after a failure. It is also used to manage transaction statements (COMMIT and ROLLBACK). In addition, the log file enables the DBA to restore the database to a point in time.

When a database is in the Full recovery model, SQL Server retains every log record in the log file and grows the file to make room for new records. SQL Server also uses these log records to send transactions to the replica nodes in Database Mirroring or AlwaysOn.

The only right way to truncate the log is to take log backups, because a log backup lets SQL Server overwrite existing space (the inactive VLF blocks inside the log file) and so avoids growing the file. Overwriting does not reduce the size of the log file; it only helps prevent further growth.
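To see whether a log is waiting for a log backup before its space can be reused, you can query the catalog as in this small sketch. No names here are specific to the lab.

```sql
-- For each user database, show why log space cannot currently be reused.
-- LOG_BACKUP means the log is waiting for a log backup;
-- AVAILABILITY_REPLICA means a secondary has not yet applied the records.
SELECT name,
       recovery_model_desc,
       log_reuse_wait_desc
FROM sys.databases
WHERE database_id > 4;

-- Show the size of every log file and the percentage in use.
DBCC SQLPERF(LOGSPACE);
```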

Log backups should be taken along with full database backups (and with differential database backups).

Note

Even with AlwaysOn you need to implement backups: if both nodes crash because of a hardware failure, you need a backup plan to recover your databases and protect your content. You also need log backups to manage log growth and to restore a database to a point in time.
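A point-in-time restore combines a full backup with the log backups taken after it. The sketch below restores a copy of a database to a chosen moment; the database name, file paths, logical file names, and timestamp are all invented for the example.

```sql
-- Sketch only. All names, paths and the STOPAT time are hypothetical.

-- 1) Restore the full backup as a new database, leaving it ready for log restores.
RESTORE DATABASE [WSS_Content_Restored]
    FROM DISK = N'\\DC1\Backup\WSS_Content_full.bak'
    WITH NORECOVERY,
         MOVE N'WSS_Content'     TO N'D:\Data\WSS_Content_Restored.mdf',
         MOVE N'WSS_Content_log' TO N'L:\Log\WSS_Content_Restored.ldf';

-- 2) Apply the log backup, stopping just before the unwanted change,
--    and bring the database online.
RESTORE LOG [WSS_Content_Restored]
    FROM DISK = N'\\DC1\Backup\WSS_Content_log.trn'
    WITH STOPAT = N'2018-04-01T10:30:00', RECOVERY;
```

If several log backups were taken since the full backup, each one is restored in sequence WITH NORECOVERY, and only the last uses STOPAT and RECOVERY.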

In AlwaysOn, because the databases are in the Full recovery model, we need to take care of the log files. Otherwise a log can grow until it fills the whole disk, at which point the database can no longer accept writes and is effectively read-only.

Note

Switching the database to the SIMPLE recovery model breaks the log chain, and you lose the ability to restore the database to a point in time. Avoid SHRINK as a routine way to control the log size; use it only if a situation forces you to reduce the size of the log file.

For more information check this article: http://tctblgs.azurewebsites.net/shrinking-sql-log-files-in-an-availability-group-cluster-or-database-mirror/

A backup strategy differs from environment to environment based on business requirements and SLA. In this lab I will show you how to configure full backups and log backups, which will save you in the event of a disaster and prevent the log file from growing excessively.

Go to the Availability Group, right-click it, and choose Properties

Click the Backup Preferences tab

There are four preferences:

1. Prefer Secondary: backups run on a secondary server; only if no secondary node is available do they run on the primary server
2. Secondary only: backups run only on a secondary server; automated backups never run on the primary server
3. Primary: backups run on the primary server only
4. Any Replica: backups run on any replica, based on the priority table (you can exclude some replicas from handling backup operations)

In this case, we will go with the Prefer Secondary option, which runs the automated backups on a secondary server and falls back to the primary server only if no secondary node is available, so an automated backup is taken in all cases.

Why run backups on secondary servers?

It allows SQL Server to reduce or eliminate resource contention between production activity and backups.
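The backup preference is only a setting; a hand-written backup job has to ask SQL Server whether the local replica is the preferred one. The sketch below shows that check around a log backup. The group name [SPAG], database name, and path are assumptions of this example.

```sql
-- Sketch only. [SPAG], WSS_Content and the backup path are hypothetical.

-- Set the preference in T-SQL (SECONDARY corresponds to "Prefer Secondary").
-- Run on the primary replica.
ALTER AVAILABILITY GROUP [SPAG] SET (AUTOMATED_BACKUP_PREFERENCE = SECONDARY);

-- In a job scheduled on EVERY replica: back up only where preferred.
IF sys.fn_hadr_backup_is_preferred_replica(N'WSS_Content') = 1
BEGIN
    BACKUP LOG [WSS_Content]
        TO DISK = N'\\DC1\Backup\WSS_Content_log.trn';
END
ELSE
BEGIN
    PRINT N'This replica is not the preferred backup replica; skipping.';
END
```

Scheduling the same job on all replicas and letting the function decide is what keeps backups running after a failover changes the roles.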

Go to the secondary server and create a maintenance plan for database backups. For this example (again, base your backup plan on your own strategy and SLA):

Daily full backup
Log backup every 6 hours

Give it a descriptive name
Add a Back Up Database Task to the first subplan (Full Backup)
Open the properties of the Back Up Database Task and select the Full backup type
Select the databases to back up
Specify the backup folder
Choose (Copy-only backup) and click OK

Note

A copy-only backup is a backup that is independent of the sequence of standard SQL Server backups, so it does not break the LSN chain for backups. Because this backup runs on replica servers, SQL Server prevents you from taking a normal full backup on them, which avoids breaking the LSN chain on the primary server.
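In T-SQL, the copy-only choice is a single option on the backup statement. The following sketch uses a hypothetical database name and path, and then reads the backup history in msdb to confirm how each backup was recorded.

```sql
-- Sketch only. WSS_Content and the backup path are hypothetical.
-- Run on the secondary replica: full backups there must be copy-only.
BACKUP DATABASE [WSS_Content]
    TO DISK = N'\\DC1\Backup\WSS_Content_copyonly.bak'
    WITH COPY_ONLY, CHECKSUM;

-- Review recent backups recorded on this instance.
-- type: D = full, I = differential, L = log.
SELECT TOP (10)
       database_name,
       type,
       is_copy_only,
       backup_finish_date
FROM msdb.dbo.backupset
ORDER BY backup_finish_date DESC;
```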

Open the subplan (Log Backup) and add a Back Up Database Task
Select the Transaction Log backup type
Select the databases to back up
Specify the backup folder
Click OK
Then save the maintenance plan

Note

You can run log backups on any server, whether primary or secondary.

Try the following scenarios:

Scenario | Status
Run automated backups on the secondary server | Takes backups
Run automated backups on the primary server (make SQL Server 2 the primary server) | Doesn't take backups
Run automated backups on the primary server with no secondary server (stop SQL Server 1) | Takes backups

Ports

The table below shows the most common default port numbers used by SQL Server with AlwaysOn:

Port | Protocol | Usage
1433 | TCP | Default instance of the Database Engine (can be changed)
1434 | UDP | SQL Browser service
5022 | TCP | AlwaysOn default port for primary and secondary replicas
389 | TCP/UDP | LDAP authentication
53 | TCP/UDP | DNS
3343 | TCP/UDP | Cluster network communication
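Because some of these defaults can be changed, it is worth checking which ports an instance really uses before opening the firewall. This short sketch reads them from the instance itself.

```sql
-- TCP endpoints defined on this instance, including the AlwaysOn
-- (DATABASE_MIRRORING) endpoint and its port.
SELECT name, type_desc, state_desc, port
FROM sys.tcp_endpoints;

-- The address and port that the current connection arrived on.
SELECT local_net_address, local_tcp_port
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
```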

For more information, see the article "SQL Server: Frequently Used Ports": http://social.technet.microsoft.com/wiki/contents/articles/13106.sql-server-frequently-used-ports.aspx

SQL Authentication

Normally, SharePoint connects to SQL Server using Windows authentication. With AlwaysOn, just make sure that the SharePoint service accounts are added on all nodes with the same roles and permissions.

If you have an application that needs to connect to SQL Server using SQL authentication, however, you will run into issues with AlwaysOn: a SQL login has a different internal SID on each node, even if you create it with the same name, so SQL Server considers that the user has no permission on the replica databases.

To fix this issue, SQL Server 2012 and newer versions introduced a new database concept called Contained Databases.

A contained database is a database that is isolated from the instance of SQL Server that hosts it. User authentication in a contained database can be performed by the database itself, reducing the database's dependency on the logins of the SQL Server instance. In this way the SQL user (with its internal SID) is stored inside the database, so it is replicated together with the database as part of its metadata.

Steps to configure a contained database:

Run the following script on both nodes to enable partially contained databases

EXEC sys.sp_configure N'contained database authentication', N'1'
GO
RECONFIGURE WITH OVERRIDE
GO

Go to the custom application database, right-click it, and choose Properties
Go to the Options tab and choose Partial for Containment type
If you get the below error then try this script

USE [master]
GO
ALTER DATABASE [ApplicationERP] SET CONTAINMENT = PARTIAL WITH NO_WAIT
GO

Go to the Security folder inside the database >> Users >> and click New User…
Enter a valid user name and password, which will be stored inside the database
Grant it the right privileges (apply the principle of least privilege on production servers)
If you check the replica database, you will find the user replicated under the database on the replica servers
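The contained user can be created with T-SQL instead of the dialog. The sketch below assumes the ApplicationERP database from the script above and a user named test; the password placeholder and the db_datareader role are illustrative choices only.

```sql
-- Sketch only. The password placeholder and the role are illustrative.
-- Run on the primary replica; the user travels with the database.
USE [ApplicationERP];
GO

-- A user WITH PASSWORD is authenticated by the database itself,
-- so no server-level login (and no per-node SID) is involved.
CREATE USER [test] WITH PASSWORD = N'<replace with a strong password>';

-- Grant only what the application needs.
ALTER ROLE [db_datareader] ADD MEMBER [test];

-- authentication_type_desc = DATABASE confirms a contained user.
SELECT name, authentication_type_desc, sid
FROM sys.database_principals
WHERE name = N'test';
```

Running the final query on a readable secondary should return the same SID, which is exactly what a server-level SQL login cannot guarantee.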

59 | P a g e

You can then use this SQL user in the application connection string, but don't forget to enable SQL Server and Windows Authentication mode on all nodes.
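
One way to check the current authentication mode on each node is the IsIntegratedSecurityOnly server property. This is only a verification sketch; the column alias is made up for the example, and changing the mode itself is still done in the server properties and needs a service restart.

```sql
-- 1 = Windows Authentication only
-- 0 = mixed mode (SQL Server and Windows Authentication)
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS windows_auth_only;
```

A value of 1 on any node means SQL-authenticated connections, including contained users with passwords, will be rejected on that node.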

Use a normal connection string:

<connectionStrings>
  <add name="ConnectionStringName"
       providerName="System.Data.SqlClient"
       connectionString="Data Source=SQL1;Initial Catalog=ApplicationERP;Integrated Security=False;User Id=test;Password=P@ssw0rd;"/>
</connectionStrings>

Cluster Quorum Models

A cluster contains nodes and resources, and the resources are highly available only while enough nodes are up and running. The cluster requires more than half of the votes to be online; otherwise the cluster goes down. The quorum defines the number of voting members (nodes, and optionally a disk witness or a file share witness as in our lab) that must be online for the cluster to run. It also prevents scenarios in which nodes that can't communicate with each other each try to own the resources at the same time. By default, every node in a failover cluster has a vote that helps determine whether the cluster continues running.

The value ‘0’ means the node doesn’t have a vote. The value ‘1’ means the node has a vote.

Each node in a WSFC cluster participates in periodic heartbeat communication to share the

node's health status with the other nodes.

Cluster quorum has four models; only the first three are recommended:

Node Majority: only Nodes can vote

Node and File Share Majority: Nodes and File Share witness can vote

Node and Disk Majority: Nodes and Disk witness can vote

No Majority: Disk Only: only the disk witness can vote. This is the legacy model from older clusters (the Windows Server 2003 era), which relied on a disk-only quorum.

To understand how these models are used, consider the following examples:

1. If you have 4 nodes (4 votes) and 1 node fails, the 3 remaining nodes are more than half of the votes, so the cluster stays running.

2. If you have 4 nodes (4 votes) and 2 nodes fail, the 2 remaining nodes are not more than half of the votes, so the cluster goes down.

3. To increase the availability of a cluster with 4 nodes, you can add a file share or disk witness, which also has a vote. With 4 nodes + 1 file share or disk witness (5 votes), the cluster stays running if 1 node fails (4 of 5 votes remain) and also if 2 nodes fail (3 of 5 votes remain). Adding a witness therefore increases availability cheaply, without the need to purchase another server (node).

4. If you have 2 nodes and no file share or disk witness, the cluster goes down as soon as one node goes down.

It's recommended to have an odd number of votes, so that the quorum calculation always has a clear majority (more than half). The minimum is 2 nodes plus a file share or disk witness.
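
The arithmetic behind these examples can be sketched in T-SQL. This is only an illustration of the majority rule described above (more than half of the votes must be online). It does not query the cluster, and the column names are made up for the example.

```sql
-- Majority rule: votes_needed = floor(total_votes / 2) + 1.
-- Integer division in T-SQL truncates, so 5 / 2 = 2.
SELECT v.total_votes,
       v.total_votes / 2 + 1                   AS votes_needed,
       v.total_votes - (v.total_votes / 2 + 1) AS failures_tolerated
FROM (VALUES (2), (3), (4), (5)) AS v(total_votes)
ORDER BY v.total_votes;
```

For 4 votes this gives 3 votes needed and 1 tolerated failure; for 5 votes it gives 3 needed and 2 tolerated, which matches examples 1 to 3. With 3 votes and with 4 votes the cluster tolerates the same single failure, which is why an odd number of votes is the better use of hardware.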

Which model to select?

By default, Failover Cluster Manager picks the best model based on the cluster configuration and nodes. If there is an odd number of nodes, the cluster selects Node Majority. If there is an even number of nodes and a file share witness, the cluster selects Node and File Share Majority; with a disk witness, it selects Node and Disk Majority. If there is an even number of nodes and no disk or file share witness, the model is Node Majority with warning messages.

If the cluster uses No Majority: Disk Only, the cluster has only 1 vote, and if the disk goes down the whole cluster goes down.

Let us now check what Failover Cluster Manager selected for our lab configuration: go to the Availability Group Dashboard and click View Cluster Quorum Information.

The quorum model in this case is Node and File Share Majority, because we have 2 nodes and 1 file share witness, and all of them have votes.
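
The same information is also exposed to SQL Server through dynamic management views, which can be handy when the dashboard is not available. A minimal sketch, assuming the login has VIEW SERVER STATE permission:

```sql
-- Quorum model and state as seen by this SQL Server instance.
SELECT cluster_name, quorum_type_desc, quorum_state_desc
FROM sys.dm_hadr_cluster;

-- One row per cluster member (nodes and witness) with its vote count.
SELECT member_name, member_type_desc, member_state_desc, number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;
```

The number_of_quorum_votes column shows the 0 or 1 vote value described earlier for each member.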

In this case, the cluster goes down only when 2 of the 3 votes are offline. For example, when only SQL 01 is online and the other two voters are offline, the cluster is down.

Note

In this state, you can access the database only by connecting directly to the server name.

If only one voter goes down (here I take the file share offline), 2 votes remain, which is more than half of the 3 votes, so the cluster is still running.

Disaster Recovery Plan

To extend the SharePoint topology to support a disaster recovery solution, let us add an extra SQL Server to the design in a different data center.

In this case we added SQL Server 03 in a different data center to act as the disaster recovery replica, with the following considerations, which are requirements for the Availability Group:

Join SQL Server 03 to the same Windows cluster

Use the same Active Directory domain

Use asynchronous-commit mode for the DR replica to avoid latency and performance issues


To join the server to the current Windows cluster, click Add Node…

Click Next

Add SQL Server 03 and click Next

After validating the node, choose the second option and click Next

Click Next

Click Finish

Then go to the SQL Server service, right-click it, and choose Properties

Enable AlwaysOn Availability Groups and click OK

Restart the SQL Server service

We now have 3 nodes in the Windows cluster.

Also consider modifying the quorum configuration to prevent the disaster recovery nodes from affecting the quorum votes in the primary data center. In this case we assign a NodeWeight of '0' to the DR nodes, so that an outage or disconnection of the DR nodes cannot affect the quorum. You can do this with PowerShell or cluster.exe commands, for example in PowerShell: (Get-ClusterNode "SQL3").NodeWeight = 0

Now only the primary data center nodes can vote.

Note

To fail over to DR, you need to change the vote weights: give the DR nodes a vote of '1' and the nodes in the primary data center a vote of '0'. When reverting back to the primary data center, do the reverse.

Before adding this server as a replica, make sure to add the SharePoint service accounts, configure Max Degree of Parallelism, and enable contained database authentication on this new node.
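
The two instance-level settings can be applied with sp_configure. This is a minimal sketch, assuming a Max Degree of Parallelism value of 1 (the value SharePoint Server requires) and the same contained database setting that was enabled on the other nodes.

```sql
-- Run on the new node (SQL Server 03).
-- 'max degree of parallelism' is an advanced option, so expose it first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
GO
-- SharePoint requires max degree of parallelism = 1.
EXEC sys.sp_configure N'max degree of parallelism', 1;
-- Allow contained users to authenticate on this instance.
EXEC sys.sp_configure N'contained database authentication', 1;
RECONFIGURE WITH OVERRIDE;
GO
```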

Then, to add this node to the AlwaysOn Availability Group, click Add Replica…

Click Next

Connect to the second node and click Next

Add SQL Server 03 in asynchronous-commit mode (also make it readable) and click Next

Enter the shared folder used to move the backups to this node and click Next

Click Next

Click Finish

Now we have 3 replicas: 2 in the primary data center and 1 in the disaster recovery data center.
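
The replica layout can also be checked with a query instead of the dashboard. A minimal sketch, to be run on the primary replica (a secondary only reports its own state):

```sql
-- One row per replica with its commit mode, failover mode, role and health.
SELECT ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       ars.role_desc,
       ars.synchronization_health_desc
FROM sys.availability_replicas AS ar
JOIN sys.dm_hadr_availability_replica_states AS ars
     ON ars.replica_id = ar.replica_id
ORDER BY ar.replica_server_name;
```

In the lab described here, the DR replica should be the one reporting ASYNCHRONOUS_COMMIT.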

You can also check the Cluster Quorum Information again.

Recovering from a Disaster

If the primary data center goes down for any reason, these are the steps you can follow to recover from the disaster.

Let us assume in our lab that SQL 01 and SQL 02 go down.

At the DR data center, the cluster will be down because the quorum is lost (the voting nodes are down), so you can't access the AG listener or the cluster manager. Therefore we need to start the cluster from the command line in forced quorum mode.

Note

If the cluster service is running on the DR node, stop it first so that it can be started in forced quorum mode, using this command:

Stop-ClusterNode -Name "SQL3"

Or go to services.msc and stop the Cluster Service.

Open PowerShell and run this command:

Start-ClusterNode -Name "SQL3" -FixQuorum

Once the cluster service on the DR node has started, the availability group will show as offline in Failover Cluster Manager and cannot be brought online from there.

Change the votes for these nodes as follows: give the DR node (SQL Server 03) a NodeWeight of '1' and the nodes in the primary data center a NodeWeight of '0'.

Now, to bring the AG online, run a forced failover on SQL Server 03. In this case you need to accept the risk of data loss, because some data or log records may not have been replicated from the primary data center before the failure occurred.
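
A forced failover of this kind is typically issued with ALTER AVAILABILITY GROUP. The sketch below assumes an availability group named SharePointAG, which is a placeholder; substitute the name of your own group.

```sql
-- Run on SQL Server 03 (the DR replica), connected to master.
-- [SharePointAG] is a placeholder name for the availability group.
-- WARNING: transactions not yet received from the old primary are lost.
ALTER AVAILABILITY GROUP [SharePointAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;
```

After the statement completes, SQL Server 03 holds the primary role and the listener comes online on the DR node.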

Check the result: you can now access the database through the Availability Group listener.

If you check the dashboard, you can ignore the warning messages that relate to the nodes in the primary data center.

Revert back to Primary data center

To switch back to the primary servers, follow these steps:

Let us assume that the nodes in the primary data center are up and running again and have reconnected to the Windows cluster.

Note that the databases on the primary nodes are not synchronized.

Before doing anything, evaluate the situation: decide whether you want to take a tail-log backup from the primary databases or resynchronize from the DR replica server. In this case we will resynchronize from the DR node.

First change the votes for these nodes back: give the primary data center nodes a NodeWeight of '1' and the DR node a NodeWeight of '0'.

After you change the weights, only the primary data center nodes have votes.

Then, to resume data movement from the DR node to the primary data center, run the resume script for each database on the first primary data center node.

Do the same on the second node. The databases on the primary data center nodes are now synchronizing.
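
A resume script along these lines can be used. This is a minimal sketch, using ApplicationERP as the example database; repeat the ALTER DATABASE statement for every suspended database in the availability group. Resuming makes the local copy follow the new primary, so any changes that existed only on the old primary are discarded, which is why the tail-log decision above has to be made first.

```sql
-- Run on each primary data center node (SQL 01, then SQL 02).
-- List local databases and whether their data movement is suspended.
SELECT DB_NAME(database_id) AS database_name,
       synchronization_state_desc,
       is_suspended,
       suspend_reason_desc
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1;

-- Resume data movement; repeat for every suspended database.
ALTER DATABASE [ApplicationERP] SET HADR RESUME;
```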

Change the availability mode of SQL Server 03 from asynchronous commit to synchronous commit, so that the databases are fully synchronized before you fail over.
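
In T-SQL, the availability mode is changed with MODIFY REPLICA. The sketch below assumes an availability group named SharePointAG and a replica server name of SQL3; both are placeholders for your own names.

```sql
-- Run on the current primary (SQL Server 03 after the forced failover).
-- [SharePointAG] and N'SQL3' are placeholder names.
ALTER AVAILABILITY GROUP [SharePointAG]
MODIFY REPLICA ON N'SQL3'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
```

Once the failback is complete, the same statement with ASYNCHRONOUS_COMMIT, run on the new primary, returns the DR replica to asynchronous mode.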

Now the databases are in synchronous-commit mode on all nodes.

Then fail over to SQL Server 01 by running the failover command on SQL Server 01.
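
A planned failover is issued on the replica that is going to become primary. The sketch below assumes an availability group named SharePointAG (a placeholder) and first checks that the local databases report SYNCHRONIZED, since a planned failover without data loss depends on it.

```sql
-- Run on SQL Server 01 (the failover target), connected to master.
-- Confirm the local databases are SYNCHRONIZED first.
SELECT DB_NAME(database_id) AS database_name, synchronization_state_desc
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1;

-- Planned manual failover without data loss.
-- [SharePointAG] is a placeholder name for the availability group.
ALTER AVAILABILITY GROUP [SharePointAG] FAILOVER;
```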

Everything is now back as it was before the disaster happened.

Change the DR node back to asynchronous-commit mode.

If you check the dashboard now, you will find that the node in the primary data center has taken the primary role.

General SharePoint Considerations

These are important points to consider with remote disaster recovery solutions:

Not all SharePoint databases support asynchronous-commit mode, for example the SharePoint configuration and Central Administration databases. These databases should be separate in each data center because they store information about the servers, so it's better not to synchronize them in DR solutions.

For more information:

http://www.harbar.net/archive/2014/03/20/Support-for-SQL-Server-Always-On-Async-Replication-with-SharePoint.aspx

https://docs.microsoft.com/en-us/SharePoint/administration/supported-high-availability-and-disaster-recovery-options-for-sharepoint-databas

When you add a new site collection to a database that is replicated to the disaster recovery farm, make sure to update the configuration database in the disaster recovery farm to register the newly created site collection, because it is not updated automatically. You can update it using the following PowerShell:

$db = Get-SPDatabase | where {$_.Name -eq "DatabaseName"}

$db.RefreshSitesInConfigurationDatabase()

In the case of a multi-subnet failover cluster, consider using the MultiSubnetFailover property for all SharePoint databases to make connections across subnets more stable and to avoid connection timeout issues.

For more information:

http://blogs.msdn.com/b/sambetts/archive/2015/02/13/multi-subnet-sql-server-clusters-sharepoint-2013-spdatabase-multisubnetfailover.aspx

Thank You

Thanks for reading this whitepaper. Again, I really hope it has been informative and that it will help you increase SharePoint high availability. For any questions or comments, send me an email.
