Maximizing SharePoint Availability Whitepaper v1.1
4/2018
This technical whitepaper describes how to configure High Availability
and Disaster Recovery for SharePoint Server at the database level, along
with best practices for implementing AlwaysOn Availability Groups in
SQL Server.
Fadi Abdulwahab
CSSLP, MCC, MCITP
Author
Author of a SharePoint 2013 book, with a focus on building secure
applications.
Has delivered many projects with Microsoft technologies since
2006 for banks, universities, and ministries.
Experienced in SharePoint Administration, Infrastructure,
Development, Governance, and Disaster Recovery.
Specialties: SharePoint Server, SharePoint Search, ASP.NET/C#, OWASP Top 10, SQL
Server Administration and High Availability Solutions.
Recognized as Microsoft Community Contributor in July 2013
(ISC)² - CSSLP® Certified Secure Software Lifecycle Professional in July 2015
AWS Solutions Architect – Associate certification in April 2017
AWS Solutions Architect – Professional certification in April 2018
Maximizing SharePoint Security v2.0 whitepaper
https://gallery.technet.microsoft.com/Maximizing-SharePoint-cf7f7efc
My Blog:
https://fabdulwahab.com
My Twitter Account:
https://twitter.com/fadi_abdulwahab
My LinkedIn account:
https://www.linkedin.com/in/fadiabdulwahab
My MSDN Profile:
https://social.msdn.microsoft.com/profile/fadi%20abdulwahab/
My SharePoint Book (Advanced Topics in SharePoint 2013 in Arabic language):
http://www.neelwafurat.com/itempage.aspx?id=lbb229815-208246&search=books
Disclaimer
This document is provided "as is"; test any changes before going live.
Product or company names mentioned in this document may be the trademarks of their
respective owners.
You can use this whitepaper for your environments and other needs.
Fadi Abdulwahab © 2018, all rights reserved.
Your feedback is very important to me. If you have comments or new points,
please send them to me at [email protected]
Table of Contents
Author
Disclaimer
Why Maximizing SharePoint Availability
Introduction
Terminologies
    High Availability
    Disaster Recovery
    Recovery Time Objective (RTO)
    Recovery Point Objective (RPO)
    Replica
Solutions in SQL Server
    Backup and Restore
    Log Shipping
    Replication
    Mirroring
    Failover Cluster
    AlwaysOn Availability Group
SharePoint Topology
Prerequisites
    SQL Client Alias
    SQL Server instances
AlwaysOn Configuration
    Configuring Windows Server Failover Clustering
    Enabling AlwaysOn High Availability
    Creating AlwaysOn High Availability
    Testing AlwaysOn Availability Group
SQL Server Versions
Log and Backup Management
Ports
SQL Authentication
Cluster Quorum Models
Disaster Recovery Plan
    Recovering from a Disaster
    Revert back to Primary data center
General SharePoint Considerations
References
Why Maximizing SharePoint Availability
Availability is about doing your best to create reliable application components and
infrastructure, accepting the reality that your application probably will have at least some
failures, and designing in quick recovery technology to minimize (or even eliminate)
downtime.
Maximizing, because availability is a matter of degree (it's about the number of nines, as in
99.999%).
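To make the "number of nines" concrete, here is a minimal Python sketch that converts an
availability target into a yearly downtime budget. The function name and the list of targets
are illustrative assumptions, not figures from this whitepaper, and the calculation assumes a
365-day year.

```python
def allowed_downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes in a non-leap year
    return minutes_per_year * (1 - availability_percent / 100)


# Illustrative targets only; pick the ones that match your own SLA.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes_per_year(target)
    print(f"{target}% availability allows about {minutes:.2f} minutes of downtime per year")
```

Each extra nine divides the downtime budget by ten, which is why "five nines" leaves only a
few minutes per year for both planned and unplanned outages.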
The idea for this document came from the slides about SQL Server AlwaysOn that I published
five years ago on SlideShare:
http://blogs.msdn.com/b/fabdulwahab/archive/2013/04/04/alwayson-in-sql-server-2012.aspx
In this version I try to cover the most fundamental steps and configurations needed to
implement an AlwaysOn Availability Group as a High Availability and Disaster Recovery
solution for SharePoint database servers.
Finally, treat High Availability as a continuous process; it's not just "set and forget".
Introduction
Maximizing SharePoint availability is a big task that spans many layers of software and
hardware. In this document we focus on maximizing the availability of the SQL Server
databases behind SharePoint using the latest Microsoft technologies.
Microsoft provides many solutions to enable High Availability (HA) and Disaster Recovery
(DR) for a SharePoint farm. These solutions have pros and cons depending on your business
requirements, Recovery Time Objective (RTO), Recovery Point Objective (RPO), and budget;
in short, on your Service Level Agreement (SLA).
Terminologies
Before we delve into this document, it helps to understand the basic meaning of the terms
below.
High Availability
High Availability (HA) refers to a system or component that is continuously operational for
a desirably long length of time, in order to minimize or mitigate the impact of downtime.
Disaster Recovery
Disaster Recovery (DR) is a process that you can use to help recover information systems
and data if a disaster occurs.
Recovery Time Objective (RTO)
This is the acceptable duration of the outage. The initial goal is to get the system back
online in at least a read-only capacity to facilitate investigation of the failure.
Recovery Point Objective (RPO)
This is often referred to as a measure of acceptable data loss. It is the time gap or latency
between the last committed data transaction before the failure and the most recent data
recovered after the failure.
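As a small illustration of the two definitions above, the following Python sketch measures
the outage duration and the data-loss window of a single incident so they can be compared
with the RTO and RPO targets. The function names and all timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def measured_outage(failure_time: datetime, service_restored: datetime) -> timedelta:
    """Outage duration, to be compared against the RTO target."""
    return service_restored - failure_time


def measured_data_loss(last_commit_before_failure: datetime,
                       last_recovered_commit: datetime) -> timedelta:
    """Gap between the last committed and the last recovered transaction (RPO side)."""
    return last_commit_before_failure - last_recovered_commit


# Hypothetical incident: failure at 10:00, service back at 10:25,
# and the newest recovered transaction was committed at 09:58.
failure = datetime(2018, 4, 1, 10, 0)
outage = measured_outage(failure, datetime(2018, 4, 1, 10, 25))
data_loss = measured_data_loss(failure, datetime(2018, 4, 1, 9, 58))

print("Outage:", outage, "meets a 30-minute RTO:", outage <= timedelta(minutes=30))
print("Data loss window:", data_loss, "meets a 5-minute RPO:", data_loss <= timedelta(minutes=5))
```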
Replica
Two types of replicas exist: a single primary replica, which hosts the primary databases, and
secondary replicas, each of which hosts a set of secondary databases and serves as a
potential failover target for the availability group.
Solutions in SQL Server
Microsoft provides many solutions to enable High Availability (HA) or Disaster Recovery (DR)
for SQL Server environments. At a high level, these options are:
Backup and Restore
Log Shipping
Replication
Mirroring
Failover Cluster
AlwaysOn Availability Group
There are also solutions outside the SQL Server product, such as SAN replication and Hyper-V
replication, but these types of replication are not supported by Microsoft because they
may cause consistency issues, especially for the search index and timer jobs.
The only exception for virtual machine replication is Azure Site Recovery, which does
support replication of virtual machines into Azure for Disaster Recovery purposes. You
can find more information at these links:
https://docs.microsoft.com/en-us/azure/site-recovery/site-recovery-sharepoint
https://docs.microsoft.com/en-us/office365/enterprise/sharepoint-server-2013-disaster-recovery-in-microsoft-azure
Backup and Restore
The purpose of creating SQL Server backups is to enable you to recover a damaged
database. We can summarize the pros and cons in the following points:
There is a possibility to lose Data
Inexpensive solution for DR
It doesn't provide HA
Protection at database level
Log Shipping
SQL Server log shipping allows you to automatically send transaction log backups from
a primary database on a primary server instance to one or more secondary databases on
separate secondary server instances. We can summarize the pros and cons in the following
points:
Can be HA or DR
No automatic failover
Protection at database level
Inexpensive solution
There is a possibility to lose Data
Replication
Replication is a set of technologies for copying and distributing data and database objects
from one database to another and then synchronizing between databases to maintain
consistency. We can summarize the pros and cons in the following points:
Protection at database level
No automatic failover
Inexpensive solution
Can be used for load balancing
Can be HA or DR
Mirroring
Database mirroring is primarily a software solution for increasing database availability by
replicating the data between primary and secondary servers. We can summarize the pros and
cons in the following points:
Can be HA or DR
Limited to two servers
It supports automatic failover but it needs witness server for this purpose
Replaced with AlwaysOn in SQL Server 2012 and higher releases
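Note:
Not all databases can be configured with a failover database server from Central
Administration, but you can set this property from PowerShell for such databases (the
configuration database, for example) using the commands below:
# Load the SharePoint cmdlets when running outside the SharePoint Management Shell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
# Find the database by name, register the failover (mirror) instance, and save the change
$database = Get-SPDatabase | where { $_.Name -eq "SomeSharePointDB" }
$database.AddFailoverServiceInstance("SQLSecond")
$database.Update()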
Failover Cluster
A Windows Server Failover Clustering (WSFC) cluster is a group of independent servers that
work together to increase the availability of applications and services. SQL Server takes
advantage of WSFC services to provide local high availability through redundancy at the
server-instance level. We can summarize the pros and cons in the following points:
HA
Expensive Solution
Protection at instance level
It supports automatic failover
AlwaysOn Availability Group
This is the latest solution provided by Microsoft, available in SQL Server 2012 and later
releases. It combines the features of Database Mirroring and Failover Clustering with many
enhancements and new features. We can summarize the pros and cons in the following points:
Can have up to 4 secondary replicas, or more depending on the SQL Server version (Mirroring
allows only one primary and one secondary server)
Can be HA and DR (HA = sync mode, DR = async mode)
No need for SAN storage (a Failover Cluster requires SAN storage or other network disks)
Can be deployed across geographical locations (as with a Failover Cluster, but with more
enhancements)
No need for a witness server
You can use an Availability Group Listener (a virtual IP and name, as in a Failover Cluster)
Replica servers can be accessed for backup, reporting, and similar operations
Supports automatic failover (in sync mode)
Needs SQL Server Enterprise edition
So in this case you can use one solution to provide both HA and DR, instead of combining
multiple solutions as in previous versions (for example, Database Mirroring or log shipping
for remote DR and a Failover Cluster for HA).
Terms
AlwaysOn Failover Cluster instance (FCI) = SQL Server Failover Cluster Instance
AlwaysOn Availability Group = like Database mirroring in old version but with many
enhancements
The following table, taken from a Microsoft whitepaper, shows the differences between these
High Availability and Disaster Recovery solutions (based on SQL Server 2012):
SQL Server Solution | Potential Data Loss (RPO) | Potential Recovery Time (RTO) | Automatic Failover | Readable Secondaries
AlwaysOn Availability Group - synchronous-commit | Zero | Seconds | Yes | 0 - 2
AlwaysOn Availability Group - asynchronous-commit | Seconds | Minutes | No | 0 - 4
AlwaysOn Failover Cluster Instance | NA | Seconds-to-minutes | Yes | NA
Database Mirroring - High-safety (sync + witness) | Zero | Seconds | Yes | NA
Database Mirroring - High-performance (async) | Seconds | Minutes | No | NA
Log Shipping | Minutes | Minutes-to-hours | No | Not during a restore
Backup, Copy, Restore | Hours | Hours-to-days | No | Not during a restore
SharePoint Topology
Let us assume that we have the following topology design for the SharePoint farm and we
need to enable high availability for the SharePoint databases.
In this diagram we have redundant servers for both SharePoint and SQL Server to provide
high availability: if one of these servers goes down, the second can continue to serve
user requests.
SQL Server 01 is the primary server, which the SharePoint servers connect to directly or
through the Availability Group listener. Any transaction that runs on SQL Server 01 (primary
server) must also be committed on SQL Server 02 (secondary server) before the success
message is returned to the user (this is called sync mode).
Note
In async commit mode, transactions are committed on SQL Server 01 (primary server), the
success message is returned to the user, and the changes are then written to SQL Server 02
(secondary server). In this case data can be lost if the connection drops after the changes
are committed on SQL Server 01, but this mode is more suitable for DR because of the latency
and performance cost of synchronous commit over long distances.
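The difference between the two commit modes can be sketched with a toy Python model. This is
only an illustration of the ordering described above, not how SQL Server is implemented; the
class, its methods, and the transaction names are all hypothetical.

```python
class ToyAvailabilityGroup:
    """Toy model: each replica's log is just a Python list of transactions."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary_log = []
        self.secondary_log = []
        self.pending = []  # committed on the primary, not yet on the secondary

    def commit(self, transaction: str) -> str:
        self.primary_log.append(transaction)
        if self.synchronous:
            # Sync mode: the secondary has the change before the caller sees success.
            self.secondary_log.append(transaction)
        else:
            # Async mode: success is reported first; the change is shipped later.
            self.pending.append(transaction)
        return "success"

    def ship_pending(self) -> None:
        """Simulate the secondary catching up in async mode."""
        self.secondary_log.extend(self.pending)
        self.pending.clear()

    def fail_over(self) -> list:
        """The primary is lost; return the transactions the secondary never received."""
        lost = list(self.pending)
        self.pending.clear()
        return lost


sync_group = ToyAvailabilityGroup(synchronous=True)
sync_group.commit("t1")
sync_group.commit("t2")
sync_lost = sync_group.fail_over()

async_group = ToyAvailabilityGroup(synchronous=False)
async_group.commit("t1")
async_group.ship_pending()   # t1 reaches the secondary
async_group.commit("t2")     # t2 is acknowledged but not yet shipped
async_lost = async_group.fail_over()

print("Lost in sync mode:", sync_lost)
print("Lost in async mode:", async_lost)
```

In the toy model, only the asynchronous group can lose a transaction that was already
acknowledged to the user, which is the trade-off accepted for a remote DR replica.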
Prerequisites
For this lab, I used Windows Azure to build the SharePoint farm, which has the following
virtual machines:
SP1 = SharePoint Server 2016 Beta 2 server
DC1 = Active Directory domain server
SQL1 = SQL Server 2014 SP1 Enterprise (standalone) server
o The SharePoint farm connects directly to this instance
SQL2 = SQL Server 2014 SP1 Enterprise (standalone) server
SQL Client Alias
It's highly recommended to use a SQL client alias, so that future changes are easy if you
change the SQL Server instance name or the SQL Server itself.
Steps
Open the Run command and type cliconfg
Go to the Alias tab
Fill in the fields (SQLDB is a fake name)
Note
Also create the same alias with C:\Windows\SysWOW64\cliconfg.exe, which configures aliases
for 32-bit applications on 64-bit Windows.
During the SharePoint configuration, enter the SQL alias name instead of the real SQL Server
name
As the image below shows, the SharePoint farm is configured with the SQL alias name
Notes
Don't forget to configure these settings in order to install SharePoint successfully:
1. Grant the SharePoint farm service account these SQL Server roles (dbcreator,
public, securityadmin)
2. Set Max Degree of Parallelism = 1 in the instance-level settings
Again, don't forget to repeat the same steps on the secondary server.
SQL Server instances
In our lab, we have two SQL Server instances as standalone servers (for SharePoint, you only
need the SQL Server service).
And these are the databases created for the SharePoint farm
Note
The secondary server has no databases.
If you check the AlwaysOn setting, you will get the following message:
So it's time to configure an AlwaysOn Availability Group in SQL Server for High Availability.
AlwaysOn Configuration
To configure an AlwaysOn Availability Group, we will go through these steps:
1. Configuring Windows Server Failover Clustering
2. Enabling AlwaysOn High Availability
3. Creating AlwaysOn High Availability
4. Testing AlwaysOn Availability Group
Configuring Windows Server Failover Clustering
Start by opening Server Manager and click Add Roles and Features
Click Next
Keep the default option and click Next
Select the first server (SQL Server 01) and click Next
Don't select anything here and click Next
Select the Failover Clustering feature and click Next
Click Add Features
Click Install
Repeat the same steps on the secondary server (SQL Server 02).
On the primary server, create the new cluster by opening Failover Cluster Manager
Select Create Cluster…
Note
Run it with a domain administrator account because it will create a computer object in
Active Directory and an (A) record in the DNS server.
Click Next
Select the SQL Server nodes to be added to this cluster
Select the first option to run the validation tests (these test prerequisites such as
network configuration, storage …)
Click Next
Select the second option to customize the tests to meet the AlwaysOn requirements only
Don't select the Storage option (because AlwaysOn does not use SAN storage)
Click Next
Review the results, fix any errors or warnings, and then click Finish
After the validation process closes, select the second option and click Next
Enter the cluster name and IP address (this will create an (A) record in the DNS server)
Click Next
Click Finish
The two nodes have been added to the Windows cluster
The next step is to configure the quorum: click on the cluster name and select the option
below
Click Next
Select the last option and click Next
Select the two nodes and click Next
Select the second option and click Next
In this case, I will use a file share witness on the Active Directory server
Click Next
Click Finish
Note
The Windows cluster service account must have write permission on the file share witness.
Enabling AlwaysOn High Availability
Open SQL Server Configuration Manager, right-click the SQL Server service, and select
Properties
Enable AlwaysOn Availability Groups and click OK
Restart the SQL Server service
Repeat the same steps on the secondary server.
Creating AlwaysOn High Availability
Open SQL Server Management Studio. Before creating the AlwaysOn Availability Group, make
sure that all databases that need to be replicated use the Full recovery model (right-click
the database, choose Properties, and then go to the Options tab)
Also make sure you have a full backup of these databases
Choose Full for the backup type
Then go to the AlwaysOn High Availability folder and create new …
Click Next
Enter a valid name (just a label) and click Next
Select all databases
Note
If you haven't taken a full backup, or the databases are not in the Full recovery model, you
will get warnings that prevent you from continuing until you take the full backup and switch
these databases to the Full recovery model.
Click Add Replica …
Select Automatic Failover and Synchronous Commit for both nodes
Also choose "Yes" to make a server readable while it is acting in the secondary role
Create a shared folder to move the backup files to the secondary servers
Validate the configuration
Ignore the listener configuration for now and click Next
Click Finish
The Availability Group has been configured.
To add a listener, go to Availability Group Listener and click Add Listener…
Enter a valid name, IP, and port (this will create an (A) record in the DNS server)
Go to the SharePoint servers and change the SQL alias to use the listener name
Testing AlwaysOn Availability Group
To check and monitor the health status of the Availability Group, click Show Dashboard
From here you can review the state and any issues
To simulate a failover scenario, click Failover…
Click Next
Click Next
Connect to the secondary server and click Next
Click Finish
Now the second server takes the role of primary server
Note
To test automatic failover, shut down the server or the SQL Server service on the primary
node.
SQL Server Versions
The tables below show the High Availability features supported by different versions.
For SQL Server 2012
For SQL Server 2014
For SQL Server 2016 and later versions
AlwaysOn is supported in Standard edition, but with some limitations:
Limited to two nodes only (a primary and a secondary server)
Like mirroring, you can’t read from the secondary, nor take backups of it
It is not appropriate for a SharePoint farm as it only supports a single database
within the Basic Availability Group
Log and Backup Management
The log file stores records of all transactions performed on the database so that the
database can be recovered to a consistent state after a failure. It is also used to manage
transaction statements (COMMIT and ROLLBACK). In addition, the log file enables the DBA to
restore the database to a point in time.
When a database is in the Full recovery model, SQL Server retains every log record in the
log file and grows the file to make room for new log records. SQL Server also uses these
log records to send transactions to the replica nodes in Database Mirroring or AlwaysOn.
The only right way to truncate the log is to take a log backup, because it lets SQL Server
overwrite the existing space (the inactive VLF blocks inside the log file) and so avoids
growing the file. (Overwriting doesn't reduce the size of the log file, but it helps
prevent it from growing.)
Log backups should be taken along with full database backups (and with differential
database backups).
Note
Even with AlwaysOn you need to implement backups: if both nodes crash due to hardware
failure, you need a backup plan to recover your databases and protect your content. You
also need log backups to manage log growth and to restore a database to a point in time.
In AlwaysOn, because the databases are in the Full recovery model, we need to take care of
the log files. Otherwise a log can grow until it takes up the whole disk, which causes SQL
Server to enter read-only mode.
Note
Switching the database to the SIMPLE recovery model breaks the log chain, which means you
lose the ability to restore the database to a point in time, and routine use of SHRINK
should be avoided. But if you are in a situation that forces you to reduce the log file
size, use SHRINK.
For more information, check this article:
http://tctblgs.azurewebsites.net/shrinking-sql-log-files-in-an-availability-group-cluster-or-database-mirror/
A backup strategy differs from environment to environment, based on business requirements
and the SLA. In this lab I will show you how to configure full and log backups, to save you
in the event of a disaster and to prevent the log file from growing excessively.
Go to the Availability Group, right-click it, and choose Properties
Click the Backup Preferences tab
There are four preferences:
1. Prefer Secondary: backups run on secondary servers only, unless no secondary node is
available, in which case they run on the primary server
2. Secondary only: backups run only on a secondary server; automated backups on the
primary server will not run even if they exist there
3. Primary: backups run on the primary server only
4. Any Replica: backups run on any replica, based on the priority table (you
can exclude some replicas from handling backup operations)
In this case we will go with the Prefer Secondary option, which runs the automated backups
on secondary servers only, unless there are no secondary nodes, in which case they run on
the primary server. So in all cases there is an automated backup.
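The four preferences can be read as a simple selection rule. The Python sketch below models
that rule under stated assumptions: the Replica class, the preference strings, and the
priority value of 50 are illustrative, and a priority of 0 is used here to mean "excluded
from backups". It is a simplified model of the behavior described above, not SQL Server code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    is_primary: bool
    online: bool = True
    backup_priority: int = 50  # illustrative value; 0 means excluded in this model


def choose_backup_replica(preference: str, replicas: List[Replica]) -> Optional[str]:
    """Return the name of the replica that should run the backup, or None."""
    candidates = [r for r in replicas if r.online and r.backup_priority > 0]
    primaries = [r for r in candidates if r.is_primary]
    secondaries = [r for r in candidates if not r.is_primary]

    if preference == "primary":
        pool = primaries
    elif preference == "secondary_only":
        pool = secondaries
    elif preference == "prefer_secondary":
        pool = secondaries or primaries  # fall back to the primary if needed
    elif preference == "any_replica":
        pool = candidates
    else:
        raise ValueError(f"unknown preference: {preference}")

    if not pool:
        return None
    return max(pool, key=lambda r: r.backup_priority).name


# Lab layout: SQL1 is the primary and SQL2 the secondary.
lab = [Replica("SQL1", is_primary=True), Replica("SQL2", is_primary=False)]
print(choose_backup_replica("prefer_secondary", lab))

# Same layout with the secondary offline.
degraded = [Replica("SQL1", is_primary=True), Replica("SQL2", is_primary=False, online=False)]
print(choose_backup_replica("prefer_secondary", degraded))
print(choose_backup_replica("secondary_only", degraded))
```

With Prefer Secondary, the sketch picks the secondary while it is online and falls back to
the primary otherwise, whereas Secondary only returns no replica when the secondary is down.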
Why run backups on secondary servers?
It allows SQL Server to reduce or eliminate resource contention between production activity
and backups.
Go to the secondary server and create a maintenance plan for database backups. For this
example (again, your backup plan depends on your strategy and SLA):
Daily full backup
Log backup every 6 hours
Give it a descriptive name
Add a Backup Database Task to the first subplan (Full Backup)
Open the properties of the Backup Database Task and select the Full backup type
Select the databases to back up
Specify the backup folder
Choose (Copy-only backup) and click OK
Note
A copy-only backup is a backup type that is independent of the sequence of standard SQL
Server backups, so it does not break the LSN chain of the backups. Because the backup in
this case runs on the replica servers, SQL Server prevents you from taking a normal full
backup there, to avoid breaking the LSN chain on the primary server.
Open the subplan (Log Backup) and add a Backup Database Task
Select the Transaction Log backup type
Select the databases to back up
Specify the backup folder
Click OK
Then save the maintenance plan
Note
You can run the log backup on any server, whether primary or secondary.
Try the following scenarios:
Scenario | Status
Run automated backups on the secondary server | Takes backups
Run automated backups on the primary server (make SQL Server 2 the primary server) | Doesn't take backups
Run automated backups on the primary server with no secondary server (stop SQL Server 1) | Takes backups
Ports
The table below shows the most common default port numbers used by SQL Server with
AlwaysOn:
Port | Protocol | Purpose
1433 | TCP | Default instance database engine (can be changed)
1434 | UDP | SQL Browser Service
5022 | TCP | AlwaysOn default port for primary and secondary replicas
389 | TCP/UDP | LDAP authentication
53 | TCP/UDP | DNS
3343 | TCP/UDP | Cluster network communication
For more information, see this article:
SQL Server: Frequently Used Ports
http://social.technet.microsoft.com/wiki/contents/articles/13106.sql-server-frequently-used-ports.aspx
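When a replica cannot connect, a quick reachability check of the TCP ports listed above can
narrow the problem down to the firewall. The following Python sketch uses only the standard
library; the function name is hypothetical, and the check covers TCP only, so it says
nothing about UDP ports such as 1434.

```python
import socket


def is_tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, calling is_tcp_port_open("SQL2", 5022) from SQL1 would indicate whether the
AlwaysOn endpoint port is reachable from the other node (SQL1 and SQL2 are the lab server
names; substitute your own hosts and ports).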
SQL Authentication
Normally, SharePoint connects to SQL Server using Windows authentication. With an AlwaysOn
implementation, just make sure that the SharePoint service accounts are added on all nodes
with the same roles and permissions.
But if you have an application that needs to connect to SQL Server using SQL authentication,
you will run into issues with AlwaysOn: a SQL login has a different internal SID on each
node even if you create the login with the same name, so SQL Server will consider that the
user doesn't have permission on the replica databases.
To fix this issue, SQL Server 2012 and later versions introduced a new database concept
called Contained Databases.
A contained database is a database that is isolated from the instance of SQL Server that
hosts it. User authentication in a contained database can be performed by the database
itself, reducing the database's dependency on the logins of the SQL Server instance. This
way, the SQL user (with its internal SID) is stored inside the database, so it is replicated
together with the database as part of its metadata.
Steps to configure a contained database:
Run the following script on both nodes to enable contained database authentication
EXEC sys.sp_configure N'contained database authentication', N'1'
GO
RECONFIGURE WITH OVERRIDE
GO
Go to the custom application database, right-click it, and choose Properties
Go to the Options tab and choose Partial for Containment type
If you get the below error, try this script
USE [master]
GO
ALTER DATABASE [ApplicationERP] SET CONTAINMENT = PARTIAL WITH NO_WAIT
GO
Go to the Security folder inside the database >> Users >> and click New User…
Enter a valid username and password, which will be stored inside the database
Grant it the right privileges (follow the principle of least privilege on production servers)
If you check the replica database, you will find the user replicated under the database on
the replica servers
Then you can use this SQL user in the application connection string, but don't forget
to enable SQL Server and Windows Authentication mode on all nodes
Use a normal connection string
<connectionStrings>
  <add name="ConnectionStringName"
       providerName="System.Data.SqlClient"
       connectionString="Data Source=SQL1;Initial Catalog=ApplicationERP;Integrated Security=False;User Id=test;Password=P@ssw0rd;"/>
</connectionStrings>
Cluster Quorum Models
A cluster contains nodes and resources, and the resources are highly available only while
enough nodes are up and running. The cluster requires more than half of the voting members
to be up and running; otherwise the cluster goes down. The quorum defines the number of
voting members (nodes, and possibly a disk witness or a file share witness as in our lab)
that must be online for the cluster to run. It also prevents scenarios in which nodes can't
communicate with each other and each node tries to own the resources at the same time. By
default, every node in a failover cluster has a vote in determining whether the cluster
continues running.
The value ‘0’ means the node doesn’t have a vote. The value ‘1’ means the node has a vote.
Each node in a WSFC cluster participates in periodic heartbeat communication to share the
node's health status with the other nodes.
Cluster quorum has four models, of which only the first three are recommended:
Node Majority: only nodes can vote
Node and File Share Majority: nodes and the file share witness can vote
Node and Disk Majority: nodes and the disk witness can vote
No Majority: Disk Only: only the disk witness can vote. This model was used prior to
Windows 2003, when only a disk witness quorum was supported.
To understand how these models are used, consider the following examples:
1. With 4 nodes (4 votes), if 1 node fails, the 3 remaining nodes are more than half of
the cluster votes, so the cluster stays running.
2. With 4 nodes (4 votes), if 2 nodes fail, the 2 remaining nodes are not more than half
of the cluster votes, so the cluster goes down.
3. To increase the availability of a 4-node cluster, add a file share or disk witness,
which also has a vote. With 4 nodes + 1 witness (5 votes), if 1 node fails the 4
remaining votes keep the cluster running, and if 2 nodes fail the 3 remaining votes
still keep it running. Adding a witness therefore increases availability cheaply,
without the need to purchase another server (node).
4. With 2 nodes and no file share or disk witness, if one node goes down the cluster
goes down.
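The examples above can be checked with a small calculation. The following Python sketch is illustrative only: it models a static majority rule (strictly more than half of all configured votes must be online) and does not call any cluster API. The function name has_quorum is hypothetical, and the sketch ignores any dynamic vote adjustment that the cluster itself may perform.

```python
def has_quorum(total_votes: int, online_votes: int) -> bool:
    """Return True when strictly more than half of all votes are online."""
    return online_votes > total_votes / 2


# (description, total votes, votes still online) for the examples above
scenarios = [
    ("4 nodes, 1 node fails", 4, 3),
    ("4 nodes, 2 nodes fail", 4, 2),
    ("4 nodes + witness, 1 node fails", 5, 4),
    ("4 nodes + witness, 2 nodes fail", 5, 3),
    ("2 nodes, no witness, 1 node fails", 2, 1),
]

for description, total, online in scenarios:
    state = "stays running" if has_quorum(total, online) else "goes down"
    print(f"{description}: {online}/{total} votes online, cluster {state}")
```

Only the two even-vote scenarios (2 of 4 and 1 of 2) lose quorum, which is the reason for preferring an odd number of votes.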
It is recommended to have an odd total number of votes, so that a clear majority (more than
half) can always be reached; the minimum is 2 nodes plus a file share or disk witness.
Which model to select?
By default, Failover Cluster Manager picks the best model based on the cluster configuration
and nodes. If there is an odd number of nodes, the cluster selects Node Majority. If there is
an even number of nodes and a file share witness, it selects Node and File Share Majority;
with a disk witness, it selects Node and Disk Majority. If there is an even number of nodes
and no disk or file share witness, the model is Node Majority, with warning messages.
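The selection rules described above can be written as a short decision function. This Python sketch only restates the rules from the text; it is not how Failover Cluster Manager is implemented. The function name and the witness labels "file_share" and "disk" are hypothetical, and treating an odd number of nodes as Node Majority even when a witness exists is an assumption, because the text does not cover that case.

```python
from typing import Optional, Tuple


def default_quorum_model(node_count: int, witness: Optional[str] = None) -> Tuple[str, bool]:
    """Return (model name, warning flag) following the rules in the text.

    witness is None, "file_share", or "disk" (hypothetical labels).
    """
    if witness not in (None, "file_share", "disk"):
        raise ValueError(f"unknown witness type: {witness!r}")
    if node_count % 2 == 1:
        # Assumption: an odd node count maps to Node Majority regardless of witness.
        return ("Node Majority", False)
    if witness == "file_share":
        return ("Node and File Share Majority", False)
    if witness == "disk":
        return ("Node and Disk Majority", False)
    # Even number of nodes and no witness: Node Majority, but with warnings.
    return ("Node Majority", True)


# A cluster with 2 nodes and a file share witness
print(default_quorum_model(2, "file_share"))  # ('Node and File Share Majority', False)
```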
If the cluster uses No Majority: Disk Only, the cluster has only 1 vote, and if the disk goes
down the whole cluster goes down.
Let us now check which model Failover Cluster Manager selected for our lab configuration: go
to the Availability Group Dashboard and click View Cluster Quorum Information.
The quorum model in this case is Node and File Share Majority, because we have 2 nodes and
1 file share witness, and all of them have votes.
So, to bring the cluster down in this case, 2 votes must go offline. In the image below, only
SQL 01 is online and the others are offline, so the cluster is down.
Note
While the cluster is down, you can access the database only by connecting directly to the
server name.
If only one voting member goes down (here I take the file share offline), 2 votes remain
online, which is more than half of the cluster votes.
In this case, the cluster is still running.
Disaster Recovery Plan
To extend the SharePoint topology to support a disaster recovery solution, let us add an
extra SQL Server in a different data center to the design below.
Here we added SQL Server 03 in a different data center to act as the disaster recovery
replica, with the following considerations and requirements for the Availability Group:
Join SQL Server 03 to the same Windows cluster
Use the same Active Directory domain
Use async mode for the DR replica to avoid latency and performance issues
No automatic failover
To join the server to the current Windows cluster, click Add Node…
Click Next
Add SQL Server 03 and click Next
After validating the node, choose the second option and click Next
Click Next
Click Finish
Then open the SQL Server service (in SQL Server Configuration Manager), right-click it, and
choose Properties
Enable AlwaysOn Availability Groups and click OK
Restart the SQL Server service
The Windows cluster now has 3 nodes.
Also consider modifying the quorum configuration so that the disaster recovery nodes do not
affect the quorum votes in the primary data center. To do this, assign a NodeWeight of '0' to
the DR nodes, so that an outage or disconnection of a DR node cannot affect quorum. You can
set this with PowerShell (for example, through the NodeWeight property of the node returned
by Get-ClusterNode) or with cluster.exe commands.
Now only the primary data center nodes can vote.
Note
To fail over to DR, change the vote weights: give the DR nodes a vote of '1' and the nodes in
the primary data center a vote of '0'. When reverting to the primary data center, do the
reverse.
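To see why removing the DR vote protects the primary data center, the vote weights can be modelled directly. This Python sketch is illustrative only and does not call any cluster API. The member names SQL1, SQL2, SQL3 and FileShareWitness are hypothetical labels, the witness is assumed to sit in the primary data center, and quorum is modelled as a static majority of the assigned votes.

```python
def has_quorum(weights, online):
    """Static majority: votes online must exceed half of all assigned votes."""
    total_votes = sum(weights.values())
    online_votes = sum(v for member, v in weights.items() if member in online)
    return online_votes > total_votes / 2


# Hypothetical member names; the witness is assumed to be in the primary data center.
dr_with_vote = {"SQL1": 1, "SQL2": 1, "FileShareWitness": 1, "SQL3": 1}
dr_without_vote = {"SQL1": 1, "SQL2": 1, "FileShareWitness": 1, "SQL3": 0}

# Failure 1: the link to the DR site is lost and SQL2 is down at the same time.
survivors = {"SQL1", "FileShareWitness"}
print("DR node votes:      ", has_quorum(dr_with_vote, survivors))     # False
print("DR node has 0 votes:", has_quorum(dr_without_vote, survivors))  # True

# Failure 2: the whole primary data center is lost; only SQL3 remains.
print("Primary site lost:  ", has_quorum(dr_without_vote, {"SQL3"}))   # False
```

In this model, the last case shows that a DR node with no vote cannot form quorum on its own after the primary site is lost, which is why failing over to DR involves changing the vote weights.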
Before adding this server as a replica, make sure to add the SharePoint service accounts,
configure Max Degree of Parallelism, and enable contained databases on the new node.
Then, to add this node to the AlwaysOn Availability Group, click Add Replica…
Click Next
Connect to the second node and click Next
Add SQL Server 03 in async mode (also make it readable) and click Next
Enter the shared folder used to move the backups to this node and click Next
Click Next
Click Finish
Now we have 3 replicas: 2 in the primary data center and 1 in the disaster recovery data
center.
You can also check the Cluster Quorum Information.
Recovering from a Disaster
If the primary data center goes down for any reason, these are the steps you can follow to
recover from the disaster.
Let us assume in our lab that SQL 01 and SQL 02 go down.
In the DR data center, the cluster will be down because quorum has been lost (the voting
members in the primary data center are down). In this state you can't access the AG listener
or Failover Cluster Manager, so the cluster must be started from the command line (this is
called force quorum mode).
Note
If the cluster service is running on the DR nodes, stop it first so that it can be started in
force quorum mode, by using this command:
Stop-ClusterNode -Name "SQL3"
Or go to services.msc and stop the Cluster service.
Open PowerShell and run this command:
Start-ClusterNode -Name "SQL3" -FixQuorum
Once the cluster service on the DR node has started, the availability group will show offline
in the Failover Cluster Manager, and cannot be brought online.
Change the votes for these nodes as follows:
Now, to bring the AG online, run the forced failover script on SQL Server 03 (in T-SQL, a
forced failover uses ALTER AVAILABILITY GROUP with the FORCE_FAILOVER_ALLOW_DATA_LOSS
option). In this case you need to accept the risk of data loss, because some data or log
records may not have been replicated from the primary data center before the failure
occurred.
Check now
Now you can access the database using the Availability Group listener.
If you check the dashboard, you can ignore the warning messages related to the nodes in the
primary data center.
Revert back to Primary data center
To switch back to the primary servers, follow these steps.
Let us assume that the nodes in the primary data center are up and running again and have
reconnected to the Windows cluster.
Note that the databases on the primary nodes are not synchronized.
Before doing anything, evaluate the situation: decide whether you want to take a tail-log
backup from the primary databases or resync from the DR replica server. In this case we will
resync from the DR node.
First change the votes for these nodes.
Run these commands to assign the new votes
After you run the commands, only the primary data center nodes have votes.
Then, to resume data movement from the DR node to the primary data center, run the resume
script for each database.
Do the same on the second node. The databases on the primary nodes are now synchronizing.
Change the availability mode of SQL Server 03 from async commit to sync commit, so that the
databases are fully synchronized before you fail over.
Now the databases are in sync commit mode in all nodes.
Then fail over to SQL Server 01 by running the failover command on SQL Server 01
Everything is now back to the way it was before the disaster happened.
Change the DR node back to async commit mode.
If you now check the dashboard, you will find that the node in the primary data center has
taken the primary role.
General SharePoint Considerations
These are important points to consider with remote disaster recovery solutions:
Not all SharePoint databases are supported in async mode, for example the SharePoint
Configuration and Central Administration databases. These databases should be separate in
each data center because they store information about the servers, so it is better not to
sync them in DR solutions.
For more information:
http://www.harbar.net/archive/2014/03/20/Support-for-SQL-Server-Always-On-Async-Replication-with-SharePoint.aspx
https://docs.microsoft.com/en-us/SharePoint/administration/supported-high-availability-and-disaster-recovery-options-for-sharepoint-databas
When you add a new site collection to a database that is replicated to the disaster recovery
farm, make sure to update the configuration database in the disaster recovery farm to
register the newly created site collection, because it is not updated automatically. You can
update it using the following PowerShell:
$db = Get-SPDatabase | where {$_.Name -eq "DatabaseName"}
$db.RefreshSitesInConfigurationDatabase()
In a multi-subnet failover cluster, consider using the MultiSubnetFailover property for all
SharePoint databases to make connections across subnets more stable and to avoid connection
timeout issues.
For more information:
http://blogs.msdn.com/b/sambetts/archive/2015/02/13/multi-subnet-sql-server-clusters-sharepoint-2013-spdatabase-multisubnetfailover.aspx
Thank You
Thanks for reading this whitepaper. Again, I really hope it has been informative and that it
will help you increase SharePoint high availability. For any questions or comments, send me
an email at [email protected].