Exchange Server 2013 High Availability | Site Resilience
Scott Schnoll, Principal Technical Writer, Microsoft Corporation

Upload: microsoft-technet-belgium-and-luxembourg
Posted on 15-Jan-2015

DESCRIPTION

More info on http://techdays.be.

TRANSCRIPT

Page 1: Exchange Server 2013 High Availability - Site Resilience

Exchange Server 2013 High Availability | Site Resilience
Scott Schnoll, Principal Technical Writer, Microsoft Corporation

Page 2: Exchange Server 2013 High Availability - Site Resilience

Agenda

Storage

High Availability

Site Resilience

Page 3: Exchange Server 2013 High Availability - Site Resilience

Storage

Page 4: Exchange Server 2013 High Availability - Site Resilience

Storage Challenges

Capacity is increasing, but IOPS are not
Database sizes must be manageable
Reseeds must be fast and reliable
Passive copy IOPS are inefficient
Lagged copies have asymmetric storage requirements
Low agility from low disk space recovery

Page 5: Exchange Server 2013 High Availability - Site Resilience

Storage Innovations

Multiple Databases Per Volume
Automatic Reseed
Automatic Recovery from Storage Failures
Lagged Copy Enhancements

Page 6: Exchange Server 2013 High Availability - Site Resilience

Multiple databases per volume

Page 7: Exchange Server 2013 High Availability - Site Resilience

Multiple Databases Per Volume

[Diagram: a 4-member DAG with active, passive, and lagged copies of databases DB1–DB4 distributed across all four members]

4-member DAG
4 databases
4 copies of each database
4 databases per volume

Symmetrical design with balanced activation preference

Number of copies per database = number of databases per volume

Page 8: Exchange Server 2013 High Availability - Site Resilience

Multiple Databases Per Volume

[Diagram: active copy of DB1 seeding a single passive copy on another disk at 20 MB/s]

Single database copy per disk:
Reseed 2TB database = ~23 hrs
Reseed 8TB database = ~93 hrs

Page 9: Exchange Server 2013 High Availability - Site Resilience

Multiple Databases Per Volume

[Diagram: four database copies per disk; the copies on a failed disk reseed in parallel from different source servers at 12–20 MB/s each]

Single database copy per disk:
Reseed 2TB database = ~23 hrs
Reseed 8TB database = ~93 hrs

Four database copies per disk:
Reseed 2TB disk = ~9.7 hrs
Reseed 8TB disk = ~39 hrs

Page 10: Exchange Server 2013 High Availability - Site Resilience

Multiple Databases Per Volume

Requirements

Single logical disk/partition per physical disk

Best Practices

Same neighbors on all servers

Balance activation preferences

Database copies per volume = copies per database
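Balanced activation preferences can be configured when database copies are added. A minimal sketch in the Exchange Management Shell (database name, server names, and preference values are illustrative):

```powershell
# Add copies of DB1 so each server carries a different activation
# preference, spreading active databases evenly after failovers.
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX2 -ActivationPreference 2
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX3 -ActivationPreference 3
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX4 -ActivationPreference 4
```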

Page 11: Exchange Server 2013 High Availability - Site Resilience

Autoreseed

Page 12: Exchange Server 2013 High Availability - Site Resilience

Seeding Challenges

Disk failure on active copy = database failover
Failed disk and database corruption issues need to be addressed quickly
Fast recovery to restore redundancy is needed

Page 13: Exchange Server 2013 High Availability - Site Resilience

Seeding Innovations

Automatic Reseed (Autoreseed) - use spares to automatically restore database redundancy after a disk failure

Page 14: Exchange Server 2013 High Availability - Site Resilience

Autoreseed

[Diagram: in-use storage disks and spare disks; a failed in-use disk (X) is replaced from the spares]

Page 15: Exchange Server 2013 High Availability - Site Resilience

Autoreseed Workflow

Periodically scan for failed and suspended copies
Check prerequisites: single copy, spare availability
Allocate and remap a spare
Start the seed
Verify that the new copy is healthy
Admin replaces failed disk

Page 16: Exchange Server 2013 High Availability - Site Resilience

Autoreseed Workflow

1. Detect a copy in an F&S (FailedAndSuspended) state for 15 min in a row
2. Try to resume the copy 3 times (with 5 min sleeps in between)
3. Try assigning a spare volume 5 times (with 1 hour sleeps in between)
4. Try InPlaceSeed with SafeDeleteExistingFiles 5 times (with 1 hour sleeps in between)
5. Once all retries are exhausted, the workflow stops
6. If 3 days have elapsed and the copy is still F&S, the workflow state is reset and starts again from Step 1
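The copies this workflow acts on can be found manually with Get-MailboxDatabaseCopyStatus; a quick check, assuming a Mailbox server named MBX1 (the server name is illustrative):

```powershell
# List database copies in the FailedAndSuspended state on one server.
Get-MailboxDatabaseCopyStatus -Server MBX1 |
    Where-Object { $_.Status -eq 'FailedAndSuspended' } |
    Format-Table Name, Status, ContentIndexState -AutoSize
```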

Page 17: Exchange Server 2013 High Availability - Site Resilience

Autoreseed Workflow

Prerequisites
Copy is not ReseedBlocked or ResumeBlocked
Logs and EDB files are on the same volume
Database and log folder structure matches the required naming convention
No active copies on the failed volume
All copies are F&S on the failed volume
No more than 8 F&S copies on the server (if so, we might be in a controller failure situation)

For InPlaceSeed
If an EDB file exists, wait 2 days before in-place reseeding (based on the LastWriteTime of the EDB file)
Only up to 10 concurrent seeds are allowed
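The in-place seed step the workflow performs can also be run by hand. A sketch, assuming database DB1 with a failed copy on MBX2 (names illustrative; -SafeDeleteExistingFiles is an Exchange 2013 parameter of Update-MailboxDatabaseCopy):

```powershell
# Reseed the failed copy in place, safely deleting the existing files
# first — the manual equivalent of Autoreseed's InPlaceSeed step.
Update-MailboxDatabaseCopy -Identity "DB1\MBX2" -SafeDeleteExistingFiles
```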

Page 18: Exchange Server 2013 High Availability - Site Resilience

Autoreseed

Configure the storage subsystem with spare disks

Create the DAG, and add servers with configured storage

Create directories and mount points

Configure the three AutoDag properties

Create mailbox databases and database copies

[Diagram: C:\ExchDbs holds database mount points (MDB1, MDB2); C:\ExchVols holds volume mount points (Vol1–Vol3); each volume contains MDB1.DB and MDB1.log folders]

AutoDagDatabasesRootFolderPath
AutoDagVolumesRootFolderPath
AutoDagDatabaseCopiesPerVolume = 1
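The three AutoDag properties are set on the DAG object; a sketch, assuming a DAG named DAG1 and the folder paths shown above:

```powershell
# Point Autoreseed at the database and volume root folders, and declare
# how many database copies live on each volume.
Set-DatabaseAvailabilityGroup -Identity DAG1 `
    -AutoDagDatabasesRootFolderPath "C:\ExchDbs" `
    -AutoDagVolumesRootFolderPath "C:\ExchVols" `
    -AutoDagDatabaseCopiesPerVolume 1
```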

Page 19: Exchange Server 2013 High Availability - Site Resilience

Autoreseed

Requirements
Single logical disk/partition per physical disk
A specific database and log folder structure must be used

Recommendations
Same neighbors on all servers
Databases per volume should equal the number of copies per database
Balance activation preferences

Configuration instructions at http://aka.ms/autoreseed

Page 20: Exchange Server 2013 High Availability - Site Resilience

Autoreseed

Numerous fixes in CU1
Autoreseed not detecting spare disks correctly, or not using detected disks

GetCopyStatus has a new field 'ExchangeVolumeMountPoint' which shows the mount point of the database volume under C:\ExchangeVolumes

Better tracking around mount path and ExchangeVolume path

Increased autoreseed copy limits (previously 4, now 8)
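The new field can be inspected from the copy status output; a sketch, assuming database DB1 with a copy on MBX1 (names illustrative; the wildcard picks up the mount point fields without assuming their exact set):

```powershell
# Show the volume mount point fields, including the CU1-added
# ExchangeVolumeMountPoint, for one database copy.
Get-MailboxDatabaseCopyStatus -Identity "DB1\MBX1" |
    Format-List Name, *VolumeMountPoint*
```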

Page 21: Exchange Server 2013 High Availability - Site Resilience

Automatic Recovery From Storage Failures

Page 22: Exchange Server 2013 High Availability - Site Resilience

Storage Challenges

Storage controllers are essentially mini-PCs and they can crash/hang

Other operator-recoverable conditions can occur
Loss of vital system elements
Hung or highly latent IO

Page 23: Exchange Server 2013 High Availability - Site Resilience

Storage Innovations

Exchange Server 2013 includes functionality to automatically recover from a variety of new storage-related failures

Innovations added in Exchange 2010 also carried forward

Even more behaviors added in CU1

Page 24: Exchange Server 2013 High Availability - Site Resilience

Automatic Recovery from Storage Failures

Exchange Server 2010

ESE Database Hung IO (240s)

Failure Item Channel Heartbeat (30s)

SystemDisk Heartbeat (120s)

Exchange Server 2013

System Bad State (302s)

Long I/O times (41s)

MSExchangeRepl.exe memory threshold (4GB)

System Bus Reset (Event 129)

Replication service endpoints not responding

Page 25: Exchange Server 2013 High Availability - Site Resilience

Lagged Copy Challenges

Page 26: Exchange Server 2013 High Availability - Site Resilience

Lagged Copy Challenges

Activation is difficult
Require manual care
Cannot be page patched

Page 27: Exchange Server 2013 High Availability - Site Resilience

Lagged Copy Innovations

Automatic play down of log files in critical situations
Integration with Safety Net

Page 28: Exchange Server 2013 High Availability - Site Resilience

Lagged Copy Innovations

Automatic log play down
Low disk space (enable in registry)
Page patching (enabled by default)
Fewer than 3 other healthy copies (enable in AD; configure in registry)

Simpler activation with Safety Net
No need for log surgery or hunting for the point of corruption
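With Safety Net integration, activating a lagged copy no longer requires replaying logs to a chosen point. A sketch of one common activation path, assuming DB1 has a lagged copy on MBX4 (names illustrative; -SkipLagChecks is an Exchange 2013 parameter of Move-ActiveMailboxDatabase):

```powershell
# Suspend the lagged copy, then activate it, skipping the lag checks;
# missing messages are resubmitted from Safety Net after the mount.
Suspend-MailboxDatabaseCopy -Identity "DB1\MBX4" -Confirm:$false
Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer MBX4 -SkipLagChecks
```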

Page 29: Exchange Server 2013 High Availability - Site Resilience

High Availability

Page 30: Exchange Server 2013 High Availability - Site Resilience

High Availability Challenges

High availability focuses on database health
Best copy selection is insufficient for the new architecture
Management challenges around maintenance and DAG network configuration

Page 31: Exchange Server 2013 High Availability - Site Resilience

High Availability Innovations

Managed Availability
Best Copy and Server Selection
DAG Network Autoconfig

Page 32: Exchange Server 2013 High Availability - Site Resilience

Managed Availability

Page 33: Exchange Server 2013 High Availability - Site Resilience

Managed Availability

Key tenet for Exchange 2013
All access to a mailbox is provided by the protocol stack on the Mailbox server that hosts the active copy of the user’s mailbox
If a protocol is down on a Mailbox server, all active databases lose access via that protocol

Managed Availability was introduced to detect these kinds of failures and automatically correct them
For most protocols, quick recovery is achieved via a restart action
If the restart action fails, a failover can be triggered

Page 34: Exchange Server 2013 High Availability - Site Resilience

Managed Availability

An internal framework used by component teams

Sequencing mechanism to control when recovery actions are taken versus alerting and escalation

Includes a mechanism for taking servers in/out of service (maintenance mode)

Enhancement to best copy selection algorithm

Page 35: Exchange Server 2013 High Availability - Site Resilience

Managed Availability

MA failovers come in two forms
Server: protocol failure can trigger a server failover
Database: Store-detected database failure can trigger a database failover

MA includes the Single Copy Alert
Alert is per-server to reduce alert flow
Still triggered across all machines with copies
Monitoring triggered through a notification
Logs 4138 (red) and 4139 (green) events based on 4113 (red) and 4114 (green) events
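The red/green events can be queried with Get-WinEvent; a sketch, with the event IDs taken from the slide but the crimson channel name an assumption that may differ per deployment:

```powershell
# Pull the most recent database redundancy alert events (4138 = red,
# 4139 = green) from the high availability monitoring channel.
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Exchange-HighAvailability/Monitoring'
    Id      = 4138, 4139
} -MaxEvents 20
```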

Page 36: Exchange Server 2013 High Availability - Site Resilience

Best Copy and Server Selection

Page 37: Exchange Server 2013 High Availability - Site Resilience

Best Copy Selection Challenges

Process for finding the “best” copy of a specific database to activate

Exchange 2010 uses several criteria
Copy queue length
Replay queue length
Database copy status – including activation blocked
Content index status

Not good enough for Exchange Server 2013, because protocol health is not considered

Page 38: Exchange Server 2013 High Availability - Site Resilience

Best Copy and Server Selection

Still an Active Manager algorithm, performed at failover time based on extracted health of the system
Replication health still determined by the same criteria and phases

Criteria now include the health of the entire protocol stack
Considers a prioritized protocol health set in the selection
Four priorities – critical, high, medium, low (all health sets have a priority)
Failover responders trigger added checks to select a “protocol not worse” target

Page 39: Exchange Server 2013 High Availability - Site Resilience

Best Copy and Server Selection

All Healthy
Checks for a server hosting a copy that has all health sets in a healthy state

Up to Normal Healthy
Checks for a server hosting a copy that has all health sets Medium and above in a healthy state

All Better than Source
Checks for a server hosting a copy that has health sets in a state that is better than the current server hosting the affected copy

Same as Source
Checks for a server hosting a copy of the affected database that has health sets in a state that is the same as the current server hosting the affected copy
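The health-set states these checks consume can be inspected with the Exchange 2013 Get-ServerHealth cmdlet; a sketch, with the server name illustrative:

```powershell
# Summarize the health sets on a Mailbox server, grouped by alert state,
# to see roughly what best copy and server selection would see.
Get-ServerHealth -Identity MBX1 |
    Group-Object AlertValue |
    Format-Table Name, Count -AutoSize
```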

Page 40: Exchange Server 2013 High Availability - Site Resilience

DAG Network Autoconfig

Page 41: Exchange Server 2013 High Availability - Site Resilience

DAG Network Autoconfig

Automatic or manual DAG network configuration
Default is automatic
Requires specific configuration settings on MAPI and Replication network interfaces

Manual edits and EAC controls are blocked when automatic networking is enabled
Set the DAG to manual network setup to edit or change DAG networks

DAG networks are automatically collapsed in a multi-subnet environment
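Switching a DAG out of automatic network configuration is a single property change; a sketch, assuming a DAG named DAG1:

```powershell
# Enable manual DAG network configuration so DAG networks can be
# edited in the EAC or with the *-DatabaseAvailabilityGroupNetwork cmdlets.
Set-DatabaseAvailabilityGroup -Identity DAG1 -ManualDagNetworkConfiguration $true
```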

Page 42: Exchange Server 2013 High Availability - Site Resilience
Page 43: Exchange Server 2013 High Availability - Site Resilience

Site Resilience

Page 44: Exchange Server 2013 High Availability - Site Resilience

Site Resilience Challenges

Operationally complex
Mailbox and Client Access recovery connected
Namespace is a SPOF (single point of failure)

Page 45: Exchange Server 2013 High Availability - Site Resilience

Site Resilience Innovations

Operationally simplified
Mailbox and Client Access recovery independent
Namespace provides redundancy

Page 46: Exchange Server 2013 High Availability - Site Resilience

Site Resilience

Key characteristics
DNS resolves to multiple IP addresses
Almost all protocol access in Exchange 2013 is HTTP
HTTP clients have built-in IP failover capabilities
Clients skip past IPs that produce hard TCP failures
Admins can switch over by removing a VIP from DNS
Namespace no longer a SPOF
No dealing with DNS latency
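Resolving one namespace to multiple VIPs is just two A records; a sketch using the Windows DnsServer module (zone name and addresses are illustrative, matching the later diagrams):

```powershell
# Publish both datacenter VIPs under the single mail.contoso.com
# namespace, so HTTP clients can fail over between them on their own.
Add-DnsServerResourceRecordA -ZoneName "contoso.com" -Name "mail" -IPv4Address "192.168.1.50"
Add-DnsServerResourceRecordA -ZoneName "contoso.com" -Name "mail" -IPv4Address "10.0.1.50"
```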

Page 47: Exchange Server 2013 High Availability - Site Resilience

Site Resilience

Previously, loss of a CAS, the CAS array, the VIP, the load balancer, or some portion of the DAG required the admin to perform a datacenter switchover

In Exchange Server 2013, recovery happens automaticallyThe admin focuses on fixing the issue, instead of restoring service

Page 48: Exchange Server 2013 High Availability - Site Resilience

Site Resilience

Previously, CAS and Mailbox server recovery were tied together in site recoveries

In Exchange Server 2013, recovery is independent, and may come automatically in the form of failoverThis is dependent on the customer’s business requirements and configuration

Page 49: Exchange Server 2013 High Availability - Site Resilience

Site Resilience

With the namespace simplification, consolidation of server roles, separation of CAS array and DAG recovery, de-coupling of CAS and Mailbox by AD site, and load balancing changes…

if available, three locations can simplify mailbox recovery in response to datacenter-level events

Page 50: Exchange Server 2013 High Availability - Site Resilience

Site Resilience

You must have at least three locations
Two locations with Exchange; one with the witness server

Exchange sites must be well-connected

Witness server site must be isolated from network failures affecting Exchange sites

Page 51: Exchange Server 2013 High Availability - Site Resilience

primary datacenter: Redmond | alternate datacenter: Portland

Site Resilience

[Diagram: cas1 and cas2 in Redmond behind VIP 192.168.1.50; cas3 and cas4 in Portland behind VIP 10.0.1.50]

mail.contoso.com: 192.168.1.50, 10.0.1.50

Page 52: Exchange Server 2013 High Availability - Site Resilience

primary datacenter: Redmond | alternate datacenter: Portland

Site Resilience

[Diagram: the Redmond VIP 192.168.1.50 has failed (X); the Portland VIP 10.0.1.50 remains in service]

mail.contoso.com: 192.168.1.50, 10.0.1.50

Removing the failing IP from DNS puts you in control of the in-service time of the VIP
With multiple VIP endpoints sharing the same namespace, if one VIP fails, clients automatically fail over to the alternate VIP(s)

mail.contoso.com: 10.0.1.50

Page 53: Exchange Server 2013 High Availability - Site Resilience

primary datacenter: Redmond | alternate datacenter: Portland | third datacenter: Paris

Site Resilience

[Diagram: dag1 spans mbx1 and mbx2 in Redmond (failed, X) and mbx3 and mbx4 in Portland; the witness server is in Paris]

Assuming MBX3 and MBX4 are operating and one of them can lock the witness.log file, automatic failover should occur

Page 54: Exchange Server 2013 High Availability - Site Resilience

primary datacenter: Redmond | alternate datacenter: Portland

Site Resilience

[Diagram: dag1 with its witness; mbx1, mbx2, and the witness have failed (XXX), leaving mbx3 and mbx4 in Portland without quorum]

Page 55: Exchange Server 2013 High Availability - Site Resilience

primary datacenter: Redmond | alternate datacenter: Portland

Site Resilience

[Diagram: dag1; Redmond (X) and the original witness are down; mbx3 and mbx4 in Portland are activated using the alternate witness]

1. Mark the failed servers/site as down: Stop-DatabaseAvailabilityGroup DAG1 –ActiveDirectorySite:Redmond
2. Stop the Cluster service on the remaining DAG members: Stop-Clussvc
3. Activate the DAG members in the second datacenter: Restore-DatabaseAvailabilityGroup DAG1 –ActiveDirectorySite:Portland
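The switchover steps above, as they might be run in the Exchange Management Shell (DAG and site names are illustrative; -ConfigurationOnly is used because the failed servers are unreachable, and the Cluster service is stopped via Stop-Service):

```powershell
# 1. Mark the failed Redmond servers as down, updating only AD/config
#    state since the servers themselves cannot be contacted.
Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Redmond -ConfigurationOnly

# 2. Stop the Cluster service on each remaining DAG member.
Stop-Service ClusSvc

# 3. Activate the surviving members in Portland, shrinking quorum and
#    switching to the alternate witness if one is configured.
Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Portland
```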

Page 56: Exchange Server 2013 High Availability - Site Resilience

Questions?

Scott Schnoll, Principal Technical Writer
[email protected]
http://aka.ms/schnoll
