Scott Schnoll
Principal Technical Writer
Microsoft Corporation
Session Code: UNC378
Agenda
Site resilience requirements
Planning and designing for site resilience
Site activation steps
Switchback steps
Site resilience requirements
Business requirements drive site resilience
Risk assessment reveals a high-impact threat to meeting SLAs in terms of data loss and loss of availability
Site resilience required to mitigate the risk
Business requirements dictate low recovery point objective (RPO) and recovery time objective (RTO)
Site resilience requirements
Ensuring business continuity brings expense and complexity
Site switchover is a coordinated effort, and it takes practice to ensure the real event is handled well
Exchange admins
Admins from other services (AD, DNS, ISA, etc.)
Exchange 2010 makes it as simple as possible
Low-impact testing can be performed with cross-site single database switchover
Technology is only half the picture in any site resilience solution, but will be the primary focus of this session
Exchange 2007 site resilience choices
CCR+SCR and /recoverCMS
SCC+SCR and /recoverCMS
CCR stretched across datacenters
SCR and database portability
SCR and /m:RecoverServer
SCC stretched across datacenters with synchronous replication
Exchange 2010 makes it simpler
Database Availability Group (DAG) with members in different datacenters
Supports automatic and manual cross-site database switchovers and failovers (*overs)
No stretched Active Directory site
No special networking needed
No /recoverCMS
Suitability of site resilience solutions
Solution                         RTO goal   RPO goal   Deployment complexity
Ship backups and restore         High       High       Low
Standby Exchange 2003 clusters   Moderate   Low        High
CCR+SCR in separate AD sites     Moderate   Low        Moderate
CCR in a stretched AD site       Low        Low        High
Exchange 2010 DAGs               Low        Low        Low
Designing a site resilient solution: Namespace planning
Each datacenter should be considered active when planning for namespaces
Each datacenter needs the following namespaces
OWA/OA/EWS/EAS namespace
POP/IMAP namespace
RPC Client Access namespace
SMTP namespace
In addition, one of the datacenters will maintain the autodiscover namespace
Designing a site resilient solution: Namespace planning
Best Practice: Use Split DNS for Exchange hostnames used by clients
Goal: minimize number of hostnames
mail.contoso.com for Exchange connectivity on intranet and Internet
mail.contoso.com has different IP addresses in intranet/Internet DNS
Important – before moving down this path, be sure to map out all the host names (outside of Exchange) that you will want to create in the internal zone
Designing a site resilient solution: Namespace planning
[Diagram: Datacenter 1 and Datacenter 2, each with CAS, HT, MBX, and AD servers. One datacenter uses the contoso.com names and the other uses the region.contoso.com names.]
Datacenter using the contoso.com names
Internal DNS: mail.contoso.com, pop.contoso.com, imap.contoso.com, autodiscover.contoso.com, smtp.contoso.com, outlook.contoso.com
External DNS: mail.contoso.com, pop.contoso.com, imap.contoso.com, autodiscover.contoso.com, smtp.contoso.com
Exchange config: ExternalURL = mail.contoso.com, CAS Array = outlook.contoso.com, OA endpoint = mail.contoso.com
Datacenter using the region.contoso.com names
Internal DNS: mail.region.contoso.com, pop.region.contoso.com, imap.region.contoso.com, smtp.region.contoso.com, outlook.region.contoso.com
External DNS: mail.region.contoso.com, pop.region.contoso.com, imap.region.contoso.com, smtp.region.contoso.com
Exchange config: ExternalURL = mail.region.contoso.com, CAS Array = outlook.region.contoso.com, OA endpoint = mail.region.contoso.com
Best Practices: Network
Latency: less than 250 ms round trip
Router ACLs should be used to block traffic between the MAPI and replication networks
If DHCP is used for the replication network, DHCP can also be used to assign the static routes
Lower TTL for all Exchange records to 5 minutes
OWA/EAS/EWS/OA, IMAP/POP, SMTP, RPC Client Access
In both the internal and external DNS zones
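The 5-minute TTL recommendation can be checked against an inventory of DNS records before a switchover is ever needed. The following is a minimal sketch, assuming a hand-maintained list of records; the record list, the TTL values, and the helper name `records_over_ttl` are hypothetical and are not taken from the session.

```python
# Target TTL from the slide: 5 minutes, expressed in seconds.
MAX_TTL_SECONDS = 5 * 60

# Hypothetical inventory: (hostname, zone, TTL in seconds).
records = [
    ("mail.contoso.com", "internal", 3600),
    ("mail.contoso.com", "external", 300),
    ("smtp.contoso.com", "external", 900),
    ("outlook.contoso.com", "internal", 300),
]


def records_over_ttl(records, max_ttl=MAX_TTL_SECONDS):
    """Return the records whose TTL is higher than the target."""
    return [record for record in records if record[2] > max_ttl]


for host, zone, ttl in records_over_ttl(records):
    print(f"{host} ({zone}): TTL {ttl}s exceeds {MAX_TTL_SECONDS}s")
```

Records flagged here would keep clients pointed at the failed datacenter for longer than 5 minutes after the A records are changed.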
Best Practices: Exchange servers
Number of mailbox servers
Secondary site should have fewer servers than the primary site
How many servers?
Live Active Directory and Hub/CAS in the secondary site, even if activation is blocked
Designing a site resilient solution: a real example
At first, there was Pioneer, with ~450 mailboxes
Half active on MBX60, half active on MBX61, all passive on MBX62
The Failover site was added for site resilience
Copies on MBX63 were set up as lagged passives
They could have been activation blocked
One new database active on 63, with a passive copy on 62
[Diagram: DAG DF-DAG-03 spans the Pioneer datacenter (MBX60, MBX61, MBX62) and the Failover datacenter (MBX63)]
Designing for site resilience: Mailbox
Seed from passive could be used if we added another server in Failover
Witness server is in Pioneer
Alternate witness added
Routable DAG networks for heartbeating
Router ACLs are needed to prevent crosstalk between networks
DAC mode was enabled
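The placement of the witness server and the smaller server count in the secondary site both follow from majority-based quorum. The sketch below is a simplified model, assuming one vote per DAG member plus one witness vote when the member count is even; the function names are hypothetical, and real cluster behavior has more detail than this.

```python
def votes_needed(total_votes):
    """A majority of all configured votes."""
    return total_votes // 2 + 1


def has_quorum(reachable_members, total_members, witness_reachable):
    """Simplified majority check for one side of a network partition."""
    # Assumption: the witness only votes when the member count is even.
    uses_witness = total_members % 2 == 0
    total_votes = total_members + (1 if uses_witness else 0)
    votes = reachable_members
    if uses_witness and witness_reachable:
        votes += 1
    return votes >= votes_needed(total_votes)


# Four members: three in Pioneer (with the witness), one in Failover.
# If the link between the sites fails:
print(has_quorum(3, 4, witness_reachable=True))   # Pioneer side: True
print(has_quorum(1, 4, witness_reachable=False))  # Failover side: False
```

In this model the Failover side can never reach a majority on its own, so quorum has to be forced there during a datacenter switchover.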
Designing for site resilience: Mailbox
Datacenter Activation Mode
When servers are first powered on
They attempt to contact a server that is already allowed to mount databases, and ask it for permission to mount
Alternatively, they must be able to contact all servers on the started list
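The two startup rules can be written as a small decision function. This is a sketch of the logic as described on the slide, not of the actual implementation; the function name and the shape of its arguments are assumptions made for illustration.

```python
def may_mount_databases(local, started_servers, reachable_peers):
    """Decide whether `local` may mount databases after powering on.

    started_servers: names on the DAG's started list.
    reachable_peers: maps each peer the local server can contact to True
                     if that peer is already allowed to mount databases.
    """
    # Rule 1: a reachable peer that can already mount grants permission.
    if any(reachable_peers.values()):
        return True
    # Rule 2: every other server on the started list can be contacted.
    others = set(started_servers) - {local}
    return others <= set(reachable_peers)


started = {"MBX60", "MBX61", "MBX62", "MBX63"}

# Pioneer servers power back on but cannot reach MBX63 in Failover.
print(may_mount_databases("MBX60", started,
                          {"MBX61": False, "MBX62": False}))  # False

# MBX63 is reachable and already mounting databases.
print(may_mount_databases("MBX60", started,
                          {"MBX61": False, "MBX62": False, "MBX63": True}))  # True
```

The first case is the one that matters for split brain: servers in a recovering primary datacenter stay dismounted until they can talk to the datacenter that took over.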
Designing for site resilience: Transport and UM
Hub Transport servers are site-aware and deliver messages to the right site
Shadow redundancy retains messages at the previous cross-site hop as well
UM servers can be configured with the same dial plan across sites
Some VoIP gateways may allow specifying a DNS name rather than an IP address
[Diagram: Pioneer and Failover datacenters with DF-DAG-03 and servers HT01, HT02, UM03, UM04, UCH01, UCH02]
Designing for site resilience: CAS
RPC Client Access service establishes an RPC endpoint for client access on the CAS role
With the RPC Client Access service, a single database failure results in client redirection or proxying
[Diagram: Pioneer and Failover datacenters with DF-DAG-03, Hubs and UMs, and servers C06, C07, UCH01, UCH02]
Designing for site resilience: Certificates
Use a Subject Alternative Name (SAN) certificate, which can cover multiple hostnames
Best practice: minimize the number of certificates
1 certificate for all CAS servers + reverse proxy + Edge/Hub
1 additional certificate if using OCS
If leveraging a certificate per datacenter, then ensure that the Certificate Principal Name is the same on all certificates
Set-OutlookProvider EXPR -CertPrincipalName msstd:pioneer.exchange.microsoft.com
OCS requires certificates with <= 1024-bit keys, and the server name in the certificate principal name
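Because a single SAN certificate is meant to cover every client-facing hostname, it helps to compare the planned namespaces against the names on the certificate. A minimal sketch follows, assuming exact-match comparison only (no wildcard handling); the hostname lists and the helper name `uncovered_hostnames` are hypothetical.

```python
def uncovered_hostnames(required, san_entries):
    """Return required hostnames that no SAN entry matches exactly."""
    # DNS names are case-insensitive, so compare in lower case.
    san = {name.lower() for name in san_entries}
    return sorted(host for host in required if host.lower() not in san)


# Hypothetical namespace plan and certificate contents.
required = [
    "mail.contoso.com",
    "pop.contoso.com",
    "imap.contoso.com",
    "autodiscover.contoso.com",
    "smtp.contoso.com",
]
san_entries = [
    "Mail.contoso.com",
    "Autodiscover.contoso.com",
    "Imap.contoso.com",
    "Smtp.contoso.com",
]

print(uncovered_hostnames(required, san_entries))  # ['pop.contoso.com']
```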
Site Resilience Recommendations
Design High Availability for Dependencies
Active Directory
Network services (DNS, TCP/IP, etc.)
Telephony services (Unified Messaging)
Backup services
Infrastructure (power, cooling, etc.)
Site Resilience Recommendations
Make sure the Directory Services operations staff fully understands recovery scenario and procedures
Script DNS changes
Log on with the right credentials
Verify machines are ready for use
Test activation periodically – validate, validate, validate!
Datacenter Switchover Process
Failure occurs
Activation decision
Terminate partially running primary datacenter
Activate secondary datacenter
Validate prerequisites
Activate mailbox servers
Activate other roles (in parallel with previous step)
Service is restored
Site Resilience Tasks
Stop-DatabaseAvailabilityGroup
Adds servers to stopped list
Removes servers from started list
Force cleanup stopped nodes
Restore-DatabaseAvailabilityGroup
Force quorum
Evict stopped nodes
Start using alternate file share witness if necessary
Start-DatabaseAvailabilityGroup
Remove servers from stopped list
Join servers to cluster
Add joined servers to started list
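The bookkeeping that the three tasks perform on the started and stopped lists can be traced with a toy model. This sketch only tracks list and cluster membership as listed above; it does not call Exchange, and the class and method names are hypothetical.

```python
class DagState:
    """Toy model of a DAG's started list, stopped list, and cluster nodes."""

    def __init__(self, members):
        self.started = set(members)
        self.stopped = set()
        self.cluster = set(members)

    def stop(self, servers):
        # Models Stop-DatabaseAvailabilityGroup: move to the stopped list.
        servers = set(servers)
        self.stopped |= servers
        self.started -= servers

    def restore(self):
        # Models Restore-DatabaseAvailabilityGroup: evict stopped nodes.
        self.cluster -= self.stopped

    def start(self, servers):
        # Models Start-DatabaseAvailabilityGroup: rejoin and mark started.
        servers = set(servers)
        self.stopped -= servers
        self.cluster |= servers
        self.started |= servers


pioneer = {"MBX60", "MBX61", "MBX62"}
dag = DagState(pioneer | {"MBX63"})

dag.stop(pioneer)      # primary datacenter is terminated
dag.restore()          # secondary datacenter is activated
print(sorted(dag.cluster))  # ['MBX63']

dag.start(pioneer)     # primary datacenter is brought back
print(sorted(dag.started))  # ['MBX60', 'MBX61', 'MBX62', 'MBX63']
```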
Terminate primary site: Mailbox
The mailbox servers must be marked in Active Directory as stopped in both sites
Stop-DatabaseAvailabilityGroup DF-DAG-03 -ActiveDirectorySite Pioneer -ConfigurationOnly
The mailbox servers can be powered off instead
This process reduces the risk of split brain
Terminate primary site: UM
The UM servers must be disabled to prevent call routing to the failed datacenter
Disable-UMServer UM03
Disable-UMServer UM04
Alternatively, the servers can be removed from the VoIP gateway configuration
A DNS change is needed if the VoIP gateway is configured to send calls to a DNS name
Restore secondary site: Mailbox and UM
The mailbox servers in the secondary datacenter must be activated
Stop-Service ClusSvc
Restore-DatabaseAvailabilityGroup
If the UM server was disabled
Enable-UMServer
Restore secondary site: CAS
Internal and external DNS changes to point A records to secondary site
OWA/OA/EWS/EAS namespace
POP/IMAP namespace
RPC Client Access namespace
SMTP namespace
Similar changes are required if using ISA
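Since the DNS changes span several namespaces in two zones, generating the change list ahead of time makes the switchover easier to script and review. The sketch below only builds a plan as data and does not talk to any DNS server; the hostnames follow the contoso.com examples used earlier, while the IP addresses and the function name `build_change_plan` are hypothetical.

```python
# Hypothetical secondary-site addresses, per zone and hostname.
SECONDARY_TARGETS = {
    "internal": {
        "mail.contoso.com": "10.2.0.10",
        "pop.contoso.com": "10.2.0.11",
        "imap.contoso.com": "10.2.0.12",
        "outlook.contoso.com": "10.2.0.13",
        "smtp.contoso.com": "10.2.0.14",
    },
    "external": {
        "mail.contoso.com": "192.0.2.10",
        "pop.contoso.com": "192.0.2.11",
        "imap.contoso.com": "192.0.2.12",
        "smtp.contoso.com": "192.0.2.14",
    },
}


def build_change_plan(targets, ttl_seconds=300):
    """Return one A-record update per zone and hostname, in stable order."""
    plan = []
    for zone in sorted(targets):
        for host in sorted(targets[zone]):
            plan.append({
                "zone": zone,
                "name": host,
                "type": "A",
                "address": targets[zone][host],
                "ttl": ttl_seconds,
            })
    return plan


for change in build_change_plan(SECONDARY_TARGETS):
    print(f"{change['zone']:8} {change['name']:22} A {change['address']}")
```

The same function, fed the primary-site addresses, would produce the plan for reversing the changes during switchback.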
[Diagram: Pioneer and Failover datacenters with DF-DAG-03, Hubs and UMs, and servers C06, C07, UCH01, UCH02]
Pioneer names: pioneer.exchange.microsoft.com, imap.pioneer.exchange.microsoft.com, pop.pioneer.exchange.microsoft.com, momt.exchange.corp.microsoft.com
Failover names: failover.exchange.microsoft.com, momt.failover.exchange.microsoft.com
Switchover to secondary datacenter
Primary data center fails
Stop-DatabaseAvailabilityGroup DF-DAG-03 -ActiveDirectorySite Pioneer -ConfigurationOnly (in both data centers)
Stop-Service clussvc
Disable-UMServer UM03 (same for UM04)
Restore-DatabaseAvailabilityGroup DF-DAG-03 -ActiveDirectorySite Failover -AlternateWitnessDirectory c:\fsw\DAG1 -AlternateWitnessServer UCH01
Databases mount (non-activation block scenario)
Move-ActiveMailboxDatabase (activation block scenario)
Adjust DNS records for SMTP and HTTPS access and adjust CAS configuration (if necessary)
Enable-UMServer UCH01 (same for UCH02)
Restore primary site
Verify that all services are working in the primary datacenter
Activate stopped mailbox servers
Start-DatabaseAvailabilityGroup DF-DAG-03 -ActiveDirectorySite Pioneer
Allow replication to occur before scheduling outage
Dismount databases
Restore primary site
Reverse DNS changes
Re-enable UM servers
Move Witness back to primary site
Set-DatabaseAvailabilityGroup DF-DAG-03 -WitnessDirectory c:\fsw\dag1 -WitnessServer DF-C14-06
Mount databases to complete failback and restore service
Switchback to primary datacenter
Verify that all services are working in the primary datacenter
Start-DatabaseAvailabilityGroup DF-DAG-03 -ActiveDirectorySite Pioneer
Set-DatabaseAvailabilityGroup DF-DAG-03 -WitnessDirectory c:\fsw\dag1 -WitnessServer HT01
Reseed data or allow replication to occur and update copies in primary datacenter
Schedule downtime for the mailbox databases and dismount them
Change DNS records back to primary datacenter
Move databases back to primary datacenter
Move-ActiveMailboxDatabase Failover_Test -ActivateOnServer MBX60
Mount databases in primary datacenter
Enable UM server in primary datacenter
Disable UM server in secondary datacenter
Key Takeaways
Unified framework for high availability and site resilience that is native to Exchange
Simplified, quick, easy to validate site switchover and switchback process
Easier to deploy site resilience with incremental deployment
Resources
Resources for IT Professionals: http://microsoft.com/technet
Resources for Developers: http://microsoft.com/msdn
Microsoft Certification & Training Resources: www.microsoft.com/learning
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.