Page 1: Scott Schnoll Microsoft Corporationdownload.microsoft.com/documents/hk/technet/techdays2009/UNC378.pdfScott Schnoll Principal Technical Writer Microsoft Corporation Session Code: UNC378
Scott Schnoll

Principal Technical Writer

Microsoft Corporation

Session Code: UNC378

Agenda

Site resilience requirements

Planning and designing for site resilience

Site activation steps

Switchback steps

Site resilience requirements

Business requirements drive site resilience

Risk assessment reveals a high-impact threat to meeting SLAs in terms of data loss and loss of availability

Site resilience required to mitigate the risk

Business requirements dictate low recovery point objective (RPO) and recovery time objective (RTO)

Site resilience requirements

Ensuring business continuity brings expense and complexity

Site switchover is a coordinated effort and takes practice to ensure the real event is handled well

Exchange admins

Admins from other services (AD, DNS, ISA, etc.)

Exchange 2010 makes it as simple as possible

Low-impact testing can be performed with cross-site single database switchover

Technology is only half the picture in any site resilience solution, but will be the primary focus of this session

Exchange 2007 site resilience choices

CCR+SCR and /recoverCMS

SCC+SCR and /recoverCMS

CCR stretched across datacenters

SCR and database portability

SCR and /m:RecoverServer

SCC stretched across datacenters with synchronous replication

Exchange 2010 makes it simpler

Database Availability Group (DAG) with members in different datacenters

Supports automatic and manual cross-site database switchovers and failovers (*overs)

No stretched Active Directory site

No special networking needed

No /recoverCMS

Suitability of site resilience solutions

Solution                        RTO goal   RPO goal   Deployment complexity
Ship backups and restore        High       High       Low
Standby Exchange 2003 clusters  Moderate   Low        High
CCR+SCR in separate AD sites    Moderate   Low        Moderate
CCR in a stretched AD site      Low        Low        High
Exchange 2010 DAGs              Low        Low        Low

Designing a site resilient solution: Namespace planning

Each datacenter should be considered active when planning for namespaces

Each datacenter needs the following namespaces

OWA/OA/EWS/EAS namespace

POP/IMAP namespace

RPC Client Access namespace

SMTP namespace

In addition, one of the datacenters will maintain the autodiscover namespace

Designing a site resilient solution: Namespace planning

Best Practice: Use Split DNS for Exchange hostnames used by clients

Goal: minimize number of hostnames

mail.contoso.com for Exchange connectivity on intranet and Internet

mail.contoso.com has different IP addresses in intranet/Internet DNS

Important – before moving down this path, be sure to map out all the host names (outside of Exchange) that you will want to create in the internal zone
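As a rough illustration of split DNS, the Python sketch below models two DNS views that answer for the same hostname with different addresses. The zone contents, the IP addresses (placeholders from private and documentation ranges), and both helper functions are assumptions made for the example, not values from the session.

```python
# Minimal model of split DNS: one hostname, two views, two answers.
# All addresses below are illustrative placeholders.
INTERNAL_ZONE = {"mail.contoso.com": "10.0.0.10"}
EXTERNAL_ZONE = {"mail.contoso.com": "192.0.2.10"}


def resolve(hostname: str, client_is_internal: bool) -> str:
    """Return the address a client would get from its own DNS view."""
    zone = INTERNAL_ZONE if client_is_internal else EXTERNAL_ZONE
    # Hostnames are case-insensitive in DNS, so normalise before lookup.
    return zone[hostname.lower()]


def missing_internal_names(external_names, internal_zone):
    """List names that exist externally but were never created internally.

    Once the internal zone is authoritative for contoso.com, a name that
    is absent from it stops resolving for intranet clients, which is why
    the host names should be mapped out first.
    """
    return sorted(n for n in external_names if n.lower() not in internal_zone)
```

Running `missing_internal_names` against an inventory of existing external names is one way to carry out the mapping exercise before the internal zone is created.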

Designing a site resilient solution: Namespace planning

[Diagram: Datacenter 1 and Datacenter 2, each with CAS, HT, MBX, and AD servers. One datacenter uses the contoso.com namespace and the other uses the region.contoso.com namespace.]

contoso.com datacenter

Internal DNS: Mail.contoso.com, Pop.contoso.com, Imap.contoso.com, Autodiscover.contoso.com, Smtp.contoso.com, Outlook.contoso.com

External DNS: Mail.contoso.com, Pop.contoso.com, Imap.contoso.com, Autodiscover.contoso.com, Smtp.contoso.com

Exchange Config: ExternalURL = mail.contoso.com, CAS Array = outlook.contoso.com, OA endpoint = mail.contoso.com

region.contoso.com datacenter

Internal DNS: Mail.region.contoso.com, Pop.region.contoso.com, Imap.region.contoso.com, Smtp.region.contoso.com, Outlook.region.contoso.com

External DNS: Mail.region.contoso.com, Pop.region.contoso.com, Imap.region.contoso.com, Smtp.region.contoso.com

Exchange Config: ExternalURL = mail.region.contoso.com, CAS Array = outlook.region.contoso.com, OA endpoint = mail.region.contoso.com
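The per-datacenter namespaces in the diagram can be sketched as data. The Python below is only an illustration: the `namespaces_for` helper and its parameters are hypothetical, and the host labels simply mirror the internal DNS names shown on the slide.

```python
# Host labels each datacenter publishes internally, as shown in the diagram.
PER_DATACENTER_LABELS = ["mail", "pop", "imap", "smtp", "outlook"]


def namespaces_for(domain: str, owns_autodiscover: bool) -> list[str]:
    """Build the internal DNS names for one datacenter.

    Only one datacenter maintains the autodiscover namespace, so it is
    added on request rather than by default.
    """
    names = [f"{label}.{domain}" for label in PER_DATACENTER_LABELS]
    if owns_autodiscover:
        names.append(f"autodiscover.{domain}")
    return names


primary = namespaces_for("contoso.com", owns_autodiscover=True)
regional = namespaces_for("region.contoso.com", owns_autodiscover=False)
```

Generating the list from one template keeps the two datacenters symmetrical, which makes it easier to spot a namespace that was created in one site but not the other.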

Best Practices: Network

Latency

Less than 250 ms round trip

Router ACLs should be used to block traffic between the MAPI and replication networks

If DHCP is used for the replication network, DHCP can be used to assign the static routes

Lower TTL for all Exchange records to 5 minutes

OWA/EAS/EWS/OA, IMAP/POP, SMTP, RPC CAS

In both internal and external DNS zones
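The latency and TTL guidance lends itself to a simple pre-flight check. The following Python sketch assumes the measurements are gathered elsewhere and passed in; `network_findings` and its inputs are hypothetical names, and only the 250 ms and 5 minute thresholds come from the slide.

```python
MAX_ROUND_TRIP_MS = 250      # latency guidance from the slide
TARGET_TTL_SECONDS = 5 * 60  # 5 minutes


def network_findings(round_trip_ms: float, record_ttls: dict[str, int]) -> list[str]:
    """Return human-readable findings; an empty list means no issues."""
    findings = []
    if round_trip_ms >= MAX_ROUND_TRIP_MS:
        findings.append(
            f"round trip {round_trip_ms} ms is not under {MAX_ROUND_TRIP_MS} ms"
        )
    # Check every Exchange record that would be repointed in a switchover.
    for name, ttl in sorted(record_ttls.items()):
        if ttl > TARGET_TTL_SECONDS:
            findings.append(f"{name} TTL {ttl}s exceeds {TARGET_TTL_SECONDS}s")
    return findings
```

A long TTL matters because clients keep using a cached address for the failed datacenter until the record expires, so the check should cover both the internal and the external zone.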

Best Practices: Exchange servers

Number of mailbox servers

Secondary site should have fewer servers than the primary site

How many servers?

Live Active Directory and Hub/CAS in the secondary site, even if activation is blocked

Designing a site resilient solution: a real example

At first, there was Pioneer, with ~450 mailboxes

Half active on MBX60, half active on MBX61, all passive on MBX62

The Failover site was added for site resilience

Copies on MBX63 were set up as lagged passives

They could have been activation blocked

One new database active on 63, with a passive copy on 62

[Diagram: DAG DF-DAG-03 spanning the Pioneer site (MBX60, MBX61, MBX62) and the Failover site (MBX63)]

Designing for site resilience: Mailbox

Seed from passive could be used if we added another server in Failover

Witness server is in Pioneer

Alternate witness added

Routable DAG networks for heartbeating

Router ACLs are needed to prevent crosstalk between networks

DAC mode was enabled
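To see why the witness location and the alternate witness matter, consider the majority arithmetic behind cluster quorum. The sketch below is a simplified model, assuming one vote per DAG member plus one vote for the file share witness when the member count is even; the function names are invented for the example.

```python
def votes_needed(total_voters: int) -> int:
    """A majority is more than half of all voters."""
    return total_voters // 2 + 1


def has_quorum(members_up: int, witness_up: bool, total_members: int) -> bool:
    """Simplified majority check for a DAG.

    Assumption: the witness only counts when the member count is even,
    which is when a tie-breaker is required.
    """
    uses_witness = total_members % 2 == 0
    total_voters = total_members + (1 if uses_witness else 0)
    votes_up = members_up + (1 if uses_witness and witness_up else 0)
    return votes_up >= votes_needed(total_voters)


# Four members (MBX60 to MBX63) with the witness in Pioneer.
# Losing the Failover site leaves 3 members plus the witness: quorum holds.
pioneer_survives = has_quorum(members_up=3, witness_up=True, total_members=4)
# Losing Pioneer leaves only MBX63 and no witness: quorum is lost.
failover_survives = has_quorum(members_up=1, witness_up=False, total_members=4)
```

In the second case the surviving member cannot form a majority on its own, which is consistent with the switchover tasks in this session forcing quorum and switching to the alternate witness.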

Designing for site resilience: Mailbox

Datacenter Activation Mode

When servers are first powered on

They attempt to contact a server that can mount for permission

Alternatively, they should be able to contact all started servers
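The two conditions above can be read as a simple decision rule. The following Python sketch is a loose model of that rule only, not Exchange's actual implementation; `may_mount` and its arguments are hypothetical names.

```python
def may_mount(reachable: set[str], started: set[str], allowed: set[str]) -> bool:
    """Decide whether a freshly powered-on server may mount databases.

    reachable: other DAG members this server can contact right now
    started:   other members currently on the DAG's started list
    allowed:   members that already hold permission to mount
    """
    # Condition 1: a reachable member can already mount and grants permission.
    if reachable & allowed:
        return True
    # Condition 2: every other started member is reachable.
    return started <= reachable
```

For example, a server that comes back up and can reach only some of the started members, none of which may mount, stays dismounted under this rule.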

Designing for site resilience: Transport and UM

Hub Transport servers are site-aware and deliver messages to the right site

Shadow redundancy retains messages at the previous cross-site hop as well

UM servers can be configured with the same dial plan across sites

Some VoIP gateways may allow specifying a name

[Diagram: Pioneer and Failover sites with DF-DAG-03 and servers HT01, HT02, UM03, UM04, UCH01, and UCH02]

Designing for site resilience: CAS

RPC Client Access service establishes an RPC endpoint for client access on the CAS role

RPC Client Access service handles single-database failures through redirection or proxying

[Diagram: Pioneer and Failover sites with DF-DAG-03, servers C06, C07, UCH01, and UCH02, and a Hubs and UMs group]

Designing for site resilience: Certificates

Use a Subject Alternative Name (SAN) certificate, which can cover multiple hostnames

Best practice: minimize the number of certificates

1 certificate for all CAS servers + reverse proxy + Edge/Hub

1 additional certificate if using OCS

If leveraging a certificate per datacenter, then ensure that the Certificate Principal Name is the same on all certificates

Set-OutlookProvider EXPR -CertPrincipalName msstd:pioneer.exchange.microsoft.com

OCS requires certificates with <=1024 bit keys, and the server name in the certificate principal name

Site Resilience Recommendations

Design High Availability for Dependencies

Active Directory

Network services (DNS, TCP/IP, etc.)

Telephony services (Unified Messaging)

Backup services

Infrastructure (power, cooling, etc.)

Site Resilience Recommendations

Make sure the Directory Services operations staff fully understands the recovery scenario and procedures

Script DNS changes

Log on with the right credentials

Verify machines are ready for use

Test activation periodically – validate, validate, validate!

Datacenter Switchover Process

Failure occurs

Activation decision

Terminate partially running primary datacenter

Activate secondary datacenter

Validate prerequisites

Activate mailbox servers

Activate other roles (in parallel with previous step)

Service is restored

Site Resilience Tasks

Stop-DatabaseAvailabilityGroup

Adds servers to stopped list

Removes servers from started list

Forces cleanup of stopped nodes

Restore-DatabaseAvailabilityGroup

Forces quorum

Evicts stopped nodes

Starts using alternate file share witness if necessary

Start-DatabaseAvailabilityGroup

Removes servers from stopped list

Joins servers to cluster

Adds joined servers to started list
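The list bookkeeping performed by the stop and start tasks can be pictured with a toy model. The Python sketch below tracks only the started and stopped lists; the cluster operations (cleanup, eviction, forcing quorum, joining) are deliberately left out, and the `DagState` class is an invented name for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DagState:
    """Toy model of the started and stopped server lists of a DAG."""
    started: set[str] = field(default_factory=set)
    stopped: set[str] = field(default_factory=set)

    def stop(self, servers: set[str]) -> None:
        # Mirrors the stop task: add to stopped, remove from started.
        self.stopped |= servers
        self.started -= servers

    def start(self, servers: set[str]) -> None:
        # Mirrors the start task: remove from stopped, add to started.
        self.stopped -= servers
        self.started |= servers


dag = DagState(started={"MBX60", "MBX61", "MBX62", "MBX63"})
dag.stop({"MBX60", "MBX61", "MBX62"})    # primary site marked as stopped
after_stop = (sorted(dag.started), sorted(dag.stopped))
dag.start({"MBX60", "MBX61", "MBX62"})   # primary site brought back
```

The two lists are always disjoint in this model, which reflects the idea that a server is either expected to participate in the DAG or explicitly held out of it.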

Terminate primary site: Mailbox

The mailbox servers must be marked in Active Directory as stopped in both sites

Stop-DatabaseAvailabilityGroup DF-DAG-03 -ActiveDirectorySite Pioneer -ConfigurationOnly

The mailbox servers can be powered off instead

This process reduces the risk of split brain

Terminate primary site: UM

The UM servers must be disabled to prevent call routing to the failed datacenter

Disable-UMServer UM03

Disable-UMServer UM04

Can remove servers from the VoIP gateway instead

A DNS change is needed if the VoIP gateway is configured to send calls to a DNS name

Restore secondary site: Mailbox and UM

The mailbox servers in the secondary datacenter must be activated

Stop-Service ClusSvc

Restore-DatabaseAvailabilityGroup

If the UM server was disabled

Enable-UMServer

Restore secondary site: CAS

Internal and external DNS changes to point A records to secondary site

OWA/OA/EWS/EAS namespace

POP/IMAP namespace

RPC Client Access namespace

SMTP namespace

Similar changes are required if using ISA

Pioneer names: pioneer.exchange.microsoft.com, imap.pioneer.exchange.microsoft.com, pop.pioneer.exchange.microsoft.com, momt.exchange.corp.microsoft.com

Failover names: Failover.exchange.microsoft.com, Momt.failover.exchange.microsoft.com

Switchover to secondary datacenter

Primary datacenter fails

Stop-DatabaseAvailabilityGroup DF-DAG-03 -ActiveDirectorySite Pioneer -ConfigurationOnly (in both datacenters)

Stop-Service clussvc

Disable-UMServer UM03 (same for UM04)

Restore-DatabaseAvailabilityGroup DF-DAG-03 -ActiveDirectorySite Failover -AlternateWitnessDirectory c:\fsw\DAG1 -AlternateWitnessServer UCH01

Databases mount (non-activation block scenario)

Move-ActiveMailboxDatabase (activation block scenario)

Adjust DNS records for SMTP and HTTPS access and adjust CAS configuration (if necessary)

Enable-UMServer UCH01 (same for UCH02)
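Because a switchover is performed under pressure, it can help to generate the command sequence from a few parameters ahead of time. The Python sketch below only builds and prints the mailbox-side commands listed above as text; it does not run anything, the `switchover_steps` helper is hypothetical, and the UM and DNS steps are omitted for brevity.

```python
def switchover_steps(dag: str, primary_site: str, secondary_site: str,
                     witness_server: str, witness_dir: str) -> list[str]:
    """Return the mailbox-side switchover commands in the order listed above."""
    return [
        f"Stop-DatabaseAvailabilityGroup {dag} "
        f"-ActiveDirectorySite {primary_site} -ConfigurationOnly",
        "Stop-Service clussvc",
        f"Restore-DatabaseAvailabilityGroup {dag} "
        f"-ActiveDirectorySite {secondary_site} "
        f"-AlternateWitnessDirectory {witness_dir} "
        f"-AlternateWitnessServer {witness_server}",
    ]


# Values taken from the example environment in this session.
steps = switchover_steps("DF-DAG-03", "Pioneer", "Failover", "UCH01", r"c:\fsw\DAG1")
for number, command in enumerate(steps, start=1):
    print(f"{number}. {command}")
```

Keeping the site names and witness details as parameters means the same runbook text can be regenerated for periodic activation tests without retyping the commands.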

Restore primary site

Verify that all services are working in the primary datacenter

Activate stopped mailbox servers

Start-DatabaseAvailabilityGroup DF-DAG-03 -ActiveDirectorySite Pioneer

Allow replication to occur before scheduling outage

Dismount databases

Restore primary site

Reverse DNS changes

Re-enable UM servers

Move Witness back to primary site

Set-DatabaseAvailabilityGroup DF-DAG-03 -WitnessDirectory c:\fsw\dag1 -WitnessServer DF-C14-06

Mount databases to complete failback and restore service

Switchback to primary datacenter

Verify that all services are working in the primary datacenter

Start-DatabaseAvailabilityGroup DF-DAG-03 -ActiveDirectorySite Pioneer

Set-DatabaseAvailabilityGroup DF-DAG-03 -WitnessDirectory c:\fsw\dag1 -WitnessServer HT01

Reseed data or allow replication to occur and update copies in primary datacenter

Schedule downtime for the mailbox databases and dismount them

Change DNS records back to primary datacenter

Move databases back to primary datacenter

Move-ActiveMailboxDatabase Failover_Test -ActivateOnServer MBX60

Mount databases in primary datacenter

Enable UM server in primary datacenter

Disable UM server in secondary datacenter
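Before scheduling the outage, it helps to confirm that the copies in the primary datacenter have caught up. The Python sketch below models that gate with made-up data; the `CopyStatus` fields are assumptions loosely modelled on copy and replay queue lengths, and the zero threshold is illustrative rather than a documented requirement.

```python
from dataclasses import dataclass


@dataclass
class CopyStatus:
    server: str
    healthy: bool
    copy_queue_length: int    # logs not yet copied to this server
    replay_queue_length: int  # copied logs not yet replayed


def ready_for_switchback(copies: list[CopyStatus], primary_servers: set[str]) -> bool:
    """True when every copy on a primary-site server is healthy and caught up."""
    primary_copies = [c for c in copies if c.server in primary_servers]
    if not primary_copies:
        return False  # nothing to move back to
    return all(
        c.healthy and c.copy_queue_length == 0 and c.replay_queue_length == 0
        for c in primary_copies
    )
```

Copies hosted outside the primary site are ignored by the check, so a deliberately lagged copy in the secondary datacenter does not block the switchback.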

Key Takeaways

Unified framework for high availability and site resilience that is native to Exchange

Simplified, quick, easy to validate site switchover and switchback process

Easier to deploy site resilience with incremental deployment

Housekeeping

Level 2

Room S221: OFC208 – by Tara Seppa

Room S222: DAT08-HOL-E – by Microsoft Certified Trainer

Room S224 & 225: MGT339 – by Lawrence Tse

Room S226 & 227: VIR381 – by Bryon Surace

Room S228: WCL05-HOL – by Microsoft Certified Trainer

Level 4

Room S421: UNC310 – by Andrew Ehrensing

Room S423: WMB201 – by Jim Tsui

Room S425: DEV396R – by Andrew Coates

Room S427: DEV377 – by Xiao Ying Guo

Room S426: SEC11-HOL – by Microsoft Certified Trainer

http://microsoft.com/technet

Resources for IT Professionals

http://microsoft.com/msdn

Resources for Developers

www.microsoft.com/learning

Microsoft Certification & Training Resources

Resources

Complete an evaluation on CommNet and enter to win!

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.