High Availability and Failover Clusters in Exchange Server 2007
What We Will Cover
In this session you will learn how Exchange Server 2007 enables in-the-box high-availability solutions with fast, one-click recovery. This session covers advances to availability management on clustered and non-clustered Exchange servers. Learn about the improvements made to high availability in Exchange Server 2007, including Local Continuous Replication, Cluster Continuous Replication and Single Copy Clusters.
Exchange Server 2007 High Availability Goals
Deliver data and service availability solutions
Decrease deployment and operational costs
Enable options for more Exchange customers
Improve solution behavior
Enable large, low-cost mailboxes (> 1 GB)
Introducing Continuous Replication (aka Log Shipping)
A data outage is expensive to recover from
Restoring from backup takes a long time
There may be significant data loss
Keep a copy of the data
It has to be up-to-date
Two configurations
A copy of the data on the same machine (LCR)
A copy of the data on a different machine (CCR)
[Diagram: LCR on a standalone Exchange server (Store, DB, Copy); CCR on a cluster (active node with Store and DB, passive node with Store and Copy).]
Continuous Replication Theory
Make a copy of the data
As the original data is modified, make the same modifications to the copy
Less expensive than re-copying all the data
This provides an up-to-date copy of the data
ESE Logging
A logfile is a list of database modifications
Physical changes are logged
Used for recovery after a crash
Basic technology is industry standard
Lots of subtleties
Logging a Database Update
[Diagram: message 105 ("Update: What is the latest status on LLR Perf?", recipient address elided) is stored on page 367 of the database priv1.edb, and the change is appended to the log file E00.log.]
Log records shown in E00.log:
Update: Page 25: Mark message 8 read
Insert: Page 1376: Message 101
Delete: Page 1376: Delete message 101
Insert: Page 1030: Message 103
Insert: Page 1029: Message 102
Insert: Page 1030: Message 104
Insert: Page 367: Message 105 …
Implementing Replication
Make a copy of a database
As log records are created, apply the changes in them to the copy
[Diagram: the log record "Insert: Page 367: Message 105 …" is written to the logfile and applied to both the original database and the copy, so page 367 holds message 105 in each.]
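The idea on this slide can be sketched in a few lines of Python. This is a toy model and not how ESE stores data: the database is assumed to be a dictionary from page number to message ids, and LogRecord, apply_record and replay are hypothetical names introduced only for illustration. It shows that replaying the same log records against a copy leaves the copy identical to the original.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogRecord:
    op: str       # "insert" or "delete"
    page: int     # database page the change touches
    message: int  # message id, as on the slide


Database = Dict[int, List[int]]  # hypothetical model: page number -> message ids


def apply_record(db: Database, rec: LogRecord) -> None:
    """Apply one logged change to a database (original or copy)."""
    messages = db.setdefault(rec.page, [])
    if rec.op == "insert":
        messages.append(rec.message)
    elif rec.op == "delete":
        messages.remove(rec.message)
    else:
        raise ValueError(f"unknown operation: {rec.op}")


def replay(copy: Database, log: List[LogRecord], already_replayed: int) -> int:
    """Apply the records the copy has not seen yet; return the new position."""
    for rec in log[already_replayed:]:
        apply_record(copy, rec)
    return len(log)


log = [
    LogRecord("insert", 1376, 101),
    LogRecord("delete", 1376, 101),
    LogRecord("insert", 367, 105),
]
original: Database = {}
for rec in log:
    apply_record(original, rec)

copy: Database = {}
position = replay(copy, log, 0)
print(copy == original, position)
```

Tracking a replay position means only new records are applied on each pass, which is why this is cheaper than re-copying the whole database.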
LCR/CCR Basic Architecture
Exchange store runs normally
Replication service keeps a copy of the database up-to-date
Copies and replays log files
Cluster service provides failover
Move network identity (client transparency)
LCR failover is manual
Restore-StorageGroupCopy task
[Diagram: LCR on a standalone server and CCR on a cluster. In both, the Store writes the DB and the Replication service applies log records to the copy; in CCR the Replication service and the copy are on the passive node.]
Basic Replication Pipeline
[Diagram: the Store writes to the source DB and the source log directory; the Replication service copies log files to the inspector directory, moves them to the target log directory, and replays them into the DB copy.]
ESE Logfiles
Each storage group has a number (00 for the first one)
Each log has a generation number, starting with 1
Exx.log (e.g. E00.log) is the current log file
A full Exx.log is renamed using its generation number
Sample log stream
E0000000001.log
E0000000002.log
E00.log (current log file)
When Exx.log is filled
E0000000001.log
E0000000002.log
E00.log renamed to E0000000003.log
New E00.log created
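The naming scheme above can be expressed as a small Python sketch. The helper names log_file_name and close_current_log are hypothetical, and the code assumes the generation number is written as eight hexadecimal digits, which matches the sample names on the slide but is not stated there.

```python
from typing import List, Optional


def log_file_name(storage_group: int, generation: Optional[int] = None) -> str:
    """Name of a log file; generation=None means the current log (Exx.log)."""
    prefix = f"E{storage_group:02d}"
    if generation is None:
        return f"{prefix}.log"
    # Assumption: the generation is written as 8 hexadecimal digits.
    return f"{prefix}{generation:08X}.log"


def close_current_log(storage_group: int, closed: List[str], current_generation: int) -> int:
    """Rename the full Exx.log to its generation name; return the new current generation."""
    closed.append(log_file_name(storage_group, current_generation))
    return current_generation + 1


closed_logs = [log_file_name(0, 1), log_file_name(0, 2)]
current = close_current_log(0, closed_logs, 3)
print(closed_logs, log_file_name(0), "is now generation", current)
```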
Logging Details
Logged changes are physical, not logical
Delivering one message is actually many low-level physical operations
Logging is write-ahead
The database page is modified in memory
The log is written to disk
The database page is written to disk
Changes are cumulative
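The write-ahead ordering can be illustrated with a minimal Python sketch. WriteAheadStore and its fields are hypothetical and model disks as plain lists and dictionaries; the point is only the rule that a page may not reach the database file before its log record is on disk.

```python
class WriteAheadStore:
    """Toy model of write-ahead logging; not the ESE implementation."""

    def __init__(self):
        self.cache = {}       # page -> (value, lsn): pages modified in memory
        self.log_buffer = []  # log records not yet written to disk
        self.log_disk = []    # log records that are on disk
        self.db_disk = {}     # pages that are on disk

    def modify(self, page, value):
        # Step 1: the database page is modified in memory and the change is logged.
        lsn = len(self.log_disk) + len(self.log_buffer) + 1
        self.cache[page] = (value, lsn)
        self.log_buffer.append((lsn, page, value))
        return lsn

    def flush_log(self):
        # Step 2: the log is written to disk.
        self.log_disk.extend(self.log_buffer)
        self.log_buffer.clear()

    def flush_page(self, page):
        # Step 3: the page is written to disk, only after its log record.
        value, lsn = self.cache[page]
        if lsn > len(self.log_disk):
            raise RuntimeError("write-ahead rule violated: log record not on disk yet")
        self.db_disk[page] = value


store = WriteAheadStore()
store.modify(367, "Message 105")
store.flush_log()
store.flush_page(367)
print(store.db_disk)
```

Because the log always reaches disk first, a crash between steps 2 and 3 loses nothing: recovery can redo the change from the log.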
Log Copying
A ‘pull’ model
Exchange server creates logfiles normally
Logfiles are copied by Replication service
Exxnnnnnnnn.log files copied as they appear
Exx.log is copied for handoff/failover
If it can’t be copied, the loss setting is consulted
Log Verification
Log files are copied to the Inspector directory
Checksum and signature are verified
Checksum failures cause a log file to be recopied
If a log file can’t be copied a re-seed is required
Log file is moved to the log directory after successful inspection
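A rough Python sketch of the inspection loop follows. It makes several assumptions that are not in the slides: CRC-32 stands in for the real log checksum, the signature check is left out, a log file is modeled as a (payload, stored checksum) pair, and the retry limit max_attempts is invented for the example.

```python
import zlib
from typing import Callable, Optional, Tuple

LogFile = Tuple[bytes, int]  # hypothetical model: (payload, checksum stored in the file)


def make_log(payload: bytes) -> LogFile:
    """Build a log file whose stored checksum matches its payload."""
    return (payload, zlib.crc32(payload))


def checksum_ok(log: LogFile) -> bool:
    payload, stored = log
    return zlib.crc32(payload) == stored


def copy_and_inspect(copy_once: Callable[[], Optional[LogFile]], max_attempts: int = 3) -> str:
    """Copy a log into the inspector directory, recopying on checksum failure."""
    for _ in range(max_attempts):
        candidate = copy_once()  # None models a copy that could not be made
        if candidate is not None and checksum_ok(candidate):
            return "moved to log directory"
    return "re-seed required"


good = make_log(b"Insert: Page 367: Message 105")
corrupt = (b"Insert: Page 367: Message 1O5", good[1])  # damaged in transit
attempts = iter([corrupt, good])
result = copy_and_inspect(lambda: next(attempts))
print(result)
```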
Log Replay
Apply changes in log files to database copy
A special recovery mode
Different from ‘eseutil /r’
Undo phase is skipped
If possible, log files are replayed in batches
Improves performance
Get-StorageGroupCopyStatus
LastLogCopyNotified: last generation seen in the source directory
LastLogCopied: last generation copied by Replication service (copied to inspector directory)
LastLogInspected: last generation inspected (moved to log file directory)
LastLogReplayed: last generation replayed into the database copy
Available through performance monitor
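One way to read the four counters together is as the backlog at each stage of the pipeline. The Python sketch below uses a hypothetical CopyStatus class whose field names mirror the counters above; the derived backlog methods and the sample generation numbers are illustrative and are not output of the cmdlet.

```python
from dataclasses import dataclass


@dataclass
class CopyStatus:
    last_log_copy_notified: int  # last generation seen in the source directory
    last_log_copied: int         # last generation copied to the inspector directory
    last_log_inspected: int      # last generation moved to the log file directory
    last_log_replayed: int       # last generation replayed into the database copy

    def copy_backlog(self) -> int:
        """Generations seen at the source but not yet copied."""
        return self.last_log_copy_notified - self.last_log_copied

    def inspect_backlog(self) -> int:
        """Generations copied but not yet inspected."""
        return self.last_log_copied - self.last_log_inspected

    def replay_backlog(self) -> int:
        """Generations inspected but not yet replayed."""
        return self.last_log_inspected - self.last_log_replayed


status = CopyStatus(20, 18, 17, 12)
print(status.copy_backlog(), status.inspect_backlog(), status.replay_backlog())
```

A growing copy backlog points at the copy step, while a growing replay backlog means the copy has the logs but the database copy is behind.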
Replication Pipeline
[Diagram: the basic replication pipeline annotated with status counters: LastLogCopyNotified at the source log directory, LastLogCopied at the inspector directory, LastLogInspected at the copy log directory, and LastLogReplayed at the DB copy.]
CCR Failover
Cluster service monitors the resources
Failure detection is not instantaneous
IP Address or Network Name resource failures cause failover
A machine, or network access to it, has failed completely
Exchange service failure or timeout doesn’t cause failover
The service is restarted on the same node
Database failure doesn’t cause failover
Don’t want to move 49 databases because 1 failed
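The failover rules above amount to a small decision table. The Python sketch below encodes them; the Failure and Action enums and the cluster_reaction function are hypothetical names, not part of the cluster service.

```python
from enum import Enum


class Failure(Enum):
    IP_ADDRESS = "IP Address resource"
    NETWORK_NAME = "Network Name resource"
    EXCHANGE_SERVICE = "Exchange service failure or timeout"
    DATABASE = "database failure"


class Action(Enum):
    FAILOVER = "fail over to the other node"
    RESTART_ON_SAME_NODE = "restart the service on the same node"
    NO_FAILOVER = "stay on the same node"


def cluster_reaction(failure: Failure) -> Action:
    """Map a failure type to the reaction described on the slide."""
    if failure in (Failure.IP_ADDRESS, Failure.NETWORK_NAME):
        return Action.FAILOVER
    if failure is Failure.EXCHANGE_SERVICE:
        return Action.RESTART_ON_SAME_NODE
    return Action.NO_FAILOVER


for failure in Failure:
    print(failure.value, "->", cluster_reaction(failure).value)
```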
CCR File Share
Replication service runs remotely but needs access to log files
Share created on the active node
Readable by ‘Exchange Servers’ group
Machine accounts of all Exchange servers
Run as LocalSystem to access the share
‘Exchange Servers’ group granted R/O access to files
CCR servers only
CCR File Share Permissions
This is normal! (Permissions are very restrictive)
Move-ClusteredMailboxServer
Passive node copies log files
Exx.log is in use
On move, Exx.log is copied
Designations are now reversed
[Diagram: log files on Node 1 and Node 2 before and after the move. Closed generations E0000000001.log through E0000000004.log are present on both nodes; E00.log (generation 5) is copied during the move and closed as E0000000005.log, and the newly active node continues with E00.log (generation 6).]
Lossy Failover
Failover without copying all log files
Passive DB is not completely up-to-date
Log generation numbers are reused
Log files have different content!
Databases are different!
[Diagram: log files on Node 1 and Node 2 after a lossy failover. Generations 1 through 3 match, but both nodes hold log files for generations 4 and 5 under the same names with different content.]
Divergence
When the copy has information not in the original it is diverged
Divergence may be in database or log files
Lossy failover will produce a divergence
‘Split-brain’ operation on a cluster also causes divergence
Even if clients can’t connect, background maintenance still modifies the database
Administrator error can cause divergence!
e.g. running eseutil /r
Detecting Divergence
Replication service detects divergence
Comparing databases would be too slow
Compare the last log file on the copy to the same log file on the active
Runs when the store creates a new log file
Log file headers contain creation times which chain them together
Only replace Exx.log with a log file which is a superset
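The header comparison can be sketched in Python as follows. LogHeader is a hypothetical stand-in for a log file header, reduced to the generation number and the two creation times that chain the files together; the timestamps are made-up strings. The copy is treated as diverged when its last log file does not match the file with the same generation on the active node.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LogHeader:
    generation: int
    created: str           # creation time of this log file
    previous_created: str  # creation time of the preceding log file (the chain)


def is_diverged(copy_logs: Dict[int, LogHeader], active_logs: Dict[int, LogHeader]) -> bool:
    """Compare the last log file on the copy with the same generation on the active."""
    if not copy_logs:
        return False
    last = max(copy_logs)
    return active_logs.get(last) != copy_logs[last]


shared = {
    1: LogHeader(1, "10:00", ""),
    2: LogHeader(2, "10:05", "10:00"),
    3: LogHeader(3, "10:10", "10:05"),
}
# Old active node wrote generation 4 before failing; the log was never copied.
old_active = {**shared, 4: LogHeader(4, "10:15", "10:10")}
# After a lossy failover the new active node reuses generation number 4.
new_active = {**shared, 4: LogHeader(4, "10:20", "10:10")}

print(is_diverged(old_active, new_active))
print(is_diverged(shared, new_active))
```

Comparing one header is far cheaper than comparing databases, and the chained creation times make a reused generation number detectable.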
Divergence Detection
Node 1 compares its last full log file with Node 2
[Diagram: Node 1's last full log file, E0000000004.log, is compared with E0000000004.log on Node 2 (active); the two files differ.]
Replacing Exx.log
The two E00.logs are different generations
E00.log on the passive is a subset of E00…5.log
Delete E00.log
Copy E00...5.log
[Diagram: log files on Node 1 and Node 2; the passive node's E00.log (generation 5) is replaced by the closed generation 5 log file from the active node, whose E00.log is now generation 6.]
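The superset rule can be sketched in Python under a simplifying assumption: a log file is modeled as an ordered list of records, and "superset" is read as "starts with exactly the same records". The function name can_replace_current_log and the record strings are hypothetical.

```python
from typing import List


def can_replace_current_log(passive_current: List[str], active_log: List[str]) -> bool:
    """True when the active node's log contains everything in the passive Exx.log."""
    return active_log[: len(passive_current)] == passive_current


passive_e00 = ["Insert: Page 1029: Message 102", "Insert: Page 1030: Message 103"]
active_closed = passive_e00 + ["Insert: Page 367: Message 105"]
diverged_log = ["Insert: Page 1029: Message 102", "Insert: Page 25: Message 200"]

print(can_replace_current_log(passive_e00, active_closed))
print(can_replace_current_log(passive_e00, diverged_log))
```

When the check passes, deleting the passive E00.log and copying the closed log loses nothing; when it fails, the passive file holds changes the active node never saw.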
Correcting Divergence
Reseed will always work
Expensive for large databases
Look at the common case
Lossy failover
Only a few log files are lost
Solution
Decrease log file size to reduce data loss
Use Lost Log Resilience (LLR)
Lost Log Resilience
Remember a log record is written before the modified database page
But the database page can be written as soon as that happens
Delay writing the page to the database until 1 or more log generations are created
A new ESE feature in Exchange 2007
Log Stream Landmarks
Checkpoint/Minimum Log Required
Recovery has to start from this point
Waypoint/Maximum Log Required
The last log file actually required for recovery
No modifications after the waypoint have been written to the database
Committed/Log Committed
Last log file generated
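The three landmarks can be related to lost log files with a short Python sketch. It encodes one reading of the slides: since no modification after the waypoint has been written to the database, losing only log files beyond the waypoint leaves the database file consistent with the surviving logs. The function names and the generation numbers are hypothetical.

```python
from typing import Iterable, List


def required_for_recovery(checkpoint: int, waypoint: int) -> List[int]:
    """Generations recovery needs: from the checkpoint through the waypoint."""
    return list(range(checkpoint, waypoint + 1))


def database_affected_by_loss(waypoint: int, lost_generations: Iterable[int]) -> bool:
    """True when a lost log may hold changes already written to the database."""
    return any(generation <= waypoint for generation in lost_generations)


checkpoint, waypoint, committed = 3, 5, 7
print(required_for_recovery(checkpoint, waypoint))
print(database_affected_by_loss(waypoint, [6, 7]))
print(database_affected_by_loss(waypoint, [5, 6, 7]))
```

Delaying page writes keeps the waypoint behind the committed log, which is what leaves room to lose the newest log files in a lossy failover.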
Standalone Server Data Availability
Problems:
Data outages expensive to recover
Significant data loss (hours?)
Today replication requires integration of partner products
LCR Key Characteristics
One machine
Enabled per storage group
Two copies, Replay
One datacenter
Easy configuration
Local Continuous Replication
Other requirements and behaviors
Manual activation per storage group
Resource costs
Range of configurations
Variety of backup options
Reduced Backup TCO
Configuration limitations
Benefits
Enables recovery in minutes
Enables recovery without loss
Decreases backup costs
Enables large mailbox
Enables I/O offload
Exchange Server Clusters
Exchange Server 2003
Requires shared storage
Single copy of mailbox data
Transport, OWA, and Mailbox cluster aware
Up to 8 node active/passive
2 Node active/active
Exchange Server 2007 (Single Copy Cluster)
Requires shared storage
Single copy of mailbox data
Mailbox Only
Simple redundancy for Edge, Hub Transport, Client Access, and UM
Up to 8 node active/passive
Active/active cut
Improvements in: Install, Management, Behavior
[Diagram: an Exchange Server 2003 cluster hosting SMTP, MB, and OWA, and an Exchange Server 2007 cluster hosting MB only, each with a quorum (Q), DB, and logs.]
SCC Resource Model
[Diagram: cluster resource model in Beta 2 and in RTM (expected). Resources shown: IP Address, Network Name, Physical Disk, System Attendant, Information Store, Mailbox Database 3, HTTP (DAV).]
Single Copy Cluster
Lacks full redundancy
Quorum and Exchange levels
Deployment and operational complexity
Cost
Recovery time after corruption or data failure varies based on backup technology
Two datacenter solution requires integration of partner technology
Created cluster continuous replication to address these limitations
Cluster Continuous Replication
Two node cluster
Witness on Hub Transport
Configurable heartbeat retries
Two copies
Clustered
Automatic recovery
Server HCL
Full redundancy
Replay
1 or 2 datacenters
[Diagram: two nodes, each with its own DB and logs, plus a file share (KB 921181).]
Cluster Continuous Replication Example
[Diagram, in two stages. Stage 1: Node1 is active and Node2 is passive. After an online seed, Node2 copies and verifies logs (E0000000011.log, E0000000012.log) from the share \\node1\GUID while E00.log is in use on Node1, then advances its DB by playing logs to produce an updated DB. Stage 2: the roles are reversed. After an incremental reseed, Node1 copies E0000000013.log and E0000000014.log from \\node2\GUID while E00’.log is the current log on Node2, then advances its DB by playing logs.]
Cluster Continuous Replication
Other requirements and behaviors
Outage Management
Easy-to-use scheduled outage support
Automatic recovery of an unscheduled outage
Automatic database mount dial
Transport dumpster
Symmetric failover
Resource requirements (no penalty)
Variety of backup options
Reduced backup TCO
Configuration limitations
Benefits of CCR
Fast recovery from data problems on active node
No single point of failure
More flexibility in hardware selection
Direct Attached Storage
No cluster validation
Simplified storage requirements
Exchange-provided database replication solution
Enables a single Mailbox server failover to second data center
Simplified deployment
Improved management experience
Ability to offload workload
Exchange Server 2007 High Availability Takeaways
Delivers standalone and clustered solutions
Decreases deployment and operational costs
Enables HA options for more Exchange customers
Improves solution behavior
Enables large low cost mailboxes (1GB+)
See LCR and CCR in Action!
Blogcast: LCR
http://msexchangeteam.com/archive/2006/05/24/427788.aspx
Blogcast: CCR
http://msexchangeteam.com/archive/2006/08/09/428642.aspx
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation
as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.