High Availability and Disaster Recovery Customer Panel: Architectures and Lessons Learned
Microsoft: Sanjay Mishra, Prem Mehra
Customers: David P. Smith, Ayad Shammout, Thomas Grohser, Michael Steineke
DBI 309
Key Takeaways of the Session
• SQL Server 2008 and SQL Server 2008 R2 can meet very high HA and DR requirements
• Upgrades to SQL Server 2008 and to SQL Server 2008 R2 can be achieved with downtime limited to minutes
• Demanding HA and DR deployments require well-documented operational procedures and highly skilled staff
SQL CAT (Customer Advisory Team)
Achieving Customer Success
LG Electronics: a manufacturing corporation with more than 115 businesses worldwide and more than 80,000 employees. Improved transaction performance of a 5 TB BI system ‘from minutes to seconds’; KRW 90 billion reduction in costs.
Case study coming soon
Yahoo!: Tier-1 mission-critical BI solution; 12 TB Analysis Services database; 3.5 billion display advertising impressions; 35 billion segments.
Case study coming soon
Making a Better Product
Drives feedback and product requirements back into SQL Server development teams from deep and strategic customer and ISV engagements
Sharing with the Community
http://sqlcat.com
The SQL Server Customer Advisory Team (SQL CAT) represents the customer-facing resources from the SQL Server Product Group. SQL CAT is made up of product and solution experts who regularly engage in the largest, most complex, and most unique customer deployments worldwide.
Design Wins Program
Customer Benefits
• Direct engagement from the product group (SQL CAT technical/project experience)
• Reduced risk
• Architecture best practices and guidance
• SQL CAT-led service offerings (e.g., Architecture Design Reviews) (optional)
• Lab visit to Redmond or a regional Microsoft Technology Center (MTC) (optional)
• Business Investment Funds (BIF) (optional; limited availability!)
• Invitations to special events/programs (e.g., TAP) (optional)
• Co-marketing and PR
Example (SQL Server Qualifying Criteria): 10+ TB DW, 3k tx/s OLTP, large 500 GB+ cubes, competitive migrations, complex deployments, server consolidation (1000+)
Nominating a Customer
A customer’s account team can nominate the customer to the SQL CAT team.
The Customer Experience (CX) Design Wins Program is how the Microsoft Business Platform Division identifies and invests in strategic and large-scale customer projects with challenging, unique, and complex applications running on the Microsoft platform.
Content
• HA DR Capabilities and Technologies
• Architectural Solutions and Customer Deployments
• Key Takeaways
• Questions & Answers
Proven HA / DR Architectures: Successfully Deployed by Customers
For each architecture: its key distinguishing scenario use and deployment characteristics, and customer examples.

1. Failover Clustering for HA and Database Mirroring for DR
Characteristics: A) a single data copy for HA is sufficient; B) positive experience with failover clustering; C) comfortable deploying two different technologies for HA and DR
Examples: ServiceU and CareGroup

2. Synchronous Database Mirroring for HA/DR and Log Shipping for additional DR
Characteristics: A) requirement to deploy only one technology for both HA and DR; B) avoid the costs associated with failover clustering; C) for HA, execution from the remote data center is acceptable
Example: bwin

3. Geo-Cluster for HA/DR
Characteristics: A) requirement to deploy only one technology for both HA and DR; B) positive experience with geo-clustering
Example: Edgenet

4. Failover Clustering for HA and SAN-based Replication for DR
Characteristics: A) requirement to deploy a single DR technology across multiple DBMSs; B) a third-party DR technology is acceptable
Example: Progressive Insurance

5. Peer-to-Peer Replication for HA and DR (and reporting)
Characteristics: A) simultaneous data manipulation from multiple sites; B) potential data loss is acceptable
Example: Enterprise in Travel Industry
ServiceU Corporation
David P. Smith, Chief Technology Officer
ServiceU
• Software as a Service (SaaS) provider
• Provides solutions for reserved-seat ticketing, box office management, event management and online payments
• Customers in 50 states and 15 countries
• PCI Level 1 Service Provider (credit card compliance)
HA/DR Requirements
• No service = no revenue
• PCI requires the same security measures at the DR site, and the DR site must be set up before an emergency in order to meet the same strict guidelines
• Goal: eliminate all single points of failure: network, servers, data, data centers
Usage of SQL Server HA technologies
• All SQL Servers are clustered, including those at the DR site and in the development and test environments
• Asynchronous database mirroring is used for all critical databases between the main datacenter and the DR datacenter
• Log shipping is used to ‘seed’ databases in order to start database mirroring
http://sqlcat.com/whitepapers/archive/2009/08/04/high-availability-and-disaster-recovery-at-serviceu-a-sql-server-2008-technical-case-study.aspx
Diagram: SQL Server Infrastructure (Memphis Primary Data Center, Atlanta Standby Data Center). Labels: PRINCIPAL and MIRROR servers (Windows 2008, SQL 2008) linked by asynchronous database mirroring; DNS and a web farm at each site; Preferred; DB connection to Memphis for regular test exercise.
ServiceU Upgrade Goals
Upgrade production systems from Windows Server 2003 to Windows Server 2008 and from SQL Server 2005 to SQL Server 2008, with new hardware:
• New servers at both data centers to accommodate growth
• Additional disks in the SANs at both data centers, with the LUNs reconfigured
Achieve these goals with the least service interruption:
• Target: no more than 20 minutes
• Total downtime during the complex upgrade: ~16 minutes
• The SLA permits up to 45 minutes of downtime per year
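The downtime figures above translate directly into availability percentages. The following Python sketch does that arithmetic, assuming a 365-day year. The function names are illustrative only and are not part of any ServiceU tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored (assumption)


def availability_pct(downtime_minutes: float) -> float:
    """Percentage of the year the service is up, given total yearly downtime."""
    return 100.0 * (1 - downtime_minutes / MINUTES_PER_YEAR)


def budget_used_pct(downtime_minutes: float, budget_minutes: float) -> float:
    """Share of the yearly downtime budget consumed by one event."""
    return 100.0 * downtime_minutes / budget_minutes


sla_budget = 45      # minutes per year permitted by the SLA
upgrade_outage = 16  # approximate total downtime of the upgrade

print(f"SLA floor: {availability_pct(sla_budget):.4f}% availability")
print(f"Upgrade used {budget_used_pct(upgrade_outage, sla_budget):.1f}% of the yearly budget")
```

With these inputs the sketch prints an SLA floor of 99.9914% availability. It also shows that the ~16-minute upgrade used 35.6% of the yearly downtime budget.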
ServiceU Upgrade Process
1. Set up a temporary cluster (Windows Server 2008 and SQL Server 2008) in the primary data center.
2. Remove database mirroring (DBM) to the DR site, and establish DBM from the production cluster to the temporary cluster.
3. Fail over to the temporary cluster. The temporary cluster is now the production cluster.
4. Rebuild the old production cluster with Windows Server 2008 and SQL Server 2008.
5. Establish DBM from the temporary production cluster to the newly built cluster.
6. Fail over to the newly built cluster. The new cluster is now production.
7. Repeat the last three steps for the DR site.
We first set up log shipping and then convert it to database mirroring, for convenience and flexibility.
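In the rotation above, production always has a prepared target to fail over to. The toy Python model below covers the primary-data-center steps (1 to 6) and counts the planned failovers, each of which is a short service interruption. A few simplifications apply:

- The cluster labels (cluster_a, cluster_t, dr_cluster) are hypothetical.
- Version strings are simplified.
- Mirroring is assumed to be broken after each failover.
- The DR-site repetition is not modeled.

This is an illustration, not ServiceU's runbook.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PrimarySite:
    """Toy model of the primary data center during the rolling upgrade."""
    principal: str = "cluster_a"                 # original production cluster
    mirror: Optional[str] = "dr_cluster"         # DBM partner at the DR site
    versions: Dict[str, str] = field(
        default_factory=lambda: {"cluster_a": "2005", "dr_cluster": "2005"})
    failovers: int = 0                           # each one is a brief outage

    def build(self, cluster: str) -> None:
        """Install (or rebuild) a cluster on Windows 2008 / SQL Server 2008."""
        self.versions[cluster] = "2008"

    def mirror_to(self, cluster: str) -> None:
        """Replace the current mirroring session with one to `cluster`."""
        self.mirror = cluster

    def failover_and_break(self) -> None:
        """Manual failover to the mirror, then break the mirroring session."""
        if self.mirror is None:
            raise RuntimeError("no mirror to fail over to")
        self.principal, self.mirror = self.mirror, None
        self.failovers += 1


site = PrimarySite()
site.build("cluster_t")          # 1. temporary 2008 cluster
site.mirror_to("cluster_t")      # 2. drop DBM to DR; mirror prod -> temporary
site.failover_and_break()        # 3. temporary cluster is now production
site.build("cluster_a")          # 4. rebuild the old production cluster
site.mirror_to("cluster_a")      # 5. mirror temporary prod -> rebuilt cluster
site.failover_and_break()        # 6. rebuilt cluster is production again
print(site.principal, site.versions[site.principal], site.failovers)
```

Running it prints `cluster_a 2008 2`. The original hardware slot is back in production on SQL Server 2008 after exactly two planned failovers.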
CareGroup
Ayad Shammout, Senior Database Consultant
Caregroup Healthcare System
• Four hospitals located in Boston
• 16,000 employees
• 146 mission-critical clinical applications
• 2 million patient medical records
• Annual revenue: $2 billion
HA/DR requirements for clinical databases:
• RTO: 0 downtime
• RPO: no data loss
All mission-critical SQL Servers are Clustered and Mirrored
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000001003
CareGroup DB Classification & SLA
• 80 databases rated “AAA”: RPO 0 and RTO 0. Deploy clustering + synchronous database mirroring; use EMC Clariion SAN with SSD disks
• 300 databases rated “AA”: RPO ≤ 1 hour and RTO 1 hour. Deploy clustering + asynchronous mirroring; use EMC Clariion SAN
• The remaining databases are rated “A”: RPO and RTO 1 day
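The classification above amounts to a lookup from tier to SLA and deployment pattern. The minimal Python sketch below uses hypothetical names (TierPolicy, POLICIES, tier_for) and takes 1 day as 1,440 minutes. The slide gives no deployment pattern for the “A” tier, so that field is left empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierPolicy:
    rpo_minutes: int            # maximum tolerable data loss
    rto_minutes: int            # maximum tolerable downtime
    deployment: Optional[str]   # HA/DR pattern; None where the slide is silent


# Values transcribed from the CareGroup classification; names are illustrative
POLICIES = {
    "AAA": TierPolicy(0, 0, "Clustering + synchronous database mirroring (SAN with SSD)"),
    "AA": TierPolicy(60, 60, "Clustering + asynchronous database mirroring"),
    "A": TierPolicy(24 * 60, 24 * 60, None),
}


def tier_for(rpo_minutes: int, rto_minutes: int) -> str:
    """Least demanding tier whose RPO and RTO both meet the requirement."""
    for tier in ("A", "AA", "AAA"):  # ordered from least to most demanding
        policy = POLICIES[tier]
        if policy.rpo_minutes <= rpo_minutes and policy.rto_minutes <= rto_minutes:
            return tier
    raise ValueError("no tier meets this requirement")
```

For example, `tier_for(30, 120)` returns `"AAA"`. A 30-minute RPO is tighter than the AA tier's one hour, so only the AAA tier qualifies.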
SQL Server Disaster Recovery
Diagram: SQL Server disaster recovery with database mirroring.
• Principal server: SQL Server cluster SQLHostNameA\SQL1 (active, IP 100.10.56.30).
• Mirror server at the DR site: SQLHostNameB\SQL1 (passive, IP 100.85.3.10).
• Cisco Global Site Selector (GSS) DNS alias “Green” points to the active IP (100.10.56.30 or 100.85.3.10); applications connect to Green\SQL1.
• Applications: 1- SharePoint, 2- SSRS, 3- BlackBerry, 4- Citrix Server, 5- VMware VC.
• Platform: Windows Server 2008 R2, SQL Server 2008 R2.
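In the diagram above, applications always connect to the alias Green\SQL1. After a failover, the DNS answer behind the alias moves from the active IP to the DR IP. The Python sketch below models only that idea. The `healthy` set passed in is an assumption standing in for whatever probing the Cisco GSS actually performs, and no GSS API is implied.

```python
from typing import Dict, List, Set

# Alias and IPs come from the diagram; the health model is an assumption
ALIAS_TARGETS: Dict[str, List[str]] = {
    "green": ["100.10.56.30", "100.85.3.10"],  # principal first, DR site second
}


def resolve(alias: str, healthy: Set[str]) -> str:
    """Return the first healthy IP for an alias, in preference order."""
    for ip in ALIAS_TARGETS[alias.lower()]:
        if ip in healthy:
            return ip
    raise LookupError(f"no healthy target for {alias}")


def connection_target(alias: str, instance: str, healthy: Set[str]) -> str:
    """Clients keep using alias\\instance; only the DNS answer changes."""
    return f"{resolve(alias, healthy)}\\{instance}"
```

Because only the DNS answer changes, application connection strings need no edits during a DR failover. In practice, DNS record TTLs and client-side caching affect how quickly clients follow the change.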
Upgrading Failover Cluster: To Windows Server 2008 R2 and SQL Server 2008 R2
Diagram labels: Windows Server 2003 / SQL Server 2005 six-node cluster; each SQL instance has two preferred owners; Evict; Rebuild; Mirroring; Borrowed from Server Team; Give back to Server Team.
bwin
Thomas Grohser
About bwin
• World’s biggest publicly listed online gaming platform
• World’s leading provider of online sports betting
• One of the largest poker networks
• Comprehensive range of payment services
• Integrated gaming portal: 22 languages, 25 core markets
• More than 20 million registered customers
• 1,500 employees
• bwin builds on the strengths of the web in order to link responsibility and gaming
• 15 million page views and up to 980,000 users a day
bwin Mission & Challenges
The Mission: Failure is not an option
Definition of VLDB: a database that needs attention (it’s not just size)
Service Level Agreement for Financial Transactions:
• 100% transactional consistency
• Zero data loss
• 99.99x% availability, 24 x 7
Assumed worst case scenario: full datacenter failure with complete data loss within the datacenter
bwin Solution and Environment
The Solution:
• Standardize everything
• Work by the book
• Have some clever guys at hand (if the book runs out of pages)
Environment:
• 5 DBAs and 1 database engineer
• 100+ SQL Server instances (SQL11, 2008 R2, 2008 and 2005)
• 120+ TB of data
• 1,400+ databases
• 1,600+ TB storage
• 5,000+ GB RAM
• 450,000+ SQL statements per second on a single server
• 500+ billion database transactions per day
bwin Infrastructure Scale Up & Zero Data Loss
Diagram labels: Principal; Mirror; Log Shipping (1-hour delay); Log Shipping (2nd copy); all log backups and full backups on days 1, 3, 5…; all log backups and full backups on days 2, 4, 6…
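The backup layout above alternates full backups between two locations, and both locations keep all log backups. The minimal Python sketch below models the rotation. It assumes day 1 is the first day of the schedule and maps odd days to one datacenter and even days to the other. The datacenter names are hypothetical.

```python
from datetime import date


def full_backup_site(day_number: int) -> str:
    """Datacenter that takes the full backup on a given day (day 1 = first day)."""
    if day_number < 1:
        raise ValueError("day numbering starts at 1")
    # Odd days (1, 3, 5, ...) -> first datacenter; even days (2, 4, 6, ...) -> second
    return "datacenter_1" if day_number % 2 == 1 else "datacenter_2"


def full_backup_site_for_date(d: date, start: date) -> str:
    """Same rotation, keyed by calendar date relative to a chosen start date."""
    return full_backup_site((d - start).days + 1)
```

Under this reading of the diagram, each location's latest full backup was taken on the current or the previous day. The log backups that location also holds cover everything since that full backup.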
Scale Up Infrastructure
Itanium based (phased out)
NUMA node configuration:
• 4 dual-core CPUs
• 64 GB RAM
• 8 x 1 Gb NIC
• 4 x 4 Gb HBA (SAN volumes as needed)
• 2 x RAID controller with 25 spindles each
We use servers with 1, 2, 4 or 8 NUMA Nodes
New Xeon based systems
NUMA node configuration:
• 8- or 10-core CPUs
• 64 GB RAM
• 1 x 10 GbE NIC
• 1 x 8 Gb HBA (SAN volumes as needed)
• 1 x Fusion-io SSD
We use servers with 4 or 8 NUMA nodes.
bwin Infrastructure Scale Up and High Availability
Diagram labels: Principal; Mirror; Log Shipping (1-hour delay); Log Shipping (2nd copy); all log backups and full backups on days 1, 3, 5…; all log backups and full backups on days 2, 4, 6…
Edgenet
Michael Steineke, VP of Information Technology
About Edgenet
• Leader in data services, guided selling and marketing solutions
• Consumers and businesses want details about products. At Edgenet, we organize that product information to increase sales.
• Provide retail applications that help retailers sell configurable products and help consumers compare and purchase the right product for them
• Collect, certify and distribute product data to Google Search & Shopping, Bing Search & Shopping, and retailers
• One of four active US GDSN-certified pools, with a rigorous certification and data quality scoring process
Edgenet Distance Cluster Solution
• The system was built to provide disaster recovery for our data pool solutions
• Near real-time data replication, and MSDTC support
Hardware/Software:
• SQL Server 2008 R2 Enterprise
• Windows Server 2008 R2 Datacenter
• Brocade 5300 8 Gb FC switches
• EMC RecoverPoint CE
• EMC Clariion CX4-80
• NEC Express 5800/1080a GX
• 2-node SQL stretch cluster, 850 mi. (Milwaukee to Atlanta)
• 4 clustered SQL instances, 1 clustered MSDTC
• 11 TB of usable replicated storage (54 LUNs)
Tip: The network latency between cluster nodes causes SQL Server installations to take much longer than on a local cluster.
• Use SQL Server slipstream functionality for service packs (SPs) and cumulative updates (CUs) when installing a distance cluster
• Plan adequate time to apply post-installation updates
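Part of the reason the 850-mile distance slows installation is plain physics. This Python sketch computes a lower bound on round-trip time from propagation delay alone. It assumes a straight-line fiber path and a signal speed of about 200,000 km/s (roughly two-thirds of the speed of light); real routes, switches and replication appliances add more. The 10,000 round trips in the example are a hypothetical count, not a measurement of SQL Server setup.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_S = 200_000  # assumption: light in fiber travels at roughly 2/3 of c


def min_round_trip_ms(distance_miles: float) -> float:
    """Lower bound on round-trip time: propagation delay only, straight-line path."""
    one_way_s = distance_miles * KM_PER_MILE / FIBER_KM_PER_S
    return 2 * one_way_s * 1000


def added_wait_s(round_trips: int, rtt_ms: float) -> float:
    """Extra wall-clock time for a sequence of strictly serial round trips."""
    return round_trips * rtt_ms / 1000


rtt = min_round_trip_ms(850)
print(f"Milwaukee-Atlanta RTT >= {rtt:.1f} ms")
print(f"10,000 serial round trips add >= {added_wait_s(10_000, rtt):.0f} s")
```

This prints a lower bound of 13.7 ms per round trip. It also shows that 10,000 strictly serial round trips would add at least 137 seconds, time a local cluster installation does not pay.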
Edgenet Cluster Diagram
Diagram: two sites 850 miles apart, Milwaukee, WI and Atlanta, GA.
• Network: a 300 Mb Ethernet stretch VLAN (10.10.10.0/24) joins the sites, with one active and one passive SQL Server cluster node.
• Servers at each site: NEC Express 5800/1080a GX (4 sockets, 8 cores each, 32 cores; Xeon X7560, 2.27 GHz; 512 GB RAM) running SQL Server 2008 R2 Enterprise on Windows Server 2008 R2 Datacenter.
• SAN fabric at each site: Brocade 5300 8 Gb switches.
• Storage at each site: an EMC Clariion CX4-80 SAN with 15k Fibre Channel disks.
• Replication: EMC RecoverPoint CE appliance(s) perform asynchronous replication between the sites.
Edgenet – RecoverPoint CE
Key Takeaways
• SQL Server 2008 and SQL Server 2008 R2 can meet very high HA and DR requirements
• Upgrades to SQL Server 2008 and to SQL Server 2008 R2 can be achieved with downtime limited to minutes
• Demanding HA and DR deployments require well-documented operational procedures and highly skilled staff
Appendix
• SQL Server 2008 Failover Clustering: http://sqlcat.com/whitepapers/archive/2009/07/08/sql-server-2008-failover-clustering.aspx
• Multi-Site Cluster: http://download.microsoft.com/download/3/b/5/3b51a025-7522-4686-aa16-8ae2e536034d/WS2008%20Multi%20Site%20Clustering.doc
• Mirroring a Large Number of Databases in a Single SQL Server Instance: http://sqlcat.com/technicalnotes/archive/2010/02/10/mirroring-a-large-number-of-databases-in-a-single-sql-server-instance.aspx
• Database Mirroring and Log Shipping Working Together: http://sqlcat.com/whitepapers/archive/2008/01/21/database-mirroring-and-log-shipping-working-together.aspx
• Asynchronous Database Mirroring with Log Compression in SQL Server 2008: http://sqlcat.com/technicalnotes/archive/2007/12/17/asynchronous-database-mirroring-with-log-compression-in-sql-server-2008.aspx
• Using Replication for High Availability and Disaster Recovery
• High Availability and Disaster Recovery at ServiceU: A SQL Server 2008 Technical Case Study
• Database Mirroring Best Practices and Performance Considerations
• Database Mirroring Log Compression in SQL Server 2008 Improves Throughput
• High Availability and Disaster Recovery for Microsoft’s SAP Data Tier: A SQL Server 2008 Technical Case Study
• Failure Is Not an Option: Zero Data Loss and High Availability
Questions?
Database Platform (DAT) Resources
• Try the new SQL Server Mission Critical BareMetal Hands-on Labs
• Visit the updated website for SQL Server® Code Name “Denali” on www.microsoft.com/sqlserver and sign up to be notified when the next CTP is available
• Follow the @SQLServer Twitter account to watch for updates
• Visit the SQL Server Product Demo Stations in the DBI Track section of the Expo/TLC Hall. Bring your questions, ideas and conversations!
• Microsoft® SQL Server® Security & Management
• Microsoft® SQL Server® Optimization and Scalability
• Microsoft® SQL Server® Programmability
• Microsoft® SQL Server® Data Warehousing
• Microsoft® SQL Server® Mission Critical
• Microsoft® SQL Server® Data Integration
Resources
• Sessions On-Demand & Community: www.microsoft.com/teched
• Microsoft Certification & Training Resources: www.microsoft.com/learning
• Resources for IT Professionals: http://microsoft.com/technet
• Resources for Developers: http://microsoft.com/msdn
Learning
Connect. Share. Discuss. http://northamerica.msteched.com
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Appendix
Upgrading Infrastructure 1 (Memphis Primary Data Center, Atlanta Standby Data Center)
Diagram: SQL 2005 clusters at both sites (PRINCIPAL in Memphis, MIRROR in Atlanta) linked by asynchronous database mirroring. Steps shown:
• Installed a temporary Windows 2008 / SQL 2008 cluster.
• Broke mirroring to Atlanta.
• Set up log shipping to the temporary SQL 2008 cluster.
• Disabled log shipping and established mirroring to the temporary cluster (MIRROR).
Also shown: Preferred; DNS and web farms at both sites; DB connection to Memphis for regular test exercise.
Upgrading Infrastructure 2 (Memphis Primary Data Center, Atlanta Standby Data Center)
Diagram labels: Temporary SQL 2008 Cluster; SQL 2005 Cluster (two); PRINCIPAL; MIRROR (two); Preferred; DNS and web farms at both sites; DB connection to Memphis for regular test exercise.
Memphis Primary Data Center
Upgrading Infrastructure 3
Diagram: SQL 2005 cluster (PRINCIPAL) and temporary SQL 2008 cluster (MIRROR). Actions shown on the slide:
• Switched to a web server delivering a downtime message
• Manual failover and broke mirroring; quick testing with SQL 2008
• GO / NO GO
• Switched the web farm to connect to the temporary SQL 2008 cluster
Also shown: Preferred; DNS; web farm.
Memphis Primary Data Center
Upgrading Infrastructure 4
Diagram: the temporary SQL 2008 cluster is now production (PRINCIPAL). Steps shown:
• Installed a new Windows 2008 / SQL 2008 cluster with additional disks (MIRROR).
• Set up log shipping to the new cluster.
• Disabled log shipping and set up database mirroring.
Also shown: SQL Server 2005 cluster; Preferred; DNS; web farm.
Memphis Primary Data Center
Upgrading Infrastructure 5
Diagram: the temporary production SQL 2008 cluster (PRINCIPAL) is mirrored to the new SQL 2008 cluster (MIRROR). Actions shown on the slide:
• Switched to a web server delivering a downtime message
• Manual failover
• Quick testing
• GO / NO GO; the new SQL 2008 cluster becomes the production server
Also shown: Preferred; DNS; web farm.
Upgrading Infrastructure 6 (Memphis Primary Data Center, Atlanta Standby Data Center)
Diagram:
• In Memphis, the new Windows 2008 / SQL 2008 cluster with additional disks is PRINCIPAL.
• The mirror between production and the temporary 2008 cluster is broken.
• In Atlanta, a SQL Server 2008 cluster is set up as MIRROR: set up log shipping, then disable log shipping and set up asynchronous mirroring.
Also shown: Preferred; DNS and web farms at both sites; DB connection to Memphis for regular test exercise.
Existing SQL Server 2005 Cluster
Diagram: passive and active cluster nodes on EMC storage; Windows Server 2003 R2 EE SP2, 64-bit; SQL Server 2005 EE SP2, 64-bit.
In-Place Upgrade 1
Steps #1 and #2: on each node (passive and active), install the prerequisites and then reboot:
1- .NET Framework 3.5 SP1
2- Windows Installer 4.5
3- Windows QFE (KB937444)
4- SQL Server 2008 Setup Support Files
Diagram label: SQL instance manual failover.
In-Place Upgrade 2
Step #3: Upgrade to SQL Server 2008 on the passive node
Step #4: Upgrade to SQL Server 2008 on the active node
Step #5: SQL instance automatic failover
No client connections for 1-2 minutes while the database is being upgraded to SQL Server 2008 on the left node.
Diagram labels: SQL 2008 on both nodes; Active; Removed from Cluster Group Possible Owners.
In-Place Upgrade With Mirroring
Step #1: Upgrade the mirrored SQL instance to SQL Server 2008
Step #2: Manual failover to the database mirroring partner for each database
Step #3: Upgrade the cluster to SQL Server 2008
Step #4: Manual failover to the database mirroring partner for each database
Diagram labels: SQL Server Cluster (passive and active nodes); Principal; Mirrored SQL; SQL Server 2008; Mirroring suspended; Mirroring resumed.
HA Zero Data Loss Solution Remarks
• Zero data loss has higher priority than availability: if we can’t harden a transaction to disk in two datacenters, we take the application offline.
• If the “Principal” fails, we take the application offline, fail over to the “Mirror”, break the mirror, and promote “Log Shipping Copy 2” to be the new mirror.
• If the “Mirror” fails, we take the application offline, let “Log Shipping” catch up, and promote it to be the new mirror.
• If either of the log shipping secondaries fails, we continue operation.
• One of the log shipping secondaries has a one-hour delay so that human or application errors (such as deleted data) can be fixed quickly. If we do not detect deleted data within an hour, we have to restore one of our backups.
• Backup infrastructure: each datacenter has one file server optimized to hold a few large files (< 10,000) and one optimized to hold many small files (> 1,000,000).
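The zero-data-loss rules above can be read as a small decision table. In the Python sketch below, the role names follow the bwin diagram labels and the action strings paraphrase the remarks. This is not bwin's operational tooling.

```python
from typing import List

# Role names follow the diagram labels; action strings paraphrase the slide remarks
SECONDARIES = ("log_shipping", "log_shipping_copy_2")


def can_accept_writes(principal_up: bool, mirror_synchronized: bool) -> bool:
    """Zero data loss first: commit only if the transaction is hardened in both datacenters."""
    return principal_up and mirror_synchronized


def respond_to_failure(failed_role: str) -> List[str]:
    """Ordered response when a single server in the topology fails."""
    if failed_role == "principal":
        return ["take application offline",
                "fail over to mirror",
                "break mirroring session",
                "promote log_shipping_copy_2 to new mirror"]
    if failed_role == "mirror":
        return ["take application offline",
                "let log_shipping catch up",
                "promote log_shipping to new mirror"]
    if failed_role in SECONDARIES:
        # Per the remarks, operation continues if a log shipping secondary fails
        return ["continue operation"]
    raise ValueError(f"unknown role: {failed_role}")
```

Keeping the write gate (`can_accept_writes`) separate from the recovery steps mirrors the stated priority: the application goes offline first, and recovery follows.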
High Availability Solution Remarks
• Priority is availability, with the theoretical possibility of losing some data.
• The “Principal” performs synchronous database mirroring to the “Mirror”, and a witness watches them both.
• If the “Principal” fails, we automatically fail over to the mirror. A scheduled SQL Server Agent script then assesses the situation. If the failed server does not come back online within a few minutes, the script breaks the mirroring session and promotes “Log Shipping Copy 2” to be the new mirror.
• If the second data center fails, we go offline. A scheduled SQL Server Agent script then assesses the situation. If the failed server does not come back online within a few minutes (we allow more time than for the principal), the script breaks the mirroring session, lets “Log Shipping” catch up, and promotes it to be the new mirror.
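The scheduled assessment described above is essentially a grace-period check run on a timer. The sketch below shows the decision logic in Python. bwin's actual job runs as a SQL Server Agent script, where breaking the session would be done with a statement such as ALTER DATABASE ... SET PARTNER OFF. The 5- and 10-minute thresholds are hypothetical: the slide says only "a few minutes", with a longer wait for the second data center.

```python
from typing import Optional

# Hypothetical thresholds: the slide says only "a few minutes", with a longer
# wait for the second data center than for the principal.
PRINCIPAL_GRACE_MIN = 5
SECOND_DC_GRACE_MIN = 10


def scheduled_check(failed: str, minutes_down: float) -> Optional[str]:
    """Decision an Agent-style job might take on each scheduled run.

    Returns None while the failed server is still inside its grace period.
    """
    if failed == "principal":
        # The witness has already triggered automatic failover to the mirror
        if minutes_down < PRINCIPAL_GRACE_MIN:
            return None
        return "break mirroring; promote log shipping copy 2 to new mirror"
    if failed == "second_datacenter":
        if minutes_down < SECOND_DC_GRACE_MIN:
            return None
        return "break mirroring; let log shipping catch up; promote it to new mirror"
    raise ValueError(f"unknown failure: {failed}")
```

Returning None inside the grace period lets a transient outage (for example, a reboot) heal without tearing down the mirroring session.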