Exchange 2010 Storage Improvements
DESCRIPTION
A deck covering Exchange 2010 storage improvements, built and extended from some of the Microsoft Ignite decks
TRANSCRIPT
EXCHANGE 2010 STORAGE IMPROVEMENTS
Nathan Winters – Exchange MVP
Agenda
A Brief History of Exchange Storage
The new ethos
Feature Deep Dive
Summary
History
History
ESE/JET Blue
IOPS – Random IO application
Why? – Small expensive drives
1.6GB disk $400 in 1996
SCSI 2GB and 4GB 100 IOPS
Single Instance Storage
Clustering with Shared Storage
Backup an issue
Single Point of Failure
32 bit
Not enough RAM
RAM limited number of users per server
History - Exchange 2007
Big improvements in Exchange Server 2007
Reduce storage input/output (I/O) (70%)
Use large amounts of memory (64 bit)
Increased page size (4 kilobyte (KB) -> 8 KB)
Lower storage costs
Support large mailboxes (> 1 gigabyte (GB))
Provide fast search (CI)
Continuous replication (log shipping)
High Availability (HA) + fast recovery
Eliminate single points of failure
New Ethos
Email Usage
Radicati seeing 165 mails per day, growing to 230 over the next couple of years
Users used to large free storage
25GB 5GB 3 years of mail
Triage once per year to archive
○ Not once per day!
Mail available through all clients
Cached Mode/Performance issues
High item counts – 5000, 20000, 100000
Disk Technology
Currently 2TB
Moving to 8TB
Random IO not getting quicker
15K RPM, 10K RPM, 7.2K RPM
Density is getting better, so can read more data in the same time
Flash – SSD – Didn’t take that bet
Optimised for spinning media for E14
Expensive – so use as cache in SAN
Exchange Server 2010 Storage Vision
IO Reduction
Sequential IO
Large, Fast, Low-cost Mailboxes
SATA/Tier 2 Disk Optimization
Storage Design Flexibility
RAID-less Storage (JBOD)
Exchange Server 2010 HA Storage Design Flexibility
More options to reduce storage cost

SAN
HA = Shared Storage Clustering
+1.0 IOPS/Mailbox
3.5” 15K 146GB FC Disks
RAID10 for DB & Logs
Dedicated Spindles
Multi-path (HBAs, FC Switches, SAN array controllers)
Backup = Streaming off active
Fast Recovery = Hardware VSS (Snapshots/Clones)

DAS (SAS)
HA = CCR
0.33 IOPS/Mailbox
2.5” 146GB 10K SAS Disks
RAID5 for DB
RAID10 for Logs
SAS Array Controller (w/ BBU)
Backup = VSS Snapshot
Fast Recovery = CCR

DAS (SATA)
HA = DAG (2 DB copies)
0.11 IOPS/Mailbox
3.5” 2TB 7.2K SATA/SAS Disks
RAID10 for DB & Logs
SAS Array Controller (w/ BBU)
Backup = Optional/VSS
Fast Recovery = Database Failover

JBOD (SATA)
HA = DAG (3+ DB copies)
0.11 IOPS/Mailbox
3.5” 2TB 7.2K SATA/SAS Disks
1 DB = 1 Disk
Backup = Optional/VSS
Fast Recovery = Database Failover
JBOD/RAID-less Storage: Now An Option
JBOD: 1 disk = 1 database (with logs)
Requires Exchange Server 2010 High Availability (3+ DB Copies)
Annual Disk Failure Rate (AFR) = 5%
JBOD Advantages
Reducing Storage Costs/Complexity
Eliminates unnecessary DB copies: server and storage redundancy can be symmetrical
Reduces disk IO: eliminates RAID write penalty
Enables simple storage design: 1 disk = 1 database (with logs)
Enables simple storage failure recovery
JBOD Challenges
Exchange HA/storage must replace RAID functionality
Disk striping performance (e.g. RAID10) cannot be leveraged
Disk failure = database failover (~30 second outage)
Re-enabling resiliency = spare disk assignment/partitioning/format/DB re-seed (scriptable)
Soft disk errors (bad blocks) must be detected and repaired
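The 5% AFR figure above gives a rough feel for how often a JBOD server will hit a disk failure, and therefore a database failover and re-seed. The following is a minimal sketch, assuming disks fail independently and the AFR applies equally to every disk. The 12-disk count is an illustrative value, not a figure from this slide.

```python
def expected_failures_per_year(disk_count: int, afr: float = 0.05) -> float:
    """Expected number of disk failures per year, assuming independent disks."""
    return disk_count * afr


def prob_at_least_one_failure(disk_count: int, afr: float = 0.05) -> float:
    """Probability that at least one disk in the set fails within a year."""
    return 1.0 - (1.0 - afr) ** disk_count


disks = 12  # hypothetical number of database disks in one server
print(f"Expected failures/year: {expected_failures_per_year(disks):.2f}")
print(f"P(at least one failure/year): {prob_at_least_one_failure(disks):.1%}")
```

With one database per disk, each of those failures means one failover plus one spare-disk assignment and re-seed, which is why the slide stresses that the recovery steps are scriptable.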
Exchange Server 2010 HA
Simplified mailbox High Availability and disaster recovery with new unified platform
[Diagram: three Mailbox Servers, each hosting copies of DB1 to DB5, across the San Jose and New York datacenters]
Recover quickly from disk and database failures
Replicate databases to remote datacenter
Evolution of continuous replication technology (database mobility)
Easier than traditional clustering to deploy and manage
Allows each database to have 16 replicated copies
Provides full redundancy of Exchange roles on as few as two servers
Deep Dive
Exchange 2010 Features
Move to Sequential IO
Change Table structure
Lazy View
Page size 32KB
Database Compression (LVC)
Read/Write Coalescing
Database Contiguity
Cache Compression
Storage Groups Gone
Single Point of Failure Gone
Optimised for huge mailboxes
Random vs. Sequential Disk IO
Random IO
Disk head has to move to process subsequent IO
Head movement = High IO latency
Seek latency limits IO (IOPS)
Sequential IO
Disk head does not move to process subsequent IO
Stationary head = low IO latency
Disk RPM speed limits I/O per second (IOPS)
[Diagram: Disk Head]
7.2K SATA Disk (20ms Latency)
Random = 50 IOPS
Sequential = +300 IOPS
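The 50 IOPS figure follows directly from the 20ms latency if the disk services one IO at a time. The sketch below shows that arithmetic, assuming a queue depth of one and no caching; the function names are illustrative only.

```python
def iops_from_latency(latency_ms: float) -> float:
    """IOs per second when each IO takes latency_ms and IOs run one at a time."""
    return 1000.0 / latency_ms


def latency_from_iops(iops: float) -> float:
    """Average time per IO in milliseconds for a given IOPS rate."""
    return 1000.0 / iops


random_iops = iops_from_latency(20.0)      # 20 ms per random IO (from the slide)
seq_latency_ms = latency_from_iops(300.0)  # 300 sequential IOPS (from the slide)

print(f"Random: {random_iops:.0f} IOPS at 20 ms per IO")
print(f"Sequential: about {seq_latency_ms:.1f} ms per IO at 300 IOPS")
```

Read the other way round, 300 sequential IOPS implies roughly a sixth of the per-IO time, which is the gap the Exchange 2010 store changes aim to exploit.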
IO Reduction: Store Table Architecture

Exchange Server 2007
Table scopes: Per Database, Per Folder
Message/Folder Table (MFT): Joe:Inbox:H1, Joe:Inbox:H2, Joe:Inbox:H3
Mailbox Table: Jeff’s Mbx, Ann’s Mbx, Joe’s Mbx
Attachments Table: Jeff:Excel.xls, Ann:Pic.bmp, Joe:Help.doc
Message Table (Msg): Joe:Msg10, Jeff:Msg32, Ann:Msg180
Folders Table: Jeff:Inbox, Ann:Drafts, Joe:Unread

Exchange Server 2010
Table scopes: Per Database, Per Mailbox, Per View
View Tables (e.g. From): Joe:H920, Joe:H302, Joe:H10
Secondary Indexes used for Views
Mailbox Table: Jeff’s Mbx, Ann’s Mbx, Joe’s Mbx
Message Header Table: Joe:H10, Joe:H302, Joe:H920
Folders Table: Joe:Inbox, Joe:Drafts, Joe:Unread
Body Table: Joe:Msg10, Joe:Help.doc, Joe:Msg302

New store schema = no more single instance storage within a database
Store Schema Changes: Lazy View Updates
View: All Unread or Flagged items
Timeline: M1 arrives, M2 arrives, M1 flagged, M3 arrives, M2 deleted, then the user uses OWA/Outlook Online and switches to this view
Exchange 2007: Nickel & Dime Approach
Many, random, IOs (1 per update)
Exchange 2010: Pay to Play Approach
Fewer, sequential, IOs (1 per view)
[Diagram: DB I/O over time for updates M1, M2, M1, M3, M2 under each approach]
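The difference between the two approaches can be illustrated with a toy model of the "All Unread or Flagged items" view. This is only a sketch of the idea, not the store's actual implementation. The class names, the `upsert`/`remove` operations, and counting one IO per view write are all assumptions made for illustration.

```python
def _apply_change(items, op, msg):
    """Apply one change to the set of messages shown in the view."""
    if op == "upsert":
        items.add(msg)
    elif op == "remove":
        items.discard(msg)
    else:
        raise ValueError(f"unknown operation: {op}")


class EagerView:
    """Nickel & Dime: the view is written on every update."""

    def __init__(self):
        self.items = set()
        self.io_count = 0

    def apply(self, op, msg):
        _apply_change(self.items, op, msg)
        self.io_count += 1  # one view IO per update

    def open(self):
        return sorted(self.items)


class LazyView:
    """Pay to Play: updates are queued and applied when the view is opened."""

    def __init__(self):
        self.items = set()
        self.pending = []
        self.io_count = 0

    def apply(self, op, msg):
        self.pending.append((op, msg))  # queued, no view IO yet

    def open(self):
        if self.pending:
            for op, msg in self.pending:
                _apply_change(self.items, op, msg)
            self.pending.clear()
            self.io_count += 1  # one batched view IO when the view is used
        return sorted(self.items)


# Timeline from the slide: M1 arrives, M2 arrives, M1 flagged, M3 arrives, M2 deleted
events = [("upsert", "M1"), ("upsert", "M2"), ("upsert", "M1"),
          ("upsert", "M3"), ("remove", "M2")]

eager, lazy = EagerView(), LazyView()
for op, msg in events:
    eager.apply(op, msg)
    lazy.apply(op, msg)

print("Eager view:", eager.open(), "IOs:", eager.io_count)
print("Lazy view: ", lazy.open(), "IOs:", lazy.io_count)
```

Both views end up with the same contents. The lazy one only pays when a user actually switches to the view, and it pays once.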
IO Reduction: Database Page Size Increased to 32 KB

Exchange Server 2007 DB Read 20 KB Message (8 KB Pages)
Disk: Page 1 (Msg Header), Page 2 (X), Page 3 (Msg Body), Page 4 (X), Page 5 (Msg Body)
DB Cache: Page 1 (Msg Header), Page 3 (Msg Body), Page 5 (Msg Body)
3 Read IOs

Exchange Server 2010 DB Read 20 KB Message (32 KB Pages)
Disk: Page 1 (32KB: Msg Header, Msg Body), Page 2 (32KB: X)
DB Cache: Page 1 (32KB: Msg Header, Msg Body)
1 Read IO
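The read counts on this slide are a ceiling division of message size by page size. The sketch below assumes the message fills whole pages, each page costs one read IO, and per-page overhead is ignored; `pages_needed` is an illustrative name.

```python
import math


def pages_needed(message_kb: float, page_kb: float) -> int:
    """Number of database pages (and so read IOs) needed to hold one message."""
    return math.ceil(message_kb / page_kb)


message_kb = 20  # message size used on the slide
for label, page_kb in (("Exchange Server 2007", 8), ("Exchange Server 2010", 32)):
    reads = pages_needed(message_kb, page_kb)
    print(f"{label}: {page_kb} KB pages -> {reads} read IO(s)")
```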
Mitigate DB Space Growth: Database Compression
Problem: Store Schema change, space hints, B+Tree Defrag and 32 KB page size combine to increase DB file size by 20%
Solution: Growth is 100% mitigated by Database Compression
Targeted compression for message headers and text/html bodies (7bit/Express)

DB File Size Comparison (relative size)
E2K7/RTF    1.00
E14/RTF     1.20
E14/Mix     1.00
E14/HTML    0.88

DB Space Analysis
Counts                  E2K7 SP1    E2010
Mailbox Count           750         750
Tables                  14754       92435
Internal Trees          60852       37652
LV Trees                3           5
Secondary Indexes       85784       4557
Pages                   28486144    5814032
Used Pages (%)          85.7%       86.7%
Available Pages (%)     14.3%       13.3%
Msg Table (% space)     84.9%       80.0%
Chart callouts: Msg Views, 32KB Pages

1 Database, 750 x 250MB mailboxes
RTF = RTF Compressed, Mix = 77% HTML, 15% RTF, 8% Text
Avg. Message size = ~50KB
IO Reduction: Read IO Gap Coalescing

Exchange Server 2007 DB Read Behavior
Disk: Page 1 (Msg Header), Page 2 (X), Page 3 (Msg Body), Page 4 (X), Page 5 (Msg Body)
DB Cache: Page 1 (Msg Header), Page 3 (Msg Body), Page 5 (Msg Body)
3 Read IOs

Exchange Server 2010 DB Read Behavior
Disk: Page 1 (Msg Header), Page 2 (X), Page 3 (Msg Body), Page 4 (X), Page 5 (Msg Body)
DB Cache: Page 1 (Msg Header), Page 3 (Msg Body), Page 5 (Msg Body)
Temp Buffer: Page 2, Page 4
1 Read IO
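Gap coalescing trades a little extra data transfer for fewer disk operations: pages 2 and 4 are read along with pages 1, 3 and 5 and then discarded. The sketch below shows one way such merging could work. The `max_gap` threshold and both function names are assumptions for illustration; the deck does not state the store's actual limits.

```python
def coalesce_reads(pages, max_gap):
    """Merge requested page numbers into (first, last) read ranges.

    Two requested pages share a range when at most `max_gap`
    unrequested pages lie between them.
    """
    ranges = []
    for page in sorted(set(pages)):
        if ranges and page - ranges[-1][1] - 1 <= max_gap:
            ranges[-1] = (ranges[-1][0], page)  # extend the current range
        else:
            ranges.append((page, page))         # start a new range
    return ranges


def gap_pages(ranges, pages):
    """Pages that are read only to bridge gaps (temp buffer on the slide)."""
    wanted = set(pages)
    return [p for first, last in ranges
            for p in range(first, last + 1) if p not in wanted]


requested = [1, 3, 5]
no_coalescing = coalesce_reads(requested, max_gap=0)
with_coalescing = coalesce_reads(requested, max_gap=1)

print("Without coalescing:", no_coalescing, "->", len(no_coalescing), "read IOs")
print("With coalescing:   ", with_coalescing, "->", len(with_coalescing), "read IO")
print("Discarded gap pages:", gap_pages(with_coalescing, requested))
```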
IO Reduction: Maintain Contiguity Over Time
New Database Maintenance Architecture:
Database B+Tree Defragmentation (aka OLD2): Background/throttled process that maintains space and contiguity of database tables

ESE Function: Cleanup (deleted items/mailboxes)
Exchange Server 2007 Service Pack 1 (SP1): Cleanup performed during Online Defrag (OLD), which occurs during the Online Maintenance (OLM) time window
Exchange Server 2010 (Beta): Cleanup performed at run time (when hard delete occurs); happens during Store dumpster cleanup (OLM); pages are zeroed by default

ESE Function: Space Compaction (deleted items/mailboxes)
Exchange Server 2007 Service Pack 1 (SP1): Database is compacted and space reclaimed during Online Defrag (OLD)
Exchange Server 2010 (Beta): Database is compacted and space reclaimed at run time; auto-throttled

ESE Function: Maintain Contiguity (defragmentation)
Exchange Server 2007 Service Pack 1 (SP1): N/A: Contiguity is compromised by space compaction
Exchange Server 2010 (Beta): Database is analyzed for contiguity and space at run time and is defragmented in the background (B+Tree Defrag/OLD2); auto-throttled

ESE Function: Database Checksum
Exchange Server 2007 Service Pack 1 (SP1): When configured, ½ of OLD maintenance window reserved for sequential scan (Checksum); manual throttle; active DB copy only
Exchange Server 2010 (Beta): Two options (both Active and Passive copies):
1. Run DB Checksum in the background 24x7 (default). Sequential IO
2. Run DB Checksum during OLM window. Sequential IO
IO Reduction: Database Contiguity Results
[Figure: table contiguity plotted by DB page number. Blue = contiguous (good), Red = fragmented (bad)]
Exchange Server 2007 Message Header Table (aka MFT): FRAGMENTED
Exchange Server 2010 Message Header Table (aka MsgHeader): CONTIGUOUS
Random deletes at the tail
*Production/Dogfood database analysis
Summary
Exchange IO Trend
DB IOPS/Mailbox
Exchange Server 2003: 1
Exchange Server 2007: 0.33
Exchange Server 2010: 0.11
+90% Reduction!
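The reduction between releases can be worked out from the three DB IOPS/Mailbox values on this slide. The sketch below is plain percentage arithmetic on those values; `reduction_pct` is an illustrative helper name.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100.0


# DB IOPS/Mailbox values from the slide
iops_per_mailbox = {
    "Exchange Server 2003": 1.0,
    "Exchange Server 2007": 0.33,
    "Exchange Server 2010": 0.11,
}

versions = list(iops_per_mailbox)
for older, newer in zip(versions, versions[1:]):
    pct = reduction_pct(iops_per_mailbox[older], iops_per_mailbox[newer])
    print(f"{older} -> {newer}: {pct:.1f}% reduction")

overall = reduction_pct(iops_per_mailbox[versions[0]], iops_per_mailbox[versions[-1]])
print(f"{versions[0]} -> {versions[-1]}: {overall:.1f}% reduction")
```

On these figures each release cuts DB IOPS per mailbox by about two thirds, and the cumulative reduction from Exchange Server 2003 to 2010 comes to roughly 89%, in line with the slide's headline.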
Putting It All Together: Mailboxes/Disk
Exchange Server 2010 storage improvements cannot be quantified in IOPS reductions alone
Mailboxes/Disk (7.2K SATA)
Exchange Server 2007: 125
Exchange Server 2010: +500
+4X Mailboxes/Disk!
Test configuration: 250 MB Mailbox Size, 3MB DB Cache/user, 12 x 7.2k SATA disks (DB/Logs on same spindles), Loadgen Outlook 2007 Online Very Heavy Profile, measured at <20ms RPC Average latency
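One way to read the claim that the improvements cannot be quantified in IOPS reductions alone is to compare the gain predicted by the IOPS/Mailbox figures with the measured mailboxes-per-disk gain. The sketch below is a rough comparison that assumes the 0.33 and 0.11 DB IOPS/Mailbox values quoted elsewhere in this deck apply to this test, which the slide does not state.

```python
# DB IOPS/Mailbox figures quoted elsewhere in the deck (assumed to apply here)
iops_2007 = 0.33
iops_2010 = 0.11

# Measured mailboxes per 7.2K SATA disk from this slide ("+500" taken as 500)
mailboxes_2007 = 125
mailboxes_2010 = 500

iops_only_gain = iops_2007 / iops_2010          # gain if only IOPS mattered
measured_gain = mailboxes_2010 / mailboxes_2007  # gain actually reported

print(f"Gain predicted from IOPS alone: {iops_only_gain:.1f}x")
print(f"Measured mailboxes/disk gain:   {measured_gain:.1f}x")
```

Under that assumption, the IOPS reduction alone accounts for about 3x, while the measured result is at least 4x. The remainder is consistent with the shift toward sequential IO, which a spinning disk services far faster than random IO.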
Summary
Exchange Server 2010 store has…
Reduced DB IOPS by +70%...again!
Optimized for large mailboxes (+10 GB) and 100K item counts
Optimized for large/slow/low-cost disks (SATA/Tier2)
Made JBOD/RAID-less storage a viable option
Enables unmatched storage flexibility to push storage Capex costs down
Provides many more backup/DR options