Exchange 2010 Storage Improvements

Nathan Winters – Exchange MVP

Agenda

A Brief History of Exchange Storage
The New Ethos
Feature Deep Dive
Summary

History

History ESE/JET Blue

IOPS – Random IO application
Why? – Small, expensive drives
  1.6GB disk $400 in 1996
  SCSI 2GB and 4GB, 100 IOPS
Single Instance Storage
Clustering with Shared Storage
  Backup an issue
  Single Point of Failure
32 bit
  Not enough RAM
  RAM limited the number of users per server


History - Exchange 2007

Big improvements in Exchange Server 2007
Reduce storage input/output (I/O) (70%)
  Use large amounts of memory (64 bit)
  Increased page size (4 kilobyte (KB) -> 8 KB)
Lower storage costs
  Support large mailboxes (> 1 gigabyte (GB))
  Provide fast search (CI)
Continuous replication (log shipping)
  High Availability (HA) + fast recovery
  Eliminate single points of failure

New Ethos

Email Usage
  Radicati seeing 165 mails per day, growing to 230 over the next couple of years
Users used to large free storage
  25GB 5GB
  3 years of mail
Triage once per year to archive
  ○ Not once per day!
Mail available through all clients
  Cached Mode/Performance issues
  High item counts – 5000, 20000, 100000

Disk Technology

Currently 2TB
Moving to 8TB
Random IO not getting quicker
  15K RPM, 10K RPM, 7.2K RPM
Density is getting better, so can read more data in the same time
Flash – SSD – Didn't take that bet
  Optimised for spinning media for E14
  Expensive – so use as cache in SAN


Exchange Server 2010 Storage Vision
IO Reduction
Sequential IO
Large, Fast, Low-cost Mailboxes
SATA/Tier 2 Disk Optimization
Storage Design Flexibility
RAID-less Storage (JBOD)


Exchange Server 2010 HA Storage Design Flexibility
More options to reduce storage cost

SAN
  HA = Shared Storage Clustering
  +1.0 IOPS/Mailbox
  3.5" 15K 146GB FC Disks
  RAID10 for DB & Logs
  Dedicated Spindles
  Multi-path (HBAs, FC Switches, SAN array controllers)
  Backup = Streaming off active
  Fast Recovery = Hardware VSS (Snapshots/Clones)

DAS (SAS)
  HA = CCR
  0.33 IOPS/Mailbox
  2.5" 146GB 10K SAS Disks
  RAID5 for DB
  RAID10 for Logs
  SAS Array Controller (w/ BBU)
  Backup = VSS Snapshot
  Fast Recovery = CCR

DAS (SATA)
  HA = DAG (2 DB copies)
  0.11 IOPS/Mailbox
  3.5" 2TB 7.2K SATA/SAS Disks
  RAID10 for DB & Logs
  SAS Array Controller (w/ BBU)
  Backup = Optional/VSS
  Fast Recovery = Database Failover

JBOD (SATA)
  HA = DAG (3+ DB copies)
  0.11 IOPS/Mailbox
  3.5" 2TB 7.2K SATA/SAS Disks
  1 DB = 1 Disk
  Backup = Optional/VSS
  Fast Recovery = Database Failover


JBOD/RAID-less Storage: Now An Option
JBOD: 1 disk = 1 database (with logs)
Requires Exchange Server 2010 High Availability (3+ DB copies)
Annual Disk Failure Rate (AFR) = 5%

JBOD Advantages: Reducing Storage Costs/Complexity

Eliminates unnecessary DB copies: server and storage redundancy can be symmetrical

Reduces disk IO: eliminates RAID write penalty

Enables simple storage design: 1 disk = 1 database (with logs)

Enables simple storage failure recovery

JBOD Challenges: Exchange HA/storage must replace RAID functionality

Disk striping performance (e.g. RAID10) cannot be leveraged

Disk failure = database failover (~30 second outage)

Re-enabling resiliency = spare disk assignment/partitioning/format/DB re-seed (scriptable)

Soft disk errors (bad blocks) must be detected and repaired
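Sketch (not from the original deck): a rough way to read the 5% AFR figure is to estimate how often a JBOD deployment would see a disk failure, and therefore a database failover and re-seed. The code assumes independent failures at a constant annual rate. The function names and the 36-disk example are hypothetical.

```python
def expected_failures_per_year(disk_count: int, afr: float = 0.05) -> float:
    """Expected number of disk failures per year, assuming independent disks."""
    return disk_count * afr


def prob_at_least_one_failure(disk_count: int, afr: float = 0.05) -> float:
    """Probability that at least one of the disks fails within a year."""
    return 1.0 - (1.0 - afr) ** disk_count


if __name__ == "__main__":
    disks = 36  # hypothetical: 12 databases x 3 copies, 1 disk per copy
    print(f"expected failures/year: {expected_failures_per_year(disks):.1f}")
    print(f"P(at least one failure): {prob_at_least_one_failure(disks):.2f}")
```

With dozens of disks, a failure in any given year becomes likely, which is why the slide treats re-seeding onto a spare disk as something to script.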


Exchange Server 2010 HA
Simplified mailbox High Availability and disaster recovery with new unified platform
Diagram: three Mailbox Servers, each hosting copies of DB1 to DB5, across the San Jose and New York datacenters
Recover quickly from disk and database failures
Replicate databases to remote datacenter

Evolution of continuous replication technology (database mobility)
Easier than traditional clustering to deploy and manage
Allows each database to have up to 16 replicated copies
Provides full redundancy of Exchange roles on as few as two servers

Deep Dive

Exchange 2010 Features
Move to Sequential IO
  Change Table structure
  Lazy View
  Page size 32KB
  Database Compression (LVC)
  Read/Write Coalescing
  Database Contiguity
  Cache Compression
Storage Groups Gone
Single Point of Failure Gone
Optimised for huge mailboxes


Random vs. Sequential Disk IO
Random IO
  Disk head has to move to process subsequent IO
  Head movement = high IO latency
  Seek latency limits I/O per second (IOPS)
Sequential IO
  Disk head does not move to process subsequent IO
  Stationary head = low IO latency
  Disk RPM speed limits I/O per second (IOPS)
7.2K SATA Disk (20ms latency)
  Random = 50 IOPS
  Sequential = +300 IOPS
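Sketch (not from the original deck): the 50 IOPS figure follows directly from the 20ms latency if the disk serves one random IO at a time. The helper names below are hypothetical, and the model ignores queuing and caching.

```python
def iops_from_latency(latency_ms: float) -> float:
    """IOs per second a disk can serve if each IO takes latency_ms, one at a time."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def latency_from_iops(iops: float) -> float:
    """Per-IO service time in milliseconds implied by a given IOPS rate."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return 1000.0 / iops


if __name__ == "__main__":
    print(iops_from_latency(20))            # random IO at 20ms per IO
    print(round(latency_from_iops(300), 2)) # service time implied by 300 IOPS
```

Read the other way, 300 sequential IOPS implies a little over 3ms per IO, so the slide's gap between random and sequential IO is mostly the head movement that sequential access avoids.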


IO Reduction: Store Table Architecture

Exchange Server 2007
  Per Database
    Mailbox Table: Jeff's Mbx, Ann's Mbx, Joe's Mbx
    Folders Table: Jeff:Inbox, Ann:Drafts, Joe:Unread
    Message Table (Msg): Joe:Msg10, Jeff:Msg32, Ann:Msg180
    Attachments Table: Jeff:Excel.xls, Ann:Pic.bmp, Joe:Help.doc
  Per Folder
    Message/Folder Table (MFT): Joe:Inbox:H1, Joe:Inbox:H2, Joe:Inbox:H3
  Secondary Indexes used for Views

Exchange Server 2010
  Per Database
    Mailbox Table: Jeff's Mbx, Ann's Mbx, Joe's Mbx
  Per Mailbox
    Folders Table: Joe:Inbox, Joe:Drafts, Joe:Unread
    Message Header Table: Joe:H10, Joe:H302, Joe:H920
    Body Table: Joe:Msg10, Joe:Help.doc, Joe:Msg302
  Per View
    View Tables (e.g. From): Joe:H920, Joe:H302, Joe:H10

New store schema = no more single instance storage within a database

Store Schema Changes: Lazy View Updates
View: All Unread or Flagged items
Timeline: M1 arrives, M2 arrives, M1 flagged, M3 arrives, M2 deleted; the user then uses OWA/Outlook Online and switches to this view
Exchange 2007: Nickel & Dime Approach
  Many, random IOs (1 per update)
Exchange 2010: Pay to Play Approach
  Fewer, sequential IOs (1 per view)
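Sketch (not from the original deck): a toy counting model of the two approaches, replaying the slide's timeline for the "All Unread or Flagged items" view. It only counts view-update IOs and says nothing about how the real store queues or persists changes. All class and function names are hypothetical.

```python
# Change log from the slide for the "All Unread or Flagged items" view.
# "upsert" = item enters or stays in the view, "remove" = item leaves it.
EVENTS = [
    ("upsert", "M1"),  # M1 arrives
    ("upsert", "M2"),  # M2 arrives
    ("upsert", "M1"),  # M1 flagged
    ("upsert", "M3"),  # M3 arrives
    ("remove", "M2"),  # M2 deleted
]


def _apply(items, op, msg):
    if op == "upsert":
        items.add(msg)
    else:
        items.discard(msg)


class EagerView:
    """Nickel & dime: the view is updated on every change."""

    def __init__(self):
        self.items = set()
        self.io_count = 0

    def record(self, op, msg):
        _apply(self.items, op, msg)
        self.io_count += 1  # one view-update IO per change

    def open(self):
        return set(self.items)


class LazyView:
    """Pay to play: changes are deferred until the view is actually used."""

    def __init__(self):
        self.items = set()
        self.pending = []
        self.io_count = 0

    def record(self, op, msg):
        self.pending.append((op, msg))  # deferred; the view is not touched

    def open(self):
        if self.pending:
            for op, msg in self.pending:
                _apply(self.items, op, msg)
            self.pending.clear()
            self.io_count += 1  # one batched update when the view is opened
        return set(self.items)


def replay(view):
    for op, msg in EVENTS:
        view.record(op, msg)
    return view


if __name__ == "__main__":
    eager, lazy = replay(EagerView()), replay(LazyView())
    print(sorted(eager.open()), eager.io_count)
    print(sorted(lazy.open()), lazy.io_count)
```

Both views end up with the same contents. The difference in this model is when the work is done, and how many separate updates it takes.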


IO Reduction: Database Page Size Increased to 32 KB

Exchange Server 2007 DB Read 20 KB Message (8 KB Pages)
  Disk: Page 1 (Msg Header), Page 2 (X), Page 3 (Msg Body), Page 4 (X), Page 5 (Msg Body)
  DB Cache: Page 1 (Msg Header), Page 3 (Msg Body), Page 5 (Msg Body)
  3 Read IOs

Exchange Server 2010 DB Read 20 KB Message (32 KB Pages)
  Disk: Page 1 (32KB) (Msg Header, Msg Body), Page 2 (32KB) (X)
  DB Cache: Page 1 (32KB) (Msg Header, Msg Body)
  1 Read IO
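Sketch (not from the original deck): the 3-versus-1 read count is a ceiling division of message size by page size. It assumes the message occupies whole pages, ignores per-page overhead, and treats each page as one read IO, as in the slide's non-adjacent 8 KB layout. The function name is hypothetical.

```python
import math


def pages_for_message(message_kb: float, page_kb: int) -> int:
    """Number of database pages needed to hold a message of message_kb."""
    if message_kb <= 0 or page_kb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(message_kb / page_kb)


if __name__ == "__main__":
    for page_kb in (8, 32):
        print(f"{page_kb} KB pages: {pages_for_message(20, page_kb)} page(s)")
```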


Mitigate DB Space Growth: Database Compression

Problem: Store Schema change, space hints, B+Tree Defrag and 32 KB page size combine to increase DB file size by 20%

Solution: Growth is 100% mitigated by Database Compression
Targeted compression for message headers and text/html bodies (7bit/Express)

DB File Size Comparison (relative size)
  E2K7/RTF   1.00
  E14/RTF    1.20
  E14/Mix    1.00
  E14/HTML   0.88

DB Space Analysis (chart callouts: Msg Views, 32KB Pages)
  Counts                 E2K7 SP1     E2010
  Mailbox Count          750          750
  Tables                 14754        92435
  Internal Trees         60852        37652
  LV Trees               3            5
  Secondary Indexes      85784        4557
  Pages                  28486144     5814032
  Used Pages (%)         85.7%        86.7%
  Available Pages (%)    14.3%        13.3%
  Msg Table (% space)    84.9%        80.0%

1 Database, 750 x 250MB mailboxes
RTF = RTF Compressed, Mix = 77% HTML, 15% RTF, 8% Text
Avg. message size = ~50KB
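Sketch (not from the original deck): the page counts in the space analysis look very different only because the page size changed. Assuming the E2K7 SP1 column counts 8 KB pages and the E2010 column counts 32 KB pages (the page sizes quoted elsewhere in the deck), the total size works out as below. Which bar of the file size chart these counts correspond to is not stated in the source. The function name is hypothetical.

```python
def db_size_gib(page_count: int, page_kb: int) -> float:
    """Total size of page_count pages of page_kb KiB each, in GiB."""
    return page_count * page_kb / (1024 * 1024)


if __name__ == "__main__":
    e2k7 = db_size_gib(28486144, 8)    # assumption: 8 KB pages
    e2010 = db_size_gib(5814032, 32)   # assumption: 32 KB pages
    print(f"E2K7 SP1: {e2k7:.2f} GiB")
    print(f"E2010:    {e2010:.2f} GiB")
```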


IO Reduction: Read IO Gap Coalescing

Exchange Server 2007 DB Read Behavior
  Disk: Page 1 (Msg Header), Page 2 (X), Page 3 (Msg Body), Page 4 (X), Page 5 (Msg Body)
  DB Cache: Page 1 (Msg Header), Page 3 (Msg Body), Page 5 (Msg Body)
  3 Read IOs

Exchange Server 2010 DB Read Behavior
  Disk: Page 1 (Msg Header), Page 2 (X), Page 3 (Msg Body), Page 4 (X), Page 5 (Msg Body)
  DB Cache: Page 1 (Msg Header), Page 3 (Msg Body), Page 5 (Msg Body)
  Temp Buffer: Page 2, Page 4
  1 Read IO
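Sketch (not from the original deck): gap coalescing can be modelled as merging wanted page numbers into one read whenever the run of unwanted pages between them is short enough. The `max_gap` threshold and the function names are hypothetical; the source does not state the limit ESE actually uses.

```python
def coalesce_reads(pages, max_gap=1):
    """Group wanted pages into (first, last) read ranges.

    Two wanted pages share a read when at most max_gap unwanted pages
    lie between them. max_gap=0 merges only adjacent pages.
    """
    ranges = []
    for page in sorted(set(pages)):
        if ranges and page - ranges[-1][1] - 1 <= max_gap:
            ranges[-1][1] = page          # extend the current read
        else:
            ranges.append([page, page])   # start a new read
    return [(first, last) for first, last in ranges]


def discarded_pages(ranges, pages):
    """Pages read only to bridge gaps (the slide's temp buffer pages)."""
    wanted = set(pages)
    return [p for first, last in ranges
            for p in range(first, last + 1) if p not in wanted]


if __name__ == "__main__":
    wanted = [1, 3, 5]
    print(coalesce_reads(wanted, max_gap=0))  # no gap bridging
    merged = coalesce_reads(wanted, max_gap=1)
    print(merged, discarded_pages(merged, wanted))
```

The trade is a larger single read, with pages 2 and 4 thrown away, in exchange for fewer separate IOs.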


IO Reduction: Maintain Contiguity Over Time
New Database Maintenance Architecture, by ESE function: Exchange Server 2007 Service Pack 1 (SP1) vs. Exchange Server 2010 (Beta)

Database B+Tree Defragmentation (aka OLD2): background/throttled process that maintains space and contiguity of database tables

Cleanup (deleted items/mailboxes)
  Exchange Server 2007 SP1: Cleanup performed during Online Defrag (OLD), which occurs during the Online Maintenance (OLM) time window
  Exchange Server 2010 (Beta): Cleanup performed at run time (when hard delete occurs); happens during Store dumpster cleanup (OLM), pages are zeroed by default

Space Compaction (deleted items/mailboxes)
  Exchange Server 2007 SP1: Database is compacted and space reclaimed during Online Defrag (OLD)
  Exchange Server 2010 (Beta): Database is compacted and space reclaimed at run time; auto-throttled

Maintain Contiguity (defragmentation)
  Exchange Server 2007 SP1: N/A: contiguity is compromised by space compaction
  Exchange Server 2010 (Beta): Database is analyzed for contiguity and space at run time and is defragmented in the background (B+Tree Defrag/OLD2); auto-throttled

Database Checksum
  Exchange Server 2007 SP1: When configured, ½ of OLD maintenance window reserved for sequential scan (Checksum), manual throttle; active DB copy only
  Exchange Server 2010 (Beta): Two options (both Active and Passive copies):
    1. Run DB Checksum in the background 24x7 (default). Sequential IO
    2. Run DB Checksum during OLM window. Sequential IO


IO Reduction: Database Contiguity Results
Figure: contiguity maps by DB page number. Blue = contiguous (good), red = fragmented (bad)
  Exchange Server 2007 Message Header Table (aka MFT): FRAGMENTED
  Exchange Server 2010 Message Header Table (aka MsgHeader): CONTIGUOUS
  Callout: Random deletes at the tail
*Production/Dogfood database analysis

Summary


Exchange IO Trend
Chart: DB IOPS/Mailbox falling from 1 to 0.33 to 0.11
+90% Reduction!
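Sketch (not from the original deck): the reduction percentages can be checked from the three DB IOPS/Mailbox values on the chart. The function name is hypothetical.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from before to after."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (1.0 - after / before) * 100.0


if __name__ == "__main__":
    trend = [1.0, 0.33, 0.11]  # DB IOPS/Mailbox values from the chart
    for before, after in zip(trend, trend[1:]):
        print(f"{before} -> {after}: {reduction_pct(before, after):.1f}%")
    print(f"overall: {reduction_pct(trend[0], trend[-1]):.1f}%")
```

Each step is roughly a two-thirds drop, close to the 70% quoted elsewhere in the deck, and the overall drop is about 89%, which the slide rounds to 90%.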


Putting It All Together: Mailboxes/Disk
Exchange Server 2010 storage improvements cannot be quantified in IOPS reductions alone
Mailboxes/Disk (7.2K SATA)
  Exchange Server 2007: 125
  Exchange Server 2010: +500
+4X Mailboxes/Disk!
Test configuration: 250 MB mailbox size, 3MB DB cache/user, 12 x 7.2K SATA disks (DB/Logs on same spindles), Loadgen Outlook 2007 Online Very Heavy Profile, measured at <20ms RPC average latency


Summary
Exchange Server 2010 store has…
  Reduced DB IOPS by +70%...again!
  Optimized for large mailboxes (+10 GB) and 100K item counts
  Optimized for large/slow/low-cost disks (SATA/Tier2)
  Made JBOD/RAID-less storage a viable option
  Enables unmatched storage flexibility to push storage Capex costs down
  Provides many more backup/DR options
