Matt Gossage Senior Program Manager Microsoft Corporation UNC321 Agenda Exchange storage background Storage technology 2010+ Large mailbox value E2010 storage architecture Store innovations ESE database innovations E2010 storage design Summary.


Matt Gossage
Senior Program Manager
Microsoft Corporation
UNC321
Agenda
Exchange storage background
Storage technology 2010+
Large mailbox value
E2010 storage architecture
Store innovations
ESE database innovations
E2010 storage design
Summary
Exchange 2003 HA/Storage Design
MSIT 4+3 SCC SAN example
+1 IOPS/Mailbox
SAN Fabric A
4 Active Nodes
3 Passive Nodes
8 Processor cores
4 GB of RAM
4000 Users/Server
250 MB Mailboxes
Backups:
Daily Full
Stream to
disk/tape
SAN Fabric B
RAID10
3.5” 10K
FC Disks
Storage is single point of failure
Exchange 2007 HA/Storage Design
MSIT CCR + DAS example
.33 IOPS/Mailbox
~4000 Mailboxes/Cluster
8 Processor cores
16 GB of RAM
2 GB Mailboxes
Backups: DPM
15 min Incremental
Daily Express Full
Hub
Transport
Server:
File Share Witness
Transport Dumpster
Replay
Public Network
Private
Network
RAID
RAID
CCR
RAID
RAID
Transaction Log Shipping
Active Node
No single points of failure!
Passive Node
RAID5
2.5” 10K
SAS Disks
Disk Technology
Disk Capacity trend predicted to continue
2TB Desktop class SATA disks available
1TB Nearline/Midline SAS disk available
Sequential throughput increasing linearly based
on areal density
2010 SATA = ~250MB/sec
Random I/O performance not expected to
improve substantially
15K RPM is the ceiling
Random vs. Sequential Disk IO
Random IO
Disk Head
Disk head has to move to
process subsequent IO
Head movement = High
IO latency
Seek Latency limits IOPS
Sequential IO
Disk head does not move to
process subsequent IO
Stationary Head =
Low IO latency
Disk RPM speed limits IOPS
7.2K SATA Disk (20ms Latency)
Random = 50 IOPS
Sequential = +300 IOPS!
IOPS = Input/Output operations (IOs) per second
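The relationship between per-IO latency and IOPS on this slide can be sketched as a one-line calculation. This is a minimal illustration that assumes a single disk serving one IO at a time; the function name `iops_from_latency` is hypothetical and not part of any Exchange tool.

```python
def iops_from_latency(latency_ms: float) -> float:
    """IOs per second for one disk that completes IOs one at a time,
    each taking latency_ms milliseconds (simplifying assumption)."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# 7.2K SATA disk with 20ms random IO latency, as on the slide.
print(iops_from_latency(20.0))  # 50.0
```

Read the other way, the slide's 300 sequential IOPS corresponds to roughly 3.3ms per IO under the same assumption.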
FLASH/SSD: E2010 Scenarios
NAND
Flash best utilized by E2010 when used as a
cache within storage stack
NAND
HBA / RAID
E2010 Mailbox
Server
SATA SSD
Enterprise SAN Array
PCM
Hybrid HDD
?
E-mail Trends
Messages Sent/Received Per User/Day
250
200
150
100
50
0
2008
2010
2012
The average corporate user, today, can expect to send and receive about 156 messages
a day, and this number is expected to grow to about 233 messages a day by 2012. An
increase of 33% over the four-year period. (Radicati, 2008)
Business users report that they currently spend 19% of their work day, or close to 2
hours/day on email. (Radicati, 2007)
Large Mailbox Value
Large Mailbox = 1-10GB+
“Aggregate Mailbox” =
Primary mailbox +
Archive Mailbox
~1 Year of mail (minimum)
Increased knowledge
worker productivity
Time: Items / Mailbox Size (MB)
1 Day: 200 / 10
1 Month: 4000 / 200
1 Year: 48,000 / 2,400
4 Years: 192,000 / 9,600
*Very Heavy Profile = 150 Receive + 50 Send /Day,
50KB, no deletions
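The growth figures for the Very Heavy Profile can be reproduced with simple arithmetic. The sketch below assumes 20 working days per month and 240 per year, which is what the slide's item counts imply, and decimal units (1 MB = 1000 KB); the names are illustrative only.

```python
MESSAGES_PER_DAY = 200   # 150 received + 50 sent
MESSAGE_SIZE_KB = 50

# Assumption: working days per period, inferred from the slide's item counts.
WORKING_DAYS = {"1 Day": 1, "1 Month": 20, "1 Year": 240, "4 Years": 960}


def mailbox_growth(days: int) -> tuple[int, int]:
    """Return (items, size in MB) after `days` working days with no deletions."""
    items = MESSAGES_PER_DAY * days
    size_mb = items * MESSAGE_SIZE_KB // 1000
    return items, size_mb


for period, days in WORKING_DAYS.items():
    items, size_mb = mailbox_growth(days)
    print(f"{period}: {items} items, {size_mb} MB")
```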
Reduced mailbox
management
Client Accessibility (Outlook/OWA/Mobile)
Eliminate/Reduce PST’s
Eliminate/Reduce 3rd Party Archive
Large Mailbox Challenges & Solutions
Client Experience
Outlook 2007 Performance
(Cached Mode)
Outlook 2007 (Online)/OWA
Performance
Items/folder Limitations
View Creation Performance
Client Search Performance
Performance Improvements:
Office 2007 SP2 (KB953195)
Updated OST sizing guidance (10GB)
Utilize the E2010 Archive Mailbox
to reduce data cached to OST
E2010 Store/ESE changes
E2010 Store/ESE changes
E2010 Search Performance
Improvements
Real-time result views
2x increase in indexing
performance
E2010 Store/ESE changes
Large Mailbox Challenges & Solutions
Deployment/Operations
Long Backup Times
Fast Recovery
Requirements (RTO)
High Storage Costs
Backup off passive copies
Daily Incremental/Weekly Full
backups
DPM Express Full Backups
E2010 HA + Hold Policy is your
backup
E2010 HA
E2010 Store/ESE changes
IOPS (efficiently utilizing low
performance/high capacity disks)
RAID overhead
Move Mailbox Downtime
E2010 Online Move Mailbox
Database Maintenance
E2010 Store/ESE changes
Online Maintenance Duration (OLD)
DB corruption (-1018) pain point
DB re-seed performance hit on
active copy
Exchange 2010 Storage Vision
IO Reduction
Sequential IO
SATA/Tier 2 Disk
Optimization
Large, Fast, Low-cost Mailboxes
Storage Design
Flexibility
RAID’less Storage
(JBOD)
IOPS Reductions: Store Schema Changes
Store Schema = The way the Store organizes data in the ESE Database
E2010: One simple theme
Move away from doing many, random, small size, disk IOs to doing fewer,
sequential, large size, disk IOs.
Significant Benefits
Fast/Efficient..
OWA/Outlook Online Mode
…end user viewing for “cold” states/first time view creation
…Calendar Operations
…Search performance
Outlook Cached Mode/Exchange Active Sync
OST sync = sequential IO
EAS sync = sequential IO
Server Management
…Move mailbox
…Content Index Crawls
IOPS Reduction: Store
Table
Architecture
Per Database
Per Folder
E2007
Mailbox Table
Folders Table
Message Table
(Msg)
Attachments Table
Message/Folder
Table (MFT)
Jeff’s Mbx
Jeff:Inbox
Joe:Msg10
Jeff:Excel.xls
Joe:Inbox:H3
Ann’s Mbx
Ann:Drafts
Jeff:Msg32
Ann:Pic.bmp
Joe:Inbox:H2
Joe’s Mbx
Joe:Unread
Ann:Msg180
Joe:Help.doc
Joe:Inbox:H1
Secondary Indexes used for Views
Per Mailbox
Per Database
E2010
Per View
Mailbox Table
Folders Table
Message Header
Table
Body
View Tables (e.g.
From)
Jeff’s Mbx
Joe:Inbox
Joe:H10
Joe:Msg10
Joe:H920
Ann’s Mbx
Joe:Drafts
Joe:H302
Joe:Help.doc
Joe:H302
Joe’s Mbx
Joe:Unread
Joe:H920
Joe:Msg302
Joe:H10
New Store Schema = no more single instance storage within a DB
Store Schema Changes: Physical Contiguity
B+ Tree
E2007
1078
92
4577
6
872
7210
3278
21
9346
DB Pages
(Page
Numbers)
Many, small size, IOs (1 per 8K page)
E2010
B+ Tree
1078
B+Tree = Table
1079
1080
1081
1082
1083
3456
3457
3458
Fewer, larger size, sequential IOs
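To make the effect of physical contiguity concrete, the sketch below counts how many IOs are needed when every run of consecutive page numbers can be read in one IO. It uses the page numbers shown on the slide; the one-IO-per-run rule is a simplification, and `count_sequential_runs` is a hypothetical helper.

```python
def count_sequential_runs(pages: list[int]) -> int:
    """Count IOs needed if each run of consecutive page numbers is one IO."""
    runs = 0
    previous = None
    for page in pages:
        if previous is None or page != previous + 1:
            runs += 1          # a jump in page number starts a new IO
        previous = page
    return runs


e2007_pages = [1078, 92, 4577, 6, 872, 7210, 3278, 21, 9346]
e2010_pages = [1078, 1079, 1080, 1081, 1082, 1083, 3456, 3457, 3458]

print(count_sequential_runs(e2007_pages))  # 9
print(count_sequential_runs(e2010_pages))  # 2
```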
Store Schema Changes: Logical Contiguity
Mailbox
Inbox
Calendar
Drafts
For Follow-up
DL Mail
M1
M3
M5
M4
M2
E2007
Many, small size, IOs
Random
Mailbox
E2010
DL Mail
M1
Calendar
M2
Drafts
M3
For Follow-up
M4
Inbox
M5
Sequential
Fewer, large size, IOs
Store Schema Changes: Lazy View Updates
Reducing IO by deferring view updates
View updates utilize sequential IO
All Unread or Flagged items (view)
M1
E2007
M2
M1
M3
M2
Nickel &
Dime
Approach
DB I/O
Many, random, IOs (1 per update)
M1 arrives
M2 arrives
M1 flagged
M3 arrives
M2 deleted
Time
User uses OWA/Outlook Online and
switches to this view
E2010
Pay to Play
Approach
All Unread or Flagged items (view)
M1
M2
M1
M3
M2
Fewer, sequential, IOs (1 per view)
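The difference between the two approaches can be sketched with a toy model of the "All Unread or Flagged items" view. This is an illustration under simplifying assumptions (one IO per eager update, one IO per batched lazy update); the class names are hypothetical and do not reflect the actual Store implementation.

```python
def apply_event(items: set, event: str, msg: str) -> None:
    """Apply one mailbox event to the set of messages shown in the view."""
    if event in ("arrives", "flagged"):
        items.add(msg)        # new mail is unread; flagged mail stays listed
    elif event == "deleted":
        items.discard(msg)


class EagerView:
    """E2007-style sketch: the view is updated (one IO) on every event."""
    def __init__(self):
        self.items = set()
        self.io_count = 0

    def on_event(self, event, msg):
        apply_event(self.items, event, msg)
        self.io_count += 1


class LazyView:
    """E2010-style sketch: events are queued and applied when the view opens."""
    def __init__(self):
        self.items = set()
        self.pending = []
        self.io_count = 0

    def on_event(self, event, msg):
        self.pending.append((event, msg))   # no view IO yet

    def open(self):
        if self.pending:
            for event, msg in self.pending:
                apply_event(self.items, event, msg)
            self.pending.clear()
            self.io_count += 1              # one batched update per access
        return sorted(self.items)


EVENTS = [("arrives", "M1"), ("arrives", "M2"), ("flagged", "M1"),
          ("arrives", "M3"), ("deleted", "M2")]

eager, lazy = EagerView(), LazyView()
for event, msg in EVENTS:
    eager.on_event(event, msg)
    lazy.on_event(event, msg)

print(eager.io_count, sorted(eager.items))  # 5 ['M1', 'M3']
print(lazy.open(), lazy.io_count)           # ['M1', 'M3'] 1
```

Both views end up with the same contents; only the number and timing of updates differ.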
Outlook 2007 SP2 Large Mailbox
Performance on E2010
IOPS Reduction: ESE Changes
Optimize for new Store Schema
Allocate database space in contiguous manner
Maintain database contiguity over time
Utilize space efficiently (Database compression)
Increase IO Sizes
DB page size increased from 8KB to 32KB
Improved read/write IO coalescing (Gap coalescing)
Provide improved async read capability (Pre-read)
Increase Cache Effectiveness
100MB Checkpoint Depth (HA configurations only)
DB Cache Compression (aka Dehydration)
DB Cache Priority (aka Fast Evict)
IOPS Reduction: Space Management
Allocate space based on contiguity
Database Space Allocation Hints:
• Allocate DB space based on either data compactness or data contiguity
(usage pattern)
DB Cache
Space
Contiguity
Page X
Page Y
Page Z
Msg
Header
Msg
Header
Event
History
Disk
Space
Compactness
Page 1
Page 2
Page 3
Page 4
Page 5
Used
Event
History
Used
Msg
Header
Msg
Header
Contiguity
Random/Compact
Sequential/Bloat
IOPS Reduction: Maintain Contiguity
New database maintenance architecture
ESE Function: E2007 SP1 vs. E2010
Cleanup (deleted items/mailboxes)
E2007 SP1: Cleanup performed during Online Defrag (OLD), which occurs during the Online Maintenance (OLM) time window.
E2010: Cleanup performed at run time (when hard delete occurs). Happens during Store dumpster cleanup (OLM); pages are zeroed by default.
Space Compaction
E2007 SP1: Database is compacted and space reclaimed during Online Defrag (OLD).
E2010: Database is compacted and space reclaimed at run time. Auto-throttled.
Maintain Contiguity
E2007 SP1: N/A. Contiguity is compromised by space compaction.
E2010: Database is analyzed for contiguity and space at run time and is defragmented in the background (B+Tree Defrag/OLD2). Auto-throttled.
Database Checksum
E2007 SP1: When configured, ½ of OLD maintenance window reserved for sequential scan (Checksum), manual throttle. Active DB copy only.
E2010: Two options (both Active and Passive copies):
1. Run DB Checksum in the background 24x7 (default). Sequential IO
2. Run DB Checksum during OLM window. Sequential IO
IOPS Reduction: DB Contiguity Results
DB Page
Numbers
E2007 Message Folder Table (aka MFT)
FRAGMENTED
Random Deletes at the tail
E2010 Message Header Table (aka MsgHeader)
CONTIGUOUS
*Production database analysis
Blue = contiguous (good)
Red = fragmented (bad)
Mitigate DB Space Growth: Database Compression
Store Schema change, Space Hints, B+Tree Defrag & 32KB page
size combine to increase DB file size by 20%.
Growth is 100% mitigated by Database Compression
7bit/XPRESS Compression for message headers and text/html bodies
(Long Values)
DB File Size Comparison
Chart: relative DB file size for E2007/RTF, E2010/RTF, E2010/Mix and E2010/HTML (data labels: 1.20, 1.00, 1.00, 0.88)
DB Space Analysis
Counts: E2007 SP1 / E2010
Mailbox Count: 750 / 750
Tables: 14754 / 92435
Secondary Indexes: 85784 / 4557
Pages: 28,486,144 / 5,814,032
Used Pages (%): 85.7% / 86.7%
Available Pages (%): 14.3% / 13.3%
1 Database, 750 x 250MB mailboxes, RTF = RTF Compressed, Mix = 77% HTML, 15% RTF, 8% Text, Avg. Message size = ~50KB
Msg Views
32KB Pages
IOPS Reduction: DB Page Size Increased to 32KB
E2007 DB Read 20KB
Message
DB
Cache
Page 1
Page 3
Page 5
Msg
Header
Msg
Body
Msg
Body
Disk
3 Read IO’s
8 KB
Pages
E2010 DB Read 20KB
Message
DB
Cache
Page 1
Page 2
Page 3
Page 4
Page 5
Msg
Header
X
Msg
Body
X
Msg
Body
Page 1 (32KB)
Msg Header, Msg Body
1 Read IO
Disk
32 KB
Pages
Page 1 (32KB)
Page 2 (32KB)
Msg Header, Msg Body
X
~20KB Message
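The read counts on this slide follow from dividing the message size by the page size and rounding up. The sketch below assumes the message starts at a page boundary and ignores page header overhead; `pages_for_message` is a hypothetical helper.

```python
import math


def pages_for_message(message_kb: float, page_kb: int) -> int:
    """Pages (and, without coalescing, read IOs) needed for one message."""
    return math.ceil(message_kb / page_kb)


print(pages_for_message(20, 8))   # 3 read IOs with 8KB pages
print(pages_for_message(20, 32))  # 1 read IO with 32KB pages
```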
IOPS Reduction: IO Gap Coalescing
Read Case
E2007 DB Read
Behavior
DB
Cache
Page 1
Page 3
Page 5
Msg
Header
Msg
Body
Msg
Body
Disk
3 Read IO’s
E2010 DB Read
Behavior
DB
Cache
Page 1
Page 2
Page 3
Page 4
Page 5
Msg
Header
X
Msg
Body
X
Msg
Body
Page 1
Page 2
Page 3
Page 4
Page 5
Msg
Header
Temp
Buffer
Msg
Body
Temp
Buffer
Msg
Body
1 Read IO
Disk
Page 1
Page 2
Page 3
Page 4
Page 5
Msg
Header
X
Msg
Body
X
Msg
Body
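The read case above can be sketched as a range-merging step: wanted pages that are separated by only a small gap are fetched in one larger IO, and the unwanted pages in between land in a temporary buffer. The `max_gap` parameter is an assumption made for illustration; the slide does not state the actual threshold ESE uses.

```python
def coalesce_reads(pages: list[int], max_gap: int) -> list[tuple[int, int]]:
    """Group page numbers into (first, last) read ranges, bridging gaps of
    up to max_gap unwanted pages between two wanted pages."""
    ranges: list[list[int]] = []
    for page in sorted(pages):
        if ranges and page - ranges[-1][1] - 1 <= max_gap:
            ranges[-1][1] = page          # extend the current IO over the gap
        else:
            ranges.append([page, page])   # start a new IO
    return [(first, last) for first, last in ranges]


wanted = [1, 3, 5]  # Msg Header, Msg Body, Msg Body
print(coalesce_reads(wanted, max_gap=0))  # [(1, 1), (3, 3), (5, 5)]
print(coalesce_reads(wanted, max_gap=1))  # [(1, 5)]
```

With no gap tolerance the three pages cost three read IOs; allowing a one-page gap turns them into a single IO covering pages 1 through 5.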
IOPS Reduction: 100MB Checkpoint Depth
Checkpoint Depth = The amount of data that is waiting to be committed to
the database file (edb).
E2010 default Checkpoint Depth Max is increasing from 20MB to 100MB only
on databases protected by E2010 HA (standalone still 20MB).
Deep Checkpoint Benefit = Efficient DB writes (~40% reduction)
100MB Checkpoint Depth = 40% DB write IO reduction
Chart: Database Pages Repeatedly Written/sec and DB Writes/sec (avg) vs. Checkpoint Depth (MB), 20MB to 100MB
Loadgen Test: 3000 Mailbox, 12 DB,
Outlook 2007 Online Very Heavy Profile
Deep Checkpoint Risks = long store shutdown times, long crash recovery times.
Risk Mitigation: shutdown databases in parallel, failover on store crash
IOPS Reduction: DB Cache Compression
Problem: New Store Schema + 32KB pages can reduce
efficiency of cache. E.g. A page with 8KB of data
consumes 32KB of memory in the DB Cache.
Solution: Implement DB Cache Compression to shrink
partially used cached pages in memory, allowing a more
effective cache.
1.
2.
32KB Page with only
8KB of data is read
off disk
32KB page is
compressed to
an 8KB in-memory image
DB
Cache
Page 1 (32KB)
Page 1 (8KB)
8KB
8KB
Disk
Page 1 (32KB)
8KB
Up to 30% more cache/mailbox server
More Cache = Less DB IO!
IOPS Reduction: DB Cache Priority
Problem: Background and recovery DB operations can
pollute the cache. E.g. DB checksumming, OLD2, HA
log replay.
Solution: Implement DB Cache Priority to allow lower
cache priorities for background/replay operations.
Outlook
Message Read
HA Log Replay
DB
Maintenance (Passive)
DB Cache Time
Past
Cache Eviction
Now
Cache Entry
ESE Caching Algorithm = LRU-K (Least Recently Used)
Future
Exchange 2010 Storage Speeds and Feeds
Mailbox IO Characteristics: E2007 vs. E2010
DB IO: E2007 / E2010
IO Type: Random / “Sequentialish”
Read:Write: 1:1 / 3:2
Avg Read IO Size (KB): 12 / 52
Avg Write IO Size (KB): 8 / 60
Log IO: E2007 / E2010
IO Type: Sequential / Sequential
Read:Write: 0:1 / 0:1
Avg Read IO Size (KB): n/a / n/a
Avg Write IO Size (KB): 10 / 10
DB IO Sizes increase by 5x!!
Log IO Write Size is the same...
3000 Mailboxes, 12 DBs, 4MB DB Cache/Mailbox, Loadgen Outlook 2007 Online Very Heavy Profile, 250MB Mailbox Size
IOPS Reduction: E2007 vs. E2010 Results
DB IOPS Comparison
Chart: DB Read IO/Sec, DB Write IO/Sec and DB IO/Sec for E2007 vs. E2010 (+70% Reduction!)
3000 Mailboxes, 3MB DB Cache/user, Loadgen Outlook 2007 Online Very Heavy Profile, 250MB Mailbox Size, E2010 Beta
Exchange IOPS Trend
Chart: DB IOPS/Mailbox for Exchange 2003, Exchange 2007 and Exchange 2010 (+90% Reduction!)
Optimize for SATA/Tier 2 Disks
DB Write IO “Burstiness”
Problem: Bursty DB writes negatively affect DB read and Log write latency
• The more write IO’s issued at a time, the more disk contention.
IO Latency Based on Max DB Write IO’s (ms)
Chart: DB Read IO and Log Write IO latency (ms) vs. Maximum DB Write IO's Issued (2, 4, 8, 16, 32, 64)
Solution: Throttle DB writes based on Checkpoint target (QoS), DB Write Smoothing
Single 7.2k SATA disk, logs/db on same spindle, Loadgen load generating 250 RPC Operations/second, ~50 IOPS
DB Write Smoothing: Results
E2010 Smooth DB IO Benefit
49
50% Reduction!
50
45
40
34
35
DB Read Latency (ms)
30
Log Write Latency (ms)
25
20
RPC Average Latency
15
10
5
10.1
3.7
5.1
0.7
0
Exchange 2010 Baseline
Exchange 2010 Smooth
DB IO
3000 Mailboxes, 3MB DB Cache/user, 12 x 7.2k SATA disks (DB/Logs on same spindles), Loadgen Outlook 2007 Online Very Heavy Profile
Putting It All Together: Mailboxes/Disk
E2010 Storage improvements cannot be quantified in IOPS
reductions alone
Mailboxes/Disk
+500
125
Exchange 2007
Exchange 2010
250MB Mailbox Size, 3MB DB Cache/user, 12 x 7.2k SATA disks (DB/Logs on same spindles), Loadgen Outlook 2007 Online Very Heavy Profile, measured at <20ms RPC
Average latency
JBOD/RAID'less Storage: Now an option!
JBOD : 1 disk = 1 Database/Log
Requires E2010 HA (3+ DB Copies)
Annual Disk Failure Rate (AFR) = ~5%
JBOD Advantages
Reducing Storage Costs/Complexity
• Eliminates unnecessary DB copies: Server and Storage redundancy can be symmetrical
• Reduces Disk IO: Eliminates RAID write penalty
• Enables Simple Storage Design: 1 disk = 1 database
• Enables Simple Storage Failure Recovery
JBOD Challenges
Exchange HA/Storage must replace RAID functionality
• Disk Striping performance (e.g. RAID10) cannot be leveraged
• Disk Failure = Database Failover (~30 second outage)
• Re-enabling Resiliency = Spare disk assignment/partitioning/format/DB re-seed (scriptable)
• Soft Disk Errors (bad blocks) must be detected and repaired
JBOD/RAID'less Storage: E2010 Optimizations
Improve HA storage failure
detection and failover
HA now detects storage failures and
automatically fails over
Optimize HA
Failovers/Switchovers
Failovers < 30 seconds
ESE tuned to maintain DB cache
after failover (Cache warming)
Improve storage failure
detection (bad
blocks/corruption)
Active/Passive copy background scan
(Checksum)
Active/Passive copy Lost Write
Detection
Improve Database
Seeding/Repair
Utilize DB passive copy for seeding
source
Seed capability for Content Index
Catalog
Reduce re-seeds by using Single
Page Restore (Active and Passive)
JBOD/RAID'less Storage: Single Page Restore (Active)
1. Page corruption
detected on Active
Copy (e.g. -1018)
2. Active DB places
marker in log stream to
notify passive copies to
ship up to date page
3. Passive receives log
and replays up to
marker, retrieves good
page, invokes Replay
Service callback and
ships page
4. Active receives good
page, writes page to
DB. Page is restored.
5. Subsequent page
repair from additional
copies ignored
Database Availability Group (DAG)
Mailbox Server
Node 1
Mailbox Server
Node 2
Mailbox Server
Node 3
DB1-Active
DB1-CopyA
DB1-CopyB
Log
Log
Log
Page1
Page1
Page1
Page2
Page2
Page2
Page3
Page3
Page3
Database
Database
Database
E2010 HA Storage Design Flexibility
SAN
• HA = Shared Storage
Clustering
• +1.0 IOPS/Mailbox
• 3.5” 15K 146GB FC Disks
• RAID10 for DB & Logs
• Dedicated Spindles
• Multi-path (HBA’s, FC Switches,
SAN array controllers)
• Backup = Streaming off active
• Fast Recovery = Hardware VSS
(Snapshots/Clones)
DAS (SAS)
• HA = CCR
• .33 IOPS/Mailbox
• 2.5” 146GB 10K SAS Disks
• RAID5 for DB
• RAID10 for Logs
• SAS Array Controller (/w BBU)
• Backup = VSS Snapshot
• Fast Recovery = CCR
DAS (SATA/Tier2)
• HA = DAG (2+ DB copies)
• .11 IOPS/Mailbox
• 3.5” 1TB 7.2K SATA/Tier2 Disks
• RAID10 for DB & Logs
• SAS Array Controller (/w BBU)
• Backup = VSS
Snapshot/Optional
• Fast Recovery = Database
Failover
More options to reduce storage cost
JBOD (SATA/Tier2)
• HA = DAG (3+ DB copies)
• .11 IOPS/Mailbox
• 3.5” 1TB 7.2K SATA/Tier2 Disks
• 1 DB = 1 Disk
• SAS Array Controller (/w BBU)
• Backup = VSS
Snapshot/Optional
• Fast Recovery = Database
Failover
E2010 Storage Design Flexibility
Exchange Online Archive provides mailbox storage flexibility
One Mailbox per user or two
E2010 optimized for DAS storage, SAN storage is fully supported
IOPS reductions/SATA optimizations enable lower performing storage
E2010 HA architected for DAS (simpler)
JBOD* and RAID storage support
E2010 optimized for Tier 2 (SATA) disks, Enterprise disks are fully supported
SSD storage supported but not recommended for mainstream due to
high $/GB
Storage Groups are gone; Max 100 Databases/Server
Max recommended DB Size = 2TB*
Max recommended Folder Item Count = 100K**
*2+ copy E2010 HA only
** Assuming no 3rd party applications
E2010 Storage Requirements
Storage Guidance
Stand Alone
E2010 HA(2 copies)
E2010 HA(3+ copies)
Storage Type
DAS, SAN (Fibre Channel, iSCSI)
Disk Type
SAS, Fibre Channel, SATA/Tier2 , SSD
RAID
RAID recommended
RAID optional
RAID Type
RAID-1/0, RAID-5, RAID-6
JBOD
DB/Log Isolation
Best Practice
Windows Disk Type
Basic (recommended), Dynamic (supported)
Partition Type
GPT (recommended), MBR (supported)
Partition Alignment
Windows 2008/R2 Default (1MB)
File System
NTFS
NTFS Allocation Unit Size
64KB for both database and log volumes
Encryption Support
Outlook Protection Rules, Bitlocker
Not required
See Appendix for full details
E2010 HA/JBOD Storage Example
Single Site, 3 Node, 3 Copy DAG
8 Cores
32GB RAM
Mbx Server 1
8 Cores
32GB RAM
Mbx Server 2
8 Cores
32GB RAM
Mbx Server 3
DB1
DB2
DB3
DB4
DB5
DB6
DB1
DB2
DB3
DB4
DB5
DB6
DB1
DB2
DB3
DB4
DB5
DB6
DB7
DB8
DB9
DB10
DB11
DB12
DB7
DB8
DB9
DB10
DB11
DB12
DB7
DB8
DB9
DB10
DB11
DB12
DB14
DB15
DB16
DB17
DB18
DB20
DB21
DB22
DB23
DB24
DB26
DB27
DB28
DB29
DB30
DB13
DB14
DB15
DB16
DB17
DB18
DB19
DB20
DB21
DB22
DB23
DB24
DB25
DB26
DB27
DB28
DB29
DB30
DB13
DB19
DB25
DB14
DB15
DB16
DB17
DB18
DB20
DB21
DB22
DB23
DB24
DB26
DB27
DB28
DB29
DB30
DB13
DB19
DB25
Database Availability Group (DAG)
Active copy
Passive copy
Legend
Spare Disk
10,000 Mailboxes
Heavy Profile: 120
Messages/day
.11 IOPS/Mailbox
2GB Mailbox Size
3,333 Active
Mailboxes/Server
3 Nodes, 3 Copies =
double disk failure
resiliency
1TB 7.2k disks
(SAS/SATA/Tier2)
JBOD: 30
Disks/node
Online Spares
Battery Backed
Caching Array
Controller
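The numbers in this example can be checked with a short back-of-the-envelope calculation. The sketch below uses only figures from the slides (10,000 mailboxes, .11 IOPS/mailbox, 2GB mailboxes, 30 databases on 30 disks per node, ~5% AFR). It deliberately ignores passive-copy IO, log and content index space, and spare disks, so it is a rough sanity check, not sizing guidance.

```python
MAILBOXES = 10_000
IOPS_PER_MAILBOX = 0.11
MAILBOX_SIZE_GB = 2
DATABASES = 30                # DB1..DB30, one database per disk
NODES = 3
DISKS_PER_NODE = 30
ANNUAL_FAILURE_RATE = 0.05    # ~5% AFR from the JBOD slide


def design_summary() -> dict:
    """Derive per-database and per-deployment figures from the slide inputs."""
    mailboxes_per_db = MAILBOXES / DATABASES
    return {
        "total_db_iops": MAILBOXES * IOPS_PER_MAILBOX,
        "mailboxes_per_db": mailboxes_per_db,
        "mailbox_data_per_db_gb": mailboxes_per_db * MAILBOX_SIZE_GB,
        "iops_per_active_db": mailboxes_per_db * IOPS_PER_MAILBOX,
        "expected_disk_failures_per_year":
            NODES * DISKS_PER_NODE * ANNUAL_FAILURE_RATE,
    }


for name, value in design_summary().items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions each 1TB disk holds roughly 667GB of mailbox data and serves about 37 IOPS for its active database, and a 90-disk deployment should expect four to five disk failures a year, which is why scripted re-seeding onto spare disks matters.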
Key Takeaways
Exchange Server 2010..
Reduces DB IOPS by +70%...again!
Optimizes for large mailboxes (+10GB) and 100K
Item counts
Optimizes for large/slow/low-cost disks
(SATA/Tier2)
Makes JBOD/RAID'less storage a viable option
Enables unmatched storage flexibility to
reduce costs
Resources
www.microsoft.com/teched: Sessions On-Demand & Community
www.microsoft.com/learning: Microsoft Certification & Training Resources
http://microsoft.com/technet: Resources for IT Professionals
http://microsoft.com/msdn: Resources for Developers
Related Content
UNC314 – Information Protection and Control in Microsoft Exchange
Server 2010
UNC315 – Federation in Microsoft Exchange Server 2010
UNC312 – Archiving and Retention in Microsoft Exchange Server 2010
UNC320 – Microsoft Exchange Server Outlook Web Access 2010: The Future
of Web-Based E-mail
UNC317 – Microsoft Exchange Server 2010 Management Tools
UNC318 – Microsoft Exchange Server 2010 Transition and Deployment
UNC313 – High Availability in Microsoft Exchange Server 2010
UNC321 – Storage in Microsoft Exchange Server 2010
UNC324 – What's New in Exchange Web Services in Microsoft Exchange
Server 2010
UNC319 – Unified Messaging in Microsoft Exchange Server 2010
Call to Action
Learn More!
Related Content at TechEd on “Related Content” Slide
Attend in-person or consume post-event at TechEd Online
Check out online learning/training resources
http://technet.microsoft.com/exchange/2010
http://technet.microsoft.com/office/ocs
Try It Out!
Download the Exchange Server 2010 Beta Evaluation
http://www.microsoft.com/exchange/2010/try-it
Get a 5-Day Trial of Office Communications Server 2007 R2
https://r2.uctrial.com/
IOPS Reductions: Store Schema Elements
How do you move from random IO to Sequential IO?
Element: E2007 vs. E2010
Physical Contiguity (ESE)
E2007: Poor physical contiguity of leaf pages. Hence many, small size, IOs (1 for each page)
E2010: Excellent physical contiguity of leaf pages. So fewer, large size IOs, spanning N pages (N ≈100)
Logical Contiguity (Store)
E2007: Headers for each folder kept in separate table. So many, small size, IOs spread over many tables
E2010: Headers for an entire mailbox kept in a single table. Hence fewer, large sized, IOs on a single table
Temporal Contiguity (View)
E2007: All views and indexes updated each time a mail is delivered. So many, small size, IOs spread over time
E2010: Views and indexes updated only when they are accessed by user. So fewer, large sized, IOs done together
IOPS Reduction: Maintain Contiguity Over Time
New E2010 behavior…
1. Delivery
2. Random Delete
3. Defragmentation
Mailbox Messages
Mailbox Messages
Mailbox Messages
M1
M1
M1
M2
M2
M3
M3
M3
M5
M4
M4
M7
M5
M6
Contiguous
M5
M6
Fragmented
M10
M11
M7
M7
M12
M8
M8
M13
M9
M9
M14
M10
M10
M15
Contiguous
IOPS Reduction: Write IO Gap Coalescing
DB Cache
E2007 DB Write
Behavior
Page 1
Page 2
Page 3
Page 4
Page 5
Dirty
Clean
Dirty
Clean
Dirty
Writes spaced out over time
3 Write IO’s
Disk
DB Cache
E2010 DB Write
Behavior
Page 1
Page 2
Page 3
Page 4
Page 5
Dirty
Clean
Dirty
Clean
Dirty
1 Write IO
Disk
Big IO: How Big is Too Big?
IO Latency increases with IO size
Random DB IO Latency Based on Size
25
Write
IO Latency (ms)
20
15
Read
10
5
E2010 Max IO Size =
256KB for Read
384KB for Write
0
0
128 256 384 512 640 768 896 1024
IO Size (KB)
SqlIO Test, 1x 750GB 7.2k SATA, no caching array controller
Optimize for SATA/Tier 2 Disks
Solution: Smooth DB Write IO
Throttle DB writes based on Checkpoint target (QoS)
• When Checkpoint Depth is between 1x and 1.24x of Checkpoint target, limit Max Outstanding DB
writes/LUN to 1
• When Checkpoint Depth meets or exceeds 1.25x of Checkpoint target, ratchet up Max
Outstanding DB writes/LUN
• The further behind on checkpoint, the more aggressively we raise the Max Outstanding DB
writes/LUN (Maximum = 512/LUN)
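The throttling rules above can be sketched as a function of checkpoint depth. The slide gives the thresholds (1 outstanding write below 1.25x of target, a cap of 512 per LUN) but not the shape of the ramp in between, so the linear ramp and the `step` parameter below are purely assumptions for illustration.

```python
MAX_OUTSTANDING_CAP = 512  # Maximum = 512/LUN, from the slide


def max_outstanding_db_writes(checkpoint_depth_mb: float,
                              checkpoint_target_mb: float,
                              step: int = 1) -> int:
    """Return the allowed outstanding DB writes per LUN.

    Below 1.25x of the checkpoint target the limit is 1. At or above 1.25x
    the limit rises; the linear ramp of `step` writes per percentage point
    is a hypothetical choice, not the documented ESE behavior.
    """
    percent_of_target = int(checkpoint_depth_mb * 100 / checkpoint_target_mb)
    if percent_of_target < 125:
        return 1
    return min(MAX_OUTSTANDING_CAP, 2 + (percent_of_target - 125) * step)


# 20MB checkpoint target, as in the slide's example.
for depth_mb in (20, 24, 25, 30, 40):
    print(depth_mb, max_outstanding_db_writes(depth_mb, 20))
```

The point of the sketch is the shape: writes trickle out one at a time while the checkpoint is close to target, and concurrency only rises as the database falls further behind.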
Chart: Max Outstanding DB Writes vs. Log Checkpoint Depth (MB)
Works from JBOD SATA through RAID10 SAN
20MB Max Checkpoint example
JBOD/RAID'less Storage: Lost Flush Detection
What is a lost flush?
A DB write IO that the disk subsystem/OS returned as completed did not
actually get written to media or was written in the wrong location (aka
lost write).
Why are they so bad?
Your database may be logically corrupt and you do not know it!
How can they be detected in E2010?
Two methods:
1. In Memory Flush Map (Active & Passive): memory overhead of
2 bits/page. Event ID 530 is fired when detected (-1119) and
page can be patched.
2. Database Recovery: Event is fired (ID 516: timestamp mismatch,
(-567)) and database must be re-seeded.
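The memory cost of the in-memory flush map is easy to estimate from the 2 bits/page figure. The sketch below assumes 32KB database pages and binary units (1 TiB = 2^40 bytes); `flush_map_bytes` is a hypothetical helper, not an ESE API.

```python
def flush_map_bytes(db_size_bytes: int,
                    page_size_bytes: int = 32 * 1024,
                    bits_per_page: int = 2) -> int:
    """Approximate flush map size in bytes for a database of the given size."""
    pages = -(-db_size_bytes // page_size_bytes)   # ceiling division
    return -(-pages * bits_per_page // 8)          # bits to bytes, rounded up


two_tib = 2 * 2**40
print(flush_map_bytes(two_tib) / 2**20)  # 16.0 (MiB) for a 2TB database
```

Even at the maximum recommended database size, the map costs on the order of 16MB of memory per database copy under these assumptions.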
Exchange 2010 High Availability
Simplified Mailbox High Availability and
Disaster Recovery with New Unified Platform
San Jose
Recover quickly
from disk and
database failures
New York
Mailbox
Server
Mailbox
Server
Mailbox
Server
DB1
DB2
DB3
DB4
DB5
DB1
DB2
DB3
DB4
DB5
DB1
DB2
DB3
DB4
DB5
Replicate databases
to remote datacenter
Evolution of Continuous Replication technology (Database Mobility)
Easier than traditional clustering to deploy and manage
Allows each database to have 16 replicated copies
Provides full redundancy of Exchange roles on as few as two servers
E2010 High Availability Architecture
AD site: Dallas
Client
DB1
DB3
AD site: San Jose
DB5
CAS/HUB
Mailbox
Server 6
Database
Availability
Group (DAG)
Mailbox
Server 1
Mailbox
Server 2
Mailbox
Server 3
Mailbox
Server 4
Mailbox
Server 5
DB1
DB4
DB2
DB5
DB3
DB2
DB5
DB3
DB1
DB4
DB3
DB1
DB4
DB2
DB5
JBOD/RAID'less Storage: Single Page Restore (Passive)
1. Page corruption detected on DB Copy (e.g. -1018)
2. Passive copy pauses log replay (log copying continues)
3. Passive retrieves the corrupted page # from the active using DB seeding infrastructure
4. Passive copy waits until the log file that meets the max required generation requirement is copied/inspected, then patches the page
5. Passive resumes log replay
Database Availability Group (DAG)
Mailbox Server Node 1: DB1-Active (Log, Page1, Page2, Page3, Database)
Mailbox Server Node 2: DB1-CopyA (Log, Page1, Page2, Page3, Database)
Mailbox Server Node 3: DB1-CopyB (Log, Page1, Page2, Page3, Database)
Exchange 2010 Storage Guidance
Storage Type
Direct Attached Storage (DAS)
Storage Area Network (SAN): iSCSI
Stand Alone
Database Availability Group: 2 nodes, 2 Database copies Database Availability Group: 3+ nodes, 3+ Database copies
Supported
Supported
Supported. Best Practice = Do not share physical Supported. Best Practice = Do not share physical disks
disks backing Exchange data with other
backing Exchange data with other applications.
applications.
Supported
Supported. Best Practice = Do not share physical disks backing
Exchange data with other applications.
Storage Area Network (SAN): Fibre
Channel (FC)
Supported. Best Practice = Do not share physical Supported. Best Practice = Do not share physical disks
disks backing Exchange data with other
backing Exchange data with other applications.
applications.
Best Practice = Do not place both database copies on the
same physical spindles.
Supported. Best Practice = Do not share physical disks backing
Exchange data with other applications.
Best Practice = Do not place both database copies on the same
physical spindles.
Network Attached Storage (NAS): SMB
Physical Disk Type
SATA
Not Supported
Not Supported
Supported, requires battery backed caching array Supported, requires battery backed caching array
controller for data integrity
controller for data integrity
Supported, requires battery backed caching array controller for
data integrity
SAS
FC
SSD (Flash Disk)
Physical Disk Write Caching (enabled)
Storage RAID
Supported
Supported
Supported
Not Supported
RAID recommended
Supported
Supported
Supported
Not Supported
RAID recommended
Supported
Supported
Supported
Not Supported
RAID optional
EDB Volume
Log Volume
Disk Array RAID Stripe Size (kb)
Storage Array Cache Settings
RAID5/6, RAID10, RAID1
RAID1, RAID10
256KB
75% Write Cache, 25% Read Cache (with Battery
Backed Cache)
RAID5/6, RAID10, RAID1
RAID1, RAID10
256KB
75% Write Cache, 25% Read Cache (with Battery Backed
Cache)
JBOD, RAID5/6, RAID10, RAID1
JBOD, RAID1, RAID10
256KB
75% Write Cache, 25% Read Cache (with Battery Backed Cache)
Preliminary Storage Guidance:
Subject to Change!
Database/Log file placement
Database/Log Isolation
Not Supported
Best Practice (for recoverability) = separate
Database file (.edb) and logs from same Database can
database file (.edb) and logs from same Database share same volume and same physical disk.
on to different volumes backed by different
physical disks
Database file (.edb) and logs from same Database can share
same volume and same physical disk. This is a best practice for
JBOD/RAID'less storage scenario where one or more volumes
store the edb and log files backed by the same physical disk.
Database Files/Volume
Based on backup methodology
Based on backup methodology
Log Streams/Volume
Based on backup methodology
Based on backup methodology
RAID = based on backup methodology, JBOD = one DB
file/volume is recommended
RAID = based on backup methodology, JBOD = one log
stream/volume is recommended
Recommended
Supported
Recommended
Supported
Recommended
Supported
File System
NTFS Defragmentation
NTFS Allocation Unit Size
Recommended
Supported
Windows 2008 Default: 1MB
Drive Letter or Mount Point (mount point host
volume must be RAIDed)
NTFS support only
Not required, not recommended
64KB for both edb and log volumes
Recommended
Supported
Windows 2008 Default: 1MB
Drive Letter or Mount Point (mount point host volume
must be RAIDed)
NTFS support only
Not required, not recommended
64KB for both edb and log volumes
Recommended
Supported
Windows 2008 Default: 1MB
Drive Letter or Mount Point (mount point host volume must be
RAIDed)
NTFS support only
Not required, not recommended
64KB for both edb and log volumes
NTFS Compression
Not Supported for Exchange Database files
Not Supported for Exchange Database files
Not Supported for Exchange Database files
NTFS Encrypted File System (EFS)
Not Supported for Exchange Database files
Not Supported for Exchange Database files
Not Supported for Exchange Database files
Windows Bitlocker (volume encryption)
Supported for all Exchange database and log files Supported for all Exchange database and log files
Windows Disk Type
Basic Disk
Dynamic Disk
Partition Type
GUID Partition Table (GPT)
Master Boot Record (MBR)
Partition Alignment
Volume Path
Supported for all Exchange database and log files
Complete an
evaluation on
CommNet and
enter to win!
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should
not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.