How Data Warehouse Fast Track reference configurations and best practices provide faster I/O bandwidth through scale-up and scale-out.


Transcript

Topics: faster I/O bandwidth, scale-up and scale-out, reference configurations, best practices, Data Warehouse Fast Track, Parallel Data Warehouse (PDW), SQL Server Analysis Services (SSAS), and traditional multidimensional (MD) design.
Characteristic comparison: Typical BI (DWs and DMs) vs. OLTP (operational database)
• Data activity: BI sees large reads (disjoint sequential scans), large writes (new data appends), and large-scale hashing; OLTP sees indexed reads and writes, small transactions, and constant small index reads, writes, and updates.
• Database sweet-spot size: BI runs hundreds of gigabytes to terabytes (needs medium to large storage farms); OLTP runs gigabytes (requires smaller to medium-sized storage farms).
• Time period: BI data is historical (which contributes to large data volumes); OLTP data is current.
• Queries: BI queries are largely unpredictable; OLTP queries are predictable.
• I/O throughput requirement: BI needs up to 20 GB/sec sustained throughput; for OLTP, IOPS is more important than sustained throughput.
Microsoft and HP offer:
• SQL Server/HP Fast Track reference configurations
• Traditional DW design reference configurations
• OR SQL Server Parallel Data Warehouse (PDW)
These deliver lower hardware costs.
Hmm, what will my logical and physical DB design look like?
Different logical and physical DB design philosophies
Traditional DW design balances the workload across filegroups and LUNs, which in turn are striped across many physical disks (typically RAID 5); a sketch of this layout follows below.
PRO: Data is distributed and I/Os are parallelized across multiple physical disk drives.
CON: Little attention is traditionally paid to how or where data is physically loaded or where indexes are built.
It is not uncommon to need hundreds of disk drives to support the I/O throughput requirements in a traditional DW environment.
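As an illustration (not from the original deck), here is a minimal T-SQL sketch of that traditional layout, assuming hypothetical drive letters, file names, and sizes; each file sits on a different LUN so reads and writes are spread across many spindles:

-- Hypothetical multi-filegroup layout: each file lives on a separate LUN
-- so the I/O workload is balanced across many physical disks.
CREATE DATABASE SalesDW
ON PRIMARY
    (NAME = SalesDW_sys,   FILENAME = 'E:\SQLData\SalesDW_sys.mdf',   SIZE = 10GB),
FILEGROUP FG_Facts
    (NAME = SalesDW_fact1, FILENAME = 'F:\SQLData\SalesDW_fact1.ndf', SIZE = 200GB),
    (NAME = SalesDW_fact2, FILENAME = 'G:\SQLData\SalesDW_fact2.ndf', SIZE = 200GB),
FILEGROUP FG_Dims
    (NAME = SalesDW_dim1,  FILENAME = 'H:\SQLData\SalesDW_dim1.ndf',  SIZE = 50GB)
LOG ON
    (NAME = SalesDW_log,   FILENAME = 'L:\SQLLog\SalesDW_log.ldf',    SIZE = 50GB);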
Let's take a closer look under the hood to understand BI performance.
How do Fast Track and PDW get their speed?
An X-ray view at the physical disk level.
First, let's look at a traditional DW...
Loading data into the data warehouse/mart
• Sequential data from multiple fact tables may be loaded simultaneously.
• Dimension table inserts/updates may occur in parallel.
• Indexes may be updated in real time as rows are inserted or updated, or rebuilt afterward.
• Traditional ROLAP design tends to encourage physical disk fragmentation: successive fact table loads (initial load, 2nd day, 3rd day, 5th day, 6th day, and so on) land wherever free space happens to be.
• Data is stored wherever it happens to land.
Trade-offs of using indexes in data warehouses/data marts
• Indexes are duplicate data: building them can be resource intensive, and they consume a large amount of disk space.
• DBAs spend a lot of time managing and tuning indexes.
• Interleaving indexes, column data, and pre-calculated (summary) data on disk causes slower disk service times due to extra disk head movement.
• Ad hoc queries may get inconsistent response times because of slower scan rates and long disk service times.
(A query to quantify both effects follows below.)
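As an aside (not in the original deck), a DBA can measure index overhead with a standard DMV; the database and table names below are hypothetical placeholders:

-- Space consumed and logical fragmentation for every index on a fact table.
USE SalesDW;   -- placeholder database
GO
SELECT  i.name                          AS index_name,
        ps.index_type_desc,
        ps.page_count * 8 / 1024.0      AS size_mb,        -- 8 KB pages
        ps.avg_fragmentation_in_percent
FROM    sys.dm_db_index_physical_stats(DB_ID(),
            OBJECT_ID('dbo.FactSales'), NULL, NULL, 'SAMPLED') AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id;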
Why Fast Track and PDW provide high levels of I/O throughput
• Eliminating indexes and storing data sequentially provides the fastest disk throughput rates.
• Fast Track and PDW are "index-lite": with fewer indexes and summary tables there is less disk head movement, so they achieve the fastest sequential scan rates.
• In a traditional DW design with indexes, aggregates, and summary tables, disk throughput is slower.
Example: average disk seek time is typically about 4 ms, and a full-stroke seek is about 7.5 ms. At 15K RPM (250 revolutions per second) a full revolution takes 4 ms, so average rotational latency is about 2 ms. Fast Track and PDW are designed to stream large blocks of data sequentially, which is even faster than "average latency" because the disk heads are already directly over the streaming data. Seek time is typically 2 to 4x longer than average latency, so by eliminating seek time you can use approximately 2 to 4x fewer disk drives to maintain a given throughput level.
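Working through the arithmetic behind those figures (using the typical values quoted above, not measurements):

T_{\text{rev}} = \frac{60\ \text{s/min}}{15{,}000\ \text{rev/min}} = 4\ \text{ms}, \qquad T_{\text{latency}} \approx \tfrac{1}{2} T_{\text{rev}} = 2\ \text{ms}

T_{\text{random}} \approx T_{\text{seek}} + T_{\text{latency}} \approx 4\ \text{ms} + 2\ \text{ms} = 6\ \text{ms}, \qquad \frac{T_{\text{seek}}}{T_{\text{latency}}} \approx \frac{4}{2} = 2\times \ \left(\text{up to } \frac{7.5}{2} \approx 4\times \text{ for full-stroke seeks}\right)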
Fast Track and PDW are designed to stream large blocks of data sequentially!
64 disks vs. 200 disk drives to provide similar levels of I/O throughput.
In addition, HP and SQL Server PDW use Massively Parallel Processing (MPP) to expand the Fast Track concepts into a BI "appliance".
Fast Track and PDW get their speed from FAST scan rates!
Traditional DB design reference configurations
HP Business Intelligence Sizer for Microsoft SQL Server 2005/2008
http://h71019.www7.hp.com/ActiveAnswers/us/en/sizers/microsoft-sql-bi.html
http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA30347ENW.pdf
http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA30314ENW.pdf
Reference configurations (Fast Track or PDW, or traditional DB design):
• Basic: 6 – 12 TB, DL38x with MSA2000
• Mainstream: 12 – 24 TB, DL585 G6 with MSA2000
• Mainstream: 16 – 32 TB, DL580 G5 with MSA2000 G2
• Premium: 24 – 48 TB, DL785 G6 with MSA2000 G2
HP SQL Server 2008 Parallel Data Warehouse (PDW): control rack and data rack.
Free your IT pressures . . . get more value
With HP Factory Express (vs. without), on ProLiant Servers:
• Faster time to solution
• Free up valuable IT resources
• Maximize your IT investment
Miscellaneous Techniques to Improve SQL Server BI Performance

Techniques to Improve Performance: Compression
• Separate the DB and non-DB workloads.
• PDW automatically uses compression.
• Compression reduces the I/O workload but increases CPU utilization; more data is read with each I/O. (A sketch of enabling it on SMP SQL Server follows below.)
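On a non-PDW (SMP) SQL Server 2008 data warehouse, compression has to be enabled explicitly. A minimal sketch, assuming a hypothetical fact table named dbo.FactSales:

-- Estimate the space savings first...
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactSales',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- ...then rebuild the table with PAGE compression so each 8 KB page holds
-- more rows and each physical read returns more data.
ALTER TABLE dbo.FactSales
REBUILD WITH (DATA_COMPRESSION = PAGE);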
Techniques to Improve Performance: Primary Key Considerations
• PDW uses a hash key: rows are distributed across the appliance by hashing a distribution column (sketch below).
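A rough sketch in the PDW dialect, assuming the WITH (DISTRIBUTION = ...) table options and hypothetical table/column names; large fact tables are hash-distributed while small dimensions are replicated to every node:

-- Fact rows are spread across compute nodes by a hash of CustomerKey.
CREATE TABLE dbo.FactSales
(
    DateKey     int   NOT NULL,
    CustomerKey int   NOT NULL,
    Amount      money NOT NULL
)
WITH (DISTRIBUTION = HASH (CustomerKey));

-- Small dimension tables are copied in full to every node, so joins
-- against the fact table stay local.
CREATE TABLE dbo.DimDate
(
    DateKey      int NOT NULL,
    CalendarYear int NOT NULL
)
WITH (DISTRIBUTION = REPLICATE);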
Techniques to Improve Performance: Joins and Statistics
• Only inner joins between a fact table and a dimension table are considered.
• Joins with dimensions are only considered when the dimension input cardinality is smaller than the input cardinality from the fact table.
• Make sure table statistics ("stats") are up to date (example below).
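For example, on SMP SQL Server, statistics can be refreshed per table or database-wide (table names hypothetical):

-- Refresh statistics on the fact and dimension tables so the optimizer's
-- cardinality estimates (and its join choices) stay accurate.
UPDATE STATISTICS dbo.FactSales   WITH FULLSCAN;
UPDATE STATISTICS dbo.DimCustomer WITH FULLSCAN;

-- Or refresh all out-of-date statistics in the current database.
EXEC sp_updatestats;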
Techniques to Improve Performance: Resource Governor
• Works only within a SQL Server instance.
• Controls the CPU and memory resources used by SQL Server.
• Classify sessions into workload groups (for example normal, advanced, and power users) and set workload group priority (sketch below).
• PDW uses the Resource Governor.
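A minimal sketch of that setup on SMP SQL Server 2008, with a hypothetical pool, workload group, and login name (the classifier function must be created in master and be schema-bound):

-- Cap the resources available to a "power users" pool.
CREATE RESOURCE POOL PowerUserPool
    WITH (MAX_CPU_PERCENT = 50, MAX_MEMORY_PERCENT = 40);

CREATE WORKLOAD GROUP PowerUsers
    WITH (IMPORTANCE = HIGH)
    USING PowerUserPool;
GO

-- Classifier routes sessions into workload groups by login name.
CREATE FUNCTION dbo.rg_classifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF SUSER_SNAME() = N'power_user_login'
        RETURN N'PowerUsers';
    RETURN N'default';
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;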
SQL Server Analysis Services 2008
Techniques to Improve Performance: SSAS
SSAS has two major components:
• Formula Engine: does most of the analysis work and tries to keep cells in memory. Fast clock speeds are best.
• Storage Engine: if cells are not in memory, the Storage Engine gets the data from disk. The goal is to minimize Storage Engine use and keep data in memory for the Formula Engine to use.
• Use faster storage (SSD) or more disk drives for quicker responses from the Storage Engine.
• Manage the partitions in your AS database by the query performance required: large cubes (> 100 GB) may not fit in memory, so design the partitions to get into memory as quickly as possible.
• Best practice: less than 4 million cells per partition.
Techniques to Improve Performance: Tune Memory
• This is especially important if you run SSAS cubes on the same server as a ROLAP data warehouse/mart.
Techniques to Improve Performance: Windows System Resource Manager (WSRM)
• WSRM is useful in a consolidated environment (SQL Server, SSAS, SSIS, etc.).
• Do not use WSRM to set SQL Server instance limits; SQL Server has its own settings you can tune.
Techniques to Improve Performance: Use Fast I/O
• Move active partitions to fast storage or an SSD implementation.
• Use a dedicated temp storage directory and keep drive contention minimized; SSD is excellent for this purpose.
• Design aggregations based upon your usage using the "Aggregation Design Wizard".
Techniques to Improve Performance: SSIS
• Run multiple copies of the package against the same source.
• "Row-at-a-time inserts" may be better than a bulk load.
• Buffers are allocated via execution trees; each numbered step in the data flow represents a new execution tree.
• Spawning multiple copies of the package, each with a horizontal partition of the data, creates more process space and more execution trees (see the sketch below).
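One common way (not prescribed by the deck) to hand each package copy its own horizontal slice is a modulus filter in each copy's source query; the table, key column, and copy count here are hypothetical:

-- Source query for one of four parallel package copies. Each copy gets a
-- disjoint horizontal partition of the staging rows.
DECLARE @copies int = 4;   -- total number of parallel package copies
DECLARE @copy   int = 0;   -- this copy's slot: 0, 1, 2, or 3

SELECT  s.SaleKey, s.DateKey, s.CustomerKey, s.Amount
FROM    staging.SalesSource AS s
WHERE   s.SaleKey % @copies = @copy;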
www.microsoft.com/teched
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn
Sign up for Tech·Ed 2011 and save $500, starting June 8 – June 31st
http://northamerica.msteched.com/registration
You can also register at the North America 2011 kiosk located at registration
Join us in Atlanta next year