provide faster I/O bandwidth scale-up and scale-out reference configurations best practices Data Warehouse Fast Track.
Download ReportTranscript provide faster I/O bandwidth scale-up and scale-out reference configurations best practices Data Warehouse Fast Track.
provide faster I/O bandwidth scale-up and scale-out reference configurations best practices Data Warehouse Fast Track Fast Track PDW Fast Track SSAS PDW PDW SSAS Traditional MD design Characteristic Typical BI (DW’s & DM’s) OLTP (Operational Database) Data Activity Large reads (disjoint sequential scans) Large writes (new data appends) Indexed reads and writes Large scale hashing 100’s of Gigabytes to Terabytes (need medium to large storage farms) Historical (contributes to large data volumes) Largely unpredictable Small transactions Constant small index reads, writes, and updates Up to 20 GB/sec sustained throughput IOPS is more important than sustained throughput Database sweet spot size Time period Queries I/O throughput requirement Gigabytes (require smaller to medium sized storage farms) Current Predictable Microsoft/HP SQL Server/HP Fast Track reference configurations Traditional DW design reference configurations OR SQL Server Parallel Data Warehouse (PDW) Lower hardware costs Mmm, what will my logical & physical DB design look like ? Different logical and physical DB design philosophies balance workload across filegroups, LUNS - which, in turn, gets stripped across many physical disks PRO: Data is distributed and I/Os are parallelized across multiple physical disk drives CON: Little attention is traditionally paid to how or where data is physically loaded or indexes built It is not uncommon to have hundreds of disk drives to support the I/O throughput requirements in a traditional DW environment RAID 5 Let’s lake a closer look under the hood to understand BI performance How does Fast Track and PDW get it’s speed ? X-Ray view at the physical disk level First let’s look at a traditional DW….. Loading data into the data warehouse/mart Sequential data from multiple fact tables data may be loaded simultaneously Fact table 3rd day load Fact table Dimension table insert/updates may occur in parallel Indexes may be updated in real time as rows are inserted or updated - or Indexes rebuilt Traditional ROLAP design tends to encourage physical disk fragmentation Fact table 6th day load 5th day load Fact table 2nd day load Fact table Initial load Data is stored wherever it happens to land Trade-offs using indexes in data warehouses/data marts Duplicate data building indexes can be resource intensive Indexes (duplicate data) consume a large amount of disk space DBAs spend a lot of time managing and tuning indexes slower disk service times due to extra disk head movement ad-hoc queries may get inconsistent response times because of slower scan rates & long disk service times) Column Pre-Calculated data Index / Column Why Fast Track & PDW provide high levels of I/O throughput… Eliminating indexes and storing data sequentially will provide the fastest disk throughput rates Fast Track & PDW Index-lite Summary table Fastest sequential scan rates Index-lite is faster because there is less disk head movement Traditional DW design with indexes & summary tables Index Fast Track & PDW Disk throughput is slower with indexes, aggregates and summary tables Example: Average disk Seek time is typically about 4ms; Full stroke is about 7.5ms. At 15K RPM = 250 revolutions/sec. = 4ms for a full revolution = Average latency is about 2ms. Fast Track & PDW are designed to stream large blocks of data sequentially which is even faster than “average latency” because disk heads are directly over the streaming data. Seek time is typically 2 - 4x longer than average latency. By eliminating seek time you can have approximately 2 – 4x fewer disk drives in order to maintain a given throughput level. Fast Track & PDW are designed to stream large blocks of data sequentially! 64 disks vs. 200 disk drives to provide the similar levels of I/O throughput In addition, HP and SQL Server PDW uses Massively Parallel Processing (MPP) to expand Fast Track concepts in a BI “appliance” Fast Track and PDW get it’s speed from FAST scan rates ! Traditional DB design reference configurations HP Business Intelligence Sizer for Microsoft SQL Server 2005/2008 http://h71019.www7.hp.com/ActiveAnswers/us/en/sizers/ microsoft-sql-bi.html http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA30347ENW.pdf http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA30314ENW.pdf Fast Track or PDW Traditional DB design Basic 6 – 12TB DL38x w/ MSA2000 Mainstream 12 – 24TB DL585 G6 w/ MSA2000 Mainstream 16 – 32 TB DL580 G5 w/ MSA2000 G2 Premium 24 – 48 TB DL785 G6 w/ MSA2000 G2 HP SQL Server 2008 Parallel Data Warehouse (PDW) Control Rack Data Rack Free Your IT Pressures . . . Get More Value Without HP Factory Express Faster time to solution Free up valuable IT resources With HP Factory Express Maximize your IT investment ProLiant Servers Miscellaneous Techniques to Improve SQL Server BI Performance Techniques to Improve Performance Separate the DB/NON DB workloads PDW automatically uses compression Compression Reduces I/O workload but Increases CPU utilization more pages being read via each I/O Techniques to Improve Performance Primary key considerations PDW uses hash key Techniques to Improve Performance Techniques to Improve Performance Only inner joins between a fact table and a dimension table are considered Joins with dimensions are only considered when the dimension input cardinalities are smaller than the input cardinality from the fact table Make sure table “stats” are updated Techniques to Improve Performance Resource Governor Only within a SQL Server instance Controls CPU & memory resources used by SQL Server • Normal, Advanced and power users • Set workload group priority PDW uses the Resource Governor SQL Server Analysis Services 2008 SQL Server Analysis Services 2008 Techniques to Improve Performance SSAS SSAS has to major components Formula Engine (does most of the analysis work and tries to keep cells in memory) – Fast clock speeds are best Storage Engine (if cells are not in memory, the Storage Engine gets the data from disk) – Goal is to minimize Storage Engine use and keep data in memory for the Formula Engine to use Faster Storage (SSD) OR more disk drives for quicker responses to Storage Engine Manage your partitions in your AS Database by query performance required Because Large Cubes > 100 GB may not fit in memory. So we design the partitions to get into memory as quickly as possible. Best Practice – less than 4 million cells per partition Techniques to Improve Performance This is especially important if you run SSAS cubes on the same server as a ROLAP data warehouse/mart Tune memory Techniques to Improve Performance Windows System Resource Manager (WSRM) WSRM is useful in a consolidated environment (SQL Server, SSAS, SSIS, etc.) Do not use WSRM to set SQL Server instance limits. SQL Server has it’s own settings you can tune Techniques to Improve Performance Use FAST I/O Move active partitions to fast storage or SSD implementation Use a Dedicated Temp Storage Directory Keep drive contention minimized. SSD is excellent for this purpose Configure memory based upon your usage using the “Aggregation Design Wizard” Techniques to Improve Performance Multiple copies against the same source Techniques to Improve Performance “Row-at-a-time inserts” may be better than bulk load Buffers are allocated via Execution Trees Each of these Numbered Steps represents a new Execution Tree Spawning multiple copies of the package with a horizontal partition of data will create more process space and execution trees www.microsoft.com/teched www.microsoft.com/learning http://microsoft.com/technet http://microsoft.com/msdn Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st http://northamerica.msteched.com/registration You can also register at the North America 2011 kiosk located at registration Join us in Atlanta next year