Transcript Document
SQL Server 2014: In-Memory OLTP Adoption Considerations
Mike Weiner @MikeW_CAT [email protected]

In-Memory OLTP Architectural Pillars – Review
• Main-Memory Optimized (principle: performance-critical data fits in memory)
  • Direct pointers to rows
  • Indexes exist only in memory
  • No buffer pool
  • No write-ahead logging
  • Stream-based storage
  • Result: speed of an in-memory cache with the capabilities of a database
• T-SQL Compiled to Native Machine Code (principle: push decisions to compilation time)
  • T-SQL compiled to machine code leveraging the VC compiler
  • A procedure and its queries become a C function
  • Aggressive optimizations at compile time
  • Result: queries and business logic run at native-code speed
• Non-Blocking Execution (principle: conflicts are rare)
  • Multi-version optimistic concurrency control with full ACID support
  • Lock-free data structures
  • No locks, latches or spinlocks
  • No I/O during the transaction
  • Result: transactions execute to completion without blocking
• SQL Server Integration (principle: built-in)
  • Same manageability, administration and development experience
  • Integrated queries and transactions
  • Integrated HA and backup/restore
  • Result: hybrid engine and integrated experience

Introduction
Application Patterns for In-Memory OLTP
• In-Memory OLTP can meet demanding scale and performance goals with SQL Server
• In-Memory OLTP is a solution for applications suffering from RDBMS performance and scale bottlenecks
• There are some common workload patterns which work well with In-Memory OLTP
Adoption Methodology and considerations for In-Memory OLTP
• Evaluate (early) whether In-Memory OLTP can address the application needs. Some workloads may not benefit from In-Memory OLTP; some may, but with a large amount of work required
• Migrate in stages; there are tools to help
• Consider aspects of the In-Memory OLTP engine in design

Successful Patterns with In-Memory OLTP*
• High Data Insert Rate
  • Characteristics and challenge: primarily append-only store; inability to ingest the write workload
  • Main benefits: eliminate contention; minimize IO logging
• Read Performance and Scale
  • Characteristics and challenge: high-performance read operations; unable to meet scale-up requirements
  • Main benefits: eliminate contention; efficient data retrieval; minimize code execution time; CPU efficiency for scale
• Compute Heavy Data Processing
  • Characteristics and challenge: insert/update/delete workload; heavy computation inside the database; read and write contention
  • Main benefits: eliminate contention; minimize code execution time; efficient data processing
• Low Latency
  • Characteristics and challenge: requires low-latency business transactions which typical database solutions cannot achieve; high concurrency exacerbates latency
  • Main benefits: eliminate contention; minimize code execution time; efficient data retrieval
• Session State Management
  • Characteristics and challenge: heavy insert, update and point lookups; user scale under load from multiple stateless web servers
  • Main benefits: eliminate contention; efficient data retrieval; optional IO reduction/removal
*In-Memory OLTP – Common Workload Patterns and Migration Considerations
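To make the pillars and the session state pattern concrete, here is a minimal sketch (table, procedure and parameter names are hypothetical, not from the presentation): a non-durable memory-optimized table with a hash index on the session key, and a natively compiled procedure that updates it. It assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup (a storage sketch appears later in this document).

-- Hypothetical session state table; DURABILITY = SCHEMA_ONLY because the data is transient
CREATE TABLE dbo.SessionState
(
    SessionId  NVARCHAR(64) COLLATE Latin1_General_100_BIN2 NOT NULL  -- index key string columns need a BIN2 collation in SQL Server 2014
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserId     INT             NOT NULL,
    Payload    VARBINARY(6000) NOT NULL,
    LastTouch  DATETIME2       NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO

-- Natively compiled procedure: the procedure and its queries compile to machine code
CREATE PROCEDURE dbo.usp_TouchSession
    @SessionId NVARCHAR(64), @Payload VARBINARY(6000)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- point lookup and update on the hash index; no locks, latches or log I/O for SCHEMA_ONLY data
    UPDATE dbo.SessionState
    SET Payload = @Payload, LastTouch = SYSUTCDATETIME()
    WHERE SessionId = @SessionId;
END;
GO

Calls such as EXEC dbo.usp_TouchSession @SessionId = N'abc123', @Payload = 0x00; are the kind of short, contention-free transaction the session state pattern relies on.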
Early Workload Performance Gains (approximate X-factor gains for applications)
• BEST FIT – apps that take full advantage, e.g. web app session state: ~30x
• INGEST / READ HEAVY – apps with periodic bulk updates and heavy random reads: ~10x
• LEGACY APP – existing apps typically see 4-7x improvement: ~5x
• UPDATE HEAVY OLTP – despite 20 years of optimizing for OLTP benchmarks, still ~2x on a workload derived from TPC-C: ~2x

In-Memory OLTP impact within the Engine (traditional execution stack vs. In-Memory OLTP execution stack)
• Client Connectivity → Client Connectivity: no improvement
• Query Execution → Procedure Execution: 2x to 30x improvement, including concurrency gains
• Data Access (Buffer Pool) → Data Access (Memory): 2x to 30x improvement, including concurrency gains
• Transaction Logging → Transaction Logging: same latency; potentially less volume for durable I/O; eliminated for non-durable tables

Adoption Steps
• Assess application
  • Is the application suitable for In-Memory OLTP?
  • Define goals and measure
• Identify bottlenecks
  • “Analysis” through MDW and the ‘Transaction Performance Analysis Report’
• Staged migration
  • “Migrate” utilizing the Advisors in Management Studio for help
  • Measure against goals
• Understand considerations
  • Memory, storage, indexing

Assess Application
• Invest in workloads that are OLTP in nature, with few exceptions
• Move critical data-sets/T-SQL into In-Memory OLTP and add more via subsequent dev/test iterations
• Some applications may require a lot of work to achieve the gains, while others migrate quite easily
• If you lack the ability to change code, In-Memory OLTP in SQL Server 2014 won’t work
• Some deployments may not be a good fit
  • Hardware limitations
  • Long-running reporting workloads – row versions can utilize a lot of memory if queries are long running
  • Queries which execute much better with parallelism
  • Dependencies on locking, MARS, unsupported collations, or specialized indexes like XML/Full-Text
  • Very dynamic, undefined workload patterns

Establish Performance Baseline
• Production is great
  • More accurate and realistic data
  • But hard to get access to, and profiling may impact the workload
• Simulation works, but requires care
  • Scale, transaction mix and profiling period matter
  • Try to be as realistic as possible with scale, transaction mix, hardware configuration, etc.
• Baseline using workload tools
  • Ostress, Distributed Replay, Visual Studio Test System, etc.
  • In-Memory OLTP’s Analyze, Migrate and Report toolset
  • Performance Monitor / DMVs

Identify Bottlenecks
• Table contention
  • Look at dm_db_index_operational_stats, dm_os_latch_stats, and the Latch Waits/sec counter
  • Target the table with the heaviest latching
• Table access
  • Look at dm_db_index_operational_stats
  • Target the table with the heaviest accesses
• Stored procedure CPU usage
  • Look at dm_exec_query_stats
  • Target the stored procedure with the highest CPU usage
• Tooling in place
  • In-Memory OLTP’s Analyze, Migrate and Report toolset
Testing with a realistic workload is critical
http://msdn.microsoft.com/en-us/library/dn511655(v=sql.120).aspx
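As a sketch of the bottleneck identification above, the following interop queries use the DMVs called out on the slide; the specific column choices and TOP thresholds are assumptions for illustration, not part of the original material.

-- Tables with the heaviest latch and lock waits in the current database
-- (candidates for migration to memory-optimized tables)
SELECT TOP (10)
       OBJECT_NAME(ios.object_id)     AS table_name,
       SUM(ios.page_latch_wait_count) AS page_latch_waits,
       SUM(ios.page_latch_wait_in_ms) AS page_latch_wait_ms,
       SUM(ios.row_lock_wait_count)   AS row_lock_waits
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
GROUP BY ios.object_id
ORDER BY page_latch_wait_ms DESC;

-- Stored procedures consuming the most CPU
-- (candidates for native compilation)
SELECT TOP (10)
       OBJECT_NAME(st.objectid, st.dbid) AS procedure_name,
       SUM(qs.total_worker_time)         AS total_cpu_time_us,
       SUM(qs.execution_count)           AS executions
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.objectid IS NOT NULL
GROUP BY st.dbid, st.objectid
ORDER BY total_cpu_time_us DESC;

sys.dm_os_latch_stats (ordered by wait_time_ms) gives the instance-wide view of the same latch pressure; the Transaction Performance Analysis Report in Management Studio packages this kind of analysis for you.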
Taking an Iterative Approach to Migration
• Identify bottlenecks in tables
• Address surface area limitations and migrate data
• Perform minimal changes for interpreted access
• Identify performance-critical transaction code
• Address unsupported constructs and migrate code

Durability Considerations
• Durability = SCHEMA_AND_DATA
  • Operations on data are always fully logged
  • Data is stored in checkpoint files on disk and can be recovered
  • Requires storage considerations
• Durability = SCHEMA_ONLY
  • Transient data with no log or checkpoint files
  • The schema can be recovered, but data cannot be recovered after a server restart or failure
  • Reduces dependency on disk drives and improves overall transaction throughput
  • Not universally applicable – use for select scenarios only

Storage Considerations for SCHEMA_AND_DATA objects
Log drive
• Log IO is still critical to latency
• Index operations and UNDO records are not logged
Data drive – checkpoint file pairs (CFPs), space and performance
• Place checkpoint files on a drive with high sequential I/O
  • Needed for offline checkpoint, garbage collection (merge – read/consolidate), and recovery
• Spread checkpoint files over an odd number of containers across multiple disks
  • Data/delta allocation is round-robin across all containers, improving streaming IO bandwidth
• The size of the files may be larger than the size of the tables in memory
  • Start by allocating storage up to 4 times the size of the durable tables
  • An automated merge consolidates CFPs to remove deleted rows
  • Merged CFPs go through state transitions and are eventually garbage collected
  • This requires multiple checkpoints and log truncation; log backups are needed depending on the recovery model
• Use the workload’s steady state to determine disk usage
http://blogs.technet.com/b/dataplatforminsider/archive/2014/01/16/storage-allocation-and-management-for-memory-optimized-tables.aspx
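As a configuration sketch for the storage guidance above (the database name, paths and container count are hypothetical), a memory-optimized filegroup with an odd number of containers spread across drives might look like this:

-- Add the MEMORY_OPTIMIZED_DATA filegroup required for memory-optimized tables
ALTER DATABASE MyDatabase
    ADD FILEGROUP MyDatabase_mod CONTAINS MEMORY_OPTIMIZED_DATA;

-- Spread containers (directories holding checkpoint file pairs) across drives
ALTER DATABASE MyDatabase
    ADD FILE (NAME = 'MyDatabase_mod_1', FILENAME = 'E:\IMOLTP\MyDatabase_mod_1')
    TO FILEGROUP MyDatabase_mod;
ALTER DATABASE MyDatabase
    ADD FILE (NAME = 'MyDatabase_mod_2', FILENAME = 'F:\IMOLTP\MyDatabase_mod_2')
    TO FILEGROUP MyDatabase_mod;
ALTER DATABASE MyDatabase
    ADD FILE (NAME = 'MyDatabase_mod_3', FILENAME = 'G:\IMOLTP\MyDatabase_mod_3')
    TO FILEGROUP MyDatabase_mod;

-- Checkpoint file pair states and sizes can be observed in sys.dm_db_xtp_checkpoint_files

Data and delta files are allocated round-robin across the three containers, which is what provides the streaming I/O bandwidth the slide calls out.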
Memory Considerations
Memory allocation
• Size of rows (datatypes) and indexes, plus row versions
  • Row versions must stay in memory until all transactions that can still see them have completed
  • Garbage collection, unless under memory pressure, is asynchronous and can take time to remove rows
• Nonclustered hash index: fixed size, a function of the bucket count, 8 bytes per bucket
• Nonclustered index: dynamic size, a function of the row count, plus overhead
Recommendations
• Around 250 GB max for SCHEMA_AND_DATA tables in a database
• For a ‘common’ OLTP workload, plan for 2x the table-size memory allocation (to allow for 2 row versions)
• Consider other workloads which may require memory under the SQL Server process
• Use Resource Governor to control memory allocated at the database level
  • Create Resource Pool…MAX_MEMORY_PERCENT
  • Pools should only allocate a percentage (around 80%) to memory-optimized objects, for system stability

Estimating Memory Consumption
• Memory size = table size + SUM(index sizes)
  • Table size = row size * row count
  • Row size = SUM(column sizes) + row header size
  • Row header size = 24 + 8 * index count
  • Column size = dependent on the column type and associated padding/overhead
• Hash index size = bucket_count * 8 bytes
• Nonclustered index size = row count * (key size + 8) bytes
• Details available in Books Online – http://msdn.microsoft.com/en-us/library/dn205318(v=sql.120).aspx
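Applying the estimation formulas above to a hypothetical example (the row count, per-row column size, key size and bucket count below are illustrative assumptions):

-- Hypothetical table: 5,000,000 rows, ~200 bytes of column data per row,
-- one hash index (bucket_count 8,388,608) and one nonclustered index with an 8-byte key
DECLARE @row_count    BIGINT = 5000000;
DECLARE @column_bytes BIGINT = 200;
DECLARE @index_count  INT    = 2;
DECLARE @bucket_count BIGINT = 8388608;  -- the engine rounds bucket counts up to a power of two
DECLARE @nc_key_bytes BIGINT = 8;

DECLARE @row_header  BIGINT = 24 + 8 * @index_count;                      -- 40 bytes
DECLARE @table_bytes BIGINT = @row_count * (@column_bytes + @row_header); -- ~1.2 GB
DECLARE @hash_bytes  BIGINT = @bucket_count * 8;                          -- ~64 MB
DECLARE @nc_bytes    BIGINT = @row_count * (@nc_key_bytes + 8);           -- ~76 MB

SELECT (@table_bytes + @hash_bytes + @nc_bytes) / 1048576.0 AS estimated_mb;  -- ~1,285 MB
-- Per the recommendation above, plan roughly 2x this (about 2.5 GB) to allow for row versions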
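The Memory Considerations above call for Resource Governor control at the database level; a sketch of that binding follows (the pool name, database name and percentage are assumptions). The binding takes effect the next time the database is brought online.

CREATE RESOURCE POOL InMemOLTP_Pool WITH (MAX_MEMORY_PERCENT = 80);
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- Bind the database to the pool; memory for its memory-optimized objects
-- is then taken from, and capped by, this pool
EXEC sp_xtp_bind_db_resource_pool
     @database_name = N'MyDatabase',
     @pool_name     = N'InMemOLTP_Pool';

-- The binding is applied when the database comes back online
ALTER DATABASE MyDatabase SET OFFLINE WITH ROLLBACK IMMEDIATE;
ALTER DATABASE MyDatabase SET ONLINE;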
Further System Considerations
• No need to re-invest in hardware (although likely more memory)
• Processor
  • Number of cores/sockets – all deployments so far have been 2-4 sockets
  • The engine is not NUMA aware
• Network
  • Bandwidth, latency and configuration are still critical
  • Receive-Side Scaling configurations can become important
  • Consider batching client calls to reduce the number of client/server round-trips and the associated latency

Index Guidelines
• Nonclustered memory-optimized index
  • Execution is similar to a nonclustered index on disk-based tables
  • Choose an index direction that conforms to the seek direction
  • The column ordering defined in the index dictates the ordering during a seek – it should match the ORDER BY
• Nonclustered hash memory-optimized index
  • A bucket_count must be specified
  • A bucket count that is too low or too high impacts read and write behavior as well as memory
  • Performance is generally good up to about 5x the ideal bucket count; if the average number of key values per bucket is greater than 10, consider a nonclustered index instead
  • It is usually better to overestimate the bucket count than to underestimate it
  • Queries on a subset of the columns in a hash index result in scans – the index key should be exactly the columns the search is on
• Recall that indexes need to be re-created on startup

Index Guidelines – Nonclustered vs. Nonclustered Hash, by operation
• Specify bucket count: not required for nonclustered; required for nonclustered hash
• Index scan, retrieve all table rows: supported by both
• Index seek on equality predicate(s) (=): supported by nonclustered*; nonclustered hash performs better, but the predicate must include all columns in the hash key, otherwise it will result in a scan
• Index seek on inequality predicates (>, <, <=, >=), BETWEEN: nonclustered performs better than nonclustered hash*; on a hash index this will result in a scan
• Retrieve rows in a sort order matching the index definition: supported by nonclustered; not supported by nonclustered hash
• Retrieve rows in a sort order matching the reverse of the index definition: not supported by either
* The predicate does not need to include all key columns, but it must include at least the leading key columns, otherwise it will result in a scan

Interpreted Transact-SQL vs. Natively Compiled
• Interpreted T-SQL access
  • Accesses both memory-optimized and disk-based tables
  • Less performant
  • Virtually the full T-SQL surface area
  • When to use: ad hoc queries; surface area not available in native; compilation at runtime can impact the plan (positively)
• Natively compiled procedures
  • Access only memory-optimized tables
  • Maximum performance
  • Limited T-SQL surface area
  • When to use: well-defined OLTP-style operations; performance-critical business logic; query plans unlikely to change based on statistics

Query Performance
• Lack of parallelism can be problematic for some queries
  • Consider the trade-off between faster execution (native) and parallel operation
• Joins between a memory-optimized table and a columnstore index can be problematic
• Statistics creation and updates can be problematic for large tables
  • The FULLSCAN requirement can take a long time
  • If a query has many attributes, creating statistics on individual columns can have an impact
• Consider the impact of having no auto-update of statistics
  • For interop queries: determining how often and when to update statistics can affect plans
  • For native procedures: do the data load, execute the statistics update, then create the natively compiled procedure

Recap: Migration Steps
• Assess application
  • Is the application suitable for In-Memory OLTP?
  • Define goals and measure
• Identify bottlenecks
  • “Analysis” through MDW and the ‘Transaction Performance Analysis Report’
• Staged migration
  • “Migrate” utilizing the Advisors in Management Studio for help
  • Measure against goals
• Understand considerations
  • Memory, storage, indexing

Review
Application Patterns for In-Memory OLTP
• In-Memory OLTP can meet demanding scale and performance goals with SQL Server
• In-Memory OLTP is a solution for applications suffering from engine performance and scale bottlenecks
• There are some common workload patterns which work well with In-Memory OLTP
Adoption Methodology and considerations for In-Memory OLTP
• Evaluate (early) whether In-Memory OLTP can address the application needs. Some workloads may not benefit from In-Memory OLTP; some may, but with a large amount of work required
• Migrate in stages; there are tools to help
• Consider aspects of the In-Memory OLTP engine in design

References
• In-Memory OLTP Common Workloads and Migration Considerations
• In-Memory OLTP Internals Overview
• In-Memory OLTP Sample Database and Workload (based on AdventureWorks): http://msdn.microsoft.com/en-us/library/dn511655(v=sql.120).aspx
• ASP.NET In-Memory OLTP Provider: http://www.nuget.org/packages/Microsoft.Web.SessionState.SqlInMemory/
• In-Memory OLTP Blog Series
• Books Online: In-Memory OLTP