Tuning Your Game for Next Generation Intel HD Graphics Chuck Desylva Jeff Laflam Friday, March 12 @ 1:30PM Room 302, South Hall.
http://www.intel.com/software/gdc

See Intel at GDC:
• Intel Booth at Expo, North Hall
• Intel Interactive Lounge

• Client PC Graphics Market Trends
• 2010 Intel® HD Graphics Architecture Enhancements
• Demo + Case Studies
• Optimization Guidelines for Intel HD Graphics

[Chart: PC graphics shipments in millions of units, 2008 to 2013, for Mobile Integrated, Desktop Integrated, Mobile Discrete and Desktop Discrete. Callouts in reading order: Mobile Integrated MSS, Intel 78.8%, Desktop Integrated MSS, Intel 64.1%. Source: Mercury Research PC Graphics Report (Q4’09)]

[Package diagram labels: 32nm Processor Core; 45nm Graphics & Memory Controller]
• First implementation of CPU/Graphics Engine in one package.
• Evolution of the Intel® GMA HD Graphics Architecture
  – 3D Performance Improvements
  – Full Frame rate Blu-ray Playback
• Takes Advantage of 45nm Process
  – Reduced dynamic capacitance, voltage scaling and HiK+MG lower dynamic as well as lower leakage power
• Single Cooling solution for both dies
  – Reduces cost to the OEM
  – Reduces overall system power requirements

Not all features are available on every processor line item.
Westmere is the codename for the family of 32nm processors based upon the Intel microarchitecture codename Nehalem.
Vertex and primitive processing
• Improvements in HWVP (>2X peak)
• Improved vertex throughput
• Increased # of threads for VS
• Much faster Clip/Cull as well as Setup

Rasterization and Z Optimizations
• Now includes Hierarchical-Z, which provides compression for completely occluded pixels
• Fast Z Clear possible at 4X the normal peak throughput with Hierarchical-Z enabled

~1.4X Increase in Texture Throughput
~1.4X Increase in Pixel Throughput
>1.5X Increase in Peak Computes including transcendental instructions
• # of EUs; improved IPC; higher frequency

Improved Support for Complex Shaders
• More threads/EU; larger register file/EU; increased instruction cache

First we tried turning off any z-prepass tests when optimizing for Intel graphics.

• As it turns out, Cryptic was enabling an Uber-shader for lighting.
• Many code guards controlled by boolean constants.
• Up-front shader compilation was not as optimal as runtime compilation.
• The benefit to them is that they only need one shader per material.
• By turning off the Uber-shader our performance more than doubled.
• A driver sighting was opened and is under investigation.
• Let us help you, because sometimes you help us!

    uniform bool LightBoolParam00 : register(b0);
    uniform bool LightBoolParam01 : register(b1);

    if (LightBoolParam00) {
        if (LightBoolParam01) {
            // spot light code
        } else {
            // point light code
        }
    } else {
        if (LightBoolParam01) {
            // directional light code
        } else {
            // light disabled
        }
    }

• Approach for finding the root cause of slow performance
• Walk the scene
• Use the scene
Ceiling performance? Wall performance? Ground performance?
This wall is the key…

• GPA Overrides… your friends
• Z-Test Enable/Disable
• All the “others”
Culling? Alpha? Beta?
Look what we found behind the wall…

• System Analyzer
  – Overrides (NULL HW DRV)
  – GPU Time
  – CPU Time
• Frame Analyzer
  – Draw Calls
  – Multi-pass
  – Spikes
How many draw calls for that wall?…

• Pixel Height Test
• Bounding Spheres
• Pixel Coverage
• CPU side test
• Scale of Occlusion Culling
[Diagram: camera position, image plane and scene object, showing the pixel occupancy of the object (sphere) on the image plane]
I can’t see it anyway…

• Efficiently keep compute units busy
• To avoid stalling the pipeline
• Minimize Runtime and Driver Overhead
• Scale visual effects and redundant computes that don’t render.
• Optimize Pixel/Texel Operations.

[Diagram: CMD Buffer shared via DMA between the driver and the Graphics Engine. 1. CopyResource copies render output to a staging resource. 2. App (CPU) …Map() resource. 3. CPU stalls until flush.]
• The graphics driver stores asynchronous D3D calls in the CMD Buffer.
• If the application issues CopyResource/CopySubresourceRegion, the call gets queued in the CMD Buffer.
• If the App then tries to Map() the resource that was the target of the Copy call, the CMD Buffer gets flushed.
• The CPU then stalls waiting on the GE CMD Buffer to empty.
• The App (CPU) will then begin accessing the resource while the GE sits idle.

[Timeline diagrams of CPU and GPU frames F0, F1, F2, … in three panels: STUTTERING, STALL, N-2 SYNCH]
• This effect can be referred to as Stuttering.
• To avoid this effect some implement “Frame rate smoothing”.

    // 1. Create an event query from the current device.
    IDirect3DQuery9* pEvent;
    m_pD3DDevice->CreateQuery(D3DQUERYTYPE_EVENT, &pEvent);

    // 2. Add an end marker to the command buffer queue.
    pEvent->Issue(D3DISSUE_END);

    // 3. Empty the command buffer and wait until the GPU is idle.
    while (S_FALSE == pEvent->GetData(NULL, 0, D3DGETDATA_FLUSH));

• The best solution is to call CopyResource at Frame N, which executes at Frame N+1.
  The copy should be finished when the app is processing N+2.
• Put some time between locks by synchronizing N-2 frames.

• For best EU utilization, minimize register usage.
  – Large shaders impact performance when register usage is limited.
  – Mask alpha when not needed.
• Minimize use of transcendentals like LOG, POW, EXP, etc.
  – Space out must-have ops where possible.
• Pre-load shaders to avoid mid-scene compiles.
• Avoid mid-scene texture changes.
• Minimize Geometry shader usage.
• Experiment with Texture Sampling calls.

• For DX9, the Intel driver optimizes the most frequently used constants.
  – Avoid global constants where possible.
• Limit dynamically indexed constants (C[a0], C[r]) in DX9/10.
• In DX10, when a constant changes the complete buffer gets updated.
  – Group cbuffers by frequency of updates.
  – Organize cbuffers based on feature scaling.
  – Pack data into float4 boundaries.
http://software.intel.com/en-us/articles/directx-constants-optimizations-for-intel-integrated-graphics/

• Ideally, use large batches (i.e. >200-1000 primitives).
• Minimize the number of state changes between batches.
• Optimize the number of draw calls per frame. The more you have, the higher the likelihood that you will be CPU limited.
• If small batches are needed, use Instancing for higher performance.
• Keep in mind that some instancing methods use Geometry shaders.
http://software.intel.com/en-us/articles/rendering-grass-with-instancing-in-directx-10/

• Full-screen visual effects are one of the areas that are heavy on computes.
• Specifically, per-pixel post-processing that requires multiple passes.
• Balance visual quality with speed by reducing the complexity of the shaders and/or the number of passes.
Some candidates to check for:
• Glow/Bloom
• Depth of Field
• Motion Blur
• HDR/Tone Mapping
• Heat Distortion
• Atmospheric Effects
• Dynamic Ambient Occlusion

• Reduce LOD resolution for distant objects
• Reject objects outside the view frustum by doing a visibility check
• Cull objects using Occlusion Query for complex scenes
• Maximize use of Hi-Z and Early-Z
  – Render front to back where possible

• Minimize MRT usage.
• Avoid proprietary texture formats or formats outside of the Dx Spec.
• Balance texture load instructions with arithmetic instructions where possible
• Reduce the number of texture fetches for Low Fidelity modes
• Minimize use of large textures
• Use compressed textures with mip-maps
• Implement Shadows as a scalable feature
• Clear Color, Stencil and Z-Buffer in the same API call

Integrated Graphics Software Development Forum
- http://softwarecommunities.intel.com/isn/Community/en-US/forums/2414/ShowForum.aspx
Developers Guide for Intel® Integrated Graphics
- http://software.intel.com/en-us/articles/intel-graphics-media-accelerator-developers-guide
Intel® Graphics Performance Analyzer
- www.intel.com/software/gpa
Articles Mentioned in this Presentation
- http://software.intel.com/en-us/articles/rendering-grass-with-instancing-in-directx-10
- http://software.intel.com/en-us/articles/directx-constants-optimizations-for-intel-integrated-graphics/

Don’t Dread Threads
  Thursday, March 11 @ 9AM, North 122, North Hall
A Visual Guide to Game and Task Performance on Mass-market PC Game Platforms
  Thursday, March 11 @ 4:30PM, North 122
Building Games for Netbooks
  Friday, March 12 @ 9AM, Room 310, South Hall
Task-based Multithreading – How to Program for 100 Cores
  Friday, March 12 @ 4:30PM, Room 300, South Hall

• Efficiently keep compute units busy
• To avoid stalling the pipeline
• Minimize Runtime and Driver Overhead
• Scale visual effects and redundant computes that don’t render.
• Optimize Pixel/Texel Operations.
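The uber-shader case study earlier in the deck branches on two boolean constants to choose one of four lighting paths. One way to keep those guards out of the shader is to resolve them on the CPU and bind a specialized shader per light type. The slides do not say how Cryptic replaced the uber-shader, so this is only a sketch of that general idea; `LightKind`, `selectLightKind`, `shaderNameFor` and the file names are hypothetical.

```cpp
#include <string>

// The four outcomes of the nested if/else shown on the uber-shader slide.
enum class LightKind { Disabled, Directional, Point, Spot };

// Resolve the two boolean constants on the CPU (hypothetical helper).
// Mirrors the slide: when param00 is true, param01 picks spot vs. point;
// when param00 is false, param01 picks directional vs. disabled.
inline LightKind selectLightKind(bool param00, bool param01) {
    if (param00) {
        return param01 ? LightKind::Spot : LightKind::Point;
    }
    return param01 ? LightKind::Directional : LightKind::Disabled;
}

// Map each variant to a specialized shader. File names are illustrative only.
inline std::string shaderNameFor(LightKind kind) {
    switch (kind) {
        case LightKind::Spot:        return "light_spot.hlsl";
        case LightKind::Point:       return "light_point.hlsl";
        case LightKind::Directional: return "light_directional.hlsl";
        case LightKind::Disabled:    break;
    }
    return "light_none.hlsl";
}
```

The trade-off is the one the slide names: the uber-shader needs only one shader per material, while specialization multiplies the number of shaders to compile and manage.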
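The N-2 synchronization advice from the stuttering slide can be sketched without any graphics API: keep one fence slot per in-flight frame and, before starting frame N, wait only on the fence issued at the end of frame N-2. The model below tracks fences as plain frame numbers so the bookkeeping can be checked in isolation. In a Direct3D 9 renderer each slot would presumably hold an event query like `pEvent` from the slide, but that mapping is an assumption, and `FrameFenceRing` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Ring of per-frame fences (hypothetical). Slot i remembers the frame number
// whose end-of-frame marker was most recently issued into that slot.
template <std::size_t Latency>
class FrameFenceRing {
public:
    // Call at the start of `frame`. Returns the frame whose fence the CPU
    // should wait on, or -1 while fewer than `Latency` frames have been issued.
    int frameToWaitFor(int frame) const {
        assert(frame >= 0);
        if (static_cast<std::size_t>(frame) < Latency) {
            return -1;
        }
        return slots_[static_cast<std::size_t>(frame) % Latency];
    }

    // Call at the end of `frame`, after its end marker has been issued.
    void markIssued(int frame) {
        assert(frame >= 0);
        slots_[static_cast<std::size_t>(frame) % Latency] = frame;
    }

private:
    std::array<int, Latency> slots_{};
};
```

With `FrameFenceRing<2>`, the CPU never blocks on the frame it has just submitted, which is what leaves time between locks.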
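The constant-buffer guidance (group cbuffers by update frequency, pack data into float4 boundaries) can be illustrated from the CPU side with structs that mirror the buffers. This is a sketch under assumptions: the struct and member names are invented for illustration, and the layout relies on the HLSL rule that constants are packed into 16-byte registers without straddling a register boundary.

```cpp
#include <cstddef>

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Updated once per frame (hypothetical contents).
struct alignas(16) PerFrameConstants {
    Float4 viewProjRows[4];  // 64 bytes: four full float4 registers
    Float3 cameraPosition;   // 12 bytes: x, y, z of the fifth register
    float  timeSeconds;      //  4 bytes: fills the w slot, so no padding
};

// Updated once per draw call (hypothetical contents).
struct alignas(16) PerObjectConstants {
    Float4 worldRows[4];     // 64 bytes
    Float4 tintColor;        // 16 bytes
};

// Compile-time checks that both layouts end on a float4 boundary.
static_assert(sizeof(PerFrameConstants) % 16 == 0, "must be a multiple of float4");
static_assert(sizeof(PerObjectConstants) % 16 == 0, "must be a multiple of float4");
```

Splitting the data this way means a per-draw change only touches the small per-object buffer, in line with the slide's note that a DX10 constant change updates the complete buffer.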
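The pixel height test slide (bounding spheres, pixel coverage, CPU side test) suggests skipping objects that would cover too few pixels to matter. A minimal sketch of such a test follows, assuming a symmetric perspective projection and ignoring the slight distortion of spheres away from the screen center. The function names and the pixel threshold are assumptions, not values from the talk.

```cpp
#include <cassert>
#include <cmath>

// Approximate on-screen height, in pixels, of a bounding sphere.
// radius and distance are in world units, fovY is the vertical field of
// view in radians, screenHeight is the render target height in pixels.
inline double projectedHeightPixels(double radius, double distance,
                                    double fovY, double screenHeight) {
    assert(distance > 0.0 && fovY > 0.0);
    // Half of the visible world-space height at this distance.
    const double halfViewHeight = distance * std::tan(fovY * 0.5);
    // Sphere diameter over full view height, scaled to pixels.
    return (radius / halfViewHeight) * screenHeight;
}

// CPU-side cull: true when the object would cover fewer than minPixels rows.
inline bool shouldSkipDraw(double radius, double distance, double fovY,
                           double screenHeight, double minPixels) {
    return projectedHeightPixels(radius, distance, fovY, screenHeight) < minPixels;
}
```

For example, a sphere of radius 1 seen from distance 10 with a 90 degree vertical field of view on a 1000 pixel tall target covers about 100 pixels. At distance 1000 it covers about 1 pixel and would be skipped with a threshold of 4.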
Risk Factors

The above statements and any others in this document that refer to plans and expectations for the third quarter, the year and the future are forward-looking statements that involve a number of risks and uncertainties. Many factors could affect Intel’s actual results, and variances from Intel’s current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be the important factors that could cause actual results to differ materially from the corporation’s expectations. Ongoing uncertainty in global economic conditions poses a risk to the overall economy as consumers and businesses may defer purchases in response to tighter credit and negative financial news, which could negatively affect product demand and other related matters. Consequently, demand could be different from Intel’s expectations due to factors including changes in business and economic conditions, including conditions in the credit market that could affect consumer confidence; customer acceptance of Intel’s and competitors’ products; changes in customer order patterns including order cancellations; and changes in the level of inventory at customers. Intel operates in intensely competitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product demand that is highly variable and difficult to forecast. Additionally, Intel is in the process of transitioning to its next generation of products on 32nm process technology, and there could be execution issues associated with these changes, including product defects and errata along with lower than anticipated manufacturing yields.
Revenue and the gross margin percentage are affected by the timing of new Intel product introductions and the demand for and market acceptance of Intel’s products; actions taken by Intel’s competitors, including product offerings and introductions, marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond quickly to technological developments and to incorporate new features into its products. The gross margin percentage could vary significantly from expectations based on changes in revenue levels; capacity utilization; start-up costs, including costs associated with the new 32nm process technology; variations in inventory valuation, including variations related to the timing of qualifying products for sale; excess or obsolete inventory; product mix and pricing; manufacturing yields; changes in unit costs; impairments of long-lived assets, including manufacturing, assembly/test and intangible assets; and the timing and execution of the manufacturing ramp and associated costs. Expenses, particularly certain marketing and compensation expenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for Intel’s products and the level of revenue and profits. The current financial stress affecting the banking system and financial markets and the going concern threats to investment banks and other financial institutions have resulted in a tightening in the credit markets, a reduced level of liquidity in many financial markets, and heightened volatility in fixed income, credit and equity markets.
There could be a number of follow-on effects from the credit crisis on Intel’s business, including insolvency of key suppliers resulting in product delays; inability of customers to obtain credit to finance purchases of our products and/or customer insolvencies; counterparty failures negatively impacting our treasury operations; increased expense or inability to obtain short-term financing of Intel’s operations from the issuance of commercial paper; and increased impairments from the inability of investee companies to obtain financing. The majority of our non-marketable equity investment portfolio balance is concentrated in companies in the flash memory market segment, and declines in this market segment or changes in management’s plans with respect to our investments in this market segment could result in significant impairment charges, impacting restructuring charges as well as gains/losses on equity investments and interest and other. Intel’s results could be impacted by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Intel’s results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust and other issues, such as the litigation and regulatory matters described in Intel’s SEC reports. A detailed discussion of these and other risk factors that could affect Intel’s results is included in Intel’s SEC filings, including the report on Form 10-Q for the quarter ended June 27, 2009.

Rev. 2/27/10