Tuning Your Game for Next Generation Intel HD Graphics Chuck Desylva Jeff Laflam Friday, March 12 @ 1:30PM Room 302, South Hall.

http://www.intel.com/software/gdc
See Intel at GDC:
Intel Booth at Expo, North Hall
Intel Interactive Lounge
Client PC Graphics Market Trends
2010 Intel® HD Graphics Architecture Enhancements
Demo + Case Studies
Optimization Guidelines for Intel HD Graphics
[Chart: client PC graphics in millions of units (0 to 250), 2008 to 2013, by segment: Mobile Integrated, Mobile Discrete, Desktop Integrated, Desktop Discrete. Callouts: Mobile Integrated MSS, Desktop Integrated MSS, Intel 78.8%, Intel 64.1%.]
Source: Mercury Research PC Graphics Report (Q4’09)



First implementation of CPU/Graphics Engine in one package: a 32nm processor core and a 45nm graphics & memory controller.
• Evolution of the Intel® GMA HD Graphics Architecture
- 3D performance improvements
- Full frame rate Blu-ray playback
• Takes advantage of 45nm process
- Reduced dynamic capacitance, voltage scaling and HiK+MG give lower dynamic as well as lower leakage power
• Single cooling solution for both dies
- Reduces cost to the OEM
- Reduces overall system power requirements
Not all features are available on every processor line item.
Westmere is the codename for the family of 32nm processors based upon the Intel microarchitecture codename Nehalem.
Vertex and primitive processing
• Improvements in HWVP (>2X peak)
• Improved vertex throughput
• Increased # of threads for VS
• Much faster Clip/Cull as well as Setup
~1.4X increase in texture throughput
~1.4X increase in pixel throughput
Rasterization and Z optimizations
• Now includes Hierarchical-Z, which provides compression for completely occluded pixels
• Fast Z Clear possible at 4X the normal peak throughput with Hierarchical-Z enabled
>1.5X increase in peak computes, including transcendental instructions
• # of EUs; improved IPC; higher frequency
Improved support for complex shaders
• More threads/EU; larger register file/EU; increased instruction cache
First we tried turning off any z-prepass tests when optimizing for Intel graphics.
• As it turns out, Cryptic was enabling an Uber-shader for lighting.
• Many code guards controlled by boolean constants.
• Up-front shader compilation was not as optimal as runtime compilation.
• Benefit to them is that they only need one shader per material.
• By turning off the Uber-shader our performance more than doubled.
• A driver sighting was opened and is under investigation.
• Let us help you, because sometimes you help us!
// Boolean constants select the light type at run time inside one uber-shader.
uniform bool LightParam00 : register(b0);
uniform bool LightParam01 : register(b1);
if (LightParam00)
{
    if (LightParam01) {
        // spot light code
    } else {
        // point light code
    }
} else {
    if (LightParam01) {
        // directional light code
    } else {
        // light disabled
    }
}
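One possible alternative to branching on boolean constants inside the shader is to resolve the two flags once on the CPU and bind a specialized shader per light type. The sketch below only illustrates that idea; the enum, the function names, and the shader names are hypothetical and do not come from the presentation or from Cryptic's engine.

```cpp
#include <cassert>
#include <string>

// Hypothetical light kinds matching the four branches of the uber-shader.
enum class LightKind { Disabled, Directional, Point, Spot };

// Resolve the two boolean constants once on the CPU instead of per pixel.
LightKind selectLightKind(bool lightParam00, bool lightParam01) {
    if (lightParam00) {
        return lightParam01 ? LightKind::Spot : LightKind::Point;
    }
    return lightParam01 ? LightKind::Directional : LightKind::Disabled;
}

// Map each kind to a specialized, branch-free shader (names are illustrative).
std::string shaderNameFor(LightKind kind) {
    switch (kind) {
        case LightKind::Spot:        return "lighting_spot";
        case LightKind::Point:       return "lighting_point";
        case LightKind::Directional: return "lighting_directional";
        case LightKind::Disabled:    return "lighting_none";
    }
    return "lighting_none";
}
```

The trade-off is the one noted on the slide: more than one shader per material, in exchange for simpler code per shader.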
• Approach for finding the root cause of slow performance
• Walk the scene
• Use the scene
Ceiling performance?
Wall performance?
Ground performance?
This wall is the key…
• GPA Overrides… your friends
• Z-Test Enable/Disable
• All the “others”
Culling?
Alpha?
Beta?
Look what we found behind the wall…
• System Analyzer
• Overrides (NULL HW DRV)
• GPU Time
• CPU Time
• Frame Analyzer
• Draw Calls
• Multi-pass
• Spikes
How many draw calls for that wall?…
• Pixel Height Test
• Bounding Spheres
• Pixel Coverage
• CPU side test
• Scale of Occlusion Culling
[Figure: two diagrams of a camera position and an image plane; one shows a scene object and the pixel occupancy of the object (sphere) on the image plane]
I can’t see it anyway…
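A CPU-side pixel coverage test of this kind can be sketched as follows. This is a minimal illustration, assuming a symmetric perspective projection and a bounding sphere per object; the function names and the pixel threshold are assumptions, not taken from the talk.

```cpp
#include <cassert>
#include <cmath>

// Approximate on-screen diameter, in pixels, of a bounding sphere.
// Assumes a symmetric perspective projection; all names are illustrative.
double projectedDiameterPixels(double sphereRadius, double distanceToCamera,
                               double verticalFovRadians, double screenHeightPixels) {
    if (distanceToCamera <= sphereRadius) {
        // Camera is inside or touching the sphere: treat it as covering the screen.
        return screenHeightPixels;
    }
    // Pixels per world unit at a distance of 1 from the camera.
    double focalLengthPixels =
        screenHeightPixels / (2.0 * std::tan(verticalFovRadians / 2.0));
    return 2.0 * sphereRadius * focalLengthPixels / distanceToCamera;
}

// Skip the draw when the object would cover fewer pixels than the threshold.
bool isTooSmallToDraw(double diameterPixels, double minPixels) {
    return diameterPixels < minPixels;
}
```

The threshold can be exposed as a quality setting, so lower-fidelity modes cull small objects more aggressively.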
• Efficiently keep compute units busy
• To avoid stalling the pipeline
• Minimize Runtime and Driver Overhead
• Scale visual effects and redundant computes that don’t render.
• Optimize Pixel/Texel Operations.
[Figure: CMD Buffer (DMA shared between driver and graphics engine) holding render and copy commands, with a staging resource. Steps: 1. CopyResource; 2. App (CPU) …Map() resource; 3. CPU stall until flush.]
• The graphics driver stores asynchronous D3D calls in the CMD Buffer.
• If the application issues CopyResource/CopySubresourceRegion, the call gets queued in the CMD Buffer.
• If the app then tries to Map() the resource that was the target of the copy call, the CMD Buffer gets flushed.
• The CPU then stalls waiting on the GE CMD Buffer to empty.
• The app (CPU) then begins accessing the resource while the GE sits idle.
[Figure: CPU and GPU frame timelines (frames F0 to F5), with panels labeled STUTTERING (showing STALL periods) and N-2 SYNCH.]
• This effect can be referred to as stuttering.
• To avoid this effect some implement “frame rate smoothing”.

// 1. Create an event query from the current device.
IDirect3DQuery9* pEvent = NULL;
m_pD3DDevice->CreateQuery(D3DQUERYTYPE_EVENT, &pEvent);
// 2. Add an end marker to the command buffer queue.
pEvent->Issue(D3DISSUE_END);
// 3. Empty the command buffer and wait until the GPU is idle.
while (S_FALSE == pEvent->GetData(NULL, 0, D3DGETDATA_FLUSH))
    ;  // spin until the end marker has been processed
• The best solution is to call CopyResource at frame N, which executes at frame N+1. The copy should be finished when the app is processing frame N+2.
• Put some time between locks by synchronizing N-2 frames.
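The N-2 pattern can be sketched with a small ring of staging slots: each frame issues its copy into one slot and reads back the slot written two frames earlier. The code below is a CPU-only simulation of the bookkeeping, with no Direct3D calls; the class, the constants, and the use of -1 for "nothing to read yet" are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>

// Read back the data that was copied this many frames ago.
constexpr int kLatencyFrames = 2;
// One slot per in-flight frame plus the one being read.
constexpr int kSlotCount = kLatencyFrames + 1;

// Hypothetical ring of staging resources; each slot records which frame
// issued a copy into it (-1 means the slot has not been written yet).
class DelayedReadback {
public:
    DelayedReadback() { slots_.fill(-1); }

    // Stands in for issuing CopyResource at frame N. Returns the frame whose
    // copy is old enough to map without flushing, or -1 if none is ready.
    int submit(int frame) {
        slots_[frame % kSlotCount] = frame;
        if (frame < kLatencyFrames) {
            return -1;
        }
        return slots_[(frame - kLatencyFrames) % kSlotCount];
    }

private:
    std::array<int, kSlotCount> slots_;
};
```

In a real renderer each slot would own a staging resource, and the Map() call would target the slot returned for frame N-2 rather than the one just written.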
• For best EU utilization, minimize register usage.
– Large shaders impact performance when register usage is limited.
– Mask alpha when not needed.
• Minimize use of transcendentals like LOG, POW, EXP, etc.
– Space out must-have ops where possible.
• Pre-load shaders to avoid mid-scene compiles.
• Avoid mid-scene texture changes.
• Minimize geometry shader usage.
• Experiment with texture sampling calls.
• For DX9, the Intel driver optimizes the most frequently used constants.
– Avoid global constants where possible.
• Limit dynamically indexed constants (C[a0], C[r]) in DX9/10.
• In DX10, when a constant changes, the complete buffer gets updated.
– Group cbuffers by frequency of updates.
– Organize cbuffers based on feature scaling.
– Pack data into float4 boundaries.
http://software.intel.com/en-us/articles/directx-constants-optimizations-for-intel-integrated-graphics/
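The advice to group cbuffers by update frequency and pack to float4 boundaries can be mirrored on the CPU side. The sketch below is one way to lay out the matching C++ structs; the struct names, members, and the split into per-frame and per-object data are hypothetical examples, not taken from the linked article.

```cpp
#include <cassert>
#include <cstddef>

// CPU-side mirrors of two hypothetical constant buffers, split by update
// frequency. alignas(16) plus explicit padding keeps every member inside a
// single float4 (16-byte) slot.

// Updated once per frame.
struct alignas(16) PerFrameConstants {
    float viewProjection[16];  // 4 x float4
    float cameraPosition[3];   // float3 ...
    float timeSeconds;         // ... plus one float fills the float4
};

// Updated once per draw call.
struct alignas(16) PerObjectConstants {
    float world[16];           // 4 x float4
    float tintColor[4];        // exactly one float4
    float uvScale[2];          // float2 ...
    float padding[2];          // ... padded out to a full float4
};

static_assert(sizeof(PerFrameConstants) % 16 == 0,
              "must be a whole number of float4 slots");
static_assert(sizeof(PerObjectConstants) % 16 == 0,
              "must be a whole number of float4 slots");
static_assert(offsetof(PerFrameConstants, cameraPosition) % 16 == 0,
              "float3 must start on a float4 boundary");
```

Because a change to any constant updates the whole buffer in DX10, keeping rarely changing values out of the per-object buffer avoids re-uploading them on every draw.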
• Ideally, use large batches (i.e. >200-1000 primitives).
• Minimize the number of state changes between batches.
• Optimize the number of draw calls per frame. The more you have, the higher the likelihood that you will be CPU limited.
• If small batches are needed, use instancing for higher performance.
• Keep in mind that some instancing methods use geometry shaders.
http://software.intel.com/en-us/articles/rendering-grass-with-instancing-in-directx-10/
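One common way to reduce state changes between batches is to sort the frame's draw calls by a state key before submitting them. The sketch below shows the idea on plain data; the DrawCall record and its packed stateKey are hypothetical, and a real engine would derive the key from its own shader, texture, and render-state identifiers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record: stateKey stands for a packed
// shader/texture/render-state identifier.
struct DrawCall {
    std::uint32_t stateKey;
    std::uint32_t primitiveCount;
};

// Number of times the state key differs from the previous draw call.
std::size_t countStateChanges(const std::vector<DrawCall>& draws) {
    std::size_t changes = 0;
    for (std::size_t i = 1; i < draws.size(); ++i) {
        if (draws[i].stateKey != draws[i - 1].stateKey) {
            ++changes;
        }
    }
    return changes;
}

// Group draws that share state so they can be submitted back to back.
void sortByState(std::vector<DrawCall>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawCall& a, const DrawCall& b) {
                         return a.stateKey < b.stateKey;
                     });
}
```

Adjacent draws with the same key are also natural candidates for merging into one larger batch or for instancing.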
• Full-screen visual effects are one of the areas that are heavy on computes, specifically per-pixel post-processing that requires multiple passes.
• Balance visual quality with speed by reducing the complexity of the shaders and/or the number of passes. Some candidates to check for:
• Glow/Bloom
• Depth of Field
• Motion Blur
• HDR/Tone Mapping
• Heat Distortion
• Atmospheric Effects
• Dynamic Ambient Occlusion
• Reduce LOD resolution for distant objects
• Reject objects outside the view frustum by doing a visibility check
• Cull objects using Occlusion Query for complex scenes
• Maximize use of Hi-Z and Early-Z
- Render front to back where possible
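Rendering front to back usually comes down to sorting opaque objects by distance from the camera before drawing. A minimal sketch, assuming each object is represented by a world-space center point; the types and function names here are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Hypothetical opaque object with a world-space center.
struct SceneObject {
    int id;
    Vec3 center;
};

float squaredDistance(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Nearest objects first, so pixels of farther objects can fail the depth
// test before their pixel shaders run.
void sortFrontToBack(std::vector<SceneObject>& objects, const Vec3& cameraPosition) {
    std::sort(objects.begin(), objects.end(),
              [&](const SceneObject& a, const SceneObject& b) {
                  return squaredDistance(a.center, cameraPosition) <
                         squaredDistance(b.center, cameraPosition);
              });
}
```

Squared distance is enough for ordering, which avoids a square root per object. Sorting by state and sorting by depth pull in different directions, so the sort is often applied only to large occluders.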
• Minimize MRT usage.
• Avoid proprietary texture formats or formats outside of the DX spec.
• Balance texture load instructions with arithmetic instructions where possible.
• Reduce the number of texture fetches for low-fidelity modes.
• Minimize use of large textures.
• Use compressed textures with mip-maps.
• Implement shadows as a scalable feature.
• Clear Color, Stencil and Z-Buffer in the same API call.
• Integrated Graphics Software Development Forum
- http://softwarecommunities.intel.com/isn/Community/enUS/forums/2414/ShowForum.aspx
• Developers Guide for Intel® Integrated Graphics
- http://software.intel.com/en-us/articles/intel-graphics-media-accelerator-developers-guide
• Intel® Graphics Performance Analyzer
- www.intel.com/software/gpa
• Articles Mentioned in this Presentation
- http://software.intel.com/en-us/articles/rendering-grass-with-instancing-in-directx-10
- http://software.intel.com/en-us/articles/directx-constants-optimizations-for-intel-integrated-graphics/
Don’t Dread Threads
Thursday, March 11 @ 9AM
North 122, North Hall
A Visual Guide to Game and Task Performance on Mass-market PC
Game Platforms
Thursday, March 11 @ 4:30PM
North 122
Building Games for Netbooks
Friday, March 12 @ 9AM
Room 310, South Hall
Task-based Multithreading – How to Program for 100 Cores
Friday, March 12 @ 4:30PM
Room 300, South Hall
Rev. 2/27/10