CPU Efficiency


DirectX 12
Advanced Graphics and Performance
Max McMullen
Direct3D Development Lead
Microsoft
It’s been a busy year…
• API is largely complete, with working drivers
• Over 50% of gamers have DirectX 12 hardware*
• Massive industry support: Early Access, Engines, Titles
• 1 yr free upgrade to Windows 10 from Windows 7, 8.x
• And now…
*Based on Steam survey
Agenda
• Refresh on Direct3D 12
• New Feature Levels
• Unity on Direct3D 12
• CPU & GPU Performance Improvements
• Fable – 11 versus 12
Direct3D 12 API
• Reduce CPU overhead
• Increase scalability across multiple CPU cores
• Greater developer control
• Console level API efficiency and performance
• Superset of D3D 11 rendering functionality
CPU Overhead and Multithread Improvements
• Pipeline state objects
• Explicit resource binding management
• Flexible pipeline parameterization
• Explicit CPU/GPU synchronization
• Command Reuse
Pipeline State Objects
(Diagram: the D3D vertex shader, HS/DS/…, rasterizer, pixel shader, and blend state are grouped into a single pipeline state object, which maps to HW State 1, HW State 2, and HW State 3.)
Explicit Resource Binding Management
Descriptor { Type, Format, Mip Count, pData }
Descriptor Heap
Descriptor Table { Start Index, Size }
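The relationship between a descriptor heap and a descriptor table can be sketched in plain C++. This is a minimal model, not the Direct3D 12 API: `DescriptorHandle`, `DescriptorHeapModel`, `DescriptorTable`, and `TableEntry` are hypothetical names, and the increment size is assumed to be a fixed, hardware-specific byte count. It only shows that a table is a (start index, size) window into a contiguous heap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a GPU descriptor handle: just an address.
struct DescriptorHandle {
    std::uint64_t ptr;
};

// Hypothetical model of a descriptor heap: a contiguous array of
// fixed-size descriptor slots starting at `start`.
struct DescriptorHeapModel {
    DescriptorHandle start;       // address of slot 0
    std::uint32_t incrementSize;  // bytes per descriptor (assumed fixed)
    std::uint32_t capacity;       // number of slots in the heap

    // Address of slot `index`.
    DescriptorHandle At(std::uint32_t index) const {
        assert(index < capacity);
        return DescriptorHandle{start.ptr +
                                static_cast<std::uint64_t>(index) * incrementSize};
    }
};

// A descriptor table is a (start index, size) window into the heap.
struct DescriptorTable {
    std::uint32_t startIndex;
    std::uint32_t size;
};

// Handle of the i-th descriptor inside a table.
inline DescriptorHandle TableEntry(const DescriptorHeapModel& heap,
                                   const DescriptorTable& table,
                                   std::uint32_t i) {
    assert(i < table.size);
    assert(table.startIndex + table.size <= heap.capacity);
    return heap.At(table.startIndex + i);
}
```

Under these assumptions, rebinding a table is only a change of start index; no descriptors are copied.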
Resource Binding Tiers
                                      Tier 1    Tier 2     Tier 3
Max Descriptor Heap CBV/SRV/UAVs      2^20      2^20       2^20+
Max CBVs per stage                    14        14         full heap
Max SRVs per stage                    128       full heap  full heap
Max UAVs in all stages                8         64         full heap
Max Samplers per stage                16        full heap  full heap
Max SRV Descriptor Tables             5         5          no limit
Binding Tiers in the D3D 12 Market
• Tier 1: 39%
• Tier 2: 44%
• Tier 3: 17%
Explicit Resource Binding: Hazard Resolution
• Resource hazards
  • Render Target to/from Texture
  • Copy Source to/from Copy Destination
  • Tiled Resource Aliasing
  • etc…
• ResourceBarrier API to resolve hazards
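One way an application might decide when a barrier is needed is to track the last known state of each resource and record a transition only when the next use requires a different state. The sketch below is a plain C++ model under that assumption; `ResourceState`, `Transition`, and `StateTracker` are hypothetical names and do not correspond to Direct3D 12 types.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical resource states, loosely mirroring the hazards listed above.
enum class ResourceState { RenderTarget, ShaderResource, CopySource, CopyDest };

// One recorded transition: the information a barrier call would describe.
struct Transition {
    std::string resource;
    ResourceState before;
    ResourceState after;
};

// Tracks the last known state of each resource and records a transition
// only when the requested state differs from the current one.
class StateTracker {
public:
    void Register(const std::string& name, ResourceState initial) {
        states_[name] = initial;
    }

    // Returns true if a barrier was needed (and recorded).
    bool Require(const std::string& name, ResourceState needed) {
        auto it = states_.find(name);
        assert(it != states_.end() && "resource must be registered first");
        if (it->second == needed) return false;  // no hazard, no barrier
        pending_.push_back({name, it->second, needed});
        it->second = needed;
        return true;
    }

    // Hands back the batch of pending transitions and clears it.
    std::vector<Transition> Flush() {
        std::vector<Transition> out;
        out.swap(pending_);
        return out;
    }

private:
    std::unordered_map<std::string, ResourceState> states_;
    std::vector<Transition> pending_;
};
```

Batching the pending transitions and flushing them together, rather than issuing one at a time, is a design choice of this sketch.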
Flexible Pipeline Parameterization
• Two parts: Root Signature and Root Arguments
• Contains constants, descriptors, and descriptor tables
• Leverage hardware-specific registers and pipelined renaming paths for highest frequency parameters
• Remove indirection from a constant descriptor index to an explicit descriptor
Explicit CPU/GPU synchronization
• Application is responsible for managing CPU & GPU race conditions
• Synchronization primitive is a fence
• Application chooses granularity of synchronization
• One increment per frame is well amortized
• Increment per command list submission possible
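The fence idea above can be modelled on the CPU as a monotonically increasing counter that one side signals and the other waits on. This is a sketch only: `FenceModel`, `kFramesInFlight`, and `FenceValueToWaitFor` are hypothetical names, frames are assumed to be numbered from 1, and the fence is assumed to be signalled with the frame number once per frame.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical CPU-side model of a fence: a monotonically increasing value
// that one side signals and the other side waits on.
class FenceModel {
public:
    // Called by the "GPU" side when work up to `value` has finished.
    void Signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(value >= completed_);  // fence values never go backwards
            completed_ = value;
        }
        cv_.notify_all();
    }

    std::uint64_t CompletedValue() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return completed_;
    }

    // Blocks the caller until the fence has reached at least `value`.
    void WaitFor(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::uint64_t completed_ = 0;
};

// Per-frame granularity (assumed policy): with kFramesInFlight frames
// buffered, frame N may reuse per-frame resources once the fence has
// passed frame N - kFramesInFlight.
constexpr std::uint64_t kFramesInFlight = 2;

inline std::uint64_t FenceValueToWaitFor(std::uint64_t frameIndex) {
    return frameIndex > kFramesInFlight ? frameIndex - kFramesInFlight : 0;
}
```

With one increment per frame the CPU waits at most once per frame; signalling per command list submission would give finer-grained reuse at the cost of more fence traffic.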
New Feature Levels
Direct3D 12
New Rendering Features
• Conservative Rasterization
• ROVs
• Typed UAV Loads
• Tiled Resources Tier 3: Volumes
• PS Specified Stencil Ref
New Feature Levels
• Feature Level 12.0
  • Resource Binding Tier 2
  • Tiled Resources Tier 2: Texture2D
  • Typed UAV Tier 1
• Feature Level 12.1
  • Conservative Rasterization Tier 1
  • ROVs
Unity on Direct3D 12
Kasper Engelstoft
Unity Graphics Engineer
Direct3D 12 in Unity
• Porting experience
• Case study: multithreaded shadow rendering
• What’s next for D3D 12 in Unity?
D3D 12 porting experience
• Started porting in September with SDK1
• After 2 weeks, we had something rendering
• In October, SDK2 API changes hit...
• Mid-January 95% of our tests were passing
• Then SDK3 hit...
D3D 12 optimization case study
• Multi-threading shadow map rendering
• Move work away from main thread
• Generate d3d cmd lists for each of the shadow maps on their own worker threads
• Cmd lists executed in parallel with the main scene cmd list building
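The pattern described above, one command list per shadow map recorded on its own worker thread, can be sketched with standard C++ threads. This is a model rather than Unity's implementation: `CommandListModel`, `RecordShadowMap`, and `RecordShadowMapsInParallel` are hypothetical names, and a command list is represented as a vector of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded command list: a list of command names.
using CommandListModel = std::vector<std::string>;

// Records the commands for one shadow map. In a real renderer this is where
// casters would be extracted and draws recorded.
inline CommandListModel RecordShadowMap(std::size_t lightIndex,
                                        std::size_t casterCount) {
    CommandListModel list;
    list.push_back("SetRenderTarget shadow" + std::to_string(lightIndex));
    for (std::size_t i = 0; i < casterCount; ++i) {
        list.push_back("Draw caster" + std::to_string(i));
    }
    return list;
}

// Records one command list per shadow map on its own worker thread, then
// returns them in light order so submission order stays deterministic.
inline std::vector<CommandListModel> RecordShadowMapsInParallel(
        const std::vector<std::size_t>& castersPerLight) {
    std::vector<CommandListModel> lists(castersPerLight.size());
    std::vector<std::thread> workers;
    workers.reserve(castersPerLight.size());
    for (std::size_t light = 0; light < castersPerLight.size(); ++light) {
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back([&lists, &castersPerLight, light] {
            lists[light] = RecordShadowMap(light, castersPerLight[light]);
        });
    }
    // The calling thread could build the main scene list here before joining.
    for (std::thread& worker : workers) worker.join();
    return lists;
}
```

A production engine would more likely use a job system than one thread per shadow map; raw threads are used here only to keep the sketch short.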
Why shadow maps?
• Rendered before the main scene
• Simple render loop
• Extracting receivers & casters is quite CPU intensive
• The shadow jobs don’t require waiting until ID3D12CommandList needs to be executed
Before
After
Future D3D 12 work
• Prerecorded command bundles
• One bundle per material pass
• Bundles for standard operations
  • Mipmap generation
• Use shader model 5.1 features
CPU & GPU Performance Improvements
Direct3D 12
Shader Cache
• Redundant compilation from IL to hardware specific instructions
• Optimize startup and level load times, reduce glitches
(Timeline chart, time in seconds, across start-up, menu, level load, and play: heavy shader compilation during start-up and during level load.)
Shader Cache
• Frames typically have 200 to 400 Pipeline State Objects
• Long traces typically have 300 to 1000 Pipeline State Objects
• Cache operates on fully compiled PSOs, not individual shader stages
• Serialization and deserialization under developer control
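A cache keyed on the complete pipeline description, with saving and loading left to the caller, might look like the following sketch. It is a plain C++ model, not the Direct3D 12 interface: `PipelineDesc`, `PipelineCacheModel`, and the text-based blob format are all assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical description of a complete pipeline: every stage plus fixed
// function state, identified here by plain names without spaces.
struct PipelineDesc {
    std::string vertexShader;
    std::string pixelShader;
    std::string blendState;

    bool operator<(const PipelineDesc& other) const {
        return std::tie(vertexShader, pixelShader, blendState) <
               std::tie(other.vertexShader, other.pixelShader, other.blendState);
    }
};

// Caches the result of an expensive "compile" per complete pipeline, not per
// shader stage, and lets the caller decide when to save or load the cache.
class PipelineCacheModel {
public:
    // Returns the cached blob, compiling only on a miss.
    const std::string& GetOrCompile(const PipelineDesc& desc) {
        auto it = blobs_.find(desc);
        if (it == blobs_.end()) {
            ++compileCount_;
            it = blobs_.emplace(desc, Compile(desc)).first;
        }
        return it->second;
    }

    int CompileCount() const { return compileCount_; }

    // One line per entry: "vs ps blend blob".
    std::string Serialize() const {
        std::ostringstream out;
        for (const auto& entry : blobs_) {
            out << entry.first.vertexShader << ' ' << entry.first.pixelShader
                << ' ' << entry.first.blendState << ' ' << entry.second << '\n';
        }
        return out.str();
    }

    void Deserialize(const std::string& data) {
        std::istringstream in(data);
        PipelineDesc desc;
        std::string blob;
        while (in >> desc.vertexShader >> desc.pixelShader >> desc.blendState >> blob) {
            blobs_[desc] = blob;
        }
    }

private:
    // Stand-in for the driver's IL-to-hardware compilation.
    static std::string Compile(const PipelineDesc& desc) {
        return "hw:" + desc.vertexShader + "+" + desc.pixelShader + "+" + desc.blendState;
    }

    std::map<PipelineDesc, std::string> blobs_;
    int compileCount_ = 0;
};
```

In this model, a second run that deserializes the saved cache before requesting its pipelines never reaches the compile step.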
ExecuteIndirect
• Replacement for DrawIndirect and DispatchIndirect
• Can perform multiple draws with a single API call
• Number of draws can be controlled by CPU or GPU
• Can even change bindings between draw calls
• Works on all Direct3D 12 hardware, from Feature Level 11.0 and up
ExecuteIndirect Command Signature
• Operations performed by ExecuteIndirect described by a command signature
• Describes the layout of the argument buffer and the set of commands
• Operations include:
  • Set vertex or index buffer
  • Change root constants
  • Set root resource views (SRV, UAV, CBV)
  • Draw, DrawIndexed, or Dispatch
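To make the "layout of the argument buffer" concrete, the sketch below computes byte offsets and the per-command stride for a list of argument kinds. It is a model under stated assumptions: `ArgKind`, `ArgDesc`, `ArgSize`, `ComputeLayout`, and `CommandOffset` are hypothetical names, arguments are assumed to be tightly packed in declaration order, and the byte sizes are assumed to match the corresponding Direct3D 12 structures; check them against the SDK headers before relying on them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical argument kinds, following the operations listed above.
enum class ArgKind { RootConstant, RootView, VertexBufferView, IndexBufferView,
                     Draw, DrawIndexed, Dispatch };

struct ArgDesc {
    ArgKind kind;
    std::uint32_t constantCount;  // number of 32-bit values; RootConstant only
};

// Assumed byte size of each argument inside the argument buffer.
inline std::size_t ArgSize(const ArgDesc& arg) {
    switch (arg.kind) {
        case ArgKind::RootConstant:     return 4u * arg.constantCount;
        case ArgKind::RootView:         return 8;   // 64-bit GPU address
        case ArgKind::VertexBufferView: return 16;  // address + size + stride
        case ArgKind::IndexBufferView:  return 16;  // address + size + format
        case ArgKind::Draw:             return 16;  // 4 x 32-bit values
        case ArgKind::DrawIndexed:      return 20;  // 5 x 32-bit values
        case ArgKind::Dispatch:         return 12;  // 3 x 32-bit values
    }
    return 0;
}

struct SignatureLayout {
    std::vector<std::size_t> offsets;  // byte offset of each argument
    std::size_t stride;                // bytes per command record
};

// Packs the arguments back to back in declaration order.
inline SignatureLayout ComputeLayout(const std::vector<ArgDesc>& args) {
    assert(!args.empty());
    SignatureLayout layout{};
    std::size_t offset = 0;
    for (const ArgDesc& arg : args) {
        layout.offsets.push_back(offset);
        offset += ArgSize(arg);
    }
    layout.stride = offset;
    return layout;
}

// Byte offset of the record for draw number `commandIndex`.
inline std::size_t CommandOffset(const SignatureLayout& layout,
                                 std::size_t commandIndex) {
    return commandIndex * layout.stride;
}
```

For example, a signature that sets one root constant buffer view and then issues an indexed draw yields a 28-byte record per draw under these assumptions.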
ExecuteIndirect versus Draw Loop
// Draw loop: one set of bindings and one draw call per object
for (UINT drawIdx = drawStart; drawIdx < drawEnd; ++drawIdx)
{
    // Set bindings
    cmdLst->SetGraphicsRootConstantBufferView(RT_CBV, constantsPointer);
    constantsPointer += sizeof(DrawConstantBuffer);
    auto textureSRV = textureStartSRV.MakeOffsetted(staticData->textureIndex, handleIncrementSize);
    cmdLst->SetGraphicsRootDescriptorTable(RT_SRV, textureSRV);
    cmdLst->DrawIndexedInstanced(dynamicData->indexCount, 1, dynamicData->indexStart, staticData->vertexStart, 0);
}

// ExecuteIndirect: a single API call issues every draw from the argument buffer
mCmdLst->SetGraphicsRootDescriptorTable(RT_SRV, mTextureStart);
mCmdLst->ExecuteIndirect(mCommandSignature, settings.numAsteroids, frame->mIndirectArgBuffer->Heap(), 0, nullptr, 0);
ExecuteIndirect Demo
Intel’s Asteroids Demo Updated
ExecuteIndirect Demo
                      CPU        GPU
11                    39.19 ms   34.81 ms
12                    33.41 ms   28.77 ms
12 Bindless           12.85 ms   11.86 ms
12 ExecuteIndirect    5.69 ms    10.59 ms
Flexible Predication and Queries
• Predicates & Queries are now an explicit resource creation on GPU-accessible heaps
• Rendering operations can be predicated based on arbitrary computation performed by the CPU or GPU
• Resolve operation transforms hardware-specific query representation into standardized buffer contents
• Apps that have lots of occlusion queries per frame will see improved performance due to bulk resolves
Multiengine
• Expose multiple parallel queues as explicit API objects
• Queue Types: 3D, Compute, Copy
• Prioritized queues enable new scenarios
  • High priority, latency sensitive workloads
  • Low priority background tasks
Multiengine
(Timeline diagram)
3D Queue: Render, Render, Compute, Wait Fence 1, Render
Copy Queue: Stream textures, Signal Fence 1
Multiengine
(Timeline diagram)
3D Queue: Render, Wait Fence 1, Render, Compute, Render
Copy Queue: Stream textures, Signal Fence 1
Multiengine Demo
Compute and Copy Scenario Test
UAV Barriers
• In D3D11 all UAV accesses in one Draw/Dispatch must complete before any UAV accesses in a subsequent Draw/Dispatch
• This results in idle GPU shader cores for small Draw/Dispatch
• In D3D12 UAV accesses in multiple Draw/Dispatch are truly unordered; applications must use an explicit barrier to enforce ordering
• D3D12 – putting the “U” back in UAV
UAV Barriers
(Timeline diagram. Direct3D 11: a Wait for Idle separates each Draw+UAV and Dispatch. Direct3D 12: Dispatch and Draw+UAV work is separated only where an explicit UAV Barrier is placed.)
UAV Barrier – Fable A/B Demo
Fable: 11 versus 12
Summary
• Dramatically reduced CPU overhead
• Great multithreaded scalability
• Expose new GPU capabilities
• Increase GPU performance
• Greater developer control
Resources – Previous Talks
• IDF 2014:
https://intel.lanyonevents.com/sf14/connect/sessionDetail.ww?SESSION_ID=1315
• GDC 2014/Build 2014:
http://channel9.msdn.com/Events/Build/2014/3-564
Resources
• Check our booths and quick start challenge at the Expo
• Join early access: http://1drv.ms/1dgelm6
• Upcoming GDC 2015 Talks:
• DirectX Tools: http://schedule.gdconf.com/session/solve-the-tough-graphicsproblems-with-your-game-using-directx-tools-presented-by-microsoft
• Direct3D 12 Power & Performance:
http://schedule.gdconf.com/session/better-power-better-performance-yourgame-on-directx12-presented-by-microsoft
• And several talks by hardware partners…
© 2015 Microsoft Corporation.