Direct3D12 and the future of graphics APIs

DIRECT3D AND THE FUTURE OF GRAPHICS APIS
Dave Oldcorn, AMD
Dan Baker, Oxide Games
Johan Andersson, EA / DICE
NITROUS AND DX12
Dan Baker
Partner, Oxide Games
2 | AMD Direct3D Futures | March 20th, 2014
HAVEN’T WE BEEN HERE BEFORE?
Goal of DX9
– Remember State blocks?
Goal of DX10
– Large state groups
Goal of DX11
– Deferred contexts
Are we actually getting faster, or are CPUs just faster?
– Quite possibly no perf improvements from API features in 10 years
Maybe adding features isn’t the answer…
DEEPLY ROOTED PROBLEM
 Coding design philosophies clash with real world
 OOP, data hiding, polymorphic design clashes with task-driven, data parallel
 Evident in language trends, striking disconnect between what is considered good code, and what is fast
 Gap has always been there, but has grown in recent years
– 15 years ago, processors often bound by computation
– Now, usually bound by cache misses, serialization, pipeline stalls, etc.
– Multi-core CPUs are ineffectively utilized
 ‘Heavy Iron’, e.g. big objects and opaque memory, is a dead end for performance
 The revolt is beginning in high performance graphics APIs, but will spread
BUT… HOW MUCH FASTER?
Biggest problem with industry today: Acceptance
Only 1 secret in API design: That it can be done.
– And isn’t that hard
– And our code isn’t that ugly
Star Swarm already demonstrating what is possible on a PC
D3D12 FEATURES THAT NITROUS USES
True de-coupled multi-core rendering
– Expecting near linear thread scheduling
Manual Hazard tracking
– Hazards have been resolved already
Memory Heaps
– Bigger chunks of memory pool grouping make management simpler
Descriptor Tables
– Table exposure allows a cheaper way of binding textures
– Allows texture bindings to be shared between non-adjacent batches
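The memory heaps point can be pictured as carving many resources out of one large allocation. The following is a minimal sketch under that assumption, not Direct3D 12 code: `HeapSuballocator`, `allocate` and `kNoSpace` are hypothetical names, and a simple linear (bump) allocation policy is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Sentinel returned when the heap cannot satisfy a request (hypothetical).
const std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Hypothetical linear sub-allocator over one large heap. This models the
// idea only; it does not call any real graphics API.
class HeapSuballocator {
public:
    explicit HeapSuballocator(std::size_t heap_size) : size_(heap_size), used_(0) {}

    // Returns the offset of a block of `bytes` aligned to `alignment`
    // (which must be a power of two), or kNoSpace if it does not fit.
    std::size_t allocate(std::size_t bytes, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t start = (used_ + alignment - 1) & ~(alignment - 1);
        if (start > size_ || bytes > size_ - start) return kNoSpace;
        used_ = start + bytes;
        return start;
    }

    // Releasing every sub-allocation at once is a single reset.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::size_t size_;
    std::size_t used_;
};
```

With one heap per group of resources that share a lifetime, freeing the group is the `reset()` call rather than one release per resource.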
WHAT’S DIFFERENT NOW?
Then
[Process diagram. Boxes: API implemented; Spec Written; Spec Reviewed; First Engine use; Released to public; Analysis done]
WHAT’S DIFFERENT NOW?
Now
[Process diagram. Boxes: Start Here; Discuss with IHVs, ISVs; Create Spec; Implement Spec; Analyze; Prototype on Actual Engines; If Ready, exit here to prep for release]
IN THE SPIRIT OF CONTRIBUTING
Oxide proud to announce that we have a prototype of Nitrous running on D3D12
*PR DISCLAIMER* This is not an official announcement regarding D3D12 support
Porting from other modern APIs is much simpler than porting from D3D11 to D3D12
EXPECTED RESULTS
CPU Driver overhead largely put to rest
Huge increases in driver reliability
Huge decreases in frame latency, expecting median frame latency to be 1.5 frames
– Increased perceptual responsiveness
Never a dropped frame or stall due to driver API issues
– *Other OS events could cause stalls
Driver should be far smaller, simpler to implement, IHVs can spend more time on optimizations
DIRECT3D12 AND THE FUTURE OF GRAPHICS APIS
Dave Oldcorn, Direct3D12 Driver Architect, AMD
THE PROBLEM
 Mismatch between existing Direct3D and hardware capabilities
– Lots of CPU cores, but only one stream of data
– State communication in small chunks
– “Hidden” work
 Hard to predict from any one given call what the overhead might be
 Implicit memory management
– Hardware evolving away from classical register programming
API LANDSCAPE
 Gap between PC ‘raw’ 3D APIs and the hardware has opened up
 Very high level APIs now ubiquitous; easy to access even for casual developers, plenty of choice
 Where the PC APIs are is a middle ground
[Figure: API landscape. Axis: capability, ease of use, distance from 3D engine. Labels: Application; Game Engines (Frostbite, Unity, Unreal, BlitzTech, CryEngine); Flash / Silverlight; D3D11; D3D9; OpenGL; D3D7/8; Opportunity; Console APIs (register level access); Metal]
WHAT ARE THE CONSEQUENCES?
WHAT ARE THE SOLUTIONS?
SEQUENTIAL API
 Sequential API: state for given draw comes from arbitrary previous time
 Some states must be reconciled on the CPU (“delayed validation”)
– All contributing state needs to be visible
 GPU isn’t like this, uses command buffers
– Must save and restore state at start and end
[Figure: API input stream (..., Draw, Set PS CB, Draw x 5, Set VS CB, Draw x 3, Set Blend, Set PS, Set RT state, Draw, Draw, Set VS VB, Draw, ...) shown beside the state contributing to a draw: PS CB, VS CB, Blend state, PS, RT state, plus more from earlier in the stream]
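The way a sequential API makes each draw depend on arbitrarily old state can be sketched with a toy state machine. This is an illustrative model only; `PipelineState`, `Command` and `resolve_draws` are hypothetical names and the state is reduced to five string slots.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a sequential (state-machine) API. All names are hypothetical.
struct PipelineState {
    std::string vs_cb, ps_cb, blend, ps, rt;
};

struct Command {
    enum Kind { SetVsCb, SetPsCb, SetBlend, SetPs, SetRt, Draw } kind;
    std::string value;  // unused for Draw
};

// Replays the stream and returns the fully resolved state seen by each draw.
// Resolving at draw time stands in for "delayed validation": the state has
// to be gathered from whatever Set calls happened earlier, however long ago.
std::vector<PipelineState> resolve_draws(const std::vector<Command>& stream) {
    PipelineState current;
    std::vector<PipelineState> per_draw;
    for (const Command& c : stream) {
        switch (c.kind) {
            case Command::SetVsCb:  current.vs_cb = c.value; break;
            case Command::SetPsCb:  current.ps_cb = c.value; break;
            case Command::SetBlend: current.blend = c.value; break;
            case Command::SetPs:    current.ps = c.value;    break;
            case Command::SetRt:    current.rt = c.value;    break;
            case Command::Draw:     per_draw.push_back(current); break;
        }
    }
    assert(per_draw.size() <= stream.size());
    return per_draw;
}
```

Because a draw in the middle of the stream cannot be resolved without replaying everything before it, splitting such a stream across threads means saving and restoring `current` at every split point.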
THREADING A SEQUENTIAL API
 Sequential API threading
– Simple producer / consumer model
 Extra latency
 Buffering has a cost
 More threading would mean dividing tasks on finer grain
– Bottlenecked on application or driver thread
 Difficult to extract parallelism (Amdahl’s Law)
[Figure labels: Application simulation; Prebuild Thread 0, Prebuild Thread 1; Application Render Thread (Application); Driver Thread (Runtime / Driver); Queued Buffer 0, Queued Buffer 1, Queued Buffer 2; GPU Execution Queue]
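Amdahl's Law puts a number on why a bottlenecked render or driver thread limits scaling: if a fraction p of the frame's CPU work can run in parallel on n threads, the speedup is 1 / ((1 - p) + p / n). A small sketch of that formula follows; the function name and the example fractions are illustrative, not measurements from the talk.

```cpp
#include <cassert>

// Amdahl's law: speedup on `threads` cores when a fraction `parallel`
// (0..1) of the work can be spread across them and the rest stays serial.
double amdahl_speedup(double parallel, int threads) {
    assert(parallel >= 0.0 && parallel <= 1.0 && threads >= 1);
    return 1.0 / ((1.0 - parallel) + parallel / threads);
}
```

For example, if only half of the work can be spread out, `amdahl_speedup(0.5, 4)` gives 1.6, and no number of extra cores can push the result past 2.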
COMMAND BUFFER API
 GPUs only listen to command buffers
 Let the app build them
– Command Lists, at the API level
 Solves sequential API CPU issues
[Figure labels: Application simulation; Thread 0 and Thread 1, each building a command buffer (Application); Queued Buffer 0, Queued Buffer 1 (Runtime / Driver); GPU Execution Queue]
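The command-list idea can be sketched as independent recording followed by one ordered submission. This is a toy model, not the Direct3D 12 interface: `CommandList`, `record_slice` and `submit` are hypothetical, and real threads are left out so that the example stays deterministic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an API-level command list: a private buffer of
// recorded commands with no shared state.
struct CommandList {
    std::vector<std::string> commands;
};

// Records the draws for one slice of the scene. It touches only its own
// list, so each call could run on a different thread without locking.
CommandList record_slice(int first_object, int count) {
    assert(count >= 0);
    CommandList list;
    for (int i = 0; i < count; ++i) {
        list.commands.push_back("draw " + std::to_string(first_object + i));
    }
    return list;
}

// Submission is the only ordered step: lists execute in the order given
// here, regardless of the order in which they were recorded.
std::vector<std::string> submit(const std::vector<CommandList>& lists) {
    std::vector<std::string> gpu_stream;
    for (const CommandList& list : lists) {
        gpu_stream.insert(gpu_stream.end(), list.commands.begin(), list.commands.end());
    }
    return gpu_stream;
}
```

Recording the second slice before the first and then submitting them in scene order yields the same stream, which is the property that lets the recording step scale across cores.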
BETTER SCHEDULING
 App has much more control over scheduling work
– Both CPU side and GPU
 Threads don’t really share much resource
 Many more options for streaming assets
[Figure: D3D11: CB building threads tend to interfere (Create thread, Driver thread). D3D12: CB building threads more independent (Create thread, Build threads). GPU load still added but only after queuing (Create work, Render work, GPU executes)]
PIPELINE OBJECTS
 Pipeline objects get rid of JIT and enable LTCG for GPUs
 Decouple interface and implementation
 We’re aware that this is a hairpin bend for many graphics engines to negotiate.
– Many engines don’t think in terms of predicting state up front
– The benefits are worth it
[Figure: Simplified dataflow through pipeline. Stages: Index Process, Primitive Generation, VS, Rasteriser, PS, Rendertarget Output, with “?” marks between some stages]
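Moving shader linking from draw time (just-in-time) to create time can be modelled with a small cache. The sketch below is an assumption-laden illustration rather than the real API: `PipelineDesc`, `PipelineCache` and the three-field description are hypothetical, and "linking" is reduced to a counter.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical description of everything a pipeline needs, known up front.
struct PipelineDesc {
    std::string vs, ps, blend;
    bool operator<(const PipelineDesc& o) const {
        return std::tie(vs, ps, blend) < std::tie(o.vs, o.ps, o.blend);
    }
};

class PipelineCache {
public:
    PipelineCache() : links_performed_(0), draws_(0) {}

    // Create time: the expensive link step happens here, once per distinct
    // description. Returns a handle for use at draw time.
    int create(const PipelineDesc& desc) {
        auto it = handles_.find(desc);
        if (it != handles_.end()) return it->second;
        ++links_performed_;
        const int handle = static_cast<int>(handles_.size());
        handles_[desc] = handle;
        return handle;
    }

    // Draw time: using a prebuilt pipeline performs no linking at all.
    void draw(int handle) {
        assert(handle >= 0 && handle < static_cast<int>(handles_.size()));
        ++draws_;
    }

    int links_performed() const { return links_performed_; }
    int draws() const { return draws_; }

private:
    std::map<PipelineDesc, int> handles_;
    int links_performed_;
    int draws_;
};
```

The cost of this arrangement is the one the slide names: the engine has to know each full state combination before the first draw that uses it.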
RENDER OBJECT BINDING MISMATCH
 Hardware uses tables in video memory
 BUT still programmed like a register solution
– So one bind becomes:
 Allocate a new chunk of video memory
 Create a new copy of the entire table
 Update the one entry
 Write the register with the new table base address
[Figure labels: On-chip root table (1 per stage) with SR and CB entries; each entry is a pointer to a table in GPU memory (SRD table, here textures; constant buffers table); each table entry is a pointer to (+ params of) a resource in GPU memory]
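The four steps behind a single register-style bind can be turned into a rough cost model. This is a sketch with hypothetical names (`RegisterStyleBinder`, `Descriptor`), using an ordinary vector in place of video memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef int Descriptor;  // hypothetical stand-in for a resource descriptor

// Models table-based hardware driven through a register-style API: every
// single-slot bind allocates a fresh table and copies the old contents.
class RegisterStyleBinder {
public:
    explicit RegisterStyleBinder(std::size_t slots)
        : table_(slots, 0), tables_allocated_(0), descriptors_copied_(0) {}

    void bind(std::size_t slot, Descriptor d) {
        assert(slot < table_.size());
        std::vector<Descriptor> fresh(table_);  // allocate + copy the entire table
        descriptors_copied_ += fresh.size();
        fresh[slot] = d;                        // update the one entry
        table_.swap(fresh);                     // point the "register" at the new table
        ++tables_allocated_;
    }

    Descriptor at(std::size_t slot) const { return table_.at(slot); }
    std::size_t tables_allocated() const { return tables_allocated_; }
    std::size_t descriptors_copied() const { return descriptors_copied_; }

private:
    std::vector<Descriptor> table_;
    std::size_t tables_allocated_;
    std::size_t descriptors_copied_;
};
```

In this model the work per bind grows with the table size rather than with the one slot that changed: two binds into a 16-entry table copy 32 descriptors.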
DESCRIPTOR TABLES
 Several tables of each type of resource
– Easy to divide up by frequency
 Tables can be of arbitrary size; dynamically indexed to provide bindless textures
 Changing a table pointer is cheap
 Updating a descriptor in a table is not
[Figure labels: On-chip table with entries SR.T[0], SR.T[1], SR.T[2], SR.T[3], UAV, Samp, CB.T[0], CB.T[1]; SR.T[0] is a pointer to textures table 0 in GPU memory (SRD table entries SR.T[0][0], SR.T[0][1], SR.T[0][2]); CB.T[1] is a pointer to constbuf table 1 (entries CB.T[1][0], CB.T[1][1])]
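The cheap path, repointing a slot at a table that was built earlier, can be sketched as follows. `RootTable`, `set_table` and `fetch` are hypothetical names for illustration and do not correspond to real Direct3D 12 calls; tables are plain vectors of integer descriptors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef int Descriptor;                           // hypothetical descriptor
typedef std::vector<Descriptor> DescriptorTable;  // a prebuilt table

// Hypothetical on-chip table: one pointer per table slot (compare SR.T[0..3]).
class RootTable {
public:
    explicit RootTable(std::size_t slots) : tables_(slots, nullptr), pointer_writes_(0) {}

    // Cheap path: repoint a slot at a table that already exists.
    void set_table(std::size_t slot, const DescriptorTable* table) {
        assert(slot < tables_.size());
        tables_[slot] = table;
        ++pointer_writes_;
    }

    // Dynamic indexing into the bound table, as a shader would do for
    // bindless-style access.
    Descriptor fetch(std::size_t slot, std::size_t index) const {
        assert(slot < tables_.size() && tables_[slot] != nullptr);
        return tables_[slot]->at(index);
    }

    std::size_t pointer_writes() const { return pointer_writes_; }

private:
    std::vector<const DescriptorTable*> tables_;
    std::size_t pointer_writes_;
};
```

Two batches that use the same textures can point at the same table even when other batches run between them, and each switch costs one pointer write however large the table is.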
KEY INNOVATIONS
Columns: Innovation, CPU-side win, GPU-side win
Command buffers
– Build on many threads
– Control of scheduling
– Lower latency
– Simplified state tracking
Pipeline state objects
– Link at create time
– No JIT shader compiles
– Efficient batched updates
– Cheaper state updates
– Enables LTCG
Bind objects in groups
– Cheap to change group (listed under both CPU-side and GPU-side)
– Fits hardware paradigm
Move work to Create
– Predictability
– Enables optimisations
KEY INNOVATIONS
Columns: Innovation, CPU-side win, GPU-side win
Explicit Synchronisation
– Efficiency
– Required for bindless textures
– Less overhead
Explicit Memory Management
– Efficiency
– Predictability
– Application flexibility
– Zero copy
– Control over placement
Do less
– Predictability, Efficiency
– Enables aggressive schedule
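Explicit synchronisation means the application, not the driver, states when a resource changes use. A toy checker in the spirit of a debug layer illustrates the contract; `HazardChecker`, `Usage` and the three usage states are hypothetical simplifications and not the real barrier API.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Usage { RenderTarget, ShaderRead, CopyDest };

// Hypothetical validator: the application announces every transition and
// the checker only verifies that the announcements are consistent.
class HazardChecker {
public:
    void create(const std::string& resource, Usage initial) {
        assert(state_.count(resource) == 0);  // resource names must be unique
        state_[resource] = initial;
    }

    // The app announces a change of use. Returns false if `from` does not
    // match what the resource is currently used as.
    bool transition(const std::string& resource, Usage from, Usage to) {
        auto it = state_.find(resource);
        if (it == state_.end() || it->second != from) return false;
        it->second = to;
        return true;
    }

    // A use is valid only if the resource is already in that state.
    bool use(const std::string& resource, Usage as) const {
        auto it = state_.find(resource);
        return it != state_.end() && it->second == as;
    }

private:
    std::map<std::string, Usage> state_;
};
```

Nothing in the checker infers a transition on the application's behalf, which is where both the saved overhead and the added responsibility come from.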
FEWER BUGS
NEW PROBLEMS (AND TIPS TO SOLVE THEM)
NEW VISIBLE LIMITS
 More draws in does not automatically mean more triangles out
– You will not see full rendering rates with triangles averaging 1 pixel each.
– Wireframe mode should look different to filled rendering
NEW VISIBLE LIMITS
 Feeding the GPU much more efficiently means exploring interesting new limits that weren’t visible before
 10k/frame of anything is ~1µs per thing.
 GPU pipeline depth is likely to be 1-10µs (1k-10k cycles).
 Specific limit: context registers
– Shader tables are NOT in the context
– Compute doesn’t bottleneck on context
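The "~1µs per thing" figure is frame-budget arithmetic, and it can be checked directly. The slide does not state a frame rate or a GPU clock, so both are assumptions here: 10k items per frame is exactly 1 µs each at 100 frames per second and about 1.67 µs at 60, and 1-10 µs equals 1k-10k cycles at a clock of 1 GHz.

```cpp
#include <cassert>

// Time available per item, in microseconds, when `items_per_frame` items
// must fit into one frame at `frames_per_second`.
double budget_per_item_us(double frames_per_second, double items_per_frame) {
    assert(frames_per_second > 0.0 && items_per_frame > 0.0);
    const double frame_us = 1e6 / frames_per_second;
    return frame_us / items_per_frame;
}

// Converts a latency in microseconds to clock cycles at `clock_ghz`.
double cycles_for_us(double microseconds, double clock_ghz) {
    assert(microseconds >= 0.0 && clock_ghz > 0.0);
    return microseconds * clock_ghz * 1e3;
}
```

Once the per-item budget is the same order as the pipeline depth, any state change that drains the pipeline costs about as much as the item itself, which is why these limits only become visible when the CPU side stops being the bottleneck.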
APPLICATION IN CHARGE
 Application is arbiter of correct rendering
– This is a serious responsibility
– The benefits of D3D12 aren’t readily available without this condition
Applications must be warning-free on the debug layer
 Different opportunities for driver intervention
APPLICATION IN CHARGE
 No driver thread in play
– App can target much lower latency
– BUT implies app has to be ready with new GPU work
[Figure: two timelines with rows App Render, Driver and GPU for frames 1 to 3. D3D11: No dead GPU time after 1st frame (but extra latency); first work sent to driver, dead time on the GPU before the first frame, driver buffers Present so there is no future dead time. Second timeline (F1, F2, F3): no buffered Present reveals dead time on GPU]
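The dead-time trade-off can be explored with a toy timeline. This is a simplified model under stated assumptions, not a description of any driver: the app takes a fixed time to build each frame, the GPU takes a fixed time to run it, and `frames_in_flight` (a hypothetical parameter) caps how far the app may run ahead of the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy timeline: the app needs `app_ms` to build a frame and the GPU needs
// `gpu_ms` to run it. `frames_in_flight` is how many frames the app may have
// submitted but not yet finished on the GPU (1 = no buffering at all).
// Returns the total time the GPU sits idle, including the initial wait.
double gpu_dead_time_ms(int frames, double app_ms, double gpu_ms, int frames_in_flight) {
    assert(frames >= 0 && frames_in_flight >= 1);
    std::vector<double> gpu_done(static_cast<std::size_t>(frames), 0.0);
    double app_free = 0.0, gpu_free = 0.0, dead = 0.0;
    for (int i = 0; i < frames; ++i) {
        double start = app_free;
        if (i >= frames_in_flight) {
            // The app must wait until an older frame has left the GPU.
            start = std::max(start, gpu_done[i - frames_in_flight]);
        }
        const double submitted = start + app_ms;
        app_free = submitted;
        const double gpu_start = std::max(submitted, gpu_free);
        dead += gpu_start - gpu_free;  // the GPU had nothing to do in this gap
        gpu_done[i] = gpu_start + gpu_ms;
        gpu_free = gpu_done[i];
    }
    return dead;
}
```

With three frames of 4 ms app work and 4 ms GPU work, one frame in flight leaves the GPU idle for 12 ms in total, while two frames in flight leave only the initial 4 ms, at the price of the extra latency that buffering adds.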
USE COMMAND BUFFERS SPARINGLY
 Each API command list maps to a single hardware command buffer
 Starting / ending a command list has an overhead
– Writes full 3D state, may flush caches or idle GPU
 We think a good rule of thumb will be to target around 100 command buffers/frame
– Use the multiple submission API where possible
[Figure: Multiple applications running on system. Application 0 queue: CB0, CB1, CB2. Application 1 queue: CB0. GPU executes: CB0, CB1, CB0, CB2]
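The rule of thumb of around 100 command buffers per frame translates into a batch size once the draw count is known. The helper names below are hypothetical and the default target is simply the figure quoted on the slide.

```cpp
#include <cassert>

// Draws to record per command list so that a frame of `draws` lands at or
// below `target_lists` command lists.
int draws_per_list(int draws, int target_lists = 100) {
    assert(draws >= 0 && target_lists > 0);
    if (draws == 0) return 0;
    return (draws + target_lists - 1) / target_lists;  // ceiling division
}

// Number of command lists actually produced with that batch size.
int lists_needed(int draws, int per_list) {
    assert(draws >= 0 && per_list > 0);
    return (draws + per_list - 1) / per_list;
}
```

A frame of 10,000 draws works out to 100 draws per list; a frame of only 50 draws gets one draw per list under this rule, so a real engine would also want a minimum batch size to amortise the fixed cost of starting and ending each list.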
ROUND-UP
ALL-NEW
 There’s a learning curve here for all of us
 In the main it’s a shallow one
– Compared at least to the general problem of multithreaded rendering
 Multithreading is always hard.
– Simpler design means fewer bugs and more predictable performance
WHAT AMD PLAN TO DELIVER
 An early preview driver “soon”
 Release driver for Direct3D12 launch
 Continuous engagement
– With Microsoft
– With ISVs
 Bring your opinions to us and to Microsoft.
DX12 AND FROSTBITE
Johan Andersson
Technical Director
DX12 AND FROSTBITE
 PC is very important for EA and we’ve been pushing hard to improve graphics capabilities on Windows
 Excited to be working with Microsoft and the IHVs on Direct3D again!
 Good & very healthy collaboration between Microsoft, the IHVs and us game/engine developers
 DX12 is a really big step forward from DX11 or GL4
DX12 FEATURES AND FROSTBITE
 Key DX12 features that are a great fit for Frostbite:
– Efficient parallel command buffers
– Descriptor tables
– Pipeline objects
– Explicit resource synchronization
– Explicit memory management
 DX12 is still in development, so we are actively working with Microsoft & the IHVs to help make sure all of it fits together and is efficient
DX12 PLATFORMS
 DX12 support on Windows 7 & most existing PC hardware is critical for us
– Huge user base still on Windows 7
– Gamers would see major benefits without upgrading
 DX12 support on Xbox One is critical for us
– Will lead to improved performance & quality for future Xbox One titles
– Almost all of our games are cross platform Gen4/PC
– Easier development – renderer is shared between Windows & Xbox One
 Looking forward to DX12 on mobile/tablets
– Power efficiency & low overhead is really key
– Need larger user base to target on Windows for mobile
DX12 AND FROSTBITE
 We are building a DX12 renderer for Frostbite!
– Will work on GPUs from all vendors – benefits a wide set of gamers
 Expected benefits over DX11:
– More stable and consistent performance
– Higher overall performance
– Move our design target – richer & more detailed game worlds
– Thinner drivers – easier to work with / less of a black box
– More control for us developers – new techniques & optimizations
 Really happy that the full Windows & Xbox ecosystems are moving to a low-level graphics API!
QUESTIONS