MANTLE FOR DEVELOPERS
JOHAN ANDERSSON – TECHNICAL DIRECTOR
FROSTBITE
ELECTRONIC ARTS
Mantle?
 Simplify advanced development
 Improve performance
 Enable developers to innovate
 Challenge the status quo
Developer impact areas
Control
CPU performance
Programmability
GPU performance
Platforms
Control
New model
Traditional Model: Black Box
 Middle-ground abstraction – compromise between performance & “usability”
 Hidden resource memory & state
 Resource CPU access tied to device context
 Driver analyzes & synchronizes implicitly
Explicit Model: Mantle
 Thin low-level abstraction to expose how hardware works
 App explicit memory management
 Resources are globally accessible
 App explicit resource state transitions
Control
App responsibility
 Tell when render target will be used as a texture
‒ And many more resource state transitions
 Don’t destroy resources that GPU is using
‒ Keep track with fences or frames
 Manual dynamic resource renaming
‒ No DISCARD for driver resource renaming
 Resource memory tiling
 Powerful validation layer will help!
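A minimal sketch of the "keep track with fences or frames" point, assuming the app gets a per-frame completion index from a GPU fence. The class and method names are hypothetical and are not Mantle API calls; the list named destroyed stands in for actually freeing GPU memory.

```python
from collections import deque


class DeferredDestroyQueue:
    """Delays resource destruction until the GPU has finished the frame that used it."""

    def __init__(self):
        self._pending = deque()  # (frame_index, resource), in submission order
        self.destroyed = []      # stand-in for actually freeing GPU memory

    def destroy_later(self, resource, frame_index):
        # Called when the app is done with the resource on the CPU side.
        # Assumes frame_index never decreases between calls.
        self._pending.append((frame_index, resource))

    def collect(self, last_completed_frame):
        # last_completed_frame would come from a fence the GPU signals per frame.
        while self._pending and self._pending[0][0] <= last_completed_frame:
            _, resource = self._pending.popleft()
            self.destroyed.append(resource)


queue = DeferredDestroyQueue()
queue.destroy_later("shadow_map_old", frame_index=10)
queue.destroy_later("hdr_target_old", frame_index=11)
queue.collect(last_completed_frame=10)  # GPU has finished frame 10 only
```

After the collect call only the frame-10 resource is released; the frame-11 resource stays alive until a later fence reports that frame as complete.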
Control
Explicit control enables
 App high-level decisions & optimizations
‒ Has full scene information
‒ Easier to optimize performance & memory
 Flexible & efficient memory management
‒ Linear frame allocators
‒ Memory pools
‒ Pinned memory
 Reduced development time
‒ For advanced game engines & apps
‒ Easier to get to target performance & robustness
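As an illustration of the "linear frame allocators" bullet, here is a small sketch of a bump allocator over one block of memory that is reset once per frame. The names, the 256-byte default alignment and the capacity are assumptions for the example, not values from the talk.

```python
class LinearFrameAllocator:
    """Bump allocator over one big memory block; reset once per frame."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.offset = 0

    def allocate(self, size, alignment=256):
        # Round the current offset up to the requested alignment.
        start = (self.offset + alignment - 1) // alignment * alignment
        if start + size > self.capacity:
            raise MemoryError("frame allocator exhausted")
        self.offset = start + size
        return start

    def reset(self):
        # Only safe once the GPU has finished reading this frame's data.
        self.offset = 0


frame_memory = LinearFrameAllocator(capacity=1024)
constants = frame_memory.allocate(100)  # lands at offset 0
vertices = frame_memory.allocate(300)   # offset rounded up to the next 256 boundary
```

Allocation is a few integer operations and freeing is a single reset, which is why this pattern suits per-frame constants and dynamic geometry.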
Control
Explicit control enables
 Transient resources
‒ Alias render targets within frame
‒ Major memory savings
‒ No need to pre-allocate everything
 Light-weight driver
‒ Easier to develop & maintain
‒ Reduced CPU draw call overhead
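The "alias render targets within frame" idea can be sketched as a lifetime-based placement: two targets may share memory when one is no longer used before the other is first written. The greedy strategy, the target names, sizes and pass indices below are all illustrative assumptions.

```python
def alias_render_targets(targets):
    """targets: list of (name, size, first_pass, last_pass).

    Returns ({name: offset}, total_size) for one shared heap.
    """
    blocks = []      # each block: [offset, size, last_pass_of_current_owner]
    placement = {}
    heap_size = 0
    for name, size, first, last in sorted(targets, key=lambda t: t[2]):
        for block in blocks:
            # Reuse a block whose previous owner is dead before this target is born.
            if block[2] < first and block[1] >= size:
                block[2] = last
                placement[name] = block[0]
                break
        else:
            # No reusable block: grow the heap.
            blocks.append([heap_size, size, last])
            placement[name] = heap_size
            heap_size += size
    return placement, heap_size


targets = [
    ("gbuffer_normals", 8, 0, 2),  # sizes in arbitrary units, e.g. MB
    ("shadow_mask", 4, 1, 2),
    ("bloom_temp", 8, 3, 4),
    ("dof_temp", 4, 3, 5),
]
placement, total = alias_render_targets(targets)
```

In this example the four targets need 24 units if allocated separately but fit in 12 when aliased. A real renderer would also need the resource state transitions described earlier when a block changes owner.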
Control
CPU performance
CPU perf
Core concepts
 Descriptor sets
 Monolithic pipelines
 Command buffers
CPU perf
Descriptor sets
 Table with resource references to bind to graphics or compute pipeline
 Replaces traditional resource stage binding
‒ Major performance & flexibility advantage
‒ Closer to how the hardware works
 App managed - lots of strategies possible!
‒ Tiny vs huge sets
‒ Single vs multiple
‒ Static vs semi-static vs dynamic
 Example 1: Single simple dynamic descriptor set
‒ Bind everything you need for a single draw call
‒ Close to DX/GL model but share between stages
[Diagram: descriptor slot types: Image, Memory, Sampler, Link]
[Diagram: dynamic descriptor set with entries VertexBuffer (VS), Texture0 (VS+PS), Constants (VS), Texture1 (PS), Texture2 (PS), Sampler0 (VS+PS)]
CPU perf
Descriptor sets
 Example 2: Reuse static set with nesting
‒ Reduce update time & memory usage
[Diagram: a dynamic descriptor set linked to a static descriptor set; entries shown: Constants (VS), Link, VertexBuffer (VS), Texture0 (VS+PS), Texture1 (PS), Texture2 (PS), Texture3 (PS), Texture4 (PS), Sampler0 (VS+PS), Sampler1 (PS)]
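To make the nesting idea concrete, here is a toy model of a descriptor set as a table of slots, where a slot holds either a resource reference or a link to another set. The class is hypothetical and only models the data structure. The flatten method exists for inspection in this sketch; hardware would follow the link rather than copy entries.

```python
class DescriptorSet:
    """A table of slots; each slot holds a resource reference or a link to another set."""

    def __init__(self, slot_count):
        self.slots = [None] * slot_count

    def attach(self, slot, kind, resource):
        # kind is one of "image", "memory", "sampler" (slot types named on the slide).
        self.slots[slot] = (kind, resource)

    def link(self, slot, other_set):
        self.slots[slot] = ("link", other_set)

    def flatten(self):
        # Walk nested sets and return every referenced resource in slot order.
        out = []
        for entry in self.slots:
            if entry is None:
                continue
            kind, value = entry
            if kind == "link":
                out.extend(value.flatten())
            else:
                out.append((kind, value))
        return out


# Static set: built once, for example per material.
material = DescriptorSet(3)
material.attach(0, "image", "Texture0")
material.attach(1, "image", "Texture1")
material.attach(2, "sampler", "Sampler0")

# Dynamic set: rewritten per draw, but only two slots need updating.
per_draw = DescriptorSet(2)
per_draw.attach(0, "memory", "Constants")
per_draw.link(1, material)
```

The per-draw cost is two slot writes regardless of how many textures and samplers the static set holds, which is the "reduce update time & memory usage" point.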
CPU perf
Monolithic pipelines
 Shader stages & select graphics state combined into single object
‒ No runtime compilation or patching needed!
‒ Significantly less runtime overhead to use
 Supports parallel building & caching
‒ Fast loading times
 Usage & management up to the app
‒ Static vs dynamic creation
‒ Amount of pipelines
‒ State usage
[Diagram: pipeline state covering the stages IA, VS, HS, Tessellator, DS, GS, RS, PS, DB, CB]
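A rough sketch of "parallel building & caching", assuming pipelines can be described by a hashable (shaders, state) tuple. The build function is a stand-in for the expensive compile step, and none of the names correspond to real Mantle calls.

```python
from concurrent.futures import ThreadPoolExecutor


def build_pipeline(description):
    # Stand-in for the expensive step (shader compilation + state baking).
    shaders, state = description
    return {"shaders": shaders, "state": state}


class PipelineCache:
    def __init__(self):
        self._pipelines = {}
        self.builds = 0

    def warm_up(self, descriptions, workers=4):
        # Build every unique pipeline in parallel at load time.
        unique = list(dict.fromkeys(descriptions))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for desc, pipeline in zip(unique, pool.map(build_pipeline, unique)):
                self._pipelines[desc] = pipeline
                self.builds += 1

    def get(self, description):
        # At draw time this is a plain lookup; nothing is compiled or patched.
        return self._pipelines[description]


opaque = (("vs_mesh", "ps_opaque"), ("depth_test", "cull_back"))
alpha = (("vs_mesh", "ps_alpha"), ("depth_test", "blend"))
cache = PipelineCache()
cache.warm_up([opaque, alpha, opaque])  # the duplicate is built only once
```

Whether to create pipelines up front like this or on demand is one of the "static vs dynamic creation" choices the slide leaves to the app.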
CPU perf
Command buffers
 Issue pipelined graphics & compute commands into a command buffer
‒ Bind graphics state, descriptor sets, pipeline
‒ Draw calls
‒ Render targets
‒ Clears
‒ Memory transfers
‒ NOT: resource mapping
 Fully independent objects
‒ Create multiple every frame
‒ Or pre-build up front and reuse
CPU perf
DX/GL parallelism
[Diagram: timeline across three cores: CPU 0 runs Game, CPU 1 runs Render, CPU 2 runs Driver Render]
 Automatically extracts parallelism out of most apps
 Doesn’t scale beyond 2-3 cores
 Additional latency
 Driver thread often bottleneck – can collide with app threads
CPU perf
Parallel dispatch with Mantle
[Diagram: timeline across five cores: CPU 0 runs Game, CPU 1 through CPU 4 each run Render]
 App can go fully wide with its rendering – minimal latency
 Close to linear scaling with CPU cores
 No driver threads – no overhead – no contention
 Frostbite’s approach on all consoles – and on PC with Mantle!
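The structure of parallel dispatch can be sketched as follows: each worker records its own independent command buffer, and the app submits them in an order it chooses. The command strings and function names are made up for illustration. Python threads will not show the near-linear scaling described on the slide, so this only shows the shape of the approach.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_buffer(draws):
    # Each worker records into its own command buffer; no shared state, no locks.
    commands = ["bind_pipeline", "bind_descriptor_set"]
    commands += [f"draw {name}" for name in draws]
    return commands


def render_frame(draws, worker_count):
    # Split the frame's draws into contiguous chunks, one per worker.
    chunk = max(1, (len(draws) + worker_count - 1) // worker_count)
    chunks = [draws[i:i + chunk] for i in range(0, len(draws), chunk)]
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        buffers = list(pool.map(record_command_buffer, chunks))
    # map() returns results in chunk order, so submission order is deterministic.
    return buffers


buffers = render_frame([f"mesh{i}" for i in range(10)], worker_count=4)
```

Because command buffers are fully independent objects, no lock is needed while recording; ordering only matters at submission.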
CPU performance
GPU performance
GPU perf
GPU optimizations
 Thanks to improved CPU performance – CPU will rarely be a bottleneck for the GPU
‒ CPU could help GPU more:
‒ Less brute force rendering
‒ Improve culling
 Resource states
‒ Gives driver a lot more knowledge & flexibility
‒ Apps can avoid expensive/redundant transitions, such as surface decompression
 Shader pipeline object – driver optimizations
‒ Can optimize with pipeline state knowledge
‒ Can optimize across all shader stages
 Expose existing GPU functionality
‒ Quad & Rect-lists
‒ HW-specific MSAA & depth data access
‒ Programmable sample patterns
‒ And more..
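One way an app might avoid redundant transitions is to remember each resource's current state and emit a transition only when the state really changes. This is a sketch under that assumption; the state names are invented and the transitions list stands in for commands written to a command buffer.

```python
class ResourceStateTracker:
    """Remembers each resource's current state and emits only real transitions."""

    def __init__(self):
        self._state = {}
        self.transitions = []  # stand-in for transition commands in a command buffer

    def require(self, resource, new_state):
        old_state = self._state.get(resource, "undefined")
        if old_state == new_state:
            return  # already in the right state: no transition, no decompression
        self.transitions.append((resource, old_state, new_state))
        self._state[resource] = new_state


tracker = ResourceStateTracker()
tracker.require("scene_color", "render_target")
tracker.require("scene_color", "render_target")  # redundant, skipped
tracker.require("scene_color", "shader_read")    # render target now used as a texture
```

Three requests produce two transitions here. The second request is dropped, which is where a cost such as surface decompression would be saved.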
GPU perf
Queues
 Modern GPUs are heterogeneous machines with multiple engines
‒ Graphics pipeline
‒ Compute pipeline(s)
‒ DMA transfer
‒ Video encode/decode
‒ More…
 Mantle exposes queues for the engines + synchronization primitives
[Diagram: GPU with one queue per engine: Graphics, Compute, DMA, ...]
GPU perf
Queue use cases
 Async DMA transfers
‒ Copy resources in parallel with graphics or compute
[Diagram: DMA and Graphics queues with the jobs Copy, Render, Other render, Use copy]
 Async compute together with graphics
‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units
[Diagram: Compute and Graphics queues with the jobs GBuffer, Non-shadowed lighting, Shadowmap 0, Shadowmap 1, Final lighting]
 Multiple compute kernels collaborating
‒ Can be faster than über-kernel
‒ Example: Compute geometry backend & compute rasterizer
[Diagram: Compute 0, Compute 1 and Graphics queues with the jobs Compute Geometry, Compute Rasterizer, Ordinary Rendering]
 Compute as frontend for graphics pipeline
‒ Compute runs asynchronously ahead and prepares & optimizes geometry for graphics pipeline
[Diagram: Compute and Graphics queues with the jobs Process0, Process1, Draw0, Draw1, Draw2]
 Game engines will build large GPU job graphs
‒ Move away from single sequential submission
‒ Just as we already have done on CPU
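A GPU job graph spread over several queues can be sketched as a dependency-ordered schedule. The job names, the queue assignment and the wave-based scheduling are illustrative assumptions. In a real API, each dependency between jobs on different queues would become a signal and a wait on a synchronization primitive.

```python
def schedule_waves(jobs):
    """jobs: {name: (queue, [dependencies])}. Returns a list of waves.

    Jobs in the same wave do not depend on each other, so jobs on
    different queues within a wave may run concurrently.
    """
    done, waves = set(), []
    remaining = dict(jobs)
    while remaining:
        ready = sorted(n for n, (_, deps) in remaining.items() if set(deps) <= done)
        if not ready:
            raise ValueError("cycle in job graph")
        waves.append([(n, remaining[n][0]) for n in ready])
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves


jobs = {
    "upload_textures": ("dma", []),
    "gbuffer": ("graphics", []),
    "light_culling": ("compute", ["gbuffer"]),
    "shadowmaps": ("graphics", ["gbuffer"]),
    "final_lighting": ("graphics", ["light_culling", "shadowmaps", "upload_textures"]),
}
waves = schedule_waves(jobs)
```

In this example the DMA upload overlaps the G-buffer pass, and compute light culling overlaps shadow map rendering, matching the async DMA and async compute use cases above.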
GPU performance
Programmability
Programmability
Explicit Multi-GPU
 Explicit control of GPU queues and synchronization, finally!
‒ Implement your own Alternate-Frame-Rendering
‒ Or something more exotic..
 Use case: Workstation rendering with 4-8 GPUs
‒ Super high-quality rendering & simulation
‒ Load balance graphics & compute job graphs across GPUs
‒ 20-40 TFlops in a single machine!
 Use case: Low-latency rendering
‒ Important for VR and competitive games
‒ Latency optimized GPU job graph scheduling
‒ VR: Simultaneously drive 2 GPUs (1 per eye)
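With explicit control, the simplest multi-GPU policies become a few lines of app code. The two helper functions below are hypothetical and only show the assignment logic for alternate-frame rendering and for the one-GPU-per-eye VR case.

```python
def assign_frames_afr(frame_count, gpu_count):
    # Alternate-frame rendering: frame i goes to GPU i mod N.
    return [frame % gpu_count for frame in range(frame_count)]


def assign_eyes(gpu_count):
    # VR variant from the slide: one GPU per eye when two GPUs are available.
    return {"left": 0, "right": 1 if gpu_count > 1 else 0}


frame_to_gpu = assign_frames_afr(frame_count=5, gpu_count=2)
eye_to_gpu = assign_eyes(gpu_count=2)
```

Something "more exotic" would replace these functions with a scheduler that balances graphics and compute job graphs across the available GPUs.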
Programmability
New mechanisms
 Command buffer predication & flow control
‒ GPU affecting/skipping submitted commands
‒ Go beyond DrawIndirect / DispatchIndirect
‒ Advanced variable workloads
‒ Advanced culling optimizations
 Write occlusion query results into GPU buffer
‒ No CPU roundtrip needed
‒ Can drive predicated rendering
‒ Or use results directly in shaders (lens flares)
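The occlusion-driven predication flow can be modelled with a tiny command interpreter: query results are written into a buffer that lives in GPU memory, and later draws are skipped based on that buffer without the CPU reading it back. The command names and tuple layout are invented for this sketch.

```python
def execute(commands, gpu_buffer):
    """Run a command list; predicated draws read GPU memory, not the CPU."""
    executed = []
    for cmd in commands:
        op = cmd[0]
        if op == "write_occlusion_result":
            _, slot, visible_samples = cmd
            gpu_buffer[slot] = visible_samples  # query result lands in GPU memory
        elif op == "draw_if":
            _, slot, name = cmd
            if gpu_buffer[slot] > 0:            # predicate evaluated on the GPU timeline
                executed.append(name)
        elif op == "draw":
            executed.append(cmd[1])
    return executed


gpu_buffer = [0, 0]
commands = [
    ("draw", "terrain"),
    ("write_occlusion_result", 0, 120),  # bounding box of building A: 120 samples passed
    ("write_occlusion_result", 1, 0),    # bounding box of building B: fully hidden
    ("draw_if", 0, "building_a"),
    ("draw_if", 1, "building_b"),
]
drawn = execute(commands, gpu_buffer)
```

The whole command list is submitted once; the hidden building is skipped at execution time, so no CPU roundtrip is needed.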
Programmability
Bindless resources
 Mantle supports bindless resources
‒ Shaders can select resources to use instead of static binding from CPU
‒ Extension of the descriptor set support
 Examples
‒ Performance optimizations – less data to update
‒ Logic & data structures that live fully on the GPU
‒ Scene culling & rendering
‒ Material representations
 Key component that will open up a lot of opportunities!
‒ Deferred shading
‒ Raytracing
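A small model of the bindless idea, assuming textures and materials sit in tables in GPU memory and shader code indexes into them. The shade function stands in for shader code; the table contents are made up.

```python
# Tables that live in GPU memory; the CPU fills them once, not per draw.
textures = ["brick_albedo", "metal_albedo", "grass_albedo"]
materials = [
    {"albedo_index": 0, "roughness": 0.9},
    {"albedo_index": 1, "roughness": 0.2},
]


def shade(material_id):
    # Stand-in for shader code: it picks its own texture through an index
    # instead of relying on a slot the CPU bound before the draw.
    material = materials[material_id]
    return textures[material["albedo_index"]], material["roughness"]


# A deferred shading pass could read material_id per pixel from the G-buffer.
gbuffer_material_ids = [1, 0, 1]
shaded = [shade(m) for m in gbuffer_material_ids]
```

Since the selection happens in the shader, one pass can shade pixels with different materials, which is what makes the deferred shading and raytracing use cases practical.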
Programmability
Platforms
Platforms
Today
 Mantle gives us strong benefits on Windows today
‒ Console-like performance & programmability on both Windows 7 and Windows 8
‒ For us, well worth the dev time!
 DX & GL are the industry standards
‒ Needed for platforms that do not support Mantle
‒ Needed by devs who do not want/need more control
‒ Need fallback paths for GL/DX, but should not limit oneself to them
 Mantle and PlayStation 4 will drive our future Frostbite designs & optimizations
‒ PS4 graphics API has great programmability & performance as well
‒ Share concepts, methods & optimization strategies
Platforms
Linux & Mac
 Want to see Mantle on Linux and Mac!
‒ Would enable support for our full engine & rendering
‒ Significantly easier to do efficient renderer with Mantle than with OpenGL
 Use cases:
‒ Workstations
‒ R&D
‒ Not limited by WDDM
‒ Games
‒ Mantle + SteamOS = powerful combination!
Platforms
Mobile
 Mobile architectures are getting closer in capabilities to desktop GPUs
 Want graphics API that allows apps to fully utilize the hardware
‒ Power efficient
‒ High performance
‒ Programmable
 Major opportunity with Mantle – leapfrog GL4, DX11
‒ For mobile SoC vendors
‒ For Google and Apple
Platforms
Multi-vendor?
 Mantle is designed to be a thin hardware abstraction
‒ Not tied to AMD’s GCN architecture
‒ Forward compatible
‒ Extensions for architecture- and platform-specific functionality
 Mantle would be a much more efficient graphics API for other vendors as well
‒ Most Mantle functionality can be supported on today’s modern GPUs
 Want to see future version of Mantle supported on all platforms and on all modern GPUs!
‒ Become an active industry standard with IHVs and ISVs collaborating
‒ Enable us developers to innovate with great performance & programmability everywhere
Platforms
Frostbite
Battlefield 4
 Mantle support is in development
‒ Core renderer (closer to PS4 than DX11)
‒ Implement all rendering techniques used in BF4 (many!)
‒ CPU optimizations (parallel dispatch, descriptor sets)
‒ GPU optimizations (minimize transitions, MSAA)
‒ R&D for advanced GPU optimizations
‒ Memory management
‒ Multi-GPU support
‒ ~2 months of work
 Update targeting late December
Frostbite
Plants vs Zombies: Garden Warfare
 Very different rendering compared to BF4
 Frostbite Mantle renderer will work out of the box
 Focus on APU performance
Frostbite
Future
 All Frostbite games designed with Mantle
‒ 15 games in development across all of EA
 Advanced Mantle rendering & use cases
‒ Lots of exciting R&D opportunities!
 Want multi-vendor & multi-platform support!
Email: [email protected]
Web: http://frostbite.com
Twitter: @repi
THE END