MANTLE FOR DEVELOPERS
JOHAN ANDERSSON – TECHNICAL DIRECTOR
FROSTBITE
ELECTRONIC ARTS
Mantle?
Simplify advanced development
Improve performance
Enable developers to innovate
Challenge the status quo
Developer impact areas
Control
CPU performance
Programmability
GPU performance
Platforms
Control
New model
Traditional Model: Black Box
‒ Middle-ground abstraction – compromise between performance & “usability”
‒ Hidden resource memory & state
‒ Resource CPU access tied to device context
‒ Driver analyzes & synchronizes implicitly
Explicit Model: Mantle
‒ Thin low-level abstraction to expose how hardware works
‒ App explicit memory management
‒ Resources are globally accessible
‒ App explicit resource state transitions
Control
App responsibility
Tell when render target will be used as a texture
‒ And many more resource state transitions
Don’t destroy resources that GPU is using
‒ Keep track with fences or frames
Manual dynamic resource renaming
‒ No DISCARD for driver resource renaming
Resource memory tiling
Powerful validation layer will help!
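A minimal sketch of the "keep track with fences or frames" idea, written in Python only to show the bookkeeping. The class and method names are hypothetical and are not Mantle API; `completed_frame` stands in for whatever a fence or frame counter reports as finished on the GPU.

```python
from collections import deque


class DeferredDeleter:
    """Delays resource destruction until the GPU can no longer be using it."""

    def __init__(self):
        self.pending = deque()   # (frame in which the app retired it, resource)
        self.destroyed = []      # stand-in for calling the real destroy function

    def retire(self, resource, current_frame):
        """Called when the app no longer needs `resource` after `current_frame`."""
        self.pending.append((current_frame, resource))

    def collect(self, completed_frame):
        """Destroy everything retired in a frame the GPU has finished."""
        while self.pending and self.pending[0][0] <= completed_frame:
            _, resource = self.pending.popleft()
            self.destroyed.append(resource)


deleter = DeferredDeleter()
deleter.retire("old_render_target", current_frame=10)
deleter.collect(completed_frame=9)    # GPU may still read it: nothing destroyed
deleter.collect(completed_frame=10)   # frame 10 is done: safe to destroy
```

The same queue can hold renamed dynamic buffers, since without DISCARD the app has to recycle them itself.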
Control
Explicit control enables
App high-level decisions & optimizations
‒ Has full scene information
‒ Easier to optimize performance & memory
Flexible & efficient memory management
‒ Linear frame allocators
‒ Memory pools
‒ Pinned memory
Reduced development time
‒ For advanced game engines & apps
‒ Easier to get to target performance & robustness
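As an illustration of a linear frame allocator, the sketch below hands out offsets into one large memory block and resets once per frame. The 256-byte default alignment and all names are assumptions made for the example, not values from the talk.

```python
class LinearFrameAllocator:
    """Bump allocator over one memory block; everything is freed at once per frame."""

    def __init__(self, size):
        self.size = size
        self.offset = 0

    def alloc(self, nbytes, alignment=256):
        # Round the current offset up to the requested alignment.
        start = (self.offset + alignment - 1) // alignment * alignment
        if start + nbytes > self.size:
            raise MemoryError("frame allocator exhausted")
        self.offset = start + nbytes
        return start

    def reset(self):
        # Only valid once the GPU has finished the frame that used this block.
        self.offset = 0


frame_memory = LinearFrameAllocator(size=1024)
constants_offset = frame_memory.alloc(100)   # offset 0
vertices_offset = frame_memory.alloc(100)    # next 256-byte boundary
frame_memory.reset()
```

Allocation is a few arithmetic operations and there is no per-allocation free, which is why this pattern suits per-frame constants and dynamic geometry.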
Control
Explicit control enables
Transient resources
‒ Alias render targets within frame
‒ Major memory savings
‒ No need to pre-allocate everything
Light-weight driver
‒ Easier to develop & maintain
‒ Reduced CPU draw call overhead
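One way to picture aliasing render targets within a frame is a greedy planner that reuses a memory slot whenever two lifetimes do not overlap. This is a sketch under assumed inputs (pass indices and sizes are invented, and the resource names are hypothetical); it is not a description of how Frostbite does it.

```python
def alias_transient_resources(resources):
    """resources: list of (name, first_pass, last_pass, size).

    Returns (placement, total_size) where placement maps name -> slot index.
    """
    slots = []       # each slot: {"busy_until": pass index, "size": bytes}
    placement = {}
    for name, first, last, size in sorted(resources, key=lambda r: r[1]):
        for index, slot in enumerate(slots):
            if slot["busy_until"] < first:          # lifetimes do not overlap
                slot["busy_until"] = last
                slot["size"] = max(slot["size"], size)
                placement[name] = index
                break
        else:
            slots.append({"busy_until": last, "size": size})
            placement[name] = len(slots) - 1
    return placement, sum(slot["size"] for slot in slots)


frame = [
    ("gbuffer_normals", 0, 2, 8),
    ("ssao", 1, 3, 4),
    ("bloom", 3, 5, 8),
    ("depth_of_field", 4, 6, 4),
]
placement, total = alias_transient_resources(frame)
unaliased = sum(size for _, _, _, size in frame)
```

In this made-up frame the four targets fit in two slots, so `total` is half of `unaliased`.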
Control
CPU performance
CPU perf
Core concepts
Descriptor sets
Monolithic pipelines
Command buffers
CPU perf
Descriptor sets
Table with resource references to bind to graphics or compute pipeline
[Diagram legend: Image, Memory, Sampler, Link]
Replaces traditional resource stage binding
‒ Major performance & flexibility advantage
‒ Closer to how the hardware works
App managed - lots of strategies possible!
‒ Tiny vs huge sets
‒ Single vs multiple
‒ Static vs semi-static vs dynamic
Example 1: Single simple dynamic descriptor set
‒ Bind everything you need for a single draw call
‒ Close to DX/GL model but share between stages
[Diagram: Dynamic descriptor set containing VertexBuffer (VS), Texture0 (VS+PS), Constants (VS), Texture1 (PS), Texture2 (PS), Sampler0 (VS+PS)]
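Example 1 can be modelled as a plain table whose slots are shared between shader stages. The sketch below is a data-structure illustration only: the `Slot` type, the mapping of buffers to the "memory" kind, and `visible_to` are assumptions, not Mantle types or calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    kind: str     # "image", "memory", "sampler" or "link"
    name: str
    stages: str   # shader stages that read the slot, e.g. "VS+PS"


# One dynamic set holding everything a single draw call needs.
dynamic_set = [
    Slot("memory", "VertexBuffer", "VS"),
    Slot("image", "Texture0", "VS+PS"),
    Slot("memory", "Constants", "VS"),
    Slot("image", "Texture1", "PS"),
    Slot("image", "Texture2", "PS"),
    Slot("sampler", "Sampler0", "VS+PS"),
]


def visible_to(descriptor_set, stage):
    """Names of the slots a given shader stage reads from the shared table."""
    return [s.name for s in descriptor_set if stage in s.stages.split("+")]
```

Texture0 and Sampler0 appear once in the table yet serve both stages, which is the "share between stages" point of the example.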
CPU perf
Descriptor sets
Example 2: Reuse static set with nesting
‒ Reduce update time & memory usage
[Diagram: Static descriptor set and Dynamic descriptor set joined by a Link; entries shown: Constants (VS), Link, VertexBuffer (VS), Texture0 (VS+PS), Texture1 (PS), Texture2 (PS), Texture3 (PS), Texture4 (PS), Sampler0 (VS+PS), Sampler1 (PS)]
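A rough model of nesting: a small dynamic set is rebuilt per draw and links to a static set that is written once. Which entries sit in which set is an assumption made for this sketch (textures and samplers static, vertex buffer and constants dynamic), and the tuple layout and function names are hypothetical.

```python
# Written once, for example per material, and then reused by many draws.
static_set = [
    ("image", "Texture0"), ("image", "Texture1"), ("image", "Texture2"),
    ("image", "Texture3"), ("image", "Texture4"),
    ("sampler", "Sampler0"), ("sampler", "Sampler1"),
]


def build_dynamic_set(vertex_buffer, constants):
    """Per-draw set: two direct entries plus a link to the shared static set."""
    return [("memory", vertex_buffer), ("memory", constants), ("link", static_set)]


def flatten(descriptor_set):
    """Follow links depth-first and list the resources a shader would reach."""
    names = []
    for kind, payload in descriptor_set:
        if kind == "link":
            names.extend(flatten(payload))   # payload is the nested set
        else:
            names.append(payload)
    return names


per_draw = build_dynamic_set("VertexBuffer", "Constants")
```

Each draw now updates three slots instead of nine, which is where the reduced update time and memory usage come from.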
CPU perf
Monolithic pipelines
Shader stages & select graphics state combined into single object
‒ No runtime compilation or patching needed!
‒ Significantly less runtime overhead to use
Supports parallel building & caching
‒ Fast loading times
Usage & management up to the app
‒ Static vs dynamic creation
‒ Number of pipelines
‒ State usage
[Diagram: Pipeline state with blocks IA, DB, VS, HS, DS, Tessellator, GS, RS, PS, CB]
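Parallel building and caching could be organised roughly as below: each distinct combination of shader stages and state gets a key, and distinct pipelines are built once on a thread pool. `build_pipeline` is a stand-in for the real, expensive pipeline creation; every name here is hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def pipeline_key(shaders, state):
    """Stable key for one combination of shader stages and fixed state."""
    text = repr(sorted(shaders.items())) + repr(sorted(state.items()))
    return hashlib.sha1(text.encode()).hexdigest()


def build_pipeline(description):
    shaders, state = description
    # Stand-in for compiling and linking the whole pipeline up front.
    return {"key": pipeline_key(shaders, state), "stages": sorted(shaders)}


def build_all(descriptions, workers=4):
    """Build every distinct pipeline once, in parallel, and return a cache."""
    unique = {pipeline_key(s, st): (s, st) for s, st in descriptions}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        built = list(pool.map(build_pipeline, unique.values()))
    return {p["key"]: p for p in built}


requested = [
    ({"VS": "mesh_vs", "PS": "lit_ps"}, {"blend": "opaque"}),
    ({"PS": "lit_ps", "VS": "mesh_vs"}, {"blend": "opaque"}),   # same pipeline
    ({"VS": "mesh_vs", "PS": "lit_ps"}, {"blend": "alpha"}),
]
cache = build_all(requested)
```

Because state is part of the key, a blend-mode change yields a second pipeline object instead of a runtime patch.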
CPU perf
Command buffers
Issue pipelined graphics & compute commands into a command buffer
‒ Bind graphics state, descriptor sets, pipeline
‒ Draw calls
‒ Render targets
‒ Clears
‒ Memory transfers
‒ NOT: resource mapping
Fully independent objects
‒ Create multiple every frame
‒ Or pre-build up front and reuse
CPU perf
DX/GL parallelism
[Diagram: timeline across CPU 0, CPU 1 and CPU 2 with Game, Render and Driver Render work]
Automatically extracts parallelism out of most apps
Doesn’t scale beyond 2-3 cores
Additional latency
Driver thread often bottleneck – can collide with app threads
CPU perf
Parallel dispatch with Mantle
[Diagram: timeline with Game on CPU 0 and Render on CPU 1 through CPU 4]
App can go fully wide with its rendering – minimal latency
Close to linear scaling with CPU cores
No driver threads – no overhead – no contention
Frostbite’s approach on all consoles – and on PC with Mantle!
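The structure of parallel dispatch can be sketched as follows: split the frame's draws into chunks, record one independent command buffer per chunk on worker threads, then submit the buffers in order. Command names are placeholders, and Python threads will not show the real speed-up; the point is only that recording shares no state.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_buffer(draws):
    """Record one independent command buffer; no shared state is touched."""
    commands = ["bind_pipeline", "bind_descriptor_set"]
    commands += [f"draw {d}" for d in draws]
    return commands


def render_frame(draws, workers=4):
    # One contiguous chunk of draw calls per worker.
    chunk = max(1, (len(draws) + workers - 1) // workers)
    chunks = [draws[i:i + chunk] for i in range(0, len(draws), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        buffers = list(pool.map(record_command_buffer, chunks))
    # Submission follows chunk order, so the frame is deterministic.
    return [command for buffer in buffers for command in buffer]


submitted = render_frame(list(range(6)), workers=3)
```

Recording order across threads does not matter; only the final submission order does.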
CPU performance
GPU performance
GPU perf
GPU optimizations
Thanks to improved CPU performance – CPU will rarely be a bottleneck for the GPU
‒ CPU could help GPU more:
‒ Less brute force rendering
‒ Improve culling
Resource states
‒ Gives driver a lot more knowledge & flexibility
‒ Apps can avoid expensive/redundant transitions, such as surface decompression
Shader pipeline object – driver optimizations
‒ Can optimize with pipeline state knowledge
‒ Can optimize across all shader stages
Expose existing GPU functionality
‒ Quad & Rect-lists
‒ HW-specific MSAA & depth data access
‒ Programmable sample patterns
‒ And more..
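Avoiding redundant transitions can be as simple as tracking the last known state per resource and emitting a transition only when it changes. The state names and the tracker class are hypothetical, chosen for the example.

```python
class StateTracker:
    """Remembers each resource's state and records only real transitions."""

    def __init__(self):
        self.current = {}
        self.transitions = []   # (resource, old_state, new_state)

    def require(self, resource, state):
        old = self.current.get(resource, "undefined")
        if old != state:
            self.transitions.append((resource, old, state))
            self.current[resource] = state


tracker = StateTracker()
tracker.require("shadowmap", "render_target")
tracker.require("shadowmap", "render_target")   # already there: no transition
tracker.require("shadowmap", "shader_read")     # rendered, now sampled
```

Only two transitions are recorded for the three requests, so work such as a surface decompression is not repeated.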
GPU perf
Queues
Modern GPUs are heterogeneous machines with multiple engines
‒ Graphics pipeline
‒ Compute pipeline(s)
‒ DMA transfer
‒ Video encode/decode
‒ More…
Mantle exposes queues for the engines + synchronization primitives
[Diagram: Graphics, Compute, DMA, ... queues feeding the GPU]
GPU perf
Queue use cases
Async DMA transfers
‒ Copy resources in parallel with graphics or compute
[Diagram: DMA queue with Copy; Graphics queue with Render, Other render, Use copy]
GPU perf
Queue use cases
Async compute together with graphics
‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units
[Diagram: Compute and Graphics queues with GBuffer, Non-shadowed lighting, Shadowmap 0, Shadowmap 1, Final lighting]
GPU perf
Queue use cases
Multiple compute kernels collaborating
‒ Can be faster than über-kernel
‒ Example: Compute geometry backend & compute rasterizer
[Diagram: Compute 0, Compute 1 and Graphics queues with Compute Geometry, Compute Rasterizer, Ordinary Rendering]
GPU perf
Queue use cases
Compute as frontend for graphics pipeline
‒ Compute runs asynchronously ahead and prepares & optimizes geometry for graphics pipeline
Game engines will build large GPU job graphs
‒ Move away from single sequential submission
‒ Just as we already have done on CPU
[Diagram: Compute and Graphics queues with Process0, Process1, Draw0, Draw1, Draw2]
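A GPU job graph can be sketched as jobs tagged with a queue plus dependencies, scheduled in waves of work that may overlap. The job names echo the diagram labels, but the dependencies between them are invented for the example; the scheduler uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def schedule(jobs):
    """jobs: name -> (queue, set of dependencies).

    Returns a list of waves; jobs inside one wave have no mutual dependency.
    """
    sorter = TopologicalSorter({name: deps for name, (_, deps) in jobs.items()})
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append([(jobs[name][0], name) for name in ready])
        sorter.done(*ready)
    return waves


# Assumed dependencies: compute prepares geometry ahead of the draws that use it.
jobs = {
    "Process0": ("compute", set()),
    "Process1": ("compute", {"Process0"}),
    "Draw0": ("graphics", {"Process0"}),
    "Draw1": ("graphics", {"Process1", "Draw0"}),
    "Draw2": ("graphics", {"Draw1"}),
}
waves = schedule(jobs)
```

In the second wave Draw0 runs on the graphics queue while Process1 runs on the compute queue, which is the overlap a single sequential submission cannot express.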
GPU performance
Programmability
Programmability
Explicit Multi-GPU
Explicit control of GPU queues and synchronization, finally!
‒ Implement your own Alternate-Frame-Rendering
‒ Or something more exotic..
Use case: Workstation rendering with 4-8 GPUs
‒ Super high-quality rendering & simulation
‒ Load balance graphics & compute job graphs across GPUs
‒ 20-40 TFlops in a single machine!
Use case: Low-latency rendering
‒ Important for VR and competitive games
‒ Latency optimized GPU job graph scheduling
‒ VR: Simultaneously drive 2 GPUs (1 per eye)
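With explicit queues per GPU, the assignment policy becomes ordinary app code. A minimal sketch of the two policies named on this slide, with hypothetical function names and GPU indices:

```python
def afr_gpu_for_frame(frame_index, gpu_count):
    """Alternate-Frame-Rendering: frames are dealt out to GPUs round-robin."""
    return frame_index % gpu_count


def vr_gpu_for_eye(eye):
    """VR: one GPU per eye, so both views render simultaneously."""
    return {"left": 0, "right": 1}[eye]


assignment = [afr_gpu_for_frame(frame, gpu_count=2) for frame in range(4)]
```

Something more exotic, such as load balancing a job graph across 4-8 GPUs, replaces the modulo with a scheduler but keeps the same shape.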
Programmability
New mechanisms
Command buffer predication & flow control
‒ GPU affecting/skipping submitted commands
‒ Go beyond DrawIndirect / DispatchIndirect
‒ Advanced variable workloads
‒ Advanced culling optimizations
Write occlusion query results into GPU buffer
‒ No CPU roundtrip needed
‒ Can drive predicated rendering
‒ Or use results directly in shaders (lens flares)
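The sketch below simulates predication on the CPU to show the data flow: occlusion query results land in a GPU-side buffer, and later draws in the same command stream are skipped based on that buffer without the CPU reading anything back. The command tuples and `run_on_gpu` are invented for the illustration.

```python
def run_on_gpu(commands, buffer_size=4):
    """Interpret a pre-recorded command stream the way the GPU would."""
    buffer = [0] * buffer_size   # GPU buffer holding visible-sample counts
    drawn = []
    for command in commands:
        if command[0] == "occlusion_query":
            _, slot, visible_samples = command
            buffer[slot] = visible_samples      # result never leaves the GPU
        elif command[0] == "draw":
            _, name, predicate_slot = command
            if predicate_slot is None or buffer[predicate_slot] > 0:
                drawn.append(name)              # predicate passed, or no predicate
    return drawn


stream = [
    ("occlusion_query", 0, 0),      # statue's bounding box: fully hidden
    ("occlusion_query", 1, 250),    # tower's bounding box: visible
    ("draw", "terrain", None),
    ("draw", "statue", 0),
    ("draw", "tower", 1),
]
drawn = run_on_gpu(stream)
```

The whole stream is recorded before any query result exists, which is what removes the CPU roundtrip.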
Programmability
Bindless resources
Mantle supports bindless resources
‒ Shaders can select resources to use instead of static binding from CPU
‒ Extension of the descriptor set support
Examples
‒ Performance optimizations – less data to update
‒ Logic & data structures that live fully on the GPU
‒ Scene culling & rendering
‒ Material representations
Key component that will open up a lot of opportunities!
‒ Deferred shading
‒ Raytracing
Programmability
Platforms
Platforms
Today
Mantle gives us strong benefits on Windows today
‒ Console-like performance & programmability on both Windows 7 and Windows 8
‒ For us, well worth the dev time!
DX & GL are the industry standards
‒ Needed for platforms that do not support Mantle
‒ Needed by devs who do not want/need more control
‒ Fallback paths for GL/DX are required, but one should not be limited to them
Mantle and PlayStation 4 will drive our future Frostbite designs & optimizations
‒ PS4 graphics API has great programmability & performance as well
‒ Share concepts, methods & optimization strategies
Platforms
Linux & Mac
Want to see Mantle on Linux and Mac!
‒ Would enable support for our full engine & rendering
‒ Significantly easier to do efficient renderer with Mantle than with OpenGL
Use cases:
‒ Workstations
‒ R&D
‒ Not limited by WDDM
‒ Games
‒ Mantle + SteamOS = powerful combination!
Platforms
Mobile
Mobile architectures are getting closer in capabilities to desktop GPUs
Want graphics API that allows apps to fully utilize the hardware
‒ Power efficient
‒ High performance
‒ Programmable
Major opportunity with Mantle – leapfrog GL4 and DX11
‒ For mobile SoC vendors
‒ For Google and Apple
Platforms
Multi-vendor?
Mantle is designed to be a thin hardware abstraction
‒ Not tied to AMD’s GCN architecture
‒ Forward compatible
‒ Extensions for architecture- and platform-specific functionality
Mantle would be a much more efficient graphics API for other vendors as well
‒ Most Mantle functionality can be supported on today’s modern GPUs
Want to see future version of Mantle supported on all platforms and on all modern GPUs!
‒ Become an active industry standard with IHVs and ISVs collaborating
‒ Enable us developers to innovate with great performance & programmability everywhere
Platforms
Frostbite
Battlefield 4
Mantle support is in development
‒ Core renderer (closer to PS4 than DX11)
‒ Implement all rendering techniques used in BF4 (many!)
‒ CPU optimizations (parallel dispatch, descriptor sets)
‒ GPU optimizations (minimize transitions, MSAA)
‒ R&D for advanced GPU optimizations
‒ Memory management
‒ Multi-GPU support
‒ ~2 months of work
Update targeting late December
Frostbite
Plants vs Zombies: Garden Warfare
Very different rendering compared to BF4
Frostbite Mantle renderer will work out of the box
Focus on APU performance
Frostbite
Future
All Frostbite games designed with Mantle
‒ 15 games in development across all of EA
Advanced Mantle rendering & use cases
‒ Lots of exciting R&D opportunities!
Want multi-vendor & multi-platform support!
Email: [email protected]
Web: http://frostbite.com
Twitter: @repi
THE END