Transcript GDC 2005

Using GPUView to Understand your DirectX 11 Game
Jon Story
Developer Relations Engineer, AMD
Agenda
● Windows Display Driver Model (WDDM)
● What is GPUView?
● CPU & GPU Queues
● Threads & Events
● Case Studies
● Summary
Windows Display Driver Model (WDDM)
Graphics & WDDM
[Diagram: WDDM architecture. User mode: the Application process (Application, D3D Runtime, User Mode Driver (UMD)) and the DWM process. Kernel mode (session space): the Win32 kernel, Dxgkrnl and the Kernel Mode Driver (KMD), which drives the GPU.]
[Diagram: Applications #1 and #2 each build a command buffer through the D3D Runtime and UMD in user mode. In kernel mode, Win32k & dxgkrnl and the KMD turn these into DMA buffers, which wait in the GPU Scheduler Database before being submitted to the GPU.]
What is GPUView?
● An additional Microsoft performance tool
  ● Complements existing tools
  ● Part of the Windows 7 SDK
  ● Built on Event Tracing for Windows
● Perfect for monitoring CPU/GPU interaction (even for multiple GPU setups)
● Allows you to see how well the GPU is being fed
● Supports DX9, DX10 & DX11 on Win7
Capturing Data
● Run an elevated command prompt
● Start your game in windowed mode
  ● For fullscreen mode, perhaps use PsExec from a remote machine
● Start capturing with log.cmd, found in:
  ● \Program Files\Microsoft Windows Performance Toolkit\GPUView
● Capture 10-15 seconds of your game
● Stop logging with log.cmd
● Open the merged.etl file with GPUView.exe
Was this tool created for driver programmers?
Navigating the Data
● Use the mouse to select a region
● Ctrl+Z zooms in to a selection
● Z zooms out
● Use +/- to see more or less detail
● Ctrl+E opens the event menu
● Click on objects for additional details
  ● More on this later…
Zooming in…
DMA Packet Color Coding
Various types of DMA packets may be submitted to the GPU:
● Red: Paging packet
● Black: Preemption packet
● Brown: DWM packet
● Other color: Standard packet
● Other color + cross-hatch: Present packet
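The legend is essentially a lookup table. The small C++ sketch below encodes it for quick reference. The `DmaPacketKind` enum and `legendFor` function are hypothetical names used only for illustration. They are not part of GPUView or any Microsoft API.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum mirroring the color key on this slide; GPUView exposes no
// such type, this only records the legend.
enum class DmaPacketKind { Standard, Paging, Preemption, Dwm, Present };

// Returns how a packet of the given kind is drawn in GPUView's queue views.
std::string legendFor(DmaPacketKind kind) {
    switch (kind) {
        case DmaPacketKind::Paging:     return "Red";
        case DmaPacketKind::Preemption: return "Black";
        case DmaPacketKind::Dwm:        return "Brown";
        case DmaPacketKind::Standard:   return "Other color";
        case DmaPacketKind::Present:    return "Other color + cross-hatch";
    }
    return "Unknown"; // not reached for valid enum values
}
```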
What does a Standard DMA Packet Represent?
● Graphics system state objects
● Draw commands
● References to resource allocations
  ● Textures
  ● Vertex & index buffers
  ● Render targets
  ● Constant buffers
CPU & GPU Queues
SW Context CPU Queues (1)
Desktop Window Manager packet
D3D app stacking up 3 frames of packets
SW Context CPU Queues (2)
CPU queue depth is 6
Task submitted to HW queue
CPU queue is empty!
New task submitted to CPU queue
SW Context CPU Queues (3)
● Objects represent work submitted to a GPU context
● Queue is represented through time as a stack
  ● Stack grows on submission of work by the UMD
  ● Stack shrinks as work is completed by the GPU
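You can think of the stack as a running counter. Each UMD submission pushes, and each GPU completion pops. The sketch below is a hypothetical model: the `QueueEvent`, `DepthStats` and `analyzeQueue` names are invented. It replays time-ordered events and reports how deep the queue got and how often it drained. A healthy app keeps some depth, like the 3 frames in the first screenshot, and does not repeatedly empty the queue.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical event model for one GPU context's software queue. Events are
// assumed to be sorted by time already; timestamps are left out for brevity.
enum class QueueEvent { Submit, Complete };

struct DepthStats {
    int maxDepth = 0;     // tallest the stack ever grew
    int timesEmptied = 0; // how often the queue drained back to zero
};

DepthStats analyzeQueue(const std::vector<QueueEvent>& events) {
    DepthStats stats;
    int depth = 0;
    for (QueueEvent e : events) {
        if (e == QueueEvent::Submit) {
            ++depth; // UMD submitted work: the stack grows
            stats.maxDepth = std::max(stats.maxDepth, depth);
        } else {
            assert(depth > 0 && "completion without queued work");
            --depth; // GPU completed work: the stack shrinks
            if (depth == 0) ++stats.timesEmptied;
        }
    }
    return stats;
}
```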
GPU HW Context Queue (1)
Present packet
Preemption packet
Queued DMA packet
DWM
GPU processing DMA packet
GPU HW Context Queue (2)
GPU starts working on packet
GPU finishes working on packet
GPU has no work to do
GPU HW Context Queue (3)
● Queue is represented through time as a stack
  ● Stack grows on submission of work by the KMD
  ● Stack shrinks as work is completed by the GPU
● Gaps (periods where the GPU has no work) indicate a CPU-side bottleneck
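A gap is the idle time between one packet finishing and the next one starting. The minimal C++ sketch below assumes you have the start and end times of the packets on one hardware queue, for example read off the GPUView timeline. `GpuInterval`, `Gap` and `findIdleGaps` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Execution interval of one packet on the hardware queue, in microseconds.
// Hypothetical type; the values would come from a capture.
struct GpuInterval {
    double startUs;
    double endUs;
};

struct Gap {
    double startUs;  // when the GPU went idle
    double lengthUs; // how long it stayed idle
};

// Returns idle periods longer than minGapUs between consecutive packets.
// Assumes the intervals are sorted by start time, as on a single HW queue.
std::vector<Gap> findIdleGaps(const std::vector<GpuInterval>& packets, double minGapUs) {
    std::vector<Gap> gaps;
    for (std::size_t i = 1; i < packets.size(); ++i) {
        double idle = packets[i].startUs - packets[i - 1].endUs;
        if (idle > minGapUs) gaps.push_back({packets[i - 1].endUs, idle});
    }
    return gaps;
}
```

The threshold exists to filter out tiny scheduling gaps. Long, regular gaps are the pattern that points to the CPU not feeding the GPU.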
Object Selection
Represents latency
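This sketch assumes that latency means the time from a packet's submission to the CPU queue until the GPU finishes it. Under that assumption, it is simple arithmetic over the selected packets. `PacketTimes`, `LatencySummary` and `summarizeLatency` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-packet timing record, in milliseconds.
struct PacketTimes {
    double submittedMs; // packet entered the software (CPU) queue
    double completedMs; // GPU finished executing it
};

struct LatencySummary {
    double averageMs = 0.0;
    double worstMs = 0.0;
};

LatencySummary summarizeLatency(const std::vector<PacketTimes>& packets) {
    assert(!packets.empty());
    LatencySummary s;
    double total = 0.0;
    for (const PacketTimes& p : packets) {
        double latency = p.completedMs - p.submittedMs;
        total += latency;
        s.worstMs = std::max(s.worstMs, latency);
    }
    s.averageMs = total / packets.size();
    return s;
}
```

A deeper CPU queue keeps the GPU busy, but each queued frame adds to this number. That is the tradeoff to watch when selecting objects.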
Object Details (1)
Packet type & timing information
Allocation references in DMA packet
Object Details (2)
(w) = Writable by GPU
Preferred memory segment:
● P0 = Preferred
● P1 = Less preferred
● P2 = Least preferred
Object Viewer
Segment numbers:
● 1 = Vid Mem (CPU visible)
● 2 = Vid Mem (Non visible)
● 3 = PCI Express Mem
Clearly the depth buffer
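The allocation details can be captured in a small data structure. In the sketch below, the segment mapping copies the numbers on this slide. Segment numbering is reported by the driver, so another GPU or driver may use different numbers. `AllocationRef`, `actualSegment` and `missedPreferredSegment` are hypothetical. They model the "not getting preferred segments" symptom that appears later in the ContentStreaming case study.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Segment numbers as shown in the object viewer on this slide (driver specific).
std::string segmentName(int segment) {
    switch (segment) {
        case 1: return "Vid Mem (CPU visible)";
        case 2: return "Vid Mem (Non visible)";
        case 3: return "PCI Express Mem";
        default: return "Unknown";
    }
}

// Hypothetical view of one allocation reference inside a DMA packet.
struct AllocationRef {
    bool gpuWritable;                   // shown as (w) in GPUView
    std::vector<int> preferredSegments; // [0] = P0, [1] = P1, [2] = P2
    int actualSegment;                  // hypothetical: where it ended up
};

// True when the allocation did not land in its most preferred (P0) segment.
bool missedPreferredSegment(const AllocationRef& ref) {
    return !ref.preferredSegments.empty() &&
           ref.actualSegment != ref.preferredSegments.front();
}
```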
Paging Buffer Packet
● Submitted as the result of a paging operation (perhaps a large texture)
● Usually caused by preparing a DMA buffer whose referenced allocations must first be paged in
● Look at the DMA packet that follows the paging operation to find the cause
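The rule "look at the packet that follows" can be stated as a simple scan over the queue timeline. The C++ sketch below uses a simplified, hypothetical timeline type (`PacketKind`, `QueuedPacket`, with a placeholder `id`). For each run of paging packets, it returns the first non-paging packet after the run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of packets on a hardware queue, in time order.
enum class PacketKind { Standard, Paging, Present };

struct QueuedPacket {
    PacketKind kind;
    int id; // placeholder identifier for the packet
};

// For each run of consecutive paging packets, returns the id of the first
// non-paging packet after it: the likely reason the paging happened.
std::vector<int> packetsAfterPaging(const std::vector<QueuedPacket>& timeline) {
    std::vector<int> suspects;
    std::size_t i = 0;
    while (i < timeline.size()) {
        if (timeline[i].kind != PacketKind::Paging) { ++i; continue; }
        std::size_t j = i;
        while (j < timeline.size() && timeline[j].kind == PacketKind::Paging) ++j;
        if (j < timeline.size()) suspects.push_back(timeline[j].id);
        i = j; // skip past the whole paging run
    }
    return suspects;
}
```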
Threads & Events
HW Threads
Colored bars represent idle time
Gaps represent work
Thread Execution
● Thread segments are color coded:
  ● Light blue: Kernel mode
  ● Dark blue: dxgkrnl
  ● Red: KMD (Kernel Mode Driver)
Charts: FPS / Latency / Memory
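GPUView draws the FPS chart itself. To cross-check it, you can assume access to the times at which successive presents completed. The average frame rate is then the number of frame intervals divided by the elapsed time. `averageFps` is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Average frames per second from present completion times, in seconds.
// N presents bound N - 1 frame intervals.
double averageFps(const std::vector<double>& presentTimesSec) {
    assert(presentTimesSec.size() >= 2);
    double span = presentTimesSec.back() - presentTimesSec.front();
    assert(span > 0.0);
    return (presentTimesSec.size() - 1) / span;
}
```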
Viewing Events
● Ctrl+E opens the Event View window
● Track whichever events interest you, for example:
  ● DX - Create / Destroy Allocation
  ● DX Block
    ● Suggests possible resource contention
    ● Perhaps trying to lock an in-use buffer
V-Sync Event
Case Studies
DrawPredicated SDK Sample
GPU is busy, no gaps
CPU queue is buffering up nicely
App thread not saturated
DrawPredicated SDK Sample + Blocking Occlusion Queries
GPU is going idle
Not enough being queued up
App thread fully saturated
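Getting Occlusion Queries Right
● Delay picking up results by N frames
  ● Where N = number of GPUs
  ● Reading a result sooner makes the CPU wait on the GPU, so less work gets queued and the GPU goes idle
● May need to artificially inflate occlusion volumes to avoid popping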
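The N-frame delay works like a small ring of per-frame query slots. The sketch below is a plain C++ model, not D3D11 code. In a real renderer, each slot would hold an occlusion query. Here a slot only stores the visible-sample count the GPU would eventually report. `DelayedOcclusionQueries`, `issue` and `readable` are hypothetical names. The ring holds N + 1 slots, so the slot being read is never the one being written.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of occlusion query results read back N frames late.
class DelayedOcclusionQueries {
public:
    // gpuCount is N, the number of GPUs rendering alternate frames.
    explicit DelayedOcclusionQueries(int gpuCount)
        : delay_(gpuCount), slots_(static_cast<std::size_t>(gpuCount) + 1) {
        assert(gpuCount >= 1);
    }

    // Records the query issued while rendering `frame`.
    void issue(long frame, unsigned long long visibleSamples) {
        slots_[slotFor(frame)] = visibleSamples;
    }

    // Result that is safe to use while rendering `frame`: the one issued N frames
    // earlier. Before that there is none, so callers should treat the object as visible.
    std::optional<unsigned long long> readable(long frame) const {
        if (frame < delay_) return std::nullopt;
        return slots_[slotFor(frame - delay_)];
    }

private:
    std::size_t slotFor(long frame) const {
        return static_cast<std::size_t>(frame) % slots_.size();
    }

    long delay_;
    std::vector<unsigned long long> slots_;
};
```

With one GPU, results are read one frame late. With two GPUs, they are read two frames late. In real code you would still check that the query result is reported as available, rather than assuming it.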
What else could cause this problem?
● Locking a render target
  ● Use CopyResource & staging textures instead
  ● CopyResource is a queued operation, so the CPU does not have to wait for the GPU
ContentStreaming SDK Sample (1)
Paging packets
GPU is going idle
ContentStreaming SDK Sample (2)
Large resources not getting preferred segments
Avoiding Paging
● Keep your video memory usage under control
  ● Especially in MSAA modes
  ● Drop texture resolution for lower-end HW
● Avoid excessively large amounts of dynamic data
  ● Textures & vertex buffers
● If not sure – talk to us!
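A back-of-the-envelope estimate helps show where video memory goes. The sketch below assumes the uncompressed size is width × height × bytes per pixel × MSAA samples, summed over the mip chain when there is one. Real drivers add padding and alignment, so treat the result as a lower bound. `estimateBytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Rough lower-bound size of a 2D texture or render target, in bytes.
std::uint64_t estimateBytes(std::uint64_t width, std::uint64_t height,
                            std::uint64_t bytesPerPixel, std::uint64_t msaaSamples,
                            bool fullMipChain) {
    assert(width > 0 && height > 0 && msaaSamples > 0);
    // D3D11 multisampled resources have a single mip level.
    assert(!(fullMipChain && msaaSamples > 1));
    std::uint64_t total = 0;
    std::uint64_t w = width, h = height;
    while (true) {
        total += w * h * bytesPerPixel * msaaSamples;
        if (!fullMipChain || (w == 1 && h == 1)) break;
        w = w > 1 ? w / 2 : 1; // each mip halves both dimensions, down to 1x1
        h = h > 1 ? h / 2 : 1;
    }
    return total;
}
```

For example, a 1920×1080 RGBA8 render target with 4x MSAA needs 33,177,600 bytes (about 33 MB) before any depth buffer. Halving a texture's width and height cuts its footprint to a quarter.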
MultithreadedRendering11 SDK Sample
Additional threads preparing packets
But there is a lot of D3D runtime / driver overhead
Multi-Threaded Rendering and Deferred Contexts
● It is a complex issue
● Don't expect it to be a magic bullet
● We strongly recommend talking to developer relations at AMD & NVIDIA
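The shape of the pattern shows why deferred contexts are not a magic bullet. The sketch below is a plain C++ model, not D3D11 code. Worker threads each record their own list, standing in for a deferred context and `FinishCommandList`. One thread then replays the lists in a fixed order, standing in for `ExecuteCommandList` on the immediate context. `CommandList`, `recordChunk` and `renderFrame` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Stand-in for a recorded command list; here just a list of labels.
using CommandList = std::vector<std::string>;

// Each worker records the draws for one slice of the scene.
CommandList recordChunk(int chunk, int drawsPerChunk) {
    CommandList list;
    for (int d = 0; d < drawsPerChunk; ++d)
        list.push_back("chunk" + std::to_string(chunk) + ":draw" + std::to_string(d));
    return list;
}

// Records chunks in parallel, then replays them on one thread in a fixed order.
std::vector<std::string> renderFrame(int chunks, int drawsPerChunk) {
    std::vector<CommandList> lists(chunks); // one slot per worker, no resizing
    std::vector<std::thread> workers;
    for (int c = 0; c < chunks; ++c)
        workers.emplace_back([&lists, c, drawsPerChunk] {
            lists[c] = recordChunk(c, drawsPerChunk);
        });
    for (std::thread& t : workers) t.join();

    std::vector<std::string> executed; // serial submission, like the immediate context
    for (const CommandList& list : lists)
        executed.insert(executed.end(), list.begin(), list.end());
    return executed;
}
```

Only the recording runs in parallel. The final submission stays on one thread, and in the real API the runtime and driver still do work for every command list. That overhead is what the MultithreadedRendering11 capture shows.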
Summary
● Make sure you're keeping the ever-hungry GPU fed
● Keep track of CPU/GPU interaction
● Keep track of your threads
● Monitor multi-GPU interaction
● Add GPUView to your toolbox
Acknowledgments
● Microsoft for creating GPUView
● Microsoft for providing background content
Questions?