An Overview of the NVIDIA UNIX Graphics Driver

XDevConf, February 8, 2006
Andy Ritger, NVIDIA Corporation
Contents
Unified Driver Architecture
Driver Components
Features
Direct-Rendering Client Interaction with X
Rendering and Scanout Interaction
Video Memory
ABI Compatibility and API Compatibility
Direct-Rendering OpenGL+Damage/Composite
Copyright © NVIDIA Corporation 2004
Unified Driver Architecture
Majority of code base for NVIDIA Graphics Drivers
leveraged on all the operating systems NVIDIA
supports:
Windows
Mac OS X
Linux
Solaris
FreeBSD
Everything OS-specific or window-system-specific
abstracted behind OS interface layers
One driver supports all GPUs
Driver Components
kernel module (nvidia.ko)
X driver (nvidia_drv.so)
OpenGL library (libGL.so)
GLX driver (libglx.so)
OpenGL core library (libGLcore.so)
Driver Components (cont.)
[Diagram: in user space, the OpenGL app (libGL.so, libGLcore.so) and the X server (libglx.so, nvidia_drv, libGLcore.so) are connected by X protocol and shared memory; each sends command buffers to the GPU; nvidia.ko sits in kernel space.]
Additional Utilities
nvidia-installer (only needed on Linux)
nvidia-settings
nvidia-xconfig
Features
Hardware-accelerated direct and indirect OpenGL
Features
TwinView
2 display devices scanning from same X screen
One root window: spanning comes "for free"
What is DPI? Nonrectangular layouts?
Features (cont.)
Multiple X screens on one GPU
Not as efficient as TwinView for spanning
Solves DPI and nonrectangular layout problems of TwinView
Can advertise different capabilities on each X screen
Features (cont.)
Support for OpenGL with Xinerama
OpenGL direct/indirect rendering can span X screens (even
across GPUs)
Important for CAVEs and Powerwalls
Oil & Gas
Features (cont.)
Configurability
NV-CONTROL X extension: dynamically query/modify driver
attributes
nvidia-settings is sample NV-CONTROL client
Features (cont.)
Quad-Buffered Stereo
OpenGL application renders Left/Right eyes
Driver toggles between eyes on every VBlank
Important for many workstation users, CAVEs
[Figure: CAVE immersive 3D environment; stereo images are projected on walls and floor and must be in sync across all projectors.]
[Figure: MRI brain visualization.]
CAVE images courtesy Brown University ~ http://graphics.cs.brown.edu/research/cave/home.html
Features (cont.)
RGB/CI Workstation Overlays
16-bit RGB overlay
8-bit CI overlay
Rendering in overlay does not damage content in main
plane
Useful for user interface in overlay, complex rendering in
main plane
Useful for legacy applications that require different depths
Used by workstation applications such as Maya
Features (cont.)
FrameLock
Lock together scanout of displays across a cluster
OpenGL SwapBuffers Locked together
Important for CAVEs and powerwalls
[Figure: ORNL visualization expert Jamison Daniel uses the EVEREST powerwall to display data from a large-scale climate simulation. Image courtesy of ORNL.]
Features (cont.)
SDI
Serial Digital Interface: video format used in digital
broadcast industry
GPU sends data to SDI at 8, 10, or 12 bits per component
Features (cont.)
SLI
Multiple GPUs drive one X screen
Alternate Frame Rendering (AFR)
Split Frame Rendering (SFR)
SLI AntiAliasing (SLIAA)
[Diagram: CPU, chipset, and two GPUs.]
Direct-Rendering Client Interaction
with X
Motivation for Direct Rendering:
Avoid IPC overhead
Avoid moving large quantities of data between client and
server (e.g., OpenGL textures)
Avoid making GLX protocol requests for every OpenGL API
call (e.g., glVertex3f() millions of times per frame)
When OpenGL application is on same system as X server,
performance benefit to bypass GLX protocol
Direct-Rendering Client Interaction
with X (cont.)
Hardware-acceleration vs direct-rendering:
Hardware-acceleration: using GPU to perform some or all of
the OpenGL rendering pipeline
Direct rendering: OpenGL library bypasses GLX protocol and
renders directly to the hardware
Server-side must coordinate with OpenGL client library
for:
Data propagation
Synchronization
Direct-Rendering Client Interaction
with X (cont.)
What data needs propagating?
Drawable's geometry
Drawable's cliplist
Other Drawable attributes:
SwapInterval
AntiAliasing
SyncToVBlank
etc...
Direct-Rendering Client Interaction
with X (cont.)
Control flow:
NVIDIA X driver pushes current drawable state into a
shared memory segment
OpenGL direct-rendering runs asynchronously to X
server
When OpenGL performs an operation that must be up to date with respect to the
window system, it checks that it has current drawable data
If stale, OpenGL retrieves current data from shared
memory and updates internal state
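The control flow above can be sketched in Python. This is a minimal illustration, not NVIDIA's implementation: the class names `SharedDrawableState` and `GLClient`, and the use of a serial number to detect stale data, are assumptions. The slides only state that the X driver pushes state into shared memory and that OpenGL checks whether its copy is current.

```python
from dataclasses import dataclass, field
import threading


@dataclass
class SharedDrawableState:
    """Stands in for the shared memory segment written by the X driver."""
    serial: int = 0                      # bumped on every update (assumed mechanism)
    geometry: tuple = (0, 0, 0, 0)       # x, y, width, height
    cliplist: tuple = ()                 # visible rectangles
    lock: threading.Lock = field(default_factory=threading.Lock)

    def push(self, geometry, cliplist):
        """X driver side: publish new drawable state."""
        with self.lock:
            self.geometry = geometry
            self.cliplist = tuple(cliplist)
            self.serial += 1


class GLClient:
    """Direct-rendering client side: keeps a private copy of the state."""

    def __init__(self, shared):
        self.shared = shared
        self.seen_serial = -1
        self.geometry = None
        self.cliplist = None
        self.refreshes = 0

    def validate(self):
        """Call before any operation that depends on window-system state."""
        if self.seen_serial == self.shared.serial:
            return                       # cached copy is current
        with self.shared.lock:           # stale: copy under the lock
            self.geometry = self.shared.geometry
            self.cliplist = self.shared.cliplist
            self.seen_serial = self.shared.serial
        self.refreshes += 1

    def swap_buffers(self):
        self.validate()
        return [("blit back->front", rect) for rect in self.cliplist]
```

The client runs asynchronously and only pays for a refresh when the serial number has changed; repeated swaps against an unchanged window reuse the cached cliplist.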
Direct-Rendering Client Interaction
with X (cont.)
Synchronization needed to ensure integrity of
drawable data in shared memory
Synchronization also needed to ensure correct
ordering of GPU commands issued by each driver (X,
each instance of OpenGL)
Direct-Rendering Client Interaction
with X (cont.)
Traditional GPUs: 1 command buffer:
Shared by all driver components
Synchronization needed to protect shared buffer
NVIDIA GPUs: multiple command buffers:
One command buffer for each OpenGL client, one for X
driver
Hardware context switches between command buffers
No need to negotiate shared command buffer
Instead, need to manage sequencing of GPU commands
Direct-Rendering Client Interaction
with X (cont.)
Why is sequencing important?
Consider moving an animating OpenGL window that
is clipped
Operations performed:
OpenGL SwapBuffers: blit back->front per cliprect
X driver: blit from old position to new position per cliprect
Must make sure all outstanding OpenGL rendering is
complete and has reached the framebuffer before X's
blit commands are processed by the GPU
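The ordering requirement above can be modeled with a toy scheduler. This is a sketch under stated assumptions: the `acquire`/`release` commands and the semaphore name `gl_swap_done` are hypothetical, chosen only to show how one command buffer can be held back until another has finished. The slides do not describe the actual mechanism.

```python
from collections import deque


def run_gpu(buffers):
    """Round-robin over command buffers, honoring acquire/release ordering.

    buffers: dict mapping a buffer name to a list of commands, where a command
    is ("exec", label), ("release", sem) or ("acquire", sem).
    Returns the labels in the order the simulated GPU executed them.
    """
    queues = {name: deque(cmds) for name, cmds in buffers.items()}
    released = set()
    executed = []
    while any(queues.values()):
        progressed = False
        for name, queue in queues.items():
            if not queue:
                continue
            op, arg = queue[0]
            if op == "acquire" and arg not in released:
                continue                 # blocked: context switch to another buffer
            queue.popleft()
            progressed = True
            if op == "exec":
                executed.append(f"{name}: {arg}")
            elif op == "release":
                released.add(arg)
        if not progressed:
            raise RuntimeError("deadlock: every buffer is waiting")
    return executed


buffers = {
    # The X buffer is listed first, so without the acquire it would run first.
    "X": [("acquire", "gl_swap_done"), ("exec", "blit old position -> new position")],
    "GL": [("exec", "render frame"), ("exec", "SwapBuffers blit back->front"),
           ("release", "gl_swap_done")],
}
order = run_gpu(buffers)
```

Here the X driver's blit is executed only after the OpenGL client's rendering and SwapBuffers blit, even though both buffers are serviced independently.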
Direct-Rendering Client Interaction
with X (cont.)
Inter-commandbuffer synchronization; driver-specific
problem to solve
Important concept whenever one client's rendering is
read by another client:
Direct-rendering OpenGL clients rendering to redirected
windows
X rendering to pixmaps that are used as OpenGL textures
(GLX_EXT_texture_from_pixmap)
Interactions Between Rendering and
Scanout
Flipping vs Blitting
Blit: memcpy
Flip: change what portion of video memory is
scanned out
Flipping is faster, easier to sync to VBlank
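The difference between the two swap methods can be shown with a toy model. This is an illustrative sketch only: the `Scanout` class and its byte-array "video memory" are assumptions, not driver code.

```python
class Scanout:
    """Toy model: video memory is a bytearray; scanout reads from one offset."""

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self.vidmem = bytearray(2 * frame_size)   # room for front and back
        self.front = 0                            # offset being scanned out
        self.back = frame_size                    # offset being rendered to
        self.bytes_copied = 0

    def render(self, value):
        """Fill the back buffer with a single byte value."""
        end = self.back + self.frame_size
        self.vidmem[self.back:end] = bytes([value]) * self.frame_size

    def swap_blit(self):
        """Blit: copy the back buffer contents over the front buffer."""
        src = self.vidmem[self.back:self.back + self.frame_size]
        self.vidmem[self.front:self.front + self.frame_size] = src
        self.bytes_copied += self.frame_size

    def swap_flip(self):
        """Flip: exchange the two offsets; no pixel data moves."""
        self.front, self.back = self.back, self.front

    def displayed(self):
        return bytes(self.vidmem[self.front:self.front + self.frame_size])
```

A blit costs time proportional to the frame size, while a flip only changes which region is scanned out, which is why it is cheaper and easier to align with VBlank.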
Interactions Between Rendering and
Scanout (cont.)
To flip while an OpenGL application is in a window:
Create a second copy of the desktop
Only the content within the OpenGL window is different
Flip between copies of the desktop
Requires keeping the desktop in sync between the two
copies
Interactions Between Rendering and
Scanout (cont.)
Quad-Buffered Stereo:
Flip between Left/Right eyes every VBlank
Swaps can be done with either blit or flip
Interactions Between Rendering and
Scanout (cont.)
Ideally, rendering and scanout would be orthogonal
In practice, they are not:
OpenGL needs to control when and where to flip
SyncToVBlank
Video Memory allocation/configuration may depend on
whether surface will be scanned out
Filtering for AA through scanout
SLI (SFR, AFR, SLIAA)
Frame delivery for video:
Time-sensitive
Driver needs precise control of frame display
Best accomplished with flipping
Video Memory
Most modern NVIDIA GPUs are packaged with large
quantities of video memory
However:
Not all video memory is CPU mappable; SBIOSes limit how
much can be mapped to the CPU
Some GPUs support rendering to system memory over the PCIe bus:
System memory is CPU mappable
GPU rendering to system memory is slower than to native video memory
Layout of video memory may not be linear:
Organization of bits within video memory is optimized for
rendering and texturing
Acquiring a linear CPU mapping may require sacrifices
Video Memory (cont.)
Many attributes to the video memory
Selecting the optimal placement of data in the correct
memory space is non-trivial
Placement heuristics perform best when driver has
knowledge of how that data is going to be used
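To make the point about usage knowledge concrete, here is a purely hypothetical placement heuristic. The `Usage` hints, the three memory-space names, and the rules themselves are assumptions made for illustration; they are not taken from the NVIDIA driver. The sketch only encodes the trade-offs listed on the previous slide: CPU-mappable video memory is limited, system memory is mappable but slower for the GPU, and GPU-optimized layouts are not linear.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical usage hints supplied by the caller."""
    size: int
    scanned_out: bool = False
    gpu_rendered: bool = False
    cpu_access: bool = False


def place(usage, mappable_free):
    """Pick a memory space; returns (space, remaining mappable bytes)."""
    if usage.cpu_access and not usage.scanned_out and not usage.gpu_rendered:
        # CPU-only data: system memory is always CPU mappable.
        return "sysmem", mappable_free
    if usage.cpu_access:
        # Needs both CPU and GPU access: use linear, CPU-mappable video memory
        # while the limited mappable window lasts.
        if usage.size <= mappable_free:
            return "vidmem-linear", mappable_free - usage.size
        if not usage.scanned_out:
            return "sysmem", mappable_free   # slower for the GPU, but mappable
    # GPU-only data (or nothing better available): GPU-optimized layout.
    return "vidmem-optimized", mappable_free
```

Without the hints, the allocator would have to guess, and a wrong guess either wastes the scarce mappable window or forces slow GPU access.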
ABI and API Compatibility
NVIDIA provides one X driver binary used in all X
servers since XFree86 4.0
This is accomplished through:
ABI compatibility
Dynamic loading of symbols
We understand that ABI compatibility needs to be
broken, and we can work with that
ABI and API Compatibility (cont.)
However, here are a few suggestions:
Breaking ABI compatibility painful for anyone distributing a
driver separately from the X server tree (will that be more
common with the Modular X tree?)
To minimize pain, break ABI infrequently and only when
absolutely necessary
Add new entry points and deprecate old entry points, rather
than change old entry points, to give opportunity to phase
in driver support
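The last suggestion pairs naturally with dynamic symbol loading: a single driver binary can probe for the new entry point and fall back to the old one. The sketch below models the server's exported symbols as a dictionary; the entry point names `DrawGlyphs` and `DrawGlyphs2` are hypothetical and do not refer to real X server symbols.

```python
def resolve(symbols, preferred, fallback):
    """Return the first entry point the server exports, newest first."""
    for name in (preferred, fallback):
        func = symbols.get(name)
        if func is not None:
            return name, func
    raise LookupError(f"server exports neither {preferred} nor {fallback}")


# Hypothetical symbol tables for two server generations.
old_server = {"DrawGlyphs": lambda text: f"old:{text}"}
new_server = {
    "DrawGlyphs": lambda text: f"old:{text}",       # deprecated but still exported
    "DrawGlyphs2": lambda text: f"new:{text}",      # new entry point
}

name_old, draw_old = resolve(old_server, "DrawGlyphs2", "DrawGlyphs")
name_new, draw_new = resolve(new_server, "DrawGlyphs2", "DrawGlyphs")
```

Because the old entry point keeps its behavior, the same driver works on both servers; had `DrawGlyphs` been changed in place, the driver could not tell which behavior it was calling.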
ABI and API Compatibility (cont.)
More Suggestions:
Update ABI version number appropriately
ABI version queryable at install time and run time
Minimize number of incompatible ABI versions: minimize
number of driver versions to distribute
If there are several ABI breakages pending, get them all out
of the way at once
If ABI is going to be broken anyway, update APIs when
appropriate (Xv, Glyph management)
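A queryable version number lets an installer or driver decide compatibility before anything is loaded. The sketch below assumes a major/minor convention in which a major bump signals an incompatible break and a minor bump signals additions only; the slides do not specify an encoding, so the 16-bit packing and the function names are illustrative assumptions.

```python
def pack_abi(major, minor):
    """Pack major/minor into one integer (16 bits each)."""
    return ((major & 0xFFFF) << 16) | (minor & 0xFFFF)


def unpack_abi(version):
    """Split a packed version back into (major, minor)."""
    return version >> 16, version & 0xFFFF


def compatible(driver_built_for, server_provides):
    """Same major, and the server's minor is at least what the driver needs."""
    d_major, d_minor = unpack_abi(driver_built_for)
    s_major, s_minor = unpack_abi(server_provides)
    return d_major == s_major and s_minor >= d_minor
```

Under this convention, each major bump forces a separate driver build, which is the reason for batching pending breakages into a single bump.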
OpenGL + Damage/Composite
Direct-rendering GL+Damage/Composite:
Clients aware that drawable has been redirected
Clients notify X when drawable is damaged
Clients and X drivers to handle synchronization:
Do not use direct-rendering content as source for
compositing operation until direct-rendering content has
reached framebuffer (tricky if direct-rendering client and
composite manager's rendering are in separate GPU
command buffers)
Or, do not notify X server of direct-rendered damage until
rendering has reached framebuffer; but this increases latency
The Synchronization Problem to be discussed later
OpenGL + Damage/Composite
(cont.)
Compositing overhead will be substantial for direct-rendering
clients, especially for applications with a
high framerate
Important that users can disable compositing when
they want:
Full OpenGL performance
Features that may not be possible with Composite:
Workstation Overlays
Quad-Buffered Stereo
OpenGL + Damage/Composite
(cont.)
All the building blocks are here for OpenGL
implementors to support direct-rendering OpenGL
with Damage and Composite
Demo of NVIDIA direct-rendering OpenGL with
Damage and Composite; will be available in nvr85series drivers
Conclusion
NVIDIA Driver has many features important to our
users
Overview of direct-rendering client/X driver
interaction
Data Propagation
Synchronization
Rendering and Scanout Interaction
Video Memory
ABI and API Compatibility
Direct-rendering OpenGL + Damage/Composite
Questions?
http://developer.nvidia.com/object/xdevconf_2006_presentations.html