Direct3D 11 Preview
Utah Code Camp, Fall 2008

Richard Thomson
DAZ 3D
www.daz3d.com
Direct3D 11

CTP in November 2008 DirectX SDK

Vista (and beyond) only, not on XP

Evolution of Direct3D 10

Compatible with D3D 10 cards
Evolution of Direct3D

Direct3D 9
 Stable, been around for a while
 Last version to be deployed on Win XP

Direct3D 10
 First Vista-only version
 Big change from D3D 9

Direct3D 10.1
 Incremental tweak to D3D 10
Direct3D 10/10.1/11 vs. 9
Enumeration factored out to DXGI
 Same DXGI used for 10, 10.1 and 11
 Divide render/texture states into chunks
 Chunks of state are immutable objects
 “Device state” consists of set of assigned state chunks
 Introduces new shader stages beyond vertex and pixel shaders
 Tighter API specification => no CAPS
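
The "immutable chunks of state" idea can be pictured with a small CPU-side model. This is only a sketch of the concept, not the Direct3D API: `BlendState`, `RasterizerState`, `DeviceState`, and the two `make...` helpers are hypothetical names invented for illustration. Each chunk is built once, never modified, and the device state is nothing more than the set of chunks currently assigned.

```cpp
#include <memory>

// Hypothetical stand-ins for state chunks; these are not the Direct3D types.
struct BlendState      { bool alphaBlendEnabled; };
struct RasterizerState { bool wireframe; bool cullBackFaces; };

// "Device state" is just the set of currently assigned chunks.
// Pointers to const: a chunk cannot be changed after creation.
struct DeviceState {
    std::shared_ptr<const BlendState>      blend;
    std::shared_ptr<const RasterizerState> rasterizer;
};

// Chunks are created once and can then be shared by many device states.
inline std::shared_ptr<const BlendState> makeBlendState(bool alphaBlend) {
    return std::make_shared<BlendState>(BlendState{alphaBlend});
}

inline std::shared_ptr<const RasterizerState> makeRasterizerState(bool wireframe,
                                                                  bool cullBackFaces) {
    return std::make_shared<RasterizerState>(RasterizerState{wireframe, cullBackFaces});
}
```

Switching from opaque to blended rendering in this model means assigning a different `BlendState` pointer; the rasterizer chunk is reused untouched.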

Direct3D 11 Focus

Scalability and performance

Improving the development experience

Extending the reach of the GPU
Direct3D 11 New Features
Tessellation
 Compute Shader
 Multithreading
 Shader Subroutines
 Improved Texture Compression
 Other Features

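Tessellation
Direct3D 10 pipeline plus three new stages for tessellation
[Pipeline diagram: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Rasterizer -> Pixel Shader -> Output Merger, with Stream Output fed from the Geometry Shader. The new stages are Hull Shader, Tessellator, and Domain Shader.]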
Hull Shader
HS input:
• patch control points
One Hull Shader invocation per patch
HS output (to Domain Shader):
• patch control points after basis conversion
HS output (to Tessellator):
• TessFactors (how much to tessellate)
• fixed tessellator mode declarations
Hull Shader Syntax
[patchsize(12)]
[patchconstantfunc(MyPatchConstantFunc)]
MyOutPoint main(uint Id : SV_ControlPointID,
                InputPatch<MyInPoint, 12> InPts)
{
    MyOutPoint result;
    …
    result = TransformControlPoint( InPts[Id] );
    return result;
}
Tessellator
Note: Tessellator does not see control points
Tessellator operates per patch
TS input (from Hull Shader):
• TessFactors (how much to tessellate)
• fixed tessellator mode declarations
TS output (to Domain Shader):
• U V {W} domain points
TS output (to primitive assembly):
• topology
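
To make the tessellator's two outputs concrete, here is a minimal CPU-side sketch under simplifying assumptions: a quad domain, a single integer tessellation factor, and uniform spacing. The names `DomainPoint`, `TessellatedPatch`, and `tessellateQuadDomain` are hypothetical. The hardware stage takes per-edge factors and mode declarations that this sketch ignores; what it does preserve is that only the factor goes in, never the control points.

```cpp
#include <cstdint>
#include <vector>

struct DomainPoint { float u, v; };

struct TessellatedPatch {
    std::vector<DomainPoint>   points;   // U V domain points handed to the domain shader
    std::vector<std::uint32_t> indices;  // triangle-list topology for primitive assembly
};

// Uniformly tessellate the unit quad domain into n x n cells (requires n >= 1).
// No control points are passed in: only the tessellation factor matters.
TessellatedPatch tessellateQuadDomain(std::uint32_t n) {
    TessellatedPatch out;
    const std::uint32_t stride = n + 1;  // domain points per row
    for (std::uint32_t j = 0; j <= n; ++j)
        for (std::uint32_t i = 0; i <= n; ++i)
            out.points.push_back({ float(i) / float(n), float(j) / float(n) });

    for (std::uint32_t j = 0; j < n; ++j)
        for (std::uint32_t i = 0; i < n; ++i) {
            const std::uint32_t a = j * stride + i;  // corner (i, j) of the cell
            const std::uint32_t b = a + 1;           // corner (i + 1, j)
            const std::uint32_t c = a + stride;      // corner (i, j + 1)
            const std::uint32_t d = c + 1;           // corner (i + 1, j + 1)
            out.indices.insert(out.indices.end(), { a, b, c, b, d, c });
        }
    return out;
}
```

A factor of 4 yields 25 domain points and 32 triangles; each point would then be handed to one domain shader invocation.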
Domain Shader
DS input (from Hull Shader):
• control points
• TessFactors
DS input (from Tessellator):
• U V {W} domain points
One Domain Shader invocation per point from Tessellator
DS output:
• one vertex
Domain Shader Syntax
void main( out MyDSOutput result,
           float2 myInputUV : SV_DomainPoint,
           MyDSInput DSInputs,
           OutputPatch<MyOutPoint, 12> ControlPts,
           MyTessFactors tessFactors )
{
    …
    result.Position =
        EvaluateSurfaceUV( ControlPts, myInputUV );
}
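
The slide leaves `EvaluateSurfaceUV` undefined. As one possible reading, the sketch below evaluates a bicubic Bezier patch on the CPU. It assumes 16 control points in a row-major 4x4 grid, which differs from the 12-point patch in the slide's syntax example, and `Vec3`, `bernstein3`, and `evaluateBezierPatch` are hypothetical names. Displacement mapping is left out.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Cubic Bernstein basis weights at parameter t in [0, 1].
inline std::array<float, 4> bernstein3(float t) {
    const float s = 1.0f - t;
    return { s * s * s, 3.0f * s * s * t, 3.0f * s * t * t, t * t * t };
}

// Evaluate a bicubic Bezier patch (16 control points, row-major 4x4) at (u, v).
// u selects the column direction, v the row direction.
Vec3 evaluateBezierPatch(const std::array<Vec3, 16>& cp, float u, float v) {
    const std::array<float, 4> bu = bernstein3(u);
    const std::array<float, 4> bv = bernstein3(v);
    Vec3 p{ 0.0f, 0.0f, 0.0f };
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col) {
            const float w = bv[row] * bu[col];   // tensor-product weight
            const Vec3& c = cp[row * 4 + col];
            p.x += w * c.x;
            p.y += w * c.y;
            p.z += w * c.z;
        }
    return p;
}
```

Called once per domain point, this produces one surface position per invocation, matching the "one vertex" output described for the domain shader.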
Single Pass Example
[Diagram: data flow through one pass]
vertex shader: Animate/skin control points
 in: patch control points
 out: transformed control points
hull shader: Transform basis, determine how much to tessellate
 out: control points in Bezier patch, Tess Factors
tessellator: Tessellate!
 out: U V {W} domain points
domain shader: Evaluate surface including displacement
 in: displacement map
[Figure: Sub-D Patch and Bezier Patch]
Current Authoring Pipeline
(Rocket Frog taken from Loop & Schaefer, “Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches”)
[Diagram stages: Sub-D Modeling, Polygon Mesh, Animation, Displacement Map, Generate LODs]
New Authoring Pipeline
(Rocket Frog taken from Loop & Schaefer, “Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches”)
[Diagram stages: Animation, Sub-D Modeling, Displacement Map, Optimally Tessellated Mesh, GPU]
Tessellation Summary

Helps us get closer to eliminating “pointy heads”
Scales visual quality across PC hardware configurations
Supports performance increases
 Coarse model = compression, faster I/O to GPU
 Rendering tailored to each end user’s hardware

Better cross-platform (Windows + Xbox 360) development experience
 Xbox 360 has a subset of D3D11’s tessellation
 Parity = ease of cross-platform development
 Extra features = innovation for Windows gaming

Render content as the artist created it!
More on Tessellation

GameFest 2008 Slides and Audio
 “Direct3D 11 Tessellation”
○ Kev Gee, Microsoft
 “Advanced Topics in GPU Tessellation”
○ Natasha Tatarchuk, AMD/ATI
 “Water-Tight, Textured, Displaced Subdivision Surface Tessellation Using Direct3D 11”
○ Ignacio Castano, NVIDIA
General Purpose GPU
Data Parallel Computing
 GPU performance continues to grow
 Many applications scale well to massive parallelism without tricky code changes
 Direct3D is the API for talking to GPU
 How do we expand Direct3D to GPGPU?

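Compute Shader
Direct3D 10 pipeline plus three new stages for tessellation plus Compute Shader
[Pipeline diagram: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Rasterizer -> Pixel Shader -> Output Merger, with Stream Output fed from the Geometry Shader. The Compute Shader sits beside the pipeline and is connected to it through a Data Structure.]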
Integrated with Direct3D
 Fully supports all Direct3D resources
 Targets graphics/media data types
 Evolution of DirectX HLSL
 Graphics pipeline updated to emit general data structures…
 …which can then be manipulated by compute shader…
 …and then rendered by Direct3D again

Target Applications

Image/Post processing:
 Image Reduction
 Image Histogram
 Image Convolution
 Image FFT
A-Buffer/OIT
Ray-tracing, radiosity, etc.
Physics
AI

Computing a Histogram
Histogram()
{
    shared int Histograms[16][256];               // array of 16 histograms
    float3 vPixel = load( sampler, sv_ThreadID );
    float fLuminance = dot( vPixel, LUM_VECTOR );
    int iBin = fLuminance*255.0f;                 // compute bin to increment
    int iHist = sv_ThreadIDInGroup & 15;          // use thread index to pick one of the 16
    Histograms[iHist][iBin] += 1;                 // update bin
    // enable all threads in group to complete
    SynchronizeThreadGroup;

    // Computing a Histogram 2
    // Write register histograms out to memory:
    iBin = sv_ThreadIDInGroup.x;
    if (iBin < 256)                               // one thread per bin does the write
    {
        for (iHist = 0; iHist < 16; iHist++)
        {
            int2 destAddr = int2(iHist, iBin);
            OutputResource.add(destAddr,
                Histograms[iHist][iBin]);         // atomic
        }
    }
}
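
The structure of the histogram kernel (several partial histograms to reduce contention, a barrier, then a merge) can be followed on the CPU with a sequential emulation of one thread group. This is a sketch under stated assumptions, not compute shader code: the loop index stands in for the thread index, the slide's `LUM_VECTOR` is not given so Rec. 709 luma weights are assumed, and the 16 partial histograms are folded into one result instead of being written to separate output addresses. `Pixel`, `luminance`, and `histogramGroup` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pixel { float r, g, b; };     // linear RGB, nominally in [0, 1]

constexpr int kSubHistograms = 16;   // matches the 16 shared histograms on the slide
constexpr int kBins = 256;

// Assumed luminance weights (Rec. 709); the slide does not define LUM_VECTOR.
inline float luminance(const Pixel& p) {
    return 0.2126f * p.r + 0.7152f * p.g + 0.0722f * p.b;
}

// Sequential emulation of one thread group; 'i' plays the role of the thread index.
std::array<std::uint32_t, kBins> histogramGroup(const std::vector<Pixel>& pixels) {
    // Zero-initialized partial histograms, standing in for group-shared memory.
    std::array<std::array<std::uint32_t, kBins>, kSubHistograms> partial{};
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        float lum = luminance(pixels[i]);
        if (lum < 0.0f) lum = 0.0f;                               // clamp so the bin stays in range
        if (lum > 1.0f) lum = 1.0f;
        const int bin  = static_cast<int>(lum * 255.0f);
        const int hist = static_cast<int>(i % kSubHistograms);   // spread threads over the copies
        partial[hist][bin] += 1;
    }
    // Everything below corresponds to the code after the group barrier:
    // fold the partial histograms into a single result.
    std::array<std::uint32_t, kBins> total{};
    for (int h = 0; h < kSubHistograms; ++h)
        for (int b = 0; b < kBins; ++b)
            total[b] += partial[h][b];
    return total;
}
```

On a GPU the increments and the final adds run concurrently, which is why the slide marks the output write as atomic; the sequential version sidesteps that entirely.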
Compute Shader Summary
Enables much more general algorithms
 Transparent parallel processing model
 Full cross-vendor support
 Broadest possible installed base


GameFest 2008:
 “Direct3D 11 Compute Shader – More Generality for Advanced Techniques”
○ Chas Boyd, Microsoft
Multithreading

Enables distribution across threads of
 Application code
 Runtime
 Driver

Device: free threaded resource creation
Immediate Context: your single primary device for state & draws
Deferred Contexts: your per-thread devices for state & draws
Display Lists: Recorded sequence of graphics commands
Requires a driver update
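
The split between deferred and immediate contexts amounts to record now, submit later. The sketch below is a conceptual model in plain C++, not the Direct3D interfaces: `CommandList`, `DeferredContext`, and `ImmediateContext` are hypothetical classes, and commands are represented as strings. In the released Direct3D 11 API the corresponding calls are `ID3D11Device::CreateDeferredContext`, `ID3D11DeviceContext::FinishCommandList`, and `ID3D11DeviceContext::ExecuteCommandList`.

```cpp
#include <string>
#include <vector>

// A recorded sequence of graphics commands (the slide's "display list").
using CommandList = std::vector<std::string>;

// Per-thread recorder: state changes and draws are captured, not executed.
class DeferredContext {
public:
    void setState(const std::string& name) { recorded_.push_back("state:" + name); }
    void draw(int vertexCount) { recorded_.push_back("draw:" + std::to_string(vertexCount)); }

    // Hand the recording over and leave this context empty for reuse.
    CommandList finish() {
        CommandList out;
        out.swap(recorded_);
        return out;
    }

private:
    CommandList recorded_;
};

// The single primary context: the only place commands are actually submitted.
class ImmediateContext {
public:
    void execute(const CommandList& list) {
        submitted_.insert(submitted_.end(), list.begin(), list.end());
    }
    const std::vector<std::string>& submitted() const { return submitted_; }

private:
    std::vector<std::string> submitted_;
};
```

Because each worker thread owns its own `DeferredContext`, recording needs no locking in this model; only the final `execute` calls are serialized on the immediate context, in whatever order the application chooses.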
Shader Subroutines

Details
 Calls must be fast
 Binding applies to all primitives in a Draw call
 Binding operation must be fast
 Need parameter passing mechanism
 Need access to textures, samplers, etc.

Advantages
 Reduce register usage in Über-shaders
○ Not worst case of all if statements
 Allows specialization of subroutines
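
A rough CPU analogy for the trade-off described above: an über-shader decides which path to take with `if` statements inside one large function, while a subroutine is bound once and then called for every primitive in the draw. The sketch uses C++ virtual functions as a stand-in for shader subroutines; `Light`, `DiffuseLight`, `AmbientLight`, `shadeUber`, and `drawWithSubroutine` are hypothetical names, and the analogy does not model GPU register allocation.

```cpp
// Hypothetical lighting "subroutine" interface, for illustration only.
struct Light {
    virtual ~Light() = default;
    virtual float shade(float nDotL) const = 0;
};

struct DiffuseLight : Light {
    float shade(float nDotL) const override { return nDotL > 0.0f ? nDotL : 0.0f; }
};

struct AmbientLight : Light {
    float shade(float) const override { return 0.25f; }
};

// Uber-shader style: every path lives in one function behind runtime branches.
float shadeUber(int lightType, float nDotL) {
    if (lightType == 0) return nDotL > 0.0f ? nDotL : 0.0f;  // diffuse
    if (lightType == 1) return 0.25f;                        // ambient
    return 0.0f;
}

// Subroutine style: the implementation is bound once, then applied to every
// element of the "draw call" without re-testing which path to take.
float drawWithSubroutine(const Light& bound, const float* nDotL, int count) {
    float sum = 0.0f;
    for (int i = 0; i < count; ++i)
        sum += bound.shade(nDotL[i]);
    return sum;
}
```

Adding a new light type in the subroutine style means writing one more small class; the über-shader style means growing the branch chain that every invocation carries.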
Improved Texture Compression

Why?
Existing block palette interpolations too simple
 Results often rife with blocking artifacts
 No high dynamic range (HDR) support

New Texture Formats

BC6 (aka BC6H)
 High dynamic range
 6:1 compression (16 bpc RGB)
 Targeting high (not lossless) visual quality

BC7
 LDR with alpha
 3:1 compression for RGB or 4:1 for RGBA
 High visual quality
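
The quoted ratios follow from the block size: BC6H and BC7 each store a 4x4 texel block in 128 bits, i.e. 8 bits per texel. The short check below works through that arithmetic; the constant and function names are made up for illustration, and the uncompressed sizes assume 16 bits per channel for HDR RGB and 8 bits per channel for LDR data.

```cpp
// BC6H and BC7 both store each 4x4 texel block in 128 bits.
constexpr int kBlockBits   = 128;
constexpr int kBlockTexels = 4 * 4;
constexpr int kCompressedBitsPerTexel = kBlockBits / kBlockTexels;  // 8 bits per texel

// Ratio of uncompressed to compressed size for a given source texel layout.
constexpr int compressionRatio(int channels, int bitsPerChannel) {
    return (channels * bitsPerChannel) / kCompressedBitsPerTexel;
}

static_assert(compressionRatio(3, 16) == 6, "BC6H: 16 bpc RGB -> 6:1");
static_assert(compressionRatio(3, 8)  == 3, "BC7: 8 bpc RGB -> 3:1");
static_assert(compressionRatio(4, 8)  == 4, "BC7: 8 bpc RGBA -> 4:1");
```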
Compression of New Formats

Block compression (unchanged)
 Each block independent
 Fixed compression ratio

Multiple block types (new)
 Tailored to different types of content
 Smooth gradients vs. noisy normal maps
 Varied alpha vs. constant alpha

Decompression results must be bit-accurate with spec
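Comparison Results 1
[Image comparison: Orig vs. BC3, Orig vs. BC7, with Abs Error]
Comparison Results 2
[Image comparison: Orig vs. BC3, Orig vs. BC7, with Abs Error]
Comparison Results 3
[Image comparison: HDR Original at given exposure vs. BC6 at given exposure, with Abs Error]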
Other Features
 Addressable Stream Out
 Draw Indirect
 Pull-model attribute eval
 Improved Gather4
 Min-LOD texture clamps
 16K texture limits
 Required 8-bit subtexel, submip filtering precision
 Conservative oDepth
 2 GB Resources
 Geometry shader instance programming model
 Optional double support
 Read-only depth or stencil views

Thanks
Allison Klein
Senior Lead Program Manager
Direct3D
Microsoft
Chas. Boyd
Architect
Windows Desktop & Gaming Technology
Microsoft
Thank you to our Sponsors!