Transcript Title Slide

Efficient High-Level Shader Development
Natalya Tatarchuk
3D Application Research Group
ATI Technologies, Inc.
August 2003
Overview
• Writing optimal HLSL code
– Compiling issues
– Optimization strategies
– Code structure pointers
• HLSL Shader Examples
– Multi-layer car paint effect
– Translucent Iridescent Shader
– Überlight Shader
Why use HLSL?
• Faster, easier effect development
– Instant readability of your shader code
– Better code re-use and maintainability
– Optimization
  • Added benefit of HLSL compiler optimizations
  • Still helps to know what’s under the hood
• Industry standard which will run on cards from
any vendor
– Current and future industry direction
• Increase your ability to iterate on a given shader
design, resulting in better looking games
• Conveniently manage shader permutations
Compile Targets
• Legal HLSL is still independent of compile
target chosen
• But having an HLSL shader doesn’t mean it
will always run on any hardware!
• Currently supported compile targets:
– vs_1_1, vs_2_0, vs_2_sw
– ps_1_1, ps_1_2, ps_1_3, ps_1_4, ps_2_0, ps_2_sw
• Compilation is vendor-independent and is
done by a D3DX component that Microsoft can
update independent of the runtime release
schedule
Compilation Failure
• The obvious: program errors (bad syntax, etc)
• Compile target specific reasons – your
shader is too complex for the selected target
– Not enough resources in the selected target
  • Uses too many registers (temporaries, for example)
  • Too many resulting asm instructions for the compile
  target
– Lack of capability in the target
  • Such as trying to sample a texture in vs_1_1
  • Using dynamic branching when unsupported in the
  target
  • Sampling texture too many times for the target
  (Example: more than 6 for ps_1_4)
• Compiler provides useful messages
Use Disassembly for Hints
• Very helpful for understanding relationship between
compile targets and code generation
• Disassembly output provides valuable hints when
“compiling down” to an older compile target
• If successfully compiled for a more recent target
(eg. ps_2_0), look at the disassembly output for
hints when failing to compile to an older target (eg.
ps_1_4)
– Check out instruction count for ALU and tex ops
– Figure out how HLSL instructions get mapped to assembly
Getting Disassembly Output for
Your Shaders
• Directly use FXC
– Compile for any target desired
– Compile both individual shader files and full
effects
– Various input arguments
  • Allow turning shader optimizations on / off
  • Specify different entry points
  • Enable / disable generating debug information
Easier Path to Disassembly
• Use RenderMonkey while developing shaders
– See your changes in real-time
• Disassembly output is updated every time a
shader is compiled
– Displays count for ALU and texture ops, as
well as the limits for the selected target
– Can save resulting assembly code into text file
Optimizing HLSL Shaders
• Don’t forget you are running on a
vector processor
• Do your computations at the most
efficient frequency
– Don’t do something per-pixel that you can do
per-vertex
– Don’t perform computation in a shader that you
can precompute in the app
• Use HLSL intrinsic functions
– Helps hardware to optimize your shaders
– Know your intrinsics and how they map to asm,
especially asm modifiers
HLSL Syntax Not Limited
• The HLSL code you write is not limited by the compile
target you choose
• You can always use loops, subroutines, if-else
statements etc
• If not natively supported in the selected compile target,
the compiler will still try to generate code:
– Loops will be unrolled
– Subroutines will be inlined
– If – else statements will execute both branches, selecting
appropriate output as the result
• Code generation is dependent upon compile target
• Use appropriate data types to improve instruction count
– Store your data in a vector when needed
– However, using appropriate data types helps compiler do
better job at optimizing your code
Using If Statement in HLSL
• Can have large performance
implications
– Lack of branching support in most asm
models
– Both sides of an ‘if’ statement will be
executed
– The output is chosen based on which
side of the ‘if’ would have been taken
• Optimization is different than in the
CPU programming world
Example of Using If in Vs_1_1
if ( Threshold > 0.0 )
Out.Position = Value1;
else
Out.Position = Value2;
generates following assembly output:
// calculate lerp value based on Value > 0
mov r1.w, c2.x
slt r0.w, c3.x, r1.w
// lerp between Value1 and Value2
mov r7, -c1
add r2, r7, c0
mad oPos, r0.w, r2, c1
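The assembly above never branches: `slt` produces a 0/1 mask and `mad` blends the two candidate values with it. The following is a minimal C sketch of that same arithmetic, reduced to a scalar instead of a four-component register. The names `select_flattened` and `demo_select_flattened` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Scalar stand-in for the vs_1_1 sequence: slt builds a 0/1 mask,
   mad blends the two values. Hypothetical helper, not part of any API. */
float select_flattened(float threshold, float value1, float value2)
{
    /* slt r0.w, c3.x, r1.w   ->  mask = (0 < threshold) ? 1 : 0 */
    float mask = (0.0f < threshold) ? 1.0f : 0.0f;
    /* mov r7, -c1 ; add r2, r7, c0   ->  delta = value1 - value2 */
    float delta = value1 - value2;
    /* mad oPos, r0.w, r2, c1   ->  mask * delta + value2 */
    return mask * delta + value2;
}

/* Usage: both inputs are always evaluated; the mask only picks one. */
void demo_select_flattened(void)
{
    assert(select_flattened( 1.0f, 3.0f, 5.0f) == 3.0f); /* 'if' side   */
    assert(select_flattened(-1.0f, 3.0f, 5.0f) == 5.0f); /* 'else' side */
}
```

Because both sides are computed before the blend, the cost of an `if` on such targets is roughly the cost of both branches plus the select.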
Example of Function Inlining
// Bias and double a value to take it from 0..1 range to -1..1 range
float4 bx2(float4 x)
{
   return 2.0f * x - 1.0f;
}
float4 main( float4 tc0 : TEXCOORD0,
             float4 tc1 : TEXCOORD1,
             float4 tc2 : TEXCOORD2,
             float4 tc3 : TEXCOORD3) : COLOR
{
   // Sample noise map three times with different
   // texture coordinates
   float4 noise0 = tex2D(fire_distortion, tc1);
   float4 noise1 = tex2D(fire_distortion, tc2);
   float4 noise2 = tex2D(fire_distortion, tc3);
   // Weighted sum of signed noise
   float4 noiseSum = bx2(noise0) * distortion_amount0 +
                     bx2(noise1) * distortion_amount1 +
                     bx2(noise2) * distortion_amount2;
   // Perturb base coordinates in direction of noiseSum as function of height (y)
   float4 perturbedBaseCoords = tc0 + noiseSum * (tc0.y * height_attenuation.x +
                                                  height_attenuation.y);
   // Sample base and opacity maps with perturbed coordinates
   float4 base    = tex2D(fire_base, perturbedBaseCoords);
   float4 opacity = tex2D(fire_opacity, perturbedBaseCoords);
   return base * opacity;
}
Code Permutations Via Compilation
With a static const flag, the value is known at compile time and the unused branch is removed:
static const bool bAnimate = false;
VS_OUTPUT vs_main( float4 Pos: POSITION,
                   float2 Tex: TEXCOORD0 )
{
   VS_OUTPUT Out = (VS_OUTPUT) 0;
   Out.Pos = mul( view_proj_matrix, Pos );
   if ( bAnimate )
   {
      Out.Tex.x = Tex.x + time / 2;
      Out.Tex.y = Tex.y - time / 2;
   }
   else
      Out.Tex = Tex;
   return Out;
}
vs_1_1
dcl_position v0
dcl_texcoord v1
mul r0, v0.y, c1
mad r0, c0, v0.x, r0
mad r0, c2, v0.z, r0
mad oPos, c3, v0.w, r0
mov oT0.xy, v1
5 instructions
Without static, the flag is a shader constant that the application can set, so code for both paths is generated:
const bool bAnimate = false;
VS_OUTPUT vs_main( float4 Pos: POSITION,
                   float2 Tex: TEXCOORD0 )
{
   VS_OUTPUT Out = (VS_OUTPUT) 0;
   Out.Pos = mul( view_proj_matrix, Pos );
   if ( bAnimate )
   {
      Out.Tex.x = Tex.x + time / 2;
      Out.Tex.y = Tex.y - time / 2;
   }
   else
      Out.Tex = Tex;
   return Out;
}
vs_1_1
def c6, 0.5, 0, 0, 0
dcl_position v0
dcl_texcoord v1
mul r0, v0.y, c1
mad r0, c0, v0.x, r0
mov r1.w, c4.x
mul r1.x, r1.w, c6.x
mad r0, c2, v0.z, r0
mov r1.y, -r1.x
mad oPos, c3, v0.w, r0
mad oT0.xy, c5.x, r1, v1
8 instructions
Scalar and Vector Data Types
• Scalar data types are not all natively supported
in hardware
– i.e. integers are emulated on float hardware
• Not all targets have native half and none
currently have double
• Can apply swizzles to vector types
float2 vec = pos.xy
– But!
• Not all targets have fully flexible swizzles
• Acquaint yourself with the swizzles
native to the relevant compile targets
(particularly ps_2_0 and lower)
Integer Data Type
• Added to make relative addressing
more efficient
• Using floats for addressing purposes
without defined truncation rules can
result in incorrect access to arrays.
• All inputs used as ints should be
defined as ints in your shader
Example of Integer Data Type Usage
• Matrix palette indices for skinning
– Declaring variable as an int is a ‘free’
operation => no truncation occurs
– Using a float and casting it to an int or
using directly => truncation will happen
Out.Position = mul( inPos, World[Index]);
// Index declared as float
frc r0.w, r1.w
add r2.w, -r0.w, r1.w
mul r9.w, r2.w, c61.x
mova a0.x, r9.w
m4x4 oPos, v0, c0[a0.x]
// Index declared as int
mul r0.w, c60.x, r1.w
mova a0.x, r0.w
m4x4 oPos, v0, c0[a0.x]
Code generated with float index vs integer index
Real-World Shader Examples
• Will present several case studies of
developing shaders used in ATI’s
demos
– Multi-tone car paint effect
– Translucent iridescent effect
– Classic überlight example
• Examples are presented as
RenderMonkey™ workspaces
– Distributed publicly with version 1.0 release
Multi-Tone Car Paint
Multi-Tone Car Paint Effect
• Multi-tone base color layer
• Microflake layer simulation
• Clear gloss coat
• Dynamically Blurred Reflections
Car Paint Layers Build Up
Multi-Tone Base Color
Microflake Layer
Clear gloss coat
Final Color Composite
Multi-Tone Base Paint Layer
• View-dependent lerping
between three paint
colors
• Normal from appearance
preserving simplification
process, N
• Uses subtractive tone to control overall color
accumulation
Multi-Tone Base Coat Vertex Shader
VS_OUTPUT main( float4 Pos     : POSITION,
                float3 Normal  : NORMAL,
                float2 Tex     : TEXCOORD0,
                float3 Tangent : TANGENT,
                float3 Binormal: BINORMAL )
{
   VS_OUTPUT Out = (VS_OUTPUT) 0;
   // Propagate transformed position out:
   Out.Pos = mul( view_proj_matrix, Pos );
   // Compute view vector:
   Out.View = normalize( mul(inv_view_matrix,
                             float4( 0, 0, 0, 1)) - Pos );
   // Propagate texture coordinates:
   Out.Tex = Tex;
   // Propagate tangent, binormal, and normal vectors to pixel shader:
   Out.Normal   = Normal;
   Out.Tangent  = Tangent;
   Out.Binormal = Binormal;
   return Out;
}
Multi-Tone Base Coat Pixel Shader
float4 main( float4 Diff: COLOR0,
             float2 Tex: TEXCOORD0,
             float3 Tangent: TEXCOORD1, float3 Binormal: TEXCOORD2,
             float3 Normal: TEXCOORD3, float3 View: TEXCOORD4 ) : COLOR
{
   // Fetch normal from a normal map and scale and bias it
   // to move into [-1; 1]:
   float3 vNormal = tex2D( normalMap, Tex );
   vNormal = 2 * vNormal - 1.0;
   // Normalize the view vector to ensure higher quality results:
   float3 vView = normalize( View );
   // Compute Nw • V using world-space normal vector:
   float3x3 mTangentToWorld = transpose( float3x3( Tangent,
                                                   Binormal, Normal ));
   float3 vNormalWorld = normalize( mul(mTangentToWorld,vNormal));
   float fNdotV = saturate( dot( vNormalWorld, vView ) );
   // Compute the result color by lerping three input tones
   // using computed fresnel term:
   float fNdotVSq = fNdotV * fNdotV;
   float4 paintColor = fNdotV * paintColor0 +
                       fNdotVSq * paintColorMid +
                       fNdotVSq * fNdotVSq * paintColor2;
   return float4( paintColor.rgb, 1.0 );
}
Microflake Layer
Microflake Deposit Layer
• Simulating light interaction resulting from metallic
flakes suspended in the enamel coat of the paint
• Uses high frequency normalized vector noise map
(Nn) which is repeated across the surface of the
car
Computing Microflake Layer
Normals
• Start out by using normal vector
fetched from the normal map, N
• Using the high frequency noise map, compute
perturbed normal Np
• Simulate two layers of microflake deposits by
computing perturbed normals Np1 and Np2
$$N_{p1} = \frac{a N_n + b N}{\lVert a N_n + b N \rVert}, \quad \text{where } a \ll b$$
$$N_{p2} = \frac{c N_n + d N}{\lVert c N_n + d N \rVert}, \quad \text{where } c = d$$
Microflake Layer Pixel Shader
float4 main(float4 Diff: COLOR0,
            float2 Tex : TEXCOORD0,
            float3 Tangent: TEXCOORD1, float3 Binormal: TEXCOORD2,
            float3 Normal: TEXCOORD3, float3 View: TEXCOORD4,
            float3 SparkleTex : TEXCOORD5 ) : COLOR
{
   // … fetch and signed scale the normal fetched from the normal map
   // Fetch initial perturbed normal vector from the noise map:
   float3 vFlakesNormal = 2 * tex2D( microflakeNMap, SparkleTex ) - 1;
   // Compute normal vectors for both microflake layers:
   float3 vNp1 = microflakePerturbationA * vFlakesNormal +
                 normalPerturbation * vNormal;
   float3 vNp2 = microflakePerturbation * ( vFlakesNormal + vNormal );
   // Compute dot products of the normalized view vector with
   // the two microflake layer normals:
   float3 vView = normalize( View );
   float3x3 mTangentToWorld = transpose( float3x3( Tangent, Binormal,
                                                   Normal ));
   float3 vNp1World = normalize( mul( mTangentToWorld, vNp1) );
   float fFresnel1 = saturate( dot( vNp1World, vView ));
   float3 vNp2World = normalize( mul( mTangentToWorld, vNp2 ));
   float fFresnel2 = saturate( dot( vNp2World, vView ));
   // Compose the microflake layer color:
   float fFresnel1Sq = fFresnel1 * fFresnel1;
   float4 paintColor = fFresnel1 * flakeColor + fFresnel1Sq * flakeColor +
                       fFresnel1Sq * fFresnel1Sq * flakeColor +
                       pow( fFresnel2, 16 ) * flakeColor;
   return float4( paintColor.rgb, 1.0 );
}
Clear Gloss Coat
Dynamically Blurred Reflections
Blurred Reflections
Dynamic Blurring of
Environment Map Reflections
• A gloss map can be supplied to specify the
regions where reflections can be blurred
• Use bias when sampling the environment
map to vary blurriness of the resulting
reflections
• Use texCUBEbias to access the cubic
environment map
• For rough specular, the bias is high, causing
a blurring effect
• Can also convert color fetched from
environment map to luminance in rough trim
areas
Clear Gloss Coat Pixel Shader
float4 ps_main( ... /* same inputs as in the previous shader */ )
{
   // ... use normal in world space (see Multi-tone pixel shader)
   // Compute the reflection vector to fetch from the environment map:
   float fFresnel = saturate(dot( vNormalWorld, vView));
   float3 vReflection = 2 * vNormalWorld * fFresnel - vView;
   // Shader parameter is used to dynamically blur the reflections by
   // biasing the texture fetch from the environment map:
   float fEnvBias = glossLevel;
   // Sample environment map using this reflection vector and bias:
   float4 envMap = texCUBEbias( showroomMap, float4( vReflection,
                                                     fEnvBias ) );
   // Premultiply by alpha channel of the environment map to avoid
   // clamping highlights:
   envMap.rgb = envMap.rgb * envMap.a;
   // Brighten the environment map sampling result:
   envMap.rgb *= brightnessFactor;
   // Combine result of environment map reflection with the paint
   // color:
   float fEnvContribution = 1.0 - 0.5 * fFresnel;
   return float4( envMap.rgb * fEnvContribution, 1.0 );
}
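The reflection step can be checked on the CPU with a few lines of C. This is a minimal sketch that assumes unit-length normal and view vectors. The type `vec3` and the functions `reflect_view`, `env_contribution` and `demo_reflect_view` are hypothetical names introduced for illustration, and the texture fetch itself is not modeled.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;  /* hypothetical helper type */

static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static float saturate(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

/* R = 2 * N * saturate(N.V) - V, matching the shader's vReflection. */
vec3 reflect_view(vec3 n, vec3 v)
{
    float ndotv = saturate(dot3(n, v));
    vec3 r = { 2.0f * n.x * ndotv - v.x,
               2.0f * n.y * ndotv - v.y,
               2.0f * n.z * ndotv - v.z };
    return r;
}

/* Weight of the environment map in the shader: 1 - 0.5 * (N.V). */
float env_contribution(float ndotv) { return 1.0f - 0.5f * ndotv; }

void demo_reflect_view(void)
{
    vec3 n = { 0.0f, 0.0f, 1.0f };
    vec3 v = { 0.0f, 0.0f, 1.0f };  /* viewer looks straight along the normal */
    vec3 r = reflect_view(n, v);
    assert(r.x == 0.0f && r.y == 0.0f && r.z == 1.0f);
    assert(env_contribution(1.0f) == 0.5f);  /* weakest reflection head-on */
}
```

The weight grows from 0.5 when the surface faces the viewer to 1.0 at grazing angles, so reflections are strongest near the silhouette of the car body.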
Compositing Multi-Tone Base Layer
and Microflake Layer
• Base color and flake effect are derived
from Np1 and Np2 using the following
polynomial:
$$\underbrace{color_0 (N_{p1} \cdot V) + color_1 (N_{p1} \cdot V)^2 + color_2 (N_{p1} \cdot V)^4}_{\text{Base Color}} + \underbrace{color_3 (N_{p2} \cdot V)^{16}}_{\text{Flake}}$$
Compositing Final Look
{
   ...
   // Compute final paint color: combines all layers of paint as well
   // as two layers of microflakes:
   float fFresnel1Sq = fFresnel1 * fFresnel1;
   float4 paintColor = fFresnel1 * paintColor0 +
                       fFresnel1Sq * paintColorMid +
                       fFresnel1Sq * fFresnel1Sq * paintColor2 +
                       pow( fFresnel2, 16 ) * flakeLayerColor;
   // Combine result of environment map reflection with the paint
   // color:
   float fEnvContribution = 1.0 - 0.5 * fNdotV;
   // Assemble the final look:
   float4 finalColor;
   finalColor.a   = 1.0;
   finalColor.rgb = envMap * fEnvContribution + paintColor;
   return finalColor;
}
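To see how the four terms of the paint polynomial weigh against each other, here is a small C sketch that evaluates one color channel on the CPU. All names (`pow16`, `paint_channel`, `demo_paint_channel` and the parameters) are hypothetical stand-ins for the shader constants. The sixteenth power is computed by repeated squaring purely for illustration; the shader calls `pow`.

```c
#include <assert.h>

/* x^16 via four squarings (illustrative choice; the shader uses pow()). */
static float pow16(float x)
{
    float x2 = x * x;
    float x4 = x2 * x2;
    float x8 = x4 * x4;
    return x8 * x8;
}

/* One channel of: c0*f1 + cMid*f1^2 + c2*f1^4 + flake*f2^16 */
float paint_channel(float f1, float f2,
                    float color0, float colorMid, float color2, float flake)
{
    float f1sq = f1 * f1;
    return f1 * color0 +
           f1sq * colorMid +
           f1sq * f1sq * color2 +
           pow16(f2) * flake;
}

void demo_paint_channel(void)
{
    /* Surface facing the viewer: every layer contributes fully. */
    assert(paint_channel(1.0f, 1.0f, 0.25f, 0.25f, 0.25f, 0.25f) == 1.0f);
    /* At 0.5 the flake term has already dropped to 1/65536. */
    assert(paint_channel(0.5f, 0.5f, 0.0f, 0.0f, 0.0f, 1.0f) == 1.0f / 65536.0f);
}
```

The steep falloff of the sixteenth power is what confines the flake sparkle to surfaces that face the viewer almost directly.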
Original Hand-Tuned Assembly
ps.2.0
def c0, 0.0, 0.5, 1.0, 2.0
def c1, 0.0, 0.0, 1.0, 0.0
dcl_2d s0
dcl_2d s1
dcl_cube s2
dcl_2d s3
dcl t0
dcl t1
dcl t2
dcl t3
dcl t4
dcl t5
texld r0, t0, s1
texld r8, t5, s3
mad r3, r8, c0.w, -c0.z
mad r6, r3, c4.r, r0
mad r7, r3, c4.g, r0
dp3 r4.a, t4, t4
rsq r4.a, r4.a
mul r4, t4, r4.a
mul r2.rgb, r0.x, t1
mad r2.rgb, r0.y, t2, r2
mad r2.rgb, r0.z, t3, r2
dp3 r2.a, r2, r2
rsq r2.a, r2.a
mul r2.rgb, r2, r2.a
dp3_sat r2.a, r2, r4
mul r3, r2, c0.w
mad r1.rgb, r2.a, r3, -r4
mov r1.a, c10.a
texldb r0, r1, s2
mul r10.rgb, r6.x, t1
mad r10.rgb, r6.y, t2, r10
mad r10.rgb, r6.z, t3, r10
dp3 r10.a, r10, r10
rsq r10.a, r10.a
mul r10.rgb, r10, r10.a
dp3_sat r6.a, r10, r4
mul r10.rgb, r7.x, t1
mad r10.rgb, r7.y, t2, r2
mad r10.rgb, r7.z, t3, r2
dp3 r10.a, r10, r10
rsq r10.a, r10.a
mul r10.rgb, r10, r10.a
dp3_sat r7.a, r10, r4
mul r0.rgb, r0, r0.a
mul r0.rgb, r0, c2.r
mov r4.a, r6.a
mul r4.rgb, r4.a, c5
mul r4.a, r4.a, r4.a
mad r4.rgb, r4.a, c6, r4
mul r4.a, r4.a, r4.a
mad r4.rgb, r4.a, c7, r4
pow r4.a, r7.a, c4.b
mad r4.rgb, r4.a, c8, r4
mad r1.a, r2.a, c2.z, c2.w
mad r6.rgb, r0, r1.a, r4
mov oC0, r6
40 ALU ops
3 Tex Fetches
43 Total
Car Paint Shader HLSL Compiler
Disassembly Output
ps_2_0
def c9, 0.5, 1, 0, 0
def c10, 2, -1, 16, 1
dcl t0.xy
dcl t1.xyz
dcl t2.xyz
dcl t3.xyz
dcl t4.xyz
dcl t5.xy
dcl_2d s0
dcl_2d s1
dcl_cube s2
texld r0, t0, s1
mad r5.xyz, c10.x, r0, c10.y
mul r0.xyz, r5.y, t2
dp3 r1.x, t4, t4
mad r0.xyz, t1, r5.x, r0
rsq r0.w, r1.x
mad r1.xyz, t3, r5.z, r0
mul r3.xyz, r0.w, t4
nrm r0.xyz, r1
dp3_sat r6.x, r0, r3
mul r0.xyz, r0, r6.x
add r0.xyz, r0, r0
mad r0.xyz, t4, -r0.w, r0
mov r0.w, c8.x
texld r1, t5, s0
texldb r0, r0, s2
mad r2.xyz, c10.x, r1, c10.y
mul r1.xyz, r5, c2.x
mad r1.xyz, c3.x, r2, r1
mul r4.xyz, r1.y, t2
mad r4.xyz, t1, r1.x, r4
add r2.xyz, r5, r2
mad r4.xyz, t3, r1.z, r4
nrm r1.xyz, r4
mul r2.xyz, r2, c7.x
dp3_sat r5.x, r1, r3
mul r1.xyz, r2.y, t2
mul r1.w, r5.x, r5.x
mad r4.xyz, t1, r2.x, r1
mul r1.xyz, r1.w, c6
mad r4.xyz, t3, r2.z, r4
mul r1.w, r1.w, r1.w
nrm r2.xyz, r4
mad r1.xyz, r5.x, c4, r1
dp3_sat r2.x, r2, r3
mad r1.xyz, r1.w, c5, r1
pow r1.w, r2.x, c10.z
mad r1.xyz, r1.w, c1, r1
mul r0.xyz, r0.w, r0
mad r0.w, r6.x, -c9.x, c9.y
mul r0.xyz, r0, c0.x
mad r0.xyz, r0, r0.w, r1
mov r0.w, c10.w
mov oC0, r0
38 ALU ops
3 Tex Fetches
41 Total!
Full Result of Multi-Layer Paint
Translucent Iridescent Shader:
Butterfly Wings
Translucent Iridescent
Shader: Butterfly Wings
• Simulates translucency of delicate butterfly
wings
– Wings glow from scattered reflected light
– Similar to the effect of softly backlit rice paper
• Displays subtle iridescent lighting
– Similar to rainbow pattern on the surface of soap bubbles
– Caused by the interference of light waves resulting from
multiple reflections of light off of surfaces of varying
thickness
• Combines gloss, opacity and normal maps
for a multi-layered final look
– Gloss map contributes to satiny highlights
– Opacity map allows portions of wings to be transparent
– Normal map is used to give wings a bump-mapped look
RenderMonkey Butterfly
Wings Shader Example
• Parameters that contribute to the
translucency and iridescence look:
– Light position and scene ambient color
– Translucency coefficient
– Gloss scale and bias
– Scale and bias for speed of iridescence change
• Workspace:
Iridescent Butterfly.rfx
Translucent Iridescent
Shader: Vertex Shader
..
// Propagate input texture coordinates:
Out.Tex = Tex;
// Define tangent space matrix:
float3x3 mTangentSpace;
mTangentSpace[0] = Tangent;
mTangentSpace[1] = Binormal;
mTangentSpace[2] = Normal;
// Compute the light vector (object space):
float3 vLight = normalize( mul( inv_view_matrix, lightPos ) - Pos );
// Output light vector in tangent space:
Out.Light = mul( mTangentSpace, vLight );
// Compute the view vector (object space):
float3 vView = normalize( mul( inv_view_matrix, float4(0,0,0,1)) - Pos );
// Output view vector in tangent space:
Out.View = mul( mTangentSpace, vView );
// Compute the half angle vector H = V + L (in tangent space):
Out.Half = mul( mTangentSpace, normalize( vView + vLight ) );
return Out;
Translucent Iridescent
Shader: Loading Information
float3 vNormal, baseColor;
float fGloss, fTranslucency;
// Load normal from a normal map and gloss value from gloss map
// (combined in one texture map):
float4( vNormal, fGloss ) = tex2D( bump_glossMap, Tex );
// Load base texture color and alpha value from a combined base
// and opacity texture map:
float4 (baseColor, fTranslucency) = tex2D( base_opacityMap, Tex );
Diffuse Illumination For
Translucency
// Light scattered on the butterfly wings is computed based on the
// negative normal (for scattering off the surface), light vector and
// translucency coefficient and value for the given pixel:
float3 scatteredIllumination = saturate(dot(-vNormal, Light)) *
fTranslucency * translucencyCoeff;
// Compute diffusely reflected light using the bump-mapped normal
// and ambient contribution:
float3 diffuseContribution = saturate(dot(vNormal,Light)) + ambient;
// Combine diffuse and scattered light with base texture:
baseColor *= scatteredIllumination + diffuseContribution;
Adding Opacity to Butterfly
Wings
The resulting color is modulated by the opacity value to add
transparency to the wings:
// Premultiply alpha blend to avoid clamping the highlights:
baseColor *= fOpacity;
Making Butterfly Wings Iridescent
Iridescence is a view-dependent effect
// Compute index into the iridescence gradient map, which
// consists of N*V coefficient. Scale and bias the index to make
// iridescence change quicker across the wings:
float fGradientIndex = dot( vNormal, View) *
iridescence_speed_scale + iridescence_speed_bias;
// Sample the gradient map based on the computed index:
float4 iridescence = tex1D( gradientMap, fGradientIndex );
[Image: resulting iridescence]
Assembling Final Color
// Compute gloss value based on the original gloss map input
// and <N, H> dot product:
float fGlossValue = fGloss * ( saturate( dot( vNormal, Half )) *
gloss_scale + gloss_bias );
// Assemble the final color for the wings
baseColor += fGlossValue * iridescence;
HLSL Disassembly Comparison
Example of Translucent Iridescent Shader
Hand-Tuned Assembly Code
ps.2.0
def c0, 0, .5, 1, 2
def c1, 4, 0, 0, 0
...
texld r1, t0, s1
mad r1.xyz, r1, c0.w, -c0.z
dp3_sat r4.y, r1, t2
dp3_sat r4.w, r1, -t2
texld r0, t0, s0
mul r4.w, r4.w, r0.a
mad r5.w, r4.w, c1.x, r4.y
add r5.rgb, r5.w, c3
mul r0.rgb, r0, r5
sub_sat r0.a, c0.z, r0.a
dp3 r6.xy, r1, t1
dp3_sat r6.y, r1, t3
mad r6.y, r6.y, c4.x, c4.y
mul r6.z, r6.y, r1.w
mad r6.x, r6.x, c4.z, c4.w
texld r2, r6, s2
mul r0.rgb, r0, r0.a
mad r0.rgb, r6.z, r2, r0
mov oC0, r0
12 ALU
3 Texture
15 Total
HLSL Compiler-Generated Disassembly Code
ps_2_0
def c6, 2, -1, 1, 0
texld r0, t0, s1
mad r2.xyz, c6.x, r0, c6.y
dp3_sat r0.x, r2, t3
mov r1.w, c5.x
mad r1.w, r0.x, r1.w, c3.x
dp3 r0.x, r2, t1
mul r2.w, r0.w, r1.w
mov r0.w, c2.x
mad r0.xy, r0.x, r0.w, c0.x
texld r1, r0, s2
texld r0, t0, s0
dp3_sat r4.x, r2, t2
dp3_sat r3.x, -r2, t2
add r2.xyz, r4.x, c4
mul r1.w, r0.w, r3.x
mul r1.xyz, r2.w, r1
mad r2.xyz, r1.w, c1.x, r2
mul r0.xyz, r0, r2
add r0.w, -r0.w, c6.z
mad r0.xyz, r0, r0.w, r1
mov oC0, r0
15 ALU
3 Texture
18 Total
Optimization Study: Überlight
• Flexible light described in JGT article “Lighting
Controls for Computer Cinematography” by
Ronen Barzel of Pixar
• Überlight is procedural and has many controls:
– light type, intensity, light color, cuton, cutoff, near edge,
far edge, falloff, falloff distance, max intensity, parallel
rays, shearx, sheary, width, height, width edge, height
edge, roundness and beam distribution
• Code here is based upon the public domain
RenderMan® implementation by Larry Gritz
Überlight Spotlight Mode
• Spotlight mode defines a procedural
volume with smooth boundaries
• Shape of spotlight is made up of two nested
superellipses which are swept along
direction of light
• Also has smooth cuton and cutoff planes
• Can tune parameters to get all sorts of
looks
Überlight Spotlight Volume
Roundness = ½
Überlight Spotlight Volume
Roundness = 1
[Figure: inner swept superellipse with dimensions a and b, nested inside outer swept superellipse with dimensions A and B]
Original clipSuperellipse() routine
• Computes attenuation as a function of a point’s position
in the swept superellipse.
• Directly ported from original RenderMan source
• Compiles to 42 cycles in ps_2_0, 40 cycles on R3x0
float clipSuperellipse (
   float3 Q,          // Test point on the x-y plane
   float a,           // Inner superellipse
   float b,
   float A,           // Outer superellipse
   float B,
   float roundness)   // Same roundness for both ellipses
{
   // Separate calculations of absolute value
   float x = abs(Q.x), y = abs(Q.y);
   // Computes ellipse roundness exponent for every point
   float re = 2/roundness; // roundness exponent
   float q = a * b * pow (pow(b*x, re) + pow(a*y, re), -1/re);
   float r = A * B * pow (pow(B*x, re) + pow(A*y, re), -1/re);
   return smoothstep (q, r, 1);
}
Vectorized Version
• Precompute functions of roundness in app
• Vectorize abs() and all of the multiplications
• Compiles to 33 cycles in ps_2_0, 28 cycles on R3x0
float clipSuperellipse (
   float2 Q,       // Test point on the x-y plane
   float4 aABb,    // Dimensions of superellipses
   float2 r)       // Two precomputed functions of roundness:
                   // 2/roundness and -roundness/2
{
   // Vectorized computation of the absolute value
   float2 qr, Qabs = abs(Q);
   // Compute b * x and B * x in a single instruction
   // and a * y and A * y in another instruction
   float2 bx_Bx = Qabs.x * aABb.wzyx;   // Swizzle to unpack bB
   float2 ay_Ay = Qabs.y * aABb;
   qr.x = pow (pow(bx_Bx.x, r.x) + pow(ay_Ay.x, r.x), r.y);
   qr.y = pow (pow(bx_Bx.y, r.x) + pow(ay_Ay.y, r.x), r.y);
   // Final result computation that feeds into smoothstep() function
   qr *= aABb * aABb.wzyx;
   return smoothstep (qr.x, qr.y, 1);
}
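The vectorized routine relies on the application supplying the two functions of roundness in `r`, so they are computed once per light rather than once per pixel. The following is a minimal C sketch of that application-side step. The type `float2` and the functions `precompute_roundness` and `demo_precompute_roundness` are hypothetical, and the sketch assumes roundness is strictly positive.

```c
#include <assert.h>

typedef struct { float x, y; } float2;  /* hypothetical stand-in for HLSL float2 */

/* Returns r.x = 2/roundness and r.y = -roundness/2, i.e. 're' and '-1/re'
   from the original routine. Assumption: roundness > 0. */
float2 precompute_roundness(float roundness)
{
    float2 r;
    assert(roundness > 0.0f);
    r.x = 2.0f / roundness;   /* exponent applied to b*x, a*y, B*x, A*y */
    r.y = -roundness / 2.0f;  /* outer exponent, equal to -1 / r.x      */
    return r;
}

void demo_precompute_roundness(void)
{
    /* Roundness = 1/2 gives exponents 4 and -1/4. */
    float2 r = precompute_roundness(0.5f);
    assert(r.x == 4.0f);
    assert(r.y == -0.25f);
}
```

The resulting pair would then be uploaded as a shader constant; how that upload is done depends on the application and is not shown here.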
smoothstep() function
• Standard function in procedural shading
• Intrinsic built into RenderMan and
DirectX HLSL
[Graph: smoothstep rises smoothly from 0 at edge0 to 1 at edge1]
C implementation
float smoothstep (float edge0, float edge1, float x)
{
if (x < edge0)
return 0;
if (x >= edge1)
return 1;
// Scale/bias into [0..1] range
x = (x - edge0) / (edge1 - edge0);
return x * x * (3 - 2 * x);
}
HLSL implementation
• The free saturate handles x outside of
[edge0..edge1] range
float smoothstep (float edge0, float edge1, float x)
{
// Scale, bias and saturate x to 0..1 range
x = saturate((x - edge0) / (edge1 - edge0));
// Evaluate polynomial
return x * x * (3 - 2 * x);
}
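That the saturate replaces both early-out tests can be checked on the CPU. The C sketch below places a branching version next to a clamped version and compares them below, inside and above the edge range. The function names are hypothetical, `saturate` is written out by hand because C has no such intrinsic, and the sketch assumes edge0 < edge1.

```c
#include <assert.h>

/* Hand-written clamp to [0, 1]; in HLSL this is the saturate intrinsic. */
static float saturate(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

/* Branching form, as in the C implementation. */
float smoothstep_branch(float edge0, float edge1, float x)
{
    if (x < edge0)  return 0.0f;
    if (x >= edge1) return 1.0f;
    x = (x - edge0) / (edge1 - edge0);
    return x * x * (3.0f - 2.0f * x);
}

/* Branch-free form, as in the HLSL implementation. */
float smoothstep_sat(float edge0, float edge1, float x)
{
    x = saturate((x - edge0) / (edge1 - edge0));
    return x * x * (3.0f - 2.0f * x);
}

void demo_smoothstep(void)
{
    /* Edges at 1 and 3; sample below, in the middle, and above. */
    assert(smoothstep_sat(1.0f, 3.0f, 0.0f) == smoothstep_branch(1.0f, 3.0f, 0.0f));
    assert(smoothstep_sat(1.0f, 3.0f, 2.0f) == smoothstep_branch(1.0f, 3.0f, 2.0f));
    assert(smoothstep_sat(1.0f, 3.0f, 4.0f) == smoothstep_branch(1.0f, 3.0f, 4.0f));
}
```

The polynomial evaluates to exactly 0 at x = 0 and exactly 1 at x = 1, which is why clamping the input is enough and no separate test on the output is needed.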
Vectorized HLSL Implementation
• Precompute 1/(edge1 – edge0)
– Done in the app for edge widths at cuton and cutoff planes
• Operation performed on float3s to compute three
different smoothstep operations in parallel
• With these optimizations, the entire spotlight volume
computation of überlight compiles to 47 cycles in
ps_2_0, 41 cycles on R3x0
float3 smoothstep3 (float3 edge,
float3 OneOverWidth, float3 x)
{
// Scale, bias and saturate x to [0..1] range
x = saturate( (x - edge) * OneOverWidth );
// Evaluate polynomial
return x * x * (3 - 2 * x);
}
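The split between application and shader can be sketched in C as follows. `one_over_width` stands for the work done once in the app, and `smoothstep3` mirrors the shader routine one component at a time (the GPU evaluates all three together). The `float3` type and the helpers `make3`, `one_over_width` and `demo_smoothstep3` are hypothetical, and the sketch assumes every edge1 component is greater than its edge0 component.

```c
#include <assert.h>

typedef struct { float x, y, z; } float3;  /* hypothetical stand-in for HLSL float3 */

float3 make3(float x, float y, float z) { float3 v = { x, y, z }; return v; }

static float saturate(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

/* App side: precompute 1/(edge1 - edge0) once, not per pixel. */
float3 one_over_width(float3 edge0, float3 edge1)
{
    assert(edge1.x > edge0.x && edge1.y > edge0.y && edge1.z > edge0.z);
    return make3(1.0f / (edge1.x - edge0.x),
                 1.0f / (edge1.y - edge0.y),
                 1.0f / (edge1.z - edge0.z));
}

static float smoothstep1(float edge, float oneOverWidth, float x)
{
    x = saturate((x - edge) * oneOverWidth);  /* multiply replaces the divide */
    return x * x * (3.0f - 2.0f * x);
}

/* Shader side: three smoothsteps from one set of vector operations. */
float3 smoothstep3(float3 edge, float3 oneOverWidth, float3 x)
{
    return make3(smoothstep1(edge.x, oneOverWidth.x, x.x),
                 smoothstep1(edge.y, oneOverWidth.y, x.y),
                 smoothstep1(edge.z, oneOverWidth.z, x.z));
}

void demo_smoothstep3(void)
{
    float3 edge0 = make3(0.0f, 0.0f, 0.0f);
    float3 w = one_over_width(edge0, make3(2.0f, 4.0f, 8.0f));
    float3 s = smoothstep3(edge0, w, make3(1.0f, 1.0f, 1.0f));
    assert(s.x == 0.5f);  /* halfway through a width-2 edge */
}
```

Moving the reciprocal into the application removes a per-pixel divide from each of the three smoothstep evaluations.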
Summary
• Writing optimal HLSL code
– Compiling issues
– Optimization strategies
– Code structure pointers
• Shader Examples
– Shipped with RenderMonkey version 1.0
see www.ati.com/developer
  • MultiTone Car Paint.rfx
  • Iridescent Butterfly.rfx