Cross Platform Development Best Practices
Matt Lee, Kev Gee
Microsoft Game Technology Group

Agenda
Code Considerations
CPU Considerations
GPU Considerations
IO Considerations
Content Considerations
Data Build System
Geometry Formats
Texture Formats
Shaders
Audio Considerations
Compiler Comparison
VS 2005 front end used for both platforms
Preprocessor benefits both platforms
Debugger experience is the same
Full 2005 IDE support coming
Xbox 360 optimizing back end added with XDK install
Single solution / MSBuild file can target both platforms
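Because one solution and one preprocessor serve both targets, it can help to funnel all platform detection through a single header. The sketch below is a hypothetical example (the PLATFORM_* macros, PlatformName() and ActivePlatformCount() are assumptions, not part of the XDK or DirectX SDK); it tests _XBOX before _WIN32 so the result does not depend on whether the console build also defines _WIN32.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical shared "platform.h": the one place that turns compiler-provided
// macros into project-wide switches. _XBOX is tested first so the result does
// not depend on whether _WIN32 is also defined for the console build.
#if defined(_XBOX)
    #define PLATFORM_XBOX360 1
    #define PLATFORM_NAME "Xbox 360"
#elif defined(_WIN32)
    #define PLATFORM_WINDOWS 1
    #define PLATFORM_NAME "Windows"
#else
    // Fallback so shared code (e.g. build tools) still compiles on other hosts.
    #define PLATFORM_OTHER 1
    #define PLATFORM_NAME "Other"
#endif

inline const char* PlatformName() { return PLATFORM_NAME; }

// Sanity check: exactly one platform switch should be active.
inline int ActivePlatformCount()
{
    int count = 0;
#if defined(PLATFORM_XBOX360)
    ++count;
#endif
#if defined(PLATFORM_WINDOWS)
    ++count;
#endif
#if defined(PLATFORM_OTHER)
    ++count;
#endif
    return count;
}
```

Platform-specific code then tests only the PLATFORM_* switches, so adding a new target means editing one header.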
PC CPUs
Intel Pentium D / AMD Athlon64 X2
Programming Model
2 cores running at around 3.2 GHz
12K micro-op execution trace cache (Pentium D)
16-KB L1 cache, 1 MB L2 cache
Deep Branch Prediction
Dynamic data flow analysis
Speculative Execution
Little-endian byte ordering
SIMD instructions
Quad Core announced for early 2007
360 Custom CPU
Custom IBM Processor
3 64-bit PowerPC cores running at 3.2 GHz
Two hardware threads per core
32-KB L1 instruction cache & data cache, per core
Shared 1-MB L2 cache
128-byte cache lines on all caches
Big-endian byte ordering
VMX128 SIMD
Lots of Registers
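The byte-order difference matters whenever binary data is shared between the two CPUs. A minimal sketch, assuming data files are read as raw bytes: decode with shifts so the result is identical on either host, and swap already-loaded values explicitly. The function names are illustrative; MSVC also offers intrinsics such as _byteswap_ulong for the swap itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit value (little-endian <-> big-endian).
inline std::uint32_t ByteSwap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Decode values from raw file bytes with shifts: the result is the same
// whether the host is the little-endian PC or the big-endian Xbox 360.
inline std::uint32_t LoadU32BigEndian(const unsigned char* p)
{
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

inline std::uint32_t LoadU32LittleEndian(const unsigned char* p)
{
    return (std::uint32_t(p[3]) << 24) | (std::uint32_t(p[2]) << 16) |
           (std::uint32_t(p[1]) << 8)  |  std::uint32_t(p[0]);
}

// Runtime check of the host byte order (useful in asserts and tools).
inline bool HostIsLittleEndian()
{
    const std::uint32_t probe = 1;
    unsigned char firstByte;
    std::memcpy(&firstByte, &probe, 1);
    return firstByte == 1;
}
```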
Performance Tools
Profiling approaches are very similar between PC and Xbox 360
PIX for Xbox 360 & PIX for Windows
Being developed by the same team now
Use instrumented tools on Xbox 360
XbPerfView / Tracedump
Xbox 360 does not have a sampling profiler yet
Use PC profiling tools
Intel VTune / AMD Code Analyst / VS Team System Profiler
Attend the Performance Hands-on training!
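Instrumented profiling boils down to timing marked regions of code. The class below is a hypothetical, minimal scope timer built on C++11 std::chrono; it is not the PIX or XbPerfView API, and a real engine would more likely emit events that those tools can display.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Accumulated results for one named region of code.
struct ProfileCounter
{
    const char* name;
    double totalMs;
    unsigned calls;
};

// Measures the wall-clock time of the enclosing scope and adds it to a counter.
class ScopedTimer
{
public:
    explicit ScopedTimer(ProfileCounter& counter)
        : m_counter(counter), m_start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer()
    {
        const auto end = std::chrono::steady_clock::now();
        m_counter.totalMs +=
            std::chrono::duration<double, std::milli>(end - m_start).count();
        ++m_counter.calls;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    ProfileCounter& m_counter;
    std::chrono::steady_clock::time_point m_start;
};

inline void PrintCounter(const ProfileCounter& c)
{
    std::printf("%s: %u calls, %.3f ms total\n", c.name, c.calls, c.totalMs);
}
```

Usage: put `ScopedTimer t(g_physicsCounter);` at the top of a function, then print the counters once per frame or at shutdown.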
Focus Your Efforts
Use performance tools to guide work
Areas where we have seen platform-specific efforts reap rewards
Single Data Pass engine design
High Frequency Game API Layers
Use your profiler tools to target the hot spots
Math Library - Bespoke vs XGMath vs D3DXMath
Impact on Code Design
Designing cross-platform APIs
Use of virtual functions
Parameter passing mechanisms
Pass by reference vs. pass by value
Typedef vector types and intrinsics
Math Library Design Case Study
Use of inlining
Use of Virtual Functions
Be careful when using virtual functions to hide platform differences
Virtual function performance on Xbox 360
Adds an indirect branch instruction which is always mispredicted!
The compiler is limited in how it can optimize these (e.g. no inlining)
Make a concrete implementation for Xbox 360
Avoid virtual functions in inner loops
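One way to follow this advice while keeping an abstract interface is to make the virtual call once per batch and keep the per-element loop inside the concrete class. A sketch with hypothetical names (Particle, IParticleUpdater, SimpleUpdater):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { float x, v; };

class IParticleUpdater
{
public:
    virtual ~IParticleUpdater() {}
    // Per-element interface: one indirect call per particle.
    virtual void UpdateOne(Particle& p, float dt) = 0;
    // Batch interface: one indirect call per array; the loop lives in the
    // concrete class, where the compiler can inline and schedule it.
    virtual void UpdateAll(Particle* particles, std::size_t count, float dt) = 0;
};

class SimpleUpdater : public IParticleUpdater
{
public:
    void UpdateOne(Particle& p, float dt) override { p.x += p.v * dt; }

    void UpdateAll(Particle* particles, std::size_t count, float dt) override
    {
        for (std::size_t i = 0; i < count; ++i)
            particles[i].x += particles[i].v * dt;   // no virtual call per element
    }
};

// Avoid: virtual dispatch inside the inner loop.
inline void UpdatePerElement(IParticleUpdater& u, std::vector<Particle>& ps, float dt)
{
    for (Particle& p : ps)
        u.UpdateOne(p, dt);
}

// Prefer: a single virtual call for the whole batch.
inline void UpdateBatched(IParticleUpdater& u, std::vector<Particle>& ps, float dt)
{
    if (!ps.empty())
        u.UpdateAll(ps.data(), ps.size(), dt);
}
```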
Cross Platform Render Example
Diagram: IRenderSystem (semi-abstract base class) with three implementations
D3D9: overrides the virtual base
Xbox 360: concrete implementation
D3D10: overrides the virtual base
Cross Platform Render Example (ctd.)
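class IRenderSystem
{
    // ……
public:
#if !defined(_XBOX)
    virtual ~IRenderSystem() {}
    virtual void Draw() = 0;   // D3D9 & D3D10 subclasses override
#else
    void Draw();               // Xbox 360: concrete, non-virtual
#endif
};
#if defined(_XBOX)
void IRenderSystem::Draw()
{
    // 360 Implementation
    // ……
}
#endif
D3D9 & D3D10 implementations subclass for specialization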
Beware Big Constructors
Ctors can dominate execution time
Ctors are often hidden from the casual observer
Copy ctors run when objects are added to containers
Every element of an array of C++ objects is constructed
Overloaded operators may construct temporaries
Consider: should the ctor initialize data at all?
Example: matrix class zeroing all data
Prefer array initialization = { … }
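A sketch of the matrix case, with hypothetical names: the type has no user-written constructor, so declaring one (or an array of them) does no work, and values are filled in with aggregate initialization only where they are actually needed.

```cpp
#include <cassert>
#include <type_traits>

// No user-written constructor: declaring a Matrix44, an array of them, or a
// temporary does not silently zero 64 bytes each time.
struct Matrix44
{
    float m[4][4];
};

static_assert(std::is_trivially_default_constructible<Matrix44>::value,
              "default construction should not touch the data");

// Initialize explicitly when a value is needed, using aggregate (array)
// initialization rather than a zeroing constructor.
inline Matrix44 MakeIdentity()
{
    Matrix44 r = { { { 1.0f, 0.0f, 0.0f, 0.0f },
                     { 0.0f, 1.0f, 0.0f, 0.0f },
                     { 0.0f, 0.0f, 1.0f, 0.0f },
                     { 0.0f, 0.0f, 0.0f, 1.0f } } };
    return r;
}
```

Where zeroing really is wanted, `Matrix44 zero = {};` asks for it explicitly at that one site.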
Inlining
Careful inlining is in general a Good Thing
Plan to spend time ensuring the compiler is inlining the right stuff
Use Perf Tools such as VTune / Trace recorder
Try the “inline any suitable” option (/Ob2)
Enable link-time code generation (/GL, /LTCG)
Consider profile-guided optimization (/LTCG:PGINSTRUMENT, /LTCG:PGOPTIMIZE)
Use __forceinline only where necessary
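__forceinline is MSVC-specific. A hypothetical portability macro, assuming the GCC/Clang always_inline attribute as the nearest equivalent on other compilers:

```cpp
#include <cassert>

// Hypothetical macro: map "force inline" to each compiler's spelling,
// and to plain inline where no equivalent is known.
#if defined(_MSC_VER)
    #define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
    #define FORCE_INLINE inline __attribute__((always_inline))
#else
    #define FORCE_INLINE inline
#endif

// Reserve FORCE_INLINE for tiny, hot functions that profiling shows the
// compiler declined to inline; let the optimizer decide everywhere else.
FORCE_INLINE float Lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}
```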
Consider Passing Native Types by Value
Xbox 360 has large registers
So does native 64-bit code on PC
Pass and return these types by value
int, __int64, float
Consider these types if targeting SSE / VMX
__m128 / __vector4, XMVECTOR, XMMATRIX
Pass structs by pointer or reference
Help the compiler by using __restrict
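A sketch of these conventions on a hypothetical function: native types by value, a struct by const reference, and __restrict on the array pointers (accepted as an extension by MSVC, GCC and Clang). Passing overlapping dst and src to such a function is undefined behavior.

```cpp
#include <cassert>
#include <cstddef>

struct Transform { float scale; float offset; };   // hypothetical example struct

// float and size_t travel by value in registers; the struct goes by const
// reference. __restrict promises dst and src never overlap, so the compiler
// need not reload src after each store to dst.
inline void ApplyTransform(float* __restrict dst,
                           const float* __restrict src,
                           std::size_t count,
                           const Transform& xf)
{
    const float s = xf.scale;    // copy struct fields into locals once
    const float o = xf.offset;
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = src[i] * s + o;
}
```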
Math Library Header (Xbox 360)
#if defined( _XBOX )
#include <ppcintrinsics.h>
#include <vectorintrinsics.h>
typedef __vector4 XVECTOR;
typedef const XVECTOR XVECTOR_PARAM;     // pass by value
typedef XVECTOR& XVECTOR_OUTPARAM;
#define XMATHAPI inline
#define VMX128_INTRINSICS
#endif
Math Library Header (Windows)
#if defined( _WIN32 ) && !defined( _XBOX )
#include <xmmintrin.h>
typedef __m128 XVECTOR;
typedef const XVECTOR& XVECTOR_PARAM;    // pass by reference
typedef XVECTOR& XVECTOR_OUTPARAM;
#define XMATHAPI inline
#define SSE_INTRINSICS
#endif
Math Library Function
XMATHAPI XVECTOR XVectorAdd( XVECTOR_PARAM vA, XVECTOR_PARAM vB )
{
#if defined( VMX128_INTRINSICS )
    return __vaddfp( vA, vB );
#elif defined( SSE_INTRINSICS )
    return _mm_add_ps( vA, vB );
#else
#error "No SIMD path selected for this platform"
#endif
}
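The same pattern can carry a plain scalar path so the library builds and can be unit-tested on any host. A sketch that reuses the XVECTOR naming for illustration; the SSE detection macros, the scalar fallback and XVectorSet/XVectorStore are assumptions, and the Xbox 360 branch is omitted because it needs the XDK headers.

```cpp
#include <cassert>

// Assumed selection logic: SSE where the compiler advertises it,
// otherwise a plain struct with scalar code.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
    #include <xmmintrin.h>
    typedef __m128 XVECTOR;
    #define SSE_INTRINSICS
#else
    struct XVECTOR { float v[4]; };
    #define SCALAR_INTRINSICS
#endif

typedef const XVECTOR& XVECTOR_PARAM;   // by reference: works for both paths

inline XVECTOR XVectorSet(float x, float y, float z, float w)
{
#if defined(SSE_INTRINSICS)
    return _mm_setr_ps(x, y, z, w);      // setr keeps x in lane 0
#else
    XVECTOR r = { { x, y, z, w } };
    return r;
#endif
}

inline XVECTOR XVectorAdd(XVECTOR_PARAM a, XVECTOR_PARAM b)
{
#if defined(SSE_INTRINSICS)
    return _mm_add_ps(a, b);
#else
    XVECTOR r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
#endif
}

inline void XVectorStore(float* out, XVECTOR_PARAM a)
{
#if defined(SSE_INTRINSICS)
    _mm_storeu_ps(out, a);               // unaligned store: any float[4] works
#else
    for (int i = 0; i < 4; ++i)
        out[i] = a.v[i];
#endif
}
```

The scalar path also serves as a reference implementation when checking the SIMD paths for correctness.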
Threading
Why Multithread?
Necessary to take full advantage of modern CPUs (Xbox 360: 3 cores, 2 hardware threads each; PC: 2 cores)
Attend the Multi-threading talk later today
Covers synchronization primitives and lockless sync methods
See Also:
Talks from Intel and AMD (GDC2005 / GDC-E)
OpenMP – C-style pragma model (no C++-specific features), useful in limited circumstances
Concur – C++, see
http://microsoft.sitestream.com/PDC05/TLN/TLN309_files/Default.htm#nopreload=1&autostart=1
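A minimal data-parallel sketch: split an index range across the available hardware threads (two on the PC CPUs above, six on Xbox 360) and join. It uses C++11 std::thread for portability rather than the platform thread APIs (such as CreateThread); ParallelFor is a hypothetical helper, and some toolchains need -pthread to link it.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Split [0, count) into one contiguous range per worker and run
// work(begin, end) on each range in parallel. Sketch only: a real engine
// would keep a persistent pool instead of creating threads every frame.
template <typename Work>
void ParallelFor(std::size_t count, Work work)
{
    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0)
        workers = 2;                                   // count may be unknown
    if (workers > count)
        workers = count ? unsigned(count) : 1;

    const std::size_t chunk = (count + workers - 1) / workers;
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w)
    {
        const std::size_t begin = w * chunk;
        const std::size_t end = begin + chunk < count ? begin + chunk : count;
        if (begin >= end)
            break;
        threads.emplace_back(work, begin, end);        // disjoint ranges: no locking
    }
    for (std::thread& t : threads)
        t.join();
}
```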
D3D Architectural Differences
D3D9 draw call cost is higher on Windows than on Xbox 360
Xbox 360 D3D is optimized for a single, fixed GPU target
D3D10 reduces draw call cost on Windows by design
Carefully managing the number of batches submitted is very important
This can have an impact on content creation
This work will help with Xbox 360 performance too
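Batch count can often be reduced by sorting draws by render state and merging neighbors. A sketch with a hypothetical DrawItem record; merging assumes that items sharing state also share vertex and index buffers with contiguous index ranges, which is where the impact on content creation comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical draw record: stateKey packs shader, texture and other render
// state into one sortable integer.
struct DrawItem
{
    std::uint32_t stateKey;
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
};

// Sort so identical state is adjacent, then merge neighbors that share state
// and contiguous index ranges into a single draw call.
inline std::vector<DrawItem> BuildBatches(std::vector<DrawItem> items)
{
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return a.stateKey != b.stateKey ? a.stateKey < b.stateKey
                                                  : a.firstIndex < b.firstIndex;
              });

    std::vector<DrawItem> batches;
    for (const DrawItem& item : items)
    {
        if (!batches.empty() &&
            batches.back().stateKey == item.stateKey &&
            batches.back().firstIndex + batches.back().indexCount == item.firstIndex)
        {
            batches.back().indexCount += item.indexCount;   // extend current batch
        }
        else
        {
            batches.push_back(item);                        // state change: new batch
        }
    }
    return batches;
}
```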
PC GPUs
Wide variety of available Direct3D9 H/W
CAPs and Shader Models abstract over feature differences
GPUs with approximately equivalent performance to the Xbox 360 GPU
ATI X1900 / NVIDIA 7800 GTX
Shader Model 3.0 support
Direct3D 10 standardizes the feature set
Hardware scales on performance instead
Xbox 360 Custom GPU
Direct3D 9.0+ compatible
High-Level Shader Language (HLSL) 3.0+ support
10 MB Embedded DRAM
Frame Buffer with 256 GB/sec bandwidth
Hardware scaling for display resolution matching
48 shader ALUs shared between pixel and vertex shading (unified shaders)
Up to 8 simultaneous contexts (threads) in flight at once
Changing shaders or render state can be cheap, since a new context can be started up easily
Hardware tessellator
N-patches, triangular patches, and rectangular patches
For non-continuous / adaptive cases on PC systems, trade memory for this feature (for example, by storing pre-tessellated geometry)
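Whether a render target fits in the 10 MB of EDRAM is simple arithmetic. A sketch assuming 32-bit color and 32-bit depth/stencil per sample and ignoring any hardware padding: a 1280x720 target needs 7,372,800 bytes without MSAA (it fits), but 29,491,200 bytes with 4x MSAA, so the frame has to be rendered in pieces (tiles), each resolved out of EDRAM. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of EDRAM for a render target: every MSAA sample stores a color
// value and a depth/stencil value.
inline std::uint64_t EdramBytes(std::uint32_t width, std::uint32_t height,
                                std::uint32_t samples,
                                std::uint32_t colorBytes, std::uint32_t depthBytes)
{
    return std::uint64_t(width) * height * samples * (colorBytes + depthBytes);
}

// How many EDRAM-sized pieces the frame would need (1 means it fits).
inline std::uint32_t EdramTilesNeeded(std::uint64_t bytes)
{
    const std::uint64_t kEdramBytes = 10ull * 1024 * 1024;   // 10 MB
    return std::uint32_t((bytes + kEdramBytes - 1) / kEdramBytes);
}
```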
Explicit Resolve Control
Copies surface data from EDRAM to a texture in system memory
Required for render-to-texture and presentation to the screen
Can perform MSAA sample averaging or resolve individual samples
Can perform format conversions and biasing
Cannot do rescaling or resampling of any kind
This can impact your Xbox 360 engine design, as it adds an extra step to common operations.
Use Native File I/O Routines
Only native routines support key features:
Asynchronous I/O
Completion routines
Prefer CreateFile and ReadFile
Guaranteed to be as fast as or faster than any alternative
Avoid fopen, fread, C++ iostreams
Use Asynchronous File I/O
File read/write operations block by default
Async operations allow the game to do other interesting work
CreateFile with FILE_FLAG_OVERLAPPED
Use FILE_FLAG_NO_BUFFERING, too
Guarantees no intermediate buffering
Requires sector-aligned file offsets, read sizes, and buffer addresses
Use the OVERLAPPED struct to determine when an operation is complete
See CreateFile docs for details
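Unbuffered reads must use sector-aligned file offsets and byte counts (and a sector-aligned buffer), so arbitrary requests need widening. A sketch with hypothetical names, assuming a power-of-two sector size and a file opened via CreateFile with FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING; only the alignment helper is portable, and the Windows part is guarded.

```cpp
#include <cassert>
#include <cstdint>

// Widened request for an unbuffered read.
struct AlignedRead
{
    std::uint64_t offset;   // aligned start offset to read from
    std::uint64_t length;   // aligned byte count to request
    std::uint64_t skip;     // bytes to skip at the start of the buffer
};

inline AlignedRead MakeAlignedRead(std::uint64_t offset, std::uint64_t size,
                                   std::uint64_t sectorSize)
{
    assert(sectorSize != 0 && (sectorSize & (sectorSize - 1)) == 0);  // power of two
    const std::uint64_t start = offset & ~(sectorSize - 1);
    const std::uint64_t end = (offset + size + sectorSize - 1) & ~(sectorSize - 1);
    AlignedRead r = { start, end - start, offset - start };
    return r;
}

#if defined(_WIN32) && !defined(_XBOX)
#include <windows.h>
// Queue an overlapped read and return immediately; completion is checked
// later (e.g. GetOverlappedResult) while the game does other work.
inline bool BeginAsyncRead(HANDLE file, void* sectorAlignedBuffer,
                           const AlignedRead& req, OVERLAPPED* ov)
{
    ZeroMemory(ov, sizeof(*ov));
    ov->Offset = DWORD(req.offset & 0xFFFFFFFFu);
    ov->OffsetHigh = DWORD(req.offset >> 32);
    ov->hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);     // manual-reset event
    if (ReadFile(file, sectorAlignedBuffer, DWORD(req.length), NULL, ov))
        return true;                                       // completed immediately
    return GetLastError() == ERROR_IO_PENDING;             // queued: still in flight
}
#endif
```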
Memory Mapped File I/O
Fastest way to load data on Windows
However, the 32-bit address space is getting tight
This is a great 64-bit feature addition!
Memory mapped I/O is not supported on Xbox 360
No HDD-backed virtual memory management system
Universal Gaming Controller
XInput is the same API for Xbox 360 and Windows
The Microsoft universal controller is a reference design which can be leveraged by other hardware manufacturers
XP driver available from Windows Update
Support is built into Xbox 360 and Windows Vista
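Because the API matches, input polling code can be shared. A sketch: the radial dead-zone helper is portable, and the guarded part uses XInputGetState and XINPUT_GAMEPAD_LEFT_THUMB_DEADZONE (7849) from xinput.h. The dead-zone scheme and helper names are assumptions, not part of XInput.

```cpp
#include <cassert>
#include <cmath>

// Convert raw thumbstick axes (-32768..32767) to a 0..1 magnitude with a
// radial dead zone, so small resting offsets of the stick read as zero.
inline double NormalizedStickMagnitude(short x, short y, int deadZone)
{
    double magnitude = std::sqrt(double(x) * x + double(y) * y);
    if (magnitude <= deadZone)
        return 0.0;
    if (magnitude > 32767.0)
        magnitude = 32767.0;                  // diagonals can exceed the axis range
    return (magnitude - deadZone) / (32767.0 - deadZone);
}

#if defined(_WIN32) && !defined(_XBOX)
#include <windows.h>
#include <xinput.h>   // link with the XInput library
inline double PollLeftStick(DWORD userIndex)
{
    XINPUT_STATE state;
    ZeroMemory(&state, sizeof(state));
    if (XInputGetState(userIndex, &state) != ERROR_SUCCESS)
        return 0.0;                           // controller not connected
    return NormalizedStickMagnitude(state.Gamepad.sThumbLX, state.Gamepad.sThumbLY,
                                    XINPUT_GAMEPAD_LEFT_THUMB_DEADZONE);
}
#endif
```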
Data Build System
Add a data build / processing phase to your production system
Compile, optimize and compress data according to multiple target platform requirements
Easier and faster to handle endian-ness and other format conversions offline
Data packing process can occur here too
Invest time in making the build fast
Artists need to rapidly iterate to make quality content
Incremental builds can really help reduce the build time
Try the XNA build tools
Copies of XNA build CTP are available NOW!
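Incremental builds come down to rebuilding an asset only when its input changed. A hypothetical sketch keyed by asset and target platform, using a 64-bit FNV-1a content hash; a real build system would also hash the tool version and settings, and persist the table between runs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// 64-bit FNV-1a hash of an asset's source bytes (any stable hash would do).
inline std::uint64_t HashBytes(const std::string& bytes)
{
    std::uint64_t h = 14695981039346656037ull;   // FNV offset basis
    for (unsigned char c : bytes)
    {
        h ^= c;
        h *= 1099511628211ull;                   // FNV prime
    }
    return h;
}

// Remembers the source hash each output had when last built. Keys combine
// asset and target (e.g. "rock.tga|xbox360"), since outputs differ per platform.
class IncrementalBuildCache
{
public:
    bool NeedsRebuild(const std::string& key, const std::string& sourceBytes) const
    {
        auto it = m_built.find(key);
        return it == m_built.end() || it->second != HashBytes(sourceBytes);
    }

    void MarkBuilt(const std::string& key, const std::string& sourceBytes)
    {
        m_built[key] = HashBytes(sourceBytes);
    }

private:
    std::unordered_map<std::string, std::uint64_t> m_built;
};
```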
Geometry Compression
Offline Compression of Geometry
Provides wins across all platforms
Disk I/O wins as well as GPU wins
The compression approach is likely to be target-specific
PC is usually a superset of the consoles in this area
D3D9 CAPs / limitations to consider
16-bit normals: D3DDECLTYPE_FLOAT16_2 (check D3DDTCAPS_FLOAT16_2 in D3DCAPS9::DeclTypes)
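One possible encoding, assuming unit normals with non-negative z (true for tangent-space normals, not for arbitrary object-space ones): store x and y as two halves in a D3DDECLTYPE_FLOAT16_2 element and rebuild z in the vertex shader. The C++ below performs the offline conversion and mirrors the unpack math for testing; it is a simplified sketch (values too small for a normal half flush to zero, NaN is not handled), and the names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// float -> IEEE 754 half (1 sign, 5 exponent, 10 mantissa bits).
inline std::uint16_t FloatToHalf(float f)
{
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const int exponent = int((x >> 23) & 0xFFu) - 127 + 15;   // rebias 8-bit -> 5-bit
    const std::uint32_t mantissa = x & 0x7FFFFFu;
    if (exponent <= 0)  return std::uint16_t(sign);            // underflow -> signed zero
    if (exponent >= 31) return std::uint16_t(sign | 0x7C00u);  // overflow -> infinity
    std::uint32_t h = sign | (std::uint32_t(exponent) << 10) | (mantissa >> 13);
    if (mantissa & 0x1000u) ++h;                               // round half up on magnitude
    return std::uint16_t(h);
}

inline float HalfToFloat(std::uint16_t h)
{
    const std::uint32_t sign = std::uint32_t(h & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;
    std::uint32_t x;
    if (exponent == 0)       x = sign;                                   // zero (denormals flushed)
    else if (exponent == 31) x = sign | 0x7F800000u | (mantissa << 13);  // inf / NaN
    else                     x = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    float f;
    std::memcpy(&f, &x, sizeof f);
    return f;
}

struct PackedNormal { std::uint16_t x, y; };   // matches a FLOAT16_2 element

inline PackedNormal PackNormal(float nx, float ny)
{
    PackedNormal p = { FloatToHalf(nx), FloatToHalf(ny) };
    return p;
}

// Rebuild z from x and y, assuming z >= 0.
inline void UnpackNormal(PackedNormal p, float& nx, float& ny, float& nz)
{
    nx = HalfToFloat(p.x);
    ny = HalfToFloat(p.y);
    const float zz = 1.0f - nx * nx - ny * ny;
    nz = zz > 0.0f ? std::sqrt(zz) : 0.0f;
}
```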
Compressing Textures
Wide variety of Texture Compression Tools
ATI Compressonator
DirectX SDK DDS tools
NVIDIA – Photoshop DDS Export
Compression tools for 360 (xgraphics.lib)
Supports endian swap of texture formats
Build your own too!
Make them fit your content.
Texture Formats
DXT* / DXGI_FORMAT_BC*
BC == Block Compressed
Standard DXT* formats across all platforms
DXN / DXGI_FORMAT_BC5 / BC5u
2-component format with 8 bits of precision per component
Great for normal maps
DXT3A / DXT5A
Single component textures made from a DXT3/DXT5 alpha block
4 bits of precision
Xbox 360 / D3D9 Only
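Block-compressed sizes are easy to budget because every 4x4 texel block has a fixed size: 8 bytes for DXT1/BC1 (and BC4), 16 bytes for DXT3/DXT5 (BC2/BC3) and DXN/BC5. A sketch with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>

// Bytes for one mip level; partial blocks at the edges still cost a full block.
inline std::uint64_t BlockCompressedLevelBytes(std::uint32_t width, std::uint32_t height,
                                               std::uint32_t bytesPerBlock)
{
    const std::uint64_t blocksWide = (width + 3) / 4;
    const std::uint64_t blocksHigh = (height + 3) / 4;
    return blocksWide * blocksHigh * bytesPerBlock;
}

// Total bytes for a full mip chain down to 1x1.
inline std::uint64_t BlockCompressedChainBytes(std::uint32_t width, std::uint32_t height,
                                               std::uint32_t bytesPerBlock)
{
    std::uint64_t total = 0;
    for (;;)
    {
        total += BlockCompressedLevelBytes(width, height, bytesPerBlock);
        if (width == 1 && height == 1)
            break;
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```

For comparison, a 1024x1024 level is 4,194,304 bytes as uncompressed 32-bit RGBA, so DXT1 is 8:1 smaller and DXT5 4:1.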
Texture Arrays
Texture arrays: a generalized version of cube maps
D3D9: emulate using a texture atlas
Xbox 360
Up to 64 surfaces within a texture, optional MIPmaps for each surface
Surface is indexed with a [0..1] z coordinate in a 3D texture fetch
D3D10 supports this as a standard feature
Up to 512 surfaces within a texture
Bindable as a render target, with per-primitive array index selection
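For the D3D9 atlas fallback, each array slice becomes a cell in a grid and the per-slice coordinate is remapped into that cell. The sketch shows the math in C++ (in practice it would live in the shader); the names are hypothetical. Wrap addressing cannot be used, and filtering or mipmapping can bleed across cell borders unless the cells are padded.

```cpp
#include <cassert>

struct UV { float u, v; };

// Slice `slice` lives in cell (slice % columns, slice / columns) of a
// columns x rows grid; the per-slice [0,1] coordinate is squeezed into it.
inline UV ArraySliceToAtlasUV(UV uv, int slice, int columns, int rows)
{
    assert(columns > 0 && rows > 0 && slice >= 0 && slice < columns * rows);
    const int column = slice % columns;
    const int row = slice / columns;
    UV out = { (column + uv.u) / float(columns),
               (row + uv.v) / float(rows) };
    return out;
}
```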
Custom Vertex Fetch / Vertex Texture
D3D9 vertex texture implementations use the tex2Dlod() intrinsic
360 supports explicit instructions for this
D3D10 supports this as a standard feature
Load() from buffer (VB, IB, etc.) at any stage
SampleLevel() from texture at any stage (Sample() with implicit LOD is pixel-shader only)
Effects
D3DX FX and FX Lite co-exist easily
#define around the texture sampler differences
Preshaders are not supported on FX Lite
We advise optimizing these into native code for D3D9 Effects as well
HLSL Development
Set up your engine and tools for rapid shader development and iteration
Compile shaders offline for performance; maybe allow run-time recompilation during development
Be careful with shader generation tools
Perf needs to be considered
Schedule / Plan work for this
Cross-Platform HLSL Considerations
Texture access instruction considerations
Xbox 360 has native tfetch / getWeights features
Constant texel offsets (-8.0 to 7.5 in 0.5 increments)
Independent of texture size
Direct3D 10 supports integer texture offsets when fetching
Direct3D 10 supports GetDimensions() natively
The returned size is enough to derive getWeights-style results
Direct3D 9 can emulate tfetch & getWeights behavior using a shader constant for texture dimensions
HLSL Example
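// Set by the application for each texture (512 x 512 here)
float2 g_invTexSize = float2( 1/512.0f, 1/512.0f );
// Bilinear weights: fractional texel position relative to texel centers
float2 getWeights2D( float2 texCoord )
{
    return frac( texCoord / g_invTexSize - 0.5f );
}
// Emulates a fetch with a constant texel offset
float4 tex2DOffset( sampler t, float2 texCoord, float2 offset )
{
    texCoord += offset * g_invTexSize;
    return tex2D( t, texCoord );
}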
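On Direct3D 9 the application must supply the inverse texture size constant. A C++ sketch, with hypothetical names, that computes it and mirrors the bilinear weight math on the CPU for testing shader output, assuming texel centers at (i + 0.5) / size so the weights are frac(uv * size - 0.5):

```cpp
#include <cassert>
#include <cmath>

struct Float2 { float x, y; };

// Value for the shader's inverse-texture-size constant, computed once per texture.
inline Float2 InverseTextureSize(int width, int height)
{
    assert(width > 0 && height > 0);
    Float2 r = { 1.0f / float(width), 1.0f / float(height) };
    return r;
}

// CPU reference for bilinear weights: frac(uv * size - 0.5), where
// frac(p) = p - floor(p) as in HLSL.
inline Float2 BilinearWeights(Float2 uv, int width, int height)
{
    const float px = uv.x * float(width) - 0.5f;
    const float py = uv.y * float(height) - 0.5f;
    Float2 w = { px - std::floor(px), py - std::floor(py) };
    return w;
}
```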
Shader management
Find a balance between übershaders and specialized shader libraries
Dynamic/static branching versus static compilation
Small shader libraries can be built and stored inside a single Effect file
One technique per shader configuration
Larger shader libraries
Hash table populated with configurations
Streaming code can load shader groups on demand
Profile-guided content generation
Avoid compiling shaders at run time
Compiled shaders compress very well
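A sketch of the hash-table approach: configurations are identified by a feature bitmask and loaded on first use from offline-compiled data. The feature bits, the CompiledShader stand-in and the loader callback are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical feature bits describing one shader configuration.
const std::uint32_t kSkinning  = 1u << 0;
const std::uint32_t kNormalMap = 1u << 1;
const std::uint32_t kFog       = 1u << 2;

// Stand-in for offline-compiled shader bytecode.
struct CompiledShader { std::string blob; };

// Configurations are keyed by feature mask and loaded on first request;
// nothing is compiled at run time. References returned by Get() stay valid
// because unordered_map does not move elements when it grows.
class ShaderCache
{
public:
    typedef std::function<CompiledShader(std::uint32_t)> Loader;

    explicit ShaderCache(Loader loader) : m_loader(std::move(loader)) {}

    const CompiledShader& Get(std::uint32_t featureMask)
    {
        auto it = m_shaders.find(featureMask);
        if (it == m_shaders.end())
            it = m_shaders.emplace(featureMask, m_loader(featureMask)).first;
        return it->second;
    }

    std::size_t Size() const { return m_shaders.size(); }

private:
    Loader m_loader;
    std::unordered_map<std::uint32_t, CompiledShader> m_shaders;
};
```

The loader is where a streaming system would pull a whole shader group from disk at once.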
Audio Considerations
XACT (Microsoft Cross-Platform Audio Creation Tool)
API and authoring tool parity: author once, deploy to both platforms
Primary difference: wave compression
ADPCM on Windows vs. Xbox 360 native XMA support
XMA: controllable quality setting (varies, typically ~6:1 to 14:1)
ADPCM: static ~3.5:1 compression
Likely need to trade memory for bit rate
On Windows, hard disk streaming can offset the lower compression rate if needed
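The memory trade-off is easy to estimate. One minute of 48 kHz, 16-bit stereo PCM is 11,520,000 bytes; at the ADPCM ratio of ~3.5:1 that is about 3.3 MB, and at an assumed XMA ratio of 10:1 (the real ratio depends on the chosen quality setting) about 1.15 MB. A sketch with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed PCM size of a clip.
inline std::uint64_t PcmBytes(std::uint32_t sampleRate, std::uint32_t channels,
                              std::uint32_t bitsPerSample, std::uint32_t seconds)
{
    return std::uint64_t(sampleRate) * channels * (bitsPerSample / 8) * seconds;
}

// Approximate size after compressing at a given ratio (3.5 means 3.5:1).
inline std::uint64_t CompressedBytes(std::uint64_t pcmBytes, double ratio)
{
    assert(ratio >= 1.0);
    return std::uint64_t(double(pcmBytes) / ratio + 0.5);   // round to nearest byte
}
```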
Call To Action!
Design your games, engines and production systems with cross-platform development in mind (PC / Xbox 360 / Other)
Invest in making your data build system fast
Take advantage of each platform's strengths
Target a D3D10 content design point and fall back to D3D9+, D3D9, …
Provide feedback on how we can make production easier
Attend the XACT, HLSL, SM4.0 and Performance Hands-on Labs
Questions?