Making a game with Molehill: Zombie Tycoon

Luc Beaulieu, CTO – Frima Studio
Jean-Philippe Auclair, Lead R&D Software Architect

Session Overview
• State of Flash
• Molehill’s API presentation
• Digging deeper into Molehill

State of Flash
• Is Flash Dead?
• Facebook: Top 10 = 250M MAU
• Desktops: Flash 10 installed on 99%+
• Smartphones: Flash/AIR 200+M, 100 devices
• Streaming: 120 petabytes per month
• Advances in Flash for 3D games
– AS3
– 10.1, 10.2 …
– Molehill

Molehill’s API Presentation
• Pros:
– GPU-accelerated API
– Relies on DirectX 9 and OpenGL ES 2.0
– Native software fallback
• Cons:
– No point sprite support, branching, MRT, depth buffer
– No CPU threading support
– Native software fallback

This Page Intentionally Left Green

Digging deeper into Molehill
• Assuming a basic knowledge of 3D development terminology
• Display Layers
• Model/Animation File Format
• Character Animation: Matrix vs Quaternion
• Texturing
• Optimizing the Particle System
• Fast Lights & Shadows
• CPU Post-Processing effects
• Profiling & Debugging tools
• Bonus!
– The math explaining all the numbers I’m going to talk about
– Cheat sheets

Display Layers

Frima 3D File Format
• Many 3D engines for Flash try to support multiple input formats
• …or support only a generic format such as Collada XML
• Using a format optimized for 3D games made in Flash
– Small file size
– Small memory footprint
– No processing required

[Chart: Model & Animation File Processing on a low-end computer, time to process (ms). Collada XML: 5250; Frima Binary Format: 15]

Frima 3D File Format
• Export pipeline: 3DS Max Scene → Max Script Exporter → Collada XML → Build Tool

Frima 3D File Format
• Export pipeline: Build Tool → Model / Animation → Game Object → Serialize (AMF) → Compress → Game File

Frima 3D File Format
• In-game usage: Game File → Uncompress → Deserialize → Game Object → Add To Scene
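The serialize/compress and uncompress/deserialize steps mirror each other, which is why no further processing is needed at load time. The following is only a minimal sketch of that round trip: Python has no AMF support in its standard library, so JSON stands in for AMF here, zlib is an assumed compression scheme, and the function names and sample data are hypothetical.

```python
import json
import zlib


def build_game_file(game_object: dict) -> bytes:
    """Build step: serialize the game object, then compress the result."""
    # JSON is a stand-in for AMF (assumption, see lead-in).
    serialized = json.dumps(game_object, separators=(",", ":")).encode("utf-8")
    return zlib.compress(serialized, 9)


def load_game_file(game_file: bytes) -> dict:
    """In-game step: uncompress, then deserialize back into a game object."""
    return json.loads(zlib.decompress(game_file).decode("utf-8"))


# Hypothetical model data, for illustration only.
zombie = {
    "name": "zombie",
    "bones": 24,
    "vertices": [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
}
game_file = build_game_file(zombie)
restored = load_game_file(game_file)
```

The loaded object comes back ready to add to the scene, with no parsing or conversion beyond the two mirrored steps.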

Zombie Re-Animation
• Techniques
– Matrix linear blending
– DualQuaternion linear blending
• Molehill constraint
– Vertex Shader constants limit: 128 Float4
– Zombie: 24 bones

Animation techniques
• Matrix linear blending can cause loss of volume when joints are twisted or extremely bent
• When using matrices, each bone takes 3 constants
– Maximum number of bones is 40
• When using DualQuats, each bone takes only 2 constants
– Maximum number of bones is 60

Matrix (left) / Dual Quaternion (Right)
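The loss of volume with matrix linear blending can be seen in a small 2D sketch. This is not the skinning shader from the talk, just an illustration with hypothetical helper names: averaging two rotation matrices gives a matrix whose determinant is below 1, so the blended transform shrinks the geometry around a twisted joint.

```python
import math


def rotation(deg):
    """2x2 rotation matrix for an angle in degrees."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a)],
            [math.sin(a), math.cos(a)]]


def blend(m0, m1, w):
    """Component-wise linear blend of two matrices (weight w on m1)."""
    return [[(1 - w) * m0[r][c] + w * m1[r][c] for c in range(2)]
            for r in range(2)]


def det(m):
    """Determinant of a 2x2 matrix: the area scale of the transform."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def blended_area_scale(twist_deg, w=0.5):
    """Area scale at a vertex blended between an untwisted and a twisted bone."""
    return det(blend(rotation(0), rotation(twist_deg), w))
```

With equal weights, a 90 degree twist halves the area and a 180 degree twist collapses it to zero. Blending rotations in a form that is re-normalized, such as dual quaternions, avoids this shrinkage.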

Transitions & interpolation
• Animation transitions require two sets of bones
– Example: idle blending to walk
• Same thing for frame interpolation (e.g. bullet-time animation)

[Chart: VertexShader constants required for animating a character (24 bones), out of 128. One animation: DualQuaternion 48, matrix 72. Two animations: DualQuaternion 48 + 48 = 96, matrix 72 + 72 = 144 (too much)]
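The constant budget from the chart can be reproduced with a few lines of arithmetic. This is a sketch with hypothetical function names. It compares against the full 128 Float4 limit, as the chart does; any constants reserved for other data would tighten the budget further.

```python
VERTEX_CONSTANT_LIMIT = 128  # Float4 registers available to the vertex shader

MATRIX_CONSTANTS_PER_BONE = 3     # from the slides
DUAL_QUAT_CONSTANTS_PER_BONE = 2  # from the slides


def constants_needed(bones, constants_per_bone, bone_sets=1):
    """Float4 constants used by `bone_sets` full sets of bones."""
    return bones * constants_per_bone * bone_sets


def fits(constants):
    """True if the requested constants fit in the vertex shader limit."""
    return constants <= VERTEX_CONSTANT_LIMIT


# The 24-bone zombie, blending two animations (two sets of bones):
matrix_cost = constants_needed(24, MATRIX_CONSTANTS_PER_BONE, bone_sets=2)
dual_quat_cost = constants_needed(24, DUAL_QUAT_CONSTANTS_PER_BONE, bone_sets=2)
```

For the 24-bone zombie, two sets of dual quaternions fit in the limit while two sets of matrices do not.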

DualQuaternion vs. matrix: File size? Performance?

[Chart: Animation file size (k), DualQuaternion vs. matrix; values shown: 60, 50]

[Chart: VertexShader assembler instructions for animation processing, DualQuaternion vs. matrix; values shown: 54, 136]

[Chart: Vertex Shader processing time, DualQuaternion vs. matrix; values shown: 130%, 100%]

Texturing in Molehill

Texturing in Molehill
• The first version of the engine was only using PNGs
• Adobe Texture Format (ATF)
– Textures are kept compressed in video memory
– Native support for multi-device publishing
– One file containing 3 encodings: DXT1, ETC1 and PVRTC
– 1.3x bigger than the original PNG
– Contains the mipmaps of the texture
– Does not support transparency

Texturing in Molehill
• Transparency
– Use PNGs with indexed color
– Sample an “alpha mask texture” in the pixel shader
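The alpha mask option can be sketched on the CPU as follows. This is an illustrative model of what the pixel shader would do, not actual shader code: the color comes from an opaque texture, the alpha comes from a second single-channel texture, and both are sampled at the same UV. The nearest-neighbour lookup and all names are assumptions.

```python
def sample(texture, u, v):
    """Nearest-neighbour lookup; texture is a list of rows, u and v in [0, 1]."""
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade(color_tex, mask_tex, u, v):
    """Combine an opaque color sample with an alpha sample from the mask."""
    r, g, b = sample(color_tex, u, v)
    alpha = sample(mask_tex, u, v)
    return (r, g, b, alpha)


# Hypothetical 2x1 textures: a red and a green texel, the green one masked out.
color_tex = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
mask_tex = [[1.0, 0.0]]
```

The cost is one extra texture sample per pixel, in exchange for keeping the color data in a compressed opaque format.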

ATF: Avatar = opaque
PNG: Fence = transparent

Texturing in Molehill
• Many effects can use ATF with the right blend modes
• No need for transparency

Splatter = Multiply
Fire = Additive
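Why these two blend modes remove the need for an alpha channel can be shown with the per-channel blend equations. This is a CPU-side sketch with hypothetical function names and colors in the 0 to 1 range. In Stage3D terms the two modes would presumably map to blend factors (DESTINATION_COLOR, ZERO) and (ONE, ONE), which is an assumption not stated in the slides.

```python
def multiply(dst, src):
    """Multiply blend: white source texels leave the destination unchanged."""
    return tuple(d * s for d, s in zip(dst, src))


def additive(dst, src):
    """Additive blend: black source texels leave the destination unchanged."""
    return tuple(min(1.0, d + s) for d, s in zip(dst, src))
```

A splatter texture painted on a white background, or a fire texture painted on a black background, therefore blends as if its background were transparent.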

Particle System
• Using a divided workload (CPU/GPU) for better performance
– On the CPU: each particle property is updated every frame
   • Alpha, color, direction, rotation, frame (if sprite sheet), etc.
– On the GPU
   • Applying these properties
   • Expanding billboard vertices to face the screen
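The CPU/GPU split can be sketched as two small functions. Both run on the CPU here for illustration; in the engine the second one would be vertex shader work. The particle fields, the linear fade and the camera basis vectors are assumptions, not the engine's actual data layout.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    position: tuple  # world-space center (x, y, z)
    velocity: tuple  # units per second
    alpha: float = 1.0


def update_on_cpu(p, dt, fade_per_second=0.5):
    """CPU side: advance the per-particle properties by one frame."""
    p.position = tuple(x + v * dt for x, v in zip(p.position, p.velocity))
    p.alpha = max(0.0, p.alpha - fade_per_second * dt)


def expand_billboard(center, size, cam_right, cam_up):
    """GPU side (emulated): expand a center point into 4 screen-facing corners."""
    half = size / 2.0
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append(tuple(
            c + half * (sx * r + sy * u)
            for c, r, u in zip(center, cam_right, cam_up)))
    return corners
```

Because the corners are offset along the camera's right and up vectors, the quad always faces the screen regardless of where the particle is.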

Particle System: Optimization
• How many particles?
– Due to the VertexBuffer and IndexBuffer limits, in Zombie Tycoon we were limited to around 16383 particles per draw call
• Using Fast ByteArray (also known as Alchemy memory or DomainMemory)
– Using Azoth, property updates were 10 times faster
• Batching draw calls using the same texture
• Using a 100% GPU particle system
– It’s expensive on the GPU
– Supports only linear transformations
– Zero CPU required

Particle System

Lights & shadows
• Techniques
– ShadowMap & LightMap
– Dynamic lighting
– Fake volumetric lights
– Fake projected shadows

Lights & shadows
• ShadowMap & LightMap
– We used two textures, a “multiplied” ShadowMap and an “additive” LightMap
– Diffuse * ShadowMap + LightMap = Composite
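The composite formula applies per pixel and per color channel. A minimal sketch, assuming colors in the 0 to 1 range and a clamp at 1.0 (the clamp and the function name are assumptions):

```python
def composite(diffuse, shadow, light):
    """Per-channel Diffuse * ShadowMap + LightMap, clamped to 1.0."""
    return tuple(min(1.0, d * s + l)
                 for d, s, l in zip(diffuse, shadow, light))
```

For example, a mid-gray diffuse texel under a half-strength shadow with a faint red light gives `composite((0.5, 0.5, 0.5), (0.5, 0.5, 0.5), (0.25, 0.0, 0.0))`, which evaluates to `(0.5, 0.25, 0.25)`.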

Lights & shadows
• Dynamic lighting
– Lighting requires an expensive pixel shader, currently limited to 256 instructions
– Zombie Tycoon supports up to 7-9 lights (spot or point) per object

Lights & shadows • Pixel Shader assembly code – Per light, without Normal/Specular mapping.

Lights & shadows
• Fake volumetric lights
– Using a few billboard particles, it’s easy to fake nice, lightweight volumetric lighting
– All objects sample the shadow and light maps, and since the light particles are “additive”, an object behind the lights will look brighter

Lights & shadows

Lights & shadows
• Fake projected shadows
– We created a particle of a gradient black spot aligned to the ground
– Orientation and scale of the particle depend on light position and intensity

CPU Post-Processing
• Possibility of reading the BackBuffer
– Strongly recommended not to use Readback
– Fast pipeline for data from system memory to video memory
– VERY slow pipeline from video memory to system memory
• Effects: Bloom, Blur, Depth of Field, etc.

Motion Blur

CPU Post-Processing: Normal vs. Bloom post-processing

Profiling and Debugging tools (CPU)
• FlashDevelop (O.S.S.)
– Most of the production is done in FlashDevelop
– Now that it has a profiler and a debugger, it’s very easy to work with

Profiling and Debugging tools (CPU)
• Adobe Flash Builder Profiler
– Profile function calls
– Profile memory allocation

Profiling and Debugging tools (CPU)
• FlashPreloadProfiler (O.S.S.)
– Profile function calls
– Profile memory allocation
– Profile loader status
– Can be used in Debug/Release & browser/Projector

Profiling and Debugging tools (GPU)
• PIX for Windows
– List of API calls
– Shader assembly code
– Pixel debugger
– Texture viewer

Profiling and Debugging tools (GPU)
• Intel® Graphics Performance Analyzers (GPA)
– Render in wireframe
– Profile vertex and pixel shader performance
– Visualize overdraw and draw call sequence
– Save a frame, and run real-time experiments
– Identification of bottlenecks

Sources & References
• Geometric Skinning with Approximate Dual Quaternion Blending – http://isg.cs.tcd.ie/kavanl/papers/sdq-tog08.pdf
• Intel® Graphics Performance Analyzers (GPA) – http://software.intel.com/en-us/articles/intel-gpa/
• PIX for Windows – http://msdn.microsoft.com/en-us/library/ee417072(v=VS.85).aspx
• TD-Matt blog – http://td-matt.blogspot.com/
• FlashPreloadProfiler – http://jpauclair.net/flashpreloadprofiler/
• Azoth – http://www.buraks.com/azoth/
• Flash in Facebook – AppData.com
• Flash Stats – http://adobe.ly/rwXU, http://adobe.ly/gnlUEH

Contact
Luc Beaulieu [email protected]
Jean-Philippe Auclair [email protected]
@jpauclair  jpauclair.net

Bonus Slide: The maths!

• Character animation
– Matrix linear blending:
   • 128 Float4 VertexConstants minus the WorldMatrix and ViewProj matrix (4 Float4 each) = 120 Float4
   • 120 Float4 / 3 Float4 per bone = 40 bones in the constants
   • Bullet time and transitions require two sets of bones: 40 / 2 = 20 bones per character max
– DualQuaternion linear blending:
   • 128 Float4 VertexConstants minus the WorldMatrix and ViewProj matrix (4 Float4 each) = 120 Float4
   • 120 Float4 / 2 Float4 per bone = 60 bones in the constants
   • Bullet time and transitions require two sets of bones: 60 / 2 = 30 bones per character max
• Max particle count
– The VertexBuffer is limited to 65536 vertices; the IndexBuffer is limited to 983040 indices of type SHORT
– In theory, you could have up to 327680 triangles in one draw call
– In practice, with no vertex re-use between particles and using quads (4 vertices): 65536 / 4 = 16384, so around 16383 particles max per draw call
• Lighting
– With the PixelShader limit of 256 instructions, we were able to fit around 7 to 9 dynamic lights per object (point or spot light)
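The particle count limit comes from whichever buffer runs out first. A short sketch of that arithmetic, using the buffer limits quoted on the slide and assuming 4 vertices and 6 indices (two triangles) per quad; the function names are hypothetical.

```python
MAX_VERTICES = 65536  # VertexBuffer limit quoted on the slide
MAX_INDICES = 983040  # IndexBuffer limit quoted on the slide


def max_triangles_per_draw():
    """Theoretical triangle count if the index buffer were the only limit."""
    return MAX_INDICES // 3


def max_quads_per_draw(vertices_per_quad=4, indices_per_quad=6):
    """Quads per draw call when no vertices are shared between particles."""
    by_vertices = MAX_VERTICES // vertices_per_quad
    by_indices = MAX_INDICES // indices_per_quad
    return min(by_vertices, by_indices)
```

The vertex buffer is the binding limit: it allows 16384 quads, within one of the figure quoted above, while the index buffer alone would allow 163840.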

• Cheat Sheet Achievement: Geek

Achievement: Super Geek!

Thank You! Questions?
