Making a game with Molehill: Zombie Tycoon
Luc Beaulieu, CTO, Frima Studio
Jean-Philippe Auclair, Lead R&D Software Architect
Session Overview
• State of Flash
• Molehill's API presentation
• Digging deeper into Molehill
State of Flash
• Is Flash dead?
  – Facebook: top 10 = 250M MAU
  – Desktops: Flash 10 installed on 99%+
  – Smartphones: Flash/AIR 200M+, 100 devices
  – Streaming: 120 petabytes per month
• Advances in Flash for 3D games
  – AS3
  – 10.1, 10.2 …
  – Molehill
Molehill's API Presentation
• Pros:
  – GPU-accelerated API
  – Relies on DirectX 9 and OpenGL ES 2.0
  – Native software fallback
• Cons:
  – No point sprite support, branching, MRT, depth buffer
  – No CPU threading support
  – Native software fallback
This Page Intentionally Left Green
Digging deeper into Molehill
• Assuming a basic knowledge of 3D development terminology
• Display Layers
• Model/Animation File Format
• Character Animation: Matrix vs Quaternion
• Texturing
• Optimizing the Particle System
• Fast Lights & Shadows
• CPU Post-Processing effects
• Profiling & Debugging tools
• Bonus!
  – The math explaining all the numbers I'm going to talk about
  – Cheat sheets
Display Layers
Frima 3D File Format
• Many 3D engines for Flash try to support multiple input formats…
• …or support only a generic format such as Collada XML
• Using a format optimized for 3D games made in Flash:
  – Small file size
  – Small memory footprint
  – No processing required
[Chart: Model & animation file processing time on a low-end computer (ms): Collada XML 5250, Frima binary format 15]
[Diagram: export pipeline components: 3DS Max scene, Max Script exporter, Collada XML, build tool, Frima 3D File Format]
Frima 3D File Format
• Export pipeline (build tool): Model / Animation → Game Object → Serialize (AMF) → Compress → Game File
• In-game usage: Game File → Uncompress → Unserialize → Frima 3D File Format Game Object → Add To Scene
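The build-tool and in-game steps above can be sketched as a round trip. This is a minimal Python sketch, not Frima's build tool: AMF serialization (roughly what ByteArray.writeObject() does in AS3) is replaced by a hypothetical fixed binary layout built with struct (a uint32 vertex count followed by little-endian float32 positions), and compression uses zlib, which is also the default algorithm of AS3's ByteArray.compress(). The function names build_game_file and load_game_file are hypothetical.

```python
import struct
import zlib


def build_game_file(positions):
    """Build-tool side: serialize vertex positions, then compress.

    Hypothetical layout: uint32 vertex count, then x, y, z as float32,
    all little-endian. The real pipeline serialized game objects with AMF.
    """
    flat = [c for xyz in positions for c in xyz]
    payload = struct.pack("<I", len(positions)) + struct.pack(f"<{len(flat)}f", *flat)
    return zlib.compress(payload)


def load_game_file(blob):
    """In-game side: uncompress, then unserialize back into (x, y, z) tuples."""
    payload = zlib.decompress(blob)
    (count,) = struct.unpack_from("<I", payload, 0)
    flat = struct.unpack_from(f"<{count * 3}f", payload, 4)
    # Regroup the flat float list into vertices, ready to be added to the scene.
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


if __name__ == "__main__":
    model = [(0.0, 1.0, 2.0), (0.5, -1.0, 4.0)]
    blob = build_game_file(model)
    # These particular values are exactly representable in float32.
    print(load_game_file(blob) == model)
```

The appeal of a fixed binary layout is that loading amounts to decompression plus a fixed-offset read, with no text parsing as with Collada XML.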
Zombie Re-Animation
• Techniques
  – Matrix linear blending
  – DualQuaternion linear blending
• Molehill constraint
  – Vertex shader constant limit: 128 Float4
• Zombie: 24 bones
Animation techniques
• Matrix linear blending can cause loss of volume when joints are twisted or extremely bent
• When using matrices, each bone takes 3 constants
  – Maximum number of bones is 40
• When using DualQuats, each bone takes only 2 constants
  – Maximum number of bones is 60
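The volume-loss problem can be reproduced with a small CPU-side Python sketch of both techniques, following the dual quaternion linear blending described in the Kavan et al. paper listed in the references. It is an illustration under assumptions, not the game's vertex shader: quaternions are stored as (w, x, y, z) tuples, and all helper names are hypothetical. A vertex weighted 50/50 between an unrotated bone and a bone rotated 90° collapses toward the joint with matrix-style (linear) blending, but keeps its distance with dual quaternions.

```python
import math


def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def qconj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate(q, p):
    """Rotate point p by unit quaternion q (q * p * q^-1)."""
    return qmul(qmul(q, (0.0, *p)), qconj(q))[1:]


def axis_angle(axis, degrees):
    """Unit quaternion for a rotation of `degrees` around a unit axis."""
    h = math.radians(degrees) / 2
    return (math.cos(h), *(a * math.sin(h) for a in axis))


def to_dual_quat(rot, trans):
    """Unit dual quaternion (real, dual) for 'rotate by rot, then translate'."""
    dual = tuple(0.5 * c for c in qmul((0.0, *trans), rot))
    return rot, dual


def blend_lbs(bones, weights, p):
    """Linear blend skinning: weighted sum of each bone's transformed point."""
    out = [0.0, 0.0, 0.0]
    for (rot, trans), w in zip(bones, weights):
        moved = [a + b for a, b in zip(rotate(rot, p), trans)]
        out = [acc + w * c for acc, c in zip(out, moved)]
    return tuple(out)


def blend_dlb(dqs, weights, p):
    """Dual quaternion linear blending: weighted sum, normalize, transform p."""
    real, dual = [0.0] * 4, [0.0] * 4
    pivot = dqs[0][0]
    for (r, d), w in zip(dqs, weights):
        # Keep every rotation in the same hemisphere as the first one.
        if sum(a * b for a, b in zip(r, pivot)) < 0:
            w = -w
        real = [acc + w * c for acc, c in zip(real, r)]
        dual = [acc + w * c for acc, c in zip(dual, d)]
    norm = math.sqrt(sum(c * c for c in real))
    real = tuple(c / norm for c in real)
    dual = tuple(c / norm for c in dual)
    trans = qmul(tuple(2.0 * c for c in dual), qconj(real))[1:]
    return tuple(a + b for a, b in zip(rotate(real, p), trans))


if __name__ == "__main__":
    z = (0.0, 0.0, 1.0)
    bones = [(axis_angle(z, 0.0), (0.0, 0.0, 0.0)),
             (axis_angle(z, 90.0), (0.0, 0.0, 0.0))]
    p = (1.0, 0.0, 0.0)
    lbs = blend_lbs(bones, [0.5, 0.5], p)
    dlb = blend_dlb([to_dual_quat(r, t) for r, t in bones], [0.5, 0.5], p)
    print("linear blend distance:", math.hypot(*lbs))  # about 0.707: lost volume
    print("dual quat distance:  ", math.hypot(*dlb))   # about 1.0: preserved
```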
Matrix (left) / Dual Quaternion (Right)
Transitions & interpolation
• Animation transitions require two sets of bones
  – e.g. idle blending to walk
• Same thing for frame interpolation (e.g. bullet-time animation)
[Chart: vertex shader constants required to animate a character (24 bones): DualQuaternion 48, matrix 72]
[Chart: constants for two animations: DualQuaternion 48 + 48 = 96 (fits in 128); matrix 72 + 72 = 144 (too much)]
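The constant budget behind these charts is simple arithmetic, sketched here in Python. It assumes, as in the bonus math slide, that 8 of the 128 Float4 registers are reserved for the WorldMatrix and ViewProj matrix (4 Float4 each), and that a bone costs 3 Float4 as a matrix or 2 as a dual quaternion. The names are hypothetical.

```python
VERTEX_CONSTANTS = 128   # Molehill vertex shader constant registers (Float4 each)
RESERVED = 4 + 4         # WorldMatrix + ViewProj matrix, 4 Float4 registers each
FLOAT4_PER_BONE = {"matrix": 3, "dual_quaternion": 2}


def constants_needed(bones, technique, bone_sets=1):
    """Float4 registers used by the bone palette (one set per blended animation)."""
    return bones * FLOAT4_PER_BONE[technique] * bone_sets


def fits(bones, technique, bone_sets=1):
    """True if the palette plus the two reserved matrices stays within the limit."""
    return constants_needed(bones, technique, bone_sets) + RESERVED <= VERTEX_CONSTANTS


def max_bones(technique, bone_sets=1):
    """Largest skeleton that fits for the given number of simultaneous bone sets."""
    return (VERTEX_CONSTANTS - RESERVED) // (FLOAT4_PER_BONE[technique] * bone_sets)


if __name__ == "__main__":
    # Zombie: 24 bones, two bone sets for transitions or bullet time.
    for tech in ("dual_quaternion", "matrix"):
        print(tech, constants_needed(24, tech), constants_needed(24, tech, 2), fits(24, tech, 2))
    # dual_quaternion 48 96 True
    # matrix 72 144 False
```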
DualQuaternion vs matrix: file size? Performance?
[Charts comparing DualQuaternion and matrix skinning: animation file size (KB); vertex shader assembler instructions for animation processing (values shown: 54 and 136); vertex shader processing time (values shown: 130% and 100%)]
Texturing in Molehill
Texturing in Molehill
• The first version of the engine used only PNGs
• Adobe Texture Format (ATF)
  – Textures are kept compressed in video memory
  – Native support for multi-device publishing
    – One file containing 3 encodings: DXT1, ETC1 and PVRTC
    – 1.3x bigger than the original PNG
  – Contains the mipmaps of the texture
  – Does not support transparency
Texturing in Molehill
• Transparency
  – Use PNGs with indexed color
  – Sample an "alpha mask" texture in the pixel shader
ATF: avatar = opaque / PNG: fence = transparent
Texturing in Molehill
• Many effects can use ATF when using the right blend modes
• No need for transparency
Splatter = Multiply, Fire = Additive
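Why opaque ATF textures work here: with a multiply blend, a white background texel leaves the scene unchanged, and with an additive blend, a black background texel does. A minimal per-channel Python sketch, assuming colors in [0, 1]; in Stage3D terms these correspond to blend factors (DESTINATION_COLOR, ZERO) and (ONE, ONE). The function names are hypothetical.

```python
def blend_multiply(src, dst):
    """Multiply blend: src * dst per channel. White src leaves dst unchanged."""
    return tuple(s * d for s, d in zip(src, dst))


def blend_additive(src, dst):
    """Additive blend: src + dst, clamped to 1.0. Black src leaves dst unchanged."""
    return tuple(min(s + d, 1.0) for s, d in zip(src, dst))


if __name__ == "__main__":
    floor = (0.4, 0.5, 0.3)
    # Splatter texture: white background, red splat, drawn with multiply.
    print(blend_multiply((1.0, 1.0, 1.0), floor))  # background: floor unchanged
    print(blend_multiply((0.6, 0.0, 0.0), floor))  # splat darkens and tints the floor
    # Fire texture: black background, orange flame, drawn additively.
    print(blend_additive((0.0, 0.0, 0.0), floor))  # background: floor unchanged
    print(blend_additive((0.8, 0.4, 0.0), floor))  # flame brightens the floor
```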
Particle System
• Using a divided workload (CPU/GPU) for better performance
  – On the CPU, each particle property update is computed at every frame
    • Alpha, color, direction, rotation, frame (if sprite sheet), etc.
  – On the GPU
    • Applying these properties
    • Expanding billboard vertices to face the screen
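The CPU/GPU split above can be sketched in Python. This is a hypothetical illustration, not Zombie Tycoon's code: the per-frame update rules (linear motion, constant spin, linear fade) are assumptions, and expand_billboard runs on the CPU here only to show the math a vertex shader would do per vertex, pushing each quad corner out along the camera's right and up axes.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    position: tuple   # world-space centre (x, y, z)
    direction: tuple  # velocity, units per second
    rotation: float   # radians, around the view axis
    spin: float       # radians per second
    alpha: float
    age: float
    lifetime: float
    size: float


def update_on_cpu(p, dt):
    """CPU side, once per frame: advance each property (assumed linear rules)."""
    p.age += dt
    p.position = tuple(c + v * dt for c, v in zip(p.position, p.direction))
    p.rotation += p.spin * dt
    p.alpha = max(0.0, 1.0 - p.age / p.lifetime)  # fade out over the lifetime


CORNERS = ((-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5))


def expand_billboard(p, cam_right, cam_up):
    """GPU side (vertex shader in the game): build the 4 quad vertices so the
    quad faces the screen, using the camera's world-space right/up vectors."""
    c, s = math.cos(p.rotation), math.sin(p.rotation)
    verts = []
    for cx, cy in CORNERS:
        # Rotate the corner in screen space, then scale by the particle size.
        rx = (cx * c - cy * s) * p.size
        ry = (cx * s + cy * c) * p.size
        verts.append(tuple(pc + rx * r + ry * u
                           for pc, r, u in zip(p.position, cam_right, cam_up)))
    return verts


if __name__ == "__main__":
    p = Particle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.0, 0.0, 1.0, 0.0, 2.0, 2.0)
    update_on_cpu(p, 0.5)
    print(p.position, p.alpha)
    print(expand_billboard(p, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```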
Particle System: Optimization
• How many particles?
  – Due to the VertexBuffer and IndexBuffer limits, in Zombie Tycoon we were limited to around 16383 particles per draw call
• Using fast ByteArray (also known as Alchemy memory or domain memory)
  – Using Azoth, property updates were 10 times faster
• Batching draw calls that use the same texture
• Using a 100% GPU particle system
  – It's expensive on the GPU
  – Supports only linear transformations
  – Zero CPU required
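The per-draw-call particle limit follows from building one quad (4 vertices, 6 indices) per particle with no shared vertices. A minimal Python sketch, assuming 65535 vertices per VertexBuffer (the limit documented for the released Stage3D API) and the 983040-index IndexBuffer limit quoted in the bonus slide; the function names are hypothetical.

```python
MAX_VERTICES = 65535     # per VertexBuffer (assumed released Stage3D limit)
MAX_INDICES = 983040     # per IndexBuffer, as quoted in the bonus slide
VERTS_PER_QUAD, INDICES_PER_QUAD = 4, 6


def max_particles_per_draw():
    """Whichever buffer runs out first bounds the particle count."""
    by_vertices = MAX_VERTICES // VERTS_PER_QUAD   # 16383
    by_indices = MAX_INDICES // INDICES_PER_QUAD   # 163840
    return min(by_vertices, by_indices)


def quad_indices(count):
    """Two triangles per particle; no vertices are shared between particles."""
    if count > max_particles_per_draw():
        raise ValueError("too many particles: split them over several draw calls")
    indices = []
    for i in range(count):
        v = i * VERTS_PER_QUAD
        indices += (v, v + 1, v + 2, v, v + 2, v + 3)
    return indices


if __name__ == "__main__":
    print(max_particles_per_draw())
    print(quad_indices(2))
```

With these limits, the vertex buffer runs out long before the index buffer does, which is why batching by texture and splitting into several draw calls matter.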
Particle System
Lights & shadows
• Techniques
  – ShadowMap & LightMap
  – Dynamic lighting
  – Fake volumetric lights
  – Fake projected shadows
Lights & shadows
• ShadowMap & LightMap
  – We used two textures, a "multiplied" ShadowMap and an "additive" LightMap
  – Diffuse * ShadowMap + LightMap = Composite
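The composite formula above, evaluated per texel and per channel, as a short Python sketch. It assumes colors in [0, 1] and clamps the result the way a standard 8-bit render target would; the function name is hypothetical.

```python
def composite(diffuse, shadow, light):
    """Per texel: Diffuse * ShadowMap + LightMap, clamped to [0, 1]."""
    return tuple(min(d * s + l, 1.0) for d, s, l in zip(diffuse, shadow, light))


if __name__ == "__main__":
    # Half-shadowed texel with a faint warm light on top.
    print(composite((0.8, 0.6, 0.4), (0.5, 0.5, 0.5), (0.1, 0.1, 0.0)))
```

Because the ShadowMap only darkens (multiply) and the LightMap only brightens (add), both can be baked offline and applied with two texture samples per pixel.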
Lights & shadows
• Dynamic lighting
  – Lighting requires an expensive pixel shader, currently limited to 256 instructions
  – Zombie Tycoon supports up to 7-9 lights (spot or point) per object
Lights & shadows
• Pixel shader assembly code
  – Per light, without normal/specular mapping
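To give a feel for what each light costs, here is a hypothetical CPU-side Python sketch of a typical per-light diffuse term without normal or specular mapping: a Lambert N·L term with an assumed linear distance falloff for point lights. It is not the Zombie Tycoon shader; the falloff model, the light tuple layout (position, color, radius) and the function names are assumptions. Every extra light repeats this block of math in the pixel shader, which is how a 256-instruction budget ends up capping the light count.

```python
import math


def point_light_diffuse(pos, normal, light_pos, light_color, radius):
    """Lambert term with an assumed linear falloff for one point light."""
    to_light = [l - p for l, p in zip(light_pos, pos)]
    dist = math.sqrt(sum(c * c for c in to_light))
    if dist == 0.0 or dist >= radius:
        return (0.0, 0.0, 0.0)
    # N . L with L normalized; surfaces facing away get nothing.
    n_dot_l = max(0.0, sum(n * c / dist for n, c in zip(normal, to_light)))
    falloff = 1.0 - dist / radius
    return tuple(c * n_dot_l * falloff for c in light_color)


def shade(albedo, pos, normal, lights):
    """Sum every light's contribution, as an unrolled shader would, then modulate."""
    total = [0.0, 0.0, 0.0]
    for light in lights:
        contrib = point_light_diffuse(pos, normal, *light)
        total = [t + c for t, c in zip(total, contrib)]
    return tuple(min(a * t, 1.0) for a, t in zip(albedo, total))


if __name__ == "__main__":
    overhead = ((0.0, 0.0, 1.0), (1.0, 1.0, 1.0), 2.0)
    print(shade((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), [overhead]))
```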
Lights & shadows
• Fake volumetric lights
  – Using a few billboard particles, it's easy to fake nice, lightweight volumetric lighting
  – All objects sample the shadow and light maps, and since the light particles are additive, an object behind the lights will look brighter
Lights & shadows
Lights & shadows
• Fake projected shadows
  – We created a particle with a black gradient spot, aligned to the ground
  – The orientation and scale of the particle depend on the light position and intensity
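One way the orientation and scale of such a shadow particle could be derived, as a Python sketch. The rules here are assumptions, not the game's formulas: y is up, the blob points away from the light on the ground plane, it stretches as the light gets lower relative to its horizontal distance, and its opacity follows the light intensity. The function name and the base_size parameter are hypothetical.

```python
import math


def shadow_blob(obj_pos, light_pos, intensity, base_size=1.0):
    """Return (angle, length, width, alpha) for a ground-aligned shadow particle."""
    dx = obj_pos[0] - light_pos[0]
    dz = obj_pos[2] - light_pos[2]
    height = max(light_pos[1] - obj_pos[1], 1e-3)    # avoid division by zero
    horizontal = math.hypot(dx, dz)
    angle = math.atan2(dz, dx)                        # rotation around the y axis
    length = base_size * (1.0 + horizontal / height)  # longer shadow for a grazing light
    alpha = min(max(intensity, 0.0), 1.0)             # stronger light, darker blob
    return angle, length, base_size, alpha


if __name__ == "__main__":
    print(shadow_blob((0.0, 0.0, 0.0), (0.0, 5.0, 0.0), 0.7))   # light overhead: round blob
    print(shadow_blob((0.0, 0.0, 0.0), (-2.0, 2.0, 0.0), 1.0))  # low light: stretched along +x
```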
CPU Post-Processing
• Possibility of reading the back buffer
  – Strongly recommended not to use readback
  – Fast pipeline for data from system memory to video memory
  – VERY slow pipeline from video memory to system memory
• Effects: bloom, blur, depth of field, etc.
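A minimal CPU bloom in Python, to show the kind of work involved once the frame's pixels are in system memory. It is a sketch under assumptions: a grayscale image stored as a list of rows with values in [0, 1], a simple bright-pass threshold, and a box blur standing in for whatever blur the game used. The function names are hypothetical.

```python
def bright_pass(img, threshold):
    """Keep only the part of each pixel above the threshold."""
    return [[max(v - threshold, 0.0) for v in row] for row in img]


def box_blur(img, radius):
    """Average each pixel with its neighbours inside a (2r+1)^2 window."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def bloom(img, threshold=0.8, radius=1, strength=1.0):
    """Bright pass, blur, then add the glow back on top of the original."""
    glow = box_blur(bright_pass(img, threshold), radius)
    return [[min(v + strength * g, 1.0) for v, g in zip(row, grow)]
            for row, grow in zip(img, glow)]


if __name__ == "__main__":
    frame = [[0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0]]
    for row in bloom(frame):
        print(["%.3f" % v for v in row])  # the bright centre bleeds into its neighbours
```

Even on this tiny frame, the cost grows with pixels times window size, on top of the slow video-to-system-memory readback.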
Motion Blur
CPU Post-Processing
Normal / Bloom post-processing
Profiling and Debugging tools (CPU)
• FlashDevelop (O.S.S.)
  – Most of the production team uses FlashDevelop
  – Now that it has a profiler and a debugger, it's very easy to work with
Profiling and Debugging tools (CPU)
• Adobe Flash Builder Profiler
  – Profile function calls
  – Profile memory allocation
Profiling and Debugging tools (CPU)
• FlashPreloadProfiler (O.S.S.)
  – Profile function calls
  – Profile memory allocation
  – Profile loader status
  – Can be used in debug/release & browser/projector
Profiling and Debugging tools (GPU)
• PIX for Windows
  – List of API calls
  – Shader assembly code
  – Pixel debugger
  – Texture viewer
Profiling and Debugging tools (GPU)
• Intel® Graphics Performance Analyzers (GPA)
  – Render in wireframe
  – Profile vertex and pixel shader performance
  – Visualize overdraw and draw call sequence
  – Save a frame and run real-time experiments on it
  – Identify bottlenecks
Sources & References
• Geometric Skinning with Approximate Dual Quaternion Blending – http://isg.cs.tcd.ie/kavanl/papers/sdq-tog08.pdf
• Intel® Graphics Performance Analyzers (GPA) – http://software.intel.com/en-us/articles/intel-gpa/
• PIX for Windows – http://msdn.microsoft.com/en-us/library/ee417072(v=VS.85).aspx
• TD-Matt blog – http://td-matt.blogspot.com/
• FlashPreloadProfiler – http://jpauclair.net/flashpreloadprofiler/
• Azoth – http://www.buraks.com/azoth/
• Flash in Facebook – AppData.com
• Flash stats – http://adobe.ly/rwXU, http://adobe.ly/gnlUEH
Contact
• Luc Beaulieu
• Jean-Philippe Auclair – @jpauclair – jpauclair.net
Bonus Slide: The maths!
• Character animation:
  – Matrix linear blending:
    • 128 Float4 VertexConstant - WorldMatrix - ViewProj matrix = 120 Float4
    • 120 Float4 / 3 Float4 per bone = 40 bones in the constants
    • Bullet time and transitions require two sets of bones: 40 / 2 = 20 bones per character max
  – DualQuaternion linear blending:
    • 128 Float4 VertexConstant - WorldMatrix - ViewProj matrix = 120 Float4
    • 120 Float4 / 2 Float4 per bone = 60 bones in the constants
    • Bullet time and transitions require two sets of bones: 60 / 2 = 30 bones per character max
• Max particle count
  – The VertexBuffer is limited to 65535 vertices; the IndexBuffer is limited to 983040 indices of type SHORT
  – In theory, you could have up to 983040 / 3 = 327680 triangles in one draw call
  – In practice, with no vertex reuse between particles and using quads (4 vertices per particle): 65535 / 4 = 16383.75, so 16383 particles max per draw call
• Lighting
  – With the PixelShader limit of 256 instructions, we were able to fit around 7 to 9 dynamic lights per object (point or spot light)
Cheat Sheet
Achievement: Geek
Achievement: Super Geek!
Thank You! Questions?