Afrigraph 2004 Interactive Ray-Tracing of Free-Form Surfaces Carsten Benthin Ingo Wald Philipp Slusallek Computer Graphics Lab Saarland University, Germany http://graphics.cs.uni-sb.de.
Motivation
Why ray-trace free-form surfaces?
Tessellating surfaces has disadvantages:
• Scene size complexity
• Accuracy
• Time complexity (preprocessing step)
• Workflow (Animation Tools, CAD)
Goal: directly ray-trace Splines, NURBS, Subdivision Surfaces, ...

Previous Approaches
Splines and NURBS
• Bezier Clipping
  – Nishita, Campagna, Wang
• Newton Iteration
  – Wang, Martin
Subdivision Surfaces
• Adaptive refinement
  – Kobbelt, Mueller
Too slow for interactive use

Interactive Ray-Tracing (IRT)
OpenRT
• Wald, Benthin, Dietrich
• Interactive performance even on a commodity PC
• Limited to triangles
Utah RTRT
• Shirley, Parker
• Supports even spline surfaces
• Needs a large parallel supercomputer for interactive performance

Outline
Analysis of existing approaches (ray/primitive test)
Our simple and generic approach
Implementation: Bicubic Bezier, Loop Subdivision
Complex Scenes
Results
Conclusion & Future Work

Analysis I
Performance "killers" of current CPU architectures
• Memory latency → cache misses
• Long pipelines → branch mispredictions
• Complex control flow → low instruction-level parallelism
Code Optimization
• Ensure data locality & memory access pattern
• Simple, streamlined control flow
• Data-level parallelism (SIMD), e.g. Intel's SSE instructions

Analysis II
Analytical/Iterative algorithms
• Algorithmic complexity
• Numerical problems
• Handling many special cases
• Complex control flow
Adaptive refinement algorithms
• Test for required refinement is costly
• Crack prevention → additional book-keeping data structures
• Complex control flow

Our Approach I
Simple and generic "divide and conquer"
Fixed number of refinement steps

Our Approach II
• Few core operations
  – Refinement
  – Pruning
  – Final Intersection
• No refinement test
• No crack handling
• Constant memory costs

Implementation for Bicubic Bezier Surfaces
• 4x4 control points, suited for 4-way SIMD
• Data layout optimized to maximize SIMD performance
  – SoA instead of AoS
• Pruning: Ray as intersection of two planes (half-space criterion)
• Refinement: de Casteljau algorithm (alternating direction)
• Final Intersection: Triangulation + Ray/Triangle tests

Implementation for Loop Subdivision Surfaces
• AoS data layout
• Regular/Irregular Triangles
  – One refinement step → at most one irregular vertex
• Pruning: Two-plane criterion (1-neighborhood)
• Refinement: Loop subdivision → four new triangles
• Final Intersection: Triangle test

Core Performance (Cycles)
                     Bicubic Bezier   Loop Subdivision
Pruning              86               172/222
Refinement           168/244          405/600
Final Intersection   294-366          169
Subdivision core operations are more complex → higher cost

Free-Form Objects
Reduce the number of primitives tested per ray
• Use spatial acceleration structure (KD-Tree)
  – Construct KD-Tree out of AABBs
  – Use fast traversal from IRT
• Surface Area Heuristic → better KD-Tree
• Mailboxing
• Reduction: up to 92% (78%)
Dynamic Objects
• Fewer primitives → KD-Tree reconstruction every frame

Results (Bezier Surfaces)
P4 CPU 2.2 GHz, 640x480, #tris = #primitives * 2^steps * 18

Results (Subdivision Surfaces)
P4 CPU 2.2 GHz, 640x480, #tris = #primitives * 4^steps

Video

Conclusion
Pros
• Simple and robust approach
• Interactive performance on a single CPU
• Memory-efficient, compute-bound ray/primitive test
• Support for trimming curves
• Half the speed compared to the triangle approach
• Can easily be integrated into existing ray-tracers (OpenRT)
Cons
• Over-refinement
• Subdivision Surface implementation not optimal

Current & Future Work
• Key to performance: tracing ray bundles (e.g. 16 rays)
  – Amortize core operation costs over rays
• Different variants for final intersection step
  – Plücker triangle test
  – Bilinear Patch
• Very fast KD-Tree reconstruction code
  – Multithreading
• Higher-order Bezier patches or even direct NURBS support
• Improve implementation for Subdivision Surfaces

Questions?
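
The Bezier variant of the approach (fixed number of refinement steps, de Casteljau refinement in alternating directions, two-plane pruning, triangulation of the leaf patches) can be sketched in a few lines. The Python below is only an illustration under stated assumptions, not the authors' SIMD implementation: all function names are hypothetical, the pruning test simply checks whether all 16 control points lie strictly on one side of either plane, and the final ray/triangle tests are left out. Triangulating the 3x3 quads of a 4x4 control grid gives 18 triangles per leaf, which is presumably where the factor 18 in the Bezier triangle-count formula comes from.

```python
def lerp(p, q):
    """Midpoint of two 3D points (de Casteljau step at t = 0.5)."""
    return tuple(0.5 * (a + b) for a, b in zip(p, q))

def split_curve(c):
    """Split a cubic Bezier curve (4 control points) into two halves."""
    p01, p12, p23 = lerp(c[0], c[1]), lerp(c[1], c[2]), lerp(c[2], c[3])
    p012, p123 = lerp(p01, p12), lerp(p12, p23)
    mid = lerp(p012, p123)
    return [c[0], p01, p012, mid], [mid, p123, p23, c[3]]

def transpose(patch):
    return [list(col) for col in zip(*patch)]

def split_patch(patch, split_rows):
    """Split a 4x4 control grid in half along one parametric direction."""
    rows = patch if split_rows else transpose(patch)
    halves = [split_curve(row) for row in rows]
    a = [h[0] for h in halves]
    b = [h[1] for h in halves]
    return (a, b) if split_rows else (transpose(a), transpose(b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_planes(origin, direction):
    """Represent a ray as the intersection of two planes (normal, offset)."""
    # Helper axis: the coordinate axis least aligned with the direction.
    axis = min(range(3), key=lambda i: abs(direction[i]))
    helper = tuple(1.0 if i == axis else 0.0 for i in range(3))
    n1 = cross(direction, helper)
    n2 = cross(direction, n1)
    return [(n1, -dot(n1, origin)), (n2, -dot(n2, origin))]

def prunable(patch, planes):
    """True if all control points lie strictly on one side of some plane.

    By the convex hull property the surface then cannot touch the ray.
    """
    points = [p for row in patch for p in row]
    for normal, offset in planes:
        signed = [dot(normal, p) + offset for p in points]
        if all(s > 0 for s in signed) or all(s < 0 for s in signed):
            return True
    return False

def candidate_leaves(patch, planes, steps, split_rows=True):
    """Fixed-depth divide and conquer: prune, else refine; no refinement test."""
    if prunable(patch, planes):
        return []
    if steps == 0:
        return [patch]
    a, b = split_patch(patch, split_rows)
    return (candidate_leaves(a, planes, steps - 1, not split_rows)
            + candidate_leaves(b, planes, steps - 1, not split_rows))

def triangulate(patch):
    """Turn the 3x3 quads of a 4x4 control grid into 18 triangles."""
    triangles = []
    for i in range(3):
        for j in range(3):
            p00, p10 = patch[i][j], patch[i + 1][j]
            p11, p01 = patch[i + 1][j + 1], patch[i][j + 1]
            triangles.append((p00, p10, p11))
            triangles.append((p00, p11, p01))
    return triangles

if __name__ == "__main__":
    # Hypothetical test data: a flat patch in the plane z = 0.
    flat = [[(float(i), float(j), 0.0) for j in range(4)] for i in range(4)]
    planes = ray_planes((1.3, 1.7, 5.0), (0.0, 0.0, -1.0))
    leaves = candidate_leaves(flat, planes, steps=4)
    print(len(leaves), "leaf patches left for the ray/triangle tests")
```

With pruning disabled (an empty plane list), four refinement steps on one patch yield 2^4 = 16 leaves and 16 * 18 = 288 triangles, consistent with #tris = #primitives * 2^steps * 18. Because the recursion depth is fixed, no refinement test or crack handling is needed, at the price of the over-refinement listed under Cons.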