Experiences with fast KD tree construction

Experiences with Streaming Construction of SAH KD Trees

Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek

Motivation

Large speed-up of ray tracing lately
  Better algorithms (packet tracing [Wald04, Reshetov05])
  Optimized spatial index structures
    Best known: KD trees [Havran00]
  Faster hardware
Research concentrated mainly on static scenes
Dynamic scenes
  Building: slow for SAH based KD trees
  Done in a pre-processing step

Stefan Popov Streaming Construction of KD Trees

Dynamic Scenes Approaches

Embed dynamics in the index structure
  Use a two level approach [Wald03]
  Fuzzy KD trees [Günther06]
Update index structure
  Grids, BVHs and KD tree hybrids
    Faster build/update
    Lower traversal performance
  No efficient approach for KD trees
Rebuild entire KD tree
  Need to make it fast
  Lazy build

SAH Algorithm

Extract & sort events in advance
  Abstract objects with AABBs
  Events given by AABB boundaries
Recursive top-down construction
  Find split plane using SAH
    Compute minimum cost
  Distribute objects to children
    By distributing the events
    Keep them sorted
[Figure: eight objects numbered 1 to 8 with the event list X: 1, 2, 3, 4, 5, 6, 7, 8; the split sends 1, 2, 3, 4 to the left child and 4, 5, 6, 7, 8 to the right child.]
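The conventional sweep described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the cost constants `c_trav` and `c_isect`, and the tie handling (closing events sort before opening events at the same position) are assumptions made for the example.

```python
def surface_area(lo, hi):
    """Surface area of the axis-aligned box spanned by corners lo and hi."""
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def best_split_sorted(boxes, node_lo, node_hi, axis, c_trav=1.0, c_isect=1.0):
    """Return (cost, position) of the cheapest split plane along one axis.

    boxes: list of (lo, hi) corner tuples (the AABBs of the objects).
    c_trav, c_isect: assumed cost constants, not taken from the slides.
    """
    CLOSE, OPEN = 0, 1  # CLOSE < OPEN, so closes sort first at equal positions
    events = []
    for lo, hi in boxes:
        events.append((lo[axis], OPEN))
        events.append((hi[axis], CLOSE))
    events.sort()  # the "sort events in advance" step

    n_left, n_right = 0, len(boxes)
    node_area = surface_area(node_lo, node_hi)
    best_cost, best_pos = float("inf"), None
    for pos, kind in events:
        if kind == CLOSE:
            n_right -= 1  # evaluate after a closing event
        if node_lo[axis] < pos < node_hi[axis]:
            left_hi = list(node_hi)
            left_hi[axis] = pos
            right_lo = list(node_lo)
            right_lo[axis] = pos
            cost = c_trav + c_isect * (
                surface_area(node_lo, left_hi) * n_left
                + surface_area(right_lo, node_hi) * n_right
            ) / node_area
            if cost < best_cost:
                best_cost, best_pos = cost, pos
        if kind == OPEN:
            n_left += 1  # evaluate before an opening event
    return best_cost, best_pos
```

For two unit cubes separated by a gap, the sweep places the plane at one of the two faces bordering the gap.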

SAH Cost Function

Piecewise linear
  Discontinuities at object boundaries
  Evaluate only before opening and after closing event
[Figure: two objects (1 and 2) in the X-Y plane and the resulting SAH cost plotted over the split position X.]

Distribution Along the Split Axis

Given: event list & split position
Sweep event list and classify
  Open event
    Before split → label object “both”
    After split → label object “right”
  Close event
    Before split → relabel object “left”
Copy event to corresponding child’s list
  Might have to insert new events
Random memory access
[Figure: event list along X with brackets marking open and close events; objects labeled Left, Both and Right.]
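The labeling and copying steps above might look like the sketch below. The event layout `(position, kind, object_id)`, the function names, and the treatment of events lying exactly on the split plane are assumptions for illustration. Objects that stay labeled "both" get a new closing event at the split in the left list and a new opening event at the split in the right list.

```python
LEFT, RIGHT, BOTH = "left", "right", "both"


def classify_along_split_axis(events, split, num_objects):
    """Label every object by sweeping the sorted event list once.

    events: sorted list of (position, kind, object_id), kind is "open" or "close".
    """
    labels = [None] * num_objects
    for pos, kind, obj in events:
        if kind == "open":
            # Opened before the split: tentatively "both", otherwise "right".
            labels[obj] = BOTH if pos < split else RIGHT
        elif pos <= split:
            # Closed before (or on) the split: the object is entirely left.
            labels[obj] = LEFT
    return labels


def distribute_split_axis(events, split, labels):
    """Copy events to the children's lists, inserting new events at the split."""
    left, right = [], []
    for pos, kind, obj in events:
        label = labels[obj]  # lookup by object id: a random memory access
        if label == LEFT:
            left.append((pos, kind, obj))
        elif label == RIGHT:
            right.append((pos, kind, obj))
        elif kind == "open":
            left.append((pos, kind, obj))
            right.append((split, "open", obj))  # inserted event
        else:
            left.append((split, "close", obj))  # inserted event
            right.append((pos, kind, obj))
    return left, right
```

Because inserted events sit exactly at the split position, both child lists stay sorted by position without a second sort.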

Distribution Along the Other Axes

Sweep event lists. Copy event to
  Left, if corresponding object labeled “left” or “both”
  Right, if corresponding object labeled “right” or “both”
Look up in object array → random memory access
[Figure: event lists along Y; objects labeled Left, Both and Right.]
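A short sketch of this step, assuming events are `(position, kind, object_id)` tuples and `labels` is an array indexed by object id (both hypothetical layouts chosen for the example):

```python
def distribute_other_axis(events, labels):
    """Copy the events of a non-split axis to the children's lists.

    The order of events is preserved, so both output lists remain sorted.
    """
    left, right = [], []
    for event in events:
        label = labels[event[2]]  # per-event lookup in the object array
        if label in ("left", "both"):
            left.append(event)
        if label in ("right", "both"):
            right.append(event)
    return left, right
```

The label lookup is indexed by object id rather than by position in the stream, which is the source of the random memory accesses the slide points out.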

Problems of KD Tree Construction

Random memory accesses
Expensive cost function evaluation
Initial sorting: inefficient for lazy builds

Streaming Algorithm Overview

Work with unsorted lists of AABBs
  Avoid initial sorting
Sweep list once to locate initial split plane
In a single sweep
  Distribute objects (straightforward)
  Determine split positions of children
Once data fits in caches, switch to conventional build
[Figure: the parent list is streamed into a left list and a right list.]
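The single sweep could be sketched as below. To stay short, the example makes several simplifying assumptions that are not from the slides: objects are reduced to 1D intervals along the split axis, straddling intervals are clipped at the split, and each child collects opening/closing counts in `k` regular bins on that same axis only. All names are hypothetical.

```python
def bin_index(x, lo, hi, k):
    """Map coordinate x in [lo, hi] to one of k regular bins in O(1)."""
    if hi <= lo:
        return 0
    return min(k - 1, max(0, int((x - lo) / (hi - lo) * k)))


def stream_partition(intervals, split, lo, hi, k):
    """One pass over an unsorted list: distribute and bin for both children.

    Returns (left_list, left_open, left_close), (right_list, right_open, right_close).
    """
    left, right = [], []
    left_open, left_close = [0] * k, [0] * k
    right_open, right_close = [0] * k, [0] * k
    for a, b in intervals:
        if a < split or b <= split:  # overlaps the left child
            clipped = min(b, split)
            left.append((a, clipped))
            left_open[bin_index(a, lo, split, k)] += 1
            left_close[bin_index(clipped, lo, split, k)] += 1
        if b > split:  # overlaps the right child
            clipped = max(a, split)
            right.append((clipped, b))
            right_open[bin_index(clipped, split, hi, k)] += 1
            right_close[bin_index(b, split, hi, k)] += 1
    return (left, left_open, left_close), (right, right_open, right_close)
```

The input is read front to back and the outputs are only appended to, so all memory accesses are sequential; the per-child bin counts are what the next level needs to pick its own split position.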

SAH Cost Estimation

Cost function typically varies only slowly
  No need to evaluate SAH at every event
  Use sampling!
[Figure: SAH cost (6000 to 18000) over the split position (-0.4 to 0.5), marking the real minimum and the minimum found.]
Naïve approach
  For every event: check all samples → O(kN)
How to sample efficiently?

Efficient Sampling

Two step approach
  #Objects to left of sample = #opening events to its left
  #Objects to right of sample = #closing events to its right
Count opening/closing events between samples
  Regular sampling → index computation in O(1)
Reconstruct left/right object counts at samples
  Using two partial sums from left and right → O(k+N)
[Figure: example with three objects and four bins. Opening/closing events per bin: 1/0, 1/1, 1/1, 0/1. Left/right object counts at the five sample positions: 0/3, 1/3, 2/2, 3/1, 3/0.]
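The two steps can be sketched as follows, using 1D intervals for brevity. The function names and the interval representation are assumptions for the example; the counts in the usage note match the example on the slide.

```python
from itertools import accumulate


def bin_index(x, lo, hi, k):
    """Regular sampling: the bin of coordinate x is found in O(1)."""
    if hi <= lo:
        return 0
    return min(k - 1, max(0, int((x - lo) / (hi - lo) * k)))


def bin_events(intervals, lo, hi, k):
    """Step 1, O(N): count opening and closing events between samples."""
    opens, closes = [0] * k, [0] * k
    for a, b in intervals:
        opens[bin_index(a, lo, hi, k)] += 1
        closes[bin_index(b, lo, hi, k)] += 1
    return opens, closes


def counts_at_samples(opens, closes):
    """Step 2, O(k): object counts at the k + 1 sample positions.

    left[j]  = opening events in bins 0 .. j-1 (prefix sum from the left)
    right[j] = closing events in bins j .. k-1 (prefix sum from the right)
    """
    left = [0] + list(accumulate(opens))
    right = [0] + list(accumulate(reversed(closes)))
    right.reverse()
    return left, right
```

With the intervals `(0.5, 1.5)`, `(1.25, 2.5)`, `(2.25, 3.5)` on the range 0 to 4 and four bins, the bin counts are `[1, 1, 1, 0]` (opening) and `[0, 1, 1, 1]` (closing), and the reconstructed counts are `[0, 1, 2, 3, 3]` on the left and `[3, 3, 2, 1, 0]` on the right.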

Refining of Samples

SAH: sum of two monotone functions, $C_l$ and $C_r$
Cost $C$ between two samples $a < b$ is bounded from below
  $C_{\min} = \min(C_l) + \min(C_r) = C_l(a) + C_r(b)$
Resample areas where $C_{\min} <$ current minimum
Typically only few intervals need to be re-sampled (< 5%)
[Figure: $C_l$, $C_r$ and $C = C_l + C_r$ plotted over the split position (-0.4 to 0.5), with the current minimum marked.]
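A minimal sketch of the refinement test, assuming `c_left` and `c_right` hold the values of $C_l$ and $C_r$ at consecutive sample positions (hypothetical names). An interval between samples $a$ and $b$ is kept for resampling only if its lower bound $C_l(a) + C_r(b)$ is below the best cost found at any sample.

```python
def intervals_to_refine(c_left, c_right):
    """Return indices j of intervals [sample j, sample j+1] worth resampling.

    c_left must be non-decreasing and c_right non-increasing along the axis,
    so C_l(a) + C_r(b) bounds the cost inside the interval from below.
    """
    totals = [l + r for l, r in zip(c_left, c_right)]
    current_min = min(totals)
    return [
        j
        for j in range(len(totals) - 1)
        if c_left[j] + c_right[j + 1] < current_min
    ]
```

In the made-up example `c_left = [0, 1, 2, 50, 100]`, `c_right = [100, 50, 3, 2, 1]`, the sampled minimum is 5 and only the two intervals adjacent to it have a lower bound (4) below that value; the outer intervals are discarded without further evaluation.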

Algorithm properties

Streaming memory accesses
SAH cost function estimated by sampling
No initial sorting required
Refining of samples

Improvements

Conventional algorithm
  Use radix sort: O(N)
    Fastest algorithm if data set fits into caches
  No need to order events at same position
    Count opening/closing events instead
    Removes one radix sort pass
Multiple cores → parallelize build
  Most time spent in the lower tree levels
  One sub-tree → one core
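As an illustration of an O(N) sort on event positions, here is a sketch of a least-significant-digit radix sort on 32-bit floating point keys. The key transform and the choice of four 8-bit passes are standard techniques picked for the example and are not taken from the slides. The sort looks at positions only; it does not order opening and closing events that share a position.

```python
import struct


def float_key(x):
    """Map x (rounded to a 32-bit IEEE 754 float) to an unsigned int
    whose integer order matches the floating point order."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits & 0x80000000:
        return bits ^ 0xFFFFFFFF  # negative: flip all bits
    return bits | 0x80000000      # non-negative: set the sign bit


def radix_sort(values):
    """LSD radix sort with four 8-bit passes; each pass is stable and O(N)."""
    keyed = [(float_key(v), v) for v in values]
    for byte in range(4):
        shift = 8 * byte
        buckets = [[] for _ in range(256)]
        for item in keyed:
            buckets[(item[0] >> shift) & 0xFF].append(item)
        keyed = [item for bucket in buckets for item in bucket]
    return [v for _, v in keyed]
```

A production version would use counting and scatter into preallocated arrays instead of Python lists of buckets; the list form is used here only to keep the passes easy to follow.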

Results

Speed-up up to 50%
  Only effective in the upper levels
  Limited by copying of object/events
  The larger the scene, the higher the speedup
  Performance independent of triangle order
Small decrease in traversal performance (< 2%)
  With 1024 samples
Multi-threading
  2.43x @ 4 cores (no local memory management)

Future Work

Fully multi-threaded implementation
  Careful memory management on NUMA architectures
Extend to other spatial index structures
  BVHs, BKD trees, SKD trees, …
[Figure: NUMA system with four CPUs and four memory banks.]

Conclusion

Streaming construction algorithm
  50% speedup
Cost function sampling
  Very low quality degradation
Refining of samples

Thank you!

Advantages

Sequential memory access in the upper levels
Small data footprint in conventional build
  Fits in caches
  Radix sort is efficient
Fewer computations needed for split plane position estimation
But what about the tree cost?

Memory Management

Use two arrays and alternate them
Object count for node $n$ = $i_{n+1} - i_n$
Sift to second array
  Object count += SP
[Figure: the objects of node $n$ lie between index-array entries $i_n$ and $i_{n+1}$. After sifting, the second array holds the regions "left only", "SP" (twice) and "right only"; the left child's and right child's objects are delimited by index-array entries $i_m$, $i_{m+1}$ and $i_{m+2}$.]
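One way to read the figure is sketched below, assuming that "SP" denotes the objects overlapping the split plane and that they are written twice, once at the end of the left child's range and once at the start of the right child's range. The function and parameter names are hypothetical, and the three filtering passes stand in for what would be a single sweep in a tuned implementation.

```python
def sift_node(src, begin, end, side_of, dst):
    """Copy the objects src[begin:end] of one node into the second array.

    side_of(obj) returns "left", "right" or "both".
    Returns the three index-array entries (i_m, i_m1, i_m2): the left child
    owns dst[i_m:i_m1], the right child owns dst[i_m1:i_m2].
    """
    objs = src[begin:end]
    left_only = [o for o in objs if side_of(o) == "left"]
    straddling = [o for o in objs if side_of(o) == "both"]
    right_only = [o for o in objs if side_of(o) == "right"]

    i_m = len(dst)
    dst.extend(left_only)
    dst.extend(straddling)   # first copy, belongs to the left child
    i_m1 = len(dst)
    dst.extend(straddling)   # second copy, belongs to the right child
    dst.extend(right_only)
    i_m2 = len(dst)
    return i_m, i_m1, i_m2
```

The object count of each child is the difference of consecutive index entries, and the total grows by the number of straddling objects. On the next tree level the roles of the two arrays are swapped.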

SAH tree cost

Optimal KD tree for ray tracing
  SAH based
  Minimize average expected traversal cost of an arbitrary ray
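For orientation, the cost of splitting a node with bounding volume $V$ at plane $p$ is commonly written as below. This is the usual textbook form of the surface area heuristic, given here as an assumption about the notation rather than a transcription of the slide; $C_t$ and $C_i$ stand for the cost of a traversal step and of a ray-object intersection, $SA(\cdot)$ is the surface area, and $N_l$, $N_r$ are the object counts of the two children.

```latex
C(p) = C_t + C_i \left( \frac{SA(V_l)}{SA(V)}\, N_l + \frac{SA(V_r)}{SA(V)}\, N_r \right)
```

Up to the constant factors, the two terms in parentheses are the monotone functions $C_l$ and $C_r$ used when refining samples.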

SAH computation

Efficient computation: extract & sort events in advance
Compute incrementally
  Keep track of objects on left/right
Evaluate after close events and before open events
[Figure: two objects (1 and 2) in the X-Y plane and the resulting SAH cost plotted over the split position X.]

Alternative Multi-Threading

required on NUMA architectures)
Sub-tree → core not suitable for the first log(#cores) levels
  Also unsuitable for some architectures (Cell)
Alternative
  Gather event counts in bins at each core
  Merge counts before actual cost evaluation
[Figure: two CPUs.]
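The alternative scheme can be sketched as follows: every worker bins its own chunk of the object list, and the per-worker counts are summed before the cost function is evaluated. The chunking, the 1D interval representation and all names are assumptions for the example. Python threads are used only to show the structure; they do not give a real speed-up for this kind of loop.

```python
from concurrent.futures import ThreadPoolExecutor


def bin_index(x, lo, hi, k):
    """Map coordinate x in [lo, hi] to one of k regular bins."""
    if hi <= lo:
        return 0
    return min(k - 1, max(0, int((x - lo) / (hi - lo) * k)))


def count_chunk(chunk, lo, hi, k):
    """Per-core step: private opening/closing counts for one chunk."""
    opens, closes = [0] * k, [0] * k
    for a, b in chunk:
        opens[bin_index(a, lo, hi, k)] += 1
        closes[bin_index(b, lo, hi, k)] += 1
    return opens, closes


def parallel_bin_counts(intervals, lo, hi, k, workers=2):
    """Gather counts per worker, then merge them bin by bin."""
    size = max(1, (len(intervals) + workers - 1) // workers)
    chunks = [intervals[i:i + size] for i in range(0, len(intervals), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda c: count_chunk(c, lo, hi, k), chunks))
    opens = [sum(p[0][j] for p in partial) for j in range(k)]
    closes = [sum(p[1][j] for p in partial) for j in range(k)]
    return opens, closes
```

Since each worker writes only to its own bins, no synchronization is needed until the merge, whose cost depends on the number of bins and workers rather than on the number of objects.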

Extension: Multi-Threading

Multiple cores → parallelize build
  Most time spent in the lower tree levels
  One sub-tree → one core