Transcript Experiences with fast KD tree construction
Experiences with Streaming Construction of SAH KD Trees
Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek
Motivation
Large speed-up of ray tracing lately
  Better algorithms (packet tracing [Wald04, Reshetov05])
  Optimized spatial index structures
  Best known: KD trees [Havran00]
  Faster hardware
Research concentrated mainly on static scenes
Dynamic scenes
  Building is slow for SAH-based KD trees
  Done in a pre-processing step
Stefan Popov Streaming Construction of KD Trees
Dynamic Scenes Approaches
Embed dynamics in the index structure
  Use a two-level approach [Wald03]
  Fuzzy KD trees [Günther06]
Update index structure
  Grids, BVHs and KD tree hybrids
  Faster build/update
  Lower traversal performance
  No efficient approach for KD trees
Rebuild entire KD tree
  Need to make it fast
  Lazy build
SAH Algorithm
Extract & sort events in advance
  Abstract objects with AABBs
  Events given by AABB boundaries
Recursive top-down construction
  Find split plane using SAH (compute minimum cost)
  Distribute objects to children by distributing the events
  Keep them sorted
[Figure: objects 1 to 8 along X; the parent list 1, 2, 3, 4, 5, 6, 7, 8 is split into Left: 1, 2, 3, 4 and Right: 4, 5, 6, 7, 8]
SAH Cost Function
Piecewise linear
Discontinuities at object boundaries
Evaluate only before opening and after closing event
[Figure: SAH cost along X for a scene with two objects, 1 and 2]
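The slide does not spell out the cost formula. As a minimal sketch, assuming the commonly used SAH form (traversal cost plus intersection cost weighted by the surface-area ratios of the two children), the cost of one candidate split could be evaluated as below. The names `surface_area` and `sah_cost` and the constants `c_trav` and `c_isect` are illustrative assumptions, not taken from the talk.

```python
def surface_area(lo, hi):
    """Surface area of an axis-aligned box given its min and max corners."""
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(lo, hi, axis, pos, n_left, n_right, c_trav=1.0, c_isect=1.0):
    """SAH cost of splitting the node box [lo, hi] at `pos` along `axis`.

    c_trav and c_isect are assumed placeholder constants.
    """
    left_hi = list(hi)
    left_hi[axis] = pos      # left child box ends at the split plane
    right_lo = list(lo)
    right_lo[axis] = pos     # right child box starts at the split plane
    weighted = (surface_area(lo, left_hi) * n_left
                + surface_area(right_lo, hi) * n_right)
    return c_trav + c_isect * weighted / surface_area(lo, hi)


# Unit cube split in the middle with two objects on each side.
cost = sah_cost((0, 0, 0), (1, 1, 1), axis=0, pos=0.5, n_left=2, n_right=2)
```

With fixed object counts the cost is linear in `pos`, which is why the function is piecewise linear and only needs to be evaluated at object boundaries.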
Distribution Along the Split Axis
Given: event list & split position
Sweep event list and classify
  Open event before split: label object “both”
  Open event after split: label object “right”
  Close event before split: relabel object “left”
Copy event to corresponding child’s list
Might have to insert new events
Random memory access
[Figure: events along X labeled “left”, “both”, or “right” and copied to the Left and Right lists]
Distribution Along the Other Axes
Sweep event lists. Copy event to
  Left, if corresponding object labeled “left” or “both”
  Right, if corresponding object labeled “right” or “both”
Look up in object array
  Random memory access
[Figure: events along Y copied to the Left and Right lists according to the labels “left”, “both”, and “right”]
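The labeling and distribution steps can be sketched as follows. This is a simplified illustration that labels objects directly from their AABB extents instead of sweeping event lists, and the tie-breaking rule for objects touching the split plane is an assumption. The helper names `classify` and `distribute` are hypothetical.

```python
def classify(boxes, axis, split):
    """Label each AABB (lo, hi) as 'left', 'right', or 'both'.

    Assumption: a box starting exactly at the split plane goes right,
    a box ending exactly at the split plane goes left.
    """
    labels = []
    for lo, hi in boxes:
        if lo[axis] >= split:
            labels.append("right")
        elif hi[axis] <= split:
            labels.append("left")
        else:
            labels.append("both")   # box straddles the split plane
    return labels


def distribute(boxes, labels):
    """Copy each box to the child lists selected by its label."""
    left = [b for b, lab in zip(boxes, labels) if lab in ("left", "both")]
    right = [b for b, lab in zip(boxes, labels) if lab in ("right", "both")]
    return left, right


boxes = [((0, 0, 0), (1, 1, 1)), ((0.5, 0, 0), (2, 1, 1)), ((2, 0, 0), (3, 1, 1))]
labels = classify(boxes, axis=0, split=1.5)
left, right = distribute(boxes, labels)
```

The label lookup per object is what causes the random memory access on the other axes: the events there arrive in a different order than the labels were written.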
Problems of KD Tree Construction
Random memory accesses
Expensive cost function evaluation
Initial sorting is inefficient for lazy builds
Streaming Algorithm Overview
Work with unsorted lists of AABBs
  Avoid initial sorting
Sweep list once to locate initial split plane
In a single sweep
  Distribute objects (straightforward)
  Determine split positions of children
Once data fits in caches, switch to conventional build
[Figure: parent list streamed into left list and right list]
SAH Cost Estimation
Cost function typically varies only slowly
No need to evaluate SAH at every event
Use sampling!
[Figure: SAH cost (about 6000 to 18000) over split positions from -0.4 to 0.5, marking the real minimum and the minimum found]
Naïve approach
  For every event: check all samples
  O(kN)
How to sample efficiently?
Efficient Sampling
Two-step approach
#Objects to left of sample = #Opening events to its left
#Objects to right of sample = #Closing events to its right
Count opening/closing events between samples
  Regular sampling: index computation in O(1)
Reconstruct left/right object counts at samples
  Using two partial sums from left and right
  O(k+N)
[Figure: three objects over four bins with (opening, closing) event counts (1, 0), (1, 1), (1, 1), (0, 1); the resulting (left, right) object counts at the five bin boundaries are (0, 3), (1, 3), (2, 2), (3, 1), (3, 0)]
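The two-step counting scheme can be sketched in one dimension as follows. This is a minimal illustration, assuming k regularly spaced samples strictly inside the node extent `[lo, hi]` and objects given as `(min, max)` intervals along the split axis. The function name `sample_counts` is hypothetical, and events that fall exactly on a sample are assigned to the bin on its right.

```python
def sample_counts(intervals, lo, hi, k):
    """Return (n_left, n_right) object counts at k regular interior samples.

    Step 1: bin opening/closing events, O(1) index computation per event.
    Step 2: two partial sums, from the left and from the right.
    Total work is O(k + N).
    """
    bins = k + 1                       # k samples separate k + 1 bins
    scale = bins / (hi - lo)
    opens = [0] * bins
    closes = [0] * bins
    for a, b in intervals:
        opens[min(bins - 1, max(0, int((a - lo) * scale)))] += 1
        closes[min(bins - 1, max(0, int((b - lo) * scale)))] += 1

    n_left = [0] * k
    n_right = [0] * k
    acc = 0
    for j in range(k):                 # opening events left of sample j
        acc += opens[j]
        n_left[j] = acc
    acc = 0
    for j in range(k - 1, -1, -1):     # closing events right of sample j
        acc += closes[j + 1]
        n_right[j] = acc
    return n_left, n_right


# Three objects over [0, 4] with samples at 1, 2, 3.
n_left, n_right = sample_counts([(0.5, 1.5), (1.2, 2.5), (2.2, 3.5)], 0.0, 4.0, 3)
```

Each object is touched once and each bin twice, so no per-event loop over all samples is needed.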
Refining of Samples
SAH is the sum of two monotone functions, $C_l$ and $C_r$
Cost between two samples $a < b$ is bounded from below:
  $C_{\min} = \min(C_l) + \min(C_r) = C_l(a) + C_r(b)$
Resample areas where $C_{\min}$ < current minimum
Typically only few intervals need to be re-sampled (< 5%)
[Figure: $C_l$, $C_r$, and $C = C_l + C_r$ (0 to 18000) over split positions from -0.4 to 0.5, with the current minimum marked]
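The refinement test can be sketched as follows: for each interval between neighboring samples, the lower bound is the left cost term at the interval's left end plus the right cost term at its right end, and only intervals whose bound lies below the best sampled cost are resampled. The function name `intervals_to_refine` and the input layout (one value of each term per sample, in position order) are assumptions for illustration.

```python
def intervals_to_refine(c_left, c_right):
    """Return (best sampled cost, indices of intervals worth resampling).

    c_left must be non-decreasing and c_right non-increasing, so on the
    interval between samples i and i + 1 the cost cannot drop below
    c_left[i] + c_right[i + 1].
    """
    best = min(l + r for l, r in zip(c_left, c_right))
    refine = []
    for i in range(len(c_left) - 1):
        lower_bound = c_left[i] + c_right[i + 1]
        if lower_bound < best:
            refine.append(i)           # a better minimum may hide here
    return best, refine


best, refine = intervals_to_refine([0, 1, 2, 9, 10], [10, 9, 2, 1, 0])
```

In this made-up example only the two intervals next to the sampled minimum can contain a lower cost, so the outer intervals are skipped.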
Algorithm properties
Streaming memory accesses
SAH cost function estimated by sampling
No initial sorting required
Refining of samples
Improvements
Conventional algorithm
  Use radix sort, O(N)
  Fastest algorithm if data set fits into caches
  No need to order events at same position
  Count opening/closing events instead
  Removes one radix sort pass
Multiple cores parallelize build
  Most time spent in the lower tree levels
  One sub-tree per core
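The slides do not show how the radix sort is implemented. As a rough sketch of one common way to radix-sort floating-point event positions, the example below maps each value to a 32-bit key whose unsigned order matches the float order and then runs four byte-wise passes. The single-precision keys, the key mapping, and the names `float_key` and `radix_sort` are assumptions, not the authors' code.

```python
import struct


def float_key(x):
    """Map a float (rounded to single precision) to an order-preserving uint32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Negative values: flip all bits. Non-negative values: set the sign bit.
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000


def radix_sort(values):
    """Stable LSD radix sort with four 8-bit passes, O(N) for fixed key width."""
    items = [(float_key(v), v) for v in values]
    for shift in (0, 8, 16, 24):
        buckets = [[] for _ in range(256)]
        for item in items:
            buckets[(item[0] >> shift) & 0xFF].append(item)
        items = [item for bucket in buckets for item in bucket]
    return [v for _, v in items]


sorted_positions = radix_sort([3.5, -1.0, 0.0, 2.25, -7.5])
```

Values that differ only beyond single precision receive the same key here, so their relative order is simply their input order.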
Results
Speed-up up to 50%
  Only effective in the upper levels
  Limited by copying of objects/events
  The larger the scene, the higher the speed-up
Performance independent of triangle order
Small decrease in traversal performance (< 2%)
  With 1024 samples
Multi-threading
  2.43x @ 4 cores (no local memory management)
Future Work
Fully multi-threaded implementation
  Careful memory management on NUMA architectures
  [Figure: four CPUs, each with its own memory]
Extend to other spatial index structures
  BVHs, BKD trees, SKD trees, …
Conclusion
Streaming construction algorithm
  Up to 50% speed-up
Cost function sampling
  Very low quality degradation
Refining of samples
Thank you!
Advantages
Sequential memory access in the upper levels
Small data footprint in conventional build
  Fits in caches
  Radix sort is efficient
Fewer computations needed for split plane position estimation
But what about the tree cost?
Memory Management
Use two arrays and alternate them
Object count for node $n$ = $i_{n+1} - i_n$
Sift to second array
  Object count += SP
[Figure: the objects of node n, delimited by index array entries $i_n$ and $i_{n+1}$, are sifted to the second array as the left child's objects (left only, SP) and the right child's objects (SP, right only), delimited by index array entries $i_m$, $i_{m+1}$, $i_{m+2}$; the figure is annotated "SP x 2"]
SAH tree cost
Optimal KD tree for ray tracing
  SAH based
  Minimize average expected traversal cost of an arbitrary ray
SAH computation
Efficient computation: extract & sort events in advance
Compute incrementally
  Keep track of objects on left/right
  Evaluate after a close event and before an open event
[Figure: SAH cost along X for a scene with two objects, 1 and 2]
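The incremental evaluation can be sketched in one dimension as below: events are sorted once, the left and right object counts are updated while sweeping, and the cost is evaluated after the close events and before the open events at each position. The cost callback, the example cost for a node of extent 0 to 10 with a unit cross-section, and all function names are assumptions for illustration. Objects with zero extent along the axis are not handled.

```python
CLOSE, OPEN = 0, 1   # CLOSE sorts before OPEN at equal positions


def sweep_best_split(intervals, cost):
    """Find the event position with minimum cost(pos, n_left, n_right)."""
    events = []
    for a, b in intervals:
        events.append((a, OPEN))
        events.append((b, CLOSE))
    events.sort()

    n_left, n_right = 0, len(intervals)
    best_pos, best_cost = None, float("inf")
    i = 0
    while i < len(events):
        pos = events[i][0]
        while i < len(events) and events[i] == (pos, CLOSE):
            n_right -= 1               # object ended: no longer on the right
            i += 1
        c = cost(pos, n_left, n_right)  # after closes, before opens
        if c < best_cost:
            best_pos, best_cost = pos, c
        while i < len(events) and events[i] == (pos, OPEN):
            n_left += 1                # object started: now on the left
            i += 1
    return best_pos, best_cost


def example_cost(pos, n_left, n_right, lo=0.0, hi=10.0):
    """Assumed SAH-style cost for a node with a 1 x 1 cross-section."""
    def area(length):
        return 2.0 + 4.0 * length
    weighted = area(pos - lo) * n_left + area(hi - pos) * n_right
    return 1.0 + weighted / area(hi - lo)


best_pos, best_cost = sweep_best_split([(0, 1), (1, 2), (8, 10)], example_cost)
```

Because the counts are carried along the sweep, each candidate costs O(1) after the initial sort.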
Alternative Multi-Threading
required on NUMA architectures)
One sub-tree per core is not suitable for the first log(#cores) levels
Also unsuitable for some architectures (Cell)
Alternative
  Gather event counts in bins at each core
  Merge counts before actual cost evaluation
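The alternative scheme can be sketched as follows: each worker bins the opening and closing events of its own share of the objects, and the per-worker bins are summed before any cost is evaluated. The thread pool only stands in for the cores here, and the names `count_events` and `merged_counts`, the chunking, and the one-dimensional intervals are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_events(chunk, lo, hi, bins):
    """Bin the opening and closing events of one worker's objects."""
    opens, closes = [0] * bins, [0] * bins
    scale = bins / (hi - lo)
    for a, b in chunk:
        opens[min(bins - 1, max(0, int((a - lo) * scale)))] += 1
        closes[min(bins - 1, max(0, int((b - lo) * scale)))] += 1
    return opens, closes


def merged_counts(intervals, lo, hi, bins, workers=4):
    """Gather per-worker bins, then merge them by element-wise summation."""
    chunks = [intervals[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda c: count_events(c, lo, hi, bins), chunks))
    opens = [sum(p[0][b] for p in partial) for b in range(bins)]
    closes = [sum(p[1][b] for p in partial) for b in range(bins)]
    return opens, closes


objects = [(0.5, 1.5), (1.2, 2.5), (2.2, 3.5)]
opens, closes = merged_counts(objects, 0.0, 4.0, bins=4, workers=2)
```

Since the workers write only to their own bins, no synchronization is needed until the merge, which costs O(bins x workers).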
Extension: Multi-Threading
Multiple cores parallelize build
  Most time spent in the lower tree levels
  One sub-tree per core