Cache-Efficient Layouts of Bounding Volume Hierarchies (BVHs)

Sung-Eui Yoon

Lawrence Livermore National Laboratory

Dinesh Manocha

Univ. of North Carolina at Chapel Hill

Goal

Compute cache-coherent layouts of bounding volume hierarchies (BVHs)

For various geometric applications

Handles any kind of BVH and spatial partitioning hierarchy (e.g., kd-tree)

Bounding Volume Hierarchies (BVHs)

Widely used data structures in:

Ray tracing

Collision detection

Visibility culling

Figure: example images of ray tracing and dynamic simulation

Bounding Volumes (BVs)

Axis-aligned bounding boxes (AABBs)

Oriented bounding boxes (OBBs) [Gottschalk et al. 96]

Spheres [Hubbard 93]

Triangles of a mesh

Discrete orientation polytopes (k-DOPs) [Klosowski et al. 98]


Layout of BVHs

Nodes (and triangles) of BVHs are stored in arrays

What is a good layout?

How to compute cache-efficient layouts?

Figure: a layout method maps the BVH nodes A, B, C, D, E to a 1D layout of nodes: A B C D E
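As a minimal sketch of what "stored in arrays" means, the following Python code lays out a small tree in breadth-first and depth-first order and turns child links into array indices. The tree shape is hypothetical (the slide's figure does not survive in this transcript), and the function names are illustrative only.

```python
from collections import deque

# Hypothetical 5-node tree: A is the root, B and C are its children,
# D and E are the children of B. The slide's actual tree is not recoverable.
CHILDREN = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": []}

def breadth_first_layout(children, root):
    """Return nodes in the order a breadth-first layout stores them."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children[node])
    return order

def depth_first_layout(children, root):
    """Return nodes in the order a pre-order depth-first layout stores them."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(children[node]))
    return order

def to_array(children, order):
    """Store nodes in one array; child links become array indices."""
    index = {node: i for i, node in enumerate(order)}
    return [(node, [index[c] for c in children[node]]) for node in order]

bfl = breadth_first_layout(CHILDREN, "A")  # ['A', 'B', 'C', 'D', 'E']
dfl = depth_first_layout(CHILDREN, "A")    # ['A', 'B', 'D', 'E', 'C']
```

The same tree yields different 1D orders, so which nodes share a cache block depends on the layout method.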

Motivation

Lower growth rate of data access speed

Growth rate during 1993 – 2004

Disk access speed: 1.5X

RAM access speed: 20X

CPU speed: 46X

Courtesy: http://www.hcibook.com/e3/online/moores-law/

Memory Hierarchies and Caches

Figure: CPU, fast memory or cache, and slow memory (disk), with block transfer between fast and slow memory

Access time: 10^-6 sec, 10^-4 sec, 1 sec
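Because data moves between slow and fast memory in whole blocks, the cost of a traversal depends on how many distinct blocks its nodes occupy. The sketch below counts those blocks for a given layout. The node size (32 bytes), block size (64 bytes), tree, and access trace are all assumptions chosen for illustration.

```python
def blocks_touched(layout, accessed, node_bytes, block_bytes):
    """Count distinct blocks touched when the `accessed` nodes are read.

    layout: node ids in storage order (position i starts at byte i * node_bytes).
    Assumes node_bytes divides block_bytes, so no node straddles two blocks.
    """
    position = {node: i for i, node in enumerate(layout)}
    nodes_per_block = block_bytes // node_bytes
    return len({position[n] // nodes_per_block for n in accessed})

# Complete binary tree with 7 nodes, numbered heap-style (children of i: 2i, 2i + 1).
bfl = [1, 2, 3, 4, 5, 6, 7]   # breadth-first layout
dfl = [1, 2, 4, 5, 3, 6, 7]   # depth-first layout
trace = [1, 2, 4, 5]          # hypothetical traversal: root, node 2, and both its children

bfl_blocks = blocks_touched(bfl, trace, node_bytes=32, block_bytes=64)  # 3
dfl_blocks = blocks_touched(dfl, trace, node_bytes=32, block_bytes=64)  # 2
```

For this particular trace the depth-first layout touches fewer blocks; a different trace can favor the other layout, which is why the access pattern has to be modeled.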


Main Contributions

An algorithm computing cache-efficient layouts of BVHs

Probabilistic model

Simple layout construction method

Applicable to spatial partitioning hierarchies

Related Work

Mesh layouts

Layouts of search trees

Layouts of BVHs

Related Work

Mesh layouts

Cache-coherent layouts of meshes and graphs [Yoon et al. 05, Yoon and Lindstrom 06]

Require an input graph that represents access patterns on a BVH

Layouts of search trees

Layouts of BVHs

Related Work

Mesh layouts

Layouts of search trees

[Gil and Itai 99, Alstrup et al. 03]

Require a function giving the probability that each node will be accessed

Layouts of BVHs

Related Work

Mesh layouts

Layouts of search trees

Layouts of BVHs

Studied in collision detection [Ericson 04] and ray tracing [Havran 97]

Blocking-based layouts [Terdiman 03, van Emde Boas 77]

Outline

Probabilistic model

Layout computation

Results


Traversals of Collision Queries on BVHs

Takes two objects

Two 3D objects for collision detection

One 3D object and one ray for ray tracing

Figure: BVH 1 and BVH 2, one hierarchy per object
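A query that takes two objects walks both hierarchies at once. The following is a minimal sketch of such a simultaneous traversal with axis-aligned boxes; it records every visited node pair so the access pattern is visible. The descent rule (descend the first hierarchy while it has children) and the recursive depth-first order are assumptions made for brevity, not necessarily what the authors' collision code does.

```python
def overlaps(a, b):
    """True if two axis-aligned boxes ((min x, y, z), (max x, y, z)) intersect."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

def collide(bvh1, bvh2, n1, n2, accessed):
    """Test node n1 of bvh1 against node n2 of bvh2, recursing on overlap.

    Each BVH maps a node name to (box, children). Every visited pair is
    appended to `accessed`; overlapping leaf pairs are returned.
    """
    accessed.append((n1, n2))
    box1, kids1 = bvh1[n1]
    box2, kids2 = bvh2[n2]
    if not overlaps(box1, box2):
        return []
    if not kids1 and not kids2:
        return [(n1, n2)]
    hits = []
    if kids1:  # assumption: descend the first hierarchy first
        for child in kids1:
            hits += collide(bvh1, bvh2, child, n2, accessed)
    else:
        for child in kids2:
            hits += collide(bvh1, bvh2, n1, child, accessed)
    return hits

# Hypothetical data: a three-node BVH against a single box.
bvh1 = {
    "A": (((0, 0, 0), (4, 1, 1)), ["B", "C"]),
    "B": (((0, 0, 0), (1, 1, 1)), []),
    "C": (((3, 0, 0), (4, 1, 1)), []),
}
bvh2 = {"R": (((3.5, 0, 0), (5, 1, 1)), [])}

accessed = []
hits = collide(bvh1, bvh2, "A", "R", accessed)
# hits == [("C", "R")]; accessed == [("A", "R"), ("B", "R"), ("C", "R")]
```

A ray query follows the same pattern with the second hierarchy replaced by a single ray.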

Two Localities

Parent-child locality

Spatial locality

Parent-child Locality

Figure: nodes A and B in BVH 1 and BVH 2

Spatial Locality

Figure: nodes C, D, and E in BVH 1 and BVH 2

Probabilistic Model

Quantify localities in a uniform way

Measure the probability for localities

Based on geometric relationships between bounding volumes


Probabilistic Model

$\Pr(n)$: probability that a node, n, will be accessed during runtime traversal

Two major factors

Prob. that p is accessed

Conditional prob. that p is also intersected given g is intersected

$$\Pr(n) \approx \Pr(p)\,\Pr(X_p = 1 \mid X_g = 1),$$

where $X_p$ (or $X_g$) is a boolean random variable indicating collision between p (or g) and b

Figure: nodes g, p, and n with the volume b; g is intersected, and p is accessed and intersected
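The recurrence on this slide, Pr(n) ≈ Pr(p) · Pr(X_p = 1 | X_g = 1), can be applied from the root downward once the conditional probabilities are known. The sketch below assumes they are given as plain numbers in a dictionary and that the root is always accessed; the tree, the values, and the function name are hypothetical.

```python
def access_probabilities(children, cond_prob, root):
    """Propagate Pr(n) = Pr(p) * cond_prob[p] from the root down.

    cond_prob[p] stands for Pr(X_p = 1 | X_g = 1): the chance that p is
    intersected given that its own parent g is intersected.
    Assumption: the root is always accessed, so Pr(root) = 1.
    """
    prob = {root: 1.0}
    stack = [root]
    while stack:
        p = stack.pop()
        for n in children.get(p, ()):
            prob[n] = prob[p] * cond_prob[p]
            stack.append(n)
    return prob

# Hypothetical chain g -> p -> n with made-up conditional probabilities.
children = {"g": ["p"], "p": ["n"]}
cond_prob = {"g": 0.5, "p": 0.4}
prob = access_probabilities(children, cond_prob, "g")
# prob["p"] is 0.5 and prob["n"] is about 0.2
```

Probabilities can only shrink with depth under this model, which matches the intuition that nodes near the root are accessed most often.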

Probability Computation

$\Pr(X_p = 1 \mid X_g = 1)$: conditional prob. that p is also intersected given g is intersected

Do not know any information about b

Figure: g and p, both intersected by b, and node n

Contact Space

Contact space of b against p and g

Denoted as $S_p$ and $S_g$

$$\Pr(X_p = 1 \mid X_g = 1) = \frac{\mathrm{Vol}(S_p \cap S_g)}{\mathrm{Vol}(S_g)}$$

Figure: contact spaces $S_p$ and $S_g$ of b against the intersected nodes p and g, and their overlap $S_p \cap S_g$

Contact Space

Assume b is a sphere

Computed from Minkowski sum

Figure: contact spaces $S_g$ and $S_p$ for a sphere b, and their overlap $S_p \cap S_g$

Configuration space, in general

Too expensive to compute
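For the special case of box-shaped p and g and a spherical b, the contact space is the Minkowski sum of a box and a ball, whose volume has a closed form (the Steiner formula). The sketch below uses it to evaluate Vol(S_p ∩ S_g) / Vol(S_g), under the added assumption that box p lies inside box g so that S_p ∩ S_g = S_p. This is an illustration of the idea, not the authors' implementation.

```python
import math

def contact_space_volume(extents, radius):
    """Volume of the Minkowski sum of a box and a ball (Steiner formula).

    extents: (a, b, c) edge lengths of the box; radius: radius of the sphere b.
    The sum consists of the box, slabs over its 6 faces, quarter-cylinders
    along its 12 edges, and eight sphere octants at its corners.
    """
    a, b, c = extents
    r = radius
    return (a * b * c
            + 2.0 * (a * b + b * c + c * a) * r
            + math.pi * (a + b + c) * r ** 2
            + 4.0 / 3.0 * math.pi * r ** 3)

def conditional_probability(p_extents, g_extents, radius):
    """Vol(S_p) / Vol(S_g), assuming box p lies inside box g."""
    return (contact_space_volume(p_extents, radius)
            / contact_space_volume(g_extents, radius))

# Hypothetical boxes: p is a unit cube inside the 2 x 1 x 1 box g.
point_case = conditional_probability((1, 1, 1), (2, 1, 1), 0.0)   # 0.5
sphere_case = conditional_probability((1, 1, 1), (2, 1, 1), 1.0)  # between 0.5 and 1
```

With radius 0 the ratio reduces to the ratio of the box volumes, and a larger b pushes the ratio toward 1. For general shapes of b no such closed form is available.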


Approximate Probability Computation

Assumes b to be a point, a degenerate case

Exact value is not required

Only 5% incorrect decisions compared to considering many other cases

Surface area heuristic (SAH) [MacDonald and Booth 90, Havran 00]

Equivalent to our approximation
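When b shrinks to a point, the contact spaces are the bounding volumes themselves, so the conditional probability becomes a ratio of two box measures. The sketch below takes the measure as a parameter: volume follows the Vol(·) formula from the contact-space slide, while surface area gives the ratio used by SAH. The slides do not state which measure the authors' code uses, so both are shown as assumptions, and box p is assumed to lie inside box g.

```python
def volume(box):
    """Volume of an axis-aligned box given as ((min x, y, z), (max x, y, z))."""
    lo, hi = box
    return (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])

def surface_area(box):
    """Surface area of an axis-aligned box, the measure used by SAH."""
    lo, hi = box
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def point_probability(p, g, measure=volume):
    """Approximate Pr(X_p = 1 | X_g = 1) for a point b.

    Assumes box p lies inside box g, so S_p ∩ S_g = S_p = p and S_g = g.
    """
    return measure(p) / measure(g)

# Hypothetical boxes: p is a unit cube inside the 2 x 1 x 1 box g.
g = ((0.0, 0.0, 0.0), (2.0, 1.0, 1.0))
p = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
by_volume = point_probability(p, g)                # 0.5
by_area = point_probability(p, g, surface_area)    # about 0.6
```

Either ratio needs only the two boxes, so it can be evaluated for every node in a single pass over the hierarchy.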



Overview of Layout Algorithm

Cache-oblivious layout computation

Does not assume any particular cache block size

Designed to work well with various (geometric) block sizes [Yoon and Lindstrom 06]

Two main steps in recursion

Cluster construction w/ parent-child locality

Layout of clusters w/ spatial locality

Clustering

Minimize the working set size during collision queries

Maximize the sum of probabilities of nodes in a cluster

NP-complete even for cache-aware layout given a search query [Gil and Itai 99]


Greedy Clustering

Employ top-down greedy clustering

Compute clusters of balanced size

Maintain convexity [Gil and Itai 99]

Figure: example tree with node probabilities 0.9, 0.5, 0.8, and 0.1, and the resulting cluster
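One way to picture top-down greedy clustering is to grow a cluster from a subtree root, always absorbing the most probable node on its frontier, and then recurse on the subtrees left outside. The sketch below does this with a fixed maximum cluster size, which is a simplifying assumption (the slides do not give the authors' size rule). The tree and its probabilities are hypothetical.

```python
import heapq

def grow_cluster(children, prob, root, max_size):
    """Grow one cluster from `root`, absorbing the most probable frontier node.

    A node can join only after its parent has joined, so the cluster stays a
    connected subtree containing `root`. Returns (cluster, roots left outside).
    """
    cluster = []
    frontier = [(-prob[root], root)]  # heapq is a min-heap, so negate
    while frontier and len(cluster) < max_size:
        _, node = heapq.heappop(frontier)
        cluster.append(node)
        for child in children.get(node, ()):
            heapq.heappush(frontier, (-prob[child], child))
    return cluster, sorted(node for _, node in frontier)

def cluster_tree(children, prob, root, max_size):
    """Decompose the subtree under `root` into clusters, top-down."""
    cluster, rest = grow_cluster(children, prob, root, max_size)
    clusters = [cluster]
    for sub_root in rest:
        clusters += cluster_tree(children, prob, sub_root, max_size)
    return clusters

# Hypothetical tree with made-up access probabilities.
children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
prob = {"A": 1.0, "B": 0.9, "C": 0.5, "D": 0.8, "E": 0.1, "F": 0.4, "G": 0.3}
clusters = cluster_tree(children, prob, "A", max_size=3)
# [['A', 'B', 'D'], ['C', 'F', 'G'], ['E']]
```

The root cluster takes the high-probability path A, B, D, so the nodes most likely to be accessed together end up stored together.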

Layout of Clusters

Uses cache-oblivious layouts of meshes [Yoon et al. 05]

Spatial locality



Results

Collision detection

Uses oriented bounding boxes (OBBs) [Gottschalk et al. 96]

Breadth-first tree traversal

Ray tracing

Uses a kd-tree [Wald 04]

Depth-first tree traversal


Collision Detection – Robot and Power Plant Models

Robot model: 20k triangles

Power plant model: 1M triangles

Collision Detection – Performance Comparison I

41% ~ 500% performance improvement

Figure: working set size (KB) and collision time (ms/100) for different layouts

COLBVH: our cache-oblivious layout

VEB: van Emde Boas layout

BFL: breadth-first layout

COML: cache-oblivious mesh layout

DFL: depth-first layout

Collision Detection – Performance Comparison II

35% ~ 2600% performance improvement

Figure: working set size (KB) and collision time (ms/100) for different layouts (our cache-oblivious layout, van Emde Boas layout, breadth-first layout, cache-oblivious mesh layout, depth-first layout)

Cache-Oblivious Layout vs Cache-Aware Layout

Cache-aware layouts

Take advantage of block size information (4KB)

Minor performance degradation

8% compared to cache-aware layouts

Ray Tracing – Lucy Model

28 million triangles

Pentium IV with 1GB

Ray Tracing – Performance Comparison

77% ~ 180% performance improvement

Figure: render time (sec) and working set size (MB) for different layouts, including the van Emde Boas layout and BFL (breadth-first layout)

Major Differences over Other Layouts

Commonly used layouts

Consider connectivity of trees

Two improvements of our layouts

Probabilistic model based on geometry

Layout method considering two different localities


Limitations

No guarantee that our layout always improves the performance

May not improve the performance of computationally intensive queries (e.g., exact penetration depth computation)

Assumes that the collision algorithm does not use front tracking

Advantages

Generality

Works with any geometric hierarchies

Does not require cache parameters

Usability

Can gain performance improvement without modifying code

Replaces only data layouts


Conclusion

Cache-efficient layouts of BVHs

Probabilistic model

Simple layout construction method

Applied to collision detection and ray tracing

Ongoing and Future Work

Extend to other proximity and LOD queries [Yoon et al. 06]

Investigate other geometric hierarchies

Improve the quality of hierarchies

Apply to deforming models [Lauterbach et al. 06]

Acknowledgements

Model contributors

Funding agencies

Army Research Office

DARPA

Intel

Lawrence Livermore National Laboratory

Microsoft

National Science Foundation

Office of Naval Research

RDECOM

Acknowledgements

Russ Gayle

Ted Kim

Ming Lin

Peter Lindstrom

Brandon Lloyd

Valerio Pascucci

Stephane Redon

LLNL data analysis group members

Anonymous reviewers

Questions?

Thanks!


UCRL-PRES-223220

This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48.

Note: this talk is not supported or sanctioned by DoE, UC, LLNL, CASC


Additional slides


BVHs of Massive Models

Complex and massive models

Double eagle tanker (82M triangles)

Isosurface (472M)

St. Matthew (372M)

High memory requirement

Can have gigabyte data size

Memory Hierarchies

Register: size 1KB, speed 10^0 ns

Caches: size 1MB, speed 10^1 ns

Main memory: size 1GB, speed 10^2 ns

Disk storage: size > 1GB, speed 10^4 ns

Mesh Layouts

Rendering sequences

Triangle strips [Deering 95, Hoppe 99, Bogomjakov and Gotsman 02]

Processing sequences [Isenburg and Gumhold 03, Isenburg and Lindstrom 04]

Assume that the access pattern globally follows the layout order!

Mesh Layouts

Cache-aware and cache-oblivious layouts of meshes and graphs [Yoon et al. 05, Yoon and Lindstrom 06]

Require an input graph that represents access patterns on a BVH

Layouts of Search Trees

Cache-aware layout of search tree [Gil and Itai 99]

Cache-oblivious search tree layout [Alstrup et al. 03]

Require a function giving the probability that each node is accessed

Layouts of BVHs

Real-time collision detection book [Ericson 04]

Layout analysis in ray tracing [Havran 97]

Opcode [Terdiman 03]

Uses blocking

van Emde Boas layout [van Emde Boas 77]

Uses recursive blocking
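Recursive blocking in the van Emde Boas layout can be sketched for a complete binary tree: split the tree at roughly half its height, store the top subtree first, then each bottom subtree, and apply the same rule inside every piece. The code below uses heap-style node numbering and rounds the top height down, which is one common convention and an assumption here.

```python
def van_emde_boas_order(height, root=1):
    """Return node indices of a complete binary tree in van Emde Boas order.

    Nodes are numbered heap-style: the root is 1 and the children of i are
    2i and 2i + 1. `height` is the number of levels in the subtree.
    """
    if height == 1:
        return [root]
    top = height // 2        # levels in the top subtree (assumed rounding)
    bottom = height - top    # levels in each bottom subtree
    order = van_emde_boas_order(top, root)
    # Descendants of `root` that are `top` levels down root the bottom subtrees.
    first = root << top
    for sub_root in range(first, first + (1 << top)):
        order += van_emde_boas_order(bottom, sub_root)
    return order

veb = van_emde_boas_order(4)
# [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

Each three-node subtree is contiguous in the result, so a parent and its two children tend to share a block at every block size. This rule looks only at tree connectivity, not at the geometry of the bounding volumes.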

