Cache-Oblivious Mesh Layouts
Sung-Eui Yoon (1), Valerio Pascucci (2), Peter Lindstrom (2), Dinesh Manocha (1)
1: University of North Carolina - Chapel Hill
2: Lawrence Livermore National Laboratory
http://gamma.cs.unc.edu/COL
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Goal
• Compute cache-coherent layouts of polygonal meshes
♦ For geometric processing and visualization
♦ Handle any kind of polygonal model (e.g., irregular geometry)
Motivation
• High growth rate of computational power of CPUs and GPUs
[Chart: growth rate during 1993 – 2004 of disk access speed, RAM access speed, and CPU speed (vertical axis from 0 to 50). Courtesy: http://www.hcibook.com/e3/online/moores-law/]
Memory Hierarchies and Caches
[Diagram: CPU or GPU, fast memory or cache, slow memory, and disk, linked by block transfers]
Access time: 10^0 ns (fast memory or cache), 10^2 ns (slow memory), 10^6 ns (disk)
Cache-Coherent Layouts
• Cache-Aware
♦ Optimized for particular cache
parameters (e.g., block size)
• Cache-Oblivious
♦ Minimizes data access time without any
knowledge of cache parameters
♦ Directly applicable to various hardware
and memory hierarchies
CAD Model –
Double Eagle Tanker Model
82 million triangles
Irregular distribution of geometry
Isosurface and Scanned
Models
Isosurface
100M triangles
St. Matthew
372M triangles
Main Contribution
• Algorithm to compute cache-oblivious layouts of polygonal meshes
♦ Cache-oblivious metric
♦ Multilevel optimization framework
♦ Applicable to hierarchical representations
Live Demo – View-Dependent Rendering (VDR)
• Based on multiresolution hierarchy
♦ Dynamically computes simplification
♦ Cache-oblivious layout is used to minimize
GPU vertex cache misses
GeForce Go
6800 Ultra
Related Work
• Cache-coherent algorithms
• Mesh layouts
Cache-Coherent Algorithms
• Cache-aware [Coleman and
McKinley 95, Vitter 01, Sen et al.
02]
• Cache-oblivious [Frigo et al. 99,
Arge et al. 04]
Focus on specific problems such as
sorting and linear algebra computations
Mesh Layouts
• Rendering sequences
♦ Triangle strips
♦ [Deering 95, Hoppe 99, Bogomjakov and
Gotsman 02]
• Processing sequences
♦ [Isenburg and Gumhold 03, Isenburg and
Lindstrom 04]
Assume that access pattern
globally follows the layout order!
Mesh Layouts
• Space-filling curves
♦ [Sagan 94, Velho and Gomes 91, Pascucci
and Frank 01, Lindstrom and Pascucci 01,
Gopi and Eppstein 04]
Assume geometric regularity!
Outline
• Overview
• Cache-oblivious metric
• Results
Overview
[Diagram: input graph (vertices va, vb, vc, vd) → multilevel optimization, which applies local permutations evaluated with the cache-oblivious metric → resulting 1D layout of va, vb, vc, vd]
Graph-based Representation
• Undirected graph, G = (V, E)
♦ Represents access patterns of applications
• Vertex
♦ Data element (e.g., mesh vertex or mesh triangle)
• Edge
♦ Connects two vertices if they are likely to be accessed sequentially
[Figure: example graph with vertices va, vb, vc, vd]
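As an illustration of the graph-based representation, the following C++ sketch builds the edge set of an access graph from an indexed triangle list. It assumes that two mesh vertices are likely to be accessed sequentially whenever they share a triangle. The function name BuildAccessGraph and the flat index-buffer input are hypothetical and are not part of OpenCCL.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Undirected edge, stored with the smaller vertex index first.
using Edge = std::pair<int, int>;

// Builds the edge set E of an access graph G = (V, E) whose vertices are
// mesh vertices. `indices` holds three vertex indices per triangle.
// Assumption (not from the slides): vertices of the same triangle are
// likely to be accessed one after another, so each triangle side is an edge.
std::vector<Edge> BuildAccessGraph(const std::vector<int>& indices) {
    std::set<Edge> unique_edges;  // removes duplicates from shared sides
    const std::size_t num_triangles = indices.size() / 3;
    for (std::size_t t = 0; t < num_triangles; ++t) {
        for (int k = 0; k < 3; ++k) {
            const int a = indices[3 * t + k];
            const int b = indices[3 * t + (k + 1) % 3];
            if (a == b) continue;  // skip degenerate sides
            unique_edges.emplace(std::min(a, b), std::max(a, b));
        }
    }
    return std::vector<Edge>(unique_edges.begin(), unique_edges.end());
}
```

Two triangles that share a side, for example {0, 1, 2} and {1, 2, 3}, yield five unique edges because the shared side (1, 2) is stored only once.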
Problem Statement
• Vertex layout of G = (V, E)
♦ One-to-one mapping of vertices to indices in the 1D layout
$$\varphi : V \to \{1, \ldots, |V|\}$$
[Figure: vertices va, vb, vc, vd of the example graph mapped to layout positions 1, 2, 3, 4]
• Compute a $\varphi$ that minimizes the expected number of cache misses
Local Permutation
Vertex layout
Terminology
• Edge span of (va, vb): $|\varphi(v_a) - \varphi(v_b)|$
♦ $\varphi$ is the layout mapping
♦ Example: $\varphi(v_a) = 1$ and $\varphi(v_c) = 5$, so $|\varphi(v_a) - \varphi(v_c)| = 4$
Terminology
• $E_i$
♦ Set of edges having edge span i in the layout
♦ Example: $(v_a, v_c) \in E_4$, because its edge span is 4
Terminology
• Edge span distribution
♦ $|E_i|$ where i is in [1, n]
♦ Example: $|E_1| = 4$, $|E_2| = 1$, $|E_3| = 1$, $|E_4| = 1$
[Figure: example layout with each edge labeled by its edge span, and a histogram of number of edges versus edge span (1 to 4)]
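The terminology above can be made concrete with a small C++ sketch that computes the edge span distribution of a layout. The function name EdgeSpanDistribution and the array-based representation of the layout mapping are assumptions made for illustration; the layout is expected to be a permutation of 1..n.

```cpp
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// layout[v] is the 1-based position phi(v) of vertex v in the 1D layout
// (assumed to be a permutation of 1..n, where n = layout.size()).
// Returns `dist` of size n with dist[i] = |E_i| for i in [1, n-1];
// dist[0] only counts self-loops and is otherwise unused.
std::vector<int> EdgeSpanDistribution(
        const std::vector<std::pair<int, int>>& edges,
        const std::vector<int>& layout) {
    std::vector<int> dist(layout.size(), 0);
    for (const auto& e : edges) {
        // Edge span |phi(va) - phi(vb)|.
        const int span = std::abs(layout[e.first] - layout[e.second]);
        ++dist[static_cast<std::size_t>(span)];
    }
    return dist;
}
```

For a hypothetical five-vertex graph laid out in index order with edges (0,1), (1,2), (2,3), (3,4), (0,2), (0,3), and (0,4), the spans are 1, 1, 1, 1, 2, 3, and 4, which reproduces the counts |E1| = 4 and |E2| = |E3| = |E4| = 1 used in the example.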
Cache Miss Ratio Function (CMRF), $p_i$
• Probability of a cache miss for a given edge span i
[Plot: cache miss ratio $p_i$ (probability to have a cache miss, from 0 to 1) versus edge span i (from 1 to n-1)]
Number of Cache Misses at Runtime
• Estimated by multiplying two factors
♦ Runtime edge span distribution
♦ CMRF
• Example: a runtime access sequence over the 1D layout with edge spans 2, 4, and 2
$$p_2 + p_4 + p_2 = (2, 1) \cdot (p_2, p_4)$$
♦ (2, 1) is the runtime edge span distribution and $(p_2, p_4)$ is the CMRF
Expected Number of Cache Misses
$$\sum_{i=1}^{n-1} |E_i| \, p_i$$
♦ $|E_i|$: edge span distribution of the layout
♦ n: the number of vertices
♦ Approximate the runtime edge span distribution with that of the layout
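The sum above is a dot product between the edge span distribution and the CMRF, which the following C++ sketch evaluates. The slides do not fix a particular CMRF, so BlockBoundaryCmrf is a hypothetical stand-in: it assumes a single cache level with blocks of B consecutive elements and random block alignment, for which two elements at span i fall in different blocks with probability min(1, i / B). Both function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CMRF for illustration only: p_i = min(1, i / B), the chance
// that two elements i positions apart straddle a block boundary when blocks
// hold B elements. p[0] is unused. This p_i is non-decreasing in i.
std::vector<double> BlockBoundaryCmrf(std::size_t n, double block_size) {
    assert(block_size > 0.0);
    std::vector<double> p(n, 0.0);
    for (std::size_t i = 1; i < n; ++i) {
        p[i] = std::min(1.0, static_cast<double>(i) / block_size);
    }
    return p;
}

// Expected number of cache misses: sum over i in [1, n-1] of |E_i| * p_i.
// dist[i] = |E_i| and cmrf[i] = p_i; index 0 is ignored in both.
double ExpectedCacheMisses(const std::vector<int>& dist,
                           const std::vector<double>& cmrf) {
    const std::size_t m = std::min(dist.size(), cmrf.size());
    double total = 0.0;
    for (std::size_t i = 1; i < m; ++i) {
        total += dist[i] * cmrf[i];
    }
    return total;
}
```

With the distribution |E1| = 4, |E2| = |E3| = |E4| = 1 and B = 4, the sum is 4(0.25) + 0.5 + 0.75 + 1 = 3.25. A cache-oblivious metric cannot rely on one such B, which is why the later slides reason over all monotone CMRFs instead.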
Outline
• Overview
• Cache-oblivious metric
• Results
Cache-Oblivious Metric
• Decides if a local permutation
reduces number of cache misses
♦ Probabilistic formulation
♦ Reduces to geometric volume computation
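Does a Local Permutation Decrease Cache Misses?
$$\sum_{i=1}^{n-1} |E_i| \, p_i \overset{?}{>} \sum_{i=1}^{n-1} \left(|E_i| + \Delta|E_i|\right) p_i$$
♦ $|E_i|$: edge span distribution before the local permutation
♦ $|E_i| + \Delta|E_i|$: edge span distribution after the local permutation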
Does a Local Permutation Decrease Cache Misses?
$$\sum_{i=1}^{n-1} |E_i| \, p_i > \sum_{i=1}^{n-1} \left(|E_i| + \Delta|E_i|\right) p_i \iff \sum_{i=1}^{n-1} \Delta|E_i| \, p_i < 0$$
♦ $\Delta|E_i|$: change in the number of edges with edge span i caused by the local permutation
Monotonicity of CMRF, $p_i$
• Assume CMRF is a monotonically increasing function of edge span
[Plot: cache miss ratio $p_i$ (from 0 to 1) versus edge span i (from 1 to ∞)]
Exact Cache-Oblivious Metric
$$\sum_{i=1}^{n-1} \Delta|E_i| \, p_i < 0$$
where
$$0 \le p_1 \le p_2 \le \ldots \le p_{n-2} \le p_{n-1} \le 1$$
♦ The constraint expresses the monotonicity of CMRF
♦ The $p_i$ range over all the possible cache configurations
Geometric Formulation
• Half hyperspace
$$\sum_{i=1}^{n-1} \Delta|E_i| \, p_i < 0$$
• Closed hyperspace in $\mathbb{R}^{n-1}$
$$0 \le p_1 \le p_2 \le \ldots \le p_{n-2} \le p_{n-1} \le 1$$
[Figures: both regions drawn in the $(p_1, p_2)$ plane]
Geometric Volume Computation
• Assume each CMRF to be equally likely
• Half hyperspace (blue area)
♦ Space of CMRFs that reduce cache misses
$$\sum_{i=1}^{n-1} \Delta|E_i| \, p_i < 0$$
where
$$0 \le p_1 \le p_2 \le \ldots \le p_{n-2} \le p_{n-1} \le 1$$
[Figure: the half hyperspace drawn as a blue area in the $(p_1, p_2)$ plane]
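To make the volume argument tangible, the following C++ sketch estimates by Monte Carlo sampling the fraction of monotone CMRFs for which a local permutation reduces cache misses. This is a swapped-in illustration technique, not the exact or approximate volume algorithms cited in the slides. It relies on the fact that sorting independent uniform samples gives a uniform sample of the region 0 ≤ p1 ≤ ... ≤ p(n-1) ≤ 1. The function name and parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// delta[i] = change in |E_i| caused by a local permutation, for i in
// [1, n-1]; delta[0] is unused. Returns the estimated fraction of monotone
// CMRFs (each assumed equally likely) satisfying sum_i delta[i] * p_i < 0.
double FractionOfCmrfsThatImprove(const std::vector<int>& delta,
                                  int num_samples, unsigned seed) {
    if (delta.size() < 2 || num_samples <= 0) return 0.0;
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::vector<double> p(delta.size() - 1);  // p[i - 1] holds p_i
    int improved = 0;
    for (int s = 0; s < num_samples; ++s) {
        for (double& x : p) x = unit(rng);
        // Sorted uniforms are uniformly distributed over the ordered region.
        std::sort(p.begin(), p.end());
        double change = 0.0;
        for (std::size_t i = 1; i < delta.size(); ++i) {
            change += delta[i] * p[i - 1];
        }
        if (change < 0.0) ++improved;
    }
    return static_cast<double>(improved) / num_samples;
}
```

For n = 3 and a change of +2 edges at span 1 and -1 edge at span 2, the condition is 2 p1 < p2, which covers exactly half of the triangle 0 ≤ p1 ≤ p2 ≤ 1, so the estimate should be close to 0.5. A permutation would be accepted when this fraction exceeds one half.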
Geometric Volume Computation
• Time complexity
♦ Exact: O(n^(n 1)) [Lasserre and Zeron 01]
♦ Approximate: O(n^5) [Kannan et al. 97]
Fast and Approximate Volume Comparison
• Define a top polytope in closed hyperspace
• Compute the centroid, C, of the top polytope
[Figure: top polytope and its centroid, C, in the $(p_1, p_2)$ plane]
Fast and Approximate Volume Comparison
• Use the centroid for approximate volume comparison
♦ The volume containing the centroid is likely to be larger
[Figure: centroid, C, in the $(p_1, p_2)$ plane]
Bound of Approximation
• 0.1% ~ 0.3% compared to the
exact metric
Final Approximate Metric
• Pack the non-zero $\Delta|E_i|$ to indices 1, …, m, where l(j) is the edge span of the j-th packed entry
• Evaluate the inequality at the centroid
$$\sum_{j=1}^{m} \Delta|E_{l(j)}| \, j < 0$$
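A minimal C++ sketch of the final approximate metric follows. It assumes that packing keeps the non-zero entries in increasing order of edge span, so that the j-th non-zero entry receives weight j. The function names ApproximateMetric and AcceptLocalPermutation are hypothetical and do not come from OpenCCL.

```cpp
#include <cstddef>
#include <vector>

// delta[i] = change in |E_i| caused by a local permutation, for i in
// [1, n-1]; delta[0] is unused. Non-zero entries are packed, in increasing
// order of edge span (an assumption), to ranks j = 1..m, and the function
// returns sum_{j=1}^{m} delta[l(j)] * j.
long long ApproximateMetric(const std::vector<int>& delta) {
    long long sum = 0;
    long long rank = 0;  // j, the packed index of the current entry
    for (std::size_t i = 1; i < delta.size(); ++i) {
        if (delta[i] == 0) continue;  // zero entries are not packed
        ++rank;
        sum += static_cast<long long>(delta[i]) * rank;
    }
    return sum;
}

// A local permutation is accepted when the metric is negative.
bool AcceptLocalPermutation(const std::vector<int>& delta) {
    return ApproximateMetric(delta) < 0;
}
```

For example, a permutation that moves one edge from span 4 to span 1 packs to (+1, -1) and gives 1(1) + (-1)(2) = -1, so it is accepted. The reverse move gives +1 and is rejected.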
Layout Optimization
• Find an optimal layout that
minimizes our metric
♦ Combinatorial optimization problem
Multilevel Minimization
Step 1:
Coarsening
Multilevel Minimization
Step 2:
Ordering of coarsest graph
Multilevel Minimization
Step 3:
Refinement and
local optimization
Outline
• Overview
• Cache-oblivious layouts
• Results
Layout Computation Time
• Process 70 million vertices per hour
♦ Takes 2.6 hours to lay out St. Matthew model (372 million triangles)
♦ Measured on a 2.4 GHz Pentium 4 PC with 1 GB of main memory
Edge Span Distributions of Different Layouts
[Plot: number of edges versus edge span for the cache-oblivious layout, the original layout, and the spectral layout]
Applications
• View-dependent rendering
• Collision detection
• Isocontour extraction
View-Dependent Rendering
• Lay out vertices and triangles of CHPM [Yoon et al. 04]
♦ Reduce misses of GPU vertex cache
View-Dependent Rendering
Peak performance: 145 M tri / s on GeForce 6800 Ultra
Models | # of Tri. | Our layout | Simplification layout [Yoon et al. 04]
St. Matthew | 372M | 106 M/s | 23 M/s
Isosurface | 100M | 90 M/s | 20 M/s
Double Eagle Tanker | 82M | 47 M/s | 22 M/s
Improvements marked on the slide: 4.5X and 2.1X
Realtime Captured Video – St.
Matthew Model
Comparison with Other Rendering Sequences
[Plot: cache miss ratio (misses per triangle, axis from 0.3 to 1.2) versus vertex cache size (8, 16, 32, 64) for universal rendering sequences [Bogomjakov and Gotsman 2002] and our layout]
Comparison with Other Rendering Sequences
[Plot: cache miss ratio (misses per triangle, axis from 0.3 to 1.2) versus vertex cache size (8, 16, 32, 64) for [Hoppe 99] and our layout]
♦ [Hoppe 99]: optimized for 16 vertex cache size with FIFO replacement
♦ Our layout: optimized for no particular cache size
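The cache miss ratio used in these comparisons (misses per triangle under FIFO replacement) can be measured with a small simulator. The C++ sketch below is an assumed, simplified model of a GPU vertex cache; the function name FifoMissesPerTriangle is hypothetical, and real hardware may differ in details.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Simulates a FIFO vertex cache of `cache_size` entries over an indexed
// triangle list (three vertex indices per triangle) and returns the cache
// miss ratio in misses per triangle.
double FifoMissesPerTriangle(const std::vector<int>& indices,
                             std::size_t cache_size) {
    const std::size_t num_triangles = indices.size() / 3;
    if (num_triangles == 0) return 0.0;
    std::deque<int> cache;  // front = oldest entry
    std::size_t misses = 0;
    for (std::size_t k = 0; k < num_triangles * 3; ++k) {
        const int v = indices[k];
        if (std::find(cache.begin(), cache.end(), v) != cache.end()) {
            continue;  // hit: FIFO order is not updated on a hit
        }
        ++misses;
        cache.push_back(v);
        if (cache.size() > cache_size) cache.pop_front();  // evict oldest
    }
    return static_cast<double>(misses) / num_triangles;
}
```

Running the same index buffer through several cache sizes (for example 8, 16, 32, and 64) gives one curve of the kind plotted above; a layout tuned for one size can be compared with a cache-oblivious one across all of them.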
Performance during View-Dependent Rendering
[Plot: cache miss ratio (given cache size 32, axis from 0.5 to 0.9) versus resolution (100%, 75%, 50%, 25%, 10%) for [Hoppe 99] and our layout]
♦ [Hoppe 99]: optimized for full resolution
♦ Our layout: optimized for various resolutions
Comparison with Space Filling Curve on Power Plant Model
[Plot: cache miss ratio (axis from 1 to 2) versus vertex cache size (8, 16, 32, 64) for the space filling curve (Z-curve) and our layout]
Collision Detection
• Bounding volume hierarchies
♦ Widely used to accelerate the
performance of collision detection
♦ Traversed to find contacting area
♦ Uses pre-computed layouts of OBB trees
[Gottschalk et al. 96]
Rigid Body Simulation
Collision Detection Time
[Plot: collision detection time with the depth-first layout and with the cache-oblivious layout]
♦ 2X on average
Isocontour Extraction
• Contour tree [van Kreveld et al. 97]
• Use mesh as the input graph
• Extract an isocontour that is orthogonal to z-axis
[Figure: Puget Sound, 134 M triangles; isocontour z(x,y) = 500m]
Comparison – First Extraction of Z(x,y) = 500m
• Disk access time is bottleneck
[Bar chart: relative performance over Z-axis sorted layout for the cache-oblivious, Z-axis sorted, Y-axis sorted, and spectral layouts; bar labels shown: 21, 13, 2, 1; annotation: nearly optimized for particular isocontour]
Comparison – Second Extraction of Z(x,y) = 500m
• Memory and L1/L2 cache access times are bottleneck
[Bar chart: relative performance over Z-axis sorted layout for the cache-oblivious, Z-axis sorted, Y-axis sorted, and spectral layouts; labels shown: 379, 21, 13, 212, 0.8, 1, 2]
Limitations
• Assumptions on CMRF
♦ May not work well for all applications
• Does not compute global
optimum
♦ Greedy solution
Advantages
• General
♦ Applicable to all kinds of polygonal models
♦ Works well for various applications
• Cache-oblivious
♦ Can benefit every level, from CPU/GPU cache to memory and disk
• No modification of runtime
application
♦ Only layout computation
OpenCCL: Cache-Coherent Layouts of Graphs and Meshes
• Source code for computing a cache-coherent layout
• Easy to use
    CLayoutGraph Graph (NumVertex);   // graph with NumVertex vertices
    Graph.AddEdge (0, 1);             // one edge per likely sequential access
    Graph.AddEdge (0, 2);
    Graph.AddEdge (1, 2);
    int Order [NumVertex];            // receives the computed ordering
    Graph.ComputeOrdering (Order);
[Figure: example graph with vertices 0, 1, and 2]
• Google “Cache Oblivious Mesh Layout” or visit http://gamma.cs.unc.edu/COL
Conclusion
• Novel algorithm for computing
cache-oblivious mesh layouts
♦ Cast the problem as an optimization
♦ Probabilistically compute the expected number of cache misses
♦ Achieve significant improvements (2 to
20X) without modifying runtime
applications
Ongoing and Future Work
• Apply to other applications
♦ Simplification and approximate collision
detection [Yoon et al. 04]
♦ Shortest path computation, etc.
• Investigate optimality
Ongoing and Future Work
• Cache-Oblivious Layouts of
Bounding Volume Hierarchies
[Yoon and Manocha 05]
♦ Tech. Report, University of North Carolina
at Chapel Hill
Acknowledgements
• Anonymous donor
♦ Power plant model
• Digital Michelangelo Project
♦ St. Matthew model at Stanford University
• LLNL ASCI VIEWS
♦ Isosurface model
• Newport News Shipbuilding
♦ Double Eagle tanker
Acknowledgements
• Army Research Office
• DARPA
• Intel Corporation
• Lawrence Livermore Nat’l Lab.
• National Science Foundation
• Office of Naval Research
• RDECOM
Acknowledgements
• Martin Isenburg
• Dawoon Jung
• Brandon Lloyd
• Elise London
• Brian Salomon
• Avneesh Sud
Questions?
Project URL
http://gamma.cs.unc.edu/COL