Texturing Massive Terrain
Colt McAnlis
Graphics Programmer – Blizzard
60 minutes (ish)
Texturing data is too large to fit into memory
Texturing data is unique
Lots of resolution
Down to maybe 1 meter per pixel
Vertex data
General terrain texturing issues
Low End Hardware
Review of technologies
Paging & Caches
DXT++ Compression
Compositing frameworks
Editing Issues
Example Based Texture Synthesis
Only subsection visible at a time
Non-visible areas remain on disk
New pages must be streamed in
Quickly limited by Disk I/O
Fast frustum movements kill perf
New pages occur frequently
Instead page in full radius around player
Only need to stream in far-away pages
Chunks stream in levels of mipmaps
As Distance changes, so does LOD
New mip levels brought in from disk
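A rough sketch of that idea, not code from the talk: walk the chunks inside a radius around the player and request the mip level implied by each chunk's distance. The constants and helper names are assumptions.

    // Hypothetical sketch: request pages in a radius around the player, with
    // coarser mips for more distant chunks. Constants and names are made up.
    #include <algorithm>
    #include <cmath>

    struct ChunkRequest { int x, z, mipLevel; };

    const int   kPageRadius     = 8;      // chunks kept resident around the player
    const float kChunkWorldSize = 64.0f;  // meters per chunk (assumed)
    const int   kMaxMip         = 4;

    // Farther chunks only need coarser mip levels.
    int MipForDistance(float distInChunks)
    {
        int mip = (int)std::floor(std::log2(std::max(1.0f, distInChunks)));
        return std::min(mip, kMaxMip);
    }

    template <typename Fn>
    void RequestVisiblePages(float playerX, float playerZ, Fn requestPage)
    {
        int cx = (int)(playerX / kChunkWorldSize);
        int cz = (int)(playerZ / kChunkWorldSize);
        for (int z = cz - kPageRadius; z <= cz + kPageRadius; ++z)
            for (int x = cx - kPageRadius; x <= cx + kPageRadius; ++x)
            {
                float d = std::sqrt((float)((x - cx) * (x - cx) + (z - cz) * (z - cz)));
                if (d > (float)kPageRadius)
                    continue;                                  // stay inside the radius
                requestPage(ChunkRequest{ x, z, MipForDistance(d) });
            }
    }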
Textures typically divided across chunk bounds
Not ideal for draw call counts..
Each chunk has its own mipchain
Difficult to filter across boundaries
But we don't need full chains at each chunk
Radial paging requires less memory
Would be nice to have easier filtering
What if we had one large mip-chain?
Use one texture per ‘distance’
All textures are same size
Resolution consistent for range
As distance increases, quality decreases
Can store as 3d texture / array
Only bind 1 texture to GPU
The benefit of this is that we can use 1 texture
No more filtering-across-boundary issues
Texturing no longer a reason for breaking batches
1 sample at 1 level gets proper filtering
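A CPU-side sketch of the lookup the terrain shader would do, picking a 'distance' level (array slice) and remapping world position into that level's UVs. The extents, level count, and ring-selection math here are assumptions, not values from the talk.

    // Hypothetical: choose the distance level and UVs for a world-space sample.
    #include <algorithm>
    #include <cmath>

    const float kLevel0Extent = 256.0f;  // world size covered by level 0 (assumed)
    const int   kNumLevels    = 6;       // slices in the texture array (assumed)

    struct StackSample { int level; float u, v; };

    StackSample PickLevel(float worldX, float worldZ, float viewX, float viewZ)
    {
        float dx = worldX - viewX, dz = worldZ - viewZ;
        float dist = std::sqrt(dx * dx + dz * dz);

        // Each level covers twice the world extent of the previous one.
        int level = (int)std::floor(std::log2(std::max(1.0f, dist / (kLevel0Extent * 0.5f))));
        level = std::max(0, std::min(level, kNumLevels - 1));

        // Remap world position into this level's [0,1] UV range, centered on the viewer.
        float extent = kLevel0Extent * (float)(1 << level);
        float u = (dx + extent * 0.5f) / extent;
        float v = (dz + extent * 0.5f) / extent;
        return StackSample{ level, u, v };
    }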
Mip mapping still poses a problem though
Since mips are separated out
Each 'distance' only needs 2 mips
Current mip, and the next smallest
At distance boundaries, mip levels should be identical
Current distance is mipped out to the next distance
Memory vs. perf vs. quality tradeoff
YMMV
(Figure: mip transition within the mip chain)
How do we update the texture?
GPU resource?
Should use render-to-texture to fill it.
But what about compression?
Can’t RTT to compressed target
GPU compress is limited
Not enough cycles for good quality
Shouldn’t you be GPU bound??
So then use the CPU to fill it?
Lock + memcpy
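One possible shape of that CPU path. The Lock/Unlock helpers below are made-up stand-ins for whatever the platform's texture-update call actually is; the copy itself is just a row-by-row memcpy of DXT blocks into the sub-region.

    // Sketch of the CPU fill path, with hypothetical engine-side helpers.
    #include <cstdint>
    #include <cstring>

    struct LockedRect { uint8_t* bits; int pitch; };   // pitch = bytes per row of blocks

    LockedRect LockMipStackLevel(int level, int x, int y, int w, int h);   // hypothetical
    void       UnlockMipStackLevel(int level);                             // hypothetical

    // Copy a page of DXT1 data (8 bytes per 4x4 block) into a sub-region of a
    // mip-stack level, one row of blocks at a time.
    void UpdatePage(int level, int x, int y, int w, int h, const uint8_t* dxtBlocks)
    {
        LockedRect dst = LockMipStackLevel(level, x, y, w, h);
        const int blockRows   = h / 4;
        const int bytesPerRow = (w / 4) * 8;
        for (int row = 0; row < blockRows; ++row)
            std::memcpy(dst.bits + row * dst.pitch,
                        dxtBlocks + row * bytesPerRow,
                        bytesPerRow);
        UnlockMipStackLevel(level);
    }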
Paging & Caches
DXT++ Compression
Compositing frameworks
Editing Issues
Example Based Texture Synthesis
Goal : Fill large texture on CPU
Problem : DXT is good
But other systems are better (JPG)
id Software:
JPEG -> RGBA8 -> DXT
Re-compressing decompressed streams
2nd level quality artifacts can be introduced
Decompress / recompress speeds?
We have to end up at a GPU-friendly format
Remove the middleman?
Sooner or later..
We would need to decompress directly to DXT
Means we need to compress the DXT data MORE
Let’s look at DXT layout
DXT1 : Results in 4bpp
Each block: a high 565 color, a low 565 color, and a 2-bit selector per pixel
In reality you tend to have a lot of them :
512x512 texture is 16k blocks
…
Really, two different types of data per texture
16-bit block colors
2-bit selectors
Each one can be compressed even further
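In struct form, just restating the layout above:

    // One DXT1 block: 16 pixels in 8 bytes = 4 bits per pixel.
    #include <cstdint>

    struct Dxt1Block
    {
        uint16_t colorHigh;   // endpoint 0, RGB 5:6:5
        uint16_t colorLow;    // endpoint 1, RGB 5:6:5
        uint32_t selectors;   // 16 x 2-bit indices choosing between the endpoints
    };
    static_assert(sizeof(Dxt1Block) == 8, "4x4 pixels in 8 bytes = 4bpp");

    // A 512x512 texture is (512/4) * (512/4) = 16,384 blocks:
    // 16k pairs of 16-bit colors plus 16k 32-bit selector words.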
Input texture : potential for millions of colors
Input texture : actual used colors
16-bit compressed : used colors
Two unique colors per block
But what if that unique color exists in other blocks?
We're duplicating data
Let's focus on trying to remove duplicates
Lossless data compression
Represents least-bit dictionary set
i.e. more frequently used values have smaller bit representations
String : AAAABBBCCD (80 bits)
Symbol   Used %   Encode
A        40%      0
B        30%      10
C        20%      110
D        10%      111
Result : 0000101010110110111 (19 bits)
More common colors will be given smaller indexes
4096 identical 565 colors = 8kb
Huffman encoded = 514 bytes
4k single bits, one 16 bit color
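A bare-bones sketch of how those code lengths could be derived; this is just the textbook Huffman construction, not code from the talk.

    // Given a count per unique 565 color, derive Huffman code lengths by
    // repeatedly merging the two least-frequent nodes. (A real coder would
    // also handle the single-color case and emit canonical codes.)
    #include <cstdint>
    #include <map>
    #include <queue>
    #include <vector>

    std::map<uint16_t, int> HuffmanCodeLengths(const std::map<uint16_t, int>& colorCounts)
    {
        struct Node { int count; std::vector<uint16_t> colors; };
        auto cmp = [](const Node& a, const Node& b) { return a.count > b.count; };
        std::priority_queue<Node, std::vector<Node>, decltype(cmp)> heap(cmp);

        for (const auto& kv : colorCounts)
            heap.push(Node{ kv.second, { kv.first } });

        std::map<uint16_t, int> lengths;   // color -> code length in bits
        while (heap.size() > 1)
        {
            Node a = heap.top(); heap.pop();
            Node b = heap.top(); heap.pop();
            Node merged{ a.count + b.count, {} };
            // Every color under the merged node gets one bit deeper.
            for (auto* list : { &a.colors, &b.colors })
                for (uint16_t c : *list) { lengths[c]++; merged.colors.push_back(c); }
            heap.push(merged);
        }
        return lengths;   // frequent colors end up with the shortest codes
    }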
Problem : As the number of unique colors increases, Huffman becomes less effective.
Similar colors can be quantized
Vector Quantization
Human eye won’t notice
Groups large data sets into correlated groups
Can replace group elements with a single value
Step #1 - Vectorize unique input colors
Step #2 - Huffmanize quantized colors
Reduces the number of unique colors
Per-DXT block, store the Huffman index rather than the 565 color.
W00t..
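A rough sketch of the quantization step. The reduced palette and the naive nearest-entry search are illustrative assumptions; the talk does not prescribe a particular clustering method.

    // Naive vector quantization of 565 endpoints: snap every color to the
    // nearest entry of a small palette. Building the palette itself (k-means
    // or similar) is a separate offline step not shown here.
    #include <climits>
    #include <cstdint>
    #include <vector>

    static void Unpack565(uint16_t c, int& r, int& g, int& b)
    {
        r = (c >> 11) & 31;  g = (c >> 5) & 63;  b = c & 31;
    }

    int QuantizeColor(uint16_t color, const std::vector<uint16_t>& palette)
    {
        int r, g, b;
        Unpack565(color, r, g, b);
        int best = 0, bestErr = INT_MAX;
        for (int i = 0; i < (int)palette.size(); ++i)
        {
            int pr, pg, pb;
            Unpack565(palette[i], pr, pg, pb);
            int err = (r - pr) * (r - pr) + (g - pg) * (g - pg) + (b - pb) * (b - pb);
            if (err < bestErr) { bestErr = err; best = i; }
        }
        return best;   // per block, store this (Huffman-coded) index instead of the 565 color
    }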
Each selector block is a small number of bits
Chain 2-bit selectors together to make a larger symbol
Can use Huffman on these too!
4x4 array of 2-bit values per block
Results in four 8-bit values
Or a single 32-bit value
Might be too small to get good compression results
Doesn't help much if there are a lot of unique selectors
Do tests on your data to find the ideal size
8-bit to 16-bit symbols work well in practice
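For example, a sketch that regroups each block's selector bits into two 16-bit symbols (one of the sizes suggested above), which can then go through the same Huffman coder as the colors:

    // Regroup each block's 16 x 2-bit selectors into two 16-bit symbols.
    #include <cstdint>
    #include <vector>

    void SelectorsToSymbols(const uint32_t* selectorWords, int numBlocks,
                            std::vector<uint16_t>& symbols)
    {
        symbols.clear();
        symbols.reserve((size_t)numBlocks * 2);
        for (int i = 0; i < numBlocks; ++i)
        {
            uint32_t s = selectorWords[i];                  // 16 x 2-bit selectors
            symbols.push_back((uint16_t)(s & 0xFFFF));      // rows 0-1 of the block
            symbols.push_back((uint16_t)(s >> 16));         // rows 2-3 of the block
        }
    }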
Compression pipeline (to disk):
DXT data -> separate block colors from selector bits
Block colors -> vector quantization -> quantized block colors -> Huffman -> Huffman table + color indexes
Selector bits -> Huffman -> Huffman table + selector indexes
Decompression pipeline (from disk):
Color indexes + Huffman table -> block colors
Selector indexes + Huffman table -> selector bits
Block colors + selector bits -> fill DXT blocks
Uncompressed : 3mb
DXT1 (4bpp) : 512kb
DXT1++ : 91kb
Uncompressed : 1mb
DXT3A (4bpp) : 512kb
DXT3A++ : 9kb
Getting back to texturing..
Insert decompressed data into mipstack level
Can lock the mip-stack level
Update the sub-region on the CPU
Decompression not the only way..
Paging & Caches
DXT++ Compression
Compositing frameworks
Editing Issues
Example Based Texture Synthesis
Pages for the cache can come from anywhere
Doesn't have to be compressed unique data
What about splatting?
Standard screenspace method
Can we use it to fill the cache?
Splatting is a standard texturing method
Re-render terrain to screen
Bind new texture & alpha each time
Results accumulated via blending
De facto standard for terrain texturing
Same process can work for our caching scheme
Don't splat to screen space,
Composite to page in the cache
Get same memory benefits
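A CPU-flavoured sketch of that idea; the layer layout and blend below are assumptions, and a real implementation would do the same accumulation by rendering the splats into the page on the GPU.

    // Fill one cache page by compositing layers: each layer is a tiling RGBA8
    // texture blended through a per-pixel alpha mask. pageRGBA is assumed to
    // start filled with the base layer.
    #include <cstdint>
    #include <vector>

    struct Layer
    {
        const uint8_t* texels;   // tiling RGBA8 texture, texSize x texSize
        int            texSize;
        const uint8_t* alpha;    // one alpha byte per page pixel
    };

    void CompositePage(uint8_t* pageRGBA, int pageSize, const std::vector<Layer>& layers)
    {
        for (int y = 0; y < pageSize; ++y)
            for (int x = 0; x < pageSize; ++x)
            {
                uint8_t* dst = pageRGBA + (y * pageSize + x) * 4;
                for (const Layer& l : layers)
                {
                    int a = l.alpha[y * pageSize + x];
                    const uint8_t* src = l.texels +
                        ((y % l.texSize) * l.texSize + (x % l.texSize)) * 4;
                    for (int c = 0; c < 4; ++c)   // standard alpha blend, accumulated per layer
                        dst[c] = (uint8_t)((src[c] * a + dst[c] * (255 - a)) / 255);
                }
            }
    }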
What about compression?
Can’t composite & compress
Alpha blending + DXT compress???
Composite->ARGB8->DXT
Repeating textures + low-res alpha = large memory wins
Compression is awesome
But we could get better results
Decouples us from vert overdraw
Which is a great thing!
Quality vs. Perf tradeoff
Hard to get unique quality @ same perf
More blends = worse perf
Trade uniqueness for memory
Tiled features very visible.
Effectively wasting cycles
Re-creating the same asset every frame
Mix of compositing & decompression
Fun ideas for foreground / background
Switch between them based on distance
Fun ideas for low-end platforms
High end gets decompression
Low end gets compositing
Fun ideas for doing both!
A really flexible pipeline..
(Diagram: disk data -> decompress -> CPU compress -> cache; 2D compositor -> GPU compress -> cache)
Paging & Caches
DXT++ Compression
Compositing frameworks
Editing Issues
Example Based Texture Synthesis
Standard pipelines choke on data
Designed for 1 user -> 1 asset work
Mostly driven by source control setups
Need to address massive texturing directly
Problem with allowing multiple artists to texture a planet
1 artist per planet is slow…
Standard source control concepts fail
If all texturing is in one file, it can only safely be edited by one person at a time
Solution : 2 million separate files?
Need a better setup
Allows multiple users to edit texturing
User feedback is highly important
Edited areas are highlighted immediately to other users
Highlighted means ‘has been changed’
Highlighted means ‘you can’t change’
(Diagram) Artist A makes a change -> Texturing Server -> data updated for Artist B
Custom merge tool required
Each machine only checks in its sparse changes
Server handles merges before submitting to actual source control
Acts as ‘man in the middle’
(Diagram) Artist A and Artist B send changes to the Texturing Server, which submits them on to Source Control
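Purely illustrative sketch of what a sparse change record and the server-side merge might look like; none of these names or the "newest edit wins" rule come from the talk.

    // Each artist sends just the pages they touched; the server keeps the
    // newest edit per page before committing the merged set to source control.
    #include <cstdint>
    #include <map>
    #include <tuple>
    #include <vector>

    struct PageEdit
    {
        int                  pageX, pageY, mipLevel;
        uint64_t             timestamp;
        std::vector<uint8_t> pixels;     // only the edited page's texels
    };

    using PageKey = std::tuple<int, int, int>;   // (pageX, pageY, mipLevel)

    void MergeEdits(std::map<PageKey, PageEdit>& merged, const std::vector<PageEdit>& incoming)
    {
        for (const PageEdit& e : incoming)
        {
            PageKey key{ e.pageX, e.pageY, e.mipLevel };
            auto it = merged.find(key);
            // Last edit to a page wins here; a real server would instead flag
            // the conflict back to the artists ("highlighted means you can't change").
            if (it == merged.end() || it->second.timestamp < e.timestamp)
                merged[key] = e;
        }
    }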
What about planet-sized batch operations?
Could modify entire planet at once?
Would ignore affected areas?
Double Edged Sword..
Important to still have batching.
Maybe limit batch operation distances?
Flag if trying to modify edited area?
Common texturing concepts
Set texture by slope
Set texture by height
Set texture by area
Could we extend it further?
View ‘set’ operations as ‘masks’
Set texturing by procedural functions
Combine masks in a graph setup
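A small sketch of the slope and height 'set' operations expressed as masks, plus one combine node of the graph. The thresholds and the multiply combiner are arbitrary examples, not values from the talk.

    // Masks over per-vertex (or per-texel) terrain attributes.
    #include <vector>

    std::vector<float> SlopeMask(const std::vector<float>& slope, float minSlope)
    {
        std::vector<float> mask(slope.size());
        for (size_t i = 0; i < slope.size(); ++i)
            mask[i] = (slope[i] > minSlope) ? 1.0f : 0.0f;   // e.g. rock on steep faces
        return mask;
    }

    std::vector<float> HeightMask(const std::vector<float>& height, float lo, float hi)
    {
        std::vector<float> mask(height.size());
        for (size_t i = 0; i < height.size(); ++i)
            mask[i] = (height[i] >= lo && height[i] <= hi) ? 1.0f : 0.0f;   // e.g. snow line
        return mask;
    }

    // A node in the mask graph: combine two masks (here by multiply).
    std::vector<float> Combine(const std::vector<float>& a, const std::vector<float>& b)
    {
        std::vector<float> out(a.size());
        for (size_t i = 0; i < a.size(); ++i)
            out[i] = a[i] * b[i];
        return out;
    }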
Common concept
.kkrieger, World Machine, etc.
Masks can re-generate based upon vertex changes
Generate multiple masks for other data
As long as you store the graph, not the mask.
Apply trees, objects, etc
Cool algorithms here for all
Paging & Caches
DXT++ Compression
Compositing frameworks
Editing Issues
Example Based Texture Synthesis
Repeating textures cause problems
Takes more blends to reduce repetition
Increases memory
Increases perf burden
Would be nice to fix that automagically
Generates output texture per-pixel
Chooses new pixel based upon current neighborhood
Represent input pixel as a function of its neighbors
Create search acceleration structure
Find 'neighborhood' similar to input
This is known as 'per-pixel' synthesis
(Figure: exemplar and the texture being synthesized)
Basically a Nearest Neighbor search
Doesn’t give best quality
Only correcting input pixel based upon previously corrected neighborhood
Introduces sequential dependencies
Need to increase neighborhood size to get better results
This increases sample time
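A brute-force sketch of per-pixel synthesis: grayscale, no acceleration structure, causal neighborhood only. This is only meant to show the shape of the search; real systems (like Hoppe's) replace the inner scan with much faster lookups.

    // Output is pre-filled with noise; each pixel is then replaced by the
    // exemplar pixel whose causal neighborhood (row above + pixels to the left)
    // matches best. The O(n^2) scan is the part acceleration structures remove.
    #include <cstdint>
    #include <vector>

    static int WrapAt(int v, int n) { return ((v % n) + n) % n; }

    void SynthesizePerPixel(const std::vector<uint8_t>& exemplar, int exN,
                            std::vector<uint8_t>& output, int outN, int radius)
    {
        for (int y = 0; y < outN; ++y)
            for (int x = 0; x < outN; ++x)
            {
                int bestErr = 1 << 30, bestVal = exemplar[0];
                for (int ey = radius; ey < exN; ++ey)
                    for (int ex = radius; ex < exN - radius; ++ex)
                    {
                        int err = 0;
                        for (int dy = -radius; dy <= 0; ++dy)
                            for (int dx = -radius; dx <= (dy < 0 ? radius : -1); ++dx)
                            {
                                int o = output[WrapAt(y + dy, outN) * outN + WrapAt(x + dx, outN)];
                                int e = exemplar[(ey + dy) * exN + (ex + dx)];
                                err += (o - e) * (o - e);
                            }
                        if (err < bestErr) { bestErr = err; bestVal = exemplar[ey * exN + ex]; }
                    }
                output[y * outN + x] = (uint8_t)bestVal;
            }
    }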
(Figure: exemplar and noisy output image)
Hoppe 2006 (Microsoft Research)
Multi-resolution: fixes pixels at various sizes in the output image
This 'keeps' coarse texture features
Reduces image artifacts
GPU based
Highly controllable
Artists / mesh provided vector fields
Can synthesize large textures
Use terrain normals as input
Allows texture to ‘flow’ with contours
Allow artists to adjust vectors
Rather than have the same repeating texture
So they can paint custom swirls etc.
Could even use it to synthesize terrain vertex data
But that’s another talk ;)
Still too slow to composite MASSIVE terrain @ edit time
Synthesize the whole planet?
Would have to be a render-farm process.
Actually, still too slow to do non-massive terrain..
Maybe generate custom decals?
But what about CPU?
Multicore may shed light on it
Future research?
Use 1 texture resource for texture data
Use DXT++ to decrease footprint
MipStack structure
W/o using RGBA->DXT
Multi-input cache filling algorithms
Stream + Composite
Use Custom texturing server
Make texture synthesis Faster!!
I’m talking to you Mr. Hoppe ;)
Andrew Foster
Rich Geldreich
Ken Adams