Database Operation On GPU

SIFT on GPU (the slides are not
updated for newer versions of SiftGPU)
Changchang Wu
5/8/2007
Outline
• Background and related implementation
• SIFT on GPU (SiftGPU)
• Goal: fast, general, flexible
• Conclusion and Future Work
SIFT (Lowe, IJCV04)
• Scale Invariant Feature Transform
• Detect and describe features that are
invariant to similarity transformation
• Popular technique in computer vision
• Panorama Generation
• Microsoft Photosynth
• Content Based Image Retrieval
SIFT (Lowe, IJCV04)
• Scale-space extrema detection
• Difference-of-Gaussian function
$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y)$
• A close approximation of the scale-normalized
Laplacian of Gaussian, more stable than the
gradient, Hessian, or Harris corner function.
• Maxima and minima of the DOG are
invariant to scale change
Scale-space Construction
• For each octave: s intervals, k = 2^(1/s),
s+3 Gaussian images
• σ doubles for the next octave, just resample
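The per-octave blur schedule above can be sketched as follows. This is only an illustration: the function name `octave_sigmas` is hypothetical, and the base value `sigma0 = 1.6` is an assumption, not a value given on the slide.

```python
def octave_sigmas(sigma0, s):
    """Blur levels for one octave: s intervals, s + 3 Gaussian images."""
    k = 2.0 ** (1.0 / s)  # scale ratio between adjacent levels
    return [sigma0 * k ** i for i in range(s + 3)]

# Assumed base blur of 1.6 and s = 3 intervals
sigmas = octave_sigmas(sigma0=1.6, s=3)
print(len(sigmas))            # number of Gaussian images in the octave
print(sigmas[3] / sigmas[0])  # ratio after s steps, where sigma has doubled
```

After s steps the blur has doubled, which is why the next octave can start from a resampled image.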
Finding Local Extrema
Compare a pixel (marked with X) to its 26 neighbors
in 3x3 regions at the current and adjacent scales
in DOG space (marked with circles).
Also assign orientations to keypoints using the
maxima in the local gradient orientation histogram
(in a window of size 3*sigma)
$\theta(x, y) = \operatorname{atan2}\big(L(x, y+1) - L(x, y-1),\ L(x+1, y) - L(x-1, y)\big)$
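A minimal sketch of the per-pixel gradient orientation, using central differences on a smoothed image stored as `L[y][x]`. The function name `gradient_at` and the returned magnitude are additions for illustration.

```python
import math

def gradient_at(L, x, y):
    """Return (magnitude, orientation in radians) at pixel (x, y) of L[y][x]."""
    dx = L[y][x + 1] - L[y][x - 1]  # horizontal central difference
    dy = L[y + 1][x] - L[y - 1][x]  # vertical central difference
    return math.hypot(dx, dy), math.atan2(dy, dx)

# Intensity ramp that grows along x only, so the gradient points along +x
ramp_x = [[float(x) for x in range(3)] for _ in range(3)]
magnitude, theta = gradient_at(ramp_x, 1, 1)
print(magnitude, theta)
```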
Refinement
• Sub-pixel localization
• Fitting 3D quadratic function in the 3x3x3
cube to find sub-pixel location
• Edge elimination
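One way to sketch the 3D quadratic fit is a single Newton step from the cube center, with finite-difference derivatives. The cube layout `c[s][y][x]` and the helper names are assumptions made for this example, not taken from any existing implementation.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def subpixel_offset(c):
    """Offset (dx, dy, ds) of the fitted extremum from the cube center."""
    g = [(c[1][1][2] - c[1][1][0]) / 2.0,
         (c[1][2][1] - c[1][0][1]) / 2.0,
         (c[2][1][1] - c[0][1][1]) / 2.0]
    dxx = c[1][1][2] - 2 * c[1][1][1] + c[1][1][0]
    dyy = c[1][2][1] - 2 * c[1][1][1] + c[1][0][1]
    dss = c[2][1][1] - 2 * c[1][1][1] + c[0][1][1]
    dxy = (c[1][2][2] - c[1][2][0] - c[1][0][2] + c[1][0][0]) / 4.0
    dxs = (c[2][1][2] - c[2][1][0] - c[0][1][2] + c[0][1][0]) / 4.0
    dys = (c[2][2][1] - c[2][0][1] - c[0][2][1] + c[0][0][1]) / 4.0
    H = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    d = det3(H)
    offset = []
    for col in range(3):  # Cramer's rule for H * offset = -g
        m = [row[:] for row in H]
        for r in range(3):
            m[r][col] = -g[r]
        offset.append(det3(m) / d)
    return offset

# Synthetic quadratic with its maximum at (0.2, -0.1, 0.3)
def f(x, y, s):
    u, v, w = x - 0.2, y + 0.1, s - 0.3
    return -(u * u + 2 * v * v + 3 * w * w + 0.5 * u * v)

cube = [[[f(x, y, s) for x in (-1, 0, 1)] for y in (-1, 0, 1)] for s in (-1, 0, 1)]
offset = subpixel_offset(cube)
print(offset)
```

A real detector would also need to handle a near-singular Hessian and offsets larger than half a pixel, which this sketch ignores.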
Feature descriptor
• Select Gaussian image at expected scale
• Compute weighted histogram of gradient orientation
(relative to keypoint orientations)
128-D vector = 16 squares x 8 directions
Existing implementations
• CPU version
• Lowe’s binary (http://www.cs.ubc.ca/~lowe/keypoints/)
• Andrea Vedaldi’s SIFT++
• (http://vision.ucla.edu/~vedaldi/code/siftpp/siftpp.html)
• C#(autopanosift), Matlab…
• GPU version
• Sudipta Sinha’s GPUSIFT
• Sebastian Heymann’s
Current Progress
• Intensity conversion and sampling (cg + GLSL)
• Image pyramid (cg + GLSL)
• Keypoint detection (cg + GLSL)
• Sub-pixel localization (none)
• Edge elimination (cg only)
• Feature list generation (cg + GLSL + CPU)
• Orientation (cg fp40 only)
• Display list generation (cg + GLSL)
• Descriptor generation (cg)
• Visualization (cg + GLSL, Glut + win32)
• + means multiple versions of implementations
Scale Space Construction
• Run horizontal and vertical Gaussian filtering
separately
• When the number of DOG levels in an octave is 3,
the largest Gaussian kernel can be 19x19
• Compute difference of Gaussian in the same
pass since it is already read out
• Does not use ping-pong; sometimes writes to and
reads from the same texture, because not all
channels need to be changed.
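A CPU-side sketch of the separable filtering idea: one horizontal pass, one vertical pass, then a difference of two blurred images. The kernel radius and the clamp-to-edge border handling are assumptions, and unlike the shader described above, the difference is taken in a separate step here.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian with 2 * radius + 1 taps."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def filter_rows(img, kernel):
    """Filter each row with a 1D kernel, clamping coordinates at the border."""
    r = len(kernel) // 2
    w = len(img[0])
    return [[sum(kernel[j + r] * row[min(max(x + j, 0), w - 1)]
                 for j in range(-r, r + 1))
             for x in range(w)] for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma, radius):
    k = gaussian_kernel(sigma, radius)
    horizontal = filter_rows(img, k)                         # horizontal pass
    return transpose(filter_rows(transpose(horizontal), k))  # vertical pass

def difference(a, b):
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

# A constant image has no structure, so its DOG response is close to zero
flat = [[5.0] * 6 for _ in range(4)]
dog = difference(gaussian_blur(flat, 1.6, 5), gaussian_blur(flat, 1.0, 5))
print(max(abs(v) for row in dog for v in row))
```

Two 1D passes cost 2n taps per pixel instead of n*n for a full 2D kernel, which matters for kernels as large as 19x19.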
Color channel mapping
• Use Texture from Destination instead of
PingPong
Keypoint Detection
• Compare with 26 neighbors?
• Do in 4 steps
• Intra-level comparing with 8 neighbors, (compute
gradient in this pass, and edge elimination)
• Store the maximum and minimum of the 9 pixels in
an auxiliary texture
• Early z culling based on the in-level suppression
• Comparing with the maximum and minimum of the
pixel at upper level and lower level
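The staged comparison above can be sketched on the CPU as follows. It gives the same answer as a direct 26-neighbor test, because comparing against the stored min/max of the 9 pixels in an adjacent level is equivalent to comparing against each of them. Function names are hypothetical, and the early return stands in for early z culling.

```python
def minmax_3x3(level, x, y):
    """Min and max over the 3x3 neighborhood of (x, y), center included."""
    vals = [level[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return min(vals), max(vals)

def in_level_extremum(level, x, y):
    """+1 for a strict in-level maximum, -1 for a strict minimum, else 0."""
    v = level[y][x]
    others = [level[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dx, dy) != (0, 0)]
    if v > max(others):
        return 1
    if v < min(others):
        return -1
    return 0

def is_keypoint(below, current, above, x, y):
    kind = in_level_extremum(current, x, y)  # 8 in-level neighbors
    if kind == 0:
        return False                         # rejected early, no further work
    v = current[y][x]
    lo_b, hi_b = minmax_3x3(below, x, y)     # auxiliary min/max, lower level
    lo_a, hi_a = minmax_3x3(above, x, y)     # auxiliary min/max, upper level
    if kind == 1:
        return v > hi_b and v > hi_a
    return v < lo_b and v < lo_a

zeros = [[0.0] * 3 for _ in range(3)]
peak = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(is_keypoint(zeros, peak, zeros, 1, 1))
```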
Feature List Generation on GPU
• Use Gernot Ziegler’s histogram pyramid
method. Use all RGBA channels
1. Do reduction, and read
back the highest level.
2. Allocate texture to hold
the feature list
3. Traverse the pyramid to
get location
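The three steps above can be sketched on a small binary mask. This is a simplified CPU illustration of the histogram pyramid idea: it uses one count per cell instead of packing into RGBA channels, assumes a square power-of-two mask, and uses hypothetical function names.

```python
def build_pyramid(mask):
    """mask: square 2^n x 2^n grid of 0/1. Returns levels, finest first."""
    levels = [mask]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([[prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
                        + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]
                        for x in range(half)] for y in range(half)])
    return levels

def locate(levels, index):
    """Return (x, y) of the index-th set cell by top-down traversal."""
    x = y = 0
    for level in reversed(levels[:-1]):
        x, y = 2 * x, 2 * y
        for dy, dx in ((0, 0), (0, 1), (1, 0), (1, 1)):
            count = level[y + dy][x + dx]
            if index < count:
                x, y = x + dx, y + dy  # descend into this child
                break
            index -= count             # skip the features in this child
    return x, y

mask = [[0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 1]]
levels = build_pyramid(mask)
total = levels[-1][0][0]                             # step 1: read back the top
feature_list = [locate(levels, i) for i in range(total)]  # steps 2 and 3
print(total, feature_list)
```

Only the single top-level count has to be read back before the list is allocated; each list entry then finds its own location independently, which suits one fragment per output element.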
Feature Orientation
• Use a circular window (use 3*sigma as radius)
• Compute weighted histogram of orientations
(36 bins as 9 float4)
Binary search to locate desired bin
bin+=float4(fmod(idx,4)==float4(0,1,2,3))
• Smoothing the histogram
a large smoothing kernel is easy to obtain:
one (1 3 6 7 6 3 1)/27 pass equals three (1 1 1)/3 passes
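A quick check of the kernel identity above, with a circular 36-bin histogram. Integer (unnormalized) kernels are used so the comparison is exact; the histogram values are made up for the test.

```python
def convolve(a, b):
    """Full discrete convolution of two 1D sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, va in enumerate(a):
        for j, vb in enumerate(b):
            out[i + j] += va * vb
    return out

def smooth_circular(hist, kernel):
    """Apply an odd-length kernel to a histogram with wrap-around."""
    n, r = len(hist), len(kernel) // 2
    return [sum(kernel[j + r] * hist[(i + j) % n] for j in range(-r, r + 1))
            for i in range(n)]

box = [1, 1, 1]
kernel7 = convolve(convolve(box, box), box)
print(kernel7, sum(kernel7))

hist = [(7 * i) % 11 for i in range(36)]  # arbitrary 36-bin test data
three_passes = smooth_circular(smooth_circular(smooth_circular(hist, box), box), box)
one_pass = smooth_circular(hist, kernel7)
print(three_passes == one_pass)
```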
Feature Orientation
• Find the bins that are
• larger than 0.8 times the maximum
• Local maximum
• Do interpolation to get sub-bin orientation
• Save the N largest (N <= 4M) to the RGBA
channels of M output textures
• Save N to the original texture, and set N
to 0 when N is larger than a threshold
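A sketch of the peak selection above on a circular histogram. The parabola through three neighboring bins is an assumed choice of interpolation; the slides do not say which method is used. The function name and test data are hypothetical.

```python
def orientation_peaks(hist, ratio=0.8):
    """Sub-bin positions of local maxima above ratio * global maximum,
    strongest first."""
    n = len(hist)
    threshold = ratio * max(hist)
    peaks = []
    for i in range(n):
        left, center, right = hist[i - 1], hist[i], hist[(i + 1) % n]
        if center > threshold and center > left and center > right:
            # Vertex of the parabola through the three samples
            offset = 0.5 * (left - right) / (left - 2.0 * center + right)
            peaks.append(((i + offset) % n, center))
    peaks.sort(key=lambda p: p[1], reverse=True)
    return [position for position, _ in peaks]

hist = [0.0] * 36
hist[9], hist[10], hist[11] = 4.0, 10.0, 8.0   # main peak, skewed to the right
hist[29], hist[30], hist[31] = 5.0, 9.0, 5.0   # symmetric secondary peak
hist[20] = 3.0                                 # local maximum below threshold
peaks = orientation_peaks(hist)
print(peaks)
```

With 36 bins, a sub-bin position converts to an angle by multiplying by 10 degrees.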
Reshape Feature List
• Rebuild the feature list according to
orientations (variable # of orientations)
• Use the histogram pyramid method
Feature Descriptor
• Use 4 textures for MRT, and 8 RGBA
pixels in each. (8*4*4 = 128)
• Trilinear interpolation is implemented
A better geometry shader version is in progress
Use 2*sigma (instead of 6*sigma) as box size to display here
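To illustrate what trilinear interpolation means for the descriptor, the sketch below splits one weighted gradient sample over the 2x2x2 neighboring bins of a 4x4x8 histogram. The bin coordinate convention and the flat index layout `(y * 4 + x) * 8 + o` are assumptions for this example and need not match how the values are laid out in the output textures.

```python
import math

def trilinear_votes(x, y, o, weight):
    """Split one sample at continuous bin coordinates (x, y, o) over 8 bins.

    Returns {flat_index: vote} for a 4 x 4 x 8 descriptor."""
    x0, y0, o0 = math.floor(x), math.floor(y), math.floor(o)
    fx, fy, fo = x - x0, y - y0, o - o0
    votes = {}
    for dx, wx in ((0, 1 - fx), (1, fx)):
        for dy, wy in ((0, 1 - fy), (1, fy)):
            for do, wo in ((0, 1 - fo), (1, fo)):
                xi, yi = x0 + dx, y0 + dy
                if 0 <= xi < 4 and 0 <= yi < 4:   # spatial bins do not wrap
                    oi = (o0 + do) % 8            # orientation bins wrap
                    key = (yi * 4 + xi) * 8 + oi  # assumed flat layout
                    votes[key] = votes.get(key, 0.0) + weight * wx * wy * wo
    return votes

votes = trilinear_votes(1.5, 2.25, 7.5, 1.0)
print(len(votes), sum(votes.values()))
```

Samples near the descriptor border lose the part of their weight that would fall outside the 4x4 grid.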
Display VBO generation
• Display SIFT features as rotated/scaled
square to illustrate scale and orientation.
• Say the feature texture is WxH (normally H is 1,
because no more than 2048..)
• Make a texture that is Wx(4H)
• For point (x, y), the index is Idx=y*W+x
• Then the original index is idxo=floor(Idx/4)
• And the sub-index is fmod(Idx,4); use the sub-index
to offset and rotate this point
• Copy render result to VBO (vertex buffer
object)
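The index arithmetic above can be sketched as follows. The feature tuple `(cx, cy, half_side, angle)` and the corner ordering are hypothetical; only the Idx, idxo, and sub-index computation follows the slide.

```python
import math

def square_vertex(x, y, W, features):
    """Vertex for output texel (x, y) of the W x (4H) display texture."""
    idx = y * W + x
    feature_index, corner = divmod(idx, 4)  # idxo and sub-index
    cx, cy, half_side, angle = features[feature_index]
    # Corners sit on the circumscribed circle, 90 degrees apart
    a = angle + math.pi / 4 + corner * math.pi / 2
    r = half_side * math.sqrt(2.0)
    return cx + r * math.cos(a), cy + r * math.sin(a)

# Two features in a 2 x 1 feature texture, so the output texture is 2 x 4
features = [(10.0, 20.0, 2.0, 0.0), (50.0, 60.0, 1.0, math.pi / 2)]
W = 2
corners = [square_vertex(x, y, W, features) for y in range(4) for x in range(W)]
print(corners[:4])
```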
Parameterization
• This SIFT on GPU also tries to give
flexibility by providing parameters
• # of octaves, # of levels, sigma0
• Starting octave, starting level
• Filter window size
• Orientation window size
• Descriptor window size
•…
• Shaders are dynamically generated
Result
• Speed on nVidia 8800
• 13 Hz on a 640*480 image
• 4 Hz on a 2048*1536 image
• Parts can run on a laptop
• Radeon X300 (maximum instruction count is 96)
• No orientation/edge elimination/descriptor
Conclusion
• Very close to sift++
• Finished a basic and also flexible
framework of SIFT
• Reduced CPU/GPU data transfer by
feature list generation on GPU
Future work
• Sub-pixel localization
• Try Geometry Shader or CUDA for
descriptor generation
• Try the packed texture format of
Sebastian Heymann’s implementation
• Compatibility with more graphics cards
